I've just uploaded version 0.9 of compare-locales onto pypi. It's finally the version that does all the fancy value checks that I've been talking about for a while, and that some of the localizers have seen flying by in their bugmail.
Here's what it does:
For DTDs, I create fake xml docs, and try to parse them. This should find encoding errors, as well as unbalanced XML tags or stray '&' ampersands. There's one thing that's tricky, and that is references to entities. I do get the list of entities from en-US, so I do have a good idea which should work (really, please). On the other hand, referencing other entities may not be an error. &rdquot; for example could be totally fine. If referenced in an XHTML document, that is. Not if it was included in a XUL document. Of course both breeds could include the same DTD file. I can't really tell, so I've added a new category of reporting, called warnings.
For properties files, I check a bunch of printf tricks. Some of those are warnings, some of those are errors. Which is which basically depended on code-inspection. I also did some heuristics based on comments referencing the plural docs to check for our plurals-special variable handling.
Outstanding are the installer variable checks still, didn't want to hold back this release for that. They're somewhat tricky in the details and yet more tedious to get right than the other checks.
What does that mean for localizers? You wanna get the error count down to zero. The warnings count may or may not go down to zero, that's your call.
The new version isn't in public use anywhere yet, the deployment will go like this:
- Get a round of public feedback on this release.
- Use on the dashboard (likely gonna happen when I do the 2.0 branch dance, too).
- Try to get the new version used on the build system.
Please give the new version a bit of pounding in your local l10n-merge builds, too. It should strip entities with errors from your localization, and merge in en-US strings for that.
Feel free to file bugs on issues you find.