Tuesday, February 16, 2010

Translations and locale #2

Did you know that there is no generic tool for extracting translation strings from non-source files (anything not written in a programming language, like .desktop files)? Neither did I, until I tried to find one.
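To make "extracting strings from a non-source file" concrete, here is a minimal Python sketch of what such a tool would have to do for a .desktop file. It is only an illustration, not how Intltool or any existing tool works: the function names are made up, and the list of translatable keys (Name, GenericName, Comment) is an assumption based on the usual .desktop entries.

```python
# Keys assumed to hold user-visible text; adjust for your own files.
TRANSLATABLE_KEYS = ("Name", "GenericName", "Comment")


def extract_desktop_strings(text):
    """Return (line number, string) pairs for untranslated translatable keys."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines, comments and group headers like [Desktop Entry].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        # Localized keys such as Name[de] are not in the list, so they are skipped.
        if sep and key.strip() in TRANSLATABLE_KEYS:
            found.append((lineno, value.strip()))
    return found


def to_po(entries, filename):
    """Format the extracted strings as bare PO entries with source references."""
    out = []
    for lineno, msgid in entries:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'#: {filename}:{lineno}\nmsgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(out)
```

Calling to_po(extract_desktop_strings(text), "app.desktop") gives something a translator can work on. The extraction is the easy half: merging the translated strings back into the original file is not covered here.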

Yes, I know about Intltool, but it is tied so tightly to Automake that it is pretty cumbersome to use outside of it. The README has a short how-to about using Intltool outside autotools, but the solution simply cries out for something better.

Now, since Intltool is mostly used by autotools users by default, I checked what people who don't use autotools have to say about this problem, primarily focusing on KDE, as they use CMake.

Yes, they have a solution and it is not Intltool, but after seeing it, Intltool seems to me like the best choice I can get. Besides a full Python framework dedicated to this problem, they get help from a cron script that nightly (or at some other time of day or week) walks over the repository, merging and updating translations in .desktop files and, presumably, documentation and similar files.

A hundred heads are smarter than one, so why does this (their) solution look so cumbersome to me? I mean, why couldn't someone make something the rest of the world can use, no matter whether make, cmake, scons or jam is used? Clearly, the translation problem is not as simple as it looks.

A clear example of the bad state of translation in general is the tools we have had so far. Not counting GUI frontends (like KBabel), which are mostly copies of each other in different toolkits, the only solution in the form of a web frontend (or, for God's sake, anything smarter than just calling xgettext in the background) was the Pootle project, famous for its slowness and memory usage.

Now you know why I (and presumably others) see the Transifex project as a gift from the gods. Add to that the free service they offer on transifex.net for your project, and we, users and developers, could not be happier.

But Transifex is still a frontend... if we want to find the root of this problem, maybe we should start with the gettext tools. Hell, even xgettext (part of the gettext toolchain) differs between the GNU and OpenSolaris versions (presumably the same goes for other Solaris flavors). The Sun version of xgettext is so dumb you can't even specify an extraction keyword, so you must use 'gettext' instead of e.g. the '_' tag we are used to.

On the other hand, GNU xgettext is not perfect either: it tries to be smart where it should be dumb. GNU xgettext can recognize the programming language of the source it scans and extract translation strings in the way that best suits that language. But sometimes you want something different.

Recently I tried to add a translation feature to the theming engine in edelib, which includes some Scheme code, inspired by the translation tags found in GIMP. Basically, I wanted it to look like this:

 (display _"This text will be translated")

but the GNU people envisioned it in a more lispy way:

 (display (_ "This text will be translated"))

so there is no way to force xgettext to accept the first example as valid Scheme source code without some hackery with sed and shell.
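The kind of hackery involved can be sketched as a small preprocessing step that rewrites the first form into the second before the file is handed to xgettext. This is only a rough Python illustration of the idea, not the script actually used in edelib. The function name and the regular expression are made up for this example, and it assumes the tag is always written as _ immediately followed by a double-quoted string.

```python
import re

# Matches _"..." when the underscore starts a token, i.e. it sits at the
# beginning of the text or right after whitespace or an opening paren.
# Backslash escapes inside the string (like \") are allowed.
TAG = re.compile(r'(?<![^\s(])_"((?:[^"\\]|\\.)*)"')


def lispify_tags(source):
    """Rewrite _"text" into (_ "text") so xgettext sees a normal call."""
    # A function is used as the replacement so that backslashes in the
    # matched string are copied through untouched.
    return TAG.sub(lambda m: '(_ "' + m.group(1) + '")', source)


if __name__ == "__main__":
    print(lispify_tags('(display _"This text will be translated")'))
```

The rewritten text could then be saved to a temporary file and scanned with GNU xgettext using --language=Scheme and --keyword=_. Since the rewrite never adds or removes newlines, line numbers in the resulting .po file still match the original source. The sketch does not try to skip tags that appear inside comments or inside other string literals.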

Add to that how Intltool (which uses xgettext) does the same job as xgettext (extracting strings from source code), but in its own specific way, and you simply lose any desire to do anything translation-related in your project.

6 comments:

Dwayne said...

You clearly haven't tried Pootle 2.0 which is fast and lean on memory. Getting even leaner with 2.1

In terms of text extraction you might want to look at the Translate Toolkit (on which Pootle and Virtaal are built). It is not doing generic extraction of text from code like I think you're proposing, these generic extractions are still programmer centric not content centric.
But it does provide a framework that allows you to define storage classes that can read various file formats. These can then be used in format converters or can allow the format to be directly readable and editable on Pootle i.e. no intermediate files.

Sanel Z. said...

Yes, you are correct, I haven't tried the latest releases, but now that I see it was rewritten using Django, it is time to try it :)

I always found translate.sf.net a little bit hard to navigate, probably because many projects are handled under single label, but looks like things are getting better.

I'll definitely try Translate Toolkit, especially after noticing a few interesting tools like txt2po or html2po and, of course, the ability to write my own converters.

Thank you for posting these links :)

Dimitris Glezos said...

I've faced similar issues in the past with various projects, some of them registered on Transifex.net. I'm afraid that whatever we try we won't find a silver bullet there.

The ideal scenario would be for a project maintainer to 'register' the files needing translation extraction to a tool (probably written on top of the featureful Translate Toolkit) or simply on top of the more low-level xgettext.

This is a good discussion whose results I'd LOVE to see. Lots of projects have similar issues.

Sanel Z. said...

Glad to see I'm not the only one seeing this as the problem :)

I played a little bit with Translate Toolkit and it is really a powerful thing, especially taking into account its extensibility.

On the other hand, probably its main drawback is its size, as it is not suitable to be shipped with a project (that is probably the reason why Intltool still exists), unless your project is the size of OO or Firefox, where an additional 40-50 megs doesn't matter.

Also, I'm not sure how you would register the files needing translation extraction and run xgettext on them, unless support for those file types is explicitly added to xgettext or some Intltool-like magic is performed on them to get the content for translation. Maybe that isn't the big problem at all; the bigger one may be (later) merging the translated content back into its original form, where at least some logic is needed.

Hm... not sure, but I'm always open for new ideas and approaches :)

ChristTrekker said...

How's it going, Sanel?

Sanel Z. said...

Thank you for asking :)

Yep, things are rolling along somehow, but as I changed jobs, some time is needed to settle things down.

I'll try to summarize things in an upcoming post, as soon as I find some free time.