I spent half of last week in Amsterdam at the "Open Translation Tools 2009" unconference. It was a pretty interesting and diverse crowd to be with, and not nearly as tool-author-only as the name makes it sound. People came from all over the map, from sex-worker activists to Global Voices to translate.org.za to folks from the "professional translation companies".
We started out with a few opening ceremonies. The first was an introduction, the regular "who are you and where are you from", along with a "how do you feel". I was the only one who didn't give a geolocation but disclosed Mozilla as my point of origin. It seems to be a common theme at such events that localization folks define themselves by their geographical background, as a stand-in for their cultural background (which is what this should be about, right? This is not couch-surfing.), and not so much by what they work on. The "how do you feel" was one of the hippie pieces, along with a lot of twinkling. Reading the Urban Dictionary entry on twinkle makes me wonder, but it was just hands-not-clapping. Honestly.
Next up was a round of spectrograms in the room. Two opposing opinions were offered, and you had to stand along a line in the room, wherever between the two your own opinion fell. Then Gunner, our head master of ceremonies, went in and poked people on why they stood where they did. It's an interesting exercise to figure out what kind of crowd you're with, and it scales pretty well.
Agenda-building worked pretty much like it does in most unconferences these days: we created tons of sticky notes and then tried to build themes and agendas from them. The resulting sticky notes are transcribed on the wiki. Mighty job by Lena. I feel quite fortunate that I didn't have to distill those notes into an agenda. Gunner did that pretty loosely, which was probably a good combination. The resulting schedule is on the wiki, too. In the rest of my coverage, I'll focus on the sessions that I was in. The schedule links to the full notes of each session. The note takers were usually in good shape, so do take a look.
The session that Ed Zad (both of which are shortened, the parents are not to blame for this one) led about the professional translation companies and their ecosystem was OK. The actual translation is almost never done in-house, but contracted out to freelancers. The money you pay the companies goes into project management; they hire translators, reviewers and editors to do the actual work, and those get paid by the company. The interesting part here was really that those companies make their money from the project management and recruiting part. I didn't get any useful feedback beyond "you must have hired the wrong guys" on my report that any time we had to contract translation out, the results were not really usable. Maybe I was asking the wrong guy :-). The main takeaway would be that the industry is really fragmented and diverse. I doubt there are any good rules for picking a partner when looking for a localization company, either. The review and editing process Ed described seems rather low-key compared to what you can do if you develop your localized content in the open.
The next session I was in was about machine translation, led by Francis. He introduced the group to both statistical machine translation and rule-based MT. What's interesting about statistical MT is both the enormous amount of data you need and the different stages involved. Rule-based MT, on the other hand, works well for closely related languages. For those who heard me talking about l10n-fork, that'd be rule-based MT. Francis offered to take a look at whether we can actually do better MT to share work in Mozilla localizations, at least for closely related languages. All our Romance languages, with the Spanishes, Portugueses and French, could benefit from that, possibly even patch-based.
We "closed" Monday in Vondelpark. Matt broke the Aspiration techies on Monday night, though. Amsterdam is good at that. I ended up at Petra's place, together with Tanya and Pawell. Thanks to Petra for a fun evening.
Tuesday started off with a crazy "crawl on the floor and draw as many workflows as you have". I realized that I don't know enough about 90% of the workflows we use at Mozilla, and just sketched out the two extremes when it comes to localizing Firefox. One is localizing patch by patch, like the French team does, for example. At the other end, we have a long tail of localizations that work within their own toolchain and just occasionally export to hg and update the upstream repos. For the cats among you, there are 40 pictures of the workflow diagrams on Flickr.
Next was a pretty interesting session on translating wiki content. It was a good enough fit that I killed my own session on "Localizing a hybrid organization - BOF on Mozilla". I wanted to clone myself three or four times already, so not having to do a session myself was a win. Anyway. We had tikiwiki and mediawiki represented in the group. And me, with some experience on those two, plus deki via MDC. The discussion turned up two fundamentally different ways of working:
- Forking documents into different documents for different languages, with some cross-referencing to show that other language versions exist. You'd know this from Wikipedia.
- Maintaining the different translations as variants of a single document. This is what tikiwiki does, as we see it on SUMO.
The discussion around one single living document in multiple languages was the livelier one, which gave me a good sense of what's out there to address our needs at SUMO, MDC etc. There doesn't seem to be anything blowing tikiwiki out of the water, so in terms of finding a wiki engine with l10n, SUMO made a good choice. We talked quite a bit about multiple edits to the document in various languages, and what tiki defines to be 100% in the end. I showed off the l10n dashboard page we have on SUMO now, which was well received. The idea to not demand that people do as a bot tells them, and instead to empower them with relevant information, seemed to resonate well. There was a different session about CMSes and l10n, read: Drupal etc. I only overheard the last bits, and they didn't seem to have great answers over there. Judge for yourself from the notes. Finding the right UE and UI paradigms for keeping a living document in multiple languages in sync seems to be an open item of work. That's particularly true if valuable contributions to your document don't all arrive in one single source language. We would want to understand which changes are ports of fixes from other languages, and which are new fixes to the actual document that the other translations, including the original source language, would benefit from.
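To make that last distinction a bit more concrete, here's a minimal sketch in Python. Nothing in it comes from tikiwiki or any other wiki engine; the `Edit` record, its `ported_from` field and the `pending_ports` helper are made up for illustration. It assumes every edit knows whether it's a new fix or a port of a fix from another language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edit:
    """One change to one language variant of a document (hypothetical model)."""
    edit_id: str
    language: str
    summary: str
    # Set when this edit only carries over a fix first made in another language.
    ported_from: Optional[str] = None


def pending_ports(edits, languages):
    """Map each original fix to the languages that have not picked it up yet."""
    originals = [e for e in edits if e.ported_from is None]
    pending = {}
    for fix in originals:
        # The language the fix was made in counts as covered, plus every port.
        covered = {fix.language}
        covered.update(e.language for e in edits if e.ported_from == fix.edit_id)
        missing = sorted(set(languages) - covered)
        if missing:
            pending[fix.edit_id] = missing
    return pending


edits = [
    Edit("e1", "en", "fix install steps"),
    Edit("e2", "de", "port of install fix", ported_from="e1"),
    Edit("e3", "fr", "add missing troubleshooting hint"),
]
print(pending_ports(edits, ["en", "de", "fr"]))
# {'e1': ['fr'], 'e3': ['de', 'en']}
```

Note that the fix made in French shows up as pending for English, too. With a model like this the source language gets no special treatment, which is the point of a living document.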
Next up was a round of speed-geeking. That's similar to speed dating. A few geeks get a table each to present something to the rest of the group. The rest of the group is split up to watch one at a time. Each presentation is 4 minutes, then the groups rotate to the next table. If you're bored by something presented, you've only wasted 4 minutes. I took the challenge to present l20n 8 times in a row. That's a pretty technical topic and a pretty diverse audience, so apart from being a stress test on one's vocal cords, it's also pretty heavy on your brain. I must have been doing all right, though. The feedback generally ranged from interested to positive. I got out with an action item to work with Dwayne on how we could actually present localization choices so that they're options to fix and not just hell-bound confusion. On a general note, if you ever find yourself speed geeking: Don't sit in front of your computer. Don't make people walk around the table to see something. It's perfectly fine to sit next to your computer and have your laptop and yourself face your audience. Or do it like Dwayne did, just present without your damn laptop open :-). If feasible.
The last session on Tuesday was about building Volunteer Translation Communities. We had a few people there who are just starting to build such a community, but also a few people from Global Voices Online and yours truly from Mozilla. It's pretty interesting how easy it is to think "I need to get such and such in some other language, how do I ask for volunteers?" and how easily that fails. The common ground among those with living communities was that you don't ask for translators, you need to be open for contributors. At Mozilla, we're hackable. We offer opportunities for all kinds of volunteer contributions, among which localization is one. That is something different from asking for some unit of work to be done for no pay. Another key is that you find your volunteers among those who are interested in the outcome of the localization work. The project management work you need to do to empower your translation community to actually do some work and get to results shouldn't be underestimated, either. There's a reason why people make a living out of this one.
I moved from Tuesday to Wednesday through the Waterhole. As good as it used to be. Getting up in time was tough, but not as bad as it initially felt.
The first session I joined on Wednesday was on localization issues in Africa. We had similar sessions for Central Asia, South Asia, and Asia Pacific, which I didn't manage to get to. I haven't even read the notes from those yet. Anyway, back to Africa. The challenges there aren't all that surprising. Connectivity is really bad, cell phones are really big. During the OTT, though, the first cable made it to Kenya, so in terms of connectivity, things are changing. Fonts for African languages are mostly based on Latin script, so there's not too much to do there, though a few characters usually need fixing. At least for web content, downloadable fonts offer a smooth upgrade path. In terms of technical abilities, a lot of the techies for African languages end up in Europe or the US and only occasionally visit home. For actual translators, there isn't enough work to make a living off it, so you likely end up with part-time night shifters. For many people with access to computers and the internet, localization is a good thing, but not something on their own list of priorities, which leaves us with a rather small potential community there. Localizing really obvious things like cell phones or Firefox is a good way to start off a community, though. I've had some off-track discussions with Dwayne on how to work together with the ANLoc project he's running, too.
The discussion about open corpora to be used for linguistic research and statistical machine translation training was OK, but not of that much interest for Mozilla. It's a good thing to do, though, and if we can help by asking the right people, that'd be cool. There's tons of politics to resolve first, and they've got enough folks for the initial group.
The next round of speed geeking had me on the consumer side. I already mentioned that you shouldn't sit in front of the laptop that you use for presenting. John talked about Transifex, which is designed to be a system that bridges various version control systems for localizers, by having write access to the upstream repos itself. They're starting to offer an interface to actually translate a few strings in place, which they reuse from somewhere. It's not Pootle code, though. That was the talk with the most immediate touch point to what we do.
The last session for me was one driven by Dwayne again, closing the loop. We tried to find out how to get feedback from the localizers into tools, and into the software they localize. This was pretty interesting thanks to the input from Rohana and Gisela, who are actual localizers and could give us hints on what they do and how. The main takeaway was that localizers and l10n tool authors don't talk enough to each other. Gisela, Dwayne and I have a follow-up conference in our heads to actually do that. I'll talk about that in a different post. The other main point was that we need to get tools to support "l10n briefs" and annotations, and need to establish ways for that information to be exchanged. A localization brief might be something like a file-wide localization note that explains what the context for these strings is. Or that it's about XSLT error messages, which you should leave in English unless you have a thriving local community in your language on that technology. Annotations are more diverse, and serve to communicate both among localization teams and back to the original author. The idea is to create a system that allows localizers to communicate about a particular string or set of strings in an easier fashion than using hg blame to find the bug, and then having to read through all of the bug to find out how to reproduce a problem. We might want to have annotations as simple as "star a string". Once a string is flagged as tricky, someone else can go in and offer help or a more constructive annotation beyond "I didn't get it". How to communicate that back and forth is another follow-up project from this session.
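Just to make briefs and annotations a bit more tangible, here's a rough sketch in Python of what a tool might keep per file. This is not something we designed in the session. The class names, fields, the file name and the string ids are all hypothetical, and a real tool would also need a way to exchange this data, which is exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    author: str
    text: str = ""  # empty text means a bare "star"


@dataclass
class LocalizableFile:
    path: str
    brief: str = ""  # file-wide localization brief
    # Maps a string id to the list of annotations made on that string.
    annotations: dict = field(default_factory=dict)

    def annotate(self, string_id, author, text=""):
        """Star a string, or attach a comment to it when text is given."""
        self.annotations.setdefault(string_id, []).append(Annotation(author, text))

    def starred(self):
        """String ids with at least one annotation, most-annotated first."""
        return sorted(self.annotations,
                      key=lambda s: (-len(self.annotations[s]), s))


# Hypothetical file name and string ids, for illustration only.
f = LocalizableFile(
    "xslt.properties",
    brief="XSLT error messages. Leave in English unless your language "
          "has a thriving local community on that technology.",
)
f.annotate("parseError", "localizer-a")  # just a star
f.annotate("parseError", "localizer-b", "What does 'node' refer to here?")
f.annotate("loadError", "localizer-c")
print(f.starred())  # ['parseError', 'loadError']
```

Sorting by the number of annotations is just one possible way to surface the strings that trip up the most people.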
Adam Hyde ran a book sprint on open translation tools alongside all the sessions, with a real face-to-face book sprinting event that closes today. It's going to be interesting to see what comes out of that. As I suck at writing (you can tell by reading this post), I didn't participate in that one myself. There is already a version online at flossmanuals.net.
So much for the actual sessions. As always, floor communication was essential, too. I made contact with folks from the Tajik, Khmer, and Nepali localization efforts for Firefox, and there's already traction on some. If you know someone willing to help with Nepali, please have them introduce themselves in m.d.l10n. I met a ton of other interesting people, of course. I had some really great conversations with Dwayne on a bunch of different topics, ranging from technical bits in tools to mission statements. Generally, there was a lot of interest in Mozilla, and in how we do things. Thanks to Aspiration for inviting me, and thanks to all the people at OTT for the warm welcome to this community, which is new for us.
Last but not least, thanks to Mozilla. In environments like OTT it becomes really obvious how rare organisations like Mozilla are. We had a lot of discussion about how hard it is to do localization as an afterthought, and we just don't. And about how valuable it is for the localization community to get acknowledged, which happens throughout Mozilla, regardless of whether it's John and Mitchell most anywhere they talk, or our developers fixing their patches to have a prettier localization note, or our marketing folks empowering our local communities to localize the message. And we're still learning and eager to get better. It is an honor to represent such an organization.
Pictures in this post are by Lena under CC BY-NC-ND.