Drupal localization and the t() function

Technology Blog

Drupal is a lot easier to localize than it used to be, but it’s still hard work, partly because localization is intrinsically difficult and partly because of flaws in the design of Drupal’s internationalization approach.


In localizing a site there are four main areas to deal with:

  1. Human-readable formats for data items such as dates and numbers.
  2. The (generally short) text strings used commonly throughout the user interface, for such things as labels on form buttons.
  3. Major items of content such as blog articles, informational pages and static blocks.
  4. Theme content. To accommodate different local conventions (and quite often to make room for longer text strings), the layout of pages and their components may need to be adjusted.

This list is not exhaustive, but for now I only want to talk about item 2 (and, to some extent, item 3). Drupal internationalizes such strings with the t() function, which a module is meant to call whenever it outputs an English text string. The function takes as arguments the string to be translated and, optionally, an array of values to be substituted into it, so, for example, you might call it like this:

t('My hat it has @count corners', array('@count'=>$number_of_corners));

The English result when $number_of_corners is 3 would be “My hat it has 3 corners”, whereas the German translation might be “Mein Hut er hat 3 Ecken”.
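To make the mechanism concrete, here is a rough model of what a call like the one above does: look the English source string up in a per-language table, fall back to the English if there is no translation, then substitute the placeholder values. It is written in Java rather than PHP so that it runs without a Drupal installation. The class MiniT, its t() method and the GERMAN table are hypothetical; the real t() also works out the current language itself and HTML-escapes @-placeholder values, which this sketch skips.

```java
import java.util.Map;

// Hypothetical stand-in for Drupal's t(), for illustration only.
class MiniT {

    // Assumed German translation table: English source string -> translation.
    static final Map<String, String> GERMAN = Map.of(
        "My hat it has @count corners", "Mein Hut er hat @count Ecken");

    static String t(String source, Map<String, String> args, Map<String, String> table) {
        // Strings with no translation are shown in the original English.
        String text = table.getOrDefault(source, source);
        // Plain sequential replacement of each placeholder; Drupal's own code is
        // more careful (for example, it escapes @-placeholder values as HTML).
        for (Map.Entry<String, String> arg : args.entrySet()) {
            text = text.replace(arg.getKey(), arg.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("@count", String.valueOf(3));
        System.out.println(t("My hat it has @count corners", values, Map.of())); // My hat it has 3 corners
        System.out.println(t("My hat it has @count corners", values, GERMAN));   // Mein Hut er hat 3 Ecken
    }
}
```

Note that the translator sees the whole sentence with its placeholder, so word order around @count can change freely between languages.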

If you are familiar with software localization frameworks such as Java’s, you’ll have spotted a problem with this function right away: it makes no allowance for plurals. If a phrase should vary depending on whether the substituted value is singular, plural or zero, you must test for the different cases in your own code, rather than having t() do the job for you as Java’s MessageFormat class does.

Edit: as the commenter below points out, the format_plural() function is designed to solve the problem of plurals and does so very well. Sorry, my mistake!
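For comparison, this is roughly what is meant by MessageFormat doing the job for you. It uses the standard java.text.MessageFormat class with a “choice” sub-format, which picks a phrase by numeric range so the calling code needs no if/else of its own. The class name CornerMessage and the wording of the pattern are purely illustrative. Drupal’s format_plural() takes the English singular and plural strings as separate arguments instead, but for English the effect is similar.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Plural handling with java.text.MessageFormat's choice sub-format.
class CornerMessage {

    // 0 -> "no corners", 1 -> "one corner", anything above 1 -> "N corners".
    // The last branch contains {0,...}, so MessageFormat formats it again
    // with the same argument to insert the number.
    static final String PATTERN =
        "My hat it has {0,choice,0#no corners|1#one corner|1<{0,number,integer} corners}";

    static String format(int corners) {
        // An explicit locale keeps number formatting (e.g. digit grouping) predictable.
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] {corners});
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 3}) {
            System.out.println(format(n));
        }
        // Prints:
        // My hat it has no corners
        // My hat it has one corner
        // My hat it has 3 corners
    }
}
```

One caveat: a choice format selects by numeric range, which suits English but cannot express plural rules such as those of Russian or Ukrainian, where the form depends on the last digits of the number. Those languages need a per-language plural rule rather than a single range-based pattern.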

Another criticism of the t() function is that it gives you no way of supplying context to resolve ambiguity. Here’s the problem: suppose you use two modules that both output the word “Store”. One is an e-commerce module that uses the term to mean an online shop; the other manages some form of data repository and uses it as the label on a button that stores a record.

The t() function cannot distinguish between these cases, so you can’t translate them differently even though they have quite different meanings (one is a noun and the other a verb, for a start). The module writer has to be alert to such possibilities, which is quite a heavy burden when design and coding are hard enough as it is. In my experience, programmers who are sensitive even to questions of English usage are rare enough, so expecting them to fully consider the needs of speakers of other languages is asking a lot.

So, how might the t() function be improved for better localization? An optional, more sophisticated version with power similar to Java’s MessageFormat class would be helpful, but for the problems arising from ambiguity I’d like to suggest a new function, perhaps called t1(). This would take two arguments, the first being a context string that translators could use to distinguish between cases when necessary. The value would therefore need to be meaningful, and guidelines for choosing it would need to be developed. Perhaps the module name plus one of several predefined constant values would cover most common situations.
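The t1() idea can be modelled as a lookup table keyed on the pair of context and source string, instead of on the source string alone. This is a Java sketch of the proposal, not Drupal code: ContextTranslator, the module names in the context strings and the German translations are all hypothetical, and the context format (module name, a colon, then a predefined role) is just one possible reading of the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the proposed t1(context, string): the same English
// word can have different translations depending on where it is used.
class ContextTranslator {

    // context -> (English source string -> translated string)
    private final Map<String, Map<String, String>> table = new HashMap<>();

    void add(String context, String source, String translation) {
        table.computeIfAbsent(context, c -> new HashMap<>()).put(source, translation);
    }

    String t1(String context, String source) {
        // Fall back to the English source if this context has no translation.
        return table.getOrDefault(context, Map.of()).getOrDefault(source, source);
    }

    public static void main(String[] args) {
        ContextTranslator german = new ContextTranslator();
        // Assumed convention: hypothetical module name + ":" + predefined role.
        german.add("shop:noun", "Store", "Laden");            // an online shop
        german.add("datastore:button", "Store", "Speichern"); // "store this record"

        System.out.println(german.t1("shop:noun", "Store"));        // Laden
        System.out.println(german.t1("datastore:button", "Store")); // Speichern
        System.out.println(german.t1("other:label", "Store"));      // Store
    }
}
```

Falling back to the English source keeps an untranslated module usable, just as t() does when no translation exists.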

Any other thoughts on the subject?

(Footnote: the languages I have worked with include Japanese and Welsh, though I don’t speak either. Every situation has its own interesting challenges, not least the thorny one of character encoding, especially a few years ago when Unicode was less widely supported.)

5 Responses to “Drupal localization and the t() function”

  1. neochief

    As for plurals, there is a great function called format_plural() that solves the problem very nicely. But as for context, I don’t understand why it wasn’t added to core a long time ago. It’s a real pain in the ass to make sites in Russian and Ukrainian because of the lack of context for translations.

  2. Niklas Bivald

    I couldn’t agree more; the fact that it doesn’t exist bothers me quite a bit. I need it. For now I will use an ugly hack, with strings such as:

    t('STATICPAGE_LOGIN')

    “faking” a defined message. Then I use the string override function if I wish to modify them. It’s not optimal, far from it, but it should work. Later on I will make it possible for me to categorize strings using t(). It’s labor-intensive, but it means I will be able to categorize STATICPAGE_LOGIN as category “Static page titles” in the GUI.

    Thoughts on my solution? Another possibility is simply to make my own t() function, but that would require me to rewrite:

    t()
    locale()

    And build a GUI for the new solution. Better, but time-consuming.

  3. Anonymous

    Not sure if folks are open to a third-party solution for localization.

    NativeTung – http://www.nativetung.com – is a powerful solution that works with the Drupal CMS platform and is currently in private beta.

    Drupal makes it easy to achieve the following with the NativeTung Globalizer: Globalize —> Optimize —> Analyze your site for increased ROI and reach across the language barrier.

    Hope folks find this post helpful!
