Friday, April 24, 2009

GSoC: Implementation of additional i18n features on Django

Here is my proposal for Google Summer of Code 2009. It was approved last week, and I'll be working on it this summer.

The problem

While Django provides an amazing system for translating texts, and displays localized dates in some parts of the admin, it handles a lot of data that could be internationalized but isn't yet.

The information that developers should be able to localize/translate is mainly:

  • All dates and related information (times, calendars...)

  • All numbers (mainly decimal ones)

  • Texts (and any data in general) saved on the database

More information on these issues can be found in the following blog post and this ticket:

http://vaig.be/2008/07/django-i18n-status.html

http://code.djangoproject.com/ticket/7980

Proposal

The proposed solution for improving Django i18n includes several different tasks. Those tasks are:

  • Import locale data from CLDR

  • Apply i18n to Django dates and times

  • Apply i18n to Django numbers

  • Allow translating content on the database

  • Fix already reported i18n bugs

The details of every task follow. Note that all these specifications are subject to change, according to discussions with the mentor of the project, the Django core developers team, and the wider Django community.

Importing locale data

The main repository of locale data is the Common Locale Data Repository (CLDR) by the Unicode Consortium http://cldr.unicode.org/. It provides a set of XML files with information such as date, time and number formatting for most languages.

The idea of this task is to create a Python script (probably as a django-admin command) that extracts all the necessary data from those XML files and puts it into configuration files within the Django structure. Django will use this information to internationalize data in applications.

This script is meant to be used only by Django developers. It would mainly be a one-time script, executed again only when new locales are added (or some are changed).

All information gathered from CLDR files could be saved on django/conf/locale/{ language code }/formats/django.po
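As a rough sketch of what the extraction step could look like (not the final script), the snippet below reads the Gregorian date patterns out of an LDML document with the standard library XML parser. The embedded XML is a trimmed, hand-written excerpt in the shape of a CLDR locale file, and the mapping to setting names and the function name are assumptions made for illustration. Note that CLDR patterns use their own syntax, so a real script would still have to convert them to Django's date format syntax.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written excerpt in the shape of a CLDR LDML locale file.
# The pattern values are illustrative, not copied from a CLDR release.
SAMPLE_LDML = """
<ldml>
  <dates>
    <calendars>
      <calendar type="gregorian">
        <dateFormats>
          <dateFormatLength type="long">
            <dateFormat><pattern>MMMM d, y</pattern></dateFormat>
          </dateFormatLength>
          <dateFormatLength type="short">
            <dateFormat><pattern>M/d/yy</pattern></dateFormat>
          </dateFormatLength>
        </dateFormats>
      </calendar>
    </calendars>
  </dates>
</ldml>
"""

# Hypothetical mapping from CLDR format lengths to the proposed setting names.
SETTING_NAMES = {"short": "SHORT_DATE_FORMAT", "long": "LONG_DATE_FORMAT"}


def extract_date_formats(ldml_text):
    """Return {setting name: CLDR pattern} for the Gregorian calendar."""
    root = ET.fromstring(ldml_text)
    path = ("dates/calendars/calendar[@type='gregorian']"
            "/dateFormats/dateFormatLength")
    formats = {}
    for node in root.findall(path):
        setting = SETTING_NAMES.get(node.get("type"))
        pattern = node.findtext("dateFormat/pattern")
        # Skip lengths we do not map and entries without a pattern.
        if setting and pattern:
            formats[setting] = pattern
    return formats


if __name__ == "__main__":
    for name, pattern in sorted(extract_date_formats(SAMPLE_LDML).items()):
        print(name, "=", pattern)
```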

Specific settings imported from CLDR could be (with English localized example):

  • SHORT_DATETIME_FORMAT (12-31-2000 11:59 p.m.)

  • LONG_DATETIME_FORMAT (December 31st 2000, 11:59 p.m.)

  • SHORT_DATE_FORMAT (12-31-2000)

  • LONG_DATE_FORMAT (December 31st 2000)

  • FIRST_DAY_OF_WEEK (0 meaning Sunday)

  • TIME_FORMAT (11:59 p.m.)

  • YEAR_MONTH_FORMAT (December of 2000)

  • MONTH_DAY_FORMAT (December 31st)

  • DECIMAL_NUMBER_FORMAT (1,000,000.123)

There are some locale-based parameters that already exist in Django, in the translation files (LC_MESSAGES), and that could be deprecated in future releases of Django (when breaking backward compatibility). Those are:

  • DATETIME_FORMAT

  • DATE_FORMAT

  • TIME_FORMAT

  • YEAR_MONTH_FORMAT

  • MONTH_DAY_FORMAT

To keep the system flexible, the existing default values in settings will be kept. It would probably be worth adding new ones for the new customizable formats.

Dates, times and calendar i18n

All dates and times displayed using Django should use the format defined for the current session locale. This is already implemented for some dates, like the ones displayed in the admin's lists. A filter for formatting dates also exists in templates, which, together with the formats in the translation files, can do the job. But the right approach would be to display dates in the session locale by default.

All Django forms (including admin forms) should accept the short date/datetime format of the current locale. It is already possible to define the accepted formats using parameters of the widget, and this can be kept, but at least support for entering data formatted in the current locale should be added. ISO and/or English formats can be kept as well. Existing data in input fields should be displayed in the current locale too.
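To make the idea concrete, here is a minimal sketch in plain Python (not actual Django form code). The format table, its strptime patterns and both function names are made up for illustration. Input is first tried in the locale's short format, then in ISO, and output uses the locale's short format.

```python
from datetime import date, datetime

# Hypothetical per-locale short date formats, written as strptime patterns.
SHORT_DATE_FORMATS = {
    "en": "%m-%d-%Y",
    "es": "%d/%m/%Y",
}
ISO_FORMAT = "%Y-%m-%d"


def parse_localized_date(text, locale_code):
    """Try the locale's short format first, then fall back to ISO."""
    candidates = [SHORT_DATE_FORMATS.get(locale_code), ISO_FORMAT]
    for fmt in candidates:
        if fmt is None:
            continue  # unknown locale: only ISO is tried
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError("%r is not a valid date for locale %r" % (text, locale_code))


def format_localized_date(value, locale_code):
    """Render a date in the locale's short format (ISO if unknown)."""
    return value.strftime(SHORT_DATE_FORMATS.get(locale_code, ISO_FORMAT))


if __name__ == "__main__":
    new_years_eve = date(2000, 12, 31)
    print(format_localized_date(new_years_eve, "en"))
    print(format_localized_date(new_years_eve, "es"))
    print(parse_localized_date("31/12/2000", "es"))
```

Keeping ISO as a fallback is one way to stay compatible with the input that forms accept today.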

As the Django 1.0 series maintains backward compatibility, these changes have to be implemented so that the existing behavior remains the default.

The calendar on the admin's date/datetime fields should also be displayed according to the user's session locale.
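The admin calendar itself is JavaScript, but the week layout can be sketched with Python's calendar module. The function name is hypothetical, and the numbering assumes the FIRST_DAY_OF_WEEK convention listed earlier (0 meaning Sunday), which has to be converted because the calendar module counts from Monday.

```python
import calendar


def month_weeks(year, month, first_day_of_week):
    """Return the weeks of a month as lists of day numbers (0 = padding).

    first_day_of_week follows the proposed setting: 0 = Sunday, 1 = Monday...
    """
    # The calendar module uses 0 = Monday ... 6 = Sunday, so shift by one.
    python_first_weekday = (first_day_of_week - 1) % 7
    cal = calendar.Calendar(firstweekday=python_first_weekday)
    return cal.monthdayscalendar(year, month)


if __name__ == "__main__":
    # December 2000 started on a Friday.
    print(month_weeks(2000, 12, 0)[0])  # week starting on Sunday
    print(month_weeks(2000, 12, 1)[0])  # week starting on Monday
```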

So basically, these are the main tasks required for internationalizing Django dates:

  • Format all Python date/datetime objects using locale settings when they are converted to strings for display. Basically this means models.DateField and models.DateTimeField values on model instances.

  • Change input widgets to display data, and to allow entering data, in the format of the current locale.

  • Display the admin calendar with weeks starting on the day defined for the current locale.

With those changes, the following tickets would be fixed:

  • #1061 About first day on calendars

  • #5526 About accepting non-English formats on input widgets

  • #6231 About the output format of the SelectDateWidget

  • #6449 About default format of displayed dates

  • #6483 About supporting European dates on javascript routines

  • #7509 About supporting different formats on SplitDateTimeWidget

  • #7656 About inheriting i18n features of AdminDateWidget

Number i18n

Right now, Django doesn't provide anything for localizing numbers in applications. All numeric values within Django applications are formatted using American conventions. Users in many countries are not used to the American format, so even a simple shop built with Django can confuse users who, for example, expect the comma as the decimal separator and find a point in the prices.

As in the previous section, changes must keep backward compatibility.

So Django should, by default, use the conventions of the current locale to display and format numbers. Basically that means:

  • Format numbers on templates using current session locale

  • Display and allow entering data using session locale on input widgets

With those changes, the following ticket should be fixed:

  • #3940 About comma as decimal separator
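As an illustration of both directions (display and input), here is a minimal sketch in plain Python. The separator table and function names are assumptions for this example; in the proposal these values would come from the imported locale data.

```python
from decimal import Decimal

# Hypothetical per-locale separators: (decimal separator, thousands separator).
NUMBER_SEPARATORS = {
    "en": (".", ","),
    "es": (",", "."),
}
DEFAULT_SEPARATORS = (".", ",")


def format_localized_number(value, locale_code, places=2):
    """Format a number with the separators of the given locale."""
    decimal_sep, thousand_sep = NUMBER_SEPARATORS.get(locale_code, DEFAULT_SEPARATORS)
    # Format with the American convention first, then swap the separators.
    american = "{:,.{places}f}".format(Decimal(value), places=places)
    integer_part, _, fraction = american.partition(".")
    result = integer_part.replace(",", thousand_sep)
    if fraction:
        result += decimal_sep + fraction
    return result


def parse_localized_number(text, locale_code):
    """Turn user input written with locale separators into a Decimal."""
    decimal_sep, thousand_sep = NUMBER_SEPARATORS.get(locale_code, DEFAULT_SEPARATORS)
    normalized = text.replace(thousand_sep, "").replace(decimal_sep, ".")
    return Decimal(normalized)


if __name__ == "__main__":
    print(format_localized_number("1000000.123", "en", places=3))
    print(format_localized_number("1000000.123", "es", places=3))
    print(parse_localized_number("9,99", "es"))
```

Using Decimal instead of float avoids surprises when the values are prices.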

Translating dynamic content

Django has an amazing system for translating texts to any language. The only problem with this system is that it can only be used for static content (defined in source files, including templates), and not for dynamic content created by users after the website is deployed. Translating dynamic content is useful in many situations, such as an application with a product catalog where product names and descriptions have to be translated, or a news website where news items can be translated to any language.

There are some widely used external applications that allow doing that in Django, but all of them have problems, such as complex and tricky syntax for developers, an ugly interface for users, bad design, or bad scalability. The main applications are:

  • django-multilingual

  • transdb

  • django-transmeta

  • django-multilingual-model

A comparison of the first two applications, and some ideas for a better solution, can be found in a presentation at

http://docs.google.com/Presentation?docid=dfbzs3ks_7f2z85hvr&hl=en

Basically, a good solution to allow Django developers to translate their models should include:

  • An easy way to specify translatable fields on models (or outside the models)

  • An easy way to allow translating content using the admin or custom forms

  • Displaying translated fields in the session language by default (while allowing the value for a specific language to be retrieved)

  • A scalable way to save translations on the database

Achieving those goals requires a lot of analysis, so only some ideas are detailed here.

For the model syntax there are many different options; some of them can be checked in this blog post and this poll:

http://vaig.be/2009/03/django-multilingual-syntax-poll.html

http://doodle.com/aicvayf8ss2mxm2h

The most popular one is (using an example):

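from django.db import models

class MyModel(models.Model):
    my_field = models.CharField(max_length=100)
    my_i18n_field = models.CharField(max_length=100)

    class Meta:
        # Proposed option; Django's Meta does not accept it today.
        translate = ('my_i18n_field',)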

A way to translate models (and whole applications) without modifying their code would be great, in order to translate applications that already exist.

For the database backend there are also different options, including:

  • To create a field on the model for every translation
  • To create a related model
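To illustrate the first option together with the goal of showing translated fields in the session language by default, here is a minimal sketch in plain Python (no Django models involved). Every name in it is hypothetical. The active language is kept in a module variable as a stand-in for what django.utils.translation.get_language() would return, and each translation is stored in its own name_<language> attribute, which mimics one column per language.

```python
# Stand-in for the session language; a real implementation would ask
# django.utils.translation.get_language() instead.
_active_language = "en"


def activate(language_code):
    """Switch the (simulated) session language."""
    global _active_language
    _active_language = language_code


class TranslatedAttribute:
    """Descriptor that reads and writes '<name>_<language>' on the instance."""

    def __init__(self, name):
        self.name = name

    def _column(self):
        return "%s_%s" % (self.name, _active_language)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Returns None when there is no translation for the active language.
        return getattr(instance, self._column(), None)

    def __set__(self, instance, value):
        setattr(instance, self._column(), value)


class Product:
    # 'name' resolves to name_en, name_ca... depending on the language.
    name = TranslatedAttribute("name")

    def __init__(self, **columns):
        for column, value in columns.items():
            setattr(self, column, value)


if __name__ == "__main__":
    chair = Product(name_en="Chair", name_ca="Cadira")
    activate("ca")
    print(chair.name)     # value for the active language
    print(chair.name_en)  # explicit access to a specific language
```

With the related model option, the same descriptor idea would apply, but the lookup would hit a separate translations table instead of extra columns.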

There is just one generic ticket on Django that would be fixed:

  • #6460 About multilingual content on database

It may not be possible to have a generic solution that fits most use cases; in that case it could be worth making some modifications to Django that make it easier to create external applications for this job.

Fix i18n bugs

There are many bugs already accepted in the Django Trac that would be fixed during this Summer of Code. A more thorough review will be done, but some of them could be:

  • #3907: LocaleMiddleware allows languages not supported by Django
  • #5494: Javascript catalog doesn't check project level locales
  • #7050: make-messages should ignore applications with custom localization

Timeline

The estimated timeline for this project, detailed on a weekly basis, is:

  • Week 01: Analysis and working environment setup

  • Week 02: Import CLDR

  • Week 03: Import CLDR

  • Week 04: I18n of dates and numbers

  • Week 05: I18n of dates and numbers

  • Week 06: I18n of dates and numbers

  • Week 07: Translation of dynamic content

  • Week 08: Translation of dynamic content

  • Week 09: Translation of dynamic content

  • Week 10: Translation of dynamic content

  • Week 11: Fix i18n bugs

  • Week 12: Fix i18n bugs

My dedication to the project will be full time, around 40 hours per week. A total of 480 hours are estimated for the whole project.

About me

My name is Marc Garcia, I'm from Barcelona, Europe, and I'm 29 years old.

I am studying computer science at Universitat Oberta de Catalunya, an Internet-based university from Barcelona. Currently I'm not working but I have almost 8 years of programming experience (with different technologies, mainly Python, PHP and VB).

I started using Django in 2006, and since then I have developed, or participated in the development of, many websites, as well as many reusable applications for Django.

Some examples of reusable Django applications:

  • django-stdimage: Saves ImageField files with standard names, allows deleting them, and creates thumbnails automatically.

  • Transdb: Allows translating database content

  • django-transmeta: Also allows translating database content (different approach)

  • django-cart: Simple cart object to easily add/update/remove products to user session

Some examples of websites:

  • http://elisa.fluendo.com (main developer)

  • http://www.andalucia.org (developer of some parts, mainly the shop and the registration system)

  • http://www.muchomasqueunregalo.com (developer of the Django part of the web site, including the shopping system and product detail pages).

  • http://www.accopensys.com (only developer)

  • http://www.showroom.es (only developer)

  • http://www.tierratenis.com (only developer)

  • http://www.latelierdelraval.com (only developer)

  • http://www.restaurantalpunt.com (only developer)

I'm also one of the two official translators of Django to Castilian Spanish and Catalan. In addition, I was interviewed about localization in Django on This Week in Django 20 (on 2008-04-27). I maintain a blog with many Django-related posts at http://vaig.be.

12 comments:

  1. Keep it up, this is what I need to move from symfony to django.

    Symfony's model layer, Propel, has good internationalization of dynamic content.

  2. Awesome! Can't wait to have better support in Django for translated db content.

  3. Hello, check this ticket during your work: http://code.djangoproject.com/ticket/3591

  4. It would be nice if you could also add i18n support for urls.py. For SEO compliance localized URLs are important, and at the moment it's very hard to accomplish this.

  5. What's wrong with the sites framework?

  6. This is good news. I've been using django a lot lately (of course, you already know it), but I keep using plone for some jobs. One of the reasons behind that choice is Linguaplone, which allows users to manage translatable content in a very easy way (perhaps you should take a look there too and gather some ideas ;D).

    Ah!, and if you need some help with this, just ask me, as usual.

  7. Have you checked Babel (http://babel.edgewall.org/wiki/BabelDjango)? It already has a mechanism for storing all localization data from (possibly) all languages of the world and also it has functions for converting dates and numbers to localized format and vice versa.

    I have already written form fields and widgets for localized input using Babel. They should just be tested. I could share the code, just contact me by bendoraitis (a.t) studio38 (d.o.t) de if you are interested.

  8. Another very important goal for translatable fields is being able to sort items by the translated values and search in translated values in administration.

    And another problem to solve with translatable fields is how to define fixtures if the number of languages is varying.

  9. Hi,
    this ticket could be useful:
    http://code.djangoproject.com/ticket/7231
    See the example.

  10. Wouter van der Graaf, May 20, 2009 at 10:13 AM

    As Aidas mentions above, changing the ordering of a model to the natural language in request.LANGUAGE_CODE in the admin site is a very important feature, especially when users of the same admin speak different languages.

    Also, I'm involved in a discussion about using gettext all the way, also for translating model fields. Not sure if that's possible or something you'd consider. You might want to drop a line there: http://groups.google.com/group/django-users/browse_thread/thread/d301bb407d53b3eb.

  11. I came across your post while I was searching for how to convert the decimal separator from . to , in a Django template. I think it is better to use a template filter for this kind of format conversion. You can also see my post http://unilixmania.blogspot.com/2009/07/at-this-moment-i-am-writing-django.html

  12. Hi Marc, I appreciate your effort to improve Django's i18n.

    Concerning the "translation of dynamic content in db", I've been using django-multilingual for more than a year and I'm very satisfied. I think the best you can do is to use the core of django-multilingual, add the new model syntax you show as the most popular one, change the position of the fields in the admin (one field for all languages, one beside the other) and enable sorting in the admin.

    It would also be good to have a django_languages model with fixtures for all languages of the world, so that these ids can be used in the translated tables.

    from django.db import models

    class Languages(models.Model):
        language_code = models.CharField(max_length=5)  # e.g. de-at
        original_name = models.CharField(max_length=100)  # e.g. Deutsch
        name = models.CharField(max_length=100)

        class Meta:
            translate = ('name',)
