UNITE FOR CHILDREN-- UNICEF

About This Web Site: Internationalisation

Internationalisation of the World Wide Web is an accessibility issue. The medium is used everywhere and this means that it should be accessible to all people, regardless of their language, script, writing system or cultural conventions.

The handling of language is of course the foundation of Internationalisation, although other cultural factors, such as the handling of numbers and dates, should also be considered.

Language handling on the World Wide Web requires attention to two aspects:

  • Unicode, which allows text in the principal languages to be encoded in a uniform way, so that it can be searched, sorted and manipulated without loss or corruption of data;
  • XHTML, which provides the means to identify and treat language in a consistently machine-readable way.

The handling of language by Unicode and XHTML is demonstrated in the following passage of text:

The Arabic numeral system was introduced to Europe through the work of the Persian mathematician Abu Abdullah Muhammad bin Musa al-Khwarizmi. His name, خوارزمى in Persian or أبو عبد الله محمد بن موسى الخوارزمي in Arabic, is the root of the term ‘algorithm’, and ‘algebra’ is derived from the title of his book on the subject, Al-Jabr wa-al-Muqabilah.

Readers of Arabic will see that the mathematician’s name reads from right to left within the main body of left-to-right text. XHTML identifies the language of the shorter, Persian form of his name as Farsi and that of the longer version as Arabic, and specifies the right-to-left direction of the text in both cases so that it is machine-readable. A screen reader with the capacity to read all three languages should be able to handle the text correctly.
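The markup described above can be sketched in a few lines of Python. This is only an illustration, not the markup actually used on this site: the helper name lang_span is hypothetical, and the language codes fa (Persian) and ar (Arabic) and the dir value rtl are assumed from the usual XHTML conventions.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified key that ElementTree serialises as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_span(text, lang, direction):
    """Hypothetical helper: wrap text in a <span> that declares its
    language (xml:lang) and its writing direction (dir)."""
    span = ET.Element("span", {XML_LANG: lang, "dir": direction})
    span.text = text
    return span


# The two forms of the mathematician's name from the passage above.
persian = lang_span("خوارزمى", "fa", "rtl")
arabic = lang_span("أبو عبد الله محمد بن موسى الخوارزمي", "ar", "rtl")

persian_markup = ET.tostring(persian, encoding="unicode")
arabic_markup = ET.tostring(arabic, encoding="unicode")
print(persian_markup)
print(arabic_markup)
```

Each span carries its own language and direction, so software reading the page does not have to guess either from the characters alone.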

Language

Unicode allows for publication of both Turkish and English on the same page without recourse to using a local character set (in the case of Turkish this could be ISO 8859-9 or windows-1254 amongst several other options).
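The difference between Unicode and a local character set can be seen in a short Python sketch. It uses the Turkish phrase quoted elsewhere on this page; the variable names are illustrative, and the comparison with Latin-1 is an added example of a character set that lacks some Turkish letters.

```python
phrase = "Kız Çocuklarının Eğitimi"

# The same text in Unicode (UTF-8) and in the two local Turkish character sets.
utf8_bytes = phrase.encode("utf-8")
latin5_bytes = phrase.encode("iso-8859-9")
cp1254_bytes = phrase.encode("cp1254")

# Each encoding round-trips, provided the reader knows which one was used.
assert utf8_bytes.decode("utf-8") == phrase
assert latin5_bytes.decode("iso-8859-9") == phrase

# Reading ISO 8859-9 bytes as Western European Latin-1 silently corrupts
# the Turkish letters that the two character sets do not share.
misread = latin5_bytes.decode("latin-1")

# Latin-1 itself cannot represent the phrase at all.
try:
    phrase.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

print(misread, fits_latin1)
```

With a local character set the bytes mean nothing without the right label, whereas UTF-8 covers Turkish, English and the Arabic-script text above in a single encoding.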

All pages are presented in either Turkish or English. Students of either language will find it useful that each page is directly linked to its alternative version. This comparative feature is especially useful for international agreements such as the Convention on the Rights of the Child (CRC) and the Universal Declaration of Human Rights (UDHR), since it is possible to compare the Turkish version with the authentic, or official, English version.

Users of screen reading applications such as JAWS should be aware that English pages are declared as British English (en-GB), since British spelling is the convention currently followed by the United Nations. Some fine-tuning may be required on the user’s part to account for spellings such as ‘kilometre’ and ‘centre’, which are not handled well by applications that are programmed to speak the American form of English.

Turkish pages are of course declared as Turkish. Where Turkish words or phrases occur on English pages, they are ‘wrapped’ with the appropriate language attribute (tr), and vice versa (en-GB).

Language is declared using the xml:lang attribute, which allows language to be identified in a consistent cross-platform manner for a whole page or for any element on that page. For example, while the default language of this page is English, the phrase Kız Çocuklarının Eğitimi is declared as Turkish. Occasional nouns or phrases that are not names are translated via the title attribute: when the cursor is placed over a word or words in Turkish, a tooltip displays the translation.
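A minimal sketch of this arrangement, again in Python, might look as follows. The surrounding sentence is invented for the example, and the English gloss ‘Girls' Education’ is our own illustrative translation rather than text taken from the site's markup.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified key that ElementTree serialises as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The paragraph inherits the page default: British English.
paragraph = ET.Element("p", {XML_LANG: "en-GB"})
paragraph.text = "The campaign is called "  # hypothetical sentence

# The Turkish phrase overrides the default for this element only, and
# the title attribute carries the translation shown in the tooltip.
phrase = ET.SubElement(
    paragraph,
    "span",
    {XML_LANG: "tr", "title": "Girls' Education"},  # illustrative gloss
)
phrase.text = "Kız Çocuklarının Eğitimi"
phrase.tail = "."

markup = ET.tostring(paragraph, encoding="unicode")
print(markup)
```

Because the declaration on the inner element takes precedence over the one on the paragraph, a screen reader can switch to Turkish for the phrase and return to English afterwards.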

Numeral system

Figures are rendered using the British and American convention, with a full point as the decimal (radix) separator and a comma to separate thousands, i.e. 1,262.50, or one thousand two hundred and sixty-two point five.

In Turkey, the notation used for figures does not follow a consistent pattern. Generally the roles of the comma and full point are reversed, rendering the figure in our example as 1.262,50. Many individuals and institutions use the SI (Système International d’Unités) convention, whereby a space is used to separate thousands, i.e. 1 262,50. This is the most widely used international system of units. However, there is a further complication in that figures rendered according to the SI system in Turkey often use the full point instead of the comma, i.e. 1 262.50.

Mixtures of these quite different conventions are sometimes used interchangeably by the same institution or individual.
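The four conventions described above can be reproduced with a small Python sketch. The function name format_amount and the labels in the table are hypothetical; the separators are those given in the text.

```python
def format_amount(value, thousands, decimal):
    """Hypothetical helper: format a figure to two decimal places with
    the given thousands and decimal separators."""
    # Python's format spec always emits "," for thousands and "." for decimals.
    text = f"{value:,.2f}"
    # Swap via a placeholder so that one replacement cannot undo the other.
    return (
        text.replace(",", "\x00")
        .replace(".", decimal)
        .replace("\x00", thousands)
    )


CONVENTIONS = {
    "British and American": (",", "."),
    "Turkish (general)": (".", ","),
    "SI with comma": (" ", ","),
    "SI with full point": (" ", "."),
}

for name, (thousands, decimal) in CONVENTIONS.items():
    print(f"{name}: {format_amount(1262.50, thousands, decimal)}")
```

The same value yields four different strings, which is why a single convention had to be chosen for the whole site.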

The decision to use the British and American convention was based on three factors:

  • the need to represent statistics and figures consistently;
  • whitespace used to separate thousands in the SI system can be problematic in terms of machine-readability;
  • assistive technologies such as speech readers are more likely to process figures correctly when they are represented in this way.
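One way in which the whitespace problem can show up is sketched below. It assumes a program that splits text on whitespace, which is a common but by no means universal approach; the helper name parse_british is hypothetical.

```python
def parse_british(figure):
    """Hypothetical helper: parse a figure written with comma thousands
    separators and a full point as the decimal separator."""
    return float(figure.replace(",", ""))


# A program that splits on whitespace breaks the SI figure in two...
si_tokens = "1 262,50".split()

# ...whereas the British and American form survives as a single token.
british_tokens = "1,262.50".split()

print(si_tokens, british_tokens, parse_british("1,262.50"))
```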

Date

The short presentation of dates can also be the cause of some confusion.

Americans will read 04/03/05 as April 3rd, 2005, whereas Europeans will read it as 4th March 2005 and Japanese readers will read it as 5th March 2004.
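The three readings can be checked with Python's standard datetime module. The labels are illustrative; the format strings simply encode the three field orders described above.

```python
from datetime import datetime

SHORT_DATE = "04/03/05"

# One format string per reading of the same six digits.
READINGS = {
    "American (month/day/year)": "%m/%d/%y",
    "European (day/month/year)": "%d/%m/%y",
    "Japanese (year/month/day)": "%y/%m/%d",
}

interpretations = {
    label: datetime.strptime(SHORT_DATE, fmt).date()
    for label, fmt in READINGS.items()
}

for label, value in interpretations.items():
    print(f"{label}: {value.isoformat()}")
```

All three parses succeed, so nothing in the string itself tells a reader, or a program, which date was meant.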

We have used the international standard (ISO 8601) wherever a short form of the date is required, such as for headings in the Press section. ISO 8601 removes the ambiguity by placing a four-digit year at the beginning, followed by the month and the day: YYYY-MM-DD. This is consistent with 24-hour time notation, where the largest unit (hours) comes before the smaller ones: minutes, followed by seconds.
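As a brief sketch, Python's standard library writes dates and times in this largest-unit-first order, using the hyphen-separated extended form of ISO 8601 (YYYY-MM-DD). The sample dates below are invented for the example.

```python
from datetime import date, time

# A date and a time of day rendered in ISO 8601 form.
press_date = date(2005, 3, 4)
iso_date = press_date.isoformat()
iso_time = time(14, 5, 9).isoformat()

# With the largest unit first, plain string sorting is also
# chronological sorting.
stamps = ["2005-03-04", "2004-12-31", "2005-01-15"]
in_order = sorted(stamps)

print(iso_date, iso_time, in_order)
```

A side benefit of this ordering is that lists of dated items, such as press headings, sort correctly without any date parsing.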

The ISO standard does not apply to language-dependent dates that are explicitly written out such as the example preceding the Search Engine on this page (visible only to browsers that support JavaScript).
