Friday, May 6, 2011

HTML encoding: Eastern European languages

My program fetches messages from a database that contains English, German, and several Eastern European languages. My Python script sets the encoding via:

<meta  http-equiv="Content-Type" content="text/html; charset=utf-8"/>

and uses the values fetched from the database, which are correct when I check them in my logs.

Unfortunately, all the browsers I tested (IE8, Firefox 3.0.10, Opera 9.64) switch, based on my local language settings, to:

  • Western ISO-8859-1 in Firefox
  • Western European (Windows) in IE
  • Automatic in Opera

Everything works fine as soon as I switch the character encoding manually in the browser.

The same happens if I manually create the HTML file in UTF-8 (tested with TextMate and with jEdit), although both editors display the content correctly.

That works fine for English and German, but not for Russian, for example. How can I force the "correct" character encoding?

ANSWER

The following entry in the VirtualHost section of the Apache configuration did the trick for me:

AddDefaultCharset utf-8
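If the page is produced by a Python WSGI application instead of relying on an Apache-wide default, the script can send the same header itself. The sketch below is a minimal, hypothetical app (the name `app` and the sample message are illustrative stand-ins for the database content); it declares UTF-8 in the HTTP `Content-Type` header, keeps the meta tag as a fallback, and encodes the body as UTF-8 so the bytes match the declaration.

```python
from html import escape


def app(environ, start_response):
    """Minimal WSGI app declaring UTF-8 in the HTTP header and in the document."""
    # Hypothetical message standing in for text fetched from the database.
    message = "Привет, мир! Grüße aus Łódź."
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
        "</head><body><p>" + escape(message) + "</p></body></html>"
    )
    # The bytes on the wire must actually be UTF-8, not just labelled as such.
    body = page.encode("utf-8")
    start_response("200 OK", [
        # The charset parameter here is what browsers consult first.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000 (an arbitrary choice); the browser should then select UTF-8 without switching the encoding manually.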

Many thanks for pointing me in the right direction, that helped a lot!

From Stack Overflow:
  • When the document is transferred over HTTP, the HTTP header information is the crucial information:

    […] conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):

    1. An HTTP "charset" parameter in a "Content-Type" field.
    2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
    3. The charset attribute set on an element that designates an external resource.

    So make sure you declare the character encoding in the Content-Type header field and not just inside the document.
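The priority order quoted above can be sketched in a few lines of Python. This is a simplified illustration, not how any browser is implemented: it works on already-decoded text rather than raw bytes, skips rule 3 (which applies to external resources, not the document itself), and the `default` fallback is an assumption standing in for the browser's locale-dependent choice. The helper names `resolve_charset` and `MetaCharsetFinder` are hypothetical.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional


def charset_param(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first <meta http-equiv="Content-Type">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() == "content-type":
            self.charset = charset_param(values.get("content", ""))


def resolve_charset(http_content_type: Optional[str], html_text: str,
                    default: str = "iso-8859-1") -> str:
    # Priority 1: charset parameter in the HTTP Content-Type header.
    if http_content_type:
        charset = charset_param(http_content_type)
        if charset:
            return charset
    # Priority 2: <meta http-equiv="Content-Type"> inside the document.
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    if finder.charset:
        return finder.charset
    # Nothing declared: fall back to an assumed user-agent default.
    return default


page = ('<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=utf-8"/></head></html>')
print(resolve_charset("text/html; charset=ISO-8859-1", page))  # iso-8859-1
print(resolve_charset("text/html", page))                      # utf-8
print(resolve_charset(None, "<html><head></head></html>"))     # iso-8859-1
```

The first call shows why the meta tag alone is not enough: if the server's header already names a charset, the document's declaration is never consulted. To check what a live server sends, `urllib.request.urlopen(url).headers.get_content_charset()` returns the header's charset (or `None`), where `url` is the page's address.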
