Web Browser Language Detection

All web browsers pass language information to web servers when requesting content, in the Accept-Language HTTP request header. If your website makes content available to users in a variety of languages, code running on the server can read this information and return content in the most appropriate language. Many websites (e.g. Google) use this information.

I was searching around on the internet for a test page that would show this raw data to me. I couldn’t quite find what I was looking for, so I wrote my own. Feel free to use it yourself to see what your browser is telling web services about your language preferences.

Here is an example of the data one web browser (IE9) sent to my web server:

en-US,en;q=0.8,de-CH;q=0.5,de;q=0.3

 

In this case, the browser is telling the server the following:

  1. My most preferred language is US English – give me that content if you have it.
  2. If you don’t have US English, ok, just give me generic English if you have it.
  3. If you don’t have English at all, that’s fine, give me Swiss German if you have it.
  4. Don’t have Swiss German either? OK, give me generic German.
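
The reading above can be sketched in a few lines of Python. This is only a minimal illustration, not production parsing code: the function name parse_accept_language is my own, and treating an unreadable weight as 0 is an assumption rather than anything the browsers or the HTTP specification require.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted from most to least preferred."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a tag without a q parameter carries the default weight of 1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # assumption: treat an unreadable weight as least preferred
        entries.append((tag.strip(), q))
    # sorted() is stable, so tags with equal weights keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept_language("en-US,en;q=0.8,de-CH;q=0.5,de;q=0.3"))
# [('en-US', 1.0), ('en', 0.8), ('de-CH', 0.5), ('de', 0.3)]
```

Sorting by weight rather than trusting the header order matters only when a browser lists a lower-weighted language first, but it costs nothing to do.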

 

Things To Watch Out For

In the process of creating my test page I was reminded that, while all browsers send language information to the server, there are differences in implementation from one browser to the next that can make developers’ lives difficult:

  1. Language list: The list of supported languages varies significantly from one browser to another, both in the number of languages and language variants that users can choose from and in the language IDs themselves, which differ even for major languages. Chinese is certainly a major world language, but some of the “newer” RFC 5646 variants are not implemented uniformly. Here are the Chinese variants that each of the current browsers offers today:
    Firefox 4 zh,zh-cn;q=0.8,zh-hk;q=0.6,zh-sg;q=0.4,zh-tw;q=0.2
    Chrome 11 zh-CN,zh;q=0.8,zh-TW;q=0.6
    IE 9 zh,zh-CN;q=0.9,zh-SG;q=0.8,zh-HK;q=0.6,zh-MO;q=0.5,zh-TW;q=0.4,zh-Hans;q=0.3,zh-Hant;q=0.1

    As you can see, IE “wins” here: it is the only browser that supports the ISO 15924 script identifiers Hans (Simplified Han) and Hant (Traditional Han) that were included in RFC 5646 within the past couple of years, and the only browser that supports Chinese (Macao SAR). Chrome’s options were surprisingly limited. You will find similarly inconsistent browser support for other languages as well.

  2. Capitalization is inconsistent: Language tags are case-insensitive, so you’ll probably want to lowercase everything before handling it in code, for consistency.
    Firefox 4 en-us,en;q=0.8,de-ch;q=0.5,de;q=0.3
    Chrome 11 en-US,en;q=0.8,de-CH;q=0.6,de;q=0.4
    IE 9 en-US,en;q=0.8,de-CH;q=0.5,de;q=0.3

     

  3. The “q” factor is inconsistent: This is probably not a big deal, as it should not change the overall order of preference, but if you rely on the weights themselves, I found Chrome’s values to be slightly “off” compared to IE and Firefox.
    Firefox 4 en-us,en;q=0.8,de-ch;q=0.5,de;q=0.3
    Chrome 11 en-US,en;q=0.8,de-CH;q=0.6,de;q=0.4
    IE 9 en-US,en;q=0.8,de-CH;q=0.5,de;q=0.3

     

  4. IE user-specified values: Internet Explorer allows users to specify their own unique value, so you may see some strange values appearing. While it may be garbage, it also may be a serious attempt by the user to overcome a limitation in the available predefined list. Firefox/Chrome just allow users to choose from predefined lists.
    IE 9 helloworld,en-US;q=0.8,en;q=0.6,de-CH;q=0.4,de;q=0.2

     

  5. No data: The string you receive from the browser may be blank if the user has removed all languages. Internet Explorer has always allowed users to do this, and Firefox allows it as well. Chrome does not, as it ties its language settings to the language of the browser UI itself.
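
Several of the points above (mixed capitalization, user-typed values, a blank string) can be handled in one defensive clean-up step. The sketch below is one possible approach, not the only one: clean_tags, TAG_SHAPE and the "en" default are hypothetical names and choices of mine, and the pattern is a loose shape check rather than the full RFC 5646 grammar.

```python
import re

# Loose shape check: 1-8 letters, then optional subtags of 1-8 letters or digits.
# An assumption for illustration, not the full RFC 5646 grammar.
TAG_SHAPE = re.compile(r"^[a-z]{1,8}(-[a-z0-9]{1,8})*$")


def clean_tags(header, default="en"):
    """Lowercase the tags, drop malformed ones, and fall back if none remain."""
    tags = []
    for part in (header or "").split(","):
        # Ignore the q parameter here; this step only normalizes the tags.
        tag = part.split(";")[0].strip().lower()
        if TAG_SHAPE.match(tag):
            tags.append(tag)
    return tags or [default]


print(clean_tags("helloworld,en-US;q=0.8,en;q=0.6,de-CH;q=0.4,de;q=0.2"))
# ['en-us', 'en', 'de-ch', 'de']
print(clean_tags(""))
# ['en']
```

A check this loose drops “helloworld” (too long for a single subtag) and the “*” wildcard, but still lets through a well-formed custom value that a user typed in on purpose.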

 

The Bottom Line

When you’re writing your case statement to parse the browser language string, make sure you cover the variants as well. Don’t assume all Chinese-speaking users will send you zh-CN or zh-TW. Decide what to do with the neutral zh, with script-based values such as zh-Hans and zh-Hant, and so on. The bottom line is that a proper case statement is going to be more complicated than you may have believed initially.
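
As an alternative to a long case statement, here is a minimal Python sketch of a table-driven lookup, loosely modeled on the progressive-truncation lookup described in RFC 4647. Everything named here is hypothetical: pick_language, the SUPPORTED list and the ALIASES table stand in for decisions your own site would have to make (for example, which Chinese content a neutral zh should receive).

```python
def pick_language(preferred, supported, aliases=None, default="en"):
    """Return the first supported tag for a list of tags in preference order."""
    supported = {tag.lower() for tag in supported}
    aliases = aliases or {}
    for tag in preferred:
        tag = tag.lower()
        # Try the tag itself, then drop subtags from the right: zh-hant-tw -> zh-hant -> zh
        while tag:
            candidate = aliases.get(tag, tag)
            if candidate in supported:
                return candidate
            tag = tag.rpartition("-")[0]
    return default


# Hypothetical site configuration, for illustration only.
SUPPORTED = ["en", "de", "zh-cn", "zh-tw"]
ALIASES = {"zh": "zh-cn", "zh-hans": "zh-cn", "zh-hant": "zh-tw",
           "zh-hk": "zh-tw", "zh-mo": "zh-tw"}

print(pick_language(["zh-HK", "zh"], SUPPORTED, ALIASES))  # zh-tw
print(pick_language(["zh-sg"], SUPPORTED, ALIASES))        # zh-cn
print(pick_language(["de-CH", "de"], SUPPORTED, ALIASES))  # de
print(pick_language(["fr"], SUPPORTED, ALIASES))           # en
```

Keeping the variant decisions in a table means that supporting a newly seen value is a one-line change rather than another branch in the case statement.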

I hope you found this little guide useful. Let me know in the comments.

 

jQuery Globalization/i18n plugin – Open Source from Microsoft!

Your users are international. They deserve to experience your software in a way which makes sense to them. Just as a US user has a right to expect that a date displayed on your website is in US format (month-day-year), a UK user has the right to see that same date displayed in day-month-year format. Ignoring this leads to confusion: is 11/12/2010 November 12th or December 11th? There is simply no way to know, and the US and UK users visiting your site will each form their own interpretation. Only one will be right. I feel like I’m stating the obvious here. You should know this already, and you should want to do the right thing.
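
To make the ambiguity concrete, here is a tiny Python sketch that formats the same date for two locales. It is not the API of any particular library: DATE_PATTERNS and format_short_date are hypothetical, and the two-entry table stands in for the much larger set of culture data that real locale support requires.

```python
from datetime import date

# Hypothetical pattern table for illustration; real locale data covers far more.
DATE_PATTERNS = {
    "en-us": "%m/%d/%Y",  # month/day/year
    "en-gb": "%d/%m/%Y",  # day/month/year
}


def format_short_date(d, locale_tag):
    """Format a date using the short pattern for the given locale tag."""
    # Assumption: fall back to the unambiguous ISO 8601 form for unknown locales.
    pattern = DATE_PATTERNS.get(locale_tag.lower(), "%Y-%m-%d")
    return d.strftime(pattern)


d = date(2010, 11, 12)  # 12 November 2010
print(format_short_date(d, "en-US"))  # 11/12/2010
print(format_short_date(d, "en-GB"))  # 12/11/2010
```

The same underlying date produces two different strings, and either string shown to the wrong audience will be read as a different day.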

But sometimes you find yourself using a technology platform which doesn’t provide good support for locale or culture data formats. JavaScript would be a good example. If the technology doesn’t provide a way, you need to consider using an alternate technology that provides better support for internationalization, or be prepared for some ugly workarounds and/or a poor user experience.

Earlier this month, JavaScript got a lot better! jQuery now has an approved plugin from Microsoft which provides all this formatting data for you to use. I suggest you use it!

Here are links to Scott Guthrie’s blog on MSDN for more information: First, an in-depth explanation of how it works, with examples and sample code. Second, a follow-up post from early October, confirming that it has been accepted as an official jQuery plugin. (Important for anyone about to flame me after seeing it described as a “prototype” in the first link…!)