Unicode nearing 50% of the web

According to a recent post on the Google Blog, Unicode is nearing 50% uptake on the web. The graph is rather steep as well:

[Figure: Unicode uptake graph]

This is pretty good news. I've had the 'pleasure' of working on a number of integration projects where the 3rd party was still using iso-8859-1 (aka latin-1). Usually when this is the case, it's not by choice but because of their software's default settings (browsers, MySQL, etc.). I for one hope non-Unicode charsets will soon be a thing of the past.
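To make the latin-1 problem a bit more concrete, here is a minimal Python sketch of what typically goes wrong when one side sends UTF-8 and the other assumes iso-8859-1. The sample string is made up for illustration; it is not taken from any of the projects mentioned above.

```python
# Hypothetical sample text ("cafe" with an acute accent on the e).
text = "caf\u00e9"

utf8_bytes = text.encode("utf-8")         # two bytes for the accented letter
latin1_bytes = text.encode("iso-8859-1")  # one byte for the accented letter

# Reading UTF-8 bytes as latin-1 never raises an error:
# every byte is a valid latin-1 character, so you silently get mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# The reverse direction fails loudly: 0xE9 announces a multi-byte
# UTF-8 sequence, but no continuation bytes follow.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

print(utf8_bytes)            # b'caf\xc3\xa9'
print(latin1_bytes)          # b'caf\xe9'
print(len(mojibake))         # 5 characters instead of 4
print(strict_decode_failed)  # True
```

The silent case is the nasty one: nothing breaks, the data just ends up wrong in the database.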

One other note in the post was about ligatures, such as ﬁ and the Dutch ĳ. If this is the first time you've heard about these, you might be surprised to see that you can (likely) only copy-paste ĳ as a whole, and not just the i or j. It's one Unicode character, not two. It just made me wonder: what kind of software would generate these, and more importantly, why?
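If you want to check the "one character, not two" claim yourself, a small Python sketch like the following should do. It uses escape sequences for the two ligature code points (U+0133 and U+FB01) so that copy-pasting the snippet cannot accidentally turn them into plain letters; the variable names are just for illustration.

```python
import unicodedata

ij = "\u0133"  # LATIN SMALL LIGATURE IJ
fi = "\ufb01"  # LATIN SMALL LIGATURE FI

# Each ligature is a single code point, which is why it selects as one unit.
print(len(ij), len(fi))      # 1 1
print(unicodedata.name(ij))  # LATIN SMALL LIGATURE IJ

# Compatibility normalization (NFKC) folds a ligature back into plain letters.
plain_ij = unicodedata.normalize("NFKC", ij)
plain_fi = unicodedata.normalize("NFKC", fi)
print(plain_ij, len(plain_ij))  # ij 2
print(plain_fi, len(plain_fi))  # fi 2
```

Running text through NFKC before storing or comparing it is one way to get rid of ligatures that slipped into source material, assuming you don't need to preserve them.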


7 Responses to Unicode nearing 50% of the web

  1. 1066 Dave 2010-01-29 1:30 pm

    "It just made me wonder: what kind of software would generate these, and more importantly why?"

    Well, the answer is right there in the post you referenced: it just looks better in documents intended for printing, "[...] especially generated PDF documents."

  2. 1067 Jordan Walker 2010-01-29 2:01 pm

    Let the battle and competition rage.

  3. 1068 Evert 2010-01-29 6:03 pm

    @Dave,

    Maybe I'm crazy, but shouldn't it be a job of the font to make a combination of 2 characters look better?

  4. 1069 Lars Gunther 2010-01-29 6:24 pm

    And of course this means that PHP 6 is becoming more important with each day. But is it in sight?

  5. 1070 Jay Pipes 2010-01-29 7:53 pm

    Drizzle got rid of all non-UTF-8 character sets a long time ago. The web is UTF8 and so should be the data behind it.

    One minor thing, though. UTF-8 != Unicode :) UTF-8 is technically just a mapping of Unicode code points to a range of values.

    I would argue that the web has standardized on UTF-8, not UCS4, UTF-32, UTF-16 or other Unicode transformation mappings...

    Cheers!

    jay

  6. 1071 Nelson Menezes 2010-01-30 11:41 am

    As mentioned above, ligatures simply look better in print or at large font sizes on screen.

    If you are getting situations where ligatures are being copied-pasted then someone screwed up -- the ligatures are meant to be applied on rendering only, not on source material. So, it would be the job of a browser to introduce ligatures on screen, but still allow copy/paste of individual characters.

    BTW, great things are coming... http://hacks.mozilla.org/2009/10/font-control-for-designers/

  7. 1072 Joost 2010-02-03 8:26 am

    Ligatures like IJ are also important because of capitalization rules. I know Bing Maps only uppercases the first letter, which is wrong in Dutch.

    http://www.bing.com/maps/#JnE9eXAuaGV0K2lqJTdlc3N0LjAlN2VwZy4xJmJiPTUzLjAxOTQzMDQyMDYxODIlN2U1LjYzOTk5NTU2MDA1MDAxJTdlNTMuMDAzNzU3NTgxOTI4JTdlNS42MDAwODQyODk5MDg0MQ==

    http://maps.google.nl/maps?f=q&source=s_q&hl=nl&geocode=&q=het+ij&sll=52.469397,5.509644&sspn=3.935848,9.876709&ie=UTF8&hq=&hnear=Het+IJ&ll=52.369992,4.997234&spn=0.030814,0.077162&z=14

About

My name is Evert, and I've been writing semi-regularly on this blog since 2006.
