Unicoding the Esperanto Wikipedia (Part 3 of 4) Posted by Chuck Smith on Jan 10, 2011 in Uncategorized
In the first part of this series, I told how I found the Esperanto Wikipedia, and in the second part, how I founded it. Now you’ll learn how the technical issues were dealt with and how they influenced the entire multilingual Wikipedia project!
So, this leaves one more problem… the site doesn’t support Esperanto letters! You see, Esperanto has some special letters (ĉĝĥĵŝŭ) which were no problem to type on Zamenhof’s typewriter in 1887. How could he know that one day Americans would “advance” technology with computers and make these letters difficult to type? In any case, he stated that whenever you couldn’t use these special characters, you could always write using the h-system. That means that a letter with a circumflex is written without it and followed by an h (ĉ = ch, ĝ = gh, etc). The breve over the u is simply dropped. Note that a similar system is used in German (ä = ae, ö = oe, ü = ue, ß = ss).
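To make the h-system concrete, here is a minimal Python sketch of the substitution described above. The names `H_SYSTEM` and `to_h_system` are made up for this illustration, and this is not code from Wikipedia, just the rule written out.

```python
# Hypothetical illustration of the h-system: each circumflexed letter
# becomes the bare letter plus "h", and the breve on u is simply dropped.
H_SYSTEM = {
    "ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u",
    "Ĉ": "Ch", "Ĝ": "Gh", "Ĥ": "Hh", "Ĵ": "Jh", "Ŝ": "Sh", "Ŭ": "U",
}


def to_h_system(text: str) -> str:
    """Rewrite Esperanto text without any accented letters."""
    return "".join(H_SYSTEM.get(ch, ch) for ch in text)


print(to_h_system("eĥoŝanĝo ĉiuĵaŭde"))
```

Going in this direction is easy. Going back from h-system text to the accented letters is the hard part, because an ordinary h or u looks exactly like an encoded one.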
Some purists on the Internet didn’t like that such text couldn’t be automatically converted to “proper Esperanto” by a script, so some people started to use the unambiguous, but uglier x-system (ĉ = cx, ĝ = gx, … ŭ = ux). The h-system can be problematic for Esperanto words such as flughaveno, chashundo and traumata. Is that last word traŭmata (traumatized) or tra-um-ata (went through)? Such occurrences are rare, but it would indeed be confusing to see fluĝaveno or ĉaŝundo after a naive conversion. In any case, we didn’t want any ambiguity in the Vikipedio, so we stuck to the x-system, drawing criticism from some h-system purists.
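Here is a rough Python sketch of why a script can convert x-system text reliably but stumbles on h-system text. The function names are hypothetical, and this is not the converter that ended up in the Wikipedia software, only an illustration of the idea. It relies on the fact that x is not a letter of the Esperanto alphabet.

```python
import re

# Hypothetical x-system table: letter + "x" stands for the accented letter.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}
# Hypothetical h-system table for the naive reverse conversion.
H_DIGRAPHS = {"ch": "ĉ", "gh": "ĝ", "hh": "ĥ", "jh": "ĵ", "sh": "ŝ"}


def from_x_system(text: str) -> str:
    """Convert x-system digraphs to accented letters, keeping capitalization."""
    def repl(match: re.Match) -> str:
        pair = match.group(0)
        letter = X_SYSTEM[pair.lower()]
        return letter.upper() if pair[0].isupper() else letter
    return re.sub(r"[cghjsu]x", repl, text, flags=re.IGNORECASE)


def from_h_system_naive(text: str) -> str:
    """Blindly treat every letter + "h" pair as an accented letter."""
    return re.sub(r"[cghjs]h", lambda m: H_DIGRAPHS[m.group(0)], text)


print(from_x_system("ehxosxangxo cxiujxauxde"))  # x-system input
print(from_x_system("flughaveno"))               # no x, so left alone
print(from_h_system_naive("flughaveno"))         # wrongly merges g + h
print(from_h_system_naive("chashundo"))          # wrongly merges c + h, s + h
```

The naive h-system converter produces exactly the confusing fluĝaveno and ĉaŝundo, and it cannot even attempt to restore ŭ. The x-system version is not perfect either: as written, this sketch would mangle a foreign name like Linux, so a real converter needs some way to escape a literal x.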
Brion Vibber started investigating the Wikipedia code for the sole purpose of allowing us to edit with the x-system, but have it automatically converted and displayed properly in Unicode. On January 10, 2002, he succeeded in converting the entire Esperanto Wikipedia over to Unicode… the first language on this super-modern text encoding standard (Unicode 2.0 was only finalized in 1996!). I tried to convince him to store the text (and page titles in URLs) with the x-system and just display them in Unicode, but his foresight far surpassed mine. [Ok Brion, I admit it now, you were right!]
He then went on to convert every other language Wikipedia to Unicode, which finally solved the Japanese and Russian problem mentioned earlier, as well as making it viable to write in many other previously-neglected languages. Given his knowledge of the depths of the code, he became the first full-time employee on the Wikipedia project as lead developer. Many people should be thankful for the incredible software support that Brion has given to the multilingual Wikipedia. Go ahead and mark June 1 on your calendar now as Brion Vibber Day.
In the next and final part, I’ll discuss how the Wikipedia spread so quickly through the Esperanto community and the steps I took to fuel the fire.