Unicoding the Esperanto Wikipedia (Part 3 of 4)

Posted on 10 Jan 2011 by Chuck Smith in Uncategorized

In the first part of this series, I told how I found the Esperanto Wikipedia, and in the second part, how I founded it. Now you’ll learn how the technical issues were dealt with and how they influenced the entire multilingual Wikipedia project!

So, this leaves one more problem… the site doesn’t support Esperanto letters! You see, Esperanto has some special letters (ĉĝĥĵŝŭ) which were no problem to type on Zamenhof’s typewriter in 1887. How could he know that one day Americans would “advance” technology with computers and make typing these letters difficult today? In any case, he stated that whenever you couldn’t use these special characters, you could always write using the h-system. That means that a letter with a circumflex is written without it and followed by an h (ĉ = ch, ĝ = gh, etc.). The breve over the u is simply dropped (ŭ = u). German uses a similar system (ä = ae, ö = oe, ü = ue, ß = ss).

Some purists on the Internet didn’t like that such text couldn’t be automatically converted to “proper Esperanto” by a script, so some people started to use the unambiguous, but uglier, x-system (ĉ = cx, ĝ = gx, … ŭ = ux). The h-system can be problematic for Esperanto words such as flughaveno, chashundo, and traumata. Is that last word traŭmata (traumatized) or tra-um-ata (went through)? Such occurrences are rare, but a script that blindly turned every gh and sh into ĝ and ŝ would produce fluĝaveno or ĉaŝundo, which would indeed be confusing to see. In any case, we didn’t want any ambiguity in the Vikipedio, so we stuck to the x-system, drawing criticism from some h-system purists.
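To make the difference concrete, here is a minimal Python sketch of both conversions. The function names are made up for illustration and this is not the script anyone actually used; it only shows why the x-system can be converted mechanically while a naive h-system converter trips over the words above.

```python
import re

# Base letter -> accented letter, lower and upper case.
ACCENTED = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}
ACCENTED.update({k.upper(): v.upper() for k, v in list(ACCENTED.items())})


def x_to_unicode(text):
    """Replace x-system digraphs (cx, gx, hx, jx, sx, ux) with accented letters."""
    return re.sub(r"([cghjsuCGHJSU])[xX]", lambda m: ACCENTED[m.group(1)], text)


def h_to_unicode_naive(text):
    """Naively replace h-system digraphs (ch, gh, hh, jh, sh).

    This cannot tell a digraph from a real letter followed by a real h,
    and it never restores the breve on u, which the h-system drops.
    """
    return re.sub(r"([cghjsCGHJS])[hH]", lambda m: ACCENTED[m.group(1)], text)


print(x_to_unicode("cxashundo"))         # ĉashundo (correct)
print(x_to_unicode("flughaveno"))        # flughaveno (left alone, correct)
print(h_to_unicode_naive("chashundo"))   # ĉaŝundo (wrong)
print(h_to_unicode_naive("flughaveno"))  # fluĝaveno (wrong)
```

The x-system works here because x is not a letter of the Esperanto alphabet, so a letter followed by x can only be a digraph in native words.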

[Image: Brion Vibber thinking about Unicode]

Brion Vibber started investigating the Wikipedia code for the sole purpose of allowing us to edit with the x-system while having the text automatically converted and displayed properly in Unicode. On January 10, 2002, he succeeded in converting the entire Esperanto Wikipedia over to Unicode, making it the first language edition to use this super-modern text encoding standard (Unicode 2.0 was only finalized in 1996!). I tried to convince him to store the text (and page titles in URLs) in the x-system and just display them in Unicode, but his foresight far surpassed mine. [Ok Brion, I admit it now, you were right!]
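For the curious, the editing side of such a setup might look roughly like the sketch below: text is stored in Unicode and turned back into x-system digraphs when it is handed to someone for editing. This is only my illustration under that assumption, with a hypothetical function name; it is not Brion’s actual code.

```python
import unicodedata

# Accented letter -> x-system digraph (hypothetical table for this sketch).
TO_X = str.maketrans({
    "ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux",
    "Ĉ": "Cx", "Ĝ": "Gx", "Ĥ": "Hx", "Ĵ": "Jx", "Ŝ": "Sx", "Ŭ": "Ux",
})


def unicode_to_x(text):
    """Turn stored Unicode text into x-system text for an edit box."""
    # Normalize first so that "c" + combining circumflex is treated as "ĉ".
    return unicodedata.normalize("NFC", text).translate(TO_X)


print(unicode_to_x("Eŭropo"))   # Euxropo
print(unicode_to_x("ĉaŝundo"))  # cxasxundo
```

A real system also has to decide what to do with text that legitimately contains a letter followed by x, such as a foreign name, since a simple converter would turn that into an accented letter on the way back. This sketch ignores that case.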

He then went on to convert every other language Wikipedia to Unicode, which finally solved the Japanese and Russian problem mentioned earlier, as well as making it viable to write in many other previously neglected languages. Given his knowledge of the depths of the code, he became the first full-time employee on the Wikipedia project as lead developer. Many people should be thankful for the incredible software support that Brion has given to the multilingual Wikipedia. Go ahead and mark June 1 on your calendar now as Brion Vibber Day.

In the next and final part, I’ll discuss how Wikipedia spread so quickly through the Esperanto community and the steps I took to fuel the fire.

About Chuck Smith

I was born in the US, but Esperanto has led me all over the world. I started teaching myself Esperanto on a whim in 2001, not knowing how it would change my life. The timing couldn’t have been better; around that same time I discovered Wikipedia in its very early stages and launched the Esperanto version. When I decided to backpack through Europe, I found Esperanto speakers to host me. These connections led me to the Esperanto Youth Organization in Rotterdam, where I worked for a year, using Esperanto as my primary language. Though in recent years I’ve moved on to other endeavors like iPhone and Ouya development, I remain deeply ingrained in the Esperanto community, and love keeping you informed of the latest news. The best thing that came from learning Esperanto has been the opportunity to connect with fellow speakers around the globe, so feel free to join in the conversation with a comment!

8 Responses to “Unicoding the Esperanto Wikipedia (Part 3 of 4)”

  1. Paŭlo Ebermann 10 January 2011 at 18:09 #

    You have a little mis-translation here:
    “traŭmata” = related to “traŭmato” (= wound/trauma), so it would mean roughly “traumatic”.

    It is not related to “dream” (this would be “sonĝata” or “revata”).

    (Evidently you have already been living in Germany too long.)

  2. russ 11 January 2011 at 07:51 #

    Psst, your German is showing: “traŭmata” does not mean “dreamed” in Esperanto despite the German “Traum”… :)

    Cool that the Esperanto Wikipedia was the impetus for fixing the code to use Unicode.

  3. Chuck Smith 11 January 2011 at 09:13 #

    Thanks Russ! I just changed the meaning to traumatized. I have to admit, I can’t remember seeing this word used before, so I hope I hit the definition right here.

