| Tea (rec.drink.tea) Discussion relating to tea, the world's second most consumed beverage (after water), made by infusing or boiling the leaves of the tea plant (C. sinensis or close relatives) in water. |
|
|
|
|||
|
All,

I just found an interesting extension for all of you Unicode fans out there who post Chinese characters. There is an extension for Mozilla Thunderbird called "Mnenhy" that allows encoding and decoding of text between various formats. One can control what is converted by highlighting the selection. For example, if I see 茶, I can determine its Unicode value by highlighting it and selecting "Encode-Decimal", and it is replaced with 33590, the decimal Unicode value for 茶 (cha). I'm sure there are other ways to do this, but I found this and thought some folks here might find it helpful.

I also found:

http://www.csse.monash.edu.au/~jwb/c...wwwjdic.cgi?1C

which looks like it might be quite helpful for anyone trying to find Japanese translations. It also seems to translate some Chinese, although maybe the two languages overlap in their nouns. For example, I found pu-erh tea in there. It translated it as 普アル茶. This is close to what Mike has on his website (普洱茶), although it seems to substitute two characters for the middle character.

--
Steven Hay
moc.liamg at evets.yah
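For readers without the extension, the same lookup can be sketched in a few lines of Python. This is only an illustration and says nothing about how Mnenhy works internally; the helper name `describe` is made up for this example.

```python
def describe(ch: str) -> str:
    """Return the decimal and hexadecimal Unicode code point of one character."""
    cp = ord(ch)  # ord() gives the code point as an integer
    return f"{ch} decimal={cp} hex=U+{cp:04X}"


# Look up each character of the Chinese spelling of pu-erh tea.
for ch in "普洱茶":
    print(describe(ch))

# The reverse direction: a decimal value back to its character.
print(chr(33590))
```

The decimal form is what an HTML numeric character reference uses, so 茶 can be written as `&#33590;` on a page whose encoding cannot hold it.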
|
|||
|
There are more web pages in the native language character sets than in Unicode. I've developed routines that convert from the two major Chinese character sets (GB2312 for Simplified, Big5 for Traditional) and the two Japanese encodings (JIS X 0208 and Shift_JIS) to Unicode. I use Unicode.Org to see the glyph and to get the Unicode character for Google searches.

I did some previous posts on the process. In summary: download the Unicode CJK table from Unicode.Org, then use the Simplified and Traditional code pairs to look up the Unicode value. The JIS X 0208 Japanese code stored in the Unicode table is the KUTEN (row-cell) value, so you need to convert from JIS X 0208 and Shift_JIS to KUTEN.

All 32-bit MS OSes are Unicode compliant except for 95, 98, and Me, which are 16/32-bit hybrids with only limited Unicode support. A character takes 1 to 4 bytes in UTF-8 and 2 or 4 bytes in UTF-16; most CJK characters take 3 bytes in UTF-8 and 2 bytes in UTF-16.

Jim

PS: I'll let Kuri explain the two 'Japanese' characters for Pinyin ER. In this case PU and CHA are intact. Curious that Unicode doesn't show any KUTEN value for the two 'Japanese' characters.
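As a rough illustration of the conversions described in this post, Python's built-in codecs can stand in for hand-built lookup tables. This is a sketch and not Jim's routines: the codec names `gb2312`, `big5`, `shift_jis`, and `euc_jp` are Python's, and the helpers `to_unicode` and `kuten` are hypothetical names. The KUTEN step relies on the fact that EUC-JP stores a JIS X 0208 character as two bytes, each equal to the row (ku) or cell (ten) number plus 0xA0.

```python
def to_unicode(raw: bytes, codec: str) -> str:
    """Decode bytes in a legacy CJK encoding into a Unicode string."""
    return raw.decode(codec)


def kuten(ch: str) -> tuple:
    """Return the JIS X 0208 (ku, ten) pair for one character.

    Raises UnicodeEncodeError if the character cannot be encoded in EUC-JP,
    and ValueError if it is not a two-byte JIS X 0208 character.
    """
    raw = ch.encode("euc_jp")
    if len(raw) != 2 or raw[0] < 0xA1 or raw[1] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0


tea = "茶"
for codec in ("gb2312", "big5", "shift_jis"):
    raw = tea.encode(codec)        # bytes as a legacy web page would store them
    back = to_unicode(raw, codec)  # decodes to the same Unicode character each time
    print(codec, raw.hex(), f"U+{ord(back):04X}")

print("kuten:", kuten(tea))

# Storage size depends on the encoding form and on the character.
print(len(tea.encode("utf-8")), "bytes in UTF-8")
print(len(tea.encode("utf-16-be")), "bytes in UTF-16")
```

The byte values differ in every legacy encoding, but all of them decode to the same code point, which is why the Unicode value is the convenient key for searching.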
|
|||
|
The other thing I noticed is that the Chinese character for ER3 only exists in the JIS X 0212 character set. I don't know what would happen if you pasted it into a typical JIS X 0208 IME. Probably what you described.

You will get some hits on Japanese web pages for Puer with the Unicode string that Steve provided. However, that is probably due to the same paste problem you described. I also understand why Unicode.Org didn't provide any information for the two Unicode characters but defaulted to erroneous Japanese Unicode strings from the Japanese WWW Edict server, which provided Steve's string in the first place. I tried, and I don't know how he came up with the string in the first place. In other words, you can't plug the two characters back into EDICT and find a definition for either.

Jim
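Whether a character survives a trip through a given character set can be checked directly. The sketch below uses Python's `shift_jis` codec as a stand-in for a program limited to JIS X 0208, and `euc_jp` for one that, in CPython's implementation, also covers JIS X 0212. `can_encode` is a hypothetical helper, and what any particular IME or web form actually does with an unsupported character is not something this code can show.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text is representable in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Check each character of the Chinese spelling against both Japanese codecs.
for ch in "普洱茶":
    print(
        ch,
        f"U+{ord(ch):04X}",
        "shift_jis:", can_encode(ch, "shift_jis"),
        "euc_jp:", can_encode(ch, "euc_jp"),
    )

# One common workaround on web pages: emit a numeric character reference
# for anything the target encoding cannot represent.
print("普洱茶".encode("shift_jis", errors="xmlcharrefreplace"))
```

A character reported as not encodable has to be replaced, dropped, or escaped by whatever software handles the text, which is the kind of substitution discussed in this thread.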
|
|||
|
"Steve Hay" wrote: "For example, I found pu-erh tea in there. It translated it as 普アル茶."

No, it isn't a translation. The word was transformed when it was pasted into a Japanese program. "アル" (aru) is the reading of the 2nd character, written in katakana (the Japanese phonetic script). The problem is that most Japanese programs don't reliably display the 洱 character because they don't have the fonts. If you insist on writing 普洱茶, a Japanese computer that doesn't have fonts for the 2nd character will transform it. Here it was transformed into its *reading* (it could also have been cut into 2 characters, replaced by something unrelated, not displayed...).

Usually, to avoid display problems, they write "puer" phonetically: プアール茶, プアル茶, or プーアール茶, and even "puer cha" completely in phonetics: プーアールチャ. On packages in Japan, they write the Chinese characters plus the Japanese reading.

Kuri
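The difference between the spellings is easy to see by labeling each character with its script. The following is a minimal sketch using Python's standard `unicodedata` module; the helper `script_of` and its labels are invented for this example, and it classifies by Unicode character name only.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Classify one character as han, katakana, hiragana, or other."""
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "han"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    return "other"


# The Chinese spelling, the transformed string, and two phonetic spellings.
for word in ("普洱茶", "普アル茶", "プーアール茶", "プーアールチャ"):
    print(word, [script_of(ch) for ch in word])
```

In 普アル茶 the first and last characters are still Chinese characters, and only the middle one has been replaced by two katakana, which matches the explanation that the reading was substituted for a character that could not be handled.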