Lewis Perin
Japanese Chinese tea web sites

Warning: nerdy details abound here!

"Space Cowboy" writes:
>
> Lewis Perin wrote:
>> [...why are there Chinese tea names that appear only in Japanese sites...]

>
> The charset=shift_jis of the webpage indicates Japanese. All 2
> character pairs are used for Japanese font sets. The characters you
> see are from the Japanese fonts and not Chinese. That character may
> very well exist in the Chinese font set and vice versa but the charset
> setting on the HTML page tells where to look. Basically non Roman
> languages take two characters for representation and a corresponding
> font set. For example the Cha character in Japanese JIS is 3567 and
> simplified Chinese GB 1872.


Yes, but it's still the same Unicode code point (33590, or 8336 in
hex), which is why you get both .cn and .jp web sites if you Google
for it.
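
To make that concrete, here is a minimal Python sketch. It assumes a stock
Python 3 with its standard shift_jis, euc_jp and gb2312 codecs; the helper
row_cell is my own name for illustration, not part of any library. It derives
the row-cell numbers quoted above from the EUC forms of the two national
sets, and shows that the bytes differ while the code point does not.

```python
# U+8336 is the "cha" (tea) character, written as an escape so this stays ASCII.
CHA = "\u8336"

def row_cell(two_bytes: bytes) -> int:
    """Turn an EUC-style double-byte code into its 4-digit row-cell number.

    Hypothetical helper: EUC-JP and EUC-CN (gb2312) both store a character
    as (row + 0xA0, cell + 0xA0).
    """
    row = two_bytes[0] - 0xA0
    cell = two_bytes[1] - 0xA0
    return row * 100 + cell

code_point = ord(CHA)
jis_number = row_cell(CHA.encode("euc_jp"))   # position in the JIS table
gb_number = row_cell(CHA.encode("gb2312"))    # position in the GB table

sjis_bytes = CHA.encode("shift_jis")          # what a shift_jis page holds
gb_bytes = CHA.encode("gb2312")               # what a gb2312 page holds

print(code_point, hex(code_point))
print(jis_number, gb_number)
print(sjis_bytes.hex(), gb_bytes.hex())
# Different bytes, but both decode to the very same character.
print(sjis_bytes.decode("shift_jis") == gb_bytes.decode("gb2312"))
```

So the "pairs" differ per national set, but once each page is decoded with
its declared charset, the character is identical.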

> The Glyph representation from both will look the same and the same
> argument for "zhou da tie cha" in Japanese JIS and Chinese GB where
> the Glyphs look the same but not the pairs.


But Google, smart though it is, can't see the glyph; it can only see
the code point in whatever encoding is there. I've run these characters
through the Unihan database, and they're Chinese characters whose
readings match the Pinyin on the same line of the page.
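
Here is a toy Python sketch of what I mean by "can only see the code point".
It is only an illustration, not a claim about how Google actually works: the
function page_mentions and both page bodies are made up for the example. A
search that decodes each page by its declared charset finds the character on
both; a search that compares raw bytes across encodings does not.

```python
def page_mentions(raw: bytes, charset: str, query: str) -> bool:
    """Hypothetical matcher: decode by declared charset, then compare text."""
    return query in raw.decode(charset)

query = "\u8336"  # the tea character, as an ASCII-safe escape

# Two made-up page bodies with the same text in different encodings.
jp_page = ("zhou cha: " + query).encode("shift_jis")
cn_page = ("zhou cha: " + query).encode("gb2312")

found_on_jp = page_mentions(jp_page, "shift_jis", query)
found_on_cn = page_mentions(cn_page, "gb2312", query)

# Searching the Japanese page for the GB byte pair is a byte-level mismatch.
raw_match = query.encode("gb2312") in jp_page

print(found_on_jp, found_on_cn, raw_match)
```

The match depends on the decoded code point, not on which national byte pair
the page happened to use.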

> Google will find computer strings anywhere which in your case just
> happens to be on web pages with charset indicating JIS. It looks
> like to me you did a post with Linux which comes with default
> international language support.


BSD, actually, but I didn't post anything that wasn't ASCII.

> In Windows you optionally load the Unicode font set called
> CJK for Chinese, Japanese, Korean which is the international
> standard to replace national language sets like JIS and GB.


Right, I use that a lot.

Thanks, Jim, for trying, but I don't see how this explains the phenomenon.

/Lew
---
Lew Perin
http://www.panix.com/~perin/babelcarp.html