Lewis Perin
 
Japanese Chinese tea web sites

"Space Cowboy" writes:

> Lewis Perin wrote:
> > Warning: nerdy details abound here!
> >
> > "Space Cowboy" writes:
> > >
> > > Lewis Perin wrote:
> > >> [...why are there Chinese tea names that appear only in Japanese sites...]
> > >
> > > The charset=shift_jis of the webpage indicates Japanese. All 2
> > > character pairs are used for Japanese font sets. The characters you
> > > see are from the Japanese fonts and not Chinese. That character may
> > > very well exist in the Chinese font set and vice versa but the charset
> > > setting on the HTML page tells where to look. Basically non Roman
> > > languages take two characters for representation and a corresponding
> > > font set. For example the Cha character in Japanese JIS is 3567 and
> > > simplified Chinese GB 1872.

> >
> > Yes, but it's still the same Unicode code point (33590, or 8336 in
> > hex), which is why you get both .cn and .jp web sites if you Google
> > for it.

>
> Only if the Chinese or Japanese websites use Unicode codepoints such
> as 8336. There are plenty of Chinese and Japanese sites that use
> charset=UTF-8.


But UTF-8 *is* Unicode. More pedantically, it's an encoding of
Unicode. The codepoints exist at the abstract level of Unicode; the
encodings, like UTF-8, mediate between that level and what you see in
your browser. See

http://www.unicode.org/standard/principles.html

for an explanation.
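
To make the codepoint/encoding distinction concrete, here's a rough
sketch in Python 3 (my choice of language, nothing the thread depends
on). The variable names are just for illustration.

```python
# The character under discussion: "cha" (tea), written as an escape so the
# snippet does not depend on the encoding of this post.
cha = "\u8336"

# The code point is an abstract number, independent of any byte encoding.
code_point = ord(cha)  # 33590 decimal, 0x8336 hex

# UTF-8 is one way of serializing that code point as bytes.
utf8_bytes = cha.encode("utf-8")

print(f"U+{code_point:04X}", code_point, utf8_bytes.hex())

# Decoding the bytes gets you back to the same code point.
assert utf8_bytes.decode("utf-8") == cha
```

So a page served as charset=UTF-8 contains the three bytes e8 8c b6
for this character, and those bytes *mean* code point 8336.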

> I'm not sure of the particulars but you can also mix language sets
> on a webpage. I use Unicode strings for Google searches. I could
> get additional hits if I used JIS or GB strings but I only track
> Unicode. On TaoBao I have to use GB strings. Ebay China uses
> Unicode.


The JIS, GB, and Big5 character sets are all covered by Unicode:
every character in them has a Unicode code point.
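
Here's another Python 3 sketch, under the assumption that the "3567"
and "1872" quoted earlier are row-cell numbers in JIS X 0208 and
GB 2312 respectively. The helper row_cell is hypothetical, something I
made up for this post, and it leans on the fact that the EUC codecs
store each of the two bytes as 0xA0 plus the row or cell number.

```python
cha = "\u8336"  # the tea character, U+8336


def row_cell(ch: str, euc_codec: str) -> int:
    """Return the 4-digit row-cell number of ch in a 94x94 national set.

    Assumes ch encodes to exactly two bytes in the given EUC-style codec,
    each byte being 0xA0 plus the row or the cell number.
    """
    hi, lo = ch.encode(euc_codec)
    return (hi - 0xA0) * 100 + (lo - 0xA0)


jis = row_cell(cha, "euc_jp")  # JIS X 0208 row-cell: 3567
gb = row_cell(cha, "gb2312")   # GB 2312 row-cell: 1872
print(jis, gb)

# Bytes as they would appear on a Shift_JIS page and on a GB 2312 page.
from_sjis = b"\x92\x83".decode("shift_jis")
from_gb = b"\xb2\xe8".decode("gb2312")

# Different bytes, different national standards, same Unicode code point.
assert from_sjis == from_gb == cha
```

That's the sense in which a search engine that normalizes everything
to Unicode can match the same character on .jp and .cn pages.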

> Babelfish doesn't accept Unicode strings.


Do you mean Babelfish or Babelcarp? If it's the latter, and you want
to try the alpha version that searches on Chinese characters, email me.

/Lew
---
Lew Perin
http://www.panix.com/~perin/babelcarp.html