
> You can't effectively use one character set well for everything; different applications have different requirements.

In our application, our users get data from systems around the world, and might have to change some of it before sending a file with the data to some official system. The data includes names of people and places. How would you do this using character sets?

One file might need to contain names with Cyrillic characters and with Norwegian characters. There's no character set with both. Should each string in the file have an attribute saying which character set the string is encoded in? What are the odds that people implementing that won't mess that up when oh so many can't even get a single encoding attribute right[1]?

Or, just maybe, strings in the file could be Unicode, encoded in say UTF-8, so that the handling of all of them is uniform...

[1]: https://www.w3.org/TR/xml/#charencoding
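
To make the point concrete, here is a minimal Python sketch. The two names are made-up sample data, and `can_encode` is a hypothetical helper, not part of any real system. It checks which of two legacy 8-bit character sets can hold each name, then writes both names in one UTF-8 byte string:

```python
# Made-up sample data: one Cyrillic name, one Norwegian name.
names = ["Пётр Ильич", "Bjørn Åse"]

def can_encode(text, encoding):
    """Hypothetical helper: True if `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# ISO 8859-5 covers Cyrillic but not ø/Å; Latin-1 covers ø/Å but not Cyrillic.
legacy = {
    enc: [can_encode(name, enc) for name in names]
    for enc in ("iso8859_5", "latin_1")
}

# UTF-8 holds both names in one file, with no per-string encoding attribute.
utf8_bytes = "\n".join(names).encode("utf-8")
round_trip = utf8_bytes.decode("utf-8").split("\n")
```

Each legacy set handles exactly one of the two names, while the UTF-8 round trip returns both unchanged.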



> Or, just maybe, strings in the file could be Unicode, encoded in say UTF-8, so that the handling of all of them is uniform...

Actually, that won't work on its own. There are cases where the same character needs to be rendered differently depending on the language, where capitalization differs depending on the language, where sort order depends on the language, etc.
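
A small Python sketch of the capitalization case, using Turkish as the example. `upper_for` is a hypothetical helper written for illustration; Python's built-in `str.upper()` applies only the language-independent default mapping, so the language has to be supplied from somewhere else:

```python
def upper_for(text, lang):
    """Hypothetical helper: uppercase `text`, honoring the Turkish/Azerbaijani
    rule that dotted 'i' uppercases to dotted 'İ' (U+0130)."""
    if lang in ("tr", "az"):
        text = text.replace("i", "\u0130")
    # Default, language-independent mapping for everything else.
    return text.upper()

default_result = upper_for("istanbul", "en")
turkish_result = upper_for("istanbul", "tr")
```

The same input string yields two different correct answers, and nothing in the string itself says which one applies.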


If your application lets users edit the text, or if you know which languages will be used, or if you don't care about capitalization, then you don't have to worry about any of those edge cases, and Unicode is useful.


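Unicode covers all that. It has case mapping and case folding rules to handle capitalization differences, including language-specific ones such as the Turkish dotted and dotless i. It has collation rules, with per-language tailorings, to handle sorting differences. You still need to know the language, but you don't need a separate character set for it.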
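
As a rough illustration in Python: `str.casefold()` implements Unicode default case folding, while the two sort keys below are simplified, hypothetical stand-ins for language-tailored collation (a real application would more likely use a collation library such as ICU). The word list is made up.

```python
import unicodedata

# Unicode default case folding: caseless comparison across "ß" and "SS".
same_word = "Straße".casefold() == "STRASSE".casefold()

def german_like_key(word):
    """Simplified stand-in: ignore diacritics at the primary level, so 'ä' sorts with 'a'."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base, word)

def swedish_like_key(word):
    """Simplified stand-in: 'å', 'ä', 'ö' are separate letters after 'z'."""
    alphabet = "abcdefghijklmnopqrstuvwxyzåäö"
    folded = unicodedata.normalize("NFC", word.casefold())
    return [alphabet.index(c) if c in alphabet else len(alphabet) for c in folded]

words = ["zon", "äpple", "apa"]
german_order = sorted(words, key=german_like_key)
swedish_order = sorted(words, key=swedish_like_key)
```

Both orderings operate on the same Unicode strings; only the collation rules chosen for the language differ.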



