
> it treats filenames like a binary blob

Which can result in files with seemingly identical names when one UTF-8-encoded file name uses combining characters and the other uses precomposed ones. Not to mention that e.g. Å exists in Unicode as both "Latin capital letter A with ring above" (U+00C5) and "Angstrom sign" (U+212B).

After you've solved encoding, next comes Unicode normalization ;-)
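To make that concrete, here is a rough Python sketch using the standard `unicodedata` module. The helper name `same_name` is made up for illustration, and picking NFC as the comparison form is just one assumption; a real filesystem or tool may normalize differently or not at all.

```python
import unicodedata

# Three ways to write what looks like the same one-letter file name.
precomposed = "\u00c5"   # LATIN CAPITAL LETTER A WITH RING ABOVE
combining = "A\u030a"    # LATIN CAPITAL LETTER A + COMBINING RING ABOVE
angstrom = "\u212b"      # ANGSTROM SIGN

def same_name(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# As raw UTF-8 bytes (the "binary blob" view) all three are distinct.
blobs = {s.encode("utf-8") for s in (precomposed, combining, angstrom)}
print(len(blobs))

# After normalization they compare equal.
print(same_name(precomposed, combining), same_name(precomposed, angstrom))
```

A filesystem that compares bytes will happily store all three side by side; only a normalizing comparison treats them as one name.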



It is not a big deal as long as seemingly identical file names are treated by the OS as different.

Unicode normalization is not the only problem here: e.g. Latin 'a', 'e', 'T' look exactly the same as Cyrillic 'а', 'е', 'Т' in most fonts, which makes it possible for two files to have seemingly identical names even in some 8-bit encodings.
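A quick Python sketch of the lookalike case, for what it's worth. The `scripts` helper is hypothetical, and taking the first word of the Unicode character name as the "script" is only a crude heuristic, not a proper script lookup. cp1251 is used as one example of an 8-bit encoding that holds both alphabets.

```python
import unicodedata

latin = "aeT"                     # Latin letters
cyrillic = "\u0430\u0435\u0422"   # Cyrillic а, е, Т

def scripts(name: str) -> set[str]:
    """Hypothetical helper: guess scripts from the first word of each
    character's Unicode name (a rough heuristic, assumed for illustration)."""
    return {unicodedata.name(ch).split()[0] for ch in name if ch.isalpha()}

# Normalization does not help: these are different letters, not different
# encodings of the same letter.
print(unicodedata.normalize("NFKC", latin) == unicodedata.normalize("NFKC", cyrillic))

# The heuristic tells them apart even though most fonts do not.
print(scripts(latin), scripts(cyrillic))

# The same clash exists in an 8-bit encoding such as cp1251.
print(latin.encode("cp1251") == cyrillic.encode("cp1251"))
```

Flagging names that mix scripts is about the most a tool can do here; the two strings really are different names.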


Even Latin 'I' and 'l' and the digit '1' are visually indistinguishable in some fonts! So are trailing spaces and different numbers of spaces. This is such a pervasive problem that maybe we can just give up and expect users to get used to it.



