
Changing the file encoding has done something terrible.

Alice Bevan-McGregor 6 years ago • updated by Alexander Blach (Developer) 6 years ago

Apologies for the somewhat non-specific title.


All of my files should be in UTF-8.  For source code it doesn't really matter (valid symbols are in the 7-bit ASCII range and the primary base language is English), but while editing a gettext .po symbols file to update some of the translations I noticed, too late, that the selected encoding was ISO-8859-1†.  Before I changed it, all of the existing UTF-8 text looked fine, or at least I didn't notice any problems.  After changing the encoding back to UTF-8, saving the file (which looked OK in the editor), and recompiling my .po file, the text encoding in the webapp is completely destroyed.  Closing and re-opening the file, I get text like:


"les données à  une analyse"


The webapp presented this even worse when I attempted to manually set the browser encoding to ISO 8859-1; double conversion is ugly business.  Luckily I version control everything, so the change could be rolled back, but a close look needs to be given to how Textastic handles file encoding.  So far I haven't been able to reproduce the problem, but I will try to create a basic test case for you after work.
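For anyone trying to picture the "double conversion", here is a minimal Python sketch of what I suspect happened. This is an assumption about the failure mode, not a confirmed description of what Textastic does: the UTF-8 bytes get read as ISO 8859-1, and the misread text is then saved back out as UTF-8. The function names are made up for illustration.

```python
# Sketch of the suspected failure mode (an assumption, not confirmed editor
# behaviour): UTF-8 bytes are misread as ISO 8859-1, and the misread text is
# then written back out as UTF-8, which makes the damage permanent.

def misread_as_latin1(text: str) -> str:
    """Return what `text` looks like when its UTF-8 bytes are decoded as ISO 8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo one round of the misreading above.

    Raises UnicodeEncodeError or UnicodeDecodeError if `garbled` was not
    produced by exactly this kind of mistake.
    """
    return garbled.encode("iso-8859-1").decode("utf-8")


original = "les données à une analyse"
garbled = misread_as_latin1(original)

print(repr(garbled))                 # every accented letter becomes two characters
print(repair(garbled) == original)   # one round of damage is reversible
```

Each accented character turns into two characters because its two UTF-8 bytes are each treated as a separate ISO 8859-1 character. Repeating the mistake doubles the damage again, which would be consistent with the webapp looking even worse once the browser was forced to ISO 8859-1.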


† Incorrectly called "ISO Latin 1" in your menus; it's "Latin 1" or "ISO 8859-1"… only Microsoft calls it by a mixed name as ISO standards are all numbered. ;)  More correct would be "Western (ISO 8859-1, Latin 1)", same for all of the other ISO 8859 variants, but that's a separate issue.

Things get weirder, actually, as I try to recover from this problem.


I close the file, revert the commit, and re-open the file.  The encoding is still wrong, and the symbols are still garbled.  I open the file in another editor, the encoding is correctly guessed and the symbols are fine.  See screenshot, Textastic on the right:




And yes, I like my editors to look a certain way.  ¬_¬  ^_^

Can you send me the file from the reverted commit so I can have a look at it and see why it is opened with the wrong encoding?


Did you try to use "Reopen with Encoding" from the File menu to open it as UTF-8?
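To tell a wrong encoding guess apart from a file that is actually damaged on disk, a quick check outside the editor can help. The following is only a rough Python sketch; the helper name is hypothetical and it is not something Textastic provides. It reports whether a sequence of bytes decodes cleanly as UTF-8.

```python
# Rough sketch: check whether raw bytes are well-formed UTF-8.
# `is_valid_utf8` is a hypothetical helper name used for illustration.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "données".encode("utf-8")         # the bytes a UTF-8 file would contain
latin1_bytes = "données".encode("iso-8859-1")  # the same word saved as ISO 8859-1

print(is_valid_utf8(utf8_bytes))
print(is_valid_utf8(latin1_bytes))
```

For a real file, the bytes could be read with `open(path, "rb").read()`. Note that a file which was misread and then re-saved as UTF-8 is still valid UTF-8, so this check only shows whether the bytes on disk are UTF-8 at all, not whether the text in them is intact.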

The encoding names actually come from the operating system.


I'm using the Core Foundation function CFStringGetNameOfEncoding:


https://developer.apple.com/library/mac/documentation/CoreFoundation/Reference/CFStringRef/Reference/reference.html#//apple_ref/doc/uid/20001211-CH201-F11146


TextWrangler, for example, uses the same names.


Even Apple's TextEdit uses those encoding names in its preferences.