I'm going to go ahead and answer my own question with the solution I ended up with. As you'll read above in the question, the last update I added is pretty much the end solution.

When reading data from the CSV, I run the is_non_utf8 check on the string, and if it returns true I apply the following logic:

* If the string contains one of the following bytes, assume MacRoman: 0x8E, 0x8F, 0x9A, 0xA1, 0xA5, 0xA8, 0xD0, 0xD1, 0xD5, 0xE1
* If the string contains one of the following bytes, assume Windows-1252: 0x92, 0x95, 0x96, 0x97, 0xAE, 0xB0, 0xB7, 0xE8, 0xE9, 0xF6

If one of the above is true/assumed, I use iconv to convert the string to UTF-8. If not, I do nothing to the string and continue as usual.

So I use the contains_non_utf8 function to detect when a string has character(s) in it that are not UTF-8 encoded, then a function to detect whether it's MacRoman or Windows-1252. After that I can simply run iconv('MACROMAN', 'UTF-8', $str) or iconv('Windows-1252', 'UTF-8', $str) to get a valid UTF-8 string to go forward with.
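
The detection and conversion steps described above could be sketched in PHP roughly like this. It is only an illustration, not the exact code from the question: the function bodies, the names guess_legacy_encoding and to_utf8, and the two constants are assumptions made for the example. The post does not say which byte list takes priority when a string matches both, so checking MacRoman first is also an assumption. Whether the 'MACROMAN' name is accepted depends on the iconv implementation PHP was built with, so the sketch falls back to the original string if conversion fails.

```php
<?php
// Marker bytes taken from the two lists in the answer.
const MACROMAN_BYTES     = "\x8E\x8F\x9A\xA1\xA5\xA8\xD0\xD1\xD5\xE1";
const WINDOWS_1252_BYTES = "\x92\x95\x96\x97\xAE\xB0\xB7\xE8\xE9\xF6";

// Assumed implementation: true when $str is NOT valid UTF-8.
// preg_match with the /u modifier fails on invalid UTF-8 input.
function contains_non_utf8(string $str): bool
{
    return preg_match('//u', $str) !== 1;
}

// Hypothetical helper: guess the legacy encoding from the marker bytes.
// Returns null when no marker byte is present.
function guess_legacy_encoding(string $str): ?string
{
    // Assumption: MacRoman is checked first; the post gives no priority.
    if (strpbrk($str, MACROMAN_BYTES) !== false) {
        return 'MACROMAN';
    }
    if (strpbrk($str, WINDOWS_1252_BYTES) !== false) {
        return 'Windows-1252';
    }
    return null;
}

// Hypothetical wrapper: convert to UTF-8 only when needed.
function to_utf8(string $str): string
{
    if (!contains_non_utf8($str)) {
        return $str; // already valid UTF-8, leave untouched
    }
    $encoding = guess_legacy_encoding($str);
    if ($encoding === null) {
        return $str; // no marker byte found, continue as usual
    }
    // @ suppresses the notice iconv raises on failure; we check the result.
    $converted = @iconv($encoding, 'UTF-8', $str);
    return $converted === false ? $str : $converted;
}
```

With this sketch, each CSV field would be passed through to_utf8() as it is read. A Windows-1252 string such as "caf\xE9" comes back as the UTF-8 bytes for "café", while a string that is already valid UTF-8 is returned unchanged.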