I am exporting text files from two queries in MS Access 2010. The queries draw from different linked ODBC tables (the tables differ only in data; their structure and data types are the same). I set up an export specification to export both text files in UTF-8 encoding. Now here comes the troublesome part: when I export the queries and open the results in Notepad, one file is in UTF-8 and the other is in ANSI. I don't know how this is possible when both queries use the same export specification, and it is driving me crazy.
This is my VBA code to export queries:
DoCmd.TransferText acExportDelim, "miniflow", "qry01_CZ_test", "C:\TEST_CZ.txt", False
DoCmd.TransferText acExportDelim, "miniflow", "qry01_SK_test", "C:\TEST_SK.txt", False
I also tried modifying it by passing 65001 as the CodePage argument, but the results were the same.
Do you have any idea what could be wrong?
Don't rely on the File Open dialog in Notepad to tell you whether a text file is encoded as "ANSI" or UTF-8. That is just Notepad's "guess" based on whether the file begins with the bytes EF BB BF, which is the UTF-8 Byte Order Mark (BOM).
Many (most?) Windows applications will include the UTF-8 BOM at the beginning of a text file that is UTF-8 encoded. Some Unicode purists insist, often quite vigorously, that the BOM is not required for UTF-8 files and should be excluded, but that is the way Windows applications tend to behave.
Unfortunately, Access does not always follow that pattern when it exports files to text. A UTF-8 text file exported from Access may omit the BOM and that can confuse applications like Notepad if they assume that a UTF-8 encoded file will always include the BOM as the first three bytes of the file.
For a more reliable way of determining the encoding of a text file, consider opening it in an application like Notepad++. It will differentiate between UTF-8 files with a BOM (which it designates as "UTF-8") and UTF-8 files without a BOM (which it designates as "ANSI as UTF-8").
To illustrate, consider the following Access table
When exported to text (CSV) with UTF-8 encoding, the File Open dialog in Notepad reports that the file is encoded as "ANSI", but a hex editor shows that it is in fact encoded as UTF-8 (the character é is encoded as C3 A9, not simply E9 as would be the case for true "ANSI" encoding), and Notepad++ recognizes it as "ANSI as UTF-8": in other words, a UTF-8 encoded file without a BOM.
I am reading in data from a .xlsx file, apparently encoded in ANSI(?). LabVIEW can take the data just fine, and when I create a text file based on the data, it looks fine when opened/viewed with ANSI encoding (in Notepad++ or plain Notepad). The problem is that Notepad++ defaults to UTF-8, so not many people know to change the encoding to "ANSI", and the ° symbol does not translate well.
I use the Report Generation Toolkit Excel Get Data VI to get the data from excel and return it as a 2D string array in LabVIEW.
I am making the assumption that it's encoded in ANSI because when I open the text file (the .xml file that I insert the Excel data into) in Notepad++, I get two characters where my degree symbol ° should be, and when I change the encoding from UTF-8 to ANSI the data appears as I read it. Also, when I open the .xml file in Notepad, the degree symbol shows normally.
I am getting the following error:
'ascii' codec can't decode byte 0xf4 in position 560: ordinal not in range(128)
I find this very weird given that my .csv file doesn't have special characters. Perhaps it has special characters that mark header rows or something similar; I don't know.
But the main problem is that I don't actually have access to the source code that reads in the file, so I cannot simply add the keyword argument encoding='UTF-8'. I need to figure out which encoding is compatible with codecs.ascii_decode(...). I DO have access to the .csv file that I'm trying to read, and I can adjust the encoding to that, but not the source file that reads it.
I have already tried exporting my .csv file into Western (ASCII) and Unicode (UTF-8) formats, but neither of those worked.
Fixed. It had nothing to do with Unicode shenanigans; my script was writing a Parquet file when my CloudFormation template was expecting a CSV file. Thanks for the help.
I have an issue when trying to read a string from a .CSV file. When I execute the application and the text is shown in a textbox, certain characters such as "é" or "ó" are shown as a question mark symbol.
The idea is that this code reads the whole CSV file and then splits each line into variables depending on the first word of the line.
The code I'm using to read is:
Dim test() As String
test = IO.File.ReadAllLines("Libro1.csv")
Dim test_chart As String = Array.Find(test, Function(x) x.StartsWith("sample"))
Dim test_chart_div() As String = test_chart.Split(";"c)
variable1 = test_chart_div(1)
variable2 = test_chart_div(2)
...etc
I have also tried with:
Dim test() As String
test = IO.File.ReadAllLines("Libro1.csv", System.Text.Encoding.UTF8)
But neither of them works. The .csv file is supposed to be UTF-8: the "Web Options" that you can see when saving the file in Excel show UTF-8 encoding, and I also tried the trick of changing the file extension to .html and opening it with the browser to confirm that the encoding is correct.
Can someone advise anything else I can try?
Thanks in advance.
When an Excel file is exported using the "CSV (Comma Separated)" output format, the encoding selected in Tools -> Web Options -> Encoding in Excel's Save As... dialog doesn't actually produce the expected result:
the text file is saved using the encoding corresponding to the language currently selected in the Excel application, not the Unicode (UTF-16 LE) or UTF-8 encoding selected (which is ignored), nor the default encoding determined by the current system language.
To import the CSV file, you can use the Encoding.GetEncoding() method to specify the Name or CodePage of the Encoding used in the machine that generated the file: again, not the Encoding related to System Language, but the Encoding of the Language that the Excel Application is currently using.
CodePage 1252 (Windows-1252) and ISO-8859-1 are commonly used in Latin1 zone.
Based on the symbols you're referring to, this is most probably the original encoding used.
In Windows, use the former. ISO-8859-1 is still used, mostly in old Web Pages (or Web Pages created without care for the Encoding used).
As a note, CodePage 1252 and ISO-8859-1 are not exactly the same encoding; there are subtle differences.
If you find documentation that states the opposite, the documentation is wrong.
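The difference is easy to demonstrate: bytes in the 0x80-0x9F range are printable characters in Windows-1252 but C1 control characters in ISO-8859-1. A quick sketch in Python:

```python
# 0x92 is the "smart" right single quotation mark in Windows-1252,
# but an invisible C1 control character in ISO-8859-1 (Latin-1).
smart_quote = b"\x92"
print(smart_quote.decode("cp1252"))         # U+2019, a curly apostrophe
print(repr(smart_quote.decode("latin-1")))  # '\x92', a control character
```

This is exactly the range where "the documentation that states the opposite" goes wrong: the two encodings agree on 0xA0-0xFF but not on 0x80-0x9F.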
I am using a StreamReader to read a text file and then saving the content of the file into another text file using a StreamWriter. The file I am trying to read can be in either ANSI or UTF-8 format.
I am facing a problem with non-English characters. When the input file contains Chinese or Japanese, everything works fine.
But if the input file contains characters like ã, then the output text file shows a question-mark-type symbol.
I tried to fix this by using the iso-8859-1 encoding for the StreamReader, but now Chinese and Japanese come out like this: ¢ãƒãƒ¼ãƒ»ãƒ¦ãƒ¼ã
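One common way to handle input that may be either UTF-8 or ANSI is to attempt a strict UTF-8 decode first and fall back to the ANSI code page only if that fails; valid UTF-8 multi-byte sequences almost never occur by accident in single-byte ANSI text. A sketch of the idea in Python (the cp1252 fallback is an assumption about which "ANSI" code page the files use; the same try-then-fall-back approach works with StreamReader by reading the raw bytes first):

```python
def read_text_flexible(raw: bytes) -> str:
    """Decode as strict UTF-8 if possible, otherwise fall back to cp1252."""
    try:
        return raw.decode("utf-8")   # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return raw.decode("cp1252")  # assumed "ANSI" code page
```

With this, UTF-8 files containing Chinese or Japanese decode correctly, and an ANSI file containing ã no longer turns into a question mark.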
I have a CSV file whose copy contains apostrophes, and when I import it into the database using MAMP, it turns all the apostrophes into question marks. Is there a fix for this?
Thank you in advance!
This is the result of saving your files in a non-UTF-8 format such as ANSI. When you open a file with a text editor that doesn't support UTF-8 and save it again, this will happen.
Download a text editor such as Notepad++ and check and convert all your (plugin or theme) files to UTF-8 or UTF-8 without BOM.
http://wordpress.org/support/topic/how-fix-black-diamond-question-marks-in-wp-321
I had the same problem even when saving my CSV file as UTF-8. So in Microsoft Word, I edited the text and replaced the actual apostrophes and quotation marks with HTML codes (&#39; and &quot;) and then copied and pasted it into my CSV file. This seemed to work for me. I used this website for the HTML codes: http://www.ascii.cl/htmlcodes.htm
For special characters that aren't importing properly, you could try a T-SQL BULK INSERT and include either:
CODEPAGE = 'ACP' --For ANSI files
CODEPAGE = '65001' --For UTF-8 files