LABVIEW ° symbol (potentially encoded in ANSI) to UTF-8 ° symbol - labview

I am reading data from a .xlsx file that is apparently encoded in ANSI(?). LabVIEW reads the data just fine, and a text file created from it looks fine when viewed with ANSI encoding (in Notepad++ or plain Notepad). The problem is that Notepad++ defaults to UTF-8, few people know to change the encoding to "ANSI", and the ° symbol does not display correctly.
I use the Report Generation Toolkit Excel Get Data VI to get the data from Excel and return it as a 2D string array in LabVIEW.
I am assuming it's encoded in ANSI because when I open the text file (the .xml that I insert the Excel data into) in Notepad++, I get 2 characters where my degree symbol ° should be, and when I change the encoding from UTF-8 to ANSI the data appears as I read it. Also, when I open the .xml file in Notepad, the degree symbol shows normally.
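The byte-level difference behind this symptom can be sketched outside LabVIEW. The following is a minimal Python illustration, not LabVIEW code; it assumes that "ANSI" here means Windows-1252 (code page 1252), and the helper name ansi_to_utf8 is hypothetical.

```python
# Sketch: re-encode "ANSI" bytes as UTF-8.
# Assumption: "ANSI" means Windows-1252; other Windows code pages differ.

def ansi_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed ANSI code page and re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


ansi_bytes = "25 °C".encode("cp1252")
utf8_bytes = ansi_to_utf8(ansi_bytes)

print(ansi_bytes.hex(" "))  # 32 35 20 b0 43    (° is the single byte B0)
print(utf8_bytes.hex(" "))  # 32 35 20 c2 b0 43 (° is the two bytes C2 B0)
```

In other words, the degree symbol is one byte in Windows-1252 and two bytes in UTF-8, so a viewer that guesses the wrong encoding cannot show it correctly.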

Related

Fix Unicode Decode Error Without Specifying Encoding='UTF-8'

I am getting the following error:
'ascii' codec can't decode byte 0xf4 in position 560: ordinal not in range(128)
I find this very weird given that my .csv file doesn't have special characters. Perhaps it has special characters that specify header rows and what not, idk.
But the main problem is that I don't actually have access to the source code that reads in the file, so I cannot simply add the keyword argument encoding='UTF-8'. I need to figure out which encoding is compatible with codecs.ascii_decode(...). I DO have access to the .csv file that I'm trying to read, and I can adjust the encoding to that, but not the source file that reads it.
I have already tried exporting my .csv file into Western (ASCII) and Unicode (UTF-8) formats, but neither of those worked.
Fixed. Had nothing to do with unicode shenanigans, my script was writing a parquet file when my Cloud Formation Template was expecting a csv file. Thanks for the help.
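For anyone diagnosing a similar 'ascii' codec error, a quick look at the raw bytes can show whether the file is text at all. This is a minimal sketch; the function names and the sample data are hypothetical, and the only format-specific fact used is that Parquet files begin with the 4-byte magic number PAR1.

```python
def non_ascii_offsets(data: bytes, limit: int = 10) -> list[tuple[int, int]]:
    """Return up to `limit` (offset, byte value) pairs for bytes above 0x7F."""
    hits = []
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            hits.append((offset, byte))
            if len(hits) >= limit:
                break
    return hits


def looks_like_parquet(data: bytes) -> bool:
    """Parquet files start with the 4-byte magic number b'PAR1'."""
    return data[:4] == b"PAR1"


# Made-up sample standing in for the first bytes of the real file.
sample = b"id,name\n1,caf\xf4\n"
for offset, byte in non_ascii_offsets(sample):
    print(f"offset {offset}: 0x{byte:02x}")
print("parquet?", looks_like_parquet(sample))
```

With a real file, the bytes would come from something like open(path, "rb").read(), where path is whatever file is being handed to the reader.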

Characters not displayed correctly when reading CSV file

I have an issue when trying to read a string from a .CSV file. When I execute the application and the text is shown in a textbox, certain characters such as "é" or "ó" are shown as a question mark symbol.
The idea is that this code reads the whole CSV file and then splits each line into variables depending on the first word of the line.
The code I'm using to read is:
Dim test() As String
test = IO.File.ReadAllLines("Libro1.csv")
Dim test_chart As String = Array.Find(test, Function(x) (x.StartsWith("sample")))
Dim test_chart_div() As String = test_chart.Split(";")
variable1 = test_chart_div(1)
variable2 = test_chart_div(2)
...etc
I have also tried with:
Dim test() As String
test = IO.File.ReadAllLines("Libro1.csv", System.Text.Encoding.UTF8)
But neither of them works. The .csv file is supposed to be UTF-8. The "web options" that you can see when saving the file in Excel show the encoding as UTF-8. I also tried the trick of changing the file extension to HTML and opening it with the browser to see that the encoding is also correct.
Can someone advise anything else I can try?
Thanks in advance.
When an Excel file is exported using the CSV (Comma Separated) output format, the Encoding selected in Tools -> Web Options -> Encoding of Excel's Save As... dialog doesn't actually generate the expected result:
the Text file is saved using the Encoding relative to the current Language selected in the Excel Application, not the Unicode (UTF16-LE) or UTF-8 Encoding selected (which is ignored) nor the default Encoding determined by the current System Language.
To import the CSV file, you can use the Encoding.GetEncoding() method to specify the Name or CodePage of the Encoding used in the machine that generated the file: again, not the Encoding related to System Language, but the Encoding of the Language that the Excel Application is currently using.
CodePage 1252 (Windows-1252) and ISO-8859-1 are commonly used in Latin1 zone.
Based on the symbols you're referring to, this is most probably the original encoding used.
In Windows, use the former. ISO-8859-1 is still used, mostly in old Web Pages (or Web Pages created without care for the Encoding used).
As a note, CodePage 1252 and ISO-8859-1 are not exactly the same Encoding, there are subtle differences.
If you find documentation that states the opposite, the documentation is wrong.
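Both points can be checked at the byte level. The sketch below uses Python rather than VB.NET purely for illustration; the sample strings are made up, and cp1252 and latin-1 are Python's codec names for Windows-1252 and ISO-8859-1.

```python
# "é ó" as written by an application using Windows-1252: bytes E9 20 F3
raw = "é ó".encode("cp1252")

# Decoding with the right code page recovers the text.
print(raw.decode("cp1252"))

# Decoding as UTF-8 fails on E9 and F3; with errors="replace" each becomes
# U+FFFD, which is typically rendered as a question-mark symbol.
print(raw.decode("utf-8", errors="replace"))

# One of the differences between the two Latin-1 style encodings:
# byte 0x80 is the euro sign in Windows-1252 but a C1 control in ISO-8859-1.
euro = b"\x80"
print(euro.decode("cp1252"))         # €
print(repr(euro.decode("latin-1")))  # '\x80'
```

The two encodings agree on é and ó (E9 and F3), which is why either one appears to work until a character from the 0x80 to 0x9F range shows up.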

text file encoding detect has issue

I am using StreamReader to read a text file and then saving the content of the file into another text file using StreamWriter. The file I am trying to read can be in ANSI or UTF-8 format.
I am facing a problem with non-English characters. When the input file contains Chinese or Japanese text, everything works fine.
But if the input file contains characters like ã, the output text file shows a question-mark symbol.
I tried to fix this by using encoding iso-8859-1 for streamreader but now Chinese and Japanese language coming like this ¢ãƒ­ãƒ¼ãƒ»ãƒ¦ãƒ¼ã
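One common heuristic for input that may be either UTF-8 or ANSI is to try strict UTF-8 first and fall back to the ANSI code page only when that fails. The following is a sketch in Python rather than .NET; the function name decode_text is hypothetical, and the fallback of Windows-1252 is an assumption about what "ANSI" means on the machine that produced the file.

```python
def decode_text(raw: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Return (text, encoding used). Tries strict UTF-8 before the fallback."""
    try:
        # "utf-8-sig" also strips a leading UTF-8 BOM if one is present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat it as the assumed ANSI code page.
        return raw.decode(fallback, errors="replace"), fallback


japanese = "日本語".encode("utf-8")
ansi = "São Paulo".encode("cp1252")

print(decode_text(japanese))
print(decode_text(ansi))

# Whatever the input was, the output can then be written as UTF-8.
output_bytes = decode_text(ansi)[0].encode("utf-8")
```

This works because ANSI text containing accented letters is almost never valid UTF-8, while forcing ISO-8859-1 onto UTF-8 input never fails and so produces the garbled output shown above. It is still a heuristic: short ANSI inputs can occasionally happen to be valid UTF-8.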

SINGLE_NCLOB requires a UNICODE (widechar) input file

I have followed this process:
Open notepad and enter some text: "Hello World"
Save the ansi file as: c:\HelloWorld.txt
I then run the following query:
select * from openrowset(bulk 'C:\HelloWorld.txt',single_clob) as test
The text appears in a column called: BulkColumn.
I then do this:
Open notepad and enter some text: "Hello World"
Save the unicode file as: c:\HelloWorld.txt
I then run the following query:
select * from openrowset(bulk N'C:\HelloWorld.txt',single_nclob) as test
The error I get is:
SINGLE_NCLOB requires a UNICODE (widechar) input file. The file specified is not Unicode.
Why is this?
You need to double-check how you saved the "Unicode" file. In Windows / .NET / SQL Server, the term "Unicode" refers specifically to "UTF-16 Little Endian (LE)". When dealing with UTF-16 Big Endian (BE), it will be referred to as "Unicode Big Endian" or "Big Endian Unicode". UTF-8 is always UTF-8.
I created a file in Notepad and went to "Save As" and selected "Unicode" from the "Encoding" drop-down and it worked just fine with the statement you are using:
SELECT *
FROM OPENROWSET(BULK N'C:\temp\OPENROWSET_BULK_NCLOB-test.txt', SINGLE_NCLOB) AS [Test];
If I re-saved it with any other encoding, I got the error message you are seeing.
I also used Notepad++ and in the "Encoding" menu selected "Encode in UCS-2 Little Endian". UCS-2 and UTF-16 are identical for Code Points U+0000 through U+FFFF and there is no UTF-16 option in Notepad++ so this was the closest thing. And yep, it also worked.
So somehow you did not actually save the file as "Unicode". If you selected "Unicode big endian" in Notepad, that is not "Unicode" in terms of how Windows is using that term, even if it is a valid Unicode encoding.
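To check what was actually written to disk, the first bytes of the file can be compared against the known byte order marks. This is a small Python sketch; the function name sniff_bom is hypothetical, it deliberately ignores UTF-32, and it says nothing about how SQL Server itself validates the file.

```python
import codecs


def sniff_bom(raw: bytes) -> str:
    """Name the encoding suggested by a leading BOM, or 'unknown' if none."""
    if raw.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE):  # FF FE ("Unicode" in Notepad)
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):  # FE FF ("Unicode big endian")
        return "utf-16-be"
    return "unknown"


# Bytes equivalent to saving "Hello World" as "Unicode" in Notepad.
unicode_file = codecs.BOM_UTF16_LE + "Hello World".encode("utf-16-le")
# Bytes equivalent to saving the same text as ANSI.
ansi_file = "Hello World".encode("cp1252")

print(sniff_bom(unicode_file))  # utf-16-le
print(sniff_bom(ansi_file))     # unknown
```

For a file on disk, the input would be the result of reading it in binary mode, for example open(path, "rb").read(16), with path standing in for the file being checked.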

exporting text file with utf-8 encoding in ms access

I am exporting text files from 2 queries in MS Access 2010. The queries are from different linked ODBC tables (the tables differ only in their data; structure and data types are the same). I set up an export specification to export the text file in UTF-8 encoding for both files. Now here comes the troublesome part. When I export the queries and open them in Notepad, one query is in UTF-8 and the second one is in ANSI. I don't know how this is possible when both queries have the same export specification, and it is driving me crazy.
This is my VBA code to export queries:
DoCmd.TransferText acExportDelim, "miniflow", "qry01_CZ_test", "C:\TEST_CZ.txt", False
DoCmd.TransferText acExportDelim, "miniflow", "qry01_SK_test", "C:\TEST_SK.txt", False
I also tried to modify it by adding 65001 as the code page argument, but the results were the same.
Do you have any idea what could be wrong?
Don't rely on the File Open dialog in Notepad to tell you whether a text file is encoded as "ANSI" or UTF-8. That is just Notepad's "guess" based on whether the file begins with the bytes EF BB BF, which is the UTF-8 Byte Order Mark (BOM).
Many (most?) Windows applications will include the UTF-8 BOM at the beginning of a text file that is UTF-8 encoded. Some Unicode purists insist, often quite vigorously, that the BOM is not required for UTF-8 files and should be excluded, but that is the way Windows applications tend to behave.
Unfortunately, Access does not always follow that pattern when it exports files to text. A UTF-8 text file exported from Access may omit the BOM and that can confuse applications like Notepad if they assume that a UTF-8 encoded file will always include the BOM as the first three bytes of the file.
For a more reliable way of determining the encoding of a text file, consider using an application like Notepad++ to open the file. It will differentiate between UTF-8 files with a BOM (which it designates as "UTF-8") and UTF-8 files without a BOM (which it designates as "ANSI as UTF-8").
To illustrate, consider an Access table exported to text (CSV) with UTF-8 encoding:
the File Open dialog in Notepad reports that it is encoded as "ANSI",
but a hex editor shows that it is in fact encoded as UTF-8 (the character é is encoded as C3 A9, not simply E9 as would be the case for true "ANSI" encoding),
and Notepad++ recognizes it as "ANSI as UTF-8",
in other words, a UTF-8 encoded file without a BOM.
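The same check can be done without a hex editor. The sketch below, in Python, classifies raw bytes by looking for the BOM first and then testing whether the content is valid UTF-8; the function name classify_utf8 and its return labels are hypothetical.

```python
import codecs


def classify_utf8(raw: bytes) -> str:
    """Classify bytes by BOM presence and UTF-8 validity."""
    if raw.startswith(codecs.BOM_UTF8):  # EF BB BF
        return "UTF-8 with BOM"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF-8"
    if raw.isascii():
        # Pure ASCII is byte-identical in UTF-8 and in the ANSI code pages.
        return "ASCII only"
    return "UTF-8 without BOM"


print(classify_utf8(b"caf\xc3\xa9"))              # é as C3 A9
print(classify_utf8(b"caf\xe9"))                  # é as E9
print(classify_utf8(b"\xef\xbb\xbfcaf\xc3\xa9"))  # BOM, then é as C3 A9
```

A file that lands in the "ASCII only" case gives no evidence either way, which is one reason two exports with the same specification can be reported differently by an editor.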