The catch is that the xlsx files contain some Korean text, which turns into "??" when converted to CSV.
First, translate the contents of the xlsx file from Korean to English, as shown at the link below:
https://www.microsoft.com/en-us/translator/excel.aspx
Then convert the xlsx file to CSV.
You might consider simply coding a loop that saves each workbook in the Unicode Text (*.txt) file format and then changes the file extension to .csv.
Excel writes Unicode Text as UTF-16, which is useful if your data contains Asian characters such as Korean.
Note: UTF-16 is not ASCII-compatible and needs Unicode-aware programs to display it, so be careful when exporting for use outside Excel.
The available options are discussed here.
To continue quoting from there:
How to convert an Excel file to CSV UTF-16
Exporting an Excel file as CSV UTF-16 is much quicker and easier than converting to UTF-8. This is because Excel automatically employs the UTF-16 format when saving a file as Unicode (.txt).
So, what you do is simply click File > Save As in Excel, select the Unicode Text (*.txt) file format, and then change the file extension to .csv in Windows Explorer. Done!
If you need a comma-separated or semicolon-separated CSV file, replace all tabs with commas or semicolons, respectively, in a Notepad or any other text editor of your choosing (see Step 6 above for full details).
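The tab-to-comma step in the quoted instructions can also be scripted. The following is a minimal Python sketch, assuming the workbook has already been saved as Unicode Text (which Excel writes as tab-separated UTF-16 with a byte-order mark); the function name is hypothetical. It uses the csv module rather than a plain find-and-replace so that cells which themselves contain commas get quoted instead of splitting into extra columns, and it writes UTF-8 with a BOM (`utf-8-sig`) so that Excel can detect the encoding when the CSV is opened again.

```python
import csv
from pathlib import Path


def unicode_text_to_csv(src: Path, dst: Path, delimiter: str = ",") -> None:
    """Convert an Excel 'Unicode Text' export (tab-separated UTF-16) to CSV.

    The output is UTF-8 with a BOM, so characters such as Korean are kept
    instead of being replaced with '?'.
    """
    # The 'utf-16' codec reads the byte-order mark Excel writes at the start.
    with src.open("r", encoding="utf-16", newline="") as fin, \
         dst.open("w", encoding="utf-8-sig", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")
        writer = csv.writer(fout, delimiter=delimiter)
        for row in reader:
            # csv.writer quotes any field containing the delimiter or a quote.
            writer.writerow(row)
```

For example, `unicode_text_to_csv(Path("Book1.txt"), Path("Book1.csv"))` (placeholder file names) produces a comma-separated file; pass `delimiter=";"` for a semicolon-separated one.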
Every text file has a character encoding for a character set. You have to pick one.
If you pick one that doesn't support all the characters in the file, what would you like to happen? Replacing each unsupported character with ? is a commonly used option, and that is where the "??" comes from.
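A short Python illustration of that choice: encoding Korean text with a character set that cannot represent it, while asking for replacement on error, yields exactly the "??" described in the question. The code page used here (cp1252, Western European) is an assumption standing in for whatever non-Unicode encoding the export actually used.

```python
def encode_with_fallback(text: str, encoding: str) -> bytes:
    """Encode text, replacing each unrepresentable character with '?'."""
    return text.encode(encoding, errors="replace")


korean = "한국"  # two Hangul syllables ("Korea")

# cp1252 has no Hangul, so each syllable becomes a single '?'.
lossy = encode_with_fallback(korean, "cp1252")

# UTF-8 can encode every Unicode character, so the text survives a round trip.
lossless = encode_with_fallback(korean, "utf-8")
restored = lossless.decode("utf-8")
```

Here `lossy` is `b"??"`, while `restored` equals the original string.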
UTF-8 is a good choice for an Excel workbook (and for almost any document) because it can encode the entire Unicode character set (which is also what VBA uses, BTW).
In any case, for a text file you have to communicate which encoding you used. For a CSV file you also have to communicate whether there is a header row, what the field separator, text qualifier (quoting character), and text-qualifier escape are, which characters separate lines, and what the column types are. (These are all questions Excel's Text Import Wizard asks; your users need the answers.)
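One way to make those answers explicit, sketched in Python with the standard csv module: every choice from the list becomes a named setting, so whoever reads the file knows what to pass back. The particular values (UTF-8, comma, double-quote qualifier escaped by doubling, CRLF, a header row) and the column names are illustrative assumptions, not requirements.

```python
import csv
import io

# Each format decision from the list above, stated explicitly (illustrative values).
ENCODING = "utf-8"
DIALECT = {
    "delimiter": ",",          # field separator
    "quotechar": '"',          # text qualifier
    "doublequote": True,       # a qualifier inside a field is escaped by doubling it
    "lineterminator": "\r\n",  # line separator used when writing
}
HEADER = ["id", "name", "amount"]  # hypothetical columns; the file has a header row
COLUMN_TYPES = {"id": str, "name": str, "amount": float}  # id deliberately stays text


def dump(rows: list[dict]) -> bytes:
    """Write rows as CSV bytes using the declared format."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=HEADER, **DIALECT)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode(ENCODING)


def load(data: bytes) -> list[dict]:
    """Read CSV bytes back, applying the declared column types."""
    text = data.decode(ENCODING)
    reader = csv.DictReader(io.StringIO(text, newline=""), **DIALECT)
    return [
        {name: COLUMN_TYPES[name](value) for name, value in record.items()}
        for record in reader
    ]
```

Because the csv module never converts values on its own, an `id` such as "00123" comes back as a string with its leading zeros; only the columns listed as `float` are converted.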
We export information out of our CRM (via a tab-delimited text file) and import the data into a PDF form. This process used to work, but recently any French characters in the text file no longer import correctly into the PDF; they turn into a jumbled mess of characters.
Adobe has said it could be a bug, but there has been no resolution yet.
I suspect it might be an issue of saving the files with the correct format/Unicode encoding. I can supply example files.
Step 1: We export the information as a .csv file (French characters intact).
Step 2: We save the .csv file as Tab Delimited Text (.txt) (French characters intact).
Step 3: In the PDF, we import data by selecting the .txt file (French characters get jumbled).
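A jumbled mess of accented characters is typical of text written in one encoding and read back in another. The following Python sketch shows the effect (UTF-8 bytes decoded as Windows-1252) and a helper that re-encodes a tab-delimited file. Which encoding the PDF form import actually expects is not stated here, so any target encoding you pass is an assumption to test, not a known fix; the function names are hypothetical.

```python
from pathlib import Path


def show_mojibake(text: str) -> str:
    """Return what UTF-8 text looks like when mis-read as Windows-1252."""
    # Bytes that are undefined in cp1252 are shown as U+FFFD instead of raising.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def reencode(src: Path, dst: Path, src_encoding: str, dst_encoding: str) -> None:
    """Rewrite a text file in a different encoding without changing its content."""
    # newline="" keeps the original line endings (and tabs) untouched.
    with src.open("r", encoding=src_encoding, newline="") as fin:
        content = fin.read()
    with dst.open("w", encoding=dst_encoding, newline="") as fout:
        fout.write(content)
```

For instance, `show_mojibake("Français")` returns `"FranÃ§ais"`. To try a different encoding for the import, something like `reencode(Path("export.txt"), Path("export_utf16.txt"), "cp1252", "utf-16")` (placeholder names and encodings) produces a copy to test.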
I use Kantu Selenium IDE to extract data from web pages and save it as CSV.
Kantu Selenium IDE saves all the extracted data in a CSV file with all lines merged, but I need a classic CSV format with every line separate.
I need a .bat file for Windows 7 that converts these merged-line CSV files into classic CSV files with separate lines.
Screenshot: CSV saved by Kantu Selenium IDE
Screenshot: CSV in classic format with separated lines
I suspect your lines are not being merged at all; they are simply being viewed in a program that doesn't recognize their line endings, notepad.exe.
The problem is most likely that your program writes the CSV with Unix (LF) or classic Mac (CR) line endings instead of Windows CRLF.
If your file uses Unix LF endings (the most likely case), you could fix that with:
Find /V ""<"test.csv">"converted.csv"
Then open converted.csv in Windows Notepad to verify the conversion.
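If a script is acceptable instead of a batch file, the same conversion can be sketched in Python. It works on raw bytes, normalizing LF (Unix) and lone CR (classic Mac) endings to CRLF, which assumes an ASCII-compatible encoding such as UTF-8 or a Windows code page (it would corrupt a UTF-16 file). The function names are hypothetical.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Normalize CRLF, lone CR and lone LF line endings to Windows CRLF."""
    # Collapse existing CRLF first so it is not doubled, then lone CR,
    # then expand every LF to CRLF.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def convert_file(src: Path, dst: Path) -> None:
    """Write a CRLF-terminated copy of src to dst."""
    dst.write_bytes(to_crlf(src.read_bytes()))
```

Calling `convert_file(Path("test.csv"), Path("converted.csv"))` mirrors the `Find` command above. Unlike that command, it also handles files with CR-only endings.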
When I open a CSV file containing ID numbers, Excel reads the IDs, which are strings, as numbers. The same happens when I read IDs of the same form into an Excel VBA array.
In the Locals window, the array elements are shown with type String, but the values are still in number format.
I have tried changing the cell style to Text, as well as using CStr() on individual array elements. Is there a way to make Excel read the IDs as strings instead of numbers?
You need to bypass the automatic conversion when you open the .csv file.
Use the Import Wizard to open the file and tell the wizard that the ID column's data format is Text.
To convert values that have already been mangled back to IDs, this might suit (with the value in A1):
=SUBSTITUTE(LEFT(A1,3),".","")&"E"&TEXT(RIGHT(A1,3)-1,"0000")
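To see what the formula does, here is the same logic as a Python sketch. It assumes, as the formula does, that the cell holds the scientific-notation text Excel shows for the mangled value (e.g. 1.2E+06) with exactly one digit after the decimal point. The ID shape used here (digits, E, a four-digit number) is inferred from the formula, not stated in the question: an ID such as 12E0005 is read by Excel as 12 × 10^5 = 1.2 × 10^6, and the formula reverses that by dropping the decimal point and lowering the exponent by one.

```python
def rebuild_id(displayed: str) -> str:
    """Mirror =SUBSTITUTE(LEFT(A1,3),".","")&"E"&TEXT(RIGHT(A1,3)-1,"0000")."""
    mantissa = displayed[:3].replace(".", "")  # LEFT(A1,3) without the dot: "1.2" -> "12"
    exponent = int(displayed[-3:]) - 1         # RIGHT(A1,3)-1: "+06" -> 5
    return f"{mantissa}E{exponent:04d}"        # TEXT(..., "0000") zero-pads to 4 digits


# Hypothetical ID in the shape the formula expects.
original = "12E0005"
# How the number looks in scientific format once Excel has parsed it: "1.2E+06".
displayed = f"{float(original):.1E}"
```

Running `rebuild_id(displayed)` gives back `"12E0005"`. Mantissas with more decimal digits would need a different `LEFT` length.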
I am developing an expert system using CLIPS. For the case at hand, I need to read data from an Excel file. How do I do that, or what solutions do you propose?
You can use the open command to open a file for reading in either text or binary mode. If you opened an xlsx file in binary mode, you could use the get-char function to retrieve individual characters from the file. However, there's no built-in functionality for parsing an xlsx file (which is a zip archive of XML files), so you'd have to add code to do the parsing and create appropriate CLIPS values from the data.
If possible, it would be easier to save your Excel file as tab-delimited text. If each cell is a valid CLIPS token, you can use the read function to retrieve the cell values. If a cell is not a valid CLIPS token (for example, a string that contains spaces but lacks quotation marks at the beginning and end), you need to use the readline function to grab an entire row of data and then use the string functions to locate the tabs and split the string into valid tokens.
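If you can add a preprocessing step outside CLIPS, one alternative to splitting rows with readline and the string functions is to rewrite the tab-delimited export so that every cell is already a valid CLIPS token, after which read can consume the cells directly. The Python sketch below assumes that numeric cells should stay numbers and that everything else should become a CLIPS string (double-quoted, with backslash escapes for embedded quotes and backslashes); the number pattern is a simplification, and the one-row-per-line output layout and function names are hypothetical.

```python
import csv
import io
import re

# Simplified pattern for CLIPS integers and floats (e.g. 42, -7, 3.5, .5, 1e3).
NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def to_clips_token(cell: str) -> str:
    """Keep numeric cells as-is; turn anything else into a quoted CLIPS string."""
    if NUMBER.fullmatch(cell):
        return cell
    escaped = cell.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def tsv_to_clips_lines(tsv_text: str) -> list[str]:
    """Convert tab-delimited text into one line of CLIPS tokens per row."""
    # csv.reader also copes with cells that Excel wrapped in quotes on export.
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    return [" ".join(to_clips_token(cell) for cell in row) for row in reader]
```

A row such as `Alice Smith<TAB>42<TAB>3.5` becomes `"Alice Smith" 42 3.5`, three tokens that read can return one at a time.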
I have a PDF file. When I select and copy "K([2.2.2]crypt)]5[Co2Sn17",
the clipboard instead contains "KACHTUNGTRENUNG([2.2.2]crypt)]5ACHTUNGTRENUNG[Co2Sn17".
Any idea what "ACHTUNGTRENUNG" is? Is it some kind of protection?
There are likely a few extra (invisible) characters in the file. When you copy text, the application you copy from translates the characters in the PDF file into something that can be stored on the clipboard. Most likely it does that by replacing each character with the Unicode string that the PDF file stores for that character in the font being used.
For most normal characters, that Unicode string is the same as the character you see; here you probably have invisible spaces in the PDF file that are named "achtungtrenung" in the font.
If you have the PDF file available somewhere, I'll be happy to take a look and verify this is indeed what is happening.
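Until the PDF itself is fixed, a workaround is to clean the copied text afterwards. This Python sketch deletes the ACHTUNGTRENUNG placeholder along with a few common invisible characters (soft hyphen, zero-width space, word joiner); treating these as the only extra characters is an assumption based on the example in the question, and the names are hypothetical.

```python
# Strings assumed to be artifacts of copying from the PDF, not real content:
# the placeholder name, soft hyphen, zero-width space and word joiner.
ARTIFACTS = ("ACHTUNGTRENUNG", "\u00ad", "\u200b", "\u2060")


def clean_copied_text(text: str) -> str:
    """Remove known copy artifacts from text pasted out of the PDF."""
    for artifact in ARTIFACTS:
        text = text.replace(artifact, "")
    return text
```

This also removes the word if it legitimately appears in the text, which is unlikely in a chemical formula.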
They are extra characters between lines.
You can try the PDF Copy Paste software and see whether it converts the portion you need into text the way you want.