Excel VBA: Best way to work with data of a CSV file

I am looking for a direct and efficient way to read CSV files and work conveniently with the data in Excel/VBA.
Ideally I would have direct access to the data by specifying row and column. Can you tell me your preferred option? Do you know of an option in addition to the following two?
A: Use Workbooks.Open or Workbooks.OpenText to open the CSV file as a workbook. Then work with the workbook (compare this thread).
B: Use Open strFilename For Input As #1 to read the data into a string. Work with the string (compare this thread).
Thanks a lot!
==========EDIT=========
Let me add what I have learned from your posts so far: the optimal option depends heavily on what exactly you want to do, so no single answer is possible. Also, there are the following additional options for reading CSV files:
C: Use a VBScript-type language with ADO (SQL-type statements). I am still figuring out how to create a minimal example that works.
D: Use FileSystemObject, see e.g. this thread
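To illustrate what option C buys you, here is a minimal sketch in Python rather than VBA (the sample data is made up): the CSV rows go into an in-memory SQLite table so they can be queried with SQL-type statements, next to plain row/column indexing as asked for above. The VBA/ADO equivalent would use an ADODB.Connection with the text driver.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for the CSV file.
csv_text = "name,age\nAlice,30\nBob,25\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Direct access by specifying row and column:
value = data[1][0]    # data row 2, column 1 -> "Bob"

# Option C in spirit: load the rows into an in-memory database
# and use SQL-type statements on them.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (%s)" % ", ".join(header))   # trusted sample header
con.executemany("INSERT INTO t VALUES (?, ?)", data)
oldest = con.execute(
    "SELECT name FROM t ORDER BY CAST(age AS INTEGER) DESC").fetchone()[0]
print(value, oldest)  # Bob Alice
```

Which route wins depends, as noted above, on the task: plain indexing is simplest for row/column lookups, while the SQL route pays off for filtering and aggregation.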

The fastest and most efficient way to get CSV data into Excel is to use Excel's Text Import Wizard.
It parses the CSV file and gives you several options to format and organize the data.
Typically, when programming your own CSV parser, you will ignore the odd syntax cases (quoted fields, embedded commas and the like), which forces rework of the parsing code later. The Excel wizard covers these cases and gives you some other bonuses (like formatting options).
To load a CSV (in Excel 2007/2010), pick "From Text" on the "Data" tab to start the Text Import Wizard. Note that the default delimiter is Tab, so you'll need to change it to a comma (or whatever your delimiter is) in step 2.


How to use BeanShell method in LibreOffice Writer's table formula as function?

I have been struggling with this problem for some time and cannot find any solution online. The closest I get is how to write a Basic function that can be used in a Calc formula, but since Basic and BeanShell are completely different languages, I can't find the right syntax/procedure to achieve the same functionality in the latter.
In Writer one can have a table (not a spreadsheet, just an ordinary table) where you can press F2 over a cell and enter a formula, e.g. =<C2>*<E2> to calculate the product of the values in cells C2 and E2.
I wrote a BeanShell method String amountInWords(String amount, String currency) which converts a passed amount (e.g. 1,234.59) and currency (e.g. "USD") into words (one thousand two hundred thirty-four dollars, fifty-nine cents). Now I would like to press F2 over some cell and type a formula like =amountInWords(<Table2.D3>, "USD") and see the above output as the cell content. Unfortunately I get the ** Expression is faulty ** message.
Can someone please advise me how to use this method in the described manner, or alternatively confirm that this is impossible? Thank you very much in advance!
Writer table formulas are much more limited than spreadsheet formulas. There is a list of supported functions at http://libreoffice-dev.blogspot.com/2019/11/calculations-inside-of-writer-tables.html. AFAIK calling macros is not supported.
Probably what you want instead of a table is to embed a Calc spreadsheet into the Writer document at that location by going to Insert -> Object -> OLE Object. Then rewrite amountInWords as a Basic user-defined function (UDF).
If you must use BeanShell then write a Basic UDF that loads and calls the BeanShell method. Or you could create a BeanShell Calc add-in although that is more difficult, requiring XML configuration files. For add-ins, it may be easier to write in Java rather than BeanShell as there are more examples and documentation available.
Or, instead of embedding a spreadsheet, create a search-and-replace Writer macro in BeanShell that performs any needed calculations in the table. Set the macro to run on an event such as when the document is opened.

matching two columns in excel with slight difference in the spelling

I am working with huge Excel sheets from different sources about the same thing. The way the sources report and write down the information differs. So, for example, one would write a location as "Khurais" whereas another would write it as "Khorais".
Since both of these files contain important information, I would like to combine them into one Excel sheet so that I can work with them more easily. So if you have any suggestion or tool that you think would be beneficial, please share it here.
P.S. The words in the Excel sheet are translations of Arabic words.
You could use Levenshtein distance to determine if two words are "close" to each other. Based on that you could match.
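For illustration, here is a minimal Levenshtein implementation, shown as a Python sketch of the algorithm (a VBA port is a straightforward translation of the same loops):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions,
    deletions and substitutions, each costing 1)."""
    prev = list(range(len(b) + 1))          # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

print(levenshtein("Khurais", "Khorais"))  # 1: only the u/o differs
```

A distance of 0 or 1 between two location names is a strong hint that they are the same word in a different transliteration.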
You could use FuzzyLookup, a macro that allows you to do appropriate matching. It worked really well for me in the past and is actually really well documented.
You can find it here: https://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html including examples on how to use it.
Hope that helps!
P.S. Obviously you can also use it strictly within VBA (not using worksheet functions).
The Double Metaphone algorithm springs to mind. It attempts to convert strings into phonetic representations. For example, "Folly" and "Pholee" should have the same phonetic code.
If you could generate these codes, you could then match your records based on them, instead of the strings.
Here's an article that explains, along with sample VBA code:
https://bytes.com/topic/access/insights/965241-fuzzy-string-matching-double-metaphone-algorithm
Hope that inspires you :)
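To give a feel for phonetic coding, here is the much simpler Soundex algorithm as a Python sketch. It is a rough cousin of Double Metaphone, not the same thing: plain Soundex keeps the first letter verbatim, so it would not match the "Folly"/"Pholee" pair above the way Double Metaphone would, but it shows the "code, then match on codes" idea.

```python
def soundex(word):
    # Standard Soundex letter -> digit table.
    codes = {c: str(d) for d, group in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], 1) for c in group}
    word = word.upper()
    first = word[0]
    encoded = []
    prev = codes.get(first)
    for c in word[1:]:
        code = codes.get(c)
        if code is not None and code != prev:
            encoded.append(code)
        if c not in "HW":        # H and W do not separate duplicate codes
            prev = code
    # First letter, then digits, padded with zeros to four characters.
    return (first + "".join(encoded) + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Once every name is reduced to a code like this, matching becomes an exact join on the codes rather than a fuzzy comparison on the raw strings.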

String Algorithm Comparison VB.Net

I would like to ask for some suggestions, because I've been working on this for a week.
It's basically a data cleanup program.
I have an Excel file which contains thousands of company names, and I have a database which contains the correct names of the companies.
What I want is to read the Excel file (which I already did) and compare each company in the Excel file with the values I have in the database. For example:
Data in Excel
Hewlett-Packard, Costa Rica
Hewlett-Packard (HP)
Hewlett-Packard Singapore (Private) Limited
Data in Database
Hewlett-Packard
It should auto-detect that those 3 values in the Excel file are Hewlett-Packard, because the Excel file is free-form input. I want to correct everything that is entered in it and find the most similar value in my database. For instance, if Hewlett-Packard is spelled wrong, it should automatically tell me that it's Hewlett-Packard. Any idea?
It's like autocomplete, but with thinking. Autocomplete that decides the correct value.
I'm doing it in VB.Net, by the way. I've been researching fuzzy search algorithms, Levenshtein distance and so on, but I still don't get how I can use them.
See my blog post, Solving the right problem, which covers something similar. You're probably better off doing a simple exact match and outputting any failures to a text file, which you then edit manually. It's drudgery, but it'll get the job done. When you start talking about Levenshtein distance and fuzzy search, you're turning a simple, if dull, task into a research project.
If your database contains only "thousands" (rather than millions) of names, then one thing you can do is load all the names into a list, and sort them. Then sort the names in the Excel file. Then go through the two lists (a standard merge-type algorithm). For example, you might have in your database:
Hasbro
Hewlett Packard
Home Depot
and in your Excel file:
Grainger
Halliburton
Hewlet Packard, Costa Rica
Hewlett Packard (HP)
Humana
Using the merge algorithm, you'd be comparing "Hewlet Packard, Costa Rica" against "Hewlett Packard", and you might even output that as the suggested replacement. That would probably constitute the majority of your errors.
In any case, I strongly recommend using the computer to identify the mismatches, and then manually resolve them. That's usually the fastest way to solve this type of problem.
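As a sketch of that "identify the mismatches, suggest a replacement" step (in Python rather than VB.Net, using the standard library's difflib as a stand-in for a hand-rolled fuzzy comparison), with the sample names above:

```python
import difflib

# The sample lists from the answer above.
database = ["Hasbro", "Hewlett Packard", "Home Depot"]
excel = ["Grainger", "Halliburton", "Hewlet Packard, Costa Rica",
         "Hewlett Packard (HP)", "Humana"]

for name in excel:
    # n=1: best candidate only; cutoff=0.6: below that, report a failure.
    match = difflib.get_close_matches(name, database, n=1, cutoff=0.6)
    if match:
        print(f"{name!r}: suggested replacement {match[0]!r}")
    else:
        print(f"{name!r}: no close match, review manually")
```

Both "Hewlet Packard, Costa Rica" and "Hewlett Packard (HP)" come out with "Hewlett Packard" as the suggestion, while names with no database counterpart fall through to the manual-review pile, which is exactly the workflow recommended above.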

How to read number of rows and columns in a CSV file using VBA

I have more than 100 CSV/text files (varying in size between 1 MB and 1 GB). I just need to create an Excel sheet for each CSV file, presenting:
names of the columns
type of each column, i.e. numeric or string
number of records in each column
min & max value and min & max length of each column
so the output on a sheet would be something like this (I cannot paste a table image here as I am new on this site, so please consider the dummy table below to be the Excel sheet):
   A            B        C         D          E          F           G
1  Column_name  Type     #records  min_value  max_value  min_length  max_length
2  Name         string   123456    Alis       Zomby      4           30
3  Age          numeric  123456    10         80         2           2
Is it possible to create VBA code for this? I am at a very beginner stage, so if any expert can help me out on the code side, that would be really helpful.
Thanks!!!
You could try writing complex VBA file- and string-handling code for this; my advice is: don't.
A better approach is to ask: "What other tools can read a csv file?"
This is tabulated data, and the files are quite large. Larger, really, than you should be reading using a spreadsheet: it's database work, and your best toolkit will be SQL queries with MIN() MAX() and COUNT() functions to aggregate the data.
Microsoft Access has a good set of 'external data' tools that will read fixed-width files, and if you use 'linked data' rather than 'import table' you'll be able to read the files using SQL queries without importing all those gigabytes into an Access .mdb or .accdb file.
Outside MS-Access, you're looking at intermediate-to-advanced VBA using the ADODB database objects (Microsoft Active-X Data Objects) and a schema.ini file.
Your link for text file schema.ini files is here:
http://msdn.microsoft.com/en-us/library/ms709353%28v=vs.85%29.aspx
...And you'll then be left with the work of creating an ADODB database 'connection' object that sees text files in a folder as 'tables', and writing code to scan the file names and build the SQL queries. All fairly straightforward for an experienced developer who's used the ADO text file driver.
I can't offer anything more concrete than these general hints - and nothing like a code sample - because this is quite a complex task, and it's not really an Excel-VBA task; it's a programming task best undertaken with database tools, except for the very last step of displaying your results in a spreadsheet.
This is not a task I'd give a beginner as a teaching exercise; it demands so many unfamiliar concepts and techniques that they'd get nowhere until it was broken down into a structured series of separate tutorials.
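For a sense of what the per-column aggregation computes, here is a rough Python sketch (sample data made up). Note it keeps all values in memory, so for the 1 GB files you would aggregate line by line or, as recommended above, let a database engine do the MIN/MAX/COUNT work.

```python
import csv
import io

def profile(csv_file):
    """Per-column stats: type guess, record count, min/max value and length."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = [[] for _ in header]
    for row in reader:
        for i, v in enumerate(row):
            cols[i].append(v)

    report = []
    for name, values in zip(header, cols):
        # Crude type sniffing: numeric only if every value parses as a number.
        numeric = all(v.replace(".", "", 1).lstrip("-").isdigit() for v in values)
        key = float if numeric else str          # numeric vs lexicographic min/max
        lengths = [len(v) for v in values]
        report.append({"column": name,
                       "type": "numeric" if numeric else "string",
                       "records": len(values),
                       "min": min(values, key=key),
                       "max": max(values, key=key),
                       "min_length": min(lengths),
                       "max_length": max(lengths)})
    return report

sample = io.StringIO("Name,Age\nAlis,30\nZomby,80\nBob,25\n")
report = profile(sample)
for row in report:
    print(row)
```

Each dictionary in the report corresponds to one row of the dummy output table in the question.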

Importing/Pasting Excel data with changing fields into SQL table

I have a table called Animals. I pull data from this table to populate another system.
I get Excel data with lists of animals that need to go in the Animals table.
The Excel data will also have other identifiers, like Breed, Color, Age, Favorite Toy, Veterinarian, etc.
These identifiers will change with each new Excel file. Some may repeat, others are brand new.
Because the fields change, and I never know what new fields will come with each new Excel file, my Animals table only has Animal Id and Animal Name.
I've created a Values table to hold all the other identifier fields. That table is structured like this:
AnimalId
Value
FieldId
DataFileId
And then I have a Fields table that holds the key to each FieldId in the Values table.
I do this because the alternative is to keep a big table with fields that may not even be used each time I need to add data. A big table with a lot of null columns.
I'm not sure my way is a good way either. It can seem overly complex.
But, assuming it is a good way, what is the best way to get this excel data into my Values table? The list of animals is easy to add to my Animals table. But for each identifier (Breed, Color, etc.) I have to copy or import the values and then update the table to assign a matching FieldId (or create a new FieldId in the Fields table if it doesn't exist yet).
It's a huge pain to load new data if there are a lot of identifiers. I'm really struggling and could use a better system.
Any advice, help, or just pointing me in a better direction would be really appreciated.
Thanks.
Depending on your client (e.g. I use SequelPro on a Mac), you might be able to import CSVs directly. This is generally pretty shaky, but you can also export your Excel document as a CSV... how convenient.
However, this doesn't really help with your database structure. Granted, using foreign keys is a good idea, but importing that data unobtrusively (and easily) will likely need to be done one row at a time.
However, you could try modifying something like this to suit your needs, by first exporting your Excel document as a CSV, removing the header row (the first one), and then using regular expressions on it to change it into a big chunk of SQL. For example:
Your CSV:
myval1.1,myval1.2,myval1.3,myval1.4
myval2.1,myval2.2,myval2.3,myval2.4
...
At which point, you could do something like:
myCsvText.replace(/^(.+),(.+),(.+),(.+)$/mg, "INSERT INTO table_name(col1, col2, col3, col4) VALUES('$1', '$2', '$3', '$4');")
where you know the number of columns, their names, and how their values are organized (via the regular expression and its replacement). Note that the greedy (.+) groups only work while the values themselves contain no commas or quotes.
Might be a good place to start.
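A safer variant of the same idea: parse the CSV with a real CSV parser and use parameterized inserts, so the driver handles the quoting that the regex trick glosses over. A minimal sketch in Python with SQLite (table and column names follow the example above; your real target would be your own database):

```python
import csv
import io
import sqlite3

# The sample rows from the answer above.
csv_text = ("myval1.1,myval1.2,myval1.3,myval1.4\n"
            "myval2.1,myval2.2,myval2.3,myval2.4\n")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_name (col1, col2, col3, col4)")

# A real CSV parser copes with quoted values and embedded commas,
# and parameterized inserts let the driver do the SQL quoting.
rows = list(csv.reader(io.StringIO(csv_text)))
con.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", rows)

count = con.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
print(count)  # 2
```

The regex approach is fine for a quick one-off on clean data; for anything recurring, parameterized inserts are the safer default.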
Your table looks OK. Since you have a variable number of fields, it seems logical to expand vertically. Although you might want to make it easier on yourself by changing DataFileId and FieldId into FieldName and DataFileName, unless you will use them in a lot of other tables too.
Getting data from Excel into SQL Server is unfortunately not as easy as you would expect from two Microsoft products interacting with each other. There are several routes that I know of that you can take:
Work with CSV files instead of Excel files. Excel can edit CSV files just as easily as Excel files, but CSV is an infinitely more reliable datasource when it comes to importing. You don't get problems with different file formats for different Excel versions, Excel having to be installed on the computer that will run the script or quirks with automatic datatype recognition. A CSV can be read with the BCP commandline tool, the BULK INSERT command or with SSIS. Then use stored procedures to convert the data from a horizontal bulk of columns into a pure vertical format.
Use SSIS to read the data directly from the Excel file(s). It is possible to make a package that loops over several Excel files. A drawback is that the column format and the sheet name of the Excel file have to be known beforehand, so a different template (with a separate loop) has to be made each time a new Excel format arrives. There are third-party SSIS components that claim to be more flexible, but I haven't tested them yet.
Write a Visual C# program or PowerShell script that grabs the Excel file, extracts the data and outputs it into your SQL table. Visual C# is a pretty easy language with powerful interfaces into Office and SQL Server. I don't know how big the learning curve is to get started, but once you do, it will be a pretty easy program to write. I have also heard good things about PowerShell.
Create an Excel Macro that uses VB code to open other Excel files, loop through their data and write the results either in a predefined sheet or as CSV to disk. Once everything is in a standard format it will be easy to import the data using one of the above methods.
Since I have had headaches with 1) and 2) before, I would advise either 3) or 4). Because of my greater experience with VBA than with Visual C# or PowerShell, I'd go for 4) if I were in a hurry. But I think 3) is the better investment for the long term.
(You could also get adventurous and use another scripting language, such as Python, as I once did because Python is cool; unfortunately, Python offers pretty slow and limited interfaces to SQL Server and Excel.)
Good luck!
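Whichever route you pick, the "horizontal bulk of columns into a pure vertical format" step from option 1 is the same. Here is a rough sketch in Python with hypothetical column names: each (animal, field id, value) triple becomes one row for the Values table, and new identifiers get new entries in the field lookup.

```python
import csv
import io

# Hypothetical incoming file: first column is the animal,
# the remaining columns vary from file to file.
csv_text = "Animal,Breed,Color\nRex,Collie,Brown\nWhiskers,Siamese,Cream\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Fields table stand-in: assign a FieldId to every identifier column.
fields = {name: i + 1 for i, name in enumerate(header[1:])}

# Melt each wide row into (AnimalName, FieldId, Value) rows for the Values table.
values_table = []
for row in data:
    animal, rest = row[0], row[1:]
    for field_name, value in zip(header[1:], rest):
        values_table.append((animal, fields[field_name], value))

for rec in values_table:
    print(rec)
```

In a real loader, the `fields` lookup would be backed by the Fields table (inserting a row when an identifier is seen for the first time) and `values_table` would be bulk-inserted, but the reshaping logic is just this double loop.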