Performance issues with select query for a large CSV file - vba

I'm trying to search for data in a very large CSV file (one million records) using an ADO SELECT query with a few WHERE clauses.
I cannot transfer this data to a database (MySQL, SQL Server, or MS Access) because the file is regenerated daily, and importing it into a database every day is not practical for me.
There is no row id in this .csv file. If a row id is generated for every .csv by default, please let me know.
Here is the CSV file data sample (first field is date, second is time, third is a value):
CSV FILE SAMPLE DATA
====================
20130714,170056,1.30764
20130714,170122,1.30743
20130714,170132,1.30744
20130714,170205,1.30743
20130714,170214,1.30744
20130714,170216,1.30743
20130714,170244,1.30744
20130714,170325,1.30744
20130714,170325,1.30743
20130714,170325,1.30743
20130714,170325,1.30742
20130714,170504,1.30741
20130714,170519,1.30741
20130714,170519,1.30739
20130714,170522,1.30739
20130714,170522,1.30732
20130714,170522,1.30722
All the CSV records are in order by date and time.
I'm using an ADO connection from Excel to the CSV file with this code:
strsql = "SELECT * FROM " & sItem & ".csv WHERE F3>=" & trigPrice & " AND (F1 in (SELECT distinct TOP " & trigWin & " f1 FROM " & sItem & ".csv WHERE (F1>=" & sDay & ")) AND f2>=" & sTime & ")"
Set rs = cn.Execute(strsql)
This one query takes about 10 minutes to execute. How do I reduce the execution time?

The reason that database queries can be fast is that the data is already indexed - that is, the database supports quick lookups on some of the fields. When you run a "query" on raw CSV files, the ADO engine must first parse the text into a set of records, then search through them row by row to find records that match your criteria. If you are planning on doing more than a few queries on the data, you might as well import it into an indexed database table and avoid the overhead of parsing the CSV multiple times.
UPDATE
To import the CSV file from VBA, you can use the DoCmd.TransferText method. For example, to import a comma-separated CSV with headers into a table (with the correct layout) called "tblData", you could do the following:
DoCmd.TransferText acImportDelim, , "tblData", "C:\Path\OF\THE.csv", True
This is the same method used by the Access Import Wizard.
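Once the data is in the table, add an index on the fields in your WHERE clauses so lookups can seek instead of scanning every row. A minimal sketch, assuming the imported table ends up with fields named F1/F2 like in the ADO query (the index name idxDateTime is made up):
' Run once after the import; indexes the date and time fields
' so queries filtering on them no longer scan the whole table.
CurrentDb.Execute "CREATE INDEX idxDateTime ON tblData (F1, F2)"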

Related

MS Access SQL - Update field in one table with a count from another table

I have a table called 'FilesUploaded' which has a summary of all files uploaded to my access DB. I want to add a field in here that contains the count of all errors from another table.
My FilesUploaded table contains a field called 'FileName' which has
the full name of the file.
I want to get a count of all records in table1 where the 'ValidityCheck' field contains 'Error'. Table1 also contains a field called 'Name_of_Report' which has the file name which will match back to the FilesUploaded table.
The 'vFileName' variable will contain what is in both the 'Filename' field and the 'Name_of_Report' field
The below is the code I have tried using, but it says this type of join is not allowed and I have no idea what other way I can achieve this.
Call RunSQL("UPDATE FilesUploaded " & _
"LEFT JOIN (SELECT table1.Name_of_Report, Sum(IIf([table1].[ValidityCheck] Like '*Error*',1,0)) AS ErrorCount FROM table1 GROUP BY table1.Name_of_Report) AS temp on temp.Name_of_Report = FilesUploaded.FileName " & _
"SET " & _
"FilesUploaded.[ErrorCount] = temp.ErrorCount " & _
"WHERE FilesUploaded.[FileName] = '" & vFileName & "' ")
Does anybody know a different way I can update the FilesUploaded table with a count of the ValidityCheck field from the Table1 table?
In MS Access, UPDATE...JOIN requires its analogous SELECT...JOIN to be updateable, and aggregate queries using SUM are not updateable. Therefore, consider domain functions like DCount (your Sum(IIf(...,1,0)) is really a conditional count).
Additionally, consider a stored query and call it in VBA with parameterization via QueryDefs. Do note the use of ALIKE, which uses % for wildcards, in case you need to run the query outside of the MS Access GUI, such as over ODBC or OLEDB connections where * is not recognized.
SQL (save as a stored query)
PARAMETERS paramFileName TEXT;
UPDATE FilesUploaded f
SET f.[ErrorCount] = DCOUNT("*", "table1",
"[ValidityCheck] ALIKE '%Error%' AND [Name_of_Report]='" & f.[FileName] & "'")
WHERE f.[FileName] = [paramFileName];
VBA (run query without string concatenation)
Dim qdef As QueryDef
Set qdef = CurrentDb.QueryDefs("mySavedQuery")
qdef![paramFileName] = vFileName ' BIND PARAM VALUE
qdef.Execute ' RUN ACTION QUERY
Set qdef = Nothing

ms access vba unique table naming method datestamp

I want to make the created table name unique, possibly by using hh:mm:ss in the table name, so that if the macro is run time after time it won't tell me "table name already exists".
There are two parts to the code: one creates the table, and one refreshes the Access data objects so that the new table becomes visible.
Sub SelectIntoX()
    Dim dbs As Database
    Set dbs = CurrentDb
    ' Part 1: select all records in the scheme table
    ' and copy them into a new table
    dbs.Execute "SELECT * INTO " _
        & Format(Date, "yymmdd") & "_Scheme" & " FROM dbo_scheme;"
    ' Part 2: refresh Access data objects so the new table appears
    DBEngine(0)(0).TableDefs.Refresh
    DoCmd.SelectObject acTable, Format(Date, "yymmdd") & "_Scheme", True
End Sub
The problem I have is that yymmdd is not unique and I am running it a lot each day.
I have also tried hhmmss, but it only appends zeroes, since Date has no time component.
This should be a good alternative, since Now() includes the time of day while Date does not:
Format(Now(), "yyyymmddhhmmss")
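Applied to the Sub above, it would look something like this (a sketch; strTable is a made-up variable name, and the square brackets guard the table name in SQL since it starts with a digit):
Dim strTable As String
' Now() includes the time, so repeated runs on one day get distinct names
strTable = Format(Now(), "yyyymmddhhmmss") & "_Scheme"
dbs.Execute "SELECT * INTO [" & strTable & "] FROM dbo_scheme;"
DBEngine(0)(0).TableDefs.Refresh
DoCmd.SelectObject acTable, strTable, True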

VBA Provider Microsoft.ACE.OLEDB.12.0 for CSV file and sum of values in column doesn't work

I have a problem with the SQL query for some columns.
In VBA I use this code for opening database:
db.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" & "Data Source=" & Dirname(Name) & ";" & "Extended Properties=""text;HDR=YES;FMT=CSVDelimited;MaxScanRows=0"""
And in the SQL query I want to sum the mv column by group (AFS or HTM) with the corresponding time - and that works.
But for the sold column it doesn't work - this column is probably being treated as a string column instead of a numeric/double column, because the first approximately 1000 rows are empty or zero. I have tried to solve it in different ways, e.g. MaxScanRows=0, IMEX=1, or a schema.ini, but nothing works...
Example of csv file:
Thanks in advance for any advice
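One thing that should force the type regardless of what the first rows contain is a schema.ini next to the CSV with explicit column types, so the text driver stops guessing from the data. A minimal sketch - the file name and the column order/names here are guesses based on the question, so adjust them to the real file:
[myfile.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=time DateTime
Col2=group Text
Col3=mv Double
Col4=sold Double
With ColN types declared, sold is read as Double even when the first thousand rows are empty or zero.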

How to write a SQL Loader control file to load data into multiple tables

I have a data file with 3 different sets of information like this below:
###08016995BUILD 12/15/04POSITION
"AABPX ","76826309","M","L"," 1509.4340"
-----More similar Records-----------------------
###08016995BUILD 12/15/04SECDESC
"AABPX ","mf","AMERICAN AADVANTAGE BALA","NCED PLAN AHEAD "," "," "," 14.4500","121504"," 14.4500"
-----More similar Records-----------------------
###08016995BUILD 12/15/04CUSTOMER
"xxxxxxx","FINANCIAL SOLUTIONS ","ACCOUNT ","xxxxx ST ","xx xxxxx"," ","000-000-0000","xxx-xxx-xxxx","xxxxx","xx","xxxxx"," ","xx"," "," "," "
"," 14.4500","121504"," 14.4500"
-----More similar Records-----------------------
end of data file
Now I want to write a control file that pushes the first set of data to a position table, the second set to a sec desc table, and the third set to a customer table. How can I do this?
http://www.orafaq.com/wiki/SQL*Loader_FAQ#Can_one_load_data_from_multiple_files.2F_into_multiple_tables_at_once.3F
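The FAQ above covers the general technique: one control file with several INTO TABLE clauses, each guarded by a WHEN condition. A rough sketch of the shape (the table names, field names, and rec_type tag are all invented; and since in your file only the ###...POSITION header lines name the set, you would first need to preprocess the file so every record starts with its own tag):
LOAD DATA
INFILE 'data_tagged.dat'
APPEND
INTO TABLE position_table
WHEN (rec_type = 'POS')
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(rec_type FILLER, fund_symbol, account_no, type_cd, amount)
INTO TABLE secdesc_table
WHEN (rec_type = 'SEC')
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
-- POSITION(1) resets the scan to the start of the record; without it,
-- a second INTO TABLE continues from where the previous one stopped
(rec_type FILLER POSITION(1), fund_symbol, sec_type, descr, price)
INTO TABLE customer_table
WHEN (rec_type = 'CUS')
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(rec_type FILLER POSITION(1), cust_id, cust_name, city, state)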

Generate insert SQL statements from a CSV file

I need to import a csv file into Firebird and I've spent a couple of hours trying out some tools and none fit my needs.
The main problem is that all the tools I've tried, like EMS Data Import and Firebird Data Wizard, expect my CSV file to contain all the information needed by my table.
I need to write some custom SQL in the insert statement. For example, I have a CSV file with the city name, but as my database already has all the cities in another table (normalized), I need to write a subselect in the insert statement to look up the city and write its ID. I also have a stored procedure to create GUIDs.
My insert statement would be something like this:
INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES((SELECT NEW_GUID FROM CREATE_GUID), :NAME, (SELECT CITY_ID FROM CITY WHERE NAME = :CITY_NAME))
How can I approach this?
It's a bit crude - but for one off jobs, I sometimes use Excel.
If you import the CSV file into Excel, you can create a formula which creates an INSERT statement by using string concatenation in the formula. So - if your CSV file has 3 columns that appear in columns A, B, and C in Excel, you could write a formula like...
="INSERT INTO MyTable (Col1, Col2, Col3) VALUES (" & A1 & ", " & B1 & ", " & C1 & ")"
Then you can replicate the formula down all of your rows, then copy and paste the result into a text file to run against your database.
Like I say - it's crude - but it can be quite a 'quick and dirty' way of getting a job done!
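One gotcha: text values need quoting inside the generated SQL. A variant of the same formula, assuming Col2 holds text:
="INSERT INTO MyTable (Col1, Col2, Col3) VALUES (" & A1 & ", '" & B1 & "', " & C1 & ")"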
Well, if it's a CSV and this is a one-time process, open up the file in Excel, write formulas to populate your data in any way you desire, then write a simple concatenation formula to construct your SQL, and copy that formula for every row. You will get a large number of SQL statements which you can execute anywhere you want.
Fabio,
I've done what Vaibhav has done many times, and it's a good "quick and dirty" way to get data into a database.
If you need to do this a few times, or on some type of schedule, then a more reliable way is to load the CSV data "as-is" into a work table (e.g. customer_dataload) and then use standard SQL statements to populate the missing fields.
(I don't know Firebird syntax - but something like...)
UPDATE person
SET id = (SELECT newguid() FROM createguid)
UPDATE person
SET cityid = (SELECT cityid FROM cities WHERE person.cityname = cities.cityname)
etc.
Usually, it's much faster (and more reliable) to get the data INTO the database and then fix the data than to try to fix the data during the upload. You also get the benefit of transactions to allow you to ROLLBACK if it does not work!!
I'd do this with awk.
For example, if you had this information in a CSV file:
Bob,New York
Jane,San Francisco
Steven,Boston
Marie,Los Angeles
The following command will give you what you want, run in the same directory as your CSV file (named name-city.csv in this example).
$ awk -F, '{ print "INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES ((SELECT NEW_GUID FROM CREATE_GUID), '\''"$1"'\'', (SELECT CITY_ID FROM CITY WHERE NAME = '\''"$2"'\''))" }' name-city.csv
Type awk --help for more information.
Two online tools which helped me in 2020:
https://numidian.io/convert/csv/to/sql
https://www.convertcsv.com/csv-to-sql.htm
The second one is based on JS and does not upload your data (at least not at the time I am writing this)
You could import the CSV file into a database table as is, then run an SQL query that does all the required transformations on the imported table and inserts the result into the target table.
Assuming the CSV file is imported into temp_table with columns n, city_name:
insert into target_table
select t.n, c.city_id as city
from temp_table t, cities c
where t.city_name = c.city_name
Nice tip about using Excel, but I also suggest getting comfortable with a scripting language like Python, because for some tasks it's easier to just write a quick python script to do the job than trying to find the function you need in Excel or a pre-made tool that does the job.
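For example, a quick Python sketch that does the same job as the awk one-liner above (file name and SQL shape borrowed from that example):
import csv

# Emit one INSERT per CSV row; single quotes are doubled
# so the generated SQL stays valid.
with open("name-city.csv", newline="") as f:
    for name, city in csv.reader(f):
        name = name.replace("'", "''")
        city = city.replace("'", "''")
        print("INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES "
              "((SELECT NEW_GUID FROM CREATE_GUID), '%s', "
              "(SELECT CITY_ID FROM CITY WHERE NAME = '%s'));" % (name, city))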
You can use the free csvsql to do this.
Install it using these instructions
Now run a command like so to import your data into your database. More details at the links above, but it'd be something like:
csvsql --db firebird:///d=mydb --insert mydata.csv
The following works with sqlite, and is what I use to convert data into an easy to query format
csvsql --db sqlite:///dump.db --insert mydata.csv
Use the CSV file as an external table. Then you can use SQL to copy the data from the external table to your destination table - with all the possibilities of SQL.
See http://www.firebirdsql.org/index.php?op=useful&id=netzka
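A rough sketch of the idea (table and column names invented; note that Firebird external tables read fixed-length records, so each field is a fixed-width CHAR and a plain comma-separated file would need converting/padding first):
CREATE TABLE ext_person EXTERNAL FILE 'C:\data\person.txt' (
  NAME CHAR(20),
  CITY_NAME CHAR(20),
  CRLF CHAR(2) /* consumes the line terminator */
);

INSERT INTO PERSON (ID, NAME, CITY_ID)
SELECT (SELECT NEW_GUID FROM CREATE_GUID),
       TRIM(e.NAME),
       (SELECT CITY_ID FROM CITY WHERE NAME = TRIM(e.CITY_NAME))
FROM ext_person e;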
Just finished this VBA script which might be handy for this purpose. All you should need to do is change the Insert statement to include the table in question and the list of columns (obviously in the same sequence they appear on the Excel file).
Function CreateInsertStatement()
    Dim SQLScript As String
    Dim cStart As String, cLine As String
    Dim nCommit As Integer, nCommitCount As Integer
    Dim nRow As Long, nCol As Long
    Dim LoopThruRows As Boolean
    Dim LoopThruCols As Boolean

    'Output file location and start of the insert statement
    SQLScript = "C:\Inserts.sql"
    cStart = "Insert Into Holidays (HOLIDAY_ID, NAT_HOLDAY_DESC, NAT_HOLDAY_DTE) Values ("

    'Open file for output
    Open SQLScript For Output As #1

    nCommit = 1 'Commit count
    nCommitCount = 100 'The number of rows after which a commit is performed
    LoopThruRows = True
    nRow = 1 'Current row
    While LoopThruRows
        nRow = nRow + 1 'Start at second row - presuming there are headers
        nCol = 1 'Reset the columns
        If Cells(nRow, nCol).Value = Empty Then
            Print #1, "Commit;"
            LoopThruRows = False
        Else
            If nCommit = nCommitCount Then
                Print #1, "Commit;"
                nCommit = 1
            Else
                nCommit = nCommit + 1
            End If
            cLine = cStart
            LoopThruCols = True
            While LoopThruCols
                If Cells(nRow, nCol).Value = Empty Then
                    cLine = cLine & ");" 'Close the SQL statement
                    Print #1, cLine 'Write the line
                    LoopThruCols = False 'Exit the cols loop
                Else
                    If nCol > 1 Then 'Add a preceding comma for all bar the first column
                        cLine = cLine & ", "
                    End If
                    If Right(Left(Cells(nRow, nCol).Value, 3), 1) = "/" Then 'Format for dates
                        cLine = cLine & "TO_DATE('" & Cells(nRow, nCol).Value & "', 'dd/mm/yyyy')"
                    ElseIf IsNumeric(Left(Cells(nRow, nCol).Value, 1)) Then 'Format for numbers
                        cLine = cLine & Cells(nRow, nCol).Value
                    Else 'Format for text, escaping apostrophes
                        cLine = cLine & "'" & Replace(Cells(nRow, nCol).Value, "'", "''") & "'"
                    End If
                    nCol = nCol + 1
                End If
            Wend
        End If
    Wend
    Close #1
End Function
Option 1:
1- Have you tried IBExpert? IBExpert \ Tools \ Import Data (Trial or Customer Version).
Option 2:
2- Upload your csv file to a temporary table with F_BLOBLOAD.
3- Create a stored procedure which uses 3 functions (f_stringlength, f_strcopy, f_MID) to walk the whole string, pulling out your fields to build your INSERT INTO.
Links:
2: http://freeadhocudf.org/documentation_english/dok_eng_file.html
3: http://freeadhocudf.org/documentation_english/dok_eng_string.html
A tool I recently tried that worked outstandingly well is FSQL.
You write an IMPORT command, paste it into FSQL and it imports the CSV file into the Firebird table.
You can use the shell:
sed "s/,/','/g" file.csv > tmp
sed "s/$/'),(/g" tmp > tmp2
sed "s/^./'&/g" tmp2 > insert.sql
and then add
INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES(
...
);