Generate insert SQL statements from a CSV file - sql

I need to import a csv file into Firebird and I've spent a couple of hours trying out some tools and none fit my needs.
The main problem is that all the tools I've been trying like EMS Data Import and Firebird Data Wizard expect that my CSV file contains all the information needed by my Table.
I need to write some custom SQL in the insert statement. For example, I have a CSV file with the city name, but as my database already has all the cities in another (normalized) table, I need to write a subselect in the insert statement to look up the city and write its ID. I also have a stored procedure to create GUIDs.
My insert statement would be something like this:
INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES ((SELECT NEW_GUID FROM CREATE_GUID), :NAME, (SELECT CITY_ID FROM CITY WHERE NAME = :CITY_NAME))
How can I approach this?

It's a bit crude - but for one off jobs, I sometimes use Excel.
If you import the CSV file into Excel, you can create a formula which creates an INSERT statement by using string concatenation in the formula. So - if your CSV file has 3 columns that appear in columns A, B, and C in Excel, you could write a formula like...
="INSERT INTO MyTable (Col1, Col2, Col3) VALUES (" & A1 & ", " & B1 & ", " & C1 & ")"
Then you can replicate the formula down all of your rows, and copy, and paste the answer into a text file to run against your database.
Like I say - it's crude - but it can be quite a 'quick and dirty' way of getting a job done!

Well, if it's a CSV, and this is a one-time process, open up the file in Excel, write formulas to populate your data in any way you desire, then write a simple Concat formula to construct your SQL, and copy that formula for every row. You will get a large number of SQL statements which you can execute anywhere you want.

Fabio,
I've done what Vaibhav has done many times, and it's a good "quick and dirty" way to get data into a database.
If you need to do this a few times, or on some type of schedule, then a more reliable way is to load the CSV data "as-is" into a work table (e.g. customer_dataload) and then use standard SQL statements to populate the missing fields.
(I don't know Firebird syntax - but something like...)
UPDATE person
SET id = (SELECT newguid() FROM createguid)
UPDATE person
SET cityid = (SELECT cityid FROM cities WHERE person.cityname = cities.cityname)
etc.
Usually, it's much faster (and more reliable) to get the data INTO the database and then fix the data than to try to fix the data during the upload. You also get the benefit of transactions to allow you to ROLLBACK if it does not work!!
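A minimal sketch of the work-table approach described above. It uses Python's built-in sqlite3 module as a stand-in for Firebird (an assumption, chosen only so the example runs anywhere). The table names person_dataload, city and person, and the new_guid() function that replaces the CREATE_GUID procedure, are illustrative and not taken from the question's schema.

```python
import csv
import io
import sqlite3
import uuid

CSV_DATA = "Bob,New York\nJane,San Francisco\nSteven,Boston\n"

def load_people(conn, csv_text):
    """Load CSV rows as-is into a work table, then populate person with SQL."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO person_dataload (name, city_name) VALUES (?, ?)", rows
        )
        # Resolve each city name to its ID inside the database.
        conn.execute(
            """
            INSERT INTO person (id, name, city_id)
            SELECT new_guid(), d.name, c.city_id
            FROM person_dataload d
            JOIN city c ON c.name = d.city_name
            """
        )
    # Rows whose city was not found stay in the work table for inspection.
    return conn.execute(
        "SELECT d.name FROM person_dataload d "
        "LEFT JOIN city c ON c.name = d.city_name WHERE c.city_id IS NULL"
    ).fetchall()

conn = sqlite3.connect(":memory:")
# Hypothetical replacement for the CREATE_GUID stored procedure.
conn.create_function("new_guid", 0, lambda: str(uuid.uuid4()))
conn.executescript(
    """
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, city_id INTEGER);
    CREATE TABLE person_dataload (name TEXT, city_name TEXT);
    INSERT INTO city (city_id, name) VALUES (1, 'New York'), (2, 'Boston');
    """
)
unmatched = load_people(conn, CSV_DATA)
print(unmatched)
```

Keeping the raw rows in the work table means that names with no matching city can be reviewed and fixed with plain SQL instead of being silently dropped.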

I'd do this with awk.
For example, if you had this information in a CSV file:
Bob,New York
Jane,San Francisco
Steven,Boston
Marie,Los Angeles
The following command will give you what you want, run in the same directory as your CSV file (named name-city.csv in this example).
$ awk -F, '{ print "INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES ((SELECT NEW_GUID FROM CREATE_GUID), '\''"$1"'\'', (SELECT CITY_ID FROM CITY WHERE NAME = '\''"$2"'\''))" }' name-city.csv
Type awk --help for more information.

Two online tools which helped me in 2020:
https://numidian.io/convert/csv/to/sql
https://www.convertcsv.com/csv-to-sql.htm
The second one is based on JS and does not upload your data (at least not at the time I am writing this)

You could import the CSV file into a database table as is, then run an SQL query that does all the required transformations on the imported table and inserts the result into the target table.
Assuming the CSV file is imported into temp_table with columns n, city_name:
insert into target_table
select t.n, c.city_id as city
from temp_table t, cities c
where t.city_name = c.city_name
Nice tip about using Excel, but I also suggest getting comfortable with a scripting language like Python. For some tasks it's easier to write a quick Python script than to hunt for the function you need in Excel or for a pre-made tool that does the job.
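As a rough illustration of the scripting route, the sketch below reads name/city rows with Python's csv module and emits the INSERT statement from the question, with the GUID and city lookups as subselects. The sample rows (including the name with an apostrophe) are made up for the example, and reading from an in-memory string instead of a real file is only for demonstration.

```python
import csv
import io

TEMPLATE = (
    "INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES ("
    "(SELECT NEW_GUID FROM CREATE_GUID), {name}, "
    "(SELECT CITY_ID FROM CITY WHERE NAME = {city}));"
)

def sql_literal(value):
    """Quote a string as an SQL literal, doubling embedded apostrophes."""
    return "'" + value.replace("'", "''") + "'"

def generate_inserts(csv_file):
    """Yield one INSERT statement per (name, city) row of the CSV."""
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        name, city = row[0].strip(), row[1].strip()
        yield TEMPLATE.format(name=sql_literal(name), city=sql_literal(city))

# Hypothetical sample data; replace with open("name-city.csv", newline="").
sample = io.StringIO("Bob,New York\nMarie O'Neil,Los Angeles\n")
statements = list(generate_inserts(sample))
for statement in statements:
    print(statement)
```

Unlike plain string concatenation, the csv module copes with quoted fields that contain commas, and sql_literal keeps an apostrophe in the data from breaking the generated statement.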

You can use the free csvsql to do this.
Install it using these instructions
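Now run a command like the one below to import your data into your database. More details at the links above. csvsql takes an SQLAlchemy connection URL, so for Firebird (with a Firebird SQLAlchemy driver installed) it'd be something like:
csvsql --db firebird://user:password@localhost/path/to/mydb.fdb --insert mydata.csv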
The following works with sqlite, and is what I use to convert data into an easy to query format
csvsql --db sqlite:///dump.db --insert mydata.csv

Use the CSV file as an external table. Then you can use SQL to copy the data from the external table to your destination table, with all the possibilities of SQL.
See http://www.firebirdsql.org/index.php?op=useful&id=netzka

Just finished this VBA script which might be handy for this purpose. All you should need to do is change the Insert statement to include the table in question and the list of columns (obviously in the same sequence they appear in the Excel file).
Function CreateInsertStatement()
    'Output file location and start of the insert statement
    SQLScript = "C:\Inserts.sql"
    cStart = "Insert Into Holidays (HOLIDAY_ID, NAT_HOLDAY_DESC, NAT_HOLDAY_DTE) Values ("
    'Open file for output
    Open SQLScript For Output As #1
    Dim LoopThruRows As Boolean
    Dim LoopThruCols As Boolean
    nCommit = 1 'Commit Count
    nCommitCount = 100 'The number of rows after which a commit is performed
    LoopThruRows = True
    nRow = 1 'Current row
    While LoopThruRows
        nRow = nRow + 1 'Start at second row - presuming there are headers
        nCol = 1 'Reset the columns
        If Cells(nRow, nCol).Value = Empty Then
            Print #1, "Commit;"
            LoopThruRows = False
        Else
            If nCommit = nCommitCount Then
                Print #1, "Commit;"
                nCommit = 1
            Else
                nCommit = nCommit + 1
            End If
            cLine = cStart
            LoopThruCols = True
            While LoopThruCols
                If Cells(nRow, nCol).Value = Empty Then
                    cLine = cLine & ");" 'Close the SQL statement
                    Print #1, cLine 'Write the line
                    LoopThruCols = False 'Exit the cols loop
                Else
                    If nCol > 1 Then 'add a preceding comma for all bar the first column
                        cLine = cLine & ", "
                    End If
                    If Right(Left(Cells(nRow, nCol).Value, 3), 1) = "/" Then 'Format for dates
                        cLine = cLine & "TO_DATE('" & Cells(nRow, nCol).Value & "', 'dd/mm/yyyy')"
                    ElseIf IsNumeric(Left(Cells(nRow, nCol).Value, 1)) Then 'Format for numbers
                        cLine = cLine & Cells(nRow, nCol).Value
                    Else 'Format for text, including apostrophes
                        cLine = cLine & "'" & Replace(Cells(nRow, nCol).Value, "'", "''") & "'"
                    End If
                    nCol = nCol + 1
                End If
            Wend
        End If
    Wend
    Close #1
End Function
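The periodic-commit idea in the script above can be sketched on its own. This is a simplified Python illustration, not a translation of the VBA: the function name with_commits and the five sample statements are made up, and the batch size of 2 is only there to keep the output short.

```python
def with_commits(statements, batch_size=100):
    """Yield the statements, adding 'COMMIT;' after every batch_size of them."""
    pending = 0
    for statement in statements:
        yield statement
        pending += 1
        if pending == batch_size:
            yield "COMMIT;"
            pending = 0
    if pending:
        yield "COMMIT;"  # flush the final partial batch

# Hypothetical statements, only to show where the commits land.
inserts = [f"INSERT INTO Holidays (HOLIDAY_ID) VALUES ({n});" for n in range(1, 6)]
script = list(with_commits(inserts, batch_size=2))
print("\n".join(script))
```

Committing in batches keeps each transaction small, so a failure part-way through a large import only loses the current batch.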

Option 1:
1- Have you tried IBExpert? IBExpert \ Tools \ Import Data (Trial or Customer Version).
Option 2:
2- Upload your CSV file to a temporary table with F_BLOBLOAD.
3- Create a stored procedure that uses 3 functions (f_stringlength, f_strcopy, f_MID).
It walks through the whole string, pulling out your fields to build your INSERT INTO.
links:
2: http://freeadhocudf.org/documentation_english/dok_eng_file.html
3: http://freeadhocudf.org/documentation_english/dok_eng_string.html

A tool I recently tried that worked outstandingly well is FSQL.
You write an IMPORT command, paste it into FSQL and it imports the CSV file into the Firebird table.

You can use the shell:
sed "s/,/','/g" file.csv > tmp
sed "s/$/'),(/g" tmp > tmp2
sed "s/^./'&/g" tmp2 > insert.sql
and then add
INSERT INTO PERSON (ID, NAME, CITY_ID) VALUES(
...
);

Related

Performance issues with select query for a large CSV file

I'm trying to search for data in a very large CSV file (one million records) using ADO SELECT query and I have a few WHERE clauses in that query.
I cannot transfer this data to any DB (MySql or SQL Server or MS Access) because it is generated daily and I can't transfer it daily to the DB.
I do not have any row id in this .csv file. If there is a row id generated for every .csv by default then please let me know.
Here is the CSV file data sample (first field is date, second is time, third is a value):
CSV FILE SAMPLE DATA
====================
20130714,170056,1.30764
20130714,170122,1.30743
20130714,170132,1.30744
20130714,170205,1.30743
20130714,170214,1.30744
20130714,170216,1.30743
20130714,170244,1.30744
20130714,170325,1.30744
20130714,170325,1.30743
20130714,170325,1.30743
20130714,170325,1.30742
20130714,170504,1.30741
20130714,170519,1.30741
20130714,170519,1.30739
20130714,170522,1.30739
20130714,170522,1.30732
20130714,170522,1.30722
All the CSV records are in order by date and time.
I'm using ADO connection from Excel to CSV file with this source code:
strsql = "SELECT * FROM " & sItem & ".csv WHERE F3>=" & trigPrice & " AND (F1 in (SELECT distinct TOP " & trigWin & " f1 FROM " & sItem & ".csv WHERE (F1>=" & sDay & ")) AND f2>=" & sTime & ")"
Set rs = cn.Execute(strsql)
This one query takes about 10 minutes to execute. How do I reduce the execution time?
The reason that database queries can be fast is that the data is already indexed - that is, it will have quick lookups on some of the fields. When you run a "query" on raw CSV files, the ADO engine must first parse the text into a set of records, then search through them row by row to find records that match your search criteria. If you are planning on doing much more than a few queries on the data, you might as well import it into an indexed database table and avoid parsing the CSV multiple times.
UPDATE
To import the CSV file from VBA, you can use the 'DoCmd.TransferText' method. For example, to import a comma-separated CSV file with headers into a table (with the correct layout) called "tblData", you could do the following:
DoCmd.TransferText acImportDelim, , "tblData", "C:\Path\OF\THE.csv", True
This is the same method used by the Access Import Wizard.
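To illustrate why an indexed table helps, here is a small sketch that loads the date/time/value rows into SQLite and indexes the columns used for filtering. It uses Python's sqlite3 module instead of Access (an assumption for portability). The table name ticks, the index name and the simplified WHERE clause (without the TOP n sub-query from the question) are all illustrative.

```python
import csv
import io
import sqlite3

SAMPLE = """20130714,170056,1.30764
20130714,170122,1.30743
20130714,170132,1.30744
20130714,170205,1.30743
20130714,170214,1.30744
"""

def load_ticks(conn, csv_file):
    """Bulk-load (date, time, value) rows, then index the filter columns."""
    conn.execute("CREATE TABLE ticks (f1 INTEGER, f2 INTEGER, f3 REAL)")
    with conn:  # single transaction for the whole load
        conn.executemany("INSERT INTO ticks VALUES (?, ?, ?)", csv.reader(csv_file))
    conn.execute("CREATE INDEX idx_ticks_date_time ON ticks (f1, f2)")

def find_ticks(conn, start_day, start_time, min_price):
    """Return rows on or after start_day/start_time with value >= min_price."""
    return conn.execute(
        "SELECT f1, f2, f3 FROM ticks "
        "WHERE f1 >= ? AND f2 >= ? AND f3 >= ? ORDER BY f1, f2",
        (start_day, start_time, min_price),
    ).fetchall()

conn = sqlite3.connect(":memory:")
load_ticks(conn, io.StringIO(SAMPLE))
matches = find_ticks(conn, 20130714, 170100, 1.30744)
print(matches)
```

The file is parsed once during the load; every later query runs against the table and its index rather than re-reading the text.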

Problems with field names and appending files in Access SQL

Okay, so I have nearly 200 tables in an Access database. The tables are of plant species abundance data, and I would like to combine them into a master data file. Each table contains basically the same columns of species; however, many are spelled slightly differently.
When I run an SQL query in MS Access it won't let me append the tables with each other because of the field names being spelled just a little different.
Any thoughts that would help?
The query I am running is an append query:
INSERT INTO masterTable SELECT * FROM siteTable
and, as an example, the differences in field names are pretty minor
(e.g. "Spp.A" vs "SppA" or "SpeciesOne" vs "Species1")
Thanks for any help,
Paul
You'll need to use VBA for this. You'll also need to change the column names I'm using in the masterTable, which in my example are just Column1, Column2 and Column3, and to set the maximum column index in a couple of places (I've stuck some comments in, so you can see what needs to be changed).
If you don't usually use VBA, create a form with a button and a click event for the button, put this code in it, then open the form and click the button.
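Dim db As DAO.Database
Dim tdf As DAO.TableDef
Dim ii As Long
Dim sql As String
Set db = CurrentDb()
DoCmd.SetWarnings False
For Each tdf In db.TableDefs
    'skip system tables and the target table itself
    If Left(tdf.Name, 4) <> "MSys" And tdf.Name <> "masterTable" Then
        'change column list as required:
        sql = "INSERT INTO masterTable (Column1, Column2, Column3) SELECT "
        'change 2 to maximum column number - 1:
        For ii = 0 To 2
            'brackets allow field names containing spaces or special characters
            sql = sql & "[" & tdf.Fields(ii).Name & "]"
            'change 2 to maximum column number - 1 again:
            If ii < 2 Then
                sql = sql & ","
            End If
        Next
        'select from the current source table
        sql = sql & " FROM [" & tdf.Name & "]"
        DoCmd.RunSQL sql
    End If
Next
DoCmd.SetWarnings True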
This should work, I think. (I'm hoping there are no syntax errors, as I haven't tested it, but the logic isn't exactly rocket science.)
Hope this helps
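The core of the answer is mapping each source table's fields to the master columns by position rather than by name. A small Python sketch of just that string-building step is below; the master column names and the example field names are placeholders, and nothing here talks to Access.

```python
MASTER_COLUMNS = ["Column1", "Column2", "Column3"]  # placeholder names

def bracket(name):
    """Wrap an identifier in square brackets, Access-style."""
    return "[" + name + "]"

def build_append_sql(source_table, source_fields, master_table="masterTable"):
    """Map source fields to master columns by position and build the append query."""
    if len(source_fields) != len(MASTER_COLUMNS):
        raise ValueError(f"{source_table}: expected {len(MASTER_COLUMNS)} fields")
    targets = ", ".join(bracket(c) for c in MASTER_COLUMNS)
    sources = ", ".join(bracket(f) for f in source_fields)
    return (
        f"INSERT INTO {bracket(master_table)} ({targets}) "
        f"SELECT {sources} FROM {bracket(source_table)}"
    )

# Hypothetical field names spelled differently from the master columns.
sql = build_append_sql("siteTable", ["SppA", "Species1", "SpeciesTwo"])
print(sql)
```

Because the mapping is positional, it only gives correct results if every site table lists its species columns in the same order.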

Generate sql insert script from excel worksheet

I have a large excel worksheet that I want to add to my database.
Can I generate an SQL insert script from this excel worksheet?
I think importing using one of the methods mentioned is ideal if it truly is a large file, but you can use Excel to create insert statements:
="INSERT INTO table_name VALUES('"&A1&"','"&B1&"','"&C1&"')"
In MS SQL you can use:
SET NOCOUNT ON
to suppress all the '1 row affected' messages. If you are inserting a lot of rows and it errors out, put a GO between statements every once in a while.
You can create an appropriate table through the Management Studio interface and insert the data into the table there. It may take some time depending on the amount of data, but it is very handy.
There is a handy tool which saves a lot of time at
http://tools.perceptus.ca/text-wiz.php?ops=7
You just have to feed in the table name, field names and the data - tab separated and hit Go!
You can use the following excel statement:
="INSERT INTO table_name(`"&$A$1&"`,`"&$B$1&"`,`"&$C$1&"`, `"&$D$1&"`) VALUES('"&SUBSTITUTE(A2, "'", "\'")&"','"&SUBSTITUTE(B2, "'", "\'")&"','"&SUBSTITUTE(C2, "'", "\'")&"', "&D2&");"
This improves upon Hart CO's answer as it takes into account column names and gets rid of compile errors due to quotes in the column. The final column is an example of a numeric value column, without quotes.
Depending on the database, you can export to CSV and then use an import method.
MySQL - http://dev.mysql.com/doc/refman/5.1/en/load-data.html
PostgreSQL - http://www.postgresql.org/docs/8.2/static/sql-copy.html
Use the ConvertFrom-ExcelToSQLInsert cmdlet from the ImportExcel module in the PowerShell Gallery:
NAME
ConvertFrom-ExcelToSQLInsert
SYNTAX
ConvertFrom-ExcelToSQLInsert [-TableName] <Object> [-Path] <Object>
[[-WorkSheetname] <Object>] [[-HeaderRow] <int>]
[[-Header] <string[]>] [-NoHeader] [-DataOnly] [<CommonParameters>]
PARAMETERS
-DataOnly
-Header <string[]>
-HeaderRow <int>
-NoHeader
-Path <Object>
-TableName <Object>
-WorkSheetname <Object>
<CommonParameters>
This cmdlet supports the common parameters: Verbose, Debug,
ErrorAction, ErrorVariable, WarningAction, WarningVariable,
OutBuffer, PipelineVariable, and OutVariable. For more information, see
about_CommonParameters (http://go.microsoft.com/fwlink/?LinkID=113216).
ALIASES
None
REMARKS
None
EXAMPLE
ConvertFrom-ExcelToSQLInsert MyTable .\testSQLGen.xlsx
You could use VB to write something that will output to a file row by row adding in the appropriate sql statements around your data. I have done this before.
Here is another tool that works very well...
http://www.convertcsv.com/csv-to-sql.htm
It can take tab separated values and generate an INSERT script. Just copy and paste and in the options under step 2 check the box "First row is column names"
Then scroll down and under step 3, enter your table name in the box "Schema.Table or View Name:"
Pay attention to the delete and create table check boxes as well, and make sure you examine the generated script before running it.
This is the quickest and most reliable way I've found.
You can use the C# method below to generate the insert scripts from an Excel sheet. You just need to install the package that provides the OfficeOpenXml namespace (EPPlus) from the NuGet Package Manager before running the method.
// Needs: using System.IO; using System.Linq; using System.Text;
public string GenerateSQLInsertScripts()
{
    var outputQuery = new StringBuilder();
    var tableName = "Your Table Name";
    var filePath = @"D:\FileName.xlsx";
    if (File.Exists(filePath))
    {
        using (OfficeOpenXml.ExcelPackage xlPackage = new OfficeOpenXml.ExcelPackage(new FileInfo(filePath)))
        {
            var myWorksheet = xlPackage.Workbook.Worksheets.First(); //select the first sheet here
            var totalRows = myWorksheet.Dimension.End.Row;
            var totalColumns = myWorksheet.Dimension.End.Column;
            var columns = new StringBuilder(); //the "INSERT INTO [table] ([col1],[col2]) VALUES (" prefix
            var columnRows = myWorksheet.Cells[1, 1, 1, totalColumns].Select(c => c.Value == null ? string.Empty : c.Value.ToString());
            columns.Append("INSERT INTO [" + tableName + "] (");
            foreach (var colrow in columnRows)
            {
                columns.Append("[");
                columns.Append(colrow);
                columns.Append("]");
                columns.Append(",");
            }
            columns.Length--; //drop the trailing comma
            columns.Append(") VALUES (");
            for (int rowNum = 2; rowNum <= totalRows; rowNum++) //select starting row here
            {
                var dataRows = myWorksheet.Cells[rowNum, 1, rowNum, totalColumns].Select(c => c.Value == null ? string.Empty : c.Value.ToString());
                var finalQuery = new StringBuilder();
                finalQuery.Append(columns);
                foreach (var dataRow in dataRows)
                {
                    finalQuery.Append("'");
                    finalQuery.Append(dataRow.Replace("'", "''")); //escape apostrophes in the data
                    finalQuery.Append("'");
                    finalQuery.Append(",");
                }
                finalQuery.Length--; //drop the trailing comma
                finalQuery.Append(");");
                outputQuery.AppendLine(finalQuery.ToString()); //one statement per line
            }
        }
    }
    return outputQuery.ToString();
}
Here is a link to an Online automator to convert CSV files to SQL Insert Into statements:
CSV-to-SQL
I wrote this formula to insert Excel data into a database.
Here id and price are numeric values, and there is a date field as well. It covers all the value types I needed, so it may be useful to you too.
="insert into product (product_id,name,date,price) values("&A1&",'" &B1& "','" &C1& "'," &D1& ");"
Id | Name       | Date                | price
---|------------|---------------------|------
7  | Product 7  | 2017-01-05 15:28:37 | 200
8  | Product 8  | 2017-01-05 15:28:37 | 40
9  | Product 9  | 2017-01-05 15:32:31 | 500
10 | Product 10 | 2017-01-05 15:32:31 | 30
11 | Product 11 | 2017-01-05 15:32:31 | 99
12 | Product 12 | 2017-01-05 15:32:31 | 25
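The same idea, quoting text and dates but leaving numbers bare, can be sketched in Python by choosing the literal format from each value's type. The function names below are made up for the example, and the two rows are taken from the sample data above.

```python
from datetime import datetime

def to_sql_literal(value):
    """Render a Python value as an SQL literal based on its type."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)  # numbers go in unquoted
    if isinstance(value, datetime):
        return "'" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    return "'" + str(value).replace("'", "''") + "'"  # text: double any apostrophes

def build_insert(table, columns, row):
    """Build one INSERT statement for a row of already-typed values."""
    values = ", ".join(to_sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"

COLUMNS = ["product_id", "name", "date", "price"]
rows = [
    (7, "Product 7", datetime(2017, 1, 5, 15, 28, 37), 200),
    (8, "Product 8", datetime(2017, 1, 5, 15, 28, 37), 40),
]
statements = [build_insert("product", COLUMNS, row) for row in rows]
for statement in statements:
    print(statement)
```

Deciding by type avoids having to hand-edit the formula for each column, and empty cells can be mapped to NULL instead of an empty string.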
I had to make SQL scripts often and add them to source control and send them to DBA.
I used this ExcelIntoSQL App from windows store https://www.microsoft.com/store/apps/9NH0W51XXQRM
It creates complete script with "CREATE TABLE" and INSERTS.
I have a reliable way to generate SQL inserts in batches, and you can adjust some of the parameters along the way. It helps me a lot in my work, for example when copying hundreds of rows into a database with an incompatible structure and a different number of fields.
IntelliJ DataGrip is the powerful tool I use.
DataGrip can take data in bulk from WPS Office or MS Excel, by column or by row.
After copying, DataGrip can export the data as SQL inserts.

Using a VBA array in a SQL statement

I am trying to write some code that uses SQL to delete rows from several tables.
A user would type numbers separated by commas into a textbox, and these are used in the WHERE clause of a SQL DELETE statement.
I have managed to split the string into a variant array and now I want to insert it into my SQL statement.
How do I insert the variable into the SQL statement and have it run through every element of the array?
EDIT: A bit more digging has taught me about For Each...Next statements. This is probably what I'm looking for.
I suggest you build your query in VBA, then your list of numbers can be an IN statement:
sSQL = "DELETE FROM table WHERE ID In (" & MyList & ")"
Where MyList = "1,2,3,4" or such like.
As you can see, you do not need an array and a textbox would be more suitable than a combobox. If you wish to allow your users to select by say, a name, then a listbox is useful. You can iterate through the selected items in the listbox and build a string from IDs to be used in the Delete statement. ( MS Access 2007 - Cycling through values in a list box to grab id's for a SQL statement )
You can then execute the sql against an instance of a database. For example:
Dim db As Database
Set db = CurrentDB
db.Execute sSQL, dbFailOnError
MsgBox "You deleted " & db.RecordsAffected & " records."
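Since the list comes straight from a textbox, it is worth validating it before it reaches the IN clause. The sketch below shows one way to do that in Python with sqlite3 (used here only as a convenient stand-in for Access): every entry must parse as an integer, and the values are passed as parameters. The table name items and the helper names are hypothetical.

```python
import sqlite3

def parse_ids(text):
    """Turn '1, 2,3' into [1, 2, 3]; raises ValueError on non-numeric input."""
    return [int(part) for part in text.split(",") if part.strip()]

def delete_by_ids(conn, table, ids):
    """Delete rows whose ID is in ids; table must be a trusted, fixed name."""
    if not ids:
        return 0
    placeholders = ",".join("?" for _ in ids)
    with conn:
        cursor = conn.execute(
            f"DELETE FROM {table} WHERE ID IN ({placeholders})", ids
        )
    return cursor.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (ID) VALUES (?)", [(n,) for n in range(1, 6)])
deleted = delete_by_ids(conn, "items", parse_ids("1, 3,5"))
print(f"You deleted {deleted} records.")
```

Rejecting anything that is not a number means text such as "1); DROP TABLE items" never becomes part of the statement.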
A generic approach
WHERE
','+Array+',' like '%,'+col+',%'
It will consider all the numbers available in your Array
You could keep it simple and build the string yourself, something like:
var sb = new StringBuilder("DELETE FROM YOURTABLE WHERE ");
for (int i = 0; i < stringArray.Length; i++)
{
    sb.Append("YOURFIELD='" + stringArray[i] + "'");
    //If it is not the last element, add an "OR"
    if (i < stringArray.Length - 1) sb.Append(" OR ");
}
//Now, you have a string like
//DELETE FROM YOURTABLE WHERE YOURFIELD='hello' OR YOURFIELD='YES'
//... do something with the command
This method will fail if you want to run a SQL query on two (or more) columns using values from two different arrays, e.g.
where col1=array1(i) and col2=array2(i)
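For the paired case, one possible workaround is to build one (col1 = ? AND col2 = ?) group per index and join the groups with OR. This is a Python sketch of that idea with a hypothetical helper name; the column names are placeholders and must come from trusted code, not user input.

```python
def pairwise_where(col1, values1, col2, values2):
    """Build '(col1 = ? AND col2 = ?) OR ...' plus the matching parameter list."""
    if len(values1) != len(values2):
        raise ValueError("both arrays must have the same length")
    clause = " OR ".join(f"({col1} = ? AND {col2} = ?)" for _ in values1)
    # Interleave the values so they line up with the placeholders.
    params = [value for pair in zip(values1, values2) for value in pair]
    return clause, params

clause, params = pairwise_where("col1", [1, 2], "col2", ["a", "b"])
print("DELETE FROM YOURTABLE WHERE " + clause, params)
```

Each pair is matched together, so the row (1, "b") is not selected even though both values appear somewhere in the arrays.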

how to replace text in a multifield value column in access

I've got a table, tablea, like the one below. I know it's bad design to have a multivalued column, but I'm really looking for a hack right now.
Student | Age | Classes
--------|------|----------
foo | 23 | classone, classtwo, classthree, classfour
bar | 24 | classtwo, classfive, classeight
When I run a simple select query like the one below, I want the results to show every occurrence of classtwo as class2.
select student, classes from tablea;
I tried the replace() function, but it doesn't work on multivalued fields >_<
You are in a tough situation and I can't think of a SQL solution for you. I think your best option would be to write a VB function that takes the string of data, parses it, replaces the value, and returns the updated string that you can then use to update your data.
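The parse, replace and rejoin logic such a function would need is sketched here in Python rather than VB, purely to show the steps. The function name is made up, and it assumes the classes arrive as a single comma-separated string.

```python
def replace_in_list(classes, old, new, sep=","):
    """Replace whole items equal to old in a separated list, leaving others alone."""
    items = [item.strip() for item in classes.split(sep)]
    return (sep + " ").join(new if item == old else item for item in items)

row = "classone, classtwo, classthree, classfour"
print(replace_in_list(row, "classtwo", "class2"))
```

Comparing whole items instead of calling a plain string replace means a value such as classtwofold would be left untouched.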
I can cook up quite a few ways to solve this.
You can explode the mv by using Classes.Value in your query. This will cause one row to appear for each value in the query and thus you now can use replace on that. However, this will result in one separate row for each class.
So use this:
Select student, classes.Value from tablea
Or, for this example:
Select student, replace(classes.Value,"classtwo","class2") as myclass
from tablea
If you want one line, AND ALSO the multi-value classes are NOT from another table (else they will return an ID, not text), then you can use the following trick:
Select student, dlookup("Classes","tablea","id = " & [id]) as sclasses
from tablea
The above will return the classes separated by a space as a string if you use dlookup(). So just add replace to the above SQL. I suppose if you want, you could also do replace on the space back to a "," for display.
Last but not least, if those classes are coming from another table, then the dlookup() idea above will not work. So simply create a VBA function.
Your query becomes:
Select student, strClass([id]) as sclasses from tablea
And in a standard code module you create a public function like this:
Public Function strClass(id As Variant) As String
    Dim rst As DAO.Recordset
    If IsNull(id) = False Then
        Set rst = CurrentDb.OpenRecordset("select Classes.Value from tableA where id = " & id)
        Do While rst.EOF = False
            If strClass <> "" Then strClass = strClass & ","
            strClass = strClass & Replace(rst(0), "classtwo", "class2")
            rst.MoveNext
        Loop
        rst.Close
        Set rst = Nothing
    End If
End Function
Also, if you are sending this out to a report, then you can DUMP ALL of the above ideas: simply bind the field to a text box on the report and put ONE replace() call around it in the text box control. It is quite likely you are going to send this out to a report, but you did ask how to do this in a query, and that might be the wrong question, since you can "fix" this issue in the report writer without modifying the data at the query level. I also think the replace() call used in the report writer would likely perform the best. However, the above query can be exported, so it really depends on the final goal here.
So lots of easy ways to do this.