SQL BULK INSERT with conditions

What I'm trying to do is read a text file and then use BULK INSERT to create a table.
This is an example of how the text file looks:
TIME DATE USER_NAME VALUE
11:10:04 10/02/15 Irene I. Moosa
There are a lot of rows, and I mean a lot, but sometimes the time is empty or the line ending is not just a simple Enter, and I'm trying to compensate for that.
Is something like this possible:
BULK INSERT #TEMP FROM 'C:\QPR_Logs\Audit\MetricsServerAudit.txt'
WHERE [TIME] IS NOT NULL WITH (FIELDTERMINATOR =' ', ROWTERMINATOR = '\n')
Something like that, so that if it reads a null value it just skips the line?
For the end-of-line character I'm not exactly sure what to use.
Has anyone got any suggestions?

Try OPENROWSET. Since you have custom row/column terminators, you might require a format file.
select t1.*
from openrowset(bulk 'c:\folder\file1.csv'
, formatfile = 'c:\folder\values.fmt'
, firstrow = 2) as t1
where t1.[TIME] is not null
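If one turns out to be needed, a non-XML format file for the four space-separated columns above might look like the sketch below (the version line, field lengths, and the empty collation placeholders are assumptions, not taken from the original post):
12.0
4
1   SQLCHAR   0   20    " "      1   TIME        ""
2   SQLCHAR   0   20    " "      2   DATE        ""
3   SQLCHAR   0   100   " "      3   USER_NAME   ""
4   SQLCHAR   0   100   "\r\n"   4   VALUE       ""
Each row lists the host field order, host file data type, prefix length, maximum field length, terminator, server column order, server column name, and collation.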

Related

How to insert null value when datatype error is fired using Bulk Insert?

I'm using the BULK INSERT command in T-SQL because I have a very large CSV file. This file has fields that use numeric data types like float, but these fields sometimes contain string values like "S/N".
So, how can I force those fields to NULL when I hit a string value while inserting?
This is the command I'm using now; can you help me?
BULK INSERT [CUBOS_FINDIRECT].[dbo].[ListadoRobinsonTXT]
FROM '\\10.0.20.17\d$\listadoRobinson\listadoRobinsonDef.csv'
WITH
(
FIRSTROW = 2,
MAXERRORS = 0,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
Thank you so much in advance.
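A common workaround, sketched below with hypothetical column names (Field1, FloatField) and a hypothetical staging table, assuming SQL Server 2012+ for TRY_CONVERT: load every field as plain text first, then let TRY_CONVERT turn values like "S/N" into NULL on the way into the real table.
-- Hypothetical staging table: every CSV field lands as plain text
CREATE TABLE dbo.ListadoRobinson_Staging
(
    Field1 varchar(255),
    FloatField varchar(255)
);

BULK INSERT dbo.ListadoRobinson_Staging
FROM '\\10.0.20.17\d$\listadoRobinson\listadoRobinsonDef.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');

-- TRY_CONVERT returns NULL instead of raising an error for strings like 'S/N'
INSERT INTO [CUBOS_FINDIRECT].[dbo].[ListadoRobinsonTXT] (Field1, FloatField)
SELECT Field1, TRY_CONVERT(float, FloatField)
FROM dbo.ListadoRobinson_Staging;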

Import CSV into SQL (CODE)

I want to import several CSV files automatically using SQL code (i.e. without using the GUI). Normally I know the dimensions of my CSV file, so in many cases I create an empty table with, let's say, x columns with the corresponding data types, and then import the CSV file into this table using BULK INSERT. In this case, however, I don't know much about my files, i.e. information about data types and dimensions is not given.
To summarize the problem:
I receive a file path, e.g. C:...\DATA.csv. I then want to use this path in SQL code to import the file into a table without knowing anything about it.
Any ideas on how to solve this problem?
Use something like this:
BULK INSERT tbl
FROM 'csv_full_path'
WITH
(
FIRSTROW = 2, --Second row if header row in file
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
ERRORFILE = 'error_file_path',
TABLOCK
)
If the columns are not known, you could try:
select * from OPENROWSET(...)
Or do a bulk insert with only the first row as one big column, then parse it to build the dynamic main insert. Or bulk insert the whole file into a table with just one column, then parse that...
You can use OPENROWSET (see the documentation).
SELECT *
INTO dbo.MyTable
FROM
OPENROWSET(
BULK 'C:\...\mycsvfile.csv',
SINGLE_CLOB) AS DATA;
In addition, you can use dynamic SQL to parameterize table name and location of csv file.
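For example, a minimal dynamic SQL sketch (the table name and file path below are placeholders, not values from the question):
DECLARE @table sysname = N'MyTable';                     -- placeholder table name
DECLARE @path  nvarchar(260) = N'C:\Temp\mycsvfile.csv'; -- placeholder file path

-- QUOTENAME guards the identifier; doubled quotes guard the path literal
DECLARE @sql nvarchar(max) =
    N'SELECT * INTO dbo.' + QUOTENAME(@table) +
    N' FROM OPENROWSET(BULK ''' + REPLACE(@path, N'''', N'''''') +
    N''', SINGLE_CLOB) AS DATA;';

EXEC sys.sp_executesql @sql;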

sql put " inside "" when doing a bulk insert stored procedure

My procedure for bulk insert is below:
ALTER PROCEDURE [dbo].[usp_impt]
AS
BEGIN
Declare @SQL1 varchar(150), @path varchar(100),
@pathtable varchar(100), @date datetime
set @date = getdate()
-- set path for files
set @path = '\\ff\avc\ce\ed_imp\'
set @pathtable = @path + 'ABC20140723.csv'
-- Delete data from tables
delete from table1
-- set sql
set @SQL1 = "BULK INSERT dbo.table1 FROM '" + @pathtable
+ "' WITH (FIRSTROW = 3,MAXERRORS = 0,FIELDTERMINATOR = ',')"
-- Bulk insert
exec(@SQL1)
end
It works fine except when my data contains values like "Google,Inc", which get split into "Google" and "Inc".
I want to write FIELDTERMINATOR = '","' instead of ',', but I don't know how to put that into my @SQL1 string.
Also, is it recommended that I write a format file? My CSV file has 200 columns and many rows. Do I need to write an entry for each one? Thanks for any advice.
The short answer is that you can't do this with BULK INSERT as is. BULK INSERT can't tell which comma is a delimiter and which comma is part of the data. Your source data apparently doesn't have its string field values quoted (i.e. surrounded by quotation marks). For comma-separated fields, if ANY of the data contains a comma, it must be surrounded by quotation marks or BULK INSERT will treat it as a field terminator. You will have to update your source data to surround strings with quotation marks before inserting.
I ran into this frequently while running data loads from text files and CSV files. The only solution was to get the source of the data to add quotes, to use a different delimiter such as a pipe ( | ), or to switch to fixed-width fields. In any case, the source data has to be fixed.
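That said, for the string-building half of the question: a terminator containing quotation marks goes into the dynamic string by doubling each embedded single quote. A sketch only (as explained above, it works only if every field in the file is actually quoted):
-- Sketch: embed FIELDTERMINATOR = '","' by doubling the single quotes
set @SQL1 = 'BULK INSERT dbo.table1 FROM ''' + @pathtable
+ ''' WITH (FIRSTROW = 3, MAXERRORS = 0, FIELDTERMINATOR = ''","'')'
exec(@SQL1)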

Inserting Dates with BULK INSERT

I have a CSV file, which contains three dates:
'2010-07-01','2010-08-05','2010-09-04'
When I try to bulk insert them...
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
I get an error:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (CREATED_ON).
So I'm assuming this is because I have an invalid date format. What is the correct format to use?
EDIT
CREATE TABLE [dbo].[STUDY]
(
[CREATED_ON] DATE,
[COMPLETED_ON] DATE,
[AUTHORIZED_ON] DATE
)
You've got quotes (') around your dates. Remove those and it should work.
Does your data file have a header record? If it does, the header values obviously will not be the correct data types, and the load will fail when SQL Server tries to INSERT them into your table. Try this:
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
According to MSDN the BULK INSERT operation technically doesn't support skipping header records in the CSV file. You can either remove the header record or try the above. I don't have SQL Server in front of me at the moment, so I have not confirmed this works. YMMV.

SQL Bulk Insert with FIRSTROW parameter skips the following line

I can't seem to figure out how this is happening.
Here's an example of the file that I'm attempting to bulk insert into SQL server 2005:
***A NICE HEADER HERE***
0000001234|SSNV|00013893-03JUN09
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
Here's my bulk insert statement:
BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
But, for some reason the only output I can get is:
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
The first record always gets skipped, unless I remove the header altogether and don't use the FIRSTROW parameter. How is this possible?
Thanks in advance!
I don't think you can skip rows in a different format with BULK INSERT/BCP.
When I run this:
TRUNCATE TABLE so1029384
BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
SELECT * FROM so1029384
I get:
col1 col2 col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234 SSNV 00013893-03JUN09
0000005678 ABCD 00013893-03JUN09
0000009112 0000 00013893-03JUN09
0000009112 0000 00013893-03JUN09
It looks like it requires the '|' even in the header data, because it reads everything up to the first '|' into the first column, swallowing a newline into the first column along the way. Obviously, if you include a field terminator parameter, it expects that every row MUST have one.
You could strip the row with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header). Or use a tool which can handle this, like SSIS.
Maybe check that the header has the same line-ending as the actual data rows (as specified in ROWTERMINATOR)?
Update: from MSDN:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
I found it easiest to just read the entire line into one column then parse out the data using XML.
IF (OBJECT_ID('tempdb..#data') IS NOT NULL) DROP TABLE #data
CREATE TABLE #data (data VARCHAR(MAX))

-- Load each raw line into a single column, skipping the header
BULK INSERT #data FROM 'E:\filefromabove.txt' WITH (FIRSTROW = 2, ROWTERMINATOR = '\n')

IF (OBJECT_ID('tempdb..#dataXml') IS NOT NULL) DROP TABLE #dataXml
CREATE TABLE #dataXml (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, data XML)

-- Turn each pipe-delimited line into an XML fragment: a|b|c becomes <r><d>a</d><d>b</d><d>c</d></r>
INSERT #dataXml (data)
SELECT CAST('<r><d>' + REPLACE(data, '|', '</d><d>') + '</d></r>' AS XML)
FROM #data

-- Shred the XML fragments back into columns
SELECT d.data.value('(/r//d)[1]', 'varchar(max)') AS col1,
       d.data.value('(/r//d)[2]', 'varchar(max)') AS col2,
       d.data.value('(/r//d)[3]', 'varchar(max)') AS col3
FROM #dataXml d
You can use the below snippet
BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
ERRORFILE = 'E:\ErrorRows.csv',
TABLOCK
)
To let SQL Server handle quote escaping and everything else (FORMAT = 'CSV' requires SQL Server 2017 or later), do this:
BULK INSERT Test_CSV
FROM 'C:\MyCSV.csv'
WITH (
FORMAT='CSV'
--FIRSTROW = 2, --uncomment this if your CSV contains a header row, so parsing starts at line 2
);
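As a side note, a hedged sketch: if the file quotes its fields with a character other than the default ", SQL Server 2017+ also accepts a FIELDQUOTE option alongside FORMAT = 'CSV':
BULK INSERT Test_CSV
FROM 'C:\MyCSV.csv'
WITH (
FORMAT = 'CSV',
FIELDQUOTE = '"' --default is the double quote; change if your file uses another character
);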
Regarding the other answers, here is some valuable info as well:
I keep seeing ROWTERMINATOR = '\n' in the answers.
\n means LF, which is the Linux-style EOL.
On Windows the EOL is made of two characters, CRLF, so you need ROWTERMINATOR = '\r\n'.
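For example, a sketch of the earlier snippet adjusted for a Windows-generated file:
BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\r\n' --CRLF: Windows-style line endings
)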
Given how mangled some data can look after a BCP import into SQL Server from non-SQL data sources, I'd suggest doing all BCP imports into scratch tables first.
For example
truncate table Address_Import_tbl
BULK INSERT dbo.Address_Import_tbl
FROM 'E:\external\SomeDataSource\Address.csv'
WITH (
FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', MAXERRORS = 10
)
Make sure all the columns in Address_Import_tbl are nvarchar, to make the load as type-agnostic as possible and avoid type conversion errors.
Then apply whatever fixes you need to Address_Import_tbl, like deleting the unwanted header.
Then run an INSERT ... SELECT query to copy from Address_Import_tbl to Address_tbl, along with any data type conversions you need, for example casting imported dates to SQL DATETIME.
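A minimal sketch of that last step, with hypothetical column names (TRY_CONVERT requires SQL Server 2012 or later):
INSERT INTO dbo.Address_tbl (AccountId, Street, CreatedOn)
SELECT TRY_CONVERT(int, AccountId),      -- NULL instead of an error on bad values
       NULLIF(LTRIM(RTRIM(Street)), ''), -- blank imports become NULL
       TRY_CONVERT(datetime, CreatedOn)  -- cast imported dates to SQL DATETIME
FROM dbo.Address_Import_tbl
WHERE AccountId <> 'AccountId';          -- drop the leftover header row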