SQL; csv import with semicolons in data and double quotes - sql

I want to import a CSV file that contains values like this:
123;456;"78;9";1011
Put simply, one value contains a semicolon, but that value is enclosed in double quotes. When I use a bulk import, the value '"78' is put into one column, whereas '9"' is put into the next column. How can I prevent this?
I am using the query below:
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
GO
I'm using SQL Server!
In a test environment I've set up the new SQL Server, and FIELDQUOTE seems to be ignored in the statement: the fields are still split up. What am I doing wrong? This is what I'm running:
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
FIELDQUOTE='"'
)
GO
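To make the splitting problem concrete, here is a small Python sketch (not SQL Server code) that contrasts a plain split on the delimiter with a quote-aware CSV parse of the sample line. The variable names are made up for illustration. As far as I know, FIELDQUOTE is only honored together with FORMAT = 'CSV' on SQL Server 2017 and later, but treat that as an assumption to check against the documentation for your version.

```python
import csv
import io

line = '123;456;"78;9";1011\n'

# A plain split on the delimiter ignores quoting, so the quoted value is cut in two.
naive_fields = line.rstrip("\n").split(";")

# A quote-aware CSV reader keeps the quoted section together as one field.
csv_fields = next(csv.reader(io.StringIO(line), delimiter=";", quotechar='"'))

print(naive_fields)  # five fields, including '"78' and '9"'
print(csv_fields)    # four fields, including '78;9'
```

The first result mirrors what the question describes; the second is what a CSV-aware import is expected to produce.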

Related

Bulk Insert of CSV still has double quotes in it in SQL Server 2019 database

Please help. This is the BULK INSERT code I use to insert employee data into SQL Server 2019. The data is being inserted, but still with quotes. How can I insert the data without quotes?
Please let me know what I am doing wrong.
BULK INSERT dbo.tblActiveDirData
FROM 'E:\EmployeeData\ATadUserlistDB.csv'
WITH
(
DATAFILETYPE = 'char',
FIRSTROW = 2,
FIELDTERMINATOR= ',',
ROWTERMINATOR = '\n'
)
This worked:
BULK INSERT Test_CSV
FROM 'C:\MyCSV.csv'
WITH ( FORMAT='CSV');
No other options, such as the field terminator or DATAFILETYPE, were needed.

Remove quotation marks using bulk insert in SQL Server

I am trying to use bulk insert for a .txt file whose fields are separated by commas, but a few columns are also wrapped in double quotes, so when bulk insert is used some rows are not inserted properly.
Also, I have to use bulk insert and not import/export functionality since I am automating my process of inserting the values in the table.
Here is the sample data: test.txt
ID, Date, Phone, Name
1,12/31/2017,"7415236541","Name1"
2,12/31/2017,"8524123652","Name2"
3,12/31/2017,"9853214536","Name2"
I use the following code, but it does not help
BULK INSERT xImportTable
FROM 'C:\Files\CSV\test.csv'
WITH
( FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
But this code does not remove the double quotes.
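If the server version does not offer a quote-aware import, one possible workaround is to pre-process the file before the bulk insert runs. The sketch below is only an illustration in Python: `unquote_csv` is a hypothetical helper, and the choice of `|` as the output delimiter is an assumption (it must not occur in the data, which the function checks).

```python
import csv
import io

def unquote_csv(text, out_delimiter="|"):
    """Re-emit quoted CSV text as unquoted lines joined by out_delimiter."""
    out_lines = []
    # skipinitialspace drops the blank after each comma in the header row.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if any(out_delimiter in field for field in row):
            raise ValueError(f"delimiter {out_delimiter!r} occurs in data: {row}")
        out_lines.append(out_delimiter.join(row))
    return "\n".join(out_lines) + "\n"

sample = (
    'ID, Date, Phone, Name\n'
    '1,12/31/2017,"7415236541","Name1"\n'
    '2,12/31/2017,"8524123652","Name2"\n'
)
cleaned = unquote_csv(sample)
print(cleaned)
```

The cleaned text could then be written to a file and loaded with FIELDTERMINATOR = '|', which keeps the whole step scriptable for an automated process.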

Many FIELDTERMINATORs or first two characters?

I want to perform BULK INSERT like this:
USE AdventureWorks;
GO
BULK INSERT myDepartment FROM 'C:\myDepartment-c-t.txt'
WITH (
DATAFILETYPE = 'SQLCHAR',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
);
GO
The strings to be imported look like "/useless" or "/practical". I need them to be transformed to "/u" or "/p" (the first two characters).
There are two options to do this:
select only the first two characters
make FIELDTERMINATOR equal to "s" OR "r"
Is either of these options possible?
A fixed-width import may be more suitable for this task than using delimiters.
Bulk insert fixed width fields
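Another way to get only the first two characters is to shorten the column in a pre-processing step before the import. This is a minimal Python sketch under assumptions: `truncate_column` is a hypothetical helper, and the two-column layout of the sample rows is invented because the question does not show the file.

```python
def truncate_column(lines, column, width=2, delimiter=","):
    """Return lines with one delimited column cut to its first `width` characters."""
    result = []
    for line in lines:
        fields = line.split(delimiter)
        fields[column] = fields[column][:width]
        result.append(delimiter.join(fields))
    return result

# Hypothetical rows: an ID followed by the string to shorten.
rows = ["1,/useless", "2,/practical"]
shortened = truncate_column(rows, column=1)
print(shortened)
```

The same effect could also be had after the import, by loading the full strings into a staging table and keeping only the leading characters when copying them on.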

Special characters displaying incorrectly after BULK INSERT

I'm using BULK INSERT to import a CSV file. One of the columns in the CSV file contains some values that contain fractions (e.g. 1m½f).
I don't need to do any mathematical operations on the fractions, as the values will just be used for display purposes, so I have set the column as nvarchar. The BULK INSERT works but when I view the records within SQL the fraction has been replaced with a cent symbol (¢) so the displayed text is 1m¢f.
I'm interested to understand why this is happening and any thoughts on how to resolve the issue. The BULK INSERT command is:
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' );
You need to run BULK INSERT with CODEPAGE = 'ACP', which converts char data from the ANSI/Windows code page (1252) to the SQL Server code page.
BULK INSERT dbo.temp FROM 'C:\Temp\file.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', CODEPAGE = 'ACP');
If you are bringing in UTF-8 data on a new enough version of SQL Server:
[...] , CODEPAGE = '65001');
You may also need to specify DATAFILETYPE as one of 'char', 'native', 'widechar', or 'widenative'.
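As for why the fraction turns into a cent symbol: the byte that means ½ in Windows code page 1252 means ¢ in OEM code page 850. The Python sketch below reproduces the effect. It assumes the file was saved as code page 1252 and that the import read it with an OEM code page of 850; the actual OEM code page on your server may differ.

```python
fraction = "1m½f"

# In Windows code page 1252 the character ½ is stored as the single byte 0xBD.
raw = fraction.encode("cp1252")

# Reading the same byte as OEM code page 850 (an assumed server setting) yields ¢.
misread = raw.decode("cp850")

print(raw)
print(misread)
```

Telling the import which code page the file really uses, as in the answer above, avoids this reinterpretation.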

SQL Bulk Insert with FIRSTROW parameter skips the following line

I can't seem to figure out how this is happening.
Here's an example of the file that I'm attempting to bulk insert into SQL Server 2005:
***A NICE HEADER HERE***
0000001234|SSNV|00013893-03JUN09
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
Here's my bulk insert statement:
BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
But, for some reason the only output I can get is:
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
The first record always gets skipped, unless I remove the header altogether and don't use the FIRSTROW parameter. How is this possible?
Thanks in advance!
I don't think you can skip rows in a different format with BULK INSERT/BCP.
When I run this:
TRUNCATE TABLE so1029384
BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
SELECT * FROM so1029384
I get:
col1 col2 col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234 SSNV 00013893-03JUN09
0000005678 ABCD 00013893-03JUN09
0000009112 0000 00013893-03JUN09
0000009112 0000 00013893-03JUN09
It looks like it requires the '|' even in the header data, because it reads up to that into the first column - swallowing up a newline into the first column. Obviously if you include a field terminator parameter, it expects that every row MUST have one.
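The swallowing effect can be imitated with a few lines of Python. This is a simplified model of a terminator-driven reader, not SQL Server's actual implementation, and `read_rows` is a hypothetical name: every field except the last ends at the field terminator, and the last field ends at the row terminator.

```python
def read_rows(text, n_cols, field_term="|", row_term="\n"):
    """Split text so the first n_cols - 1 fields end at field_term
    and the last field of each row ends at row_term."""
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for col in range(n_cols):
            term = field_term if col < n_cols - 1 else row_term
            end = text.find(term, pos)
            if end == -1:  # no terminator left: take the remainder
                end = len(text)
            row.append(text[pos:end])
            pos = end + len(term)
        rows.append(row)
    return rows

data = (
    "***A NICE HEADER HERE***\n"
    "0000001234|SSNV|00013893-03JUN09\n"
    "0000005678|ABCD|00013893-03JUN09\n"
    "0000009112|0000|00013893-03JUN09\n"
    "0000009112|0000|00013893-03JUN09\n"
)
rows = read_rows(data, n_cols=3)
print(rows[0])   # the header and the first record merged into one row
print(rows[1:])  # what is left when the first "row" is skipped
```

In this model the header has no '|', so it is merged with the first data record, and skipping one row drops both. That matches the three rows the question reports.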
You could strip the row with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header). Or use a tool which can handle this, like SSIS.
Maybe check that the header has the same line-ending as the actual data rows (as specified in ROWTERMINATOR)?
Update: from MSDN:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
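The pre-processing step mentioned above could look roughly like this in Python. It is a sketch only: `strip_header` and the output file name are hypothetical, and the files are handled as bytes so that the original line endings and encoding pass through unchanged.

```python
import os
import tempfile

def strip_header(src_path, dst_path, header_lines=1):
    """Copy src_path to dst_path, dropping the first header_lines lines."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for index, line in enumerate(src):
            if index >= header_lines:
                dst.write(line)

# Demonstration on a temporary copy of the sample file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "filefromabove.txt")
    dst = os.path.join(tmp, "filefromabove_noheader.txt")
    with open(src, "wb") as f:
        f.write(b"***A NICE HEADER HERE***\r\n0000001234|SSNV|00013893-03JUN09\r\n")
    strip_header(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(result)
```

The header-free file can then be bulk inserted without FIRSTROW.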
I found it easiest to just read the entire line into one column then parse out the data using XML.
IF (OBJECT_ID('tempdb..#data') IS NOT NULL) DROP TABLE #data
CREATE TABLE #data (data VARCHAR(MAX))
BULK INSERT #data FROM 'E:\filefromabove.txt' WITH (FIRSTROW = 2, ROWTERMINATOR = '\n')
IF (OBJECT_ID('tempdb..#dataXml') IS NOT NULL) DROP TABLE #dataXml
CREATE TABLE #dataXml (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, data XML)
INSERT #dataXml (data)
SELECT CAST('<r><d>' + REPLACE(data, '|', '</d><d>') + '</d></r>' AS XML)
FROM #data
SELECT d.data.value('(/r//d)[1]', 'varchar(max)') AS col1,
d.data.value('(/r//d)[2]', 'varchar(max)') AS col2,
d.data.value('(/r//d)[3]', 'varchar(max)') AS col3
FROM #dataXml d
You can use the snippet below:
BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
ERRORFILE = 'E:\ErrorRows.csv',
TABLOCK
)
To let SQL Server handle quote escaping and everything else, do this:
BULK INSERT Test_CSV
FROM 'C:\MyCSV.csv'
WITH (
FORMAT = 'CSV'
--, FIRSTROW = 2 -- uncomment this if your CSV has a header row, so parsing starts at line 2
);
Regarding the other answers, here is some more useful information:
I keep seeing ROWTERMINATOR = '\n' in all the answers.
\n means LF, which is the Linux-style EOL.
On Windows the EOL is two characters, CRLF, so you need ROWTERMINATOR = '\r\n'.
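Before choosing a ROWTERMINATOR it can help to check which line endings the file really contains. The following Python sketch counts them on raw bytes; `detect_row_terminator` is a hypothetical helper. Note that SQL Server's documentation on field and row terminators suggests BULK INSERT may itself interpret '\n' as CRLF, and that LF-only files may need ROWTERMINATOR = '0x0a', so it is worth verifying the behavior on your version.

```python
def detect_row_terminator(raw: bytes) -> str:
    """Classify the line endings found in raw file bytes."""
    crlf = raw.count(b"\r\n")
    lf_only = raw.count(b"\n") - crlf  # LF bytes not preceded by CR
    if crlf and not lf_only:
        return "CRLF"
    if lf_only and not crlf:
        return "LF"
    if crlf and lf_only:
        return "mixed"
    return "none"

windows_style = detect_row_terminator(b"a|b\r\nc|d\r\n")
unix_style = detect_row_terminator(b"a|b\nc|d\n")
mixed_style = detect_row_terminator(b"header\na|b\r\n")
print(windows_style, unix_style, mixed_style)
```

A "mixed" result is a hint that a header or footer line was produced differently from the data rows, which ties in with the header problem in the question.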
Given how mangled some data can look after a BCP import into SQL Server from non-SQL data sources, I'd suggest doing the whole import into scratch tables first.
For example:
truncate table Address_Import_tbl
BULK INSERT dbo.Address_Import_tbl
FROM 'E:\external\SomeDataSource\Address.csv'
WITH (
FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', MAXERRORS = 10
)
Make sure all the columns in Address_Import_tbl are nvarchar, to keep the table as type-agnostic as possible and avoid type conversion errors.
Then apply whatever fixes you need to Address_Import_tbl, such as deleting the unwanted header.
Then run an INSERT ... SELECT query to copy from Address_Import_tbl to Address_tbl, along with any data type conversions you need, for example casting imported dates to DATETIME.
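The scratch-table workflow can be tried out end to end with Python's built-in sqlite3 module. This is only a stand-in for SQL Server: the column names and sample rows are invented for illustration, and SQLite's TEXT type and date() function take the place of nvarchar and a DATETIME cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address_Import_tbl (id TEXT, city TEXT, created TEXT);
    CREATE TABLE Address_tbl (id INTEGER, city TEXT, created TEXT);
""")

# Step 1: load everything as text, header row included (stands in for BULK INSERT).
raw_rows = [
    ("ID", "City", "Created"),
    ("1", "Oslo", "2017-12-31"),
    ("2", "Bergen", "2018-01-15"),
]
conn.executemany("INSERT INTO Address_Import_tbl VALUES (?, ?, ?)", raw_rows)

# Step 2: fix up the scratch table, here by deleting the unwanted header.
conn.execute("DELETE FROM Address_Import_tbl WHERE id = 'ID'")

# Step 3: copy into the real table, converting types on the way.
conn.execute("""
    INSERT INTO Address_tbl (id, city, created)
    SELECT CAST(id AS INTEGER), city, date(created)
    FROM Address_Import_tbl
""")

final_rows = conn.execute(
    "SELECT id, city, created FROM Address_tbl ORDER BY id"
).fetchall()
print(final_rows)
```

Because the scratch table accepts any text, a malformed row shows up as data you can inspect and repair instead of failing the import.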