SQL Server Bulk Insert Text File SEC Data

I am trying to do a bulk insert from the SEC text file named Tag, which includes several columns. I have a table that I am trying to insert the data into, but the load produces only a single row, so I suspect the delimiters are somehow wrong. Here is the DDL for the table in SQL Server:
CREATE TABLE [dbo].[Tag1](
[tag] [char](1000) NULL,
[version] [char](5000) NULL,
[custom] [char](100) NULL,
[abstract] [char](100) NULL,
[datatype] [char](500) NULL,
[iord] [char](22) NULL,
[crdr] [char](22) NULL,
[tlabel] [varchar](max) NULL,
[doc] [varchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
And here is the code I am using to do the bulk insert. It only inserts a single row, and I wonder if I haven't specified the delimiters correctly.
BULK INSERT dbo.Tag1
FROM 'F:\SEC\FirstQuarter2020\Tag.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
);

The only way I was able to get it to work was to remove the \r from the ROWTERMINATOR in the BULK INSERT and leave just the \n (new line/line feed). I don't have your exact file, but I was able to replicate the issue with my own version; I tested this with both a CSV and a tab-delimited version.
BULK INSERT dbo.Tag1
FROM 'C:\STORAGE\Tag.txt'
WITH
(
FIRSTROW = 2, --First row is header
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
);
SELECT *
FROM dbo.Tag1
In Notepad++ I can see that the row terminator actually is \r\n (shown in Notepad++ as CR LF). But for some reason, using \r\n as the ROWTERMINATOR in the BULK INSERT ends up inserting everything into a single row, as you described in your post.
Notepad++ Tab Delimited Screenshot:
SQL Server Screenshot of Bulk Insert:

Here is what worked! The row terminator needed to be in hex, so thank you for pointing me to that!
BULK INSERT dbo.Tag1
FROM 'F:\SEC\FirstQuarter2020\Tag.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '0x0a'
);

Related

SQL Server Bulk Insert of UTF-8 CSV file not working correctly

I am trying to bulk insert a UTF-8 CSV file that I downloaded as that type from Google Drive, because Excel was not saving my CSV correctly.
I opened the Google Drive generated CSV file in Notepad++, went to View > Show Symbol > Show All Characters, and could see that it contained LF line feeds as the row terminator (correct me if I am wrong here).
So I tried the below, and I don't get any records in the temp table. This works for other CSV files that are not UTF-8 when I use the default row terminator (i.e. '\r\n' when you don't specify one).
I have also tried '\t', '\r\n', '\r' and '\0' as row terminators, with and without a data file type, but nothing seems to work. Is this to do with the field types in my temp table, or something else?
CREATE TABLE #TEMPResourceContents (
[ResourceName] [nvarchar](250) NOT NULL,
[Language] [nvarchar](250) NOT NULL,
[Content] [nvarchar](max) NOT NULL
)
GO
BULK INSERT #TEMPResourceContents
FROM 'C:\import-resources.csv'
WITH
(FIRSTROW = 2, DATAFILETYPE = 'widechar', FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
GO
SELECT * FROM #TEMPResourceContents
By the way, BULK INSERT doesn't support UTF-8 (prior to SQL Server 2016).
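As a caveat to the answer above: starting with SQL Server 2016, BULK INSERT does accept CODEPAGE = '65001' (UTF-8); earlier versions reject it, and there you would need to re-save the file as UTF-16 and keep DATAFILETYPE = 'widechar'. A minimal sketch against the temp table from the question, assuming the file genuinely uses LF line endings:

```sql
-- SQL Server 2016+ only; older versions raise an error for code page 65001
BULK INSERT #TEMPResourceContents
FROM 'C:\import-resources.csv'
WITH
(
    FIRSTROW = 2,
    CODEPAGE = '65001',      -- UTF-8
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
```

Note that CODEPAGE and DATAFILETYPE = 'widechar' are alternatives: widechar is for UTF-16 files, so drop it when reading UTF-8 with CODEPAGE.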

Bulk insert to table

I have the following table:
CREATE TABLE [dbo].[tempTable](
[id] [varchar](50) NULL,
[amount] [varchar](50) NULL,
[bdate] [varchar](50) NULL
)
and the following insert statement:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR ='\n'
)
I get the following error:
Bulk load data conversion error (truncation) for row 1, column 3
(bdate).
Data example in file:
12313 24 2012-06-08 13:25:49
12314 26 2012-06-08 12:25:49
It does look like the rows are never being delimited at all. I've had to separate rows by both the column delimiter AND the row delimiter, because the text file had a trailing (and unnecessary) column delimiter after the last value that took me a while to spot. Those dates would certainly fit the format (assuming there isn't simply some bad data in a huge file that you can't visually spot; since the load doesn't fail until 10 errors by default, there would be at least that many bad records), and it looks like parsing is making it to that point correctly. View the file in hex in a good text editor if you can, or just try:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR = '\t\n'
)
Another possibility (which I doubt, considering the column is varchar(50)) is that there is a header row in the inv123.txt file that is being read as data and exceeding varchar(50), and that is what is being truncated. In that case you can add
FIRSTROW = 2,
If it still fails after these things, try to force some data in, or capture the rows that are erroring so you know exactly where the problem is. Look into SET ANSI_WARNINGS OFF, or use ERRORFILE (depending on which flavor of SQL Server you are on), or create a temp table with text as the datatype. SQL Server 2005 enforces stricter data validation, and forcing an insert without failures is more difficult, but it can be done.
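To capture the failing rows rather than guess at them, ERRORFILE writes each rejected row to the path you give it (plus a companion .Error.Txt file describing each failure). A sketch, with a hypothetical error-file path:

```sql
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    MAXERRORS = 1000,                          -- keep loading past the default 10-error limit
    ERRORFILE = 'C:\files\inv123_errors.txt'   -- rejected rows land here for inspection
);
```

After the load, open inv123_errors.txt to see exactly which rows failed and inv123_errors.txt.Error.Txt for the reason per row.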

Bulk insert with a different schema

I am trying to get data from a csv file with the following data.
Station code;DateBegin;DateEnd
01;20100214;20100214
02;20100214;20100214
03;20100214;20100214
I am trying bulk insert as
BULK INSERT dbo.#tmp_station_details
FROM 'C:\station.csv'
WITH (
FIELDTERMINATOR = ';',
FIRSTROW = 2,
ROWTERMINATOR = '\n'
)
But the table tmp_station_details has one extra column, Priority.
Its schema is like
[Station code] [Priority] [DateBegin] [DateEnd]
Is it possible to bulk insert without altering the schema of the table?
Add FORMATFILE = 'format_file_path' to your "with" block. Refer to BOL: using a format file to skip a table column for an example.
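For illustration, a non-XML format file maps each field in the data file to a table column by ordinal, so the extra Priority column (column 2 in the table) is simply never listed as a target. A minimal sketch, assuming Priority is the second table column and using illustrative lengths (the version number on the first line should match your SQL Server version, e.g. 9.0 for 2005):

```text
9.0
3
1   SQLCHAR   0   100   ";"      1   StationCode   ""
2   SQLCHAR   0   100   ";"      3   DateBegin     ""
3   SQLCHAR   0   100   "\r\n"   4   DateEnd       ""
```

Then reference it from the load (path is hypothetical):

```sql
BULK INSERT dbo.#tmp_station_details
FROM 'C:\station.csv'
WITH (
    FIRSTROW = 2,
    FORMATFILE = 'C:\station.fmt'
);
```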

Inserting Dates with BULK INSERT

I have a CSV file, which contains three dates:
'2010-07-01','2010-08-05','2010-09-04'
When I try to bulk insert them...
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
I get an error:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (CREATED_ON).
So I'm assuming this is because I have an invalid date format. What is the correct format to use?
EDIT
CREATE TABLE [dbo].[STUDY]
(
[CREATED_ON] DATE,
[COMPLETED_ON] DATE,
[AUTHORIZED_ON] DATE
)
You've got quotes (') around your dates. Remove those and it should work.
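If you can't edit the file, another option on SQL Server 2017 and later is to let the built-in CSV parser strip the quotes via FORMAT = 'CSV' with FIELDQUOTE. Since these dates are wrapped in single quotes rather than the default double quote, FIELDQUOTE is set to a single quote here; a sketch, not tested against your exact file:

```sql
-- SQL Server 2017+: built-in CSV parsing with a custom quote character
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
    FORMAT = 'CSV',
    FIELDQUOTE = '''',   -- fields are quoted with single quotes, not the default "
    ROWTERMINATOR = '\n'
);
```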
Does your data file have a header record? If it does, the header values obviously won't match your columns' data types, and the load will fail when SQL Server tries to INSERT them into your table. Try this:
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
According to MSDN, the BULK INSERT operation technically doesn't support skipping header records in the CSV file. You can either remove the header record or try the above. I don't have SQL Server in front of me at the moment, so I haven't confirmed this works. YMMV.

SQL Bulk Insert with FIRSTROW parameter skips the following line

I can't seem to figure out how this is happening.
Here's an example of the file that I'm attempting to bulk insert into SQL server 2005:
***A NICE HEADER HERE***
0000001234|SSNV|00013893-03JUN09
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
Here's my bulk insert statement:
BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
But, for some reason the only output I can get is:
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09
The first record always gets skipped, unless I remove the header altogether and don't use the FIRSTROW parameter. How is this possible?
Thanks in advance!
I don't think you can skip rows in a different format with BULK INSERT/BCP.
When I run this:
TRUNCATE TABLE so1029384
BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)
SELECT * FROM so1029384
I get:
col1 col2 col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234 SSNV 00013893-03JUN09
0000005678 ABCD 00013893-03JUN09
0000009112 0000 00013893-03JUN09
0000009112 0000 00013893-03JUN09
It looks like it requires the '|' even in the header data, because it reads up to that into the first column - swallowing up a newline into the first column. Obviously if you include a field terminator parameter, it expects that every row MUST have one.
You could strip the header row with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header). Or use a tool that can handle this, such as SSIS.
Maybe check that the header has the same line-ending as the actual data rows (as specified in ROWTERMINATOR)?
Update: from MSDN:
The FIRSTROW attribute is not intended
to skip column headers. Skipping
headers is not supported by the BULK
INSERT statement. When skipping rows,
the SQL Server Database Engine looks
only at the field terminators, and
does not validate the data in the
fields of skipped rows.
I found it easiest to just read the entire line into one column then parse out the data using XML.
IF (OBJECT_ID('tempdb..#data') IS NOT NULL) DROP TABLE #data
CREATE TABLE #data (data VARCHAR(MAX))
BULK INSERT #data FROM 'E:\filefromabove.txt' WITH (FIRSTROW = 2, ROWTERMINATOR = '\n')
IF (OBJECT_ID('tempdb..#dataXml') IS NOT NULL) DROP TABLE #dataXml
CREATE TABLE #dataXml (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, data XML)
INSERT #dataXml (data)
SELECT CAST('<r><d>' + REPLACE(data, '|', '</d><d>') + '</d></r>' AS XML)
FROM #data
SELECT d.data.value('(/r//d)[1]', 'varchar(max)') AS col1,
d.data.value('(/r//d)[2]', 'varchar(max)') AS col2,
d.data.value('(/r//d)[3]', 'varchar(max)') AS col3
FROM #dataXml d
You can use the snippet below:
BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|', -- field delimiter
ROWTERMINATOR = '\n', -- advances to the next row
ERRORFILE = 'E:\ErrorRows.csv',
TABLOCK
)
To let SQL Server handle quote escaping and everything else, do this (requires SQL Server 2017 or later):
BULK INSERT Test_CSV
FROM 'C:\MyCSV.csv'
WITH (
FORMAT='CSV'
--FIRSTROW = 2, --uncomment this if your CSV contains header, so start parsing at line 2
);
Regarding the other answers, here is some valuable info as well:
I keep seeing this in all the answers: ROWTERMINATOR = '\n'
The \n means LF, which is the Linux-style EOL.
On Windows the EOL is two characters, CR LF, so you need ROWTERMINATOR = '\r\n'.
Given how mangled some data can look after BCP importing into SQL Server from non-SQL data sources, I'd suggest doing all the BCP import into some scratch tables first.
For example
truncate table Address_Import_tbl
BULK INSERT dbo.Address_Import_tbl
FROM 'E:\external\SomeDataSource\Address.csv'
WITH (
FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', MAXERRORS = 10
)
Make sure all the columns in Address_Import_tbl are nvarchar, to keep the staging table as agnostic as possible and avoid type conversion errors.
Then apply whatever fixes you need to Address_Import_tbl, like deleting the unwanted header.
Then run an INSERT ... SELECT query to copy from Address_Import_tbl to Address_tbl, along with any datatype conversions you need; for example, casting imported dates to SQL DATETIME.
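For example, a sketch of that final copy step. The column names are hypothetical (borrowed from the staging pattern above), and TRY_CONVERT (SQL Server 2012+) is used so rows that fail conversion yield NULL instead of aborting the whole statement:

```sql
-- Copy from the all-nvarchar staging table into the typed target table.
-- Column names here are illustrative, not from the original post.
INSERT INTO dbo.Address_tbl (id, amount, bdate)
SELECT TRY_CONVERT(INT, id),
       TRY_CONVERT(DECIMAL(18, 2), amount),
       TRY_CONVERT(DATETIME, bdate, 120)   -- style 120 = yyyy-mm-dd hh:mi:ss
FROM dbo.Address_Import_tbl;
```

Rows where TRY_CONVERT returned NULL can then be queried out of the staging table for manual cleanup.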