SQL Server Bulk Insert fixed width file failure

I am attempting to use BULK INSERT to upload a very large data file (5M rows). All columns are just varchars, with no conversion, so the format file is simple...
11.0
29
1 SQLCHAR 0 8 "" 1 AccountId ""
2 SQLCHAR 0 10 "" 2 TranDate ""
3 SQLCHAR 0 4 "" 3 TransCode ""
4 SQLCHAR 0 2 "" 4 AdditionalCode ""
5 SQLCHAR 0 11 "" 5 CurrentPrincipal ""
6 SQLCHAR 0 11 "" 6 CurrentInterest ""
7 SQLCHAR 0 11 "" 7 LateInterest ""
...
27 SQLCHAR 0 8 "" 27 Operator ""
28 SQLCHAR 0 10 "" 28 UpdateDate ""
29 SQLCHAR 0 12 "" 29 TimeUpdated ""
but each time, at some point, I get the same error:
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
I have tried the following:
Bulk Insert
[TableName] From 'dataFilePathSpecification'
With (FORMATFILE = 'formatFilePathSpecification')
but I get the error after about 5-6 minutes, and no data has been inserted.
When I added the BATCHSIZE parameter, I get the error after a much longer time, near the end of the file, after all but a very few of the rows have been inserted successfully.
Bulk Insert
[TableName] From 'dataFilePathSpecification'
With (BATCHSIZE = 200,
FORMATFILE = 'formatFilePathSpecification')
When I set the BATCHSIZE to 2000 it runs much faster (fewer, larger transactions, I assume), but it still fails.
Does this have something to do with how BULK INSERT recognizes the end of the file? If so, what do I need to do to the format file to fix it?

Explicitly state your row terminator:
BULK INSERT TableName FROM 'Path'
WITH (
DATAFILETYPE = 'char',
ROWTERMINATOR = '\r\n',
FORMATFILE = 'formatFilePathSpecification'
);
If this still fails, check your file to see if you have unexpected terminators embedded in text fields.

Try using the ERRORFILE specifier in the WITH portion to find the offending data:
ERRORFILE = 'C:\offendingdata.log'

If you still have problems even after enabling the error-file output, you can do a binary search for the offending data by setting the FIRSTROW and LASTROW options and running BULK INSERT repeatedly to isolate the problem.
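For instance, a first bisection pass might look like this (the row numbers are hypothetical; halve the failing range on each run until the bad row is isolated):
BULK INSERT [TableName] FROM 'dataFilePathSpecification'
WITH (FORMATFILE = 'formatFilePathSpecification',
      ERRORFILE = 'C:\offendingdata.log',
      FIRSTROW = 1,
      LASTROW = 2500000);  -- if this half loads cleanly, rerun with FIRSTROW = 2500001 and no LASTROW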
To be honest, your input format looks so simple that it might be a good idea to write a small C#, Python, or whatever-floats-your-boat app to quality-check your data before attempting the import. You could simply discard invalid rows (or possibly fix them), write them to an exceptions file for hand processing, or simply stop the job -- i.e., the file must be perfect or it is considered corrupted. Validating 5M rows this way will be quite fast -- essentially as fast as you can read (and possibly write) the file.
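A minimal sketch of such a checker in C# (the 120-character record width and the exceptions-file name are made-up placeholders; use the sum of the column widths from your own format file):
using System;
using System.IO;

class FixedWidthCheck
{
    static void Main(string[] args)
    {
        const int expectedWidth = 120;   // hypothetical: the sum of the column widths in the format file
        int lineNumber = 0, badLines = 0;

        using (var reader = new StreamReader(args[0]))           // path of the data file, passed on the command line
        using (var errors = new StreamWriter("exceptions.txt"))  // rejected rows go here for hand processing
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                // A good row is exactly the expected width and contains no stray control characters
                if (line.Length != expectedWidth || line.IndexOf('\0') >= 0)
                {
                    badLines++;
                    errors.WriteLine(lineNumber + ": length " + line.Length);
                }
            }
        }
        Console.WriteLine(lineNumber + " rows read, " + badLines + " rejected.");
    }
}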

Thanks to all for the suggestions; I applied both ideas... I wrote a small .NET (C#) file-processor utility and it told me there were additional nulls (binary zeroes, \0) at the end of every line, and I was able to strip them off using a simple C# program (a sketch of it appears after the format file below).
The error file indicated the issue was at the very end (that's what the error message said!).
The actual issue was that the Bulk Insert could not recognize the EOF. I had to modify the format file like this to fix it; then it worked:
11.0
29
1 SQLCHAR 0 8 "" 1 AccountId ""
2 SQLCHAR 0 10 "" 2 TranDate ""
3 SQLCHAR 0 4 "" 3 TransCode ""
4 SQLCHAR 0 2 "" 4 AdditionalCode ""
5 SQLCHAR 0 11 "" 5 CurrentPrincipal ""
6 SQLCHAR 0 11 "" 6 CurrentInterest ""
7 SQLCHAR 0 11 "" 7 LateInterest ""
...
27 SQLCHAR 0 8 "" 27 Operator ""
28 SQLCHAR 0 10 "" 28 UpdateDate ""
29 SQLCHAR 0 12 "\r\n" 29 TimeUpdated ""
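For reference, the trailing-null stripper only needs a few lines; a rough C# sketch with placeholder file paths:
using System.IO;

class StripTrailingNulls
{
    static void Main()
    {
        using (var reader = new StreamReader(@"C:\data\tranfile.dat"))         // placeholder input path
        using (var writer = new StreamWriter(@"C:\data\tranfile_clean.dat"))   // placeholder output path
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Drop the binary zeroes (\0) padding the end of every record; WriteLine restores the \r\n terminator
                writer.WriteLine(line.TrimEnd('\0'));
            }
        }
    }
}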

Related

Unable to identify strange whitespace character in MSSQL table

We have a process that reads an XML file into our database and inserts any rows that aren't currently in another table to that table.
This process also has a trigger to write to an audit table and a nightly snapshot is also held in another table.
In the XML holding table a field looks like 1234567890123456 but it exists on our live table as 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6. Those spaces will not be removed by any combination of REPLACE functions. We have tried all CHAR values and it does not recognise the character. The audit table and nightly snapshot, however, contain the correct values.
Similarly, if we run a comparison such as SELECT CASE WHEN '1234567890123456' = '1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ' THEN 1 ELSE 0 END, it returns 1, so they match. However, LEN('1234567890123456') is 16 and LEN('1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 ') is 32.
We have run some queries to loop through the characters in the field and output the ASCII and Unicode values for the characters. The digits return the correct ASCII/Unicode values, but this random whitespace character does not return a value.
An example of the incorrectly displayed one is 0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000 and a correct one is 0x3500320038003600380033003200300030003000360033003600380036003000. Both were added by the same means on the same day. One has the extra bytes, the other is fine.
How can we identify this character and get rid of it? Is there a reason this would have been inserted originally? How can we avoid this in future?
It looks like some null (i.e. Char(0)) characters have got into the data.
Data entry
If the data was supposed to be ASCII when it was entered but UTF-16 data was received, it could have gone like this:
Entered character codes: 48 00
Sent to database: 48 00 00 00
To avoid that, remove disallowed characters as the first step in processing the input, say by using a regex to replace [\x00-\x1F] with an empty string.
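In .NET, for example, that first clean-up step could be as small as the following sketch (the input string is just an illustration):
using System;
using System.Text.RegularExpressions;

class SanitizeInput
{
    static void Main()
    {
        string raw = "1\02\03\04\0";   // digits interleaved with NUL characters, like the bad rows above
        // First step of input processing: strip ASCII control characters (including NUL) before storing the value
        string clean = Regex.Replace(raw, @"[\x00-\x1F]", string.Empty);
        Console.WriteLine(clean);      // prints 1234
    }
}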
Data repair
Search for entries which have a Char(0) in them to confirm that they can be found that way.
If so, replace the Char(0) with an empty string.
If that doesn't work, you could convert the data to the format '0x35000000320000003800000036000000380000003300000039000000370000003800000037000000330000003000000035000000340000003000000033000000', replace '000000' with '00', and then convert back.
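A sketch of both repair steps, using a made-up table and column name (REPLACE with CHAR(0) can be collation-sensitive, which is why the hex round trip is the fallback):
-- Confirm the bad rows can be found by searching for CHAR(0)
SELECT *
FROM dbo.XmlHoldingTable
WHERE CHARINDEX(CHAR(0), FieldValue) > 0;

-- Simple repair: replace the CHAR(0) characters with an empty string
UPDATE dbo.XmlHoldingTable
SET FieldValue = REPLACE(FieldValue, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), FieldValue) > 0;

-- Fallback: round-trip through the hex form shown above, replacing '000000' with '00'
UPDATE dbo.XmlHoldingTable
SET FieldValue = CONVERT(NVARCHAR(MAX),
                     CONVERT(VARBINARY(MAX),
                         REPLACE(CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX), FieldValue), 1),
                                 '000000', '00'),
                         1))
WHERE CHARINDEX(CHAR(0), FieldValue) > 0;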

Cannot bulk load error message

Attempting to do a bulk insert. The sample data and the format file are given below. It was brought to my attention that we need to use a Universal Naming Convention (UNC) path, hence the '\\FR-6RSGJH2.xyz.st\C$' item in the code. However, the same error occurs if you simplify it to '\C\Users\myname\Desktop\testimport.csv'. Any ideas as to what is missing in the syntax or any settings changes that could be done?
BULK INSERT testimport
FROM '\\FR-6RSGJH2.xyz.st\C$\Users\myname\Desktop\testimport.csv'
WITH (FORMATFILE = '\\FR-6RSGJH2.xyz.st\C$\Users\myname\Desktop\format.txt')
GO
Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file
"\C\Users\myname\Desktop\testimport.csv" could not be opened. Operating
system error code 3(The system cannot find the path specified.).
Sample data
32003012017010316
32001022017040218
32003032017030213
32002042017020111
32002052017020110
format file
13.0
5
1 SQLCHAR 0 02 "" 1 st ""
2 SQLCHAR 0 03 "" 2 cnty ""
3 SQLCHAR 0 02 "" 3 v1 ""
4 SQLCHAR 0 08 "" 4 date ""
5 SQLCHAR 0 02 "\r\n" 5 v2 ""
Not sure why it worked, but when I changed testimport from a .csv to a .txt, it worked. Anyway, that is the answer.

My last column isn't getting populated

I'm trying to use a non-XML format file to bulk import a null-delimited file into SQL Server. I've added a column to the staging table in question, and updated the format file to reflect this. Everything seems to be inserting fine, except this last column. The column I added is
Comments (nvarchar(256), null)
The format file looks like this:
11.0
8
1 SQLNCHAR 0 4 "\0\0" 1 ClaimCheckSetId ""
2 SQLNCHAR 0 4 "\0\0" 2 BatchValidationId ""
3 SQLNCHAR 0 4 "\0\0" 3 SourceCommunicationId ""
4 SQLNCHAR 0 4 "\0\0" 4 TargetCommunicationId ""
5 SQLNCHAR 0 1800 "\0\0" 5 TargetExternalCommunicationId ""
6 SQLNCHAR 0 8 "\0\0" 6 TargetSentDateTime ""
7 SQLNCHAR 0 2000 "\0\0" 7 TargetSubject ""
8 SQLNCHAR 0 256 "\r\0\n\0" 8 Comments ""
The SQL looks like this:
DECLARE @filepath NVARCHAR(MAX) = 'C:\{file to import}_512fc21d-dbc9-4975-8169-2ca383ac2bdf.txt';
DECLARE @formatpath NVARCHAR(MAX) = 'C:\{format file}.txt';
DECLARE @bulkinsert NVARCHAR(MAX);
SET @bulkinsert =
    N'BULK INSERT [The Table]
    FROM ''' + @filepath + N'''
    WITH
    (
        FORMATFILE = ''' + @formatpath + N''',
        DATAFILETYPE = ''WIDECHAR'',
        FIRSTROW = 1
    )';
SET ANSI_WARNINGS OFF;
EXEC sp_executesql @bulkinsert;
SET ANSI_WARNINGS ON;
I'm getting no errors, and it is returning a number of rows affected. Unfortunately, I don't know enough about SQL to diagnose this problem. A few hours of googling have not helped either. I hope one of you kind guys or gals can set me back on the straight and narrow.
Update: I edited the \r\0\n\0 to \r\n and am now getting an error!
OLE DB provider 'BULK' for linked server '(null)' returned invalid data for column '[BULK].InsertedDateTime'.
You should check the input file in an editor that shows special symbols. Personally I use Notepad++ (free) for that (View > Show Symbol > Show All Characters), but any decent editor will do.
That way the row terminator (i.e. the last field terminator) should be clearly visible. In Notepad++ the \0 will be shown as NUL, \r as CR and \n as LF.
So with the settings you currently have, you should be seeing CR NUL LF NUL. If you don't, then change the last field terminator to match what you see in the editor you are using.
With the limited information I have, can you please change the following
8 SQLNCHAR 0 256 "\r\0\n\0" 8 Comments ""
to
8 SQLNCHAR 0 256 "\r\n" 8 Comments ""
or
8 SQLNCHAR 0 256 "\0\0" 8 Comments ""
It seems the last one should wrap to a new line.

BCP format file editing for Bulk Import into SQL

I'm attempting to import a large amount of data contained in a CSV file into a SQL database. The CSV is 4 GB in size. The CSV has 329 columns and 300,000+ rows of data. So far I've successfully created the database and table that will hold the data once imported. The data contains strings (VARCHAR(x)), numerics (INT), and dates (DATE).
The data contained within the CSV file is separated by a delimiter "," but all of the data fields are encased in double quotes, with some fields not containing data values. Below is a mock example of the data.
"123244234","09/12/2012","First Name","Last Name","Address 1","","","555-555-5555","","CountryCode"
In research I've determined the easiest way to import the data will be to use BCP to create a format file and then use that with BULK INSERT. The only problem is in formatting the format file to remove the double quotes. When attempting to import without a format file it fails on row one, because the first column of the first row is numeric and has "" around it.
I've reviewed the following link, which talks about removing the double quotes "http://support.microsoft.com/default.aspx?scid=kb;EN-US;132463" with the use of a dummy entry to remove the quotes. In this case that is a lot of manual editing. Does anyone know of a better way to edit the format file? Here is a sample of the format file:
10.0
329
1 SQLCHAR 0 12 "," 1 NPI ""
2 SQLCHAR 0 12 "," 2 Entity Type Code ""
3 SQLCHAR 0 12 "," 3 Replacement NPI ""
4 SQLCHAR 0 9 "," 4 Employer Identification Number (EIN) SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 70 "," 5 Provider Organization Name (Legal Business Name) SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 35 "," 6 Provider Last Name (Legal Name) SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 20 "," 7 Provider First Name SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 20 "," 8 Provider Middle Name SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 0 5 "," 9 Provider Name Prefix Text SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 5 "," 10 Provider Name Suffix Text
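For what it's worth, the dummy-entry pattern from the KB article applied to this file would look roughly like the sketch below (hypothetical: the extra first field, named FIRST_QUOTE here, maps to server column 0 so it is discarded and its \" terminator consumes the opening quote; every other terminator becomes \",\" and the last one becomes \"\r\n):
10.0
330
1 SQLCHAR 0 0 "\"" 0 FIRST_QUOTE ""
2 SQLCHAR 0 12 "\",\"" 1 NPI ""
3 SQLCHAR 0 12 "\",\"" 2 Entity Type Code ""
4 SQLCHAR 0 12 "\",\"" 3 Replacement NPI ""
...
330 SQLCHAR 0 5 "\"\r\n" 329 Provider Name Suffix Text ""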

SQL Server 2005 bulk insert with format file error

error list
Msg 4866, Level 16, State 7, Line 2
The bulk load failed. The column is too long in the data file for row 1, column 1.
Verify that the field terminator and row terminator are specified correctly.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error.
The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
fmt file
9.0
10
1 SQLCHAR 2 50 "," 2 EmployeeSSN SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 2 50 "," 3 DOB SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 2 50 "," 4 Gender SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 2 50 "," 5 Relcode SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 2 50 "," 6 EmployeeID SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 2 50 "," 7 AssessmentType SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 2 50 "," 8 MeasurementDate SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 2 50 "," 9 RecordCreationDate SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 2 50 "," 10 AttributeID SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 2 50 "/r/n" 11 AttributeValue SQL_Latin1_General_CP1_CI_AS
Bulk insert code
BULK insert *******_raw_data
from 'E:\*****_csv\BWC_To_*****_2.csv'
with (formatfile = 'c:\*******_raw_data-n.fmt');
first line from csv
NULL,07/14/1983,F,S,105***,HRA,09/28/2011,09/28/2011,19,1
I am trying to figure out where I am going wrong here... I have gotten other files to work but have been unsuccessful with this one. The files' names are correct in my code; they are starred out here because they are company names.
First error:
Msg 4866, Level 16, State 7, Line 2
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
This is either a problem with the NULL value or with the row terminator.
The last terminator for the row may not be "\r\n"; it could be "\n". (Note also that the format file above writes the terminator with forward slashes, "/r/n", which BULK INSERT treats as a literal four-character string rather than CR/LF.) It is best to confirm the actual bytes with a hex editor.
Second and Third Error:
These all look like a NULL problem.
The correct way to handle nulls in BULK INSERT is to specify the KEEPNULLS option.
with (formatfile = 'c:\*******_raw_data-n.fmt',KEEPNULLS);
Create the CSV files with an empty field for NULL values instead of the literal text NULL:
,07/14/1983,F,S,105***,HRA,09/28/2011,09/28/2011,19,1