Bulk load data conversion error - SQL

I'm trying to load data from a CSV file into a SQL Server table.
My DDL:
CREATE TABLE pcm.dbo.partitiondocumentcount
(
partitionkey NVARCHAR(30) NOT NULL,
documentcount INT NOT NULL,
datetime DATETIME2(3) DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT pk_partitiondocumentcount PRIMARY KEY (partitionkey ASC)
)
CREATE NONCLUSTERED INDEX partitionkey_index
ON pcm.dbo.partitiondocumentcount (partitionkey ASC)
My file (I also tried without the quote marks; that didn't work either):
"partition-1",1
"partition-2",1
My query:
BULK INSERT partitionDocumentCount
FROM 'C:\files\pcmInitialConfiguration\partitionCount.csv'
WITH(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
SELECT * FROM partitionDocumentCount
The error I get from DBVisualizer:
15:36:26 [BULK - 0 row(s), 0.008 secs] [Error Code: 4864, SQL State: S0001] Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (documentcount).
15:36:26 [SELECT - 0 row(s), 0.004 secs] Empty result set fetched
... 2 statement(s) executed, 0 row(s) affected, exec/fetch time: 0.012/0.000 sec [0 successful, 1 warnings, 1 errors]

I see one problem in your approach, though I'm not sure it would generate that error. The table has three columns, but your data has only two. Even with a default constraint, BULK INSERT still looks for the third column. It may produce that error because it is looking for a comma but encounters an end-of-line instead.
The solution to that is to use a view:
-- run in the pcm database (CREATE VIEW does not accept a database-name prefix)
create view dbo.partitiondocumentcount_2 as
select partitionkey, documentcount
from dbo.partitiondocumentcount;
Then:
BULK INSERT partitionDocumentCount_2
FROM 'C:\files\pcmInitialConfiguration\partitionCount.csv'
WITH (FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
);

See: https://msdn.microsoft.com/en-AU/library/ms188365.aspx for BULK INSERT options.
CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' } Specifies the code
page of the data in the data file. CODEPAGE is relevant only if the
data contains char, varchar, or text columns with character values
greater than 127 or less than 32.
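The column-count mismatch is the likelier culprit here, but if the file really did contain extended characters, you could state the code page explicitly. A minimal sketch, assuming SQL Server 2016+ for the UTF-8 code page:
BULK INSERT partitionDocumentCount
FROM 'C:\files\pcmInitialConfiguration\partitionCount.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
CODEPAGE = '65001' -- UTF-8; use 'ACP' or a specific code page on older versions
)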

The solution was simply to add a column to the file.
I guess since my table had three columns, it expected three fields in the file.
Here's the working file:
partition-1,1,
partition-2,1,
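Another way to load a two-field file into a three-column table, without editing the file or creating a view, is a non-XML format file that maps only the two fields. A sketch under assumptions: the .fmt path is illustrative, the field lengths are generous guesses, and the version line should match your SQL Server version (13.0 = 2016):
13.0
2
1 SQLCHAR 0 60 "," 1 partitionkey SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 12 "\r\n" 2 documentcount ""
Then:
BULK INSERT partitionDocumentCount
FROM 'C:\files\pcmInitialConfiguration\partitionCount.csv'
WITH (FORMATFILE = 'C:\files\pcmInitialConfiguration\partitionCount.fmt');
Use "\n" as the last terminator instead if the file has Unix line endings.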

Related

SQL Bulk insert ignores first data row

I am trying to import a pipe-delimited file into a temporary table using BULK INSERT (UTF-8 with a Unix-style row terminator), but it keeps ignoring the first data row (the one after the header) and I don't know why.
Adding | to the header row will not help either...
File contents:
SummaryFile_20191017140001.dat|XXXXXXXXXX|FIL-COUNTRY|128
File1_20191011164611.dat|2|4432|2|Imported||
File2_20191011164611.dat|3|4433|1|Imported||
File3_20191011164611.dat|4|4433|2|Imported||
File4_20191011164611.dat|5|4434|1|Imported|INV_ERROR|
File5_20191011164611.dat|6|4434|2|Imported||
File6_20191011164611.dat|7|4434|3|Imported||
The bulk insert throws no error, but it keeps ignoring the first data line (File1_...)
SQL below:
IF OBJECT_ID('tempdb..#mycsv') IS NOT NULL
DROP TABLE #mycsv
create table #mycsv
(
tlr_file_name varchar(150) null,
tlr_record_id int null,
tlr_pre_invoice_number varchar(50) null,
tlr_pre_invoice_line_number varchar(50) null,
tlr_status varchar (30) null,
tlr_error_code varchar(30) null,
tlr_error_message varchar (500) null)
bulk insert #mycsv
from 'D:\TestData\Test.dat'
with (
rowterminator = '0x0A',
fieldTerminator = '|',
firstrow = 2,
ERRORFILE = 'D:\TestData\Import.log')
select * from #mycsv
It's really bugging me, since I don't really know what I'm missing.
If I specify FIRSTROW = 1, the script throws:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (tlr_record_id).
Thanks in advance!
"UTF-8 with unix style row terminator" I assume you're using a version of SQL Server that doesn't support UTF-8. From BULK INSERT (Transact-SQL)
** Important ** Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding).
If you are using 2016+, then specify the code page for UTF-8:
BULK INSERT #mycsv
FROM 'D:\TestData\Test.dat'
WITH (ROWTERMINATOR = '0x0A',
FIELDTERMINATOR = '|',
FIRSTROW = 1,
CODEPAGE = '65001',
ERRORFILE = 'D:\TestData\Import.log');
If you aren't using SQL Server 2016+, then you cannot use BULK INSERT to import a UTF-8 file; you will have to use a different code page or use a different tool.
Note, also, that the above document states the below:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
If you are skipping rows, you still need to ensure the skipped rows are valid, but the option is not for skipping headers. This means you should be using FIRSTROW = 1 and fixing your header row, as @sarlacii points out.
Of course, that does not fix the code page problem if you are using an older version of SQL Server; and my point stands that you'll have to use a different technology on 2014 and prior.
To import rows effectively into a SQL database, it is important to make the header formatting match the data rows. Add the missing delimiters, like so, to the header and try the import again:
SummaryFile_20191017140001.dat|XXXXXXXXXX|FIL-COUNTRY|128|||
The number of fields in the header, versus the data fields, must match, else the row is ignored, and the first satisfactory "data" row will be treated as the header.
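Combining the two answers: once the header has the same field terminators as the data, FIRSTROW = 2 will skip it reliably, because (per the quote above) skipping only scans for field terminators. A sketch, assuming SQL Server 2016+ for the UTF-8 code page:
BULK INSERT #mycsv
FROM 'D:\TestData\Test.dat'
WITH (ROWTERMINATOR = '0x0A',
FIELDTERMINATOR = '|',
FIRSTROW = 2,
CODEPAGE = '65001',
ERRORFILE = 'D:\TestData\Import.log');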

String or binary data would be truncated. The statement has been terminated

I have run into a problem with SQL Server; this is the function I created:
ALTER FUNCTION [dbo].[testing1](@price int)
RETURNS @trackingItems1 TABLE (
item nvarchar NULL,
warehouse nvarchar NULL,
price int NULL
)
AS
BEGIN
INSERT INTO @trackingItems1(item, warehouse, price)
SELECT ta.item, ta.warehouse, ta.price
FROM stock ta
WHERE ta.price >= @price;
RETURN;
END;
When I write a query that uses the function, like the following, I get the error:
String or binary data would be truncated. The statement has been terminated
How can I fix this problem?
select * from testing1(2)
This is how I created the table:
CREATE TABLE stock(item nvarchar(50) NULL,
warehouse nvarchar(50) NULL,
price int NULL);
When you define varchar etc without a length, the default is 1.
When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30.
So, if you expect values up to 400 characters in the @trackingItems1 columns from stock, use nvarchar(400).
Otherwise, you are trying to fit more than one character into nvarchar(1), which fails.
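A quick way to see the one-character default in action (the value is illustrative; note that variable assignment truncates silently, whereas an INSERT raises the error above):
DECLARE @x nvarchar; -- no length specified: nvarchar(1)
SET @x = N'warehouse-42'; -- silently truncated on assignment
SELECT @x; -- returns N'w'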
As an aside, this is also a bad use of a table-valued function because it is "multi-statement". It can be written as an inline TVF, which will run better:
ALTER FUNCTION [dbo].[testing1](@price int)
RETURNS TABLE
AS
RETURN
(
SELECT ta.item, ta.warehouse, ta.price
FROM stock ta
WHERE ta.price >= @price
);
Of course, you could just use a normal SELECT statement.
The maximum length of the target column is shorter than the value you are trying to insert.
Right-click the table in SQL Server Management Studio and go to 'Design' to see your table structure and column definitions.
Edit:
Set a length on your nvarchar declarations that is the same as, or shorter than, what is defined in your table.
In my case, I was getting this error because my table had
varchar(50)
but I was inserting a 67-character string, which resulted in this error. Changing it to
varchar(255)
fixed the problem.
Specify a size for the item and warehouse columns, like this, in the [dbo].[testing1] FUNCTION:
@trackingItems1 TABLE (
item nvarchar(25) NULL, -- 25, or the size of your item column
warehouse nvarchar(25) NULL, -- same as above
price int NULL
)
In MSSQL, plain nvarchar is equal to nvarchar(1), which is why the values from the stock table's columns are truncated.
SQL Server 2016 SP2 CU6 and SQL Server 2017 CU12 introduced trace flag 460 to return the details of truncation warnings.
You can enable it at the query level or at the server level.
Query level
INSERT INTO dbo.TEST (ColumnTest)
VALUES ('Test truncation warnings')
OPTION (QUERYTRACEON 460);
GO
Server Level
DBCC TRACEON(460, -1);
GO
From SQL Server 2019 you can enable it at database level:
ALTER DATABASE SCOPED CONFIGURATION
SET VERBOSE_TRUNCATION_WARNINGS = ON;
The old output message is:
Msg 8152, Level 16, State 30, Line 13
String or binary data would be truncated.
The statement has been terminated.
The new output message is:
Msg 2628, Level 16, State 1, Line 30
String or binary data would be truncated in table 'DbTest.dbo.TEST', column 'ColumnTest'. Truncated value: 'Test truncation warnings'.
In a future SQL Server 2019 release, message 2628 will replace message 8152 by default.
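A self-contained sketch of the repro, with an illustrative table that mirrors the messages above (varchar(20) is too short for the 24-character value):
CREATE TABLE dbo.TEST (ColumnTest varchar(20));
GO
INSERT INTO dbo.TEST (ColumnTest)
VALUES ('Test truncation warnings')
OPTION (QUERYTRACEON 460); -- with the flag, message 2628 names the table and column
GO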

Bulk insert to table

I have the following table:
CREATE TABLE [dbo].[tempTable](
[id] [varchar](50) NULL,
[amount] [varchar](50) NULL,
[bdate] [varchar](50) NULL
)
and the following insert statement:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR ='\n'
)
I get the following error:
Bulk load data conversion error (truncation) for row 1, column 3
(bdate).
Data example in file:
12313 24 2012-06-08 13:25:49
12314 26 2012-06-08 12:25:49
It does look like it is just never delimiting the row. I've had to separate rows by column delimiter AND row delimiter before, because the text file had a trailing (and unnecessary) column delimiter after the last value that took me a while to spot. Those dates would certainly fit the format (assuming there isn't just some bad data in a huge file that you can't visually spot; and since it doesn't fail until 10 errors by default, there would be at least that many bad records), and it looks like it is making it to that point correctly. View the file in hex in a good text editor if you can, or just try:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR = '\t\n'
)
Another possibility (which I doubt, considering the columns are varchar(50)) is that there is a header in the inv123.txt file, the header is being perceived as a data row, and it exceeds varchar(50) and is what is being truncated. In this case you can add:
FIRSTROW = 2,
If it still fails after these things, try to force some data in, or grab the rows that are erroring so you'll know exactly where the problem is. Look into SET ANSI_WARNINGS OFF, or use ERRORFILE (depending on your flavor of SQL Server), or create a temp table with text as the datatype. SQL Server 2005 enforces stricter data validation, so forcing an insert without failure is more difficult, but it can be done.
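For example, a sketch that raises the error tolerance and captures the offending rows (the log path is illustrative):
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n',
MAXERRORS = 1000, -- default is 10; keep loading past bad rows
ERRORFILE = 'C:\files\inv123_errors.log' -- offending rows and a .log.txt companion land here
);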

Determine ROW that caused "unexpected end of file" error in BULK INSERT?

I am doing a bulk insert:
DECLARE @row_terminator CHAR;
SET @row_terminator = CHAR(10); -- or CHAR(13)
DECLARE @stmt NVARCHAR(2000);
SET @stmt = '
BULK INSERT accn_errors
FROM ''F:\FullUnzipped\accn_errors_201205080105.txt''
WITH
(
firstrow=2,
FIELDTERMINATOR = ''|'' ,
ROWS_PER_BATCH=10000
,ROWTERMINATOR='''+@row_terminator+'''
)'
exec sp_executesql @stmt;
and am getting the following error:
Msg 4832, Level 16, State 1, Line 2
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Is there a way to know on which ROW this error occurred?
I am able to import 10,000,000 rows without a problem, and the error occurs after that.
To locate the troublesome row, use the ERRORFILE specifier.
BULK INSERT myData
FROM 'C:\...\...\myData.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\...\...\myRubbishData.log'
);
myRubbishData.log will have the offending rows, and a companion file,
myRubbishData.log.txt, will give you row numbers and offsets into the file.
Companion file example:
Row 3 File Offset 152 ErrorFile Offset 0 - HRESULT 0x80004005
Row 5 File Offset 268 ErrorFile Offset 60 - HRESULT 0x80004005
Row 7 File Offset 384 ErrorFile Offset 120 - HRESULT 0x80004005
Row 10 File Offset 600 ErrorFile Offset 180 - HRESULT 0x80004005
Row 12 File Offset 827 ErrorFile Offset 301 - HRESULT 0x80004005
Row 13 File Offset 942 ErrorFile Offset 416 - HRESULT 0x80004005
Fun, fun, fun. I haven't found a good way to debug these problems, so I use brute force. That is, the FirstRow and LastRow options are very useful.
Start with LastRow = 2 and keep trying. Load the results into a throw-away table that you can readily truncate.
And, you should also keep in mind that the first row could be causing you problems as well.
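A sketch of the brute-force bisection, assuming a disposable staging copy of the table (the row numbers are illustrative):
BULK INSERT accn_errors_staging -- hypothetical throw-away copy of accn_errors
FROM 'F:\FullUnzipped\accn_errors_201205080105.txt'
WITH
(
FIRSTROW = 10000000, -- start near where the load last succeeded
LASTROW = 10500000, -- halve the window after each failure
FIELDTERMINATOR = '|',
ROWTERMINATOR = '0x0A'
);
TRUNCATE TABLE accn_errors_staging; -- reset between attempts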
I have a CSV file that I import using BULK INSERT:
BULK INSERT [Dashboard].[dbo].[3G_Volume]
FROM 'C:\3G_Volume.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n'
)
GO
Usually I use this script and it has no problems, but on rare occasions I encounter this error:
"The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error."
Usually, this happens when the last rows have blank (null) values.
You need to link your CSV file into an MS Access DB to check the data.
(If your CSV has no more than 1.4 million rows, you can open it in Excel.)
Since my data is around 3 million rows, I need to use an Access DB.
Then find the row number where the blanks start, and subtract the number of blank rows from the total row count of the CSV.
If you have 2 blank rows at the end and the total number of rows is 30000005, the script becomes:
BULK INSERT [Dashboard].[dbo].[3G_Volume]
FROM 'C:\3G_Volume.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n',
LASTROW = 30000003
)
GO
If CHAR(10) is the row terminator, I don't think you can put it in quotes like you are trying to in BULK INSERT. There is an undocumented way to indicate it, though:
ROWTERMINATOR = '0x0A'
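With the hex notation there is no need for dynamic SQL at all; a sketch of the statement rewritten:
BULK INSERT accn_errors
FROM 'F:\FullUnzipped\accn_errors_201205080105.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWS_PER_BATCH = 10000,
ROWTERMINATOR = '0x0A' -- hex for CHAR(10), i.e. LF
);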
Yeah - BULK INSERT would have done well with a bit more detail in its error messages. The only way around this is to use the brute-force approach, as Gordon rightly pointed out. First, though, based on the error you're getting, either it is not understanding your row terminator, or there is a row terminator missing at the end of the file. Using FIRSTROW and LASTROW will help to determine that.
So, you need to do the following:
Check that there is a row terminator at the end of the file. If not, put one in and try again. Also make sure that the last row contains all of the necessary fields. If it says 'EOF', then that is your problem.
Are you sure there's a LF at the end of each line? Try a CR (\r, 0x0D) and see if that works.
Still not working? Try setting LASTROW=2 and try again. Then try LASTROW=3. If you have more than three rows in your file and this step fails, then the row terminator isn't working.
I ran into the same issue. I had written a shell script to create a .csv on Linux, took the .csv to Windows, and tried to bulk load the data. It did not "like" the commas. Don't ask me why, but I changed the delimiter to a * in the bulk import, did a find-and-replace of comma with * in my .csv, and that worked. A ~ as the delimiter worked too; so did tab. It just didn't like the comma. Hope this helps someone.
In my experience this is almost always caused by something in the last two lines. tail the import file and it should still give you the failure. Then open it in a full text editor that lets you see non-printing characters like CR, LF, and EOF. That should enable you to kludge it into working, even if you don't know why. E.g., BULK INSERT fails with row terminator on last row
I got around the problem by converting all fields to strings and then using a common FIELDTERMINATOR. This worked:
BULK INSERT [dbo].[workingBulkInsert]
FROM 'C:\Data\myfile.txt' WITH (
ROWTERMINATOR = '\n',
FIELDTERMINATOR = ','
)
My data file looks like this now:
"01502","1470"
"01504","686"
"02167","882"
"106354","882"
"106355","784"
"106872","784"
The second field had been a decimal type with no double-quote delimiter (like ,1470.00). Formatting both fields as strings eliminated the error.
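If the double quotes themselves end up stored in the loaded columns, they can be stripped after the fact; a sketch with hypothetical column names:
-- col1/col2 stand in for the actual column names of workingBulkInsert
UPDATE dbo.workingBulkInsert
SET col1 = REPLACE(col1, '"', ''),
    col2 = REPLACE(col2, '"', '');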
"I have a CSV file that I import using Bulk"
You need to create a table in which all columns are nullable, remove the blank last row, and add only the columns that are actually in the file. Also, do not put an identity primary key column on the staging table; BULK INSERT does not populate it automatically, and that is what causes the error.
I have done a bulk insert like this:
CREATE TABLE [dbo].[Department](
[Deptid] [bigint] IDENTITY(1,1) NOT NULL,
[deptname] [nvarchar](max) NULL,
[test] [nvarchar](max) NULL,
CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED
(
[Deptid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE TABLE [dbo].[Table_Column](
[column1] [nvarchar](max) NULL,
[column2] [nvarchar](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
BULK INSERT Table_Column
FROM 'C:\Temp Data\bulkinsert1.csv'
WITH (
FIELDTERMINATOR = ',',
ROWTERMINATOR='\n' ,
batchsize=300000
);
insert into [dbo].[Department]
select column1,column2 from Table_Column
The rows generating this error either don't have a CHAR(10) terminator or have unnecessary spaces.

Inserting Dates with BULK INSERT

I have a CSV file, which contains three dates:
'2010-07-01','2010-08-05','2010-09-04'
When I try to bulk insert them...
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
I get an error:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (CREATED_ON).
So I'm assuming this is because I have an invalid date format. What is the correct format to use?
EDIT
CREATE TABLE [dbo].[STUDY]
(
[CREATED_ON] DATE,
[COMPLETED_ON] DATE,
[AUTHORIZED_ON] DATE
)
You've got quotes (') around your dates. Remove those and it should work.
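That is, the file should contain unquoted ISO-format dates, which convert to DATE unambiguously:
2010-07-01,2010-08-05,2010-09-04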
Does your data file have a header record? If it does, the column names in the header obviously will not be the correct data type, and the insert will fail when SQL Server tries to convert them. Try this:
BULK INSERT [dbo].[STUDY]
FROM 'StudyTable.csv'
WITH
(
MAXERRORS = 0,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
According to MSDN the BULK INSERT operation technically doesn't support skipping header records in the CSV file. You can either remove the header record or try the above. I don't have SQL Server in front of me at the moment, so I have not confirmed this works. YMMV.