I am building a table that loads values from a CSV file. Several columns are numeric, and when the text is converted to a number during the insert, I get an error.
The CSV contains NULL values. Is there a way to keep them out of the database? Right now I remove them manually.
NULL -> '0'.
I tried DEFAULT and ISNULL; I don't know if they can be used when creating a table.
What I want is to stop opening the CSV altogether. Today I receive the files as CSV, open them, and remove the NULL values by hand; instead, I want the NULLs to become zero during the import. The data looks like this:
7939102772 2401679 108271 0 3000062862 174529 8129
7939102772 2401679 108271 0 3000062862 174529 8129
7939102772 2401679 108271 0 3000062862 174529 8129
1. NULL NULL NULL NULL NULL NULL NULL
And I use:
BULK INSERT [dbo]
FROM 'C:csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR='\n');
You can allow NULLs while importing to avoid the errors caused by NULL insertion. If you are importing the CSV using a script or SQL query, you can update your query so each value is wrapped in ISNULL, turning NULLs into 0:
INSERT INTO table (Col1, Col2, Col3, ...) VALUES (ISNULL(value1, 0), ISNULL(value2, 0), ISNULL(value3, 0))
Or you can update your table after successfully importing the CSV with the following script:
UPDATE [Table] SET
    [Col1] = IIF([Col1] IS NULL, 0, [Col1]),
    [Col2] = IIF([Col2] IS NULL, 0, [Col2]),
    [Col3] = IIF([Col3] IS NULL, 0, [Col3])
WHERE [Col1] IS NULL OR [Col2] IS NULL OR [Col3] IS NULL
The best approach, though, is to process and clean your data before importing it into the database, using Python, Jupyter, or any other language you are comfortable with.
Note that the SQL Server Import Wizard treats NULL in the file as the literal string 'NULL'. Another idea is to load the imported data into a staging table and then run an update that replaces the literal 'NULL' strings left over from the import, as sketched below.
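A minimal sketch of that staging approach, assuming hypothetical table names (dbo.Staging, and dbo.Target with numeric columns already in place) and a hypothetical file path:

CREATE TABLE dbo.Staging (Col1 varchar(50), Col2 varchar(50), Col3 varchar(50));

BULK INSERT dbo.Staging
FROM 'C:\data\file.csv'  -- hypothetical path
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- NULLIF turns the literal string 'NULL' into a real NULL, TRY_CONVERT (SQL Server
-- 2012+) parses the remaining text, and ISNULL maps the NULLs to 0 as requested.
INSERT INTO dbo.Target (Col1, Col2, Col3)
SELECT
    ISNULL(TRY_CONVERT(bigint, NULLIF(Col1, 'NULL')), 0),
    ISNULL(TRY_CONVERT(bigint, NULLIF(Col2, 'NULL')), 0),
    ISNULL(TRY_CONVERT(bigint, NULLIF(Col3, 'NULL')), 0)
FROM dbo.Staging;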
For example, I have a column of type int.
The raw data source has integer values, but the null values, instead of being empty (''), are 'NIL'.
How would I handle those values when trying to BULK INSERT into MSSQL?
My code is
create table test (nid INT);
bulk insert test from #FILEPATH with (format = 'CSV', firstrow = 2);
the first 5 rows of my .csv file looks like
1
2
3
NIL
7
You can replace the NIL with '' (empty string) directly in your data source file, or insert the data into a staging table and transform it:
BULK INSERT staging_sample_data
FROM '\\data\sample_data.dat';
INSERT INTO [sample_data]
SELECT NULLIF(ColA, 'nil'), NULLIF(ColB, 'nil'),...
Of course, if your field is, for example, numeric, the staging table should have a string field. Then you can do as Larnu suggests: TRY_CONVERT(INT, ColA).
*Note: if there are default constraints, you may need to check how to keep NULLs (see the KEEPNULLS option of BULK INSERT).
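Putting it together for this question's table, a minimal sketch, assuming a hypothetical staging table name (the question's #FILEPATH placeholder stands in for a real path):

-- The staging column is varchar so that 'NIL' loads without a conversion error.
create table staging_test (ColA varchar(20));

bulk insert staging_test from #FILEPATH with (format = 'CSV', firstrow = 2);

-- NULLIF turns 'NIL' into NULL; TRY_CONVERT returns NULL for any other
-- non-numeric leftovers instead of raising an error.
insert into test (nid)
select TRY_CONVERT(int, NULLIF(ColA, 'NIL'))
from staging_test;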
I am trying to import a pipe-delimited file into a temporary table using BULK INSERT (UTF-8 with a Unix-style row terminator), but it keeps ignoring the first data row (the one after the header) and I don't know why.
Adding | to the header row does not help either...
File contents:
SummaryFile_20191017140001.dat|XXXXXXXXXX|FIL-COUNTRY|128
File1_20191011164611.dat|2|4432|2|Imported||
File2_20191011164611.dat|3|4433|1|Imported||
File3_20191011164611.dat|4|4433|2|Imported||
File4_20191011164611.dat|5|4434|1|Imported|INV_ERROR|
File5_20191011164611.dat|6|4434|2|Imported||
File6_20191011164611.dat|7|4434|3|Imported||
The bulk insert throws no error, but it keeps ignoring the first data line (File1_...)
SQL below:
IF OBJECT_ID('tempdb..#mycsv') IS NOT NULL
DROP TABLE #mycsv
create table #mycsv
(
tlr_file_name varchar(150) null,
tlr_record_id int null,
tlr_pre_invoice_number varchar(50) null,
tlr_pre_invoice_line_number varchar(50) null,
tlr_status varchar (30) null,
tlr_error_code varchar(30) null,
tlr_error_message varchar (500) null)
bulk insert #mycsv
from 'D:\TestData\Test.dat'
with (
rowterminator = '0x0A',
fieldTerminator = '|',
firstrow = 2,
ERRORFILE = 'D:\TestData\Import.log')
select * from #mycsv
It's really bugging me, since I don't really know what I am missing.
If I specify FIRSTROW = 1, the script will throw:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (tlr_record_id).
Thanks in advance!
"UTF-8 with unix style row terminator" I assume you're using a version of SQL Server that doesn't support UTF-8. From BULK INSERT (Transact-SQL)
** Important ** Versions prior to SQL Server 2016 (13.x) do not support code page 65001 (UTF-8 encoding).
If you are using 2016+, then specify the code page for UTF-8:
BULK INSERT #mycsv
FROM 'D:\TestData\Test.dat'
WITH (ROWTERMINATOR = '0x0A',
FIELDTERMINATOR = '|',
FIRSTROW = 1,
CODEPAGE = '65001',
ERRORFILE = 'D:\TestData\Import.log');
If you aren't using SQL Server 2016+, then you cannot use BULK INSERT to import a UTF-8 file; you will have to use a different code page or use a different tool.
Note, also, that the above document states the below:
The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
If you are skipping rows, you still need to ensure the skipped rows are valid, but FIRSTROW is not meant for skipping headers. This means you should be using FIRSTROW = 1 and fixing your header row, as @sarlacii points out.
Of course, that does not fix the code page problem if you are using an older version of SQL Server; and my point stands that you'll have to use a different technology on 2014 and prior.
To import rows effectively into a SQL database, it is important to make the header formatting match the data rows. Add the missing delimiters to the header, like so, and try the import again:
SummaryFile_20191017140001.dat|XXXXXXXXXX|FIL-COUNTRY|128|||
The number of fields in the header must match the number of fields in the data rows; otherwise the row is ignored, and the first satisfactory "data" row will be treated as the header.
I've got a front-end table that essentially matches our SQL Server database table t_myTable. Some columns I'm having problems with are those with numeric data types in the db. They are set to allow NULL, but when the user deletes the numeric value in the front end and tries to send a blank value, it doesn't post to the database. I suspect this is because the value is sent back as an empty string "", which does not translate to the nullable data type.
Is there a trigger I can create to convert these empty strings into NULL on insert and update to the database? Or would a trigger already happen too late in the process, meaning I need to handle this on the front end or in the API instead?
We'll call my table t_myTable and the column myNumericColumn.
I could also be wrong and perhaps this 'empty string' issue is not the source of my problem. But I suspect that it is.
As @DaleBurrell noted, the proper place to handle data validation is the application layer. You can wrap each of the potentially problematic values in a NULLIF function, which converts the value to NULL when an empty string is passed to it.
The syntax would be along these lines:
SELECT
...
,NULLIF(ColumnName, '') AS ColumnName
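For example, a minimal sketch using the question's table and column, with a hypothetical variable standing in for the value the front end sends:

DECLARE @incomingValue varchar(50) = '';  -- hypothetical: a blank value from the front end

-- NULLIF converts the empty string to NULL, which the nullable numeric column accepts.
INSERT INTO dbo.t_myTable (myNumericColumn)
VALUES (NULLIF(@incomingValue, ''));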
select nullif(Column1, '') from tablename
SQL Server doesn't allow converting an empty string to a numeric data type, so a trigger is useless in this case, even an INSTEAD OF one: SQL Server will check the conversion before inserting.
SELECT CAST('' AS numeric(18,2)) -- Error converting data type varchar to numeric
CREATE TABLE tab1 (col1 numeric(18,2) NULL);
INSERT INTO tab1 (col1) VALUES(''); -- Error converting data type varchar to numeric
As you didn't mention this error, the client must be passing something other than ''. The problem can be found with SQL Profiler: run it and see exactly which SQL statement is executed to insert data into the table.
This seems like a trivial question. And it is. But I have googled for over a day now, and still no answer:
I wish to do a bulk insert where, for a column whose datatype is varchar(100), I insert an empty string. Not NULL, but an empty string. For example, for the table:
create table temp(columnName varchar(100))
I wish to insert an empty string as the value:
BULK INSERT sandbox..temp FROM
'file.txt' WITH ( FIELDTERMINATOR = '|#', ROWTERMINATOR = '|:' );
And the file contents would be row1|:row2|:|:|:. So it contains 4 rows, where the last two rows are intended to be empty strings. But they get inserted as NULL.
This question is not the same as the marked duplicate: in one column, I wish to have the capacity to insert both NULL and the empty string. The answers provided do only one of them, but not both.
Well, instead of inserting the empty string explicitly like this, why not let your table column have a default value of an empty string, and in your bulk insert don't pass any values for those columns. Something like:
create table temp(columnName varchar(100) default '')
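With that default in place, the original import statement can stay as it was. A sketch under the assumption that the KEEPNULLS option is not specified (BULK INSERT loads a column's default for empty fields unless KEEPNULLS tells it to preserve NULLs):

-- Empty fields in file.txt now get the column default '' instead of NULL.
BULK INSERT sandbox..temp
FROM 'file.txt'
WITH (FIELDTERMINATOR = '|#', ROWTERMINATOR = '|:');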
I have the following table:
CREATE TABLE [dbo].[tempTable](
[id] [varchar](50) NULL,
[amount] [varchar](50) NULL,
[bdate] [varchar](50) NULL
)
and the following insert statement:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR ='\n'
)
I get the following error:
Bulk load data conversion error (truncation) for row 1, column 3
(bdate).
Data example in file:
12313 24 2012-06-08 13:25:49
12314 26 2012-06-08 12:25:49
It does look like it is just never delimiting the row. I've had to separate rows by column delimiter AND row delimiter before, because the text file had a trailing (and unnecessary) column delimiter after the last value that took me a while to spot. Those dates would certainly fit the format (assuming there just isn't some bad data in a huge file you can't visually spot; and since it doesn't fail until 10 errors by default, there'd be at least that many bad records), and it looks like it is making it to that point correctly. View the file in hex in a good text editor if you can, or just try:
BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
FIELDTERMINATOR ='\t',
ROWTERMINATOR = '\t\n'
)
Another possibility (which I doubt, considering the column is varchar(50)) is that there is a header row in the inv123.txt file, the header is being perceived as a data row, and it is the header that exceeds varchar(50) and gets truncated. In that case you can add:
FIRSTROW = 2,
If it still fails after these things, try to force some data in, or capture the rows that are erroring so you'll know exactly where the problem is. Look into SET ANSI_WARNINGS OFF or the ERRORFILE option (depending on your flavor of SQL Server), or create a temp table with text columns as the datatype; a sketch follows. SQL Server 2005 enforces stricter data validation, and forcing an insert without failures is more difficult, but it can be done.
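For the ERRORFILE route, a minimal sketch against the question's table, with a hypothetical log path and a raised error tolerance so the load keeps going past the default of 10 errors:

BULK INSERT dbo.tempTable
FROM 'C:\files\inv123.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\n',
    MAXERRORS = 1000,                         -- don't stop at the default 10 errors
    ERRORFILE = 'C:\files\inv123_errors.log'  -- hypothetical path; rejected rows land here
);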