SQL: Loading a CSV file with a BULK INSERT statement causes problems with Hebrew strings

I'm trying to insert a very large CSV file into a table on SQL Server.
In the table itself the fields are defined as nvarchar, but when I use the BULK INSERT statement to load the file, all the Hebrew fields come out as gibberish.
When I use a plain INSERT statement everything is fine, but BULK INSERT gets it all wrong. I even tried writing the strings in the CSV file as N'string', but they just ended up in the table as N'gibberish'.
The reason I'm not just using INSERT is that the file contains more than 250K long rows.
This is the statement that I'm using. The delimiter is '|' on purpose:
BULK INSERT [dbo].[SomeTable]
FROM 'C:\Desktop\csvfilesaved.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n',
ERRORFILE = 'C:\Desktop\Error.csv',
TABLOCK
)
And this is a two-row sample of the CSV file:
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ב. קלה|יהודים|ב. בין 31 ל-180 יום||הנדסאים, טכנאים, סוכנים ובעלי משלח יד נלווה|1|0|0|1|0|0
2017-03|"מחוז ש""ת דן"|בני 18 עד 24|זכר|א. לא למד|ג. בינונית|יהודים|ב. בין 31 ל-180 יום||עובדי מכירות ושירותים|1|0|0|1|0|0
Thanks!
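
One possible cause of this symptom is an encoding mismatch: if the file is saved as UTF-8 and BULK INSERT reads it with a single-byte code page, Hebrew text comes out garbled even though the target columns are nvarchar. The sketch below assumes the source file is UTF-8 (the question does not say so) and rewrites it as UTF-16 LE with a BOM, which is the kind of file that DATAFILETYPE = 'widechar' is meant for. The function name and the paths are hypothetical.

```python
import codecs

def reencode_utf8_to_utf16le(src_path, dst_path, chunk_chars=1 << 20):
    """Copy a UTF-8 text file to UTF-16 LE with a BOM, streaming in chunks."""
    # "utf-8-sig" drops a leading UTF-8 BOM if one is present.
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src, \
         open(dst_path, "wb") as dst:
        dst.write(codecs.BOM_UTF16_LE)
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk.encode("utf-16-le"))

# Hypothetical usage:
# reencode_utf8_to_utf16le(r"C:\Desktop\csvfilesaved.csv",
#                          r"C:\Desktop\csvfilesaved_utf16.csv")
```

If this assumption holds, the converted file could then be loaded with the same statement plus DATAFILETYPE = 'widechar' in the WITH clause.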

Related

SQL Server 2017: IID_IColumnsInfo Bulk Insert Error

I've used the following script in the past without issue, so I'm not sure why it's causing me issues now.
Msg 7301, Level 16, State 2, Line 8
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
My code:
(
FORMAT = 'CSV',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
File Size: 112 MB
Rows: 322,190
Microsoft SQL Server Management Studio v17.4

Can you try
ROWTERMINATOR = '\r\n'
or
ROWTERMINATOR = '0x0a'

Since you're using a CSV file, the row terminator may be a line feed (LF), for which 0x0a is the hexadecimal notation. The example below accounts for this type of row terminator.
BULK INSERT dbo.YourTable
FROM 'C:\FilePath\DataFile.csv'
WITH (
FORMAT = 'CSV',
FIRSTROW = 2,
FIELDQUOTE = '"',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '0x0a',
TABLOCK
);
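
To find out which row terminator a file actually uses before choosing between '\r\n' and '0x0a', the raw bytes can be inspected. This is a minimal sketch, assuming a single-byte or UTF-8 encoded file (the byte patterns differ for UTF-16); both function names are hypothetical.

```python
def count_line_endings(path, sample_bytes=1 << 20):
    """Count CRLF and bare LF sequences in the first sample_bytes of a file."""
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # line feeds not preceded by a carriage return
    return {"crlf": crlf, "lf": lf}

def suggest_rowterminator(counts):
    """Map the dominant line ending to a ROWTERMINATOR value, or None if neither is found."""
    if counts["crlf"] > counts["lf"]:
        return r"\r\n"
    if counts["lf"] > 0:
        return "0x0a"
    return None

# Hypothetical usage:
# print(suggest_rowterminator(count_line_endings(r"C:\FilePath\DataFile.csv")))
```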

Try removing the FORMAT = 'CSV' line.
Your file may not be RFC 4180 compliant.
This worked for me with this error.

Make sure there is not a byte-order mark (BOM) at the beginning of the file, which will cause this to fail with this error.
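
A quick way to check for a UTF-8 BOM, and to write a copy of the file without it, is sketched below. The function name and paths are hypothetical, and only the UTF-8 BOM is handled.

```python
import codecs
import shutil

def strip_utf8_bom(src_path, dst_path):
    """Copy src to dst without a leading UTF-8 BOM; return True if one was removed."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        head = src.read(len(codecs.BOM_UTF8))
        had_bom = head == codecs.BOM_UTF8
        if not had_bom:
            # No BOM: these bytes are real data and must be kept.
            dst.write(head)
        shutil.copyfileobj(src, dst)
    return had_bom

# Hypothetical usage:
# strip_utf8_bom(r"C:\FilePath\DataFile.csv", r"C:\FilePath\DataFile_nobom.csv")
```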

How to write a SQL script to read contents of a file

I have a SQL script and a ".csv" file. I want the SQL script to read the data from the ".csv" file instead of manually entering the data in the script. Is it possible?
....
.....
......
and SP_F.trade_id = SP_R.trade_id
and SP_R.iSINCode IN (here is where I can manually enter the data)
P.S. I am new to SQL and I am still learning.

Here is a good solution.
BULK INSERT CSVTest
FROM 'c:\csvtest.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
In more detail:
1) We have a CSV file named test.csv with the following content:
'JE000DT', 'BE000DT2J', 'DE000DT2'
1, 2, 3
2, 3, 4
4, 5, 6
2) We need to create a table for this file in the database:
CREATE TABLE CSVTest ([columnOne] int, [columnTwo] int, [columnThree] int)
3) Insert your data with BULK INSERT. The column count and types must match your CSV.
BULK INSERT CSVTest
FROM 'C:\test.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
4) Use this table in your subquery:
Select
SP_F.trade_id, -- as 'Trade ID',
SP_F.issuer_id, --as 'Issuer ID',
SP_R.iSINCode --as 'ISIN'
from t_SP_Fundamentals SP_F
JOIN t_SP_References SP_R ON SP_F.trade_id = SP_R.trade_id
where
(SP_F.issuer_id = 3608 or SP_F.issuer_id = 3607)
and SP_R.iSINCode IN (SELECT [columnOne] FROM CSVTest)
There is another solution using the OPENROWSET statement, which allows reading directly from the file. But I strongly recommend the solution above. Reading directly from the file in a query is not a great choice.
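
The same pattern (load the file into a table once, then filter with IN (SELECT ...)) can be tried out end to end without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it illustrates the idea rather than BULK INSERT itself. The table names refs and csv_codes and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3

def load_codes(conn, csv_file):
    """Load the first column of each CSV row into a one-column lookup table."""
    conn.execute("CREATE TABLE IF NOT EXISTS csv_codes (code TEXT)")
    rows = ((row[0].strip(),) for row in csv.reader(csv_file) if row)
    conn.executemany("INSERT INTO csv_codes (code) VALUES (?)", rows)

conn = sqlite3.connect(":memory:")
# Stand-in for the table holding ISIN codes (hypothetical data).
conn.execute("CREATE TABLE refs (trade_id INTEGER, isin TEXT)")
conn.executemany("INSERT INTO refs VALUES (?, ?)",
                 [(1, "JE000DT"), (2, "XX000AA"), (3, "DE000DT2")])

# In practice csv_file would be open("test.csv", newline="").
load_codes(conn, io.StringIO("JE000DT\nBE000DT2J\nDE000DT2\n"))

matches = conn.execute(
    "SELECT trade_id, isin FROM refs "
    "WHERE isin IN (SELECT code FROM csv_codes) ORDER BY trade_id"
).fetchall()
print(matches)  # [(1, 'JE000DT'), (3, 'DE000DT2')]
```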

BULK INSERT 4866 and 7301

I'm trying to bulk import data into SQL Server with the lines below, but I'm getting these errors:
Msg 4866, Level 16, State 8, Line 3
The bulk load failed. The column is too long in the data file for row 1, column 96. Verify that the field terminator and row terminator are specified correctly.
Msg 7301, Level 16, State 2, Line 3
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
Is there anything wrong with my statement? When I use the Import Wizard it works fine.
BULK INSERT BICX.dbo.raw
FROM 'D:\NEW_CDR\NEW.txt'
WITH
(
FIRSTROW = 5,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
);

Since you say the table contains 95 columns and the error says column 96 is too long, you have a problem with your row delimiter.
If your file came from a Windows system, the row terminator is most likely \r\n; if that doesn't work, you could try 0x0a.
BULK INSERT BICX.dbo.raw
FROM 'D:\NEW_CDR\NEW.txt'
WITH
(
FIRSTROW = 5,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\r\n'
);
or
BULK INSERT BICX.dbo.raw
FROM 'D:\NEW_CDR\NEW.txt'
WITH
(
FIRSTROW = 5,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '0x0a'
);
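
One way to test a candidate row terminator before running the load is to split a sample of the file on it and count the fields in the first data row: a count far above the table's column count suggests the terminator never matched. This is a rough sketch that assumes a byte-oriented file and ignores quoted fields; the function name is hypothetical.

```python
def fields_in_first_row(path, row_terminator, field_terminator=b",",
                        skip_rows=0, sample_bytes=1 << 20):
    """Return the field count of the first data row under the given row terminator."""
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    rows = data.split(row_terminator)
    if skip_rows >= len(rows):
        return 0
    # Naive count: a delimiter inside a quoted field is counted too.
    return rows[skip_rows].count(field_terminator) + 1

# Hypothetical usage, skipping 4 rows to mirror FIRSTROW = 5:
# for term in (b"\r\n", b"\n"):
#     print(term, fields_in_first_row(r"D:\NEW_CDR\NEW.txt", term, skip_rows=4))
```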

BCP/ Bulk Insert Fails (tab delimited file)

I have been trying to import data (tab delimited) into SQL server. The source data is exported from IBM Cognos. Data can be downloaded from: sample data
I have tried BCP / Bulk Insert, but it did not help. The original datafile contains a header row (which needs to be skipped).
==================================
Schema:
CREATE TABLE [dbo].[DIM_Assessment](
[QueryType] [nvarchar](4000) NULL,
[QueryDate] [nvarchar](4000) NULL,
[APUID] [nvarchar](4000) NULL,
[AssessmentID] [nvarchar](4000) NULL,
[ICDCode] [nvarchar](4000) NULL,
[ICDName] [nvarchar](4000) NULL,
[LoadDate] [nvarchar](4000) NULL
) ON [PRIMARY]
GO
=============================
Format File generated using the following command
bcp [dbname].dbo.dim_assessment format nul -c -f C:\config\dim_assessment.Fmt -S <IP> -U sa -P Pwd
Content of the format file:
11.0
7
1 SQLCHAR 0 8000 "\t" 1 QueryType SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 8000 "\t" 2 QueryDate SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 8000 "\t" 3 APUID SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 8000 "\t" 4 AssessmentID SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 8000 "\t" 5 ICDCode SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 8000 "\t" 6 ICDName SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 8000 "\r\n" 7 LoadDate SQL_Latin1_General_CP1_CI_AS
=============================
I tried importing the data using BCP and BULK INSERT; however, neither of them worked.
bcp [dbname].dbo.dim_assessment IN C:\dim_assessment.dat -f C:\config\dim_assessment.Fmt -S <IP> -U sa -P Pwd
BULK INSERT dim_assessment FROM '\\dbserver\DIM_Assessment.dat'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
);
GO
Thank you in advance for your help.

Your input file is in a terrible format.
Your format file and your BULK INSERT command both state that the end of a row should be a carriage return and line feed combination, and that there are seven columns of data. However if you open your CSV file in Notepad you will quickly see that the carriage returns and line feeds are not observed correctly in Windows (meaning they must be something other than precisely \r\n). You can also see that there aren't actually seven columns of data, but five:
QueryType QueryDate APUID AssessmentID ICDCode ICDName LoadDate
PPIC 2013-11-20 10:23:14 11431 10963 Tremors
PPIC 2013-11-20 10:23:14 11431 11299 THUMB PAIN
PPIC 2013-11-20 10:23:14 11431 11348 Environmental allergies
...
Just looking at it visually you can tell it isn't right, and you need to get a better source file before throwing it over the wall at SQL Server and expecting it to handle it smoothly.

I just saved your file as .CSV and bulk inserted it with the following statement.
BULK INSERT dim_assessment FROM 'C:\Blabla\TestFile.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
);
GO
Returned Message
(22587 row(s) affected)
Loaded Data
Just notice that some data from ICDName has overflowed into the LoadDate column. Use the | pipe character as the delimiter and run the same BULK INSERT statement with FIELDTERMINATOR = '|', and happy days.

Opening the file via Excel shows the following:
There are indeed 7 column headers
Only the first six of them are populated
Columns 1, 2 and 3 hold identical values
There is some confusing data, where the fifth column can be either empty, or filled with numbers, or filled with text.
I guess that, in these conditions, bulk insert might not work properly. As Excel seems to manage your file in quite a clean way, you should think about an extra step, from CSV to Excel and then to your database.

OK, so this was a seemingly simple task: pushing delimited data from a flat file to SQL Server. I thought BCP was the way to go (I had used it earlier and was successful).
A quick rundown of what was suggested:
a. fix the source file
b. saving source data in native excel format
c. saving source data as pipe-delimited data
I tried all the options; they were doable, but each added multiple steps to my process.
I stumbled upon the Invoke-Sqlcmd and Import-Csv cmdlets in PowerShell. It turns out I can import the data using PowerShell directly. It is a bit slow at this time, but I can live with that for now.
$DATA=IMPORT-CSV dim_assessment.CSV -Delimiter "`t"
FOREACH ($LINE in $DATA)
{
$QueryType="`'"+$Line.QueryType+"`'"
$QueryDate="`'"+$Line.QueryDate+"`'"
$APUID="`'"+$Line.APUID+"`'"
$AssessmentID="`'"+$Line.AssessmentID+"`'"
$ICDCode="`'"+$Line.ICDCode+"`'"
$ICDName=$Line.ICDName
$ICDName = $ICDName.replace("'","''")
$ICDName="`'"+$ICDName+"`'"
$LoadDate="`'"+$Line.LoadDate+"`'"
$SQLHEADER="INSERT INTO [dim_assessment] ([QueryType],[QueryDate],[APUID],[AssessmentID],[ICDCode],[ICDName],[LoadDate])"
$SQLVALUES="VALUES ($QueryType,$QueryDate,$APUID,$AssessmentID,$ICDCode,$ICDName,$LoadDate)"
$SQLQUERY=$SQLHEADER+$SQLVALUES
Invoke-Sqlcmd -Query $SQLQuery -ServerInstance HA -U sa -P Pwd
}
Thanks for all your help!
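
The row-by-row approach above builds each INSERT by string concatenation and escapes apostrophes by hand. The same idea can be sketched with bound parameters, which removes the manual quoting. The example uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the sample row is made up for illustration; the function name is hypothetical.

```python
import csv
import io
import sqlite3

COLUMNS = ["QueryType", "QueryDate", "APUID", "AssessmentID",
           "ICDCode", "ICDName", "LoadDate"]

def load_tab_delimited(conn, tsv_file):
    """Insert tab-delimited rows using bound parameters; return the row count."""
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = (f"INSERT INTO dim_assessment ({', '.join(COLUMNS)}) "
           f"VALUES ({placeholders})")
    # Short rows yield None for the missing columns, which is stored as NULL.
    rows = [tuple(row.get(col) for col in COLUMNS) for row in reader]
    conn.executemany(sql, rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_assessment (%s)"
             % ", ".join(col + " TEXT" for col in COLUMNS))

# Made-up sample: a header row, one full row, and one short row.
sample = ("QueryType\tQueryDate\tAPUID\tAssessmentID\tICDCode\tICDName\tLoadDate\n"
          "PPIC\t2013-11-20 10:23:14\t11431\t10963\t\tParkinson's tremor\t\n"
          "PPIC\t2013-11-20 10:23:14\t11431\t11299\tTHUMB PAIN\n")
inserted = load_tab_delimited(conn, io.StringIO(sample))
```

Because the values are passed as parameters, the apostrophe in the sample needs no escaping, and short rows are padded with NULL instead of failing.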

Oracle PL-SQL : Import multiple delimited files into table

I have multiple files (f1.log, f2.log, f3.log etc)
Each file has the data in ; & = delimited format. (new lines are delimited by ; and fields are delimited by =) e.g.
data of f1:
1=a;2=b;3=c
data of f2:
1=p;2=q;3=r
I need to read all these files and import data into table in format:
filename number data
f1 1 a
f1 2 b
f1 3 c
f2 1 p
[...]
I am new to SQL. Can you please guide me on how to do it?

Use SQL*Loader to get the files into a table. Assuming you have a table created a bit like:
create table FLOG
(
FILENAME varchar2(1000)
,NUM varchar2(1000)
,DATA varchar2(1000)
);
Then you can use the following control file:
LOAD DATA
INFILE 'f1.log' "str ';'"
truncate INTO TABLE flog
fields terminated by '=' TRAILING NULLCOLS
(
filename constant 'f1'
,num char
,data char
)
However, you will need a different control file for each file. This can be done by generating the control file dynamically with a shell script. A sample shell script could be:
cat >flog.ctl <<_EOF
LOAD DATA
INFILE '$1.log' "str ';'"
APPEND INTO TABLE flog
fields terminated by '=' TRAILING NULLCOLS
(
filename constant '$1'
,num char
,data char
)
_EOF
sqlldr <username>/<password>@<instance> control=flog.ctl data=$1.log
Saved as flog.sh it can then be run like:
./flog.sh f1
./flog.sh f2
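
To check what rows to expect from a file before loading it, the parsing rule from the question (records separated by ';', fields by '=') can be sketched in a few lines of Python. This only illustrates the parsing and is not a replacement for SQL*Loader; the function name is hypothetical.

```python
from pathlib import Path

def parse_log(path):
    """Yield (filename, number, data) tuples from a file like '1=a;2=b;3=c'."""
    p = Path(path)
    name = p.stem  # 'f1' for 'f1.log'
    text = p.read_text().strip()
    for record in text.split(";"):
        if not record:
            continue  # tolerate a trailing ';'
        number, _, data = record.partition("=")
        yield (name, number.strip(), data.strip())

# Hypothetical usage:
# for path in ("f1.log", "f2.log"):
#     for row in parse_log(path):
#         print(row)
```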

We Keep Coding
