Additional characters appear during bulk insert - SQL

I am trying to bulk insert the first row from a CSV file into a table with only one column.
But I am getting some extra characters ('n++') at the beginning, like this:
n++First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column
CSV file contents are like:
First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column
You can find the test.csv file here
And this is the code I am using to get the first row's data into a table:
declare @importSQL nvarchar(2000)
declare @tempstr varchar(max)
declare @path varchar(100)
SET @path = 'D:\test.csv'
CREATE TABLE #tbl (line VARCHAR(max))
SET @importSQL =
'BULK INSERT #tbl
FROM ''' + @path + '''
WITH (
LASTROW = 1,
FIELDTERMINATOR = ''\n'',
ROWTERMINATOR = ''\n''
)'
EXEC sp_executesql @stmt = @importSQL
SET @tempstr = (SELECT TOP 1 RTRIM(REPLACE(Line, CHAR(9), ';')) FROM #tbl)
print @tempstr
drop table #tbl
Any idea where this extra 'n++' is coming from?

It seems UTF-8 files are not supported by SQL Server 2005 and 2008; support will only be available in version 11!
https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001

The extra characters are caused by the encoding. You can use Notepad to change the encoding from UTF-8 to Unicode. This removes the 'n++' on the first row.

It might be the Unicode byte order mark that is being picked up.
I suggest you try setting the DATAFILETYPE option as part of your statement. See the MSDN documentation for more detail: http://msdn.microsoft.com/en-us/library/aa173832%28SQL.80%29.aspx
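For illustration, here is a minimal Python sketch (paths and names are hypothetical, not from the question) of detecting and stripping the 3-byte UTF-8 byte order mark before the file reaches BULK INSERT:

```python
# Minimal sketch: strip a leading UTF-8 byte order mark (BOM) from a file
# before it is handed to BULK INSERT. Paths are hypothetical.
import codecs

def strip_utf8_bom(src_path, dst_path):
    """Copy src_path to dst_path, dropping a leading UTF-8 BOM if present.
    Returns True when a BOM was found and removed."""
    with open(src_path, "rb") as src:
        data = src.read()
    had_bom = data.startswith(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'
    if had_bom:
        data = data[len(codecs.BOM_UTF8):]
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return had_bom
```

Re-saving the file this way avoids having to open it in an editor at all.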

Unfortunately, old SQL Server versions do not support UTF-8. Add the CODEPAGE parameter to your BULK INSERT statement, changing the code in your question as follows:
SET @importSQL =
'BULK INSERT #tbl
FROM ''' + @path + '''
WITH ( LASTROW = 1,
FIELDTERMINATOR = ''\n'',
ROWTERMINATOR = ''\n'',
CODEPAGE = ''65001'')'
Note that your file must be in UTF-8 format.
But the problem there is: if you upgrade your server from 2005 to 2008, codepage 65001 (UTF-8) is not supported, and you will get a "codepage not supported" message.

In later versions of SQL Server you can add '-C 65001' to the command to tell it to use UTF-8 encoding. This will remove the n++ from the first line. Note that it is a capital C, and when you type the command, don't include the quotes.

Related

How to query SQL Server insert data from file CSV with declare variable [duplicate]

The following code gives an error (it's part of a T-SQL stored procedure):
-- Bulk insert data from the .csv file into the staging table.
DECLARE @CSVfile nvarchar(255);
SET @CSVfile = N'T:\x.csv';
BULK INSERT [dbo].[TStagingTable]
-- FROM N'T:\x.csv' -- This line works
FROM @CSVfile -- This line will not work
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
The error is:
Incorrect syntax near the keyword 'with'.
If I replace:
FROM @CSVfile
with:
FROM 'T:\x.csv'
... then it works nicely.
As far as I know, only a string literal is allowed in the FROM clause. In that case you have to write a dynamic query to use BULK INSERT:
declare @q nvarchar(MAX);
set @q =
'BULK INSERT [TStagingTable]
FROM ' + char(39) + @CSVfile + char(39) + '
WITH
(
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''\n'',
FIRSTROW = 1
)'
exec(@q)
Have you tried with dynamic SQL?
SET @SQL = "BULK INSERT TmpStList FROM '" + @PathFileName + "' WITH (FIELDTERMINATOR = '"",""') "
(note that double-quoted string literals require SET QUOTED_IDENTIFIER OFF) and then
EXEC(@SQL)
Ref.: http://www.sqlteam.com/article/using-bulk-insert-to-load-a-text-file
You have to engage in string building and then call EXEC() or sp_executesql. BOL has an example:
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT AdventureWorks2008R2.Sales.SalesOrderDetail
FROM ''<drive>:\<path>\<filename>''
WITH (ROWTERMINATOR = ''' + CHAR(10) + ''')'
EXEC(@bulk_cmd)
A string literal is required.
http://msdn.microsoft.com/en-us/library/ms188365.aspx
You could use dynamic SQL to generate the string literal.
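The quote handling that keeps such generated literals valid can be sketched outside T-SQL as well. This is an illustrative Python helper (the function name and defaults are my own, not from any answer above):

```python
# Sketch: build a BULK INSERT string the way the dynamic-SQL answers do,
# doubling single quotes in the path so the generated literal stays valid.
def build_bulk_insert(table, path, field_term=",", row_term="\\n", first_row=2):
    safe_path = path.replace("'", "''")  # T-SQL escapes ' by doubling it
    return (
        f"BULK INSERT {table} FROM '{safe_path}' "
        f"WITH (FIELDTERMINATOR = '{field_term}', "
        f"ROWTERMINATOR = '{row_term}', FIRSTROW = {first_row})"
    )
```

Doubling embedded single quotes keeps a path like O'Brien.csv from breaking the generated statement, though dynamic SQL built from untrusted input still deserves caution.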
Most of the time the variable I'm looking for in a file name is the date, and this one works perfectly for bulk inserting files with a date in the name, for use in a daily job, for example. Change the date format, table name, file path, file name and delimiters as per your need.
DECLARE @DT VARCHAR (10)
DECLARE @INSERT VARCHAR (1000)
SET @DT = (CONVERT(VARCHAR(10), GETDATE()-1, 120))
SET @INSERT = 'BULK INSERT dbo.table FROM ''C:\FOLDER\FILE' + @DT + '.txt''' + ' WITH (FIRSTROW=2, FIELDTERMINATOR=''\t'', ROWTERMINATOR=''\n'')'
EXEC (@INSERT);
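The same date-suffix pattern as a hedged Python sketch (folder and table names are placeholders), mainly to show how the suffix lines up with the yyyy-mm-dd format that CONVERT style 120 produces:

```python
# Sketch: build yesterday's file name with a yyyy-mm-dd suffix (the same
# format CONVERT(VARCHAR(10), GETDATE()-1, 120) produces) and the matching
# BULK INSERT command. Folder and table names are placeholders.
from datetime import date, timedelta

def dated_bulk_insert(folder="C:\\FOLDER", table="dbo.table"):
    dt = (date.today() - timedelta(days=1)).isoformat()  # yyyy-mm-dd
    return (
        f"BULK INSERT {table} FROM '{folder}\\FILE{dt}.txt' "
        "WITH (FIRSTROW=2, FIELDTERMINATOR='\\t', ROWTERMINATOR='\\n')"
    )
```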
Can you try FROM ' + @CSVfile + '

SQL SERVER bulk load

I am working on SQL Server 2017. I need to import 20 text files into one table. Every text file has the same data types and column names. I have checked the data and they are in the same order as well.
I need to import them into a SQL table and create a new, last column saying that
rows 1 to 150 come from "textfile-1",
rows 151 to 300 come from "textfile-2",
rows 301 to 400 come from "textfile-3", and so on.
We don't have any packages like SSIS.
Can we do it with an advanced SQL query? If so, can someone please guide me?
SQL BULK INSERT
First of all you have to make sure that the table structure is identical to the file structure.
You can store the text file paths inside a table, loop over those values using a cursor, build the command dynamically, then execute it:
DECLARE @strQuery VARCHAR(4000)
DECLARE @FileName VARCHAR(4000)
DECLARE file_cursor CURSOR
FOR SELECT FilePath FROM FilesTable
OPEN file_cursor
FETCH NEXT FROM file_cursor INTO @FileName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @strQuery = 'BULK INSERT SchoolsTemp
FROM ''' + @FileName + '''
WITH
(
FIELDTERMINATOR = '','', --Columns delimiter
ROWTERMINATOR = ''\n'', --Rows delimiter
TABLOCK
)'
EXEC(@strQuery)
FETCH NEXT FROM file_cursor INTO @FileName;
END
CLOSE file_cursor
DEALLOCATE file_cursor
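As a cross-check of what the cursor loop produces, here is a hedged Python sketch that generates the same per-file commands (the table name and paths stand in for whatever is stored in FilesTable):

```python
# Sketch: mirror the cursor loop by generating one BULK INSERT command per
# stored file path, so the commands can be inspected before execution.
def bulk_insert_commands(table, file_paths):
    commands = []
    for path in file_paths:
        safe = path.replace("'", "''")  # double embedded single quotes
        commands.append(
            f"BULK INSERT {table} FROM '{safe}' "
            "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', TABLOCK)"
        )
    return commands
```

Generating the full list first also makes it easy to log which file each batch came from, which helps with the source-file column the question asks about.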
More information: BULK INSERT (Transact-SQL)
C# approach: SchemaMapper class library
If you are familiar with C#: I recently started a new project on GitHub, a class library developed in C#. You can use it to import tabular data from Excel, Word, PowerPoint, text, CSV, HTML, JSON and XML into a unified SQL Server table. Check it out at:
SchemaMapper: C# Schema mapping class library
You can follow this Wiki page for a step-by-step guide:
Import data from multiple files into one SQL table step by step guide

SQL Server Bulk Insert: ignore or skip last row

I'm using a bulk insert script to import a number of flat files into SQL Server.
A few files end with
-----------------------------------
So what I want to do is either skip the last row(s) or remove the ------------------ line during the bulk insert. Is either of these options possible?
SET @s = N'BULK INSERT ' + @t + '
FROM ''' + @f + '''
WITH (FIELDTERMINATOR = ''|'',
ROWTERMINATOR = ''0x0a'',
FIRSTROW=2) '
LASTROW = 1 doesn't work.
The only way I can think of is to first bulk insert the whole file into a single-column table (as varchar(max)). Then you can identify the last row, and use that value in your actual bulk insert.
This is not a very straightforward approach, but I don't think there is another (unless you write a custom solution in C# or Java or whatever). Maybe you can use SQLCMD to first read the number of lines in the file, but I don't know how.
Please note there is a Connect item which Microsoft has closed. On that page Microsoft suggests an OPENROWSET solution; it could be worthwhile to try, but I doubt it would work in your situation.
Use a script like the following:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#csvData') IS NOT NULL DROP TABLE #csvData
IF OBJECT_ID('tempdb..#csvRowCount') IS NOT NULL DROP TABLE #csvRowCount
CREATE TABLE #csvRowCount
(
v1 nvarchar(max) -- get only first column - for all rows in file
)
BULK INSERT #csvRowCount
FROM 'C:\TEMP\HR_FOR_LOAD\PP_SICKLIST.CSV'
WITH
(
Firstrow=2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
);
declare @textRowCount int = (select count(1) from #csvRowCount)
CREATE TABLE #csvData
(
v1 nvarchar(2000),
v2 nvarchar(2000),
v3 nvarchar(2000)
--etc
)
declare @sql varchar(max)
set @sql = '
BULK INSERT #csvData
FROM ''C:\TEMP\HR_FOR_LOAD\PP_SICKLIST.CSV''
WITH
(
Firstrow=2,
FIELDTERMINATOR = ''\t'',
ROWTERMINATOR = ''\n'',
Lastrow = ' + cast(@textRowCount as varchar(100)) + '
);'
exec (@sql)
select
v1,
v2,
v3
from #csvData
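The two-pass idea above (count the rows first, then pass the result as LASTROW) can also be sketched outside SQL. The dashed-trailer check below is an assumption matching the files described in the question:

```python
# Sketch of the two-pass idea: count lines first, then use the result as
# LASTROW. Assumes a trailing line made only of dashes marks the end of file.
def compute_lastrow(path):
    """Return the absolute row number of the last usable data line."""
    with open(path, "r", encoding="utf-8") as f:
        lines = [ln.rstrip("\r\n") for ln in f]
    last = len(lines)
    if lines and lines[-1] and set(lines[-1]) == {"-"}:
        last -= 1  # skip the dashed trailer row
    return last
```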

How do I use bulk insert to import a file just based on its file extension?

I have a folder where new log files get created every hour. Each time the file name is different. How do I bulk insert any file that has the extension .log? Here is my code:
select * from [data_MaximusImport_t]
BULK
INSERT Data_MaximusImport_t
FROM 'C:\Program Files (x86)\DataMaxx\*.log'
WITH
(FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
Right now I get the error "*.log" could not be opened. Operating system error code 123 (The filename, directory name, or volume label syntax is incorrect.).
*** This is an edit to my original question. I was able to figure out the file names with this code:
DECLARE @Path varchar(256) = 'dir C:\datamaxx\*.log'
DECLARE @Command varchar(1024) = @Path + ' /A-D /B'
INSERT INTO myFileList
EXEC MASTER.dbo.xp_cmdshell @Command
SELECT * FROM myFileList
Now I just need to figure out how to stick that name in the path. Should I declare the file name as a variable?
You'll need dynamic SQL for this.
Assuming that the file names are already in myFileList, this is how I would do it:
DECLARE @sql AS VARCHAR(MAX);
SET @sql = '';
SELECT @sql = @sql + REPLACE('
BULK INSERT Data_MaximusImport_t
FROM ''C:\Program Files (x86)\DataMaxx\*''
WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'' );
', '*', myFileName)
FROM myFileList
WHERE myFileName != '';
PRINT @sql;
EXEC(@sql);
You unfortunately can't use wildcards in the file path with SQL Server bulk inserts.
Possible workarounds are scripting a loop to get the filenames from the system and inserting them one at a time, or using SSIS.
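One way to script that loop from the client side, as a hedged Python sketch (the folder path and table name are hypothetical):

```python
# Sketch of the workaround: enumerate *.log files on the client and emit one
# BULK INSERT statement per file. The folder path is hypothetical.
import glob
import os

def log_file_inserts(folder, table="Data_MaximusImport_t"):
    statements = []
    for path in sorted(glob.glob(os.path.join(folder, "*.log"))):
        safe = path.replace("'", "''")
        statements.append(
            f"BULK INSERT {table} FROM '{safe}' "
            "WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')"
        )
    return statements
```

Each generated statement can then be sent to the server over any SQL client connection, avoiding both xp_cmdshell and the wildcard limitation.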
