I'm using a BULK INSERT script to import a number of flat files into SQL Server.
A few files end with
-----------------------------------
So what I want to do is either skip the last row(s) or strip the trailing dashes during the bulk insert. Is either of these options possible?
SET @s = N'BULK INSERT ' + @t + '
FROM ''' + @f + '''
WITH (FIELDTERMINATOR = ''|'',
ROWTERMINATOR = ''0x0a'',
FIRSTROW = 2)'
LASTROW = 1 doesn't work.
The only way I can think of is to first bulk insert the whole file into a single-column table (as varchar(max)). Then you can identify the last row, and use that value in your actual BULK INSERT.
This is not a very straightforward approach, but I don't think there is another (unless you write a custom solution in C#, Java, or whatever). Maybe you can use SQLCMD to first read the number of lines in the file, but I don't know how.
Please note there is a Connect item which Microsoft has closed. On that page Microsoft suggests an OPENROWSET solution; it could be worthwhile to try, but I doubt it would work in your situation.
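If you want to experiment with that OPENROWSET route, here is a minimal sketch of one way to count the lines in a file from T-SQL. This is my own illustration, not the code from the Connect item, and the file path is a placeholder:

```sql
-- Hedged sketch: read the whole file as one value, then count line breaks.
-- 'C:\TEMP\data.txt' is a hypothetical path; adjust to your environment.
DECLARE @raw nvarchar(max), @lineCount int;

SELECT @raw = BulkColumn
FROM OPENROWSET(BULK 'C:\TEMP\data.txt', SINGLE_CLOB) AS f;

-- Number of lines = number of LF characters + 1
SET @lineCount = LEN(@raw) - LEN(REPLACE(@raw, CHAR(10), '')) + 1;

SELECT @lineCount AS LineCount;
```

With the line count in hand, you could feed it (minus one, to drop the trailing dashes row) into the LASTROW option of a dynamic BULK INSERT.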
Use a script like the following:
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#csvData') IS NOT NULL DROP TABLE #csvData
IF OBJECT_ID('tempdb..#csvRowCount') IS NOT NULL DROP TABLE #csvRowCount
CREATE TABLE #csvRowCount
(
v1 nvarchar(max) -- get only first column - for all rows in file
)
BULK INSERT #csvRowCount
FROM 'C:\TEMP\HR_FOR_LOAD\PP_SICKLIST.CSV'
WITH
(
Firstrow=2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
);
declare @textRowCount int = (select count(1) from #csvRowCount)
CREATE TABLE #csvData
(
v1 nvarchar(2000),
v2 nvarchar(2000),
v3 nvarchar(2000)
--etc
)
declare @sql varchar(max)
set @sql = '
BULK INSERT #csvData
FROM ''C:\TEMP\HR_FOR_LOAD\PP_SICKLIST.CSV''
WITH
(
Firstrow=2,
FIELDTERMINATOR = ''\t'',
ROWTERMINATOR = ''\n'',
Lastrow = ' + cast(@textRowCount as varchar(100)) + '
);'
exec (@sql)
select
v1,
v2,
v3
from #csvData
I am using SSMS and trying to create a stored procedure (because it needs to survive batches), so I can bulk insert from multiple CSV files (one at a time) into specific tables.
So far I have:
CREATE PROCEDURE AddDataToTable @TableName VARCHAR(25), @DataFolderPath VARCHAR(250),
@DataFile VARCHAR(50), @FieldDeterminator VARCHAR(10)
AS
BEGIN
DECLARE @SQL_BULK VARCHAR(MAX)
SET @SQL_BULK =
'BULK INSERT '+@TableName+'
FROM '''+@DataFolderPath+@DataFile+'''
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '''+@FieldDeterminator+''',
ROWTERMINATOR = ''0x0A'',
TABLOCK
)'
PRINT @SQL_BULK
EXEC @SQL_BULK
END
GO
And I'm calling it with:
EXEC AddDataToTable 'dbo.People', @DataFolderPath, '\ImdbName.csv', ';'
And I get the error:
Msg 911, Level 16, State 4, Procedure AddDataToTable, Line 18 (Batch Start Line 161) Database 'BULK INSERT dbo' does not exist. Make sure that the name is entered correctly.
The thing is that I also added a PRINT statement in the procedure, and the printed statement looks like it should, with every quote in place:
BULK INSERT dbo.People
FROM 'C:\Users\PC\Desktop\Data\ImdbName.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '0x0A',
TABLOCK
)
You are using the wrong syntax. Execute the dynamically generated statement like this:
EXEC (@SQL_BULK)
or
DECLARE @err int
EXEC @err = sp_executesql @SQL_BULK
IF @err <> 0 PRINT 'Error found.'
(Note that sp_executesql expects an nvarchar argument, so declare the variable as NVARCHAR(MAX) if you use this form.)
As an additional note: always use QUOTENAME() to prevent possible SQL injection when you build a SQL Server identifier from an input string:
SET @SQL_BULK = 'BULK INSERT ' + QUOTENAME(@TableName) + ' FROM ... '
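One caveat worth adding (my own note, not from the answer above): if the caller passes a two-part name such as 'dbo.People', QUOTENAME() on the whole string produces [dbo.People], a single identifier, which is not what you want. A hedged sketch that quotes each part separately, assuming the parameter holds at most a schema-qualified name:

```sql
-- Hedged sketch: quote schema and table separately for a two-part name.
DECLARE @TableName sysname = 'dbo.People';   -- example input
DECLARE @quoted nvarchar(300) =
    ISNULL(QUOTENAME(PARSENAME(@TableName, 2)) + '.', '')
    + QUOTENAME(PARSENAME(@TableName, 1));
-- @quoted is now [dbo].[People]; use it when building @SQL_BULK
SELECT @quoted;
```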
The following code gives an error (it's part of a T-SQL stored procedure):
-- Bulk insert data from the .csv file into the staging table.
DECLARE @CSVfile nvarchar(255);
SET @CSVfile = N'T:\x.csv';
BULK INSERT [dbo].[TStagingTable]
-- FROM N'T:\x.csv' -- This line works
FROM @CSVfile -- This line will not work
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
)
The error is:
Incorrect syntax near the keyword 'with'.
If I replace:
FROM #CSVfile
with:
FROM 'T:\x.csv'
... then it works nicely.
As far as I know, only a string literal is allowed in the FROM clause. In that case you have to write a dynamic query to use BULK INSERT:
declare @q nvarchar(MAX);
set @q =
'BULK INSERT [TStagingTable]
FROM ' + char(39) + @CSVfile + char(39) + '
WITH
(
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''\n'',
FIRSTROW = 1
)'
exec (@q)
Have you tried dynamic SQL?
SET @SQL = "BULK INSERT TmpStList FROM '"+@PathFileName+"' WITH (FIELDTERMINATOR = '"",""') "
and then
EXEC(#SQL)
Ref.: http://www.sqlteam.com/article/using-bulk-insert-to-load-a-text-file
You have to engage in string building and then call EXEC() or sp_executesql. Books Online has an example:
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT AdventureWorks2008R2.Sales.SalesOrderDetail
FROM ''<drive>:\<path>\<filename>''
WITH (ROWTERMINATOR = ''' + CHAR(10) + ''')'
EXEC(@bulk_cmd)
A string literal is required.
http://msdn.microsoft.com/en-us/library/ms188365.aspx
You could use dynamic SQL to generate the string literal.
Most of the time the variable I'm looking for in a file name is the date, and this one works perfectly for bulk inserting files with a date in the name, such as in a daily job. Change the date format, table name, file path, file name, and delimiters as needed.
DECLARE @DT VARCHAR (10)
DECLARE @INSERT VARCHAR (1000)
SET @DT = (CONVERT(VARCHAR(10),GETDATE()-1,120))
SET @INSERT = 'BULK INSERT dbo.table FROM ''C:\FOLDER\FILE'+@DT+'.txt'''+' WITH (FIRSTROW=2, FIELDTERMINATOR=''\t'', ROWTERMINATOR=''\n'')'
EXEC (@INSERT);
Can you try FROM ' + @CSVfile + '
I currently have a CURSOR that loops through a temporary table that contains the paths to hundreds of various .txt files that need to be BULK INSERT into a Persons table. The temporary table is also established from a single .txt file that has no identity columns. The FileList.txt text file looks like the following:
...
E:\Dept1\Type1\2005.txt
E:\Dept1\Type1\2006.txt
E:\Dept1\Type1\2007.txt
E:\Dept2\Type1\2005.txt
E:\Dept2\Type1\2006.txt
...
I'm loading FileList.txt into a temporary table with a BULK INSERT. Given this, the only column in the temporary table is the Path column. If I had an additional identity column, the BULK INSERT would fail due to a mismatch in the number of columns.
I want to fine-tune this to use a WHILE loop. However, I've hit a bit of a dead brain on how best to utilize one. Every solution using a WHILE loop that I've seen assumes that a sequential identity column exists. My current query is as follows:
CREATE TABLE #NAMES_filelist (Path VARCHAR(MAX))
BULK INSERT #NAMES_filelist FROM 'E:\FileList.txt' WITH
(
ROWTERMINATOR = '\n'
)
DECLARE @FILEPATH VARCHAR(MAX)
DECLARE @SQLBULK VARCHAR(MAX)
-- Beginning the cursor to loop through the file list.
DECLARE C1 CURSOR
FOR SELECT Path FROM #NAMES_filelist
OPEN C1
FETCH NEXT FROM C1 INTO @FILEPATH
WHILE
@@FETCH_STATUS <> -1
BEGIN
-- Setting @SQLBULK to do the bulk insert of the index files.
SET @SQLBULK =
-- Utilizing a view due to additional columns in dbo.Persons in comparison to the index file.
'BULK INSERT Persons_view FROM ''' + @FILEPATH + ''' WITH
(
MAXERRORS = 0
,FIELDTERMINATOR = ''|''
,ROWTERMINATOR = ''\n''
)'
EXEC (@SQLBULK)
-- Updating the DEPT and INDEXFILE columns for lookup purposes.
UPDATE
Persons
SET
DEPT = REVERSE(SUBSTRING(REVERSE(@FILEPATH), CHARINDEX('\',REVERSE(@FILEPATH)) + 1, CHARINDEX('\',REVERSE(@FILEPATH), CHARINDEX('\',REVERSE(@FILEPATH)) + 1) - CHARINDEX('\',REVERSE(@FILEPATH)) - 1))
,INDEXFILE = REVERSE(LEFT(REVERSE(@FILEPATH),CHARINDEX('\', REVERSE(@FILEPATH), 1) - 1))
WHERE
DEPT IS NULL
AND INDEXFILE IS NULL
FETCH NEXT FROM C1 INTO @FILEPATH
END
CLOSE C1
DEALLOCATE C1
DROP TABLE #NAMES_filelist
Any fresh ideas you guys have would be incredibly useful.
You can bulk insert into a view. So:
CREATE TABLE NAMES_filelist (
NAMES_filelist int identity(1, 1) primary key,
Path VARCHAR(MAX)
);
CREATE VIEW v_NAMES_filelist as
select Path
from NAMES_filelist;
BULK INSERT v_NAMES_filelist FROM 'E:\FileList.txt' WITH (ROWTERMINATOR = '\n');
That said, the cursor is a fine approach. But, this will give you an automatically incrementing id in the table of file names.
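Building on that, here is a minimal sketch of the WHILE loop the question asked about, assuming the identity column from the table above. The dynamic BULK INSERT body is elided; it would be the same string-building as in the question:

```sql
-- Hedged sketch: iterate file paths by identity instead of a cursor.
DECLARE @id int = 1, @maxId int, @FILEPATH varchar(max);
SELECT @maxId = MAX(NAMES_filelist) FROM NAMES_filelist;

WHILE @id <= @maxId
BEGIN
    SELECT @FILEPATH = Path FROM NAMES_filelist WHERE NAMES_filelist = @id;
    -- ... build and EXEC the dynamic BULK INSERT here, as in the question ...
    SET @id = @id + 1;
END
```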
I am trying to bulk insert the first row from a csv file into a table with only one column.
But I am getting some extra characters ('n++') at the beginning, like this:
n++First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column
CSV file contents are like:
First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column
And this is the code I am using to get the first row data in a table
declare @importSQL nvarchar(2000)
declare @tempstr varchar(max)
declare @path varchar(100)
SET @path = 'D:\test.csv'
CREATE TABLE #tbl (line VARCHAR(max))
SET @importSQL =
'BULK INSERT #tbl
FROM ''' + @path + '''
WITH (
LASTROW = 1,
FIELDTERMINATOR = ''\n'',
ROWTERMINATOR = ''\n''
)'
EXEC sp_executesql @stmt = @importSQL
SET @tempstr = (SELECT TOP 1 RTRIM(REPLACE(Line, CHAR(9), ';')) FROM #tbl)
print @tempstr
drop table #tbl
Any idea where this extra 'n++' is coming from?
It seems UTF-8 files are not supported by SQL Server 2005 and 2008; support will only be available in version 11!
https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001
The extra characters are caused by the encoding. You can use Notepad to change the encoding from UTF-8 to Unicode. This removed the 'n++' on the first row.
It might be the Unicode byte order mark (BOM) that is being picked up.
I suggest you try setting the DATAFILETYPE option as part of your statement. See the MSDN documentation for more detail: http://msdn.microsoft.com/en-us/library/aa173832%28SQL.80%29.aspx
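A minimal sketch of that suggestion, using the temp table from the question. The 'widechar' value is my assumption for a UTF-16 (Unicode) file; DATAFILETYPE also accepts 'char', 'native', and 'widenative':

```sql
-- Hedged sketch: tell BULK INSERT the file is Unicode (UTF-16).
BULK INSERT #tbl
FROM 'D:\test.csv'
WITH (
    DATAFILETYPE = 'widechar',
    LASTROW = 1,
    ROWTERMINATOR = '\n'
);
```

This only helps if the file actually is UTF-16; for a UTF-8 file, see the CODEPAGE approach below.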
Unfortunately, older SQL Server versions do not support UTF-8. Add the CODEPAGE parameter to the BULK INSERT statement, changing your code as follows:
SET @importSQL =
'BULK INSERT #tbl
FROM ''' + @path + '''
WITH ( LASTROW = 1,
FIELDTERMINATOR = ''\n'',
ROWTERMINATOR = ''\n'',
CODEPAGE = ''65001'')'
Note that your file must be in UTF-8 format.
The problem there is that if you upgrade your server from 2005 to 2008, code page 65001 (UTF-8) is not supported, and you will get a "codepage not supported" message.
In later versions of SQL Server you can add -C 65001 to the command to tell it to use UTF-8 encoding. This will remove the n++ from the first line. Note that it is a capital C, and don't include any quotes when you type the option.