Using BULK INSERT to import a .csv file - SQL

I have a CSV file that I want to import into SQL Server 2008 using BULK INSERT. The file has 80 columns, and one of them, state, contains comma-separated values such as NY,NJ,AZ,TX,AR,VA,MA; the file has a few million rows.
So I enclosed the state column in double quotes using a custom format in Excel, so that the column would be treated as a single column and not split at the commas inside it. But the import is still not successful; it still splits at those commas. Can anyone suggest how to successfully import columns containing commas using BULK INSERT?
I am using this code:
BULK INSERT test FROM 'C:\test.csv'
WITH (
    FIELDTERMINATOR = ',', ROWTERMINATOR = '\n'
)
GO
I saw a similar question asked here previously, but I don't know Visual Basic, so I can't apply that code. Is there any other option to modify the file in Excel?

Is there any other option to modify the file in Excel?
It turns out there is, at least in Windows.
Go to Start Menu > Control Panel > Regional and Language Options.
In the Regional Options tab, click the Customize Button.
In the List Separator field, replace the , with a |. Click OK.
Saving a file as a .CSV through Excel will now create a pipe-separated value file. Be sure to undo this change to the Regional Options setting, as Excel uses the list separator for other things like functions.
Then you can do as datagod suggests and bulk upload the file using | as the column delimiter.
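For example, a minimal sketch reusing the table and path from the question, with only the delimiter changed:
BULK INSERT test FROM 'C:\test.csv'
WITH (
    FIELDTERMINATOR = '|',  -- the pipe written by the changed list separator
    ROWTERMINATOR = '\n'
);
GO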

You should create a format file: http://msdn.microsoft.com/en-us/library/ms191516.aspx
If your data contains commas, I would choose a different delimiter. You can specify "|" as the delimiter in the format file (see the example and the BULK INSERT sketch below).
Example:
10.0
4
1 SQLCHAR 0 100 "|" 1 Col1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "|" 2 Col2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "|" 3 Col3 SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 7000 "\r\n" 4 Col11 SQL_Latin1_General_CP1_CI_AS
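Then point BULK INSERT at the format file. A minimal sketch, assuming the pipe-delimited data file is C:\test.csv and the format file above is saved as C:\test.fmt (the paths and table name are placeholders):
BULK INSERT dbo.test
FROM 'C:\test.csv'
WITH (
    FORMATFILE = 'C:\test.fmt'  -- the pipe-delimited format file above
);
GO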

Related

How to import a CSV file, which is not really a CSV, into a SQL database using PowerShell

I have a txt file that looks something like this:
Number Name #about 4 spaces between
89273428 John #about 7 spaces between
59273423 Hannah
95693424 David
I'm trying to upload it into my SQL Server database using PowerShell, but I'm not sure how to do it, so any suggestion or help would be really appreciated.
I tried converting it to a CSV file, but all of the content gets merged into a single column, so I can't do it like this.
$CSVImport = Import-CSV $Global:TxtConvertCSV
ForEach ($CSVLine in $CSVImport) {
    $CSVNumber = $CSVLine.Number.ToUpper()
    $CSVName = $CSVLine.Name.ToUpper()
    $Date = $CurDate
    $query = "INSERT INTO Table (Number, Name, Added_Date) VALUES('$CSVNumber', '$CSVName','$Date');"
    Invoke-Sqlcmd -Query $query
}
In order to successfully use the import-csv cmdlet, your file must have a reliable delimiter. For example, if your file is actually tab-delimited, then you can use:
import-csv -delimiter "`t"
If the file has no delimiter, but uses fixed positions of a known length for each "field" then you can do the following:
Please note this will only work if the file uses a fixed layout.
As an example, assume we have a file which contains a number and a name on each row. On each line of the file, positions 0 through 7 contain the number. Even if not all numbers are 8 characters long, those positions are still reserved for them. Positions 15 through 22 contain the name. As with the numbers, even if a name does not take up all of those positions, they are still reserved for the "name" field.
Example file contents:
Number         Name
12345          John
333            Brittany
2222           Jeff
12345678       Johannes
Since there are 7 unused spaces between the end of the number field and the start of the name field, we will ignore those.
You could process this file as follows:
$fileContents = Get-Content Path_to_my_DataFile.txt
#use -skip 1 to ignore the first line, since that line just contains the column headings
$recordsFromFile = $fileContents | Select-Object -Skip 1 -Property @{name = 'number'; expression = {$_.substring(0,8).trim()}},
    @{name = 'name'; expression = {$_.substring(15,8).trim()}}
When this completes you will have an array of objects, where each item is a PSCustomObject containing the properties "number" and "name".
You can confirm that the fields look correct by using out-gridview like this:
$recordsFromFile | out-gridview
You can also convert this into a CSV like this:
$recordsFromFile | convertto-csv -notypeinformation
If the file is not actually fixed-width, then substring(start, length) will likely not work. In that case you can potentially leave out the "length" argument so that substring reads from the start position to the end of the line, but that will really only work for the last field of each line. Failing that, you would have to resort to pattern matching to identify where each "field" begins and ends, processing each line individually.

Toad SQL Data Import - Flat File - Carriage Return in Middle of String Field

How do I import a CSV file where there is a carriage return in the middle of one string (quoted) field?
For example, suppose this is my CSV file with 3 records, where the column names are included:
FieldName1, FieldName2, FieldName3, FieldName4
"DataA1
",DataA2,"DataA3",DataA4
"DataB1
",DataB2,"DataB3",DataB4
"DataC1
",DataC2,"DataC3",DataC4
My data always puts a newline before the end quote for the first column in the data.

Save Excel file as tab-delimited text without ignoring the empty columns at the beginning

After running a macro on my Excel file (.xlsx) I have output like this:
The first 3 columns of each row are empty.
Then when I try to save this as Text (Tab delimited) I get the output (.txt), but without the 3 leading empty columns:
Other empty cells are preserved as tabs, but these first 3 columns are somehow deleted, and in my case I need them.
Is there any solution to avoid this situation? Adding them manually is not a solution, because I have huge amounts of data.
Thanks.
In the first row of the first 3 columns, enter any dummy special character like "#".
Example:
# # # 1 999 999 2 10 3
Just enter these # symbols in the first row, and then save the Excel file as a tab-delimited text file. I get output as below.
Output:
# # # 1 999 999 2 10 3
1 999 999 2 10 3
1 999 999 2 10 3
1 999 999 2 10 3
Hope this solves the problem in this case. If the empty rows or columns are not consistent, then the code on Alex's page can be used.
Put a formula that evaluates to empty (e.g. ="") in the last column of the rows that are empty, and then export.

BCP format file editing for Bulk Import into SQL

I'm attempting to import a large amount of data contained in a CSV file into a SQL database. The CSV is 4 GB in size and has 329 columns and 300,000+ rows of data. So far I've successfully created the database and the table that will hold the data once imported. The data contains strings (VARCHAR(x)), numerics (INT), and dates (DATE).
The data contained within the CSV file is separated by a "," delimiter, but all of the data fields are enclosed in double quotes, with some fields not containing data values. Below is a mock example of the data.
"123244234","09/12/2012","First Name","Last Name","Address 1","","","555-555-5555","","CountryCode"
In research I've determined the easiest way to import the data will be to use BCP to create a format file and then use that with BULK INSERT. The only problem is formatting the format file to remove the double quotes. When attempting to import without a format file, it fails on row one because the first column of the first row is numeric and has "" around it.
I've reviewed the following link, which talks about removing the double quotes with the use of a dummy entry: http://support.microsoft.com/default.aspx?scid=kb;EN-US;132463. In this case that is a lot of manual editing. Does anyone know of a better way to edit the format file? Here is a sample of the format file, followed by what I think the quote-handling edits would have to look like:
10.0
329
1 SQLCHAR 0 12 "," 1 NPI ""
2 SQLCHAR 0 12 "," 2 Entity Type Code ""
3 SQLCHAR 0 12 "," 3 Replacement NPI ""
4 SQLCHAR 0 9 "," 4 Employer Identification Number (EIN) SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 70 "," 5 Provider Organization Name (Legal Business Name) SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 35 "," 6 Provider Last Name (Legal Name) SQL_Latin1_General_CP1_CI_AS
7 SQLCHAR 0 20 "," 7 Provider First Name SQL_Latin1_General_CP1_CI_AS
8 SQLCHAR 0 20 "," 8 Provider Middle Name SQL_Latin1_General_CP1_CI_AS
9 SQLCHAR 0 5 "," 9 Provider Name Prefix Text SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 5 "," 10 Provider Name Suffix Text
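Based on the dummy-entry approach in that article, I believe the edit would make the quotes part of the terminators: a dummy first field whose terminator is the opening quote, "\",\"" as the terminator between the real fields, and "\"\r\n" at the end of the last field. Only the first rows and the last row are sketched below; the field count on the second line becomes 330 because of the dummy field, and the last column's name and widths are placeholders:
10.0
330
1 SQLCHAR 0 0 "\"" 0 FIRST_QUOTE ""
2 SQLCHAR 0 12 "\",\"" 1 NPI ""
3 SQLCHAR 0 12 "\",\"" 2 Entity Type Code ""
4 SQLCHAR 0 12 "\",\"" 3 Replacement NPI ""
(... fields 5 through 329 keep the same "\",\"" terminator ...)
330 SQLCHAR 0 50 "\"\r\n" 329 LastColumn SQL_Latin1_General_CP1_CI_AS
With 329 columns that is exactly the kind of hand-editing I would like to avoid.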

Reading sparse columns from a CSV

I get a CSV that I need to read into a SQL table. Right now it's manually uploaded with a web application, but I want to move this into SQL server. Rather than port my import script straight across into a script in SSIS, I wanted to check and see if there was a better way to do it.
The issue with this particular CSV is that the first few columns are known, and have appropriate headers. However, after that group, the rest of the columns are sparsely populated and might not even have headers.
Example:
Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4
What makes this tolerable is that everything after Col3 can get concatenated together. The script checks each row for these trailing columns and puts them together into a "misc" column. It has to do this somewhat blindly because there is no way of knowing ahead of time how many of these columns there will be.
Is there a way to do this with SSIS tools, or should I just port my existing import script to an SSIS script task?
Another option outside of SSIS is using BULK INSERT with format files.
Format files allow you to describe the format of the incoming data.
For example (a sketch of the matching BULK INSERT follows the format file):
9.0
4
1 SQLCHAR 0 100 "," 1 Header1 SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "," 2 Header2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3 Header3 SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "\r\n" 4 Misc SQL_Latin1_General_CP1_CI_AS
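A rough sketch of the matching BULK INSERT call; the table name and file paths are placeholders, FIRSTROW = 2 skips the Col1,Col2,Col3 header line, and everything after the third comma on each line lands in the Misc column per the format file:
BULK INSERT dbo.MyStagingTable
FROM 'C:\data\input.csv'
WITH (
    FORMATFILE = 'C:\data\input.fmt',  -- the 4-field format file above
    FIRSTROW = 2                       -- skip the header row
);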
Bulk Insert>> http://msdn.microsoft.com/en-us/library/ms188365.aspx
Format Files >> http://msdn.microsoft.com/en-us/library/ms178129.aspx
Step 0. My test file with an additional line
Col1,Col2,Col3,,,,,,
value1,value2,value3,,value4
value1,value2,value3,value4,value5
value1,value2,value3,,value4,value5
value1,value2,value3,,,value4
ends,with,comma,,,value4,
Drag a DFT on the Control flow surface
Inside the DFT, on the data flow surface, drag a Flat file source
Let it map by itself to start with. Check "Column names in the first data row".
You will see Col1, Col2, Col3 which are your known fields.
You will also see Column 3 through Column 8. These are the columns
that need to be lumped into one Misc column.
Go to the Advanced section of the Flat File Connection Manager Editor.
Rename Column 3 to Misc. Set field size to 4000.
Note: For longer than that, you would need to use Text data type.
That will pose some challenge, so be ready for fun ;-)
Delete Columns 4 through 8.
Now add a script component.
Input Columns - select only Misc field. Usage Type: ReadWrite
Code:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Split the lumped Misc column on commas, replace empty values with "NA",
    // and rebuild it as a single comma-separated string.
    string sMisc = Row.Misc;
    string sManipulated = string.Empty;
    string temp = string.Empty;

    string[] values = sMisc.Split(',');
    foreach (string value in values)
    {
        temp = value;
        if (temp.Trim().Equals(string.Empty))
        {
            temp = "NA"; // placeholder for an empty trailing field
        }
        sManipulated = string.Format("{0},{1}", sManipulated, temp);
    }

    // Drop the leading comma added by the first iteration.
    Row.Misc = sManipulated.Substring(1);
}
-- Destination.
Nothing different from usual.
Hope I have understood your problem and the solution works for you.