Treat NaNs as NULL in an SSIS package - sql

I am trying to load a large .txt file into a table in a SQL Server 2012 database through an SSIS package that I created with the SQL Server Import Wizard.
Some of my numeric columns contain a few "NaN" text values, and I would like them to be converted to NULL. I just don't know how to specify this in the wizard or in the saved SSIS package.
Note: The .txt file is too big for me to replace "NaN" with blanks (I can't install a program like Notepad++ on my computer) and then enable RetainNulls.
Is it possible to specify that "NaN" should be read as NULL?

If you are going with the package, add a Derived Column transformation to the data flow. The REPLACE function can't turn "NaN" into NULL for just the matching rows, so use a conditional expression instead, for example:
UPPER(column) == "NAN" ? NULL(DT_R8) : (DT_R8)column
Or, if you want to change the data in the database, you can use an UPDATE statement in an "OLE DB Command" transformation:
UPDATE table
SET column_name = NULL
WHERE column_name = 'NaN'
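If several numeric columns are affected, NULLIF can shorten that cleanup statement. A minimal sketch, assuming the NaN-prone columns were landed as text in a staging table (table and column names are illustrative):
-- Illustrative names; assumes the affected columns were loaded as varchar.
-- NULLIF(x, 'NaN') returns NULL when x = 'NaN', otherwise x unchanged.
UPDATE dbo.StagingTable
SET Col1 = NULLIF(Col1, 'NaN'),
    Col2 = NULLIF(Col2, 'NaN'),
    Col3 = NULLIF(Col3, 'NaN');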

Related

SSIS pipe-delimited file not failing when a row has more pipes than columns?

My source file is a pipe (|) delimited text file (.txt). I am trying to load the file into SQL Server 2012 using SSIS (SQL Server Data Tools 2012). I have three columns. Below is an example of how the data in the file looks.
I was expecting my package to fail because the file is pipe (|) delimited, but instead it succeeds, and in the last row everything after the extra pipes ends up in the third (last) column.
My question is: why isn't the package failing? I believe the data is corrupt, because going by the delimiter it has more columns than defined.
If I want the package to fail when the number of delimiters is greater than the number of columns, what are my options?
You can tell what is happening if you look at the Advanced page of the Flat File connection manager. For all but the last field the delimiter is '|'; for the last field it is CRLF.
So, by design, everything between the last defined pipe and the end of the line (CRLF) is imported into your last field.
What I would do is add another column to the connection manager and to your staging table. Map the new 'TestColumn' in the destination. When the import is complete, you want to ensure that this column is NULL in every row; if not, throw an error.
You could use a Script Task, but this way you will not need to code in C# and you will not have to process the file twice. If you are comfortable coding a Script Task, and/or you cannot use a staging table with an extra column, then that is the only other route I can think of.
A suggestion for checking for NULL would be to use an Execute SQL Task with a single-row result set mapped to an integer variable. If the value is > 0, fail the package.
The query would be: SELECT COUNT(*) AS NotNullCount FROM Table WHERE TestColumn IS NOT NULL.
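If you would rather not map a result set to a variable, the Execute SQL Task will also fail on its own when the statement raises an error. A hedged variant, with illustrative names:
-- Illustrative names: fail the task if any row spilled extra data into TestColumn.
IF EXISTS (SELECT 1 FROM dbo.StagingTable WHERE TestColumn IS NOT NULL)
    RAISERROR('Rows found with more pipe delimiters than defined columns.', 16, 1);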
You can write a Script Task that reads the file, counts the pipes on each line, and raises an error if the count is not what you expect.

Want to use SSIS to load a CSV file with Null values into a SQL Server DB table while preserving NULLs and maintaining floating-point casting

Here is an example csv file:
Col1,Col2,Col3,Col4
1.0E+4,2.0E+3,3.1E-2,4.1E+4
NULL,1.0E-2,2.0E+1,3.2E-2
Using SSIS in Visual Studio, I want to get this file from CSV format into a SQL Server DB table. I have a Data Flow Task which contains a Flat File Source and an ADO NET Destination. The SQL table has already been created with all columns typed as float. In the Flat File Source I cast all columns as (DT_R4). An error is raised when I execute the package. The error is [Flat File Source [21]]: data conversion failure for Col1. It is because I have a "Null" in the file. If instead of a Null I have an empty space, the SQL table ends up containing a "0" rather than a NULL. Is there anything I can put in place of "Null" in the CSV file that SQL Server will interpret as NULL and that won't cause errors for SSIS? Please keep in mind that I actually have 100+ data files, each 500 MB and each with 600+ columns.
Use a Derived Column component: create a DerivedCol1 as
[Col1] == "Null" ? NULL(DT_R4) : [Col1]
and map it to the destination column. Hope this helps.
Did you try
ISNULL(col) ? " " : col
in a Derived Column?
If you look at the technical error when you click OK, you can see that it needs a cast:
"null" == LOWER(myCol) ? (DT_STR, 50, 1252) NULL(DT_STR, 50, 1252) : myCol
It's weird because NULL(DT_STR, 50, 1252) should already return a null of that type.
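If editing expressions for 600+ columns is impractical, another route (not suggested in the answers above, just a sketch with illustrative names) is to land every column as text in a staging table and convert in T-SQL on SQL Server 2012 or later, where TRY_CAST turns non-numeric text such as "Null" into NULL:
-- Illustrative names; assumes every staging column was loaded as varchar.
-- TRY_CAST returns NULL for text it cannot convert (such as 'Null');
-- NULLIF handles empty strings, which would otherwise cast to 0.
INSERT INTO dbo.TargetTable (Col1, Col2, Col3, Col4)
SELECT
    TRY_CAST(NULLIF(Col1, '') AS float),
    TRY_CAST(NULLIF(Col2, '') AS float),
    TRY_CAST(NULLIF(Col3, '') AS float),
    TRY_CAST(NULLIF(Col4, '') AS float)
FROM dbo.StagingTable;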

TSQL Bulk Insert

I have a CSV file whose field delimiter is a comma (,). My CSV files are very big, and I need to import them into a SQL Server table. The process must be automated; it is not a one-time job.
So I use BULK INSERT to load these CSV files. But today I received a CSV file that has a row like this:
1,12312312,HOME ,"House, Gregory",P,NULL,NULL,NULL,NULL
The problem is that BULK INSERT splits this row, specifically the field "House, Gregory",
into two fields: '"House' and ' Gregory"'.
Is there some way to make BULK INSERT understand that the double quotes override the behaviour of the comma?
When I open this CSV with Excel, it shows this field correctly as 'House, Gregory'.
You need to preprocess your file; see this answer:
SQL Server Bulk insert of CSV file with inconsistent quotes
If every row in the file has double quotes around that field, you can specify ," and ", as the column separators for that column using format files.
If not, get the file changed, or you'll have to write some clever pre-processing routine somewhere.
The file format needs to be consistent for any of the SQL Server tools to work.
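As an aside: if upgrading is an option, SQL Server 2017 and later add native CSV handling to BULK INSERT, which parses quoted fields without any preprocessing. A minimal sketch, with illustrative table name and path:
-- Requires SQL Server 2017 or later; table name and file path are illustrative.
BULK INSERT dbo.TargetTable
FROM 'C:\data\input.csv'
WITH (
    FORMAT = 'CSV',        -- RFC 4180-style parsing
    FIELDQUOTE = '"',      -- "House, Gregory" is kept as a single field
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);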
Since you are referring to SQL Server, I assume you have Access available as well (Microsoft-friendly environment). If you do have Access, I recommend you use its Import Wizard. It is much smarter than the import wizard of SQL Server (even in version 2014), and smarter than the BULK INSERT command as well.
It has a widget where you can define the text separator to be ", and it also has no problems with string length because it uses the Access data type Text.
If you are satisfied with the results in Access, you can then import them into SQL Server seamlessly.
The best way to move the data from Access to SQL Server is the SQL Server Migration Assistant, available here

BCP utility to create a format file, to import Excel data to SQL Server 2008 for BULK insertion

I am trying to import Excel 2003 data into a SQL table in SQL Server 2008.
I tried to add a linked server but have met with little success.
Now I am trying to check whether there's a way to use the BCP utility to do a BULK INSERT, or a BULK operation with OPENROWSET, using a format file for the Excel mapping.
First of all, how can I create a format file for a table that has differently named columns than the Excel spreadsheet columns?
Next, how do I use this format file to import data from, say, a file at C:\Folder1\Excel1.xls
into table Table1?
Thank you.
There are some examples here that demonstrate what the data file should look like (CSV) and what the format file should look like. Unless you need to do this a lot, I'd just hand-craft the format file, save the Excel data to CSV, then try using bcp or OPENROWSET.
The format file specifies the column names for the destination. The data file doesn't have column headings, so you don't need to worry about the Excel (source) columns being different.
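A hedged sketch of the OPENROWSET route, assuming the spreadsheet has first been saved as C:\Folder1\Excel1.csv and a format file has been generated against the destination table (for example with something like bcp YourDb.dbo.Table1 format nul -c -t, -f C:\Folder1\Excel1.fmt -T); all names here are illustrative:
-- Illustrative names; assumes Excel1.fmt maps the CSV fields to Table1's columns.
INSERT INTO dbo.Table1 (DestCol1, DestCol2, DestCol3)
SELECT src.DestCol1, src.DestCol2, src.DestCol3
FROM OPENROWSET(
        BULK 'C:\Folder1\Excel1.csv',
        FORMATFILE = 'C:\Folder1\Excel1.fmt',
        FIRSTROW = 2   -- skip the header row
     ) AS src;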
If you need to do more mapping etc., then create an SSIS package. You can use the data import wizard to get you started, then save it as an SSIS package, then edit it to your heart's content.
If it's a one-off, I'd use the SQL data import wizard, from right-clicking the database in Management Studio. If you just have a few rows to import from Excel, I typically open a query to Edit Top 200 Rows, edit the query to match the columns I have in Excel, then copy and paste the rows from Excel into SQL Management Studio. It doesn't handle errors very well, but it's quick.

Retaining NULLs in numerical columns using SSIS Import/Export Wizard? [duplicate]

This question already has answers here:
Can't import as null value SQL Server 2008 TSV file
I am having a problem uploading data from tab-delimited flat files (TSV files) into SQL Server 2005 using the SSIS Data Import Wizard. I did not experience this problem using the equivalent procedure in SQL Server 2000, and I have checked that the internal structure of the files I am trying to import is unchanged since well before the SQL Server upgrade took place.
The problem is that all blank values in columns with numeric data types (e.g. smallint, float, etc.) are being converted to 0 on import instead of NULL. This means that AVGing across these data gives erroneous output.
The TSV files do not include text qualifiers; however, testing the use of qualifiers with some dummy data did not resolve the problem.
It is possible to retain the NULLs by importing into VARCHAR columns, but this is far from ideal. Is there a way of instructing the SSIS Import/Export Wizard to import blank values from flat files into columns with numeric data types as NULL rather than 0?
#gbn: Thanks for the pointer. I believe I have now found a way around this problem and have been able to successfully import data containing NULL values in numerical columns into my SQL Server 2005 database.
In case anyone else is having the same problem:
I imported the data using a Data Flow task in Business Intelligence Development Studio (rather than using dtswizard as previously), building a data flow from a Flat File Source to an OLE DB Destination.
In the Flat File Source Editor there is a 'Retain null values from the source as null values in the data flow' check box. Ticking this appears to resolve the problem.
As #gbn pointed out, this box is missing from the wizard.
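A quick way to confirm the setting took effect is to compare NULL and zero counts in the loaded table (names are illustrative):
-- Illustrative names: check whether blank source values arrived as NULL or as 0.
SELECT
    SUM(CASE WHEN SomeNumericCol IS NULL THEN 1 ELSE 0 END) AS NullCount,
    SUM(CASE WHEN SomeNumericCol = 0 THEN 1 ELSE 0 END)     AS ZeroCount,
    AVG(SomeNumericCol) AS AvgIgnoringNulls  -- AVG skips NULLs, so it changes once blanks stop loading as 0
FROM dbo.ImportedTable;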