SSIS pipe-delimited file not failing when a row has more pipes than there are columns? - sql

My source file is a pipe (|) delimited text file (.txt). I am trying to load the file into SQL Server 2012 using SSIS (SQL Server Data Tools 2012). I have three columns. Below is an example of how the data in the file looks.
I expected the package to fail because the file is pipe (|) delimited; instead the package succeeds, and the last row is loaded with its extra pipes included in the third (last) column.
My question is: why isn't the package failing? I believe the data is corrupt, because going by the delimiters that row has more columns than are defined.
If I want the package to fail when the number of delimiters exceeds the number of columns, what are my options?

You can tell what is happening if you look at the Advanced page of the flat file connection manager. For all but the last field the delimiter is '|'; for the last field it is CRLF.
So, by design, all data between the last defined pipe and the end of the line (CRLF) is imported into your last field.
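As a rough illustration only (Python standing in for the flat file parser, not SSIS's actual implementation), splitting on '|' at most column_count - 1 times reproduces the behaviour: any surplus pipes stay in the last field.

```python
from typing import List


def parse_like_flat_file(line: str, column_count: int) -> List[str]:
    """Mimic a flat file connection with `column_count` columns: every column
    except the last ends at '|', the last one runs to the end of the line."""
    # Remove the row delimiter (CRLF), then split at most column_count - 1 times
    # so that any extra pipes remain inside the final field.
    return line.rstrip("\r\n").split("|", column_count - 1)


if __name__ == "__main__":
    for row in ["1|Alice|NY\r\n", "2|Bob|LA|extra|more\r\n"]:
        print(parse_like_flat_file(row, 3))
    # ['1', 'Alice', 'NY']
    # ['2', 'Bob', 'LA|extra|more']
```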
What I would do is add another column to the connection manager and to your staging table, and map the new 'TestColumn' in the destination. When the import is complete, check that this column is null in every row; if it is not, throw an error.
You could use a Script Task instead, but this way you will not need to write C# code and you will not have to process the file twice. If you are comfortable coding a Script Task, or you cannot use a staging table with an extra column, then that is the only other route I can think of.
To check for nulls, you could use an Execute SQL Task with a single-row result set mapped to an integer variable. If the value is > 0, fail the package.
The query would be Select Count(*) As NotNullCount From StagingTable Where TestColumn Is Not Null.
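A minimal sketch of the TestColumn idea, assuming an in-memory SQLite database in place of the SQL Server staging table; the table name Staging and the columns Col1 to Col3 are hypothetical. The count query is the same check the Execute SQL Task would run.

```python
import sqlite3
from typing import List


def load_with_test_column(lines: List[str], conn: sqlite3.Connection) -> None:
    """Load 3-column pipe-delimited lines into a staging table that has one
    extra column (TestColumn). Well-formed rows leave TestColumn NULL."""
    conn.execute("CREATE TABLE Staging (Col1 TEXT, Col2 TEXT, Col3 TEXT, TestColumn TEXT)")
    for line in lines:
        # Four defined columns: surplus data now lands in TestColumn, not Col3.
        fields = line.rstrip("\r\n").split("|", 3)
        fields += [None] * (4 - len(fields))  # missing trailing columns -> NULL
        conn.execute("INSERT INTO Staging VALUES (?, ?, ?, ?)", fields)


def assert_no_extra_columns(conn: sqlite3.Connection) -> None:
    """Raise (i.e. fail the package) if any row populated TestColumn."""
    (not_null_count,) = conn.execute(
        "SELECT COUNT(*) AS NotNullCount FROM Staging WHERE TestColumn IS NOT NULL"
    ).fetchone()
    if not_null_count > 0:
        raise ValueError(f"{not_null_count} row(s) have more columns than expected")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_with_test_column(["1|Alice|NY", "2|Bob|LA|extra|more"], conn)
    try:
        assert_no_extra_columns(conn)
    except ValueError as err:
        print("Package should fail:", err)
```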

You can write a script task that reads the file, counts the pipes, and raises an error if the number of pipes is not what you want.
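A sketch of that check in Python (a Script Task would be written in C# or VB.NET, but the logic is the same). It assumes no quoted fields containing pipes; blank lines are flagged because they contain zero pipes. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str], expected_columns: int,
                  delimiter: str = "|") -> List[Tuple[int, int]]:
    """Return (line_number, delimiter_count) for every line whose delimiter
    count differs from expected_columns - 1. Line numbers are 1-based."""
    expected = expected_columns - 1
    bad = []
    for line_no, line in enumerate(lines, start=1):
        count = line.rstrip("\r\n").count(delimiter)
        if count != expected:
            bad.append((line_no, count))
    return bad


def check_file(path: str, expected_columns: int) -> None:
    """Read the whole file once and raise if any row has the wrong pipe count."""
    with open(path, encoding="utf-8", newline="") as f:
        bad = find_bad_rows(f, expected_columns)
    if bad:
        # Raising is what makes the calling step (and so the package) fail.
        raise ValueError(f"Rows with wrong delimiter count (line, pipes): {bad}")
```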

Related

What does this error mean: Required column value for column index: 8 is missing in row starting at position: 0

I'm attempting to upload a CSV file (the output of a BCP command) to BigQuery using the bq load command from the Google Cloud CLI. I have already uploaded a custom schema file (I was having major issues with autodetect).
One resource suggested this could be a datatype mismatch. However, the table from the SQL DB lists the column as a decimal, so in my schema file I have listed it as FLOAT since decimal is not a supported data type.
I couldn't find any documentation for what the error means and what I can do to resolve it.
What does this error mean? In this context, it means a value is REQUIRED for a given column index and none was found. (By the way, column indexes are usually 0-based, so a fault at column index 8 most likely refers to the ninth column.)
This can be caused by a myriad of different issues, of which I experienced two:
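To find the rows that trip this check before loading, a minimal Python sketch can scan the file for empty or absent values in the columns you marked REQUIRED. It treats an empty field as missing, which is an assumption about how your NULLs were exported, and its row numbers count parsed CSV records, so a quoted field spanning several lines would shift them.

```python
import csv
import io
from typing import List, Sequence, Tuple


def missing_required(csv_text: str, required_indexes: Sequence[int],
                     delimiter: str = ",") -> List[Tuple[int, int]]:
    """Return (row_number, column_index) pairs where a REQUIRED column is empty
    or absent. row_number is 1-based; column_index is 0-based, like the
    BigQuery message, so index 8 is the ninth column."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        for idx in required_indexes:
            if idx >= len(row) or row[idx] == "":
                problems.append((row_no, idx))
    return problems
```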
Incorrectly categorizing NULL columns as NOT NULL: After exporting the schema from SSMS as JSON, I needed to clean it up for BQ, and in doing so I mapped IS_NULLABLE:NO to MODE:NULLABLE and IS_NULLABLE:YES to MODE:REQUIRED. These values should have been reversed. This caused the error because there were NULL values in columns where BQ expected a REQUIRED value.
Using the wrong delimiter: The file I was outputting was not only comma-delimited but also tab-delimited. I was only able to verify this by importing the data with the Get Data tool in Excel, after which I could see the tabs inside the cells.
After re-exporting with a pipe (|) delimiter, I was finally able to load the file into BigQuery without any errors.
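For the first issue, a small conversion script avoids flipping the mapping by hand. This is a sketch that assumes the SSMS export is a list of objects with INFORMATION_SCHEMA.COLUMNS-style keys (COLUMN_NAME, DATA_TYPE, IS_NULLABLE); the type_map is supplied by you, and the example mapping simply mirrors the decimal to FLOAT choice made in the question.

```python
import json
from typing import Dict, List

# Correct direction of the mapping: nullable columns become NULLABLE,
# non-nullable columns become REQUIRED.
MODE_FOR_IS_NULLABLE = {"YES": "NULLABLE", "NO": "REQUIRED"}


def to_bigquery_schema(columns: List[Dict[str, str]],
                       type_map: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert INFORMATION_SCHEMA.COLUMNS-style dicts (assumed input shape)
    into BigQuery JSON schema entries with name, type and mode."""
    schema = []
    for col in columns:
        schema.append({
            "name": col["COLUMN_NAME"],
            "type": type_map[col["DATA_TYPE"].lower()],  # caller-supplied mapping
            "mode": MODE_FOR_IS_NULLABLE[col["IS_NULLABLE"].upper()],
        })
    return schema


if __name__ == "__main__":
    exported = [
        {"COLUMN_NAME": "id", "DATA_TYPE": "int", "IS_NULLABLE": "NO"},
        {"COLUMN_NAME": "amount", "DATA_TYPE": "decimal", "IS_NULLABLE": "YES"},
    ]
    types = {"int": "INTEGER", "decimal": "FLOAT"}
    print(json.dumps(to_bigquery_schema(exported, types), indent=2))
```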

SSIS: Excel data source - if a column does not exist, use another column

I am using a SELECT statement in an Excel source to import only specific columns from the Excel file.
But I am wondering: is it possible to select data in such a way that, for example, I select the column named Column_1, but if that column does not exist in the Excel file, it selects the column named Column_2 instead? Currently, if Column_1 is missing, the data flow task fails.
Use a Script Component and write .NET code to read the Excel file and check whether Column_1 is present in the file. If the column is not present, use Column_2 as the input. A Script Component in SSIS can act as a source.
SSIS is metadata-based and does not support dynamic metadata; however, you can use a Script Component, as @nitin-raj suggested, to handle all known source columns. There is a good post below on how it can be done.
Dynamic File Connections
If you have many such files that can have varying columns, then it is better to create a custom component. However, you cannot have dynamic metadata even with a custom component; the set of columns must be known to SSIS upfront.
If the list of columns keeps changing and you cannot know the expected columns in advance, then you are better off handling the entire thing in C#/VB.NET using a Script Task in the control flow.
As a best practice, because SSIS metadata is static, any data quality and formatting issues in source files should be corrected before the SSIS data flow task runs.
I have seen this situation before and there is a very simple fix. At the beginning of your SSIS package, use a File System Task to create a copy of the source Excel file, and then run a C# script or a PowerShell script to rename the columns: if Column_1 does not exist, it is either added at the appropriate position in the Excel file or, if the column name is wrong, corrected.
As a result, you will not need to refresh your SSIS metadata every time it fails. This is a standard data-standardization practice.
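The renaming step could look like this, sketched in Python on rows already read from the workbook (reading and writing the .xlsx itself is left out, and the parameter names canonical and aliases are hypothetical). Note that it also reorders columns into the canonical order and drops any column not listed there.

```python
from typing import Dict, List


def standardize_headers(rows: List[List[str]], canonical: List[str],
                        aliases: Dict[str, str]) -> List[List[str]]:
    """Return rows whose header matches `canonical` exactly.

    rows[0] is the header row. A column whose name appears in `aliases`
    (wrong name -> canonical name) is renamed; any canonical column that is
    still missing is added as an empty column at its canonical position.
    """
    header = [aliases.get(name, name) for name in rows[0]]
    out_rows = [list(canonical)]
    for row in rows[1:]:
        by_name = dict(zip(header, row))
        out_rows.append([by_name.get(name, "") for name in canonical])
    return out_rows
```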
The easiest way is to add two data flow tasks, one for each Excel source SELECT statement, and use precedence constraints to execute the second data flow when the first one fails.
The disadvantage of this approach is that if the first data flow task fails for another reason, it will also try to execute the second one. You will need some advanced error handling to check whether the error was caused by missing columns.
But if I had a similar situation, I would use a Script Task to check whether the column exists and build the SQL command dynamically. Note that this SQL command must always return the same metadata (you must use aliases).
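A minimal sketch of building that command, assuming the Script Task already has the sheet's column names and writes the result into a variable that the Excel source reads through its "SQL command from variable" access mode. The alias SourceValue and the sheet name Sheet1$ are hypothetical; the point is that either branch exposes the same output column, so the metadata stays static.

```python
from typing import Iterable


def build_excel_query(available_columns: Iterable[str], sheet: str = "Sheet1$") -> str:
    """Pick Column_1 if present, else Column_2, and alias it so the Excel
    source always sees the same output column name."""
    cols = set(available_columns)
    for candidate in ("Column_1", "Column_2"):
        if candidate in cols:
            return f"SELECT [{candidate}] AS [SourceValue] FROM [{sheet}]"
    raise ValueError("Neither Column_1 nor Column_2 exists in the sheet")
```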
Helpful links
Overview of SSIS Precedence Constraints
Working with Precedence Constraints in SQL Server Integration Services
Precedence Constraints

Generated Excel from SSIS but getting a quote in every column?

I have successfully generated an Excel file from an SSIS package.
But every column has an extra ' (single quote) mark. Why is that?
My source SQL table looks like this:
Name price address
ashu 123 pune
jkl 34 UK
In my SQL table, all columns are varchar(50).
When the Excel connection manager creates the table, the Excel destination uses the same varchar(50) data type for all columns.
In the data flow, I use a Data Conversion transformation to prevent a Unicode conversion error.
Please advise what I need to change to get clean columns in the Excel file.
You could create a template Excel file in which you have specified all the column types (changed from General to Text) and headers you will need. Store it in a /Template directory and copy it to where you need it from within the SSIS package.
In your SSIS package:
Use a Script Task (or a File System Task) to copy the Excel template file into the directory of your choice.
Programmatically change its name and store the full file path in a variable that will be used by your corresponding Data Flow Task.
Use Expression Builder for your Excel Connection Manager. Set the ExcelFilePath to be retrieved from your variable.
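The copy step could be sketched as follows (Python for illustration; inside SSIS this would be the Script Task or File System Task). The Export_ prefix and the timestamp naming scheme are assumptions; the returned path is what you would assign to the variable bound to ExcelFilePath.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_template(template: Path, output_dir: Path, prefix: str = "Export") -> Path:
    """Copy the pre-formatted template workbook to a new, timestamped file and
    return its full path (the value the package variable would hold)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = output_dir / f"{prefix}_{stamp}{template.suffix}"
    shutil.copyfile(template, target)
    return target
```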
The single quote or apostrophe is a way of entering any data in Excel and ensuring it is treated as text, so that numbers with leading zeros or fractions are not interpreted by Excel as numbers or dates.
A NJ zip code such as 07456, for instance, would be interpreted as 7456, but entering it as '07456 keeps its leading zero. (Note that the numbers in your example are left-aligned, as text is.)
I guess SSIS is adding the quotes because your data is of VARCHAR type.
First, define the field types for your Excel destination in SSIS; any non-text fields will then be formatted properly without the '. Then add a Derived Column transformation between your source and destination, and use a REPLACE expression for any text columns.
For example:
REPLACE(Column1, "'", "")
This caused me major problems! So I did the following:
You can change the excel version to 'Microsoft Excel 4.0' within the excel connection manager in your SSIS package.
Then within excel follow Options > Trust Center > Trust Center Settings > File Block Settings > Untick the 'Open' checkbox for 'Excel 4 workbooks' and 'sheets'.
This is a particular problem when using the Excel destination, at least with older versions of SSIS. To answer the why question, the Microsoft documentation says:
The following behaviors of the Jet provider that is included with the Excel driver can lead to unexpected results when saving data to an Excel destination.
Saving text data. When the Excel driver saves text data values to an Excel destination, the driver precedes the text in each cell with the single quote character (') to ensure that the saved values will be interpreted as text values. If you have or develop other applications that read or process the saved data, you may need to include special handling for the single quote character that precedes each text value.
Taken from https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms137643(v=sql.105)
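For the "special handling" the documentation mentions, a minimal reader-side sketch strips exactly one leading apostrophe from text values. It assumes the apostrophe actually arrives as part of the value in whatever reads the file, and it will also strip an apostrophe that was genuinely the first character of your data.

```python
from typing import Optional


def strip_text_marker(value: Optional[str]) -> Optional[str]:
    """Remove a single leading apostrophe (the text marker described in the
    documentation); non-strings and other values are returned unchanged."""
    if isinstance(value, str) and value.startswith("'"):
        return value[1:]
    return value
```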

Number of values in load file is not equal to number of columns

I deleted some data by mistake and I want to restore it.
I tried to execute the following command:
LOAD FROM 'C:\db\rqrequesttrans.dat' delimiter '~' insert into rqrequesttrans
but I get the following error:
-- [Informix][Dynamic Server plus Universal Data Option][arch] SQL Error (-847) : Error in load file line 220.
-- [Informix][Dynamic Server plus Universal Data Option][arch] SQL Error (-846) : Number of values in load file is not equal to number of columns.
How can I fix this problem?
There's a different number of data columns in the file than in the schema of the table for row 220. Make sure row 220 has the right number of delimiters for your table.
I don't know how many rows are in your data table, but check that the file is complete.
Make sure you have no leading/trailing empty lines.
Informix complained about line 1 having the wrong number of columns and it was fixed by removing the last empty line.
Old thread, but I still found it and hopefully this helps the next person.
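A quick way to locate such lines before running LOAD is to count delimiters per line. This Python sketch assumes the file was written by UNLOAD, so each row ends with a trailing delimiter (N columns, N delimiters) and a backslash escapes an embedded delimiter; records containing escaped newlines, which span several physical lines, are not handled.

```python
from typing import List, Tuple


def count_unescaped(line: str, delimiter: str = "~") -> int:
    """Count delimiters that are not preceded by a backslash escape."""
    count = 0
    escaped = False
    for ch in line:
        if escaped:
            escaped = False      # this character was escaped; ignore it
        elif ch == "\\":
            escaped = True
        elif ch == delimiter:
            count += 1
    return count


def check_load_file(lines: List[str], column_count: int,
                    delimiter: str = "~") -> List[Tuple[int, str]]:
    """Report (line_number, problem) for blank lines and for rows whose
    delimiter count differs from column_count (trailing-delimiter format)."""
    problems = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((line_no, "blank line"))
            continue
        found = count_unescaped(line, delimiter)
        if found != column_count:
            problems.append((line_no, f"{found} delimiters, expected {column_count}"))
    return problems
```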

Reading values from a text file and updating the corresponding fields in a SQL table

I have a text file with data like:
Patient name: Patient1 Medical rec #: A1Admit date: 04/26/2009 Discharge date: 04/26/2009
DRG: 982 and so on.
The text file contains several records in the format shown above; each field is separated by a colon.
I have to read this file, extract the values, and update the corresponding fields in my SQL table (say, the DRG value 982 has to be written to the drg column of the SQL table).
Please help me do this with a SQL query or an SSIS package.
If I had this task, I would use SSIS:
Create two connections: a flat file connection (for the text file) and a SQL Server connection.
Use a Lookup transformation to look up the value from the text file for each record in the database table.
Use an Execute SQL Task to update the records with the looked-up values.
You MIGHT try doing this by means of BULK INSERT.
Create a temp table to hold the new values
BULK INSERT the file into said table (**)
[optionally do some data-enrichment/cleaning here]
merge the information from the temp-table into the actual table
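The temp-table pattern can be sketched with Python's sqlite3 module standing in for SQL Server: executemany stands in for BULK INSERT and, because SQLite has no MERGE, a correlated UPDATE does the update half of it. The patients table and its medical_rec and drg columns are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def merge_drg(conn: sqlite3.Connection, new_values: Iterable[Tuple[str, str]]) -> None:
    """Load (medical_rec, drg) pairs into a temp table, then update the real
    table from it. Table and column names are hypothetical."""
    conn.execute("CREATE TEMP TABLE staging_drg (medical_rec TEXT PRIMARY KEY, drg TEXT)")
    # Stands in for BULK INSERT of the parsed file.
    conn.executemany("INSERT INTO staging_drg VALUES (?, ?)", new_values)
    # SQLite has no MERGE; this correlated UPDATE covers existing rows only.
    conn.execute(
        """
        UPDATE patients
        SET drg = (SELECT s.drg FROM staging_drg AS s
                   WHERE s.medical_rec = patients.medical_rec)
        WHERE medical_rec IN (SELECT medical_rec FROM staging_drg)
        """
    )
    conn.execute("DROP TABLE staging_drg")
    conn.commit()
```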
The only problem with this MIGHT be that:
the server cannot access the file directly (e.g. when the file is on a network share)
the file is in a format that can't be handled by BULK INSERT
Given the example data above, you might need to load the data into one big column and then split it into different columns by means of creative SQL (PATINDEX, SUBSTRING, the works...). You might try using the colon as a field separator, but you'll still end up with data that needs (quite a bit of) cleaning.
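Splitting the one big column can be done by searching for the known labels. Here is a Python sketch with a regular expression, assuming the label names are exactly those in the sample and that each record starts with "Patient name:"; in T-SQL the same idea would be PATINDEX/SUBSTRING on each label.

```python
import re
from typing import Dict, List

# Labels taken from the sample data; adjust if the real file has others.
LABELS = ["Patient name", "Medical rec #", "Admit date", "Discharge date", "DRG"]
_LABEL_ALT = "|".join(re.escape(label) for label in LABELS)
# Capture "label: value", where the value runs lazily up to the next label or the end.
FIELD_RE = re.compile(rf"({_LABEL_ALT}):\s*(.*?)\s*(?=(?:{_LABEL_ALT}):|$)", re.DOTALL)


def parse_records(text: str) -> List[Dict[str, str]]:
    """Return one dict per record; a new record starts at each 'Patient name'."""
    records: List[Dict[str, str]] = []
    for label, value in FIELD_RE.findall(text):
        if label == "Patient name":
            records.append({})
        if records:
            records[-1][label] = value
    return records
```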