Help Importing CSV file with Variable Columns per Row into SQL Table using Import tool or SSIS - sql

I am stuck with a CSV file with over 100,000 rows that contains product images from a provider. Here are the details of the issue, I would really appreciate some tips to help resolve this. Thanks.
The File has 1 Row per product and the following 4 columns.
ID,URL,HEIGHT,WIDTH
example: 1,http://i.img.com,100,200
Problem starts when a product has multiple images.
Instead of having 1 row per image the file has more columns in same row.
example:
1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50
Note that only first image has "http://" remaining images start with "//"
There is no telling how many images per product hence no way to tell how many total columns per row or max columns.
How can I import this using SSIS or sql import wizard.
Also I need to do this on regular intervals.
Thank you for your help.

I don't think that you can use any standard SSIS task or wizard to do this. You're going to have to write some custom code which parses each line. You can do this in SSIS using VB code or you can import the file into a staging table that's just a single column to hold each row and do the parsing in SQL. SSIS will probably be faster for this kind of operation.
Another possibility is to preprocess the file using regex or a search-and-replace command. Try to get double-quotes around the image list then you should be able to import the whole file fine, with the quoted part going into a single column. Catching the start of the string should be easy enough given the "http:\" for which you can search. Determining where the end quote goes might be more of a problem.
A third potential solution would be to get the source to fix the data. Even if you can't get the images in separate rows (or another file with separate rows, which would be ideal), maybe you can get the double-quotes added from the source as part of the export. This would likely be less error-prone than using the search-and-replace method.
Good luck!

Related

How to combine two rows into one row with respective column values

I've a csv file which contains data with new lines for a single row i.e. one row data comes in two lines and I want to insert the new lines data into respective columns. I've loaded the data into sql but now I want to replace the second row data into 1st row with respective column values.
output details:
I wouldn't recommend fixing this in SQL because this is an issue with the CSV file. The issue is that file contains new lines, which causes rows split.
I strongly encourage fixing CSV files, if possible. It's going to be difficult to fix that in SQL given there are going to be more cases like that.
If you're doing the import with SSIS (or if you have the option of doing it with SSIS if you are not currently), the package can be configured to manage embedded carriage returns.
Define your file import connection manager with the columns you're expecting.
In the connection manager's Properties window, set the AlwaysCheckForRowDelimiters property to False. The default value is True.
By setting the property to False, SSIS will ignore mid-row carriage return/line feeds and will parse your data into the required number of columns.
Credit to Martin Smith for helping me out when I had a very similar problem some time ago.

Import Unformatted txt file into SQL

I am having an issue importing data into SQL from a text file. Not because I don't know how...but because the formatting is pretty much terrible for this purpose. Below is an altered sample of the types of text files I need to work with:
1 VA - P
2 VB to 1X P
3 VC to 1Y P
4 N - P
5 G to 1G,Frame P
6 Fout to 1G,Frame P
7 Open Breaker P
8 1B to 1X P
9 1C to 1Y P
Test Status: Pass
Hi-Pot # 1500V: Pass
Customer Order:904177-F
Number: G4901626-200
Serial Number: J245F6-2D03856
Catalog #: CBDC37-X5LE30-H40-L630C-4GJ-G31
Operator: TGY
Date: Aug 01, 2013
Start Time: 04:09:26
Finish Time: 04:09:33
The first 9 lines are all specific test results (tab separated), with header information below. My issue is that I need to figure out:
How can I take the data above and turn it into something broken down into a standard column format to import into SQL?
How can I then automate this such that I can loop through an entire folder structure?
-What you see above is one of hundreds of files divided into several sub-directories.
Also note that the # of test lines above the header information vary from file to file. The header information remains in much the same format though. This is all legacy data that cannot be regenerated, but needs to be imported into our SQL databases.
I am thinking of using an SSIS project with a custom script to import the data...splicing the top section from the bottom by looking for the first empty row...then pivot the data in the header into column format...merge...then move on. But I don't write much VB and I'm not sure how to approach that.
I am working in a SQL Server 2008R2 environment with access to BIDS.
Thoughts?
I would start by importing the data as all character into a table with a single field, one record per line. Then, from that table, you can parse each record into the fields and field types appropriate for each line. Hopefully there is a way to figure out what kind of data each line is, whether each file is consistant in order, or the header record indicates information about subsequent lines. From that, the data can be moved to a final (parsing may take more than one pass) table with the data stored in a format that is useable for whatever you need it.
I would first concentrate on getting the data into the database in the least complicated (and least error prone) way possible. Create a table with three columns: filename, line_number and line_data. Plop all of your files into that table and then you can start to think about how to interpret the data. I would probably be looking to use PIVOT, but if different files can have different numbers of fields it may introduce complications.
I would use a different approach and use SSDT/SSIS package to import the data.
Add a script component to read in the text file and convert it to XML. Not hard there many examples on the web. In your script Store the XML you develop into a variable.
Add a data flow
Add an XML Source. In the XML source you can select the XML variable you created and process either group of data present in your file. Here is some information on using the XML Source.
Add destination task to import it to a destination of your choice
This solution assumes your input lines are terminated {CR}{LF}, the normal Windows way.
Tell MSSQL's Import/Export Wizard to import a Flat File; the Format is "Delimited"; the "Text Qualifier" is the {CR}; the "Header Row Delimiter" is the {LF}; and the OutputColumnWidth (in "Advanced") is a bit more than the longest possible line length.
It's simple and it works.
I just used this to import 23 million lines of mixed up data, and it took less than ten minutes. Now to edit it...

SQL Server 2008 - TSQL Read CSV file

I am working on a project that basically entails on importing a CSV file into a SQL Server 2008 R2 database. The CSV file is generated from an Excel file that is populated by a "manager" with PR hours for his employees. This also includes some additional information such as which job and phase the employees were working on and also includes the number of hours for an equipment (if used).
Once you generate a CSV file for that, it's not exactly the usual straighforward "column" based CSV file. It's more like a "row" based CSV file with each row being kind of unique. Due to this caveat involved, I cannot do a straight dump (using BULK insert or OPENROWSET) to SQL, which would essential create a (temp) table with the appropriate column filled data.
I am looking to use the fields within the CSV file based on the "location" of that field in the row.
So, basically the positions of the data will remain the same, since every CSV is based on a TEMPLATE file - so all I have to do is navigate through the CSV file using SQL code to find the right field based on it's position in the ROW. I hope that gives you guys a better understanding of what I am trying to achieve here. Sorry for the long wall of text.
I researched a bit and here's what I have come up with so far:
Reads CSV files into a temp table through a custom SQL function (Reading lines from a file)
https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
This one is interesting. Dumps the whole file as a BLOB and then you can sift through the data.
http://www.mssqltips.com/sqlservertip/1643/using-openrowset-to-read-large-files-into-sql-server/
Finally, this one essential splits out the rows and creates separates records per row. Interesting..
http://ask.sqlservercentral.com/questions/17408/how-to-read-a-text-file.html
If anyone has any suggestions or steps that I could follow to get through this, I would greatly appreciate it.
To the Mods: If I have posted something (especially the links) that shouldn't be here, please feel free to remove it. I apologize if I did.
Thanks much.. Hope to hear some positive responses! :)
Warm Regards,
Pranav
If the file is not too large, another option is to post-process the file in Excel using a VBA macro. Of course, you'd need to come up to speed using the Excel object model and VBA, but the recording function makes it fairly simple. One advantage of the VBA approach is that it seems you really do want to do row by row processing, and VBA is better for that, whereas SQL is better for set-based operations.

Get list of columns of source flat file in SSIS

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.
What we have currently is an SSIS package to import columns that have been defined. Since we've assigned the mapping, SSIS only throws up an error when a column is absent. However when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not named. This is a concern for us.
What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.
I am relatively new to SSIS, so a detailed help would be much appreciated.
Thanks!
Exactly how to code this will depend on the rules for the flat file layout, but I would approach this by writing a script task that reads the flat file using the file system object and a StreamReader object, and looks at the columns, which are hopefully named in the first line of the file.
However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accomodate new columns. It will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it, and save yourself the trouble of pre-reading the column list.
I agree with the answer provided by #TabAlleman. SSIS can't natively handle dynamic columns (and niether can your SQL destination).
May I propose an alternative? You can detect a change in headers without using a C# Script Tasks. One way to do this would be to create a flafile connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row. Save that row to a RecordSet object. Any change? Send Email.
The "Get Header Row" DataFlow would look like this. Row Number if needed.
The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..
Above, the precedent constraints (fx icons ) of
[#ExpectedHeader] == [#CurrentHeader]
[#ExpectedHeader] != [#CurrentHeader]
determine whether you load data or send email.
Hope this helps!
i have worked for banking clients. And for banks to randomly add columns to a db is not possible due to fed requirements and rules. That said I get your not fed regulated bizz. So here are some steps
This is not a code issue but more of soft skills and working with other teams(yours and your vendors).
Steps you can take are:
(1) reach a solid columns structure that you always require. Because for newer columns older data rows will carry NULL.
(2) if a new column is going to be sent by the vendor. You or your team needs to make the DDL/DML changes to the table were data will be inserted. Ofcouse of correct data type.
(3) document this change in data dictanary as over time you or another member will do analysis on this data and would like to know what is the use of each attribute or column.
(4) long-term you do not wish to keep changing table structure monthly because one of your many vendors decided to change the style the send you data. Some clients push back very aggresively other not so much.
If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.
SSIS cannot make the columns dynamic,
one thing, i always do, is use a script task to read the first and last lines of a file.
if it is not an expected list of csv columns i mark file as errored and continue/fail as required.
Headers are obviously important, but so are footers. Files can through any unknown issue be partially built. Requesting the header be placed at the rear of the file it is a double check.
I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add/change order of columns and assume things will still work.
1-SSIS Does not provide dynamic source and destination mapping.But some third party component such as Data flow task plus , supporting this feature
2-We can achieve this using ssis script task.
3-If the Header is correct process further for migration else fail the package before DFT execute.
4-Read the line from the header using script task and store in array or list object
5-Then compare those array values to user defined variables declare earlier contained default value as column name.
6-If values are matching exactly then progress further else fail it.

How to create conditional destinations in SSIS Data Flow?

Goal: I have a Db source. Depending on a variable, i need to store it into a fixed width file OR a delimited file.
How do I do this in a data flow? I tried creating a conditional split, with two conditions. One condition going to a fixed width destination, and one to a delimited condition. Problem is that conditional split executed BOTH conditions even if no data comes in one condition. Becuase filename is same, so it errors out.
I would keep your solution with the folowing tweeks.
Write out to two Filename-fixed.txt and filename-delim.txt. Before those steps add row count tasks.
Then in your control flow you have two Success paths. Edit the success paths to look for both success and expression. Add an expression that checks the count from the new row count tasks in your dataflow. If you have file system tasks as your end point have them rename your fixed or delim file to the correct file name.
Note: I didn't try this and the pics all have red x's because I find it helpful to have the picture to figure out the logic not because I actually coded the solution.
Use two different data flows and do the diversion from with in the control flow. If you want to do it within the data-flow itself I guess you will have to use different filenames.