SSIS 2005 - How to Import a Fixed Width Flat File? - sql

I have a flat file that looks something like this:
junk I don't care about \n
\n
columns names\n
val1 val2 val3\n
val1 val2 val3\n
columns names \n
val1 val2 val3\n
I only care about the lines with values. These value lines are all in fixed-width format and have the same line length. The junk lines and column-name lines can have any width.
When I try the flat file fixed width option or the ragged right option the preview looks all wrong. Any ideas what the easiest way to get this into SSIS is?

You cannot use the fixed width option and I seem to recall that the ragged right option only applies if the raggedness is in the entire last column.
You can use the ragged right option and read the entire thing into a string column and then use derived columns.
Alternatively, pre-process the file (possibly in SSIS, using a ragged-right with a conditional split, outputting to a flat file) to filter out the lines you are going to ignore and then you can use the flat file connection manager on the resulting file.
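If pre-processing outside SSIS is acceptable, the filtering step could look roughly like the Python sketch below. It is only an illustration under assumptions: the 14-character line length, the three 4-character fields, and the names value_rows and looks_like_value are hypothetical, and a header line that happens to have the same length as a value line would need the extra looks_like_value check.

```python
# Hypothetical layout: three 4-character fields separated by one space.
LINE_LENGTH = 14
FIELD_SLICES = [(0, 4), (5, 9), (10, 14)]


def value_rows(lines, line_length=LINE_LENGTH, slices=FIELD_SLICES,
               looks_like_value=lambda line: True):
    """Yield a list of field strings for each line that has the expected length."""
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop only the line terminator, keep padding
        if len(line) != line_length or not looks_like_value(line):
            continue  # junk, blank, or column-name line
        yield [line[start:end] for start, end in slices]


sample = [
    "junk I don't care about\n",
    "\n",
    "columns names\n",
    "val1 val2 val3\n",
    "val1 val2 val3\n",
]
rows = list(value_rows(sample))
```

The surviving rows can then be written to a clean file that the fixed-width flat file connection manager reads directly.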
Another option is to hand-code a script component that acts as the data source.
It would be nice if you could use more complex files by being able to define new connection manager layouts on the outputs of other data flows, but that is not currently available in SSIS.
This is basically the same problem I posed in this question: How to process ragged right text files with many suppressed columns in SSIS or other tool?

Try this after manually removing the junk at the top:
Set up the connection with the fixed width option.
Add the columns manually on the Advanced tab. Here you need to add 3 columns, each of length 4.
If that works, you can then use a script task to read the flat file and remove the junk before the data flow task runs.

Related

Text was truncated or one or more characters had no match in the target code page including the primary key in an unpivot

I'm trying to import a flat file into an OLE DB destination in a SQL Server database.
Here's the field that's giving me trouble:
Here are the properties of that flat file connection, specifically the field:
Here's the error message:
[Source - 18942979103_txt [424]] Error: Data conversion failed. The
data conversion for column "recipient-name" returned status value 4
and status text "Text was truncated or one or more characters had no
match in the target code page.".
What am I doing wrong?
Here is what fixed the problem for me. I did not have to convert to Excel. Just modified the DataType when choosing the data source to "text stream" (Figure 1). You can also check the "Edit Mappings" dialog to verify the change to the size (Figure 2).
Figure 1
Figure 2
After failing by increasing the length or even changing to data type text, I solved this by creating an XLSX file and importing. It accurately detected the data type instead of setting all columns as varchar(50). Turns out nvarchar(255) for that column would have done it too.
I solved this problem by ORDERING my source data (xls, csv, whatever) so that the longest text values are at the top of the file. Excel is great for this: use the LEN() function on your challenging column, then sort by that length with the longest value at the top of your dataset. Save. Try the import again.
SQL Server may be able to suggest the right data type for you (even when it does not choose the right type by default) - clicking the "Suggest Types" button (shown in your screenshot above) allows you to have SQL Server scan the source and suggest a data type for the field that's throwing an error. In my case, choosing to scan 20000 rows to generate the suggestions, and using the resulting suggested data type, fixed the issue.
While the approach proposed above by @chookoos (convert to an Excel workbook and import) resolves these kinds of issues, the solution in another Q&A is excellent because you can stay with your csv, tsv, or txt file and perform the necessary fine-tuning without creating a Microsoft-product-specific solution.
I resolved it by checking the 'Unicode' checkbox.
You need to increase the column length for that particular column while importing the data.
Choose a Data Source >> Advanced >> increase the column width from the default 50 to 200 or more.
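Rather than guessing a width, you can measure the longest value in each column before running the import. The following is a minimal Python sketch, assuming a comma-delimited file with a header row; max_column_lengths is a hypothetical helper name and the order-id column is made up for the example. It counts characters, so it helps with truncation but not with a genuine code page mismatch.

```python
import csv
import io


def max_column_lengths(text_file, delimiter=","):
    """Return {column name: longest value length} for a delimited file with a header row."""
    reader = csv.DictReader(text_file, delimiter=delimiter)
    longest = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in longest:
            value = row.get(name) or ""  # short rows yield None for missing fields
            longest[name] = max(longest[name], len(value))
    return longest


# Small in-memory sample; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "order-id,recipient-name\n"
    "1,Al\n"
    "2," + "x" * 73 + "\n"
)
widths = max_column_lengths(sample)
```

Any column whose measured length exceeds 50 is a candidate for a wider OutputColumnWidth on the Advanced page.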
Not really a technical solution, but SQL Server 2017 flat file import is totally revamped, and imported my large-ish file with 5 clicks, handled encoding / field length issues without any input from me
The SQL Server Management Studio data import looks only at the first few rows to determine the source data specs.
Shift your records around so that the longest text is at the top.
None of the above worked for me. I SOLVED my problem by saving my source data (save as) Excel file as a single xls Worksheet Excel 5.0/95 and imported without column headings. Also, I created the table in advance and mapped manually instead of letting SQL create the table.
I had a similar problem between 2 different databases (DB2 and SQL Server), and I finally solved it by using CAST in the source query from DB2. Using a query also let me convert the source column to varchar and trim the useless blank spaces:
CAST(RTRIM(LTRIM(COLUMN_NAME)) AS VARCHAR(60) CCSID UNICODE
FOR SBCS DATA) COLUMN_NAME
The important issue here is the CCSID conversion.
This usually happens because the column width in the connection manager is still the default 50 characters. I resolved the problem by going to Connection Manager --> Advanced and changing it to 100, or even 1000 if the data is that long.

SSIS flat file export adding extra characters during CRLF

I am working in Windows using SQL Server 2008 R2 and VS 2008.
I haven't been able to find any other instance of this happening via Google, but I'm having an issue with SSIS not recognizing the CRLF code in my SQL query. The problem is twofold:
1) In Notepad, the flat file does not come out in columns. It is just one long string of text (although this resolves in Notepad++).
2) When viewed in Notepad++, the first row of data is indented by two characters and each subsequent row is indented even further!
Basically this file will be unreadable at the other end.
Here's an example how I'm currently approaching it:
Select col1, col2, col3, char(13)+char(10) CRLF
Which produces data like this:
Col1 Col2 Col3 CRLF
xxxx xxxx xxxx
xxxx xxxx xxxx
xxxx xxxx xxxx
Other things I have tried include:
Using declare @crlf (returns the same results)
Using only char(13) or only char(10) (returns the same results)
Using Col3+char(13)+char(10) (returns results in single line)
I think I'm missing just a small piece of the puzzle here, but I can't figure out what that piece would be. Your help is much appreciated.
Throwing in some requested screenshots here:
You can see here where the extra characters are starting to sneak in.
On the Advanced tab of the Flat File Connection Manager, the InputColumnWidth might not be set correctly. I'm guessing the problem is the last column, the one containing the CRLF: it should be 2 characters long.
I use the exact same dev stack you list, and I don't include the CRLF in the SQL query, I only use the row delimiter in the SSIS output connection.
In the SSIS package, edit the output connection. It displays the Flat File Connection Manager. On the "Columns" page (not quite a tab; pick Columns from the list on the left side) there is a "Row Delimiter" setting, and I specify my CRLF there.
There is also a "Header Row Delimiter" on the "General" tab, but that only applies to the header row.
Unless there is a reason you are trying to embed a line break in the middle of a query row?
EDIT: Some more troubleshooting questions ...
1) Are you writing your file to a network drive or a local drive? Try setting to a local drive in case any automatic mapping is going on.
2) What is your data source? I usually use an OLEDB source, but if you are having trouble, maybe try a flat file input source and see if it can mimic a simple input to a simple output.
3) How are you getting your file to look at it? Are you logged on to the server and using Notepad there? If not, try that to see if the problem happens when you are getting the file to look at.
4) Are there any special characters in the data that might interfere? Try a query that returns a few constants.
EDIT 2: I saw your comment, I'll switch one of mine to fixed width and get back to you shortly - did you check to see if you made the width too short and it's clipping the termination characters?
EDIT 3:
I have to go for tonight, I'll look at this more tomorrow and get back to you, and clean my messy and confusing post up. I made a package that I tried to match yours as closely as I could but I started with a copy of an existing one instead of a fresh start and it got stuck in a half-baked state. I'll make a fresh one from scratch tomorrow.
BTW, Are all of your rows the same width? If not, have you tried Ragged Right instead of Fixed Width?
EDIT 4: Adding more ...
Over the weekend I continued to play with this and noticed that you can get SSIS to add the row delimiter for you. When you first create the Flat File Destination and edit it, you get the choice to create a new flat file connection manager, and one of the options is to add a column with CRLF. Unfortunately, this has the annoying side effect of always including a heading of "Row Delimiter Column" if you include column names in the output file. You can get around it by specifying a header row instead of building it from field names, but appending the CRLF in your SQL statement is probably a lot less work than that.
And for anyone else continuing to play with this, using a delimited flat file but forcing the fields to fixed length in a data transform (Derived Column) or in the SQL query also worked, but was more complicated. Within the Derived Column transform I replaced my input column (Nums) with SUBSTRING(Nums + REPLICATE(" ",4),1,4) where 4 is the field width. To do the same thing in the SQL query I used CONVERT(CHAR(4), Nums) as Nums.
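The pad-then-cut idea behind SUBSTRING(Nums + REPLICATE(" ",4),1,4) can be sketched in Python as follows. This is an illustration only: fixed_width and fixed_width_row are hypothetical helper names, and the CRLF is appended once per row as the row delimiter rather than carried as a data column.

```python
def fixed_width(value, width):
    """Pad with spaces on the right, then cut to exactly `width` characters."""
    return (str(value) + " " * width)[:width]


def fixed_width_row(values, widths, row_delimiter="\r\n"):
    """Concatenate the padded fields and append the row delimiter once."""
    fields = (fixed_width(value, width) for value, width in zip(values, widths))
    return "".join(fields) + row_delimiter


# A short value is padded, a long value is truncated, an empty value becomes spaces.
line = fixed_width_row(["12", "abcdef", ""], [4, 4, 4])
```

Because every row has the same length, including the two delimiter characters, a fixed-width reader that counts the delimiter correctly will not drift from row to row.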

SSIS flat file with string larger than 50

By default, SSIS sets the data type of every flat file column to string with length 50. What if the string in a certain column is longer than 50, and I also can't use Suggest Types (it sucks!)?
Is there a way to fix this other than manually editing the column lengths/data types on the flat file connection manager's Advanced tab? Ideally the data types would be set based on the data types of the mapped destination (SQL Server) columns.
You can set data types in the Advanced section of the flat file connection manager.
I've heard good things about BIDS Helper, but haven't used it myself.
I haven't found a way to change the default length, or to stop it from resetting when the connection manager is changed. I was pleased to find that you can select all columns at once in the advanced editor and change them simultaneously; that's something...
The best way I found was to write C# code that modifies the SSIS package XML file and increases the string length values based on the column lengths of the destination table (obtained with an INFORMATION_SCHEMA query).

Writing on HDFS messed the data

I was trying to save the output of a Hive query to HDFS, but the data got changed. Any idea why?
See the original data and the changed data below. Remove the space before the file name :)
[[Correct]: i.stack.imgur.com/ DLNTT.png
[[Messed up]: i.stack.imgur.com/ 7WIO3.png
Any feedback would be appreciated.
Thanks in advance.
It looks like you are importing an array into Hive, which is one of the available complex types. Internally, Hive separates the elements in an array with the ASCII character 002. If you consult an ASCII table, you can see that this is the non-printable character "start of text". It looks like your terminal does actually print the non-printable character, and by comparing the two images you can see that 002 does indeed separate every item of your array.
Similarly, Hive will separate every column in a row with ASCII 001, and it will separate map keys/values and structure fields/values with ASCII 003. These values were chosen because they are unlikely to show up in your data. If you want to change this, you can manually specify delimiters using ROW FORMAT in your create table statement. Be careful though: if you switch the collection items terminator to something like a comma (,), then any commas in your input will look like collection terminators to Hive.
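As a rough illustration of those default terminators, the Python sketch below splits one raw line of a text-format table into columns, array items, and map entries. The helper names and the sample line are hypothetical, and the sketch ignores escaping and NULL markers.

```python
FIELD_SEP = "\x01"       # ASCII 001 between columns
COLLECTION_SEP = "\x02"  # ASCII 002 between array items (and between map entries)
MAP_KEY_SEP = "\x03"     # ASCII 003 between a map key and its value


def split_row(line):
    """Split one raw line into its column strings."""
    return line.rstrip("\n").split(FIELD_SEP)


def split_array(field):
    """Split an array column into its items."""
    return field.split(COLLECTION_SEP) if field else []


def split_map(field):
    """Split a map column into a dict of key -> value."""
    if not field:
        return {}
    return dict(item.split(MAP_KEY_SEP, 1) for item in field.split(COLLECTION_SEP))


# Hypothetical row: an int column, an array column, and a map column.
raw = "42\x01a\x02b\x02c\x01k1\x03v1\x02k2\x03v2\n"
columns = split_row(raw)
items = split_array(columns[1])
mapping = split_map(columns[2])
```

If the array column was declared as a plain string, the whole "a\x02b\x02c" value is returned untouched, which is what shows up as stray control characters in a terminal.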
Unless you need to store the data in human-readable form and are sure there is a printable character that will not collide with your terminators, I would leave them as is. If you need to read the HDFS files you can always run hadoop fs -cat /exampleWarehouseDir/exampleTable/* | tr '\002' '\t' to display array items separated by tabs. If you write a MapReduce or Pig job against the Hive tables, just be aware of what your delimiters are. Learning how to write and read Hive tables from MapReduce was how I learned about these terminators in the first place. And if you are doing all of your processing in Hive, you shouldn't ever have to worry about what the terminators are (unless they show up in your input data).
Now this would explain why you would see ASCII 002 popping up if you were reading the file contents off of HDFS, but it looks like you are seeing it from the Hive Command Line Interface which should be aware of the collection terminators (and therefore use them to separate elements of the array instead of printing them). My best guess there is you have specified the schema wrong and the column of the table results is a string where you meant to make it an array. This would explain why it went ahead and printed the ASCII 002's instead of using them as collection terminators.

Outputting long fixed-width records to a flat file using SSIS

I have to build an SSIS package for work that takes the contents of a table, all columns, and outputs it to a flat file. The problem is, one of the columns is a varchar(5100) and that puts the total row size at about 5200 characters. It seems the flat file connection manager editor won't let me define a fixed-width row beyond 483 characters.
I've tried going at this from several directions. The ragged right option doesn't appear to work for me, as the columns themselves don't have delimiters in them (no CR/LF for instance). They truly are fixed width. But I can't figure out how to tell the flat file connection manager to go past 483 characters. Does anyone know what I can do here?
Thanks!
Personally, I would use a delimited text file as my destination. SSIS will put in the column delimiter and the record delimiters for you. These files are also simpler to import, which the people you are sending them to should appreciate. We use | as the delimiter for most of our exports and CR/LF as the record delimiter.
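For comparison, the same pipe-delimited, CR/LF-terminated layout can be sketched outside SSIS with Python's csv module. This is only an illustration of the file format; write_pipe_delimited and the sample rows are hypothetical.

```python
import csv
import io


def write_pipe_delimited(rows, out):
    """Write rows with | between columns and CRLF after each record."""
    writer = csv.writer(out, delimiter="|", lineterminator="\r\n",
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)


# In-memory target for the example; a real file should be opened with newline=""
# so the CRLF written here is not translated again by the platform.
buffer = io.StringIO(newline="")
write_pipe_delimited([[1, "short", "x" * 10], [2, "has|pipe", ""]], buffer)
output = buffer.getvalue()
```

Values that contain the delimiter are quoted, so a long varchar column needs no fixed width and the receiving side can split on | safely.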