I've created a query to display the result, and it's working properly. Now I need to compare the returned records with a target file to verify the data was loaded correctly. The problem is that when I copy the Teradata query results displayed in the Answerset window, the fields are tab-delimited, but the target file is delimited by a vertical bar character, '|'.
I've encountered a similar problem in the past when verifying target files that use a fixed-length-column scheme, and I was wondering if there is an efficient solution to my problem.
In Teradata SQL Assistant, go to Tools -> Options, then Export/Import; you can choose your delimiter there and switch it to "|".
I've been given some csv files that I want to turn into tables in a SQL database. However, the genius who created the files used comma delimiters, even though several data fields contain commas. So when I try to BCP the data into the database, I get a whole bunch of errors.
Is there a way that I can escape the commas that aren't field separators? At the moment I'm tempted to write a script to manually replace every comma in each file with a pipe, and then go through and manually change the affected rows back.
The only way to fix this is to write a script or program that fixes the data.
If the bad data is limited to a single field, the process should be trivial:
You consume the row from either side, counting off the known good delimiters and replacing them with a new, unique delimiter; whatever remains in the middle is the column with the extra old delimiters, which you just leave as is.
If you have two bad fields straddling good fields, you need more advanced logic. For instance, I once had XML data containing delimiters; I had to parse the XML until I found a terminating tag and then process the remaining delimiters as needed.
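For the single-bad-field case, here's a minimal Python sketch of that consume-from-both-sides idea (the column count, bad-field position, and delimiters are assumptions for illustration):

```python
def fix_row(line, total_fields, bad_index, old_delim=",", new_delim="|"):
    """Repair a row whose field at bad_index may contain extra old_delim
    characters, assuming every other field is delimiter-free."""
    parts = line.rstrip("\r\n").split(old_delim)
    left = parts[:bad_index]                # good fields from the left
    n_right = total_fields - bad_index - 1  # good fields from the right
    right = parts[len(parts) - n_right:] if n_right else []
    # Whatever remains in the middle is the bad field, extra commas and all.
    middle = old_delim.join(parts[bad_index:len(parts) - n_right])
    return new_delim.join(left + [middle] + right)

# 4 columns; the free-text field at index 2 may contain commas.
print(fix_row("a,b,hello, world,d", total_fields=4, bad_index=2))
# -> a|b|hello, world|d
```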
I am working in Windows using SQL Server 2008 R2 and VS 2008.
I haven't been able to find any other instance of this happening via Google, but I'm having an issue with SSIS not recognizing the CRLF code in my SQL query. The problem is twofold:
In Notepad, the flat file does not come out in columns. It is just one long string of text (although this resolves in Notepad++).
When viewed in Notepad++, the first row of data is indented by two characters, and each subsequent row is indented even further!
Basically this file will be unreadable at the other end.
Here's an example of how I'm currently approaching it:
SELECT col1, col2, col3, CHAR(13) + CHAR(10) AS CRLF
Which produces data like this:
Col1 Col2 Col3 CRLF
xxxx xxxx xxxx
xxxx xxxx xxxx
xxxx xxxx xxxx
Other things I have tried include:
Using DECLARE @crlf (returns the same results)
Using only CHAR(13) or only CHAR(10) (returns the same results)
Using Col3 + CHAR(13) + CHAR(10) (returns results in a single line)
I think I'm missing just a small piece of the puzzle here, but I can't figure out what that piece would be. Your help is much appreciated.
Throwing in some requested screenshots here:
You can see here where the extra characters are starting to sneak in.
On the Advanced tab of the Flat File Connection Manager, the InputColumnWidth might not be set correctly. I'm guessing the last column, containing the CRLF, should be 2 characters long.
I use the exact same dev stack you list, but I don't include the CRLF in the SQL query; I only use the row delimiter in the SSIS output connection.
In the SSIS package, edit the output connection; this displays the Flat File Connection Manager. In the "Columns" tab (well, not quite a tab, but pick Columns from the list on the left side) there is a "Row Delimiter" setting, and I specify my CRLF there.
There is also a "Header Row Delimiter" on the "General" tab, but that only applies to the header row.
Unless there is a reason you are trying to embed a line break in the middle of a query row?
EDIT: Some more troubleshooting questions ...
1) Are you writing your file to a network drive or a local drive? Try setting it to a local drive in case any automatic mapping is going on.
2) What is your data source? I usually use an OLEDB source, but if you are having trouble, maybe try a flat file input source and see if it can mimic a simple input to a simple output.
3) How are you getting the file to look at it? Are you logged on to the server and using Notepad there? If not, try that to see whether the problem happens during whatever transfer you use to retrieve the file.
4) Are there any special characters in the data that might interfere? Try a query that returns a few constants.
EDIT 2: I saw your comment; I'll switch one of mine to fixed width and get back to you shortly. Did you check whether you made the width too short, so that it's clipping the termination characters?
EDIT 3:
I have to go for tonight; I'll look at this more tomorrow, get back to you, and clean up my messy and confusing post. I made a package that I tried to match to yours as closely as I could, but I started with a copy of an existing one instead of a fresh start, and it got stuck in a half-baked state. I'll make a fresh one from scratch tomorrow.
BTW, are all of your rows the same width? If not, have you tried Ragged Right instead of Fixed Width?
EDIT 4: Adding more ...
Over the weekend I continued to play with this and noticed that you can get SSIS to add the row delimiter for you. When you first create the Flat File Destination and edit it, you get the choice to create a new flat file connection manager, and one of the options is to add a column with CRLF. Unfortunately, this has the annoying side effect of always including a heading of "Row Delimiter Column" if you include column names in the output file. You can get around it by specifying a header row instead of building it from field names, but appending the CRLF in your SQL statement is probably a lot less work than that.
And for anyone else continuing to play with this, using a delimited flat file but forcing the fields to fixed length in a data transform (Derived Column) or in the SQL query also worked, but was more complicated. Within the Derived Column transform I replaced my input column (Nums) with SUBSTRING(Nums + REPLICATE(" ",4),1,4) where 4 is the field width. To do the same thing in the SQL query I used CONVERT(CHAR(4), Nums) as Nums.
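For anyone who finds the Derived Column expression cryptic, the pad-then-clip logic it implements looks like this in Python (the width of 4 matches the example above; the function name is mine):

```python
def fix_width(value: str, width: int = 4) -> str:
    """Pad with spaces, then clip to the field width: the same effect as
    SUBSTRING(Nums + REPLICATE(" ", 4), 1, 4) in the Derived Column."""
    return (value + " " * width)[:width]

print(repr(fix_width("ab")))      # 'ab  ' (padded)
print(repr(fix_width("abcdef")))  # 'abcd' (clipped)
```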
I have a text file that is split using commas.
Simple enough to handle in SSIS, but I have the following row in my source flat file:
Desc,Curr,Desc,ID,Quantity
05969A105 ,CU,BANCORP INC, THE DEL COMMON ,1,2126
There is a comma in my Desc column, and I'm not sure how I can ignore that comma.
AFAIK, you can't do anything in SSIS (or any other app that I have ever used) to handle this, because it is simply bad data. If you need to persist with comma delimiters, then you will need to get the data provider to use text qualifiers, e.g. double quotes, to wrap the data. SSIS can be told what the text qualifier is and will strip those characters off the data automatically.
Of course this may raise the issue of 'but the text may need to contain a double-quote!', in which case you would be better off getting the delimiter changed to something else, such as a tab or pipe.
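To illustrate why text qualifiers solve this, here's a minimal sketch using Python's csv module in place of SSIS (the file name is hypothetical; the row is the one from the question):

```python
import csv

row = ["05969A105 ", "CU", "BANCORP INC, THE DEL COMMON ", "1", "2126"]

# Writing with a text qualifier: any field containing the delimiter is
# wrapped in double quotes (an embedded double quote would be doubled).
with open("holdings.csv", "w", newline="") as f:
    csv.writer(f).writerow(row)

# Reading strips the qualifiers, so the embedded comma survives intact.
with open("holdings.csv", newline="") as f:
    print(next(csv.reader(f)))
# -> ['05969A105 ', 'CU', 'BANCORP INC, THE DEL COMMON ', '1', '2126']
```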
I'm generating a CSV which contains several rows and columns.
However, when I'm testing said CSV I feel like I am simply repeating, in the test, the code that builds the file, since I'm checking that each and every field is correct.
Question is, is this more sensible than it seems to me, or is there a better way?
A far simpler test is to just import the CSV into a spreadsheet or database and verify that the data is aligned to the proper fields: no extra columns or rows, and data selected from the imported recordset is a perfect INTERSECT with the recordset from which the CSV was generated, etc.
More importantly, I recommend making sure your test data includes common CSV fail scenarios such as:
Field contains a comma (or whatever your separator character is)
Field contains multiple commas (You might think it's the same thing, but I've seen one fail where the other succeeded)
Field contains the new-row character(s)
Field contains characters not in the code page of the CSV file
...to make sure your code is handling them properly.
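One cheap way to exercise those scenarios is a round-trip check: write the edge-case rows out and read them back. A minimal sketch with Python's csv module (the rows are made up to match the list above):

```python
import csv
import io

# Hypothetical rows covering the failure scenarios listed above.
edge_cases = [
    ["a", "field, with comma", "b"],
    ["a", "one, two, three, four", "multiple separators"],
    ["a", "embedded\nnew-row character", "b"],
    ["a", "outside the code page: naïve £ €", "b"],
]

# Round trip: write the rows out, read them back, compare field by field.
buf = io.StringIO()
csv.writer(buf).writerows(edge_cases)
buf.seek(0)
assert list(csv.reader(buf)) == edge_cases, "CSV round trip mangled a field"
print("all edge cases survived the round trip")
```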
I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl) in text form (default conversions to text for dates, numbers, etc. are fine), tab-delimited, with NULLs converted to empty fields (i.e. a NULL column would have nothing between the tab characters), with appropriate line termination (CRLF for Windows), and preferably with column headings.
This is the same export I can get in SQL Assistant 12.0 by choosing the export option, using a tab delimiter, setting my NULL value to '', and including column headings.
I have been unable to find the right combination of options; the closest I have gotten is by building a single column with CAST and '09'XC, but the rows still have a leading 2-byte length indicator in most settings I have tried. I would prefer not to have to build large strings for the various tables.
To eliminate the 2-byte length indicator in the FastExport output:
.EXPORT OUTFILE &dwoutfile MODE RECORD FORMAT TEXT;
Your SELECT must also generate fixed-length export fields, e.g. CHAR(n), so you will inflate the size of the file and end up with a delimited but fixed-length export file.
The other option, if you are in a UNIX/Linux environment, is to post-process the file and strip the leading two bytes, or to write an access module (AXSMOD) in C to do it as the records are streamed to the file.
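If post-processing is acceptable, stripping the prefix is a few lines in any scripting language. A minimal Python sketch (assumptions: each record is prefixed by an unsigned 2-byte record length, little-endian here, and the file names are hypothetical):

```python
import struct

with open("export.dat", "rb") as src, open("export.txt", "wb") as dst:
    while True:
        prefix = src.read(2)
        if len(prefix) < 2:
            break  # end of file
        # Assumed little-endian; use ">H" if your platform writes big-endian.
        (reclen,) = struct.unpack("<H", prefix)
        dst.write(src.read(reclen) + b"\r\n")  # re-terminate rows for Windows
```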