Leading zeros are not retained when reading a comma separated value (CSV) file with the Microsoft ACE OLE DB provider: if a column value is 0000123456, I get only 123456 when reading it programmatically in VB.NET. I don't want to apply any workaround while preparing the CSV file, e.g. putting an apostrophe before the zeros.
The data in the CSV file is structured like this:
Name,ID
John,0001234
moon,0001235
But I am getting the IDs as 1234 and 1235, and I want them with the leading zeros, using Microsoft ACE OLE DB.
Any ideas? Thanks in advance.
The Text File driver used by OleDb to read CSV files is unable to accurately determine the datatype of the columns. In your case, the second column is misinterpreted as a numeric column because it contains only digits, so the leading zeros are removed.
You can give OleDb a strong hint by creating a file called SCHEMA.INI that explains what the content of the file is.
In your case you could create one in the same folder as the file (I assume temp.csv for this example) and write these lines:
[temp.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=Name Text
Col2=ID Text
DecimalSymbol=.
Notice that I need to specify DecimalSymbol as a point because in my locale the comma is used as the separator between the integer and decimal parts of a number (which is why our CSV files are separated by semicolons).
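With the SCHEMA.INI in place, the file can be read as usual through the ACE provider. A minimal VB.NET sketch, assuming the CSV and the SCHEMA.INI both sit in C:\Data (the folder path here is illustrative):

Imports System
Imports System.Data.OleDb

Module ReadCsv
    Sub Main()
        ' The Data Source is the folder; the file name goes in the query.
        Dim connStr As String =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Data;" &
            "Extended Properties=""Text;HDR=Yes;FMT=Delimited"""
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            Using cmd As New OleDbCommand("SELECT * FROM [temp.csv]", conn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        ' ID is now read as Text, so "0001234" keeps its zeros.
                        Console.WriteLine("{0} -> {1}", reader("Name"), reader("ID"))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module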
I'm dealing with a CSV file that contains double quotes (since the data has commas in it). Google Sheets does not show me the double quotes, and hence I'm able to split the column by a delimiter (a space in this case) properly, to retrieve its first word. The initial column values can be seen in the picture below.
However, when I upload the CSV file in Data Prep, the column's double quotes are treated as part of the value, and that causes extra work (if it is possible at all) in handling the resulting split data.
Is there a way in Data Prep to have the data displayed as it is in Google Sheets?
You can "ignore" these double-quotes, and when running a job, untick the box of "include quotes", then these original quotes shall remain.
Not perfect, but could be a working workaround.
What is the purpose of adding a text qualifier to an SSIS flat file output?
I'm pulling data out of a SQL database that has quotes, commas, pipes, and many other common delimiters in the data.
Extreme example of a data point in a column:
"Johnson"|Smith,Jones
I set up the export as comma delimited, with a double quote (") text qualifier. I assumed it would export the data like so, and it did:
,""Johnson"|Smith,Jones",
Now I'm testing re-importing the data back in, as comma delimited with a double quote text qualifier, and I get errors saying SSIS couldn't find the delimiter. I thought it would recognize the combination of comma and double quote, essentially as a more complex delimiter.
If adding a text qualifier to the output doesn't help with the problem of having those characters in the actual data, what does it do?
Assuming the person receiving the data might use a tool like Excel to process it, which doesn't seem to be able to handle a complex multi-character delimiter like |", is the best way to handle this to strip the chosen delimiter character out of my data and use that as the delimiter? Probably pipe in my case, instead of comma.
A text qualifier is used in the event that delimiters are contained within a cell. Typically, the text qualifier is a double quote. If a cell contains a delimiter and a text qualifier is not used, the data after the delimiter will spill into the next column. From there, the data row can potentially blow up and none of the columns will line up afterwards. It can be a real mess.
Additionally, you will not see the text qualifier in applications, like Excel. However, if you open the file in Notepad++, then you will see the text qualifiers. There can be a lot of data (e.g., text qualifiers, new line characters, column delimiters, etc.) that is contained within a file but is not displayed in certain applications. This data typically is used to define the structure of the data as opposed to being the actual data.
For your problem, you will need to remove the double quotes from the source data or use a different text qualifier. You could use a single quote, but what if you have data like Jones's? The idea here is that the text qualifier should be unique in defining the data structure, which, as I understand it, means that you cannot have a text qualifier that is actually part of the data (see the note from Microsoft below - emphasis mine).
Per Microsoft:
Specify a text qualifier character. Each column can be configured to recognize a text qualifier.
The use of a qualifier character to embed a qualifier character into a qualified string is supported by the Flat File Connection Manager. The double instance of a text qualifier is interpreted as a literal, single instance of that string. For example, if the text qualifier is a single quote and the input data is 'abc', 'def', 'g''hi', the output data is abc, def, g'hi. However, an instance of a qualifier embedded in a qualified string causes the Flat File Source to fail with the error DTS_E_PRIMEOUTPUTFAILED.
References
Flat File Connection Manager official documentation
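To make the doubled-qualifier convention concrete, here is a minimal VB.NET sketch of how a field would need to be escaped before export so a qualifier-aware reader can round-trip it (the function name is illustrative, not an SSIS API):

Imports System

Module CsvQuoting
    ' Wrap a field in the text qualifier and double any embedded qualifiers.
    Function QualifyField(value As String) As String
        Return """" & value.Replace("""", """""") & """"
    End Function

    Sub Main()
        ' The problem value from above round-trips once the embedded quotes
        ' are doubled: "Johnson"|Smith,Jones is written to the file as
        ' """Johnson""|Smith,Jones"
        Console.WriteLine(QualifyField("""Johnson""|Smith,Jones"))
    End Sub
End Module

This is also why the export in the question failed on re-import: the embedded quotes were written out singly, so the Flat File Source hit an undoubled qualifier inside a qualified string.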
My office uses Excel to prepare our data before importing it into a SQL database. However, we have been experiencing the following problem.
When the data is imported from one computer it loses all of the leading zeros. However, when it is imported from a different computer it imports perfectly.
As an example of the leading zeros, our item numbers are required to be formatted as "001, 002, 003, ... 010, 011, 012, ... 100, 101, 102, etc.".
1) The excel file is stored on a server so there is no difference in the file.
2) If the users swap workstations the result stays with the computer, and doesn't switch with the user.
3) The data is formatted as text. It has been formatted as text both from the Data Tab and from Format Cells.
Is there a setting within Excel that is specific to the computer and not the spreadsheet which could affect exporting the data? Or is there a non-Excel-specific setting which could cause this?
It's best to avoid the 'TEXT' format option. Confusingly, it does not force the contents of a cell to be a text data type, and it wreaks havoc when a formula references a 'TEXT'-formatted cell.
To add to the previous answer (with all of the caveats about whether this is a good idea), you can use the TEXT worksheet function
=TEXT(A1,"000")
to guarantee an actual text string with leading zeros if needed.
Depending on the number of leading zeroes you require, you can select your data/column in Excel, go into Excel >> Format >> Custom, and type however many zeroes you require into the Type field (i.e. 000000000 for a 9-digit number with leading zeroes); it will automatically prefix the correct number of leading zeroes to make the numerical string the correct length (i.e. 4000 = 000004000).
Note, this only works with numerical data, not text, but depending on the scenario it may be more useful to retain your data in numerical format - the example you gave listed numerical data only, and often retaining the numerical format is a benefit for analysis.
Not sure what the benefit of padding data before inserting it into the database would be (takes more space, slower searching, etc.). Sounds like you're formatting it for output (?), which might be more efficiently done elsewhere.
But anyway, here are some ideas for your SELECT (SQL) statement:
RIGHT(1000 + [excel field], 3)
or another one would be
REPLICATE('0', 3 - LEN([excel field])) + [excel field]
Something you can do to the Excel field itself (before import) is prefix it with a ' (apostrophe). Notice if you type 0007 into Excel, it will change it to 7, but if you type '0007, it will keep the leading zeros.
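If the padding ends up being done in application code rather than in Excel or the SELECT, the same zero-padding is a single call in .NET. A minimal sketch (the variable name is illustrative):

' itemNumber is hypothetical; PadLeft pads but never truncates.
Dim itemNumber As Integer = 7
Dim padded As String = itemNumber.ToString().PadLeft(3, "0"c) ' "007"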
I have a text file that is split using commas
Simple enough to do in SSIS, but I have the following row in my source flat file:
Desc,Curr,Desc,ID,Quantity
05969A105 ,CU,BANCORP INC, THE DEL COMMON ,1,2126
There is a comma in my Desc column and I'm not sure how I can ignore that comma.
AFAIK, you can't do anything in SSIS (or any other app that I have ever used) to handle this, because it is simply bad data. If you need to persist with comma delimiters then you will need to get the data provider to use text delimiters, e.g. double quotes, to wrap the data. SSIS can be told what the text delimiter is and will strip these characters off the data automatically.
Of course this may raise the issue of 'but the text may need to contain a double-quote!', in which case you would be better off getting the delimiter changed to something else, such as a tab or pipe.
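For example, with double quotes as the text delimiter, the source row would need to arrive like this (a hypothetical corrected version of the row above):

05969A105 ,CU,"BANCORP INC, THE DEL COMMON ",1,2126

SSIS would then read the quoted field as a single Desc value, embedded comma and all.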
I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl) in text form (default conversions to text for dates, numbers, etc. are fine), tab-delimited, with NULLs converted to empty fields (i.e. a NULL column would have nothing between the tab characters), with appropriate line termination (CRLF for Windows), and preferably with column headings.
This is the same export I can get in SQL Assistant 12.0 by choosing the export option, using a tab delimiter, setting my NULL value to '' and including column headings.
I have been unable to find the right combination of options - the closest I have gotten is building a single column with CAST and '09'XC (the tab character), but the rows still have a leading 2-byte length indicator in most settings I have tried. I would prefer not to have to build large strings for the various different tables.
To eliminate the 2-byte length indicator in the FastExport output:
.EXPORT OUTFILE &dwoutfile MODE RECORD FORMAT TEXT;
Your SELECT must then generate fixed-length export fields, e.g. CHAR(n), so you will inflate the size of the file and end up with a delimited but fixed-length export file.
The other option, if you are in a UNIX/Linux environment, is to post-process the file and strip the leading two bytes, or to write an access module (AXSMOD) in C to do it as the records are streamed to the file.
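As an illustration of the post-processing idea, here is a minimal VB.NET sketch that strips a 2-byte length indicator from each record and writes the payload as a text line. The file names are hypothetical, and the layout assumed here (a 2-byte little-endian length followed by the record bytes) should be verified against the actual export format before relying on it:

Imports System.IO
Imports System.Text

Module StripLengthPrefix
    Sub Main()
        Using input As New BinaryReader(File.OpenRead("export.dat")),
              output As New StreamWriter("export.txt")
            While input.BaseStream.Position < input.BaseStream.Length
                ' Assumed layout: 2-byte little-endian length, then payload.
                Dim length As Integer = input.ReadUInt16()
                Dim payload As Byte() = input.ReadBytes(length)
                output.WriteLine(Encoding.ASCII.GetString(payload))
            End While
        End Using
    End Sub
End Module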