How to import a flat file source to a database using SQL

I currently want to import my data from a flat file into the database.
The flat file is a .txt file in which I save a list of URLs. Example:
http://www.mimi.com/Hotels-g303188-Rurrenabaque-Hotels.html
I'm using the SQL Server Import and Export Wizard to do it, but at execution time it fails with this error:
Error 0xc02020a1:
Data Flow Task 1: Data conversion failed. The data conversion for column
"Column 0" returned status value 4 and status text "Text was truncated or one
or more characters had no match in the target code page.".
Can anyone help?

You get this error because the text is too long for the column you've chosen to put it in.

Text was truncated or
You might want to check the size of the database column vis-à-vis your input data. Is the longest URL within the column width?
one or more characters had no match in the target code page.".
Check if your input file has any special characters. An easy way to check this would be to save your file in ANSI (Notepad > Save As > Encoding = ANSI). Note - you'd still have to select the right code page so that the import interprets your input text correctly.
Here's a very nice link that has some background on what code pages are - http://www.joelonsoftware.com/articles/Unicode.html

Note that you can also change the target column data type (to text stream, for example) in the Data source -> Advanced section.
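If you're not sure how long your longest URL actually is, one way to check is to import into a wide staging column first and measure. A minimal T-SQL sketch; the table and column names here are hypothetical:

CREATE TABLE dbo.UrlStaging (Url VARCHAR(MAX));  -- wide enough that nothing truncates

-- after importing the flat file into dbo.UrlStaging:
SELECT MAX(LEN(Url)) AS LongestUrl
FROM dbo.UrlStaging;

Compare LongestUrl against the width of your real destination column before re-running the import.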

Related

How to clean a txt file having different delimiters using an SSIS package?

I have a text file that uses ^ (caret) and , (comma) as delimiters, and after cleaning it I need to load it into SQL. I have tried my best to clean the source file, but it still is not cleaned as expected. The screenshots I attached showed my attempted correction of the source file and the still-uncleaned result.
You have a variety of issues here.
You have identified the header row delimiter as a comma. A row delimiter is the, usually invisible, delimiter that indicates a row's worth of data has ended. Traditionally this is an operating-system-specific value: a carriage return (CR), a line feed (LF), or a carriage return/line feed pair (CR/LF).
Your source data is not a comma-delimited file with caret/circumflex/cap text delimiters. You have a comma-space delimited file, which SSIS doesn't support in the editor. However, you can hand-edit the dtsx file, as I outlined in "How to read a flatfile with lowercase thorn as the delimiter", to specify that it should use comma-space: ColumnDelimiter="_x002C__x0020_"
Given a truncated version of your source data
ListCode, CAS, Name
^216^, ^^, ^Coal Dust^
^216^, ^7782-24-5^, ^Graphite (Natural)^
^216^, ^^, ^Inert or Nuisance Dust^
and with the comma (0x2C) space (0x20) edited into the raw dtsx connection manager, I was able to pull the data as I believe you are expecting.
You might also run into additional issues given your selection of code pages and not checking the Unicode button, but that's beyond my ability to test since I can't generate matching source data from an image.
Just replace the ^, ^ with ^,^.
It looks like your source is:
CAS, SubName, ListCode, Type, CountryCode, ListName
^1000413-72-8^,^fasiglifam^,^447^,^Chemical Inventory^,^EU^,^ECICS Custom Tariff Codes^
^1000413-72-8^,^fasiglifam^,^0^,^^,^NN^,^SPHERA Global Substance List^
Then edit your connection manager with the details shown in this screenshot: https://i.stack.imgur.com/0x89k.png
It will work.
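If you'd rather fix the file content than the connection manager, here is a rough T-SQL sketch of the "replace ^, ^ with ^,^" idea, assuming each raw line is first bulk-loaded into a one-column staging table (all object names are hypothetical):

CREATE TABLE dbo.RawLines (Line NVARCHAR(MAX));

-- after bulk-inserting the file into dbo.RawLines, one line per row:
UPDATE dbo.RawLines
SET Line = REPLACE(Line, '^, ^', '^,^');

The cleaned lines can then be re-exported or split on the plain comma delimiter.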

SQL Server 2012 Bulk Insert with carriage returns in text fields?

I've seen variations of this question all over the place yet can't seem to get this to work. I need to be able to bulk insert data from a flat file where some of the text fields will contain carriage returns.
I have set the flat file up to use the caret ^ symbol as the text qualifier. The row delimiter is a vertical pipe and the column delimiter is a tab. Why does the import still fail when my text field has a carriage return in it?
I was under the impression that if the row/column delimiter was NOT a CR/LF then a delimited text field could contain a CR/LF (or single CR or single LF). How can I get the import to work? Thanks.
PS - the way I've been testing is to take a table, export it to a flat file with the delimiters set as above, insert a newline in a text field, and then try to import the data again, using the SQL Server Import and Export Wizard in both directions. Here is the error message I see:
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data conversion for column "Column 23" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
Error 0xc020902a: Data Flow Task 1: The "Source - IVREJECTHD_txt.Outputs[Flat File Source Output].Columns[Column 23]" failed because truncation occurred, and the truncation row disposition on "Source - IVREJECTHD_txt.Outputs[Flat File Source Output].Columns[Column 23]" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
Error 0xc0202092: Data Flow Task 1: An error occurred while processing file "C:\Users\bbauer\Desktop\IVREJECTHD.txt" on data row 2.
Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - IVREJECTHD_txt returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Bulk Insert can import embedded CR/LF pairs in text fields. Something else is going on with the raw data in your source at the specified column (23) on the second row. There are a number of causes for the "text was truncated" error. Some of them are touched on in this thread. One common cause which particularly bites those using the Wizard is not specifying the target column width. It doesn't matter if your target table is set up correctly; if the column width specified in the import isn't big enough, you'll get this error.
You might consider performing a bulk insert using T-SQL and a format file; if you need to repeatedly test your import process and refine it, it's a lot easier to make modifications and re-run.
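For illustration, a minimal sketch of that T-SQL approach, reusing the file path from the error messages; the target table and the format file path are hypothetical, and the format file would describe the tab/pipe layout:

BULK INSERT dbo.IVREJECTHD
FROM 'C:\Users\bbauer\Desktop\IVREJECTHD.txt'
WITH (
    FORMATFILE = 'C:\Users\bbauer\Desktop\IVREJECTHD.fmt',  -- hypothetical format file
    CODEPAGE = 'ACP',  -- match your source code page
    TABLOCK
);

Adjusting the format file and re-running this statement is much faster than re-walking the wizard each time.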
Also, as noted in this answer, the embedded CR/LFs will be present even if the tools (e.g. Management Studio) aren't displaying them to you.

Cannot upload CSV that starts with an integer

I'm stuck with what seems like a weird BigQuery bug: I cannot upload a CSV file that starts (first line, first column) with an integer.
Here's my schema: COL1:INTEGER,COL2:INTEGER,COL3:STRING
Here's my CSV file content:
100,4,XXX
100,4,XXX
If I put the STRING column as the first column, the upload is OK.
If I add a header and tell BigQuery to skip it during the import, the upload is OK too.
But with the CSV and schema above, BigQuery always complains: Line:1 / Field:1, Value cannot be converted to expected type.
Does anyone know what the problem is?
Thank you in advance,
David
I could not reproduce this problem--I copied and pasted the content into a file and uploaded it with no problems.
Perhaps the uploaded file format is corrupted somehow? If there are extra bytes at the beginning of the file (a UTF-8 byte-order mark, for instance), those would be ignored in a header row but might cause this error if the first value of the first field is expected to be an integer. I'd recommend examining the actual binary data in the file to make sure there's nothing funny going on.
Also, are you doing this import via web UI, command-line tool, or API? Have you tried one of the other methods?

Pentaho Spoon - Validate Fixed Width Input File Format

I'm trying to process a fixed-width input file in Pentaho and validate its format. The file will be a mixture of strings, numbers and dates. However, when processing a number field that has an incorrect character present (which I had expected would throw an error), it just reads the first part of the number and ignores the bad character.
I can recreate this issue with a very simple input file containing a single field. I specify the expected number format, along with the start position and length. On running the transformation, I would have expected the 'Q' to cause an error; instead, it just reads the first two digits "67" and pads the rest to match the specified format.
If the input file is formatted correctly it runs perfectly well, but I need it to throw an error otherwise. Any suggestions would be awesome. Thanks!
Just an FYI in case someone stumbles across this question after hitting the same issue as me.
I was able to construct a workaround by reading all values in the "Text File Input" step as strings, and then using a "Data Validator" step equipped with regex evaluation to ensure numbers were correctly formatted before parsing to a number type with a following "Select Values" step.
It takes a bit longer to do this for every field, but it was the most robust solution I could come up with.
Thanks

How to force scheme.ini to be used for MS Text Driver?

I am creating this huge CSV import that uses the MS Text Driver to read the CSV file.
I am using ColdFusion to create the scheme.ini in each folder where the file has been uploaded.
Here is a sample one I am using:
[some_filename.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=0
Col1=user_id Text width 80
Col2=first_name Text width 20
Col3=last_name Text width 30
Col4=rights Text width 10
Col5=assign_training Text width 1
CharacterSet=ANSI
Then in my ColdFusion code, I am doing two cfdumps:
<cfdump var="#GetMetaData( csvfile )#" />
<cfdump var="#csvfile#">
The metadata shows that the query has not grabbed the correct data types for reading the CSV file.
The dump of the query that reads the file shows that it is missing values, because Excel will not let us force fields to use double quotes, and when fields have mixed data types our process fails.
How can I either change the data types inside the query, i.e., make it use scheme.ini, or update the metadata to the correct data types?
I am using a view on information_schema in SQL Server 2005 to get the correct data types, column names, and maximum lengths.
Unless I have some kind of syntax error, I can't see why it's not grabbing the data as the correct data types.
Any suggestions?
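For reference, the information_schema lookup I mean is along these lines (just a sketch; the table name is a placeholder):

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'some_table'
ORDER BY ORDINAL_POSITION;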
Funnily enough, I had the filename spelled wrong: instead of schema.ini I was naming it scheme.ini.
I hate it when you make little mistakes like this...
Thank You