SSIS Bulk Insert where fields contain commas?

My bulk insert in SSIS is failing when a field contains a comma character. My flat file source is tab delimited and there are many instances in which a text field will contain commas. For example, a UserComment may have a comma. This causes the bulk insert to fail.
How can I tell SSIS to ignore the commas? I thought it would happen automatically since the row delimiter is {CR}{LF} and the column delimiter is "Tab". Why does it bark at the comma? Also please note that I am NOT currently using a format file.
Thanks in advance.
UPDATE:
Here is the error I get in SSIS:
Error: 0xC002F304 at Bulk Insert Task, Bulk Insert Task: An error occurred with the following error message: "Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 183, column 5 (EmailAddress).Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 182, column 5 (EmailAddress).Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 181, column 5 (EmailAddress).".
Task failed: Bulk Insert Task
It seems to fail on record 131988, which is why I think the problem is the "something,something" email value with no space between the addresses. Many records before 131988 come across fine.
131988 01 MEMPHIS, TN someone#somewhere.com
131988 02 NORTH LITTLE ROCK, AR someone#somewhere.com,someone1#somewhere1.com
131988 03 HOUSTON, TX someone#somewhere.com,someone1#somewhere1.com

I doubt the comma or the # sign is being called an "invalid character".
I see there are two tabs in the input record just before the field that contains the email addresses, so the email address column would be the fifth column. But when the error message refers to "column 5" it's presumably using zero-based indexing, which would make the email column only index 4. Is there another tab and another column after it? Maybe the invalid character is there.
I suspect there is an invisible bad character embedded in whichever column is causing the error. I often pick up bad characters when cutting and pasting from email address lines, so that's a likely suspect.
Run the failing line by itself to make sure it still fails.
Then copy it into, say, Notepad, and do a "Save As" with the Encoding set to ANSI. (It may complain at that point if there's a bad character.) Use the "Save As" file as the new import file. At this point you should be able to be reasonably confident that "what you see is what you get", and that there are no invisible characters embedded in the import file.
If this turns out to be the problem, you'll need some way to verify that future import files are clean, or else handle them somehow during the import process.
(I presume you've checked the destination column length is okay. That would definitely be a showstopper.)
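For what it's worth, a quick way to double-check the destination column definitions in SQL Server is something like the following (the table name is a placeholder for whatever the Bulk Insert Task targets):

SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourImportTarget'   -- placeholder: the bulk insert destination table
ORDER BY ORDINAL_POSITION;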

"Type mismatch or invalid character for the specified codepage" is a misleading error message. The source table's field length exceeded the destination table's specified length and thus the error. After adjusting lengths, everything worked properly.

Related

ERROR: extra data after last expected column on PostgreSQL while the number of columns is the same

I am new to PostgreSQL and I need to import a set of csv files, but some of them weren't imported successfully. I got the same error with these files: ERROR: extra data after last expected column. I have investigated this error and learned that it can occur when the number of columns in the table doesn't match the number of columns in the file. But I don't think that is my situation.
For example, I create this table:
CREATE TABLE cast_info (
    id integer NOT NULL PRIMARY KEY,
    person_id integer NOT NULL,
    movie_id integer NOT NULL,
    person_role_id integer,
    note character varying,
    nr_order integer,
    role_id integer NOT NULL
);
And then I want to copy the csv file:
COPY cast_info FROM '/private/tmp/cast_info.csv' WITH CSV HEADER;
Then I got the error:
ERROR: extra data after last expected column
CONTEXT: COPY cast_info, line 8801: "612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions..."
The complete row in this csv file is as follows:
612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions\" - \"Chicken Huntin\" - \"Another love song\" - \"How many times?\" - \"Bowling balls\" - \"The people\" - \"Piggy pie\" - \"Hokus pokus\" - \"Let\"s go all the way\" - \"Real underground baby\")/Full Clip (segments \"Duk da fuk down\" - \"Real underground baby\")/Guy Gorfey (segment \"Raw deal\")/Sugar Bear (segment \"Real underground baby\")",2,1
You can see that there are exactly 7 columns, the same number the table has.
The strange thing is that the error lines in all of these files contain a backslash followed by a quotation mark (\"). However, these are not the only rows in the files that contain \", and I wonder why the error doesn't appear on the other rows. Because of that, I am not sure this is the problem.
After modifying these rows (e.g. replacing the \" or deleting the content while keeping the commas), I get a new error, ERROR: invalid input syntax, on line 2 of every file. These errors occur because three semicolons (;;;) have been appended to the data in the last column of those rows for no reason. But when I open these csv files, I can't see the three semicolons in those rows.
For example, after deleting the content in the fifth column of this row:
612,207,2222077,1,,2,1
I got the error:
ERROR: invalid input syntax for type integer: "1;;;"
CONTEXT: COPY cast_info, line 2, column role_id: "1;;;"
Yet line 2 doesn't actually contain three semicolons, as shown here:
2,2,2163857,1,,25,1
Ideally, I would like the problem to be solved without any modification to the data itself. Thank you for your patience and help!
The CSV format protects quotation marks by doubling them, not by backslashing them. You could use the text format instead, except that that doesn't support HEADER, and also it would then not remove the outer quote marks. You could instead tweak the files on the fly with a program:
COPY cast_info FROM PROGRAM 'sed s/\\\\/\"/g /private/tmp/cast_info.csv' WITH CSV;
This works with the one example you gave, but might not work for all cases.
"ERROR: invalid input syntax on line 2 of every file. These errors occur because three semicolons (;;;) have been appended to the data in the last column of those rows for no reason. But when I open these csv files, I can't see the three semicolons in those rows."
How are you editing and viewing these files? Sounds like you are using something that isn't very good at preserving formatting, like Excel.
Try actually naming the columns you want processed in the copy statement:
copy cast_info (id, person_id, movie_id, person_role_id, note, nr_order, role_id) from ...
Following a friend's suggestion, I needed to specify the backslash as the escape character:
copy <table_name> from '<csv_file_path>' csv escape '\';
and the problem was solved.
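Applied to the table and file from the question, that suggestion would look something like this (a sketch only; it assumes the files really do use a backslash before every embedded quote):

COPY cast_info FROM '/private/tmp/cast_info.csv' WITH CSV HEADER ESCAPE '\';

It may be worth spot-checking a few imported rows afterwards, since changing the escape character affects how every quoted field is parsed.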

Line contains invalid enclosed character data or delimiter at position

I was trying to load data from a csv file into Oracle SQL Developer, and when inserting the data I encountered an error which says:
Line contains invalid enclosed character data or delimiter at position
I am not sure how to tackle this problem!
For Example:
INSERT INTO PROJECT_LIST (Project_Number, Name, Manager, Projects_M,
Project_Type, In_progress, at_deck, Start_Date, release_date, For_work, nbr,
List, Expenses) VALUES ('5770','"Program Cardinal
(Agile)','','','','','',to_date('', 'YYYY-MM-DD'),'','','','','');
The errors shown were:
--Insert failed for row 4
--Line contains invalid enclosed character data or delimiter at position 79.
--Row 4
I've had success converting the csv file to Excel via "Save As", changing the format to .xlsx, and then loading the .xlsx version in SQL Developer. I think the conversion forces some of the bad formatting out. It worked on at least my last 2 files.
I fixed it by using the CONCATENATE function in my CSV file first and then uploading it, which worked.
My guess is that it doesn't like to_date('', 'YYYY-MM-DD'); there is no date to format. Is that an actual value in your data?
But it could also possibly be the double quote in "Program Cardinal (Agile). Though I don't see why that would get picked up as an invalid character.
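As a rough illustration of the to_date point, the row should load if the empty date is passed as NULL instead. A trimmed-down, hypothetical version of the failing insert (the real table may require more of the columns):

INSERT INTO PROJECT_LIST (Project_Number, Name, Start_Date)
VALUES ('5770', 'Program Cardinal (Agile)', NULL);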

SQL Server 2012 Bulk Insert with carriage returns in text fields?

I've seen variations of this question all over the place yet can't seem to get this to work. I need to be able to bulk insert data from a flat file where some of the text fields will contain carriage returns.
I have set the flat file up to be delimited by the caret ^ symbol. The Row delimiter is a vertical pipe and the column delimiter is a tab. Why does the import still fail when my text field has a carriage return in it?
I was under the impression that if the row/column delimiter was NOT a CR/LF then a delimited text field could contain a CR/LF (or single CR or single LF). How can I get the import to work? Thanks.
PS - the way I've been testing is to just take a table, export it to a flat file with delimiters set as above, insert a newline in a text field, then try to import the data again using the SQL Server Import Export Wizard in both directions. Here is the error message I see:
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data conversion for column "Column 23" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
Error 0xc020902a: Data Flow Task 1: The "Source - IVREJECTHD_txt.Outputs[Flat File Source Output].Columns[Column 23]" failed because truncation occurred, and the truncation row disposition on "Source - IVREJECTHD_txt.Outputs[Flat File Source Output].Columns[Column 23]" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
Error 0xc0202092: Data Flow Task 1: An error occurred while processing file "C:\Users\bbauer\Desktop\IVREJECTHD.txt" on data row 2.
Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - IVREJECTHD_txt returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
Bulk Insert can import embedded CR/LF pairs in text fields. Something else is going on with the raw data in your source at the specified column (23) on the second row. There are a number of causes for the "text was truncated" error. Some of them are touched on in this thread. One common cause which particularly bites those using the Wizard is not specifying the target column width. It doesn't matter if your target table is set up correctly; if the column width specified in the import isn't big enough, you'll get this error.
You might consider performing a bulk insert using T-SQL and a format file; if you need to repeatedly test your import process and refine it, it's a lot easier to make modifications and re-run.
Also, as noted in this answer, the embedded CR/LFs will be present even if the tools (e.g. Management Studio) aren't displaying them to you.
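For example, a T-SQL bulk insert along these lines keeps the custom terminators explicit and is easy to tweak and re-run (a sketch, assuming a hypothetical destination table and the tab/pipe delimiters described above; a format file could be supplied via the FORMATFILE option instead):

BULK INSERT dbo.IVREJECTHD                      -- placeholder destination table
FROM 'C:\Users\bbauer\Desktop\IVREJECTHD.txt'
WITH (
    FIELDTERMINATOR = '\t',                     -- column delimiter: tab
    ROWTERMINATOR = '|',                        -- row delimiter: pipe, so embedded CR/LFs stay inside fields
    TABLOCK
);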

SQL Error: Cannot be converted to a PACKED DECIMAL value

I have a db2 import statement which reads from a file and writes to a database.
The data type for column 18 (where I am getting the error) is DECIMAL(18,2).
The value coming in the file for that column is -502.47.
However, I am getting the below error:
SQL3123W The field value in row "1" and column "18" cannot be converted to a PACKED DECIMAL value. A null was loaded.
And the value is not loaded into the database.
What is the reason for this error? What is the solution?
There was an issue with the number of columns: I was passing more columns than the program expected. You can get the above error in that case as well.
It was because of double quotes in the loaded CSV file at the particular cell mentioned in the error.
Try opening the file in Notepad++ or any other text editor, removing the double quotes, saving, and loading the file back into the DB.
Your error should be resolved.
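For reference, a db2 IMPORT of a delimited file typically looks something like the sketch below (placeholder names; the METHOD P position list and the target column list must have the same number of columns, which is exactly where the "more columns than the program expected" problem from the first answer shows up):

IMPORT FROM /tmp/input.del OF DEL
  METHOD P (1, 2, 18)                                -- positions of the file columns to load
  INSERT INTO myschema.mytable (id, name, amount);   -- must match the METHOD P list in count and order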

Pentaho Spoon - Validate Fixed Width Input File Format

I'm trying to process a fixed width input file in Pentaho and validate the format. The file will be a mixture of strings, numbers and dates. However, when attempting to process a number field that has an incorrect character present (which I had expected would throw an error), it just reads the first part of the number and ignores the bad character.
I can recreate this issue with a very simple input file containing a single field. I specify the expected number format, along with the start position and length. On running the transformation I would have expected the stray 'Q' in the value to cause an error; instead it just reads the first two digits, "67", and pads the rest to match the specified format.
If the input file is formatted correctly it runs perfectly well, but I need it to throw an error otherwise. Any suggestions would be awesome. Thanks!
Just an FYI in case someone stumbles across this question after hitting the same issue as myself.
I was able to construct a workaround by reading all values in the "Text File Input" step as strings, and then using a "Data Validator" step equipped with regex evaluation to ensure numbers were correctly formatted before parsing them to a number type with a following "Select Values" step.
It takes a bit longer to do this for every field, but it was the most robust solution I could come up with.
Thanks
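For anyone trying the same approach, the regular expression in the Data Validator step can be something like the pattern below (an assumption based on the workaround above, for an optionally signed decimal number; a value containing a stray character such as 'Q' would then fail validation):

^-?[0-9]+(\.[0-9]+)?$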