PSQL: Invalid input syntax for integer on COPY - sql

I have a .txt file with a plain list of words (one word on each line) that I want to copy into a table. The table was created in Rails with one row: t.string "word".
The same file loaded into another database/table worked fine, but in this case I get:
pg_fix_development=# COPY dictionaries FROM '/Users/user/Documents/en.txt' USING DELIMITERS ' ' WITH NULL as '\null';
ERROR: invalid input syntax for integer: "aa"
CONTEXT: COPY dictionaries, line 1, column id: "aa"
I did some googling on this but can't figure out how to fix it. I'm not well versed in SQL. Thanks for your help!

If you created that table in Rails then you almost certainly have two columns, not one. Rails will add an id serial column behind your back unless you tell it not to; this also explains your "input syntax for integer" error: COPY is trying to use the 'aa' string from your text file as a value for the id column.
You can tell COPY which column you're importing so that the default id values will be used:
copy dictionaries(word) from ....

Related

ERROR: extra data after last expected column on PostgreSQL while the number of columns is the same

I am new to PostgreSQL and I need to import a set of csv files, but some of them weren't imported successfully. I got the same error with these files: ERROR: extra data after last expected column. I have investigated this error report and learned that these errors occur might because the number of columns of the table is not equal to that in the file. But I don't think I am in this situation.
For example, I create this table:
CREATE TABLE cast_info (
id integer NOT NULL PRIMARY KEY,
person_id integer NOT NULL,
movie_id integer NOT NULL,
person_role_id integer,
note character varying,
nr_order integer,
role_id integer NOT NULL
);
And then I want to copy the csv file:
COPY cast_info FROM '/private/tmp/cast_info.csv' WITH CSV HEADER;
Then I got the error:
**ERROR: extra data after last expected column
CONTEXT: COPY cast_info, line 8801: "612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions..."**
The complete row in this csv file is as follows:
612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions\" - \"Chicken Huntin\" - \"Another love song\" - \"How many times?\" - \"Bowling balls\" - \"The people\" - \"Piggy pie\" - \"Hokus pokus\" - \"Let\"s go all the way\" - \"Real underground baby\")/Full Clip (segments \"Duk da fuk down\" - \"Real underground baby\")/Guy Gorfey (segment \"Raw deal\")/Sugar Bear (segment \"Real underground baby\")",2,1
You can see that there's exactly 7 columns as the table has.
The strange thing is, I found that the error lines of all these files contain the characters backslash and quotation mark (\"). Also, these rows are not the only row that contains \" in the files. I wonder why this error doesn't appear in other rows. Because of that, I am not sure if this is the problem.
After modifying these rows (e.g. replace the \" or delete the content while remaining the commas), there are new errors: ERROR: invalid input syntax for line 2 of every file. And the errors occur because the data in the last column of these rows have been added three semicolons(;;;) for no reason. But when I open these csv files, I can't see the three semicolons in those rows.
For example, after deleting the content in the fifth column of this row:
612,207,2222077,1,,2,1
I got the error:
**ERROR: invalid input syntax for type integer: "1;;;"
CONTEXT: COPY cast_info, line 2, column role_id: "1;;;"**
While the line 2 doesn't contain three semicolons, as follows:
2,2,2163857,1,,25,1
In principle, I hope the problem can be solved without any modification to the data itself. Thank you for your patience and help!
The CSV format protects quotation marks by doubling them, not by backslashing them. You could use the text format instead, except that that doesn't support HEADER, and also it would then not remove the outer quote marks. You could instead tweak the files on the fly with a program:
COPY cast_info FROM PROGRAM 'sed s/\\\\/\"/g /private/tmp/cast_info.csv' WITH CSV;
This works with the one example you gave, but might not work for all cases.
ERROR: invalid input syntax for line 2 of every file. And the errors
occur because the data in the last column of these rows have been
added three semicolons(;;;) for no reason. But when I open these csv
files, I can't see the three semicolons in those rows
How are you editing and viewing these files? Sounds like you are using something that isn't very good at preserving formatting, like Excel.
Try actually naming the columns you want processed in the copy statement:
copy cast_info (id, person_id, movie_id, person_role_id, note, nr_order, role_id) from ...
According to a friend's suggestion, I need to specify the backslashes as escape characters:
copy <table_name> from '<csv_file_path>' csv escape '\';
and then the problem is solved.

Line contains invalid enclosed character data or delimiter at position

I was trying to load the data from the csv file into the Oracle sql developer, when inserting the data I encountered the error which says:
Line contains invalid enclosed character data or delimiter at position
I am not sure how to tackle this problem!
For Example:
INSERT INTO PROJECT_LIST (Project_Number, Name, Manager, Projects_M,
Project_Type, In_progress, at_deck, Start_Date, release_date, For_work, nbr,
List, Expenses) VALUES ('5770','"Program Cardinal
(Agile)','','','','','',to_date('', 'YYYY-MM-DD'),'','','','','');
The Error shown were:
--Insert failed for row 4
--Line contains invalid enclosed character data or delimiter at position 79.
--Row 4
I've had success when I've converted the csv file to excel by "save as", then changing the format to .xlsx. I then load in SQL developer the .xlsx version. I think the conversion forces some of the bad formatting out. It worked at least on my last 2 files.
I fixed it by using the concatenate function in my CSV file first and then uploaded it on sql, which worked.
My guess is that it doesn't like to_date('', 'YYYY-MM-DD'). It's missing a date to format. Is that an actual input of your data?
But it could also possibly be the double quote in "Program Cardinal (Agile). Though I don't see why that would get picked up as an invalid character.

Why doesn't the PostgreSQL COPY command allow NULL values inside arrays?

I have the following table definition:
create table null_test (some_array character varying[]);
And the following SQL file containing data.
copy null_test from stdin;
{A,\N,B}
\.
When unnesting the data (with select unnest(some_array) from null_test), the second value is "N", when I am expecting NULL.
I have tried changing the data to look as follows (to use internal quotes on the array value):
copy null_test from stdin;
{"A",\N,"B"}
\.
The same non-null value "N" is inserted?
Why is this not working and is there a workaround for this?
EDIT
As per the accepted answer, the following worked. However, the two representation of NULL values within a COPY command depending on whether you're using single or array values is inconsistent.
copy null_test from stdin;
{"A",NULL,"B"}
\.
\N represents NULL as a whole value to COPY, not as part of another value and \N isn't anything special to PostgreSQL itself. Inside an array, the \N is just \N and COPY just passes the array literal to the database rather than trying to interpret it using COPY's rules.
You simply need to know how to build an array literal that contains a NULL and from the fine manual:
To set an element of an array constant to NULL, write NULL for the element value. (Any upper- or lower-case variant of NULL will do.) If you want an actual string value "NULL", you must put double quotes around it.
So you could use these:
{A,null,B}
{"A",NULL,"B"}
...
to get NULLs in your arrays.

SSIS Bulk Insert where fields contain commas?

My bulk insert in SSIS is failing when a field contains a comma character. My flat file source is tab delimited and there are many instances in which a text field will contain commas. For example, a UserComment may have a comma. This causes the bulk insert to fail.
How can I tell SSIS to ignore the commas? I thought it would happen automatically since the row delimiter is {CR}{LF} and the column delimiter is "Tab". Why does it bark at the comma? Also please note that I am NOT currently using a format file.
Thanks in advance.
UPDATE:
Here is the error I get in SSIS:
Error: 0xC002F304 at Bulk Insert Task, Bulk Insert Task: An error occurred with the following error message: "Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 183, column 5 (EmailAddress).Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 182, column 5 (EmailAddress).Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 181, column 5 (EmailAddress).".
Task failed: Bulk Insert Task
It seems to fail on record 131988 which is why I think it's because of the "something,something" email with no space. Many records before 131988 come across fine.
131988 01 MEMPHIS, TN someone#somewhere.com
131988 02 NORTH LITTLE ROCK, AR someone#somewhere.com,someone1#somewhere1.com
131988 03 HOUSTON, TX someone#somewhere.com,someone1#somewhere1.com
I doubt the comma or the # sign is being called an "invalid character".
I see there are two tabs in the input record just before the field that contains the email addresses, so that email address column would be the fifth column. But when the error message refers to "column 5" it's presumably using zero-based indexing, so the email column is only index 4. Is there tab and another column? Maybe the invalid character is there.
I suspect there is a invisible bad character embedded in whatever column is causing the error. I often pick up bad characters when cutting and pasting out of email address lines, so that's a likely suspect.
Run the failing line by itself to make sure it still fails.
Then copy it into, say, Notepad, and do a "Save As" with the Encoding set to ANSI. (It may complain at that point if there's a bad character.) Use the "Save As" file as the new import file. At this point you should be able to be reasonably confident that "what you see is what you get", and that there are no invisible characters embedded in the import file.
If this turns out to be the problem, you'll need some way to verify that future import files are clean, or else handle them somehow during the import process.
(I presume you've checked the destination column length is okay. That would definitely be a showstopper.)
"Type mismatch or invalid character for the specified codepage" is a misleading error message. The source table's field length exceeded the destination table's specified length and thus the error. After adjusting lengths, everything worked properly.

SQL Error: Cannot be converted to a PACKED DECIMAL value

I have db2 import statement which reads from a file and writes to a database.
Column data type for column 18 (where i am getting error) is Decimal(18,2)
The value for that column coming in the file is -502.47
However, I am getting the below error:
SQL3123W The field value in row "1" and column "18" cannot be converted to a PACKED DECIMAL value. A null was loaded.
And the value is not going into database.
What is the reason for this error ? What is the solution ?
There was an issue with the number of column. I was passing more number of columns then the program expected. So we can get above error in that case as well.
It was because of the double quotes in the loaded CSV files at that particular cell mentioned in the error.
You should try opening the file in Notepad++ or any other text editor, remove the double quotes, save and load back into the DB.
Your error should be resolved.