Import S3 CSV to RDS Postgres - sql

We're trying to import a CSV file from S3 into RDS Postgres.
Some columns contain commas in the data, for example the address column.
Those values are enclosed in double quotes ("").
Here is our SELECT:
SELECT aws_s3.table_import_from_s3(
'Change_National_IDs',
'',
'DELIMITER '',''',
aws_commons.create_s3_uri('data-migration-s3-bucket', 'Change_IDs_WqlResults_20xxxxx4_xxxxxx.csv', 'us-west-2')
);
We've tried many different combinations. How can we handle the commas in the data?
Current error:
ERROR: extra data after last expected column

You should change your column delimiter to something other than a comma, such as |, and then use code like this:
SELECT aws_s3.table_import_from_s3(
'Change_National_IDs',
'',
'DELIMITER ''|''',
aws_commons.create_s3_uri('data-migration-s3-bucket',
'Change_IDs_WqlResults_20xxxxx4_xxxxxx.csv', 'us-west-2')
);
More Information
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PostgreSQL.S3Import.html#USER_PostgreSQL.S3Import.FileFormats.CustomDelimiter
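Note that DELIMITER '|' on its own selects COPY's text format, in which double quotes have no special meaning. The file therefore also needs to be re-exported without the enclosing quotes. If the export cannot be changed at the source, the Python sketch below shows one way to convert the existing comma-delimited file before uploading it to S3. It assumes the whole file fits in memory. The function names pg_text_escape and comma_csv_to_pipe are hypothetical. The escaping follows PostgreSQL's text-format rules: backslash, newline, carriage return and the delimiter must be preceded by a backslash. Alternatively, passing COPY's CSV option (for example '(FORMAT csv)') as the options argument lets the import honor the double quotes directly.

```python
import csv
import io


def pg_text_escape(value: str) -> str:
    """Escape one field for PostgreSQL COPY text format with '|' as the delimiter."""
    # Escape backslashes first so the escapes added below are not doubled.
    value = value.replace("\\", "\\\\")
    value = value.replace("|", "\\|")
    return value.replace("\n", "\\n").replace("\r", "\\r")


def comma_csv_to_pipe(text: str) -> str:
    """Parse quoted, comma-delimited CSV and re-emit it as unquoted '|'-delimited lines."""
    # csv.reader honors the double quotes, so commas inside them stay in the field.
    reader = csv.reader(io.StringIO(text, newline=""))
    lines = ["|".join(pg_text_escape(field) for field in row) for row in reader]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample rows; the address contains a comma inside quotes.
    sample = '1,Acme,"123 Main St, Suite 4"\n2,"Pipe|Co",Plain\n'
    print(comma_csv_to_pipe(sample), end="")
```

The quotes disappear and the comma inside the address stays part of the value, while a literal | in the data is written as \| so the text-format import does not treat it as a column break.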

Related

SQL: Extract from messy JSON nested field with backslashes

I have a table where some rows contain normal JSON and others contain escaped values (backslashes) in the JSON field:
id | obj
1 | {"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00","currency":"USD","offset":100,"amount_with_offset":"1800"},"product_id":"1234","quantity":1}],"source":"cart"}
2 | {"is_from_shopping_bag":"","products":"[{\ "product_id\ ":\ "2345\ ",\ "price\ ":{\ "currency\ ":\ "USD\ ",\ "amount\ ":\ "140.00\ ",\ "offset\ ":100},\ "quantity\ ":1}]"}
(Note: I needed to include a space after the backslashes in the table above so that they would show up in the GitHub-generated markdown table; my actual table does not have those spaces between the backslash and the quote character.)
I am running a SQL query in Hive to get the 'currency' field.
Currently I run:
SELECT
id,
JSON_EXTRACT(obj, '$.products[0].price.currency')
FROM my_table
This gives the correct output for the first row, but NULL for the second row:
id | currency
1 | "USD"
2 | NULL
What is the best way to get the currency field from the second row? Is there a way to clean up the field and remove the backslashes before using JSON_EXTRACT on the relevant data?
I could use REPLACE to swap the '\ ' for '', but is that the most efficient method?
Replace \" with " using regexp_replace like this:
regexp_replace(obj,'\\\\"','"')
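One caveat: in the second row, products holds the array as a JSON string, so after replacing \" with " the array is still wrapped in quotes ("[...]") and the result is not valid JSON. The Python sketch below (plain Python, not Hive) shows this and two ways around it: also stripping the quotes around the array, which assumes "[ and ]" never occur elsewhere in the data, or parsing the products string a second time. ROW_1 and ROW_2 are the two sample rows from the question without the display spaces; the function names are hypothetical.

```python
import json
import re

# The two sample rows from the question, without the spaces added for display.
ROW_1 = ('{"is_from_shopping_bag":true,"products":[{"price":{"amount":"18.00",'
         '"currency":"USD","offset":100,"amount_with_offset":"1800"},'
         '"product_id":"1234","quantity":1}],"source":"cart"}')
ROW_2 = (r'{"is_from_shopping_bag":"","products":"[{\"product_id\":\"2345\",'
         r'\"price\":{\"currency\":\"USD\",\"amount\":\"140.00\",\"offset\":100},'
         r'\"quantity\":1}]"}')


def cleanup_escaped(obj: str) -> str:
    """Replace backslash-quote with a plain quote, then unwrap the quoted array."""
    cleaned = obj.replace('\\"', '"')
    # Assumes "[ and ]" only appear where an array was stored as a string.
    return re.sub(r'\]"', "]", re.sub(r'"\[', "[", cleaned))


def first_currency(obj: str):
    """Return products[0].price.currency, parsing products twice if it is a string."""
    products = json.loads(obj).get("products")
    if isinstance(products, str):
        products = json.loads(products)  # second parse for the escaped rows
    if not products:
        return None
    return products[0].get("price", {}).get("currency")


if __name__ == "__main__":
    print(first_currency(ROW_1), first_currency(ROW_2))
    print(first_currency(cleanup_escaped(ROW_2)))
```

In SQL terms, the second approach corresponds to extracting products as a plain string first and then running the JSON extraction on that string; which function returns the unescaped string depends on the engine, so treat that mapping as an assumption to check.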

Data type when reading from a TSV file (PostgreSQL)

I'm reading data from a TSV file into a PostgreSQL table. The problem is that one column (death year) contains either a year or \N if the actor is still alive. If I use INTEGER as the data type, I get an error because of the \N. Does anyone know how to solve this?
This is my table:
CREATE TABLE name_mock(nconst VARCHAR, primaryName VARCHAR, birthYear INTEGER, deathYear INTEGER, primaryProfession VARCHAR, knownForTitles VARCHAR);
Then I import the data from the TSV file:
COPY name_mock FROM '/home/pathtofile/name.basics.tsv' DELIMITER E'\t' CSV HEADER;
And I get the following error:
ERROR: invalid input syntax for type integer: "\N"
CONTEXT: COPY name_mock, line 4, column deathyear: "\N"
Thank you.
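COPY takes a NULL parameter that specifies the string representing a null value. In CSV mode the default is an unquoted empty string, so \N is read as ordinary text unless you set it explicitly.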
Please try this:
COPY name_mock FROM '/home/pathtofile/name.basics.tsv' DELIMITER E'\t' NULL '\N' CSV HEADER;
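To illustrate what the NULL option does, here is a small Python sketch (not PostgreSQL itself) that reads tab-separated rows and turns the exact marker \N into None, much as COPY turns it into SQL NULL, while every other year is parsed as an integer. The sample rows and the function names are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple

NULL_MARKER = "\\N"  # the two characters backslash and N, as in NULL '\N'


def to_int_or_null(field: str) -> Optional[int]:
    """Map the NULL marker to None; parse anything else as an integer."""
    return None if field == NULL_MARKER else int(field)


def read_years(tsv_text: str) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (nconst, birthYear, deathYear) for each data row, skipping the header."""
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    next(reader)  # skip the header row, like COPY ... HEADER
    return [(row[0], to_int_or_null(row[2]), to_int_or_null(row[3])) for row in reader]


if __name__ == "__main__":
    # Hypothetical rows: the second person has no death year yet.
    sample = (
        "nconst\tprimaryName\tbirthYear\tdeathYear\n"
        "nm0000001\tPerson A\t1899\t1987\n"
        "nm0000002\tPerson B\t1950\t\\N\n"
    )
    print(read_years(sample))
```

Without the marker check, int("\\N") raises a ValueError, which is the Python counterpart of the "invalid input syntax for type integer" error above.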

Why am I getting "Invalid input syntax for type integer" in postgresql when importing a CSV?

I'm trying to import a .csv file into my PostgreSQL DB.
I created a table as follows:
CREATE TABLE accounts
(
acc_id integer,
acc_name text,
website text,
lat numeric,
longe numeric,
primary_poc text,
sales_rep_id integer
);
Then I used the following command to import the .csv file
COPY accounts(acc_id,acc_name,website,lat,longe,primary_poc,sales_rep_id)
FROM 'D:\accounts.csv' DELIMITER ';' CSV ;
And my .csv file contains the following:
1;Walmart;www.walmart.com;40.23849561;-75.10329704;Tamara Tuma;321500
2;Exxon Mobil;www.exxonmobil.com;41.16915630;-73.84937379;Sung Shields;321510
3;Apple;www.apple.com;42.29049481;-76.08400942;Jodee Lupo;321520
However, this doesn't work and the following message appears:
ERROR: invalid input syntax for type integer: "1"
CONTEXT: COPY accounts, line 1, column acc_id: "1"
SQL state: 22P02
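Maybe there is a BOM (byte order mark) in the CSV? The value in the error message looks like a valid integer, which suggests an invisible character in front of it.
Hexdump the file and inspect the first three bytes; a UTF-8 BOM shows up as EF BB BF.
Then either use an editor to remove the BOM,
or export again without the BOM (there should be a checkbox for it, even in the Microsoft "software").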
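If no hex viewer is at hand, the check can also be scripted. The Python sketch below (function names hypothetical) looks for the UTF-8 BOM, the bytes EF BB BF, at the start of the data and strips it; when reading text, Python's utf-8-sig codec drops a leading BOM in the same way.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if there is one."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file at path without its BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    if not has_utf8_bom(data):
        return False
    with open(path, "wb") as f:
        f.write(strip_utf8_bom(data))
    return True
```

For example, strip_bom_in_file(r"D:\accounts.csv") would clean the file from the question in place before the COPY is run again.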

Import CSV file into H2 database as several columns

My CSV file contains:
prenom; nom; age
prenom1; nom1; age1
prenom2; nom2; age2
...
When I import my CSV file using this command:
CREATE TABLE TEST AS SELECT * FROM CSVREAD('C:\Users\anonymous\Desktop\test.csv');
The problem is that the resulting table has only one column, containing each whole line of my CSV file.
I would like 3 columns, prenom, nom and age, with the data in each column.
Thanks for your help!
As #jdv stated, you must specify the field separator if it is not the default comma. Passing null as the second argument (the column list) means the column names are read from the first row.
CREATE TABLE TEST AS SELECT * FROM CSVREAD('C:\Users\anonymous\Desktop\test.csv',null,'fieldSeparator=;');
Keep in mind that you may also have to specify charset=Cp1252 if the CSV file was generated with Excel. If accented characters such as the é in prénom come out garbled, you have the wrong encoding.
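The effect of the field separator can be reproduced with Python's csv module (a sketch of the same idea, not H2's parser): with the default comma, each line of this file comes back as a single field, and with ';' it splits into three. skipinitialspace=True is an extra assumption here that drops the space after each ';' in the sample; whether H2 trims that space is not covered above. The last lines show how é looks when Cp1252 bytes, as Excel may write them, are decoded with the right and the wrong charset. SAMPLE and read_rows are hypothetical names.

```python
import csv
import io

SAMPLE = "prenom; nom; age\nprenom1; nom1; age1\nprenom2; nom2; age2\n"


def read_rows(text: str, separator: str):
    """Split CSV text into rows using the given field separator."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=separator,
                           skipinitialspace=True))


if __name__ == "__main__":
    print(read_rows(SAMPLE, ","))  # default separator: one column per line
    print(read_rows(SAMPLE, ";"))  # ';' separator: three columns

    raw = "prénom".encode("cp1252")                # Cp1252 bytes
    print(raw.decode("cp1252"))                    # right charset
    print(raw.decode("utf-8", errors="replace"))   # wrong charset: garbled
```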

invalid input syntax for integer with postgres

I have a table:
id | detail
1 | ddsffdfdf ;df, deef,"dgfgf",/dfdf/
When I ran insert into details values(1,'ddsffdfdf ;df, deef'); the row was inserted properly.
When I copied that value from the database to a file, the file contained: 1 ddsffdfdf ;df, deef
Then I loaded the whole CSV file into the PostgreSQL database, with values in that same format (1 ddsffdfdf ;df, deef), and got this error:
ERROR: invalid input syntax for integer: "1 ddsffdfdf ;df, deef"
How can I solve this problem?
A CSV file needs a delimiter that Postgres recognizes so that it can split each line into its fields. Your delimiter is a space, which is not enough here. Your CSV file should look more like:
1,"ddsffdfdf ;df, deef"
And your SQL should look like:
COPY details FROM 'filename' WITH CSV;
WITH CSV is important because it tells Postgres to read the file in CSV format, which uses a comma as the default delimiter, and to parse your values accordingly. Because your second field contains a comma, you want to enclose its value in double quotes so that its comma is not mistaken for a delimiter.
To see an example of a properly formatted CSV file, you can export your current table:
COPY details TO '/your/filename.csv' WITH CSV;
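Python's csv module follows the same quoting convention, so it can illustrate the point: a writer quotes the second field because it contains a comma, while a reader splits the unquoted line at that comma, leaving a first field that is not an integer. This is a sketch with hypothetical helper names, not Postgres's own parser.

```python
import csv
import io


def to_csv_line(row) -> str:
    """Write one row as CSV; fields containing commas or quotes get double-quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def parse_csv_line(line: str):
    """Parse a single CSV line into its fields."""
    return next(csv.reader([line]))


if __name__ == "__main__":
    good = to_csv_line([1, "ddsffdfdf ;df, deef"])
    print(good, end="")          # 1,"ddsffdfdf ;df, deef"
    print(parse_csv_line(good))  # ['1', 'ddsffdfdf ;df, deef']
    # The original unquoted, space-separated line splits at the comma instead.
    print(parse_csv_line("1 ddsffdfdf ;df, deef"))  # ['1 ddsffdfdf ;df', ' deef']
```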