Postgres Copy - Importing an integer with a comma - sql

I am importing 50 CSV data files into postgres. I have an integer field where sometimes the value is a regular number (comma-delimited) and sometimes it is in quotations and uses a comma for the thousands.
For instance, I need to import both 4 and "4,000".
I'm trying:
COPY race_blocks FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
And get the error:
ERROR: invalid input syntax for integer: "1,133"
How can I do this?

Let's assume you have only one column in your data.
First, create a temporary table with a varchar column:
CREATE TEMP TABLE race_blocks_tmp (your_integer_field VARCHAR);
Copy your data from the file into it:
COPY race_blocks_tmp FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
Remove every ',' from the varchar field, convert the result to numeric, and insert it into your table:
INSERT INTO race_blocks SELECT regexp_replace(your_integer_field, ',', '', 'g') :: numeric AS some_column FROM race_blocks_tmp;
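The cleaning step can also be illustrated outside the database. The following is a minimal Python sketch, not part of the answer above: the sample rows, the column names block_id and population, and the helper parse_grouped_int are all made up for illustration. It shows that a CSV parser keeps the quoted "4,000" as one field, and that removing the commas before converting gives the intended integer.

```python
import csv
import io


def parse_grouped_int(text):
    """Drop thousands separators, then convert the remainder to int."""
    return int(text.replace(",", ""))


# Hypothetical sample: one plain value and two quoted values with commas.
sample = 'block_id,population\n1,4\n2,"4,000"\n3,"1,133"\n'

rows = list(csv.DictReader(io.StringIO(sample)))
populations = [parse_grouped_int(row["population"]) for row in rows]
print(populations)
```

This mirrors the staging-table approach: the value is first read as text and only converted after the separators are gone.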

Related

How to insert latin data into a Snowflake Table

We have a scenario where we need to insert special characters coming from a file into a Snowflake table.
For example:
emp_id| emp_name
110|Famille immédiate
Since Snowflake only allows the UTF-8 format, the DML operation throws an error and the data is not inserted into the table.
I have tried updating the file format definition, but have found no solution yet:
CREATE OR REPLACE FILE FORMAT DB.LayOut01_FORMAT TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 ESCAPE_UNENCLOSED_FIELD = NONE REPLACE_INVALID_CHARACTERS = TRUE VALIDATE_UTF8 = FALSE
What changes are required so that the special characters are loaded into the table exactly as they appear in the source file?
Insert Statement:
INSERT INTO DB.EMP_T ( emp_id, emp_name)
SELECT
(temp.$1) AS emp_id , (temp.$2) AS emp_name
from
$AZURE_FILE_STORAGE_LOCATION (file_format => DB.LayOut01_FORMAT, pattern=>'filename.csv') temp
UTF-8 is the only supported encoding for semi-structured data, but for structured data you can load files in other encodings.
Set the ENCODING parameter on the file format to ISO-8859-1, like:
CREATE FILE FORMAT ... ENCODING='ISO-8859-1'
For more information, see the Snowflake documentation for CREATE FILE FORMAT.
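To see why the encoding matters, here is a small Python sketch. It assumes, as the answer implies, that the source file is encoded as ISO-8859-1; the sample line is taken from the question, and the variable names are illustrative only. The byte for "é" in ISO-8859-1 is not valid UTF-8 on its own, so reading the file as UTF-8 fails, while decoding it with the right encoding recovers the text.

```python
# The sample row from the question, encoded the way the file is assumed to be.
raw = "110|Famille immédiate\n".encode("iso-8859-1")

# Reading the bytes as UTF-8 fails because 0xE9 starts an incomplete sequence.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Decoding with the declared encoding recovers the original text,
# which can then be represented as UTF-8 without loss.
text = raw.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(decodes_as_utf8, text.strip())
```

Declaring the encoding on the file format plays the same role as the decode call here: it tells the loader how to interpret the bytes before they are stored.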

How to load a "|" delimited file into hive without creating a hive table with "ROW FORMAT DELIMITER"

I am trying to load a local file with "|"-delimited values into a Hive table. We usually create the table with the option ROW FORMAT DELIMITED FIELDS TERMINATED BY '|', but I want to create a normal table and then load the data. What is the right syntax to use? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
column2 bigint
)ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do :
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
Reason being: if I create the table that way, HDFS will store the table's data with the "|" delimiter.
With the second DDL you provided, Hive creates the table in its default format (text, ORC, Parquet, etc., depending on your configuration); for a text table, the file is expected to be delimited by Ctrl+A, the default delimiter in Hive.
If you want to store the HDFS file pipe-delimited, you need to create the Hive table as a text table with the | delimiter.
(or)
You can also write the result of a select query to a local (or) HDFS path with a pipe delimiter.
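The effect of the default delimiter can be pictured with a short Python sketch. This is only an illustration of splitting text on a delimiter, not Hive's actual reader; the sample line and variable names are made up. A pipe-delimited line contains no Ctrl+A character, so splitting on Ctrl+A leaves the whole line in the first column.

```python
# Ctrl+A (byte 0x01) is the default field delimiter for Hive text tables.
HIVE_DEFAULT_FIELD_DELIMITER = "\x01"

# Hypothetical line from a pipe-delimited file.
line = "abc|42"

# Split as a table created without ROW FORMAT would: one field only.
fields_default = line.split(HIVE_DEFAULT_FIELD_DELIMITER)

# Split as a table created with FIELDS TERMINATED BY '|' would: two fields.
fields_pipe = line.split("|")

print(fields_default, fields_pipe)
```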

Redshift copy a free-hand note field into Redshift

I have a few processes where I use the copy command to copy data from S3 into Redshift.
I have a new CSV file and cannot figure out how to bring in the "note" field, a free-hand field in which a salesperson can write anything. It can contain ";", ",", ".", spaces, new lines, anything.
Are there any common suggestions for copying this type of field? It is a varchar(max) column in table_name.
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
ignoreheader 1
escape
removequotes
acceptinvchars
I get Delimiter not found
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
fillrecord
ignoreheader 1
escape
removequotes
acceptinvchars
I get String length exceeds DDL length
The second copy command fixed your initial issue, namely copy failing to parse the CSV file. But now the row can't be inserted because the input value exceeds the maximum length of your column in the database. Try increasing the size of the column:
Alter column data type in Amazon Redshift
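Before resizing the column, it can help to know how long the longest note actually is. The Python sketch below is one way to measure that, under assumptions: the sample data, the column names id and note, and the helper max_utf8_length are hypothetical, and the file is assumed to be a standard CSV with quoted fields. Lengths are counted in UTF-8 bytes because Redshift defines VARCHAR lengths in bytes, not characters.

```python
import csv
import io


def max_utf8_length(csv_text, column):
    """Return the largest UTF-8 byte length found in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(len(row[column].encode("utf-8")) for row in reader)


# Hypothetical sample with commas, a semicolon, a newline, and a
# non-ASCII character inside quoted note fields.
sample = (
    'id,note\n'
    '1,"Called; left a message, will retry."\n'
    '2,"Line one\nLine two, café"\n'
)

print(max_utf8_length(sample, "note"))
```

For a real file, the same function could be applied to the file's contents read with the correct encoding.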

How to create an external Hive table if the field value has comma separated values

I used the sqoop-import command to import data from Teradata into Hive. The sqoop-import command creates a text file with a comma (,) as the delimiter.
After the import, I created an external table as shown below:
CREATE EXTERNAL TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, description String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
But the description column has values like this: "abc,xyz,mnl". Because of this, the data is not loaded into the Hive table correctly. How can I create a text file with a delimiter other than a comma while sqooping?
And how should I then delimit the fields when creating the external Hive table?
Use --fields-terminated-by in your Sqoop job if you want to avoid the default delimiter.
--fields-terminated-by - This parameter sets the field separator character in the output.
Example: --fields-terminated-by '|'
Then change the field separator in the create table statement to FIELDS TERMINATED BY '|'.

Hive: using quote character as delimiter in data files

Can we use quote (" or ') as delimiter in hive data files? If not why?
If we could refer to a list of characters which we can use as delimiters for hive data, that would be great.
When using decimal notation, you can use the whole basic ASCII range (decimal 0-127); I tested this.
Avoid using \n or \r.
As for " and ', it is straightforward:
create table mytable (i int,j int) row format delimited fields terminated by '"';
create table mytable (i int,j int) row format delimited fields terminated by "'";
or
create table mytable (i int,j int) row format delimited fields terminated by '\'';
create table mytable (i int,j int) row format delimited fields terminated by "\"";
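For the decimal notation mentioned above, the decimal code of a candidate delimiter can be looked up with a few lines of Python. This sketch assumes that "decimal notation" refers to the character's ASCII code; the helper decimal_code and the dictionary of examples are illustrative only.

```python
def decimal_code(delimiter):
    """Return the decimal ASCII code of a single-character delimiter."""
    code = ord(delimiter)
    if code > 127:
        raise ValueError("delimiter is outside the basic ASCII range (0-127)")
    return code


# Decimal codes for a few candidate delimiters.
codes = {
    "double quote": decimal_code('"'),
    "single quote": decimal_code("'"),
    "pipe": decimal_code("|"),
}
print(codes)
```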