I have a few processes where I use the COPY command to copy data from S3 into Redshift.
I have a new CSV file where I am unable to figure out how to bring in the "note" field - a free-hand field a salesperson can write anything into. It can have ";", ",", ".", spaces, new lines - anything.
Any common suggestions for copying this type of field? It is of type VARCHAR(MAX) in table_name.
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
ignoreheader 1
escape
removequotes
acceptinvchars
I get a "Delimiter not found" error.
Using this:
copy table_name
from 's3://location'
iam_role 'something'
delimiter as ','
fillrecord
ignoreheader 1
escape
removequotes
acceptinvchars
I get a "String length exceeds DDL length" error.
The second COPY command fixed your initial issue, namely COPY's parsing of the CSV file. But now a row can't be inserted because an input value exceeds the maximum length of your column in the database. Try increasing the size of the column:
Alter column data type in Amazon Redshift
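A minimal sketch of that fix, assuming the note column is not already at Redshift's 65,535-byte VARCHAR ceiling (table and column names are taken from the question):

-- Redshift allows increasing the size of a VARCHAR column in place.
ALTER TABLE table_name ALTER COLUMN note TYPE VARCHAR(65535);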
I am trying to load a local file with "|"-delimited values into a Hive table. We usually create the table with ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'. But I want to create a normal table and then load the data. What is the right syntax I need to use? Please suggest.
Working Code
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE;
But I want to do:
CREATE TABLE IF NOT EXISTS testdb.TEST_DATA_TABLE
( column1 string,
  column2 bigint
);
LOAD DATA LOCAL INPATH 'xxxxx.csv' INTO TABLE testdb.TEST_DATA_TABLE FIELDS TERMINATED BY '|';
Reason being: if I create the table that way, HDFS will store the table's data with the "|" delimiter.
With the second DDL you have provided, Hive will create a table in its default format (TextFile, ORC, Parquet, etc., depending on your configuration) with a Ctrl-A (\001) delimited file, which is the default delimiter in Hive.
If you want the HDFS file stored pipe-delimited, then you need to create the Hive table as a text-format table with the | delimiter.
(or)
You can also write the result of a SELECT query to a local (or HDFS) path with a pipe delimiter.
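For that second option, a minimal sketch (the output directory is illustrative):

-- Write the table out as pipe-delimited text files.
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test_data_out'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
SELECT * FROM testdb.TEST_DATA_TABLE;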
I have a number of datasets that I am trying to load into Amazon Redshift from S3 buckets. Here is my command:
"copy tablename from 'my_s3_bucket' iam_role 'my_role' delimiter ',' IGNOREHEADER 1 null as ''
This works, but for some files it throws an error:
Invalid digit, Value 'i', Pos 0, Type: Decimal...
On inspection, the data has 'inf' in some positions, which is causing the error. Is there a way to handle infinite values with this type of command? Or simply to load them as NULL - though I already have '' specified as NULL, so I'm not sure if I can add another.
Maybe change the table's schema to load the data as VARCHAR, then create a view with a CASE statement that handles the inf values and casts them to the proper data type.
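A minimal sketch of that approach, assuming the problem column is named amount and should end up as DECIMAL (the view name, column name, and target type are all illustrative):

-- The table now holds the raw text in a VARCHAR column; the view exposes it typed.
CREATE VIEW tablename_typed AS
SELECT
    CASE
        WHEN amount IN ('inf', '-inf') THEN NULL  -- treat infinities as NULL
        ELSE amount::DECIMAL(18, 6)
    END AS amount
FROM tablename;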
I used the sqoop-import command to Sqoop data into Hive from Teradata. The sqoop-import command creates a text file with a comma (,) as the delimiter.
After Sqooping, I created an external table as shown below:
CREATE EXTERNAL TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, description String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
But the description column has values like this: "abc,xyz,mnl". Because of this, the data does not load into the Hive table properly. How can I create a text file with a delimiter other than a comma while Sqooping?
And how should I then delimit the fields when creating the external Hive table?
Use --fields-terminated-by in your Sqoop job if you want to avoid the default delimiter.
--fields-terminated-by - this parameter sets the field separator character in the output.
Example: --fields-terminated-by '|' (quote the pipe so the shell does not interpret it)
and then change the field separator in the CREATE TABLE statement to FIELDS TERMINATED BY '|'.
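A sketch of the full Sqoop job; the connection string, credentials, and target directory are placeholders, not from the question:

# Illustrative only - replace the connection details with your own.
sqoop import \
  --connect jdbc:teradata://your-host/DATABASE=your_db \
  --username your_user -P \
  --table employee \
  --fields-terminated-by '|' \
  --target-dir /user/hive/warehouse/employee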
I am importing 50 CSV data files into Postgres. I have an integer field where sometimes the value is a regular number and sometimes it is quoted and uses a comma as the thousands separator.
For instance, I need to import both 4 and "4,000".
I'm trying:
COPY race_blocks FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
And get the error:
ERROR: invalid input syntax for integer: "1,133"
How can I do this?
Let's assume you have only one column in your data.
First, create a temporary table with a VARCHAR column:
CREATE TEMP TABLE race_blocks_tmp (your_integer_field VARCHAR);
Copy your data from the file:
COPY race_blocks_tmp FROM '/census/race-data/al.csv' DELIMITER ',' CSV HEADER;
Remove the ',' from the VARCHAR field, convert the data to numeric, and insert it into your table:
INSERT INTO race_blocks SELECT regexp_replace(your_integer_field, ',', '')::numeric FROM race_blocks_tmp;
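The temp table is dropped automatically when the session ends, but you can also remove it explicitly:

DROP TABLE race_blocks_tmp;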
I have a MySQL database that holds content as a BLOB; whatever reason those developers had for choosing a BLOB is out of my control. Is it possible to convert the data to text and change the data type to TEXT?
Have you tried the ALTER TABLE command?
alter table mytable change mycolumn mycolumn text;
From http://forums.mysql.com/read.php?103,164923,167648#msg-167648 it looks like you can use CAST.
You could create a new TEXT column, then fill it in with an UPDATE command:
update mytable set myNewColumn = CAST(myOldColumn AS CHAR(10000) CHARACTER SET utf8)
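For completeness, the new column would be added first; a one-line sketch using the same illustrative names:

ALTER TABLE mytable ADD COLUMN myNewColumn TEXT;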
Converting the field from BLOB to TEXT truncates all characters > 127. In my case we have lots of European characters, so this was not an option. Here's what I did:
1. Create a temp field as TEXT
2. Copy the BLOB field to the temp field: UPDATE tbl SET col_temp = CONVERT(col USING latin1); (in this case my BLOB held latin1-encoded chars)
3. Convert the actual field to the TEXT data type
4. Copy the temp field back to the actual field
5. Remove the temp column
Not exactly straightforward, but it worked with no data loss. I'm using version '5.1.50-community'.
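For reference, a sketch of the full sequence in SQL, using the table and column names from step 2 (tbl, col, col_temp) and assuming the BLOB holds latin1-encoded text as described:

-- 1. Create a temp field as TEXT
ALTER TABLE tbl ADD COLUMN col_temp TEXT;
-- 2. Copy the BLOB field to the temp field, decoding as latin1
UPDATE tbl SET col_temp = CONVERT(col USING latin1);
-- 3. Convert the actual field to the TEXT data type
ALTER TABLE tbl MODIFY col TEXT;
-- 4. Copy the temp field back to the actual field
UPDATE tbl SET col = col_temp;
-- 5. Remove the temp column
ALTER TABLE tbl DROP COLUMN col_temp;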