Redshift copy command from S3 works, but no data uploaded - amazon-s3

I am using the COPY command to load a file (.csv.gz) from AWS S3 into Redshift:
copy sales_inventory from
's3://[redacted].csv.gz'
CREDENTIALS '[redacted]'
COMPUPDATE ON
DELIMITER ','
GZIP
IGNOREHEADER 1
REMOVEQUOTES
MAXERROR 30
NULL 'NULL'
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'
;
I don't receive any errors, just '0 rows loaded successfully'. I checked the easy things: double-checked the file's contents and made sure I was targeting the right file with the COPY command. Then I created a simple one-row example file to try, and it didn't work either. I've been using a COPY command template I made a long time ago, and it has worked very recently.
Are there any common mistakes I might have overlooked? Is there anything other than the example file that I could try?
Thanks.

With the IGNOREHEADER 1 option, Redshift will regard the first line as a header and skip it. If there is just one line in the file, you should take this option off.
If your file contains multiple records, you might have a data load error. Since you're specifying MAXERROR 30, Redshift will skip up to 30 invalid records and still return a success result. Load error information from the COPY is stored in the STL_LOAD_ERRORS table. Try SELECT * FROM STL_LOAD_ERRORS ORDER BY starttime DESC LIMIT 10; to check whether you had load errors.
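If the full row from SELECT * is hard to read, a narrower query against the same system table shows the most useful columns (these are standard STL_LOAD_ERRORS columns):
-- Show the most recent load errors: file, line, failing column, reason, and raw line
SELECT starttime, filename, line_number, colname, err_reason, raw_line
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;
The err_reason and raw_line columns usually make it obvious which record and which field broke the load.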

Related

Is there any way to skip a record that isn't correct and move on to the next record while using the COPY command to upload data from S3 to Redshift?

I have a .csv file in S3 that has a lot of text data. I am trying to load the data from S3 into a Redshift table, but my data is not consistent; it has a lot of special characters, and some records may be rejected by Redshift. I want to ignore such a record and move ahead with the next one. Is it possible to do that with the COPY command?
I am expecting some form of exception handling while using the COPY command to load data from S3 into Redshift.
Redshift has several ways to attack this kind of situation. First there is the MAXERROR option, which sets how many unreadable rows will be allowed before the COPY fails. There is also the IGNOREALLERRORS option to COPY, which will load every row it can.
If you want to accept the rows with the odd characters, you can use the ACCEPTINVCHARS option to COPY, which lets you specify a replacement character for every character Redshift cannot parse. It is typical to use '?', but you can make it any character.
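A minimal sketch of how these options might look together; the table name, bucket path, and IAM role below are placeholders, not values from the question:
COPY my_table
FROM 's3://my-bucket/path/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
CSV
IGNOREHEADER 1
MAXERROR 30            -- tolerate up to 30 bad records before the COPY fails
ACCEPTINVCHARS AS '?'; -- replace characters Redshift cannot parse with '?'
Rejected rows still show up in STL_LOAD_ERRORS, so you can review afterwards what was skipped.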

How do you bulk load parquet files into Snowflake from AWS S3?

I'm trying to bulk load 28 parquet files into Snowflake from an S3 bucket using the COPY command and regex pattern matching. But each time I run the command in my worksheet, I'm getting the following bad response:
Copy executed with 0 files processed.
Inside a folder in my S3 bucket, the files I need to load into Snowflake are named as follows:
S3://bucket/foldername/filename0000_part_00.parquet
S3://bucket/foldername/filename0001_part_00.parquet
S3://bucket/foldername/filename0002_part_00.parquet
...
S3://bucket/foldername/filename0026_part_00.parquet
S3://bucket/foldername/filename0027_part_00.parquet
Using the Snowflake worksheet, I'm trying to load data into a pre-existing table, using the following commands:
CREATE or REPLACE file format myparquetformat type = 'parquet';
COPY INTO [Database].[Schema].[Table] FROM (
SELECT $1:field1::VARCHAR(512), $1:field2::INTEGER, $1:field3::VARCHAR(512),
$1:field4::DOUBLE, $1:field5::VARCHAR(512), $1:field6::DOUBLE
FROM @AWS_Snowflake_Stage/foldername/
(FILE_FORMAT => 'myparquetformat', PATTERN =>
'filename00[0-9]+_part_00.parquet')
)
on_error = 'continue';
I'm not sure why these commands fail to run.
In every example I've seen in the Snowflake documentation, "PATTERN" is only used within the COPY command outside of a SELECT query. I'm not sure if it's possible to use PATTERN inside a SELECT query.
In this case, I think it's necessary to use the SELECT query within the COPY command, since I'm loading in parquet data that would first need to be cast from a single column ($1) into multiple columns with appropriate data types for the table (varchar, integer, double). The SELECT query is what enables the importing of the parquet file into the existing table -- is it possible to find a way around this using a separate staging table?
It's a huge pain to load the parquet files one at a time. Is there any way to bulk load these 28 parquet files using the Snowflake worksheet? Or is it better to try to do this using a Python script and the Snowflake API?
The following worked for me. Admittedly my pattern is quite simple, selecting every parquet file in the location, but you can verify whether your regex pattern is valid.
COPY INTO <TABLE_NAME> FROM (
SELECT
$1:col_name_1,
$1:col_name_2
FROM @STAGE_NAME/<PATH_TO_FILES>
)
PATTERN = '.*.parquet'
FORCE = TRUE
FILE_FORMAT = (
TYPE = 'parquet'
);
Side note: keep in mind that Snowflake has a safety check that skips files which have already been staged and loaded successfully once (that is what FORCE = TRUE overrides).
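If the COPY keeps reporting 0 files processed, one way to sanity-check the regex before loading is to list the staged files with the same pattern. A sketch using the stage and folder names from the question (assumed here):
LIST @AWS_Snowflake_Stage/foldername/ PATTERN = '.*.parquet';
If this returns no rows, the stage path or the pattern is what needs fixing.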

Specify multiple delimiters for Redshift copy command

Is there a way to specify multiple delimiters for the Redshift COPY command while loading data?
I have a data file in the following format:
1 | ab | cd | ef
2 | gh | ij | kl
I am using a command like this:
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|';
Fields are separated by | and records are separated by newlines. How do I copy this data into Redshift? My query above gives me a "delimiter not found" error.
No, delimiters are single characters.
From Data Format Parameters:
Specifies the single ASCII character that is used to separate fields in the input file, such as a pipe character ( | ), a comma ( , ), or a tab ( \t ).
You could import it with a pipe delimiter, then run an UPDATE that uses BTRIM() to strip off the surrounding spaces.
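A minimal sketch of that two-step approach, reusing the COPY from the question; the column names in the UPDATE are placeholders for the actual text columns of MY_TBL:
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|';

-- Strip the spaces left around each value by the ' | ' separator.
UPDATE MY_TBL
SET col_a = BTRIM(col_a),
    col_b = BTRIM(col_b),
    col_c = BTRIM(col_c);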
Your error above suggests that something in your data is causing the COPY command to fail. This could be a number of things, from file encoding to some funky data in there. I've struggled with the "delimiter not found" error recently; it turned out to be the ESCAPE parameter combined with trailing backslashes in my data, which prevented my delimiter (\t) from being picked up.
Fortunately, there are a few steps you can take to help you narrow down the issue:
stl_load_errors - This system table contains details on any error logged by Redshift during the COPY operation. This should be able to identify the row number in your data file that is causing the problem.
NOLOAD - will allow you to run your copy command without actually loading any data to Redshift. This performs the COPY ANALYZE operation and will highlight any errors in the stl_load_errors table.
FILLRECORD - This allows Redshift to "fill" any columns that it sees as missing in the input data. This is essentially to deal with any ragged-right data files, but can be useful in helping to diagnose issues that can lead to the "delimiter not found" error. This will let you load your data to Redshift and then query in database to see where your columns start being out of place.
From the sample you've posted, your setup looks good, but obviously this isn't the entire picture. The options above should help you narrow down the offending row(s) to help resolve the issue.
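For example, re-running the COPY from the question with NOLOAD added loads nothing but still records parse errors, which you can then read back from stl_load_errors:
COPY MY_TBL
FROM 's3://s3-file-path'
iam_role 'arn:aws:iam::ddfjhgkjdfk'
manifest
IGNOREHEADER 1
gzip delimiter '|'
NOLOAD;

SELECT line_number, colname, err_reason, raw_line
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 10;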

"UNLOAD" data tables from AWS Redshift and make them readable as CSV

I am currently trying to move several data tables in my current AWS instance's Redshift database to a new database in a different AWS instance (for background, my company has acquired a new one and we need to consolidate to one instance of AWS).
I am using the UNLOAD command below on a table, and I plan on turning that table into a CSV, uploading the file to the destination AWS instance's S3, and using the COPY command to finish moving the table.
unload ('select * from table1')
to 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
ADDQUOTES
DELIMITER AS ','
PARALLEL OFF;
My issue is that when I change the file type to .csv and open the file, I get inconsistencies in the data. There are areas where many rows are skipped, and on some rows, after the expected columns end, I get additional columns with the value "f" for unknown reasons. Any help on how I could achieve this transfer would be greatly appreciated.
EDIT 1: It looks like fields with quotes are having the quotes removed. Additionally, fields containing commas are being split apart at the commas. I've identified some fields with quotes and commas, and they are throwing everything off. Would the ADDQUOTES clause I have apply to the entire field regardless of whether there are quotes and commas within the field?
By default, the unloaded file will have a .txt extension and quoted values. Try opening it with Excel and then saving it as a CSV file.
See https://help.xero.com/Q_ConvertTXT
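On the ADDQUOTES question in the edit: ADDQUOTES wraps every field in quotation marks so that embedded delimiters survive a reload with REMOVEQUOTES, but quotation marks inside field values still need ESCAPE. A hedged sketch of that pairing (the destination table name is a placeholder):
UNLOAD ('select * from table1')
to 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
ADDQUOTES
ESCAPE
DELIMITER AS ','
PARALLEL OFF;

COPY table1_destination
FROM 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
DELIMITER ','
REMOVEQUOTES
ESCAPE;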

SQL syntax error when using Heroku dataclips to export PostgreSQL database into csv

I have a Rails app on Heroku that I'm currently testing to ensure that I can download the information it gathers. I've managed to get PostgreSQL 9.3.5 working and can even get it to spit out a public URL to an unreadable dump file, but I want to export a particular table into a CSV that is easier to understand so that I can gather the data.
I've been looking into Heroku Dataclips. The documentation says that this is possible, but doesn't explain how. This site seemed to give some tips on SQL inputs:
http://www.gistutor.com/postgresqlpostgis/10-intermediate-postgresqlpostgis-tutorials/39-how-to-import-or-export-a-csv-file-using-postgresql-copy-to-and-copy-from-queries.html
So I entered this into Dataclips:
COPY participations(user_full_name, user_email, event_name, event_date_time)
TO '/usr/local/pgsql/data/csv/event_registrations.csv'
WITH DELIMITER ‘,’
CSV HEADER
However, I get this error:
Your query couldn't be created.
ERROR: syntax error at or near "COPY"
LINE 2: COPY participation(user_full_name, user_email, event_name, e...
^
How can I fix this? Maybe the reference I'm using is wrong, because I don't see the difference between what I'm doing and what's there.
FWIW, I'm using the Cloud9 IDE as my terminal.
If you are trying to get the data out as a CSV file, try doing this from the command line and put a "\" before copy, like this:
\COPY participations(user_full_name, user_email, event_name, event_date_time)
TO '/usr/local/pgsql/data/csv/event_registrations.csv'
WITH DELIMITER ','
CSV HEADER
Alternatively, you can download pgAdmin; it has an "execute query to file" option under the Query tab at the top.
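For a Heroku database specifically, you can open a psql session with heroku pg:psql and run the \copy from there; unlike a server-side COPY, \copy writes the file to the machine you run psql from. A minimal sketch (the output filename is arbitrary):
\copy participations(user_full_name, user_email, event_name, event_date_time) TO 'event_registrations.csv' WITH DELIMITER ',' CSV HEADER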
According to Heroku support, this is what you need to put in a Dataclip if you want to get all the records from a particular table:
SELECT * from table_name;
Once you create your Dataclip, you will have the option through the Dataclips interface to download the results as a CSV.
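Applied to the table from the question, the Dataclip query would just be the column list (names taken from the question):
SELECT user_full_name, user_email, event_name, event_date_time
FROM participations;
The results can then be downloaded as a CSV from the Dataclips interface, as described above.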