"UNLOAD" data tables from AWS Redshift and make them readable as CSV - sql

I am currently trying to move several data tables from my current AWS instance's Redshift database to a new database in a different AWS instance (for background, my company has acquired a new one and we need to consolidate to one instance of AWS).
I am using the UNLOAD command below on a table. I plan on turning that table into a CSV, uploading the file to the destination AWS account's S3, and using the COPY command to finish moving the table.
unload ('select * from table1')
to 's3://destination_folder'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
ADDQUOTES
DELIMITER AS ','
PARALLEL OFF;
My issue is that when I change the file type to .csv and open the file, I get inconsistencies with the data. There are areas where many rows are skipped, and on some rows, after the expected columns end, I get additional columns with the value "f" for unknown reasons. Any help on how I could achieve this transfer would be greatly appreciated.
EDIT 1: It looks like fields containing quotes are having the quotes removed. Additionally, fields containing commas are being split apart at the commas. I've identified some fields with quotes and commas, and they are throwing everything off. Would the ADDQUOTES clause I have apply to the entire field regardless of whether there are quotes or commas within the field?

By default the unloaded file has a .txt extension and includes the quotes. Try opening it with Excel and then saving it as a CSV file.
Refer to https://help.xero.com/Q_ConvertTXT
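For the embedded quotes and commas described in the edit: ADDQUOTES only wraps each field in quotation marks so that embedded delimiters survive; it does not escape quotation marks that already appear inside a field, which is likely what is shifting your columns. A minimal sketch of one way to keep the round trip consistent, pairing ESCAPE with ADDQUOTES on the UNLOAD and REMOVEQUOTES/ESCAPE on the matching COPY (bucket, table, and credential values are placeholders):
-- Unload: quote every field and backslash-escape embedded quotes,
-- delimiters, and newlines so they survive the transfer.
unload ('select * from table1')
to 's3://destination_folder/table1_'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
ADDQUOTES
ESCAPE
DELIMITER AS ','
PARALLEL OFF;

-- Copy on the destination cluster: the options must mirror the unload.
copy table1
from 's3://destination_folder/table1_'
CREDENTIALS 'aws_access_key_id=XXXXXXXXXXXXX;aws_secret_access_key=XXXXXXXXX'
REMOVEQUOTES
ESCAPE
DELIMITER AS ',';
If the intermediate file also needs to open cleanly in Excel, recent Redshift versions additionally support FORMAT AS CSV on UNLOAD (with a matching CSV option on COPY), which quotes and escapes fields in the usual CSV style.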

Related

Is there any way to ignore a record that isn't correct and go ahead with the next record while using the COPY command to upload data from S3 to Redshift?

I have a '.csv' file in S3 that has a lot of text data. I am trying to upload the data from S3 to a Redshift table, but my data is not consistent; it has a lot of special characters. Some records may be rejected by Redshift. I want to ignore those records and move ahead with the next one. Is this possible using the COPY command?
I am expecting an exception handling feature while using the COPY command to upload data from S3 to Redshift.
Redshift has several ways to attack this kind of situation. First, there is the MAXERROR option, which sets how many unreadable rows are allowed before the COPY fails. There is also the IGNOREALLERRORS option to COPY, which will read every row it can.
If you want to accept the rows with the odd characters, you can use the ACCEPTINVCHARS option to COPY, where you specify a replacement character for every character Redshift cannot parse. It is typical to use '?', but you can make it any character.
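A minimal sketch of a COPY that combines these options (table name, S3 path, and IAM role are placeholders; adjust the format options to your file):
copy my_table
from 's3://my-bucket/path/part'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
csv
maxerror 100            -- tolerate up to 100 unparseable rows before failing
acceptinvchars as '?';  -- replace characters Redshift cannot store with '?'
Rows rejected under MAXERROR are recorded in the STL_LOAD_ERRORS system table, so you can review afterwards what was skipped and why.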

Glue create_dynamic_frame.from_catalog returns empty data

I'm debugging an issue where create_dynamic_frame.from_catalog returns no data, even though I'm able to view the data through Athena.
The Data Catalog table points to an S3 folder containing multiple files with the same structure. The files are CSV, the delimiter is a space " ", and each row consists of two columns (a string and a JSON string), with no header.
(The original post included a sample of the CSV file and the crawler-generated Athena query.)
No results are returned from the dynamic frame when debugging. Any thoughts?
Check whether you have enabled the job bookmark for this job. If you are running it multiple times, you need to reset the bookmark or disable it.
Another thing to check is the logs. You may find an AccessDenied error; the role running the job might not have access to the bucket.

How can I move data from a spreadsheet to a database through SQL

I want to move the data from a spreadsheet into a database. The program I am using is called SQLWorkbenchJ. I am kind of lost and don't really know where to start. Are there any tips or ways that might point me in the right direction?
SQL Workbench/J provides the WbImport command in order to load a text file into a DB table. So if you save your spreadsheet file in CSV (comma-separated values) format, you can then load it into a table using this command.
Here is an example that loads the text file CLASSIFICATION_CODE.csv, having , as the field delimiter and ^ as the quoting character, into the CLASSIFICATION_CODE DB table.
WbImport -type=text
-file='C:\dev\CLASSIFICATION_CODE.csv'
-delimiter=,
-table=CLASSIFICATION_CODE
-quoteChar=^
-badfile='C:\dev\rejected'
-continueOnError=true
-multiLine=true
-emptyStringIsNull=false;
You might not need all the parameters of the example. Refer to the documentation to find the ones you need.
If the data in your spreadsheet is heterogeneous (e.g. your spreadsheet has two sheets) then split it into two files in order to store them in separate DB tables.
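Note that WbImport loads into an existing table by default (there is also a -createTarget option to have it create one), so you may need to create the target table first. A minimal sketch with hypothetical column names and types; adjust them to match your CSV:
-- Hypothetical target table for the example above; column names and types are assumptions.
CREATE TABLE CLASSIFICATION_CODE (
    code        VARCHAR(20),
    description VARCHAR(200)
);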

How to load csv data which is control+A separated into bigquery

I'm trying to load a CSV file which is Control+A separated into BigQuery. What should I pass for the -F parameter of the bq load command? All the options I have tried result in an error while loading.
I would guess that Control+A is used in some legacy formats that the OP wants to load into BigQuery. On the other hand, Control+A can be chosen precisely when it is hard to use any of the commonly used delimiters.
My recommendation would be to load your CSV file without any delimiter, so the whole row is loaded as one field.
Assume the rows loaded into TempTable look like below, with just one column called FullRow:
'value1^Avalue2^Avalue3'
where ^A is the "invisible" Control+A character.
After you have loaded your file into BigQuery, you can parse it into separate columns and write it to the final table with something like below:
SELECT
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){0}(\w*)') AS col1,
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){1}(\w*)') AS col2,
  REGEXP_EXTRACT(FullRow, r'(?:\w*\x01){2}(\w*)') AS col3
FROM TempTable
The above is confirmed to work, as I have used this approach multiple times. It works for both Legacy and Standard SQL.
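As a possible Standard SQL alternative (a sketch assuming the same TempTable and FullRow names), you can also split each row on the Control+A character instead of using regular expressions; unlike the \w-based pattern above, this keeps fields that contain non-word characters intact:
SELECT
  -- SAFE_OFFSET returns NULL instead of failing when a row has fewer fields
  parts[SAFE_OFFSET(0)] AS col1,
  parts[SAFE_OFFSET(1)] AS col2,
  parts[SAFE_OFFSET(2)] AS col3
FROM (
  SELECT SPLIT(FullRow, '\x01') AS parts  -- '\x01' is the Control+A byte
  FROM TempTable
)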

Is There A Way To Only Copy Specific Columns From RedShift To S3 Using RedShiftCopyActivity?

I assume that copying from Redshift -> S3 can only be done with RedshiftCopyActivity. However, I can't seem to find a way to copy only specific columns to S3 (it only copies all columns).
The reason I am doing this is that one of the columns in Redshift contains carriage return characters that mess up a PigActivity defined later on. Since I don't need that column, I figured I would copy only the columns I need so that my PigActivity runs smoothly.
You can use the transformSql option in RedshiftCopyActivity to copy selective columns.
Ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-redshiftcopyactivity.html
I believe RedshiftCopyActivity is used for utilizing the COPY command, which goes S3 -> Redshift. The opposite command is UNLOAD.
Your request can be done with SqlActivity, where you can write a complete UNLOAD command using a SELECT statement to define the columns to unload.
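A sketch of the kind of UNLOAD such a SqlActivity could run, selecting only the columns you need (schema, table, column, bucket, and role names are placeholders):
-- Unload only the needed columns, leaving out the one with carriage returns.
unload ('select col_a, col_b from my_schema.my_table')
to 's3://my-bucket/exports/my_table_'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
delimiter as ','
addquotes
escape
parallel off;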