UNLOAD to a new file when running in Redshift - sql

I am trying to UNLOAD a file to an S3 bucket. However, I don't want to overwrite the existing file; I want to create a new file every time I run the command. How can I achieve this?
unload ('select * from table1')
to 's3://bucket/file1/file2/file3/table1.csv'
iam_role 'arn:aws:iam::0934857378:role/RedshiftAccessRole,arn:aws:iam::435874575846546:role/RedshiftAccessRole'
DELIMITER ','
PARALLEL OFF
HEADER

Just change the destination path specified in the "TO" section.
If you wish to do this programmatically, you could do it in whatever script/command sends the UNLOAD command.
You might be able to do it via a Stored Procedure by keeping a table with the last file number and writing code to retrieve and increment it.
Or, you could write an AWS Lambda function that is triggered upon creation of the file. The Lambda function could then copy the object to a different path/filename and delete the original object.
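As a minimal sketch of the programmatic option, the script that sends the UNLOAD could append a UTC timestamp to the destination before building the statement. The function name build_unload_sql, the now parameter, and the timestamp format are assumptions for illustration; the query and options are the ones from the question, and the IAM role is passed in by the caller.

```python
from datetime import datetime, timezone


def build_unload_sql(base_path, iam_role, now=None):
    """Return an UNLOAD statement whose S3 destination is unique per run.

    base_path: S3 prefix without the timestamp, e.g. 's3://bucket/dir/table1'
    now: optional datetime, injectable for testing (defaults to current UTC time)
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%S")  # hypothetical naming scheme
    target = f"{base_path}_{stamp}.csv"
    return (
        "unload ('select * from table1')\n"
        f"to '{target}'\n"
        f"iam_role '{iam_role}'\n"
        "DELIMITER ','\n"
        "PARALLEL OFF\n"
        "HEADER"
    )
```

The returned string can be sent through whatever client the script already uses. Because each run gets a different destination, nothing is overwritten.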

Related

Redshift Unload command with CSV extension

I'm using the following Unload command -
unload ('select * from '') to 's3://summary.csv'
CREDENTIALS 'aws_access_key_id='';aws_secret_access_key=''' parallel off allowoverwrite CSV HEADER;
The file created in S3 is summary.csv000
If I change and remove the file extension from the command like below
unload ('select * from '') to 's3://summary'
CREDENTIALS 'aws_access_key_id='';aws_secret_access_key=''' parallel off allowoverwrite CSV HEADER;
The file created in S3 is summary000
Is there a way to get summary.csv, so I don't have to change the file extension before importing it into excel?
Thanks.
Many people have asked a similar question. Right now it is not possible to give the unloaded files an extension (Parquet files are the exception).
The reason is that Redshift exports in parallel by default, which is a good thing: each slice exports its own data. From the docs:
PARALLEL
By default, UNLOAD writes data in parallel to multiple files,
according to the number of slices in the cluster. The default option
is ON or TRUE. If PARALLEL is OFF or FALSE, UNLOAD writes to one or
more data files serially, sorted absolutely according to the ORDER BY
clause, if one is used. The maximum size for a data file is 6.2 GB.
So, for example, if you unload 13.4 GB of data, UNLOAD creates the
following three files.
So Redshift has to start a new file once a file reaches the 6.2 GB limit, which is why it adds a number as a suffix.
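The arithmetic behind the quoted example can be sketched as follows. This assumes a serial unload (PARALLEL OFF), a non-empty result, and the three-digit numbering seen in summary000; the function name expected_files is hypothetical, and the real split is decided by Redshift, not by this calculation.

```python
import math

MAX_FILE_GB = 6.2  # maximum data file size quoted from the docs above


def expected_files(prefix, total_gb):
    """Estimate the object names a serial UNLOAD of total_gb would produce."""
    count = math.ceil(total_gb / MAX_FILE_GB)
    return [f"{prefix}{i:03d}" for i in range(count)]


print(expected_files("summary", 13.4))
```

For 13.4 GB this gives three names, matching the three files mentioned in the documentation excerpt.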
How do we solve this?
There is no native option in Redshift, but you can work around it with Lambda.
Create a new S3 bucket and a folder inside it specifically for this process (e.g. s3://unloadbucket/redshift-files/).
Your unload files should go to this folder.
The Lambda function should be triggered by the S3 put-object event.
The Lambda function then does the following:
Download the file (if it is large, use EFS).
Rename it with a .csv extension.
Upload it to the same bucket (or a different bucket) under a different path (e.g. s3://unloadbucket/csvfiles/).
Or, even simpler, use a shell/PowerShell script to do the following:
Download the file.
Rename it with a .csv extension.
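The Lambda workaround above could look roughly like the sketch below. It swaps the download/rename/upload steps for a server-side copy using boto3's managed copy call, so no local storage is needed. The prefixes come from the example paths above; the helper name csv_key_for is hypothetical, and the sketch assumes the S3 trigger is scoped to the redshift-files/ prefix so the copies written to csvfiles/ do not re-trigger the function.

```python
from urllib.parse import unquote_plus

SOURCE_PREFIX = "redshift-files/"  # where UNLOAD writes (example path above)
TARGET_PREFIX = "csvfiles/"        # where the renamed copies go


def csv_key_for(key):
    """Map 'redshift-files/summary000' to 'csvfiles/summary000.csv'."""
    name = key[len(SOURCE_PREFIX):] if key.startswith(SOURCE_PREFIX) else key
    return f"{TARGET_PREFIX}{name}.csv"


def lambda_handler(event, context):
    # Imported here so csv_key_for can be exercised without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    copied = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications arrive URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        new_key = csv_key_for(key)
        # Managed server-side copy within the same bucket.
        s3.copy({"Bucket": bucket, "Key": key}, bucket, new_key)
        copied.append(new_key)
    return copied
```

The original object is left in place here; deleting it after the copy is a design choice that depends on whether the raw unload output is still needed.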
As per AWS Documentation around UNLOAD command, it's possible to save data as CSV.
In your case, this is what your code would look like:
unload ('select * from '')
to 's3://summary/'
CREDENTIALS 'aws_access_key_id='';aws_secret_access_key='''
CSV
parallel off
allowoverwrite
HEADER;

How to ignore errors but not skip rows in redshift copy command

I have a nested json as my source file in S3 and I am trying to copy this file into redshift.
My issues with this are as follows,
I use MAXERROR because I need to skip certain errors: the source file is missing certain fields in some records and has them in others
I use a JSONPATH file - to pick the fields that I need to copy to redshift
All the columns in the table are varchar
Since I am using MAXERROR, the COPY command executes successfully, but the table has 0 records. Here is my COPY command:
COPY public.table(col1,col2,col3,col4,col5,col6)
from 's3://bucket/filename'
credentials 'redshift'
format as JSON 'jsonpathfile.json'
timeformat 'YYYY-MM-DDTHH:MI:SS'
EMPTYASNULL ACCEPTANYDATE ACCEPTINVCHARS TRUNCATECOLUMNS maxerror 100 ;
If I check stl_load_errors, it keeps saying
Invalid JSONPath format: Member is not an object.
Does this mean the copy command is not able to find even one object that fits the jsonpath file?
Which is definitely not true. I inferred the schema of the input file to design the jsonpath file.
Here is an example from COPY Examples - Amazon Redshift:
copy category
from 's3://mybucket/category_object_paths.json'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
json 's3://mybucket/category_jsonpath.json';
The path to the jsonpath file is specified fully, whereas your example just refers to the filename.
Try specifying the full path starting with s3:// and see whether that helps.
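A minimal sketch of that suggestion: generate the JSONPaths file body and build a COPY statement that refuses anything other than a full s3:// location. The function names and the field names in the usage lines are hypothetical; only the {"jsonpaths": [...]} file layout and the clause order are taken from the COPY example above.

```python
import json


def build_jsonpaths(paths):
    """Serialize a JSONPaths file body of the form {"jsonpaths": [...]}."""
    return json.dumps({"jsonpaths": paths}, indent=2)


def build_copy_sql(table, columns, data_uri, jsonpaths_uri, iam_role):
    """Build a COPY statement, insisting on full S3 URIs for both files."""
    for uri in (data_uri, jsonpaths_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected a full s3:// URI, got {uri!r}")
    return (
        f"copy {table}({','.join(columns)})\n"
        f"from '{data_uri}'\n"
        f"iam_role '{iam_role}'\n"
        f"json '{jsonpaths_uri}';"
    )


# Hypothetical nested fields, one JSONPath expression per target column.
print(build_jsonpaths(["$['id']", "$['user']['name']"]))
```

Passing the bare name 'jsonpathfile.json' from the question as jsonpaths_uri raises ValueError, which surfaces the problem before the statement reaches Redshift.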

How does unloading an empty table from Redshift to S3 behave?

If an empty table is unloaded from Redshift to S3 using the UNLOAD command, does it create an empty file on S3, or does it do nothing?
A few days ago I ran the UNLOAD command and it placed a 0-byte file on S3. But today it does nothing (that is, no file is placed on S3), although Redshift shows the message "UNLOAD completed, 0 record(s) unloaded successfully".
Even using HEADER (to unload with headers) in the UNLOAD options does not produce any file on S3.
UNLOAD ($$ SELECT * FROM <table_name> $$) TO
's3://<bucket_name>/abc/test1'
iam_role '<iam_role>' ADDQUOTES HEADER ALLOWOVERWRITE DELIMITER AS ','
ESCAPE PARALLEL OFF
As per AWS support, they have gone back to the old UNLOAD behavior of creating empty files when there is no data to be unloaded, in versions >= 1.0.10880. Redshift clusters on versions >= 1.0.10880 have the fix, and it is available in all regions.
Looks like the unload functionality changed since yesterday. Empty tables are not generating files while unloading.

How to create a file on the application server with ABAP programming

I have a file in my D: drive of my computer and I want to copy this file to an SAP application server so that I am able to see my file with transaction AL11.
I know that I can create a file with AL11, but I want to do this in ABAP.
In my search I found this code, but I cannot solve my problem with it.
data: unixcom like rlgrap-filename.
data: begin of tabl occurs 500,
line(400),
end of tabl.
dir =
unixcom = 'mkdir mydir'. "command to create dir
"to execute the unix command
call 'SYSTEM' id 'COMMAND' field unixcom
id 'TAB' field tabl[].
To upload the file to the application server, there are three steps to follow. To open the file for writing, use the statement below:
Step 1: OPEN DATASET file name FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
To write to the application server, use:
Step 2: TRANSFER name TO file name.
Don't forget to close the file once it has been transferred.
Step 3: CLOSE DATASET file name.
Please mark this as the correct answer if it helps! :)
If you want to do this using ABAP you could create a small report that uses the function module GUI_UPLOAD to get the file from your local disk into an internal table and then write it to the application server with something like this:
lv_filename = '\\path\to\al11\directory\file.txt'.
OPEN DATASET lv_filename FOR OUTPUT IN TEXT MODE ENCODING UTF-8.
LOOP AT lt_contents INTO lv_line.
TRANSFER lv_line TO lv_filename.
ENDLOOP.
CLOSE DATASET lv_filename.
I used transaction CG3Z, and with it I was able to copy a file to the application server directory.

BIDS Import from changing file name [wildcard?]

I'm attempting to create a process to import data. I created the entire process and it works, but I'm having trouble creating the variable that automatically finds the file name of the CSV I want to import. Each time a new CSV is uploaded to me, it has a timestamp in its name. I want to be able to grab that file no matter what the name is and do work on it.
So for example this week the file name would be
filename_4-14-2014.csv
And next week
filename_4_21_2014.csv
And so on into eternity. . .
Is there a way to create a variable that picks up the full file name even though it's changing?
After doing some poking around, I've discovered the following...
You can use a file system task to perform the copy operation I was referring to. You can set the input file and the output file as variables. This way you can always know that the file you use for import is always named the same, and has the right data.
You just need to add the variables and a File System Task to your package.
OK, so to accomplish what I wanted I created a Foreach Loop Container. I had it look for any files ending with .csv in my specified folder by using a wildcard (*.csv).
The Foreach Loop Container contains the following steps:
Step 1: File System Task - rename file.
Step 2: Data Flow Task - Import data to sql
Step 3: File System Task - Copy the file to another folder, append datetime to filename
Step 4: File System Task - Delete source file.
I used variables to get all the file and folder names plus datetimes.
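Outside of the SSIS designer, the same loop logic can be sketched in Python to show what the wildcard and the steps amount to. This is not SSIS code: the function process_csv_files and the import_file callback are hypothetical stand-ins for the container and the Data Flow Task, the timestamp format is an assumption, and the rename in step 1 is omitted because the loop already works on whatever name it finds.

```python
import shutil
from datetime import datetime
from pathlib import Path


def process_csv_files(source_dir, archive_dir, import_file, now=None):
    """Import every *.csv in source_dir, archive a timestamped copy, delete the source."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    archived = []
    # The wildcard picks up the file whatever date is in its name.
    for path in sorted(Path(source_dir).glob("*.csv")):
        import_file(path)  # step 2: load the data
        target = archive / f"{path.stem}_{stamp}{path.suffix}"
        shutil.copy2(path, target)  # step 3: copy with datetime appended
        path.unlink()  # step 4: delete the source file
        archived.append(target)
    return archived
```

A call such as process_csv_files("incoming", "processed", load_into_sql) would handle filename_4-14-2014.csv this week and next week's file without any change, where load_into_sql is whatever import routine is already in place.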