talend tsystem multiline commands - amazon-s3

I have a command to move files between S3 folders. I am getting the bucket name from a context variable.
I placed the command in the array line:
"aws s3 mv s3://"+context.bucket+"/Egress/test1/abc/xyz.dat s3://"+context.bucket+"/Egress/test1/abc/archive/archive_xyz.dat"
The command fetches the bucket name from the context variable, but it fails with a "No such file or directory" error (error=2).
I think it is due to the double quotes (") at the beginning and end.
Is there any way to solve this? Please help.

You probably want to use an array command: /bin/bash (or cmd on Windows) as the first element, then your command.
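A minimal sketch of how the array command might look in tSystem with the "Use array command" option enabled; the "-c" element and the exact layout are assumptions about the component setup, not something stated in the original answer:
"/bin/bash"
"-c"
"aws s3 mv s3://"+context.bucket+"/Egress/test1/abc/xyz.dat s3://"+context.bucket+"/Egress/test1/abc/archive/archive_xyz.dat"
Each row is one array element, so the whole aws command is handed to the shell as a single argument, which usually avoids the quoting problem behind the error=2.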

Related

Storing a large entry in Redis using the CLI?

I am trying to store a value of 5000 characters in Redis via the CLI.
My command is SET MY_KEY "copy pasted the value"
But the whole value is not getting pasted in the CLI.
Is there any alternative to this?
I have Redis version 3.0.54.
Yes, here is one which works with most modern shells: create a text file with your command(s) and use input redirection.
For example, create a file named commands.txt with your Redis commands:
SET MY_KEY "copy pasted the value"
And pass it to the CLI through input redirection (Bash here, but the syntax is similar if not equal in most modern shells):
redis-cli < commands.txt
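If it is only a single key, redis-cli can also read the last argument from standard input with its -x option; this flag has been part of redis-cli for a long time, so it should also work with 3.0.x. Here value.txt is just a placeholder for a file containing only the raw value (no SET command in it):
redis-cli -x SET MY_KEY < value.txt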

How to hide the blob URL and SAS token in azcopy in the command prompt

I am trying to copy files from AWS S3 to Azure Blob storage. To do that I have used azcopy.
I have downloaded azcopy on my Windows machine and run it via the command prompt.
As needed, I have set my AWS secret key and access key ID in the azcopy environment variables.
Now I can move the files from S3 to blob.
But each time I have to give the SAS token in the command. Is it possible to hide it, or to pass it via a file?
cp "https://s3.amazonaws.com/testblobutopud/" "SAS token" --recursive
I do not want to pass this token directly in the command. Can someone help me with this?
Updated:
Please set your SAS token in an environment variable, for example named bloburl_sastoken, and wrap its value in double quotes (the value looks like this: https://blobtestutopus.blob.core.windows.net/?sv=2019-10-10&ss=bfqt&srt=sco&sp=rwdlacupx&se=2020-07-17T10:09:48Z&st=2020-07-17T02:09:48Z&spr=https&sig=xxxxxxxxx).
Then in the command prompt, use the code below:
REM helper variable containing just a pair of double quotes
set temp_chars=""
REM prepend it to the quoted value stored in bloburl_sastoken
set temp_url=%temp_chars%%bloburl_sastoken%
REM remove the doubled quotes ("") so a single pair of quotes remains around the URL + SAS token
set url_sastoken=%temp_url:""=%
azcopy cp "https://s3.amazonaws.com/testblobutopud/" %url_sastoken%
Original answer:
You can store the SAS token in an environment variable. The steps are as below:
Step 1: Set an environment variable; you can use a command or the UI to do this. Here, I just set it via the UI. Note that you should wrap its value in plain (English) double quotes.
Step 2: Then open a cmd prompt and use the code below:
REM set a variable for the blob url
set url="https://yy1.blob.core.windows.net/test5/myfoo1.txt"
REM set a variable which contains the blob url and the sastoken
set temp_url=%url%%mysastoken%
REM remove the redundant double quotes in the blob url and sastoken
set url_sastoken=%temp_url:""=%
REM copy the file
azcopy copy "d:\foo.txt" %url_sastoken%
Another way is to use the azcopy login command; then you don't need a SAS token here.
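A rough sketch of that alternative, assuming the identity you sign in with has been granted a data role such as Storage Blob Data Contributor on the target storage account:
azcopy login
azcopy copy "d:\foo.txt" "https://yy1.blob.core.windows.net/test5/myfoo1.txt"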

Create a csv file of a view in hive and put it in s3 with headers excluding the table names

I have a view in hive named prod_schoool_kolkata. I used to get the csv as:
hive -e 'set hive.cli.print.header=true; select * from prod_schoool_kolkata' | sed 's/[\t]/,/g' > /home/data/prod_schoool_kolkata.csv
That was on an EC2 instance. I want the path to be in S3.
I tried giving the path like:
hive -e 'set hive.cli.print.header=true; select * from prod_schoool_kolkata' | sed 's/[\t]/,/g' > s3://data/prod_schoool_kolkata.csv
But the CSV is not getting stored.
I also have a problem that the CSV file is getting generated but every column header has a pattern like tablename.columnname, for example prod_schoool_kolkata.id. Is there any way to remove the table names in the CSV that gets formed?
You have to first install the AWS Command Line Interface.
Refer to the link Installing the AWS Command Line Interface and follow the relevant installation instructions, or go to the sections at the bottom to get the installation links relevant to your operating system (Linux/Mac/Windows etc.).
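For example, a quick way to check that it is installed and that your credentials can reach the bucket (the bucket name here just mirrors the path from the question):
aws --version
aws s3 ls s3://data/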
After verifying that it's installed properly, you may run normal commands like cp, ls etc. over the AWS file system. So, you could do:
hive -e 'set hive.cli.print.header=true; select * from prod_schoool_kolkata'|
sed 's/[\t]/,/g' > /home/data/prod_schoool_kolkata.csv
aws s3 cp /home/data/prod_schoool_kolkata.csv s3://data/prod_schoool_kolkata.csv
Also see How to use the S3 command-line tool

Using a variable in a batch script for SQLLDR

Hi, I am trying to set a variable in my .bat file and I want to use the variable in the SQL*Loader control file to specify the input file (INFILE).
This is what I have in my .bat file for the variable:
set directroy_name= D:\Folder\Folder\Folder\File.csv
Then in my control file I have:
load data
infile '%directory_name%'
Whenever I try to run the .bat file from the command prompt I just receive: SQL*Loader-500: unable to open file (%directory_name%.dat)
I know the file is in the correct location.
Any ideas why it's doing this?
No, you can't do that - you're expecting the Oracle executable to understand Windows environment variable syntax. If it did that it would have to deal with $ variables in Unix, etc.
You can just pass the file name on the command line instead. In your control file omit the INFILE altogether, then when you call SQL*Loader add a DATA command-line argument:
sqlldr user/password CONTROL=your.ctl DATA=%directory_name% ...
Assuming your variable is just oddly named and does have a full file path as you've shown.
If it's present, the INFILE argument will be overridden by the command-line argument, so you could include a default fixed value if you wanted to, I suppose.
You also appear to have a typo; you set directroy_name, but then use directory_name, which will have no value. You need to change that to:
set directory_name= D:\Folder\Folder\Folder\File.csv
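Putting the pieces together, the .bat file could then look something like this (user/password and your.ctl are the same placeholders used above):
REM note: no space after the = sign, so the path does not pick up a leading blank
set directory_name=D:\Folder\Folder\Folder\File.csv
sqlldr user/password CONTROL=your.ctl DATA=%directory_name%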

How to force STORE (overwrite) to HDFS in Pig?

When developing Pig scripts that use the STORE command I have to delete the output directory for every run or the script stops and reports:
2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: Output Location Validation Failed for: 'hdfs://[server]/user/[user]/foo/bar More info to follow:
Output directory hdfs://[server]/user/[user]/foo/bar already exists
So I'm searching for an in-Pig solution to automatically remove the directory, also one that doesn't choke if the directory is non-existent at call time.
In the Pig Latin Reference I found the shell command invoker fs. Unfortunately the Pig script breaks whenever anything produces an error. So I can't use
fs -rmr foo/bar
(i.e. remove recursively) since it breaks if the directory doesn't exist. For a moment I thought I might use
fs -test -e foo/bar
which is a test and shouldn't break, or so I thought. However, Pig again interprets test's return code on a non-existing directory as a failure code and breaks.
There is a JIRA ticket for the Pig project addressing my problem and suggesting an optional parameter OVERWRITE or FORCE_WRITE for the STORE command. Anyway, I'm using Pig 0.8.1 out of necessity and there is no such parameter.
At last I found a solution on grokbase. Since finding the solution took too long I will reproduce it here and add to it.
Suppose you want to store your output using the statement
STORE Relation INTO 'foo/bar';
Then, in order to delete the directory, you can call at the start of the script
rmf foo/bar
No ";" or quotations required since it is a shell command.
I cannot reproduce it now but at some point in time I got an error message (something about missing files) where I can only assume that rmf interfered with map/reduce. So I recommend putting the call before any relation declaration. After SETs, REGISTERs and defaults should be fine.
Example:
SET mapred.fairscheduler.pool 'inhouse';
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
%default name 'foobar'
rmf foo/bar
Rel = LOAD 'something.tsv';
STORE Rel INTO 'foo/bar';
Once you use the fs command, there are a lot of ways to do this. For an individual file, I wound up adding this to the beginning of my scripts:
-- Delete a file (won't work for the output, which will be a directory,
-- but will work for a file that gets copied or moved during the script.)
fs -touchz top_100
rm top_100
For a directory
-- Delete dir
fs -rm -r out