Passing Parameter in pig - apache-pig

A = load '$path' using PigStorage('$Delimiter') as ($table_schema);
I want to pass these parameters to the pig command dynamically.
Can anyone help me with this by showing an example?

Try this :
test.cfg
path=/input/file/path
delimiter=,
table_schema=requiredschema:chararray
N.B. Replace the values above with valid ones before running the test.
test.pig
A = load '$path' using PigStorage('$delimiter') as ($table_schema);
DUMP A;
Invocation :
pig -f test.pig -m test.cfg
-f : specifies the Pig script file
-m : specifies the parameter file (short for -param_file)
Ref: see "Error getting when passing parameter through pig script" for a similar use case.
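As an alternative to the parameter file, the same values can be passed one at a time with -param. This is a minimal sketch that reuses the sample values from test.cfg. The pig_args array is only an illustrative name, and the script prints the command instead of running it:

```shell
#!/bin/bash
# Same keys and sample values as test.cfg, passed individually with -param.
# pig_args is a hypothetical helper array, not something Pig requires.
pig_args=(
    -param "path=/input/file/path"
    -param "delimiter=,"
    -param "table_schema=requiredschema:chararray"
    -f test.pig
)

# Print the command for inspection.
# To actually run it: pig "${pig_args[@]}"
echo "pig ${pig_args[*]}"
```

Quoting each name=value pair keeps it as a single argument, which matters once a value contains spaces or shell metacharacters.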

Related

Error loading pig script

I am having difficulty loading data using an Apache Pig script.
cat data15.txt
1,(2,3)
2,(3,4)
grunt>a = load 'nikhil/data15.txt' using PigStorage(',') as (x:int, y:tuple(y1:int,y2:int));
grunt>dump a;
(1,)
(2,)
I know it's too late to answer this.
The problem is that the fields inside the tuple and the top-level fields use the same delimiter, ','. PigStorage splits each line on every comma, so the tuple is broken apart and Pig fails to do the schema conversion.
You need to change the field delimiter. You can try something like this:
1:(5,7,7)
3:(7,9,4)
5:(5,9,7)
and run the Pig script as:
A = load 'file.txt' using PigStorage(':') as (t1:int,t2:tuple(x:int,y:int,z:int));
dump A;
The output is:
(1,(5,7,7))
(3,(7,9,4))
(5,(5,9,7))
You can change the delimiter in the input file with the sed command and then load the file.
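A minimal sketch of that sed step, using the two sample lines from the question. The temporary directory and the output file name data15_colon.txt are assumptions made for the example:

```shell
#!/bin/bash
# Recreate the sample input from the question in a scratch directory.
work_dir=$(mktemp -d)
printf '1,(2,3)\n2,(3,4)\n' > "$work_dir/data15.txt"

# Without the g flag, sed replaces only the FIRST comma on each line,
# so the commas inside the tuple are left untouched.
sed 's/,/:/' "$work_dir/data15.txt" > "$work_dir/data15_colon.txt"

cat "$work_dir/data15_colon.txt"
# 1:(2,3)
# 2:(3,4)
```

The converted file can then be loaded with PigStorage(':') and the schema (x:int, y:tuple(y1:int,y2:int)).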

Is it possible to create a variable in PIG

Can I create variables in Pig and concatenate them when one of them is dynamic, such as the current time?
I need to create a file name based on the current time.
%declare FILE_PREFIX file;
%declare FILE_POSTFIX date +%Y-%m-%d-%s;
Can I do something like:
file_name = '$FILE_PREFIX$FILE_POSTFIX';
In my experience, the following works.
Pass the file name prefix and the date to the Pig script as command-line parameters:
pig -f myscript.pig --param file="india_" --param nw=$(date +"%Y-%m-%d-%s")
In the Pig script:
%declare FILE_PREFIX '$file$nw';
A = load '/user/root/$FILE_PREFIX' USING PigStorage(',') as (id1, name1);
dump A;
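Another option is to do the concatenation in the shell and hand Pig a single parameter. This is only a sketch: the file_name parameter is a hypothetical name, and the Pig script would have to reference it as $file_name.

```shell
#!/bin/bash
# Build the complete name in the shell, then pass it as one parameter.
prefix="india_"
stamp=$(date +"%Y-%m-%d-%s")   # %s = seconds since the epoch
file_name="${prefix}${stamp}"

echo "$file_name"

# Hypothetical invocation (file_name is an assumed parameter name):
# pig -f myscript.pig -param file_name="$file_name"
```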

dynamically fetching dynamic variable's value from properties file

The Unix commands below work:
export myTempVar=myTempVar1
export myTempVar1=myTempVar2
eval echo '$'$myTempVar
This correctly prints myTempVar2.
However, what if myTempVar1=myTempVar2 is in a properties file instead of directly in the script?
Then my script will have:
. $MYDIR/myProperties.properties
myTempVar=myTempVar1
myTempVar3=eval echo '$'$myTempVar
The lines above do not work: the value of myTempVar3 does not come out as myTempVar2.
myProperties.properties contains the line below:
myTempVar1=myTempVar2
Using indirection is far safer than eval:
#!/bin/bash
. $MYDIR/myProperties.properties # myTempVar1=myTempVar2
myTempVar=myTempVar1
myTempVar3=${!myTempVar}
echo $myTempVar3
Gives:
myTempVar2
And if you do use eval, you don't need the echo:
eval myTempVar3='$'$myTempVar
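The line myTempVar3=eval echo '$'$myTempVar fails because the shell reads myTempVar3=eval as a temporary environment assignment for the echo command. Nothing is evaluated, and myTempVar3 is never set in the script. Here is a self-contained sketch of both working forms. It uses a temporary file in place of $MYDIR/myProperties.properties, and via_eval is just an illustrative variable name:

```shell
#!/bin/bash
# Stand-in for $MYDIR/myProperties.properties.
props_file=$(mktemp)
echo 'myTempVar1=myTempVar2' > "$props_file"

# Source the properties so myTempVar1 is defined in this shell.
. "$props_file"

myTempVar=myTempVar1

# Indirect expansion: the value of the variable whose name is in myTempVar.
myTempVar3=${!myTempVar}
echo "$myTempVar3"

# The eval form from the question, wrapped in command substitution
# so that its output is captured.
via_eval=$(eval echo '$'"$myTempVar")
echo "$via_eval"

rm -f "$props_file"
```

Both echo commands print myTempVar2. The ${!name} form is a bash feature, so the script needs to run under bash rather than plain sh.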

How can I pass command-line parameters with whitespace to an apache pig script?

I want to write a pig script that takes a filter condition as a command line parameter. From the command line I want to type something like:
pig -p "MY_FILTER=field1 == 0 and field2 == 5" myscript.pig
In my script I have a line:
my_filtered_data = filter my_data by $MY_FILTER;
This works as expected when MY_FILTER has no spaces and I pass quotes around my value. So if I type MY_FILTER=\"field1==0\" at the command line, the shell will pass the quotes with the value and Pig does the expansion I want. However, the parameter will fail to expand if I supply it like MY_FILTER=\"field1 == 0\".
I've tried a bunch of different quoting techniques and even tried running the command directly from python's subprocess module to ensure my shell wasn't doing something weird.
Which version of Pig do you use? I use 0.9.2 and the following command works for me:
pig -p "F='field1 == 3 AND field2 == 5'" test.pig
But it doesn't work with 0.8.1.
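To check what the shell actually passes to the pig launcher, the arguments can be printed one per line. The show_args function below is a hypothetical helper for this check. It says nothing about how a given Pig version treats the value afterwards:

```shell
#!/bin/bash
# Print each argument on its own line, wrapped in brackets,
# so that embedded spaces and quotes are visible.
show_args() {
    local i=1
    for arg in "$@"; do
        printf 'arg %d: [%s]\n' "$i" "$arg"
        i=$((i + 1))
    done
}

show_args -p "F='field1 == 3 AND field2 == 5'" test.pig
# arg 1: [-p]
# arg 2: [F='field1 == 3 AND field2 == 5']
# arg 3: [test.pig]
```

The outer double quotes keep the whole name=value pair as one argument, and the inner single quotes reach the program as part of the value.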

Error executing shell command in pig script

I have a pig script where in the beginning I would like to generate a string of the dates of the past 7 days from a certain date (later used to retrieve log files for those days).
I attempt to do this with this line:
%declare CMD7 input= ; for i in {1..6}; do d=$(date -d "$DATE -i days" "+%Y-%m-%d"); input="\$input\$d,"; done; echo \$input
I get an error:
" ERROR 2999: Unexpected internal error. Error executing shell command: input= ; for i in {1..6}; do d=$(date -d "2012-07-10 -i days" "+%Y-%m-%d"); input="$input$d,"; done;. Command exit with exit code of 127"
However, the shell command runs perfectly fine outside of Pig. I am really not sure what is going wrong here.
Thank you!
I have a working solution, though it is not as streamlined as you want. Essentially, I could not get Pig to execute a complex shell statement in the %declare.
I first wrote a shell script (let's call it 6-days-back-from.sh):
#!/bin/bash
DATE=$1
for i in {1..6}; do d=$( date -d "$DATE -$i days" +%F ) ; echo -n "$d "; done
Then a Pig script as follows (let's call it days.pig):
%declare my_date `./6-days-back-from.sh $DATE`
A = LOAD 'dual' USING PigStorage();
B = FOREACH A GENERATE '$my_date';
DUMP B;
Note that dual is a directory containing a text file with a single line of text, used only to display our variable.
I called the script as follows:
pig -x local -param DATE="2012-08-03" days.pig
and got the following output:
({(2012-08-02),(2012-08-01),(2012-07-31),(2012-07-30),(2012-07-29),(2012-07-28)})
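The question asked for a comma-separated string of dates, so the helper script could be adapted along these lines. This is a sketch that assumes GNU date (for the -d option). The days_back function name and its two arguments are made up for the example:

```shell
#!/bin/bash
# days_back START_DATE COUNT
# Prints the COUNT days before START_DATE, newest first, joined by commas.
days_back() {
    local start=$1 count=$2 out="" i d
    for ((i = 1; i <= count; i++)); do
        # GNU date: relative offset from the given start date.
        d=$(date -d "$start -$i days" +%F)
        # Add a comma separator only when out is already non-empty.
        out+="${out:+,}$d"
    done
    printf '%s\n' "$out"
}

days_back 2012-08-03 6
# 2012-08-02,2012-08-01,2012-07-31,2012-07-30,2012-07-29,2012-07-28
```

Saved as a standalone script, it could be called from %declare in backticks in the same way as 6-days-back-from.sh.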