Is it possible to create a variable in Pig?

Can I create variables in Pig and concatenate them when one of them is dynamic, such as the current time?
I need to create a file name based on the current time.
%declare FILE_PREFIX file;
%declare FILE_POSTFIX `date +%Y-%m-%d-%s`;
Can I do something like:
file_name = '$FILE_PREFIX$FILE_POSTFIX';

In my experience, the following works.
Pass the file-name prefix and the date to the Pig script as command-line parameters:
pig -f myscript.pig --param file="india_" --param nw=$(date +"%Y-%m-%d-%s")
In the Pig script:
%declare FILE_PREFIX '$file$nw';
A = load '/user/root/$FILE_PREFIX' USING PigStorage(',') as (id1, name1);
dump A;
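A minimal shell sketch of the same idea, assuming myscript.pig reads the parameters $file and $nw as in the answer above. It prints the command instead of running it, so the expanded values can be checked first. The single-dash -param spelling is the one listed in Pig's help output; prefix, stamp and pig_cmd are illustrative names only.

```shell
# Build the dynamic part of the file name in the shell.
prefix="india_"
# %s (seconds since the epoch) is an extension supported by GNU and BSD date.
stamp=$(date +"%Y-%m-%d-%s")

# Assemble the Pig invocation as a string and print it (nothing is executed).
pig_cmd="pig -f myscript.pig -param file=$prefix -param nw=$stamp"
echo "$pig_cmd"
```

Once the printed command looks right, it can be run directly. Inside the script, '$file$nw' then expands to something like india_ followed by the date and epoch seconds.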

Related

Error loading pig script

I am having difficulty loading data with an Apache Pig script.
cat data15.txt
1,(2,3)
2,(3,4)
grunt>a = load 'nikhil/data15.txt' using PigStorage(',') as (x:int, y:tuple(y1:int,y2:int));
grunt>dump a;
(1,)
(2,)
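I know it's late to answer this, but:
The problem is that the fields inside the tuple and the top-level fields use the same delimiter, ','. PigStorage splits each line on every comma, so the second field arrives as '(2' instead of '(2,3)' and Pig cannot convert it to the tuple schema. That is why y comes back empty.
You can try something like this: change the top-level delimiter in the input file, for example to ':'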
1:(5,7,7)
3:(7,9,4)
5:(5,9,7)
and run the Pig script as
A = load 'file.txt' using PigStorage(':') as (t1:int,t2:tuple(x:int,y:int,z:int));
dump A;
The output is
(1,(5,7,7))
(3,(7,9,4))
(5,(5,9,7))
You can change the delimiter in the input file with the sed command and then load the file.
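A small sketch of that sed step, using the two sample rows from the question. It assumes the top-level delimiter is always the first comma on the line; data15_colon.txt is a hypothetical output name and the temporary directory is only there to keep the example self-contained.

```shell
workdir=$(mktemp -d)
printf '1,(2,3)\n2,(3,4)\n' > "$workdir/data15.txt"

# Without the g flag, sed's s command replaces only the first match per line,
# so the commas inside the tuple are left untouched.
sed 's/,/:/' "$workdir/data15.txt" > "$workdir/data15_colon.txt"

cat "$workdir/data15_colon.txt"
# prints:
# 1:(2,3)
# 2:(3,4)
```

The converted file can then be loaded with PigStorage(':') and the original schema (x:int, y:tuple(y1:int,y2:int)).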

Passing parameters in Pig

A = load '$path' using PigStorage('$Delimiter') as ($table_schema);
I want to pass these parameters to the pig command dynamically.
Can anyone help me by showing an example?
Try this:
test.cfg
path=/input/file/path
delimiter=,
table_schema=requiredschema:chararray
N.B. Replace the values above with valid ones before a test run.
test.pig
A = load '$path' using PigStorage('$delimiter') as ($table_schema);
DUMP A;
Invocation:
pig -f test.pig -m test.cfg
-f : specifies the Pig script file
-m : specifies the parameter file (test.cfg here)
Ref: "Error getting when passing parameter through pig script", for a similar use case.
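As a rough sketch, the parameter file can also be generated from the shell, and the same values can be given inline with -param instead of -m. The values are the placeholders from the answer, not real paths, and both commands are only printed here, not executed.

```shell
workdir=$(mktemp -d)

# Write the parameter file; the quoted 'EOF' stops the shell from expanding anything.
cat > "$workdir/test.cfg" <<'EOF'
path=/input/file/path
delimiter=,
table_schema=requiredschema:chararray
EOF

# Option 1: read every parameter from the file.
cmd_file="pig -m $workdir/test.cfg -f test.pig"
# Option 2: pass the same values one by one on the command line.
cmd_inline="pig -param path=/input/file/path -param delimiter=, -param table_schema=requiredschema:chararray -f test.pig"

echo "$cmd_file"
echo "$cmd_inline"
```

If your Pig version supports the -dryrun option, adding it should write out the script with the parameters substituted without running it, which is a convenient way to check the expansion.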

How to use apache pig filter to find '.PDF'

I have a file /pigmix.txt in HDFS which contains a list of files in different formats such as .PDF, .DOC, .PPT, etc. I want to keep only the .PDF entries. How can I use the Apache Pig FILTER operator for this?
Can you try the filter command below?
input:
file1.txt
file2.PDF
file3.doc
file4.ppt
file5.pdf
PigScript:
A = LOAD 'input' USING PigStorage() AS (filename:chararray);
B = FILTER A BY filename matches '.*\\.(pdf|PDF)$';
DUMP B;
Output:
(file2.PDF)
(file5.pdf)
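To sanity-check the pattern locally before running Pig, the same extension test can be tried with grep -E on the sample input. This is only an approximation: Pig's matches has to match the whole string, which is why the Pig pattern starts with .*, while grep searches within the line, so the leading .* is dropped here.

```shell
workdir=$(mktemp -d)
printf 'file1.txt\nfile2.PDF\nfile3.doc\nfile4.ppt\nfile5.pdf\n' > "$workdir/input"

# Keep only names ending in .pdf or .PDF.
matched=$(grep -E '\.(pdf|PDF)$' "$workdir/input")
echo "$matched"
# prints:
# file2.PDF
# file5.pdf
```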

Run OS command and set output to Hive variable

Is it possible to run something like this in Hive CLI?
I am trying to pass file contents as a variable to another query.
set column_list=!cat /home/user/filename.lst ;
create table tabname as select $column_list from ...
If you have a query file, you can pass the variables with -hiveconf:
hive -hiveconf var1=abcd -f file.txt
Or you can construct your query and then pass it to the Hive CLI using -e:
hive -e "create table ..."
Suppose the file filename.lst contains a single line:
line
Create a file test.sh:
temp=$(cat /home/user/filename.lst)
hive -f test.hql -hiveconf var="$temp"
Create another file, test.hql:
create table test(${hiveconf:var} string);
On the terminal, run:
sh -x test.sh
This passes the line to test.hql, which creates a table with line as its column.
Note: all files should be in the same directory. This script passes only one variable.
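For the original question (a list of columns kept in a file), one possible sketch is to join the lines into a comma-separated string in the shell and pass that as a single hiveconf value. The column names and query.hql are hypothetical, and the hive command is printed, not run.

```shell
workdir=$(mktemp -d)
# Hypothetical column list, one name per line.
printf 'col_a\ncol_b\ncol_c\n' > "$workdir/filename.lst"

# paste -s joins all lines of the file into one, -d ',' sets the separator.
column_list=$(paste -s -d ',' "$workdir/filename.lst")
echo "$column_list"
# prints: col_a,col_b,col_c

# Inside query.hql the value would be referenced as ${hiveconf:column_list}.
echo "hive -hiveconf column_list=$column_list -f query.hql"
```

Because the joined list contains no spaces, it reaches Hive as one argument. If the values could contain spaces, quote the whole name=value pair when running the real command.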

How can I pass command-line parameters with whitespace to an apache pig script?

I want to write a pig script that takes a filter condition as a command line parameter. From the command line I want to type something like:
pig -p "MY_FILTER=field1 == 0 and field2 == 5" myscript.pig
In my script I have a line:
my_filtered_data = filter my_data by $MY_FILTER;
This works as expected when MY_FILTER has no spaces and I pass quotes around my value. So if I type MY_FILTER=\"field1==0\" at the command line, the shell passes the quotes with the value and Pig does the expansion I want. However, the parameter fails to expand if I supply it as MY_FILTER=\"field1 == 0\".
I've tried a bunch of different quoting techniques and even tried running the command directly from Python's subprocess module to ensure my shell wasn't doing something weird.
Which version of Pig are you using? I use 0.9.2, and the following command works for me:
pig -p "F='field1 == 3 AND field2 == 5'" test.pig
But it doesn't work with 0.8.1.
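To rule out the shell as the culprit, it can help to check how many arguments a given quoting style actually produces. The sketch below uses two throwaway helper functions (count_args and show_second, names made up for this example) in place of pig.

```shell
# Print how many arguments the shell passed.
count_args() { echo "$#"; }
# Print the second argument exactly as received.
show_second() { printf '%s\n' "$2"; }

# Outer double quotes keep the whole parameter, inner single quotes included,
# as one argument.
count_args -p "F='field1 == 3 AND field2 == 5'" test.pig    # prints 3
show_second -p "F='field1 == 3 AND field2 == 5'" test.pig   # prints F='field1 == 3 AND field2 == 5'

# Without quotes the shell splits on spaces and pig would see extra arguments.
count_args -p F=field1 == 3 test.pig                        # prints 5
```

If the count is 3 and the value still fails to expand, the problem is in how that Pig version parses the parameter value and not in the shell, which is consistent with the 0.8.1 versus 0.9.2 difference reported above.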