The following Pig code is not working:
grunt> Register /usr/lib/pig/lib/piggybank.jar ;
grunt> define Stitch org.apache.pig.piggybank.evaluation.Stitch();
grunt> data = load 'a' using PigStorage('|') ;
grunt> B = Stitch(data,data);
Error:
2015-01-06 12:03:57,730 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 1200: <line 12> Cannot expand macro 'Stitch'.
Reason: Macro must be defined before expansion.
Details at logfile: /home/hduser/nikhil/pig_1420524859398.log
Can someone explain what's going wrong here?
There are two issues in your code:
1. You can't assign the output of Stitch directly to a relation. Pig parses "B = Stitch(data,data);" as a macro invocation, which is why the error says "Cannot expand macro 'Stitch'". A UDF such as Stitch has to be called inside a FOREACH ... GENERATE statement.
2. Stitch takes only bags as input parameters, but you are passing entire relations.
Fix these two issues and retry your script.
Example:
Input (the two bags are tab-separated, since PigStorage() splits on tabs by default):
{(a,b),(e,f)} {(c,d),(g,h)}
Pig script:
grunt> REGISTER /tmp/piggybank.jar;
grunt> DEFINE MyStitch org.apache.pig.piggybank.evaluation.Stitch;
grunt> A = LOAD 'input' USING PigStorage() AS (B1:{T:(t1:chararray,t2:chararray)},B2:{T1:(t3:chararray,t4:chararray)});
grunt> B = FOREACH A GENERATE MyStitch(B1,B2);
grunt> DUMP B;
Output:
({(a,b,c,d),(e,f,g,h)})
Reference:
http://pig.apache.org/docs/r0.13.0/api/org/apache/pig/piggybank/evaluation/Stitch.html
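The following small Python model shows the row-by-row behaviour of Stitch in the example above: the nth tuple of each bag is concatenated into the nth output tuple. It is only an illustration and not Piggybank's implementation. Bags are modelled as lists of tuples, and the function name "stitch" is hypothetical. The sketch also assumes all input bags have the same length and makes no attempt to reproduce what Stitch does with bags of unequal size.

```python
from typing import List, Tuple

Bag = List[Tuple[str, ...]]


def stitch(*bags: Bag) -> Bag:
    """Combine the n-th tuple of every bag into the n-th output tuple.

    Assumption of this sketch: all bags have the same length.
    """
    if len({len(b) for b in bags}) > 1:
        raise ValueError("this sketch only handles bags of equal length")
    # zip walks the bags in lockstep; sum(row, ()) concatenates the tuples.
    return [sum(row, ()) for row in zip(*bags)]


b1 = [("a", "b"), ("e", "f")]
b2 = [("c", "d"), ("g", "h")]
print(stitch(b1, b2))  # [('a', 'b', 'c', 'd'), ('e', 'f', 'g', 'h')]
```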
Related
Getting incorrect output when executing a FOREACH statement
Step 1) Copied a CSV file (without a header) from S3 to HDFS.
Step 2) In HDFS mode, I loaded the same file into an alias in Pig. (DUMP works fine up to this point.)
grunt> rec = Load '/home/Output/' using PigStorage(',') AS (Student:chararray,School:chararray,Year:int,Awards:int);
grunt> dump rec;
Step 3) Then I grouped it and tried to count the number of awards.
grunt> rec2 = FOREACH rec1 GENERATE group AS Country, SUM(rec.Awards) AS Award_count;
When I dump rec2, there is no error, but the output is (,).
The same commands work perfectly in local mode, where I get the desired output.
I am learning Apache Pig and trying to load some data into Pig. When I view the .txt file in the vi editor, I see the following (sample) row:
[ABBOTT,DEEDEE W GRADES 9-12 TEACHER 52,122.10 0 LBOE ATLANTA INDEPENDENT SCHOOL SYSTEM 2010].
I use the following command to load the data into a Pig relation:
A = LOAD 'salaryTravelReport_sample.txt' USING PigStorage() as (name:chararray,
prof:chararray,max_sal:float,travel:float,board:chararray,state:chararray,year:int);
However, when I dump the relation in the distributed environment, I get the following result for the row above:
(ABBOTT,DEEDEE W,GRADES 9-12 TEACHER,,0.0,LBOE,ATLANTA INDEPENDENT SCHOOL SYSTEM,2010).
The numeric value "52,122.10" seems to be missing.
Please help.
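PigStorage() is a built-in load function that takes the field delimiter (not the record delimiter) as its argument. Here it is tab (\t), which is also the default, so the following is equivalent to PigStorage():
A = LOAD 'salaryTravelReport_sample.txt' USING PigStorage('\t') as (name:chararray,
prof:chararray,max_sal:float,travel:float,board:chararray,state:chararray,year:int);
The fields are already being split correctly: the dump shows all seven of them. max_sal is empty because "52,122.10" contains a thousands separator, so the cast to float fails and Pig returns null for that field. Load max_sal as chararray and remove the commas with REPLACE(max_sal, ',', '') before casting it to float.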
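The Python sketch below models, in simplified form, what happens to the sample row. The row is split on tabs, which is the PigStorage default. Each field is then cast to its declared type, and a failed cast yields a null (None here), which matches the empty field in the dump. The row is assumed to be tab-separated, as the dump output suggests. The helper names "cast_float" and "clean_number" are hypothetical.

```python
from typing import List, Optional

# Sample row, assumed to be tab-separated as in the original file.
ROW = ("ABBOTT,DEEDEE W\tGRADES 9-12 TEACHER\t52,122.10\t0\tLBOE\t"
       "ATLANTA INDEPENDENT SCHOOL SYSTEM\t2010")


def cast_float(field: str) -> Optional[float]:
    """Mimic a failed type cast: return None instead of raising."""
    try:
        return float(field)
    except ValueError:
        return None


def clean_number(field: str) -> str:
    """Strip thousands separators so '52,122.10' becomes '52122.10'."""
    return field.replace(",", "")


fields: List[str] = ROW.split("\t")          # split on tab, like PigStorage()
print(cast_float(fields[2]))                 # None -> the empty field in the dump
print(cast_float(clean_number(fields[2])))   # 52122.1
```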
I have CSV data in the following format:
id,name,price,information
12,Pants,50.00,{Clothes & Shoes: 5}
And here is my Pig script:
grunt> sample = LOAD 'data.csv' USING PigStorage (',') AS (id:int, name:chararray, price:double, information:chararray);
The problem is that when I load information as a chararray, I can't access the category or the quantity individually. I tried something like:
information:tuple(category:chararray, quantity:int)
But it didn't work.
What should I do? What is the best way to load information so that I can access both the category and the quantity?
Thanks
What you have is a bag, not a tuple. From the Pig documentation on bags and tuples:
( ) A tuple is enclosed in parentheses ( ).
{ } An inner bag is enclosed in curly brackets { }.
You can load it like this:
sample = LOAD 'data.csv' USING PigStorage (',') AS (id:int, name:chararray, price:double, information:bag{});
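If the goal is simply to reach the category and the quantity, another option is to keep information as a chararray and extract the two parts with a regular expression. Pig's built-in REGEX_EXTRACT_ALL applies a Java regex and returns the captured groups as a tuple. In a Pig string literal the backslashes would need to be doubled. The Python sketch below only checks that a pattern like \{(.+):\s*(\d+)\} captures both parts. It assumes every value has the form {category: quantity}, and the function name "split_information" is hypothetical.

```python
import re
from typing import Optional, Tuple

# For this simple expression, Java and Python regex syntax behave the same.
INFO_RE = re.compile(r"\{(.+):\s*(\d+)\}")


def split_information(info: str) -> Optional[Tuple[str, int]]:
    """Return (category, quantity), or None if the text doesn't fit the pattern."""
    m = INFO_RE.fullmatch(info.strip())
    if m is None:
        return None
    return m.group(1).strip(), int(m.group(2))


print(split_information("{Clothes & Shoes: 5}"))  # ('Clothes & Shoes', 5)
```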
I have data like this:
$ cat samp.txt
Ramesh,[city#Bangalore],123
Arun,[city#Anantapur],345
Pranith,[city#US],456
I have written the following Pig query:
A = load 'samp.txt' using PigStorage(',')
as(name:chararray,addr:map[chararray,chararray],empno:int);
When I execute the above code in Pig, I get the following error:
error: mismatched input ',' expecting RIGHT_BRACKET Details at logfile: /home/training/pig_1471586597209.log
Can anyone help me resolve this error?
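Map keys in Pig are always chararray, so a map schema can declare only the value type, as map[] or map[chararray]. The comma in map[chararray,chararray] is what the parser rejects. Declare the field like this:
A = load 'pdemo/samp' using PigStorage(',') as (name:chararray,add:map[],empno:int);
Now it will work.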
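Pig writes a map constant as [key#value]. The Python sketch below parses that text form into a dict to show the structure the address field holds once it is loaded as a map. "parse_pig_map" is a hypothetical helper. It assumes simple entries with no nesting and no '#' or ',' inside keys or values, which holds for the sample data.

```python
from typing import Dict, Optional


def parse_pig_map(text: str) -> Optional[Dict[str, str]]:
    """Parse '[k1#v1,k2#v2]' into a dict with string keys.

    Assumes no nested maps/bags and no '#' or ',' inside keys or values.
    """
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return None  # not in [key#value] syntax
    body = text[1:-1]
    if not body:
        return {}
    result: Dict[str, str] = {}
    for entry in body.split(","):
        key, sep, value = entry.partition("#")
        if not sep:
            return None  # entry without a '#' separator
        result[key] = value
    return result


for line in ["Ramesh,[city#Bangalore],123", "Arun,[city#Anantapur],345"]:
    name, addr, empno = line.split(",")  # the map holds no commas here
    print(name, parse_pig_map(addr)["city"], int(empno))
```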
I am having difficulty loading data with an Apache Pig script:
$ cat data15.txt
1,(2,3)
2,(3,4)
grunt> a = load 'nikhil/data15.txt' using PigStorage(',') as (x:int, y:tuple(y1:int,y2:int));
grunt> dump a;
(1,)
(2,)
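I know it's late to answer this, but the problem is that the tuple's fields and the record's fields use the same delimiter, ','. PigStorage(',') splits "1,(2,3)" on every comma into "1", "(2" and "3)". The second field is therefore not a valid tuple, Pig cannot convert it, and y comes out null.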
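The splitting step can be modeled in a few lines of Python. This is not Pig code, only an illustration of splitting on every comma without any notion of parentheses. It shows that the field meant to be the tuple ends up as "(2".

```python
line = "1,(2,3)"

# Split on every ',' like a plain delimiter-based loader: parentheses are
# not treated specially.
fields = line.split(",")
print(fields)  # ['1', '(2', '3)']

# The dump (1,) suggests only the first two fields are kept for the schema
# (x:int, y:tuple(...)); '(2' is not a complete tuple, so y cannot be built.
x, y = fields[0], fields[1]
print(x, y)  # 1 (2
```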
You need to change the field delimiter. For example, with ':' separating the fields, the input file looks like this:
1:(5,7,7)
3:(7,9,4)
5:(5,9,7)
Then run the Pig script as:
A = load 'file.txt' using PigStorage(':') as (t1:int,t2:tuple(x:int,y:int,z:int));
dump A;
The output is:
(1,(5,7,7))
(3,(7,9,4))
(5,(5,9,7))
You can change the delimiter in the input file with a sed command and then load the file.
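For the sample file, a command such as sed 's/,/:/' data15.txt > data15_colon.txt replaces only the first comma on each line and leaves the commas inside the tuple alone. The output file name is just an example. The same transformation is sketched in Python below, assuming the first comma on every line is the field separator. The function name "first_comma_to_colon" is hypothetical.

```python
from typing import Iterable, List


def first_comma_to_colon(lines: Iterable[str]) -> List[str]:
    """Replace only the first ',' on each line with ':' (like sed 's/,/:/')."""
    return [line.replace(",", ":", 1) for line in lines]


print(first_comma_to_colon(["1,(2,3)", "2,(3,4)"]))  # ['1:(2,3)', '2:(3,4)']
```

The converted lines can then be loaded with PigStorage(':') and a tuple schema for the second field. To rewrite a real file, read its lines with open() and write the result to a new path.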