Error loading pig script - apache-pig

I am having difficulty loading data with an Apache Pig script.
cat data15.txt
1,(2,3)
2,(3,4)
grunt> a = load 'nikhil/data15.txt' using PigStorage(',') as (x:int, y:tuple(y1:int,y2:int));
grunt> dump a;
(1,)
(2,)

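I know it's late to answer this, but the problem is that the tuple and the outer fields share the same delimiter, ','. PigStorage splits each line on every comma before the schema is applied, so (2,3) is cut into '(2' and '3)'; the second field cannot be converted to a tuple and comes out null.
You need to change the field delimiter, for example to ':', so that the input looks like this: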
1:(5,7,7)
3:(7,9,4)
5:(5,9,7)
Then run the Pig script as:
A = load 'file.txt' using PigStorage(':') as (t1:int,t2:tuple(x:int,y:int,z:int));
dump A;
The output is:
(1,(5,7,7))
(3,(7,9,4))
(5,(5,9,7))
You can change the delimiter in the input file with the sed command and then load the file.
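The answer mentions sed but does not show the rewrite. As a minimal sketch of the same idea (written in Python rather than sed, with hypothetical function names), only the first comma on each line is replaced, so the commas inside the tuple stay intact. The sample records are the ones from the question.

```python
def swap_first_delimiter(line: str, old: str = ",", new: str = ":") -> str:
    """Replace only the first `old` on the line, leaving the tuple's commas alone."""
    return line.replace(old, new, 1)


def convert(lines):
    """Apply the rewrite to every record and return the new lines."""
    return [swap_first_delimiter(line) for line in lines]


if __name__ == "__main__":
    for record in convert(["1,(2,3)", "2,(3,4)"]):
        print(record)
```

The sed equivalent would be along the lines of sed 's/,/:/' data15.txt, which also replaces only the first comma on each line. The rewritten file can then be loaded with PigStorage(':').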

Related

How to remove extra entries from the tuple in apache pig

How do I solve the issue of extra commas or entries in the tuple?
ab = load "/path/file1.txt" USING PigStorage(',') AS (id1:chararray, id2:chararray, dt:chararray, qty:int);
Current output:
(F1,S9,12/09/2011,2,,,)
Expected output:
(F1,S9,12/09/2011,2)
Should I change the text in my file.txt, or something else?
Write the path between single quotes ('') in the LOAD statement.
Example:
ab = load '/path/file1.txt' USING PigStorage(',') AS (id1:chararray, id2:chararray, dt:chararray, qty:int);

Load CSV file in PIG

In Pig, what happens when we load a CSV file with a LOAD statement that specifies no schema and uses the default PigStorage delimiter ('\t')? Will the load work, and can we dump the data? Or will it throw an error because the file uses ',' while PigStorage expects '\t'? Please advise.
When you load a CSV file without defining a schema using PigStorage('\t'), there are no tabs in any line of the input file, so each whole line is treated as a single field. You will not be able to access the individual values in the line.
Example:
Input file:
john,smith,nyu,NY
jim,young,osu,OH
robert,cernera,mu,NJ
a = LOAD 'input' USING PigStorage('\t');
dump a;
OUTPUT:
(john,smith,nyu,NY)
(jim,young,osu,OH)
(robert,cernera,mu,NJ)
b = foreach a generate $0, $1, $2;
dump b;
(john,smith,nyu,NY,,)
(jim,young,osu,OH,,)
(robert,cernera,mu,NJ,,)
Ideally, b should have been:
(john,smith,nyu)
(jim,young,osu)
(robert,cernera,mu)
if the delimiter were a comma. But since the delimiter was a tab and no tab exists in the input records, the whole line was treated as one field. Pig does not complain if a field is null; it just outputs nothing for it. Hence you see only the extra commas when you dump b.
Hope that was useful.
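To make the behaviour above concrete, here is a small Python sketch that imitates splitting a record on a single-character delimiter and then projecting $0, $1, $2. It is only a simulation under simple assumptions, not Pig's actual loader code; split_fields and project are hypothetical helper names, and None stands in for Pig's null.

```python
def split_fields(line: str, delimiter: str) -> list:
    """Split a record on a single-character delimiter, as a delimited loader would."""
    return line.split(delimiter)


def project(fields: list, positions: list) -> tuple:
    """Pick fields by position; positions past the end become None (Pig's null)."""
    return tuple(fields[i] if i < len(fields) else None for i in positions)


line = "john,smith,nyu,NY"

by_tab = split_fields(line, "\t")      # no tab in the line, so one field
by_comma = split_fields(line, ",")     # four fields

print(project(by_tab, [0, 1, 2]))      # ('john,smith,nyu,NY', None, None)
print(project(by_comma, [0, 1, 2]))    # ('john', 'smith', 'nyu')
```

With the tab delimiter, $0 holds the entire line and $1 and $2 are null, which is why the dump of b shows the full record followed by empty positions.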

Issue with Complex data types processing in pig with comma delimited data

I have the data like this:
$ cat samp.txt
Ramesh,[city#Bangalore],123
Arun,[city#Anantapur],345
Pranith,[city#US],456
I have written the following pig query:
A = load 'samp.txt' using PigStorage(',')
as(name:chararray,addr:map[chararray,chararray],empno:int);
When I execute the above code in pig I am getting the following error:
error: mismatched input ',' expecting RIGHT_BRACKET Details at logfile: /home/training/pig_1471586597209.log
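Can anyone help me resolve this error?
Map keys in Pig are always chararray, so a map declaration takes at most the value type: map[] or map[chararray], not map[chararray,chararray]. Declare the field like this:
A = load 'pdemo/samp' using PigStorage(',') as (name:chararray,add:map[],empno:int);
Now it will work.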

apache pig load data with multiple delimiters

Hi everyone, I have a problem loading data with Apache Pig. The file format is like:
"1","2","xx,yy","a,sd","3"
So I want to load it using the multi-character delimiter "," (two double quotes with a comma between them), like:
A = LOAD 'file.csv' USING PigStorage('","') AS (f1,f2,f3,f4,f5);
but PigStorage doesn't accept the multi-character delimiter ",". How can I do it? Thank you very much!
PigStorage takes a single character as its delimiter. You will have to use a loader from PiggyBank instead. Download piggybank.jar, save it in the same folder as your Pig script, and register the jar in the script.
REGISTER piggybank.jar;
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
A = LOAD 'test1.txt' USING CSVLoader() AS (f1:int,f2:int,f3:chararray,f4:chararray,f5:int);
B = FOREACH A GENERATE f1,f2,f3,f4,f5;
DUMP B;
An alternative is to load each record as a single line and then split it with STRSPLIT:
A = LOAD 'test1.txt' USING TextLoader() AS (line:chararray);
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line, '","'));
DUMP B;
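The difference between the two approaches can be illustrated outside Pig. The Python sketch below is only an analogy, not what CSVLoader or STRSPLIT execute internally: a quote-aware CSV parser keeps embedded commas inside their field, while a plain split on the three-character sequence "," leaves the opening quote on the first field and the closing quote on the last one. The record is the sample line from the question.

```python
import csv
import io

record = '"1","2","xx,yy","a,sd","3"'

# A CSV parser honours the quotes, so embedded commas stay inside their field.
parsed = next(csv.reader(io.StringIO(record)))
print(parsed)        # ['1', '2', 'xx,yy', 'a,sd', '3']

# Splitting on the three-character sequence "," keeps the outer quotes.
naive = record.split('","')
print(naive)         # ['"1', '2', 'xx,yy', 'a,sd', '3"']

# Stripping the leading and trailing quote first gives the same fields as the parser.
cleaned = record.strip('"').split('","')
print(cleaned)       # ['1', '2', 'xx,yy', 'a,sd', '3']
```

If the split-based route is used, the stray quotes on the first and last fields would presumably need to be removed in a further step before casting them to int.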

Is it possible to create a variable in PIG

Can I create a variable in Pig and concatenate variables when one of them is dynamic, like the current time?
I need a file name to be created based on the current time.
%declare FILE_PREFIX file;
%declare FILE_POSTFIX `date +%Y-%m-%d-%s`;
Can I do something like:
file_name = '$FILE_PREFIX$FILE_POSTFIX';
In my experience, the following works.
Pass the file-name prefix and the date to the Pig script as parameters on the command line:
pig -f myscript.pig -param file="india_" -param nw=$(date +"%Y-%m-%d-%s")
In the Pig script:
%declare FILE_PREFIX '$file$nw';
A = load '/user/root/$FILE_PREFIX' USING PigStorage(',') as (id1, name1);
dump A;
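If the script is launched from a wrapper program rather than typed at a shell, the same name can be assembled before Pig is called. The Python sketch below is one way to do that under stated assumptions: build_file_name is a hypothetical helper, the prefix "india_" comes from the answer above, and the timestamp is taken in UTC, whereas the date command uses local time.

```python
from datetime import datetime, timezone


def build_file_name(prefix: str, now: datetime) -> str:
    """Return prefix + 'YYYY-MM-DD-<epoch seconds>', mirroring date +%Y-%m-%d-%s."""
    return f"{prefix}{now:%Y-%m-%d}-{int(now.timestamp())}"


# A fixed, timezone-aware instant keeps the example reproducible.
fixed = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
print(build_file_name("india_", fixed))

# In real use the current time would be passed instead.
print(build_file_name("india_", datetime.now(timezone.utc)))
```

The resulting string can then be handed to the script as a single parameter value, in the same way the nw parameter is passed on the command line above.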