pig file load error - apache-pig

I am trying to run this commang over pig env.
grunt> A = LOAD inp;
But I am getting this error in the log files:
Pig Stack Trace:
ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Failed to parse: mismatched input 'inp' expecting QUOTEDSTRING
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:226)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
And in console Iam getting like this:
grunt> A = LOAD inp;
2012-10-26 12:18:34,627 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Details at logfile: /usr/local/hadoop/pig_1351232517175.log
Can any body provide me appropriate solution for this?

The syntax for load has been used wrongly. Check out the correct example provided herewith.
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#LOAD
Suppose we have a data file called myfile.txt. The fields are tab-delimited. The records are newline-separated.
1 2 3
4 2 1
8 3 4
In this example the default load function, PigStorage, loads data from myfile.txt to form relation A. The two LOAD statements are equivalent. Note that, because no schema is specified, the fields are not named and all fields default to type bytearray.
A = LOAD 'myfile.txt';
A = LOAD 'myfile.txt' USING PigStorage('\t');
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
Example from http://pig.apache.org/docs
I believe the error log is self explanatory, it says - expecting QUOTEDSTRING

Please put the file name in single quotes to solve this issue.

Related

how to remove headers using piggybank?

I have a directory which contains 10 files and I want to remove the headers from the files present in directory and while executing using piggybank, I am getting an error. Is there any other way which can remove header from all the files present in a directory.My code is:-
REGISTER /usr/lib/pig/piggybank.jar;
input = LOAD 'insurance_data' using CSVExcelStorage(
',','default','NOCHANGE','SKIP_INPUT_HEADER')
as (population:int, private:int,public:int,uninsecured:int);
dump input;
The error which I am getting is :-
2016-09-13 14:01:48,239 [main] ERROR org.apache.pig.PigServer -
exception during parsing: Error during parsing. mismatched input 'input' expecting EOF Failed to parse:
mismatched input 'input' expecting
EOF at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:241)
at
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688) at
org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421) at
org.apache.pig.PigServer.parseAndBuild(PigServer.java:354) at
org.apache.pig.PigServer.executeBatch(PigServer.java:379) at
org.apache.pig.PigServer.executeBatch(PigServer.java:365) at
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at
org.apache.pig.Main.run(Main.java:613) at
org.apache.pig.Main.main(Main.java:158) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606) at
org.apache.hadoop.util.RunJar.run(RunJar.java:221) at
org.apache.hadoop.util.RunJar.main(RunJar.java:136) 2016-09-13
14:01:48,250 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
1200: mismatched input 'input'
expecting EOF Details at logfile: /home/cloudera/pig_1473800504430.log
'input' is a keyword.Reference here.Change the name of the relation 'input' to something else.
REGISTER /usr/lib/pig/piggybank.jar;
A = LOAD 'insurance_data' USING CSVExcelStorage(',','default','NOCHANGE','SKIP_INPUT_HEADER') as (population:int, private:int,public:int,uninsecured:int);
DUMP A;

Load File delimited by double colon :: in pig

Following is a sample dataset delimited by double colon(::).
1::Toy Story (1995)::Animation|Children's|Comedy
I want to extract three fields from above data set as movieID,title and genre. I have written following code for that
movies = LOAD 'location/of/dataset/on/hdfs '
using PigStorage('::')
as
(MovieID:int,title:chararray,genre:chararray);
But i am getting following error
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file script.pig, line 1, column 9> pig script failed to validate:
java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]'
Use MyRegExloader: You will need piggybank.jar for this.
REGISTER '/path/to/piggybank.jar'
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('([^\\:]+)::([^\\:]+)::([^\\:]+)')
as (movieid:int, title:chararray, genre:chararray);
Output :
(1,Toy Story (1995),Animation|Children's|Comedy)

Unable to extract data with double pipe delimiter in Pig Script

I am trying to extract data which is pipe delimited in Pig. Following is my command
L = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('||');
Iam getting following error
2016-08-04 23:58:21,122 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<line 1, column 4> pig script failed to validate: java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[||]'
My input sample file has exactly 5 lines as following
POS_TIBCO||HDFS||POS_LOG||1||7806||2016-07-18||1||993||0
POS_TIBCO||HDFS||POS_LOG||2||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||3||7806||2016-07-18||1||0||5
POS_TIBCO||HDFS||POS_LOG||4||7806||2016-07-18||1||0||0
POS_TIBCO||HDFS||POS_LOG||5||7806||2016-07-18||1||0||19.99
I tried several options like using the backslash before delimiter(\||,\|\|) but everything failed. Also, I tried with schema but got the same error.I am using Horton works(HDP2.2.4) and pig (0.14.0).
Any help is appreciated. Please let me know if you need any further details.
I have faced this case, and by checking PigStorage code source, i think PigStorage argument should be parsed into only one character.
So we can use this code instead:
L0 = LOAD 'entirepath_in_HDFS/b.txt/part-m*' USING PigStorage('|');
L = FOREACH L0 GENERATE $0,$2,$4,$6,$8,$10,$12,$14,$16;
Its helpful if you know how many column you have, and it will not affect performance because it's map side.
When you load data using PigStorage, It only expects single character as delimiter.
However if still you want to achieve this you can use MyRegExLoader-
REGISTER '/path/to/piggybank.jar'
A = LOAD '/path/to/dataset' USING org.apache.pig.piggybank.storage.MyRegExLoader('||')
as (movieid:int, title:chararray, genre:chararray);

Getting error message Unexpected character ' ' when running LOAD command in PIG

I have a file stored in HDFS at this path: /user/hdfs/countries
(the file is in comma separated format).
To import this HDFS data into PIG I ran the below command in PIG:
test = load ‘/ user/hdfs/countries’ using PigStorage(',') as (id:int, Name:chararray, Language:chararray);
where,
ID: is the primary key column in HDFS file
Name and Language are the column names in HDFS file
I am getting below error when I run the above mentioned pig command:
Pig Stack Trace
ERROR 1200: <line 1, column 12> Unexpected character ''
Failed to parse: <line 1, column 12> Unexpected character ''
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:243)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:179)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1648)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1621)
at org.apache.pig.PigServer.registerQuery(PigServer.java:575)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:541)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Can someone please help me with this? Is my command incorrect or any jar file is missing?
Thank you in advance!
It tells you exactly where the problem is: the ‘ should be replaced by ' which is not the same character.
Also, the space after the / seems fishy.

Pig script fails with java.io.EOFException: Unexpected end of input stream

I am having a Pig script to pick up a set of fields using regular expression and store the data to a Hive table.
--Load data
cisoFortiGateDataAll = LOAD '/user/root/cisodata/Logs/Fortigate/ec-pix-log.20140625.gz' USING TextLoader AS (line:chararray);
--There are two types of data, filter type1 - The field dst_country seems unique there
cisoFortiGateDataType1 = FILTER cisoFortiGateDataAll BY (line matches '.*dst_country.*');
--Parse each line and pick up the required fields
cisoFortiGateDataType1Required = FOREACH cisoFortiGateDataType1 GENERATE
FLATTEN(
REGEX_EXTRACT_ALL(line, '(.*?)\\s(.*?)\\s(.*?)\\s(.*?)\\sdate=(.*?)\\s+time=(.*?)\\sdevname=(.*?)\\sdevice_id=(.*?)\\slog_id=(.*?)\\stype=(.*?)\\ssubtype=(.*?)\\spri=(.*?)\\svd=(.*?)\\ssrc=(.*?)\\ssrc_port=(.*?)\\ssrc_int=(.*?)\\sdst=(.*?)\\sdst_port=(.*?)\\sdst_int=(.*?)\\sSN=(.*?)\\sstatus=(.*?)\\spolicyid=(.*?)\\sdst_country=(.*?)\\ssrc_country=(.*?)\\s(.*?\\s.*)+')
) AS (
rmonth:charArray, rdate:charArray, rtime:charArray, ip:charArray, date:charArray, time:charArray,
devname:charArray, deviceid:charArray, logid:charArray, type:charArray, subtype:charArray,
pri:charArray, vd:charArray, src:charArray, srcport:charArray, srcint:charArray, dst:charArray,
dstport:charArray, dstint:charArray, sn:charArray, status:charArray, policyid:charArray,
dstcountry:charArray, srccountry:charArray, rest:charArray );
--Store to hive table
STORE cisoFortiGateDataType1Required INTO 'ciso_db.fortigate_type1_1_table' USING org.apache.hcatalog.pig.HCatStorer();
The script works fine on a small file but breaks with the following exception on a bigger file (750 MB). Any idea how can I debug and find the root cause?
2014-09-03 15:31:33,562 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - java.io.EOFException: Unexpected end of input stream
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:149)
at org.apache.pig.builtin.TextLoader.getNext(TextLoader.java:58)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:211)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
Check the size of the text you are loading into line:chararray.If the size is greater than hdfs block size (64 MB) then you will get an error.