ERROR 1066: Unable to open iterator for alias

The following code works fine:
S4 = FOREACH S3 GENERATE group AS page_i,
                         COUNT(S2) AS outlinks,
                         FLATTEN(S2.rank);
DUMP S4;
The results look like this:
(Computer engineering,1,0.1111111111111111)
(Outline of computer science,1,0.1111111111111111)
However, when I try to create one more relation using a division:
S44 = FOREACH S4 GENERATE group as page_i, outlinks/2, ...
It fails like this:
Failed Jobs:
JobId Alias Feature Message Outputs
job_local125575051_0033 S6,S66 DISTINCT Message: Job failed! Error - NA
file:/tmp/temp-847036156/tmp-1908150009,
Input(s):
Successfully read records from: "/home/song/workspace/FinalProject/output/part-m-00000"
Successfully read records from: "/home/song/workspace/FinalProject/output/part-m-00000"
Output(s):
Failed to produce result in "file:/tmp/temp-847036156/tmp-1908150009"
Job DAG:
job_local777342816_0028 -> job_local39708124_0029,
job_local39708124_0029 -> job_local495952123_0030,job_local268178801_0032,
job_local495952123_0030 -> job_local1880869927_0031,
job_local1880869927_0031 -> job_local268178801_0032,
job_local268178801_0032 -> job_local125575051_0033,
job_local125575051_0033
2015-04-29 07:45:49,133 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2015-04-29 07:45:49,136 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias S66
Details at logfile: /home/song/workspace/FinalProject/pig_1430315000370.log

It seems that you are trying to project a field called group from each tuple in relation S4, while its schema contains only these fields:
page_i
outlinks
an unnamed field resulting from FLATTEN(S2.rank).
So I guess that all you need to do is to replace the invalid line with:
S44 = FOREACH S4 GENERATE page_i as page_i, outlinks/2, ...
Or maybe just:
S44 = FOREACH S4 GENERATE page_i, outlinks/2, ...
Hope that this was the only issue here...
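A quick way to double-check is to describe the relation; the exact name of the flattened column depends on the schema of S2 (not shown here), so giving it an explicit alias up front also helps. A minimal sketch (the alias rank is an assumption, not from the question):
DESCRIBE S4;
-- Naming the flattened column lets you reference it in later statements:
S4 = FOREACH S3 GENERATE group AS page_i,
                         COUNT(S2) AS outlinks,
                         FLATTEN(S2.rank) AS rank;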

Related

Load File delimited by double colon :: in pig

The following is a sample dataset delimited by a double colon (::).
1::Toy Story (1995)::Animation|Children's|Comedy
I want to extract three fields from the above dataset as movieID, title, and genre. I have written the following code for that:
movies = LOAD 'location/of/dataset/on/hdfs'
         USING PigStorage('::')
         AS (MovieID:int, title:chararray, genre:chararray);
But I am getting the following error:
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
<file script.pig, line 1, column 9> pig script failed to validate:
java.lang.RuntimeException: could not instantiate 'PigStorage' with arguments '[::]'
Use MyRegExLoader: you will need piggybank.jar for this.
REGISTER '/path/to/piggybank.jar';
A = LOAD '/path/to/dataset'
    USING org.apache.pig.piggybank.storage.MyRegExLoader('([^\\:]+)::([^\\:]+)::([^\\:]+)')
    AS (movieid:int, title:chararray, genre:chararray);
Output:
(1,Toy Story (1995),Animation|Children's|Comedy)
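Alternatively, a sketch that avoids piggybank (assuming every line has exactly three '::'-separated fields): load each line whole with the built-in TextLoader, then split it with STRSPLIT, whose delimiter argument is a regular expression:
-- Load whole lines, split on the literal '::', then project and cast the parts.
lines = LOAD '/path/to/dataset' USING TextLoader() AS (line:chararray);
parts = FOREACH lines GENERATE STRSPLIT(line, '::', 3) AS t;
movies = FOREACH parts GENERATE (int)t.$0 AS movieid,
                                (chararray)t.$1 AS title,
                                (chararray)t.$2 AS genre;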

Pig - Attempt to access non-existing field

Problem:
Dumping the filtered output throws an error and prints incorrect output with warnings:
Error - attempt to access non-existing field in input
Steps:
Loaded a tab-delimited file into relation a:
a = LOAD '/user/a6000518-a/AdobeHourlySampleHit/hit_data.tsv' USING PigStorage('\t');
This file contains 952 columns.
I want to list the values in the 374th column. I did a null check and generated the 374th column values.
b = FILTER a BY $373 is not null;
c = FOREACH b GENERATE $373;
DUMP c;
Dumping the results produces the expected output but also prints this warning message:
2015-08-20 16:50:53,179 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject(ACCESSING_NON_EXISTENT_FIELD): Attempt to access field which was not found in the input
Could you please let me know where I could've made a mistake?
Thanks!
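One thing worth checking (a hedged guess, since the dataset itself isn't shown): this warning is typically emitted when some input lines have fewer fields than the projected index, i.e. fewer than 374 tab-separated columns. A diagnostic sketch, reusing the path from the question:
-- Count lines with fewer than 374 tab-separated fields; POProject warns
-- exactly when a row is too short for the requested projection.
raw = LOAD '/user/a6000518-a/AdobeHourlySampleHit/hit_data.tsv' USING TextLoader() AS (line:chararray);
short_rows = FILTER raw BY SIZE(STRSPLIT(line, '\\t', 374)) < 374;
short_count = FOREACH (GROUP short_rows ALL) GENERATE COUNT(short_rows);
DUMP short_count;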

Avro: java.lang.RuntimeException: Unsupported type in record

Input: test.csv
100
101
102
Pig script:
-- REGISTER statements for the required jars (paths omitted in the question)
A = LOAD 'test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (code:chararray);
STORE A INTO 'test' USING org.apache.pig.piggybank.storage.avro.AvroStorage
('schema',
'{"namespace":"com.pig.test.avro","type":"record","name":"Avro_Test","doc":"Avro Test Schema",
"fields":[
{"name":"code","type":["string","null"],"default":null}
]}'
);
I am getting a runtime error during the STORE. Any input on resolving this would be appreciated.
Error log:
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Unsupported type in record:class java.lang.String
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMap
2015-06-02 23:06:03,934 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-06-02 23:06:03,934 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
Looks like this is a bug: https://issues.apache.org/jira/browse/PIG-3358
If you can, try updating to Pig 0.14; according to the comments on that issue, this has been fixed.

Pig process multiple file error: ERROR 0: Error while executing ForEach at []

I have 4 files A, B, C, D under the directory /user/bizlog/cpc on HDFS, and the records look like this:
87465422^C376832^C27786^C21161214^Ckey
Here is my pig script:
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_sort = order account_group by group;
account_key = foreach account_sort generate group, BagToTuple(cpc.key);
store account_key into 'last' using PigStorage('\u0003');
It produces results such as:
376832^Ckey1^Ckey2
The script above is supposed to process all 4 files, but I get this error:
Backend error message
---------------------
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.
Pig Stack Trace
---------------
ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: account_key: New For Each(false,false)[bag] - scope-18 Operator Key: scope-18): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at []
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:242)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:464)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:432)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:412)
================================================================================
Oddly, if I load one single file, such as load '/user/bizlog/cpc/A', the script succeeds.
If I load each file first and then union them, it works fine too.
If I put the sort step last, the error goes away as well.
The Hadoop version is 0.20.2 and the Pig version is 0.12.1; any help will be appreciated.
As mentioned in the comments:
Putting the sort step last makes the error go away.
Though I did not find much on the topic, it appears that Pig does not like to reorder the grouped relation itself.
As such, the 'solution' is to order the output generated from the group, instead of ordering the grouped relation itself, as sketched below.
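A sketch of that workaround, keeping the aliases and style from the question, but generating from the grouped relation first and ordering the final result at the end:
cpc_all = load '/user/bizlog/cpc' using PigStorage('\u0003') as (cpcid, accountid, cpcplanid, cpcgrpid, key);
cpc = foreach cpc_all generate accountid, key;
account_group = group cpc by accountid;
account_key = foreach account_group generate group, BagToTuple(cpc.key);
-- order the generated output, not the grouped relation
account_sort = order account_key by group;
store account_sort into 'last' using PigStorage('\u0003');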

Pig file load error

I am trying to run this command in the Pig environment:
grunt> A = LOAD inp;
But I am getting this error in the log files:
Pig Stack Trace:
ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Failed to parse: mismatched input 'inp' expecting QUOTEDSTRING
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:226)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1565)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1538)
at org.apache.pig.PigServer.registerQuery(PigServer.java:540)
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:970)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:386)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:490)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
And in the console I am getting this:
grunt> A = LOAD inp;
2012-10-26 12:18:34,627 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input 'inp' expecting QUOTEDSTRING
Details at logfile: /usr/local/hadoop/pig_1351232517175.log
Can anybody provide me the appropriate solution for this?
The syntax for LOAD has been used incorrectly. Check out the correct example provided here:
http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#LOAD
Suppose we have a data file called myfile.txt. The fields are tab-delimited. The records are newline-separated.
1 2 3
4 2 1
8 3 4
In this example the default load function, PigStorage, loads data from myfile.txt to form relation A. The two LOAD statements are equivalent. Note that, because no schema is specified, the fields are not named and all fields default to type bytearray.
A = LOAD 'myfile.txt';
A = LOAD 'myfile.txt' USING PigStorage('\t');
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
Example from http://pig.apache.org/docs
I believe the error log is self-explanatory; it says: expecting QUOTEDSTRING.
Please put the file name in single quotes to solve this issue.
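Applied to the statement from the question (assuming inp is a path on HDFS relative to the current directory):
grunt> A = LOAD 'inp';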