Loading a file in Pig shows "Warning IMPLICIT_CAST_TO_FLOAT 2 time(s)" every time - apache-pig

I am learning Pig Latin and this error keeps coming up.
Command:
m = LOAD '/assignment/movies.csv' USING PigStorage(',') AS (id:int, name:chararray, year:int, rating:float, duration:int);
Error messages:
2021-01-11 21:10:44,303 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 21:10:44,304 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2021-01-11 21:10:44,310 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 2 time(s).

These are just warnings, not errors, so your script will complete.
You could compare the effect of loading rating in as e.g. chararray and then using an explicit cast in a FOREACH:
cast_rating = FOREACH m GENERATE
    id..year,
    (float)rating AS rating,
    duration;
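One plausible source of the warning is a later statement that compares or combines the float field rating with an int value. The question shows only the LOAD, so the FILTER statements below are hypothetical; this is a minimal sketch of the difference, not a diagnosis of the original script.

```pig
-- LOAD statement as in the question.
m = LOAD '/assignment/movies.csv' USING PigStorage(',')
    AS (id:int, name:chararray, year:int, rating:float, duration:int);

-- Hypothetical: comparing a float field with an int constant makes Pig
-- insert a cast for the constant, which is what the warning reports.
implicit_cmp = FILTER m BY rating > 4;

-- Hypothetical: a float literal keeps both sides the same type,
-- so no implicit cast is needed.
explicit_cmp = FILTER m BY rating > 4.0f;
```

If the warning disappears when only the second form is used, mixed-type expressions were the likely cause.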

Related

ERROR 1066: Unable to open iterator for alias

Command run (trying to get the maximum runs scored):
Run_M = foreach Run_Group_All generate (Match.Player, Match.Run) , MAX(Match.Run);
According to the log, the GROUP job is failing. Can anybody help me find the problem?
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:556)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2103: Problem doing work on Longs
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:84)
at org.apache.pig.builtin.AlgebraicLongMathBase.exec(AlgebraicLongMathBase.java:93)
at org.apache.pig.builtin.AlgebraicLongMathBase.exec(AlgebraicLongMathBase.java:37)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:326)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextLong(POUserFunc.java:410)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:351)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:400)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:317)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:474)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:442)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:422)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:269)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Number
at org.apache.pig.builtin.AlgebraicLongMathBase.doTupleWork(AlgebraicLongMathBase.java:77)
... 20 more
2017-09-03 07:48:03,212 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-09-03 07:48:03,212 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1294624349_0011 has failed! Stop running all dependent jobs
2017-09-03 07:48:03,212 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-09-03 07:48:03,213 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-09-03 07:48:03,214 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-09-03 07:48:03,214 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-09-03 07:48:03,215 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
2.8.1 0.15.0 goldi 2017-09-03 07:48:01 2017-09-03 07:48:03 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_local1294624349_0011 Cric,Match,Run_Group_All,Run_M GROUP_BY Message: Job failed! file:/tmp/temp-1949037811/tmp1601097545,
Input(s):
Failed to read data from "/home/goldi/Batting.csv"
Output(s):
Failed to produce result in "file:/tmp/temp-1949037811/tmp1601097545"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local1294624349_0011
2017-09-03 07:48:03,217 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-09-03 07:48:03,218 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias Run_M
Details at logfile: /home/goldi/pig_1504365116860.log
Replace '(Match.Player, Match.Run)' with 'group'.
Run_M = foreach Run_Group_All generate FLATTEN(group) as (player,run) , MAX(Match.Run);
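The ClassCastException at the bottom of the trace (DataByteArray cannot be cast to java.lang.Number) suggests that the values reaching MAX were not numeric at runtime. A minimal sketch of a fully typed pipeline follows. The two-column schema, the comma delimiter, and the alias Run_Group are assumptions, since the question shows neither the LOAD nor the GROUP statement.

```pig
-- Assumed schema and delimiter; adjust to the real layout of Batting.csv.
Match = LOAD '/home/goldi/Batting.csv' USING PigStorage(',')
        AS (Player:chararray, Run:long);

-- Group by player so that MAX sees one bag of runs per player.
Run_Group = GROUP Match BY Player;

-- 'group' holds the grouping key; MAX operates on the typed Run column.
Run_M = FOREACH Run_Group GENERATE group AS player, MAX(Match.Run) AS max_run;
DUMP Run_M;
```

Grouping by Player alone differs from the answer above, which flattens a two-field group key; use whichever grouping matches the result you need.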

Avro : java.lang.RuntimeException: Unsupported type in record

Input: test.csv
100
101
102
Pig Script :
-- the required jars are registered with REGISTER statements (omitted here)
A = LOAD 'test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (code:chararray);
STORE A INTO 'test' USING org.apache.pig.piggybank.storage.avro.AvroStorage
('schema',
'{"namespace":"com.pig.test.avro","type":"record","name":"Avro_Test","doc":"Avro Test Schema",
"fields":[
{"name":"code","type":["string","null"],"default":null}
]}'
);
I get a runtime error during STORE. Any suggestions for resolving it?
Error Log :
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Unsupported type in record:class java.lang.String
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMap
2015-06-02 23:06:03,934 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-06-02 23:06:03,934 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
This looks like a bug: https://issues.apache.org/jira/browse/PIG-3358
If you can, try updating to Pig 0.14; according to the comments on the issue, it has been fixed there.

Failed to load data from S3

I launched two m1.medium nodes on Amazon EC2 to run my Pig script, but it looks like it failed at the first line (even before MapReduce started):
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);
The error message I got:
2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20
... (omitted)
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.3 0.11.1.1-amzn hadoop 2015-02-04 02:15:32 2015-02-04 02:18:40 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022
Input(s):
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
I think the code should be fine, since I have successfully loaded other data with the same syntax, and the link to s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000 looks valid. I suspect it might be related to some of my EC2 settings, but I am not sure how to investigate further or narrow down the problem. Does anyone have a clue?
The "Java heap space" error message gives a clue. Your files seem to be quite large (~2GB). Make sure that each task runner has enough memory to read the data.
The problem is solved for now by changing my nodes from m1.medium to m3.large. Thanks to @Nat for the good hint in pointing out the Java heap space error message. I'll update with more details later.
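If changing the instance type is not an option, raising the heap of the task JVMs from inside the script may also be worth trying. The sketch below assumes Hadoop 1.x (the log reports 1.0.3), where the relevant property is mapred.child.java.opts. The 1024 MB value is an arbitrary example, not a recommendation.

```pig
-- Assumed value; choose one that fits the memory available on the task nodes.
SET mapred.child.java.opts '-Xmx1024m';

-- LOAD statement as in the question.
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000'
      USING TextLoader AS (line:chararray);
```

The setting applies only to jobs launched by this script. Whether it is enough depends on how much physical memory each node has left for task JVMs.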

What is the meaning of pig exit codes?

Pig exits with exit code 7 after printing these 3 lines:
2014-07-16 21:57:37,271 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.6.0 (rexported) compiled Feb 26 2014, 03:01:22
2014-07-16 21:57:37,272 [main] INFO org.apache.pig.Main - Logging error messages to: ..../pig_1405562257268.log
2014-07-16 21:57:37,627 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/sam/.pigbootup not found
What does this mean?
The INFO messages are normal
The only unusual bit is the exit code (7, see above)
The pig_*.log file does not exist
Is this documented somewhere?
EDIT: the problem was eliminated when I removed the semicolon from the end of the %declare line.
go figure...
You may take a look at the return codes in the source code.
The book Programming Pig also contains a list of their meaning in chapter two.
I copy them here for reference:
0 Success
1 Retriable failure
2 Failure
3 Partial failure - Used with multiquery; see “Nonlinear Data Flows”
4 Illegal arguments passed to Pig
5 IOException thrown - Would usually be thrown by a UDF
6 PigException thrown - Usually means a Python UDF raised an exception
7 ParseException thrown - Can happen after parsing if variable substitution is being done
8 Throwable thrown - An unexpected exception

Pig ORDER command fails

I am trying to analyze an Apache log, and the goal is to find all user agents and their share of usage. The following program works fine up to the line where result contains each user agent, its count, and its percentage. It fails at the last line, when it tries to order the result by the most used. Could someone help?
logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);
uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;
gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
    subtotal = COUNT(uarows);
    GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
What's weird is that 'dump result' works just fine, so it is the ORDER line that causes the trouble.
errors:
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R:
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,
Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002 -> job_local_0003,
job_local_0003 -> job_local_0004,
job_local_0004 -> job_local_0005,
job_local_0005
2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log
Make sure of two things:
1) Run Pig in local mode: pig -x local
2) Set either the PIG_HOME or the PIG_INSTALL environment variable to point to the Pig installation directory
Please check that the file /tmp/temp265162785/tmp896004388 does not already exist.
The same file or directory cannot be used for different tasks.