What is the meaning of Pig exit codes?

Pig exits with exit code 7 after printing these three lines:
2014-07-16 21:57:37,271 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.6.0 (rexported) compiled Feb 26 2014, 03:01:22
2014-07-16 21:57:37,272 [main] INFO org.apache.pig.Main - Logging error messages to: ..../pig_1405562257268.log
2014-07-16 21:57:37,627 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/sam/.pigbootup not found
What does this mean? The INFO messages are normal; the only unusual bit is the exit code (7, see above). The pig_*.log file does not exist. Is this documented somewhere?
EDIT: the problem was eliminated when I removed the semicolon from the end of the %declare line.
Go figure...
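For anyone hitting the same thing, a minimal before/after sketch (the variable name and path are hypothetical):
%declare INPUT '/data/in'
-- %declare INPUT '/data/in';  -- the trailing semicolon is what triggered the exit-code-7 parse failure
raw = LOAD '$INPUT' USING PigStorage();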

You may take a look at the return codes in the source code.
The book Programming Pig also contains a list of their meanings in chapter two.
I copy them here for reference:
0 Success
1 Retriable failure
2 Failure
3 Partial failure - Used with multiquery; see “Nonlinear Data Flows”
4 Illegal arguments passed to Pig
5 IOException thrown - Would usually be thrown by a UDF
6 PigException thrown - Usually means a Python UDF raised an exception
7 ParseException thrown (can happen after parsing if variable substitution is being done)
8 Throwable thrown (an unexpected exception)
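Since Pig surfaces these as the process exit status, a wrapper script can branch on them. A minimal sketch (the script name is hypothetical):
pig -f myscript.pig
rc=$?
case "$rc" in
  0) echo "success" ;;
  7) echo "parse error - check variable substitution and %declare lines" ;;
  *) echo "pig exited with code $rc" ;;
esac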

Related

Tried loading a file in Pig but this comes up every time: Warning IMPLICIT_CAST_TO_FLOAT 2 time(s)

I am learning Pig Latin and this error keeps coming up.
Command:
m = LOAD '/assignment/movies.csv' USING PigStorage(',') AS (id:int, name:chararray, year:int, rating:float, duration:int);
Error messages:
2021-01-11 21:10:44,303 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2021-01-11 21:10:44,304 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2021-01-11 21:10:44,310 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_FLOAT 2 time(s).
These are just warnings, not errors, so your script will complete.
You could compare the effect of loading rating as e.g. chararray and then using an explicit cast in a FOREACH:
cast_rating = FOREACH m GENERATE
    id..year,
    (float)rating AS rating,
    duration;
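For completeness, the matching LOAD for that comparison would declare rating as chararray (same path and schema as in the question, with only that one type changed):
m = LOAD '/assignment/movies.csv' USING PigStorage(',')
    AS (id:int, name:chararray, year:int, rating:chararray, duration:int);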

Using AWK to Retrieve an Error in a Cryptic Log File

I need to grab an occurrence of an error for the current time, ignoring earlier occurrences. The problem is that the date is a few lines above (not on the same line as the error code). How do I match the date and time on the '*** begin ibmdb error message ***' line against the current time, and include all of this error log data:
*** begin ibmdb error message ***
Sun Dec 18 21:50:57 2016 - program 'execjob', User 'OSID:root', RMId 'root' Driver Version '9.0.1.14.865 2015-01-20 04:00:00'
DELETEDBREC() error on file 'USERRPT' in 'GEN'
DeleteSqlRec(lawson."USERRPT", 1)
DB2 FATAL ERROR for SQLExecute - Code: 40001/-911
[IBM][CLI Driver][DB2/AIX64] SQL0911N The current transaction has been rolled
back because of a deadlock or timeout. Reason code "68". SQLSTATE=40001
awk 'BEGIN{FS="begin ibmdb error message"}' captures the beginning - how do I also capture through the ending, Reason code "68"?
FS tells awk that the fields on your line will be separated by 'begin ibmdb error message'.
You probably want to do something like:
awk '/begin ibmdb error message/,/Reason code "68"/'
Something like this? I started from the time pattern just for testing, instead of 'begin ibmdb error message', since I thought there might be more sections starting with the same text.
$ awk '/21:50/,/Reason code "68"/' file11
Sun Dec 18 21:50:57 2016 - program 'execjob', User 'OSID:root', RMId 'root' Driver Version '9.0.1.14.865 2015-01-20 04:00:00'
DELETEDBREC() error on file 'USERRPT' in 'GEN'
DeleteSqlRec(lawson."USERRPT", 1)
DB2 FATAL ERROR for SQLExecute - Code: 40001/-911
[IBM][CLI Driver][DB2/AIX64] SQL0911N The current transaction has been rolled
back because of a deadlock or timeout. Reason code "68". SQLSTATE=40001
Tip: You can read about awk's pattern-matching capabilities here: https://www.gnu.org/software/gawk/manual/html_node/Expression-Patterns.html
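If the range really must be anchored to the current time rather than a hard-coded pattern, one approach is to build the timestamp prefix with date and pass it in with -v. A sketch, assuming the log's date format matches date's default field order:
# match from the line carrying the current date/time (to the minute)
# through the closing 'Reason code' line
now=$(date '+%a %b %e %H:%M')
awk -v ts="$now" 'index($0, ts), /Reason code "68"/' file11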

Avro : java.lang.RuntimeException: Unsupported type in record

Input: test.csv
100
101
102
Pig Script :
-- required piggybank and Avro jars are REGISTERed here
A = LOAD 'test.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage() AS (code:chararray);
STORE A INTO 'test' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
    'schema',
    '{"namespace":"com.pig.test.avro","type":"record","name":"Avro_Test","doc":"Avro Test Schema",
      "fields":[
        {"name":"code","type":["string","null"],"default":null}
      ]}'
);
I am getting a runtime error during the STORE. Any input on resolving this?
Error Log :
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Unsupported type in record:class java.lang.String
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:722)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:558)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:106)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMap
2015-06-02 23:06:03,934 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-06-02 23:06:03,934 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
Looks like this is a bug: https://issues.apache.org/jira/browse/PIG-3358
If you can, try updating to Pig 0.14; according to the comments, this has been fixed there.
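If upgrading is not an option, one diagnostic worth trying (a sketch only; it changes the output schema, so use it just to narrow down whether the union type is the trigger) is to store the field as a plain string:
STORE A INTO 'test_plain' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
    'schema',
    '{"namespace":"com.pig.test.avro","type":"record","name":"Avro_Test",
      "fields":[{"name":"code","type":"string"}]}'
);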

Failed to load data from S3

I launched two m1.medium nodes on Amazon EC2 to execute my Pig script, but it looks like it failed at the first line (even before MapReduce started):
raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);
The error message I got:
2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20
... (omitted)
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion  PigVersion     UserId  StartedAt            FinishedAt           Features
1.0.3          0.11.1.1-amzn  hadoop  2015-02-04 02:15:32  2015-02-04 02:18:40  GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022
Input(s):
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
I think the code should be fine, since I have successfully loaded other data with the same syntax before, and the link to s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000 looks valid. I suspect it might be related to some of my EC2 settings, but I am not sure how to investigate further or narrow down the problem. Does anyone have a clue?
"Java heap space" error message gives some clues. Your files seem to be quite large (~2GB). Make sure that you have enough memory for each task runner to read the data.
The problem was solved by changing my nodes from m1.medium to m3.large. Thanks for the good hint from @Nat, who pointed out the error message about Java heap space. I'll update with more details later.

Rest of the file not processed

The status is shown as success, but the file is not actually transferred to BigQuery.
# bq show -j abc
Job Type State Start Time Duration Bytes Processed
---------- --------- ----------------- ---------- -----------------
load SUCCESS 05 Jul 15:32:45 0:26:24
From the web interface, I can see the actual error.
Line:9732968, Too few columns: expected 27 column(s) but got 9 column(s)
Line:10893908 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
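The same error details can usually be retrieved from the command line as well, which helps when scripting. This assumes a bq version that supports JSON output:
# full job metadata, including status.errors, as JSON
bq show --format=prettyjson -j abc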
1) How do I know which bad character needs to be removed?
2) Why is "success" shown as the job status?
Update:
Job ID: summary_2012_07_09_to_2012_07_10a2
The error that I got at the command prompt:
BigQuery error in load operation: Backend Error
A lot of lines were not processed at all. The details from the web interface:
Line:9857286 / Field:1, Bad character (ASCII 0) encountered: field starts with: <15>
Line:9857287 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
All the lines were successfully processed in the second attempt:
job_id: summary_2012_07_09_to_2012_07_10a3
Update 2:
Line:174952407 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.
Job ID: job_19890847cbc3410495c3cecaf79b31fb
Sorry for the slow response; the holiday weekend meant most of the BigQuery team was not answering support questions. The 'bad character' looks like it may be a known bug with some gzipped files, where we improperly detect an ASCII 0 value at the end of the file.
If the job is actually failing but reporting success, that sounds like a problem, but we'll need the job ID of the failing job in order to debug. Also, if you can reproduce it, that would be helpful, since we may not have the logs around for the original job anymore.
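Until that backend fix lands, a client-side workaround is to strip the NUL bytes before loading. A minimal sketch with hypothetical filenames:
# remove ASCII 0 (NUL) bytes and re-compress before the bq load
zcat summary.gz | tr -d '\000' | gzip > summary_clean.gz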