Getting error: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask - sql

I have hit an error when trying to run with Impala script. Below is the error codes, where was the issue from?
INFO : Kill Command = /var/opt/teradata/cloudera/parcels/CDH-6.3.3-1.cdh6.3.3.p4155.5453354/lib/hadoop/bin/hadoop job -kill job_1653322450545_7855589
INFO : Hadoop job information for Stage-2: number of mappers: 3; number of reducers: 1
INFO : 2022-07-22 17:13:42,628 Stage-2 map = 0%, reduce = 0%
INFO : 2022-07-22 17:14:28,740 Stage-2 map = 33%, reduce = 100%
INFO : 2022-07-22 17:14:29,763 Stage-2 map = 100%, reduce = 100%
ERROR : Ended Job = job_1653322450545_7855589 with errors
ERROR : FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
INFO : MapReduce Jobs Launched:
INFO : Stage-Stage-2: Map: 3 Reduce: 1 HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
INFO : Total MapReduce CPU Time Spent: 0 msec
INFO : Completed executing command(queryId=hive_20220722171206_e2c10ea6-b409-426d-b6ee-08a780903fdd); Time taken: 143.612 seconds

Related

Hive Mapreduce Query runs only once after running HIVESERVER

hive mapreduce query runs only once successfully after running hiveserver once afterwards it gives error eventhough the same query
hiveserver info:-
Connecting to jdbc:hive2://localhost:10000
to: Apache Hive (version 4.0.0-alpha-2)
Driver: Hive JDBC (version 4.0.0-alpha-2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.0-alpha-2 by Apache Hive
0: jdbc:hive2://localhost:10000\>insert into table abc values(3983);
1 row affected (10.589 seconds)
0: jdbc:hive2://localhost:10000\>insert into table abc values(3973);
Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
0: jdbc:hive2://localhost:10000\>insert into table abc values(3173);
Error: Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
first mapreduce successful query
Query ID = ndv_20230203183337_f004b3f8-5f56-4a47-822c-d756ce652cef
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=\<number\>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=\<number\>
In order to set a constant number of reducers:
set mapreduce.job.reduces=\<number\>
Job running in-process (local Hadoop)
2023-02-03 18:33:45,085 Stage-1 map = 100%, reduce = 0%
2023-02-03 18:33:46,117 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local1222265821_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory file:/Users/ndv/Downloads/work/hivestuff/warehouse/abc/.hive-staging_hive_2023-02-03_18-33-37_374_1488321338071048426-1/-ext-10000
Loading data to table default.abc
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
second same query which failed
Query ID = ndv_20230203183352_2fb0fdba-3e66-4e7e-8624-43a93cbdfc9b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=\<number\>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=\<number\>
In order to set a constant number of reducers:
set mapreduce.job.reduces=\<number\>
Job running in-process (local Hadoop)
2023-02-03 18:33:54,841 Stage-1 map = 0%, reduce = 0%
Ended Job = job_local2047992482_0002 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
in hiveserver web ui it is showing it failed in stage 1 only
Stage-1:MAPRED Failure, ReturnVal 2
Stage 1 statistics:
Status: Failure, ReturnVal 2
Error FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce job progress:
0% map0% reduce
"Map Progress (%)": 0
"Reduce Progress (%)": 0
"Cleanup Progress (%)": 100
"Setup Progress (%)": 100
"Complete": true
"Successful": false

Failed to load data from S3

I launched two m1.medium nodes on amazon ec2 for executing my pig script, but looks like it failed at the first line (even before MapReduce start): raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);
The error message I got:
2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20
... (omitted)
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 1.0.3 0.11.1.1-amzn hadoop 2015-02-04 02:15:32 2015-02-04 02:18:40 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022
Input(s):
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
I think the code should be fine since I have ever successfully loaded other data with the same syntax, and the link to s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000 looks valid. I suspect it might be related to some of my EC2 settings, but not sure how to investigate further or narrow down the problem. Anyone has a clue?
"Java heap space" error message gives some clues. Your files seem to be quite large (~2GB). Make sure that you have enough memory for each task runner to read the data.
The problem was currently solved by changing my node from m1.medium to m3.large , thanks for the good hint from #Nat as he pointed out the error message regarding with java heap space. I'll update more details later.

dump works, but store doesn't - pig - where can I find details of the error?

I'm trying to load apache log, split into fields and save it into hcatalog.
apache_log = LOAD 'httpd-www01-access.log.2014-02-09-*' USING TextLoader AS (line:chararray);
apache_row = FOREACH apache_log GENERATE FLATTEN (
REGEX_EXTRACT_ALL
(line,'^"(\\S+)" \\[(\\d{2}\\/\\w+\\/\\d{4}:\\d{2}:\\d{2}:\\d{2} \\+\\d{4}]) (\\S+) (\\S+) "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "([^"]*)"'))
AS (ip: chararray, datetime: chararray, session_id: chararray, time_of_request:chararray, request: chararray, status: chararray, size: chararray, referer : chararray, cookie: chararray, user_agent: chararray);
If i do:
a = sample apache_row 0.001;
dump a
it works.
but
store apache_row into 'stage.apache_log' using org.apache.hcatalog.pig.HCatStorer();
doesn't.
Error:
2014-02-17 08:17:13,812 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-02-17 08:17:13,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201402120751_0117 has failed! Stop running all dependent jobs
2014-02-17 08:17:13,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-02-17 08:17:13,814 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-02-17 08:17:13,815 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.2.0.1.3.2.0-111 0.11.1.1.3.2.0-111 pig 2014-02-17 08:16:24 2014-02-17 08:17:13 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201402120751_0117 apache_log,apache_row MAP_ONLY Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201402120751_0117_m_000000 stage.atg_apache_log,
Input(s):
Failed to read data from "hdfs://hadoop1:8020/user/pig/httpd-www01-access.log.2014-02-09-*"
Output(s):
Failed to produce result in "stage.apache_log"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201402120751_0117
Where can I fing any details of the problem?
There is an an information that more details I can find under:
hadoop1:50030/jobdetails.jsp?jobid=job_201402120751_0117
But when job is finished it doesn't work...
Regards
Pawel

Pig ORDER command fails

I am trying to analyze an apache log and the goal is the find out all user agents and their percentage in usage. The following program works fine to the line when result contains each useragent, count and percentage. The program fails at last line when tries to order according to most used. Could someone help?
logs = LOAD '$LOGS' USING ApacheCombinedLogLoader AS (remoteHost, hyphen, user, time, method, uri, protocol, statusCode, responseSize, referer, userAgent);
uarows = FOREACH logs GENERATE userAgent;
total = FOREACH (GROUP uarows ALL) GENERATE COUNT(uarows) as count;
dump total;
gpuarows = GROUP uarows BY userAgent;
result = FOREACH gpuarows {
subtotal = COUNT(uarows);
GENERATE flatten(group) as ua, subtotal AS SUB_TOTAL, 100*(double)subtotal/(double)total.count AS percentage;
};
orderresult = ORDER result BY SUB_TOTAL DESC;
dump orderresult;
what's weird is that 'dump result' works just fine, so it's the ORDER line makes trouble
errors:
013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - data buffer = 79691776/99614720
2013-04-13 11:33:09,976 [Thread-48] INFO org.apache.hadoop.mapred.MapTask - record buffer = 262144/327680
2013-04-13 11:33:09,995 [Thread-48] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0005
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/dliu/ApacheLogAnalysisWithPig/pigsample_1573648613_1365823989735
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigFileInputFormat.listStatus(PigFileInputFormat.java:37)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
at org.apache.pig.impl.io.ReadToEndLoader.init(ReadToEndLoader.java:177)
at org.apache.pig.impl.io.ReadToEndLoader.<init>(ReadToEndLoader.java:124)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:131)
... 6 more
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0005
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases orderresult
2013-04-13 11:33:10,276 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: orderresult[16,14] C: R:
2013-04-13 11:33:15,286 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2013-04-13 11:33:15,286 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0005 has failed! Stop running all dependent jobs
2013-04-13 11:33:15,287 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2013-04-13 11:33:15,287 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2013-04-13 11:33:15,288 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
1.0.4 0.11.0 dliu 2013-04-13 11:32:27 2013-04-13 11:33:15 GROUP_BY,ORDER_BY
Some jobs have failed! Stop running all dependent jobs
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_local_0002 1 1 n/a n/a n/a n/a n/a n/a 1-18,logs,total,uarows MULTI_QUERY,COMBINER
job_local_0003 1 1 n/a n/a n/a n/a n/a n/a gpuarows,result GROUP_BY,COMBINER
job_local_0004 1 1 n/a n/a n/a n/a n/a n/a orderresult SAMPLER
Failed Jobs:
JobId Alias Feature Message Outputs
job_local_0005 orderresult ORDER_BY Message: Job failed! Error - NA file:/tmp/temp265162785/tmp896004388,
Input(s):
Successfully read 0 records from: "file:///home/dliu/ApacheLogAnalysisWithPig/access.log"
Output(s):
Failed to produce result in "file:/tmp/temp265162785/tmp896004388"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local_0002 -> job_local_0003,
job_local_0003 -> job_local_0004,
job_local_0004 -> job_local_0005,
job_local_0005
2013-04-13 11:33:15,291 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
2013-04-13 11:33:15,297 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias orderresult
Details at logfile: /home/dliu/ApacheLogAnalysisWithPig/pig_1365823931459.log
Make sure two things:
1) Run pig in local mode: pig -x local
2) Set either PIG_HOME or PIG_INSTALL environment variable to point to pig installation directory
Please check that you don't have already file /tmp/temp265162785/tmp896004388
You can use the same file\directory for different tasks.

Amazon Elastic Map Reduce for analyzing s3 logs

I am using EMR to analyze web nginx logs. But I need to process the logs so that it can fall into rows and columns in order to make it easy for querying. Thus i made two tables - rawlog, processedlog in the following manner:
create table rawlog(line string)
row format delimited fields terminated by '\t' lines terminated by '\n'
LOCATION 's3://istreamanalytics/logs/';
CREATE EXTERNAL TABLE processedlog (
day string,
hour int,
playSessionId string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n';
and added a ruby script to hive which can do the transformation, the script is as follows:
#!/usr/bin/env ruby
mon={"Jan" => '01',"Feb" => '02',"Mar" => '03',"Apr" => '04',"May" => '05',"Jun" => '06',"Jul" => '07',"Aug" => '08',"Sep" => '09',"Oct" => '10',"Nov" => '11',"Dec" => '12'}
STDIN.each_line do |line|
if line =~ /(\d+)\/(\w+)\/(\d+):(\d+):\d+:\d+ \+\d+] "GET \/api\?playSessionId=(^&*)/
d = "#{$3}-#{mon$2}-#{$1}"
h = $4
pid = $5
puts "#{d}\t#{h}\t#{pid}"
end
end
Now when i run the job using the following command on hive:
from rawlog insert overwrite table processedlog select transform (line) using 'ruby /mnt/var/lib/hive_081/downloaded_resources/hive_transformer.rb' as (day String, hour INT, playSessionId String);
I am getting the following error:
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201206061145_0015, Tracking URL = http://domU-12-31-39-0F-86-07.compute-1.internal:9100/jobdetails.jsp?jobid=job_201206061145_0015
Kill Command = /home/hadoop/.versions/0.20.205/libexec/../bin/hadoop job -Dmapred.job.tracker=10.193.133.241:9001 -kill job_201206061145_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2012-06-08 09:47:49,644 Stage-1 map = 0%, reduce = 0%
2012-06-08 09:48:50,267 Stage-1 map = 0%, reduce = 0%
2012-06-08 09:48:52,278 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201206061145_0015 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201206061145_0015_m_000002 (and more) from job job_201206061145_0015
Exception in thread "Thread-41" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL:
http://10.254.139.143:9103/tasklogtaskid=attempt_201206061145_0015_m_000000_2&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 3 more
Counters:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
Can someone tell me what's wrong ?
EMR is a very generic tool to deal with logs.
Why not use more tailored technology.
E.g.:
Sumo Logic
Splunk Storm
Loggly
At least with Sumo you could make that kind of processing much easier.
The only suggestion I would make is make sure the script is working properly before EMR. Using EMR to test is script should be the very last step in the process. Beyond that it is usually a basic config problem.
Some basic googling found:
http://entxtech.blogspot.com/2010/10/how-to-unit-test-apache-hive-scripts.html
http://jairam.me/2011/09/08/hive-on-amazon-emr/
More details on the error can be found in the log files or see the details here in your case: http://10.254.139.143:9103/tasklogtaskid=attempt_201206061145_0015_m_000000_2&start=-8193