Out of memory in Hive/tez with LATERAL VIEW json_tuple - hive

[There was an initial question at OOM in tez/hive, but after some answers and comments a new question incorporating the new knowledge is warranted.]
I have a query with a large LATERAL VIEW. It joins 4 tables, all ORC compressed and bucketed on the same column. It goes like this:
select
10 fields from t
, 80 fields from the lateral view
from
(
select
10 fields
from
e (800M rows, 7GB of data, 1 bucket)
LEFT JOIN m (1M rows, 20MB )
LEFT JOIN c (2k rows, <1MB)
LEFT JOIN contact (150M rows, 283GB, 4 buckets)
) t
LATERAL VIEW
json_tuple (80 fields) as lv
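For reference, written out in full, the LATERAL VIEW has roughly this shape (json_col and the key names here are placeholders, and only three of the 80 fields are shown):
select
  t.field1, t.field2,
  lv.key1, lv.key2, lv.key3
from t
lateral view json_tuple(t.json_col, 'key1', 'key2', 'key3') lv as key1, key2, key3;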
If I remove the LATERAL VIEW, the query completes.
If I add the LV, I always end up with:
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
Container id: container_e113_1516602562532_3606_01_000008
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
at org.apache.hadoop.util.Shell.run(Shell.java:844)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 255
]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
I tried many things:
updated all tez.grouping.* settings
added the WHERE condition in the JOIN as well
set hive.auto.convert.join.noconditionaltask = false; to make sure it does not try to do a map join
added DISTRIBUTE BY on different columns to prevent possible skew
set mapred.map.tasks=100
I have already maxed out all java-opts and memory settings.
I need to keep the LATERAL VIEW, because some of its fields might be used for filtering (i.e. I can't just do some string manipulation to output a CSV-like table).
Is there a way to make the LATERAL VIEW fit in memory, or to split it across multiple mappers? (Tez UI screenshot omitted.)
HDP 2.6, 8 data nodes with 32 GB RAM.

Related

PostgreSQL Distinct Sort For Huge Amount of Data

Here is my query:
explain(buffers, analyze) SELECT DISTINCT e.eventid, e.objectid, e.clock, e.ns, e.name, e.severity
FROM EVENTS e, functions f, items i, hosts_groups hg
WHERE e.source='0' AND e.object='0' AND NOT EXISTS
(SELECT NULL FROM functions f, items i, hosts_groups hgg
LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (12, 13, 14, ...)
WHERE e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hgg.hostid
GROUP BY i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0)
AND e.objectid=f.triggerid AND f.itemid=i.itemid AND i.hostid=hg.hostid
AND hg.groupid IN (1, 2, 3, ...)
AND e.value=1
ORDER BY e.eventid DESC;
You can find the related execution plan here.
As you can see, it spills to disk, because the default value of work_mem is 8 MB. Then I set work_mem to 1 GB in my session to see the difference and ran the query again. The new execution plan is here. Now it is doing a quicksort, but the execution time is still 779213.763 ms.
This query is auto-generated by a third-party tool, but I assume we can change it.
Doing a distinct sort over ~602k rows is insane. That is why I want to add more filtering on the clock column. Still, I want to ask: are there any other options to decrease the execution time of this query?
Specifications for database server:
$ lscpu
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 2
NUMA node(s): 1
Memory: 96 GB
The relevant database settings:
max_parallel_workers_per_gather = 4
max_worker_processes = 16
max_parallel_workers = 16
Thanks!
It looks like the core of the problem is that the planner is not using a hashed subplan (where it runs the subquery in bulk once and memorizes the results in a hash) for the NOT EXISTS, but rather is running it parameterized for each tuple in a loop. Usually this is because the planner thinks it will take too much memory to hash the results, but in this case I think it is just because it cannot figure out how to analyze GROUP BY...HAVING.
You can guide it down the (presumably) correct path here by replacing the NOT EXISTS (...) with:
AND e.objectid NOT IN (
SELECT triggerid FROM functions f, items i, hosts_groups hgg
LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (12, 13, 14 /*...*/)
WHERE f.itemid=i.itemid AND i.hostid=hgg.hostid
GROUP BY triggerid, i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0
)
But before trying this, I might run just the inner query there by itself to see how long it takes and how many rows it returns.
If this ends up working, it might be worthwhile to investigate what it would take to make the planner smart enough to do this conversion on its own.
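For example, a sketch of timing the inner query on its own (reusing the elided group-ID list from above):
explain (analyze, buffers)
SELECT triggerid FROM functions f, items i, hosts_groups hgg
LEFT JOIN rights r ON r.id=hgg.groupid AND r.groupid IN (12, 13, 14 /*...*/)
WHERE f.itemid=i.itemid AND i.hostid=hgg.hostid
GROUP BY triggerid, i.hostid HAVING MAX(permission)<2 OR MIN(permission) IS NULL OR MIN(permission)=0;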

Does BigQuery charge for querying only the stream buffer?

I have a day partitioned table with approx 300k rows in the streaming buffer. When running an interactive, non-cached, standard SQL query using
SELECT .. FROM .. WHERE _PARTITIONTIME IS NULL
The query validator says:
Valid: This query will process 0 B when run.
And after executing, the job information tab says:
Bytes Processed 0 B
Bytes Billed 0 B
The query is certainly returning real-time results each time I run it. Is this actually a free operation?
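For context, the query has this general shape (project, dataset, table, and column names are placeholders):
SELECT col_a, col_b
FROM `my_project.my_dataset.my_table`
WHERE _PARTITIONTIME IS NULL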

how to implement offset in spark-sql

Spark SQL does not support OFFSET; it only supports LIMIT. My query returns a huge result, and to get at specific rows I wrote the Spark SQL as follows:
select col1, col2 from (
select d.Diagnosis_Start_Date as col1,
e.Encounter_ID as col2,
row_number() over (order by d.Patient_ID) as row_num
from Diagnoses d, Encounters e
where d.Patient_ID = e.Patient_ID)
where row_num between 300 and 310;
But this is giving
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 249, ip-172-31-9-85.us-west-1.compute.internal, executor 23): ExecutorLostFailure (executor 23 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 6.2 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1505)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1493)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1492)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1492)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:803)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:803)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:803)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1720)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1675)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1664)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:629)
at org.apache.spark.SparkContext.runJob(SparkConte

Resource Exceeded error message in order by

I have a destination table (created as the output of some other query), and a simple ORDER BY on one of its columns results in a "resource exceeded" error message.
The destination table has 8.5 million rows and 6 columns (approx. 567 MB).
select col1,col2.....col6 from desttable order by col5 desc
is what produces the "resource exceeded" error message.
Remove the ORDER BY and see if the error disappears!
ORDER BY moves the WHOLE data set onto one worker - thus resources exceeded.
If I add "LIMIT" and "OFFSET" clauses to the query after the ORDER BY, it works, even though the LIMIT clause is the last to be evaluated. How is it working there?
When you add LIMIT N, the query runs on multiple workers. Each worker gets only part of the data to process and outputs only its respective N rows. Those N rows from all workers then get "delivered" to one worker, where the final ORDER BY and LIMIT occur, and the "winning" N rows become the output of the whole query.
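So a form like the following (with an arbitrary limit value, using the columns from the question) should succeed where the bare ORDER BY does not:
select col1, col2, col3, col4, col5, col6
from desttable
order by col5 desc
limit 1000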

Hive Query Execution Error, return code 3 from MapredLocalTask

I am getting this error while performing a simple join between two tables. I run this query in the Hive command line. I am naming the tables a & b. Table a is a Hive internal table and b is an external table (in Cassandra). Table a has only 1610 rows and table b has ~8 million rows. In the actual production scenario table a could get up to 100K rows. Shown below is my join, with table b as the last table in the join:
SELECT a.col1, a.col2, b.col3, b.col4 FROM a JOIN b ON (a.col1=b.col1 AND a.col2=b.col2);
Shown below is the error
Total MapReduce jobs = 1
Execution log at: /tmp/pricadmn/.log
2014-04-09 07:15:36 Starting to launch local task to process map join; maximum memory = 932184064
2014-04-09 07:16:41 Processing rows: 200000 Hashtable size: 199999 Memory usage: 197529208 percentage: 0.212
2014-04-09 07:17:12 Processing rows: 300000 Hashtable size: 299999 Memory usage: 163894528 percentage: 0.176
2014-04-09 07:17:43 Processing rows: 400000 Hashtable size: 399999 Memory usage: 347109936 percentage: 0.372
...
...
...
2014-04-09 07:24:29 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 714454400 percentage: 0.766
2014-04-09 07:25:03 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 901427928 percentage: 0.967
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-5
Logs:
/u/applic/pricadmn/dse-4.0.1/logs/hive/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
I am using DSE 4.0.1. Following are a few of my settings which you might be interested in:
mapred.map.child.java.opts=-Xmx512M
mapred.reduce.child.java.opts=-Xmx512M
mapred.reduce.parallel.copies=20
hive.auto.convert.join=true
I increased mapred.map.child.java.opts to 1G and got past a few more records before erroring out. It doesn't look like a good solution. I also changed the order of the join, but that didn't help. I saw this link Hive Map join : out of memory Exception but it didn't solve my issue.
To me it looks like Hive is trying to put the bigger table in memory during the local task phase, which confuses me. As per my understanding, the second table (in my case table b) should be streamed in. Correct me if I am wrong. Any help in solving this issue is highly appreciated.
set hive.auto.convert.join = false;
It appears your task is running out of memory. Check line 324 of the MapredLocalTask class.
} catch (Throwable e) {
  if (e instanceof OutOfMemoryError
      || (e instanceof HiveException && e.getMessage().equals("RunOutOfMeomoryUsage"))) {
    // Don't create a new object if we are already out of memory
    return 3;
  } else {
The last table in the join should be the largest one. You can change the order of the joined tables.
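If changing the physical order of the tables is not convenient, Hive also supports a STREAMTABLE hint to mark which table should be streamed rather than buffered in memory; a sketch using the tables from the question:
SELECT /*+ STREAMTABLE(b) */ a.col1, a.col2, b.col3, b.col4
FROM a JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2);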