How can I change physical memory in a mapreduce/hive job? - hive

I'm trying to run a Hive INSERT OVERWRITE query on an EMR cluster with 40 worker nodes and a single master node.
However, as soon as the INSERT OVERWRITE query reaches this state:
Stage-1 map = 100%, reduce = 100%, Cumulative CPU 180529.86 sec
I get the following error:
Ended Job = job_1599289114675_0001 with errors
Diagnostic Messages for this Task:
Container [pid=9944,containerID=container_1599289114675_0001_01_041995] is running beyond physical memory limits. Current usage: 1.5 GB of 1.5 GB physical memory used; 3.2 GB of 7.5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1599289114675_0001_01_041995 :
I'm not sure how to change the 1.5 GB physical memory limit. I don't see that number anywhere in my configuration, and I don't understand how it is being calculated.
I even tried setting "yarn.nodemanager.vmem-pmem-ratio" to "5" as suggested in some forums, but irrespective of this change, I still get the error.
This is how the job starts:
Number of reduce tasks not specified. Estimated from input data size: 942
Hadoop job information for Stage-1: number of mappers: 910; number of reducers: 942
And this is what my configuration file for the cluster looks like. I can't work out which settings I have to change to avoid this issue. Could it also be due to Tez settings, even though I'm not using Tez as the execution engine?
Any suggestions will be greatly appreciated, thanks.

When opening the Hive console, append the following to the command:
--hiveconf mapreduce.map.memory.mb=8192 --hiveconf mapreduce.reduce.memory.mb=8192 --hiveconf mapreduce.map.java.opts=-Xmx7600M
In case you still get the Java heap error, try increasing these to higher values, but make sure that mapreduce.map.java.opts does not exceed mapreduce.map.memory.mb.
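If you would rather not reopen the console, the same properties can usually also be set from inside an existing Hive session before running the INSERT OVERWRITE. A minimal sketch; the -Xmx values (roughly 80% of the container size) are illustrative and not taken from the answer above:
-- Ask YARN for larger map/reduce containers in this session
SET mapreduce.map.memory.mb=8192;
SET mapreduce.reduce.memory.mb=8192;
-- Keep the JVM heap below the container size so YARN does not kill the task
SET mapreduce.map.java.opts=-Xmx6554m;
SET mapreduce.reduce.java.opts=-Xmx6554m;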

Related

running into an issue when running query on Impala

I've recently been running into the following issue when running basic queries on Impala (for example: select * from table limit 100). I did some online research but have not found a fix. Any insights on how I could fix this? I use Hue for querying.
ExecQueryFInstances rpc query_id=5d4f8d25428xxxx:813cfbd30000xxxx
failed: Failed to get minimum memory reservation of 8.00 MB on daemon
ser1830.xxxx.com:22000 for query xxxxxxxxx:813cfbd30xxxxxxxxx
due to following error: Failed to increase reservation by 8.00 MB
because it would exceed the applicable reservation limit for the
"Process" ReservationTracker: reservation_limit=68.49 GB
reservation=68.49 GB used_reservation=0 child_reservations=68.49 GB
The top 5 queries that allocated memory under this tracker are:
Query(5240724f8exxxxxx:ab377425000xxxxxx): Reservation=41.81 GB
ReservationLimit=64.46 GB OtherMemory=133.44 MB Total=41.94 GB
Peak=42.62 GB Query(394dcbbaf6bxxxxx2f4760000xxxxxx0):
Reservation=26.68 GB ReservationLimit=64.46 GB OtherMemory=92.94 KB
Total=26.68 GB Peak=26.68 GB Query(5d4f8d25428xxxxx:813cfbd30000xxxxx):
Limit=100.00 GB Reservation=0 ReservationLimit=64.46 GB OtherMemory=0
Total=0 Peak=0 Memory is likely oversubscribed. Reducing query
concurrency or configuring admission control may help avoid this
error.
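The last line of the message points at the usual remedies: lower the query concurrency or configure admission control. As a hedged illustration only, one way to stop a single session from reserving an oversized share is to cap its memory before querying; my_table and the 2gb figure are placeholders, not from the original post:
-- Cap the per-node memory this session's queries may reserve
SET MEM_LIMIT=2gb;
SELECT * FROM my_table LIMIT 100;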

AWS glue write dynamic frame out of memory (OOM)

I am using AWS Glue to run a PySpark job that reads a dynamic frame from the catalog (data in Redshift), then writes it to S3 in CSV format. I am getting this error saying the executor is out of memory:
An error occurred while calling o883.pyWriteDynamicFrame. Job aborted due to stage failure: Task 388 in stage 21.0 failed 4 times, most recent failure: Lost task 388.3 in stage 21.0 (TID 10713, ip-10-242-88-215.us-west-2.compute.internal, executor 86): ExecutorLostFailure (executor 86 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 16.1 GB of 16 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
My guess is that the dataframe is not partitioned well before writing, so one executor runs out of memory. But when I follow this doc https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html to add partition keys to my dynamic frame, the job simply times out after 4 hours. (The partition key I chose splits the data set into around 10 partitions.)
Some other approaches I tried:
Configuring the fetchsize, but the AWS doc shows that Glue already sets the fetchsize to 1000 by default: https://aws.amazon.com/blogs/big-data/optimize-memory-management-in-aws-glue/#:~:text=With%20AWS%20Glue%2C%20Dynamic%20Frames,Spark%20executor%20and%20database%20instance.
Setting pushdown predicates, but the input dataset is created daily and is not partitioned. I also need all rows to perform joins/filters in the ETL, so this might not be a good solution for me.
Does anyone know what are some good alternatives to try out?

Azure Database cannot reduce the sizing

I cannot reduce the size of my Azure database from 750 GB to 500 GB.
Overall sizing as shown in the Azure dashboard:
Used space is 248.29 GB.
Allocated space is 500.02 GB
Maximum storage size is 750 GB.
The validation message when I try to reduce the size is:
The storage size of your database cannot be smaller than the currently
allocated size. To reduce the database size, the database first needs
to reclaim unused space by running DBCC SHRINKDATABASE (XXX_Database
Name). This operation can impact performance while it is running and
may take several hours to complete.
What should I do?
Best regards
If we want to reduce the maximum database size, we need to ensure that the new maximum is larger than the currently allocated space. So, in your case, we first need to reclaim the unused allocated space. To do that, we can run the following command:
-- Shrink database data space allocated.
DBCC SHRINKDATABASE (N'db1')
For more details, please refer to the documentation.
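Before shrinking, it can help to confirm how much of the allocated space is actually in use. A rough T-SQL sketch of that check (it assumes the usual 8 KB pages and is not part of the original answer):
-- Compare used vs. allocated space for the data files of the current database
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint)) * 8 / 1024.0 / 1024 AS used_gb,
       SUM(CAST(size AS bigint)) * 8 / 1024.0 / 1024 AS allocated_gb
FROM sys.database_files
WHERE type_desc = 'ROWS';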
I got this error via the CLI after also disabling read scale, and the solution was to remove --max-size 250GB from the command:
az sql db update -g groupname -s servername -n dbname --edition GeneralPurpose --capacity 1 --max-size 250GB --family Gen5 --compute-model Serverless --no-wait

Error in Apache Drill when doing JOIN between two large tables

I'm trying to run a JOIN between two tables, one with 1,250,910,444 records and the other with 385,377,113 records, using Apache Drill.
However, after 2 minutes of execution, it gives the following error:
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. Failure allocating buffer.
Fragment 1:2 [Error Id: 51b70ce1-29d5-459b-b974-8682cec41961 on sbsb35.ipea.gov.br:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer. io.netty.buffer.PooledByteBufAllocatorL.allocate():64 org.apache.drill.exec.memory.AllocationManager.<init>():80
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():243
org.apache.drill.exec.memory.BaseAllocator.buffer():225 org.apache.drill.exec.memory.BaseAllocator.buffer():195
org.apache.drill.exec.vector.VarCharVector.allocateNew():394 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():239
org.apache.drill.exec.test.generated.HashTableGen1800$BatchHolder.<init>():137 org.apache.drill.exec.test.generated.HashTableGen1800.newBatchHolder():697
org.apache.drill.exec.test.generated.HashTableGen1800.addBatchHolder():690 org.apache.drill.exec.test.generated.HashTableGen1800.addBatchIfNeeded():679
org.apache.drill.exec.test.generated.HashTableGen1800.put():610 org.apache.drill.exec.test.generated.HashTableGen1800.put():549
org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():366
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():222
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
...
java.lang.Thread.run():748
Drill configuration information I'm using: planner.memory_limit = 268435456
The server I'm using has 512GB of memory.
Could someone suggest how to solve this problem? Could creating an index for each table be a solution? If so, how do I do this in Drill?
Currently Apache Drill does not support indexing.
Your query fails during the execution stage, so planner.memory_limit won't have any effect.
Currently all you can do is allocate more memory:
make sure you have enough direct memory allocated in drill-env.sh;
use the planner.memory.max_query_memory_per_node option.
There is ongoing work in the community to allow spill to disk for the Hash Join
but it's still in progress (https://issues.apache.org/jira/browse/DRILL-6027).
Try setting planner.memory.max_query_memory_per_node to the maximum possible:
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = (some value)
Make sure the parameter is set for this session.
Check DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY too.
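For what it is worth, a concrete version of that statement might look like the sketch below; the 10 GB value is purely illustrative and has to fit inside the Drillbit's direct memory:
-- Raise the per-node memory budget for this session's queries (value in bytes, ~10 GB here)
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 10737418240;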

Why will my SQL Transaction log file not auto-grow?

The Issue
I've been running a particularly large query, generating millions of records to be inserted into a table. Each time I run the query I get an error reporting that the transaction log file is full.
I've managed to get a test query to run with a reduced set of results by using SELECT INTO instead of INSERT INTO a pre-built table. This reduced set of results generated a 20 GB table with 838,978,560 rows.
When trying to INSERT into the pre-built table, I've also tried it with and without a clustered index. Both failed.
Server Settings
The server is running SQL Server 2005 (Full not Express).
The database being used is set to the SIMPLE recovery model, and there is space available (around 100 GB) on the drive the file is sitting on.
The transaction log file is set to grow in 250 MB increments up to a maximum of 2,097,152 MB.
The log file appears to grow as expected until it gets to 4,729 MB.
When the issue first appeared the file grew to a lower value; however, I've since reduced the size of other log files on the same server, and this appears to allow this transaction log file to grow further by the same amount as the reduction in the other files.
I've now run out of ideas of how to solve this. If anyone has any suggestion or insight into what to do it would be much appreciated.
First, you want to avoid auto-growth whenever possible; auto-growth events are HUGE performance killers. If you have 100 GB available, why not change the log file size to something like 20 GB (just temporarily, while you troubleshoot this)? My policy has always been to use 90%+ of the disk space allocated for a specific MDF/NDF/LDF file; there's no reason not to.
If you are using SIMPLE recovery, SQL Server is supposed to manage the task of returning unused space, but sometimes it does not do a great job. Before running your query, check the available free log space. You can do this by:
Right-click the DB > go to Tasks > Shrink > Files.
Change the file type to "Log".
This will help you understand how much unused space you have. You can set "Reorganize pages before releasing unused space > Shrink File" to 0. Moving forward, you can also release unused space using CHECKPOINT; this may be something to include as a first step before your query runs.
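If you prefer to script this rather than click through SSMS, a rough T-SQL sketch of the same ideas follows; the database name MyDb, the logical log file name MyDb_log, and the 20 GB figure are placeholders, not values from the question:
-- How full is each database's log right now?
DBCC SQLPERF(LOGSPACE);
-- Pre-size the log once instead of relying on repeated 250 MB auto-growth events
ALTER DATABASE MyDb
MODIFY FILE (NAME = MyDb_log, SIZE = 20480MB);
-- Under SIMPLE recovery, a checkpoint marks inactive log space as reusable
CHECKPOINT;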