Apache Ignite: errMsg=Out of memory

I inserted 1M records into a single partition of a cache and am now trying to retrieve them.
I have 3 Ignite nodes, each with 32 GB of memory and 8 cores.
Exception in thread "main" java.util.concurrent.ExecutionException: javax.cache.CacheException: Failed to execute map query on remote node [nodeId=f02e1c83-52af-4ea7-ab70-4e48540e5321, errMsg=Out of memory.; SQL statement:
SELECT KEY, _VAL FROM IGNITEVALUE WHERE KEY BETWEEN 'ParkedEvents/T45/' AND 'ParkedEvents/T45/|' AND AFFINITYKEY='Book12174583-T45' ORDER BY KEY ASC [90108-197]]
at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
at com.arcesium.trinity.cache.TrinityCachePerfMain.main(TrinityCachePerfMain.java:85)
Caused by: javax.cache.CacheException: Failed to execute map query on remote node [nodeId=f02e1c83-52af-4ea7-ab70-4e48540e5321, errMsg=Out of memory.; SQL statement:
SELECT KEY, _VAL FROM IGNITEVALUE WHERE KEY BETWEEN 'ParkedEvents/T45/' AND 'ParkedEvents/T45/|' AND AFFINITYKEY='Book12174583-T45' ORDER BY KEY ASC [90108-197]]
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2186)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2125)
How can I fetch those 1M records?

As the error suggests, your server nodes ran out of memory. One suggestion is to make sure lazy result loading is enabled; otherwise the node might have to copy all of the data to the Java heap at once. You might also want to increase the amount of Java heap space.
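For example, with the SQL from your stack trace, a lazy query streams the result instead of buffering it all on the heap. This is only a sketch: the cache name "myCache", the client config file and the page size are assumptions, not values from your setup.

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class LazyFetch {
  public static void main(String[] args) {
    Ignite ignite = Ignition.start("client-config.xml");   // placeholder client config
    IgniteCache<?, ?> cache = ignite.cache("myCache");     // placeholder cache name

    SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT KEY, _VAL FROM IGNITEVALUE " +
            "WHERE KEY BETWEEN ? AND ? AND AFFINITYKEY = ? ORDER BY KEY ASC")
        .setArgs("ParkedEvents/T45/", "ParkedEvents/T45/|", "Book12174583-T45")
        .setLazy(true);      // stream result pages instead of materializing the full result on the heap
    qry.setPageSize(1024);   // fetch in modest pages

    try (QueryCursor<List<?>> cur = cache.query(qry)) {
      for (List<?> row : cur) {
        // Process each row as it arrives instead of collecting 1M rows in memory.
      }
    }
  }
}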
Having said that, you're not going to get the best out of Ignite by using it in client-server mode like this. Don't copy the data to the client; instead, send the code to the data using compute tasks.
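Here is a rough sketch of that approach using an affinity-colocated compute job, so the work runs on the node that owns the affinity key and only a small aggregated result travels back. The cache name and the per-entry work (a simple count) are placeholders for whatever processing you actually need.

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteCallable;

public class ColocatedCount {
  public static void main(String[] args) {
    Ignite ignite = Ignition.start("client-config.xml");   // placeholder client config

    // Runs on the node that owns the affinity key, next to the data;
    // only the small aggregated result travels back to the client.
    Integer matching = ignite.compute().affinityCall("myCache", "Book12174583-T45",
        (IgniteCallable<Integer>) () -> {
          IgniteCache<String, Object> local = Ignition.localIgnite().cache("myCache");
          int count = 0;
          for (Cache.Entry<String, Object> e : local.localEntries(CachePeekMode.PRIMARY)) {
            if (e.getKey().startsWith("ParkedEvents/T45/"))   // placeholder per-entry work
              count++;
          }
          return count;
        });

    System.out.println("Matching entries: " + matching);
  }
}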

Related

Why does a Java OutOfMemoryError occur when selecting fewer columns in a Hive query?

I have two Hive SELECT statements:
select * from ode limit 5;
This successfully pulls out 5 records from the table 'ode', with all columns included in the result. However, the following query causes an error:
select content from ode limit 5;
Here 'content' is one column in the table. The error is:
hive> select content from ode limit 5;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
The second query should be a lot cheaper, so why does it cause a memory issue, and how do I fix it?
When you select the whole table, Hive triggers a fetch task instead of an MR job; the fetch involves no parsing (it is like calling hdfs dfs -cat ... | head -5).
Selecting a single column, on the other hand, forces each row to be parsed, and as far as I can see in your case the Hive client tries to run that map work locally, which is where it runs out of heap.
You can choose one of two ways:
Force remote execution with hive.fetch.task.conversion
Increase the Hive client heap size using the HADOOP_CLIENT_OPTS environment variable.
You can find more details regarding fetch tasks here.
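For what it's worth, here is a hedged sketch of the first option driven from Java over HiveServer2 JDBC; the connection URL and user are placeholders, and hive.fetch.task.conversion=none forces the query to run as a job on the cluster rather than as a local fetch.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchConversion {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    // HiveServer2 URL and user are placeholders.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "hive", "");
         Statement st = conn.createStatement()) {
      st.execute("set hive.fetch.task.conversion=none");   // option 1: run as a cluster job, not a local fetch
      try (ResultSet rs = st.executeQuery("select content from ode limit 5")) {
        while (rs.next()) {
          System.out.println(rs.getString(1));
        }
      }
    }
    // Option 2 happens outside JDBC, e.g. export HADOOP_CLIENT_OPTS="-Xmx2g" before starting the hive CLI.
  }
}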

Exceeded the memory limit of 20 MB per session for prepared statements. Reduce the number or size of the prepared statements

I am trying to insert records into Azure SQL Data Warehouse using Oracle ODI, but I am getting an error after some records have been inserted.
NOTE: I am trying to insert 1000 records, but the error comes after 800.
Error Message: Caused By: java.sql.BatchUpdateException: 112007;Exceeded the memory limit of 20 MB per session for prepared statements. Reduce the number or size of the prepared statements.
While Abhijith's answer is technically correct, I'd like to suggest an alternative that will give you far better performance.
The root of your problem is that you've chosen the worst possible way to load a large volume of data into Azure SQL Data Warehouse. A long list of INSERT statements is going to perform very badly, no matter how many DWUs you throw at it, because it is always going to be a single-node operation.
My recommendation is to adapt your ODI process in the following way, assuming that your Oracle database is on-premises.
Write your extract to a file
Invoke AZCOPY to move the file to Azure blob storage
CREATE EXTERNAL TABLE to map a view over the file in storage
CREATE TABLE AS or INSERT INTO to read from that view into your target table
This will be orders of magnitude faster than your current approach.
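For the last two steps, here is a hedged sketch of the T-SQL issued from Java over JDBC. The server, credentials, external data source (AzureBlobStore), file format (TextFileFormat), column list and table names are all assumptions that have to match objects you create in your own data warehouse.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PolybaseLoad {
  public static void main(String[] args) throws Exception {
    // Server, credentials, external data source, file format and columns are all placeholders.
    String url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdw";
    try (Connection conn = DriverManager.getConnection(url, "loader", "password");
         Statement st = conn.createStatement()) {
      // Step 3: map an external table over the file AZCOPY placed in blob storage.
      st.execute(
          "CREATE EXTERNAL TABLE dbo.StagingRecords_ext (id INT, payload NVARCHAR(4000)) " +
          "WITH (LOCATION = '/extract/', DATA_SOURCE = AzureBlobStore, FILE_FORMAT = TextFileFormat)");
      // Step 4: parallel CTAS load from the external table into the target table.
      st.execute(
          "CREATE TABLE dbo.TargetRecords WITH (DISTRIBUTION = ROUND_ROBIN) AS " +
          "SELECT id, payload FROM dbo.StagingRecords_ext");
    }
  }
}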
20 MB is the defined limit, and it is a hard limit for now. Reducing the batch size will certainly help you work around it.
Link to capacity limits.
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-service-capacity-limits
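If you do stay with plain INSERTs for now, here is a hedged sketch of chunking the JDBC batch so each flush stays well below the 20 MB per-session limit; the connection URL, table and column names are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ChunkedInsert {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:sqlserver://yourserver.database.windows.net:1433;database=yourdw"; // placeholder
    try (Connection conn = DriverManager.getConnection(url, "loader", "password");
         PreparedStatement ps = conn.prepareStatement(
             "INSERT INTO dbo.TargetRecords (id, payload) VALUES (?, ?)")) {        // placeholder table
      int batchSize = 100;            // small enough that a batch stays far below the 20 MB session limit
      for (int i = 1; i <= 1000; i++) {
        ps.setInt(1, i);
        ps.setString(2, "row-" + i);  // stand-in for the real ODI-extracted values
        ps.addBatch();
        if (i % batchSize == 0) {
          ps.executeBatch();          // flush, freeing the prepared-statement memory held by the session
          ps.clearBatch();
        }
      }
      ps.executeBatch();              // flush the remainder
    }
  }
}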

Error in Apache Drill when doing JOIN between two large tables

I'm trying to run a JOIN between two tables, one with 1,250,910,444 records and the other with 385,377,113 records, using Apache Drill.
However, after 2 minutes of execution, it gives the following error:
org.apache.drill.common.exceptions.UserRemoteException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. Failure allocating buffer.
Fragment 1:2 [Error Id: 51b70ce1-29d5-459b-b974-8682cec41961 on sbsb35.ipea.gov.br:31010]
(org.apache.drill.exec.exception.OutOfMemoryException) Failure allocating buffer. io.netty.buffer.PooledByteBufAllocatorL.allocate():64 org.apache.drill.exec.memory.AllocationManager.<init>():80
org.apache.drill.exec.memory.BaseAllocator.bufferWithoutReservation():243
org.apache.drill.exec.memory.BaseAllocator.buffer():225 org.apache.drill.exec.memory.BaseAllocator.buffer():195
org.apache.drill.exec.vector.VarCharVector.allocateNew():394 org.apache.drill.exec.vector.NullableVarCharVector.allocateNew():239
org.apache.drill.exec.test.generated.HashTableGen1800$BatchHolder.<init>():137 org.apache.drill.exec.test.generated.HashTableGen1800.newBatchHolder():697
org.apache.drill.exec.test.generated.HashTableGen1800.addBatchHolder():690 org.apache.drill.exec.test.generated.HashTableGen1800.addBatchIfNeeded():679
org.apache.drill.exec.test.generated.HashTableGen1800.put():610 org.apache.drill.exec.test.generated.HashTableGen1800.put():549
org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase():366
org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext():222
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
...
java.lang.Thread.run():748
Drill configuration information I'm using: planner.memory_limit = 268435456
The server I'm using has 512GB of memory.
Could someone suggest how to solve this problem? Would creating an index for each table be a solution? If so, how do I do that in Drill?
Currently Apache Drill does not support indexing.
Your query fails during the execution stage, so planner.memory_limit won't have any effect.
Currently all you can do is allocate more memory:
make sure you have enough direct memory allocated in drill-env.sh;
use planner.memory.max_query_memory_per_node option.
There is ongoing work in the community to allow spill to disk for the Hash Join
but it's still in progress (https://issues.apache.org/jira/browse/DRILL-6027).
Try setting planner.memory.max_query_memory_per_node as high as possible:
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = (some Value)
Make sure the parameter is set for this session.
Check the DRILL_HEAP and DRILL_MAX_DIRECT_MEMORY settings too.
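As a hedged illustration of both answers, the option can be set and the join run in the same session through Drill's JDBC driver; the ZooKeeper connect string, the 8 GB value and the table names are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DrillJoin {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.drill.jdbc.Driver");
    // ZooKeeper connect string and the 8 GB value are placeholders.
    try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=zkhost:2181");
         Statement st = conn.createStatement()) {
      // Raise the per-node memory budget for THIS session only (value is in bytes).
      st.execute("ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592");
      // Placeholder join; the table and column names stand in for the two large tables.
      try (ResultSet rs = st.executeQuery(
              "SELECT count(*) FROM dfs.tmp.`big1` a JOIN dfs.tmp.`big2` b ON a.id = b.id")) {
        while (rs.next()) {
          System.out.println(rs.getLong(1));
        }
      }
    }
  }
}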

Why does my Dataflow output "timeout value is negative" on insertion to BigQuery?

I have a Dataflow job consisting of ReadSource, ParDo, Windowing, Insert (into a date-partitioned table in BigQuery).
It basically:
Reads text files from a Google Storage bucket using a glob
Processes each line by splitting on a delimiter, changes some values, gives each column a name and data type, and outputs a BigQuery table row together with a timestamp based on the data
Windows into daily windows using the timestamp from step 2
Writes to BigQuery, using per-window tables and the "dataset$datepartition" syntax to specify the table and partition, with the create disposition set to CREATE_IF_NEEDED and the write disposition set to WRITE_APPEND (see the sketch after this list)
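A minimal sketch of the pipeline, written against the Dataflow SDK for Java 1.x classes that appear in the stack trace below; the bucket glob, delimiter, schema and the static "dataset$datepartition" table spec are simplified placeholders rather than my real code.

import java.util.Arrays;
import org.joda.time.Instant;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.io.TextIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.windowing.CalendarWindows;
import com.google.cloud.dataflow.sdk.transforms.windowing.Window;

public class PartitionedWrite {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    TableSchema schema = new TableSchema().setFields(Arrays.asList(
        new TableFieldSchema().setName("value").setType("STRING")));

    p.apply(TextIO.Read.from("gs://my-bucket/input/*"))                  // step 1 (glob is a placeholder)
     .apply(ParDo.of(new DoFn<String, TableRow>() {                      // step 2
       @Override
       public void processElement(ProcessContext c) {
         String[] parts = c.element().split("\\|");                      // placeholder delimiter
         TableRow row = new TableRow().set("value", parts[0]);
         c.outputWithTimestamp(row, new Instant(Long.parseLong(parts[1])));  // timestamp from the data
       }
     }))
     .apply(Window.<TableRow>into(CalendarWindows.days(1)))              // step 3: daily windows
     .apply(BigQueryIO.Write                                             // step 4
         .to("myproject:mydataset.mytable$20160601")                     // "dataset$datepartition" placeholder
         .withSchema(schema)
         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}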
The first three steps seem to run fine, but in most cases the job runs into problems on the last insert step, which produces exceptions in the log:
java.lang.IllegalArgumentException: timeout value is negative at java.lang.Thread.sleep(Native Method)
at com.google.cloud.dataflow.sdk.util.BigQueryTableInserter.insertAll(BigQueryTableInserter.java:287)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.flushRows(BigQueryIO.java:2446)
at com.google.cloud.dataflow.sdk.io.BigQueryIO$StreamingWriteFn.finishBundle(BigQueryIO.java:2404)
at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:158)
at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:196)
at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47)
at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.finish(ParDoOperation.java:65)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:80)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
This exception is repeated ten times.
Finally I get "Workflow failed" as below:
Workflow failed. Causes: S04:Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/
GroupByKey/Read+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/GroupByKey/
GroupByWindow+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/Reshuffle/
ExpandIterable+Insert/DataflowPipelineRunner.BatchBigQueryIOWrite/BigQueryIO.StreamWithDeDup/ParDo(StreamingWrite)
failed.
Sometimes the same job with the same input works without problems, though, which makes this quite hard to debug. Where should I start?
This is a known issue with the BigQueryIO streaming write operation in Dataflow SDK for Java 1.7.0. It is fixed in the GitHub HEAD and the fix will be included in the 1.8.0 release of the Dataflow Java SDK.
For more details, see Issue #451 on the DataflowJavaSDK GitHub repository.

SonarQube 5.1.1 analysis results for different branches giving timeouts with MySQL DB

We are using SonarQube 5.1.1 with a MySQL database. We are facing timeout issues with the database. We ran the MySQL tuning primer script and made some changes to the InnoDB timeout (increased it in /etc/my.cnf), but it made no difference. One of the suggestions from the MySQL tuner output is:
"of 7943 temp tables, 40% were created on disk"
Note: BLOB and TEXT columns are not allowed in memory tables.
Are there any suggestions for dealing with Sonar analysis results for a bunch of different branches?
Perhaps using Postgres instead of MySQL?
We get errors as shown below:
Failed to process analysis report 8 of project "X"
org.apache.ibatis.exceptions.PersistenceException:
Error committing transaction. Cause: org.apache.ibatis.executor.BatchExecutorException:
org.sonar.core.issue.db.IssueMapper.insert (batch index #1) failed.
Cause: java.sql.BatchUpdateException: Lock wait timeout exceeded; try
restarting transaction
Cause: org.apache.ibatis.executor.BatchExecutorException: org.sonar.core.issue.db.IssueMapper.insert (batch index #1) failed.
Cause: java.sql.BatchUpdateException: Lock wait timeout exceeded; try
restarting transaction
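For reference, this is roughly how the InnoDB lock wait timeout mentioned above can be checked and raised at runtime from Java; the connection URL, credentials and the 300-second value are assumptions, and the change does not persist unless it is also set in /etc/my.cnf.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckLockTimeout {
  public static void main(String[] args) throws Exception {
    // Connection URL, credentials and the 300-second value are placeholders.
    try (Connection conn = DriverManager.getConnection("jdbc:mysql://dbhost:3306/sonar", "sonar", "sonar");
         Statement st = conn.createStatement()) {
      // Current value; the InnoDB default is 50 seconds.
      try (ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'innodb_lock_wait_timeout'")) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + " = " + rs.getString(2));
        }
      }
      // Raise it for new connections; needs SUPER privilege and is lost on restart
      // unless it is also set in /etc/my.cnf.
      st.execute("SET GLOBAL innodb_lock_wait_timeout = 300");
    }
  }
}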