I use Snowflake (Standard edition) as a data warehouse and have created 6 warehouses. An API job crashes often with a memory issue (Java heap error), which leads to other jobs failing. I want to understand the role of the JVM here and what can be done to keep the other jobs from failing.
We're running Matillion (v1.54) on an AWS EC2 instance (CentOS), based on Tomcat 8.5.
We have developed a few ETL jobs by now, and their execution takes quite a lot of time (up to hours). We'd like to speed up the execution of our jobs, and I wonder how to identify the bottleneck.
What confuses me is that both the m5.2xlarge EC2 instance (8 vCPUs, 32 GB RAM) and the database (Snowflake) don't get very busy and seem to be mostly idle (regarding CPU and RAM usage as shown by top).
Our environment is configured to use up to 16 parallel connections.
We also added JVM options -Xms20g -Xmx30g to /etc/sysconfig/tomcat8 to make sure the JVM gets enough RAM allocated.
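That is, the relevant entry in /etc/sysconfig/tomcat8 looks roughly like this (assuming the packaged tomcat8 service reads JAVA_OPTS from that file; the exact variable name depends on the packaging):
    JAVA_OPTS="-Xms20g -Xmx30g"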
Our Matillion jobs do transformations and loads into a lot of tables, most of which can (and should) be done in parallel. Still, we see that most of the tasks are processed in sequence.
How can we enhance this?
By default there is only one JDBC connection to Snowflake, so your transformation jobs might be getting forced to run serially for that reason.
You could try bumping up the number of concurrent connections under the Edit Environment dialog, like this:
There is more information here about concurrent connections.
If you do that, a couple of things to avoid are:
Transactions (begin, commit, etc.) will force transformation jobs to run in serial again.
If you have a parameterized transformation job, only one instance of it can ever be running at a time. More information on that subject is here.
Because the Matillion server is just generating SQL statements and running them in Snowflake, the Matillion server itself is unlikely to be the bottleneck. You should make sure that your orchestration jobs submit everything to Snowflake at the same time and that there are no dependencies (unless required) built into your flow.
These steps will be done in sequence:
These steps will be done in parallel (and will depend on Snowflake warehouse size to scale):
Also, try the Alter Warehouse component with a higher concurrency level.
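For reference, the underlying Snowflake SQL is along these lines (the warehouse name and values are only placeholders):
    ALTER WAREHOUSE my_wh SET MAX_CONCURRENCY_LEVEL = 16;
    ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';  -- a bigger warehouse gives more parallel capacity per query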
Environment: Ignite 2.8.1, Java 8
My application's heap fills up a few hours after start. On analyzing the heap dump, I see many instances of classes under org.apache.ignite.internal.processors.query.*. It looks like these objects are not getting cleaned from the heap after query execution, and after some time this leads to failure because the heap is full.
One thing I have realized is that all these entries are for queries that are triggered via Ignite executor services or normal task scheduling services.
Please suggest. Attaching snapshot.
Ignite is just trying to execute the SQL queries that are being sent to it. You would need to investigate the client.
On the Ignite side, you can make sure you're using the right garbage collector and setting the heap size appropriately. There's no One Good Answer, but in general, use the G1 GC, and I'd start with 4 GB of heap space if you're using a lot of SQL.
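As a rough starting point along those lines (values are only illustrative and should be tuned for your workload), the JVM options would be something like:
    -Xms4g -Xmx4g -XX:+UseG1GC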
Environment: Ignite 2.8.1, Java 11
My application runs out of memory a few minutes after start. On analyzing the heap dump created on OOM, I see millions of instances of the class org.apache.ignite.internal.processors.continuous.GridContinuousMessage.
I do not see any direct references of these from my code.
Please suggest. Attaching snapshot.
You seem to have a Continuous Query running whose listener is too slow (or hanging) and cannot process notifications in time, so they pile up.
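If that is the case, a common mitigation is to keep the continuous query's local listener as cheap as possible and hand the heavy work off to your own thread pool. Here is a minimal Java sketch of that idea (the cache name, the types and the process() method are placeholders, not taken from your code):
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import javax.cache.event.CacheEntryEvent;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.ContinuousQuery;

    public class LightweightListenerExample {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("myCache");

            // Pool that does the heavy processing, away from the notification thread.
            ExecutorService workers = Executors.newFixedThreadPool(4);

            ContinuousQuery<Integer, String> qry = new ContinuousQuery<>();

            // Keep the listener itself fast: just hand the event off and return.
            qry.setLocalListener(events -> {
                for (CacheEntryEvent<? extends Integer, ? extends String> e : events)
                    workers.submit(() -> process(e.getKey(), e.getValue()));
            });

            // Keep the returned cursor if you need to stop the continuous query later.
            cache.query(qry);
        }

        private static void process(Integer key, String value) {
            // Placeholder for the real (potentially slow) work.
        }
    }
Note that this only helps if the slow part is inside the listener; if the consumer genuinely cannot keep up with the update rate, the backlog simply moves into the worker pool's queue instead.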
This is the error I get from the log while trying to process a SQL Server 2012 MOLAP cube:
"Time-out occurred while waiting for buffer latch type 3 for page (1:2044928), database ID 2.; 42000." Source="Microsoft SQL Server 2012 Analysis Services" HelpFile="" ErrorCode="3240034318" Description="Errors in the OLAP storage engine: An error occurred while processing the 'Measurement' partition of the measure group for the 'PE cube' cube from the Cube database."
I have scripted the processing task in XMLA and execute it via an SSAS command step in a SQL Server Agent job.
The first step is a Process Update of all dimensions, and this succeeds, but when I Process Data on the cube, the load fails and this error pops up.
I first tried processing with an SSIS package, but this caused the whole server to crash instead of just the job failing. This leads me to believe it is a performance issue, but the machine running the job is an Azure VM with 16 processors and 112 GB RAM, so I don't know where to look. I also tried running the job without any other activity on the server, but that did not help.
The disk containing the SSAS instance still has 500 GB free.
The measure group is querying a table containing 180 million records.
While processing the cube on a dev server with far less data, there are no issues. I once managed to do a Process Full of the whole cube by processing it directly within SSAS, but via DTEXEC, SSISDB or SSDT the processing results in a server crash.
Earlier I got different time-out errors, but after adjusting the SSAS ExternalCommandTimeOut, ExternalConnectionTimeOut and ForceCommitTimeout properties to 0 this did not occur anymore.
I have tried multiple processing settings, and because I think it is a performance issue, I tried to make the processing as light as possible on resources.
Processing Settings:
Object: Cube; Option: Process Data;
Processing Order: Sequential with Separate Transactions.
Writeback Table Option: Use Existing;
Do not process affected objects.
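For reference, a cube-level Process Data command in XMLA has roughly this shape (the database ID below is a placeholder; my actual script is longer):
    <Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Process>
        <Object>
          <DatabaseID>MyOlapDatabase</DatabaseID>
          <CubeID>PE cube</CubeID>
        </Object>
        <Type>ProcessData</Type>
      </Process>
    </Batch>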
Update:
I have processed the measure group which triggered the error on its own; this did not finish, and in the Activity Monitor I saw a lot of waits of type IO_COMPLETION and CXPACKET. When querying sys.dm_exec_requests I see a SELECT with wait_type IO_COMPLETION which has been running for a long time and doing a lot of reads.
Last night I tried to process all measure groups excluding the one which triggered the error earlier, but unfortunately the whole server crashed again...
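For reference, the check against the DMV was roughly of this shape (the column names come from sys.dm_exec_requests):
    SELECT session_id, status, command, wait_type, wait_time, reads, total_elapsed_time
    FROM sys.dm_exec_requests
    WHERE wait_type IN ('IO_COMPLETION', 'CXPACKET');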
Update2:
We have looked into upgrading to premium storage, but this means we have to switch from an A11 to a DS- or GS-series VM. That means resizing the whole VM, which hosts live solutions, resulting in downtime and the effort of restoring the VHDs and replacing the current OS disk, which contains parts of live solutions.
Another option we identified is applying partitioning or improving the underlying queries behind the measures. Unfortunately that is way more effort than anticipated; a quick workaround for now would help a lot in selling a long-term solution improvement.
Update3:
We have had contact with Microsoft and they advise migrating from the A11 VM to a D14 v2 and upgrading to premium storage disks. This will be our next step and will be executed this coming Friday. After the migration I will update or close this post.
If any information is missing, please let me know. Any suggestions that would help me pinpoint the issue would be much appreciated!
The upgrade to a VM better suited to the situation (DS14 v2) and the upgrade to P30 premium storage disks resolved the issues. The problem was not in the way the cube was being processed or configured, but in the hardware used.
I am trying to load a dataset into GraphDB 7.0. I wrote a Python script in Sublime Text 3 to transform and load the data. The program suddenly stopped working and closed, the computer threatened to restart but didn't, and I lost several hours' worth of computation, as GraphDB won't let me query the inserted data. This is the error I get in GraphDB:
The currently selected repository cannot be used for queries due to an error:
org.openrdf.repository.RepositoryException: java.lang.RuntimeException: There is not enough memory for the entity pool to load: 65728645 bytes are required but there are 0 left. Maybe cache-memory/tuple-index-memory is too big.
I set the JVM as follows:
-Xms8g
-Xmx9g
I don't exactly remember what I set as the values for the cache and index memories. How do I resolve this issue?
For the record, the database I need to parse has about 300k records. The program stopped at about 50k. What do I need to do to resolve this issue?
Open the workbench and check the amount of memory you have given to cache memory.
Xmx should be a value that is big enough for
cache-memory + memory-for-queries + entity-pool-hash-memory
Sadly, the last of these cannot be calculated easily because it depends on the number of entities in the repository. You will either have to:
Increase the Java memory with a bigger value for Xmx
Decrease the value for cache memory
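As an illustrative calculation (only the 65,728,645 bytes figure comes from the error above; the other numbers are made up to show the arithmetic):
    heap (-Xmx):                          9 GB
    cache-memory + tuple-index-memory:   ~9 GB  ->  0 bytes left over
    entity pool needs:                    65,728,645 bytes (~63 MB)
So either raise Xmx (for example to 10g) or shrink the cache/tuple-index memory by at least that amount, then restart GraphDB.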