When I try to export data, it runs out of memory regardless of table size (even for empty tables).
Out of memory! ] 162926/498508267 rows (0.0%) on total estimated data (14 sec., avg: 11637 recs/sec)
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Oracle::db handle (DESCRIPTION=(ADDRESS=(PORT=1521)(HOST=192.168.0.42)
(PROTOCOL=tcp))(CONNECT_DATA=(SID=orcl))) at
/usr/local/lib/perl/5.18.2/DBD/Oracle.pm line 348.
Asking the author, I got this response:
You don't have enough memory. If you can't increase the memory size, then reduce the value of DATA_LIMIT in ora2pg.conf. Try with 5000, and if that doesn't work, use 2500.
I opened ./config/ora2pg.conf and set DATA_LIMIT to 5000, which solved the issue.
I originally tried adding more RAM, but I only doubled it from 2 GB to 4 GB, and that did not help. Reducing DATA_LIMIT was the solution.
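To illustrate why a smaller batch size helps: as far as I understand it, DATA_LIMIT controls how many rows ora2pg fetches per batch, so peak memory grows with that number rather than with the table size. The sketch below is not ora2pg code. It uses Python's sqlite3 as a stand-in database, and the table `t` and the function `export_in_batches` are made up for the illustration.

```python
import sqlite3


def export_in_batches(conn, query, batch_size):
    """Yield rows from `query` in lists of at most `batch_size` rows.

    Only one batch is held in memory at a time, so a smaller batch_size
    lowers peak memory at the cost of more round trips.
    """
    cur = conn.cursor()
    cur.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


# Hypothetical demo data: 12000 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(i, "x" * 100) for i in range(12000)],
)

batch_sizes = [
    len(batch)
    for batch in export_in_batches(conn, "SELECT id, payload FROM t", 5000)
]
print(batch_sizes)  # [5000, 5000, 2000]
```

With a batch size of 5000 the exporter never holds more than 5000 rows at once, which is the same trade-off as lowering DATA_LIMIT.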
I am trying to convert a JSON file to CSV using Lambda.
I am using Pandas for this operation.
Initially I started with the following configuration :
File Size : 5 MB
Memory : 128 MB
It took me around 5 seconds to complete the conversion.
Then I increased the file size to 10 MB, and the behavior became erratic.
I am basically trying to benchmark this operation.
Sometimes the file is processed successfully, and sometimes the function times out with this message:
REPORT RequestId: 28e55591-e6a7-4344-b5bc-321bd03422b6 Duration: 900089.03 ms Billed Duration: 900000 ms Memory Size: 128 MB Max Memory Used: 129 MB
This is clearly a memory issue, but I cannot work out the root cause.
It would be great if someone could help me understand this behavior.
Sometimes the Lambda is also re-triggered, and the file is then processed.
It's due to your use of a pandas DataFrame. It needs a lot more memory to hold the data than the size of the file itself. You can check how much memory the DataFrame needs with df.info(memory_usage='deep').
If you just need to convert JSON to CSV, a better way would be to use the stdlib modules json and csv and code it yourself.
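A minimal sketch of the stdlib approach, assuming the input is a JSON array of flat objects (the question does not show the real file layout). The function name `json_to_csv` and the sample data are made up. Note that json.loads still parses the whole document into memory, but plain dicts avoid the extra DataFrame overhead.

```python
import csv
import io
import json


def json_to_csv(json_text, out):
    """Write a JSON array of flat objects to the file-like `out` as CSV."""
    records = json.loads(json_text)

    # Collect column names in first-seen order, since records may differ.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)

    # Keys missing from a record are written as empty cells.
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical sample input.
sample = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "tag": "x"}]'
buf = io.StringIO()
json_to_csv(sample, buf)
csv_text = buf.getvalue()
print(csv_text)
```

In a Lambda handler the same function could write to a file under /tmp or to an in-memory buffer before uploading the result.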
I'm using GraphDB Free 8.6.1 in a research project. I'm running it with the default configuration on a Linux server with 4 GB of memory.
However, it has started to throw exceptions pointing to insufficient memory:
Caused by: org.eclipse.rdf4j.repository.RepositoryException: Query evaluation error: Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb (HTTP status 500)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1143)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1066)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQueryViaHttp(SPARQLProtocolSession.java:834)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getTupleQueryResult(SPARQLProtocolSession.java:763)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendTupleQuery(SPARQLProtocolSession.java:391)
at org.eclipse.rdf4j.repository.http.HTTPTupleQuery.evaluate(HTTPTupleQuery.java:69)
Please, can you help me to identify the problem?
How can I properly configure GraphDB?
The behavior you observe is part of the memory optimization of DISTINCT/GROUP BY operations. The error message relates to the default threshold of 250 MB, and it is there to let you know that you need to adjust your memory. When the free heap memory drops below the threshold, a QueryEvaluationException is thrown to avoid running out of memory because of a hungry DISTINCT/GROUP BY operation. You can reduce the threshold to minimize those errors by passing the following argument when starting GraphDB: "-Ddefaut.min.distinct.threshold=XXX" (where XXX is the threshold as an amount of memory in bytes).
Insufficient free Heap Memory 238Mb for group by and distinct, threshold:250Mb, reached 0Mb
238Mb = free heap space reported by the JVM
250Mb = the default threshold below which the protection should raise an exception to prevent OME
0Mb = the current buffer used for distinct and group by
I suspect another operation takes most of your RAM, and once you run any DISTINCT/GROUP BY query it immediately stops because of the OME protection.
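If you want to watch these numbers in logs, the three values in the message can be pulled out with a regular expression. This is only a sketch based on the single message quoted above. The pattern, the function name `parse_heap_error`, and the dictionary keys are my own, and the wording of the message may differ between GraphDB versions.

```python
import re

# Pattern derived from the one error message quoted in the question.
PATTERN = re.compile(
    r"Insufficient free Heap Memory (\d+)Mb for group by and distinct, "
    r"threshold:(\d+)Mb, reached (\d+)Mb"
)


def parse_heap_error(message):
    """Return the three values from the message as ints, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    free, threshold, used = (int(group) for group in match.groups())
    return {
        "free_heap_mb": free,       # free heap space reported by the JVM
        "threshold_mb": threshold,  # protection threshold
        "buffer_used_mb": used,     # buffer used for distinct and group by
    }


msg = (
    "Query evaluation error: Insufficient free Heap Memory 238Mb for "
    "group by and distinct, threshold:250Mb, reached 0Mb (HTTP status 500)"
)
info = parse_heap_error(msg)
print(info)
```

Here the free heap (238) is already below the threshold (250) while the DISTINCT/GROUP BY buffer is still at 0, which matches the suspicion that something else is using the heap.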
This answer would have helped me.
For example, if your machine has 32 GB of RAM:
/opt/graphdb-free/app/graphdb-free.cfg (cutout)
[JVMOptions]
-Xms20G
-Xmx20G
-XX:PermSize=4G
-XX:MaxPermSize=4G
-Dfile.encoding=UTF-8
-Djava.net.preferIPv4Stack=true
--add-exports
jdk.management.agent/jdk.internal.agent=ALL-UNNAMED
--add-opens
java.base/java.lang=ALL-UNNAMED
Via GUI and Settings:
graphdb.page.cache.size 10G
Recently I have been testing the proper usage of the ext4 filesystem. What I expect is this:
when the system crashes, data whose write has returned OK must not be lost, although metadata may be.
Here is my usage:
1. Call fallocate to allocate a certain amount of space:
fallocate(fd, 0, 0, 4*1024*1024); //4MB
2. Call fsync(fd) so that data and metadata are written to disk.
3. Then I randomly write the file in 4 KB chunks (random data, never all zeros) with the O_DIRECT flag, but without calling fsync. I log the offset of every write that returns OK.
4. Check the logged offsets. At some offsets, reading 4 KB returns zeros. That seems to mean those offsets are unused, as in a file with holes.
My questions are:
1. Why, after calling fallocate and fsync, does the file's metadata still seem to indicate that some blocks are unused, so that reading them returns zeros? That is my understanding, at least.
2. Is there another API I can call to make sure the allocated space contains no holes, so that once an O_DIRECT write returns OK the data will not be lost even if the system crashes?
Thanks.
Only writing to the file space can eliminate the hole. Without writing, there is no dirty page and fsync simply does nothing.
I am wondering how you executed your step 4. It seems that you did it after a manual crash, did you? If you read the data after writing without a crash, it should not be zero, provided you wrote non-zeros. If you read it after a crash, zeros can appear if a disk cache was in use. However, these zeros are not like holes; they are zeros read from the disk (very probably the disk contains zeros).
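A minimal sketch of "writing to the file space" before the real workload starts: fill the whole region with actual data (zeros here) and fsync once. The function name `prefill_file` and the demo path are made up. Whether this is enough for the crash guarantee you want with later O_DIRECT writes is an assumption I have not verified, and it still depends on the disk write cache as mentioned above.

```python
import os
import tempfile


def prefill_file(path, size, chunk_size=1024 * 1024):
    """Write `size` zero bytes to `path`, then fsync data and metadata."""
    chunk = bytes(chunk_size)  # a buffer of zero bytes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so track progress.
            written = os.write(fd, chunk[:min(chunk_size, remaining)])
            remaining -= written
        os.fsync(fd)
    finally:
        os.close(fd)


# Demo on a temporary file: prefill 4 MB, as in the question.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "data.bin")
    prefill_file(demo_path, 4 * 1024 * 1024)
    filled_size = os.path.getsize(demo_path)

print(filled_size)  # 4194304
```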
Is there any way to find out from a dump file how many garbage collections have been performed for the different generations? When I try to run some psscor4 commands, I get the following.
0:003> !GCUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
0:003> !CLRUsage
The garbage collector data structures are not in a valid state for traversal.
It is either in the "plan phase," where objects are being moved around, or
we are at the initialization or shutdown of the gc heap. Commands related to
displaying, finding or traversing objects as well as gc heap segments may not
work properly. !dumpheap and !verifyheap may incorrectly complain of heap
consistency errors.
Error: Requesting GC Heap data
I can get output from !EEHeap though, but it does not give me what I am looking for.
0:003> !EEHeap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0000000002c81030
generation 1 starts at 0x0000000002c81018
generation 2 starts at 0x0000000002c81000
ephemeral segment allocation context: none
segment begin allocated size
0000000002c80000 0000000002c81000 0000000002c87fe8 0x6fe8(28648)
Large object heap starts at 0x0000000012c81000
segment begin allocated size
0000000012c80000 0000000012c81000 0000000012c9e358 0x1d358(119640)
Total Size: Size: 0x24340 (148288) bytes.
------------------------------
GC Heap Size: Size: 0x24340 (148288) bytes.
Dumps
You can see the number of garbage collections in Performance Monitor. However, the way performance counters work makes me believe that this information is not available in a dump file and probably not even available during live debugging.
Think of Debug.WriteLine(): once the text was written to the debug output, it is gone. If you didn't have DebugView running at the time, the information is lost. And that's good, otherwise it would look like a memory leak.
Performance counters (as I understand them) work in a similar fashion. Various "pings" are sent out for someone else (the performance monitor) to record. If no one does, the ping with all its information is gone.
Live debugging
As already mentioned, you can try performance monitor. If you prefer WinDbg, you can use sxe clrn to see garbage collections happen.
PSSCOR
The commands you mentioned do not show information about the garbage collection count:
0:016> !gcusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
0:016> !clrusage
Number of GC Heaps: 1
------------------------------
GC Heap Size 0x36d498(3,593,368)
Total Commit Size 0000000000384000 (3 MB)
Total Reserved Size 0000000017c7c000 (380 MB)
Note: I'm using PSSCOR2 here, since I have the same .NET 4.5 issue on this machine. But I expect the output of PSSCOR4 to be similar.
I would like to know how the system is impacted when DBMS_OUTPUT.ENABLE(NULL) is set, i.e. the buffer size is unlimited.
Also, I have another question.
DBMS_OUTPUT.ENABLE (buffer_size IN INTEGER DEFAULT 20000);
The buffer size can be set between 2000 and 1000000.
But what happens if the buffer size is set to more than 1000000?
For example: DBMS_OUTPUT.ENABLE(2000000)
Why don't you try it and see what happens? The buffer is held in PGA memory, so you can easily track your consumption. There is of course limited memory in real life, so it depends on that amount.