Optimize memory usage of a very large HashMap

I need to preprocess data from OpenStreetMap. The first step is to store a large number of nodes (more than 200 million) from an unprocessed.pbf file (Europe, ~21 GB). For this I'm using a HashMap. After importing the data into the map, my program checks every single node to see whether it fulfills some conditions; if not, the node is removed from the map. Afterwards, each remaining node in the map is written to a new processed.pbf file.
The problem is that this program uses more than 100 GB of RAM. I want to optimize the memory usage.
I've read that I should adjust the initial capacity and load factor of the HashMap when it holds many entries. Now I'm asking myself which values are best for those two parameters.
I've also seen that with the Oracle JDK (1.8) JVM the memory load rises more slowly than with the OpenJDK (1.8) JVM. Are there any settings I can use for the OpenJDK JVM to minimize memory usage?
Thanks for your help.

There will be a lot of collisions in the HashMap if you don't provide a suitable initial capacity and load factor, which makes key lookups slower.
Generally, with the default load factor of 0.75, you should provide an initial capacity of:
initial capacity = ((number of entries) / loadFactor) + 1
This increases the efficiency of the code: the HashMap has more room to store the data, which reduces the collisions that occur inside the HashMap while looking up a key.
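For illustration, a minimal sketch of pre-sizing the map this way (the node count and the key/value types are assumptions; pre-sizing avoids repeated rehashing during the import, but it does not shrink HashMap's per-entry overhead):

```java
import java.util.HashMap;
import java.util.Map;

public class NodeStore {
    public static void main(String[] args) {
        long expectedNodes = 200_000_000L;  // assumption: rough node count from the .pbf file
        float loadFactor = 0.75f;           // HashMap's default load factor

        // initial capacity = ((number of entries) / loadFactor) + 1, as described above
        int initialCapacity = (int) (expectedNodes / loadFactor) + 1;

        // Sized up front so the map never has to rehash while the 200M+ nodes are imported.
        Map<Long, double[]> nodes = new HashMap<>(initialCapacity, loadFactor);

        nodes.put(123456789L, new double[] {48.137154, 11.576124}); // example node id -> lat/lon
    }
}
```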

Related

Thousands of REDIS Sorted Sets VS millions of Simple Sets

I have come up with 2 options for how to solve the problem I have with AWS ElastiCache (Redis).
I was able to find the differences between these two approaches in terms of time complexity (Big O) and other aspects.
However, there is one question that still bothers me:
Is there any difference for a Redis cluster (in memory consumption, CPU, or any other resources) between handling:
500K larger Sorted Sets (https://redis.io/commands#sorted_set) containing ~100K elements each, or
48 million smaller Simple Sets (https://redis.io/commands#set) containing ~500 elements each?
Thanks in advance for the help :)
You are comparing two different data types; it is better to benchmark both and compare their memory consumption with INFO MEMORY. But I assume both are used with entries of the same length.
If you use the set-max-intset-entries config and stay within its limit while adding to a set (let's say 512), then your memory consumption will be lower than with your first option (given the same value lengths and the same total number of entries). But it doesn't come for free.
The documentation states that
This is completely transparent from the point of view of the user and API. Since this is a CPU / memory trade off it is possible to tune the maximum number of elements and maximum element size for special encoded types using the following redis.conf directives.
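For a concrete comparison, here is a minimal sketch using the Jedis client (the host, key names, and scaled-down counts are assumptions) that loads both layouts and prints the memory section of INFO for each:

```java
import redis.clients.jedis.Jedis;

public class SetMemoryComparison {
    public static void main(String[] args) {
        // assumption: a local Redis instance; in practice this would be the ElastiCache endpoint
        try (Jedis jedis = new Jedis("localhost", 6379)) {

            // Layout 1: fewer, larger sorted sets
            for (int set = 0; set < 10; set++) {                // scaled-down counts for illustration
                for (int member = 0; member < 10_000; member++) {
                    jedis.zadd("zset:" + set, member, "m:" + member);
                }
            }
            System.out.println("sorted sets:\n" + jedis.info("memory"));

            jedis.flushAll();

            // Layout 2: many smaller plain sets; integer-only members can use the compact
            // intset encoding as long as each set stays below set-max-intset-entries
            jedis.configSet("set-max-intset-entries", "512");
            for (int set = 0; set < 200; set++) {
                for (int member = 0; member < 500; member++) {
                    jedis.sadd("set:" + set, String.valueOf(member));
                }
            }
            System.out.println("plain sets:\n" + jedis.info("memory"));
        }
    }
}
```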

How to use chronicle-map instead of redis as a data cache

I intend to use chronicle-map instead of Redis. The application scenario is that a memoryData module loads hundreds of millions of records from the database into chronicle-map every day, and dozens of JVMs continuously read the chronicle-map records. Each JVM has hundreds of threads. But probably because of my lack of understanding of chronicle-map, the code performs poorly and runs ever more slowly, until memory overflows. I wonder whether the practice described above is the correct way to use chronicle-map.
Because Chronicle Map stores your data off-heap, it is able to store more data than you can hold in main memory, but it will perform better if all the data fits into memory (so if possible, consider increasing your machine's memory; if that is not possible, try using an SSD drive). Another reason for poor performance may be how you have sized the map in the Chronicle Map builder, for example how you have set the maximum number of entries: if this is too large, it will affect performance.
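For reference, a minimal sketch of sizing a persisted Chronicle Map through the builder (the key/value types, average sizes, entry count, and file path are assumptions):

```java
import net.openhft.chronicle.map.ChronicleMap;

import java.io.File;
import java.io.IOException;

public class RecordCache {
    public static void main(String[] args) throws IOException {
        ChronicleMap<CharSequence, CharSequence> records = ChronicleMap
                .of(CharSequence.class, CharSequence.class)
                .name("records")
                .entries(300_000_000L)    // keep this close to the real record count, not far above it
                .averageKeySize(16)       // average serialized key size in bytes (assumption)
                .averageValueSize(200)    // average serialized value size in bytes (assumption)
                .createPersistedTo(new File("/data/records.dat")); // file shared with the reader JVMs

        records.put("record:1", "some value");
        records.close();
    }
}
```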

Aerospike DB - lists/maps suitable at large sizes, under high write load?

Overview
Aerospike list/map manipulations via UDFs are copy-on-write (one modification results in an entire rewrite). So, UDF-based appends become progressively expensive as the list/map size grows.
List Manipulation by UDF, Test Results
Time required to append 100 values to a list (each append persisted to disk independently)
Time measured at the Java client
Each result is the average of 10 measurements.
Initial list size = 1 -> 19.6ms
Initial list size = 1000 -> 43.4ms
Initial list size = 10000 -> 237.3ms
Question
Are lists/maps within a single record advisable at large sizes (thousands of values, ~200 kB total), under high write loads?
Aerospike server version 3.7.0 added support for manipulating lists directly through the client API. Check the latest version of the client for your favourite language for support (Java 3.1.8+, Go 1.9.0+, C 3.1.25+). Similar functionality for manipulating maps will follow.
Modifying lists via the native API is much more efficient than via UDFs. To start with, one need not pay the overhead of UDF execution. If the data is in memory, the list is maintained in memory in a format that is very efficient for performing delta operations, so it can easily sustain heavy read/write loads with low latency. Nevertheless, you should always benchmark for your workload.
Single-record lists/maps can be efficiently manipulated via the native client API.
The test results in the question are based on using UDFs, which are slower because UDFs incur overhead (and perhaps cannot modify natively in memory?).
After Ronen's suggestion, I updated my client/server to get the new list API and wrote a new test using com.aerospike.client.cdt.ListOperation. The results of the new test are very fast regardless of list size.
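For reference, a minimal sketch of appending to a list through the native client API with ListOperation (the host, namespace, set, and bin names are assumptions):

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.Value;
import com.aerospike.client.cdt.ListOperation;
import com.aerospike.client.policy.WritePolicy;

public class ListAppendExample {
    public static void main(String[] args) {
        // assumption: a local Aerospike node, namespace "test", set "events"
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        Key key = new Key("test", "events", "user-42");
        WritePolicy policy = new WritePolicy();

        // Each append is a delta operation on the stored list, so the cost does not grow
        // with the existing list size the way the UDF-based copy-on-write rewrite did.
        for (int i = 0; i < 100; i++) {
            client.operate(policy, key, ListOperation.append("values", Value.get(i)));
        }

        // Read the list back to verify the appends
        Record record = client.get(null, key, "values");
        System.out.println(record.getList("values").size());

        client.close();
    }
}
```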

Disadvantages of using Texture Cache / Image2D for 2D Arrays?

When accessing 2D arrays in global memory, using the Texture Cache has many benefits, like filtering and not having to worry as much about memory access patterns. The CUDA Programming Guide names only one downside:
However, within the same kernel call, the texture cache is not kept coherent with respect to global memory writes, so that any texture fetch to an address that has been written to via a global write in the same kernel call returns undefined data.
If I don't have a need for that, because I never write to the memory I read from, are there any downsides/pitfalls/problems when using the Texture Cache (or Image2D, as I am working in OpenCL) instead of plain global memory? Are there any cases where I will lose performance by using the Texture Cache?
Textures can be faster, the same speed, or slower than "naked" global memory access. There are no general rules of thumb for predicting performance using textures, as the speed up (or lack of speed up) is determined by data usage patterns within your code and the texture hardware being used.
In the worst case, where cache hit rates are very low, using textures is slower than normal memory access: each thread first incurs a cache miss and then triggers a global memory fetch, so the resulting total latency is higher than a direct read from memory. I almost always write two versions of any serious code I am developing where textures might be useful (one with and one without), and then benchmark them. Often it is possible to develop heuristics to select which version to use based on the inputs. CUBLAS uses this strategy extensively.

What causes page fault and how to minimize them?

When examining a process in Process Explorer, what does it mean when there are several page faults? The application is processing quite a bit of data and the UI is not very responsive. Are there optimizations to the code that could reduce or eliminate page faults? Would increasing the physical RAM of the system make a difference?
http://en.wikipedia.org/wiki/Page_fault
Increasing the physical RAM on your machine could result in fewer page faults, although design changes to your application will do much better than adding RAM. In general, having a smaller memory footprint, and having things that are often accessed around the same time located on the same page, will decrease the number of page faults. It can also be helpful to do everything you can with a given piece of data while it is in memory, so that you don't need to load it many separate times, which can cause page faults (a.k.a. thrashing).
It might also be helpful to make sure that memory accessed in sequence is located close together (e.g. if you have some objects, place them in an array). If those objects have a lot of data that is used very infrequently, move it into a second class and give the first class a reference to it. This way you will use less memory most of the time.
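For illustration, a rough sketch of that hot/cold field split (the class and field names are hypothetical; note that a Java object array stores references, so the locality benefit is weaker than with a contiguous value array):

```java
// Hot data: small, touched constantly, kept in the primary class.
class Particle {
    double x, y;             // frequently used fields
    ParticleDetails details; // rarely used data lives behind a reference (often null)

    Particle(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

// Cold data: only allocated and touched when actually needed.
class ParticleDetails {
    String debugLabel;
    long createdAtMillis;
}

class Simulation {
    public static void main(String[] args) {
        Particle[] particles = new Particle[1_000_000];
        for (int i = 0; i < particles.length; i++) {
            particles[i] = new Particle(i, -i);
        }

        // The hot loop only touches x and y; the cold ParticleDetails objects
        // (never allocated here) do not take up space in the pages being scanned.
        double sum = 0;
        for (Particle p : particles) {
            sum += p.x + p.y;
        }
        System.out.println(sum);
    }
}
```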
Another design option is to write a memory cache system that creates memory lazily (on demand). Such a cache would hold a collection of pre-allocated memory chunks, accessed by their size: for example, an array of N lists, each list holding M buffers, where list i is responsible for handing out memory in a certain size range (for example, buffers of size 2^i for i = 0..N-1). Even if you want to use less than 2^i, you just don't use the extra space in the buffer.
This trades a small amount of wasted memory for caching and fewer page faults.
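A rough sketch of such a size-class pool (the class names and sizes are made up for illustration):

```java
import java.util.ArrayDeque;

// A minimal size-class buffer pool: list i hands out byte[] buffers of size 2^i.
// Buffers are created lazily on first request and reused afterwards.
class BufferPool {
    private final ArrayDeque<byte[]>[] freeLists;

    @SuppressWarnings("unchecked")
    BufferPool(int numSizeClasses) {
        freeLists = new ArrayDeque[numSizeClasses];
        for (int i = 0; i < numSizeClasses; i++) {
            freeLists[i] = new ArrayDeque<>();
        }
    }

    // Round the request up to the next power of two and reuse a cached buffer if one exists.
    byte[] acquire(int size) {
        int sizeClass = 32 - Integer.numberOfLeadingZeros(Math.max(1, size) - 1);
        byte[] buffer = freeLists[sizeClass].poll();
        return buffer != null ? buffer : new byte[1 << sizeClass]; // lazy allocation
    }

    // Return the buffer to its size class so later requests avoid a fresh allocation.
    void release(byte[] buffer) {
        freeLists[Integer.numberOfTrailingZeros(buffer.length)].push(buffer);
    }
}

class BufferPoolDemo {
    public static void main(String[] args) {
        BufferPool pool = new BufferPool(21);  // size classes up to 2^20 = 1 MB
        byte[] a = pool.acquire(1500);         // gets a 2048-byte buffer
        pool.release(a);
        byte[] b = pool.acquire(2000);         // reuses the same 2048-byte buffer
        System.out.println(a == b);            // true: no new allocation was needed
    }
}
```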
Another option is to use nedmalloc.
good luck
Lior