Can I store a 100GB string in a FlatBuffer? 10 TB (in 2019 you can buy servers with more RAM than that)?
Is it just limited by how much RAM/swap my server has, or is there a hard limit you need to set, like with Protocol Buffers (which you can't set above 2GiB - 1 byte)?
A single FlatBuffer is currently limited to 2GB (it uses 32-bit signed offsets). If you want to store more than that, you'd have to use a sequence of buffers.
This kinda makes sense, because FlatBuffers are meant to be contiguous in memory, so it puts more of a strain on your memory system than Protobuf (where you could stream 100GB data from disk which would then end up as discontinuous data in memory).
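To make the "sequence of buffers" idea concrete, here is a minimal Python sketch that splits a large payload into pieces that each stay under the 2GB limit. It does not use the FlatBuffers library; the names `MAX_BUFFER_BYTES`, `split_into_chunks` and `chunks_needed` are made up for illustration. In a real setup each piece would go into its own FlatBuffer, which also holds its own bookkeeping data, so the payload per buffer would have to be somewhat smaller than the limit.

```python
# Largest size reachable with a signed 32-bit offset (the limit mentioned above).
MAX_BUFFER_BYTES = 2**31 - 1


def split_into_chunks(payload, chunk_size=MAX_BUFFER_BYTES):
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    if chunk_size <= 0 or chunk_size > MAX_BUFFER_BYTES:
        raise ValueError("chunk_size must be between 1 and 2**31 - 1")
    view = memoryview(payload)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]


def chunks_needed(total_bytes, chunk_size=MAX_BUFFER_BYTES):
    """Number of chunks required for total_bytes (ceiling division)."""
    return -(-total_bytes // chunk_size)


if __name__ == "__main__":
    # Tiny demo with a 4-byte chunk size so it runs instantly.
    for part in split_into_chunks(b"abcdefghij", 4):
        print(bytes(part))
    # A 100GB string (taken here as 10**11 bytes) needs this many buffers:
    print(chunks_needed(100 * 10**9))
```

With the limit as the chunk size, a 100GB string (10**11 bytes) works out to 47 buffers.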
I agree, with mmap and friends, there are definitely use cases for >2GB nowadays. There are some plans for a 64-bit extension: https://github.com/google/flatbuffers/projects/10#card-14545298
I have read that Aerospike provides better performance than other NoSQL key-value databases because it uses flash disks. But DRAM is faster than flash. How, then, can it perform better than Redis or Scalaris, which use only DRAM? Is it because of Aerospike's own system for accessing flash disks directly?
Thank you for your answer.
Aerospike gives you the flexibility to store all or part of your data (segregated by "namespaces") in any of the following ways: in DRAM; in Flash (writing in blocks on the device without using a filesystem); in both DRAM and Flash simultaneously (not the best use of your spend); or in DRAM and in files on Flash or HDD (i.e. using a filesystem). DRAM with a filesystem on HDD gives you the performance of DRAM with the inexpensive persistence of spinning disks. Finally, for single-value integer or float data, there is an extremely fast option of storing the data in the Primary Index itself (the data-in-index option).
You can mix storage options in Aerospike, i.e. store one namespace's records purely in DRAM and another namespace's records purely in Flash, on the same horizontally scalable cluster of nodes. You can define up to 32 namespaces in Aerospike Enterprise Edition.
The Flash and HDD options allow you to persist your data.
Pure DRAM storage in Aerospike will definitely give you better latency than storing on a persistent layer.
Storing on Flash in "blocks" (i.e. without using the filesystem) is the sweet spot in Aerospike: it gives you "RAM-like" performance with persistence. Other NoSQL solutions that store purely in DRAM don't give you persistence, and those that offer persistence only through files on a filesystem will quite likely give you much higher latency.
How do I set the setRAMBufferSizeMB parameter? Does it depend on the RAM size of the machine, on the size of the data that needs to be indexed, or on some other parameter? Could someone please suggest an approach for deciding the value of setRAMBufferSizeMB?
Here is what the Lucene javadoc says about this parameter:
Determines the amount of RAM that may be used for buffering added
documents and deletions before they are flushed to the Directory.
Generally for faster indexing performance it's best to flush by RAM
usage instead of document count and use as large a RAM buffer as you
can. When this is set, the writer will flush whenever buffered
documents and deletions use this much RAM.
The maximum RAM limit is inherently determined by the JVMs available
memory. Yet, an IndexWriter session can consume a significantly larger
amount of memory than the given RAM limit since this limit is just an
indicator when to flush memory resident documents to the Directory.
Flushes are likely happen concurrently while other threads adding
documents to the writer. For application stability the available
memory in the JVM should be significantly larger than the RAM buffer
used for indexing.
By default, Lucene uses 16 MB for this parameter (which suggests to me that you don't need a very large value to get good indexing speed). I would recommend tuning it by setting it to, say, 500 MB and checking how well your system behaves. If you get crashes, try a smaller value such as 200 MB, and so on, until your system is stable.
Yes, as stated in the javadoc, this parameter depends on the JVM heap, but for Python, I think it could allocate memory without any limit.
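The trial-and-error approach above can be written down as a small loop. This is only a sketch in plain Python, not Lucene code: `pick_ram_buffer_mb` and the `indexing_is_stable` callback are hypothetical names, and the callback stands in for "run a real indexing job with this buffer size and report whether it finished without crashing".

```python
def pick_ram_buffer_mb(candidates_mb, indexing_is_stable):
    """Return the first candidate size (in MB) whose trial run is stable.

    candidates_mb: sizes to try, ordered from largest to smallest.
    indexing_is_stable: hypothetical callback taking a size in MB and
        returning True if a trial indexing run with that size succeeded.
    Returns None if no candidate was stable.
    """
    for size_mb in candidates_mb:
        if indexing_is_stable(size_mb):
            return size_mb
    return None


if __name__ == "__main__":
    # Stand-in for a real trial run: pretend anything above 256 MB crashes.
    def fake_trial(size_mb):
        return size_mb <= 256

    print(pick_ram_buffer_mb([500, 200, 16], fake_trial))
```

The value returned by such a loop would then be the one passed to setRAMBufferSizeMB.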
Is it possible to make a CPU work directly on a large persistent storage medium without using RAM? I am fine with low performance here.
You will need to specify the CPU architecture you are interested in. Most standard architectures (x86, Power, ARM) assume the existence of RAM on their data bus; I am afraid only a custom board for these processors would allow using something like an SSD instead of RAM.
Some numbers comparing RAM vs SSD latencies: https://gist.github.com/jboner/2841832
Also, RAM is there for a reason: to "smooth" the CPU's access to bigger storage.
Have a look at the image from https://www.reddit.com/r/hardware/comments/2bdnny/dram_vs_pcie_ssd_how_long_before_ram_is_obsolete/
As a side note, it is possible to access persistent storage without involving the CPU (although RAM is still needed); see https://www.techopedia.com/definition/2767/direct-memory-access-dma or https://en.wikipedia.org/wiki/Direct_memory_access
As memory is much slower than the CPU, it should send data in blocks of some 'x' bytes.
What would the size of this 'x' be?
Is the data line between memory and CPU also an x*8-bit lane?
If I access an address 'A' in memory, would it send all the next x-1 memory addresses to the cache?
What is the approximate frequency at which a memory bus works?
SIMD - Do SSE and MMX extensions somehow leverage this bulk reading feature?
Please feel free to provide any references.
Thanks in advance.
The size 'x' is generally the size of a cache line. Cache line size depends on the architecture, but Intel and AMD use 64 bytes.
At least . If you have more channels, you can fetch more data from different channels.
Not exactly the next x-1 memory addresses. You can think of memory as divided into 64-byte chunks. Every time you access even one byte, you bring in the whole chunk that your address belongs to. Let's assume you want to access address 123 (decimal). It falls in the chunk spanning addresses 64 to 127, so you bring in that whole chunk. This means you bring in not only the following addresses but possibly the preceding ones as well, depending on which address you access.
That depends on the version of DDR your CPU supports. You can check some numbers here: https://en.wikipedia.org/wiki/Double_data_rate
Yes, they do. When you bring data from memory into the caches, you bring in one cache line, and SIMD extensions work on multiple data elements in a single instruction. So if you want to add 4 values in one instruction, the data you need would already be in the cache (since you brought in the whole chunk), and you just read it from there.
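The address-123 example can be checked with a few lines of Python. This is just the alignment arithmetic, assuming a 64-byte line as on Intel and AMD; `cache_line_range` is a name made up for this sketch.

```python
CACHE_LINE_BYTES = 64  # typical for Intel and AMD; other CPUs may differ


def cache_line_range(address, line_size=CACHE_LINE_BYTES):
    """Return (first, last) byte addresses of the line containing address."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line_size must be a positive power of two")
    first = address & ~(line_size - 1)  # clear the low bits to align down
    return first, first + line_size - 1


if __name__ == "__main__":
    print(cache_line_range(123))  # (64, 127)
```

Accessing address 123 pulls in bytes 64 through 127, so both the 59 bytes before it and the 4 bytes after it arrive in the cache together.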
I am tuning my PostgreSQL db effective_cache_size. The PostgreSQL documentation references the expected available memory in PostgreSQL buffer caches to calculate the expected memory available for disk caching. How do I estimate this? Is the shared_buffers the only memory allocated for the buffer caching?
effective_cache_size represents the total memory of the machine minus what you know is used for something other than disk caching.
From Greg Smith's 5-Minute Introduction to PostgreSQL Performance:
effective_cache_size should be set to how much memory is leftover for
disk caching after taking into account what's used by the operating
system, dedicated PostgreSQL memory, and other applications
shared_buffers is considered in this sentence as "dedicated PostgreSQL memory", but other than that, it's not correlated to effective_cache_size.
On Linux, if you run free when your system is at its typical memory usage (all applications running and caches warm), the cached field gives a good value for effective_cache_size.
If you use monitoring tools that produce graphs, you can see the cached size over long periods of time at a glance.
One typical suggestion for a dedicated Postgres server is to set effective_cache_size to about 3/4 of your available RAM. A good tool to use for setting sane defaults is pgtune, which can be found here: https://github.com/gregs1104/pgtune
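The two estimates mentioned in this thread (the "leftover" memory from the quote and the 3/4 rule of thumb) are simple arithmetic. The Python below is a rough sketch; the function names and the example figures are made up for illustration, and the real inputs would come from looking at your own server.

```python
def leftover_for_disk_cache_mb(total_ram_mb, os_mb, postgres_dedicated_mb,
                               other_apps_mb):
    """Memory left for disk caching after the listed consumers, in MB."""
    used = os_mb + postgres_dedicated_mb + other_apps_mb
    return max(total_ram_mb - used, 0)  # never report a negative size


def three_quarters_rule_mb(total_ram_mb):
    """Rule of thumb for a dedicated server: about 3/4 of RAM, in MB."""
    return total_ram_mb * 3 // 4


if __name__ == "__main__":
    # Hypothetical 16 GB machine: 1 GB for the OS, 4 GB of dedicated
    # PostgreSQL memory, 512 MB for other applications.
    print(leftover_for_disk_cache_mb(16384, 1024, 4096, 512))  # 10752
    print(three_quarters_rule_mb(16384))  # 12288
```

Either number is only a starting point; comparing it with the cached value reported by free on a warm system is a reasonable sanity check.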