Can Ignite benefit from Linux huge pages (or Transparent Huge Pages)?
Are there any recommendations or best practices for tuning huge pages on machines with a large amount of physical memory?
It's recommended to disable Transparent Huge Pages. Many products and companies that build high-load or in-memory solutions have found that this Linux feature can cause latency spikes and long GC pauses. Refer to this section for more information about the topic.
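Before changing anything, it helps to check what the kernel is currently doing. Here is a minimal sketch, assuming the usual sysfs location `/sys/kernel/mm/transparent_hugepage/enabled` (some older distributions use a different path) and the usual format where the active mode is shown in brackets. The function names are made up for illustration.

```python
from pathlib import Path

# Assumed standard sysfs location; some older distributions use another path.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from a line like 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_is_disabled(path=THP_ENABLED):
    """True if the active mode is 'never'; None if the file cannot be read."""
    try:
        return active_thp_mode(path.read_text()) == "never"
    except OSError:
        return None
```

Calling `thp_is_disabled()` on a node returns `True` only when the active mode is `never`; a `None` result means the file was not readable (for example on a non-Linux host).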
I'm wondering how Splunk's Files/Directory Monitor feature is implemented so that it keeps a small performance footprint (in terms of CPU, memory and disk I/O) on the system running the data collector.
I'm asking because we are considering running a Universal Forwarder on a production machine to monitor and forward its logs, and we would like to analyze the performance hit to make sure it doesn't affect the availability of the production service.
UF resource usage is minimal, typically < 2% CPU and only a few MB of memory. That varies, of course, based on the number of files monitored and on some configuration items (like Indexer Ack).
Many Splunk customers run a UF on their domain controllers - something they'd be unlikely to do if performance was an issue.
I've been learning etcd for a few hours, and a question suddenly came to me. Redis seems fully capable of covering the functions etcd provides, such as key/value CRUD and watch, and Redis is very simple to use. Why do people choose etcd instead of Redis?
why?
I googled a few posts, but no post told me the reason.
Thanks!
Redis stores data in memory, which makes it very fast but not very durable. If the Redis server dies, it's easy to lose data. etcd stores data in files on disk and performs an fsync across multiple nodes before acknowledging a write to guarantee consistency, which makes it very durable but not very fast.
That's a good trade-off for Kubernetes, which uses etcd for cluster state and configuration, not user data. It would not be a good trade-off for something like user session data, which you might keep in Redis in your app because you need extremely fast response times and can tolerate a bit of data loss or inconsistency.
A major difference which is affecting my choice of one vs the other is:
etcd keeps the data index in RAM and the data store on disk
redis keeps both data index and data store in RAM
Theoretically, this means etcd ought to be a good fit for large data / small memory scenarios, where redis would require large RAM.
In practice, etcd's current behaviour is that it allocates some memory per transaction when data is accessed. Under heavy load, the memory footprint of the etcd server grows without bound (it appears to be limited only by the rate of read requests), and the Go runtime eventually runs out of memory, killing the server.
In contrast, the redis design requires a virtual address space sized in relation to the dataset, or to the partition of the dataset stored locally.
Memory footprint examples
E.g., with redis, an 8GB dataset partition with an index size of 0.5GB requires 8.5GB of virtual address space (i.e., it could be handled with 1GB of RAM and 7.5GB of swap), but not less, and the requirement has an upper bound.
In theory, the same 8GB dataset with etcd would require only 0.5GB of virtual address space, but not less (i.e., it could be handled with 500MB of RAM and no swap). In practice, under high load, etcd's memory use is unbounded.
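The arithmetic behind these two examples can be written down as a small sketch. It only models the theoretical footprint described above (index plus data in memory for redis, index only for etcd) and ignores etcd's per-transaction allocations. The function names are hypothetical.

```python
def redis_address_space_gb(data_gb, index_gb):
    """Redis keeps both the data and the index in (virtual) memory."""
    return data_gb + index_gb


def etcd_address_space_gb(data_gb, index_gb):
    """In theory etcd keeps only the index in memory; the data stays on disk."""
    return index_gb


def swap_needed_gb(address_space_gb, ram_gb):
    """Swap required to back whatever part of the address space RAM cannot hold."""
    return max(0.0, address_space_gb - ram_gb)


redis_total = redis_address_space_gb(8.0, 0.5)   # 8.5
etcd_total = etcd_address_space_gb(8.0, 0.5)     # 0.5
print(redis_total, swap_needed_gb(redis_total, 1.0))  # 8.5 7.5
print(etcd_total, swap_needed_gb(etcd_total, 0.5))    # 0.5 0.0
```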
Other considerations
There are other considerations like data consistency, or supported languages, that have to be evaluated separately.
In my case, the language the server is written in is a factor, as I have in-house C expertise, but no Go expertise. This means I can maintain/diagnose/customize redis (written in C) in-house if needed, but cannot do the same with etcd (written in Go), which I'd have to use as released by the maintainers.
My conclusion
Unfortunately, the memory behaviour of etcd, whereby it needs to allocate memory to access the indexed data, negates the memory advantages it might have theoretically, and the risk of an OOM crash under high load makes it unsuitable for applications that might experience unexpected usage spikes. See GitHub bug 14362, GitHub bug 14352, and other OOM reports.
Furthermore, the ability to customize the server in-house (ie, available C vs Go expertise) is a business consideration that weighs in redis's favour, in my case.
I'm trying to work around an issue which has been bugging me for a while. In a nutshell: on what basis should one assign a max heap size for a resource-hogging application, and is there a downside to it being too large?
I have an application used to visualize huge medical datasets, which can eat up several gigabytes of memory if several imaging volumes are opened side by side. Caching the data to be viewed is essential for a fluent workflow. The software is supported on Windows workstations and is started with a bootloader, which assigns the heap size and launches the main application. The actual memory needed by the main application is directly proportional to the data being viewed and cannot be determined by the bootloader, because that would require reading the data, which would ultimately take too much time.
So, to ensure that the JVM has enough memory at launch, we set -Xmx as large as we dare, which by current design is based on the max physical memory of the workstation. However, is there any downside to this? I've read (in a post from 2008) that it is possible for native processes to hog excess heap space, which can lead to memory errors at runtime. Should I maybe also check the free virtual memory or paging file size prior to assigning heap space? How would you deal with this situation?
Oh, and this is my first post to these forums. Nice to meet you all and be gentle! :)
Update:
Thanks for all the answers. I'm not sure if I put my words right, but my problem arose from the fact that I have zero knowledge of the hardware this software will be run on but would, nevertheless, like to assign as much heap space to the software as possible.
I settled on assigning a heap of 70% of physical memory if a sufficient amount of virtual memory is available, and less otherwise.
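A rough sketch of that rule follows. The 70% figure is from the post. The fallback for "less otherwise" (capping the heap at the free virtual memory) is an assumption, since the exact rule isn't stated, and the function names are hypothetical. The bootloader itself would presumably be written in another language; this only shows the calculation.

```python
def choose_max_heap_mb(physical_mb, free_virtual_mb, percent=70):
    """Pick a max heap size: 70% of physical RAM if virtual memory allows it."""
    target = physical_mb * percent // 100
    # Assumed fallback: never ask for more than the free virtual memory.
    return min(target, free_virtual_mb)


def xmx_flag(heap_mb):
    """Format the value as a JVM -Xmx option in megabytes."""
    return f"-Xmx{heap_mb}m"


print(xmx_flag(choose_max_heap_mb(16384, 32768)))  # -Xmx11468m
print(xmx_flag(choose_max_heap_mb(16384, 4096)))   # -Xmx4096m
```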
You can have heap sizes of around 28 GB with little impact on performance, especially if you have large objects. (Lots of small objects can increase GC pause times.)
Heap sizes of 100 GB are possible but have down sides, mostly because they can have high pause times. If you use Azul Zing, it can handle much larger heap sizes significantly more gracefully.
The main limitation is the size of your physical memory. If your heap exceeds that, your application and your computer will run very slowly or become unusable.
A standard way around these issues with mapping software (which has to be able to map the whole world, for example) is to break your images into tiles. This way you only load the tiles which are on the screen (or partly on the screen). If you need to be able to zoom in and out, you might need to store data at two to four levels of scale. Using this approach you can view a map of the whole world on your phone.
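To make the tiling idea concrete, here is a minimal sketch that works out which tiles a viewport touches, so only those need to be loaded and cached. The 256-pixel tile size and the function name are assumptions for illustration, not something a particular viewer uses.

```python
def visible_tiles(view_x, view_y, view_w, view_h, tile_size=256):
    """Return (col, row) indices of every tile overlapping the viewport.

    Coordinates are in image pixels; view_w and view_h must be positive.
    """
    first_col = view_x // tile_size
    first_row = view_y // tile_size
    # The last visible pixel is at view_x + view_w - 1 (same for y).
    last_col = (view_x + view_w - 1) // tile_size
    last_row = (view_y + view_h - 1) // tile_size
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


print(visible_tiles(200, 0, 100, 100))  # [(0, 0), (1, 0)]
```

For multiple zoom levels, the same calculation would be run against a downscaled copy of the image, one tile grid per level.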
It's best not to set the JVM max heap to more than 60-70% of workstation memory, and in some cases even lower, for two main reasons. First, what the JVM consumes on the physical machine can be 20% or more larger than the heap, due to GC mechanics. Second, the representation of a particular data entity in the JVM heap may not be the only physical copy of that entity in the machine's RAM, as the OS keeps caches and buffers around the various I/O devices from which it reads these objects.
I have installed MongoDB 2.4.4 on Amazon EC2 with a 64-bit Ubuntu OS and 1.6 GB RAM.
Only MongoDB is running on this server, nothing else.
But sometimes CPU usage reaches 99%, with load average: 500.01, 400.73, 620.77
I have also installed MMS on the server to monitor what's going on.
Here is MMS detail
According to MMS, indexes are being used correctly for every query.
The suspicious details are:
1) HIGH non-mapped virtual memory
2) HIGH page faults
Can anyone help me understand what exactly is causing the high CPU usage?
EDIT:
After the comments from Dylan Tong, I have reduced the active connections, but there is still high non-mapped virtual memory.
Here's a summary of a few things to look into:
1. Observed a large number of connections and cursors (13k):
- fix: make sure your connection pool is sized appropriately. For reporting, and at your current request rate, you only need a few connections at most. Also, I'm guessing you have an m1.small instance, which means you only have 1 core.
2. Review queries and indexes:
- run your queries with explain(), to observe how the queries are executed. The right model normally results in queries only pulling very few documents and utilization of an index.
3. Memory (compact and readahead setting):
- make the best use of memory. 1.6GB is low. Check how much free memory you have, and compare it to what is reported as resident. A common cause of low resident memory is fragmentation. If a lot of documents are moving, changing size and such, you should run the compact command to defragment your data files. A bad readahead setting can also lead to poor use of memory. Check your readahead setting (http://manpages.ubuntu.com/manpages/lucid/man2/readahead.2.html). Try a few values, starting low (http://docs.mongodb.org/manual/administration/production-notes/). The production notes recommend 32 (for standard 512-byte blocks). Sometimes higher values are optimal if your documents are larger. The hope is that resident memory ends up close to your available memory and your page faults start to drop.
If you're using resources to the fullest after this, and you're still capped out on CPU then it means you need to up your resources.
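To put the readahead numbers in perspective, here is a small sketch of the arithmetic, assuming the readahead value is expressed in 512-byte blocks as in the production-notes recommendation above. The function name and the value 256 used for comparison are hypothetical.

```python
def readahead_bytes(blocks, block_size=512):
    """Bytes pulled into memory per read for a readahead given in blocks."""
    return blocks * block_size


print(readahead_bytes(32))    # 16384 (16 KB)
print(readahead_bytes(256))   # 131072 (128 KB)
```

If a typical document is only a few KB, a large readahead fills RAM with neighbouring data that may never be touched, which is one way resident memory ends up poorly used on a 1.6GB box.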
Recently I've been working on an ExpressionEngine project that has a performance problem. A test with 50 concurrent connections shows:
Extremely high (100%) CPU usage
Low RAM usage (2 gigs out of 8)
Low CPU/RAM usage on the database
The web server has 4 CPUs. If I turn on the cache, the utilization is lower, but the content is such that dynamic caching had to be turned off. ExpressionEngine is made up of templates that have to be read into memory and parsed. For those not familiar with ExpressionEngine, it is built using CodeIgniter.
My thinking is that if Apache and the ExpressionEngine files were moved off the HDD and onto an SSD, I/O for the templates would be a lot faster, which would lower the CPU utilization by Apache. Would this kind of performance improvement actually happen, or would an SSD make no difference?
An SSD will always be faster than spinny turny disks where disk I/O is concerned, but it doesn't sound like that's where your bottleneck is.
You're not using much RAM and, as you correctly stated, the templates have to be parsed. You have 4 CPUs, but they may be from 1998 (we don't know). If they are more recent, that should be more than enough for 50 concurrent connections, but you may be rendering the contents of the Library of Congress (again, we don't know).
You might get some benefit with tag caching or some of the other techniques mentioned in The Guide.
Also found this: http://eeinsider.com/articles/using-cache-wisely-with-expressionengine/