I've set up SOLR indices on my computer, and everything works fine.
I'm not experienced with SOLR, but after profiling the "start.jar" process for a while, I've noticed that its RAM consumption jumps around a lot, anywhere from 150 MB to around 400 MB. And this is just for 10K documents!
So as a workaround, I wrote a script that just waits for SOLR to go past my RAM consumption limit (I'm on shared hosting), and when it does, kills start.jar and restarts it.
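Roughly, the script does something like this (a simplified sketch, not my exact code; the 300 MB limit, the Solr path and the use of the psutil library are placeholders for my actual setup):

    import subprocess
    import time

    import psutil  # third-party; just one way to read a process's RSS

    RSS_LIMIT_MB = 300                   # placeholder for my shared-hosting limit
    SOLR_CMD = ["java", "-jar", "start.jar"]
    SOLR_DIR = "/path/to/solr/example"   # placeholder path

    def find_solr():
        # Return the running start.jar process, or None if it isn't up.
        for proc in psutil.process_iter():
            try:
                if "start.jar" in " ".join(proc.cmdline()):
                    return proc
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass
        return None

    while True:
        solr = find_solr()
        if solr is None:
            subprocess.Popen(SOLR_CMD, cwd=SOLR_DIR)       # (re)start Solr
        elif solr.memory_info().rss > RSS_LIMIT_MB * 1024 * 1024:
            solr.kill()                                    # over the limit: kill it
            subprocess.Popen(SOLR_CMD, cwd=SOLR_DIR)       # and start it again
        time.sleep(30)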
Does this have any adverse effects? And if so, what better solutions are there, besides getting more RAM or using cloud-based SOLR (which also costs money)? Sorry if this sounds stupid, but I just need a working solution.
Thank you.
You need to provide some index stats (a quick way to pull the basic numbers is sketched after this list):
How big is your index (no. of docs & size in MB/GB)?
How many fields are indexed/stored?
How much memory is allocated to the JVM?
Do you optimize/commit very often, or in real time?
What is the SOLR Query time you see in logs?
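If you are not sure where to find the first two, something like this against a default local Solr (port 8983, CoreAdmin STATUS handler; adjust the URL for your setup) prints the document count and on-disk size per core:

    import json
    from urllib.request import urlopen

    # Assumes the default example Solr on localhost:8983; change the URL if needed.
    url = "http://localhost:8983/solr/admin/cores?action=STATUS&wt=json"
    with urlopen(url) as resp:
        status = json.load(resp)

    for core, info in status.get("status", {}).items():
        index = info.get("index", {})
        print(core, index.get("numDocs"), "docs,", index.get("sizeInBytes"), "bytes on disk")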
Best
I have an Apache server with 32 GB of RAM. When I start the server and run top to see the resources, it shows me that the CPU is at 95 percent. That isn't normal behaviour, and after a few minutes it raises this error:
apache cannot allocate memory fork unable to fork new process
I don't know how to solve the problem. Any tips?
I had the same problem; to fix it there are two options:
1- Move from micro instances to small; this was the change that solved the problem (micro instances on Amazon tend to have a large CPU steal time).
2- Tune the MySQL database server configuration and the Apache configuration to use a lot less memory (see the rough sizing sketch after this list).
A tuning guide for a low-memory situation such as this one: http://www.narga.net/optimizing-apachephpmysql-low-memory-server/ (but don't use its suggestion of MyISAM tables - horrible...).
These two options make the problem happen much less often. I am still looking for a better solution to close the processes that are done and kill the ones that hang around.
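As a rough illustration of the sizing I mean for option 2 (every number below is an example, not a measurement; check your own per-child RSS with top or ps): Apache's MaxClients has to fit in whatever RAM is left after MySQL and the OS, otherwise the box starts swapping and fork eventually fails.

    # Back-of-envelope MaxClients check; all numbers are examples, not measurements.
    ram_mb = 32 * 1024           # total RAM on the box
    mysql_mb = 4 * 1024          # what MySQL is allowed to use after tuning
    os_and_other_mb = 2 * 1024   # OS, file cache, everything else
    apache_child_mb = 50         # average RSS of one Apache/PHP child (check with top/ps)

    max_clients = (ram_mb - mysql_mb - os_and_other_mb) // apache_child_mb
    print("MaxClients should not exceed roughly", max_clients)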
I have a relatively big file to process - around 10 GB. I suspect it won't fit into my laptop's RAM if MRJob decides to sort it in RAM or something similar.
At the same time, I don't want to set up Hadoop or EMR - the job is not urgent, and I can simply start the worker before going to sleep and get the results the next morning. In other words, I'm quite happy with local mode. I know the performance won't be perfect, but it's OK for now.
So can MRJob process such 'big' files on a single weak machine? If yes, what would you recommend doing (besides setting a custom tmp dir to point to the filesystem, not to the ramdisk, which would be exhausted quickly)? Let's assume we use version 0.4.1.
I think the RAM size won't be an issue with the Python runner of mrjob. The output of each step should be written out to a temporary file on disk, so it should not fill up the RAM, I believe. Dumping output to disk is the way it should be with Hadoop (and the reason why it is slow, due to IO). So I would just run the job (see the minimal example below) and see how it goes.
If the RAM size is an issue, you can create enough swap space on your laptop to at least make it run, though it will be slow if the swap partition isn't on an SSD.
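Just to illustrate what "run the job and see" looks like: a minimal mrjob job run with the local runner, so intermediate data goes through temp files on disk rather than staying in RAM (the file name and the word-count logic are placeholders for your own job):

    # mr_word_count.py - minimal mrjob job, only to show the local-mode workflow.
    # Run it with: python mr_word_count.py -r local big_input.txt > output.txt
    from mrjob.job import MRJob

    class MRWordCount(MRJob):

        def mapper(self, _, line):
            for word in line.split():
                yield word, 1

        def reducer(self, word, counts):
            yield word, sum(counts)

    if __name__ == "__main__":
        MRWordCount.run()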
Virtual Machine:
4CPU
10GB RAM
10GB swap
Java 1.7
-Xms and -Xmx both set to 6144m
Tomcat 7
We observed some very strange behaviour with the JVM. The JVM's resident memory began to shrink and the swap usage shot up to over 50%.
Please see the stats below from our monitoring tools.
http://i44.tinypic.com/206n6sp.jpg
http://i44.tinypic.com/m99hl0.jpg
Any pointers to help understand this would be appreciated.
Thanks!
Or maybe your Java program was idle and didn't need that memory, and you have high swappiness? In that situation your OS would free RAM just in case and leave only the used part resident.
In my opinion, that is actually good behaviour: why should you waste RAM on a process that won't use it?
If you run only this one process on the VM, though, then it would be quite a good idea to set swappiness to 0 or some other small number - that memory was given to this single process, so swapping it out can be disabled.
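If you want to check quickly what the current settings look like, something like this reads them on Linux (read-only; changing swappiness itself is done with sysctl as root, outside of this sketch):

    # Print the current swappiness and how much swap is actually in use (Linux only).
    def read_meminfo():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, value = line.split(":", 1)
                info[key] = int(value.strip().split()[0])  # values are reported in kB
        return info

    with open("/proc/sys/vm/swappiness") as f:
        print("vm.swappiness =", f.read().strip())

    mem = read_meminfo()
    used_kb = mem["SwapTotal"] - mem["SwapFree"]
    print("swap used:", used_kb, "kB of", mem["SwapTotal"], "kB")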
Thanks for the response. Yes, this is closer to system troubleshooting than to Java, but I thought this was the right forum to raise the topic in case anybody has seen such a phenomenon with the JVM.
Anyway, I had already checked top, and no, there was no process other than Java that was hungry for memory. Actually, the second-largest process was using 72 MB (RSS).
No, swappiness is not set aggressively on this system; it is at the default of 60. One additional piece of information I missed sharing is that we have 4 app servers in a cluster, and all of them showed this behaviour at exactly the same time. AFAIK, the JVM does not swap itself out, but the OS would. That is what is confusing me.
All these app servers are in production and busy serving requests, so they are not idle. The used heap size was on average 5 GB of the 6 GB.
The other interesting thing I found was some failure messages in the VMware logs at the same time, which is what I'm investigating.
I have installed MongoDB 2.4.4 on Amazon EC2 with a 64-bit Ubuntu OS and 1.6 GB of RAM.
On this server, only MongoDB is running, nothing else.
But sometimes the CPU usage reaches 99%, with a load average of 500.01, 400.73, 620.77.
I have also installed MMS on the server to monitor what's going on.
Here are the MMS details.
As per the MMS details, indexing is working perfectly for each query.
The suspect details are as below:
1) HIGH non-mapped virtual memory
2) HIGH page faults
Can anyone help me understand what exactly is causing the high CPU usage?
EDIT:
After Dylan Tong's comments, I have reduced the active connections, but there is still high non-mapped virtual memory.
Here's a summary of a few things to look into:
1. Observed a large number of connections and cursors (13k):
- fix: make sure your connection pool is appropriate. For reporting, and your current request rate, you only need a few connections at most. Also, I'm guessing you have an m1.small instance, which means you only have 1 core.
2. Review queries and indexes:
- run your queries with explain() to observe how they are executed (a minimal pymongo example is sketched after this list). The right model normally results in queries pulling only very few documents and making use of an index.
3. Memory (compact and readahead setting):
- make the best use of memory. 1.6 GB is low. Check how much free memory you have, and compare it to what is reported as resident. A couple of common causes of low resident memory are fragmentation and a bad readahead setting. If a lot of documents are moving around, changing size and such, you should run the compact command to defragment your data files. A bad readahead setting can also lead to poor use of memory. Check your readahead setting (http://manpages.ubuntu.com/manpages/lucid/man2/readahead.2.html). Try a few values, starting with low ones (http://docs.mongodb.org/manual/administration/production-notes/). The production notes recommend 32 (for standard 512-byte blocks). Sometimes higher values are optimal if your documents are larger. The hope is that resident memory gets close to your available memory and your page faults start to drop.
If you're using resources to the fullest after all this and you're still capped out on CPU, then it means you need to up your resources.
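For item 2, a minimal pymongo sketch of what I mean by checking a query with explain() (the database, collection, field and query here are placeholders for yours):

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("localhost", 27017)
    coll = client["mydb"]["mycollection"]          # placeholder names

    # Explain the query: check which index (if any) is used and how many
    # documents are scanned compared to how many are returned.
    plan = coll.find({"status": "active"}).explain()
    print(plan)

    # If the plan shows a full collection scan, add an index on the filtered field.
    coll.create_index([("status", ASCENDING)])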
I have a Redis server running version 2.4.5, with an 11 GB dump.rdb loaded into memory.
It is running on EC2 on a high memory 4x extra large instance (70GB total memory).
However, it turns out Redis is already taking up 50 GB of memory and just keeps growing. My dataset is still going to grow larger, probably to around 20 GB, so clearly 70 GB of memory won't be enough. Do you guys have any ideas on how to overcome this limitation or how to make Redis eat less memory?
I've tried 32-bit Redis, but it dies trying to load the data set into memory at startup.
I have also tried maxmemory in the past but got weird results. I haven't tried virtual memory, since I read it is/was going to be deprecated.
Unlike the discussion in the comments, I think this problem can be solved with programming, not server configuration.
Systems like Redis work well sharded. Once you have your scheme set up, you can get it to scale pretty easily. It does take some work in the client code to get it set up, though.
For example...
You could shard it across four instances using a modulo/hash scheme.
Basically, if md5sum(key) % 4 == 0, it goes to server 0; if md5sum(key) % 4 == 1, it goes to server 1, etc.
You'll have to add some logic to your client to make sure it accesses the right one. When you get a record, figure out which server it is supposed to be on, then query that one. If you have to set a record, figure out which server it is supposed to be on, then set it on that one.
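With the redis-py client, that routing logic could look roughly like this (host names and the server count are placeholders):

    import hashlib

    import redis  # redis-py client

    SERVERS = [
        redis.StrictRedis(host="redis0.example.com", port=6379),
        redis.StrictRedis(host="redis1.example.com", port=6379),
        redis.StrictRedis(host="redis2.example.com", port=6379),
        redis.StrictRedis(host="redis3.example.com", port=6379),
    ]

    def node_for(key):
        # md5(key) % number_of_servers decides which instance owns the key.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return SERVERS[int(digest, 16) % len(SERVERS)]

    def shard_set(key, value):
        node_for(key).set(key, value)

    def shard_get(key):
        return node_for(key).get(key)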
The nice thing about this is that it doesn't affect your performance.