Redis CPU spike - redis

The problem is described and answered here: https://groups.google.com/forum/#!topic/redis-db/egyA1xvhGfo
Unfortunately I do not fully understand the answer.
My concerns are if redis takes up 100% CPU every 5 minutes and if my server only has a single CPU (i.e. staging) would that mean it will freeze my httpd process every 5 minutes?
Would this not be of a concern if my server has multiple CPUs?

Depending on the type of persistence you select, this will happen. The reason is because the standard persist method ( fork and copy-on-write aka cow ) happens after x number of object changes ( or however you have it configured ) and will eat up a fair amount of I/O persisting the database to disk. You'll want at least a spare core on your server for the persistence to happen but it's not so much actual CPU that's being utilized as it is the wait for the I/O. Faster I/O will mitigate the impact of the db saves.

Related

Scheduling on multiple cores with each list in each processor vs one list that all processes share

I have a question about how scheduling is done. I know that when a system has multiple CPUs scheduling is usually done on a per processor bases. Each processor runs its own scheduler accessing a ready list of only those processes that are running on it.
So what would be the pros and cons when compared to an approach where there is a single ready list that all processors share?
Like what issues are there when assigning processes to processors and what issues might be caused if a process always lives on one processor? In terms of the mutex locking of data structures and time spent waiting on for the locks are there any issues to that?
Generally there is one, giant problem when it comes to multi-core CPU systems - cache coherency.
What does cache coherency mean?
Access to main memory is hard. Depending on the memory frequency, it can take between a few thousand to a few million cycles to access some data in RAM - that's a whole lot of time the CPU is doing no useful work. It'd be significantly better if we minimized this time as much as possible, but the hardware required to do this is expensive, and typically must be in very close proximity to the CPU itself (we're talking within a few millimeters of the core).
This is where the cache comes in. The cache keeps a small subset of main memory in close proximity to the core, allowing accesses to this memory to be several orders of magnitude faster than main memory. For reading this is a simple process - if the memory is in the cache, read from cache, otherwise read from main memory.
Writing is a bit more tricky. Writing to the cache is fast, but now main memory still holds the original value. We can update that memory, but that takes a while, sometimes even longer than reading depending on the memory type and board layout. How do we minimize this as well?
The most common way to do so is with a write-back cache, which, when written to, will flush the data contained in the cache back to main memory at some later point when the CPU is idle or otherwise not doing something. Depending on the CPU architecture, this could be done during idle conditions, or interleaved with CPU instructions, or on a timer (this is up to the designer/fabricator of the CPU).
Why is this a problem?
In a single core system, there is only one path for reads and writes to take - they must go through the cache on their way to main memory, meaning the programs running on the CPU only see what they expect - if they read a value, modified it, then read it back, it would be changed.
In a multi-core system, however, there are multiple paths for data to take when going back to main memory, depending on the CPU that issued the read or write. this presents a problem with write-back caching, since that "later time" introduces a gap in which one CPU might read memory that hasn't yet been updated.
Imagine a dual core system. A job starts on CPU 0 and reads a memory block. Since the memory block isn't in CPU 0's cache, it's read from main memory. Later, the job writes to that memory. Since the cache is write-back, that write will be made to CPU 0's cache and flushed back to main memory later. If CPU 1 then attempts to read that same memory, CPU 1 will attempt to read from main memory again, since it isn't in the cache of CPU 1. But the modification from CPU 0 hasn't left CPU 0's cache yet, so the data you get back is not valid - your modification hasn't gone through yet. Your program could now break in subtle, unpredictable, and potentially devastating ways.
Because of this, cache synchronization is done to alleviate this. Application IDs, address monitoring, and other hardware mechanisms exist to synchronize the caches between multiple CPUs. All of these methods have one common problem - they all force the CPU to take time doing bookkeeping rather than actual, useful computations.
The best method of avoiding this is actually keeping processes on one processor as much as possible. If the process doesn't migrate between CPUs, you don't need to keep the caches synchronized, as the other CPUs won't be accessing that memory at the same time (unless the memory is shared between multiple processes, but we'll not go into that here).
Now we come to the issue of how to design our scheduler, and the three main problems there - avoiding process migration, maximizing CPU utilization, and scalability.
Single Queue Multiprocessor scheduling (SQMS)
Single Queue Multiprocessor schedulers are what you suggested - one queue containing available processes, and each core accesses the queue to get the next job to run. This is fairly simple to implement, but has a couple of major drawbacks - it can cause a whole lot of process migration, and does not scale well to larger systems with more cores.
Imagine a system with four cores and five jobs, each of which takes about the same amount of time to run, and each of which is rescheduled when completed. On the first run through, CPU 0 takes job A, CPU 1 takes B, CPU 2 takes C, and CPU 3 takes D, while E is left on the queue. Let's then say CPU 0 finishes job A, puts it on the back of the shared queue, and looks for another job to do. E is currently at the front of the queue, to CPU 0 takes E, and goes on. Now, CPU 1 finishes job B, puts B on the back of the queue, and looks for the next job. It now sees A, and starts running A. But since A was on CPU 0 before, CPU 1 now needs to sync its cache with CPU 0, resulting in lost time for both CPU 0 and CPU 1. In addition, if two CPUs both finish their operations at the same time, they both need to write to the shared list, which has to be done sequentially or the list will get corrupted (just like in multi-threading). This requires that one of the two CPUs wait for the other to finish their writes, and sync their cache back to main memory, since the list is in shared memory! This problem gets worse and worse the more CPUs you add, resulting in major problems with large servers (where there can be 16 or even 32 CPU cores), and being completely unusable on supercomputers (some of which have upwards of 1000 cores).
Multi-queue Multiprocessor Scheduling (MQMS)
Multi-queue multiprocessor schedulers have a single queue per CPU core, ensuring that all local core scheduling can be done without having to take a shared lock or synchronize the cache. This allows for systems with hundreds of cores to operate without interfering with one another at every scheduling interval, which can happen hundreds of times a second.
The main issue with MQMS comes from CPU Utilization, where one or more CPU cores is doing the majority of the work, and scheduling fairness, where one of the processes on the computer is being scheduled more often than any other process with the same priority.
CPU Utilization is the biggest issue - no CPU should ever be idle if a job is scheduled. However, if all CPUs are busy, so we schedule a job to a random CPU, and a different CPU ends up becoming idle, it should "steal" the scheduled job from the original CPU to ensure every CPU is doing real work. Doing so, however, requires that we lock both CPU cores and potentially sync the cache, which may degrade any speedup we could get by stealing the scheduled job.
In conclusion
Both methods exist in the wild - Linux actually has three different mainstream scheduler algorithms, one of which is an SQMS. The choice of scheduler really depends on the way the scheduler is implemented, the hardware you plan to run it on, and the types of jobs you intend to run. If you know you only have two or four cores to run jobs, SQMS is likely perfectly adequate. If you're running a supercomputer where overhead is a major concern, then an MQMS might be the way to go. For a desktop user - just trust the distro, whether that's a Linux OS, Mac, or Windows. Generally, the programmers for the operating system you've got have done their homework on exactly what scheduler will be the best option for the typical use case of their system.
This whitepaper describes the differences between the two types of scheduling algorithms in place.

mongodb high cpu usage

I have installed MongoDB 2.4.4 on Amazon EC2 with ubuntu 64 bit OS and 1.6 GB RAM.
On this server, only MongoDB running nothing else.
But sometime CPU usage reach to 99% and load average: 500.01, 400.73,
620.77
I have also installed MMS on server to monitor what's going on server.
Here is MMS detail
As per MMS details, indexing working perfectly for each queries.
Suspect details as below
1) HIGH non-mapped virtual memory
2) HIGH page faults
Can anyone help me to understand what exactly causing high CPU usage ?
EDIT:
After comments of #Dylan Tong, i have reduced active connetions but
still there is high non-mapped virtual memory
Here's a summary of a few things to look into:
1. Observed a large number of connections and cursors (13k):
- fix: make sure your connection pool is appropriate. For reporting, and your current request rate, you only need a few connections at most. Also, I'm guessing you have a m1small instance, which means you only have 1 core.
2. Review queries and indexes:
- run your queries with explain(), to observe how the queries are executed. The right model normally results in queries only pulling very few documents and utilization of an index.
3. Memory (compact and readahead setting):
- make the best use of memory. 1.6GB is low. Check how much free memory you have, and compare it to what is reported as resident. A couple of common causes of low resident memory is due to fragmentation. If there are alot of documents moving, changing size and such, you should run the compact command to defragment your data files. Also, a bad readahead can lead to poor use of memory as well. Check your readahead setting (http://manpages.ubuntu.com/manpages/lucid/man2/readahead.2.html). Try a few values starting with low values (http://docs.mongodb.org/manual/administration/production-notes/). The production notes recommend 32 (for standard 512byte blocks). Sometimes higher values are optimal if your documents are larger. The hope is that resident memory should be close to your available memory and your page faults should start to lower.
If you're using resources to the fullest after this, and you're still capped out on CPU then it means you need to up your resources.

If not CPU, disk or network, what is the bottleneck during query execution?

I work with SQL Server 2005 and wonder, if not CPU, disk or network, what are users waiting for when SQL Server is working. The strange thing is that system monitor shows that the 4 processors are at an average of 5%, the disk (demonstrated 50MB/s write) works with about 5-8 MB/s, but the execution (inserts and selects) take up to 10 minutes. I'd be happy to install additional hardware, but I don't see what device is the bottleneck and how do I measure its capacity and current workload.
Any advice would be appreciated.
Thanks
additional info: RAM is constantly at about 70% capacity and I am running windows xp.
check your disk read and write 'wait' time. a heavy load database may just make a lot of read and write request with very small piece of data that saturates the IO.
As others mention, disks are rarely the bottleneck when it comes to bandwidth, but rather in the number of IO operations they can perform per second - commonly called IOPS.
The IOPS capabilities of your disks will vary according to disk type, cache and the RAID setup you have.
Another thing you may run into is locking. If you have a lot of concurrent access to the same data, especially inside of large transactions, you may see other transactions being blocked - causing no network, CPU nor disk usage while being blocked, just wasted time.
Probably the disks. If you are seeking all over the place, the throughput (MB/s) will be low even though the disks are running as fast as they can.
Generic advice: try increasing SQLServer's cache, tune your queries, check that the appropriate indexes exist and are used (and that you don't have too many).

What's the best way to 'indicate/numerate' performance of an application?

In the old (single-threaded) days we instructed our testing team to always report the CPU time and not the real-time of an application. That way, if they said that in version 1 an action took 5 CPU seconds, and in version 2 it took 10 CPU seconds, that we had a problem.
Now, with more and more multi-threading, this doesn't seem to make sense anymore. It could be that the version 1 of an application takes 5 CPU seconds, and version 2 10 CPU seconds, but that version 2 is still faster if version 1 is single-threaded, and version 2 uses 4 threads (each consuming 2.5 CPU seconds).
On the other hand, using real-time to compare performance isn't reliable either since it can be influenced by lots of other elements (other applications running, network congestion, very busy database server, fragmented disk, ...).
What is in your opinion the best way to 'numerate' performance?
Hopefully it's not intuition since that is not an objective 'value' and probably leads to conflicts between the development team and the testing team.
Performance needs to be defined before it is measured.
Is it:
memory consumption?
task completion times?
disk space allocation?
Once defined, you can decide on metrics.

SQL Server 2005 - Multiple Processor Usage

We have a 16 processor SQL Server 2005 cluster. When looking at CPU usage data we see that most of the time only 4 of the 16 processors are ever utilized. However, in periods of high load, occasionally a 5th and 6th processor will be used, although never anywhere near the utilization of the other 4. I'm concerned that in periods of tremendously high load that not all of the other processors will be utilized and we will have performance degradation.
Is what we're seeing standard SQL Server 2005 cluster behavior? I assumed that all 16 processors would be utilized at all times, though this does not appear to be the case. Is this something we can tune? Or is this expected behavior? Will SQL server be able to utilize all 16 processors if it comes to that?
I'll consider you did due diligence and validated that the CPU consumption belongs to the sqlservr.exe process, so we're not chasing a red herring here. If not, please make sure the CPU is consumed by sqlservr.exe by checking the Process\% Processor performance counters.
You need to understand the SQL Server CPU scheduling model, as described in Thread and Task Architecture. SQL Server spreads requests (sys.dm_exec_requests) across schedulers (sys.dm_os_schedulers) by assigning each requests to task (sys.dm_os_tasks) that is run by a worker (sys.dm_os_workers). A worker is backed by an OS thread or fiber (sys.dm_os_threads). Most requests (a batch sent to SQL Server) spawn only one task, some requests though may spawn multiple tasks (parallel queries being the most notorious).
The normal behavior of SQL Server 2005 scheduling should be to distribute the tasks evenly, across all schedulers. Each scheduler corresponds to one CPU core. The result should be an even load on all CPU cores. But I've seen the problem you describe a few times in the labs, when the physical workload would distribute unevenly across only few CPUs. You have to understand that SQL Server does not control the thread affinity of its workers, but instead relies on the OS affinity algorithm for thread locality. What that means is that even if SQL Server spreads the requests across the 16 schedulers, the OS might decide to run the threads on only 4 cores. In correlation with this issue there are two problems that may cause or aggravate this behavior:
Hyperthreading. If you enabled hyperthreading, turn it off. SQL Server and hyperthreading should never mix.
Bad drivers. Make sure you have the proper system device drivers installed (for things like main board and such).
Also make sure your SQL 2005 is at least at SP2 level, prefferably at latest SP and all CU applied. Same goes for Windows (do you run Windows 2003 or Windows 2008?).
In theory the behavior could also be explained by a very peculiar workload, ie. SQL sees only few very long and CPU demanding requests that have no parallle option. But that would be an extremly skewed load and I never seen something like that in real life.
Even accounting for IO bottleneck I would check is whether you have processor affinities set up, what your maxdop setting is, whether it is SMP or NUMA which should also affect what maxdop you may wish to set.
when you say you have a 16 processor cluster, you mean 2 SQL servers in a cluster with 16 processors each, or 2 x 8 way SQL servers?
Are you sure that you're not bottlenecking elsewhere? On IO perhaps?
Hard to be sure without hard data, but I suspect the problem is that you're more IO-bound or memory-bound than CPU-bound right now, and 4 processors is enough to keep up with your real bottleneck.
My reasoning is that if there were some configuration problem that was keeping you limited to 4 cpus, you wouldn't see it spill over to the 5th and 6th processors at all.