Running Redis with Docker (performance issue) - redis

Has anyone else seen performance issues with running Redis in a Docker container environment?
Here's what I've noticed...
Setup A: Local machine, traditional Redis install
Setup B: Local machine, using canonical Redis image https://registry.hub.docker.com/_/redis/
I've got an identical HTTP server on my local machine that fires requests as fast as the request/response cycle will allow.
Observations:
- A can sustain approximately 2X the throughput of B.
- B performs identically to A when benchmarked from within the container.
So this leads me to believe that B is slower than A because of a networking issue: i.e. the extra networking layers introduced by running software in a virtualized environment are creating significant performance overhead...
Just wondering if anyone else has noticed anything like this?

Docker's default networking option, --net=bridge, introduces overhead due to NAT packet rewriting, which becomes noticeable at high packet rates.
Network performance can be improved with --net=host, which instructs Docker not to create a separate network stack for the container, giving it full access to the host's network interfaces.
This option should be used carefully, though: it lets container processes open low-numbered ports like any other root process and access local network services such as D-Bus, which can lead to processes in the container doing unexpected things.
In short: if you know what you are running inside the container, it is safe. If you suspect unwanted or aggressive behavior, do not use it.
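To quantify the difference, a rough probe like the sketch below can be run once against the bridged container's published port and once against Redis on the host network (or with --net=host). It is only a minimal sketch, assuming the redis-py package is installed and Redis is reachable on localhost:6379; adjust host and port for your setup.
import time
import redis

# Minimal sketch: time a burst of SET commands to compare round-trip cost.
r = redis.Redis(host="localhost", port=6379)  # point this at each setup in turn

N = 10000
start = time.perf_counter()
for i in range(N):
    r.set("bench:key", i)
elapsed = time.perf_counter() - start
print(f"{N / elapsed:.0f} SET ops/sec, {elapsed / N * 1e6:.1f} us average round trip")
If the per-operation latency roughly doubles through the bridge, that matches the 2X throughput gap described above.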

Related

HTTPD response time is increasing after 90 TPS

I am doing a load test to tune my Apache to serve the maximum number of concurrent HTTPS requests. Below are the details of my test.
System
I dockerized my httpd and deployed it in OpenShift; the pod configuration is 4 CPU, 8 GB RAM.
I am running load from JMeter with 200 threads, a 600-second ramp-up time, an infinite loop count, and a long-run duration (JMeter runs in the same network, on a VM with 16 CPU, 32 GB RAM).
I compiled httpd with the worker MPM and deployed it in OpenShift.
Issue
1. Httpd is not scaling beyond 90 TPS, even after trying multiple MPM worker configurations (no difference between the default and higher settings).
2. After 90 TPS, the average response time increases and TPS drops.
Please let me know what could be the issue; I can provide further information if required.
I don't have the answer, but I do have questions.
1/ What does your Dockerfile look like?
2/ What does your OpenShift cluster look like? How many nodes? Separate control plane and workers? What version?
2b/ Specifically, how is traffic entering the pod? (If you are going in via a route, you'll want to look at your load balancer; if you want to exclude OpenShift from the equation, then for the short term expose a NodePort and have JMeter hit that directly; see the sketch after this list.)
3/ Do I read correctly that your single pod was assigned an 8 GB RAM limit? Or did you mean the worker node has 8 GB RAM?
4/ How did you deploy the app -- raw pod, deployment config? Any cpu/memory limits set, or assumed? Assuming a deployment, how many pods does it spawn? What happens if you double it? Doubled TPS or not - that'll help point to whether the problem is inside httpd or inside the ingress route.
5/ What's the nature of the test request? Does it make use of any files stored on the network, or "local" files provisioned in a network PV?
And,
6/ What are you looking to achieve? Maximum concurrent requests in one container, or maximum requests in the cluster? If you haven't already, look to divide and conquer -- more pods on more nodes.
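If you want a second opinion on throughput that is independent of JMeter, a tiny probe along these lines can be pointed straight at a NodePort. This is just a sketch, assuming the requests package is installed; the URL, request count, and concurrency below are placeholders to replace with your own values.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://worker-node.example.com:30080/"  # placeholder NodePort endpoint
TOTAL_REQUESTS = 2000
CONCURRENCY = 50

def hit(_):
    # One GET per call; returns the HTTP status code.
    return requests.get(URL, timeout=10).status_code

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    codes = list(pool.map(hit, range(TOTAL_REQUESTS)))
elapsed = time.perf_counter() - start
print(f"{TOTAL_REQUESTS / elapsed:.1f} req/s, non-200 responses: {sum(c != 200 for c in codes)}")
If this probe tops out near the same 90 TPS, the limit is in httpd or the pod resources; if it goes well past it, look at the route/load balancer or the JMeter side.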
Most likely you have run into a bottleneck/limitation at the SUT. See the following post for a detailed answer:
JMeter load is not increasing when we increase the threads count

What happens to an OpenStack instance if a server (node) goes offline?

I'm new to OpenStack and have a basic question about it. Assume that we have 3 master nodes (controllers) and 10 slave nodes (compute nodes) in our cloud, and we create 50 VMs (instances) on it. What happens if one node (controller or compute node) goes offline (fails)? What is the best way to prevent a VM from shutting down if a server goes offline?
Best regards
This question requires more than a short Stack Overflow answer. Here are a few initial thoughts.
When a controller goes offline, the instance itself continues running, but if the failed controller hosts a router, the instance might be cut off from the network. Generally, if the controller has anything that the instance needs, that thing won't be available anymore. There are measures like HA routers that can help in such a case.
When the instance's compute host goes down, the instance doesn't run anymore. You can evacuate instances from a failed compute host, which means that they are rebuilt on different hosts. If an instance's root disk resides on a volume or an ephemeral disk that is shared with other compute hosts, this means a mere instance reboot. If the instance has an ephemeral disk inside the failed host, it must be rebuilt from scratch.
OpenStack has a project named Masakari whose goal is to make instances resilient by redundancy. In short, instance HA. The application keeps running even if an instance crashes.
By the way, "master" and "slave" are not the correct terminology in this context. Use "controller" and "compute" instead.
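As a starting point for assessing the blast radius of a compute host failure, something like the sketch below can list which instances live on which host. It is only a sketch, assuming the openstacksdk package is installed, a clouds.yaml entry named "mycloud" exists, and the credentials are allowed to see the hypervisor_hostname attribute (typically admin).
from collections import defaultdict

import openstack

# Connect using a clouds.yaml entry; "mycloud" is a placeholder name.
conn = openstack.connect(cloud="mycloud")

by_host = defaultdict(list)
for server in conn.compute.servers(details=True):
    # hypervisor_hostname is usually only visible to admin credentials.
    by_host[server.hypervisor_hostname].append(server.name)

for host, names in by_host.items():
    print(f"{host}: {', '.join(names)}")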

JMeter gives "The target server failed to respond" error

We use JMeter to do performance testing. I configured 200 threads (200 users), and we have two servers, Server A and Server B. I tested each individually with 200 users and it works. We also have a load-balancing server, Server C, so a request goes to either Server A or Server B. But if I run the same JMX script (200 threads) against Server C, it gives the error below (it works for 50 users with no error).
org.apache.http.NoHttpResponseException: The target server failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:61)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at org.apache.jmeter.protocol.http.sampler.MeasuringConnectionManager$MeasuredConnection.receiveResponseHeader(MeasuringConnectionManager.java:201)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
If the issue can be reproduced only at higher loads, it's definitely a server (or load balancer) issue, so congratulations on finding the first bottleneck.
Now you can investigate the reason and suggest fixes. The next steps could be:
Inspect the application under test / load balancer logs - you may find a clue there.
Inspect the application under test / load balancer / database / any other middleware configuration. In the majority of cases the default configuration is fine for development and debugging, but you will need to do some performance tuning before running a prod-like load test.
Collect the main health metrics on the application under test side (CPU, RAM, network, disk, swap usage, etc.). It might be the case that your application simply lacks hardware resources. You can use the operating system's built-in tools, an APM tool, or the JMeter PerfMon Plugin (see the sketch after this list).
Re-run your test with profiling tool telemetry enabled on the application under test side. This will give you an overview of where the application spends the most time and which functions are the "heaviest" or called most frequently, so you know what to optimise.
Make sure that the load balancer distributes requests evenly (or according to the configured algorithm) between the backend servers. It might be the case that you're hitting only one server; if so, consider adding the DNS Cache Manager to your Test Plan and re-running the test to see if it helps.
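For the health-metrics step, a very small sampler like the one below (a sketch, assuming the psutil package is available on the server under test) can be left running alongside the load test to spot CPU, memory, or swap saturation.
import psutil

# Print a one-line health snapshot roughly every second while the load test runs.
while True:
    cpu = psutil.cpu_percent(interval=1)       # % CPU averaged over the last second
    mem = psutil.virtual_memory().percent      # % RAM in use
    swap = psutil.swap_memory().percent        # % swap in use
    net = psutil.net_io_counters()
    print(f"cpu={cpu:5.1f}%  mem={mem:5.1f}%  swap={swap:5.1f}%  "
          f"net_sent={net.bytes_sent}  net_recv={net.bytes_recv}")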

What exactly is a pre-fork web server model?

I want to know what exactly it means when a web server describes itself as a pre-fork web server. I have a few examples such as unicorn for ruby and gunicorn for python.
More specifically, these are the questions:
What problem does this model solve?
What happens when a pre-fork web server is initially started?
How does it handle requests?
Also, a more specific question for unicorn/gunicorn:
Let's say that I have a webapp that I want to run with (g)unicorn. On initialization, the webapp will do some initialization stuff (e.g. fill in additional database entries). If I configure (g)unicorn with multiple workers, will the initialization stuff be run multiple times?
Pre-forking basically means a master process creates forks (child processes) which handle requests. A fork is a completely separate *nix process.
Update as per the comments below: the pre in pre-fork means that these processes are forked before a request comes in. Their number can, however, usually be increased or decreased as the load goes up and down.
Pre-forking can be used when you have libraries that are NOT thread safe. It also means that a problem within a request will only affect the process handling it, not the entire server.
Whether the initialisation runs multiple times depends on what you are deploying. Usually, however, connection pools and things of that nature exist per process.
In a threading model the master would create lighter-weight threads to dispatch requests to. But if a thread causes massive issues it could have repercussions for the master process.
Tools such as Nginx, Apache 2.4's Event MPM, or gevent (which can be used with Gunicorn) are asynchronous, meaning a single process can handle hundreds of requests without blocking.
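To make the fork-before-any-request idea concrete, here is a toy pre-fork sketch (POSIX only, and nothing like a production server): the master binds a listening socket, forks a fixed pool of workers up front, and each worker then accepts connections from the shared socket on its own.
import os
import socket

NUM_WORKERS = 4

# Master: bind and listen once, before any worker exists.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 8080))
listener.listen(128)

for _ in range(NUM_WORKERS):
    if os.fork() == 0:                       # child: becomes a worker
        while True:
            conn, _addr = listener.accept()  # each accepted connection goes to exactly one worker
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
            conn.close()

# Master: a real server would also monitor workers and respawn any that die.
for _ in range(NUM_WORKERS):
    os.wait()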
How does a "pre-fork worker model" work?
Master Process: There is a master process that spawns and kills workers, depending on the load and the capacity of the hardware. More incoming requests cause the master to spawn more workers, up to the point where the "hardware limit" (e.g. all CPUs saturated) is reached, at which point queuing sets in.
Workers: A worker can be understood as an instance of your application/server. So if there are 4 workers, your server is booted 4 times. It occupies 4 times the base RAM that a single worker would, unless you do shared-memory wizardry.
Initialization: Your initialization logic needs to be robust enough to account for multiple servers. For example, if you write DB entries, check whether they already exist, or run a setup job before your app server starts (see the sketch after this list).
Pre-fork: The "pre" in pre-fork means that the master always adds a bit more capacity than currently required, so that if the load goes up the system is "already ready". So it pre-emptively spawns some workers. For example, in Apache's prefork MPM (linked in the sources below) you control this with the MinSpareServers directive.
Requests: The requests (TCP connection handles) are being passed from the master process to the children.
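On the initialization point, and on the (g)unicorn question above: whether setup code runs once or once per worker depends on where you put it. Below is a hypothetical gunicorn.conf.py sketch; workers, preload_app, on_starting and post_fork are standard gunicorn settings and server hooks, while the setup body itself is just a placeholder.
# gunicorn.conf.py (sketch)
workers = 4
preload_app = True   # import the application once in the master, then fork

def on_starting(server):
    # Runs once, in the master, before any worker is forked: a reasonable place
    # for one-off setup such as seeding database entries (placeholder below).
    print("running one-time initialization")

def post_fork(server, worker):
    # Runs once per worker, after the fork. Recreate anything that must not be
    # shared across processes here (e.g. database connection pools).
    print(f"worker {worker.pid} booted")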
What problem do pre-fork servers solve?
Multiprocessing: If you have a program that can only target one CPU core, you potentially waste some of your hardware's capacity by only spawning one server. The forked workers tackle this problem.
Stability: When one worker crashes, the master process isn't affected. It can just spawn a new worker.
Thread safety: Since it's really like your server is booted multiple times, in separate processes, you don't need to worry about threadsafety (since there are no threads). This means it's an appropriate model when you have non-threadsafe code or use non-threadsafe libs.
Speed: Since the child processes aren't forked (spawned) only when a request arrives, but pre-emptively, the server can always respond quickly.
Alternatives and Sidenotes
Container orchestration: If you're familiar with containerization and container orchestration tools such as kubernetes, you'll notice that many of the problems are solved by those as well. Kubernetes spawns multiple pods for multiprocessing, it has the same (or better) stability and things like "horizontal pod autoscalers" that also spawn and kill workers.
Threading: A server may spawn a thread for each incoming request, which allows many requests to be handled "simultaneously". This is the default for most Java-based web servers, since Java natively has good support for threads - good support meaning the threads run truly in parallel, on different CPU cores. Python's threads, on the other hand, cannot truly parallelize (i.e. spread work across multiple cores) due to the GIL (Global Interpreter Lock); they only provide a means of context switching. That's why "pre-forkers" like gunicorn are so popular for Python servers, and why people coming from Java might never have heard of such a thing before.
Async / non-blocking processing: If your servers spend a lot of time "waiting", for example on disk I/O, HTTP requests to external services or database requests, then multiprocessing might not be what you want. Instead, consider making your code "non-blocking", meaning that it can handle many requests concurrently. Async/await (coroutine) based systems like FastAPI (ASGI) in Python, Go or Node.js use this mechanism, so that even a single server process can handle many requests concurrently (see the sketch after this list).
CPU bound tasks: If you have CPU-bound tasks, the non-blocking processing mentioned above won't help much. Then you'll need some way of multiprocessing to distribute the load across your CPU cores, using the solutions mentioned above: container orchestration, threading (on systems that allow true parallelization) or... pre-forked workers.
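As a tiny illustration of the non-blocking model (a sketch only, using nothing but the standard library): one process and one thread keep a hundred slow "requests" in flight at once, because each await yields control while waiting.
import asyncio
import time

async def handle(i):
    await asyncio.sleep(1)   # stands in for a slow I/O call (DB, HTTP, disk)
    return i

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(handle(i) for i in range(100)))
    print(f"handled {len(results)} requests in {time.perf_counter() - start:.2f}s")

asyncio.run(main())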
Sources
https://www.reddit.com/r/learnprogramming/comments/25vdm8/what_is_a_prefork_worker_model_for_a_server/
https://httpd.apache.org/docs/2.4/mod/prefork.html

Using ServiceStack Redis with Twemproxy

I've been using ServiceStack PooledRedisClientManager with success. I'm now adding Twemproxy into the mix and have 4 Redis instances fronted with Twemproxy running on a single Ubuntu server.
This has caused problems with light load tests (100 users) connecting to Redis through ServiceStack. I've tried the original PooledRedisClientManager and the BasicRedisClientManager; both give the error "No connection could be made because the target machine actively refused it".
Is there something I need to do to get these two to play nicely together? This is the Twemproxy config:
alpha:
  listen: 0.0.0.0:12112
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  redis: true
  timeout: 400
  server_retry_timeout: 30000
  server_failure_limit: 3
  server_connections: 1000
  servers:
   - 0.0.0.0:6379:1
   - 0.0.0.0:6380:1
   - 0.0.0.0:6381:1
   - 0.0.0.0:6382:1
I can connect to each one of the Redis server instances individually, it just fails going through Twemproxy.
I haven't used twemproxy before but I would say your list of servers is wrong. I don't think you are using 0.0.0.0 correctly.
Your servers would need to be (for your local testing):
servers:
- 127.0.0.1:6379:1
- 127.0.0.1:6380:1
- 127.0.0.1:6381:1
- 127.0.0.1:6382:1
You use 0.0.0.0 in the listen directive to tell twemproxy to listen on all available network interfaces on the server. This means twemproxy will try to listen on:
the loopback address 127.0.0.1 (localhost),
your private IP (e.g. 192.168.0.1), and
your public IP (e.g. 134.xxx.50.34).
When you specify servers, the config needs the actual address twemproxy should connect to; 0.0.0.0 doesn't make sense there, it needs a real value. So when you come to use separate Redis machines, you will want to use the private IP of each machine, like this:
servers:
- 192.168.0.10:6379:1
- 192.168.0.13:6379:1
- 192.168.0.14:6379:1
- 192.168.0.27:6379:1
Obviously your IP addresses will be different. You can use ifconfig to determine the IP on each machine. Though it may be worth using a hostname if your IPs are not statically assigned.
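Before pointing ServiceStack at the proxy again, it can help to confirm that twemproxy itself forwards requests. A quick check along these lines (a sketch, assuming the redis-py package and twemproxy listening on 127.0.0.1:12112 as in the config above) sticks to key-based commands like SET/GET, since twemproxy only proxies a subset of the Redis command set.
import redis

# Talk to twemproxy, not to a Redis instance directly.
r = redis.Redis(host="127.0.0.1", port=12112)
r.set("twemproxy:test", "ok")
print(r.get("twemproxy:test"))   # expect b'ok' if the proxy can reach a backend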
Update:
As you have said you are still having issues, I would make these recommendations:
Remove auto_eject_hosts: true. If you were getting some connectivity and then, after a while, ended up with none, it's because something caused twemproxy to decide there was a problem with the Redis hosts and eject them.
Eventually, when your ServiceStack client connects to twemproxy, there are no hosts left to pass the request on to, and you get the error "No connection could be made because the target machine actively refused it".
Do you actually have enough RAM to stress test your local machine this way? You are running at least 4 instances of Redis, which need real memory to store the values; twemproxy consumes a large amount of memory to buffer the requests it passes to Redis, and this memory pool is never released (see the twemproxy documentation for more information). Your ServiceStack app will consume memory - more so in Debug mode. You'll probably have Visual Studio or another IDE open, the stress-test application, and your operating system. On top of all that there will likely be background processes and other applications you haven't closed.
A good practice is to try to run tests on isolated hardware as far as possible. If it is not possible, then the system must be monitored to check the benchmark is not impacted by some external activity.
You should read the Redis documentation on benchmarking.
As you are using this in a localhost situation, use the BasicRedisClientManager, not the PooledRedisClientManager.