We have an architecture in which three RabbitMQ instances (with multiple clusters) are set up in three different data centers; these are the (federation) upstreams.
There is one RabbitMQ instance in a fourth data center acting as the downstream, to which messages from the three upstreams are federated.
Clients connect to our STOMP service, which is set up to connect to this single RabbitMQ instance and therefore receives the messages from all the upstreams.
But this single downstream can go down, and the clients would then stop getting messages. So my questions are:
1) Is it possible to have a redundant downstream setup?
2) Can we set up multiple downstreams, for example an additional downstream in one of the three upstream data centers?
3) If so, how can we make sure that messages are not duplicated between the two (or more) downstreams?
4) Finally, are there any other ways to tackle this problem?
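For concreteness, declaring the three upstreams on a downstream broker can be scripted against the RabbitMQ management HTTP API; the sketch below does that with Python's requests library. All host names and credentials here are placeholders, and a matching federation policy still has to be set separately. Running the same script against a second broker would create a second downstream with the same upstreams.

import json
import requests

DOWNSTREAM_API = "http://downstream.example.com:15672"  # assumed mgmt endpoint
AUTH = ("guest", "guest")                               # assumed credentials
UPSTREAMS = {
    "dc1": "amqp://dc1.example.com",
    "dc2": "amqp://dc2.example.com",
    "dc3": "amqp://dc3.example.com",
}

for name, uri in UPSTREAMS.items():
    # PUT /api/parameters/federation-upstream/{vhost}/{name};
    # %2F is the URL-encoded default vhost "/".
    resp = requests.put(
        "%s/api/parameters/federation-upstream/%%2F/%s" % (DOWNSTREAM_API, name),
        auth=AUTH,
        headers={"content-type": "application/json"},
        data=json.dumps({"value": {"uri": uri}}),
    )
    resp.raise_for_status()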
I have a server listening on a UDP port. Many clients connect to it, organized into groups: within a group, one client sends a message and the server routes it to the rest of the group, and many such groups can be active simultaneously. How can we test the maximum number of connections the server can handle without inducing visible lag in the response time?
First, let me restate your network topology: there is a server and many clients, and the clients are divided into several groups. A client sends a message to the server, and the server then sends something to the other clients in that group.
If the topology is as I describe above, is the connection limit you want to find how many clients the server can send to at the same time, or how many clients can send to the server at the same time?
Either case can be tested by spawning many concurrent clients, for example with multiple threads, or with goroutines if you can write Go. The two cases just need different criteria for deciding when the limit has been reached; see the sketch below.
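Here is a minimal load-test sketch in Python (threads rather than goroutines). It assumes, purely for illustration, that the server echoes each datagram back to its sender and that "visible lag" means a round trip above a fixed budget; adapt the send/receive roles to your actual group protocol.

import socket
import threading
import time

SERVER_ADDR = ("127.0.0.1", 9999)  # hypothetical server address
NUM_CLIENTS = 500                  # raise this between runs to find the limit
LAG_BUDGET = 0.2                   # 200 ms: an assumed "visible lag" threshold

latencies = []
lock = threading.Lock()

def client(i):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    start = time.monotonic()
    sock.sendto(b"ping %d" % i, SERVER_ADDR)
    try:
        sock.recvfrom(2048)   # wait for the routed reply
    except socket.timeout:
        return                # count timeouts as failures
    finally:
        sock.close()
    with lock:
        latencies.append(time.monotonic() - start)

threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

latencies.sort()
if latencies:
    p99 = latencies[int(len(latencies) * 0.99)]
    print("replies %d/%d, p99 %.3fs (budget %.3fs)"
          % (len(latencies), NUM_CLIENTS, p99, LAG_BUDGET))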
My app has multiple threads that publish messages to a single RabbitMQ cluster.
Reading the RabbitMQ docs, I found the following:
For applications that use multiple threads/processes for processing, it is very common to open a new channel per thread/process and not share channels between them.
And I understand that instead of opening multiple connections (expensive), it is better to open multiple channels.
But why not use a single channel to all threads?
What are the benefits of using multiple channels over a single channel?
AMQP has the concept of Channel to provide more flexibility over reliable TCP connections. Opening a TCP connection per message would be extremely expensive, so they came up with the idea of logical Channels within a connection.
It is not a good idea to share one Channel across all the threads, because if anything fails in a particular thread and the Channel dies, the rest of the threads will get an AlreadyClosedException. A channel can die for multiple reasons: for example, declaring something that is already declared with different parameters, cancelling a consumer which doesn't exist, publishing to an exchange that doesn't exist, etc.
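To make that concrete, here is a small sketch using the Python pika client (the description above is in terms of the Java client's AlreadyClosedException; the broker address and queue name are placeholders). Killing one channel with a conflicting redeclaration leaves the other channel and the connection usable:

import pika

# Assumes a broker on localhost with default credentials.
connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch1 = connection.channel()
ch2 = connection.channel()

ch1.queue_declare(queue="demo", durable=False)
try:
    # Re-declaring the same queue with different parameters kills ch2 only.
    ch2.queue_declare(queue="demo", durable=True)
except pika.exceptions.ChannelClosedByBroker as exc:
    print("ch2 died:", exc)

# ch1 and the connection are unaffected; publishing still works.
ch1.basic_publish(exchange="", routing_key="demo", body=b"still alive")
connection.close()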
My best advice would be to have an object that holds a Channel in a local variable and also implements the ShutdownListener interface, so that every time the channel fails it can recover and create a new one from the connection. So I would say the main benefits are failure tolerance and scalability, since if one Channel dies it won't affect the rest.
What techniques are people using to utilize multiple processors/cores when running a TwistedWeb server? Is there a recommended way of doing it?
My twisted.web based web service is running on Amazon EC2 instances, which often have multiple CPU cores (8, 16), and the type of work the service is doing benefits from extra processing power, so I would very much like to use it.
I understand that it is possible to use haproxy, squid, or a web server configured as a reverse proxy in front of multiple instances of Twisted. In fact, we are currently using such a setup, with nginx serving as a reverse proxy to several upstream twisted.web services running on the same host, each on a different port.
This works fine, but what I'm really interested in is a solution with no "front-facing" server, where all twistd processes somehow bind to the same socket and accept requests. Is such a thing even possible... or am I being crazy? The operating system is Linux (CentOS).
Thanks.
Anton.
There are a number of ways to support multiprocess operation for a Twisted application. One important question to answer at the start, though, is what you expect your concurrency model to be, and how your application deals with shared state.
In a single process Twisted application, concurrency is all cooperative (with help from Twisted's asynchronous I/O APIs) and shared state can be kept anywhere a Python object would go. Your application code runs knowing that, until it gives up control, nothing else will run. Additionally, any part of your application that wants to access some piece of shared state can probably do so quite easily, since that state is probably kept in a boring old Python object that is easy to access.
When you have multiple processes, even if they're all running Twisted-based applications, then you have two forms of concurrency. One is the same as for the previous case - within a particular process, the concurrency is cooperative. However, you have a new kind, where multiple processes are running. Your platform's process scheduler might switch execution between these processes at any time, and you have very little control over this (as well as very little visibility into when it happens). It might even schedule two of your processes to run simultaneously on different cores (this is probably even what you're hoping for). This means that you lose some guarantees about consistency, since one process doesn't know when a second process might come along and try to operate on some shared state. This leads in to the other important area of consideration, how you will actually share state between the processes.
Unlike the single process model, you no longer have any convenient, easily accessed places to store your state where all your code can reach it. If you put it in one process, all the code in that process can access it easily as a normal Python object, but any code running in any of your other processes no longer has easy access to it. You might need to find an RPC system to let your processes communicate with each other. Or, you might architect your process divide so that each process only receives requests which require state stored in that process. An example of this might be a web site with sessions, where all state about a user is stored in their session, and their sessions are identified by cookies. A front-end process could receive web requests, inspect the cookie, look up which back-end process is responsible for that session, and then forward the request on to that back-end process. This scheme means that back-ends typically don't need to communicate (as long as your web application is sufficiently simple - ie, as long as users don't interact with each other, or operate on shared data).
Note that in that example, a pre-forking model is not appropriate. The front-end process must exclusively own the listening port so that it can inspect all incoming requests before they are handled by a back-end process.
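As a toy illustration of that cookie-based routing scheme (not part of the original answer), the sketch below uses Twisted's ReverseProxyResource to forward each request to the back end named by a "shard" cookie. The cookie name, ports, and shard table are all invented.

from twisted.internet import reactor
from twisted.web import proxy, resource, server

BACKENDS = {"a": 8081, "b": 8082}  # hypothetical shard -> back-end port table

class SessionRouter(resource.Resource):
    isLeaf = True

    def render(self, request):
        # Pick the back end that owns this session's state.
        shard = (request.getCookie(b"shard") or b"a").decode("ascii")
        port = BACKENDS.get(shard, 8081)
        # Delegate rendering to a reverse proxy aimed at that back end.
        proxied = proxy.ReverseProxyResource("127.0.0.1", port, request.path)
        return proxied.render(request)

reactor.listenTCP(8080, server.Site(SessionRouter()))
reactor.run()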
Of course, there are many types of application, with many other models for managing state. Selecting the right model for multi-processing requires first understanding what kind of concurrency makes sense for your application, and how you can manage your application's state.
That being said, with very new versions of Twisted (unreleased as of this point), it's quite easy to share a listening TCP port amongst multiple processes. Here is a code snippet which demonstrates one way you might use some new APIs to accomplish this:
from os import environ
from sys import argv, executable
from socket import AF_INET
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File
def main(fd=None):
    root = File("/var/www")
    factory = Site(root)

    if fd is None:
        # Create a new listening port and several other processes to help out.
        port = reactor.listenTCP(8080, factory)
        for i in range(3):
            reactor.spawnProcess(
                None, executable, [executable, __file__, str(port.fileno())],
                childFDs={0: 0, 1: 1, 2: 2, port.fileno(): port.fileno()},
                env=environ)
    else:
        # Another process created the port, just start listening on it.
        port = reactor.adoptStreamPort(fd, AF_INET, factory)

    reactor.run()


if __name__ == '__main__':
    if len(argv) == 1:
        main()
    else:
        main(int(argv[1]))
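To try it, run the script directly (e.g. python server.py, assuming the code is saved as server.py); the parent binds port 8080 and spawns three children that adopt the inherited descriptor, so all four processes accept connections on the same port.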
With older versions, you can sometimes get away with using fork to share the port. However, this is rather error prone, fails on some platforms, and isn't a supported way to use Twisted:
from os import fork
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.static import File
def main():
    root = File("/var/www")
    factory = Site(root)

    # Create a new listening port
    port = reactor.listenTCP(8080, factory)

    # Create a few more processes to also service that port
    for i in range(3):
        if fork() == 0:
            # Proceed immediately onward in the children.
            # The parent will continue the for loop.
            break

    reactor.run()


if __name__ == '__main__':
    main()
This works because of the normal behavior of fork, where the newly created process (the child) inherits all of the memory and file descriptors from the original process (the parent). Since processes are otherwise isolated, the two processes don't interfere with each other, at least as far as the Python code they are executing goes. Since the file descriptors are inherited, either the parent or any of the children can accept connections on the port.
Since forwarding HTTP requests is such an easy task, I doubt you'll notice much of a performance improvement using either of these techniques. The former is a bit nicer than proxying, because it simplifies your deployment and works for non-HTTP applications more easily. The latter is probably more of a liability than it's worth accepting.
The recommended way, IMO, is to use haproxy (or another load balancer) as you already are; the load balancer shouldn't be the bottleneck if it is configured correctly. Besides, you'll want some failover mechanism, which haproxy provides, in case one of your processes goes down.
It isn't possible for independently started processes to bind to the same TCP socket (short of inheriting the listening descriptor, as in the snippets above), but it is possible with UDP.
If you wish to serve your web content over HTTPS as well, this is what you will need to do on top of Jean-Paul's snippet above.
from twisted.internet.ssl import PrivateCertificate
from twisted.protocols.tls import TLSMemoryBIOFactory
'''
Original snippet goes here
..........
...............
'''
privateCert = PrivateCertificate.loadPEM(open('./server.cer').read() + open('./server.key').read())
tlsFactory = TLSMemoryBIOFactory(privateCert.options(), False, factory)
reactor.adoptStreamPort(fd, AF_INET, tlsFactory)
Using a single fd, you will serve either HTTP or HTTPS, but not both.
If you wish to have both, call listenSSL in the parent process and pass the fd you get from the SSL port as a second argument when spawning the child processes.
The complete snippet is here:
from os import environ
from sys import argv, executable
from socket import AF_INET

from twisted.internet import reactor, ssl
from twisted.web.server import Site
from twisted.web.static import File
from twisted.internet.ssl import PrivateCertificate
from twisted.protocols.tls import TLSMemoryBIOFactory
def main(fd=None, fd_ssl=None):
    root = File("/var/www")
    factory = Site(root)
    spawned = []

    if fd is None:
        # Create new listening ports and several other processes to help out.
        port = reactor.listenTCP(8080, factory)
        port_ssl = reactor.listenSSL(
            8443, factory,
            ssl.DefaultOpenSSLContextFactory('./server.key', './server.cer'))
        for i in range(3):
            child = reactor.spawnProcess(
                None, executable,
                [executable, __file__, str(port.fileno()), str(port_ssl.fileno())],
                childFDs={0: 0, 1: 1, 2: 2,
                          port.fileno(): port.fileno(),
                          port_ssl.fileno(): port_ssl.fileno()},
                env=environ)
            spawned.append(child)
    else:
        # Another process created the ports, just start listening on them.
        port = reactor.adoptStreamPort(fd, AF_INET, factory)
        cer = open('./server.cer')
        key = open('./server.key')
        pem_data = cer.read() + key.read()
        cer.close()
        key.close()
        privateCert = PrivateCertificate.loadPEM(pem_data)
        tlsFactory = TLSMemoryBIOFactory(privateCert.options(), False, factory)
        reactor.adoptStreamPort(fd_ssl, AF_INET, tlsFactory)

    reactor.run()

    for p in spawned:
        p.signalProcess('INT')


if __name__ == '__main__':
    if len(argv) == 1:
        main()
    else:
        main(int(argv[1]), int(argv[2]))
I have a situation where I have to handle multiple live UDP streams on the server.
As I see it, I have two options:
Single socket:
1) Listen on a single port on the server, receive data from all clients on that port, and create a thread per client to process the data until the client stops sending. Here only one port is used to receive the data, and a number of threads process it.
Multiple sockets:
2) The client requests an open port from the server, the server sends the open port back to the client and opens a new thread listening on that port to receive and process the data. Here each client has a unique port to send its data to.
I already implemented a way to know which packet is coming from which client in UDP.
I have 1000+ clients and I am receiving 60 KB of data per second.
Are there any performance issues with the above methods, or is there a more efficient way to handle this type of task in C?
Thanks,
Raghu
With that many clients, having one thread per client is very inefficient, since lots and lots of context switches must be performed.
Also, the number of ports you can open per IP is limited (a port is a 16-bit number).
Therefore "Single Socket" will be far more efficient. But you can also use "Multiple Sockets" with just a single thread by using an asynchronous API. If you can identify the client from the packet's payload, then there is no need to have a port per client; see the sketch below.