In my production environment there are 7 parallel servers. I use Redis to build an email queue like this:
$this->getRedis()->lpush('mailsQueue', serialize($mail));
And the daemon that is listening to the queue:
do {
    $mail = $this->getRedis()->rpop('mailsQueue');
    if ($mail) {
        // sending an email
    }
    usleep(1000);
} while (true);
It works pretty well when only one instance of the daemon is running. But in the production environment, each of the 7 servers runs its own daemon service. The problem is that sometimes an email gets sent a couple of times, because more than one daemon service can load the same email from the "mailsQueue" list.
How can I make sure that an element popped with "rpop" is loaded only once, regardless of how many daemon services I have running?
Huge thanks for any help!
Weird, I would have thought that rpop would be atomic. You should be able to use MULTI to force a transaction so that no other client can interfere with that key.
http://redis.io/topics/transactions
All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
More info:
https://github.com/StackExchange/StackExchange.Redis/blob/master/Docs/Transactions.md
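For illustration, here is a minimal sketch of that MULTI/EXEC idea using redis-py rather than PHP; the connection details are placeholders and this is a sketch of the approach, not a drop-in replacement:

import redis

r = redis.Redis()                    # placeholder connection settings

pipe = r.pipeline(transaction=True)  # transaction=True wraps the queued commands in MULTI ... EXEC
pipe.rpop("mailsQueue")
mail, = pipe.execute()               # None if the queue was empty, otherwise the serialized mail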
On a Celery service on CentOS that runs a single task at a time, terminating a task is simple:
revoke(id, terminate=True, signal='SIGINT')
However, while the interrupt signal is being processed, the running task gets revoked. Then a new task from the queue starts on the node. This is troublesome: two tasks are running at the same time on the node, and the signal handling could take up to a minute.
The question is: how can a signal be sent to a running task without actually terminating the task in Celery?
Or, put differently, is there any way to send a signal to a running task?
The assumption is that the user should be able to send the signal from a remote node; in other words, the user cannot list the running processes on the node.
Any other solution is welcome.
I don't understand your goal.
Are you trying to kill the worker? If so, I guess you are talking about a "warm shutdown", so you can send SIGTERM to the worker's process. The running task will get a chance to finish, but no new task will be started.
If you're just interested in revoking a specific task and keeping the same worker, can you share your Celery configuration and the worker command? Are you sure you're running with concurrency 1?
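For reference, a hedged sketch of both options using Celery's remote control API; the broker URL, task id and worker name are placeholders for your setup:

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")   # placeholder broker
task_id = "d9078da5-9915-40a0-bfa1-392c7bde42ed"          # placeholder task id

# Revoke only: the task is marked revoked, the worker process is untouched.
app.control.revoke(task_id, terminate=False)

# Warm shutdown of one worker: the running task may finish, no new tasks are
# picked up (same effect as sending SIGTERM to the worker's main process).
app.control.shutdown(destination=["celery@worker1"])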
I want to write a script for restarting WebLogic managed servers, which would do the following:
It would contain a loop that restarts the first nodes of all clusters at one time:
a.) FORCE_SHUTDOWN
b.) wait for status: SHUTDOWN
c.) START managed servers
d.) wait for status: RUNNING
e.) move to the next node of each cluster and repeat until all managed servers are restarted.
So in the first iteration it would restart the first node of each cluster, in the second iteration the second node of each cluster, and so on until all managed servers are restarted.
I have not started writing the script yet; I am a newbie with WebLogic and this is just a concept. Do you have any suggestions on how to achieve this goal?
Why reinvent the wheel?
rollingRestart
Category: Control Commands
Use with WLST: Online
Description: Initiates a rolling restart of all servers in a domain, or all servers in a specific cluster or clusters, without interrupting the service. This command provides the ability to sequentially restart servers.
This operation involves the graceful shutdown of the servers, and the servers being restarted without interrupting the service for the user.
Syntax
rollingRestart(target, [options])
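For example, a minimal WLST sketch; the credentials, admin URL and cluster name are placeholders:

connect('weblogic', 'password', 't3://adminhost:7001')   # connect to the admin server
rollingRestart('Cluster-1')                              # sequentially restart the servers of that cluster
disconnect()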
We know that the KEYS command blocks the Redis server and that the SCAN family of commands should be used instead.
As I understand it, a Redis server can handle a lot of pub/sub connections. So, if I call the PUBSUB CHANNELS command on such a server, can it still serve pub/sub connections or handle other commands during the execution of this command?
Redis is single-threaded. It can have any number of clients, but commands are executed one by one on a single thread.
With pub/sub, a client subscribes to a channel and holds its connection to the server open.
When you publish a message, it gets delivered to all the subscribers of that channel, so basically it's a single call which does the publishing to every subscriber within that call itself. So if you have many clients (say a million) subscribed to a single channel, it will take some time to publish to all of them, and yes, during that time it is blocking. Also note that the blocking happens only during the publish action.
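As a small illustration with redis-py (connection details are placeholders): both commands below are ordinary commands and run one at a time on the single command thread, like everything else.

import redis

r = redis.Redis()                # placeholder connection
print(r.pubsub_channels())       # PUBSUB CHANNELS: lists the currently active channels
r.publish("news", "hello")       # delivered to every subscriber of "news" before
                                 # the server moves on to other commands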
Hope this answers your question.
I want to know what exactly it means when a web server describes itself as a pre-fork web server. I have a few examples such as unicorn for ruby and gunicorn for python.
More specifically, these are the questions:
What problem does this model solve?
What happens when a pre-fork web server is initially started?
How does it handle requests?
Also, a more specific question for unicorn/gunicorn:
Let's say that I have a webapp that I want to run with (g)unicorn. On initialization, the webapp will do some initialization stuff (e.g. fill in additional database entries). If I configure (g)unicorn with multiple workers, will the initialization stuff be run multiple times?
Pre-forking basically means a master creates forks which handle each request. A fork is a completely separate *nix process.
Update as per the comments below. The pre in pre-fork means that these processes are forked before a request comes in. They can however usually be increased or decreased as the load goes up and down.
Pre-forking can be used when you have libraries that are NOT thread safe. It also means that a problem within a request only affects the process handling it, not the entire server.
Whether the initialisation runs multiple times depends on what you are deploying. Usually, however, connection pools and things of that nature exist per process.
In a threading model, the master would create lighter-weight threads to dispatch requests to. But if a thread causes massive issues, it can have repercussions for the master process.
Tools such as Nginx, Apache 2.4's Event MPM, or gevent (which can be used with Gunicorn) are asynchronous, meaning a process can handle hundreds of requests without blocking.
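To make that concrete, here is a minimal, illustrative pre-fork sketch in Python; it is not how unicorn/gunicorn are implemented, just the general mechanism of forking workers before any request arrives:

import os
import socket

NUM_WORKERS = 4  # analogous to gunicorn's --workers setting

# The master opens the listening socket once, before forking.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 8000))
listener.listen(128)

def worker_loop():
    # Each forked worker accepts and serves connections independently.
    while True:
        conn, _addr = listener.accept()
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        conn.close()

for _ in range(NUM_WORKERS):
    if os.fork() == 0:   # 0 means we are in the child (worker) process
        worker_loop()    # never returns

# The master does no request handling; it just waits on its children.
while True:
    os.wait()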
How does a "pre-fork worker model" work?
Master Process: There is a master process that spawns and kills workers, depending on the load and the capacity of the hardware. More incoming requests cause the master to spawn more workers, up to the point where the "hardware limit" (e.g. all CPUs saturated) is reached, at which point queuing sets in.
Workers: A worker can be understood as an instance of your application/server. So if there are 4 workers, your server is booted 4 times. It means it occupies 4 times the base RAM that a single worker would, unless you do shared-memory wizardry.
Initialization: Your initialization logic needs to be stable enough to account for multiple servers. For example, if you write db entries, check whether they are already there, or run a setup job before your app server starts (see the sketch after this list).
Pre-fork: The "pre" in pre-fork means that the master always adds a bit more capacity than currently required, so that if the load goes up the system is "already ready". So it preemptively spawns some workers. For example, in Apache's prefork MPM (see the Sources below) you control this with the MinSpareServers directive.
Requests: The requests (TCP connection handles) are passed from the master process to the children.
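Regarding the initialization point above, here is a hedged sketch of a gunicorn config file showing where once-only setup versus per-worker setup would live; the hook bodies are placeholders:

# gunicorn.conf.py
workers = 4                      # four forked copies of the app

def on_starting(server):
    # Runs once, in the master, before any worker is forked:
    # a reasonable place for one-time setup such as seeding database entries.
    pass

def post_fork(server, worker):
    # Runs in every worker right after the fork: anything created here
    # (connection pools, caches) exists once per worker process.
    pass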
What problem do pre-fork servers solve?
Multiprocessing: If you have a program that can only target one CPU core, you potentially waste some of your hardware's capacity by only spawning one server. The forked workers tackle this problem.
Stability: When one worker crashes, the master process isn't affected. It can just spawn a new worker.
Thread safety: Since it's really like your server is booted multiple times, in separate processes, you don't need to worry about threadsafety (since there are no threads). This means it's an appropriate model when you have non-threadsafe code or use non-threadsafe libs.
Speed: Since the child processes aren't forked (spawned) right when needed, but pre-emptively, the server can always respond fast.
Alternatives and Sidenotes
Container orchestration: If you're familiar with containerization and container orchestration tools such as kubernetes, you'll notice that many of the problems are solved by those as well. Kubernetes spawns multiple pods for multiprocessing, it has the same (or better) stability and things like "horizontal pod autoscalers" that also spawn and kill workers.
Threading: A server may spawn a thread for each incoming request, which allows many requests to be handled "simultaneously". This is the default for most web servers based on Java, since Java natively has good support for threads. Good support meaning the threads run truly in parallel, on different CPU cores. Python's threads, on the other hand, cannot truly parallelize (= spread work to multiple cores) due to the GIL (Global Interpreter Lock); they only provide a means for context switching. More on that here. That's why for Python servers "pre-forkers" like gunicorn are so popular, and people coming from Java might have never heard of such a thing before.
Async / non-blocking processing: If your servers spend a lot of time "waiting", for example on disk I/O, HTTP requests to external services or database requests, then multiprocessing might not be what you want. Instead, consider making your code "non-blocking", meaning that it can handle many requests concurrently. Async/await (coroutine) based systems like fastapi (ASGI server) in Python, Go or nodejs use this mechanism, such that even one server can handle many requests concurrently (see the sketch after this list).
CPU bound tasks: If you have CPU bound tasks, the non-blocking processing mentioned above won't help much. Then you'll need some way of multiprocessing to distribute the load on your CPU cores, as the solutions mentioned above, that is: container orchestration, threading (on systems that allow true parallelization) or... pre-forked workers.
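As a toy illustration of the non-blocking idea (one process, one thread, yet many waits overlap):

import asyncio

async def handle_request(i):
    await asyncio.sleep(1)          # stands in for a slow DB or HTTP call
    return "response %d" % i

async def main():
    # 100 concurrent "requests" finish in about 1 second, not 100 seconds.
    results = await asyncio.gather(*(handle_request(i) for i in range(100)))
    print(len(results), "responses")

asyncio.run(main())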
Sources
https://www.reddit.com/r/learnprogramming/comments/25vdm8/what_is_a_prefork_worker_model_for_a_server/
https://httpd.apache.org/docs/2.4/mod/prefork.html
Pre-version 3, the recommendation was to run a timeout manager as a standalone process on your cluster, beside the distributor. (As detailed here: http://support.nservicebus.com/customer/portal/articles/965131-deploying-nservicebus-in-a-windows-failover-cluster).
After the inclusion of the timeout manager as a satellite assembly, what is the correct way to use it when scaling out with the distributor?
Should each worker of Service A run with the timeout manager enabled, or should only the distributor process for Service A be configured to run a timeout manager for Service A?
If each worker runs it, do they share the same Raven instance for storing the timeouts? (And if so, how do you make sure that two or more workers don't pick up the same expired timeout at the same time?)
Allow me to answer this clearly myself.
After a lot of digging, and with help from Andreas Öhlund on the NSB team (http://tech.groups.yahoo.com/group/nservicebus/message/17758), the correct answer to this question is:
Like Udi Dahan mentioned, by design ONLY the distributor/master node should run a timeout manager in a scale out scenario.
Unfortunately in early versions of NServiceBus 3 this is not implemented as designed.
You have the following 3 issues:
1) Running with the Distributor profile does NOT start a timeout manager.
Workaround:
Start the timeout manager on the distributor yourself by including this code on the distributor:
class DistributorProfileHandler : IHandleProfile<Distributor>
{
    public void ProfileActivated()
    {
        Configure.Instance.RunTimeoutManager();
    }
}
If you run the Master profile this is not an issue as a timeout manager is started on the master node for you automatically.
2) Workers running with the Worker profile DO each start a local timeout manager.
This is not as designed and messes up the polling against the timeout store and dispatching of timeouts. All workers poll the timeout store with "give me the imminent timeouts for MASTERNODE". Notice they ask for timeouts of MASTERNODE, not for W1, W2 etc. So several workers can end up fetching the same timeouts from the timeout store concurrently, leading to conflicts against Raven when deleting the timeouts from it.
The dispatching always happens through the LOCAL .timeouts/.timeoutsdispatcher queues, while it SHOULD go through the queues of the timeout manager on the MasterNode/Distributor.
Workaround (you'll need to do both):
a) Disable the timeout manager on the workers. Include this code on your workers:
class WorkerProfileHandler : IHandleProfile<Worker>
{
    public void ProfileActivated()
    {
        Configure.Instance.DisableTimeoutManager();
    }
}
b) Reroute NServiceBus on the workers to use the .timeouts queue on the MasterNode/Distributor.
If you don't do this, any call to RequestTimeout or Defer on the worker will die with an exception saying that you have forgotten to configure a timeout manager. Include this in your worker config:
<UnicastBusConfig TimeoutManagerAddress="{endpointname}.Timeouts#{masternode}" />
3) Erroneous "Ready" messages back to the distributor.
Because the timeout manager dispatches the messages directly to the workers' input queues without removing an entry from the available workers in the distributor storage queue, the workers send erroneous "Ready" messages back to the distributor after handling a timeout. This happens even if you have fixed 1 and 2, and it makes no difference whether the timeout was fetched from a local timeout manager on the worker or from one running on the distributor/MasterNode. The consequence is a build-up of an extra entry in the storage queue on the distributor for each timeout handled by a worker.
Workaround:
Use NServiceBus 3.3.15 or later.
In version 3+ we created the concept of a master node which hosts inside it all the satellites like the distributor, timeout manager, gateway, etc.
The master node is very simple to run - you just pass a /master flag to the NServiceBus.Host.exe process and it runs everything for you. So, from a deployment perspective, where you used to deploy the distributor, now you deploy the master node.