Is it possible to use RabbitMQ streams with Celery? - rabbitmq

RabbitMQ introduced streams last year. They claim streams work with AMQP 0.9 and 1.0 as well as mentioned here. That is, theoretically we should be able to create a queue backed by a stream, connect as many workers we need to fan-out to the queue and each worker should get the message delivered.
My question is, has anyone tried to use streams with celery yet? If so, please share any info on how to configure streams in Celery and your experience with them so far. There are unfortunately no blog posts nor any documentation I could find on this topic. I am hoping this post brings together all this information in one place.
The big advantage of streams is they allow large fan-out using the existing infra of RabbitMQ + Celery.

As far as I am aware of there is no way Celery can utilise streams. However, you can probably spin up a long running Celery task that processes particular stream. This is probably reason why nobody attempted (or better say recorded as a blog post or something similar) to do this. - Why bother using Celery for something it is not made for?

Related

Is there any way to read messages from Kafka topic without consumer?

Just for testing purpose, I want to automate scenario where I need to check Kafka messages content, so just wanted to know if it is possible to read messages without consumers directly from TOPIC using Kafka java libraries?
I'm new to Kafka so any suggestion will be good for me.
Thanks in advance!
You could SSH to the broker in question, then dump the log segments into a deserialized fashion, but it would take less time to simply use a consumer in any language, not necessarily Java
"For testing purposes" Kafka Java API provides MockProducer and MockConsumer, which are backed by Lists, not a full broker

how to make sub-topics in Kafka

I am trying to represent Topics and Sub-topics in Kafka.
Example : Topic 'Sports' Sub-topic 'Football', 'Handball'
And as I know Kafka doesn't support this. what I am using now are Topics like this 'Sports_Football' , 'Sports_Handball'...
this is not really functional because when we need to when we want the Topic 'Sports' with all the subs we need to query all the topics for it.
we are also using Redis and Apache Storm. So please is there a better way of doing this?
You are correct. There is no such thing as a "subtopic" in Kafka, however, consuming all topics that begin with the word 'Sports' is trivial. Assuming you're using Java, once you have initialized a consumer use the method consumer.subscribe(Pattern.compile("^Sports_.+")) to subscribe to your "subtopics." Calling consumer.poll(timeout) will now read from all topics beginning with 'Sports_'.
The only downside to doing it this way is that the consumer will have to resubscribe when new 'Sports_' topics are added.
Apache Kafka doesn't support it, you are right. But Kafka supports message partitioning. Kafka provides garantee, that all messages with the same key go to the same partition.
You can consume all partitions, or only single one. So you basically can set different keys for the each sport in order to separate messages.
There is also the option of using Redis streams,
using kafka-redis-connector you can push data to Redis streams.
but consider the benefits and inconvenients of Redis streams.
An other interesting solution is using Kafka Streams, so you can create subtopics.
Broker(Sport) ==> Sport_Stream(Football, Handball) ==> Consumer can receive topics from the Broker or receive a subtopic from the Stream.

How to handle long asynchronous requests with pyramid and celery?

I'm setting up a web service with pyramid. A typical request for a view will be very long, about 15 min to finish. So my idea was to queue jobs with celery and a rabbitmq broker.
I would like to know what would be the best way to ensure that bad things cannot happen.
Specifically I would like to prevent the task queue from overflow for example.
A first mesure will be defining quotas per IP, to limit the number of requests a given IP can submit per hour.
However I cannot predict the number of involved IPs, so this cannot solve everything.
I have read that it's not possible to limit the queue size with celery/rabbitmq. I was thinking of retrieving the queue size before pushing a new item into it but I'm not sure if it's a good idea.
I'm not used to good practices in messaging/job scheduling. Is there a recommended way to handle this kind of problems ?
RabbitMQ has flow control built into the QoS. If RabbitMQ cannot handle the publishing rate it will adjust the TCP window size to slow down the publishers. In the event of too many messages being sent to the server it will also overflow to disk. This will allow your consumer to be a bit more naive although if you restart the connection on error and flood the connection you can cause problems.
I've always decided to spend more time making sure the publishers/consumers could work with multiple queue servers instead of trying to make them more intelligent about a single queue server. The benefit is that if you are really overloading a single server you can just add another one (or another pair if using RabbitMQ HA. There is a useful video from Pycon about Messaging at Scale using Celery and RabbitMQ that should be of use.

CQRS using Redis MQ

I have been working on a CQRS project (my first) for over the last 9 months which has been a heavy learning curve. I am currently using JOliver's excellent EventStore in my write model and using PostGresSql for my read model.
Both my read and write databases are on the same machine which means that when a change is made to the write database, in the same synchronous call a change is made to the read model.
As I was learning CQRS I felt this was the best way to go as I had no experience with message queue/service bus frameworks such as MassTransit, NServiceBus etc.
I am now at a point with most of my architecture in place to introduce a message queue framework.
Today, I came across Redis MQ which is part of ServiceStack and as we are already using ServiceStack for our Rest based HTTP clients, this seems like the right way to go.
My question is more about understanding what I need to know (or if I have any misunderstandings) to implement Redis MQ and whether Redis MQ is the right choice?
Now from what I understand, I would use Redis MQ as a durable queue between the write and read database. Once my event store has recorded that something has happened in my domain then it will publish to Redis MQ. The services listening for events/messages would receive the event/message from Redis MQ and once it has processed it (i.e. update or write to the read model), a notification/response goes back to the event store to tell the event store that the message has been received and processed by the listener/subscriber.
Does this sound correct?
Also would the Redis MQ architecture give me everything that NSB, RavenDB, MassTransit etc offer?
Also, I will be deploying to windows 2008 and 2003 server. Is Redis stable for these OSs?
I think the ServiceStack implementation of message queueing in Redis is more appropriate for job-queue scenarios - it pushes a message onto the end of a Redis list and then uses Redis pub-sub to notify listening subscribers that there is a message to pull from the queue. Any consumers would be competing for messages.
For event sourcing, you may be more interested in a type of fanout or topic based messaging topology as offered by RabbitMQ, not that that precludes you from building that sort of thing using Redis data structures yourself.
Now from what I understand, I would use Redis MQ as a durable queue
between the write and read database.
Yes this is correct.
Once my event store has recorded that something has happened in my
domain then it will publish to Redis MQ.
Yes and this can be done in several ways. It can either happen as part of the transaction which persists to the event store or you can have an out of band process which continuously publishes events from the event store.
a notification/response goes back to the event store to tell the event
store that the message has been received and processed by the
listener/subscriber.
The response back to the event publisher is usually omitted. This truly decouples the publishers from subscribers. You make the assumption that once the message is published, all interested subscribers will handle it. If something happens, an error should be logged.
Also would the Redis MQ architecture give me everything that NSB,
RavenDB, MassTransit etc offer?
I don't have experience running Redis MQ, but I do know that Redis supports pub/sub which is one of the value propositions of NSB and MassTransit (as opposed to say bare-bones MSMQ). What MT and NSB offer beyond pub/sub are sagas and it doesn't seem like Redis MQ support those out of the box at least. You may not ever have a need for sagas so this should not automatically be a deterrent. RavenDB is not a message queue so it doesn't apply here.
Also, I will be deploying to windows 2008 and 2003 server. Is Redis
stable for these OSs?
I've run Redis on 2008 R2 and it has been stable so I would think Redis MQ would be stable as well.
You may be interested in a little side project of mine on GitHub which is a queue and persistence implementation for NServiceBus using Redis. https://github.com/mackie1001/NServicebus.Redis
I'd not call it production ready and I want to port it to NSB 4 and do some thorough testing but the meat of it is done.

Real-time application newbie - Node.JS + Redis or RabbitMQ -> client/server how?

I am a newbie to real-time application development and am trying to wrap my head around the myriad options out there. I have read as many blog posts, notes and essays out there that people have been kind enough to share. Yet, a simple problem seems unanswered in my tiny brain. I thought a number of other people might have the same issues, so I might as well sign up and post here on SO. Here goes:
I am building a tiny real-time app which is asynchronous chat + another fun feature. I boiled my choices down to the following two options:
LAMP + RabbitMQ
Node.JS + Redis + Pub-Sub
I believe that I get the basics to start learning and building this out. However, my (seriously n00b) questions are:
How do I communicate with the end-user -> Client to/from Server in both of those? Would that be simple Javascript long/infinite polling?
Of the two, which might more efficient to build out and manage from a single Slice (assuming 100 - 1,000 users)?
Should I just build everything out with jQuery in the 'old school' paradigm and then identify which stack might make more sense? Just so that I can get the product fleshed out as a prototype and then 'optimize' it. Or is writing in one over the other more than mere optimization? ( I feel so, but I am not 100% on this personally )
I hope this isn't a crazy question and won't get flamed right away. Would love some constructive feedback, love this community!
Thank you.
Architecturally, both of your choices are the same as storing data in an Oracle database server for another application to retrieve.
Both the RabbitMQ and the Redis solution require your apps to connect to an intermediary server that handles the data communications. Redis is most like Oracle, because it can be used simply as a persistent database with a network API. But RabbitMQ is a little different because the MQ Broker is not really responsible for persisting data. If you configure it right and use the right options when publishing a message, then RabbitMQ will actually persist the data for you but you can't get the data out except as part of the normal message queueing process. In other words, RabbitMQ is for communicating messages and only offers persistence as a way of recovering from network problems or system crashes.
I would suggest using RabbitMQ and whatever programming languages you are already familiar with. Since the M in LAMP is usually interpreted as MySQL, this means that you would either not use MySQL at all, or only use it for long term storage of data, not for the realtime communications.
The RabbitMQ site has a huge amount of documentation about building apps with AMQP. I suggest that after you install RabbitMQ, you read through the docs for rabbitmqctl and then create a vhost to experiment in. That way it is easy to clean up your experiments without resetting everything. I also suggest using only topic exchanges because you can emulate the behavior of direct and fanout exchanges by using wildcards in the routing_key.
Remember, you only publish messages to exchanges, and you only receive messages from queues. The exchange is responsible for pattern matching the message's routing_key to the queue's binding_key to determine which queues should receive a copy of the message. It is worthwhile learning the whole AMQP model even if you only plan to send messages to one queue with the same name as the routing_key.
If you are building your client in the browser, and you want to build a prototype, then you should consider just using XHR today, and then move to something like Kamaloka-js which is a pure Javascript implementation of AMQP (the AMQ Protocol) which is the standard protocol used to communicate to a RabbitMQ message broker. In other words, build it with what you know today, and then speed it up later which something (AMQP) that has a long term future in your toolbox.
Should I just build everything out with jQuery in the 'old school' paradigm and then identify which stack might make more sense? Just so that I can get the product fleshed out as a prototype and then 'optimize' it. Or is writing in one over the other more than mere optimization? ( I feel so, but I am not 100% on this personally )
This is usually called RAD (rapid application design/development) and it is what I would recommend right now. This lets you build the proof of concept that you can use to work off of later to get what you want to happen.
As for how to talk to the clients from the server, and vice versa, have you read at all on websockets?
Given the choice between LAMP or event based programming, for what you're suggesting, I would tell you to go with the event based programming, so nodejs. But that's just one man's opinion.
Well,
LAMP - Apache create new process for every request. RabbitMQ can be useful with many features.
Node.js - Uses single process to handle all request asynchronously with help of event looping. So, no extra overhead process creation like apache.
For asynchronous chat application,
socket.io + Node.js + redis pub-sup is best stack.
I have already implemented real-time notification using above stack.