Using RabbitMQ with StormCrawler - rabbitmq

I want to use RabbitMQ with StormCrawler. I already saw that there is a repository for using RabbitMQ with Storm:
https://github.com/ppat/storm-rabbitmq
How would you use this with StormCrawler? I would like to use the producer as well as the consumer.
For the consumer there seems to be some documentation. What about the producer? Can you just put the config entries in the StormCrawler config, or would I need to change the source code of the RabbitMQProducer?

You'd want the bolt which sends URLs to RabbitMQ to extend AbstractStatusUpdaterBolt, as the superclass does a lot of useful things under the bonnet. This means that you would not use the Producer out of the box but would need to write some custom code.
Unless you are certain that there will be no duplicate URLs, you'll need to deduplicate the URLs before sending them to the queues anyway, which could be done e.g. with Redis within your custom status updater.
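
To give an idea of the deduplication step, here is a minimal sketch. It assumes an in-memory set standing in for Redis, and the class and method names (UrlDeduplicator, markIfNew) are made up for illustration; they are not part of StormCrawler or storm-rabbitmq. With Redis you would get the same "new or already seen" signal from SADD, which returns 1 when the member was added and 0 when it was already present.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of StormCrawler or storm-rabbitmq.
final class UrlDeduplicator {
    // In-memory stand-in for a shared store such as a Redis set (assumption).
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given URL is offered. */
    boolean markIfNew(String url) {
        // Set.add reports whether the element was absent before the call,
        // so a single call both checks and records the URL.
        return seen.add(url);
    }
}
```

Inside the custom status updater you would then only publish a URL to RabbitMQ when markIfNew returns true. An in-memory set only deduplicates within one JVM, which is why a shared store such as Redis is the more realistic choice once the bolt runs with several instances.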

Related

RabbitMQ: Shovel vs Federation for Microservice Communication

I've spent quite a bit of time trying to figure out whether I should use the RabbitMQ federation plugin or shovel.
Basically I have two microservices. I want one of them to send a message to the other. Each microservice has a different RabbitMQ cluster, so I need to use federation or shovel.
I read the post "When to use RabbitMQ shovels and when Federation plugin?" and still couldn't make a decision.
I want to satisfy the following:
Loose coupling
Microservices don't know about each other, i.e. the first microservice emits a message saying "I'm done doing x", and the second microservice just listens to that 'event' and acts accordingly.
In the future I 'might' want to add more microservices, each with their own RabbitMQ cluster / vhost.
Based on this information - what do you recommend, shovel or federation?
Why not just have one cluster for everything? RabbitMQ is built for handling 10k+ exchanges and queues; actually there is no upper limit except memory or disk space. Setting up a cluster for each microservice is too much work and creates unnecessary overhead. Vhosts should not be used for this either; use one per business area instead.
I'm only using shovels, and I use them to transfer messages from my production environment to test, so I can test with real data. It's very easy to set up with scripts. And yes, you should only do this with scripts. Using the UI is too slow.
I know this doesn't answer your question directly, but I hope it has given you some food for thought.
Happy messaging!

Is there any way to read messages from Kafka topic without consumer?

Just for testing purposes, I want to automate a scenario where I need to check the content of Kafka messages. Is it possible to read messages directly from a topic, without a consumer, using the Kafka Java libraries?
I'm new to Kafka so any suggestion will be good for me.
Thanks in advance!
You could SSH to the broker in question and dump the log segments in deserialized form, but it would take less time to simply use a consumer in any language, not necessarily Java.
"For testing purposes": the Kafka Java API provides MockProducer and MockConsumer, which are backed by Lists, not a full broker.

NServiceBus routing

We have multiple web and Windows applications deployed to different servers, and we are planning to integrate them using NServiceBus so that all apps can publish and subscribe to messages between them. I think the pub/sub pattern with the MSMQ transport will be a good fit. One thing I am not clear on: if the publisher is on another server, is there a way to avoid hard-coding the endpoint in the subscriber as an MSMQ QueueName#ServerName address, which has the server name in it directly? In 6-pre I saw the idea of setting an endpoint name and then using routing to map it to a transport-level address. Is that a solution for this? Or is the gateway the only solution? Is a broker a good idea? What is the best practice for this scenario?
When using pub/sub, the subscriber currently needs to know the location of the queue of the publisher. The subscriber then sends a subscription-message to that queue, every single time it starts up. It cannot know if it subscribed already and if it subscribed for all the messages, since you might have added/configured some new ones.
The publisher reads these subscription messages and stores the subscriptions in storage. NServiceBus does this for you, so there's no need to write code for this. The only thing you need is configuration in the subscriber as to where the (queue of the) publisher is.
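
To illustrate why it is harmless for the subscriber to resend its subscription on every startup, here is a rough sketch of what publisher-side subscription storage boils down to. This is not NServiceBus code (NServiceBus is .NET and handles this internally); the class name, method names and the use of an in-memory map are assumptions made purely for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of publisher-side subscription storage; not NServiceBus code.
final class SubscriptionStore {
    // Message type -> addresses of the subscriber queues interested in it.
    private final Map<String, Set<String>> subscribers = new ConcurrentHashMap<>();

    /** Handles one subscription message; receiving the same one again changes nothing. */
    void subscribe(String messageType, String subscriberAddress) {
        subscribers
            .computeIfAbsent(messageType, t -> ConcurrentHashMap.newKeySet())
            .add(subscriberAddress);
    }

    /** Addresses that a published message of this type should be sent to. */
    Set<String> subscribersFor(String messageType) {
        return Collections.unmodifiableSet(
            subscribers.getOrDefault(messageType, Collections.emptySet()));
    }
}
```

Because the addresses are kept in a set per message type, a repeated subscription is a no-op, while a subscription for a newly added message type simply creates a new entry.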
I wrote a tutorial myself, which you can find here: http://dennis.bloggingabout.net/2015/10/28/nservicebus-publish-subscribe-tutorial/
That being said, you should take special care with websites that publish messages. More information on that can be found here: http://docs.particular.net/nservicebus/hosting/publishing-from-web-applications
In a scale-out situation with MSMQ, you can also use the distributor: http://docs.particular.net/nservicebus/scalability-and-ha/distributor/
As a final note: it depends on the situation, but I would not worry too much about knowing the locations of endpoints (or their queues). I would most likely not use pub/sub just for this 'technical issue'. But again, it completely depends on the situation. I can understand that rich clients which spawn randomly might want this. But there are other solutions as well, with more centralized storage and an API that is accessed by all the rich clients.

Best way to display dynamic data in a webpage

My goal is to visualize an incoming data stream in a browser. I have used ActiveMQ to queue the stream. A single message consumed from the queue looks like this: "int,date/time,int,string". I have to update my line graph in the browser every 100 ms. Any ideas?
It sounds like a use case for WebSocket.
There are many ways to implement it, but a rather nice blog post on the topic is presented here.
Another way is to use MQTT directly from the browser using JavaScript and subscribe to a topic with your updates. In this case you have to forward your data to that topic. For that, you can use composite queues with forwardOnly=false.
If you're using ActiveMQ, you could enable its websockets interface: http://activemq.apache.org/websockets.html
In your browser code, use the STOMP over WebSocket library to subscribe to the queue: http://jmesnil.net/stomp-websocket/doc/

ActiveMQ: Simple topic based cluster

Well, let's say I'm building an ActiveMQ-based chat application. It's pretty simple: there is only one QUEUE.IN and one TOPIC.OUT. All messages are simply routed right away from QUEUE.IN to TOPIC.OUT. Clients produce their chat messages to QUEUE.IN and consume from TOPIC.OUT. That's all.
Now, I want to cluster it. I don't need anything complex, just a few other identical nodes (A..N). Basically, a client subscribed to node A sends a message to A.QUEUE.IN. This message must then appear on TOPIC.OUT of all nodes (A..N). This could easily be done with a simple Camel route that reroutes all messages arriving at TOPIC.OUT to the other nodes, but is there some nice ActiveMQ-native way to do so? Like some queue/topic shared among several AMQ instances?
I think you can find your answer here:
http://activemq.apache.org/how-do-distributed-queues-work.html
You can forward messages to multiple endpoints in ActiveMQ using virtual destinations.
http://activemq.apache.org/virtual-destinations.html