Sharing a BlockingQueue in a Storm spout

I am programming a big data application in which two threads run concurrently. Thread A receives data from the network and puts it into a BlockingQueue as JSONObject instances. Thread B, a Storm spout, then reads from the BlockingQueue and processes the items.
I pass the BlockingQueue object to the spout class in its constructor. The problem I found is that the BlockingQueue in the spout is empty. Could you please let me know how I can solve this problem?

You start a Storm application by running a class that builds and configures the topology as a set of objects and then submits that collection of objects (along with the jar file) to the Nimbus server. Some of those objects are the spout and bolt instances, which are serialized as part of the topology submission. Each instance of a bolt or spout on the cluster is a deserialized copy of one of these objects. So all bolts and spouts are constructed where you first start the topology (usually on an edge node), not on the cluster.
What this means for you is that any object the spout instance holds in its fields after construction is serialized along with the spout instance. This includes the BlockingQueue. Your BlockingQueue is being serialized and distributed to the cluster, so the spout there most likely holds a deserialized copy rather than the queue that thread A is filling.
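To make the effect concrete, here is a minimal sketch in plain Java, without any Storm dependency. FakeSpout and roundTrip are hypothetical stand-ins: the first plays the role of a spout that takes the queue in its constructor, the second imitates what topology submission does by pushing the object through Java serialization. It only illustrates the mechanism and is not how Storm is implemented internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SerializedQueueDemo {

    // Hypothetical stand-in for a spout that receives the queue in its constructor.
    static class FakeSpout implements Serializable {
        private static final long serialVersionUID = 1L;
        final BlockingQueue<String> queue;

        FakeSpout(BlockingQueue<String> queue) {
            this.queue = queue;
        }
    }

    // Serializes and deserializes an object, imitating the trip to the cluster.
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    // Returns how many items the deserialized spout sees after the producer adds one.
    static int sizeSeenByCopyAfterProducerPuts() throws Exception {
        BlockingQueue<String> original = new LinkedBlockingQueue<>();
        FakeSpout submitted = new FakeSpout(original);
        FakeSpout onCluster = roundTrip(submitted);   // the copy that would run on a worker
        original.put("{\"id\":1}");                   // "thread A" writes to the original queue
        return onCluster.queue.size();                // the copy has its own, separate queue
    }

    public static void main(String[] args) throws Exception {
        System.out.println("items visible to deserialized spout: " + sizeSeenByCopyAfterProducerPuts());
        // prints "items visible to deserialized spout: 0"
    }
}
```

The deserialized spout owns a different queue object, so nothing the producer adds afterwards ever shows up in it.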
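What you want to do is leave the blocking queue variable null in the constructor and instead set it in the open() method. When you create the actual queue object, you might store it in a public static variable somewhere so that it is available to the spout's open() method. Keep in mind that a static variable is only visible within a single JVM, so this works only when the thread that fills the queue runs in the same worker process as the spout.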
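A rough sketch of that suggestion, again in plain Java so it runs without Storm. SharedQueue, FakeSpout and nextItem are hypothetical names; in a real spout the lookup would go into open(), which also receives configuration, context and collector arguments that are omitted here. The sketch assumes the producer thread and the spout live in the same JVM.

```java
import java.io.Serializable;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OpenTimeQueueDemo {

    // Hypothetical holder: one queue per JVM, reachable from both producer and spout.
    static final class SharedQueue {
        static final BlockingQueue<String> INSTANCE = new LinkedBlockingQueue<>();

        private SharedQueue() {
        }
    }

    // Hypothetical stand-in for the spout class.
    static class FakeSpout implements Serializable {
        private static final long serialVersionUID = 1L;

        // Left null at construction time and excluded from serialization.
        private transient BlockingQueue<String> queue;

        // Stand-in for the spout's open(): runs on the worker, after deserialization.
        void open() {
            queue = SharedQueue.INSTANCE;
        }

        // Stand-in for nextTuple(): non-blocking, returns null when nothing is queued.
        String nextItem() {
            return queue.poll();
        }
    }

    public static void main(String[] args) {
        FakeSpout spout = new FakeSpout();
        spout.open();
        SharedQueue.INSTANCE.offer("{\"id\":1}");   // the producer thread would do this
        System.out.println(spout.nextItem());       // prints {"id":1}
    }
}
```

Because the queue is looked up when the spout is opened rather than when it is constructed, the spout and the producer end up sharing the same live object.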

Queue declaration on all nodes in RabbitMQ

I have a RabbitMQ cluster set up (without HA). My consumers are Spring applications, and Spring provides a failover mechanism out of the box that connects to the next available node.
Since the queues are not mirrored, is it okay if I declare the queues up front, so that when the first node goes down the connection is established to the second node? Does this make sense?
Another question: let's say I have a load balancer on top of the RabbitMQ cluster, and my applications connect through the load balancer. Will the queues be declared on all nodes, or only on the node chosen by the load balancer's routing strategy?
For the first scenario, yes, the queues will be declared on the failover broker instance when the connection is established.
If you want to pre-declare on all nodes you will need a connection factory for each node, and a RabbitAdmin for each connection factory.
You will also need something to cause a connection to be opened on each (the RabbitAdmins register themselves as connection listeners).
You can do that by adding a bean that implements SmartLifecycle and calls createConnection() on each connection factory.
You can also selectively declare elements. See Conditional Declaration.
By default, all queues, exchanges, and bindings are declared by all RabbitAdmin instances (that have auto-startup="true") in the application context.
Starting with the 1.2 release, it is possible to conditionally declare these elements. This is particularly useful when an application connects to multiple brokers and needs to specify with which broker(s) a particular element should be declared.

Why is a single Jedis instance not threadsafe?

https://github.com/xetorthio/jedis/wiki/Getting-started
using Jedis in a multithreaded environment
You shouldn't use the same instance from different threads because you'll have strange errors. And sometimes creating lots of Jedis instances is not good enough because it means lots of sockets and connections, which leads to strange errors as well.
A single Jedis instance is not threadsafe! To avoid these problems, you should use JedisPool, which is a threadsafe pool of network connections. You can use the pool to reliably create several Jedis instances, given you return the Jedis instance to the pool when done. This way you can overcome those strange errors and achieve great performance.
=================================================
I want to know why. Can anyone help me, please?
A single Jedis instance is not threadsafe because it was implemented this way. That's the decision that the author of the library made.
You can check in the source code of BinaryJedis which is a super type of Jedis https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/jedis/BinaryJedis.java
For example these lines:
    public Transaction multi() {
        client.multi();
        client.getOne(); // expected OK
        transaction = new Transaction(client);
        return transaction;
    }
As you can see, the transaction field is shared by all threads that use the same Jedis instance, and it is initialized in this method. Later, this transaction can be used in other methods. Imagine two threads performing transactional operations at the same time: a transaction created by one thread may be unintentionally accessed by another. The transaction field is shared state, and access to it is not synchronized. This makes Jedis non-threadsafe.
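The hazard can be replayed deterministically with a small sketch. FakeClient and FakeTransaction are hypothetical stand-ins, not the Jedis classes; the only thing they share with the snippet above is that the current transaction is kept in an unsynchronized instance field. Instead of starting real threads, the sketch replays one unlucky interleaving step by step.

```java
public class SharedTransactionDemo {

    // Hypothetical stand-in for a transaction object; not the Jedis class.
    static final class FakeTransaction {
        final String owner;

        FakeTransaction(String owner) {
            this.owner = owner;
        }
    }

    // Hypothetical client that stores the current transaction in a plain field.
    static final class FakeClient {
        private FakeTransaction transaction;

        FakeTransaction multi(String caller) {
            transaction = new FakeTransaction(caller);
            return transaction;
        }

        // A later operation that relies on the field rather than the returned object.
        String ownerOfCurrentTransaction() {
            return transaction.owner;
        }
    }

    // Replays one possible interleaving of two callers sharing a single client.
    static String replayInterleaving() {
        FakeClient shared = new FakeClient();
        shared.multi("thread-A");                    // A starts a transaction
        shared.multi("thread-B");                    // B starts one before A continues
        return shared.ownerOfCurrentTransaction();   // A continues and sees B's transaction
    }

    public static void main(String[] args) {
        System.out.println("thread-A now operates on the transaction of: " + replayInterleaving());
        // prints "thread-A now operates on the transaction of: thread-B"
    }
}
```

With real threads the same overwrite can happen at any moment, which is why each thread should work with its own instance taken from a pool.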
The reason why the author decided to make Jedis non-threadsafe and JedisPool threadsafe might be to provide flexibility for clients so that if you have a single-threaded environment you can use Jedis and get better performance or if you have a multithreaded environment you can use JedisPool and get thread safety.

Reuse WCF server instance between operations, without concurrency

How can I make the WCF server instance (the instance of the class in the .svc.cs / .svc.vb file) stay alive between requests?
It's a stateless, read-only type of service: I'm fine with different clients reusing the same instance. However, it's not thread-safe: I don't want two threads to execute a method on this instance concurrently.
Ideally, what I'm looking for is that WCF manages a "worker pool" of these instances. Say, 10. New request comes in: fetch an instance, handle the request. Request over, go back to the pool. Already 10 concurrent requests running? Pause the 11th until a new worker is free.
What I /don't/ want is per-client sessions. Startup for these instances is expensive, I don't want to do that every time a new client connects.
Another thing I don't want: dealing with this client-side. This is not the responsibility of the client, which should know nothing about the implementation of the server. And I can't always control that.
I'm getting a bit lost in unfamiliar terminology from the MSDN docs. I have a lot working, but this pool system I just can't seem to get right.
Do I have to create a static pool and manage it myself?
Thanks
PS: A source of confusion for me is that almost anything in this regard points toward the configuration of the bindings. Like basicHttp or wsHttp. But that doesn't sound right: this should be on a higher level, unrelated to the binding: this is about the worker managers. Or not?
In the event that you have a WCF service that centralizes business logic, provides/controls access to another “single” backend resource (e.g. data file, network socket) or otherwise contains some type of shared resource, then you most likely need to implement a singleton.
[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
In general, use a singleton object if it maps well to a natural singleton in the application domain. A singleton implies the singleton has some valuable state that you want to share across multiple clients. The problem is that when multiple clients connect to the singleton, they may all do so concurrently on multiple worker threads. The singleton must synchronize access to its state to avoid state corruption. This in turn means that only one client at a time can access the singleton. This may degrade responsiveness and availability to the point that the singleton is unusable as the system grows.
The singleton service is the ultimate shareable service, which has both pros (as indicated above) and cons (as implied in your question, you have to manage thread safety). When a service is configured as a singleton, all clients get connected to the same single well-known instance independently of each other, regardless of which endpoint of the service they connect to. The singleton service lives forever, and is only disposed of once the host shuts down. The singleton is created exactly once when the host is created.
https://msdn.microsoft.com/en-us/magazine/cc163590.aspx
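For comparison, the "static pool" idea raised in the question can be sketched in a few lines. This is a language-neutral illustration written in Java, not WCF API: WorkerPool, execute and idleCount are hypothetical names, and the worker type is whatever expensive, non-thread-safe object the service wraps. A fixed number of workers is created once; each request borrows one, and a request that arrives while all workers are busy blocks until one is returned.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

public class WorkerPool<W> {

    // Idle workers; a worker is in this queue only while no request is using it.
    private final BlockingQueue<W> idle;

    public WorkerPool(int size, Supplier<W> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());   // the expensive start-up happens once per worker
        }
    }

    // Borrows a worker, runs the request on it, and always hands the worker back.
    public <R> R execute(Function<W, R> request) throws InterruptedException {
        W worker = idle.take();        // blocks while every worker is busy
        try {
            return request.apply(worker);
        } finally {
            idle.add(worker);          // a slot is free because this worker was taken from it
        }
    }

    public int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // StringBuilder is only a placeholder for a costly, non-thread-safe worker.
        WorkerPool<StringBuilder> pool = new WorkerPool<>(10, StringBuilder::new);
        String reply = pool.execute(w -> {
            w.setLength(0);
            return w.append("pong").toString();
        });
        System.out.println(reply + ", idle workers: " + pool.idleCount());
        // prints "pong, idle workers: 10"
    }
}
```

Since a worker is only ever held by one caller at a time, the worker itself needs no locking, and no client ever learns that a pool exists.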

How do WCF instances work?

I am trying to understand how instances work in WCF. I have a WCF service with the InstanceContextMode set to PerCall (so for each call of every client a new instance will be created) and the ConcurrencyMode set to Single (so the service instance executes at most one operation call at a time).
So with this I understand that when a client connects, a new instance is created. But what happens when the client leaves the service? Does the instance die? The reason I ask is that I need to implement a ConcurrentQueue in the service. A client will connect to the service, put loads of data in to be processed, and then leave the service. The workers will work off the queue. After the work is finished, I need the instance to be destroyed.
Basically, as I learned in the "WCF Master Class" taught by Juval Lowy, per-call activation is the preferred choice for services that need to scale, i.e. that need to handle lots of concurrent requests.
Why?
With the per-call, each incoming request (up to a configurable limit) gets its own, fresh, isolated instance of the service class to handle the request. Instantiating a service class (a plain old .NET class) is not a big overhead - and the WCF runtime can easily manage 10, 20, 50 concurrently running service instances (if your server hardware can handle it). Since each request gets its own service instance, that instance just handles one thread at a time - and it's totally easy to program and maintain, no fussy locks and stuff needed to make it thread-safe.
Using a singleton service (InstanceContextMode=Single) is either a terrible bottleneck (if you have ConcurrencyMode=Single - then each request is serialized, handled one after another), or if you want decent performance, you need ConcurrencyMode=Multiple, but that means you have one instance of your service class handling multiple concurrent threads - and in that case, you as a programmer of that service class must make 100% sure that all your code, all your access to variables etc. is 100% thread-safe - and that's quite a task indeed! Only very few programmers really master this black art.
In my opinion, the overhead of creating service class instances in the per-call scenario is nothing compared to the requirements of creating a fully thread-safe implementation for a multi-threaded singleton WCF service class.
So in your concrete example with a central queue, I would:
create a simple WCF per-call service that gets called from your clients, and that only puts the message into the queue (in an appropriate fashion, e.g. possibly transforming the incoming data or something). This is a quick task, no big deal, no heavy processing of any kind - and thus your service class will be very easy, very straightforward, no big overhead to create those class instances at all
create a worker service (a Windows NT service or something) that then reads the queue and does the processing - this is basically totally independent of any WCF code - this is just doing dequeuing and processing
So what I'm saying is: try to separate the service call (which delivers the data) from building up a queue and doing large, processing-intensive computation. Split up the responsibilities: the WCF service should only receive the data, put it into a queue or database, and then be done with it, and a second, separate process should do the processing and heavy lifting. That keeps your WCF service lean'n'mean.
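The split between a thin intake call and a separate worker can be sketched as follows. This is a plain Java illustration of the pattern, not WCF code: receive stands in for the per-call service operation, runWorker for the independent worker process, and the in-memory queue for whatever durable queue or database would be used in practice. The STOP sentinel is an assumption added only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class IntakeAndWorkerDemo {

    // Hypothetical sentinel that tells the worker to stop; only needed for the demo.
    private static final String STOP = "__stop__";

    // Stand-in for the per-call service operation: it only enqueues and returns.
    static void receive(BlockingQueue<String> queue, String payload) throws InterruptedException {
        queue.put(payload);
    }

    // Stand-in for the separate worker: dequeues and processes until told to stop.
    static List<String> runWorker(BlockingQueue<String> queue) throws InterruptedException {
        List<String> processed = new ArrayList<>();
        while (true) {
            String item = queue.take();          // blocks while the queue is empty
            if (STOP.equals(item)) {
                return processed;
            }
            processed.add("done:" + item);       // placeholder for the heavy processing
        }
    }

    static List<String> demo() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();
        Thread worker = new Thread(() -> {
            try {
                results.addAll(runWorker(queue));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        receive(queue, "job-1");
        receive(queue, "job-2");
        receive(queue, STOP);
        worker.join();                           // results are safe to read after join()
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());              // prints [done:job-1, done:job-2]
    }
}
```

The intake side returns as soon as the item is queued, so it stays cheap no matter how long the processing takes.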
Yes, per call means you will have a new instance of the service for each call. Once you set the InstanceContextMode to PerCall and the ConcurrencyMode to Single, each instance is single-threaded. When the call completes, the instance is disposed. In this case, you want to be careful not to create your ConcurrentQueue multiple times. As far as I can tell, you need a single ConcurrentQueue; is that correct?
I would recommend using InstanceContextMode=Single and ConcurrencyMode=Multiple. This scales better. With this scheme you have a single concurrent queue, and you can store all your items in that queue.
One small note: ConcurrentQueue has a bug you should be aware of; check the bug database.

RabbitMQ implementation of AMQP protocol

I have a problem; can you help me? Is an instance of the AmqpTemplate class from RabbitMQ (an implementation of the AMQP protocol) thread safe? Can it be accessed from multiple threads?
Thanks
AmqpTemplate is the interface, and RabbitTemplate is the implementation, and I assume by "thread-safe" you mean that its send/receive/sendAndReceive methods may be used concurrently. If so, then YES. The only state it maintains within instance variables are "converter" strategies for the Message and MessageProperties as well as default Exchange, Queue, and Routing Key settings (which are not even used if you invoke the methods that take those as arguments instead), and all of those are typically configured one time initially (e.g. via dependency injection). The template does not maintain any non-local state for any particular operation at runtime. With AMQP, the "Channel" is the instance that can only be used by one thread at a time, and the RabbitTemplate manages that internally such that each operation is retrieving a Channel to use within the scope of that operation. Multiple concurrent operations therefore lead to multiple instances of Channel being used, but that is not something you need to be worried about as an end-user of the template.
Hope that helps.
-Mark