Service Broker Design - sql-server-2005

I'm looking to introduce SQL Server Service Broker.
I have a remote orders database and a local processing database; all activity on the processing database has to happen in sequence, which seems like a perfect job for Service Broker!
I've set up the infrastructure: I can send and receive messages, and now I'm looking at the design of the processing. As I said, all processes for one order need to be completed in sequence, so I'll put them in one conversation.
One of these processes is a request for external flat file data; we then wait (it could be several days) and import and process this file when it returns. How can I process half the tasks, then wait for the flat file to return before processing the other half?
I've had some ideas, but I'm sure I'm missing a trick somewhere:
1) Write all queue items to a status table and use status values – seems to remove some of the flexibility of SSSB and add another layer of tasks
2) Keep the transaction open until we get the data back – not ideal
3) Have the flat file import task continually polling for the file to appear – this seems inefficient
What is the most efficient way of managing this workflow?
thanks in advance

In my opinion this is like a chain of responsibility. As far as I can understand, we have the following workflow:
1) Process the message.
2) Wait for the external file; this can be a busy wait, or, if the external data source provides a notification, it can be handled in a non-polling manner.
3) Once the data is received, process it.
So my suggestion would be to use three different queues, one for each part; when one part is done, it forwards (or puts) a new message onto the next queue in the chain.
I am assuming that processing one order will not disrupt the processing of another order.
I am thinking MSMQ with a Windows Workflow Foundation sequential workflow might also be a candidate for this task.
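To make the chaining concrete, here is a minimal in-process Python sketch of the three-queue hand-off; the queue objects, worker functions and the simulated file arrival are illustrative stand-ins, not Service Broker or MSMQ code.

    # Minimal illustration of chaining the work through queues; real Service
    # Broker / MSMQ queues would replace queue.Queue here.
    import queue, threading, time

    first_half_q  = queue.Queue()   # tasks that run before the flat file arrives
    second_half_q = queue.Queue()   # tasks that run once the file has arrived
    awaiting_file = set()           # orders parked while we wait (could be a status table)

    def first_half_worker():
        while True:
            order = first_half_q.get()
            print(f"first half: processing order {order}, requesting flat file")
            awaiting_file.add(order)        # park the order; nothing blocks while we wait

    def file_arrived(order):
        # Called by whatever notifies us that the flat file is back (days later).
        awaiting_file.discard(order)
        second_half_q.put(order)            # resume the chain on the next queue

    def second_half_worker():
        while True:
            order = second_half_q.get()
            print(f"second half: importing file and finishing order {order}")

    threading.Thread(target=first_half_worker, daemon=True).start()
    threading.Thread(target=second_half_worker, daemon=True).start()

    first_half_q.put("ORDER-1")
    time.sleep(0.2)
    file_arrived("ORDER-1")                 # simulate the flat file turning up
    time.sleep(0.2)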

Related

RabbitMQ - allow only one process per user

To keep it short, here is a simplified situation:
I need to implement a queue for background processing of imported data files. I want to dedicate a number of consumers to this specific task (let's say 10) so that multiple users can be processed in parallel. At the same time, to avoid problems with concurrent data writes, I need to make sure that no single user is processed by multiple consumers at the same time; basically, all files of a single user should be processed sequentially.
Current solution (but it does not feel right):
Have 1 queue where all import tasks are published (file_queue_main)
Have 10 queues for file processing (file_processing_n)
Have 1 result queue (file_results_queue)
Have a manager process (in this case in node.js) which consumes messages from file_queue_main one by one and decides which file_processing queue to distribute each message to. Basically, it keeps track of which file_processing queue each user is currently being processed in.
Here is a little animation of my current solution and expected behaviour (animation not reproduced here).
Is RabbitMQ even the tool for the job? For some reason, it feels like some sort of an anti-pattern. Appreciate any help!
The part about this that doesn't "feel right" to me is the manager process. It has to know the current state of each consumer, and it also has to stop and wait if all processors are working on other users. Ideally, you'd prefer to keep each process ignorant of the others. You're also getting very little benefit out of your processing queues, which are only used when a processor is already working on a message from the same user.
Ultimately, the best solution here is going to depend on exactly what your expected usage is and how likely it is that the next message is from a user that is already being processed. If you're expecting most of your messages coming in at any one time to be from 10 users or fewer, what you have might be fine. If you're expecting to be processing messages from many different users with only the occasional duplicate, your processing queues are going to be empty much of the time and you've created a lot of unnecessary complexity.
Other things you could do here:
Have all consumers pull from the same queue and use some sort of distributed locking to prevent collisions. If a consumer gets a message from a user that's already being worked on, requeue it and move on (see the sketch after this list).
Set up your queue routing so that messages from the same user will always go to the same consumer. The downside is that if you don't spread the traffic out evenly, you could have some consumers backed up while others sit idle.
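Here is a rough sketch of that first option, assuming a Redis key as the per-user lock and the pika client; the queue name, lock TTL and process_file helper are made up for illustration.

    # Sketch: single shared queue, one lock per user so no two consumers
    # work on the same user at once. Assumes redis-py and pika.
    import json
    import pika
    import redis

    r = redis.Redis()

    def process_file(task):
        ...  # placeholder for the actual import logic

    def on_message(ch, method, properties, body):
        task = json.loads(body)
        lock_key = f"user-lock:{task['user_id']}"
        # SET NX acts as the lock; EX guards against a crashed consumer holding it forever.
        if r.set(lock_key, "locked", nx=True, ex=300):
            try:
                process_file(task)
                ch.basic_ack(delivery_tag=method.delivery_tag)
            finally:
                r.delete(lock_key)
        else:
            # Someone else is working on this user: put it back and move on.
            # (In practice you may want a delay or dead-letter to avoid a tight redelivery loop.)
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="file_queue_main", durable=True)
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(queue="file_queue_main", on_message_callback=on_message)
    channel.start_consuming()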
Also, if you're getting a lot of messages in from the same user at once that must be processed sequentially, I would question if they should be separate messages at all. Why not send a single message with a list of things to be processed? Much of the benefit of event queues comes from being able to treat each event as a discrete item that can be processed individually.
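As an illustration of that batching idea, a single message per user could carry the whole list of files; this is a hypothetical message shape, assuming the pika client.

    # Sketch: one message per user carrying all of that user's files,
    # instead of one message per file (names are made up).
    import json
    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue="file_queue_main", durable=True)

    batch = {"user_id": 42, "files": ["a.csv", "b.csv", "c.csv"]}
    channel.basic_publish(
        exchange="",
        routing_key="file_queue_main",
        body=json.dumps(batch),
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
    )
    conn.close()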
If the user has a unique ID, or the file being worked on has a unique ID, then hash the ID to get the processing queue to enter. That way you will always have the same user/file task queued on the same processing queue.
I am not sure how this will affect queue length for the processing queues.
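A minimal sketch of that hashing idea, assuming the pika client and the queue names from the question; the hash function and queue count are arbitrary choices.

    # Sketch: derive the processing queue from a hash of the user ID, so the
    # same user always lands on the same file_processing_n queue.
    import hashlib
    import json
    import pika

    NUM_QUEUES = 10

    def queue_for(user_id):
        digest = hashlib.sha1(str(user_id).encode()).hexdigest()
        return f"file_processing_{int(digest, 16) % NUM_QUEUES}"

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    for n in range(NUM_QUEUES):
        channel.queue_declare(queue=f"file_processing_{n}", durable=True)

    task = {"user_id": 42, "file": "import_2020_01.csv"}
    channel.basic_publish(exchange="", routing_key=queue_for(task["user_id"]),
                          body=json.dumps(task))
    conn.close()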

Question about moving events from redis to kafka

I have a question about a tricky situation in an event-driven system that I want to ask for advice on. Here is the situation:
In our system, I use Redis as an in-memory cache database and Kafka as the message queue. To increase the performance of Redis, I use Lua scripting to process data and, at the same time, push events into a blocking list in Redis. Then there is a process that picks up the events from that blocking list and moves them to Kafka. So this process has 3 steps:
1) Read events from the Redis list
2) Produce them in a batch into Kafka
3) Delete the corresponding events from Redis
Unfortunately, if the process dies between steps 2 and 3, meaning that after producing all events into Kafka it doesn't delete the corresponding events from Redis, then after the process is restarted it will produce duplicated events into Kafka, which is unacceptable. Does anyone have a solution for this problem? Thanks in advance, I really appreciate it.
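To make the failure window explicit, here is a minimal sketch of those three steps, assuming redis-py and kafka-python; the list name, topic name and batch size are placeholders.

    # Sketch of the three steps, with the crash window marked.
    import redis
    from kafka import KafkaProducer

    r = redis.Redis()
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    BATCH = 100

    while True:
        # 1) Read a batch of events from the Redis list.
        events = r.lrange("events", 0, BATCH - 1)
        if not events:
            break
        # 2) Produce the batch into Kafka.
        for event in events:
            producer.send("events-topic", value=event)
        producer.flush()
        # <-- if the process dies here, the events are already in Kafka but
        #     still in Redis, so the next run will produce them again.
        # 3) Delete the corresponding events from Redis.
        r.ltrim("events", len(events), -1)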
Kafka is prone to reprocess events, even if written exactly once. Reprocessing will almost certainly be caused by rebalancing clients. Rebalancing might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption and subsequent recreation of client by the broker.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that ensures related messages are consumed in a sequential fashion by the same client. This client could then maintain a database record of what it has already processed.
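A sketch of the consumer side of that idea, assuming kafka-python and a small SQLite table as the record of processed event IDs; the topic, group and the event's "id" field are placeholders, and the producer would set a key so related events land on the same partition.

    # Sketch: the consumer keeps its own database record of processed event IDs,
    # so a reprocessed (duplicate) Kafka message is detected and skipped.
    import json
    import sqlite3
    from kafka import KafkaConsumer

    db = sqlite3.connect("processed.db")
    db.execute("CREATE TABLE IF NOT EXISTS processed (event_id TEXT PRIMARY KEY)")

    consumer = KafkaConsumer("events-topic",
                             bootstrap_servers="localhost:9092",
                             group_id="event-workers",
                             enable_auto_commit=False)

    for msg in consumer:
        event = json.loads(msg.value)
        seen = db.execute("SELECT 1 FROM processed WHERE event_id = ?",
                          (event["id"],)).fetchone()
        if not seen:
            # ... handle the event here ...
            db.execute("INSERT INTO processed (event_id) VALUES (?)", (event["id"],))
            db.commit()
        consumer.commit()   # commit the Kafka offset only after recording the event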

Bigquery streaming inserts taking time

During load testing of our module we found that BigQuery insert calls are taking time (3-4 s). I am not sure if this is OK. We are using the Java BigQuery client library and on average we push 500 records per API call. We are expecting traffic of a million records per second to our module, so BigQuery inserts are the bottleneck to handling this traffic. Currently it is taking hours to push the data.
Let me know if we need more info regarding code or scenario or anything.
Thanks
Pankaj
Since streaming has a limited payload size (see the Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention other side effects too.
We measure between 1200-2500 ms for each streaming request, and this was consistent over the last month, as you can see in the chart (chart not reproduced here).
We have seen several side effects, though:
the request randomly fails with type 'Backend error'
the request randomly fails with type 'Connection error'
the request randomly fails with type 'timeout' (watch out here, as only some rows are failing and not the whole payload)
some other error messages are non-descriptive, and they are so vague that they don't help you; just retry.
we see hundreds of such failures each day, so they are pretty much constant, and not related to Cloud health.
For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended option for these is exponential backoff with retry; even support told us to do so, which personally doesn't make me happy.
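For what it's worth, a minimal sketch of that retry-with-backoff approach, assuming the google-cloud-bigquery Python client; the table ID, attempt count and delays are placeholders, and only the rows reported as failed are retried.

    # Sketch: streaming insert with exponential backoff, retrying only failed rows.
    import time
    from google.cloud import bigquery

    client = bigquery.Client()
    TABLE = "my-project.my_dataset.my_table"   # placeholder

    def insert_with_backoff(rows, max_attempts=5):
        delay = 1.0
        for attempt in range(max_attempts):
            errors = client.insert_rows_json(TABLE, rows)
            if not errors:
                return
            # Only some rows may fail in a request; retry just those.
            rows = [rows[e["index"]] for e in errors]
            print(f"attempt {attempt + 1}: {len(rows)} rows failed, backing off {delay}s")
            time.sleep(delay)
            delay *= 2
        raise RuntimeError(f"{len(rows)} rows still failing after {max_attempts} attempts")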
If the approach you've chosen takes hours, that means it does not scale and won't scale. You need to rethink the approach with async processes. In order to finish sooner, you need to run multiple workers in parallel; the streaming performance per worker will be the same, so just having 10 workers in parallel means the time will be roughly 10 times less.
Processing IO-bound or CPU-bound tasks in the background is now a common practice in most web applications. There's plenty of software to help build background jobs, some based on a messaging system like Beanstalkd.
Basically, you need to distribute the insert jobs across a closed network, prioritize them, and consume (run) them. Well, that's exactly what Beanstalkd provides.
Beanstalkd gives the possibility to organize jobs in tubes, each tube corresponding to a job type.
You need an API/producer which can put jobs on a tube, let's say a JSON representation of the row. This was a killer feature for our use case. So we have an API which gets the rows and places them on a tube; this takes just a few milliseconds, so you can achieve fast response times.
On the other side, you now have a bunch of jobs on some tubes. You need an agent. An agent/consumer can reserve a job.
It helps you also with job management and retries: When a job is successfully processed, a consumer can delete the job from the tube. In the case of failure, the consumer can bury the job. This job will not be pushed back to the tube, but will be available for further inspection.
A consumer can release a job, Beanstalkd will push this job back in the tube, and make it available for another client.
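A minimal sketch of that put/reserve/delete/bury flow, assuming the greenstalk Python client; the tube name, row payload and the BigQuery step are placeholders.

    # Sketch of the Beanstalkd producer/consumer flow described above.
    import json
    import greenstalk

    # Producer / API side: put one job per row on a tube (a few milliseconds per put).
    producer = greenstalk.Client(("127.0.0.1", 11300))
    producer.use("bq-inserts")
    producer.put(json.dumps({"user_id": 42, "value": 3.14}))
    producer.close()

    # Consumer / agent side: reserve a job, process it, then delete or bury it.
    worker = greenstalk.Client(("127.0.0.1", 11300))
    worker.watch("bq-inserts")
    while True:
        job = worker.reserve()
        try:
            row = json.loads(job.body)
            # ... stream `row` into BigQuery here ...
            worker.delete(job)    # success: remove the job from the tube
        except Exception:
            worker.bury(job)      # failure: keep the job around for inspection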
Beanstalkd clients can be found for most common languages, and a web interface can be useful for debugging.

Read Tibco messages from a VB.Net application

I am new to the world of Tibco... I have been asked to create a VB.Net application to do a couple of things:
Update the value of a column in a database (which then generates a message in TIBCO EMS).
My application then needs to read this message from TIBCO and determine if the message has a particular word in it, and display the result as Pass or Fail
I have already written the first piece of the task; however, I have no clue how to proceed with the second one. I am hoping to get some kind of help/guidance on how to proceed! Any suggestions?
Thanks,
NewTibcoUser
This can be done easily depending on which Tibco Tools you own. If you have BW and ADB (Active Database Adapter) then you can use that.
Option 1:
If you don't have ADB you can mimic it by doing something like the following (ADB isn't magical, it's pretty straightforward):
1) Create a mirror of the table that is being monitored for changes (you could just put in the column you want to monitor plus the key):
Key
ColumnYouWantToMonitor
DeliveryStatus (ADB_L_DeliveryStatus)
Transaction type (ADB_opCode)
Time it happened (ADB_timestamp)
2) Create a trigger on the monitored table that inserts a record into the mirror table.
3) Write a .Net process that monitors the mirror table every 5 or 10 seconds, or whatever (make it configurable): select * from tableX where DeliveryStatus = 'N' order by transactionTime (a rough sketch of this polling loop follows these steps).
4) Place the message on the EMS queue or make a service call to your .Net app.
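A rough sketch of steps 3 and 4 in Python (pyodbc), just to illustrate the polling loop; the connection string, table and column names, and the publish step are placeholders for whatever your .Net process would actually do.

    # Sketch: poll the mirror table for undelivered rows, hand them off, mark them delivered.
    import time
    import pyodbc

    CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
                "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes")  # placeholder
    POLL_SECONDS = 5   # make this configurable

    def publish_to_ems(row):
        # Placeholder for step 4: put the message on the EMS queue
        # or call your .Net app's service endpoint.
        print("would publish:", list(row))

    conn = pyodbc.connect(CONN_STR)
    while True:
        cursor = conn.cursor()
        cursor.execute("select * from tableX where DeliveryStatus = 'N' order by transactionTime")
        for row in cursor.fetchall():
            publish_to_ems(row)
            cursor.execute("update tableX set DeliveryStatus = 'Y' where [Key] = ?", row.Key)
        conn.commit()
        time.sleep(POLL_SECONDS)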
Option 2
1) Create a trigger on the table and write the event to a SQL Server Service Broker (SSBS) queue.
2) Write a .Net app that reads from that SSBS queue and converts it into an EMS message.
Some design considerations:
Try not to continually query (a.k.a. poll) for changes on your main table (to prevent blocking).
If your app is not running and DB changes are happening, ensure that you have a message expiry time, so that when your app starts it doesn't have to process thousands of messages off the queue (depending on whether you need the messages or not).
If you do need the messages, you may want to set the queue to be persistent to disk so you don't lose messages. Also, client acknowledgement in your .Net app would be a good idea, not just auto-ack.
As you mention, the first point is already done (Perhaps with ADB or a custom program reacting to the DB insert).
So, your problem is strictly the "React to content of an EMS message from VB.Net" part.
I see two possibilities:
1- If you have EMS, ADB and BW, make a custom Adapter subscriber (a BW config) to change the DB in some way in reaction to messages on the bus. Your VB application can then simply query the DB to get the response status.
2- If you don't have so many products from the TIBCO stack, then you should make a simple C# EMS client program (see the examples provided within the EMS docs). This client can then signal your VB application (some kind of .Net internal signalling maybe, I am not an expert myself) or write the response status to the DB.

Worker process behind web-service... what ingredients to use

I have the following recipe: a web-service (SOAP) needs to be able to receive a lot of requests in a short time. They should be queued for asynchronous processing. So in the background, there should be a worker that takes the requests from the queue and does a job on them. Some of the jobs may even encounter unavailable (third party) resources, in which case the job should be retried later.
The question I have is: what are my ingredients? WCF, MSMQ, WAS? What is the basic structure of setting this up?
I don't think it's important whether you store them in MSMQ, in SQL, or somewhere else - any backing store you choose will require an additional service to dequeue and process the data. A SQL database could have some advantages over pure MSMQ; for example, you could store some additional information with your data and then retrieve statistics over time, such as how many requests were processed and what their internal structure was. This could help you in the future to further tune the processing pipeline.
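To illustrate the dequeue-and-process service with retries for unavailable third-party resources, here is a minimal Python sketch backed by a SQLite table; the schema, backoff policy and ThirdPartyUnavailable exception are illustrative assumptions, not a prescribed design (in a real WCF/MSMQ or SQL Server setup this would typically be a Windows service).

    # Sketch: a worker that dequeues requests from a table and retries later
    # when a third-party resource is unavailable.
    import sqlite3
    import time

    db = sqlite3.connect("requests.db")
    db.execute("""CREATE TABLE IF NOT EXISTS requests (
                      id INTEGER PRIMARY KEY,
                      payload TEXT,
                      status TEXT DEFAULT 'queued',      -- queued / done
                      attempts INTEGER DEFAULT 0,
                      next_attempt_at REAL DEFAULT 0)""")

    class ThirdPartyUnavailable(Exception):
        pass

    def handle(payload):
        ...  # call the third-party resource; raise ThirdPartyUnavailable if it is down

    def worker_loop():
        while True:
            now = time.time()
            row = db.execute("""SELECT id, payload, attempts FROM requests
                                WHERE status = 'queued' AND next_attempt_at <= ?
                                ORDER BY id LIMIT 1""", (now,)).fetchone()
            if row is None:
                time.sleep(1)
                continue
            req_id, payload, attempts = row
            try:
                handle(payload)
                db.execute("UPDATE requests SET status = 'done' WHERE id = ?", (req_id,))
            except ThirdPartyUnavailable:
                # Retry later with a growing delay instead of failing the request.
                delay = min(60 * 2 ** attempts, 3600)
                db.execute("""UPDATE requests SET attempts = attempts + 1,
                              next_attempt_at = ? WHERE id = ?""", (now + delay, req_id))
            db.commit()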