Best way to queue messages in SQL Server with several writers and one reader - sql

I want to create a queue that many computers will write to, but each computer will write only once in its entire lifetime. What do you think would be the best way to achieve that?
I have read about SQL Server queues, SQL Server tables used as queues, and the Service Broker infrastructure.
SQL Server table: pretty easy to create, but I am worried about performance.
Service Broker: a more complex infrastructure. It seems that you have to run a service on the sender and have a send queue, which is useless in my case because each computer sends only one message in its entire lifetime.
What solution would be the best in my case?

You don't have to create a service on each computer. Service Broker objects can be confined to one DB server. For example, if you have 100 computers that need to drop off a message, they will need a connection string to the database server and will execute a stored procedure that enqueues the message.
That said, a Service Broker queue seems like overkill for this. A simple table would probably suffice, or even better MSMQ (which would eliminate the need to connect to a DB).

Our production code uses tables as queues. We don't really need the robustness of Service Broker, and all our code already connects to databases for other stuff anyway.
Our code doesn't need more than a few hundred transactions per second, and I've shown that our queue can achieve over 10k transactions per second, so I'm fairly happy with the performance.
Here's a great article describing how to design tables for use as queues: http://rusanu.com/2010/03/26/using-tables-as-queues/
I would not design your table without first giving it a read.
Our company is also contemplating an alternative queue strategy involving Redis that doesn't require disk access since we are considering a design that would require tens or hundreds of thousands of inserts a second, but don't necessarily care about losing the data in the event of a failure. I would also give those methods a consideration if you need the throughput.
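As a rough illustration of the table-as-queue idea with many writers and one reader, here is a minimal sketch. It uses SQLite through Python's sqlite3 module purely as a stand-in for SQL Server, and the table name `queue` and the function names are made up for this example. On SQL Server the linked article covers the real locking hints and the OUTPUT clause, which this sketch does not attempt to reproduce.

```python
import sqlite3


def create_queue(conn):
    # The auto-incrementing id doubles as the FIFO ordering key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )


def enqueue(conn, payload):
    # Each writer only ever does a single INSERT.
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))


def dequeue(conn):
    """Remove and return the oldest payload, or None if the queue is empty.

    Assumes a single reader, as in the question.
    """
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]
```

With several concurrent readers the select-then-delete pair would need to become one atomic statement, which is what the article's approach is about.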

Maybe a better way is to transform your whole system from "several writers and one reader" to "one writer and one reader"? I mean you could create a service (web or any other) that receives the write requests and is the only writer to your database. This is an ordinary situation and has many standard solutions.

Related

Service to accept SQL queries and run in the background

Is there a service to accept large numbers of SQL queries and run them in the background with retries and logging?
I have multiple clients running large numbers of queries directly against a SQL Server database but because they’re only inserts it would be far more efficient to post the queries to some service which can run them offline in transactions freeing the clients from having to wait for the queries to finish and reducing the connections to the database.
Because the result isn’t needed by the application, I’d like to “fire and forget” the SQL statements knowing they’ll eventually complete, even if they need to retry due to timeouts or network issues.
Does such a service exist?
There is no such service out of the box. As suggested by Gordon Linhoff, you can SEND the batches into a Service Broker queue, or INSERT them into a regular table, and have a background process run them.
In the case of Service Broker, the setup, programming, and troubleshooting are a bit trickier, but you get Internal Activation to trigger a stored procedure you write when messages appear on the queue.
With a regular table you would just write a SQL Agent job (or similar) that runs in a loop and looks for new rows in the target table, runs the batches it finds, and deletes (or marks) the batches as complete. You don't get the low latency and automatic scale-out that Service Broker Activation provides, but it's much simpler to implement.

Gathering distributed data into central database

I was assigned to update an existing system that gathers data coming from points of sale and inserts it into a central database. The one that is working now is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low quality 2G/3G modems), some of the files arrive broken. With just a few shops connected that way everything worked smoothly, but as the number of shops increased, errors became more frequent. What is worse, inserting the data into the central database takes up to 12 - 14h (including waiting for the data to be downloaded from all of the shops), and that cannot happen during the working day as it would block the creation of sales reports and other activities with the database - so we are really tight on processing time here.
The idea my manager suggested is to send the data continuously during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, the central server would contain current (almost real-time) data, and the night could be used for long-running database activities like creating backups, rebuilding indexes etc.
After going through many websites, I found that:
using ASMX web service is now obsolete and WCF should be used instead
WCF with MSMQ or System Messaging could be used to safely transmit data, where I don't have to care that much about acknowledging delivery of data, consistency, nodes going offline etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is better
there are also other technologies for implementing message queue, like RabbitMQ, ZeroMQ etc.
And that is where I become confused. With so many options, do you have any pros and cons of these technologies?
We were using .NET with Windows Forms and SQL Server, but if it would be necessary, we could change to something more suitable. I am also a bit afraid of server efficiency. After some calculations, server would be receiving about 15 packages of data per second (peak). Is it much? I know there are many websites without serious server infrastructure, that handle hundreds of visitors online and still run smooth, but the website mainly uploads data to the client, and here we would download it from the client.
I also found somewhat similar SO question: Middleware to build data-gathering and monitoring for a distributed system
where DDS was mentioned. What do you think about introducing some middleware servers that would cope with low quality links to points of sale, so the main server would not be clogged with 1KB/s transmission?
I'd be grateful for all your help. Thank you in advance!
RabbitMQ can easily cope with thousands of 1 KB messages per second.
As your use case is not about processing real-time data, I'd say you should combine a few messages and send them as a batch. That would be good enough to spread the load over the day.
As the motivation here is not to process the data in real time, any transport layer would do the job, even FTP/SFTP. Although RabbitMQ would work fine here, this is not the typical use case for it.
As you mentioned that one of your concerns is the slow/unreliable network, I'd suggest compressing the files before sending them and, on the receiving end, immediately verifying their integrity. Rsync or similar will probably do a great job of that.
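The compress-then-verify step might be sketched as follows. This is only an illustration using Python's standard library; the function names `pack` and `unpack` are made up here, and how the checksum travels alongside the file (a sidecar file, a header, a message field) is left open.

```python
import gzip
import hashlib


def pack(data):
    """Sender side: compress the file and compute a checksum of the result."""
    compressed = gzip.compress(data)
    digest = hashlib.sha256(compressed).hexdigest()
    return compressed, digest


def unpack(compressed, expected_digest):
    """Receiver side: verify integrity first, then decompress."""
    actual = hashlib.sha256(compressed).hexdigest()
    if actual != expected_digest:
        # The receiver can ask the shop to resend instead of loading bad data.
        raise ValueError("checksum mismatch: file was corrupted in transit")
    return gzip.decompress(compressed)
```

A tool such as rsync performs its own checksumming, so a hand-rolled check like this mainly matters when the transport is plain FTP or a custom upload.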
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
And it's not clear what is causing the database contention/performance issues, beyond a vague reference to high volumes, so this answer will be more geared towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via a store and forward messaging semantic which comes out of the box and requires very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on 2 things:
A reliable network protocol, and
A client service running on both sending and receiving machine.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you might well struggle with 2 - MSMQ only ships with Windows Server or business/enterprise versions of Windows on the desktop.(*see below...)
As a possible solution to the network reliability problem, you could use a WCF or a RESTful endpoint (using Nancy or WebApi) to expose a service operation(s) exposed over HTTP, which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you're making the correct choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over http, however it's very config-heavy and not generally a nice framework to work with.
REST is much simpler than WCF in .NET, very lightweight, and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to a POST that allows the client to send data), to be called within a reasonable time-frame to verify that the data was committed. The client would have to implement some kind of retry semantic if the result of the GET "acknowledgement" was negative.
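The client-side retry semantic could be sketched like this. It is a hypothetical outline, not a real HTTP client: `post` and `get_ack` are placeholders for whatever functions issue the POST and the GET "acknowledgement" call, and the sketch assumes the server treats a repeated POST with the same `message_id` as a harmless duplicate.

```python
import time


def send_with_ack(post, get_ack, message_id, payload, max_attempts=5, delay=0.0):
    """POST the payload, then confirm it via GET; retry until acknowledged.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            post(message_id, payload)
            if get_ack(message_id):
                return attempt
        except ConnectionError:
            pass  # a dropped link is treated like a negative acknowledgement
        time.sleep(delay)  # back off before the next try
    raise RuntimeError(
        "message %s not acknowledged after %d attempts" % (message_id, max_attempts)
    )
```

The message id is what makes retries safe: without it the server cannot tell a resend from a new sale.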
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it would not be used to address the transmission reliability issue. However it could still be used to address another of your problems, that of database write contention. If you were to queue incoming requests once they came into the server, then these could be processed by an "offline" process, which could then perform the required database operations in a reliable manner. This could be done by using MSMQ transactional queues.
In response to comments:
99% messages are passed from shop to main server, but if some change
is needed (price correction, discounts etc.), that data has to be sent
to shop.
This kind of changes things. Had I understood from the beginning that you had a bidirectional requirement, and seeing as how you have managed to establish msmq communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason I would have done this is that you appear to have both a one way, and a publish-subscribe requirement, which is supported really nicely by NServiceBus.

SQL Server, using a table as a queue

I'm using an SQL Server 2008 R2 as a queuing mechanism. I add items to the table, and an external service reads and processes these items. This works great, but is missing one thing - I need mechanism whereby I can attempt to select a single row from the table and, if there isn't one, block until there is (preferably for a specific period of time).
Can anyone advise on how I might achieve this?
The only way to achieve a non-polling blocking dequeue is WAITFOR (RECEIVE), which implies Service Broker queues, with all the added overhead.
If you're using ordinary tables as queues you will not be able to achieve non-polling blocking. You must poll the queue by asking for a dequeue operation, and if it returns nothing, sleep and try again later.
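A polling dequeue with a time limit might look like the sketch below. It is illustrative only: SQLite via Python's sqlite3 stands in for SQL Server, the `queue` table and function names are assumptions, and a single reader is assumed. Note that it always attempts the dequeue itself and sleeps only when that returns nothing, rather than checking for rows separately.

```python
import sqlite3
import time


def create_queue(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL)"
    )
    conn.commit()


def dequeue_or_wait(conn, timeout=5.0, poll_interval=0.1):
    """Try to dequeue; if the queue is empty, sleep and retry until timeout.

    Returns the payload, or None if nothing arrived within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        row = conn.execute(
            "SELECT id, payload FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            conn.commit()
            return row[1]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

The poll interval is the trade-off knob: shorter means lower latency but more empty round trips to the database.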
I'm afraid I'm going to disagree with Andomar here: while his answer works for the generic question 'are there any rows in the table?', when it comes to queueing, due to the busy nature of overlapping enqueue/dequeue operations, checking for rows like this is an (almost) guaranteed deadlock under load. When using tables as queues, one must always stick to the basic enqueue/dequeue operations and not try fancy stuff.
"since SQL Server 2005 introduced the OUTPUT clause, using tables as queues is no longer a hard problem". A great post on how to do this.
http://rusanu.com/2010/03/26/using-tables-as-queues/
I need mechanism whereby I can attempt
to select a single row from the table
and, if there isn't one, block until
there is (preferably for a specific
period of time).
You can loop and check for new rows every second:
while not exists (select * from QueueTable)
begin
    waitfor delay '00:00:01'
end
Disclaimer: this is not code I would use for a production system, but it does what you ask.
The previous commenter that suggested using Service Broker likely had the best answer. Service Broker allows you to essentially block while waiting for more input.
If Service Broker is overkill, you should consider a different approach to your problem. Can you provide more details of what you're trying to do?
Let me share my experiences with you in this area, you may find it helpful.
My team first used MSMQ transactional queues that would feed our asynchronous services (be they IIS hosted or WAS). The biggest problem we encountered was MS DTC issues under heavy load, like 100+ messages/second load; all it took was one slow database operation somewhere to start causing timeout exceptions and MS DTC would bring the house down so to speak (transactions would actually become lost if things got bad enough), and although we're not 100% certain of the root cause to this day, we do suspect MS DTC in a clustered environment has some serious issues.
Because of this, we started looking into different solutions. Service Bus for Windows Server (the on-premise version of Azure Service Bus) looked promising, but it was non-transactional so didn't suit our requirements.
We finally decided on the roll-your-own approach, an approach suggested to us by the guys who built the Azure Service Bus, because of our transactional requirements. Essentially, we followed the Azure Worker Role model for a worker role that would be fed via some queue; a polling-blocking model.
Honestly, this has been far better for us than anything else we've used. The pseudocode for such a service is:
hasMsg = true
while(true)
    if(!hasMsg)
        sleep
    msg = GetNextMessage
    if(msg == null)
        hasMsg = false
    else
        hasMsg = true
        Process(msg);
We've found that CPU usage is significantly lower this way (lower than traditional WCF services).
The tricky part of course is handling transactions. If you'd like to have multiple instances of your service read from the queue, you'll need to employ READPAST/UPDLOCK hints in your SQL, and also have your .NET service enlist in the transactions in a way that will roll back should the service fail. In this case, you'll want to go with retry/poison queues as tables in addition to your regular queues.

Database good system decoupling point?

We have two systems where system A sends data to system B. It is a requirement that each system can run independently of the other and neither will blow up if the other is down. The question is what is the best way for system A to communicate with system B while meeting the decoupling requirement.
System B currently has a process that polls data in a db table and processes any new rows that have been inserted.
One proposed design is for system A to just insert data into system b's db table and have system B process the new rows by the existing process. Question is does this solution meet the requirement of decoupling the two systems? Is a database considered part of a system B which might become unavailable and cause system A to blow up?
Another solution is for system A to put data into an MQ queue and have a process that would read from MQ and then insert into system B's database. But is this just extra overhead? Ultimately is an MQ queue any more fault tolerant than a db table?
Generally speaking, database sharing is a close coupling and not to be preferred except possibly for speed purposes. Not only for availability purposes, but also because system A and B will be changed and upgraded at several points in their future, and should have minimal dependencies on each other - message passing is an obvious dependency, whereas shared databases tend to bite you (or your inheritors) on the posterior when least expected. If you go the database sharing route, at least make the sharing interface explicit with dedicated tables or views.
There are four common levels of integration:
Database sharing
File sharing
Remote procedure call
Message passing
which can be applied and combined in various situations, with different availability and maintainability. You have an excellent overview at the enterprise integration patterns site.
As with any central integration infrastructure, MQ should be hosted in an environment with great availability, full failover &c. There are other queue solutions which allow you to distribute the queue coordination.
Use Queues for communication. Do not "pass" data from System A to System B through the database. You're using the database as a giant, expensive, complex message queue.
Use a message queue as a message queue.
This is not "Extra" overhead. This is the best way to decouple systems. It's called Service Oriented Architecture (SOA) and using messages is absolutely central to the design.
An MQ queue is far simpler than a DB table.
Don't compare "fault tolerance" because an RDBMS uses huge (almost unimaginable) overheads to achieve a reasonable level of assurance that your transaction finished properly. Locking. Buffering. Write Queues. Storage Management. Etc. Etc.
A reliable message queue implementation uses some backing store to keep the queue's state. The overhead is much, much less than an RDBMS. The performance is much better. And it's much, much simpler to interact with.
In SQL Server I would do this through an SSIS package or a job (depending on the number of records and the complexity of what I was moving). Other databases also have ETL solutions. I like the ETL solution because I can keep logs of what was changed and what errors were processed, and I can send records which for some reason won't go to the other system (data structures are rarely the same between two databases) to a holding table without killing the rest of the process. I can also make changes to the data as it flows to adjust for database differences (things like lookup table values, say the completed status in db1 is 5 and it is 7 in db2, or say db2 has a required field that db1 does not and you have to add a default value to the field if it is null). If one or the other server is down, the job running the SSIS package will fail and neither system will be affected, so it keeps the databases decoupled in a way that triggers or replication would not.

MSMQ v Database Table

An existing process changes the status field of a booking record in a table, in response to user input.
I have another process to write, that will run asynchronously for records with a particular status. It will read the table record, perform some operations (including calls to third party web services), and update the record's status field to indicate that processing is completed (or In Error, with an error count).
This operation sounds very similar to a queue. What are the benefits and tradeoffs of using MSMQ over a SQL Table in this situation, and why should I choose one over the other?
It is our software that is adding and updating records in the table.
It is a new piece of work (a Windows Service) that will be performing the asynchronous processing. This needs to be "always up".
There are several reasons, which were discussed on the Fog Creek forum here: http://discuss.fogcreek.com/joelonsoftware5/default.asp?cmd=show&ixPost=173704&ixReplies=5
The main benefit is that MSMQ can still be used when there is intermittent connectivity between computers (using a store and forward mechanism on the local machine). As far as the application is concerned, it delivered the message to MSMQ, even though MSMQ may actually deliver the message later.
You can only insert a record to a table when you can connect to the database.
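The store and forward behaviour can be illustrated with a toy outbox. This is only a conceptual sketch, not how MSMQ is implemented: the `Outbox` class and the `send` callable are invented for the example, and the pending messages live in memory here, whereas a real store and forward mechanism persists them on the local machine.

```python
from collections import deque


class Outbox:
    """Accept messages locally and forward them whenever the link is up."""

    def __init__(self, send):
        self._send = send        # callable that raises ConnectionError when offline
        self._pending = deque()  # in-memory stand-in for a durable local store

    def submit(self, message):
        # From the application's point of view this always succeeds.
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Forward as many messages as possible; return how many remain."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return len(self._pending)  # still offline, try again later
            self._pending.popleft()  # remove only after a successful send
        return 0
```

A direct database insert has no equivalent of the pending store, which is the difference the answer is pointing at.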
A table approach is better when a workflow approach is required, and the process will move through various stages, and these stages need persisting in the DB.
If the rate at which booking records are created is low, I would have the second process periodically check the table for new bookings.
Unless you are already using MSMQ, introducing it just gives you an extra platform component to support.
If the database is heavily loaded, or you get a lot of lock contention with two processes reading and writing to the same region of the bookings table, then consider introducing MSMQ.
I also like this answer from le dorfier in the previous discussion:
I've used tables first, then refactor
to a full-fledged msg queue when (and
if) there's reason - which is trivial
if your design is reasonable.
Thanks, folks, for all the answers. Most helpful.
With MSMQ you can also offload the work to another server very easily by changing the location of the queue to another machine rather than the DB server.
By the way, as of SQL Server 2005 there is a built-in queue in the DB. It's called SQL Server Service Broker.
See : http://msdn.microsoft.com/en-us/library/ms345108.aspx
Also see previous discussion.
If you have MSMQ expertise, it's a good option. If you know databases but not MSMQ, ask yourself if you want to become expert in another technology; whether your application is a critical one; and which you'd rather debug when there's a problem.
I have recently been investigating this myself, so I wanted to mention my findings. The location of the database relative to your application is a big factor in deciding which option is faster.
I measured the time it took to insert 100 database entries versus logging the exact same data into a local MSMQ message. I then took the average of the results of performing this test several times.
What I found was that when the database is on the local network, inserting a row was up to 4 times faster than logging to MSMQ.
When the database was being accessed over a decent internet connection, inserting a row into the database was up to 6 times slower than logging to MSMQ.
So:
Local database - DB is faster, otherwise MSMQ is.
Instead of making raw MSMQ calls, it might be easier if you implement your service as a queued COM+ component and make queued function calls from your client application. In the end, the asynchronous service still uses MSMQ in the background, but your code will be much clearer and easier to use.
I would probably go with MSMQ, or ActiveMQ myself. I would suggest (presuming that you are considering MSMQ you are using windows, with MS technology) looking into WCF, or if you are using MS-SQL 2005+ having a trigger that calls into .net code to run your processing.
Service Broker was introduced in SQL 2005 and it is designed to be very quick at handling messages as the process is relatively simple (I believe its roots were in triggers). If you are concerned about scalability, in SQL 2008 they have released an independent processing executable to separate the processing from SQL Server (in standard Service Broker, everything is controlled by the SQL Server instances).
I would definitely consider using Service Broker over MSMQ, but this depends on your SQL development/DBA resources and their knowledge.
Besides Mitch's answer, some other scenarios:
1. Each of your messages has its own due date to trigger the action. This can be done through MQ as well, but in this case I prefer to store it in the DB as it is more controllable;
2. The subscriber needs to filter messages and then process a portion of them. This can be done with LINQ too; depending on how complex the filter is, the DB approach is better because I can use LINQ to EF to do complex queries easily;
3. For deployment, I want a fully automated deployment process, so the DB is a better choice for me. I am not a big fan of manual configuration.