Why use RavenDB with NServiceBus

I watched the Introduction to NServiceBus course on Pluralsight, and the presenter uses both MSMQ and RavenDB to store messages. Why is this? What are the benefits of storing messages in a database as well, instead of just in MSMQ? Searching, I have found many results on how to use RavenDB with NServiceBus, but not on why you should do it.

NServiceBus itself uses RavenDb for persistence, not as a transport, so I cannot see why you say that RavenDb is used to "save messages". It might be that you have seen ServicePulse using RavenDb to store messages; more on this below.
There are three basic types of persistence needed for different parts of NServiceBus:
Saga data
Timeouts
Subscriptions
There are also two additional kinds of persistence that are not used as frequently:
Outbox
Gateway deduplication
All of them are described in the docs.
You can use any supported persistence engine, including RavenDb, for these purposes. Since RavenDb is a document database, it is much easier to use here than, say, NHibernate persistence on top of a relational database.
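For illustration, here is a minimal sketch of wiring that up, assuming the NServiceBus.RavenDB package and a v6+ endpoint API (the endpoint name is a placeholder):

```csharp
using NServiceBus;

// Configure an endpoint to persist sagas, timeouts and subscriptions
// as RavenDB documents instead of relational tables.
var endpointConfiguration = new EndpointConfiguration("Sales"); // placeholder name
endpointConfiguration.UsePersistence<RavenDBPersistence>();
var endpointInstance = await Endpoint.Start(endpointConfiguration);
```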
I am not sure what exactly the video shows, but it might also be ServicePulse, which is a part of the Particular Platform but not the NServiceBus component itself. ServicePulse uses RavenDb for its own data storage (including messages).
RavenDb is not used as NServiceBus transport, at least not by Particular itself.

Related

Using ORM in dapr

Currently I am learning Dapr, and I find it powerful for microservices development, but after reading more I found some limitations, the most important of which is "state management".
How can we use tools like EF Core or Dapper for interaction with databases in Dapr? Is DaprClient class the only way of interaction with databases?
You should continue to use those ORMs alongside Dapr, as it's not a replacement for them. ORMs already act as a translation layer (a repository layer) for relationally stored data. State management is simple KVP storage with some minimal query ability. Think of your state as a Dictionary<string, object>, where that object can be anything simple or complex. This is well suited to technologies like Redis, DynamoDB, MongoDB, Cassandra etc. Again, what you would put in a cache is what you would put in state storage. So you can still (and should) have an ORM in your service for relational data, passing in runtime configuration for the provider, connection string etc., while being able to utilize all the other features of Dapr via DaprClient, via HttpClient or via gRPC clients.
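As a rough sketch of that split, assuming the Dapr .NET SDK and a state component named "statestore" (the Cart type is purely illustrative, and relational data would still go through your ORM):

```csharp
using System.Threading.Tasks;
using Dapr.Client;

// Illustrative type; anything serializable works as a state value.
public record Cart(string UserId, string[] Items);

public class CartStateStore
{
    private readonly DaprClient dapr = new DaprClientBuilder().Build();

    // Stored as a key/value pair, i.e. the Dictionary<string, object> model.
    public Task SaveAsync(Cart cart) =>
        dapr.SaveStateAsync("statestore", cart.UserId, cart);

    public Task<Cart> LoadAsync(string userId) =>
        dapr.GetStateAsync<Cart>("statestore", userId);
}
```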
Bonus: with many providers in EF Core/EF you can add tracing headers (https://github.com/opentracing-contrib/csharp-netcore) that will seamlessly plug into the tracing provided for free with Dapr. That way you will see a full end-to-end trace all the way through your state and relational data.

BigQuery distributed transactions

I'm trying to architect a microservice-based system utilizing BigQuery as one of the services. We need to preserve eventual consistency between BigQuery and the other microservices, so that changes to BigQuery (data uploads, table creates, etc.) are eventually propagated to the other services.
I'm wondering if BigQuery has mechanisms supporting this kind of consistency. As far as I can tell, BigQuery does not support publishing its events to Pub/Sub, which would definitely solve the problem.
I'm thinking of utilizing labels for this. I hope updates of data and labels are atomic within a single API call.
Something like keeping two labels holding the current version and the committed version, and maybe the uncommitted operation type. A mutation operation increases the current version and queues a task that publishes the update to Pub/Sub and, on success, updates the committed version to match the current one. I do, though, see a number of problems with this solution.
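To make the idea concrete, here is a rough sketch of that protocol against a hypothetical ILabelStore abstraction; it deliberately does not use the real BigQuery client API, and the publish delegate stands in for the Pub/Sub call:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical abstraction over the label updates described above.
public interface ILabelStore
{
    Task<long> IncrementCurrentVersionAsync(string table);    // assumed atomic with the mutation
    Task SetCommittedVersionAsync(string table, long version);
}

public class EventuallyConsistentMutator
{
    private readonly ILabelStore labels;
    private readonly Func<string, long, Task> publishAsync;   // stand-in for Pub/Sub

    public EventuallyConsistentMutator(ILabelStore labels, Func<string, long, Task> publishAsync)
    {
        this.labels = labels;
        this.publishAsync = publishAsync;
    }

    public async Task MutateAsync(string table)
    {
        // 1. Bump the "current version" label together with the mutation.
        long version = await labels.IncrementCurrentVersionAsync(table);

        // 2. Publish the update; only on success mark it committed.
        await publishAsync(table, version);
        await labels.SetCommittedVersionAsync(table, version);

        // If we crash between the two steps, current != committed marks
        // the uncommitted operation for a later retry sweep.
    }
}
```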
Basically, there is a broader question here of how APIs need to be designed to support eventual consistency with other systems, and whether it is possible to use an API not specifically designed for this in an eventually consistent distributed system.

Gathering distributed data into central database

I was assigned to update an existing system that gathers data coming from points of sale and inserts it into a central database. The current one is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low quality 2G/3G modems), some of the files arrive broken. With just a few shops connected that way everything worked smoothly, but as the number of shops grew, errors became more frequent. What is worse, inserting the data into the central database takes up to 12-14 hours (including waiting for the data to be downloaded from all of the shops), and that cannot happen during the working day as it would block the creation of sales reports and other activities on the database - so we are really tight on processing time here.
The idea my manager suggested is to send the data continuously, during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, central server would contain actual (almost real time) data and night could be used for long running database activities like creating backups, rebuilding indexes etc.
After going through many websites, I found that:
using ASMX web services is now obsolete and WCF should be used instead
WCF with MSMQ or System Messaging could be used to safely transmit data, where I don't have to care that much about acknowledging delivery of data, consistency, nodes going offline etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is better
there are also other technologies for implementing message queue, like RabbitMQ, ZeroMQ etc.
And that is where I became confused. With so many options, do you have any pros and cons for these technologies?
We are using .NET with Windows Forms and SQL Server, but if necessary we could change to something more suitable. I am also a bit afraid of server efficiency. After some calculations, the server would be receiving about 15 packages of data per second (peak). Is that a lot? I know there are many websites without serious server infrastructure that handle hundreds of visitors online and still run smoothly, but a website mainly uploads data to the client, whereas here we would download it from the client.
I also found a somewhat similar SO question: Middleware to build data-gathering and monitoring for a distributed system, where DDS was mentioned. What do you think about introducing some middleware servers that would cope with the low quality links to the points of sale, so the main server would not be clogged with 1 KB/s transmissions?
I'd be grateful for all your help. Thank you in advance!
RabbitMQ can easily cope with thousands of 1 KB messages per second.
As your use case is not about processing real-time data, I'd say you should combine a few messages and send them as a batch. That would be good enough to spread the load over the day.
Since the motivation here is not to process the data in real time, any transport layer would do the job, even FTP/SFTP. RabbitMQ will work fine here, but this is not the typical use case for it.
As you mentioned that one of your concerns is the slow/unreliable network, I'd suggest compressing the files before sending them and, on the receiving end, immediately verifying their integrity. Rsync or something similar will probably do a great job of that.
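For illustration, a minimal sketch of batching and compressing records before publishing them to a durable RabbitMQ queue, assuming the RabbitMQ.Client package (host and queue names are placeholders):

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using RabbitMQ.Client;

class BatchSender
{
    public static void Send(string[] records)
    {
        var factory = new ConnectionFactory { HostName = "central-server" }; // placeholder
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();
        channel.QueueDeclare("pos-batches", durable: true, exclusive: false, autoDelete: false);

        // Combine several records into one batch and gzip it to cope
        // with the slow 2G/3G links.
        using var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
        {
            var bytes = Encoding.UTF8.GetBytes(string.Join("\n", records));
            gzip.Write(bytes, 0, bytes.Length);
        }

        var props = channel.CreateBasicProperties();
        props.Persistent = true; // survive a broker restart
        channel.BasicPublish(exchange: "", routingKey: "pos-batches",
                             basicProperties: props, body: buffer.ToArray());
    }
}
```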
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
And it's not clear what is causing the database contention/performance issues, beyond a vague reference to high volumes, so this answer will be geared more towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via a store and forward messaging semantic which comes out of the box and requires very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on 2 things:
A reliable network protocol, and
A client service running on both sending and receiving machine.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you might well struggle with 2 - MSMQ only ships with Windows Server or the business/enterprise editions of desktop Windows. (*see below...)
As a possible solution to the network reliability problem, you could use WCF or a RESTful endpoint (using Nancy or WebApi) to expose a service operation (or operations) over HTTP, which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you make the correct choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over HTTP; however, it's very config-heavy and not generally a nice framework to work with.
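For example, a minimal sketch of enabling a reliable session on a WCF client binding; ISalesService is a hypothetical contract, while the binding properties are standard WCF:

```csharp
using System;
using System.ServiceModel;

// Hypothetical service contract for uploading a batch of sale data.
[ServiceContract]
public interface ISalesService
{
    [OperationContract]
    void UploadBatch(string payload);
}

public static class ReliableClient
{
    public static ISalesService Create()
    {
        var binding = new WSHttpBinding(SecurityMode.None);
        binding.ReliableSession.Enabled = true;  // turn on WS-ReliableMessaging
        binding.ReliableSession.Ordered = true;  // deliver in order
        binding.ReliableSession.InactivityTimeout = TimeSpan.FromMinutes(10);

        var factory = new ChannelFactory<ISalesService>(
            binding, new EndpointAddress("http://central.example.com/sales")); // placeholder
        return factory.CreateChannel();
    }
}
```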
REST is much simpler than WCF in .NET; it's very lightweight and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to the POST that allows the client to send data) to be called (within a reasonable time frame) to verify the data was committed. The client would have to implement some kind of retry semantic if the result of the GET "acknowledgement" was negative.
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
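To sketch the POST-then-GET acknowledgement idea (the URLs and five-attempt policy are illustrative only, and the POST is assumed to be idempotent so retries are safe):

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class SalesUploader
{
    static readonly HttpClient http =
        new HttpClient { BaseAddress = new Uri("https://central.example.com/") }; // placeholder

    public static async Task SendBatchAsync(string batchId, string json)
    {
        for (var attempt = 0; attempt < 5; attempt++)
        {
            try
            {
                // Send the data; this may fail or time out on a flaky link.
                await http.PostAsync($"sales/{batchId}",
                    new StringContent(json, Encoding.UTF8, "application/json"));

                // Ask the server whether the batch was durably committed.
                var ack = await http.GetAsync($"sales/{batchId}/status");
                if (ack.IsSuccessStatusCode)
                    return; // positive acknowledgement: done
            }
            catch (HttpRequestException)
            {
                // network hiccup: fall through and retry
            }
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt))); // back off
        }
        throw new InvalidOperationException($"Batch {batchId} was never acknowledged.");
    }
}
```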
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it would not be used to address the transmission reliability issue. However it could still be used to address another of your problems, that of database write contention. If you were to queue incoming requests once they came into the server, then these could be processed by an "offline" process, which could then perform the required database operations in a reliable manner. This could be done by using MSMQ transactional queues.
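A minimal sketch of that offline drain, assuming System.Messaging and a transactional queue (the queue path and the database write are placeholders):

```csharp
using System.Messaging;

class QueueDrainer
{
    public static void DrainOne()
    {
        var queue = new MessageQueue(@".\private$\incoming-sales") // placeholder path
        {
            Formatter = new XmlMessageFormatter(new[] { typeof(string) })
        };

        using (var tx = new MessageQueueTransaction())
        {
            tx.Begin();
            Message message = queue.Receive(tx);   // removed only if we commit
            WriteToDatabase((string)message.Body); // placeholder DB write
            tx.Commit();
        }
        // If WriteToDatabase throws, the transaction aborts on Dispose and
        // the message goes back on the queue to be retried later.
    }

    static void WriteToDatabase(string payload) { /* INSERT into the central DB */ }
}
```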
In response to comments:
99% of messages are passed from shop to main server, but if some change is needed (price correction, discounts etc.), that data has to be sent to the shop.
This kind of changes things. Had I understood from the beginning that you had a bidirectional requirement, and seeing as you have managed to establish MSMQ communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason is that you appear to have both a one-way and a publish-subscribe requirement, which is supported really nicely by NServiceBus.
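For a flavour of the publish-subscribe side, a minimal sketch using a modern NServiceBus API (PriceCorrection is an illustrative event, and the transport is assumed to be configured elsewhere):

```csharp
using System.Threading.Tasks;
using NServiceBus;

// Event published by the main server when prices change.
public class PriceCorrection : IEvent
{
    public string ProductId { get; set; }
    public decimal NewPrice { get; set; }
}

// On the main server (given a started endpointInstance):
//   await endpointInstance.Publish(new PriceCorrection { ProductId = "X1", NewPrice = 9.99m });

// In each shop endpoint, a handler receives the event automatically.
public class PriceCorrectionHandler : IHandleMessages<PriceCorrection>
{
    public Task Handle(PriceCorrection message, IMessageHandlerContext context)
    {
        // apply the new price to the local point-of-sale data
        return Task.CompletedTask;
    }
}
```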

How can I use NServiceBus with a database instead of MSMQ

Is it possible to use NServiceBus with a database as the queue storage instead of MSMQ? If so, how can I get started and what are the pros and cons of using a database instead of MSMQ?
If you want to use something other than MSMQ you'll have to plug in your own ITransport. I would take a look at the NSB Contrib project on GitHub; there is an implementation of ITransport for SQL Server Service Broker (messaging).
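For what it's worth, newer NServiceBus versions also ship an official SQL Server transport; a minimal sketch assuming the NServiceBus.SqlServer package (endpoint name and connection string are placeholders):

```csharp
using NServiceBus;

// Queue messages in SQL Server tables instead of MSMQ.
var endpointConfiguration = new EndpointConfiguration("Sales"); // placeholder
var transport = endpointConfiguration.UseTransport<SqlServerTransport>();
transport.ConnectionString(
    @"Data Source=.\SQLEXPRESS;Initial Catalog=nservicebus;Integrated Security=True");
var endpointInstance = await Endpoint.Start(endpointConfiguration);
```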
The cons I see for using a database include cost and maintenance overhead. MSMQ comes with the OS for free, and most admins have the skills to maintain it. Once you bring in a DB, you have to pay for it and find someone to maintain it. This starts out OK, but once you get into multiple environments and things like clustering, licensing gets out of control.

Write-through caching of large data sets in WCF?

We've got a smart client that talks to a SQL Server database via WCF, displaying the entities in the database, and allowing the user to edit those entities.
Some of the WCF calls return a large data set. Since this data set doesn't change very often, I'm considering some sort of write-through cache on the client, and only getting the deltas from the WCF service.
That is: the client both reads from the service and writes to the service.
I'm not looking for disconnected/offline operation, but since the majority of the data doesn't change very often, I'd probably implement this with a local data store.
I don't want the local store to get too stale, and I don't think I'm too concerned about conflict resolution, because updates will always go straight to the WCF service -- think of it as a write-through cache.
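Something like this rough sketch, where IEntityService and ILocalStore are hypothetical stand-ins for the WCF proxy and the local data store:

```csharp
using System.Threading.Tasks;

// Hypothetical abstractions over the WCF proxy and the local store (e.g. SQL CE).
public interface IEntityService<T> { Task<T> GetAsync(string id); Task SaveAsync(string id, T e); }
public interface ILocalStore<T>   { Task<T> FindAsync(string id); Task PutAsync(string id, T e); }

public class WriteThroughCache<T> where T : class
{
    private readonly IEntityService<T> service;
    private readonly ILocalStore<T> local;

    public WriteThroughCache(IEntityService<T> service, ILocalStore<T> local)
    {
        this.service = service;
        this.local = local;
    }

    // Reads are served locally, falling back to the service on a miss.
    public async Task<T> GetAsync(string id) =>
        await local.FindAsync(id) ?? await RefreshAsync(id);

    // Writes always go straight to the service first (write-through),
    // so there is no conflict resolution to worry about.
    public async Task SaveAsync(string id, T entity)
    {
        await service.SaveAsync(id, entity);
        await local.PutAsync(id, entity);
    }

    private async Task<T> RefreshAsync(string id)
    {
        var entity = await service.GetAsync(id);
        await local.PutAsync(id, entity);
        return entity;
    }
}
```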
Would Microsoft's Sync Framework be good for this? Could I use a local SQL-CE cache and perform the updates over WCF? The service end has a SQL Server 2005/2008 backend, but I don't want to talk to it directly. Does Sync Framework integrate well with WCF?
Are there other solutions out there? Should I roll something myself?
I don't think you have to couple it to WCF at all. FeedSync allows you to publish directly to an RSS feed.
The only thing I'm not too sure about is whether it would be suitable for a "large dataset". Since you don't need two-way replication, if your dataset is extremely large you might want to write your own WCF implementation to optimize it, especially for the initial population.