Write-though caching of large data sets in WCF? - wcf

We've got a smart client that talks to a SQL Server database via WCF, displaying the entities in the database, and allowing the user to edit those entities.
Some of the WCF calls return a large data set. Since this data set doesn't change very often, I'm considering some sort of write-through cache on the client, and only getting the deltas from the WCF service.
That is: the client both reads from the service and writes to the service.
I'm not looking for disconnected/offline operation, but since the majority of the data doesn't change very often, I'd probably implement this with a local data store.
I don't want the local store to get too stale, and I don't think I'm too concerned about conflict resolution, because updates will always go straight to the WCF service -- think of it as a write-through cache.
Would Microsoft's Sync Framework be good for this? Could I use a local SQL-CE cache and perform the updates over WCF? The service end has a SQL Server 2005/2008 backend, but I don't want to talk to it directly. Does Sync Framework integrate well with WCF?
Are there other solutions out there? Should I roll something myself?

I don't think you have to couple it to WCF at all. FeedSync allows you to publish directly to an RSS feed.
The only that I'm not too sure about is if it would be suitable for a "large dataset" though. Since you don't need two way replication, if your dataset is extremely large, you might want to write your own WCF implementation to optimize it; especially for the initial population.

Related

Microservices are compatible with existing SQL database?

I'm creating a microservice architecture with Core, rabbitMQ, strangler pattern ... but I have to use an existing SQL database (Transaction requeriment).
Doing a research I don't found a lot of information about how implement SQL database, but I think it's impossible to do a transactional operation on different services at the same time.
1- Every service must have access to entirely database?
2- Is a good idea do a service exclusive to do transactionals operations?
3- SQL with microservices it's maybe too much slow?
I don't know if exist a standard for this.
Thanks.
The whole point of microservices is about having small, independent services that are decoupled as much as possible.
Sharing a common database introduces very strong coupling, and is not recommended.
If two services need the same data, you could either (a) have a different database for each, and replicate the data, or (b) introduce a third service that is responsible for access to the database.
If you're looking for a bigger-scale distributed transaction across microservices, then you should look into things like sagas. Typically you'll have a coordinator ("process manager" in some literature) that tracks the various operations, and can compensate or cancel actions that have been performed if the transaction as a whole is bound to fail.
3- SQL with microservices it's maybe too much slow?
What makes you think so?
There is nothing about SQL that makes it inadequate for microservices. Microservices may vary wildly in terms of what they do and what they require. SQL will be perfectly suitable for some microservices, and possibly not so suitable for others. It depends on the service.
It look like you need a distributed transactions in your system
https://msdn.microsoft.com/en-us/library/windows/desktop/ms681205(v=vs.85).aspx
Also there is a nice book devoted to microservices. It includes distributed transactions and other patters used in microservice bases apps.
http://shop.oreilly.com/product/0636920033158.do
1- Every service must have access to entirely database?
No. A microservice has its own schema related to the Aggregate Root / Service that it offers. If a service needs data of another entity, it invokes the APIs provided by another micro service.
2- Is a good idea do a service exclusive to do transactionals
operations?
No. Each microservice is a transaction boundary in its own right. Distributed transactions, particularly using 2PC, do not perform particularly well.
3- SQL with microservices it's maybe too much slow?
I am not totally clear as to why you make such a statement.

Gathering distributed data into central database

I was assigned to update existing system of gathering data coming from points of sale and inserting it into central database. The one that is working now is based on FTP/SFTP transmission, where the information is sent once a day, usually at night. Unfortunately, because of unstable connection links (low quality 2G/3G modems), some of the files appear to be broken. With just a few shops connected that way everything was working smooth, but along with increasing number of shops, errors became more often. What is worse, the time needed to insert data into central database is taking up to 12 - 14h (including waiting for the data to be downloaded from all of the shops) and that cannot happen during the working day as it would block the process of creating sale reports and other activities with the database - so we are really tight with processing time here.
The idea my manager suggested is to send the data continuously, during the day. Data packages would be significantly smaller, so their transmission and insertion would be much faster, central server would contain actual (almost real time) data and night could be used for long running database activities like creating backups, rebuilding indexes etc.
After going through many websites, I found that:
using ASMX web service is now obsolete and WCF should be used instead
WCF with MSMQ or System Messaging could be used to safely transmit data, where I don't have to care that much about acknowledging delivery of data, consistency, nodes going offline etc.
according to http://blogs.msdn.com/b/motleyqueue/archive/2007/09/22/system-messaging-versus-wcf-queuing.aspx WCF queuing is better
there are also other technologies for implementing message queue, like RabbitMQ, ZeroMQ etc.
And that is where I become confused. With so many options, do you have any pros and cons of these technologies?
We were using .NET with Windows Forms and SQL Server, but if it would be necessary, we could change to something more suitable. I am also a bit afraid of server efficiency. After some calculations, server would be receiving about 15 packages of data per second (peak). Is it much? I know there are many websites without serious server infrastructure, that handle hundreds of visitors online and still run smooth, but the website mainly uploads data to the client, and here we would download it from the client.
I also found somewhat similar SO question: Middleware to build data-gathering and monitoring for a distributed system
where DDS was mentioned. What do you think about introducing some middleware servers that would cope with low quality links to points of sale, so the main server would not be clogged with 1KB/s transmission?
I'd be grateful with all your help. Thank you in advance!
Rabbitmq can easily cope with thousands of 1kb messages per second.
As your use case is not about processing real time data, I'd say you should combine few messages and send them as a batch. That would be good enough in order to spread load over the day.
As the motivation here is not to process the data in real time, then any transport layer would do the job. Even ftp/sftp. As rabbitmq will work fine here, it's not the typical use case for it.
As you mentioned that one of your concerns is slow/unreliable network, I'd suggest to compress the files before sending them, and on the receiving end, immediately verify their integrity. Rsync or similar will probably do great job in doing that.
From what I understand, you have basically two problems:
Potential for loss/corruption of call data
Database write performance
The potential for loss/corruption of call data is being caused by a lack of reliability in the transmission of data from client to service.
And it's not clear what is causing the database contention/performance issues, beyond a vague reference to high volumes, so this answer will be more geared towards solving the first problem.
You have correctly identified the need for reliable asynchronous communication transport as a way to address the reliability issues in your current setup.
Looking at MSMQ to deliver this is a valid first step. MSMQ provides reliable communication via a store and forward messaging semantic which comes out of the box and requires very little in the way of configuration.
Unfortunately, while suitable for your needs, MSMQ relies on 2 things:
A reliable network protocol, and
A client service running on both sending and receiving machine.
From your description above, I don't believe 1 exists (the internet is not a reliable network), and you might well struggle with 2 - MSMQ only ships with Windows Server or business/enterprise versions of Windows on the desktop.(*see below...)
As a possible solution to the network reliability problem, you could use a WCF or a RESTful endpoint (using Nancy or WebApi) to expose a service operation(s) exposed over HTTP, which would accept the incoming calls from the client machines. These technologies are quite different, so you'll need to make sure you're making the correct choice early on.
WCF supports WS-ReliableMessaging from the SOAP 1.2 specification out of the box, which allows for reliable web service calls over http, however it's very config-heavy and not generally a nice framework to work with.
REST much simpler than WCF in .Net, is very lightweight and easy to use. However, for reliable delivery you would have to expose some kind of GET operation (in addition to a POST to allow the client to send data) to be called (within a reasonable time-frame) to verify the data was committed. The client would have to implement some kind of retry semantic if the result of the GET "acknowledgement" was negative.
Despite requiring two operations rather than one for the WCF route, I would favour the REST approach. I've done plenty of both and find REST services way nicer to work with.
(*) That's not to say that MSMQ wouldn't work in your ultimate solution, just that it would not be used to address the transmission reliability issue. However it could still be used to address another of your problems, that of database write contention. If you were to queue incoming requests once they came into the server, then these could be processed by an "offline" process, which could then perform the required database operations in a reliable manner. This could be done by using MSMQ transactional queues.
In response to comments:
99% messages are passed from shop to main server, but if some change
is needed (price correction, discounts etc.), that data has to be sent
to shop.
This kind of changes things. Had I understood from the beginning that you had a bidirectional requirement, and seeing as how you have managed to establish msmq communication, I would have nudged you towards NServiceBus, which is a really, really cool wrapper around MSMQ. The reason I would have done this is that you appear to have both a one way, and a publish-subscribe requirement, which is supported really nicely by NServiceBus.

In 2012, is MSMQ still valid for queueing calls in WCF?

I have created a WCF web service that will upload data from SQL Server to our ISeries. When an end user is finished with their data entry (a batch), they will "send" the batch number to the web service. The web service will then upload that data to the ISeries. It cannot be assumed that this will be a quick process and there may be many end users hitting the web service at once. Likewise, because the way the database is setup on the ISeries, I can't upload this data simultaneously because we may run into locks, misfired triggers, etc. So, I want somehow to queue these calls so that they are done in order received.
I have been searching for methods to do this and there's a lot of information in 2011 and earlier discussing MSMQ. Is that the still preferred way to do this? Would Reentrant Concurrency mode be a more "modern" option?
There are a lot of alternative queuing systems. Since you have a SQL Server in place I would recommend using MSMQ. In this combination you can use TransactionScope out-of-the-box to handle transcations spanning your DB and the Queuing system.
From my own experience, MSMQ is a proven and stable technology stack.

WCF: sharing cached data across multiple services

We are developing a project that involves about 10 different WCF services with several endpoints each. One of the services keeps a few big tables of data cached in memory.
We have found we need access to that data from another service. Rather than keeping 2 copies of the cache, I'd like to be able to share those tables across all services.
I have done some research and found some articles about using an IExtension attached to the servicehosts to store the shared data.
Provided that all the services are running under the same web site, will that work? And is it the right approach? Or should I be looking elsewhere?
If the data that you're caching is required by more than one service, it sounds like - from a Service Oriented Architecture perspective, anyway - that it doesn't belong in either of services you have calling it.
If the data being cached isn't really related to either service, but is something that both services need, then perhaps it belongs in it's own seperate service. Have you considered encapsulating your cache in a third service, and performing a service-to-service call to retrieve the data you need? Benefits include...
It solves your original dilemma, avoiding the need to read the whole cache from the database several times;
It encapsulates the cache in one place for easy maintainance/change later.
It allows you to abstract the implementation of the cache away from the other services by putting another service interface in the way.
All in all, I'd suggest that's the best approach. The only downside is the extra overhead of making the service-to-service call, but that surely outperforms having to read the whole cache from the database.
Alternatively, if the data in your cache is very closely related to BOTH of the services that are calling the cache, i.e. both services add/change the data in the cache, etc. then perhaps the two existing services should be combined into a single service.
If what I'm saying is making some sense, then then principle of SOA I'm drawing on is Service Autonomy.
Provided all your services are part of the same application there doesn't seem to be any reason why you can't share the cache directly via a shared object reference. The simplest way of doing this is via a static field.
If you choose this approach, one thing to be very careful about is thread safety. If your cache is concurrently accessed via two WCF sessions, you must ensure that the two sessions are not going to interfere with each other by both changing the cache at the same time. If the cache is read-only, your need to do this is lessened, but you still might need to synchronrise initialisation of the cache.

WCF/Silverlight/SQL DB Caching Strategies

Ok, I have a pretty complex silverlight app that gets its data from a WCF service (asp.net hosted service layer) which in turn calls into a data layer that calls stored procedures in a SQL 2005 DB to extract the needed data. So the round trip goes like this:
Silverlight App --> WCF Service --> Data Layer --> DB --> Data Layer --> WCF Service transforms Data Entity into corresponding DTO (Data Transfer Object) or List<> thereof --> Silverlight App
Much of the data is highly relational (so it needs to exist in the DB), but it will change infrequently. It seems that I have several choices of locations to cache this "semi-constant" data:
I can cache it in the data layer. My data layer is already set up to use the SQLDependency class and cache the results from a stored procedure call. I think that this is or can be an application level cache.
I can cache the resulting DTO in an application level (or session level depending on the call) cache within the WCF service itself.
2(a) I could even take this a step further by serializing the XML for the resulting DTO(s) into a file on the WCF service side so that I could (a) check memory cache, then (b) check file cache and (c) hit the data layer
I could do something similar to 2(a) with isolated storage on the client side within the SL app. I could serialize the data to the local isolated storage with a hash (or a moddate or something) and then just make a call to check that.
One more thing to add: I am hosting this WCF service in IIS7 with dynamic compression turned on so that the (often very large and easily compressed) XML response gets gzip-ed. Ideally, it would seem, I would like IIS to cache this gzip-ed result to avoid all the extra processing. I think that it may do this already but I am not sure.
I am pretty sure that the final answer to this is some flavor of "it depends", but I would love to hear how others are approaching this. A good tactical recipe of Do X, Test Performance with tool Y, the do Z if needed would be great to have.
A few links (I will add to this as I research this):
WCF Caching Approach
If you have data that are user that will change quite rarely and need fast response, going for a custom mechanism bases on local storage is a great advantage quite faster than having to wait for a server roundtrip.
Dino Sposito published an interesting article about local storage and caching on MSDN Magazine there you can find as well an approach to catch assemblies (imagine just loading the minimum package required and just go loadin the rest of assemblies in background, ... performance rocket, more complexity on your code :)).
As you said is matter to go putting in a balance and decide.
HTH
Braulio
My approach would be this:
Determine if there is actually a problem with performance (isn't it alreade acceptable to my users?)
Measure the performance at each teir (how long does it take the database to come up with data? how long does it take the service to respond with data? how much time does it take from the service to the client?)
Based on the measurements I would then determine where to do my caching. Remember that, the closer to your data storage you do caching, the easier it is, but the closer to the client you do caching, the better the performance gain (usually).
Also remember that caching should not be the first thing to do to improve performance. You should also look into other performance gains as well. Are the stored procedures slow? Is there a lot of overhead in the WCF messages? Is there some inefficient processing in the service? Do I realy need all that data in one message?
HTH,
Jonathan
I think #2 is your best bet for maintainability and architecture. IIS provides caching, why not use it?
You don't want to have to reference System.Web from a data layer. Client side is not the best option either, because you'd have to write a bunch of additional code to keep the data synchronized.
Is System.Web caching even available to WCF when it's not running in ASP.NET compatible mode? Probably best not to depend on it and write your own.
On the other hand, look into Microsoft's Velocity project, which looks like it will produce a very interesting caching technology not dependant on ASP.NET.
We just recently implemented #3, the client-side caching using Isolated Storage.
In our app we have lot of drop downs and custom fields which the app used to get from the server every time it loads. Moving these data to IS really helped. The app now makes a call to check if there were any changes on the server, and if not - loads the data from the IS, otherwise ( which is pretty rare ) refreshes IS.
That eliminated a lot of WCF calls and data transfers, the SL pages' loading time is shorter, and the app in general became more scalable because of the reduced network traffic and db access.
Yes, there are some coding involved, but the benefits for the end users are essential.
Andrew
If you use RIA Services, then a simple approach is to have two separate edmx definitions. One for cached entities, one for transactional ones.
One domain context can reference the entities on another domaincontext via AddReference see.
The cached entities could be loaded immediately after user has authenticated. For simplicity, transactional data should not load until cached entities have loaded.
Depending on the size of the cache, you may also wish to consider serializing these values to local storage.