WCF/Silverlight/SQL DB Caching Strategies - wcf

Ok, I have a pretty complex Silverlight app that gets its data from a WCF service (an ASP.NET-hosted service layer), which in turn calls into a data layer that calls stored procedures in a SQL Server 2005 database to extract the needed data. So the round trip goes like this:
Silverlight App --> WCF Service --> Data Layer --> DB --> Data Layer --> WCF Service transforms Data Entity into corresponding DTO (Data Transfer Object) or List<> thereof --> Silverlight App
Much of the data is highly relational (so it needs to exist in the DB), but it will change infrequently. It seems that I have several choices of locations to cache this "semi-constant" data:
1. I can cache it in the data layer. My data layer is already set up to use the SqlDependency class and cache the results from a stored procedure call. I think that this is, or can be, an application-level cache.
2. I can cache the resulting DTO in an application-level (or session-level, depending on the call) cache within the WCF service itself.
2(a) I could even take this a step further by serializing the resulting DTO(s) to XML in a file on the WCF service side, so that I could (a) check the memory cache, then (b) check the file cache, and (c) hit the data layer.
3. I could do something similar to 2(a) with isolated storage on the client side within the SL app. I could serialize the data to local isolated storage with a hash (or a modification date or something similar) and then just make a call to check that.
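The lookup order in option 2(a) (memory, then file, then data layer) can be sketched as below. This is a minimal illustration under stated assumptions. JSON stands in for the XML serialization. The loader function `load_from_data_layer` is a hypothetical stand-in for the real data-layer call. Keys are assumed to be safe file names.

```python
import json
import os
import tempfile


def load_from_data_layer(key):
    # Hypothetical stand-in for the data layer -> stored procedure round trip.
    return {"key": key, "rows": [1, 2, 3]}


class TieredCache:
    """Sketch of option 2(a): (a) memory cache, (b) file cache, (c) data layer."""

    def __init__(self, cache_dir, loader):
        self.memory = {}
        self.cache_dir = cache_dir
        self.loader = loader
        self.data_layer_hits = 0

    def _path(self, key):
        # Assumes the key is safe to use as a file name.
        return os.path.join(self.cache_dir, f"{key}.json")

    def get(self, key):
        # (a) In-memory cache: cheapest lookup.
        if key in self.memory:
            return self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            # (b) File cache: survives a restart of the service process.
            with open(path, "r", encoding="utf-8") as f:
                value = json.load(f)
        else:
            # (c) Data layer, then persist the serialized DTO for next time.
            value = self.loader(key)
            self.data_layer_hits += 1
            with open(path, "w", encoding="utf-8") as f:
                json.dump(value, f)
        self.memory[key] = value
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = TieredCache(d, load_from_data_layer)
        cache.get("lookups")  # goes to the data layer
        cache.get("lookups")  # served from memory
        print("data layer hits:", cache.data_layer_hits)
```

Invalidation is not handled here. With semi-constant data, both tiers would still need to be cleared when the underlying rows change, for example from the data layer's SqlDependency callback.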
One more thing to add: I am hosting this WCF service in IIS7 with dynamic compression turned on, so the (often very large and easily compressed) XML response gets gzipped. Ideally, I would like IIS to cache this gzipped result to avoid all the extra processing. I think that it may do this already, but I am not sure.
I am pretty sure that the final answer to this is some flavor of "it depends", but I would love to hear how others are approaching this. A good tactical recipe of "do X, test performance with tool Y, then do Z if needed" would be great to have.
A few links (I will add to this as I research this):
WCF Caching Approach

If you have data that is used often, changes rarely, and needs a fast response, a custom mechanism based on local storage is a great advantage: it is much faster than waiting for a server round trip.
Dino Esposito published an interesting article about local storage and caching in MSDN Magazine. There you can also find an approach to caching assemblies (imagine loading just the minimum package required and then loading the rest of the assemblies in the background... a performance rocket, but more complexity in your code :)).
As you said, it is a matter of weighing the trade-offs and deciding.
HTH
Braulio

My approach would be this:
Determine if there is actually a performance problem (isn't it already acceptable to my users?).
Measure the performance at each tier (how long does it take the database to come up with the data? how long does it take the service to respond with the data? how much time does it take to get from the service to the client?).
Based on the measurements, I would then determine where to do my caching. Remember that the closer to your data storage you cache, the easier it is, but the closer to the client you cache, the better the performance gain (usually).
Also remember that caching should not be the first thing you do to improve performance. You should also look into other performance gains. Are the stored procedures slow? Is there a lot of overhead in the WCF messages? Is there some inefficient processing in the service? Do I really need all that data in one message?
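The "measure at each tier" step can start with simple wall-clock timing around each stage. The sketch below is a minimal version of that idea. The functions `query_database` and `build_dtos` are hypothetical stand-ins, and the sleep only simulates database latency. The client-to-service leg would have to be timed the same way from the client side.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per tier.
timings = {}


@contextmanager
def timed(tier):
    """Record wall-clock time spent inside one tier of the round trip."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[tier] = timings.get(tier, 0.0) + (time.perf_counter() - start)


def query_database():
    # Hypothetical stand-in for the stored procedure call.
    time.sleep(0.02)
    return [("row", i) for i in range(100)]


def build_dtos(rows):
    # Hypothetical stand-in for the entity -> DTO transformation in the service.
    return [{"name": name, "id": i} for name, i in rows]


with timed("database"):
    rows = query_database()
with timed("service"):
    dtos = build_dtos(rows)

# Slowest tier first: that is the first candidate for optimization or caching.
for tier, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tier}: {seconds * 1000:.1f} ms")
```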
HTH,
Jonathan

I think #2 is your best bet for maintainability and architecture. IIS provides caching, so why not use it?
You don't want to have to reference System.Web from a data layer. Client side is not the best option either, because you'd have to write a bunch of additional code to keep the data synchronized.

Is System.Web caching even available to WCF when it's not running in ASP.NET compatibility mode? It is probably best not to depend on it and to write your own.
On the other hand, look into Microsoft's Velocity project, which looks like it will produce a very interesting caching technology not dependent on ASP.NET.

We just recently implemented #3, the client-side caching using Isolated Storage.
In our app we have a lot of drop-downs and custom fields, which the app used to get from the server every time it loaded. Moving this data to IS really helped. The app now makes a call to check whether there were any changes on the server. If not, it loads the data from IS; otherwise (which is pretty rare) it refreshes IS.
That eliminated a lot of WCF calls and data transfers, the SL pages' loading time is shorter, and the app in general became more scalable because of the reduced network traffic and db access.
Yes, there is some coding involved, but the benefits for the end users are significant.
Andrew
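The "check for changes, otherwise load from Isolated Storage" pattern described above can be sketched as follows. This is a minimal version with several assumptions. A local directory stands in for Isolated Storage. The callables `get_server_version` and `fetch_data` are hypothetical stand-ins for two WCF operations: a cheap version check and the full data download.

```python
import json
import os
import tempfile


class VersionedLocalCache:
    """Load lookup data locally unless the server reports a newer version.

    local_dir stands in for Isolated Storage; get_server_version and
    fetch_data are hypothetical callables standing in for WCF operations.
    """

    def __init__(self, local_dir, get_server_version, fetch_data):
        self.path = os.path.join(local_dir, "lookups.json")
        self.get_server_version = get_server_version
        self.fetch_data = fetch_data

    def load(self):
        # Small, cheap call: only a version stamp crosses the wire.
        server_version = self.get_server_version()
        if os.path.exists(self.path):
            with open(self.path, "r", encoding="utf-8") as f:
                cached = json.load(f)
            if cached["version"] == server_version:
                return cached["data"], "local"
        # Missing or stale: download once and refresh local storage.
        # The version is read before the data, so a change in between only
        # causes one extra refresh next time; it never hides newer data.
        data = self.fetch_data()
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump({"version": server_version, "data": data}, f)
        return data, "server"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = VersionedLocalCache(d, lambda: "v1", lambda: ["Red", "Green"])
        print(cache.load())  # first run: fetched from the server
        print(cache.load())  # later runs: served locally
```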

If you use RIA Services, a simple approach is to have two separate .edmx definitions: one for cached entities and one for transactional ones.
One domain context can reference the entities of another domain context via AddReference.
The cached entities could be loaded immediately after user has authenticated. For simplicity, transactional data should not load until cached entities have loaded.
Depending on the size of the cache, you may also wish to consider serializing these values to local storage.
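The load ordering suggested above (cached entities right after authentication, transactional data only after that) can be sketched with plain async functions. This sketch does not use the RIA Services API. Both loaders are hypothetical stand-ins for the two domain contexts' load operations, and the sleep only simulates a network call.

```python
import asyncio


async def load_cached_entities(log):
    # Hypothetical stand-in for loading the "cached" domain context.
    await asyncio.sleep(0.01)
    log.append("cached")
    return {"countries": ["NZ", "UK"]}


async def load_transactional_entities(lookups, log):
    # Hypothetical stand-in for the transactional domain context.
    await asyncio.sleep(0)
    log.append("transactional")
    # Transactional rows can be resolved against lookups that are already loaded.
    return [{"order": 1, "country": lookups["countries"][0]}]


async def on_authenticated():
    log = []
    lookups = await load_cached_entities(log)  # first: reference data
    orders = await load_transactional_entities(lookups, log)  # then: transactional data
    return log, orders


if __name__ == "__main__":
    print(asyncio.run(on_authenticated()))
```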

Related

How can we set up the DB and ORM when there is no data consistency requirement?

Imagine we have a web-site which sends write and read requests into some DB via Hibernate. I use Java, but it doesn't matter for this question.
Usually we want to read fresh data from the DB. But I want to introduce some delay before written data becomes visible to reads, just to increase performance. I.e. I don't need to "publish" the rows inserted into the DB immediately. It's OK for me to "publish" fresh data after some delay.
How can I achieve it?
As far as I understand this can be set up on several different tiers of my system.
I can cache some requests in the front end. Probably I should set up a proxy server for this. But this will work only if all the parameters of the query match.
I can cache the read requests in Hibernate. OK, but can I specify or estimate the average time a read query will return stale data after a fresh insert has occurred? In other words, how can I control the delay before fresh data becomes visible to users?
Or maybe I should use something like memcached instead of the Hibernate cache?
Probably I can set something in the DB. I don't know what I should do with the DB. Perhaps I can relax the isolation level to boost the performance of my DB.
So, which way is the best one?
And the main question, of course: can the relaxation of requirements I introduce here REALLY help to increase the performance of my system?
If I am reading your architecture correctly, you have client -> server -> database server.
Answers to each point
This will put the burden of implementing the caching on the client. If you only use your own client, I would go for this method. It may have the side effect of improving client performance, and it puts less load on the server and database server, so they will scale better.
Caching on the server will improve the scalability of the database server, and possibly performance in the client, but it will put a memory burden on the server. This would be my second option.
Implementing something in the database: at this point, what are you gaining? The database server still has to do the work of determining which rows to send back, and you get no scalability benefits.
To sum up, I would cache at the client first if you can; if not, cache at the server. Leave the DB out of the loop.
To answer your main question - caching is one of the most effective ways of increasing both performance and scalability of web applications which are constrained by database performance - your application may or may not fall into this category.
In general, I'd recommend setting up a load testing rig, and measure the various parts of your app to identify the bottleneck before starting to optimize.
The most effective cache is one outside your system - a CDN or the user's browser. Read up on browser caching, and see if there's anything you can cache locally. Browsers have caching built in as a standard feature - you control them via HTTP headers. These caches are very effective, because they stop requests even reaching your infrastructure; they are very efficient for static web assets like images, javascript files or stylesheets. I'd consider a proxy server to be in the same category. The major drawback is that it's hard to manage this cache - once you've said to the browser "cache this for 2 weeks", refreshing it is hard.
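The "cache this for 2 weeks" instruction comes down to response headers. The sketch below shows one way to build them. The helper name `browser_cache_headers` is hypothetical. The header semantics are standard HTTP: `max-age` is in seconds, `public` allows shared caches such as proxies and CDNs to store the response, and `Cache-Control` takes precedence over the older `Expires` header when both are present.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def browser_cache_headers(max_age, now=None):
    """Build HTTP response headers asking browsers and proxies to cache for max_age."""
    now = now or datetime.now(timezone.utc)
    seconds = int(max_age.total_seconds())
    return {
        # 'public' lets shared caches (proxies, CDNs) store the response too.
        "Cache-Control": f"public, max-age={seconds}",
        # HTTP/1.0-era equivalent; Cache-Control wins when both are present.
        "Expires": format_datetime(now + timedelta(seconds=seconds), usegmt=True),
    }


if __name__ == "__main__":
    print(browser_cache_headers(timedelta(weeks=2)))
```

Once a client holds such a response, the server cannot recall it before it expires. That is the management drawback the answer describes.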
The next most effective caching layer is to cache (parts of) web pages on your application server. If you can do this, you avoid both the cost of rendering the page, and the cost of retrieving data from the database. Different web frameworks have different solutions for this.
Next, you can cache at the ORM level. Hibernate has a pretty robust implementation, and it provides a lot of granularity in your cache strategies. This article shows a sample implementation, including how to control the expiration time. You get a lot of control over caching here - you can specify the behaviour at the table level, so you can cache "lookup" data for days, and "transaction" data for seconds.
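The per-table idea (lookup data cached for days, transactional data for seconds) also answers the original question about controlling the delay: a read can be stale for at most the region's time-to-live. The sketch below is a plain Python illustration of that principle and does not reproduce Hibernate's configuration or API. The region names and the injectable clock are assumptions made for the example.

```python
import time


class RegionTTLCache:
    """Per-region expiry: the TTL bounds how stale a cached read can be."""

    def __init__(self, ttls, clock=time.monotonic):
        self.ttls = ttls  # region name -> TTL in seconds
        self.clock = clock
        self.entries = {}  # (region, key) -> (expires_at, value)

    def get(self, region, key, load):
        now = self.clock()
        entry = self.entries.get((region, key))
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh enough: no database access
        value = load(key)
        self.entries[(region, key)] = (now + self.ttls[region], value)
        return value


if __name__ == "__main__":
    cache = RegionTTLCache({"lookup": 3 * 24 * 3600, "transaction": 5})
    db = {"order:1": "pending"}
    print(cache.get("transaction", "order:1", db.get))
```

A write becomes visible to readers no later than one TTL after it happens, because any copy cached before the write expires within that window.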
The database already implements a cache "under the hood": it will load frequently used data into memory, for instance. In some applications, you can further improve database performance by "de-normalizing" complex data, so the import routine might turn a complex data structure into a simple one. This does trade off data consistency and maintainability against performance.

How can I handle 200K request per sec in wcf

I need to design a system that can handle 200K request per second in each machine over HTTP.
The WCF service needs to be hosted in a Windows service.
I wonder whether WCF can handle such a requirement.
What is the best system setup/best configuration?
The machine itself is pretty heavy: 32 GB RAM and 8 cores (or more), and it can be upgraded if needed.
Can I handle such amount of request in each single machine with wcf using http?
Doing this on a single machine is likely to be pretty tough (if indeed it's possible). It would be better to make your system scale horizontally, so you can add lots of machines as required. How you do that will depend on what your system actually needs to do. If it's some simple calculation which requires no persisted state, it shouldn't be too hard. If you've got some interaction with storage of some form which really needs to be read/written on each request, it'll be a lot harder - and choosing your persistence technology is likely to be pretty key to making it all hang together.
Note that there are other benefits to scaling horizontally too - in particular, the ability to upgrade the system without any downtime (if you're careful) and removing a huge single point of failure.
You need to give some more info on this.
Do you get the request and have to process it immediately?
Can you store the request data and delegate the processing to some other thread/process? Is there any way to scale the system out instead of up?
Is this in fact the only piece of infrastructure you can deploy stuff to?
I would start by asking what it is that I want to do during request handling, and then what the bottlenecks are going to be.
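The question "can you store the request data and delegate the processing to some other thread/process?" describes a queue-based hand-off. The sketch below shows only the shape of that hand-off. It is a Python illustration, not a claim about how WCF should be configured for 200K requests per second. The handler, the worker, and the doubling "processing" step are all hypothetical.

```python
import queue
import threading

requests = queue.Queue()
results = []
results_lock = threading.Lock()


def handle_request(payload):
    """Fast path: store the request and acknowledge immediately."""
    requests.put(payload)
    return "accepted"


def worker():
    """Slow path: process stored requests on a background thread."""
    while True:
        payload = requests.get()
        if payload is None:  # sentinel value: shut this worker down
            requests.task_done()
            break
        with results_lock:
            results.append(payload * 2)  # stand-in for the real processing
        requests.task_done()
```

The request path then costs only an enqueue. Whether the backlog can be drained fast enough still depends on the processing itself, which is why the answers ask what each request actually needs to do.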

Write-through caching of large data sets in WCF?

We've got a smart client that talks to a SQL Server database via WCF, displaying the entities in the database, and allowing the user to edit those entities.
Some of the WCF calls return a large data set. Since this data set doesn't change very often, I'm considering some sort of write-through cache on the client, and only getting the deltas from the WCF service.
That is: the client both reads from the service and writes to the service.
I'm not looking for disconnected/offline operation, but since the majority of the data doesn't change very often, I'd probably implement this with a local data store.
I don't want the local store to get too stale, and I don't think I'm too concerned about conflict resolution, because updates will always go straight to the WCF service -- think of it as a write-through cache.
Would Microsoft's Sync Framework be good for this? Could I use a local SQL-CE cache and perform the updates over WCF? The service end has a SQL Server 2005/2008 backend, but I don't want to talk to it directly. Does Sync Framework integrate well with WCF?
Are there other solutions out there? Should I roll something myself?
I don't think you have to couple it to WCF at all. FeedSync allows you to publish directly to an RSS feed.
The only thing I'm not too sure about is whether it would be suitable for a "large dataset". Since you don't need two-way replication, if your dataset is extremely large, you might want to write your own WCF implementation to optimize it, especially for the initial population.
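If you do roll your own, the write-through-plus-deltas design from the question can be sketched as below. This is a minimal sketch that assumes the service keeps a monotonically increasing change version. `FakeService` is a hypothetical stand-in for the WCF service, and a dictionary stands in for the local SQL CE store. Deletes and the optimized initial population are deliberately left out.

```python
class FakeService:
    """Hypothetical stand-in for the WCF service: rows plus a versioned change log."""

    def __init__(self):
        self.rows = {}
        self.version = 0
        self.changes = []  # list of (version, key, value)

    def save(self, key, value):
        self.version += 1
        self.rows[key] = value
        self.changes.append((self.version, key, value))
        return self.version

    def get_changes_since(self, version):
        return [c for c in self.changes if c[0] > version], self.version


class WriteThroughCache:
    def __init__(self, service):
        self.service = service
        self.local = {}  # stands in for the local SQL CE store
        self.synced_version = 0

    def refresh(self):
        """Pull only the deltas since the last sync; returns how many were applied."""
        changes, latest = self.service.get_changes_since(self.synced_version)
        for _, key, value in changes:
            self.local[key] = value
        self.synced_version = latest
        return len(changes)

    def read(self, key):
        return self.local.get(key)

    def write(self, key, value):
        """Write-through: update the service first, then the local copy."""
        self.service.save(key, value)
        self.local[key] = value
        # synced_version is not advanced here: other clients may have written
        # in between, so the next refresh() must still fetch their changes.
```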

SQL Server 2005, Caches and all that jazz

Background to question: I'm looking to implement a caching system for my website. Currently we're exploring memcache as a means of doing this. However, I am looking to see if something similar exists for SQL Server. I understand that MySQL has a query cache which, although it is not distributed, works as a sort of 'stop gap' measure. Is MySQL's query cache equivalent to the buffer cache in SQL Server?
So here are my questions:
Is there a way to know what is currently stored in the buffer cache?
As a follow-up to this, is there a way to force certain tables or result sets into the cache?
How much control do I have over what goes on in the buffer and procedure cache? I understand there used to be a DBCC PINTABLE command but that has since been discontinued.
Slightly off topic: should the caching even exist on the database layer? Or is it more prudent to manage caches using Velocity/Memcache? If so, why? It seems like cache invalidation is something of a pain when handling many objects with overlapping triggers.
Thanks!
SQL Server implements a buffer pool the same way every database product under the sun does (more or less) since System R showed the way. The gory details are explained in Transaction Processing: Concepts and Techniques. In addition, it has a caching framework used by the procedure cache, the permission token cache and many other caching classes. This framework is best described in Clock Hands - what are they for.
But this is not the kind of caching applications are usually interested in. The internal database cache is perfect for scale-up scenarios, where a more powerful back-end database is able to respond faster to more queries by using these caches. The modern application stack, however, tends to scale out the web servers, and the real problem is caching query results in a cache used by the web farm. Ideally, this cache should be shared and distributed. Memcached and Velocity are examples of such application caching infrastructure. Memcached has a long history by now: its uses and shortcomings are understood, and there is significant know-how around how to use it, deploy it, manage it and monitor it.
The biggest problem with caching in the application layer, and especially with distributed caching, is cache invalidation: how to detect the changes that occur in the back-end data and mark cached entries invalid so that new requests don't use stale data.
The simplest (for some definition of simple...) alternative is proactive invalidation from the application. The code knows when it changes an entity in the database, and after the change occurs it takes the extra step of marking the cached entries invalid. This has several shortcomings:
It is difficult to know exactly which cached entries need to be invalidated. Dependencies can be quite complex; things are always more than just a simple table/entry: there are aggregate queries, joins, partitioned data, etc.
Code discipline is required to ensure all paths that modify data also invalidate the cache.
Changes to the data that occur outside the application scope are not detected. In practice, there are always changes that occur outside the application scope: other applications using the same data, import/export and ETL jobs, manual intervention etc etc.
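Proactive invalidation and two of its shortcomings can be shown in a few lines. In this minimal sketch, dictionaries stand in for the database table and for memcached/Velocity, and `ProductRepository` is a hypothetical class. The write path must know about the aggregate key as well as the row key. A change made outside the application goes undetected.

```python
class ProductRepository:
    """Proactive invalidation: every write path removes the cache keys it affects."""

    def __init__(self):
        self.db = {}  # stands in for the database table
        self.cache = {}  # stands in for memcached/Velocity

    def get_price(self, product_id):
        key = f"price:{product_id}"
        if key not in self.cache:
            self.cache[key] = self.db[product_id]
        return self.cache[key]

    def get_total(self):
        # An aggregate query: it depends on every row, not on a single entry.
        if "total" not in self.cache:
            self.cache["total"] = sum(self.db.values())
        return self.cache["total"]

    def update_price(self, product_id, price):
        self.db[product_id] = price
        # The application must know every dependent key, including aggregates;
        # forgetting "total" here would silently serve stale data.
        self.cache.pop(f"price:{product_id}", None)
        self.cache.pop("total", None)


if __name__ == "__main__":
    repo = ProductRepository()
    repo.db.update({1: 10, 2: 20})
    print(repo.get_total())
    repo.update_price(1, 15)
    print(repo.get_total())
```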
A more complicated alternative is a cache that is notified by the database itself when changes occur. Not many technologies are around to support this, though; it cannot work without active support from the database. SQL Server has Query Notifications for such scenarios; you can read more about it at The Mysterious Notification. Implementing QN-based caching in a standalone application is fairly complicated (and often done badly), but it works fine when implemented correctly. Doing so in a shared, scaled-out cache like Memcached is quite a feat of strength, but it is doable.
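By contrast, notification-driven invalidation lets the data source tell the cache when something changed, whoever made the change. The sketch below simulates that flow and does not use the real Query Notifications or SqlDependency API. `ChangeNotifyingTable` is a hypothetical data source. As with a SqlDependency subscription, each subscription fires only once, so the cache re-subscribes every time it reloads. For simplicity, any write notifies every subscriber.

```python
from collections import defaultdict


class ChangeNotifyingTable:
    """Hypothetical data source that notifies subscribers when its rows change."""

    def __init__(self):
        self.rows = {}
        self.subscribers = defaultdict(list)  # query name -> callbacks

    def subscribe(self, query, callback):
        self.subscribers[query].append(callback)

    def write(self, key, value):
        self.rows[key] = value
        # Fires for every writer, including ones outside the application.
        # This sketch notifies all subscriptions on any change.
        callbacks = [cb for cbs in self.subscribers.values() for cb in cbs]
        self.subscribers.clear()  # one-shot subscriptions
        for cb in callbacks:
            cb()


class NotifiedCache:
    def __init__(self, table):
        self.table = table
        self.cached = None

    def get_all(self):
        if self.cached is None:
            self.cached = dict(self.table.rows)
            # Re-subscribe on every load, because subscriptions are one-shot.
            self.table.subscribe("all_rows", self.invalidate)
        return self.cached

    def invalidate(self):
        self.cached = None
```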
Nai,
Answers to your questions follow:
From Wikipedia (always correct...? :-)). For a more Microsoft answer, here is their description of the buffer cache.
Buffer management
SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of memory available to SQL Server decides how many pages will be cached in memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory cache has not been referenced for some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O operation is done in a background thread so that other operations do not have to wait for the I/O operation to complete. Each page is written along with its checksum when it is written. When reading the page back, its checksum is computed again and matched with the stored version to ensure the page has not been damaged or tampered with in the meantime.
For this answer, please refer to the above answer:
Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version.
You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info dynamic management view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.
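These two columns report counts of 8 KB buffers, not bytes, so converting them to megabytes is a matter of multiplying by 8 and dividing by 1024. The numbers below are made up for illustration and are not real readings. Later SQL Server versions replaced these columns with kilobyte-based ones, so this conversion applies to the SQL Server 2005 column names discussed here.

```python
PAGE_SIZE_KB = 8  # SQL Server pages (and buffer pool buffers) are 8 KB


def buffers_to_mb(buffer_count):
    """Convert a buffer count (e.g. bpool_committed) to megabytes."""
    return buffer_count * PAGE_SIZE_KB / 1024


# Hypothetical values, as they might be read from sys.dm_os_sys_info.
committed = 131072
target = 196608
print(f"committed: {buffers_to_mb(committed):.0f} MB of {buffers_to_mb(target):.0f} MB target")
```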
I feel like Microsoft has had time to figure out caching for their product and should be trusted.
I hope this information was helpful,
Thanks!
Caching can mean many different things for an ASP.Net application, spread from the browser all the way down to your hardware, with IIS, the application and the database in the middle.
The caching you are talking about is Database level caching, this is mostly transparent to your application. This level of caching will include buffer pools, statement caches etc. Make sure your DB server has plenty of RAM. In theory a DB server should be able to load the entire DB store in memory. There is not much you can do at this level unless you pre-fetch some anticipated data when you start the application and ensure that it is in DB cache.
On the other hand, there are in-memory distributed caching systems. Apart from memcache and Velocity, you can look at some commercial solutions like NCache or Oracle Coherence. I have no experience with either of them to recommend them. This level of caching promises scalability at a lower cost; scaling the DB tier is expensive by comparison. You may have to consider aspects like network bandwidth, though. This type of caching, especially with invalidation and expiry, can be complicated.
You can cache at Web Service tier using output caching at IIS level (in IIS 7) and ASP.Net level.
At the application level you can use ASP.Net cache. This is the one that you can control most and gives you good benefits.
Then there is caching going on at client web proxy tier that can be controlled by cache-control HTTP header.
Finally you have browser level caching, view state and cookies for small data.
And don't forget that hardware like SAN caches at physical disk access level too.
In summary, caching can occur at many levels, and it is up to you to analyse and implement the best solution for your scenario. You have to find out the stability and volatility of your data, the expected load, etc. I believe caching at the ASP.Net level (especially for objects) gives you the most flexibility and control.
Your specific technical questions about SQL Server's buffer cache are going down the wrong path when it comes to "implement a caching system for my website".
Sure, SQL Server is going to cache data so it can improve its performance (and it does so rather well), but the point of implementing a caching layer on your web front-ends is to avoid having to talk to the database at all, because there is still overhead and resource contention even when your query is fulfilled entirely from SQL Server's cache.
What you want to be looking into is: memcached, Velocity, ASP.NET Cache, P&P Caching Application Block, etc.

WCF: sharing cached data across multiple services

We are developing a project that involves about 10 different WCF services with several endpoints each. One of the services keeps a few big tables of data cached in memory.
We have found we need access to that data from another service. Rather than keeping 2 copies of the cache, I'd like to be able to share those tables across all services.
I have done some research and found some articles about using an IExtension attached to the servicehosts to store the shared data.
Provided that all the services are running under the same web site, will that work? And is it the right approach? Or should I be looking elsewhere?
If the data that you're caching is required by more than one service, it sounds like - from a Service Oriented Architecture perspective, anyway - that it doesn't belong in either of services you have calling it.
If the data being cached isn't really related to either service, but is something that both services need, then perhaps it belongs in its own separate service. Have you considered encapsulating your cache in a third service, and performing a service-to-service call to retrieve the data you need? Benefits include:
It solves your original dilemma, avoiding the need to read the whole cache from the database several times;
It encapsulates the cache in one place for easy maintenance/change later.
It allows you to abstract the implementation of the cache away from the other services by putting another service interface in the way.
All in all, I'd suggest that's the best approach. The only downside is the extra overhead of making the service-to-service call, but that surely outperforms having to read the whole cache from the database.
Alternatively, if the data in your cache is very closely related to BOTH of the services that are calling the cache, i.e. both services add/change the data in the cache, etc. then perhaps the two existing services should be combined into a single service.
If what I'm saying makes some sense, then the principle of SOA I'm drawing on is Service Autonomy.
Provided all your services are part of the same application there doesn't seem to be any reason why you can't share the cache directly via a shared object reference. The simplest way of doing this is via a static field.
If you choose this approach, one thing to be very careful about is thread safety. If your cache is concurrently accessed via two WCF sessions, you must ensure that the two sessions are not going to interfere with each other by both changing the cache at the same time. If the cache is read-only, the need for this is lessened, but you still might need to synchronise initialisation of the cache.
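The static-field approach, including synchronised initialisation of a read-only cache, can be sketched as follows. This Python sketch uses class-level state as the "static field" and double-checked locking so that only one thread performs the load. `SharedLookupCache` and `_load_tables` are hypothetical. In C#, a static `Lazy<T>` would serve the same purpose.

```python
import threading


class SharedLookupCache:
    """One cache shared by every service in the process via class-level (static) state."""

    _tables = None
    _lock = threading.Lock()
    load_count = 0

    @classmethod
    def _load_tables(cls):
        # Hypothetical stand-in for reading the big tables from the database.
        cls.load_count += 1
        # Immutable tuples keep the shared data read-only after initialisation.
        return {"countries": ("NZ", "UK", "US")}

    @classmethod
    def get(cls, table):
        tables = cls._tables
        if tables is None:
            with cls._lock:
                # Re-check inside the lock so that only one thread performs the load.
                if cls._tables is None:
                    cls._tables = cls._load_tables()
                tables = cls._tables
        return tables[table]
```

If the cache were writable, every mutation would also need to happen under the lock or be replaced with an atomic swap of a new dictionary.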