So here's the problem: I've created a database model. When I create an instance with a = Model(args) and then call a.put(), GAE seems to update memcache automatically, because all the data is up to date even without me hitting the database. Logging the number of elements in the cache also shows the correct number of elements, but I'm not updating the cache manually. How do I prevent this? Cheers.
You can set policy functions:
Automatic caching is convenient for most applications but maybe your application is unusual and you want to turn off automatic caching for some or all entities. You can control the behavior of the caches by setting policy functions.
Memcache Policy
That's for NDB. You don't say which language or datastore library you are using, but the other App Engine datastore APIs likely offer similar controls.
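For reference, in NDB these hooks live on the context. ndb.get_context().set_memcache_policy(func) takes a function that receives a Key and returns whether memcache should be used, and set_cache_policy does the same for the in-context cache. A model can also opt out entirely with the class attribute _use_memcache = False. NDB only runs inside App Engine, so the sketch below is a self-contained simulation rather than NDB code. PolicyCachedStore, its dict-based "datastore" and "memcache", and the writes-evict/reads-refill behaviour are all simplifying assumptions. The sketch only shows how a per-key policy function keeps one kind out of memcache.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, int]  # (kind, id), loosely mirroring an NDB Key

class PolicyCachedStore:
    """Hypothetical stand-in for a datastore fronted by memcache.

    Simplified model: put() writes the datastore and evicts memcache;
    get() refills memcache on a miss. A policy function decides, per key,
    whether memcache is used at all.
    """

    def __init__(self, memcache_policy: Optional[Callable[[Key], bool]] = None):
        self.datastore: Dict[Key, dict] = {}
        self.memcache: Dict[Key, dict] = {}
        self.datastore_reads = 0
        # Default policy: cache every key.
        self.memcache_policy = memcache_policy or (lambda key: True)

    def put(self, key: Key, entity: dict) -> None:
        self.datastore[key] = dict(entity)
        self.memcache.pop(key, None)  # evict so the next read refills it

    def get(self, key: Key) -> Optional[dict]:
        use_memcache = self.memcache_policy(key)
        if use_memcache and key in self.memcache:
            return self.memcache[key]
        self.datastore_reads += 1
        entity = self.datastore.get(key)
        if use_memcache and entity is not None:
            self.memcache[key] = entity
        return entity

# Policy: never put entities of kind "Model" into memcache.
store = PolicyCachedStore(memcache_policy=lambda key: key[0] != "Model")
store.put(("Model", 1), {"name": "a"})
store.put(("Other", 1), {"name": "b"})
for _ in range(2):
    store.get(("Model", 1))
    store.get(("Other", 1))
print(sorted(store.memcache))   # [('Other', 1)]
print(store.datastore_reads)    # 3: "Model" always goes to the datastore
```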
Is there any way in .NET Core (3.0 or earlier) to use the local file system as a response cache instead of just memory?
After a fair amount of research, the closest thing seems to be the Response Caching middleware [1], but it does not:
allow pages to be cached indefinitely,
preserve caches between application and server restarts,
allow invalidating the cache on a per-page basis (e.g. blog entry updated),
allow invalidating the entire cache when global changes are made (e.g. theme update, menu changes, etc.).
I'm guessing these features will require a custom implementation of ResponseCaching that hits the local file system, but I don't want to reinvent it if it already exists.
Some background:
This will replace our use of a static site-generator, which is problematic for site-wide changes because of the sheer quantity of data (nearly 24 hours to generate and copy to all of the servers).
The scenario is very similar to an encyclopedia or news site -- the vast majority of the content changes infrequently, a few things are added per day, and there is no user-specific content (and if or when there is, it would be dynamically loaded via JS/Ajax). Additionally, the page loads happen to be processor/memory/database intensive.
We will be using a reverse proxy like Cloudflare or AWS CloudFront, but AWS automatically expires its edge caches daily, and edge-node cache misses are still frequent.
This is different from IDistributedCache [2] in that it should be response caching, not just caching of the data used by the MVC model.
We will also use the in-memory cache [3], but again, that solves a different caching scenario.
References
[1] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/middleware
[2] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/distributed
[3] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/memory
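The requirements listed above are: cache indefinitely, survive restarts, and invalidate one page or everything. Together they need surprisingly little code. The sketch below is illustrated in Python rather than C# and is not tied to any ASP.NET Core API. FileResponseCache and its on-disk layout are hypothetical. Each page is stored as a file named by a hash of its URL path, inside a directory for the current "generation". Deleting a file invalidates one page; bumping the generation number invalidates everything at once.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional

class FileResponseCache:
    """Hypothetical disk-backed response cache keyed by request path.

    Entries live under <root>/<generation>/<sha256>.html, so they survive
    process restarts. Bumping the generation invalidates everything at once.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._gen_file = self.root / "GENERATION"
        if not self._gen_file.exists():
            self._gen_file.write_text("0")

    @property
    def generation(self) -> int:
        return int(self._gen_file.read_text())

    def _path_for(self, url_path: str) -> Path:
        digest = hashlib.sha256(url_path.encode("utf-8")).hexdigest()
        return self.root / str(self.generation) / f"{digest}.html"

    def get(self, url_path: str) -> Optional[bytes]:
        path = self._path_for(url_path)
        return path.read_bytes() if path.exists() else None

    def put(self, url_path: str, body: bytes) -> None:
        path = self._path_for(url_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(body)
        tmp.replace(path)  # rename so readers never see a half-written file

    def invalidate_page(self, url_path: str) -> None:
        self._path_for(url_path).unlink(missing_ok=True)

    def invalidate_all(self) -> None:
        old = self.root / str(self.generation)
        self._gen_file.write_text(str(self.generation + 1))
        shutil.rmtree(old, ignore_errors=True)

root = Path(tempfile.mkdtemp())
cache = FileResponseCache(root)
cache.put("/blog/1", b"<h1>v1</h1>")
cache.put("/blog/2", b"<h1>post 2</h1>")
cache.invalidate_page("/blog/1")        # blog entry updated
restarted = FileResponseCache(root)     # e.g. after an application restart
print(restarted.get("/blog/2"))         # b'<h1>post 2</h1>'
cache.invalidate_all()                  # e.g. theme or menu change
print(cache.get("/blog/2"))             # None
```

A real middleware would also need to decide which responses are cacheable and how to handle concurrent writers. This sketch ignores both.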
I implemented this.
https://www.nuget.org/packages/AspNetCore.ResponseCaching.Extensions/
https://github.com/speige/AspNetCore.ResponseCaching.Extensions
It doesn't support .NET Core 3 yet, though.
Currently (April 2019) the answer appears to be: no, there is nothing out-of-the-box for this.
There are three viable approaches to accomplish this using .Net Core:
Fork the built-in ResponseCaching middleware and create a flag for cache-to-disk:
https://github.com/aspnet/AspNetCore/tree/master/src/Middleware/ResponseCaching
This might be annoying to maintain because the namespaces and class names will collide with the core framework.
Implement this missing feature in EasyCaching, which apparently already has caching to disk on their radar:
https://github.com/dotnetcore/EasyCaching/blob/master/ToDoList.md
A pull request may be more likely to be accepted, since it's a planned feature.
There is apparently a port of Strathweb.CacheOutput to .Net Core, which would allow one to implement IApiOutputCache to save to disk:
https://github.com/Iamcerba/AspNetCore.CacheOutput#server-side-caching
Although this question is about caching within .NET Core using the local file system, this could also be accomplished by running a local SQLite instance on each server node and configuring EasyCaching for response caching, pointed at the SQLite instance on localhost.
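To illustrate the SQLite variant, here is a minimal sketch using Python's built-in sqlite3 module. It does not use EasyCaching's API. The table name, its columns and the SqliteResponseCache class are assumptions chosen for the example. Each node would keep its own database file, so cached pages survive restarts and can be invalidated per URL or all at once.

```python
import sqlite3
import time
from typing import Optional

class SqliteResponseCache:
    """Hypothetical response cache stored in a node-local SQLite database."""

    def __init__(self, db_path: str = "response_cache.db") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            " url TEXT PRIMARY KEY,"
            " body BLOB NOT NULL,"
            " stored_at REAL NOT NULL)"
        )
        self.conn.commit()

    def get(self, url: str) -> Optional[bytes]:
        row = self.conn.execute(
            "SELECT body FROM responses WHERE url = ?", (url,)
        ).fetchone()
        return row[0] if row else None

    def put(self, url: str, body: bytes) -> None:
        # INSERT OR REPLACE overwrites an existing entry for the same URL.
        self.conn.execute(
            "INSERT OR REPLACE INTO responses (url, body, stored_at) VALUES (?, ?, ?)",
            (url, body, time.time()),
        )
        self.conn.commit()

    def invalidate(self, url: str) -> None:
        self.conn.execute("DELETE FROM responses WHERE url = ?", (url,))
        self.conn.commit()

    def invalidate_all(self) -> None:
        self.conn.execute("DELETE FROM responses")
        self.conn.commit()

# ":memory:" keeps the demo self-contained; a real node would use a file path.
demo = SqliteResponseCache(":memory:")
demo.put("/wiki/Cache", b"<html>...</html>")
print(demo.get("/wiki/Cache"))  # b'<html>...</html>'
```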
I hope this helps someone else who finds themselves in this scenario!
Imagine we have a website that sends read and write requests to a database via Hibernate. I use Java, but it doesn't matter for this question.
Usually we want to read fresh data from the DB. But I want to introduce some delay between when data is written and when it becomes visible to reads, just to increase performance. I.e. I don't need to "publish" rows inserted into the DB immediately; it's OK for me to "publish" fresh data after some delay.
How can I achieve it?
As far as I understand, this can be set up at several different tiers of my system:
I can cache some requests at the front end, probably by setting up a proxy server. But this will work only if all the parameters of the query match.
I can cache the read requests in Hibernate. OK, but can I specify or estimate the average time a read query will return stale data after a fresh insert occurs? In other words, how can I control the delay before fresh data becomes visible to users?
Or maybe I should use something like memcached instead of the Hibernate cache?
Perhaps I can configure something in the database, though I don't know what. Maybe I can relax the isolation level to boost the performance of my DB.
So, which way is the best one?
And the main question, of course: can relaxing the requirements like this REALLY increase the performance of my system?
If I am reading your architecture correctly, you have client -> server -> database server.
Answers to each point:
Caching at the front end puts the burden of caching on the client; if you only use your own client, I would go for this method. It may also improve client performance, and it puts less load on the server and database server, so they will scale better.
Caching on the server will improve the scalability of the database server and possibly performance in the client, but it puts a memory burden on the server. This would be my second option.
Implementing something in the database: at this point, what are you gaining? The database server still has to do the work of determining which rows to send back, and you get no scalability benefits.
So, to sum up: cache at the client first if you can; if not, cache at the server. Leave the DB out of the loop.
To answer your main question - caching is one of the most effective ways of increasing both performance and scalability of web applications which are constrained by database performance - your application may or may not fall into this category.
In general, I'd recommend setting up a load-testing rig and measuring the various parts of your app to identify the bottleneck before starting to optimize.
The most effective cache is one outside your system - a CDN or the user's browser. Read up on browser caching, and see if there's anything you can cache locally. Browsers have caching built in as a standard feature - you control them via HTTP headers. These caches are very effective, because they stop requests even reaching your infrastructure; they are very efficient for static web assets like images, javascript files or stylesheets. I'd consider a proxy server to be in the same category. The major drawback is that it's hard to manage this cache - once you've said to the browser "cache this for 2 weeks", refreshing it is hard.
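To make the header mechanics concrete, here is a small sketch of the max-age rule. A response sent with Cache-Control: max-age=N may be reused without contacting the server until it is N seconds old. The helper names are hypothetical. The freshness check is simplified: it ignores the Age header, clock skew and revalidation directives such as no-cache. The two-week example shows why refreshing is hard: until the copy's age reaches max-age, the browser does not ask the server again.

```python
import re
from typing import Optional

def cache_headers(max_age_seconds: int) -> dict:
    """Response headers allowing browsers and shared proxies to reuse a response."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}

def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age directive (directive names are case-insensitive)."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None

def is_fresh(stored_at: float, max_age_seconds: int, now: float) -> bool:
    """Simplified check: the cached copy is reused while its age < max-age."""
    return (now - stored_at) < max_age_seconds

TWO_WEEKS = 14 * 24 * 60 * 60  # 1209600 seconds
headers = cache_headers(TWO_WEEKS)
max_age = parse_max_age(headers["Cache-Control"])
# One second before expiry the browser still uses its copy, even if the
# server now has a newer version; only after max-age does it ask again.
print(is_fresh(0, max_age, TWO_WEEKS - 1))  # True
print(is_fresh(0, max_age, TWO_WEEKS))      # False
```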
The next most effective caching layer is to cache (parts of) web pages on your application server. If you can do this, you avoid both the cost of rendering the page, and the cost of retrieving data from the database. Different web frameworks have different solutions for this.
Next, you can cache at the ORM level. Hibernate has a pretty robust implementation, and it provides a lot of granularity in your cache strategies. This article shows a sample implementation, including how to control the expiration time. You get a lot of control over caching here - you can specify the behaviour at the table level, so you can cache "lookup" data for days, and "transaction" data for seconds.
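The same idea answers the question's "how do I control the delay" part. With time-based expiry, and assuming nothing else invalidates an entry, the staleness a reader can observe is bounded by the time-to-live of the entry's region. The sketch below is not Hibernate's API. In Hibernate the TTL is configured in the cache provider, for example per-region timeToLiveSeconds in an Ehcache 2.x configuration. RegionTTLCache and the fake clock are hypothetical. They show lookup data cached for a day and transactional data for five seconds.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple

class RegionTTLCache:
    """Hypothetical cache with a separate time-to-live per region.

    A value served from a region is at most `ttl` seconds old, which bounds
    the delay before a new write becomes visible to readers.
    """

    def __init__(self, region_ttls: Dict[str, float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.region_ttls = region_ttls
        self.clock = clock
        self._entries: Dict[Tuple[str, Hashable], Tuple[float, Any]] = {}

    def get_or_load(self, region: str, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._entries.get((region, key))
        if entry is not None and now - entry[0] < self.region_ttls[region]:
            return entry[1]
        value = loader()  # e.g. the real database query
        self._entries[(region, key)] = (now, value)
        return value

fake_now = [0.0]
cache = RegionTTLCache({"lookup": 86400.0, "transactions": 5.0},
                       clock=lambda: fake_now[0])
db = {"balance": 100}

def read_balance() -> int:
    return cache.get_or_load("transactions", "balance", lambda: db["balance"])

first = read_balance()   # 100, loaded from the "database"
db["balance"] = 150      # a write lands in the database
stale = read_balance()   # still 100: within the 5-second TTL
fake_now[0] = 5.0
fresh = read_balance()   # 150: the entry expired and was reloaded
```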
The database already implements a cache "under the hood": it will load frequently used data into memory, for instance. In some applications, you can further improve database performance by "de-normalizing" complex data, so the import routine might turn a complex data structure into a simple one. This does trade off data consistency and maintainability against performance.
I am developing a WPF application using NHibernate to communicate with a PostgreSQL Database.
The only caching provider that works in a desktop app is Bamboo Prevalence (correct me if I am wrong). Given that every computer running my application will have a different session factory, my application retrieves stale data from the cache.
My question is, how can I tell NHibernate/Prevalence to look at the timestamp of when the data was last updated, and if the cache is stale, refresh it?
Well, I found out that there is no way the second-level cache can know whether the database was changed outside NHibernate/the cache, so what I did was create a new 'Timestamp' column on all my tables.
For each query, I first select the timestamp from the database with the session's cache mode set to CacheMode.Ignore, so the value comes from the database rather than the cache, and I compare it with the timestamp of the cached result. If the timestamps differ, I invalidate the cache for that query and run it again.
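The timestamp check described above might look roughly like this. It is a sketch in Python with sqlite3 instead of NHibernate. TimestampValidatedCache is hypothetical. It assumes every insert or update writes a new, increasing value into the Timestamp column. The cheap MAX(Timestamp) query always goes to the database; the full query runs again only when that value has changed.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

class TimestampValidatedCache:
    """Hypothetical query cache revalidated against a Timestamp column."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._cache: Dict[Tuple[str, str], Tuple[Any, List[tuple]]] = {}
        self.full_queries = 0

    def query(self, table: str, sql: str) -> List[tuple]:
        # Cheap check that always hits the database, never the cache.
        # `table` is interpolated, so it must come from code, not user input.
        (latest,) = self.conn.execute(f"SELECT MAX(Timestamp) FROM {table}").fetchone()
        cached = self._cache.get((table, sql))
        if cached is not None and cached[0] == latest:
            return cached[1]
        # Timestamps differ (or nothing cached yet): invalidate and re-run.
        self.full_queries += 1
        rows = self.conn.execute(sql).fetchall()
        self._cache[(table, sql)] = (latest, rows)
        return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, Timestamp INTEGER)")
conn.execute("INSERT INTO customers VALUES ('Ann', 1)")
cache = TimestampValidatedCache(conn)
sql = "SELECT name FROM customers ORDER BY name"

first = cache.query("customers", sql)   # [('Ann',)]: full query
second = cache.query("customers", sql)  # same rows, served from the cache
# Another client (simulated on the same connection) inserts a newer row.
conn.execute("INSERT INTO customers VALUES ('Bob', 2)")
third = cache.query("customers", sql)   # [('Ann',), ('Bob',)]: re-queried
```

One caveat: deleting a row other than the newest leaves MAX(Timestamp) unchanged, so deletes would need their own tracking. One option is a soft-delete flag whose update also writes a new timestamp.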
About SysCache: even knowing it 'can work' in a WPF desktop app, I was not keen to use System.Web.Caching, as my application would then need the complete .NET Framework instead of the Client Profile. I did a search and, to my delight, someone has written an NHibernate cache provider that uses System.Runtime.Caching, which is not an ASP.NET component. If anyone is interested, you can find the source at:
https://github.com/Leftyx/nhcontrib/tree/master/src/NHibernate.Caches/MemoryCache
Well, that is a property you could set at the cache level: expire items according to your application's needs. NCache is a possible L2 cache provider for NHibernate. NCache ensures that its cache is consistent across multiple servers and that all cache updates are synchronized correctly, so no data integrity issues arise. To learn more, please visit:
http://www.alachisoft.com/ncache/nhibernate-l2cache-index.html
Say I want to use the dalli store for caching (fragments of) views. Does this mean that doing something like this will also use memcache for possible DB caching?
Rails.cache.fetch("something") { smth }
Also, if I do something like:
Author.all
the Rails console will show me that it's querying the database, but if I run Author.all again, it will show me that the results are retrieved from cache. When do I want to explicitly use Rails.cache and when should I rely on ActiveRecord to do the caching?
Fetch and read operations get items from the cache store which is currently configured, for example from the file system if the cache store is :file_store, or from the Memcached server if the store is :mem_cache_store. Therefore if you want to use Memcached for fragment caching, you have to configure the cache_store accordingly:
ActionController::Base.cache_store = :mem_cache_store, "cache-1.example.com"
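The point of the quoted docs is that fetch is store-agnostic. It reads from whatever store is configured; on a miss, it runs the block, writes the result and returns it. The following is a rough Python analogue, not Rails code. MemoryStore, FileStore and fetch are hypothetical stand-ins for :memory_store, :file_store and Rails.cache.fetch.

```python
import hashlib
import os
import pickle
import tempfile
from typing import Any, Callable

_MISSING = object()  # sentinel, so a cached None still counts as a hit

class MemoryStore:
    """Hypothetical in-process store, in the spirit of :memory_store."""

    def __init__(self) -> None:
        self._data: dict = {}

    def read(self, key: str) -> Any:
        return self._data.get(key, _MISSING)

    def write(self, key: str, value: Any) -> None:
        self._data[key] = value

class FileStore:
    """Hypothetical directory-backed store, in the spirit of :file_store."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.directory, hashlib.sha256(key.encode("utf-8")).hexdigest())

    def read(self, key: str) -> Any:
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)  # only safe for a trusted, private directory
        except FileNotFoundError:
            return _MISSING

    def write(self, key: str, value: Any) -> None:
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)

def fetch(store: Any, key: str, block: Callable[[], Any]) -> Any:
    """Return the cached value, or run `block`, cache its result and return it."""
    value = store.read(key)
    if value is _MISSING:
        value = block()
        store.write(key, value)
    return value

# The calling code is identical whichever store is configured.
calls = []

def expensive() -> str:
    calls.append(1)
    return "rendered fragment"

for configured_store in (MemoryStore(), FileStore(tempfile.mkdtemp())):
    fetch(configured_store, "something", expensive)
    fetch(configured_store, "something", expensive)  # second call is a cache hit
print(len(calls))  # 2: one computation per store
```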
It is advisable to use fragment caching if one has large, complex views involving many queries which change rarely or slowly. Fragment caching is a good trade-off between completely static pages (fast, but fixed) and dynamic pages (slow, but variable). If you need to cache a certain section of a page instead of the entire page, fragment caching is the way to go, as Ryan Bates said in his Railscast about fragment caching.
Page and action caching are even better: they are great for speeding up the performance of a page, but problematic if it contains user-specific content. In this case it is possible to use dynamic page caching. Stack Overflow uses a similar technique.
SQL Caching persists only for the duration of a single action.
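Because the SQL cache lives only for one action, the next request runs the query again, and that is where Rails.cache comes in. The behaviour can be sketched as a connection wrapper whose query cache is created at the start of an action and discarded at the end. QueryCachingConnection is hypothetical. It also clears its cache on any write, mirroring what ActiveRecord's query cache does.

```python
from contextlib import contextmanager
from typing import Any, Callable, Dict, Iterator, List, Optional

class QueryCachingConnection:
    """Hypothetical wrapper mimicking a per-action SQL query cache."""

    def __init__(self, run_sql: Callable[[str], List[Any]]) -> None:
        self.run_sql = run_sql
        self._query_cache: Optional[Dict[str, List[Any]]] = None

    @contextmanager
    def action(self) -> Iterator["QueryCachingConnection"]:
        """Enable the query cache for one action, then discard it."""
        self._query_cache = {}
        try:
            yield self
        finally:
            self._query_cache = None

    def select(self, sql: str) -> List[Any]:
        if self._query_cache is not None and sql in self._query_cache:
            return self._query_cache[sql]
        rows = self.run_sql(sql)
        if self._query_cache is not None:
            self._query_cache[sql] = rows
        return rows

    def write(self, sql: str) -> None:
        if self._query_cache is not None:
            self._query_cache.clear()  # any write invalidates cached SELECTs
        self.run_sql(sql)

calls: List[str] = []

def fake_db(sql: str) -> List[Any]:
    calls.append(sql)
    return [("Author 1",)]

conn = QueryCachingConnection(fake_db)
with conn.action():
    conn.select("SELECT * FROM authors")
    conn.select("SELECT * FROM authors")  # served from the query cache
with conn.action():
    conn.select("SELECT * FROM authors")  # new action: hits the database again
print(len(calls))  # 2
```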
Background to question: I'm looking to implement a caching system for my website. Currently we're exploring memcache as a means of doing this. However, I am looking to see if something similar exists for SQL Server. I understand that MySQL has query cache which although is not distributed works as a sort of 'stop gap' measure. Is MySQL query cache equivalent to the buffer cache in SQL Server?
So here are my questions:
Is there a way to know what is currently stored in the buffer cache?
Following up on this, is there a way to force certain tables or result sets into the cache?
How much control do I have over what goes on in the buffer and procedure cache? I understand there used to be a DBCC PINTABLE command, but that has since been discontinued.
Slightly off topic: should caching even exist at the database layer? Or is it more prudent to manage caches using Velocity/memcached? If so, why? It seems like cache invalidation is something of a pain when handling many objects with overlapping triggers.
Thanks!
SQL Server implements a buffer pool the same way every database product under the sun does (more or less) since System R showed the way. The gory details are explained in Transaction Processing: Concepts and Techniques. In addition, it has a caching framework used by the procedure cache, the permission token cache and many other caching classes. This framework is best described in Clock Hands - what are they for.
But this is not the kind of caching applications are usually interested in. The internal database cache is perfect for scale-up scenarios, where a more powerful back-end database can respond faster to more queries by using these caches. The modern application stack, however, tends to scale out the web servers, and the real problem is caching query results in a cache used by the web farm. Ideally, this cache should be shared and distributed. Memcached and Velocity are examples of such application caching infrastructure. Memcached has a long history by now: its uses and shortcomings are understood, and there is significant know-how around how to use it, deploy it, manage it and monitor it.
The biggest problem with caching in the application layer, and especially with distributed caching, is cache invalidation: how to detect the changes that occur in the back-end data and mark cached entries invalid so that new requests don't use stale data.
The simplest (for some definition of simple...) alternative is proactive invalidation from the application. The code knows when it changes an entity in the database, and after the change occurs it takes the extra step of marking the cached entries invalid. This has several shortcomings:
It is difficult to know exactly which cached entries are to be invalidated. Dependencies can be quite complex, and things are always more than just a simple table/entry: there are aggregate queries, joins, partitioned data, etc.
Code discipline is required to ensure all paths that modify data also invalidate the cache.
Changes to the data that occur outside the application scope are not detected. In practice, there are always changes that occur outside the application scope: other applications using the same data, import/export and ETL jobs, manual intervention, etc.
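Proactive invalidation and its blind spot can be shown in a few lines. TaggedCache is a hypothetical sketch. Each entry records the tables it depends on, and application code calls invalidate_table after every write. The last step simulates a change made outside the application (say, an ETL job) that nobody reports, so the cache keeps serving the old total.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Set

class TaggedCache:
    """Hypothetical cache with proactive, table-based invalidation."""

    def __init__(self) -> None:
        self._values: Dict[str, Any] = {}
        self._keys_by_table: Dict[str, Set[str]] = defaultdict(set)

    def get_or_load(self, key: str, depends_on: Iterable[str],
                    loader: Callable[[], Any]) -> Any:
        if key in self._values:
            return self._values[key]
        value = loader()
        self._values[key] = value
        for table in depends_on:
            self._keys_by_table[table].add(key)
        return value

    def invalidate_table(self, table: str) -> None:
        """Call after every write to `table`: drops all dependent entries."""
        for key in self._keys_by_table.pop(table, set()):
            self._values.pop(key, None)

db = {"orders": [10, 20]}
cache = TaggedCache()

def order_total() -> int:
    # An aggregate query: its result depends on the whole orders table.
    return cache.get_or_load("order_total", ["orders"], lambda: sum(db["orders"]))

before = order_total()             # 30
db["orders"].append(5)             # write made by the application...
cache.invalidate_table("orders")   # ...which remembers to invalidate
after_app_write = order_total()    # 35
db["orders"].append(100)           # write made outside the app (e.g. ETL)
after_etl_write = order_total()    # still 35: nobody invalidated the entry
```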
A more complicated alternative is a cache that is notified by the database itself when changes occur. Not many technologies support this, though, because it cannot work without active support from the database. SQL Server has Query Notifications for such scenarios; you can read more about it at The Mysterious Notification. Implementing QN-based caching in a standalone application is fairly complicated (and often done badly), but it works fine when implemented correctly. Doing so in a shared, scaled-out cache like Memcached is quite a feat of strength, but it is doable.
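Notification-based invalidation flips the responsibility: the database tells the cache what changed. The sketch below only simulates this with an in-memory stand-in. NotifyingDatabase and NotifiedCache are hypothetical and do not use SqlDependency or any real Query Notifications API. Subscriptions fire once and are re-registered when the entry is reloaded, mirroring the one-shot nature of SQL Server query notification subscriptions.

```python
from typing import Any, Callable, Dict, List

class NotifyingDatabase:
    """Hypothetical in-memory 'database' that pushes change notifications.

    Stands in for a server-side feature such as SQL Server Query
    Notifications; each subscription fires at most once.
    """

    def __init__(self) -> None:
        self.tables: Dict[str, List[Any]] = {}
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, table: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(table, []).append(callback)

    def insert(self, table: str, row: Any) -> None:
        self.tables.setdefault(table, []).append(row)
        # Fires regardless of who wrote: the app, an ETL job, a manual fix.
        for callback in self._subscribers.pop(table, []):
            callback(table)

class NotifiedCache:
    """Application cache whose entries are dropped when the database says so."""

    def __init__(self, db: NotifyingDatabase) -> None:
        self.db = db
        self._values: Dict[str, List[Any]] = {}

    def get_table(self, table: str) -> List[Any]:
        if table not in self._values:
            self._values[table] = list(self.db.tables.get(table, []))
            # Re-register on every reload, because notifications are one-shot.
            self.db.subscribe(table, self._on_change)
        return self._values[table]

    def _on_change(self, table: str) -> None:
        self._values.pop(table, None)

db = NotifyingDatabase()
db.insert("orders", 1)
cache = NotifiedCache(db)
first = cache.get_table("orders")   # [1], now cached and subscribed
db.insert("orders", 2)              # e.g. an ETL job writing directly
second = cache.get_table("orders")  # [1, 2]: the notification evicted the entry
```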
Nai,
Answers to your questions follow:
From Wikipedia (always correct...? :-)). For a more Microsoft-specific answer, here is their description of the buffer cache.
Buffer management
SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of memory available to SQL Server decides how many pages will be cached in memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory cache has not been referenced for some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O operation is done in a background thread so that other operations do not have to wait for the I/O operation to complete. Each page is written along with its checksum when it is written. When reading the page back, its checksum is computed again and matched with the stored version to ensure the page has not been damaged or tampered with in the meantime.
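To make the quoted description concrete, here is a toy buffer manager in Python. It is a simplification, not SQL Server's algorithm. Eviction is plain LRU (SQL Server's real policy differs; see the clock-hands discussion in the earlier answer). Write-back happens synchronously on eviction rather than asynchronously, and CRC-32 stands in for the engine's own checksum. Disk and BufferPool are hypothetical. The sketch shows reads served from memory, dirty pages reaching disk only when evicted, and checksums verified on every read from disk.

```python
import zlib
from collections import OrderedDict
from typing import Dict, Tuple

PAGE_SIZE = 8192  # 8 KB pages, as in the quoted description

class Disk:
    """Hypothetical disk that stores each page together with its checksum."""

    def __init__(self) -> None:
        self.pages: Dict[int, Tuple[bytes, int]] = {}
        self.reads = 0
        self.writes = 0

    def read(self, page_id: int) -> Tuple[bytes, int]:
        self.reads += 1
        empty = bytes(PAGE_SIZE)
        return self.pages.get(page_id, (empty, zlib.crc32(empty)))

    def write(self, page_id: int, data: bytes, checksum: int) -> None:
        self.writes += 1
        self.pages[page_id] = (data, checksum)

class BufferPool:
    """Simplified buffer manager: LRU eviction, lazy write-back, checksum checks."""

    def __init__(self, disk: Disk, capacity: int) -> None:
        self.disk = disk
        self.capacity = capacity
        # page_id -> (in-memory copy, dirty flag), least recently used first
        self.frames: "OrderedDict[int, Tuple[bytearray, bool]]" = OrderedDict()

    def _load(self, page_id: int) -> bytearray:
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # mark as most recently used
            return self.frames[page_id][0]
        data, stored_checksum = self.disk.read(page_id)
        if zlib.crc32(data) != stored_checksum:
            raise OSError(f"checksum mismatch on page {page_id}")
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[page_id] = (bytearray(data), False)
        return self.frames[page_id][0]

    def _evict(self) -> None:
        page_id, (data, dirty) = self.frames.popitem(last=False)  # least recently used
        if dirty:  # only modified pages are written back, with a fresh checksum
            self.disk.write(page_id, bytes(data), zlib.crc32(data))

    def read(self, page_id: int) -> bytes:
        return bytes(self._load(page_id))

    def write(self, page_id: int, offset: int, payload: bytes) -> None:
        frame = self._load(page_id)
        frame[offset:offset + len(payload)] = payload
        self.frames[page_id] = (frame, True)  # dirty until evicted

disk = Disk()
pool = BufferPool(disk, capacity=2)
pool.write(1, 0, b"hello")        # modifies the in-memory copy only
pool.read(2)
pool.read(3)                      # evicts page 1, writing it back to "disk"
print(disk.reads, disk.writes)    # 3 1
print(pool.read(1)[:5])           # b'hello': re-read from disk, checksum verified
```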
For this answer, please refer to the above answer:
Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version.
You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info dynamic management view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.
I feel like Microsoft has had time to figure out caching for their product and should be trusted.
I hope this information was helpful,
Thanks!
Caching can mean many different things for an ASP.NET application, spread from the browser all the way down to your hardware, with IIS, the application and the database in the middle.
The caching you are talking about is database-level caching, which is mostly transparent to your application. This level of caching includes buffer pools, statement caches, etc. Make sure your DB server has plenty of RAM; ideally, a DB server should be able to hold the entire database in memory. There is not much you can do at this level, except pre-fetch some anticipated data when you start the application to ensure that it is in the DB cache.
On the other hand, there are in-memory distributed caching systems. Apart from memcached and Velocity, you can look at commercial solutions like NCache or Oracle Coherence, though I have no experience with either to recommend them. This level of caching promises scalability at a lower cost; scaling the DB tier is expensive by comparison. You may have to consider aspects like network bandwidth, though. This type of caching, especially with invalidation and expiry, can be complicated.
You can cache at the web service tier using output caching at the IIS level (in IIS 7) and at the ASP.NET level.
At the application level you can use the ASP.NET cache. This is the level you can control the most, and it gives you good benefits.
Then there is caching going on at client web proxy tier that can be controlled by cache-control HTTP header.
Finally you have browser level caching, view state and cookies for small data.
And don't forget that hardware like SAN caches at physical disk access level too.
In summary, caching can occur at many levels, and it is up to you to analyse and implement the best solution for your scenario. You have to find out the stability and volatility of your data, the expected load, etc. I believe caching at the ASP.NET level (especially for objects) gives you the most flexibility and control.
Your specific technical questions about SQL Server's buffer cache are going down the wrong path when it comes to "implement a caching system for my website".
Sure, SQL Server is going to cache data so it can improve its performance (and it does so rather well), but the point of implementing a caching layer on your web front ends is to avoid having to talk to the database at all, because there is still overhead and resource contention even when your query is fulfilled entirely from SQL Server's cache.
What you want to be looking into is: memcached, Velocity, ASP.NET Cache, the P&P Caching Application Block, etc.