Is it possible to configure .Net Core to use the file system to cache responses? - asp.net-core

Specifically, is there any way in .Net Core (3.0 or earlier) to use the local file system as a Response Cache instead of just in-memory?
After a fair amount of researching, the closest thing seems to be the Response Caching middleware [1], but this does not:
allow pages to be cached indefinitely,
preserve caches between application and server restarts,
allow invalidating the cache on a per-page basis (e.g. blog entry updated),
allow invalidating the entire cache when global changes are made (e.g. theme update, menu changes, etc.).
I'm guessing these features will require custom implementation of ResponseCaching that hits the local file system, but I don't want to reinvent it if it already exists.
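To make the four requirements above concrete, here is a minimal, language-agnostic sketch written in Python. It is not an ASP.NET Core API and not the built-in middleware: the class name `FileResponseCache`, its methods, and the on-disk layout (a `generation` file plus one `gen-N` directory) are all assumptions made for illustration. Entries never expire, survive restarts because they live on disk, can be dropped per page, and can be dropped all at once by bumping the generation.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import Optional


class FileResponseCache:
    """Hypothetical disk-backed response cache (illustrative only)."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # The generation number is persisted so it survives restarts.
        self._gen_file = self.root / "generation"
        if not self._gen_file.exists():
            self._gen_file.write_text("0")

    def _generation_dir(self) -> Path:
        # All entries live under the directory of the current generation.
        gen_dir = self.root / ("gen-" + self._gen_file.read_text().strip())
        gen_dir.mkdir(exist_ok=True)
        return gen_dir

    def _entry_path(self, url_path: str) -> Path:
        # Hash the URL path so any path maps to a safe file name.
        digest = hashlib.sha256(url_path.encode("utf-8")).hexdigest()
        return self._generation_dir() / digest

    def get(self, url_path: str) -> Optional[bytes]:
        try:
            return self._entry_path(url_path).read_bytes()
        except FileNotFoundError:
            return None

    def set(self, url_path: str, body: bytes) -> None:
        entry = self._entry_path(url_path)
        # Write to a temp file first, then rename, so readers never see
        # a half-written response.
        fd, tmp_name = tempfile.mkstemp(dir=entry.parent)
        with os.fdopen(fd, "wb") as handle:
            handle.write(body)
        os.replace(tmp_name, entry)

    def invalidate(self, url_path: str) -> None:
        # Per-page invalidation (e.g. a blog entry was updated).
        try:
            self._entry_path(url_path).unlink()
        except FileNotFoundError:
            pass

    def invalidate_all(self) -> None:
        # Global invalidation (e.g. a theme change): switch to a new
        # generation, then delete the old directory.
        old_dir = self._generation_dir()
        new_generation = int(self._gen_file.read_text().strip()) + 1
        self._gen_file.write_text(str(new_generation))
        shutil.rmtree(old_dir, ignore_errors=True)
```

A middleware would call `get` before running the page and `set` after rendering it; the sketch deliberately ignores headers, `Vary` handling, and concurrent writers across processes.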
Some background:
This will replace our use of a static site-generator, which is problematic for site-wide changes because of the sheer quantity of data (nearly 24 hours to generate and copy to all of the servers).
The scenario is very similar to an encyclopedia or news site -- the vast majority of the content changes infrequently, a few things are added per day, and there is no user-specific content (and if or when there is, it would be dynamically loaded via JS/Ajax). Additionally, the page loads happen to be processor/memory/database intensive.
We will be using a reverse proxy like CloudFlare or AWS CloudFront, but AWS automatically expires their edge caches daily. Edge node cache misses are still frequent.
This is different than IDistributedCache [2] in that it should be response caching, not just caching data used by the MVC Model.
We will also use in-memory cache [3], but again, that solves a different caching scenario.
References
[1] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/middleware
[2] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/distributed
[3] https://learn.microsoft.com/en-us/aspnet/core/performance/caching/memory

I implemented this.
https://www.nuget.org/packages/AspNetCore.ResponseCaching.Extensions/
https://github.com/speige/AspNetCore.ResponseCaching.Extensions
Doesn't support .net core 3 yet though.

Currently (April 2019) the answer appears to be: no, there is nothing out-of-the-box for this.
There are three viable approaches to accomplish this using .Net Core:
Fork the built-in ResponseCaching middleware and create a flag for cache-to-disk:
https://github.com/aspnet/AspNetCore/tree/master/src/Middleware/ResponseCaching
This might be annoying to maintain because the namespaces and class names will collide with the core framework.
Implement this missing feature in EasyCaching, which apparently already has caching to disk on their radar:
https://github.com/dotnetcore/EasyCaching/blob/master/ToDoList.md
A pull request may be more likely accepted, since it's a planned feature.
There is apparently a port of Strathweb.CacheOutput to .Net Core, which would allow one to implement IApiOutputCache to save to disk:
https://github.com/Iamcerba/AspNetCore.CacheOutput#server-side-caching
Although this question is about caching within .Net Core using the local file system, this could also be accomplished using a local instance of Sqlite on each server node, and then configuring EasyCaching for response caching and to point it to the Sqlite instance on localhost.
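As a rough illustration of the SQLite idea, the sketch below stores rendered responses in a local SQLite file using Python's standard `sqlite3` module. It does not use EasyCaching or any .NET API; the class name `SqliteResponseCache`, the table name `responses`, and the column names are assumptions chosen for the example.

```python
import sqlite3
from typing import Optional


class SqliteResponseCache:
    """Hypothetical response cache backed by a local SQLite file."""

    def __init__(self, db_path: str) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "url_path TEXT PRIMARY KEY, body BLOB NOT NULL)"
        )
        self.conn.commit()

    def get(self, url_path: str) -> Optional[bytes]:
        row = self.conn.execute(
            "SELECT body FROM responses WHERE url_path = ?", (url_path,)
        ).fetchone()
        return None if row is None else bytes(row[0])

    def set(self, url_path: str, body: bytes) -> None:
        # "with self.conn" commits on success and rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO responses (url_path, body) VALUES (?, ?)",
                (url_path, body),
            )

    def invalidate(self, url_path: str) -> None:
        with self.conn:
            self.conn.execute(
                "DELETE FROM responses WHERE url_path = ?", (url_path,)
            )

    def invalidate_all(self) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM responses")
```

Because the data lives in a file, entries survive application restarts, and both per-page and site-wide invalidation reduce to a single `DELETE` statement.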
I hope this helps someone else who finds themselves in this scenario!

Related

Upcoming deprecation of legacy GCE and GKE metadata server endpoints on Legacy Boxes

I have two legacy servers in GCE, both of which have been flagged as using the deprecated metadata server endpoints. At the moment they hold hundreds of GB of MySQL and MongoDB data between them, and risking an upgrade on these boxes that has an adverse effect is not an option.
We are currently in the process of migrating away from the data stored here, but for now, we need to keep them running.
Is anyone aware of any implications to either
a) doing nothing or
b) Just setting the disable-legacy-endpoints metadata flag to true ?
i.e. Will these instances stop working altogether if we leave them as they currently are?
After some more digging into what was actually using the Metadata API to start with, we found that the calls were being made by stackdriver_agent, which was installed an extremely long time ago while it was free and was just never removed.
Stopping this agent will remove all calls that we make with these two legacy servers.
If you are considering disabling the legacy endpoints with the disable-legacy-endpoints metadata flag, make sure to test it in a contained environment first, i.e. a new VM created from a snapshot of the affected instance, before applying it to production services.
For help identifying the instances making the calls, refer to this article
For help identifying the processes within the instances, refer to this article

Google App Engine automatically updating memcache

So here's the problem: I've created a database model. When I create the model, a = Model(args), and then perform a.put(), GAE seems to automatically update the memcache, because all the data seems up-to-date even without me hitting the database. Logging the number of elements in the cache also shows the correct number of elements. But I'm not manually updating the cache. How do I prevent this? Cheers.
You can set policy functions:
Automatic caching is convenient for most applications but maybe your application is unusual and you want to turn off automatic caching for some or all entities. You can control the behavior of the caches by setting policy functions.
Memcache Policy
That's for NDB. You don't say what language/DB you are using but I'm sure it's all similar.

OS and/or IIS Caching

Is there a way to force caching of files at the OS level and/or the web server level (IIS)?
The problem I am facing is that there are many static files (XSLTs, for example) that need to be loaded again and again, and I want to load all these files into memory so that no time is wasted on hard disk I/O.
(1) I want to cache it at the OS level so that every program that runs on my OS and which tries to read a file must read it from memory. I want no changing in program source code - it must happen transparently. For example, read("c:\abc.txt") must not cause a disk I/O, it must read it from the memory.
(2) Achieving a similar thing in IIS. I've read a few things about output caching for database queries, but how can it be achieved for files?
All suggestions are welcome!
Thanks
You should look into some tricks used by SO itself. One was that they moved all their static content off to another domain for efficiency.
The problem with default set ups for Apache (at a minimum) is that the web server will pass all requests through to an app server to see if the content is meant to be dynamic. That's a huge waste for content that you know to be static.
Far better to set up a separate domain for static content without an app server. That way, the static requests are not sent unnecessarily to another layer and the web server can run much faster.
Even in a setup where there's not another layer invoked every time, there are other reasons for a separate domain, as you'll see from that link (specifically removing cookies which both reduces traffic and improves the chances of the Internet caching your data).

SQL Server 2005, Caches and all that jazz

Background to question: I'm looking to implement a caching system for my website. Currently we're exploring memcache as a means of doing this. However, I am looking to see if something similar exists for SQL Server. I understand that MySQL has a query cache which, although not distributed, works as a sort of 'stop gap' measure. Is the MySQL query cache equivalent to the buffer cache in SQL Server?
So here are my questions:
Is there a way to know what is currently stored in the buffer cache?
As a follow-up to this, is there a way to force certain tables or result sets into the cache?
How much control do I have over what goes on in the buffer and procedure cache? I understand there used to be a DBCC PINTABLE command but that has since been discontinued.
Slightly off topic: Should the caching even exist on the database layer? Or is it more prudent to manage caches using Velocity/Memcache? If so, why? It seems like cache invalidation is something of a pain when handling many objects with overlapping triggers.
Thanks!
SQL Server implements a buffer pool the same way every database product under the sun does (more or less) since System R showed the way. The gory details are explained in Transaction Processing: Concepts and Techniques. In addition, it has a caching framework used by the procedure cache, the permission token cache and many, many other caching classes. This framework is best described in Clock Hands - what are they for.
But this is not the kind of caching applications are usually interested in. The internal database cache is perfect for scale-up scenarios, where a more powerful back-end database is able to respond faster to more queries by using these caches, but the modern application stack tends to scale out the web servers, and the real problem is caching the results of queries in a cache used by the web farm. Ideally, this cache should be shared and distributed. Memcached and Velocity are examples of such application caching infrastructure. Memcached has a long history by now: its uses and shortcomings are understood, and there is significant know-how around how to use it, deploy it, manage it and monitor it.
The biggest problem with caching in the application layer, and specially with distributed caching, is cache invalidation. How to detect the changes that occur in the back end data and mark cached entries invalid so that new requests don't use stale data.
The simplest (for some definition of simple...) alternative is proactive invalidation from the application. The code knows when it changes an entity in the database, and after the change occurs it takes the extra step of marking the cached entries invalid. This has several shortcomings:
It is difficult to know exactly which cached entries are to be invalidated. Dependencies can be quite complex: things are always more than just a simple table/entry, and there are aggregate queries, joins, partitioned data, etc.
Code discipline is required to ensure all paths that modify data also invalidate the cache.
Changes to the data that occur outside the application scope are not detected. In practice, there are always changes that occur outside the application scope: other applications using the same data, import/export and ETL jobs, manual intervention etc etc.
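A minimal sketch of proactive invalidation, written in Python with an in-process dictionary standing in for Memcached or Velocity. The class `InvalidatingCache` and its method names are hypothetical. Each cached entry declares which entities it was built from, and the write path must remember to call `entity_changed`, which is exactly the code discipline described above; changes made outside the application never reach this call, so the third shortcoming still applies.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


class InvalidatingCache:
    """Hypothetical cache that tracks which entities each entry depends on."""

    def __init__(self) -> None:
        self._values: Dict[str, object] = {}
        self._keys_by_entity: Dict[str, Set[str]] = defaultdict(set)

    def get_or_load(
        self,
        key: str,
        depends_on: Iterable[str],
        loader: Callable[[], object],
    ) -> object:
        if key not in self._values:
            # Cache miss: run the expensive query and record dependencies.
            self._values[key] = loader()
            for entity in depends_on:
                self._keys_by_entity[entity].add(key)
        return self._values[key]

    def entity_changed(self, entity: str) -> None:
        # Called by application code right after it writes to the database.
        for key in self._keys_by_entity.pop(entity, set()):
            self._values.pop(key, None)
```

Typical usage is `cache.get_or_load("page:/posts/1", ["post:1"], render)` on reads and `cache.entity_changed("post:1")` after every update of that post.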
A more complicated alternative is a cache that is notified by the database itself when changes occur. Not many technologies support this, though, because it cannot work without active support from the database. SQL Server has Query Notifications for such scenarios; you can read more about it at The Mysterious Notification. Implementing QN-based caching in a standalone application is fairly complicated (and often done badly), but it works fine when implemented correctly. Doing so in a shared, scaled-out cache like Memcached is quite a feat of strength, but is doable.
Nai,
Answers to your questions follow:
From Wiki - Always correct... ? :-). For a more Microsoft answer, here is their description on Buffer Cache.
Buffer management
SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-memory, and the set of all pages currently buffered is called the buffer cache. The amount of memory available to SQL Server decides how many pages will be cached in memory. The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if the in-memory cache has not been referenced for some time. While writing pages back to disc, asynchronous I/O is used whereby the I/O operation is done in a background thread so that other operations do not have to wait for the I/O operation to complete. Each page is written along with its checksum when it is written. When reading the page back, its checksum is computed again and matched with the stored version to ensure the page has not been damaged or tampered with in the meantime.
For this answer, please refer to the above answer:
Either reading from or writing to any page copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy, rather than the on-disc version.
You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info catalog view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.
I feel like Microsoft has had time to figure out caching for their product and should be trusted.
I hope this information was helpful,
Thanks!
Caching can take many different meanings for an ASP.NET application, spread from the browser all the way to your hardware, with IIS, the application and the database thrown in the middle.
The caching you are talking about is database-level caching, which is mostly transparent to your application. This level of caching includes buffer pools, statement caches, etc. Make sure your DB server has plenty of RAM. In theory a DB server should be able to load the entire DB store in memory. There is not much you can do at this level unless you pre-fetch some anticipated data when you start the application and ensure that it is in the DB cache.
At the other end is an in-memory distributed caching system. Apart from memcache and Velocity, you can look at some commercial solutions like NCache or Oracle Coherence. I have no experience with either of them, so I cannot recommend one. This level of caching promises scalability at a lower cost; scaling the DB tier is expensive by comparison. You may have to consider aspects like network bandwidth, though. This type of caching, especially with invalidation and expiry, can be complicated.
You can cache at Web Service tier using output caching at IIS level (in IIS 7) and ASP.Net level.
At the application level you can use ASP.Net cache. This is the one that you can control most and gives you good benefits.
Then there is caching going on at client web proxy tier that can be controlled by cache-control HTTP header.
Finally you have browser level caching, view state and cookies for small data.
And don't forget that hardware like SAN caches at physical disk access level too.
In summary, caching can occur at many levels, and it is for you to analyse and implement the best solution for your scenario. You have to find out the stability and volatility of your data, the expected load, etc. I believe caching at the ASP.NET level (especially for objects) gives you the most flexibility and control.
Your specific technical questions about SQL Server's buffer cache are going down the wrong path when it comes to "implement a caching system for my website".
Sure, SQL Server is going to cache data so it can improve its performance (and it does so rather well), but the point of implementing a caching layer on your web front-ends is to avoid having to talk to the database at all - because there is still overhead and resource contention even when your query is fulfilled entirely from SQL Server's cache.
What you want to be looking into is: memcached, Velocity, ASP.NET Cache, P&P Caching Application Block, etc.

Index replication and Load balancing

I am using the Lucene API in my web portal, which is going to have thousands of concurrent users.
Our web server will call the Lucene API, which will be sitting on an app server. We plan to use 2 app servers for load balancing.
Given this, what should be our strategy for replicating the Lucene indexes on the 2nd app server? Any tips, please?
You could use solr, which contains built in replication. This is possibly the best and easiest solution, since it probably would take quite a lot of work to implement your own replication scheme.
That said, I'm about to do exactly that myself, for a project I'm working on. The difference is that since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of db primary keys. My plan is to push changes to the server and store them in a queue, where I'll first store them in the memory index, and then flush the memory index to disk when the load is low enough.
Still, it's a complex thing to do and I'm set on doing quite a lot of work before we have a stable final solution that's reliable enough.
From experience, Lucene should have no problem scaling to thousands of users. That said, if you're only using your second app server for load balancing and not for failover situations, you should be fine hosting Lucene on only one of those servers and accessing it via NFS (if you have a Unix environment) or a shared directory (in a Windows environment) from the second server.
Again, this is dependent on your specific situation. If you're talking about having millions (5 or more) of documents in your index and needing your lucene index to be failoverable, you may want to look into Solr or Katta.
We are working on a similar implementation to what you are describing as a proof of concept. What we see as an end-product for us consists of three separate servers to accomplish this.
There is a "publication" server, that is responsible for generating the indices that will be used. There is a service implementation that handles the workflows used to build these indices, as well as being able to signal completion (a custom management API exposed via WCF web services).
There are two "site-facing" Lucene.NET servers. Access to the API is provided via WCF Services to the site. They sit behind a physical load balancer and will periodically "ping" the publication server to see if there is a more current set of indices than what is currently running. If there is, the server requests a lock from the publication server and updates its local indices by initiating a transfer to a local "incoming" folder. Once the files are there, it is just a matter of suspending the searcher while the index is attached. The server then releases its lock and the other server is free to do the same.
Like I said, we are only approaching the proof of concept stage with this, as a replacement for our current solution, which is a load balanced Endeca cluster. The size of the indices and the amount of time it will take to actually complete the tasks required are the larger questions that have yet to be proved out.
Just some random things that we are considering:
The downtime of a given server could be reduced if two local folders are used on each machine receiving data to achieve a "round-robin" approach.
We are looking to see if the load balancer allows programmatic access to have a node remove and add itself from the cluster. This would lessen the chance that a user experiences a hang if he/she accesses during an update.
We are looking at "request forwarding" in the event that cluster manipulation is not possible.
We looked at solr, too. While a lot of it just works out of the box, we have some bench time to explore this path as a learning exercise - learning things like Lucene.NET, improving our WF and WCF skills, and implementing ASP.NET MVC for a management front-end. Worst case scenario, we go with something like solr, but have gained experience in some skills we are looking to improve on.
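The two-folder "round-robin" idea from the list above could look roughly like the following Python sketch. It is not Lucene.NET code and says nothing about how a searcher is reopened; the class name `TwoSlotIndexStore`, the slot names `a` and `b`, and the `active` pointer file are all assumptions for illustration. A new index is copied into whichever slot is idle, and only then is the pointer switched, so the searcher keeps serving from the old folder for the whole transfer.

```python
import os
import shutil
from pathlib import Path


class TwoSlotIndexStore:
    """Hypothetical two-folder layout for swapping in a new index copy."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        for slot in ("a", "b"):
            (self.root / slot).mkdir(parents=True, exist_ok=True)
        # A small pointer file records which slot is currently served.
        self._pointer = self.root / "active"
        if not self._pointer.exists():
            self._pointer.write_text("a")

    def active_dir(self) -> Path:
        return self.root / self._pointer.read_text().strip()

    def inactive_dir(self) -> Path:
        current = self._pointer.read_text().strip()
        return self.root / ("b" if current == "a" else "a")

    def install(self, incoming_dir: str) -> Path:
        # Copy the new index into the idle slot while the active slot
        # keeps serving queries.
        target = self.inactive_dir()
        shutil.rmtree(target)
        shutil.copytree(incoming_dir, target)
        # Switch the pointer by renaming a temp file over it, so readers
        # see either the old slot name or the new one, never a partial write.
        tmp = self.root / "active.tmp"
        tmp.write_text(target.name)
        os.replace(tmp, self._pointer)
        return target
```

After `install` returns, the search service would open a new searcher on `active_dir()` and close the old one; the previous index stays on disk in the other slot until the next update overwrites it.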
I'm creating the Indices on the publishing Backend machines into the filesystem and replicate those over to the marketing.
That way every single load-balanced, failover-capable node has its own index without network latency.
The only drawback is that you shouldn't try to recreate the index within the replicated folder, as you'll have the lock file lying around on every node, blocking the index reader until your reindex has finished.