Hippo CMS cache warmer

Publication changes in the CMS invalidate the whole cache in Hippo CMS. Is there any best practice for warming the cache in Hippo CMS? A scheduler-based cache warmer is not an option. Is it possible to warm the cache via a trigger on cache invalidation?

Currently, the page cache's primary goal is to serve as a hotspot page cache (for example, a homepage that is requested 1000 times per second), and it provides stampeding-herd protection (if 100 requests arrive for the same page at the same time, one request is executed and all the others are served the same response).
Because of cluster requirements, and since we do not know which content is used on which page, the page cache is currently very volatile. It would be easy to cache a page for, say, 5 minutes regardless of content changes; however, in clustered setups this can result, for some short period, in alternating pages depending on which cluster node you hit. That is not acceptable.
That is how it works now. For 11.1 I already have a much-improved page cache solution which still uses the current one but adds a second-level page cache (which in general we cluster via Redis and on which we also set a TTL per page). Next to the second-level cache we provide a stale page cache: if 100 requests arrive for the same page and there is no valid page in the primary or second-level cache, one request passes through and builds the new page, while the other 99 get the stale response. After the request that passes through gets its response back, the stale page cache entry is purged and everybody receives the non-stale page. Note that stale page cache support is possible without a second-level cache (and a second-level cache without stale page support is also possible).
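What is described above is essentially single-flight page building combined with serving stale content. A minimal sketch of the idea in TypeScript (this is not Hippo's implementation; the names and data structures are purely illustrative):

// Illustrative single-flight + stale-cache sketch (not Hippo's actual code).
const fresh = new Map<string, string>();              // primary/second-level cache
const stale = new Map<string, string>();              // stale page cache
const inFlight = new Map<string, Promise<string>>();  // pages currently being rebuilt

async function getPage(url: string, render: () => Promise<string>): Promise<string> {
  const hit = fresh.get(url);
  if (hit !== undefined) return hit;

  let building = inFlight.get(url);
  if (!building) {
    // First request for this page: rebuild it ("single flight").
    building = render().then(page => {
      fresh.set(url, page);
      stale.delete(url);     // purge the stale entry once the new page is ready
      inFlight.delete(url);
      return page;
    });
    inFlight.set(url, building);
    return building;
  }

  // The other concurrent requests get the stale copy if one exists,
  // otherwise they wait for the rebuild to finish.
  return stale.get(url) ?? building;
}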
Another advantage of the second-level page cache is that if you make it clustered (which you should; it is an easy configuration option), only one cluster node needs to create a page response.
Version 11.1 will ship with these improved page cache options. Note that the page cache, both as it is now and in the future, works seamlessly with Relevance (personalization of pages).
Hope this helps,
Regards Ard

Related

Purge cache in Cloudflare using API

What would be the best practice to refresh content that is already cached by CF?
We have a few APIs that generate JSON, and we cache the responses.
Once in a while the JSON needs to be updated, and what we do right now is purge it via the API:
https://api.cloudflare.com/client/v4/zones/dcbcd3e49376566e2a194827c689802d/purge_cache
Later on, when a user hits the page with the required JSON, it will be cached again.
But in our case we have 100+ JSON files that we purge at once, and we want to push the new content to CF instead of waiting for users (to avoid a bad experience for them).
Right now I am considering pinging (via HTTP request) the needed JSON endpoints just after we have purged the cache.
My question is whether that is the right way, and whether CF already has an API to do what we need.
Thanks.
Currently, the purge API is the recommended way to invalidate cached content on-demand.
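For reference, a purge of specific files looks roughly like this (a TypeScript sketch; the zone ID is the one from the question, while the token and URLs are placeholders):

// Sketch: purge specific URLs from the Cloudflare cache via the API.
const zoneId = "dcbcd3e49376566e2a194827c689802d"; // zone ID from the question
const resp = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${zoneId}/purge_cache`,
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer <API_TOKEN>", // placeholder credential
      "Content-Type": "application/json",
    },
    // List the exact URLs to purge, e.g. your 100+ JSON endpoints.
    body: JSON.stringify({ files: ["https://example.com/data/one.json"] }),
  },
);
console.log(await resp.json()); // { success: true, ... } on a successful purge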
Another approach for your scenario could be to look at Workers and Workers KV and combine them with the Cloudflare API. You could have:
1. A Worker reading the JSON from KV and returning it to the user.
2. When you have a new version of the JSON, use the API to create/update the JSON stored in KV.
This setup could perform very well, since the Worker code in (1) runs in every Cloudflare datacenter and returns quickly to users. It is also important to note that KV is an "eventually consistent" store, so feasibility depends on your specific application.
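A minimal sketch of such a Worker (the CONFIG_KV binding name and the key scheme are assumptions; the KVNamespace type comes from @cloudflare/workers-types):

// Cloudflare Worker sketch: serve JSON straight from Workers KV.
export interface Env {
  CONFIG_KV: KVNamespace; // hypothetical binding configured in wrangler.toml
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1); // e.g. "prices.json"
    const json = await env.CONFIG_KV.get(key);
    if (json === null) return new Response("not found", { status: 404 });
    return new Response(json, { headers: { "Content-Type": "application/json" } });
  },
};

Publishing a new version is then a write to KV (via the API or Wrangler) instead of a purge, so users never hit a cold cache.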

InstantSearch caching strategy

I'd like to implement a fast, smooth search. There are not that many searchable items: ~100 max. Each item holds about the amount of data a Facebook event would hold. They will all show up on initial load (maybe with infinite scroll). The data won't change frequently. No more than 100 concurrent users.
What's the best caching strategy for search results, given above conditions?
What's the most scalable strategy?
Stack
Frontend: Nuxt (VueJS) + InstantSearch (no Algolia!)
Backend: Spring boot
Dockerized
Possible solutions
An extra caching service on the backend (e.g. Redis, Memcached), with the UI going to the server on each search operation. This would basically spam the backend on every keystroke.
Load all items into local storage (e.g. Vuex) and search there directly. This will increase the app's memory footprint and may turn out messy over time.
A combination of the two?
A cache layer definitely doesn't hurt. The number of users shouldn't be an issue; even the smallest EC2 instance on AWS could handle that easily.
You could try adding a tiny bit of delay in the textbox so that not every keystroke fires a search, maybe a leeway of ~50ms? You have to try it and see how it feels when typing in the search bar.
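For example, a small debounce helper along these lines (a sketch; runSearch and the 50ms value are placeholders to tune):

// Delay the search until the user has stopped typing for `waitMs` milliseconds.
function debounce<T extends unknown[]>(fn: (...args: T) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Hypothetical search trigger; wire this to the search box's input event.
const runSearch = (query: string) => console.log("searching:", query);
const debouncedSearch = debounce(runSearch, 50);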
For 100 items Vuex can be quite fast too, as long as you don't load static assets like images directly into Vuex. ~100 items of JSON data isn't a lot, but it doesn't scale as well if your app suddenly has 10,000 items.
Best scenario in my opinion:
A Redis cache, because a lot of the requests will be very similar or even identical. You'd just need to find a sweet spot for how long the cache stays valid (see the sketch after this list).
Load-balance your backend and frontend, i.e. create more instances of your Docker image on demand to handle potential spikes in requests when CPU usage goes above a certain threshold.
If your backend does more than just search, move the search to a dedicated instance so it doesn't interfere with the "regular" requests.
Very important: create indices in your database for faster search results, and avoid full scans wherever you can!
Maybe think about going serverless if your app doesn't have traffic 24/7
Edit: have the API, cache, and database close to each other so communication between the instances doesn't have to travel far.
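For the Redis point above, a minimal cache-aside sketch (TypeScript with the ioredis client for brevity, although the question's backend is Spring Boot; searchBackend, the key scheme, and the 60-second TTL are placeholders):

import Redis from "ioredis";

const redis = new Redis(); // assumes a local Redis instance

// Hypothetical stand-in for the real search endpoint / database query.
async function searchBackend(query: string): Promise<string> {
  return JSON.stringify([{ id: 1, match: query }]);
}

// Cache-aside: identical queries within the TTL are served from Redis
// instead of hitting the database again.
async function cachedSearch(query: string, ttlSeconds = 60): Promise<string> {
  const key = `search:${query.trim().toLowerCase()}`;
  const hit = await redis.get(key);
  if (hit !== null) return hit;

  const result = await searchBackend(query);
  await redis.set(key, result, "EX", ttlSeconds);
  return result;
}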

Why do I have to test my site twice on GTmetrix to see the speed benefits of cloudflare?

The first test is always slow. The second test shows the speed benefits of Cloudflare. Why is that and does this mean users will have to load the website twice?
"speed benefits of Cloudflare" could be referring to a variety of unique features that Cloudflare offers (such as image compression, lazy loading javascript, etc.). For this answer, I am assuming that you are referring to its CDN/caching capabilities.
Essentially, being a CDN means that some client has to request each of your site's resources from a given edge node before the cache at that node is primed from the origin server.
GTmetrix is similar to a human website visitor in the sense that, if it is the first to request a resource from a CDN edge node within that node's cache timeout, the request has to go all the way back to the origin server rather than being answered from the closer edge node. The second time that resource is requested from the edge node, however, it will be cached and will be served much more quickly thanks to the reduced network latency.
I'd recommend reading up a bit more on how CDNs work if you are not already familiar with them. You will probably want to tweak your caching headers so that resources that are relatively static are rarely purged from the edge nodes, which will reduce the number of requests that pay this "first-timer penalty".
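As a rough illustration of such headers (a TypeScript/Node sketch; the paths and max-age values are arbitrary examples):

import http from "node:http";

// Long-lived caching for static assets so edge nodes rarely re-fetch them.
// max-age applies to browsers; s-maxage is honoured by CDNs such as Cloudflare.
http.createServer((req, res) => {
  if (req.url?.startsWith("/static/")) {
    res.setHeader("Cache-Control", "public, max-age=86400, s-maxage=2592000");
  } else {
    res.setHeader("Cache-Control", "no-cache"); // dynamic pages revalidate
  }
  res.end("ok");
}).listen(8080);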

Redis - Reload content before expires

In Akamai we can order a reload of content from the origin when 90% of the expiration time has been consumed. In this case, Akamai keeps serving the cached content but goes to the origin to reload the new content.
Is there a similar feature in Redis?
For example, I put content in the cache for 5 hours, but I want to reload it if someone accesses it when only 30 minutes or less are left. If a user accesses it in that period, I will serve the cached content, but in the background we will be reloading the new content.
Is it possible?
Thanks.
Redis is not an active component that fetches data itself; it is a data store. It keeps your data and expires/evicts keys based on their TTL.
You/your application are in charge of populating Redis with the data you want to keep stored.
However, you can use Redis primitives to achieve a part of what would be needed to serve your requirement:
Redis TTL/Expiry
Keyspace notifications
Pub/Sub
Keyspace notifications publish a notification on certain events, such as key creation or expiry. You could store two keys in Redis: a key holding your payload with the appropriate TTL, and a phantom key, which is just a marker with a slightly shorter TTL (say 90% of the original TTL).
As soon as the phantom key expires, you capture that notification. You can then fetch the new content for the cache entry you want to update, update the cache key, and write a phantom key again for the next cache-update iteration.
The steps above are strongly abbreviated, but they should guide you towards a feasible approach.
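A minimal sketch of that approach (TypeScript with the ioredis client; fetchFromOrigin, the key scheme, and the TTLs are placeholders):

import Redis from "ioredis";

// Two connections: one for commands, one dedicated to subscriptions.
const client = new Redis();
const subscriber = new Redis();

// Hypothetical loader for the real content (e.g. an HTTP call to the origin).
async function fetchFromOrigin(key: string): Promise<string> {
  return `fresh content for ${key}`;
}

async function writeCache(key: string, value: string, ttlSeconds: number) {
  await client.set(key, value, "EX", ttlSeconds);
  // Phantom marker: same key with a prefix, expiring at ~90% of the real TTL.
  await client.set(`phantom:${key}`, "1", "EX", Math.floor(ttlSeconds * 0.9));
}

async function main() {
  // Enable expired-key events (can also be set permanently in redis.conf).
  await client.config("SET", "notify-keyspace-events", "Ex");
  await subscriber.psubscribe("__keyevent@0__:expired");

  subscriber.on("pmessage", async (_pattern, _channel, expiredKey) => {
    if (!expiredKey.startsWith("phantom:")) return;
    const key = expiredKey.slice("phantom:".length);
    // Refresh the still-live cache entry in the background.
    await writeCache(key, await fetchFromOrigin(key), 5 * 60 * 60); // 5h, as in the question
  });

  await writeCache("content:article-42", await fetchFromOrigin("content:article-42"), 5 * 60 * 60);
}

main();

One caveat: keyspace notifications are fire-and-forget pub/sub messages, so if no subscriber is listening when the phantom key expires, that refresh is simply missed.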

Blocking duplicate http request using mod security

I am using ModSecurity to look for specific values in POST parameters and to block the request if a duplicate comes in. I am using a ModSecurity user collection to do just that. The problem is that my requests are long-running, so a single request can take more than 5 minutes. The user collection, I assume, does not get written to disk until the first request has been processed. If, during the execution of the first request, another request comes in using a duplicate value for the POST parameter, the second request does not get blocked, since the collection is not available yet. I need to avoid this situation. Can I use memory-based shared collections across requests in ModSecurity? Any other way? Snippet below:
SecRule ARGS_NAMES "uploadfilename" "id:400000,phase:2,nolog,setuid:%{ARGS.uploadfilename},initcol:USER=%{ARGS.uploadfilename},setvar:USER.duplicaterequests=+1,expirevar:USER.duplicaterequests=3600"
SecRule USER:duplicaterequests "@gt 1" "id:400001,phase:2,deny,status:409,msg:'Duplicate Request!'"
ErrorDocument 409 "<h1>Duplicate request!</h1><p>Looks like this is a duplicate request, if this is not on purpose, your original request is most likely still being processed. If this is on purpose, you'll need to go back, refresh the page, and re-submit the data."
ModSecurity is really not a good place to put this logic.
As you rightly state, there is no guarantee when a collection is written, so even if collections were otherwise reliable (which they are not; see below), you shouldn't use them for absolute checks like duplicate detection. They are OK for things like brute-force or DoS checks where, for example, stopping after 11 or 12 attempts rather than 10 isn't a big deal. For an absolute check like stopping duplicates, however, the lack of certainty makes ModSecurity a bad place to do it. To me, a WAF should be an extra layer of defence rather than something you depend on to make your application work (or at least stop it breaking). If a duplicate request causes a real problem for the transactional integrity of the application, then those checks belong in the application rather than in the WAF.
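For illustration, an application-level guard can claim a request atomically before the long-running work starts, which closes exactly the race described in the question. A minimal sketch using Redis SET with NX (a generic pattern, not something ModSecurity provides; the key scheme and TTL are placeholders):

import Redis from "ioredis";

const redis = new Redis();

// Atomically claim the upload name. SET ... NX succeeds for exactly one
// caller, so a concurrent duplicate is rejected even while the first
// request is still being processed.
async function claimRequest(uploadFilename: string): Promise<boolean> {
  const result = await redis.set(`upload:${uploadFilename}`, "1", "EX", 3600, "NX");
  return result === "OK"; // null means the key already existed: a duplicate
}

The 3600-second expiry mirrors the expirevar in the original rule.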
In addition to this, the disk-based way collections work in ModSecurity causes lots of problems, especially when multiple processes/threads try to access them at once, which makes them unreliable both for persisting data and for removing persisted data. Many folks on the ModSecurity and OWASP ModSecurity CRS mailing lists have seen errors in the log file when ModSecurity tried to clean up collections automatically, and so have seen collection files grow and grow until they start to have a detrimental effect on Apache. In general I don't recommend user collections for production use, especially for web servers with any volume.
There was a memcache version of ModSecurity created which stopped using the disk-based SDBM format and may have addressed many of the above issues; however, it was never completed, though it may become part of ModSecurity v3. I still disagree, however, that a WAF is the place for this check.