What is the best way of handling AEM caching in the dispatcher? - apache

I recently came across a situation where I needed to clear my dispatcher cache manually. For instance, if I modify any JS/CSS files, I have to clear the dispatcher cache manually in order to get the modified JS/CSS; otherwise the old version of the code keeps being served. I just heard that ACS developed versioned clientlibs, which will help us do versioning. I have several questions on this.
Before versioned clientlibs, how did AEM manage this?
Isn't AEM intelligent enough to manage versioned clientlibs itself?
Is this the correct way of handling it?
Can we create a script which will take a backup of the existing JS/CSS files before clearing them?
What other options do we have?

Versioned clientlibs is the correct solution, but you'll need a bit more:
Versioned Clientlibs is a client-side cache-busting technique, used to bust the browser cache.
It will NOT bust the dispatcher cache. Pages cached at the dispatcher continue to be served until the cache is manually cleared.
Refer here for a similar question.
To answer your queries:
Before versioned clientlibs, how did AEM manage? As #Subhash points out, clearing the dispatcher cache when clientlibs change is part of the production deployment scripts (Bamboo or Jenkins).
Isn't AEM intelligent enough to manage versioned clientlibs? This is the same as with any CMS tool: the caching strategy has to be the responsibility of the HTTP servers, NOT AEM. Moreover, when you deploy JS code changes, you need to clear the dispatcher cache for pages to pick up the new JS.
Is this the correct way of handling it? For client-side busting, versioned clientlibs is a 100% foolproof technique. For dispatcher cache busting, you'll need a different method.
Can we create a script which will take a backup of the existing JS/CSS files before clearing them? That should be part of your CI process, defined in Jenkins/Bamboo jobs. It is not a responsibility of AEM.
What other options do we have? For dispatcher cache clearance, try dispatcher flush rules. You can configure that when /etc design paths are published, the entire tree is automatically cleared so that subsequent requests hit the publisher and get the updated clientlibs.
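For illustration, a minimal sketch of the /invalidate section in dispatcher.any (the globs are examples and need adapting to your site); with rules like these, an activation also invalidates the cached clientlib output under /etc/designs, so the next request is fetched fresh from the publisher:
  /invalidate
    {
    /0000 { /glob "*" /type "deny" }
    # invalidate cached pages when content is activated
    /0001 { /glob "*.html" /type "allow" }
    # also invalidate cached clientlib output under /etc/designs
    /0002 { /glob "/etc/designs/*" /type "allow" }
    }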
Recommended:
Use versioned clientlibs + CI-driven dispatcher cache clearance.
Since clientlibs are modified by the IT team and require a deployment, make clearing the dispatcher cache part of the CI process. Dispatcher flush rules might help, but someone modifying JS/CSS and hitting the publish button in production is not a real use case; the production deployment cycle should perform this task. Reference links for dispatcher cache clear scripts: 1. Adobe documentation, 2. Jenkins way, 3. Bamboo way
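As a rough example of what such a CI step might run (host, port and handle path are placeholders), the manual invalidation request described in the Adobe documentation looks like this:
  curl -s -X POST \
       -H "CQ-Action: Activate" \
       -H "CQ-Handle: /etc/designs/myapp" \
       -H "Content-Length: 0" \
       http://dispatcher-host:80/dispatcher/invalidate.cache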

Before versioned clientlibs -
You usually wire dispatcher invalidation into the build and deployment pipeline: in the last phase, after the packages have been deployed to the author and publish instances, the dispatcher is invalidated. This still leaves the problem of the browser cache not being cleared (in cases where the clientlib name has not changed).
To overcome this, you can build custom cache-busting techniques where you maintain a naming scheme for clientlibs per release, e.g. /etc/designs/homepageclientlib.v1.js or /etc/designs/homepageclientlib.<<timestamp>>.js. This is just to make the browser trigger a fresh download of the file from the server. But with CI/CD and frequent releases these days, it is just overhead.
There are also inelegant ways of forcing a bypass of the dispatcher using query params. In fact, even now if you open any AEM page you might notice the cq_ck query param, which is there to disable caching.
Versioned clientlibs from ACS Commons is now the way to go. It is hassle-free: the config generates a unique MD5 hash for each clientlib, thereby busting not just the dispatcher cache but also the browser-level cache.
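To illustrate (the exact hash placement and format vary between ACS Commons versions, so treat this as a sketch), the versioned-clientlibs rewriter turns a normal clientlib include into a fingerprinted one, so every change produces a new URL that neither the dispatcher nor the browser has cached:
  <!-- before -->
  <script src="/etc/designs/myapp/clientlibs/main.min.js"></script>
  <!-- after the versioned-clientlibs transformer runs (hash is illustrative) -->
  <script src="/etc/designs/myapp/clientlibs/main.min.41feb47174c3ad33fba83533a2d690e1.js"></script>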

There is an add-on for Adobe AEM that does resource fingerprinting (not limited to clientlibs; basically for all static website content), Cache-Control header management, and true resource-only flushing of the AEM dispatcher cache. It also deletes updated resources from the dispatcher cache that are not covered by AEM's authoring process (e.g. when you deploy your latest code). A free trial version is available from https://www.browsercachebooster.com/

Related

Overriding httpd/apache upstream proxy httpcode with another

I have some React code (written by someone else) that needs to be served. The preferred method is via a Google Storage bucket, fronted by their Cloud CDN, and this works. However, due to some quirks in the code, there is a requirement to override 404s with 200s and serve content from the homepage instead (i.e. if there is a 404, don't serve a 404; serve the content of the homepage and return a 200 instead).
(If anyone is interested, this override is currently implemented in CloudFront on AWS. Google CDN does not provide this functionality yet.)
So, if the code is served at "www.mysite.com/app/" and someone hits "www.mysite.com/app/not-here" (which would return a 404), the response should NOT be a 404, but a 200 with the content served from index.html instead.
I was able to get this working by bundling all the code inside a Docker container and then using the solution here. However, this setup means that whenever there is a code change, all the running containers need to be restarted, and the customer expects zero downtime; hence the bucket solution.
So I now need to do the same thing, but with the files being proxied in (with the upstream being the CDN).
I cannot use the original solution, since the files are no longer local, and httpd can't check for the existence of something that is not local.
I've tried things like ProxyErrorOverride and ErrorDocument, and managed to get it to redirect, but that is not what is needed.
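Roughly, the kind of configuration attempted (hostnames and paths here are placeholders):
  SSLProxyEngine On
  ProxyPass        /app/ https://storage-cdn.example.com/app/
  ProxyPassReverse /app/ https://storage-cdn.example.com/app/
  # intercept error responses coming back from the upstream
  ProxyErrorOverride On
  # with a local path this is served via an internal subrequest but the 404
  # status is preserved; with a full URL it becomes an external redirect,
  # so neither variant yields "index.html with a 200"
  ErrorDocument 404 /app/index.html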
Does anyone know how/if this can be done?
If the question is how to catch the 404 error returned by Cloud Storage when a file is missing with httpd/Apache, I don't know.
However, I think that isn't the best solution anyway. Serving files directly from Cloud Storage is convenient but not industrial-grade.
Imagine you deploy several broken files in succession; how do you roll back in a stable way?
It is better to package each code release as an atomic unit, a container for instance. Each version lives in a different container, and performing a rollback is easier and more consistent.
Now, your "container restart" issue. I don't know which platform you are running your containers on. If you run them on Compute Engine (a VM), that is probably the worst option. Today there are container orchestration systems that let you deploy containers, scale them up and down, and perform progressive rollouts, replacing the existing running containers with a newer version without downtime.
Cloud Run is a wonderful serverless solution for that; you also have Kubernetes (GKE on Google Cloud), which you can use with Knative for a better developer experience.

Cache Credentials During SVN Merge

A merge from a feature branch to trunk took over 45 minutes to complete.
The merge included a whole lot of jars (~250MB), however, when I did it on the server with the file:// protocol the process took less than 30 seconds.
SVN is being served up by Apache over https.
The version of SVN on the server is
svn, version 1.6.12 (r955767)
compiled Sep 3 2013, 17:49:49
My local version is
svn, version 1.7.7 (r1393599)
compiled Oct 8 2012, 20:42:17
On checking the Apache logs, I saw that I had made over 10k requests, and apparently each of these requests went through an authentication layer.
Is there a way to configure the server so that it caches the credentials for a period and doesn't make so many authentication requests?
I guess the tricky part is making sure the credentials are only cached for the life of a single svn 'request'. If svn merge makes lots of individual https requests, how would you determine how long to store the credentials for without adding potential security holes?
First of all, I'd strongly suggest you upgrade the server to a 1.7 or 1.8 version, since 1.7 and newer servers support an updated version of the protocol that requires fewer requests for many actions.
Second, if you're using path-based authorization, you probably want SVNPathAuthz short_circuit in your configuration. Without it, when authorization is run for secondary paths (i.e. paths not in the request URI), as happens for many recursive requests (especially log), each check runs back through the entire Apache httpd authentication infrastructure. With the setting, instead of running the whole httpd authentication/authorization infrastructure, mod_authz_svn is simply asked to authorize the action against the path. Running through the entire httpd infrastructure can be especially painful if you're using LDAP and it needs to go back to the LDAP server to check credentials. The only reason not to use the short_circuit setting is if you have some other authentication module that depends on the path; I've yet to see such a setup in the wild, though.
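A minimal sketch of what that looks like in the httpd configuration (the repository paths and authentication details are illustrative):
  <Location /svn>
      DAV svn
      SVNParentPath /var/svn
      AuthType Basic
      AuthName "Subversion"
      AuthUserFile /etc/svn-auth
      Require valid-user
      AuthzSVNAccessFile /var/svn/authz
      # authorize secondary paths directly in mod_authz_svn instead of
      # re-running the full httpd authentication/authorization chain
      SVNPathAuthz short_circuit
  </Location>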
Finally, if you are using LDAP, then I suggest you configure credential caching, since this can greatly speed up authentication. Apache httpd provides the mod_ldap module for this; I suggest you read its documentation.
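For example, the mod_ldap cache directives look roughly like this (the values are illustrative; tune them against the mod_ldap documentation):
  # shared memory cache used by mod_ldap (bytes)
  LDAPSharedCacheSize 500000
  # search/bind (credential) cache
  LDAPCacheEntries 1024
  LDAPCacheTTL 600
  # operation (compare) cache
  LDAPOpCacheEntries 1024
  LDAPOpCacheTTL 600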
If you provide more details of the server side setup I might be able to give more tailored suggestions.
The comments suggesting that you not put jars in the repository are valuable, but with some configuration improvements you can resolve some of your slowness anyway.
The merge included a whole lot of jars (~250MB)
That's your problem! If you go through your network via http://, you have to send those jars via http://, and that can be painfully slow. You can increase Apache httpd's cache size, or you can set up a parallel svn:// server, but you're still sending a quarter of a gigabyte of jars over the network. That's why file:// was so much faster.
You should not be storing jars in your Subversion repository. Here's why:
Version control gives you a lot of power:
It helps you merge differences between branches
It helps you follow the changes taking place.
It helps identify a particular change and why a particular change took place.
Storing binary files like jars gives you none of that. You can't merge binary files, and you can't track their changes.
Not only that, but version control systems usually use diffs to track changes, which saves a lot of space. Imagine a 1-kilobyte text file. Over five revisions, six lines are changed. Instead of storing six full copies (6K), only 1K plus those six changed lines are stored.
When you store a jar, and then a new version of that jar, you can't easily diff them, and since the jar format is zip, you can't really compress them either. Store five versions of a jar in Subversion and you store pretty close to five times the size of that jar: if a jar file is 10K, you're using 50K of space for it.
So not only do jar files take up a lot of space while giving you no versioning power, they can quickly take over your repository. I've seen sites where over 90% of an 8-gigabyte repository is nothing but compiled code and third-party jars. And the useful life of these binary files is really quite limited, too. So, in these places, 80% of their Subversion repository is wasted space.
Even worse, you tend to lose track of where you got that jar and what is in it. When users check in a jar called commons-beans.jar, I don't know what version that jar is, whether it was built by someone, or whether it was somehow munged by that person. I've seen users merge two separate jars into a single jar for ease of use. If someone calls a jar commons-beanutils-1.5.jar because it was version 1.5, it's very likely that someone will later update it to version 1.7 but not change the name (changing it would affect the build, you would have to add and delete, there is always some reason).
So, there's a massive amount of wasted space with little benefit and almost no information. Storing jars is just plain bad news.
But your build needs jars! What should you do?
Get a jar repository like Nexus or Artifactory. Both of these repository managers are free and open source.
Once you store your jars in there, you can fetch the revision of the jar you want through Maven, Gradle, or, if you use Ant and want to keep your Ant build system, Ivy. You can also, if you don't feel like being that fancy, fetch the jars via an Ant <get/> task. If you use Jenkins, it can easily deploy the built jars to your Maven repository for other projects to use.
So, get rid of the jars. Merging will then be a simple diff between text files, merging branches will be much quicker, and less information has to be sent over the network. If you don't want to switch to Maven, use Ivy, or simply update your builds with the <get/> task to fetch the jars and the versions you need.
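For instance, a bare-bones Ant target that pulls a pinned jar from a repository manager instead of keeping it in Subversion could look like this (the URL and coordinates are placeholders):
  <target name="fetch-deps">
      <!-- download the exact jar version the build needs; usetimestamp
           avoids re-downloading when the local copy is already current -->
      <get src="https://nexus.example.com/repository/releases/commons-beanutils/commons-beanutils/1.9.4/commons-beanutils-1.9.4.jar"
           dest="lib/commons-beanutils-1.9.4.jar"
           usetimestamp="true"/>
  </target>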

JSF2 resources - compression, minification

I have two questions about resources in JSF2:
Is there any way to specify that all JSF2 resources (JS, CSS) should be compressed (gzipped) or at least minified? (Something à la wro4j.)
And the second one: is there any way to force the exclusion of some library? I am using OpenFaces in my admin system, but its JS dependency is included even on the user-facing frontend pages, even though I never use it (or import its namespace) there.
Thanks
Gzipping is more of a servlet container configuration; consult its documentation for details. In Tomcat, for example, it's a matter of adding the compression="on" attribute to the <Connector> element in /conf/server.xml. See also Tomcat Configuration Reference - The HTTP Connector.
<Connector ... compression="on">
You can also configure the compressible MIME types there.
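A fuller connector example might look like the following (port and MIME list are illustrative; recent Tomcat versions spell the attribute compressibleMimeType):
  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             compression="on"
             compressionMinSize="2048"
             compressableMimeType="text/html,text/css,application/javascript" />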
Minification is more of a build process configuration. If you're using Ant as your build tool, you may find the YuiCompressorAntTask useful. We use it here and it works wonderfully.
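If you'd rather not wire in the dedicated Ant task, a plain <java> call against the YUI Compressor jar does much the same job; the jar path, version and file names below are placeholders:
  <target name="minify-js">
      <!-- runs: java -jar yuicompressor.jar -o app.min.js app.js -->
      <java jar="tools/yuicompressor-2.4.8.jar" fork="true" failonerror="true">
          <arg value="-o"/>
          <arg value="js/app.min.js"/>
          <arg value="js/app.js"/>
      </java>
  </target>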
As to OpenFaces, that's a completely different question and I don't use it, so I don't have an answer for you. I'd suggest just asking that in a separate question; I don't see how it's related to performance improvements such as gzipping and minification.
As for OpenFaces, I had the same problem and solved it by unpacking the JAR, minifying the huge JavaScript files manually, and repacking the JAR. That saved about 70 KB per request, which had been hurting performance under heavy load.

How to use Twisted Web static.File resource without cache?

I am using the Twisted Web static.File resource for the static part of the web server.
For development I would like to be able to add new files or modify the current static files without having to restart the Twisted web server.
I am looking at the source code for static.File, at the getChild method, and I cannot see how the resources are cached:
http://twistedmatrix.com/trac/browser/tags/releases/twisted-11.0.0/twisted/web/static.py#L280
From my understanding, the getChild method returns a new resource on each call.
Any help in creating a non-cached static.File resource is much appreciated.
Many thanks,
Adi
twisted.web.static.File serves straight from the filesystem. It does not have a cache. Your web browser probably has a cache, though.
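If the actual goal is to stop the browser from caching during development, one rough approach (a sketch, assuming a reasonably recent Twisted; NoCacheFile is just an illustrative name) is to subclass File and send no-cache headers:
  from twisted.internet import reactor
  from twisted.web.server import Site
  from twisted.web.static import File

  class NoCacheFile(File):
      """static.File that asks the browser not to cache responses."""
      def render_GET(self, request):
          # ask the client to revalidate on every request
          request.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate')
          request.setHeader('Pragma', 'no-cache')
          request.setHeader('Expires', '0')
          return File.render_GET(self, request)

  root = NoCacheFile('./static')   # path is an example
  reactor.listenTCP(8080, Site(root))
  reactor.run()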

Serving dynamic zip files through Apache

One of the responsibilities of my Rails application is to create and serve signed XMLs. Any signed XML, once created, never changes, so I store every XML in the public folder and redirect the client appropriately to avoid unnecessary processing in the controller.
Now I want a new feature: every XML is associated with a date, and I'd like to implement the ability to serve a compressed file containing every XML whose date lies in a period specified by the client. Nevertheless, the period cannot be limited to less than one month for the feature to be useful, which implies some of the zip files being served will be as big as 50 MB.
My application is deployed as a Passenger module of Apache, so it's totally unacceptable to serve the file with send_data, since the client would have to wait for the entire compressed file to be generated before the actual download begins. Although I have an idea of how to implement the feature in Rails so the compressed file is produced while being served, I suspect my server will run short on resources once a few long-running Ruby/Passenger processes are tied up serving big zip files.
I've read about a better solution for serving static files through Apache, but not dynamic ones.
So, what's the solution to the problem? Do I need something like a custom Apache handler? How do I tell Apache, from my application, how to handle the request, compressing the files and streaming the result simultaneously?
Check out my mod_zip module for Nginx:
http://wiki.nginx.org/NgxZip
You can have a backend script tell Nginx which URL locations to include in the archive, and Nginx will dynamically stream a ZIP file containing those files to the client. The module leverages Nginx's single-threaded proxy code and is extremely lightweight.
The module was first released in 2008 and is fairly mature at this point. From your description I think it will suit your needs.
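Roughly, per the mod_zip documentation (worth double-checking there), the backend response carries an X-Archive-Files: zip header and a body with one line per file: CRC-32 (or "-" to skip it), size in bytes, URL location, and the file name to use inside the archive. The paths below are placeholders:
  X-Archive-Files: zip

  1034ab38 428    /xmls/2012/01/invoice-0001.xml   invoice-0001.xml
  -        100339 /xmls/2012/01/invoice-0002.xml   invoice-0002.xml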
You simply need to use whatever API you have available to create a zip file and write it to the response, flushing the output periodically. If this will serve large zip files, or will be requested frequently, consider running it in a separate process with a high nice/ionice value (low priority).
Worst case, you could run a command-line zip in a low-priority process and pass the output along periodically.
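A rough sketch of that idea on a Unix-ish setup (the directory path is a placeholder): zip writes the archive to stdout when the archive name is "-", and nice/ionice keep the process from starving the app servers:
  nice -n 19 ionice -c 3 zip -q -r - public/xmls/2012-01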
It's tricky to do, but I've made a gem called zipline (http://github.com/fringd/zipline) that gets things working for me. I want to update it so that it can support plain file handles or paths; right now it assumes you're using CarrierWave...
Also, you probably can't stream the response with Passenger... I had to use Unicorn to make streaming work properly... and certain Rack middleware can even screw that up (calling response.to_s breaks it).
If anybody still needs this, bother me on the GitHub page.