Azure CDN caching with query string - azure-storage

I am puzzled by an issue I am facing with Azure CDN. I have a CDN profile and endpoint configured to cache content stored in a storage container. For the caching behavior, I am using the default (ignore query strings). I modified one file in the container and was able to retrieve the modified file from the container, but not from the CDN edge, which kept returning the previously cached version. So I purged the file in the CDN, and after the purge I got the modified version. However, if I request the file from the CDN edge with any query string parameter, I get the original version instead of the modified one.
Examples of requesting the file via the edge:
Without query string: https://#storage_account#/#file_path#/hh.min.css -> returns the modified version
With query string: https://#storage_account#/#file_path#/hh.min.css?v=0.5 -> returns the original version
With query string (2): https://#storage_account#/#file_path#/hh.min.css?a=b -> returns the original version
Any idea why this is happening?
Thanks.

Most likely, the request with a query string is being served from a cached asset, as described in the documentation:
Ignore query strings: Default mode. In this mode, the CDN point-of-presence (POP) node passes the query strings from the requestor to the origin server on the first request and caches the asset. All subsequent requests for the asset that are served from the POP ignore the query strings until the cached asset expires.
So my guess is the cached asset did not expire yet. To avoid this issue, you should consider bypassing the caching for query strings:
Bypass caching for query strings: In this mode, requests with query strings are not cached at the CDN POP node. The POP node retrieves the asset directly from the origin server and passes it to the requestor with each request.
If the above option results in latency, I'd recommend adjusting the caching rules.
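To make the difference between the two quoted modes concrete, here is a toy model in Python. It is only a sketch of the behavior as the documentation excerpts describe it, not Azure's implementation: the EdgeCache class, the "ignore" and "bypass" mode names, and the example.invalid host are all made up for illustration, and expiry and purge are not modeled.

```python
from urllib.parse import urlsplit


class EdgeCache:
    """Toy model of a CDN POP node (hypothetical, for illustration only)."""

    def __init__(self, origin, mode="ignore"):
        self.origin = origin  # dict: path -> current content at the origin
        self.mode = mode      # "ignore" or "bypass" (names are assumptions)
        self.store = {}       # cache key -> cached content

    def get(self, url):
        parts = urlsplit(url)
        if self.mode == "bypass" and parts.query:
            # Bypass mode: requests with a query string are never cached,
            # so every such request goes straight to the origin.
            return self.origin[parts.path]
        # Ignore mode (and query-less requests): the query string is not
        # part of the cache key, so all variants share one cached asset.
        key = parts.path
        if key not in self.store:
            self.store[key] = self.origin[parts.path]
        return self.store[key]


origin = {"/css/hh.min.css": "original"}
ignore_edge = EdgeCache(origin, mode="ignore")
bypass_edge = EdgeCache(origin, mode="bypass")

# Warm both edges, then change the file at the origin.
ignore_edge.get("https://example.invalid/css/hh.min.css")
bypass_edge.get("https://example.invalid/css/hh.min.css")
origin["/css/hh.min.css"] = "modified"

stale = ignore_edge.get("https://example.invalid/css/hh.min.css?v=0.5")
fresh = bypass_edge.get("https://example.invalid/css/hh.min.css?v=0.5")
```

In this model the "ignore" edge keeps serving the cached copy for any query string until the entry is removed, while the "bypass" edge fetches from the origin whenever a query string is present, at the cost of an origin round trip per request.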

Related

Akamai CDN Issue with URL Query Parameter

I am working on a client project where the Akamai CDN has been configured. They use Amazon S3 for hosting.
Problem:
I've committed the code to a branch and can see the changes deployed in the codebase on the server
Now I am hitting the server URL in the browser to verify my code change
I couldn't see the UI change as per
I observed that the CSS file URL comes with query parameters (i.e.: server.com/css/filename.css??browserId=other&themeId=AbcTheme_WAR_abctheme&?t=125786954258&languageId=en_US&b=8569&t=1259648753695)
Now I open the same URL in the browser, but with the query parameters removed
This time I can see my changes in the same file
Questions:
Is this an issue related to CDN?
Is the CDN managing different versions of the same file to be served?
If so, my changes should be merged into the latest file referenced by the web page, which has URL query parameters.
I know the CDN takes time to refresh pages, but I am trying to verify my changes 48 hours after the deployment.
Any help would be appreciated.
Thanks.

Cache xml files without application/octet-stream type

Is it possible to DiskCache non-image files without losing their type when rendered in the browser?
I followed the instructions on this page:
http://imageresizing.net/docs/v4/howto/cache-non-images
Per the instructions, I set PostAuthorizeRequestStart = True and cache = Always in the PostAuthorizeRequestStart event. I also added the .unknown mimeType in the config.
However, when an xml file is requested, it's returned as content-type "application/octet-stream" instead of "text/xml".
Is there any way to preserve the original content-type of non-image files?
I'm afraid not - at least not without modifying ImageResizer source code.
We made the decision to prioritize security, and save all files with the ".unknown" extension to prevent them accidentally being executed as scripts by IIS. IIS sends the content-type based on the file extension, and (depending on your IIS configuration), the extension determines if the file should be executed as code.
I see no harm in expanding the "whitelist" of extensions to include non-image file types, as long as we're reasonably confident that other users haven't allowed IIS to consider those file types executable.
The code that would need to be modified (in v4+), would be HttpModuleRequestAssistant.EstimateResponseInfo. Instead of falling back to "unknown" immediately, a second whitelist could be consulted.
If you file an issue on GitHub about this, you can subscribe to notifications. We'd definitely accept a pull request addressing this feature request, particularly during the current v4 prerelease phase when changes to the pipeline are less risky.

Remote streaming with Solr

I'm having trouble using remote streaming with Apache Solr.
We previously had Solr running on the same server where the files to be indexed are located, so all we had to do was pass it the path of the file we wanted to index.
We used something like this:
stream.file=/path/to/file.pdf
This worked fine. We have now moved Solr so that it runs on a different server to the website that uses it. This was because it was using up too many resources.
I'm now using the following to point Solr in the direction of the file:
stream.file=http://www.remotesite.com/path/to/file.pdf
When I do this Solr reports the following error:
http:/www.remotesite.com/path/to/file.pdf (No such file or directory)
Note that it is stripping one of the slashes from http://.
How can I get Solr to index a file at a certain URL, as I'm trying to do above? The enableRemoteStreaming parameter is already set to true.
Thank you
For remote streaming, you need to enable it in the request parser configuration:
<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048" />
and you should probably use stream.url rather than stream.file for URLs:
If remote streaming is enabled and URL content is called for during
request handling, the contents of each stream.url and stream.file
parameters are fetched and passed as a stream.
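As a rough sketch of what a request using stream.url could look like, the Python snippet below builds the request URL with the remote file URL properly percent-encoded. The original post does not say which handler is used, so the /update/extract path and the literal.id and commit parameters are assumptions based on Solr's extracting request handler, and the Solr host and document id are placeholders.

```python
from urllib.parse import urlencode


def build_extract_url(solr_base, document_url, doc_id):
    """Build an indexing request URL that asks Solr to fetch a remote file.

    solr_base and doc_id are placeholders; the handler path and the
    literal.id / commit parameters are assumptions, not taken from the post.
    """
    params = {
        "stream.url": document_url,  # remote file for Solr to fetch
        "literal.id": doc_id,        # unique key for the indexed document
        "commit": "true",
    }
    # urlencode percent-encodes the nested URL so its "://" and "/" survive
    # as a single parameter value.
    return f"{solr_base}/update/extract?{urlencode(params)}"


url = build_extract_url(
    "http://solr.example.invalid:8983/solr",
    "http://www.remotesite.com/path/to/file.pdf",
    "file-pdf-1",
)
```

The resulting URL can then be requested with any HTTP client; Solr itself downloads the file named in stream.url, so the Solr server must be able to reach that address.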

CloudFront and mod_pagespeed - wrong content received

I am using CloudFront with mod_pagespeed running on the server.
When I update a CSS file or flush the cache, I see problematic behavior. The first refresh in the browser returns the original CSS (this is fine). When I refresh a second time, I get the correct manipulated CSS file name, but the content of the file from CloudFront is still the original and not the correct manipulated content.
Why would this happen?
Any idea how to fix this?
Update:
For some reason it just stopped happening... I don't know why.
SimonW, since your original post there has been a feature added to pagespeed (in March 2013 in version 1.2.24.1) to deal with this issue directly. The directive is enabled via the following:
Apache:
ModPagespeedRewriteDeadlinePerFlushMs deadline_value_in_milliseconds
Nginx:
pagespeed RewriteDeadlinePerFlushMs deadline_value_in_milliseconds;
The docs describe the directive as follows (emphasis mine):
When PageSpeed attempts to rewrite an uncached (or expired) resource
it will wait for up to 10ms per flush window (by default) for it to
finish and return the optimized resource if it's available. If it has
not completed within that time the original (unoptimized) resource is
returned and the optimizer is moved to the background for future
requests. The following directive can be applied to change the
deadline. Increasing this value will increase page latency, but might
reduce load time (for instance on a bandwidth-constrained link where
it's worth waiting for image compression to complete). Note that a
value less than or equal to zero will cause PageSpeed to wait
indefinitely.
So, if you specify a value of 0 for deadline_value_in_milliseconds, you should always get the fully optimized page. I would caution that the latency can be high in some cases. In my case, I really wanted this behavior, even with the latency concern, because the content was to be cached on my CDN's edge servers and, thus, I wanted the most optimized version possible to be served to the CDN for caching.
This could happen if you have multiple backend servers and CloudFront is hitting a different one than the HTML request went through. In that case the resource was rewritten on the HTML server, but not on the other server. There is a short timeout and if the other server doesn't finish the rewrite in that time, it will just serve the original content with Cache-Control: private,max-age=300. It's possible CloudFront caches that for a little while (even though obviously it shouldn't), but then eventually re-requests the resource from your backend and gets the correctly rewritten version this time.

AFNetworking - only load from cache if server responded http header status 304?

I have a question about using AFNetworking with SDURLCache.
In my application, I can see they work together fine: for repeated queries, responses from the local cache are used correctly.
But suppose the server XML was updated one minute ago; in my app, the cache is still used.
This doesn't match my client's expectation, which is:
I need to check the HTTP status: if it is 200, load the file from the server; if it is 304 (Not Modified), just use the local cache on disk.
Is there any chance this is built-in behaviour? Or do I have to manually hack a class somewhere so that it behaves like this?
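For reference, the expected behaviour corresponds to an HTTP conditional GET. The sketch below shows only that decision logic in Python; it is not AFNetworking or SDURLCache code, and the function names conditional_headers and resolve_body are made up for illustration.

```python
def conditional_headers(cached_etag=None, cached_last_modified=None):
    """Build validator headers from whatever was stored with the cached copy."""
    headers = {}
    if cached_etag:
        headers["If-None-Match"] = cached_etag
    if cached_last_modified:
        headers["If-Modified-Since"] = cached_last_modified
    return headers


def resolve_body(status, response_body, cached_body):
    """Pick the body to use based on the server's status code."""
    if status == 304:
        # Not Modified: the server sent no body, so reuse the local cache.
        return cached_body
    if status == 200:
        # The resource changed (or was never cached): use the new body.
        return response_body
    raise ValueError(f"unexpected status: {status}")
```

The client sends the validators with every request, so the server is always consulted, but the full file is only transferred when it has actually changed.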