Is there any way to update files stored on Amazon CloudFront (Amazon's CDN service)?
It seems CloudFront won't pick up any update we make to a file (e.g. removing the file and uploading a new one with the same file name as before).
Do I have to explicitly trigger an update process to remove the files from the edge servers to get the new file contents published?
Here is how I do it using the CloudFront control panel.
Select CloudFront from the list of services.
Make sure Distributions from the top left is selected.
Next, click the link for the associated distribution in the list (under ID).
Select the Invalidations tab.
Click the Create Invalidation button and enter the location of the files you want to be invalidated (updated).
For example:
Then click the Invalidate button and you should now see InProgress under status.
It usually takes 10 to 15 minutes to complete your invalidation
request, depending on the size of your request.
Once the status says Completed, you are good to go.
Tip:
Once you have created a few invalidations, if you come back and need to invalidate the same files again, tick the box next to an earlier invalidation and the Copy link will become available, making it even quicker.
Amazon has added an invalidation feature; here is the API reference.
Sample Request from the API Reference:
POST /2010-08-01/distribution/[distribution ID]/invalidation HTTP/1.0
Host: cloudfront.amazonaws.com
Authorization: [AWS authentication string]
Content-Type: text/xml
<InvalidationBatch>
   <Path>/image1.jpg</Path>
   <Path>/image2.jpg</Path>
   <Path>/videos/movie.flv</Path>
   <CallerReference>my-batch</CallerReference>
</InvalidationBatch>
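Today the same request is usually made through an SDK rather than raw XML. Here is a minimal sketch in Python: the helper builds the InvalidationBatch structure that mirrors the XML above, and the commented-out boto3 call (with a placeholder distribution ID) shows where it would be sent.

```python
import time
# boto3 (the AWS SDK for Python) would be needed to actually send the request:
# import boto3

def invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure that mirrors the XML above."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp works
        "CallerReference": caller_reference or str(int(time.time())),
    }

batch = invalidation_batch(
    ["/image1.jpg", "/image2.jpg", "/videos/movie.flv"], "my-batch"
)

# Hypothetical call, assuming configured credentials and a real distribution ID:
# client = boto3.client("cloudfront")
# client.create_invalidation(DistributionId="E1234567890", InvalidationBatch=batch)
```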
Another option: set the TTL to 1 hour and simply replace the file. Announcement:
http://developer.amazonwebservices.com/connect/ann.jspa?annID=655
Download the Cloudberry Explorer freeware version to do this on single files:
http://blog.cloudberrylab.com/2010/08/how-to-manage-cloudfront-object.html
Cyberduck for Mac & Windows provides a user interface for object invalidation. Refer to http://trac.cyberduck.ch/wiki/help/en/howto/cloudfront.
I seem to remember seeing this on serverfault already, but here's the answer:
By "Amazon CDN" I assume you mean "CloudFront"?
It's cached, so if you need it to be updated right now (as opposed to "the new version will be visible within 24 hours"), you'll have to choose a new name. Instead of "logo.png", use "logo.png--0"; then update it as "logo.png--1" and change your HTML to point to that.
There is no way to "flush" Amazon CloudFront.
Edit: This was not possible when written; it is now. See the comments on this answer.
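The renaming trick above can be automated. A minimal sketch (the function name is illustrative) that derives a new name from a short content hash, so every changed version of a file gets a fresh CloudFront cache key, much like the `--0`/`--1` suffixes:

```python
import hashlib

def versioned_name(filename, content):
    """Append a short hash of the content, e.g. logo.png -> logo-3f2a1c9b.png,
    so CloudFront treats each new version as a brand-new object."""
    digest = hashlib.md5(content).hexdigest()[:8]
    name, dot, ext = filename.rpartition(".")
    if not dot:  # no extension
        return f"{filename}-{digest}"
    return f"{name}-{digest}.{ext}"
```

Upload under the versioned name and point the HTML at it; the old object can simply expire on its own.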
CloudFront's user interface offers this under the [i] button > "Distribution Settings", tab "Invalidations": https://console.aws.amazon.com/cloudfront/home#distribution-settings
In Ruby, using the fog gem:
require 'fog'

AWS_ACCESS_KEY      = ENV['AWS_ACCESS_KEY_ID']
AWS_SECRET_KEY      = ENV['AWS_SECRET_ACCESS_KEY']
AWS_DISTRIBUTION_ID = ENV['AWS_DISTRIBUTION_ID']

conn = Fog::CDN.new(
  :provider              => 'AWS',
  :aws_access_key_id     => AWS_ACCESS_KEY,
  :aws_secret_access_key => AWS_SECRET_KEY
)

images = ['/path/to/image1.jpg', '/path/to/another/image2.jpg']
conn.post_invalidation AWS_DISTRIBUTION_ID, images
Even after requesting an invalidation, it still takes 5-10 minutes for it to process and propagate to all Amazon edge servers.
CrossFTP for Windows, Mac, and Linux provides a user interface for CloudFront invalidation; see this for more details: http://crossftp.blogspot.com/2013/07/cloudfront-invalidation-with-crossftp.html
I am going to summarize possible solutions.
Case 1: One-time update: Use Console UI.
You can manually go through the console's UI as per @CoalaWeb's answer and initiate an "invalidation" on CloudFront that usually takes less than one minute to finish. It's a single click.
Additionally, you can manually update the path it points to in S3 there in the UI.
Case 2: Frequent update, on the Same path in S3: Use AWS CLI.
You can use AWS CLI to simply run the above thing via command line.
The command is:
aws cloudfront create-invalidation --distribution-id E1234567890 --paths "/*"
Replace the E1234567890 part with the DistributionId that you can see in the console. You can also limit this to certain files instead of /* for everything.
An example of how to put it in package.json for a Node/JavaScript project as a target can be found in this answer. (different question)
Notes:
I believe the first 1000 invalidations per month are free right now (April 2021).
The user that performs AWS CLI invalidation should have CreateInvalidation access in IAM. (Example in the case below.)
Case 3: Frequent update, the Path on S3 Changes every time: Use a Manual Script.
If you are storing different versions of your files in S3 (i.e. the path contains the version-id of the files/artifacts) and you need to change that in CloudFront every time, you need to write a script to perform that.
Unfortunately, AWS CLI for CloudFront doesn't allow you to easily update the path with one command. You need to have a detailed script. I wrote one, which is available with details in this answer. (different question)
Related
An input text file from the web browser needs to be processed in AWS Lambda, and the output (JSON) needs to be rendered back to the browser (note: AWS Elastic Beanstalk is being used).
How do you handle the case where 10 users/clients upload text files with the same name? AWS Lambda should render the output to the respective user/client. How can this be done with S3 or EFS?
(Note: the users cannot be uniquely identified, as there are no login credentials for the users.)
We had a similar problem and solved it in the following way.
Find something unique and name the file accordingly:
Filename-TimeStamp.Extension
If there are frequent uploads within a given time, add a random sequence number:
Filename-TimeStamp-RandomSequence.Extension
If you want to make it completely random, you can use uuid (hexadecimal) or idgen (alphanumeric).
Hope this helps.
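A minimal sketch of that naming scheme in Python (the function name is illustrative):

```python
import time
import uuid

def unique_key(filename):
    """Build Filename-TimeStamp-RandomSequence.Extension so concurrent
    uploads of the same filename never collide."""
    name, dot, ext = filename.rpartition(".")
    if not dot:  # no extension
        name, ext = filename, ""
    stamp = int(time.time() * 1000)  # millisecond timestamp
    rand = uuid.uuid4().hex[:8]      # short random sequence
    key = f"{name}-{stamp}-{rand}"
    return f"{key}.{ext}" if ext else key
```

Use the returned key as the S3 object name, and keep the original filename in metadata or your database if you need to show it back to the user.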
I'd like to download a complete repository from S3. I know the bucket is reachable at https://s3.amazonaws.com/big-data-benchmark/pavlo
I'd like everything under /pavlo/sequence-snappy/5nodes
How should one download this with the least amount of manual effort, using readily available tools like wget? (The S3 tools require an actual S3 account, which I do not have and do not want.)
Though a bit of manual effort is needed, this is how it can be done:
Go to the bucket's HTTP URL and add ?marker=pavlo/sequence-snappy/5nodes, resulting in https://s3.amazonaws.com/big-data-benchmark/?marker=pavlo/sequence-snappy/5nodes
Now, binary-search manually to find how large the dataset is. Fortunately, the listing of this specific bucket is predictable, and it seems to have 100 items ranging from 000000_0 to 000099_0.
Use the following shell one-liner:
for i in {0000..0099}; do echo https://s3.amazonaws.com/big-data-benchmark/pavlo/sequence-snappy/5nodes/rankings/00${i}_0; done | xargs -n1 -P8 wget
Preferably we would like a more general solution which would also work for unpredictable filenames.
I think you will find that the S3 tools do not require an account for anonymous access to public buckets. (Nor do I understand why anyone wouldn't want a free account, but I digress.)
But here is a solution that works when the keys (paths/filenames) are not known or predictable:
If a bucket is truly public, as this one is, you'll find a paginated XML list of all the keys at the root of the bucket.
curl -v https://s3.amazonaws.com/big-data-benchmark/, for example.
Each <Key> contains the path to an object. This is the List Objects V1 API, so you add ?marker= and the value of the last key in the listing, on the next request, to resume the listing, repeating the process until <IsTruncated> is no longer true.
Use this to build a list to pass to curl, wget, or your http client of choice, by appending the key to the bucket URL. S3 can handle many, many parallel requests, so you may want to parallelize the process.
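A sketch of that loop in Python (error handling simplified; function names are illustrative): `parse_listing` pulls the keys and the truncation flag out of one page of the V1 listing, and `list_all_keys` follows the `marker` until the listing is exhausted.

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_listing(xml_text):
    """Extract object keys and the IsTruncated flag from one page of a
    List Objects V1 response (namespace-agnostic tag matching)."""
    root = ET.fromstring(xml_text)
    keys, truncated = [], False
    for el in root.iter():
        tag = el.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag == "Key":
            keys.append(el.text)
        elif tag == "IsTruncated":
            truncated = (el.text == "true")
    return keys, truncated

def list_all_keys(bucket_url):
    """Follow ?marker= pagination until IsTruncated is false."""
    keys, marker = [], ""
    while True:
        page = urllib.request.urlopen(f"{bucket_url}?marker={marker}").read()
        page_keys, truncated = parse_listing(page)
        keys.extend(page_keys)
        if not truncated or not page_keys:
            break
        marker = page_keys[-1]  # resume after the last key seen
    return keys
```

Feed the returned keys to wget or curl by appending each one to the bucket URL, parallelizing as you see fit.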
I am using aws s3 for storing files and returning links to those files. It's very convenient but I only know how to generate a pre-signed URL which only lasts for a set amount of time.
Is there any way to generate a default URL which lasts permanently? I need the user to be able to access the photo from within their app. They take a photo, it's uploaded. They view it. Pretty standard.
You have to use handlers (DOI, PURL, PURLZ, persistent URIs).
All are free of charge, except for DOIs.
You create the handler, then you add it to your project.
I would like to send the last modified date of the uploaded file to the server. I have the JavaScript snippet to get it using the File API ($(this).fineUploaderS3('getFile', id).lastModifiedDate). I would like to send this information when the uploadSuccess endpoint is called, but I cannot find the right callback in the Events | Fine Uploader documentation, and I cannot find a way to inject the data.
These are submitted as POST parameters to my server when the upload to S3 finishes: key, uuid, name, bucket. I would like to inject the last modified date here somehow.
Option 2:
Asking the Amazon S3 service for the last modification date does not help directly, because the uploaded file has the current date, not the file's original date. It would be great if we could inject the information into the FineUploader->S3 communication in a way that S3 would use it to set its own last modified date for the uploaded file.
Other perspective I considered:
If I use onSubmit and setParams, then the Amazon S3 server will take it as 'x-amz-meta-lastModified'. The problem is that when I upload larger files (which are uploaded in chunks, a different dance), I get a signing error: ...<Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message>....
EDIT
The "Other perspective I considered" works. The bottleneck was the name of the custom metadata which I used in setParams: it cannot contain capital letters, otherwise the signing fails. I did not find any reference documentation for this; for one, I checked Object Key and Metadata - Amazon Simple Storage Service. If someone can find me a reference, I will include it here.
The original question (when and how to send last modified date to the server component) remains.
(Server is PHP.)
EDIT2
Option 2 will not work; as far as my research went, the "Last Modified" entry cannot be manually altered in Amazon S3.
If the S3 API does not return the expected last modified date, you can check the value of the lastModifiedDate on the File object associated with the upload (provided the browser supports the file API) and send that value as a parameter to the upload success endpoint. See the documentation for the setUploadSuccessParams API method for more details.
How does one write a script to download one's Google web history?
I know about
https://www.google.com/history/
https://www.google.com/history/lookup?hl=en&authuser=0&max=1326122791634447
feed:https://www.google.com/history/lookup?month=1&day=9&yr=2011&output=rss
but they fail when called programmatically rather than through a browser.
I wrote up a blog post on how to download your entire Google Web History using a script I put together.
It all works directly within your web browser on the client side (i.e. no data is transmitted to a third-party), and you can download it to a CSV file. You can view the source code here:
http://geeklad.com/tools/google-history/google-history.js
My blog post has a bookmarklet you can use to easily launch the script. It works by accessing the same feed, but performs the iteration of reading the entire history 1000 records at a time, converting it into a CSV string, and making the data downloadable at the touch of a button.
I ran it against my own history, and successfully downloaded over 130K records, which came out to around 30MB when exported to CSV.
EDIT: It seems that a number of folks who have used my script have run into problems, likely due to some oddities in their history data. Unfortunately, since the script does everything within the browser, I cannot debug it when it encounters histories that break it. If you're a JavaScript developer and it appears your history has caused my script to break, please feel free to help me fix it and send me any updates to the code.
I tried GeekLad's system; unfortunately, two breaking changes have occurred: #1, the URL has changed (I modified and hosted my own copy), which led to #2: the type=rss argument no longer works.
I only needed the timestamps... so began the best/worst hack I've written in a while.
Step 1 - https://stackoverflow.com/a/3177718/9908 - Using Chrome, disable ALL security protections.
Step 2 - https://gist.github.com/devdave/22b578d562a0dc1a8303
Using contentscript.js and manifest.json, make a Chrome extension, and host ransack.js locally with whatever service you want (PHP, Ruby, Python, etc.). Go to https://history.google.com/history/ after installing your content-script extension in developer mode (unpacked). It will automatically inject ransack.js + jQuery into the DOM, harvest the data, and then move on to the next "Later" link.
Every 60 seconds or so, Google will randomly force you to re-login, so this is not a start-and-walk-away process, BUT it does work; and if they up the obfuscation ante, you can always resort to chaining Ajax calls and sending the page back to the backend for post-processing. At full tilt, my abomination script collected one page of data per second.
On moral grounds I will not help anyone modify this script to get search terms and results, as this process is not sanctioned by Google (though apparently not blocked), and I recommend it only to individuals sufficiently motivated to make it work for them. By my estimate it took me 3-4 hours to get all 9 years of data (90K records) at 1 page every 900 ms or faster.
While this thing is running, DO NOT browse the rest of the web, because Chrome is running with no safeguards in place, and most of them exist for a reason.
One can also download the search logs directly from Google (in case downloading via a script is not the primary goal).
Steps:
1) Login and Go to https://history.google.com/history/
2) Just below your profile picture logo, towards the right side, you can find an icon for settings. See the second option called "Download". Click on that.
3) Then click on "Create Archive", then Google will mail you the log within minutes.
Maybe, before issuing the request to get the feed, the script should add a User-Agent HTTP header of a well-known browser, so that Google decides the request came from that browser.
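A sketch of that idea with Python's urllib (the feed URL is the one from the question; whether Google accepts the request may also depend on an authenticated session, which this does not provide):

```python
import urllib.request

FEED = "https://www.google.com/history/lookup?month=1&day=9&yr=2011&output=rss"

def browser_like_request(url):
    """Attach a well-known browser User-Agent to the request."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    )

req = browser_like_request(FEED)
# urllib.request.urlopen(req) would then send the header; a logged-in
# session cookie may still be required by the endpoint.
```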