How can I configure CloudFront so it costs me a bit less? - amazon-s3

I have a very static site, basically HTML and some JavaScript on S3, which I serve through CloudFront. My usage has gone up a bit, plus one of my JavaScript files is pretty large.
So what can I do to cut down the cost of serving those files? They need to have very good uptime, as the site has thousands of active users all over the world.
This is the usage for yesterday:
Looking at other questions about this, it seems like changing headers can help, but I thought I already had caching enabled. This is what curl returns when I fetch one of those files:
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< content-type: text/html
< content-length: 2246
< date: Fri, 03 Apr 2020 20:28:47 GMT
< last-modified: Fri, 03 Apr 2020 15:21:11 GMT
< x-amz-version-id: some string
< etag: "83df2032241b5be7b4c337f0857095fc"
< server: AmazonS3
< x-cache: Miss from cloudfront
< via: 1.1 somestring.cloudfront.net (CloudFront)
< x-amz-cf-pop: some string
< x-amz-cf-id: some string
This is what the cache is configured as on CloudFront:
This is what S3 says when I use curl to query the file:
< HTTP/1.1 200 OK
< x-amz-id-2: some string
< x-amz-request-id: some string
< Date: Fri, 03 Apr 2020 20:27:22 GMT
< x-amz-replication-status: COMPLETED
< Last-Modified: Fri, 03 Apr 2020 15:21:11 GMT
< ETag: "83df2032241b5be7b4c337f0857095fc"
< x-amz-version-id: some string
< Accept-Ranges: bytes
< Content-Type: text/html
< Content-Length: 2246
< Server: AmazonS3
So what can I do? I don't often update the files, and when I do, I don't mind if it takes a day or two for a change to propagate.
Thanks.

If your goal is to reduce CloudFront costs, then it's worth reviewing how it is charged:
Regional Data Transfer Out to Internet (per GB): From $0.085 to $0.170 (depending upon location of your users)
Regional Data Transfer Out to Origin (per GB): From $0.020 to $0.160 (data going back to your application)
Request Pricing for All HTTP Methods (per 10,000): From $0.0075 to $0.0090
Compare that to Amazon S3:
GET Requests: $0.0004 per 1,000 (equivalent to $0.004 per 10,000)
Data Transfer: $0.09 per GB (this rate also applies to traffic served directly from Amazon EC2 instances)
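To make the comparison concrete, here is a rough back-of-the-envelope calculation in Python. The traffic volume is made up purely for illustration; plug in your own numbers from the usage report:
# Hypothetical monthly volume, for illustration only
gb_out = 500
requests = 5_000_000

# Cheapest CloudFront rates from the list above
cloudfront = gb_out * 0.085 + (requests / 10_000) * 0.0075
# S3 served directly
s3_direct = gb_out * 0.09 + (requests / 1_000) * 0.0004

print(f"CloudFront: ${cloudfront:.2f}")  # $46.25
print(f"S3 direct:  ${s3_direct:.2f}")   # $47.00
At this volume the two come out close; which one wins depends mostly on where your users are.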
Therefore, some options for you to save money are:
Choose a lower Price Class that restricts which edge locations serve traffic "out". For example, Price Class 100 only serves traffic from the USA and Europe, which have lower Data Transfer costs. This will reduce Data Transfer costs for other locations, but will give them a lower quality of service (higher latency).
Stop using CloudFront and serve content directly from S3 and EC2. This will save a bit on requests (about half the price), but Data Transfer would be a similar cost to Price Class 100.
Increase the caching duration for your objects. However, the report is showing 99.9%+ hit rates, so this won't help much.
Configure the objects to persist longer in users' browsers so fewer requests are made. However, this only helps with "repeat traffic" and might not help much; it depends on app usage. (I'm not familiar with this part. It might not work in conjunction with CloudFront. Hopefully other readers can comment.) A sketch of how to set this follows the list.
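For the last two options, the usual lever is the Cache-Control metadata stored on the S3 objects, which both CloudFront and browsers honor. A minimal boto3 sketch, assuming a hypothetical bucket and key (re-copying an object onto itself with MetadataDirective='REPLACE' is how S3 rewrites metadata):
import boto3

s3 = boto3.client('s3')

# Hypothetical bucket/key names; adjust to your site
s3.copy_object(
    Bucket='my-static-site',
    Key='js/app.js',
    CopySource={'Bucket': 'my-static-site', 'Key': 'js/app.js'},
    MetadataDirective='REPLACE',
    ContentType='application/javascript',
    # Cache for 7 days at CloudFront edges and in users' browsers
    CacheControl='public, max-age=604800',
)
Since you don't mind changes taking a day or two to propagate, you could go considerably higher than 7 days.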
Typically, most costs are related to the volume of traffic. If your app is popular, those Data Transfer costs will go up.
Take a look at your bills and try to determine which component is leading to most of the costs. Then, it's a trade-off between service to your customers and costs to you. Changing the Price Class might be the best option for now.

Related

Pig script new record

I am working with the following mail data in a file (data source: infochimps):
Message-ID: <33025919.1075857594206.JavaMail.evans@thyme>
Date: Wed, 13 Dec 2000 13:09:00 -0800 (PST)
From: john.arnold@enron.com
To: slafontaine@globalp.com
Subject: re:spreads
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: John Arnold
X-To: slafontaine@globalp.com @ ENRON
X-cc:
X-bcc:
X-Folder: \John_Arnold_Dec2000\Notes Folders\'sent mail
X-Origin: Arnold-J
X-FileName: Jarnold.nsf
saw a lot of the bulls sell summer against length in front to mitigate
margins/absolute position limits/var. as these guys are taking off the
front, they are also buying back summer. el paso large buyer of next winter
today taking off spreads. certainly a reason why the spreads were so strong
on the way up and such a piece now. really the only one left with any risk
premium built in is h/j now. it was trading equivalent of 180 on access,
down 40+ from this morning. certainly if we are entering a period of bearish
................]
I am loading the above data as:
A = load '/root/test/enron_mail/maildir/*/*/*' using PigStorage(':') as (f1:chararray,f2:chararray);
but for the message body I am getting separate tuples, because the body spans multiple lines.
How do I consolidate those final lines into one?
I want the part below in a single tuple (a Python sketch of the idea follows the sample):
saw a lot of the bulls sell summer against length in front to mitigate
margins/absolute position limits/var. as these guys are taking off the
front, they are also buying back summer. el paso large buyer of next winter
today taking off spreads. certainly a reason why the spreads were so strong
on the way up and such a piece now. really the only one left with any risk
premium built in is h/j now. it was trading equivalent of 180 on access,
down 40+ from this morning. certainly if we are entering a period of bearish
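Purely as an illustration of the consolidation step (a hypothetical Python preprocessing pass, not a Pig answer; the path matches the load statement above, and it assumes the standard mail convention that a blank line separates the headers from the body):
import glob

for path in glob.glob('/root/test/enron_mail/maildir/*/*/*'):
    with open(path, errors='replace') as f:
        # The first blank line ends the headers
        headers, _, body = f.read().partition('\n\n')
    # Collapse the multi-line body into a single field per message
    print(path + '\t' + ' '.join(body.split()))
The rewritten file could then be loaded with PigStorage('\t') so each message body arrives as one field.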

Is max age relative to last-modified date or request time?

When a server sends Cache-Control: max-age=4320000, is the resource considered fresh for 4320000 seconds after the time of the request, or after the last-modified date?
RFC 2616 section 14.9.3:
When the max-age cache-control directive is present in a cached response, the response is stale if its current age is greater than the age value given (in seconds) at the time of a new request for that resource. The max-age directive on a response implies that the response is cacheable (i.e., "public") unless some other, more restrictive cache directive is also present.
It is always based on the time of request, not the last modified date. You can confirm this behavior by testing on the major browsers.
tl;dr: the age of a cached object is either the total time it has been stored by any cache or now() - the "Date" response header, whichever is bigger.
Full response:
The accepted answer is incorrect. The RFC 2616 mentioned there states in section 13.2.4 that:
In order to decide whether a response is fresh or stale, we need to compare its freshness lifetime to its age. The age is calculated as described in section 13.2.3.
And in section 13.2.3 it is stated that:
corrected_received_age = max(now - date_value, age_value)
date_value is the response header Date:
HTTP/1.1 requires origin servers to send a Date header, if possible, with every response, giving the time at which the response was generated [...] We use the term "date_value" to denote the value of the Date header.
age_value is how long the item has been stored in any cache:
In essence, the Age value is the sum of the time that the response has been resident in each of the caches along the path from the origin server, plus the amount of time it has been in transit along network paths.
This is why good cache providers include an Age header every time they cache an item, telling the next caches along the path how long the item has already been cached. If one of those caches decides to store the item, its age must start from that value.
A practical example: an item is stored in a cache. It was stored 5 days ago, and when it was fetched, the response headers included:
Date: Sat, 1 Jan 2022 11:05:05 GMT
Cache-Control: max-age={30 days in seconds}
Age: {10 days in seconds}
Assuming now() is Feb 3 2022, the age of the item is calculated as (rounding a bit for clarity):
age_value = 10 days + 5 days (age when received + time in this cache)
now - date_value = Feb 3 2022 - 1 Jan 2022 = 33 days
The corrected age is the larger of the two values, i.e. 33 days. That means the item is stale and can't be used, since max-age is 30 days.
The RFC presents a tiny additional correction that compensates for request latency (see section 13.2.3, "corrected_initial_age").
Unfortunately not all cache servers include the "Age" response header, so it is very important to make sure all responses that use max-age also include the "Date" header, allowing the age to always be calculated.
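To make the arithmetic concrete, here is a small Python sketch of the freshness check as the RFC describes it, using the dates from the example above:
from datetime import datetime, timedelta

# Values from the worked example above
date_value = datetime(2022, 1, 1, 11, 5, 5)   # Date response header
age_value = timedelta(days=10)                # Age header when received
stored_for = timedelta(days=5)                # time resident in this cache
max_age = timedelta(days=30)                  # Cache-Control: max-age
now = datetime(2022, 2, 3, 11, 5, 5)

# RFC 2616 section 13.2.3: take the larger of the two age estimates
corrected_age = max(now - date_value, age_value + stored_for)

print(corrected_age)                                     # 33 days
print("fresh" if corrected_age <= max_age else "stale")  # stale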

Magento possible SQL injection?

I've been having a few random search queries come up in the popular search terms on a Magento site; the site has also been up and down like a yo-yo recently.
Could anyone shed some light on some, or all, of these search terms:
1)))) and benchmark(100000000,HEX(999999)) --
1 and benchmark(100000000,HEX(999999)) #
x Content-Length: 0 HTTP/1.1 200 OK Content-Type: text/html Content-Length: 18 <html>saint</html>
saint<!--#echo var="HTTP_USER_AGENT"-->
1 waitfor delay '0:0:6' /*
x;id|
<script>alert('SAINTL2NhdGFsb2dzZWFyY2gvcmVzdWx0L2luZGV4LyBx')</script>
christmas<script>alert("XSS");</script>
These terms are used (among thousands of others) to test your site for various vulnerabilities. The presence of these strings doesn't mean your site is vulnerable; it means your site is being probed for them. For instance, the benchmark() and waitfor delay payloads probe for time-based SQL injection (MySQL and SQL Server respectively), while the <script> payloads probe for reflected XSS.

RavenDB Document Deleted Before Expiration

I am attempting to write a document to RavenDB with an expiration 20 minutes in the future. I am not using the .NET client, just curl. My request looks like this:
PUT /databases/FRUPublic/docs/test/123 HTTP/1.1
Host: ravendev
Connection: close
Accept-encoding: gzip, deflate
Content-Type: application/json
Raven-Entity-Name: tests
Raven-Expiration-Date: 2012-07-31T22:23:00
Content-Length: 14
{"data":"foo"}
In the studio I see my document saved with Raven-Expiration-Date set exactly 20 minutes from Last-Modified, however, within 5 minutes the document is deleted.
I see this same behavior (deleted in 5 minutes) if I increase the expiration date. If I set an expiration date in the past the document deletes immediately.
I am using build 960. Any ideas about what I'm doing wrong?
I specified the time to ten-millionths of a second (seven decimal places) and now documents are deleted just as I would expect.
For example:
Raven-Expiration-Date: 2012-07-31T22:23:00.0000000
The date has to be in UTC, and it looks like you are sending local time.
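A small sketch of building that header value in Python (the 20-minute offset matches the question; %f yields six fractional digits, so one zero is appended to reach the seven digits that worked):
from datetime import datetime, timedelta, timezone

# The key point from the answer: the timestamp must be UTC, not local time
expires = datetime.now(timezone.utc) + timedelta(minutes=20)
header_value = expires.strftime('%Y-%m-%dT%H:%M:%S.%f') + '0'

print('Raven-Expiration-Date:', header_value)
# e.g. Raven-Expiration-Date: 2012-07-31T22:23:00.0000000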

Invalid Flickr API response

I've come across a very puzzling issue with the Flickr API.
There are certain queries I (and some developer friends) can run which result in broken result sets.
Basically, what you request isn't always returned...
Here's a few examples:
Request:
http://api.flickr.com/services/rest/?method=flickr.photos.search&safe_search=1&media=photos&extras=o_dims&per_page=30&page=1&format=json&nojsoncallback=1&api_key=XXXXXXX
Response:
HTTP/1.1 200 OK
Content-Length: 793
Date: Thu, 05 Jan 2012 23:30:56 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE GOV"
Access-Control-Allow-Origin: *
Cache-Control: private
X-Served-By: www71.flickr.mud.yahoo.com
Vary: Accept-Encoding
Connection: close
Content-Type: text/plain; charset=utf-8
{"photos":{"page":1, "pages":19886, "perpage":30, "total":"596560", "photo":[{"id":"6643915631", "owner":"74181952#N00", "secret":"8bc611c556", "server":"7023", "farm":8, "title":"IMG_5642", "ispublic":1, "isfriend":0, "isfamily":0}, {"id":"6643911681", "owner":"7240073#N04", "secret":"34837024f0", "server":"7004", "farm":8, "title":"26 weeks!!", "ispublic":1, "isfriend":0, "isfamily":0, "o_width":"768", "o_height":"1024"}, {"id":"6643919177", "owner":"54899865#N02", "secret":"170d3a336f", "server":"7153", "farm":8, "title":"IMGA0072", "ispublic":1, "isfriend":0, "isfamily":0}, {"id":"6643916265", "owner":"51191328#N06", "secret":"05905197ce", "server":"7034", "farm":8, "title":"IMG_1781", "ispublic":1, "isfriend":0, "isfamily":0, "o_width":"2736", "o_height":"3648"}]}, "stat":"ok"}
Notice there are only 4 images returned when we asked for 30? (And there are 596,560 matching pics.)
If I change the per_page count to something different it may work. Right now, if I change it to 3, it returns 3, but yesterday when I was testing it only returned 2, and when I changed it to 10 it returned none!
We've come across another example, this time with image size data:
Request
http://api.flickr.com/services/rest/?method=flickr.interestingness.getList&extras=o_dims&per_page=3&page=1&format=rest&api_key=XXXXXXXXXX
Response
<?xml version="1.0" encoding="utf-8" ?>
<rsp stat="ok">
<photos page="1" pages="167" perpage="3" total="500">
<photo id="6743082503" owner="29789996#N00" secret="7d6a1ab340" server="7165" farm="8" title="Glittering Marina [2]" ispublic="1" isfriend="0" isfamily="0" />
<photo id="6741988715" owner="44789014#N04" secret="ab1528fa9f" server="7009" farm="8" title="Heavy metal warrior" ispublic="1" isfriend="0" isfamily="0" o_width="1200" o_height="1202" />
<photo id="6741320397" owner="54880604#N06" secret="7b3bd8530f" server="7030" farm="8" title="Greetings from below, Village near Can Tho" ispublic="1" isfriend="0" isfamily="0" />
</photos>
</rsp>
Note only one of the images has image size data.
It's a very difficult issue to reproduce as it only happens every now and then, but once you've found a page/per_page combo that causes an issue, you'll consistently get the incorrect response (I assume due to some form of caching).
Has anyone else come across this?
As you can see in my resultset above, there's no error, no warning, just an incorrect response.
Thanks in advance.
Aaron
Huh. I've filed myself a bug; let me look into it. Possibly a pagination bug on our end, or a caching thing as suggested.