Results of S3 function call are being cached by my Lambda function - amazon-s3

I have a lambda function that uses S3.listObjects to return a directory listing. The listing is sometimes (not always!) out of date - it doesn't contain recently uploaded objects and has old modification dates for the objects that it does have.
When I run the identical code locally it always works fine.
Clearly some sort of caching but I don't understand where...
Here's the relevant code:
function listFiles() {
    return new Promise(function (resolve, reject) {
        const params = {
            Bucket: "XXXXX",
            Prefix: "YYYYY"
        };
        s3.listObjects(params, function (err, data) {
            if (err) reject(err);
            else resolve(data.Contents);
        });
    });
}

That is due to the Amazon S3 data consistency model. S3 provides read-after-write consistency for PUTs; other requests, including listObjects, are eventually consistent, which means there can be a delay in propagation.

In practice, read-after-write consistency settles within a matter of seconds. It's not a guarantee, however. It's unlikely, but not impossible, that Amazon returns stale data minutes later, especially across zones. It's more likely, however, that your client is caching a previous response for that same URL.
You might have run into a side effect of your Lambda container being reused. This is explained at a high level here. One consequence of container reuse is that background processes, temporary files, and global variable modifications are still around when your Lambda is re-invoked. Another article talks about how to guard against it.
If you are sending your logs to CloudWatch Logs, you can confirm that a container is being reused if the logs for a Lambda seem to be appended to the end of a previous log stream, instead of creating a new log stream.
When your Lambda container gets reused, the global variables outside your handler function will be reused. For instance, if you change the log level of your logging calls to DEBUG at the end of your handler and your container gets reused, the next invocation will start at the top of the handler with that same log level.
If you're using the default S3 client session (it seems like you are), then this connection stays in a global (singleton). If your S3 client connection is reused, it might pull the cached results of prior calls, and I would expect that connection to be reused in a later invocation.
One way to avoid this is to specify the If-None-Match request header. If the ETag of the object you're accessing doesn't match on the remote end, you'll get fresh data. You may set it to the last ETag you got (which you'd store in a global), or alternatively you may try setting a completely random value, which should act as a cache buster. It doesn't look like list_objects() accepts an If-None-Match header, however. You may try to create a new client session just for the current invocation.
This article on recursive lambdas discusses the issue.
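A minimal sketch of the difference between a client held in module scope and one created per invocation. The createClient factory is a hypothetical stand-in for constructing the real S3 client, and the handler names are illustrative. Whether a fresh client actually resolves the stale listing is the hypothesis above, not something this sketch proves.

```javascript
// Hypothetical stand-in for constructing the real S3 client.
// It only counts how many client instances have been created.
let clientsCreated = 0;
function createClient() {
  clientsCreated += 1;
  return { id: clientsCreated };
}

// Module scope: evaluated once per container and kept across warm invocations.
const sharedClient = createClient();

// Every warm invocation sees the same client object.
function handlerWithSharedClient() {
  return sharedClient.id;
}

// A new client is built on each invocation, so nothing is carried over.
function handlerWithFreshClient() {
  const client = createClient();
  return client.id;
}
```

Calling handlerWithSharedClient repeatedly keeps returning the same client id, while handlerWithFreshClient returns a new id on every call.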


HttpContext.Session in Blazor Server Application

I am trying to use HttpContext.Session in my ASP.NET Core Blazor Server application (as described in this MS Doc, I mean: all correctly set up in startup)
Here is the code part when I try to set a value:
var session = _contextAccessor.HttpContext?.Session;
if (session != null && session.IsAvailable)
{
    session.Set(key, data);
    await session.CommitAsync();
}
When this code is called in a Razor component's OnAfterRenderAsync, session.Set throws the following exception:
The session cannot be established after the response has started.
I (probably) understand the message, but this renders the Session infrastructure pretty unusable: the application needs to access its state in every phase of the execution...
Question
Should I forget completely the DistributedSession infrastructure, and go for Cookies, or Browser SessionStorage? ...or is there a workaround here still utilizing HttpContext.Session? I would not want to just drop the distributed session infra for a way lower level implementation...
(just for the record: Browser's Session Storage is NOT across tabs, which is a pain)
Blazor is fundamentally incompatible with the concept of traditional server-side sessions, especially in the client-side or WebAssembly hosting model where there is no server-side to begin with. Even in the "server-side" hosting model, though, communication with the server is over websockets. There's only one initial request. Server-side sessions require a cookie which must be sent to the client when the session is established, which means the only point you could do that is on the first load. Afterwards, there's no further requests, and thus no opportunity to establish a session.
The docs give guidance on how to maintain state in a Blazor app. For the closest thing to traditional server-side sessions, you're looking at using the browser's sessionStorage.
Note: I know this answer is a little old, but I use sessions with WebSockets just fine, and I wanted to share my findings.
Answer
I think this Session.Set() error that you're describing is a bug, since Session.Get() works just fine even after the response has started, but Session.Set() doesn't. Regardless, the workaround (or "hack" if you will) is to make a throwaway call to Session.Set() to "prime" the session for future writing. Just find a line of code in your application where you KNOW the response hasn't been sent, and insert a throwaway call to Session.Set() there. Then you will be able to make subsequent calls to Session.Set() with no error, including ones after the response has started, inside your OnInitializedAsync() method. You can check whether the response has started by checking the property HttpContext.Response.HasStarted.
Try adding this app.Use() snippet into your Startup.cs Configure() method. Try to ensure the line is placed somewhere before app.UseRouting():
...
...
app.UseHttpsRedirection();
app.UseStaticFiles();
//begin Set() hack
app.Use(async delegate (HttpContext Context, Func<Task> Next)
{
    //this throwaway session variable will "prime" the Set() method
    //to allow it to be called after the response has started
    var TempKey = Guid.NewGuid().ToString(); //create a random key
    Context.Session.Set(TempKey, Array.Empty<byte>()); //set the throwaway session variable
    Context.Session.Remove(TempKey); //remove the throwaway session variable
    await Next(); //continue on with the request
});
//end Set() hack
app.UseRouting();
app.UseEndpoints(endpoints =>
{
    endpoints.MapBlazorHub();
    endpoints.MapFallbackToPage("/_Host");
});
...
...
Background Info
The info I can share here is not Blazor specific, but will help you pinpoint what's happening in your setup, as I've come across the same error myself. The error occurs when BOTH of the following criteria are met simultaneously:
Criteria 1. A request is sent to the server with no session cookie, or the included session cookie is invalid/expired.
Criteria 2. The request in Criteria 1 makes a call to Session.Set() after the response has started. In other words, if the property HttpContext.Response.HasStarted is true, and Session.Set() is called, the exception will be thrown.
Important: If Criteria 1 is not met, then calling Session.Set() after the response has started will NOT cause the error.
That is why the error only seems to happen upon first load of a page--it's because often in first loads, there is no session cookie that the server can use (or the one that was provided is invalid or too old), and the server has to spin up a new session data store (I don't know why it has to spin up a new one for Set(), that's why I say I think this is a bug). If the server has to spin up a new session data store, it does so upon the first call to Session.Set(), and new session data stores cannot be spun up after the response has started. On the other hand, if the session cookie provided was a valid one, then no new data store needs to be spun up, and thus you can call Session.Set() anytime you want, including after the response has started.
What you need to do, is make a preliminary call to Session.Set() before the response gets started, so that the session data store gets spun up, and then your call to Session.Set() won't cause the error.
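The two criteria can be summarized as a small state model. This is only a sketch of the behavior described in this answer, written in JavaScript for brevity; SessionModel and its members are hypothetical names and not part of ASP.NET Core.

```javascript
// Hypothetical model of the behavior described above, not the real session API.
class SessionModel {
  // hasValidCookie: the request arrived with a usable session cookie,
  // so the session data store already exists.
  constructor(hasValidCookie) {
    this.established = hasValidCookie;
  }

  // Models Session.Set(); responseHasStarted mirrors HttpContext.Response.HasStarted.
  set(responseHasStarted) {
    if (!this.established) {
      if (responseHasStarted) {
        // Both criteria are met: no valid cookie and the response has started.
        throw new Error("The session cannot be established after the response has started.");
      }
      // The store is spun up on the first Set() before the response starts.
      this.established = true;
    }
  }
}

// Priming: a throwaway Set() before the response starts makes later calls safe.
const primed = new SessionModel(false);
primed.set(false); // early throwaway call
primed.set(true);  // later call, after the response has started, does not throw
```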
sessionStorage has more space than cookies.
Correctly syncing sessionStorage in both directions is impossible.
You may be wondering how, if the data lives in the browser, you can access it from C#. Please see some examples. The value is read from the browser and then transferred to (and used on) the server side.
In Blazor, sessionStorage and localStorage data is encrypted when accessed through the protected browser storage classes, so we do not need to do anything extra for encryption. The same applies to serialization.

Lambda script to direct to fallback S3 domain subfolder when not found

As per this question and this one, the following piece of code allows me to point a subfolder in an S3 bucket to my domain.
However in instances where the subdomain is not found, I get the following error message:
<Error>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
<RequestId>2CE9B7837081C817</RequestId>
<HostId>
T3p7mzSYztPhXetUu7GHPiCFN6l6mllZgry+qJWYs+GFOKMjScMmRNUpBQdeqtDcPMN3qSYU/Fk=
</HostId>
</Error>
I would not like it to display this error message. Instead, in instances like this I would like to serve from another S3 bucket subdomain (e.g. example-bucket.s3-website.us-east-2.amazonaws.com/error), where the user will be greeted with a fancy error message. So in a situation where an S3 bucket subfolder is not found, it should fall back to there. How do I accomplish this by changing the node function below?
'use strict';
// if the end of incoming Host header matches this string,
// strip this part and prepend the remaining characters onto the request path,
// along with a new leading slash (otherwise, the request will be handled
// with an unmodified path, at the root of the bucket)
const remove_suffix = '.example.com';
// provide the correct origin hostname here so that we send the correct
// Host header to the S3 website endpoint
const origin_hostname = 'example-bucket.s3-website.us-east-2.amazonaws.com'; // see comments, below
exports.handler = (event, context, callback) => {
    const request = event.Records[0].cf.request;
    const headers = request.headers;
    const host_header = headers.host[0].value;
    if (host_header.endsWith(remove_suffix))
    {
        // prepend '/' + the subdomain onto the existing request path ("uri")
        request.uri = '/' + host_header.substring(0, host_header.length - remove_suffix.length) + request.uri;
    }
    // fix the host header so that S3 understands the request
    headers.host[0].value = origin_hostname;
    // return control to CloudFront with the modified request
    return callback(null, request);
};
The Lambda@Edge function is an origin request trigger -- it runs after the CloudFront cache is checked and a cache miss has occurred, immediately before the request (as it stands after being modified by the trigger code) is sent to the origin server. By the time the response arrives from the origin, this code has finished and can't be used to modify the response.
There are several solutions, including some that are conceptually valid but extremely inefficient. Still, I'll mention those as well as the cleaner/better solutions, in the interest of thoroughness.
Lambda@Edge has 4 possible trigger points:
viewer-request - when request first arrives at CloudFront, before the cache is checked; fires for every request.
origin-request - after the request is confirmed to be a cache miss, but before the request is sent to the origin server; only fires in cache misses.
origin-response - after a response (whether success or error) is returned from the origin server, but before the response is potentially stored in the cache and returned to the viewer; if this trigger modifies the response, the modified response will be stored in the CloudFront cache if cacheable, and returned to the viewer; only fires on cache misses
viewer-response - immediately before the response is returned to the viewer, whether from the origin or cache; fires for every non-error response, unless that response was spontaneously emitted by a viewer-request trigger, or is the result of a custom error document that sets the status code to 200 (a definite anti-pattern, but still possible), or is a CloudFront-generated HTTP to HTTPS redirect.
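As a rough summary of the list above, the following sketch maps a request to the triggers that fire for it. triggersFired is a hypothetical helper, and it deliberately ignores the viewer-response exceptions listed above (error responses, responses generated by a viewer-request trigger, and CloudFront's own redirects).

```javascript
// Hypothetical helper: which trigger points fire for a successful request,
// depending on whether CloudFront served it from its cache.
function triggersFired(cacheHit) {
  if (cacheHit) {
    // The origin is never contacted, so the origin-facing triggers are skipped.
    return ['viewer-request', 'viewer-response'];
  }
  return ['viewer-request', 'origin-request', 'origin-response', 'viewer-response'];
}
```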
Any of the trigger points can assume control of the signal flow, generate its own spontaneous response, and thus change what CloudFront would have ordinarily done -- e.g. if you generate a response directly from an origin-request trigger, CloudFront doesn't actually contact the origin... so what you could theoretically do is check S3 in the origin-request trigger to see if the request will succeed and generate a custom error response, instead. The AWS JavaScript SDK is automatically bundled into the Lambda@Edge environment. While technically legitimate, this is probably a terrible idea in almost any case, since it will increase both costs and latency due to extra "look-ahead" requests to S3.
Another option is to write a separate origin-response trigger to check for errors and, if one occurs, replace it with a customized response from the trigger code. But this idea also qualifies as non-viable, since that trigger will fire for all responses to cache misses, whether success or failure, increasing costs and latency, and wasting time in the majority of cases.
A better idea (cost, performance, ease-of-use) is CloudFront Custom Error Pages, which allows you to define a specific HTML document that CloudFront will use for every error matching the specified code (e.g. 403 for access denied, as in the original question). CloudFront can also change that 403 to a 404 when handling those errors. This requires that you do several things when the source of the error file is a bucket:
create a second CloudFront origin pointing to the bucket
create a new cache behavior that routes exactly that one path (e.g. /shared/errors/not-found.html) to the error file over to the new origin (this means you can't use that path on any of the subdomains -- it will always go directly to the error file any time it's requested)
configure a CloudFront custom error response for code 403 to use the path /shared/errors/not-found.html.
set Error Caching Minimum TTL to 0, at least while testing, to avoid some frustration for yourself. See my write-up on this feature but disregard the part where I said "Leave Customize Error Response set to No".
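For reference, the custom error response from the steps above might look like the following fragment of a CloudFront distribution configuration. The field names follow the CloudFront API's CustomErrorResponse shape as I understand it, and the path is the example path used above; treat this as a sketch to check against the current API reference rather than a complete configuration.

```javascript
// Sketch of one custom error response entry (not a full distribution config).
const customErrorResponse = {
  ErrorCode: 403,                                    // the error returned by the origin
  ResponsePagePath: '/shared/errors/not-found.html', // example path from the steps above
  ResponseCode: '404',                               // status code sent to the viewer instead
  ErrorCachingMinTTL: 0,                             // don't cache the error while testing
};
```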
But... that may or may not be needed, since S3's web hosting feature also includes optional Custom Error Document support. You'll need to create a single HTML file in your original bucket, enable the web site hosting feature on the bucket, and change the CloudFront Origin Domain Name to the bucket's web site hosting endpoint, which is shown in the S3 console and takes the form ${bucket}.s3-website.${region}.amazonaws.com. In some regions, the hostname might have a dash - rather than a dot . after s3-website for legacy reasons, but the dot format should work in any region.
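The endpoint form above can be written as a small helper. websiteEndpoint is a hypothetical name; the function only formats the dot-style hostname and does not check that the bucket or region exists.

```javascript
// Hypothetical helper: builds the dot-style S3 website hosting hostname.
function websiteEndpoint(bucket, region) {
  return `${bucket}.s3-website.${region}.amazonaws.com`;
}

// The origin hostname used in the question's function:
const exampleEndpoint = websiteEndpoint('example-bucket', 'us-east-2');
```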
I almost hesitate to mention one other option that comes to mind, since it's fairly advanced and I fear the description might seem quite convoluted... but you could also do the following, and it would be pretty slick, since it would allow you to potentially generate a custom HTML page for each erroneous URL requested.
Create a CloudFront Origin Group with your main bucket as the primary and a second, empty, "placeholder" bucket as secondary. The only purpose served by the second bucket is so that we give CloudFront a valid name that it plans to connect to, even though we won't actually connect to it, as may become clear, below.
When a request to the primary origin fails with one of the configured error status codes, the secondary origin is contacted. This is intended for handling the case when an origin fails, but we can leverage it for our purposes, because before actually contacting the failover origin, the same origin request trigger fires a second time.
If the primary origin returns an HTTP status code that you’ve configured for failover, the Lambda function is triggered again, when CloudFront re-routes the request to the second origin.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/high_availability_origin_failover.html#concept_origin_groups.lambda
(It would be more accurate to say "...when CloudFront is preparing to re-route the request to the second origin," because the trigger fires first.)
When the trigger fires a second time, the specific reason it fires isn't preserved, but there is a way to identify whether you're running in the first or second invocation: one of these two values will contain the hostname of the origin server CloudFront is preparing to contact:
event.Records[0].cf.request.origin.s3.domainName # S3 rest endpoints
event.Records[0].cf.request.origin.custom.domainName # non-S3 origins and S3 website-hosting endpoints
So we can test the appropriate value (depending on origin type) in the trigger code, looking for the name of the second "placeholder" bucket. If it's there, bypass the current logic and generate the 404 response from inside the Lambda function. This could be dynamic/customized HTML, such as a page that includes the requested URI, or perhaps one that varies depending on whether / or some other page is requested. As noted above, spontaneously generating a response from an origin-request trigger prevents CloudFront from actually contacting the origin. Generated responses from an origin-request trigger are limited to 1MB but that should be beyond sufficient for this use case.
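A minimal sketch of that check, assuming the secondary origin is a bucket reached at the hypothetical hostname placeholder-bucket.s3.amazonaws.com. The helper names and the HTML body are illustrative; the existing host-rewriting logic from the question would stay in the branch marked below.

```javascript
'use strict';

// Hypothetical hostname of the empty secondary ("placeholder") origin.
const PLACEHOLDER_ORIGIN = 'placeholder-bucket.s3.amazonaws.com';

// Escape characters that are significant in HTML before echoing the URI.
function escapeHtml(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (c) => map[c]);
}

// Return the hostname CloudFront is preparing to contact, for either origin type.
function originDomain(request) {
  const origin = request.origin || {};
  if (origin.s3) return origin.s3.domainName;
  if (origin.custom) return origin.custom.domainName;
  return undefined;
}

const handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;

  if (originDomain(request) === PLACEHOLDER_ORIGIN) {
    // Second invocation (failover): answer directly, so the
    // placeholder origin is never actually contacted.
    return callback(null, {
      status: '404',
      statusDescription: 'Not Found',
      headers: {
        'content-type': [{ key: 'Content-Type', value: 'text/html; charset=utf-8' }],
      },
      body: `<h1>Not found</h1><p>${escapeHtml(request.uri)}</p>`,
    });
  }

  // First invocation: the original host/path rewriting would go here.
  return callback(null, request);
};

exports.handler = handler;
```

Returning a response object instead of the request is what makes CloudFront skip the origin; returning the request unchanged lets the normal flow continue.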

How HTTP caching works in asp mvc 4

I have a controller which returns SVG images. As I wanted good performance, I decided to use caching.
From what I read on the web, once you set the last modified date with HttpContext.Response.Cache.SetLastModified(date)
you can request it from the browser using HttpContext.Request.Headers.Get("If-Modified-Since"). Compare the two dates. If they are equal, it means that the image has not been modified, and therefore you can return HttpStatusCodeResult(304, "Not Modified").
But something weird is happening, here is my code:
[OutputCache(Duration = 60, Location = OutputCacheLocation.Any, VaryByParam = "id")]
public ActionResult GetSVGResources(string id)
{
    DateTime lastModifiedDate = Assembly.GetAssembly(typeof(Resources)).GetLinkerTime();
    string rawIfModifiedSince = HttpContext.Request.Headers.Get("If-Modified-Since");
    if (string.IsNullOrEmpty(rawIfModifiedSince))
    {
        // Set Last Modified time
        HttpContext.Response.Cache.SetLastModified(lastModifiedDate);
    }
    else
    {
        DateTime ifModifiedSince = DateTime.Parse(rawIfModifiedSince);
        if (DateTime.Compare(lastModifiedDate, ifModifiedSince) == 0)
        {
            // The requested file has not changed
            return new HttpStatusCodeResult(304, "Not Modified");
        }
    }
    if (!id.Equals("null"))
        return new FileContentResult(Resources.getsvg(id), "image/svg+xml");
    else
        return null;
}
What is happening is that the function HttpContext.Response.Cache.SetLastModified(lastModifiedDate); does not set the "If-Modified-Since" value returned by the browser. In fact, HttpContext.Request.Headers.Get("If-Modified-Since") returns exactly the time when the image was returned by the previous call to return new FileContentResult(Resources.getsvg(id), "image/svg+xml");.
So my question is,
1 - What does the function HttpContext.Response.Cache.SetLastModified(lastModifiedDate) set exactly ?
2 - How can I (the server) set the "If-Modified-Since" returned by the browser?
It seems like you're muddling a bunch of related but nonetheless completely different concepts here.
OutputCache is a memory-based cache on the server. Caching something there means that while it still exists in memory and is not yet stale, the server can forgo processing the action and just return the already rendered response from earlier. The client is not involved at all.
HTTP cache is an interaction between the server and the client. The server sends a Last-Modified response header, indicating to the client when the resource was last updated. The client sends an If-Modified-Since request header, to indicate to the server that it's not necessary to send the resource as part of the response if it hasn't been modified. All this does is save a bit on bandwidth. The request is still made and a response is still received, but the actual data of the resource (like your SVG) doesn't have to be transmitted down the pipe.
Then, there's basic browser-based caching that works in concert with HTTP cache, but can function without it just as well. The browser simply saves a copy of every resource it downloads. If it still has that copy, it doesn't bother making a request to the server to fetch it again. In this scenario, a request may not even be made. However, the browser may also send a request with that If-Modified-Since header to see if the file it has is still "fresh". Then, if it doesn't get the file again from the server, it just uses its saved copy.
Either way, it's all on the client. A client could be configured to never cache, in which case it will always request resources, whether or not they've been modified, or it may be configured to always use a cache and never even bother to check if the resource has been updated or not.
There's also things like proxies that complicate things further still, as the proxy acts as the client and may choose to cache or not cache at will, before the web browser or other client of the end-user even gets a say in the matter.
What all that boils down to is that you can't set If-Modified-Since on the server and you can't control whether or not the client sends it in the request. When it comes to forms of caching that involve a client, you're at the whims of the client.
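The Last-Modified / If-Modified-Since exchange described above can be sketched as a server-side decision function. conditionalStatus is a hypothetical name, and the sketch is framework-independent JavaScript rather than ASP.NET code. One detail worth noting: HTTP dates carry whole seconds only, so the server-side timestamp is truncated to seconds before comparing.

```javascript
// Hypothetical helper: decide between 200 and 304 for a conditional GET.
// lastModified is a Date; ifModifiedSinceHeader is the raw request header (or undefined).
function conditionalStatus(lastModified, ifModifiedSinceHeader) {
  if (!ifModifiedSinceHeader) {
    return 200; // the client sent no validator, so send the full resource
  }
  const since = Date.parse(ifModifiedSinceHeader);
  if (Number.isNaN(since)) {
    return 200; // unparseable header: ignore it and send the full resource
  }
  // HTTP dates have one-second resolution, so drop the milliseconds.
  const lastModifiedSeconds = Math.floor(lastModified.getTime() / 1000) * 1000;
  return lastModifiedSeconds <= since ? 304 : 200;
}
```

The server only ever answers the header; whether the client sends it, and with what value, remains the client's decision.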

Pass data across Hapi JS application

I want to detect the currently selected language from the domain (like es.domain.com, de.domain.com), so I need to pass it to all non-static route handlers and to all views.
To detect the language I need a request object. But the global view context can only be updated where the request object is not accessible (in server.views({})). Likewise, server.bind (to pass data to route handlers) only works where the request object is not accessible.
Hapi version: 11.1.2
You could try something like this:
server.ext('onPreResponse', function (request, reply) {
    if (request.response.variety === 'view') {
        request.response.source.context.lang = request.path;
    }
    reply.continue();
});
This will attach a lang data point to the context that is being sent into the view. You'll have to extract the lang from the url as request.path is probably not what you actually want.
Also, if you look here you'll see that a few pieces of request data are made available to every view via reply.view(). If the locale/language is available directly in one of those data points, or can be derived from them, you can skip the extension point approach entirely.
Again, this is assuming version 10+ of hapi. If you're using an older version, the extension point method is your best bet.
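Extracting the language from the hostname, as mentioned above, could look something like this. langFromHostname and the SUPPORTED_LANGS list are hypothetical, and the sketch assumes the language is always the first label of the hostname (in hapi the hostname should be available as request.info.hostname, but check this for your version).

```javascript
// Hypothetical list of languages the site actually serves.
const SUPPORTED_LANGS = ['es', 'de', 'en'];

// Hypothetical helper: take the first hostname label as the language,
// falling back to a default when it is not a supported language.
function langFromHostname(hostname, fallback = 'en') {
  const firstLabel = String(hostname || '').toLowerCase().split('.')[0];
  return SUPPORTED_LANGS.includes(firstLabel) ? firstLabel : fallback;
}
```

The result could then be assigned to the view context in the onPreResponse extension in place of request.path.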

Keeping SAP's RFC data for consecutive calls of RFC using JCO

I was wondering if it was possible to keep an RFC called via JCO opened in SAP memory so I can cache stuff, this is the scenario I have in mind:
Suppose a simple function increments a number. The function starts with 0, so the first time I call it with import parameter 1 it should return 1.
The second time I call it, it should return 2 and so on.
Is this possible with JCO?
If I have the function object and make two successive calls, it always returns 1.
Can I do what I'm depicting?
Designing an application around the stability of a certain connection is almost never a good idea (unless you're building a stability monitoring software). Build your software so that it just works, no matter how often the connection is closed and re-opened and no matter how often the session is initialized and destroyed on the server side. You may want to persist some state using the database, or you may need to (or want to) use the shared memory mechanisms provided by the system. All of this is inconsequential for the RFC handling itself.
Note, however, that you may need to ensure that a sequence of calls happen in a single context or "business transaction". See this question and my answer for an example. These contexts are short-lived and allow for what you probably intended to get in the first place - just be aware that you should not design your application so that it has to keep these contexts alive for minutes or hours.
The answer is yes. In order to make it work, you need to implement two tasks:
The ABAP code needs to store its variable in the ABAP session memory. A variable in the function group's global section will do that. Or alternatively you could use the standard ABAP technique "EXPORT TO MEMORY/IMPORT FROM MEMORY".
JCo needs to keep the user session between calls. By default, JCo resets the backend-side user session after every call, which of course destroys all data stored in that user session memory. In order to prevent it, you need to use JCoContext.begin() and JCoContext.end() to get a stateful RFC connection that keeps the user session alive on backend side.
Sample code:
JCoDestination dest = ...
JCoFunction func = ...
try {
    JCoContext.begin(dest);
    func.execute(dest); // Will return "1"
    func.execute(dest); // Will return "2"
}
catch (JCoException e) {
    // Handle network problems, ABAP exceptions, SYSTEM_FAILUREs
}
finally {
    // Make sure to release the stateful connection, otherwise you have
    // a resource-leak in your program and on backend side!
    JCoContext.end(dest);
}