Can Flink send files with sink-specific S3 Server Side Encryption headers? - amazon-s3

Trying to send records to Amazon S3 with Flink; however, these records need to be sent with an AES256 SSE header to request server-side encryption.
See the AWS documentation:
If you need server-side encryption for all of the objects that are stored in a bucket, use a bucket policy. For example, the following bucket policy denies permissions to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption:
Is this something that can be set for specific file sinks? I have not found any documentation on the matter and am beginning to think a forwarding Lambda will be needed to transform the data.
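For reference, whatever component ends up writing the objects (a custom sink function or the forwarding Lambda mentioned above), the x-amz-server-side-encryption: AES256 header is what such a bucket policy checks, and it can be attached per request with the AWS SDK for Java. A minimal sketch, assuming SDK v1 and placeholder bucket/key/file names:

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class SsePutSketch {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Setting the SSE algorithm on the object metadata makes the SDK send
        // the x-amz-server-side-encryption: AES256 header with the PUT request,
        // which satisfies a deny-unless-encrypted bucket policy.
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setSSEAlgorithm(ObjectMetadata.AES_256_SERVER_SIDE_ENCRYPTION);

        PutObjectRequest request =
                new PutObjectRequest("example-bucket", "records/part-0001.json",
                        new File("part-0001.json"))
                        .withMetadata(metadata);

        s3.putObject(request);
    }
}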

Related

Architecture to upload files via API gateway

I am working on designing a system to upload files to the server. The request to upload a file must go through the API gateway. The request will be a REST API POST request and the request body is file-type form-data (i.e. the location of the file to upload). The upload of a single file should be replicated on a quorum of file servers. For example, if I have 3 file servers, the client should be acknowledged of a successful upload only after the file has been written to at least 2 file servers. The actual file upload (data transfer) should happen directly between the client and the file servers and not through the API gateway (or any proxy server in the path).
My solution - the API gateway returns the list of file server URLs to write to, and the client library orchestrates the uploads and makes sure that the upload happens on a quorum of file servers. But this creates a thick client which contains all the orchestration logic and is hard to maintain across different languages.
Is there any better way to solve this? How is this done in production systems? For example, AWS S3/Azure Blob Storage or any other production-grade system must be sending the request to an API gateway (or proxy) first; how are they handling this?
It looks like you're trying to build a serverless solution, which I'm not an expert on.
One way I can think of is to use an S3 bucket as a proxy (I know you said no proxy server, but you did mention S3 🤷🏼‍♂️) storage server. You can then set up a Lambda function to act on S3 upload completion. That Lambda function will then be responsible for uploading the S3 object to whichever file servers will be hosting the uploaded file.
At least this way, the client only needs to be concerned with uploading the file once. If the client needs to check whether at least 2 file servers have the file, it can poll using HEAD requests, since it would have the endpoint URLs from the initial request.
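A rough sketch of that polling step in Java (the replica URLs are hypothetical placeholders for whatever endpoints the gateway returns; quorum here means 2 of 3):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class QuorumCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical file-server URLs handed back by the API gateway.
        List<String> replicas = List.of(
                "https://fs1.example.com/files/report.pdf",
                "https://fs2.example.com/files/report.pdf",
                "https://fs3.example.com/files/report.pdf");

        HttpClient client = HttpClient.newHttpClient();
        int found = 0;
        for (String url : replicas) {
            // HEAD request: we only care whether the object exists, not its body.
            HttpRequest head = HttpRequest.newBuilder(URI.create(url))
                    .method("HEAD", HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<Void> response =
                    client.send(head, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() == 200) {
                found++;
            }
        }

        System.out.println("Replicas holding the file: " + found
                + ", quorum reached: " + (found >= 2));
    }
}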
I'm not sure if this is a workable solution for you. If not, hopefully someone with more experience in serverless architecture can give you a better answer.

aws s3 and cloudfront process cookie

I have a problem with how S3 and CloudFront process cookies.
My web client is hosted on S3 behind CloudFront, and the server runs on Elastic Beanstalk (Node.js + Express).
I need to send a JWT token from the server to the client through a cookie (code: res.cookie(token)).
It all works locally, but after I deploy on AWS, I cannot get any cookie from the server. I found that the AWS docs say:
"Amazon S3 and some HTTP servers don’t process cookies. Don’t
configure CloudFront to forward cookies to an origin that doesn’t
process cookies, or you’ll adversely affect cacheability and,
therefore, performance."
Is there any possible solution by revising the CloudFront cookie settings, or do I need to change my server code so that it does not send the token in a cookie?

google cloud storage transfer tls minimum version

We have a number of Google Cloud Storage Transfer jobs that sync from AWS S3 buckets to Google buckets. I am assuming that they use HTTPS to transfer the data, but where can I get confirmation that they do? Where can I get information about the minimum TLS version used in these transfer jobs?
Regarding Cloud Storage TLS: in this document you can find the TLS information for the gsutil commands, whose requests are made via the JSON API. These requests are HTTPS only, and the same API is used within the Cloud Console too.
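If you also want to confirm what your own clients negotiate against a storage endpoint, a quick check is to open a TLS connection and print the protocol. A minimal sketch in Java (this only shows what this client negotiates, not what the managed Storage Transfer Service uses internally; the endpoint host is an assumption):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsCheck {
    public static void main(String[] args) throws Exception {
        String host = "storage.googleapis.com"; // JSON API endpoint, used here as an example
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, 443)) {
            socket.startHandshake();
            // Prints e.g. "TLSv1.3" or "TLSv1.2", depending on what both sides support.
            System.out.println("Negotiated protocol: " + socket.getSession().getProtocol());
        }
    }
}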

Is the S3 Protocol encrypted in transmission when using the SDK?

If I'm using the AmazonS3Client to put and fetch files, is my connection encrypted? This seems basic, but my googling seems to return things about encrypting S3 storage and not whether the transmission from this client is secure. If it's not secure, is there a setting to make it secure?
Amazon S3 endpoints support both HTTP and HTTPS. It is recommended that you communicate via HTTPS to ensure your data is encrypted in transit.
You can also create a Bucket Policy that enforces communication via HTTPS. See:
Stackoverflow: Force SSL on Amazon S3
Sample policy: s3BucketPolicyEncryptionSSL.json
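As a hedged illustration with the AWS SDK for Java v1 (which the AmazonS3Client in the question suggests): the SDK already defaults to HTTPS endpoints, but you can make the intent explicit in the client configuration. The bucket, key, and region below are placeholders:

import com.amazonaws.ClientConfiguration;
import com.amazonaws.Protocol;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class HttpsS3Client {
    public static void main(String[] args) {
        // HTTPS is the SDK default; setting it explicitly documents the intent.
        ClientConfiguration config = new ClientConfiguration().withProtocol(Protocol.HTTPS);

        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion("us-east-1")
                .withClientConfiguration(config)
                .build();

        // The object is encrypted in transit by TLS on its way to S3.
        s3.putObject("example-bucket", "hello.txt", "hello over TLS");
    }
}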

Understanding server architecture: Delivering content from AWS S3 using Nginx reverse-proxy or Apache server

The purpose of this question is to understand the strategy for designing server-side architecture.
Use case:
I want to build an HTTP server for an app that allows users to upload and download multimedia content (images, videos, etc.). A large number of concurrent users (say, around 50k) is expected to upload/download the content.
All the content will be stored in an AWS S3 bucket. Information regarding the S3 bucket, i.e. the bucket name/authentication headers, should be masked from the user. Since there are multiple access control options (AWS ACLs) for an S3 bucket, it would be preferable to refrain from making the bucket available to All_Users (authenticated and anonymous users). I do not want to expose the content to the public domain.
Queries
1. Since I want to mask AWS S3 from the users, I will need to use a web server or reverse proxy. I have gone through multiple resources that compare Apache vs. Nginx. Since the server needs to deliver static content from S3 to a high number of concurrent users, Nginx seems to be the better option. Isn't it?
2. Does setting the access control level of the S3 bucket to ALL_USERS (authenticated and anonymous users) compromise data privacy? If I use a reverse proxy, there is no way for the user to determine the S3 bucket URLs. Is the data safe and private?
3. However, if the S3 bucket is made available to authenticated users only, will the Nginx reverse proxy work? I have gone through Nginx Reverse Proxy for S3. In order for Nginx to work as a reverse proxy, a pre-signed URL needs to be prepared. The expiry time of the pre-signed URL is again a tricky decision. Does setting a huge expiry time for the pre-signed URL make sense? Does it compromise the security or privacy of the data (similar to S3 access control for ALL_USERS)? If yes, is there a way to reverse-proxy the request to a dynamically generated pre-signed URL (with a short expiry time) via Nginx only?
Any information and resources to consolidate my understanding will be really helpful.
Does setting the access control level of the S3 bucket to ALL_USERS (authenticated and anonymous users) compromise data privacy?
Absolutely. Don't do it.
If I use a reverse proxy, there is no way for the user to determine the S3 bucket URLs. Is the data safe and private?
Theoretically, they can't determine it, but what if an error message or misconfiguration leaks the information? This is security through obscurity, which gives you nothing more than a false sense of security. There's always a better way.
Information regarding the S3 bucket, i.e. the bucket name/authentication headers, should be masked from the user.
The authentication mechanism of S3, with signed URLs, is designed so that there is no harm in exposing it to the user. The only thing secret is your AWS Secret Key, which you'll note is not exposed in a signed URL. It also can't reasonably be reverse-engineered, and a signed URL is good for only the resource and action that the signature permits.
Signing URLs and presenting them to the user does not pose a security risk, although, admittedly, there are other reasons why you might not want to do that. I do that routinely -- signing a URL while a page is being rendered, with a relatively long expiration time, or signing a URL and redirecting a user to the signed URL when they click on a link back to my application server (which validates their authorization to access the resource, and then returns a signed URL with a very short expiration time, such as 5 to 10 seconds; the expiration can occur while a download is in progress without causing a problem -- the signature only needs to avoid expiring before the request to S3 is accepted).
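A minimal sketch of that redirect flow with the AWS SDK for Java v1 (bucket and key are placeholders; the 10-second expiry mirrors what's described above):

import java.util.Date;

import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class ShortLivedLink {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Expire 10 seconds from now; the signature only has to be valid when
        // S3 accepts the request, not for the whole duration of the download.
        Date expiration = new Date(System.currentTimeMillis() + 10_000L);

        GeneratePresignedUrlRequest request =
                new GeneratePresignedUrlRequest("example-bucket", "videos/cat.mp4")
                        .withMethod(HttpMethod.GET)
                        .withExpiration(expiration);

        // The application server would 302-redirect the authorized user here.
        System.out.println(s3.generatePresignedUrl(request));
    }
}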
However, if you want to go the proxy route (which, in addition to the above, is something I do in my systems as well), there's a much easier way than what you're envisioning: the bucket policy can be configured to grant specific permissions based on source IP addresses... of your servers.
Here's a (sanitized) policy taken directly from one of my buckets. The IP addresses are from RFC-5737 to avoid the confusion that private IP addresses in this example would cause.
These IP addresses are public IP addresses... they would be your elastic IP addresses attached to your web servers, or, preferably, to the NAT instances that the web servers use for their outgoing requests.
{
  "Version": "2008-10-17",
  "Id": "Policy123456789101112",
  "Statement": [
    {
      "Sid": "Stmt123456789101112",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-bucket/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "203.0.113.173/32",
            "203.0.113.102/32",
            "203.0.113.52/32",
            "203.0.113.19/32"
          ]
        }
      }
    }
  ]
}
What does this do? If a request arrives at S3 from one of the listed IP addresses, the GetObject permission is granted to the requester. With a proxy, your proxy's IP address will be the IP address seen by S3, and the request will be granted if it matches the bucket policy, allowing your proxies to fetch objects from S3 while not allowing the rest of the Internet to, unless alternate credentials are presented, such as with a signed URL. This policy doesn't "deny" anything directly, because the deny is implicit. Importantly, don't upload your objects with the public-read ACL, because that would allow the objects to be downloaded by anyone. The default private ACL works perfectly for this application.
S3 can grant permissions like this based on other criteria, such as the Referer: header, and you may find examples of that online, but don't do that. Trusting what the browser reports as the referring page is an extremely weak and primitive security mechanism that provides virtually no real protection -- headers are incredibly simple to spoof. That sort of filtering is really only good for annoying lazy people who are hot-linking to your content. The source IP address is a different matter altogether, as it's not carried in a layer 7 header, and cannot be readily spoofed.
Because S3 only interacts with the Internet via the TCP protocol, your source addresses -- even if it were known how you had enabled the bucket to trust these addresses -- cannot be spoofed in any practical way, because to do so would mean breaching the security of AWS's core IP network infrastructure -- TCP requires the originating machine to be reachable across subnet boundaries at the source IP address it uses, and the AWS network would only ever route those responses back to your legitimately-allocated IP address, which would have no option other than to reset or discard the connections, since it did not initiate them.
Note that this solution does not work in conjunction with S3 VPC endpoints which Amazon recently announced, because with S3 VPC endpoints, your source IP address (seen by S3) will be the private address, which isn't unique to your VPC... but that should not be a problem. I mention this caveat only in the interest of thoroughness. S3 VPC endpoints are not required and not enabled by default, and if enabled, can be provisioned on a per-subnet basis.