Architecture to upload files via API gateway - amazon-s3

I am working on designing a system to upload files to a server. The request to upload a file must go through the API gateway. The request will be a REST POST request whose body is form-data of type file (i.e., the location of the file to upload). The upload of a single file should be replicated across a quorum of file servers. For example, if I have 3 file servers, the client should receive an acknowledgment of a successful upload only after the file has been written to at least 2 file servers. The actual file upload (data transfer) should happen directly between the client and the file servers, not through the API gateway (or any proxy server in the path).
My solution: the API gateway returns the list of file server URLs to write to, and the client library orchestrates the uploads and makes sure the file lands on a quorum of file servers. But this creates a thick client that contains all the orchestration logic and is hard to maintain across different languages.
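To make the thick-client problem concrete, here is a rough sketch of the orchestration the client library would have to carry (the gateway endpoint and its response shape are hypothetical):

    import requests

    def upload_with_quorum(file_path, gateway_url, quorum=2):
        # Hypothetical gateway call: returns e.g. {"upload_urls": ["https://fs1/...", ...]}
        resp = requests.post(f"{gateway_url}/files", json={"name": file_path})
        upload_urls = resp.json()["upload_urls"]

        with open(file_path, "rb") as f:
            data = f.read()

        successes = 0
        for url in upload_urls:
            try:
                if requests.put(url, data=data, timeout=30).ok:
                    successes += 1
            except requests.RequestException:
                pass  # tolerate a single file server being down
            if successes >= quorum:
                return True  # acknowledge the caller once the quorum is reached
        return False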
Is there a better way to solve this? How is this done in production systems? For example, AWS S3, Azure Blob Storage, or any other production-grade system must send the request to an API gateway (or proxy) first; how do they handle this?

It looks like you're trying to build a serverless solution, which I'm no expert on.
One way I can think of is to use an S3 bucket as a proxy storage server (I know you said no proxy server, but you did mention S3 🤷🏼‍♂️). You can then set up a Lambda function to fire when the S3 upload completes. That Lambda function is then responsible for copying the S3 object to whichever file servers will be hosting the uploaded file.
At least this way, the client only needs to worry about uploading the file once. If the client needs to check that at least 2 file servers have the file, it can poll with HEAD requests, since it would have the endpoint URLs from the initial request.
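A rough sketch of that polling, assuming each file server exposes the uploaded object at a known URL (purely illustrative):

    import time
    import requests

    def replicated_to_quorum(object_urls, quorum=2, attempts=10, delay=2.0):
        # Poll the file servers with HEAD requests until `quorum` of them report the object.
        for _ in range(attempts):
            present = 0
            for url in object_urls:
                try:
                    if requests.head(url, timeout=5).status_code == 200:
                        present += 1
                except requests.RequestException:
                    pass
            if present >= quorum:
                return True
            time.sleep(delay)
        return False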
I'm not sure if this is a workable solution for you. If not, hopefully someone with more experience in serverless architecture can give you a better answer.

Related

Apollo Studio is not working after running Apollo Server with Google Cloud load balancing

I am unable to connect to my Apollo (GraphQL) server through Apollo Studio (https://studio.apollographql.com/sandbox/explorer) or the Apollo Client library on the frontend. But the server works fine when a request is sent through Postman, the graphql-request library, or a cURL request.
Details of Deployment:
The server is deployed on GCP instance groups, which include 4 instances in two different regions. I have used Nginx as a reverse proxy to forward traffic to localhost:4000 on each instance (the app runs on port 4000 of each machine).
The instance groups are attached to the GCP HTTPS load balancer. The backends are in the healthy state in the load balancer.
Apollo Studio - not working
Postman - working
If it's working in Postman but not in Studio, it's generally either an issue with CORS, some other header issue, or something similar to that.
Studio runs in a browser, so things will be a bit more finicky. It will send headers that browsers always send, like the origin it's running on, and those that Apollo decides are best, like certain Accept / Content-Type headers that your load balancer might not be allowing through.
Things like Postman and cURL generally come with less "baggage". They only send the headers and content you ask them to.
The best thing to check next is what your browser thinks is going wrong, since servers won't "lie" about the problem unless you specifically tell them to (e.g., for security reasons, some information is sometimes best left out). Open your browser's developer tools on the Studio website when you try to make a request and check the Network panel. The HTTP call will fail in a characteristic way if it's one of these issues, and it should be pretty straightforward to see that it was rejected because of X.
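If you want to see the same thing outside the browser, you can simulate the preflight request that Studio's origin triggers and inspect the CORS headers that come back (the GraphQL URL below is a placeholder for your load balancer):

    import requests

    # Simulate the browser's CORS preflight for Apollo Studio (placeholder URL).
    resp = requests.options(
        "https://your-load-balancer.example.com/graphql",
        headers={
            "Origin": "https://studio.apollographql.com",
            "Access-Control-Request-Method": "POST",
            "Access-Control-Request-Headers": "content-type",
        },
    )
    print(resp.status_code)
    # The server (or the load balancer in front of it) must send these back,
    # otherwise the browser will block the real request.
    print(resp.headers.get("Access-Control-Allow-Origin"))
    print(resp.headers.get("Access-Control-Allow-Headers"))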

Can Flink send files with sink specific S3 Server Side Encryption Headers?

I am trying to send records to Amazon S3 with Flink; however, these records need to be sent with an AES256 SSE header to request server-side encryption.
See the AWS documentation:
If you need server-side encryption for all of the objects that are stored in a bucket, use a bucket policy. For example, the following bucket policy denies permissions to upload an object unless the request includes the x-amz-server-side-encryption header to request server-side encryption:
Is this something that can be set for specific file sinks? I have not found any documentation on the matter and am beginning to think a forwarding Lambda will be needed to transform the data.
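For reference, this is the header each upload needs to carry; a plain boto3 upload that satisfies such a bucket policy looks like the sketch below (bucket and key are placeholders). What I'm after is how to make the Flink sink send the equivalent header.

    import boto3

    s3 = boto3.client("s3")
    # The bucket policy checks for the x-amz-server-side-encryption header;
    # boto3 sets it via the ServerSideEncryption parameter.
    s3.put_object(
        Bucket="my-encrypted-bucket",   # placeholder
        Key="records/part-0000.txt",    # placeholder
        Body=b"some record data",
        ServerSideEncryption="AES256",
    )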

How exactly do I allow an on-prem client app to access S3 objects when the VPC is using Direct Connect?

When a client app is on-prem and AWS is set up with Direct Connect to the corporate on-prem network, how exactly can the client app gain access to the S3 objects?
For example, suppose a client app simply wants to obtain jpg images which live in an S3 bucket.
What type of configuration do I need to make to the S3 bucket permissions?
What configuration do I need to do at the VPC level?
I'd imagine that since Direct Connect is set up, this would greatly simplify an on-prem app gaining access to an S3 bucket. Correct?
Would VPC endpoints come in to play here?
Also, one constraint here: the client app is not within my control. It simply needs a URL it can reach for the image, and it cannot easily be changed to send credentials in the request, unfortunately. This may be a very important constraint worth mentioning.
Any insight is appreciated. Thank you so much.
You might want to consider these:
https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/
https://aws.amazon.com/premiumsupport/knowledge-center/s3-private-connection-no-authentication/
And for troubleshooting, try this:
https://aws.amazon.com/premiumsupport/knowledge-center/connect-s3-vpc-endpoint/
If you need to access S3 over Direct Connect, see: S3-DirectConnect
I had a very similar issue to solve, also searching, like you, for how to force the client to use Direct Connect to download content from S3.
In my case, the client was an internet-facing on-prem load balancer that needed to serve content hosted on S3 (CloudFront was not possible).
The two articles already mentioned are important to take into account, but they are not sufficient:
Direct Connect over a private virtual interface:
https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-access-direct-connect/
=> needed to set up all the VPC endpoints and routing between on-prem and AWS.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html#accessing-bucket-and-aps-from-interface-endpoints
=> partially explains how to access a bucket using VPC endpoints.
The missing information from the latter AWS page is what URL structure you need to use to connect to your S3 endpoint; here is the structure I found to work:
https://bucket.[vpc-endpoint-id].s3.[region].vpce.amazonaws.com/[bucket-name]/[key]
With that scheme, you can address any object in an S3 bucket through the S3 VPC endpoint using a normal web request.
We use that approach to securely serve files hosted in an S3 bucket via our on-prem load balancer and a specific domain name over our Direct Connect capacity.
The LB just rewrites the URL and gets the files directly from the S3 bucket. The real client doesn't even know that the file is actually served from S3 on the backend.
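To illustrate, a small sketch of fetching an object through that URL scheme (the endpoint ID, region, bucket, and key are placeholders, and the bucket must of course allow access from the VPC endpoint):

    import requests

    vpce_id = "vpce-0123456789abcdef0"  # placeholder interface endpoint ID
    region = "eu-west-1"                # placeholder
    bucket = "my-bucket-name"           # placeholder
    key = "images/photo.jpg"            # placeholder

    url = f"https://bucket.{vpce_id}.s3.{region}.vpce.amazonaws.com/{bucket}/{key}"
    resp = requests.get(url, timeout=10)  # routed over Direct Connect / the VPC endpoint
    resp.raise_for_status()
    with open("photo.jpg", "wb") as f:
        f.write(resp.content)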

How to Configure SSL for Amazon S3 bucket

I am using an Amazon S3 bucket for uploading and downloading data with my .NET application. Now my question is: I want to access my S3 bucket using SSL. Is it possible to implement SSL for an Amazon S3 bucket?
You can access your files via SSL like this:
https://s3.amazonaws.com/bucket_name/images/logo.gif
If you use a custom domain for your bucket, you can use S3 and CloudFront together with your own SSL certificate (or generate a free one via AWS Certificate Manager): http://aws.amazon.com/cloudfront/custom-ssl-domains/
Custom domain SSL certs were just added today for $600/cert/month. Sign up for your invite below:
http://aws.amazon.com/cloudfront/custom-ssl-domains/
Update: SNI customer-provided certs are now available at no additional charge. That's much cheaper than $600/month, and with Windows XP nearly killed off, it should work well for most use cases.
@skalee AWS has a mechanism for achieving what the poster asks for, "implement SSL for an Amazon S3 bucket": it's called CloudFront. I'm reading "implement" as "use my own SSL certs," not "just put an S on the HTTP URL," which I'm sure the OP could have surmised.
Since CloudFront costs the same as S3 ($0.12/GB), but has a ton of additional features around SSL and allows you to add your own SNI cert at no additional cost, it's the obvious fix for "implementing SSL" on your domain.
I found you can do this easily via the Cloudflare service.
Set up a bucket, enable web hosting on the bucket, and point the desired CNAME to that endpoint via Cloudflare... and pay for the service, of course... but $5-$20 vs. $600 is much easier to stomach.
Full detail here:
https://www.engaging.io/easy-way-to-configure-ssl-for-amazon-s3-bucket-via-cloudflare/
It is not possible directly with S3, but you can create a CloudFront distribution in front of your bucket. Then go to Certificate Manager and request a certificate; Amazon issues them for free. Once you have successfully validated the certificate, assign it to your CloudFront distribution. Also remember to set the rule to redirect HTTP to HTTPS.
I'm hosting a couple of static websites on Amazon S3, like my personal website, to which I have assigned an SSL certificate since they have CloudFront distributions.
If you really need it, consider redirections.
For example, on a request to assets.my-domain.example.com/path/to/file you could perform a 301 or 302 redirection to my-bucket-name.s3.amazonaws.com/path/to/file or s3.amazonaws.com/my-bucket-name/path/to/file (please remember that in the first case my-bucket-name cannot contain any dots, otherwise it won't match the *.s3.amazonaws.com wildcard in the S3 certificate).
I haven't tested it, but I believe it would work. I see a few gotchas, however.
The first one is pretty obvious: an additional request to get the redirection. And I doubt you could use a redirection server provided by your domain name registrar (you'd have to upload the proper certificate there somehow), so you have to use your own server for this.
The second one is that you can have URLs with your domain name in the page source code, but when, for example, a user opens the picture in a separate tab, the address bar will display the target URL.
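To make the idea concrete, here is a minimal sketch of such a redirecting server (illustrative only; the bucket name is from the example above and the port is arbitrary):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    S3_BASE = "https://s3.amazonaws.com/my-bucket-name"  # bucket from the example above

    class RedirectHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Redirect assets.my-domain.example.com/path/to/file
            # to s3.amazonaws.com/my-bucket-name/path/to/file.
            self.send_response(302)
            self.send_header("Location", S3_BASE + self.path)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), RedirectHandler).serve_forever()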
As mentioned before, you cannot attach free certificates to S3 buckets directly. However, you can create a CloudFront distribution and then assign the certificate to CloudFront instead. You request the certificate for your domain and then just assign it to the CloudFront distribution in the CloudFront settings. I've used this method to serve static websites via SSL as well as to serve static files.
For static website hosting, Amazon is the go-to place. It is really affordable to get a static website with SSL.

Passing large files to WCF service

We have an encryption service that we've exposed over net.tcp. Most of the time, the service is used to encrypt/decrypt strings. However, every now and then, we need to encrypt large documents (PDF, JPG, BMP, etc.).
What are the best endpoint settings for a scenario like this? Should I accept/return a stream? I've read a lot about this, but no one gives guidance on what to do when large files don't occur frequently.
MSDN describes how to enable streaming over WCF rather well.
Note, if the link between client and server needs to be encrypted, then you'll need to "roll your own" encryption mechanism. The default net.tcp encryption requires X.509 certificates, which won't work with streams as this kind of encryption needs to work on an entire message in one go rather than a stream of bytes.
This, in turn, means that you won't be able to authenticate the client using the default WCF security mechanisms as authentication requires encryption. The only work-around for this that I know of is to implement your own custom behaviour extensions on client and server to handle authentication.
A really good reference on how to add custom behaviour extensions is here: this documents how to provide custom configuration, too (something that I don't think is discussed anywhere in the MSDN documents at this time).
One pattern you could follow is to have an asynchronous service that works on files in a shared file system location (see the sketch after the list):
1. Place the file to be encrypted in the shared location.
2. Call the service and tell it to encrypt the file, passing both the location and name of the file and the address of a callback service on the client.
3. The service encrypts the file and places the encrypted copy in a shared location (the same as where the unencrypted file was placed or a different one; it doesn't matter).
4. The service calls back to the client, giving the name and location of the encrypted file.
5. The client retrieves the encrypted file.
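To illustrate the flow on the service side (this is not WCF code, just a rough language-agnostic sketch; the shared path and callback API are made up):

    import os
    import shutil
    import requests

    SHARED_DIR = "/mnt/shared"  # hypothetical shared file-system location

    def encrypt_file(path):
        # Stand-in for the real encryption; here it just copies the file.
        encrypted_path = path + ".enc"
        shutil.copyfile(path, encrypted_path)
        return encrypted_path

    def handle_encrypt_request(file_name, callback_url):
        # Service-side handler: encrypt a file from the shared location,
        # then notify the client via its callback endpoint (hypothetical API).
        source = os.path.join(SHARED_DIR, file_name)
        encrypted = encrypt_file(source)
        requests.post(callback_url, json={"encrypted_file": encrypted})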