How does Twitter do multipart file upload from the client app to its storage? - file-upload

I am not speaking from the perspective of a user uploading a file to Twitter. Rather, I want to build a Twitter-like app and upload a file from my client app -> to my server -> to my S3 storage.
My current approach is to break the file up into chunks and upload them to my web server; from the web server I then upload the file to S3 storage.
The part I am unsure about is how I should store the file on the web server before it goes to S3. Currently I hold it in memory and, once the upload is complete, send it to S3. This consumes too much memory. The other approach is to store the file locally on the web server and then upload it to S3. Is this how a high-traffic site usually does it?
This is an architecture question: how is this done for a high-traffic website like Twitter?
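To make the question concrete, here is a minimal sketch of the relay I have in mind, written with boto3's multipart-upload calls (the bucket and key names are placeholders): each chunk is forwarded to S3 as it arrives, so the web server never has to hold the whole file in memory or on disk.

    import boto3

    s3 = boto3.client("s3")

    def relay_to_s3(chunks, bucket, key):
        # Forward incoming chunks straight into an S3 multipart upload.
        # Each part must be at least 5 MB, except the last one.
        upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
        parts = []
        try:
            for number, chunk in enumerate(chunks, start=1):
                resp = s3.upload_part(
                    Bucket=bucket, Key=key,
                    UploadId=upload["UploadId"],
                    PartNumber=number, Body=chunk,
                )
                parts.append({"PartNumber": number, "ETag": resp["ETag"]})
            s3.complete_multipart_upload(
                Bucket=bucket, Key=key,
                UploadId=upload["UploadId"],
                MultipartUpload={"Parts": parts},
            )
        except Exception:
            # Abort so unfinished parts do not keep accruing storage charges.
            s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload["UploadId"])
            raise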

Related

Google cloud storage compatibility with aws s3 multipart upload

Okay, I have working apps that use Amazon S3 multipart upload; they use CreateMultipartUpload, UploadPart and CompleteMultipartUpload.
Now we are migrating to Google Cloud Storage and we have a problem with multipart. As far as I understand, Google doesn't support the S3 multipart API; I got that from here: Google Cloud Storage support of S3 multipart upload.
The closest method I see on Google's side is Compose https://cloud.google.com/storage/docs/composite-objects, where I upload separate objects and then send a request to combine them. There is also uploadType=multipart https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload#resumable, but this seems to be something completely different from S3 multipart. And there are resumable uploads https://cloud.google.com/storage/docs/resumable-uploads, which seem to allow uploading a file in chunks, but without a complete-multipart step.
What is the best option to use? Some services already use CreateMultipartUpload, UploadPart and CompleteMultipartUpload, and I need to write an "adapter" for these services to make them compatible with Google Cloud Storage.
Update: the answer below is no longer correct. GCS now supports multipart uploads: https://cloud.google.com/storage/docs/xml-api/post-object-multipart
You are correct. Google Cloud Storage does not currently support multipart upload.
The main benefits of multipart upload are allowing multiple streams to upload in parallel from one or more machines and allowing a partial upload failure not to ruin the whole upload. The best way to get those same benefits with GCS is to upload the parts as separate objects and then use Compose to combine them into a final object. Indeed, this is exactly what the gsutil command-line utility does when uploading in parallel.
Resumable uploads are a great tool if you want to upload a single object in a single stream, in order, and you want the ability to resume if the connection is lost.
"uploadtype=multipart" uploads are a bit different. They are a way to specify an object's complete metadata and also its data in a single upload operation, using an HTTP multipart request.

Cloud Storage customer access best practices

Let's say I have a use case where users can buy mp3 files inside an app. The objects are stored in GCP Cloud Storage. What is the best practice to deliver those objects only to the users that purchased the files?
After researching the topic I came up with three solutions:
1. The client calls a REST service (e.g. one running inside App Engine). This service downloads the files from Cloud Storage and then sends them back to the client.
2. Instead of sending the files via the REST call, I could send the download URL (from Cloud Storage) to the client. This would be more cost-efficient; however, it sounds like a security concern to me, as anyone who simply monitors their network could capture the URL.
3. Creating a (time-limited) signed URL to allow the user to download the file.
Obviously a permission check would have to happen first, e.g. against a database that records whether user X purchased mp3 Y.
This problem could also be applied to Azure Blob Storage or AWS S3...
In your use case, you have some constants:
You need a backend to authenticate the user (for example, authentication performed with Cloud Identity Platform and hosted on App Engine or Cloud Run).
You need to check the list of MP3s the user has bought (stored in Firestore, for example).
And then you need to allow the user to download the file. For this last point I recommend generating a signed URL. Download URLs exist only in the Firebase world (maybe your project is a Firebase project?), but they are the same thing as signed URLs. Finally, I don't recommend proposal #1. It will work, but if a download takes a long time (because the network is poor) the connection will be interrupted after 60 seconds, and it keeps your App Engine instance up for nothing (which you will pay for...).
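A minimal sketch of that flow (the bucket name and the user_has_purchased check are hypothetical placeholders; the check would typically query Firestore or another database):

    import datetime
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("purchased-mp3s")  # placeholder bucket name

    def user_has_purchased(user_id, object_name):
        # Placeholder: in a real app this would look up the user's
        # purchases in Firestore (or any other database).
        return True

    def download_url_for(user_id, mp3_object_name):
        if not user_has_purchased(user_id, mp3_object_name):
            raise PermissionError("user has not bought this file")

        # Short-lived signed URL: the client downloads directly from
        # Cloud Storage, so the backend is not tied up serving bytes.
        blob = bucket.blob(mp3_object_name)
        return blob.generate_signed_url(
            version="v4",
            expiration=datetime.timedelta(minutes=15),
            method="GET",
        )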

Migrate videos from Vimeo to S3

I have a large quantity of videos on my Vimeo account that I would like to migrate to my AWS S3 account.
Rather than go through the time-consuming process of downloading from Vimeo to my local machine and then uploading from my local machine to S3, is there a way to do a direct transfer from Vimeo to S3?
If possible, I would want to create a script that iterates through each video via the Vimeo API, sets up the path to where it would go in S3, and then initiates a direct transfer. Any ideas or suggestions would be much appreciated!
If you have a PRO account or higher, you can use the API to get download links for videos on your account, including download links for the original source file. Those download links can be used to import the videos into S3. Note that the links provided via the Vimeo API are expiring HTTP 302 redirects to the video file resource, so make sure you take note of the expiration time also provided in the response.
Download links are returned with the rest of a video's metadata, so I suggest using the fields parameter to only return the metadata needed.
http://developer.vimeo.com/api/common-formats#json-filter
https://developer.vimeo.com/api/reference/videos#GET/users/{user_id}/videos
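A sketch of what that script could look like (assuming Python with requests and boto3; the token and bucket are placeholders, the /me/videos endpoint is used instead of /users/{user_id}/videos, and the exact shape of the download array may need adjusting to what your account returns):

    import boto3
    import requests

    VIMEO_TOKEN = "your-access-token"   # placeholder personal access token
    BUCKET = "my-video-archive"         # placeholder S3 bucket
    s3 = boto3.client("s3")

    def migrate_my_videos():
        # Page through the account's videos, requesting only the fields we need.
        url = "https://api.vimeo.com/me/videos?fields=uri,name,download&per_page=50"
        headers = {"Authorization": f"Bearer {VIMEO_TOKEN}"}
        while url:
            page = requests.get(url, headers=headers).json()
            for video in page["data"]:
                links = video.get("download") or []
                # Prefer the original source file if it is offered.
                source = next((d for d in links if d.get("quality") == "source"), None)
                if source is None and links:
                    source = links[0]
                if source is None:
                    continue
                key = "vimeo" + video["uri"].replace("/videos", "") + ".mp4"
                # Stream the (expiring) download link straight into S3 so
                # nothing is written to the local disk.
                with requests.get(source["link"], stream=True) as resp:
                    resp.raise_for_status()
                    s3.upload_fileobj(resp.raw, BUCKET, key)
            next_page = page.get("paging", {}).get("next")
            url = f"https://api.vimeo.com{next_page}" if next_page else None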

Storing a remote hosted image on S3 directly using java sdk

I know I can download the image to my server and then upload it again to S3 or any other cloud hosting service, but is there any way to store the image asset directly on S3 by supplying the URL of the asset instead of a file? I don't want to add an unnecessary download and upload on my server.
Note: I am assured that the URI will be up 99.9% of the time and that the image file will be there. And I am OK with using services other than S3 if they have such a feature.
No. There is no API call for Amazon S3 that will retrieve content from another location.
You must supply the content as part of the API call.

Amazon S3: how to use/load an mp3 file simultaneously

I'm using Amazon S3 to store some mp3 files.
My web application, uses the Soundmanager2 javascript library to load the files from the Amazon bucket, and play them to users.
When the first user clicks on an mp3, soundmanager starts playing the file, and as intended, caches the rest of the song as it is being played.
Problem is, if a second user clicks on the same mp3, he must wait until the first user caches the whole song, which is unacceptable for my website.
I understand that Amazon S3 somehow 'streams' the file exclusively to the first request. Is there a way to be able to use that file simultaneously, i.e. users be able to play the same mp3's at the same time?
Also, would the CloudFront functionality solve this issue?
Thank you for your help!
Alex
(By the way, my application is built on Ruby on Rails 3, and hosted on Heroku)
There is no limitation in S3 that restricts simultaneous downloads of a single object.
I would suggest that you use a tool, like Charles, to inspect the HTTP requests and see if another service is causing the second client's request to be delayed.
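If you want to rule S3 out quickly without an HTTP proxy, a small test like this one (the URL is a placeholder for a public or signed link to the mp3) fetches the same object from two threads at once; if both complete in roughly the same time, the delay the second user sees is coming from somewhere else (the player, a proxy, the Heroku dyno, etc.):

    import threading
    import time
    import requests

    URL = "https://my-bucket.s3.amazonaws.com/song.mp3"  # placeholder object URL

    def timed_fetch(label):
        start = time.time()
        resp = requests.get(URL)
        print(f"{label}: HTTP {resp.status_code}, {len(resp.content)} bytes "
              f"in {time.time() - start:.1f}s")

    # Start both downloads at (roughly) the same moment.
    threads = [threading.Thread(target=timed_fetch, args=(name,))
               for name in ("client A", "client B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()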