How to handle large file upload/download with REST API? - api

With several design interviews that I have been giving, this scenario has come question that has come up often is how to handle large file upload for let's say a service like Youtube/S3 etc.
I tried looking on stack exchange and online elsewhere, but the only relevant answer I could find was the use of multipart form. But then I was wondering does the upload for large files really happens over a single API call in real-world scenarios (I am not sure on this). Can somebody explain how these uploads and downloads happen in the above-mentioned web services? Also, a follow-up question about how do the downloads happen in a 'seemingly' streaming service like youtube/ NetFlix (seemingly because data arrives in chunk here).

Related

Storing files on the API or on the microservice filesystem

I work on an app that consists of a
Frontend app
API, that I like to think of as a gateway
Microservices that handle the business logic and db work
Upon implementing a file store-like feature, for uploading both small and large files, I just assumed that I'd store these files on the microservice's filesystem and save paths, along with metadata, into the microservice's db.
Because the microservices don't implement any Http API endpoints, I upload files over my API gateway. But after realizing how much work must go into transferring these files from the API to the microservice, aswell as serving the same back, I just went with storing them on the API's file system and saving the paths into the microservice's db.
Is this approach ok?
Is it weird that my API gateway stores and serves files from it's own file system?
If so, should I transfer the files from the API to the microservice, upon an upload, even considering the files can be large - or should the microservice implement a specific API itself?
I hope this question doesn't get interpreted as opinion-based - I'd like to know what approach would be best considering the frontend-api-microservice pattern and if there are any architecture standards that address this scenario, and also if any approach has it's gotchas.
Based on comments above
API Gateway
The purpose of gateway is to redirect the requests and handle cross cutting concerns like authentication , logging etc. It shouldn't be doing more than that. Gateway has to be highly available and any problem to gateway means you can't access associated services.
File Upload
The file upload should be handled by microservice itself. Your gateway will only be used to pass and get the stream. Depending on nature of your system and if you are using cloud store you can use of pattern like "valet key".
https://learn.microsoft.com/en-us/azure/architecture/patterns/valet-key
After some time and some experience, the right answer to this question would be API Gateway. Microservices are complex enough on their own, and storing any files, small or large, would rather be a waste of networking, bandwith etc. and would just introduce latency issues and degrade performance as well as UX.
I'll leave this out here just so people can hear this, as neither approach would be wrong, while the API Gateway choice just provides more practical benefits and thus is more appropriate. If this question was targetting data or files that are stored within a DB, microservice and it's DB would be the obvious choice.
If you have the convenience to add an file server to your whole stack, then sure, that would be the correct approach, but that as well introduces more complexity and other stuff described above.

Proper way to name objects in mass storage service

I wonder as one of my personal projects development goes further forward how should i organize the files ( images, videos, audio files ) uploaded by the users onto AWS's S3/GCE Cloud Storage, i'm used to see these kinds of URL below;
Facebook fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xft1/v/t1.0-9/11873531_1015...750483_5263546700711467249_n.jpg?oh=b3f06f7e...b7ebf7&oe=56392950&__gda__=1446569890_628...c7765669456
Tumblr 36.media.tumblr.com/686b47...e93fa09c2478/tumblr_nt7lnyP3ld1rqbl96o1_500.png
Twitter pbs.twimg.com/media/CMimixsV...AcZeM.jpg
Does these random characters carry some kind of meaning? or they're just "UUIDs"? Is there a performance/organization issue in using, for instance this kind of URL below?
content.socialnetworkX.com/userY/post/customName_dinosaurs.jpg
EDIT: Let be clear that i'm considering millions of files.
For S3, see the Performance Considerations page where it talks about object naming. Specifically, if you plan to upload objects at a high rate, you should avoid sequentially named objects, as they can be a bottleneck.
Google Cloud Storage does not have this performance bottleneck. See this answer.

Difference between RSS and Web API?

I have searched a lot on web to find a satisfactory answer but I did't get an answer.
Some says RSS is static xml while in web API we make a proper format of request and get a proper format of response
Kindly help me on this
thanks,
The reason RESTful APIs are so inconsistent across different services is because REST is not a standard, it's not a protocol. It's an architectural style. Some things to take into consideration for your API would be; what HTTP verbs to support, what URI structure to follow, how to consistently return error messages, how to handle partial selection, versioning, authentication, pagination, and so on and so forth.. There is no specific right way of doing it (it's debated often), but there are many ways that are not so great!
RSS stands for Really Simple Syndication, which is essentially a format for delivering regularly changing web content. RSS feeds allow a user to subscribe to their favorite news sources, blogs, websites, and other digital properties, and then receive the latest content from all those different areas or sites in one place, without having to repeatedly visit each individual site.
Your question sounds like "What difference is between Ford Focus and a taxi service?" Ford Focus can be one of the cars in a taxi service. But nothing more.
RSS is a standard, which describes specific format of news feed. You can have a standalone locally stored RSS-formatted file, remote stored RSS-file somewhere on a server, or you can have a web-service which constructs RSS-formatted file on the fly. It will be RSS in all three cases because RSS is something that describes internal structure of a file.
Web Service is, basically, an application which runs somewhere on a server, accepts requests, processes it according to the application's internal logic, and then provides answers. Web service can take any kind of requests and provide any kind on responses, including RSS-formatted ones.
Hope that makes things a bit clearer for you.
We will be using RSS as broadcasting channel, who ever wants to know what's happning in my Company can follow my company website's RSS feed.

Dropbox differential/incremental uploads using REST API

We know that Dropbox desktop clients use a binary diff algorithm to break down all files into blocks, and only upload blocks that it doesn't already have in the cloud (https://serverfault.com/questions/52861/how-does-dropbox-version-upload-large-files).
Nevertheless, the Dropbox API, as far as I see, can only upload the whole file (/files_put, /files (POST)) when a sync is needed.
Is there any way to do differential/incremental syncing using the Dropbox API, i.e. upload only the changed portion of the file like the desktop clients do?
If this is not possible, then what are the best practices to periodically sync large files that has small changes using the Dropbox API?
Unfortunately this isn't possible and I would suspect that it may never be available.
After doing a bit of research, I found a feature request for delta-syncing to be integrated into the API[*]. Dropbox hasn't responded, nor has the community upvoted this request.
I would make an educated guess that the reason why Dropbox hasn't provided this functionality, and likely never will, is because this is a dangerous feature in the hands of unknown developers.
Consider the case where you write an application that uses such a delta-change update system for updating large files. You thoroughly test your app and publish it to an app store. A couple of weeks after your initial release, and numerous downloads, you start receiving bad reviews and complaints because you managed to miss a very specific test case.
Within this specific, buggy case you've miscalculated a differential offset by 1-byte. Oh no! You've now corrupted thousands of files, for hundreds of users!
Considering such a possibility, I think I would personally request that Dropbox NEVER provide such a dev feature. If they integrated such a function into the API, they would be breaking their #1 purpose-- to provide consistent, safe, & reliable cloud backups of your important files.
[*]: This was the original reference location, but it is now a dead link.
(https://www.dropbox.com/votebox/1515/delta-sync-api-for-mobile-applications)

How does Http live streaming works?

I have created one sample application for demonstrating a working of HTTP live streaming.
What I have done is, I have one library that takes input as video file (avi, mpeg, mov, .ts) and generating segments (.ts) and playlist (.m3u8) files for the given video file. I am storing playlist (as string) in a linked list, as an when i am getting playlist data from the library.
I have written one basic web server which will server the user requested segment and playlist files. I am requesting playlist.m3u8 file from the iPhone safari browser and it is launching the QuickTime player where it is requesting the segment.ts files listed in the received playlist files. after playing every segments (listed in current playlist) it again requests for the playlist, where i am responding with the next playlist file which contains the next set of segment.ts files listed in it.
Is this what we call HTTP live streaming?
Is there anything else, other that this i need to do for implementing HTTP live streaming?
Thanks.
Not much more. If you are taking input streams of media, encoding them, encapsulating in a format suitable for delivery and preparing the encapsulated media for distribution by placing it in such a way that they can be requested from the HTTP server, you are done. The idea behind the live streaming is that it leverages existing Internet architecture that is already optimized for serving HTTP requests for reasonably sized resources.
HTTP streaming renders many existing CDN solutions obsolete with their custom streaming protocols, custom routing and custom content caching.
You can also use media stream validator command line application for mac os x for validating streams generated by the HTTP Web server.
More or less but there's also adaptive bit-rate streaming to take care of if you want your server to push files to iOS devices. Which means your scope expands from having a single "index.m3u8" file that tracks all the TS files to a master index that then tracks the index files for each bitrate you'd want to support in your application which then individually track the TS files encoded at the respective bit-rates.
It's a good amount of work, but mostly routine/repetitive once you've got the hang of the basics.
For more on streaming, your bible, from the iOS standpoint, should ALWAYS be TN2224. Adhering closely to the specs in the Technote, is your best chance of getting through the App Store approval process vis-a-vis streaming.
Some people don't bother (building a streaming app over the past couple of months and looked at the HTTP logs of a whole bunch of video apps that don't quite seem to stick by the rules) - sometimes Apple notices, sometimes they don't, and sometimes the player is just too big for Apple to interfere.
So it's not very different there from every other aspect of the functionality of your app that undergoes Apple's scrutiny. It's just that there are ways you can be sure you're on the right track.
And of course, from a purely technical standpoint, as #psp1 mentioned the mediastreamvalidator tool can help you figure out if your streams are - at their very core, even if not in terms of their overall abilities - compatible with what's expected of HLS implementations.
Note: You can either roll with your own encoding solution (with ffmpeg, the plus being you have more control, the minus being it takes time to configure and get working just RIGHT. Plus once you start talking even the least amount of scale, you run into a whole host of other problems. And once you're done with all the technical hard-work, you'd find that was easy. Now you'd have to actually figure out which license you need to get for having a fancy H.264 encoder with you and jump through all the legal/procedural hoops to get one).
The easier solution for a developer without a legal/accounting team that could fill a football field, IMO, it's easier to go third-party with sites like Encoding.com, Zencoder etc who provide their encoding services a-la-carte or with a monthly fee. The plus is that they've taken care of all the licensing BS and are just providing you a simple "pay to use" service, which could also be extremely useful when you're building a project for a client. The minus is that you're now DEPENDENT on Zencoder/Encoding, the flip-side of which you'd know when your encoding jobs fail for a whole day because their servers are down, or even otherwise, when the API doesn't quite act as you expect or has been documented!
But anyhow that's about all the factors you got to Grok before pushing a HLS server into production!