As one of my personal projects develops further, I wonder how I should organize the files (images, videos, audio files) uploaded by users onto AWS S3 / Google Cloud Storage. I'm used to seeing these kinds of URLs below:
Facebook fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xft1/v/t1.0-9/11873531_1015...750483_5263546700711467249_n.jpg?oh=b3f06f7e...b7ebf7&oe=56392950&__gda__=1446569890_628...c7765669456
Tumblr 36.media.tumblr.com/686b47...e93fa09c2478/tumblr_nt7lnyP3ld1rqbl96o1_500.png
Twitter pbs.twimg.com/media/CMimixsV...AcZeM.jpg
Do these random characters carry some kind of meaning, or are they just "UUIDs"? Is there a performance/organization issue in using, for instance, this kind of URL below?
content.socialnetworkX.com/userY/post/customName_dinosaurs.jpg
EDIT: Let me be clear that I'm considering millions of files.
For S3, see the Performance Considerations page where it talks about object naming. Specifically, if you plan to upload objects at a high rate, you should avoid sequentially named keys: S3 partitions on the key prefix, so a sequential prefix concentrates all writes on one partition and becomes a bottleneck.
Google Cloud Storage does not have this performance bottleneck. See this answer.
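For illustration, one common way to follow that S3 advice is to prepend a short hash to each key so uploads spread across partitions. A minimal Python sketch (the user/file names are just the ones from the question, and the prefix length is arbitrary):

    import hashlib

    def make_key(user_id: str, filename: str) -> str:
        # A short hash prefix keeps high-rate uploads from clustering
        # on one sequential key prefix (the S3 bottleneck mentioned above).
        digest = hashlib.md5(f"{user_id}/{filename}".encode()).hexdigest()
        return f"{digest[:4]}/{user_id}/{filename}"

    print(make_key("userY", "post/customName_dinosaurs.jpg"))
    # e.g. "1f3a/userY/post/customName_dinosaurs.jpg"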
What is the best architecture for storing and retrieving images for a blog? I have a use case where I have to design an image storage/retrieval system for articles. Where and how should I store these images and retrieve/access them while displaying an article's contents with minimum latency?
It would be great if you could provide any references for this. Thanks.
If you want minimum latency for image retrieval, you need to use a CDN (Content Delivery Network).
Check out this article for more details.
For example, AWS offers CloudFront, which is very simple to use: store the images in an S3 bucket, then use dedicated CloudFront URLs in your client-side code to fetch the images.
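As a rough sketch of that setup, assuming boto3 with credentials configured (the bucket name and distribution domain below are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Store the original in S3; CloudFront pulls from this bucket as its origin.
    s3.upload_file(
        "dinosaurs.jpg", "my-image-bucket", "images/dinosaurs.jpg",
        ExtraArgs={"ContentType": "image/jpeg"},
    )

    # Clients fetch through the CloudFront distribution, not S3 directly.
    cdn_url = "https://d1234abcd.cloudfront.net/images/dinosaurs.jpg"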
There are other CDN providers out there; you can find them right away with a Google search.
I've found this nice article on how to stream data directly from Google Cloud Storage into tf.data. This is super handy if your compute tier has limited storage (like on Knative in my case) and network bandwidth is sufficient (and free of charge anyway).
tfds.load(..., try_gcs=True)
Unfortunately, my data resides in a non-Google bucket, and this isn't documented for other cloud object store systems.
Does anybody know if it also works in non-GCS environments?
I'm not sure how this is implemented in the library, but it should be possible to access other object store systems in a similar way.
You might need to extend the current mechanism to use a more generic API like the S3 API (most object stores have this as a compatibility layer). If you do need to do this, I'd recommend contributing it back upstream, as it seems like a generally-useful capability when either storage space is tight or when fast startup is desired.
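I haven't tried this myself, but tensorflow-io ships an s3:// filesystem plugin, so something along these lines should work for an S3-compatible store (bucket, key, and endpoint below are placeholders):

    import os
    import tensorflow as tf
    import tensorflow_io  # noqa: F401 -- importing registers the s3:// scheme

    # Point the plugin at a non-AWS, S3-compatible endpoint if needed.
    os.environ["S3_ENDPOINT"] = "my-object-store.example.com"

    # Stream records straight from object storage instead of local disk.
    ds = tf.data.TFRecordDataset("s3://my-bucket/train.tfrecord")
    for record in ds.take(1):
        print(record.numpy()[:32])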
I work on an app that consists of:
A frontend app
An API, which I like to think of as a gateway
Microservices that handle the business logic and DB work
Upon implementing a file-store-like feature for uploading both small and large files, I just assumed that I'd store these files on the microservice's filesystem and save the paths, along with metadata, into the microservice's DB.
Because the microservices don't implement any HTTP API endpoints, I upload files through my API gateway. But after realizing how much work must go into transferring these files from the API to the microservice, as well as serving them back, I just went with storing them on the API's file system and saving the paths into the microservice's DB.
Is this approach OK?
Is it weird that my API gateway stores and serves files from its own file system?
If so, should I transfer the files from the API to the microservice upon an upload, even considering the files can be large, or should the microservice implement a specific API itself?
I hope this question doesn't get interpreted as opinion-based. I'd like to know which approach would be best considering the frontend-API-microservice pattern, whether there are any architecture standards that address this scenario, and whether any approach has its gotchas.
Based on the comments above:
API Gateway
The purpose of the gateway is to route requests and handle cross-cutting concerns like authentication, logging, etc. It shouldn't be doing more than that. The gateway has to be highly available, and any problem with the gateway means you can't access the associated services.
File Upload
The file upload should be handled by the microservice itself. Your gateway will only be used to pass the stream through in each direction. Depending on the nature of your system, and if you are using a cloud store, you can use a pattern like the "valet key".
https://learn.microsoft.com/en-us/azure/architecture/patterns/valet-key
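In S3 terms, for example, the valet key is a pre-signed URL: the microservice authorizes the transfer, but the bytes flow directly between the client and the store, never through the gateway. A minimal boto3 sketch (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # The microservice hands this short-lived URL to the client (via the
    # gateway); the client then PUTs the file straight to S3.
    upload_url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "user-files", "Key": "uploads/report.pdf"},
        ExpiresIn=900,  # valid for 15 minutes
    )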
After some time and some experience, the right answer to this question would be the API gateway. Microservices are complex enough on their own, and storing any files, small or large, on them would rather be a waste of networking, bandwidth, etc., and would just introduce latency issues and degrade performance as well as UX.
I'll leave this out here just so people can hear it: neither approach would be wrong, while the API gateway choice just provides more practical benefits and is thus more appropriate. If this question were targeting data or files that are stored within a DB, the microservice and its DB would be the obvious choice.
If you have the convenience to add a file server to your whole stack, then sure, that would be the correct approach, but that as well introduces more complexity and the other stuff described above.
I'm trying to make a Google gadget that stores some data (say, statistics of users' actions) in a persistent way (i.e. the statistics accumulate over time and across multiple users). I also want these data to be placed on Google's free hosting, possibly together with the gadget itself.
Any ideas on how to do that?
I know the Google Gadgets API has tools for working with remote data, but then the question is where to host it. Google Wave seemed to be an option, but it is no longer supported.
You should get a server and host it there.
You then have the best control over the code, the performance, and the data itself.
There are several providers out there who offer hosting for a reasonable price.
To name some: Hostgator.com (US), Hetzner.de (DE), http://swedendedicated.com (SE, never used, just a quick search on the internet).
I've seen the recent Google Drive pricing changes and they are amazing.
1 TB in Google Drive = $9.99
1 TB in Amazon S3 = $85 ($43 if you have more than 5000 TB with them)
This changes everything!
We have a SaaS website on which we keep customers' files. Does anyone know if Google Drive can be used to keep these kinds of files/services, or is it just for personal use?
Does it have a robust API for uploading, downloading, and creating public URLs to access files, as S3 has?
Edit: I saw the SDK here (https://developers.google.com/drive/v2/reference/). My main concern is whether this service can be used for keeping customers' files, I mean, a SaaS website offering a service and keeping files there.
This doesn't really change anything.
“Google Drive storage is for users and Google Cloud Storage is for developers.”
— https://support.google.com/a/answer/2490100?hl=en
The analogous service with comparable functionality to S3 is Google Cloud Storage, which is remarkably similar to S3 in pricing.
https://developers.google.com/storage/pricing
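To illustrate how close the developer experience is to S3, an upload with the google-cloud-storage client looks roughly like this (bucket and object names are made up):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-saas-files")

    # Objects are addressed by key, just like S3, and can be made public.
    blob = bucket.blob("customers/acme/report.pdf")
    blob.upload_from_filename("report.pdf")
    print(blob.public_url)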
Does anyone know if Google Drive can be used to keep these kinds of files/services, or is it just for personal use?
Yes you can. That's exactly why the Drive SDK exists. You can either store files under the user's own account, or under an "app" account called a Service Account.
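For the Service Account route, a rough sketch with the Python client against the Drive v2 API (the key file and file names are placeholders):

    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    creds = service_account.Credentials.from_service_account_file(
        "service-account.json",
        scopes=["https://www.googleapis.com/auth/drive"],
    )
    drive = build("drive", "v2", credentials=creds)

    # Files uploaded this way live under the app's own Drive quota,
    # not under any end user's account.
    media = MediaFileUpload("customer-file.pdf", mimetype="application/pdf")
    uploaded = drive.files().insert(
        body={"title": "customer-file.pdf"}, media_body=media
    ).execute()
    print(uploaded["id"])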
Does it have a robust API for uploading, downloading, and creating public URLs to access files, as S3 has?
"Robust" is a bit subjective, but there is certainly an API.
There are a number of techniques you can use to access the stored files. Look at https://developers.google.com/drive/v2/reference/files to see the various URLs which are provided.
For true public access, you will probably need to have the files under a public directory. See https://support.google.com/drive/answer/2881970?hl=en
NB: If you are in the TB space, be very aware that Drive has a bunch of quotas, some of which are unpublished. Make sure you test any proof of concept at full scale.
Sorry to spoil your party, but before you get too excited, look at this issue. It is in Google's own product and has been active since November 2013 (i.e. 4 months). Now imagine re-syncing a few hundred GB of files once in a while. Or better, ask your customers to do it with their files after you recommended Drive to them.