When using TPUs on Google Colab (such as in the MNIST example), we are told to create a GCS bucket. However, it doesn't tell us where. Without knowing the region/zone of the Colab instance, I am afraid to create a bucket for fear of running into billing issues.
There are actually several questions:
Is accessing a GCS bucket from Colab free, or do the normal network egress fees apply?
Can I get the region/zone of the Colab instance? Most likely not.
If the answer to both questions above is "no": is there any way to minimize costs when using TPUs with Colab?
Thank you for your question.
No, you cannot get the region/zone of the Colab instance, so you can instead create a multi-regional GCS bucket, which should be accessible from Colab. As per the comment at https://github.com/googlecolab/colabtools/issues/597#issuecomment-502746530, Colab TPU instances are only in US zones, so when creating a GCS bucket you can choose a multi-region bucket in the US.
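For example, here is a minimal sketch of creating such a bucket with the Python client library; the project ID and bucket name are placeholders you would replace with your own:

from google.cloud import storage

# Placeholder project ID and bucket name; replace with your own values.
client = storage.Client(project="my-gcp-project")
bucket = client.create_bucket("my-colab-tpu-bucket", location="US")  # US multi-region
print(bucket.name, bucket.location)

The same can be done from the command line with gsutil mb -l US gs://my-colab-tpu-bucket.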
Check out https://cloud.google.com/storage/pricing for more details about GCS bucket pricing.
You can also sign up for a Google Cloud Platform account with 5 GB of free storage and $300 in credits at https://cloud.google.com/free/, which should provide enough credit to get started.
We are told to create a GCS bucket. However, it doesn't tell us where.
Running (within Colab)
!curl ipinfo.io
You get something similar to
{
"ip": "3X.20X.4X.1XX",
"hostname": "13X.4X.20X.3X.bc.googleusercontent.com",
"city": "Groningen",
"region": "Groningen",
"country": "NL",
"loc": "53.21XX,6.56XX",
"org": "AS396XXX Google LLC",
"postal": "9711",
"timezone": "Europe/Amsterdam",
"readme": "https://ipinfo.io/missingauth"
}
which tells you where your Colab instance is running.
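If you prefer to do this from Python instead of the shell, here is a small sketch along the same lines (the fields are the ones from the ipinfo.io response above):

import requests

# Query ipinfo.io from inside the Colab runtime, same as the curl call above.
info = requests.get("https://ipinfo.io/json").json()
print(info.get("city"), info.get("region"), info.get("country"))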
You can create a GCS bucket in just one region (if you don't need multi-region).
Assuming you don't change country/area very often, you can check that a few times (different days) and get an idea of where your Colab is probably going to be located.
For your other questions (egress,...) see the Conclusion on
https://ostrokach.gitlab.io/post/google-colab-storage/
[...] Google Cloud Storage is a good option for hosting our data. Only
we should be sure to check that the Colab notebook is running in the
same continent as your Cloud Storage bucket, or we will incur network
egress charges!
Related
I am looking at creating an app that uses Cloud Storage for image storage, and according to this it seems like it is smart to create a bucket per user. However, this seems like a bad idea when you think about scalability and such, because you'd have millions of buckets. My question is: for an app that uses storage buckets to store images, is it better to create a per-user bucket, or to use a single bucket, name files uniquely according to user email, and limit access to the files inside to each user?
It seems like every doc I visit mentions creating the bucket either in the console or with gsutil, but I am looking to see if there is a way to do it from the React Native client side. This way, when a user creates a new account, a new bucket can be allocated to them. I have looked into the Google Cloud JSON API too.
I am trying to understand when Google will charge for what they call "premium" and when the "standard" costs apply. The main info is here, but it is not very clear:
Pricing | Cloud Speech-to-Text Documentation | Google Cloud
https://cloud.google.com/speech-to-text/pricing?hl=uk
Does anyone know where there is more specific information?
If we look at the API reference page, we find that there is a flag for declaring that you wish to use premium models; see:
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/RecognitionConfig
This implies that your application controls which model is used and hence which pricing applies.
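As a rough sketch with the Python client (the GCS URI below is a placeholder), requesting an enhanced model such as phone_call via use_enhanced is what opts you into the premium rate; leave those fields out and the standard model and pricing apply:

from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    use_enhanced=True,   # opt in to an enhanced (premium-priced) model
    model="phone_call",  # omit these two fields to stay on the standard model
)
audio = speech.RecognitionAudio(uri="gs://my-bucket/my-audio.wav")  # placeholder URI

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)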
When I try to launch a transfer from Amazon S3 to Google GCS using the Google API from the console, my transfer keeps getting stuck at the "calculating" step. I have been using this API a lot for four months and it is the first time I have seen this kind of behaviour. Maybe this problem is linked to the latest API version deployment.
As one of my personal projects moves forward, I wonder how I should organize the files (images, videos, audio files) uploaded by users onto AWS S3 / Google Cloud Storage. I'm used to seeing these kinds of URLs:
Facebook fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xft1/v/t1.0-9/11873531_1015...750483_5263546700711467249_n.jpg?oh=b3f06f7e...b7ebf7&oe=56392950&__gda__=1446569890_628...c7765669456
Tumblr 36.media.tumblr.com/686b47...e93fa09c2478/tumblr_nt7lnyP3ld1rqbl96o1_500.png
Twitter pbs.twimg.com/media/CMimixsV...AcZeM.jpg
Do these random characters carry some kind of meaning, or are they just "UUIDs"? Is there a performance/organization issue in using, for instance, this kind of URL below?
content.socialnetworkX.com/userY/post/customName_dinosaurs.jpg
EDIT: Let me be clear that I'm considering millions of files.
For S3, see the Performance Considerations page where it talks about object naming. Specifically, if you plan to upload objects at a high rate, you should avoid sequentially named objects, as they can be a bottleneck.
Google Cloud Storage does not have this performance bottleneck. See this answer.
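As an illustration, a common workaround on S3 is to prepend a short hash so keys are not sequential; this helper function is only a sketch, not a required API:

import hashlib
import uuid

def make_object_key(user_id: str, filename: str) -> str:
    # Illustrative helper: a few hex characters up front keep key names non-sequential.
    prefix = hashlib.md5(f"{user_id}/{uuid.uuid4()}".encode()).hexdigest()[:4]
    return f"{prefix}/{user_id}/{filename}"

print(make_object_key("userY", "dinosaurs.jpg"))  # e.g. "3fa1/userY/dinosaurs.jpg"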
I've seen the recent Google Drive pricing changes and they are amazing.
1 TB in Google Drive = $9.99
1 TB in Amazon S3 = $85 ($43 if you have more than 5000 TB with them)
This changes everything!
We have a SaaS website on which we keep customers' files. Does anyone know if Google Drive can be used for this kind of files/service, or is it just for personal use?
Does it have a robust API for uploading, downloading, and creating public URLs to access files, as S3 has?
Edit: I saw the SDK here (https://developers.google.com/drive/v2/reference/). The main concern is whether this service can be used for keeping customers' files, i.e., a SaaS website offering a service and keeping its files there.
This doesn't really change anything.
“Google Drive storage is for users and Google Cloud Storage is for developers.”
— https://support.google.com/a/answer/2490100?hl=en
The analogous service with comparable functionality to S3 is Google Cloud Storage, which is remarkably similar to S3 in pricing.
https://developers.google.com/storage/pricing
Does anyone know if Google Drive can be used for this kind of files/service, or is it just for personal use?
Yes you can. That's exactly why the Drive SDK exists. You can either store files under the user's own account, or under an "app" account called a Service Account.
Does it have a robust API for uploading, downloading, and creating public URLs to access files, as S3 has?
"Robust" is a bit subjective, but there is certainly an API.
There are a number of techniques you can use to access the stored files. Look at https://developers.google.com/drive/v2/reference/files to see the various URLs which are provided.
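Here is a rough sketch with the Python client, assuming files are stored under a Service Account as described above (the key file path and file ID are placeholders); the link fields come from the Files resource linked above:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholder key file; files live under the Service Account rather than an end user.
creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/drive"],
)
drive = build("drive", "v2", credentials=creds)

# Placeholder file ID; fetch its metadata and print the link fields.
meta = drive.files().get(fileId="FILE_ID").execute()
print(meta.get("webContentLink"))
print(meta.get("alternateLink"))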
For true public access, you will probably need to have the files under a public directory. See https://support.google.com/drive/answer/2881970?hl=en
NB. If you are in the TB space, be very aware that Drive has a bunch of quotas, some of which are unpublished. Make sure you test any proof of concept at full scale.
Sorry to spoil your party, but before you get too excited, look at this issue. It is in Google's own product, and has been active since November 2013 (i.e. 4 months). Now imagine re-syncing a few hundred GB of files once in a while. Or better, ask your customers to do it with their files after you recommended Drive to them.