How to access images directly from Google Cloud Storage (GCS) when using Keras? - tensorflow

I have developed a model in Keras that works perfectly when reading data stored locally. However, I now want to take advantage of Google Cloud Platform's GPUs for training the model. I have set up the GPU on GCP and am working in a Jupyter notebook. I have moved my images to Google Cloud Storage.
My question is:
How can I access these images (specifically the training, validation, and test directories) directly from Cloud Storage using the flow_from_directory method of Keras' ImageDataGenerator class?
Here's my directory structure in Google Cloud Storage (GCS):
mybucketname/
    class_1/
        img001.jpg
        img002.jpg
        ...
    class_2/
        img001.jpg
        img002.jpg
        ...
    class_3/
        img001.jpg
        img002.jpg
        ...

While I haven't yet figured out a way to read the image data directly from GCS, in the meantime I can copy the files from Cloud Storage to the VM with:

import os
os.system('gsutil cp -r gs://mybucketname/ .')
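Once the bucket has been copied locally, the usual flow_from_directory workflow applies. A minimal sketch, assuming the copy landed in ./mybucketname and that the target size, batch size, and class_mode below are placeholders to adjust for your model:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)

# 'mybucketname' is the local copy made by gsutil above; the other
# parameters are illustrative and should match your model's inputs.
train_generator = datagen.flow_from_directory(
    'mybucketname',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')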

Related

How to load my train.tfrecord files in saturn cloud for running via Dask?

I am working on object detection and I have two record files: Train.tfrecord (1.6 GB) and Test.tfrecord (65 MB). How do I load the training file in Saturn Cloud, since I want to speed up training using Dask on Saturn Cloud?
As @SultanOrazbayev mentioned, since Saturn runs on AWS, the best way is to put your data in an S3 bucket. Then you can access it using whichever library you prefer. Normally we recommend using s3fs.
Note: I work at Saturn Cloud.
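A minimal sketch of the s3fs approach, assuming the records live in a bucket named my-bucket (the bucket and key names are placeholders) and that AWS credentials are available in the environment:

import s3fs

# The default (non-anonymous) mode picks up AWS credentials from the environment.
fs = s3fs.S3FileSystem()

# List the bucket and read the training records as raw bytes.
print(fs.ls('my-bucket'))
with fs.open('my-bucket/Train.tfrecord', 'rb') as f:
    header = f.read(16)  # just a peek to confirm access works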

export_inference_graph.py vs export_tflite_ssd_graph.py

The output of export_inference_graph.py is
- model.ckpt.data-00000-of-00001
- model.ckpt.index
- model.ckpt.meta
- frozen_inference_graph.pb
- saved_model (a directory)
while the output of export_tflite_ssd_graph.py is
- tflite_graph.pbtxt
- tflite_graph.pb
What is the difference between the two frozen files?
I assume you are trying to use your object detection model on mobile devices, for which you need to convert it to a TFLite model.
However, you cannot convert models like Faster R-CNN to TFLite; you need to use SSD models for mobile devices. A sketch of the conversion step is shown below.
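A minimal conversion sketch, assuming a TF 1.x-style frozen graph produced by export_tflite_ssd_graph.py; the tensor names, input shape, and file names below are typical for SSD exports but are assumptions that should be checked against your own graph:

import tensorflow as tf

# The input/output tensor names and the 300x300 input shape are assumptions
# based on a typical SSD export; verify them for your model.
converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file='tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=[
        'TFLite_Detection_PostProcess',
        'TFLite_Detection_PostProcess:1',
        'TFLite_Detection_PostProcess:2',
        'TFLite_Detection_PostProcess:3',
    ],
    input_shapes={'normalized_input_image_tensor': [1, 300, 300, 3]},
)
converter.allow_custom_ops = True  # the detection post-processing op is a custom op
tflite_model = converter.convert()

with open('detect.tflite', 'wb') as f:
    f.write(tflite_model)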
Another way to use a model like Faster R-CNN in your deployment is:
Use an AWS EC2 TensorFlow AMI, deploy your model in the cloud, and route requests to it from your website domain or mobile device. When the server receives an image through the HTTP form the user fills in, the model processes it on your cloud server and sends the result back to the requesting client. A minimal serving sketch follows.
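A minimal sketch of such a server endpoint, assuming Flask for the HTTP layer (Flask, the /predict route, and the 'image' form field are all assumptions, not part of the original answer); the model-loading and inference lines are left as placeholders:

from flask import Flask, request, jsonify
import numpy as np
from PIL import Image

app = Flask(__name__)
# model = ...  # load your exported detection model here (placeholder)

@app.route('/predict', methods=['POST'])
def predict():
    # 'image' is the assumed name of the file field in the client's HTTP form.
    img = Image.open(request.files['image'].stream).convert('RGB')
    batch = np.expand_dims(np.array(img), axis=0)
    # detections = model(batch)  # run inference with your actual model
    return jsonify({'received_shape': list(batch.shape)})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)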

How do I add GCS credentials to tensorflow?

I'm trying to train a model on Kaggle and dump TensorBoard logs into a GCS bucket. I'm hesitant to allow anonymous read/write on my project and would like to have TensorFlow use a custom service account with limited quotas for all GCP / gfile.GFile operations. Is there any way to provide TensorFlow with a service account JSON to use?
Is my best bet just security by obscurity?
I am not experienced with Kaggle and I am not sure exactly which limits you want to apply to the service account, but you can follow these steps to set up service account access to Google Cloud Storage from TensorFlow:
- Follow this guide to implement the GCS custom FileSystem in TensorFlow.
- Check the Python client library to instantiate the client.
- The service account permissions required for Cloud Storage are listed here.
- To grant roles to a service account, follow this guide.
- Check the snippet in Federico's post here, based on this documentation, to implement the service account in your Python code.
Snippet:
from google.oauth2 import service_account
SERVICE_ACCOUNT_FILE = 'service.json'
credentials = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE)
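As a follow-up, that credentials object can then be handed to the Cloud Storage client; a minimal sketch, where the project ID, bucket, and object names are placeholders:

from google.cloud import storage

# 'my-project' and 'my-bucket' are placeholders for your own project and bucket.
client = storage.Client(project='my-project', credentials=credentials)
bucket = client.bucket('my-bucket')
blob = bucket.blob('tensorboard-logs/test.txt')
blob.upload_from_string('written with the limited service account')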
If you have service account credentials in a JSON file, you can specify it in the GOOGLE_APPLICATION_CREDENTIALS environment variable so that TensorFlow can read/write to GCS via gs:// URLs.
You can test it by running the following in bash (it downloads a smoke-test script from TensorFlow's repository and runs it against your bucket URL with your credentials):
wget https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/tools/gcs_test/python/gcs_smoke.py
GOOGLE_APPLICATION_CREDENTIALS=my_credentials.json python gcs_smoke.py --gcs_bucket_url=gs://my_bucket/test_tf
This should create some dummy records in GCS and read from them. After this, you'd want to clean up the remaining temporary outputs to avoid further charges:
gsutil rm -r gs://my_bucket/test_tf
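The same credentials can also be picked up from Python before any GCS access happens; a minimal sketch, where the JSON path and bucket name are placeholders and tf.io.gfile is used to verify read/write access:

import os
import tensorflow as tf

# Set this before TensorFlow's first access to GCS; the path is a placeholder.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'my_credentials.json'

# Write and read a small test file under gs:// to confirm the account works.
with tf.io.gfile.GFile('gs://my_bucket/test_tf/hello.txt', 'w') as f:
    f.write('hello from the service account')
print(tf.io.gfile.listdir('gs://my_bucket/test_tf'))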

Save and Load models from S3

Any way to allow an H2O cluster to save/load directly to S3?
model.save('s3n://my-domain/gbm-from-the-future')
model.load('s3n://my-domain/gbm-from-the-future')
Historically, I have achieved this by:
- Saving to a file-system off of the Cluster
- Syncing with S3
- Downloading from S3
- Loading from the file-system
Obviously, there has to be a better way from the cluster itself.
According to the Python docs for h2o.save_model(), this is already supported (you did not mention which of the APIs you are using, so I am using Python as an example). Have you tried putting an S3 address in the file-location argument of the standard model save and load functions? If you find that this is not working, please file a bug report on the H2O JIRA.
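A minimal sketch with the Python API, where the tiny training frame and the s3:// URI are placeholders and the cluster is assumed to have AWS credentials configured:

import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Tiny illustrative frame; in practice this is your real training data.
frame = h2o.H2OFrame({'x': list(range(20)), 'y': [0, 1] * 10})
frame['y'] = frame['y'].asfactor()
model = H2OGradientBoostingEstimator(ntrees=5)
model.train(x=['x'], y='y', training_frame=frame)

# Save straight to S3; the URI is a placeholder for your own bucket/path.
saved_path = h2o.save_model(model=model, path='s3://my-domain/gbm-from-the-future', force=True)
restored = h2o.load_model(saved_path)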

Migrate s3 data to google cloud storage

I have a python web application deployed on Google App Engine.
I need to grab a log file stored on Amazon S3 and load it into Google Cloud Storage. Once it is in Google Cloud Storage I may need to perform some transformations and eventually import the data into BigQuery for analysis.
I tried using gsutil as some sort of proof of concept, since boto is under the hood of gsutil and I'd like to use boto in my project. This did not work.
I'd like to know if anyone has managed to transfer files directly between the two clouds. If possible, I'd like to see a simple example. In the end, this task has to be accomplished through code executing on GAE.
Per this thread, you can stream data from S3 to Google Cloud Storage using gsutil, but every byte still has to make two hops: S3 to your local computer, and then your computer to GCS. Since you're using App Engine, however, you should be able to pull from S3 and deposit into GCS from within your application. It's the same progression as above, except App Engine is the intermediary, i.e. every byte travels from S3 to your app and then to GCS. You could use boto for the pull side and the Google Cloud Storage API for the push side.
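A minimal sketch of that pull-then-push pattern, using boto3 and the google-cloud-storage client as stand-ins for the libraries mentioned above (the bucket and object names are placeholders, and the whole object is read into memory for simplicity):

import boto3
from google.cloud import storage

S3_BUCKET = 'my-s3-bucket'      # placeholder
S3_KEY = 'logs/app.log'         # placeholder
GCS_BUCKET = 'my-gcs-bucket'    # placeholder

# Pull side: fetch the object from S3.
s3 = boto3.client('s3')
data = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)['Body'].read()

# Push side: write it into Google Cloud Storage (stream in chunks for large files).
gcs = storage.Client()
gcs.bucket(GCS_BUCKET).blob(S3_KEY).upload_from_string(data)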
Google allows you to import entire buckets from S3 to the storage service:
https://cloud.google.com/storage/transfer/getting-started
You can set file filters on the source bucket to import only the files you want, or a "directory" (i.e. anything with a certain prefix).
I'm not aware of any cloud provider that provides an API for transferring data to a competing cloud provider. Cloud providers have no incentive to help you move your data to the competition. You will almost certainly have to read the data to an intermediate machine then write it to Google.
GCP supports not only transfers from S3, but also from any storage that exposes an S3-compatible API:
https://cloud.google.com/storage-transfer/docs/create-transfers
https://cloud.google.com/storage-transfer/docs/s3-compatible