Port a Vertex AI dashboard workflow to a Jupyter notebook - how to create a tf.data.Dataset from the JSONL exported from the dashboard - tensorflow

Using the Vertex AI dashboard on a model with a labeled image dataset went fine.
I exported the dataset from the dashboard to JSONL at a gs:// URI in a bucket.
From the GitHub samples I have a notebook that uses a standard platform Dataset for image labeling, and I just want to change the code to load from my gs:// URI, i.e. the JSONL export of MY image dataset.
This is the dataset-related Jupyter notebook code I need to change.
A format sample of my exported JSONL training data is below:
{"imageGcsUri":"gs://test_yayatv/gcs3/Screenshot 2023-02-07 1.54.08 PM.png","classificationAnnotation":{"displayName":"mural","annotationResourceLabels":{"aiplatform.googleapis.com/annotation_set_name":"8066716590760001536"}},"dataItemResourceLabels":{}}
My question: how do I consume the export above in a Jupyter notebook training sample that is set up to train on a labeled image dataset like the export above?
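One way to consume it (a minimal sketch, not an official Vertex AI sample: the JSONL file path is hypothetical, and the field names are taken from the format sample above) is to read the JSONL with tf.io.gfile, collect the image URIs and labels, and build a tf.data.Dataset from them:

import json
import tensorflow as tf

JSONL_URI = "gs://test_yayatv/gcs3/export.jsonl"  # hypothetical path to your JSONL export

image_uris, label_names = [], []
with tf.io.gfile.GFile(JSONL_URI, "r") as f:
    for line in f:
        record = json.loads(line)
        image_uris.append(record["imageGcsUri"])
        label_names.append(record["classificationAnnotation"]["displayName"])

# Map string labels (e.g. "mural") to integer class ids.
class_names = sorted(set(label_names))
labels = [class_names.index(name) for name in label_names]

def load_image(uri, label):
    img = tf.io.read_file(uri)                  # tf.io.read_file accepts gs:// URIs
    img = tf.io.decode_png(img, channels=3)     # the sample images are .png
    img = tf.image.resize(img, [224, 224]) / 255.0
    return img, label

ds = (tf.data.Dataset.from_tensor_slices((image_uris, labels))
      .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))

From there the dataset plugs into model.fit like any other tf.data pipeline.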

Related

How to work with a large training set when dealing with auto-encoders on Google Colaboratory?

I am training an auto-encoder (Keras) on Google Colab. However, I have 25,000 input images and 25,000 output images. I tried to:
1- Copy the large file from Google Drive to Colab each time (takes 5-6 hours).
2- Convert the set to a numpy array, but when normalizing the images the size gets a lot bigger (from 7GB to 24GB, for example), and then I cannot fit it into RAM.
3- I cannot zip and unzip my data.
So please, if anyone knows how to convert it into a numpy array (and normalize it) without producing such a large file (24GB), let me know.
What I usually do:
Zip all the images and upload the .zip file to your Google Drive.
Unzip it in your Colab:
from zipfile import ZipFile
with ZipFile('data.zip', 'r') as zip:
    zip.extractall()
All your images are now unzipped and stored on the Colab disk, so you have faster access to them.
Use generators in Keras like flow_from_directory, or create your own generator (see the sketch after this answer).
Use your generator when you fit your model:
model.fit(train_generator,
          steps_per_epoch=ntrain // batch_size,
          epochs=epochs,
          validation_data=val_generator,
          validation_steps=nval // batch_size)
with ntrain and nval the number of images in your train and validation datasets.
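On the normalization point, a minimal sketch of the flow_from_directory route (the directory name and image size are illustrative, with the folder assumed to be the one produced by extractall() above): the generator rescales each batch on the fly, so the full 24GB normalized array never has to exist in RAM.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)    # normalize per batch, not the whole set in memory
train_generator = datagen.flow_from_directory(
    'data/train',               # hypothetical folder produced by extractall()
    target_size=(128, 128),
    batch_size=32,
    class_mode='input')         # for an auto-encoder the targets are the inputs themselves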

Method to load image data for TPU

So currently neither keras.ImageDataGenerator nor tf.data.Dataset is working with TensorFlow TPU. Is there any other method to load image data for training a model on TPU?
You can use GCS buckets.
Or you can read from files:
with open(image_path, "rb") as local_file:
    img = local_file.read()
And then convert to a tensor and create a dataset with tf.data.Dataset.from_tensor_slices.
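Put together, a minimal sketch under the assumption that image_paths is your list of local image files (the JPEG decoding and target size are illustrative):

import tensorflow as tf

image_bytes = []
for image_path in image_paths:          # image_paths: assumed list of local image files
    with open(image_path, "rb") as local_file:
        image_bytes.append(local_file.read())

ds = tf.data.Dataset.from_tensor_slices(image_bytes)
ds = ds.map(lambda raw: tf.image.resize(tf.io.decode_jpeg(raw, channels=3), [224, 224]))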

Multi-column input to ML.PREDICT for a TensorFlow model in BigQueryML

We have trained a model in Google Cloud AutoML (a tool that we like a lot), successfully exported it to GCS, and then created the model in BigQuery using the command below:
create or replace model my_dataset.my_bq_ml_model
options(model_type='tensorflow',
        model_path='my gcs path to exported tensorflow model')
However, when we use BigQuery ML to try to run some predictions with the model, we are unsure how to format the multiple features our model uses into the single "inputs" string that the exported TensorFlow model accepts in BigQuery.
select *
from ml.predict(model my_project.my_dataset.my_bq_ml_model,
  (
    select 'How do we format this?' as inputs
    from my_rows_to_predict
  ))
Has anyone done this yet?
This is similar to this question, which remains open:
Multi-column input to ML.PREDICT for a TensorFlow model in BigQuery ML
Thank you all.
After you load the model into BigQuery ML, click on the model in the BigQuery UI and switch over to the "Schema" tab. This should tell you what columns the model wants.
Alternatively, run the program saved_model_cli on the model (a Python program that ships with TensorFlow) to see what the supported signature is:
saved_model_cli show --dir $export_path --all
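The same information is also available from Python, which can be handier in a notebook (a sketch; export_path is assumed to point at the exported SavedModel directory):

import tensorflow as tf

model = tf.saved_model.load(export_path)    # export_path: your exported model directory
print(model.signatures["serving_default"].structured_input_signature)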

TensorFlow estimator exporter when training data is from TFRecords and inference is from raw data

Background
0) I'm working on an NLP model that I would like to export.
1) I have training data in the form of TFRecords.
2) I would like to export my model and host it behind a Flask app, so the data that comes in is raw text.
3) I handle all my pre-processing (tokenization and such) as part of my TensorFlow graph.
Question
1) Given that I do the data loading (tf.data.Dataset creation and pre-processing) as part of the TensorFlow graph, would the raw text that comes in break the process? (Specifically in the tf.data.Dataset creation step.)
2) Would it make more sense to just load in raw text instead of tf.data.Dataset data?
Never mind, I completely forgot that you want to have an input_function which you feed datasets through, and a serving_input_function which accepts raw data.
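For reference, a minimal sketch of that split (preprocess is a hypothetical stand-in for the in-graph tokenization; the tensor name and export directory are illustrative):

import tensorflow as tf

def serving_input_receiver_fn():
    # Raw strings arrive at serving time; run the same in-graph preprocessing on them.
    text = tf.compat.v1.placeholder(dtype=tf.string, shape=[None], name='text')
    features = preprocess(text)    # preprocess: your in-graph tokenization, assumed defined
    return tf.estimator.export.ServingInputReceiver(features, {'text': text})

estimator.export_saved_model('export_dir', serving_input_receiver_fn)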

Tensorflow Keras GCP ML engine model serving

I'm working on an image classifier with TensorFlow estimator + Keras, retraining the last layer of the pretrained application inception_v3 on GCP ML Engine.
The Keras model is exported with tf.keras.estimator.model_to_estimator, and the input function receives the path of the image stored on GCP Cloud Storage, opens the image with tf.image.decode_jpeg, and returns a dataset with the following format: dict(zip(['inception_v3_input'], [image])), label
I'm trying to define the tf.estimator.export.ServingInputReceiver but I'm having some trouble defining it.
The model serves predictions correctly with the predict method, using the input function without the labels.
My idea was to reuse the input function to decode the image, passing only the path of the image on Cloud Storage, for predictions on the Google endpoint as well, but I can't understand how to do it.
Thanks for your help.
If I'm understanding correctly, your question is how to get the file from Cloud Storage, considering that you want to decode the image this way:
image_decoded = tf.image.decode_jpeg(image_string)
So, in this case, you can use:
image_string = file_io.FileIO(filename, mode='rb').read()
by importing file_io first:
from tensorflow.python.lib.io import file_io
According to the comments on this question about reading input data from GCS, "there was a bunch of work done to abstract file io and file systems, so there all the io functionality works consistently", so you can also try the read_file function.
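As a sketch of that route (the resize step is illustrative; 299x299 is inception_v3's default input size), tf.io.read_file accepts gs:// paths directly inside the graph:

import tensorflow as tf

def load_and_decode(gcs_path):
    image_string = tf.io.read_file(gcs_path)            # works for gs:// URIs as well
    image = tf.image.decode_jpeg(image_string, channels=3)
    return tf.image.resize(image, [299, 299]) / 255.0   # inception_v3 default input size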