Google Vision AutoML > Datasets | Validation data in csv doesn't upload - google-vision

I am using Google Vision AutoML. In order to train a model, the data needs to be uploaded. There are two ways to do this:
1. Upload directly from your computer.
2. Upload to a Google Cloud Storage bucket and make a CSV which contains the paths to the image files.
Since I want to compare my locally pre-trained model with the model I will train on Google AutoML, I want to ensure that the same data splits are used (train, test, validation), so option #2 is the best way.
Issue:
I made the CSV in the following format, but when I upload it, only the train and test sets are loaded.

I solved it by putting "Validation" instead of "Validate" in the set column.
So the issue was the wording used on the upload form, which says:
Optionally, you can specify the TRAIN, VALIDATE, or TEST split.
This is misleading, and they also did not show a sample row for validation.
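For reference, rows along these lines ended up working (the bucket path and labels are illustrative):
TRAIN,gs://my-bucket/images/cat_001.jpg,cat
VALIDATION,gs://my-bucket/images/dog_001.jpg,dog
TEST,gs://my-bucket/images/cat_002.jpg,cat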
For more details:
https://cloud.google.com/vision/automl/docs/prepare#csv

Related

"Empty table from specified data source" error in Create ML

I'm trying to train a new object detection model using the Create ML tool from Apple. I've already used RectLabel to generate annotations for all of the JPEG images in my directory of training images.
However, every time I try loading the directory in Create ML, I receive this error message:
Empty table from specified data source
I already looked on the Apple Developer forums and that thread incorrectly claims the problem was solved in a previous update.
What causes this error? How can I get Create ML to accept my training data?
I'm using Create ML Version 2.0 (53.2.2) and RectLabel Version 3.04.2 (3.04.2) on macOS Big Sur 11.0.1 (20B29).
The “Empty table from specified data source” error occurs if any of the filenames contain spaces.
My solution was to rename all the files so the filenames don't contain spaces.
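A small script along these lines could do the renaming in bulk (a sketch; the directory path is hypothetical):

import os

image_dir = "training_images"  # hypothetical path to the directory of training images

for name in os.listdir(image_dir):
    if " " in name:
        # Replace spaces so Create ML can read the file
        os.rename(os.path.join(image_dir, name),
                  os.path.join(image_dir, name.replace(" ", "_")))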
Make sure that there are only images and the annotations.json file in your directory of training images.
If there are any other files in the folder, including a .mlproj file, Create ML shows the "Empty table from specified data source" error.
When you create a new project in Create ML, save the project file outside the directory of training images.
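For reference, a layout along these lines should be accepted (file and folder names are illustrative), with the .mlproj saved elsewhere:
TrainingImages/
    annotations.json
    image_001.jpg
    image_002.jpg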

Loading Keras Model in Google App Engine

Use-case:
I am trying to load a pre-trained Keras Model as .h5 file in Google App Engine. I am running App Engine on a Python runtime 3.7 and Standard Environment.
Issue:
I tried using the Keras load_model() function. Unfortunately, load_model() requires a file path, and I failed to load the model from the Google App Engine file explorer. Further, Google Cloud Storage does not seem to be an option, as it is not recognized as a file path.
Questions:
(1) How can I load a pretrained model (e.g. .h5) into Google App Engine (without saving it locally first)?
(2) Maybe there is a way to load the model.h5 into Google App Engine from Google Cloud Storage that I have not thought of, e.g. by using another function (other than tf.keras.models.load_model()) or another format?
I just want to read the model in order to make predictions. Writing or training the model is not required.
Solution:
I finally managed to load the Keras model in Google App Engine, overcoming four challenges:
First challenge: As of today, Google App Engine only provides TensorFlow 2.0.0.x versions. Hence, make sure to pin the correct version in your requirements.txt file. I ended up using 2.0.0b1 for my project.
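For example, the relevant pin in requirements.txt would look something like this (using the beta release mentioned above):
tensorflow==2.0.0b1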
Second challenge: In order to use a pretrained model, make sure the model has been saved with the same TensorFlow version that is running on Google App Engine.
Third challenge: Google App Engine does not give you a writable disk. The only places to read or store data at runtime are memory and the /tmp folder (as correctly pointed out by user bhito). I ended up connecting my Cloud Storage bucket and downloading the model.h5 file as a blob into the /tmp folder.
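A minimal sketch of that flow, assuming a bucket and object name of your own (both hypothetical here):

import tensorflow as tf
from google.cloud import storage

def load_model_from_gcs():
    # Download the serialized model from Cloud Storage into /tmp,
    # the only writable location on App Engine.
    client = storage.Client()
    bucket = client.bucket("my-model-bucket")   # hypothetical bucket name
    blob = bucket.blob("model.h5")              # hypothetical object name
    blob.download_to_filename("/tmp/model.h5")
    # Load the Keras model from the local copy in /tmp.
    return tf.keras.models.load_model("/tmp/model.h5")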
Fourth challenge: By default, the Google App Engine instance class is limited to 256 MB of memory. Due to my model size, I needed to increase the instance class accordingly.
In summary: yes, tf.keras.models.load_model() does work on App Engine, reading from Cloud Storage, provided you have the right TensorFlow version and the right instance class (with enough memory).
I hope this will help future folks who want to use Google App Engine to deploy their ML models.
You will have to download the file first before using it; Cloud Storage paths can't be used directly to access objects. There is a sample of how to download objects in the documentation:
from google.cloud import storage

def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )
Then write the file to the /tmp temporary folder, which is the only writable directory available in App Engine; keep in mind that /tmp is backed by instance memory, so large files count against it. You also have to take into consideration that once the instance using the file is shut down, the file will be deleted as well.
More specific to your question: to load a Keras model, it's useful to have it as a pickle, as this tutorial shows:
import pickle
from google.cloud import storage

# MODEL_BUCKET and MODEL_FILENAME are module-level constants;
# the loaded model is kept in the global MODEL.
def _load_model():
    global MODEL
    client = storage.Client()
    bucket = client.get_bucket(MODEL_BUCKET)
    blob = bucket.get_blob(MODEL_FILENAME)
    s = blob.download_as_string()
    MODEL = pickle.loads(s)
I was also able to find another Stack Overflow post that covers what you're actually looking for.

kaggle directly download input data from copied kernel

How can I download all the input data from a kaggle kernel? For example this kernel: https://www.kaggle.com/davidmezzetti/cord-19-study-metadata-export.
Once you make a copy and have the option to edit, you have the ability to run the notebook and make changes.
One thing I have noticed is that anything placed in the output directory gets a download button next to the file icon. So I could just read each and every input file and write it to the output, but that seems like a waste.
Am I missing something here?
The notebook you list contains two data sources:
another notebook (https://www.kaggle.com/davidmezzetti/cord-19-analysis-with-sentence-embeddings)
and a dataset (https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge)
You can use Kaggle's API to retrieve a kernel's output:
kaggle kernels output davidmezzetti/cord-19-analysis-with-sentence-embeddings
And to download dataset files:
kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
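If the Kaggle CLI isn't set up yet, the usual installation and authentication steps (per the Kaggle API documentation) are roughly:
pip install kaggle
# then place the API token downloaded from your Kaggle account settings at ~/.kaggle/kaggle.json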

Does google colab permanently change file

I am doing some data pre-processing on Google Colab and am just wondering how it works when manipulating a dataset. For example, R does not change the original dataset until you use write.csv to export the changed dataset. Does it work similarly in Colab? Thank you!
Until you explicitly save your changed data, e.g. using df.to_csv to the same file you read from, your changed dataset is not saved.
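For example, with pandas (a minimal sketch; the file and column names are illustrative):

import pandas as pd

df = pd.read_csv("data.csv")         # read the original file
df["price"] = df["price"] * 1.1      # changes only the in-memory DataFrame
df.to_csv("data.csv", index=False)   # only now is the file on disk overwritten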
You must remember that due to inactivity (up to an hour or so), your Colab session might expire and all progress be lost.
Update
To download a model, dataset, or any big file from Google Drive, the gdown command is already available:
!gdown https://drive.google.com/uc?id=FILE_ID
Download your code from GitHub and run predictions using the model you already downloaded:
!git clone https://USERNAME:PASSWORD@github.com/username/project.git
Write ! before a line of code in Colab and it will be treated as a bash command. You can download files from the internet using wget, for example:
!wget file_url
You can commit and push your updated code to GitHub, and your updated dataset / model to Google Drive or Dropbox.

Speech training files and registry locations

I have a speech project that requires acoustic training to be done in code. I am successfully able to create training files with transcripts and their associated registry entries under Windows 7 using SAPI. However, I am unable to determine whether the Recognition Engine is successfully using these files and adapting its model. My questions are as follows:
When performing training through the Control Panel training UI, the system stores the training files in "{AppData}\Local\Microsoft\Speech\Files\TrainingAudio". Do the audio training files HAVE to be stored in this location, or can I store them elsewhere as long as the registry entries for the profile reflect the correct path?
The Speech Control Panel creates registry entries for the training audio files under the key "HKCU\Software\Microsoft\Speech\RecoProfiles\Tokens\{ProfileGUID}\{00000000-0000-0000-0000-000000000000}\Files".
a) Do the registry entries created by my training code HAVE to be placed under "{00000000-0000-0000-0000-000000000000}\Files", or can I create a new random GUID under {ProfileGUID}?
b) Does the subkey HAVE to be named "Files"?
c) And do the registry values HAVE to follow the form "TrainingAudio-xxxx-xxxxxxxx-xxxxxxxx" or can I use other values?
d) Finally, the Registry Value Data is of the form "%1c%\Microsoft\Speech\Files\TrainingAudio\SP-xxx....xxx". Can I specify an absolute path?
e) Do the file names HAVE to follow the form "SP-xxx....xxx.wav" or can I use any unique file names?
Thanks.
Giri