I am trying to connect with Python to my USB devices.
The final goal is a connection to my blood pressure monitor, but I am already failing to connect to ANY device.
My simple code, which I found here, is below. The product and vendor IDs I got from Apple Menu > About This Mac > System Information.
import usb.core
import usb.util
# find our device
dev = usb.core.find(idVendor=0x0781, idProduct=0x55a4)
# was it found?
if dev is None:
    raise ValueError('Device not found')
# set the active configuration. With no arguments, the first
# configuration will be the active one
dev.set_configuration()
# get an endpoint instance
cfg = dev.get_active_configuration()
intf = cfg[(0,0)]
ep = usb.util.find_descriptor(
    intf,
    # match the first OUT endpoint
    custom_match = \
    lambda e: \
        usb.util.endpoint_direction(e.bEndpointAddress) == \
        usb.util.ENDPOINT_OUT)
assert ep is not None
# write the data
ep.write('test')
But I always get NoBackendError: No backend available from dev = usb.core.find(idVendor=0x0781, idProduct=0x55a4).
For the connection I installed pyusb in my Python environment and libusb via Homebrew on my Mac.
I have no clue how to get a connection, or even how to iterate over all connected devices and get a simple list of their product and vendor IDs.
This error is to be expected if pyusb cannot find the dynamic libraries of libusb.
Installing libusb with Homebrew is not sufficient. Homebrew puts the relevant files in /opt/homebrew/Cellar/libusb/1.0.24/lib and creates symbolic links in /opt/homebrew/lib. But pyusb is not aware of these paths.
You have two main options:
Add /opt/homebrew/lib to the environment variable DYLD_LIBRARY_PATH. For a permanent setup, add it to ~/.zshenv:
export DYLD_LIBRARY_PATH="/opt/homebrew/lib:$DYLD_LIBRARY_PATH"
Create a symbolic link in your home directory. This takes advantage of the fact that ~/lib is a default fallback path for libraries:
ln -s /opt/homebrew/lib ~/lib
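Alternatively, you can bypass the search path entirely and hand pyusb an explicit libusb backend from Python. The following is a minimal sketch, assuming the Homebrew library path shown above (adjust it to your installation); it also shows how to iterate over all connected devices to list their vendor and product IDs:
import usb.core
import usb.backend.libusb1

# Point pyusb explicitly at the Homebrew libusb (path is an assumption;
# on Intel Macs it is typically /usr/local/lib/libusb-1.0.dylib)
backend = usb.backend.libusb1.get_backend(
    find_library=lambda name: "/opt/homebrew/lib/libusb-1.0.dylib")

# Enumerate every connected USB device with its vendor/product ID
for dev in usb.core.find(find_all=True, backend=backend):
    print("Vendor: 0x%04x, Product: 0x%04x" % (dev.idVendor, dev.idProduct))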
I am trying to follow this tutorial:
https://medium.com/@natu.neeraj/training-a-keras-model-on-google-cloud-ml-cb831341c196
to upload and train a Keras model on Google Cloud Platform, but I can't get it to work.
Right now I have downloaded the package from GitHub, and I have created a cloud environment with AI-Platform and a bucket for storage.
I am uploading the files (with the suggested folder structure) to my Cloud Storage bucket (basically to the root of my storage), and then trying the following command in the cloud terminal:
gcloud ai-platform jobs submit training JOB1 \
    --module-name=trainer.cnn_with_keras \
    --package-path=./trainer \
    --job-dir=gs://mykerasstorage \
    --region=europe-north1 \
    --config=gs://mykerasstorage/trainer/cloudml-gpu.yaml
But I get errors: first, the cloudml-gpu.yaml file can't be found ("no such folder or file"), and when I simply remove that flag, I get errors saying the __init__.py file is missing, even though it is there (it is empty, as it was when I downloaded it from the tutorial's GitHub). I am guessing I haven't uploaded it the right way.
Any suggestions of how I should do this? There is really no info on this in the tutorial itself.
I have read in another guide that it is possible to let gcloud package and upload the job directly, but I am not sure how to do this or where to write the commands: in my local terminal with the gcloud command, or in the Cloud Shell in the browser? And how do I define the path where my Python files are located?
I should mention that I am working on a Mac and am pretty new to Keras and Python.
I was able to follow the tutorial you mentioned successfully, with some modifications along the way.
I will mention all the steps, even though, as you said, you already made it halfway.
First of all create a Cloud Storage Bucket for the job:
gsutil mb -l europe-north1 gs://keras-cloud-tutorial
To answer your question about where you should write these commands: it depends on where you want to store the files that you download from GitHub. In the tutorial you posted, the writer runs the commands on his own computer, which is why he initializes the gcloud command with gcloud init. However, you can submit the job from the Cloud Shell too, if you download the needed files there.
The only files we need from the repository are the trainer folder and the setup.py file. So, if we put them in a folder named keras-cloud-tutorial we will have this file structure:
keras-cloud-tutorial/
├── setup.py
└── trainer
    ├── __init__.py
    ├── cloudml-gpu.yaml
    └── cnn_with_keras.py
Now, a possible reason for the ImportError: No module named eager error is that you might have changed the runtimeVersion inside the cloudml-gpu.yaml file. As we can read here, eager was introduced in Tensorflow 1.5. If you have specified an earlier version, this error is expected. So the structure of cloudml-gpu.yaml should be like this:
trainingInput:
  scaleTier: CUSTOM
  # standard_gpu provides 1 GPU. Change to complex_model_m_gpu for 4 GPUs
  masterType: standard_gpu
  runtimeVersion: "1.5"
Note: "standard_gpu" is a legacy machine type.
Also, the setup.py file should look like this:
from setuptools import setup, find_packages
setup(name='trainer',
      version='0.1',
      packages=find_packages(),
      description='Example on how to run keras on gcloud ml-engine',
      author='Username',
      author_email='user@gmail.com',
      install_requires=[
          'keras==2.1.5',
          'h5py'
      ],
      zip_safe=False)
Attention: As you can see, I have specified version 2.1.5 of keras. This is because, if I don't, the latest version is used, and it has compatibility issues with versions of Tensorflow earlier than 2.0.
If everything is set, you can submit the job by running the following command inside the folder keras-cloud-tutorial:
gcloud ai-platform jobs submit training test_job \
    --module-name=trainer.cnn_with_keras \
    --package-path=./trainer \
    --job-dir=gs://keras-cloud-tutorial \
    --region=europe-west1 \
    --config=trainer/cloudml-gpu.yaml
Note: I used gcloud ai-platform instead of gcloud ml-engine command although both will work. At some point in the future though, gcloud ml-engine will be deprecated.
Attention: Be careful when choosing the region in which the job will be submitted. Some regions do not support GPUs and will throw an error if chosen. For example, if in my command I set the region parameter to europe-north1 instead of europe-west1, I will receive the following error:
ERROR: (gcloud.ai-platform.jobs.submit.training) RESOURCE_EXHAUSTED:
Quota failure for project . The request for 1 K80
accelerators exceeds the allowed maximum of 0 K80, 0 P100, 0 P4, 0 T4,
0 TPU_V2, 0 TPU_V3, 0 V100. To read more about Cloud ML Engine quota,
see https://cloud.google.com/ml-engine/quotas.
- '@type': type.googleapis.com/google.rpc.QuotaFailure violations:
- description: The request for 1 K80 accelerators exceeds the allowed maximum of
0 K80, 0 P100, 0 P4, 0 T4, 0 TPU_V2, 0 TPU_V3, 0 V100.
subject:
You can read more about the features of each region here and here.
EDIT:
After the completion of the training job, there should be 3 folders in the bucket that you specified: logs/, model/ and packages/. The model is saved in the model/ folder as an .h5 file. Keep in mind that if you set a specific folder for the destination, you should include the '/' at the end. For example, you should set gs://my-bucket/output/ instead of gs://my-bucket/output. If you do the latter, you will end up with folders output, outputlogs and outputmodel. Inside output there should be packages. The job page link should direct to the output folder, so make sure to check the rest of the bucket too!
In addition, on the AI-Platform job page you should be able to see information regarding CPU, GPU and network utilization.
Also, I would like to clarify something as I saw that you posted some related questions as an answer:
Your local environment, whether it is your personal Mac or the Cloud Shell, has nothing to do with the actual training job. You don't need to install any specific package or framework locally. You just need to have the Google Cloud SDK installed (in Cloud Shell it is, of course, already installed) to run the appropriate gcloud and gsutil commands. You can read more on how exactly training jobs on the AI-Platform work here.
I hope that you will find my answer helpful.
I got it to work halfway now, by not uploading the files manually but just running the submission commands from my local terminal; however, there was an error while the job was running, ending in "job failed".
It seems it was trying to import something from the TensorFlow backend ("from tensorflow.python.eager import context"), but there was an ImportError: No module named eager.
I have tried "pip install tf-nightly", which was suggested elsewhere, but it says I don't have permission, or I lose the connection to Cloud Shell exactly when I try to run the command.
I have also tried making a virtual environment locally to match that on gcloud (with Conda), and have made an environment with Conda with Python=3.5, Tensorflow=1.14.0 and Keras=2.2.5, which should be supported for gcloud.
The python program works fine in this environment locally, but I still get the (ImportError: No module named eager) when trying to run the job on gcloud.
I am passing the flag --python-version 3.5 when submitting the job, but when I run "python -V" in the Google Cloud Shell, it says Python 2.7. Could this be the issue? I have not found a way to update the Python version from the Cloud Shell prompt, but Google Cloud is supposed to support Python 3.5. If this is indeed the issue, any suggestions on how to upgrade the Python version on Google Cloud?
It is also possible to manually create a new job in the Google Cloud web interface. Doing this, I get a different error message: ERROR: Could not find a version that satisfies the requirement cnn_with_keras.py (from versions: none) and No matching distribution found for cnn_with_keras.py. Here cnn_with_keras.py is my Python code from the tutorial, which runs fine locally.
Really don't know what to do next. Any suggestions or tips would be very helpful!
The issue with the GPU is solved now; it was something as simple as my Google Cloud account having GPU settings disabled, so it needed to be upgraded.
Short Question:
Since TensorFlow is moving towards Keras and away from Estimators, how can we incorporate our preprocessing pipelines, e.g. using tf.Transform and build_serving_input_fn() (which are used with Estimators), into our tf.keras models?
From my understanding, the only way to incorporate this preprocessing graph is to first build the model using Keras. Train it. Then export it as an estimator using tf.keras.estimator.model_to_estimator. Then create a serving_input_fn and export the estimator as a saved model, along with this serving_input_fn to be used at serving time.
To me it seems tedious, and not the correct way of doing things. Instead, I would like to go directly from Keras to Saved Model.
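For reference, the Estimator detour I am describing looks roughly like this (a minimal sketch; model and tf_transform_output are defined further down, and the export path and feature name are just placeholders):
import tensorflow as tf

# Convert the trained Keras model to an Estimator (TF 1.x path)
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)

# Serving input function that applies the tf.Transform graph to raw inputs
def serving_input_fn():
    raw_features = {'feature_col_name': tf.placeholder(tf.float32, [None, 1])}
    transformed_features = tf_transform_output.transform_raw_features(raw_features)
    return tf.estimator.export.ServingInputReceiver(transformed_features, raw_features)

estimator.export_savedmodel('export_dir', serving_input_fn)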
Problem
I would like to be able to include an Apache Beam preprocessing graph in a Keras saved model.
I would like to serve the trained Keras model, hence I export it using SavedModel. Given a trained model, I would like to apply the following logic to predict the outcome.
import tensorflow_transform as tft
from tensorflow.keras.models import load_model

raw_features = { 'feature_col_name': [100] }  # features to transform
working_dir = 'gs://path_to_transform_fn'

# transform features (tf_transform_output is assumed to be loaded from working_dir)
tf_transform_output = tft.TFTransformOutput(working_dir)
transformed_features = tf_transform_output.transform_raw_features(raw_features)

model = load_model('model.h5')
model.predict(x=transformed_features)
When I define my model, I use Functional API and the model has the following inputs:
for i in numerical_features:
    num_inputs.append(tf.keras.layers.Input(shape=(1,), name=i))
This is the problem, because tensors are not fed into Keras directly from a tf.Dataset; instead, they are linked using the Input() layer.
When I export the model using tf.contrib.saved_model.save_keras_model(model=model, saved_model_path=saved_model_path), I can readily serve predictions if I handle the preprocessing in a separate script.
Is this what generally happens? For example, I would preprocess the features as part of some external script, and then send transformed_features to the model for prediction.
Ideally, it would all happen within the Keras model/part of a single graph. Currently it seems that I'm using the output of one graph as an input into another graph. Instead I would like to be able to use a single graph.
If using Estimators, we can build a serving_input_fn() which can be included as an argument to the estimator, which allows us to incorporate preprocessing logic into the graph.
I would like to also hear your Keras + SavedModel + Preprocessing ideas on serving models using Cloud ML
For your question on incorporating Apache Beam into the input function for a tf.transform pipeline, see this TF tutorial that explains how to do that:
"https://www.tensorflow.org/tfx/transform/get_started#apache_beam_implementation
On using TF 2.0 SavedModel with Keras, this notebook tutorial demonstrates how to do that:
https://www.tensorflow.org/beta/guide/keras/saving_and_serializing#export_to_savedmodel
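As a rough sketch of what the TF 2.0 path can look like: newer versions of tensorflow_transform expose the transform graph as a Keras layer, so you can attach the preprocessing to a serving signature and export a single SavedModel. The feature name, shapes and the transform_features_layer() usage below are assumptions based on your snippets, not something taken from the tutorial:
import tensorflow as tf
import tensorflow_transform as tft

# model: the trained tf.keras model; the GCS path is the tf.Transform output dir
tf_transform_output = tft.TFTransformOutput('gs://path_to_transform_fn')

# Attach the transform graph to the model so it is tracked when saving
model.tft_layer = tf_transform_output.transform_features_layer()

@tf.function(input_signature=[{
    'feature_col_name': tf.TensorSpec(shape=[None, 1], dtype=tf.float32)}])
def serve_raw(raw_features):
    transformed = model.tft_layer(raw_features)   # apply the preprocessing graph
    return model(transformed['feature_col_name'])

# Export preprocessing + model as a single SavedModel
tf.saved_model.save(model, 'export_dir', signatures={'serving_default': serve_raw})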
Cloud ML Engine is Google Cloud's managed machine learning service.
It's quite simple to get started and train a model by following their documentation:
Develop and validate your training application locally
Before you run your training application in the cloud, get it running
locally. Local environments provide an efficient development and
validation workflow so that you can iterate quickly. You also won't
incur charges for cloud resources when debugging your application
locally.
Get your training data
The relevant data files, adult.data and adult.test, are hosted in a
public Cloud Storage bucket. For purposes of this sample, use the
versions on Cloud Storage, which have undergone some trivial cleaning,
instead of the original source data. See below for more information
about the data.
You can read the data files directly from Cloud Storage or copy them
to your local environment. For purposes of this sample you will
download the samples for local training, and later upload them to your
own Cloud Storage bucket for cloud training.
Download the data to a folder:
mkdir data
gsutil -m cp gs://cloud-samples-data/ml-engine/census/data/* data/
Then, just set the TRAIN_DATA and EVAL_DATA variables to your local file paths. For example, the following commands set the variables to local paths.
TRAIN_DATA=$(pwd)/data/adult.data.csv
EVAL_DATA=$(pwd)/data/adult.test.csv
Then you have a CSV file like this:
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners, Husband, Black, Male, 0, 0, 40, United-States, <=50K
To run it:
gcloud ml-engine local train \
--module-name trainer.task \
--package-path trainer/ \
--job-dir $MODEL_DIR \
-- \
--train-files $TRAIN_DATA \
--eval-files $EVAL_DATA \
--train-steps 1000 \
--eval-steps 100
For more training considerations, as your question asks:
Running a Training Job
Cloud Machine Learning Engine provides model training as an
asynchronous (batch) service. This page describes how to configure and
submit a training job by running gcloud ml-engine jobs submit training
from the command line or by sending a request to the API at
projects.jobs.create.
Before you begin
Before you can submit a training job, you must package your
application and upload it and any unusual dependencies to a Cloud
Storage bucket. Note: If you use the gcloud command-line tool to
submit your job, you can package the application and submit the job in
the same step.
Configuring the job
You pass your parameters to the training service by setting the
members of the Job resource, which includes the items in the
TrainingInput resource.
If you use the gcloud command-line tool to submit your training jobs,
you can:
Specify the most common training parameters as flags of the gcloud ml-engine jobs submit training command.
Pass the remaining parameters in a YAML configuration file, named config.yaml by convention. The configuration file mirrors the
structure of the JSON representation of the Job resource. You pass the
path of your configuration file in the --config flag of the gcloud
ml-engine jobs submit training command. So, if the path to your
configuration file is config.yaml, you must set --config=config.yaml.
Gathering the job configuration data
The following properties are used to define your job.
Job name (jobId)
    A name to use for the job (mixed-case letters, numbers, and underscores only, starting with a letter).
Cluster configuration (scaleTier)
    A scale tier specifying the type of processing cluster to run your job on. This can be the CUSTOM scale tier, in which case you also explicitly specify the number and type of machines to use.
Training application package (packageUris)
    A packaged training application that is staged in a Cloud Storage location. If you are using the gcloud command-line tool, the application packaging step is largely automated. See the details in the guide to packaging your application.
Module name (pythonModule)
    The name of the main module in your package. The main module is the Python file you call to start the application. If you use the gcloud command to submit your job, specify the main module name in the --module-name flag. See the guide to packaging your application.
Region (region)
    The Compute Engine region where you want your job to run. You should run your training job in the same region as the Cloud Storage bucket that stores your training data. See the available regions for Cloud ML Engine services.
Job directory (jobDir)
    The path to a Cloud Storage location to use for job output. Most training applications save checkpoints during training and save the trained model to a file at the end of the job. You need a Cloud Storage location to save them to. Your Google Cloud Platform project must have write access to this bucket. The training service automatically passes the path you set for the job directory to your training application as a command-line argument named job_dir. You can parse it along with your application's other arguments and use it in your code. The advantage to using the job directory is that the training service validates the directory before starting your application.
Runtime version (runtimeVersion)
    The Cloud ML Engine version to use for the job. If you don't specify a runtime version, the training service uses the default Cloud ML Engine runtime version 1.0.
Python version (pythonVersion)
    The Python version to use for the job. Python 3.5 is available with Cloud ML Engine runtime version 1.4 or greater. If you don't specify a Python version, the training service uses Python 2.7.
Formatting your configuration parameters
How you specify your configuration details depends on how you are
starting your training job:
Provide the job configuration details to the gcloud ml-engine jobs
submit training command. You can do this in two ways:
With command-line flags.
In a YAML file representing the Job resource. You can name this file whatever you want. By convention the name is config.yaml.
Even if you use a YAML file, certain details must be supplied as
command-line flags. For example, you must provide the --module-name
flag and at least one of --package-path or --packages. If you use
--package-path, you must also include --job-dir or --staging-bucket. Additionally, you must either provide the --region flag or set a
default region for your gcloud client. These options—and any others
you provide as command line flags—will override values for those
options in your configuration file.
Example 1: In this example, you choose a preconfigured machine cluster
and supply all the required details as command-line flags when
submitting the job. No configuration file is necessary. See the guide
to submitting the job in the next section.
Example 2: The following example shows the contents of the
configuration file for a job with a custom processing cluster. The
configuration file includes some but not all of the configuration
details, assuming that you supply the other required details as
command-line flags when submitting the job.
trainingInput:
  scaleTier: CUSTOM
  masterType: complex_model_m
  workerType: complex_model_m
  parameterServerType: large_model
  workerCount: 9
  parameterServerCount: 3
  runtimeVersion: '1.13'
  pythonVersion: '3.5'
The above example specifies Python version 3.5, which is available
when you use Cloud ML Engine runtime version 1.4 or greater.
Submitting the job
When submitting a training job, you specify two sets of flags:
Job configuration parameters. Cloud ML Engine needs these values to set up resources in the cloud and deploy your application on each
node in the processing cluster.
User arguments, or application parameters. Cloud ML Engine passes the value of these flags through to your application.
Submit a training job using the gcloud ml-engine jobs submit training
command.
First, it's useful to define some environment variables containing
your configuration details. To create a job name, the following code
appends the date and time to the model name:
TRAINER_PACKAGE_PATH="/path/to/your/application/sources"
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="your_name_$now"
MAIN_TRAINER_MODULE="trainer.task"
JOB_DIR="gs://your/chosen/job/output/path"
PACKAGE_STAGING_PATH="gs://your/chosen/staging/path"
REGION="us-east1"
RUNTIME_VERSION="1.13"
The following job submission corresponds to configuration example 1
above, where you choose a preconfigured scale tier (basic) and you
decide to supply all the configuration details via command-line flags.
There is no need for a config.yaml file:
gcloud ml-engine jobs submit training $JOB_NAME \
--scale-tier basic \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--job-dir $JOB_DIR \
--region $REGION \
-- \
--user_first_arg=first_arg_value \
--user_second_arg=second_arg_value
The following job submission corresponds to configuration example 2
above, where some of the configuration is in the file and you supply
the other details via command-line flags:
gcloud ml-engine jobs submit training $JOB_NAME \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--job-dir $JOB_DIR \
--region $REGION \
--config config.yaml \
-- \
--user_first_arg=first_arg_value \
--user_second_arg=second_arg_value
Notes:
If you specify an option both in your configuration file (config.yaml) and as a command-line flag, the value on the command
line overrides the value in the configuration file.
The empty -- flag marks the end of the gcloud specific flags and the start of the USER_ARGS that you want to pass to your application.
Flags specific to Cloud ML Engine, such as --module-name, --runtime-version, and --job-dir, must come before the empty -- flag. The Cloud ML Engine service interprets these flags.
The --job-dir flag, if specified, must come before the empty -- flag, because Cloud ML Engine uses the --job-dir to validate the path.
Your application must handle the --job-dir flag too, if specified. Even though the flag comes before the empty --, the --job-dir is also
passed to your application as a command-line flag.
You can define as many USER_ARGS as you need. Cloud ML Engine passes --user_first_arg, --user_second_arg, and so on, through to your
application.
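To illustrate the last point, the user arguments and --job-dir end up as ordinary command-line flags of your trainer module, which you can parse with argparse. A minimal sketch, using the placeholder argument names from the example above:
import argparse

parser = argparse.ArgumentParser()
# Passed automatically by the training service when you set --job-dir
parser.add_argument('--job-dir', type=str, default='')
# The USER_ARGS after the empty -- are forwarded to your application unchanged
parser.add_argument('--user_first_arg', type=str, default='')
parser.add_argument('--user_second_arg', type=str, default='')
args = parser.parse_args()

# e.g. write checkpoints and the exported model under args.job_dir
print('Job output directory:', args.job_dir)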
I don't think I'm asking this question right, but I have a Jupyter notebook that launches a TensorFlow training job with a Python training script I wrote.
That training script requires certain modules. It seems my SageMaker training job is failing because some of those modules don't exist.
How can I ensure that my training job script has all the modules it needs?
Edit
An example of one of these modules is keras.
The odd thing is, I can import keras in the Jupyter notebook, but when that import statement is in my training script, I get the No module named keras error.
If you want to install multiple packages, one way is to upgrade to Sagemaker Python SDK v2. With this, you can create a requirements.txt in the same directory as your notebook, and run the training. Sagemaker will automatically take care of the installation.
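For illustration, a v2-style estimator might look roughly like the following; the entry point, instance type, versions, role and S3 path shown here are placeholders, and the requirements.txt is expected to live inside source_dir:
from sagemaker.tensorflow import TensorFlow

# Minimal sketch of a SageMaker Python SDK v2 estimator (placeholder values)
estimator = TensorFlow(
    entry_point='train.py',          # your training script
    source_dir='src',                # folder containing train.py and requirements.txt
    role='my-sagemaker-role',        # placeholder IAM role
    instance_count=1,
    instance_type='ml.m5.xlarge',
    framework_version='2.3',
    py_version='py37',
)
estimator.fit('s3://my-bucket/training-data/')  # placeholder S3 path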
If you want to stay on v1 SDK, you can add the following snippet to your entry_point script.
import subprocess
import sys

def install(package):
    subprocess.check_call([sys.executable, "-q", "-m", "pip", "install", package])

install('keras')
The module script runs within a docker container which obviously does not have the dependency installed. Jupyter notebook on the other hand has keras pre-installed.
Easy way to do this is to have a requirements.txt file with all the requirements and then pass that on when creating your model.
env = {
    'SAGEMAKER_REQUIREMENTS': 'requirements.txt',  # path relative to `source_dir` below.
}

sagemaker_model = TensorFlowModel(
    model_data='s3://mybucket/modelTarFile',
    role=role,
    entry_point='entry.py',
    code_location='s3://mybucket/runtime-code/',
    source_dir='src',
    env=env,
    name='model_name',
    sagemaker_session=sagemaker_session,
)
You can upload your requirements.txt file to an S3 bucket that is accessible by SageMaker, download the file to the container's working directory using boto3, and install the libraries from requirements.txt in the entry file.
import os
import boto3

s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', '/opt/ml/code/requirements.txt')
# install the dependencies inside the container
os.system('pip install -r /opt/ml/code/requirements.txt')
The other way you can do it is by building your own container using the "bring your own algorithm" option provided by AWS.
Ref-links:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb
The EstimatorBase class (and TensorFlow class) accept the parameter dependencies which you can use as follows to pass your requirements.txt:
estimator = TensorFlow(
    dependencies=['requirements.txt'],  # copies this file
)
e.g.
estimator = TensorFlow(
    entry_point='src/train.py',
    dependencies=['requirements.txt'],  # copies this file
)
or
estimator = TensorFlow(
    source_dir='src',                   # this copies the entire src folder
    entry_point='train.py',             # when using source_dir has to be directly under that dir
    dependencies=['requirements.txt'],  # copies this file
)
This copies the requirements.txt file into your sourcedir.tar.gz along with the training code.
This may only work on newer image versions. I read that in older versions you may need to put the requirements.txt file in the same folder as your training code.
If this doesn't work, you can use pip download to download your dependencies defined in requirements.txt locally, then use the dependencies parameter to specify the folder to which you downloaded your dependencies.
Another option is in your entry_point .py file you can add
import os

if __name__ == "__main__":
    os.system('pip install mymodule')
    import mymodule
    # rest of code goes here
This worked for me for simple modules such as pyparsing, but I think with keras you better just use a Tensorflow container that has keras preinstalled, as mentioned above.
The environment on your notebook instance is separate from the environment of your training job on SageMaker, unless it is local mode.
If you're using a custom docker image, then most likely your docker image doesn't have Keras installed.
If you are using the SageMaker predefined TensorFlow container, which is most likely invoked through the following code:
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/estimator.py#L170
TensorFlow(entry_point='training_code.py',
blah,
blah
)
Then you will need to install your dependencies within that container. There are currently two modes for training for TensorFlow on SageMaker, "framework" and "script" mode.
If training through "framework" mode, which is only available with 1.12 and below, then you will be limited to using a keras_model_fn defined here:
https://github.com/aws/sagemaker-python-sdk/tree/v1.12.0/src/sagemaker/tensorflow#preparing-the-tensorflow-training-script
Installing your dependencies would be done by passing in a requirements.txt.
On "script mode", which is introduced with TensorFlow 1.11 and above:
https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#training-with-tensorflow
Requirements.txt is not supported for "script" mode and instead it is recommended to install your dependencies within your user script, which would be your Python file that contains all of your Keras code.
Please let me know if there is anything I can clarify.
For examples:
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_script_mode_quickstart
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/tensorflow_iris_dnn_classifier_using_estimators
I am trying to use Google Cloud ML Engine to optimize hyperparameters for my variational autoencoder model, but the job fails because the .tfrecord files I specify for my input are not found. In my model code, I pass train.tfrecords to my input tensor as in the canonical cifar10 example and specify the location of train.tfrecords with the full path.
Relevant information:
JOB_DIR points to the trainer directory
This image shows my directory structure
My setup.py file is below:
from setuptools import find_packages
from setuptools import setup
REQUIRED_PACKAGES = ['tensorflow==1.3.0', 'opencv-python']
setup(
    name='trainer',
    version='0.1',
    install_requires=REQUIRED_PACKAGES,
    packages=find_packages(),
    include_package_data=True,
    description='My trainer application package.'
)
When the job executes, it will not be able to read data from your local machine. The easiest way to make TFRecord files available to your job is to copy them to GCS, pass the location of the GCS files to your program as flags, and use those flags to configure your readers and writers.
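A minimal sketch of that pattern (the flag name, bucket path, and the use of the tf.data API are assumptions, so adapt them to your code and TensorFlow version):
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
# GCS location of the TFRecord file, e.g. after copying it with:
#   gsutil cp train.tfrecords gs://my-bucket/data/train.tfrecords
parser.add_argument('--train-files', default='gs://my-bucket/data/train.tfrecords')
args = parser.parse_args()

# TensorFlow file readers understand gs:// paths directly
dataset = tf.data.TFRecordDataset(args.train_files)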