Why don't I have the file saved_model.pbtxt or saved_model.pb in my bucket? [Cloud Storage, BigQuery ML, AI Platform] - tensorflow

I am currently following this tutorial: https://cloud.google.com/architecture/predicting-customer-propensity-to-buy
to create a predictive model of customer behavior in BigQuery and BigQuery ML. I then need to be able to extract my model to Cloud Storage and send it to AI Platform to make predictions online.
My problem is this: the "gcloud ai-platform versions create" step (STEP 5) does not work.
While investigating, I noticed that after the extract step, the file that Cloud Shell is asking for is missing from my bucket.
Here is the error from my shell. The file in question is saved_model.pb.
name#cloudshell:~ (name_account_analytics)$ gcloud ai-platform versions create --model=model_test V_1 --region=us-central1 --framework=tensorflow --python-version=3.7 --runtime-version=1.15 --origin=gs://bucket_model_test/V_1/ --staging-bucket=gs://bucket_model_test
Using endpoint [https://us-central1-ml.googleapis.com/]
ERROR: (gcloud.ai-platform.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: Deployment directory gs://bucket_model_test/V_1/ is expected to contain exactly one of: [saved_model.pb, saved_model.pbtxt].
- '#type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: 'Deployment directory gs://bucket_model_test/V_1/ is expected to
contain exactly one of: [saved_model.pb, saved_model.pbtxt].'
field: version.deployment_uri
name#cloudshell:~ (name_account_analytics)$ gcloud ai-platform versions create --model=model_test V_1 --region=us-central1 --framework=tensorflow --python-version=3.7 --runtime-version=1.15 --origin=gs://bucket_model_test/V_1/model.bst --staging-bucket=gs://bucket_model_test
Using endpoint [https://us-central1-ml.googleapis.com/]
ERROR: (gcloud.ai-platform.versions.create) FAILED_PRECONDITION: Field: version.deployment_uri Error: The provided URI for model files doesn't contain any objects.
- '#type': type.googleapis.com/google.rpc.BadRequest
fieldViolations:
- description: The provided URI for model files doesn't contain any objects.
field: version.deployment_uri
How do I tell it to create it? Why doesn't it do it automatically?
Thanks for your help!
Welodya

Related

Azure ML: error "The SSL connection could not be established, see inner exception." while creating Tabular Dataset from Azure Blob Storage file

I have a new error using Azure ML, possibly due to the Ubuntu 22.04 upgrade I did yesterday.
I have an Azure ML workspace created through the portal, and I can access it without any issue with the Python SDK:
from azureml.core import Workspace
ws = Workspace.from_config("config/config.json")
ws.get_details()
output
{'id': '/subscriptions/XXXXX/resourceGroups/gr_louis/providers/Microsoft.MachineLearningServices/workspaces/azml_lk',
'name': 'azml_lk',
'identity': {'principal_id': 'XXXXX',
'tenant_id': 'XXXXX',
'type': 'SystemAssigned'},
'location': 'westeurope',
'type': 'Microsoft.MachineLearningServices/workspaces',
'tags': {},
'sku': 'Basic',
'workspaceid': 'XXXXX',
'sdkTelemetryAppInsightsKey': 'XXXXX',
'description': '',
'friendlyName': 'azml_lk',
'keyVault': '/subscriptions/XXXXX/resourceGroups/gr_louis/providers/Microsoft.Keyvault/vaults/azmllkXXXXX',
'applicationInsights': '/subscriptions/XXXXX/resourceGroups/gr_louis/providers/Microsoft.insights/components/azmllkXXXXX',
'storageAccount': '/subscriptions/XXXXX/resourceGroups/gr_louis/providers/Microsoft.Storage/storageAccounts/azmllkXXXXX',
'hbiWorkspace': False,
'provisioningState': 'Succeeded',
'discoveryUrl': 'https://westeurope.api.azureml.ms/discovery',
'notebookInfo': {'fqdn': 'ml-azmllk-westeurope-XXXXX.westeurope.notebooks.azure.net',
'resource_id': 'XXXXX'},
'v1LegacyMode': False}
I then use this workspace ws to upload a file (or a directory) to Azure Blob Storage like so
from azureml.core import Dataset
ds = ws.get_default_datastore()
Dataset.File.upload_directory(
    src_dir="./data",
    target=ds,
    pattern="*dataset1.csv",
    overwrite=True,
    show_progress=True
)
which again works fine and outputs
Validating arguments.
Arguments validated.
Uploading file to /
Filtering files with pattern matching *dataset1.csv
Uploading an estimated of 1 files
Uploading ./data/dataset1.csv
Uploaded ./data/dataset1.csv, 1 files out of an estimated total of 1
Uploaded 1 files
Creating new dataset
{
"source": [
"('workspaceblobstore', '//')"
],
"definition": [
"GetDatastoreFiles"
]
}
My file is indeed uploaded to Blob Storage and I can see it either in the Azure portal or in Azure ML studio (ml.azure.com).
The error comes up when I try to create a tabular dataset from the uploaded file. The following code doesn't work:
from azureml.core import Dataset
data1 = Dataset.Tabular.from_delimited_files(
    path=[(ds, "dataset1.csv")]
)
and it gives me the error:
ExecutionError:
Error Code: ScriptExecution.DatastoreResolution.Unexpected
Failed Step: XXXXXX
Error Message: ScriptExecutionException was caused by DatastoreResolutionException.
DatastoreResolutionException was caused by UnexpectedException.
Unexpected failure making request to fetching info for Datastore 'workspaceblobstore' in subscription: 'XXXXXX', resource group: 'gr_louis', workspace: 'azml_lk'. Using base service url: https://westeurope.experiments.azureml.net. HResult: 0x80131501.
The SSL connection could not be established, see inner exception.
| session_id=XXXXXX
After some research, I assumed it might be due to the OpenSSL version (which is now 1.1.1), but I am not sure and I certainly don't know how to fix it... any ideas?
According to the documentation there is no direct procedure to convert a file dataset into a tabular dataset. Instead, we can create a workspace, which provisions two storage methods (blob storage, which is the default, and file storage). SSL is then handled by the workspace.
We can create a datastore in the workspace and connect it to the blob storage.
Follow this procedure:
Create a workspace.
If we want, we can create a dataset.
We can create it from local files of the datastore.
To choose a datastore, we first need to have a file in it.
Go to Datastores and click on Create dataset. Observe that the name is workspaceblobstorage (default).
Fill in the details and make sure the dataset type is Tabular.
In the path we will have the local file path, and under "Select or create a datastore" we can check that the default storage is blob.
After uploading, we can see the name in this section, which is a datastore and tabular dataset.
In the workspace you created, check whether public access is Disabled or Enabled. If it is disabled, access is not allowed and the SSL connection cannot be established. After enabling it, use the same procedure as above.
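Once the workspace side is fixed, a minimal SDK sketch of the same flow might look like the following (the datastore and file names come from the question; the registered dataset name is a placeholder):
from azureml.core import Workspace, Dataset, Datastore

# Load the workspace from the saved config, as in the question
ws = Workspace.from_config("config/config.json")

# Fetch the default blob datastore explicitly by name
ds = Datastore.get(ws, "workspaceblobstore")

# Build a tabular dataset from the uploaded CSV and register it
# so that it shows up in ML Studio under Datasets
data1 = Dataset.Tabular.from_delimited_files(path=[(ds, "dataset1.csv")])
data1 = data1.register(workspace=ws, name="dataset1", create_new_version=True)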

DBT - [WARNING]: Did not find matching node for patch

I keep getting the error below when I use dbt run - I can't find anything on why this error occurs or how to fix it within the dbt documentation.
[WARNING]: Did not find matching node for patch with name 'vGenericView' in the 'models' section of file 'models\generic_schema\schema.sql'
Did you by chance recently upgrade to dbt 1.0.0? If so, this means that you have a model, vGenericView, defined in a schema.yml, but you don't have a vGenericView.sql model file to which it corresponds.
If all views and tables defined in the schema are 1-to-1 with model files, then try running dbt clean, followed by dbt test or dbt run.
Not sure what happened to my project, but I got frustrated looking for missing and/or misspelled files when it was just leftovers from previously compiled files that had not been cleaned out. I had previously moved views around to different schemas and renamed others.
So the mistake is in the naming:
The model name in the models.yml file should, for example, be: employees
And the SQL file should be named: employees.sql
So your models.yml will look like:
version: 2
models:
  - name: employees
    description: "View of employees"
And there must be a model with the file name: employees.sql
One case where this will happen is if you have the same data source defined in two different schema.yml files (or whatever you call them).

how to concatenate the OutputPathPlaceholder with a string with Kubeflow pipelines?

I am using Kubeflow pipelines (KFP) with GCP Vertex AI pipelines. I am using kfp==1.8.5 (kfp SDK) and google-cloud-pipeline-components==0.1.7. Not sure if I can find which version of Kubeflow is used on GCP.
I am building a component (YAML) using Python, inspired by this GitHub issue. I am defining an output like:
outputs=[(OutputSpec(name='drt_model', type='Model'))]
This will be a base output directory used to store a few artifacts on Cloud Storage, like model checkpoints and the model itself.
I would like to keep one base output directory but add subdirectories depending on the artifact:
<output_dir_base>/model
<output_dir_base>/checkpoints
<output_dir_base>/tensorboard
but I didn't find how to concatenate the OutputPathPlaceholder('drt_model') with a string like '/model'.
How can I append extra folder structure like /model or /tensorboard to the OutputPathPlaceholder that KFP will set at run time?
I didn't realize at first that ConcatPlaceholder accepts both artifacts and strings. This is exactly what I wanted to achieve:
ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/model'])
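For context, here is a rough sketch of how this could sit inside a component spec built with the kfp v1 structures API; the component name, image, and entrypoint below are placeholders, not taken from the original issue:
from kfp.components.structures import (
    ComponentSpec, OutputSpec,
    ContainerImplementation, ContainerSpec,
    OutputPathPlaceholder, ConcatPlaceholder,
)

component_spec = ComponentSpec(
    name='train',  # placeholder component name
    outputs=[OutputSpec(name='drt_model', type='Model')],
    implementation=ContainerImplementation(
        container=ContainerSpec(
            image='gcr.io/my-project/trainer:latest',  # placeholder image
            command=['python', '-m', 'trainer.task'],  # placeholder entrypoint
            args=[
                '--model-dir',
                # base output path with a '/model' subdirectory appended
                ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/model']),
                '--tensorboard-dir',
                ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/tensorboard']),
            ],
        )
    ),
)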

How do you specify Project ID in the AWS Glue to BigQuery connector?

I'm trying to use the AWS Glue connector to BigQuery following the tutorial in https://aws.amazon.com/blogs/big-data/migrating-data-from-google-bigquery-to-amazon-s3-using-aws-glue-custom-connectors/ but after following all the steps I get:
: java.lang.IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment. Please set a project ID using the builder.
The Python exception shows that:
Traceback (most recent call last):
File "/tmp/ETHBlockchainExport.py", line 20, in <module>
DataSource0 = glueContext.create_dynamic_frame.from_options(connection_type = "marketplace.spark", connection_options =
{
"parentProject": "MYGOOGLE_PROJECT_ID",
"connectionName": "BigQuery",
"table": "MYPROJECT.DATASET.TABLE"
}
So everything seems to be provided, but it still complains about the project ID. How can I provide that info to the connector?
You specify it in the data source connector options when you create the Glue job, as a key-value pair.
From your log it seems that you included the project ID in the table field as well; it should be dataset.table.
Another possibility is that you didn't specify the project ID, table, and other values as environment variables (this seems more likely based on the error shown).
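As an illustration only, a sketch of the call with the project kept in parentProject and the table referenced without a project prefix could look like this (the option keys mirror the question's own snippet; whether your connector version expects additional keys is an assumption to verify against the connector docs):
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Keep the project in parentProject and reference the table as dataset.table only
DataSource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="marketplace.spark",
    connection_options={
        "parentProject": "MYGOOGLE_PROJECT_ID",
        "connectionName": "BigQuery",
        "table": "DATASET.TABLE",  # no project prefix here
    },
)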

Federated table/query not working - "Cannot read in location: us-west1"

I have a GCS bucket in US-WEST1:
That bucket has two files:
wiki_1b_000000000000.csv.gz
wiki_1b_000000000001.csv.gz
I've created an external table definition to read those files.
The dataset where this external table definition exists is also in the US.
When I query it with:
SELECT *
FROM `grey-sort-challenge.bigtable.federated`
LIMIT 100
...I get the following error:
Error: Cannot read in location: us-west1
I tested with asia-northeast1 and it works fine.
Why isn't this working for the US region?
I faced the same issue earlier. See Google's answer - you must use us-central1 for now: https://issuetracker.google.com/issues/76127552#comment11
For people from Europe:
If you get the error Cannot read in location: EU while trying to read from an external source (a regional GCS bucket), you have to place your data in the region europe-west1, as per the same comment. Unfortunately this is not yet reflected in the documentation.
I wanted to create a federated (external) table to continually load data from a new CSV file imported each day.
While attempting to do so I was getting "Error: Cannot read in location: xxxx".
I solved the problem as follows:
I recreated a NEW bucket, this time selecting the US (multiple regions) location.
I then went back to BigQuery and created a NEW dataset with the data location set to United States (US).
Presto! I am now able to query a (constantly updating) external table!
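For anyone who prefers to script this instead of using the UI, here is a rough sketch of the same fix with the google-cloud-bigquery client; the new bucket name and dataset ID are placeholders, and the CSV autodetect setting is an assumption about the files in the question:
from google.cloud import bigquery

client = bigquery.Client()

# Create the dataset in the US multi-region so it can read the US multi-region bucket
dataset = bigquery.Dataset("grey-sort-challenge.bigtable_us")  # placeholder dataset ID
dataset.location = "US"
dataset = client.create_dataset(dataset, exists_ok=True)

# Define the external (federated) table over the gzipped CSV files
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-us-multiregion-bucket/wiki_1b_*.csv.gz"]  # placeholder bucket
external_config.autodetect = True  # assumption: let BigQuery infer the schema

table = bigquery.Table("grey-sort-challenge.bigtable_us.federated")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)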