Vertex AI Pipeline is failing while trying to get data from BigQuery

I am trying to run a Google Vertex AI pipeline that queries a BigQuery table. In the pipeline, I am using the right project and a service account (which has bigquery.jobs.create access). But I see that when it runs, it is accessing another project, e1cd7306fb577e88gq-uq, and I am not able to figure out where this project is coming from. I am running the pipeline from a Vertex AI user-managed notebook.
pandas_gbq.exceptions.GenericGBQException: Reason: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/e1cd7306fb577e88gq-uq/jobs?prettyPrint=false: Access Denied: Project e1cd7306fb577e88gq-uq: User does not have bigquery.jobs.create permission in project e1cd7306fb577e88gq-uq.

The service agent or service account running your code does have the required permission, but your code is trying to access a resource in the wrong project. Due to the way Vertex AI runs your training code, this problem can occur inadvertently if you don't explicitly specify a project ID or project number in your code.
You can explicitly select the project you want this way:
import os
from google.cloud import bigquery

# Vertex AI sets CLOUD_ML_PROJECT_ID in the training environment, so the
# client is pinned to the intended project instead of an inferred default.
project_number = os.environ["CLOUD_ML_PROJECT_ID"]
client = bigquery.Client(project=project_number)
You can read more about training code requirements here.
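Since the traceback in the question comes from pandas_gbq, the same fix applies there: pass the project explicitly instead of letting the library infer a default. A minimal sketch, assuming the same CLOUD_ML_PROJECT_ID environment variable; the query and table names are placeholders:
import os
import pandas_gbq

# Pin the query job to the intended project; without project_id, pandas_gbq
# falls back to whatever default project it can infer from the environment.
project = os.environ["CLOUD_ML_PROJECT_ID"]
df = pandas_gbq.read_gbq(
    "SELECT * FROM `my_dataset.my_table`",
    project_id=project,
)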

Related

dbt: Database Error Insufficient Permission

In my current dbt project, I run everything in the same Google Cloud project (let's say project dataA). Since the number of datasets has grown a lot, I decided to split the project into two: the current project for importing raw data, and a new project (for example dataB) as the production environment where I store all the data marts.
I use a service account to manage reading and editing the data sources in both projects, and I am sure that there are no issues with the rights. The profile settings are quite similar to my current settings, which work fine.
But I am getting Database Error issues from dbt saying that I have insufficient permissions.
Does anyone have an idea about the cause of the issue, and how to fix it?
Many thanks!
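One quick way to test the "no issues on rights" assumption is to confirm that the service account can actually create jobs in both projects, since dbt needs bigquery.jobs.create in whichever project it executes queries. A minimal sketch, with a hypothetical key-file path and the project IDs from the question:
from google.cloud import bigquery
from google.oauth2 import service_account

# Hypothetical key file; replace with the service account key dbt uses.
creds = service_account.Credentials.from_service_account_file("dbt-sa-key.json")

for project in ("dataA", "dataB"):
    client = bigquery.Client(project=project, credentials=creds)
    # Even a trivial query requires bigquery.jobs.create in `project`.
    client.query("SELECT 1").result()
    print(f"{project}: job creation OK")
If dataB fails here, the service account likely has dataset-level access but is missing a job-creation role (for example BigQuery Job User) on that project.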

Error when creating scheduled query on BigQuery: "Error creating scheduled query: er"

I just started a new project on Google Cloud and set up some BigQuery datasets and tables. I now want to set up some scheduled queries. I have already enabled the BigQuery Data Transfer API. My query is valid (it's just SELECT * FROM table). I can't find anything about this error online.
See screenshot
UPDATE: I've experimented a bit and it seems to be an organization-wide issue. All projects, new and old, within my organization get this same error when trying to schedule a query. I tried a project in a different organization and did not have the issue. What could be causing this error for ALL projects in an organization?
UPDATE 2:
By querying a table that is not empty, the error changes to "Error creating scheduled query: Yn" instead of "Error creating scheduled query: er" (which appears when the scheduled query would have queried an empty table).
I faced the same issue as you, and basically I just needed to run the query once before creating the scheduled query... and that did the trick.
From the BQ FAQs:
"Scheduled queries use features of BigQuery Data Transfer Service. Verify that you have completed all actions required in Enabling BigQuery Data Transfer Service."
Basically, this means that you need to enable the Data Transfer API in your project, AND give the user who creates the scheduled query a BigQuery Admin role so they have the right permissions to access the transfer service.
If done right, you should get a popup when creating the scheduled query asking you to confirm that the Data Transfer Service has access to your user account (if you block popups you might not see this message and get stuck).
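If the console keeps returning the truncated "er" message, it can also help to create the scheduled query programmatically with the BigQuery Data Transfer client library, which usually surfaces a full error. A minimal sketch, with placeholder project, dataset, and table names:
from google.cloud import bigquery_datatransfer

client = bigquery_datatransfer.DataTransferServiceClient()
parent = client.common_project_path("my-project")  # placeholder project ID

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="my_dataset",
    display_name="My scheduled query",
    data_source_id="scheduled_query",
    params={
        "query": "SELECT * FROM `my-project.my_dataset.my_table`",
        "destination_table_name_template": "my_results",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every 24 hours",
)

# Raises a descriptive error if the transfer service or permissions are misconfigured.
transfer_config = client.create_transfer_config(
    parent=parent,
    transfer_config=transfer_config,
)
print("Created scheduled query:", transfer_config.name)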
If this error only occurs in your organization, I believe it might be caused by an organization policy on Google Cloud. I would encourage you to double-check whether any org policy is causing this error. If that's not the case, open a support ticket with GCP.
What worked for me was signing in through Incognito Mode with just my account and attempting to save the scheduled query. I have multiple Google accounts signed in at one time, and for whatever reason BigQuery throws this generic error even after authorization is successful and BigQuery is granted the access it requested.
You need to make sure that you are creating the query under the targeted project, not under any other project, because otherwise it won't appear.
You also need to enable the API, as mentioned in one of the answers above.
This eventually worked for me when I ran it in an incognito window.

How to migrate MlFlow experiments from one Databricks workspace to another with registered models?

So, unfortunately, we have to redeploy our Databricks workspace, in which we use the MLflow functionality with experiments and the registering of models.
However, if you export the user folder where the experiment is saved as a DBC and import it into the new workspace, the experiments are not migrated and are simply missing.
So the easiest solution did not work. The next thing I tried was to create a new experiment in the new workspace, copy all the experiment data from the DBFS of the old workspace to the new one (with dbfs cp -r dbfs:/databricks/mlflow source, and then the same again to upload it to the new workspace), and then point the experiment at the location of the data, as in the following picture:
This is also not working; no run is visible, although the path already exists.
The next idea was that the registered models are the most important, so at least those should be there and accessible. For that I used the documentation here: https://www.mlflow.org/docs/latest/model-registry.html.
With the following code you get a list of the registered models on the old workspace, with references to the run_id and location.
from pprint import pprint
from mlflow.tracking import MlflowClient

client = MlflowClient()
for rm in client.list_registered_models():
    pprint(dict(rm), indent=4)
And with this code you can add models to a model registry with a reference to the location of the artifact data (on the new workspace):
# first the general model must be defined
client.create_registered_model(name='MyModel')
# then the run of the model you want to register is added to the model as version one
client.create_model_version(
    name='MyModel',
    run_id='9fde022012046af935fe52435840cf1',
    source='dbfs:/databricks/mlflow/experiment_id/run_id/artifacts/model',
)
But that also did not work out. If you go into the Model Registry, you get an error message (see screenshot).
And I really checked: at the given path (the source) the data really is uploaded, and a model exists there.
Do you have any new ideas to migrate those models in Databricks?
There is no official way to migrate experiments from one workspace to another. However, leveraging the MLflow API, there is an "unofficial" tool that can migrate experiments minus the notebook revision associated with a run.
mlflow-tools
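Under the hood, such a migration amounts to reading the registered models from one workspace and re-creating them in the other. A rough sketch of the idea, assuming hypothetical Databricks CLI profile names and that only the latest versions matter; note that it runs into the same run-linkage problem described in the question, which is what the tool above handles:
from mlflow.tracking import MlflowClient

# Hypothetical profile names configured in ~/.databrickscfg.
old = MlflowClient(tracking_uri="databricks://old-workspace")
new = MlflowClient(tracking_uri="databricks://new-workspace")

for rm in old.list_registered_models():
    new.create_registered_model(rm.name)
    for mv in old.get_latest_versions(rm.name):
        # The source path must already exist in the new workspace's DBFS, and
        # run_id must reference a run that exists there too; otherwise the
        # Model Registry shows the broken-link error from the question.
        new.create_model_version(name=rm.name, source=mv.source, run_id=mv.run_id)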
As an addition to @Andre's answer,
you can also check mlflow-export-import from the same developer
mlflow-export-import

Google BigQuery, unable to load data into shared datasets

I created a project on Google BigQuery and enabled billing.
Went on to create a few datasets that were shared with my team members (Can EDIT permissions).
However, my teammates are unable to load data into the respective datasets shared with them. Whenever they try, it says billing is not enabled for this project.
I am able to load data into the datasets, but my team is not.
It's been more than 24 hours.
Thanks in advance
Note that in order to load data, they need to run a load job, and that load job needs to be run in a project. Perhaps billing is not enabled on the project they are using?
You can give your team members read access to the project (or greater) to allow them to run jobs in your own billing-enabled project.
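In other words, the load job should be run in the billing-enabled project even when the destination table lives in a shared dataset. A minimal sketch using the Python client library, with placeholder project, bucket, and table names:
from google.cloud import bigquery

# The client's project is where the load JOB runs (and where billing applies);
# the destination table can live in a different, shared project.
client = bigquery.Client(project="billing-enabled-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data.csv",
    "shared-project.shared_dataset.my_table",
    job_config=job_config,
)
load_job.result()  # waits for completion; raises if the job failed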
You can share BigQuery resources at the project level and at the dataset level.
See https://developers.google.com/bigquery/access-control.
I assume you are sharing at the dataset level. Can you try sharing the project instead with your team members? (here: https://cloud.google.com/console/project)
Please report back!

BigQuery authorization

I am trying to use the bq command-line tool to load data into BigQuery from a GCS bucket, and I receive the following error message:
BigQuery error in load operation: Access Denied: Job mythical-maxim-293:bqjob_r11765e0cd9ceb52b_000001427694f0e1_1: RUN_JOB
I was using a service account (with a private key) for authentication. I followed these links for granting the service account access:
https://developers.google.com/bigquery/loading-data-into-bigquery
https://developers.google.com/bigquery/access-control
The service account email was granted WRITE access on the BigQuery dataset and READ access on the GCS bucket.
Note: Adding the service account email as a writer on the project solved the issue, but this is not feasible in my case. I am not allowed to request project-level write access, only BigQuery (dataset) and GCS (read-only) access.
Thanks!
In order to run the job, the service account must be given at least READ permissions on the project. This is because whoever runs a job in the project can do things that cost the project owner money (e.g. run queries).
To add the service account to the project, go to https://cloud.google.com/console, then click on "Permissions", then "Add member".
You must provide the WRITE permission on the dataset.
https://cloud.google.com/bigquery/loading-data-into-bigquery#access
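For reference, with the current Python client library, granting a service account WRITER on a dataset looks roughly like this; the project, dataset, and service-account names are placeholders:
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
dataset = client.get_dataset("my-project.my_dataset")

# Append a WRITER entry for the service account to the dataset's access list.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="WRITER",
        entity_type="userByEmail",
        entity_id="my-sa@my-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])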
This is bad: WRITE permission implies that you have READ permission, and in BigQuery reading (querying) is paid while loading is free. Access to a paid service should not be necessary for doing a free task.
Google should correct this.