Access Denied while globbing file pattern when transferring data from Google Cloud Platform to BigQuery - google-bigquery

I'm quite new to the BigQuery world, so apologies if I'm asking a stupid question.
I'm trying to create a scheduled transfer job that imports data into BigQuery from Google Cloud Storage.
Unfortunately I always get the following error message:
Failed to start job for table MyTable with error PERMISSION_DENIED: Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.
I have already verified that I have all the required permissions, but it still isn't working.
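As a sanity check, a minimal sketch along these lines (assuming the google-cloud-storage package; the bucket name and pattern are placeholders) confirms whether the credentials in use can actually list the objects the transfer's glob would match; a 403 or an empty result here points at the storage-side permissions rather than at BigQuery itself.

import fnmatch
from google.cloud import storage

# Placeholder names; run this with the same identity the transfer uses.
client = storage.Client(project="my-project")
blobs = client.list_blobs("my-bucket", prefix="exports/")
matches = [b.name for b in blobs if fnmatch.fnmatch(b.name, "exports/*.csv")]
print(matches)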

Related

Content of directory on path 'https://xxxxxxx.dfs.core.windows.net/dataverse-xxxx-org5a2/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed

When I try to query our Serverless SQL pool in Azure Synapse Analytics I get the following error:
"Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/account/Snapshot/2018-08_1656570292/*.csv' cannot be listed.".
I have checked the following link for clues as to what the cause could be:
https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?tabs=x80070002
It suggests that the error is due to permissions.
However, I believe I have the correct permissions.
I get this error whether I try to execute the query in SSMS or Synapse Workspace.
The error in SSMS is as follows:
Warning: Unable to resolve path https://xxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv. Error number 13807, Level 16, State 1, Message "Content of directory on path 'https://xxxxxx.dfs.core.windows.net/dataverse-xxxxx-org5a2bcccf/account/Snapshot/2018-10_1657304551/*.csv' cannot be listed.".
Can someone let me know how to resolve this?
The query that I'm attempting to execute can be located here:
https://github.com/slavatrofimov/Synapse-Link-for-Dataverse-data-enrichment-in-Serverless-SQL-Pools/blob/main/SQL/Enrich%20Synapse%20Link%20for%20Dataverse%20Entities%20with%20Human-Readable%20Labels.sql
Is there a definitive way to determine if the problem is due to lack of permissions?
Update:
I have just realised that the issue is accessing the lake at https://xxxxxx.dfs.core.windows.net/dataverse-xxxxxx-org5a2bcccf/
Therefore, please take a look at my permissions on the lake and let me know whether they are sufficient.
This issue occurs when the user trying to query the external table does not have the relevant permissions, or when there is a firewall enabled on your storage network.
Looking at the permissions you have provided, I see that Storage Blob Data Reader and Storage Blob Data Contributor have been granted.
Ref doc: Control storage account access for serverless SQL pool in Azure Synapse Analytics
If your storage account is protected by a firewall, you will have to follow the steps described in this document to overcome the issue: Access storage that is protected with the firewall
Here are a couple of relevant articles that might help you configure your storage firewall:
Storage configuration for external table is not accessible while query on Serverless
Synapse Studio error while trying to read data from Storage Account using SQL On Demand
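On the question of whether there is a definitive way to rule permissions in or out: a minimal sketch like the one below (assuming the azure-identity and azure-storage-file-datalake packages; the account and container names are placeholders modelled on the error message) tries to list the same folder with the same identity, outside of Synapse. If this also fails, the problem is on the storage side (permissions or firewall) rather than in the SQL.

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account and container names, taken from the error message.
service = DataLakeServiceClient(
    account_url="https://xxxxxx.dfs.core.windows.net",
    credential=DefaultAzureCredential())
fs = service.get_file_system_client("dataverse-xxxxxx-org5a2bcccf")
for path in fs.get_paths(path="account/Snapshot"):
    print(path.name)  # an authorization error here confirms a storage-side issue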

Airflow Permission denied while getting Drive credentials

I am trying to run a BigQuery query on Airflow with MWAA.
This query uses a table that is based on a Google Sheet. When I run it, I get the following error:
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
I already have a working Google Cloud connection in Airflow with an admin service account.
Also:
This service account has access to the Google Sheet
I added https://www.googleapis.com/auth/drive in the scopes of the Airflow connection
I re-generated a JSON file
Am I doing something wrong? Any idea what I can do to fix this problem?
Thanks a lot
I fixed my issue by creating a NEW Airflow connection. It's a new Google Cloud connection with the exact same values as the default google_cloud_default connection. Now it works perfectly.
Hope it can help!
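If recreating the connection is not an option, a quick way to confirm that the connection itself resolves credentials correctly is to build a client through the provider's hook and run a trivial query. This is only a sketch, assuming a recent apache-airflow-providers-google package; the connection id is a placeholder.

from airflow.providers.google.cloud.hooks.bigquery import BigQueryHook

# Placeholder connection id; use the one your DAG actually references.
hook = BigQueryHook(gcp_conn_id="google_cloud_default")
client = hook.get_client()
print(list(client.query("SELECT 1").result()))  # a 403 here points at the connection, not the DAG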

Access BigQuery data from Jupyter Notebook in AI Platform Google Cloud

I am trying to get access to the data stored in BigQuery from a Jupyter Notebook in AI Platform on Google Cloud Platform.
First, I tried the following code:
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(r'\local_path\gcpcred.json')
project_id = 'my-bq'
client = bigquery.Client(credentials=credentials, project=project_id)
The authentication credentials are stored in a json file named gcpcred on the local machine but
this gives me an error saying
FileNotFoundError: [Errno 2] No such file or directory:
'\local_path\gcpcred.json
I thought that since I am running this in AI Platform (on the cloud itself), I would not have to use this API and authenticate.
So I simply wrote:
%%bigquery
SELECT * FROM `project.dataset.table` LIMIT 1000
I got an error saying
ERROR:
403 Access Denied: User does not have access to the table
How do I access the table? Please help
It seems like the service account associated with the Jupyter notebook doesn't have enough privileges to access BigQuery. You can update it in the IAM Service Accounts section with the required privileges.
The links below provide further clarification:
Visualizing BigQuery data in a Jupyter notebook
Getting started with authentication
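Before assigning roles, it can help to confirm which service account the notebook is actually running as. A minimal sketch using google.auth's default credential lookup:

import google.auth

# Inside the AI Platform notebook, this resolves the VM's default identity.
credentials, project = google.auth.default()
print(project)
# Service-account credentials expose their email; this is the member that
# needs the BigQuery roles in IAM.
print(getattr(credentials, "service_account_email", credentials))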

Access Denied: Permission denied while getting Drive credentials

Since today, our Airflow service has not been able to run queries in BigQuery. All jobs fail with the following message:
[2021-03-12 10:17:28,079] {taskinstance.py:1150} ERROR - Reason: 403 GET https://bigquery.googleapis.com/bigquery/v2/projects/waipu-app-prod/queries/e62030d7-36eb-4420-b482-b5327f4f6c7e?maxResults=0&timeoutMs=900&location=EU: Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
We haven't changed anything in recent days, so we are quite puzzled as to what the reason might be. Is there a temporary bug, or do we need to check some settings?
Thanks & Best regards
Albrecht
I solved this by:
Giving the Airflow service account's email access to the Google Sheet that the BigQuery table is derived from
Adding https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/bigquery,https://www.googleapis.com/auth/drive to the scopes in the Airflow connection
Regenerating the service account JSON keyfile and pasting it into the Keyfile JSON field of the Airflow connection
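For reference, the same three ingredients can be verified outside Airflow with a short sketch (assuming the google-cloud-bigquery package; the keyfile path and table name are placeholders):

from google.cloud import bigquery
from google.oauth2 import service_account

SCOPES = [
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/bigquery",
    "https://www.googleapis.com/auth/drive",
]
# Placeholder keyfile path and table name.
creds = service_account.Credentials.from_service_account_file(
    "key.json", scopes=SCOPES)
client = bigquery.Client(credentials=creds, project="my-project")
rows = client.query(
    "SELECT * FROM `my-project.my_dataset.sheet_backed_table` LIMIT 5").result()
for row in rows:
    print(dict(row))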

Google ML Engine - Unable to log objective metric due to exception <HttpError 403>

I am running a TensorFlow application on the Google ML Engine with hyper-parameter tuning and I've been running into some strange authentication issues.
My Data and Permissions Setup
My trainer code supports two ways of obtaining input data for my model:
Getting a table from BigQuery.
Reading from a .csv file.
For my IAM permissions, I have two members set up:
My user account:
Assigned to the following IAM roles:
Project Owner (roles/owner)
BigQuery Admin (roles/bigquery.admin)
Credentials were created automatically when I used gcloud auth application-default login
A service account:
Assigned to the following IAM roles:
BigQuery Admin (roles/bigquery.admin)
Storage Admin (roles/storage.admin)
PubSub Admin (roles/pubsub.admin)
Credentials were downloaded to a .json file when I created it in the Google Cloud Platform interface.
The Problem
When I run my trainer code on the Google ML Engine using my user account credentials and reading from a .csv file, everything works fine.
However, if I try to get my data from BigQuery, I get the following error:
Forbidden: 403 Insufficient Permission (GET https://www.googleapis.com/bigquery/v2/projects/MY-PROJECT-ID/datasets/MY-DATASET-ID/tables/MY-TABLE-NAME)
This is the reason why I created a service account, but the service account has a separate set of issues. When using the service account, I am able to read from both a .csv file and from BigQuery, but in both cases, I get the following error at the end of each trial:
Unable to log objective metric due to exception <HttpError 403 when requesting https://pubsub.googleapis.com/v1/projects/MY-PROJECT-ID/topics/ml_MY-JOB-ID:publish?alt=json returned "User not authorized to perform this action.">.
This doesn't cause the job to fail, but it prevents the objective metric from being recorded, so the hyper-parameter tuning does not provide any helpful output.
The Question
I'm not sure why I'm getting these permission errors when my IAM members are assigned to what I'm pretty sure are the correct roles.
My trainer code works in every case when I run it locally (although PubSub is obviously not being used when running locally), so I'm fairly certain it's not a bug in the code.
Any suggestions?
Notes
There was one point at which my service account was getting the same error as my user account when trying to access BigQuery. The solution I stumbled upon is a strange one. I decided to remove all roles from my service account and add them again, and this fixed the BigQuery permission issue for that member.
Thanks for the very detailed question.
To explain what happened here: in the first case, Cloud ML Engine used an internal service account (the one that is added to your project with the Cloud ML Service Agent role). Due to some internal security considerations, that service account is restricted from accessing BigQuery, hence the first 403 error that you saw.
When you replaced the machine credentials with your own service account via the .json credentials file, that restriction went away. However, your service account didn't have full access to the internal systems, such as the Pub/Sub service used internally by the hyperparameter tuning mechanism. Hence the Pub/Sub error in the second case.
There are a few possible solutions to this problem:
On the Cloud ML Engine side, we're working on better out-of-the-box BigQuery support, although we don't have an ETA at this point.
Your approach with a custom service account might work as a short-term solution as long as you don't use hyperparameter tuning. However, this is fragile because it depends on implementation details in Cloud ML Engine, so I wouldn't recommend relying on it long-term.
Finally, consider exporting data from BigQuery to GCS first and using GCS to read training data; a sketch of this follows below. This scenario is well supported in Cloud ML Engine. Besides, you'll get performance gains on large datasets compared to reading from BigQuery directly: the current implementation of BigQueryReader in TensorFlow has suboptimal performance characteristics, which we're also working to improve.
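A minimal sketch of that export step (assuming the google-cloud-bigquery package; project, dataset, and bucket names are placeholders):

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
# Dump the table to CSV shards that the trainer can then read from GCS,
# e.g. with tf.data.TextLineDataset over the resulting gs:// paths.
extract_job = client.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-training-bucket/exports/data-*.csv")
extract_job.result()  # blocks until the export completes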