Access BigQuery data from Jupyter Notebook in AI Platform Google Cloud

I am trying to access data stored in BigQuery from a Jupyter Notebook in AI Platform on Google Cloud Platform.
First, I tried the following code:
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(r'\local_path\gcpcred.json')
project_id = 'my-bq'
client = bigquery.Client(credentials=credentials, project=project_id)
The authentication credentials are stored in a JSON file named gcpcred.json on the local machine, but this gives me an error saying
FileNotFoundError: [Errno 2] No such file or directory:
'\local_path\gcpcred.json'
I thought that since I am running this in AI Platform (on the cloud itself), I would not have to use this API to authenticate.
So I simply wrote:
%%bigquery
SELECT * FROM `project.dataset.table` LIMIT 1000
I got an error saying
ERROR:
403 Access Denied: User does not have access to the table
How do I access the table? Please help

Seems like the service account associated with Jupyter notebooks doesn't have enough privileges to access BigQuery. You can update it in the IAM Service Accounts section with the required privileges.
The links below provide further clarification:
Visualizing BigQuery data in a Jupyter notebook
Getting started with authentication
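For completeness, here is a minimal sketch (an illustration, not part of the original answer) of what the notebook code can look like once the service account has the BigQuery roles: on AI Platform the client picks up the environment's default credentials, so no key file is needed. The table name is the question's own placeholder.
from google.cloud import bigquery

# On AI Platform the notebook runs as its attached service account, so
# application default credentials are found automatically once that
# account has the needed BigQuery roles; no key file is required.
client = bigquery.Client(project='my-bq')
rows = client.query('SELECT * FROM `project.dataset.table` LIMIT 1000').result()
for row in rows:
    print(row)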

Related

Query BigQuery using the local command line

I'm trying to query BigQuery using PowerShell. I've run gcloud init and logged in to my account.
The request was this:
bq query --use_legacy_sql=false 'SELECT customer_id FROM `demo1.customers1`'
Resulting with this error:
BigQuery error in query operation: Error processing job
'PROJECT-ID:bqjob': Access Denied:
BigQuery BigQuery: Permission denied while getting Drive credentials.
This worked when I ran it in Cloud Shell.
I've previously created a service account and a key for the project. I tried to run this command, but it doesn't solve it:
gcloud auth activate-service-account SERVICE_ACCOUNT@DOMAIN.COM --key-file=D:/folder/key.json --project=MYPROJECT_ID
The service account should have the OAuth scope for Drive in order to access Drive; the command below can be used to authenticate with Drive access.
gcloud auth login --enable-gdrive-access
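If you authenticate from Python with the service account key instead of gcloud, a sketch of the same idea (the key-file path here is hypothetical) is to request the Drive scope explicitly alongside BigQuery:
from google.cloud import bigquery
from google.oauth2 import service_account

# Tables backed by Google Sheets need the Drive scope in addition to
# the BigQuery scope; 'key.json' is a hypothetical key-file path.
credentials = service_account.Credentials.from_service_account_file(
    'key.json',
    scopes=[
        'https://www.googleapis.com/auth/bigquery',
        'https://www.googleapis.com/auth/drive',
    ],
)
client = bigquery.Client(credentials=credentials, project='MYPROJECT_ID')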

Airflow Permission denied while getting Drive credentials

I am trying to run a BigQuery query on Airflow with MWAA.
This query uses a table that is based on a Google Sheet. When I run it, I have the following error:
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
I already have a working Google Cloud connection on Airflow with an admin service account.
Also:
This service account has access to the google sheet
I added https://www.googleapis.com/auth/drive in the scopes of the Airflow connection
I re-generated a JSON file
Am I doing something wrong? Any idea what I can do to fix this problem?
Thanks a lot
I fixed my issue by creating a NEW Airflow connection: a new Google Cloud connection with the exact same values as the default google_cloud_default connection. Now it works perfectly.
Hope it can help!
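As a hedged aside (not part of the original answer), the effect the fixed connection needs to achieve can be sketched in plain Python: request the Drive scope alongside the cloud-platform scope when building the credentials, so the Sheets-backed table becomes readable.
import google.auth
from google.cloud import bigquery

# Request the Drive scope along with cloud-platform, mirroring what the
# Airflow connection above is configured to do.
credentials, project = google.auth.default(
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive',
    ]
)
client = bigquery.Client(credentials=credentials, project=project)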

Permission denied error while accessing files under mount point via local file system API in Azure Synapse Notebook

Working from an Azure Synapse Notebook, I have mounted the ADLS Gen2 folder using LinkedServiceA with the command below:
mssparkutils.fs.mount(
    "abfss://<CONTAINER>@<STORAGENAME>.dfs.core.windows.net/",  # ADLS Gen2 path
    "/adlsmount",  # mount point name
    {"linkedService": "<REPLACE LINKED SERVICE NAME>"}
)
I am trying to access the mount path using the local file system API as below.
Folder structure is like container/team/SALES/BILLING/YEAR.
LinkedServiceA, which is used to create the mount, has access only to SALES and its subfolders.
import os
jobId = mssparkutils.env.getJobId()
synfs_bill_path = f'synfs:/{jobId}/adlsmount/team/SALES/BILLING/YEAR' #SYNFS
local_bill_path= f'/synfs/{jobId}/adlsmount/team/SALES/BILLING/YEAR' #Local File System
mssparkutils.fs.ls(synfs_bill_path)  # this is working
bills = os.listdir(local_bill_path)  # this is failing with a permission denied error
But I am able to list all the parent directories via the local file system API path using the os library:
local_base_path = f'/synfs/{jobId}/adlsmount/team/SALES/'
bills = os.listdir(local_base_path)  # this is working
print(bills)  # lists the "BILLING" folder
Error message:
PermissionError: [Errno 13] Permission denied: '/synfs/152/adlsmount/team/SALES/BILLING/YEAR'
The Spark API using synfs_bill_path also works. I want to process the large number of small files in SALES/BILLING/YEAR to reduce the number of files (a Spark read fails with that many files).
I have tried to reproduce your code in my lab environment, and it works fine without any errors for me.
Permission denied [Errno 13] is mostly seen when you try to access a path without the necessary permissions. Please make sure the user has all the necessary permissions.
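If the local file system path keeps failing, one workaround consistent with the question's own observation (that the synfs scheme works) is to enumerate the files through mssparkutils instead of os; a sketch, reusing the question's variable names:
# List the files through the synfs scheme, which honours the linked
# service's permissions, instead of the local /synfs path.
jobId = mssparkutils.env.getJobId()
synfs_bill_path = f'synfs:/{jobId}/adlsmount/team/SALES/BILLING/YEAR'
for f in mssparkutils.fs.ls(synfs_bill_path):
    print(f.name, f.size)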

Access Denied while globbing file pattern in transfer data from Google Cloud Platform to BigQuery

I'm quite new to the BigQuery world, so apologies if I'm asking a stupid question.
I'm trying to create a scheduled data transfer job that imports data into BigQuery from Google Cloud Storage.
Unfortunately I always get the following error message:
Failed to start job for table MyTable with error PERMISSION_DENIED: Access Denied: BigQuery BigQuery: Permission denied while globbing file pattern.
I have verified that I already have all the required permissions, but it still isn't working.

Permissions Error using Jupyter magic command %load_ext google.cloud.bigquery

Apologies for the complexity of this question, and I really appreciate any help. I'm currently trying to follow the Google tutorial to visualize BigQuery data in a Jupyter notebook (https://cloud.google.com/bigquery/docs/visualize-jupyter). I have permission to use Project-1, but not Project-2.
When I execute the first 2 commands:
%load_ext google.cloud.bigquery
%%bigquery
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15
...I get an error in the following format:
Caller does not have required permission to use project Project-2
However, when I run !gcloud config list in the notebook, it lists the following (along with the correct email account):
[accessibility]
screen_reader = true
[core]
disable_usage_reporting = True
project = Project-1
Your active configuration is: [default]
Am I incorrectly understanding how the %load_ext google.cloud.bigquery statement works? Thanks!
Go to the project selector page and select Project-2, then run the gcloud config set project Project-2 command in Cloud Shell. Then, in the APIs & Services -> Credentials section, check whether you have created any credentials that allow you to access your enabled APIs.
You can also execute gcloud auth login to specify the credentials that you want to use. Use the same ones that you use to log in to the Google Cloud Console.
The BigQuery Python client library supports querying data stored in BigQuery. %load_ext is one of the many Jupyter built-in commands; %load_ext google.cloud.bigquery loads the magic commands (such as %%bigquery) from the client library.
Let me know about the results. I hope it helps you.
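One more hedged note, beyond what the answer above covers: the %%bigquery cell magic accepts a --project argument, so you can pin the billing project explicitly in the cell itself rather than relying on the gcloud configuration. For example, with the project you do have access to:
%%bigquery --project Project-1
SELECT
  source_year AS year,
  COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 15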