BigQuery remote function: access denied on Cloud Functions gen2

I haven't been able to reach a high number of instances with BigQuery remote functions and a 1st gen cloud function (link). So I've deployed a 2nd gen cloud function with the same code/config, but I get an Access denied error from the BigQuery web interface.
The connection does have the invoke permission: if I configure the connection to call a 1st gen cloud function, I don't get an access denied error. This is illustrated below, where the 1st gen call works while the 2nd gen call does not, even though both use the same connection.
CREATE OR REPLACE FUNCTION `project_name`.trash.add_fake_first_gen(user_id int64, corp_id STRING) RETURNS STRING REMOTE
WITH CONNECTION `project_name.eu.gcf-con` OPTIONS (endpoint = 'first_gen_url', max_batching_rows=1);
SELECT `project_name.trash.add_fake_first_gen`(1, "B");
CREATE OR REPLACE FUNCTION `project_name`.trash.add_fake_second_gen(user_id int64, corp_id STRING) RETURNS STRING REMOTE
WITH CONNECTION `project_name.eu.gcf-con` OPTIONS (endpoint = 'second_gen_url', max_batching_rows=1);
SELECT `project_name.trash.add_fake_second_gen`(1, "B");
Both cloud functions share the same networking configuration and service account:
Configuration of the first gen cloud function (working):
Configuration of the second gen cloud function (access denied):
Do 2nd gen functions need additional configuration to work with remote functions?

As suggested by @guillaumeblaquiere, the service account associated with the gen2 cloud function should also have the Cloud Run Invoker role.
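A minimal Python sketch that shells out to gcloud to grant that role; the service name, region and service account e-mail are placeholders for your own values:
import shutil
import subprocess

# Placeholders: the Cloud Run service backing the gen2 function, its region,
# and the service account that needs permission to invoke it.
SERVICE = "add-fake-second-gen"
REGION = "europe-west1"
SERVICE_ACCOUNT = "connection-sa@project_name.iam.gserviceaccount.com"

subprocess.run(
    [
        shutil.which("gcloud"),
        "run", "services", "add-iam-policy-binding", SERVICE,
        "--region", REGION,
        "--member", f"serviceAccount:{SERVICE_ACCOUNT}",
        "--role", "roles/run.invoker",
    ],
    check=True,
)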

Related

Using a service account and a JSON key sent to you to upload data into Google Cloud Storage

I wrote a Python script that uploads files from a local folder to Google Cloud Storage.
I also created a service account with sufficient permissions and tested it on my computer using that service account's JSON key, and it worked.
Now I've sent the code and JSON key to someone else to run, but the authentication fails on her side.
Are we missing any authentication step in the GCP UI?
import shutil
import subprocess

from google.cloud import storage

# CREDENTIALS_LOCATION (the path to the JSON key) is defined elsewhere in the script.

def config_gcloud():
    subprocess.run(
        [
            shutil.which("gcloud"),
            "auth",
            "activate-service-account",
            "--key-file",
            CREDENTIALS_LOCATION,
        ]
    )
    storage_client = storage.Client.from_service_account_json(CREDENTIALS_LOCATION)
    return storage_client

def file_upload(bucket, source, destination):
    storage_client = config_gcloud()
    ...
The error happens in config_gcloud and says it is expecting str, path, ... but gets NoneType.
As I said, the code is fine and works on my computer. How can another person use it with the JSON key I sent her? She stored the JSON locally, and the path to the JSON is in the code.
CREDENTIALS_LOCATION is None instead of the correct path, hence the complaint about it being NoneType instead of str | Path.
Also, you don't need that gcloud call; it only matters for gcloud/gsutil commands, not the Python client library.
And please post the actual stack trace of the error next time, not just a rough paraphrase of it.
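One way to make this failure obvious on the other person's machine is to resolve the key path explicitly and fail fast; a minimal sketch, assuming the path is supplied through an environment variable (the variable name is arbitrary):
import os

# Hypothetical: take the key path from an environment variable instead of
# hard-coding it, and stop with a clear message if it isn't set correctly.
CREDENTIALS_LOCATION = os.environ.get("GCS_KEY_PATH")
if not CREDENTIALS_LOCATION or not os.path.isfile(CREDENTIALS_LOCATION):
    raise RuntimeError(
        f"GCS_KEY_PATH must point to the service account JSON key, got: {CREDENTIALS_LOCATION!r}"
    )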

Why do BigQuery remote functions not activate more than 60 cloud function instances?

I've started to work with remote functions:
https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions
I've been able to set up a cloud function and call it from BigQuery, but no more than 60 instances of this cloud function are active at the same time, while the maximum is set to 3000.
This small number of instances does not seem to be affected by changing max_batching_rows or the number of rows on which the function is called.
Configuration of the cloud function:
Graph showing the small number of instances active:
Variations over time are due to successive tests with various loads.
Code of the cloud function:
A delay of 10s has been added for each call; it matches the time my processing will take.
import json
import time
import uuid

def add_fake_user(request):
    request_json = request.get_json(silent=True)
    replies = []
    calls = request_json['calls']
    call_id = str(uuid.uuid4())
    for call in calls:
        time.sleep(10)
        userno = call[0]
        corp = call[1]
        replies.append({
            'username': f'user_{userno}',
            'email': f'user_{userno}@{corp}.com',
            'n_call': len(calls),
            'call_id': call_id
        })
    return json.dumps({
        # each reply is a STRING (JSON not currently supported)
        'replies': [json.dumps(reply) for reply in replies]
    })
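The request/response contract can also be exercised outside BigQuery, which helps separate scaling behaviour from function bugs; a minimal sketch using requests (the URL is a placeholder, and an Authorization: Bearer identity token may be required if the function is not public):
import json
import requests

# Placeholder: URL of the deployed cloud function.
URL = "https://my_url"

# BigQuery remote functions POST a JSON body with a "calls" array,
# one inner list of arguments per row.
payload = {"calls": [[1, "B"], [2, "C"]]}

resp = requests.post(URL, json=payload, timeout=60)
resp.raise_for_status()
for reply in resp.json()["replies"]:
    print(json.loads(reply))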
Configuration of the remote function:
CREATE OR REPLACE FUNCTION `PROJECT_NAME`.trash.add_fake_user(user_id int64, corp_id STRING) RETURNS STRING
REMOTE WITH CONNECTION `PROJECT_NAME.eu.gcf-conn` OPTIONS (endpoint = 'my_url', max_batching_rows=1)
Query calling the remote function
SELECT
`PROJECT_NAME`.trash.add_fake_user(var1, var2) AS foo
FROM
base
I've created an issue on Google's issue tracker: https://issuetracker.google.com/issues/235252503

Access Azure SQL DB with Managed Identity in Data Factory using Key Vault

I'm trying to connect to Azure SQL DB using AD authentication (Managed Identity) in Data Factory by saving the connection string in Azure Key Vault. I've set up Managed Identity access in Azure SQL DB by granting access to the ADF (by its name). I've stored the connection string in Key Vault in the following formats, but I was not successful.
I tried the following connection string formats:
Server=tcp:xxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxx;Authentication = 'Active Directory Interactive';
Server=tcp:xxxxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxxxxxx;User ID=DatafactoryName;Authentication = 'Active Directory Interactive'; -- Actual DatafactoryName
Server=tcp:xxxxxxxxxxxxxx.windows.net;Initial Catalog=xxxxxxxxx;User ID=MSI_ID;Authentication = 'Active Directory Interactive'; -- Actual MSI ID for the DataFactory
Server=tcp:xxxxxxxxxxxxxx.windows.net;Initial Catalog=xxxxxxxxx;User ID=a;Authentication = 'Active Directory Interactive'; -- Tried arbitrary value
I'm getting the following error
The connection string should be:
Data Source=tcp:<servername>.database.windows.net,1433;Initial Catalog=<databasename>;Connection Timeout=30
The connection should look like this:
Ref: Managed identities for Azure resources authentication and Reference secret stored in key vault
You can try
Integrated Security=False;Encrypt=True;Connection Timeout=30;Data Source=xxxxxxxxxx.database.windows.net;Initial Catalog=xxxxxxx
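To double-check exactly what Data Factory will read from Key Vault, the secret can be fetched back with the azure-identity and azure-keyvault-secrets libraries; a minimal sketch, with the vault URL and secret name as placeholders:
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholders: your Key Vault URL and the secret holding the connection string.
VAULT_URL = "https://my-vault.vault.azure.net"
SECRET_NAME = "sql-connection-string"

client = SecretClient(vault_url=VAULT_URL, credential=DefaultAzureCredential())
print(client.get_secret(SECRET_NAME).value)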

Can't connect Azure Table Storage to Power BI: (415) Unsupported Media Type

I'm getting the error below while connecting to Azure Table Storage,
Details:
Blockquote "AzureTables: Request failed: The remote server returned an error:
(415) Unsupported Media Type. (None of the provided media types are
supported)
One thing I noticed is that if I fill in only the account name, it automatically adds the rest of the URL, which is ".table.core.windows.net", whereas in the portal it is table.cosmosdb.azure.com.
With core.windows.net I'm getting the error "AzureTables: Request failed: The remote name could not be resolved". But it might be messing up some headers when using table.cosmosdb.azure.com.
Please advise.
Thank you.
You should be able to connect to your Azure Table Storage/Cosmos DB account from Power BI using the following link structure: https://STORAGEACCOUNTNAME.table.core.windows.net/ for Table Storage, or https://yourcosmosdbname.documents.azure.com:443/ for Cosmos DB.
You can get the correct link from the portal: go to Storage accounts > Tables (or Cosmos DB), find the table link you want to connect to Power BI, and remove the trailing table name after "/". Then use that URL to connect in Power BI; it will later let you select the specific table:
These are screenshots from testing for CosmosDB:
415 errors:
These errors can be caused by the cache, which can be cleared as follows:
In Power BI Desktop, go to "File" and select "Options". Under "Data Load" you have the option to clear the cache. After doing this you can use "Get Data" and "OData feed" as normal and the URL won't return the 415 error.
Check the following link for additional suggestions:
It's not clear how you consume the Table service API, but here is the solution that worked for me with a React SPA and the fetch API.
The request headers must contain:
"Content-Type": "application/json"
It was failing for me with single quotes and worked with double quotes.
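For anyone calling the Table REST endpoint directly rather than through Power BI, here is a minimal sketch of that point in Python; the account, table and SAS token are placeholders, and json.dumps always produces the double-quoted JSON the service expects:
import json
import requests

# Placeholders: storage account, table name, and a SAS token with add permission.
ACCOUNT = "mystorageaccount"
TABLE = "mytable"
SAS_TOKEN = "?sv=..."

entity = {"PartitionKey": "pk1", "RowKey": "rk1", "Value": "hello"}
resp = requests.post(
    f"https://{ACCOUNT}.table.core.windows.net/{TABLE}{SAS_TOKEN}",
    headers={
        "Content-Type": "application/json",
        "Accept": "application/json;odata=nometadata",
    },
    # Single-quoted pseudo-JSON is rejected; json.dumps emits valid double-quoted JSON.
    data=json.dumps(entity),
)
print(resp.status_code, resp.text)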

Credentials Error when integrating Google Drive with BigQuery

I am using Google BigQuery and I want to integrate it with Google Drive. In BigQuery I am giving the Google spreadsheet URL to upload my data, and it updates well, but when I write the query in the Google add-on (OWOX BI BigQuery Reports):
Select * from [datasetName.TableName]
I am getting an error:
Query failed: tableUnavailable: No suitable credentials found to access Google Drive. Contact the table owner for assistance.
I just faced the same issue in some code I was writing. It might not directly help you here, since it looks like you are not responsible for the code, but it might help someone else, or you can ask the person who wrote the code you're using to read this :-)
So I had to do a couple of things:
Enable the Drive API for my Google Cloud Platform project in addition to BigQuery.
Make sure that your BigQuery client is created with both the BigQuery scope AND the Drive scope.
Make sure that the Google Sheets you want BigQuery to access are shared with the "...@appspot.gserviceaccount.com" account that your Google Cloud Platform project identifies itself as.
After that I was able to successfully query the Google Sheets backed tables from BigQuery in my own project.
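With the current google-cloud-bigquery client the same idea looks roughly like this; a minimal sketch, assuming a service account key file (the key path, dataset and table names are placeholders):
from google.cloud import bigquery
from google.oauth2 import service_account

# Placeholder key path; the Sheet must be shared with this service account
# and the Drive API must be enabled on the project.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/key.json",
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ],
)

client = bigquery.Client(credentials=credentials, project=credentials.project_id)
for row in client.query("SELECT * FROM `my_dataset.federated_sheet`").result():
    print(dict(row))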
What was previously said is right:
Make sure that your dataset in BigQuery is also shared with the Service Account you will use to authenticate.
Make sure your Federated Google Sheet is also shared with the service account.
The Drive API should be enabled as well
When using the OAuth client you need to inject both scopes, for Drive and for BigQuery
If you are writing Python:
credentials = GoogleCredentials.get_application_default() won't work here, since you can't inject scopes that way (at least I didn't find a way :D)
Build your request from scratch:
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

scopes = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/cloud-platform',
)
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '/client_secret.json', scopes)
http = credentials.authorize(Http())

bigquery_service = build('bigquery', 'v2', http=http)
query_request = bigquery_service.jobs()

query_data = {
    'query': 'SELECT * FROM [test.federated_sheet]'
}

query_response = query_request.query(
    projectId='hello_world_project',
    body=query_data).execute()

print('Query Results:')
for row in query_response['rows']:
    print('\t'.join(field['v'] for field in row['f']))
This likely has the same root cause as:
BigQuery Credential Problems when Accessing Google Sheets Federated Table
Accessing federated tables in Drive requires additional OAuth scopes and your tool may only be requesting the bigquery scope. Try contacting your vendor to update their application?
If you're using pd.read_gbq() as I was, then this would be the best place to get your answer: https://github.com/pydata/pandas-gbq/issues/161#issuecomment-433993166
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)

sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""

df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)
print(df)