Credentials Error when integrating Google Drive with BigQuery - sql

I am using Google BigQuery and I want to integrate BigQuery with Google Drive. In BigQuery I am giving the Google Spreadsheet URL to upload my data, and it is updating well, but when I write this query in the Google add-on (OWOX BI BigQuery Reports):
Select * from [datasetName.TableName]
I am getting an error:
Query failed: tableUnavailable: No suitable credentials found to access Google Drive. Contact the table owner for assistance.

I just faced the same issue in some code I was writing - it might not directly help you here, since it looks like you are not responsible for the code, but it might help someone else, or you can ask the person who does write the code you're using to read this :-)
So I had to do a couple of things:
Enable the Drive API for my Google Cloud Platform project in addition to BigQuery.
Make sure that your BigQuery client is created with both the BigQuery scope AND the Drive scope.
Make sure that the Google Sheets you want BigQuery to access are shared with the "...@appspot.gserviceaccount.com" account that your Google Cloud Platform project identifies itself as.
After that I was able to successfully query the Google Sheets-backed tables from BigQuery in my own project.
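For illustration (not part of the original answer), here is a minimal sketch of the scope requirement using the current google-auth and google-cloud-bigquery libraries; the key file path, project ID, and table name are placeholders:

from google.cloud import bigquery
from google.oauth2 import service_account

# Request both the BigQuery (cloud-platform) scope and the Drive scope so
# BigQuery can read the Sheets-backed table.
credentials = service_account.Credentials.from_service_account_file(
    'service_account_key.json',  # placeholder path to your key file
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive',
    ],
)

client = bigquery.Client(project='your-project-id', credentials=credentials)
for row in client.query('SELECT * FROM `datasetName.TableName`').result():
    print(row)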

What was previously said is right:
Make sure that your dataset in BigQuery is also shared with the Service Account you will use to authenticate.
Make sure your Federated Google Sheet is also shared with the service account.
The Drive API should be enabled as well.
When using the OAuth client you need to inject both scopes, for Drive and for BigQuery.
If you are writing Python: credentials = GoogleCredentials.get_application_default() doesn't let you inject scopes (at least I didn't find a way :D), so build your request from scratch:
from httplib2 import Http
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

scopes = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/cloud-platform',
)
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '/client_secret.json', scopes)
http = credentials.authorize(Http())
bigquery_service = build('bigquery', 'v2', http=http)

query_request = bigquery_service.jobs()
query_data = {
    'query': 'SELECT * FROM [test.federated_sheet]'
}
query_response = query_request.query(
    projectId='hello_world_project',
    body=query_data).execute()

print('Query Results:')
for row in query_response['rows']:
    print('\t'.join(field['v'] for field in row['f']))

This likely has the same root cause as:
BigQuery Credential Problems when Accessing Google Sheets Federated Table
Accessing federated tables in Drive requires additional OAuth scopes and your tool may only be requesting the bigquery scope. Try contacting your vendor to update their application?

If you're using pd.read_gbq() as I was, then this would be the best place to get your answer: https://github.com/pydata/pandas-gbq/issues/161#issuecomment-433993166
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)

sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""

df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)

print(df)

Related

Using a service account and JSON key which is sent to you to upload data into google cloud storage

I wrote a Python script that uploads files from a local folder into Google Cloud Storage.
I also created a service account with sufficient permissions and tested it on my computer using that service account's JSON key, and it worked.
Now I have sent the code and the JSON key to someone else to run, but the authentication fails on her side.
Are we missing any authentication step in the GCP UI?
import shutil
import subprocess

from google.cloud import storage

def config_gcloud():
    subprocess.run(
        [
            shutil.which("gcloud"),
            "auth",
            "activate-service-account",
            "--key-file",
            CREDENTIALS_LOCATION,
        ]
    )
    storage_client = storage.Client.from_service_account_json(CREDENTIALS_LOCATION)
    return storage_client

def file_upload(bucket, source, destination):
    storage_client = config_gcloud()
    ...
The error happens in config_gcloud and says it is expecting str, path, ... but gets NoneType.
As I said, the code is fine and works on my computer. How can another person use it with the JSON key I sent her? She stored the JSON locally and the path to the JSON is in the code.
CREDENTIALS_LOCATION is None instead of the correct path, hence it complaining about it being NoneType instead of str|Path.
Also, you don't need that gcloud call; that would only matter for gcloud/gsutil commands, not the Python client library.
And please post the actual stacktrace of the error next time, not just a misspelled interpretation of it.
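As an aside (an assumption, since the question doesn't show how CREDENTIALS_LOCATION is set): if the path comes from an environment variable, failing fast with a clear message avoids the confusing NoneType error. The variable name GOOGLE_KEY_PATH below is only an example:

import os

from google.cloud import storage

# Hypothetical: the key path is taken from an environment variable; fail early
# with a clear error instead of passing None to the client.
CREDENTIALS_LOCATION = os.environ.get("GOOGLE_KEY_PATH")
if not CREDENTIALS_LOCATION or not os.path.isfile(CREDENTIALS_LOCATION):
    raise RuntimeError(
        "Set GOOGLE_KEY_PATH to the path of the service account JSON key file."
    )

storage_client = storage.Client.from_service_account_json(CREDENTIALS_LOCATION)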

I want to download BigQuery results from my own Python code and something is wrong with my authentication

I am trying to download BigQuery results from GCP and I was following the instructions in the GCP documentation (GCP authentication). It tells me to create a service account, which I did; however, the output tells me that this service account has no permission to access the table:
google.api_core.exceptions.Forbidden: 403 Access Denied: Table dbd-sdlc-prod:HKG_NORMALISED.HKG_NORMALISED: User does not have permission to query table dbd-sdlc-prod:HKG_NORMALISED.HKG_NORMALISED.
This reminds me that the table I wish to query was provided by a third party; they granted access to the data only to my Google account. I wish to find a way to authenticate with my own account instead of a service account to download the query results. Will that be possible, and how can I do it exactly?
And the following are the roles for my test service account; I believe I have set them correctly, with the top-level "Owner" role. Thanks in advance.
from google.cloud import bigquery

bqclient = bigquery.Client()

# Download query results.
query_string = """
SELECT
    Date_Time,
    Price,
    Volume,
    Market_VWAP,
    Qualifiers AS Qualifiers,
    Ex_Cntrb_ID,
    Qualifiers AS TradeCategory
FROM
    `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
WHERE
    RIC = '1606.HK'
    AND (Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000') AND
         TIMESTAMP('2016-07-11 23:59:59.999999'))
    AND Type = "Trade"
    AND Volume > 0
    AND Price > 0
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
If you are using the Google Cloud SDK, you can just run gcloud auth login and authenticate with your Google account.
If not, you will have to authenticate as an end user; here are the details and examples of how to do that.
In your code, you will have to add the code to authenticate your application. (Don't forget to do the other steps in the tutorial.)
Your code will look like this:
from google.cloud import bigquery
from google_auth_oauthlib import flow

# --- Authentication
launch_browser = True  # set to False if you cannot open a local browser

appflow = flow.InstalledAppFlow.from_client_secrets_file(
    "client_secrets.json", scopes=["https://www.googleapis.com/auth/bigquery"]
)

if launch_browser:
    appflow.run_local_server()
else:
    appflow.run_console()

credentials = appflow.credentials
project = 'user-project-id'
# ---

bqclient = bigquery.Client(project=project, credentials=credentials)

# Download query results.
query_string = """
SELECT
    Date_Time,
    Price,
    Volume,
    Market_VWAP,
    Qualifiers AS Qualifiers,
    Ex_Cntrb_ID,
    Qualifiers AS TradeCategory
FROM
    `dbd-sdlc-prod.HKG_NORMALISED.HKG_NORMALISED`
WHERE
    RIC = '1606.HK'
    AND (Date_Time BETWEEN TIMESTAMP('2016-07-11 00:00:00.000000') AND
         TIMESTAMP('2016-07-11 23:59:59.999999'))
    AND Type = "Trade"
    AND Volume > 0
    AND Price > 0
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())

AAD Authentication with Azure Data Explorer (Kusto) not working for simple query via API

I'm attempting to access Kusto via the API with Python (a "headless" script, in other words), and would like to use an AAD application for authentication. I'm specifically working with the sample code on https://github.com/Azure/azure-kusto-python/blob/master/azure-kusto-data/tests/sample.py, which attempts to query the Samples > StormEvents table on the cluster https://help.kusto.windows.net. I can run the query in the Kusto explorer just fine, but I'm getting "Caller is not authorized to perform this action" when trying to run the sample code.
I followed the instructions on https://kusto.azurewebsites.net/docs/management/access-control/aad.html and https://kusto.azurewebsites.net/docs/management/access-control/how-to-provision-aad-app.html to create an AAD application on the Azure portal and add API permissions for Azure Data Explorer. In the code, I have the "Application (client) ID" from the portal in the client_id field, and the appropriate secret in the client_secret field. The authority_id field is set to 72f988bf-86f1-41af-91ab-2d7cd011db47, which is what's shown on the portal as well as the table on https://kusto.azurewebsites.net/docs/management/access-control/aad.html#authenticating-with-aad-programmatically The app name (and client ID) is accepted on https://www.analytics.msftcloudes.com/support/directory just fine.
The code is thus as follows (omitting the imports and the specific secrets):
cluster = "https://help.kusto.windows.net"
client_id = "<omitted>"
client_secret = "<omitted>"
authority_id = "72f988bf-86f1-41af-91ab-2d7cd011db47"
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    cluster, client_id, client_secret, authority_id
)
client = KustoClient(kcsb)
db = "Samples"
query = "StormEvents | take 10"
response = client.execute(db, query)
The failure output is:
azure.kusto.data.exceptions.KustoServiceError: (KustoServiceError(...), [{'error': {'code': 'Forbidden', 'message': 'Caller is not authorized to perform this action', '#type': 'Kusto.DataNode.Exceptions.UnauthorizedDatabaseAccessException', '#message': "Principal 'AAD app id=(omitted)' is not authorized to access database 'Samples'.", '#context': {'timestamp': '2019-06-05T19:39:17.3493255Z', 'serviceAlias': 'HELP', 'machineName': 'KEngine000000', 'processName': 'Kusto.WinSvc.Svc', 'processId': 18832, 'threadId': 25568, 'appDomainName': 'Kusto.WinSvc.Svc.exe', 'clientRequestd': 'KPC.execute;9ede2b2d-5fba-478c-ad8f-8306284cf6e9', 'activityId': 'efdb96c9-da46-4d5f-b739-54661e7002e3', 'subActivityId': '33f89e2b-2347-447a-abe9-81e586d0e2a0', 'activityType': 'DN-FE-ExecuteQuery', 'parentActivityId': '438b2bb3-26fb-4f7e-813d-bc8a5c39ce1c', 'activityStack': '(Activity stack: CRID=KPC.execute;9ede2b2d-5fba-478c-ad8f-8306284cf6e9 ARID=efdb96c9-da46-4d5f-b739-54661e7002e3 > KD-Query-Client-ExecuteQueryAsKustoDataStream/5ddd9239-e742-4edc-ab3e-55d59a1f2c99 > P-WCF-Service-ExecuteQueryInternalAsKustoDataStream--IClientServiceCommunicationContract/438b2bb3-26fb-4f7e-813d-bc8a5c39ce1c > DN-FE-ExecuteQuery/33f89e2b-2347-447a-abe9-81e586d0e2a0)'}, '#permanent': True}}])
I've also added the sample cluster in Kusto Explorer, like the docs say.
Am I still missing something?
https://help.kusto.windows.net is the URL of an ADX cluster that is an exploratory aid, and it only allows interactive access by AAD users (not AAD applications).
For running automation using AAD application authentication, you should point your code at your own cluster/database, on which you grant your AAD application the necessary permissions (database user/viewer); a sketch follows.
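A minimal sketch (not part of the original answer), assuming you own the cluster and database and the app has already been granted the viewer role. Depending on the azure-kusto-data version, the import may be azure.kusto.data instead of azure.kusto.data.request, and the grant command in the comment should be verified against the ADX access-control docs:

from azure.kusto.data.request import KustoClient, KustoConnectionStringBuilder

# Placeholders for resources you control.
cluster = "https://<your-cluster>.<region>.kusto.windows.net"
database = "<your-database>"
client_id = "<omitted>"
client_secret = "<omitted>"
authority_id = "<your-tenant-id>"

# An administrator first grants the app access on that database, e.g. with the
# management command:
#   .add database <your-database> viewers ('aadapp=<client-id>;<tenant-id>')
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    cluster, client_id, client_secret, authority_id
)
client = KustoClient(kcsb)

query = "MyTable | take 10"  # any table in your own database
response = client.execute(database, query)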

Query data from Google Sheets-based table in BigQuery via API using service account

I can fetch data from native BigQuery tables using a service account.
However, I encounter an error when attempting to select from a Google Sheets-based table in BigQuery using the same service account.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json(
    json_credentials_path='creds.json',
    project='xxx',
)

# this works fine
print('test basic query: select 1')
job = client.run_sync_query('select 1')
job.run()
print('results:', list(job.fetch_data()))
print('-' * 50)

# this breaks
print('attempting to fetch from sheets-based BQ table')
job2 = client.run_sync_query('select * from testing.asdf')
job2.run()
The output:
⚡ ~/Desktop ⚡ python3 bq_test.py
test basic query: select 1
results: [(1,)]
--------------------------------------------------
attempting to fetch from sheets-based BQ table
Traceback (most recent call last):
  File "bq_test.py", line 16, in <module>
    job2.run()
  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/query.py", line 381, in run
    method='POST', path=path, data=self._build_resource())
  File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.cloud.exceptions.Forbidden: 403 POST https://www.googleapis.com/bigquery/v2/projects/warby-parker-1348/queries: Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
I've attempted to use oauth2client.service_account.ServiceAccountCredentials for explicitly defining scopes, including a scope for drive, but I get the following error when attempting to do so:
ValueError: This library only supports credentials from google-auth-library-python. See https://google-cloud-python.readthedocs.io/en/latest/core/auth.html for help on authentication with this library.
My understanding is that auth is handled via IAM now, but I don't see any roles to apply to this service account that have anything to do with drive.
How can I select from a sheets-backed table using the BigQuery python client?
I've run into the same issue and figured out how to solve it.
Exploring the google.cloud.bigquery.Client class, there is a class-level tuple SCOPE that is not updated by any argument nor by any Credentials object, so its default value persists in the classes that use it.
To solve this, you can simply add the extra scope URL to the google.cloud.bigquery.Client.SCOPE tuple.
In the following code I add the Google Drive scope to it:
from google.cloud import bigquery

# Add any scopes needed onto this scopes tuple.
# (Note the trailing comma: without it this would be a plain string, and
# concatenating a string to the SCOPE tuple raises a TypeError.)
scopes = (
    'https://www.googleapis.com/auth/drive',
)

bigquery.Client.SCOPE += scopes

client = bigquery.Client.from_service_account_json(
    json_credentials_path='/path/to/your/credentials.json',
    project='your_project_name',
)
With the code above you'll be able to query data from Sheets-based tables in BigQuery.
Hope it helps!
I think you're right that you need to pass the scope for Google Drive when authenticating. The scopes are passed here https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/core/google/cloud/client.py#L126 and it seems like the BigQuery client lacks these scopes https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/client.py#L117 . I suggest asking on GitHub. As a workaround you can try to override the client credentials to include the Drive scope, but you'll need to use google.auth credentials from GoogleCloudPlatform/google-auth-library-python instead of oauth2client, as the error message suggests; a sketch of that workaround follows.
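A minimal sketch of that workaround (an illustration, not the original poster's code), assuming application default credentials are available; the table name simply reuses the question's testing.asdf:

import google.auth
from google.cloud import bigquery

# Ask for application default credentials carrying both the cloud-platform
# scope and the Drive scope needed for Sheets-backed tables.
credentials, project = google.auth.default(
    scopes=[
        'https://www.googleapis.com/auth/cloud-platform',
        'https://www.googleapis.com/auth/drive',
    ]
)

client = bigquery.Client(project=project, credentials=credentials)
for row in client.query('SELECT * FROM `testing.asdf`').result():
    print(row)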

Twitter stream API: track all tweets posted by users who registered in my Twitter application

I am creating an application which needs to track all tweets from users who registered to my application. I tried to track them with the streaming API; there are the public API, the user API, and the site API.
In those APIs there is only an option to follow users by adding a comma-separated list of user IDs:
https://dev.twitter.com/streaming/overview/request-parameters#follow
But I think this is not flexible: if a new user registers, I need to rebuild the HTTP request, and if there are many users the query will get very long.
It would be
https://stream.twitter.com/1.1/statuses/filter.json?follow=[user1],[user2],[user3]........[userN],
and I am afraid the query won't fit. I just need a parameter to filter by all users who registered in my application, for example:
https://stream.twitter.com/1.1/statuses/filter.json?application=[applicationID]
but I don't think Twitter provides it.
So, is there any way to filter the stream by application ID?
I didn't see anything like tracking by application ID. If your query becomes too complex (too many follows/keywords), the public streaming API will reject it, and you can't open more than 2 connections with the user stream. So the last solution is using Site Streams: you can open as many user connections as you have users registered to your app.
BUT the docs say:
"Site Streams is currently in a closed beta. Applications are no longer being accepted."
Contact Twitter to be sure.
Ars Technica has a very interesting article about it. Take a look at this code and the link at the end of this post.
If you are using Python, pycurl will do the job. It provides a way to execute a function for every little piece of data received.
import json
import pycurl

STREAM_URL = "http://chirpstream.twitter.com/2b/user.json"
USER = "YOUR_USERNAME"
PASS = "XXXXXXXXX"
userlist = ['user1', 'userN']  # screen names of the users registered to your app

class StreamHandler(object):
    def __init__(self):
        self.buffer = ""

    def on_receive(self, data):
        # Accumulate chunks until a full JSON message (terminated by \r\n) arrives.
        self.buffer += data
        if data.endswith("\r\n") and self.buffer.strip():
            content = json.loads(self.buffer)
            self.buffer = ""
            if "text" in content and content['user']['screen_name'] in userlist:
                pass  # do stuff with the matching tweet here

handler = StreamHandler()
conn = pycurl.Curl()
conn.setopt(pycurl.USERPWD, "%s:%s" % (USER, PASS))
conn.setopt(pycurl.URL, STREAM_URL)
conn.setopt(pycurl.WRITEFUNCTION, handler.on_receive)
conn.perform()
You can find more information here Real time twitter stream api