Using a service account and JSON key sent to you to upload data into Google Cloud Storage - authentication

I wrote a Python script that uploads files from a local folder into Google Cloud Storage.
I also created a service account with sufficient permissions and tested the script on my computer using that service account's JSON key, and it worked.
Now I have sent the code and the JSON key to someone else to run, but authentication fails on her side.
Are we missing any authentication step in the GCP UI?
def config_gcloud():
    subprocess.run(
        [
            shutil.which("gcloud"),
            "auth",
            "activate-service-account",
            "--key-file",
            CREDENTIALS_LOCATION,
        ]
    )
    storage_client = storage.Client.from_service_account_json(CREDENTIALS_LOCATION)
    return storage_client

def file_upload(bucket, source, destination):
    storage_client = config_gcloud()
    ...
The error happens in config_gcloud, and it says it is expecting str, path, ... but gets NoneType.
As I said, the code is fine and works on my computer. How can another person use it with the JSON key I sent her? She stored the JSON locally, and the path to the JSON is in the code.

CREDENTIALS_LOCATION is None instead of the correct path, hence the complaint about getting NoneType instead of str | Path.
Also, you don't need that gcloud call; it only matters for gcloud/gsutil commands, not for the Python client library.
And please post the actual stacktrace of the error next time, not just a misspelled interpretation of it.
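As a minimal sketch of how to make the path problem visible on the other person's machine, the snippet below reads the key path from an environment variable and fails early if it is missing; GCS_KEY_PATH is a hypothetical name, not something from the original script.

import os
from pathlib import Path

from google.cloud import storage

# Hypothetical variable name; point it at wherever the JSON key is stored locally.
CREDENTIALS_LOCATION = os.environ.get("GCS_KEY_PATH")

def config_gcloud():
    # Fail early with a clear message instead of a NoneType error inside the client.
    if not CREDENTIALS_LOCATION or not Path(CREDENTIALS_LOCATION).is_file():
        raise RuntimeError("Set GCS_KEY_PATH to the full path of the service-account JSON key.")
    return storage.Client.from_service_account_json(CREDENTIALS_LOCATION)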

Related

Boto3 generate presigned url does not work

Here is the code I use to create an S3 client and generate a presigned URL; it is quite standard code that has been running on the server for quite a while. I pulled it out and ran it locally in a Jupyter notebook:
def get_s3_client():
    return get_s3(create_session=False)

def get_s3(create_session=False):
    session = boto3.session.Session() if create_session else boto3
    S3_ENDPOINT = os.environ.get('AWS_S3_ENDPOINT')
    if S3_ENDPOINT:
        AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID']
        AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY']
        AWS_DEFAULT_REGION = os.environ["AWS_DEFAULT_REGION"]
        s3 = session.client('s3',
                            endpoint_url=S3_ENDPOINT,
                            aws_access_key_id=AWS_ACCESS_KEY_ID,
                            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                            region_name=AWS_DEFAULT_REGION)
    else:
        s3 = session.client('s3', region_name='us-east-2')
    return s3

s3 = get_s3_client()

BUCKET=[my-bucket-name]
OBJECT_KEY=[my-object-name]

signed_url = s3.generate_presigned_url(
    'get_object',
    ExpiresIn=3600,
    Params={
        "Bucket": BUCKET,
        "Key": OBJECT_KEY,
    }
)
print(signed_url)
When I tried to download the file using the URL in the browser, I got an error message saying "The specified key does not exist." I noticed in the error message that my object key becomes "[my-bucket-name]/[my-object-name]" rather than just "[my-object-name]".
Then I used the same bucket/key combination to generate a presigned URL using the AWS CLI, which works as expected. I found out that somehow the s3 client method (boto3) inserted [my-bucket-name] in front of [my-object-name] compared to the AWS CLI method. Here are the results:
From s3.generate_presigned_url()
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-bucket-name]/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAV17K253JHUDLKKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T175014Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=5cdcc38e5933e92b5xed07b58e421e5418c16942cb9ac6ac6429ac65c9f87d64
From aws cli s3 presign
https://[my-bucket-name].s3.us-east-2.amazonaws.com/[my-object-name]?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAYA7K15LJHUDAVKHB%2F20210520%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20210520T155926Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=58208f91985bf3ce72ccf884ba804af30151d158d6ba410dd8fe9d2457369894
I've been working on this and searching for solutions for a day and a half, and I couldn't figure out what was wrong with my implementation. I guess I ignored some basic but important setting when creating an S3 client with boto3, or something else. Thanks for the help!
OK, mystery solved: I shouldn't provide the endpoint_url=S3_ENDPOINT param when creating the s3 client; boto3 will figure the endpoint out itself. After I removed it, everything works as expected.
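For reference, a rough sketch of the corrected client creation under that conclusion (the region default and environment-variable names are kept from the question; the bucket and key values are placeholders):

import os
import boto3

# Let boto3 resolve the S3 endpoint itself; credentials still come from the
# environment (or the default credential chain if the variables are unset).
s3 = boto3.client(
    's3',
    aws_access_key_id=os.environ.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=os.environ.get('AWS_SECRET_ACCESS_KEY'),
    region_name=os.environ.get('AWS_DEFAULT_REGION', 'us-east-2'),
)

signed_url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket-name', 'Key': 'my-object-name'},
    ExpiresIn=3600,
)
print(signed_url)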

How to rename a file in Blob Storage using the Azure Data Lake Gen2 REST API

I've tried to follow the instructions in this document: LINK
I used SAS authentication and added the "x-ms-rename-source" request header, but I kept getting the error "403-AuthorizationPermissionMismatch". All the other API methods work fine, but this one seems really tricky. Has anyone succeeded in renaming a file or directory with it?
Instead of using SAS authentication, I used Shared Key authorization headers. You can check it here.
My request headers:
DateTime now = DateTime.UtcNow;

requestMessage.Headers.Add("x-ms-date", now.ToString("R", CultureInfo.InvariantCulture));
requestMessage.Headers.Add("x-ms-version", "2018-11-09");
// the source path of the file you want to rename
requestMessage.Headers.Add("x-ms-rename-source", renameSourcePath);
// the rename operation only accepts authorization by Shared Key via this header
requestMessage.Headers.Authorization = AzureStorageAuthenticationHelper.GetAuthorizationHeader(
    StorageGen2AccountName, StorageGen2AccountKey, now, requestMessage);
You can also try to rename the file in Blob Storage by using the Storage Explorer tool.
Kindly let us know if the above helps or if you need further assistance on this issue.
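If hand-signing the REST call is more trouble than it is worth, a hedged alternative is the azure-storage-file-datalake Python SDK, which wraps the same rename operation; the account, filesystem and path values below are placeholders:

from azure.storage.filedatalake import DataLakeServiceClient

# Placeholder account details; use your own account name and key.
service = DataLakeServiceClient(
    account_url="https://<account-name>.dfs.core.windows.net",
    credential="<account-key>",
)
file_client = service.get_file_system_client("my-filesystem").get_file_client("old/name.txt")

# The new name must include the target filesystem, e.g. "my-filesystem/new/name.txt".
file_client.rename_file("my-filesystem/new/name.txt")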

Query data from Google Sheets-based table in BigQuery via API using service account

I can fetch data from native BigQuery tables using a service account.
However, I encounter an error when attempting to select from a Google Sheets-based table in BigQuery using the same service account.
from google.cloud import bigquery

client = bigquery.Client.from_service_account_json(
    json_credentials_path='creds.json',
    project='xxx',
)

# this works fine
print('test basic query: select 1')
job = client.run_sync_query('select 1')
job.run()
print('results:', list(job.fetch_data()))
print('-' * 50)

# this breaks
print('attempting to fetch from sheets-based BQ table')
job2 = client.run_sync_query('select * from testing.asdf')
job2.run()
The output:
⚡ ~/Desktop ⚡ python3 bq_test.py
test basic query: select 1
results: [(1,)]
--------------------------------------------------
attempting to fetch from sheets-based BQ table
Traceback (most recent call last):
  File "bq_test.py", line 16, in <module>
    job2.run()
  File "/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/query.py", line 381, in run
    method='POST', path=path, data=self._build_resource())
  File "/usr/local/lib/python3.6/site-packages/google/cloud/_http.py", line 293, in api_request
    raise exceptions.from_http_response(response)
google.cloud.exceptions.Forbidden: 403 POST https://www.googleapis.com/bigquery/v2/projects/warby-parker-1348/queries: Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
I've attempted to use oauth2client.service_account.ServiceAccountCredentials for explicitly defining scopes, including a scope for drive, but I get the following error when attempting to do so:
ValueError: This library only supports credentials from google-auth-library-python. See https://google-cloud-python.readthedocs.io/en/latest/core/auth.html for help on authentication with this library.
My understanding is that auth is handled via IAM now, but I don't see any roles to apply to this service account that have anything to do with drive.
How can I select from a sheets-backed table using the BigQuery python client?
I've run into the same issue and figured out how to solve it.
Exploring the google.cloud.bigquery.Client class, there is a class-level tuple SCOPE that is not updated by any argument or by the Credentials object, so its default value persists in the classes that use it.
To solve this, you can simply add a new scope URL to the google.cloud.bigquery.Client.SCOPE tuple.
In the following code I add the Google Drive scope to it:
from google.cloud import bigquery

# Add any scopes needed onto this scopes tuple
# (the trailing comma keeps it a tuple rather than a plain string).
scopes = (
    'https://www.googleapis.com/auth/drive',
)
bigquery.Client.SCOPE += scopes

client = bigquery.Client.from_service_account_json(
    json_credentials_path='/path/to/your/credentials.json',
    project='your_project_name',
)
With the code above you'll be able to query data from Sheets-based tables in BigQuery.
Hope it helps!
I think you're right that you need to pass the scope for Drive when authenticating. The scopes are passed here: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/core/google/cloud/client.py#L126 and it seems the BigQuery client lacks them: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/client.py#L117 . I suggest asking on GitHub. As a workaround, you can try to override the client credentials to include the Drive scope, but you'll need to use google.auth credentials from GoogleCloudPlatform/google-auth-library-python instead of oauth2client, as the error message suggests.
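A sketch of that workaround with current library names, assuming google-auth and a recent google-cloud-bigquery are installed (the key file name and project ID are the placeholders from the question):

from google.cloud import bigquery
from google.oauth2 import service_account

# Build google-auth credentials that carry both the BigQuery and Drive scopes,
# then pass them to the client explicitly instead of relying on the default SCOPE.
credentials = service_account.Credentials.from_service_account_file(
    'creds.json',
    scopes=[
        'https://www.googleapis.com/auth/bigquery',
        'https://www.googleapis.com/auth/drive',
    ],
)
client = bigquery.Client(project='xxx', credentials=credentials)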

Credentials Error when integrating Google Drive with BigQuery

I am using Google BigQuery, and I want to integrate it with Google Drive. In BigQuery I provide a Google Spreadsheet URL to upload my data, and it updates fine, but when I write this query in the Google Add-on (OWOX BI BigQuery Reports):
Select * from [datasetName.TableName]
I am getting an error:
Query failed: tableUnavailable: No suitable credentials found to access Google Drive. Contact the table owner for assistance.
I just faced the same issue in some code I was writing. It might not directly help you here, since it looks like you are not responsible for the code, but it might help someone else, or you can ask the person who wrote the code you're using to read this :-)
So I had to do a couple of things:
Enable the Drive API for my Google Cloud Platform project in addition to BigQuery.
Make sure that your BigQuery client is created with both the BigQuery scope AND the Drive scope.
Make sure that the Google Sheets you want BigQuery to access are shared with the "...#appspot.gserviceaccount.com" account that your Google Cloud Platform project identifies itself as.
After that I was able to successfully query the Google Sheets backed tables from BigQuery in my own project.
What was previously said is right:
Make sure that your dataset in BigQuery is also shared with the Service Account you will use to authenticate.
Make sure your Federated Google Sheet is also shared with the service account.
The Drive API should also be active.
When using the OAuth client you need to inject both scopes, for Drive and for BigQuery.
If you are writing Python:
credentials = GoogleCredentials.get_application_default() can't inject scopes (at least I didn't find a way :D), so build your request from scratch:
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials

scopes = (
    'https://www.googleapis.com/auth/drive.readonly',
    'https://www.googleapis.com/auth/cloud-platform',
)

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '/client_secret.json', scopes)
http = credentials.authorize(Http())

bigquery_service = build('bigquery', 'v2', http=http)
query_request = bigquery_service.jobs()
query_data = {
    'query': 'SELECT * FROM [test.federated_sheet]'
}

query_response = query_request.query(
    projectId='hello_world_project',
    body=query_data).execute()

print('Query Results:')
for row in query_response['rows']:
    print('\t'.join(field['v'] for field in row['f']))
This likely has the same root cause as:
BigQuery Credential Problems when Accessing Google Sheets Federated Table
Accessing federated tables in Drive requires additional OAuth scopes and your tool may only be requesting the bigquery scope. Try contacting your vendor to update their application?
If you're using pd.read_gbq() as I was, then this would be the best place to get your answer: https://github.com/pydata/pandas-gbq/issues/161#issuecomment-433993166
import pandas_gbq
import pydata_google_auth
import pydata_google_auth.cache

# Instead of get_user_credentials(), you could do default(), but that may not
# be able to get the right scopes if running on GCE or using credentials from
# the gcloud command-line tool.
credentials = pydata_google_auth.get_user_credentials(
    scopes=[
        'https://www.googleapis.com/auth/drive',
        'https://www.googleapis.com/auth/cloud-platform',
    ],
    # Use reauth to get new credentials if you haven't used the drive scope
    # before. You only have to do this once.
    credentials_cache=pydata_google_auth.cache.REAUTH,
    # Set auth_local_webserver to True to have a slightly more convenient
    # authorization flow. Note, this doesn't work if you're running from a
    # notebook on a remote server, such as with Google Colab.
    auth_local_webserver=True,
)

sql = """SELECT state_name
FROM `my_dataset.us_states_from_google_sheets`
WHERE post_abbr LIKE 'W%'
"""

df = pandas_gbq.read_gbq(
    sql,
    project_id='YOUR-PROJECT-ID',
    credentials=credentials,
    dialect='standard',
)

print(df)

How do I generate authenticated S3 urls using IAM roles?

When I had my hard-coded access key and secret key present, I was able to generate authenticated URLs for users to view private files on S3. This was done with the following code:
import hmac
import sha
import urllib
import time
import base64

filename = "".join([s3_url_prefix, filename])
expiry = str(int(time.time()) + 3600)

h = hmac.new(
    current_app.config.get("S3_SECRET_KEY", None),
    "".join(["GET\n\n\n", expiry, "\n", filename]),
    sha
)
signature = urllib.quote_plus(base64.encodestring(h.digest()).strip())

return "".join([
    "https://s3.amazonaws.com",
    filename,
    "?AWSAccessKeyId=",
    current_app.config.get("S3_ACCESS_KEY", None),
    "&Expires=",
    expiry,
    "&Signature=",
    signature
])
Which gave me something to the effect of https://s3.amazonaws.com/bucket_name/path_to_file?AWSAccessKeyId=xxxxx&Expires=5555555555&Signature=eBFeN32eBb2MwxKk4nhGR1UPhk%3D. Unfortunately, I am unable to store keys in config files for security reasons. For this reason, I switched over to IAM roles. Now, I get my keys using:
_iam = boto.connect_iam()
S3_ACCESS_KEY = _iam.access_key
S3_SECRET_KEY = _iam.secret_key
However, this gives me the error "The AWS Access Key Id you provided does not exist in our records." From my research, I understand that this is because my IAM keys aren't the actual keys, but are instead used together with a token. My question therefore is twofold:
How do I get the token programmatically? There doesn't seem to be a simple iam property I can use.
How do I send the token in the signature? I believe my signature should end up looking something like "".join(["GET\n\n\n", expiry, "\n", token, filename]), but I can't figure out what format to use.
Any help would be greatly appreciated.
There was a change to the generate_url method in https://github.com/boto/boto/commit/99e14f3df039997f54a5377cb6aecc83f22c2670 (June 2012) that made it possible to sign a URL using session credentials. That means you would need to be using boto version 2.6.0 or later. If you are, you should be able to just do this:
import boto
s3 = boto.connect_s3()
url = s3.generate_url(expires_in=3600, method='GET', bucket='<bucket_name>', key='<key_name>')
What version are you using?
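For anyone using boto3 instead of boto today, a rough equivalent sketch (boto3 picks up IAM-role session credentials, including the token, automatically; the bucket and key names are placeholders):

import boto3

# boto3 resolves credentials from the instance's IAM role, including the
# session token, so nothing needs to be hard-coded in config files.
s3 = boto3.client('s3')

url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket-name', 'Key': 'path/to/file'},
    ExpiresIn=3600,
)
print(url)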