GenericGBQException: How to read data with BigQuery that is stored in a Google Drive spreadsheet - pandas

I uploaded a dataset to BigQuery via the Google Drive option, linking the Google spreadsheet to a dataset which I call 'dim_table'.
I then created a query to pull data from that dim_table dataset, which I run daily.
I am trying to create an automated script that will run the same query I created against the dim_table dataset and create a new dataset called chart_A.
When I run this simple code:
import pandas_gbq as gbq
gbq.read_gbq("Select * from data.dim_stats",'ProjectID')
I get an error:
GenericGBQException: Reason: 403 Access Denied: BigQuery BigQuery: No
OAuth token with Google Drive scope was found.
I have been trying to read the pandas-gbq documentation but could not find anything that explains how I can authenticate Google Drive with pandas-gbq or use OAuth. Any help is appreciated! :)
Let me know if you need me to come up with a sample table online for testing.
Best

I haven't used pandas-gbq, but the authentication methods it supports with BigQuery are described here [1].
Create a service account with a BigQuery role that can access your datasets [2].
Create and download the service account's JSON key [3].
Set the private_key parameter to the file path of the JSON file or to a string containing the JSON contents.
A related guide to querying Google Drive data without using pandas-gbq is here [4].
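If you are on a newer pandas-gbq release, an alternative is to pass a credentials object built from that JSON key and explicitly include the Google Drive scope, which is what the error is complaining about. A minimal sketch, assuming a locally downloaded key file and a placeholder project ID, and assuming the spreadsheet has been shared with the service account's email address:

import pandas_gbq
from google.oauth2 import service_account

# Build service-account credentials that carry both the BigQuery and the
# Google Drive scopes; the Drive scope is required for Drive-backed tables.
credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json",  # placeholder path to the downloaded JSON key
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ],
)

df = pandas_gbq.read_gbq(
    "SELECT * FROM data.dim_stats",
    project_id="ProjectID",  # placeholder project ID
    credentials=credentials,
)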

Related

Export BigQuery output to Google Cloud Storage

Our organization has data in Google Bigtable, hosted by our vendor. We want to run jobs in BigQuery to query Bigtable and export the data to Cloud Storage as .csv files, without storing the data as a dataset in BigQuery.
We do not want to store it in BigQuery datasets because we are not doing any analysis in BigQuery; all analysis is done with an on-premise analytical solution.
Is this possible?
You have a few options, and the best solution would be to automate using Cloud Workflows.
The steps I see would be:
Export from BigTable in Avro or Parquet format to Cloud Storage.
There are gcloud and API ways to do this, described here.
You then import the exported files into BigQuery.
There are bq CLI and API ways to do this as well, described here.
Then you export from BigQuery to multiple CSV files, as documented here.
Since you get multiple CSV files, you can then run gsutil compose to merge them.
All of the above can be done in Cloud Workflows. Each call can be implemented either via the API (preferred) or via the command-line tools, using Cloud Build triggers for example. For Workflows syntax you can get guidance from this article, and from the content linked in the footer section of the article.
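As a rough illustration of steps 2 and 3 outside of Cloud Workflows, here is a minimal sketch using the BigQuery Python client; the bucket, dataset, and table names are placeholders, and it assumes the Bigtable export to Avro has already landed in Cloud Storage:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Step 2: load the Avro files exported from Bigtable into a temporary table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/bigtable-export/*.avro",
    "my-project.temp_dataset.bigtable_data",
    job_config=bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO),
)
load_job.result()  # wait for the load to finish

# Step 3: export that table to (possibly multiple) CSV files in Cloud Storage.
extract_job = client.extract_table(
    "my-project.temp_dataset.bigtable_data",
    "gs://my-bucket/csv-export/part-*.csv",
    job_config=bigquery.ExtractJobConfig(destination_format="CSV"),
)
extract_job.result()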

Azure Synapse Analytics Error when using saveAsTable from a DataFrame which is loaded from a SQL source

I'm following the guide (https://learn.microsoft.com/en-us/azure/synapse-analytics/get-started) for loading data from a SQL Pool and writing the DataFrame to a table in the metastore. However I'm getting an error:
Error : org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, PUT, https://xxx.dfs.core.windows.net/tempdata/synapse/workspaces/xxx/sparkpools/SparkPool/sparkpoolinstances/8f3ec14a-1e59-4597-8fd9-42da0db65331?action=setAccessControl&timeout=90, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:fe61799c-e01f-0003-119e-37fdb1000000 Time:2020-05-31T22:57:55.8271281Z"
I've replaced my resource names with xxx.
Other DataFrame saveAsTable operations work fine. From what I can see, the data is being read from the SQL pool successfully and staged, because when I browse the data lake location specified in the error I can see the data.
/tempdata/synapse/workspaces/xxx/sparkpools/SparkPool/sparkpoolinstances/8f3ec14a-1e59-4597-8fd9-42da0db65331
The Synapse workspace managed identity has storage blob data contributor permissions and my own domain account has owner access.
Has anyone else had issues?
Thanks
Andy
Please assign yourself (the account with which you're trying to run the script) the Storage Blob Data Contributor role.
This information now shows up during the creation of an Azure Synapse workspace.
It was a big struggle to figure this out during its private preview.
More information related to securing a Synapse workspace can be found here.
Let me know if this worked.
Thank you.

Does Google Cloud Dataprep support importing Google Drive Sheets as data sources?

I'm importing datasets into Google Cloud Dataprep (by Trifacta) to perform transformations on my data sources, but I can't see the Google Drive Sheets in the list after connecting them to the BigQuery console. I intend to use them as rules for my transformations.
I've already created another dataset and the problem persists.
Is it possible to import them, or is this not supported yet?
Thanks,
You are right. According to the documentation, Dataprep only supports native BigQuery tables and views as BigQuery sources.
You could try downloading your Drive sheets as CSV and then creating a BigQuery table from them, or you could run a query job that writes your external table into a new native table using:
SELECT * FROM my_dataset.my_external_table
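For example, with the BigQuery Python client you could run that query as a query job with a destination table, which materializes the Drive-backed data into a native table that Dataprep can see. A minimal sketch with placeholder project, dataset, and table names; note that the credentials running the query need the Google Drive scope, since the source is a Drive-backed external table:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

job_config = bigquery.QueryJobConfig(
    destination="my-project.my_dataset.my_native_table",  # new native table
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Materialize the external (Drive-backed) table into the native destination table.
client.query(
    "SELECT * FROM `my-project.my_dataset.my_external_table`",
    job_config=job_config,
).result()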

Execute Transfer in Google Bigquery - PERMISSION_DENIED: No OAuth token with Google Drive scope was found

I am trying the new 'Transfers' function in Google BigQuery.
I am using the 'Scheduled Query' option.
It works with a simple query, but when I try another query that normally works, based on a view that is itself based on a join between two tables (one of which is based on a Google Sheet shared with me), none of the more complicated transfers I created work.
I get the following error message:
Failed to start job for table 'xxx' with error PERMISSION_DENIED: Access Denied: BigQuery BigQuery: No OAuth token with Google Drive scope was found.
Is it because one of the source tables is based on a Google Sheet?
I tried to copy the source table to another table, but when I do this BigQuery automatically deletes this table.
Any ideas?
The problem is with the view, which queries Google Drive data. To resolve it, you need to request Google Drive scopes. Quoting directly from the documentation:
Accessing data hosted within Google Drive requires an additional OAuth scope, both when defining the federated source as well as during query execution.
In the documentation page linked above you'll also find ways to do this via the command line, the API, and the web UI.
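For the API route, the essential point is that the credentials used to run the query must carry the Drive scope in addition to the BigQuery scope. A minimal sketch with the BigQuery Python client and application default credentials (the dataset and view names are placeholders; the scheduled query itself still needs to be set up with Drive-scoped credentials as the linked documentation describes):

import google.auth
from google.cloud import bigquery

# Request application default credentials with both BigQuery and Drive scopes.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ]
)

client = bigquery.Client(project=project, credentials=credentials)

# Querying a view that reads a Drive-backed table now works, because the
# OAuth token includes the Google Drive scope.
rows = client.query("SELECT * FROM `my_dataset.my_view`").result()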

Getting a user's marketing source from Google Analytics

I'm a backend developer with no experience with Google Analytics, but I have a requirement to collect the marketing medium/source for each user from Google Analytics and save it in my database. I've been searching for a way to get it from an API request but haven't found one yet. Could you guys help?
You can use the Google Python API client to fetch Google Analytics data. You can read more here.
Medium and source information can be retrieved using the dimension ga:sourceMedium.
You can find more info about dimensions and metrics here.
You can then set up a daily script that fetches the data from your Google Analytics account and dumps it into a CSV file, which you can subsequently load into your database using libraries such as psycopg2.
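A minimal sketch of such a script using the Analytics Reporting API v4 with a service account; the view ID and key-file path are placeholders, and the service account's email has to be added as a read-only user on the Google Analytics view:

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Service-account credentials with read-only access to Google Analytics.
credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json",  # placeholder path to the JSON key
    scopes=["https://www.googleapis.com/auth/analytics.readonly"],
)

analytics = build("analyticsreporting", "v4", credentials=credentials)

# Pull sessions broken down by ga:sourceMedium for yesterday.
response = analytics.reports().batchGet(
    body={
        "reportRequests": [
            {
                "viewId": "123456789",  # placeholder Google Analytics view ID
                "dateRanges": [{"startDate": "yesterday", "endDate": "yesterday"}],
                "dimensions": [{"name": "ga:sourceMedium"}],
                "metrics": [{"expression": "ga:sessions"}],
            }
        ]
    }
).execute()

# Each row pairs a source/medium value with its session count.
for row in response["reports"][0].get("data", {}).get("rows", []):
    source_medium = row["dimensions"][0]
    sessions = row["metrics"][0]["values"][0]
    print(source_medium, sessions)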