pandas_gbq Authorization Code field missing in Visual Studio Code

Using Visual Studio Code, I tried to query data from Google BigQuery in a Jupyter notebook with the pandas_gbq library. The code is simple:
import pandas_gbq as gbq  # assumed import; the original snippet did not show it
query = """
SELECT * FROM `bigquery-dataset.my-table`
LIMIT 100
"""
df = gbq.read_gbq(query=query, project_id='MYPROJECT', dialect='standard', reauth=True)
After running this code, it returns a link to approve the authorization on my Google account. Once I clicked the link and approved it, I got an authorization code to copy and paste. But I can't find the field to paste the authorization code into. Usually, when I run a Jupyter notebook in the browser, the field is located below the link.

Related

Read/Write Data in Google Spreadsheet

I do not have MS Office on my local machine, so I am using Google Docs.
Now I need to create a script that fetches data from a Google spreadsheet in Selenium.
I want to read and write data in the Google spreadsheet using Selenium WebDriver.
Does anyone have an idea how to do it?
Technologies:
Selenium Web-Driver
JAVA
TestNG
Eclipse IDE
I don't have access to Google Sheets right now, but I'm guessing it would look something like this.
pip install gspread oauth2client
Then...
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
# Find a workbook by name and open the first sheet
# Make sure you use the right name here.
sheet = client.open("Copy of Legislators 2017").sheet1
# Extract and print all of the values
list_of_hashes = sheet.get_all_records()
print(list_of_hashes)
Or, get a list of lists:
sheet.get_all_values()
Finally, you could just pull the data from a single row, column, or cell:
sheet.row_values(1)
sheet.col_values(1)
sheet.cell(1, 1).value
https://www.twilio.com/blog/2017/02/an-easy-way-to-read-and-write-to-a-google-spreadsheet-in-python.html
https://towardsdatascience.com/accessing-google-spreadsheet-data-using-python-90a5bc214fd2
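To make the difference between the two return shapes concrete, here is a minimal sketch in plain Python (no Google API calls) of how rows shaped like the get_all_values() output map onto the dict-per-row shape returned by get_all_records(). It assumes the first row is the header. The sample rows and the helper name rows_to_records are made up for illustration, and gspread's own method may additionally convert numeric-looking cells, which this sketch does not do.

```python
def rows_to_records(rows):
    """Turn a list of rows (header first) into a list of dicts keyed by header."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Hypothetical sheet contents, shaped like a list of lists.
sample_rows = [
    ["name", "state"],
    ["Ada", "NY"],
    ["Grace", "VA"],
]
records = rows_to_records(sample_rows)
print(records)
```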

Is it possible to use service accounts to schedule queries in the BigQuery "Schedule Query" feature?

We are using the Beta Scheduled query feature of BigQuery.
Details: https://cloud.google.com/bigquery/docs/scheduling-queries
We have a few ETL scheduled queries running overnight to optimize the aggregation and reduce query cost. This works well and there haven't been many issues.
The problem arises when the person who scheduled the query with their own credentials leaves the organization. I know we can use "update credentials" in such cases.
I read through the documentation and experimented, but couldn't find out whether we can use a service account instead of an individual account to schedule queries.
Service accounts are cleaner, tie into the rest of the IAM framework, and do not depend on a single user.
So if you have any additional information regarding scheduled queries and service accounts, please share.
Thanks for taking time to read the question and respond to it.
Regards
BigQuery Scheduled Query now does support creating a scheduled query with a service account and updating a scheduled query with a service account. Will these work for you?
While it's not supported in BigQuery UI, it's possible to create a transfer (including a scheduled query) using python GCP SDK for DTS, or from BQ CLI.
The following is an example using Python SDK:
r"""Example of creating TransferConfig using service account.
Usage Example:
1. Install GCP BQ python client library.
2. If it has not been done, please grant the P4 service account the
iam.serviceAccounts.getAccessToken permission on your project.
$ gcloud projects add-iam-policy-binding {user_project_id} \
--member='serviceAccount:service-{user_project_number}@'\
'gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'
where {user_project_id} and {user_project_number} are the user project's
project id and project number, respectively. E.g.,
$ gcloud projects add-iam-policy-binding my-test-proj \
--member='serviceAccount:service-123456789@'\
'gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com' \
--role='roles/iam.serviceAccountTokenCreator'
3. Set environment var PROJECT_ID to your user project, and
GOOGLE_APPLICATION_CREDENTIALS to the service account key path. E.g.,
$ export PROJECT_ID='my_project_id'
$ export GOOGLE_APPLICATION_CREDENTIALS='./serviceacct-creds.json'
4. $ python3 ./create_transfer_config.py
"""
import os
from google.cloud import bigquery_datatransfer
from google.oauth2 import service_account
from google.protobuf.struct_pb2 import Struct
PROJECT = os.environ["PROJECT_ID"]
SA_KEY_PATH = os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
credentials = (
    service_account.Credentials.from_service_account_file(SA_KEY_PATH))
client = bigquery_datatransfer.DataTransferServiceClient(
    credentials=credentials)
# Get full path to project
parent_base = client.project_path(PROJECT)
params = Struct()
params["query"] = "SELECT CURRENT_DATE() as date, RAND() as val"
transfer_config = {
    "destination_dataset_id": "my_data_set",
    "display_name": "scheduled_query_test",
    "data_source_id": "scheduled_query",
    "params": params,
}
parent = parent_base + "/locations/us"
response = client.create_transfer_config(parent, transfer_config)
print(response)
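Step 2 of the docstring above is easy to get wrong because the member string is split across lines. As a rough sketch, the helper below (a hypothetical name, token_creator_binding_cmd, not part of any Google library) assembles the same gcloud arguments from a project id and project number so they can be printed or passed to subprocess.run. It only builds the argument list; it does not call gcloud.

```python
import shlex


def token_creator_binding_cmd(project_id, project_number):
    """Build the gcloud command that grants the DTS service account token-creator rights."""
    member = (
        "serviceAccount:service-{}@"
        "gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
    ).format(project_number)
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        "--member={}".format(member),
        "--role=roles/iam.serviceAccountTokenCreator",
    ]


# Example values taken from the docstring; substitute your own.
cmd = token_creator_binding_cmd("my-test-proj", 123456789)
print(" ".join(shlex.quote(part) for part in cmd))
```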
As far as I know, unfortunately you can't use a service account to directly schedule queries yet. Maybe a Googler will correct me, but the BigQuery docs implicitly state this:
https://cloud.google.com/bigquery/docs/scheduling-queries#quotas
A scheduled query is executed with the creator's credentials and
project, as if you were executing the query yourself
If you need to use a service account (which is great practice BTW), then there are a few workarounds listed here. I've raised a FR here for posterity.
This question is old, but I came across this thread while searching for the same thing.
Yes, it is possible to use a service account to schedule BigQuery jobs.
While creating a scheduled query, click "Advanced options" and you will get an option to select a service account.
By default it uses the credentials of the requesting user.
[Image from BigQuery "create scheduled query"]

ZeroBrane : Register APIs on a per file basis

I'm writing a ZeroBrane Studio plugin for our Solarus Game Engine and it works like a charm, autocompletion included.
I'm wondering now whether it's possible to register Lua APIs for one file only.
I need this to offer autocompletion/documentation on global symbols that may vary per script but can be deduced from the engine's annex files.
To summarize: is it possible to register an API for a single file? For example, in the onEditorLoad() event.
Thanks.
Greg
EDIT:
I tried the following without success:
local function switch_editor(editor)
  if current_editor == editor then
    ide:Print("same editor")
    return
  end
  current_editor = editor
  if not editor then
    ide:Print("null ed")
    return
  end
  lua_file_path = ide:GetDocument(editor).filePath
  if lua_file_path:match('/data/maps/') then
    ide:Print("map file!",type(editor))
    local map_api = make_map_api(lua_file_path)
    current_api = map_api
    ide:AddAPI('lua','solarus_map',map_api)
  else
    ide:Print('other file')
    if current_api then
      ide:RemoveAPI('lua','solarus_map')
      current_api = nil
    end
  end
end
api = {"baselib", "solarus", "solarus_map"}, --in interpreter table
... -- in the plugin table :
onEditorFocusSet = function(self,editor)
  switch_editor(editor)
end,
Completion with the solarus API works fine, but the on-the-fly registration of the solarus_map API does not seem to be taken into account.
EDIT2:
Silly me, I must have made a typo, because after checking and rewriting some things pretty much as in the code pasted above... it works! Awesome!
The only small gotcha is that when switching to a file where I don't want the solarus_map API, ide:RemoveAPI isn't sufficient. Instead I must call ide:AddAPI('lua','solarus_map',{}) to replace the API with an empty one, which I can live with.
To summarize, to get a custom API that changes from file to file:
1) Add the API name to the interpreter.
2) In the onEditorFocusSet event, update the API with ide:AddAPI(...), setting it to {} if it needs to be empty/disabled.
A code sample is in the edits to my question.

Making a Google BigQuery from Python on Windows

I am trying to do something which is very simple in other data services. I am trying to make a relatively simple SQL query and return it as a dataframe in Python. I am on Windows 10 and using Python 2.7 (specifically Canopy 1.7.4).
Typically this would be done with pandas.read_sql_query, but due to some specifics of BigQuery it requires a different method: pandas.io.gbq.read_gbq
This method works fine unless you want to make a Big Query. If you make a Big Query on BigQuery you get the error
GenericGBQException: Reason: responseTooLarge, Message: Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
This was asked and answered before in this ticket, but neither of the solutions is relevant to my case:
Python BigQuery allowLargeResults with pandas.io.gbq
One solution is for Python 3, so it is a nonstarter. The other gives an error because I am unable to set my credentials as an environment variable in Windows.
ApplicationDefaultCredentialsError: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
I was able to download the JSON credentials file and I have set it as an environment variable in the few ways I know how, but I still get the above error. Do I need to load this in some way in Python? It seems to be looking for it but unable to find it. Is there a special way to set it as an environment variable in this case?
You can do it in Python 2.7 by changing the default dialect from legacy to standard in the pd.read_gbq function.
pd.read_gbq(query, 'my-super-project', dialect='standard')
Indeed, the BigQuery documentation says this about the AllowLargeResults parameter:
AllowLargeResults: For standard SQL queries, this flag is
ignored and large results are always allowed.
I have found two ways of directly importing the JSON credentials file. Both are based on the original answer in Python BigQuery allowLargeResults with pandas.io.gbq
1) Credit to Tim Swast
First
pip install google-api-python-client
pip install google-auth
pip install google-cloud-core
then
replace
credentials = GoogleCredentials.get_application_default()
in create_service() with
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path/file.json')
2)
Set the environment variable manually in the code like
import os,os.path
os.environ['GOOGLE_APPLICATION_CREDENTIALS']=os.path.expanduser('path/file.json')
I prefer method 2 since it does not require new modules to be installed and is also closer to the intended use of the JSON credentials.
Note:
You must create a destinationTable and add the information to run_query()
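Since the original error came from the environment variable not being picked up, it can help to check the key file before running the query. The sketch below sets GOOGLE_APPLICATION_CREDENTIALS from code as in method 2 and verifies that the path points at readable JSON of type service_account. The function name use_service_account_key is made up for illustration, and the demo writes a placeholder file rather than a real key.

```python
import json
import os
import tempfile


def use_service_account_key(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at `path` after basic sanity checks."""
    full_path = os.path.abspath(os.path.expanduser(path))
    with open(full_path) as handle:  # fails here if the file is missing
        info = json.load(handle)     # fails here if it is not valid JSON
    if info.get("type") != "service_account":
        raise ValueError("not a service account key file: " + full_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = full_path
    return full_path


# Demo with a placeholder key file; use the path to your downloaded JSON instead.
demo_dir = tempfile.mkdtemp()
demo_key = os.path.join(demo_dir, "file.json")
with open(demo_key, "w") as handle:
    json.dump({"type": "service_account", "project_id": "demo"}, handle)

key_path = use_service_account_key(demo_key)
print(os.environ["GOOGLE_APPLICATION_CREDENTIALS"])
```

Failing early with a clear message is usually easier to debug than the ApplicationDefaultCredentialsError raised later by the client library.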
Here is code that works fully in Python 2.7 on Windows:
import pandas as pd
my_qry="<insert your big query here>"
### Here Put the data from your credentials file of the service account - all fields are available from there###
my_file="""{
"type": "service_account",
"project_id": "cb4recs",
"private_key_id": "<id>",
"private_key": "<your private key>\n",
"client_email": "<email>",
"client_id": "<id>",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "<x509 url>"
}"""
df=pd.read_gbq(my_qry,project_id='<your project id>',private_key=my_file)
That's it :)

How do I make a Bigquery dataset public using command line tool or Python?

I'm making an open data website powered by BigQuery. How do I make a Bigquery dataset public using command line tool or Python?
Note: I tried to make every dataset in my project public but got an unexplained error. In the project permission settings in the web UI, under "Add members", I entered allAuthenticatedUsers and chose the Data Viewer permission. The error was: "Error: Sorry, there’s a problem. If you entered information, check it and try again. Otherwise, the problem might clear up on its own, so check back later."
I wasn't able to find any command line examples for updating permissions. I also can't find a JSON string to pass to https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets/update
To achieve this programmatically, you need to use a dataset patch request and use the specialGroup item with the value allAuthenticatedUsers, like so:
{
"datasetReference":{
"projectId":"<removed>",
"datasetId":"<removed>"
},
"access":[
... //other access roles
{
"specialGroup":"allAuthenticatedUsers",
"role":"READER"
}
]
}
Note: You should use a read-modify-write cycle as described here & here:
Note about arrays: Patch requests that contain arrays replace the existing array with the one you provide. You cannot modify, add, or delete items in an array in a piecemeal fashion.
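A minimal sketch of that read-modify-write step, in plain Python: start from the access array of the dataset as currently returned by the API, append the public reader entry only if it is missing, and send the full array back in the patch body. The function name with_public_reader and the sample owner entry are assumptions for illustration; fetching and patching the dataset is left to whichever client you use.

```python
import copy

PUBLIC_READER = {"specialGroup": "allAuthenticatedUsers", "role": "READER"}


def with_public_reader(access):
    """Return a new access list that keeps existing entries and adds the public reader."""
    updated = copy.deepcopy(access)
    if PUBLIC_READER not in updated:
        updated.append(dict(PUBLIC_READER))
    return updated


# Hypothetical existing entries, as read from the dataset resource.
current_access = [{"role": "OWNER", "userByEmail": "owner@example.com"}]
patch_body = {"access": with_public_reader(current_access)}
print(patch_body)
```

Because the helper copies the existing entries instead of building the array from scratch, the patch does not drop roles that were already granted.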