Read/Write Data in Google Spreadsheet - selenium

I don't have MS Office on my local machine, so I am using Google Docs.
I need to write a script that reads data from, and writes data to, a Google spreadsheet using Selenium WebDriver.
Does anyone have an idea how to do this?
Technologies:
Selenium Web-Driver
JAVA
TestNG
Eclipse IDE

I don't have access to Google Sheets right now, but I'm guessing it would look something like this.
pip install gspread oauth2client
Then...
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds']
creds = ServiceAccountCredentials.from_json_keyfile_name('client_secret.json', scope)
client = gspread.authorize(creds)
# Find a workbook by name and open the first sheet
# Make sure you use the right name here.
sheet = client.open("Copy of Legislators 2017").sheet1
# Extract and print all of the values
list_of_hashes = sheet.get_all_records()
print(list_of_hashes)
Or, get a list of lists:
sheet.get_all_values()
Finally, you could just pull the data from a single row, column, or cell:
sheet.row_values(1)
sheet.col_values(1)
sheet.cell(1, 1).value
https://www.twilio.com/blog/2017/02/an-easy-way-to-read-and-write-to-a-google-spreadsheet-in-python.html
https://towardsdatascience.com/accessing-google-spreadsheet-data-using-python-90a5bc214fd2
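The question also asks about writing, which the snippet above does not cover. A minimal sketch of the write side, assuming a worksheet obtained as shown above (for example `client.open("...").sheet1`). The helper name `write_rows` and the sample values are hypothetical; `update_cell` and `append_row` are gspread worksheet methods.

```python
def write_rows(sheet, header, rows):
    """Write a header into row 1, then append each data row below the existing data.

    `sheet` is expected to behave like a gspread worksheet.
    """
    # update_cell(row, col, value) uses 1-based row and column indexes
    for col, title in enumerate(header, start=1):
        sheet.update_cell(1, col, title)
    # append_row adds one row after the last row that contains data
    for row in rows:
        sheet.append_row(list(row))
    return len(rows)
```

A call would look like `write_rows(sheet, ["Name", "Age"], [["Bea", 20]])`. Each method call here is a separate API request, so for larger data sets the batch method `append_rows` is probably the better fit.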

Related

pandas_gbq Authorization Code field missing in Visual Studio Code

Using Visual Studio Code, I tried to query data from Google BigQuery in my Jupyter notebook with the pandas_gbq library. The code is simple:
query = """
SELECT * FROM `bigquery-dataset.my-table`
limit 100
"""
df = gbq.read_gbq(query = query, project_id='MYPROJECT',dialect='standard',reauth=True)
After running this code, it returns a link to approve the authorization on my Google account. Once I click the link and approve it, I get an authorization code to copy and paste, but I can't find the field to paste it into. Usually, when I run Jupyter Notebook in a browser, the field is located below the link.

Is there a way to automate this Python script in GCP?

I am a complete beginner in using GCP functions/products.
I have written the code below, which takes a list of cities from a local file, fetches weather data for each city in that list, and uploads those weather values into a table in BigQuery. I don't need to change the code anymore, since it creates a new table when a new week begins; now I want to "deploy" it (I am not even sure this is called deploying code) to the cloud so that it runs there automatically. I tried using App Engine and Cloud Functions but faced issues with both.
import requests, json, sqlite3, os, csv, datetime, re
from google.cloud import bigquery
#from google.cloud import storage
list_city = []
with open("list_of_cities.txt", "r") as pointer:
    for line in pointer:
        list_city.append(line.strip())
API_key = "PLACEHOLDER"
Base_URL = "http://api.weatherapi.com/v1/history.json?key="
yday = datetime.date.today() - datetime.timedelta(days = 1)
Date = yday.strftime("%Y-%m-%d")
table_id = f"sonic-cat-315013.weather_data.Historical_Weather_{yday.isocalendar()[0]}_{yday.isocalendar()[1]}"
credentials_path = r"PATH_TO_JSON_FILE"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = credentials_path
client = bigquery.Client()
try:
    schema = [
        bigquery.SchemaField("city", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Date", "Date", mode="REQUIRED"),
        bigquery.SchemaField("Hour", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Temperature", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Humidity", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Condition", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("Chance_of_rain", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Precipitation_mm", "FLOAT", mode="REQUIRED"),
        bigquery.SchemaField("Cloud_coverage", "INTEGER", mode="REQUIRED"),
        bigquery.SchemaField("Visibility_km", "FLOAT", mode="REQUIRED")
    ]
    table = bigquery.Table(table_id, schema=schema)
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="Date",  # name of column to use for partitioning
    )
    table = client.create_table(table)  # Make an API request.
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
except:
    print("Table {}_{} already exists".format(yday.isocalendar()[0], yday.isocalendar()[1]))
def get_weather():
    try:
        x["location"]
    except:
        print(f"API could not call city {city_name}")
    global day, time, dailytemp, dailyhum, dailycond, chance_rain, Precipitation, Cloud_coverage, Visibility_km
    day = []
    time = []
    dailytemp = []
    dailyhum = []
    dailycond = []
    chance_rain = []
    Precipitation = []
    Cloud_coverage = []
    Visibility_km = []
    for i in range(24):
        dayval = re.search("^\S*\s" ,x["forecast"]["forecastday"][0]["hour"][i]["time"])
        timeval = re.search("\s(.*)" ,x["forecast"]["forecastday"][0]["hour"][i]["time"])
        day.append(dayval.group()[:-1])
        time.append(timeval.group()[1:])
        dailytemp.append(x["forecast"]["forecastday"][0]["hour"][i]["temp_c"])
        dailyhum.append(x["forecast"]["forecastday"][0]["hour"][i]["humidity"])
        dailycond.append(x["forecast"]["forecastday"][0]["hour"][i]["condition"]["text"])
        chance_rain.append(x["forecast"]["forecastday"][0]["hour"][i]["chance_of_rain"])
        Precipitation.append(x["forecast"]["forecastday"][0]["hour"][i]["precip_mm"])
        Cloud_coverage.append(x["forecast"]["forecastday"][0]["hour"][i]["cloud"])
        Visibility_km.append(x["forecast"]["forecastday"][0]["hour"][i]["vis_km"])
    for i in range(len(time)):
        time[i] = int(time[i][:2])
def main():
    i = 0
    while i < len(list_city):
        try:
            global city_name
            city_name = list_city[i]
            complete_URL = Base_URL + API_key + "&q=" + city_name + "&dt=" + Date
            response = requests.get(complete_URL, timeout = 10)
            global x
            x = response.json()
            get_weather()
            table = client.get_table(table_id)
            varlist = []
            for j in range(24):
                variables = city_name, day[j], time[j], dailytemp[j], dailyhum[j], dailycond[j], chance_rain[j], Precipitation[j], Cloud_coverage[j], Visibility_km[j]
                varlist.append(variables)
            client.insert_rows(table, varlist)
            print(f"City {city_name}, ({i+1} out of {len(list_city)}) successfully inserted")
            i += 1
        except Exception as e:
            print(e)
            continue
The code refers directly to two local files: the list of cities and the JSON file containing the credentials for my GCP project. I believed that uploading these files to Cloud Storage and referencing them there wouldn't be an issue, but then I realised that I can't actually access my buckets in Cloud Storage without the credentials file.
This leaves me unsure whether the entire process is possible at all: how do I authenticate from the cloud in the first place if I need to reference that file locally first? It seems like an endless circle, where I'd authenticate with the file in Cloud Storage, but I'd need authentication first to access that file.
I'd really appreciate some help here. I have no idea where to go from this, and I don't have much knowledge of SE/CS; I only know Python, R, and SQL.
For Cloud Functions, the deployed function will run with the project service account credentials by default, without needing a separate credentials file. Just make sure this service account is granted access to whatever resources it will be trying to access.
You can read more info about this approach here (along with options for using a different service account if you desire): https://cloud.google.com/functions/docs/securing/function-identity
This approach is very easy, and keeps you from having to deal with a credentials file at all on the server. Note that you should remove the os.environ line, as it's unneeded. The BigQuery client will use the default credentials as noted above.
If you want the code to run the same whether on your local machine or deployed to the cloud, simply set a "GOOGLE_APPLICATION_CREDENTIALS" environment variable permanently in the OS on your machine. This is similar to what you're doing in the code you posted; however, you're temporarily setting it every time using os.environ rather than permanently setting the environment variable on your machine. The os.environ call only sets that environment variable for that one process execution.
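To see which of these situations a given run is in, a small diagnostic can help. This is only a sketch: the helper name `describe_credentials` is hypothetical, and it merely inspects the environment variable; it does not authenticate anything.

```python
import os


def describe_credentials(environ=os.environ):
    """Report how Google client libraries are likely to find credentials."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        # Nothing set: the libraries fall back to the environment's default
        # credentials, e.g. the function's service account when deployed.
        return "variable not set: default credentials will be used"
    if not os.path.isfile(path):
        return f"variable set, but no file found at {path}"
    return f"using key file {path}"
```

Printing `describe_credentials()` at the start of the script, both locally and after deploying, shows whether the key file or the default service account is in play.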
If for some reason you don't want to use the default service account approach outlined above, you can instead reference the credentials file directly when you instantiate bigquery.Client():
https://cloud.google.com/bigquery/docs/authentication/service-account-file
You just need to package the credentials file with your code (i.e. in the same folder as your main.py file) and deploy it alongside, so that it is in the execution environment. In that case, it can be referenced and loaded from your script without needing any special permissions or credentials. Just provide the relative path to the file (i.e. if it is in the same directory as your Python script, reference only the filename).
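A minimal sketch of that packaging approach, assuming the key file is deployed next to main.py under the hypothetical name `service_account.json`. The helper names are made up for illustration; `bigquery.Client.from_service_account_json` is the client library's constructor for a key file path.

```python
from pathlib import Path


def key_file_path(script_file, filename="service_account.json"):
    """Return the absolute path of a key file stored next to the given script."""
    return Path(script_file).resolve().parent / filename


def make_bigquery_client(script_file):
    """Build a BigQuery client from the key file packaged with the script."""
    # Imported here so the path helper above works without the library installed.
    from google.cloud import bigquery

    return bigquery.Client.from_service_account_json(str(key_file_path(script_file)))
```

In main.py this would be called as `client = make_bigquery_client(__file__)`. Resolving the path from the script's own location avoids depending on the working directory of the runtime.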
There are different flavors and options for deploying your application, and the right one depends on your application's semantics and execution constraints.
It would be too hard to cover all of them here, and the official Google Cloud Platform documentation covers each in great detail:
Google Compute Engine
Google Kubernetes Engine
Google App Engine
Google Cloud Functions
Google Cloud Run
Based on my understanding of your application design, the most suitable ones would be:
Google App Engine
Google Cloud Functions
Google Cloud Run: check these criteria to see if your application is a good fit for this deployment style
I would suggest using Cloud Functions as your deployment option, in which case your application will default to using the project's App Engine service account to authenticate itself and perform allowed actions. Hence, you only need to check that the default account PROJECT_ID@appspot.gserviceaccount.com has proper access to the needed APIs (BigQuery in your case) under the IAM configuration section.
In such a setup, you won't need to push your service account key to Cloud Storage, which I would recommend avoiding in either case, and you won't need to pull it either, as the runtime will handle authentication for the function.

Data overwrite google sheet - Jupyter connection

I created a connection between my Jupyter notebook and a Google Sheet.
My idea was to create a log, so that every time I run the notebook it updates my Google Sheet with the new data. However, I don't want to overwrite the existing data; I want to append to it. I tried many solutions, but none of them worked.
Currently my code is:
## Connect to our service account
scope =["https://spreadsheets.google.com/feeds",'https://www.googleapis.com/auth/spreadsheets',"https://www.googleapis.com/auth/drive.file","https://www.googleapis.com/auth/drive"]
credentials = ServiceAccountCredentials.from_json_keyfile_name('jupyter-and-gsheet-303208-63903bea8f5d.json', scope)
gc = gspread.authorize(credentials)
spreadsheet_key = '1RbPnMdJ-EcJHbly280vrJxc8UvqwiBPkUTFLyo4efEA'
from df2gspread import df2gspread as d2g
wks_name = 'Data04'
d2g.upload(df_apn1, spreadsheet_key, wks_name, credentials=credentials)
It works perfectly but always overwrites the existing data.
Does anybody know how I can append instead of replace?
Thank you.
The df2gspread documentation for upload() indicates that:
"if spreadsheet already exists, all data of provided worksheet(or first as default) will be replaced with data of given DataFrame, make sure that this is what you need!"
A workaround is to convert your dataframe to a list of rows and use gspread's append_rows.
Example:
Code:
import gspread
import pandas as pd
gc = gspread.service_account()
sh = gc.open_by_key("someid").sheet1
df = pd.DataFrame({'Name': ['Bea', 'Andrew', 'Mike'], 'Age': [20, 19, 23]})
values = df.values.tolist()
sh.append_rows(values)
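Note that `df.values.tolist()` drops the column names. The same idea can be wrapped in a small helper that optionally writes them first, for example on the very first run against an empty sheet. This is a sketch: the helper name `append_dataframe` and its `include_header` flag are hypothetical, and `worksheet` is assumed to be a gspread worksheet such as `sh` above.

```python
def append_dataframe(worksheet, df, include_header=False):
    """Append the rows of a DataFrame below the existing data of a worksheet."""
    rows = df.values.tolist()
    if include_header:
        # Put the column names in front of the data rows
        rows.insert(0, df.columns.tolist())
    worksheet.append_rows(rows)
    return len(rows)
```

With the question's variables, this would be `append_dataframe(gc.open_by_key(spreadsheet_key).worksheet(wks_name), df_apn1)` in place of the `d2g.upload(...)` call.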
You may also check the following libraries:
gspread-pandas
gspread-dataframe
Reference:
gspread

Repeating actions with Selenium with a different value each time

I am new-ish to Selenium, so I use Katalon Automation Recorder through Chrome to quickly draft scripts.
I have a script that creates an account on a website, but I want to create more than one account at a time (using a catchall). Is there a way for Selenium/Katalon to take its input from a set of preset emails (a CSV sort of thing), or even to generate random values in front of the @domain.com each time the script loops?
Here is the current state of the script:
Thanks
As @Shivan Mishra mentioned, you have to do some data-driven testing. In Katalon you can create test data in the object repository (see https://docs.katalon.com/katalon-studio/docs/manage-test-data.html).
You can manage your test data in a script, as in the following example:
import static com.kms.katalon.core.testdata.TestDataFactory.findTestData
def data = findTestData('path/to/your/testdata/in/object repository')
// Katalon test data rows and columns are indexed from 1
for (int i = 1; i <= data.getRowNumbers(); i++) {
    def value = data.getValue(1, i)
    // do any action with your value
}
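For the other half of the question, generating random values in front of the @domain.com, here is a minimal sketch in Python rather than Katalon's Groovy. The function name `random_emails` and its parameters are hypothetical; the resulting list could be written to a CSV file and used as the test data above.

```python
import secrets
import string


def random_emails(count, domain, length=10):
    """Return `count` distinct addresses with random local parts at `domain`."""
    alphabet = string.ascii_lowercase + string.digits
    emails = set()
    # Keep drawing until enough distinct addresses have been collected
    while len(emails) < count:
        local_part = "".join(secrets.choice(alphabet) for _ in range(length))
        emails.add(f"{local_part}@{domain}")
    return sorted(emails)
```

For example, `random_emails(5, "domain.com")` returns five distinct addresses on the catchall domain.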

How to write a Python script that uses the OpenERP ORM to directly upload to Postgres Database

I need to write a "standalone" script in Python to upload sales taxes to the account_tax table in the database using ONLY the ORM module of OpenERP. What I would like to do is something like the pseudo code below.
Can someone provide more details on the following:
1) what sys.path's do I need to set
2) what modules do I need to import before importing the "account" module. Currently when I import the "account" module I get the following error:
AssertionError: The report "report.custom" already exists!
3) What is the proper way to get my database cursor. In the code below I am simply calling psycopg2 directly to get a cursor.
If this approach cannot work, can anyone suggest an alternative other than writing XML files to load the data from the OpenERP application itself? This process needs to run outside of the standard OpenERP application.
PSEUDO CODE:
import sys
# set Python paths to access openerp modules
sys.path.append("./openerp")
sys.path.append("./openerp/addons")
# import OpenERP
import openerp
# import the account addon modules that contains the tables
# to be populated.
import account
# define connection string
conn_string2 = "dbname='test2' user='xyz' password='password'"
# get a db connection
conn = psycopg2.connect(conn_string2)
# conn.cursor() will return a cursor object
cursor = conn.cursor()
# and finally use the ORM to insert data into table.
If you want to do it via a web service, have a look at the OpenERP XML-RPC web services.
Example code to work with the OpenERP web services:
import xmlrpclib
username = 'admin' #the user
pwd = 'admin' #the password of the user
dbname = 'test' #the database
# OpenERP Common login Service proxy object
sock_common = xmlrpclib.ServerProxy ('http://localhost:8069/xmlrpc/common')
uid = sock_common.login(dbname, username, pwd)
#replace localhost with the address of the server
# OpenERP Object manipulation service
sock = xmlrpclib.ServerProxy('http://localhost:8069/xmlrpc/object')
partner = {
'name': 'Fabien Pinckaers',
'lang': 'fr_FR',
}
#calling remote ORM create method to create a record
partner_id = sock.execute(dbname, uid, pwd, 'res.partner', 'create', partner)
For cleaner code, you can also use the OpenERP client library.
Example code with the client library:
import openerplib
connection = openerplib.get_connection(hostname="localhost", database="test", \
login="admin", password="admin")
user_model = connection.get_model("res.users")
ids = user_model.search([("login", "=", "admin")])
user_info = user_model.read(ids[0], ["name"])
print user_info["name"]
Both ways work, but with the client library the code is shorter and easier to understand, while the xmlrpc proxy involves lower-level calls that you have to handle yourself.
Hope this helps.
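Applied to the sales tax case from the question, the XML-RPC approach might look like the sketch below. It uses `xmlrpc.client`, the Python 3 name of the `xmlrpclib` module used above. The helper names are hypothetical, and the `account.tax` field names and values (`name`, `amount`, `type`) are assumptions that should be checked against the model definition of your OpenERP version.

```python
import xmlrpc.client  # Python 3 name of the xmlrpclib module


def connect(url, dbname, username, pwd):
    """Log in and return (uid, object service proxy), as in the example above."""
    common = xmlrpc.client.ServerProxy(url + "/xmlrpc/common")
    uid = common.login(dbname, username, pwd)
    return uid, xmlrpc.client.ServerProxy(url + "/xmlrpc/object")


def create_taxes(sock, dbname, uid, pwd, taxes):
    """Create one account.tax record per (name, rate) pair and return the new ids."""
    new_ids = []
    for name, rate in taxes:
        values = {
            "name": name,
            "amount": rate,     # assumed field: rate as a fraction, e.g. 0.07
            "type": "percent",  # assumed field and value
        }
        # Calls the remote ORM create method, so inherited behaviour is respected
        new_ids.append(sock.execute(dbname, uid, pwd, "account.tax", "create", values))
    return new_ids
```

A run would be `uid, sock = connect("http://localhost:8069", "test", "admin", "admin")` followed by `create_taxes(sock, "test", uid, "admin", [("State tax", 0.07)])`.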
In my view, you should use the XML-RPC or NETSVC services provided by OpenERP for such needs.
You don't need to import the account module of OpenERP; other modules may have inherited the account.tax object and altered its behaviour to suit your business needs.
If you feed in data by calling those methods manually, without using the OpenERP web service, you may end up with undesired results, unexpected failures, or an inconsistent database state.
You can use Erppeek to browse data, but I'm not sure whether you can really upload data to the DB with it; personally I prefer XML-RPC.
Why don't you use the XML-RPC interface of OpenERP?
It doesn't require importing account or openerp, and you still get all the ORM functionality.
You can use a Python library to access the OpenERP server through the XML-RPC service.
Please check https://github.com/OpenERP/openerp-client-lib
It is officially supported by OpenERP SA.
If you want to interact directly with the DB, you could just import psycopg2 and:
import psycopg2
conn = psycopg2.connect(dbname='dbname', user='dbuser', password='dbpassword', host='dbhost')
cur = conn.cursor()
# pass values as query parameters instead of formatting them into the SQL string
cur.execute('select * from table where id = %s', (table_id,))
cur.execute('insert into table(column1, column2) values (%s, %s)', (value1, value2))
conn.commit()  # persist the insert
cur.close()
conn.close()
Why do you want to solve it like that? You should create a localization module and define the data in XML files. This is the standard way to solve such a problem in OpenERP.
Which country do you want to insert sales taxes for? Please explain more.
from openerp.modules.registry import RegistryManager
registry = RegistryManager.get("databasename")
with registry.cursor() as cr:
    user = registry.get('res.users').browse(cr, userid, listids)
    print user