I am new to Python and want to build a SQL query dynamically in Python, so I tried the sample code below:
empId = 12
query = ''' select name, ''' +
if empId > 10:
    '''basic_salary'''
else:
    ''' bonus'''
+ ''' from employee '''
print(query)
But I'm getting a syntax error. Does anyone know how to build a dynamic query in Python?
You need to indicate that the assignment to query continues on the next line, which you can do with a \ at the end of the line. You also need to write the if statement as an inline conditional expression, because you can't have an if statement in the middle of an assignment statement:
empId = 12
query = ''' select name, ''' + \
('''basic_salary''' if empId > 10 else ''' bonus''') + \
''' from employee '''
print(query)
Output:
select name, basic_salary from employee
If you have multiple conditions, you can build the query step by step and append the right column inside each branch. For example:
empId = 6
query = 'select name, '
if empId > 10:
    query += 'basic_salary'
elif empId > 5:
    query += 'benefits'
else:
    query += 'bonus'
query += ' from employee'
print(query)
Output:
select name, benefits from employee
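A variation on the approach above, sketched under the assumption that the column always comes from a fixed set of choices: only ever splice column names from a known whitelist into the SQL text, and send any value through a placeholder instead. The helpers `pick_column` and `build_query`, the `ALLOWED_COLUMNS` tuple and the added WHERE filter are hypothetical, and `sqlite3` is used only so the example runs without a database server.

```python
import sqlite3

# Hypothetical whitelist: the only column names allowed into the SQL text.
ALLOWED_COLUMNS = ("basic_salary", "benefits", "bonus")


def pick_column(emp_id: int) -> str:
    """Choose the column with the same rules as the if/elif/else example above."""
    if emp_id > 10:
        return "basic_salary"
    elif emp_id > 5:
        return "benefits"
    return "bonus"


def build_query(emp_id: int) -> str:
    column = pick_column(emp_id)
    # Identifiers cannot be bound as parameters, so check them against the whitelist.
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unexpected column: {column}")
    # Values, unlike identifiers, are passed separately through the ? placeholder.
    return f"select name, {column} from employee where {column} >= ?"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table employee (name text, basic_salary int, benefits int, bonus int)")
    conn.execute("insert into employee values ('Ann', 5000, 300, 100)")
    sql = build_query(6)
    print(sql)  # select name, benefits from employee where benefits >= ?
    print(conn.execute(sql, (200,)).fetchall())  # [('Ann', 300)]
    conn.close()
```

The split matters because database drivers can bind values but not table or column names, so names need a whitelist while values go through placeholders.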
# Dynamic SQL query formation using Python
# This is for PostgreSQL, but you can use it for other queries as well:
def updateQuery(self, tableName, setFields, setValues, whereFields, whereValues):
    print("Generating update query started")
    querySetfields = None
    queryWhereFields = None
    # Loop for set fields: build "col='value'" pairs separated by commas
    for i in range(len(setFields)):
        if querySetfields is None:
            querySetfields = setFields[i] + "='" + setValues[i] + "'"
        else:
            querySetfields = querySetfields + "," + setFields[i] + "='" + setValues[i] + "'"
    # Loop for where fields: conditions must be joined with AND, not commas
    for i in range(len(whereFields)):
        if queryWhereFields is None:
            queryWhereFields = whereFields[i] + "='" + whereValues[i] + "'"
        else:
            queryWhereFields = queryWhereFields + " AND " + whereFields[i] + "='" + whereValues[i] + "'"
    # Form the complete update query
    query = "UPDATE " + tableName + " SET " + querySetfields + " WHERE " + queryWhereFields
    print("Generating update query completed")
    return query

# Each field list needs exactly as many values as it has fields
print(updateQuery(None, "EMPLOYEE_DETAILS", ["EMPI_ID", "EMP_LANID", "EMP_NAME", "EMP_EMAIL"], ["A", "B", "C", "D"], ["EMPI_ID", "EMP_LANID"], ["X", "Y"]))
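The function above pastes the values straight into the SQL text, so a value containing a quote breaks the statement (and opens the door to SQL injection). A hedged alternative sketch: build the same UPDATE with placeholders and return the values separately, so the driver does the quoting. The name `build_update` and the identifier check are assumptions, not part of the original answer; `%s` is the placeholder style psycopg2 uses for PostgreSQL, and the usage below passes `?` so it can run against the standard-library `sqlite3` module.

```python
import re
import sqlite3

# Assumption: only plain SQL identifiers (letters, digits, underscore) are accepted.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_update(table, set_fields, set_values, where_fields, where_values, placeholder="%s"):
    """Return (sql, params) for a parameterized UPDATE statement."""
    if len(set_fields) != len(set_values) or len(where_fields) != len(where_values):
        raise ValueError("each field list must match its value list in length")
    if not set_fields or not where_fields:
        raise ValueError("need at least one SET field and one WHERE field")
    for name in [table, *set_fields, *where_fields]:
        # Identifiers cannot be bound as parameters, so validate them instead.
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    set_clause = ", ".join(f"{f} = {placeholder}" for f in set_fields)
    # WHERE conditions are combined with AND, not commas.
    where_clause = " AND ".join(f"{f} = {placeholder}" for f in where_fields)
    sql = f"UPDATE {table} SET {set_clause} WHERE {where_clause}"
    return sql, list(set_values) + list(where_values)


if __name__ == "__main__":
    sql, params = build_update("EMPLOYEE_DETAILS", ["EMP_NAME", "EMP_EMAIL"], ["C", "D"],
                               ["EMPI_ID", "EMP_LANID"], ["X", "Y"])
    print(sql)
    # UPDATE EMPLOYEE_DETAILS SET EMP_NAME = %s, EMP_EMAIL = %s WHERE EMPI_ID = %s AND EMP_LANID = %s
    print(params)  # ['C', 'D', 'X', 'Y']

    # A quote inside a value is harmless once it is bound as a parameter.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMPLOYEE_DETAILS (EMPI_ID, EMP_LANID, EMP_NAME, EMP_EMAIL)")
    conn.execute("INSERT INTO EMPLOYEE_DETAILS VALUES ('X', 'Y', 'old', 'old')")
    sql, params = build_update("EMPLOYEE_DETAILS", ["EMP_NAME"], ["O'Brien"],
                               ["EMPI_ID", "EMP_LANID"], ["X", "Y"], placeholder="?")
    conn.execute(sql, params)
    print(conn.execute("SELECT EMP_NAME FROM EMPLOYEE_DETAILS").fetchall())  # [("O'Brien",)]
    conn.close()
```

With psycopg2 the returned pair would be passed as `cursor.execute(sql, params)`.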
I am trying to use a loop to update a table in BigQuery. My table structure is as follows (with 100 columns and thousands of rows):
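| DATE       | PERIOD1 | PERIOD2 | PERIOD3 | PERIOD4 | PERIOD5 | PERIOD6 | PERIOD... | PERIOD100 |
|------------|---------|---------|---------|---------|---------|---------|-----------|-----------|
| 2021-01-01 | row     |         |         |         |         |         |           |           |
| 2021-02-01 | row     |         |         |         |         |         |           |           |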
For each date, I would need to use a loop to populate the values with something like
---
DECLARE VAR_PERIOD INT64 DEFAULT 1;
LOOP
IF VAR_PERIOD > 100 THEN LEAVE;
END IF;
---
update `mydataset.mytable` set CONCAT('PERIOD',VAR_PERIOD) = (select{+my query})
which obviously cannot work, so I'm wondering what alternative method I can use to update my table columns easily?
For this you can try the BigQuery API client libraries. Several languages are supported, but I am using Python here.
You can start directly in Cloud Shell and write a Python program there to do the job.
I am making one assumption about your requirements: you want to use a SELECT query to get a value, and then use that value to update the PERIOD columns.
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

query = """
select col_name from `projectID.dataset.table`
where condition
"""
# Make an API request.
query_job = client.query(query)

# Store the query result in a variable (value).
# If the query returns several rows, value ends up holding the last one.
for row in query_job:
    value = row[0]

# Build the query that updates the columns using the value:
#   UPDATE `projectID.dataset.table`
#   SET Period1 = Period1 + value, Period2 = Period2 + value ...
#   WHERE condition
query = "UPDATE `projectID.dataset.table` SET "
for i in range(1, 101):
    query += 'period' + str(i) + ' = ' + 'period' + str(i) + ' + ' + str(value) + ','
query = query[0:-1]  # drop the trailing comma
query += ' WHERE condition'

# Make an API request and wait for the UPDATE to finish.
query_job = client.query(query)
query_job.result()
Every BigQuery UPDATE statement must have a WHERE clause. If you want to update all rows, use WHERE TRUE as the condition.
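The trailing-comma trimming and the `str(value)` splice above can be avoided by joining the assignments and passing the value as a named query parameter. A minimal sketch, assuming the value is an integer: `build_period_update` is a hypothetical helper, and `projectID.dataset.table` and `condition` are the same placeholders used above. In BigQuery the `@value` parameter would be supplied through `bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("value", "INT64", value)])` (use `"FLOAT64"` for non-integer values); the sketch only builds the SQL text, so it runs without a BigQuery project.

```python
def build_period_update(table: str, n_periods: int = 100, where: str = "TRUE") -> str:
    """Build an UPDATE that adds the named parameter @value to PERIOD1..PERIODn."""
    # One "periodN = periodN + @value" assignment per column, comma-separated by join.
    assignments = ", ".join(f"period{i} = period{i} + @value" for i in range(1, n_periods + 1))
    # BigQuery requires a WHERE clause on UPDATE; WHERE TRUE updates every row.
    return f"UPDATE `{table}` SET {assignments} WHERE {where}"


if __name__ == "__main__":
    print(build_period_update("projectID.dataset.table", n_periods=3))
    # UPDATE `projectID.dataset.table` SET period1 = period1 + @value, period2 = period2 + @value, period3 = period3 + @value WHERE TRUE
```

The statement would then run as `client.query(sql, job_config=job_config).result()`, with `job_config` built as described above.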
Scripting/procedures for BigQuery just came out in beta - is it possible to invoke procedures using the BigQuery python client?
I tried:
query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.result())
print(job.ddl_operation_performed)
print(job._properties)
but that didn't give me the result set from the procedure. Is it possible to get the results?
Thank you!
Edit: this is the stored procedure I am calling:
CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
IF EXISTS (SELECT 1 FROM dataset.table where purchaseCountry = country and purchaseDate=accessDate and customerId = accessId)
THEN
SET saleExists = (SELECT 1);
ELSE
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
SET saleExists = (SELECT 0);
END IF;
END;
If you follow the CALL statement with a SELECT statement, you can get the procedure's output value as a result set. For example, I created the following stored procedure:
CREATE OR REPLACE PROCEDURE `my-project.test_dataset.top_names`(OUT top_shakespeare_names ARRAY<STRING>)
BEGIN
  -- Build an array of the top 100 names from the year 2017.
  DECLARE top_names ARRAY<STRING>;
  SET top_names = (
    SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE year = 2017
  );
  -- Which names appear as words in Shakespeare's plays?
  SET top_shakespeare_names = (
    SELECT ARRAY_AGG(name)
    FROM UNNEST(top_names) AS name
    WHERE name IN (
      SELECT word
      FROM `bigquery-public-data.samples.shakespeare`
    )
  );
END;
Running the following script returns the procedure's output as the top-level result set.
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
In Python:
from google.cloud import bigquery
client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)
Related: if you have SELECT statements within a script or stored procedure, you can list the child jobs of the parent job to fetch each result, even if the SELECT statement isn't the last statement.
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;
-- Build an array of the top 100 names from the year 2000.
SET top_names = (
  SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
  FROM `bigquery-public-data.usa_names.usa_1910_2013`
  WHERE year = 2000
);
-- Which names appear as words in Shakespeare's plays?
SELECT
  name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
  SELECT word
  FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)

# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))

# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))

# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
    child_rows = list(child_job.result())
    print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
This works if your procedure contains a SELECT statement. Given the following procedure:
create or replace procedure dataset.proc_output() BEGIN
SELECT t FROM UNNEST(['1','2','3']) t;
END;
Code:
from google.cloud import bigquery

client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
    print(result)
will output:
Row(('1',), {'t': 0})
Row(('2',), {'t': 0})
Row(('3',), {'t': 0})
However, if a procedure contains multiple SELECT statements, only the last result set can be fetched this way.
Update: see the example below.
CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
IF saleExists = 0 THEN
INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
END IF;
END;
BEGIN
DECLARE saleExists INT64;
CALL zyun.exists("US", DATE "2019-2-1", saleExists);
SELECT saleExists;
END
BTW, your example is much better served with a single MERGE statement instead of a script.
I'm trying to write a simple pandas script that executes a query against SQL Server with a WHERE clause. However, the query doesn't return any rows, possibly because the parameter is not being passed. I thought we could pass key-value pairs as below. Can you please point out what I am doing wrong here?
Posting just the query and relevant pieces. All the libraries have been imported as needed.
curr_sales_month = '2015-08-01'
sql_query = """SELECT sale_month,region,oem,nameplate,Model,Segment,Sales FROM [MONTHLY_SALES] WHERE Sale_Month = %(salesmonth)s"""
print ("Executed SQL Extract", sql_query)
df = pd.read_sql_query(sql_query,conn,params={"salesmonth":curr_sales_month})
The program returned with:
Closed Connection - Fetched 0 rows for Report
Process finished with exit code 0
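Further to my comment: if you are connecting through pyodbc, note that it uses the qmark (?) parameter style, so a named placeholder such as %(salesmonth)s is not supported. Here is an example that uses pyodbc to communicate with SQL Server and demonstrates passing a variable.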
import pandas as pd
import pyodbc
pd.set_option('display.max_columns',50)
pd.set_option('display.width',5000)
conn_str = r"DRIVER={0};SERVER={1};DATABASE={2};UID={3};PWD={4}".format("SQL Server",'.','master','user','pwd')
cnxn = pyodbc.connect(conn_str)
sql_statement = "SELECT * FROM sys.databases WHERE database_id = ?"
df = pd.read_sql_query(sql = sql_statement, con = cnxn, params = [2])
cnxn.close()
print(df.iloc[:, 0:2].head())
which produces:
name database_id
0 tempdb 2
And if you wish to pass multiple parameters (opening a new connection, since the previous one was closed):
cnxn = pyodbc.connect(conn_str)
sql_statement = "SELECT * FROM sys.databases WHERE database_id > ? and database_id < ?"
df = pd.read_sql_query(sql = sql_statement, con = cnxn, params = [2,5])
cnxn.close()
print(df.iloc[:, 0:2].head())
which produces:
name database_id
0 model 3
1 msdb 4
My preferred way, with dynamic inline SQL statements:
create_date = '2015-01-01'
name = 'mod'
sql_statement_template = r"""SELECT * FROM sys.databases WHERE database_id > {0} AND database_id < {1} AND create_date > '{2}' AND name LIKE '{3}%'"""
sql_statement = sql_statement_template.format('2','5',create_date,name)
print(sql_statement)
yields
SELECT * FROM sys.databases WHERE database_id > 2 AND database_id < 5 AND create_date > '2015-01-01' AND name LIKE 'mod%'
A further benefit of printing it out is that you can copy and paste the SQL command into Management Studio (or equivalent) and test your SQL syntax easily.
And the result should be:
name database_id
0 model 3
So this example demonstrates handling date, string and int data types, including a LIKE with the wildcard %.
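For comparison, the same filter can be written with placeholders so the driver handles quoting and types. pyodbc uses `?` (qmark style), which is also what Python's built-in `sqlite3` module uses, so this sketch runs against an in-memory SQLite table standing in for `sys.databases`; the table name and its rows are made up for illustration. Note that the `%` wildcard goes into the parameter value, not into the SQL text.

```python
import sqlite3

# Stand-in for sys.databases: a tiny in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databases (name TEXT, database_id INTEGER, create_date TEXT)")
conn.executemany(
    "INSERT INTO databases VALUES (?, ?, ?)",
    [("tempdb", 2, "2015-06-01"), ("model", 3, "2015-06-01"), ("msdb", 4, "2014-02-20")],
)

create_date = "2015-01-01"
name = "mod"

# Every value is bound through a ? placeholder; nothing is formatted into the SQL string.
sql_statement = (
    "SELECT name, database_id FROM databases "
    "WHERE database_id > ? AND database_id < ? AND create_date > ? AND name LIKE ?"
)
params = (2, 5, create_date, name + "%")  # the LIKE wildcard is part of the value

rows = conn.execute(sql_statement, params).fetchall()
print(rows)  # [('model', 3)]
conn.close()
```

Against SQL Server, the same `sql_statement` (with the table name changed back to `sys.databases`) and `params` could be passed as `pd.read_sql_query(sql_statement, cnxn, params=params)`.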
How do I generate queries like the following?
UPDATE DUA_DATA_FIL_AUD
SET REV = :rev,
SYS_UPDT_TS = :now
WHERE DUA_DATA_FIL_ID = 283
AND REV = 2524;
The same should happen for each of the next 13 records, updating all the corresponding columns.
Generate the UPDATE statements dynamically with a SELECT on the table you want to update, so that each row of the result set becomes one UPDATE statement.
This gives you a script that you can copy and run in your command window to update the desired rows.
Hope this addresses what you want:
SELECT 'UPDATE DUA_DATA_FIL_AUD SET REV = :rev, SYS_UPDT_TS = :now WHERE DUA_DATA_FIL_ID = '
       || DUA_DATA_FIL_ID || ' AND REV = ' || MAX(REV) || ';/'
FROM DUA_DATA_FIL_AUD GROUP BY DUA_DATA_FIL_ID
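If the updates are run from Python rather than by pasting a generated script, the same idea (one UPDATE per DUA_DATA_FIL_ID, targeting its highest REV) can be done with a bound-parameter `executemany`. This is a swapped-in technique, sketched with the standard-library `sqlite3` module and made-up rows; `new_rev` and `now` are hypothetical stand-ins for the `:rev` and `:now` bind values in the question. SQLite happens to accept the same `:name` placeholder syntax.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DUA_DATA_FIL_AUD (DUA_DATA_FIL_ID INTEGER, REV INTEGER, SYS_UPDT_TS TEXT)")
# Made-up audit rows: two revisions for file 283, one for file 284.
conn.executemany(
    "INSERT INTO DUA_DATA_FIL_AUD VALUES (?, ?, NULL)",
    [(283, 2523), (283, 2524), (284, 2600)],
)

new_rev = 9999  # hypothetical value for :rev
now = datetime(2020, 1, 1).isoformat(sep=" ")  # hypothetical value for :now

# One (id, highest REV) pair per file.
targets = conn.execute(
    "SELECT DUA_DATA_FIL_ID, MAX(REV) FROM DUA_DATA_FIL_AUD GROUP BY DUA_DATA_FIL_ID"
).fetchall()

# Run the UPDATE once per pair, binding the named placeholders for each row.
conn.executemany(
    "UPDATE DUA_DATA_FIL_AUD SET REV = :rev, SYS_UPDT_TS = :now "
    "WHERE DUA_DATA_FIL_ID = :id AND REV = :old_rev",
    [{"rev": new_rev, "now": now, "id": fid, "old_rev": rev} for fid, rev in targets],
)
conn.commit()

print(conn.execute("SELECT * FROM DUA_DATA_FIL_AUD ORDER BY DUA_DATA_FIL_ID, REV").fetchall())
# [(283, 2523, None), (283, 9999, '2020-01-01 00:00:00'), (284, 9999, '2020-01-01 00:00:00')]
```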
I have an SQL table with member IDs.
Each member's insurance claims are listed in the table.
One column lists insurance IDs in relation to the member ID.
For example:
| Member ID   | Insurance Code |
|-------------|----------------|
| C#$###!1231 | 67             |
Now the issue is that the same member could have had different insurance codes as part of previous claims.
For example, between 2010 and 2011 the member had a claim with code 67,
but for 2011-2012 the member had a claim with code 3.
When I run my SQL query, I only get one value for the claim code. How can I get all the values, say 67, 3, and 110, so that I can respond to all the claims the member has been part of?
// SQL QUERY to gather member information.
DMS.ADOQuery1.SQL.Clear;
DMS.ADOQuery1.SQL.Add('select HPFROMDT, HPthruDt, MEMBERKEY, MEMBHPKEY, OPFROMDT, OPTHRUDT, HPCODEKEY' +
' from MEMBHP' +
' where MEMBERKEY = ''' + MembKey + ''' and OPTHRUDT >= ''' + init_date + ''' and OPFROMDT <= ''' + final_date +''' and OPFROMDT < OPTHRUDT'
);
// Showmessage(DMS.ADOQuery1.SQL[0]);
DMS.ADOQuery1.Open;
// Adding the query values to the appropriate variables.
HPCodeKey := (DMS.ADOQuery1.FieldByNAme('HPCODEKEY').AsString);
DMS.ADOQuery1.Close;
Iterate over all the records returned by the query, not only the first one.
Also check the date filters in the SQL statement (the WHERE condition).
// SQL QUERY to gather member information.
DMS.ADOQuery1.SQL.Clear;
DMS.ADOQuery1.SQL.Add('select HPFROMDT, HPthruDt, MEMBERKEY, MEMBHPKEY, OPFROMDT, OPTHRUDT, HPCODEKEY' +
' from MEMBHP' +
' where MEMBERKEY = ''' + MembKey + ''' and OPTHRUDT >= ''' + init_date + ''' and OPFROMDT <= ''' + final_date +''' and OPFROMDT < OPTHRUDT'
);
// Showmessage(DMS.ADOQuery1.SQL[0]);
DMS.ADOQuery1.Open;
while not DMS.ADOQuery1.Eof do
begin
// Adding the query values to the appropriate variables.
Showmessage(DMS.ADOQuery1.FieldByNAme('HPCODEKEY').AsString);
DMS.ADOQuery1.Next;
end;
DMS.ADOQuery1.Close;
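The same fix (read every row instead of only the first) looks like this in Python, sketched with the standard-library `sqlite3` module. The table is trimmed to the columns the question filters on, the rows and the member key `M1` are made up, and the values are bound as parameters instead of being concatenated into the SQL string. The date filter mirrors the question: a plan row counts if its OPFROMDT..OPTHRUDT range overlaps the reporting window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBHP (MEMBERKEY TEXT, OPFROMDT TEXT, OPTHRUDT TEXT, HPCODEKEY TEXT)")
# Made-up claim history: one member with three plan codes over time.
conn.executemany(
    "INSERT INTO MEMBHP VALUES (?, ?, ?, ?)",
    [
        ("M1", "2010-01-01", "2011-01-01", "67"),
        ("M1", "2011-01-01", "2012-01-01", "3"),
        ("M1", "2012-01-01", "2013-01-01", "110"),
    ],
)

memb_key, init_date, final_date = "M1", "2010-01-01", "2013-01-01"
cursor = conn.execute(
    "SELECT HPCODEKEY FROM MEMBHP "
    "WHERE MEMBERKEY = ? AND OPTHRUDT >= ? AND OPFROMDT <= ? AND OPFROMDT < OPTHRUDT "
    "ORDER BY OPFROMDT",
    (memb_key, init_date, final_date),
)

# Walk the whole result set, the equivalent of the while not Eof ... Next loop.
hp_codes = [row[0] for row in cursor]
print(hp_codes)  # ['67', '3', '110']
conn.close()
```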