I am using Python to extract data from a SQL database, using ODBC to link Python to the database. When I run the query, I need to use variables in it so that the result can change. For example, my code is:
import pyodbc
import pandas as pd
myConnect = pyodbc.connect('DSN=B1P HANA;UID=***;PWD=***')
myCursor = myConnect.cursor()
Start = 20180501
End = 20180501
myOffice = pd.Series([1,2,3])
myRow = myCursor.execute("""
SELECT "CALDAY" AS "Date",
"/BIC/ZSALE_OFF" AS "Office"
FROM "SAPB1P"."/BIC/AZ_RT_A212"
WHERE "CALDAY" BETWEEN 20180501 AND 20180501
GROUP BY "CALDAY","/BIC/ZSALE_OFF"
""")
Result = myRow.fetchall()
# Build the data frame in one step (DataFrame.append was removed in pandas 2.0)
d = pd.DataFrame(
    [{'Date': i.Date, 'Office': i.Office} for i in Result],
    columns=['Date', 'Office'])
You can see that I retrieve data from SQL database and save it into a list (Result), then I convert this list to a data frame (d).
But, my problems are:
I need to specify a start date and an end date in the myCursor.execute part, something like "CALDAY" BETWEEN Start AND End
Let's say I have 100 offices in my data. Now I just need 3 of them (myOffice). So, I need to put a condition in myCursor.execute part, like myOffice in (1,2,3)
In R, I know how to deal with these two problems. The code looks like this:
office_clause = ""
if (myOffice != 0) {
office_clause = paste(
'AND "/BIC/ZSALE_OFF" IN (',paste(myOffice, collapse=", "),')'
)
}
a <- sqlQuery(ch,paste(' SELECT ***
FROM ***
WHERE "CALDAY" BETWEEN',Start,'AND',End,'
',office_clause,'
GROUP BY ***
'))
But I do not know how to do this in Python. How can I do this?
You can use string formatting operations for this.
First define
query = """
SELECT
"CALDAY" AS "Date",
"/BIC/ZSALE_OFF" AS "Office"
FROM
"SAPB1P"."/BIC/AZ_RT_A212"
WHERE
"CALDAY" BETWEEN {start} AND {end}
{other_conds}
GROUP BY
"CALDAY","/BIC/ZSALE_OFF"
"""
Now you can use
myRow = myCursor.execute(query.format(
    start='20180501',
    end='20180501',
    other_conds=''))
and
myRow = myCursor.execute(query.format(
    start='20180501',
    end='20180501',
    other_conds='AND "/BIC/ZSALE_OFF" IN (1,2,3)'))
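Because str.format pastes values straight into the SQL text, a common alternative is to let the driver bind them. The following is a minimal sketch, assuming pyodbc's `?` placeholders; `build_query` is a hypothetical helper (not part of pyodbc), and the table and column names are copied from the question.

```python
BASE_QUERY = """
SELECT
    "CALDAY" AS "Date",
    "/BIC/ZSALE_OFF" AS "Office"
FROM
    "SAPB1P"."/BIC/AZ_RT_A212"
WHERE
    "CALDAY" BETWEEN ? AND ?
    {office_clause}
GROUP BY
    "CALDAY", "/BIC/ZSALE_OFF"
"""


def build_query(start, end, offices=()):
    """Return (sql, params); hypothetical helper for illustration only."""
    params = [start, end]
    office_clause = ""
    offices = list(offices)
    if offices:
        # One "?" per office; the values travel separately in params
        placeholders = ", ".join("?" for _ in offices)
        office_clause = 'AND "/BIC/ZSALE_OFF" IN ({})'.format(placeholders)
        params.extend(offices)
    return BASE_QUERY.format(office_clause=office_clause), params


sql, params = build_query(20180501, 20180501, [1, 2, 3])
sql_all_offices, params_all_offices = build_query(20180501, 20180501)
```

The result would then be passed as myCursor.execute(sql, params). If the offices come from a pandas Series such as myOffice, converting with myOffice.tolist() first gives plain Python integers.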
I am executing a SQL query from a Python script to retrieve data from Snowflake on Windows 10, but the result is missing the column names; they are replaced by 0, 1, 2, 3 and so on. Running the same query in the Snowflake interface and downloading the CSV does include the column names in the file. I am passing the column names as aliases in my query.
Below is the code:
import pandas as pd

def _CONSUMPTION(con):
    data2 = con.cursor().execute("""select sd.sales_force_lvl_1_code "Plan-To Code",sd.sales_force_lvl_1_desc "Plan-To Description",pd.matl_code "Product Code",pd.matl_desc "Product Description",pd.ean_upc_code "UPC",dd.fiscal_week_desc "Fiscal Week Description",f.unit_sales_qty "Sales Units",f.incr_units_qty "Incremental Units"
from DW.consumption_fact1 f, DW.market_dim md, DW.matl_dim pd, DW.fiscal_week_dim dd, (select sales_force_lvl_1_code,max(sales_force_lvl_1_desc) sales_force_lvl_1_desc from DW.mv_us_sales_force_dim group by sales_force_lvl_1_code) sd
where dd.fiscal_week_key = f.fiscal_week_key
and pd.matl_key = f.matl_key
and md.market_key = f.market_key
and sd.sales_force_lvl_1_code = md.curr_sales_force_lvl_1_code
and dd.fiscal_week_key between (select curr_fy_week_key-6 from DW.curr_date_lkp) and (select curr_fy_week_key-1 from DW.curr_date_lkp)
and f.company_key = 6006
and (f.unit_sales_qty <> 0 and f.sales_amt <> 0)
and md.curr_sales_force_lvl_1_code is not null
UNION
select '5000016240' "Plan-To Code", 'AWG TOTAL' "Plan-To Description",pd.matl_code "Product Code",pd.matl_desc "Product Description",pd.ean_upc_code "UPC",dd.fiscal_week_desc "Fiscal Week Description",f.unit_sales_qty "Sales Units",f.incr_units_qty "Incremental Units"
from DW.consumption_fact1 f, DW.market_dim md, DW.matl_dim pd, DW.fiscal_week_dim dd
where dd.fiscal_week_key = f.fiscal_week_key
and pd.matl_key = f.matl_key
and md.market_key = f.market_key
and dd.fiscal_week_key between (select curr_fy_week_key-6 from DW.curr_date_lkp) and (select curr_fy_week_key-1 from DW.curr_date_lkp)
and f.company_key = 6006
and (f.unit_sales_qty <> 0 and f.sales_amt <> 0)
and md.market_code = '20267'""").fetchall()
    df = pd.DataFrame(data2)
    df.head(5)
    df.to_csv('CONSUMPTION.csv', index=False)
Looking at the docs, the easiest way seems to be the cursor method .fetch_pandas_all():
query = "SELECT 1 a, 2 b, 'a' c UNION ALL SELECT 7,4,'snow'"
cur = connection.cursor()
cur.execute(query).fetch_pandas_all()
Or if you want to dump the results into a CSV, just do so as in the question:
query = "SELECT 1 a, 2 b, 'a' c UNION ALL SELECT 7,4,'snow'"
cur = connection.cursor()
df = cur.execute(query).fetch_pandas_all()
df.to_csv('x.csv', index = False)
It looks like you haven't set the column names when building the data frame.
My recommendation is to set them first, via df.columns.
In addition, refer to the Snowflake documentation for details:
https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
Try this
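import pandas as pd

def fetch_pandas_old(cur, sql):
    cur.execute(sql)
    # cur.description holds one tuple per column; the first item is the name
    col_names = [col[0] for col in cur.description]
    rows = 0
    while True:
        dat = cur.fetchmany(50000)
        if not dat:
            break
        df = pd.DataFrame(dat, columns=col_names)
        rows += df.shape[0]
    print(rows)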
A nice way to extract the column headings from the cursor description and save in a pandas df using the Snowflake connector (also works for psycopg2 btw) is as follows:
import snowflake.connector
import pandas as pd

# Create the connection
def connect_snowflake(uname, pword, acct, role_name, whouse, dbase, schema_name):
    conn = snowflake.connector.connect(
        user=uname,
        password=pword,
        account=acct,
        role=role_name,
        warehouse=whouse,
        database=dbase,
        schema=schema_name
    )
    cur = conn.cursor()
    return conn, cur
Then execute your query. The cur.description object returns a list of tuples, the first of each being the column name :)
conn, cur = connect_snowflake(username, password, account_name, role, warehouse, database, schema)
cur.execute('select * from my_schema.my_table')
result = cur.fetchall()
# Extract the column names
col_names = []
for elt in cur.description:
    col_names.append(elt[0])
df = pd.DataFrame(result, columns=col_names)
cur.close()
conn.close()
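The same cursor.description idea can be tried without a Snowflake account. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in driver (an assumption for illustration; the aliases are borrowed from the question) and writes the header row with the csv module instead of pandas.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Column aliases in the query become the names reported by cur.description
cur.execute('''SELECT 1 AS "Plan-To Code", 'AWG TOTAL' AS "Plan-To Description"''')
rows = cur.fetchall()

# Each description entry is a tuple whose first item is the column name
col_names = [col[0] for col in cur.description]

# Write the header first, then the data rows
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(col_names)
writer.writerows(rows)
csv_text = buffer.getvalue()

cur.close()
conn.close()
```

Here an in-memory buffer stands in for the CSV file; opening a real file with open('CONSUMPTION.csv', 'w', newline='') would work the same way.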
I'm currently building a feature that requires me to loop over a hash and, for each key in the hash, dynamically modify an SQL query.
The actual SQL query should look something like this:
select * from space_dates d
inner join space_prices p on p.space_date_id = d.id
where d.space_id = ?
and d.date between ? and ?
and (
(p.price_type = 'monthly' and p.price_cents <> 9360) or
(p.price_type = 'daily' and p.price_cents <> 66198) or
(p.price_type = 'hourly' and p.price_cents <> 66198) # This part should be added in dynamically
)
The last AND group has to be added dynamically. As you can see, only one of the conditions needs to be true, not all of them.
query = space.dates
  .joins(:price)
  .where('date between ? and ?', start_date, end_date)
# We are looping over the rails enum (hash) and getting the key for each key value pair, alongside the index
SpacePrice.price_types.each_with_index do |(price_type, _), index|
  amount_cents = space.send("#{price_type}_price").price_cents
  query = if index.positive? # It's not the first item so we want to chain it as an 'OR'
    query.or(
      space.dates
        .joins(:price)
        .where('space_prices.price_type = ?', price_type)
        .where('space_prices.price_cents <> ?', amount_cents)
    )
  else
    query # It's the first item, chain it as an and
      .where('space_prices.price_type = ?', price_type)
      .where('space_prices.price_cents <> ?', amount_cents)
  end
end
The output of this in rails is:
SELECT "space_dates".* FROM "space_dates"
INNER JOIN "space_prices" ON "space_prices"."space_date_id" = "space_dates"."id"
WHERE "space_dates"."space_id" = $1 AND (
(
(date between '2020-06-11' and '2020-06-11') AND
(space_prices.price_type = 'hourly') AND (space_prices.price_cents <> 9360) OR
(space_prices.price_type = 'daily') AND (space_prices.price_cents <> 66198)) OR
(space_prices.price_type = 'monthly') AND (space_prices.price_cents <> 5500)
) LIMIT $2
Which isn't as expected. I need to wrap the last few lines in another set of round brackets in order to produce the same output. I'm not sure how to go about this using ActiveRecord.
It's not possible for me to use find_by_sql since this would be dynamically generated SQL too.
So, I managed to solve this in about an hour using Arel with rails
dt = SpaceDate.arel_table
pt = SpacePrice.arel_table
combined_clauses = SpacePrice.price_types.map do |price_type, _|
  amount_cents = space.send("#{price_type}_price").price_cents
  pt[:price_type]
    .eq(price_type)
    .and(pt[:price_cents].not_eq(amount_cents))
end.reduce(&:or)
space.dates
  .joins(:price)
  .where(dt[:date].between(start_date..end_date).and(combined_clauses))
And the SQL output is:
SELECT "space_dates".* FROM "space_dates"
INNER JOIN "space_prices" ON "space_prices"."space_date_id" = "space_dates"."id"
WHERE "space_dates"."space_id" = $1
AND "space_dates"."date" BETWEEN '2020-06-11' AND '2020-06-15'
AND (
("space_prices"."price_type" = 'hourly'
AND "space_prices"."price_cents" != 9360
OR "space_prices"."price_type" = 'daily'
AND "space_prices"."price_cents" != 66198)
OR "space_prices"."price_type" = 'monthly'
AND "space_prices"."price_cents" != 5500
) LIMIT $2
What I ended up doing was:
Creating an array of clauses based on the enum key and the price_cents
Reducing the clauses by joining them with or
Adding the combined_clauses to the main query with an and operator
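The same three steps (build one clause per price type, join them with OR, attach the group to the main query with AND) can be sketched outside Rails. This is a rough Python illustration using the built-in sqlite3 module; `build_price_filter`, the simplified single-table schema, the `day` column, and the sample rows are all hypothetical, and only the price values come from the question.

```python
import sqlite3


def build_price_filter(prices):
    """Return a parenthesized OR group plus its bind values."""
    clauses = []
    params = []
    for price_type, cents in prices.items():
        # Step 1: one (type AND price differs) clause per price type
        clauses.append("(price_type = ? AND price_cents <> ?)")
        params.extend([price_type, cents])
    # Step 2: join with OR and wrap the whole group in parentheses
    return "(" + " OR ".join(clauses) + ")", params


prices = {"hourly": 9360, "daily": 66198, "monthly": 5500}
fragment, params = build_price_filter(prices)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE space_prices "
    "(id INTEGER, day TEXT, price_type TEXT, price_cents INTEGER)"
)
conn.executemany("INSERT INTO space_prices VALUES (?, ?, ?, ?)", [
    (1, "2020-06-11", "hourly", 9360),   # price matches, so it is filtered out
    (2, "2020-06-12", "daily", 70000),   # price differs and date is in range
    (3, "2020-07-01", "daily", 70000),   # price differs but date is out of range
    (4, "2020-06-13", "monthly", 5500),  # price matches, so it is filtered out
])

# Step 3: attach the OR group to the date condition with AND
sql = ("SELECT id FROM space_prices WHERE day BETWEEN ? AND ? AND "
       + fragment + " ORDER BY id")
rows = conn.execute(sql, ["2020-06-11", "2020-06-15", *params]).fetchall()
matching_ids = [row[0] for row in rows]
conn.close()
```

Because the date condition sits outside the parenthesized group, it applies to every price type rather than only to the first one.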
I wish to convert the following MS Access SQL statement to SQL Server. All of my attempts produce results that differ from the original data.
SELECT
Sum(ADA_LAST.MA) AS MA,
Sum(ADA_LAST.DA) AS DA,
ADA_LAST.ID_BAS,
ADA_LAST.PRO_NUMBER,
ADA_LAST.ACC_NUMBER,
ADA_LAST.DATA,
"" AS Q,
"" AS P,
Last(ADA_LAST.Date) AS [DATE],
"" AS UNIT,
0 AS ID,
[MA]-[DA] AS R
FROM ADA_LAST
GROUP BY
ADA_LAST.ID_BAS,
ADA_LAST.PRO_NUMBER,
ADA_LAST.ACC_NUMBER,
ADA_LAST.DATA,
"",
0,
[MA]-[DA],
"",
""
;
The new Query is:
SELECT
MA = Sum([ADA_LAST].[MA]),
DA = Sum([ADA_LAST].[DA]),
[ADA_LAST].[ID_BAS],
[ADA_LAST].[PRO_NUMBER],
[ADA_LAST].[ACC_NUMBER],
[ADA_LAST].[DATA],
Q = '',
P = '',
[DATE] = ADA_LAST.[Date],
UNIT = '',
ID = 0,
Sum([ADA_LAST].[MA]) - Sum([ADA_LAST].[DA]) AS R
FROM [ADA_LAST](#PRO_NAME,#SDAY)
GROUP BY
[ADA_LAST].[ACC_NUMBER],
[ADA_LAST].[Date],
[ADA_LAST].[PRO_NUMBER],
[ADA_LAST].[ID_BAS],
[ADA_LAST].[DATA]
The problem is caused by grouping on the date column in the new statement. In the old one, the date is wrapped in the Last function, so it is not grouped but still appears in the select list. How can I do the same in SQL Server?
You can try the below query. Changes to the original:
empty strings are written as '' instead of ""
I replaced LAST with MAX(); this is likely to do what you want, since you are using aggregation
constant columns do not need to be listed in the GROUP BY clause
Code:
SELECT
SUM(ADA_LAST.MA) AS MA,
SUM(ADA_LAST.DA) AS DA,
ADA_LAST.ID_BAS,
ADA_LAST.PRO_NUMBER,
ADA_LAST.ACC_NUMBER,
ADA_LAST.DATA,
'' AS Q,
'' AS P,
MAX(ADA_LAST.Date) AS [DATE],
'' AS UNIT,
0 AS ID,
[MA] - [DA] AS R
FROM ADA_LAST
GROUP BY
ADA_LAST.ID_BAS,
ADA_LAST.PRO_NUMBER,
ADA_LAST.ACC_NUMBER,
ADA_LAST.DATA,
[MA] - [DA]
;
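To see how MAX() keeps the date out of the GROUP BY clause, here is a small sketch that runs on Python's built-in sqlite3 module rather than SQL Server. The sample rows and the TotalMA, TotalDA and LastDate aliases are made up for illustration, and R is computed from the two sums, as in the question's new query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ADA_LAST (ID_BAS INTEGER, MA REAL, DA REAL, Date TEXT)")
conn.executemany("INSERT INTO ADA_LAST VALUES (?, ?, ?, ?)", [
    (1, 100.0, 40.0, "2020-01-05"),
    (1, 50.0, 10.0, "2020-02-10"),
    (2, 30.0, 30.0, "2020-01-20"),
])

# The date is aggregated with MAX(), so it does not appear in GROUP BY
result = conn.execute("""
    SELECT
        ID_BAS,
        SUM(MA) AS TotalMA,
        SUM(DA) AS TotalDA,
        MAX(Date) AS LastDate,
        SUM(MA) - SUM(DA) AS R
    FROM ADA_LAST
    GROUP BY ID_BAS
    ORDER BY ID_BAS
""").fetchall()
conn.close()
```

Each ID_BAS yields a single row carrying its latest date. Note that MAX() returns the greatest date, which matches Access's LAST only when the last stored record is also the most recent one.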
I am applying a mask to data and believe the best way is to use a Case Statement. However, I need the case statement to run a sub query. When I pull data, it will either be a number or appear as 99999999999v999b:99999999999v999-
Using
TO_NUMBER(REGEXP_REPLACE(RD.subm_quantity, '^(\d+)(-)?$', '\2\1'))/1000 as "Submitted_Quantity"
This will convert it to a number. So if 00000000100000 is present, it will convert to 100
However, I need a case to not divide when not needed. To determine if I need to divide, I need to add a rule in the below sql:
if the result is 99999999999v999b:99999999999v999-, apply the conversion;
if not, just output RD.subm_quantity.
How can I get a case statement to run a query?
Running in TOAD for Oracle:
select m.mask
FROM Valiuser.ivd_mapping m,
Valiuser.ivd_mappingset s,
Valiuser.ivd_mapping_record r,
Valiuser.ivd_transaction_file tf,
VALIUSER.ivd_transaction_record_details RD
WHERE s.mappingset_id = r.mappingset_id
AND r.mapping_record_id = m.mapping_record_ID
AND m.repository_column_id = '34'
AND s.mappingset_id = tf.MAPPINGSET_ID
AND rd.file_id = tf.file_id
AND rd.TRANSACTION_RECORD_ID =
If the mask and the original subm_quantity are both available from the query you showed, which seems to be the case since it includes the rd table you're referencing in the conversion, then I think you want something like this:
case when m.mask = '99999999999v999b:99999999999v999-'
then TO_NUMBER(REGEXP_REPLACE(rd.subm_quantity, '^(\d+)(-)?$', '\2\1')) / 1000
else rd.subm_quantity
end as "Submitted_Quantity"
rather than a subquery. So plugged into your current query that would make it:
SELECT
case when m.mask = '99999999999v999b:99999999999v999-'
then TO_NUMBER(REGEXP_REPLACE(rd.subm_quantity, '^(\d+)(-)?$', '\2\1')) / 1000
else rd.subm_quantity
end as "Submitted_Quantity"
FROM
Valiuser.ivd_mapping m,
Valiuser.ivd_mappingset s,
Valiuser.ivd_mapping_record r,
Valiuser.ivd_transaction_file tf,
Valiuser.ivd_transaction_record_details rd
WHERE
s.mappingset_id = r.mappingset_id
AND r.mapping_record_id = m.mapping_record_ID
AND m.repository_column_id = '34'
AND s.mappingset_id = tf.mappingset_id
AND rd.file_id = tf.file_id
AND rd.Transaction_Record_Id = <?>
or with modern join syntax instead of the old version, something like:
SELECT
case when m.mask = '99999999999v999b:99999999999v999-'
then TO_NUMBER(REGEXP_REPLACE(rd.subm_quantity, '^(\d+)(-)?$', '\2\1')) / 1000
else rd.subm_quantity
end as "Submitted_Quantity"
FROM Valiuser.ivd_mapping m
JOIN Valiuser.ivd_mapping_record r ON r.mapping_record_id = m.mapping_record_ID
JOIN Valiuser.ivd_mappingset s ON s.mappingset_id = r.mappingset_id
JOIN Valiuser.ivd_transaction_file tf ON tf.mappingset_id = s.mappingset_id
JOIN Valiuser.ivd_transaction_record_details rd ON rd.file_id = tf.file_id
WHERE m.repository_column_id = '34'
AND rd.transaction_record_id = <?>
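For checking the rule on a few sample values outside the database, the same logic can be sketched in Python. This is only an illustration: `submitted_quantity` and `IMPLIED_DECIMAL_MASK` are hypothetical names, and Python's re module stands in for Oracle's REGEXP_REPLACE.

```python
import re

# Mask that signals three implied decimals and an optional trailing minus
IMPLIED_DECIMAL_MASK = "99999999999v999b:99999999999v999-"


def submitted_quantity(raw, mask):
    """Convert raw only when the mask calls for it; otherwise return it as is."""
    if mask != IMPLIED_DECIMAL_MASK:
        return raw
    # Move an optional trailing "-" to the front, as '\2\1' does in the SQL
    signed = re.sub(r"^(\d+)(-)?$", r"\2\1", raw)
    return int(signed) / 1000


positive = submitted_quantity("00000000100000", IMPLIED_DECIMAL_MASK)
negative = submitted_quantity("00000000100000-", IMPLIED_DECIMAL_MASK)
untouched = submitted_quantity("42", "some other mask")
```

The first call gives 100.0, matching the example in the question; the value with a trailing minus becomes -100.0, and a value with any other mask is returned unchanged.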
I have a query like this:
SELECT prodId, prod_name , prod_type FROM mytable WHERE prod_type in (:list_prod_names)
I want to get the information for a product depending on its type. The possible types are: "day", "week", "weekend", "month". Depending on the date, it might be just one of those options or a combination of them.
This info (a list) is returned by the function prodnames_for_date(search_date).
I am using cx_oracle bindings with code like:
def get_prod_by_type(search_date: datetime):
    query_path = r'./queries/prod_by_name.sql'
    with open(query_path) as f:
        # Collapse newlines, tabs and repeated spaces into single spaces
        raw_query = ' '.join(f.read().split())
    print(raw_query)
    # Depending on the date the product types may be different
    prod_names = prodnames_for_date(search_date)  # This returns a list with possible names
    qry_params = {"list_prod_names": prod_names}  # See attempts below
    df = None
    try:
        db = DB(username='username', password='pss', hostname="localhost")
        df = db.get(raw_query, qry_params)
    except Exception:
        exception_error = traceback.format_exc()
        exception_error = 'Exception on DB.get_short_cov_op2() : %s' % exception_error
        print(exception_error)
    return df
For this: qry_params = {"list_prod_names": prod_names} I have tried multiple different things such as:
prod_names = ''.join(prod_names)
prod_names = str(prod_names)
prod_names = " \'"+''.join(prod_names)+"\'"
The only way I have managed to get it to work is:
new_query = raw_query.format(list_prod_names=prodnames_for_date(search_date)).replace('[', '').replace(']','')
df = db.query(new_query)
I am trying not to use .format() because formatting values into a SQL statement is bad practice and leaves the query open to injection attacks.
db.py contains among other functions:
def get(self, sql, params=None):
    cur = self.con.cursor()
    cur.prepare(sql)
    df = None
    try:
        cur.execute(sql, **(params or {}))
        df = pd.DataFrame(cur.fetchall(), columns=[c[0] for c in cur.description])
        df.columns = df.columns.map(lambda x: x.upper())
    except Exception:
        exception_error = traceback.format_exc()
        exception_error = 'Exception on DB.get() : %s' % exception_error
        print(exception_error)
        self.con.rollback()
    cur.close()
    return df
I would like to be able to do a type binding.
I am using:
python = 3.6
cx_oracle = 6.3.1
I have read the following articles but I am still unable to find a solution:
Python cx_Oracle bind variables
Python cx_Oracle SQL with bind string variable
Search for name in cx_Oracle
Unfortunately you cannot bind an array directly unless you convert it to a SQL type and use a subquery -- which is fairly complex. So instead you need to do something like this:
inClauseParts = []
for i, inValue in enumerate(ARRAY_VALUE):
    argName = "arg_" + str(i + 1)
    inClauseParts.append(":" + argName)
clause = "%s in (%s)" % (columnName, ",".join(inClauseParts))
This works fine, but be aware that if the number of elements in the array changes regularly, this technique creates a separate statement that must be parsed for each distinct number of elements. If you know that (in general) you won't have more than (for example) 10 elements in the array, it would be better to append None to the incoming array so that the number of elements is always 10.
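The padding idea can be folded into the same loop. The sketch below is one possible way to do it, not an official cx_Oracle recipe: `build_in_clause` and its `pad_to` parameter are hypothetical names, and the function returns the bind values as a dict alongside the clause.

```python
def build_in_clause(column_name, values, pad_to=None):
    """Return (clause, params) with one named bind variable per value."""
    values = list(values)
    if pad_to is not None and len(values) < pad_to:
        # Pad with None so the statement text stays the same for short lists
        values += [None] * (pad_to - len(values))
    arg_names = ["arg_%d" % (i + 1) for i in range(len(values))]
    placeholders = ",".join(":" + name for name in arg_names)
    clause = "%s in (%s)" % (column_name, placeholders)
    params = dict(zip(arg_names, values))
    return clause, params


clause, params = build_in_clause("prod_type", ["day", "week"], pad_to=4)
sql = "SELECT prodId, prod_name, prod_type FROM mytable WHERE " + clause
```

The resulting sql and params can then be handed to the cursor, for example cur.execute(sql, params). A NULL inside an IN list never matches anything, so the padding values do not change which rows are returned.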
Hopefully that is clear enough!
I have finally managed to do it. It might not be pretty, but it works.
I have modified my sql query to include an extra select which returns the value of my list of descriptors:
inner join (
SELECT regexp_substr(:my_list_of_items, '[^,]+', 1, LEVEL) as mylist
FROM dual
CONNECT BY LEVEL <= length(:my_list_of_items) - length(REPLACE(:my_list_of_items, ',', '')) + 1
) d
on d.mylist= a.corresponding_columns
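On the Python side, this approach needs the list passed as a single comma-separated string. A minimal sketch of that step follows; `to_bind_string` is a hypothetical helper, and the check for embedded commas is an added safeguard, since the subquery splits on commas.

```python
def to_bind_string(items):
    """Join items into one comma-separated string for a single bind variable."""
    items = [str(item) for item in items]
    for item in items:
        # An empty item or an embedded comma would confuse the regexp_substr split
        if not item or "," in item:
            raise ValueError("items must be non-empty and comma-free: %r" % item)
    return ",".join(items)


qry_params = {"my_list_of_items": to_bind_string(["day", "week", "weekend"])}
```

The dict can then be passed to the existing db.get(raw_query, qry_params) call, so the list still travels as a bind variable rather than being formatted into the SQL text.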