SQL repetitive code in where clause, how to insert the whole where into variable - sql

I write a lot of queries with the same WHERE clause. I wish I could define it once as a variable and insert it into each query.
My query:
select distinct order_external_status
from analytics.dwh_orders_details dod
where dod.merchant_id = 7797
and order_type = 'pre_live'
and order_date >= '2019-09-10' and order_date <= '2019-09-24';
Next query with the same WHERE:
select dod.order_id,
oc.*
from analytics.dwh_orders_details dod
left join analytics.dwh_oc_all_details oc
on dod.order_id = oc.order_id
where dod.merchant_id = 7797
and order_type = 'pre_live'
and order_date >= '2019-09-10' and order_date <= '2019-09-24';
I can have 10 to 15 queries like that in a day. It would be nice if I could put the WHERE clause in a variable and write it only once. For now we use Redshift; we will move to Snowflake soon, if it matters.
We are not allowed to create views or temp tables...

You can create a view and use the view in your queries:
create view v_myview as
select dod.*
from analytics.dwh_orders_details dod
where dod.merchant_id = 7797 and
dod.order_type = 'pre_live' and
dod.order_date >= '2019-09-10' and
dod.order_date <= '2019-09-24';
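If creating views is not an option, as the question states, a client-side alternative is to keep the shared filter in one string and append it to each query before sending it. The following is only a minimal sketch of that idea: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the table `orders_details`, its sample rows, and the names `SHARED_WHERE`, `SHARED_PARAMS` and `run` are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real warehouse (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_details ("
    "order_id INTEGER, merchant_id INTEGER, order_type TEXT, "
    "order_date TEXT, order_external_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders_details VALUES (?, ?, ?, ?, ?)",
    [
        (1, 7797, "pre_live", "2019-09-12", "shipped"),
        (2, 7797, "pre_live", "2019-09-20", "pending"),
        (3, 7797, "live", "2019-09-15", "shipped"),      # wrong order_type
        (4, 1234, "pre_live", "2019-09-15", "shipped"),  # wrong merchant
        (5, 7797, "pre_live", "2019-10-01", "shipped"),  # outside date range
    ],
)

# The shared filter is written once; its values are bound as named parameters.
SHARED_WHERE = (
    " WHERE dod.merchant_id = :merchant_id"
    " AND dod.order_type = :order_type"
    " AND dod.order_date BETWEEN :date_from AND :date_to"
)
SHARED_PARAMS = {
    "merchant_id": 7797,
    "order_type": "pre_live",
    "date_from": "2019-09-10",
    "date_to": "2019-09-24",
}


def run(select_part, tail=""):
    """Append the shared WHERE clause (plus an optional tail) and run the query."""
    return conn.execute(select_part + SHARED_WHERE + tail, SHARED_PARAMS).fetchall()


statuses = run(
    "SELECT DISTINCT dod.order_external_status FROM orders_details dod",
    " ORDER BY dod.order_external_status",
)
order_ids = run("SELECT dod.order_id FROM orders_details dod", " ORDER BY dod.order_id")
print(statuses)
print(order_ids)
```

Every query must use the alias `dod` for this to work, and the parameter placeholder style depends on the database driver, so the `:name` syntax shown here may need adjusting for a Redshift or Snowflake client.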

Related

Slow running query, Postgresql

I have a very slow query (30 minutes or more) that I think can be sped up with more efficient code. Below are the code and the resulting query plan. I am looking for ways to speed up this query, which performs several joins on large tables.
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
-- and
-- rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility, round(sum(cast(price
as numeric)),2) as cons_shad, round(sum(cast(totalprice as numeric)),2) as
total_shad, round(cast(price/totalprice as numeric),4) as per_shad from
totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility,
(price/totalprice)
order by per_shad desc
limit 5;
EXPLAIN select facility, percshad.pricedate, percshad.hour, per_shad,
minmcc.rtmcc, minnode.nodename, maxmcc.rtmcc, maxnode.nodename from percshad
join pjm.prices minmcc on
minmcc.pricedate = percshad.pricedate
and
minmcc.hour = percshad.hour
and
minmcc.rtmcc = (select min(rtmcc) from pjm.prices where pricedate =
percshad.pricedate and hour = percshad.hour)
join pjm.nodes minnode on
minnode.node_id = minmcc.node_id
join pjm.prices maxmcc on
maxmcc.pricedate = percshad.pricedate
and
maxmcc.hour = percshad.hour
and
maxmcc.rtmcc = (select max(rtmcc) from pjm.prices where pricedate =
percshad.pricedate and hour = percshad.hour)
join pjm.nodes maxnode on
maxnode.node_id = maxmcc.node_id
order by per_shad desc
limit 5
And here is the EXPLAIN output:
UPDATE: I have now simplified my code down to the following. But as can be seen from the EXPLAIN, it still takes forever to find the node_id in the last select statement:
drop table if exists totalshad;
create temporary table totalshad as
select pricedate, hour, sum(cast(price as numeric)) as totalprice from
pjm.rtcons
where
rtcons.pricedate >= '2017-12-01'
-- and
-- rtcons.pricedate <= '2018-01-23'
group by pricedate, hour
order by pricedate, hour;
-----------------------------
drop table if exists percshad;
create temporary table percshad as
select totalshad.pricedate, totalshad.hour, facility, round(sum(cast(price
as numeric)),2) as cons_shad, round(sum(cast(totalprice as numeric)),2) as
total_shad,
round(cast(price/totalprice as numeric),4) as per_shad from totalshad
join pjm.rtcons on
rtcons.pricedate = totalshad.pricedate
and
rtcons.hour = totalshad.hour
and
facility = 'ETOWANDA-NMESHOPP ETL 1057 A 115 KV'
where totalprice <> 0 and totalshad.pricedate > '2017-12-01'
group by totalshad.pricedate, totalshad.hour, facility, (price/totalprice)
order by per_shad desc
limit 5;
drop table if exists mincong;
create temporary table mincong as
select pricedate, hour, min(rtmcc) as rtmcc
from pjm.prices JOIN percshad USING (pricedate, hour)
group by pricedate, hour;
EXPLAIN select distinct on (pricedate, hour) prices.node_id from mincong
JOIN pjm.prices USING (pricedate, hour, rtmcc)
group by pricedate, hour, node_id
The problem is the subselects in the join condition; they have to be executed for every row joined.
If you cannot get rid of them, try to create an index that supports the subselects as well as possible:
CREATE INDEX ON pjm.prices(pricedate, hour, rtmcc);
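To see what such an index does for the subselect, the lookup can be tried on a small stand-in. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, so the planner and the plan text differ from the real system; the table `prices`, its rows, and the index name `idx_prices_date_hour_mcc` are assumptions for illustration only.

```python
import sqlite3

# SQLite stand-in for pjm.prices (assumption: PostgreSQL plans will look different).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (pricedate TEXT, hour INTEGER, node_id INTEGER, rtmcc REAL)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        ("2017-12-02", 1, 10, -3.5),
        ("2017-12-02", 1, 11, 4.0),
        ("2017-12-02", 2, 10, 1.0),
        ("2017-12-02", 2, 11, -7.25),
    ],
)

# The same shape as the correlated subselect in the join condition.
SUBSELECT = "SELECT min(rtmcc) FROM prices WHERE pricedate = ? AND hour = ?"
ARGS = ("2017-12-02", 1)


def plan(sql, args):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(SUBSELECT, ARGS)  # no index yet: full table scan
conn.execute(
    "CREATE INDEX idx_prices_date_hour_mcc ON prices (pricedate, hour, rtmcc)"
)
plan_after = plan(SUBSELECT, ARGS)   # the composite index can now be used
min_mcc = conn.execute(SUBSELECT, ARGS).fetchone()[0]

print(plan_before)
print(plan_after)
print(min_mcc)
```

On PostgreSQL the equivalent check would be to run EXPLAIN on the subselect before and after creating the index.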

Aggregate SQL Query, GROUP BY Causing Issues

Everything in this query works except for the second LEFT JOIN, the one that uses BEGIN_DATE and END_DATE. Because I have to group by those additional columns so that they can be used in the join's ON condition, I am getting wrong numbers. Is there any way to do this without grouping by them? Basically, once BEGIN_DATE and END_DATE are in the GROUP BY, the counts are no longer correct.
SELECT
to_char(T1.CALL_TIMESTAMP,'YYYY-IW') AS OMONTH
,COUNT(T1.HOUSE) AS NODECALLS
,T3.NODE_CODE
,T5.NODECUSTCOUNT
,T1.CALL_CATEGORY_LVL_3
,sum((CASE WHEN T1.TC_WIP_TRANSACTION_ID IS NOT NULL THEN 1 ELSE 0 END )) AS TC
,sum((CASE WHEN T1.TC_WIP_TRANSACTION_ID IS NOT NULL THEN 1 ELSE 0 END ))/nullif(COUNT(T1.HOUSE), 0) AS SVRATEPERCALL
,COUNT(T1.HOUSE)/ nullif(T5.NODECUSTCOUNT, 0) AS CALLRATE
FROM CVKOMNZP.NZKOMUSER.NFOV_INBD_REMEDY_CALL_DETAILS T1
LEFT JOIN
(
SELECT T2.NODE_CODE,T2.BEGIN_DATE,T2.END_DATE,T2.HOUSE,T2.CORP
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST T2
) T3
ON T1.CORP = T3.CORP AND T1.HOUSE = T3.HOUSE AND (T1.CALL_TIMESTAMP BETWEEN T3.BEGIN_DATE AND T3.END_DATE)
LEFT JOIN
(
SELECT count(ADM_HOUSEHOLD_ID) AS NODECUSTCOUNT,NODE_CODE,BEGIN_DATE, END_DATE
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST
WHERE HOUSE_STATUS_CODE = 2
AND END_DATE = '2999-12-31 00:00:00'
AND T1.CALL_TIMESTAMP BETWEEN BEGIN_DATE AND END_DATE
GROUP BY NODE_CODE,BEGIN_DATE,END_DATE
) T5
ON T5.NODE_CODE = T3.NODE_CODE AND T1.CALL_TIMESTAMP BETWEEN T5.BEGIN_DATE AND T5.END_DATE
WHERE T1.EXCLUSION_FLAG = 'N'
AND T1.CALL_TIMESTAMP >= To_Date ('07-29-2017', 'MM-DD-YYYY' ) AND T1.CALL_TIMESTAMP <= To_Date ('07-31-2017', 'MM-DD-YYYY' )
GROUP BY
to_char(T1.CALL_TIMESTAMP,'YYYY-IW')
,T3.NODE_CODE
,T5.NODECUSTCOUNT
,T1.CALL_CATEGORY_LVL_3
If I understand this correctly, you want to get a COUNT without grouping by BEGIN_DATE and END_DATE. However, because your subquery (the second LEFT JOIN) needs to include BEGIN_DATE and END_DATE, you do not know how to group without them.
If this is the case, you'll need a subquery for your count, joined back to the same table.
FYI: T1.CALL_TIMESTAMP does not make sense in this subquery, since the subquery has no table called T1. I renamed it to "a". Feel free to change it to what you want.
See if this makes sense:
LEFT JOIN
(
SELECT a.BEGIN_DATE,
a.END_DATE,
node.NODECUSTCOUNT,
a.node_code
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST a
/**Subquery to get a COUNT of all the Node based on NODE_CODE.
You link this back to your query above using the NODE CODE**/
JOIN ( SELECT count(ADM_HOUSEHOLD_ID) AS NODECUSTCOUNT,
NODE_CODE
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST
GROUP BY NODE_CODE ) node on node.node_code = a.node_code
WHERE a.HOUSE_STATUS_CODE = 2
AND a.END_DATE = '2999-12-31 00:00:00'
AND a.CALL_TIMESTAMP BETWEEN BEGIN_DATE AND END_DATE
) -- JOIN THIS BACK TO YOUR MAIN TABLE
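The core idea, counting per NODE_CODE in its own subquery and joining that result back, can be tried on a reduced example. This is only a sketch under simplifying assumptions: it runs on SQLite through Python's sqlite3 module, it leaves out the BEGIN_DATE/END_DATE ranges entirely, and the tables `calls` and `household_hist` with their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced versions of the history and call tables.
conn.execute(
    "CREATE TABLE household_hist (house TEXT, node_code TEXT, house_status_code INTEGER)"
)
conn.execute("CREATE TABLE calls (house TEXT)")
conn.executemany(
    "INSERT INTO household_hist VALUES (?, ?, ?)",
    [("H1", "N1", 2), ("H2", "N1", 2), ("H3", "N1", 2), ("H4", "N2", 2), ("H5", "N2", 1)],
)
conn.executemany("INSERT INTO calls VALUES (?)", [("H1",), ("H1",), ("H2",), ("H4",)])

# The customer count is aggregated per node in its own subquery, so the outer
# GROUP BY only needs the node and the already-computed count.
QUERY = """
SELECT h.node_code,
       COUNT(c.house)  AS nodecalls,
       n.nodecustcount
FROM calls c
JOIN household_hist h ON h.house = c.house
JOIN (SELECT node_code, COUNT(house) AS nodecustcount
      FROM household_hist
      WHERE house_status_code = 2
      GROUP BY node_code) n ON n.node_code = h.node_code
GROUP BY h.node_code, n.nodecustcount
ORDER BY h.node_code
"""
result = conn.execute(QUERY).fetchall()
print(result)
```

Because the subquery returns exactly one row per node, joining it does not multiply the call rows, which is what keeps the call counts intact.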

SQL / sum on different date ranges with other conditions

I have the following code:
SELECT
day
,product_id
,product_name
,quantity_on_hand
,inventory_condition
FROM
(
SELECT
table1.product_id as product_id
,table1.product_name as product_name
FROM table1
WHERE
product_id = XXXX
)product_table
,
(
SELECT
table2.day as day
,table2.product_id as inv_product_id
,inventory_condition
,sum( table2.quantity) AS quantity_on_hand
FROM table2
WHERE
table2.day = TO_DATE('{RUN_DATE_YYYY/MM/DD}', 'YYYY/MM/DD')
AND table2.inventory_condition = XXX
GROUP BY
table2.day
,table2.product_id
,inventory_condition
) inv
WHERE
product_table.product_id = inv.inv_product_id
This code works great if I want to extract the data for a single day, but I want to extract the data for 3 different days in the same query. I've tried using OR in my condition on table2.day, but that gives me the sum of the data for all 3 days together. I've also tried
Sum() over (Partition by table2.day)
but I'm not sure how to use the syntax.
Thanks a lot for your help.
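One possible approach is to filter on all three days with IN and keep the day in the GROUP BY, which gives one sum per day rather than one combined total. The following is a minimal sketch on SQLite through Python's sqlite3 module; the table `inventory`, its rows, and the chosen dates are assumptions for illustration, not the real table1/table2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for table2.
conn.execute(
    "CREATE TABLE inventory ("
    "day TEXT, product_id INTEGER, inventory_condition TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", 1, "new", 5),
        ("2020-01-01", 1, "new", 3),
        ("2020-01-02", 1, "new", 4),
        ("2020-01-03", 1, "new", 10),
        ("2020-01-03", 1, "used", 7),   # other condition, filtered out
        ("2020-01-04", 1, "new", 99),   # other day, filtered out
    ],
)

# Filtering with IN keeps three days; grouping by day keeps their sums separate.
QUERY = """
SELECT day, product_id, SUM(quantity) AS quantity_on_hand
FROM inventory
WHERE day IN (?, ?, ?)
  AND inventory_condition = ?
GROUP BY day, product_id
ORDER BY day
"""
per_day = conn.execute(
    QUERY, ("2020-01-01", "2020-01-02", "2020-01-03", "new")
).fetchall()
print(per_day)
```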

SQL query feels inefficient - how can I improve it?

I'm using the SQL code below in SQLite to get a list of trades from a table containing trades and then combining it with total portfolio value on the day from a holdings table that has position and price data for a set of instruments.
The holdings table has about 150000 records and the trades table has about 1700
SELECT t.*, (SELECT p.adjclose FROM prices AS p
WHERE t.instrument = p.instrument
AND p.date = "2013-02-28 00:00:00") as close,
su.mv as mv
FROM trades AS t
left outer join
(SELECT h.date, SUM(h.price * h.position) as mv FROM holdings AS h
WHERE h.portfolio = "usequity"
AND h.date >= "2013-01-11 00:00:00"
AND h.date <= "2013-02-2"
GROUP BY h.date) as su
ON t.date = su.date
WHERE t.portname = "usequity"
AND t.date >= "2013-01-11 00:00:00"
AND t.date <= "2013-02-28 00:00:00";
Running the SQL code returns
[2014-12-01 19:21:00] 123 row(s) retrieved starting from 1 in 572/627 ms
Which seems really slow for a small dataset. Both tables are indexed on instrument and date.
I don't know how to index the table su on the fly so I'm not sure how to improve this code. Any help greatly appreciated.
EDIT
explain query plan shows
selectid,order,from,detail
1,0,0,"SEARCH TABLE holdings AS h USING AUTOMATIC COVERING INDEX (portfolio=?) (~7 rows)"
1,0,0,"USE TEMP B-TREE FOR GROUP BY"
0,0,0,"SCAN TABLE trades AS t (~11111 rows)"
0,1,1,"SEARCH SUBQUERY 1 AS su USING AUTOMATIC COVERING INDEX (date=?) (~3 rows)"
0,0,0,"EXECUTE CORRELATED SCALAR SUBQUERY 2"
2,0,0,"SEARCH TABLE prices AS p USING INDEX p1 (instrument=? AND date=?) (~9 rows)"
The lookup on prices is fast (it's using the index for both columns).
You could create a temporary table for the su subquery and add an index to that, but the AUTOMATIC INDEX shows that the database is already doing this.
The lookup on holdings is done with a temporary index; you should create an explicit index for that. (An index on both portfolio and date would be even more efficient.)
You could avoid the need for a temporary table by looking up the values from holdings dynamically, like you're already doing for the closing price (but this might not be an improvement if there are many trades on the same day):
SELECT t.*,
(SELECT p.adjclose
FROM prices AS p
WHERE p.instrument = t.instrument
AND p.date = '2013-02-28 00:00:00'
) AS close,
(SELECT SUM(h.price * h.position)
FROM holdings AS h
WHERE h.portfolio = 'usequity'
AND h.date = t.date
) AS mv
FROM trades AS t
WHERE t.portname = 'usequity'
AND t.date BETWEEN '2013-01-11 00:00:00'
AND '2013-02-28 00:00:00';
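The explicit index on portfolio and date can be checked on a small example. This is a sketch with Python's sqlite3 module; the sample rows and the index name `idx_holdings_portfolio_date` are assumptions, and the column lists are reduced to what the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE holdings ("
    "portfolio TEXT, date TEXT, instrument TEXT, price REAL, position REAL)"
)
conn.execute("CREATE TABLE trades (portname TEXT, date TEXT, instrument TEXT)")
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?, ?, ?)",
    [
        ("usequity", "2013-01-11 00:00:00", "AAA", 10.0, 2.0),
        ("usequity", "2013-01-11 00:00:00", "BBB", 5.0, 4.0),
        ("usequity", "2013-01-14 00:00:00", "AAA", 11.0, 2.0),
        ("other", "2013-01-11 00:00:00", "AAA", 10.0, 100.0),
    ],
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("usequity", "2013-01-11 00:00:00", "AAA"),
        ("usequity", "2013-01-14 00:00:00", "AAA"),
    ],
)

# Explicit composite index, replacing the automatic one SQLite built per query.
conn.execute("CREATE INDEX idx_holdings_portfolio_date ON holdings (portfolio, date)")

QUERY = """
SELECT t.date, t.instrument,
       (SELECT SUM(h.price * h.position)
        FROM holdings AS h
        WHERE h.portfolio = 'usequity'
          AND h.date = t.date) AS mv
FROM trades AS t
WHERE t.portname = 'usequity'
  AND t.date BETWEEN '2013-01-11 00:00:00' AND '2013-02-28 00:00:00'
ORDER BY t.date
"""
rows = conn.execute(QUERY).fetchall()
# The last column of EXPLAIN QUERY PLAN holds the human-readable plan step.
plan_text = " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(rows)
print(plan_text)
```

If the plan text names the explicit index instead of an AUTOMATIC one, the per-query cost of building the temporary index is gone.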

Count records with a criteria like "within days"

I have the following table in SQL.
OrderID  Account  OrderMethod  OrderDate  DispatchDate  DispatchMethod
2145     qaz      14           20/3/2011  23/3/2011     2
4156     aby      12           15/6/2011  25/6/2011     1
I want to count all records that were reordered within 30 days of the dispatch date, where DispatchMethod is '2' and OrderMethod is '12', and the reorder came from the same Account.
Can this all be achieved with one query, or do I need to create different tables and do it in stages, as I think I will have to do now? Can someone help with a query?
Many thanks
T
Try the following, replacing [tablename] with the name of your table.
SELECT Count(OriginalOrders.OrderID) AS [Total_Orders]
FROM [tablename] AS OriginalOrders
INNER JOIN [tablename] AS Reorders
ON OriginalOrders.Account = Reorders.Account
AND OriginalOrders.OrderDate < Reorders.OrderDate
AND DATEDIFF(day, OriginalOrders.DispatchDate, Reorders.OrderDate) <= 30
AND Reorders.DispatchMethod = '2'
AND Reorders.OrderMethod = '12';
By using an inner join you'll be sure to only grab orders that meet all the criteria.
By joining the table to itself under two aliases, you make sure only orders under the same account are counted.
The join results are further filtered by the criteria you mentioned, keeping only orders placed within 30 days of the dispatch date of a previous order.
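The self-join described above can be tried on a few rows. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module, SQLite has no DATEDIFF so the day difference is computed with `julianday` instead, the dates are stored in ISO format, and the table name `orders` plus the two extra rows (2300 and 4400) are made up so that there is something to count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "OrderID INTEGER, Account TEXT, OrderMethod INTEGER, "
    "OrderDate TEXT, DispatchDate TEXT, DispatchMethod INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (2145, "qaz", 14, "2011-03-20", "2011-03-23", 2),
        (2300, "qaz", 12, "2011-04-10", "2011-04-12", 2),  # 18 days after dispatch of 2145
        (4156, "aby", 12, "2011-06-15", "2011-06-25", 1),
        (4400, "aby", 12, "2011-09-01", "2011-09-03", 2),  # 68 days after dispatch of 4156
    ],
)

# Same join conditions as the answer; julianday() replaces DATEDIFF(day, ...).
QUERY = """
SELECT COUNT(o.OrderID) AS total_orders
FROM orders AS o
INNER JOIN orders AS r
        ON o.Account = r.Account
       AND o.OrderDate < r.OrderDate
       AND julianday(r.OrderDate) - julianday(o.DispatchDate) <= 30
       AND r.DispatchMethod = 2
       AND r.OrderMethod = 12
"""
total_orders = conn.execute(QUERY).fetchone()[0]
print(total_orders)
```

Only the pair 2145/2300 satisfies every condition here; the second account's reorder falls outside the 30-day window.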
Totally possible with one query, though my SQL is a little stale..
select count(*) from table
where DispatchMethod = 2
AND OrderMethod = 12
AND DATEDIFF(day, OrderDate, DispatchDate) <= 30;
(Untested, but it's something similar)
One query can do it.
SELECT COUNT(*) FROM myTable reOrder
INNER JOIN myTable originalOrder
ON reOrder.Account = originalOrder.Account
AND reOrder.OrderID <> originalOrder.OrderID
-- all re-orders that are within 30 days of the
-- original order's dispatch date
AND DATEDIFF(d, originalOrder.DispatchDate, reOrder.OrderDate) <= 30
WHERE reOrder.DispatchMethod = 2
AND reOrder.OrderMethod = 12
You need a self-join.
The query below assumes that a given account will have either 1 or 2 records in the table - 2 if they've reordered, else 1.
If 3 records exist for a given account, 2 orders + 1 reorder then this won't work - but we'd then need more information on how to distinguish between an order and a reorder.
SELECT COUNT(*) FROM myTable new, myTable prev
WHERE new.DispatchMethod = 2
AND new.OrderMethod = 12
AND DATEDIFF(day, prev.DispatchDate, new.OrderDate) <=30
AND prev.Account = new.Account
AND prev.OrderDate < new.OrderDate
Can we use GROUP BY in this case, such as the following?
SELECT COUNT(Account)
FROM myTable
WHERE DispatchMethod = 2 AND OrderMethod = 12
AND DATEDIFF(d, DispatchDate, OrderDate) <=30
GROUP BY Account
Will the above work or am I missing something here?