I'd like to know how I could improve the performance of the query below. It takes far too long to run; then again, it returns millions of rows. I'm a beginner when it comes to SQL.
SELECT CIAM.EXTERNAL_ID,
(SELECT NEW_CHARGES / 100
FROM BI_OWNER.CMF_BALANCE
WHERE ( ACCOUNT_NO, BILL_REF_NO ) = (SELECT ACCOUNT_NO,
MAX(BILL_REF_NO)
FROM BI_OWNER.CMF_BALANCE
WHERE
ACCOUNT_NO = CIAM.ACCOUNT_NO
GROUP BY ACCOUNT_NO))
"AMOUNT LAST BILL",
(SELECT 'ACTIVE DISCOUNT'
|| ' '
|| CCK.AVAIL_PERIODS
|| '/'
|| CC.TOTAL_PERIODS
FROM BI_OWNER.CUSTOMER_CONTRACT_KEY CCK,
BI_OWNER.CUSTOMER_CONTRACT CC
WHERE CC.PARENT_ACCOUNT_NO = CIAM.ACCOUNT_NO
AND CC.END_DT IS NULL
AND EXISTS (SELECT 1
FROM CONTRACT_TYPES
WHERE CONTRACT_TYPE = CC.CONTRACT_TYPE
AND PLAN_ID_DISCOUNT IS NOT NULL
AND DURATION_UNITS = -3)
AND ROWNUM = 1
AND CCK.TRACKING_ID = CC.TRACKING_ID
AND CCK.TRACKING_ID_SERV = CC.TRACKING_ID_SERV) "DISCOUNT",
(SELECT CC.TOTAL_PERIODS
FROM BI_OWNER.CUSTOMER_CONTRACT_KEY CCK,
BI_OWNER.CUSTOMER_CONTRACT CC
WHERE CC.PARENT_ACCOUNT_NO = CIAM.ACCOUNT_NO
AND CC.END_DT IS NULL
AND EXISTS (SELECT 1
FROM CONTRACT_TYPES
WHERE CONTRACT_TYPE = CC.CONTRACT_TYPE
AND PLAN_ID_DISCOUNT IS NOT NULL
AND DURATION_UNITS = -3)
AND ROWNUM = 1
AND CCK.TRACKING_ID = CC.TRACKING_ID
AND CCK.TRACKING_ID_SERV = CC.TRACKING_ID_SERV) "CYCLE"
,
(SELECT SUM(BALANCE_DUE)
FROM BI_OWNER.CMF_BALANCE
WHERE ACCOUNT_NO = CIAM.ACCOUNT_NO
AND PPDD_DATE < TRUNC(SYSDATE))
"DEBT"
FROM BI_OWNER.CUSTOMER_ID_ACCT_MAP CIAM
WHERE EXTERNAL_ID_TYPE = 1
AND EXISTS (SELECT 1
FROM BI_OWNER.CMF
WHERE ACCOUNT_NO = CIAM.ACCOUNT_NO
AND PREV_CUTOFF_DATE > SYSDATE - 30)
I would recommend identifying the SQL ID for the query and then using the SQL Monitor Report, as it will tell you exactly what the execution plan is and where the SQL is spending most of its time.
A simple way to get the SQL Monitor Report from SQL*Plus follows:
spool c:\temp\SQL_Monitor_rpt.html
SET LONG 1000000
SET LONGCHUNKSIZE 1000000
SET LINESIZE 1000
SET PAGESIZE 0
SET TRIM ON
SET TRIMSPOOL ON
SET ECHO OFF
SET FEEDBACK OFF
alter session set "_with_subquery" = optimizer;
SELECT DBMS_SQLTUNE.report_sql_monitor(
sql_id => '&SQLID' ,
type => 'HTML',
report_level => 'ALL') AS report
FROM dual;
spool off
Basically, you need to know your table sizes and how to get the large tables to have data access via an index (e.g. index on columns found in the where clause).
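As a rough illustration of the indexing advice, and assuming no such indexes exist yet, the columns that the question's query filters and joins on are the natural candidates. The index names below are made up, and whether the optimizer actually uses them depends on table sizes and statistics, so compare the execution plan before and after:

```sql
-- Hypothetical index names; the tables and columns come from the query in the question.

-- Supports the per-account lookups of the latest bill and the balance sum.
CREATE INDEX cmf_balance_acct_bill_ix
    ON BI_OWNER.CMF_BALANCE (ACCOUNT_NO, BILL_REF_NO);

-- Supports the "cutoff within the last 30 days" filter on CMF.
CREATE INDEX cmf_prev_cutoff_ix
    ON BI_OWNER.CMF (PREV_CUTOFF_DATE, ACCOUNT_NO);

-- Supports finding the open contracts of an account.
CREATE INDEX cust_contract_parent_ix
    ON BI_OWNER.CUSTOMER_CONTRACT (PARENT_ACCOUNT_NO, END_DT);
```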
Here is an initial stab that may provide a significant improvement. Many of your subqueries were correlated subqueries, executed once for every record. Instead, I tried to build pre-aggregated results per account number in the FROM/JOIN section. Query first, then I'll explain the logic.
SELECT
CIAM.EXTERNAL_ID,
CMF_BALANCE.New_Charges / 100.0 "AMOUNT LAST BILL",
CCKs.Discount,
CCKs.Cycle,
AcntLast30.SumBalance "DEBT"
FROM
(SELECT
CMF.Account_No,
max( Bal.Bill_Ref_No ) MaxBillRef,
sum( case when Bal.PPDD_Date < TRUNC(SYSDATE )
then Bal.Balance_Due else 0 end ) SumBalance
from
BI_OWNER.CMF
JOIN BI_OWNER.CMF_BALANCE BAL
on CMF.Account_No = Bal.Account_No
where
CMF.PREV_CUTOFF_DATE > SYSDATE - 30
group by
CMF.Account_No ) AcntLast30
JOIN BI_OWNER.CUSTOMER_ID_ACCT_MAP CIAM
on AcntLast30.Account_No = CIAM.Account_No
AND CIAM.EXTERNAL_ID_TYPE = 1
JOIN BI_OWNER.CMF_BALANCE
on AcntLast30.Account_No = CMF_BALANCE.Account_No
AND AcntLast30.MaxBillRef = CMF_BALANCE.Bill_Ref_No
JOIN
(select
CC.Parent_Account_No,
CC.TOTAL_PERIODS "CYCLE",
'ACTIVE DISCOUNT' || ' ' || CCK.AVAIL_PERIODS || '/' || CC.TOTAL_PERIODS "DISCOUNT"
FROM
BI_OWNER.CUSTOMER_CONTRACT CC
JOIN BI_OWNER.CUSTOMER_CONTRACT_KEY CCK
ON CC.TRACKING_ID = CCK.TRACKING_ID
AND CC.TRACKING_ID_SERV = CCK.TRACKING_ID_SERV
AND ROWNUM = 1
JOIN ( select distinct Contract_Type
FROM CONTRACT_TYPES
WHERE PLAN_ID_DISCOUNT IS NOT NULL
AND DURATION_UNITS = -3) CT
on CC.Contract_Type = CT.Contract_Type
WHERE
CC.END_DT IS NULL ) CCKs
on AcntLast30.Account_No = CCKs.Parent_Account_No
In the initial FROM clause I use a subquery, because you appear to be interested only in accounts with a cutoff within the last 30 days. While I'm there, I join to your CMF_Balance table and get the maximum Bill_Ref_No per account AND the sum of Balance_Due where PPDD_Date is less than TRUNC(SYSDATE), which is your "DEBT" result column. So now we have the finite list of accounts you are interested in, each with its account number, max bill on file and summed balance due.
(SELECT
CMF.Account_No,
max( Bal.Bill_Ref_No ) MaxBillRef,
sum( case when Bal.PPDD_Date < TRUNC(SYSDATE )
then Bal.Balance_Due else 0 end ) SumBalance
from
BI_OWNER.CMF
JOIN BI_OWNER.CMF_BALANCE BAL
on CMF.Account_No = Bal.Account_No
where
CMF.PREV_CUTOFF_DATE > SYSDATE - 30
group by
CMF.Account_No ) AcntLast30
Next, a simple join to the CIAM table to get only accounts with External_ID_Type = 1. This too could have been merged into the "AcntLast30" subquery above.
JOIN BI_OWNER.CUSTOMER_ID_ACCT_MAP CIAM
on AcntLast30.Account_No = CIAM.Account_No
AND CIAM.EXTERNAL_ID_TYPE = 1
Now, since the "AcntLast30" query has the account and max bill reference, we join back to CMF_BALANCE once on the account and bill reference number, which gives us CMF_BALANCE.New_Charges / 100.0 as "AMOUNT LAST BILL".
JOIN BI_OWNER.CMF_BALANCE
on AcntLast30.Account_No = CMF_BALANCE.Account_No
AND AcntLast30.MaxBillRef = CMF_BALANCE.Bill_Ref_No
Finally, the subquery aliased "CCKs". Since Discount and Cycle use the same query/subquery/EXISTS, I run it once, qualified on the discount types, and pull Parent_Account_No for the JOIN condition. Now we have the Discount and Cycle values per account.
Since you are returning so many rows, I believe grabbing these pre-aggregated results once up front and joining to them by account will be much faster than running a separate subquery for every row.
There was a reference to ROWNUM without any table/alias reference, so I am not sure of its impact within the query.
Final note: for things like the discount, which may not apply to every account, you may need to change it to a LEFT JOIN, in which case those values would show as NULL. Without knowing the extent of the data, or whether 1:many entries in the given tables could produce Cartesian products, I think this will work well for you. For the most part it looked like everything resulted in only one qualifying record per account, which matters most on the joins (such as the max bill reference).
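On the ROWNUM point: ROWNUM is an Oracle pseudocolumn rather than a table column, which is why it carries no alias. In the original it limited each correlated subquery to one row, but inside a joined subquery it is not evaluated once per account. One possible way to keep a single contract row per account is ROW_NUMBER(). This is only a sketch, assuming it does not matter which qualifying contract is kept; the ORDER BY on TRACKING_ID is an arbitrary choice, not something the question specifies:

```sql
-- Sketch: one open discount contract per account.
-- The ORDER BY inside ROW_NUMBER() is an assumption; pick whatever
-- ordering identifies the contract row you actually want.
SELECT parent_account_no,
       total_periods,
       discount
FROM (
    SELECT cc.parent_account_no,
           cc.total_periods,
           'ACTIVE DISCOUNT ' || cck.avail_periods || '/' || cc.total_periods AS discount,
           ROW_NUMBER() OVER (PARTITION BY cc.parent_account_no
                              ORDER BY cc.tracking_id) AS rn
    FROM BI_OWNER.CUSTOMER_CONTRACT cc
    JOIN BI_OWNER.CUSTOMER_CONTRACT_KEY cck
      ON cck.tracking_id = cc.tracking_id
     AND cck.tracking_id_serv = cc.tracking_id_serv
    WHERE cc.end_dt IS NULL
      AND EXISTS (SELECT 1
                  FROM CONTRACT_TYPES ct
                  WHERE ct.contract_type = cc.contract_type
                    AND ct.plan_id_discount IS NOT NULL
                    AND ct.duration_units = -3)
)
WHERE rn = 1
```

Used as the "CCKs" subquery with a LEFT JOIN on Parent_Account_No, accounts without a discount would still be returned, with NULL in the discount and cycle columns.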
I'm trying to update my query to pull a list of stores that are marked as "third party" and have integrated_images_via_api set to "true".
When returning these results, I would like to use division to compute averages, but I keep running into a division by zero error.
Looks like something went wrong with your query.
net.snowflake.client.jdbc.SnowflakeSQLException: Division by zero
With
menu_data as (
SELECT DISTINCT
dht.date_stamp,
dm.BUSINESS_ID,
ps.provider_type,
dht.MENU_ID,
dht.ACTIVE_STORES_LINKED_TO_MENU,
dht.HAS_HEADER_IMAGE,
dht.HAS_LOGO_IMAGE,
dht.PHOTOS_TOTAL,
dht.NUM_ITEM_IDS,
dht.ITEMS_WITH_DESCRIPTIONS,
dht.PHOTOS_TOTAL*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_photos,
dht.NUM_ITEM_IDS*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_items,
dht.ITEMS_WITH_DESCRIPTIONS*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_desc,
dht.HAS_HEADER_IMAGE*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_headers,
dht.HAS_logo_IMAGE*dht.ACTIVE_STORES_LINKED_TO_MENU as sum_logos,
case when dht.has_header_image AND dht.has_logo_image AND dht.photos_total/dht.NUM_ITEM_IDS >=0.1 --NS, >10% Photos
then 1
else 0 end as NS_Sat
FROM
PRODDB.PUBLIC.DIMENSION_MENU_HEALTH_TRACKING dht
Left Join PRODDB.PUBLIC.DIMENSION_MENU dm ON dm.MENU_ID = dht.MENU_ID
LEFT JOIN DOORDASH_MERCHANT.PUBLIC.MAINDB_STORE_POINT_OF_SALE_INFO ps on ps.store_id=dm.store_id
LEFT JOIN PRODDB.STATIC.POS_PROVIDER_CLASSIFICATION pc on pc.PROVIDER_TYPE=ps.PROVIDER_TYPE
LEFT JOIN PRODDB.STATIC.MENU_DETAILS pm on pm.PROVIDER_ID=pc.PROVIDER_TYPE
WHERE
1 = 1
AND dht.DATE_STAMP = (SELECT max(date_stamp) from PRODDB.PUBLIC.DIMENSION_MENU_HEALTH_TRACKING)
AND dht.ACTIVE_MENU
AND dht.NUM_ITEM_IDS >0
AND --dm.BUSINESS_ID in ('1026','57396','859','1037567','400712','554309')
pc.DIRECT_OR_3PT= 'Third Party'
AND pm.INTEGRATED_IMAGES_VIA_API= 'TRUE'
)
--Main Query
SELECT
md.DATE_STAMP,
business_id,
sum(ACTIVE_STORES_LINKED_TO_MENU) as total_store_menus,
sum(case when md.NS_SAT = 1 then ACTIVE_STORES_LINKED_TO_MENU else NULL end) as NS_store_menus,
total_store_menus - NS_store_menus as ns_opp,
round(NS_Store_menus / total_store_menus, 4) as NS_Perc,
sum(sum_photos) as total_photos,
sum(sum_items) as total_items,
sum(sum_desc) as total_descriptions,
sum(sum_headers) as total_headers,
round(total_photos / total_items,4) as item_perc,
round(total_descriptions / total_items,4) as desc_perc,
total_items - total_photos as item_opp,
round(total_headers / total_store_menus,4) as perc_headers
from menu_data md
where ns_perc >= 0.95
group by 1,2
order by 1,2 DESC
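One common way to avoid this error, sketched here on invented sample rows rather than the real tables, is to wrap each denominator in NULLIF so that a zero total yields NULL instead of raising an exception. Which of the denominators in the query above is actually zero is not known from the question, so this shows only the pattern:

```sql
-- Sample data is made up; the column names mirror the menu_data CTE.
SELECT
    business_id,
    SUM(sum_photos) AS total_photos,
    SUM(sum_items)  AS total_items,
    -- NULLIF(x, 0) returns NULL when x is 0, so the division returns NULL
    -- for that group instead of failing with "Division by zero".
    ROUND(SUM(sum_photos) / NULLIF(SUM(sum_items), 0), 4) AS item_perc
FROM (
    SELECT 1 AS business_id, 5 AS sum_photos, 10 AS sum_items
    UNION ALL
    SELECT 2, 0, 0
) md
GROUP BY business_id
ORDER BY business_id;
```

The second business has zero items, so its item_perc comes back as NULL. If a 0 is preferred over NULL, Snowflake's DIV0 function is an alternative to the NULLIF wrapper.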
I am trying to figure out if there's a way to combine these 2 queries into a single one. I've run into the limits of what I know and can't figure out if this is possible or not.
This is the 1st query that gets last year sales for each day per location (for one month):
if object_id('tempdb..#LY_Data') is not null drop table #LY_Data
select
[LocationId] = ri.LocationId,
[LY_Date] = convert(date, ri.ReceiptDate),
[LY_Trans] = count(distinct ri.SalesReceiptId),
[LY_SoldQty] = convert(money, sum(ri.Qty)),
[LY_RetailAmount] = convert(money, sum(ri.ExtendedPrice)),
[LY_NetSalesAmount] = convert(money, sum(ri.ExtendedAmount))
into #LY_Data
from rpt.SalesReceiptItem ri
join #Location l
on ri.LocationId = l.Id
where ri.Ignored = 0
and ri.LineType = 1 /*Item*/
and ri.ReceiptDate between #_LYDateFrom and #_LYDateTo
group by
ri.LocationId,
ri.ReceiptDate
Then the 2nd query computes a ratio based on the total sales for that month for each day (to be used later):
if object_id('tempdb..#LY_Data2') is not null drop table #LY_Data2
select
[LocationId] = ly.LocationId,
[LY_Date] = ly.LY_Date,
[LY_Trans] = ly.LY_Trans,
[LY_RetailAmount] = ly.LY_RetailAmount,
[LY_NetSalesAmount] = ly.LY_NetSalesAmount,
[Ratio] = ly.LY_NetSalesAmount / t.MonthlySales
into #LY_Data2
from (
select
[LocationId] = ly.LocationId,
[MonthlySales] = sum(ly.LY_NetSalesAmount)
from #LY_Data ly
group by
ly.LocationId
) t
join #LY_Data ly
on t.LocationId = ly.LocationId
I've tried using the first query as a subquery in the 2nd query group-by from clause, but that won't let me select those columns in the outer most select statement (multi part identifier couldn't be bound).
As well as putting the first query into the join clause at the end of the 2nd query with the same issue.
There's probably something I'm missing, but I'm still pretty new to SQL so any help or just a pointer in the right direction would be greatly appreciated! :)
You can try using a Common Table Expression (CTE) and window function:
if object_id('tempdb..#LY_Data') is not null drop table #LY_Data
;with
cte AS
(
select
[LocationId] = ri.LocationId,
[LY_Date] = convert(date, ri.ReceiptDate),
[LY_Trans] = count(distinct ri.SalesReceiptId),
[LY_SoldQty] = convert(money, sum(ri.Qty)),
[LY_RetailAmount] = convert(money, sum(ri.ExtendedPrice)),
[LY_NetSalesAmount] = convert(money, sum(ri.ExtendedAmount))
from rpt.SalesReceiptItem ri
join #Location l
on ri.LocationId = l.Id
where ri.Ignored = 0
and ri.LineType = 1 /*Item*/
and ri.ReceiptDate between #_LYDateFrom and #_LYDateTo
group by
ri.LocationId,
ri.ReceiptDate
)
select
[LocationId] = cte.LocationId,
[LY_Date] = cte.LY_Date,
...
[Ratio] = cte.LY_NetSalesAmount / sum(cte.LY_NetSalesAmount) over (partition by cte.LocationId)
into #LY_Data
from cte
sum(cte.LY_NetSalesAmount) over (partition by cte.LocationId) gives you the sum for each LocationId. The code assumes that this sum is always non-zero; otherwise a divide-by-zero error will occur.
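If a location can have zero net sales for the month, one option is to guard the denominator with NULLIF so the ratio becomes NULL instead of raising an error. A minimal sketch on invented rows (the values are made up; the column names follow the query above):

```sql
-- Location 2 has a monthly total of 0, so its Ratio is NULL rather than an error.
select
    t.LocationId,
    t.LY_Date,
    Ratio = t.LY_NetSalesAmount
            / nullif(sum(t.LY_NetSalesAmount) over (partition by t.LocationId), 0)
from (values
    (1, convert(date, '2015-01-01'), convert(money, 100)),
    (1, convert(date, '2015-01-02'), convert(money, 300)),
    (2, convert(date, '2015-01-01'), convert(money, 0))
) as t (LocationId, LY_Date, LY_NetSalesAmount)
```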
Seems like all you need to do is calculate ratio in the first query.
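You can do this with a correlated subquery. For the ratio to be relative to the month, the subquery has to repeat the outer query's filters:
SELECT
...
convert(money, sum(ri.ExtendedAmount)/(SELECT sum(ri2.ExtendedAmount)
FROM rpt.SalesReceiptItem ri2
WHERE ri2.LocationId=ri.LocationId
AND ri2.Ignored = 0
AND ri2.LineType = 1
AND ri2.ReceiptDate between #_LYDateFrom and #_LYDateTo
)
) AS ratio --extended amount/total extended amount for this location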
I have done a union of 2 queries. When I take the row counts of the 2 queries individually, I get 1504 rows and 15 rows respectively. But when I take the total row count of the union, I still get 1504 rows. Am I missing something here?
The query is:
SELECT DISTINCT T1.sys_tenant_id
FROM SO_CTRL T1, S_BU T2
WHERE T1.SYS_TENANT_ID = T2.ROW_ID AND T2.CUST_STATUS_CD = 'Active' AND
T1.OBJ_NAME = 'Opportunity' AND T1.CTRL_NAME != 'Primary Revenue Close Date'
UNION
SELECT DISTINCT T1.sys_tenant_id
FROM SO_CTRL T1, S_BU T2
WHERE T1.SYS_TENANT_ID = T2.ROW_ID AND T2.CUST_STATUS_CD = 'Active' AND
T1.OBJ_NAME = 'Opportunity' AND T1.CTRL_NAME = 'Primary Revenue Close Date' AND
(T1.default_value_expr IS NULL OR LTRIM(RTRIM(T1.default_value_expr)) = '')
One possible explanation for what you are seeing is that the 15 rows from the second query already exist in the 1504 rows from the first query.
The UNION operator filters out duplicates, so if you want to end up with a row count of 1519, you can try using UNION ALL.
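To check whether that explanation holds, one option is to count the tenant ids that both halves return, using INTERSECT. This is a sketch built from the question's own two queries; "shared_ids" and "dup" are just illustrative names:

```sql
-- Counts the tenant ids returned by both halves of the UNION.
SELECT COUNT(*) AS shared_ids
FROM (
    SELECT T1.sys_tenant_id
    FROM SO_CTRL T1, S_BU T2
    WHERE T1.SYS_TENANT_ID = T2.ROW_ID AND T2.CUST_STATUS_CD = 'Active' AND
          T1.OBJ_NAME = 'Opportunity' AND T1.CTRL_NAME != 'Primary Revenue Close Date'
    INTERSECT
    SELECT T1.sys_tenant_id
    FROM SO_CTRL T1, S_BU T2
    WHERE T1.SYS_TENANT_ID = T2.ROW_ID AND T2.CUST_STATUS_CD = 'Active' AND
          T1.OBJ_NAME = 'Opportunity' AND T1.CTRL_NAME = 'Primary Revenue Close Date' AND
          (T1.default_value_expr IS NULL OR LTRIM(RTRIM(T1.default_value_expr)) = '')
) dup;
```

If this returns 15, every id from the second query is already present in the first, which would explain why the UNION still has 1504 rows.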
I'm struggling here trying to write a script that finds where an order was returned multiple times by the same associate (count greater than 1). I'm guessing my syntax with the subquery is incorrect. When I run the script, I get a message back that the "SELECT failed.. [3669] More than one value was returned by the subquery."
I'm not tied to the subquery, and have tried using just the group by and having statements, but I get an error regarding a non-aggregate value. What's the best way to proceed here and how do I fix this?
Thank you in advance - code below:
SEL s.saletran
, s.saletran_dt SALE_DATE
, r.saletran_id RET_TRAN
, r.saletran_dt RET_DATE
, ra.user_id RET_ASSOC
FROM salestrans s
JOIN salestrans_refund r
ON r.orig_saletran_id = s.saletran_id
AND r.orig_saletran_dt = s.saletran_dt
AND r.orig_loc_id = s.loc_id
AND r.saletran_dt between s.saletran_dt and s.saletran_dt + 30
JOIN saletran rt
ON rt.saletran_id = r.saletran_id
AND rt.saletran_dt = r.saletran_dt
AND rt.loc_id = r.loc_id
JOIN assoc ra --Return Associate
ON ra.assoc_prty_id = rt.sls_assoc_prty_id
WHERE
(SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID
) > 1
AND s.saletran_dt between '2015-01-01' and current_date - 1
Based on what you've got so far, I think you want to use this instead:
where r.ORIG_SLTRN_ID in
(select
ORIG_SLTRN_ID
from
saletran_refund
group by ORIG_SLTRN_ID
having count (*) > 1)
That will give you the ORIG_SLTRN_IDs that have more than one row.
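If the goal is specifically "the same associate returned the same order more than once", another option in Teradata is a windowed count with QUALIFY, which avoids both the scalar-subquery error and the non-aggregate error from GROUP BY. This is a sketch only: it reuses the joins from the question, and it assumes that an order is identified by saletran_id, saletran_dt and loc_id on the sales table:

```sql
SEL s.saletran_id
, s.saletran_dt SALE_DATE
, r.saletran_id RET_TRAN
, r.saletran_dt RET_DATE
, ra.user_id RET_ASSOC
FROM salestrans s
JOIN salestrans_refund r
  ON r.orig_saletran_id = s.saletran_id
 AND r.orig_saletran_dt = s.saletran_dt
 AND r.orig_loc_id = s.loc_id
 AND r.saletran_dt between s.saletran_dt and s.saletran_dt + 30
JOIN saletran rt
  ON rt.saletran_id = r.saletran_id
 AND rt.saletran_dt = r.saletran_dt
 AND rt.loc_id = r.loc_id
JOIN assoc ra --Return Associate
  ON ra.assoc_prty_id = rt.sls_assoc_prty_id
WHERE s.saletran_dt between DATE '2015-01-01' and current_date - 1
-- Keep only rows where this order has more than one return by this associate.
QUALIFY count(*) OVER (PARTITION BY s.saletran_id, s.saletran_dt, s.loc_id, ra.user_id) > 1
```

Unlike GROUP BY, this keeps every individual return row, so the return transaction and date columns stay available in the output.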
You don't give enough for a full answer, but this is a start:
group by s.saletran
, s.saletran_dt
, r.saletran_id
, r.saletran_dt
, ra.user_id
having count(distinct(ORIG_SLTRN_ID)) > 0
The subquery in your WHERE clause does return more than one row. Run it on its own to see:
SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID
I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. Based on that check I need to add a field called Trustworthy and fill it with either 1 or 0 (true/false).
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
WITH FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server 2012 and later you could use the lag() and lead() analytic functions to access the previous/next rows, but in earlier versions you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car. Maybe those rows should be considered valid if they fulfill only one part of the comparison (since the previous/next rows are null)?
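For SQL Server 2012 or later, the lag()/lead() variant could look roughly like the sketch below. The sample rows are invented for illustration, and it assumes that the first and last reading of each car count as valid when their single comparison holds (that is, the commented-out NULL conditions are switched on):

```sql
-- Sample rows are made up; the column names follow the question.
select
    x.FK_CarID, x.KM, x.KM_Date,
    Trustworthy =
        case
            when (x.PrevKM is null or x.KM > x.PrevKM)
             and (x.NextKM is null or x.KM < x.NextKM)
            then 1 else 0
        end
from
(
    select
        m.FK_CarID, m.KM, m.KM_Date,
        -- previous and next reading of the same car, ordered by date
        PrevKM = lag(m.KM)  over (partition by m.FK_CarID order by m.KM_Date),
        NextKM = lead(m.KM) over (partition by m.FK_CarID order by m.KM_Date)
    from (values
        (1, 1000, convert(date, '2015-01-01')),
        (1, 2000, convert(date, '2015-02-01')),
        (1, 9000, convert(date, '2015-03-01')),
        (1, 3000, convert(date, '2015-04-01')),
        (1, 4000, convert(date, '2015-05-01'))
    ) as m (FK_CarID, KM, KM_Date)
) as x
order by x.FK_CarID, x.KM_Date
```

With these rows, the 9000 reading and the 3000 reading after it are both flagged 0, because each fails one comparison against its neighbour; the other three rows get 1. Replacing the VALUES list with eMK_Mileage applies the same check to the real table.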