How to improve a slow SQL query with aggregate functions - sql

I want to show the top ten customers with their sales and margin, where the customer was registered during this accounting year. The query takes about 65 seconds to run, which is not acceptable :-(
As you can see, I am not good at SQL and would be very happy for help improving the query.
SELECT Top 10
AcTr.R3, Actor.Nm,
SUM(CASE WHEN AcTr.AcNo<='3999' THEN AcAm*-1 ELSE 0 END) AS Sales ,
SUM(AcAm*-1) AS TB
FROM AcTr, Actor
WHERE (Actor.CustNo = AcTr.R3) AND
(Actor.CustNo <> '0') AND
(Actor.CreDt >= '20180901') AND
(Actor.CreDt <= '20190430') AND
AcTr.AcYr = '2018' AND
AcTr.AcPr <= '8' AND
AcTr.AcNo>='3000' AND
AcTr.AcNo <= '4999'
GROUP BY AcTr.R3, Actor.Nm
ORDER BY Sales DESC

Welcome to the community. You have a good start, but in the future it is more helpful if you can provide (as commented) the CREATE TABLE declarations so users know the actual data types. Not always required, but it helps.
As for your query layout, it is more common to show the JOIN syntax instead of WHERE showing relations between tables, but that comes in time and practice.
Indexes help and should be based on a combination of both the WHERE/JOIN criteria AND the grouping fields. Also, if fields are numeric, do not 'quote' them; just leave them as numbers, for example your AcYr and AcPr. I would think that an account number (AcNo) really would be a string value rather than a number for accounting purposes.
I would suggest the following indexes on your tables
Table   Index
AcTr    ( AcYr, AcPr, AcNo, R3 )
Actor   ( CustNo, CreDt )
For the AcTr table I put the filtering criteria first and R3 last to help optimize the GROUP BY. The Actor table is indexed by customer number, then CreDt (create date?). Is CreDt really a string, or is it a date field? If it is a date, the date criteria would be something like '2018-09-01' and '2019-04-30'.
select TOP 10
Actor.Nm,
PreSum.Sales,
PreSum.TB
from
( select
R3,
SUM(CASE WHEN AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales,
SUM( AcAm * -1) AS TB
from
Actr
where
AcTr.AcYr = 2018
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
Sales DESC
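To check that pre-aggregating AcTr before the join returns the same rows as joining first, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented, the column types are guesses (the question does not include the CREATE TABLE statements), and SQLite uses LIMIT 10 where SQL Server uses TOP 10, so treat it as an illustration rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed column types; the real schema was not posted.
CREATE TABLE Actor (CustNo INTEGER PRIMARY KEY, Nm TEXT, CreDt TEXT);
CREATE TABLE AcTr  (R3 INTEGER, AcYr INTEGER, AcPr INTEGER, AcNo TEXT, AcAm REAL);
-- Index suggested above: filter columns first, grouping column last.
CREATE INDEX ix_actr ON AcTr (AcYr, AcPr, AcNo, R3);

INSERT INTO Actor VALUES (1, 'Alpha', '20180915'),
                         (2, 'Beta',  '20190110'),
                         (3, 'Old',   '20170101');
INSERT INTO AcTr VALUES (1, 2018, 1, '3000', -500),
                        (1, 2018, 2, '4100',  200),
                        (2, 2018, 3, '3500', -900),
                        (2, 2018, 9, '3500',  -50),
                        (3, 2018, 1, '3000', -999);
""")

# Original shape: join every transaction row to Actor, then aggregate.
join_then_group = conn.execute("""
SELECT Actor.Nm,
       SUM(CASE WHEN AcTr.AcNo <= '3999' THEN AcAm * -1 ELSE 0 END) AS Sales,
       SUM(AcAm * -1) AS TB
FROM AcTr JOIN Actor ON Actor.CustNo = AcTr.R3
WHERE Actor.CreDt >= '20180901' AND Actor.CreDt <= '20190430'
  AND AcTr.AcYr = 2018 AND AcTr.AcPr <= 8
  AND AcTr.AcNo >= '3000' AND AcTr.AcNo <= '4999'
GROUP BY AcTr.R3, Actor.Nm
ORDER BY Sales DESC LIMIT 10
""").fetchall()

# Suggested shape: aggregate AcTr once per R3, then join the small result.
pre_aggregated = conn.execute("""
SELECT Actor.Nm, PreSum.Sales, PreSum.TB
FROM (SELECT R3,
             SUM(CASE WHEN AcNo <= '3999' THEN AcAm * -1 ELSE 0 END) AS Sales,
             SUM(AcAm * -1) AS TB
      FROM AcTr
      WHERE AcYr = 2018 AND AcPr <= 8
        AND AcNo >= '3000' AND AcNo <= '4999'
      GROUP BY R3) PreSum
JOIN Actor ON PreSum.R3 = Actor.CustNo
          AND Actor.CreDt >= '20180901' AND Actor.CreDt <= '20190430'
ORDER BY PreSum.Sales DESC LIMIT 10
""").fetchall()

print(join_then_group)
print(pre_aggregated)
```

Both queries return the same two customers here. The equivalence relies on CustNo being unique in Actor; if it were not, the two shapes could differ.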
Per your latest comment, asking for a year-by-year comparison and dropping the top 10 restriction for a given time period:
select
Actor.Nm,
PreSum.Sales2018,
PreSum.Sales2019,
PreSum.TB2018,
PreSum.TB2019
from
( select
AcTr.R3,
SUM(CASE WHEN AcTr.AcYr = 2018
AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2018,
SUM(CASE WHEN AcTr.AcYr = 2019 AND AcTr.AcNo <= '3999'
THEN AcAm * -1 ELSE 0 END) AS Sales2019,
SUM( CASE WHEN AcTr.AcYr = 2018
THEN AcAm * -1 else 0 end ) AS TB2018,
SUM( CASE WHEN AcTr.AcYr = 2019
THEN AcAm * -1 else 0 end ) AS TB2019
from
Actr
where
AcTr.AcYr IN ( 2018, 2019 )
AND AcTr.AcPr <= 8
AND AcTr.AcNo >= '3000'
AND AcTr.AcNo <= '4999'
GROUP BY
AcTr.R3 ) PreSum
JOIN Actor
on PreSum.R3 = Actor.CustNo
AND Actor.CustNo <> 0
AND Actor.CreDt >= '20180901'
AND Actor.CreDt <= '20190430'
order by
PreSum.Sales2018 DESC -- or Sales2019; this query has no plain Sales column

Related

How to nest multiple case when expressions and add a condition

I am trying to divide customers (the contact_key column) who shopped in 2021 (A.TXN_MTH) into 'new' and 'returning', with returning meaning that they had not shopped in the previous 12 months (YYYYMM in the X.FISCAL_MTH_IDNT column).
I am using CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW' which is correct. The next case when should be when the max month before X.TXN_MTH is 12 or more months previous. I have added the 12 months part in the Where statement. Should I be nesting 3 CASE WHEN'S instead of WHERE?
SELECT
T.CONTACT_KEY
, A.TXN_MTH
, CASE WHEN A.TXN_MTH = MIN(X.FISCAL_MTH_IDNT) THEN 'NEW'
WHEN (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) THEN 'RETURNING'
END AS CUST_TYPE
FROM B_TRANSACTION T
INNER JOIN B_TIME X
ON T.TRANSACTION_DT_KEY = X.DATE_KEY
INNER JOIN A
ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
WHERE (MAX(CASE WHEN X.FISCAL_MTH_IDNT < A.TXN_MTH THEN X.FISCAL_MTH_IDNT ELSE NULL END)) < A.TXN_MTH - (date_format(add_months(concat_ws('-',substr(yearmonth,1,4),substr(yearmonth,5,2),'01'),-12),'yyyyMM')
GROUP BY
T.CONTACT_KEY
, TXN_MTH;
You have not described your tables, so assuming fiscal_mth_idnt is a DATE column then you can use the LAG analytic function to find the previous row's value:
SELECT contact_key,
txn_mth,
CASE
WHEN prev_fiscal_mth_idnt IS NULL
THEN 'NEW'
WHEN ADD_MONTHS(prev_fiscal_mth_idnt, 12) < fiscal_mth_idnt
THEN 'RETURNING'
ELSE 'CURRENT'
END AS cust_type
FROM (
SELECT T.CONTACT_KEY,
A.TXN_MTH,
yearmonth,
X.FISCAL_MTH_IDNT,
LAG(X.FISCAL_MTH_IDNT) OVER (
PARTITION BY T.CONTACT_KEY
ORDER BY X.FISCAL_MTH_IDNT
) AS prev_fiscal_mth_idnt
FROM B_TRANSACTION T
INNER JOIN B_TIME X
ON T.TRANSACTION_DT_KEY = X.DATE_KEY
INNER JOIN A
ON A.CONTACT_KEY = T.CONTACT_KEY AND A.BU_KEY = T.BU_KEY
)
WHERE yearmonth LIKE '2021%';
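The LAG approach can be tried on a toy data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), a hypothetical single table `purchases` in place of the three joined tables, months stored as 'YYYY-MM-01' text, and SQLite's date(x, '+12 months') standing in for ADD_MONTHS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table and rows, one row per customer per purchase month.
CREATE TABLE purchases (contact_key INTEGER, purchase_month TEXT);
INSERT INTO purchases VALUES (1, '2021-03-01'),
                             (2, '2019-06-01'),
                             (2, '2021-02-01'),
                             (3, '2020-11-01'),
                             (3, '2021-01-01');
""")

rows = conn.execute("""
SELECT contact_key, purchase_month,
       CASE
         WHEN prev_month IS NULL THEN 'NEW'
         WHEN date(prev_month, '+12 months') < purchase_month THEN 'RETURNING'
         ELSE 'CURRENT'
       END AS cust_type
FROM (SELECT contact_key, purchase_month,
             LAG(purchase_month) OVER (PARTITION BY contact_key
                                       ORDER BY purchase_month) AS prev_month
      FROM purchases)
WHERE purchase_month LIKE '2021%'   -- filter after LAG so earlier years still count
ORDER BY contact_key
""").fetchall()

for row in rows:
    print(row)
```

The 2021 filter sits in the outer query on purpose: filtering inside the subquery would hide the earlier purchases that LAG needs, and every 2021 customer would look new.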

How do I include records in a Summary query to include those that don't have data?

I have a query on a transaction table that returns the summarized total of a column for each ID based on a date range. The query works great except it doesn't include those IDs that don't have data in the transaction table. How can I include those IDs in my result, filled with a zero total? Here's a simplified version of my query.
SELECT tblID.IDName
,SUM(CASE
WHEN tblTransactions.idxTransType = 30
THEN CAST(tblTransactions.TimeAmount AS FLOAT) / 60.0
ELSE 0
END) AS 'Vacation'
FROM tblTransactions
INNER JOIN tblTransTypes ON tblTransactions.idxTransType = tblTransTypes.IdxTransType
INNER JOIN tblID ON tblTransactions.idxID = tblID.IdxID
WHERE (tblTransactions.Deleted = 0)
AND (tblTransactions.NotCurrent = 0)
AND (tblTransactions.TransDate >= CONVERT(DATETIME, 'March 1, 2018', 102))
AND (tblTransactions.TransDate <= CONVERT(DATETIME, 'April 11, 2018', 102))
GROUP BY tblID.IDName
Actually it's slightly more complicated than that:
SELECT
i.IDName,
SUM(CASE WHEN t.idxTransType = 30 THEN CAST(t.TimeAmount AS FLOAT) / 60.0 ELSE 0 END) AS 'Vacation'
FROM
tblID i
LEFT JOIN tblTransactions t ON t.idxID = i.IdxID AND t.Deleted = 0 AND t.NotCurrent = 0 AND t.TransDate BETWEEN '20180301' AND '20180411'
LEFT JOIN tblTransTypes tt ON tt.IdxTransType = t.idxTransType
GROUP BY
i.IDName;
You want left joins:
SELECT i.IDName,
SUM(CASE WHEN t.idxTransType = 30 THEN CAST(t.TimeAmount AS Float) / 60.0 ELSE 0 END) AS Vacation
FROM tblID i LEFT JOIN
tblTransactions t
ON t.idxID = i.IdxID AND
t.Deleted = 0 AND
t.NotCurrent = 0 AND
t.TransDate >= '2018-03-01' AND
t.TransDate <= '2018-04-11' LEFT JOIN
tblTransTypes tt
ON t.idxTransType = tt.IdxTransType
GROUP BY i.IDName;
Notes:
Table aliases make the query much easier to write and to read.
Use ISO/ANSI standard date formats.
The filter conditions on all but the first table belong in the ON clauses.
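The last note is the one that decides whether IDs without transactions survive. A minimal sketch in Python's built-in sqlite3, with invented rows and a trimmed-down version of the tables (only the columns needed here), shows the difference between filtering in the ON clause and filtering in WHERE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema; 'Bob' has no transactions at all.
CREATE TABLE tblID (IdxID INTEGER PRIMARY KEY, IDName TEXT);
CREATE TABLE tblTransactions (idxID INTEGER, idxTransType INTEGER,
                              TimeAmount INTEGER, TransDate TEXT);
INSERT INTO tblID VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO tblTransactions VALUES (1, 30, 120, '2018-03-15');
""")

SUM_EXPR = ("SUM(CASE WHEN t.idxTransType = 30 "
            "THEN t.TimeAmount / 60.0 ELSE 0 END)")
DATE_FILTER = "t.TransDate >= '2018-03-01' AND t.TransDate <= '2018-04-11'"

# Filter in ON: unmatched IDs keep a NULL-extended row and sum to zero.
filter_in_on = conn.execute(f"""
SELECT i.IDName, {SUM_EXPR} AS Vacation
FROM tblID i
LEFT JOIN tblTransactions t ON t.idxID = i.IdxID AND {DATE_FILTER}
GROUP BY i.IDName
ORDER BY i.IDName
""").fetchall()

# Filter in WHERE: the NULL-extended row fails the test and is dropped.
filter_in_where = conn.execute(f"""
SELECT i.IDName, {SUM_EXPR} AS Vacation
FROM tblID i
LEFT JOIN tblTransactions t ON t.idxID = i.IdxID
WHERE {DATE_FILTER}
GROUP BY i.IDName
ORDER BY i.IDName
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in WHERE, the LEFT JOIN effectively behaves like an INNER JOIN, which is the symptom described in the question.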

SQL Efficiency on Date Range or Separate Tables

I'm calculating historical amounts from a table by year (e.g. 2015-2016, 2014-2015, etc.). I would like to know whether it's more efficient to do it in one batch or to repeat the query multiple times, each filtered by the required date.
Thanks in advance!
OPTION 1:
select
id,
sum(case when year(getdate()) - year(txndate) between 5 and 6 then amt else 0 end) as amt_6_5,
...
sum(case when year(getdate()) - year(txndate) between 0 and 1 then amt else 0 end) as amt_1_0
from
mytable
group by
id
OPTION 2:
select
id, sum(amt) as amt_6_5
from
mytable
where
year(getdate()) - year(txndate) between 5 and 6
group by
id
...
select
id, sum(amt) as amt_1_0
from
mytable
where
year(getdate()) - year(txndate) between 0 and 1
group by
id
1. Unless you have resource issues, I would go with the CASE version.
Although it has no impact on the results, filtering on the requested period in the WHERE clause might give a significant performance advantage.
2. Your period definitions overlap. BETWEEN is inclusive at both ends, so consecutive ranges such as 5-6 and 4-5 both count a difference of 5.
select id
,sum(case when year(getdate()) - year(txndate) = 6 then amt else 0 end) as amt_6
-- ...
,sum(case when year(getdate()) - year(txndate) = 0 then amt else 0 end) as amt_0
from mytable
where txndate >= dateadd(year, datediff(year,0, getDate())-6, 0)
group by id
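The overlap is easy to see on a few rows. In this minimal sketch (Python's built-in sqlite3, invented data), the hypothetical column `years_ago` stands in for year(getdate()) - year(txndate) so the result does not depend on the day it is run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# years_ago is a stand-in for year(getdate()) - year(txndate); rows are made up.
conn.executescript("""
CREATE TABLE mytable (id INTEGER, years_ago INTEGER, amt INTEGER);
INSERT INTO mytable VALUES (1, 0, 10), (1, 1, 20), (1, 2, 40);
""")

# Inclusive, touching ranges: the row with years_ago = 1 lands in both columns.
overlapping = conn.execute("""
SELECT SUM(CASE WHEN years_ago BETWEEN 1 AND 2 THEN amt ELSE 0 END),
       SUM(CASE WHEN years_ago BETWEEN 0 AND 1 THEN amt ELSE 0 END)
FROM mytable
""").fetchone()

# One equality test per year: every row lands in exactly one column.
exact = conn.execute("""
SELECT SUM(CASE WHEN years_ago = 2 THEN amt ELSE 0 END),
       SUM(CASE WHEN years_ago = 1 THEN amt ELSE 0 END),
       SUM(CASE WHEN years_ago = 0 THEN amt ELSE 0 END)
FROM mytable
""").fetchone()

total = conn.execute("SELECT SUM(amt) FROM mytable").fetchone()[0]
print(overlapping, sum(overlapping), "vs total", total)
print(exact, sum(exact), "vs total", total)
```

The overlapping buckets add up to more than the table total because one row is counted twice; the equality buckets add up to exactly the total.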
This may help you:
WITH CTE
AS
(
SELECT id,
(CASE WHEN year(getdate()) - year(txndate) BETWEEN 5 AND 6 THEN 'year_5-6'
WHEN year(getdate()) - year(txndate) BETWEEN 4 AND 5 THEN 'year_4-5'
...
END) AS my_year,
amt
FROM mytable
)
SELECT id,my_year,sum(amt)
FROM CTE
GROUP BY id,my_year
Here, the CTE just assigns a year tag (my_year) to each record based on your conditions; the outer query then sums amt from the CTE, grouped by id and that tag.

Count Based on Age SQL

I have the following T-SQL to calculate ages:
SELECT
Member_ID,
DATEDIFF(YY,DOB,GETDATE()) -
CASE
WHEN DATEADD(YY,DATEDIFF(YY,DOB,GETDATE()),DOB) > GETDATE() THEN 1
ELSE 0
END AS [Age in Years]
FROM MEMBER
WHERE YEAR(registration_date ) >= 2012
How do I count the number of member IDs for each age in years?
I would do this using a subquery or CTE. Much easier to follow:
SELECT AgeInYears, COUNT(*)
FROM (SELECT Member_ID,
(DATEDIFF(YY,DOB,GETDATE()) -
CASE WHEN DATEADD(YY,DATEDIFF(YY,DOB,GETDATE()),DOB) > GETDATE() THEN 1
ELSE 0
END) AS AgeinYears
FROM MEMBER
WHERE YEAR(registration_date ) >= 2012
) m
GROUP BY AgeInYears
ORDER BY 1;
Something like this:
SELECT
-- Member_ID, commented out
DATEDIFF(YY,DOB,GETDATE()) -
CASE
WHEN DATEADD(YY,DATEDIFF(YY,DOB,GETDATE()),DOB) > GETDATE() THEN 1
ELSE 0
END AS [Age in Years]
, count(member_id) membersWithThisAge
FROM MEMBER
WHERE YEAR(registration_date ) >= 2012
group by DATEDIFF(YY,DOB,GETDATE()) -
CASE
WHEN DATEADD(YY,DATEDIFF(YY,DOB,GETDATE()),DOB) > GETDATE() THEN 1
ELSE 0
END
Member_ID can't stay in the SELECT clause unless you also group by it, and if you did that you would get one row per member, each with a count of 1, instead of one row per age.
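As a rough check of the compute-the-age-first, then-group idea, here is a sketch using Python's built-in sqlite3. It is not T-SQL: SQLite has no DATEDIFF, so the age is derived with strftime, a fixed `AS_OF` date (an assumption, standing in for GETDATE()) keeps the output stable, and the registration filter is written as a plain date comparison. The rows are invented.

```python
import sqlite3

AS_OF = "2024-06-15"  # fixed reference date standing in for GETDATE()

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented members; dates stored as ISO 'YYYY-MM-DD' text.
CREATE TABLE MEMBER (Member_ID INTEGER, DOB TEXT, registration_date TEXT);
INSERT INTO MEMBER VALUES (1, '1990-06-15', '2013-01-01'),
                          (2, '1990-06-16', '2014-05-05'),
                          (3, '1991-01-01', '2012-02-02'),
                          (4, '1980-01-01', '2010-01-01');
""")

rows = conn.execute("""
SELECT AgeInYears, COUNT(*) AS members
FROM (SELECT Member_ID,
             -- year difference, minus 1 if the birthday has not happened yet
             (CAST(strftime('%Y', :as_of) AS INTEGER)
              - CAST(strftime('%Y', DOB) AS INTEGER))
             - (strftime('%m-%d', :as_of) < strftime('%m-%d', DOB)) AS AgeInYears
      FROM MEMBER
      WHERE registration_date >= '2012-01-01') m
GROUP BY AgeInYears
ORDER BY AgeInYears
""", {"as_of": AS_OF}).fetchall()

print(rows)
```

Member 4 is excluded by the registration filter, and member 2 is still 33 because the birthday falls one day after the reference date.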

counting events over flexible ranges

I am trying to count events (which are rows in the event_table) in the year before and the year after a particular target date for each person. For example, say I have a person 100 and target date is 10/01/2012. I would like to count events in 9/30/2011-9/30/2012 and in 10/02/2012-9/30/2013.
My query looks like:
select *
from (
select id, target_date
from subsample_table
) as i
left join (
select id, event_date, count(*) as N
, case when event_date between target_date-365 and target_date-1 then 0
when event_date between target_date+1 and target_date+365 then 1
else 2 end as after
from event_table
group by id, target_date, period
) as h
on i.id = h.id
and i.target_date = h.event_date
The output should look something like:
id target_date after N
100 10/01/2012 0 1000
100 10/01/2012 1 0
It's possible that some people do not have any events in the before or after periods (or both), and it would be nice to have zeros in that case. I don't care about the events outside the 730 days.
Any suggestions would be greatly appreciated.
I think the following may approach what you are trying to accomplish.
select i.id
, i.target_date
, count(h.event_date) as N
, SUM(case when h.event_date between i.target_date-365 and i.target_date-1
then 1
else 0
end) AS Prior_
, SUM(case when h.event_date between i.target_date+1 and i.target_date+365
then 1
else 0
end) as After_
from subsample_table i
left join
event_table h
on i.id = h.id
group by i.id, i.target_date
This is a generic answer. I don't know what date functions Teradata has, so I will use SQL Server syntax.
select id, target_date, sum(before) before, sum(after) after, sum(righton) righton
from yourtable t
join (
select id, target_date td
, case when yourdate >= dateadd(year, -1, target_date)
and yourdate < target_date then 1 else 0 end before
, case when yourdate <= dateadd(year, 1, target_date)
and yourdate > target_date then 1 else 0 end after
, case when yourdate = target_date then 1 else 0 end righton
from yourtable
where whatever
group by id, target_date) sq on t.id = sq.id and target_date = td
where whatever
group by id, target_date
This answer assumes that an id can have more than one target date.
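For comparison, here is a small runnable sketch of the before/after counting with zeros for people who have no events. It uses Python's built-in sqlite3 rather than Teradata or SQL Server, so the date arithmetic (date(x, '-1 years')) is SQLite's; the rows are invented, and the window boundaries (one calendar year either side, target date itself excluded) are an assumption about what the question wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Invented data; person 200 has no events at all.
CREATE TABLE subsample_table (id INTEGER, target_date TEXT);
CREATE TABLE event_table (id INTEGER, event_date TEXT);
INSERT INTO subsample_table VALUES (100, '2012-10-01'), (200, '2012-10-01');
INSERT INTO event_table VALUES (100, '2012-01-15'),
                               (100, '2012-09-30'),
                               (100, '2012-10-01'),
                               (100, '2013-03-01'),
                               (100, '2015-01-01');
""")

rows = conn.execute("""
SELECT i.id, i.target_date,
       SUM(CASE WHEN h.event_date >= date(i.target_date, '-1 years')
                 AND h.event_date <  i.target_date
                THEN 1 ELSE 0 END) AS before_n,
       SUM(CASE WHEN h.event_date >  i.target_date
                 AND h.event_date <= date(i.target_date, '+1 years')
                THEN 1 ELSE 0 END) AS after_n
FROM subsample_table i
LEFT JOIN event_table h ON h.id = i.id   -- join on id only, not on the date
GROUP BY i.id, i.target_date
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

The join is on id alone; the date logic lives inside the CASE expressions, so a person with no matching events still produces one row with two zeros.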