SQL - Rank monthly dataset high, medium, low - sql

I have a table which includes the month, accountID and a set of application scores. I want to create a new column which either gives a 'high', 'medium' or 'low' for the top, middle and bottom 33% of the results each month.
If I use rank() I can order the application scores for a single month or the whole dataset but I'm unsure how to order it per month. Also, on my version of sql server percent_rank() does not work.
select
AccountID
, ApplicationScore
, rank() over (order by applicationscore asc) as Rank
from Table
I then know I need to put the rank() statement in a subquery and then use a case statement to apply the 'high', 'medium' or 'low'.
select
AccountID
, case when rank <= total/3 then 'low'
when rank > total/3 and rank <= (total/3)*2 then 'medium'
when rank > (total/3)*2 then 'high' end ApplicationScore
from (subquery) a

Ntile(3) worked very well
select
AccountID
, Monthstart
, ApplicationScore
, ntile(3) over (partition by monthstart order by applicationscore) Rank
from table

SQL Server may have something built in to handle your problem. But we can easily use a ratio of counts to find the three segments of your scores, for each month. The ratio we can use is the count, partitioned by month and ordered by score, divided by the count for the entire month.
WITH cte AS (
SELECT *,
1.0 * COUNT(*) OVER (PARTITION BY Month ORDER BY ApplicationScore) /
COUNT(*) OVER (PARTITION BY Month) AS cnt
FROM yourTable
)
SELECT
AccountID,
Month,
ApplicationScore,
CASE WHEN cnt < 0.34 THEN 'low'
WHEN cnt < 0.67 THEN 'medium'
ELSE 'high' END AS rank
FROM cte
ORDER BY
Month,
ApplicationScore DESC;
Demo

Related

SQL - get start & end balance for each member each year

so I'd like to effectively get for each year the starting and end balance for each member for every year there is a record. for example the below would give me the latest balance for each member each year based on the date column
SELECT
T.MemberID,
T.DateCol,
T.Amount
FROM
(SELECT T.MemberID,
T.DateCol,
Amount,
ROW_NUMBER() OVER (PARTITION BY MemberID,
YEAR(DateCol)
ORDER BY
DateCol desc) AS seqnum
FROM
Tablet T
GROUP BY DateCol, MemberID, Amount
) T
WHERE
seqnum = 1 AND
MemberID = '1000009'
and the below would give me the earliest balance for each year
SELECT
T.MemberID,
T.DateCol,
T.Amount
FROM
(SELECT T.MemberID,
T.DateCol,
Amount,
ROW_NUMBER() OVER (PARTITION BY MemberID,
YEAR(DateCol)
ORDER BY
DateCol) AS seqnum
FROM
Tablet T
GROUP BY DateCol, MemberID, Amount
) T
WHERE
seqnum = 1 AND
MemberID = '1000009'
This would give me a result set like the below, column titles (MemberID, Date, Amount)
What I'm looking for is one query which is done by YEAR, MEMBERID, STARTBALANCE, ENDBALANCE as the columns. And would look like the below
What would be the best way to go about this?
commented above

Grouping Column Without Breaking The Sequence

The main objective is to group the rows following Amount Column sequentially so that, if there is any different value between the 2 same values, they will be numbered separately.
This is the raw data here:
SELECT Area, DateA, DateB, Amount
FROM (VALUES
('ABC', '2019-08-18', '2019-08-18 00:07:47.000', 3.75),
('ABC','2019-08-19', '2019-08-19 00:08:47.000', 3.75),
('ABC','2019-08-20', '2019-08-20 00:09:47.000', 3.65),
('ABC','2019-08-21', '2019-08-21 00:09:57.000', 3.75))
AS FeeCollection(Area, DateA, DateB, Amount)
I've tried this but, I don't know the real matter to number in a special way.
DENSE_RANK() OVER(ORDER BY Area, Amount)
This is the sample result I want to achieve. I'm looking for simple logic to do it. Using cursor or while looping will not be efficient for me.
I believe this is what you want. I use LAG to get the value of the prior row in a CTE, and then use a windowed COUNT to reduce the value of ROW_NUMBER by 1 for each row with the same consecutive value for amount:
WITH CTE AS(
SELECT Area,
DateA,
DateB,
Amount,
LAG(Amount) OVER (PARTITION BY Area ORDER BY DateA) AS PrevAmount
FROM (VALUES
('ABC', '2019-08-18', '2019-08-18 00:07:47.000', 3.75),
('ABC','2019-08-19', '2019-08-19 00:08:47.000', 3.75),
('ABC','2019-08-20', '2019-08-20 00:09:47.000', 3.65),
('ABC','2019-08-21', '2019-08-21 00:09:57.000', 3.75))
AS FeeCollection(Area, DateA, DateB, Amount))
SELECT Area,
DateA,
DateB,
Amount,
ROW_NUMBER() OVER (PARTITION BY Area ORDER BY DateA) -
COUNT(CASE Amount WHEN PrevAmount THEN 1 END) OVER (PARTITION BY Area ORDER BY DateA
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Number
FROM CTE
ORDER BY DateA;
I did assume your PARTITION BY clause, which you may need to change/remove/move to the ORDER BY. As we had only one value for Area was impossible to know what the value should be when it changes.
I would do this using lag() and a cumulative sum, but looking like:
select t.*,
sum(case when prev_amount = amount then 0 else 1 end) over
(partition by area order by datea) as number
from (select t.*,
lag(amount) over (partition by area order by datea) as prev_amount
from t
) t;

SQL sum over(partition) not subtracting negative values in SUM

I have the following query which outputs a list of transactions per user - units spent and units earned - column 'Amount'.
I have managed to group this per user and do a running total - column 'Running_Total_Spend'.
However it is ADDING the negative 'Amount' values rather than subtracting them. Sp pretty sure it is the SUM part of query not working.
WITH cohort AS(
SELECT DISTINCT userID FROM events_live WHERE startDate = '2018-07-26' LIMIT 50),
my_events AS (
SELET events_live.* FROM events_live WHERE eventDate >= '2018-07-26')
SELECT cohort.userID,
my_events.eventDate,
my_events.eventTimestamp,
CASE
--spent resource outputs a negative value ---working
WHEN transactionVector = 'SPENT' THEN -abs(my_events.productAmount)
--earned resource outputs a positive value ---working
WHEN transactionVector = 'RECEIVED' THEN my_events.productAmount END AS Amount,
ROW_NUMBER() OVER (PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS row,
--sum the values in column 'Amount' for this partition
--should sum positive and negative values ---NOT WORKING--converting negatives into positive
--------------------------------------------------
SUM(CASE WHEN my_events.productAmount >= 0 THEN my_events.productAmount
WHEN my_events.productAmount <0 THEN -abs(my_events.productAmount) end) OVER(PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS Running_Total_Spend
---------------------------------------------------
FROM cohort
INNER JOIN my_events ON cohort.userID=my_events.userID
WHERE productName = 'COINS' AND transactionVector IN ('SPENT','RECEIVED')
I suspect you want that logic around transactionvector for the sum too as my_events.productamount seems to be always positive.
...
sum(CASE
WHEN transactionvector = 'SPENT' THEN
-my_events.productamount
WHEN transactionvector = 'RECEIVED' THEN
my_events.productamount
END) OVER (PARTITION BY cohort.userid
ORDER BY cohort.userid,
eventTimestamp) running_total_spend
...
Update your sum function to -
SUM(my_events.productAmount) OVER(PARTITION BY cohort.userID ORDER BY cohort.userID, eventTimestamp asc) AS Running_Total_Spend

Loop in Oracle SQL, comparing one month to another

I have to draft a SQL query which does the following:
Compare current week (e.g. week 10) amount to the average amount over previous 4 weeks (Week# 9,8,7,6).
Now I need to run the query on a monthly basis so say for weeks (10,11,12,13).
As of now I am running it four times giving the week parameter on each run.
For example my current query is something like this :
select account_id, curr.amount,hist.AVG_Amt
from
(
select
to_char(run_date,'IW') Week_ID,
sum(amount) Amount,
account_id
from Transaction t
where to_char(run_date,'IW') = '10'
group by account_id,to_char(run_date,'IW')
) curr,
(
select account_id,
sum(amount) / count(to_char(run_date,'IW')) as AVG_Amt
from Transactions
where to_char(run_date,'IW') in ('6','7','8','9')
group by account_id
) hist
where
hist.account_id = curr.account_id
and curr.amount > 2*hist.AVG_Amt;
As you can see, if I have to run the above query for week 11,12,13 I have to run it three separate times. Is there a way to consolidate or structure the query such that I only run once and I get the comparison data all together?
Just an additional info, I need to export the data to Excel (which I do after running query on the PL/SQL developer) and export to Excel.
Thanks!
-Abhi
You can use a correlated sub-query to get the sum of amounts for the last 4 weeks for a given week.
select
to_char(run_date,'IW') Week_ID,
sum(amount) curAmount,
(select sum(amount)/4.0 from transaction
where account_id = t.account_id
and to_char(run_date,'IW') between to_char(t.run_date,'IW')-4
and to_char(t.run_date,'IW')-1
) hist_amount,
account_id
from Transaction t
where to_char(run_date,'IW') in ('10','11','12','13')
group by account_id,to_char(run_date,'IW')
Edit: Based on OP's comment on the performance of the query above, this can also be accomplished using lag to get the previous row's value. Count of number of records present in the last 4 weeks can be achieved using a case expression.
with sum_amounts as
(select to_char(run_date,'IW') wk, sum(amount) amount, account_id
from Transaction
group by account_id, to_char(run_date,'IW')
)
select wk, account_id, amount,
1.0 * (lag(amount,1,0) over (order by wk) + lag(amount,2,0) over (order by wk) +
lag(amount,3,0) over (order by wk) + lag(amount,4,0) over (order by wk))
/ case when lag(amount,1,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,2,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,3,0) over (order by wk) <> 0 then 1 else 0 end +
case when lag(amount,4,0) over (order by wk) <> 0 then 1 else 0 end
as hist_avg_amount
from sum_amounts
I think that is what you are looking for:
with lagt as (select to_char(run_date,'IW') Week_ID, sum(amount) Amount, account_id
from Transaction t
group by account_id, to_char(run_date,'IW'))
select Week_ID, account_id, amount,
(lag(amount,1,0) over (order by week) + lag(amount,2,0) over (order by week) +
lag(amount,3,0) over (order by week) + lag(amount,4,0) over (order by week)) / 4 as average
from lagt;

How can I add cumulative sum column?

I use SqlExpress
Following is the query using which I get the attached result.
SELECT ReceiptId, Date, Amount, Fine, [Transaction]
FROM (
SELECT ReceiptId, Date, Amount, 'DR' AS [Transaction]
FROM ReceiptCRDR
WHERE (Amount > 0)
UNION ALL
SELECT ReceiptId, Date, Amount, 'CR' AS [Transaction]
FROM ReceiptCR
WHERE (Amount > 0)
UNION ALL
SELECT strInvoiceNo AS ReceiptId, CONVERT(datetime, dtInvoiceDt, 103) AS Date, floatTotal AS Amount, 'DR' AS [Transaction]
FROM tblSellDetails
) AS t
ORDER BY Date
Result
want a new column which would show balance amount.
For example. 1 Row should show -2500, 2nd should -3900, 3rd should -700 and so on.
basically, it requires previous row' Account column's data and carry out calculation based on transaction type.
Sample Result
Well, that looks like SQL-Server , if you are using 2012+ , then use SUM() OVER() :
SELECT t.*,
SUM(CASE WHEN t.transactionType = 'DR'
THEN t.amount*-1
ELSE t.amount END)
OVER(PARTITION BY t.date ORDER BY t.receiptId,t.TransactionType DESC) as Cumulative_Col
FROM (YourQuery Here) t
This will SUM the value when its CR and the value*-1 when its DR
Right now I grouped by date, meaning each day will recalculate this column, if you want it for all time, replace the OVER() with this:
OVER(ORDER BY t.date,t.receiptId,t.TransactionType DESC) as Cumulative_Col
Also, I didn't understand why in the same date, for the same ReceiptId DR is calculated before CR , I've add it to the order by but if thats not what you want then explain the logic better.