Ranking at Multiple Levels on SQL Server

I've read a few ways of doing this, but it does not seem to work for me. I'm trying to pull data that has a Category, Itemcode, and Sales. I'm summing this up for a period of time so that my basic query looks like this:
select
Category
, Itemcode
, sum(Sales)
, rank() over (partition by Category order by sum(Sales) desc) as ItemRank
from
Sales
group by
Category, Itemcode
When I do that, my data looks like this:
What I would like to do is to add another rank that would show the rank of the Category as a whole.
Something like this:
What would the query look like with that added in? I've tried several things, but I can't seem to get it to work.

Taking a guess here, but I gather you want to rank based on the TotalSales? If so, you can join to a subquery to do the aggregation, then use that in another rank column:
declare @Sales table (Category varchar(25), ItemCode int, Sales int)
insert into @Sales
values
('Category A', 123, 100),
('Category A', 234, 125),
('Category A', 345, 97),
('Category B', 456, 354),
('Category B', 567, 85),
('Category B', 678, 112)
select
s.Category
, s.ItemCode
, sum(s.Sales) as ItemSales
, rank() over (partition by s.Category order by sum(s.Sales) desc) as ItemRank
-- TotalSales is constant within a category, so max() carries it through the group by
-- without inflating it when an item has several sales rows
, dense_rank() over (order by max(d.TotalSales) desc) as CategoryRank
from @Sales s
join ( select Category, sum(Sales) as TotalSales
       from @Sales
       group by Category
     ) d
on s.Category = d.Category
group by s.Category, s.ItemCode
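To sanity-check the two-level ranking outside SQL Server, here is a minimal sketch that runs the same idea against the sample rows using Python's built-in `sqlite3` module. It is an illustration under assumptions, not the answer's exact query: it targets SQLite (3.25 or later, for window functions), and it swaps the join to a subquery for a windowed `sum()` over the category. The CTE names `item_totals` and `scored` are made up for this example.

```python
import sqlite3

# Sample rows copied from the answer above: (Category, ItemCode, Sales).
ROWS = [
    ("Category A", 123, 100),
    ("Category A", 234, 125),
    ("Category A", 345, 97),
    ("Category B", 456, 354),
    ("Category B", 567, 85),
    ("Category B", 678, 112),
]

QUERY = """
with item_totals as (      -- one row per Category/ItemCode
    select Category, ItemCode, sum(Sales) as ItemSales
    from Sales
    group by Category, ItemCode
), scored as (             -- attach the category-wide total to every item row
    select *, sum(ItemSales) over (partition by Category) as TotalSales
    from item_totals
)
select Category, ItemCode, ItemSales,
       rank() over (partition by Category order by ItemSales desc) as ItemRank,
       dense_rank() over (order by TotalSales desc) as CategoryRank
from scored
order by CategoryRank, ItemRank
"""


def rank_sales(rows):
    """Load the rows into an in-memory SQLite table and return the ranked result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Sales (Category text, ItemCode integer, Sales integer)")
    conn.executemany("insert into Sales values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


ranked = rank_sales(ROWS)
for row in ranked:
    print(row)
```

`dense_rank()` is used for the category level so that every item in the same category shares one rank and the next category gets the next consecutive number.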

Related

How to determine if records (for a certain set) follow a correct order?

I have a table that contains 3 different statuses:
1. CLICKED
2. CLAIMED
3. BOUGHT
They should occur in this particular order. I am trying to determine whether any records have statuses that did not occur in the right order, based on their dates.
For example, this is the data:
Record 121144 has correct order status, this is good.
Record 121200 is incorrect since bought happens before clicked even if clicked and claimed follow the right order.
Record 121122 is incorrect, since CLICKED status comes after CLAIMED.
Record 121111 also has correct order status (even if they are the same).
Record 121198 is also correct since the status order follows, even if there is no BOUGHT.
CREATE TABLE TBL_A
(
number_id int,
country varchar(50),
status varchar(50),
datetime date
);
INSERT INTO TBL_A
VALUES (121144, 'USA', 'CLICKED', '2021-10-09'),
(121144, 'USA', 'CLAIMED', '2021-10-10'),
(121144, 'USA', 'BOUGHT', '2021-10-11'),
(121111, 'CAD', 'CLICKED', '2021-10-12'),
(121111, 'CAD', 'CLAIMED', '2021-10-12'),
(121111, 'CAD', 'BOUGHT', '2021-10-12'),
(121122, 'PES', 'CLICKED', '2021-09-11'),
(121122, 'PES', 'CLAIMED', '2021-09-09'),
(121122, 'PES', 'BOUGHT', '2021-09-12'),
(121198, 'AU', 'CLICKED', '2021-09-11'),
(121198, 'AU', 'CLAIMED', '2021-09-12'),
(121200, 'POR', 'CLICKED', '2021-09-10'),
(121200, 'POR', 'CLAIMED', '2021-09-11'),
(121200, 'POR', 'BOUGHT', '2021-09-08');
My answer includes the potential for skipped steps which OP mentioned in comments. Rather than using a strictly matching sequence this approach looks for adjacent pairs where the preceding step was numbered higher:
with A as (
select *,
case status
when 'CLICKED' then 1
when 'CLAIMED' then 2
when 'BOUGHT' then 3 end as desired_order
from T
), B as (
select *,
row_number() over (
partition by number_id
order by datetime, desired_order) as rn -- handles date ties
from A
), C as (
select *,
-- look for pairs of rows where one is reversed
case when lag(desired_order) over (partition by number_id order by rn) >
desired_order then 'Y' end as flag
from B
)
select number_id, min(country) as country,
case min(flag) when 'Y' then 'Out of order' else 'In order' end as "status"
from C
group by number_id;
https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=f0ee1de8e8e81229ddc23acc97bce7d7
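For readers who want to try the adjacent-pair idea without a SQL Server instance, the following is a rough sketch using Python's `sqlite3` module (SQLite 3.25+ assumed for `lag()`). It is not the query above verbatim: the table is called `events`, the date column is renamed `event_date`, and the `row_number()` step is folded into the `lag()` ordering, which applies the same sort and tie-break.

```python
import sqlite3

# Sample data from the question: (number_id, country, status, date).
EVENTS = [
    (121144, "USA", "CLICKED", "2021-10-09"),
    (121144, "USA", "CLAIMED", "2021-10-10"),
    (121144, "USA", "BOUGHT", "2021-10-11"),
    (121111, "CAD", "CLICKED", "2021-10-12"),
    (121111, "CAD", "CLAIMED", "2021-10-12"),
    (121111, "CAD", "BOUGHT", "2021-10-12"),
    (121122, "PES", "CLICKED", "2021-09-11"),
    (121122, "PES", "CLAIMED", "2021-09-09"),
    (121122, "PES", "BOUGHT", "2021-09-12"),
    (121198, "AU", "CLICKED", "2021-09-11"),
    (121198, "AU", "CLAIMED", "2021-09-12"),
    (121200, "POR", "CLICKED", "2021-09-10"),
    (121200, "POR", "CLAIMED", "2021-09-11"),
    (121200, "POR", "BOUGHT", "2021-09-08"),
]

QUERY = """
with a as (
    select number_id, event_date,
           case status when 'CLICKED' then 1
                       when 'CLAIMED' then 2
                       when 'BOUGHT'  then 3 end as desired_order
    from events
), b as (
    -- a row is "reversed" when the step just before it (by date) is numbered higher;
    -- the first row of each partition has no predecessor, so it falls to the else branch
    select number_id,
           case when lag(desired_order) over (
                         partition by number_id
                         order by event_date, desired_order) > desired_order
                then 1 else 0 end as reversed
    from a
)
select number_id,
       case when max(reversed) = 1 then 'Out of order' else 'In order' end as verdict
from b
group by number_id
order by number_id
"""


def check_order(events):
    """Return (number_id, verdict) pairs for the given event rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (number_id integer, country text, status text, event_date text)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?)", events)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


verdicts = check_order(EVENTS)
for number_id, verdict in verdicts:
    print(number_id, verdict)
```

Ordering by `event_date, desired_order` is what lets record 121111, where all three statuses share a date, count as in order.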
As Thorston pointed out, you could also take the approach of generating a pair of row numbers and then comparing the two, looking for mismatches. Glancing at the query plans, this may involve an extra sort operation, so it would be worth trying both ways on your data.
...
), B as (
select *,
row_number() over (
partition by number_id
order by desired_order) as rn1,
row_number() over (
partition by number_id
order by datetime, desired_order) as rn2
from A
)
select
number_id, min(country) as country,
case when max(case when rn1 <> rn2 then 1 else 0 end) = 1
then 'Out of order' else 'In order' end as status
...
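The two-row-number comparison can also be expressed in a few lines of plain Python, which may help when reasoning about edge cases such as skipped steps. This is only a sketch of the idea: sorting by step plays the role of `rn1`, sorting by date and then step plays the role of `rn2`, and the function name `order_verdicts` is invented here. It assumes each status appears at most once per `number_id`.

```python
from collections import defaultdict

STEP = {"CLICKED": 1, "CLAIMED": 2, "BOUGHT": 3}

# Sample data from the question: (number_id, status, date).
EVENTS = [
    (121144, "CLICKED", "2021-10-09"),
    (121144, "CLAIMED", "2021-10-10"),
    (121144, "BOUGHT", "2021-10-11"),
    (121111, "CLICKED", "2021-10-12"),
    (121111, "CLAIMED", "2021-10-12"),
    (121111, "BOUGHT", "2021-10-12"),
    (121122, "CLICKED", "2021-09-11"),
    (121122, "CLAIMED", "2021-09-09"),
    (121122, "BOUGHT", "2021-09-12"),
    (121198, "CLICKED", "2021-09-11"),
    (121198, "CLAIMED", "2021-09-12"),
    (121200, "CLICKED", "2021-09-10"),
    (121200, "CLAIMED", "2021-09-11"),
    (121200, "BOUGHT", "2021-09-08"),
]


def order_verdicts(events):
    """Map each number_id to 'In order' or 'Out of order'."""
    by_id = defaultdict(list)
    for number_id, status, day in events:
        by_id[number_id].append((STEP[status], day))

    verdicts = {}
    for number_id, steps in by_id.items():
        by_step = sorted(steps)                              # like rn1: desired order
        by_date = sorted(steps, key=lambda s: (s[1], s[0]))  # like rn2: date, then step
        # Any position where the two orderings disagree means a step is out of place.
        verdicts[number_id] = "In order" if by_step == by_date else "Out of order"
    return verdicts


print(order_verdicts(EVENTS))
```

A record with a skipped step, for example only CLICKED followed by BOUGHT, still comes out as in order, because both orderings agree.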
Using ARRAY_AGG ordered by datetime:
SELECT number_id,
ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime) AS statuses, -- debug
CASE WHEN ARRAY_AGG(status) WITHIN GROUP(ORDER BY datetime)
IN (ARRAY_CONSTRUCT('CLICKED', 'CLAIMED', 'BOUGHT'),
ARRAY_CONSTRUCT('CLICKED', 'CLAIMED')) THEN 'In order'
ELSE 'Out of order'
END AS status
FROM TBL_A
GROUP BY number_id;
Output:
Here is one way using some string aggregation and manipulations. This works as expected against the sample data and also accounts for edge cases that include skipping status, missing status, and single status.
with cte as
(select *,listagg(status,'>') within group (order by datetime,charindex(status,'CLICKED>CLAIMED>BOUGHT')) over (partition by number_id, country) as event_order
from t)
select distinct
number_id,
country,
case when charindex(event_order,'CLICKED>CLAIMED>BOUGHT,CLICKED>BOUGHT')>0 then 'Ordered' else 'Unordered' end as order_flag
from cte
order by number_id;

SQL Server - How can I query to return products only when their sales exceeds a certain percentage?

The basic requirement is this: We capture sales by day of week and product. If more than half* of the day's sales came from one product, we want to capture that. Else we show "none".
So imagine we sell shoes, pants and shirts. On Monday, we sold $100 of each. So it was a three-way split, and each category accounted for 33.3% of sales. We show "none". On Tuesday though, half of our sales came from shoes, and on Wednesday, 80% from shirts. So we want to see that.
The query below returns the desired result, but I'm not a fan of queries within queries within queries. They can be inefficient and hard to read, and I feel like there's a cleaner way. Can this be improved upon?
*The requirement for half will be a parameter (@threshold here). In some cases, we might want to show only when it's 75% or more of sales. Obviously that parameter has to be >= 50%.
declare @sales as table (day_of_week varchar(16), product varchar(8), sales_amt int)
insert into @sales values ('monday', 'shoes', 100)
insert into @sales values ('monday', 'pants', 100)
insert into @sales values ('monday', 'shirts', 100)
insert into @sales values ('tuesday', 'shoes', 500)
insert into @sales values ('tuesday', 'pants', 300)
insert into @sales values ('tuesday', 'shirts', 200)
insert into @sales values ('wednesday', 'shoes', 100)
insert into @sales values ('wednesday', 'pants', 100)
insert into @sales values ('wednesday', 'shirts', 800)
declare @threshold as decimal(3,2) = 0.5
select day_of_week, case when pct_of_day >= @threshold then product else 'none' end half_of_sales from (
    select day_of_week, product, pct_of_day, row_number() over (partition by day_of_week order by pct_of_day desc) _rn
    from (
        select day_of_week, product, sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day
        from @sales
        group by day_of_week, product
    ) x
) z
where _rn = 1
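Maybe this is a little easier to read?
DECLARE @threshold AS decimal(3, 2) = 0.5;
WITH ssum
AS (SELECT
        day_of_week,
        SUM(sales_amt) sa
    FROM @sales
    GROUP BY day_of_week)
SELECT
    s.day_of_week,
    -- No ELSE branch: rows below the threshold yield NULL, MAX ignores them,
    -- and COALESCE supplies 'none'. With ELSE 'none' inside MAX, a product name
    -- that sorts before 'none' would be lost.
    -- Assumes one row per day and product.
    COALESCE(MAX(CASE WHEN s.sales_amt * 1.0 / ssum.sa >= @threshold THEN s.product END), 'none') threshold
FROM ssum
INNER JOIN @sales AS s
    ON ssum.day_of_week = s.day_of_week
GROUP BY s.day_of_week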
Firstly, you can place the nested queries in CTEs, which can make them easier to read. It won't make them more efficient, but then nested queries are not necessarily inefficient in themselves, so I'm not sure why you think so.
Second, the query can be simplified, because the row numbering is equally valid on the raw sum(sales_amt) value, so it can sit on the same level as the windowed sum:
declare @threshold as decimal(3,2) = 0.5;
with GroupedSales as (
    select
        day_of_week,
        product,
        sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day,
        row_number() over (partition by day_of_week order by sum(sales_amt) desc) _rn
    from @sales
    group by
        day_of_week,
        product
)
select
    day_of_week,
    case when pct_of_day >= @threshold
        then product
        else 'none'
    end half_of_sales
from GroupedSales
where _rn = 1;
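As a quick way to check the threshold behaviour, here is a small sketch that runs the single-CTE idea against the question's sample data with Python's `sqlite3` module. The assumptions are that SQLite (3.25+) stands in for SQL Server, that the aggregation is done in a separate CTE first (`product_totals`, a name made up here) rather than nesting `sum(sum(...))`, and that the threshold is passed as a bound parameter.

```python
import sqlite3

# Sample data from the question: (day_of_week, product, sales_amt).
SALES = [
    ("monday", "shoes", 100), ("monday", "pants", 100), ("monday", "shirts", 100),
    ("tuesday", "shoes", 500), ("tuesday", "pants", 300), ("tuesday", "shirts", 200),
    ("wednesday", "shoes", 100), ("wednesday", "pants", 100), ("wednesday", "shirts", 800),
]

QUERY = """
with product_totals as (
    select day_of_week, product, sum(sales_amt) as product_sales
    from sales
    group by day_of_week, product
), shares as (
    select day_of_week, product,
           -- share of the day's total, and position within the day by sales
           product_sales * 1.0 / sum(product_sales) over (partition by day_of_week) as pct_of_day,
           row_number() over (partition by day_of_week order by product_sales desc) as rn
    from product_totals
)
select day_of_week,
       case when pct_of_day >= :threshold then product else 'none' end as top_product
from shares
where rn = 1            -- only the best seller of each day can reach the threshold
order by day_of_week
"""


def dominant_products(rows, threshold):
    """Return (day_of_week, product or 'none') for the given threshold (0.5 to 1.0)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sales (day_of_week text, product text, sales_amt integer)")
    conn.executemany("insert into sales values (?, ?, ?)", rows)
    result = conn.execute(QUERY, {"threshold": threshold}).fetchall()
    conn.close()
    return result


print(dominant_products(SALES, 0.5))
print(dominant_products(SALES, 0.75))
```

Raising the threshold to 0.75 drops Tuesday's shoes (exactly half of sales) to 'none' while Wednesday's shirts (80%) still qualify.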

Looking for the same Trader buying and selling the same product within 3 minutes

Below is the example table:
Create Table #A
(
Time nvarchar(70),
Trader nvarchar(30),
Product nvarchar(30),
[Buy/Sell] nvarchar(30)
)
Insert into #A Values
('2019-03-01T14:22:29z', 'Jhon', 'Apple', 'Buy'),
('2019-03-01T12:35:09z', 'Jhon', 'Orange', 'Sell'),
('2019-03-01T12:35:09z', 'Mary', 'Milk', 'Buy'),
('2019-03-01T12:35:10z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:35:23z', 'Tom', 'Bread', 'Sell'),
('2019-03-01T14:15:52z', 'Jhon', 'Apple', 'Sell'),
('2019-03-01T14:15:53z', 'Tom', 'Orange', 'Sell'),
('2019-03-01T14:22:33z', 'Mary', 'Apple', 'Buy'),
('2019-03-01T14:22:37z', 'Mary', 'Orange', 'Sell'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy'),
('2019-03-01T12:37:41z', 'Susan', 'Milk', 'Buy')
Select * from #A
Basically, I want to find cases where the same Trader buys and sells the same product within 3 minutes.
Below is what I've tried, but it does not give the correct result:
;With DateTimeTbl
as
(
select SUBSTRING(a.Time,1,10) date, SUBSTRING(a.Time,12,8) Time1, a.*
-- lead(Time) over(order by time) cnt
from #A a
),
DataTbl
as
(
Select d.*, row_number() over(Partition by d.Trader,d.product order by d.time1) CntSrs
from DateTimeTbl d
--where [buy/sell] = 'Sell'
)
Select lag(Time1) over(order by time) cnt, d.* from DataTbl d where CntSrs>1
I would suggest lead(). To get the first record:
select a.*
from (select a.*,
lead(time) over (partition by trader, product order by time) as next_time,
lead(buy_sell) over (partition by trader, product order by time) as next_buy_sell
from #a a
) a
where next_time < dateadd(minute, 3, time) and
buy_sell <> next_buy_sell;
Note: This assumes that buy_sell takes on only two values, which is consistent with your sample data.
Here is a db<>fiddle. Note that it fixes the data types to be appropriate (for the time column) and renames the last column so it does not need to be escaped.
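The same "compare each row with the next one for that trader and product" logic can be sketched in plain Python, which makes it easy to experiment with the time window. This is an illustration only: the function name `quick_reversals` and the `EXTRA` row are hypothetical and not part of the question's data. Note that the sample rows as posted contain no buy/sell pair within 3 minutes (Jhon's Apple trades are more than 6 minutes apart), so the extra row is added to show a hit.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Sample data from the question: (time, trader, product, side).
TRADES = [
    ("2019-03-01T14:22:29z", "Jhon", "Apple", "Buy"),
    ("2019-03-01T12:35:09z", "Jhon", "Orange", "Sell"),
    ("2019-03-01T12:35:09z", "Mary", "Milk", "Buy"),
    ("2019-03-01T12:35:10z", "Susan", "Milk", "Buy"),
    ("2019-03-01T12:35:23z", "Tom", "Bread", "Sell"),
    ("2019-03-01T14:15:52z", "Jhon", "Apple", "Sell"),
    ("2019-03-01T14:15:53z", "Tom", "Orange", "Sell"),
    ("2019-03-01T14:22:33z", "Mary", "Apple", "Buy"),
    ("2019-03-01T14:22:37z", "Mary", "Orange", "Sell"),
    ("2019-03-01T12:37:41z", "Susan", "Milk", "Buy"),
    ("2019-03-01T12:37:41z", "Susan", "Milk", "Buy"),
]

# Hypothetical extra trade, added only to demonstrate a match.
EXTRA = ("2019-03-01T14:17:10z", "Jhon", "Apple", "Buy")


def quick_reversals(trades, window=timedelta(minutes=3)):
    """Find consecutive trades by one trader in one product that switch side within the window."""
    groups = defaultdict(list)
    for ts, trader, product, side in trades:
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%Sz")
        groups[(trader, product)].append((when, side))

    hits = []
    for (trader, product), rows in groups.items():
        rows.sort(key=lambda r: r[0])  # same role as "order by time" in the partition
        # zip(rows, rows[1:]) pairs each row with the next one, like lead()
        for (t1, side1), (t2, side2) in zip(rows, rows[1:]):
            if side1 != side2 and t2 - t1 < window:
                hits.append((trader, product, t1, side1, t2, side2))
    return hits


print(quick_reversals(TRADES))            # no pairs in the posted sample
print(quick_reversals(TRADES + [EXTRA]))  # one Sell -> Buy pair for Jhon/Apple
```

Like the `lead()` query, this only compares adjacent trades, so a buy and a sell separated by another trade of the same side inside the window would not be paired.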

Find date of most recent overdue

I have the following problem: from a table of payments and dues, I need to find the date of the most recent overdue. Here is the table and some example data:
create table t (
Id int
, [date] date
, Customer varchar(6)
, Deal varchar(6)
, Currency varchar(3)
, [Sum] int
);
insert into t values
(1, '2017-12-12', '1110', '111111', 'USD', 12000)
, (2, '2017-12-25', '1110', '111111', 'USD', 5000)
, (3, '2017-12-13', '1110', '122222', 'USD', 10000)
, (4, '2018-01-13', '1110', '111111', 'USD', -10100)
, (5, '2017-11-20', '2200', '222221', 'USD', 25000)
, (6, '2017-12-20', '2200', '222221', 'USD', 20000)
, (7, '2017-12-31', '2201', '222221', 'USD', -10000)
, (8, '2017-12-29', '1110', '122222', 'USD', -10000)
, (9, '2017-11-28', '2201', '222221', 'USD', -30000);
If the value of "Sum" is positive, it means an overdue has begun; if "Sum" is negative, it means someone paid on this Deal.
In the example above, on Deal '122222' the overdue starts on 2017-12-13 and ends on 2017-12-29, so it shouldn't be in the result.
And for Deal '222221', the first overdue of 25000, which started on 2017-11-20, was completely paid on 2017-11-28, so the last date of the current overdue (the one we are interested in) is 2017-12-31.
I've made this selection to sum up all the payments, and got stuck here :(
WITH cte AS (
SELECT *,
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
Apparently I need to find (for each Deal) the minimum of the dates if there is no 0 or negative Debt_balance, and otherwise the next date after the last 0 balance.
I will be grateful for any tips and ideas on the subject.
Thanks!
UPDATE
My version of the solution:
WITH cte AS (
SELECT ROW_NUMBER() OVER (ORDER BY Deal, [Date]) id,
Deal, [Date], [Sum],
SUM([Sum]) OVER(PARTITION BY Deal ORDER BY [Date]) AS Debt_balance
FROM t
)
SELECT a.Deal,
SUM(a.Sum) AS NET_Debt,
isnull(max(b.date), min(a.date)),
datediff(day, isnull(max(b.date), min(a.date)), getdate())
FROM cte as a
LEFT OUTER JOIN cte AS b
ON a.Deal = b.Deal AND a.Debt_balance <= 0 AND b.Id=a.Id+1
GROUP BY a.Deal
HAVING SUM(a.Sum) > 0
I believe you are trying to use running sum and keep track of when it changes to positive, and it can change to positive multiple times and you want the last date at which it became positive. You need LAG() in addition to running sum:
WITH cte1 AS (
-- running balance column
SELECT *
, SUM([Sum]) OVER (PARTITION BY Deal ORDER BY [Date], Id) AS RunningBalance
FROM t
), cte2 AS (
-- overdue begun column - set whenever running balance changes from l.t.e. zero to g.t. zero
SELECT *
, CASE WHEN LAG(RunningBalance, 1, 0) OVER (PARTITION BY Deal ORDER BY [Date], Id) <= 0 AND RunningBalance > 0 THEN 1 END AS OverdueBegun
FROM cte1
)
-- eliminate groups that are paid i.e. sum = 0
SELECT Deal, MAX(CASE WHEN OverdueBegun = 1 THEN [Date] END) AS RecentOverdueDate
FROM cte2
GROUP BY Deal
HAVING SUM([Sum]) <> 0
Demo on db<>fiddle
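To see what the running-balance-plus-LAG approach returns on the question's data without SQL Server, here is a minimal sketch using Python's `sqlite3` module (SQLite 3.25+ assumed). The logic mirrors the query above, but the columns `[date]` and `[Sum]` are renamed to `pay_date` and `amount` for this example, and dates are stored as ISO text.

```python
import sqlite3

# Sample data from the question: (Id, date, Customer, Deal, Currency, Sum).
ROWS = [
    (1, "2017-12-12", "1110", "111111", "USD", 12000),
    (2, "2017-12-25", "1110", "111111", "USD", 5000),
    (3, "2017-12-13", "1110", "122222", "USD", 10000),
    (4, "2018-01-13", "1110", "111111", "USD", -10100),
    (5, "2017-11-20", "2200", "222221", "USD", 25000),
    (6, "2017-12-20", "2200", "222221", "USD", 20000),
    (7, "2017-12-31", "2201", "222221", "USD", -10000),
    (8, "2017-12-29", "1110", "122222", "USD", -10000),
    (9, "2017-11-28", "2201", "222221", "USD", -30000),
]

QUERY = """
with cte1 as (
    -- running balance per deal, in date order (id breaks ties)
    select *, sum(amount) over (partition by deal order by pay_date, id) as running_balance
    from t
), cte2 as (
    -- flag rows where the balance goes from <= 0 (or no prior row) to > 0
    select *,
           case when lag(running_balance, 1, 0) over (partition by deal order by pay_date, id) <= 0
                     and running_balance > 0
                then 1 end as overdue_begun
    from cte1
)
select deal, max(case when overdue_begun = 1 then pay_date end) as recent_overdue_date
from cte2
group by deal
having sum(amount) <> 0      -- drop deals whose net balance is zero
order by deal
"""


def recent_overdues(rows):
    """Return (deal, date the current overdue began) for deals with a non-zero balance."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (id integer, pay_date text, customer text, "
        "deal text, currency text, amount integer)"
    )
    conn.executemany("insert into t values (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


recent = recent_overdues(ROWS)
for deal, started in recent:
    print(deal, started)
```

For Deal '222221' the running balance goes 25000, -5000, 15000, 5000, so the flagged row is the one where it moves from -5000 to 15000; Deal '122222' nets to zero and is filtered out.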
You can use window functions. These can calculate intermediate values:
Last day when the sum is negative (i.e. last "good" record).
Last sum
Then you can combine these:
select deal, min(date) as last_overdue_start_date
from (select t.*,
first_value(sum) over (partition by deal order by date desc) as last_sum,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where last_sum > 0 and date > max_date_neg
group by deal;
Actually, the value on the last date is not necessary. So this simplifies to:
select deal, min(date) as last_overdue_start_date
from (select t.*,
max(case when sum < 0 then date end) over (partition by deal order by date) as max_date_neg
from t
) t
where date > max_date_neg
group by deal;

Count previous consecutive rows in SQL Server

I have an attendance data list, which is shown below. I am trying to query the data for a specific date range (01/05/2016 – 07/05/2016) with a Total Present column. The Total Present column should be calculated from the previous present (P) data. Suppose today is 04/05/2016. If a person has status 'P' on 01, 02, 03 and 04, then the row for 04-05-2016 should show a total present of 4.
Could you help me find the total present from this result set?
You can check this example, which has the logic to calculate a running sum over the previous rows.
declare @t table (employeeid int, datecol date, status varchar(2) )
insert into @t values (10001, '01-05-2016', 'P'),
(10001, '02-05-2016', 'P'),
(10001, '03-05-2016', 'P'),
(10001, '04-05-2016', 'P'),
(10001, '05-05-2016', 'A'),
(10001, '06-05-2016', 'P'),
(10001, '07-05-2016', 'P'),
(10001, '08-05-2016', 'L'),
(10002, '07-05-2016', 'P'),
(10002, '08-05-2016', 'L')
--select * from @t
select * ,
    -- running count of 'P' rows per employee, up to and including the current row
    SUM(case when status = 'P' then 1 else 0 end) OVER (PARTITION BY employeeid ORDER BY employeeid, datecol
        ROWS BETWEEN UNBOUNDED PRECEDING
        AND current row) as TotalPresent
from
    @t
Another twist on the same thing via a CTE (since you mention SQL Server 2012: the solution below only works in SQL Server 2012 and above):
;with cte as
(
    select employeeid , datecol , ROW_NUMBER() over(partition by employeeid order by employeeid, datecol) rowno
    from
        @t where status = 'P'
)
select t.*, cte.rowno ,
    case when ( isnull(cte.rowno, 0) = 0)
        then LAG(cte.rowno) OVER (ORDER BY t.employeeid, t.datecol)
        else cte.rowno
    end LagValue
from @t t left join cte on t.employeeid = cte.employeeid and t.datecol = cte.datecol
order by t.employeeid, t.datecol
You could use a subquery to calculate TotalPresent for each row:
SELECT
main.EmployeeID,
main.[Date],
main.[Status],
(
SELECT SUM(CASE WHEN t.[Status] = 'P' THEN 1 ELSE 0 END)
FROM [TableName] t
WHERE t.EmployeeID = main.EmployeeID AND t.[Date] <= main.[Date]
) as TotalPresent
FROM [TableName] main
ORDER BY
main.EmployeeID,
main.[Date]
Here I used a subquery to sum over the records that have the same EmployeeID and a date less than or equal to the date of the current row. If the status of a record is 'P', then 1 is added to the sum, otherwise 0, so only records with status P are counted.
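To compare the windowed running sum with the correlated subquery, the sketch below runs both against the sample attendance rows using Python's `sqlite3` module (SQLite 3.25+ assumed, standing in for SQL Server). The table name `attendance` is made up for this example, and the dates are rewritten in ISO `yyyy-mm-dd` form so that text comparison orders them correctly. Both queries compute a running total of 'P' rows that does not reset after an absence.

```python
import sqlite3

# Sample rows from the earlier answer, with dates rewritten as ISO text.
ATTENDANCE = [
    (10001, "2016-05-01", "P"), (10001, "2016-05-02", "P"),
    (10001, "2016-05-03", "P"), (10001, "2016-05-04", "P"),
    (10001, "2016-05-05", "A"), (10001, "2016-05-06", "P"),
    (10001, "2016-05-07", "P"), (10001, "2016-05-08", "L"),
    (10002, "2016-05-07", "P"), (10002, "2016-05-08", "L"),
]

QUERY_WINDOW = """
select employeeid, datecol, status,
       sum(case when status = 'P' then 1 else 0 end)
           over (partition by employeeid order by datecol
                 rows between unbounded preceding and current row) as total_present
from attendance
order by employeeid, datecol
"""

QUERY_SUBQUERY = """
select main.employeeid, main.datecol, main.status,
       (select sum(case when t.status = 'P' then 1 else 0 end)
        from attendance t
        where t.employeeid = main.employeeid
          and t.datecol <= main.datecol) as total_present
from attendance main
order by main.employeeid, main.datecol
"""


def run(query, rows):
    """Execute one of the queries above against an in-memory copy of the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table attendance (employeeid integer, datecol text, status text)")
    conn.executemany("insert into attendance values (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


window_rows = run(QUERY_WINDOW, ATTENDANCE)
subquery_rows = run(QUERY_SUBQUERY, ATTENDANCE)
for row in window_rows:
    print(row)
print("same result:", window_rows == subquery_rows)
```

On larger tables the window version usually scans the data once, whereas the correlated subquery is re-evaluated per row, so it may be worth comparing plans on real data.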
Interesting question, this should work:
select *
, (select count(retail) from p g
where g.date <= p.date and g.id = p.id and retail = 'P')
from p
order by ID, Date;
So I believe I understand correctly: you would like to count the occurrences of P per ID, date-wise.
This makes a lot of sense. That is why the first occurrence of ID2 was L and the Total is 0. This query counts the P statuses up to each row and pauses at non-P rows for each ID.
Here is an example