So I have a scenario where I need to order data on a column without including it in dense_rank(). Here is my sample data set:
This is the table:
create table temp
(
id integer,
prod_name varchar(max),
source_system integer,
source_date date,
col1 integer,
col2 integer);
This is the dataset:
insert into temp
(id,prod_name,source_system,source_date,col1,col2)
values
(1,'ABC',123,'01/01/2021',50,60),
(2,'ABC',123,'01/15/2021',50,60),
(3,'ABC',123,'01/30/2021',40,60),
(4,'ABC',123,'01/30/2021',40,70),
(5,'XYZ',456,'01/10/2021',80,30),
(6,'XYZ',456,'01/12/2021',75,30),
(7,'XYZ',456,'01/20/2021',75,30),
(8,'XYZ',456,'01/20/2021',99,30);
Now, I want to do dense_rank() on the data in such a way that for a combination of "prod_name and source_system", the rank gets incremented only if there is a change in col1 or col2 but the data should still be in ascending order of source_date.
Here is the expected result:
id
prod_name
source_system
source_date
col1
col2
Dense_Rank
1
ABC
123
01-01-21
50
60
1
2
ABC
123
15-01-21
50
60
1
3
ABC
123
30-01-21
40
60
2
4
ABC
123
30-01-21
40
70
3
5
XYZ
456
10-01-21
80
30
1
6
XYZ
456
12-01-21
75
30
2
7
XYZ
456
20-01-21
75
30
2
8
XYZ
456
20-01-21
99
30
3
As you can see above, the dates are changing but the expectation is that rank should only change if there is any change in either col1 or col2.
If I use this query
select id,prod_name,source_system,source_date,col1,col2,
dense_rank() over(partition by prod_name,source_system order by source_date,col1,col2) as rnk
from temp;
Then the result would come as:
id
prod_name
source_system
source_date
col1
col2
rnk
1
ABC
123
01-01-21
50
60
1
2
ABC
123
15-01-21
50
60
2
3
ABC
123
30-01-21
40
60
3
4
ABC
123
30-01-21
40
70
4
5
XYZ
456
10-01-21
80
30
1
6
XYZ
456
12-01-21
75
30
2
7
XYZ
456
20-01-21
75
30
3
8
XYZ
456
20-01-21
99
30
4
And, if I exclude source_date from order by in rank function i.e.
select id,prod_name,source_system,source_date,col1,col2,
dense_rank() over(partition by prod_name,source_system order by col1,col2) as rnk
from temp;
Then my result is coming as:
id
prod_name
source_system
source_date
col1
col2
rnk
3
ABC
123
30-01-21
40
60
1
4
ABC
123
30-01-21
40
70
2
1
ABC
123
01-01-21
50
60
3
2
ABC
123
15-01-21
50
60
3
6
XYZ
456
12-01-21
75
30
1
7
XYZ
456
20-01-21
75
30
1
5
XYZ
456
10-01-21
80
30
2
8
XYZ
456
20-01-21
99
30
3
Both the results are incorrect. How can I get the expected result? Any guidance would be helpful.
WITH cte AS (
SELECT *,
LAG(col1) OVER (PARTITION BY prod_name, source_system ORDER BY source_date, id) lag1,
LAG(col2) OVER (PARTITION BY prod_name, source_system ORDER BY source_date, id) lag2
FROM temp
)
SELECT *,
SUM(CASE WHEN (col1, col2) = (lag1, lag2)
THEN 0
ELSE 1
END) OVER (PARTITION BY prod_name, source_system ORDER BY source_date, id) AS `Dense_Rank`
FROM cte
ORDER BY id;
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ac70104c7c5dfb49c75a8635c25716e6
When comparing multiple columns, I like to look at the previous values of the ordering column, rather than the individual columns. This makes it much simpler to add more and more columns.
The basic idea is to do a cumulative sum of changes for each prod/source system. In Redshift, I would phrase this as:
select t.*,
sum(case when prev_date = prev_date_2 then 0 else 1 end) over (
partition by prod_name, source_system
order by source_date
rows between unbounded preceding and current row
)
from (select t.*,
lag(source_date) over (partition by prod_name, source_system order by source_date, id) as prev_date,
lag(source_date) over (partition by prod_name, source_system, col1, col2 order by source_date, id) as prev_date_2
from temp t
) t
order by id;
I think I have the syntax right for Redshift. Here is a db<>fiddle using Postgres.
Note that ties on the date can cause a problem -- regardless of the solution. This uses the id to break the ties. Perhaps id can just be used in general, but your code is using the date, so this uses the date with the id.
Related
https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=9e6f83edf836f4496afb509eb9411d4a
Edited to include sql code:
CREATE TABLE TMP_PRODUCTS (STORE INT, UPC INT, PROMOCODE CHAR(3), FORSALE CHAR(1))
INSERT INTO TMP_PRODUCTS VALUES
(100,1,'123','Y'),
(100,2,'123','Y'),
(100,3,'123','N'),
(100,4,'124','Y'),
(100,5,'124','N'),
(100,6,'124','N'),
(100,7,'125','N'),
(100,8,'125','N'),
(100,9,'125','N');
SELECT
STORE,
UPC,
PROMOCODE,
DENSE_RANK() OVER (PARTITION BY STORE ORDER BY PROMOCODE) AS 'GroupCode'
FROM
TMP_PRODUCTS
WHERE
FORSALE = 'Y'
I need to return all rows where FORSALE='Y' across all groups of PROMOCODE, and also at least 1 row from all groups where FORSALE='N'. In this example all products from group 125 are FORSALE='N', but I need at least 1 row to return. Here is the output I am currently getting:
STORE UPC PROMOCODE GroupCode FORSALE
100 1 123 1 Y
100 2 123 1 Y
100 4 124 2 Y
But here is the ideal output I would like to get:
STORE UPC PROMOCODE GroupCode FORSALE
100 1 123 1 Y
100 2 123 1 Y
100 4 124 2 Y
100 7 125 3 N
It would also be completely acceptable to return 1 row from PROMOCODE 123 and 124 even though they already have some items that are FORSALE='Y'. So this would also be acceptable outcome:
STORE UPC PROMOCODE GroupCode FORSALE
100 1 123 1 Y
100 2 123 1 Y
100 3 123 1 N
100 4 124 2 Y
100 5 124 2 N
100 7 125 3 N
You can do that with an additional row number window function to always include 1 row from each group regardless of Y/N
select STORE, UPC, PROMOCODE, Dense_Rank() over (partition by STORE order by PROMOCODE) GROUPCODE, FORSALE
from (
select * , Row_Number() over(partition by STORE, PROMOCODE order by UPC) rn
from TMP_PRODUCTS
)x
where FORSALE = 'Y' or rn=1
If I understand correctly, the logic you want is:
SELECT STORE, UPC, PROMOCODE,
DENSE_RANK() OVER (PARTITION BY STORE ORDER BY PROMOCODE) AS GroupCode
FROM (SELECT P.*,
ROW_NUMBER() OVER (PARTITION BY STORE, PROMOCODE, FORSALE ORDER BY (SELECT NULL)) as seqnum
FROM TMP_PRODUCTS P
) P
WHERE FORSALE = 'Y' OR seqnum = 1;
I am having some trouble with the below query. I do understand I need to group by ID and Category, but I only want to group by ID while keeping the rest of the columns based on Rank being max. Is there a way to only group by certain columns?
select ID, Category, max(rank)
from schema.table1
group by ID
Input:
ID Category Rank
111 3 4
111 1 5
123 5 3
124 7 2
Current Output
ID Category Rank
111 3 4
111 9 1
123 5 3
124 7 2
Desired Output
ID Category Rank
111 1 5
123 5 3
124 7 2
You can use:
select *
from table1
where (id, rank) in (select id, max(rank) from table1 group by id)
Result:
ID CATEGORY RANK
---- --------- ----
111 1 5
123 5 3
124 7 2
Or you can use the ROW_NUMBER() window function. For example:
select *
from (
select *,
row_number() over(partition by id order by rank desc) as rn
from table1
) x
where rn = 1
See running example at db<>fiddle.
You can try using - row_number()
select * from
(
select ID, Category,rank, row_number() over(partition by id order by rank desc) as rn
from schema.table1
)A where rn=1
I have something similar to the below dataset...
ID RowNumber
101 1
101 2
101 3
101 4
101 5
101 1
101 2
What I would like to get is an additional column as below...
ID RowNumber New
101 1 1
101 2 1
101 3 1
101 4 1
101 5 1
101 1 2
101 2 2
I have toyed with dense_rank(), but no such luck.
Gordon already mentioned, you required a column to specify the order of data. If i consider ID as order by column, this following logic may help you to get your desired result-
WITH your_table(ID,RowNumber)
AS
(
SELECT 101,1 UNION ALL
SELECT 101,2 UNION ALL
SELECT 101,3 UNION ALL
SELECT 101,4 UNION ALL
SELECT 101,5 UNION ALL
SELECT 101,1 UNION ALL
SELECT 101,2
)
SELECT A.ID,A.RowNumber,
SUM(RN) OVER
(
ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) +1 New
FROM
(
SELECT *,
CASE
WHEN LAG(RowNumber) OVER(ORDER BY ID) > RowNumber THEN 1
ELSE 0
END RN
FROM your_table
)A
Above will always change the ROW NUMBER if the value in RowNumber decreased than previous one. Alternatively, the same output alsoo can be achieved if you wants to change row number whenever value 1 found. This is bit static option-
SELECT A.ID,A.RowNumber,
SUM(RN) OVER
(
ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) New
FROM(
SELECT *,
CASE
WHEN RowNumber = 1 THEN 1
ELSE 0
END RN
FROM your_table
)A
Output is-
ID RowNumber New
101 1 1
101 2 1
101 3 1
101 4 1
101 5 1
101 1 2
101 2 2
SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering.
Assuming you have such a column, you can do what you want simply by counting the number of "1" up to each point:
select t.*,
sum(case when rownumber = 1 then 1 else 0 end) over (partition by id order by <ordering column>) as new
from t;
As Gordon alluded to, there is no default order in your example so it's difficult to imagine how to get a deterministic result (e.g. where the same values supplied to the same query always resulting in the exact same answer.)
This sample data includes a sequential PK column which is used to define the order of this set.
DECLARE #tbl TABLE (PK INT IDENTITY, ID INT, RowNumber INT)
INSERT #tbl(ID, RowNumber) VALUES (101,1),(101,2),(101,3),(101,4),(101,5),(101,1),(101,2);
SELECT t.* FROM #tbl AS t;
Returns:
PK ID RowNumber
----- ------ -----------
1 101 1
2 101 2
3 101 3
4 101 4
5 101 5
6 101 1
7 101 2
This query uses DENSE_RANK to get you what you want:
DECLARE #tbl TABLE (PK INT IDENTITY, ID INT, RowNumber INT)
INSERT #tbl(ID, RowNumber) VALUES (101,1),(101,2),(101,3),(101,4),(101,5),(101,1),(101,2);
SELECT t.ID, t.RowNumber, New = DENSE_RANK() OVER (ORDER BY t.PK - RowNumber)
FROM #tbl AS t;
Returns:
ID RowNumber New
----- ----------- ------
101 1 1
101 2 1
101 3 1
101 4 1
101 5 1
101 1 2
101 2 2
Note that ORDER BY New does not affect the plan.
Please try the below
Load the data into Temp table
Select id,RowNumber,Row_number()over(partition by RowNumber Order by id)New from #temp
Order by Row_number()over(partition by RowNumber Order by id),RowNumber
How to get the max value order of each customer ?
select num, max(sum(paid*quantity))
from orders join
pizza
using (order#)
group by customer#;
table
num orderN price
-------- --- -------
1 109 30
1 118 25
3 101 30
3 115 27
4 107 23
5 100 17
5 129 16
output req-
num Pnum price
-------- --- -------
1 109 30
3 101 30
4 107 23
5 100 17
You want to select the record having the highest price in each group of nums.
If your RDBMS supports window functions, that's straight forward with ROW_NUMBER() :
SELECT num, pnum, price
FROM (
SELECT t.*, ROW_NUMBER OVER(PARTITION BY num ORDER BY price DESC) rn
FROM mytable t
) x
WHERE rn = 1
Else, you can take the following approach, that uses a NOT EXISTS condition with a correlated subquery to ensure that the record being joined in the one with the highest price for the current num :
SELECT num, pnum, price
FROM mytable t
WHERE NOT EXISTS (
SELECT 1 FROM mytable t1 WHERE t1.num = t.num AND t1.price > t.price
)
I am using SQL Server 2008 R2 and have a structure as below:
create table #temp( deptid int, regionid int)
insert into #temp
select 15000, 50
union
select 15100, 51
union
select 15200, 50
union
select 15300, 52
union
select 15400, 50
union
select 15500, 51
union
select 15600, 52
select deptid, regionid, RANK() OVER(PARTITION BY regionid ORDER BY deptid) AS 'RANK',
ROW_NUMBER() OVER(PARTITION BY regionid ORDER BY deptid) AS 'ROW_NUMBER',
DENSE_RANK() OVER(PARTITION BY regionid ORDER BY deptid) AS 'DENSE_RANK'
from #temp
drop table #temp
And output currently is as below:
deptid regionid RANK ROW_NUMBER DENSE_RANK
--------------------------------------------------
15000 50 1 1 1
15200 50 2 2 2
15400 50 3 3 3
15100 51 1 1 1
15500 51 2 2 2
15300 52 1 1 1
15600 52 2 2 2
My requirement however is to row_number over regionid column but by grouping and not row by row. To explain better, below is my desired result set.
deptid regionid RN
-----------------------
15000 50 1
15200 50 1
15400 50 1
15100 51 2
15500 51 2
15300 52 3
15600 52 3
Please let me know if my question is unclear. Thanks.
Use dense_rank() over (order by regionid) to get the expected result.
select deptid, regionid,
DENSE_RANK() OVER( ORDER BY regionid) AS 'DENSE_RANK'
from #temp
Partitioning within a rank/row_number window function will assign numbers within the partitions, so you don't need to use a partition on regionid to order the regionids themselves.