How to rank partitions by date order when values which I am partitioning on can repeat?


I have a query which looks for the number of different values of a key field over a period of time and assigns a rank to the values in the order they occur.
So, for example I might have:
ID Date Value
1 2010-01-01 125.00
1 2010-02-01 125.00
1 2010-03-01 130.00
1 2010-04-01 131.00
1 2010-05-01 131.00
1 2010-06-01 131.00
1 2010-07-01 126.00
1 2010-08-01 140.00
I am using
ROW_NUMBER() over(partition by [ID] order by [Date]) as [row]
to rank the different values of the Value column in the date order they occur. So I would get something like
Value row
125.00 1
130.00 2
131.00 3
126.00 4
etc
The problem I am having is that sometimes a value might repeat. For example, suppose the value on 1st August in the data above were 125.00. I want to treat that as a separate occurrence, but with the ranking function I am currently using it obviously gets grouped into the same partition as the other instances of 125.00 when the row number is calculated.
What's the easiest way for me to overcome this problem please? Thanks in advance!

This should work:
WITH A
AS
(SELECT ID, [Date], [Value], ROW_NUMBER() over(partition by [ID] order by [Value], [Date]) as [row]
FROM YourTable)
SELECT A.[Date], A.[Value], B.min_row as [row]
FROM A JOIN (SELECT ID, [Value], MIN([row]) AS min_row
FROM A
GROUP BY ID, [Value]) AS B
ON A.ID = B.ID AND A.[Value] = B.[Value]
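The query above gives each distinct value a single number, so a later repeat of 125.00 still shares the number of the first 125.00. A commonly used alternative for this kind of problem is the "gaps and islands" technique: subtracting a ROW_NUMBER partitioned by (ID, Value) from a ROW_NUMBER partitioned by ID alone yields a number that stays constant within each consecutive run of the same value. The sketch below runs that idea against an in-memory SQLite database from Python (SQLite 3.25+ is needed for window functions); the table name YourTable comes from the answer, while the sample data (with 2010-08-01 set to 125.00, as in the question) and the helper name ranked_runs are illustrative assumptions.

```python
import sqlite3

# Sample data from the question, with the 2010-08-01 value set to 125.00
# to reproduce the "repeated value" case (illustrative data).
ROWS = [
    (1, "2010-01-01", 125.00),
    (1, "2010-02-01", 125.00),
    (1, "2010-03-01", 130.00),
    (1, "2010-04-01", 131.00),
    (1, "2010-05-01", 131.00),
    (1, "2010-06-01", 131.00),
    (1, "2010-07-01", 126.00),
    (1, "2010-08-01", 125.00),
]

# Gaps and islands: ROW_NUMBER over the whole ID minus ROW_NUMBER within
# (ID, Value) is constant across a consecutive run of the same value,
# so (ID, Value, grp) identifies each run ("island").
QUERY = """
WITH numbered AS (
    SELECT ID, Date, Value,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date)
         - ROW_NUMBER() OVER (PARTITION BY ID, Value ORDER BY Date) AS grp
    FROM YourTable
),
islands AS (
    -- One row per run, remembering when the run started
    SELECT ID, Value, MIN(Date) AS first_date
    FROM numbered
    GROUP BY ID, Value, grp
)
SELECT Value,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY first_date) AS rn
FROM islands
ORDER BY ID, rn
"""


def ranked_runs(rows):
    """Return (Value, rank) pairs, one per consecutive run, in date order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (ID INTEGER, Date TEXT, Value REAL)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for value, rn in ranked_runs(ROWS):
        print(value, rn)
```

With this data the runs come out as 125.0 → 1, 130.0 → 2, 131.0 → 3, 126.0 → 4 and 125.0 → 5, so the repeated 125.00 gets its own rank. The window-function syntax used here is also available in SQL Server, where the column names would typically be bracketed as [Date] and [Value].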

Related

How to efficiently get the max data in each subgroup and the min data among the bigger groups?

First, I want to get the max date in each subgroup.
Group A = action 1 & 2
Group B = action 3 & 4
actionName action actionBy actiontime
999 1 Tom 2022-07-15 09:18:00
999 1 Tom 2022-07-15 15:21:00
999 2 Peter 2022-07-15 14:06:00
999 2 Peter 2022-07-15 14:08:00
999 3 Sally 2022-07-15 14:20:00
999 3 Mary 2022-07-15 14:22:00
999 4 Mary 2022-07-15 14:25:00
In this example:
The max time of group A is "1 | Tom | 2022-07-15 15:21:00 "
The max time of group B is " 4 | Mary | 2022-07-15 14:25:00 "
The final answer is "1 | Tom | 2022-07-15 14:25:00 ", which is the minimum data among groups.
I have a way to get the max date in each group, shown in the following code.
with cte1
as (select actionName,
actiontime,
actionBy,
row_number() over (partition by actionName order by actiontime desc) as rn
from actionDetails
where action in ( '1', '2' )
UNION
select actionName,
actiontime,
actionBy,
row_number() over (partition by actionName order by actiontime desc) as rn
from actionDetails
where action in ( '3', '4' )
)
select *
from cte1
where rn = 1
actionName is not the primary key. This query gets the max row in each group.
However, I don't know an efficient way to then get the minimum between group A and group B. Could you give me some ideas?
I know one option is another self-join, but I don't think that is the best solution.

First of all, you can simplify your query by putting the action groups into the partition clause. Use a case expression to get one group for actions 1 and 2 and another for actions 3 and 4.
Then after getting the maximum dates per actionname and action group you want to get the minimum dates of these per actionname. This means you want a second CTE building up on the first one:
with max_per_group as
(
select top(1) with ties
actionname,
actiontime,
actionby
from actiondetails
where action in (1, 2, 3, 4)
order by row_number()
over (partition by actionname, case when action <= 2 then 1 else 2 end
order by actiontime desc)
)
, min_of_max as
(
select top(1) with ties
actionname,
actiontime,
actionby
from max_per_group
order by row_number() over (partition by actionname order by actiontime)
)
select actionname, actiontime, actionby
from min_of_max
order by actionname;
As you see, instead of computing a row number and then having to limit rows based on it in the next query, I limit the rows right away by putting the row numbering into the ORDER BY clause and applying TOP(1) WITH TIES to get all rows numbered 1. I like this a tad better, because the CTE already produces the rows that I want to work with rather than only marking them in a bigger data set. But that's personal preference, I guess.
Disclaimer:
In my query I assume that the column action is numeric. If the column is a string instead, because it can hold values that are not numbers, then work with strings:
where action in ('1', '2', '3', '4')
partition by actionname, case when action in ('1', '2') then 1 else 2 end
If on the other hand the column is a string, but there are only numbers in that column, fix your table instead.
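To try the two-step logic (max per action group, then min of those maxima) without SQL Server, here is a sketch against an in-memory SQLite database from Python, using the question's sample rows. SQLite has no TOP (1) WITH TIES, so this version uses the "compute a row number, then filter on it" pattern the answer contrasts with; the table name actionDetails comes from the question, while the helper name min_of_group_maxima and the renamed row-number columns rn_group and rn_min are illustrative assumptions.

```python
import sqlite3

# Sample rows from the question: (actionName, action, actionBy, actiontime)
ROWS = [
    (999, 1, "Tom", "2022-07-15 09:18:00"),
    (999, 1, "Tom", "2022-07-15 15:21:00"),
    (999, 2, "Peter", "2022-07-15 14:06:00"),
    (999, 2, "Peter", "2022-07-15 14:08:00"),
    (999, 3, "Sally", "2022-07-15 14:20:00"),
    (999, 3, "Mary", "2022-07-15 14:22:00"),
    (999, 4, "Mary", "2022-07-15 14:25:00"),
]

QUERY = """
WITH max_per_group AS (
    -- Latest row per actionName within group A (actions 1, 2) and group B (3, 4)
    SELECT actionName, actiontime, actionBy,
           ROW_NUMBER() OVER (
               PARTITION BY actionName,
                            CASE WHEN "action" <= 2 THEN 1 ELSE 2 END
               ORDER BY actiontime DESC) AS rn_group
    FROM actionDetails
    WHERE "action" IN (1, 2, 3, 4)
),
min_of_max AS (
    -- Earliest of those group maxima per actionName
    SELECT actionName, actiontime, actionBy,
           ROW_NUMBER() OVER (PARTITION BY actionName ORDER BY actiontime) AS rn_min
    FROM max_per_group
    WHERE rn_group = 1
)
SELECT actionName, actiontime, actionBy
FROM min_of_max
WHERE rn_min = 1
ORDER BY actionName
"""


def min_of_group_maxima(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE actionDetails
           (actionName INTEGER, "action" INTEGER, actionBy TEXT, actiontime TEXT)"""
    )
    con.executemany("INSERT INTO actionDetails VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(min_of_group_maxima(ROWS))
```

Group A's maximum is Tom at 15:21 and group B's is Mary at 14:25, so the query returns the single row (999, '2022-07-15 14:25:00', 'Mary'). Timestamps are stored as ISO-style text, which sorts in the same order as the datetimes.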

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query set up (I have changed the variables to more generic names for posting here, so the query may not make perfect sense) that picks the most recent date for a given account. The query returns rows with a reason_type of 1 and the most recent date. It filters on effective_date IS NOT NULL.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is to basically use that logic to only apply to reason_type
if there is an entry for it, otherwise have it default to reason_type
I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code that I currently have to return the most recent reason_type 1 entry.
I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
SELECT account, date, effective_date, value, reason_type,
ROW_NUMBER() over (partition by account order by date desc) rn
from mytable
WHERE value is not null
AND effective_date is not null
) latest
WHERE rn =1

I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.
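The answer's query picks the newest row per account first and only then applies the effective_date condition, so an account whose newest row has a future effective_date drops out entirely instead of falling back to an older row. A sketch of the same shape in SQLite via Python makes that easy to see; here a bound :as_of parameter stands in for Oracle's SYSDATE (an assumption, so the example is deterministic), date(:as_of, '+1 day') plays the role of TRUNC(SYSDATE+1), and the question's dates are rewritten as ISO strings so they sort correctly as text. The helper name latest_rows is illustrative.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (YYYY-MM-DD)
# so that text comparison and ORDER BY behave like date comparison.
ROWS = [
    (123456, "2017-04-20", "2017-05-01", 5, 1),
    (123456, "2017-01-20", "2017-02-01", 10, 1),
    (987654, "2018-02-05", "2018-03-01", 15, 1),
    (987654, "2017-12-31", "2018-02-01", 20, 1),
    (456789, "2018-04-27", "2018-05-01", 50, 1),
    (456789, "2018-01-24", "2018-02-01", 60, 1),
    (456123, "2017-04-25", None, 15, 2),
    (789123, "2017-05-01", None, 16, 2),
    (666888, "2018-02-01", None, 31, 2),
    (333222, "2018-01-01", None, 20, 2),
]

# Same shape as the answer's query: newest row per account first, then keep it
# only if effective_date is NULL or on/before the as-of date.
QUERY = """
SELECT account, date, effective_date, value, reason_type
FROM (
    SELECT account, date, effective_date, value, reason_type,
           ROW_NUMBER() OVER (PARTITION BY account ORDER BY date DESC) AS rn
    FROM mytable
    WHERE value IS NOT NULL
) AS latest
WHERE rn = 1
  AND (effective_date IS NULL OR effective_date < date(:as_of, '+1 day'))
ORDER BY account
"""


def latest_rows(as_of):
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE mytable (account INTEGER, date TEXT,
           effective_date TEXT, value INTEGER, reason_type INTEGER)"""
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"as_of": as_of}).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in latest_rows("2018-02-15"):
        print(row)
```

With an as-of date of 2018-02-15, accounts 456789 and 987654 are dropped because their newest rows take effect later; with a later as-of date every account returns exactly one row.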

SQL Query - Design struggle

I am fairly new to SQL Server (2012), but I was assigned a project where I have to use it.
The database consists of one table (millions of rows) which mainly looks like this:
Number (float) Date (datetime) Status (nvarchar(255))
999 2016-01-01 14:00:00.000 Error
999 2016-01-02 14:00:00.000 Error
999 2016-01-03 14:00:00.000 Ok
999 2016-01-04 14:00:00.000 Error
888 2016-01-01 14:00:00.000 Error
888 2016-01-02 14:00:00.000 Ok
888 2016-01-03 14:00:00.000 Error
888 2016-01-04 14:00:00.000 Error
777 2016-01-01 14:00:00.000 Error
777 2016-01-02 14:00:00.000 Error
I have to create a query which will show me only the phone numbers (one number per row so probably Group by number?) that meet the conditions:
The number appears at least 3 times
The last two occurrences (by date; the records are not stored in date order) are both Error
For example, in the table above the only phone number that meets the criteria is 888, because for 999 the 2nd newest status is Ok, and 777 occurs only 2 times.
I will appreciate any kind of help!
Thanks in advance!

You can use row_number() and conditional aggregation:
select number
from (select t.*,
row_number() over (partition by number order by date desc) as seqnum
from t
) t
group by number
having count(*) >= 3 and
max(case when seqnum = 1 then status end) = 'Error' and
max(case when seqnum = 2 then status end) = 'Error';
Note: float is a really, really bad type to use for the "number" column. In particular, two numbers can look the same but differ in low-order bits. They will produce different rows in the group by.
You should probably use varchar() for telephone numbers. That gives you the most flexibility. If you need to store the number as a number, then decimal/numeric is a much, much better choice than float.
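The conditional-aggregation answer can be checked against the question's sample data with an in-memory SQLite database from Python. Following the advice above, Number is stored as text here; the table name t matches the answer, and the helper name failing_numbers is an illustrative assumption.

```python
import sqlite3

# Sample data from the question. Number is stored as TEXT, following the
# answer's advice to avoid float for phone numbers.
ROWS = [
    ("999", "2016-01-01 14:00:00.000", "Error"),
    ("999", "2016-01-02 14:00:00.000", "Error"),
    ("999", "2016-01-03 14:00:00.000", "Ok"),
    ("999", "2016-01-04 14:00:00.000", "Error"),
    ("888", "2016-01-01 14:00:00.000", "Error"),
    ("888", "2016-01-02 14:00:00.000", "Ok"),
    ("888", "2016-01-03 14:00:00.000", "Error"),
    ("888", "2016-01-04 14:00:00.000", "Error"),
    ("777", "2016-01-01 14:00:00.000", "Error"),
    ("777", "2016-01-02 14:00:00.000", "Error"),
]

QUERY = """
SELECT number
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) AS seqnum
      FROM t) AS numbered
GROUP BY number
HAVING COUNT(*) >= 3
   -- seqnum 1 and 2 are the newest and second-newest rows per number
   AND MAX(CASE WHEN seqnum = 1 THEN status END) = 'Error'
   AND MAX(CASE WHEN seqnum = 2 THEN status END) = 'Error'
ORDER BY number
"""


def failing_numbers(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (number TEXT, date TEXT, status TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = [r[0] for r in con.execute(QUERY)]
    con.close()
    return result


if __name__ == "__main__":
    print(failing_numbers(ROWS))
```

Only 888 passes: 999's second-newest status is Ok, and 777 has just two rows. Because each number collapses to one group, this returns one row per number, as the question asks.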

select Number, Date, Status, times
FROM
(
select Number, Date, Status,
ROW_NUMBER() OVER (partition by Number order by Date desc) as times
From [table]
where Number in
(
select Number
from [table]
group by Number
having count(*) >= 3
)
) as ABC
WHERE ABC.times in (1,2) and ABC.Status = 'Error'

with CTE as
(
select t1.*, row_number() over(partition by t1.Number order by t1.date desc) as r_ord
from MyTable t1
)
select C1.*
from CTE C1
inner join
(
select Number
from CTE
group by Number
having max(r_ord) >=3
) C2
on C1.Number = C2.Number
where C1.r_ord in (1,2)
and C1.Status = 'Error'

add column based on a column value in one row

I have this table with the following data:
user Date Dist Start
1 2014-09-03 150 12500
1 2014-09-04 220 null
1 2014-09-05 100 null
2 2014-09-03 290 18000
2 2014-09-04 90 null
2 2014-09-05 170 null
Based on the Start column, I need to add another column that repeats the user's non-null Start value on every row for that user.
The resultant table should be as below
user Date Dist Start StartR
1 2014-09-03 150 12500 12500
1 2014-09-04 220 null 12500
1 2014-09-05 100 null 12500
2 2014-09-03 290 18000 18000
2 2014-09-04 90 null 18000
2 2014-09-05 170 null 18000
Can someone please help me with this query? I have no idea how to do it.

For the data you have, you can use a window function:
select t.*, min(t.start) over (partition by t.[user]) as StartR
from [table] t
You can readily update using the same idea:
-- assumes a StartR column has already been added to the table
with toupdate as (
select t.*, min(t.start) over (partition by t.[user]) as new_StartR
from [table] t
)
update toupdate
set StartR = new_StartR;
Note: this works for the data in the question and how you have phrased the question. It would not work if there were multiple Start values for a given user, or if there were NULL values that you wanted to keep before the first non-NULL Start value.

You can use COALESCE/ISNULL and a correlated sub-query:
SELECT [user], [Date], [Dist], [Start],
StartR = ISNULL([Start], (SELECT MIN([Start])
FROM dbo.TableName t2
WHERE t.[User] = t2.[User]
AND t2.[Start] IS NOT NULL))
FROM dbo.TableName t
I have used MIN([Start]) since you haven't said what should happen if there are multiple Start values for one user that are not NULL.
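Both answers can be compared side by side on the question's data with an in-memory SQLite database from Python. The table name TableName follows the second answer; "user" and "date" are quoted as identifiers, and the helper name start_r_rows is an illustrative assumption.

```python
import sqlite3

# Sample data from the question: (user, date, dist, start)
ROWS = [
    (1, "2014-09-03", 150, 12500),
    (1, "2014-09-04", 220, None),
    (1, "2014-09-05", 100, None),
    (2, "2014-09-03", 290, 18000),
    (2, "2014-09-04", 90, None),
    (2, "2014-09-05", 170, None),
]

# Window-function version: every row gets the smallest non-NULL Start
# for its user (MIN ignores NULLs).
WINDOW_QUERY = """
SELECT "user", "date", dist, start,
       MIN(start) OVER (PARTITION BY "user") AS StartR
FROM TableName
ORDER BY "user", "date"
"""

# Correlated-subquery version: keep Start when present, otherwise look up
# the user's minimum non-NULL Start.
SUBQUERY_QUERY = """
SELECT t."user", t."date", t.dist, t.start,
       COALESCE(t.start,
                (SELECT MIN(t2.start) FROM TableName t2
                 WHERE t2."user" = t."user" AND t2.start IS NOT NULL)) AS StartR
FROM TableName t
ORDER BY t."user", t."date"
"""


def start_r_rows(query):
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE TableName ("user" INTEGER, "date" TEXT, dist INTEGER, start INTEGER)'
    )
    con.executemany("INSERT INTO TableName VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in start_r_rows(WINDOW_QUERY):
        print(row)
```

For this data both queries produce the same StartR column (12500 for user 1, 18000 for user 2). They would diverge if a user had several different non-NULL Start values, since the subquery version keeps each row's own Start.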

SQL ORDER BY with grouping

I have the following query
SELECT Id, Request, BookingDate, BookingId FROM Table ORDER BY Request DESC, Date
If rows share the same ForeignKeyId, I would like them to stay together, placed before the next ordered row, like this:
Request Date ForeignKeyId
Request3 01-Jun-11 56
Request2 03-Jun-11 89
NULL 03-Jun-11 89
Request1 05-Jun-11 11
NULL 20-Jul-11 57
I have been looking at RANK and OVER but haven't found a simple fix.
EDIT
I've edited above to show the actual fields and pasted data using the following query from Andomar's answer
select *
from (
select row_number() over (partition by BookingId order by Request DESC) rn
, Request, BookingDate, BookingID
from Table
WHERE Date = '28 aug 11'
) G
order by
rn
, Request DESC, BookingDate
1 ffffff 23/01/2011 15:57 350821
1 ddddddd 10/01/2011 16:28 348856
1 ccccccc 13/09/2010 14:44 338120
1 aaaaaaaaaa 21/05/2011 20:21 364422
1 123 17/09/2010 16:32 339202
1 NULL NULL
2 gggggg 08/12/2010 14:39 346634
2 NULL NULL
2 17/09/2010 16:32 339202
2 NULL 10/04/2011 15:08 361066
2 NULL 02/05/2011 14:12 362619
2 NULL 11/06/2011 13:55 366082
3 NULL NULL
3 16/10/2010 13:06 343023
3 22/10/2010 10:35 343479
3 30/04/2011 10:49 362435
The rows with booking ID 339202 should appear next to each other, but they don't.

You could partition by ForeignKeyId and then sort every row after the first one below its "head", where the "head" is the first row for that ForeignKeyId. For example, sorting on Request:
; with numbered as
(
select row_number() over (partition by ForeignKeyID order by Request) rn
, *
from #t
)
select *
from numbered n1
order by
(
select Request
from numbered n2
where n2.ForeignKeyID = n1.ForeignKeyID
and n2.rn = 1
)
, n1.ForeignKeyID
, n1.Request
The correlated subquery is needed to look up the Request of each group's head (the row with rn = 1), so that every row sorts by its group's head first and stays next to it.
Full example at SE Data.
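As a variation on the answer, a window function such as FIRST_VALUE can fetch each group's head Request in one pass instead of a correlated subquery (SQL Server supports FIRST_VALUE from 2012 onward). The sketch below runs it in SQLite from Python on the question's first sample table, sorting descending to match the output the question asks for. The table name bookings, the helper name grouped_order and the ISO-formatted dates are illustrative assumptions.

```python
import sqlite3

# Question's sample rows: (Request, BookingDate, ForeignKeyId), dates as ISO text
ROWS = [
    ("Request1", "2011-06-05", 11),
    ("Request2", "2011-06-03", 89),
    (None, "2011-06-03", 89),
    ("Request3", "2011-06-01", 56),
    (None, "2011-07-20", 57),
]

QUERY = """
WITH headed AS (
    SELECT Request, BookingDate, ForeignKeyId,
           -- Request of the group's first row when sorted by Request DESC
           FIRST_VALUE(Request) OVER (
               PARTITION BY ForeignKeyId ORDER BY Request DESC) AS head_request
    FROM bookings
)
SELECT Request, BookingDate, ForeignKeyId
FROM headed
-- Sort groups by their head, keep each group together, then order inside it
ORDER BY head_request DESC, ForeignKeyId, Request DESC, BookingDate
"""


def grouped_order(rows):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE bookings (Request TEXT, BookingDate TEXT, ForeignKeyId INTEGER)"
    )
    con.executemany("INSERT INTO bookings VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in grouped_order(ROWS):
        print(row)
```

In SQLite, NULLs sort lowest, so with DESC the NULL Request rows land after the non-NULL head of their group, and a group whose head is NULL (ForeignKeyId 57) goes last. The result is the order shown in the question: Request3, Request2, NULL (89), Request1, NULL (57).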

We Keep Coding
