T-SQL re-rank when group field changes

I'm stuck! I am trying to create a counter that restarts at 1 whenever the group field changes.
This is what I am trying to get:
ProdID Date counter
123 1/1/2016 1
123 1/2/2016 2
123 1/3/2016 3
123 1/4/2016 4
456 1/1/2016 1
456 1/2/2016 2
789 1/1/2016 1
789 1/2/2016 2
789 1/3/2016 3
789 1/4/2016 4
789 1/5/2016 5
When I use RANK() with OVER, the counter doesn't reset when ProdID changes.

If you're just trying to select the data then this should give you those results:
SELECT
ProdID,
[Date], -- A poor name for a column, since it's not only a reserved word, but also not at all descriptive
ROW_NUMBER() OVER (PARTITION BY ProdID ORDER BY [Date]) AS counter
FROM
My_Table
PARTITION BY tells SQL Server that you want the windows for the ROW_NUMBER windowed function to be partitioned by the ProdID. Imagine breaking up your data into groups by ProdID. The ORDER BY tells it to order the data within each window by the Date before applying the function.
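If you want to verify the reset behaviour end to end, here is a minimal, self-contained sketch (the table name My_Table and the sample rows are just the question's data re-typed for testing):
CREATE TABLE My_Table (ProdID int, [Date] date);

INSERT INTO My_Table (ProdID, [Date]) VALUES
(123, '2016-01-01'), (123, '2016-01-02'), (123, '2016-01-03'), (123, '2016-01-04'),
(456, '2016-01-01'), (456, '2016-01-02'),
(789, '2016-01-01'), (789, '2016-01-02'), (789, '2016-01-03'),
(789, '2016-01-04'), (789, '2016-01-05');

-- ROW_NUMBER restarts at 1 for every new ProdID partition
SELECT ProdID,
       [Date],
       ROW_NUMBER() OVER (PARTITION BY ProdID ORDER BY [Date]) AS counter
FROM My_Table
ORDER BY ProdID, [Date];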

Did you try this?
SELECT ProdID, Date, ROW_NUMBER() OVER
    (PARTITION BY ProdID ORDER BY Date)
    AS Counter
FROM My_Table
ORDER BY ProdID, Date

Related

Dense Rank grouping by IDs

I am having trouble getting my DENSE_RANK() function in Oracle to work how I would like. First, my dataset:
ID DATE
1234 01-OCT-2020
1234 01-OCT-2021
1234 01-OCT-2022
2345 01-APR-2020
2345 01-APR-2021
2345 01-APR-2022
I am trying to use the dense rank function to return results with a sequence number based on the DATE field, and grouping by ID. How I want the data to return:
ID DATE SEQ
1234 01-OCT-2020 1
1234 01-OCT-2021 2
1234 01-OCT-2022 3
2345 01-APR-2020 1
2345 01-APR-2021 2
2345 01-APR-2022 3
The query I have so far:
SELECT ID, DATE, DENSE_RANK() Over (order by ID, DATE asc) as SEQ
However, this returns incorrect results: the sequence number runs up to 6, as if it were disregarding my intention to sequence by the DATE field within each ID. If anyone has any insights into how to make this work, it would be very much appreciated!
You want row_number():
select id, date, row_number() over (partition by id order by date) as seq
You could actually use dense_rank() as well, if you want duplicate dates to share the same sequence number. The key idea is partition by.
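To see the difference, here is a quick sketch of the two functions side by side (the table name my_table is an assumption, and DATE is quoted because it is a reserved word in Oracle):
SELECT ID,
       "DATE",
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY "DATE") AS seq_row,
       DENSE_RANK() OVER (PARTITION BY ID ORDER BY "DATE") AS seq_dense
FROM my_table;
-- If ID 1234 had two rows dated 01-OCT-2021, seq_row would give them 2 and 3,
-- while seq_dense would give both of them 2 and the next date 3.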

Find Column With Max on Other Column Grouping By Another Column

I have a table like this:
Id - ItemId - Price - SalesId - Date
1 12 99.99924 21899234 2025-01-01 00:00:00.000000
2 123 12.34567 348923 2021-01-01 00:00:00.000000
3 1234 1234.5 3321234 2022-01-01 00:00:00.000000
4 12345 3.3246 2154234 2023-01-01 00:00:00.000000
5 1234 451.234 3423 2020-02-01 00:00:00.000000
6 12345 0.989 71112357 2020-09-15 20:20:10.000000
7 123 3435.3 71112357 2020-09-14 20:10:12.000000
I am trying to find the Price of an Item at its latest Date. For example, for ItemId = 1234 the row with the latest date is 2022-01-01 00:00:00.000000 (Id = 3), and its price is 1234.5. That price is what I am trying to return with this query.
I am a beginner to SQL and tried the following query, but it gives me this error:
select "ItemId",
max("Date"),
"Price"
from "Products"
group by "ItemId"
[42803] ERROR: column "Products.Price" must appear in the GROUP BY clause or be used in an aggregate function
I appreciate any help here. Thank you!
In Postgres, you can use distinct on:
select distinct on ("ItemId") p.*
from "Products" p
order by "ItemId", "Date" desc;
Note: if you are learning SQL, don't use double quotes around table and column names; quoted identifiers are case-sensitive and have to be quoted everywhere.
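If you only need a single item, e.g. ItemId = 1234 from the question, the same idea collapses to a filter plus LIMIT (a sketch using the quoted identifiers from the question):
SELECT "Price"
FROM "Products"
WHERE "ItemId" = 1234
ORDER BY "Date" DESC
LIMIT 1;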
You can try using row_number():
select *
from (
    select ItemId, Date, Price,
           row_number() over (partition by ItemId order by Date desc) as rn
    from Products
) A
where rn = 1

Expanding/changing my query to find more entries using (potentially) IFELSE

My question will use this dataset as an example. I have a query set up (I have changed variable names to more generic ones for the sake of posting this on the internet, so the query may not make perfect sense) that picks the most recent date for a given account. The query returns values with a reason_type of 1 that have the most recent date, and it filters on effective_date is not null.
account date effective_date value reason_type
123456 4/20/2017 5/1/2017 5 1
123456 1/20/2017 2/1/2017 10 1
987654 2/5/2018 3/1/2018 15 1
987654 12/31/2017 2/1/2018 20 1
456789 4/27/2018 5/1/2018 50 1
456789 1/24/2018 2/1/2018 60 1
456123 4/25/2017 null 15 2
789123 5/1/2017 null 16 2
666888 2/1/2018 null 31 2
333222 1/1/2018 null 20 2
What I am looking to do now is to apply that logic to reason_type 1 when an account has an entry for it, and otherwise fall back to its reason_type 2 entry. I think I should be using an IFELSE, but I'm admittedly not knowledgeable about how I would go about that.
Here is the code I currently have to return the most recent reason_type 1 entry. I hope my question is clear.
SELECT account, date, effective_date, value, reason_type
from
(
SELECT account, date, effective_date, value, reason_type,
ROW_NUMBER() over (partition by account order by date desc) rn
from mytable
WHERE value is not null
AND effective_date is not null
)
WHERE rn =1
I think you might want something like this (do you really have a column named date by the way? That seems like a bad idea):
SELECT account, date, effective_date, value, reason_type
FROM (
SELECT account, date, effective_date, value, reason_type
, ROW_NUMBER() OVER ( PARTITION BY account ORDER BY date DESC ) AS rn
FROM mytable
WHERE value IS NOT NULL
) WHERE rn = 1
-- effective_date IS NULL or is on or before today's date
AND ( effective_date IS NULL OR effective_date < TRUNC(SYSDATE+1) );
Hope this helps.
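If the intent is "use the latest reason_type 1 row when the account has one, otherwise fall back to its reason_type 2 row" (my reading of the question, not something spelled out in the answer above), one way to sketch that is to fold the preference into the ROW_NUMBER ordering:
SELECT account, date, effective_date, value, reason_type
FROM (
    SELECT account, date, effective_date, value, reason_type,
           -- reason_type 1 sorts before 2, so it wins whenever it exists;
           -- within the same reason_type the most recent date wins
           ROW_NUMBER() OVER (PARTITION BY account
                              ORDER BY reason_type, date DESC) AS rn
    FROM mytable
    WHERE value IS NOT NULL
)
WHERE rn = 1;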

How to add a running count to rows in a 'streak' of consecutive days

Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak'
so the result would look like:
pid | date | in_streak
-------|-----------|----------
1 | 2014-10-1 | 1
1 | 2014-10-2 | 2
1 | 2014-10-3 | 3
1 | 2014-10-5 | 1
1 | 2014-10-7 | 1
2 | 2014-10-1 | 1
2 | 2014-10-2 | 2
2 | 2014-10-3 | 3
2 | 2014-10-5 | 1
2 | 2014-10-7 | 1
I've been trying to use the answers from
PostgreSQL: find number of consecutive days up until now
Return rows of the latest 'streak' of data
but I can't work out how to use the dense_rank() trick with other window functions to get the right result.
Building on this table (not using the SQL keyword "date" as a column name):
CREATE TABLE tbl(
pid int
, the_date date
, PRIMARY KEY (pid, the_date)
);
Query:
SELECT pid, the_date
, row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM (
SELECT *
, the_date - '2000-01-01'::date
- row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl
) sub
ORDER BY pid, the_date;
Subtracting one date from another yields an integer number of days. Since you are looking for consecutive days, every next row is greater by one. If we subtract row_number() from that, the whole streak ends up in the same group (grp) per pid. Then it's simple to deal out numbers per group.
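To make that concrete, this is what the inner subquery produces for pid 1 with the sample data (day offsets counted from 2000-01-01):
the_date   | days since 2000-01-01 | row_number | grp
2014-10-01 | 5387                  | 1          | 5386
2014-10-02 | 5388                  | 2          | 5386
2014-10-03 | 5389                  | 3          | 5386
2014-10-05 | 5391                  | 4          | 5387
2014-10-07 | 5393                  | 5          | 5388
The first three rows share grp = 5386, so they get in_streak 1, 2, 3; the two isolated dates each start a group of their own.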
grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:
the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp
One multiplication, one subtraction. String concatenation and casting are more expensive. Test with EXPLAIN ANALYZE.
Don't forget to partition by pid additionally in both steps, or you'll inadvertently mix groups that should be separated.
Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.
And since you mentioned it: dense_rank() is obviously not necessary here. Basic row_number() does the job.
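To follow the EXPLAIN ANALYZE advice, a quick way to compare the two grp formulas might look like this (Postgres; swap in either expression for grp):
EXPLAIN ANALYZE
SELECT pid,
       the_date - '2000-01-01'::date
       - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
FROM tbl;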
You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1'), (1,'2014-10-2'), (1,'2014-10-3'), (1,'2014-10-5'),
(1,'2014-10-7'), (2,'2014-10-1'), (2,'2014-10-2'), (2,'2014-10-3'),
(2,'2014-10-5'), (2,'2014-10-7');
The principle is simple: within a streak of distinct, consecutive dates, the date minus row_number() is a constant. You can group by that constant and take dense_rank() over each group.
with grouped_dates as (
select pid, date,
(date - (row_number() over (partition by pid order by date) || ' days')::interval)::date as grouping_date
from test
)
select *, dense_rank() over (partition by pid, grouping_date order by date) as in_streak
from grouped_dates
order by pid, date
pid date grouping_date in_streak
--
1 2014-10-01 2014-09-30 1
1 2014-10-02 2014-09-30 2
1 2014-10-03 2014-09-30 3
1 2014-10-05 2014-10-01 1
1 2014-10-07 2014-10-02 1
2 2014-10-01 2014-09-30 1
2 2014-10-02 2014-09-30 2
2 2014-10-03 2014-09-30 3
2 2014-10-05 2014-10-01 1
2 2014-10-07 2014-10-02 1

Assign a counter in SQL Server to records with sequential dates, and only increment when dates not sequential

I am trying to assign a Trip # to records for Customers with sequential days, and to increment the Trip # when there is a break in the sequence of days, for example when they come back later in the month. The data structure looks like this:
CustomerID Date
1 2014-01-01
1 2014-01-02
1 2014-01-04
2 2014-01-01
2 2014-01-05
2 2014-01-06
2 2014-01-08
The desired output based upon the above example dataset would be:
CustomerID Date Trip
1 2014-01-01 1
1 2014-01-02 1
1 2014-01-04 2
2 2014-01-01 1
2 2014-01-05 2
2 2014-01-06 2
2 2014-01-08 3
So if the Dates for that Customer are back-to-back, it is considered the same Trip, and has the same Trip #. Is there a way to do this in SQL Server? I am using MSSQL 2012.
My initial thoughts are to use LAG, ROW_NUMBER, or OVER/PARTITION BY, or even a recursive table variable function. I can paste some code, but in all honesty my code isn't working so far. If this is a simple query and I am just not thinking about it correctly, a pointer in the right direction would be great.
Thank you in advance.
Since Date is a DATE (i.e. it has no time part), you could, for example, use DENSE_RANK() over Date minus ROW_NUMBER() days, which gives a constant value for consecutive days, something like:
WITH cte AS (
SELECT CustomerID, Date,
DATEADD(DAY,
-ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Date),
Date) dt
FROM trips
)
SELECT CustomerID, Date,
DENSE_RANK() OVER (PARTITION BY CustomerID ORDER BY dt) AS Trip
FROM cte;
An SQLfiddle to test with.
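Since LAG was mentioned in the question, an alternative sketch (SQL Server 2012 and later, same trips table as above) is to flag the rows that start a new trip and take a running sum of those flags:
WITH flagged AS (
    SELECT CustomerID, Date,
           -- 1 when the previous row for this customer is not exactly one day earlier
           -- (or when there is no previous row), 0 otherwise
           CASE WHEN DATEDIFF(DAY,
                              LAG(Date) OVER (PARTITION BY CustomerID ORDER BY Date),
                              Date) = 1
                THEN 0 ELSE 1 END AS new_trip
    FROM trips
)
SELECT CustomerID, Date,
       -- running count of trip starts = trip number
       SUM(new_trip) OVER (PARTITION BY CustomerID ORDER BY Date
                           ROWS UNBOUNDED PRECEDING) AS Trip
FROM flagged;
For the sample data this yields Trip 1, 1, 2 for Customer 1 and 1, 2, 2, 3 for Customer 2, matching the desired output.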