ROW_NUMBER() Based on Dates - sql

I have the following data:
test_date
2018-07-01
2018-07-02
...
2019-06-30
2019-07-01
2019-07-02
...
2020-06-30
2020-07-01
I want to increment a row_number value every time right(test_date,5) = '07-01' so that my final result looks like this:
test_date row_num
2018-07-01 1
2018-07-02 1
... 1
2019-06-30 1
2019-07-01 2
2019-07-02 2
... 2
2020-06-30 2
2020-07-01 3
I tried doing something like this:
, ROW_NUMBER() OVER (
PARTITION BY CASE WHEN RIGHT(a.[test_date],5) = '07-01' THEN 1 ELSE 0 END
ORDER BY a.[test_date]
) AS [test2]
But that did not work out for me.
Any suggestions?

Use datepart to identify the correct date, and then add 1 to a sum every time it changes (assuming there will never be more than 1 row per date).
create table #Test (test_date date);
insert into #Test (test_date)
values
('2018-07-01'),
('2018-07-02'),
('2019-06-30'),
('2019-07-01'),
('2019-07-02'),
('2020-06-30'),
('2020-07-01');
select *
, sum(case when datepart(month,test_date) = 7 and datepart(day,test_date) = 1 then 1 else 0 end) over (order by test_date asc) row_num
from #Test
order by test_date asc;
Returns:
test_date | row_num
---------- | -------
2018-07-01 | 1
2018-07-02 | 1
2019-06-30 | 1
2019-07-01 | 2
2019-07-02 | 2
2020-06-30 | 2
2020-07-01 | 3
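For readers without a SQL Server instance to hand, here is a minimal sketch of the same running-sum idea using Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions). The table name `test` and the `strftime` call are stand-ins for the T-SQL temp table and `datepart`; they are not part of the answer above.

```python
import sqlite3

# Hypothetical in-memory table holding the sample dates from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (test_date TEXT)")
dates = ["2018-07-01", "2018-07-02", "2019-06-30", "2019-07-01",
         "2019-07-02", "2020-06-30", "2020-07-01"]
conn.executemany("INSERT INTO test VALUES (?)", [(d,) for d in dates])

# Every July 1st contributes 1 to the running total; all other rows contribute 0.
query = """
SELECT test_date,
       SUM(CASE WHEN strftime('%m-%d', test_date) = '07-01' THEN 1 ELSE 0 END)
           OVER (ORDER BY test_date) AS row_num
FROM test
ORDER BY test_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

As in the answer, this relies on there being at most one row per date; a second 07-01 row would bump the counter twice.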

You can do it with DENSE_RANK() window function if you subtract 6 months from your dates:
SELECT test_date,
DENSE_RANK() OVER (ORDER BY YEAR(DATEADD(month, -6, test_date))) row_num
FROM tablename
See the demo.
Results:
test_date | row_num
---------- | -------
2018-07-01 | 1
2018-07-02 | 1
2019-06-30 | 1
2019-07-01 | 2
2019-07-02 | 2
2020-06-30 | 2
2020-07-01 | 3
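To see why shifting by six months works, here is a small plain-Python sketch of the same logic. The helper name `period_start_year` is made up for illustration; it mirrors what YEAR(DATEADD(month, -6, test_date)) evaluates to, and the dictionary stands in for DENSE_RANK().

```python
from datetime import date

def period_start_year(d: date) -> int:
    """Year in which the July-to-June period containing d began."""
    # Dates from July onward keep their year; January to June fall back a year.
    return d.year if d.month >= 7 else d.year - 1

dates = [date(2018, 7, 1), date(2018, 7, 2), date(2019, 6, 30),
         date(2019, 7, 1), date(2019, 7, 2), date(2020, 6, 30),
         date(2020, 7, 1)]

# Dense rank: number the distinct period years 1, 2, 3, ... in ascending order.
years = sorted({period_start_year(d) for d in dates})
rank = {year: position + 1 for position, year in enumerate(years)}
row_nums = [rank[period_start_year(d)] for d in dates]

for d, n in zip(dates, row_nums):
    print(d.isoformat(), n)
```

One difference from the running-sum approach: this numbering does not depend on a 07-01 row actually being present, because it is derived from each date on its own.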

Build a running total that increments when month = 7 and day = 1:
create table #Test (mykey int, test_date date);
insert into #Test (mykey, test_date)
values
(1,'2018-07-01'),
(2,'2018-07-02'),
(3,'2019-06-30'),
(4,'2019-07-01'),
(5,'2019-07-02'),
(6,'2020-06-30'),
(7,'2020-07-01');
select mykey, test_date,
sum(case when DatePart(Month, test_date) = 7 and DatePart(Day, test_date) = 1 then 1 else 0 end) over (order by mykey) RunningTotal
from #Test
order by mykey;

Related

Add a counting condition into dense_rank window Function SQL

I have a function that counts how many times you've visited and if you have converted or not.
What I'd like is for the dense_rank to re-start the count, if there has been a conversion:
SELECT
uid,
channel,
time,
conversion,
dense_rank() OVER (PARTITION BY uid ORDER BY time asc) as visit_order
FROM table
current table output:
This customer (uid) had a conversion at visit 18. I would now like the visit_order count from dense_rank to restart at 0 for the same customer and keep counting until the next non-null conversion.
See this (I do not like "try this" 😉):
SELECT
id,
ts,
conversion,
-- SC,
ROW_NUMBER() OVER (PARTITION BY id, SC ORDER BY ts) R
FROM (
SELECT
id,
ts,
conversion,
-- COUNT(conversion) OVER (PARTITION BY id, conversion=0 ORDER BY ts ) CC,
SUM(CASE WHEN conversion=1 THEN 1000 ELSE 1 END) OVER (PARTITION BY id ORDER BY ts ) - SUM(CASE WHEN conversion=1 THEN 1000 ELSE 1 END) OVER (PARTITION BY id ORDER BY ts )%1000 SC
FROM sample
ORDER BY ts
) x
ORDER BY ts;
DBFIDDLE
output:
id | ts | conversion | R
-- | ------------------- | ---------- | -
1 | 2022-01-15 10:00:00 | 0 | 1
1 | 2022-01-16 10:00:00 | 0 | 2
1 | 2022-01-17 10:00:00 | 0 | 3
1 | 2022-01-18 10:00:00 | 1 | 1
1 | 2022-01-19 10:00:00 | 0 | 2
1 | 2022-01-20 10:00:00 | 0 | 3
1 | 2022-01-21 10:00:00 | 0 | 4
1 | 2022-01-22 10:00:00 | 0 | 5
1 | 2022-01-23 10:00:00 | 0 | 6
1 | 2022-01-24 10:00:00 | 0 | 7
1 | 2022-01-25 10:00:00 | 1 | 1
1 | 2022-01-26 10:00:00 | 0 | 2
1 | 2022-01-27 10:00:00 | 0 | 3
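A possibly simpler variant of the same idea, sketched with Python's sqlite3 module (SQLite 3.25+ assumed): instead of the 1000-multiplier arithmetic, it uses a running SUM(conversion) as the group key. This is a swapped-in technique, not the query above, and it assumes conversion is stored as 0 or 1 as in the output shown; the table contents are rebuilt here from that output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, ts TEXT, conversion INTEGER)")

# Conversion flags for 2022-01-15 .. 2022-01-27, copied from the output above.
flags = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]
data = [(1, f"2022-01-{15 + i:02d} 10:00:00", flag) for i, flag in enumerate(flags)]
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", data)

# The running sum of conversions stays constant between conversions, so it
# labels each stretch of visits; ROW_NUMBER then restarts inside each label.
query = """
SELECT id, ts, conversion,
       ROW_NUMBER() OVER (PARTITION BY id, grp ORDER BY ts) AS r
FROM (
    SELECT id, ts, conversion,
           SUM(conversion) OVER (PARTITION BY id ORDER BY ts) AS grp
    FROM sample
)
ORDER BY id, ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Here the converting visit itself gets number 1, matching the output above. If the count should restart on the visit after the conversion instead, the running sum would need to exclude the current row.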

Add first and last date of a sequence

I am working on a database which has a huge collection of rows. I want to update it so that repeated records are deleted. I have a date column in the table and I want to convert it into startDate and endDate. Please check:
id | date | price | minutes | prefixId | sellerId | routeTypeId
1234 | 2020-01-01 | 0.123 | 0 | 1 | 1 | 1
1235 | 2020-01-04 | 0.123 | 0 | 1 | 1 | 1
1236 | 2020-01-05 | 0.123 | 123 | 1 | 1 | 1
1237 | 2020-01-06 | 0.123 | 31 | 1 | 1 | 1
1238 | 2020-01-07 | 0.123 | 23 | 1 | 1 | 1
1239 | 2020-01-08 | 0.130 | 41 | 1 | 2 | 1
1240 | 2020-01-09 | 0.130 | 0 | 1 | 1 | 1
What I am looking for is:
id | startDate | endDate | price | minutes | prefixId | sellerId | routeTypeId
1234 | 2020-01-01 | 2020-01-01 | 0.123 | 0 | 1 | 1 | 1
1235 | 2020-01-04 | 2020-01-07 | 0.123 | 0 | 1 | 1 | 1
1239 | 2020-01-08 | 2020-01-08 | 0.130 | 41 | 1 | 2 | 1
1240 | 2020-01-09 | 2020-01-09 | 0.130 | 0 | 1 | 2 | 2
Dates are considered part of one series if price, prefixId, sellerId and routeTypeId stay the same as in the previous row and the date column forms a sequence without gaps. For example, 2020-01-01, 2020-01-02, 2020-01-10 form two different series.
This is a gaps-and-islands problem. You can use lag() and a cumulative sum:
select price, prefixId, sellerId, routeTypeId,
       min(minutes),
       min(date), max(date)
from (select t.*,
             sum(case when prev_date = date - interval '1 day' then 0 else 1 end) over (partition by price, prefixId, sellerId, routeTypeId order by date) as grp
      from (select t.*,
                   lag(date) over (partition by price, prefixId, sellerId, routeTypeId order by date) as prev_date
            from t
           ) t
     ) t
group by grp, price, prefixId, sellerId, routeTypeId
This is a "Gaps & Islands" problem. You can do it using:
select
min(id) as id,
min(date) as start_date,
max(date) as end_date,
min(price) as price,
...
from (
select *,
sum(inc) over(order by id) as grp
from (
select *,
case when price = lag(price) over(order by id)
and date = lag(date) over(
partition by price, prefixId, sellerId, routeTypeId
order by id)
+ interval '1 day'
then 0 else 1 end as inc
from t
) x
) y
group by grp
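The lag-plus-cumulative-sum pattern from both answers can be tried out with the sample rows using Python's sqlite3 module. This is a sketch under a few assumptions: SQLite 3.25+ for window functions, `julianday` standing in for the `interval '1 day'` arithmetic, and the date column renamed to `day` to keep clear of the built-in `date` function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (id INTEGER, day TEXT, price REAL, minutes INTEGER,
                                prefixId INTEGER, sellerId INTEGER, routeTypeId INTEGER)""")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1234, "2020-01-01", 0.123, 0, 1, 1, 1),
    (1235, "2020-01-04", 0.123, 0, 1, 1, 1),
    (1236, "2020-01-05", 0.123, 123, 1, 1, 1),
    (1237, "2020-01-06", 0.123, 31, 1, 1, 1),
    (1238, "2020-01-07", 0.123, 23, 1, 1, 1),
    (1239, "2020-01-08", 0.130, 41, 1, 2, 1),
    (1240, "2020-01-09", 0.130, 0, 1, 1, 1),
])

# Step 1: flag a row as the start of an island unless the previous row with
#         the same attributes is exactly one day earlier.
# Step 2: a running sum of the flags numbers the islands within each partition.
# Step 3: group by partition and island number to get the date range.
query = """
WITH flagged AS (
    SELECT *,
           CASE WHEN julianday(day) - julianday(LAG(day) OVER (
                         PARTITION BY price, prefixId, sellerId, routeTypeId
                         ORDER BY day)) = 1
                THEN 0 ELSE 1 END AS is_start
    FROM t
),
numbered AS (
    SELECT *,
           SUM(is_start) OVER (
               PARTITION BY price, prefixId, sellerId, routeTypeId
               ORDER BY day) AS grp
    FROM flagged
)
SELECT MIN(id) AS id, MIN(day) AS start_date, MAX(day) AS end_date,
       price, prefixId, sellerId, routeTypeId
FROM numbered
GROUP BY price, prefixId, sellerId, routeTypeId, grp
ORDER BY start_date
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

With the input rows as given, ids 1235 to 1238 collapse into one range from 2020-01-04 to 2020-01-07, and the other three rows remain single-day ranges.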

Oracle SQL - Select users between two date by month

I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190630','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date('20190531','yyyymmdd') between a.start_date and nvl(a.end_date, date '9999-12-31')
union all
...
Not very optimized or sustainable if I want to cover 10 years, i.e. 120 months, lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_NUMBER() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
    OtdYear,
    OtdMonth,
    ActiveCount
FROM
(
    -- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth = RowsInMonth )
    SELECT
        OnThisDate,
        OtdYear,
        OtdMonth,
        ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
        COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
        ActiveCount
    FROM
    (
        SELECT
            OnThisDate,
            YEAR( OnThisDate ) AS OtdYear,
            MONTH( OnThisDate ) AS OtdMonth,
            SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
        FROM
        (
            SELECT
                StartDate AS [OnThisDate],
                1 AS [Change]
            FROM
                tbl
            UNION ALL
            SELECT
                ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
                -1 AS [Change]
            FROM
                tbl
        ) AS sq1
    ) AS sq2
) AS sq3
WHERE
    RowInMonth = RowsInMonth
ORDER BY
    OtdYear,
    OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
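The steps described above, plus the gap filling that the answer leaves to application code, can be sketched in plain Python. The names `FAR_FUTURE`, `last_in_month` and `fill_months` are illustrative choices, and the data is the four customers from the question.

```python
from datetime import date

FAR_FUTURE = date(9999, 12, 31)  # stand-in for a NULL end date

# (start, end) pairs from the question; None means still active.
customers = [
    (date(2018, 1, 1), date(2019, 1, 1)),
    (date(2018, 1, 1), None),
    (date(2019, 1, 1), date(2019, 6, 1)),
    (date(2017, 1, 1), date(2019, 3, 1)),
]

# Step 1: one +1 event per start date and one -1 event per end date.
events = []
for start, end in customers:
    events.append((start, 1))
    events.append((end or FAR_FUTURE, -1))

# Step 2: sort by date, keep a running total, and remember the total
# after the last event of each month.
events.sort()
active = 0
last_in_month = {}
for day, change in events:
    active += change
    last_in_month[(day.year, day.month)] = active

def fill_months(changes, first, last):
    """Repeat the latest known count for every month from first to last."""
    filled, count = {}, 0
    year, month = min(min(changes), first)
    while (year, month) <= last:
        count = changes.get((year, month), count)
        if (year, month) >= first:
            filled[(year, month)] = count
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return filled

monthly = fill_months(last_in_month, (2018, 6), (2019, 7))
for (year, month), count in monthly.items():
    print(f"{year}-{month:02d}", count)
```

Like the query, this treats a customer as no longer active from their end date onward, so customer 1 is not counted in 2019-01.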
This query returns one row for every month from the minimum start date in the table to the maximum end date.
You can narrow the range by adding a condition in a WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE);
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
);
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!

Using EXISTS within a GROUP BY clause

Is it possible to do the following:
I have a table that looks like this:
CREATE TABLE #tran_TABLE (
EOMONTH DATE,
AccountNumber INT,
CLASSIFICATION_NAME VARCHAR(50),
Value Float
)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',10)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',15)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat1',5 )
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat2',10)
INSERT INTO #tran_TABLE VALUES('2018-11-30','123','cat3',12)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat1',5 )
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat2',10)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat2',15)
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat3',5 )
INSERT INTO #tran_TABLE VALUES('2019-01-31','123','cat3',2 )
INSERT INTO #tran_TABLE VALUES('2019-03-31','123','cat1',15)
EOMONTH AccountNumber CLASSIFICATION_NAME Value
2018-11-30 123 cat1 10
2018-11-30 123 cat1 15
2018-11-30 123 cat1 5
2018-11-30 123 cat2 10
2018-11-30 123 cat3 12
2019-01-31 123 cat1 5
2019-01-31 123 cat2 10
2019-01-31 123 cat2 15
2019-01-31 123 cat3 5
2019-01-31 123 cat3 2
2019-03-31 123 cat1 15
I want to produce a result where it will check whether in each month, for each AccountNumber (just one in this case) there exists a CLASSIFICATION_NAME cat1, cat2, cat3.
If all 3 exist for the month, then return 1 but if any are missing return 0.
The result should look like:
EOMONTH AccountNumber CLASSIFICATION_NAME
2018-11-30 123 1
2019-01-31 123 1
2019-03-31 123 0
But I want to do it as compactly as possible, without first creating a table that groups everything by CLASSIFICATION_NAME, EOMONTH and AccountNumber and then selects from that table.
For example, in the pseudo code below, is it possible to use maybe an EXISTS statement to do the group by?
SELECT
EOMONTH
,AccountNumber
,CASE WHEN EXISTS (CLASSIFICATION_NAME = 'cat1' AND 'cat2' AND 'cat3') THEN 1 ELSE 0 end
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY
EOMONTH
,AccountNumber
You could emulate this behavior by counting the distinct classifications that satisfy this condition (per group):
SELECT
EOMONTH
,AccountNumber
,CASE COUNT(DISTINCT CASE WHEN classification_name IN ('cat1', 'cat2', 'cat3') THEN classification_name END)
WHEN 3 THEN 1
ELSE 0
END
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY
EOMONTH
,AccountNumber
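To illustrate why the CASE inside COUNT(DISTINCT ...) matters, here is a small sketch using Python's sqlite3 module. The table and column names are shortened, and the 'cat4' row is hypothetical data added only to show the difference; it does not appear in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tran (eomonth TEXT, account INTEGER, cls TEXT, value REAL)")
conn.executemany("INSERT INTO tran VALUES (?, ?, ?, ?)", [
    ("2018-11-30", 123, "cat1", 10),
    ("2018-11-30", 123, "cat2", 10),
    ("2018-11-30", 123, "cat3", 12),
    ("2019-03-31", 123, "cat1", 15),
    ("2019-03-31", 123, "cat2", 5),
    ("2019-03-31", 123, "cat4", 1),   # hypothetical extra category
])

# any_three: three distinct classifications of any kind.
# all_three: cat1, cat2 and cat3 specifically; other values become NULL
#            inside the CASE and are ignored by COUNT(DISTINCT ...).
query = """
SELECT eomonth, account,
       CASE WHEN COUNT(DISTINCT cls) = 3 THEN 1 ELSE 0 END AS any_three,
       CASE WHEN COUNT(DISTINCT CASE WHEN cls IN ('cat1', 'cat2', 'cat3')
                                     THEN cls END) = 3
            THEN 1 ELSE 0 END AS all_three
FROM tran
GROUP BY eomonth, account
ORDER BY eomonth
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

For the second month the plain distinct count reports 1 even though cat3 is missing, while the conditional count reports 0.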
Try this-
SELECT EOMONTH,
AccountNumber,
CASE
WHEN COUNT(DISTINCT CLASSIFICATION_NAME) = 3 THEN 1
ELSE 0
END CLASSIFICATION_NAME
FROM #tran_TABLE
GROUP BY EOMONTH,AccountNumber
Output is-
2018-11-30 123 1
2019-01-31 123 1
2019-03-31 123 0
A query like this counts distinct values.
The column 'Three_Unique_Cat' checks for any three distinct classifications; the column 'Three_Cat1_Cat2_Cat3' checks for exactly 'cat1', 'cat2' and 'cat3'.
SELECT
EOMONTH, AccountNumber
,CASE WHEN
COUNT(DISTINCT CLASSIFICATION_NAME)=3 THEN 1
ELSE 0
END AS 'Three_Unique_Cat'
,CASE WHEN
COUNT(DISTINCT CASE WHEN CLASSIFICATION_NAME IN ('cat1','cat2','cat3')
THEN CLASSIFICATION_NAME ELSE NULL END)=3 THEN 1
ELSE 0
END AS 'Three_Cat1_Cat2_Cat3'
,SUM(Value) AS totalSpend
FROM #tran_TABLE
GROUP BY EOMONTH, AccountNumber
Output:
EOMONTH AccountNumber Three_Unique_Cat Three_Cat1_Cat2_Cat3 totalSpend
2018-11-30 123 1 1 52
2019-01-31 123 1 1 37
2019-03-31 123 0 0 15
It's easy, just as below:
select
EOMONTH,
AccountNumber,
case when count(distinct CLASSIFICATION_NAME) = 3 then 1 else 0 end as CLASSIFICATION_NAME
from
#tran_TABLE
group by
EOMONTH,
AccountNumber

Query and Partition By clause group by window

I've the following code
create table #test (id int, [Status] int, [Date] date)
insert into #test (Id,[Status],[Date]) VALUES
(1,1,'2018-01-01'),
(2,1,'2018-01-01'),
(1,1,'2017-11-01'),
(1,2,'2017-10-01'),
(1,1,'2017-09-01'),
(2,2,'2017-01-01'),
(1,1,'2017-08-01'),
(1,1,'2017-07-01'),
(1,1,'2017-06-01'),
(1,2,'2017-05-01'),
(1,1,'2017-04-01'),
(1,1,'2017-03-01'),
(1,1,'2017-01-01')
SELECT
id,
[Status],
MIN([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as WindowStart,
max([Date]) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status]) as WindowEnd,
COUNT(*) OVER (PARTITION BY id,[Status] ORDER BY [Date],id,[Status] ) as total
from #test
But the result is this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-01-01 1
1 1 2017-01-01 2017-03-01 2
1 1 2017-01-01 2017-04-01 3
1 1 2017-01-01 2017-06-01 4
1 1 2017-01-01 2017-07-01 5
1 1 2017-01-01 2017-08-01 6
1 1 2017-01-01 2017-09-01 7
1 1 2017-01-01 2017-11-01 8
1 1 2017-01-01 2018-01-01 9
1 2 2017-05-01 2017-05-01 1
1 2 2017-05-01 2017-10-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
I need the rows grouped by window, like this:
id Status WindowStart WindowEnd total
1 1 2017-01-01 2017-04-01 3
1 2 2017-05-01 2017-05-01 1
1 1 2017-06-01 2017-09-01 4
1 2 2017-10-01 2017-10-01 1
1 1 2017-11-01 2018-01-01 2
2 1 2018-01-01 2018-01-01 1
2 2 2017-01-01 2017-01-01 1
The first group for the id= 1 Status = 1 should end at the first row with Status = 2 (2017-05-01) so the total is 3 and then start again from the 2017-06-01 to 2017-09-01 with a total of 4 rows.
How can get this done?
This is a "classic" Gaps and Islands problem. There are probably thousands of answers for these on the Internet.
This works for what you're after, but try doing a bit more research beforehand. :)
WITH Groups AS(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [Date]) -
ROW_NUMBER() OVER (PARTITION BY id, [status] ORDER BY [Date]) AS Grp
FROM #test t)
SELECT G.id,
G.[Status],
MIN([Date]) AS WindowStart,
MAX([Date]) AS WindowEnd,
COUNT(*) AS Total
FROM Groups G
GROUP BY G.id,
G.[Status],
G.Grp
ORDER BY G.id, WindowStart;
Note that the ordering of your last 2 lines is the other way round in this solution; it seems you're ordering ASCENDING for id 1 but DESCENDING for id 2 in your expected results.
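The difference-of-row-numbers trick can be checked against the sample data with Python's sqlite3 module. This is a sketch assuming SQLite 3.25+; the CTE is named `numbered` and the date column `day`, both arbitrary choices. Within one id, the two row numbers advance together as long as the status stays the same, so their difference is constant inside each run and changes whenever a row with the other status intervenes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, status INTEGER, day TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, 1, "2018-01-01"), (2, 1, "2018-01-01"), (1, 1, "2017-11-01"),
    (1, 2, "2017-10-01"), (1, 1, "2017-09-01"), (2, 2, "2017-01-01"),
    (1, 1, "2017-08-01"), (1, 1, "2017-07-01"), (1, 1, "2017-06-01"),
    (1, 2, "2017-05-01"), (1, 1, "2017-04-01"), (1, 1, "2017-03-01"),
    (1, 1, "2017-01-01"),
])

query = """
WITH numbered AS (
    SELECT id, status, day,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY day)
         - ROW_NUMBER() OVER (PARTITION BY id, status ORDER BY day) AS grp
    FROM test
)
SELECT id, status, MIN(day) AS window_start, MAX(day) AS window_end,
       COUNT(*) AS total
FROM numbered
GROUP BY id, status, grp
ORDER BY id, window_start
"""
windows = conn.execute(query).fetchall()
for row in windows:
    print(row)
```

The grp value only means something in combination with id and status, which is why all three appear in the GROUP BY.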
Here is one way using LAG function
;WITH cte
AS (SELECT *,
grp = Sum(CASE WHEN prev_val = Status THEN 0 ELSE 1 END)
OVER(partition BY id ORDER BY Date)
FROM (SELECT *,
prev_val = Lag(Status)OVER(partition BY id ORDER BY Date)
FROM #test) a)
SELECT id,
Status,
WindowStart = Min(date),
WindowEnd = Max(date),
Total = Count(*)
FROM cte
GROUP BY id, Status, grp
Using the LAG function, first find the previous status for each row; then use SUM() OVER() to build a group number that increments only when the status changes.