DENSE_RANK() OVER (Order by UniqueIdentifer) issue - sql

I'm struggling trying to get DENSE_RANK to do what I want it to do.
It is basically to create a unique invoice number based on a unique identifier, but it needs to go up in order based on the date/time of the invoice.
For example I need:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
2 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
3 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
But what I get when using DENSE_RANK OVER (Order by TxnId) is:
InvoiceNo TxnId TxnDate
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:02
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:01
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:03
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:04
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:06
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:05
1 6C952E91-B888-4244-9079-14FBECAE0BA2 02/01/2014 00:08
1 6C952E91-B888-4244-9079-14FBECAE0BA2 01/01/2014 00:07
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:10
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:21
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:20
2 83168B53-1647-4EB9-AF17-0B285EAA69B4 03/03/2014 00:23
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
3 8A5BCC36-8A70-4BE1-9FAB-A33BDD5BB78F 02/02/2014 00:09
If I do DENSE_RANK OVER(TxnId,TxnDate), it is a complete mess and doesn't do what I want either.
Any ideas guys? Am I even using the write function to do this? Any help appreciated :)

I think you want:
select dense_rank() over (order by txnid, txndate)
Everything with the same transaction id and date will have the same value.
EDIT:
If you need to extract the date, then that depends on the database. It would look something like this. For Oracle:
select dense_rank() over (order by txnid, trunc(txndate))
For Postgres:
select dense_rank() over (order by txnid, date_trunc('day', txndate))
For SQL Server:
select dense_rank() over (order by txnid, cast(txndate as date))
EDIT II:
You want the transactions ordered by the earliest date. Get the earliest date and then do the dense_rank():
select dense_rank() over (order by txnmindate, txnid)
from (select t.*, min(txndate) over (partition by txnid) as txnmindate
from table t
) t

Related

SQL Conditional update column value based on previous row value

I have a table with attendance dates in SQL Workbench/J. I need to define the attendance periods of the people, where any attendance period with gaps less or equal than 90 days are merged into a single attendance period and any gaps larger than that are considered a different attendance period. For example for a single person this is the table I have
id
year
start_date
end_date
prev_att_month
diff
1
2012
2012-08-01
2012-08-31
2012-07-01
31
1
2012
2012-07-01
2012-07-31
2012-04-01
91
1
2012
2012-04-01
2012-04-30
2012-03-01
31
1
2012
2012-03-01
2012-03-31
2012-02-01
29
1
2012
2012-02-01
2012-02-29
2012-01-01
31
1
2012
2012-01-01
2012-01-31
2011-12-01
31
1
2011
2011-12-01
2011-12-31
2011-11-01
30
1
2011
2011-11-01
2011-11-30
2011-10-01
31
1
2011
2011-10-01
2011-10-31
2011-09-01
30
1
2011
2011-09-01
2011-09-30
2011-08-01
31
1
2011
2011-08-01
2011-08-31
2011-07-01
31
1
2011
2011-07-01
2011-07-31
2011-05-01
61
1
2011
2011-05-01
2011-05-31
2011-04-01
30
1
2011
2011-04-01
2011-04-30
2011-03-01
31
1
2011
2011-03-01
2011-03-31
2011-02-01
28
1
2011
2011-02-01
2011-02-28
2010-08-01
184
1
2010
2010-08-01
2010-08-31
2010-07-01
31
1
2010
2010-07-01
2010-07-31
2010-06-01
30
1
2010
2010-06-01
2010-06-30
2010-05-01
31
1
2010
2010-05-01
2010-05-31
2010-04-01
30
1
2010
2010-04-01
2010-04-30
where I defined the previous attendance month column with a lag function and then found the difference between that column and the the start_date in the diff column. This way I can check the gaps between the attendance months
I want this output of the attendance periods with the 90 day rule explained above:
id
start_date
end_date
1
1/04/2010
31/08/2010
1
1/02/2011
30/04/2012
1
1/07/2012
31/08/2012
Does any one have an idea of how to do this?
So far I was just able to define the difference between the attendance months but since this is a large data set I have not been able to find a solution to define the attendance periods without making a row to row analysis.
with [table] as (
select id, year, start_date, end_date,
lag(start_date) over (partition by id order by id, year, start_date) as prev_att_month,
start_date-prev_att_month as diff
from source
)
select *
from [table]
where id = 1
One method would be to use a windowed COUNT to count how many times a value greater than 90 has appeared in the diff column, which provides a unique group number. Then you can just group your data into those groups and get the MIN and MAX values:
WITH Grps AS(
SELECT V.id,
V.year,
V.start_date,
V.end_date,
V.prev_att_month,
V.diff,
COUNT(CASE WHEN diff > 90 THEN 1 END) OVER (PARTITION BY ID ORDER BY V.start_date ASC) AS Grp
FROM (VALUES(1,2012,CONVERT(date,'20120801'),CONVERT(date,'20120831'),CONVERT(date,'2012-07-01'),31),
(1,2012,CONVERT(date,'20120701'),CONVERT(date,'20120731'),CONVERT(date,'2012-04-01'),91),
(1,2012,CONVERT(date,'20120401'),CONVERT(date,'20120430'),CONVERT(date,'2012-03-01'),31),
(1,2012,CONVERT(date,'20120301'),CONVERT(date,'20120331'),CONVERT(date,'2012-02-01'),29),
(1,2012,CONVERT(date,'20120201'),CONVERT(date,'20120229'),CONVERT(date,'2012-01-01'),31),
(1,2012,CONVERT(date,'20120101'),CONVERT(date,'20120131'),CONVERT(date,'2011-12-01'),31),
(1,2011,CONVERT(date,'20111201'),CONVERT(date,'20111231'),CONVERT(date,'2011-11-01'),30),
(1,2011,CONVERT(date,'20111101'),CONVERT(date,'20111130'),CONVERT(date,'2011-10-01'),31),
(1,2011,CONVERT(date,'20111001'),CONVERT(date,'20111031'),CONVERT(date,'2011-09-01'),30),
(1,2011,CONVERT(date,'20110901'),CONVERT(date,'20110930'),CONVERT(date,'2011-08-01'),31),
(1,2011,CONVERT(date,'20110801'),CONVERT(date,'20110831'),CONVERT(date,'2011-07-01'),31),
(1,2011,CONVERT(date,'20110701'),CONVERT(date,'20110731'),CONVERT(date,'2011-05-01'),61),
(1,2011,CONVERT(date,'20110501'),CONVERT(date,'20110531'),CONVERT(date,'2011-04-01'),30),
(1,2011,CONVERT(date,'20110401'),CONVERT(date,'20110430'),CONVERT(date,'2011-03-01'),31),
(1,2011,CONVERT(date,'20110301'),CONVERT(date,'20110331'),CONVERT(date,'2011-02-01'),28),
(1,2011,CONVERT(date,'20110201'),CONVERT(date,'20110228'),CONVERT(date,'2010-08-01'),184),
(1,2010,CONVERT(date,'20100801'),CONVERT(date,'20100831'),CONVERT(date,'2010-07-01'),31),
(1,2010,CONVERT(date,'20100701'),CONVERT(date,'20100731'),CONVERT(date,'2010-06-01'),30),
(1,2010,CONVERT(date,'20100601'),CONVERT(date,'20100630'),CONVERT(date,'2010-05-01'),31),
(1,2010,CONVERT(date,'20100501'),CONVERT(date,'20100531'),CONVERT(date,'2010-04-01'),30),
(1,2010,CONVERT(date,'20100401'),CONVERT(date,'20100430'),NULL,NULL))V(id,year,start_date,end_date,prev_att_month,diff))
SELECT id,
MIN(Start_date) AS Start_date,
MAX(End_Date) AS End_Date
FROM Grps
GROUP BY Id,
Grp
ORDER BY id,
Start_date;

SQL : GROUP and MAX multiple columns

I am a SQL beginner, can anyone please help me about a SQL query?
my table looks like below
PatientID Date Time Temperature
1 1/10/2020 9:15 36.2
1 1/10/2020 20:00 36.5
1 2/10/2020 8:15 36.1
1 2/10/2020 18:20 36.3
2 1/10/2020 9:15 36.7
2 1/10/2020 20:00 37.5
2 2/10/2020 8:15 37.1
2 2/10/2020 18:20 37.6
3 1/10/2020 8:15 36.2
3 2/10/2020 18:20 36.3
How can I get each patient everyday's max temperature:
PatientID Date Temperature
1 1/10/2020 36.5
1 2/10/2020 36.3
2 1/10/2020 37.5
2 2/10/2020 37.6
Thanks in advance!
For this dataset, simple aggregation seems sufficient:
select patientid, date, max(temperature) temperature
from mytable
group by patientid, date
On the other hand, if there are other columns that you want to display on the row that has the maximum daily temperature, then it is different. You need some filtering; one option uses window functions:
select *
from (
select t.*,
rank() over(partition by patientid, date order by temperature desc)
from mytable t
) t
where rn = 1

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
db<>fiddle

Following Start and End Date Columns

I have start and end date columns, and there are some where the start date equals the end date of the previous row without a gap. I'm trying to get it so that it would basically go from the Start Date row who's End Date is null and kinda "zig-zag" up going until the Start Date does not match the End Date.
I've tried CTEs, and ROW_NUMBER() OVER().
START_DTE END_DTE
2018-01-17 2018-01-19
2018-01-26 2018-02-22
2018-02-22 2018-08-24
2018-08-24 2018-09-24
2018-09-24 NULL
Expected:
START_DTE END_DTE
2018-01-26 2018-09-24
EDIT
Using a proposed solution with an added CTE to ensure dates don't have times with them.
WITH
CTE_TABLE_NAME AS
(
SELECT
ID_NUM,
CONVERT(DATE,START_DTE) START_DTE,
CONVERT(DATE,END_DTE) END_DTE
FROM
TABLE_NAME
WHERE ID_NUM = 123
)
select min(start_dte) as start_dte, max(end_dte) as end_dte, grp
from (select t.*,
sum(case when prev_end_dte = end_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from CTE_TABLE_NAME t
) t
) t
group by grp;
The following query provides these results:
start_dte end_dte grp
2014-08-24 2014-12-19 1
2014-08-31 2014-09-02 2
2014-09-02 2014-09-18 3
2014-09-18 2014-11-03 4
2014-11-18 2014-12-09 5
2014-12-09 2015-01-16 6
2015-01-30 2015-02-02 7
2015-02-02 2015-05-15 8
2015-05-15 2015-07-08 9
2015-07-08 2015-07-09 10
2015-07-09 2015-08-25 11
2015-08-31 2015-09-01 12
2015-10-06 2015-10-29 13
2015-11-10 2015-12-11 14
2015-12-11 2015-12-15 15
2015-12-15 2016-01-20 16
2016-01-29 2016-02-01 17
2016-02-01 2016-03-03 18
2016-03-30 2016-08-29 19
2016-08-30 2016-12-06 20
2017-01-27 2017-02-20 21
2017-02-20 2017-08-15 22
2017-08-15 2017-08-29 23
2017-08-29 2018-01-17 24
2018-01-17 2018-01-19 25
2018-01-26 2018-02-22 26
2018-02-22 2018-08-24 27
2018-08-24 2018-09-24 28
2018-09-24 NULL 29
I tried using having count (*) > 1 as suggested, but it provided no results
Expected example
START_DTE END_DTE
2017-01-27 2018-01-17
2018-01-26 2018-09-24
You can identify where groups of connected rows start by looking for where adjacent rows are not connected. A cumulative sum of these starts then gives you the groups.
select min(start_dte) as start_dte, max(end_dte) as end_dte
from (select t.*,
sum(case when prev_end_dte = start_dte then 0 else 1 end) over (order by start_dte) as grp
from (select t.*,
lag(end_dte) over (order by start_dte) as prev_end_dte
from t
) t
) t
group by grp;
If you want only multiply connected rows (as implied by your question), then add having count(*) > 1 to the outer query.
Here is a db<>fiddle.

SQL Collapse Data

I am trying to collapse data that is in a sequence sorted by date. While grouping on the person and the type.
The data is stored in an SQL server and looks like the following -
seq person date type
--- ------ ------------------- ----
1 1 2018-02-10 08:00:00 1
2 1 2018-02-11 08:00:00 1
3 1 2018-02-12 08:00:00 1
4 1 2018-02-14 16:00:00 1
5 1 2018-02-15 16:00:00 1
6 1 2018-02-16 16:00:00 1
7 1 2018-02-20 08:00:00 2
8 1 2018-02-21 08:00:00 2
9 1 2018-02-22 08:00:00 2
10 1 2018-02-23 08:00:00 1
11 1 2018-02-24 08:00:00 1
12 1 2018-02-25 08:00:00 2
13 2 2018-02-10 08:00:00 1
14 2 2018-02-11 08:00:00 1
15 2 2018-02-12 08:00:00 1
16 2 2018-02-14 16:00:00 3
17 2 2018-02-15 16:00:00 3
18 2 2018-02-16 16:00:00 3
This data set contains about 1.2 million records that resemble the above.
The result that I would like to get from this would be -
person start type
------ ------------------- ----
1 2018-02-10 08:00:00 1
1 2018-02-20 08:00:00 2
1 2018-02-23 08:00:00 1
1 2018-02-25 08:00:00 2
2 2018-02-10 08:00:00 1
2 2018-02-14 16:00:00 3
I have the data in the first format by running the following query -
select
ROW_NUMBER() OVER (ORDER BY date) AS seq
person,
date,
type,
from table
group by person, date, type
I am just not sure how to keep the minimum date with the other distinct values from person and type.
This is a gaps-and-islands problem so, you can use differences of row_number() & use them in grouping :
select person, min(date) as start, type
from (select *,
row_number() over (partition by person order by seq) seq1,
row_number() over (partition by person, type order by seq) seq2
from table
) t
group by person, type, (seq1 - seq2)
order by person, start;
The correct solution using the difference of row numbers is:
select person, type, min(date) as start
from (select t.*,
row_number() over (partition by person order by seq) as seqnum_p,
row_number() over (partition by person, type order by seq) as seqnum_pt
from t
) t
group by person, type, (seqnum_p - seqnum_pt)
order by person, start;
type needs to be included in the GROUP BY.