SQL: Rank / Group a Column by Date

Using SQL Server Management Studio v17.9.1
I'm trying to rank / order / group some data by Site and Area by Date, but I'm struggling to rank the area by the earliest date it appears rather than alphabetically.
Here's the data I have:
Site | Area | Space | Date
DCG | X | 7 | 02/02/2020 12:13
DCG | X | 5 | 04/02/2020 11:47
DCG | X | 12 | 10/02/2020 15:14
GNL | U | 0 | 03/03/2020 18:35
GNL | A | 4 | 04/03/2020 08:28
GNL | C | 4 | 06/03/2020 09:07
GNL | B | 1 | 16/03/2020 07:10
DPL | U | 0 | 18/03/2020 09:28
DPL | A | 1 | 18/03/2020 09:36
DPL | A | 1 | 20/03/2020 20:04
SGR | F | 2 | 21/03/2020 19:42
SGR | B | 2 | 22/03/2020 10:30
SGR | C | 3 | 24/03/2020 08:17
SGR | F | 1 | 01/04/2020 09:00
SGR | E | 1 | 02/02/2020 10:57
SGR | F | 1 | 02/02/2020 15:50
I want to add 2 columns that rank / group the site and the area in ascending order of date, like so:
Site | Area | Space | Date | Site Order | Area Order
DCG | X | 7 | 02/02/2020 12:13 | 1 | 1
DCG | X | 5 | 04/02/2020 11:47 | 1 | 1
DCG | X | 12 | 10/02/2020 15:14 | 1 | 1
GNL | U | 0 | 03/03/2020 18:35 | 2 | 1
GNL | A | 4 | 04/03/2020 08:28 | 2 | 2
GNL | C | 4 | 06/03/2020 09:07 | 2 | 3
GNL | B | 1 | 16/03/2020 07:10 | 2 | 4
DPL | U | 0 | 18/03/2020 09:28 | 3 | 1
DPL | A | 1 | 18/03/2020 09:36 | 3 | 2
DPL | A | 1 | 20/03/2020 20:04 | 3 | 2
SGR | F | 2 | 21/03/2020 19:42 | 4 | 1
SGR | B | 2 | 22/03/2020 10:30 | 4 | 2
SGR | C | 3 | 24/03/2020 08:17 | 4 | 3
SGR | F | 1 | 01/04/2020 09:00 | 4 | 1
SGR | E | 1 | 02/02/2020 10:57 | 4 | 4
SGR | F | 1 | 02/02/2020 15:50 | 4 | 1
Apologies if I've not made this clear.

You can use min() as a window function to get the minimum date for each site and for each site/area combination. Then use dense_rank():
select t.*,
       dense_rank() over (order by min_site_date, site) as site_seqnum,
       -- rank areas within a site by the earliest date of each area
       dense_rank() over (partition by site order by min_site_area_date, area) as area_seqnum
from (select t.*,
             min(date) over (partition by site) as min_site_date,
             min(date) over (partition by site, area) as min_site_area_date
      from t
     ) t
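
As a rough check of the idea, here is a minimal sketch that runs the same kind of query through Python's built-in sqlite3 module (a stand-in engine, not SQL Server). It assumes a SQLite build with window-function support, uses only the DCG, GNL and DPL rows from the question, stores the dates as ISO-8601 text so that they sort chronologically, and names the date column dt; all of those choices are assumptions made for illustration.

```python
import sqlite3

# Subset of the question's rows. Dates are rewritten as ISO-8601 text so that
# string order equals chronological order (an assumption of this sketch).
rows = [
    ("DCG", "X", 7, "2020-02-02 12:13"),
    ("DCG", "X", 5, "2020-02-04 11:47"),
    ("DCG", "X", 12, "2020-02-10 15:14"),
    ("GNL", "U", 0, "2020-03-03 18:35"),
    ("GNL", "A", 4, "2020-03-04 08:28"),
    ("GNL", "C", 4, "2020-03-06 09:07"),
    ("GNL", "B", 1, "2020-03-16 07:10"),
    ("DPL", "U", 0, "2020-03-18 09:28"),
    ("DPL", "A", 1, "2020-03-18 09:36"),
    ("DPL", "A", 1, "2020-03-20 20:04"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (site TEXT, area TEXT, space INTEGER, dt TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT site, area, dt,
       -- sites ranked by their earliest date
       DENSE_RANK() OVER (ORDER BY min_site_dt, site) AS site_order,
       -- areas ranked within a site by the earliest date of each area
       DENSE_RANK() OVER (PARTITION BY site
                          ORDER BY min_area_dt, area) AS area_order
FROM (SELECT t.*,
             MIN(dt) OVER (PARTITION BY site) AS min_site_dt,
             MIN(dt) OVER (PARTITION BY site, area) AS min_area_dt
      FROM t) AS s
ORDER BY dt
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The extra site and area terms in the ORDER BY clauses only break ties when two sites (or two areas) share the same earliest date.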

You can use window functions:
select t.*,
       dense_rank() over (order by site_date, site) as site_sequence,
       dense_rank() over (partition by site order by site_area_date, area) as area_sequence
from (select t.*,
             min([date]) over (partition by [site]) as site_date,
             min([date]) over (partition by [site], area) as site_area_date
      from your_table t -- substitute your table name
     ) t;

Related

Sum over the rows using SQL but we need to stop and start the sum at specific condition

Here is an example of the data I have and the output I want in SQL.
id | date | flag
a | 2022-04-05 | 0
a | 2022-04-06 | 1
a | 2022-04-07 | 1
a | 2022-04-08 | 1
a | 2022-04-09 | 0
a | 2022-04-10 | 0
a | 2022-04-11 | 1
a | 2022-04-12 | 1
a | 2022-04-13 | 1
a | 2022-04-14 | 1
a | 2022-04-15 | 0
a | 2022-04-16 | 0
b | 2022-04-05 | 0
b | 2022-04-06 | 1
b | 2022-04-07 | 1
b | 2022-04-08 | 0
Desired Output
id | date | flag | count
a | 2022-04-05 | 0 | 0
a | 2022-04-06 | 1 | 1
a | 2022-04-07 | 1 | 2
a | 2022-04-08 | 1 | 3
a | 2022-04-09 | 0 | 0
a | 2022-04-10 | 0 | 0
a | 2022-04-11 | 1 | 1
a | 2022-04-12 | 1 | 2
a | 2022-04-13 | 1 | 3
a | 2022-04-14 | 1 | 4
a | 2022-04-15 | 0 | 0
a | 2022-04-16 | 0 | 0
b | 2022-04-05 | 0 | 0
b | 2022-04-06 | 1 | 1
b | 2022-04-07 | 1 | 2
b | 2022-04-08 | 0 | 0
Basically the increment should start if the value of flag is 1 and continue incrementing until a flag of 0 is reached, then continue incrementing from the next flag of 1 until the next 0, and so on.

This is a gaps and islands problem. One approach uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) rn1,
ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) rn2
FROM yourTable
)
SELECT id, date, flag,
SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2 ORDER BY date) AS count
FROM cte
ORDER BY id, date;
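
To see the difference-in-row-numbers method at work, the sketch below loads the question's sample rows into an in-memory SQLite database via Python's sqlite3 module and runs essentially the same query. SQLite is only a stand-in here (it needs a build with window-function support), and the output column is called running instead of count purely for this illustration.

```python
import sqlite3

# Sample rows from the question: (id, date, flag).
rows = [
    ("a", "2022-04-05", 0), ("a", "2022-04-06", 1), ("a", "2022-04-07", 1),
    ("a", "2022-04-08", 1), ("a", "2022-04-09", 0), ("a", "2022-04-10", 0),
    ("a", "2022-04-11", 1), ("a", "2022-04-12", 1), ("a", "2022-04-13", 1),
    ("a", "2022-04-14", 1), ("a", "2022-04-15", 0), ("a", "2022-04-16", 0),
    ("b", "2022-04-05", 0), ("b", "2022-04-06", 1), ("b", "2022-04-07", 1),
    ("b", "2022-04-08", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id TEXT, date TEXT, flag INTEGER)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)

query = """
WITH cte AS (
    SELECT id, date, flag,
           -- position of the row within its id
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS rn1,
           -- position of the row among rows with the same id and flag
           ROW_NUMBER() OVER (PARTITION BY id, flag ORDER BY date) AS rn2
    FROM yourTable
)
SELECT id, date, flag,
       -- rn1 - rn2 is constant inside each unbroken run of equal flags,
       -- so the running sum restarts at the beginning of every run
       SUM(flag) OVER (PARTITION BY id, flag, rn1 - rn2
                       ORDER BY date) AS running
FROM cte
ORDER BY id, date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Rows with flag 0 always get a running sum of 0, because summing zeros within their own island cannot produce anything else.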

How can I join two tables on an ID and a DATE RANGE in SQL

I have 2 query result tables containing records for different assessments. There are RAssessments and NAssessments which make up a complete review.
The aim is to eventually determine which reviews were completed. I would like to join the two tables on the ID and on the date. However, the dates on which the two assessments were completed may not be identical and may be several days apart, and some IDs may have more RAssessments than NAssessments.
Therefore, I would like to join T1 on to T2 on ID & on T1Date(+ or - 7 days). There is no other way to match the two tables and to align the records other than using the date range, as this is a poorly designed database. I hope for some help with this as I am stumped.
Here is some sample data:
Table #1:
ID | RAssessmentDate
1 | 2020-01-03
1 | 2020-03-03
1 | 2020-05-03
2 | 2020-01-09
2 | 2020-04-09
3 | 2022-07-21
4 | 2020-06-30
4 | 2020-12-30
4 | 2021-06-30
4 | 2021-12-30
Table #2:
ID | NAssessmentDate
1 | 2020-01-07
1 | 2020-03-02
1 | 2020-05-03
2 | 2020-01-09
2 | 2020-07-06
2 | 2020-04-10
3 | 2022-07-21
4 | 2021-01-03
4 | 2021-06-28
4 | 2022-01-02
4 | 2022-06-26
I would like my end result table to look like this:
ID | RAssessmentDate | NAssessmentDate
1 | 2020-01-03 | 2020-01-07
1 | 2020-03-03 | 2020-03-02
1 | 2020-05-03 | 2020-05-03
2 | 2020-01-09 | 2020-01-09
2 | 2020-04-09 | 2020-04-10
2 | NULL | 2020-07-06
3 | 2022-07-21 | 2022-07-21
4 | 2020-06-30 | NULL
4 | 2020-12-30 | 2021-01-03
4 | 2021-06-30 | 2021-06-28
4 | 2021-12-30 | 2022-01-02
4 | NULL | 2022-01-02

Try this:
SELECT
    COALESCE(a.ID, b.ID) ID,
    a.RAssessmentDate,
    b.NAssessmentDate
FROM (
    SELECT
        -- order by the date so that the row pairing is deterministic
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY RAssessmentDate) RowId, *
    FROM table1
) a
FULL OUTER JOIN (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY ID ORDER BY NAssessmentDate) RowId, *
    FROM table2
) b ON a.ID = b.ID AND a.RowId = b.RowId
WHERE (a.RAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
   OR (b.NAssessmentDate BETWEEN '2020-01-01' AND '2022-01-02')
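
The question asks for a match on ID plus a date within 7 days either way, which can also be written directly as a join condition. The following is only a sketch of that idea, run through Python's sqlite3 module on a few of the sample rows. Two assumptions to note: SQLite's julianday() is used to compute the day difference (in SQL Server something like ABS(DATEDIFF(day, ...)) <= 7 would play that role), and the full outer join is emulated with a LEFT JOIN plus a UNION ALL of the unmatched table2 rows so that it also runs on older SQLite builds.

```python
import sqlite3

# A few rows from the question's two tables.
table1 = [(1, "2020-01-03"), (1, "2020-03-03"), (2, "2020-04-09"), (4, "2020-06-30")]
table2 = [(1, "2020-01-07"), (1, "2020-03-02"), (2, "2020-04-10"), (2, "2020-07-06")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID INTEGER, RAssessmentDate TEXT)")
conn.execute("CREATE TABLE table2 (ID INTEGER, NAssessmentDate TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", table1)
conn.executemany("INSERT INTO table2 VALUES (?, ?)", table2)

query = """
SELECT ID, RAssessmentDate, NAssessmentDate
FROM (
    -- every R assessment, with an N assessment within 7 days if one exists
    SELECT a.ID AS ID,
           a.RAssessmentDate AS RAssessmentDate,
           b.NAssessmentDate AS NAssessmentDate
    FROM table1 AS a
    LEFT JOIN table2 AS b
           ON b.ID = a.ID
          AND ABS(julianday(b.NAssessmentDate)
                  - julianday(a.RAssessmentDate)) <= 7
    UNION ALL
    -- N assessments that matched no R assessment
    SELECT b.ID, NULL, b.NAssessmentDate
    FROM table2 AS b
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 AS a
        WHERE a.ID = b.ID
          AND ABS(julianday(b.NAssessmentDate)
                  - julianday(a.RAssessmentDate)) <= 7
    )
) AS paired
ORDER BY ID, COALESCE(RAssessmentDate, NAssessmentDate)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Note that a range join like this can return more than one match per row if two assessments for the same ID fall within 7 days of each other, so the window may need tightening for real data.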

How to get the previous 10 lines and the last 10 lines at the same time by sql oracle

I want to write an Oracle SQL query.
I have data table like this:
Table A
no ID Time
1 A001 9/27/2021 3:22:42 PM
2 A002 9/27/2021 3:25:58 PM
3 A003 9/27/2021 2:40:48 PM
4 A004 9/27/2021 2:40:44 PM
5 A005 9/27/2021 2:40:46 PM
6 A006 9/27/2021 2:40:51 PM
........................................
1000 A1000 9/27/2021 2:44:38 PM
1001 A1001 9/27/2021 2:44:47 PM
1002 A1002 9/27/2021 2:44:36 PM
1003 A1003 9/27/2021 2:44:40 PM
1004 A1004 9/27/2021 2:44:43 PM
1005 A1005 9/27/2021 2:43:57 PM
............................................
A99999999999................................
and 1 more table like this:
Table B
No ID Time
1 A03 9/27/2021 2:40:51 PM
2 A05 9/27/2021 2:44:36 PM
............................................
A999........................................
I know that every row in table B is definitely in table A. How can we get the 10 rows above and the 10 rows below from table A for each row in table B?
Currently I just use rank(), lag() and lead() and then join the two tables together, but I have not got the desired result yet.

WITH JOINED_TABLES AS (
SELECT "A".*, B."no" AS B_no
, ROW_NUMBER() OVER (ORDER BY "A"."Time") AS RN
FROM "A"
LEFT JOIN B ON "A"."ID"=B."ID"
)
, CENTRE_RNS AS (
SELECT RN FROM JOINED_TABLES WHERE B_no IS NOT NULL
)
SELECT J."no", J."ID", J."Time", C.RN FROM JOINED_TABLES J
LEFT JOIN CENTRE_RNS C ON J.RN BETWEEN C.RN-1 AND C.RN+1
ORDER BY "no"
SQL Fiddle here
To get 10 rows on either side, change the BETWEEN bounds in the second-to-last line to C.RN-10 and C.RN+10.

From Oracle 12, you can use a LATERAL join and analytic functions.
For example, if you wanted the window of 2 rows either side:
SELECT *
FROM   TableB b
       INNER JOIN LATERAL (
         SELECT no,
                id,
                time,
                rn - MAX(match) OVER () AS window
         FROM   (
           SELECT no,
                  id,
                  time,
                  ROW_NUMBER() OVER (ORDER BY time) AS rn,
                  CASE
                  WHEN a.time = b.time
                  THEN ROW_NUMBER() OVER (ORDER BY time)
                  END AS match
           FROM   TableA a
         )
       ) a
       ON (a.window BETWEEN -2 AND 2)
Then, for the sample data:
CREATE TABLE TableA (no, ID, Time) AS
SELECT level,
'A' || TO_CHAR(level, '000'),
DATE '2021-09-27' + LEVEL * INTERVAL '3' SECOND
FROM DUAL
CONNECT BY LEVEL <= 50;
CREATE TABLE TableB (no, ID, Time) AS
SELECT ROWNUM,
ID,
time
FROM TableA
WHERE no IN (6, 32);
The query outputs:
NO | ID | TIME | NO | ID | TIME | WINDOW
1 | A 006 | 2021-09-27 00:00:18 | 4 | A 004 | 2021-09-27 00:00:12 | -2
1 | A 006 | 2021-09-27 00:00:18 | 5 | A 005 | 2021-09-27 00:00:15 | -1
1 | A 006 | 2021-09-27 00:00:18 | 6 | A 006 | 2021-09-27 00:00:18 | 0
1 | A 006 | 2021-09-27 00:00:18 | 7 | A 007 | 2021-09-27 00:00:21 | 1
1 | A 006 | 2021-09-27 00:00:18 | 8 | A 008 | 2021-09-27 00:00:24 | 2
2 | A 032 | 2021-09-27 00:01:36 | 30 | A 030 | 2021-09-27 00:01:30 | -2
2 | A 032 | 2021-09-27 00:01:36 | 31 | A 031 | 2021-09-27 00:01:33 | -1
2 | A 032 | 2021-09-27 00:01:36 | 32 | A 032 | 2021-09-27 00:01:36 | 0
2 | A 032 | 2021-09-27 00:01:36 | 33 | A 033 | 2021-09-27 00:01:39 | 1
2 | A 032 | 2021-09-27 00:01:36 | 34 | A 034 | 2021-09-27 00:01:42 | 2
db<>fiddle here

How do you get the MAX() and SUM() of values between STRING values?

I have data that looks like this:
metric_date location id value
20/02/07 13:00 ATL A 34
20/02/07 13:05 ATL B 12
20/02/07 13:10 ATL B 02
20/02/07 13:15 ATL A 15
20/02/07 13:20 ATL A 00
20/02/07 13:25 ATL A 00
20/02/07 13:30 ATL A 12
20/02/07 13:35 ATL B 12
20/02/07 13:40 ATL A 23
20/02/07 13:45 ATL B 03
20/02/07 13:50 ATL A 00
20/02/07 13:55 ATL A 00
For each section between the zero-value rows, I need to find max(value) and SUM(value) where id is "B", so that I can compute success_rate = SUM() / MAX().
I tried:
SELECT
CASE
WHEN DATE(metric_date) = lag(DATE(metric_date), 1) OVER (ORDER BY DATE(metric_date))
AND building = lag(building, 1) OVER (ORDER BY date)
THEN 1
END AS work_period
, CASE
WHEN LAG(value, 1) OVER (ORDER BY date) = 0
AND LEAD(value, 1) OVER (ORDER BY date) > 0
THEN LAG(work_period, 1) + 1
WHEN LAG(SUM(metric_value), 1) OVER (ORDER BY metric_date) > 0
THEN LAG(work_period, 1)
END section
I need the results to look like this:
location section max sum success_rate
ATL 1 34 14 0.4118
ATL 2 23 15 0.6522

This is a Gaps and Islands problem (article is for SQL Server but applies equally to postgresql).
The following should solve your problem
SELECT Location,
       MAX(Value) AS Max,
       SUM(CASE WHEN id = 'B' THEN Value END) AS Sum,
       1.0 * SUM(CASE WHEN id = 'B' THEN Value END) / MAX(Value) AS SuccessRate
FROM (  SELECT *,
               ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) -
               ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS GroupingSet
        FROM T
     ) AS t
WHERE Value <> 0
GROUP BY Location, GroupingSet;
The key is generating a field to group by to identify the islands, which can be done by allocating two row_numbers to each row:
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Location, CASE WHEN Value = 0 THEN 1 ELSE 0 END ORDER BY metric_date) AS RowNumInSubset,
ROW_NUMBER() OVER(PARTITION BY Location ORDER BY metric_date) AS RowNumInSet
FROM #T
ORDER BY metric_date
This produces the following:
metric_date location id value RowNumInSubset RowNumInSet
----------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1
2020-02-07 13:05 ATL B 12 2 2
2020-02-07 13:10 ATL B 2 3 3
2020-02-07 13:15 ATL A 15 4 4
2020-02-07 13:20 ATL A 0 1 5
2020-02-07 13:25 ATL A 0 2 6
2020-02-07 13:30 ATL A 12 5 7
2020-02-07 13:35 ATL B 12 6 8
2020-02-07 13:40 ATL A 23 7 9
2020-02-07 13:45 ATL B 3 8 10
2020-02-07 13:50 ATL A 0 3 11
2020-02-07 13:55 ATL A 0 4 12
Then, by deducting the RowNumInSet from the RowNumInSubset, you will produce a constant for your islands:
metric_date location id value RowNumInSubset RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1 0
2020-02-07 13:05 ATL B 12 2 2 0
2020-02-07 13:10 ATL B 2 3 3 0
2020-02-07 13:15 ATL A 15 4 4 0
------------------------------------------------------------------------------------
2020-02-07 13:20 ATL A 0 1 5 -4
2020-02-07 13:25 ATL A 0 2 6 -4
------------------------------------------------------------------------------------
2020-02-07 13:30 ATL A 12 5 7 -2
2020-02-07 13:35 ATL B 12 6 8 -2
2020-02-07 13:40 ATL A 23 7 9 -2
2020-02-07 13:45 ATL B 3 8 10 -2
------------------------------------------------------------------------------------
2020-02-07 13:50 ATL A 0 3 11 -8
2020-02-07 13:55 ATL A 0 4 12 -8
Then finally, you can remove the rows where value = 0, as these are just break points:
metric_date location id value RowNumInSubset RowNumInSet GroupingSet
------------------------------------------------------------------------------------
2020-02-07 13:00 ATL A 34 1 1 0
2020-02-07 13:05 ATL B 12 2 2 0
2020-02-07 13:10 ATL B 2 3 3 0
2020-02-07 13:15 ATL A 15 4 4 0
------------------------------------------------------------------------------------
2020-02-07 13:30 ATL A 12 5 7 -2
2020-02-07 13:35 ATL B 12 6 8 -2
2020-02-07 13:40 ATL A 23 7 9 -2
2020-02-07 13:45 ATL B 3 8 10 -2
Then you can perform your aggregate on each group.
Example on DB<>Fiddle
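
For anyone who wants to replay the steps locally, here is a small sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same island query. SQLite is just a convenient stand-in (a build with window-function support is assumed), the dates are stored as ISO-8601 text, and the output column names max_value, sum_b and success_rate are made up for this example.

```python
import sqlite3

# Sample rows from the question: (metric_date, location, id, value).
rows = [
    ("2020-02-07 13:00", "ATL", "A", 34),
    ("2020-02-07 13:05", "ATL", "B", 12),
    ("2020-02-07 13:10", "ATL", "B", 2),
    ("2020-02-07 13:15", "ATL", "A", 15),
    ("2020-02-07 13:20", "ATL", "A", 0),
    ("2020-02-07 13:25", "ATL", "A", 0),
    ("2020-02-07 13:30", "ATL", "A", 12),
    ("2020-02-07 13:35", "ATL", "B", 12),
    ("2020-02-07 13:40", "ATL", "A", 23),
    ("2020-02-07 13:45", "ATL", "B", 3),
    ("2020-02-07 13:50", "ATL", "A", 0),
    ("2020-02-07 13:55", "ATL", "A", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (metric_date TEXT, location TEXT, id TEXT, value INTEGER)"
)
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?)", rows)

query = """
SELECT location,
       MAX(value) AS max_value,
       SUM(CASE WHEN id = 'B' THEN value END) AS sum_b,
       ROUND(1.0 * SUM(CASE WHEN id = 'B' THEN value END) / MAX(value), 4)
           AS success_rate
FROM (SELECT T.*,
             -- row number within zero / non-zero rows, minus the overall row
             -- number: constant for every unbroken run of non-zero values
             ROW_NUMBER() OVER (PARTITION BY location,
                                             CASE WHEN value = 0 THEN 1 ELSE 0 END
                                ORDER BY metric_date)
           - ROW_NUMBER() OVER (PARTITION BY location
                                ORDER BY metric_date) AS grouping_set
      FROM T) AS s
WHERE value <> 0                     -- zero rows are only break points
GROUP BY location, grouping_set
ORDER BY MIN(metric_date)
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Ordering by the earliest metric_date of each island keeps the sections in chronological order, since the grouping constant itself carries no meaningful order.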

Based on a couple of assumptions where the question does not specify, this query produces your desired result exactly:
SELECT min(location) AS location
, row_number() OVER (ORDER BY grp) AS section
, max(value) AS max
, sum(value) FILTER (WHERE id = 'B') AS sum
, round(sum(value) FILTER (WHERE id = 'B')
/ max(value)::numeric, 4) AS success_rate
FROM (
SELECT *, count(*) FILTER (WHERE value = 0) OVER (ORDER BY metric_date) AS grp
FROM tbl
) sub
WHERE value <> 0
GROUP BY grp;
db<>fiddle here
In particular, not grouping by location - which might make sense ...
Detailed explanation in many related answers:
Counting null values between dates
How to group timestamps into islands (based on arbitrary gap)?
Select longest continuous sequence
For maximum performance consider a procedural solution in this particular case (typically, set-based solutions are faster), as that can make do with a single sequential scan over the table. Like:
GROUP BY and aggregate sequential numeric values

Count/Group T-SQL

Hello. The following is some sample data.
DateGroupID StartDate EndDate
1 2013-01-01 2013-01-07
2 2013-01-08 2013-01-14
3 2013-01-15 2013-01-21
.
.
.
15 2013-04-01 2013-04-07
EMPID GroupID JoinDate TerminationDate
1 A 2013-01-01 2013-03-24
2 B 2013-01-05 NULL
3 C 2013-01-05 NULL
4 A 2013-01-05 2013-03-20
5 B 2013-01-17 NULL
6 D 2013-02-01 NULL
7 A 2013-02-24 NULL
8 A 2013-02-28 NULL
9 B 2013-03-02 NULL
10 B 2013-03-12 NULL
11 C 2013-03-22 NULL
12 C 2013-03-22 NULL
13 D 2013-03-26 NULL
14 D 2013-03-29 NULL
15 A 2013-04-01 NULL
I am trying to get a count of the employees who are ACTIVE on each day, grouped by GroupID, based on which DateGroupID I select.
For example, if I select DateGroupID = 1 (in the WHERE clause, I would assume), I want to get the count of ACTIVE users for each day between StartDate and EndDate.
So my output should look like this:
GROUPID COUNT Date
A 1 2013-01-01 (1 EMP was added to this group on this day)
B 0 2013-01-01 (NO Emp for this group were active on this day)
C 0 2013-01-01 (NO Emp for this group were active on this day)
D 0 2013-01-01 (NO Emp for this group were active on this day)
A 1 2013-01-02 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-02 (NO Emp for this group were active on this day)
C 0 2013-01-02 (NO Emp for this group were active on this day)
D 0 2013-01-02 (NO Emp for this group were active on this day)
A 1 2013-01-03 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-03 (NO Emp for this group were active on this day)
C 0 2013-01-03 (NO Emp for this group were active on this day)
D 0 2013-01-03 (NO Emp for this group were active on this day)
A 1 2013-01-04 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-04 (NO Emp for this group were active on this day)
C 0 2013-01-04 (NO Emp for this group were active on this day)
D 0 2013-01-04 (NO Emp for this group were active on this day)
A 2 2013-01-05 (1 more Emp was added to this group on this day)
B 1 2013-01-05 (1 EMP was added to this group on this day)
C 1 2013-01-05 (1 EMP was added to this group on this day)
D 0 2013-01-05 (NO Emp for this group were active on this day)
.
.
.
.
A 2 2013-01-17 (2 EMP active on this day for this group)
B 2 2013-01-17 (1 more Emp was added to this group on this day)
C 1 2013-01-17 (NO Emp for this group were added but 1 is active from the past)
D 0 2013-01-17 (NO Emp for this group were active on this day)
.
.
.
A 2 2013-03-24 (2 EMP were removed and added as for this day, 2 active EMP)
B 4 2013-03-24 (So far 4 active EMP for this group)
C 3 2013-03-24 (So Far 3 active EMP for this group)
D 2 2013-03-24 (So far 2 active EMP for this group)
Or, in a better view, when I select DateGroupID = 3:
GroupID 2013-01-15 2013-01-16 2013-01-17 2013-01-18 2013-01-19 2013-01-20 2013-01-21
A 2 2 2 2 2 2 2
B 1 1 2 2 2 2 2
C 1 1 1 1 1 1 1
D 0 0 0 0 0 0 0

Do you need the DateGroup table?
If not:
SELECT GroupID, COUNT(EmpId), JoinDate
FROM dbo.[EmployeeStartDateTableName]
GROUP BY GroupID, JoinDate
If so:
SELECT GroupID, COUNT(EmpId), JoinDate
FROM dbo.[EmployeeStartDateTableName]
INNER JOIN dbo.[DateGroupTableName]
        ON dbo.[EmployeeStartDateTableName].JoinDate = dbo.[DateGroupTableName].StartDate
WHERE GroupID = [InsertGroupId]
GROUP BY GroupID, JoinDate
Then if you want the criteria to be the DateGroupID
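
Counting employees who are active on each day (rather than those who joined on it) usually means expanding the selected date range into one row per day and pairing every day with every group. The sketch below shows one possible way to do that, using Python's sqlite3 module and a recursive CTE instead of T-SQL. The table names DateGroup and Employee are hypothetical, only the first six employees from the question are loaded, and an employee is assumed to be inactive from the TerminationDate onwards, which is what the 2013-03-24 row of the expected output suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; the question does not name its tables.
conn.executescript("""
CREATE TABLE DateGroup (DateGroupID INTEGER, StartDate TEXT, EndDate TEXT);
CREATE TABLE Employee (EmpID INTEGER, GroupID TEXT,
                       JoinDate TEXT, TerminationDate TEXT);
INSERT INTO DateGroup VALUES (3, '2013-01-15', '2013-01-21');
INSERT INTO Employee VALUES
    (1, 'A', '2013-01-01', '2013-03-24'),
    (2, 'B', '2013-01-05', NULL),
    (3, 'C', '2013-01-05', NULL),
    (4, 'A', '2013-01-05', '2013-03-20'),
    (5, 'B', '2013-01-17', NULL),
    (6, 'D', '2013-02-01', NULL);
""")

query = """
WITH RECURSIVE days(d, last_day) AS (
    -- one row per calendar day of the selected date group
    SELECT StartDate, EndDate FROM DateGroup WHERE DateGroupID = ?
    UNION ALL
    SELECT date(d, '+1 day'), last_day FROM days WHERE d < last_day
),
grp AS (SELECT DISTINCT GroupID FROM Employee)
SELECT grp.GroupID, days.d, COUNT(e.EmpID) AS active
FROM days
CROSS JOIN grp                       -- every group appears on every day
LEFT JOIN Employee AS e
       ON e.GroupID = grp.GroupID
      AND e.JoinDate <= days.d
      -- assumption: no longer active on the termination date itself
      AND (e.TerminationDate IS NULL OR e.TerminationDate > days.d)
GROUP BY grp.GroupID, days.d
ORDER BY days.d, grp.GroupID
"""
result = conn.execute(query, (3,)).fetchall()
for row in result:
    print(row)
```

COUNT(e.EmpID) ignores the NULLs produced by the LEFT JOIN, so groups with nobody active on a given day still show up with a count of 0. Pivoting the days into columns, as in the "better view" above, would be a separate step.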
