Count/Group T-SQL - sql

Hello. The following is sample data.
DateGroupID StartDate EndDate
1 2013-01-01 2013-01-07
2 2013-01-08 2013-01-14
3 2013-01-15 2013-01-21
.
.
.
15 2013-04-01 2013-04-07
EMPID GroupID JoinDate TerminationDate
1 A 2013-01-01 2013-03-24
2 B 2013-01-05 NULL
3 C 2013-01-05 NULL
4 A 2013-01-05 2013-03-20
5 B 2013-01-17 NULL
6 D 2013-02-01 NULL
7 A 2013-02-24 NULL
8 A 2013-02-28 NULL
9 B 2013-03-02 NULL
10 B 2013-03-12 NULL
11 C 2013-03-22 NULL
12 C 2013-03-22 NULL
13 D 2013-03-26 NULL
14 D 2013-03-29 NULL
15 A 2013-04-01 NULL
I am trying to count the employees who are ACTIVE on each day, grouped by GroupID, for whichever DateGroupID I select.
For example, if I select DateGroupID = 1 (in the WHERE clause, I would assume), I want the count of ACTIVE employees for each day between StartDate and EndDate.
So my output should look like this:
GROUPID COUNT Date
A 1 2013-01-01 (1 EMP was added to this group on this day)
B 0 2013-01-01 (NO Emp for this group were active on this day)
C 0 2013-01-01 (NO Emp for this group were active on this day)
D 0 2013-01-01 (NO Emp for this group were active on this day)
A 1 2013-01-02 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-02 (NO Emp for this group were active on this day)
C 0 2013-01-02 (NO Emp for this group were active on this day)
D 0 2013-01-02 (NO Emp for this group were active on this day)
A 1 2013-01-03 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-03 (NO Emp for this group were active on this day)
C 0 2013-01-03 (NO Emp for this group were active on this day)
D 0 2013-01-03 (NO Emp for this group were active on this day)
A 1 2013-01-04 (NO Emp for this group were added but 1 is active from the past)
B 0 2013-01-04 (NO Emp for this group were active on this day)
C 0 2013-01-04 (NO Emp for this group were active on this day)
D 0 2013-01-04 (NO Emp for this group were active on this day)
A 2 2013-01-05 (1 more Emp was added to this group on this day)
B 1 2013-01-05 (1 EMP was added to this group on this day)
C 1 2013-01-05 (1 EMP was added to this group on this day)
D 0 2013-01-05 (NO Emp for this group were active on this day)
.
.
.
.
A 2 2013-01-17 (2 EMP active on this day for this group)
B 2 2013-01-17 (1 more Emp was added to this group on this day)
C 1 2013-01-17 (NO Emp for this group were added but 1 is active from the past)
D 0 2013-01-17 (NO Emp for this group were active on this day)
.
.
.
A 2 2013-03-24 (2 EMP were removed and 2 were added, so as of this day there are 2 active EMP)
B 4 2013-03-24 (So far 4 active EMP for this group)
C 3 2013-03-24 (So Far 3 active EMP for this group)
D 2 2013-03-24 (So far 2 active EMP for this group)
Or, in a clearer layout, when I select DateGroupID = 3:
GroupID 2013-01-15 2013-01-16 2013-01-17 2013-01-18 2013-01-19 2013-01-20 2013-01-21
A 2 2 2 2 2 2 2
B 1 1 2 2 2 2 2
C 1 1 1 1 1 1 1
D 0 0 0 0 0 0 0

Do you need the DateGroup table?
If not:
Select GroupID, Count(EmpId), JoinDate
from dbo.[EmployeeStartDateTableName]
group by GroupID, JoinDate
If so:
SELECT GroupID, Count(EmpId), JoinDate
FROM dbo.[EmployeeStartDateTableName] INNER JOIN
dbo.[DateGroupTableName] ON [EmployeeStartDateTableName].JoinDate =
dbo.[DateGroupTableName].StartDate
Where groupID = [InsertGroupId]
group by GroupID, JoinDate
Then if you want the criteria to be the DateGroupID
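The queries above count joiners per JoinDate. For the running "active on each day" count the question asks for, one possible approach is to generate every day in the selected range, cross join it with the groups, and left join the employees who had joined and not yet terminated. The sketch below is only an illustration: it runs the SQL through Python's built-in sqlite3 module (SQLite dialect, not T-SQL), the table name emp and the function name active_counts are made up, and it assumes the termination date itself no longer counts as active, which matches the A counts in the question. The start and end arguments stand in for the StartDate and EndDate of the chosen DateGroupID.

```python
import sqlite3


def active_counts(start, end, employees):
    """Return (group_id, day, active_count) for every day in [start, end].

    employees: iterable of (emp_id, group_id, join_date, term_date) with
    ISO 'YYYY-MM-DD' strings; term_date is None while still employed.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE emp (emp_id INTEGER, group_id TEXT,"
        " join_date TEXT, term_date TEXT)"
    )
    con.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", employees)
    sql = """
    WITH RECURSIVE days(d) AS (
        SELECT ?                                  -- first day of the range
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < ?
    ),
    grp AS (SELECT DISTINCT group_id FROM emp)
    SELECT g.group_id, days.d, COUNT(e.emp_id)    -- COUNT skips NULLs, so 0 when nobody matches
    FROM days
    CROSS JOIN grp AS g
    LEFT JOIN emp AS e
           ON e.group_id = g.group_id
          AND e.join_date <= days.d
          AND (e.term_date IS NULL OR e.term_date > days.d)
    GROUP BY g.group_id, days.d
    ORDER BY days.d, g.group_id
    """
    return con.execute(sql, (start, end)).fetchall()
```

In T-SQL the same shape should carry over with WITH instead of WITH RECURSIVE and DATEADD(day, 1, d) instead of date(d, '+1 day'); the wide one-column-per-day layout would then be a pivot on top of this result.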

Related

How to count consecutive days in a table where days are duplicated (PostgreSQL)

Hello I would like to know the highest count of consecutive days a user has trained for.
My logs table that stores the records looks like this:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 4 1 2023-01-27 10:00:00
5 1 5 1 2023-01-28 10:00:00
The closest I could get is this query, which works only if the user trained on a single ground per day.
SELECT COUNT(*) AS days_in_row
FROM (SELECT row_number() OVER (ORDER BY day) - day AS grp
FROM logs
WHERE created_at >= '2023-01-24 00:00:00'
AND user_id = 1) x
GROUP BY grp
logs table:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 4 1 2023-01-27 10:00:00
5 1 5 1 2023-01-28 10:00:00
This query would return a count of 5 consecutive days which is correct.
However my query doesn't work once a user trains multiple times on different training grounds in one day:
logs table:
id user_id day ground_id created_at
1 1 1 1 2023-01-24 10:00:00
2 1 2 1 2023-01-25 10:00:00
3 1 3 1 2023-01-26 10:00:00
4 1 3 2 2023-01-26 10:00:00
5 1 4 1 2023-01-27 10:00:00
Then the query above returns a count of 2 consecutive days, which is not what I expect. I would expect 4, because the user trained on the following days in a row: 1, 2, 3, 4.
Thank you for reading.
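Select only the distinct days of interest first, so that several sessions on the same day count once:
SELECT min(created_at) AS start, COUNT(*) AS days_in_row
FROM (SELECT created_at, row_number() OVER (ORDER BY day) - day AS grp
FROM (
-- one row per day, even when that day's sessions have different timestamps
select day, min(created_at) AS created_at
from logs
where created_at >= '2023-01-24 00:00:00'
AND user_id = 1
group by day) t
) x
GROUP BY grp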
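As a rough check of the de-duplicate-then-group idea, here is a minimal sketch that runs the same pattern through Python's built-in sqlite3 module and returns only the longest streak. It is not PostgreSQL: it assumes the bundled SQLite is 3.25 or newer (needed for window functions), and the function name longest_streak is invented for this example.

```python
import sqlite3


def longest_streak(rows, user_id):
    """Longest run of consecutive `day` values for one user (0 if no rows).

    rows: iterable of (id, user_id, day, ground_id, created_at).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logs (id INTEGER, user_id INTEGER, day INTEGER,"
        " ground_id INTEGER, created_at TEXT)"
    )
    con.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)
    sql = """
    SELECT MAX(days_in_row)
    FROM (
        SELECT COUNT(*) AS days_in_row
        FROM (
            -- day minus its rank is constant within a run of consecutive days
            SELECT day - ROW_NUMBER() OVER (ORDER BY day) AS grp
            FROM (SELECT DISTINCT day FROM logs WHERE user_id = ?) AS d
        ) AS x
        GROUP BY grp
    ) AS streaks
    """
    best = con.execute(sql, (user_id,)).fetchone()[0]
    return best or 0
```

Because the innermost query keeps one row per day, a second session on day 3 no longer shifts the row numbers and splits the streak.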

SQL: Rank / Group a Column by Date

Using SQL Server Management Studio v17.9.1
I'm trying to rank / order / group some data by Site and Area by Date, but I'm struggling to rank the area by the earliest date on which it appears rather than alphabetically.
Here's the data I have:
Site | Area | Space | Date
DCG X 7 02/02/2020 12:13
DCG X 5 04/02/2020 11:47
DCG X 12 10/02/2020 15:14
GNL U 0 03/03/2020 18:35
GNL A 4 04/03/2020 08:28
GNL C 4 06/03/2020 09:07
GNL B 1 16/03/2020 07:10
DPL U 0 18/03/2020 09:28
DPL A 1 18/03/2020 09:36
DPL A 1 20/03/2020 20:04
SGR F 2 21/03/2020 19:42
SGR B 2 22/03/2020 10:30
SGR C 3 24/03/2020 08:17
SGR F 1 01/04/2020 09:00
SGR E 1 02/02/2020 10:57
SGR F 1 02/02/2020 15:50
I want to add 2 columns that rank / group the site and the area in ascending order of date, like so:
Site | Area | Space | Date | Site Order | Area Order |
DCG X 7 02/02/2020 12:13 1 1
DCG X 5 04/02/2020 11:47 1 1
DCG X 12 10/02/2020 15:14 1 1
GNL U 0 03/03/2020 18:35 2 1
GNL A 4 04/03/2020 08:28 2 2
GNL C 4 06/03/2020 09:07 2 3
GNL B 1 16/03/2020 07:10 2 4
DPL U 0 18/03/2020 09:28 3 1
DPL A 1 18/03/2020 09:36 3 2
DPL A 1 20/03/2020 20:04 3 2
SGR F 2 21/03/2020 19:42 4 1
SGR B 2 22/03/2020 10:30 4 2
SGR C 3 24/03/2020 08:17 4 3
SGR F 1 01/04/2020 09:00 4 1
SGR E 1 02/02/2020 10:57 4 4
SGR F 1 02/02/2020 15:50 4 1
Apologies if I've not made it clear
You can use min() as a window function to get the minimum date for each site and site/area combo. Then use dense_rank():
select t.*,
dense_rank() over (order by min_site_date, site) as site_seqnum,
dense_rank() over (partition by site order by min_site_area_date) as area_seqnum
from (select t.*,
min(date) over (partition by site) as min_site_date,
min(date) over (partition by site, area) as min_site_area_date
from t
) t
You can use window functions:
select t.*,
dense_rank() over (order by site_date, site) as site_sequence,
dense_rank() over (partition by site order by site_area_date, area) as area_sequence
from (select t.*,
min([date]) over (partition by [site]) as site_date,
min([date]) over (partition by [site], area) as site_area_date
from t
) t;
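To see the two-step pattern (windowed MIN, then DENSE_RANK ordered by that minimum) produce the requested ordering, here is a small sketch run through Python's built-in sqlite3 module. It is an illustration only: SQLite stands in for SQL Server, window functions need SQLite 3.25 or newer, the column name dt and the function name rank_by_first_seen are invented, and the dates are assumed to be stored in a sortable form (ISO text here) rather than as dd/mm/yyyy strings.

```python
import sqlite3


def rank_by_first_seen(rows):
    """Return (site, area, dt, site_order, area_order) ordered by site_order, dt.

    rows: iterable of (site, area, space, dt) with dt as 'YYYY-MM-DD HH:MM'.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (site TEXT, area TEXT, space INTEGER, dt TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    sql = """
    SELECT site, area, dt,
           -- sites ranked by the first date they appear
           DENSE_RANK() OVER (ORDER BY min_site_dt, site) AS site_order,
           -- areas ranked within their site by the first date they appear
           DENSE_RANK() OVER (PARTITION BY site
                              ORDER BY min_area_dt, area) AS area_order
    FROM (
        SELECT t.*,
               MIN(dt) OVER (PARTITION BY site)       AS min_site_dt,
               MIN(dt) OVER (PARTITION BY site, area) AS min_area_dt
        FROM t
    ) AS s
    ORDER BY site_order, dt
    """
    return con.execute(sql).fetchall()
```

The extra site and area keys in the ORDER BY clauses only break ties when two sites, or two areas, first appear at exactly the same time.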

Get all dates for all date ranges in table using SQL Server

I have a table dbo.WorkSchedules(Id, From, To) where I store date ranges for work schedules. I want to create a view that lists every date covered by any row of WorkSchedules, so that a single view holds all dates for all schedules.
On the web I only found solutions for a single range, given as two parameters (start and end). My case is different: I have multiple rows, each with its own start and end.
Example:
WorkSchedules
Id | From | To
---+------------+-----------
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
Desired result
1 | 2018-01-01
2 | 2018-01-02
3 | 2018-01-03
4 | 2018-01-04
5 | 2018-01-05
6 | 2018-01-08
7 | 2018-01-09
8 | 2018-01-10
9 | 2018-01-11
10| 2018-01-12
If you are regularly dealing with "jobs" and "schedules" then I propose that you need a permanent calendar table (a table where each row is a unique date). You can create rows for dates dynamically but why do this many times when you can do it once and just re-use?
A calendar table, even of several decades, isn't "big" and when indexed they can be very fast as well. You can also store information about holidays and/or fiscal periods etc.
There are many scripts available to produce these tables, here's an answer with 2 scripts on this site: https://stackoverflow.com/a/5635628/2067753
Assuming you use the second (more comprehensive) script, then you can exclude weekends, or other conditions such as holidays, from query results.
Once you have a permanent Calendar table this style of query may be used:
CREATE TABLE WorkSchedules(
Id INTEGER NOT NULL PRIMARY KEY
,[From] DATE NOT NULL
,[To] DATE NOT NULL
);
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (1,'2018-01-01','2018-01-05');
INSERT INTO WorkSchedules(Id,[From],[To]) VALUES (2,'2018-01-12','2018-01-12');
with range as (
select min(ws.[From]) as dt_from, max(ws.[To]) dt_to
from WorkSchedules as ws
)
select c.*
from calendar as c
inner join range on c.date between range.dt_from and range.dt_to
where c.KindOfDay = 'BANKDAY'
order by c.date
and the result looks like this (note: "New Year's Day" has been excluded)
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- -------------
1 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
2 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
3 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
4 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
5 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
6 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
7 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
8 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
9 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Without the where clause the full range is:
Date Year Quarter Month Week Day DayOfYear Weekday Fiscal_Year Fiscal_Quarter Fiscal_Month KindOfDay Description
---- --------------------- ------ --------- ------- ------ ----- ----------- --------- ------------- ---------------- -------------- ----------- ----------------
1 01.01.2018 00:00:00 2018 1 1 1 1 1 1 2018 1 1 HOLIDAY New Year's Day
2 02.01.2018 00:00:00 2018 1 1 1 2 2 2 2018 1 1 BANKDAY NULL
3 03.01.2018 00:00:00 2018 1 1 1 3 3 3 2018 1 1 BANKDAY NULL
4 04.01.2018 00:00:00 2018 1 1 1 4 4 4 2018 1 1 BANKDAY NULL
5 05.01.2018 00:00:00 2018 1 1 1 5 5 5 2018 1 1 BANKDAY NULL
6 06.01.2018 00:00:00 2018 1 1 1 6 6 6 2018 1 1 SATURDAY NULL
7 07.01.2018 00:00:00 2018 1 1 1 7 7 7 2018 1 1 SUNDAY NULL
8 08.01.2018 00:00:00 2018 1 1 2 8 8 1 2018 1 1 BANKDAY NULL
9 09.01.2018 00:00:00 2018 1 1 2 9 9 2 2018 1 1 BANKDAY NULL
10 10.01.2018 00:00:00 2018 1 1 2 10 10 3 2018 1 1 BANKDAY NULL
11 11.01.2018 00:00:00 2018 1 1 2 11 11 4 2018 1 1 BANKDAY NULL
12 12.01.2018 00:00:00 2018 1 1 2 12 12 5 2018 1 1 BANKDAY NULL
Weekends and holidays may be excluded using the column KindOfDay.
See this as a demonstration (with build of calendar table) here: http://rextester.com/CTSW63441
OK, I worked this out for you, assuming you meant 01/08/2018 as the From date in the second row.
/*WorkSchedules
Id| From | To
1 | 2018-01-01 | 2018-01-05
2 | 2018-01-08 | 2018-01-12
*/
--DROP TABLE #WorkSchedules;
CREATE TABLE #WorkSchedules (
ID int,
[DateFrom] DATE,
[DateTo] DATE
)
INSERT INTO #WorkSchedules
SELECT 1, '2018-01-01', '2018-01-05'
UNION
SELECT 2, '2018-01-08', '2018-01-12'
;WITH CTEDATELIMITS AS (
SELECT [DateFrom], [DateTo]
FROM #WorkSchedules
)
,CTEDATES AS
(
SELECT [DateFrom] as [DateResult] FROM CTEDATELIMITS
UNION ALL
SELECT DATEADD(Day, 1, [DateResult]) FROM CTEDATES
JOIN CTEDATELIMITS ON CTEDATES.[DateResult] >= CTEDATELIMITS.[DateFrom]
AND CTEDATES.dateResult < CTEDATELIMITS.[DateTo]
)
SELECT [DateResult] FROM CTEDATES
ORDER BY [DateResult]
You would use a recursive CTE:
with dates as (
select from, to, from as date
from WorkSchedules
union all
select from, to, dateadd(day, 1, date)
from dates
where date < to
)
select row_number() over (order by date), date
from dates;
Note that from and to are reserved words in SQL. They are lousy names for identifiers. I have not escaped them because I assume they are not the actual names of the columns.
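For a quick sanity check of the recursive-CTE idea, here is a minimal sketch run through Python's built-in sqlite3 module. It is written in the SQLite dialect rather than T-SQL (WITH RECURSIVE and date(d, '+1 day') instead of DATEADD), and the names work_schedules, date_from, date_to and expand_ranges are placeholders chosen for the example. DISTINCT is added so that overlapping ranges would not list a date twice; the row numbers are assigned in Python.

```python
import sqlite3


def expand_ranges(ranges):
    """Return [(row_number, date), ...] for every date covered by any range.

    ranges: iterable of (id, date_from, date_to) with ISO 'YYYY-MM-DD' strings.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE work_schedules (id INTEGER, date_from TEXT, date_to TEXT)"
    )
    con.executemany("INSERT INTO work_schedules VALUES (?, ?, ?)", ranges)
    sql = """
    WITH RECURSIVE dates(d, date_to) AS (
        SELECT date_from, date_to FROM work_schedules   -- one seed row per range
        UNION ALL
        SELECT date(d, '+1 day'), date_to               -- step one day forward
        FROM dates
        WHERE d < date_to                               -- stop at the range end
    )
    SELECT DISTINCT d FROM dates ORDER BY d
    """
    days = [d for (d,) in con.execute(sql)]
    return list(enumerate(days, start=1))
```

Each range carries its own end date through the recursion, so the gap between two ranges (the weekend of 6 and 7 January in the example) is never generated.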

Transposing SQLite rows and columns with average per hour

I have a table in SQLite called param_vals_breaches that looks like the following:
id param queue date_time param_val breach_count
1 c a 2013-01-01 00:00:00 188 7
2 c b 2013-01-01 00:00:00 156 8
3 c c 2013-01-01 00:00:00 100 2
4 d a 2013-01-01 00:00:00 657 0
5 d b 2013-01-01 00:00:00 23 6
6 d c 2013-01-01 00:00:00 230 12
7 c a 2013-01-01 01:00:00 100 0
8 c b 2013-01-01 01:00:00 143 9
9 c c 2013-01-01 01:00:00 12 2
10 d a 2013-01-01 01:00:00 0 1
11 d b 2013-01-01 01:00:00 29 5
12 d c 2013-01-01 01:00:00 22 14
13 c a 2013-01-01 02:00:00 188 7
14 c b 2013-01-01 02:00:00 156 8
15 c c 2013-01-01 02:00:00 100 2
16 d a 2013-01-01 02:00:00 657 0
17 d b 2013-01-01 02:00:00 23 6
18 d c 2013-01-01 02:00:00 230 12
I want to write a query that will show me a particular queue (e.g. "a") with the average param_val and breach_count for each param on an hour by hour basis. So transposing the data to get something that looks like this:
Results for Queue A
Hour 0 Hour 0 Hour 1 Hour 1 Hour 2 Hour 2
param avg_param_val avg_breach_count avg_param_val avg_breach_count avg_param_val avg_breach_count
c xxx xxx xxx xxx xxx xxx
d xxx xxx xxx xxx xxx xxx
Is this possible? I'm not sure how to go about it. Thanks!
SQLite does not have a PIVOT function but you can use an aggregate function with a CASE expression to turn the rows into columns:
select param,
avg(case when time = '00' then param_val end) AvgHour0Val,
avg(case when time = '00' then breach_count end) AvgHour0Count,
avg(case when time = '01' then param_val end) AvgHour1Val,
avg(case when time = '01' then breach_count end) AvgHour1Count,
avg(case when time = '02' then param_val end) AvgHour2Val,
avg(case when time = '02' then breach_count end) AvgHour2Count
from
(
select param,
strftime('%H', date_time) time,
param_val,
breach_count
from param_vals_breaches
where queue = 'a'
) src
group by param;
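The query above hard-codes hours 00 to 02. If the set of hours is not known in advance, one option is to build the AVG(CASE ...) column list from the data before running the query. The sketch below does that with Python's built-in sqlite3 module; the function name hourly_pivot and the alias hr are made up, and the hour labels are interpolated into the SQL only because they come straight from strftime('%H', ...), which yields two-digit strings.

```python
import sqlite3


def hourly_pivot(rows, queue):
    """Return (hours, table): one tuple per param with avg value/count per hour.

    rows: iterable of (id, param, queue, date_time, param_val, breach_count).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE param_vals_breaches (id INTEGER, param TEXT, queue TEXT,"
        " date_time TEXT, param_val INTEGER, breach_count INTEGER)"
    )
    con.executemany(
        "INSERT INTO param_vals_breaches VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    # Find which hours actually occur for this queue.
    hours = [
        h for (h,) in con.execute(
            "SELECT DISTINCT strftime('%H', date_time) FROM param_vals_breaches"
            " WHERE queue = ? ORDER BY 1", (queue,))
        if h is not None
    ]
    if not hours:
        return [], []
    # One pair of conditional averages per hour, as in the static query.
    cols = ", ".join(
        f"AVG(CASE WHEN hr = '{h}' THEN param_val END), "
        f"AVG(CASE WHEN hr = '{h}' THEN breach_count END)"
        for h in hours
    )
    sql = (
        f"SELECT param, {cols} FROM ("
        " SELECT param, strftime('%H', date_time) AS hr, param_val, breach_count"
        " FROM param_vals_breaches WHERE queue = ?) AS src"
        " GROUP BY param ORDER BY param"
    )
    return hours, con.execute(sql, (queue,)).fetchall()
```

The returned hours list gives the column order, so the caller can label each pair of averages.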

find nonbreaking period with condition

There are quotas for hotels per day in a table. How can I get, for each date, the number of consecutive days the hotel is available starting from that date?
q_id q_hotel q_date q_value
1 1 2013-02-01 1
2 1 2013-02-02 1
3 1 2013-02-03 1
4 1 2013-02-04 0
5 1 2013-02-05 2
6 1 2013-02-06 3
7 1 2013-02-07 3
8 1 2013-02-08 2
9 1 2013-02-09 0
10 1 2013-02-10 0
11 1 2013-02-11 1
12 1 2013-02-12 1
Wanted output:
q_hotel q_date days_available
1 2013-02-01 3
1 2013-02-02 2
1 2013-02-03 1
1 2013-02-04 0
1 2013-02-05 4
1 2013-02-06 3
1 2013-02-07 2
1 2013-02-08 1
1 2013-02-09 0
1 2013-02-10 0
1 2013-02-11 2
1 2013-02-12 1
For now I can get the number of days only if a zero quota exists after the given date: I find the closest unavailable day and calculate the difference between the dates.
http://sqlfiddle.com/#!12/1a64c/14
select q_hotel
,q_date
,(select extract(day from (min(B.q_date)-A.q_date)) from Table1 B where B.q_date>A.q_date
and B.q_value=0 and A.q_value<>0)
from Table1 A
But there is a problem when there is no closing zero-quota date.
Here is a solution:
select
a.q_date
, a.q_hotel
, case
when
a.q_value = 0
then
0
else
(
select
extract
( day from
min ( b.q_date ) - a.q_date + interval '1 day'
)
from table1 b
where b.q_date >= a.q_date
and b.q_hotel = a.q_hotel
and not exists
(
select 1
from table1 c
where c.q_date = b.q_date + interval '1 day'
and c.q_hotel = b.q_hotel
and c.q_value <> 0
)
)
end as days_available
from table1 a
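The "find the last day of the current run" logic can be checked against the sample data with a small sketch like the one below. It uses Python's built-in sqlite3 module, so date arithmetic goes through SQLite's date() and julianday() instead of PostgreSQL intervals; the table name quotas and the function name days_available are invented for the example, and dates are assumed to be ISO 'YYYY-MM-DD' strings.

```python
import sqlite3


def days_available(rows):
    """Return (q_hotel, q_date, days_available) ordered by hotel and date.

    rows: iterable of (q_id, q_hotel, q_date, q_value).
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE quotas (q_id INTEGER, q_hotel INTEGER,"
        " q_date TEXT, q_value INTEGER)"
    )
    con.executemany("INSERT INTO quotas VALUES (?, ?, ?, ?)", rows)
    sql = """
    SELECT a.q_hotel, a.q_date,
           CASE WHEN a.q_value = 0 THEN 0
           ELSE CAST(ROUND(julianday((
                    -- earliest day on or after a.q_date whose next day is
                    -- missing or has no quota: the last day of the run
                    SELECT MIN(b.q_date)
                    FROM quotas AS b
                    WHERE b.q_hotel = a.q_hotel
                      AND b.q_date >= a.q_date
                      AND NOT EXISTS (
                          SELECT 1
                          FROM quotas AS c
                          WHERE c.q_hotel = b.q_hotel
                            AND c.q_date = date(b.q_date, '+1 day')
                            AND c.q_value <> 0)
                )) - julianday(a.q_date)) AS INTEGER) + 1
           END AS days_available
    FROM quotas AS a
    ORDER BY a.q_hotel, a.q_date
    """
    return con.execute(sql).fetchall()
```

Because a missing next day also ends a run, the last rows of the table (11 and 12 February in the sample) get a value even though no zero-quota day follows them.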