Complex rank in SQL using Postgres

I'm in over my head with the SQL needed for a complex rank function. This is an app for a racing sport where I need to rank each Entry for a Timesheet based on the entry's :total_time.
The relevant models:
class Timesheet
  has_many :entries
end

class Entry
  belongs_to :timesheet
  belongs_to :athlete
end

class Run
  belongs_to :entry
end
An Entry's :total_time isn't stored in the database. It's a calculated column, runs.sum(:finish). I use the Postgres (9.3) rank() window function to get the Entries for a given Timesheet and rank them by this calculated column.
def ranked_entries
  Entry.find_by_sql([
    "SELECT *, rank() OVER (ORDER BY total_time ASC)
     FROM (
       SELECT Entries.id, Entries.timesheet_id, Entries.athlete_id,
              SUM(Runs.finish) AS total_time
       FROM Entries
       INNER JOIN Runs ON (Entries.id = Runs.entry_id)
       GROUP BY Entries.id) AS FinalRanks
     WHERE timesheet_id = ?", self.id])
end
So far so good. This returns my entry objects with a rank attribute which I can display on timesheet#show.
Now the tricky part. On a Timesheet, not every Entry will have the same number of runs. There is a cutoff (usually Top-20, but not always). This renders the rank() from Postgres inaccurate: some Entries will have a lower :total_time than the race winner simply because they didn't make the cutoff for the second heat.
My Question: Is it possible to do something like a rank() within a rank() to produce a table that looks like the one below? Or is there another preferred way? Thanks!
Note: I store times as integers, but I formatted them as the more familiar MM:SS in the simplified table below for clarity
| rank | entry_id | total_time |
|------|----------|------------|
| 1    | 6        | 1:59.05    |
| 2    | 3        | 1:59.35    |
| 3    | 17       | 1:59.52    |
| ...  | ...      | ...        |
| 20   | 13       | 56.56      | <- didn't make the top-20 cutoff, only has one run

Let's create a table. (Get in the habit of including CREATE TABLE and INSERT statements in all your SQL questions.)
create table runs (
  entry_id integer not null,
  run_num integer not null
    check (run_num between 1 and 3),
  run_time interval not null
);

insert into runs values
(1, 1, '00:59.33'),
(2, 1, '00:59.93'),
(3, 1, '01:03.27'),
(1, 2, '00:59.88'),
(2, 2, '00:59.27');
This SQL statement will give you the totals in the order you want, but without ranking them.
with num_runs as (
  select entry_id, count(*) as num_runs
  from runs
  group by entry_id
)
select r.entry_id, n.num_runs, sum(r.run_time) as total_time
from runs r
inner join num_runs n on n.entry_id = r.entry_id
group by r.entry_id, n.num_runs
order by num_runs desc, total_time asc;
entry_id  num_runs  total_time
--
2         2         00:01:59.2
1         2         00:01:59.21
3         1         00:01:03.27
This statement adds a column for rank.
with num_runs as (
  select entry_id, count(*) as num_runs
  from runs
  group by entry_id
)
select
  rank() over (order by num_runs desc, sum(r.run_time) asc),
  r.entry_id, n.num_runs, sum(r.run_time) as total_time
from runs r
inner join num_runs n on n.entry_id = r.entry_id
group by r.entry_id, n.num_runs
order by rank asc;

rank  entry_id  num_runs  total_time
--
1     2         2         00:01:59.2
2     1         2         00:01:59.21
3     3         1         00:01:03.27
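Mapped back onto the schema from the question, the same two-key ranking might look like the sketch below. Entries, Runs, runs.finish, and the timesheet_id parameter come from the question; the rest is an untested adaptation:
SELECT entry_id, num_runs, total_time,
       rank() OVER (ORDER BY num_runs DESC, total_time ASC) AS rank
FROM (
  SELECT Entries.id       AS entry_id,
         COUNT(*)         AS num_runs,   -- entries with more runs always rank first
         SUM(Runs.finish) AS total_time
  FROM Entries
  INNER JOIN Runs ON Runs.entry_id = Entries.id
  WHERE Entries.timesheet_id = ?         -- bound to self.id, as in ranked_entries
  GROUP BY Entries.id
) AS totals
ORDER BY rank;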

Related

How to identify rows per group before a certain value gap?

I'd like to update a certain column in a table based on the difference in another column's value between neighboring rows in PostgreSQL.
Here is a test setup:
CREATE TABLE test(
  main INTEGER,
  sub_id INTEGER,
  value_t INTEGER);
INSERT INTO test (main, sub_id, value_t)
VALUES
(1,1,8),
(1,2,7),
(1,3,3),
(1,4,85),
(1,5,40),
(2,1,3),
(2,2,1),
(2,3,1),
(2,4,8),
(2,5,41);
My goal is to determine, in each group main and starting from sub_id 1, the first row whose diff to the previous row exceeds a certain threshold (i.e. falls outside the band -10 to 10), checking in ascending order of sub_id. Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value, e.g. 1.
Should I use a loop or are there smarter solutions?
The task description in pseudocode:
FOR i in GROUP [PARTITION BY main ORDER BY sub_id]:
    DO until diff > 10 OR diff < -10
        SET newval = 1 AND LEAD(newval) = 1
Basic SELECT
As fast as possible:
SELECT *, bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM (
   SELECT *, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
   FROM   test
   ) sub;
Fine points
Your thought model revolves around the window function lead(). But its counterpart lag() is a bit more efficient for the purpose, since there is no off-by-one error when including the row before the big gap. Alternatively, use lead() with inverted sort order (ORDER BY sub_id DESC).
To avoid NULL for the first row in the partition, provide value_t as the default (3rd parameter), which makes the diff 0 instead of NULL. Both lead() and lag() have that capability.
diff BETWEEN -10 AND 10 is slightly faster than @ diff < 11 (clearer and more flexible, too). (@ being the "absolute value" operator, equivalent to the abs() function.)
bool_or() or bool_and() as the outer window function is probably cheapest to mark all rows up to the big gap.
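For the sample data above, the basic SELECT produces flags like this (worked out by hand from the inserted rows, so treat it as illustrative):
 main | sub_id | value_t | diff | flag
------+--------+---------+------+------
    1 |      1 |       8 |    0 | t
    1 |      2 |       7 |   -1 | t
    1 |      3 |       3 |   -4 | t
    1 |      4 |      85 |   82 | f
    1 |      5 |      40 |  -45 | f
    2 |      1 |       3 |    0 | t
    2 |      2 |       1 |   -2 | t
    2 |      3 |       1 |    0 | t
    2 |      4 |       8 |    7 | t
    2 |      5 |      41 |   33 | f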
Your UPDATE
Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value e.g. 1.
Again, as fast as possible.
UPDATE test AS t
SET    newval = 1
FROM  (
   SELECT main, sub_id
        , bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
   FROM  (
      SELECT main, sub_id
           , value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
      FROM   test
      ) sub
   ) u
WHERE (t.main, t.sub_id) = (u.main, u.sub_id)
AND    u.flag;
Fine points
Computing all values in a single query is typically substantially faster than a correlated subquery.
The added WHERE condition AND u.flag makes sure we only update rows that actually need an update.
If some of the rows may already have the right value in newval, add another clause to avoid those empty updates, too: AND t.newval IS DISTINCT FROM 1
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
SET newval = 1 assigns a constant (even though we could use the actually calculated value in this case); that's a bit cheaper.
db<>fiddle here
Your question was hard to comprehend, the "value_t" column was irrelevant to the question, and you forgot to define the "diff" column in your SQL.
Anyhow, here's your solution:
WITH data AS (
   SELECT main, sub_id, value_t
        , abs(value_t - lead(value_t) OVER (PARTITION BY main ORDER BY sub_id)) > 10 AS is_evil
   FROM   test
)
SELECT main, sub_id, value_t
     , CASE max(is_evil::int) OVER (PARTITION BY main ORDER BY sub_id
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
       WHEN 1 THEN NULL ELSE 1 END AS newval
FROM data;
I'm using a CTE to prepare the data (computing whether a row is "evil"), and then the "max" window function is used to check if there were any "evil" rows before the current one, per partition.
EXISTS on an aggregating subquery:
UPDATE test u
SET    value_t = NULL
WHERE  EXISTS (
   SELECT *
   FROM  (
      SELECT main, sub_id, value_t
           , ABS(value_t - lag(value_t) OVER (PARTITION BY main ORDER BY sub_id)) AS absdiff
      FROM   test
      ) x
   WHERE  x.main = u.main
   AND    x.sub_id <= u.sub_id
   AND    x.absdiff >= 10
);

SELECT * FROM test
ORDER BY main, sub_id;
Result:
UPDATE 3
main | sub_id | value_t
------+--------+---------
1 | 1 | 8
1 | 2 | 7
1 | 3 | 3
1 | 4 |
1 | 5 |
2 | 1 | 3
2 | 2 | 1
2 | 3 | 1
2 | 4 | 8
2 | 5 |
(10 rows)

Select distinct value and bring only the latest one

I have a table that stores the different statuses of each transaction. Each transaction can have multiple statuses (pending, rejected, approved, etc.).
I need to build a query that brings back only the last status of each transaction.
The definition for the table that stores the statuses is:
[dbo].[Cuotas_Estado]
ID int (PK)
IdCuota int (references table dbo.Cuotas - FK)
IdEstado int (references table dbo.Estados - FK)
Here's the architecture for the 3 tables (the question showed a diagram here, not reproduced).
Running a simple SELECT statement on table dbo.Cuotas_Estado returns every status row:
SELECT *
FROM [dbo].[Cuotas_Estado] [E]
But the result I need is:
IdCuota | IdEstado
2 | 1
3 | 2
9 | 3
10 | 3
11 | 4
I'm running the following select statement:
SELECT
DISTINCT([E].[IdEstado]),
[E].[IdCuota]
FROM [dbo].[Cuotas_Estado] [E]
ORDER BY
[E].[IdCuota] ASC;
This brings back duplicates: entry 9 and entry 11 each appear with more than one IdEstado. I need the query to return only the latest IdEstado per IdCuota (3 for entry 9 and 4 for entry 11).
Can you try this? It numbers each IdCuota's statuses from newest to oldest and keeps the first:
with cte as (
    select IdEstado, IdCuota,
           row_number() over(partition by IdCuota order by fecha desc) as RowNum
           -- fecha: the status timestamp column (assumed; not shown in the table definition above)
    from [dbo].[Cuotas_Estado]
)
select IdEstado, IdCuota
from cte
where RowNum = 1
You can use a correlated subquery:
SELECT e.*
FROM [dbo].[Cuotas_Estado] e
WHERE e.IdEstado = (SELECT MAX(e2.IdEstado)
FROM [dbo].[Cuotas_Estado] e2
WHERE e2.IdCuota = e.IdCuota
);
With an index on Cuotas_Estado(IdCuota, IdEstado) this is probably the most efficient method.
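A minimal sketch of that index (the index name is my invention):
CREATE INDEX IX_Cuotas_Estado_IdCuota_IdEstado
    ON dbo.Cuotas_Estado (IdCuota, IdEstado);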

SQL Select Query Group By With Key (Join or SubQuery)

Trying to wrap my head around this query but here it is...
Table: TVEpisode
Columns: TVEpisodeID (PK), TVSeriesID, season (number), episode (number), watched (0 or 1)
What I am looking to get is the first unwatched (value 0) episode for each TVSeries. For example, if I have watched all of season 1 for TVSeriesID 45 and my last watched episode is season 2 episode 5, I want the query to return:
TVEpisodeID | TVSeriesID | Season | Episode
PK | 45 | 2 | 6
Need that result for each TVSeries
In most databases, you would do this with the ANSI standard window functions:
select tve.*
from (select tve.*,
             row_number() over (partition by tvseriesid order by season, episode) as seqnum
      from tvepisode tve
      where tve.watched = 0
     ) tve
where seqnum = 1;
I assume that "first" is referring to the combination of season and episode.
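If this runs on Postgres (an assumption; the question doesn't name the database), a partial index could serve that filter and sort well. A sketch, with a made-up index name:
CREATE INDEX tvepisode_unwatched_idx
    ON tvepisode (tvseriesid, season, episode)
    WHERE watched = 0;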
This should give you the first watched=0 row per series, assuming TVEpisodeID increases with season/episode order:
SELECT *
FROM   TVEpisode
WHERE  TVEpisodeID IN
       (SELECT min(TVEpisodeID)
        FROM   TVEpisode
        WHERE  watched = 0
        GROUP  BY TVSeriesID)

Get 10 distinct projects with the latest updates in related tasks

I have two tables in a PostgreSQL 9.5 database:
project
- id
- name
task
- id
- project_id
- name
- updated_at
There are ~ 1000 projects (updated very rarely) and ~ 10 million tasks (updated very often).
I want to list those 10 distinct projects that have the latest task updates.
A basic query would be:
SELECT * FROM task ORDER BY updated_at DESC LIMIT 10;
However, there can be many updated tasks per project. So I won't get 10 unique projects.
If I try to add DISTINCT(project_id) somewhere in the query, I'm getting an error:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
Problem is, I can't sort (primarily) by project_id, because I need to have tasks sorted by time. Sorting by updated_at DESC, project_id ASC doesn't work either, because several tasks of the same project can be among the latest.
I can't download all records because there are millions of them.
As a workaround I fetch 10x the needed rows (without DISTINCT) and filter them in the backend. This works for most cases, but it's obviously not reliable: sometimes I don't get 10 unique projects.
Can this be solved efficiently in Postgres 9.5?
Example
id | name
----+-----------
1 | Project 1
2 | Project 2
3 | Project 3
id | project_id | name | updated_at
----+------------+--------+-----------------
1 | 1 | Task 1 | 13:12:43.361387
2 | 1 | Task 2 | 13:12:46.369279
3 | 2 | Task 3 | 13:12:54.680891
4 | 3 | Task 4 | 13:13:00.472579
5 | 3 | Task 5 | 13:13:04.384477
If I query:
SELECT project_id, updated_at FROM task ORDER BY updated_at DESC LIMIT 2
I get:
project_id | updated_at
------------+-----------------
3 | 13:13:04.384477
3 | 13:13:00.472579
But I want to get 2 distinct projects with the respective latest task.update_at like this:
project_id | updated_at
------------+-----------------
3 | 13:13:04.384477
2 | 13:12:54.680891 -- from Task 3
The simple (logically correct) solution is to aggregate tasks to get the latest update per project, and then pick the latest 10, as @Nemeros provided.
However, this incurs a sequential scan on task, which is undesirable (expensive) for big tables.
If you have relatively few projects (many task entries per project), there are faster alternatives using (bitmap) index scans.
SELECT *
FROM   project p
     , LATERAL (
   SELECT updated_at AS last_updated_at
   FROM   task
   WHERE  project_id = p.id
   ORDER  BY updated_at DESC
   LIMIT  1
   ) t
ORDER  BY t.last_updated_at DESC
LIMIT  10;
Key to performance is a matching multicolumn index:
CREATE INDEX task_project_id_updated_at ON task (project_id, updated_at DESC);
A setup with 1000 projects and 10 million tasks (like you commented) is a perfect candidate for this.
Background:
Optimize GROUP BY query to retrieve latest record per user
Select first row in each GROUP BY group?
NULL and "no row"
The above solution assumes updated_at is defined NOT NULL. Else use ORDER BY updated_at DESC NULLS LAST and ideally make the index match; see the sketch just below.
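A matching index for that variant might be declared like this (same index as above, with the sort order spelled out; a sketch):
CREATE INDEX task_project_id_updated_at ON task (project_id, updated_at DESC NULLS LAST);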
Projects without any tasks are eliminated from the result by the implicit CROSS JOIN. NULL values cannot creep in this way. This is subtly different from correlated subqueries like the one @Nemeros added to his answer: those return NULL for "no row" (a project with no related tasks at all). The outer descending sort order then lists NULL on top unless instructed otherwise, which is most probably not what you want.
Related:
PostgreSQL sort by datetime asc, null first?
What is the difference between LATERAL and a subquery in PostgreSQL?
Try a GROUP BY expression, that's what it's aimed at:
SELECT project_id, max(updated_at) as max_upd_date
FROM task t
GROUP BY project_id
ORDER BY max_upd_date DESC
LIMIT 10
Do not forget to add an index that begins with (project_id, updated_at) if you want to avoid full table scans.
Well, the only way to use the index seems to be with a correlated subquery:
select p.id,
       (select updated_at
        from   task t
        where  p.id = t.project_id
        order  by updated_at desc
        limit  1) as max_dte
from project p
order by max_dte desc
limit 10
Try to use:
SELECT project_id,
Max (updated_at)
FROM task
GROUP BY project_id
ORDER BY Max(updated_at) DESC
LIMIT 10
I believe row_number() over() can be used for this, but you will still need the final ORDER BY and LIMIT clauses:
select
mt.*
from (
SELECT
* , row_number() over(partition by project_id order by updated_at DESC) rn
FROM tasks
) mt
-- inner join Projects p on mt.project_id = p.id
where mt.rn = 1
order by mt.updated_at DESC
limit 2
The advantage of this approach is that it gives you access to the full row corresponding to the maximum updated_at for each project. You can optionally join the projects table as well.
result:
| id | project_id | name | updated_at | rn |
|----|------------|--------|-----------------|----|
| 5 | 3 | Task 5 | 13:13:04.384477 | 1 |
| 3 | 2 | Task 3 | 13:12:54.680891 | 1 |
see: http://sqlfiddle.com/#!15/ee039/1
How about sorting the records by the most recent update and then doing distinct on?
select distinct on (t.project_id) t.*
from task t
order by max(t.updated_at) over (partition by t.project_id), t.project_id;
EDIT:
I didn't realize Postgres did that check. Here is the version with a subquery:
select distinct on (maxud, t.project_id) t.*
from (select t.*,
             max(t.updated_at) over (partition by t.project_id) as maxud
      from task t
     ) t
order by maxud, t.project_id;
You could probably put the analytic call in the distinct on, but I think this is clearer anyway.
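For completeness, a DISTINCT ON formulation that actually returns the 10 most recently updated projects might look like this (a sketch against the task/updated_at schema from the question; note DISTINCT ON requires the leading ORDER BY key to match):
select *
from (
   select distinct on (project_id) *
   from   task
   order  by project_id, updated_at desc  -- latest task per project
   ) latest
order by updated_at desc
limit  10;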

How to write Oracle query to find a total length of possible overlapping from-to dates

I'm struggling to find the query for the following task
I have the following data and want to find the total network day for each unique ID
ID From To NetworkDay
1 03-Sep-12 07-Sep-12 5
1 03-Sep-12 04-Sep-12 2
1 05-Sep-12 06-Sep-12 2
1 06-Sep-12 12-Sep-12 5
1 31-Aug-12 04-Sep-12 3
2 04-Sep-12 06-Sep-12 3
2 11-Sep-12 13-Sep-12 3
2 05-Sep-12 08-Sep-12 3
The problem is that the date ranges can overlap, and I can't come up with SQL that will give me the following results:
ID From To NetworkDay
1 31-Aug-12 12-Sep-12 9
2 04-Sep-12 08-Sep-12 4
2 11-Sep-12 13-Sep-12 3
and then
ID Total Network Day
1 9
2 7
In case the network day calculation is not possible, just getting to the second table would be sufficient.
Hope my question is clear
We can use Oracle analytics, namely the "OVER ... PARTITION BY" clause, to do this. The PARTITION BY clause is kind of like a GROUP BY but without the aggregation part. That means we can group rows together (i.e. partition them) and then perform an operation on them as separate groups. As we operate on each row we can then access the columns of the previous row above. This is the feature PARTITION BY gives us. (PARTITION BY is not related to partitioning of a table for performance.)
So then how do we output the non-overlapping dates? We first order the query on the (ID, DFROM) fields, then we use the ID field to make our partitions (row groups). We then test the previous row's TO value and the current row's FROM value for overlap using an expression like this (in pseudocode):
max(previous.DTO, current.DFROM) as DFROM
This basic expression returns the original DFROM value if there is no overlap, but returns the previous TO value if there is. Since our rows are ordered, we only need to keep track of what has already been seen (in practice, the largest DTO so far). In cases where a previous row completely overlaps the current row, we want that row to end up with a 'zero' date range. So we do the same thing for the DTO field to get:
max(previous.DTO, current.DFROM) as DFROM, max(previous.DTO, current.DTO) as DTO
Once we have generated the new results set with the adjusted DFROM and DTO values, we can aggregate them up and count the range intervals of DFROM and DTO.
Be aware that most date calculations in databases are not inclusive, while your data is. So something like DATEDIFF(dto, dfrom) will not include the day dto actually refers to, so we will want to adjust dto up a day first.
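A quick worked example: the row (1, 03-Sep-12, 07-Sep-12) covers 5 days inclusively, but 07-Sep-12 minus 03-Sep-12 is only 4; after bumping dto to 08-Sep-12 the subtraction yields the expected 5.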
I don't have access to an Oracle server anymore, but I know this is possible with Oracle analytics. The query should go something like this:
(Please update my post if you get this to work.)
SELECT id,
       greatest(dfrom, NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dfrom)) AS dfrom,
       greatest(dto,   NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dto))  AS dto
FROM (
  -- adjust the table so that dto becomes non-inclusive
  SELECT id, dfrom, dto + 1 AS dto FROM my_sample
) sample;
The secret here is the MAX(dto) OVER (PARTITION BY id ORDER BY dfrom ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) expression, which returns the largest dto seen on the rows before the current one; NVL supplies a fallback on the first row of each partition, where no previous value exists. (Oracle has no two-argument max(); greatest() is the function for that.)
So this query should output new dfrom/dto values which don't overlap. It's then a simple matter of sub-querying this, computing (dto - dfrom), and summing the totals; a sketch follows.
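A sketch of that final aggregation step, wrapping the query above (untested, same assumptions):
-- total adjusted days per id; subtracting Oracle DATEs yields a number of days
SELECT id, SUM(dto - dfrom) AS total_days
FROM (
  SELECT id,
         greatest(dfrom, NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dfrom)) AS dfrom,
         greatest(dto,   NVL(MAX(dto) OVER (PARTITION BY id ORDER BY dfrom
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), dto))  AS dto
  FROM (SELECT id, dfrom, dto + 1 AS dto FROM my_sample)
) nonoverlapped
GROUP BY id;
Note this counts calendar days, not network days; weekend and holiday filtering would still have to be layered on top, as in the MODEL-clause answer further down.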
Using MySQL
I did have access to a MySQL server, so I did get it working there. MySQL doesn't have result-set partitioning (analytics) like Oracle, so we have to use user variables. This means we use @var := xxx type expressions to remember the last date value and adjust dfrom/dto accordingly. Same algorithm, just a little longer and more complex syntax. We also have to forget the last date value any time the ID field changes!
So here is the sample table (same values you have):
create table sample(id int, dfrom date, dto date, networkDay int);
insert into sample values
(1,'2012-09-03','2012-09-07',5),
(1,'2012-09-03','2012-09-04',2),
(1,'2012-09-05','2012-09-06',2),
(1,'2012-09-06','2012-09-12',5),
(1,'2012-08-31','2012-09-04',3),
(2,'2012-09-04','2012-09-06',3),
(2,'2012-09-11','2012-09-13',3),
(2,'2012-09-05','2012-09-08',3);
On to the query, we output the un-grouped result set like above:
The variable @ldt is the "last date" and the variable @lid is the "last id". Any time @lid changes, we reset @ldt to null. FYI, in MySQL := is the operator that performs assignment; a plain = is just an equality test.
This is a 3-level query, but it could be reduced to 2. I went with an extra outer query to keep things more readable. The innermost query is simple: it adjusts the dto column to be non-inclusive and does the proper row ordering. The middle query adjusts the dfrom/dto values to make them non-overlapping. The outer query simply drops the unused fields and calculates the interval range.
set @ldt = null, @lid = null;

select id, no_dfrom as dfrom, no_dto as dto, datediff(no_dto, no_dfrom) as days
from (
  select if(@lid = id, @ldt, @ldt := null) as last,
         dfrom, dto,
         if(@ldt >= dfrom, @ldt, dfrom) as no_dfrom,
         if(@ldt >= dto,   @ldt, dto)   as no_dto,
         @ldt := if(@ldt >= dto, @ldt, dto),
         @lid := id as id,
         datediff(dto, dfrom) as overlapped_days
  from (select id, dfrom, dto + INTERVAL 1 DAY as dto from sample order by id, dfrom) as sample
) as nonoverlapped
order by id, dfrom;
The above query gives the results (notice dfrom/dto are non-overlapping here):
+------+------------+------------+------+
| id | dfrom | dto | days |
+------+------------+------------+------+
| 1 | 2012-08-31 | 2012-09-05 | 5 |
| 1 | 2012-09-05 | 2012-09-08 | 3 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-08 | 0 |
| 1 | 2012-09-08 | 2012-09-13 | 5 |
| 2 | 2012-09-04 | 2012-09-07 | 3 |
| 2 | 2012-09-07 | 2012-09-09 | 2 |
| 2 | 2012-09-11 | 2012-09-14 | 3 |
+------+------------+------------+------+
How about constructing a SQL query which merges the intervals by removing holes and considering only maximal intervals? It goes like this (not tested):
SELECT DISTINCT F.ID, F.From, L.To
FROM Temp AS F, Temp AS L
WHERE F.From < L.To AND F.ID = L.ID
AND NOT EXISTS (SELECT *
FROM Temp AS T
WHERE T.ID = F.ID
AND F.From < T.From AND T.From < L.To
AND NOT EXISTS ( SELECT *
FROM Temp AS T1
WHERE T1.ID = F.ID
AND T1.From < T.From
AND T.From <= T1.To)
)
AND NOT EXISTS (SELECT *
FROM Temp AS T2
WHERE T2.ID = F.ID
AND (
(T2.From < F.From AND F.From <= T2.To)
OR (T2.From < L.To AND L.To < T2.To)
)
)
with t_data as (
select 1 as id,
to_date('03-sep-12','dd-mon-yy') as start_date,
to_date('07-sep-12','dd-mon-yy') as end_date from dual
union all
select 1,
to_date('03-sep-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('05-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('06-sep-12','dd-mon-yy'),
to_date('12-sep-12','dd-mon-yy') from dual
union all
select 1,
to_date('31-aug-12','dd-mon-yy'),
to_date('04-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('04-sep-12','dd-mon-yy'),
to_date('06-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('11-sep-12','dd-mon-yy'),
to_date('13-sep-12','dd-mon-yy') from dual
union all
select 2,
to_date('05-sep-12','dd-mon-yy'),
to_date('08-sep-12','dd-mon-yy') from dual
),
t_holidays as (
select to_date('01-jan-12','dd-mon-yy') as holiday
from dual
),
t_data_rn as (
select rownum as rn, t_data.* from t_data
),
t_model as (
select distinct id,
start_date
from t_data_rn
model
partition by (rn, id)
dimension by (0 as i)
measures(start_date, end_date)
rules
( start_date[for i
from 1
to end_date[0]-start_date[0]
increment 1] = start_date[0] + cv(i),
end_date[any] = start_date[cv()] + 1
)
order by 1,2
),
t_network_days as (
select t_model.*,
case when
mod(to_char(start_date, 'j'), 7) + 1 in (6, 7)
or t_holidays.holiday is not null
then 0 else 1
end as working_day
from t_model
left outer join t_holidays
on t_holidays.holiday = t_model.start_date
)
select id,
sum(working_day) as network_days
from t_network_days
group by id;
t_data - your initial data
t_holidays - contains list of holidays
t_data_rn - just adds unique key (rownum) to each row of t_data
t_model - expands t_data date ranges into a flat list of dates
t_network_days - marks each date from t_model as working day or weekend based on day of week (Sat and Sun) and holidays list
final query - calculates the number of network days per group