How to partition my data by a specific date and another identifier in SQL

with cte as
(
select to_date('01-JUN-2020','DD-MON-YYYY')+(level-1) DT
from dual
connect bY level<= 30
)
select *
from cte x
left outer join
(select date from time where emp in (1, 2)) a on x.dt = a.date
In this scenario I am trying to find the days on which these persons did not report to work. It works well for one person: I get back their missing days correctly. But when I add a second person, I no longer get the correct missing days for each of them, I guess because I'm only joining on the date.
I would like to know how I can partition this data by person id and date to get the accurate days that each person missed.
Please help, thanks.

You would typically cross join the list of dates with the list of persons, and then use not exists to pull out the missing person/date tuples:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select distinct emp from times) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.date = c.dt
)
Note that this uses a literal date rather than to_date(), which is more appropriate here.
This gives the missing tuples for all persons at once. If you want just for a predefined list of persons, then:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.date = c.dt
)
If you want to also see the "presence" dates, then use a left join rather than not exists, as in your original query:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp, -- enumerate the relevant columns from "t" here
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
left join times t on t.emp = e.emp and t.date = c.dt
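As a runnable sketch of the cross join plus not exists pattern, the same idea can be tried in SQLite through Python's sqlite3 module. The recursive cal CTE stands in for Oracle's connect by, and the times table with columns emp and work_date, as well as the missing_days helper, are assumptions made for illustration only.

```python
import sqlite3

# Calendar x persons, minus the (person, date) pairs that exist in times.
MISSING_SQL = """
WITH RECURSIVE cal(dt) AS (
    SELECT :start
    UNION ALL
    SELECT date(dt, '+1 day') FROM cal WHERE dt < :stop
)
SELECT c.dt, e.emp
FROM cal c
CROSS JOIN (SELECT DISTINCT emp FROM times) e
WHERE NOT EXISTS (
    SELECT 1
    FROM times t
    WHERE t.emp = e.emp AND t.work_date = c.dt
)
ORDER BY e.emp, c.dt
"""


def missing_days(rows, start, stop):
    """Return (date, emp) pairs in [start, stop] with no row in times.

    rows is a list of (emp, 'YYYY-MM-DD') attendance records (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE times (emp INTEGER, work_date TEXT)")
    conn.executemany("INSERT INTO times VALUES (?, ?)", rows)
    result = conn.execute(MISSING_SQL, {"start": start, "stop": stop}).fetchall()
    conn.close()
    return result
```

Calling missing_days with attendance rows for two persons returns one row per person and absent day, which is what joining on the date alone cannot provide.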

Related

How to extrapolate dates in SQL Server to calculate the daily counts?

This is what the data looks like. It's a long table.
I need to calculate the number of people employed by day.
How do I write SQL Server logic to get this result? I tried to create a DATES table and then join, but this caused an error because the table is too big. Do I need recursive logic?
For future questions, don't post images of data; use a service like dbfiddle instead. With a better-prepared question you could have gotten a complete answer, but here is a sketch anyway:
-- extrema is the least and the greatest date in staff table
with extrema(mn, mx) as (
select least(min(hired),min(retired)) as mn
, greatest(max(hired),max(retired)) as mx
from staff
), calendar (dt) as (
-- we construct a calendar with every date between extreme values
select mn from extrema
union all
select dateadd(day, 1, dt)
from calendar
where dt < (select mx from extrema)
)
-- finally we can count the number of employed people for each such date
select dt, count(1)
from calendar c
join staff s
on c.dt between s.hired and s.retired
group by dt;
If you find yourself doing this kind of calculation often, it is a good idea to create a calendar table. You can add other attributes to it, such as whether a date falls in the middle of the week.
With a constraint as:
CHECK(hired <= retired)
the first part can be simplified to:
with extrema(mn, mx) as (
select min(hired) as mn
, max(retired) as mx
from staff
),
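The recursive calendar approach can be sketched in SQLite via Python's sqlite3 module. This is only an illustration under assumptions: the staff table with name, hired and retired columns stored as 'YYYY-MM-DD' text, the guarantee that hired <= retired with no NULLs, and the headcount_by_day helper are all hypothetical. A left join is used so that days with nobody employed show up with a count of 0.

```python
import sqlite3

# extrema -> calendar -> count of staff employed on each calendar date.
HEADCOUNT_SQL = """
WITH RECURSIVE extrema(mn, mx) AS (
    SELECT min(hired), max(retired) FROM staff
), calendar(dt) AS (
    SELECT mn FROM extrema
    UNION ALL
    SELECT date(dt, '+1 day') FROM calendar
    WHERE dt < (SELECT mx FROM extrema)
)
SELECT c.dt, count(s.name) AS employed
FROM calendar c
LEFT JOIN staff s ON c.dt BETWEEN s.hired AND s.retired
GROUP BY c.dt
ORDER BY c.dt
"""


def headcount_by_day(staff_rows):
    """staff_rows: list of (name, hired, retired) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, hired TEXT, retired TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", staff_rows)
    result = conn.execute(HEADCOUNT_SQL).fetchall()
    conn.close()
    return result
```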
Assuming Current Employees have a NULL retirement date
Declare @Date1 date = '2015-01-01'
Declare @Date2 date = getdate()
Select A.Date
,HeadCount = count(B.name)
From ( Select Top (DateDiff(DAY,@Date1,@Date2)+1)
Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),@Date1)
From master..spt_values n1,master..spt_values n2
) A
Left Join YourTable B on A.Date >= B.Hired and A.Date <= coalesce(B.Retired,getdate())
Group BY A.Date
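The handling of a NULL retirement date can be illustrated with a small SQLite sketch run from Python. The staff table, its columns and the headcount_open_ended helper are assumed names, and a fixed :last parameter replaces getdate() so that the result is reproducible.

```python
import sqlite3

# A NULL retired date is treated as "still employed on the last calendar day".
OPEN_ENDED_SQL = """
WITH RECURSIVE cal(dt) AS (
    SELECT :first
    UNION ALL
    SELECT date(dt, '+1 day') FROM cal WHERE dt < :last
)
SELECT c.dt, count(s.name) AS headcount
FROM cal c
LEFT JOIN staff s
       ON c.dt >= s.hired
      AND c.dt <= coalesce(s.retired, :last)
GROUP BY c.dt
ORDER BY c.dt
"""


def headcount_open_ended(staff_rows, first, last):
    """staff_rows: (name, hired, retired_or_None) with ISO date strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, hired TEXT, retired TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", staff_rows)
    params = {"first": first, "last": last}
    result = conn.execute(OPEN_ENDED_SQL, params).fetchall()
    conn.close()
    return result
```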
You need a calendar table for this. You start with the calendar, and LEFT JOIN everything else, using BETWEEN logic.
You can use a real table. Or you can generate it on the fly, like this:
WITH
L0 AS ( SELECT c = 1
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1)) AS D(c) ),
L1 AS ( SELECT c = 1 FROM L0 A, L0 B, L0 C, L0 D ),
Nums AS ( SELECT rownum = ROW_NUMBER() OVER(ORDER BY (SELECT 1))
FROM L1 ),
Dates AS (
SELECT TOP (DATEDIFF(day, '20141231', GETDATE()))
Date = DATEADD(day, rownum, '20141231')
FROM Nums
)
SELECT
d.Date,
NumEmployed = COUNT(t.Hired)
FROM Dates d
LEFT JOIN YourTable t ON d.Date BETWEEN t.Hired AND t.Retired
GROUP BY
d.Date;
If your dates have a time component, then you need to use >= AND < logic instead of BETWEEN.
Try limiting the scope of your date table. In this example I have a table of dates named TallyStickDT.
SELECT dt, COUNT(name)
FROM (
SELECT dt
FROM tallystickdt
WHERE dt >= (SELECT MIN(hired) FROM #employees)
AND dt <= GETDATE()
) A
LEFT OUTER JOIN #employees E ON A.dt >= E.Hired AND A.dt <= e.retired
GROUP BY dt
ORDER BY dt

Count Data by Loop Calendar SQL/Oracle

I need a count of IDs per date, where an ID is counted on every date between its date_active and date_end. If the date ranges of several IDs overlap, the counts add up. Here is the data I have right now:
TABLE CONTRACT:
ID DATE_ACTIVE DATE_END
1 05-FEB-13 08-NOV-13
1 21-DEC-18 06-OCT-19
2 05-FEB-13 27-JAN-14
3 05-FEB-13 07-NOV-13
4 06-FEB-13 02-NOV-13
4 25-OCT-14 13-APR-16
TABLE CALENDAR:
DT
05-FEB-13
06-FEB-13
07-FEB-13
08-FEB-13
09-FEB-13
..-DEC-19
What I want as output is basically this:
DT COUNT(ID)
05-FEB-13 3
06-FEB-13 4
07-FEB-13 4
08-FEB-13 4
09-FEB-13 4
10-FEB-13 4
....
03-NOV-13 3
....
08-NOV-13 2
09-NOV-13 1
....
28-JAN-14 0
....
25-OCT-14 1
....
13-APR-16 1
14-APR-16 0
....
21-DEC-18 1
....
06-OCT-19 1
07-OCT-19 0
....
....
And here is my query to get that result
with contract as (
select * from contract
where id in ('1','2','3','4')
)
,
cal as
(
select TRUNC (SYSDATE - ROWNUM) dt
from dual
connect by rownum < sysdate - to_date('05-FEB-13')
)
select aa.dt,count(distinct bb.id)id from cal aa
left join contract bb on aa.dt >= bb.date_active and aa.dt<= bb.date_end
group by aa.dt
order by 1
The problem is that I have 6 million IDs, and with this kind of query the result may take forever. I'm having a hard time figuring out how to get the result with a different query. I would be grateful if somebody could help me out. Thank you so much.
If you group your events by date_active and date_end, you will get the numbers of events which have started and ended on each separate day.
Not a lot of days have passed between 2013 and 2019 (about 2 000), so the grouped resultsets will be relatively short.
Now that you have the two groups, you can notice that the number of events on each given date is the number of events which have started on or before this date, minus the number of events which have finished on or before this date (I'm assuming the end dates are non-inclusive).
In other words, the number of events on every given day is:
The number of events on the previous date,
plus the number of events started on this date,
minus the number of events ended on this date.
This can be easily done using a window function.
This will require a join between the calendar table and the two groups, but fortunately all of them are relatively short (thousands of records) and the join would be fast.
Here's the query: http://sqlfiddle.com/#!4/b21ce/5
WITH cal AS
(
SELECT TRUNC (to_date('01-NOV-13') - ROWNUM) dt
FROM dual
CONNECT BY
rownum < to_date('01-NOV-13')- to_date('01-FEB-13')
),
started_on AS
(
SELECT date_active AS dt, COUNT(*) AS cnt_start
FROM contract
GROUP BY
date_active
),
ended_on AS
(
SELECT date_end AS dt, COUNT(*) AS cnt_end
FROM contract
GROUP BY
date_end
)
SELECT dt,
SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM cal c
LEFT JOIN
started_on s
USING (dt)
LEFT JOIN
ended_on e
USING (dt)
(I used a fixed date instead of SYSDATE to keep the resultset short, but the idea is the same)
This query requires that the calendar starts before the earliest event, otherwise every result will be off by a fixed amount, the number of events before the beginning of the calendar.
You can replace the fixed date in the calendar condition with (SELECT MIN(date_active) FROM contract) which is instant if date_active is indexed.
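The running-total idea (previous count, plus starts, minus ends) can be tried on a few rows with SQLite from Python. This is a sketch under assumptions: it needs SQLite 3.25 or later for window functions, it treats date_end as non-inclusive as in the explanation above, and the recursive cal CTE and the daily_counts helper are illustrative stand-ins for the Oracle calendar.

```python
import sqlite3

# Per-day net change (starts minus ends), accumulated with a window SUM.
RUNNING_SQL = """
WITH RECURSIVE cal(dt) AS (
    SELECT :first
    UNION ALL
    SELECT date(dt, '+1 day') FROM cal WHERE dt < :last
),
started_on AS (
    SELECT date_active AS dt, COUNT(*) AS cnt_start
    FROM contract GROUP BY date_active
),
ended_on AS (
    SELECT date_end AS dt, COUNT(*) AS cnt_end
    FROM contract GROUP BY date_end
)
SELECT c.dt,
       SUM(COALESCE(s.cnt_start, 0) - COALESCE(e.cnt_end, 0))
           OVER (ORDER BY c.dt) AS cnt
FROM cal c
LEFT JOIN started_on s ON s.dt = c.dt
LEFT JOIN ended_on e ON e.dt = c.dt
ORDER BY c.dt
"""


def daily_counts(contracts, first, last):
    """contracts: (id, date_active, date_end) with ISO date strings.

    The calendar must start on or before the earliest date_active.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contract (id INTEGER, date_active TEXT, date_end TEXT)"
    )
    conn.executemany("INSERT INTO contract VALUES (?, ?, ?)", contracts)
    params = {"first": first, "last": last}
    result = conn.execute(RUNNING_SQL, params).fetchall()
    conn.close()
    return result
```

Only the two grouped result sets and the calendar are joined, so the work after grouping scales with the number of distinct days rather than with the number of contracts.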
Update:
If your contract dates can overlap and you want to collapse multiple overlapping contracts into one continuous contract, you can use window functions to do so.
WITH cal AS
(
SELECT TRUNC (to_date('01-NOV-13') - ROWNUM) dt
FROM dual
CONNECT BY
rownum <= to_date('01-NOV-13')- to_date('01-FEB-13')
),
collapsed_contract AS
(
SELECT *
FROM (
SELECT c.*,
COALESCE(LAG(date_end_effective) OVER (PARTITION BY id ORDER BY date_active), date_active) AS date_start_effective
FROM (
SELECT c.*,
MAX(date_end) OVER (PARTITION BY id ORDER BY date_active) AS date_end_effective
FROM contract c
) c
) c
WHERE date_start_effective < date_end_effective
),
started_on AS
(
SELECT date_start_effective AS dt, COUNT(*) AS cnt_start
FROM collapsed_contract
GROUP BY
date_start_effective
),
ended_on AS
(
SELECT date_end_effective AS dt, COUNT(*) AS cnt_end
FROM collapsed_contract
GROUP BY
date_end_effective
)
SELECT dt,
SUM(COALESCE(cnt_start, 0) - COALESCE(cnt_end, 0)) OVER (ORDER BY dt) cnt
FROM cal c
LEFT JOIN
started_on s
USING (dt)
LEFT JOIN
ended_on e
USING (dt)
http://sqlfiddle.com/#!4/adeba/1
The query might seem bulky, but that's to make it more efficient, as all these window functions can be calculated in a single pass over the table.
Note however that this single pass relies on the table being sorted on (id, date_active) so an index on these two fields is crucial.
First, the row_number() over (order by id, date_active) analytic function generates a unique id for each contract row. That id is then used in the connect by level <= ... and prior id = id clause to expand each row into one row per day:
with t0 as
(
select row_number() over (order by id,date_active) as id, date_active, date_end
from contract
), t1 as
(
select date_active + level - 1 as dt
from t0
connect by level <= date_end - date_active + 1
and prior id = id
and prior sys_guid() is not null
)
select dt, count(*)
from t1
group by dt
order by dt

How to fill missing dates between empty records?

I am trying to fill in the dates between existing records, but without success. I tried using multiple selects and I tried a join, but it seems I am missing the point. I would like to generate records for the missing dates so that I can build a chart from this block of code. For now I would like the dates to be filled in "manually"; later I will reorganise the code and replace that with an argument.
Can someone help me with that expression?
SELECT
LOG_LAST AS "data",
SUM(run_cnt) AS "Number of runs"
FROM
dual l
LEFT OUTER JOIN "LOG_STAT" stat ON
stat."LOG_LAST" = l."CLASS"
WHERE
new_class = '$arg[klasa]'
--SELECT to_date(TRUNC (SYSDATE - ROWNUM), 'DD-MM-YYYY'),
--0
--FROM dual CONNECT BY ROWNUM < 366
GROUP BY
LOG_LAST
ORDER BY
LOG_LAST
//Edit:
LOG_LAST is a date column (for example: 25.04.2018 15:44:21), run_cnt is a column holding a simple number, LOG_STAT is the table that contains LOG_LAST and run_cnt, and new_class is a column with the name of the record. I would like to list rows even for dates that have no records. For example, I have records dated 24-09-2018, 23-09-2018, 20-09-2018 and 18-09-2018, and I would like to also list the missing dates in that period, even without names and run_cnt.
Try filling the nulls with isnull (in Oracle, the equivalent function is nvl):
SELECT
case when trim(LOG_LAST) is null then '01-01-2018'
else isnull(LOG_LAST,'01-01-2018')end AS data,
SUM(isnull(run_cnt,0)) AS "Number of runs"
FROM
dual l
LEFT OUTER JOIN "LOG_STAT" stat ON
stat."LOG_LAST" = l."CLASS"
WHERE
new_class = '$arg[klasa]'
--SELECT to_date(TRUNC (SYSDATE - ROWNUM), 'DD-MM-YYYY'),
--0
--FROM dual CONNECT BY ROWNUM < 366
GROUP BY
LOG_LAST
ORDER BY
LOG_LAST
What you want is more or less:
select d.day, sum(ls.run_cnt)
from all_dates d
left join log_stat ls on trunc(ls.log_last) = d.day
and ls.new_class = :klasa
group by d.day
order by d.day;
The all_dates table in above query is supposed to contain all dates beginning with the minimum klasa log_last date and ending with the maximum klasa log_last date. You get these dates with a recursive query.
with ls as
(
select trunc(log_last) as day, sum(run_cnt) as total
from log_stat
where new_class = :klasa
group by trunc(log_last)
)
, all_dates(day) as
(
select min(day) from ls
union all
select day + 1 from all_dates where day < (select max(day) from ls)
)
select d.day, ls.total
from all_dates d
left join ls on ls.day = d.day
order by d.day;
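For experimenting with the recursive all_dates idea outside Oracle, here is a SQLite sketch driven from Python. It assumes log_last is stored as 'YYYY-MM-DD HH:MM:SS' text, uses date() where the Oracle query uses trunc(), and reports missing days as 0 via coalesce; the densify helper is a hypothetical name.

```python
import sqlite3

# Aggregate per day first, then left join onto a gap-free list of days.
DENSIFY_SQL = """
WITH RECURSIVE ls AS (
    SELECT date(log_last) AS day, sum(run_cnt) AS total
    FROM log_stat
    WHERE new_class = :klasa
    GROUP BY date(log_last)
),
all_dates(day) AS (
    SELECT min(day) FROM ls
    UNION ALL
    SELECT date(day, '+1 day') FROM all_dates
    WHERE day < (SELECT max(day) FROM ls)
)
SELECT d.day, coalesce(ls.total, 0) AS total
FROM all_dates d
LEFT JOIN ls ON ls.day = d.day
ORDER BY d.day
"""


def densify(rows, klasa):
    """rows: (log_last, run_cnt, new_class); returns one row per calendar day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE log_stat (log_last TEXT, run_cnt INTEGER, new_class TEXT)"
    )
    conn.executemany("INSERT INTO log_stat VALUES (?, ?, ?)", rows)
    result = conn.execute(DENSIFY_SQL, {"klasa": klasa}).fetchall()
    conn.close()
    return result
```

Filtering on new_class inside the ls CTE, rather than in a where clause after the left join, is what keeps the days without records in the output.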
This is called data densification. Here is an example, based on the Oracle documentation section "Data Densification for Reporting":
with ls as
(
select trunc(created) as day,object_type new_class, sum(1) as total
from user_objects
group by trunc(created),object_type
)
, all_dates(day) as
(
select min(day) from ls
union all
select day + 1 from all_dates where day < (select max(day) from ls)
)
select d.day, nvl(ls.total,0),new_class
from all_dates d
left join ls partition by (ls.new_class) on ls.day = d.day
order by d.day;

Teradata - How to account for missing hours in timestamp when using extract() function

I have the following statement to extract the date, hour and number of users from a table in a Teradata DB . . .
SELECT
CAST(end_time AS DATE) AS end_date,
EXTRACT(HOUR FROM end_time) AS end_hour,
COUNT(users) AS total_users
FROM table
GROUP BY end_date, end_hour
When using the extract() function, my resultset contains missing hours where there is no activity by users over a 24 hour period... I'm wondering is there any technique to account for these missing hours in my resultset?
I can't create a lookup table to reference, as I don't have the necessary permissions to create a table on this DB.
Any help would be appreciated!
Use sys_calendar.calendar to generate the requested dates (change the range as needed).
Use WITH RECURSIVE to generate the hours.
with recursive cte_hours (hr)
as
(
select 0 from (select 1) t(c)
union all select hr + 1 from cte_hours where hr < 23
)
select c.calendar_date as dt
,h.hr as hr
,zeroifnull(t.total_users) as total_users
from sys_calendar.calendar as c
cross join cte_hours as h
left join (select cast(end_time as date) as end_date
,extract(hour from end_time) as end_hour
,count(users) as total_users
from mytable t
group by end_date
,end_hour
) t
on t.end_date = c.calendar_date
and t.end_hour = h.hr
where c.calendar_date between current_date - 10 and current_date
order by dt,hr
;
For @GordonLinoff
select 0
0
select 1
1
select 0
union all
select 1
[3888] A SELECT for a UNION,INTERSECT or MINUS must reference a table.
select 0 from (select 1 as c) t
union all
select 1 from (select 1 as c) t
0
1
or
select 0 from (select 1) t(c)
union all
select 1 from (select 1) t(c)
0
1
If you want all hours from all days in the database, then you can generate the rows using cross join and then use left join to bring in results:
SELECT d.end_date,
h.end_hour,
COUNT(t.users) AS total_users
FROM (select distinct CAST(end_time AS DATE) AS end_date from table) d CROSS JOIN
(select distinct EXTRACT(HOUR FROM end_time) AS end_hour from table) h LEFT JOIN
table t
ON CAST(t.end_time AS DATE) = d.end_date and EXTRACT(HOUR FROM t.end_time) = h.end_hour
GROUP BY d.end_date, h.end_hour;
If all hours are not represented, you can use an explicit list:
SELECT d.end_date,
h.end_hour,
COUNT(t.users) AS total_users
FROM (select distinct CAST(end_time AS DATE) AS end_date from table) d CROSS JOIN
(select * from (select 0 as end_hour) t UNION ALL
select * from (select 1 as end_hour) t UNION ALL
. . .
) h LEFT JOIN
table t
ON CAST(t.end_time AS DATE) = d.end_date and EXTRACT(HOUR FROM t.end_time) = h.end_hour
GROUP BY d.end_date, h.end_hour;
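The days x hours grid can be checked on sample rows with SQLite from Python. This sketch combines a recursive 0..23 hour list with the distinct days found in the data; it is not Teradata syntax, and the activity table, its end_time and users columns, and the hourly_counts helper are assumptions for illustration.

```python
import sqlite3

# Every (day, hour) pair is produced first; real counts are joined in after.
HOURLY_SQL = """
WITH RECURSIVE hours(hr) AS (
    SELECT 0
    UNION ALL
    SELECT hr + 1 FROM hours WHERE hr < 23
),
days AS (
    SELECT DISTINCT date(end_time) AS end_date FROM activity
),
counts AS (
    SELECT date(end_time) AS end_date,
           CAST(strftime('%H', end_time) AS INTEGER) AS end_hour,
           count(users) AS total_users
    FROM activity
    GROUP BY date(end_time), strftime('%H', end_time)
)
SELECT d.end_date, h.hr, coalesce(c.total_users, 0) AS total_users
FROM days d
CROSS JOIN hours h
LEFT JOIN counts c
       ON c.end_date = d.end_date AND c.end_hour = h.hr
ORDER BY d.end_date, h.hr
"""


def hourly_counts(rows):
    """rows: (end_time 'YYYY-MM-DD HH:MM:SS', users); 24 result rows per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (end_time TEXT, users TEXT)")
    conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    result = conn.execute(HOURLY_SQL).fetchall()
    conn.close()
    return result
```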

Calculating per day in SQL

I have an SQL table like this:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
So what I need is to get the min and max date for each id and the difference in days between them. That part is fairly easy. Using SQLite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky part: how can I get the price per day over this period? I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I created a CTE with a calendar, as you mentioned, but I don't know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
P.S. If the answer is in SQLite syntax, that would be awesome!
You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.thedate,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
One method is to use a tally table to build a list of dates and join that with the table.
The date stamps in DD.MM.YY format are first converted to YYYY-MM-DD so that they can actually be used as dates in SQL.
In the final select they are formatted back to DD.MM.YY.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was inspired by the excellent answer from Gordon Linoff, and a recursive query is probably more performant for this anyway.
(He should get the 15 points for the accepted answer.)
The difference in this version is that the date stamps are first formatted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];