Teradata - How to account for missing hours in timestamp when using extract() function - sql

I have the following statement to extract the date, hour and number of users from a table in a Teradata DB . . .
SELECT
CAST(end_time AS DATE) AS end_date,
EXTRACT(HOUR FROM end_time) AS end_hour,
COUNT(users) AS total_users
FROM table
GROUP BY end_date, end_hour
When using the extract() function, my resultset contains missing hours where there is no activity by users over a 24 hour period... I'm wondering is there any technique to account for these missing hours in my resultset?
I can't create a lookup table to reference, as I don't have the necessary permissions to create a table on this DB.
Any help would be appreciated!

Use sys_calendar.calendar to generate the requested dates (change the range as needed) and WITH RECURSIVE to generate the hours:
with recursive cte_hours (hr)
as
(
select 0 from (select 1) t(c)
union all select hr + 1 from cte_hours where hr < 23
)
select c.calendar_date as dt
,h.hr as hr
,zeroifnull(t.total_users) as total_users
from sys_calendar.calendar as c
cross join cte_hours as h
left join (select cast(end_time as date) as end_date
,extract(hour from end_time) as end_hour
,count(users) as total_users
from mytable t
group by end_date
,end_hour
) t
on t.end_date = c.calendar_date
and t.end_hour = h.hr
where c.calendar_date between current_date - 10 and current_date
order by dt,hr
;
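The same pattern (generate every hour, cross join with the dates, then left join the aggregated counts) can be tried out locally. The following is only a minimal sketch using Python's built-in sqlite3 module rather than Teradata: the sample rows, the single hard-coded date, and the use of date()/strftime() in place of CAST/EXTRACT are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: activity in hours 0 and 2 only.
conn.executescript("""
    CREATE TABLE mytable (end_time TEXT, users TEXT);
    INSERT INTO mytable VALUES
        ('2020-06-01 00:15:00', 'a'),
        ('2020-06-01 00:45:00', 'b'),
        ('2020-06-01 02:05:00', 'c');
""")

query = """
WITH RECURSIVE cte_hours(hr) AS (
    SELECT 0
    UNION ALL
    SELECT hr + 1 FROM cte_hours WHERE hr < 23
)
SELECT d.dt, h.hr, COALESCE(t.total_users, 0) AS total_users
FROM (SELECT '2020-06-01' AS dt) AS d      -- stand-in for the calendar table
CROSS JOIN cte_hours AS h                   -- every date paired with every hour
LEFT JOIN (
    SELECT date(end_time) AS end_date,
           CAST(strftime('%H', end_time) AS INTEGER) AS end_hour,
           COUNT(users) AS total_users
    FROM mytable
    GROUP BY 1, 2
) AS t ON t.end_date = d.dt AND t.end_hour = h.hr
ORDER BY d.dt, h.hr
"""
rows = conn.execute(query).fetchall()

# Hours with no activity appear with a count of 0 instead of being absent.
for dt, hr, total in rows[:3]:
    print(dt, hr, total)
```

The left join keeps all 24 generated hours, and COALESCE plays the role of zeroifnull for the hours that have no matching rows.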
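For @GordonLinoff: Teradata requires each SELECT in a UNION to reference a table, which is why the CTE above selects from a derived table: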
select 0
0
select 1
1
select 0
union all
select 1
[3888] A SELECT for a UNION,INTERSECT or MINUS must reference a table.
select 0 from (select 1 as c) t
union all
select 1 from (select 1 as c) t
0
1
or
select 0 from (select 1) t(c)
union all
select 1 from (select 1) t(c)
0
1

If you want all hours from all days in the database, then you can generate the rows using cross join and then use left join to bring in results:
SELECT d.end_date,
h.end_hour,
COUNT(t.users) AS total_users
FROM (select distinct CAST(end_time AS DATE) AS end_date from table) d CROSS JOIN
(select distinct EXTRACT(HOUR FROM end_time) AS end_hour from table) h LEFT JOIN
table t
ON CAST(t.end_time AS DATE) = d.end_date and EXTRACT(HOUR FROM t.end_time) = h.end_hour
GROUP BY d.end_date, h.end_hour;
If all hours are not represented, you can use an explicit list:
SELECT d.end_date,
h.end_hour,
COUNT(t.users) AS total_users
FROM (select distinct CAST(end_time AS DATE) AS end_date from table) d CROSS JOIN
(select * from (select 0 as end_hour) t UNION ALL
select * from (select 1 as end_hour) t UNION ALL
. . .
) h LEFT JOIN
table t
ON CAST(t.end_time AS DATE) = d.end_date and EXTRACT(HOUR FROM t.end_time) = h.end_hour
GROUP BY d.end_date, h.end_hour;

Related

How to partition my data by a specific date and another identifier SQL

with cte as
(
select to_date('01-JUN-2020','DD-MON-YYYY')+(level-1) DT
from dual
connect bY level<= 30
)
select *
from cte x
left outer join
(select date from time where emp in (1, 2)) a on x.dt = a.date
In this scenario I am trying to find the missing days that these persons didn't report to work... it works well for 1 person. I get back their missing days correctly. But when I add 2 persons.. I do not get back the correct missing days for them because I'm only joining on date I guess.
I would like to know how I can partition this data by the persons id and date to be able get accurate days that each were missing.
Please help, thanks.
You would typically cross join the list of dates with the list of persons, and then use not exists to pull out the missing person/date tuples:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select distinct emp from times) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.date = c.dt
)
Note that this uses a literal date rather than to_date(), which is more appropriate here.
This gives the missing tuples for all persons at once. If you want just for a predefined list of persons, then:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
where not exists (
select 1
from times t
where t.emp = e.emp and t.date = c.dt
)
If you want to also see the "presence" dates, then use a left join rather than not exists, as in your original query:
with cte as (
select date '2020-06-01' + level - 1 dt
from dual
connect by level <= 30
)
select c.dt, e.emp, -- enumerate the relevant columns from "t" here
from cte c
cross join (select 1 emp from dual union all select 2 from dual) e
left join times t on t.emp = e.emp and t.date = c.dt
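To see the cross join plus not exists idea in action without an Oracle instance, here is a small sketch using Python's sqlite3 module. It is only an illustration under stated assumptions: the table contents are made up, the date column is named work_date here (a hypothetical name), and a recursive CTE stands in for connect by level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical attendance data: employee 2 did not report on 2020-06-02.
conn.executescript("""
    CREATE TABLE times (emp INTEGER, work_date TEXT);
    INSERT INTO times VALUES
        (1, '2020-06-01'), (1, '2020-06-02'), (1, '2020-06-03'),
        (2, '2020-06-01'), (2, '2020-06-03');
""")

missing = conn.execute("""
WITH RECURSIVE cal(dt) AS (
    SELECT '2020-06-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM cal WHERE dt < '2020-06-03'
)
SELECT c.dt, e.emp
FROM cal AS c
CROSS JOIN (SELECT DISTINCT emp FROM times) AS e   -- every date for every person
WHERE NOT EXISTS (
    SELECT 1 FROM times AS t
    WHERE t.emp = e.emp AND t.work_date = c.dt     -- match on person AND date
)
ORDER BY e.emp, c.dt
""").fetchall()

print(missing)
```

Because the existence check correlates on both the person and the date, each person's gaps are found independently of the other's attendance.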

How to return 0 value if no record exists in bigquery

I need to find out for which dates no record exists in a BigQuery table.
Please find my query below:
select cast(creat_ts as date) as create,IFNULL(count(*) ,0)
FROM table
where cast(creat_ts as date)='2020-06-23' group by 1 )
Below is for BigQuery Standard SQL
#standardSQL
SELECT DISTINCT day
FROM UNNEST(GENERATE_DATE_ARRAY('2020-06-01', '2020-06-30')) day
LEFT JOIN `project.dataset.table` t
ON CAST(creat_ts AS DATE) = day
WHERE creat_ts IS NULL
You could try something like this:
with calendar as (
select * from unnest(generate_date_array('2020-01-01', '2020-07-01', interval 1 day)) date
),
temp as (
select cast(b.creat_ts as date) as date from `project.dataset.table` b
),
daily_count as (
select
date,
count(temp.date) as ct
from calendar
left join temp using(date)
group by 1
)
select * from daily_count
where ct = 0
order by 1

Window functions ordered by date when some dates don't exist

Suppose this example query:
select
id
, date
, sum(var) over (partition by id order by date rows 30 preceding) as roll_sum
from tab
When some dates are missing from the date column, the window does not account for the nonexistent dates. How can I make this window aggregation include them?
Many thanks!
You can join a sequence containing all dates from a desired interval.
select
*
from (
select
d.date,
q.id,
q.roll_sum
from unnest(sequence(date '2000-01-01', date '2030-12-31')) d
left join ( your_query ) q on q.date = d.date
) v
where v.date > (select min(my_date) from tab2)
and v.date < (select max(my_date) from tab2)
In standard SQL, you would typically use a window range specification, like:
select
id,
date,
sum(var) over (
partition by id
order by date
range interval '30' day preceding
) as roll_sum
from tab
However, I am not sure that Presto supports this syntax. You can resort to a correlated subquery instead:
select
id,
date,
(
select sum(var)
from tab t1
where
t1.id = t.id
and t1.date >= t.date - interval '30' day
and t1.date <= t.date
) roll_sum
from tab t
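The difference between a row-based window and a date-based range is easy to check on a small example. The sketch below runs the correlated-subquery approach in SQLite through Python's sqlite3 module; it is not Presto, the sample rows are invented, the column is named dt instead of date, and date(dt, '-30 days') is SQLite's stand-in for the interval arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data with a gap of more than 30 days before the last row.
conn.executescript("""
    CREATE TABLE tab (id INTEGER, dt TEXT, var INTEGER);
    INSERT INTO tab VALUES
        (1, '2020-01-01', 10),
        (1, '2020-01-20', 5),
        (1, '2020-03-01', 7);
""")

rows = conn.execute("""
SELECT t.id, t.dt,
       (SELECT SUM(t1.var)
        FROM tab AS t1
        WHERE t1.id = t.id
          AND t1.dt >= date(t.dt, '-30 days')   -- calendar window, not row count
          AND t1.dt <= t.dt) AS roll_sum
FROM tab AS t
ORDER BY t.id, t.dt
""").fetchall()

for row in rows:
    print(row)
```

On the last row only the value 7 falls inside the 30-day calendar window, whereas a "rows 30 preceding" frame would have summed all three rows.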
I don't think Presto supports window functions with interval ranges. Alas. There is an old-fashioned way of doing this, by counting the "ins" and "outs" of values:
with t as (
select id, date, var, 1 as is_orig
from tab
union all
select id, date + interval '30' day, -var, 0
from tab
)
select id.*
from (select id, date, sum(sum(var)) over (partition by id order by date) as running_30,
sum(is_orig) as is_orig
from t
group by id, date
) id
where is_orig > 0

SQL count display 0s using 1 table

I have a table called request, and I'm looking to count how many requests have a non-null rideId on each day during the last week. I have the following query:
Select count(*), dayname(time) as Day
from request
where time >= (select current_timestamp - interval 7 day) and rideId is not null
group by dayname(time)
order by dayofweek(Day);
How can I make it also show the days on which there is no request with a rideId, with a count of 0?
Table is: Request(userId, time, rideId)
Move the not null check into your count, and join to a calendar table to bring in the missing days.
SELECT
t1.dname,
COALESCE(t2.numRides, 0) AS numRides
FROM
(
SELECT 'Monday' AS dname, 2 AS dow UNION ALL
SELECT 'Tuesday', 3 UNION ALL
SELECT 'Wednesday', 4 UNION ALL
SELECT 'Thursday', 5 UNION ALL
SELECT 'Friday', 6 UNION ALL
SELECT 'Saturday', 7 UNION ALL
SELECT 'Sunday', 1
) t1
LEFT JOIN
(
SELECT DAYNAME(time) AS dname, COUNT(rideId) AS numRides
FROM request
WHERE time >= DATE_SUB(CURDATE(),INTERVAL 7 DAY)
GROUP BY DAYNAME(time)
) t2
ON t1.dname = t2.dname
ORDER BY t1.dow;
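For readers without a MySQL server at hand, the idea can be sketched in SQLite via Python's sqlite3 module. Assumptions in this sketch: the sample rows are invented, strftime('%w', ...) replaces DAYNAME/DAYOFWEEK (so Sunday is 0 rather than 1), and the seven-day filter is left out to keep the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical requests: two on a Monday (one without a ride), one on a Wednesday.
conn.executescript("""
    CREATE TABLE request (userId INTEGER, time TEXT, rideId INTEGER);
    INSERT INTO request VALUES
        (1, '2020-06-01 08:00:00', 100),
        (2, '2020-06-01 09:00:00', NULL),
        (3, '2020-06-03 10:00:00', 101);
""")

rows = conn.execute("""
WITH days(dname, dow) AS (
    VALUES ('Sunday', 0), ('Monday', 1), ('Tuesday', 2), ('Wednesday', 3),
           ('Thursday', 4), ('Friday', 5), ('Saturday', 6)
)
SELECT d.dname, COALESCE(r.num_rides, 0) AS num_rides
FROM days AS d
LEFT JOIN (
    -- COUNT(rideId) skips NULLs, so no WHERE rideId IS NOT NULL is needed
    SELECT CAST(strftime('%w', time) AS INTEGER) AS dow,
           COUNT(rideId) AS num_rides
    FROM request
    GROUP BY 1
) AS r ON r.dow = d.dow
ORDER BY d.dow
""").fetchall()

counts = dict(rows)
print(counts)
```

Counting the column instead of filtering on it means a day whose only requests lack a rideId still produces a row, and the left join from the fixed list of weekdays fills in the days with no requests at all.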
Select a.day, coalesce(b.cnt, 0) as cnt
from (
-- select all days here
) a
left join
(select dayname(time) as day, count(*) as cnt
from request
where some_condition
group by day) b
on a.day = b.day
order by a.day;

Calculating per day in SQL

I have an sql table like that:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
What I need is the min and max date for each id and the difference in days between them. That part is fairly easy. Using SQLite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky one: how can I receive the price per day during this difference period. I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I created a calendar CTE as suggested, but I don't know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
P.S. An answer in SQLite syntax would be awesome!
You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.thedate,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
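The recursive approach can be checked against the sample data from the question. This is a minimal sketch run through Python's sqlite3 module, assuming the dates have already been stored in ISO YYYY-MM-DD form (the question's DD.MM.YY strings would not compare or increment correctly as they are).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The question's sample rows, with dates assumed to be converted to ISO format.
conn.executescript("""
    CREATE TABLE t (id INTEGER, date TEXT, price INTEGER);
    INSERT INTO t VALUES
        (1, '2009-09-21', 25), (1, '2009-09-23', 21),
        (2, '2009-08-31', 16), (2, '2009-09-03', 12);
""")

rows = conn.execute("""
WITH RECURSIVE cte(id, thedate, maxdate) AS (
    SELECT id, MIN(date), MAX(date) FROM t GROUP BY id    -- one seed row per id
    UNION ALL
    SELECT id, date(thedate, '+1 day'), maxdate
    FROM cte
    WHERE thedate < maxdate                               -- stop at each id's max date
)
SELECT cte.id, cte.thedate, COALESCE(t.price, 0) AS price_per_day
FROM cte
LEFT JOIN t ON cte.id = t.id AND cte.thedate = t.date
ORDER BY cte.id, cte.thedate
""").fetchall()

for row in rows:
    print(row)
```

Each id gets its own run of consecutive days between its first and last date, and days without a price row come back as 0.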
One method is to use a tally table to build a list of dates and join that with the table.
The date stamps in the DD.MM.YY format are first changed to the YYYY-MM-DD date format, so that they can actually be used as dates in the SQL.
In the final select they are formatted back to the DD.MM.YY format.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
The recursive SQL below was totally inspired by the excellent answer from Gordon Linoff.
And a recursive SQL is probably more performant for this anyway.
(He should get the 15 points for the accepted answer).
The difference in this version is that the datestamps are first formatted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];