Identify missing hours - find the gaps in time - SQL

I have a table with hourly timestamps, but there are gaps. I need to find which hours are missing.
select datehour
from stored_hours
order by 1;
The gaps in this timeline are easy to find:
select lag(datehour) over(order by datehour) since, datehour until
, timestampdiff(hour, lag(datehour) over(order by datehour), datehour) - 1 missing
from stored_hours
qualify missing > 0
How can I create a list of the missing hours during these days?
(with Snowflake and SQL)

To create a list/table of the missing hours:
Generate a list of all the hours between the min/max of the existing table.
To generate that list with Snowflake you will need to use session variables (as the generator only takes constants for the row count).
Then find the missing hours with a left join, looking for nulls.
Use variables to find out the start and total number of hours:
set (min_hour, total_hours) = (
select min(datehour) min_hour
, timestampdiff('hour', min(datehour), max(datehour)) total_hours
from stored_hours
);
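As a quick sanity check, you can inspect the two session variables before running the join; this should show the window start and the number of hours to generate:
select $min_hour, $total_hours;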
Then do the left join with a generated table of all hours, to find the missing ones:
select generated_hour missing_hour
from ( -- generated hours
select timestampadd('hour', row_number() over(order by 0), $min_hour) generated_hour
from table(generator(rowcount => $total_hours))
) a
left outer join stored_hours b
on generated_hour=b.datehour
where b.datehour is null;
The result is a list of the missing hours:
(you could apply a similar technique for missing days, if the inputs are dates)
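For instance, a minimal sketch of that daily variant, assuming a hypothetical stored_days table with a dateday column (both names are illustrative, not from the question):
set (min_day, total_days) = (
select min(dateday) min_day
, timestampdiff('day', min(dateday), max(dateday)) total_days
from stored_days
);

select generated_day missing_day
from ( -- generated days
select timestampadd('day', row_number() over(order by 0), $min_day) generated_day
from table(generator(rowcount => $total_days))
) a
left outer join stored_days b
on generated_day = b.dateday
where b.dateday is null;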

Related

Impala get the difference between 2 dates excluding weekends

I'm trying to get the day difference between 2 dates in Impala but I need to exclude weekends.
I know it should be something like this but I'm not sure how the weekend piece would go...
DATEDIFF(resolution_date,created_date)
Thanks!
One approach to such a task is to enumerate each and every day in the range, and then filter out the weekends before counting.
Some databases have specific features to generate date series, while others offer recursive common table expressions. Impala does not support recursive queries, so we need to look at alternative solutions.
If you have a table with at least as many rows as the maximum number of days in a range, you can use row_number() to offset the starting date, and then conditional aggregation to count working days.
Assuming that your table is called mytable, with column id as primary key, and that the big table is called bigtable, you would do:
select
t.id,
sum(
-- dayofweek() returns 1 = Sunday ... 7 = Saturday, so 2..6 keeps Monday through Friday
case when dayofweek(date_add(t.created_date, n.rn)) between 2 and 6
then 1 else 0 end
) no_days
from mytable t
inner join (select row_number() over(order by 1) - 1 rn from bigtable) n
on t.resolution_date > date_add(t.created_date, n.rn)
group by id
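If you don't have a single table that is big enough, one workaround (a sketch, untested, reusing the mytable placeholder from above) is to inflate the row count with a cross join before numbering:
-- hypothetical numbers source: cross joining mytable with itself
-- yields count(mytable) squared rows, numbered from 0
select row_number() over(order by 1) - 1 rn
from mytable t1
cross join mytable t2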

Trying to UNNEST timestamp array field, but need to GROUP BY

I have a repeated field of type TIMESTAMP in a BigQuery table. I am attempting to UNNEST this field, but I must group or aggregate the field in order to do so. I am not knowledgeable with SQL, so I could use some help. The code snippet is part of a larger query that works when substituting subscription.future_renewal_dates with GENERATE_TIMESTAMP_ARRAY.
subscription.future_renewal_dates is ARRAY<TIMESTAMP>
The TIMESTAMP array is unique (recurring subscriptions) and cannot be generated using GENERATE_TIMESTAMP_ARRAY, so I have to generate the dates before uploading to BigQuery. A UDF is too much.
SELECT
subscription.amount AS subscription_amount,
subscription.status AS subscription_status,
"1" AS analytic_name,
ARRAY (
SELECT
AS STRUCT FORMAT_TIMESTAMP("%x", days) AS type_value, subscription.amount AS analytic_name
FROM
UNNEST(subscription.future_renewal_dates) as days
WHERE
(
days >= TIMESTAMP("2019-06-05T19:30:02+00:00")
AND days <= TIMESTAMP("2019-08-01T03:59:59+00:00")
)
) AS forecast
FROM
`mydataset.subscription` AS subscription
GROUP BY
subscription_amount,
subscription_status,
analytic_name
Cannot figure out how to successfully unnest subscription.future_renewal_dates without getting the error 'UNNEST expression references subscription.future_renewal_dates which is neither grouped nor aggregated'.
When you do GROUP BY, all expressions/columns in the SELECT (except those in the GROUP BY list) must be used with some aggregation function, which you clearly do not have. So you need to decide what it is you are actually trying to achieve with that grouping.
Below is the option I think you had in mind. It may be different from what you want, but at least it gives you an idea of how to fix it:
SELECT
subscription.amount AS subscription_amount,
subscription.status AS subscription_status,
"1" AS analytic_name,
ARRAY_CONCAT_AGG( ARRAY (
SELECT
AS STRUCT FORMAT_TIMESTAMP("%x", days) AS type_value, subscription.amount AS analytic_name
FROM
UNNEST(subscription.future_renewal_dates) as days
WHERE
(
days >= TIMESTAMP("2019-06-05T19:30:02+00:00")
AND days <= TIMESTAMP("2019-08-01T03:59:59+00:00")
)
)) AS forecast
FROM
`mydataset.subscription` AS subscription
GROUP BY
subscription_amount,
subscription_status,
analytic_name
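And if it turns out you don't need any cross-row aggregation at all (just my guess at your intent), a simpler variant is to drop the GROUP BY and keep one forecast array per subscription row:
SELECT
subscription.amount AS subscription_amount,
subscription.status AS subscription_status,
"1" AS analytic_name,
ARRAY (
SELECT
AS STRUCT FORMAT_TIMESTAMP("%x", days) AS type_value, subscription.amount AS analytic_name
FROM
UNNEST(subscription.future_renewal_dates) as days
WHERE
days >= TIMESTAMP("2019-06-05T19:30:02+00:00")
AND days <= TIMESTAMP("2019-08-01T03:59:59+00:00")
) AS forecast
FROM
`mydataset.subscription` AS subscription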

IBM DB2: Generate list of dates between two dates

I need a query which will output a list of dates between two given dates.
For example, if my start date is 23/02/2016 and end date is 02/03/2016, I am expecting the following output:
Date
----
23/02/2016
24/02/2016
25/02/2016
26/02/2016
27/02/2016
28/02/2016
29/02/2016
01/03/2016
02/03/2016
Also, I need the above using SQL only (without the use of 'WITH' statement or tables). Please help.
I am using mostly DB2 for iSeries, so I will give you an SQL-only solution that works on it. Currently I don't have access to the server, so the query is not tested, but it should work. EDIT: The query has now been tested and is working.
SELECT
d.min + num.n DAYS
FROM
-- create inline table with min max date
(VALUES(DATE('2015-02-28'), DATE('2016-03-01'))) AS d(min, max)
INNER JOIN
-- create inline table with numbers from 0 to 999
(
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) AS num
ON
d.min + num.n DAYS <= d.max
ORDER BY
num.n;
If you are going to execute the query more than once, you should consider creating a real table with the values for the loop:
CREATE TABLE dummy_loop AS (
SELECT
n1.n + n10.n + n100.n AS n
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) WITH DATA;
ALTER TABLE dummy_loop ADD PRIMARY KEY (n);
It depends on your use case, but you could even create a table for, let's say, 100 years. It would be only 100 * 365 = 36500 rows with just a date field, so the table would be quite small and fast for joins.
CREATE TABLE dummy_dates AS (
SELECT
DATE('1970-01-01') + (n1.n + n10.n + n100.n) DAYS AS date
FROM
(VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS n1(n)
CROSS JOIN
(VALUES(0),(10),(20),(30),(40),(50),(60),(70),(80),(90)) AS n10(n)
CROSS JOIN
(VALUES(0),(100),(200),(300),(400),(500),(600),(700),(800),(900)) AS n100(n)
) WITH DATA;
ALTER TABLE dummy_dates ADD PRIMARY KEY (date);
And the select query could look like:
SELECT
*
FROM
dummy_dates
WHERE
date BETWEEN :startDate AND :endDate;
EDIT 2: Thanks to @Lennart's suggestion I have changed TABLE(VALUES(..,..,..)) to VALUES(..,..,..) because, as he said, TABLE is a synonym for LATERAL. That was a real surprise for me.
EDIT 3: Thanks to @godric7gt I have removed TIMESTAMPDIFF and will remove it from all my scripts, because, as the documentation says:
These assumptions are used when converting the information in the second argument, which is a timestamp duration, to the interval type specified in the first argument. The returned estimate may vary by a number of days. For example, if the number of days (interval 16) is requested for the difference between '1997-03-01-00.00.00' and '1997-02-01-00.00.00', the result is 30. This is because the difference between the timestamps is 1 month, and the assumption of 30 days in a month applies.
It was a real surprise, because I had always trusted this function for day differences.
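To see the difference, compare DAYS(), which gives an exact day count, with the TIMESTAMPDIFF estimate from the documentation's example (a small illustration; both statements should run against sysibm.sysdummy1):
-- exact: 28 days between the two dates
SELECT DAYS(DATE('1997-03-01')) - DAYS(DATE('1997-02-01'))
FROM sysibm.sysdummy1;
-- estimate: returns 30, because the 1-month duration is assumed to be 30 days
SELECT TIMESTAMPDIFF(16, CHAR(TIMESTAMP('1997-03-01-00.00.00') - TIMESTAMP('1997-02-01-00.00.00')))
FROM sysibm.sysdummy1;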
For generating rows, recursive SQL is needed.
Usually this looks like this in DB2:
with temp (date) as (
select date('23.02.2016') as date from sysibm.sysdummy1
union all
select date + 1 day from temp
where date < date('02.03.2016')
)
select * from temp
Since, for whatever reason, a CTE (using WITH) should be avoided, a possible workaround would be setting
db2set DB2_COMPATIBILITY_VECTOR=8
which enables the use of Oracle-style recursion with CONNECT BY:
SELECT date('22.02.2016') + level days as dt
FROM sysibm.sysdummy1 CONNECT BY date('22.02.2016') + level days <= date('02.03.2016')
Please note: after setting DB2_COMPATIBILITY_VECTOR an instance restart is necessary.
This solution doesn't use WITH, but it does use WHILE and a temp table... hopefully that still meets your needs?
EDIT: I built this in SSMS 2014.
DECLARE @Start DATE
DECLARE @End DATE
SET @Start = '2016-02-23'
SET @End = '2016-03-02'
CREATE TABLE #Dates ([Date] DATE)
WHILE @Start <= @End
BEGIN
INSERT INTO #Dates
SELECT @Start
SET @Start = DATEADD(Day, 1, @Start)
END
SELECT * FROM #Dates
DROP TABLE #Dates
I assume AS400 does not support recursive CTEs, and that's why you want a solution without them. I have no clue whether it supports any of the following constructions, but it might be worth a shot. First we will need a generator; any table with a sufficient number of rows will do. If you don't have a table large enough for the number of days you want, you can create a cartesian product. Example:
select row_number() over ()
from a_table
cross join a_table
Another way of extending the domain is to create the powerset of a table using group by cube, see below.
Assume we can, one way or another, create a large enough set of rows. You can generate the dates like:
select date('2016-02-23') + n days
from (
select row_number() over () as n
from a_table
) as t
where n < 100
order by n
If for some reason you don't want to use an existing table, group by cube will produce a relation with a cardinality equal to that of the power set of the attributes. Here I use 4 columns, which will generate 16 rows.
select date('2016-01-01') + row_number() over () days
from sysibm.dual x
group by cube(x.dummy, x.dummy, x.dummy, x.dummy)
If you want to generate, say, 100 rows, you need 7 attributes in the group by cube clause (since 2^7 = 128) and a fetch first 100 rows only clause:
select date('2016-01-01') + row_number() over () days
from sysibm.dual x
group by cube(x.dummy, x.dummy, x.dummy, x.dummy, x.dummy, x.dummy, x.dummy)
order by 1
fetch first 100 rows only

generate each minute string for a day within specified time limit

My aim is to generate a per-minute count of all records existing in a table, like this:
SELECT
COUNT(*) as RECORD_COUNT,
to_Char(MY_DATE,'HH24:MI') MINUTE_GAP
FROM
TABLE_A
WHERE
BLAH='Blah! Blah!!'
GROUP BY
to_Char(MY_DATE,'HH24:MI')
However, this query doesn't give me the minutes where there were no results.
To get the desired result, I'm using the following query to fill the gaps in the original query, by doing a JOIN between the two results.
SELECT
*
FROM
( SELECT
TO_CHAR(TRUNC(SYSDATE)+( (ROWNUM-1) /1440) ,'HH24:MI') as MINUTE_GAP,
0 as COUNT
FROM
SOME_LARGE_TABLE_B
WHERE
rownum<=1440
)
WHERE
minute_gap>'07:00' /*I want only the data starting from 7:00AM*/
This works for me, but:
- I can't rely on SOME_LARGE_TABLE_B to generate the minutes, because it might have no records at some point in the future.
- The query doesn't look like a professional solution.
Is there any easier way to do this?
NOTE:I don't want any new tables created with static values for all the minutes just for one query.
Just generate your timestamps and left join your grouped data to it:
SELECT MINUTE, ....
FROM (
SELECT TO_CHAR(TO_DATE((LEVEL + 419) * 60, 'SSSSS'), 'HH24:MI') MINUTE /* 07:00 - 23:59 */
FROM DUAL
CONNECT BY LEVEL <= 1020)
LEFT JOIN (
<your grouped subquery>
) ON MINUTE = MINUTE_GAP
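Spelled out against the question's tables (TABLE_A, MY_DATE and BLAH as above; the NVL is only there to show 0 instead of NULL for empty minutes), the assembled query could look like:
SELECT m.MINUTE, NVL(g.RECORD_COUNT, 0) AS RECORD_COUNT
FROM (
SELECT TO_CHAR(TO_DATE((LEVEL + 419) * 60, 'SSSSS'), 'HH24:MI') MINUTE /* 07:00 - 23:59 */
FROM DUAL
CONNECT BY LEVEL <= 1020) m
LEFT JOIN (
SELECT COUNT(*) AS RECORD_COUNT, TO_CHAR(MY_DATE, 'HH24:MI') AS MINUTE_GAP
FROM TABLE_A
WHERE BLAH = 'Blah! Blah!!'
GROUP BY TO_CHAR(MY_DATE, 'HH24:MI')
) g ON m.MINUTE = g.MINUTE_GAP
ORDER BY m.MINUTE;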

SQL Average Inter-arrival Time, Time Between Dates

I have a table with sequential timestamps:
2011-03-17 10:31:19
2011-03-17 10:45:49
2011-03-17 10:47:49
...
I need to find the average time difference between each of these (there could be dozens) in seconds or whatever is easiest; I can work with it from there. So, for example, the inter-arrival time for just the first two times above would be 870 (14m 30s). For all three times it would be: (870 + 120)/2 = 495 (8m 15s).
A note: I am using PostgreSQL 8.1.22.
EDIT: The table I mention above comes from a different query; it is literally just a one-column list of timestamps.
Not sure I understood your question completely, but this might be what you are looking for:
SELECT avg(difference)
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) as difference
FROM your_table
) t
The inner query calculates the distance between each row and the preceding row. The result is an interval for each row in the table.
The outer query simply does an average over all differences.
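If you need the average in seconds rather than as an interval, extract(epoch from ...) will convert it. Note that lag() requires PostgreSQL 8.4 or later, which is exactly why the workaround in the last answer below exists:
SELECT extract(epoch from avg(difference)) AS avg_seconds
FROM (
SELECT timestamp_col - lag(timestamp_col) over (order by timestamp_col) AS difference
FROM your_table
) t;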
I think you want to find the average of the timestamps (an avg over timestamptz values).
My solution is avg(current - min value), but since the result is an interval, I add it to the min value again:
SELECT avg(target_col - (select min(target_col) from your_table))
+ (select min(target_col) from your_table)
FROM your_table
If you cannot upgrade to a version of PG that supports window functions, you
may compute your table's sequential steps "the slow way."
Assuming your table is "tbl" and your timestamp column is "ts":
SELECT AVG(t1 - t0)
FROM (
-- All this silliness would be moot if we could use
-- `` lead(ts) over (order by ts) ''
SELECT tbl.ts AS t0,
next.ts AS t1
FROM tbl
CROSS JOIN
tbl next
WHERE next.ts = (
SELECT MIN(ts)
FROM tbl subquery
WHERE subquery.ts > tbl.ts
)
) derived;
But don't do that. Its performance will be terrible. Please do what
a_horse_with_no_name suggests, and use window functions.