Getting minimum value at a given time in SQL

I have the following SQL table:
start_time end_time value
2016-01-01 00:00:00 2016-01-01 08:59:59 1
2016-01-01 06:00:00 2016-01-01 14:59:59 2
2016-01-01 12:00:00 2016-01-01 17:59:59 1.5
2016-01-01 03:00:00 2016-01-01 17:59:59 3
I want to convert it into:
start_time end_time min_value
2016-01-01 00:00:00 2016-01-01 08:59:59 1
2016-01-01 09:00:00 2016-01-01 11:59:59 2
2016-01-01 12:00:00 2016-01-01 17:59:59 1.5
where min_value is the minimum value at a given point in time. Is it possible to do this in SQL?

Try the query below; I think it does exactly what you asked.
As you can see, I added one more entry to your example to make it a little spicier :o)
WITH YourTable AS (
SELECT TIMESTAMP '2016-01-01 00:00:00' AS start_time, TIMESTAMP '2016-01-01 08:59:59' AS end_time, 1 AS value UNION ALL
SELECT TIMESTAMP '2016-01-01 06:00:00' AS start_time, TIMESTAMP '2016-01-01 14:59:59' AS end_time, 2 AS value UNION ALL
SELECT TIMESTAMP '2016-01-01 12:00:00' AS start_time, TIMESTAMP '2016-01-01 17:59:59' AS end_time, 1.5 AS value UNION ALL
SELECT TIMESTAMP '2016-01-01 03:00:00' AS start_time, TIMESTAMP '2016-01-01 17:59:59' AS end_time, 3 AS value UNION ALL
SELECT TIMESTAMP '2016-01-01 12:30:00' AS start_time, TIMESTAMP '2016-01-01 12:40:59' AS end_time, 1 AS value
),
Intervals AS (
SELECT iStart AS start_time, LEAD(iStart) OVER(ORDER BY iStart) AS end_time
FROM (
SELECT DISTINCT iStart FROM (
SELECT start_time AS iStart FROM YourTable UNION ALL
SELECT end_time AS iStart FROM YourTable )
)
),
Intervals_Mins AS (
SELECT b.start_time, b.end_time, MIN(value) AS min_value
FROM YourTable AS a
JOIN Intervals AS b
ON b.start_time BETWEEN a.start_time AND a.end_time
AND b.end_time BETWEEN a.start_time AND a.end_time
GROUP BY b.start_time, b.end_time
),
Intervals_Group AS (
SELECT start_time, end_time, min_value, IFNULL(SUM(flag) OVER(PARTITION BY CAST(min_value AS STRING) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS time_group
FROM (
SELECT start_time, end_time, min_value, IF(end_time = LEAD(start_time) OVER(PARTITION BY CAST(min_value AS STRING) ORDER BY start_time), 0, 1) AS flag
FROM Intervals_Mins
)
)
SELECT MIN(start_time) AS start_time, MAX(end_time) AS end_time, min_value
FROM Intervals_Group
GROUP BY min_value, time_group
-- ORDER BY start_time
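To check the logic outside the database, the same idea (cut the timeline at every boundary, take the minimum on each piece, then merge neighbouring pieces with equal minimums) can be sketched in Python. This is only a sketch under assumptions: the function name `min_value_timeline` is made up for illustration, the rows are the four from the question (without the extra "spicy" entry), and end times are treated as inclusive at 1-second granularity, as the sample data suggests (08:59:59 is followed by 09:00:00).

```python
from datetime import datetime, timedelta

ONE_SECOND = timedelta(seconds=1)  # assumed granularity of the inclusive end times


def min_value_timeline(rows):
    """rows: iterable of (start, end_inclusive, value).

    Returns merged (start, end_inclusive, min_value) tuples in time order.
    """
    # Work with half-open intervals [start, end) so adjacent pieces share a boundary.
    half_open = [(s, e + ONE_SECOND, v) for s, e, v in rows]
    bounds = sorted({t for s, e, _ in half_open for t in (s, e)})

    merged = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Values of all source intervals that fully cover this elementary piece.
        active = [v for s, e, v in half_open if s <= lo and hi <= e]
        if not active:
            continue  # nothing is in effect during this piece
        low = min(active)
        if merged and merged[-1][2] == low and merged[-1][1] == lo:
            # Same minimum and directly adjacent: extend the previous piece.
            merged[-1] = (merged[-1][0], hi, low)
        else:
            merged.append((lo, hi, low))

    # Convert back to inclusive end times.
    return [(s, e - ONE_SECOND, v) for s, e, v in merged]


def ts(hour, minute=0, second=0):
    return datetime(2016, 1, 1, hour, minute, second)


rows = [
    (ts(0), ts(8, 59, 59), 1),
    (ts(6), ts(14, 59, 59), 2),
    (ts(12), ts(17, 59, 59), 1.5),
    (ts(3), ts(17, 59, 59), 3),
]
timeline = min_value_timeline(rows)
for start, end, value in timeline:
    print(start, end, value)
```

The quadratic scan over all rows per piece is fine for a check like this; in SQL the join in Intervals_Mins plays the same role.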

Hmmm . . . This seems hard. I think the following strategy will work:
Break the data into two parts, for start times and end times.
For each start time calculate the minimum value in effect at that time.
For each end time, calculate the minimum value in effect starting at that time.
Recombine using a gaps-and-islands approach
I'm just not 100% sure you can do this in BQ, because it involves non-equijoins. But . . .
with starts as (
select start_time as time,
(select min(t2.value)
from t t2
where t.start_time between t2.start_time and t2.end_time
) as value
from t
),
ends as (
select end_time as time,
(select min(t2.value)
from t t2
where t2.end_time > t.end_time and
t2.start_time <= t.end_time
) as value
from t
)
select value, min(time) as start_time, max(time) as end_time
from (select time, value,
row_number() over (order by time) as seqnum,
row_number() over (partition by value order by time) as seqnum_v
from ((select s.* from starts s) union all
(select e.* from ends e)
) t
) t
group by value, (seqnum - seqnum_v);
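The grouping key `seqnum - seqnum_v` is the usual gaps-and-islands trick: within a run of consecutive equal values both row numbers grow in step, so their difference stays constant, and it changes as soon as another value interrupts the run. A minimal Python sketch of just that step, assuming the values are already ordered by time (the function name `islands` is hypothetical):

```python
from collections import defaultdict


def islands(values):
    """values: sequence already ordered by time.

    Returns (value, first_seqnum, last_seqnum) for each run of equal values,
    grouping on (value, seqnum - seqnum_v) like the SQL above.
    """
    seen = defaultdict(int)  # per-value row counter, i.e. seqnum_v
    groups = {}              # (value, seqnum - seqnum_v) -> (first, last)
    for seqnum, value in enumerate(values, start=1):
        seen[value] += 1
        key = (value, seqnum - seen[value])
        first, _ = groups.get(key, (seqnum, seqnum))
        groups[key] = (first, seqnum)
    return [(value, first, last) for (value, _), (first, last) in groups.items()]


result = islands([1, 1, 2, 2, 1])
print(result)
```

Here the value 1 appears in two separate runs, and the differing key keeps them apart.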

I'm not sure that I understand how the expected output relates to the input, but if you just want to associate the minimum value with distinct (start_time, end_time) pairs, you can do e.g.:
#standardSQL
WITH T AS (
SELECT TIMESTAMP '2016-01-01 00:00:00' AS start_time,
TIMESTAMP '2016-01-01 08:59:59' AS end_time, 1 AS value UNION ALL
SELECT TIMESTAMP '2016-01-01 06:00:00',
TIMESTAMP '2016-01-01 14:59:59', 2 UNION ALL
SELECT TIMESTAMP '2016-01-01 12:00:00',
TIMESTAMP '2016-01-01 17:59:59', 1.5 UNION ALL
SELECT TIMESTAMP '2016-01-01 3:00:00',
TIMESTAMP '2016-01-01 17:59:59', 3
)
SELECT
start_time,
end_time,
MIN(value) AS min_value
FROM T
GROUP BY start_time, end_time;

Related

Query to find and subtract two timestamps associated with the same identifier

I'm very new to BigQuery and not terribly familiar with SQL. I have a table of data that looks like this, where MyDate is a Timestamp object:
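| Row | MyDate                  | StateTransition | MyIdentifier |
-------------------------------------------------------------------
| 1   | 2022-09-23 00:08:00 UTC | Start           | 6371         |
| 2   | 2022-10-10 01:17:14 UTC | Finished        | 6371         |
| 3   | 2022-09-26 04:51:40 UTC | Start           | 7768         |
| 4   | 2022-10-05 03:44:32 UTC | Finished        | 7768         |
etc.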
My query looks something like
SELECT *
FROM <my-data-source>
WHERE (StateTransition="Start" OR StateTransition="Finished")
ORDER BY MyIdentifier, MyDate
What I'm trying to do is calculate the elapsed time (in days) between the Start and Finished timestamps associated with each MyIdentifier, and to have that displayed in another column. It could look like:
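| Row | MyDate                  | StateTransition | MyIdentifier | ElapsedTime |
---------------------------------------------------------------------------------
| 1   | 2022-09-23 00:08:00 UTC | Start           | 6371         |             |
| 2   | 2022-10-10 01:17:14 UTC | Finished        | 6371         | 0.33        |
| 3   | 2022-09-26 04:51:40 UTC | Start           | 7768         |             |
| 4   | 2022-10-05 03:44:32 UTC | Finished        | 7768         | 0.04        |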
Alternatively, it could even be flattened a little to something like:
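| Row | StartTransition         | FinishedTransition      | MyIdentifier | ElapsedTime |
-----------------------------------------------------------------------------------------
| 1   | 2022-09-23 00:08:00 UTC | 2022-10-10 01:17:14 UTC | 6371         | 0.33        |
| 2   | 2022-09-26 04:51:40 UTC | 2022-10-05 03:44:32 UTC | 7768         | 0.04        |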
I've tried looking through the BigQuery docs and Stack Overflow but haven't found anything that addresses this use case of selecting items from multiple rows with a common identifier and then performing an operation on them. It seems like subtracting the two timestamps would be done with the TIMESTAMP_DIFF function.
Any assistance is appreciated!
WITH sample_table AS (
SELECT TIMESTAMP '2022-09-23 00:08:00 UTC' MyDate, 'Start' StateTransition, 6371 MyIdentifier, 'aaa' AS author UNION ALL
SELECT '2022-10-10 01:17:14 UTC' MyDate, 'Finished' StateTransition, 6371 MyIdentifier, 'bbb' AS author UNION ALL
SELECT '2022-09-26 04:51:40 UTC' MyDate, 'Start' StateTransition, 7768 MyIdentifier, 'ccc' AS author UNION ALL
SELECT '2022-10-05 03:44:32 UTC' MyDate, 'Finished' StateTransition, 7768 MyIdentifier, 'ccc' AS author
)
SELECT MyIdentifier,
ARRAY_AGG(author ORDER BY MyDate LIMIT 1)[SAFE_OFFSET(0)] AS author,
MIN(MyDate) AS StartTransition,
MAX(MyDate) AS FinishedTransition,
TIMESTAMP_DIFF(MAX(MyDate), MIN(MyDate), DAY) AS ElapsedTime,
FROM sample_table
WHERE (StateTransition="Start" OR StateTransition="Finished")
GROUP BY 1;
If Start and Finished have different author names and you want the name from the Finished row, you can use the following instead:
ARRAY_AGG(author ORDER BY MyDate DESC LIMIT 1)[SAFE_OFFSET(0)] AS author,
For the flattened result, you might consider the aggregation below.
WITH sample_table AS (
SELECT TIMESTAMP '2022-09-23 00:08:00 UTC' MyDate, 'Start' StateTransition, 6371 MyIdentifier UNION ALL
SELECT '2022-10-10 01:17:14 UTC' MyDate, 'Finished' StateTransition, 6371 MyIdentifier UNION ALL
SELECT '2022-09-26 04:51:40 UTC' MyDate, 'Start' StateTransition, 7768 MyIdentifier UNION ALL
SELECT '2022-10-05 03:44:32 UTC' MyDate, 'Finished' StateTransition, 7768 MyIdentifier
)
SELECT MyIdentifier,
MIN(MyDate) AS StartTransition,
MAX(MyDate) AS FinishedTransition,
TIMESTAMP_DIFF(MAX(MyDate), MIN(MyDate), DAY) AS ElapsedTime,
FROM sample_table
WHERE (StateTransition="Start" OR StateTransition="Finished")
GROUP BY 1;
For the intermediate (row-per-transition) result, however, we need a window function.
SELECT *,
IF(
StateTransition = 'Finished',
TIMESTAMP_DIFF(MyDate, FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w, DAY),
NULL
) AS ElapsedTime
FROM sample_table
WINDOW w AS (PARTITION BY MyIdentifier ORDER BY MyDate);
If you want the flattened result from the window-function query above, the query looks like the one below, which returns the same result as the aggregation query.
SELECT MyIdentifier,
FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w AS StartTransition,
MyDate AS FinishedTransition,
IF(
StateTransition = 'Finished',
TIMESTAMP_DIFF(MyDate, FIRST_VALUE(IF(StateTransition = 'Start', MyDate, NULL) IGNORE NULLS) OVER w, DAY),
NULL
) AS ElapsedTime
FROM sample_table
QUALIFY StateTransition = 'Finished'
WINDOW w AS (PARTITION BY MyIdentifier ORDER BY MyDate);
Consider the approach below:
select *, timestamp_diff(Transition_Finished, Transition_Start, day) as ElapsedTime
from your_table
pivot (max(MyDate) Transition for StateTransition in ('Start', 'Finished'))
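Applied to the sample data in your question, this returns one row per MyIdentifier with the columns Transition_Start, Transition_Finished and ElapsedTime.
Use the query below to test: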
WITH your_table AS (
SELECT TIMESTAMP '2022-09-23 00:08:00 UTC' MyDate, 'Start' StateTransition, 6371 MyIdentifier UNION ALL
SELECT '2022-10-10 01:17:14 UTC' MyDate, 'Finished' StateTransition, 6371 MyIdentifier UNION ALL
SELECT '2022-09-26 04:51:40 UTC' MyDate, 'Start' StateTransition, 7768 MyIdentifier UNION ALL
SELECT '2022-10-05 03:44:32 UTC' MyDate, 'Finished' StateTransition, 7768 MyIdentifier
)
select *, timestamp_diff(Transition_Finished, Transition_Start, day) as ElapsedTime
from your_table
pivot (max(MyDate) Transition for StateTransition in ('Start', 'Finished'))
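As a cross-check of what these queries compute, here is a minimal Python sketch of the same pairing: take the earliest Start and the latest Finished per identifier and subtract. The function name `elapsed_by_identifier` is made up for illustration. It returns both whole days (TIMESTAMP_DIFF with DAY returns a whole number, so a fractional ElapsedTime would need a finer unit such as SECOND divided by 86400) and fractional days.

```python
from datetime import datetime


def elapsed_by_identifier(rows):
    """rows: iterable of (my_date, state_transition, my_identifier).

    Returns {identifier: (start, finish, whole_days, fractional_days)}.
    """
    starts, finishes = {}, {}
    for my_date, state, ident in rows:
        if state == "Start":
            # Keep the earliest Start per identifier.
            starts[ident] = min(my_date, starts.get(ident, my_date))
        elif state == "Finished":
            # Keep the latest Finished per identifier.
            finishes[ident] = max(my_date, finishes.get(ident, my_date))

    result = {}
    for ident, start in starts.items():
        finish = finishes.get(ident)
        if finish is None:
            continue  # no Finished row yet for this identifier
        delta = finish - start
        result[ident] = (start, finish, delta.days, delta.total_seconds() / 86400)
    return result


sample = [
    (datetime(2022, 9, 23, 0, 8, 0), "Start", 6371),
    (datetime(2022, 10, 10, 1, 17, 14), "Finished", 6371),
    (datetime(2022, 9, 26, 4, 51, 40), "Start", 7768),
    (datetime(2022, 10, 5, 3, 44, 32), "Finished", 7768),
]
elapsed = elapsed_by_identifier(sample)
for ident, (start, finish, days, frac) in elapsed.items():
    print(ident, start, finish, days, round(frac, 2))
```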
I think that for each MyIdentifier you should have only one start and one finish, so you can simply split and join:
WITH
ts AS ( SELECT * FROM <my-data-source> WHERE StateTransition = 'Start'),
tf AS ( SELECT * FROM <my-data-source> WHERE StateTransition = 'Finished')
SELECT
ts.MyIdentifier,
ts.MyDate StartTransition,
tf.MyDate FinishedTransition,
TIMESTAMP_DIFF(tf.MyDate, ts.MyDate, DAY) ElapsedTime
FROM ts
LEFT JOIN tf on ts.MyIdentifier = tf.MyIdentifier
UPDATE
If you have more than one start or finish per identifier, you need to choose which one to keep. I will assume you want to keep the first start and the last finish:
WITH
ts AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MyIdentifier ORDER BY MyDate) n
FROM <my-data-source> WHERE StateTransition = 'Start'),
tf AS (
SELECT *,
ROW_NUMBER() over (partition by MyIdentifier ORDER BY MyDate DESC) n
FROM <my-data-source> WHERE StateTransition = 'Finished')
SELECT
ts.MyIdentifier,
ts.MyDate StartTransition,
tf.MyDate FinishedTransition,
TIMESTAMP_DIFF(tf.MyDate, ts.MyDate, DAY) ElapsedTime
FROM ts
LEFT JOIN tf on ts.MyIdentifier = tf.MyIdentifier and tf.n=1
WHERE ts.n=1

SQL join on timestamp differences from the same table

I'm not sure how to write this SQL query in BigQuery. I have a table of events with names and timestamps. Let's say I have only two events in the table: A and B. What I want to do is query the table to get all instances of event A, and get the next closest occurrence of B and create a new column with the time difference. B will always happen after A.
For example if I had a table that looks like:
A1 | 1:00 pm
B5 | 2:00 pm
A3 | 3:00 pm
B9 | 5:00 pm
My resultant table would be:
A1 | 1 hour
A3 | 2 hours
The query I came up with is the following:
SELECT
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
This works fine for getting the table I wanted above, but I would also like to include an additional column from the subquery. Something that looks like:
A1 | 1 hour | B5Column
A3 | 2 hours | B9Column
I attempted to use the query below:
SELECT
(SELECT
sub.SubQueryColumn
FROM table sub
WHERE sub.time > main.time
ORDER BY sub.time asc
LIMIT 1) SubColumn,
CAST(TIMESTAMP_DIFF((SELECT MIN(sub.time)
FROM table sub
WHERE sub.time > main.time), main.time, SECOND) AS INT64) duration
FROM table main
but it did not work. The error I get is
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Could I get some help with this?
Here is one method:
select m.*,
timestamp_diff(next_b_time, time, second) as duration
from (select m.*,
min(case when event like 'B%' then time end) over (order by time desc) as next_b_time
from main m
) m
where event like 'A%';
Below is for BigQuery Standard SQL
#standardSQL
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
-- ORDER BY time
You can test / play with it using dummy data from your question as below
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time,
LEAD(time) OVER(PARTITION BY grp ORDER BY time) b_time,
LEAD(event) OVER(PARTITION BY grp ORDER BY time) b_event
FROM (
SELECT *,
COUNTIF(STARTS_WITH(event, 'A')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
with the result:
Row event duration b_event
1 A1 3600 B5
2 A3 7200 B9
Please note: the above solution relies on the statement in your question that B will always happen after A, so if you have a sequence like the one below
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
result will be
Row event duration b_event
1 A1 null null
2 A2 1800 B5
3 A3 7200 B9
If you need to address this - try below
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 'A1' event, TIMESTAMP '2018-01-01 1:00:00' time UNION ALL
SELECT 'A2', TIMESTAMP '2018-01-01 1:30:00' UNION ALL
SELECT 'B5', TIMESTAMP '2018-01-01 2:00:00' UNION ALL
SELECT 'A3', TIMESTAMP '2018-01-01 3:00:00' UNION ALL
SELECT 'B9', TIMESTAMP '2018-01-01 5:00:00'
)
SELECT event, TIMESTAMP_DIFF(b_time, time, SECOND) duration, b_event
FROM (
SELECT event, time, type, grp,
FIRST_VALUE(event) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_event,
FIRST_VALUE(time) OVER(ORDER BY grp RANGE BETWEEN 1 FOLLOWING AND 1 FOLLOWING) b_time
FROM (
SELECT event, time, SUBSTR(event, 1, 1) type,
COUNTIF(STARTS_WITH(event, 'B')) OVER(ORDER BY time) grp
FROM `project.dataset.your_table` t
)
)
WHERE STARTS_WITH(event, 'A')
ORDER BY time
this version will return
Row event duration b_event
1 A1 3600 B5
2 A2 1800 B5
3 A3 7200 B9
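The matching rule these queries implement (each A is paired with the nearest B that follows it) can be sketched in Python by walking the events from latest to earliest and remembering the most recent B seen. This is a hedged illustration, not a translation of any one query above; the function name `next_b_for_each_a` is hypothetical and the event names follow the "A…"/"B…" prefixes from the question.

```python
from datetime import datetime


def next_b_for_each_a(events):
    """events: iterable of (name, time).

    Returns [(a_name, seconds_to_next_b, b_name)] in ascending time order;
    an A with no later B gets (a_name, None, None).
    """
    next_b = None  # the closest B at or after the current position
    out = []
    # Walk backwards in time so that next_b is always the nearest following B.
    for name, time in sorted(events, key=lambda e: e[1], reverse=True):
        if name.startswith("B"):
            next_b = (name, time)
        elif name.startswith("A"):
            if next_b is None:
                out.append((name, None, None))
            else:
                seconds = int((next_b[1] - time).total_seconds())
                out.append((name, seconds, next_b[0]))
    out.reverse()
    return out


events = [
    ("A1", datetime(2018, 1, 1, 1, 0, 0)),
    ("A2", datetime(2018, 1, 1, 1, 30, 0)),
    ("B5", datetime(2018, 1, 1, 2, 0, 0)),
    ("A3", datetime(2018, 1, 1, 3, 0, 0)),
    ("B9", datetime(2018, 1, 1, 5, 0, 0)),
]
pairs = next_b_for_each_a(events)
for row in pairs:
    print(row)
```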

How to combine multiple SELECTs into a single SELECT by a common column in (BigQuery) SQL?

I have multiple tables in BigQuery, and hence multiple SQL statements that each give me "the number of X per day". For example:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as installs
FROM database.table1
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | installs |
-------------------------
| 2017-01-01 | 11 |
| 2017-01-02 | 22 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as uninstalls
FROM database.table2
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | uninstalls |
---------------------------
| 2017-01-02 | 22 |
| 2017-01-03 | 33 |
etc
Another statement:
SELECT FORMAT_TIMESTAMP("%F",timestamp) AS day, COUNT(*) as cases
FROM database.table3
GROUP BY day
ORDER BY day ASC
Which would give the result:
| day | cases |
----------------------
| 2017-01-01 | 11 |
| 2017-01-03 | 33 |
etc
etc
Now I need to combine all these into a single SELECT statement that gives the following results:
| day | installs | uninstalls | cases |
----------------------------------------------
| 2017-01-01 | 11 | 0 | 11 |
| 2017-01-02 | 22 | 22 | 0 |
| 2017-01-03 | 0 | 33 | 33 |
etc
Is this even possible?
Or what's the closest SQL-statement I can write that would give me a similar result?
Any feedback is appreciated!
Here is a self-contained example that might help to get you started. It uses two dummy tables, InstallEvents and UninstallEvents, which contain timestamps for the respective actions. It creates a common table expression called StartAndEnd that computes the minimum and maximum dates for these events in order to decide which dates to aggregate over, then unions the contents of the InstallEvents and UninstallEvents, counting the events for each day.
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
),
StartAndEnd AS (
SELECT MIN(DATE(timestamp)) AS min_date, MAX(DATE(timestamp)) AS max_date
FROM (
SELECT * FROM InstallEvents UNION ALL
SELECT * FROM UninstallEvents
)
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY(
(SELECT min_date FROM StartAndEnd),
(SELECT max_date FROM StartAndEnd)
)) AS day
GROUP BY day
ORDER BY day;
If you know what the start and end dates are in advance, you can hard-code them in the query instead and then omit the StartAndEnd CTE:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT
day,
COUNTIF(is_install AND DATE(timestamp) = day) AS installs,
COUNTIF(NOT is_install AND DATE(timestamp) = day) AS uninstalls
FROM (
SELECT *, true AS is_install
FROM InstallEvents UNION ALL
SELECT *, false
FROM UninstallEvents
)
CROSS JOIN UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-01-04')) AS day
GROUP BY day
ORDER BY day;
To see the events in the sample data, use a query that unions the contents:
WITH InstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-01 00:00:00', INTERVAL x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 100)) AS x
),
UninstallEvents AS (
SELECT TIMESTAMP_ADD('2017-01-02 00:00:00', INTERVAL 2 * x HOUR) AS timestamp
FROM UNNEST(GENERATE_ARRAY(0, 50)) AS x
)
SELECT timestamp, true AS is_install
FROM InstallEvents UNION ALL
SELECT timestamp, false
FROM UninstallEvents;
Below is for BigQuery Standard SQL
#standardSQL
WITH calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
-- ORDER BY day
Please note: you are using timestamp as a column name, which is not best practice because it is a keyword. My example keeps your naming, but consider changing it!
You can test / play with this solution using the dummy data below
#standardSQL
WITH `database.table1` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs
UNION ALL SELECT TIMESTAMP '2017-01-01', 22
),
`database.table2` AS (
SELECT TIMESTAMP '2016-12-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
`database.table3` AS (
SELECT TIMESTAMP '2017-01-01' AS timestamp, 1 AS installs UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL SELECT TIMESTAMP '2017-01-01', 22 UNION ALL
SELECT TIMESTAMP '2017-01-10', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22 UNION ALL SELECT TIMESTAMP '2017-01-02', 22
),
calendar AS (
SELECT day
FROM (
SELECT MIN(min_day) AS min_day, MAX(max_day) AS max_day
FROM (
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table1` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table2` UNION ALL
SELECT MIN(DATE(timestamp)) AS min_day, MAX(DATE(timestamp)) AS max_day FROM `database.table3`
)
), UNNEST(GENERATE_DATE_ARRAY(min_day, max_day, INTERVAL 1 DAY)) AS day
)
SELECT
c.day AS day,
IFNULL(SUM(installs), 0) AS installs,
IFNULL(SUM(uninstalls), 0) AS uninstalls,
IFNULL(SUM(cases),0) AS cases
FROM calendar AS c
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) installs FROM `database.table1` GROUP BY day) t1 ON t1.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) uninstalls FROM `database.table2` GROUP BY day) t2 ON t2.day = c.day
LEFT JOIN (SELECT DATE(timestamp) day, COUNT(1) cases FROM `database.table3` GROUP BY day) t3 ON t3.day = c.day
GROUP BY day
HAVING installs + uninstalls + cases > 0
ORDER BY day
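The calendar-plus-left-join idea can be illustrated outside SQL as well. The sketch below is a simplified Python version under assumptions: it starts from per-day counts that have already been computed (the figures from the question), the function name `combine_daily_counts` is made up, and unlike the HAVING clause above it keeps days where every count is zero.

```python
from datetime import date, timedelta


def combine_daily_counts(sources):
    """sources: {column_name: {day: count}}.

    Returns one tuple (day, count_1, count_2, ...) for every day between the
    earliest and latest day seen in any source, filling missing counts with 0.
    """
    all_days = [d for counts in sources.values() for d in counts]
    if not all_days:
        return []
    last = max(all_days)
    rows = []
    day = min(all_days)
    while day <= last:
        # counts.get(day, 0) plays the role of LEFT JOIN + IFNULL(..., 0).
        rows.append((day, *(counts.get(day, 0) for counts in sources.values())))
        day += timedelta(days=1)
    return rows


sources = {
    "installs": {date(2017, 1, 1): 11, date(2017, 1, 2): 22},
    "uninstalls": {date(2017, 1, 2): 22, date(2017, 1, 3): 33},
    "cases": {date(2017, 1, 1): 11, date(2017, 1, 3): 33},
}
combined = combine_daily_counts(sources)
for row in combined:
    print(row)
```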
I am not very familiar with BigQuery, so this is probably not going to be a copy-paste answer.
You'll first have to build a calendar table to make sure you have all dates. Here's an example for SQL Server. There are probably examples for BigQuery available as well. The following assumes a Calendar table with a Date column of type timestamp.
Once you have your calendar table you can join all your tables to that:
SELECT FORMAT_TIMESTAMP("%F",C.Date) AS day
, COUNT(DATE(T1.TIMESTAMP)) AS installs --Here you could also use your FORMAT_TIMESTAMP
, COUNT(DATE(T2.TIMESTAMP)) AS uninstalls
--Caution: on days with rows in both tables the two joins multiply rows and inflate both counts; aggregate each table per day before joining to get exact numbers
FROM Calendar C
LEFT JOIN database.table1 T1
ON DATE(T1.TIMESTAMP) = DATE(C.Date) --Convert to date to remove times, you could also use your FORMAT_TIMESTAMP
LEFT JOIN database.table2 T2
ON DATE(T2.TIMESTAMP) = DATE(C.Date)
GROUP BY day
ORDER BY day ASC

SQL to calculate difference between 2 latest recent values by event_types

The events table looks like
event_type value timestamp
2 2 06-06-2016 14:00:00
2 7 06-06-2016 13:00:00
2 2 06-06-2016 12:00:00
3 3 06-06-2016 14:00:00
3 9 06-06-2016 13:00:00
4 9 06-06-2016 13:00:00
My goal is to keep the event types that occur at least twice, subtract their two most recent values, and show the result by event_type.
The end result would be
event_type value
2 -5
3 -6
I was able to filter the events that occur at least twice and order them by event_type and by timestamp descending.
The difficult part for me is subtracting the two most recent values and showing the result by event_type.
DB / SQL experts, please help.
You can use a query like this:
SELECT event_type, diff
FROM (
SELECT event_type, value, "timestamp", rn,
value - LEAD(value) OVER (PARTITION BY event_type
ORDER BY "timestamp" DESC) AS diff
FROM (
SELECT event_type, value, "timestamp",
COUNT(*) OVER (PARTITION BY event_type) AS cnt,
ROW_NUMBER() OVER (PARTITION BY event_type ORDER BY "timestamp" DESC) AS rn
FROM mytable) AS t
WHERE cnt >=2 AND rn <= 2 ) AS s
WHERE rn = 1
The innermost subquery uses:
Window function COUNT with PARTITION BY clause, so as to calculate the population of each event_type slice.
Window function ROW_NUMBER so as to get the two latest records within each event_type slice.
The mid-level query uses LEAD window function, so as to calculate the difference between the first and the second records. The outermost query simply returns this difference.
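The three layers described above can be mirrored in a few lines of Python, which may help when checking the expected result by hand. This is a sketch with a hypothetical function name (`latest_two_diff`), using the rows from the question.

```python
from collections import defaultdict
from datetime import datetime


def latest_two_diff(rows):
    """rows: iterable of (event_type, value, timestamp).

    Returns {event_type: latest_value - previous_value} for every event type
    that has at least two rows.
    """
    by_type = defaultdict(list)
    for event_type, value, ts in rows:
        by_type[event_type].append((ts, value))

    result = {}
    for event_type, items in by_type.items():
        if len(items) < 2:
            continue  # same role as the cnt >= 2 filter
        items.sort(reverse=True)  # newest first, like ORDER BY "timestamp" DESC
        result[event_type] = items[0][1] - items[1][1]
    return result


rows = [
    (2, 2, datetime(2016, 6, 6, 14)),
    (2, 7, datetime(2016, 6, 6, 13)),
    (2, 2, datetime(2016, 6, 6, 12)),
    (3, 3, datetime(2016, 6, 6, 14)),
    (3, 9, datetime(2016, 6, 6, 13)),
    (4, 9, datetime(2016, 6, 6, 13)),
]
diffs = latest_two_diff(rows)
print(diffs)
```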
This example is for Oracle only.
Test data:
with t(event_type,
value,
timestamp) as
(select 2, 2, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 7, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 2, 2, to_timestamp('06-06-2016 12:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 3, to_timestamp('06-06-2016 14:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 3, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual
union all
select 4, 9, to_timestamp('06-06-2016 13:00:00', 'mm-dd-yyyy hh24:mi:ss')
from dual)
Query:
select event_type,
max(value) keep(dense_rank first order by rn) - max(value) keep(dense_rank last order by rn) as value
from (select event_type,
row_number() over(partition by event_type order by timestamp desc) rn,
value
from t) t
where rn in (1, 2)
group by event_type
having count (*) >= 2

Selecting window of entries

Consider the following (simplified) table:
ID NUMBER
PROD_NO VARCHAR2(10)
START_TIME DATE
What I want to do is select a 'window' of n rows around a given START_TIME.
Example:
ID PROD_NO START_TIME
...
42 1234567 2012-02-28 13:42:10
43 1234568 2012-02-28 13:47:53
44 1234569 2012-02-28 13:52:22
45 1234570 2012-02-28 13:59:01
46 1234571 2012-02-28 14:02:12
47 1234572 2012-02-28 14:05:19
...
Provided START_TIME = '2012-02-28 14:00:00' and window size n = 4 the resulting set of rows should be ID 44...47.
The entries cannot be assumed to be sorted by START_TIME. In case there are not enough entries available to match the specified window size, it may be cropped.
Since my SQL skills are pretty limited any help would be greatly appreciated.
Thanks in advance.
You can use analytic functions to help with this:
select WT.ID
from (select WT.ID
-- 2 = n / 2 for a window size of n = 4
,max(START_TIME)
over (order by START_TIME
rows between 2 preceding and 2 following)
as MAXST
,min(START_TIME)
over (order by START_TIME
rows between 2 preceding and 2 following)
as MINST
from WT) WT
where MINST < to_date('2012-02-28 14:00:00', 'yyyy-mm-dd hh24:mi:ss')
and MAXST > to_date('2012-02-28 14:00:00', 'yyyy-mm-dd hh24:mi:ss')
This should work now:
SELECT *
FROM (SELECT id,
prod_no,
start_time,
ROWNUM rn,
datediff
FROM (SELECT id,
prod_no,
start_time,
start_time
- TO_DATE('01-JAN-2011 12:00:00',
'DD-MON-YYYY HH:MI:SS AM')
datediff
FROM table
WHERE start_time
- TO_DATE('01-JAN-2011 12:00:00',
'DD-MON-YYYY HH:MI:SS AM') > 0
ORDER BY datediff))
WHERE rn <= 2
UNION ALL
SELECT *
FROM (SELECT id,
prod_no,
start_time,
ROWNUM rn,
datediff
FROM (SELECT id,
prod_no,
start_time,
start_time
- TO_DATE('01-JAN-2011 12:00:00',
'DD-MON-YYYY HH:MI:SS AM')
datediff
FROM table
WHERE start_time
- TO_DATE('01-JAN-2011 12:00:00',
'DD-MON-YYYY HH:MI:SS AM') <= 0
ORDER BY datediff DESC))
WHERE rn <= 2
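Both answers boil down to taking the n/2 closest rows on each side of the given time. A minimal Python sketch of that selection, assuming an even window size and using the sample rows from the question in shuffled order (the function name `window_around` is hypothetical):

```python
from bisect import bisect_right
from datetime import datetime


def window_around(rows, target, n):
    """rows: iterable of (id, prod_no, start_time), in any order.

    Returns up to n rows: the n // 2 latest rows at or before target and the
    n // 2 earliest rows after it, ordered by start_time. The window is
    cropped when there are not enough rows on one side.
    """
    ordered = sorted(rows, key=lambda r: r[2])
    times = [r[2] for r in ordered]
    split = bisect_right(times, target)  # index of the first row after target
    half = n // 2
    return ordered[max(0, split - half): split + half]


rows = [
    (46, "1234571", datetime(2012, 2, 28, 14, 2, 12)),
    (42, "1234567", datetime(2012, 2, 28, 13, 42, 10)),
    (45, "1234570", datetime(2012, 2, 28, 13, 59, 1)),
    (47, "1234572", datetime(2012, 2, 28, 14, 5, 19)),
    (43, "1234568", datetime(2012, 2, 28, 13, 47, 53)),
    (44, "1234569", datetime(2012, 2, 28, 13, 52, 22)),
]
window = window_around(rows, datetime(2012, 2, 28, 14, 0, 0), 4)
ids = [r[0] for r in window]
print(ids)
```

A row whose START_TIME equals the target counts as "before", matching the `<= 0` branch of the UNION ALL query above.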