In a table containing rows of date ranges, from each row, generate one row per day containing hours of utilization - sql

Given a table with rows like:
+----+-------------------------+------------------------+
| ID | StartDate | EndDate |
+----+-------------------------+------------------------+
| 1 | 2016-02-05 20:00:00.000 | 2016-02-07 5:00:00.000 |
+----+-------------------------+------------------------+
I want to produce a table like this:
+----+------------+----------+
| ID | Date | Duration |
+----+------------+----------+
| 1 | 2016-02-05 | 4 |
| 1 | 2016-02-06 | 24 |
| 1 | 2016-02-07 | 5 |
+----+------------+----------+
This is an interview-style question. I am wondering how I can go about tackling this. Is it possible to do this with just standard SQL query syntax? Or is a procedural language like pl/pgSQL required to do a query like this?

The basic idea is this:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate, interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
That adds an extra hour, so this is more accurate:
SELECT date_trunc('day', dayhour) as dd,count(*)
FROM (VALUES (1, '2016-02-05 20:00:00.000'::timestamp, '2016-02-07 5:00:00.000'::timestamp)
) v(ID, StartDate, EndDate), lateral
generate_series(StartDate, EndDate - interval '1 hour', interval '1 hour') g(dayhour)
GROUP BY dd
ORDER BY dd;
Technically, the lateral is not needed (and in that case, I would replace the comma with cross join). However, this is an example of a lateral join, so being explicit is good.
I should also note that the above is the simplest method. However, the group by does slow down the query. There are other methods that don't require generating a series for every hour.

Related

How can i split 1 row into multiply rows in SQL

I want to split something like this:
Value | Startdate | Enddate
XXXX | 2.July | 16 August
Into this:
Value | Startdate | Enddate
XXXX | 2.July | 31 July
XXXX | 1.August | 16 August
The value is not important for now.
If I understand correctly, you want to split your range into different months. A convenient method uses generate_series():
select value, greatest(startdate, gs.mon), least(enddate, gs.mon + interval '1 month - 1 day')
from t cross join lateral
generate_series(date_trunc('month', startdate), date_trunc('month', enddate), interval '1 month'
) gs(mon)
Here is a db<>fiddle.

Create query returning field containing dates of last 365 days

I'm using AWS QuickSight to build a dashboard with analytics and metrics related to the usage of a system. I'm trying to visualize user's registration over time. I've created a parameter and control on my dashboard that allows the dashboard user to select 'Last N days' (7, 30, 60, 90, 180, 365 days), and I have an associated line chart that will plot the related data.
However the issue is that there are some days where no user's registered, and that leaves gaps of seemingly unreported data (in the line chart). What I would like to do is JOIN my current query on day with a query that returns a single field each row containing the last 365 days.
Select count (DISTINCT id), date_trunc('day', created_at) as day
FROM users
GROUP BY day
ORDER BY day desc
To get date instead of numbers you can use below query:
Query:
with recursive date_range(day,daycount) as
(
SELECT '1 Jan 2020'::date as DAY, 1 as daycount
UNION ALL
SELECT day+1, daycount+1 from date_range WHERE daycount<365
)select day from date_range
Output:
| day |
| :--------- |
| 2020-01-01 |
| 2020-01-02 |
| 2020-01-03 |
.
.
.
.
| 2020-12-28 |
| 2020-12-29 |
| 2020-12-30 |
db<fiddle here
You can use recursive common table expression to generate that. Then you just can join that cte with your table. Please check out below code.
with recursive date_range(day) as
(
SELECT 1 as day
UNION ALL
SELECT day+1
from date_range
WHERE day < 365
)select DATE_TRUNC('day', NOW() - concat(day,' days')::interval ) as date from date_range
Output:
|date |
------------------------
|2021-06-10 00:00:00+01|
|2021-06-09 00:00:00+01|
|2021-06-08 00:00:00+01|
|2021-06-07 00:00:00+01|
|2021-06-06 00:00:00+01|
|2021-06-05 00:00:00+01|
|2021-06-04 00:00:00+01|
|2021-06-03 00:00:00+01|
|2021-06-02 00:00:00+01|
|2021-06-01 00:00:00+01|
...
db<fiddle here

Get a rolling count of timestamps in SQL

I have a table (in an Oracle DB) that looks something like what is shown below with about 4000 records. This is just an example of how the table is designed. The timestamps range for several years.
| Time | Action |
| 9/25/2019 4:24:32 PM | Yes |
| 9/25/2019 4:28:56 PM | No |
| 9/28/2019 7:48:16 PM | Yes |
| .... | .... |
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval. I would like this done by looking at each timestamp and getting a count of timestamps that appear within 15 minutes of that timestamp.
My goal would to have something like
| Interval | Count |
| 9/25/2019 4:24:00 PM - 9/25/2019 4:39:00 | 2 |
| 9/25/2019 4:25:00 PM - 9/25/2019 4:40:00 | 2 |
| ..... | ..... |
| 9/25/2019 4:39:00 PM - 9/25/2019 4:54:00 | 0 |
I am not sure how I would be able to do this, if at all. Any ideas or advice would be much appreciated.
If you want any 15 minute interval in the data, then you can use:
select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t;
If you want the maximum, then use rank() on this:
select t.*
from (select t.*, rank() over (order by cnt_15 desc) as seqnum
from (select t.*,
count(*) over (order by timestamp
range between interval '15' minute preceding and current row
) as cnt_15
from t
) t
) t
where seqnum = 1;
This doesn't produce exactly the results you specify in the query. But it does answer the question:
I want to be able to get a count of timestamps that occur on a rolling 15 minute interval. My main goal is to identify the maximum number of timestamps that appear for any 15 minute interval.
You could enumerate the minutes with a recursive query, then bring the table with a left join:
with recursive cte (start_dt, max_dt) as (
select trunc(min(time), 'mi'), max(time) from mytable
union all
select start_dt + interval '1' minute, max_dt from cte where start_dt < max_dt
)
select
c.start_dt,
c.start_dt + interval '15' minute end_dt,
count(t.time) cnt
from cte c
left join mytable t
on t.time >= c.start_dt
and t.time < c.start_dt + interval '15' minute
group by c.start_dt

How to write a SQL statement to sum data using group by the same day of every two neighboring months

I have a data table like this:
datetime data
-----------------------
...
2017/8/24 6.0
2017/8/25 5.0
...
2017/9/24 6.0
2017/9/25 6.2
...
2017/10/24 8.1
2017/10/25 8.2
I want to write a SQL statement to sum the data using group by the 24th of every two neighboring months in certain range of time such as : from 2017/7/20 to 2017/10/25 as above.
How to write this SQL statement? I'm using SQL Server 2008 R2.
The expected results table is like this:
datetime_range data_sum
------------------------------------
...
2017/8/24~2017/9/24 100.9
2017/9/24~2017/10/24 120.2
...
One conceptual way to proceed here is to redefine a "month" as ending on the 24th of each normal month. Using the SQL Server month function, we will assign any date occurring after the 24th as belonging to the next month. Then we can aggregate by the year along with this shifted month to obtain the sum of data.
WITH cte AS (
SELECT
data,
YEAR(datetime) AS year,
CASE WHEN DAY(datetime) > 24
THEN MONTH(datetime) + 1 ELSE MONTH(datetime) END AS month
FROM yourTable
)
SELECT
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), month) +
'/25~' +
CONVERT(varchar(4), year) + '/' + CONVERT(varchar(2), (month + 1)) +
'/24' AS datetime_range,
SUM(data) AS data_sum
FROM cte
GROUP BY
year, month;
Note that your suggested ranges seem to include the 24th on both ends, which does not make sense from an accounting point of view. I assume that the month includes and ends on the 24th (i.e. the 25th is the first day of the next accounting period.
Demo
I would suggest dynamically building some date range rows so that you can then join you data to those for aggregation, like this example:
+----+---------------------+---------------------+----------------+
| | period_start_dt | period_end_dt | your_data_here |
+----+---------------------+---------------------+----------------+
| 1 | 24.04.2017 00:00:00 | 24.05.2017 00:00:00 | 1 |
| 2 | 24.05.2017 00:00:00 | 24.06.2017 00:00:00 | 1 |
| 3 | 24.06.2017 00:00:00 | 24.07.2017 00:00:00 | 1 |
| 4 | 24.07.2017 00:00:00 | 24.08.2017 00:00:00 | 1 |
| 5 | 24.08.2017 00:00:00 | 24.09.2017 00:00:00 | 1 |
| 6 | 24.09.2017 00:00:00 | 24.10.2017 00:00:00 | 1 |
| 7 | 24.10.2017 00:00:00 | 24.11.2017 00:00:00 | 1 |
| 8 | 24.11.2017 00:00:00 | 24.12.2017 00:00:00 | 1 |
| 9 | 24.12.2017 00:00:00 | 24.01.2018 00:00:00 | 1 |
| 10 | 24.01.2018 00:00:00 | 24.02.2018 00:00:00 | 1 |
| 11 | 24.02.2018 00:00:00 | 24.03.2018 00:00:00 | 1 |
| 12 | 24.03.2018 00:00:00 | 24.04.2018 00:00:00 | 1 |
+----+---------------------+---------------------+----------------+
DEMO
declare #start_dt date;
set #start_dt = '20170424';
select
period_start_dt, period_end_dt, sum(1) as your_data_here
from (
select
dateadd(month,m.n,start_dt) period_start_dt
, dateadd(month,m.n+1,start_dt) period_end_dt
from (
select #start_dt start_dt ) seed
cross join (
select 0 n union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
) m
) r
-- LEFT JOIN YOUR DATA
-- ON yourdata.date >= r.period_start_dt and data.date < r.period_end_dt
group by
period_start_dt, period_end_dt
Please don't be tempted to use "between" when it comes to joining to your data. Follow the note above and use yourdata.date >= r.period_start_dt and data.date < r.period_end_dt otherwise you could double count information as between is inclusive of both lower and upper boundaries.
I think the simplest way is to subtract 25 days and aggregate by the month:
select year(dateadd(day, -25, datetime)) as yr,
month(dateadd(day, -25, datetime)) as mon,
sum(data)
from t
group by dateadd(day, -25, datetime);
You can format yr and mon to get the dates for the specific ranges, but this does the aggregation (and the yr/mon columns might be sufficient).
Step 0: Build a calendar table. Every database needs a calendar table eventually to simplify this sort of calculation.
In this table you may have columns such as:
Date (primary key)
Day
Month
Year
Quarter
Half-year (e.g. 1 or 2)
Day of year (1 to 366)
Day of week (numeric or text)
Is weekend (seems redundant now, but is a huge time saver later on)
Fiscal quarter/year (if your company's fiscal year doesn't start on Jan. 1)
Is Holiday
etc.
If your company starts its month on the 24th, then you can add a "Fiscal Month" column that represents that.
Step 1: Join on the calendar table
Step 2: Group by the columns in the calendar table.
Calendar tables sound weird at first, but once you realize that they are in fact tiny even if they span a couple hundred years they quickly become a major asset.
Don't try to cheap out on disk space by using computed columns. You want real columns because they are much faster and can be indexed if necessary. (Though honestly, usually just the PK index is enough for even wide calendar tables.)

Populating a table with all dates in a given range in Google BigQuery

Is there any convenient way to populate a table with all dates in a given range in Google BigQuery? What I need are all dates from 2015-06-01 till CURRENT_DATE(), so something like this:
+------------+
| date |
+------------+
| 2015-06-01 |
| 2015-06-02 |
| 2015-06-03 |
| ... |
| 2016-07-11 |
+------------+
Optimally, the next step would be to also get all weeks between the two dates, i.e.:
+---------+
| week |
+---------+
| 2015-23 |
| 2015-24 |
| 2015-25 |
| ... |
| 2016-28 |
+---------+
I've been fiddling around with the following answers I found, but I can't get them to work, mostly because core functions aren't supported and I can't find proper ways to replace them.
Easiest way to populate a temp table with dates between and including 2 date parameters
Generate Dates between date ranges
Your help is very much appreciated!
Best,
Max
Mikhail's answer works for BigQuery's legacy sql syntax perfectly. This solution is a slightly easier one if you're using the standard SQL syntax.
BigQuery standard SQL syntax actually has a built in function, GENERATE_DATE_ARRAY for creating an array from a date range. It takes a start date, end date and INTERVAL. For example:
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
If you wanted the week and year you could use
SELECT EXTRACT(YEAR FROM day), EXTRACT(WEEK FROM day)
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 WEEK)
) AS day
all dates from 2015-06-01 till CURRENT_DATE()
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
all weeks between the two dates
SELECT YEAR(DAY) AS y, WEEK(DAY) AS w
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
)
GROUP BY y, w