Postgres - aggregate minutes to hour - sql

I need some assistance with PostgreSQL. I am trying to group some minute-level records (minutes 5, 10, 15, 20, etc.) into 60-minute intervals.
What I need is to GROUP BY the hour and AVG the minute values that fall within each hour.
SELECT id, value,
extract(year from GDDP.timestamp) as YEAR,
extract(month from GDDP.timestamp) as MONTH,
extract(day from GDDP.timestamp) as DAY,
extract(hour from GDDP.timestamp) as HOUR,
extract(minute from GDDP.timestamp) as MINUTE
FROM GDDP
WHERE value > 0 AND
GDDP.timestamp BETWEEN '2016-07-01 00:00:00' and '2016-12-31 23:55:00'
ORDER BY YEAR, MONTH, DAY, HOUR
Currently, this is the result of the query above:
id | value | YEAR | MONTH | DAY | HOUR | MINUTE
-------------------------------------------------
1 | 100 | 2016 | 07 | 01 | 1 | 05
2 | 200 | 2016 | 07 | 01 | 1 | 10
3 | 100 | 2016 | 07 | 01 | 1 | 15
4 | 300 | 2016 | 07 | 01 | 1 | 20
5 | 200 | 2016 | 07 | 01 | 1 | 25
6 | 500 | 2016 | 07 | 01 | 1 | 30
But, I would like the result to look like this:
id | value | YEAR | MONTH | DAY | HOUR
---------------------------------------
1 | 233.3 | 2016 | 07 | 01 | 1
Thanks in advance for any assistance!

Use the aggregate function avg() and group by year, month, day, and hour:
SELECT
min(id) as id,
avg(value) as value,
extract(year from gddp.timestamp) as year,
extract(month from gddp.timestamp) as month,
extract(day from gddp.timestamp) as day,
extract(hour from gddp.timestamp) as hour
FROM gddp
WHERE value > 0
AND gddp.timestamp BETWEEN '2016-07-01 00:00:00' AND '2016-12-31 23:55:00'
GROUP BY year, month, day, hour
ORDER BY year, month, day, hour;
id | value | year | month | day | hour
----+----------------------+------+-------+-----+------
1 | 233.3333333333333333 | 2016 | 7 | 1 | 1
(1 row)
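As a cross-check of the grouping logic, here is a minimal sketch in plain Python (not the SQL itself). It assumes the six sample rows from the question; the function name `hourly_averages` is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows (timestamp, value) copied from the question's result table.
rows = [
    (datetime(2016, 7, 1, 1, 5), 100),
    (datetime(2016, 7, 1, 1, 10), 200),
    (datetime(2016, 7, 1, 1, 15), 100),
    (datetime(2016, 7, 1, 1, 20), 300),
    (datetime(2016, 7, 1, 1, 25), 200),
    (datetime(2016, 7, 1, 1, 30), 500),
]


def hourly_averages(rows):
    """Group values by (year, month, day, hour) and average each group."""
    buckets = defaultdict(list)
    for ts, value in rows:
        if value > 0:  # mirrors WHERE value > 0
            buckets[(ts.year, ts.month, ts.day, ts.hour)].append(value)
    # Sorting the keys mirrors ORDER BY year, month, day, hour.
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


result = hourly_averages(rows)
for key, avg in result.items():
    print(key, round(avg, 1))
```

All six rows fall into the single bucket (2016, 7, 1, 1), which matches the one-row result shown above.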

Related

How to calculate average for every month from start from year in Presto's SQL (Athena)?

Below is an example of the table data I have
| date | value |
| 2020-01-01 | 20 |
| 2020-01-14 | 10 |
| 2020-02-02 | 30 |
| 2020-02-11 | 25 |
| 2020-02-25 | 25 |
| 2020-03-13 | 34 |
| 2020-03-21 | 10 |
| 2020-04-06 | 55 |
| 2020-04-07 | 11 |
I would like to generate a result set as below
| date | value | average |
| 2020-01-01 | 20 | Jan average |
| 2020-01-14 | 10 | Jan average |
| 2020-02-02 | 30 | Jan & Feb average |
| 2020-02-11 | 25 | Jan & Feb average |
| 2020-02-25 | 25 | Jan & Feb average |
| 2020-03-13 | 34 | Jan & Feb & Mar average |
| 2020-03-21 | 10 | Jan & Feb & Mar average |
| 2020-04-06 | 55 | Jan & Feb & Mar & Apr average |
| 2020-04-07 | 11 | Jan & Feb & Mar & Apr average |
I tried to use a window function with OVER() and PARTITION BY, but I only managed to get the average month by month rather than cumulatively from the start of the year.
Any suggestions, please.
Thanks
I think you want:
select
t.*,
avg(value) over(
partition by year(date)
order by month(date)
) running_avg
from mytable t
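This puts each year in a separate partition and orders the rows within it by month. With an ORDER BY and no explicit frame, the window runs from the start of the partition through the current row and its peers, so every row of a month gets the average of all rows from January through that month.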
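To make the expected numbers concrete, here is a small plain-Python sketch of the same "year to date, through the current month" average. It assumes the sample table from the question; `running_monthly_average` is a hypothetical helper name, and this is an illustration of the semantics rather than the Presto query.

```python
from datetime import date

# Sample data from the question.
rows = [
    (date(2020, 1, 1), 20),
    (date(2020, 1, 14), 10),
    (date(2020, 2, 2), 30),
    (date(2020, 2, 11), 25),
    (date(2020, 2, 25), 25),
    (date(2020, 3, 13), 34),
    (date(2020, 3, 21), 10),
    (date(2020, 4, 6), 55),
    (date(2020, 4, 7), 11),
]


def running_monthly_average(rows):
    """For each row, average every value of the same year whose month
    is less than or equal to the row's month."""
    out = []
    for d, value in rows:
        in_window = [
            v for other, v in rows
            if other.year == d.year and other.month <= d.month
        ]
        out.append((d, value, sum(in_window) / len(in_window)))
    return out


result = running_monthly_average(rows)
for d, value, avg in result:
    print(d, value, round(avg, 2))
```

For this data the January rows get 15.0, the February and March rows both get 22.0, and the April rows get about 24.44.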
The following query should give your expected output:
SELECT A.*,
(
SELECT AVG(Value * 1.00)
FROM your_table B
WHERE YEAR(B.Date) = YEAR(A.DAte)
AND MONTH(B.Date) <= MONTH(A.DAte)
)
FROM your_table A
This query computes the average per year. If you do not want to partition by year, just remove the YEAR filter from the subquery.
The following query returns the average without considering the year, averaging all rows up to the end of the current row's month:
SELECT A.*,
(
SELECT AVG(Value * 1.00)
FROM your_table B
WHERE B.date <=
(
SELECT MAX(Date)
FROM your_table C
WHERE YEAR(c.Date) = YEAR(A.Date)
AND MONTH(C.Date) = MONTH(A.Date)
)
)
FROM your_table A
Not sure I understand your question, but if all you want is a running average for each row bound by year:
SELECT date, value, (
SELECT AVG(value)
FROM data ds
WHERE ds.date <= d.date AND YEAR(ds.date) = YEAR(d.date)
) average
FROM data d
ORDER BY d.date ASC;
Example with MySQL (the syntax for this specific query is the same)
If you want to include later rows of the same month in the average, use WHERE MONTH(ds.date) <= MONTH(d.date).
SELECT a.date,
a.value,
(Select avg(b.value) from myTable B where b.date < a.date and YEAR(a.date) = YEAR(b.date))
From myTable a

Querying all past and future round birthdays

I got the birthdates of users in a table and want to display a list of round birthdays for the next n years (starting from an arbitrary date x) which looks like this:
+----------------------------------------------------------------------------------------+
| Name | id | birthdate | current_age | birthday | year | month | day | age_at_date |
+----------------------------------------------------------------------------------------+
| User 1 | 1 | 1958-01-23 | 59 | 2013-01-23 | 2013 | 1 | 23 | 55 |
| User 2 | 2 | 1988-01-29 | 29 | 2013-01-29 | 2013 | 1 | 29 | 25 |
| User 3 | 3 | 1963-02-12 | 54 | 2013-02-12 | 2013 | 2 | 12 | 50 |
| User 1 | 1 | 1958-01-23 | 59 | 2018-01-23 | 2018 | 1 | 23 | 60 |
| User 2 | 2 | 1988-01-29 | 29 | 2018-01-29 | 2018 | 1 | 29 | 30 |
| User 3 | 3 | 1963-02-12 | 54 | 2018-02-12 | 2018 | 2 | 12 | 55 |
| User 1 | 1 | 1958-01-23 | 59 | 2023-01-23 | 2023 | 1 | 23 | 65 |
| User 2 | 2 | 1988-01-29 | 29 | 2023-01-29 | 2023 | 1 | 29 | 35 |
| User 3 | 3 | 1963-02-12 | 54 | 2023-02-12 | 2023 | 2 | 12 | 60 |
+----------------------------------------------------------------------------------------+
As you can see, I want the list to "wrap around": it should show not only the next upcoming round birthday, which is easy, but also historical and far-future ones.
The core idea of my current approach is the following: using generate_series, I generate all dates from 1900 to 2100 and join them to the users by matching the day and month of the birthdate. From that, I calculate the age at each date and finally select only those birthdays that are round (divisible by 5) and yield a nonnegative age.
WITH
test_users(id, name, birthdate) AS (
VALUES
(1, 'User 1', '23-01-1958' :: DATE),
(2, 'User 2', '29-01-1988'),
(3, 'User 3', '12-02-1963')
),
dates AS (
SELECT
s AS date,
date_part('year', s) AS year,
date_part('month', s) AS month,
date_part('day', s) AS day
FROM generate_series('01-01-1900' :: TIMESTAMP, '01-01-2100' :: TIMESTAMP, '1 days' :: INTERVAL) AS s
),
birthday_data AS (
SELECT
id AS member_id,
test_users.birthdate AS birthdate,
(date_part('year', age((test_users.birthdate)))) :: INT AS current_age,
date :: DATE AS birthday,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
ROUND(extract(EPOCH FROM (dates.date - birthdate)) / (60 * 60 * 24 * 365)) :: INT AS age_at_date
FROM test_users, dates
WHERE
dates.day = date_part('day', birthdate) AND
dates.month = date_part('month', birthdate) AND
dates.year >= date_part('year', birthdate)
)
SELECT
test_users.name,
bd.*
FROM test_users
LEFT JOIN birthday_data bd ON bd.member_id = test_users.id
WHERE
bd.age_at_date % 5 = 0 AND
bd.birthday BETWEEN NOW() - INTERVAL '5' YEAR AND NOW() + INTERVAL '10' YEAR
ORDER BY bd.birthday;
My current approach seems very inefficient and rather complicated: it takes more than 100 ms. Does anybody have an idea for a more compact and performant query? I am using PostgreSQL 9.5.3. Thank you!
Maybe try to join the generate series:
create table bday(id serial, name text, dob date);
insert into bday (name, dob) values ('a', '08-21-1972'::date);
insert into bday (name, dob) values ('b', '03-20-1974'::date);
select * from bday ,
lateral( select generate_series( (1950-y)/5 , (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
For each entry, this generates years between 1950 and 2010.
You can add a WHERE clause to exclude people born after 2010 (they can't have a birthday in range),
or to exclude people born before 1850 (they are unlikely...).
--
Edit (after your edit):
So your generate_series creates more than 360 rows per year. Over 100 years that is more than 30,000 rows, and they get joined to each user (3 users => about 100,000 rows).
My query generates rows only for the years needed. Over 100 years that is 20 rows.
That means 20 rows per user.
Dividing by 5 (and multiplying the series by 5 again) ensures that the start year is a round birthday.
(1950-y)/5 calculates how many round birthdays there were before 1950.
A person born in 1941 needs to skip 1941 and 1946, but has a round birthday in 1951. So that is the difference (9 years) divided by 5 using integer division, plus 1 to account for the zeroth birthday.
If the person was born after 1950, the number is negative, and greatest(-1,...)+1 gives 0, starting at the actual birth year.
But actually it should be
select * from bday ,
lateral( select generate_series( greatest(-1,(1950-y)/5)+1, (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
(use greatest(0,...)+1 instead if you want to start at age 5)
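The index arithmetic above can be checked with a short plain-Python sketch. It assumes the 1950 to 2010 range from the answer; `trunc_div` and `round_birthday_years` are hypothetical names. PostgreSQL integer division truncates toward zero, whereas Python's `//` floors, so the sketch emulates truncation explicitly.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, like PostgreSQL's integer '/'.
    (Python's // rounds toward negative infinity instead.)"""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def round_birthday_years(birth_year, start=1950, end=2010, step=5):
    """Mirror generate_series(greatest(-1, (start-y)/5)+1, (end-y)/5)*5 + y."""
    first = max(-1, trunc_div(start - birth_year, step)) + 1
    last = trunc_div(end - birth_year, step)
    return [birth_year + k * step for k in range(first, last + 1)]


print(round_birthday_years(1941))  # born before the range
print(round_birthday_years(1972))  # born inside the range
```

For 1941 the list starts at 1951, and for 1972 it starts at the birth year itself, as described above. Under this formula a birth year such as 1945 starts at 1955, so it may be worth checking whether the boundary year 1950 should be included for your use case.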

postgresql create table of days per month

I'm trying to write a script which returns a list of months with the number of days in the month. It references this table
CREATE TABLE generic.time_series_only (measurementdatetime TIMESTAMP WITHOUT TIME ZONE NOT NULL)
which is just a chronological time series (and very useful when joining tables of data with gaps in different places, but you want an unbroken timeseries as your output, maybe there's a smarter way to do that but I haven't found it yet).
SELECT date_part('year'::text, time_series_only.measurementdatetime) AS measyear,
date_part('month'::text, time_series_only.measurementdatetime) AS measmonth,
date_trunc('month'::text, time_series_only.measurementdatetime) + '1 mon'::interval
- date_trunc('month'::text, time_series_only.measurementdatetime) AS days_in_month
FROM generic.time_series_only
GROUP BY date_part('year'::text, time_series_only.measurementdatetime),
date_part('month'::text, time_series_only.measurementdatetime)
ORDER BY date_part('year'::text, time_series_only.measurementdatetime),
date_part('month'::text, time_series_only.measurementdatetime);
But I get this error:
ERROR: column "time_series_only.measurementdatetime" must appear in the GROUP BY clause or be used in an aggregate function
I can't put this column in the GROUP BY clause because then I'd get a result for every single entry in the time_series_only table, and I can't figure a way to get the same result using an aggregate function? Any suggestions very welcome :-)
Why not use generate_series? For example:
with pre as (select generate_series('2016-01-01','2017-03-31','1 day'::interval) g)
select distinct extract('year' from g), extract('month' from g),
count(1) over (partition by date_trunc('month',g))
from pre order by 1,2;
date_part | date_part | count
-----------+-----------+-------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
2017 | 1 | 31
2017 | 2 | 28
2017 | 3 | 31
(15 rows)
Use distinct on a pair (year, month). You can replace the time_series_only table with the function generate_series() , e.g.:
select distinct on (date_part('year', d), date_part('month', d))
date_part('year', d) as year,
date_part('month', d) as month,
date_part('day', d) as days_in_month
from
generate_series('2016-01-01'::date, '2016-12-31'::date, '1d'::interval) d
order by 1, 2, 3 desc;
year | month | days_in_month
------+-------+---------------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
(12 rows)
This one has better performance since it generates only the last day for each month and consequently does not need aggregation:
select
date_part('year', d) as year,
date_part('month', d) as month,
date_part('day', d) as days_in_month
from
generate_series('2016-01-01'::date, '2016-12-01', '1 month') gs(gsd)
cross join lateral
(select gsd + interval '1 month - 1 day') d(d)
order by 1, 2;
year | month | days_in_month
------+-------+---------------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
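The "first day of the month, plus one month, minus one day" idea can be sanity-checked outside the database. The following is a minimal plain-Python sketch of that same idea; `last_day_of_month` is a hypothetical helper, not part of the answer.

```python
from datetime import date, timedelta


def last_day_of_month(year, month):
    """Take the first day of the following month and step back one day."""
    if month == 12:
        next_first = date(year + 1, 1, 1)
    else:
        next_first = date(year, month + 1, 1)
    return next_first - timedelta(days=1)


# One row per month: (year, month, days_in_month), like the query output.
table = [(2016, m, last_day_of_month(2016, m).day) for m in range(1, 13)]
for row in table:
    print(row)
```

Only twelve dates are computed for a whole year, which is why this variant avoids the aggregation over every single day.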
Another variation, using a CTE for a bit more readability, IMHO (this example generates the months and day counts for the next three full months following the calendar month of current_date):
WITH series AS (
SELECT generate_series (
date_trunc ('month', date_trunc('day', now()) + interval '1 month'),
date_trunc('day', now() + interval '4 months'), '1d'::interval
) AS day
)
SELECT DISTINCT ON (date_part('year', series.day), date_part('month', series.day))
date_part('year', series.day) as year,
date_part('month', series.day) as month,
date_part('day', series.day) as days_in_month
FROM series
ORDER BY 1, 2, 3 desc LIMIT 3;
year | month | days_in_month
------+-------+---------------
2021 | 1 | 31
2021 | 2 | 28
2021 | 3 | 31

Query with inner select and Group by

I am struggling with this query, here is the table set up:
date | time | count
----------------------------
12/12/2015 | 0:00 | 8
12/12/2015 | 1:00 | 19
12/12/2015 | 2:00 | 36
12/13/2015 | 0:00 | 12
12/13/2015 | 1:00 | 22
12/13/2015 | 2:00 | 30
12/14/2015 | 0:00 | 14
12/14/2015 | 1:00 | 26
12/14/2015 | 2:00 | 38
What I would like my query to return is something like this:
date | time | count | DAY | AVG/HR | AVG/DAY
---------------------------------------------------------
12/12/2015 | 0:00 | 8 | MONDAY | 11.33 | 63
12/12/2015 | 1:00 | 19 | MONDAY | 22.33 | 63
12/12/2015 | 2:00 | 36 | MONDAY | 34.67 | 63
12/13/2015 | 0:00 | 12 | TUESDAY | 11.33 | 64
12/13/2015 | 1:00 | 22 | TUESDAY | 22.33 | 64
12/13/2015 | 2:00 | 30 | TUESDAY | 34.67 | 64
12/14/2015 | 0:00 | 14 | WEDNESDAY | 11.33 | 78
12/14/2015 | 1:00 | 26 | WEDNESDAY | 22.33 | 78
12/14/2015 | 2:00 | 38 | WEDNESDAY | 34.67 | 78
So basically the query returns all rows (there will be months' worth of data in the table, with each day having 24 records, one per hour) and adds a day-of-the-week field, an average of the count per hour, and an average of the count per day of the week. The last two are what I am struggling with. Here is what I have so far:
SELECT DATE, TIME, COUNT,
TO_CHAR(DATE, 'DAY'),
(SELECT AVG(t2.COUNT)
FROM tableXX t2
WHERE t2.time = t1.time
GROUP BY t2.time) AS AvgPerHr,
(SELECT AVG(t2.COUNT)
FROM tableXX t2
WHERE TO_CHAR(t2.DATE, 'DAY') = TO_CHAR(t1.DATE, 'DAY')
GROUP BY TO_CHAR(t2.DATE, 'DAY')) AS AvgPerDay
FROM tableXX t1
ORDER BY DATE, TO_DATE(TIME, 'hh24:mi') ASC;
Any suggestions would be appreciated, the query above returns data, but it definitely isn't accurate.
This can be solved with analytic (window) functions:
SELECT DATE, TIME, COUNT,
TO_CHAR(DATE, 'DAY'),
AVG(t1.COUNT)
OVER (PARTITION BY TIME) AS AvgPerHr,
AVG(t1.COUNT)
OVER (PARTITION BY TO_CHAR(DATE, 'DAY')) AS AvgPerDay
FROM tableXX t1
ORDER BY DATE, TO_DATE(TIME, 'hh24:mi') ASC;
Try:
SELECT "DATE", "TIME", "COUNT", TO_CHAR("DATE", 'DAY') "DAY",
avg( "COUNT" ) Over (partition by "TIME" ) "AVG/HR",
SUM( "COUNT" ) Over (partition by "DATE" ) "AVG/DAY"
FROM tablexx
ORDER BY 1;
I use SUM( "COUNT" ) instead of AVG( "COUNT" ), since 63 in the first row of your example appears to be sum per day, not an average.
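To confirm that an average per hour plus a sum per date reproduces the desired table, here is a plain-Python sketch of the two partitions. It assumes the nine sample rows from the question and is only a cross-check of the numbers, not a replacement for the window functions.

```python
from collections import defaultdict

# (date, time, count) rows from the question.
rows = [
    ("12/12/2015", "0:00", 8), ("12/12/2015", "1:00", 19), ("12/12/2015", "2:00", 36),
    ("12/13/2015", "0:00", 12), ("12/13/2015", "1:00", 22), ("12/13/2015", "2:00", 30),
    ("12/14/2015", "0:00", 14), ("12/14/2015", "1:00", 26), ("12/14/2015", "2:00", 38),
]

counts_by_time = defaultdict(list)  # PARTITION BY "TIME"
sum_by_date = defaultdict(int)      # PARTITION BY "DATE"
for d, t, c in rows:
    counts_by_time[t].append(c)
    sum_by_date[d] += c

# Every input row is kept, with the partition results attached to it.
report = [
    (d, t, c, round(sum(counts_by_time[t]) / len(counts_by_time[t]), 2), sum_by_date[d])
    for d, t, c in rows
]
for line in report:
    print(line)
```

The per-hour averages come out as 11.33, 22.33 and 34.67, and the per-date sums as 63, 64 and 78, matching the expected output in the question.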

SQL sum over partition for preceding period

I have the following table, which represent Customers for each day:
+----------+-----------+
| Date | Customers |
+----------+-----------+
| 1/1/2014 | 4 |
| 1/2/2014 | 7 |
| 1/3/2014 | 5 |
| 1/4/2014 | 5 |
| 1/5/2014 | 10 |
| 2/1/2014 | 7 |
| 2/2/2014 | 4 |
| 2/3/2014 | 1 |
| 2/4/2014 | 5 |
+----------+-----------+
I would like to add 2 additional columns:
Sum of the customers for the current month
Sum of the customers for the preceding month
Here's the desired outcome:
+----------+-----------+----------------------+------------------------+
| Date | Customers | Sum_of_Current_month | Sum_of_Preceding_month |
+----------+-----------+----------------------+------------------------+
| 1/1/2014 | 4 | 31 | 0 |
| 1/2/2014 | 7 | 31 | 0 |
| 1/3/2014 | 5 | 31 | 0 |
| 1/4/2014 | 5 | 31 | 0 |
| 1/5/2014 | 10 | 31 | 0 |
| 2/1/2014 | 7 | 17 | 31 |
| 2/2/2014 | 4 | 17 | 31 |
| 2/3/2014 | 1 | 17 | 31 |
| 2/4/2014 | 5 | 17 | 31 |
+----------+-----------+----------------------+------------------------+
I have managed to calculate the 3rd column by a simple sum over partition function:
Select
Date,
Customers,
Sum(Customers) over (Partition by (Month(Date)||year(Date)) Order by 1) as Sum_of_Current_month
From table
However, I can't find a way to calculate the Sum_of_preceding_month column.
Appreciate your support.
Asaf
The previous month is a bit tricky. What's your Teradata release? TD14.10 supports LAST_VALUE:
SELECT
dt,
customers,
Sum_of_Current_month,
-- return the previous sum
COALESCE(LAST_VALUE(x ignore NULLS)
OVER (ORDER BY dt
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
,0) AS Sum_of_Preceding_month
FROM
(
SELECT
dt,
Customers,
SUM(Customers) OVER (PARTITION BY TRUNC(dt,'mon')) AS Sum_of_Current_month,
CASE -- keep the number only for the last day in month
WHEN ROW_NUMBER()
OVER (PARTITION BY TRUNC(dt,'mon')
ORDER BY dt)
= COUNT(*)
OVER (PARTITION BY TRUNC(dt,'mon'))
THEN Sum_of_Current_month
END AS x
FROM tab
) AS dt
I think this might be easier by using lag() and an aggregation sub-query. The ANSI Standard syntax is:
Select t.*, tt.sumCustomers, tt.prev_sumCustomers
From table t join
(select extract(year from date) as yyyy, extract(month from date) as mm,
sum(Customers) as sumCustomers,
lag(sum(Customers)) over (order by extract(year from date), extract(month from date)
) as prev_sumCustomers
from table t
group by extract(year from date), extract(month from date)
) tt
on extract(year from date) = tt.yyyy and extract(month from date) = tt.mm;
In Teradata, this would be written as:
Select t.*, tt.sumCustomers, tt.prev_sumCustomers
From table t join
(select extract(year from date) as yyyy, extract(month from date) as mm,
sum(Customers) as sumCustomers,
min(sum(Customers)) over (order by extract(year from date), extract(month from date)
rows between 1 preceding and 1 preceding
) as prev_sumCustomers
from table t
group by extract(year from date), extract(month from date)
) tt
on extract(year from date) = tt.yyyy and extract(month from date) = tt.mm;
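The aggregate-then-lag idea can be illustrated with a small plain-Python sketch. It assumes the sample rows from the question, read as month/day/year; the variable names are made up. Missing previous sums are mapped to 0 here to match the desired output, whereas lag() on its own would return NULL for the first month.

```python
from collections import defaultdict
from datetime import date

# (date, customers) rows from the question.
rows = [
    (date(2014, 1, 1), 4), (date(2014, 1, 2), 7), (date(2014, 1, 3), 5),
    (date(2014, 1, 4), 5), (date(2014, 1, 5), 10),
    (date(2014, 2, 1), 7), (date(2014, 2, 2), 4), (date(2014, 2, 3), 1),
    (date(2014, 2, 4), 5),
]

# Step 1: aggregate per (year, month), like the GROUP BY sub-query.
monthly = defaultdict(int)
for d, customers in rows:
    monthly[(d.year, d.month)] += customers

# Step 2: lag over the ordered months; the first month has no predecessor.
keys = sorted(monthly)
previous = {k: (monthly[keys[i - 1]] if i > 0 else None) for i, k in enumerate(keys)}

# Step 3: join the monthly figures back to every detail row.
result = []
for d, customers in rows:
    key = (d.year, d.month)
    prev_sum = 0 if previous[key] is None else previous[key]
    result.append((d, customers, monthly[key], prev_sum))

for line in result:
    print(line)
```

Note that this lag looks at the previous month that has data, which is not necessarily the previous calendar month if a month is missing from the table.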
Try this:
SELECT
[Date],
[Customers],
(SELECT SUM(customers) FROM table WHERE MONTH(dte) = MONTH(tbl.dte)),
ISNULL((SELECT SUM(customers) FROM table WHERE MONTH(dte) = MONTH(DATEADD(MONTH, -1, tbl.dte))), 0)
FROM table tbl