postgresql create table of days per month - sql

I'm trying to write a script which returns a list of months with the number of days in the month. It references this table
CREATE TABLE generic.time_series_only (measurementdatetime TIMESTAMP WITHOUT TIME ZONE NOT NULL)
which is just a chronological time series. (It is very useful when you join tables of data that have gaps in different places but want an unbroken time series as output. Maybe there's a smarter way to do that, but I haven't found it yet.)
SELECT date_part('year'::text, time_series_only.measurementdatetime) AS
measyear,
date_part('month'::text, time_series_only.measurementdatetime) AS
measmonth,
date_trunc('month'::text, time_series_only.measurementdatetime) +
'1 mon'::interval - date_trunc('month'::text,
time_series_only.measurementdatetime) AS days_in_month
FROM generic.time_series_only
GROUP BY date_part('year'::text, time_series_only.measurementdatetime),
date_part('month'::text, time_series_only.measurementdatetime)
ORDER BY date_part('year'::text, time_series_only.measurementdatetime),
date_part('month'::text, time_series_only.measurementdatetime);
But I get this error:
ERROR: column "time_series_only.measurementdatetime" must appear in the GROUP BY clause or be used in an aggregate function
I can't put this column in the GROUP BY clause because then I'd get a result for every single entry in the time_series_only table, and I can't figure a way to get the same result using an aggregate function? Any suggestions very welcome :-)

Why not use generate_series? For example:
vao=# with pre as (select generate_series('2016-01-01','2017-03-31','1 day'::interval) g) select distinct
extract('year' from g), extract('month' from g), count(1) over (partition by date_trunc('month',g)) from pre order by 1,2;
date_part | date_part | count
-----------+-----------+-------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
2017 | 1 | 31
2017 | 2 | 28
2017 | 3 | 31
(15 rows)

Use DISTINCT ON over the pair (year, month). You can replace the time_series_only table with the function generate_series(), e.g.:
select distinct on (date_part('year', d), date_part('month', d))
date_part('year', d) as year,
date_part('month', d) as month,
date_part('day', d) as days_in_month
from
generate_series('2016-01-01'::date, '2016-12-31'::date, '1d'::interval) d
order by 1, 2, 3 desc;
year | month | days_in_month
------+-------+---------------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
(12 rows)

This one has better performance since it generates only the last day for each month and consequently does not need aggregation:
select
date_part('year', d) as year,
date_part('month', d) as month,
date_part('day', d) as days_in_month
from
generate_series('2016-01-01'::date, '2016-12-01', '1 month') gs(gsd)
cross join lateral
(select gsd + interval '1 month - 1 day') d(d)
order by 1, 2;
year | month | days_in_month
------+-------+---------------
2016 | 1 | 31
2016 | 2 | 29
2016 | 3 | 31
2016 | 4 | 30
2016 | 5 | 31
2016 | 6 | 30
2016 | 7 | 31
2016 | 8 | 31
2016 | 9 | 30
2016 | 10 | 31
2016 | 11 | 30
2016 | 12 | 31
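
As a sanity check on the outputs above, the same numbers can be reproduced outside the database. This is only a minimal Python sketch for cross-checking, not part of any of the answers; the function name days_per_month is made up for illustration.

```python
import calendar


def days_per_month(year):
    """Return {month: number_of_days} for the given calendar year."""
    # calendar.monthrange returns (weekday_of_first_day, number_of_days)
    return {m: calendar.monthrange(year, m)[1] for m in range(1, 13)}


if __name__ == "__main__":
    for month, days in days_per_month(2016).items():
        print(2016, month, days)
```

February comes out as 29 for 2016 and 28 for 2017, matching the query results.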

Another variation, using a CTE for a bit more readability, IMHO (this example generates the months and day counts for the next three full months following the calendar month of current_date):
WITH series AS (
SELECT generate_series (
date_trunc ('month', date_trunc('day', now()) + interval '1 month'),
date_trunc('day', now() + interval '4 months'), '1d'::interval
) AS day ) SELECT DISTINCT ON (date_part('year', series.day), date_part('month', series.day))
date_part('year', series.day) as year,
date_part('month', series.day) as month,
date_part('day', series.day) as days_in_month
FROM series
ORDER BY 1, 2, 3 desc LIMIT 3;
year | month | days_in_month
------+-------+---------------
2021 | 1 | 31
2021 | 2 | 28
2021 | 3 | 31

Related

Last value grouped by month for reporting monthly progress

Hi I have a table that looks like the following
grouping_column | value | date_modified
1 | 5 | 2020-10-15
1 | 10 | 2020-10-20
2 | 3 | 2020-10-20
1 | 11 | 2020-11-30
1 | 11 | 2020-12-10
1 | 5 | 2020-12-15
How could I make a query that returns the following results
grouping_column | last_value_of_month | month
1 | 10 | OCT 2020
1 | 11 | NOV 2020
1 | 5 | DEC 2020
1 | 5 | JAN 2021
2 | 3 | OCT 2020
2 | 3 | NOV 2020
2 | 3 | DEC 2020
2 | 3 | JAN 2021
In other words it should return the last value of the group each month, from the first entry until the current month. I could work it out if you don't fill the missing months, but I don't know how to work that out.
NOTE: this question was asked in January 2021, just for context.
First, generate all the months based on the oldest date in the table:
with months as (
select ddate + interval '1 month' as end_date,
to_char(ddate, 'MON YYYY') as month
from generate_series(
date_trunc(
'month',
(select min(date_modified) from table1)
),
now(),
interval '1 month'
) as gs(ddate)
)
Join that back to your data table, and use distinct on to limit the result to one record per (grouping_column, month):
select distinct on (t.grouping_column, m.end_date)
t.grouping_column, t.value as last_value_of_month, m.month
from months m
join table1 t
on t.date_modified < m.end_date
order by t.grouping_column, m.end_date, t.date_modified desc;
Result:
grouping_column | last_value_of_month | month
--------------: | ------------------: | :-------
1 | 10 | OCT 2020
1 | 11 | NOV 2020
1 | 5 | DEC 2020
1 | 5 | JAN 2021
2 | 3 | OCT 2020
2 | 3 | NOV 2020
2 | 3 | DEC 2020
2 | 3 | JAN 2021
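
The join condition (every row dated before the end of the month) plus DISTINCT ON (keep the latest such row) is what fills the missing months. A rough Python sketch of the same logic, using the sample rows from the question; the names next_month and last_value_per_month are made up for illustration, and "today" is assumed to be a date in January 2021:

```python
from datetime import date

# (grouping_column, value, date_modified), copied from the question
rows = [
    (1, 5, date(2020, 10, 15)),
    (1, 10, date(2020, 10, 20)),
    (2, 3, date(2020, 10, 20)),
    (1, 11, date(2020, 11, 30)),
    (1, 11, date(2020, 12, 10)),
    (1, 5, date(2020, 12, 15)),
]


def next_month(d):
    """First day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def last_value_per_month(rows, today):
    """For each group and month, take the latest row dated before month end."""
    first = min(r[2] for r in rows).replace(day=1)
    result = []
    for group in sorted({r[0] for r in rows}):
        start = first
        while start <= today:
            end = next_month(start)
            # same as: t.date_modified < m.end_date
            candidates = [r for r in rows if r[0] == group and r[2] < end]
            if candidates:
                # same as: order by date_modified desc, keep the first row
                latest = max(candidates, key=lambda r: r[2])
                result.append((group, latest[1], (start.year, start.month)))
            start = end
    return result


if __name__ == "__main__":
    for line in last_value_per_month(rows, date(2021, 1, 15)):
        print(line)
```

Months are returned as (year, month) pairs rather than formatted strings, to keep the sketch independent of locale settings.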

Rolling Average SQL

Hi, I have a dataset with Year, Month and Output columns, with values as follows:
Year | Month | Output
2015 | 1 | 12
2015 | 2 | 24
2015 | 3 | 2
2015 | 4 | 3
2015 | 5 | 7
2015 | 6 | 3
2015 | 7 | 7
2015 | 8 | 6
2015 | 9 | 7
2015 | 10 | 8
2015 | 11 | 3
2015 | 12 | 6
2016 | 1 | 3
2016 | 2 | 6
2016 | 3 | 8
2016 | 4 | 9
2016 | 5 | 4
......... and so on...
I want to add a new column in the dataset as Rolling_Average
Rolling_Average = (sum of Output over the previous 12 months) / (Output of this month)
For example:
Rolling_Average (for 2015-7) = (output (2015-01) + output (2015-02) + output (2015-03) + output (2015-04) + output (2015-05) + output (2015-06)) / output (2015-07)
I tried a couple of queries I found online, but they didn't work for me. Can someone please help me?
Output Required is as follows:
Year | Month | Output | Rolling Average
2015 | 1 | 12 | 12
2015 | 2 | 24 | 0.5
2015 | 3 | 2 | 18
2015 | 4 | 3 | 38/3
2015 | 5 | 7 | 45/7
2015 | 6 | 3 | 48/3
2015 | 7 | 7 | 55/7
2015 | 8 | 6 | 61/6
2015 | 9 | 7 | 68/7
2015 | 10 | 8 | 74/8
2015 | 11 | 3 | 77/3
2015 | 12 | 6 | 83/6
2016 | 1 | 3 | 86/3
2016 | 2 | 6 | 92/6
2016 | 3 | 8 | 100/8
2016 | 4 | 9 | 109/9
2016 | 5 | 4 | 113/4
The Query I tried is :
SELECT DISTINCT
//CALCULATIONS
Year,
Month,
Output,
(sum(CAST(Output) AS DOUBLE)))
over(order by year,month rows between 12 preceding and 1 preceding )
as Rolling_Average
from my_table
group by Year,Month
order by Year,Month
It gives me error :
Syntax error: OVER keyword must follow a function call
Also I have tried other things
Can someone please help me with an easy way to do this? I am using SQL Plx, which is similar to SQL.
Thank You!
You might have misplaced some parentheses:
(sum( CAST(Output) AS DOUBLE ))) over (order by year, month rows between 12 preceding and 1 preceding ) as Rolling_Average
Versus:
SUM( CAST(Output AS DOUBLE) ) OVER (order by year, month rows between 12 preceding and 1 preceding) as Rolling_Average
You can also ROUND that result.
And those records already seem to be unique by Year and Month.
So there's not really a need to group on those.
SELECT
t.Year, t.Month, t.Output,
ROUND(SUM(CAST(t.Output AS INT)) OVER (ORDER BY t.Year, t.Month ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING)*1.0 / CAST(t.Output AS INT), 1) as Rolling_Average
FROM my_table t
ORDER BY t.Year, t.Month;
And if the window functions aren't supported, then this will work:
SELECT
t1.Year, t1.Month, t1.Output,
ROUND(SUM(CAST(t2.Output AS INT))*1.0 / CAST(t1.Output AS INT), 1) as Rolling_Average
FROM my_table t1
LEFT JOIN my_table t2 ON ((t2.Year = t1.Year AND t2.Month < t1.Month) OR
(t2.Year = t1.Year - 1 AND t2.Month >= t1.Month))
GROUP BY t1.Year, t1.Month, t1.Output
ORDER BY t1.Year, t1.Month;
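
To see what the frame ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING computes, here is a small Python sketch over the sample Output values from the question. It is only an illustration of the window arithmetic; the function name rolling_ratio is made up, and returning None when the current value is 0 is an assumption of this sketch (the SQL above would raise a division error instead).

```python
def rolling_ratio(outputs, window=12):
    """Sum of up to `window` preceding values divided by the current value.

    The first row has no preceding rows, so its result is None
    (the window SUM is NULL in SQL for an empty frame).
    """
    result = []
    for i, current in enumerate(outputs):
        preceding = outputs[max(0, i - window):i]
        if not preceding or current == 0:
            result.append(None)  # empty frame, or guard against division by 0
        else:
            result.append(sum(preceding) / current)
    return result


# Output column for 2015-01 .. 2016-05, copied from the question
outputs = [12, 24, 2, 3, 7, 3, 7, 6, 7, 8, 3, 6, 3, 6, 8, 9, 4]

if __name__ == "__main__":
    for value in rolling_ratio(outputs):
        print(None if value is None else round(value, 1))
```

From the 14th row onward the frame no longer reaches back to the first row, so only the 12 most recent preceding months are summed.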
Try this (if you use SQL Server):
Select *
from tableName T
outer apply (
select sum(output) Rolling_Average
from tableName T_in
where T_in.year = T.year and T_in.Month <= T.Month
)x

Querying all past and future round birthdays

I got the birthdates of users in a table and want to display a list of round birthdays for the next n years (starting from an arbitrary date x) which looks like this:
+----------------------------------------------------------------------------------------+
| Name | id | birthdate | current_age | birthday | year | month | day | age_at_date |
+----------------------------------------------------------------------------------------+
| User 1 | 1 | 1958-01-23 | 59 | 2013-01-23 | 2013 | 1 | 23 | 55 |
| User 2 | 2 | 1988-01-29 | 29 | 2013-01-29 | 2013 | 1 | 29 | 25 |
| User 3 | 3 | 1963-02-12 | 54 | 2013-02-12 | 2013 | 2 | 12 | 50 |
| User 1 | 1 | 1958-01-23 | 59 | 2018-01-23 | 2018 | 1 | 23 | 60 |
| User 2 | 2 | 1988-01-29 | 29 | 2018-01-29 | 2018 | 1 | 29 | 30 |
| User 3 | 3 | 1963-02-12 | 54 | 2018-02-12 | 2018 | 2 | 12 | 55 |
| User 1 | 1 | 1958-01-23 | 59 | 2023-01-23 | 2023 | 1 | 23 | 65 |
| User 2 | 2 | 1988-01-29 | 29 | 2023-01-29 | 2023 | 1 | 29 | 35 |
| User 3 | 3 | 1963-02-12 | 54 | 2023-02-12 | 2023 | 2 | 12 | 60 |
+----------------------------------------------------------------------------------------+
As you can see, I want the list to "wrap around": it should show not only the next upcoming round birthday, which is easy, but also historical and far-future data.
The core idea of my current approach is the following: I use generate_series to generate all dates from 1900 to 2100 and join them to the users by matching the day and month of the birthdate. Based on that, I calculate the age at each date and finally select only those birthdays that are round (divisible by 5) and yield a nonnegative age.
WITH
test_users(id, name, birthdate) AS (
VALUES
(1, 'User 1', '23-01-1958' :: DATE),
(2, 'User 2', '29-01-1988'),
(3, 'User 3', '12-02-1963')
),
dates AS (
SELECT
s AS date,
date_part('year', s) AS year,
date_part('month', s) AS month,
date_part('day', s) AS day
FROM generate_series('01-01-1900' :: TIMESTAMP, '01-01-2100' :: TIMESTAMP, '1 days' :: INTERVAL) AS s
),
birthday_data AS (
SELECT
id AS member_id,
test_users.birthdate AS birthdate,
(date_part('year', age((test_users.birthdate)))) :: INT AS current_age,
date :: DATE AS birthday,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
ROUND(extract(EPOCH FROM (dates.date - birthdate)) / (60 * 60 * 24 * 365)) :: INT AS age_at_date
FROM test_users, dates
WHERE
dates.day = date_part('day', birthdate) AND
dates.month = date_part('month', birthdate) AND
dates.year >= date_part('year', birthdate)
)
SELECT
test_users.name,
bd.*
FROM test_users
LEFT JOIN birthday_data bd ON bd.member_id = test_users.id
WHERE
bd.age_at_date % 5 = 0 AND
bd.birthday BETWEEN NOW() - INTERVAL '5' YEAR AND NOW() + INTERVAL '10' YEAR
ORDER BY bd.birthday;
My current approach seems very inefficient and rather complicated: it takes >100ms. Does anybody have an idea for a more compact and performant query? I am using PostgreSQL 9.5.3. Thank you!
Maybe try to join the generate series:
create table bday(id serial, name text, dob date);
insert into bday (name, dob) values ('a', '08-21-1972'::date);
insert into bday (name, dob) values ('b', '03-20-1974'::date);
select * from bday ,
lateral( select generate_series( (1950-y)/5 , (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
For each entry, this will generate years between 1950 and 2010.
You can add a WHERE clause to exclude people born after 2010 (they can't have a birthday in range)
or to exclude people born before 1850 (they are unlikely...)
--
Edit (after your edit):
So your generate_series creates 365 or 366 rows per year. Over 100 years that is more than 30,000 rows, and they get joined to each user (3 users => roughly 100,000 rows).
My query generates rows only for the years needed. Over 100 years that is 20 rows per user.
Dividing by 5 ensures that the start year is a round birthday.
(1950-y)/5 calculates how many round birthdays there were before 1950.
A person born in 1941 needs to skip 1941 and 1946, but has a round birthday in 1951. So that is the difference (9 years) divided by 5 (integer division gives 1), plus 1 to account for the 0th birthday.
If the person was born after 1950 the number is negative, and greatest(-1,...)+1 gives 0, starting at the actual birth year.
But actually it should be:
select * from bday ,
lateral( select generate_series( greatest(-1,(1950-y)/5)+1, (2010-y)/5)*5 + y as year
from (select date_part('year',dob)::integer as y) as t2
) as t1;
(you could use greatest(0,...)+1 instead if you want to start at age 5)
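
The start and end indexes are easier to follow with concrete birth years. Below is a hedged Python sketch of the same arithmetic as the corrected query; the names trunc_div and round_birthday_years are made up here. Python's // operator floors, while PostgreSQL integer division truncates toward zero, so the sketch uses a small helper to mimic that.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero (PostgreSQL integer '/')."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def round_birthday_years(birth_year, lo=1950, hi=2010, step=5):
    """Years produced by generate_series(greatest(-1,(lo-y)/5)+1, (hi-y)/5)*5 + y."""
    first = max(-1, trunc_div(lo - birth_year, step)) + 1
    last = trunc_div(hi - birth_year, step)
    # range() is empty when first > last, like generate_series
    return [birth_year + k * step for k in range(first, last + 1)]


if __name__ == "__main__":
    for y in (1941, 1972, 1974, 2015):
        print(y, round_birthday_years(y))
```

For 1941 the list starts at 1951, and for 1972 it starts at the birth year itself, as described above. One thing to check against the intended range: for a birth year that lies an exact multiple of 5 before the lower bound (1945, say), this formula starts at 1955 rather than at 1950.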

Postgres - aggregate minutes to hour

I need some assistance with PostgreSQL. I am trying to group some records (at the 5-, 10-, 15-, 20-minute marks, etc.) into 60-minute intervals.
What I need is to GROUP BY the hour and AVG the minute values that fall within each hour.
SELECT id, value,
extract(year from GDDP.timestamp) as YEAR,
extract(month from GDDP.timestamp) as MONTH,
extract(day from GDDP.timestamp) as DAY,
extract(hour from GDDP.timestamp) as "HOUR",
extract(minute from GDDP.timestamp) as MINUTE
FROM GDDP
WHERE value > 0 AND
GDDP.timestamp BETWEEN '2016-07-01 00:00:00' and '2016-12-31 23:55:00'
ORDER BY YEAR, MONTH, DAY, HOUR
Currently, this is the result of the query above:
id | value | YEAR | MONTH | DAY | HOUR | MINUTE
-------------------------------------------------
1 | 100 | 2016 | 07 | 01 | 1 | 05
2 | 200 | 2016 | 07 | 01 | 1 | 10
3 | 100 | 2016 | 07 | 01 | 1 | 15
4 | 300 | 2016 | 07 | 01 | 1 | 20
5 | 200 | 2016 | 07 | 01 | 1 | 25
6 | 500 | 2016 | 07 | 01 | 1 | 30
But, I would like the result to look like this:
id | value | YEAR | MONTH | DAY | HOUR
---------------------------------------
1 | 233.3 | 2016 | 07 | 01 | 1
Thanks in advance for any assistance!
Use the aggregate function avg() and group by year, month, day, hour:
SELECT
min(id) as id,
avg(value) as value,
extract(year from gddp.timestamp) as year,
extract(month from gddp.timestamp) as month,
extract(day from gddp.timestamp) as day,
extract(hour from gddp.timestamp) as hour
FROM gddp
WHERE value > 0
AND gddp.timestamp BETWEEN '2016-07-01 01:00:00' AND '2016-12-31 01:23:00'
GROUP BY year, month, day, hour
ORDER BY year, month, day, hour;
id | value | year | month | day | hour
----+----------------------+------+-------+-----+------
1 | 233.3333333333333333 | 2016 | 7 | 1 | 1
(1 row)
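
The grouping can be checked by hand with the six sample rows from the question. A minimal Python sketch of the same idea (truncate each timestamp to the hour, then average per bucket); the function name hourly_average is made up for illustration:

```python
from collections import defaultdict
from datetime import datetime

# (timestamp, value) pairs copied from the sample output in the question
readings = [
    (datetime(2016, 7, 1, 1, 5), 100),
    (datetime(2016, 7, 1, 1, 10), 200),
    (datetime(2016, 7, 1, 1, 15), 100),
    (datetime(2016, 7, 1, 1, 20), 300),
    (datetime(2016, 7, 1, 1, 25), 200),
    (datetime(2016, 7, 1, 1, 30), 500),
]


def hourly_average(readings):
    """Average the positive values per hour (same filter as WHERE value > 0)."""
    buckets = defaultdict(list)
    for ts, value in readings:
        if value > 0:
            # drop minutes and below, so all rows in an hour share one key
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


if __name__ == "__main__":
    for hour, avg in hourly_average(readings).items():
        print(hour, round(avg, 1))
```

The six values sum to 1400, so the single hourly bucket averages to about 233.3, the same as the query result.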

weekly aggregate with CTE not behaving as expected

I have this USERS table with users that can be of two different types (A and B). I need to show a report with the aggregate per type for each week. The query I have so far works well except some weeks are not grouping properly. In the example below, the week starting Jan 28th should have one line, not two.
Week Starts |Week| Type A | Type B
------------+----+--------+------
2013-02-04 | 14 | 2 | 26
2013-01-28 | 13 | 5 | 191
2013-01-28 | 13 | 0 | 24
2013-01-21 | 12 | 1 | 134
2013-01-21 | 12 | 0 | 20
2013-01-14 | 11 | 1 | 143
2013-01-14 | 11 | 0 | 2
2013-01-07 | 10 | 0 | 233
2013-01-07 | 10 | 0 | 23
2012-12-31 | 9 | 0 | 12
2012-12-31 | 9 | 4 | 164
2012-12-31 | 9 | 0 | 20
SQL
;with cte as
(
select DATEADD(m,-3,GETDATE()) firstday, DATEADD(m,-3,GETDATE()) + 6 - DATEDIFF(day, 0, DATEADD(m,-3,GETDATE())) %7 lastday, 1 week
union all
select lastday + 1, case when GETDATE() < lastday + 7 then GETDATE() else lastday + 7 end, week + 1
from cte
where lastday < GETDATE()
)
SELECT
cast(firstday as date) 'Week Starts',
cte.week as 'Week',
Sum(CASE WHEN USR_TYPE = 'A' THEN 1 ELSE 0 END) As 'Type A',
Sum(CASE WHEN USR_TYPE = 'B' THEN 1 ELSE 0 END) As 'Type B'
FROM cte left join USERS
ON cte.firstday <= USERS.CREATED
AND cte.lastday > USERS.CREATED
GROUP BY cte.week, cte.firstday, cte.lastday, DATEPART(YEAR,USERS.CREATED), DATEPART(wk,USERS.CREATED)
ORDER BY week desc
What am I doing wrong?
Without seeing any data from your users table I am going to take a guess.
The list of dates you are generating in the CTE includes the time.
You might need to cast() your firstday and lastday values as either a date or generate the list with no time.
Sample from your CTE and the new dates cast:
| CASTFIRSTDAY | CASTLASTDAY | WEEK | FIRSTDAY | LASTDAY |
---------------------------------------------------------------------------------------------------------
| 2012-11-05 | 2012-11-11 | 1 | November, 05 2012 20:08:10+0000 | November, 11 2012 20:08:10+0000 |
| 2012-11-12 | 2012-11-18 | 2 | November, 12 2012 20:08:10+0000 | November, 18 2012 20:08:10+0000 |
| 2012-11-19 | 2012-11-25 | 3 | November, 19 2012 20:08:10+0000 | November, 25 2012 20:08:10+0000 |
| 2012-11-26 | 2012-12-02 | 4 | November, 26 2012 20:08:10+0000 | December, 02 2012 20:08:10+0000 |
| 2012-12-03 | 2012-12-09 | 5 | December, 03 2012 20:08:10+0000 | December, 09 2012 20:08:10+0000 |
| 2012-12-10 | 2012-12-16 | 6 | December, 10 2012 20:08:10+0000 | December, 16 2012 20:08:10+0000 |
You might want to edit your CTE to return date-only values:
;with cte as
(
select
cast(DATEADD(m,-3,GETDATE()) as date) firstday,
cast(DATEADD(m,-3,GETDATE()) + 6 - DATEDIFF(day, 0, DATEADD(m,-3,GETDATE())) %7 as DATE) lastday,
1 week
union all
select
cast(DATEADD(DAY, 1, lastday) as date),
case
when cast(GETDATE() as date) < cast(DATEADD(DAY, 7, lastday) as date)
then cast(GETDATE() as date)
else cast(DATEADD(DAY, 7, lastday) as date)
end,
week + 1
from cte
where cast(lastday as date) < cast(GETDATE() as date)
)
select *
from cte
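
To illustrate why the time component matters, here is a small Python sketch using the first bucket from the sample above (November 05 20:08:10 to November 11 20:08:10) and the same predicate as the join. The created timestamp is a hypothetical user record, and the names in_bucket and midnight are made up for this sketch.

```python
from datetime import datetime


def in_bucket(first, last, created):
    """Same predicate as the join: firstday <= CREATED AND lastday > CREATED."""
    return first <= created < last


def midnight(dt):
    """Drop the time of day, similar in effect to casting to date."""
    return dt.replace(hour=0, minute=0, second=0, microsecond=0)


# Bucket boundaries as generated by the original CTE (they carry GETDATE()'s time)
first = datetime(2012, 11, 5, 20, 8, 10)
last = datetime(2012, 11, 11, 20, 8, 10)

# Hypothetical user created on the morning of the bucket's first day
created = datetime(2012, 11, 5, 9, 0)

if __name__ == "__main__":
    print(in_bucket(first, last, created))                      # boundaries with time
    print(in_bucket(midnight(first), midnight(last), created))  # date-only boundaries
```

With the time of day left in, the morning record falls before the bucket starts; with date-only boundaries it lands in the bucket for that week.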