Postgresql percentage of records created in the specific time interval - sql

I have a table that contains the field created_at. I want to calculate the percentage of records from the total number that was created in the specified time interval. Let's say that I have the following structure:
| name | created_at |
----------------------------------------
| first | "2019-04-29 09:30:07.441717" |
| second | "2019-04-30 09:30:07.441717" |
| third | "2019-04-28 09:30:07.441717" |
| fourth | "2019-04-27 09:30:07.441717" |
So I want to calculate what is the percentage of records created in the time interval between 2019-04-28 00:00:00 and 2019-04-30 00:00:00. In this time interval, I have two records first and third, so the result should be 50%. I came across the OVER() clause, but either I don't get how to use it, or it's not what I need.

You can use CASE
select 100.0 * count(case -- 100.0 avoids integer division truncating the result
when created_at between '2019-04-28 00:00:00' and '2019-04-30 00:00:00'
then 1
end) / count(*)
from your_table

I would just use avg():
select avg( (created_at between '2019-04-28' and '2019-04-30')::int )
from your_table
You can multiply by 100 if you want a percentage rather than a value between 0 and 1.
I strongly discourage you from using between with date/time values. The time components may not behave the way you want. You used "between" in your question, so I left it in. However, I would suggest:
select avg( (created_at >= '2019-04-28' and
created_at < '2019-04-30'
)::int
)
from your_table;
It is not clear if you want < '2019-04-30', <= '2019-04-30' or < '2019-05-01'.
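As a quick check of the half-open interval approach, here is a minimal sketch that loads the four sample rows from the question and computes the percentage. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, so timestamps are stored as ISO-8601 text and the ::int cast is unnecessary because SQLite comparisons already evaluate to 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("first", "2019-04-29 09:30:07.441717"),
        ("second", "2019-04-30 09:30:07.441717"),
        ("third", "2019-04-28 09:30:07.441717"),
        ("fourth", "2019-04-27 09:30:07.441717"),
    ],
)

# Each comparison yields 1 (inside the interval) or 0 (outside), so AVG is the
# fraction of rows in the half-open interval [start, end).
query = """
    SELECT 100.0 * AVG(created_at >= ? AND created_at < ?)
    FROM your_table
"""
(percentage,) = conn.execute(query, ("2019-04-28", "2019-04-30")).fetchone()
print(percentage)
```

Only "first" and "third" fall inside the interval, so two of the four rows are counted.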

Related

Counts for time range per day

I have a table something like this
create table widgets
(
id integer primary key, -- column type assumed; the original omitted it
created_at timestamp,
-- other fields
)
Now I want a query that shows the count of widgets with created_at between multiple time ranges for each day. For example, the count of widgets with created_at between 00:00:00 and 11:59:59 and the count between 12:00:00 and 23:59:59. The output would look something like this:
date | morning widgets (before noon) | evening widgets (after noon) |
---------------|-------------------------------|------------------------------|
2022-05-01 | ## | ## |
2022-05-02 | ## | ## |
2022-05-03 | ## | ## |
2022-05-04 | ## | ## |
... etc.
So far, I figured out I can get counts per day:
select created_at::date as created_at_date, count(*) as total
from widgets
where created_at::date >= '2022-05-01' -- where clause for illustration purposes only and not critical to the central question here
group by created_at::date
I'm learning about windowing functions, specifically partition by. I think this will help me get what I want, but not sure. How do I do this?
I'd prefer a "standard SQL" solution. If necessary, I'm on postgres and can use anything specific to its flavor of SQL.
If I understand correctly, you can use conditional aggregation for this rather than a window function.
morning widgets (before noon): between 00:00:00 and 11:59:59
evening widgets (after noon): between 12:00:00 and 23:59:59
Put each condition inside the aggregate function with a CASE WHEN expression:
SELECT created_at::date AS created_at_date,
-- strict < on the upper bound so a row at exactly 12:00:00 is counted once, as evening
COUNT(CASE WHEN created_at >= created_at::date AND created_at < created_at::date + INTERVAL '12 HOUR' THEN 1 END) AS morning_widgets,
COUNT(CASE WHEN created_at >= created_at::date + INTERVAL '12 HOUR' AND created_at < created_at::date + INTERVAL '1 DAY' THEN 1 END) AS evening_widgets
FROM widgets w
GROUP BY created_at::date
ORDER BY created_at::date
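The conditional aggregation above can be tried end to end with a small sketch. It assumes SQLite via Python's sqlite3 module in place of PostgreSQL, so date() and time() replace the ::date cast and interval arithmetic, and the five sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO widgets (created_at) VALUES (?)",
    [
        ("2022-05-01 08:15:00",),
        ("2022-05-01 12:00:00",),  # exactly noon: counted as evening only
        ("2022-05-01 18:30:00",),
        ("2022-05-02 09:00:00",),
        ("2022-05-02 11:59:59",),
    ],
)

# COUNT ignores NULLs, so a CASE without ELSE counts only the matching rows.
query = """
    SELECT date(created_at) AS created_at_date,
           COUNT(CASE WHEN time(created_at) <  '12:00:00' THEN 1 END) AS morning,
           COUNT(CASE WHEN time(created_at) >= '12:00:00' THEN 1 END) AS evening
    FROM widgets
    GROUP BY date(created_at)
    ORDER BY date(created_at)
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the two conditions are complementary, every row lands in exactly one of the two columns.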

Get max value of binned time-interval

I have a 'requests' table with a 'time_request' column which has a timestamp for each request. I want to know the maximum number of requests that I had in a single minute.
So I'm guessing I need to somehow 'group by' a one-minute time interval, and then do some sort of MAX(COUNT(request_id))? Nested aggregations are not allowed, though.
Will appreciate any help.
Table example:
request_id | time_request
------------------+---------------------
ab1 | 2021-03-29 16:20:05
ab2 | 2021-03-29 16:20:20
bc3 | 2021-03-31 20:34:07
fw3 | 2021-03-31 20:38:53
fe4 | 2021-03-31 20:39:53
Expected result: 2 (There were a maximum of 2 requests in a single minute)
Thanks!
You can use the window function count() with a logical (RANGE) frame of one minute as the window boundary. For each row, it counts all rows whose timestamp falls within the preceding minute, up to and including the current row.
Code for Postgres is below:
with a as (
select
id
, cast(ts as timestamp) as ts
from(values
('ab1', '2021-03-29 16:20:05'),
('ab2', '2021-03-29 16:20:20'),
('bc3', '2021-03-31 20:34:07'),
('fw3', '2021-03-31 20:38:53'),
('fe4', '2021-03-31 20:39:53')
) as t(id, ts)
)
, count_per_interval as (
select
a.*
, count(id) over (
order by ts asc
range between
interval '1' minute preceding
and current row
) as cnt_per_min
from a
)
select max(cnt_per_min)
from count_per_interval
| max |
| --: |
| 2 |
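The answer above uses a sliding one-minute window. If fixed calendar-minute bins are what is wanted (the "group by a one-minute interval" idea from the question), a subquery avoids the nested aggregate. The following is a minimal sketch of that alternative, assuming SQLite via Python's sqlite3 module rather than PostgreSQL, with strftime truncating each timestamp to the minute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (request_id TEXT, time_request TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        ("ab1", "2021-03-29 16:20:05"),
        ("ab2", "2021-03-29 16:20:20"),
        ("bc3", "2021-03-31 20:34:07"),
        ("fw3", "2021-03-31 20:38:53"),
        ("fe4", "2021-03-31 20:39:53"),
    ],
)

# Inner query: one count per calendar minute. Outer query: the largest count.
query = """
    SELECT MAX(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM requests
        GROUP BY strftime('%Y-%m-%d %H:%M', time_request)
    )
"""
(max_per_minute,) = conn.execute(query).fetchone()
print(max_per_minute)
```

The two approaches can disagree: fw3 and fe4 are exactly 60 seconds apart, so the sliding window counts them together, while fixed bins place them in separate minutes (20:38 and 20:39). On this sample the overall maximum is the same either way.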

Postgres query for difference between latest and first record of the day

Postgres data like this:
| id | read_at | value_1 |
| ------|------------------------|---------|
| 16239 | 2021-11-28 16:13:00+00 | 1509 |
| 16238 | 2021-11-28 16:12:00+00 | 1506 |
| 16237 | 2021-11-28 16:11:00+00 | 1505 |
| 16236 | 2021-11-28 16:10:00+00 | 1501 |
| 16235 | 2021-11-28 16:09:00+00 | 1501 |
| ..... | .......................| .... |
| 15266 | 2021-11-28 00:00:00+00 | 1288 |
A value is added every minute and increases over time.
I would like to get the current total for the day and have this in a Grafana stat panel. Above it would be: 221 (1509-1288). Latest record minus first record of today.
SELECT id,read_at,value_1
FROM xyz
ORDER BY id DESC
LIMIT 1;
With this the latest record is given (A).
SELECT id,read_at,value_1
FROM xyz
WHERE read_at = CURRENT_DATE
ORDER BY id DESC
LIMIT 1;
With this the first record of the day is given (B).
Grafana cannot do math on this (A-B). Single query would be best.
Sadly my database knowledge is low and attempts at building queries have not succeeded, and have taken all afternoon now.
Theoretical ideas to solve this:
Subtract the min from the max value where time frame is today.
Using a lag, lag it for the count of records that are recorded today. Subtract lag value from latest value.
Window function.
What is the best way (performance wise) forward and how would such query be written?
Calculate the cumulative total last_value - first_value for each record for the current day using window functions (this is the t subquery) and then pick the latest one.
select current_total, read_at::date as read_at_date
from
(
select last_value(value_1) over w - first_value(value_1) over w as current_total,
read_at
from the_table
where read_at >= current_date and read_at < current_date + 1
window w as (partition by read_at::date order by read_at)
) as t
order by read_at desc limit 1;
However if it is certain that value_1 only "increases over time" then simple grouping will do and that is by far the best way performance wise:
select max(value_1) - min(value_1) as current_total,
read_at::date as read_at_date
from the_table
where read_at >= current_date and read_at < current_date + 1
group by read_at::date;
Please, check if it works.
Since you intend to publish it in Grafana, the query does not impose a period filter.
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/3080
create table g (id int, read_at timestamp, value_1 int);
insert into g
values
(16239, '2021-11-28 16:13:00+00', 1509),
(16238, '2021-11-28 16:12:00+00', 1506),
(16237, '2021-11-28 16:11:00+00', 1505),
(16236, '2021-11-28 16:10:00+00', 1501),
(16235, '2021-11-28 16:09:00+00', 1501),
(15266, '2021-11-28 00:00:00+00', 1288);
select date(read_at), max(value_1) - min(value_1)
from g
group by date(read_at);
Since your data contains the same value at two distinct times (16:09 and 16:10), the readings do not strictly increase over the interval, which leaves open the possibility of a decrease. So do you want the max minus min reading, or the difference between the readings at the min and max times? The following gets the difference between the first and latest reading of the day, as indicated in the title.
with parm(dt) as
( values (date '2021-11-28') )
, first_read (f_read,f_value) as
( select read_at, value_1
from test_tbl
where read_at at time zone 'UTC'=
( select min(read_at at time zone 'UTC')
from test_tbl
join parm
on ((read_at at time zone 'UTC')::date = dt)
)
)
, last_read (l_read, l_value) as
( select read_at,value_1
from test_tbl
where read_at at time zone 'UTC'=
( select max(read_at at time zone 'UTC')
from test_tbl
join parm
on ((read_at at time zone 'UTC')::date = dt)
)
)
select l_read, f_read, l_value, f_value, l_value - f_value as "Day Difference"
from last_read
join first_read on true;
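To compare the two interpretations discussed in these answers (latest minus first reading versus max minus min), here is a minimal sketch on the sample rows from the question. It assumes SQLite via Python's sqlite3 module instead of PostgreSQL; the table name readings is hypothetical, and the +00 offset is dropped from the timestamps to keep SQLite's date() parsing simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "readings" is a hypothetical table name used only for this sketch.
conn.execute("CREATE TABLE readings (id INTEGER, read_at TEXT, value_1 INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        (16239, "2021-11-28 16:13:00", 1509),
        (16238, "2021-11-28 16:12:00", 1506),
        (16237, "2021-11-28 16:11:00", 1505),
        (16236, "2021-11-28 16:10:00", 1501),
        (16235, "2021-11-28 16:09:00", 1501),
        (15266, "2021-11-28 00:00:00", 1288),
    ],
)
params = {"day": "2021-11-28"}

# Latest reading of the day minus the first reading of the day.
first_last_query = """
    SELECT (SELECT value_1 FROM readings
            WHERE date(read_at) = :day ORDER BY read_at DESC LIMIT 1)
         - (SELECT value_1 FROM readings
            WHERE date(read_at) = :day ORDER BY read_at ASC LIMIT 1)
"""
(first_last_diff,) = conn.execute(first_last_query, params).fetchone()

# Largest value minus smallest value; equal to the above only if the
# readings never decrease during the day.
max_min_query = """
    SELECT MAX(value_1) - MIN(value_1)
    FROM readings
    WHERE date(read_at) = :day
"""
(max_min_diff,) = conn.execute(max_min_query, params).fetchone()
print(first_last_diff, max_min_diff)
```

On this sample both queries agree because the values are non-decreasing; they would differ if the counter ever reset or dropped during the day.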

Easy subtraction of year's values

I do have the following database table containing the timestamps in unix format as well as the total yield (summing up) of my solar panels every 5 mins:
| Timestamp | TotalYield |
|------------|------------|
| 1321423500 | 1 |
| 1321423800 | 5 |
| ... | |
| 1573888800 | 44094536 |
Now I would like to calculate how much energy was produced each year. I thought of reading the first and last timestamp using UNION of each year:
SELECT strftime('%d.%m.%Y',datetime(TimeStamp,'unixepoch')), TotalYield FROM PascalsDayData WHERE TimeStamp IN (
SELECT MAX(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
UNION
SELECT MIN(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y',datetime(TimeStamp,'unixepoch'))
)
This works fine, but I need to do some post-processing to subtract each year's first value from its last value. There must be a more elegant way to do this in SQL, right?
Thanks,
Anton
You can aggregate by year and subtract the min and max value:
SELECT MAX(TotalYield) - MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
This assumes that TotalYield does not decrease -- which your question implies.
If you actually want the next year's value, you can use LEAD():
SELECT (LEAD(MIN(TotalYield), 1, MAX(TotalYield)) OVER (ORDER BY MIN(TimeStamp)) -
MIN(TotalYield)
)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
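The difference between the two variants is easier to see on data. Below is a minimal sketch using Python's sqlite3 module; the six sample rows and the epoch helper are made up for illustration, and the LEAD() variant is restructured as an outer query over per-year aggregates (it needs SQLite 3.25 or later for window functions).

```python
import sqlite3
from datetime import datetime, timezone


def epoch(year, month, day):
    """Unix timestamp for midnight UTC on the given date (helper for sample data)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PascalsDayData (TimeStamp INTEGER, TotalYield INTEGER)")
conn.executemany(
    "INSERT INTO PascalsDayData VALUES (?, ?)",
    [  # hypothetical readings: first and last of each year
        (epoch(2019, 1, 1), 100), (epoch(2019, 12, 31), 400),
        (epoch(2020, 1, 1), 410), (epoch(2020, 12, 31), 900),
        (epoch(2021, 1, 1), 905), (epoch(2021, 6, 30), 1200),
    ],
)

# Variant 1: last reading of the year minus first reading of the same year.
within_year = conn.execute("""
    SELECT strftime('%Y', datetime(TimeStamp, 'unixepoch')) AS yr,
           MAX(TotalYield) - MIN(TotalYield)
    FROM PascalsDayData
    GROUP BY yr
    ORDER BY yr
""").fetchall()

# Variant 2: first reading of the NEXT year minus first reading of this year;
# the final year falls back to its own last reading via LEAD's default.
to_next_year = conn.execute("""
    SELECT yr, LEAD(first_yield, 1, last_yield) OVER (ORDER BY yr) - first_yield
    FROM (
        SELECT strftime('%Y', datetime(TimeStamp, 'unixepoch')) AS yr,
               MIN(TotalYield) AS first_yield,
               MAX(TotalYield) AS last_yield
        FROM PascalsDayData
        GROUP BY yr
    )
    ORDER BY yr
""").fetchall()
print(within_year)
print(to_next_year)
```

The second variant also attributes the yield produced between a year's last reading and the next year's first reading, which the first variant drops.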

How to do a sub-select per result entry in postgresql?

Assume I have a table with only two columns: id, maturity. maturity is some date in the future and is representative of until when a specific entry will be available. Thus it's different for different entries but is not necessarily unique. And with time number of entries which have not reached this maturity date changes.
I need to count a number of entries from such a table that were available on a specific date (thus entries that have not reached their maturity). So I basically need to join this two queries:
SELECT generate_series as date FROM generate_series('2015-10-01'::date, now()::date, '1 day');
SELECT COUNT(id) FROM mytable WHERE mytable.maturity > now()::date;
where instead of now()::date I need to put entry from the generated series. I'm sure this has to be simple enough, but I can't quite get around it. I need the resulting solution to remain a query, thus it seems that I can't use for loops.
Sample table entries:
id | maturity
---+-------------------
1 | 2015-10-03
2 | 2015-10-05
3 | 2015-10-11
4 | 2015-10-11
Expected output:
date | count
------------+-------------------
2015-10-01 | 4
2015-10-02 | 4
2015-10-03 | 3
2015-10-04 | 3
2015-10-05 | 2
2015-10-06 | 2
NOTE: This count doesn't constantly decrease, since new entries are added and this count increases.
You have to reference a column of the outer query in the WHERE clause of a subquery (a correlated subquery). This works when the subquery is in the SELECT clause of the outer query:
SELECT generate_series,
(SELECT COUNT(id)
FROM mytable
WHERE mytable.maturity > generate_series)
FROM generate_series('2015-10-01'::date, now()::date, '1 day');
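The correlated subquery can be checked against the expected output from the question. This is a minimal sketch assuming SQLite via Python's sqlite3 module in place of PostgreSQL; since generate_series is not available there by default, a recursive CTE (named days here, a made-up name) produces the date series, and the end date is fixed at 2015-10-06 instead of now().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, maturity TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "2015-10-03"), (2, "2015-10-05"), (3, "2015-10-11"), (4, "2015-10-11")],
)

# The recursive CTE stands in for generate_series; the scalar subquery in the
# SELECT list is evaluated once per generated date (days.d).
query = """
    WITH RECURSIVE days(d) AS (
        SELECT :start
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < :stop
    )
    SELECT d,
           (SELECT COUNT(id)
            FROM mytable
            WHERE mytable.maturity > days.d) AS cnt
    FROM days
    ORDER BY d
"""
rows = conn.execute(query, {"start": "2015-10-01", "stop": "2015-10-06"}).fetchall()
for row in rows:
    print(row)
```

ISO-8601 date strings sort in chronological order, which is why the plain text comparison on maturity behaves like a date comparison here.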
More info: http://www.techonthenet.com/sql_server/subqueries.php
I think you want to group your data by the maturity Date.
Check this:
select maturity,count(*) as count
from your_table group by maturity;