Counts for time range per day - sql

I have a table something like this
create table widgets
(
id integer primary key,
created_at timestamp,
-- other fields
)
Now I want a query that shows the count of widgets with created_at between multiple time ranges for each day. For example, the count of widgets with created_at between 00:00:00 and 11:59:59 and the count between 12:00:00 and 23:59:59. The output would look something like this:
date | morning widgets (before noon) | evening widgets (after noon) |
---------------|-------------------------------|------------------------------|
2022-05-01 | ## | ## |
2022-05-02 | ## | ## |
2022-05-03 | ## | ## |
2022-05-04 | ## | ## |
... etc.
So far, I figured out I can get counts per day:
select created_at::date as created_at_date, count(*) as total
from widgets
where created_at::date >= '2022-05-01' -- where clause for illustration purposes only and not critical to the central question here
group by created_at::date
I'm learning about windowing functions, specifically partition by. I think this will help me get what I want, but not sure. How do I do this?
I'd prefer a "standard SQL" solution. If necessary, I'm on postgres and can use anything specific to its flavor of SQL.

If I understand correctly, you can use conditional aggregation for this.
morning widgets (before noon): between 00:00:00 and 11:59:59
evening widgets (after noon): between 12:00:00 and 23:59:59
Put each condition inside the aggregate function with a CASE WHEN expression.
SELECT created_at::date AS created_at_date,
-- half-open ranges, so a row at exactly 12:00:00 is counted once (as evening)
COUNT(CASE WHEN created_at >= created_at::date AND created_at < created_at::date + INTERVAL '12 HOUR' THEN 1 END) AS morning_widgets,
COUNT(CASE WHEN created_at >= created_at::date + INTERVAL '12 HOUR' AND created_at < created_at::date + INTERVAL '1 DAY' THEN 1 END) AS evening_widgets
FROM widgets w
GROUP BY created_at::date
ORDER BY created_at::date
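To check the noon boundary without a Postgres server, here is a minimal sketch of the same conditional aggregation run against an in-memory SQLite database from Python. SQLite is only a stand-in: it has no `::date` cast, so `date()` and `time()` are used instead, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO widgets (created_at) VALUES (?)",
    [
        ("2022-05-01 08:15:00",),
        ("2022-05-01 11:59:59",),
        ("2022-05-01 12:00:00",),  # exactly noon: should land in the evening bucket only
        ("2022-05-02 23:30:00",),
    ],
)

# One COUNT per time range; rows that fail the CASE yield NULL and are not counted.
rows = conn.execute(
    """
    SELECT date(created_at) AS created_at_date,
           COUNT(CASE WHEN time(created_at) <  '12:00:00' THEN 1 END) AS morning_widgets,
           COUNT(CASE WHEN time(created_at) >= '12:00:00' THEN 1 END) AS evening_widgets
    FROM widgets
    GROUP BY date(created_at)
    ORDER BY date(created_at)
    """
).fetchall()

for row in rows:
    print(row)
```

Because the two ranges are half-open and complementary, the two counts for a day always add up to that day's total.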

Related

Postgres query for difference between latest and first record of the day

Postgres data like this:
| id | read_at | value_1 |
| ------|------------------------|---------|
| 16239 | 2021-11-28 16:13:00+00 | 1509 |
| 16238 | 2021-11-28 16:12:00+00 | 1506 |
| 16237 | 2021-11-28 16:11:00+00 | 1505 |
| 16236 | 2021-11-28 16:10:00+00 | 1501 |
| 16235 | 2021-11-28 16:09:00+00 | 1501 |
| ..... | .......................| .... |
| 15266 | 2021-11-28 00:00:00+00 | 1288 |
A value is added every minute and increases over time.
I would like to get the current total for the day and show it in a Grafana stat panel. For the data above it would be 221 (1509-1288): the latest record minus the first record of today.
SELECT id,read_at,value_1
FROM xyz
ORDER BY id DESC
LIMIT 1;
With this the latest record is given (A).
SELECT id,read_at,value_1
FROM xyz
WHERE read_at = CURRENT_DATE
ORDER BY id DESC
LIMIT 1;
With this the first record of the day is given (B).
Grafana cannot do math on this (A-B). Single query would be best.
Sadly my database knowledge is low and attempts at building queries have not succeeded, and have taken all afternoon now.
Theoretical ideas to solve this:
Subtract the min from the max value where time frame is today.
Using a lag, lag it for the count of records that are recorded today. Subtract lag value from latest value.
Window function.
What is the best way (performance wise) forward and how would such query be written?
Calculate the cumulative total last_value - first_value for each record for the current day using window functions (this is the t subquery) and then pick the latest one.
select current_total, read_at::date as read_at_date
from
(
select last_value(value_1) over w - first_value(value_1) over w as current_total,
read_at
from the_table
where read_at >= current_date and read_at < current_date + 1
window w as (partition by read_at::date order by read_at)
) as t
order by read_at desc limit 1;
However if it is certain that value_1 only "increases over time" then simple grouping will do and that is by far the best way performance wise:
select max(value_1) - min(value_1) as current_total,
read_at::date as read_at_date
from the_table
where read_at >= current_date and read_at < current_date + 1
group by read_at::date;
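The two approaches only agree while value_1 never decreases during the day. A small Python sketch, using the readings from the question plus a made-up series with a dip, shows where they diverge. The function names are illustrative and not part of any answer.

```python
from datetime import datetime


def last_minus_first(rows):
    """Reading at the latest timestamp minus reading at the earliest timestamp."""
    ordered = sorted(rows, key=lambda row: row[0])
    return ordered[-1][1] - ordered[0][1]


def max_minus_min(rows):
    """Largest reading minus smallest reading, ignoring when they occurred."""
    values = [value for _, value in rows]
    return max(values) - min(values)


# Readings from the question (monotonically non-decreasing).
day = [
    (datetime(2021, 11, 28, 0, 0), 1288),
    (datetime(2021, 11, 28, 16, 9), 1501),
    (datetime(2021, 11, 28, 16, 10), 1501),
    (datetime(2021, 11, 28, 16, 11), 1505),
    (datetime(2021, 11, 28, 16, 12), 1506),
    (datetime(2021, 11, 28, 16, 13), 1509),
]

# Hypothetical series where the value drops after a peak.
with_dip = [
    (datetime(2021, 11, 28, 0, 0), 100),
    (datetime(2021, 11, 28, 0, 1), 150),
    (datetime(2021, 11, 28, 0, 2), 120),
]

print(last_minus_first(day), max_minus_min(day))
print(last_minus_first(with_dip), max_minus_min(with_dip))
```

For the question's data both functions give 221. For the series with a dip they differ, which is why the grouping shortcut depends on the "increases over time" guarantee.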
Please, check if it works.
Since you intend to publish it in Grafana, the query does not impose a period filter.
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/3080
create table g (id int, read_at timestamp, value_1 int);
insert into g
values
(16239, '2021-11-28 16:13:00+00', 1509),
(16238, '2021-11-28 16:12:00+00', 1506),
(16237, '2021-11-28 16:11:00+00', 1505),
(16236, '2021-11-28 16:10:00+00', 1501),
(16235, '2021-11-28 16:09:00+00', 1501),
(15266, '2021-11-28 00:00:00+00', 1288);
select date(read_at), max(value_1) - min(value_1)
from g
group by date(read_at);
Since your data contains the same value (1501) at 2 distinct times (16:09 and 16:10), readings do not strictly increase within the interval, which leaves open the possibility of a decrease. So do you want the max - min reading, or the difference between the readings at the min and max times? The following gets the difference between the first and latest reading of the day, as indicated in the title.
with parm(dt) as
( values (date '2021-11-28') )
, first_read (f_read,f_value) as
( select read_at, value_1
from test_tbl
where read_at at time zone 'UTC'=
( select min(read_at at time zone 'UTC')
from test_tbl
join parm
on ((read_at at time zone 'UTC')::date = dt)
)
)
, last_read (l_read, l_value) as
( select read_at,value_1
from test_tbl
where read_at at time zone 'UTC'=
( select max(read_at at time zone 'UTC')
from test_tbl
join parm
on ((read_at at time zone 'UTC')::date = dt)
)
)
select l_read, f_read, l_value, f_value, l_value - f_value as "Day Difference"
from last_read
join first_read on true;

Aggregate data based on unix time stamp crate database

I'm very new to SQL and time series databases. I'm using Crate database (which I think is based on PostgreSQL). I want to aggregate the data by hour, day, week and month. The data is stored with Unix timestamps. The following is a sample of my data.
|sensorid | timestamp | reading |
====================================
|1 | 1604192522 | 10 |
|1 | 1604192702 | 9.65 |
|2 | 1605783723 | 8.1 |
|2 | 1601514122 | 9.6 |
|2 | 1602292210 | 10 |
|2 | 1602291611 | 12 |
|2 | 1602291615 | 10 |
I tried a SQL query using FROM_UNIXTIME, but it is not supported.
Can you please help me?
For hourly data, I'm looking for an answer like the following.
sensorid ,reading , timestamp
1 19.65(10+9.65) 1604192400(starting hour unix time)
2 8.1 1605783600(starting hour unix time)
2 9.6 1601514000(starting hour unix time)
2 32 (10+12+10) 1602291600(starting hour unix time)
For monthly data, I'm looking for an answer like
sensorid , reading , timestamp
1 24.61(10+9.65+8.1) 1604192400(starting month unix time)
2 41.6(9.6+10+12+10) 1601510400(starting month unix time)
A straight-forward approach is:
SELECT
(date '1970-01-01' + unixtime * interval '1 second')::date as date,
extract(hour from date '1970-01-01' + unixtime * interval '1 second') AS hour,
count(c.user) AS count
FROM core c
GROUP BY 1,2
If you are content with having the date and time in the same column (which would seem more helpful to me), you can use date_trunc():
select
date_trunc('hour', date '1970-01-01' + unixtime * interval '1 second') as date_hour,
count(c.user) AS count
FROM core c
GROUP BY 1
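For the hourly case, the truncation that date_trunc('hour', ...) performs can also be expressed as plain integer arithmetic on the Unix timestamp. A minimal Python sketch, using the sample rows from the question (the function name `sum_by_hour` is made up for illustration):

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600

# (sensorid, unix timestamp, reading) rows from the question.
rows = [
    (1, 1604192522, 10.0),
    (1, 1604192702, 9.65),
    (2, 1605783723, 8.1),
    (2, 1601514122, 9.6),
    (2, 1602292210, 10.0),
    (2, 1602291611, 12.0),
    (2, 1602291615, 10.0),
]


def sum_by_hour(rows):
    """Sum readings per (sensorid, start-of-hour unix time)."""
    totals = defaultdict(float)
    for sensor_id, ts, reading in rows:
        hour_start = ts - ts % SECONDS_PER_HOUR  # drop the seconds past the hour
        totals[(sensor_id, hour_start)] += reading
    return dict(totals)


hourly = sum_by_hour(rows)
for key in sorted(hourly):
    print(key, round(hourly[key], 2))
```

The same modulo trick works for UTC days (86400 seconds). Weeks and months do not have a fixed offset or length in seconds, so they need calendar-aware truncation such as date_trunc().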
You can convert a unix timestamp to a date/time value using to_timestamp(). You can aggregate along multiple dimensions at the same time using grouping sets. So, you might want:
select date_trunc('year', v.ts) as year,
date_trunc('month', v.ts) as month,
date_trunc('week', v.ts) as week,
date_trunc('day', v.ts) as day,
date_trunc('hour', v.ts) as hour,
count(*), avg(reading), sum(reading)
from t cross join lateral
(values (to_timestamp(timestamp))) v(ts)
group by grouping sets ( (year), (month), (week), (day), (hour) );

Postgresql percentage of records created in the specific time interval

I have a table that contains the field created_at. I want to calculate the percentage of records from the total number that was created in the specified time interval. Let's say that I have the following structure:
| name | created_at |
----------------------------------------
| first | "2019-04-29 09:30:07.441717" |
| second | "2019-04-30 09:30:07.441717" |
| third | "2019-04-28 09:30:07.441717" |
| fourth | "2019-04-27 09:30:07.441717" |
So I want to calculate what is the percentage of records created in the time interval between 2019-04-28 00:00:00 and 2019-04-30 00:00:00. In this time interval, I have two records first and third, so the result should be 50%. I came across the OVER() clause, but either I don't get how to use it, or it's not what I need.
You can use CASE
select 100 * count(case
when created_at between '2019-04-28 00:00:00' and '2019-04-30 00:00:00'
then 1
end) / count(*)
from your_table
I would just use avg():
select avg( (created_at between '2019-04-28' and '2019-04-30')::int )
from your_table
This returns a value between 0 and 1. You can multiply by 100 if you want a value between 0 and 100.
I strongly discourage you from using between with date/time values. The time components may not behave the way you want. You used "between" in your question, so I left it in. However, I would suggest:
select avg( (created_at >= '2019-04-28' and
created_at < '2019-04-30'
)::int
)
from your_table;
It is not clear if you want < '2019-04-30', <= '2019-04-30' or '2019-05-01'.
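As a quick check of the half-open version, here is a sketch that runs the same idea against an in-memory SQLite database from Python. SQLite is a stand-in for Postgres here: comparisons already evaluate to 0 or 1, so no `::int` cast is needed, and the timestamps are stored as ISO-formatted text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("first", "2019-04-29 09:30:07.441717"),
        ("second", "2019-04-30 09:30:07.441717"),
        ("third", "2019-04-28 09:30:07.441717"),
        ("fourth", "2019-04-27 09:30:07.441717"),
    ],
)

# AVG over a 0/1 condition is the fraction of rows that satisfy it.
(percentage,) = conn.execute(
    """
    SELECT 100.0 * AVG(created_at >= '2019-04-28' AND created_at < '2019-04-30')
    FROM your_table
    """
).fetchone()

print(percentage)
```

With the four sample rows, "first" and "third" fall inside the interval, so the result should be 50.0.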

Database Query to generate a Time-based Chart

I have a logins table in the following (simplified) structure:
id | login_time
---------
1 | 2019-02-04 18:14:30.026361+00
2 | 2019-02-04 22:10:19.720065+00
3 | 2019-02-06 15:51:53.799014+00
Now I want to generate chart like this:
https://prnt.sc/mifz6y
Basically I want to show the logins within the past 48 hours.
My current query:
SELECT count(*), date_trunc('hour', login_time) as time_trunced FROM user_logins
WHERE login_time > now() - interval '48' hour
GROUP BY time_trunced
ORDER BY time_trunced DESC
This works as long as there are entries for every hour. However, if in some hour there were no logins, there will be no entry selected, like this:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
15:00 | 3
16:00 | 5
I would need a continuous result, so that I can simply put the count values into an array:
time_trunced | count
---------------------
12:00 | 1
13:00 | 2
14:00 | 0 <-- This is missing
15:00 | 3
16:00 | 5
Based on that I can simply transform the query result into an array like [1, 2, 0, 3, 5] and pass that to my frontend.
Is this possible with postgresql? Or do I need to implement my own logic?
I think I would do:
select gs.h, count(ul.login_time)
from generate_series(
date_trunc('hour', now() - interval '48 hour'),
date_trunc('hour', now()),
interval '1 hour'
) gs(h) left join
user_logins ul
on ul.login_time >= gs.h and
ul.login_time < gs.h + interval '1 hour'
group by gs.h
order by gs.h;
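If you would rather fill the gaps in application code, the logic behind the generate_series() left join can be sketched in Python as follows. The function name and the sample login times are assumptions made for illustration; they mirror the 12:00 to 16:00 example in the question.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(login_times, start, end):
    """Return (hour, count) for every hour from start to end inclusive, with 0 for empty hours."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in login_times
    )
    result = []
    hour = start
    while hour <= end:
        result.append((hour, counts.get(hour, 0)))
        hour += timedelta(hours=1)
    return result


# Hypothetical logins: 1 at 12h, 2 at 13h, none at 14h, 3 at 15h, 5 at 16h.
logins = (
    [datetime(2019, 2, 6, 12, 5)]
    + [datetime(2019, 2, 6, 13, m) for m in (1, 30)]
    + [datetime(2019, 2, 6, 15, m) for m in (0, 10, 59)]
    + [datetime(2019, 2, 6, 16, m) for m in range(5)]
)

series = hourly_counts(logins, datetime(2019, 2, 6, 12), datetime(2019, 2, 6, 16))
counts_only = [count for _, count in series]
print(counts_only)
```

The loop over hours plays the role of generate_series(), and `counts.get(hour, 0)` plays the role of the left join combined with count() on the nullable column.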
This can almost certainly be tidied up a bit but should give you some ideas. Props to clamp for the generate_series() tip:
SELECT t.time_trunced,coalesce(l.login_count,0) as logins
FROM
(
-- Generate an inline view with all hours between the min & max values in user_logins table
SELECT date_trunc('hour',a.min_time)+ interval '1h' * b.hr_offset as time_trunced
FROM (select min(login_time) as min_time from user_logins) a
JOIN (select generate_series(0,(select ceil((EXTRACT(EPOCH FROM max(login_time))-EXTRACT(EPOCH FROM min(login_time)))/3600) from user_logins)::int) as hr_offset) b on true
) t
LEFT JOIN
(
-- OP's original query tweaked a bit
SELECT count(*) as login_count, date_trunc('hour', login_time) as time_trunced
FROM user_logins
GROUP BY time_trunced
) l on t.time_trunced=l.time_trunced
order BY 1 desc;

PostgreSQL: trying to find miss and mister of the last month with highest rating

At my Drupal website users can rate each other and those timestamped ratings are stored in the pref_rep table:
# select id, nice, last_rated from pref_rep where nice=true
order by last_rated desc limit 7;
id | nice | last_rated
------------------------+------+----------------------------
OK152565298368 | t | 2011-07-07 14:26:38.325716
OK452217781481 | t | 2011-07-07 14:26:10.831353
OK524802920494 | t | 2011-07-07 14:25:28.961652
OK348972427664 | t | 2011-07-07 14:25:17.214928
DE11873 | t | 2011-07-07 14:25:05.303104
OK335285460379 | t | 2011-07-07 14:24:39.062652
OK353639875983 | t | 2011-07-07 14:23:33.811986
Also I keep the gender of each user in the pref_users table:
# select id, female from pref_users limit 7;
id | female
----------------+--------
OK351636836012 | f
OK366097485338 | f
OK251293359874 | t
OK7848446207 | f
OK335478250992 | t
OK355400714550 | f
OK146955222542 | t
I'm trying to create 2 Drupal blocks displaying "Miss last month" and "Mister last month", but my question is not about Drupal, so please don't move it to drupal.stackexchange.com ;-)
My question is about SQL: how could I find the user with the highest count of nice - and that for the last month? I would have 2 queries - one for female and one for non-female.
Using PostgreSQL 8.4.8 / CentOS 5.6 and SQL is sometimes so hard :-)
Thank you!
Alex
UPDATE:
I've got a nice suggestion to cast timestamps to strings in order to find records for the last month (not for the last 30 days)
UPDATE2:
I've ended up doing string comparison:
select r.id,
count(r.id),
u.first_name,
u.avatar,
u.city
from pref_rep r, pref_users u where
r.nice=true and
to_char(current_timestamp - interval '1 month', 'IYYY-MM') =
to_char(r.last_rated, 'IYYY-MM') and
u.female=true and
r.id=u.id
group by r.id , u.first_name, u.avatar, u.city
order by count(r.id) desc
limit 1
Say you run it once on the first day of the month, and cache the results, since counting votes on every page is kinda useless.
First some date arithmetic :
SELECT now(),
date_trunc( 'month', now() ) - '1 MONTH'::INTERVAL,
date_trunc( 'month', now() );
now | ?column? | date_trunc
-------------------------------+------------------------+------------------------
2011-07-07 16:24:38.765559+02 | 2011-06-01 00:00:00+02 | 2011-07-01 00:00:00+02
OK, we got the bounds for the "last month" datetime range.
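The same bounds can be computed outside the database, for example when caching the result in application code. A minimal Python sketch (the helper name `last_month_bounds` is made up, and time zones are ignored):

```python
from datetime import datetime


def last_month_bounds(now):
    """Return (start, end) of the previous calendar month as a half-open range."""
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if end.month == 1:
        # January: the previous month is December of the previous year.
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


start, end = last_month_bounds(datetime(2011, 7, 7, 16, 24, 38))
print(start, end)
```

A rating belongs to the last month when `start <= last_rated < end`, which matches the `>=` and `<` comparisons used in the query.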
Now we need some window function to get the first rows per gender :
SELECT * FROM (
SELECT *, rank( ) over (partition by female order by score desc )
FROM (
SELECT id, count(*) AS score FROM pref_rep
WHERE nice=true
AND last_rated >= date_trunc( 'month', now() ) - '1 MONTH'::INTERVAL
AND last_rated < date_trunc( 'month', now() )
GROUP BY id) s1
JOIN pref_users USING (id)) s2
WHERE rank=1;
Note this can return several rows per gender in case of a tie (ex aequo).
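To make the tie behaviour of rank() concrete, here is a small Python sketch that picks the top scorer per gender and keeps every user tied for first place. The user ids and scores are hypothetical, and the function name is not from the answer.

```python
def top_per_gender(scores):
    """scores: iterable of (user_id, female, score).

    Returns {female: [user_ids tied for the highest score]}.
    """
    best = {}
    for user_id, female, score in scores:
        top_score, users = best.get(female, (None, []))
        if top_score is None or score > top_score:
            best[female] = (score, [user_id])  # new leader replaces the old list
        elif score == top_score:
            users.append(user_id)  # tie: keep both, like rank() = 1
    return {female: users for female, (_, users) in best.items()}


# Hypothetical per-user counts of "nice" ratings for the last month.
scores = [
    ("u1", True, 5),
    ("u2", True, 5),
    ("u3", False, 3),
    ("u4", False, 7),
    ("u5", True, 2),
]

winners = top_per_gender(scores)
print(winners)
```

If exactly one row per gender is required, the tie has to be broken by an extra ordering criterion, for example the most recent rating.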
EDIT:
"I've got a nice suggestion to cast timestamps to strings in order to find records for the last month (not for the last 30 days)"
date_trunc() works much better.
If you make 2 queries, you'll have to make the count() twice. Since users can potentially vote many times for other users, that table will probably be the larger one, so scanning it once is a good thing.
You can't "leave joining back onto the users table to the outer part of the query too" because you need genders...
Query above takes about 30 ms with 1k users and 100k votes so you'd definitely want to cache it.