MySQL: daily average value

I have a table with a 'timestamp' column and a 'value' column where the values are roughly 3 seconds apart.
I'm trying to return a table that has daily average values.
So, something like this is what I'm looking for:
| timestamp | average |
|:--------- |:------- |
| 2010-06-02 | 456.6 |
| 2010-06-03 | 589.4 |
| 2010-06-04 | 268.5 |
etc...
Any help on this would be greatly appreciated.

SELECT DATE(timestamp), AVG(value)
FROM table
GROUP BY DATE(timestamp)
This groups by DATE(timestamp), since you want the day rather than each individual timestamp.

SELECT DATE(timestamp), AVG(value)
FROM TABLE
GROUP BY DATE(timestamp)

This assumes that your timestamp column only contains information about the day, but not the time. That way, the dates can be grouped together:
SELECT timestamp, AVG(value) AS average
FROM TABLE_NAME
GROUP BY timestamp
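Note that table is a reserved word in MySQL, so the placeholder in the answers above will not run literally; quote such identifiers with backticks or substitute your real table name. A minimal sketch with a hypothetical table named readings:
-- backticks guard against reserved or keyword identifiers
SELECT DATE(`timestamp`) AS day, AVG(`value`) AS average
FROM readings
GROUP BY DATE(`timestamp`);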

Related

Running sum of unique users in Redshift

I have a table as follows with user visits by day:
| date | user_id |
|:-------- |:-------- |
| 01/31/23 | a |
| 01/31/23 | a |
| 01/31/23 | b |
| 01/30/23 | c |
| 01/30/23 | a |
| 01/29/23 | c |
| 01/28/23 | d |
| 01/28/23 | e |
| 01/01/23 | a |
| 12/31/22 | c |
I am looking to get a running count of unique user_id values over the last 30 days. Here is the expected output:
| date | distinct_users|
|:-------- |:-------- |
| 01/31/23 | 5 |
| 01/30/23 | 4 |
.
.
.
Here is the query I tried:
SELECT date
, SUM(COUNT(DISTINCT user_id)) over (order by date rows between 30 preceding and current row) AS unique_users
FROM mytable
GROUP BY date
ORDER BY date DESC
The problem I am running into is that this query is not counting unique user_ids: for instance, the result I am getting for 01/31/23 is 9 instead of 5, as it counts user_id 'a' every time it occurs.
Thank you, appreciate your help!
Not the most performant approach, but you could use a correlated subquery to find the distinct count of users over a window of the past 30 days (the DISTINCT in the outer query keeps each date to a single output row):
SELECT DISTINCT
    date,
    (SELECT COUNT(DISTINCT t2.user_id)
     FROM mytable t2
     WHERE t2.date BETWEEN t1.date - INTERVAL '30 day' AND t1.date) AS distinct_users
FROM mytable t1
ORDER BY date;
There are a few things going on here. First, window functions run after GROUP BY and aggregation, so COUNT(DISTINCT user_id) produces the per-date count first, and only then does the window function sum those counts. Second, a window frame like ROWS BETWEEN 30 PRECEDING AND CURRENT ROW covers the past 30 rows, not the past 30 days, so you would also need to fill in missing dates for it to be correct.
As to how to do this, I can only think of the "expand the data so each date and user_id has a row" method. This requires a CTE to generate the last 2 years of dates plus 30 days, so that the look-back window works for the earliest dates. Then window over the past 30 days for each user_id and date to see which rows have an occurrence of that user_id within the window, setting the value to NULL when there are none. Finally, count the non-NULL user_ids grouping by just date to get the number of unique user_ids for each date.
This means expanding the data significantly, but I see no other way to get truly unique user_ids over the past 30 days. It will look something like the outline below (a sketch follows it):
1. a recursive CTE to generate the needed dates,
2. a CTE to cross join these dates with the distinct set of user_ids active in the past 2 years,
3. a CTE to join that date/user_id grid to the real data for the past 2 years plus 30 days, window back over the past 30 days (partition by user_id, order by date), and turn zero counts into NULL with DECODE() or CASE,
4. a final SELECT that groups by just date and counts the non-NULL user_ids.
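For concreteness, here is a minimal sketch of that outline. It assumes the table is mytable(date, user_id), hard-codes the date range instead of deriving it, and uses a trailing 30-day (29 PRECEDING) frame; treat it as a starting point rather than a drop-in solution:
WITH RECURSIVE dates (d) AS (
    -- generate every date needed, starting 30 days before the range of interest
    SELECT DATE '2022-12-02'
    UNION ALL
    SELECT d + 1 FROM dates WHERE d < DATE '2023-01-31'
),
visits AS (
    -- dedupe to one row per user per day, so the ROWS frame maps 1:1 to days
    SELECT DISTINCT date, user_id FROM mytable
),
users AS (
    SELECT DISTINCT user_id FROM mytable
),
flags AS (
    -- seen = 1 if the user has a visit within the trailing 30 days, else NULL
    SELECT d.d AS date,
           u.user_id,
           MAX(CASE WHEN v.user_id IS NOT NULL THEN 1 END)
               OVER (PARTITION BY u.user_id
                     ORDER BY d.d
                     ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS seen
    FROM dates d
    CROSS JOIN users u
    LEFT JOIN visits v
           ON v.date = d.d
          AND v.user_id = u.user_id
)
SELECT date,
       COUNT(seen) AS distinct_users   -- COUNT skips NULLs
FROM flags
GROUP BY date
ORDER BY date DESC;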

Postgres query for difference between latest and first record of the day

Postgres data like this:
| id | read_at | value_1 |
| ------|------------------------|---------|
| 16239 | 2021-11-28 16:13:00+00 | 1509 |
| 16238 | 2021-11-28 16:12:00+00 | 1506 |
| 16237 | 2021-11-28 16:11:00+00 | 1505 |
| 16236 | 2021-11-28 16:10:00+00 | 1501 |
| 16235 | 2021-11-28 16:09:00+00 | 1501 |
| ..... | .......................| .... |
| 15266 | 2021-11-28 00:00:00+00 | 1288 |
A value is added every minute and increases over time.
I would like to get the current total for the day and show it in a Grafana stat panel. For the data above that would be: 221 (1509 - 1288), i.e. the latest record minus the first record of today.
SELECT id,read_at,value_1
FROM xyz
ORDER BY id DESC
LIMIT 1;
With this the latest record is given (A).
SELECT id,read_at,value_1
FROM xyz
WHERE read_at = CURRENT_DATE
ORDER BY id DESC
LIMIT 1;
With this the first record of the day is given (B).
Grafana cannot do the math on these (A - B), so a single query would be best.
Sadly my database knowledge is low and attempts at building queries have not succeeded, and have taken all afternoon now.
Theoretical ideas to solve this:
- Subtract the min from the max value where the time frame is today.
- Using a lag: lag by the number of records recorded today, then subtract the lagged value from the latest value.
- A window function.
What is the best way forward (performance-wise), and how would such a query be written?
Calculate the cumulative total last_value - first_value for each record for the current day using window functions (this is the t subquery) and then pick the latest one.
select current_total, read_at::date as read_at_date
from
(
    select last_value(value_1) over w - first_value(value_1) over w as current_total,
           read_at
    from the_table
    where read_at >= current_date and read_at < current_date + 1
    window w as (partition by read_at::date order by read_at)
) as t
order by read_at desc limit 1;
However, if it is certain that value_1 only "increases over time", then simple grouping will do, and that is by far the best way performance-wise:
select max(value_1) - min(value_1) as current_total,
       read_at::date as read_at_date
from the_table
where read_at >= current_date and read_at < current_date + 1
group by read_at::date;
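Performance-wise, both queries only have to touch today's rows provided read_at is indexed; a plain B-tree index (the index name here is just an example) supports the range predicate:
create index the_table_read_at_idx on the_table (read_at);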
Please check if this works. Since you intend to publish it in Grafana, the query does not impose a period filter:
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/3080
create table g (id int, read_at timestamp, value_1 int);
insert into g
values
(16239, '2021-11-28 16:13:00+00', 1509),
(16238, '2021-11-28 16:12:00+00', 1506),
(16237, '2021-11-28 16:11:00+00', 1505),
(16236, '2021-11-28 16:10:00+00', 1501),
(16235, '2021-11-28 16:09:00+00', 1501),
(15266, '2021-11-28 00:00:00+00', 1288);
select date(read_at), max(value_1) - min(value_1)
from g
group by date(read_at);
Since your data contains the same value (1501) at two distinct times (16:09 and 16:10), this suggests that values do not always increase within the interval, leaving open the possibility of a decrease. So: do you want max - min, or the difference between the readings at the earliest and latest times? The following gets the difference between the first and the latest reading of the day, as indicated in the title.
with parm (dt) as
     ( values (date '2021-11-28') )
   , first_read (f_read, f_value) as
     ( select read_at, value_1
         from test_tbl
        where read_at at time zone 'UTC' =
              ( select min(read_at at time zone 'UTC')
                  from test_tbl
                  join parm
                    on ((read_at at time zone 'UTC')::date = dt)
              )
     )
   , last_read (l_read, l_value) as
     ( select read_at, value_1
         from test_tbl
        where read_at at time zone 'UTC' =
              ( select max(read_at at time zone 'UTC')
                  from test_tbl
                  join parm
                    on ((read_at at time zone 'UTC')::date = dt)
              )
     )
select l_read, f_read, l_value, f_value, l_value - f_value as "Day Difference"
  from last_read
  join first_read on true;

Easy subtraction of year's values

I have the following database table containing timestamps in Unix format as well as the cumulative total yield of my solar panels, recorded every 5 minutes:
| Timestamp | TotalYield |
|------------|------------|
| 1321423500 | 1 |
| 1321423800 | 5 |
| ... | |
| 1573888800 | 44094536 |
Now I would like to calculate how much energy was produced in each year. I thought of reading the first and last timestamp of each year using UNION:
SELECT strftime('%d.%m.%Y', datetime(TimeStamp, 'unixepoch')), TotalYield
FROM PascalsDayData
WHERE TimeStamp IN (
    SELECT MAX(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
    UNION
    SELECT MIN(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
)
This works fine, but I need to do some post-processing to subtract each year's first value from its last. There must be a more elegant way to do this in SQL, right?
Thanks,
Anton
You can aggregate by year and subtract the min value from the max:
SELECT MAX(TotalYield) - MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
This assumes that TotalYield does not decrease -- which your question implies.
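If you also want each year labelled in the output, you can additionally select the grouping expression (a small variation on the query above; the aliases are mine):
SELECT strftime('%Y', datetime(TimeStamp, 'unixepoch')) AS Year,
       MAX(TotalYield) - MIN(TotalYield) AS YearlyYield
FROM PascalsDayData
GROUP BY Year;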
If you actually want to subtract each year's first value from the next year's first value (falling back to the current year's max for the final year), you can use LEAD():
SELECT LEAD(MIN(TotalYield), 1, MAX(TotalYield)) OVER (ORDER BY MIN(TimeStamp)) -
       MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))

How to make a query that selects based on a 1-day interval?

How can I get all IDs that have more than 10 entries on one day?
Here is the sample data:
| ID | Time |
|:--- |:------------------- |
| 4 | 2019-02-14 17:22:43 |
| 2 | 2019-04-27 07:51:09 |
| 83 | 2018-01-07 08:38:37 |
I am having a hard time using COUNT and finding all of the entries that fall on the same day. The hour:minute:second part is what is causing problems for me.
For MySQL it would be:
select distinct id from tablename
group by id, date(time)
having count(*) > 10
The DATE() function discards the time part of the column, so the grouping is done only by the date part.
For SQL Server you would use:
convert(date, time)
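Put together, the SQL Server version of the whole query would look something like this (a sketch reusing the same assumed table and column names):
select distinct id
from tablename
group by id, convert(date, time)
having count(*) > 10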

Postgres how to determine there are X records spanning 2 days of datetimes in a table

I have a table containing electricity meter readings which looks something like this:
| meter_id | reading_interval_datetime |
|:-------- |:------------------------- |
| 110 | 2018-01-15T00:00:00+00:00 |
| 110 | 2018-01-15T00:30:00+00:00 |
The table is filled with at most 48 records per day (one reading every 30 mins).
What's an efficient way to check if a particular meter has at least two days of readings in there?
You can determine if a meter_id has at least two days by doing:
select meter_id
from t
group by meter_id
having min(reading_interval_datetime::date) <> max(reading_interval_datetime::date);
This will check that there are two dates in the data.
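An equivalent formulation (same assumed table t) counts the distinct dates directly, which also generalizes to "at least N days":
select meter_id
from t
group by meter_id
having count(distinct reading_interval_datetime::date) >= 2;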
I would do this:
create index your_table_idx on your_table (meter_id, date(reading_interval_datetime));

select meter_id, date(reading_interval_datetime), count(1)
from your_table
where meter_id = THE_METER_ID_YOUD_LIKE_TO_CHECK
group by meter_id, date(reading_interval_datetime)
having count(1) > 1;
(Note that if reading_interval_datetime is a timestamptz, date() is not immutable and the expression index will be rejected; you would have to index an expression evaluated at a fixed time zone instead.)