Easy subtraction of yearly values - SQL

I have the following database table containing timestamps in Unix format as well as the cumulative total yield of my solar panels, recorded every 5 minutes:
| Timestamp | TotalYield |
|------------|------------|
| 1321423500 | 1 |
| 1321423800 | 5 |
| ... | |
| 1573888800 | 44094536 |
Now I would like to calculate how much energy was produced each year. I thought of reading the first and last timestamp of each year using a UNION:
SELECT strftime('%d.%m.%Y', datetime(TimeStamp, 'unixepoch')), TotalYield
FROM PascalsDayData
WHERE TimeStamp IN (
    SELECT MAX(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
    UNION
    SELECT MIN(TimeStamp) FROM PascalsDayData GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
)
This works fine, but I need to do some post-processing to subtract each year's first value from its last. There must be a more elegant way to do this in SQL, right?
Thanks,
Anton

You can aggregate by year and subtract the minimum value from the maximum:
SELECT MAX(TotalYield) - MIN(TotalYield)
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
This assumes that TotalYield does not decrease -- which your question implies.
If you actually want to subtract each year's first value from the next year's first value, you can use LEAD():
SELECT (LEAD(MIN(TotalYield), 1, MAX(TotalYield)) OVER (ORDER BY MIN(TimeStamp)) -
        MIN(TotalYield)
       )
FROM PascalsDayData
GROUP BY strftime('%Y', datetime(TimeStamp, 'unixepoch'))
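As a quick sanity check, the MAX - MIN approach can be run against a small in-memory SQLite table (the rows below are made up for illustration; only the table and column names come from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PascalsDayData (TimeStamp INTEGER, TotalYield INTEGER)")
# Two cumulative readings per year: one near the start, one near the end.
conn.executemany(
    "INSERT INTO PascalsDayData VALUES (?, ?)",
    [
        (1321423500, 1),    # 2011-11-16
        (1325331000, 100),  # 2011-12-31
        (1329000000, 150),  # 2012-02-11
        (1356900000, 400),  # 2012-12-30
    ],
)

rows = conn.execute("""
    SELECT strftime('%Y', datetime(TimeStamp, 'unixepoch')) AS year,
           MAX(TotalYield) - MIN(TotalYield) AS produced
    FROM PascalsDayData
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)  # [('2011', 99), ('2012', 250)]
```

Because TotalYield is cumulative, each year's production is simply the spread between its lowest and highest reading.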

Related

BigQuery for running count of distinct values with a dynamic date-range

We are trying to write a query that returns the count of unique customers in a specific year-month plus the count of unique customers over the 364 days before that date.
For example:
Our customer-table looks like this:
| order_date | customer_unique_id |
| -------- | -------------- |
| 2020-01-01 | tom@email.com |
| 2020-01-01 | daisy@email.com |
| 2019-05-02 | tom@email.com |
In this example, two customers ordered on 2020-01-01, and one of them had already ordered within the 364-day timeframe.
The desired table should look like this:
| year_month | unique_customers |
| -------- | -------------- |
| 2020-01 | 2 |
We tried multiple solutions, such as partitioning and window functions, but nothing seems to work correctly. The tricky part is the uniqueness: we want to look 364 days back but do a count distinct on the customers over that whole period, not per date/year/month, because then we would get duplicates. For example, if you partition by date, year, or month, tom@email.com would be counted twice instead of once.
The goal of this query is to get insight into the order frequency (orders divided by customers) over a 12-month period.
We work with Google BigQuery.
Hope someone can help us out! :)
Here is a way to achieve your desired results. Note that this query computes the year-month counts in a separate query and joins them with the rolling 364-day-interval query (the original mixed up customer_id/customer_unique_id and orders/customer_table; the names below follow the question's table):
with year_month_distincts as (
    select
        concat(
            cast(extract(year from order_date) as string),
            '-',
            cast(extract(month from order_date) as string)
        ) as year_month,
        count(distinct customer_unique_id) as ym_distincts
    from customer_table
    group by 1
)
select x.order_date, x.ytd_distincts, y.ym_distincts
from (
    select
        a.order_date,
        (select count(distinct customer_unique_id)
         from customer_table b
         where b.order_date between date_sub(a.order_date, interval 364 day) and a.order_date
        ) as ytd_distincts
    from customer_table a
    group by 1
) x
join year_month_distincts y on concat(
    cast(extract(year from x.order_date) as string),
    '-',
    cast(extract(month from x.order_date) as string)
) = y.year_month
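The rolling-distinct logic itself is easy to sanity-check outside BigQuery. A minimal Python sketch (sample data mirrors the question's table; the helper name is made up) that, for a given order date, counts distinct customers within the trailing 364 days:

```python
from datetime import date, timedelta

# Hypothetical sample data mirroring the question's table.
orders = [
    (date(2019, 5, 2), "tom@email.com"),
    (date(2020, 1, 1), "tom@email.com"),
    (date(2020, 1, 1), "daisy@email.com"),
]

def trailing_distinct(orders, as_of, days=364):
    """Distinct customers with an order in [as_of - days, as_of]."""
    start = as_of - timedelta(days=days)
    return len({cust for d, cust in orders if start <= d <= as_of})

print(trailing_distinct(orders, date(2020, 1, 1)))  # 2
```

Note that tom@email.com is counted once even though he ordered twice inside the window, which is exactly the deduplication the question asks for.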
Two options using arrays that may help:
1. Look back 364 days, as requested.
2. Look back 11 months (given reporting is monthly).
WITH month_array AS (
SELECT
DATE_TRUNC(order_date,month) AS order_month,
STRING_AGG(DISTINCT customer_unique_id) AS cust_mth
FROM customer_table
GROUP BY 1
),
year_array AS (
SELECT
order_month,
STRING_AGG(cust_mth) OVER(ORDER by UNIX_DATE(order_month) RANGE BETWEEN 364 PRECEDING AND CURRENT ROW) cust_12m
-- (option 2) STRING_AGG(cust_mth) OVER (ORDER by cast(format_date('%Y%m', order_month) as int64) RANGE BETWEEN 99 PRECEDING AND CURRENT ROW) AS cust_12m
FROM month_array
)
SELECT format_date('%Y-%m',order_month) year_month,
(SELECT COUNT(DISTINCT cust_unique_id) FROM UNNEST(SPLIT(cust_12m)) AS cust_unique_id) as unique_12m
FROM year_array
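The STRING_AGG-then-SPLIT trick works because duplicates across months survive the aggregation but collapse again in the final COUNT(DISTINCT ...). The same idea in plain Python (data made up for illustration):

```python
# Monthly aggregation: one comma-separated string of distinct customers per month,
# as STRING_AGG(DISTINCT customer_unique_id) would produce.
month_array = {
    "2019-05": "tom@email.com",
    "2020-01": "tom@email.com,daisy@email.com",
}

# Rolling window: concatenate the month strings in the window, then split
# and deduplicate, exactly as COUNT(DISTINCT ...) over UNNEST(SPLIT(...)) does.
window = ["2019-05", "2020-01"]
cust_12m = ",".join(month_array[m] for m in window)
unique_12m = len(set(cust_12m.split(",")))
print(unique_12m)  # 2
```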

Postgres query for difference between latest and first record of the day

Postgres data like this:
| id | read_at | value_1 |
| ------|------------------------|---------|
| 16239 | 2021-11-28 16:13:00+00 | 1509 |
| 16238 | 2021-11-28 16:12:00+00 | 1506 |
| 16237 | 2021-11-28 16:11:00+00 | 1505 |
| 16236 | 2021-11-28 16:10:00+00 | 1501 |
| 16235 | 2021-11-28 16:09:00+00 | 1501 |
| ..... | .......................| .... |
| 15266 | 2021-11-28 00:00:00+00 | 1288 |
A value is added every minute and increases over time.
I would like to get the running total for the day and show it in a Grafana stat panel. For the data above that would be 221 (1509 - 1288): the latest record minus the first record of today.
SELECT id,read_at,value_1
FROM xyz
ORDER BY id DESC
LIMIT 1;
This gives the latest record (A).
SELECT id,read_at,value_1
FROM xyz
WHERE read_at = CURRENT_DATE
ORDER BY id DESC
LIMIT 1;
This gives the first record of the day (B).
Grafana cannot do the math on these (A - B); a single query would be best.
Sadly my database knowledge is limited, and my attempts at building queries have not succeeded despite taking all afternoon.
Theoretical ideas to solve this:
Subtract the min from the max value where time frame is today.
Using a lag, lag it for the count of records that are recorded today. Subtract lag value from latest value.
Window function.
What is the best way (performance wise) forward and how would such query be written?
Calculate the running total last_value - first_value for each record of the current day using window functions (this is the t subquery), and then pick the latest one.
select current_total, read_at::date as read_at_date
from
(
select last_value(value_1) over w - first_value(value_1) over w as current_total,
read_at
from the_table
where read_at >= current_date and read_at < current_date + 1
window w as (partition by read_at::date order by read_at)
) as t
order by read_at desc limit 1;
However, if it is certain that value_1 only "increases over time", then simple grouping will do, and that is by far the best option performance-wise:
select max(value_1) - min(value_1) as current_total,
read_at::date as read_at_date
from the_table
where read_at >= current_date and read_at < current_date + 1
group by read_at::date;
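The max - min variant is easy to check with the question's sample rows in an in-memory SQLite database (SQLite stands in for Postgres here, so the current_date filter is replaced by a fixed day):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, read_at TEXT, value_1 INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (16239, "2021-11-28 16:13:00", 1509),
        (16238, "2021-11-28 16:12:00", 1506),
        (15266, "2021-11-28 00:00:00", 1288),
    ],
)
row = conn.execute("""
    SELECT date(read_at) AS read_at_date,
           MAX(value_1) - MIN(value_1) AS current_total
    FROM the_table
    WHERE date(read_at) = '2021-11-28'
    GROUP BY date(read_at)
""").fetchone()
print(row)  # ('2021-11-28', 221)
```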
Please check whether this works for you. Since you intend to publish it in Grafana, the query does not impose a period filter.
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/3080
create table g (id int, read_at timestamp, value_1 int);
insert into g
values
(16239, '2021-11-28 16:13:00+00', 1509),
(16238, '2021-11-28 16:12:00+00', 1506),
(16237, '2021-11-28 16:11:00+00', 1505),
(16236, '2021-11-28 16:10:00+00', 1501),
(16235, '2021-11-28 16:09:00+00', 1501),
(15266, '2021-11-28 00:00:00+00', 1288);
select date(read_at), max(value_1) - min(value_1)
from g
group by date(read_at);
Since your data contains the same value (1501) at two distinct times (16:09 and 16:10), the values may not strictly increase over the interval, which leaves open the possibility of a decrease. So do you want the max - min of the readings, or the difference between the readings at the earliest and latest times? The following computes the difference between the first and latest readings of the day, as indicated in the title.
with parm(dt) as (
    values (date '2021-11-28')
)
, first_read(f_read, f_value) as (
    select read_at, value_1
    from test_tbl
    where read_at at time zone 'UTC' =
          (select min(read_at at time zone 'UTC')
           from test_tbl
           join parm on (read_at at time zone 'UTC')::date = dt)
)
, last_read(l_read, l_value) as (
    select read_at, value_1
    from test_tbl
    where read_at at time zone 'UTC' =
          (select max(read_at at time zone 'UTC')
           from test_tbl
           join parm on (read_at at time zone 'UTC')::date = dt)
)
select l_read, f_read, l_value, f_value, l_value - f_value as "Day Difference"
from last_read
join first_read on true;
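When values can decrease, the first/last readings matter, not min/max. A small SQLite check of that distinction (SQLite stands in for Postgres; the row with a dip is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_tbl (read_at TEXT, value_1 INTEGER)")
conn.executemany(
    "INSERT INTO test_tbl VALUES (?, ?)",
    [
        ("2021-11-28 00:00:00", 1288),
        ("2021-11-28 08:00:00", 1250),  # a dip: the minimum is NOT the first reading
        ("2021-11-28 16:13:00", 1509),
    ],
)
# First and last reading by time, regardless of where min/max fall.
first_v, last_v = conn.execute("""
    SELECT (SELECT value_1 FROM test_tbl ORDER BY read_at ASC  LIMIT 1),
           (SELECT value_1 FROM test_tbl ORDER BY read_at DESC LIMIT 1)
""").fetchone()
print(last_v - first_v)  # 221  (max - min would give 259 instead)
```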

Postgresql percentage of records created in the specific time interval

I have a table that contains the field created_at. I want to calculate the percentage of records, out of the total number, that were created in a specified time interval. Let's say I have the following structure:
| name   | created_at                   |
|--------|------------------------------|
| first  | "2019-04-29 09:30:07.441717" |
| second | "2019-04-30 09:30:07.441717" |
| third  | "2019-04-28 09:30:07.441717" |
| fourth | "2019-04-27 09:30:07.441717" |
So I want to calculate what percentage of records was created in the interval between 2019-04-28 00:00:00 and 2019-04-30 00:00:00. In this interval I have two records, first and third, so the result should be 50%. I came across the OVER() clause, but either I don't understand how to use it, or it's not what I need.
You can use CASE (note 100.0 rather than 100, so the division is not truncated to an integer):
select 100.0 * count(case
                         when created_at between '2019-04-28 00:00:00' and '2019-04-30 00:00:00'
                         then 1
                     end) / count(*)
from your_table
I would just use avg():
select avg( (created_at between '2019-04-28' and '2019-04-30')::int )
from your_table
This returns a fraction between 0 and 1; multiply by 100 if you want a percentage.
I strongly discourage you from using between with date/time values, because the time components may not behave the way you want. You used between in your question, so I kept it above. However, I would suggest:
select avg( (created_at >= '2019-04-28' and
created_at < '2019-04-30'
)::int
)
from your_table;
It is not clear whether you want < '2019-04-30', <= '2019-04-30', or < '2019-05-01'.
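The avg-of-boolean trick translates directly to SQLite for a quick check (in SQLite a comparison already evaluates to 0/1, so the Postgres ::int cast is unnecessary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        ("first",  "2019-04-29 09:30:07"),
        ("second", "2019-04-30 09:30:07"),
        ("third",  "2019-04-28 09:30:07"),
        ("fourth", "2019-04-27 09:30:07"),
    ],
)
# AVG over the 0/1 comparison results yields the matching fraction directly.
(pct,) = conn.execute("""
    SELECT 100.0 * AVG(created_at >= '2019-04-28' AND created_at < '2019-04-30')
    FROM your_table
""").fetchone()
print(pct)  # 50.0
```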

Populating a table with all dates in a given range in Google BigQuery

Is there any convenient way to populate a table with all dates in a given range in Google BigQuery? What I need are all dates from 2015-06-01 till CURRENT_DATE(), so something like this:
+------------+
| date |
+------------+
| 2015-06-01 |
| 2015-06-02 |
| 2015-06-03 |
| ... |
| 2016-07-11 |
+------------+
Optimally, the next step would be to also get all weeks between the two dates, i.e.:
+---------+
| week |
+---------+
| 2015-23 |
| 2015-24 |
| 2015-25 |
| ... |
| 2016-28 |
+---------+
I've been fiddling around with the following answers I found, but I can't get them to work, mostly because core functions aren't supported and I can't find proper ways to replace them.
Easiest way to populate a temp table with dates between and including 2 date parameters
Generate Dates between date ranges
Your help is very much appreciated!
Best,
Max
Mikhail's answer works perfectly for BigQuery's legacy SQL syntax. This solution is slightly easier if you're using the standard SQL syntax.
BigQuery standard SQL has a built-in function, GENERATE_DATE_ARRAY, for creating an array from a date range. It takes a start date, an end date, and an INTERVAL. For example:
SELECT day
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 DAY)
) AS day
If you want the week and year, you could use:
SELECT EXTRACT(YEAR FROM day), EXTRACT(WEEK FROM day)
FROM UNNEST(
GENERATE_DATE_ARRAY(DATE('2015-06-01'), CURRENT_DATE(), INTERVAL 1 WEEK)
) AS day
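The same date-range generation is trivial to reproduce in Python, which is handy for checking what GENERATE_DATE_ARRAY should return (the helper name is made up, and the end date is fixed rather than CURRENT_DATE so the result is deterministic):

```python
from datetime import date, timedelta

def generate_date_array(start, end, step_days=1):
    """All dates from start to end inclusive, mirroring GENERATE_DATE_ARRAY."""
    days = []
    d = start
    while d <= end:
        days.append(d)
        d += timedelta(days=step_days)
    return days

days = generate_date_array(date(2015, 6, 1), date(2015, 6, 5))
print(days[0], days[-1], len(days))  # 2015-06-01 2015-06-05 5
```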
all dates from 2015-06-01 till CURRENT_DATE()
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
all weeks between the two dates
SELECT YEAR(DAY) AS y, WEEK(DAY) AS w
FROM (
SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS DAY
FROM (
SELECT ROW_NUMBER() OVER() AS pos, *
FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
FROM (SELECT NULL)),h
)))
)
GROUP BY y, w

MySQL: daily average value

I have a table with a 'timestamp' column and a 'value' column, where the values are roughly 3 seconds apart.
I'm trying to return a table with daily average values.
So, something like this is what I'm looking for:
| timestamp | average |
|------------|---------|
| 2010-06-02 | 456.6 |
| 2010-06-03 | 589.4 |
| 2010-06-04 | 268.5 |
etc...
Any help on this would be greatly appreciated.
SELECT DATE(timestamp), AVG(value)
FROM table
GROUP BY DATE(timestamp)
Since you want the day instead of each timestamp:
select DATE(timestamp), AVG(value)
from TABLE
group by DATE(timestamp)
This assumes that your timestamp column only contains information about the day, but not the time. That way, the dates can be grouped together:
select timestamp, AVG(value) as average
from TABLE_NAME
group by timestamp
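Grouping by the date part is easy to verify in SQLite, where date() plays the role of MySQL's DATE() (table name and values below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2010-06-02 00:00:00", 400.0),
        ("2010-06-02 00:00:03", 513.0),
        ("2010-06-03 00:00:00", 589.5),
    ],
)
# date(ts) strips the time part, so all readings of a day fall into one group.
rows = conn.execute("""
    SELECT date(ts) AS day, AVG(value) AS average
    FROM readings
    GROUP BY date(ts)
    ORDER BY day
""").fetchall()
print(rows)  # [('2010-06-02', 456.5), ('2010-06-03', 589.5)]
```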