PSQL Recursive Adding Query

I have a table called "deaths" with two columns: a date, and the number of people who died on that date.
I need a query that gives me the total number of people who died between each date and 90 days prior. For example, if the date in a row is 30/09/2021, I would need to add up the deaths since 02/07/2021.
Can I get any guidance on how to do this?
"deaths" Table example below.
Date       | Deaths |
-----------+--------+
2021-08-19 |     21 |
2021-08-18 |     96 |
2021-08-17 |    100 |
2021-08-16 |     64 |
2021-08-15 |    107 |
2021-08-14 |     93 |
So, if this was all my data, the first row (2021-08-19) of my result should be (21 + 96 + 100 + 64 + 107 + 93).
Hope I was clear enough.

You don't need a recursive query for this; you can use a window function instead. As others have mentioned, "date" is not a good name for a column, and it would have been better to give sample data with dates more than 90 days apart, but I believe this query should work for you:
SELECT "date",
deaths,
sum(deaths) OVER (ORDER BY "date" RANGE BETWEEN interval '90 days' preceding and current row)
FROM deaths;
The clause RANGE BETWEEN interval '90 days' PRECEDING AND CURRENT ROW, called the window frame, limits the rows that are included in each sum.
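For the sample data above, every date lies within 90 days of every other, so each frame accumulates all earlier rows; the last row (2021-08-19) should sum the whole table (values computed by hand):

date       | deaths | sum
-----------+--------+----
2021-08-14 |     93 |  93
2021-08-15 |    107 | 200
2021-08-16 |     64 | 264
2021-08-17 |    100 | 364
2021-08-18 |     96 | 460
2021-08-19 |     21 | 481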

Use a correlated subquery:
SELECT DISTINCT "date",
       (SELECT sum(deaths)
        FROM deaths i
        WHERE i."date" <= d."date"
          AND i."date" >= d."date" - 90) AS tot_deaths
FROM deaths d;
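In PostgreSQL, subtracting an integer from a date subtracts that many days, so d."date" - 90 is the date 90 days earlier, matching the example in the question:

SELECT date '2021-09-30' - 90;  -- 2021-07-02

Note that the correlated subquery is re-evaluated for every row, so on large tables the window-function version above will generally be faster.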

Related

How do I calculate a rolling average over a specific range timeframe in BigQuery?

I have a BigQuery table like the one below, where data wasn't necessarily recorded at a consistent rate:
| timestamp | value |
|-------------------------|-------|
| 2022-10-01 00:03:00 UTC | 2.43 |
| 2022-10-01 00:17:00 UTC | 4.56 |
| 2022-10-01 00:36:00 UTC | 3.64 |
| 2022-10-01 00:58:00 UTC | 2.15 |
| 2022-10-01 01:04:00 UTC | 2.90 |
| 2022-10-01 01:13:00 UTC | 5.88 |
... ...
I want to calculate a rolling average (as a new column) on value over a certain timeframe, e.g. the previous 12 hours. I know it's relatively simple to do over a fixed number of rows, and I've tried using LAG and TIMESTAMP_SUB functions to select the right values to average over, but I'm quite new to SQL so I'm not even sure if this is the right approach.
Does anyone know how to go about this? Thanks!
You can use a window function.
You need to calculate a date-and-hour column as an integer: take the Unix date, multiply it by 24 (hours), and add the hour of the day. Daylight saving time is ignored here.
WITH tbl AS (
  SELECT 10 * rand() AS val,
         timestamp_add(snapshot_date, INTERVAL CAST(rand() * 5000 AS INT64) MINUTE) AS timestamps
  FROM UNNEST(GENERATE_TIMESTAMP_ARRAY("2021-01-01 00:00:00", "2023-01-01 00:00:00", INTERVAL 1 HOUR)) AS snapshot_date
)
SELECT
  *,
  unix_date(date(timestamps)) * 24 + extract(hour from timestamps) AS dummy_time,
  avg(val) OVER WIN1_range AS rolling_avg,
  sum(1) OVER WIN1_range AS values_in_avg
FROM tbl
WINDOW WIN1_range AS (
  ORDER BY unix_date(date(timestamps)) * 24 + extract(hour from timestamps)
  RANGE BETWEEN 12 PRECEDING AND CURRENT ROW
)
BigQuery has simplified specifications for the range frame of window functions:
Tip: If you want to use a range with a date, use ORDER BY with the UNIX_DATE() function. If you want to use a range with a timestamp, use the UNIX_SECONDS(), UNIX_MILLIS(), or UNIX_MICROS() function.
Here, we can simply use unix_seconds() when ordering the records in the partition, and accordingly specify an interval of 12 hours as seconds:
select ts, val,
       avg(val) over(
         order by unix_seconds(ts)
         range between 12 * 60 * 60 preceding and current row
       ) as avg_last_12_hours
from mytable
Now say we wanted the average over the last 2 days; we would use unix_date() instead (it takes a DATE, so we first cast the timestamp):
select ts, val,
       avg(val) over(
         order by unix_date(date(ts))
         range between 2 preceding and current row
       ) as avg_last_2_days
from mytable
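Note that unix_date() has whole-day granularity, so 2 preceding means the current calendar day plus the two before it, not a strict 48-hour window. If you want strictly the last 48 hours, a sketch using unix_seconds() again:

select ts, val,
       avg(val) over(
         order by unix_seconds(ts)
         range between 2 * 24 * 60 * 60 preceding and current row
       ) as avg_last_48_hours
from mytable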

3 month rolling average with missing months

I've been reading the related questions here, and so far the solutions require that there are no missing months. I'd love some help on what I can do when there are missing months.
For example, I'd like to calculate the 3 month rolling average of orders per item. If there is a missing month for an item, the calculation assumes that the number of orders for that item for that month is 0. If there are fewer than three months left, the rolling average isn't so important (it can be null or otherwise).
MONTH   | ITEM | ORDERS | ROLLING_AVG
2021-04 | A    | 5      | 3.33
2021-04 | B    | 4      | 3
2021-03 | A    | 3      | 1.66
2021-03 | B    | 5      | null
2021-02 | A    | 2      | null
2021-01 | B    | 2      | null
Big thanks in advance!
Also, is there a way to "add" the missing month rows without using a cross join with a list of items? For example if I have 10 million items, the cross join takes quite a while to execute.
You can use a range window frame -- and some conditional logic:
select t.*,
       (case when min(month) over (partition by item) <= month - interval '2 month'
             then sum(orders) over (partition by item
                                    order by month
                                    range between interval '2 month' preceding and current row
                                   ) / 3.0
        end) as rolling_average
from t;
Here is a db<>fiddle. The results are slightly different from what is in your question, because there is not enough info for A in 2021-03 but there is enough for B in 2021-03.
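For the sample data, the result should look something like this (hand-computed): A's first month is 2021-02, so its 2021-03 frame cannot cover a full 3 months and yields null, while B's first month is 2021-01, so its 2021-03 average is (2 + 0 + 5) / 3 = 2.33:

MONTH   | ITEM | ORDERS | ROLLING_AVG
2021-04 | A    | 5      | 3.33
2021-04 | B    | 4      | 3
2021-03 | A    | 3      | null
2021-03 | B    | 5      | 2.33
2021-02 | A    | 2      | null
2021-01 | B    | 2      | null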

How to make a query that selects based on a 1-day interval?

How can I get all IDs that have more than 10 entries on one day?
Here is the sample data:
ID | Time
---+--------------------
 4 | 2019-02-14 17:22:43
 2 | 2019-04-27 07:51:09
83 | 2018-01-07 08:38:37
I am having a hard time using count to go through and find all of the entries on the same day; the hour:minute:second part of the column is what is causing problems for me.
For MySQL it would be:
select distinct id
from tablename
group by id, date(time)
having count(*) > 10
The date() function discards the time part of the column, so the grouping is done only by the date part.
For SQL Server you would use:
convert(date, time)
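A minimal sketch of the full SQL Server version, assuming the same table and column names as above:

select distinct id
from tablename
group by id, convert(date, time)
having count(*) > 10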

How to do a sub-select per result entry in postgresql?

Assume I have a table with only two columns: id, maturity. maturity is some date in the future indicating until when a specific entry will be available. It differs between entries but is not necessarily unique, and the number of entries that have not yet reached their maturity date changes over time.
I need to count the number of entries from such a table that were available on a specific date (i.e. entries that had not yet reached their maturity). So I basically need to join these two queries:
SELECT generate_series as date FROM generate_series('2015-10-01'::date, now()::date, '1 day');
SELECT COUNT(id) FROM mytable WHERE mytable.maturity > now()::date;
where instead of now()::date I need to put each entry from the generated series. I'm sure this has to be simple enough, but I can't quite work it out. I need the resulting solution to remain a single query, so it seems that I can't use for loops.
Sample table entries:
id | maturity
---+-------------------
1 | 2015-10-03
2 | 2015-10-05
3 | 2015-10-11
4 | 2015-10-11
Expected output:
date | count
------------+-------------------
2015-10-01 | 4
2015-10-02 | 4
2015-10-03 | 3
2015-10-04 | 3
2015-10-05 | 2
2015-10-06 | 2
NOTE: This count doesn't constantly decrease, since new entries are added and this count increases.
You have to reference fields of the outer query in the WHERE clause of the subquery. This can be done if the subquery is in the SELECT clause of the outer query:
SELECT generate_series,
       (SELECT COUNT(id)
        FROM mytable
        WHERE mytable.maturity > generate_series)
FROM generate_series('2015-10-01'::date, now()::date, '1 day');
More info: http://www.techonthenet.com/sql_server/subqueries.php
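An equivalent formulation, if you prefer a join over a correlated subquery (a sketch under the same table assumptions):

SELECT d.day, COUNT(m.id)
FROM generate_series('2015-10-01'::date, now()::date, '1 day') AS d(day)
LEFT JOIN mytable m ON m.maturity > d.day
GROUP BY d.day
ORDER BY d.day;

COUNT(m.id) ignores NULLs, so days on which no entries were available correctly show 0.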
I think you want to group your data by the maturity date.
Check this:
select maturity, count(*) as count
from your_table
group by maturity;

Can you define a custom "week" in PostgreSQL?

To extract the week of a given year we can use:
SELECT EXTRACT(WEEK FROM timestamp '2014-02-16 20:38:40');
However, I am trying to group weeks together in a bit of an odd format. My start of a week would begin on Mondays at 4am and would conclude the following Monday at 3:59:59am.
Ideally, I would like to create a query that provides a start and end date, then groups the total sales for that period by the weeks laid out above.
Example:
SELECT
  (some custom week date),
  SUM(sales)
FROM salesTable
WHERE
  startDate BETWEEN 'DATE 1' AND 'DATE 2'
I am not looking to change the EXTRACT() function, but rather to create a query that would pull from the following sample table and output the sample results.
If 'DATE 1' in the query was '2014-07-01' and 'DATE 2' was '2014-08-18':
Sample Table:
itemID | timeSold            | price
-------+---------------------+------
1      | 2014-08-13 09:13:00 | 12.45
2      | 2014-08-15 12:33:00 | 20.00
3      | 2014-08-05 18:33:00 | 10.00
4      | 2014-07-31 04:00:00 | 30.00
Desired result:
weekBegin           | priceTotal
--------------------+-----------
2014-07-28 04:00:00 |      30.00
2014-08-04 04:00:00 |      10.00
2014-08-11 04:00:00 |      32.45
Produces your desired output:
SELECT date_trunc('week', time_sold - interval '4h')
       + interval '4h' AS week_begin
     , sum(price) AS price_total
FROM   tbl
WHERE  time_sold >= '2014-07-01 0:0'::timestamp
AND    time_sold <  '2014-08-19 0:0'::timestamp  -- start of next day
GROUP  BY 1
ORDER  BY 1;
db<>fiddle here (extended with a row that actually shows the difference)
Old sqlfiddle
Explanation
date_trunc() is the superior tool here. You are not interested in week numbers, but in actual timestamps.
The "trick" is to subtract 4 hours from selected timestamps before extracting the week - thereby shifting the time frame towards the earlier bound of the ISO week. To produce the desired display, add the same 4 hours back to the truncated timestamps.
But apply the WHERE condition on unmodified timestamps. Also, never use BETWEEN with timestamps, which have fractional seconds; use WHERE conditions like those presented above. See:
Unexpected results from SQL query with BETWEEN timestamps
This operates with data type timestamp, i.e. with (shifted) "weeks" according to the current time zone. You might want to work with timestamptz instead. See:
Ignoring time zones altogether in Rails and PostgreSQL
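A minimal sketch for that case, assuming time_sold were timestamptz and weeks were defined in 'America/New_York' (both are assumptions, not from the original question):

SELECT date_trunc('week', (time_sold AT TIME ZONE 'America/New_York') - interval '4h')
       + interval '4h' AS week_begin_local
     , sum(price) AS price_total
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

AT TIME ZONE converts the timestamptz to a local timestamp first, so the 4-hour shift and the week truncation happen in that zone rather than in the session's time zone.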