Max value by ID, date and last x days - sql

Supposed I have a table :
---------------
id | date | value
------------------
1 | Jan 1 | 10
1 | Jan 2 | 12
1 | Jan 3 | 11
2 | Jan 4 | 11
I need to get the max and median value of each id, each date, each for the past 90 days. Im using query :
select id, date, value
max(value) over (partition by id, date) as max_date,
median(value) over (partition by id, date) as med_date
from table
where date > date - interval '90 days'
I tried to export the data and check manually but the result is not correct. Any thing I missed? thanks
expected output is to get maximum value of since the last 90 days. for example the date is April 5th, then it will find the maximum value from Jan 5th (the last 90 days) until April 5th. and then the date moves to April 6th, then it will do again for jan 6th until April 6h and so on for each ID

So im assuming u can get several values for same ID and Date and right ? otherwise partitioning for both id and date makes no sense
SELECT id, date, max(value), avg(value) from table where date > date - interval '90 days'
group by id, value
'group by' does the partitioning

Why are you using window functions? This seems to do what you describe:
select id,
max(value) as max_date,
percentile_disc(0.5) within group (order by value) as median_value
from table
where date > date - interval '90 days';
If you want this per date, use window functions:
select t.*
from (select t.*,
max(value) over (order by date range between '89 day' preceding and current row) as running_max_value,
percentile_disc(0.5) within group (order by value) range between '89 day' preceding and current row) as running_median_value
from t
) t
where date > date - interval '90 days';
The filter is in the outer query so the preceding period can go back further in time.

Related

Get days of the week from a date range in Postgres

So I have the following table :
id end_date name number_of_days start_date
1 "2022-01-01" holiday1 1 "2022-01-01"
2 "2022-03-20" holiday2 1 "2022-03-20"
3 "2022-04-09" holiday3 1 "2022-04-09"
4 "2022-05-01" holiday4 1 "2022-05-01"
5 "2022-05-04" holiday5 3 "2022-05-02"
6 "2022-07-12" holiday6 9 "2022-07-20"
I want to check if a week falls in a holiday range.
So far I can select the holidays that overlap with my choosen week( week_start_date, week_end_date) , but i cant get the exact days in which the overlap happens.
this is the query i'm using, i want to add a mechanism to detect the DAYS OF THE WEEK IN WHICH THE OVERLAP HAPPENS
SELECT * FROM holidays
where daterange(CAST(start_date AS date), CAST(end_date as date), '[]') && daterange('2022-07-18', '2022-07-26','[]')
THE CURRENT QUERY RETURNS THE OVERLLAPPING HOLIDA, (id = 6), however i'm trying to get the exact DAYS OF THE WEEK in which the overlap happens ( in this case, it should be monday,tuesday , wednesday)
You can use the * operator with tsranges, generate a series of dates with the lower and upper dates and finally with to_char print the days of the week, e.g.
SELECT
id, name, start_date, end_date, array_agg(dow) AS days
FROM (
SELECT *,
trim(
to_char(
generate_series(lower(overlap), upper(overlap),'1 day'),
'Day')) AS dow
FROM holidays
CROSS JOIN LATERAL (SELECT tsrange(start_date,end_date) *
tsrange('2022-07-18', '2022-07-26')) t (overlap)
WHERE tsrange(start_date,end_date) && tsrange('2022-07-18', '2022-07-26')) j
GROUP BY id,name,start_date,end_date,number_of_days;
id | name | start_date | end_date | days
----+----------+------------+------------+----------------------------
6 | holiday6 | 2022-07-12 | 2022-07-20 | {Monday,Tuesday,Wednesday}
(1 row)
Demo: db<>fiddle

Select Sum of Grouped Values over Date Range (Window Function)

I have a table of names, dates and numeric values. I want to know the total first date entry and the total sum of numeric values for the first 90 days after the first date.
Eg
name
date
value
Joe
2020-10-30
3
Bob
2020-12-23
5
Joe
2021-01-03
7
Joe
2021-05-30
2
I want a query that returns
name
min_date
sum_first_90_days
Joe
2020-10-30
10
Bob
2020-12-23
5
So far I have
SELECT name, min(date) min_date,
sum(value) over (partition by name
order by date
rows between min(date) and dateadd(day,90,min(date))
) as first_90_days_sum
FROM table
but it's not executing. What's a good approach here? How can I set up a window function to use a dynamic date range for each partition?
You can use window functions and aggregation:
select name, sum(value)
from (select t.*,
min(date) over (partition by name) as min_date
from t
) t
where date <= min_date + interval '90 day'
group by name;

PostgreSQL group by and order by

I have a table with a date column. I wanted to get the count of months and display them in the order of months. Months should be displayed as 'Jan', 'Feb' etc. If I use to_char function, the order by happens on text. I can use extract(month from dt), but that will also display month in number format. This is part of a report and month should be displayed in 'Mon' format only.
SELECT to_char(dt,'Mon'), COUNT(*) FROM tb GROUP BY to_char(dt,'Mon') ORDER BY to_char(dt,'Mon');
to_char | count
---------+-------
Dec | 1
Jan | 1
Jul | 2
select month, total
from (
select
extract(month from dt) as month_number,
to_char(dt,'mon') as month,
count(*) as total
from tb
group by 1, 2
) s
order by month_number

Calculating difference between daily sum and a average for the same day of the week in defined time range. SQL 10g Oracle

Hi I'm working with data depending mostly on the day of the week. Data is formatted in a table
Date - position - count/number.
There are multiple different positions.
I was able to sort my data for a each day of the week using.
select MOD(to_char(time, 'J'),7),
sum(COUNT))
from TABLE
where time > sysdate -x
group by to_char(time, 'J')
order by to_char(time, 'J');
This outputs daily sums according to day of the week.
Now I'm able to get an average for a single day of a week in a year.
This code outputs an average for only Sunday
SELECT AVG(asset_sums)
FROM (
select MOD(to_char(time, 'J'),7),
sum(COUNT)) as asset_sums
from table
where time > sysdate -365
and MOD(TO_CHAR(time, 'J'), 7) + 1 IN (7)
group by to_char(time, 'J')
order by to_char(time, 'J')
);
My goal is to be able to get a table with daily sum compared with yearly average for that particular day of the week.
For example yearly average number for Mondays is 57 , Tuesdays 60.
This week my Monday is 59 and Tuesday is 57. Output of the table is
Monday +2, Tuesday -3.
What is the easiest way / most efficient ?
Thanks for your help.
Edit : Format of my data
Date : yyyy-mm-dd | Place : xxxx | Number( of customers) 0 to 10000
2013-09-16 | AAAA | 1534
2013-09-16 | AAAB | 534
2013-09-17 | AAAA | 1434
2013-09-17 | AAAC | 834
2013-09-18 | AAAA | 134
2013-09-18 | AAAD | 183
Needed output
2013-09-16 | Day of the week | Sum | Average monday this year | Difference Sum-AVG
2013-09-16 | 1 (= Monday) | 2068 | 2015| 53
For clarity I will use subquery factoring. First, select the current weeks data. Next, subquery the sum for the day over the current week. Then, subquery the sum for each day over the past year. Then, average the daily sum of each day for each day of the week. Finally, join the two and display the difference.
with
this_week as (
select
time
from table
where time > x - 7
group by time
),
this_week_dly_sum as (
select
to_char(time, 'd') day,
sum(count) sum
from this_week
group by to_char(time, 'd')
),
this_year_dly_sum as (
select
time,
sum(count) sum
from table
where time > x - 365
group by time
),
this_year_dly_avg as (
select
to_char(day, 'd'),
avg(sum) avg
from this_year_dly_sum
group by to_char(day, 'd')
)
select
this_week.time,
to_char(this_week.time, 'day') day of week,
this_week_dly_sum.sum,
this_year_dly_avg.avg,
this_week_dly_sum.sum - this_year_dly_avg.avg difference
from this_week
inner join this_week_dly_sum
on to_char(this_week.time, 'd') = this_week_dly_sum.day
inner join this_year_dly_avg
on to_char(this_week.time, 'd').day = this_year_dly_avg.
group by time
;
You can use analytic function for this.
select date1, to_char(date1, 'd'),
sum(val) over(partition by to_char(date1, 'd')),
avg(val) over(partition by to_char(date1, 'd')),
sum(val) over(partition by to_char(date1, 'd'))-
avg(val) over(partition by to_char(date1, 'd'))
from table1
time > add_month(sysdate,-12);
This will give you daily counts for the last year:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
You can modify it to additionally return averages per day of the week for the specified range:
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
At this point you have all the initial data you need but probably for more days than necessary. You can use the above query as a derived table to limit the rows to just those where date > SYSDATE - x:
WITH last_year_by_day AS
(
SELECT TRUNC(time, 'DD') AS date,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER
(PARTITION BY TO_CHAR(TRUNC(time, 'DD'), 'D')) AS asset_sum_avg
FROM yourtable
WHERE time > SYSDATE - 365
GROUP BY TRUNC(time, 'DD')
)
SELECT date,
TO_CHAR(TRUNC(time, 'DD'), 'D') AS day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
As some expressions are being repeated multiple times, it can be a good idea to re-factor the query to avoid the repetition. Here's one way:
WITH last_year AS
(
SELECT TRUNC(time, 'DD') AS date,
TO_CHAR(time, 'D') AS day_of_week,
count
FROM yourtable
WHERE time > SYSDATE - 365
),
last_year_by_day AS
(
SELECT date,
day_of_week,
SUM(count) AS asset_sum,
AVG(SUM(count)) OVER (PARTITION BY day_of_week) AS asset_sum_avg
FROM last_year
GROUP BY date, day_of_week
)
SELECT date,
day_of_week,
asset_sum,
asset_sum_avg,
asset_sum - asset_sum_avg AS asset_sum_diff
FROM last_year_by_day
WHERE date > SYSDATE - x
;
One last note is about TO_CHAR('D'), which is used to obtain the day_of_week values. Since you are using a different method for the same results, you may not be aware that the results of TO_CHAR('D') are affected by the NLS_TERRITORY setting. You may want to use an ALTER SESSION statement to set NLS_TERRITORY to the value that would cause TO_CHAR('D') to return 1 for Monday, 2 for Tuesday etc. Here is the list of territories supported.

Need to find Average of top 3 records grouped by ID in SQL

I have a postgres table with customer ID's, dates, and integers. I need to find the average of the top 3 records for each customer ID that have dates within the last year. I can do it with a single ID using the SQL below (id is the customer ID, weekending is the date, and maxattached is the integer).
One caveat: the maximum values are per month, meaning we're only looking at the highest value in a given month to create our dataset, thus why we're extracting month from the date.
SELECT
id,
round(avg(max),0)
FROM
(
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max
FROM
myTable
WHERE
weekending >= now() - interval '1 year' AND
id=110070 group by id,month,year
ORDER BY
max desc limit 3
) AS t
GROUP BY id;
How can I expand this query to include all ID's and a single averaged number for each one?
Here is some sample data:
ID | MaxAttached | Weekending
110070 | 5 | 2011-11-10
110070 | 6 | 2011-11-17
110071 | 4 | 2011-11-10
110071 | 7 | 2011-11-17
110070 | 3 | 2011-12-01
110071 | 8 | 2011-12-01
110070 | 5 | 2012-01-01
110071 | 9 | 2012-01-01
So, for this sample table, I would expect to receive the following results:
ID | MaxAttached
110070 | 5
110071 | 8
This averages the highest value in a given month for each ID (6,3,5 for 110070 and 7,8,9 for 110071)
Note: postgres version 8.1.15
First - get the max(maxattached) for every customer and month:
SELECT id,
max(maxattached) as max_att
FROM myTable
WHERE weekending >= now() - interval '1 year'
GROUP BY id, date_trunc('month',weekending);
Next - for every customer rank all his values:
SELECT id,
max_att,
row_number() OVER (PARTITION BY id ORDER BY max_att DESC) as max_att_rank
FROM <previous select here>;
Next - get the top 3 for every customer:
SELECT id,
max_att
FROM <previous select here>
WHERE max_att_rank <= 3;
Next - get the avg of the values for every customer:
SELECT id,
avg(max_att) as avg_att
FROM <previous select here>
GROUP BY id;
Next - just put all the queries together and rewrite/simplify them for your case.
UPDATE: Here is an SQLFiddle with your test data and the queries: SQLFiddle.
UPDATE2: Here is the query, that will work on 8.1 :
SELECT customer_id,
(SELECT round(avg(max_att),0)
FROM (SELECT max(maxattached) as max_att
FROM table1
WHERE weekending >= now() - interval '2 year'
AND id = ct.customer_id
GROUP BY date_trunc('month',weekending)
ORDER BY max_att DESC
LIMIT 3) sub
) as avg_att
FROM customer_table ct;
The idea - to take your initial query and run it for every customer (customer_table - table with all unique id for customers).
Here is SQLFiddle with this query: SQLFiddle.
Only tested on version 8.3 (8.1 is too old to be on SQLFiddle).
8.3 version
8.3 is the oldest version I've got access to, so I can't guarantee it'll work in 8.1
I'm using a temporary table to work out the best three records.
CREATE TABLE temp_highest_per_month as
select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month,
0 as priority
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year;
UPDATE temp_highest_per_month t
SET priority =
(select count(*) from temp_highest_per_month t2
where t2.id = t.id and
(t.max_in_month < t2.max_in_month or
(t.max_in_month= t2.max_in_month and
t.year * 12 + t.month > t2.year * 12 + t.month)));
select id,round(avg(max_in_month),0)
from temp_highest_per_month
where priority <= 3
group by id;
The year & month are included in the working out the priority so that if two months have the same maximum, they'll still be included in the numbering correctly.
9.1 version
Similar to Igor's answer, but I used the With clause to split the steps.
with highest_per_month as
( select
id,
extract(month from weekending) as month,
extract(year from weekending) as year,
max(maxattached) as max_in_month
FROM
myTable
WHERE
weekending >= now() - interval '1 year'
group by id,month,year),
prioritised as
( select id, month, year, max_in_month,
row_number() over (partition by id, month, year
order by max_in_month desc)
as priority
from highest_per_month
)
select id, round(avg(max_in_month),0)
from prioritised
where priority <= 3
group by id;