Group by statement to do average of time - sql

My existing database has data coming for an id, value and time. There is one record coming every 3 seconds. I want my select statement to use these data and group them based on id and hrly basis to show the average of the values in that hr. How can I use group by to achieve this ?
This is my sample data:
id value date time
a 5 5/18/2015 10:27:22
a 9 5/18/2015 10:27:25
b 7 5/18/2015 10:27:22
b 8 5/18/2015 10:27:22
I have data coming in every 3 seconds. I want it to be aggregated based on every hr of the day to reflect avg values of that id in that hr.
I want the output to look like
id -a , gives avg of 7 , at 10 on 5/18/2015

This is a relatively simple group by which will have two types of columns generally. Your grouped columns and your aggregates. In this case your grouped columns will have ID,date, and hr(calculated from [time]). You only have one aggregated column in this case: the average of value. Check out my code:
SELECT ID,
[date],
DATEPART(HOUR,[time]) AS hr,
AVG(value) AS avg_val
FROM yourTable
GROUP BY ID,[date],DATEPART(HOUR,[time])

This query will pull each ID along with the average value, grouped by each hour of the day. If you want to run this for more than 1 day, you would have to group by the date + the hour so 6/19/2015 10:00, then 6/19/2015 11:00 and so forth.
SELECT
id,
avg(value) AS avg_val,
datepart(hh, time_interval) AS time_interval
FROM my_table
WHERE time_interval = '6/19/2015'
GROUP BY id, datepart(hh, time_interval)
To include multiple days, you could group change the group by section to be:
GROUP BY id, convert(varchar(10), time_interval, 120), datepart(hh,time_interval)

Related

SQL - Get historic count of rows collected within a certain period by date

For many years I've been collecting data and I'm interested in knowing the historic counts of IDs that appeared in the last 30 days. The source looks like this
id
dates
1
2002-01-01
2
2002-01-01
3
2002-01-01
...
...
3
2023-01-10
If I wanted to know the historic count of ids that appeared in the last 30 days I would do something like this
with total_counter as (
select id, count(id) counts
from source
group by id
),
unique_obs as (
select id
from source
where dates >= DATEADD(Day ,-30, current_date)
group by id
)
select count(distinct(id))
from unique_obs
left join total_counter
on total_counter.id = unique_obs.id;
The problem is that this results would return a single result for today's count as provided by current_date.
I would like to see a table with such counts as if for example I had ran this analysis yesterday, and the day before and so on. So the expected result would be something like
counts
date
1235
2023-01-10
1234
2023-01-09
1265
2023-01-08
...
...
7383
2022-12-11
so for example, let's say that if the current_date was 2023-01-10, my query would've returned 1235.
If you need a distinct count of Ids from the 30 days up to and including each date the below should work
WITH CTE_DATES
AS
(
--Create a list of anchor dates
SELECT DISTINCT
dates
FROM source
)
SELECT COUNT(DISTINCT s.id) AS "counts"
,D.dates AS "date"
FROM CTE_DATES D
LEFT JOIN source S ON S.dates BETWEEN DATEADD(DAY,-29,D.dates) AND D.dates --30 DAYS INCLUSIVE
GROUP BY D.dates
ORDER BY D.dates DESC
;
If the distinct count didnt matter you could likely simplify with a rolling sum, only hitting the source table once:
SELECT S.dates AS "date"
,COUNT(1) AS "count_daily"
,SUM("count_daily") OVER(ORDER BY S.dates DESC ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING) AS "count_rolling" --assumes there is at least one row for every day.
FROM source S
GROUP BY S.dates
ORDER BY S.dates DESC;
;
This wont work though if you have gaps in your list of dates as it'll just include the latest 30 days available. In which case the first example without distinct in the count will do the trick.
SELECT count(*) AS Counts
dates AS Date
FROM source
WHERE dates >= DATEADD(DAY, -30, CURRENT_DATE)
GROUP BY dates
ORDER BY dates DESC

Finding id's available in previous weeks but not in current week

How to find if an id which was present in previous weeks but not available in current week on a rolling basis. For e.g
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found out that id 1 and 2 are missing in week 2 and id 2,4,7,8 are missing in week 3 from previous 2 weeks But how to do this on a rolling window for a large amount of data distributed over a period of 20+ years
Please find the sample dataset and expected output. I am expecting the output to be partitioned based on the week_end Date
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Excepted Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition1: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-10
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use below query,
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK)END,
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.*, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;

Running total with Over

I'm trying to create a running total of the number of files per opened by day so I can use the data for a graph showing cumulative results.
The data is basically the file opening date, a calculated field showing 'This month' or 'Last Month' depending on the date and the running total field that I'm trying to figure out.
Date Month Count
==== ===== =====
2019-08-01 Last Month 6
2019-08-02 Last Month 2
2019-08-03 Last Month 5
I want to have a running total...so 6, 8, 13 etc
But all I'm getting is a row count (1,2,3 etc) for my count field.
select
FileDate,
Month,
sum(Count) OVER(PARTITION BY month order by Filedate) as 'Count'
from (
select
1 as 'Count',
Case
When month(cast(concat(right(d.var_val,4),substring(d.var_val,4,2),left(d.var_val,2)) as DATE) ) = Month(getdate()) then 'This Month'
else 'Last Month'
end as 'Month'
FROM data d
left join otherdata m on d.VAR_FileID = m.MAT_FileID
left join otherdata u on m.MAT_Fee_Earner = u.User_ID
left join otherdata br on m.MAT_BranchID = br.BR_ID
WHERE d.var_no IN ( '1628' )
and Len(var_val) = 10
)files
where Month(FileDate) in (MONTH(FileDate()),MONTH(getDate())-1)
and Year(Filedate) = Year(Getdate())
and Dept = 'Peterborough Property'
group by Month, FileDate, count
GO
I'm assuming I've not quite grasped the proper usage of 'OVER' - any pointers would be great!
The Partition clause indicates when to reset the count, so by partitioning by month you are only counting records for each discreet month to get a running total, over the whole dataset, you don't want the partition clause at all, just the order by clause.
Hope your clear with OVER clause now (with "Sentinel" answer), in which case you should replace desired column as follows, so that count continuously increase for all the rows from sub-query based on order by clause: for more details on OVER Clause..
sum(Count) OVER (Oder by Filedate) as [Count]
-- or
sum(Count) OVER (Oder by Filedate desc) as [Count]

Postgres SQL: Sum of ids greater than a day, computed day by day over a series

Looking to compute a moving sum day by day over a date range. i.e. Looking to sum all values greater than or equal to the date but do it row by row. I know that a window function is needed, but need some help with the actual function.
** I need to compute the sum greater than each date in a row. Notice on 2017-08-02 I do not count the value from the day before
Example data:
2017-08-1, 1
2017-08-2, 5
2017-08-3, 4
2017-08-4, 3
2017-08-5, 2
Desired Result:
2017-08-1, 15
2017-08-2, 14
2017-08-3, 9
2017-08-4, 5
2017-08-5, 2
Here is what I have to produce this data.
SELECT DATE_TRUNC('day', created_at),
COUNT(*)
FROM table
GROUP BY 1
ORDER BY 1 DESC
Just use cumulative sums:
SELECT DATE_TRUNC('day', created_at),
COUNT(*),
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at) DESC) as sum_greater_than
FROM table
GROUP BY 1
ORDER BY 1 DESC;

Smoothing out a result set by date

Using SQL I need to return a smooth set of results (i.e. one per day) from a dataset that contains 0-N records per day.
The result per day should be the most recent previous value even if that is not from the same day. For example:
Starting data:
Date: Time: Value
19/3/2014 10:01 5
19/3/2014 11:08 3
19/3/2014 17:19 6
20/3/2014 09:11 4
22/3/2014 14:01 5
Required output:
Date: Value
19/3/2014 6
20/3/2014 4
21/3/2014 4
22/3/2014 5
First you need to complete the date range and fill in the missing dates (21/3/2014 in you example). This can be done by either joining a calendar table if you have one, or by using a recursive common table expression to generate the complete sequence on the fly.
When you have the complete sequence of dates finding the max value for the date, or from the latest previous non-null row becomes easy. In this query I use a correlated subquery to do it.
with cte as (
select min(date) date, max(date) max_date from your_table
union all
select dateadd(day, 1, date) date, max_date
from cte
where date < max_date
)
select
c.date,
(
select top 1 max(value) from your_table
where date <= c.date group by date order by date desc
) value
from cte c
order by c.date;
May be this works but try and let me know
select date, value from test where (time,date) in (select max(time),date from test group by date);