Get average of last 7 days - sql

I'm working on a problem where I have a value for each date in a range. I would like to consolidate the rows in my table by averaging them and reassigning the date column to the last day of the 7-day range. My SQL experience is lacking, so I could use some help. Thanks for giving this a look!
E.g.
7 rows with dates and values.
UniqueId Date Value
........ .... .....
a 2014-03-20 2
a 2014-03-21 2
a 2014-03-22 3
a 2014-03-23 5
a 2014-03-24 1
a 2014-03-25 0
a 2014-03-26 1
Resulting row
UniqueId Date AvgValue
........ .... ........
a 2014-03-26 2
First off, I am not even sure this is possible. I am trying to tackle a problem with the data at hand. I thought maybe I could use a window frame with a partition to roll the dates into one date with the averaged result, but I am not exactly sure how to express that in SQL.

I'm using the following as sample data:
CREATE TABLE some_data1 (unique_id text, date date, value integer);
INSERT INTO some_data1 (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'b', '2014-03-01', 1),
( 'b', '2014-03-02', 1),
( 'b', '2014-03-03', 1),
( 'b', '2014-03-04', 1),
( 'b', '2014-03-05', 1),
( 'b', '2014-03-06', 1),
( 'b', '2014-03-07', 1);
OPTION A: Using a common table expression (WITH), supported by PostgreSQL (MySQL only supports it from 8.0)
with cte as (
select unique_id
,max(date) as date
from some_data1
group by unique_id
)
select cte.unique_id, cte.date, avg(sd.value) as avg_value
from some_data1 sd inner join cte using(unique_id)
-- keep only the 7 days ending at each unique_id's latest date
where sd.date > cte.date - 7
group by cte.unique_id, cte.date;
OPTION B: Using a derived table, so it works in both PostgreSQL and MySQL
select cte.unique_id
,cte.date
,avg(sd.value) as avg_value
from (
select unique_id
,max(date) as date
from some_data1
group by unique_id
) cte inner join some_data1 sd using(unique_id)
-- keep only the 7 days ending at each unique_id's latest date
where sd.date > cte.date - interval '7' day
group by cte.unique_id, cte.date;
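If you want to try the "average only the 7 days ending at each id's latest date" idea without a PostgreSQL or MySQL server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. It assumes ISO `YYYY-MM-DD` text dates (SQLite has no native date type) and uses SQLite's `date(x, '-6 days')` in place of date arithmetic; the function name `last_week_averages` is just for illustration.

```python
import sqlite3

# Sample rows from the answer above: (unique_id, ISO date, value).
ROWS = [
    ("a", "2014-03-20", 2), ("a", "2014-03-21", 2), ("a", "2014-03-22", 3),
    ("a", "2014-03-23", 5), ("a", "2014-03-24", 1), ("a", "2014-03-25", 0),
    ("a", "2014-03-26", 1),
    ("b", "2014-03-01", 1), ("b", "2014-03-02", 1), ("b", "2014-03-03", 1),
    ("b", "2014-03-04", 1), ("b", "2014-03-05", 1), ("b", "2014-03-06", 1),
    ("b", "2014-03-07", 1),
]

QUERY = """
WITH latest AS (
    SELECT unique_id, MAX(date) AS last_date
    FROM some_data1
    GROUP BY unique_id
)
SELECT l.unique_id, l.last_date, AVG(sd.value) AS avg_value
FROM some_data1 AS sd
JOIN latest AS l ON l.unique_id = sd.unique_id
-- the 7 calendar days ending at last_date, both ends inclusive
WHERE sd.date BETWEEN date(l.last_date, '-6 days') AND l.last_date
GROUP BY l.unique_id, l.last_date
ORDER BY l.unique_id
"""


def last_week_averages(rows):
    """Return (unique_id, last_date, avg_value) for each id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE some_data1 (unique_id TEXT, date TEXT, value INTEGER)")
        conn.executemany("INSERT INTO some_data1 VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_week_averages(ROWS):
        print(row)
```

Because ISO date strings sort the same way as the dates they represent, plain text comparison is enough for the BETWEEN filter here.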

Maybe something along the lines of:
SELECT AVG(Value) AS AvgValue FROM tableName WHERE Date BETWEEN dateStart AND dateEnd
That will get you the average between those dates. Since you already have dateEnd, you can use that result to create the row you're looking for.
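Since only dateEnd is known, dateStart can be derived from it: 7 inclusive days means going back 6 days. A minimal sketch with Python's sqlite3 as a stand-in database, binding both dates as parameters; the table and column names follow this answer, and `build_sample_db` / `average_for_week_ending` are made-up helper names.

```python
import sqlite3
from datetime import date, timedelta


def average_for_week_ending(conn, date_end):
    """Average Value over the 7 days ending at date_end, both ends inclusive."""
    date_start = date_end - timedelta(days=6)  # 6 days back + date_end itself = 7 days
    (avg,) = conn.execute(
        "SELECT AVG(Value) AS AvgValue FROM tableName WHERE Date BETWEEN ? AND ?",
        (date_start.isoformat(), date_end.isoformat()),
    ).fetchone()
    return avg  # None when no rows fall in the range


def build_sample_db():
    """In-memory table holding the question's 7 rows for unique id 'a'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableName (Date TEXT, Value INTEGER)")
    first = date(2014, 3, 20)
    values = [2, 2, 3, 5, 1, 0, 1]
    conn.executemany(
        "INSERT INTO tableName VALUES (?, ?)",
        [((first + timedelta(days=i)).isoformat(), v) for i, v in enumerate(values)],
    )
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    print(average_for_week_ending(conn, date(2014, 3, 26)))
    conn.close()
```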

For PostgreSQL a window function might be what you want:
DROP TABLE IF EXISTS some_data;
CREATE TABLE some_data (unique_id text, date date, value integer);
INSERT INTO some_data (unique_id, date, value) VALUES
( 'a', '2014-03-20', 2),
( 'a', '2014-03-21', 2),
( 'a', '2014-03-22', 3),
( 'a', '2014-03-23', 5),
( 'a', '2014-03-24', 1),
( 'a', '2014-03-25', 0),
( 'a', '2014-03-26', 1),
( 'a', '2014-03-27', 3);
WITH avgs AS (
SELECT unique_id, date,
avg(value) OVER w AS week_avg,
count(value) OVER w AS num_days
FROM some_data
WINDOW w AS (
PARTITION BY unique_id
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW))
SELECT unique_id, date, week_avg
FROM avgs
WHERE num_days=7
Result:
unique_id | date | week_avg
-----------+------------+--------------------
a | 2014-03-26 | 2.0000000000000000
a | 2014-03-27 | 2.1428571428571429
Questions include:
What happens if a day from the preceding six days is missing? Do we want to add it and count it as zero?
What happens if you add a day? Is the result of the code above what you want (a rolling 7-day average)?
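To see what the ROWS frame above actually counts, here is a minimal sketch that runs the same query in an in-memory SQLite database from Python (SQLite 3.25 or newer is assumed for window functions, and the WINDOW clause is inlined into each OVER). The second test removes 2014-03-23 and shows that the frame is 7 rows, not 7 calendar days: the only full frame then ends on 2014-03-27 and spans 8 days, which is the situation the first question is about.

```python
import sqlite3

# The answer's sample rows: (unique_id, ISO date, value).
ROWS = [
    ("a", "2014-03-20", 2), ("a", "2014-03-21", 2), ("a", "2014-03-22", 3),
    ("a", "2014-03-23", 5), ("a", "2014-03-24", 1), ("a", "2014-03-25", 0),
    ("a", "2014-03-26", 1), ("a", "2014-03-27", 3),
]

QUERY = """
WITH avgs AS (
    SELECT unique_id, date,
           AVG(value) OVER (PARTITION BY unique_id ORDER BY date
                            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS week_avg,
           COUNT(value) OVER (PARTITION BY unique_id ORDER BY date
                              ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS num_days
    FROM some_data
)
SELECT unique_id, date, week_avg
FROM avgs
WHERE num_days = 7   -- only frames that actually contain 7 rows
ORDER BY unique_id, date
"""


def rolling_week_averages(rows):
    """Return (unique_id, date, week_avg) for every full 7-row frame."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE some_data (unique_id TEXT, date TEXT, value INTEGER)")
        conn.executemany("INSERT INTO some_data VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in rolling_week_averages(ROWS):
        print(row)
```

If missing days should count as zero, one option is to join against a generated list of dates first so that every day has a row before the frame is applied.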

For SQL Server, you can try one of the approaches below.
1. For the weekly average
SET DATEFIRST 4 -- weeks start on Thursday, so 2014-03-20 (a Thursday) through 2014-03-26 fall in one week
;WITH CTE AS
(
SELECT *,
DATEPART(WEEK,[DATE])WK,
--Find last day in that week
ROW_NUMBER() OVER(PARTITION BY UNIQUEID,DATEPART(WEEK,[DATE]) ORDER BY [DATE] DESC) RNO,
-- Find average value of that week
AVG(VALUE) OVER(PARTITION BY UNIQUEID,DATEPART(WEEK,[DATE])) AVGVALUE
FROM DATETAB
)
SELECT UNIQUEID,[DATE],AVGVALUE
FROM CTE
WHERE RNO=1
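SQLite has no SET DATEFIRST, so as a rough stand-in for the weekly query, the sketch below computes the Thursday on or before each date and groups on that date instead of a week number. It is only an illustration: table and column names (datetab, uniqueid, date, value) mirror the T-SQL above, the 'b' rows are borrowed from the sample data earlier in this thread, and because the bucket key is a real date, weeks are neither reset at 1 January nor merged across years the way a bare DATEPART(WEEK, ...) can be.

```python
import sqlite3

# (uniqueid, ISO date, value); 2014-03-20 is a Thursday.
ROWS = [("a", f"2014-03-{d}", v) for d, v in zip(range(20, 27), [2, 2, 3, 5, 1, 0, 1])]
ROWS += [("b", f"2014-03-0{d}", 1) for d in range(1, 8)]

QUERY = """
WITH bucketed AS (
    SELECT uniqueid, date, value,
           -- strftime('%w') is 0 = Sunday .. 6 = Saturday, so Thursday is 4 and
           -- (w + 3) % 7 is the number of days since the most recent Thursday.
           date(julianday(date)
                - (CAST(strftime('%w', date) AS INTEGER) + 3) % 7) AS week_start
    FROM datetab
)
SELECT uniqueid, MAX(date) AS last_day, AVG(value) AS avg_value
FROM bucketed
GROUP BY uniqueid, week_start
ORDER BY uniqueid, week_start
"""


def weekly_averages(rows):
    """One row per (uniqueid, Thursday-start week): last date seen and average value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE datetab (uniqueid TEXT, date TEXT, value INTEGER)")
        conn.executemany("INSERT INTO datetab VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in weekly_averages(ROWS):
        print(row)
```

Note that 'b' (2014-03-01 to 2014-03-07) spans two Thursday-start weeks, so it produces two rows rather than one.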
2. For the average over the last 7 days
DECLARE @DATE DATE = '2014-03-26'
;WITH CTE AS
(
SELECT UNIQUEID,[DATE],VALUE,@DATE CURRENTDATE
FROM DATETAB
WHERE [DATE] BETWEEN DATEADD(DAY,-6,@DATE) AND @DATE -- 7 days, both ends inclusive
)
SELECT UNIQUEID,CURRENTDATE [DATE],AVG(VALUE) AVGVALUE
FROM CTE
GROUP BY UNIQUEID,CURRENTDATE

Related

Find missing months in SQL

This post remains unanswered and isn't useful:
Finding missing month from my table
This one, Get Missing Month from table, requires a lookup table, which is not my first choice.
I have a table with financial periods and a reference number. Each reference number has a series of financial periods, which may start and end anywhere. The test is simply that there is no gap between the start and the end, i.e. every financial period between the smallest and largest dates must be present when grouped by reference number.
A financial period is a month.
So... in this example below, Reference Number A is missing May 2016.
REF MONTH
A 2016-04-01
A 2016-06-01
A 2016-07-01
B 2016-03-01
B 2016-04-01
B 2016-05-01
C 2022-05-01
-- Find the boundaries of each ref
select REF
, MIN(Month) as smallest
, MAX(Month) as largest
from myTable
group by REF
-- But how to find missing items?
SQL Server 2019.
Clearly a calendar table would make this (among many other tasks) a small task.
Here is an alternative using the window function lead() over():
Example
Declare @YourTable Table ([REF] varchar(50),[MONTH] date) Insert Into @YourTable Values
('A','2016-04-01')
,('A','2016-06-01')
,('A','2016-07-01')
,('B','2016-03-01')
,('B','2016-04-01')
,('B','2016-05-01')
,('C','2022-05-01')
;with cte as (
Select *
,Missing = datediff(MONTH,[Month],lead([Month],1) over (partition by Ref order by [Month]))-1
From @YourTable
)
Select * from cte where Missing>0
Results
REF MONTH Missing
A 2016-04-01 1
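SQLite has no DATEDIFF(MONTH, ...), so this sketch of the same LEAD idea first turns each date into a month index (year * 12 + month); consecutive months then differ by exactly 1. It runs against an in-memory SQLite database from Python (3.25+ assumed for LEAD), and the table and function names are illustrative.

```python
import sqlite3

ROWS = [
    ("A", "2016-04-01"), ("A", "2016-06-01"), ("A", "2016-07-01"),
    ("B", "2016-03-01"), ("B", "2016-04-01"), ("B", "2016-05-01"),
    ("C", "2022-05-01"),
]

QUERY = """
WITH numbered AS (
    SELECT ref, month,
           -- month index, so that consecutive months differ by exactly 1
           CAST(strftime('%Y', month) AS INTEGER) * 12
             + CAST(strftime('%m', month) AS INTEGER) AS month_idx
    FROM your_table
),
gaps AS (
    SELECT ref, month,
           LEAD(month_idx) OVER (PARTITION BY ref ORDER BY month_idx) - month_idx - 1
               AS missing
    FROM numbered
)
SELECT ref, month, missing   -- the last row per ref has a NULL LEAD and drops out
FROM gaps
WHERE missing > 0
ORDER BY ref, month
"""


def find_gaps(rows):
    """Return (ref, month before the gap, number of missing months)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE your_table (ref TEXT, month TEXT)")
        conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(find_gaps(ROWS))
```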
I added one more row of input to demonstrate the solution better.
with forecast (
REF,
[MONTH]
) as (
select REF
, [MONTH]
from (
values
('A', {d '2016-04-01'})
, ('A', {d '2016-06-01'})
, ('A', {d '2016-07-01'})
, ('B', {d '2016-03-01'})
, ('B', {d '2016-04-01'})
, ('B', {d '2016-05-01'})
, ('B', {d '2016-09-01'})
, ('C', {d '2022-05-01'})
) x (REF, [MONTH])
),
-- define the date ranges
daterange as (
select REF
, min([MONTH]) as dtmin
, max([MONTH]) as dtmax
from forecast
group by REF
),
-- get the first-of-month date of every month in the range
dt (
REF,
[MONTH]
) as (
select REF
, dtmin
from daterange dr
union all
select dt.REF
, dateadd(month, 1, [MONTH])
from dt dt
inner join daterange dr on dr.REF = dt.REF
where dateadd(month, 1, [MONTH]) <= dr.dtmax
)
-- find the missing months
select REF
, [MONTH]
from dt
except
select REF
, [MONTH]
from forecast
order by 1, 2
-- or list all of the months for each REF
--select REF
--, [MONTH]
--from dt
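The recursive month generator translates fairly directly to SQLite, where date(x, '+1 month') stands in for dateadd(month, 1, x). A minimal sketch, assuming first-of-month ISO dates as in the question, with the extra B row included; `missing_months` is a made-up helper name.

```python
import sqlite3

ROWS = [
    ("A", "2016-04-01"), ("A", "2016-06-01"), ("A", "2016-07-01"),
    ("B", "2016-03-01"), ("B", "2016-04-01"), ("B", "2016-05-01"),
    ("B", "2016-09-01"), ("C", "2022-05-01"),
]

QUERY = """
WITH RECURSIVE
daterange AS (
    SELECT ref, MIN(month) AS dtmin, MAX(month) AS dtmax
    FROM forecast
    GROUP BY ref
),
dt(ref, month) AS (
    SELECT ref, dtmin FROM daterange
    UNION ALL
    -- add one month at a time until the ref's last month is reached
    SELECT dt.ref, date(dt.month, '+1 month')
    FROM dt
    JOIN daterange AS dr ON dr.ref = dt.ref
    WHERE date(dt.month, '+1 month') <= dr.dtmax
)
SELECT ref, month FROM dt
EXCEPT
SELECT ref, month FROM forecast
ORDER BY 1, 2
"""


def missing_months(rows):
    """Return every (ref, month) between each ref's first and last month that is absent."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE forecast (ref TEXT, month TEXT)")
        conn.executemany("INSERT INTO forecast VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in missing_months(ROWS):
        print(row)
```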

Compare array of datetime objects and pick all rows where difference between each and the next is less than 7 days

My table looks like this:
(can't post images yet)
I want to select all names from my table where the time difference between each of the datetime objects and the next is always more than 7 days.
So from the above I would get only Paul, since Adam's first two times are already only a day apart.
The best I can come up with is to take the time difference between the smallest and largest datetime in the array and divide it by array_length(datetime). That is basically the average time between all the datetime objects, but it doesn't help me.
I'm using Standard SQL on BigQuery
SELECT name
FROM dataset.table
WHERE NOT EXISTS(
SELECT 1 FROM UNNEST(datetime) AS dt WITH OFFSET off
WHERE DATETIME_DIFF(
dt, datetime[SAFE_OFFSET(off - 1)], DAY
) <= 7
)
This compares each entry in the array with the one before it, looking for any pair that is 7 days or less apart. Because it compares array positions, it assumes the array is stored in ascending order.
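A positional comparison only finds chronological neighbours if the array is sorted. This plain-Python sketch of the same check sorts first and then compares each timestamp with the next one, using the dummy data that appears later in this thread. It compares exact durations with timedelta, whereas DATETIME_DIFF(..., DAY) counts day boundaries; with the sample data (identical times of day) both agree. The function name is illustrative.

```python
from datetime import datetime, timedelta

# Dummy data from later in this thread; the arrays are not stored in order.
SAMPLE = {
    "Adam": ["2018-07-26T17:55:03", "2018-07-27T17:55:03", "2018-06-29T17:55:03",
             "2018-07-16T17:55:03", "2018-08-19T17:55:03", "2018-07-14T17:55:03"],
    "Paul": ["2018-08-26T17:55:03", "2018-08-18T17:55:03", "2018-06-20T17:55:03",
             "2018-08-09T17:55:03", "2018-07-16T17:55:03"],
}


def spaced_out_names(table, min_gap=timedelta(days=7)):
    """Names whose consecutive timestamps, once sorted, are all more than min_gap apart."""
    names = []
    for name, stamps in table.items():
        ordered = sorted(datetime.fromisoformat(s) for s in stamps)
        # zip(ordered, ordered[1:]) pairs each timestamp with the next one.
        if all(later - earlier > min_gap for earlier, later in zip(ordered, ordered[1:])):
            names.append(name)
    return names


if __name__ == "__main__":
    print(spaced_out_names(SAMPLE))
```

A name with a single timestamp has no pairs to compare, so `all()` returns True and the name is kept; whether that is desired depends on the use case.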
You can use unnest():
select t.*
from t
where not exists (select 1
from (select dt, lag(dt) over (order by dt) as prev_dt
from unnest(datetime) dt
) x
where dt < datetime_add(prev_dt, interval 7 day)
);
It is still not clear exactly what the schema of your data is: based on the layout, it looks like datetime is an array, but based on the data type shown in the image, it could be just a regular field. The queries below cover both cases (for BigQuery Standard SQL).
Case 1 - repeated field
#standardSQL
SELECT name
FROM `project.dataset.table`
WHERE 7 < (
SELECT DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY) distance
FROM UNNEST(datetime) datetime
ORDER BY IFNULL(distance, 777)
LIMIT 1
)
You can test and play with it using dummy data, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Adam' name,
[DATETIME '2018-07-26T17:55:03',
'2018-07-27T17:55:03',
'2018-06-29T17:55:03',
'2018-07-16T17:55:03',
'2018-08-19T17:55:03',
'2018-07-14T17:55:03'] datetime UNION ALL
SELECT 'Paul', [DATETIME '2018-08-26T17:55:03',
'2018-08-18T17:55:03',
'2018-06-20T17:55:03',
'2018-08-09T17:55:03',
'2018-07-16T17:55:03']
)
SELECT name
FROM `project.dataset.table`
WHERE 7 < (
SELECT DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY) distance
FROM UNNEST(datetime) datetime
ORDER BY IFNULL(distance, 777)
LIMIT 1
)
Case 2 - regular (not repeated field)
#standardSQL
SELECT name FROM (
SELECT name,
DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY
) distance
FROM `project.dataset.table`
)
GROUP BY name
HAVING MIN(distance) > 7
Dummy data example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'Adam' name, DATETIME '2018-07-26T17:55:03' datetime UNION ALL
SELECT 'Adam', '2018-07-27T17:55:03' UNION ALL
SELECT 'Adam', '2018-06-29T17:55:03' UNION ALL
SELECT 'Adam', '2018-07-16T17:55:03' UNION ALL
SELECT 'Adam', '2018-08-19T17:55:03' UNION ALL
SELECT 'Adam', '2018-07-14T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-26T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-18T17:55:03' UNION ALL
SELECT 'Paul', '2018-06-20T17:55:03' UNION ALL
SELECT 'Paul', '2018-08-09T17:55:03' UNION ALL
SELECT 'Paul', '2018-07-16T17:55:03'
)
SELECT name FROM (
SELECT name,
DATETIME_DIFF(
datetime,
LAG(datetime) OVER(PARTITION BY name ORDER BY datetime),
DAY
) distance
FROM `project.dataset.table`
)
GROUP BY name
HAVING MIN(distance) > 7
Both return the same result:
Row name
1 Paul
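Case 2 can be tried locally with SQLite, where julianday() differences stand in for DATETIME_DIFF(..., DAY). In this sketch the column is called dt instead of datetime purely for readability, the table name events is made up, and SQLite 3.25+ is assumed for LAG. As in the BigQuery version, a name with a single row only ever has a NULL distance, so its MIN is NULL and it is filtered out.

```python
import sqlite3

# (name, ISO timestamp) rows, i.e. the "regular (not repeated) field" layout.
ROWS = [
    ("Adam", "2018-07-26T17:55:03"), ("Adam", "2018-07-27T17:55:03"),
    ("Adam", "2018-06-29T17:55:03"), ("Adam", "2018-07-16T17:55:03"),
    ("Adam", "2018-08-19T17:55:03"), ("Adam", "2018-07-14T17:55:03"),
    ("Paul", "2018-08-26T17:55:03"), ("Paul", "2018-08-18T17:55:03"),
    ("Paul", "2018-06-20T17:55:03"), ("Paul", "2018-08-09T17:55:03"),
    ("Paul", "2018-07-16T17:55:03"),
]

QUERY = """
SELECT name
FROM (
    SELECT name, dt,
           LAG(dt) OVER (PARTITION BY name ORDER BY dt) AS prev_dt
    FROM events
)
GROUP BY name
-- julianday(NULL) is NULL for each name's first row, and MIN skips NULLs
HAVING MIN(julianday(dt) - julianday(prev_dt)) > 7
ORDER BY name
"""


def names_always_more_than_7_days_apart(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (name TEXT, dt TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return [name for (name,) in conn.execute(QUERY)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(names_always_more_than_7_days_apart(ROWS))
```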

SQL counting days with gap / overlapping

I am working on a "counting days" problem almost identical to this one. I have a list of date ranges and need to count how many days are used, excluding duplicates and handling the gaps. Same input and output.
From: Markus Jarderot
Input
ID d1 d2
1 2011-08-01 2011-08-08
1 2011-08-02 2011-08-06
1 2011-08-03 2011-08-10
1 2011-08-12 2011-08-14
2 2011-08-01 2011-08-03
2 2011-08-02 2011-08-06
2 2011-08-05 2011-08-09
Output
ID hold_days
1 11
2 8
SQL to find time elapsed from multiple overlapping intervals
But for the life of me I couldn't understand Markus Jarderot's solution.
SELECT DISTINCT
t1.ID,
t1.d1 AS date,
-DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2 -- Join for any events occurring while this
ON t2.ID = t1.ID -- is starting. If this is a start point,
AND t2.d1 <> t1.d1 -- it won't match anything, which is what
AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
Why does DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) take MIN(d1) from the entire table? Is that regardless of ID?
And what does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Is that to ensure only overlapping intervals are counted?
Same thing with the GROUP BY: is it there so that identical periods are discarded? I tried to trace the solution by hand but got more confused.
This is mostly a duplicate of my answer here (including explanation) but with the inclusion of grouping on an id column. It should use a single table scan and does not require a recursive sub-query factoring clause (CTE) or self joins.
Oracle 11g R2 Schema Setup:
CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;
Query 1:
SELECT id,
SUM( days ) AS total_days
FROM (
SELECT id,
dt - LAG( dt ) OVER ( PARTITION BY id
ORDER BY dt ) + 1 AS days,
start_end
FROM (
SELECT id,
dt,
CASE SUM( value ) OVER ( PARTITION BY id
ORDER BY dt ASC, value DESC, ROWNUM ) * value
WHEN 1 THEN 'start'
WHEN 0 THEN 'end'
END AS start_end
FROM your_table
UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
)
WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id
Results:
| ID | TOTAL_DAYS |
|----|------------|
| 1 | 25 |
| 2 | 13 |
| 3 | 9 |
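The core of the query is a sweep: every start counts +1, every end counts -1, and an island of overlapping intervals begins when the running sum rises to 1 and ends when it falls back to 0. Here is a plain-Python sketch of that same idea (not a line-by-line translation of the Oracle SQL), using the schema-setup data above and counting both endpoints as the query does; `total_covered_days` is an illustrative name.

```python
from collections import defaultdict
from datetime import date

# (id, start_date, end_date) from the schema setup above; the usr column is omitted.
ROWS = [
    (1, "2017-06-01", "2017-06-03"), (1, "2017-06-02", "2017-06-04"),
    (1, "2017-06-06", "2017-06-06"), (1, "2017-06-07", "2017-06-07"),
    (1, "2017-06-11", "2017-06-20"), (1, "2017-06-14", "2017-06-15"),
    (1, "2017-06-22", "2017-06-25"), (1, "2017-06-24", "2017-06-28"),
    (1, "2017-06-27", "2017-06-30"), (1, "2017-06-27", "2017-06-28"),
    (2, "2011-08-01", "2011-08-08"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-03", "2011-08-10"), (2, "2011-08-12", "2011-08-14"),
    (3, "2011-08-01", "2011-08-03"), (3, "2011-08-02", "2011-08-06"),
    (3, "2011-08-05", "2011-08-09"),
]


def total_covered_days(rows):
    """Return {id: number of distinct days covered}, both endpoints inclusive."""
    events = defaultdict(list)
    for key, start, end in rows:
        events[key].append((date.fromisoformat(start), 1))   # like start_date AS 1
        events[key].append((date.fromisoformat(end), -1))    # like end_date AS -1
    totals = {}
    for key, evs in events.items():
        # On the same date, starts (+1) sort before ends (-1): "ORDER BY dt ASC, value DESC".
        evs.sort(key=lambda e: (e[0], -e[1]))
        running, island_start, total = 0, None, 0
        for day, value in evs:
            running += value
            if value == 1 and running == 1:        # first open interval: island starts
                island_start = day
            elif value == -1 and running == 0:     # last open interval closed: island ends
                total += (day - island_start).days + 1
        totals[key] = total
    return totals


if __name__ == "__main__":
    print(total_covered_days(ROWS))
```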
The brute force method is to create all days (in a recursive query) and then count:
with dates(id, day, d2) as
(
select id, d1 as day, d2 from mytable
union all
select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;
Unfortunately there is a bug in some Oracle versions and recursive queries with dates don't work there. So try this code and see whether it works in your system. (I have Oracle 11.2 and the bug still exists there; so I guess you need Oracle 12c.)
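SQLite's recursive CTEs do not have that date bug, so the brute-force expansion can be checked there. A minimal sketch using the question's data; note that enumerating days counts both endpoints, so it reports 13 and 9 rather than the question's 11 and 8, which count d2 - d1 per island. The helper name is illustrative.

```python
import sqlite3

# (id, d1, d2) from the question.
ROWS = [
    (1, "2011-08-01", "2011-08-08"), (1, "2011-08-02", "2011-08-06"),
    (1, "2011-08-03", "2011-08-10"), (1, "2011-08-12", "2011-08-14"),
    (2, "2011-08-01", "2011-08-03"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-05", "2011-08-09"),
]

QUERY = """
WITH RECURSIVE dates(id, day, d2) AS (
    SELECT id, d1, d2 FROM mytable
    UNION ALL
    -- step one day at a time until the interval's end date is reached
    SELECT id, date(day, '+1 day'), d2 FROM dates WHERE day < d2
)
SELECT id, COUNT(DISTINCT day) AS days_used
FROM dates
GROUP BY id
ORDER BY id
"""


def count_distinct_days(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, d1 TEXT, d2 TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_distinct_days(ROWS))
```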
I guess Markus' idea is to find all starting points that are not within other ranges and all ending points that aren't. Then just take the first starting point till the first ending point, then the next starting point till the next ending point, etc. As Markus isn't using a window function to number starting and ending points, he must find a more complicated way to achieve this. Here is the query with ROW_NUMBER. Maybe this gives you a start what to look for in Markus' query.
select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
select id, d1 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d1 > m2.d1 and m1.d1 <= m2.d2
)
) startpoint
join
(
select id, d2 as day, row_number() over (partition by id order by d1) as rn
from mytable m1
where not exists
(
select *
from mytable m2
where m1.id = m2.id
and m1.d2 >= m2.d1 and m1.d2 < m2.d2
)
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
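Here is the same start-point/end-point pairing run in an in-memory SQLite database from Python, mostly as a way to check it against the question's expected output (11 and 8). julianday() differences replace Oracle's date subtraction, window functions need SQLite 3.25 or newer, and the helper name is illustrative.

```python
import sqlite3

# (id, d1, d2) from the question.
ROWS = [
    (1, "2011-08-01", "2011-08-08"), (1, "2011-08-02", "2011-08-06"),
    (1, "2011-08-03", "2011-08-10"), (1, "2011-08-12", "2011-08-14"),
    (2, "2011-08-01", "2011-08-03"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-05", "2011-08-09"),
]

QUERY = """
SELECT s.id, CAST(SUM(julianday(e.day) - julianday(s.day)) AS INTEGER) AS hold_days
FROM (
    -- start points that do not fall inside another interval of the same id
    SELECT id, d1 AS day, ROW_NUMBER() OVER (PARTITION BY id ORDER BY d1) AS rn
    FROM mytable m1
    WHERE NOT EXISTS (SELECT 1 FROM mytable m2
                      WHERE m1.id = m2.id AND m1.d1 > m2.d1 AND m1.d1 <= m2.d2)
) s
JOIN (
    -- end points that do not fall inside another interval of the same id
    SELECT id, d2 AS day, ROW_NUMBER() OVER (PARTITION BY id ORDER BY d1) AS rn
    FROM mytable m1
    WHERE NOT EXISTS (SELECT 1 FROM mytable m2
                      WHERE m1.id = m2.id AND m1.d2 >= m2.d1 AND m1.d2 < m2.d2)
) e ON e.id = s.id AND e.rn = s.rn
GROUP BY s.id
ORDER BY s.id
"""


def hold_days(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mytable (id INTEGER, d1 TEXT, d2 TEXT)")
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(hold_days(ROWS))
```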
If all your intervals start at different dates, consider them in ascending order by d1 counting how many days are from d1 to the next interval.
You can discard an interval if it is contained in another one.
The last interval won't have a follower.
This query should give you how many days each interval contributes:
select a.id, a.d1,nvl(min(b.d1), a.d2) - a.d1
from orders a
left join orders b
on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
group by a.id, a.d1
Then group by id and sum the days.
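Putting the three steps together, a sketch in SQLite might look like this. The containment filter is one reading of "discard an interval if it is contained in another one" (strictly contained, same id); without it, ID 1's interval 2011-08-02..2011-08-06 would add days already credited to 2011-08-01..2011-08-08. Like the approach itself, it assumes no two intervals of the same id share a start date; COALESCE replaces Oracle's nvl and julianday() replaces date subtraction.

```python
import sqlite3

# (id, d1, d2) from the question.
ROWS = [
    (1, "2011-08-01", "2011-08-08"), (1, "2011-08-02", "2011-08-06"),
    (1, "2011-08-03", "2011-08-10"), (1, "2011-08-12", "2011-08-14"),
    (2, "2011-08-01", "2011-08-03"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-05", "2011-08-09"),
]

QUERY = """
WITH kept AS (
    -- step 1: discard intervals strictly contained in another interval of the same id
    SELECT a.id, a.d1, a.d2
    FROM orders a
    WHERE NOT EXISTS (
        SELECT 1 FROM orders c
        WHERE c.id = a.id AND c.d1 <= a.d1 AND a.d2 <= c.d2
          AND (c.d1 < a.d1 OR a.d2 < c.d2)
    )
),
per_interval AS (
    -- step 2: days from d1 to the start of the next overlapping interval, else to d2
    SELECT a.id, a.d1,
           julianday(COALESCE(MIN(b.d1), a.d2)) - julianday(a.d1) AS days
    FROM kept a
    LEFT JOIN kept b
      ON a.id = b.id AND a.d1 < b.d1 AND a.d2 BETWEEN b.d1 AND b.d2
    GROUP BY a.id, a.d1, a.d2
)
-- step 3: group by id and sum the days
SELECT id, CAST(SUM(days) AS INTEGER) AS hold_days
FROM per_interval
GROUP BY id
ORDER BY id
"""


def hold_days_by_contribution(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, d1 TEXT, d2 TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(hold_days_by_contribution(ROWS))
```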

How to select the current date row from multiple date rows using system date

I have a table with many rows containing different dates; any one of them may be for the current period. There is no end-date field, otherwise I would have compared the system date against the from and to dates. I have tried using the MAX function, but it still displays many rows.
The data is grouped by a type identifier, so for each type there will be a current date row.
What can be the best query to get the current row (single) which is active considering the current date?
Below is the original query:
Select Group1,Group2,FromDate,FPFrom, FpTo FROM [DB].[dbo].[HGD] AS GD, [DB].[dbo].[HDT] AS TD WHERE GD.GRoup1 = TD.MainGroup
Thanks
SELECT TOP 1 * FROM yourTable WHERE procStart <= getdate() ORDER BY procStart DESC
or, to get one row per type, something like
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY TypeId ORDER BY procStart DESC) AS RN FROM yourTable WHERE procStart <= GETDATE()) DQ WHERE DQ.RN = 1
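The ROW_NUMBER version can be checked in SQLite (3.25+ assumed), with the reference date passed as a parameter instead of GETDATE() so the result is reproducible. The sample rows are the #temp data from the next answer; column names follow this answer (TypeId, procStart), and current_rows is a made-up helper name.

```python
import sqlite3

# (TypeId, procStart) rows taken from the #temp sample in the next answer.
ROWS = [
    (1, "2016-07-20"), (1, "2016-07-23"), (1, "2016-07-27"), (1, "2016-07-30"),
    (3, "2016-01-25"), (3, "2016-01-31"), (3, "2016-02-21"), (3, "2016-07-23"),
    (3, "2016-09-30"),
]

QUERY = """
SELECT TypeId, procStart
FROM (
    SELECT TypeId, procStart,
           -- rn = 1 is the most recent start on or before the reference date, per type
           ROW_NUMBER() OVER (PARTITION BY TypeId ORDER BY procStart DESC) AS rn
    FROM yourTable
    WHERE procStart <= ?
)
WHERE rn = 1
ORDER BY TypeId
"""


def current_rows(rows, as_of):
    """Return the active (TypeId, procStart) row per type as of the ISO date as_of."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE yourTable (TypeId INTEGER, procStart TEXT)")
        conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)
        return conn.execute(QUERY, (as_of,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(current_rows(ROWS, "2016-07-25"))
```

Types with no start on or before the reference date simply return no row.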
Please try to be more precise. I think you are looking for something like this:
CREATE TABLE #temp(
SomeDate datetime,
SomeType int
)
INSERT #temp VALUES
('2016-07-20', 1),
('2016-07-23', 1),
('2016-07-27', 1),
('2016-07-30', 1),
('2016-01-25', 3),
('2016-01-31', 3),
('2016-02-21', 3),
('2016-07-23', 3),
('2016-09-30', 3);
WITH Numbered AS
(
SELECT SomeDate, SomeType, ROW_NUMBER() OVER (PARTITION BY SomeType ORDER BY SomeDate) RowNumber
FROM #temp
),
Ranges AS
(
SELECT T1.SomeDate StartPeriod, COALESCE(T2.SomeDate, DATEADD(year,1,GETDATE())) EndPeriod, T1.SomeType
FROM Numbered T1
LEFT JOIN Numbered T2 ON T1.RowNumber+1=T2.RowNumber AND T1.SomeType=T2.SomeType
)
SELECT * FROM Ranges
WHERE GETDATE() BETWEEN StartPeriod AND EndPeriod
ORDER BY SomeType
This yields (the output depends on GETDATE(); this is for a run in late July 2016):
StartPeriod EndPeriod SomeType
2016-07-23 00:00:00.000 2016-07-27 00:00:00.000 1
2016-07-23 00:00:00.000 2016-09-30 00:00:00.000 3
@Paweł Dyl gave me an idea: I added a condition to my query and got the desired results.
The ToDate field was not available, so I derived one by adding 180 days to FromDate:
AND GetDate() BETWEEN cast(FromDate as Date) AND DATEADD(DAY, 180,cast(FromDate as DATE))
Thanks again.

SQL server query to find values grouped by one column but different in at least one of other columns

Please pardon the title of my question -
I have a table
TRXN (ID,ACCT_NUM,TRAN_MEMO,AMOUNT,DATE,LRN)
I want to write a query to pull records which have the same LRN but where at least one of the other columns has a different value. Is it possible?
In my answer I assume ID holds unique values, so I exclude it.
Table created:
CREATE TABLE #TRXN (ID INT IDENTITY(1, 1)
,ACCT_NUM INT
,TRAN_MEMO INT
,AMOUNT INT
,[DATE] DATE
,LRN INT
)
Sample data inserted
INSERT INTO #TRXN VALUES (1, 2, 2, '1 jan 2000', 2)
,(2, 2, 2, '2 jan 2000', 2)
,(1, 2, 2, '1 jan 2000', 2)
,(1, 2, 2, '1 jan 2000', 3)
Rows that have the same LRN but differ in at least one of the other columns:
;WITH C AS(
SELECT ROW_NUMBER() OVER (PARTITION BY ACCT_NUM, TRAN_MEMO, AMOUNT, [DATE], LRN ORDER BY ACCT_NUM, TRAN_MEMO, AMOUNT, [DATE], LRN) AS Rn
,ID, ACCT_NUM, TRAN_MEMO, AMOUNT, [DATE], LRN
FROM #TRXN WHERE LRN IN(
SELECT LRN FROM #TRXN GROUP BY LRN HAVING COUNT(ID) > 1)
)
SELECT ID, ACCT_NUM, TRAN_MEMO, AMOUNT, [DATE], LRN
FROM C WHERE Rn = 1
Output:
ID ACCT_NUM TRAN_MEMO AMOUNT DATE LRN
---------------------------------------------
1 1 2 2 2000-01-01 2
2 2 2 2 2000-01-02 2
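If "different in at least one column" should also exclude LRNs whose rows are all exact duplicates (the ROW_NUMBER query above still returns one row for such an LRN), one variant is to remove duplicates first and then keep only LRNs with more than one distinct row. A SQLite sketch, with a hypothetical LRN 4 added to show the difference; note that the ID column cannot be returned this way, because duplicates with different IDs have been collapsed.

```python
import sqlite3

ROWS = [  # (acct_num, tran_memo, amount, date, lrn)
    (1, 2, 2, "2000-01-01", 2),
    (2, 2, 2, "2000-01-02", 2),
    (1, 2, 2, "2000-01-01", 2),
    (1, 2, 2, "2000-01-01", 3),
    (5, 5, 5, "2000-02-01", 4),  # hypothetical extra LRN whose rows are exact duplicates
    (5, 5, 5, "2000-02-01", 4),
]

QUERY = """
WITH distinct_rows AS (
    -- collapse exact duplicates; the unique ID column is deliberately left out
    SELECT DISTINCT acct_num, tran_memo, amount, date, lrn FROM trxn
)
SELECT acct_num, tran_memo, amount, date, lrn
FROM distinct_rows
-- keep only LRNs that still have more than one distinct row
WHERE lrn IN (SELECT lrn FROM distinct_rows GROUP BY lrn HAVING COUNT(*) > 1)
ORDER BY lrn, date, acct_num
"""


def differing_rows(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE trxn (id INTEGER PRIMARY KEY, acct_num INTEGER, "
            "tran_memo INTEGER, amount INTEGER, date TEXT, lrn INTEGER)"
        )
        conn.executemany(
            "INSERT INTO trxn (acct_num, tran_memo, amount, date, lrn) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in differing_rows(ROWS):
        print(row)
```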
Why not simply use GROUP BY:
SELECT COUNT(1) AS numberOfGroupedRows,ID,ACCT_NUM,TRAN_MEMO,AMOUNT,DATE,LRN
FROM TRXN GROUP BY ID,ACCT_NUM,TRAN_MEMO,AMOUNT,DATE,LRN
GROUP BY collapses identical rows into one row.