Moving Average / Rolling Average - sql

I have 2 columns in MS SQL: one is the serial number and the other is the value. I need a third column that gives me the sum of the value in that row and the next 2.
Ex
SNo values
1 2
2 3
3 1
4 2
5 6
7 9
8 3
9 2
So I need a third column containing the sums 2+3+1, 3+1+2, and so on; the 8th and 9th rows will not have any value:
1 2 6
2 3 6
3 1 4
4 2 5
5 1 6
7 2 7
8 3
9 2
Can the solution be generic, so that I can vary the window size from the current 3 numbers to a bigger number, say 60?

The following query demonstrates one way to do this:
WITH TempS as
(
SELECT s.SNo, s.value,
ROW_NUMBER() OVER (ORDER BY s.SNo) AS RowNumber
FROM MyTable AS s
)
SELECT m.SNo, m.value,
(
SELECT SUM(s.value)
FROM TempS AS s
WHERE RowNumber >= m.RowNumber
AND RowNumber <= m.RowNumber + 2
) AS Sum3InRow
FROM TempS AS m
In your question you asked to sum 3 consecutive values. You then modified the question to say that the number of consecutive records to sum could change. In the above query you simply need to change m.RowNumber + 2 to whatever you need. Note that the last rows, which have fewer than 3 rows available, get the sum of whatever rows remain rather than an empty value.
So if you need 60, then use
m.RowNumber + 59
As you can see it is very flexible since you only have to change one number.
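For anyone who wants to check the expected numbers outside the database, here is a minimal Python sketch of the same forward-looking window sum. The function name and the sample list are my own assumptions (the list is the second column of the expected output in the question), and unlike the query above it returns None when fewer than `window` rows remain, which is what the question asked for.

```python
from typing import List, Optional


def forward_window_sums(values: List[int], window: int) -> List[Optional[int]]:
    """Sum each value with the (window - 1) values that follow it.

    Rows that do not have a full window available get None, mirroring
    the blank cells in the question's expected output.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    sums: List[Optional[int]] = []
    for start in range(len(values)):
        chunk = values[start:start + window]
        sums.append(sum(chunk) if len(chunk) == window else None)
    return sums


# Sample values (assumption: second column of the expected output above).
values = [2, 3, 1, 2, 1, 2, 3, 2]
print(forward_window_sums(values, 3))
```

Changing the second argument to 60 gives the wider window without touching anything else, which is the same idea as changing the single number in the query.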

In case the sno field is not sequential, you can use row_number() with aggregation:
with ss as (
select sno, [values], row_number() over (order by sno) as seqnum
from s
)
select s1.sno, s1.[values],
-- only emit a sum when all 3 rows (current + next 2) exist
(case when count(s2.[values]) = 3 then sum(s2.[values]) end) as sum3
from ss s1 left outer join
ss s2
on s2.seqnum between s1.seqnum and s1.seqnum + 2
group by s1.sno, s1.[values];

select one.sno, one.[values], one.[values]+two.[values]+three.[values] as thesum
from yourtable as one
left join yourtable as two
on one.sno=two.sno-1
left join yourtable as three
on one.sno=three.sno-2
Or, as requested in your comment, you could do this:
select sno, sum([values])
over (
order by sno
rows between current row and 2 following
) as sum3
from yourtable

If you need a fully generic solution, where you can sum, for example, current row + next row + 5th following row:
Step 1: Create a table listing the offsets needed: 0 = current row, 1 = next row, -1 = previous row, etc.
SELECT * FROM (VALUES
(0),(1),(2)
) o(offset)
Step 2: Use that offset table in this template (via CTE or an actual table):
WITH o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset))
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM #t t1
INNER JOIN #t t2 CROSS JOIN o
ON t2.sno = t1.sno + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
Also, if SNo is not sequential, you can fetch ROW_NUMBER() and join on that instead.
WITH
o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset)),
t AS (SELECT *,ROW_NUMBER() OVER(ORDER BY sno) i FROM #t)
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM t t1
INNER JOIN t t2 CROSS JOIN o
ON t2.i = t1.i + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
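To make the offset idea concrete, here is a small Python sketch of what the row-number version computes. The function name and sample values are assumptions for illustration. Offsets that fall outside the table are simply skipped, which matches the inner join as long as 0 is one of the offsets (otherwise the SQL would drop rows that have no match at all, while this sketch would return 0 for them).

```python
from typing import Iterable, List


def offset_sums(values: List[int], offsets: Iterable[int]) -> List[int]:
    """For each position i, sum values[i + o] for every offset o in range."""
    offsets = list(offsets)
    n = len(values)
    result = []
    for i in range(n):
        # Keep only the offsets that land on an existing row.
        picked = [values[i + o] for o in offsets if 0 <= i + o < n]
        result.append(sum(picked))
    return result


values = [2, 3, 1, 2, 1, 2, 3, 2]   # sample data (assumption)
print(offset_sums(values, [0, 1, 2]))   # current row + next two
print(offset_sums(values, [0, 1, 5]))   # current + next + 5th following
```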

Related

How to get all the predecessors of a number in a SQL query

How can I get all the predecessors of a number in a SQL select statement?
I have this query:
SELECT
COUNT(CASE WHEN tb2.status = 'C' THEN 1 END) AS num_sales
FROM
table1 AS tb1
INNER JOIN
table2 AS tb2 ON tb1.id = tb2.id_sales
I get this result:
num_sales
7
5
4
3
1
0
I want:
num_sales predecessors
7 1,2,3,4,5,6,7
5 1,2,3,4,5
4 1,2,3,4
3 1,2,3
1 1
0
HELP!
With Standard SQL, you could use listagg():
select mynumber,
(select listagg(t2.mynumber, ',') within group (order by t2.mynumber)
from mytable t2
where t2.mynumber <= t.mynumber
) as predecessors
from mytable t;
Similar functionality exists in most databases, but the exact details of string aggregation often vary by database.
EDIT:
In Postgres, you would use generate_series():
select mynumber,
(select string_agg(gs.n::text, ',' order by gs.n)
from generate_series(1, t.mynumber, 1) gs(n)
) as predecessors
from mytable t;
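As a quick sanity check of the expected strings, the same series can be built in a couple of lines of Python. This is only a sketch; the function name is made up, and it returns an empty string for 0 where the SQL would return NULL.

```python
def predecessors(n: int) -> str:
    """Return '1,2,...,n' as a comma-separated string ('' when n < 1)."""
    return ",".join(str(i) for i in range(1, n + 1))


for num_sales in [7, 5, 4, 3, 1, 0]:
    print(num_sales, predecessors(num_sales))
```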

Repeating rows based on count in a different column - SQL

I have a table that holds IDs and count. I want to repeat the rows the number of times mentioned in the count.
My code:
create table #temp1(CID int, CVID int, count int)
insert #temp1
values
(9906, 4687, 4),
(9906, 4693, 5)
create table #temp2 (CID int,CVID int, count int,ro int)
;with t3 as (
select c.CID,c.CVID, c.count, row_number() over (partition by c.CID order by c.CID) ro
from #temp1 c
)
insert #temp2
select CID,CVID,count,ro from t3 where ro <= count
My code is missing something, so it is not producing the desired result. Any help?
You need a numbers table up to the maximum value of count column which can then be used to generate multiple rows. This number generation can be done using a recursive cte.
--Recursive CTE
with nums(n) as (select max(count) from #temp1
union all
select n-1
from nums
where n > 1
)
--Query to generate multiple rows
select t.*,nums.n as ro
from #temp1 t
join nums on nums.n <= t.count
Another option is an ad-hoc tally table.
Example
Select A.*
,Ro = B.N
From YourTable A
Join ( Select Top 1000 N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1 ) B on B.N<=A.[Count]
Returns
CID CVID COUNT Ro
9906 4687 4 1
9906 4687 4 2
9906 4687 4 3
9906 4687 4 4
9906 4693 5 1
9906 4693 5 2
9906 4693 5 3
9906 4693 5 4
9906 4693 5 5
I would use a recursive CTE, but directly:
with cte as (
select CID, CVID, count, 1 as ro
from #temp1
union all
select CID, CVID, count, ro + 1
from cte
where cte.ro < cte.count
)
select cte.*
from cte;
If your counts exceed 100, then you'll need to use option (maxrecursion 0).
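If it helps to see the target result independently of the CTE mechanics, here is a minimal Python sketch of the same expansion. The function name is an assumption; the rows are the two from the question.

```python
from typing import List, Tuple


def expand_rows(rows: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int, int]]:
    """Repeat each (cid, cvid, count) row `count` times, numbering the copies from 1."""
    expanded = []
    for cid, cvid, count in rows:
        for ro in range(1, count + 1):
            expanded.append((cid, cvid, count, ro))
    return expanded


rows = [(9906, 4687, 4), (9906, 4693, 5)]
for row in expand_rows(rows):
    print(row)
```

The inner loop plays the role of the recursive member of the CTE: it keeps producing a copy until the running number reaches count.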
Thanks, all, for the suggestions. I used the query below to solve my problem:
;with cte(cid, cvid,count, i) as
(
select cid
, cvid
, count
, 1
from #temp1
union all
select cid
, cvid
, count
, i + 1
from cte
where cte.i < cte.count
)
select *
from cte
order by
cid,count

Fetching the next 3 or adjacent rows based upon a condition in postgreSQL

I have a database of more than 10,000 rows. eg:
id text
1 abc
2 ghj
3 cde
4 hif
5 klm
6 bbc
7 jkl
8 mno
9 dbo
10 ijk
I need to fetch the next three rows where the text matches a condition.
For example, if I run a query with text like '%bc%', it should return the rows with ids 1,2,3,4,6,7,8,9, since rows #1 and #6 match.
Use the query below to get the desired result. I am assuming that "next" is based on id only and that id always increments by 1, as in the example in your question.
If id doesn't always increment by 1, first add a row number, then replace id with that row number in the t2 subquery and in the join condition.
select t1.id, t1.id_text
from test t1
join
(
select id from test where id_text like '%bc%'
UNION
select id+1 from test where id_text like '%bc%'
UNION
select id+2 from test where id_text like '%bc%'
UNION
select id+3 from test where id_text like '%bc%'
) t2
on t1.id = t2.id;
with -- Test data
t(i, x) as (values
(1,'abc'),(2,'ghj'),(3,'cde'),(4,'hif'),(5,'klm'),(6,'bbc'),(7,'jkl'),(8,'mno'),(9,'dbo'),(10,'ijk'))
select r.*
from
t as t0 cross join lateral (
select *
from t
where t.i >= t0.i
order by t.i
limit 4) as r
where t0.x like '%bc%'
order by r.i;
A lateral join lets the subquery refer to columns of the tables that precede it in the FROM clause.
You could use something like this:
SELECT DISTINCT next.*
FROM test, test next
WHERE test.text LIKE '%bc%'
AND next.id BETWEEN test.id AND test.id + 3
I am not going to assume that the ids have no gaps. One method uses lag():
select t.*
from (select t.*,
lag(text) over (order by id) as prev_text,
lag(text, 2) over (order by id) as prev_text2,
lag(text, 3) over (order by id) as prev_text3
from t
) t
where text like '%bc%' or
prev_text like '%bc%' or
prev_text2 like '%bc%' or
prev_text3 like '%bc%';
You can also do this with one comparison, using other window functions:
select id, text
from (select t.*,
sum( (text like '%bc%')::int ) over (order by id rows between 3 preceding and current row) as cnt
from t
) t
where cnt > 0;
With an index on id, this might be the fastest approach to solving the problem.
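Here is a plain-Python sketch of the "matching row plus the next three rows" logic, working on row positions rather than ids so that gaps in id do not matter. The function name and the `following` parameter are assumptions; the data is the sample from the question.

```python
from typing import List, Tuple


def rows_near_matches(rows: List[Tuple[int, str]], needle: str,
                      following: int = 3) -> List[Tuple[int, str]]:
    """Return every row whose text contains `needle`, plus the next
    `following` rows after each match, in the original order."""
    keep = set()
    for pos, (_id, text) in enumerate(rows):
        if needle in text:
            # Mark the match itself and up to `following` rows after it.
            keep.update(range(pos, min(pos + following + 1, len(rows))))
    return [rows[p] for p in sorted(keep)]


rows = [(1, "abc"), (2, "ghj"), (3, "cde"), (4, "hif"), (5, "klm"),
        (6, "bbc"), (7, "jkl"), (8, "mno"), (9, "dbo"), (10, "ijk")]
print([r[0] for r in rows_near_matches(rows, "bc")])
```

Using a set of positions means overlapping windows do not produce duplicate rows, which is the same reason the windowed sum only needs cnt > 0.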

In SQL, how to increment a variable in a CASE statement

So I have a table A as follows
Message code trig timestamp
a x 1 T1
a x 1 T2
a x 0 T3
b y 1 T4
b y 1 T5
a x 1 T6
I want the following result
Message code trig timestamp groupbycolumn
a x 1 T1 1
a x 1 T2 1
a x 0 T3 2
b y 1 T4 3
b y 1 T5 3
a x 1 T6 4
I need to group the rows according to message, code and trig, ordered by the timestamp. Whenever a new combination of message, code and trig values appears, it should get a new number in the groupby column. Note that a, x, 1 in the first line has groupby value 1 and the one in the last line has 4.
declare @chngeVal int;
set @chngeVal=0;
select n.Message,n.code,n.trig,
case when n.Message<>n.nextMessage or n.code<>n.nextCode or n.trig<>n.nextTrig
then @chngeVal+1
else @chngeVal
end as groupbycolumn,
n.timeStamp
from ( select Message,code,trig,timestamp,
lead(Message) over (order by timestamp asc) as nextMessage,
lead(code) over (order by timestamp asc) as nextCode,
lead(trig) over (order by timestamp asc) as nextTrig
from A ) n
If I could get the case to do @chngeVal = @chngeVal+1 it would work, but I cannot do that inside a case expression. Does anybody know how to change the value of a variable in a query?
Any idea would be much appreciated.
I broke the solution into a three part query using two CTEs:
CreateIds produces ids I use to identify the rows in the next two parts.
Firstrows gets only the rows that start each group, and determines the unique id for each group as well as the row id that starts the next group (NextGroupRowId).
Finally, I produce the result by joining Firstrows to a range of rows from CreateIds that have a rowId between the rowId of the first row and the rowId of NextGroupRowId - 1.
My feeling is that this is inefficient as heck, and there's a way to do this with a recursive CTE. But since you started using window functions I just went in that direction.
WITH createIds AS (
SELECT *
, ROW_NUMBER() OVER(ORDER BY [timestamp]) AS RowId
, DENSE_RANK() OVER(ORDER BY Message, code, trig DESC) AS GroupId
FROM src
)
, firstrows AS (
SELECT a.RowId
, ROW_NUMBER() OVER (ORDER BY a.RowId) AS OrderedGroupId
, LEAD(a.RowId, 1, NULL) OVER (ORDER BY a.RowId) NextGroupRowId
FROM createIds a
LEFT JOIN createIds b ON b.RowId = a.RowId - 1
WHERE a.GroupId != b.GroupId OR b.GroupId IS NULL
)
SELECT a.[Message], a.code, a.trig, a.[timestamp], r1.OrderedGroupId
FROM firstrows r1
INNER JOIN createIds a ON a.RowId >= r1.RowId AND (r1.NextGroupRowId IS NULL OR a.RowId < r1.NextGroupRowId)
ORDER BY a.[timestamp]
You can use the difference of row_number() values, or lag() with a cumulative sum:
select t.*,
-- a row starts a new group whenever it differs from the previous row
sum(case when message = prev_message and code = prev_code and trig = prev_trig
then 0 else 1
end) over (order by timestamp) as groupbycolumn
from (select a.*,
lag(message) over (order by timestamp) as prev_message,
lag(code) over (order by timestamp) as prev_code,
lag(trig) over (order by timestamp) as prev_trig
from a
) t
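The lag-plus-running-sum idea is easy to check by hand with a few lines of Python. This is just a sketch with an invented function name: compare each row to the previous one and bump a counter whenever it differs. The rows are assumed to be already ordered by timestamp.

```python
from typing import List, Tuple


def number_runs(rows: List[Tuple[str, str, int]]) -> List[int]:
    """Assign 1, 2, 3, ... to consecutive runs of identical rows."""
    group_ids = []
    current = 0
    previous = object()  # sentinel that never equals a real row
    for row in rows:
        if row != previous:
            current += 1  # a change starts a new group
        group_ids.append(current)
        previous = row
    return group_ids


# (Message, code, trig) in timestamp order T1..T6, from the question.
rows = [("a", "x", 1), ("a", "x", 1), ("a", "x", 0),
        ("b", "y", 1), ("b", "y", 1), ("a", "x", 1)]
print(number_runs(rows))
```

The counter here is exactly the variable the question wanted to increment inside the case expression; in SQL the cumulative sum over the 0/1 flags plays that role.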

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead(), I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
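For comparison, here is the same "look at the next id" idea as a short Python sketch. The function name and the sample ids are assumptions for illustration. Each id is paired with the one that follows it, and a gap is reported whenever they are not consecutive.

```python
from typing import Iterable, List, Tuple


def find_gaps(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) for every gap between ids."""
    ordered = sorted(set(ids))
    return [(cur + 1, nxt - 1)
            for cur, nxt in zip(ordered, ordered[1:])
            if nxt != cur + 1]


print(find_gaps([1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18]))
```

Like the queries above, this only reports gaps between existing ids; missing ids before the smallest or after the largest id are not detected.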
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN or NOT EXISTS between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(GameID)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
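The trick behind "gaps and islands" (value minus row number stays constant within a contiguous run) can be sketched in Python as follows. The function names and the sample ids are my own assumptions, not part of the query above.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def islands(ids: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (start, end) of each contiguous run of ids."""
    ordered = sorted(set(ids))
    result = []
    # value - position is constant inside a run and jumps at each gap.
    for _, grp in groupby(enumerate(ordered, start=1), key=lambda p: p[1] - p[0]):
        members = [value for _, value in grp]
        result.append((members[0], members[-1]))
    return result


def gaps_between(runs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Pair each island with the next one to get the gaps in between."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(runs, runs[1:])]


runs = islands([1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18])
print(runs)
print(gaps_between(runs))
```

Pairing each island with its successor corresponds to the self-join on IslandID + 1; because the sketch only pairs islands that both exist, it does not produce the trailing row with a NULL GapEnd that the LEFT JOIN version returns for the last island.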
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
Try this (this covers IDs up to 99999; if you need more, you can add more cross joins to the Numbers CTE below):
;WITH Digits AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
select u.Digit
+ t.Digit*10
+ h.Digit*100
+ th.Digit*1000
+ tth.Digit*10000
--Add 100000, 1000000 multipliers if required here.
as myId
from Digits u
cross join Digits t
cross join Digits h
cross join Digits th
cross join Digits tth
--Add the cross join for higher numbers
)
SELECT myId
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)
Problem: we need to find the gap range in id field
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
Solution
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id + 1 AS GapStart, nextId - 1 AS GapEnd FROM cte
WHERE id + 1 <> nextId
Output
GapStart GapEnd
----------- -----------
4 7
12 14