Select max key from joined keys - sql

I have a table that contains keys that have been changed to a different key. These are laid out like this:
origkey newkey
1 2
2 3
4 5
6 7
7 8
8 9
9 10
What I'm trying to accomplish is a query that takes the origkey and finds the max newkey for each one. In the example above, the results would look like:
origkey maxkey
1 3
4 5
6 10
If I knew the maximum number of times that a key could have been changed, I would just add that many self joins and get it from there. Unfortunately, I don't know how many times it could have changed in the past. Is there a way to keep self joining until it finds a null? The following query will return the changed keys into new columns, but I think I'm going down the wrong road here since this will get the 1 -> 3 change, but not the 6 -> 10 change.
select a.origkey
,a.newkey
,b.newkey newkey1
,c.newkey newkey2
from changedkeys a
Left Outer Join changedkeys b on a.newkey=b.origkey
Left Outer Join changedkeys c on b.newkey=c.origkey

There is a way. It is called a recursive CTE:
with cte as (
select origkey, newkey, 1 as lev
from table1
union all
select cte.origkey, t1.newkey, lev + 1
from cte join
table1 t1
on cte.newkey = t1.origkey
)
select origkey, newkey as newestkey
from (select cte.*, row_number() over (partition by origkey order by lev desc) as seqnum
from cte
) t
where seqnum = 1;
Note that this assumes that there are no cycles in the key definitions, as in the example in your question. If this is a possibility, the recursive CTE can be modified to handle this.
EDIT:
If you have potential cycles in the data, then try this:
with cte as (
select origkey, newkey, 1 as lev, ',' + cast(newkey as varchar(8000)) + ',' as keys
from table1
union all
select cte.origkey, t1.newkey, cte.lev + 1, keys + cast(t1.newkey as varchar(8000)) + ','
from cte join
table1 t1
on cte.newkey = t1.origkey
where cte.keys not like '%,' + cast(t1.newkey as varchar(8000)) + ',%'
)
select origkey, newkey as newestkey
from (select cte.*, row_number() over (partition by origkey order by lev desc) as seqnum
from cte
) t
where seqnum = 1;
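As a rough, runnable illustration of the recursive CTE idea, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name changedkeys is taken from the question; the extra NOT IN filter is an assumption added here so that only keys that start a chain are reported, matching the expected output in the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real table (assumption:
# SQLite is used here only so the sketch can be executed anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changedkeys (origkey INTEGER, newkey INTEGER)")
conn.executemany(
    "INSERT INTO changedkeys VALUES (?, ?)",
    [(1, 2), (2, 3), (4, 5), (6, 7), (7, 8), (8, 9), (9, 10)],
)

query = """
WITH RECURSIVE chain(origkey, newkey, lev) AS (
    -- anchor: every recorded change is a chain of length 1
    SELECT origkey, newkey, 1 FROM changedkeys
    UNION ALL
    -- recursive step: follow the chain one change further
    SELECT chain.origkey, c.newkey, chain.lev + 1
    FROM chain
    JOIN changedkeys AS c ON chain.newkey = c.origkey
)
SELECT origkey, newkey
FROM chain AS ch
WHERE lev = (SELECT MAX(lev) FROM chain WHERE origkey = ch.origkey)
  -- keep only keys that were never themselves the result of a change
  AND origkey NOT IN (SELECT newkey FROM changedkeys)
ORDER BY origkey
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Without the NOT IN filter the query also returns intermediate keys such as 2 and 7, which is what the answer's query does as written.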

Related

Frequency of all combinations of values for certain column

I have a dataset in SQL Server 2012 with a column for id and value, like this:
[id] [value]
--------------
A 15
A 11
A 11
B 13
B 15
B 12
C 12
C 13
D 13
D 12
My goal is to get a frequency count of all combinations of [value], with two caveats:
Order doesn't matter, so [11,12,15] is not counted separately from [12,11,15]
Repeated values are counted separately, so [11,11,12,15] is counted separately from [11,12,15]
I'm interested in all combinations, of any length (not just pairs)
So the outcome would look like:
[combo] [frequency]
---------------------
11,11,15 1
12,13,15 1
12,13 2
I've seen answers here involving recursion that answer similar questions but where order counts, and answers here involving self-joins that yield pair-wise combinations. These come close but I'm not quite sure how to adapt for my specific needs.
You can use string_agg():
select vals, count(*) as frequency
from (select string_agg(value, ',') within group (order by value) as vals, id
from t
group by id
) i
group by vals;
SQL Server 2012 doesn't support string_agg() but you can use the XML hack:
select vals, count(*) as frequency
from (select id,
stuff( (select concat(',', value)
from t t2
where t2.id = i.id
order by t2.value
for xml path ('')
), 1, 1, ''
) as vals
from (select distinct id from t) i
) i
group by vals;
Your number string is just all the values with the same id in increasing order. So I'm treating the lowest id as a canonical name for the full sequence and all its matches. This spares all the string manipulations though you can expand as necessary.
Just tag each duplicate value with a counter and then look for groups that pair up completely.
with data as (
select id, value,
row_number() over (partition by id, value order by value) as rn
from t -- table name assumed; the original omitted the FROM clause
), matches as (
select l.id, r.id as match
from data l full outer join data r on
l.value = r.value and l.rn = r.rn and l.id <= r.id
group by l.id
having count(l.id) = count(*) and count(r.id) = count(*)
)
select id, count(match) as frequency
from matches
group by id;
The logic in the middle query is also easily adaptable for finding subset of values in common.
You can achieve this using CTEs and row_number functions.
DECLARE @table table(id CHAR(1), val int)
insert into @table VALUES
('A',15),
('A',11),
('A',11),
('B',13),
('B',15),
('B',12),
('C',12),
('C',13),
('D',13),
('D',12);
;WITH CTE_rnk as
(
SELECT id,val, row_number() over (partition by id order by val) as rnk
from @table
),
CTE_concat as
(
SELECT id, cast(val as varchar(100)) as val, rnk
from CTE_rnk
where rnk =1
union all
SELECT r.id, cast(concat(c.val,',',r.val) as varchar(100)) as val,r.rnk
from CTE_rnk as r
inner join CTE_concat as c
on r.rnk = c.rnk+1
and r.id = c.id
),
CTE_maxcombo as
(
SELECT id,val, row_number() over(partition by id order by rnk desc) as rnk
from CTE_concat
)
select val as combo, count(*) as frequency
from CTE_maxcombo where rnk = 1
group by val
+----------+-----------+
| combo | frequency |
+----------+-----------+
| 11,11,15 | 1 |
| 12,13 | 2 |
| 12,13,15 | 1 |
+----------+-----------+
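The answers above share one idea: sort the values within each id, join them into a canonical string, and count identical strings. A minimal sketch of that idea in plain Python, using the sample data from the question (the variable names are illustrative, not from the original):

```python
from collections import Counter, defaultdict

# Sample rows from the question: (id, value)
rows = [
    ("A", 15), ("A", 11), ("A", 11),
    ("B", 13), ("B", 15), ("B", 12),
    ("C", 12), ("C", 13),
    ("D", 13), ("D", 12),
]

# Collect the values belonging to each id.
values_by_id = defaultdict(list)
for id_, value in rows:
    values_by_id[id_].append(value)

# Sorting makes the combination order-independent; duplicates are kept,
# so 11,11,15 stays distinct from 11,15.
frequency = Counter(
    ",".join(str(v) for v in sorted(values))
    for values in values_by_id.values()
)

for combo, count in sorted(frequency.items()):
    print(combo, count)
```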

concatenate recursive cross join

I need to concatenate the names in a recursive cross-join fashion. I don't know how to do this; I have tried a CTE using WITH RECURSIVE, but without success.
I have a table like this:
group_id | name
---------------
13 | A
13 | B
19 | C
19 | D
31 | E
31 | F
31 | G
Desired output:
combinations
------------
ACE
ACF
ACG
ADE
ADF
ADG
BCE
BCF
BCG
BDE
BDF
BDG
Of course, the results should multiply if I add a 4th (or more) group.
Native Postgresql Syntax:
WITH RECURSIVE cte1 AS
(
SELECT *, DENSE_RANK() OVER (ORDER BY group_id) AS rn
FROM mytable
),cte2 AS
(
SELECT
CAST(name AS VARCHAR(4000)) AS name,
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
CAST(CONCAT(c2.name,c1.name) AS VARCHAR(4000)) AS name
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT name as combinations
FROM cte2
WHERE LENGTH(name) = (SELECT MAX(rn) FROM cte1)
ORDER BY name;
Before:
I hope you don't mind that I use SQL Server syntax:
Sample:
CREATE TABLE #mytable(
ID INTEGER NOT NULL
,TYPE VARCHAR(MAX) NOT NULL
);
INSERT INTO #mytable(ID,TYPE) VALUES (13,'A');
INSERT INTO #mytable(ID,TYPE) VALUES (13,'B');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'C');
INSERT INTO #mytable(ID,TYPE) VALUES (19,'D');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'E');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'F');
INSERT INTO #mytable(ID,TYPE) VALUES (31,'G');
Main query:
WITH cte1 AS
(
SELECT *, rn = DENSE_RANK() OVER (ORDER BY ID)
FROM #mytable
),cte2 AS
(
SELECT
TYPE = CAST(TYPE AS VARCHAR(MAX)),
rn
FROM cte1
WHERE rn = 1
UNION ALL
SELECT
[Type] = CAST(CONCAT(c2.TYPE,c1.TYPE) AS VARCHAR(MAX))
,c1.rn
FROM cte1 c1
JOIN cte2 c2
ON c1.rn = c2.rn + 1
)
SELECT *
FROM cte2
WHERE LEN(Type) = (SELECT MAX(rn) FROM cte1)
ORDER BY Type;
I've assumed that the order of the "cross join" depends on ascending ID.
cte1 generates DENSE_RANK() because your IDs contain gaps
cte2 is the recursive part with CONCAT
the main query just filters on the required length and sorts the strings
The recursive query is a bit simpler in Postgres:
WITH RECURSIVE t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name
FROM tbl
)
, cte AS (
SELECT grp, name
FROM t
WHERE grp = 1
UNION ALL
SELECT t.grp, c.name || t.name
FROM cte c
JOIN t ON t.grp = c.grp + 1
)
SELECT name AS combi
FROM cte
WHERE grp = (SELECT max(grp) FROM t)
ORDER BY 1;
The basic logic is the same as in the SQL Server version provided by @lad2025; I added a couple of minor improvements.
Or you can use a simple version if your maximum number of groups is not too big (can't be very big, really, since the result set grows exponentially). For a maximum of 5 groups:
WITH t AS ( -- to produce gapless group numbers
SELECT dense_rank() OVER (ORDER BY group_id) AS grp, name AS n
FROM tbl
)
SELECT concat(t1.n, t2.n, t3.n, t4.n, t5.n) AS combi
FROM (SELECT n FROM t WHERE grp = 1) t1
LEFT JOIN (SELECT n FROM t WHERE grp = 2) t2 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 3) t3 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 4) t4 ON true
LEFT JOIN (SELECT n FROM t WHERE grp = 5) t5 ON true
ORDER BY 1;
Probably faster for few groups. LEFT JOIN .. ON true makes this work even if higher levels are missing. concat() ignores NULL values. Test with EXPLAIN ANALYZE to be sure.
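For comparison, the result the queries above aim for is the Cartesian product of the groups, taken in ascending group_id order. A minimal sketch in plain Python with the sample data from the question (variable names are illustrative assumptions):

```python
from itertools import product

# Sample rows from the question: (group_id, name)
rows = [
    (13, "A"), (13, "B"),
    (19, "C"), (19, "D"),
    (31, "E"), (31, "F"), (31, "G"),
]

# Bucket names by group; gaps in group_id do not matter here because
# only the relative order of the groups is used.
names_by_group = {}
for group_id, name in rows:
    names_by_group.setdefault(group_id, []).append(name)

ordered_groups = [sorted(names_by_group[g]) for g in sorted(names_by_group)]

# One name from each group, concatenated in group order.
combinations = ["".join(parts) for parts in product(*ordered_groups)]
print(combinations)
```

Adding a fourth group multiplies the number of results by the size of that group, which is why the result set grows so quickly.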

SQL query to select union of properties

There is a table:
ID INDEX PROPERTY VALUE
-----------------------------
1 1 p1 v1
2 1 p3 v3
3 2 p2 v2
4 2 p3 v3a
5 3 p1 v1a
6 3 p2 v2a
7 3 p3 v3b
I need to select the union of all PROPERTY values where INDEX=1 or INDEX=2 (INDEX=3 is of no interest). At the same time, the VALUE of a PROPERTY should be taken from INDEX=2 if it exists, otherwise from INDEX=1; i.e. I expect 3 properties in the result set: p1=v1, p2=v2, p3=v3a
How can I compose an SQL query (SQL Server and Oracle) for this task without using a full outer join?
Adding to the suggested answer to select union of all PROPERTY you can write as:
;with cte as
(
select row_number() over ( partition by PROPERTY order by [INDEX] desc)
as rownum,
PROPERTY,VALUE
from Test
where [INDEX] in (1,2)
)
SELECT top 1 STUFF(
(SELECT ',' + cte1.PROPERTY + '=' + cte1.VALUE
FROM cte AS cte1
WHERE cte1.rownum = cte.rownum
FOR XML PATH('')), 1, 1, '') AS PROPERTY
FROM cte
where rownum = 1
I see. You want the result set to have one row for each Property, only from indexes 1 and 2, with preference given to 2 when there are duplicates.
You can do this using window functions:
select t.*
from (select t.*,
row_number() over (partition by property order by index desc) as seqnum
from table t
where index in (1, 2)
) t
where seqnum = 1;
You can also do this using union all:
select *
from table t
where index = 2
union all
select *
from table t
where index = 1 and
not exists (select 1 from table t2 where t2.property = t.property and t2.index = 2);
By the way, index is a lousy name for a column because it is a reserved word.
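To check the union all / not exists approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The table name props and the column name idx (used instead of the reserved word INDEX) are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# idx replaces the reserved word INDEX (naming assumption for this sketch).
conn.execute(
    "CREATE TABLE props (id INTEGER, idx INTEGER, property TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO props VALUES (?, ?, ?, ?)",
    [
        (1, 1, "p1", "v1"),
        (2, 1, "p3", "v3"),
        (3, 2, "p2", "v2"),
        (4, 2, "p3", "v3a"),
        (5, 3, "p1", "v1a"),
        (6, 3, "p2", "v2a"),
        (7, 3, "p3", "v3b"),
    ],
)

query = """
-- everything from index 2 wins outright
SELECT property, value FROM props WHERE idx = 2
UNION ALL
-- index 1 only fills in properties that index 2 does not define
SELECT property, value
FROM props AS p
WHERE idx = 1
  AND NOT EXISTS (
      SELECT 1 FROM props AS p2
      WHERE p2.property = p.property AND p2.idx = 2
  )
ORDER BY property
"""

result = conn.execute(query).fetchall()
print(result)
```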

Moving Average / Rolling Average

I have 2 columns in MS SQL: one is a serial number and the other is values. I need a third column that gives me the sum of the value in that row and the next 2.
Ex
SNo values
1 2
2 3
3 1
4 2
5 6
7 9
8 3
9 2
So I need third column which has sum of 2+3+1, 3+1+2 and So on, so the 8th and 9th row will not have any values:
1 2 6
2 3 6
3 1 4
4 2 5
5 1 6
7 2 7
8 3
9 2
Can the solution be generic, so that I can vary the window size from the current 3 numbers to a bigger number, say 60?
The following query demonstrates this:
WITH TempS as
(
SELECT s.SNo, s.value,
ROW_NUMBER() OVER (ORDER BY s.SNo) AS RowNumber
FROM MyTable AS s
)
SELECT m.SNo, m.value,
(
SELECT SUM(s.value)
FROM TempS AS s
WHERE RowNumber >= m.RowNumber
AND RowNumber <= m.RowNumber + 2
) AS Sum3InRow
FROM TempS AS m
In your question you asked to sum 3 consecutive values. You then modified your question to say that the number of consecutive records to sum could change. In the above query you simply need to change the m.RowNumber + 2 to whatever you need.
So if you need 60, then use
m.RowNumber + 59
As you can see it is very flexible since you only have to change one number.
In case the sno field is not sequential, you can use row_number() with aggregation:
with ss as (
select sno, [values], row_number() over (order by sno) as seqnum
from s
)
select s1.sno, s1.[values],
(case when count(s2.[values]) = 3 then sum(s2.[values]) end) as sum3
from ss s1 left outer join
ss s2
on s2.seqnum between s1.seqnum and s1.seqnum + 2
group by s1.sno, s1.[values];
select one.sno, one.values, one.values+two.values+three.values as thesum
from yourtable as one
left join yourtable as two
on one.sno=two.sno-1
left join yourtable as three
on one.sno=three.sno-2
Or, as requested in your comment, you could do this:
select sno, sum(values)
over (
order by sno
rows between current row and 2 following
)
from yourtable
If you need a fully generic solution, where you can sum, for example, current row + next row + 5th following row:
Step 1: Create a table listing the offsets needed. 0 = current row, 1 = next row, -1 = prev row, etc
SELECT * FROM (VALUES
(0),(1),(2)
) o(offset)
Step 2: Use that offset table in this template (via CTE or an actual table):
WITH o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset))
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM #t t1
INNER JOIN #t t2 CROSS JOIN o
ON t2.sno = t1.sno + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
Also, if SNo is not sequential, you can fetch ROW_NUMBER() and join on that instead.
WITH
o AS (SELECT * FROM (VALUES (0),(1),(2) ) o(offset)),
t AS (SELECT *,ROW_NUMBER() OVER(ORDER BY sno) i FROM #t)
SELECT
t1.sno,
t1.value,
SUM(t2.Value)
FROM t t1
INNER JOIN t t2 CROSS JOIN o
ON t2.i = t1.i + o.offset
GROUP BY t1.sno,t1.value
ORDER BY t1.sno
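The window logic that all of these queries implement can be stated compactly: for each row, sum that row and the next window - 1 rows, and return nothing when fewer rows remain. A minimal sketch in plain Python, using the input values from the question (the function name is a hypothetical helper, not part of any answer):

```python
def forward_rolling_sum(values, window):
    """Sum each value with the (window - 1) values that follow it.

    Positions without a full window ahead of them yield None, mirroring
    the empty cells in the question's expected output.
    """
    sums = []
    for start in range(len(values)):
        chunk = values[start:start + window]
        sums.append(sum(chunk) if len(chunk) == window else None)
    return sums


# Values from the question's input table, ordered by SNo. Position is
# used rather than SNo itself, so the missing SNo 6 does not matter.
values = [2, 3, 1, 2, 6, 9, 3, 2]
sums = forward_rolling_sum(values, 3)
print(sums)
```

Changing the second argument to 60 gives the larger window asked about; only that one number changes.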

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead(), I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN or NOT EXISTS between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(GameID)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
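The trick behind the islands query is that, within a run of consecutive IDs, the ID minus its row number is constant. A minimal sketch of the same grouping in plain Python (find_gaps is a hypothetical helper; the sample IDs are illustrative):

```python
from itertools import groupby


def find_gaps(ids):
    """Return (islands, gaps) as lists of inclusive (start, end) ranges."""
    ordered = sorted(set(ids))
    islands = []
    # value - position is constant inside a run of consecutive integers,
    # which is the same idea as GameId - ROW_NUMBER() in the query.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        members = [value for _, value in run]
        islands.append((members[0], members[-1]))
    # A gap sits between the end of one island and the start of the next.
    gaps = [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(islands, islands[1:])
    ]
    return islands, gaps


islands, gaps = find_gaps([1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18])
print(islands)
print(gaps)
```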
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
Try this (it covers IDs from 0 to 99999; if you need more, add further digit positions to the Numbers CTE below):
;WITH Digits AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
select u.Digit
+ t.Digit*10
+ h.Digit*100
+ th.Digit*1000
+ tth.Digit*10000
--Add 100000, 1000000 multipliers if required here.
as myId
from Digits u
cross join Digits t
cross join Digits h
cross join Digits th
cross join Digits tth
--Add the cross join for higher numbers
)
SELECT myId
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)
Problem: we need to find the gap range in id field
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
Solution
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id + 1 AS GapStart, nextId - 1 AS GapEnd FROM cte
WHERE id + 1 <> nextId
Output
GapStart GapEnd
----------- -----------
4 7
12 14
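The next-id approach can be tried without SQL Server. This minimal sketch uses Python's built-in sqlite3 module and the sample ids above; it swaps TOP 1 ... ORDER BY for MIN(), which is an equivalent way of finding the next id, and the table name tab1 is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany(
    "INSERT INTO tab1 VALUES (?, 'a')",
    [(i,) for i in (1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18)],
)

query = """
SELECT id + 1 AS gap_start, next_id - 1 AS gap_end
FROM (
    SELECT t.id,
           -- smallest id greater than the current one
           (SELECT MIN(t2.id) FROM tab1 AS t2 WHERE t2.id > t.id) AS next_id
    FROM tab1 AS t
) AS pairs
-- the last row has next_id NULL, so the comparison drops it
WHERE next_id <> id + 1
ORDER BY gap_start
"""

gap_rows = conn.execute(query).fetchall()
print(gap_rows)
```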