Running SUM in T-SQL - sql

Sorry for the vague title, but I wasn't sure what to call it.
I have a table looking like this:
+-----++-----+
| Id ||Count|
+-----++-----+
| 1 || 1 |
+-----++-----+
| 2 || 5 |
+-----++-----+
| 3 || 8 |
+-----++-----+
| 4 || 3 |
+-----++-----+
| 5 || 6 |
+-----++-----+
| 6 || 8 |
+-----++-----+
| 7 || 3 |
+-----++-----+
| 8 || 1 |
+-----++-----+
I'm trying to make a select from this table where every time the SUM of row1 + row2 + row3 (etc) reaches 10, then it's a "HIT", and the count starts over again.
Requested output:
+-----++-----++-----+
| Id ||Count|| HIT |
+-----++-----++-----+
| 1 || 1 || N | Count = 1
+-----++-----++-----+
| 2 || 5 || N | Count = 6
+-----++-----++-----+
| 3 || 8 || Y | Count = 14 (over 10)
+-----++-----++-----+
| 4 || 3 || N | Count = 3
+-----++-----++-----+
| 5 || 6 || N | Count = 9
+-----++-----++-----+
| 6 || 8 || Y | Count = 17 (over 10..)
+-----++-----++-----+
| 7 || 3 || N | Count = 3
+-----++-----++-----+
| 8 || 1 || N | Count = 4
+-----++-----++-----+
How do I do this with the best performance? I have no idea.

You can't do this using window/analytic functions, because the breakpoints are not known in advance. Sometimes, it is possible to calculate the breakpoints. However, in this case, the breakpoints depend on a non-linear function of the previous values (I can't think of a better word than "non-linear" right now). That is, sometimes adding "1" to an earlier value has zero effect on the calculation for the current row. Sometimes it has a big effect. The implication is that the calculation has to start at the beginning and iterate through the data.
A minor modification to the problem would be solvable using such functions. If the problem were, instead, to carry over the excess amount for each group (instead of restarting the sum), the problem would be solvable using cumulative sums (and some other trickery).
Recursive queries (which others have provided) or a sequential operation is the best way to approach this problem. Unfortunately, there is no set-based method for solving it.
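To make the distinction concrete, here is a minimal Python sketch (not SQL, and not part of any answer here) of both variants. The function names are made up for illustration, and the carry-over version assumes non-negative counts and that "carrying the excess" means keeping the running sum modulo the threshold. The reset version needs per-row state; the carry-over version only needs the plain cumulative sum, which is why window functions could handle it.

```python
from itertools import accumulate

def tag_hits_with_reset(counts, threshold=10):
    """Sequential pass: the running sum restarts after every hit."""
    running = 0
    hits = []
    for c in counts:
        running += c
        if running >= threshold:
            hits.append("Y")
            running = 0  # restart; the excess is discarded
        else:
            hits.append("N")
    return hits

def tag_hits_with_carry(counts, threshold=10):
    """Carry-over variant: a hit occurs whenever the plain cumulative sum
    crosses a multiple of the threshold, so no per-row state is needed."""
    totals = list(accumulate(counts))
    previous = [0] + totals[:-1]
    return ["Y" if cur // threshold > prev // threshold else "N"
            for prev, cur in zip(previous, totals)]

counts = [1, 5, 8, 3, 6, 8, 3, 1]  # the Count column from the question
print(tag_hits_with_reset(counts))  # ['N', 'N', 'Y', 'N', 'N', 'Y', 'N', 'N']
print(tag_hits_with_carry(counts))  # ['N', 'N', 'Y', 'N', 'Y', 'Y', 'N', 'N']
```

The two outputs differ at Id 5, which shows that discarding the excess changes later rows in a way a plain cumulative sum cannot reproduce.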

You could use a recursive query.
Please note that the following query assumes the id values are all in sequence; otherwise, use ROW_NUMBER() to create a new id.
WITH cte AS (
SELECT id, [Count], [Count] AS total_count
FROM Table1
WHERE id = 1
UNION ALL
SELECT t2.id,t2.[Count], CASE WHEN t1.total_count >= 10 THEN t2.[Count] ELSE t1.total_count + t2.[Count] END
FROM Table1 t2
INNER JOIN cte t1
ON t2.id = t1.id + 1
)
SELECT *
FROM cte
ORDER BY id

I'm really hoping someone can show us if it's possible to do this using straight-forward window functions. That's the real challenge.
In the meantime, here is how I would do it using recursion. This handles gaps in the sequence, and handles the edge case of the first row already being >= 10.
I also added the maxrecursion hint to remove the default recursion limit. But I honestly don't know how well it will run with larger amounts of data.
with NumberedRows as (
select Id, Cnt,
row_number() over(order by id) as rn
from CountTable
), RecursiveCTE as (
select Id, Cnt, rn,
case when Cnt >= 10 then 0 else Cnt end as CumulativeSum,
case when Cnt >= 10 then 'Y' else 'N' end as hit
from NumberedRows
where rn = 1
union all
select n.Id, n.Cnt, n.rn,
case when (n.Cnt + r.CumulativeSum) >= 10 then 0 else n.Cnt + r.CumulativeSum end as CumulativeSum,
case when (n.Cnt + r.CumulativeSum) >= 10 then 'Y' else 'N' end as hit
from RecursiveCTE r
join NumberedRows n
on n.rn = r.rn + 1
)
select Id, Cnt, hit
from RecursiveCTE
order by Id
option (maxrecursion 0)

How about this using Running Totals:
DECLARE @Data TABLE(
Id INT
,SubTotal INT
)
INSERT INTO @Data
VALUES(1, 5)
INSERT INTO @Data
VALUES(2, 3)
INSERT INTO @Data
VALUES(3, 4)
INSERT INTO @Data
VALUES(4, 4)
INSERT INTO @Data
VALUES(5, 7)
DECLARE @RunningTotal INT = 0
DECLARE @HitCount INT = 0
SELECT
@RunningTotal = CASE WHEN @RunningTotal < 10 THEN @RunningTotal + SubTotal ELSE SubTotal END
,@HitCount = @HitCount + CASE WHEN @RunningTotal >= 10 THEN 1 ELSE 0 END
FROM @Data ORDER BY Id
SELECT @HitCount -- Outputs 2
Having re-read the question I see this does not meet the required output - I'll leave the answer as it may be of some use to someone looking for an example of a running total solution to this type of problem that doesn't need each row tagged with a Y or an N.


Recursive loop in BigQuery for capped cumulative sum?

I'd like to be able to implement a "capped" cumulative sum in BigQuery using SQL.
Here's what I mean: I have a table whose rows have the amount by which a value is increased/decreased each day, but the value cannot go below 0 or above 100. I want to compute the cumulative sum of the changes to keep track of this value.
As an example, consider the following table:
day | change
--------------
1 | 70
2 | 50
3 | 20
4 | -30
5 | 10
6 | -90
7 | 20
I want to make a column that has the capped cumulative sum so that it looks like this:
day | change | capped cumsum
----------------------------
1 | 70 | 70
2 | 50 | 100
3 | 20 | 100
4 | -30 | 70
5 | 10 | 80
6 | -90 | 0
7 | 20 | 20
Simply doing SUM (change) OVER (ORDER BY day) and capping the values at 100 and 0 won't work. I need some sort of recursive loop and I don't know how to implement this in BigQuery.
Eventually I'd also like to do this over partitions, so that if I have something like
day | class | change
--------------
1 | A | 70
1 | B | 12
2 | A | 50
2 | B | 83
3 | A | -30
3 | B | 17
4 | A | 10
5 | A | -90
6 | A | 20
I can do the capped cumulative sum partitioned over each class.
I need some sort of recursive loop and I don't know how to implement this in BigQuery
Super naïve / cursor based approach
declare cumulative_change int64 default 0;
create temp table temp_table as (
select * , 0 as capped_cumsum from your_table where false
);
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
insert into temp_table (select rec.*, cumulative_change);
end for;
select * from temp_table order by day;
Applied to the sample data in your question, this produces the requested capped_cumsum values: 70, 100, 100, 70, 80, 0, 20.
Slightly modified option with use of array instead of temp table
declare cumulative_change int64 default 0;
declare result array<struct<day int64, change int64, capped_cumsum int64>>;
for rec in (select * from your_table order by day)
do
set cumulative_change = cumulative_change + rec.change;
set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
set result = array(select as struct * from unnest(result) union all select as struct rec.*, cumulative_change);
end for;
select * from unnest(result) order by day;
P.S. I like none of the above options so far :o)
In the meantime, that approach might work for relatively small tables or data sets.
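For readers who want to see the logic outside BigQuery, here is a small Python sketch of the same row-by-row loop, next to the approach the question says does not work (clamping an ordinary cumulative sum). The function names are illustrative only; the bounds 0 and 100 come from the question.

```python
from itertools import accumulate

def capped_cumsum(changes, low=0, high=100):
    """Clamp after every step, so the cap feeds into the next row."""
    value = 0
    out = []
    for change in changes:
        value = min(max(value + change, low), high)
        out.append(value)
    return out

def clamped_plain_cumsum(changes, low=0, high=100):
    """The approach that does not work: clamp an ordinary cumulative sum."""
    return [min(max(total, low), high) for total in accumulate(changes)]

changes = [70, 50, 20, -30, 10, -90, 20]  # sample data from the question
print(capped_cumsum(changes))         # [70, 100, 100, 70, 80, 0, 20]
print(clamped_plain_cumsum(changes))  # [70, 100, 100, 100, 100, 30, 50]
```

The second result drifts from day 4 onward because the plain sum still carries the amounts that should have been lost at the cap.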
Using RECURSIVE CTE can be another option:
DECLARE sample ARRAY<STRUCT<day INT64, change INT64>> DEFAULT [
(1, 70), (2, 50), (3, 20), (4, -30), (5, 10), (6, -90), (7, 20)
];
WITH RECURSIVE ccsum AS (
SELECT 0 AS n, vals[OFFSET(0)] AS change,
CASE
WHEN vals[OFFSET(0)] > 100 THEN 100
WHEN vals[OFFSET(0)] < 0 THEN 0
ELSE vals[OFFSET(0)]
END AS cap_csum
FROM sample
UNION ALL
SELECT n + 1 AS n, vals[OFFSET(n + 1)] AS change,
CASE
WHEN cap_csum + vals[OFFSET(n + 1)] > 100 THEN 100
WHEN cap_csum + vals[OFFSET(n + 1)] < 0 THEN 0
ELSE cap_csum + vals[OFFSET(n + 1)]
END AS cap_csum
FROM ccsum, sample
WHERE n < ARRAY_LENGTH(vals) - 1
),
sample AS (
SELECT ARRAY_AGG(change ORDER BY day) vals FROM UNNEST(sample)
)
SELECT * EXCEPT(n) FROM ccsum ORDER BY n;
Eventually I'd also like to do this over partitions ...
Consider below solution
create temp function cap_value(value int64, lower_boundary int64, upper_boundary int64) as (
least(greatest(value, lower_boundary), upper_boundary)
);
with recursive temp_table as (
select *, row_number() over(partition by class order by day) as n from your_table
), iterations as (
select 1 as n, day, class, change, cap_value(change, 0, 100) as capped_cumsum
from temp_table
where n = 1
union all
select t.n, t.day, t.class, t.change, cap_value(i.capped_cumsum + t.change, 0, 100) as capped_cumsum
from temp_table t
join iterations i
on t.n = i.n + 1
and t.class = i.class
)
select * except(n) from iterations
order by class, day
Applied to the sample data in your question, the capped_cumsum values are 70, 100, 70, 80, 0, 20 for class A and 12, 95, 100 for class B.
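As a cross-check of the partitioned logic, here is a hedged Python sketch that keeps one running value per class. The function name and the tuple layout (day, class, change) are assumptions made for the example, not part of the BigQuery solution.

```python
from collections import defaultdict

def capped_cumsum_by_class(rows, low=0, high=100):
    """rows: iterable of (day, cls, change) tuples.
    Returns (day, cls, change, capped) tuples ordered by class, then day,
    with each class keeping its own running value."""
    state = defaultdict(int)  # running capped value per class, starting at 0
    out = []
    for day, cls, change in sorted(rows, key=lambda r: (r[1], r[0])):
        state[cls] = min(max(state[cls] + change, low), high)
        out.append((day, cls, change, state[cls]))
    return out

rows = [(1, "A", 70), (1, "B", 12), (2, "A", 50), (2, "B", 83), (3, "A", -30),
        (3, "B", 17), (4, "A", 10), (5, "A", -90), (6, "A", 20)]
for row in capped_cumsum_by_class(rows):
    print(row)
```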

If the difference between two sequences is bigger than 30, deduct bigger sequence

I'm having a hard time writing a query that takes a sequence of numbers and, whenever a number differs from the previous one by 30 or more (in the sample below, going from 80 to 110 triggers a reset), restarts the sequence from that number. I have the following table; its other column, Status, should be kept intact:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 36 | Failed |
| 7 | 39 | OK |
| 8 | 47 | OK |
| 9 | 80 | Failed |
| 10 | 110 | Failed |
| 11 | 111 | OK |
| 12 | 150 | Failed |
| 13 | 165 | OK |
+----+--------+--------+
It should turn it into this one:
+----+--------+--------+
| Id | Number | Status |
+----+--------+--------+
| 1 | 1 | OK |
| 2 | 1 | Failed |
| 3 | 2 | Failed |
| 4 | 3 | OK |
| 5 | 4 | OK |
| 6 | 1 | Failed |
| 7 | 4 | OK |
| 8 | 12 | OK |
| 9 | 1 | Failed |
| 10 | 1 | Failed |
| 11 | 2 | OK |
| 12 | 1 | Failed |
| 13 | 16 | OK |
+----+--------+--------+
Thanks for your attention, I will be available to clear any doubt regarding my problem! :)
EDIT: Sample of this table here: http://sqlfiddle.com/#!6/ded5af
With this test case:
declare @data table (id int identity, Number int, Status varchar(20));
insert @data(number, status) values
( 1,'OK')
,( 1,'Failed')
,( 2,'Failed')
,( 3,'OK')
,( 4,'OK')
,( 4,'OK') -- to be deleted, ensures IDs are not sequential
,(36,'Failed') -- to be deleted, ensures IDs are not sequential
,(36,'Failed')
,(39,'OK')
,(47,'OK')
,(80,'Failed')
,(110,'Failed')
,(111,'OK')
,(150,'Failed')
,(165,'OK')
;
delete @data where id between 6 and 7;
This SQL:
with renumbered as (
select rn = row_number() over (order by id), data.*
from @data data
),
paired as (
select
this.*,
startNewGroup = case when this.number - prev.number >= 30
or prev.id is null then 1 else 0 end
from renumbered this
left join renumbered prev on prev.rn = this.rn -1
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - (
select top 1 GroupNo
from groups where groups.id <= paired.id
order by GroupNo desc)
,status
from paired
;
yields as desired:
Id Number status
----------- ----------- --------------------
1 1 OK
2 1 Failed
3 2 Failed
4 3 OK
5 4 OK
8 1 Failed
9 4 OK
10 12 OK
11 1 Failed
12 1 Failed
13 2 OK
14 1 Failed
15 16 OK
Update: using the new LAG() function allows somewhat simpler SQL without a self-join early on:
with renumbered as (
select
data.*
,gap = number - lag(number, 1) over (order by number)
from @data data
),
paired as (
select
*,
startNewGroup = case when gap >= 30 or gap is null then 1 else 0 end
from renumbered
),
groups as (
select Id,Number, GroupNo = Number from paired where startNewGroup = 1
)
select
Id
,Number = 1 + Number - ( select top 1 GroupNo
from groups
where groups.id <= paired.id
order by GroupNo desc
)
,status
from paired
;
I don't deserve credit for this answer, but I think this is even shorter:
with gapped as
( select id, number, gap = number - lag(number, 1) over (order by id)
from @data data
)
select Id, status,
ReNumber = Number + 1 - isnull( (select top 1 gapped.Number
from gapped
where gapped.id <= data.id
and gap >= 30
order by gapped.id desc), 1)
from @data data;
This is simply Pieter Geerkens's answer slightly simplified. I removed some intermediate results and columns:
with renumbered as (
select data.*, gap = number - lag(number, 1) over (order by number)
from @data data
),
paired as (
select *
from renumbered
where gap >= 30 or gap is null
)
select Id, Number = 1 + Number - (select top 1 Number
from paired
where paired.id <= renumbered.id
order by Number desc)
, status
from renumbered;
It should have been a comment, but it's too long for that and wouldn't be understandable.
If your IDs are not in sequential order, you might need to add another CTE before this one and use row_number instead of ID to join the recursive CTE.
WITH cte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber]
FROM
Table1
WHERE Id = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END
FROM Table1 t1
JOIN cte ON cte.Id + 1 = t1.Id
)
SELECT Id, [NewNumber], [Status]
FROM cte
If the Id is not sequential, number the rows first and join on the row number instead:
--Order table to make sure there is a sequence to follow
WITH OrderedSequence AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY Id) RnId,
Id,
[Number],
[Status]
FROM
Sequence
),
RecursiveCte AS
( SELECT
Id, [Number], [Status],
0 AS Diff,
[Number] AS [NewNumber],
RnId
FROM
OrderedSequence
WHERE RnId = 1
UNION ALL
SELECT
t1.Id, t1.[Number], t1.[Status],
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN t1.Number - 1 ELSE Diff END,
CASE WHEN t1.[Number] - cte.[Number] >= 30 THEN 1 ELSE t1.[Number] - Diff END,
t1.RnId
FROM OrderedSequence t1
JOIN RecursiveCte cte ON cte.RnId + 1 = t1.RnId
)
SELECT Id, [NewNumber], [Status]
FROM RecursiveCte
I tried to optimize the queries here, since it took 1h20m to process my data. I had it down to 30s after some further research.
WITH AuxTable AS
( SELECT
id,
number,
status,
relevantId = CASE WHEN
number = 1 OR
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN id
ELSE NULL
END,
deduct = CASE WHEN
((number - LAG(number, 1) OVER (ORDER BY id)) > 29)
THEN number - 1
ELSE 0
END
FROM @data data
)
,AuxTable2 AS
(
SELECT
id,
number,
status,
AT.deduct,
MAX(AT.relevantId) OVER (ORDER BY AT.id ROWS UNBOUNDED PRECEDING ) AS lastRelevantId
FROM AuxTable AT
)
SELECT
id,
number,
status,
number - MAX(deduct) OVER(PARTITION BY lastRelevantId ORDER BY id ROWS UNBOUNDED PRECEDING ) AS ReNumber
FROM AuxTable2
I think this runs faster, but it's not shorter.
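For comparison, the rule all of these queries implement can be written as a single sequential pass. This is a Python sketch rather than T-SQL, the function name is made up, and it assumes the numbers are already ordered by Id and that the first row always starts a group.

```python
def renumber_on_gap(numbers, gap=30):
    """Restart numbering at 1 whenever a value exceeds the previous one by `gap` or more."""
    out = []
    start = None  # first Number of the current group
    prev = None   # Number of the previous row
    for n in numbers:
        if prev is None or n - prev >= gap:
            start = n  # this row opens a new group
        out.append(n - start + 1)
        prev = n
    return out

numbers = [1, 1, 2, 3, 4, 36, 39, 47, 80, 110, 111, 150, 165]
print(renumber_on_gap(numbers))
# [1, 1, 2, 3, 4, 1, 4, 12, 1, 1, 2, 1, 16]
```

Because each row only needs the start of its own group, the window-function versions above can find that start with LAG plus a running MAX instead of recursing.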

SQL get the count of ids when timestamp difference is greater than 30

I have this following table data structure.
I need to find the number of SESSIONS.
A SESSION is defined as follows: for a userid with multiple rows, check the timestamps. If the timestamp difference is less than 30, consider it one session.
+---------+----------+
|userid | timestamp|
+---------+----------+
| 1 | 10 |
| 1 | 11 |
| 1 | 55 |
| 2 | 65 |
+---------+----------+
In the example above, for userid 1 the timestamps 10 and 11 are considered a single session. But 55 - 11 = 44, which is greater than 30, so it is another session.
So there are 2 sessions for userid 1 and 1 session for userid 2.
Overall there are 2 + 1 = 3 sessions. I only need to fetch this count. How can I accomplish this?
This query works fine:
SELECT COUNT(FINAL_TAB.userid) + SUM(FINAL_TAB.FIN) FINAL_RESULT FROM
(SELECT TAB2.userid,SUM(CNT) FIN FROM
(SELECT TAB1.userid,CASE WHEN HA > 30 THEN 1 ELSE 0 END CNT FROM
(SELECT Q1.userid,CASE WHEN Q1.userid = Q2.userid THEN Q2.timestamp - Q1.timestamp
ELSE 0 END HA FROM
(SELECT @v1 := @v1 + 1 RN,TABLE1.* FROM TABLE1 JOIN(SELECT @v1 := 0)V1)Q1
LEFT OUTER JOIN
(SELECT @v2 := @v2 + 1 RN,TABLE1.* FROM TABLE1 JOIN(SELECT @v2 := 0)V2)Q2
ON Q1.RN = Q2.RN - 1)TAB1)TAB2 GROUP BY TAB2.userid)FINAL_TAB;
The key step is adding a row number, RN, to two copies of the table (Q1 and Q2) and joining them on Q1.RN = Q2.RN - 1, so that Q1.timestamp is the current timestamp and Q2.timestamp is the next one. The same query works on other RDBMSs (Oracle, MS SQL Server, MySQL, PostgreSQL) if you swap in that system's row-numbering function.
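The counting rule itself is easy to state as a loop. Below is a hedged Python sketch (not SQL); the function name is illustrative, and it follows the query above in treating a gap strictly greater than 30 as a new session, since the question does not say what happens at exactly 30.

```python
from itertools import groupby

def count_sessions(rows, max_gap=30):
    """rows: iterable of (userid, timestamp) pairs.
    A session starts at a user's first row and whenever the gap to that
    user's previous timestamp exceeds max_gap."""
    sessions = 0
    # Sorting orders rows by userid, then timestamp, so groupby sees each user once.
    for _, group in groupby(sorted(rows), key=lambda r: r[0]):
        prev = None
        for _, ts in group:
            if prev is None or ts - prev > max_gap:
                sessions += 1
            prev = ts
    return sessions

rows = [(1, 10), (1, 11), (1, 55), (2, 65)]
print(count_sessions(rows))  # 3
```

Unlike the row-number join, this sorts explicitly, so it does not depend on the order in which the table happens to return its rows.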

How can I calculate the remaining amount per row?

I have a table, and for each row id I want to find the amount remaining from the total. However, the amounts should be taken in ascending order.
id amount
1 3
2 2
3 1
4 5
The results should look like this:
id remainder
1 10
2 8
3 5
4 0
Any thoughts on how to accomplish this? I'm guessing that the over clause is the way to go, but I can't quite piece it together. Thanks.
Since you didn't specify your RDBMS, I will just assume it's Postgresql ;-)
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl;
Output:
| ID | AMOUNT | REMAINDER |
---------------------------
| 3 | 1 | 10 |
| 2 | 2 | 8 |
| 1 | 3 | 5 |
| 4 | 5 | 0 |
How it works: http://www.sqlfiddle.com/#!1/c446a/5
It works in SQL Server 2012 too: http://www.sqlfiddle.com/#!6/c446a/1
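To see what the two window sums are doing, here is a small Python sketch of the same arithmetic: the grand total minus a running total taken in ascending order of amount. The function name is made up, and the sketch assumes the amounts are distinct; with ties, the default window frame in SQL would give tied rows the same running total, which this loop does not reproduce.

```python
from itertools import accumulate

def remainders_by_amount(rows):
    """rows: iterable of (id, amount) pairs. Mirrors
    sum(amount) over() - sum(amount) over(order by amount)."""
    ordered = sorted(rows, key=lambda r: r[1])
    total = sum(amount for _, amount in ordered)
    running = accumulate(amount for _, amount in ordered)
    return [(id_, amount, total - run)
            for (id_, amount), run in zip(ordered, running)]

print(remainders_by_amount([(1, 3), (2, 2), (3, 1), (4, 5)]))
# [(3, 1, 10), (2, 2, 8), (1, 3, 5), (4, 5, 0)]
```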
Thinking of solution for SQL Server 2008...
Btw, is your ID just a mere row number? If it is, just do this:
select
row_number() over(order by amount) as rn
, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl
order by rn;
Output:
| RN | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
But if you really need the ID intact and move the smallest amount on top, do this:
with a as
(
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder,
row_number() over(order by id) as id_sort,
row_number() over(order by amount) as amount_sort
from tbl
)
select a.id, sort.remainder
from a
join a sort on sort.amount_sort = a.id_sort
order by a.id_sort;
Output:
| ID | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
See query progression here: http://www.sqlfiddle.com/#!6/c446a/11
I just want to offer a simpler way to do this in descending order:
select id, sum(amount) over (order by id desc) as Remainder
from t
This will work in Oracle, SQL Server 2012, and Postgres.
The general solution requires a self join:
select t.id, coalesce(sum(tafter.amount), 0) as Remainder
from t left outer join
t tafter
on t.id < tafter.id
group by t.id
SQL Server 2008 answer. I can't provide an SQL Fiddle because it seems to strip the begin keyword, resulting in syntax errors. I tested this on my machine, though:
create function RunningTotalGuarded()
returns @ReturnTable table(
Id int,
Amount int not null,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
as
begin
insert into @ReturnTable(id, amount, RunningTotal)
select id, amount, 0 from tbl order by amount;
declare @RunningTotal numeric(16,4) = 0;
declare @rn_check int = 0;
update @ReturnTable
set
@rn_check = @rn_check + 1
,@RunningTotal =
case when rn = @rn_check then
@RunningTotal + Amount
else
1 / 0
end
,RunningTotal = @RunningTotal;
return;
end;
To achieve your desired output:
with a as
(
select *, sum(amount) over() - RunningTotal as remainder
, row_number() over(order by id) as id_order
from RunningTotalGuarded()
)
select a.id, amount_order.remainder
from a
inner join a amount_order on amount_order.rn = a.id_order;
Rationale for guarded running total: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Choose the lesser evil ;-)

SQL - Find missing int values in mostly ordered sequential series

I manage a message based system in which a sequence of unique integer ids will be entirely represented at the end of the day, though they will not necessarily arrive in order.
I am looking for help in finding missing ids in this series using SQL. If my column values are something like the below, how can I find which ids I am missing in this sequence, in this case 6?
The sequence will begin and end at an arbitrary point each day, so min and max would differ upon each run. Coming from a Perl background, I threw some regex in there.
ids
1
2
3
5
4
7
9
8
10
Help would be much appreciated.
Edit: We run Oracle
Edit2: Thanks all. I'll be running through your solutions next week in the office.
Edit3: I settled for the time being on something like the below, with ORIG_ID being the original id column and MY_TABLE being the source table. In looking closer at my data, there are a variety of cases beyond just number data in a string. In some cases there is a prefix or suffix of non-numeric characters. In others, there are dashes or spaces intermixed into the numeric id. Beyond this, ids periodically appear multiple times, so I included distinct.
I would appreciate any further input, specifically in regard to the best route of stripping out non-numeric characters.
SELECT
CASE
WHEN NUMERIC_ID + 1 = NEXT_ID - 1
THEN TO_CHAR( NUMERIC_ID + 1 )
ELSE TO_CHAR( NUMERIC_ID + 1 ) || '-' || TO_CHAR( NEXT_ID - 1 )
END
MISSING_SEQUENCES
FROM
(
SELECT
NUMERIC_ID,
LEAD (NUMERIC_ID, 1, NULL)
OVER
(
ORDER BY
NUMERIC_ID
ASC
)
AS NEXT_ID
FROM
(
SELECT
DISTINCT TO_NUMBER( REGEXP_REPLACE(ORIG_ID,'[^[:digit:]]','') )
AS NUMERIC_ID
FROM MY_TABLE
)
) WHERE NEXT_ID != NUMERIC_ID + 1
I've been there.
FOR ORACLE:
I found this extremely useful query on the net a while ago and noted it down. I don't remember the site now, but you can search for "GAP ANALYSIS" on Google.
SELECT CASE
WHEN ids + 1 = lead_no - 1 THEN TO_CHAR (ids +1)
ELSE TO_CHAR (ids + 1) || '-' || TO_CHAR (lead_no - 1)
END
Missing_track_no
FROM (SELECT ids,
LEAD (ids, 1, NULL)
OVER (ORDER BY ids ASC)
lead_no
FROM YOURTABLE
)
WHERE lead_no != ids + 1
Here, the result is:
MISSING_TRACK_NO
-----------------
6
If there were multiple gaps, say 2, 6, 7 and 9, then it would be:
MISSING_TRACK_NO
-----------------
2
6-7
9
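The LEAD technique pairs each id with the next larger id and reports whatever lies between them. A minimal Python sketch of that idea follows; the function name is illustrative, and it de-duplicates the ids first, which the Oracle query above does not do.

```python
def missing_ranges(ids):
    """Pair each id with the next larger one (the LEAD step) and
    report the values that lie between them."""
    ordered = sorted(set(ids))
    gaps = []
    for current, nxt in zip(ordered, ordered[1:]):
        if nxt != current + 1:
            first, last = current + 1, nxt - 1
            gaps.append(str(first) if first == last else f"{first}-{last}")
    return gaps

print(missing_ranges([1, 2, 3, 5, 4, 7, 9, 8, 10]))  # ['6']
print(missing_ranges([1, 3, 4, 5, 8, 10]))           # ['2', '6-7', '9']
```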
This is sometimes called an exclusion join. That is, try to do a join and return only rows where there is no match.
SELECT t1.value-1
FROM ThisTable AS t1
LEFT OUTER JOIN ThisTable AS t2
ON t1.value = t2.value+1
WHERE t2.value IS NULL
Note this will always report at least one row, which will be one less than the MIN value.
Also, if there are gaps of two or more numbers, it will only report one missing value.
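Both caveats are easy to reproduce. Here is a Python sketch of the exclusion join's logic (for each value whose predecessor is absent, report that predecessor); the function name is made up for illustration.

```python
def exclusion_join(values):
    """For each value v with no row holding v - 1, report v - 1."""
    present = set(values)
    return sorted(v - 1 for v in present if v - 1 not in present)

print(exclusion_join([1, 2, 4, 6, 7, 8]))  # [0, 3, 5]
print(exclusion_join([1, 2, 5]))           # [0, 4]
```

In the first call, 0 is reported only because nothing precedes the minimum. In the second, the gap 3-4 yields just 4, the last missing value of the gap.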
You didn't state your DBMS, so I'm assuming PostgreSQL:
select aid as missing_id
from generate_series( (select min(id) from message), (select max(id) from message)) as aid
left join message m on m.id = aid
where m.id is null;
This will report any missing value in a sequence between the minimum and maximum id in your table (including gaps that are bigger than one)
psql (9.1.1)
Type "help" for help.
postgres=> select * from message;
id
----
1
2
3
4
5
7
8
9
11
14
(10 rows)
postgres=> select aid as missing_id
postgres-> from generate_series( (select min(id) from message), (select max(id) from message)) as aid
postgres-> left join message m on m.id = aid
postgres-> where m.id is null;
missing_id
------------
6
10
12
13
(4 rows)
postgres=>
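The generate_series approach amounts to building every integer between the minimum and maximum and keeping those that are absent. A short Python sketch of the same idea, with an illustrative function name and assuming a non-empty list of ids:

```python
def missing_ids(ids):
    """Every integer between min(ids) and max(ids) that is not in ids."""
    present = set(ids)
    return [n for n in range(min(present), max(present) + 1) if n not in present]

print(missing_ids([1, 2, 3, 4, 5, 7, 8, 9, 11, 14]))  # [6, 10, 12, 13]
```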
I applied it in MySQL and it worked:
mysql> select * from sequence;
+--------+
| number |
+--------+
| 1 |
| 2 |
| 4 |
| 6 |
| 7 |
| 8 |
+--------+
6 rows in set (0.00 sec)
mysql> SELECT t1.number - 1 FROM sequence AS t1 LEFT OUTER JOIN sequence AS t2 ON t1.number = t2.number +1 WHERE t2.number IS NULL;
+---------------+
| t1.number - 1 |
+---------------+
| 0 |
| 3 |
| 5 |
+---------------+
3 rows in set (0.00 sec)
SET search_path='tmp';
DROP table tmp.table_name CASCADE;
CREATE table tmp.table_name ( num INTEGER NOT NULL PRIMARY KEY);
-- make some data
INSERT INTO tmp.table_name(num) SELECT generate_series(1,20);
-- create some gaps
DELETE FROM tmp.table_name WHERE random() < 0.3 ;
SELECT * FROM table_name;
-- EXPLAIN ANALYZE
WITH zbot AS (
SELECT 1+tn.num AS num
FROM table_name tn
WHERE NOT EXISTS (
SELECT * FROM table_name nx
WHERE nx.num = tn.num+1
)
)
, ztop AS (
SELECT -1+tn.num AS num
FROM table_name tn
WHERE NOT EXISTS (
SELECT * FROM table_name nx
WHERE nx.num = tn.num-1
)
)
SELECT zbot.num AS bot
,ztop.num AS top
FROM zbot, ztop
WHERE zbot.num <= ztop.num
AND NOT EXISTS ( SELECT *
FROM table_name nx
WHERE nx.num >= zbot.num
AND nx.num <= ztop.num
)
ORDER BY bot,top
;
Result:
CREATE TABLE
INSERT 0 20
DELETE 9
num
-----
1
2
6
7
10
11
13
14
15
18
19
(11 rows)
bot | top
-----+-----
3 | 5
8 | 9
12 | 12
16 | 17
(4 rows)
Note: a recursive CTE is also possible (and probably shorter).
UPDATE: here comes the recursive CTE ...:
WITH RECURSIVE tree AS (
SELECT 1+num AS num
FROM table_name t0
UNION
SELECT 1+num FROM tree tt
WHERE EXISTS ( SELECT *
FROM table_name xt
WHERE xt.num > tt.num
)
)
SELECT * FROM tree
WHERE NOT EXISTS (
SELECT *
FROM table_name nx
WHERE nx.num = tree.num
)
ORDER BY num
;
Results: (same data)
num
-----
3
4
5
8
9
12
16
17
20
(9 rows)
select student_key, next_student_key
from (
select student_key, lead(student_key) over (order by student_key) next_student_key
from student_table
)
where student_key <> next_student_key-1;