SQL: finding gaps in a number/year pair

I have a table like:
id
number
year
I want to find "holes", or gaps, considering only the year/number pair and ignoring the id.
There is a gap when, for the same year, there are two non-consecutive numbers; the result is the year and all the numbers strictly between those two non-consecutive numbers. Also note that the lower end is always 1, so if 1 is missing, that is a gap too.
For example, having:
id n year
1 1 2012
2 2 2012
3 5 2012
4 2 2010
I want as a result:
3/2012
4/2012
1/2010

The trick to finding missing entries in sequences is to generate a cartesian product of all available combinations in the sequence, then use NOT EXISTS to eliminate those that exist. This is hard to do in a DBMS-agnostic way because each DBMS has its own optimal way of creating a sequence on the fly. For Oracle I use:
SELECT RowNum AS r
FROM Dual
CONNECT BY Level <= MaxRequiredValue;
So, to generate a list of all available year/n pairs I would use:
SELECT d.Year, n.r
FROM ( SELECT year, MAX(n) AS MaxN
FROM T
GROUP BY Year
) d
INNER JOIN
( SELECT RowNum AS r
FROM Dual
CONNECT BY Level <= (SELECT MAX(n) FROM T)
) n
ON r < MaxN;
Here I get the maximum n for each year and join it to a list of integers from 1 to the highest n overall, keeping only the integers that are less than that year's maximum value.
Finally, use NOT EXISTS to eliminate the values that already exist:
SELECT d.Year, n.r
FROM ( SELECT year, MAX(n) AS MaxN
FROM T
GROUP BY Year
) d
INNER JOIN
( SELECT RowNum AS r
FROM Dual
CONNECT BY Level <= (SELECT MAX(n) FROM T)
) n
ON r < MaxN
WHERE NOT EXISTS
( SELECT 1
FROM T
WHERE d.Year = t.Year
AND n.r = t.n
);
Working example on SQL Fiddle
EDIT
Since I couldn't find a non-DBMS-specific solution, I thought I'd better do the decent thing and create some examples for other DBMSs.
SQL Server Example
Postgresql Example
MySQL Example
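For readers without an Oracle instance, the same idea (generate every candidate year/n pair, then drop the ones that exist) can be tried with Python's built-in sqlite3 module. This is only a sketch under assumptions: the table name T and columns id, n, year follow the answer above, the function name find_gaps is made up for illustration, and a recursive CTE stands in for CONNECT BY, which SQLite does not have.

```python
import sqlite3


def find_gaps(rows):
    """Return the (year, n) pairs missing below each year's maximum n.

    rows is an iterable of (id, n, year) tuples; find_gaps is a
    hypothetical helper name, not part of the original answer.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (id INTEGER, n INTEGER, year INTEGER)")
    con.executemany("INSERT INTO T (id, n, year) VALUES (?, ?, ?)", rows)
    sql = """
        WITH RECURSIVE seq(r) AS (
            -- integers 1 .. MAX(n); replaces Oracle's CONNECT BY Level <= ...
            SELECT 1
            UNION ALL
            SELECT r + 1 FROM seq WHERE r < (SELECT MAX(n) FROM T)
        )
        SELECT d.year, seq.r
        FROM (SELECT year, MAX(n) AS MaxN FROM T GROUP BY year) AS d
        JOIN seq ON seq.r < d.MaxN
        WHERE NOT EXISTS (
            SELECT 1 FROM T WHERE T.year = d.year AND T.n = seq.r
        )
        ORDER BY d.year, seq.r
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


# Sample data from the question: (id, n, year)
sample = [(1, 1, 2012), (2, 2, 2012), (3, 5, 2012), (4, 2, 2010)]
print(find_gaps(sample))  # [(2010, 1), (2012, 3), (2012, 4)]
```

The output matches the 3/2012, 4/2012 and 1/2010 requested in the question, only sorted by year.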

Another option is to use a temporary table like so:
create table #tempTable ([year] int, n int)
insert
into #tempTable
select t.year, 1
from tableName t
group by t.year
while exists(
select *
from tableName t1
where t1.n > (select MAX(t2.n) from #tempTable t2 where t2.year = t1.year)
)
begin
insert
into #tempTable
select t1.year,
(select MAX(t2.n)+1 from #tempTable t2 where t2.year = t1.year)
from tableName t1
where t1.n > (select MAX(t2.n) from #tempTable t2 where t2.year = t1.year)
end
delete t2
from #tempTable t2
inner join tableName t1
on t1.year = t2.year
and t1.n = t2.n
select [year], n
from #tempTable
drop table #tempTable

Related

SQL mimicking analytic LEAD/LAG function with some restrictions

There is a table named test, with one column named amount (number datatype).
There is no PK for this table, and amounts can be repeated.
The table's DDL is below: (created for testing purposes in Oracle 18c xe)
create table test (amount number(20));
insert into test values (20);
insert into test values (10);
insert into test values (30);
insert into test values (20);
insert into test values (10);
insert into test values (40);
insert into test values (15);
insert into test values (40);
The goal is to mimic the results of the LEAD analytic function ordered by amount, but no analytic functions (including ranking and window functions) can be used. PSM (including MySQL stored routines, PL/SQL, T-SQL, etc.) and any kind of identity tables cannot be used either.
The desired output is shown in lead_rows_analytic_amount column:
select
amount,
lead(amount) over (order by amount) as lead_rows_analytic_amount
from test t1;
actual result:
amount lead_rows_analytic_amount
10 10
10 15
15 20
20 20
20 30
30 40
40 40
40
What are some elegant ways to achieve the result taking into account the restrictions set?
The DB is irrelevant here, if the restrictions apply.
I am attaching a stupidly clumsy and direct solution I came up with, but the goal is to get something more elegant (ignoring the performance).
with initial_rn as (
select
amount,t1.rowid,
( select count (*)
from test t2
where
t1.amount >= t2.amount
) as rn
from test t1
)
,prep_table as (
select t1.*,nvl2(repeating_rn,1,0) as repeating_rn_tag,
nvl(( SELECT max(rn)
FROM initial_rn t2
where t2.rn < t1.rn
),0) AS lag_rn
from initial_rn t1
left join (select rn as repeating_rn
from initial_rn
group by rn
having count(*) > 1) t2 on t1.rn = t2.repeating_rn
)
,final_rn as (
select t1.amount,case when repeating_rn_tag = 0 then rn else lag_rn +
( select count (*)
from prep_table t2
where
t1.rowid >= t2.rowid and t1.repeating_rn_tag = 1 and t2.repeating_rn_tag = 1 and t1.rn = t2.rn
)
end as final_rn
from prep_table t1
)
select t1.*,
lead(amount) over (order by amount) as lead_rows_analytic_amount,
(select min(amount)
from test t2
where t2.amount > t1.amount
) as lead_range_amount,
(SELECT MIN(amount)
FROM final_rn t2
where t2.final_rn > t1.final_rn
) AS lead_amount
from final_rn t1
order by amount
;
In Oracle, you can use:
SELECT CASE WHEN LEVEL = 1 THEN amount ELSE PRIOR amount END AS amount,
CASE WHEN LEVEL = 1 THEN NULL ELSE amount END AS lead_amount
FROM (
SELECT amount,
ROWNUM AS rn
FROM (
SELECT amount
FROM test
ORDER BY amount
)
)
WHERE LEVEL = 2
OR LEVEL = 1 AND CONNECT_BY_ISLEAF = 1
CONNECT BY PRIOR rn + 1 = rn
More generally, you can use:
WITH ordered_amounts (amount) AS (
SELECT amount
FROM test
ORDER BY amount
),
indexed_amounts (amount, idx) AS (
SELECT amount,
ROWNUM -- Or any function that gives sequentially increasing values
FROM ordered_amounts
)
SELECT i.amount,
nxt.amount AS lead_amount
FROM indexed_amounts i
LEFT OUTER JOIN indexed_amounts nxt
ON (i.idx + 1 = nxt.idx)
Which, for the sample data, both output:
AMOUNT LEAD_AMOUNT
10 10
10 15
15 20
20 20
20 30
30 40
40 40
40 null
db<>fiddle here
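To see why the index-and-self-join form works, it may help to trace it outside the database. The following Python sketch mirrors the "more general" query: sort, number the rows from 1, then look up index + 1 for each row. The function name lead_pairs is an assumption made for this illustration.

```python
def lead_pairs(amounts):
    """Pair each amount with the next one in sorted order (None at the end)."""
    ordered = sorted(amounts)
    # Plays the role of the indexed_amounts CTE: idx -> amount, starting at 1.
    indexed = dict(enumerate(ordered, start=1))
    # Plays the role of LEFT OUTER JOIN ... ON i.idx + 1 = nxt.idx.
    return [(amount, indexed.get(idx + 1)) for idx, amount in indexed.items()]


# Same values as the inserts in the question.
test_amounts = [20, 10, 30, 20, 10, 40, 15, 40]
for amount, lead_amount in lead_pairs(test_amounts):
    print(amount, lead_amount)
```

Duplicates are handled naturally because the lookup goes by position, not by value.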
Ok so just throwing this out there as something you could do, using JSON functionality (support exists in most RDBMS)
This is SQL server syntax:
with v as (
select *
from OpenJson(
(select Concat('[',String_Agg(amount,',')
within group (order by amount),']')from test)
)
)
select value, (
select value
from v v2
where v2.[key]=v.[key]+1
) as lead_rows_analytic_amount
from v
Example fiddle
To contribute to this wonderful collection of ways to avoid window functions, I feel it's worth mentioning the Oracle MODEL clause:
with test as (
select column_value as amount
from table(sys.ku$_vcnt(20,10,30,20,10,40,15,40)) -- or your table, I'm just lazy to create fiddle
)
select amount, lead_amount
from (
select *
from (select amount, 0 as lead_amount from test order by amount)
model
dimension by (rownum as rn)
measures (amount, lead_amount)
rules (amount[any] = amount[cv(rn)], lead_amount[any] = amount[cv(rn) + 1])
)
order by amount
(Not sure if it is helpful for you, compared with window functions.)
If you had a primary key (any table should have):
select a.*, (select min(r.amount)
from #test r
where ((r.id <> a.id and r.amount > a.amount)
OR
(r.id > a.id and r.amount=a.amount)
)
) as NextVal
from #test a
order by a.amount, a.id

How to get the all the predecessors of a number in a SQL query

How can I get all the predecessors of a number in a SQL select statement?
I have this query:
SELECT
COUNT(CASE WHEN tb2.status = 'C' THEN 1 END) AS num_sales
FROM
table1 AS tb1
INNER JOIN
table2 AS tb2 ON tb1.id = tb2.id_sales
I get this result:
num_sales
7
5
4
3
1
0
I want
num_sales predecessors
7 1,2,3,4,5,6,7
5 1,2,3,4,5
4 1,2,3,4
3 1,2,3
1 1
0
HELP!
With Standard SQL, you could use listagg():
select mynumber,
(select listagg(t2.mynumber, ',') within group (order by t2.mynumber)
from mytable t2
where t2.mynumber <= t.mynumber
) as predecessors
from mytable t;
Similar functionality exists in most databases, but the exact details of string aggregation often vary by database.
EDIT:
In Postgres, you would use generate_series():
select mynumber,
(select string_agg(gs.n::text, ',' order by gs.n)
from generate_series(1, t.mynumber, 1) gs(n)
) as predecessors
from mytable t;
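As a quick way to check the expected strings, here is what the generate_series() variant computes for a single value, written as a small Python sketch (the function name predecessors is only illustrative):

```python
def predecessors(n):
    """Return '1,2,...,n' as a string; empty when n is 0 or negative."""
    return ",".join(str(i) for i in range(1, n + 1))


for num_sales in (7, 5, 4, 3, 1, 0):
    print(num_sales, predecessors(num_sales))
```

For 0 the result is an empty string, which matches the blank cell in the desired output.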

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead(), I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
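The logic of both queries (pair every id with the next id and keep the pairs that are not consecutive) can be sketched in a few lines of Python. This is an illustration only; gap_ranges is a made-up name and the id list is hypothetical sample data.

```python
def gap_ranges(ids):
    """Return (first_missing, last_missing) for every gap between ids."""
    ordered = sorted(set(ids))
    return [
        (current + 1, nxt - 1)                    # id+1 .. nextid-1
        for current, nxt in zip(ordered, ordered[1:])
        if nxt != current + 1                     # where nextid <> id+1
    ]


sample_ids = [1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18]
print(gap_ranges(sample_ids))  # [(4, 7), (12, 14)]
```

Like the SQL, this reports nothing before the smallest id or after the largest one.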
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN or a NOT EXISTS between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(GameID)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
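The key observation is that within a contiguous run, value minus row number is constant. A minimal Python sketch of that grouping, followed by the "join successive islands" step, might look like this (the names islands and gaps_between and the sample ids are assumptions for illustration):

```python
from itertools import groupby


def islands(ids):
    """Group ids into contiguous (start, end) ranges."""
    ordered = sorted(set(ids))
    result = []
    # value - row_number is the same for every member of one island.
    numbered = enumerate(ordered, start=1)
    for _, group in groupby(numbered, key=lambda pair: pair[1] - pair[0]):
        values = [value for _, value in group]
        result.append((values[0], values[-1]))
    return result


def gaps_between(island_list):
    """Turn successive islands into (gap_start, gap_end) ranges."""
    return [
        (a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(island_list, island_list[1:])
    ]


game_ids = [1, 2, 3, 8, 9, 10, 11, 15, 16, 17, 18]
found = islands(game_ids)
print(found)                # [(1, 3), (8, 11), (15, 18)]
print(gaps_between(found))  # [(4, 7), (12, 14)]
```

Unlike the LEFT JOIN in the query, this sketch does not emit a trailing row for the last island, which has no following gap.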
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
Try this (it covers IDs from 0 to 99999; if you need more, you can add more digits to the Numbers CTE below):
;WITH Digits AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
select u.Digit
+ t.Digit*10
+ h.Digit*100
+ th.Digit*1000
+ tth.Digit*10000
--Add 100000, 1000000 multipliers if required here.
as myId
from Digits u
cross join Digits t
cross join Digits h
cross join Digits th
cross join Digits tth
--Add the cross join for higher numbers
)
SELECT myId
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)
Problem: we need to find the gap range in id field
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
Solution
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId FROM #tab1 t)
SELECT id + 1 AS GapStart, nextId - 1 AS GapEnd FROM cte
WHERE id + 1 <> nextId
Output
GapStart GapEnd
----------- -----------
4 7
12 14

SQL stored procedure to add up values and stop once the maximum has been reached

I would like to write a SQL query (SQL Server) that will return rows (in a given order) but only up to a given total. My client has paid me a given amount, and I want to return only those rows that are <= to that amount.
For example, if the client paid me $370, and the data in the table is
id amount
1 100
2 122
3 134
4 23
5 200
then I would like to return only rows 1, 2 and 3
This needs to be efficient, since there will be thousands of rows, so a for loop would not be ideal, I guess. Or is SQL Server efficient enough to optimise a stored proc with for loops?
Thanks in advance. Jim.
A couple of options are.
1) Triangular Join
SELECT *
FROM YourTable Y1
WHERE (SELECT SUM(amount)
FROM YourTable Y2
WHERE Y1.id >= Y2.id ) <= 370
2) Recursive CTE
WITH RecursiveCTE
AS (
SELECT TOP 1 id, amount, CAST(amount AS BIGINT) AS Total
FROM YourTable
ORDER BY id
UNION ALL
SELECT R.id, R.amount, R.Total
FROM (
SELECT T.*,
T.amount + Total AS Total,
rn = ROW_NUMBER() OVER (ORDER BY T.id)
FROM YourTable T
JOIN RecursiveCTE R
ON R.id < T.id
) R
WHERE R.rn = 1 AND Total <= 370
)
SELECT id, amount, Total
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
The 2nd one will likely perform better.
In SQL Server 2012 you will be able to do something like
;WITH CTE AS
(
SELECT id,
amount,
SUM(amount) OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM YourTable
)
SELECT *
FROM CTE
WHERE RunningTotal <=370
Though there will probably be a more efficient way (to stop the scan as soon as the total is reached)
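For comparison, this is the behaviour all of these queries aim for, written as a single pass that stops as soon as the running total would exceed the limit. It is a Python sketch with assumed names (rows_within_budget, payments), not something SQL Server runs.

```python
def rows_within_budget(rows, budget):
    """Return (id, amount, running_total) rows until the budget is exceeded.

    rows must already be in the desired order, as (id, amount) tuples.
    """
    selected = []
    total = 0
    for row_id, amount in rows:
        total += amount
        if total > budget:
            break  # stop scanning: later rows are never considered
        selected.append((row_id, amount, total))
    return selected


payments = [(1, 100), (2, 122), (3, 134), (4, 23), (5, 200)]
print(rows_within_budget(payments, 370))
# [(1, 100, 100), (2, 122, 222), (3, 134, 356)]
```

Note that row 4 is not returned even though 23 alone is small: adding it to 356 gives 379, which is over 370.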
Straight-forward approach :
SELECT a.id, a.amount
FROM table1 a
INNER JOIN table1 b ON (b.id <=a.id)
GROUP BY a.id, a.amount
HAVING SUM(b.amount) <= 370
Unfortunately, it has an O(N^2) performance problem.
something like this:
select id from
(
select t1.id, t1.amount, sum( t2.amount ) s
from tst t1, tst t2
where t2.id <= t1.id
group by t1.id, t1.amount
) x
where s <= 370

How to filter out records grouped by date with a large date difference

I have some records, grouped by name and date.
I would like to find any records in a table that have a date difference between them larger than a week, from the most recent record.
Would this be possible to do with a cte?
I am thinking something along these lines (it is difficult to explain)
; with mycte as (
select *
from #GroupedRecords)
select *
from mycte a
join (select *
from #GroupedRecords) b on a.Name = b.Name
where datediff(day, a.DateCreated, b.DateCreated) > 7
For example:
Id Name Date
1 Foo 02/03/2010
2 Bar 23/02/2010
3 Ram 21/01/2010
4 Foo 29/02/2010
5 Foo 22/02/2010
6 Foo 05/12/2009
The results should be:
Id Name Date
1 Foo 02/03/2010
5 Foo 22/02/2010
6 Foo 05/12/2009
You can try:
SELECT id,
name,
DATE
FROM groupedrecords AS gr1
WHERE ( (SELECT MAX(DATE) AS md
FROM groupedrecords gr2
WHERE gr1.name = gr2.name) - gr1.DATE ) > 7;
Or probably better yet:
SELECT gr1.id,
gr1.name,
gr1.DATE
FROM groupedrecords AS gr1
INNER JOIN (SELECT name,
MAX(DATE) AS md
FROM groupedrecords AS gr2
GROUP BY name) AS q1
ON gr1.name = q1.name
WHERE ( q1.md - gr1.DATE ) > 7;
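The two queries above boil down to "find the latest date per name, then keep the rows more than 7 days older than it". A small Python sketch of that logic follows. The function name stale_records is an assumption, and the sample rows are hypothetical: they follow the question's data, except that row 4 uses 28/02/2010 because 29/02/2010 is not a valid date.

```python
from datetime import date


def stale_records(records, days=7):
    """Return records more than `days` older than the newest one per name.

    records is a list of (id, name, created) tuples.
    """
    latest = {}
    for _, name, created in records:
        # Equivalent of SELECT name, MAX(date) ... GROUP BY name
        if name not in latest or created > latest[name]:
            latest[name] = created
    return [r for r in records if (latest[r[1]] - r[2]).days > days]


records = [
    (1, "Foo", date(2010, 3, 2)),
    (2, "Bar", date(2010, 2, 23)),
    (3, "Ram", date(2010, 1, 21)),
    (4, "Foo", date(2010, 2, 28)),  # hypothetical: adjusted to a valid date
    (5, "Foo", date(2010, 2, 22)),
    (6, "Foo", date(2009, 12, 5)),
]
print([r[0] for r in stale_records(records)])  # [5, 6]
```

As with those two queries, the most recent row itself (id 1) is not part of the result.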
UPDATE: As suggested in the comments, here is a version that uses UNION ALL to get the id with the max date per group AND the ids of the rows that are more than 7 days older than the max date. I used a CTE for fun; it was not necessary. Note that if more than one ID shares the max date in a group, this query will need to be modified:
WITH CTE
AS (SELECT name,
Max(date) AS MD
FROM Records
GROUP BY name)
SELECT R.ID,
R.name,
R.date
FROM CTE
INNER JOIN Records AS R
ON CTE.Name = R.Name
AND CTE.MD = R.date
UNION ALL
SELECT r1.id,
r1.name,
r1.DATE
FROM Records AS R1
INNER JOIN CTE
ON CTE.name = R1.name
WHERE ( CTE.md - R1.DATE ) > 7
ORDER BY name ASC,
date DESC
I wonder if this gets close to a solution:
; with tableWithRow as (
select *, row_number() over (order by name, date) as rowNum
from t
)
select t1.*, t2.id t2id, t2.name t2name, t2.date t2date, t2.rowNum t2rowNum
from tableWithRow t1
join tableWithRow t2
on t1.rowNum = t2.rowNum + 1 and t1.name = t2.name