SQL mimicking analytic LEAD/LAG function with some restrictions - sql

There is a table named test, with one column named amount (number datatype).
There is no PK for this table, and amounts can be repeated.
The table's DDL is below: (created for testing purposes in Oracle 18c xe)
create table test (amount number(20));
insert into test values (20);
insert into test values (10);
insert into test values (30);
insert into test values (20);
insert into test values (10);
insert into test values (40);
insert into test values (15);
insert into test values (40);
The goal is to mimick the LEAD analytical function results ordered by amount, but no analytic (incl. ranking and window functions) can be used. PSM (incl MYSQL stored features, PL/SQL, T-SQL etc.) or some kind of identity tables can neither be used.
The desired output is shown in lead_rows_analytic_amount column:
select
amount,
lead(amount) over (order by amount) as lead_rows_analytic_amount
from test t1;
actual result:
amount lead_rows_analytic_amount
10 10
10 15
15 20
20 20
20 30
30 40
40 40
40
What are some elegant ways to achieve the result taking into account the restrictions set?
The DB is irrelevant here, if the restrictions apply.
I am attaching a stupidly clumsy and direct solution I came up with, but the goal is to get something more elegant (ignoring the performance).
with initial_rn as (
select
amount,t1.rowid,
( select count (*)
from test t2
where
t1.amount >= t2.amount
) as rn
from test t1
)
,prep_table as (
select t1.*,nvl2(repeating_rn,1,0) as repeating_rn_tag,
nvl(( SELECT max(rn)
FROM initial_rn t2
where t2.rn < t1.rn
),0) AS lag_rn
from initial_rn t1
left join (select rn as repeating_rn
from initial_rn
group by rn
having count(*) > 1) t2 on t1.rn = t2.repeating_rn
)
,final_rn as (
select t1.amount,case when repeating_rn_tag = 0 then rn else lag_rn +
( select count (*)
from prep_table t2
where
t1.rowid >= t2.rowid and t1.repeating_rn_tag = 1 and t2.repeating_rn_tag = 1 and t1.rn = t2.rn
)
end as final_rn
from prep_table t1
)
select t1.*,
lead(amount) over (order by amount) as lead_rows_analytic_amount,
(select min(amount)
from test t2
where t2.amount > t1.amount
) as lead_range_amount,
(SELECT MIN(amount)
FROM final_rn t2
where t2.final_rn > t1.final_rn
) AS lead_amount
from final_rn t1
order by amount
;

In Oracle, you can use:
SELECT CASE WHEN LEVEL = 1 THEN amount ELSE PRIOR amount END AS amount,
CASE WHEN LEVEL = 1 THEN NULL ELSE amount END AS lead_amount
FROM (
SELECT amount,
ROWNUM AS rn
FROM (
SELECT amount
FROM test
ORDER BY amount
)
)
WHERE LEVEL = 2
OR LEVEL = 1 AND CONNECT_BY_ISLEAF = 1
CONNECT BY PRIOR rn + 1 = rn
More generally, you can use:
WITH ordered_amounts (amount) AS (
SELECT amount
FROM test
ORDER BY amount
),
indexed_amounts (amount, idx) AS (
SELECT amount,
ROWNUM -- Or any function that gives sequentially increasing values
FROM ordered_amounts
)
SELECT i.amount,
nxt.amount AS lead_amount
FROM indexed_amounts i
LEFT OUTER JOIN indexed_amounts nxt
ON (i.idx + 1 = nxt.idx)
Which, for the sample data, both output:
AMOUNT
LEAD_AMOUNT
10
10
10
15
15
20
20
20
20
30
30
40
40
40
40
null
db<>fiddle here

Ok so just throwing this out there as something you could do, using JSON functionality (support exists in most RDBMS)
This is SQL server syntax:
with v as (
select *
from OpenJson(
(select Concat('[',String_Agg(amount,',')
within group (order by amount),']')from test)
)
)
select value, (
select value
from v v2
where v2.[key]=v.[key]+1
) as lead_rows_analytic_amount
from v
Example fiddle

To contribute to this wonderful collection of solutions how to avoid window functions, I feel it's worth mention Oracle model clause:
with test as (
select column_value as amount
from table(sys.ku$_vcnt(20,10,30,20,10,40,15,40)) -- or your table, I'm just lazy to create fiddle
)
select amount, lead_amount
from (
select *
from (select amount, 0 as lead_amount from test order by amount)
model
dimension by (rownum as rn)
measures (amount, lead_amount)
rules (amount[any] = amount[cv(rn)], lead_amount[any] = amount[cv(rn) + 1])
)
order by amount
(Not sure if it is helpful for you, compared with window functions.)

If you had a primary key (any table should have):
select a.*, (select min(r.amount)
from #test r
where ((r.id <> a.id and r.amount > a.amount)
OR
(r.id > a.id and r.amount=a.amount)
)
) as NextVal
from #test a
order by a.amount, a.id

Related

SQL group by if values are close

Class| Value
-------------
A | 1
A | 2
A | 3
A | 10
B | 1
I am not sure whether it is practical to achieve this using SQL.
If the difference of values are less than 5 (or x), then group the rows (of course with the same Class)
Expected result
Class| ValueMin | ValueMax
---------------------------
A | 1 | 3
A | 10 | 10
B | 1 | 1
For fixed intervals, we can easily use "GROUP BY". But now the grouping is based on nearby row's value. So if the values are consecutive or very close, they will be "chained together".
Thank you very much
Assuming MSSQL
You are trying to group things by gaps between values. The easiest way to do this is to use the lag() function to find the gaps:
select class, min(value) as minvalue, max(value) as maxvalue
from (select class, value,
sum(IsNewGroup) over (partition by class order by value) as GroupId
from (select class, value,
(case when lag(value) over (partition by class order by value) > value - 5
then 0 else 1
end) as IsNewGroup
from t
) t
) t
group by class, groupid;
Note that this assumes SQL Server 2012 for the use of lag() and cumulative sum.
Update:
*This answer is incorrect*
Assuming the table you gave is called sd_test, the following query will give you the output you are expecting
In short, we need a way to find what was the value on the previous row. This is determined using a join on row ids. Then create a group to see if the difference is less than 5. and then it is just regular 'Group By'.
If your version of SQL Server supports windowing functions with partitioning the code would be much more readable.
SELECT
A.CLASS
,MIN(A.VALUE) AS MIN_VALUE
,MAX(A.VALUE) AS MAX_VALUE
FROM
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS A
LEFT JOIN
(SELECT
ROW_NUMBER()OVER(PARTITION BY CLASS ORDER BY VALUE) AS ROW_ID
,CLASS
,VALUE
FROM SD_TEST) AS B
ON A.CLASS = B.CLASS AND A.ROW_ID=B.ROW_ID+1
GROUP BY A.CLASS,CASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END
ORDER BY A.CLASS,cASE WHEN ABS(COALESCE(B.VALUE,0)-A.VALUE)<5 THEN 1 ELSE 0 END DESC
ps: I think the above is ANSI compliant. So should run in most SQL variants. Someone can correct me if it is not.
These give the correct result, using the fact that you must have the same number of group starts as ends and that they will both be in ascending order.
if object_id('tempdb..#temp') is not null drop table #temp
create table #temp (class char(1),Value int);
insert into #temp values ('A',1);
insert into #temp values ('A',2);
insert into #temp values ('A',3);
insert into #temp values ('A',10);
insert into #temp values ('A',13);
insert into #temp values ('A',14);
insert into #temp values ('b',7);
insert into #temp values ('b',8);
insert into #temp values ('b',9);
insert into #temp values ('b',12);
insert into #temp values ('b',22);
insert into #temp values ('b',26);
insert into #temp values ('b',67);
Method 1 Using CTE and row offsets
with cte as
(select distinct class,value,ROW_NUMBER() over ( partition by class order by value ) as R from #temp),
cte2 as
(
select
c1.class
,c1.value
,c2.R as PreviousRec
,c3.r as NextRec
from
cte c1
left join cte c2 on (c1.class = c2.class and c1.R= c2.R+1 and c1.Value < c2.value + 5)
left join cte c3 on (c1.class = c3.class and c1.R= c3.R-1 and c1.Value > c3.value - 5)
)
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where PreviousRec is null) as Starts join
(
select
class
,value
,row_number() over ( partition by class order by value ) as GroupNumber
from cte2
where NextRec is null) as Ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
** Method 2 Inline views using not exists **
select
Starts.Class
,Starts.Value as StartValue
,Ends.Value as EndValue
from
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value < t.Value and Value > t.Value -5 )
) Starts join
(
select class,Value ,row_number() over ( partition by class order by value ) as GroupNumber
from
(select distinct class,value from #temp) as T
where not exists (select 1 from #temp where class=t.class and Value > t.Value and Value < t.Value +5 )
) ends on starts.class=ends.class and starts.GroupNumber = ends.GroupNumber
In both methods I use a select distinct to begin because if you have a dulpicate entry at a group start or end things go awry without it.
Here is one way of getting the information you are after:
SELECT Under5.Class,
(
SELECT MIN(m2.Value)
FROM MyTable AS m2
WHERE m2.Value < 5
AND m2.Class = Under5.Class
) AS ValueMin,
(
SELECT MAX(m3.Value)
FROM MyTable AS m3
WHERE m3.Value < 5
AND m3.Class = Under5.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m1.Class
FROM MyTable AS m1
WHERE m1.Value < 5
) AS Under5
UNION
SELECT Over4.Class,
(
SELECT MIN(m4.Value)
FROM MyTable AS m4
WHERE m4.Value >= 5
AND m4.Class = Over4.Class
) AS ValueMin,
(
SELECT Max(m5.Value)
FROM MyTable AS m5
WHERE m5.Value >= 5
AND m5.Class = Over4.Class
) AS ValueMax
FROM
(
SELECT DISTINCT m6.Class
FROM MyTable AS m6
WHERE m6.Value >= 5
) AS Over4

Find all integer gaps in SQL

I have a database which is used to store information about different matches for a game that I pull in from an external source. Due to a few issues, there are occasional gaps (which could be anywhere from 1 missing ID to a few hundred) in the database. I want to have the program pull in the data for the missing games, but I need to get that list first.
Here is the format of the table:
id (pk-identity) | GameID (int) | etc. | etc.
I had thought of writing a program to run through a loop and query for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.
Is there an easy and efficient way, using SQL Server, to find all the missing numbers from the range?
The idea is to look at where the gaps start. Let me assume you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id:
select t.*, lead(id) over (order by id) as nextid
from t;
If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
from t
) t
where nextid <> id+1;
EDIT:
Without the lead(), I would do the same thing with a correlated subquery:
select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
(select top 1 id
from t t2
where t2.id > t.id
order by t2.id
) as nextid
from t
) t
where nextid <> id+1;
Assuming the id is a primary key on the table (or even that it just has an index), both methods should have reasonable performance.
Numbers table!
CREATE TABLE dbo.numbers (
number int NOT NULL
)
ALTER TABLE dbo.numbers
ADD
CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
WITH FILLFACTOR = 100
GO
INSERT INTO dbo.numbers (number)
SELECT (a.number * 256) + b.number As number
FROM (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As a
CROSS
JOIN (
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number <= 255
) As b
GO
Then you can perform an OUTER JOIN or EXISTS` between your two tables and find the gaps...
SELECT *
FROM dbo.numbers
WHERE NOT EXISTS (
SELECT *
FROM your_table
WHERE id = numbers.number
)
-- OR
SELECT *
FROM dbo.numbers
LEFT
JOIN your_table
ON your_table.id = numbers.number
WHERE your_table.id IS NULL
I like the "gaps and islands" approach. It goes a little something like this:
WITH Islands AS (
SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
FROM dbo.yourTable
)
SELECT MIN(GameID), MAX(Game_id)
FROM Islands
GROUP BY IslandID
That query will get you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work in getting the IslandIDs themselves to be contiguous though. So, extending the above query:
WITH
cte1 AS (
SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
FROM dbo.yourTable
)
, cte2 AS (
SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
FROM cte1
GROUP BY [rn]
)
,Islands AS (
SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
from cte2
)
SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
FROM Islands AS a
LEFT JOIN Islands AS b
ON a.IslandID + 1 = b.IslandID
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id) AS nextId FROM #tab1 t)
SELECT id AS 'GapStart', nextId AS 'GapEnd' FROM cte
WHERE id + 1 <> nextId
GapStart GapEnd
----------- -----------
3 8
11 15
Try this (This covers upto 10000 Ids starting from 1, if you need more you can add more to Numbers table below):
;WITH Digits AS (
select Digit
from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
,Numbers AS (
select u.Digit
+ t.Digit*10
+ h.Digit*100
+ th.Digit*1000
+ tth.Digit*10000
--Add 10000, 100000 multipliers if required here.
as myId
from Digits u
cross join Digits t
cross join Digits h
cross join Digits th
cross join Digits tth
--Add the cross join for higher numbers
)
SELECT myId
FROM Numbers
WHERE myId NOT IN (SELECT GameId FROM YourTable)
Problem: we need to find the gap range in id field
SELECT * FROM #tab1
id col1
----------- --------------------
1 a
2 a
3 a
8 a
9 a
10 a
11 a
15 a
16 a
17 a
18 a
Solution
WITH cte (id,nextId) as
(SELECT t.id, (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id) AS nextId FROM #tab1 t)
SELECT id + 1, nextId - 1 FROM cte
WHERE id + 1 <> nextId
Output
GapStart GapEnd
----------- -----------
4 7
12 14

sql finding gaps in a couple number/year

I have a table like:
id
number
year
I want to find "holes", or gaps, not considering the id but only the couple year/number.
There is a gap when, for the same year, there are two non-consecutive numbers, the result being the year and all the numbers between (excluding extremes) those two non-consecutive numbers. Also note that the lower end is always 1 so that if 1 is missing, it is a gap.
For example, having:
id n year
1 1 2012
2 2 2012
3 5 2012
4 2 2010
I want as a result:
3/2012
4/2012
1/2010
The trick to finding missing entries in sequences is to generate a cartesian product of all available combinations in the sequence, then use NOT EXISTS to elimate those that exist. This is hard to do in a non DBMS specific way because all have different ways in which to optmially create a sequence on the fly. For Oracle I use:
SELECT RowNum AS r
FROM Dual
CONNECT BY Level <= MaxRequiredValue;
So, to generate a list of all available year/n pairs I would use:
SELECT d.Year, n.r
FROM ( SELECT year, MAX(n) AS MaxN
FROM T
GROUP BY Year
) d
INNER JOIN
( SELECT RowNum AS r
FROM Dual
CONNECT BY Level <= (SELECT MAX(n) FROM T)
) n
ON r < MaxN;
Where I am getting the Maximum n for each year and joining this to a list of integers from 1 to the highest n of all where this integer lists highest value is less than that years maximium value.
Finally use NOT EXISTS to elimate the values that already exist:
SELECT d.Year, n.r
FROM ( SELECT year, MAX(n) AS MaxN
FROM T
GROUP BY Year
) d
INNER JOIN
( SELECT RowNum AS r
FROM Dual
CONNECT BY Level < (SELECT MAX(n) FROM T)
) n
ON r = MaxN
WHERE NOT EXISTS
( SELECT 1
FROM T
WHERE d.Year = t.Year
AND n.r = t.n
);
Working example on SQL Fiddle
EDIT
Since I couldn't find a non DMBS specific solution I thought I'd better do the decent thing and create some examples for other DBMS.
SQL Server Example
Postgresql Example
My SQL Example
Another option is to use a temporary table like so:
create table #tempTable ([year] int, n int)
insert
into #tempTable
select t.year, 1
from tableName t
group by t.year
while exists(
select *
from tableName t1
where t1.n > (select MAX(t2.n) from #tempTable t2 where t2.year = t1.year)
)
begin
insert
into #tempTable
select t1.year,
(select MAX(t2.n)+1 from #tempTable t2 where t2.year = t1.year)
from tableName t1
where t1.n > (select MAX(t2.n) from #tempTable t2 where t2.year = t1.year)
end
delete t2
from #tempTable t2
inner join tableName t1
on t1.year = t2.year
and t1.n = t2.n
select [year], n
from #tempTable
drop table #tempTable

SQL stored procedure to add up values and stop once the maximum has been reached

I would like to write a SQL query (SQL Server) that will return rows (in a given order) but only up to a given total. My client has paid me a given amount, and I want to return only those rows that are <= to that amount.
For example, if the client paid me $370, and the data in the table is
id amount
1 100
2 122
3 134
4 23
5 200
then I would like to return only rows 1, 2 and 3
This needs to be efficient, since there will be thousands of rows, so a for loop would not be ideal, I guess. Or is SQL Server efficient enough to optimise a stored proc with for loops?
Thanks in advance. Jim.
A couple of options are.
1) Triangular Join
SELECT *
FROM YourTable Y1
WHERE (SELECT SUM(amount)
FROM YourTable Y2
WHERE Y1.id >= Y2.id ) <= 370
2) Recursive CTE
WITH RecursiveCTE
AS (
SELECT TOP 1 id, amount, CAST(amount AS BIGINT) AS Total
FROM YourTable
ORDER BY id
UNION ALL
SELECT R.id, R.amount, R.Total
FROM (
SELECT T.*,
T.amount + Total AS Total,
rn = ROW_NUMBER() OVER (ORDER BY T.id)
FROM YourTable T
JOIN RecursiveCTE R
ON R.id < T.id
) R
WHERE R.rn = 1 AND Total <= 370
)
SELECT id, amount, Total
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
The 2nd one will likely perform better.
In SQL Server 2012 you will be able to so something like
;WITH CTE AS
(
SELECT id,
amount,
SUM(amount) OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
FROM YourTable
)
SELECT *
FROM CTE
WHERE RunningTotal <=370
Though there will probably be a more efficient way (to stop the scan as soon as the total is reached)
Straight-forward approach :
SELECT a.id, a.amount
FROM table1 a
INNER JOIN table1 b ON (b.id <=a.id)
GROUP BY a.id, a.amount
HAVING SUM(b.amount) <= 370
Unfortunately, it has N^2 performance issue.
something like this:
select id from
(
select t1.id, t1.amount, sum( t2.amount ) s
from tst t1, tst t2
where t2.id <= t1.id
group by t1.id, t1.amount
)
where s < 370

SQL Query: How can I get data of row with number 1000 direclty?

If I have a SQL Table called Persons that contain about 30000 rows and I want to make a SQL query that retrieve the data of row number 1000 ... I got it by non professional way by making the following query
Select Top 1 * from
(
Select top 1000 *
From Persons
Order By ID
)A
Order By A.ID desc
But I feel that's a more optimized query that can do that ... can any lead me to perfect query ?
Note : table contain PK column called "ID" but it's not sequential
row_number is the best approach but as you only want a single row be sure to look at the plan. It might turn out better to identify the desired row then join back onto the original table to retrieve additional columns.
WITH T1
AS (SELECT *,
ROW_NUMBER() OVER (ORDER BY number) AS RN
FROM master..spt_values)
SELECT name,
number,
type,
low,
high,
status
FROM T1
WHERE RN = 1000;
Gives
Table 'spt_values'. Scan count 1, logical reads 2005
CPU time = 0 ms, elapsed time = 19 ms.
WITH T2
AS (SELECT number,
type,
name,
ROW_NUMBER() OVER (ORDER BY number) AS RN
FROM master..spt_values)
SELECT TOP 1 C.name,
C.number,
C.type,
C.low,
C.high,
C.status
FROM T2
CROSS APPLY (SELECT *
FROM master..spt_values v
WHERE v.number = T2.number
AND v.type = T2.type
AND ( v.name = T2.name
OR ( v.name IS NULL
AND T2.name IS NULL ) )) C
WHERE RN = 1000;
Gives
Table 'spt_values'. Scan count 1, logical reads 7
CPU time = 0 ms, elapsed time = 1 ms.
In SQL Server 2005+ you can use the following:
WITH MyCte AS
(
SELECT
[CategoryId]
,[CategoryName]
,[CategoryDescription]
,ROW_NUMBER() OVER (ORDER BY CategoryId ASC) AS RowNum
FROM
[Carmack].[dbo].[job_Categories]
)
SELECT *
FROM MyCte
WHERE RowNum = 3
you can try this
select * from Person P
where 999 = ( select count(id) from Person P1 , Person P2
where P1.id = P.id and P2.id < P1.id)