Find gaps in a sequence in SQL without creating additional tables

I have a table invoices with a field invoice_number. This is what I get when I execute select invoice_number from invoices:
invoice_number
--------------
1
2
3
5
6
10
11
I want a SQL query that gives me the following result:
gap_start | gap_end
4 | 4
7 | 9
How can I write a SQL query that produces this result?
I am using PostgreSQL.

With modern SQL, this can easily be done using window functions:
select invoice_number + 1 as gap_start,
       next_nr - 1 as gap_end
from (
  select invoice_number,
         lead(invoice_number) over (order by invoice_number) as next_nr
  from invoices
) nr
where invoice_number + 1 <> next_nr;
SQLFiddle: http://sqlfiddle.com/#!15/1e807/1

We can use a simpler technique to get all missing values first, by joining on a generated sequence column like so:
select series
from generate_series(1, 11, 1) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;
This gets us the series of missing numbers, which can be useful on its own in some cases.
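If you'd rather not hardcode the bounds (1 and 11 above), they can be derived from the table itself. A minimal sketch, assuming the smallest and largest invoice_number are sensible endpoints for the series:
select series
from generate_series((select min(invoice_number) from invoices),
                     (select max(invoice_number) from invoices)) series
left join invoices on series = invoices.invoice_number
where invoice_number is null;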
To get the gap start/end range, we can instead join the source table with itself.
select invoices.invoice_number + 1 as start,
       min(fr.invoice_number) - 1 as stop
from invoices
left join invoices r on invoices.invoice_number = r.invoice_number - 1
left join invoices fr on invoices.invoice_number < fr.invoice_number
where r.invoice_number is null
  and fr.invoice_number is not null
group by invoices.invoice_number,
         r.invoice_number;
dbfiddle: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=32c5f3c021b0f1a876305a2bd3afafc9
This is probably less optimised than the solutions above, but it could be useful on database servers that don't support the lead() function.
Full credit goes to this excellent page in the SILOTA docs:
http://www.silota.com/docs/recipes/sql-gap-analysis-missing-values-sequence.html
I highly recommend reading it, as it explains the solution step by step.

I found another query:
select invoice_number + lag gap_start,
       invoice_number + lead - 1 gap_end
from (select invoice_number,
             invoice_number - lag(invoice_number) over w lag,
             lead(invoice_number) over w - invoice_number lead
      from invoices window w as (order by invoice_number)) x
where lag = 1 and lead > 1;

Related

SQL query with grouping and MAX

I have a table that looks like the following but also has more columns that are not needed for this instance.
ID  DATE       Random
--  ---------  ------
1   4/12/2015  2
2   4/15/2015  2
3   3/12/2015  2
4   9/16/2015  3
5   1/12/2015  3
6   2/12/2015  3
ID is the primary key
Random is a foreign key, but I am not actually using the table it points to.
I am trying to design a query that groups the results by Random, selects the MAX Date within each group, and then gives me the associated ID.
If I do the following query
select top 100 ID, Random, MAX(Date) from DateBase group by Random, Date, ID
I get duplicate Randoms since ID is the primary key and will always be unique.
The results I need would look something like this:
ID  DATE       Random
--  ---------  ------
2   4/15/2015  2
4   9/16/2015  3
Another question: there could be times when several rows have the same date. What will MAX do in that case?
You can use NOT EXISTS():
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
                 WHERE s.random = t.random
                   AND s.date > t.date)
This will select only those rows that don't have a later date for the corresponding random value.
It can also be done using IN():
SELECT * FROM YourTable t
WHERE (t.random, t.date) in (SELECT s.random, max(s.date)
                             FROM YourTable s
                             GROUP BY s.random)
Or with a join:
SELECT t.* FROM YourTable t
INNER JOIN (SELECT s.random, max(s.date) as max_date
            FROM YourTable s
            GROUP BY s.random) tt
    ON (t.date = tt.max_date AND t.random = tt.random)
In SQL Server you could do something like the following,
select a.*
from DateBase a
inner join (select Random, MAX(dt) as dt
            from DateBase
            group by Random) as x
    on a.dt = x.dt and a.random = x.random
This method will work across database vendors, as there is nothing vendor-specific in it (apart from formatting the dates using your vendor's syntax).
You can do this in two stages:
The first step is to work out the max date for each random:
SELECT MAX(DateField) AS MaxDateField, Random
FROM Example
GROUP BY Random
Now you can join back onto your table to get the max ID for each combination:
SELECT MAX(e.ID) AS ID
      ,e.DateField AS DateField
      ,e.Random
FROM Example AS e
INNER JOIN (
    SELECT MAX(DateField) AS MaxDateField, Random
    FROM Example
    GROUP BY Random
) data
    ON data.MaxDateField = e.DateField
    AND data.Random = e.Random
GROUP BY e.DateField, e.Random
SQL Fiddle example here.
To answer your second question:
If there are multiples of the same date, the MAX(e.ID) will simply choose the highest number. If you want the lowest, you can use MIN(e.ID) instead.
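For completeness, a sketch of that MIN variant against the same (assumed) Example table from the query above:
SELECT MIN(e.ID) AS ID
      ,e.DateField AS DateField
      ,e.Random
FROM Example AS e
INNER JOIN (
    SELECT MAX(DateField) AS MaxDateField, Random
    FROM Example
    GROUP BY Random
) data
    ON data.MaxDateField = e.DateField
    AND data.Random = e.Random
GROUP BY e.DateField, e.Random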

MS Access equivalent for using dense_rank in select

In MS Access, I have a table with 2 million account records/rows with various columns of data. I wish to apply a sequence number to every account record (i.e. 1 for the first account record ABC111, 2 for the second account record DEF222, etc.).
Then, I would like to assign a batch number sequence to every 5 distinct account numbers (i.e. record 1 with account number ABC111 being associated with batch number 101, record 2 with account number DEF222 also being associated with batch number 101).
This is how I would do it with a SQL Server query:
select distinct(p.accountnumber),
       FLOOR(((50 + dense_rank() over (order by p.accountnumber)) - 1) / 5) + 100 As BATCH
from db2inst1.account_table p
Raw Data:
AccountNumber
ABC111
DEF222
GHI333
JKL444
MNO555
PQR666
STU777
Resulting Data:
RecordNumber  AccountNumber  BatchNumber
1             ABC111         101
2             DEF222         101
3             GHI333         101
4             JKL444         101
5             MNO555         101
6             PQR666         102
7             STU777         102
I tried to make a query that uses SELECT as well as DENSE_RANK but I couldn't figure out how to make it work.
Thanks for reading my question
Something like this would probably work.
I'd first create a temporary table to hold the distinct account numbers, then I'd do an update query to assign the ranking.
CREATE TABLE tmpAccountRank
(AccountNumber TEXT(10)
CONSTRAINT PrimaryKey PRIMARY KEY,
AccountRank INTEGER NULL);
Then I'd use this table to generate the account ranking.
DELETE FROM tmpAccountRank;
INSERT INTO tmpAccountRank(AccountNumber)
SELECT DISTINCT AccountNumber FROM db2inst1.account_table;
UPDATE tmpAccountRank
SET AccountRank =
DCOUNT('AccountNumber', 'tmpAccountRank',
'AccountNumber < ''' + AccountNumber + '''') \ 5 + 101
I use DCOUNT and integer division (\ 5) to generate the ranking. This probably will have terrible performance but I think it's the way you would do it in MS Access.
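To make the arithmetic concrete, here is how that expression maps the count of "smaller" distinct account numbers onto batch numbers for the sample accounts from the question:
AccountNumber   smaller accounts   \ 5   + 101
ABC111          0                  0     101
MNO555          4                  0     101
PQR666          5                  1     102
STU777          6                  1     102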
If you want to skip the temp table, you can do it all in a nested subquery, but I don't think it's a great practice to do too much in a single query, especially in MS Access.
SELECT AccountNumber,
       (SELECT COUNT(*)
        FROM (SELECT DISTINCT AccountNumber
              FROM db2inst1.account_table
              WHERE AccountNumber < t.AccountNumber) q) \ 5 + 101
FROM db2inst1.account_table t
Actually, this won't work in MS Access; apparently you can't reference an outer table from more than one level of subquery nesting.
You can do dense_rank() with a correlated subquery. The logic is:
select a.*,
(select count(distinct a2.accountnumber)
from db2inst1.account_table as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;
Then, you can use this for getting the batch number. I don't entirely follow how the batch numbers in your question map onto the rank (a sketch of one possible mapping follows the edit below), but this should answer the dense_rank() part of your question.
EDIT:
Oh, that's right. In MS Access you need nested subqueries:
select a.*,
(select count(*)
from (select distinct a2.accountnumber
from db2inst1.account_table as a2
) as a2
where a2.accountnumber <= a.accountnumber
) as dense_rank
from db2inst1.account_table as a;
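Building on that, here is a sketch (not verified against a live Access database) of one way to fold the batch formula into the Access-friendly version, using a strict < so the count is zero-based, in line with the DCOUNT answer above:
select a.accountnumber,
       ((select count(*)
         from (select distinct a2.accountnumber
               from db2inst1.account_table as a2
              ) as a2
         where a2.accountnumber < a.accountnumber
        ) \ 5) + 101 as batchnumber
from db2inst1.account_table as a;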

Joining next Sequential Row

I am planning an SQL statement right now and would like someone to look over my thoughts.
This is my Table:
id  stat  period
--  ----  --------
1   10    1/1/2008
2   25    2/1/2008
3   5     3/1/2008
4   15    4/1/2008
5   30    5/1/2008
6   9     6/1/2008
7   22    7/1/2008
8   29    8/1/2008
Create Table
CREATE TABLE tbstats
(
id INT IDENTITY(1, 1) PRIMARY KEY,
stat INT NOT NULL,
period DATETIME NOT NULL
)
go
INSERT INTO tbstats
(stat,period)
SELECT 10,CONVERT(DATETIME, '20080101')
UNION ALL
SELECT 25,CONVERT(DATETIME, '20080102')
UNION ALL
SELECT 5,CONVERT(DATETIME, '20080103')
UNION ALL
SELECT 15,CONVERT(DATETIME, '20080104')
UNION ALL
SELECT 30,CONVERT(DATETIME, '20080105')
UNION ALL
SELECT 9,CONVERT(DATETIME, '20080106')
UNION ALL
SELECT 22,CONVERT(DATETIME, '20080107')
UNION ALL
SELECT 29,CONVERT(DATETIME, '20080108')
go
I want to calculate the difference between each statistic and the next, and then calculate the mean value of the 'gaps.'
Thoughts:
I need to join each record with its subsequent row. I can do that using the ever-flexible joining syntax, thanks to the fact that I know the id field is an integer sequence with no gaps.
By aliasing the table I could incorporate it into the SQL query twice, then join them together in a staggered fashion by adding 1 to the id of the first aliased table. The first record in the table has an id of 1. 1 + 1 = 2 so it should join on the row with id of 2 in the second aliased table. And so on.
Now I would simply subtract one from the other.
Then I would use the ABS function to ensure that I always get positive integers as a result of the subtraction regardless of which side of the expression is the higher figure.
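For what it's worth, a sketch of that staggered self-join (assuming the id column really is gap-free), including the ABS and the mean:
-- each statistic paired with the next one, and the absolute gap between them
SELECT t1.id,
       t1.stat,
       t2.stat AS next_stat,
       ABS(t1.stat - t2.stat) AS gap
FROM tbstats t1
JOIN tbstats t2 ON t2.id = t1.id + 1;

-- mean of those gaps (cast so the average isn't truncated to an integer)
SELECT AVG(CAST(ABS(t1.stat - t2.stat) AS FLOAT)) AS mean_gap
FROM tbstats t1
JOIN tbstats t2 ON t2.id = t1.id + 1;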
Is there an easier way to achieve what I want?
The lead analytic function should do the trick:
SELECT period, stat, stat - LEAD(stat) OVER (ORDER BY period) AS gap
FROM tbstats
The average value of the gaps can be done by calculating the difference between the first value and the last value and dividing by one less than the number of elements:
select sum(case when seqnum = num then stat else -stat end) * 1.0 / (max(num) - 1) as avg_gap
from (select stat,
             row_number() over (order by period) as seqnum,
             count(*) over () as num
      from tbstats
     ) t
where seqnum = num or seqnum = 1;
Of course, you can also do the calculation using lead(), but this will also work in SQL Server 2005 and 2008.
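For reference, a sketch of the lead()-based version of the same calculation (SQL Server 2012+, or any database with LEAD):
-- average of the signed consecutive differences; equals (last - first) / (n - 1)
SELECT AVG(CAST(gap AS FLOAT)) AS avg_gap
FROM (SELECT LEAD(stat) OVER (ORDER BY period) - stat AS gap
      FROM tbstats) t
WHERE gap IS NOT NULL;   -- the last row has no successor
Wrap gap in ABS() if you want the mean of the absolute gaps instead.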
You can also achieve this by using a join:
SELECT t1.period,
t1.stat,
t1.stat - t2.stat gap
FROM #tbstats t1
LEFT JOIN #tbstats t2
ON t1.id + 1 = t2.id
To calculate the difference between each statistic and the next, LEAD() and LAG() may be the simplest option. You provide an ORDER BY, and LEAD(something) returns the next something and LAG(something) returns the previous something in the given order.
select
    x.id thisStatId,
    LAG(x.id) OVER (ORDER BY x.id) lastStatId,
    x.stat thisStatValue,
    LAG(x.stat) OVER (ORDER BY x.id) lastStatValue,
    x.stat - LAG(x.stat) OVER (ORDER BY x.id) diff
from tbStats x
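To get the mean of the gaps the question ultimately asks for, that diff can be wrapped in an aggregate. A minimal sketch on top of the query above, using ABS as the question describes:
SELECT AVG(CAST(ABS(diff) AS FLOAT)) AS mean_gap
FROM (SELECT x.stat - LAG(x.stat) OVER (ORDER BY x.id) AS diff
      FROM tbStats x) d
WHERE diff IS NOT NULL;   -- the first row has no predecessor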

How to SELECT top N rows that sum to a certain amount?

Suppose:
MyTable
--
Amount
1
2
3
4
5
MyTable only has one column, Amount, with 5 rows. They are not necessarily in increasing order.
How can I create a function that takes an @SUM INT parameter and returns the TOP N rows that sum to this amount?
So for input 6, I want
Amount
1
2
3
Since 1 + 2 + 3 = 6. Neither 2 + 4 nor 1 + 5 will work, since I want the TOP N rows.
For 7/8/9/10, I want
Amount
1
2
3
4
I'm using MS SQL Server 2008 R2, if this matters.
Saying "top N rows" is indeed ambiguous when it comes to relational databases.
I assume that you want to order by "amount" ascending.
I would add a second column (to a table or view) like "sum_up_to_here", and create something like this:
create view mytable_view as
select
    mt1.amount,
    sum(mt2.amount) as sum_up_to_here
from
    mytable mt1
    left join mytable mt2 on (mt2.amount < mt1.amount)
group by mt1.amount
or:
create view mytable_view as
select
    mt1.amount,
    (select sum(amount) from mytable where amount < mt1.amount) as sum_up_to_here
from mytable mt1
and then I would select the final rows:
select amount from mytable_view where sum_up_to_here < (some value)
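One caveat with the left-join view: the smallest amount has a NULL sum_up_to_here, so the plain < comparison above would drop it. A sketch that keeps it, treating NULL as 0 (the same trick the COALESCE-based answer further down uses):
select amount
from mytable_view
where coalesce(sum_up_to_here, 0) < 6   -- 6 standing in for (some value)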
If you're not worried about performance, you can of course run it as one query:
select amount from
(
    select
        mt1.amount,
        sum(mt2.amount) as sum_up_to_here
    from
        mytable mt1
        left join mytable mt2 on (mt2.amount < mt1.amount)
    group by mt1.amount
) t where sum_up_to_here < 20
One approach:
select t1.amount
from MyTable t1
left join MyTable t2 on t1.amount > t2.amount
group by t1.amount
having coalesce(sum(t2.amount),0) < 7
SQLFiddle here.
In SQL Server you can use CTEs (common table expressions) to make it pretty simple to read.
Here is a CTE I did to sum up totals used in sequence. The CTE is similar to the joins above, and holds the total up to any given index. Outside of the CTE I join it back to the original table so I can select it along with other fields.
;with summrp as (
select m1.idx, sum(m2.QtyReq) as sumUsed
from #mrpe m1
join #mrpe m2 on m2.idx <= m1.idx
group by m1.idx
)
select RefNum, RefLineSuf, QtyReq, ProjectedDate, sumUsed from #mrpe m
join summrp on summrp.idx=m.idx
In SQL Server 2012 you can use this shortcut to get a result like Grzegorz's.
SELECT amount
FROM (
SELECT * ,
SUM(amount) OVER (ORDER BY amount ASC) AS total
from demo
) T
WHERE total <= 6
A fiddle in the hand... http://sqlfiddle.com/#!6/b8506/6
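One subtlety with the windowed SUM: with only ORDER BY amount, the default frame is RANGE, so duplicate amounts share the same running total. If your data can contain ties, a sketch with an explicit ROWS frame keeps the total strictly row-by-row:
SELECT amount
FROM (
    SELECT amount,
           SUM(amount) OVER (ORDER BY amount
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS total
    FROM demo
) T
WHERE total <= 6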

MySql Join with Sum

I have a table called RESULTS with this structure :
resultid,winner,type
And a table called TICKETS with this structure :
resultid,ticketid,bet,sum_won,status
And I want to show each row from table RESULTS, and for each result I want to calculate the total bet and total sum_won using the values from table TICKETS.
I tried some joins and some sums, but I can't get what I want.
SELECT *,COALESCE(SUM(tickets.bet),0) AS totalbets,
COALESCE(SUM(tickets.sum_won),0) AS totalwins
FROM `results` NATURAL JOIN `tickets`
WHERE tickets.status<>0
GROUP BY resultid
Please give me some advice.
I want to display something like this:
RESULT  WINNER  TOTALBETS  TOTALWINS
1       2       431        222
2       3       0          0
3       1       23         0
4       1       324        111
Use:
SELECT r.*,
COALESCE(x.totalbet, 0) AS totalbet,
COALESCE(x.totalwins, 0) AS totalwins
FROM RESULTS r
LEFT JOIN (SELECT t.resultid,
SUM(t.bet) AS totalbet,
SUM(t.sum_won) AS totalwins
FROM TICKETS t
WHERE t.status != 0
GROUP BY t.resultid) x ON x.resultid = r.resultid
I don't care for the NATURAL JOIN syntax, preferring to be explicit about how to JOIN/link tables together.
SELECT *, COALESCE(SUM(tickets.bet),0) AS totalbets,
COALESCE(SUM(tickets.sum_won),0) AS totalwins
FROM `results` NATURAL JOIN `tickets`
WHERE tickets.status<>0
GROUP BY resultid
Try to replace the first * with resultid. If this helps, then add more columns to SELECT and add them to GROUP BY at the same time.
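Putting that advice into a concrete sketch (assuming winner is the only other results column you need in the output):
SELECT resultid,
       winner,
       COALESCE(SUM(tickets.bet), 0) AS totalbets,
       COALESCE(SUM(tickets.sum_won), 0) AS totalwins
FROM `results` NATURAL JOIN `tickets`
WHERE tickets.status <> 0
GROUP BY resultid, winner
Note that this still uses an inner (natural) join, so results with no qualifying tickets won't appear at all; if you need the zero rows shown in the sample output, the LEFT JOIN answer above is the way to go.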