select max value in a group of consecutive values - sql

How do you do to retrieve only the max value of a group with only consecutive values?
I have a telephone database with only unique values and I want to get only the highest number of each telephone number group TelNr and I am struggling.
id | TeNr | Position
1 | 100 | SLMO2.1.3
2 | 101 | SLMO2.3.4
3 | 103 | SLMO2.4.1
4 | 104 | SLMO2.3.2
5 | 200 | SLMO2.5.1
6 | 201 | SLMO2.5.2
7 | 204 | SLMO2.5.5
8 | 300 | SLMO2.3.5
9 | 301 | SLMO2.6.2
10 | 401 | SLMO2.4.8
Result should be:
TelNr
101
104
201
204
301
401
I have tried almost every tip I could find so far and whether I get all TelNr or no number at all which is useless in my case.
Any brilliant idea to run this with SQLITE?

So you're searching for gaps and want to get the first value of those gaps.
This is probably the best way to get them, try to check for a row with the current TeNr plus 1 and if there's none you found it:
select t1.TeNr, t1.TeNr + 1 as unused_TeNr
from tab as t1
left join Tab as t2
on t2.TeNr = t1.TeNr + 1
where t2.TeNr is null
Edit:
To get the range of missing values you need to use some old-style SQL as SQLite doesn't seem to support ROW_NUMBER, etc.
select
TeNr + 1 as RangeStart,
nextTeNr - 1 as RangeEnd,
nextTeNr - TeNr - 1 as cnt
from
(
select TeNr,
( select min(TeNr) from tab as t2
where t2.TeNr > t1.TeNr ) as nextTeNr
from tab as t1
) as dt
where nextTeNr > TeNr + 1
It's probably not very efficient, but might be ok if the number of rows is small and/or there's a index on TeNr.
Getting each value in the gap as a row in your result set is very hard, if your version of SQLite supports recursive queries:
with recursive cte (TeNr, missing, maxTeNr) as
(
select
min(TeNr) as TeNr, -- start of range of existing numbers
0 as missing, -- 0 = TeNr exists, 1 = TeNr is missing
max(TeNr) as maxTeNr -- end of range of existing numbers
from tab
union all
select
cte.TeNr + 1, -- next TeNr, if it doesn't exists tab.TeNr will be NULL
case when tab.TeNr is not null then 0 else 1 end,
maxTeNr
from cte left join tab
on tab.TeNr = cte.TeNr + 1
where cte.TeNr + 1 < maxTeNr
)
select TeNr
from cte
where missing = 1
Depending on your data this might return a huge amount of rows.
You might also use the result of the previous RangeStart/RangeEnd query as input to this recursion.

Related

Split a quantity into multiple rows with limit on quantity per row

I have a table of ids and quantities that looks like this:
dbo.Quantity
id | qty
-------
1 | 3
2 | 6
I would like to split the quantity column into multiple lines and number them, but with a set limit (which can be arbitrary) on the maximum quantity allowed for each row.
So for the value of 2, expected output should be:
dbo.DesiredResult
id | qty | bucket
---------------
1 | 2 | 1
1 | 1 | 2
2 | 1 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
In other words,
Running SELECT id, SUM(qty) as qty FROM dbo.DesiredResult should return the original table (dbo.Quantity).
Running
SELECT id, SUM(qty) as qty FROM dbo.DesiredResult GROUP BY bucket
should give you this table.
id | qty | bucket
------------------
1 | 2 | 1
1 | 2 | 2
2 | 2 | 3
2 | 2 | 4
2 | 1 | 5
I feel I can do this with cursors imperitavely, looping over each row, keeping a counter that increments and resets as the "max" for each is filled. But this is very "anti-SQL" I feel there is a better way around this.
One approach is recursive CTE which emulates cursor sequentially going through rows.
Another approach that comes to mind is to represent your data as intervals and intersections of intervals.
Represent this:
id | qty
-------
1 | 3
2 | 6
as intervals [0;3), [3;9) with ids being their labels
0123456789
|--|-----|
1 2 - id
It is easy to generate this set of intervals using running total SUM() OVER().
Represent your buckets also as intervals [0;2), [2;4), [4;6), etc. with their own labels
0123456789
|-|-|-|-|-|
1 2 3 4 5 - bucket
It is easy to generate this set of intervals using a table of numbers.
Intersect these two sets of intervals preserving information about their labels.
Working with sets should be possible in a set-based SQL query, rather than a sequential cursor or recursion.
It is bit too much for me to write down the actual query right now. But, it is quite possible that ideas similar to those discussed in Packing Intervals by Itzik Ben-Gan may be useful here.
Actually, once you have your quantities represented as intervals you can generate required number of rows/buckets on the fly from the table of numbers using CROSS APPLY.
Imagine we transformed your Quantity table into Intervals:
Start | End | ID
0 | 3 | 1
3 | 9 | 2
And we also have a table of numbers - a table Numbers with column Number with values from 0 to, say, 100K.
For each Start and End of the interval we can calculate the corresponding bucket number by dividing the value by the bucket size and rounding down or up.
Something along these lines:
SELECT
Intervals.ID
,A.qty
,A.Bucket
FROM
Intervals
CROSS APPLY
(
SELECT
Numbers.Number + 1 AS Bucket
,#BucketSize AS qty
-- it is equal to #BucketSize if the bucket is completely within the Start and End boundaries
-- it should be adjusted for the first and last buckets of the interval
FROM Numbers
WHERE
Numbers.Number >= Start / #BucketSize
AND Numbers.Number < End / #BucketSize + 1
) AS A
;
You'll need to check and adjust formulas for errors +-1.
And write some CASE WHEN logic for calculating the correct qty for the buckets that happen to be on the lower and upper boundary of the interval.
Use a recursive CTE:
with cte as (
select id, 1 as n, qty
from t
union all
select id, n + 1, qty
from cte
where n + 1 < qty
)
select id, n
from cte;
Here is a db<>fiddle.

Reference the output of a calculated column in Hive SQL

I have a self-referencing/recursive calculation in Excel that needs to be moved to Hive SQL. Basically the column needs to SUM the two values only if the total of the concrete column plus the result from the previous calculation is greater than 0.
The data is as follows, A is the value and B is the expected output:
| A | B |
|-----|-----|
| -1 | 0 |
| 2 | 2 |
| -2 | 0 |
| 2 | 2 |
| 2 | 4 |
| -1 | 3 |
| 2 | 5 |
In Excel it would be written in column B as:
=MAX(0,B1+A2)
The problem in SQL is you need to have the output of the current calculation. I think I've got it sorted in SQL as the following:
DECLARE #Numbers TABLE(A INT, Rn INT)
INSERT INTO #Numbers VALUES (-1,1),(2,2),(-2,3),(2,4),(2,5),(-1,6),(2,7);
WITH lagged AS
(
SELECT A, 0 AS B, Rn
FROM #Numbers
WHERE Rn = 1
UNION ALL
SELECT i.A,
CASE WHEN ((i.A + l.B) >= 0) THEN (i.A + l.B)
ELSE l.B
END,
i.Rn
FROM #Numbers i INNER JOIN lagged l
ON i.Rn = l.Rn + 1
)
SELECT *
FROM lagged;
But this being Hive, it doesn't support CTEs so I need to dumb the SQL down a touch. Is that possible using LAG/LEAD? My brain is hurting having got this far!
I initially thought that it would help to first compute the Sum of all elements until each rank and then fix the values somehow using negative elements.
However, one big negative that would zero the B column will carry forward in the sum and will make all following elements negative.
It's as Gordon commented - 0 is max in the calculation =MAX(0,B1+A2) depends on the previous location where it happened and it seems to be impossible to compute them in advance analytically.

SQL Order random rows based on 2 columns

How to sort this table in Oracle9:
START | END | VALUE
A | F | 1
D | H | 9
F | C | 8
C | D | 12
To make it look like this?:
START | END | VALUE
A | F | 1
F | C | 12
C | D | 8
D | H | 9
Goal is to start every next row with the end from the previous row.
This cannot be done with the order by clause alone, as it would have to find the record without a predecessor first, then find the next record comparing end and start column of the two records etc. This is an iterative process for which you need a recursive query.
That recursive query would find the first record, then the next and so on, giving them sequence numbers. Then you'd use the result and order by those generated numbers.
Here is how to do it in standard SQL. This is supported from Oracle 11g onwards only, however. In Oracle 9 you'll have to use CONNECT BY with which I am not familiar. Hopefully you or someone else can convert the query for you:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where not exists (select * from mytable prev where prev.endkey = mytable.startkey)
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
select startkey, endkey, value
from chain
order by pos;
UPDATE: As you say the data is cyclic, you'd have to change above query so as to start with an arbitrarily chosen row and stop when through:
with chain(startkey, endkey, value, pos) as
(
select startkey, endkey, value, 1 as pos
from mytable
where rownum = 1
union all
select mytable.startkey, mytable.endkey, mytable.value, chain.pos + 1 as pos
from chain
join mytable on mytable.startkey = chain.endkey
)
cycle startkey set cycle to 1 default 0
select startkey, endkey, value
from chain
where cycle = 0
order by pos;

Joining series of dates and counting continous days

Let's say I have a table as below
date add_days
2015-01-01 5
2015-01-04 2
2015-01-11 7
2015-01-20 10
2015-01-30 1
what I want to do is to check the days_balance, i.e. if date is greater or smaller than previous date + N days (add_days) and take the cumulated sum of days count if they are a continuous series.
So the algorithm should work like
for i in 2:N_rows {
days_balance[i] := date[i-1] + add_days[i-1] - date[i]
if days_balance[i] >= 0 then
date[i] := date[i] + days_balance[i]
}
The expected result should be as follows
date days_balance
2015-01-01 0
2015-01-04 2
2015-01-11 -3
2015-01-20 -2
2015-01-30 0
Is it possible in pure SQL? I imagine it should be with some conditional joins, but cannot see how it could be implemented.
I'm posting another answer since it may be nice to compare them since they use different methods (this one just does a n^2 style join, other one used a recursive CTE). This one takes advantage of the fact that you don't have to calculate the days_balance for each previous row before calculating it for a particular row, you just need to sum things from previous days....
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
, combinedWithAllPreviousDaysCte as
(
select i [curr_i], date [curr_date], add_days [curr_add_days], days_since_prev [curr_days_since_prev], 0 [prev_add_days], 0 [prev_days_since_prev] from cte where i = 1 --get first row explicitly since it has no preceding rows
UNION ALL
select curr.i [curr_i], curr.date [curr_date], curr.add_days [curr_add_days], curr.days_since_prev [curr_days_since_prev], prev.add_days [prev_add_days], prev.days_since_prev [prev_days_since_prev]
from cte curr
join cte prev on curr.i > prev.i --join to all previous days
)
select curr_i, curr_date, SUM(prev_add_days) - curr_days_since_prev - SUM(prev_days_since_prev) [days_balance]
from combinedWithAllPreviousDaysCte
group by curr_i, curr_date, curr_days_since_prev
order by curr_i
outputs:
+--------+-------------------------+--------------+
| curr_i | curr_date | days_balance |
+--------+-------------------------+--------------+
| 1 | 2015-01-01 00:00:00.000 | 0 |
| 2 | 2015-01-04 00:00:00.000 | 2 |
| 3 | 2015-01-11 00:00:00.000 | -3 |
| 4 | 2015-01-20 00:00:00.000 | -5 |
| 5 | 2015-01-30 00:00:00.000 | -5 |
+--------+-------------------------+--------------+
Well, I think I have it with a recursive CTE (sorry, I only have Microsoft SQL Server available to me at the moment, so it may not comply with PostgreSQL).
Also I think the expected results you had were off (see comment above). If not, this can probably be modified to conform to your math.
drop table junk
create table junk(date DATETIME, add_days int)
insert into junk values
('2015-01-01',5 ),
('2015-01-04',2 ),
('2015-01-11',7 ),
('2015-01-20',10 ),
('2015-01-30',1 )
;WITH cte as
(
select ROW_NUMBER() OVER (ORDER BY date) i, date, add_days, ISNULL(DATEDIFF(DAY, LAG(date) OVER (ORDER BY date), date), 0) days_since_prev
FROM Junk
)
,recursiveCte (i, date, add_days, days_since_prev, days_balance, math) as
(
select top 1
i,
date,
add_days,
days_since_prev,
0 [days_balance],
CAST('no math for initial one, just has zero balance' as varchar(max)) [math]
from cte where i = 1
UNION ALL --recursive step now
select
curr.i,
curr.date,
curr.add_days,
curr.days_since_prev,
prev.days_balance - curr.days_since_prev + prev.add_days [days_balance],
CAST(prev.days_balance as varchar(max)) + ' - ' + CAST(curr.days_since_prev as varchar(max)) + ' + ' + CAST(prev.add_days as varchar(max)) [math]
from cte curr
JOIN recursiveCte prev ON curr.i = prev.i + 1
)
select i, DATEPART(day,date) [day], add_days, days_since_prev, days_balance, math
from recursiveCTE
order by date
And the results are like so:
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| i | day | add_days | days_since_prev | days_balance | math |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
| 1 | 1 | 5 | 0 | 0 | no math for initial one, just has zero balance |
| 2 | 4 | 2 | 3 | 2 | 0 - 3 + 5 |
| 3 | 11 | 7 | 7 | -3 | 2 - 7 + 2 |
| 4 | 20 | 10 | 9 | -5 | -3 - 9 + 7 |
| 5 | 30 | 1 | 10 | -5 | -5 - 10 + 10 |
+---+-----+----------+-----------------+--------------+------------------------------------------------+
I don’t quite get how your algorithm returns your expected results? But let me share a technique I came up with that might help.
This will only work if the end result of your data is to be exported to Excel, and even then it won’t work in all scenarios depending on what format you export your dataset in, but here it is....
If you’ll familiar with Excel Formulas, what I discovered is that if you write an Excel formula in your SQL as another field, it will execute that formula for you as soon as you export to excel (best method that works for me is just coping and pasting it into Excel, so that it doesn’t format it as text)
So for your example, here’s what you could do (noting again I don’t understand your algorithm, so this is probably wrong, but it’s just to give you the concept)
SELECT
date
, add_days
, '=INDEX($1:$65536,ROW()-1,COLUMN()-2)'
||'+INDEX($1:$65536,ROW()-1,COLUMN()-1)'
||'-INDEX($1:$65536,ROW(),COLUMN()-2)'
AS "days_balance[i]"
,'=IF(INDEX($1:$65536,ROW(),COLUMN()-1)>=0'
||',INDEX($1:$65536,ROW(),COLUMN()-3)'
||'+INDEX($1:$65536,ROW(),COLUMN()-1))'
AS "date[i]"
FROM
myTable
ORDER BY /*Ensure to order by whatever you need for your formula to work*/
The key part to making this work is using the INDEX formula function to select a cell based on the position of the current cell. So ROW()-1 tells it get me the result of the previous record, and COLUMN()-2 means take the value from two columns to the left of the current. Because you can't use cell references like A2+B2-A3 because the row numbers won't change on export, and it assumes the position of the columns.
I used SQL string concatenation with || just so it's easier to read on screen.
I tried this one in excel; it didn’t match your expected results. But if this technique works for you then just correct the excel formula to suit.

selecting only values having difference greater than 0 (no negatives and 0)

i've two columns in mysql db table votes : accept and reject
i want to query only the top result after the accept-reject column values
table : votes
=================
accept | reject
=================
7 | 9
5 | 1
2 | 15
5 | 1
i want just the positive and top value, here 5-1 = 4
actually, i've lot of other columns along with accept and reject..i just want to ORDER the results by the top value of difference(positive only) and LIMIT it to 1 so that i can get the one row i want..was i clear? :)
how to write query for this?
thanks..
Use:
SELECT t.accept,
t.reject,
t.accept - t.reject AS difference
FROM VOTES t
WHERE t.accept - t.reject > 0
ORDER BY difference DESC
LIMIT 1
Alternate using subquery:
SELECT v.accept,
v.reject
FROM VOTES v
WHERE v.accept - v.reject > 0
JOIN (SELECT MAX(t.accept - t.reject) AS difference
FROM VOTES t
WHERE t.accept - t.reject > 0) x ON x.difference = (v.accept - v.reject)
SELECT (accept - reject) as top_value FROM table WHERE (accept - reject) > 0
So if you have values like this:
=============================
accept | reject | top_value
=============================
5 | 1 | 4
7 | 9 | -2
2 | 15 | -13
5 | 5 | 0
the query will select only row 1.
I'm assuming those two columns are just an extract from your table, and you want the entire row where (accept-reject) is maximum. Then you can do it like this:
SELECT * FROM votes
WHERE accept-reject = (SELECT MAX(accept-reject) FROM votes)
LIMIT 1