sum with a specific condition in select - sql

I have a target number, for example 1000 (1).
I have a query that returns several numbers in no particular order (2), for example: 100, 300, 1000, 400, 500, 600.
I want to write a query (not a loop) that sums the numbers in (2) until the sum falls in the range (1000 - 300, 1000 + 300) -> (700, 1300).
For example, 100+300+400 could be an answer, or 400+500, or ...
P.S.: the first set of numbers whose sum is in that range is the answer.

Not sure if I understood your question fully, but you may be able to achieve this using the windowing clause of analytic functions.
I created a sample table number_list with the values you provided. Assuming (2) to be the output of the query below ..
SQL> select * from number_list;
VALUE
----------
100
300
1000
400
500
600
6 rows selected.
.. you now need the first list of numbers whose sum falls within a certain range, i.e. (1000 - 300) and (1000 + 300) ..
SQL> with sorted_list as
     (
       select rownum rnum, value from
         ( select value from number_list order by value ) -- sort values ascending
     )
     select value from sorted_list where rnum <= (
       select min(rnum) from ( -- determine first value from sorted list to fall in the specified range
         select rownum rnum, value,
                sum(value) over ( order by null
                                  rows between
                                       unbounded preceding -- indicate that the window starts at the first row
                                   and current row         -- indicate that the window ends at the current row
                                ) sum
         from sorted_list
       ) where sum between (1000-300) and (1000+300)
     );
VALUE
----------
100
300
400
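
For what it's worth, the same idea can be written a little more compactly by filtering a single running total; a minimal sketch, assuming the same number_list table and the (700, 1300) range from the question:

with running as (
  select value,
         sum(value) over ( order by value
                           rows between unbounded preceding and current row ) as running_sum
  from number_list
)
select value
from running
where running_sum <= ( select min(running_sum)
                       from running
                       where running_sum between 700 and 1300 );

If no prefix of the sorted values falls in the range, the subquery returns NULL and the query returns no rows.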


Return all the second highest valued rows SQL

Let's say I have a table called bookings, containing 3 columns: hotelid, timestampOfBooking and numberOfGuests.
How do I return all the dates on which the second highest number of beds was booked (the number of beds booked is the same as the number of guests)?
In other words, I'm looking for the dates on which the second highest total of numberOfGuests occurs. This means that in the event of a tie (where more than one date satisfies the condition), it should return all of those dates. In the event that all the dates have exactly the same numberOfGuests, the query should return nothing.
If possible, I would only like to have one column in the query result that contains those specific dates.
Example:
hotelid  timestampOfBooking  numberOfGuests
11       22/11/2021          2
34       23/11/2021          2
30       23/11/2021          5
19       24/11/2021          7
8        25/11/2021          12
34       25/11/2021          5
In this case two dates should be in the result: 23/11/2021 and 24/11/2021, as they both had a total of 7 numberOfGuests. The maximum numberOfGuests here is 17 (on 25/11/2021), and 7 is the second highest, which is why 23/11/2021 (2 + 5) and 24/11/2021 (7) are returned. The final result should look like this:
dates
23/11/2021
24/11/2021
Method 1:
You can use DENSE_RANK() ordered by SUM(numberOfGuests) DESC:
SELECT timestampOfBooking, total_beds FROM
(
select timestampOfBooking,
sum(numberOfGuests) as total_beds,
dense_rank() over (order by sum(numberOfGuests) DESC) as rnk
from bookings
group by timestampOfBooking
) as sq
where rnk = 2
Method 2:
Using OFFSET and LIMIT:
SELECT timestampOfBooking,
SUM(numberOfGuests) AS total_beds
FROM bookings
GROUP BY timestampOfBooking
HAVING sum(numberOfGuests)=
(
SELECT distinct SUM(numberOfGuests) AS total_beds
FROM bookings
GROUP BY timestampOfBooking
ORDER BY total_beds DESC
OFFSET 1 LIMIT 1
);
Both methods will give you the same output.
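Since the question asks for just one column with the dates, a variant of Method 1 trimmed down to that column might look like this (same bookings table assumed):

SELECT timestampOfBooking AS dates
FROM
(
    select timestampOfBooking,
           dense_rank() over (order by sum(numberOfGuests) DESC) as rnk
    from bookings
    group by timestampOfBooking
) as sq
where rnk = 2;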

How to select rows whose cumulative sum of a column value is up to a given value

I need to fetch data from PostgreSQL, selecting rows based on the condition below.
id  type  total_quantity  created_dttm [desc]
1   1     10              30-Jun-2021
2   1     12              27-Jun-2021
3   1     32              26-Jun-2021
4   1     52              25-Jun-2021
I need to get all rows whose cumulative total_quantity stays within a given value, for a given type. If I pass value 24 and type 1, I need every row whose cumulative total_quantity is <= 24, plus the next row that pushes the cumulative sum past the given value; the remaining rows should be ignored. Rows are fetched in order by created_dttm desc.
So for value 24 and type = 1 I need only three rows:
id  type  total_quantity  created_dttm [desc]
1   1     10              30-Jun-2021  [10 is less than 24]                           fetch row
2   1     12              27-Jun-2021  [22 (current plus previous) is less than 24]   fetch row
3   1     32              26-Jun-2021  [54 (10+12+32) is greater than 24]             fetch this row only, then stop
4   1     52              25-Jun-2021  [should not be fetched, since the limit was passed at id 3]
I tried summing two columns, but that doesn't work, since I'm looking for rows within a value range: all rows whose running sum is less than the given value, plus the next row that exceeds it, for the given type.
We can use SUM here as an analytic function:
WITH cte AS (
SELECT *, SUM(total_quantity) OVER (ORDER BY created_dttm DESC)
- total_quantity AS tq_sum
FROM yourTable
)
SELECT id, type, total_quantity, created_dttm
FROM cte
WHERE tq_sum < 24;
The trick in the CTE works by subtracting the current row's total quantity from the running total. So the first row to exceed the threshold of 24 is also included, because its own total quantity is excluded from the running sum it is compared against.
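
The question also filters on type and passes the threshold as a parameter; a sketch with that predicate folded in (assuming the same yourTable name, PostgreSQL syntax, type 1 and value 24):

WITH cte AS (
    SELECT *, SUM(total_quantity) OVER (ORDER BY created_dttm DESC)
              - total_quantity AS tq_sum
    FROM yourTable
    WHERE type = 1          -- restrict the running total to the requested type
)
SELECT id, type, total_quantity, created_dttm
FROM cte
WHERE tq_sum < 24;          -- 24 is the given value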

Keyset pagination with composite key

I am using oracle 12c database and I have a table with the following structure:
Id NUMBER
SeqNo NUMBER
Val NUMBER
Valid VARCHAR2
A composite primary key is created with the field Id and SeqNo.
I would like to fetch the data with Valid = 'Y' and apply keyset pagination with a page size of 3. Assume I have the following data:
Id SeqNo Val Valid
1 1 10 Y
1 2 20 N
1 3 30 Y
1 4 40 Y
1 5 50 Y
2 1 100 Y
2 2 200 Y
Expected result:
----------------------------
Page 1
----------------------------
Id SeqNo Val Valid
1 1 10 Y
1 3 30 Y
1 4 40 Y
----------------------------
Page 2
----------------------------
Id SeqNo Val Valid
1 5 50 Y
2 1 100 Y
2 2 200 Y
Offset pagination can be done like this:
SELECT * FROM table ORDER BY Id, SeqNo OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY;
However, the actual table has more than 5 million records, and using OFFSET is going to slow the query down a lot. Therefore, I am looking for a keyset pagination approach (skip records using some unique fields instead of OFFSET).
Since a composite primary key is used, I need to offset the page with information from more than 1 field.
This is a sample SQL that should work in PostgreSQL (fetch 2nd page):
SELECT * FROM table WHERE (Id, SeqNo) > (1, 4) AND Valid = 'Y' ORDER BY Id, SeqNo LIMIT 3;
How do I achieve the same in oracle?
Use the row_number() analytic function together with the ceil arithmetic function. Arithmetic functions don't hurt performance, and the row_number() over (order by ...) expression orders the data independently of insertion order, without needing an extra order by clause in the main query. So, consider:
select Id,SeqNo,
ceil(row_number() over (order by Id,SeqNo)/3) as page
from tab
where Valid = 'Y';
P.S. It also works for Oracle 11g, while OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY works only for Oracle 12c.
You can use order by and then fetch rows using FETCH and OFFSET like the following:
Select ID, SEQ, VAL, VALID FROM TABLE
WHERE VALID = 'Y'
ORDER BY ID, SEQ
--FETCH FIRST 3 ROWS ONLY -- first page
--OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY -- second pages
--OFFSET 6 ROWS FETCH NEXT 3 ROWS ONLY -- third page
--Update--
You can use the row_number analytic function as follows.
Select Id, SeqNo, Val, Valid from
  (Select t.*,
          row_number() over (order by Id, SeqNo) as rn
   from table t
   Where Valid = 'Y')
Where ceil(rn/3) = 2 -- for page no. 2
Cheers!!
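
For the keyset part of the question specifically: Oracle has no row-value comparison like (Id, SeqNo) > (1, 4), but the same predicate can be expanded by hand. A minimal sketch for fetching the page after the row (Id = 1, SeqNo = 4), with my_table standing in for the real table name:

SELECT Id, SeqNo, Val, Valid
FROM my_table
WHERE Valid = 'Y'
  AND ( Id > 1
        OR (Id = 1 AND SeqNo > 4) )   -- keyset predicate: strictly after the last row of the previous page
ORDER BY Id, SeqNo
FETCH FIRST 3 ROWS ONLY;

Against the sample data this returns (1,5), (2,1), (2,2), i.e. the expected second page.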

SQL random number that doesn't repeat within a group

Suppose I have a table:
HH SLOT RN
--------------
1 1 null
1 2 null
1 3 null
--------------
2 1 null
2 2 null
2 3 null
I want to set RN to be a random number between 1 and 10. It's ok for the number to repeat across the entire table, but it's bad to repeat the number within any given HH. E.g.,:
HH SLOT RN_GOOD RN_BAD
--------------------------
1 1 9 3
1 2 4 8
1 3 7 3 <--!!!
--------------------------
2 1 2 1
2 2 4 6
2 3 9 4
This is on Netezza if it makes any difference. This one's being a real headscratcher for me. Thanks in advance!
To get a random number between 1 and the number of rows in the hh, you can use:
select hh, slot, row_number() over (partition by hh order by random()) as rn
from t;
The larger range of values is a bit more challenging. The following calculates a table (called randoms) with numbers and a random position in the same range. It then uses slot to index into the position and pull the random number from the randoms table:
with nums as (
select 1 as n union all select 2 union all select 3 union all select 4 union all select 5 union all
select 6 union all select 7 union all select 8 union all select 9 union all select 10
),
randoms as (
select n, row_number() over (order by random()) as pos
from nums
)
select t.hh, t.slot, hnum.n
from (select hh, randoms.n, randoms.pos
from (select distinct hh
from t
) t cross join
randoms
) hnum join
t
on t.hh = hnum.hh and
t.slot = hnum.pos;
Here is a SQLFiddle that demonstrates this in Postgres, which I assume is close enough to Netezza to have matching syntax.
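If each household should get its own independent shuffle (rather than one shared permutation), the same indexing idea can be partitioned by hh. A rough sketch in Postgres syntax; generate_series and the table name t are assumptions, and Netezza would need its own source of the numbers 1..10:

with nums as (
  select generate_series(1, 10) as n
),
shuffled as (
  -- one fresh random ordering of 1..10 per household
  select h.hh, nums.n,
         row_number() over (partition by h.hh order by random()) as pos
  from (select distinct hh from t) h
  cross join nums
)
select t.hh, t.slot, s.n as rn
from t
join shuffled s
  on s.hh = t.hh
 and s.pos = t.slot;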
I am not an expert on SQL, but you could probably do something like this:
1. Initialize a counter CNT=1.
2. Create a table such that you sample 1 row randomly from each group, along with a count of null RN values, say C_NULL_RN.
3. With probability C_NULL_RN/(10-CNT+1) for each row, assign CNT as RN.
4. Increment CNT and go to step 2.
Well, I couldn't get a slick solution, so I did a hack:
Created a new integer field called rand_inst.
Assign a random number to each empty slot.
Update rand_inst to be the instance number of that random number within this household. E.g., if I get two 3's, then the second 3 will have rand_inst set to 2.
Update the table to assign a different random number anywhere that rand_inst>1.
Repeat assignment and update until we converge on a solution.
Here's what it looks like. Too lazy to anonymise it, so the names are a little different from my original post:
/* Iterative hack to fill 6 slots with a random number between 1 and 13.
A random number *must not* repeat within a household_id.
*/
update c3_lalfinal a
set a.rand_inst = b.rnum
from (
select household_id
,slot_nbr
,row_number() over (partition by household_id,rnd order by null) as rnum
from c3_lalfinal
) b
where a.household_id = b.household_id
and a.slot_nbr = b.slot_nbr
;
update c3_lalfinal
set rnd = CAST(0.5 + random() * (13-1+1) as INT)
where rand_inst>1
;
/* Repeat until this query returns 0: */
select count(*) from (
select household_id from c3_lalfinal group by 1 having count(distinct(rnd)) <> 6
) x
;

Grouping values based on sequence in SQL

Is there a way, using just select statements, to create a table that contains in one column the range of ids covered by each run of repeating values, like in this example?
Example:
from the input table:
id value
1 25
2 25
3 24
4 25
5 25
6 25
7 28
8 28
9 11
should result:
range value
1-2 25
3-3 24
4-6 25
7-8 28
9-9 11
Note: The id values from the input table are always in order and the difference between 2 values ordered by id is always equal to 1
You want to find sequences of consecutive values. Here is one approach, using window functions:
select min(id), max(id), value
from (select id, value,
row_number() over (order by id) as rownum,
row_number() over (partition by value order by id) as seqnum
from t
) t
group by (rownum - seqnum), value;
This takes into account that a value might appear in different places among the rows. The idea is simple: rownum is a sequential number over all rows, and seqnum is a sequential number that increments within a given value. The difference between the two is constant for consecutive rows that share the same value.
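To see why the difference is constant within each run, here is the inner query on its own, with the values it would produce for the sample data from the question shown as a comment:

select id, value,
       row_number() over (order by id) as rownum,
       row_number() over (partition by value order by id) as seqnum
from t;
-- id  value  rownum  seqnum  (rownum - seqnum)
--  1    25      1      1      0
--  2    25      2      2      0
--  3    24      3      1      2
--  4    25      4      3      1
--  5    25      5      4      1
--  6    25      6      5      1
--  7    28      7      1      6
--  8    28      8      2      6
--  9    11      9      1      8
-- Grouping by (rownum - seqnum, value) yields the runs 1-2, 3-3, 4-6, 7-8, 9-9.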
Let me add, if you actually want the expression as "1-2", we need more information. Assuming the id is a character string, one of the following would work, depending on the database:
select min(id)+'-'+max(id), . . .
select concat(min(id), '-', max(id)), . . .
select min(id)||'-'||max(id), . . .
If the id is an integer (as I suspect), then you need to replace the ids in the above expressions with cast(id as varchar(32)), except in Oracle, where you would use cast(id as varchar2(32)).
SELECT CONCAT(MIN(id), '-', MAX(id)) AS id_range, value
FROM input_table
GROUP BY value
Maybe this:
SELECT MIN(ID), MAX(ID), VALUE FROM TABLE GROUP BY VALUE