Return all the second highest valued rows (SQL)

Let's say I have a table called bookings, containing 3 columns: hotelid, timestampOfBooking and numberOfGuests.
How do I return all the dates on which the second highest number of beds was booked (the number of beds booked is the same as the number of guests)?
In other words, I'm looking for the dates on which the second highest total of numberOfGuests occurs. In the event of a tie (where there is more than one date on which the described condition applies), it should return all those dates. In the event that all the dates have exactly the same total numberOfGuests, the query should return nothing.
If possible, I would like the query result to contain only one column with those specific dates.
Example:
hotelid  timestampOfBooking  numberOfGuests
11       22/11/2021          2
34       23/11/2021          2
30       23/11/2021          5
19       24/11/2021          7
8        25/11/2021          12
34       25/11/2021          5
In this case two dates should be in the result: 23/11/2021 and 24/11/2021, as both total 7 numberOfGuests. The maximum total here is 17 (on 25/11/2021, 12 + 5), and 7 is the second highest, which explains why 23/11/2021 (2 + 5) and 24/11/2021 (7) are returned. The final result should look like this:
dates
23/11/2021
24/11/2021

Method 1:
You can use DENSE_RANK() ordered by SUM(numberOfGuests) DESC:
SELECT timestampOfBooking, total_beds
FROM
(
    SELECT timestampOfBooking,
           SUM(numberOfGuests) AS total_beds,
           DENSE_RANK() OVER (ORDER BY SUM(numberOfGuests) DESC) AS rnk
    FROM bookings
    GROUP BY timestampOfBooking
) AS sq
WHERE rnk = 2
Method 2:
Using OFFSET and LIMIT:
SELECT timestampOfBooking,
       SUM(numberOfGuests) AS total_beds
FROM bookings
GROUP BY timestampOfBooking
HAVING SUM(numberOfGuests) =
(
    SELECT DISTINCT SUM(numberOfGuests) AS total_beds
    FROM bookings
    GROUP BY timestampOfBooking
    ORDER BY total_beds DESC
    OFFSET 1 LIMIT 1
);
Both methods give you the same output. Note that when every date has the same total there is no second-highest total at all, so both queries correctly return nothing.
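Since the question asks for a result with a single dates column, Method 1 needs only a small tweak. A minimal sketch, assuming the same table and column names as above:
SELECT timestampOfBooking AS dates
FROM
(
    SELECT timestampOfBooking,
           DENSE_RANK() OVER (ORDER BY SUM(numberOfGuests) DESC) AS rnk
    FROM bookings
    GROUP BY timestampOfBooking
) AS sq
WHERE rnk = 2;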

Related

Calculate the first and second maximum value at every row and the average of both (Snowflake SQL)

I have a table with the following schema:
uid  visit name  visit date  sales quantity
xyz  visit 1     2020-01-01  29
xyz  visit 2     2020-01-03  250
xyz  visit 3     2020-01-04  20
xyz  visit 4     2020-01-27  21
abc  visit 1     2020-02-01  29
abc  visit 2     2020-03-03  34
abc  visit 3     2020-04-04  35
abc  visit 4     2020-04-27  41
(base table: sales)
Each unique id has a few unique visits that repeat for every unique id. At every visit I have to calculate the two highest sales quantities per user, across their prior visits (in ascending order) up until the current visit named in the row, for each unique id and excluding the current row.
The output would be the same table plus these columns:
max sale
2nd max sale
avg of both max sales
(output table)
I have used window functions for the maximum value, but I am struggling to get the second highest value of sales for every user on every row. Is this doable using SQL? If so, what would the script look like?
Update: I re-wrote my answer, because the previous one ignored certain requirements.
To keep track of the 2 previous top values, you can write a UDTF in JS to hold that ranking:
create or replace function udtf_top2_before(points float)
returns table (output_col array)
language javascript
as $$
{
    processRow: function f(row, rowWriter, context) {
        // emit the top 2 seen so far (highest first), before adding the
        // current row, so each row is excluded from its own ranking
        rowWriter.writeRow({OUTPUT_COL: this.prevmax.slice().reverse()});
        this.prevmax.push(row.POINTS);
        // silly js sort https://stackoverflow.com/a/21595293/132438
        this.prevmax = this.prevmax.sort(function (a, b) {return a - b;}).slice(-2);
    }
    , initialize: function(argumentInfo, context) {
        this.prevmax = [];  // running top 2 for the current partition
    }
}
$$;
Then that tabular UDF will give you the numbers as expected:
with data as (
    select v:author::string author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
    from reddit_comments_sample
    where v:subreddit = 'wallstreetbets'
)
select author, score, ts
    , output_col[0] prev_max
    , output_col[1] prev_max2
    , (prev_max + ifnull(prev_max2, prev_max)) / 2 avg
from (
    select author, score, ts, output_col
    from data, table(udtf_top2_before(score::float) over(partition by author order by ts))
    order by author, ts
    limit 100
)
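Mapped onto the question's own data, the call would look something like the sketch below. The table name sales comes from the caption under the base table; the underscored column names visit_name, visit_date and sales_quantity are my assumption about how the spaced headers are actually spelled:
select uid, visit_name, visit_date, sales_quantity
    , output_col[0] max_sale
    , output_col[1] second_max_sale
    , (max_sale + ifnull(second_max_sale, max_sale)) / 2 avg_of_both
from sales, table(udtf_top2_before(sales_quantity::float)
                  over(partition by uid order by visit_date))
order by uid, visit_date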
UDTF based on my previous post:
https://towardsdatascience.com/sql-puzzle-optimization-the-udtf-approach-for-a-decay-function-4b4b3cdc8596
Previously:
You can use row_number() over() to select the top 2, and then pivot with an array_agg():
with data as (
    select v:author author, v:score::int score, v:subreddit, v:created_utc::timestamp ts
    from reddit_comments_sample
    where v:subreddit = 'wallstreetbets'
)
select author, arr[0] max_score, arr[1] max_score_2, (max_score + ifnull(max_score_2, max_score)) / 2 avg
from (
    select author
        , array_agg(score) within group (order by score::int desc) arr
    from (
        select author, score, ts
        from data
        qualify row_number() over(partition by author order by score desc) <= 2
    )
    group by 1
)
order by 4 desc

How to select rows whose cumulative sum of a column value runs up to a given value

I need to fetch data from PostgreSQL, where I need to select rows on the condition below.
id  type  total_quantity  created_dttm [desc]
1   1     10              30-Jun-2021
2   1     12              27-Jun-2021
3   1     32              26-Jun-2021
4   1     52              25-Jun-2021
I need to get all rows whose cumulative sum of total_quantity stays under the given value for the given type, plus the next immediate row, which pushes the sum over that value; the rest of the rows must be ignored. The rows are fetched ordered by created_dttm desc.
So, for the value 24 and type = 1, I need to get only three rows:
id  type  total_quantity  created_dttm [desc]
1   1     10              30-Jun-2021  [10 less than 24] fetch row
2   1     12              27-Jun-2021  [22 (sum of current and previous rows) less than 24] fetch row
3   1     32              26-Jun-2021  [54 (10+12+32) greater than 24] once the value is exceeded, fetch this row only
4   1     52              25-Jun-2021  [query should not fetch this row, since the max was reached at id 3]
I tried summing two columns, but that does not work, since I am looking for rows within a value range, with the condition to select all rows under the given value plus the next row that exceeds it, for the given type.
We can use SUM here as an analytic function:
WITH cte AS (
    SELECT *, SUM(total_quantity) OVER (ORDER BY created_dttm DESC)
              - total_quantity AS tq_sum
    FROM yourTable
)
SELECT id, type, total_quantity, created_dttm
FROM cte
WHERE tq_sum < 24;
The trick in the CTE works by excluding the current row's total quantity from the running total. So the first row to exceed the threshold of 24 is also included, because its own total quantity is left out of the running total it is compared against.
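Evaluated by hand on the sample data (running total ordered by created_dttm DESC, minus the current row's total_quantity), the CTE produces:
id  total_quantity  running sum  tq_sum  tq_sum < 24?
1   10              10           0       yes
2   12              22           10      yes
3   32              54           22      yes (the first row past the threshold)
4   52              106          54      no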

Loop with lead or lag through a table to create a subscription series in T-SQL

I want to create a table with subscription series out of a table with separate subscriptions in T-SQL. A subscription series contains all the subscriptions of one relation_number that follow within 14 days (the difference from the start date of the former subscription of the same relation is smaller than or equal to 14 days). The values in this column can, for example, be comma-separated.
I have already created a column that indicates a switch_out (a subscription within 14 days after this subscription) and a switch_in (a subscription within 14 days before this subscription) in the subscription dataset. However, I did not succeed in creating either of the last two columns of the table below. I thought about using a loop (with LEAD or LAG) for this, but I don't know how. The example below only contains subscription series with a maximum length of two subscriptions, but that is just to keep it simple; series of, for example, 10 or 20 subscriptions are also possible.
Once I have one of these columns, I can use it in combination with relation_number in a GROUP BY to get the subscription series table I would like to have.
Can someone help me to create one of these columns or does someone know a better way to create the table with subscription series?
Subscription table (last two columns do not exist yet):
Relation_number | Subscription_number | Startdate | Stopdate   | Switch_in | Switch_out | Subscription_serie | Subscription_serie2
1               | 3                   | 1-1-2020  | 31-12-2020 |           | 1          | 1                  | 1
1               | 5                   | 1-1-2021  | 1-6-2021   | 1         |            | 1                  | 1
1               | 1                   | 1-1-2022  |            |           |            | 2                  | 2
2               | 4                   | 1-1-2019  | 31-12-2019 |           | 1          | 1                  | 3
2               | 7                   | 1-1-2020  | 31-12-2020 | 1         |            | 1                  | 3
3               | 6                   | 1-1-2021  | 1-6-2021   |           |            | 1                  | 4
3               | 2                   | 1-1-2022  |            |           |            | 2                  | 5
Subscription table I eventually would like to create:
Relation_number | Subscription_serie
1               | 3,5
1               | 1
2               | 4,7
3               | 6
3               | 2
You can combine the values of rows in a column with the STRING_AGG function; to do this you must group by one or more columns.
The answer to your question is to group by (Relation_number, Subscription_serie) and combine the values of the Subscription_number column with STRING_AGG:
SELECT Relation_number, STRING_AGG(Subscription_number, ',') AS Subscription_serie
FROM Test
GROUP BY Relation_number, Subscription_serie
ORDER BY Relation_number, Subscription_serie DESC
And for Subscription_serie2:
SELECT Relation_number, STRING_AGG(Subscription_number, ',') AS Subscription_serie2
FROM Test
GROUP BY Relation_number, Subscription_serie2
ORDER BY Relation_number, Subscription_serie2 DESC
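Note that STRING_AGG on its own does not guarantee the order of the elements inside each series ("3,5" versus "5,3"). On SQL Server 2017+ you can pin it with WITHIN GROUP; a sketch on the same Test table, ordering by the Startdate column from the question:
SELECT Relation_number,
       STRING_AGG(Subscription_number, ',')
         WITHIN GROUP (ORDER BY Startdate) AS Subscription_serie
FROM Test
GROUP BY Relation_number, Subscription_serie
ORDER BY Relation_number, Subscription_serie DESC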
I derived the Subscription_serie columns themselves with LAG and LEAD; in my opinion, it is almost impossible to build them in any other order:
select Relation_number, Subscription_serie, Subscription_serie2
from (select *,
             lag(Subscription_serie, 1) over(order by Relation_number) as prevcol1,
             lag(Subscription_serie2, 1) over(order by Relation_number) as prevcol2
      from (select *,
                   case Switch_in when 1 then concat(prevcol, ',', Subscription_number)
                        else cast(row_number() over(partition by Relation_number order by Relation_number) as nvarchar) end as Subscription_serie,
                   case Switch_out when 1 then concat(nextcol, ',', Subscription_number)
                        else cast(row_number() over(partition by Relation_number order by Relation_number) as nvarchar) end as Subscription_serie2
            from (select Relation_number, Subscription_number, Switch_in, Switch_out,
                         lag(Subscription_number, 1) over(order by Relation_number) as prevcol,
                         lead(Subscription_number, 1) over(order by Relation_number) as nextcol
                  from Test) t) t) t

sum with a specific condition in select

I have a number, for example: 1000 (1).
I have a query that returns different numbers without any order (2), for example: 100, 300, 1000, 400, 500, 600.
I want to write a query (not a loop) that sums the numbers in (2) until the sum is in the range (1000 - 300, 1000 + 300) -> (700, 1300).
For example, 100 + 300 + 400 could be an answer, or 400 + 500, or ...
P.S.: the first run of numbers whose sum is in that range is the answer.
Not sure if I understood your question fully, but you may be able to achieve this using the windowing clause of analytic functions.
I created a sample table number_list with the values you'd provided. Assuming (2) to be the output from the query below ..
SQL> select * from number_list;
VALUE
----------
100
300
1000
400
500
600
6 rows selected.
.. you now need the first list of numbers whose sum falls within a certain range, i.e. between (1000 - 300) and (1000 + 300) ..
SQL> with sorted_list as
(
  select rownum rnum, value from
  ( select value from number_list order by value ) -- sort values ascending
)
select value from sorted_list where rnum <= (
  select min(rnum) from ( -- determine the first value from the sorted list to fall in the specified range
    select rownum rnum, value,
           sum(value) over ( order by null
                             rows between
                               unbounded preceding -- the window starts at the first row
                               and current row     -- the window ends at the current row
                           ) sum
    from sorted_list
  ) where sum between (1000-300) and (1000+300)
);
VALUE
----------
100
300
400
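As a side note, the same "stop at the first running total that falls in the range" idea can be written a bit more compactly with a running-total window. This is only a sketch against the same number_list table, not part of the original answer; the running_sum alias and the explicit ROWS clause (which sidesteps the tie handling of the default RANGE window) are my additions:
with sorted_list as
(
  select value,
         sum(value) over ( order by value
                           rows between unbounded preceding and current row
                         ) running_sum
  from number_list
)
select value
from sorted_list
where running_sum <= ( select min(running_sum)
                       from sorted_list
                       where running_sum between 700 and 1300 );
Against the sample values this keeps 100, 300 and 400, matching the output above.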

Sqlite: Selecting records spread over total records

I have a SQL/SQLite question. I need to write a query that selects some values from a SQLite database table. I always want at most 20 returned records. If the total number of selected records is more than 20, I need to select 20 records that are spread evenly (not randomly) over the total records. It is also important that I always select the first and last value from the table when sorted on the date; these records should be inserted first and last in the result.
I know how to accomplish this in code but it would be perfect to have a sqlite query that can do the same.
The query I'm using now is really simple and looks like this:
"SELECT value,date,valueid FROM tblvalue WHERE tblvalue.deleted=0 ORDER BY DATE(date)"
If, for example, I have these records in the table, and to make the example easier the maximum result I want is 5:
id value date
1 10 2010-04-10
2 8 2010-04-11
3 8 2010-04-13
4 9 2010-04-15
5 10 2010-04-16
6 9 2010-04-17
7 8 2010-04-18
8 11 2010-04-19
9 9 2010-04-20
10 10 2010-04-24
The result I would like is spread evenly like this:
id value date
1 10 2010-04-10
3 8 2010-04-13
5 10 2010-04-16
7 8 2010-04-18
10 10 2010-04-24
Hope that explain what I want, thanks!
Something like this should work for you:
SELECT *
FROM (
    SELECT v.value, v.date, v.valueid
    FROM tblvalue v
    LEFT OUTER JOIN (
        SELECT min(DATE(date)) AS MinDate, max(DATE(date)) AS MaxDate
        FROM tblvalue
        WHERE tblvalue.deleted = 0
    ) vm ON DATE(v.date) = vm.MinDate OR DATE(v.date) = vm.MaxDate
    WHERE v.deleted = 0
    ORDER BY vm.MinDate DESC, Random()
    LIMIT 20
) a
ORDER BY DATE(date)
I think you want this:
SELECT value,date,valueid FROM tblvalue WHERE tblvalue.deleted=0
ORDER BY DATE(date), Random()
LIMIT 20
In other words, you want to select rows whose date comes from the sorted list of dates, taking every odd element, plus the last recorded element (with the latest date), everything limited to max 20 rows?
If that's the case, then I think this one should do:
SELECT id, value, date
FROM source_table
WHERE date IN (SELECT date FROM source_table
               WHERE (rowid - 1) % 2 = 0
                  OR date = (SELECT max(date) FROM source_table)
               ORDER BY date)
LIMIT 20
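The modulo trick above halves the rows, which matches the 10-to-5 example but is hard-wired to that ratio. A more general sketch, assuming SQLite 3.25+ for window functions (the CTE names ordered and slots are mine), picks the 20 positions closest to an even spacing over the ordered set and always includes the first and last row:
WITH RECURSIVE ordered AS (
    SELECT value, date, valueid,
           ROW_NUMBER() OVER (ORDER BY DATE(date)) - 1 AS rn,  -- 0-based position
           COUNT(*) OVER () AS n                               -- total row count
    FROM tblvalue
    WHERE deleted = 0
),
slots(i) AS (                                                  -- the 20 target slots 0..19
    SELECT 0 UNION ALL SELECT i + 1 FROM slots WHERE i < 19
)
SELECT DISTINCT value, date, valueid
FROM ordered
JOIN slots ON rn = CAST(ROUND(i * (n - 1) / 19.0) AS INTEGER)
ORDER BY DATE(date);
With 20 or fewer rows every position gets hit, so the whole table comes back in order.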