ORACLE SQL: select MAX value based on 2 different variables (group by function) [duplicate]

ORACLE SQL: select MAX value based on 2 different variables (group by function) [duplicate] - sql

This question already has answers here:
Fetch the rows which have the Max value for a column for each distinct value of another column
(35 answers)
Select First Row of Every Group in sql [duplicate]
(2 answers)
Return row with the max value of one column per group [duplicate]
(3 answers)
SQL: getting the max value of one column and the corresponding other columns [duplicate]
(2 answers)
Closed 11 months ago.
I think my problem is quite simple but I don´t really get it :)
I'm using SQL Developer as IDE and have a large table which looks like this:
ID
technology
speed
1
3G
20
1
2G
10
1
4G
40
1
5G
100
2
3G
60
2
4G
90
2
5G
150
3
3G
30
3
4G
50
I need the max value of 'technology' for each 'ID' and also need the 'speed' in the result:
ID
technology
speed
1
5G
100
2
5G
150
3
4G
50
my SQL looks like that:
SELECT ID, MAX(technology) AS technology, speed
FROM "table"
GROUP BY ID, speed;
but with this SQL I get multiple selections for each ID
any ideas?

Since you're including speed in your select and group, and those values vary per row, your current query will basically return the full table just with the MAX(technology) for each row. This is because speed can't be grouped into a single value as they are all different.
ie.
ID technology speed
1 5G 20
1 5G 10
1 5G 40
1 5G 100
Based purely on your sample set, you could select the MAX(speed) since it always coincides with the MAX(technology), and then you would get the right results:
ID technology speed
1 5G 100
However, if the MAX(technology) ever has less than the MAX(speed), the above would become incorrect.
A better approach would be to use a window function because you would remove that potential flaw:
with cte as (
SELECT ID, technology, speed,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY technology DESC) RN
FROM table)
SELECT *
FROM cte
WHERE RN = 1
This assigns a number to each row, starting with number 1 for the row that has the MAX(technology) (ie. ORDER BY technology DESC), and does this for each ID (ie. PARTITION BY ID).
Therefore when we select only the rows that are assigned row number 1, we are getting the full row for each max technology / id combination.
One last note - if there are duplicate rows with the same ID and technology but with various speeds, this would pick one of them at random. You would need to further include an ORDER for speed in that case. Based on your sample set this doesn't happen, but just an fyi.

You can also use the keep keyword. It is handy in situations like this one since ordering by one column and outputting another column happens in one clause, no subquery or CTE is needed. The drawback is that it is Oracle proprietary syntax.
with t (ID, technology, speed) as (
select 1, '3G', 20 from dual union all
select 1, '2G', 10 from dual union all
select 1, '4G', 40 from dual union all
select 1, '5G', 100 from dual union all
select 2, '3G', 60 from dual union all
select 2, '4G', 90 from dual union all
select 2, '5G', 150 from dual union all
select 3, '3G', 30 from dual union all
select 3, '4G', 50 from dual
)
select id
, max(technology) keep (dense_rank first order by speed desc)
, max(speed)
from t
group by id
Db fiddle.

Related

How to limit number of groups returned in a query, but not the number of rows in Oracle

How to limit the number of groups in a query, but not the number of rows in Oracle?
If I had to do that manually, I would have to use a DISTINCT.
Would be something like this:
FOR d IN (
SELECT DISTINCT COLUMN_1 FROM myTable
WHERE myDate BETWEEN x AND y
OFFSET o ROWS
FETCH NEXT l ROWS ONLY
) LOOP
And then, do the selects from each of the ids returned in the query, which, in my opinion, is a terrible solution.
SAMPLE DATA:
If I limit the number of groups to 2 by using COLUMN_2, the expected result should be something like:

I believe you may be looking for something like this:
select *
from mytable
where id in (
select distinct id
from my_table
where my_date between x and y
fetch first :n rows only
)
;
:n is a bind variable, encoding the number of groups you want to select.
This should be more efficient than solutions using analytic functions - even if it must read the base table twice. In tests posted on OTN, I showed that the difference is not small.
EDIT If I remember correctly, FETCH is not implemented in the most efficient way (perhaps for good reasons, having to do with features we don't need in this query - such as how to deal with ties). FETCH itself resembles a DENSE_RANK() implementation rather than the faster row limiting clause (using ROWNUM). I would likely need to modify the query to do away with FETCH, if speed was really important. END EDIT
Further edit to do with performance comparisons
Frequent poster MT0 requested a pointer for the claim that aggregate solutions can (and often are) more efficient than analytic function approaches, even when the former may require multiple passes through the data where the analytic function approach requires only one.
Alas, OTN (what now calls itself the "Oracle Groundbreakers Developer Community", the discussion board hosted by Oracle itself) went through a massive - and massively botched - platform change at the end of September 2020; that messed up both the search facilities and the formatting of old posts, to the point of rendering them almost unusable.
Instead, I will show here a simple mock-up of the OP's problem in this thread; code that anyone can run so they can repeat the tests on their own machine.
I created a table with two columns, ID and STR - the ID plays the same role as in the OP's question, and STR is just extra payload to mimic real-life data. ID is number and STR is varchar2(100). I populated the table with 9 million rows - 1 million ID's, nine rows for each ID. The task is to select just three "groups" (three distinct ID's, then select all the rows from the base table for those three distinct ID's).
With no index on the ID column, the aggregate solution runs in 0.81 seconds on my machine; with an index on ID, it runs in 0.47 seconds. The analytic functions solution runs in 0.91 seconds, with or without an index (obviously - there is no way an index can benefit the analytic function solution). All these results are for column ID not declared NOT NULL.
Here is the code to create the table, the index on ID, and the two queries I tested. Note: As I explained in my first edit (above), fetch is slow; I replaced it with a standard row-limiting technique using ROWNUM in an over-query.
drop table t purge;
create table t (id number, str varchar2(100));
insert into t
with row_gen as (select level from dual connect by level <= 3000)
select mod(344227 * rownum, 1000000), rpad('x', 100, 'x')
from row_gen cross join row_gen
;
commit;
create index t_idx on t(id);
select *
from t
where id in (
select id from (select distinct id from t)
where rownum <= 3
);
select *
from ( select t.*, dense_rank() over (order by id) dr from t )
where dr <= 3;

You can use DENSE_RANK:
SELECT *
FROM (
SELECT t.*,
DENSE_RANK() OVER ( ORDER BY column2 ) AS rnk
FROM table_name t
)
WHERE rnk <= 2;
Which, for the sample data:
CREATE TABLE table_name ( column1, column2, column3, column4 ) AS
SELECT 1, 1, 1.0, 1.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.0 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.1 FROM DUAL UNION ALL
SELECT 2, 2, 2.2, 2.2 FROM DUAL UNION ALL
SELECT 2, 2, 2.0, 2.3 FROM DUAL UNION ALL
SELECT 3, 3, 3.0, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 3, 3, 3.1, 3.1 FROM DUAL UNION ALL
SELECT 4, 4, 4.2, 4.0 FROM DUAL;
Outputs:
COLUMN1 | COLUMN2 | COLUMN3 | COLUMN4 | RNK
------: | ------: | ------: | ------: | --:
1 | 1 | 1 | 1 | 1
2 | 2 | 2 | 2 | 2
2 | 2 | 2.2 | 2.1 | 2
2 | 2 | 2.2 | 2.2 | 2
2 | 2 | 2 | 2.3 | 2
(and, if you want DISTINCT rows then add DISTINCT to the outer query)
db<>fiddle here

If I understand correctly, you want ROW_NUMBER():
SELECT t.*
FROM (SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) as seqnum
FROM myTable t
WHERE t.myDate BETWEEN x AND y
) t
WHERE seqnum = 1;
This returns an arbitrary row for each id meeting the conditions.

Execute both the query and the COUNT of the query

I'm trying to build a query or a PL/SQL code to both execute a query, as well as return the number of results of this query.
Is this possible in a single query? Right now I feel like I'm being very wasteful: I first wrap the query in a COUNT (without ORDER BY) and then run the same query again (without the COUNT). A difference of a few seconds will probably not change the total number of rows, I can live with that.
The DB I'm using is Oracle Enterprise 12.2

An easy SQL way:
a test table:
create table testTable(a, b, c) as (
select 1, 'one', 'XX' from dual UNION ALL
select 2, 'two', 'YY' from dual UNION ALL
select 3, 'three', 'ZZ' from dual
)
A simple query:
select a, b, c
from testTable
A B C
---------- ----- --
1 one XX
2 two YY
3 three ZZ
3 rows selected.
The query with the number of records :
select a, b, c, count(*) over (partition by 1) as count
from testTable
A B C COUNT
---------- ----- -- ----------
1 one XX 3
2 two YY 3
3 three ZZ 3
3 rows selected.

You can try to do something like:
WITH test_query
AS (SELECT LEVEL just_a_number
FROM dual
CONNECT BY level < 101)
SELECT just_a_number
, COUNT(just_a_number) OVER (ORDER BY 1) total_count
FROM test_query;
Where COUNT(just_a_number) OVER (ORDER BY 1) will return total number of rows fetched in each row. Note that it may slow down the query.

Typically, when I do something like this, I create a stored procedure that returns 2 values. The first would be the result set as a REF CURSOR, and the other a number(12,0) returning the count. Both would of course require separate queries, but since it is in a single stored procedure, it is only one database connection and command being executed.
You are of course right for having your COUNT query forgo the ORDER BY clause.
To answer your question, you are not being wasteful per se. This is common practice in enterprise software.

How to return the rows between 20th and 30th in Oracle Sql [duplicate]

This question already has answers here:
Oracle SQL: Filtering by ROWNUM not returning results when it should
(2 answers)
Closed 4 years ago.
so I have a large table that I'd like to output, however, I only want to see the rows between 20 and 30.
I tried
select col1, col2
from table
where rownum<= 30 and rownum>= 20;
but sql gave an error
I also tried --where rownum between 20 and 30
it also did not work.
so whats the best way to do this?

SELECT *
FROM T
ORDER BY I
OFFSET 20 ROWS --skips 20 rows
FETCH NEXT 10 ROWS ONLY --takes 10 rows
This shows only rows 21 to 30. Take care here that you need to sort the data, otherwise you may get different results every time.
See also here in the documentation.
Addendum: As in the possible duplicate link shown, your problem here is that there can't be a row with number 20 if there is no row with number 19. That's why the rownum-approach works to take only the first x records, but when you need to skip records you need a workaround by selecting the rownum in a subquery or using offset ... fetch
Example for a approach with using rownum (for lower oracle versions or whatever):
with testtab as (
select 'a' as "COL1" from dual
union all select 'b' from dual
union all select 'c' from dual
union all select 'd' from dual
union all select 'e' from dual
)
select * from
(select rownum as "ROWNR", testtab.* from testtab) tabWithRownum
where tabWithRownum.ROWNR > 2 and tabWithRownum.ROWNR < 4;
--returns only rownr 3, col1 'c'

Whenever you use rownum, it counts the rows that your query returns. SO if you are trying to filter by selecting all records between rownum 20 and 30, that is only 10 rows, so 20 and 30 dont exist. You can however, use WITH (whatever you want to name it) as and then wrap your query and rename your rownum column. This way you are selecting from your select. Example.
with T as (
select requestor, request_id, program, rownum as "ROW_NUM"
from fnd_conc_req_summary_v where recalc_parameters='N')
select * from T where row_num between 20 and 30;

sql server : select rows who's sum matches a value [duplicate]

This question already has answers here:
How to get rows having sum equal to given value
(4 answers)
Closed 9 years ago.
The community reviewed whether to reopen this question 1 year ago and left it closed:
Original close reason(s) were not resolved
here is table T :-
id num
-------
1 50
2 20
3 90
4 40
5 10
6 60
7 30
8 100
9 70
10 80
and the following is a fictional sql
select *
from T
where sum(num) = '150'
the expected result is :-
(A)
id num
-------
1 50
8 100
(B)
id num
-------
2 20
7 30
8 100
(C)
id num
-------
4 40
5 10
8 100
the 'A' case is most preferred !
i know this case is related to combinations.
in real world - client gets items from a shop, and because of an agreement between him and the shop, he pay every Friday. the payment amount is not the exact total of items
for example: he gets 5 books of 50 € ( = 250 € ), and on Friday he bring 150 €, so the first 3 books are perfect match - 3 * 50 = 150. i need to find the id's of those 3 books !
any help would be appreciated!

You can use recursive query in MSSQL to solve this.
SQLFiddle demo
The first recursive query build a tree of items with cumulative sum <= 150. Second recursive query takes leafs with cumulative sum = 150 and output all such paths to its roots. Also in the final results ordered by ItemsCount so you will get preferred groups (with minimal items count) first.
WITH CTE as
( SELECT id,num,
id as Grp,
0 as parent,
num as CSum,
1 as cnt,
CAST(id as Varchar(MAX)) as path
from T where num<=150
UNION all
SELECT t.id,t.num,
CTE.Grp as Grp,
CTE.id as parent,
T.num+CTE.CSum as CSum,
CTE.cnt+1 as cnt,
CTE.path+','+CAST(t.id as Varchar(MAX)) as path
from T
JOIN CTE on T.num+CTE.CSum<=150
and CTE.id<T.id
),
BACK_CTE as
(select CTE.id,CTE.num,CTE.grp,
CTE.path ,CTE.cnt as cnt,
CTE.parent,CSum
from CTE where CTE.CSum=150
union all
select CTE.id,CTE.num,CTE.grp,
BACK_CTE.path,BACK_CTE.cnt,
CTE.parent,CTE.CSum
from CTE
JOIN BACK_CTE on CTE.id=BACK_CTE.parent
and CTE.Grp=BACK_CTE.Grp
and BACK_CTE.CSum-BACK_CTE.num=CTE.CSum
)
select id,NUM,path, cnt as ItemsCount from BACK_CTE order by cnt,path,Id

If you restrict your problem to "which two numbers add up to a value", the solution is as follows:
SELECT t1.id, t1.num, t2.id,t2.num
FROM T t1
INNER JOIN T t2
ON t1.id < t2.id
WHERE t1.num + t2.num = 150
If you also want the result for three and more numbers you can achieve that by using the above query as a base for recursive SQL. Don't forget to specify a maximum recursion depth!

To find the id's of the books that the client is paying, you would need to have a table with your clients, and another one to store the orders of the client, and what products he bought.
Otherwise it would be impossible to know what product the payment refers to.

Joining onto a table that doesn't have ranges, but requires ranges

Trying to find the best way to write this SQL statement.
I have a customer table that has the internal credit score of that customer. Then i have another table with definitions of that credit score. I would like to join these tables together, but the second table doesn't have any way to link it easily.
The score of the customer is an integer between 1-999, and the definition table has these columns:
Score
Description
And these rows:
60 LOW
99 MED
999 HIGH
So basically if a customer has a score between 1 and 60 they are low, 61-99 they are med, and 100-999 they are high.
I can't really INNER JOIN these, because it would only join them IF the score was 60, 99, or 999, and that would exclude anyone else with those scores.
I don't want to do a case statement with the static numbers, because our scores may change in the future and I don't want to have to update my initial query when/if they do. I also cannot create any tables or functions to do this- I need to create a SQL statement to do it for me.
EDIT:
A coworker said this would work, but its a little crazy. I'm thinking there has to be a better way:
SELECT
internal_credit_score
(
SELECT
credit_score_short_desc
FROM
cf_internal_credit_score
WHERE
internal_credit_score = (
SELECT
max(credit.internal_credit_score)
FROM
cf_internal_credit_score credit
WHERE
cs.internal_credit_score <= credit.internal_credit_score
AND credit.internal_credit_score <= (
SELECT
min(credit2.internal_credit_score)
FROM
cf_internal_credit_score credit2
WHERE
cs.internal_credit_score <= credit2.internal_credit_score
)
)
)
FROM
customer_statements cs

try this, change your table to contain the range of the scores:
ScoreTable
-------------
LowScore int
HighScore int
ScoreDescription string
data values
LowScore HighScore ScoreDescription
-------- --------- ----------------
1 60 Low
61 99 Med
100 999 High
query:
Select
.... , Score.ScoreDescription
FROM YourTable
INNER JOIN Score ON YourTable.Score>=Score.LowScore
AND YourTable.Score<=Score.HighScore
WHERE ...

Assuming you table is named CreditTable, this is what you want:
select * from
(
select Description, Score
from CreditTable
where Score > 80 /*client's credit*/
order by Score
)
where rownum = 1
Also, make sure your high score reference value is 1000, even though client's highest score possible is 999.
Update
The above SQL gives you the credit record for a given value. If you want to join with, say, Clients table, you'd do something like this:
select
c.Name,
c.Score,
(select Description from
(select Description from CreditTable where Score > c.Score order by Score)
where rownum = 1)
from clients c
I know this is a sub-select that executed for each returning row, but then again, CreditTable is ridiculously small and there will be no significant performance loss because of the the sub-select usage.

You can use analytic functions to convert the data in your score description table to ranges (I assume that you meant that 100-999 should map to 'HIGH', not 99-999).
SQL> ed
Wrote file afiedt.buf
1 with x as (
2 select 60 score, 'Low' description from dual union all
3 select 99, 'Med' from dual union all
4 select 999, 'High' from dual
5 )
6 select description,
7 nvl(lag(score) over (order by score),0) + 1 low_range,
8 score high_range
9* from x
SQL> /
DESC LOW_RANGE HIGH_RANGE
---- ---------- ----------
Low 1 60
Med 61 99
High 100 999
You can then join this to your CUSTOMER table with something like
SELECT c.*,
sd.*
FROM customer c,
(select description,
nvl(lag(score) over (order by score),0) + 1 low_range,
score high_range
from score_description) sd
WHERE c.credit_score BETWEEN sd.low_range AND sd.high_range

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas