Selecting filtered values from Oracle using ROWNUM - sql

I have a requirement where I need to find the record numbers of the records returned in the result set. I know that I can use ROWNUM to get the record number from the result set, but my issue is slightly different. Below are the details.
Table : ProcessSummary
Columns:
PS_PK ProcessId StepId AsscoiateId ProcessName AssetAmount
145 25 50 Process1 3,500.00
267 26 45 Process2 4,400.00
356 27 70 Process3 2,400.00
456 28 80 90 Process4 780.00
556 29 56 67 Process5 4,500.00
656 45 70 Process6 6,000.00
789 31 75 Process7 8,000.00
Now what I need to do is fetch all the records from the ProcessSummary table where any of ProcessId, StepId or AsscoiateId is NULL. I wrote the query below:
select * from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
As expected, I got the 1st, 2nd, 3rd, 6th and 7th records in the result set.
Now what I need is to get the record numbers 1,2,3,6,7. I tried to use ROWNUM as below, but I got the values 1,2,3,4,5 and not 1,2,3,6,7.
select ROWNUM from ProcessSummary where ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
Is it possible to get the ROWNUM values in the sequence that I want, and if so, how can I do this? If ROWNUM cannot be used, what other option is there to get the result in the form that I want?
Any help would be greatly appreciated, as I could not find much on the net or SO regarding this sort of requirement.
Thanks
Vikeng21

rownum is an internal numbering that gives you a row number based on the current query results only, so that numbering is not tied to a specific record, and it will change when you change the data or the query.
But the numbering you ask for is already in your table. It looks like you just need to SELECT PS_PK .. instead. PS_PK is the field in your table that contains the actual number you want.
You can generate a numbering using an analytical function, and then filter that query. You need some fields to order by, though. In this case I've chosen PS_PK, but it can be another field, like ProcessName or a combination of other fields as well.
select
*
from
(select
dense_rank() over (order by PS_PK) as RANKING,
p.*
from
ProcessSummary p)
where
ProcessId IS NULL OR StepId IS NULL OR AsscoiateId IS NULL
So, in this query, a numbering is first calculated for each row that is returned from the inner query. The numbering is returned as the field RANKING. The outer query then filters further, but still returns the field RANKING with the original numbering.
Instead of dense_rank there are also rank and row_number. The differences are subtle (they only show up when the ORDER BY values contain ties), but you can experiment and read the documentation on analytic functions to see which one fits you best.
Note that this might slow down your query, because the inner query first generates a number for each row in the table (there is no filtering on that level now).
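To make the difference between numbering before and after the filter concrete, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python (SQLite 3.25 or later is assumed for window functions). The table is a simplified stand-in: the PS_PK values, the reduced column list and the placement of the NULLs are made up for illustration, and row_number() is used in place of dense_rank().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of ProcessSummary (not the asker's real data).
conn.execute(
    "CREATE TABLE ProcessSummary ("
    "PS_PK INTEGER, ProcessId INTEGER, StepId INTEGER, ProcessName TEXT)"
)
conn.executemany(
    "INSERT INTO ProcessSummary VALUES (?, ?, ?, ?)",
    [
        (10, None, 25, "Process1"),
        (20, 26, None, "Process2"),
        (30, 27, 70, "Process3"),  # no NULLs, so the filter drops this row
        (40, None, 80, "Process4"),
    ],
)

# Numbering computed on the already filtered rows: gaps disappear (1, 2, 3).
numbered_after_filter = conn.execute(
    """
    SELECT row_number() OVER (ORDER BY PS_PK) AS rn, ProcessName
    FROM ProcessSummary
    WHERE ProcessId IS NULL OR StepId IS NULL
    ORDER BY rn
    """
).fetchall()

# Numbering computed in an inner query over all rows, filter applied outside:
# the original positions survive (1, 2, 4).
numbered_before_filter = conn.execute(
    """
    SELECT rn, ProcessName
    FROM (SELECT row_number() OVER (ORDER BY PS_PK) AS rn, p.*
          FROM ProcessSummary p)
    WHERE ProcessId IS NULL OR StepId IS NULL
    ORDER BY rn
    """
).fetchall()

print(numbered_after_filter)
print(numbered_before_filter)
```

The second query is the shape used in the answer above: the inner query sees every row, so the numbering is fixed before the WHERE clause removes anything.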

Related

SQL JOIN with CASE statement result

Is there any way of joining the result of a CASE statement with a reference table without creating a CTE, etc.?
Result AFTER CASE statement:
ID Name Bonus Level (this is the result of a CASE statement)
01 John A
02 Jim B
01 John B
03 Jake C
Reference table
A 10%
B 20%
C 30%
I want to then get the % next to each employee, then the max %age using the MAX function and grouping by ID, then link it back again to the reference so that each employee has the single correct (highest) bonus level next to their name. (This is a totally fictitious scenario, but very similar to what I am looking for).
Just need help with joining the result of the CASE statement with the reference table.
Thanks in advance.
In place of a temporary value as the result of the case statement, you could use a select statement from the reference table.
So if your case statement looks like:
case when variable1 = value then 'A' end as bonuslevel
Then, replacing it like this might help
case when variable1 = value then (select percentage from ReferenceTable where variable2InReferenceTable = 'A') end
Don't know if I am overly simplifying, but based on the results of your case result query, why not just join that to the reference table, and do a max grouped by ID/Name. Since the ID and persons name wont change anyhow since they are the same person, you are just getting the max you want. To complete the Bonus level, rejoin just that portion after the max percentage determined for the person.
select
      lvl1.ID,
      lvl1.Name,
      lvl1.FinalBonus,
      rt2.BonusLvl
   from
      ( select
              PQ.ID,
              PQ.Name,
              max( rt.PcntBonus ) as FinalBonus
           from
              (however you
               got your
               data query ) PQ
              JOIN RefTbl rt
                 on PQ.BonusLvl = rt.BonusLvl
           group by
              PQ.ID,
              PQ.Name
      ) lvl1
      JOIN RefTbl rt2
         on lvl1.FinalBonus = rt2.PcntBonus
Since the Bonus levels (A,B,C) do not guarantee corresponding % levels (10,20,30), I did it this way... OTHERWISE, you could have just used max() on both the bonus level and percent. But what if your bonus levels were listed as something like
Limited 10%
Aggressive 20%
Ace 30%
You could see that a max of the level above would have "Limited", but the max % = 30 is associated with an "Ace" sales rep... Get the 30% first, then see what the label that matched that is.
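A small runnable sketch of that point, using SQLite from Python as a stand-in database. The emp_levels table is a hypothetical placeholder for "however you got your data query", and the rows are invented so that John holds both the Limited and the Ace level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-in for the result of the CASE query.
    CREATE TABLE emp_levels (ID TEXT, Name TEXT, BonusLvl TEXT);
    INSERT INTO emp_levels VALUES
        ('01', 'John', 'Limited'),
        ('02', 'Jim',  'Aggressive'),
        ('01', 'John', 'Ace'),
        ('03', 'Jake', 'Aggressive');

    CREATE TABLE RefTbl (BonusLvl TEXT, PcntBonus INTEGER);
    INSERT INTO RefTbl VALUES ('Limited', 10), ('Aggressive', 20), ('Ace', 30);
    """
)

# Naive approach: MAX() on the label compares strings, not percentages.
max_of_label = conn.execute(
    "SELECT ID, MAX(BonusLvl) FROM emp_levels GROUP BY ID ORDER BY ID"
).fetchall()

# Approach from the answer: take the max percentage per person first,
# then join back to the reference table to find the matching label.
max_of_percent = conn.execute(
    """
    SELECT lvl1.ID, lvl1.Name, lvl1.FinalBonus, rt2.BonusLvl
    FROM (SELECT e.ID, e.Name, MAX(rt.PcntBonus) AS FinalBonus
          FROM emp_levels e
          JOIN RefTbl rt ON e.BonusLvl = rt.BonusLvl
          GROUP BY e.ID, e.Name) lvl1
    JOIN RefTbl rt2 ON lvl1.FinalBonus = rt2.PcntBonus
    ORDER BY lvl1.ID
    """
).fetchall()

print(max_of_label)    # John ends up as 'Limited' here
print(max_of_percent)  # John ends up as 30 / 'Ace' here
```

Note that the re-join on the percentage assumes each percentage appears only once in the reference table; if two labels shared a percentage, the person would be listed once per label.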

Is there a way to dynamically set ROWS BETWEEN X PRECEDING AND CURRENT ROW?

I'm looking for a way to dynamically set the start of the window frame in my query on SQL Server using ROWS BETWEEN.
Something like:
SUM(field) OVER(ORDER BY field2 ROWS BETWEEN field3 PRECEDING AND CURRENT ROW)
field3 holds the amount of items (via group by from a CTE) that represent a group.
Is that possible, or should I try a different approach?
>> EDIT
My query is too big and messy to share here, but let me try to explain what I need. It's from a report builder which allows users to create custom formulas, like "employees/10". It also allows the user to simply input a constant formula like "12", and I need to calculate subtotals and the grand total for these as well. When using a field, like "employees", everything works fine. But for constant values I can't sum the values without rewriting a lot of stuff (which I'm trying to avoid).
So, consider a CTE called "aggregator" and the following query:
SELECT
*,
"employees"/10 as "ten_percent",
12 as "twelve"
FROM aggregator
This query returns this output:
row_type  counter  company_name  department_name  employees  ten_percent  twelve
data      1        A             A1               10         1            12
data      1        A             A2               15         1,5          12
data      1        A             A3               10         1            12
subtotal  3        A                              35         3,5          12
data      1        B             B1               10         1            12
subtotal  1        B                              10         1            12
total     4                                       45         4,5          12
As you can see, the values for "twelve" are wrong for the subtotal and total row types. I'm trying to solve this without changing the CTE.
ROLLUP won't work because I already have the sum for other columns.
I tried this (I omitted "row_type_sort" from the table above; it defines the sorting):
CASE
WHEN row_type = 'data' THEN
MAX(aggregator.[twelve])
ELSE
SUM(SUM(aggregator.[twelve]))
OVER (ORDER BY "row_type_sort" ROWS BETWEEN unbounded PRECEDING AND CURRENT ROW)
END AS "twelve"
This would work OK if I could replace "unbounded" with the value of the column "counter", which was my original question.
LAG/LEAD wasn't helpful either.
I'm out of ideas. Is it possible to achieve what I need only by changing this part of the query, or should the result of the CTE be changed as well?
Thanks

How to get the biggest column value between duplicated rows id?

I am working on an Oracle 11g database query that needs to retrieve a list of the highest NUM value between duplicated rows in a table.
Here is an example of my context:
ID | NUM
------------
1 | 1111
1 | 2222
2 | 3333
2 | 4444
3 | 5555
3 | 6666
And here is the result I am expecting after the query is executed:
NUM
----
2222
4444
6666
I know how to get the GREATEST value in a list of numbers, but I have absolutely no idea how to group two rows and fetch the biggest column value between them IF they have the same ID.
Programmatically it is quite easy to achieve, but in SQL it is a little less intuitive for me. Any suggestion or advice is welcome, as I don't even know which function could help me do this in Oracle.
Thank you !
This is the typical use case for a GROUP BY. Assuming your Num field can be compared:
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
If you don't want to select the ID (as in the output you provided), you can run
SELECT Max
FROM (
SELECT ID, MAX(NUM) as Max
FROM myTable
GROUP BY ID
) results
And here is the SQL fiddle
Edit : if NUM is, as you mentioned later, VARCHAR2, then you have to cast it to an Int. See this question.
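Why the cast matters can be shown with a short sketch. It uses SQLite from Python rather than Oracle, so CAST(... AS INTEGER) stands in for whatever conversion you would use in Oracle (for example TO_NUMBER), and the sample values are invented to trigger the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NUM is stored as text here, mimicking a VARCHAR2 column.
conn.execute("CREATE TABLE myTable (ID INTEGER, NUM TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [(1, "999"), (1, "1000"), (2, "25"), (2, "3")],
)

# MAX on text compares character by character, so '999' beats '1000'.
text_max = conn.execute(
    "SELECT ID, MAX(NUM) FROM myTable GROUP BY ID ORDER BY ID"
).fetchall()

# Converting to a number first gives the numeric maximum.
numeric_max = conn.execute(
    "SELECT ID, MAX(CAST(NUM AS INTEGER)) FROM myTable GROUP BY ID ORDER BY ID"
).fetchall()

print(text_max)
print(numeric_max)
```

With the values from the question (all four digits long) both versions happen to agree, which is why the problem only becomes visible once the strings differ in length.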
The most efficient way I would suggest is
SELECT ids,
value
FROM (SELECT ids,
value,
max(value)
over (
PARTITION BY ids) max_value
FROM test)
WHERE value = max_value;
This requires that the query maintain a single value per id of the maximum value encountered so far. If a new maximum is found then the existing value is modified, otherwise the new value is discarded. The total number of elements that have to be held in memory is related to the number of ids, not the number of rows scanned.
See this SQLFIDDLE

SQL rollup - prevent summing records multiple times

Firstly, I could not think of a better question title. Apologies for that.
So, I am writing a query and here is something(I think) it would return without aggregating functions and group by. I am using this as an example and actual query contains a lot more fields:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
B 2 50
C 3 60
A 2 50
A 1 25 <--Not actually duplicate
Now you would say there are duplicate records. But in fact they are not duplicate in a way that there are some extra fields(not shown here) which would have different values for those seemingly duplicate records.
What I want:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
2 50
TOTAL 75
B 2 50
TOTAL 50
C 3 60
TOTAL 60
//EDIT - Apparently following line is causing too much confusion. Ignore it. How can I get rest of the table correctly?
TOTAL 135 //It seems its quite difficult to get 135 here. Its ok if this total is messed up
What I am trying:
SELECT
SOME_FIELDS,
SUBJ,
CLASSROOM,
SUM(CLASSROOM_CAPACITY)
FROM
MYTABLE
WHERE .....
GROUP BY SOME_FIELDS, ROLLUP(SUBJ,CLASSROOM)
The problem:
Thanks to those "seemingly duplicate" records, classroom capacities are being summed up multiple times. How do I prevent that? Am I doing this the wrong way?
The actual query is lot more complicated but I think if I can get this right, I can apply it to bigger query.
PS: I know how to get text "Total" instead of blank entry with ROLLUP using GROUPING so you can skip that part.
The cardinality you're introducing is a little off; once you sort that out, ROLLUP starts to work. You're saying that:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
is equal to:
SUBJ CLASSROOM CLASSROOM_CAPACITY
A 1 25
But the SOME_FIELDS could vary per row. When you aggregate up to just the columns above, what do you expect to happen to SOME_FIELDS?
If these can be ignored for the purposes of this query, your best bet is to first find the DISTINCT records (i.e. records that contain a unique tuple of subj, classroom and classroom_capacity) and then do the ROLLUP on this data set. The following query achieves this:
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT
subj
, classroom
, SUM(classroom_capacity)
FROM distinct_subj_classrm_capacity
GROUP BY ROLLUP(subj, classroom)
If you're not interested in the break report results that ROLLUP gives you and you simply want the raw totals then you can use the analytic version of SUM (see here for more on Oracle analytic functions: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions004.htm)
WITH distinct_subj_classrm_capacity AS (
SELECT DISTINCT
subj
, classroom
, classroom_capacity
FROM mytable
)
SELECT DISTINCT
subj
, SUM(classroom_capacity) OVER (PARTITION BY subj) classroom_capacity_per_subj
FROM distinct_subj_classrm_capacity
This gives results in the format:
SUBJ CLASSROOM_CAPACITY_PER_SUBJ
A 75
B 50
C 60
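The effect of de-duplicating before aggregating can be checked with a small sketch. It runs on SQLite from Python, which has no ROLLUP, so a plain GROUP BY on subj is used instead; the extra_field column is a hypothetical stand-in for the SOME_FIELDS that make the rows "not actually duplicate".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "subj TEXT, classroom INTEGER, classroom_capacity INTEGER, extra_field TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("A", 1, 25, "x"),
        ("B", 2, 50, "x"),
        ("C", 3, 60, "x"),
        ("A", 2, 50, "x"),
        ("A", 1, 25, "y"),  # same classroom again, differs only in extra_field
    ],
)

# Summing the raw rows counts classroom 1 twice for subject A.
raw_totals = conn.execute(
    "SELECT subj, SUM(classroom_capacity) FROM mytable "
    "GROUP BY subj ORDER BY subj"
).fetchall()

# De-duplicate on (subj, classroom, classroom_capacity) first, then sum.
distinct_totals = conn.execute(
    """
    WITH distinct_subj_classrm_capacity AS (
        SELECT DISTINCT subj, classroom, classroom_capacity
        FROM mytable
    )
    SELECT subj, SUM(classroom_capacity)
    FROM distinct_subj_classrm_capacity
    GROUP BY subj
    ORDER BY subj
    """
).fetchall()

print(raw_totals)
print(distinct_totals)
```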

MySQL: Getting highest score for a user

I have the following table (highscores),
id gameid userid name score date
1 38 2345 A 100 2009-07-23 16:45:01
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
5 38 2345 A 50 2009-07-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
7 32 2345 A 100 2009-07-20 16:45:01
Now in the above structure, a user can play a game multiple times but I want to display the "Games Played" by a specific user. So in games played section I can't display multiple games. So the concept should be like if a user played a game 3 times then the game with highest score should be displayed out of all.
I want result data like:
id gameid userid name score date
2 39 2345 A 500 2009-07-20 16:45:01
3 31 2345 A 100 2009-07-20 16:45:01
4 38 2345 A 200 2009-10-20 16:45:01
6 32 2345 A 120 2009-07-20 16:45:01
I tried the following query, but it's not giving me the correct result:
SELECT id,
gameid,
userid,
date,
MAX(score) AS score
FROM highscores
WHERE userid='2345'
GROUP BY gameid
Please tell me what will be the query for this?
Thanks
Requirement is a bit vague/confusing but would something like this satisfy the need ?
(purposely added various aggregates that may be of interest).
SELECT gameid,
MIN(date) AS FirstTime,
MAX(date) AS LastTime,
MAX(score) AS TOPscore,
COUNT(*) AS NbOfTimesPlayed
FROM highscores
WHERE userid='2345'
GROUP BY gameid
-- ORDER BY COUNT(*) DESC -- for ex. to have games played most at top
Edit: New question about adding the id column to the SELECT list
The short answer is: "No, id cannot be added, not within this particular construct". (Read further to see why) However, if the intent is to have the id of the game with the highest score, the query can be modified, using a sub-query, to achieve that.
As explained by Alex M on this page, all the column names referenced in the SELECT list which are not used in the context of an aggregate function (MAX, MIN, AVG, COUNT and the like) MUST be included in the GROUP BY clause. The reason for this rule of the SQL language is simply that, in gathering the info for the results list, SQL may encounter multiple values for such a column (listed in SELECT but not in GROUP BY) and would then not know how to deal with them. Rather than doing anything (possibly useful, but possibly silly as well) with these extra rows/values, the SQL standard dictates an error message, so that the user can modify the query and express his/her goals explicitly.
In our specific case, we could add the id in the SELECT and also add it in the GROUP BY list, but in doing so the grouping upon which the aggregation takes place would be different: the results list would include as many rows as we have id + gameid combinations, and the aggregate values for each of these rows would be based only on the records from the table where the id and the gameid have the corresponding values (assuming id is the PK of the table, we'd get a single row per aggregation, making the MAX() and such quite meaningless).
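A quick way to see this is to run both groupings on the rows from the question. The sketch below does so with SQLite from Python as a stand-in for MySQL; the name and date columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores ("
    "id INTEGER PRIMARY KEY, gameid INTEGER, userid INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO highscores VALUES (?, ?, ?, ?)",
    [
        (1, 38, 2345, 100),
        (2, 39, 2345, 500),
        (3, 31, 2345, 100),
        (4, 38, 2345, 200),
        (5, 38, 2345, 50),
        (6, 32, 2345, 120),
        (7, 32, 2345, 100),
    ],
)

# Grouping by gameid only: one row per game, MAX() is meaningful.
by_game = conn.execute(
    "SELECT gameid, MAX(score) FROM highscores "
    "WHERE userid = 2345 GROUP BY gameid ORDER BY gameid"
).fetchall()

# Adding the primary key to the GROUP BY: every row is its own group,
# so MAX(score) just returns each row's own score.
by_id_and_game = conn.execute(
    "SELECT id, gameid, MAX(score) FROM highscores "
    "WHERE userid = 2345 GROUP BY id, gameid ORDER BY id"
).fetchall()

print(by_game)
print(by_id_and_game)
```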
The way to include the id (and possibly other columns) corresponding to the game with the top score is with a sub-query. The idea is that the sub-query selects the TOP score (within a given GROUP BY), and the main query SELECTs any column of the matching rows, even when those fields weren't (couldn't be) in the sub-query's GROUP BY construct. BTW, do give credit on this page to rexem for showing this type of query first.
SELECT H.id,
H.gameid,
H.userid,
H.name,
H.score,
H.date
FROM highscores H
JOIN (
SELECT H2.gameid, H2.userid, MAX(H2.score) MaxScoreByGameUser
FROM highscores H2
GROUP BY H2.gameid, H2.userid
) AS M
ON M.gameid = H.gameid
AND M.userid = H.userid
AND M.MaxScoreByGameUser = H.score
WHERE H.userid='2345'
A few important remarks about the query above
Duplicates: if the user reached the same hi-score in a game several times, the query will produce that many rows for that game.
GROUP BY of the sub-query may need to change for different uses of the query. If, rather than searching for the game's hi-score on a per-user basis, we wanted the absolute hi-score, we would need to exclude userid from the GROUP BY (that's why I gave the alias of the MAX a long, explicit name).
The userid = '2345' condition may also be added in a WHERE clause of the sub-query (currently absent) for efficiency: unless MySQL's optimizer is very smart, all hi-scores for all game+user combinations get calculated, whereas we only need those for user '2345'. The downside is that the condition is then duplicated; a variable could be used to avoid that.
There are several ways to deal with the issues mentioned above, but these seem to be out of scope for a [now rather lengthy] explanation about the GROUP BY constructs.
Every field you have in your SELECT (when a GROUP BY clause is present) must be either one of the fields in the GROUP BY clause, or else a group function such as MAX, SUM, AVG, etc. In your code, userid is technically violating that, but in a pretty harmless fashion (you could make your code technically SQL-standard compliant with a GROUP BY gameid, userid); the fields id and date are in more serious violation: there will be many ids and dates within one GROUP BY set, and you're not telling how to make a single value out of that set (MySQL picks a more-or-less random one; stricter SQL engines might more helpfully give you an error).
I know you want the id and date corresponding to the maximum score for a given grouping, but that's not explicit in your code. You'll need a subselect or a self-join to make it explicit!
Use:
SELECT t.id,
t.gameid,
t.userid,
t.name,
t.score,
t.date
FROM HIGHSCORES t
JOIN (SELECT hs.gameid,
hs.userid,
MAX(hs.score) AS max_score
FROM HIGHSCORES hs
GROUP BY hs.gameid, hs.userid) mhs ON mhs.gameid = t.gameid
AND mhs.userid = t.userid
AND mhs.max_score = t.score
WHERE t.userid = '2345'
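The join-back pattern can be tried out with the rows from the question. This sketch uses SQLite from Python as a stand-in for MySQL and omits the name and date columns; the extra row with id 8 is hypothetical and only added to show what happens when a hi-score is tied. The helper name top_scores is likewise made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE highscores ("
    "id INTEGER PRIMARY KEY, gameid INTEGER, userid INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO highscores VALUES (?, ?, ?, ?)",
    [
        (1, 38, 2345, 100),
        (2, 39, 2345, 500),
        (3, 31, 2345, 100),
        (4, 38, 2345, 200),
        (5, 38, 2345, 50),
        (6, 32, 2345, 120),
        (7, 32, 2345, 100),
    ],
)


def top_scores(connection, userid):
    """Return (id, gameid, score) of the user's best row(s) for each game."""
    return connection.execute(
        """
        SELECT t.id, t.gameid, t.score
        FROM highscores t
        JOIN (SELECT hs.gameid, hs.userid, MAX(hs.score) AS max_score
              FROM highscores hs
              GROUP BY hs.gameid, hs.userid) mhs
          ON mhs.gameid = t.gameid
         AND mhs.userid = t.userid
         AND mhs.max_score = t.score
        WHERE t.userid = ?
        ORDER BY t.id
        """,
        (userid,),
    ).fetchall()


without_tie = top_scores(conn, 2345)

# Hypothetical extra play: same user, same game 31, same top score of 100.
conn.execute("INSERT INTO highscores VALUES (8, 31, 2345, 100)")
with_tie = top_scores(conn, 2345)

print(without_tie)
print(with_tie)
```

Without the tie the result matches the four ids the question asks for (2, 3, 4, 6). With the tie, game 31 appears twice, which is the duplicate behaviour mentioned in the remarks of the earlier answer; picking one row per game would need an extra tie-breaker such as the lowest id or the latest date.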