row_number() function in oracle - sql

I am using the ROW_NUMBER function in Oracle and trying to understand how it behaves when the rows share the same values in the PARTITION BY and ORDER BY columns, that is, how the numbering works when there are duplicate records.
Below is the sample dataset:
select * from test
Result
Dept      salary  created date
HR        500     25-Jul
HR        200     25-Jul
HR        500     26-Jul
Accounts  300     25-Jan
Accounts  300     26-Jan
Accounts  300     27-Jan
I ran the row_number function on the above set:
select t.*, ROW_NUMBER() OVER(partition by Dept order by salary) as row_number
from test t
result
Dept      salary  created date  row_number
HR        500     25-Jul        1
HR        200     25-Jul        1
HR        500     26-Jul        2
Accounts  300     25-Jan        1
Accounts  300     26-Jan        2
Accounts  300     27-Jan        3
As you can see in the output above, I am using Dept in the partition by and salary in the order by for row_number, and it gave me the ranking 1,2,3.
What I am trying to understand is this: when rows have the same values in the partition by and order by columns, does Oracle assign the row_number based on when each record entered the system? In the "Accounts" "300" rows above, it gave row_number 1 to the record that entered the system earliest, "25-Jan".
Is it clearly documented anywhere that, when the partition by and order by values are the same, the ranking is done based on when those records entered the system?

"When rows have the same values in the partition by and order by columns, does Oracle assign the row_number based on when each record entered the system, as in the "Accounts" "300" rows above?"
No, it does not. SQL tables represent unordered sets. There is no ordering unless it is provided explicitly by referring to column values.
If you are sorting by values that are the same, there is no guarantee on the ordering of the rows. Note that running the same query twice can produce different results when there are ties in order by keys. It is even possible within the same query. This is true both for the order by clause and for analytic functions.
If you want a guarantee, then you need to include a unique column as the last sorting key (strictly, it need not be the last key, but any keys after it would have no effect).
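A minimal sketch of that advice, run against SQLite through Python's sqlite3 module rather than Oracle (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions). The id column is hypothetical and stands in for whatever unique column the real table has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO test (id, dept, salary, created) VALUES (?, ?, ?, ?)",
    [
        (1, "Accounts", 300, "25-Jan"),
        (2, "Accounts", 300, "26-Jan"),
        (3, "Accounts", 300, "27-Jan"),
    ],
)

# salary ties on every row, so the (hypothetical) unique id is appended
# as the final sort key to make the numbering deterministic.
rows = conn.execute(
    """
    SELECT id, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary, id) AS rn
    FROM test
    ORDER BY id
    """
).fetchall()
print(rows)
```

With only salary in the ORDER BY, any assignment of 1, 2, 3 to these three rows would be a valid result; the extra key removes that freedom.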

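I guess your end result can be achieved by adding the ROWID pseudocolumn as the final sort key. ROWID is unique per row, so it makes the numbering deterministic; note, however, that it identifies the row's physical location and is not guaranteed to follow insertion order: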
SELECT T.*,ROW_NUMBER() OVER(partition by Dept order by salary, ROWID) as row_number
FROM test T

Related

SQL group by not returning row value for an aggregate column

I was using an SQL statement to fetch an aggregate (MAX) for a column, with the rest of the columns coming from that same row. I was using a GROUP BY clause, but for the other columns I must then also use MAX or MIN, etc. This was a budget-oriented project, so I did not have time to do it using LINQ (where I could have used first or default). Anyway, I believe this is a serious limitation of the SQL language.
Again, this could have been done in many ways, but not with a simple SQL GROUP BY.
Any ideas?
Your question is a bit light on details, but it sounds like you want to know, for some set of items, which item has the maximum of something and then what its other properties are.
You cannot group by all the non max columns because this breaks the group down into too small chunks to make the max work
You cannot max all the other columns because this mixes row data up
Here is a simple example:
Name, JobRole, StartDate
John, JuniorProgrammer, 2000-01-01
John, SeniorProgrammer, 2010-01-01
John was promoted to senior programmer in 2010. We want John's most recent promotion and what he does now. If we do this:
SELECT name, jobrole, max(startdate)
FROM emp
GROUP BY name
The database will complain that jobrole is not in the GROUP BY. If we add it to the GROUP BY, John will appear twice, which is not what we want. If instead we use max(jobrole), it DOES accidentally work out OK because, alphabetically, SeniorProgrammer sorts higher than JuniorProgrammer.
If however, John then gets a promotion again in 2019:
Name, JobRole, StartDate
John, JuniorProgrammer, 2000-01-01
John, SeniorProgrammer, 2010-01-01
John, ExecutiveDirector, 2019-01-01
This time our query is wrong:
SELECT name, max(jobrole), max(startdate)
FROM emp
GROUP BY name
The row data will be mixed up: the date will be 2019 but the job will still be SeniorProgrammer, because it is alphabetically the maximum value.
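To see the mix-up concretely, here is a small sketch that runs the query against SQLite via Python's sqlite3 module (an assumption made for the sake of a self-contained example; the behaviour of MAX is the same in other databases):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, jobrole TEXT, startdate TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("John", "JuniorProgrammer", "2000-01-01"),
        ("John", "SeniorProgrammer", "2010-01-01"),
        ("John", "ExecutiveDirector", "2019-01-01"),
    ],
)

# Each MAX is computed independently over the group, so the two values
# can come from different rows.
mixed = conn.execute(
    "SELECT name, MAX(jobrole), MAX(startdate) FROM emp GROUP BY name"
).fetchall()
print(mixed)  # [('John', 'SeniorProgrammer', '2019-01-01')]
```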
Instead we have to find the max for the person and then join it back to find the rest of the data:
SELECT name, jobrole, startdate
FROM
emp
INNER JOIN
(
SELECT name, max(startdate) d
FROM emp
GROUP BY name
)findmax
ON findmax.d = emp.startdate and findmax.name = emp.name
There are other ways of achieving the same thing without a join. The join method has an issue if an employee was promoted twice on the same day: two records would result. In a DB that supports analytic functions we can do:
SELECT name, jobrole, row_number() over (partition by name order by startdate desc)
FROM emp
This establishes an incrementing counter in order of descending start date. The counter restarts from 1 for every different employee. There is no GROUP BY, so there are no complaints that the extra data isn't grouped or inside an aggregate function. All we need to do to choose the most recent promotion is wrap the whole thing in a select that demands the row number be 1:
SELECT * FROM
(
SELECT name, jobrole, row_number() over (partition by name order by startdate desc) r
FROM emp
) emp_with_rownum
WHERE r = 1
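As a runnable sketch of the row-number approach, the following uses SQLite (3.25 or later) through Python's sqlite3 module, which is an assumption made for convenience. The "Mary" row is hypothetical and only there to show the counter restarting per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, jobrole TEXT, startdate TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("John", "JuniorProgrammer", "2000-01-01"),
        ("John", "SeniorProgrammer", "2010-01-01"),
        ("John", "ExecutiveDirector", "2019-01-01"),
        ("Mary", "Analyst", "2015-06-01"),  # hypothetical second employee
    ],
)

# Number each employee's rows from newest to oldest, then keep row 1.
latest = conn.execute(
    """
    SELECT name, jobrole, startdate
    FROM (
        SELECT name, jobrole, startdate,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY startdate DESC) AS r
        FROM emp
    ) AS emp_with_rownum
    WHERE r = 1
    ORDER BY name
    """
).fetchall()
print(latest)
```

Unlike the MAX version, jobrole and startdate here always come from the same row.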
You don't want a group by. You seem to want a window function:
select t.*, max(col) over () as overall_max
from t;

Return All Historical Account Records for Accounts with Change in Corresponding Value

I am trying to select all records in a time-variant Account table for each account with a change in an associated value (e.g. the maturity date). A change in the value will result in the most recent record for an account being end-dated and a new record (containing a new effective date of the following day) being created. The most recent records for accounts in this table have an end-date of 12/31/9000.
For instance, in the below illustration, account 44444444 would not be included in my query result set since it hasn't had a change in the value (and thus also has no additional records aside from the original); however, the other accounts have multiple changes in values (and multiple records), so I would want to see those returned.
I've tried using the row_num function, as well as a reflexive join, but for some reason I'm not getting the expected results. What are some ways to obtain the results I need?
Note: The primary key for this table includes the acct_id and eff_dt. Also, I'm using PostgreSQL in a Greenplum environment.
Here are two types of queries I tried to use but which produced problematic results:
Query 1
Query 2
If you want only the accounts, use aggregation:
select acct_id
from t
group by acct_id
having min(value) <> max(value);
Based on your description, you could also use count(*) > 1.
If you want the original records, you can use window functions:
select t.*
from (select t.*, count(*) over (partition by acct_id) as cnt
from t
) t
where cnt > 1;
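A quick sketch of both queries on made-up rows, run in SQLite through Python's sqlite3 module rather than Greenplum (an assumption for self-containment). The table layout and the dates are hypothetical; only acct_id, eff_dt and the 9000-12-31 end date convention come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (acct_id INTEGER, eff_dt TEXT, end_dt TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        # Hypothetical account whose maturity date changed once.
        (11111111, "2015-01-01", "2015-06-30", "2020-01-01"),
        (11111111, "2015-07-01", "9000-12-31", "2021-01-01"),
        # Account with a single, unchanged record.
        (44444444, "2015-01-01", "9000-12-31", "2020-01-01"),
    ],
)

# Accounts only: the value differs somewhere within the group.
changed_accounts = conn.execute(
    "SELECT acct_id FROM t GROUP BY acct_id HAVING MIN(value) <> MAX(value)"
).fetchall()

# Full history: keep every row of accounts that have more than one record.
history = conn.execute(
    """
    SELECT acct_id, eff_dt, value
    FROM (
        SELECT t.*, COUNT(*) OVER (PARTITION BY acct_id) AS cnt
        FROM t
    ) AS counted
    WHERE cnt > 1
    ORDER BY acct_id, eff_dt
    """
).fetchall()
print(changed_accounts)
print(history)
```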

Get the last value of a column in Oracle database

I have a table in Oracle 10g and I want to get the last value of a specified column based on some conditions. How do I write the query for this in NetBeans? For example, suppose I want to get the last value of the balance column where student id = 101, class = nursery and academic year = 2014.
You can use ROWNUM to limit the number of results.
And to get the last records, you have to sort the records based on your condition.
An example query could be:
select * from (
select * from student
where class='nursery' and academic_year=2014
order by id desc
) where ROWNUM=1;
This will sort the records in descending order by id and return the first record from the results.
Updated
As mentioned by MT0 in the comments, the ROWNUM value is assigned before the ORDER BY clause is applied, which may produce an incorrect result.
I have corrected the above query, so that now:
the subquery sorts the records in descending order of id;
the selection (WHERE ROWNUM=1) is then done on the sorted results.
Starting with Oracle 10g you can use the LAST_VALUE function, assuming "balance" per student can be ordered by some time variable and student_id is unique within academic_year and class:
SELECT student_id, academic_year,class, LAST_VALUE(balance) OVER
(PARTITION BY student_id, academic_year,class ORDER BY time_variable desc) AS
last_balance from student where student_id ...;
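With this descending order, the first row of the result contains the most recent value of "balance" (last_balance). Note that the default window frame ends at the current row, so on later rows LAST_VALUE returns that row's own balance (or its last peer's, if time_variable ties) rather than the partition's final one.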
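The frame behaviour matters here, so a small sketch may help. It uses SQLite (3.25 or later) through Python's sqlite3 module instead of Oracle, on the assumption that both follow the standard default frame of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when an ORDER BY is present. The balances and the integer time_variable values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER, time_variable INTEGER, balance INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, 1, 100), (101, 2, 80), (101, 3, 50)],  # hypothetical balances over time
)

rows = conn.execute(
    """
    SELECT time_variable,
           LAST_VALUE(balance)  OVER (PARTITION BY student_id
                                      ORDER BY time_variable DESC) AS last_balance,
           FIRST_VALUE(balance) OVER (PARTITION BY student_id
                                      ORDER BY time_variable DESC) AS newest_balance
    FROM student
    ORDER BY time_variable DESC
    """
).fetchall()
print(rows)  # [(3, 50, 50), (2, 80, 50), (1, 100, 50)]
```

LAST_VALUE just echoes each row's own balance under the default frame, while FIRST_VALUE over the descending order gives the most recent balance on every row, which is usually what is wanted.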
I am using a slightly different query. I don't know why, but I always put rownum in the subquery as a new column:
select m.* from (
select rownum as rn, t.* from student t
where t.class='nursery' and t.academic_year=2014
order by t.id desc) m
where m.rn=1;

MySQL/Ms SQL latest records with multiple id's

I'm no sql-expert, but came across this problem:
I have to retrieve data from Microsoft SQL 2008 server. It holds different measurement data from different probes, that don't have any recording intervals. Meaning that some probe can transfer data in the database once every week, another once every second. Probes are identified by id's (not unique), and the point is to retrieve only the last record from each id (probe). Table looks like this (last 5, order by SampleDateTime desc):
TagID SampleDateTime SampleValue QualityID
13 634720670797944946 112 192
23 634720670797944946 38.1 192
17 634720670797944946 107.5 192
14 634720670748012090 110.6 192
19 634720670748012090 99.7 192
I CAN'T modify the server or even the settings, am only authorized to do queries. And I'd need to retrieve the requested data on even intervals (say once every minute or so). There are over 100 probes (with different id's) of which about 40 need to be read. So I am guessing that if this could be done in a single query it could be way more efficient than to get each row in a separate query.
Using MySQL and a similar table, I got the desired result this way (suggestions for a better way are highly appreciated!):
SELECT TagID,SampleDateTime,SampleValue FROM
(
SELECT TagID,SampleDateTime,SampleValue FROM measurements
WHERE TagID IN(101,102,103) ORDER BY SampleDateTime DESC
)
AS table1 GROUP BY TagID;
Thought that would do the trick (didn't manage with MAX() or DISTINCT or no matter what I tried), as it did, with the correct data even. But naturally it doesn't work in Ms SQL because of 'GROUP BY'.
Column 'table1.SampleValue' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I'm extremely stuck with this and so any insight would be more than welcome.
I am slightly confused as you have tagged MySQL and SQL-Server. For SQL-Server, I would use the ROW_NUMBER function to assist:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM ( SELECT *, ROW_NUMBER() OVER(PARTITION BY TagID ORDER BY SampleDateTime DESC) [RowNumber]
FROM Measurements
) m
WHERE Rownumber = 1
The ROW_NUMBER function does exactly what it says on the tin: it gives each row a number based on criteria you provide. So in the example above, PARTITION BY TagID tells ROW_NUMBER to start again at 1 each time a new TagID is encountered. ORDER BY SampleDateTime DESC tells ROW_NUMBER to start numbering each TagID at the latest entry and work towards the earliest entry.
The reason your query failed is that MySQL allows implicit grouping: because you have only specified GROUP BY TagID, any fields that are in the select list and not contained within an aggregate function will get the values of a "random" row assigned to them (the latest row in your case, because you specified ORDER BY SampleDateTime DESC in the subquery).
Just in case it is required, the following should work in most DBMS and is a better way of producing a similar query to the one you have been running in MySQL:
SELECT m.TagID, m.SampleDateTime, m.SampleValue, m.QualityID
FROM Measurements m
INNER JOIN
( SELECT TagID, MAX(SampleDateTime) AS SampleDateTime
FROM Measurements
GROUP BY TagID
) MaxTag
ON MaxTag.TagID = m.TagID
AND MaxTag.SampleDateTime = m.SampleDateTime
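Both approaches can be checked side by side. This is a sketch in SQLite (3.25 or later) via Python's sqlite3 module, not SQL Server, so the bracketed alias is replaced with a plain AS alias. The two older rows are hypothetical; the newest row per tag is taken from the table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Measurements "
    "(TagID INTEGER, SampleDateTime INTEGER, SampleValue REAL, QualityID INTEGER)"
)
conn.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?, ?)",
    [
        (13, 634720670797944946, 112.0, 192),
        (13, 634720670700000000, 111.0, 192),  # hypothetical older sample
        (14, 634720670748012090, 110.6, 192),
        (14, 634720670700000000, 109.9, 192),  # hypothetical older sample
    ],
)

# Approach 1: number each tag's rows from newest to oldest and keep row 1.
by_row_number = conn.execute(
    """
    SELECT TagID, SampleDateTime, SampleValue
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY TagID
                                     ORDER BY SampleDateTime DESC) AS rn
        FROM Measurements
    ) AS m
    WHERE rn = 1
    ORDER BY TagID
    """
).fetchall()

# Approach 2: find each tag's latest timestamp, then join back for the row.
by_join = conn.execute(
    """
    SELECT m.TagID, m.SampleDateTime, m.SampleValue
    FROM Measurements m
    INNER JOIN (
        SELECT TagID, MAX(SampleDateTime) AS SampleDateTime
        FROM Measurements
        GROUP BY TagID
    ) MaxTag
        ON MaxTag.TagID = m.TagID
       AND MaxTag.SampleDateTime = m.SampleDateTime
    ORDER BY m.TagID
    """
).fetchall()
print(by_row_number)
```

The two results match here; they can differ only when a tag has several rows sharing its latest SampleDateTime, in which case the join returns all of them and ROW_NUMBER keeps an arbitrary one.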

Find row number in a sort based on row id, then find its neighbours

Say that I have some SELECT statement:
SELECT id, name FROM people
ORDER BY name ASC;
I have a few million rows in the people table and the ORDER BY clause can be much more complex than what I have shown here (possibly operating on a dozen columns).
I retrieve only a small subset of the rows (say rows 1..11) in order to display them in the UI. Now, I would like to solve following problems:
Find the number of a row with a given id.
Display the 5 items before and the 5 items after a row with a given id.
Problem 2 is easy to solve once I have solved problem 1, as I can then use something like this if I know that the item I was looking for has row number 1000 in the sorted result set (this is the Firebird SQL dialect):
SELECT id, name FROM people
ORDER BY name ASC
ROWS 995 TO 1005;
I also know that I can find the rank of a row by counting all of the rows which come before the one I am looking for, but this can lead to very long WHERE clauses with tons of OR and AND in the condition. And I have to do this repeatedly. With my test data, this takes hundreds of milliseconds, even when using properly indexed columns, which is way too slow.
Is there some means of achieving this by using some SQL:2003 features (such as row_number, supported in Firebird 3.0)? I am by no means an SQL guru and I need some pointers here. Could I create a cached view where the result would include a rank/dense rank/row index?
Firebird appears to support window functions (called analytic functions in Oracle). So you can do the following:
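To find the "row" number of a row with a given id, number all the rows in a subquery and filter outside it (filtering in the same query would number only the single matching row, which always gets 1):
select rownum
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) t
where id = <id>
This assumes the ids are unique.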
To solve the second problem:
select t.*
from (select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) t join
(select id, row_number() over (partition by NULL order by name, id) as rownum
from t
) tid
on tid.id = <id> and t.rownum between tid.rownum - 5 and tid.rownum + 5
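Here is a small runnable sketch of the neighbours query, using SQLite (3.25 or later) through Python's sqlite3 module rather than Firebird, which is an assumption made so it can be run anywhere. The names are made up, and the window is 1 row either side instead of 5 to keep the data small. Note that the id filter is applied after the rows have been numbered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Eve"), (2, "Bob"), (3, "Dan"), (4, "Amy"), (5, "Cat"), (6, "Fay"), (7, "Gus")],
)

target_id = 3  # hypothetical id to look up
window = 1     # rows to show on each side (the question uses 5)

# Number every row in the full sort order first, then pick the target's
# number and the rows whose numbers fall around it.
neighbours = conn.execute(
    """
    WITH numbered AS (
        SELECT id, name, ROW_NUMBER() OVER (ORDER BY name, id) AS rn
        FROM people
    )
    SELECT n.id, n.name, n.rn
    FROM numbered n
    JOIN numbered target ON target.id = ?
    WHERE n.rn BETWEEN target.rn - ? AND target.rn + ?
    ORDER BY n.rn
    """,
    (target_id, window, window),
).fetchall()
print(neighbours)  # [(5, 'Cat', 3), (3, 'Dan', 4), (1, 'Eve', 5)]
```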
I might suggest something else, though, if you can modify the table structure. Most databases offer the ability to add an auto-increment column that is populated when a row is inserted. If your records are never deleted, this can serve as your counter, simplifying your queries.