Apply OFFSET and LIMIT in ORACLE for complex Join Queries?

I'm using Oracle 11g and have a complex join query. I want to apply OFFSET and LIMIT to it so that it can be used effectively with the Spring Batch framework.
I went through:
How do I limit the number of rows returned by an Oracle query after ordering? and
Alternatives to LIMIT and OFFSET for paging in Oracle
but things are still not clear to me.
My query:
SELECT DEPT.ID rowobjid, DEPT.CREATOR createdby, DEPT.CREATE_DATE createddate, DEPT.UPDATED_BY updatedby, DEPT.LAST_UPDATE_DATE updateddate,
DEPT.NAME name, DEPT.STATUS status, statusT.DESCR statusdesc,
REL.ROWID_DEPT1 rowidDEPT1, REL.ROWID_DEPT2 rowidDEPT2, DEPT2.DEPT_FROM_VAL parentcid, DEPT2.NAME parentname
FROM TEST.DEPT_TABLE DEPT
LEFT JOIN TEST.STATUS_TABLE statusT ON DEPT.STATUS = statusT.STATUS
LEFT JOIN TEST.C_REL_DEPT rel ON DEPT.ID=REL.ROWID_DEPT2
LEFT JOIN TEST.DEPT_TABLE DEPT2 ON REL.ROWID_DEPT1=DEPT2.ID
ORDER BY rowobjid asc;
The above query returns 10 million records.
Note: none of the tables has a primary key, so I need to use OFFSET and LIMIT.

In Oracle 11g you can use an analytic function such as ROW_NUMBER() within a subquery (the OFFSET ... FETCH clauses only exist from 12c on). For example, to return the rows numbered 3 through 8, the equivalent of OFFSET 2 ROWS FETCH NEXT 6 ROWS ONLY, with the numbering restarted for each CREATE_DATE and ordered by department ID:
SELECT q.*
FROM (SELECT DEPT.ID rowobjid,
DEPT.CREATOR createdby,
DEPT.CREATE_DATE createddate,
DEPT.UPDATED_BY updatedby,
DEPT.LAST_UPDATE_DATE updateddate,
DEPT.NAME name,
DEPT.STATUS status,
statusT.DESCR statusdesc,
REL.ROWID_DEPT1 rowidDEPT1,
REL.ROWID_DEPT2 rowidDEPT2,
DEPT2.DEPT_FROM_VAL parentcid,
DEPT2.NAME parentname,
ROW_NUMBER() OVER (PARTITION BY DEPT.CREATE_DATE ORDER BY DEPT.ID) AS rn
FROM TEST.DEPT_TABLE DEPT
LEFT JOIN TEST.STATUS_TABLE statusT
ON DEPT.STATUS = statusT.STATUS
LEFT JOIN TEST.C_REL_DEPT rel
ON DEPT.ID = REL.ROWID_DEPT2
LEFT JOIN TEST.DEPT_TABLE DEPT2
ON REL.ROWID_DEPT1 = DEPT2.ID) q
WHERE rn BETWEEN 3 AND 8;
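This returns at most 6 (8 - 3 + 1) rows for each CREATE_DATE, because PARTITION BY restarts the numbering for every creation date. To page through the whole result as one sequence, remove the PARTITION BY clause; the query then returns exactly 6 rows, as long as the result has at least 8 rows. Because the joins can produce several rows per DEPT.ID, add further ORDER BY columns (for example REL.ROWID_DEPT1) if pages must be stable between executions. If you need to include ties (rows with equal department IDs within a creation date), replace ROW_NUMBER() with DENSE_RANK() and keep the rest of the query unchanged; in that case more than 6 rows can be returned per CREATE_DATE.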
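A minimal sketch of the paging idea, run against SQLite through Python's standard sqlite3 module rather than Oracle (assumption: SQLite 3.25 or later, for window functions). The table dept and its columns are hypothetical stand-ins for the joined result. It checks that the rows numbered 3 to 8 match LIMIT 6 OFFSET 2, and that adding PARTITION BY returns one such page per creation date:

```python
import sqlite3

# In-memory stand-in for the joined result. The table "dept" and its
# columns are hypothetical; they only mimic DEPT.ID and DEPT.CREATE_DATE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (id INTEGER, create_date TEXT)")
rows = [(i, "2020-01-01" if i <= 10 else "2020-01-02") for i in range(1, 21)]
conn.executemany("INSERT INTO dept VALUES (?, ?)", rows)


def page_row_number(first, last):
    """Rows numbered first..last over the whole result (Oracle 11g style)."""
    sql = """
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS rn FROM dept
        ) q
        WHERE rn BETWEEN ? AND ?
        ORDER BY rn
    """
    return [r[0] for r in conn.execute(sql, (first, last))]


def page_limit_offset(offset, limit):
    """The same page with LIMIT/OFFSET (SQLite syntax, not Oracle 11g)."""
    sql = "SELECT id FROM dept ORDER BY id LIMIT ? OFFSET ?"
    return [r[0] for r in conn.execute(sql, (limit, offset))]


def page_per_date(first, last):
    """With PARTITION BY, the numbering restarts for every create_date."""
    sql = """
        SELECT create_date, id FROM (
            SELECT create_date, id,
                   ROW_NUMBER() OVER (PARTITION BY create_date ORDER BY id) AS rn
            FROM dept
        ) q
        WHERE rn BETWEEN ? AND ?
        ORDER BY create_date, rn
    """
    return conn.execute(sql, (first, last)).fetchall()


print(page_row_number(3, 8))     # [3, 4, 5, 6, 7, 8]
print(page_limit_offset(2, 6))   # [3, 4, 5, 6, 7, 8]
print(len(page_per_date(3, 8)))  # 12: six rows for each of the two dates
```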

Related

SQL: How to select the customers with the highest transaction amount by state

I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
My attempts look something like this:
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
First, your JOIN needs to be correct. Second, you want to use window functions:
SELECT ct.*
FROM (SELECT c.customer_id, c.customer_name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start by understanding aggregate functions. You have a SUM() with the alias HighestSum; it is simply the total per customer.
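A small self-contained check of the corrected query, using SQLite through Python's sqlite3 module as a stand-in for the asker's database (assumption: SQLite 3.25+ for window functions). The sample rows are invented: six customers in CA and two in NV, where customer i spends i * 10 in total:

```python
import sqlite3

# Schema from the question; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (state TEXT, customer_id INTEGER, customer_name TEXT);
    CREATE TABLE transactions (customer_id INTEGER, transact_amt NUMERIC);
""")
customers = [("CA", i, f"c{i}") for i in range(1, 7)] + [("NV", 7, "c7"), ("NV", 8, "c8")]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customers)
# Each customer gets two transactions of i * 5, so the total is i * 10.
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(i, i * 5) for i in range(1, 9) for _ in range(2)],
)

# Window function evaluated over the grouped (per-customer) rows.
TOP_PER_STATE = """
    SELECT state, customer_name, total
    FROM (SELECT c.state, c.customer_name, SUM(t.transact_amt) AS total,
                 ROW_NUMBER() OVER (PARTITION BY c.state
                                    ORDER BY SUM(t.transact_amt) DESC) AS seqnum
          FROM customers c
          JOIN transactions t ON t.customer_id = c.customer_id
          GROUP BY c.customer_id, c.customer_name, c.state) ct
    WHERE seqnum <= 5
    ORDER BY state, seqnum
"""
result = conn.execute(TOP_PER_STATE).fetchall()
for row in result:
    print(row)
```

CA has six customers, so c1 (the lowest total) is cut off; NV has only two, so both are returned.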
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id, c.customer_name, c.state
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would otherwise be filtered out. This can turn from nice-to-have into a necessity when you join several tables that each need to be aggregated, because aggregating after the joins multiplies rows and inflates the sums. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But repeated executions might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
As long as transact_amt can be NULL (this has not been ruled out), a sum can end up being NULL as well (when all of a customer's amounts are NULL). With an unsuspecting ORDER BY t.transact_sum DESC, those customers come out on top, because NULL sorts first in descending order in PostgreSQL. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?
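A sketch of the points above using SQLite through Python's sqlite3 module, with invented data: the sums are computed before the join, customer_id breaks the tie between bob and cid, and dee's NULL sum is pushed to the end. SQLite already sorts NULLs last in descending order, so instead of NULLS LAST the sketch uses an explicit transact_sum IS NULL sort key, which does not depend on the engine's default (it also works in PostgreSQL):

```python
import sqlite3

# Made-up data: bob and cid tie at 20, dee has only NULL amounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (state TEXT, customer_id INTEGER, customer_name TEXT);
    CREATE TABLE transactions (customer_id INTEGER, transact_amt NUMERIC);
    INSERT INTO customers VALUES ('CA', 1, 'ann'), ('CA', 2, 'bob'),
                                 ('CA', 3, 'cid'), ('CA', 4, 'dee');
    INSERT INTO transactions VALUES (1, 10), (2, 20), (3, 5), (3, 15),
                                    (4, NULL), (4, NULL);
""")

QUERY = """
    SELECT state, customer_name, transact_sum, rn
    FROM (SELECT c.state, c.customer_name, t.transact_sum,
                 ROW_NUMBER() OVER (
                     PARTITION BY c.state
                     -- false (0) sorts before true (1), so NULL sums go last
                     ORDER BY t.transact_sum IS NULL, t.transact_sum DESC,
                              t.customer_id) AS rn
          FROM (SELECT customer_id, SUM(transact_amt) AS transact_sum
                FROM transactions
                GROUP BY customer_id) t  -- aggregate before the join
          JOIN customers c ON c.customer_id = t.customer_id) sub
    WHERE rn <= ?
    ORDER BY state, rn
"""
for row in conn.execute(QUERY, (5,)):
    print(row)
```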

SQL: Get the first value

I have two tables:
patients(ID, Firstname, Lastname, ...)
records(ID, Date, Time, Version)
I want to (inner) join these tables so that I get the records with the patient data, but in the Version column I always want the first value recorded for the patient (the one with the minimum date and time for that patient ID). I tried a subquery, but HANA doesn't allow ORDER BY or LIMIT clauses in subqueries.
How can I implement this with SQL? (HANA SQL)
Kind regards and thanks in advance.
HANA supports window functions, so you can join against a derived table that picks each patient's first record by date and time:
select p.*, r.id, r.date, r.time, r.version
from patients p
join (
select id, date, time, version, patient_id,
row_number() over (partition by patient_id order by date, time) as rn
from records
) r on p.id = r.patient_id and r.rn = 1
The above assumes that the records table has a column patient_id that contains the id of the patient to which the record belongs.
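A runnable sketch with SQLite through Python's sqlite3 module (assumption: SQLite 3.25+), using invented rows and the same assumed patient_id column. For patient 2, ordering by version would pick v3, while ordering by date and time picks the 07:45 record with v5, which is what the question asks for:

```python
import sqlite3

# Hypothetical sample data; records.patient_id is assumed to reference patients.id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER, firstname TEXT, lastname TEXT);
    CREATE TABLE records (id INTEGER, date TEXT, time TEXT, version TEXT,
                          patient_id INTEGER);
    INSERT INTO patients VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO records VALUES
        (10, '2021-03-01', '09:00', 'v2', 1),
        (11, '2021-01-15', '14:30', 'v1', 1),
        (12, '2021-01-15', '08:00', 'v3', 2),
        (13, '2021-01-15', '07:45', 'v5', 2);
""")

# Number each patient's records from earliest to latest, keep number 1.
FIRST_RECORD = """
    SELECT p.id, p.firstname, r.id, r.date, r.time, r.version
    FROM patients p
    JOIN (SELECT id, date, time, version, patient_id,
                 ROW_NUMBER() OVER (PARTITION BY patient_id
                                    ORDER BY date, time) AS rn
          FROM records) r
      ON p.id = r.patient_id AND r.rn = 1
    ORDER BY p.id
"""
for row in conn.execute(FIRST_RECORD):
    print(row)
```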

SQL*Plus: top 3 rank across two tables

I'm trying to find a way to query the top three users in a database in terms of number of listens and output their user ID and their rank.
The schema for the two tables in question is as follows:
User(user_id, email, first_name, last_name, password, created_on, last_sign_in)
PreviouslyPlayed(user_id, track_id, timestamp)
I can see how many people would do this with a COUNT query, but I am wondering whether there's a way to do it with RANK or DENSE_RANK.
If you just want the user id and are using Oracle 12c+, then you can do:
select pp.user_id, rank() over (order by count(*) desc) as therank
from previouslyplayed pp
group by pp.user_id
order by count(*) desc
fetch first 3 rows only;
In earlier versions, you would use a subquery:
select pp.*
from (select pp.user_id, rank() over (order by count(*) desc) as therank
from previouslyplayed pp
group by pp.user_id
) pp
where therank <= 3;
You might want to review row_number(), rank(), and dense_rank() to be sure you are getting what you really want (the difference is in how they handle ties).
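To see how the three functions treat ties, here is a sketch with invented data in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+), where users 1 and 2 both have 3 plays:

```python
import sqlite3

# Made-up listening history: users 1 and 2 tie with 3 plays each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE previouslyplayed (user_id INTEGER, track_id INTEGER, timestamp TEXT)")
plays = {1: 3, 2: 3, 3: 2, 4: 1}  # user_id -> number of plays
conn.executemany(
    "INSERT INTO previouslyplayed VALUES (?, ?, ?)",
    [(u, n, None) for u, count in plays.items() for n in range(count)],
)

# All three functions ranked by the per-user play count.
RANKS = """
    SELECT user_id, COUNT(*) AS plays,
           ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, user_id) AS rn,
           RANK()       OVER (ORDER BY COUNT(*) DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS drnk
    FROM previouslyplayed
    GROUP BY user_id
    ORDER BY plays DESC, user_id
"""
for row in conn.execute(RANKS):
    print(row)
```

With a cutoff of 3, row_number() returns users 1, 2 and 3; rank() also returns users 1, 2 and 3 (user 3 is ranked 3rd because two users share 1st); dense_rank() returns all four users.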
You only need the join if you are concerned that something called user_id in one table is not a valid user id. That seems unlikely, in any well-designed database.

Trying to figure out how to join these queries

I have a table named Grades with the columns Student, Practical and Written. I am trying to figure out the top 5 students by total score on the test. Here are the queries I have; I'm not sure how to combine them correctly. I am using Oracle 11g.
This gets me the total for each student:
SELECT Student, Practical, Written, (Practical+Written) AS SumColumn
FROM Grades;
This gets the top 5 students:
SELECT Student
FROM ( SELECT Student,
, DENSE_RANK() OVER (ORDER BY Score DESC) as Score_dr
FROM Grades )
WHERE Student_dr <= 5
order by Student_dr;
The approach I prefer is data-centric, rather than row-position centric:
SELECT g.Student, g.Practical, g.Written, (g.Practical+g.Written) AS SumColumn
FROM Grades g
LEFT JOIN Grades g2 on g2.Practical+g2.Written > g.Practical+g.Written
GROUP BY g.Student, g.Practical, g.Written
HAVING COUNT(g2.Student) < 5
ORDER BY g.Practical+g.Written DESC
This works by joining each student with all students that have greater scores, then using a HAVING clause to keep only those with fewer than 5 higher-scoring students, which gives you the top 5.
The left join is needed to return the top scorer(s), which have no other students with greater scores to join to.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
By not using row-position logic, which varies from database to database, this query is also completely portable.
Note that the ORDER BY is optional.
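A sketch of the self-join approach in SQLite via Python's sqlite3 module; the grades are invented, with students e and f tied for fifth place, so six rows come back. It uses COUNT(g2.Student), so the top scorer, whose only joined row is the NULL-extended one, counts zero higher scores:

```python
import sqlite3

# Made-up grades: totals 19, 17, 15, 13, 11, 11, 9 (tie for 5th place).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, practical INTEGER, written INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("a", 10, 9), ("b", 9, 8), ("c", 8, 7), ("d", 7, 6),
     ("e", 6, 5), ("f", 5, 6), ("g", 4, 5)],
)

TOP5_SELF_JOIN = """
    SELECT g.student, g.practical + g.written AS sumcolumn
    FROM grades g
    LEFT JOIN grades g2
           ON g2.practical + g2.written > g.practical + g.written
    GROUP BY g.student, g.practical, g.written
    -- keep students with fewer than 5 higher-scoring students
    HAVING COUNT(g2.student) < 5
    ORDER BY sumcolumn DESC, g.student
"""
for row in conn.execute(TOP5_SELF_JOIN):
    print(row)
```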
With Oracle SQL you can do:
SELECT g.Student, g.Practical, g.Written, (g.Practical + g.Written) AS SumColumn
FROM ( SELECT Student, DENSE_RANK() OVER (ORDER BY Practical + Written DESC) AS Score_dr
FROM Grades ) score, Grades g
WHERE score.Score_dr <= 5
AND score.Student = g.Student
ORDER BY score.Score_dr;
You can easily include the projection of the first query in the sub-query of the second.
SELECT Student
, Practical
, Written
, tot_score
FROM (
SELECT Student
, Practical
, Written
, (Practical+Written) AS tot_score
, DENSE_RANK() OVER (ORDER BY (Practical+Written) DESC) as Score_dr
FROM Grades
)
WHERE Score_dr <= 5
order by Score_dr;
One virtue of analytic functions is that we can just use them in any query. This distinguishes them from aggregate functions, where we need to include all non-aggregate columns in the GROUP BY clause (at least with Oracle).
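A sketch of the corrected query in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+), with invented grades where a and b tie for first. DENSE_RANK() <= 5 keeps the top five distinct totals, so it returns six students here, while RANK() <= 5 returns five. The helper top_students is hypothetical and only interpolates a trusted function name into the SQL:

```python
import sqlite3

# Made-up grades with a tie at the top: totals 19, 19, 17, 15, 13, 11.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, practical INTEGER, written INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("a", 10, 9), ("b", 9, 10), ("c", 9, 8), ("d", 8, 7), ("e", 7, 6), ("f", 6, 5)],
)


def top_students(rank_function):
    """Students whose rank by Practical + Written is at most 5.

    rank_function is pasted into the SQL, so pass only the trusted
    constants "RANK" or "DENSE_RANK".
    """
    sql = f"""
        SELECT student, practical, written, tot_score
        FROM (SELECT student, practical, written,
                     practical + written AS tot_score,
                     {rank_function}() OVER (ORDER BY practical + written DESC) AS score_dr
              FROM grades)
        WHERE score_dr <= 5
        ORDER BY score_dr, student
    """
    return [row[0] for row in conn.execute(sql)]


print(top_students("DENSE_RANK"))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(top_students("RANK"))        # ['a', 'b', 'c', 'd', 'e']
```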

Implement FIRST() in select and not in WHERE

I want to get first value in a field in Oracle when another corresponding field has max value.
Normally, we would do this with a query and a subquery: the subquery orders by a field, and the outer query filters with where rownum<=1.
But, I cannot do this because the table aliases persist only one level deep and this query is a part of another big query and I need to use some aliases from the outermost query.
Here's the query structure
select
(
select a --This should get first value of a after b's are sorted desc
from
(
select a,b from table1 where table1.ID=t2.ID order by b desc
)
where rownum<=1
) as "A",
ID
from
table2 t2
This won't work, because the alias t2 isn't available in the innermost query.
A real-world analogy that comes to mind: I have a table containing records for all employees of a company, their salaries (including past salaries) and the date from which each salary was effective. So for each employee there will be multiple records. Now I want to get the latest salary for every employee.
With SQL Server I could have used SELECT TOP, but that's not available in Oracle, and since the WHERE clause is evaluated before ORDER BY, I cannot use where rownum<=1 and ORDER BY in the same query and expect correct results.
How do I do this?
Using your analogy of employees and their salaries, if I understand what you are trying to do, you could do something like this (haven't tested):
SELECT *
FROM (
SELECT employee_id,
salary,
effective_date,
ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY effective_date DESC) rowno
FROM employees
)
WHERE rowno=1
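The answer is marked untested, so here is a quick check of the same query in SQLite via Python's sqlite3 module (assumption: SQLite 3.25+), with an invented salary history:

```python
import sqlite3

# Hypothetical salary history matching the employees analogy in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, salary INTEGER, effective_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 50000, "2019-01-01"), (1, 55000, "2020-01-01"),
     (2, 60000, "2018-06-01"), (2, 58000, "2017-06-01")],
)

# Number each employee's salaries from newest to oldest, keep the newest.
LATEST = """
    SELECT employee_id, salary, effective_date
    FROM (SELECT employee_id, salary, effective_date,
                 ROW_NUMBER() OVER (PARTITION BY employee_id
                                    ORDER BY effective_date DESC) AS rowno
          FROM employees)
    WHERE rowno = 1
    ORDER BY employee_id
"""
print(conn.execute(LATEST).fetchall())
# [(1, 55000, '2020-01-01'), (2, 60000, '2018-06-01')]
```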
I would much rather see you connect the subquery up with a JOIN instead of embedding it in the SELECT. Cleaner SQL. Then you can use the windowing function that roartechs suggests.
Select t2.whatever, t1.a
From table2 t2
Inner Join (
Select tfirst.ID, tfirst.a
From (
Select ID, a,
ROW_NUMBER() Over (Partition BY ID ORDER BY b DESC) rownumber
FROM table1
) tfirst
WHERE tfirst.rownumber=1
) t1 on t2.ID=t1.ID
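A sketch of the join version in SQLite via Python's sqlite3 module, with invented rows for table1 and table2 (whatever is a placeholder column, as in the answer). The inner join drops table2 rows that have no match in table1; a LEFT JOIN keeps them with a NULL a, which is what the original scalar subquery in the SELECT list would have produced. The helper first_a is hypothetical:

```python
import sqlite3

# Hypothetical table1/table2 contents mirroring the question's structure.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, a TEXT, b INTEGER);
    CREATE TABLE table2 (id INTEGER, whatever TEXT);
    INSERT INTO table1 VALUES (1, 'x', 5), (1, 'y', 9), (2, 'z', 3);
    INSERT INTO table2 VALUES (1, 'first'), (2, 'second'), (3, 'third');
""")


def first_a(join_type):
    """Pair each table2 row with the table1.a value that has the largest b.

    join_type is pasted into the SQL, so pass only the trusted
    constants "INNER" or "LEFT".
    """
    sql = f"""
        SELECT t2.whatever, t1.a
        FROM table2 t2
        {join_type} JOIN (SELECT tfirst.id, tfirst.a
                          FROM (SELECT id, a,
                                       ROW_NUMBER() OVER (PARTITION BY id
                                                          ORDER BY b DESC) AS rownumber
                                FROM table1) tfirst
                          WHERE tfirst.rownumber = 1) t1
          ON t2.id = t1.id
        ORDER BY t2.id
    """
    return conn.execute(sql).fetchall()


print(first_a("INNER"))  # [('first', 'y'), ('second', 'z')]
print(first_a("LEFT"))   # [('first', 'y'), ('second', 'z'), ('third', None)]
```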