Oracle query optimization on 100 million table - sql

Below is the sample query. How can I write this optimally? It scans around 80 million records and runs for hours without returning any results.
select * FROM (SELECT organization, ROW_NUMBER() OVER (PARTITION BY empno, sal ORDER BY deptno DESC) AS row_num
FROM emp) x
Where
x.row_num=1
And
x.organization !=3
Below is everything I tried, none of which helped:
UNIQUE Index on deptno
COMPOSITE INDEX ON (empno, sal)
Additional NON-UNIQUE index on Organization column
I tried rewriting the inequality condition as x.organization < 3 or x.organization > 3, with no luck.
Nothing helps to produce results; the query just runs for hours with no results.
Please advise.

For this query:
SELECT *
FROM (SELECT empname,
ROW_NUMBER() OVER (PARTITION BY empno, sal ORDER BY deptno DESC) AS row_num
FROM emp
) x
Where x.row_num = 1;
the optimal index is emp(empno, sal, deptno desc).
It might be faster to try a correlated subquery:
select e.*
from emp e
where e.deptno = (select max(e2.deptno) from emp e2 where e2.empno = e.empno and e2.sal = e.sal);

I think you need a composite index on (empno, sal, deptno).
This will speed things up because the rows are already ordered in the index, so the DB will not have to sort them.
But I could be wrong; the explain plan will help.
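To make the two query shapes above concrete, here is a minimal sketch that runs both against a tiny in-memory table. It uses Python's sqlite3 module (SQLite 3.25 or later for window functions) purely as a stand-in for Oracle. The sample rows and the index name emp_dedupe_idx are made up for illustration, and nothing here says anything about Oracle's plan or timings on 80 million rows. The two forms agree only when deptno is unique within each (empno, sal) group.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER, sal INTEGER, deptno INTEGER, organization INTEGER)"
)
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1, 100, 10, 3),
        (1, 100, 20, 1),
        (2, 200, 10, 3),
        (2, 200, 5, 2),
        (3, 300, 7, 4),
    ],
)
# Composite index matching PARTITION BY (empno, sal) and ORDER BY deptno DESC.
conn.execute("CREATE INDEX emp_dedupe_idx ON emp (empno, sal, deptno DESC)")

# Form 1: ROW_NUMBER() picks the highest deptno per (empno, sal),
# then the organization filter is applied to the surviving row.
row_number_sql = """
    SELECT empno, organization
    FROM (SELECT empno, organization,
                 ROW_NUMBER() OVER (PARTITION BY empno, sal
                                    ORDER BY deptno DESC) AS row_num
          FROM emp) x
    WHERE x.row_num = 1
      AND x.organization != 3
    ORDER BY empno
"""

# Form 2: correlated subquery on MAX(deptno) for the same (empno, sal).
correlated_sql = """
    SELECT e.empno, e.organization
    FROM emp e
    WHERE e.deptno = (SELECT MAX(e2.deptno)
                      FROM emp e2
                      WHERE e2.empno = e.empno AND e2.sal = e.sal)
      AND e.organization != 3
    ORDER BY e.empno
"""

row_number_rows = conn.execute(row_number_sql).fetchall()
correlated_rows = conn.execute(correlated_sql).fetchall()
print(row_number_rows)
print(correlated_rows)
```

Note that the organization filter is applied after the row_num = 1 row is chosen. Moving it inside the subquery would change the result: for empno 2 in this sample, the row with deptno 5 would be picked instead of being filtered out.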

Related

Stuck with my query

Would really appreciate if someone could help me with my query.
I need it to show the difference between max salary and salary of others.
select e.ename, f.sal, e.sal
from emp e , (select max(sal) as "sal" from emp) f
where 1=1 ;
Where am I making the mistake?
Thanks in advance!
select
ename
, sal
, max_sal - sal as diff
from (
select
e.ename
, e.sal
, max(e.sal) over () as max_sal
from emp e
) d
I suggest using the max(sal) over () analytic (or window) function, which places the maximum salary on every row of the subquery; then it is a simple subtraction to calculate the difference. No group by clause is required in this form of aggregate.
select e.ename,
e.sal sal_of_others,
(select max(sal) from emp) - e.sal sal_diff
from emp e
There is no need for the redundant where condition; you can also use a scalar subquery, as shown below:
SELECT e.ename, e.sal, (select max(sal) from emp) - E.Sal as difference
FROM emp e
You can use an analytic function to find the maximum salary - this negates the need to query the table twice:
select ename,
sal,
max(sal) over () max_overall_sal,
max(sal) over (partition by deptno) max_sal_per_dept
from emp;
I've given you two ways to find the max salary - one across the whole data set (over ()) and one per department (over (partition by deptno)) since I wasn't sure which you wanted.
If you are unfamiliar with analytic functions, I highly recommend you look them up. In their basic form, they're similar to how aggregate functions work, except instead of collapsing the rows, they report the value for each row. The partition by clause groups the data in the same way as the group by does in an aggregate query.
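As a rough illustration of the point above, the following sketch computes the difference from the overall maximum and from the per-department maximum in a single pass. It uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle, and the four sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("KING", 10, 5000),
        ("CLARK", 10, 2450),
        ("SCOTT", 20, 3000),
        ("SMITH", 20, 800),
    ],
)

# OVER () spans the whole result set; PARTITION BY deptno restarts the
# maximum for each department. Neither collapses the rows.
rows = conn.execute(
    """
    SELECT ename,
           sal,
           MAX(sal) OVER () - sal                    AS diff_overall,
           MAX(sal) OVER (PARTITION BY deptno) - sal AS diff_in_dept
    FROM emp
    ORDER BY ename
    """
).fetchall()

for row in rows:
    print(row)
```

Every input row comes back exactly once, which is the difference from a GROUP BY query.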
Use the query below:
select ename,sal,((select max(sal) from emp)-sal) as saldiff from emp

How can I select the record with the 2nd highest salary in database Oracle?

Suppose I have a table employee with id, user_name, salary. How can I select the record with the 2nd highest salary in Oracle?
I googled it and found this solution. Is the following right?
select sal from
(select rownum n,a.* from
( select distinct sal from emp order by sal desc) a)
where n = 2;
RANK and DENSE_RANK have already been suggested - depending on your requirements, you might also consider ROW_NUMBER():
select * from (
select e.*, row_number() over (order by sal desc) rn from emp e
)
where rn = 2;
The difference between RANK(), DENSE_RANK() and ROW_NUMBER() boils down to:
ROW_NUMBER() always generates a unique ranking; if the ORDER BY clause cannot distinguish between two rows, it will still give them different rankings (in an arbitrary, non-deterministic order)
RANK() and DENSE_RANK() will give the same ranking to rows that cannot be distinguished by the ORDER BY clause
DENSE_RANK() will always generate a contiguous sequence of ranks (1,2,3,...), whereas RANK() will leave gaps after two or more rows with the same rank (think "Olympic Games": if two athletes win the gold medal, there is no second place, only third)
So, if you only want one employee (even if there are several with the 2nd highest salary), I'd recommend ROW_NUMBER().
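The differences listed above are easiest to see on data with a tie. This is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle; the employee names and salaries are invented. ename is added to the ROW_NUMBER() ordering only to make the tie-break deterministic for the example.

```python
import sqlite3

# Hypothetical sample data with a tie for the highest salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ANN", 5000), ("BOB", 5000), ("CAT", 3000), ("DAN", 2000)],
)

rows = conn.execute(
    """
    SELECT ename,
           sal,
           ROW_NUMBER() OVER (ORDER BY sal DESC, ename) AS rn,    -- always unique
           RANK()       OVER (ORDER BY sal DESC)        AS rnk,   -- gaps after ties
           DENSE_RANK() OVER (ORDER BY sal DESC)        AS drnk   -- no gaps
    FROM emp
    ORDER BY sal DESC, ename
    """
).fetchall()

for row in rows:
    print(row)
```

With this data, filtering on rnk = 2 returns nothing, rn = 2 returns one of the two top earners, and drnk = 2 returns the employee with the second highest distinct salary.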
If you're using Oracle 8+, you can use the RANK() or DENSE_RANK() functions like so
SELECT *
FROM (
SELECT some_column,
rank() over (order by your_sort_column desc) as row_rank
FROM your_table
) t
WHERE row_rank = 2;
This query works in SQL*Plus to find the 2nd highest salary:
SELECT * FROM EMP
WHERE SAL = (SELECT MAX(SAL) FROM EMP
WHERE SAL < (SELECT MAX(SAL) FROM EMP));
This is a double subquery.
I hope this helps you.
WITH records
AS
(
SELECT id, user_name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) rn
FROM tableName
)
SELECT id, user_name, salary
FROM records
WHERE rn = 2
DENSE_RANK()
You should use something like this:
SELECT *
FROM (select salary2.*, rownum rnum from
(select * from salary ORDER BY salary_amount DESC) salary2
where rownum <= 2 )
WHERE rnum >= 2;
select * from emp where sal=(select max(sal) from emp where sal<(select max(sal) from emp))
So, in our emp table (the default one provided by Oracle), here is the output:
EMPNO ENAME JOB     MGR  HIREDATE  SAL  COMM DEPTNO
7698  BLAKE MANAGER 7839 01-MAY-81 3000      30
7788  SCOTT ANALYST 7566 19-APR-87 3000      20
7902  FORD  ANALYST 7566 03-DEC-81 3000      20
Or, if you just want the 2nd maximum salary to be displayed:
select max(sal) from emp where sal<(select max(sal) from emp)
MAX(SAL)
3000
select * FROM (
select EmployeeID, Salary
, dense_rank() over (order by Salary DESC) ranking
from Employee
)
WHERE ranking = 2;
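dense_rank() is used because salaries can be tied: tied rows share a rank and no ranks are skipped, so ranking = 2 always selects the second highest distinct salary. With rank(), a tie for first place would leave no row with rank 2.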
select Max(Salary) as SecondHighestSalary from Employee where Salary not in
(select max(Salary) from Employee)
I would suggest the following two ways to implement this in Oracle.
Using Sub-query:
select distinct SALARY
from EMPLOYEE e1
where 1=(select count(DISTINCT e2.SALARY) from EMPLOYEE e2 where
e2.SALARY>e1.SALARY);
This is a very simple query that produces the required output. However, it is quite slow, because each salary is compared with all distinct salaries in the inner query.
Using DENSE_RANK():
select distinct SALARY
from
(
select e.*, DENSE_RANK () OVER (order by SALARY desc) as RN
from EMPLOYEE e
) E
where E.RN=2;
This query is very efficient. DENSE_RANK() assigns consecutive ranks, unlike RANK(), which skips ranks after ties in the way Olympic medals are awarded.
Difference between RANK() and DENSE_RANK():
https://oracle-base.com/articles/misc/rank-dense-rank-first-last-analytic-functions
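I believe this will accomplish the same result without a subquery or a ranking function. In Oracle 12c and later, the row-limiting clause skips the top row and returns the next one (LIMIT ... OFFSET is MySQL/PostgreSQL syntax and is not accepted by Oracle):
SELECT *
FROM emp
ORDER BY sal DESC
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY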
This query helps me every time with problems like this. Replace N with the position you want.
select *
from(
select *
from (select * from TABLE_NAME order by SALARY_COLUMN desc)
where rownum <=N
)
where SALARY_COLUMN <= all(
select SALARY_COLUMN
from (select * from TABLE_NAME order by SALARY_COLUMN desc)
where rownum <=N
);
select * from emp where sal = (
select sal from
(select rownum n,a.sal from
( select distinct sal from emp order by sal desc) a)
where n = 2);
This is more optimal and suits all scenarios.
select max(Salary) from EmployeeTest where Salary < ( select max(Salary) from EmployeeTest ) ;
This will work in all databases.
You can use two max functions. Say you want the 2nd highest salary for userid = 10 from SALARY_TBL:
select max(salary) from SALARY_TBL
where
userid=10
and salary <> (select max(salary) from SALARY_TBL where userid=10)
Replace N with the position of the salary you want (for example, 2 for the second highest):
SELECT *
FROM Employee Emp1
WHERE (N-1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)
Explanation
The query above can be quite confusing if you have not seen anything like it before. The inner query is what's called a correlated subquery, because it uses a value from the outer query (in this case from the Emp1 table) in its WHERE clause.
And Source
I have given the answer here
By the way I am flagging this Question as Duplicate.
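The correlated subquery for the Nth highest salary can be tried out with a small sketch like the one below. It uses Python's sqlite3 module as a stand-in for Oracle; the helper name nth_highest, the table contents, and the lowercase table name are all hypothetical. For each outer row, the inner query counts how many distinct salaries are strictly higher, and a row qualifies when that count is N - 1.

```python
import sqlite3

# Hypothetical sample data with ties at the first and second salary levels.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ename TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("ANN", 5000), ("BOB", 5000), ("CAT", 3000), ("DAN", 3000), ("EVE", 2000)],
)


def nth_highest(connection, n):
    """Return all employees whose salary is the n-th highest distinct salary."""
    sql = """
        SELECT emp1.ename, emp1.salary
        FROM employee emp1
        WHERE (? - 1) = (SELECT COUNT(DISTINCT emp2.salary)
                         FROM employee emp2
                         WHERE emp2.salary > emp1.salary)
        ORDER BY emp1.ename
    """
    return connection.execute(sql, (n,)).fetchall()


second = nth_highest(conn, 2)
print(second)
print(nth_highest(conn, 4))  # fewer than four distinct salaries exist
```

Because the inner query runs once per outer row, this form tends to scale poorly on large tables compared with a single DENSE_RANK() pass.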
Syntax for SQL Server:
SELECT MAX(Salary) as 'Salary' from EmployeeDetails
where Salary NOT IN
(
SELECT TOP n-1 (SALARY) from EmployeeDetails ORDER BY Salary Desc
)
To get the 2nd highest salary, replace "n" with 2, so the query becomes:
SELECT MAX(Salary) as 'Salary' from EmployeeDetails
where Salary NOT IN
(
SELECT TOP 1 (SALARY) from EmployeeDetails ORDER BY Salary Desc
)
For the 3rd highest salary:
SELECT MAX(Salary) as 'Salary' from EmployeeDetails
where Salary NOT IN
(
SELECT TOP 2 (SALARY) from EmployeeDetails ORDER BY Salary Desc
)
SELECT * FROM EMP WHERE SAL=(SELECT MAX(SAL) FROM EMP WHERE SAL<(SELECT MAX(SAL) FROM EMP));
(OR)
SELECT ENAME ,SAL FROM EMP ORDER BY SAL DESC;
(OR)
SELECT * FROM(SELECT ENAME,SAL ,DENSE_RANK()
OVER(PARTITION BY DEPTNO ORDER BY SAL DESC) R FROM EMP) WHERE R=2;
select salary from EmployeeDetails order by salary desc limit 1 offset (n-1)
If you want to find the 2nd highest, replace n with 2.

SQL Subquery and joins

There are 2 tables available
EMP(empname PK, empno, sal, comm, deptno, hiredate, job, mgr)
DEPT(deptno, dname, loc)
The query is:
a) Display the ename and dname of the employee who is earning the highest salary.
The exercise states that this has to be done using only subqueries and joins.
So I started like this:
select A.ename, B.dname from emp A, dept B where A.deptno=B.deptno
and I tried various permutations and combinations, but I couldn't get the join statement right.
This is not a homework problem; I am just trying to solve an exercise given in my textbook. Thanks in advance.
To get the record with the max salary, I've ordered by e.sal DESC. This sorts the records by salary with the highest at the top (DESC = descending, i.e. highest to lowest).
Then I've used TOP 1 to only return 1 record.
To get the dname I've joined the tables relating the 2 deptno columns.
SELECT TOP 1 e.empname
,d.dname
FROM [EMP] e
JOIN [DEPT] d ON d.deptno=e.deptno
ORDER BY e.sal DESC
I hope this helps
You can try the following query:
SELECT emp.ename,dept.dname FROM emp
JOIN dept ON emp.deptno=dept.deptno
WHERE emp.sal=(SELECT MAX(sal) FROM emp)

Combining columns

I am trying to do the following:
I have a table with ename, job, deptno, and sal. I am trying to initiate a query that returns the top earners of each department. I have done this with grouping and a subquery. However, I also want to display the average sal by deptno. So the following would be the result:
"ename"  "dept"  "sal"  "average of dept"
sal      20      1000   500
kelly    30      2000   800
mika     40      3000   400
This might be impossible, since the average is not associated with the other rows.
Any suggestions would be appreciated. Thanks. I am using Oracle 10g to run my queries.
You could use analytic functions:
WITH RankedAndAveraged AS (
SELECT
ename,
dept,
sal,
RANK() OVER (PARTITION BY dept ORDER BY sal DESC) AS rnk,
AVG(sal) OVER (PARTITION BY dept) AS "average of dept"
FROM atable
)
SELECT
ename,
dept,
sal,
"average of dept"
FROM RankedAndAveraged
WHERE rnk = 1
This may return more than one employee per department if all of them have the same maximum value of sal. You can replace RANK() with ROW_NUMBER() if you only want one person per department (in which case you could also further extend ORDER BY by specifying additional sorting criteria to pick the top item, otherwise it will be picked randomly from among those with the maximum salary).
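The pattern above can be checked on a small data set. The sketch below uses Python's sqlite3 module (SQLite 3.25 or later) as a stand-in for Oracle 10g; the rows in atable are invented, and the average column is named avg_of_dept here only to avoid a quoted identifier.

```python
import sqlite3

# Hypothetical sample data: two departments, two employees each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (ename TEXT, dept INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO atable VALUES (?, ?, ?)",
    [("amy", 20, 1000), ("bob", 20, 600), ("kelly", 30, 2000), ("cy", 30, 1000)],
)

# RANK() and AVG() are both computed per department in the same pass;
# the outer query keeps only the top-ranked row(s) of each department.
rows = conn.execute(
    """
    WITH ranked_and_averaged AS (
        SELECT ename,
               dept,
               sal,
               RANK()   OVER (PARTITION BY dept ORDER BY sal DESC) AS rnk,
               AVG(sal) OVER (PARTITION BY dept)                   AS avg_of_dept
        FROM atable
    )
    SELECT ename, dept, sal, avg_of_dept
    FROM ranked_and_averaged
    WHERE rnk = 1
    ORDER BY dept
    """
).fetchall()

for row in rows:
    print(row)
```

The average is taken over every employee in the department, not just the top earner, because the window is evaluated before the rnk = 1 filter is applied.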
This should work. The only trick is that if you have several employees with the maximum salary in a department, it will show all of them.
SELECT t.ename, t.deptno, mx.sal as sal, mx.avg_sal as avg_sal
FROM tbl t,
(SELECT MAX(sal) AS sal, AVG(sal) AS avg_sal, deptno
FROM tbl
GROUP BY deptno) mx
WHERE t.deptno = mx.deptno AND t.sal = mx.sal
Not sure about Oracle, haven't used it in about 10 years, but something like this should be possible:
SELECT
ename, deptno, sal,
(SELECT AVG(T2.sal)
FROM tbl T2
WHERE T2.deptno = T.deptno
) AS average_of_dept
FROM tbl T
WHERE T.sal = (SELECT MAX(T3.sal)
FROM tbl T3
WHERE T3.deptno = T.deptno)

SQL: aggregate function and group by

Consider the Oracle emp table. I'd like to get the employees with the top salary with department = 20 and job = clerk. Also assume that there is no "empno" column, and that the primary key involves a number of columns. You can do this with:
select * from scott.emp
where deptno = 20 and job = 'CLERK'
and sal = (select max(sal) from scott.emp
where deptno = 20 and job = 'CLERK')
This works, but I have to duplicate the test deptno = 20 and job = 'CLERK', which I would like to avoid. Is there a more elegant way to write this, maybe using a group by? BTW, if this matters, I am using Oracle.
The following is slightly over-engineered, but is a good SQL pattern for "top x" queries.
SELECT
*
FROM
scott.emp
WHERE
(deptno,job,sal) IN
(SELECT
deptno,
job,
max(sal)
FROM
scott.emp
WHERE
deptno = 20
and job = 'CLERK'
GROUP BY
deptno,
job
)
Also note that this will work in Oracle and PostgreSQL (I think) but not MS SQL. For something similar in MS SQL, see the question "SQL Query to get latest price".
If I was certain of the targeted database I'd go with Mark Nold's solution, but if you ever want some dialect agnostic SQL*, try
SELECT *
FROM scott.emp e
WHERE e.deptno = 20
AND e.job = 'CLERK'
AND e.sal = (
SELECT MAX(e2.sal)
FROM scott.emp e2
WHERE e.deptno = e2.deptno
AND e.job = e2.job
)
*I believe this should work everywhere, but I don't have the environments to test it.
In Oracle I'd do it with an analytic function, so you'd only query the emp table once:
SELECT *
FROM (SELECT e.*, MAX (sal) OVER () AS max_sal
FROM scott.emp e
WHERE deptno = 20
AND job = 'CLERK')
WHERE sal = max_sal
It's simpler, easier to read and more efficient.
If you want to modify it to list this information for all departments, then you'll need to use the "PARTITION BY" clause in OVER:
SELECT *
FROM (SELECT e.*, MAX (sal) OVER (PARTITION BY deptno) AS max_sal
FROM scott.emp e
WHERE job = 'CLERK')
WHERE sal = max_sal
ORDER BY deptno
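As a quick sanity check, the analytic form above and the tuple IN form from the earlier answer can be run side by side. This sketch uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later, which supports both window functions and row-value IN); the table is a cut-down, hypothetical version of scott.emp.

```python
import sqlite3

# Hypothetical, cut-down version of the emp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, job TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("SMITH", 20, "CLERK", 800),
        ("ADAMS", 20, "CLERK", 1100),
        ("JAMES", 30, "CLERK", 950),
        ("MILLER", 10, "CLERK", 1300),
        ("SCOTT", 20, "ANALYST", 3000),
    ],
)

# Analytic form: the filter appears once and emp is read once.
analytic_rows = conn.execute(
    """
    SELECT ename, sal
    FROM (SELECT e.*, MAX(sal) OVER () AS max_sal
          FROM emp e
          WHERE deptno = 20 AND job = 'CLERK')
    WHERE sal = max_sal
    """
).fetchall()

# Tuple IN form: the filter also appears once, but emp is read twice.
tuple_in_rows = conn.execute(
    """
    SELECT ename, sal
    FROM emp
    WHERE (deptno, job, sal) IN (SELECT deptno, job, MAX(sal)
                                 FROM emp
                                 WHERE deptno = 20 AND job = 'CLERK'
                                 GROUP BY deptno, job)
    """
).fetchall()

print(analytic_rows)
print(tuple_in_rows)
```

Both forms state the deptno and job condition only once, which was the goal of the original question.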
That's great! I didn't know you could do a comparison of (x, y, z) with the result of a SELECT statement. This works great with Oracle.
As a side-note for other readers, the above query is missing a "=" after "(deptno,job,sal)". Maybe the Stack Overflow formatter ate it (?).
Again, thanks Mark.
In Oracle you can also use the EXISTS operator, which in some cases is faster.
For example...
SELECT name, number
FROM cust
WHERE cust_id IN
( SELECT cust_id FROM big_table )
AND entered > SYSDATE -1
would be slow.
but
SELECT name, number
FROM cust c
WHERE EXISTS
( SELECT cust_id FROM big_table WHERE cust_id=c.cust_id )
AND entered > SYSDATE -1
would be very fast with proper indexing. You can also use this with multiple parameters.
There are many solutions. You could also keep your original query layout by adding table aliases and correlating the subquery on the column names; DEPTNO = 20 and JOB = 'CLERK' would then still appear only once in the query.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL =
(
select
max(salmax.SAL)
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
It could also be noted that the key word "ALL" can be used for these types of queries which would allow you to remove the "MAX" function.
SELECT
*
FROM
scott.emp emptbl
WHERE
emptbl.DEPTNO = 20
AND emptbl.JOB = 'CLERK'
AND emptbl.SAL >= ALL
(
select
salmax.SAL
from
scott.emp salmax
where
salmax.DEPTNO = emptbl.DEPTNO
AND salmax.JOB = emptbl.JOB
)
I hope that helps and makes sense.