I have a table, col, which contains:
select * from offc.col;
I retrieved the maximum marks per dept_id and year with this query:
SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
ORDER BY dept_id,
year
The data I got was:
There is no problem here, as the SQL runs correctly. Next, I needed to retrieve all the columns of the col table for those rows, so I used a subquery:
SELECT *
FROM offc.col
WHERE ( dept_id, year, marks ) IN (SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
ORDER BY dept_id,
year);
But I got this error:
ORA-00920: invalid relational operator
I searched for this error on other pages, but the cases I found were caused by misplaced brackets. In my case, I don't know what is happening.
I would suggest the dense_rank analytic function, because it returns every row that ties for the highest marks within the same department and year (which matches your current logic).
row_number would return only one arbitrary row when several rows tie for the highest marks in the same department and year.
select *
from (
select
c.*,
dense_rank() over(partition by dept_id, year order by marks desc nulls last) as dr
from offc.col c
) x
where dr = 1
order by dept_id, year
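To make the tie behaviour concrete, here is a minimal sketch that runs the same query shape against an in-memory SQLite database instead of Oracle. The sample rows, the id column, and the helper name top_ids are assumptions for illustration, and it needs a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: two rows tie for the top marks in dept 1, year 2020.
ROWS = [
    (1, 1, 2020, 90),
    (2, 1, 2020, 90),
    (3, 1, 2020, 70),
    (4, 2, 2020, 85),
]

def top_ids(rank_function):
    """Return the ids ranked first per (dept_id, year) by the given window function."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE col (id INTEGER, dept_id INTEGER, year INTEGER, marks INTEGER)")
    conn.executemany("INSERT INTO col VALUES (?, ?, ?, ?)", ROWS)
    sql = f"""
        SELECT id FROM (
            SELECT c.*,
                   {rank_function}() OVER (PARTITION BY dept_id, year
                                           ORDER BY marks DESC) AS rnk
            FROM col c
        ) x
        WHERE rnk = 1
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

dense = top_ids("dense_rank")   # keeps both tied rows of dept 1
single = top_ids("row_number")  # keeps exactly one row per (dept_id, year)
print(dense, single)
```

With dense_rank both tied rows come back; with row_number only one of them does, and which one is not guaranteed.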
Also, your query is correct; just remove the ORDER BY from the subquery.
SELECT *
FROM offc.col
WHERE ( dept_id, year, marks ) IN (SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
-- ORDER BY dept_id,
-- year
);
Demo of error with order by and working fine without order by.
Cheers!!
Instead of aggregating, you can filter with a correlated subquery:
select c.*
from offc.col c
where marks = (
select max(marks)
from offc.col c1
where c1.dept_id = c.dept_id and c1.year = c.year
)
order by dept_id, year
An index on (dept_id, year, marks) would speed up this query.
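As a rough illustration of the index suggestion, the sketch below creates such an index in an in-memory SQLite database and prints the query plan. SQLite stands in for Oracle here, and the index name idx_col_dept_year_marks, the id column, and the sample rows are assumptions. The plan text differs between databases and versions, so no particular plan is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE col (id INTEGER, dept_id INTEGER, year INTEGER, marks INTEGER)")
conn.executemany(
    "INSERT INTO col VALUES (?, ?, ?, ?)",
    [(1, 1, 2020, 90), (2, 1, 2020, 70), (3, 1, 2021, 65), (4, 2, 2020, 85)],
)
# Hypothetical index name; the column order matches the suggestion above.
conn.execute("CREATE INDEX idx_col_dept_year_marks ON col (dept_id, year, marks)")

query = """
    SELECT c.*
    FROM col c
    WHERE marks = (SELECT MAX(marks)
                   FROM col c1
                   WHERE c1.dept_id = c.dept_id AND c1.year = c.year)
    ORDER BY dept_id, year
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
result = conn.execute(query).fetchall()

for step in plan:
    print(step)
print(result)
```

On Oracle the equivalent check would be the execution plan of the original query; the point is only that the correlated lookup filters on dept_id and year and then needs the largest marks, which is exactly what the index orders.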
Another option is to use window function row_number():
select *
from (
select
c.*,
row_number() over(partition by dept_id, year order by marks desc) rn
from offc.col c
) x
where rn = 1
order by dept_id, year
If you do want to stick to aggregation, then you can join your subquery with the original table as follows:
select c.*
from offc.col c
inner join (
select dept_id, year, max(marks) marks
from offc.col
group by dept_id, year
) m
on m.dept_id = c.dept_id
and m.year = c.year
and m.marks = c.marks
Perform an INNER JOIN with your subquery:
SELECT c.*
FROM offc.col c
INNER JOIN (SELECT dept_id,
year,
Max(marks) AS MAX_MARK
FROM offc.col
GROUP BY dept_id,
year) s
ON s.DEPT_ID = c.DEPT_ID AND
s.YEAR = c.YEAR AND
s.MAX_MARK = c.MARKS
ORDER BY c.DEPT_ID, c.YEAR
An INNER JOIN only returns rows where the join condition is satisfied so any rows in OFFC.COL which do not have the maximum value for MARKS for a particular DEPT_ID and YEAR will not be returned.
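A small sketch, assuming SQLite as a stand-in database and made-up sample rows with an id column, suggests that the join form and the tuple IN form (without ORDER BY in the subquery) select the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE col (id INTEGER, dept_id INTEGER, year INTEGER, marks INTEGER)")
conn.executemany(
    "INSERT INTO col VALUES (?, ?, ?, ?)",
    [(1, 1, 2020, 90), (2, 1, 2020, 90), (3, 1, 2020, 70),
     (4, 2, 2020, 85), (5, 2, 2020, 40)],
)

# Tuple comparison against the aggregated subquery.
tuple_in_sql = """
    SELECT id FROM col
    WHERE (dept_id, year, marks) IN (SELECT dept_id, year, MAX(marks)
                                     FROM col
                                     GROUP BY dept_id, year)
    ORDER BY id
"""

# The same filter expressed as an inner join on the aggregated subquery.
join_sql = """
    SELECT c.id
    FROM col c
    INNER JOIN (SELECT dept_id, year, MAX(marks) AS max_mark
                FROM col
                GROUP BY dept_id, year) s
      ON s.dept_id = c.dept_id AND s.year = c.year AND s.max_mark = c.marks
    ORDER BY c.id
"""

in_ids = [row[0] for row in conn.execute(tuple_in_sql)]
join_ids = [row[0] for row in conn.execute(join_sql)]
print(in_ids, join_ids)
```

Rows 3 and 5 do not carry the maximum marks of their group, so neither query returns them, while both tied rows of department 1 are kept.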
Related
I need to find the 10 employees with the largest difference between their current salary and their salary when they were hired, and the 10 with the smallest difference.
The data is in the salaries table, which contains 2844047 records.
My code is:
WITH t1 AS (
SELECT emp_no, FIRST_VALUE(salary) OVER (PARTITION BY emp_no ORDER BY from_date) AS `first`
FROM salaries),
t2 AS (
SELECT emp_no, salary AS last
FROM salaries
WHERE to_date = '9999-01-01')
(SELECT DISTINCT emp_no, last - first AS `diff`
FROM t1
INNER JOIN t2 USING (emp_no)
ORDER BY `diff`
LIMIT 10)
UNION ALL
(SELECT DISTINCT emp_no, last - first AS `diff`
FROM t1
INNER JOIN t2 USING (emp_no)
ORDER BY `diff` DESC
LIMIT 10);
but it takes a long time to execute.
The condition to_date = '9999-01-01' means that the employee is still working.
How can I optimize this query so that it executes faster?
The join is not necessary. Perhaps this will be a bit faster:
SELECT s.*
FROM (SELECT s.*,
ROW_NUMBER() OVER (ORDER BY salary - first ASC) as seqnum_asc,
ROW_NUMBER() OVER (ORDER BY salary - first DESC) as seqnum_desc
FROM (SELECT s.*,
FIRST_VALUE(salary) OVER (PARTITION BY emp_no ORDER BY from_date) AS first
FROM salaries s
) s
WHERE to_date = '9999-01-01'
) s
WHERE seqnum_asc <= 10 or seqnum_desc <= 10;
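To check the shape of this join-free query on a small scale, here is a sketch against an in-memory SQLite database rather than MySQL. The sample salaries, the column alias first_salary, the helper name extreme_raises, and the parameterised limit are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER, from_date TEXT, to_date TEXT)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2000-01-01", "2001-01-01"), (1, 150, "2001-01-01", "9999-01-01"),
        (2, 200, "2000-01-01", "2002-01-01"), (2, 210, "2002-01-01", "9999-01-01"),
        (3, 300, "2000-01-01", "2003-01-01"), (3, 290, "2003-01-01", "9999-01-01"),
        # Employee 4 has left (no open-ended to_date), so they must be ignored.
        (4, 100, "2000-01-01", "2004-01-01"), (4, 400, "2004-01-01", "2005-01-01"),
    ],
)

def extreme_raises(n):
    """Return (emp_no, diff) for the n smallest and n largest salary changes."""
    sql = """
        SELECT emp_no, salary - first_salary AS diff
        FROM (SELECT f.*,
                     ROW_NUMBER() OVER (ORDER BY salary - first_salary ASC)  AS seqnum_asc,
                     ROW_NUMBER() OVER (ORDER BY salary - first_salary DESC) AS seqnum_desc
              FROM (SELECT s.*,
                           FIRST_VALUE(salary) OVER (PARTITION BY emp_no
                                                     ORDER BY from_date) AS first_salary
                    FROM salaries s) f
              WHERE to_date = '9999-01-01') r
        WHERE seqnum_asc <= ? OR seqnum_desc <= ?
        ORDER BY diff
    """
    return conn.execute(sql, (n, n)).fetchall()

print(extreme_raises(1))
```

The starting salary is computed over all rows before the filter on to_date is applied, which is why the filter sits one level above the FIRST_VALUE subquery.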
select *
from employees
where department_id,salary in (
select department_id,max(salary)
from employees group by department_id
)
You want tuple comparison - you need to surround the tuple of columns on the left side of in with parentheses:
select *
from employees
where (department_id,salary) in (
select department_id, max(salary) from employees group by department_id
)
Note that this top-1-per-group query can be more efficiently phrased with window functions:
select *
from (
select e.*, rank() over(partition by department_id order by salary desc nulls last) rn
from employees e
) t
where rn = 1
Suppose we have the table students (name, grade, group, year).
We want a query that ranks the students within each group.
I know that this can be done easily with rank() OVER ( partition by group order by grade DESC ). But I think that this can also be done with a self join or a subquery. Any ideas?
The equivalent to rank() is:
select s.*,
(select 1 + count(*)
from students s2
where s2."group" = s."group" and -- group is a reserved word, so it is quoted (MySQL uses backticks)
s2.grade > s.grade
) as rank
from students s;
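As a quick sanity check, the sketch below compares the correlated-subquery rank with the built-in RANK() window function on a few made-up students. It uses an in-memory SQLite database, which is an assumption; the group column is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE students (name TEXT, grade INTEGER, "group" TEXT, year INTEGER)')
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [("ann", 90, "A", 1), ("bob", 90, "A", 1), ("cy", 80, "A", 1), ("dee", 70, "B", 1)],
)

# Rank = 1 + number of students in the same group with a strictly higher grade.
subquery_sql = """
    SELECT s.name,
           (SELECT 1 + COUNT(*)
            FROM students s2
            WHERE s2."group" = s."group" AND s2.grade > s.grade) AS rnk
    FROM students s
    ORDER BY s.name
"""

# Reference result from the window function.
window_sql = """
    SELECT name,
           RANK() OVER (PARTITION BY "group" ORDER BY grade DESC) AS rnk
    FROM students
    ORDER BY name
"""

by_subquery = conn.execute(subquery_sql).fetchall()
by_window = conn.execute(window_sql).fetchall()
print(by_subquery)
print(by_window)
```

Tied students share a rank and the next rank is skipped, which matches rank() rather than dense_rank().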
Let's say I have a table orders with 20 columns. I'm only interested in the first 4 columns: id, department_id, region_id, datetime where id is a customer id and datetime is the time the customer placed an order. The other columns are more specific to product details (e.g. product_id), so on a given order, you may have multiple rows. I'm struggling to write a query to get me the earliest department and region by each customer as the same customer can have multiple combinations of department_id and region_id.
SELECT a.*
FROM (
SELECT id,
department_id,
region_id,
min(DATETIME) AS ts
FROM orders
GROUP BY id,
department_id,
region_id
) a
INNER JOIN (
SELECT id,
min(DATETIME) AS ts
FROM orders
GROUP BY id
) b
ON a.id = b.id
AND a.ts = b.ts
This seems to work, but it seems inefficient and poorly written. Is there a better way to write this? The table itself is fairly large, so this query is slow.
I would just do:
SELECT id, department_id, region_id, datetime
FROM (SELECT o.*,
row_number() over (partition by id order by datetime) as seqnum
FROM orders o
) o
where seqnum = 1;
EDIT:
You can try this version to see if it works better:
select o.*
from orders o join
(select id, min(datetime) as min_datetime
from orders
group by id
) oo
on oo.id = o.id and oo.min_datetime = o.datetime;
In most databases, the row_number() version would probably have better performance. However, Hive can make arcane optimization decisions and this might be better.
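I think you could use HAVING with a correlated subquery, like this:
SELECT o.id, o.department_id, o.region_id, min(o.datetime) AS ts
FROM orders o
GROUP BY o.id, o.department_id, o.region_id
-- comparing ts with min(datetime) of the same group is always true,
-- so compare with the customer's overall earliest datetime instead
HAVING min(o.datetime) = (SELECT min(o2.datetime) FROM orders o2 WHERE o2.id = o.id)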
Use dense_rank() analytic function:
SELECT
id,
department_id,
region_id,
min(DATETIME) AS ts
FROM
(
SELECT id,
department_id,
region_id,
DATETIME,
dense_rank() over(partition by id order by DATETIME) AS rnk
FROM orders
)s
WHERE rnk=1 --records with minimal date by id
GROUP BY id,
department_id,
region_id;
This query does the same as yours, but the table will be scanned once, without join.
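To see that the two formulations agree, here is a minimal sketch that runs the original join query and the dense_rank query on a handful of made-up orders. It uses an in-memory SQLite database instead of Hive, and the sample rows and the trimmed column list are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, department_id INTEGER, region_id INTEGER, "
    "datetime TEXT, product_id TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        # Customer 1: the first order has two product rows, then a later order elsewhere.
        (1, 10, 1, "2020-01-01 09:00", "p1"),
        (1, 10, 1, "2020-01-01 09:00", "p2"),
        (1, 20, 2, "2020-02-01 10:00", "p3"),
        (2, 30, 3, "2020-03-01 08:00", "p4"),
        (2, 30, 3, "2020-04-01 08:00", "p5"),
    ],
)

# The question's approach: aggregate twice and join.
join_sql = """
    SELECT a.*
    FROM (SELECT id, department_id, region_id, MIN(datetime) AS ts
          FROM orders GROUP BY id, department_id, region_id) a
    INNER JOIN (SELECT id, MIN(datetime) AS ts
                FROM orders GROUP BY id) b
      ON a.id = b.id AND a.ts = b.ts
    ORDER BY a.id
"""

# The dense_rank approach: rank once, keep rank 1, then collapse duplicates.
dense_sql = """
    SELECT id, department_id, region_id, MIN(datetime) AS ts
    FROM (SELECT id, department_id, region_id, datetime,
                 DENSE_RANK() OVER (PARTITION BY id ORDER BY datetime) AS rnk
          FROM orders) s
    WHERE rnk = 1
    GROUP BY id, department_id, region_id
    ORDER BY id
"""

by_join = conn.execute(join_sql).fetchall()
by_dense_rank = conn.execute(dense_sql).fetchall()
print(by_join)
print(by_dense_rank)
```

The GROUP BY in the dense_rank version is what collapses the several product rows of the earliest order into one row per customer, department and region.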
I have a table employee
id name dept
1 bucky shp
2 name shp
3 other mrk
How can I get the name of the department(s) with the maximum number of employees?
I need this result:
dept
--------
shp
SELECT cnt,deptno FROM (
SELECT rank() OVER (ORDER BY cnt desc) AS rnk,cnt,deptno from
(SELECT COUNT(*) cnt, DEPTNO FROM EMP
GROUP BY deptno))
WHERE rnk = 1;
Assuming you are using SQL Server and each record represents an employee, you can use a window function to get the result:
WITH C AS (
SELECT RANK() OVER (ORDER BY dept) Rnk
,name
,dept
FROM employee -- the question's table; "table" itself is a reserved word
)
SELECT TOP 1 dept FROM
(SELECT COUNT(Rnk) cnt, dept FROM C GROUP BY dept) t
ORDER BY cnt DESC
With common table expressions, count the number of rows per department, then find the biggest count, then use that to select the biggest department.
WITH depts(dept, size) AS (
SELECT dept, COUNT(*) FROM employee GROUP BY dept
), biggest(size) AS (
SELECT MAX(size) FROM depts
)
SELECT dept FROM depts, biggest WHERE depts.size = biggest.size
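Here is a small sketch of the CTE approach using the question's three rows, run against an in-memory SQLite database (an assumption, as is the helper name biggest_depts). A fourth, made-up row is added in the second call to show that tied departments are all returned.

```python
import sqlite3

def biggest_depts(rows):
    """Return every department whose employee count equals the largest count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    sql = """
        WITH depts(dept, size) AS (
            SELECT dept, COUNT(*) FROM employee GROUP BY dept
        ), biggest(size) AS (
            SELECT MAX(size) FROM depts
        )
        SELECT dept FROM depts, biggest
        WHERE depts.size = biggest.size
        ORDER BY dept
    """
    result = [row[0] for row in conn.execute(sql)]
    conn.close()
    return result

question_rows = [(1, "bucky", "shp"), (2, "name", "shp"), (3, "other", "mrk")]
print(biggest_depts(question_rows))
# Hypothetical extra employee that makes both departments the same size.
print(biggest_depts(question_rows + [(4, "extra", "mrk")]))
```

Because the filter compares against the maximum count rather than taking the first row of a sorted list, ties need no special handling.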
Building on one of the answers, let me try to explain it step by step.
First, we need the employee count per department, so the innermost query runs first:
select count(*) cnt, deptno from scott.emp group by deptno
This gives the following result:
From this result we need the department with the most employees, i.e. department 30.
Also note that two departments may have the same number of employees.
The second-level query is:
select rank() over (order by cnt desc) as rnk,cnt,deptno from
(
select count(*) cnt, deptno from scott.emp group by deptno
)
Now each department has been assigned a rank.
To select rank 1 from it, we use a simple outer query:
select * from
(
select rank() over (order by cnt desc) as rnk,cnt,deptno from
(
select count(*) cnt, deptno from scott.emp group by deptno
)
)
where rnk=1
So we have the final result: the department with the most employees. To find the department with the fewest employees instead, we would have to include the department table, because a department with no employees at all does not appear in the emp table.
You can ignore the scott in scott.emp as that is the table owner.
The above SQL can be practised at Practise SQL online