SQL Earliest Record - sql

Let's say I have a table orders with 20 columns. I'm only interested in the first 4 columns: id, department_id, region_id, datetime where id is a customer id and datetime is the time the customer placed an order. The other columns are more specific to product details (e.g. product_id), so on a given order, you may have multiple rows. I'm struggling to write a query to get me the earliest department and region by each customer as the same customer can have multiple combinations of department_id and region_id.
SELECT a.*
FROM (
SELECT id,
department_id,
region_id,
min(DATETIME) AS ts
FROM orders
GROUP BY id,
department_id,
region_id
) a
INNER JOIN (
SELECT id,
min(DATETIME) AS ts
FROM orders
GROUP BY id
) b
ON a.id = b.id
AND a.ts = b.ts
This seems to work, but it doesn't seem very efficient and it looks poorly written. Is there a better way to write this? The table itself is fairly large, so this query is slow.

I would just do:
SELECT id, department_id, region_id, datetime
FROM (SELECT o.*,
row_number() over (partition by id order by datetime) as seqnum
FROM orders o
) o
where seqnum = 1;
EDIT:
You can try this version to see if it works better:
select o.*
from orders o join
(select id, min(datetime) as min_datetime
from orders
group by id
) oo
on oo.id = o.id and oo.min_datetime = o.datetime;
In most databases, the row_number() version would probably have better performance. However, Hive can make arcane optimization decisions and this might be better.
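As a rough illustration of the row_number() approach, here is a minimal sketch that runs the query against an in-memory SQLite database from Python. The sample rows and the product_id column are made up for the demo, and it assumes a SQLite build with window functions (3.25 or later); behaviour and performance on Hive may differ.

```python
import sqlite3

# Hypothetical sample data: only the four columns of interest plus one
# product column standing in for the other 16.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, department_id INTEGER, region_id INTEGER,
                         datetime TEXT, product_id INTEGER);
    INSERT INTO orders VALUES
        (1, 10, 100, '2020-01-01 09:00:00', 7),
        (1, 10, 100, '2020-01-01 09:00:00', 8),
        (1, 20, 200, '2020-02-01 12:00:00', 9),
        (2, 30, 300, '2020-03-05 08:30:00', 7),
        (2, 40, 400, '2020-03-01 10:15:00', 5);
""")

# Number each customer's rows by order time and keep the first one.
earliest = conn.execute("""
    SELECT id, department_id, region_id, datetime
    FROM (SELECT o.*,
                 row_number() OVER (PARTITION BY id ORDER BY datetime) AS seqnum
          FROM orders o) AS ranked
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()

for row in earliest:
    print(row)
```

Note that row_number() keeps exactly one row per customer, so if a customer has several different department/region combinations at the same earliest timestamp, only one of them is returned.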

I think you could maybe use HAVING like this:
SELECT id, department_id, region_id, min(datetime) AS ts
FROM orders
GROUP BY id, department_id, region_id
HAVING ts=min(datetime)

Use dense_rank() analytic function:
SELECT
id,
department_id,
region_id,
min(DATETIME) AS ts
FROM
(
SELECT id,
department_id,
region_id,
DATETIME,
dense_rank() over(partition by id order by DATETIME) AS rnk
FROM orders
)s
WHERE rnk=1 --records with minimal date by id
GROUP BY id,
department_id,
region_id;
This query does the same as yours, but the table will be scanned once, without join.

Related

ORA-00920: invalid relational operator while using IN operator

I have a table col where I have:
select * from offc.col;
I returned some data using a query grouped by year and dept_id:
SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
ORDER BY dept_id,
year
The data I got was:
There is no problem here, as my SQL runs correctly. I then needed to extract all the information from the col table, so I used a subquery:
SELECT *
FROM offc.col
WHERE ( dept_id, year, marks ) IN (SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
ORDER BY dept_id,
year);
But I got this error:
ORA-00920: invalid relational operator
I searched for this error on other pages, but they all describe it as a misplaced-bracket error. In my case, I don't know what is happening.
I would suggest using the dense_rank analytic function, as it returns more than one row when several rows tie for the top marks in the same department and year (your current logic behaves the same way).
row_number will give you only one arbitrary record in that case.
select *
from (
select
c.*,
dense_rank() over(partition by dept_id, year order by marks desc nulls last) as dr
from offc.col c
) x
where dr = 1
order by dept_id, year
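To make the tie behaviour concrete, here is a small sketch using an in-memory SQLite database from Python. The table is simply called col (no offc schema), the student column and the sample rows are invented for the demo, and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE col (dept_id INTEGER, year INTEGER, marks INTEGER, student TEXT);
    INSERT INTO col VALUES
        (1, 2020, 90, 'a'),
        (1, 2020, 90, 'b'),   -- ties with 'a' for the top marks
        (1, 2020, 70, 'c'),
        (2, 2020, 80, 'd');
""")

def top_rows(rank_fn):
    """Return the rows ranked first per (dept_id, year) by the given window function."""
    sql = f"""
        SELECT dept_id, year, marks, student
        FROM (SELECT c.*,
                     {rank_fn}() OVER (PARTITION BY dept_id, year
                                       ORDER BY marks DESC) AS rk
              FROM col c) AS x
        WHERE rk = 1
        ORDER BY dept_id, year, student
    """
    return conn.execute(sql).fetchall()

with_ties = top_rows("dense_rank")   # keeps both tied rows
one_each = top_rows("row_number")    # keeps one arbitrary row per group

print(with_ties)
print(one_each)
```

With this data, dense_rank returns three rows (both tied students plus the single row of department 2), while row_number returns two.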
Also, your query is correct; just remove the ORDER BY from the subquery.
SELECT *
FROM offc.col
WHERE ( dept_id, year, marks ) IN (SELECT dept_id,
year,
Max(marks) marks
FROM offc.col
GROUP BY dept_id,
year
-- ORDER BY dept_id,
-- year
);
Demo of error with order by and working fine without order by.
Cheers!!
Instead of aggregating, you can filter with a correlated subquery:
select c.*
from offc.col c
where marks = (
select max(marks)
from offc.col c1
where c1.dept_id = c.dept_id and c1.year = c.year
)
order by dept_id, year
An index on (dept_id, year, marks) would speed up this query.
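As a sketch of what that index could look like, the snippet below creates it in an in-memory SQLite database from Python and runs the correlated subquery. The index name idx_col_dept_year_marks and the sample rows are hypothetical, the table is called col without the offc schema, and the query plan printed at the end is SQLite's, so Oracle's optimizer may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE col (dept_id INTEGER, year INTEGER, marks INTEGER);
    INSERT INTO col VALUES
        (1, 2019, 55), (1, 2019, 80), (1, 2020, 60),
        (2, 2019, 75), (2, 2019, 40);
    -- Composite index covering the lookup columns and the aggregated column.
    CREATE INDEX idx_col_dept_year_marks ON col (dept_id, year, marks);
""")

query = """
    SELECT c.dept_id, c.year, c.marks
    FROM col c
    WHERE c.marks = (SELECT max(c1.marks)
                     FROM col c1
                     WHERE c1.dept_id = c.dept_id AND c1.year = c.year)
    ORDER BY c.dept_id, c.year
"""
best = conn.execute(query).fetchall()
print(best)

# Inspect how SQLite plans the query; the exact wording varies by version.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```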
Another option is to use window function row_number():
select *
from (
select
c.*,
row_number() over(partition by dept_id, year order by marks desc) rn
from offc.col c
) x
where rn = 1
order by dept_id, year
If you do want to stick to aggregation, then you can join your subquery with the original table as follows:
select c.*
from offc.col c
inner join (
select dept_id, year, max(marks) marks
from offc.col
group by dept_id, year
) m
on m.dept_id = c.dept_id
and m.year = c.year
and m.marks = c.marks
Perform an INNER JOIN with your subquery:
SELECT c.*
FROM offc.col c
INNER JOIN (SELECT dept_id,
year,
Max(marks) AS MAX_MARK
FROM offc.col
GROUP BY dept_id,
year) s
ON s.DEPT_ID = c.DEPT_ID AND
s.YEAR = c.YEAR AND
s.MAX_MARK = c.MARKS
ORDER BY c.DEPT_ID, c.YEAR
An INNER JOIN only returns rows where the join condition is satisfied so any rows in OFFC.COL which do not have the maximum value for MARKS for a particular DEPT_ID and YEAR will not be returned.

Query without partition by or functions like rank()

Suppose we have the table students (name, grade, group, year)
We want a query that ranks for each group the corresponding students.
I know that this can be done easily with rank() OVER ( partition by group order by grade DESC ). But I think that this can also be done with a self join or a subquery. Any ideas?
The equivalent to rank() is:
select s.*,
(select 1 + count(*)
from students s2
where s2.group = s.group and
s2.grade > s.grade
) as rank
from students s;
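One way to check the equivalence is to run both forms side by side. The sketch below does that in an in-memory SQLite database from Python, assuming SQLite 3.25 or later for the window function. The sample students are made up, the reserved word group is double-quoted, and the alias is rnk rather than rank to avoid clashing with the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, grade INTEGER, "group" TEXT, year INTEGER);
    INSERT INTO students VALUES
        ('ann', 90, 'A', 2020),
        ('bob', 90, 'A', 2020),
        ('cat', 80, 'A', 2020),
        ('dan', 70, 'B', 2020);
""")

# Rank = 1 + number of students in the same group with a strictly higher grade.
by_subquery = conn.execute("""
    SELECT s.name,
           (SELECT 1 + count(*)
            FROM students s2
            WHERE s2."group" = s."group" AND s2.grade > s.grade) AS rnk
    FROM students s
    ORDER BY s.name
""").fetchall()

# The window-function form, for comparison.
by_window = conn.execute("""
    SELECT name,
           rank() OVER (PARTITION BY "group" ORDER BY grade DESC) AS rnk
    FROM students
    ORDER BY name
""").fetchall()

print(by_subquery)
print(by_window)
```

Tied students share a rank and the next rank is skipped, which is what rank() does as well.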

Can anyone explain this Query?

with a as (
select a.*, row_number() over (partition by department order by attributeID) rn
from attributes a),
e as (
select employeeId, department, attribute1, 1 rn from employees union all
select employeeId, department, attribute2, 2 rn from employees union all
select employeeId, department, attribute3, 3 rn from employees
)
select e.employeeId, a.attributeid, e.department, a.attribute, a.meaning,
e.attribute1 as value
from e join a on a.department=e.department and a.rn=e.rn
order by e.employeeId, a.attributeid
This query was written by Ponder Stibbons as the answer to this question. But I am struggling with it, as I don't quite understand what is going on here. I am new to SQL, so I would appreciate it if anyone could explain what is happening in this query. Thank you.
Basically, he unpivots the data using 3 select statements (1 for each attribute) and combines them with UNION ALL into a common table expression, so that he gets one row for each employee attribute.
select employeeId, department, attribute1, 1 rn from employees union all
select employeeId, department, attribute2, 2 rn from employees union all
select employeeId, department, attribute3, 3 rn from employees
In the other CTE he uses a window function to number the attributes within each department. He uses this number later to join back to his unpivoted data. He posted his code for the example.
select a.*, row_number() over (partition by department order by attributeID) rn
from attributes a
I would suggest you take the example data he provided and run the following. This will show you the contents of the CTEs. I think once you see that data it will make more sense.
with a as (
select a.*, row_number() over (partition by department order by attributeID) rn
from attributes a),
e as (
select employeeId, department, attribute1, 1 rn from employees union all
select employeeId, department, attribute2, 2 rn from employees union all
select employeeId, department, attribute3, 3 rn from employees
)
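SELECT * from a;
-- A WITH clause applies to a single statement, so to see the other CTE
-- rerun the block above with this as its final line instead:
-- SELECT * from e;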

SQL Server more columns than group by

I have the following example tables
Employee (EmpID, DepID)
Order (OrderID, EMpID, description)
What I'm trying to achieve is to select the employees with the most orders by department. I've been at it for about 4 hours already and can't find a solution to this perhaps easy problem.
All I get is either the number of orders by employee or the max number of orders by one employee in one department, but I'm struggling to get a result like:
DepID, EmpID, Number of orders
Here's my solution for you :
WITH Temp AS (
SELECT
emp.EMpID
,emp.DepID
,COUNT(OrderId) nb_order
,ROW_NUMBER() OVER(PARTITION BY emp.DepID ORDER BY COUNT(OrderId) DESC) Ordre
FROM
[Order] ord -- Order is a reserved word in SQL Server, so it must be bracketed
INNER JOIN
Employee emp
ON emp.EmpID = ord.EmpID
GROUP BY
emp.EMpID
,emp.DepID)
SELECT *
FROM Temp
WHERE Ordre = 1
I hope this will help you :)

SQL Server query multiple aggregate columns

I need to write a query in SQL Server to get data like this.
Essentially it is group by dept, race, gender and then
SUM(employees_of_race_by_gender),Sum(employees_Of_Dept).
I could get the data for the first four columns, but getting the sum of employees in that dept is proving difficult.
Could you please help me write the query?
All these details are in the same table, Emp. The columns of Emp are Emp_Number, Race_Name, Gender, Dept.
Your "num_of_emp_in_race" is actually by Gender too
SELECT DISTINCT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
You should probably have this
COUNT(*) OVER (PARTITION BY Dept, Gender) AS PerDeptGender,
COUNT(*) OVER (PARTITION BY Dept, Race_name) AS PerDeptRace,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS PerDeptRaceGender,
COUNT(*) OVER (PARTITION BY Dept) AS PerDept
Edit: the DISTINCT appears to be applied before the COUNT (which would be odd based on this), so try this instead:
SELECT DISTINCT
*
FROM
(
SELECT
Dept,
Race_name,
Gender,
COUNT(*) OVER (PARTITION BY Dept, Race_name, Gender) AS num_of_emp_in_race,
COUNT(*) OVER (PARTITION BY Dept) AS num_of_emp_dept
FROM
MyTable
) foo
Since the two sums you're looking for are based on a different aggregation, you need to calculate them separately and join the result. In such cases I first build the selects to show me the different results, making it easy to catch errors early:
SELECT Dept, Gender, race_name, COUNT(*) as num_of_emp_in_race
FROM Emp
GROUP BY Dept, Gender, race_name
SELECT Dept, COUNT(*) as num_of_emp_in_dept
FROM Emp
GROUP BY Dept
Afterwards, joining those two is pretty straightforward:
SELECT *
FROM ( first statement here ) as by_race
JOIN ( second statement here ) as by_dept ON (by_race.Dept = by_dept.Dept)
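Filled in, the join of the two aggregates might look like the sketch below, run against an in-memory SQLite database from Python rather than SQL Server. The sample employees and the placeholder race labels 'X' and 'Y' are invented for the demo; the column names follow the Emp table described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (Emp_Number INTEGER, Race_Name TEXT, Gender TEXT, Dept TEXT);
    INSERT INTO Emp VALUES
        (1, 'X', 'F', 'Sales'),
        (2, 'X', 'F', 'Sales'),
        (3, 'Y', 'M', 'Sales'),
        (4, 'X', 'M', 'IT');
""")

# Aggregate at two different levels, then join the results on Dept.
rows = conn.execute("""
    SELECT by_race.Dept, by_race.Race_Name, by_race.Gender,
           by_race.num_of_emp_in_race, by_dept.num_of_emp_in_dept
    FROM (SELECT Dept, Gender, Race_Name, COUNT(*) AS num_of_emp_in_race
          FROM Emp
          GROUP BY Dept, Gender, Race_Name) AS by_race
    JOIN (SELECT Dept, COUNT(*) AS num_of_emp_in_dept
          FROM Emp
          GROUP BY Dept) AS by_dept
      ON by_race.Dept = by_dept.Dept
    ORDER BY by_race.Dept, by_race.Race_Name, by_race.Gender
""").fetchall()

for row in rows:
    print(row)
```

Each output row carries the count for its department, race and gender combination alongside the total for the whole department.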