How to make this complex query more efficient? - sql

I want to select employees who have more than 10 products and are older than 50. I also want to select each employee's last product. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However, it takes too long to execute. What's wrong with it? Is it possible to write a more efficient query?
I have one limitation: I cannot use a CTE. I don't believe it would improve performance here anyway.

Before creating an index, I believe we can restructure the query.
Your query can be rewritten like this:
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- to filter the employees who have more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50

This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed the query up.
You're filtering by Age, so the following one should help:
CREATE INDEX ix_Employee_Age_Name
ON Employee (Age, Name);
The subquery that finds employees with more than 10 records should be evaluated first, and CROSS APPLY with TOP should fetch the last product more efficiently than comparing each row's date to the MAX value.
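The TOP (1) ... ORDER BY PE2.Date DESC lookup inside the CROSS APPLY tends to benefit from an index that lets the engine seek straight to an employee's newest row. A minimal sketch, assuming SQL Server and the column names from the question; the index name is hypothetical, and whether it actually helps depends on the existing indexes and the execution plan:

```sql
-- Hypothetical index name; columns come from the question's ProductEmployee table.
-- Keyed on EmployeeID, then Date descending, so the newest row per employee
-- is the first entry in that employee's range. ProductID is included so the
-- join to Product does not need to read the base table.
CREATE NONCLUSTERED INDEX ix_ProductEmployee_EmployeeID_Date
ON ProductEmployee (EmployeeID, [Date] DESC)
INCLUDE (ProductID);
```

Because EmployeeID is the leading column, the same index can also serve the COUNT(*) ... GROUP BY EmployeeID subquery.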
The answer by @Prdp is great, but I thought I'd drop in an alternative. Sometimes window functions do not perform well, and it's worth replacing them with good old subqueries.
Also, do not use datetime; use datetime2. This is suggested by Microsoft:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.
By the way, here's a tip: try to name your surrogate primary keys after the table, so they become more meaningful and joins feel more natural. For example:
In the Employee table, replace ID with EmployeeID
In the Product table, replace ID with ProductID
I find this a good practice.

with usersOver50with10productsOrMore (employeeID, productID, date, id, name, age, products ) as (
select employeeID, productID, date, id, name, age, count(productID) from productEmployee
join employee on productEmployee.employeeID = employee.id
where age >= 50
group by employeeID, productID, date, id, name, age
having count(productID) >= 10
)
select sfq.name, sfq.age, pro.name, sfq.products, max(date) from usersOver50with10productsOrMore as sfq
join product pro on sfq.productID = pro.id
group by sfq.name, sfq.age, pro.name, sfq.products
;
There is no need to find the last productID for the entire table; just filter the last product from the results for employees with 10 or more products who are aged 50 or over.

Related

ORA-00933 when trying to compare counts

This is the question:
List the names of employees who work on more projects than employee 'Grace'. Show three columns in the result: name of employee, project count of employee, Grace's project count.
This is my code:
SELECT employee."NAME", T1."# OF PROJECTS",
(SELECT COUNT(pid) FROM workon WHERE empid = 30) AS "Grace's Project"
FROM employee,
(SELECT empid, COUNT(pid) AS "# OF PROJECTS"
FROM workon
GROUP BY empid
ORDER BY empid)AS T1
WHERE T1."# OF PROJECTS" > (SELECT COUNT(pid) FROM workon WHERE empid = 30)
AND t1.empid = employee.EMPID
I keep getting ORA-00933: SQL command not properly ended. What am I missing?
The only error in your query is that Oracle does not accept AS for table aliases. Remove it and your query runs just fine.
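For reference, this is the question's query with only that change applied (the AS before T1 removed); everything else is left as the asker wrote it:

```sql
SELECT employee."NAME", T1."# OF PROJECTS",
       (SELECT COUNT(pid) FROM workon WHERE empid = 30) AS "Grace's Project"
FROM employee,
     (SELECT empid, COUNT(pid) AS "# OF PROJECTS"
      FROM workon
      GROUP BY empid
      ORDER BY empid) T1   -- Oracle: no AS keyword before a table alias
WHERE T1."# OF PROJECTS" > (SELECT COUNT(pid) FROM workon WHERE empid = 30)
  AND T1.empid = employee.EMPID
```

Column aliases, such as AS "# OF PROJECTS", may keep the AS keyword; the restriction applies only to table and subquery aliases.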
There are two things I'd like to mention, though:
You are using an ancient join syntax you shouldn't use anymore. Comma-separated joins were made redundant by the introduction of explicit joins (e.g. INNER JOIN ... ON) in 1992.
Your query is a little over-complicated. Most of all, because you are counting projects thrice, once for all employees, twice for Grace. You can avoid this by using WITH clauses.
Here is the query built-up step by step with WITH clauses:
WITH emp AS
(
SELECT empid, e.name, COUNT(*) AS projects
FROM workon w
JOIN employee e USING(empid)
GROUP BY empid, e.name
ORDER BY empid
)
, grace AS
(
SELECT * FROM emp WHERE name = 'Grace'
)
SELECT
emp.name,
emp.projects as "# OF PROJECTS",
grace.projects as "Grace's Projects"
FROM emp
CROSS JOIN grace
WHERE emp.projects > grace.projects
ORDER BY emp.projects DESC, emp.name;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f40e2c33541c76f0af112be967370784
I would use window functions:
select ew.name, ew.num_projects
from (select e.empid, e."NAME", count(*) as num_projects,
max(case when e."NAME" = 'Grace' then count(*) else 0 end) over () as grace_num_projects
from employee e join
workon w
on w.empid = e.empid
group by e.empid, e."NAME"
) ew
where num_projects > grace_num_projects;
SELECT DISTINCT
e.empid
,e."NAME"
FROM employee e
INNER JOIN (SELECT
empid
,count(*) as num_projects
FROM workon
GROUP BY empid) w ON w.empid = e.empid
LEFT JOIN (SELECT
1 AS ID
,COUNT(*) as grace_projects
FROM workon
WHERE empid = 30) g ON g.ID = 1
WHERE w.num_projects > g.grace_projects;
So here's what I am doing.
I count the projects before they are joined, which should decrease the overhead for the query, as the row set is shrunk considerably prior to joining.
An index on the WORKON table by EmpID would speed up the query considerably.
Then I query Grace's figures. Because I want them returned against every person, the subquery returns a single count, which I join on an arbitrary constant value so that it is matched to all rows.
Again, it should utilise the same index.
Because Grace's figures are calculated first, this should only need to be done once, whereas a subquery would need to calculate Grace's figures for each employee, which is an unnecessary overhead.
This is then filtered in the WHERE clause.
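The index mentioned above could be created along these lines; the index name is made up for illustration, and the table and column names are taken from the question:

```sql
-- Hypothetical index name. Supports both the GROUP BY empid aggregate
-- and the WHERE empid = 30 lookup for Grace.
CREATE INDEX ix_workon_empid ON workon (empid);
```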

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
For each employee, I want to get the record from the times table with the event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there's 100 records in the employee table. The problem is peculiar to Access as I can't use the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query runs multiple correlated subqueries, including one nested in a WHERE clause. A correlated subquery is evaluated separately for each row of the outer query.
Similar to @LeeMac's answer, simply join all your tables to an aggregate query for the max event_time grouped by employee_id, which runs once across all rows. Below, times is the base FROM table, joined to the aggregate query and to the employees and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id
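Since the question also asks about indexes, the join and filter columns used above are natural candidates. A sketch in Access SQL with made-up index names; Access runs one statement per query, so each statement would be executed separately, and either index may turn out to be unnecessary if an equivalent one already exists:

```sql
-- Hypothetical index names; table and column names are from the question.
-- Supports grouping by employee and filtering/matching on event_time.
CREATE INDEX idx_times_employee_event ON times (employee_id, event_time);

-- Supports the LEFT JOIN from times to time_notes.
CREATE INDEX idx_time_notes_time_id ON time_notes (time_id);
```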

How To improve a sql query performance

I have a view that returns the customers and the sum of their payments. It works, but it takes 8 seconds; when I remove the computed column from the query, it takes no time at all. I tried to solve this by creating non-clustered indexes on the tables, but it still takes 8 seconds. Is there any way to improve the performance, given that I have to use this technique?
My code is:
SELECT ID
,NAME
,Total = (
SELECT Sum(Value)
FROM TRANSACTION
WHERE Employee.ID = employee_ID
)
FROM Employee;
Thanks in advance
You could join to a subselect that computes the sum:
select e.id, e.name, t.total
from employee e
join (
select employee_ID, sum(value) as total
from TRANSACTION
group by employee_ID
) t on t.employee_ID = e.id
You can try to use a join with group and see if that is faster:
select employee.id, employee.name, sum(value)
from employee
join transaction on transaction.employee_ID = employee.id
group by employee.id, employee.name
Also, try to avoid names that correspond to reserved words, like TRANSACTION; see https://dev.mysql.com/doc/refman/5.5/en/keywords.html
To speed things up, you might want to consider a combined index on Employee(id, name) and an index on Transaction.Employee_ID; both are needed to make the joins and grouping for this query fast. Check whether transaction.employee_ID has a foreign key constraint to Employee.ID; that might help as well.
For this query:
SELECT ID, NAME,
(SELECT Sum(t.Value)
FROM TRANSACTION t
WHERE e.ID = t.employee_ID
) as Total
FROM Employee e;
You want an index on TRANSACTION(employee_ID, Value). That should actually give you optimal performance.
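That index might be created as follows. This is a sketch that assumes SQL Server, where TRANSACTION is a reserved word and so is quoted with brackets; other engines quote identifiers differently (for example, backticks in MySQL). The index name is hypothetical.

```sql
-- Hypothetical index name; table and column names are from the question.
CREATE INDEX ix_Transaction_employeeID_Value
ON [TRANSACTION] (employee_ID, Value);
```

With both columns in the index, the SUM for each employee can be computed from the index alone, without reading the table rows.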
You can write this as a join, but it needs to be a left join to keep employees that have no transactions:
SELECT e.ID, e.NAME, SUM(t.Value) as Total
FROM Employee e LEFT JOIN
TRANSACTION t
ON e.ID = t.employee_ID
GROUP BY e.ID, e.NAME;
However, the overall GROUP BY is usually less efficient than the "sub group by" in MySQL (and often in other databases as well).
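Try an indexed view on the total:
CREATE VIEW dbo.totalView -- dbo schema assumed
WITH SCHEMABINDING
AS
select employee_ID, sum(value) as total,
count_big(*) as row_count -- required in an indexed view that uses GROUP BY
from dbo.[TRANSACTION] -- SCHEMABINDING requires two-part table names
group by employee_ID
GO
CREATE UNIQUE CLUSTERED INDEX PK_totalView ON dbo.totalView
(
employee_id,total
);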
Then join to that:
select employee.id, employee.name, coalesce(t.total,0) as total
from employee
left outer join totalView t on t.employee_ID = employee.id
Adjust the join type according to your needs.

Struggling with SQL subquery selection

I'm trying to answer a SQL question for revision purposes but can't seem to work out how to get it to work. The tables in question are:
The question is asking me to write an SQL command to display for each employee who has a total distance from all journeys of more than 100, the employee's name and the total number of litres used by the employee on all journeys (the number of litres for a journey is distanceInKm / kmPerLitre).
So far I've tried several variations of code beginning with:
SELECT
name, TravelCost.distanceInKm / Car.kmPerLitre AS "Cost in Litres"
FROM
Employee, Car, TravelCost
WHERE
Employee.id = TravelCost.employeeID
AND Car.regNo = TravelCost.carRegNo
It's at this point I get a bit stuck, any help would be greatly appreciated, thanks!
Never use commas in the FROM clause. Always use proper, standard, explicit JOIN syntax.
You are missing a GROUP BY and a HAVING:
SELECT e.name, SUM(tc.distanceInKm / c.kmPerLitre) AS "Cost in Litres"
FROM Employee e JOIN
TravelCost tc
ON e.id = tc.employeeID JOIN
Car c
ON c.regNo = tc.carRegNo
GROUP BY e.name
HAVING SUM(tc.distanceInKm) > 100;
Use Group By and Having Clause
SELECT NAME,
Sum(TravelCost.distanceInKm/ Car.kmPerLitre) AS "Cost in Litres"
FROM Employee
INNER JOIN TravelCost
ON Employee.id = TravelCost.employeeID
INNER JOIN Car
ON Car.regNo = TravelCost.carRegNo
GROUP BY NAME
HAVING Sum(distanceInKm) > 100
You need to JOIN all the tables and find sum of litres like this:
select
e.*,
sum(distanceInKm/c.kmPerLitre) litres
from employee e
inner join travelcost t
on e.id = t.employeeId
inner join car c
on t.carRegNo = c.regNo
group by e.id, e.name
having sum(t.distanceInKm) > 100;
Also, you need to group by id rather than just by name, as the other answers do; there can be multiple employees with the same name.
Also, use explicit JOIN syntax instead of the older comma-based syntax. It's modern and clearer.
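A small made-up data set shows why grouping by name alone can give wrong results. The table and column names follow the question; the column types and rows are invented for illustration:

```sql
CREATE TABLE Employee (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE Car (regNo VARCHAR(10) PRIMARY KEY, kmPerLitre DECIMAL(5,2));
CREATE TABLE TravelCost (employeeID INT, carRegNo VARCHAR(10), distanceInKm DECIMAL(7,2));

-- Two different employees who share a name, each travelling 60 km.
INSERT INTO Employee VALUES (1, 'Sam');
INSERT INTO Employee VALUES (2, 'Sam');
INSERT INTO Car VALUES ('AAA111', 10);
INSERT INTO TravelCost VALUES (1, 'AAA111', 60);
INSERT INTO TravelCost VALUES (2, 'AAA111', 60);

-- Grouped by name: the two Sams are merged into one group with 120 km,
-- so a single row (Sam, 12 litres) passes the HAVING filter.
SELECT e.name, SUM(tc.distanceInKm / c.kmPerLitre) AS litres
FROM Employee e
JOIN TravelCost tc ON e.id = tc.employeeID
JOIN Car c ON c.regNo = tc.carRegNo
GROUP BY e.name
HAVING SUM(tc.distanceInKm) > 100;

-- Grouped by id: each Sam has only 60 km, so no rows are returned.
SELECT e.id, e.name, SUM(tc.distanceInKm / c.kmPerLitre) AS litres
FROM Employee e
JOIN TravelCost tc ON e.id = tc.employeeID
JOIN Car c ON c.regNo = tc.carRegNo
GROUP BY e.id, e.name
HAVING SUM(tc.distanceInKm) > 100;
```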
-- **How foolish and arrogant of me! I thought `sum(tc.distanceInKm/c.kmPerLitre)`
-- might have a problem, since an employee may have multiple cars, and each car's kmPerLitre is different.
-- However, there is no problem; it's simple and right!
-- The following is what I wrote. What a bloated statement it is! **
-- calculate the total number of litres used by the employee on all journeys
select e.name, sum(Cost_in_Litres) as "Cost in Litres"
from (
select t.employeeID
-- calculate the litres used by the employee on all journeys, grouped by carRegNo
, sum(t.distanceInKm)/avg(c.kmPerLitre) as Cost_in_Litres
from TravelCost t
inner join Car c
on c.regNo = t.carRegNo
where t.employeeID in
( -- find the employees who have a total distance from all journeys of more than 100
select employeeID
from TravelCost
group by employeeID
having sum(distanceInKm)> 100
)
group by t.carRegNo, t.employeeID
) a
inner join Employee e
on e.id = a.employeeID
group by e.id,e.name;

Multiple grouped items

I can't seem to find out how to get the functionality I want. Here is an example of what my table looks like:
EmpID | ProjectID | hours_worked |
3 1 8
3 1 8
4 2 8
4 2 8
4 3 8
5 4 8
I want to group by EmpID and ProjectID and then sum up the hours worked. I then want to inner join the Employee and Project table rows that are associated with EmpID and ProjectID, however when I do this then I get an error about the aggregate function thing, which I understand from research but I don't think this would have that problem since there will be one row per EmpID and ProjectID.
Real SQL:
SELECT
WorkHours.EmpID,
WorkHours.ProjectID,
Employees.FirstName
FROM WorkHours
INNER JOIN Projects ON WorkHours.ProjectID = Projects.ProjectID
INNER JOIN Employees ON WorkHours.EmpID = Employees.EmpID
GROUP BY WorkHours.ProjectID, WorkHours.EmpID
This gives the error:
Column 'Employees.FirstName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
You might want to use OVER (PARTITION BY) so you won't have to use GROUP BY:
Select DISTINCT W.EmpID -- DISTINCT collapses the repeated rows to one per EmpID and ProjectID
,W.ProjectID
,SUM(W.hours_worked) OVER (PARTITION BY W.EmpID,W.ProjectID) AS TotalHours
,E.FirstName
FROM WorkHours W
INNER JOIN Projects P ON W.ProjectID = P.ProjectID
INNER JOIN Employees E ON W.EmpID = E.EmpID
You can do a basic query to get the grouped hours and use that as a basis for the rest, either in a CTE or as a subquery. For example, as a subquery:
SELECT *
FROM
(SELECT EmpID, ProjectID, SUM(hours_worked) as HoursWorked
FROM WorkHours
GROUP BY EmpID, ProjectID) AS ProjectHours
JOIN Projects
ON Projects.ProjectID = ProjectHours.ProjectID
JOIN Employees
ON Employees.EmpID = ProjectHours.EmpID
One way is to use a CTE to first form the data you want, then join onto the other table(s):
WITH AggregatedHoursWorked
AS
(
SELECT EmpID,
ProjectID,
SUM(hours_worked) AS TotalHours
FROM WorkHours
GROUP BY EmpID, ProjectID
)
SELECT e.FirstName,
p.ProjectName,
hw.TotalHours
FROM AggregatedHoursWorked hw
INNER JOIN Employees e
ON hw.EmpID = e.EmpID
INNER JOIN Projects p
ON hw.ProjectID = p.ProjectID
If you use an aggregate function, every column in the select list must appear either inside an aggregate function or in the GROUP BY clause. If you want to join the descriptions (normally unique for a given ID), you have to include the description columns in the GROUP BY clause. This will not affect the result of the query.
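Applied to the query in the question, that means adding Employees.FirstName to the GROUP BY and wrapping hours_worked in SUM. A sketch using the table and column names from the question; the TotalHours alias is made up:

```sql
SELECT
    WorkHours.EmpID,
    WorkHours.ProjectID,
    Employees.FirstName,
    SUM(WorkHours.hours_worked) AS TotalHours
FROM WorkHours
INNER JOIN Projects ON WorkHours.ProjectID = Projects.ProjectID
INNER JOIN Employees ON WorkHours.EmpID = Employees.EmpID
-- FirstName is determined by EmpID, so adding it does not change the groups
GROUP BY WorkHours.ProjectID, WorkHours.EmpID, Employees.FirstName;
```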