How To improve a sql query performance - sql

I Have a view that return to me the customers and sum of their payment and its work but it take 8 second and when i remove the computed column part from the query it didn't take any time, and i tried to solve this by creating a non-clustered index on tables but its also 8 second, so I'm asking about if there is any method to improve the performance, although i have to use this technique.
my code is
SELECT ID
,NAME
,Total = (
SELECT Sum(Value)
FROM TRANSACTION
WHERE Employee.ID = employee_ID
)
FROM Employee;
Thanks in advance

You could use a join in the subselect with the sum
select e.id, e.name, t.total
from employee e
join (
select employee_ID, sum(value) as total
from TRANSACTION
group by employee_ID
) t on t.employee_ID = e.id

You can try to use a join with group and see if that is faster:
select employee.id, employee.name, sum(value)
from employee
join transaction where transaction.employee_ID = employee.id
group by employee.id, employee.name
Also - try to avoid to give names that correspond to reserved words - like TRANSACTION - see https://dev.mysql.com/doc/refman/5.5/en/keywords.html
To hasten up you might want to consider a combined index on Emlpoyee(id,name) AND an index on Transaction.Employee_ID - both are neede to make the joins and grouping for this query fast. Check if transaction.employee_ID has a FK_Constraint to Employee.ID - that might help as well.

For this query:
SELECT ID, NAME,
(SELECT Sum(t.Value)
FROM TRANSACTION t
WHERE e.Employee.ID = t.employee_ID
) as Total
FROM Employee e;
You want an index on TRANSACTION(employee_ID, Value). That should actually give you optimal performance.
You can write this as a join -- but a left join:
SELECT e.ID, e.NAME, SUM(t.Value) as Total
FROM Employee e JOIN
TRANSACTION t
ON e.Employee.ID = t.employee_ID
GROUP BY e.ID, e.NAME;
However, the overall GROUP BY is usually less efficient than the "sub group by" in MySQL (and often in other databases as well).

Try an indexed view on the total
CREATE VIEW totalView
WITH SCHEMABINDING
AS
select employee_ID, sum(value) as total
from TRANSACTION
group by employee_ID
CREATE UNIQUE CLUSTERED INDEX PK_totalView ON totalView
(
employee_id,total
);
Then join to that
select employee.id, employee.name, coalesce(t.total,0)
from employee
left outer join totalView t where t.employee_ID = employee.id
Adjust the join type according to your need.

Related

SQL Joins and Corelated subqueries with column data

I am facing an issue in terms of understanding the joins. Lets say for an example we have two tables employee and sales and now I have a query where we have sales of an employee using the id of the employee
select e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid from sales) s on s.eid = e.id
group by 1,2
I'd like to understand why s.city wasn't showing up? and also would like to understand what is this concept called? Is it co related sub queries on Joins? Please help me down over here.
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid,city from sales) s on s.eid = e.id
group by 1,2
in the left join above you have to add city as well. The query Imagine select sales,eid,city from sales is a table itself and then from this table you are selecting city (your second column s.city) this will run error as your table doesn't have a city column yet.
It is much easier to use CTE (common table expressions than CTE's) You can also do the above question as
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join sales as s
on e.id = s.id
group by 1,2
here I have added e.id = s.id instead of s.id = e.id it is better to reference the key of the main table first.
you could use CTE (although used when you have to do a lot of referencing but you can see how it works):
With staging as (
select
e.employeename
,s.city
,s.sales
from employee e
left join sales as s
on e.id = s.id
),
sales_stats as (
select
staging.employeename,
staging.city,
sum(staging.sales)
from staging
group by 1,2
#here you will select from staging again consider staging as a separate table so you will have to have all the columns in the staging that you want to use further. Also you will have to reference columns using staging.x
)
select * from sales_stats
-- here you could have combined the steps but I wanted to show you how cte works, Hope this works for you

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
I want to get the record for each employee record from the times table with an event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there's 100 records in the employee table. The problem is peculiar to Access as I can't use the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query is running multiple correlated subqueries even a nested one in a WHERE clause. Correlated queries calculate a value separately for each row, corresponding to outer query.
Similar to #LeeMac, simply join all your tables to an aggregate query for the max event_time grouped by employee_id which will run once across all rows. Below times is the baseFROM table joined to the aggregate query, employees, and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id

How to make this complex query more efficient?

I want to select employees, having more than 10 products and older than 50. I also want to have their last product selected. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However it takes too much time to execute it. What's wrong with it? Is it possible to make a more efficient query?
I have one limitation - I can not use CTE. I believe it will not bring performance here anyway though.
Before creating Index I believe we can restructure the query.
Your query can be rewritten like this
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- to filter the employees who is having more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50
This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed query up.
You're filtering by Age, so followining one should help:
CREATE INDEX ix_Person_Age_Name
ON Person (Age, Name);
Subquery that finds emploees with more than 10 records should be calculated first and CROSS APPLY should bring back data more efficient with TOP operator rather than comparing it to MAX value.
Answer by #Prdp is great, but I thought I'll drop an alternative in. Sometimes windowed functions do not work very well and it's worth to replace them with ol'good subqueries.
Also, do not use datetime, use datetime2. This is suggest by Microsoft:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.
By the way, here's a tip. Try to name your surrogate primary keys after table, so they become more meaningful and joins feel more natural. I.E.:
In Employee table replace ID with EmployeeID
In Product table replace ID with ProductID
I find these a good practice.
with usersOver50with10productsOrMore (employeeID, productID, date, id, name, age, products ) as (
select employeeID, productID, date, id, name, age, count(productID) from productEmployee
join employee on productEmployee.employeeID = employee.id
where age >= 50
group by employeeID, productID, date, id, name, age
having count(productID) >= 10
)
select sfq.name, sfq.age, pro.name, sfq.products, max(date) from usersOver50with10productsOrMore as sfq
join product pro on sfq.productID = pro.id
group by sfq.name, sfq.age, pro.name, sfq.products
;
There is no need to find the last productID for the entire table, just filler the last product from the results of employees with 10 or more products and over the age of 50.

How to get the value of max() group when in subquery?

So i woud like to find the department name or department id(dpmid) for the group that has the max average of age among the other group and this is my query:
select
MAX(avg_age) as 'Max average age' FROM (
SELECT
AVG(userage) AS avg_age FROM user_data GROUP BY
(select dpmid from department_branch where
(select dpmbid from user_department_branch where
user_data.userid = user_department_branch.userid)=department_branch.dpmbid)
) AS query1
this code show only the max value of average age and when i try to show the name of the group it will show the wrong group name.
So, How to show the name of max group that has subquery from another table???
You may try this..
select MAX(avg_age) as max_avg, SUBSTRING_INDEX(MAX(avg_age_dep),'##',-1) as max_age_dep from
(
SELECT
AVG(userage) as avg_age, CONCAT( AVG(userage), CONCAT('##' ,department_name)) as avg_age_dep
FROM user_data
inner join user_department_branch
on user_data.userid = user_department_branch.userid
inner join department_branch
on department_branch.dpmbid = user_department_branch.dpmbid
inner join department
on department.dpmid = department_branch.dpmid
group by department_branch.dpmid
) tab_avg_age_by_dep
;
I've done some change on ipothesys that the department name is placed in a "department" anagraphical table.. so, as it needed put in join a table in plus, then I changed your query, eventually if the department name is placed (but I don't thing so) in the branch_department table you can add the field and its treatment to your query
update
In adjunct to as said, if you wanto to avoid identical average cases you can furtherly make univocal the averages by appending a rownum id in this way:
select MAX(avg_age) as max_avg, SUBSTRING_INDEX(MAX(avg_age_dep),'##',-1) as max_age_dep from
(
SELECT
AVG(userage) as avg_age, CONCAT( AVG(userage), CONCAT('##', CONCAT( #rownum:=#rownum+1, CONCAT('##' ,department_name)))) as avg_age_dep
FROM user_data
inner join user_department_branch
on user_data.userid = user_department_branch.userid
inner join department_branch
on department_branch.dpmbid = user_department_branch.dpmbid
inner join department
on department.dpmid = department_branch.dpmid
,(SELECT #rownum:=0) r
group by department_branch.dpmid
) tab_avg_age_by_dep
;
I took a shot at what I think you are looking for. The following will give you the department branch with the highest average age. I assumed the department_branch table had a department_name field. You may need an additional join to get the department.
SELECT db.department_name, udb.dpmid, AVG(userage) as `Average age`
FROM user_data as ud
JOIN user_department_branch as udb
ON udb.userid = ud.userid
JOIN department_branch as db
ON db.dpmbid = udb.dpmbid
GROUP BY udb.dpmid
ORDER BY `Average age` DESC
LIMIT 1

How to join two SQL queries into one?

I'm new to SQL and I'm currently trying to learn how to make reports in Visual Studio. I need to make a table, graph and few other things. I decided to do matrix as the last part and now I'm stuck. I write my queries in SQL Server.
I have two tables: Staff (empID, StaffLevel, Surname) and WorkOfArt (artID, name, curator, helpingCurator). In the columns Curator and HelpingCurator I used numbers from empID.
I'd like my matrix to show every empID and the number of paintings where they're acting as a Curator and the number of paintings where they're acting as a Helping Curator (so I want three columns: empID, count(curator), count(helpingCurator).
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.StaffLevel<7
group by Staff.empID;
Select Staff.empID, count(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
I created those two queries and they work perfectly fine, but I need it in one query.
I tried:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff FULL OUTER JOIN WorkOfArt on Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
WHERE Staff.StaffLevel<7
group by Staff.empID;
(as well as using left or right outer join)
- this one gives me a table with empID, but in both count columns there are only 0s - and:
Select Staff.empID, count(WorkOfArt.Curator) as CuratorTotal,
COUNT(WorkOfArt.HelpingCurator) as HelpingCuratorTotal
FROM Staff, WorkOfArt
WHERE Staff.empID=WorkOfArt.Curator
and Staff.empID=WorkOfArt.HelpingCurator
and Staff.StaffLevel<7
group by Staff.empID;
And this one gives me just the names of the columns.
I have no idea what to do next. I tried to find the answer in google, but all explanations I found were far more advanced for me, so I couldn't understand them... Could you please help me? Hints are fine as well.
The easiest way to do this is most likely with inner select in the select clause, with something like this:
Select
S.empID,
(select count(*) from WorkOfArt C where C.Curator = S.empID)
as CuratorTotal,
(select count(*) from WorkOfArt H where H.HelpingCurator = S.empID)
as HelpingCuratorTotal
FROM Staff S
WHERE S.StaffLevel<7
group by S.empID;
This way the rows with different role aren't causing problems with the calculation. If the tables are really large or you have a lot of different roles, then most likely more complex query with grouping the items first in the WorkOfArt table might have better performance since this requires reading the rows twice.
From a performance perspective, the following query is probably a little more efficient
select e.EmpId, CuratorForCount, HelpingCuratorForCount
from Staff s
inner join ( select Curator, count(*) as CuratorForCount
from WorkOfArt
group by Curator) mainCurator on s.EmpId = mainCurator.Curator
inner join ( select HelpingCurator, count(*) as HelpingCuratorForCount
from WorkOfArt
group by HelpingCurator) secondaryCurator on s.EmpId = secondaryCurator.HelpingCurator
One method, that can be useful if you want to get more than one value aggregated value from the WorkOfArt table is to pre-aggregate the results:
Select s.empID, COALESCE(woac.cnt, 0) as CuratorTotal,
COALESCE(woahc.cnt) as HelpingCuratorTotal
FROM Staff s LEFT JOIN
(SELECT woa.Curator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.Curator
) woac
ON s.empID = woac.Curator LEFT JOIN
(SELECT woa.HelpingCurator, COUNT(*) as cnt
FROM WorkOfArt woa
GROUP BY woa.HelpingCurator
) woahc
ON s.empID = woahc.HelpingCurator
WHERE s.StaffLevel < 7;
Notice that the aggregation on the outer level is not needed.