SQL query with joins between four tables with millions of rows

We have a Transact-SQL statement that queries four tables with millions of rows each.
It takes several minutes, even though it has been optimized with the indexes and statistics recommended by the Database Engine Tuning Advisor.
The structure of the query is:
SELECT E.EmployeeName
, SUM(M.Amount) AS TotalAmount
, SUM(B.Amount) AS BudgetAmount
, SUM(T.Hours) AS TotalHours
, SUM(TB.Hours) AS BudgetHours
, SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M
ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B
ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T
ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB
ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName
Since each transaction table contains millions of rows, I am considering splitting this into one query per transaction table, storing the results in temporary tables such as #real, #budget, and #hours, and then joining these in a final SELECT. In tests, however, this did not seem to speed things up.
How would you approach speeding it up?

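I'm not sure the query you posted will give the results you expect.
Because every transaction table (MoneyTransactions etc.) is joined on the same EmployeeID, each employee's rows from one table are paired with every one of that employee's rows from the other tables, so each SUM is multiplied by the row counts of the other tables.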
Try this:
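-- each subquery aggregates one transaction table independently, so nothing is multiplied
SELECT E.EmployeeName,
(
SELECT SUM(m.Amount)
FROM MoneyTransactions m
WHERE m.EmployeeID = E.EmployeeID
) AS TotalAmount,
(
SELECT SUM(b.Amount)
FROM BudgetTransactions b
WHERE b.EmployeeID = E.EmployeeID
) AS BudgetAmount,
(
SELECT SUM(t.Hours)
FROM TimeTransactions t
WHERE t.EmployeeID = E.EmployeeID
) AS TotalHours,
(
SELECT SUM(tb.Hours)
FROM TimeBudgetTransactions tb
WHERE tb.EmployeeID = E.EmployeeID
) AS BudgetHours,
(
SELECT SUM(CASE WHEN t.Type = 'Waste' THEN t.Hours ELSE 0 END)
FROM TimeTransactions t
WHERE t.EmployeeID = E.EmployeeID
) AS WastedHours
FROM Employees E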
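The following minimal sketch shows why the joined version inflates the totals. It runs both query shapes against an in-memory SQLite database through Python's built-in sqlite3 module, which stands in for SQL Server here. The cut-down schema includes only MoneyTransactions and TimeTransactions, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical cut-down schema: two of the four transaction tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours REAL, Type TEXT);
INSERT INTO Employees VALUES (1, 'Alice');
-- Alice has 2 money rows and 3 time rows.
INSERT INTO MoneyTransactions VALUES (1, 100), (1, 50);
INSERT INTO TimeTransactions VALUES (1, 8, 'Work'), (1, 2, 'Waste'), (1, 6, 'Work');
""")

# Original shape: both tables joined on the same key, so each money row is
# paired with each time row (2 x 3 = 6 rows) before SUM runs.
joined = conn.execute("""
SELECT E.EmployeeName, SUM(M.Amount), SUM(T.Hours)
FROM Employees E
LEFT JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID
LEFT JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID
GROUP BY E.EmployeeName
""").fetchone()

# Correlated subqueries: each table is summed on its own.
correlated = conn.execute("""
SELECT E.EmployeeName,
       (SELECT SUM(m.Amount) FROM MoneyTransactions m WHERE m.EmployeeID = E.EmployeeID),
       (SELECT SUM(t.Hours) FROM TimeTransactions t WHERE t.EmployeeID = E.EmployeeID)
FROM Employees E
""").fetchone()

print("joined:    ", joined)      # ('Alice', 450.0, 32.0)
print("correlated:", correlated)  # ('Alice', 150.0, 16.0)
```

Each of Alice's two money rows is repeated once per time row (three times), and each time row is repeated twice. This is exactly the multiplication the join produces.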

I don't know whether your tables have all the indexes that would speed things up, but tables this big can have this kind of impact on query time.
I would recommend partitioning the tables if possible. It is more work, but whatever you do to speed up the query now won't be enough after a few million new records.

Try this one:
SELECT E.EmployeeName, TA.TotalAmount, BA.BudgetAmount, TWH.TotalHours, BH.BudgetHours, TWH.WastedHours
FROM Employees E
-- each derived table is pre-aggregated to one row per EmployeeID, so the joins cannot multiply rows
LEFT JOIN
(SELECT M.EmployeeID, SUM(M.Amount) AS TotalAmount
FROM MoneyTransactions M
GROUP BY M.EmployeeID) TA
ON E.EmployeeID = TA.EmployeeID
LEFT JOIN
(SELECT B.EmployeeID, SUM(B.Amount) AS BudgetAmount
FROM BudgetTransactions B
GROUP BY B.EmployeeID) BA
ON E.EmployeeID = BA.EmployeeID
LEFT JOIN
(SELECT T.EmployeeID, SUM(T.Hours) AS TotalHours,
SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM TimeTransactions T
GROUP BY T.EmployeeID) TWH
ON E.EmployeeID = TWH.EmployeeID
LEFT JOIN
(SELECT TB.EmployeeID, SUM(TB.Hours) AS BudgetHours
FROM TimeBudgetTransactions TB
GROUP BY TB.EmployeeID) BH
ON E.EmployeeID = BH.EmployeeID
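Here is a minimal sketch of the pre-aggregation approach. It again uses Python's sqlite3 module with an in-memory database as a stand-in for SQL Server. The schema is cut down to two transaction tables, and the sample rows are hypothetical. Bob has no transactions, which shows that the outer LEFT JOINs still return him with NULL totals.

```python
import sqlite3

# Hypothetical cut-down schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours REAL, Type TEXT);
INSERT INTO Employees VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO MoneyTransactions VALUES (1, 100), (1, 50);
INSERT INTO TimeTransactions VALUES (1, 8, 'Work'), (1, 2, 'Waste'), (1, 6, 'Work');
""")

# Each derived table is grouped to one row per EmployeeID before joining,
# so the outer LEFT JOINs are one-to-one and cannot multiply rows.
rows = conn.execute("""
SELECT E.EmployeeName, TA.TotalAmount, TWH.TotalHours, TWH.WastedHours
FROM Employees E
LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS TotalAmount
           FROM MoneyTransactions GROUP BY EmployeeID) TA
       ON E.EmployeeID = TA.EmployeeID
LEFT JOIN (SELECT EmployeeID, SUM(Hours) AS TotalHours,
                  SUM(CASE WHEN Type = 'Waste' THEN Hours ELSE 0 END) AS WastedHours
           FROM TimeTransactions GROUP BY EmployeeID) TWH
       ON E.EmployeeID = TWH.EmployeeID
ORDER BY E.EmployeeID
""").fetchall()

for row in rows:
    print(row)
```

If you want zeros instead of NULLs for employees without transactions, wrap the outer columns in COALESCE (ISNULL in T-SQL).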

Related

SQL to find table entries by consulting foreign keys

I have 4 tables as below:
Company: companyid, name
Employees: employeeid, name, companyid
Jobs: jobid, title, companyid
LinkEmpolyeeJob: employeeid, jobid
I want to identify invalid entries in the LinkEmpolyeeJob table, where the employee and the job belong to different companies.
I want to avoid a query like the one below because it was too slow:
select *
from LinkEmpolyeeJob
where (employeeid, jobid) not in (select a.employeeid, b.jobid
from Employees a, Jobs b
where a.companyid = b.companyid);
Anyone help? Thanks!
Try using joins:
select e.*, j.*
from LinkEmpolyeeJob lej join
Employees e
on lej.employeeid = e.employeeid join
Jobs j
on lej.jobid = j.jobid
where e.companyid <> j.companyid
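Here is a quick check of the join approach, sketched with Python's sqlite3 module and an in-memory database. The table names follow the question. The rows are hypothetical, and one link deliberately pairs an employee and a job from different companies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employeeid INTEGER PRIMARY KEY, name TEXT, companyid INTEGER);
CREATE TABLE Jobs (jobid INTEGER PRIMARY KEY, title TEXT, companyid INTEGER);
CREATE TABLE LinkEmpolyeeJob (employeeid INTEGER, jobid INTEGER);
INSERT INTO Employees VALUES (1, 'Ann', 10), (2, 'Ben', 20);
INSERT INTO Jobs VALUES (100, 'Clerk', 10), (200, 'Driver', 20);
-- (1, 200) is invalid: Ann works for company 10, the Driver job belongs to 20.
INSERT INTO LinkEmpolyeeJob VALUES (1, 100), (2, 200), (1, 200);
""")

# Inner joins resolve both sides of each link, then keep only mismatches.
invalid = conn.execute("""
SELECT lej.employeeid, lej.jobid
FROM LinkEmpolyeeJob lej
JOIN Employees e ON lej.employeeid = e.employeeid
JOIN Jobs j ON lej.jobid = j.jobid
WHERE e.companyid <> j.companyid
""").fetchall()

print(invalid)  # [(1, 200)]
```

Because these are inner joins, a link whose employeeid or jobid has no matching row is dropped silently rather than reported.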
Try using LEFT JOINs like below. (I did not test the query, but it should give you the idea.)
select *
from LinkEmpolyeeJob lej
left join employees e on lej.employeeid = e.employeeid
left join jobs j on e.companyid = j.companyid and lej.jobid = j.jobid
-- j is NULL when the job is not in the employee's company (or the employee or job row is missing)
where j.jobid is null
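The LEFT JOIN idea can be sketched the same way, using Python's sqlite3 module, an in-memory database, and hypothetical rows. This sketch filters only on `j.jobid IS NULL`. Also requiring `e.employeeid IS NULL` in the same WHERE clause would limit the result to links whose employee row is missing. Unlike the inner-join version, this one also reports dangling links such as (3, 100), whose employee does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employeeid INTEGER PRIMARY KEY, name TEXT, companyid INTEGER);
CREATE TABLE Jobs (jobid INTEGER PRIMARY KEY, title TEXT, companyid INTEGER);
CREATE TABLE LinkEmpolyeeJob (employeeid INTEGER, jobid INTEGER);
INSERT INTO Employees VALUES (1, 'Ann', 10), (2, 'Ben', 20);
INSERT INTO Jobs VALUES (100, 'Clerk', 10), (200, 'Driver', 20);
-- (1, 200) crosses companies; employee 3 in (3, 100) does not exist.
INSERT INTO LinkEmpolyeeJob VALUES (1, 100), (2, 200), (1, 200), (3, 100);
""")

# Anti-join: look for the job inside the employee's own company and keep
# the links where no such job row was found.
flagged = conn.execute("""
SELECT lej.employeeid, lej.jobid
FROM LinkEmpolyeeJob lej
LEFT JOIN Employees e ON lej.employeeid = e.employeeid
LEFT JOIN Jobs j ON e.companyid = j.companyid AND lej.jobid = j.jobid
WHERE j.jobid IS NULL
ORDER BY lej.employeeid, lej.jobid
""").fetchall()

print(flagged)  # [(1, 200), (3, 100)]
```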

Query to get employee of different departments

I have 3 tables: Employee, Department and employeeProject.
The relation between employee and employeeproject is one-to-many. The relation between employee and department is many-to-one.
I want to write a query to select 10 employees who have worked in projects 3 and 4. The query should return employees of different departments if possible.
The query below mostly works. The only problem is that, because the relationship between employee and employeeproject is one-to-many, it can return the same employee number multiple times.
I cannot use DISTINCT, because with DISTINCT every expression in the ORDER BY clause must also appear in the SELECT list.
select top 10 empid from employee e
inner join department d on d.depId=e.depid
inner join employeeProject p on p.empid=e.empid
where p.projectID in (3,4)
order by row_number() over(partition by e.depId order by e.empid)
Bit of a guess, but use an EXISTS?
SELECT TOP 10 e.empid
FROM employee e
JOIN department d ON e.depid = d.depid
WHERE EXISTS (SELECT 1
FROM employeeproject p
WHERE p.empid = e.empid
AND p.projectid IN (3,4))
ORDER BY e.depid, e.empid;
I suggest aggregating by employee and then asserting a condition in the HAVING clause:
SELECT TOP 10 e.empid
FROM employee e
INNER JOIN department d
ON d.depId = e.depid
INNER JOIN employeeProject p
ON p.empid = e.empid
WHERE
p.projectID IN (3,4)
GROUP BY
e.empid
HAVING
MIN(p.projectID) <> MAX(p.projectID);
After restricting to projects 3 and 4, an employee whose minimum and maximum projectID differ must have worked on both projects, so that employee meets the criteria.
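Here is a minimal sketch of the HAVING trick using Python's sqlite3 module and an in-memory database. The employeeProject rows are hypothetical, and the department join is left out for brevity. A plain DISTINCT over IN (3, 4) is included for contrast: it returns anyone who worked on either project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employeeProject (empid INTEGER, projectID INTEGER);
-- emp 1: projects 3 (twice) and 4; emp 2: only 3; emp 3: 4 and 5.
INSERT INTO employeeProject VALUES (1, 3), (1, 3), (1, 4), (2, 3), (3, 4), (3, 5);
""")

# One row per employee; MIN <> MAX means both 3 and 4 are present.
both = conn.execute("""
SELECT p.empid
FROM employeeProject p
WHERE p.projectID IN (3, 4)
GROUP BY p.empid
HAVING MIN(p.projectID) <> MAX(p.projectID)
ORDER BY p.empid
""").fetchall()

# For contrast: IN (3, 4) alone matches either project.
either = conn.execute("""
SELECT DISTINCT empid FROM employeeProject
WHERE projectID IN (3, 4)
ORDER BY empid
""").fetchall()

print(both, either)
```

Employee 1 appears only once, even with two rows for project 3, because GROUP BY collapses them. `HAVING COUNT(DISTINCT p.projectID) = 2` is an equivalent test.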
Why not just use select distinct?
select top 10 empid
from (select distinct e.empid, e.depId
from employee e inner join
employeeProject p
on p.empid = e.empid
where p.projectID in (3, 4)
) e
order by row_number() over (partition by e.depId order by e.empid);
Note that the department table is not needed.
Alternatively,
select top (10) e.*
from employee e
where exists (select 1
from employeeProject p
where p.empid = e.empid and
p.projectid in (3, 4)
)
order by row_number() over (partition by e.depid order by newid());
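One way to spread the picks across departments without DISTINCT works in three steps. First, deduplicate with EXISTS. Next, number the employees within each department. Then order by that number, so every department's first employee comes before any department's second. This sketch uses Python's sqlite3 module; window functions require SQLite 3.25 or later. It computes ROW_NUMBER() in a CTE and orders by empid rather than newid(), so the result is repeatable. It uses LIMIT 4 in place of TOP 10 because the hypothetical sample is small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INTEGER PRIMARY KEY, depid INTEGER);
CREATE TABLE employeeProject (empid INTEGER, projectID INTEGER);
INSERT INTO employee VALUES (1, 10), (2, 10), (3, 10), (4, 20), (5, 20), (6, 30), (7, 30);
-- emp 4 has two rows for project 3; emp 7 only worked on project 9.
INSERT INTO employeeProject VALUES
  (1, 3), (1, 4), (2, 3), (3, 4), (4, 3), (4, 3), (5, 4), (6, 3), (6, 9), (7, 9);
""")

picked = conn.execute("""
WITH qualified AS (
    -- EXISTS yields each employee at most once, however many project rows match
    SELECT e.empid, e.depid
    FROM employee e
    WHERE EXISTS (SELECT 1 FROM employeeProject p
                  WHERE p.empid = e.empid AND p.projectID IN (3, 4))
),
ranked AS (
    SELECT empid, depid,
           ROW_NUMBER() OVER (PARTITION BY depid ORDER BY empid) AS rn
    FROM qualified
)
SELECT empid FROM ranked
ORDER BY rn, depid  -- round-robin: rank 1 of every department first
LIMIT 4
""").fetchall()

print(picked)  # [(1,), (4,), (6,), (2,)]
```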

Using COUNT when creating a view

I have a view that needs to include the count of members volunteering for an event, but I'm getting the error 'not a single-group group function'. Any idea how to resolve this?
CREATE VIEW atbyrd.events__view AS
SELECT e.name, e."DATE", b."LIMIT",b.allocated_amount, COUNT(em.member_id), e.comments
FROM events e INNER JOIN budgets b ON b.event_id = e.id
INNER JOIN event_members em ON em.event_id = e.id;
One option is a correlated subquery, which counts the members of each event without needing GROUP BY:
SELECT e.name,
e."DATE",
b."LIMIT",
b.allocated_amount,
(SELECT COUNT(em.member_id) FROM event_members em WHERE em.event_id = e.id) AS mem_count,
e.comments
FROM events e
INNER JOIN budgets b ON b.event_id = e.id;
You could use COUNT() as an analytic function:
SELECT e.name
, e."DATE"
, b."LIMIT"
, b.allocated_amount
, COUNT(em.member_id) over ( partition by em.event_id ) AS member_count
, e.comments
FROM events e
INNER JOIN budgets b
ON b.event_id = e.id
INNER JOIN event_members em
ON em.event_id = e.id;
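Simply put, this counts the number of members per event_id, but because it is used as an analytic function rather than an aggregate, no GROUP BY is required. Every row with the same event_id receives the same value, so each event still appears once per member row; add DISTINCT if the view should return one row per event.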
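Here is a small sketch of the analytic COUNT() using Python's sqlite3 module; window functions require SQLite 3.25 or later. The budgets table is omitted, and the rows are hypothetical. The sketch shows that the count is repeated on every member row, and that DISTINCT collapses the result to one row per event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event_members (event_id INTEGER, member_id INTEGER);
INSERT INTO events VALUES (1, 'Cleanup'), (2, 'Bake sale');
INSERT INTO event_members VALUES (1, 101), (1, 102), (1, 103), (2, 104);
""")

# The window count is attached to every joined row; no GROUP BY needed.
per_row = conn.execute("""
SELECT e.name, COUNT(em.member_id) OVER (PARTITION BY em.event_id) AS member_count
FROM events e
INNER JOIN event_members em ON em.event_id = e.id
ORDER BY e.id, em.member_id
""").fetchall()

# DISTINCT removes the repeated rows, leaving one per event.
per_event = conn.execute("""
SELECT DISTINCT e.name, COUNT(em.member_id) OVER (PARTITION BY em.event_id) AS member_count
FROM events e
INNER JOIN event_members em ON em.event_id = e.id
ORDER BY 1
""").fetchall()

print(per_row)
print(per_event)
```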

SQL Server 2005 - Nested recursive query :(

I have a query that I need to execute that I do not know how to structure.
I have a table called Employees. I have another table called Company. There is a third table called Files. As you can imagine, a Company has Employees, and Employees have Files.
I need to list out all of the Employees in my database. The challenge is, I need to list the total number of Files in the same company as the Employee. I have tried variations on the following without any luck:
SELECT
e.FirstName,
e.LastName,
e.Company,
(SELECT COUNT(*) FROM Files f WHERE f.EmployeeID IN (SELECT [ID] FROM Employees e2 WHERE e2.CompanyID=e.CompanyID)) as 'FileCount'
FROM
Employees e
What am I doing wrong? Thank you!
Try this:
SELECT
e.FirstName,
e.LastName,
e.Company,
(
SELECT COUNT(*)
FROM Files f
JOIN Employees e2 ON f.EmployeeID = e2.id
WHERE e2.CompanyID = e.CompanyID
) as 'FileCount'
FROM
Employees e
There are many ways to get this. If performance is a concern, this version has a lower cost in the estimated execution plan:
SELECT
e.FirstName,
e.LastName,
e.Company,
COUNT(f.FileId)
FROM
Employees e
INNER JOIN Files f ON e.EmployeeID = f.EmployeeID
GROUP BY
e.FirstName,
e.LastName,
e.Company
A solution with no correlated subquery in the SELECT clause, which is probably quicker:
SELECT
e.FirstName,
e.LastName,
e.Company,
foo.FileCount
FROM
Employees e
JOIN
(
SELECT
COUNT(*) AS FileCount, --OR COUNT(DISTINCT something) ?
e2.Company, f.EmployeeID
FROM
Files f JOIN Employees e2 ON f.EmployeeID = e2.id
GROUP BY
e2.Company, f.EmployeeID
) foo ON e.Company = foo.Company AND e.id = foo.EmployeeID
How about:
SELECT
emp.FirstName,
emp.LastName,
emp.Company,
(select count(*) from Files f, Employees e where f.EmployeeID=e.EmployeeID and e.CompanyID=emp.CompanyID) AS FileCount
FROM
Employees emp
WITH FilesPerCompany (CompanyID, NumberOfFiles)
AS (SELECT c.ID AS CompanyID,
COUNT(*) AS NumberOfFiles
FROM Companies c
INNER JOIN Employees e ON c.ID = e.CompanyID
INNER JOIN Files f ON e.ID = f.EmployeeID
GROUP BY c.ID
)
SELECT e.FirstName,
e.LastName,
e.Company,
COALESCE(s.NumberOfFiles, 0) AS NumberOfFilesPerCompany
FROM Employees e
LEFT JOIN FilesPerCompany s
ON s.CompanyID = e.CompanyID
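Here is a minimal sketch of the FilesPerCompany CTE using Python's sqlite3 module and an in-memory database. The columns are cut down and the rows are hypothetical. Globex has an employee but no files, which is why the LEFT JOIN and COALESCE matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Companies (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FirstName TEXT, CompanyID INTEGER);
CREATE TABLE Files (FileID INTEGER PRIMARY KEY, EmployeeID INTEGER);
INSERT INTO Companies VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO Employees VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cid', 2);
-- Ann owns two files and Ben one; nobody at Globex owns any.
INSERT INTO Files VALUES (1, 1), (2, 1), (3, 2);
""")

rows = conn.execute("""
WITH FilesPerCompany (CompanyID, NumberOfFiles) AS (
    SELECT c.ID, COUNT(*)
    FROM Companies c
    INNER JOIN Employees e ON c.ID = e.CompanyID
    INNER JOIN Files f ON e.ID = f.EmployeeID
    GROUP BY c.ID
)
SELECT e.FirstName, COALESCE(s.NumberOfFiles, 0) AS NumberOfFilesPerCompany
FROM Employees e
LEFT JOIN FilesPerCompany s ON s.CompanyID = e.CompanyID
ORDER BY e.ID
""").fetchall()

print(rows)  # [('Ann', 3), ('Ben', 3), ('Cid', 0)]
```

Ann and Ben both get 3 because the count is per company, not per employee.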
The following statement uses a recursive CTE to walk down the chain of employees who manage other employees, who in turn manage others, and so on. Our structure is a little convoluted because management is role-based, which allows an employee to have more than one manager. You can add a reference to Files within this recursion.
WITH Manager as
(SELECT c.Forenames + ' ' + c.Surname as Employee,
c2.Forenames + ' ' + c2.Surname AS Manages,
c.accountid AS AccountID, c.[Status] AS [Status]
FROM [intranet].[dbo].[tblContact] c
LEFT JOIN tblContactPost cp ON cp.contactid = c.contactid
LEFT JOIN tblPost p ON p.ParentRoleId = cp.RoleID AND p.ParentPostArea = cp.PostArea AND p.ParentPostNo = cp.PostNo
INNER JOIN tblContactPost cp2 ON cp2.RoleId = p.RoleId AND cp2.PostArea = p.PostArea AND cp2.PostNo = p.PostNo
INNER JOIN tblContact c2 ON c2.ContactID = cp2.ContactId
)
,jn AS
(SELECT Employee, Manages
FROM Manager
Where AccountID = 'ad\lgardner' AND [Status] = 'A'
UNION ALL
SELECT c.Employee, c.Manages
FROM jn as p JOIN Manager AS c
ON c.Employee = p.Manages
)
SELECT jn.Employee, jn.Manages
From jn
Order BY 1
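The recursive pattern itself can be sketched without the role-based tables. In this sketch, a hypothetical Manager table stores one (Employee, Manages) pair per row. The query starts from one manager and repeatedly joins back to find whom each reported employee manages. It uses Python's sqlite3 module. SQLite needs WITH RECURSIVE here, whereas SQL Server uses plain WITH. The sample data has no cycles. With UNION ALL, a cycle would make the recursion run forever in SQLite, while SQL Server stops with an error at the default MAXRECURSION limit of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manager (Employee TEXT, Manages TEXT);
-- Lee -> Kim, Raj; Kim -> Sue; Sue -> Tom; Max -> Ivy is a separate tree.
INSERT INTO Manager VALUES
  ('Lee', 'Kim'), ('Lee', 'Raj'), ('Kim', 'Sue'), ('Sue', 'Tom'), ('Max', 'Ivy');
""")

chain = conn.execute("""
WITH RECURSIVE jn (Employee, Manages) AS (
    -- anchor: the starting manager's direct reports
    SELECT Employee, Manages FROM Manager WHERE Employee = 'Lee'
    UNION ALL
    -- recursive step: reports of the people found so far
    SELECT c.Employee, c.Manages
    FROM jn AS p JOIN Manager AS c ON c.Employee = p.Manages
)
SELECT Employee, Manages FROM jn ORDER BY 1, 2
""").fetchall()

print(chain)
```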

Why do my counts of different fields from different tables multiply in one SQL statement?

I have a SQL query:
SELECT
e.name as estate_name
, g.name as governing_body
, count(s.id) as total_stands
, count(sp.id) as service_providers
FROM estates e
LEFT JOIN governing_bodies g
ON e.governing_body_id = g.id
LEFT JOIN stands s
ON s.estate_id = e.id
LEFT JOIN services sp
ON sp.estate_id = e.id
GROUP BY e.id
It seems like my counts multiply each other: if my first count is 3 and my second count is 10, both the total_stands and the service_providers fields come out as 30.
What am I doing wrong?
What about changing the COUNT(blah) constructs to COUNT(DISTINCT blah)?
COUNT() returns the number of rows found for your group. Since you're grouping on estate, it counts the rows you join to each estate. Each join multiplies the number of rows, so 3 x 10 = 30 is actually the expected count. Run the query without GROUP BY to see what's happening.
One way to fix it would look like this:
SELECT
e.name as estate_name,
g.name as governing_body,
(select count(*) from stands s where s.estate_id = e.id) as stands,
(select count(*) from services sp where sp.estate_id = e.id) as services
FROM estates e
LEFT JOIN governing_bodies g on e.governing_body_id = g.id
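Both the 3 x 10 = 30 effect and the subquery fix can be reproduced with Python's sqlite3 module and an in-memory database. The governing_bodies join is omitted, and the single estate with three stands and ten services is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE estates (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stands (id INTEGER PRIMARY KEY, estate_id INTEGER);
CREATE TABLE services (id INTEGER PRIMARY KEY, estate_id INTEGER);
INSERT INTO estates VALUES (1, 'Oakwood');
""")
conn.executemany("INSERT INTO stands (estate_id) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO services (estate_id) VALUES (?)", [(1,)] * 10)

# Joining both child tables yields 3 x 10 = 30 rows for the estate.
joined = conn.execute("""
SELECT e.name, COUNT(s.id), COUNT(sp.id)
FROM estates e
LEFT JOIN stands s ON s.estate_id = e.id
LEFT JOIN services sp ON sp.estate_id = e.id
GROUP BY e.id
""").fetchone()

# Correlated subqueries count each child table separately.
fixed = conn.execute("""
SELECT e.name,
       (SELECT COUNT(*) FROM stands s WHERE s.estate_id = e.id),
       (SELECT COUNT(*) FROM services sp WHERE sp.estate_id = e.id)
FROM estates e
""").fetchone()

print(joined, fixed)  # ('Oakwood', 30, 30) ('Oakwood', 3, 10)
```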
Writing out Alex Martelli's informative answer:
SELECT
e.name as estate_name
, g.name as governing_body
, count(distinct s.id) as total_stands
, count(distinct sp.id) as service_providers
FROM estates e
LEFT JOIN governing_bodies g
ON e.governing_body_id = g.id
LEFT JOIN stands s
ON s.estate_id = e.id
LEFT JOIN services sp
ON sp.estate_id = e.id
GROUP BY e.id, g.name
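Here is a sketch of the COUNT(DISTINCT ...) variant on the same kind of hypothetical data (one estate, three stands, ten services), using Python's sqlite3 module. The counts come out right, but the join still produces 3 x 10 intermediate rows before DISTINCT removes the duplicates. DISTINCT would also not help if you were summing non-unique values such as amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE estates (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stands (id INTEGER PRIMARY KEY, estate_id INTEGER);
CREATE TABLE services (id INTEGER PRIMARY KEY, estate_id INTEGER);
INSERT INTO estates VALUES (1, 'Oakwood');
""")
conn.executemany("INSERT INTO stands (estate_id) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO services (estate_id) VALUES (?)", [(1,)] * 10)

# Size of the joined row set that GROUP BY has to process.
joined_rows = conn.execute("""
SELECT COUNT(*)
FROM estates e
LEFT JOIN stands s ON s.estate_id = e.id
LEFT JOIN services sp ON sp.estate_id = e.id
""").fetchone()[0]

# DISTINCT on each child key removes the duplicates the join created.
counts = conn.execute("""
SELECT e.name, COUNT(DISTINCT s.id), COUNT(DISTINCT sp.id)
FROM estates e
LEFT JOIN stands s ON s.estate_id = e.id
LEFT JOIN services sp ON sp.estate_id = e.id
GROUP BY e.id
""").fetchone()

print(joined_rows, counts)  # 30 ('Oakwood', 3, 10)
```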
Or, as a more complex alternative with JOIN syntax:
SELECT
e.name as estate_name,
g.name as governing_body,
IsNull(stand_count.total,0) as stand_count,
IsNull(service_count.total,0) as service_count
FROM estates e
LEFT JOIN governing_bodies g on e.governing_body_id = g.id
LEFT JOIN (
select estate_id, total = count(*) from stands group by estate_id
) stand_count on stand_count.estate_id = e.id
LEFT JOIN (
select estate_id, total = count(*) from services group by estate_id
) service_count on service_count.estate_id = e.id
GROUP BY
e.name,
g.name,
IsNull(stand_count.total,0),
IsNull(service_count.total,0)