Multiple grouped items - sql

I can't seem to find out how to get the functionality I want. Here is an example of what my table looks like:
EmpID | ProjectID | hours_worked |
3 1 8
3 1 8
4 2 8
4 2 8
4 3 8
5 4 8
I want to group by EmpID and ProjectID and then sum up the hours worked. I then want to inner join the Employee and Project table rows that are associated with EmpID and ProjectID, however when I do this then I get an error about the aggregate function thing, which I understand from research but I don't think this would have that problem since there will be one row per EmpID and ProjectID.
Real SQL:
SELECT
WorkHours.EmpID,
WorkHours.ProjectID,
Employees.FirstName
FROM WorkHours
INNER JOIN Projects ON WorkHours.ProjectID = Projects.ProjectID
INNER JOIN Employees ON WorkHours.EmpID = Employees.EmpID
GROUP BY WorkHours.ProjectID, WorkHours.EmpID
This gives the error:
Column 'Employees.FirstName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

You might want to use OVER (PARTITION BY) so you won't have to use GROUP BY:
Select a.EmpID
,W.ProjectID
,W.SUM(hours_worked) OVER (PARTITION BY W.EmpID,W.ProjectID)
,E.FirstName
FROM WorkHours W
INNER JOIN Projects P ON WorkHours.ProjectID = Projects.ProjectID
INNER JOIN Employees E ON WorkHours.EmpID = Employees.EmpID

You can do a basic query to get the grouped hours and use that as a basis for the rest, either in a CTE or as a subquery. For example, as a subquery:
SELECT *
FROM
(SELECT EmpID, ProjectID, SUM(hours_worked) as HoursWorked
FROM WorkHours
GROUP BY EmpID, ProjectID) AS ProjectHours
JOIN Projects
ON Projects.ID = ProjectHours.ProjectID
JOIN Employees
ON Employees.ID = ProjectHours.EmpID

One way is to use a CTE to first form the data you want, then join onto the other table(s)
WITH AggregatedHoursWorked
AS
(
SELECT EmpID,
ProjectID,
SUM(HoursWorked) AS TotalHours
FROM WorkHours
GROUP BY EmpID, ProjectID
)
SELECT e.FirstName
p.ProjectName,
hw.TotalHours
FROM AggregatedHoursWorked hw
INNER JOIN Employees e
ON hw.EmpID = e.ID
INNER JOIN Projects p
ON hw.ProjectID = p.ID

If you use an aggregate function, all the columns must be named in the aggregate function and/or in the GROUP BY clause. If you want to join the descriptions (normally unique for a given ID), you have to include the description columns in the GROUP BY clause. This will not affect the result of the query.

Related

ORA-00933 when trying to compare counts

This is the question
List the name of employee who work on more projects than employee 'Grace'', Show three columns in result: name of employee, project count of employee, grace's project count.
This is my code
SELECT employee."NAME", T1."# OF PROJECTS",
(SELECT COUNT(pid) FROM workon WHERE empid = 30) AS "Grace's Project"
FROM employee,
(SELECT empid, COUNT(pid) AS "# OF PROJECTS"
FROM workon
GROUP BY empid
ORDER BY empid)AS T1
WHERE T1."# OF PROJECTS" > (SELECT COUNT(pid) FROM workon WHERE empid = 30)
AND t1.empid = employee.EMPID
I keep getting ORA-00933: SQL command not properly ended. what am I missing?
The only error in your query is that Oracle does not accept AS for table aliases. Remove it and your query runs just fine.
There are two things I'd like to mention, though:
You are using an ancient join syntax you shouldn't use anymore. Comma-separated joins were made redundant by the introduction of explicit joins (e.g. INNER JOIN ... ON) in 1992.
Your query is a little over-complicated. Most of all, because you are counting projects thrice, once for all employees, twice for Grace. You can avoid this by using WITH clauses.
Here is the query built-up step by step with WITH clauses:
WITH emp AS
(
SELECT empid, e.name, COUNT(*) AS projects
FROM workon w
JOIN employee e USING(empid)
GROUP BY empid, e.name
ORDER BY empid
)
, grace AS
(
SELECT * FROM emp WHERE name = 'Grace'
)
SELECT
emp.name,
emp.projects as "# OF PROJECTS",
grace.projects as "Grace's Projects"
FROM emp
CROSS JOIN grace
WHERE emp.projects > grace.projects
ORDER BY emp.projects DESC, emp.name;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f40e2c33541c76f0af112be967370784
I would use window functions:
select ew.name, ew.num_projects
from (select e.empid, e."NAME", count(*) as num_projects,
max(case when ew."NAME" = 'Grace' then count(*) else 0 end) over () as grace_num_projects
from employee e join
workon w
on w.empid = e.empid
group by e.empid, e."NAME"
) ew
where num_projects > grace_num_projects;
SELECT DISTINCT
e.empid,
,e."NAME"
FROM employee e
INNER JOIN (SELECT
empid
,count(*) as num_projects
FROM workon
GROUP BY empid) w ON w.empid = e.empid
LEFT JOIN (SELECT
1 AS ID
COUNT(*) as grace_projects
FROM workon
GROUP BY empid
WHERE empid = 30) g ON g.ID = 1
WHERE w.num_projects > g.grace_projects;
So here's what I am doing.
I am counting the projects before they are joined, that should decrease the overhead for the query, as the return on the join is shrank considerably prior to joining.
An index on that WORKON table by EmpID would speed up the query considerably.
Then, I query Graces figures, because I want them to return a value against any person, it should only return one result for a count, and then just join that by an arbitrary value so it returns against all rows
Again, it should utilise the same index.
Because it is calculating Graces figures first, it should only need to do this once, whereas a subquery would need calculate graces figures for each employee which is an unnecessary overhead.
This is then filtered in the where clause.

SQL Joins and Corelated subqueries with column data

I am facing an issue in terms of understanding the joins. Lets say for an example we have two tables employee and sales and now I have a query where we have sales of an employee using the id of the employee
select e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid from sales) s on s.eid = e.id
group by 1,2
I'd like to understand why s.city wasn't showing up? and also would like to understand what is this concept called? Is it co related sub queries on Joins? Please help me down over here.
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join (select sales,eid,city from sales) s on s.eid = e.id
group by 1,2
in the left join above you have to add city as well. The query Imagine select sales,eid,city from sales is a table itself and then from this table you are selecting city (your second column s.city) this will run error as your table doesn't have a city column yet.
It is much easier to use CTE (common table expressions than CTE's) You can also do the above question as
select
e.employeename
,s.city
,SUM(s.sales)
from employee e
left join sales as s
on e.id = s.id
group by 1,2
here I have added e.id = s.id instead of s.id = e.id it is better to reference the key of the main table first.
you could use CTE (although used when you have to do a lot of referencing but you can see how it works):
With staging as (
select
e.employeename
,s.city
,s.sales
from employee e
left join sales as s
on e.id = s.id
),
sales_stats as (
select
staging.employeename,
staging.city,
sum(staging.sales)
from staging
group by 1,2
#here you will select from staging again consider staging as a separate table so you will have to have all the columns in the staging that you want to use further. Also you will have to reference columns using staging.x
)
select * from sales_stats
-- here you could have combined the steps but I wanted to show you how cte works, Hope this works for you

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
I want to get the record for each employee record from the times table with an event_time immediately prior to some time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work but it's SOOOOO SLOW. We're talking 10 seconds or more if there's 100 records in the employee table. The problem is peculiar to Access as I can't use the last_time_id result of the other sub-query like I can in MySQL or SQL Server.
I am looking for tips on how to speed this up. Either a different query, indexes. Something.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query is running multiple correlated subqueries even a nested one in a WHERE clause. Correlated queries calculate a value separately for each row, corresponding to outer query.
Similar to #LeeMac, simply join all your tables to an aggregate query for the max event_time grouped by employee_id which will run once across all rows. Below times is the baseFROM table joined to the aggregate query, employees, and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id

How to make this complex query more efficient?

I want to select employees, having more than 10 products and older than 50. I also want to have their last product selected. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However it takes too much time to execute it. What's wrong with it? Is it possible to make a more efficient query?
I have one limitation - I can not use CTE. I believe it will not bring performance here anyway though.
Before creating Index I believe we can restructure the query.
Your query can be rewritten like this
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- to filter the employees who is having more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50
This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed query up.
You're filtering by Age, so followining one should help:
CREATE INDEX ix_Person_Age_Name
ON Person (Age, Name);
Subquery that finds emploees with more than 10 records should be calculated first and CROSS APPLY should bring back data more efficient with TOP operator rather than comparing it to MAX value.
Answer by #Prdp is great, but I thought I'll drop an alternative in. Sometimes windowed functions do not work very well and it's worth to replace them with ol'good subqueries.
Also, do not use datetime, use datetime2. This is suggest by Microsoft:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data
types for new work. These types align with the SQL Standard. They are
more portable. time, datetime2 and datetimeoffset provide
more seconds precision. datetimeoffset provides time zone support
for globally deployed applications.
By the way, here's a tip. Try to name your surrogate primary keys after table, so they become more meaningful and joins feel more natural. I.E.:
In Employee table replace ID with EmployeeID
In Product table replace ID with ProductID
I find these a good practice.
with usersOver50with10productsOrMore (employeeID, productID, date, id, name, age, products ) as (
select employeeID, productID, date, id, name, age, count(productID) from productEmployee
join employee on productEmployee.employeeID = employee.id
where age >= 50
group by employeeID, productID, date, id, name, age
having count(productID) >= 10
)
select sfq.name, sfq.age, pro.name, sfq.products, max(date) from usersOver50with10productsOrMore as sfq
join product pro on sfq.productID = pro.id
group by sfq.name, sfq.age, pro.name, sfq.products
;
There is no need to find the last productID for the entire table, just filler the last product from the results of employees with 10 or more products and over the age of 50.

Eliminate duplicate rows from query output

I have a large SELECT query with multiple JOINS and WHERE clauses. Despite specifying DISTINCT (also have tried GROUP BY) - there are duplicate rows returned. I am assuming this is because the query selects several IDs from several tables. At any rate, I would like to know if there is a way to remove duplicate rows from a result set, based on a condition.
I am looking to remove duplicates from results if x.ID appears more than once. The duplicate rows all appear grouped together with the same IDs.
Query:
SELECT e.Employee_ID, ce.CC_ID as CCID, e.Manager_ID, e.First_Name, e.Last_Name,,e.Last_Login,
e.Date_Created AS Date_Created, e.Employee_Password AS Password,e.EmpLogin
ISNULL((SELECT TOP 1 1 FROM Gift g
JOIN Type t ON g.TypeID = t.TypeID AND t.Code = 'Reb'
WHERE g.Manager_ID = e.Manager_ID),0) RebGift,
i.DateCreated as ImportDate
FROM #EmployeeTemp ct
JOIN dbo.Employee c ON ct.Employee_ID = e.Employee_ID
INNER JOIN dbo.Manager p ON e.Manager_ID = m.Manager_ID
LEFT JOIN EmployeeImp i ON e.Employee_ID = i.Employee_ID AND i.Active = 1
INNER JOIN CreditCard_Updates cc ON m.Manager_ID = ce.Manager_ID
LEFT JOIN Manager m2 ON m2.Manager_ID = ce.Modified_By
WHERE ce.CCType ='R' AND m.isT4L = 1
AND CHARINDEX(e.first_name, Selected_Emp) > 0
AND ce.Processed_Flag = #isProcessed
I don't have enough reputation to add a comment, so I'll just try to help you in an answer proper (even though this is more of a comment).
It seems like what you want to do is select distinctly on just one column.
Here are some answers which look like that:
SELECT DISTINCT on one column
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?