ORA-00933 when trying to compare counts - sql

This is the question
List the names of employees who work on more projects than employee 'Grace'. Show three columns in the result: employee name, the employee's project count, and Grace's project count.
This is my code:
SELECT employee."NAME", T1."# OF PROJECTS",
(SELECT COUNT(pid) FROM workon WHERE empid = 30) AS "Grace's Project"
FROM employee,
(SELECT empid, COUNT(pid) AS "# OF PROJECTS"
FROM workon
GROUP BY empid
ORDER BY empid)AS T1
WHERE T1."# OF PROJECTS" > (SELECT COUNT(pid) FROM workon WHERE empid = 30)
AND t1.empid = employee.EMPID
I keep getting ORA-00933: SQL command not properly ended. What am I missing?

The only error in your query is that Oracle does not accept AS for table aliases. Remove it and your query runs just fine.
There are two things I'd like to mention, though:
You are using an ancient join syntax you shouldn't use anymore. Comma-separated joins were made redundant by the introduction of explicit joins (e.g. INNER JOIN ... ON) in 1992.
Your query is a little over-complicated, mainly because you count projects three times: once for all employees and twice for Grace. You can avoid this by using WITH clauses.
Here is the query, built up step by step with WITH clauses:
WITH emp AS
(
SELECT empid, e.name, COUNT(*) AS projects
FROM workon w
JOIN employee e USING(empid)
GROUP BY empid, e.name
ORDER BY empid
)
, grace AS
(
SELECT * FROM emp WHERE name = 'Grace'
)
SELECT
emp.name,
emp.projects as "# OF PROJECTS",
grace.projects as "Grace's Projects"
FROM emp
CROSS JOIN grace
WHERE emp.projects > grace.projects
ORDER BY emp.projects DESC, emp.name;
Demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=f40e2c33541c76f0af112be967370784
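To try the WITH-clause approach without an Oracle instance, here is a minimal sketch that runs the same idea in Python's built-in sqlite3 module. SQLite is only a stand-in; the table and column names follow the question, but the rows (and the employees Alice and Bob) are invented for illustration.

```python
import sqlite3


def build_sample_db():
    """In-memory stand-in for the question's tables; the rows are invented."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE workon (empid INTEGER, pid INTEGER);
        INSERT INTO employee VALUES (10, 'Alice'), (20, 'Bob'), (30, 'Grace');
        INSERT INTO workon VALUES
            (10, 1), (10, 2), (10, 3),  -- Alice: 3 projects
            (20, 1),                    -- Bob: 1 project
            (30, 1), (30, 2);           -- Grace: 2 projects
    """)
    return conn


def more_projects_than_grace(conn):
    # Count projects once per employee (emp), pick Grace's row out of that
    # result (grace), then pair every employee with Grace's count.
    sql = """
        WITH emp AS (
            SELECT e.empid, e.name, COUNT(*) AS projects
            FROM workon w
            JOIN employee e ON e.empid = w.empid
            GROUP BY e.empid, e.name
        ),
        grace AS (
            SELECT projects FROM emp WHERE name = 'Grace'
        )
        SELECT emp.name, emp.projects, grace.projects
        FROM emp
        CROSS JOIN grace
        WHERE emp.projects > grace.projects
        ORDER BY emp.projects DESC, emp.name
    """
    return conn.execute(sql).fetchall()


print(more_projects_than_grace(build_sample_db()))  # [('Alice', 3, 2)]
```

Because grace is derived from emp, the project counting happens in one place instead of three.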

I would use window functions:
select ew.name, ew.num_projects, ew.grace_num_projects
from (select e.empid, e."NAME", count(*) as num_projects,
max(case when e."NAME" = 'Grace' then count(*) else 0 end) over () as grace_num_projects
from employee e join
workon w
on w.empid = e.empid
group by e.empid, e."NAME"
) ew
where num_projects > grace_num_projects;
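Here is a rough sketch of the window-function idea, run in SQLite via Python's sqlite3 module as a stand-in for Oracle (assumes SQLite 3.25 or later, which added window functions). Instead of nesting count(*) inside the windowed MAX, this version does the aggregation in an inner query first, which keeps it portable; the sample rows, including the employee Dave, are invented.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE workon (empid INTEGER, pid INTEGER);
        INSERT INTO employee VALUES
            (10, 'Alice'), (20, 'Bob'), (30, 'Grace'), (40, 'Dave');
        INSERT INTO workon VALUES
            (10, 1), (10, 2), (10, 3),  -- Alice: 3 projects
            (20, 1),                    -- Bob: 1 project
            (30, 1), (30, 2),           -- Grace: 2 projects
            (40, 2), (40, 3);           -- Dave: 2 projects (ties with Grace)
    """)
    return conn


def more_projects_than_grace_windowed(conn):
    sql = """
        SELECT name, num_projects, grace_num_projects
        FROM (
            -- Window step: copy Grace's count onto every row with MAX(...) OVER ().
            SELECT c.name, c.num_projects,
                   MAX(CASE WHEN c.name = 'Grace' THEN c.num_projects ELSE 0 END)
                       OVER () AS grace_num_projects
            FROM (
                -- Aggregation step: one row per employee with a project count.
                SELECT e.empid, e.name, COUNT(*) AS num_projects
                FROM employee e
                JOIN workon w ON w.empid = e.empid
                GROUP BY e.empid, e.name
            ) c
        ) ew
        -- Window results cannot be filtered in the same SELECT, hence the outer query.
        WHERE num_projects > grace_num_projects
        ORDER BY name
    """
    return conn.execute(sql).fetchall()


print(more_projects_than_grace_windowed(build_sample_db()))  # [('Alice', 3, 2)]
```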

SELECT DISTINCT
e.empid
,e."NAME"
,w.num_projects
,g.grace_projects
FROM employee e
INNER JOIN (SELECT
empid
,count(*) as num_projects
FROM workon
GROUP BY empid) w ON w.empid = e.empid
LEFT JOIN (SELECT
1 AS ID
,COUNT(*) as grace_projects
FROM workon
WHERE empid = 30) g ON g.ID = 1
WHERE w.num_projects > g.grace_projects;
So here's what I am doing.
I count the projects before joining. That should reduce the query's overhead, because the row set being joined has already shrunk considerably.
An index on the WORKON table by EmpID would speed up the query considerably.
Then I query Grace's figures. Because I want them compared against every person, the subquery should return exactly one row containing the count, which I then join on an arbitrary constant so that it is attached to every row.
Again, it should use the same index.
Because Grace's figures are calculated first, this should only need to happen once, whereas a subquery would need to calculate Grace's figures for each employee, which is unnecessary overhead.
This is then filtered in the where clause.
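A minimal sketch of this "aggregate first, then join Grace's single-row count on a constant" pattern, run in SQLite through Python's sqlite3 module as a stand-in for Oracle. The sample rows and the index name ix_workon_empid are hypothetical; Grace's empid of 30 comes from the question.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE workon (empid INTEGER, pid INTEGER);
        -- The index the answer recommends; the name is hypothetical.
        CREATE INDEX ix_workon_empid ON workon (empid);
        INSERT INTO employee VALUES (10, 'Alice'), (20, 'Bob'), (30, 'Grace');
        INSERT INTO workon VALUES
            (10, 1), (10, 2), (10, 3), (20, 1), (30, 1), (30, 2);
    """)
    return conn


def more_projects_than_grace_prejoined(conn, grace_id=30):
    sql = """
        SELECT e.empid, e.name, w.num_projects, g.grace_projects
        FROM employee e
        -- Aggregate first, so the join sees one row per employee.
        INNER JOIN (SELECT empid, COUNT(*) AS num_projects
                    FROM workon
                    GROUP BY empid) w ON w.empid = e.empid
        -- Grace's count: a single-row derived table joined on a constant.
        LEFT JOIN (SELECT 1 AS id, COUNT(*) AS grace_projects
                   FROM workon
                   WHERE empid = ?) g ON g.id = 1
        WHERE w.num_projects > g.grace_projects
        ORDER BY e.empid
    """
    return conn.execute(sql, (grace_id,)).fetchall()


print(more_projects_than_grace_prejoined(build_sample_db()))  # [(10, 'Alice', 3, 2)]
```

Without GROUP BY, the Grace subquery always returns exactly one row, even if she has no projects (the count is then 0).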

Related

Slow MS Access Sub Query

I have three tables in Access:
employees
----------------------------------
id (pk),name
times
----------------------
id (pk),employee_id,event_time
time_notes
----------------------
id (pk),time_id,note
For each employee, I want the record from the times table with the latest event_time at or before a given time. Doing that is simple enough with this:
select employees.id, employees.name,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as time_id
from employees
However, I also want to get some indication of whether there's a matching record in the time_notes table:
select employees.id, employees.name,
(select top 1 time_notes.id from time_notes where time_notes.time_id=(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC)) as time_note_present,
(select top 1 times.id from times where times.employee_id=employees.id and times.event_time<=#2018-01-30 14:21:48# ORDER BY times.event_time DESC) as last_time_id
from employees
This does work, but it's SO slow: 10 seconds or more with only 100 records in the employees table. The problem is peculiar to Access, as I can't reuse the last_time_id result of the other subquery the way I can in MySQL or SQL Server.
I am looking for tips on how to speed this up: a different query, indexes, anything.
Not sure if something like this would work for you?
SELECT
employees.id,
employees.name,
time_notes.id AS time_note_present,
times.id AS last_time_id
FROM
(
employees LEFT JOIN
(
times INNER JOIN
(
SELECT times.employee_id AS lt_employee_id, max(times.event_time) AS lt_event_time
FROM times
WHERE times.event_time <= #2018-01-30 14:21:48#
GROUP BY times.employee_id
)
AS last_times
ON times.event_time = last_times.lt_event_time AND times.employee_id = last_times.lt_employee_id
)
ON employees.id = times.employee_id
)
LEFT JOIN time_notes ON times.id = time_notes.time_id;
(Completely untested and may contain typos)
Basically, your query runs multiple correlated subqueries, including one nested inside a WHERE clause. A correlated subquery is evaluated separately for each row of the outer query.
Similar to @LeeMac, simply join all your tables to an aggregate query that gets the max event_time per employee_id; it runs once across all rows. Below, times is the base FROM table, joined to the aggregate query and to the employees and time_notes tables:
select e.id, e.name, t.event_time, n.note
from ((times t
inner join
(select sub.employee_id, max(sub.event_time) as max_event_time
from times sub
where sub.event_time <= #2018-01-30 14:21:48#
group by sub.employee_id
) as agg_qry
on t.employee_id = agg_qry.employee_id and t.event_time = agg_qry.max_event_time)
inner join employees e
on e.id = t.employee_id)
left join time_notes n
on n.time_id = t.id
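A rough sketch of the join-to-an-aggregate idea, run in SQLite via Python's sqlite3 module as a stand-in for Access. Assumptions: event_time is stored as ISO-8601 text so string comparison orders correctly, the cutoff is passed as a parameter instead of an Access #...# literal, the index ix_times_emp_time is hypothetical, and the sample rows are invented. Employees are the base table here, so employees with no earlier time are kept with NULLs.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE times (id INTEGER PRIMARY KEY, employee_id INTEGER, event_time TEXT);
        CREATE TABLE time_notes (id INTEGER PRIMARY KEY, time_id INTEGER, note TEXT);
        -- Hypothetical index to support the per-employee MAX(event_time) lookup.
        CREATE INDEX ix_times_emp_time ON times (employee_id, event_time);
        INSERT INTO employees VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO times VALUES
            (1, 1, '2018-01-30 09:00:00'),
            (2, 1, '2018-01-30 14:00:00'),
            (3, 1, '2018-01-30 15:00:00'),  -- after the cutoff
            (4, 2, '2018-01-29 08:00:00'),
            (5, 3, '2018-01-31 08:00:00');  -- Cy has nothing before the cutoff
        INSERT INTO time_notes VALUES (1, 2, 'late lunch');
    """)
    return conn


def last_times_with_notes(conn, cutoff):
    sql = """
        SELECT e.id, e.name, lt.time_id AS last_time_id, n.id AS time_note_id
        FROM employees e
        LEFT JOIN (
            -- One aggregate pass finds each employee's latest time before the cutoff.
            SELECT t.id AS time_id, t.employee_id
            FROM times t
            JOIN (SELECT employee_id, MAX(event_time) AS max_event_time
                  FROM times
                  WHERE event_time <= ?
                  GROUP BY employee_id) agg
              ON agg.employee_id = t.employee_id
             AND agg.max_event_time = t.event_time
        ) lt ON lt.employee_id = e.id
        -- A plain LEFT JOIN replaces the nested correlated subquery for notes.
        LEFT JOIN time_notes n ON n.time_id = lt.time_id
        ORDER BY e.id
    """
    return conn.execute(sql, (cutoff,)).fetchall()


print(last_times_with_notes(build_sample_db(), "2018-01-30 14:21:48"))
# [(1, 'Ann', 2, 1), (2, 'Ben', 4, None), (3, 'Cy', None, None)]
```

If an employee has two times with exactly the same event_time, or several notes for one time, this returns more than one row for that employee.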

How to make this complex query more efficient?

I want to select employees who have more than 10 products and are older than 50. I also want to select each employee's most recent product. I use the following query:
SELECT
PE.EmployeeID, E.Name, E.Age,
COUNT(*) as ProductCount,
(SELECT TOP(1) xP.Name
FROM ProductEmployee xPE
INNER JOIN Product xP ON xPE.ProductID = xP.ID
WHERE xPE.EmployeeID = PE.EmployeeID
AND xPE.Date = MAX(PE.Date)) as LastProductName
FROM
ProductEmployee PE
INNER JOIN
Employee E ON PE.EmployeeID = E.ID
WHERE
E.Age > 50
GROUP BY
PE.EmployeeID, E.Name, E.Age
HAVING
COUNT(*) > 10
Here is the execution plan link: https://www.dropbox.com/s/rlp3bx10ty3c1mf/ximExPlan.sqlplan?dl=0
However, it takes too long to execute. What's wrong with it? Is it possible to write a more efficient query?
I have one limitation: I cannot use a CTE. I don't think it would improve performance here anyway, though.
Before creating an index, I believe we can restructure the query.
Your query can be rewritten like this:
SELECT E.ID,
E.NAME,
E.Age,
CS.ProductCount,
CS.LastProductName
FROM Employee E
CROSS apply(SELECT TOP 1 P.NAME AS LastProductName,
ProductCount
FROM (SELECT *,
Count(1)OVER(partition BY EmployeeID) AS ProductCount -- to find product count for each employee
FROM ProductEmployee PE
WHERE PE.EmployeeID = E.Id) PE
JOIN Product P
ON PE.ProductID = P.ID
WHERE ProductCount > 10 -- to filter the employees who is having more than 10 products
ORDER BY date DESC) CS -- To find the latest sold product
WHERE age > 50
This should work:
SELECT *
FROM Employee AS E
INNER JOIN (
SELECT PE.EmployeeID
FROM ProductEmployee AS PE
GROUP BY PE.EmployeeID
HAVING COUNT(*) > 10
) AS PE
ON PE.EmployeeID = E.ID
CROSS APPLY (
SELECT TOP (1) P.*
FROM Product AS P
INNER JOIN ProductEmployee AS PE2
ON PE2.ProductID = P.ID
WHERE PE2.EmployeeID = E.ID
ORDER BY PE2.Date DESC
) AS P
WHERE E.Age > 50;
Proper indexes should speed the query up.
You're filtering by Age, so the following one should help:
CREATE INDEX ix_Employee_Age_Name
ON Employee (Age, Name);
The subquery that finds employees with more than 10 records should be calculated first, and CROSS APPLY with the TOP operator should bring back the last product more efficiently than comparing each row to a MAX value.
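SQLite has no CROSS APPLY or TOP, so this sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, which plays the same "fetch the single latest row per employee" role. It runs through Python's sqlite3 module; the lowercase table names, the sample rows, and the function name busy_senior_employees are assumptions, and the product threshold is a parameter so the sample data can stay small.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE productemployee (employeeid INTEGER, productid INTEGER, date TEXT);
        INSERT INTO employee VALUES (1, 'Ann', 55), (2, 'Ben', 60), (3, 'Cy', 40);
        INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
        INSERT INTO productemployee VALUES
            (1, 1, '2020-01-01'), (1, 2, '2020-03-01'), (1, 3, '2020-02-01'),
            (2, 1, '2020-01-01'), (2, 2, '2020-01-02'),
            (3, 1, '2020-01-01'), (3, 2, '2020-01-02'), (3, 3, '2020-01-03');
    """)
    return conn


def busy_senior_employees(conn, min_products, min_age):
    sql = """
        SELECT e.id, e.name, e.age, pc.product_count,
               -- Stand-in for CROSS APPLY (SELECT TOP (1) ... ORDER BY Date DESC).
               (SELECT p.name
                FROM productemployee pe2
                JOIN product p ON p.id = pe2.productid
                WHERE pe2.employeeid = e.id
                ORDER BY pe2.date DESC
                LIMIT 1) AS last_product_name
        FROM employee e
        -- Filter employees by product count first, as the answer suggests.
        JOIN (SELECT employeeid, COUNT(*) AS product_count
              FROM productemployee
              GROUP BY employeeid
              HAVING COUNT(*) > ?) pc ON pc.employeeid = e.id
        WHERE e.age > ?
        ORDER BY e.id
    """
    return conn.execute(sql, (min_products, min_age)).fetchall()


# A threshold of 2 instead of 10 keeps the sample data small.
print(busy_senior_employees(build_sample_db(), 2, 50))  # [(1, 'Ann', 55, 3, 'Gadget')]
```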
The answer by @Prdp is great, but I thought I'd offer an alternative. Window functions sometimes don't perform very well, and it can be worth replacing them with good old subqueries.
Also, do not use datetime; use datetime2. This is suggested by Microsoft:
https://msdn.microsoft.com/en-us/library/ms187819.aspx
Use the time, date, datetime2 and datetimeoffset data types for new work. These types align with the SQL Standard. They are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.
By the way, here's a tip: name your surrogate primary keys after the table, so they become more meaningful and joins feel more natural. For example:
In the Employee table, replace ID with EmployeeID
In the Product table, replace ID with ProductID
I find this a good practice.
with usersOver50with10productsOrMore (employeeID, name, age, products) as (
select pe.employeeID, e.name, e.age, count(pe.productID) from productEmployee pe
join employee e on pe.employeeID = e.id
where e.age > 50
group by pe.employeeID, e.name, e.age
having count(pe.productID) > 10
)
select sfq.name, sfq.age, pro.name, sfq.products, pe.date from usersOver50with10productsOrMore as sfq
join productEmployee pe on pe.employeeID = sfq.employeeID
join product pro on pe.productID = pro.id
where pe.date = (select max(pe2.date) from productEmployee pe2 where pe2.employeeID = sfq.employeeID)
;
There is no need to find the last productID for the entire table; just filter the last product from the results for the employees who meet the age and product-count criteria.

Multiple Count with different value each record

I have a problem with my query; please help me solve it.
My query:
SELECT D.DEPTNO,
(SELECT COUNT(DISTINCT P.PROJNO) FROM SCHEMA.PROJECT P, SCHEMA.DEPARTMENT D WHERE P.DEPTNO = D.DEPTNO) AS PROJECT,
(SELECT COUNT(DISTINCT E.EMPNO) FROM SCHEMA.EMPLOYEE E, SCHEMA.DEPARTMENT D WHERE E.WORKDEPT = D.DEPTNO) AS EMPLOYEE
FROM SCHEMA.DEPARTMENT D, SCHEMA.PROJECT P, SCEHMA.EMPLOYEE E GROUP BY D.DEPTNO#
The result shows the same counts on every row, but each row should have a different result.
I must show the total number of projects and employees for each department, so I grouped by DEPTNO, but every row shows the overall project and employee totals.
Please help me guys :)
I think you are trying to write this query:
SELECT D.DEPTNO,
(SELECT COUNT(DISTINCT P.PROJNO)
FROM SCHEMA.PROJECT P
WHERE P.DEPTNO = D.DEPTNO
) AS PROJECT,
(SELECT COUNT(DISTINCT E.EMPNO)
FROM SCHEMA.EMPLOYEE E
WHERE E.WORKDEPT = D.DEPTNO
) AS EMPLOYEE
FROM SCHEMA.DEPARTMENT D;
Notes:
You don't need JOINs in the subquery. The correlation clause is sufficient.
You don't need a GROUP BY in the outer query.
You probably don't need the COUNT(DISTINCT), but I'm not sure, so I'm leaving it in.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
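A small sketch of the correlated-subquery version, run in SQLite via Python's sqlite3 module as a stand-in for the original database. The SCHEMA. prefix is dropped and the rows are invented; the point is that each scalar subquery counts only the rows for the current department, so no GROUP BY or extra joins are needed.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (deptno TEXT PRIMARY KEY);
        CREATE TABLE project (projno TEXT PRIMARY KEY, deptno TEXT);
        CREATE TABLE employee (empno TEXT PRIMARY KEY, workdept TEXT);
        INSERT INTO department VALUES ('A00'), ('B01'), ('C01');
        INSERT INTO project VALUES ('P1', 'A00'), ('P2', 'A00'), ('P3', 'B01');
        INSERT INTO employee VALUES
            ('E1', 'A00'), ('E2', 'B01'), ('E3', 'B01'), ('E4', 'C01');
    """)
    return conn


def counts_per_department(conn):
    # Each scalar subquery is correlated to the outer row through d.deptno,
    # so it sees only that department's projects or employees.
    sql = """
        SELECT d.deptno,
               (SELECT COUNT(DISTINCT p.projno)
                FROM project p WHERE p.deptno = d.deptno) AS projects,
               (SELECT COUNT(DISTINCT e.empno)
                FROM employee e WHERE e.workdept = d.deptno) AS employees
        FROM department d
        ORDER BY d.deptno
    """
    return conn.execute(sql).fetchall()


print(counts_per_department(build_sample_db()))
# [('A00', 2, 1), ('B01', 1, 2), ('C01', 0, 1)]
```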
When you join tables, do not forget the join predicates!
If I understood your requirement correctly, this might be a possible solution:
select deptno
,count_projects
,count_employees
from (
select deptno
,count(projno) as count_projects
from project
group by deptno ) as p
inner join
(select workdept
,count(*) as Count_employees
from employee
group by workdept) as e
on p.deptno = e.workdept
The table DEPARTMENT is not necessary unless you have to retrieve specific data from this table as well.
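One behavior worth checking with this approach: because the two pre-aggregated counts are combined with an INNER JOIN, a department that has employees but no projects (or vice versa) disappears from the result. The sketch below shows that in SQLite via Python's sqlite3 module; the rows are invented, and department C01 has an employee but no projects.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project (projno TEXT PRIMARY KEY, deptno TEXT);
        CREATE TABLE employee (empno TEXT PRIMARY KEY, workdept TEXT);
        INSERT INTO project VALUES ('P1', 'A00'), ('P2', 'A00'), ('P3', 'B01');
        INSERT INTO employee VALUES
            ('E1', 'A00'), ('E2', 'B01'), ('E3', 'B01'), ('E4', 'C01');
    """)
    return conn


def counts_via_joined_aggregates(conn):
    # Each side is aggregated once, then the two per-department counts are joined.
    sql = """
        SELECT p.deptno, p.count_projects, e.count_employees
        FROM (SELECT deptno, COUNT(projno) AS count_projects
              FROM project GROUP BY deptno) AS p
        INNER JOIN (SELECT workdept, COUNT(*) AS count_employees
                    FROM employee GROUP BY workdept) AS e
          ON p.deptno = e.workdept
        ORDER BY p.deptno
    """
    return conn.execute(sql).fetchall()


print(counts_via_joined_aggregates(build_sample_db()))
# [('A00', 2, 1), ('B01', 1, 2)]  (C01 is missing: it has no projects)
```

If every department must appear, starting from the DEPARTMENT table and LEFT JOINing both aggregates is one way to keep them.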

Multiple grouped items

I can't seem to find out how to get the functionality I want. Here is an example of what my table looks like:
EmpID | ProjectID | hours_worked
3 | 1 | 8
3 | 1 | 8
4 | 2 | 8
4 | 2 | 8
4 | 3 | 8
5 | 4 | 8
I want to group by EmpID and ProjectID and then sum up the hours worked. I then want to inner join the Employee and Project table rows associated with each EmpID and ProjectID. However, when I do this I get an error about aggregate functions. From my research I understand why that error occurs, but I don't think it should apply here, since there will be only one row per EmpID and ProjectID.
Real SQL:
SELECT
WorkHours.EmpID,
WorkHours.ProjectID,
Employees.FirstName
FROM WorkHours
INNER JOIN Projects ON WorkHours.ProjectID = Projects.ProjectID
INNER JOIN Employees ON WorkHours.EmpID = Employees.EmpID
GROUP BY WorkHours.ProjectID, WorkHours.EmpID
This gives the error:
Column 'Employees.FirstName' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
You might want to use OVER (PARTITION BY) so you won't have to use GROUP BY:
SELECT DISTINCT W.EmpID
,W.ProjectID
,SUM(W.hours_worked) OVER (PARTITION BY W.EmpID, W.ProjectID) AS TotalHours
,E.FirstName
FROM WorkHours W
INNER JOIN Projects P ON W.ProjectID = P.ProjectID
INNER JOIN Employees E ON W.EmpID = E.EmpID
You can do a basic query to get the grouped hours and use that as a basis for the rest, either in a CTE or as a subquery. For example, as a subquery:
SELECT *
FROM
(SELECT EmpID, ProjectID, SUM(hours_worked) as HoursWorked
FROM WorkHours
GROUP BY EmpID, ProjectID) AS ProjectHours
JOIN Projects
ON Projects.ProjectID = ProjectHours.ProjectID
JOIN Employees
ON Employees.EmpID = ProjectHours.EmpID
One way is to use a CTE to first form the data you want, then join onto the other table(s):
WITH AggregatedHoursWorked
AS
(
SELECT EmpID,
ProjectID,
SUM(hours_worked) AS TotalHours
FROM WorkHours
GROUP BY EmpID, ProjectID
)
SELECT e.FirstName,
p.ProjectName,
hw.TotalHours
FROM AggregatedHoursWorked hw
INNER JOIN Employees e
ON hw.EmpID = e.EmpID
INNER JOIN Projects p
ON hw.ProjectID = p.ProjectID
If you use an aggregate function, every selected column must appear either inside an aggregate function or in the GROUP BY clause. If you want to join the descriptions (normally unique for a given ID), you have to include the description columns in the GROUP BY clause. Because each description is unique per ID, this will not affect the result of the query.
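A minimal sketch of "aggregate first, then join the descriptions", run in SQLite via Python's sqlite3 module. The WorkHours rows come from the question; the employee first names and the Projects.ProjectName values are invented. Note that SQLite is more permissive than SQL Server about non-grouped columns, so it would not reproduce the original error; this only shows the shape of the working query.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, FirstName TEXT);
        CREATE TABLE Projects (ProjectID INTEGER PRIMARY KEY, ProjectName TEXT);
        CREATE TABLE WorkHours (EmpID INTEGER, ProjectID INTEGER, hours_worked INTEGER);
        INSERT INTO Employees VALUES (3, 'Ann'), (4, 'Ben'), (5, 'Cy');
        INSERT INTO Projects VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma'), (4, 'Delta');
        INSERT INTO WorkHours VALUES
            (3, 1, 8), (3, 1, 8), (4, 2, 8), (4, 2, 8), (4, 3, 8), (5, 4, 8);
    """)
    return conn


def hours_per_employee_project(conn):
    # The GROUP BY touches only WorkHours; names are joined on afterwards,
    # so no description column ever needs to be in the GROUP BY.
    sql = """
        SELECT hw.EmpID, hw.ProjectID, e.FirstName, p.ProjectName, hw.TotalHours
        FROM (SELECT EmpID, ProjectID, SUM(hours_worked) AS TotalHours
              FROM WorkHours
              GROUP BY EmpID, ProjectID) AS hw
        JOIN Employees e ON e.EmpID = hw.EmpID
        JOIN Projects p ON p.ProjectID = hw.ProjectID
        ORDER BY hw.EmpID, hw.ProjectID
    """
    return conn.execute(sql).fetchall()


for row in hours_per_employee_project(build_sample_db()):
    print(row)
```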

Select specific columns from two tables

Suppose I have two tables, tblEmployee and tblEmpSalary. I need to write a SQL statement that lists the employees (name and salary) who receive the highest salary in each department.
You could use ranking functions in this case:
WITH ranked AS (
SELECT
e.*,
s.monSalary,
rnk = RANK() OVER (PARTITION BY e.strDepartment ORDER BY s.monSalary DESC)
FROM tblEmployee e
INNER JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
)
SELECT
intEmployeeID,
strEmpName,
strDepartment,
monSalary
FROM ranked
WHERE rnk = 1
The RANK() function will do if you only need those who have the topmost salary. With RANK(), the query may return more than one employee per department if they have the same salary.
Alternatively, you can use DENSE_RANK() instead of RANK() with the same effect, but DENSE_RANK() would also allow you to get employees with the top n salaries. You would specify that in the WHERE condition like this:
WHERE rnk <= n
If, however, you need exactly one employee per department, even if there are several of them matching the requirement, use ROW_NUMBER() instead of RANK(). But then you'll probably need to add another criterion to the ORDER BY clause of the ranking function, e.g. like this:
... ORDER BY s.monSalary DESC, e.strEmpName ASC)
In fact, ROW_NUMBER() would simply make your query employee-oriented rather than salary-oriented. With ROW_NUMBER(), you would be able to have your query return the top n most-paid employees per department, using the same condition as with DENSE_RANK():
WHERE rnk <= n
You can read more about ranking functions in SQL Server on MSDN:
Ranking Functions (Transact-SQL)
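To see how RANK(), DENSE_RANK() and ROW_NUMBER() differ on ties, here is a small sketch in SQLite (3.25 or later, for window functions) via Python's sqlite3 module, standing in for SQL Server. The rows are invented: Ann and Ben share the top IT salary. Only ROW_NUMBER() gets the name tie-breaker, because adding it to RANK() or DENSE_RANK() would split equal salaries into different ranks.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblEmployee (intEmployeeID INTEGER PRIMARY KEY,
                                  strEmpName TEXT, strDepartment TEXT);
        CREATE TABLE tblEmpSalary (intEmployeeID INTEGER, monSalary INTEGER);
        INSERT INTO tblEmployee VALUES
            (1, 'Ann', 'IT'), (2, 'Ben', 'IT'), (3, 'Cy', 'IT'), (4, 'Dan', 'IT'),
            (5, 'Eve', 'HR');
        INSERT INTO tblEmpSalary VALUES (1, 100), (2, 100), (3, 90), (4, 80), (5, 70);
    """)
    return conn


def top_earners(conn, ranking_fn, n=1):
    # Whitelist the function name, since it is spliced into the SQL text.
    if ranking_fn not in ("RANK", "DENSE_RANK", "ROW_NUMBER"):
        raise ValueError("unsupported ranking function: " + ranking_fn)
    order_by = "s.monSalary DESC"
    if ranking_fn == "ROW_NUMBER":
        order_by += ", e.strEmpName ASC"  # tie-breaker: exactly one row per rank
    sql = f"""
        WITH ranked AS (
            SELECT e.strDepartment, e.strEmpName, s.monSalary,
                   {ranking_fn}() OVER (PARTITION BY e.strDepartment
                                        ORDER BY {order_by}) AS rnk
            FROM tblEmployee e
            JOIN tblEmpSalary s ON s.intEmployeeID = e.intEmployeeID
        )
        SELECT strDepartment, strEmpName, monSalary
        FROM ranked
        WHERE rnk <= ?
        ORDER BY strDepartment, monSalary DESC, strEmpName
    """
    return conn.execute(sql, (n,)).fetchall()


conn = build_sample_db()
print(top_earners(conn, "RANK"))           # ties kept: Ann and Ben in IT
print(top_earners(conn, "DENSE_RANK", 2))  # top two distinct IT salaries: Ann, Ben, Cy
print(top_earners(conn, "ROW_NUMBER"))     # exactly one per department
```

With RANK() and n = 2, Cy would still be excluded, because the tie at the top makes his rank 3 rather than 2.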
SELECT e.strEmpName, s.monSalary
FROM tblEmployee e
JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
WHERE e.strDepartment + '-' + CAST(s.monSalary AS varchar(20)) IN (
SELECT e2.strDepartment + '-' + CAST(MAX(s2.monSalary) AS varchar(20))
FROM tblEmployee e2
JOIN tblEmpSalary s2 ON e2.intEmployeeID = s2.intEmployeeID
GROUP BY e2.strDepartment)
Disclaimer: I can't test this query right now, so it could have some small detail wrong.
SELECT a.d, a.m, b.strEmpName
FROM (
SELECT strDepartment d, MAX(monSalary) m
FROM (
SELECT e.strDepartment, s.monSalary
FROM tblEmployee e
LEFT JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
) x
GROUP BY strDepartment
) a
LEFT JOIN (
SELECT e.strEmpName, e.strDepartment, s.monSalary
FROM tblEmployee e
LEFT JOIN tblEmpSalary s ON e.intEmployeeID = s.intEmployeeID
) b ON a.d = b.strDepartment AND a.m = b.monSalary
SELECT tblEmployee.strEmpName, max_salaries.strDepartment, max_salaries.salary
FROM (SELECT tblEmployee.strDepartment, MAX(monSalary) AS salary
FROM tblEmployee INNER JOIN tblEmpSalary
ON tblEmployee.intEmployeeID = tblEmpSalary.intEmployeeID
GROUP BY tblEmployee.strDepartment) max_salaries
INNER JOIN tblEmployee ON tblEmployee.strDepartment = max_salaries.strDepartment
INNER JOIN tblEmpSalary ON tblEmpSalary.monSalary = max_salaries.salary
AND tblEmpSalary.intEmployeeID = tblEmployee.intEmployeeID
If two or more employees share the maximum salary, this returns all of them for that department.
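A quick sketch of the "join back to the per-department maximum" query, run in SQLite via Python's sqlite3 module as a stand-in for SQL Server, showing that ties are returned. The rows are invented: Ann and Ben share the top IT salary.

```python
import sqlite3


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblEmployee (intEmployeeID INTEGER PRIMARY KEY,
                                  strEmpName TEXT, strDepartment TEXT);
        CREATE TABLE tblEmpSalary (intEmployeeID INTEGER, monSalary INTEGER);
        INSERT INTO tblEmployee VALUES
            (1, 'Ann', 'IT'), (2, 'Ben', 'IT'), (3, 'Cy', 'IT'), (5, 'Eve', 'HR');
        INSERT INTO tblEmpSalary VALUES (1, 100), (2, 100), (3, 90), (5, 70);
    """)
    return conn


def top_paid_with_ties(conn):
    sql = """
        SELECT e.strEmpName, m.strDepartment, m.salary
        -- Step 1: the maximum salary per department.
        FROM (SELECT e2.strDepartment, MAX(s2.monSalary) AS salary
              FROM tblEmployee e2
              JOIN tblEmpSalary s2 ON s2.intEmployeeID = e2.intEmployeeID
              GROUP BY e2.strDepartment) AS m
        -- Step 2: every employee in that department earning exactly that amount.
        JOIN tblEmployee e ON e.strDepartment = m.strDepartment
        JOIN tblEmpSalary s ON s.intEmployeeID = e.intEmployeeID
                           AND s.monSalary = m.salary
        ORDER BY m.strDepartment, e.strEmpName
    """
    return conn.execute(sql).fetchall()


print(top_paid_with_ties(build_sample_db()))
# [('Eve', 'HR', 70), ('Ann', 'IT', 100), ('Ben', 'IT', 100)]
```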