SUM(SALARY) when ID is distinct - sql

I am having trouble solving this problem: I would like to add up a salary only if the employee's ID is distinct. I thought I could do this with the DECODE() function, but I am having trouble defining a suitable expression. I was aiming for something like
SUM(DECODE(S.ID,IS DISTINCT,S.SALARY))
But this isn't going to work!
So the full query looks like
SELECT B.ID, SUM(S.SALARY), COUNT(DISTINCT S.ID), COUNT(DISTINCT RM.MEMBER_ID)
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
GROUP BY B.ID;
But the problem is that SUM(S.SALARY) adds up salaries from duplicate IDs.

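I don't know about DECODE, but this should work. Note that it only sums the salaries of IDs that appear exactly once; duplicated IDs are excluded entirely: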
SELECT
SUM(S.SALARY)
FROM <table> S
WHERE NOT EXISTS (
SELECT ID FROM <table> WHERE ID=S.ID GROUP BY ID HAVING COUNT(*)>1
)

Perhaps something like this...
SELECT E.ID, SUM(E.Salary)
FROM Employers E
WHERE E.ID IN (SELECT DISTINCT E2.ID FROM Employers E2)
GROUP BY E.ID
If not, perhaps you could post some sample data so that I can understand better

The joins are introducing duplicate rows. One way to fix this is to add a row number per staff ID, so that only the first row for each ID contributes its salary. The real fix would be to restructure the joins so the duplication doesn't happen, but here is the first way:
SELECT t.BRANCH_ID, SUM(CASE WHEN t.seqnum = 1 THEN t.SALARY END),
COUNT(DISTINCT t.STAFF_ID), COUNT(DISTINCT t.MEMBER_ID)
FROM (SELECT B.ID AS BRANCH_ID, S.ID AS STAFF_ID, S.SALARY, RM.MEMBER_ID,
ROW_NUMBER() OVER (PARTITION BY S.ID ORDER BY S.ID) as seqnum
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
) t
GROUP BY t.BRANCH_ID
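The answer points out that the real fix is in the joins: each STAFF row is repeated once per RECRUIT_MEMBER row of the same branch. A minimal sketch of that fix, assuming Python's built-in sqlite3 module and made-up sample rows (the column types and values are hypothetical), aggregates each child table per branch before joining, so each salary is counted once. The original query is included for comparison.

```python
import sqlite3

# Hypothetical sample data: branch 1 has 2 staff and 3 recruited members,
# so joining the raw tables repeats each branch-1 salary 3 times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BRANCH (ID INTEGER PRIMARY KEY);
    CREATE TABLE STAFF (ID INTEGER PRIMARY KEY, BRANCH_ID INTEGER, SALARY INTEGER);
    CREATE TABLE RECRUIT_MEMBER (BRANCH_ID INTEGER, MEMBER_ID INTEGER);
    INSERT INTO BRANCH VALUES (1), (2);
    INSERT INTO STAFF VALUES (10, 1, 100), (11, 1, 200), (20, 2, 50);
    INSERT INTO RECRUIT_MEMBER VALUES (1, 1000), (1, 1001), (1, 1002), (2, 2000);
""")

# Aggregate STAFF and RECRUIT_MEMBER per branch first, then join the
# one-row-per-branch results, so no salary is counted more than once.
PRE_AGGREGATED = """
    SELECT B.ID, S.TOTAL_SALARY, S.STAFF_COUNT, RM.MEMBER_COUNT
    FROM BRANCH B
    INNER JOIN (SELECT BRANCH_ID, SUM(SALARY) AS TOTAL_SALARY,
                       COUNT(DISTINCT ID) AS STAFF_COUNT
                FROM STAFF GROUP BY BRANCH_ID) S ON S.BRANCH_ID = B.ID
    INNER JOIN (SELECT BRANCH_ID, COUNT(DISTINCT MEMBER_ID) AS MEMBER_COUNT
                FROM RECRUIT_MEMBER GROUP BY BRANCH_ID) RM ON RM.BRANCH_ID = B.ID
    ORDER BY B.ID
"""

# The query from the question: salaries are multiplied by member rows.
NAIVE = """
    SELECT B.ID, SUM(S.SALARY)
    FROM BRANCH B
    INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
    INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
    GROUP BY B.ID
    ORDER BY B.ID
"""

print(conn.execute(PRE_AGGREGATED).fetchall())  # [(1, 300, 2, 3), (2, 50, 1, 1)]
print(conn.execute(NAIVE).fetchall())           # [(1, 900), (2, 50)]
```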

You can create a derived table (a subquery in the FROM clause) with only one salary per ID like this...
SELECT
...whatever fields you've already got...
s.Salary
FROM
...whatever tables and joins you've already got...
LEFT JOIN (SELECT ID, MAX(SALARY) as "Salary" FROM SALARY_TABLE GROUP BY ID) s
ON whatevertable.ID = s.ID
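A small sketch of this derived-table join, run with Python's sqlite3 module on a hypothetical EMPLOYEE table (that table, its NAME column and all the rows are made up; SALARY_TABLE, ID and SALARY come from the answer). It assumes duplicate salary rows for an ID carry the same amount, which is what makes MAX() a safe way to pick one.

```python
import sqlite3

# Hypothetical tables: EMPLOYEE holds one row per person, while
# SALARY_TABLE contains a duplicate row for ID 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SALARY_TABLE (ID INTEGER, SALARY INTEGER);
    INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO SALARY_TABLE VALUES (1, 500), (1, 500), (2, 300);
""")

# The derived table s collapses SALARY_TABLE to one row per ID, so the
# LEFT JOIN cannot duplicate EMPLOYEE rows; Cid (no salary row) gets NULL.
ONE_SALARY_PER_ID = """
    SELECT e.ID, e.NAME, s.Salary
    FROM EMPLOYEE e
    LEFT JOIN (SELECT ID, MAX(SALARY) AS Salary
               FROM SALARY_TABLE GROUP BY ID) s
      ON e.ID = s.ID
    ORDER BY e.ID
"""

rows = conn.execute(ONE_SALARY_PER_ID).fetchall()
print(rows)  # [(1, 'Ann', 500), (2, 'Bob', 300), (3, 'Cid', None)]
print(sum(r[2] for r in rows if r[2] is not None))  # 800
```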


Is there a way to COUNT data without adding a column to the resulting query, while still filtering by it?

With help from an earlier post, I was able to get a query that works correctly:
select *
from (
select e.employeename, s.skilldescription,
count(*) over(partition by e.employeeid) as count
from employeeskills_t k
inner join employee_t e on k.employeeid = e.employeeid
inner join skill_t s on k.skillid=s.skillid
) t
where count> 1
order by employeename, skilldescription
But it generates an extra column, count, which I don't need.
I don't understand SQL that well, so I don't know whether such a command exists.
By default, select * gives you all columns available in the from clause.
Here, the from clause is itself a subquery, which returns 3 columns; the third column is the count, which is needed in the outer query for filtering (that's the final where clause).
Since we cannot avoid returning that column from the subquery, we can instead ignore it in the output. This means we need to enumerate the columns we want, rather than blindly use *.
Columns in the outer query have the same names as those returned by the subquery. Here only two columns are needed, so that's quite short to write:
Just change that select * from (...) t where ...
... to: select employeename, skilldescription from (...) t where ...
Side note: some databases support a specific syntax to select all columns but a few named columns. BigQuery has SELECT * EXCEPT - but that's not a widely available feature in other RDBMS, unfortunately.
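A minimal sketch of the suggested change, using Python's sqlite3 module (window functions need SQLite 3.25 or later) and hypothetical sample rows. The alias is renamed to cnt here only to avoid reusing the name of the COUNT function.

```python
import sqlite3

# Hypothetical sample data: Ann has two skills, Bob has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_t (employeeid INTEGER PRIMARY KEY, employeename TEXT);
    CREATE TABLE skill_t (skillid INTEGER PRIMARY KEY, skilldescription TEXT);
    CREATE TABLE employeeskills_t (employeeid INTEGER, skillid INTEGER);
    INSERT INTO employee_t VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO skill_t VALUES (1, 'SQL'), (2, 'Python');
    INSERT INTO employeeskills_t VALUES (1, 1), (1, 2), (2, 1);
""")

# The subquery still computes the window count, because the outer WHERE
# needs it; the outer SELECT simply does not list it.
QUERY = """
    SELECT employeename, skilldescription
    FROM (
        SELECT e.employeename, s.skilldescription,
               COUNT(*) OVER (PARTITION BY e.employeeid) AS cnt
        FROM employeeskills_t k
        INNER JOIN employee_t e ON k.employeeid = e.employeeid
        INNER JOIN skill_t s ON k.skillid = s.skillid
    ) t
    WHERE cnt > 1
    ORDER BY employeename, skilldescription
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Ann', 'Python'), ('Ann', 'SQL')]
```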
Since you have used a window function, the count is computed for every row, so the column remains in the result.
You have two options: either use GROUP BY and filter on the count with a HAVING clause, or keep the window function and select only the columns you need:
with main as (
select e.employeename, s.skilldescription,
count(*) over(partition by e.employeeid) as count
from employeeskills_t k
inner join employee_t e on k.employeeid = e.employeeid
inner join skill_t s on k.skillid=s.skillid
)
select
employeename,
skilldescription
from main
where count > 1
order by employeename, skilldescription
Version 2:
select e.employeename
from employeeskills_t k
inner join employee_t e on k.employeeid = e.employeeid
inner join skill_t s on k.skillid=s.skillid
group by 1
having count(e.employeename) > 1
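Version 2 can be checked the same way. This sketch, again with Python's sqlite3 module and hypothetical rows, groups by employeeid as well as employeename (an assumption, so that two employees who share a name are not merged). It returns one row per employee rather than one per skill.

```python
import sqlite3

# Hypothetical sample data: Ann has two skills, Bob has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_t (employeeid INTEGER PRIMARY KEY, employeename TEXT);
    CREATE TABLE skill_t (skillid INTEGER PRIMARY KEY, skilldescription TEXT);
    CREATE TABLE employeeskills_t (employeeid INTEGER, skillid INTEGER);
    INSERT INTO employee_t VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO skill_t VALUES (1, 'SQL'), (2, 'Python');
    INSERT INTO employeeskills_t VALUES (1, 1), (1, 2), (2, 1);
""")

# GROUP BY collapses each employee to one row, and HAVING filters on the
# aggregate, so no count column has to be selected at all.
QUERY = """
    SELECT e.employeename
    FROM employeeskills_t k
    INNER JOIN employee_t e ON k.employeeid = e.employeeid
    INNER JOIN skill_t s ON k.skillid = s.skillid
    GROUP BY e.employeeid, e.employeename
    HAVING COUNT(*) > 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('Ann',)]
```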
If you are looking for distinct (employeename, skilldescription) pairs that occur for more than one employee ID, the same can be achieved using GROUP BY and HAVING.
Please check if the below helps.
select e.employeename, s.skilldescription
from employeeskills_t k
inner join employee_t e on k.employeeid = e.employeeid
inner join skill_t s on k.skillid=s.skillid
group by e.employeename, s.skilldescription
having COUNT(e.employeeid) > 1
order by employeename, skilldescription;

project to which maximum number of employees have been allocated

I have these tables with the following columns:
Employee24 (EMPLOYEEID, FIRSTNAME, LASTNAME, GENDER);
PROJECT24 (PROJECTID, PROJECTNAME, EMPLOYEEID);
I want to write a query to find the project to which the maximum number of employees are allocated.
SELECT FIRSTNAME, LASTNAME
FROM EMPLOYEE24 E
WHERE E.EMPLOYEEID IN ( SELECT L2.EMPLOYEEID
FROM PROJECT24 L2 group by l2.employeeid)
What do you want to do if there are ties? This is an important question and why row_number()/rank() might be a better choice:
select p.*
from (select p.projectid, p.projectname, count(*) as num_employees,
rank() over (order by count(*) desc) as seqnum
from project24 p
group by p.projectid, p.projectname
) p
where seqnum = 1;
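A sketch of the tie behaviour described in the notes, using Python's sqlite3 module (SQLite 3.25+) and hypothetical PROJECT24 rows in which two projects tie. The per-project counts are computed in an inner subquery before ranking; that is a small restructuring of the query above, not a different technique.

```python
import sqlite3

# Hypothetical rows: Alpha and Beta tie with two employees each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project24 (projectid INTEGER, projectname TEXT, employeeid INTEGER);
    INSERT INTO project24 VALUES
        (1, 'Alpha', 101), (1, 'Alpha', 102),
        (2, 'Beta', 103), (2, 'Beta', 104),
        (3, 'Gamma', 105);
""")

def top_projects(ranking_function):
    """Return the project(s) ranked first by employee count."""
    # Only these two function names are ever interpolated into the SQL.
    assert ranking_function in ("RANK", "ROW_NUMBER")
    sql = f"""
        SELECT projectid, projectname, num_employees
        FROM (SELECT projectid, projectname, num_employees,
                     {ranking_function}() OVER (ORDER BY num_employees DESC) AS seqnum
              FROM (SELECT projectid, projectname, COUNT(*) AS num_employees
                    FROM project24
                    GROUP BY projectid, projectname) counts) p
        WHERE seqnum = 1
        ORDER BY projectid
    """
    return conn.execute(sql).fetchall()

print(top_projects("RANK"))             # [(1, 'Alpha', 2), (2, 'Beta', 2)]
print(len(top_projects("ROW_NUMBER")))  # 1 (an arbitrary one of the tied projects)
```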
Notes:
The above query returns all of the tied projects if there is a tie. If you want only one (arbitrary) project when there is a tie, use row_number() instead.
I see no reason to join to employee24.
Your data structure is strange. The relationship between projects and employees should be in a separate table, say project_employees. That should have projectid, but not the name. The name should be in project24.
You might try something like this (though I'm quite sure it can be done in other ways):
SELECT *
FROM (SELECT prj.projectid,
prj.projectname,
COUNT(*) AS number_employees
FROM project24 prj
JOIN employee24 emp
ON prj.employeeid = emp.employeeid
GROUP BY prj.projectid,
prj.projectname
ORDER BY number_employees DESC)
WHERE ROWNUM = 1;

Select a column that is not in the GROUP BY clause [duplicate]

This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 8 years ago.
I need advice on how to write an SQL query that returns the following.
I have 2 tables: CUSTOMERS and department.
SELECT a.id, a.first_name, a.last_name, MIN (b.income), b.department
/* --b.department can not be in a GROUP BY clause,
--but I need to know which department has the
--smallest income, i.e. which department is responsible for MIN (b.income) */
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name;
How can I do it?
You can use the PostgreSQL-specific feature distinct on to do this:
SELECT distinct on (a.id, a.first_name, a.last_name)
a.id, a.first_name, a.last_name, b.income, b.department
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
ORDER BY a.id, a.first_name, a.last_name, b.income;
This means you get one row for each set of distinct values in the distinct on (...), and the row you get from each set is the first one in that group, as determined by the order by.
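SQLite, like most databases other than PostgreSQL, has no DISTINCT ON. A commonly used substitute (a swapped-in technique, not the one in this answer) is ROW_NUMBER() partitioned by the group and ordered by income, keeping row 1. This sketch uses Python's sqlite3 module (3.25+) and hypothetical rows; it partitions by id alone, assuming first_name and last_name are determined by id.

```python
import sqlite3

# Hypothetical rows: each customer has incomes from two departments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE department (id INTEGER, income INTEGER, department TEXT);
    INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO department VALUES
        (1, 500, 'Sales'), (1, 300, 'Support'),
        (2, 700, 'IT'), (2, 900, 'HR');
""")

# Number each customer's rows by ascending income and keep row 1, which is
# what DISTINCT ON (...) ... ORDER BY ..., b.income selects in PostgreSQL.
FIRST_ROW_PER_CUSTOMER = """
    SELECT id, first_name, last_name, income, department
    FROM (SELECT a.id, a.first_name, a.last_name, b.income, b.department,
                 ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.income) AS rn
          FROM CUSTOMERS a
          INNER JOIN department b ON a.id = b.id) t
    WHERE rn = 1
    ORDER BY id
"""

rows = conn.execute(FIRST_ROW_PER_CUSTOMER).fetchall()
print(rows)  # [(1, 'Ann', 'Lee', 300, 'Support'), (2, 'Bob', 'Ray', 700, 'IT')]
```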
In T-SQL (and in Oracle and most other RDBMSs) you can use the OVER clause (windowing):
SELECT a.id, a.first_name, a.last_name,
-- Here is the trick
MIN (b.income) OVER (PARTITION BY a.id, a.first_name, a.last_name) AS min_income,
-- End of trick
b.department
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
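The OVER query attaches each customer's minimum income to every joined row, but it does not yet pick out the department. A sketch of the extra filtering step, using Python's sqlite3 module (3.25+) and hypothetical rows: wrap the query and keep the rows where income equals min_income. Unlike ROW_NUMBER(), this keeps every department that ties for the minimum.

```python
import sqlite3

# Hypothetical rows: Bob's two departments tie on the lowest income.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CUSTOMERS (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE department (id INTEGER, income INTEGER, department TEXT);
    INSERT INTO CUSTOMERS VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO department VALUES
        (1, 500, 'Sales'), (1, 300, 'Support'),
        (2, 700, 'IT'), (2, 700, 'HR');
""")

# Step 1: attach each customer's minimum income to every one of their rows.
WITH_MIN = """
    SELECT a.id, a.first_name, a.last_name, b.income, b.department,
           MIN(b.income) OVER (PARTITION BY a.id, a.first_name, a.last_name) AS min_income
    FROM CUSTOMERS a
    INNER JOIN department b ON a.id = b.id
"""

# Step 2: keep only the rows whose income equals that minimum.
ONLY_MIN = f"""
    SELECT id, first_name, last_name, min_income, department
    FROM ({WITH_MIN}) t
    WHERE income = min_income
    ORDER BY id, department
"""

print(len(conn.execute(WITH_MIN).fetchall()))  # 4: one row per department
print(conn.execute(ONLY_MIN).fetchall())
# [(1, 'Ann', 'Lee', 300, 'Support'), (2, 'Bob', 'Ray', 700, 'HR'), (2, 'Bob', 'Ray', 700, 'IT')]
```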
This looks like T-SQL so I'll give the answer for that.
SELECT a.id,
a.first_name,
a.last_name,
MIN(b.income),
(SELECT TOP 1 c.departmentname --Or whatever the name of your department name is
FROM department c
WHERE c.income = MIN(b.income)) AS [DepartmentName]
FROM CUSTOMERS a
INNER JOIN department b ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name;
You need a nested query to find which department has that income. You might also have to add more WHERE constraints to the nested query, assuming multiple departments can have the same income. Those depend on your database schema, so I'll leave you to work out that logic and make sure you're matching the right department.
Edit:
Although reading this more, it looks like you could just rephrase it all:
SELECT a.id,
a.first_name,
a.last_name,
(SELECT TOP 1 departmentname --Or whatever your department name column is called
FROM department
WHERE department.id = a.id
ORDER BY income ASC) AS [DepartmentName] --ASC: the department with the smallest income
FROM customers a
You wouldn't get the income with that, but you can add in the code to get that too.
Something like
Select cust.*, b.department from
(SELECT a.id, a.first_name, a.last_name, MIN (b.income) min_income
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name
) cust
INNER JOIN department b
ON cust.id = b.id
AND b.income = cust.min_income
If your db supports this syntax.

Can I use more than one column in subquery?

I want to show the names of all employees from the EMPLOYEES table who are working on more than three projects from the PROJECTS table.
PROJECTS.PersonID is a foreign key referencing EMPLOYEES.ID:
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID, COUNT(*)
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
Can I have both PersonID, COUNT(*) in that subquery, or there must be only one column?
Not in an IN clause (or at least not the way you are trying to use it). Some RDBMSs allow tuples with more than one column in an IN clause, but that wouldn't help your case here.
You just need to remove the COUNT(*) from the SELECT list to achieve your desired result.
SELECT NAME, ID
FROM EMPLOYEES
WHERE ID IN
(
SELECT PersonID
FROM PROJECTS
GROUP BY PersonID
HAVING COUNT(*) > 3
)
If you also wanted to return the count, though, you could join onto a derived table or common table expression with more than one column.
SELECT E.NAME,
E.ID,
P.Cnt
FROM EMPLOYEES E
JOIN (SELECT PersonID,
Count(*) AS Cnt
FROM PROJECTS
GROUP BY PersonID
HAVING Count(*) > 3) P
ON E.ID = P.PersonID
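A sketch of both queries against hypothetical EMPLOYEES and PROJECTS rows (the ProjectID column and all values are made up), using Python's sqlite3 module. It also shows SQLite rejecting the original two-column IN subquery; the exact error text varies by database, so it is only printed.

```python
import sqlite3

# Hypothetical data: Ann (ID 1) works on four projects, Bob (ID 2) on two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE PROJECTS (ProjectID INTEGER PRIMARY KEY,
                           PersonID INTEGER REFERENCES EMPLOYEES (ID));
    INSERT INTO EMPLOYEES VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PROJECTS (PersonID) VALUES (1), (1), (1), (1), (2), (2);
""")

# IN (...) compares ID against a single column, so the subquery returns only PersonID.
IN_QUERY = """
    SELECT NAME, ID FROM EMPLOYEES
    WHERE ID IN (SELECT PersonID FROM PROJECTS
                 GROUP BY PersonID HAVING COUNT(*) > 3)
"""

# A derived table may return several columns, so the count can be reported too.
JOIN_QUERY = """
    SELECT E.NAME, E.ID, P.Cnt
    FROM EMPLOYEES E
    JOIN (SELECT PersonID, COUNT(*) AS Cnt FROM PROJECTS
          GROUP BY PersonID HAVING COUNT(*) > 3) P
      ON E.ID = P.PersonID
"""

# The question's version: a scalar ID compared against a two-column subquery.
TWO_COLUMN_IN = """
    SELECT NAME, ID FROM EMPLOYEES
    WHERE ID IN (SELECT PersonID, COUNT(*) FROM PROJECTS
                 GROUP BY PersonID HAVING COUNT(*) > 3)
"""

print(conn.execute(IN_QUERY).fetchall())    # [('Ann', 1)]
print(conn.execute(JOIN_QUERY).fetchall())  # [('Ann', 1, 4)]

two_column_failed = False
try:
    conn.execute(TWO_COLUMN_IN)
except sqlite3.OperationalError as exc:
    two_column_failed = True
    print("SQLite rejects the two-column IN:", exc)
```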
To answer your question, you can only have one column in the IN subquery. You could get your results using the query below:
SELECT e.ID
,e.Name
FROM dbo.Projects p
LEFT OUTER JOIN dbo.Employees e
ON p.PersonID = e.ID
GROUP BY e.ID
,e.Name
HAVING COUNT(*) > 3

Using an inner select construction for a select value

I would like to hear if anyone can tell me a simple syntax that accomplishes the same as the following (with the same flexibility):
SELECT C.CompanyName,
(SELECT Count(*) FROM Employees WHERE CompanyId = C.Id) as EmployeeCount
FROM Company C
Now, what's important is that the inner SELECT giving the EmployeeCount:
1. Is an independent SELECT statement. This means that it should work with any existing SELECT, even if it already contains joins etc.
2. Can use values from the parent SELECT.
I know that this scenario can be easily accomplished in other ways, but the above is a simplified example to explain the challenge. My real scenario is a complex SELECT statement where I do not want to complicate it by adding more joins. Performance is no issue.
Using INNER JOIN:
SELECT C.CompanyName, Count(*) as EmployeeCount
FROM Company C
INNER JOIN Employees E on E.CompanyId = C.Id
GROUP BY C.Id, C.CompanyName
Using an implicit (comma-separated) join:
SELECT C.CompanyName, Count(*) as EmployeeCount
FROM Company C, Employees E
WHERE E.CompanyId = C.Id
GROUP BY C.Id, C.CompanyName
If you want to keep the same syntax, at least write it like this:
SELECT C.CompanyName,
(SELECT Count(1) FROM Employees WHERE CompanyId = C.Id) as EmployeeCount
FROM Company C
If you need every company to be shown, including companies without any employees, you can use a LEFT OUTER JOIN and count a column from Employees, so that companies without a match get 0:
SELECT C.CompanyName, Count(E.CompanyId) as EmployeeCount
FROM Company C
LEFT OUTER JOIN Employees E on E.CompanyId = C.Id
GROUP BY C.Id, C.CompanyName
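A sketch comparing the correlated subquery from the question with the LEFT OUTER JOIN form, using Python's sqlite3 module and hypothetical rows in which one company has no employees. It also shows why COUNT(*) is a pitfall with a left join: the NULL-extended row for an empty company still counts as one row.

```python
import sqlite3

# Hypothetical data: Initech has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (Id INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Employees (Id INTEGER PRIMARY KEY, CompanyId INTEGER);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO Employees (CompanyId) VALUES (1), (1), (2);
""")

# Correlated scalar subquery from the question: evaluated per company row.
CORRELATED = """
    SELECT C.CompanyName,
           (SELECT COUNT(*) FROM Employees WHERE CompanyId = C.Id) AS EmployeeCount
    FROM Company C
    ORDER BY C.Id
"""

# LEFT JOIN + GROUP BY: COUNT(E.CompanyId) skips the NULL produced for a
# company with no matching employee, so Initech gets 0.
LEFT_JOIN = """
    SELECT C.CompanyName, COUNT(E.CompanyId) AS EmployeeCount
    FROM Company C
    LEFT OUTER JOIN Employees E ON E.CompanyId = C.Id
    GROUP BY C.Id, C.CompanyName
    ORDER BY C.Id
"""

# Pitfall: COUNT(*) counts the NULL-extended row, so Initech would get 1.
LEFT_JOIN_COUNT_STAR = LEFT_JOIN.replace("COUNT(E.CompanyId)", "COUNT(*)")

print(conn.execute(CORRELATED).fetchall())            # [('Acme', 2), ('Globex', 1), ('Initech', 0)]
print(conn.execute(LEFT_JOIN).fetchall())             # same as above
print(conn.execute(LEFT_JOIN_COUNT_STAR).fetchall())  # [('Acme', 2), ('Globex', 1), ('Initech', 1)]
```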
Try using a derived table, which satisfies both of your conditions:
1. An independent SELECT statement.
   a. Using a derived table allows you to keep your independent SELECT statement.
2. Can use values from the parent SELECT.
   a. The derived table cannot reference the parent SELECT's columns directly (that needs LATERAL or CROSS APPLY, where supported), but the join condition ties each result back to the parent row.
SELECT
C.CompanyName,
EC.EmployeeCount
FROM Company C
INNER JOIN (SELECT
CompanyId,
Count(*) AS EmployeeCount
FROM Employees
GROUP BY CompanyId) EC
ON EC.CompanyId = C.Id
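A sketch of the derived-table approach with the GROUP BY that the derived table needs, using Python's sqlite3 module and hypothetical rows. The LEFT JOIN variant with COALESCE is an addition (not part of the answer) for when companies without employees should appear with 0.

```python
import sqlite3

# Hypothetical data: Initech has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (Id INTEGER PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Employees (Id INTEGER PRIMARY KEY, CompanyId INTEGER);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO Employees (CompanyId) VALUES (1), (1), (2);
""")

# The derived table is a self-contained SELECT producing one
# (CompanyId, EmployeeCount) row per company that has employees.
EMPLOYEE_COUNTS = """
    SELECT CompanyId, COUNT(*) AS EmployeeCount
    FROM Employees
    GROUP BY CompanyId
"""

INNER_JOIN_QUERY = f"""
    SELECT C.CompanyName, EC.EmployeeCount
    FROM Company C
    INNER JOIN ({EMPLOYEE_COUNTS}) EC ON EC.CompanyId = C.Id
    ORDER BY C.Id
"""

# LEFT JOIN keeps companies with no employees; COALESCE turns their NULL into 0.
LEFT_JOIN_QUERY = f"""
    SELECT C.CompanyName, COALESCE(EC.EmployeeCount, 0) AS EmployeeCount
    FROM Company C
    LEFT JOIN ({EMPLOYEE_COUNTS}) EC ON EC.CompanyId = C.Id
    ORDER BY C.Id
"""

print(conn.execute(INNER_JOIN_QUERY).fetchall())  # [('Acme', 2), ('Globex', 1)]
print(conn.execute(LEFT_JOIN_QUERY).fetchall())   # [('Acme', 2), ('Globex', 1), ('Initech', 0)]
```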
If your inner select is complicated, then why not make a view of it:
CREATE VIEW EmpSelect AS
SELECT CompanyId, whatever FROM Employees;
Then
SELECT
C.CompanyName, Count(E.CompanyId) AS EmpCount
FROM
Company C
LEFT JOIN EmpSelect E
ON C.Id = E.CompanyId
GROUP BY
C.Id, C.CompanyName;