Select a column that is not in the GROUP BY clause [duplicate] - sql

I need advice on how to write an SQL query that returns the following.
I have two tables: customer and department.
SELECT a.id, a.first_name, a.last_name, MIN (b.income), b.department
/* --b.department can not be in a GROUP BY clause,
--but I need to know which department has the
--smallest income, i.e. which department is responsible for MIN (b.income) */
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name;
How can I do it?

You can use the PostgreSQL-specific feature distinct on to do this:
SELECT distinct on (a.id, a.first_name, a.last_name)
a.id, a.first_name, a.last_name, b.income, b.department
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
ORDER BY a.id, a.first_name, a.last_name, b.income;
This returns one row for each set of distinct values in the distinct on (...) list. The row returned from each set is the first one in that set, as determined by the order by.
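To make the "first row per group" behaviour concrete, here is a minimal sketch in plain Python that mimics what distinct on does conceptually: sort the rows, then keep the first row of each key group. The sample rows and the helper name distinct_on are made up for illustration; this is not how PostgreSQL implements it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows: (id, first_name, last_name, income, department)
rows = [
    (1, "Ann", "Lee", 500, "Sales"),
    (1, "Ann", "Lee", 300, "Support"),
    (2, "Bob", "Ray", 700, "IT"),
    (2, "Bob", "Ray", 900, "Sales"),
]

def distinct_on(rows, key_len, sort_key):
    """Sort the rows, then keep the first row of each group of equal leading fields."""
    ordered = sorted(rows, key=sort_key)
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[:key_len])]

# Mirrors: DISTINCT ON (id, first_name, last_name) ... ORDER BY id, first_name, last_name, income
result = distinct_on(rows, key_len=3, sort_key=itemgetter(0, 1, 2, 3))
for row in result:
    print(row)
# (1, 'Ann', 'Lee', 300, 'Support')
# (2, 'Bob', 'Ray', 700, 'IT')
```

Because income is the last sort column, the first row of each customer group is the one with the smallest income, and its department comes along with it.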

In T-SQL (and PL/SQL and in most RDBMS) you can use the OVER clause (windowing):
SELECT a.id, a.first_name, a.last_name,
-- Here is the trick
MIN (b.income) OVER (PARTITION BY a.id, a.first_name, a.last_name) AS min_income,
-- End of trick
b.department
FROM CUSTOMERS a
INNER JOIN department b
ON a.id = b.id
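Note that MIN(...) OVER (...) keeps every joined row and only attaches the minimum to each of them. To keep just the department row that carries the minimum, one option is to rank the rows with ROW_NUMBER() and filter on the rank. A minimal sketch using Python's sqlite3 module, with made-up table contents (window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the question.
conn.executescript("""
    CREATE TABLE customers (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE department (id INTEGER, income INTEGER, department TEXT);
    INSERT INTO customers VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO department VALUES
        (1, 500, 'Sales'), (1, 300, 'Support'),
        (2, 700, 'IT'),    (2, 900, 'Sales');
""")

# Rank each customer's department rows by income, then keep rank 1 only.
query = """
    SELECT id, first_name, last_name, income, department
    FROM (
        SELECT a.id, a.first_name, a.last_name, b.income, b.department,
               ROW_NUMBER() OVER (PARTITION BY a.id ORDER BY b.income) AS rn
        FROM customers a
        INNER JOIN department b ON a.id = b.id
    ) ranked
    WHERE rn = 1
    ORDER BY id
"""
min_rows = conn.execute(query).fetchall()
for row in min_rows:
    print(row)
# (1, 'Ann', 'Lee', 300, 'Support')
# (2, 'Bob', 'Ray', 700, 'IT')
```

If two departments tie for the lowest income, ROW_NUMBER() picks one of them arbitrarily; RANK() would keep both.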

This looks like T-SQL so I'll give the answer for that.
SELECT a.id,
a.first_name,
a.last_name,
MIN(b.income),
(SELECT TOP 1 c.departmentname --Or whatever the name of your department name is
FROM department c
WHERE c.income = MIN(b.income)) AS [DepartmentName]
FROM CUSTOMERS a
INNER JOIN department b ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name;
You need to use a nested query to find which department has that income. You might also have to add more WHERE conditions to the nested query, since multiple departments can have the same income. Those conditions depend on your database schema, so I'll leave it to you to work out the logic that makes sure both queries refer to the same department.
Edit:
Although reading this more, it looks like you could just rephrase it all:
SELECT a.id,
a.first_name,
a.last_name,
(SELECT TOP 1 d.departmentname --Or whatever the name of your department column is
FROM department d
WHERE d.id = a.id
ORDER BY d.income ASC) AS [DepartmentName] --ASC so that TOP 1 is the smallest income
FROM customers a
You wouldn't get the income with that, but you can add in the code to get that too.

Something like
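SELECT cust.*, b.department
FROM (SELECT a.id, a.first_name, a.last_name, MIN(b.income) AS min_income
FROM CUSTOMERS a
INNER JOIN department b ON a.id = b.id
GROUP BY a.id, a.first_name, a.last_name
) cust
INNER JOIN department b
ON cust.id = b.id
AND cust.min_income = b.income -- keep only the department row that holds the minimum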
If your db supports this syntax.

Related

ERROR: column must appear in the GROUP BY clause or be used in an aggregate function when using two joins

I have the following PostgreSQL query:
select
A.*,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') as number,
sum(cast(A.amount AS decimal)) as sum_amount,
count(A.amount) as cnt_amount
into result
from B
join A on B.name = A.name and B.parent = A.id
join C on A.name = C.name and B.child = C.id
group by A.name, A.unit, number;
select * from result;
But I get the following error:
SQL Error [42803]: ERROR: column "A.index" must appear in the GROUP BY clause or be used in an aggregate function.
What is the reason for this?
I tried adding A.index to the GROUP BY clause but it only kept asking for different columns. I also tried creating a subquery but failed because I have two joins and I'm trying to create a new table result.
Here is a version with the GROUP BY problem corrected:
SELECT
A.name,
A.unit,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') AS number,
SUM(CAST(A.amount AS decimal)) AS sum_amount,
COUNT(A.amount) AS cnt_amount
INTO result
FROM B
INNER JOIN A ON B.name = A.name AND B.parent = A.id
INNER JOIN C ON A.name = C.name AND B.child = C.id
GROUP BY
A.name,
A.unit,
B.child,
number;
Note that every column/alias which appears in the SELECT clause also appears in GROUP BY. Exceptions to this are columns which appear inside aggregate functions. In that case, it is OK for them to not appear in GROUP BY.
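The reason behind the rule is that a column left out of GROUP BY can hold several different values inside one group, so the database cannot tell which one to return. A minimal sketch with Python's sqlite3 module and a made-up, already-joined table named joined (SQLite itself is lenient about this rule, so the sketch only shows the ambiguity and the well-defined grouping, not the PostgreSQL error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the joined rows of A, B and C.
conn.executescript("""
    CREATE TABLE joined (name TEXT, unit TEXT, child INTEGER, amount TEXT);
    INSERT INTO joined VALUES
        ('x', 'kg', 10, '1.5'),
        ('x', 'kg', 10, '2.5'),
        ('x', 'kg', 11, '4.0');
""")

# How many different child values does each (name, unit) group contain?
ambiguity = conn.execute("""
    SELECT name, unit, COUNT(DISTINCT child)
    FROM joined
    GROUP BY name, unit
""").fetchall()
print(ambiguity)  # [('x', 'kg', 2)]

# Grouping by every non-aggregated column makes each output row well defined.
grouped = conn.execute("""
    SELECT name, unit, child,
           SUM(CAST(amount AS REAL)) AS sum_amount,
           COUNT(amount) AS cnt_amount
    FROM joined
    GROUP BY name, unit, child
    ORDER BY child
""").fetchall()
print(grouped)  # [('x', 'kg', 10, 4.0, 2), ('x', 'kg', 11, 4.0, 1)]
```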

Query returning too many results

SQL query that returns expected 29 results for a.id = 366
select a.name, c.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
where a.id = 366
GROUP BY a.name, c.name
order by MAX(b.renew_date), MAX(b.date) desc;
The SQL code below returns 34 results, with multiple rows wherever two different providers supplied the same course. I know these extra results appear because I added e.name to the select list. But all that is needed is the 29 entries with the latest date and the provider's name.
select a.name, c.name, e.name, MAX(B.date), MAX(b.renew_date) as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
GROUP BY a.name, c.name, e.name
order by MAX(b.renew_date), MAX(b.date) desc;
Can anyone rework this code to return a single DISTINCT provider name with the MAX(renew_date) for each course?
This returns exactly one row per distinct combination of (a.name, c.name):
The one with the latest renew_date.
Among these, the one with the latest date (may differ from global max(date)!).
Among these, the one with the alphabetically first e.name:
SELECT DISTINCT ON (a.name, c.name)
a.name AS a_name, c.name AS c_name, e.name AS e_name
, b.renew_date, b.date
FROM boson_course c
JOIN boson_coursedetail b on c.id = b.course_id
JOIN boson_coursedetail_attendance d on d.coursedetail_id = b.id
JOIN boson_employee a on a.id = d.employee_id
JOIN boson_provider e on b.provider_id = e.id
WHERE a.id = 366
ORDER BY a.name, c.name
, b.renew_date DESC NULLS LAST
, b.date DESC NULLS LAST
, e.name;
The result is sorted by a_name, c_name first. If you need your original sort order, wrap this in a subquery:
SELECT *
FROM (<query from above>) sub
ORDER BY renew_date DESC NULLS LAST
, date DESC NULLS LAST
, a_name, c_name, e_name;
Explanation for DISTINCT ON:
Select first row in each GROUP BY group?
Why DESC NULLS LAST?
PostgreSQL sort by datetime asc, null first?
Aside: Don't use basic type names like date as column names. Also, name is hardly ever a good name. As you can see, we have to use aliases to make this query useful. Some general advice on naming conventions:
How to implement a many-to-many relationship in PostgreSQL?
Try using distinct on:
select distinct on (a.name, c.name, e.name) a.name, c.name, e.name,
b.date, b.renew_date as MAXDATE
from boson_course c
inner join boson_coursedetail b on (c.id = b.course_id)
inner join boson_coursedetail_attendance d on (d.coursedetail_id = b.id)
inner join boson_employee a on (a.id = d.employee_id)
inner join boson_provider e on b.provider_id = e.id
where a.id = 366
order by a.name, c.name, e.name, b.date desc;

Why does the row count increase when I join a table to itself using an alias?

I want to join a table to itself using aliases to find students who have the same name and take the same course. Below is what I did to achieve it:
select a.name, a.id
from student_takes a join student_takes b on a.name = b.name
where a.name = b.name and a.course_id = b.course_id
order by a.name;
The result table has more rows than the student_takes table. I expected that with an inner join the result would have at most as many rows as one of the tables in the from clause. When I use the distinct keyword, the result table has fewer rows. I cannot figure out what the duplicates in the above query are.
For starters, you're not including a condition that says, "and the other student is not me." You'd need something like this (cleaning your JOIN and WHERE condition up):
select
a.name,
a.id,
a.course_id
from
student_takes a
join
student_takes b on
a.name = b.name and
a.course_id = b.course_id
where
a.id <> b.id -- assuming this ID is the student
order by a.name;
In addition, you could get multiplication. Consider what happens when three students with the same name take the same course.
Three original rows:
--> Student A
--> Student B
--> Student C
Matching the rows on "shares my name and takes my class but is not me" yields six results:
--> Student A - Student B
--> Student A - Student C
--> Student B - Student A
--> Student B - Student C
--> Student C - Student A
--> Student C - Student B
Notice that you have some logically different but practically identical results - student A matches to Student C, and then Student C matches to Student A.
Given that your original query is only retrieving data from the a table, you can kind of get around this with a DISTINCT keyword:
select distinct
a.name,
a.id,
a.course_id
from
student_takes a
join
student_takes b on
a.name = b.name and
a.course_id = b.course_id
where
a.id <> b.id -- assuming this ID is the student
order by a.name;
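The row counts described above can be checked with a small experiment. A minimal sketch using Python's sqlite3 module; the four sample rows are made up (three students named Kim and one named Lou in the same course):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: three students share a name and a course.
conn.executescript("""
    CREATE TABLE student_takes (id INTEGER, name TEXT, course_id TEXT);
    INSERT INTO student_takes VALUES
        (1, 'Kim', 'CS101'), (2, 'Kim', 'CS101'), (3, 'Kim', 'CS101'),
        (4, 'Lou', 'CS101');
""")

join_sql = """
    FROM student_takes a
    JOIN student_takes b
      ON a.name = b.name AND a.course_id = b.course_id
"""

# Without "a.id <> b.id" every row also matches itself: 3*3 for Kim + 1 for Lou.
with_self = conn.execute("SELECT COUNT(*)" + join_sql).fetchone()[0]

# Excluding self-matches leaves the six ordered pairs among the three Kims.
pairs = conn.execute(
    "SELECT COUNT(*)" + join_sql + " WHERE a.id <> b.id"
).fetchone()[0]

# DISTINCT collapses the pairs back to one row per affected student.
distinct_students = conn.execute(
    "SELECT DISTINCT a.id, a.name, a.course_id" + join_sql +
    " WHERE a.id <> b.id ORDER BY a.id"
).fetchall()

print(with_self, pairs, distinct_students)
# 10 6 [(1, 'Kim', 'CS101'), (2, 'Kim', 'CS101'), (3, 'Kim', 'CS101')]
```

So a four-row table yields ten rows from the plain self-join, which is why an inner join is not bounded by the size of either input.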
If you want to find students with the same name taking the same course, don't use a join, use group by:
select course_id, name, count(*) as cnt
from student_takes
group by course_id, name
having count(*) > 1;

SUM(SALARY) when ID is distinct

I am having trouble solving this problem. I would like to add a salary to the sum only if the employee's id is distinct. I thought I could do this using the decode() function, but I am having trouble defining a suitable expression. I was aiming for something like
SUM(DECODE(S.ID,IS DISTINCT,S.SALARY))
But this isn't going to work!
So the full query looks like
SELECT B.ID, SUM(S.SALARY), COUNT(DISTINCT S.ID), COUNT(DISTINCT RM.MEMBER_ID)
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
GROUP BY B.ID;
But the problem is that SUM(S.SALARY) adds up salaries from duplicate IDs.
I don't know about DECODE, but this should work:
SELECT
SUM(S.SALARY)
FROM <table> S
WHERE NOT EXISTS (
SELECT ID FROM <table> WHERE ID=S.ID GROUP BY ID HAVING COUNT(*)>1
)
Perhaps something like this...
SELECT E.ID, SUM(E.Salary)
FROM Employers E
WHERE E.ID IN (SELECT DISTINCT E2.ID FROM Employers E2)
GROUP BY E.ID
If not, perhaps you could post some sample data so that I can understand better
The joins are introducing duplicate rows. One way to fix this is by adding a row number to sequentially identify different ids. The real way would be to fix the joins so this doesn't happen, but here is the first way:
SELECT BRANCH_ID, SUM(CASE WHEN SEQNUM = 1 THEN SALARY END),
COUNT(DISTINCT STAFF_ID), COUNT(DISTINCT MEMBER_ID)
FROM (SELECT B.ID AS BRANCH_ID, S.ID AS STAFF_ID, S.SALARY, RM.MEMBER_ID,
ROW_NUMBER() OVER (PARTITION BY S.ID ORDER BY S.ID) AS SEQNUM
FROM BRANCH B
INNER JOIN STAFF S ON S.BRANCH_ID = B.ID
INNER JOIN RECRUIT_MEMBER RM ON RM.BRANCH_ID = B.ID
) t
GROUP BY BRANCH_ID
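Both the duplication and the row-number workaround can be reproduced on a tiny data set. A minimal sketch using Python's sqlite3 module; the sample rows (one branch, two staff, three recruited members) are made up, and the subquery column aliases are my own choice (window functions need SQLite 3.25 or later):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: 2 staff and 3 members in the same branch.
conn.executescript("""
    CREATE TABLE branch (id INTEGER);
    CREATE TABLE staff (id INTEGER, branch_id INTEGER, salary INTEGER);
    CREATE TABLE recruit_member (member_id INTEGER, branch_id INTEGER);
    INSERT INTO branch VALUES (1);
    INSERT INTO staff VALUES (10, 1, 1000), (11, 1, 2000);
    INSERT INTO recruit_member VALUES (100, 1), (101, 1), (102, 1);
""")

# Each staff row is repeated once per member, so the salaries are tripled.
naive = conn.execute("""
    SELECT b.id, SUM(s.salary)
    FROM branch b
    INNER JOIN staff s ON s.branch_id = b.id
    INNER JOIN recruit_member rm ON rm.branch_id = b.id
    GROUP BY b.id
""").fetchall()
print(naive)  # [(1, 9000)]

# Number the repeated rows per staff id and count each salary only once.
fixed = conn.execute("""
    SELECT branch_id,
           SUM(CASE WHEN seqnum = 1 THEN salary END),
           COUNT(DISTINCT staff_id),
           COUNT(DISTINCT member_id)
    FROM (
        SELECT b.id AS branch_id, s.id AS staff_id, s.salary, rm.member_id,
               ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.id) AS seqnum
        FROM branch b
        INNER JOIN staff s ON s.branch_id = b.id
        INNER JOIN recruit_member rm ON rm.branch_id = b.id
    ) t
    GROUP BY branch_id
""").fetchall()
print(fixed)  # [(1, 3000, 2, 3)]
```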
You can create a virtual table with only one salary per ID like this...
SELECT
...whatever fields you've already got...
s.Salary
FROM
...whatever tables and joins you've already got...
LEFT JOIN (SELECT ID, MAX(SALARY) as "Salary" FROM SALARY_TABLE GROUP BY ID) s
ON whatevertable.ID = s.ID

Using an inner select construction for a select value

I would like to hear if anyone can tell me a simple syntax that accomplishes the same as the following (with the same flexibility):
SELECT C.CompanyName,
(SELECT Count(*) FROM Employees WHERE CompanyId = C.Id) as EmployeeCount
FROM Company C
Now, what's important is that the inner SELECT giving the EmployeeCount is:
An independent SELECT statement
This means that it should work with any existing SELECT, even if it already contains joins etc.
Can use values from the parent SELECT
I know that this scenario can be easily accomplished in other ways, but the above is a simplified example to explain the challenge. My real scenario is a complex SELECT statement where I do not want to complicate it by adding more joins. Performance is no issue.
Using INNER JOIN:
SELECT C.CompanyName, Count(*) as EmployeeCount
FROM Company C
INNER JOIN Employees E on E.CompanyId = C.Id
GROUP BY C.CompanyName
Using an implicit join (comma syntax):
SELECT C.CompanyName, Count(1) as EmployeeCount
FROM Company C, Employees E
WHERE E.CompanyId = C.Id
GROUP BY C.CompanyName
If you want to use the same syntax, at least put this:
SELECT C.CompanyName,
(SELECT Count(1) FROM Employees WHERE CompanyId = C.Id) as EmployeeCount
FROM Company C
If you need all companies to be shown, even the ones without any employees, you can use a LEFT OUTER JOIN. Count a column from Employees rather than *, so that companies without employees get 0:
SELECT C.CompanyName, Count(E.CompanyId) as EmployeeCount
FROM Company C
LEFT OUTER JOIN Employees E on E.CompanyId = C.Id
GROUP BY C.CompanyName
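To compare the correlated subquery from the question with the LEFT OUTER JOIN form, here is a minimal sketch using Python's sqlite3 module. The sample rows and the Employees.Id column are assumptions; the point is that counting a column of the outer-joined table gives 0 for a company without employees, while COUNT(*) would report 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one company with two employees, one with none.
conn.executescript("""
    CREATE TABLE Company (Id INTEGER, CompanyName TEXT);
    CREATE TABLE Employees (Id INTEGER, CompanyId INTEGER);
    INSERT INTO Company VALUES (1, 'Acme'), (2, 'Empty Co');
    INSERT INTO Employees VALUES (1, 1), (2, 1);
""")

# The original form: an independent SELECT that uses a value from its parent.
correlated = conn.execute("""
    SELECT C.CompanyName,
           (SELECT COUNT(*) FROM Employees WHERE CompanyId = C.Id) AS EmployeeCount
    FROM Company C
    ORDER BY C.Id
""").fetchall()

# Join form: COUNT(column) ignores the NULLs produced by the outer join.
left_join = conn.execute("""
    SELECT C.CompanyName, COUNT(E.CompanyId) AS EmployeeCount
    FROM Company C
    LEFT OUTER JOIN Employees E ON E.CompanyId = C.Id
    GROUP BY C.Id, C.CompanyName
    ORDER BY C.Id
""").fetchall()

# Pitfall: COUNT(*) counts the NULL-extended row as one employee.
star_count = conn.execute("""
    SELECT C.CompanyName, COUNT(*) AS EmployeeCount
    FROM Company C
    LEFT OUTER JOIN Employees E ON E.CompanyId = C.Id
    GROUP BY C.Id, C.CompanyName
    ORDER BY C.Id
""").fetchall()

print(correlated)  # [('Acme', 2), ('Empty Co', 0)]
print(left_join)   # [('Acme', 2), ('Empty Co', 0)]
print(star_count)  # [('Acme', 2), ('Empty Co', 1)]
```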
Try using a derived table, which satisfies both your conditions.
An independent SELECT statement.
a. Using a derived table allows you to keep your independent SELECT statement.
Can use values from the parent SELECT.
a. Through the join condition of the INNER JOIN, the derived table is still matched against values from the parent SELECT.
SELECT
C.CompanyName,
EC.EmployeeCount
FROM Company C
INNER JOIN (SELECT
CompanyId,
Count(*) AS EmployeeCount
FROM Employees
GROUP BY CompanyId) EC
ON EC.CompanyId = C.Id
If your inner select is complicated, then why not make a view of it:
CREATE VIEW EmpSelect AS
SELECT CompanyId, whatever FROM Employees;
Then
SELECT
C.CompanyName, Count(E.CompanyId) AS EmpCount
FROM
Company C
LEFT JOIN EmpSelect E
ON C.Id = E.CompanyId
GROUP BY
C.CompanyName;