Alternative to GROUP BY to consolidate repeating values in single column - sql

I'm using DB2 which apparently doesn't let you use the GROUP BY clause when it's returning more than one column. I have records that have repeating values for ID and name, for example:
EmpID  | name     | code
-------+----------+-----
111111 | Williams | 1
111111 | Williams | 2
111112 | Davis    | 3
111113 | Gomez    | 1
111113 | Gomez    | 3
(Excuse my formatting) I need to get a single instance of each employee with a code (doesn't matter which code instance gets omitted as long as one shows up per employee).
Normally I could do:
SELECT * FROM employees GROUP BY EmpID;
DB2 doesn't let you do this for some reason. It says "The grouping is inconsistent." You can do:
SELECT EmpID from employees GROUP BY EmpID;
but if you introduce more return values then it gives you the error.
I tried looking into using a subquery and derived tables but I'm not sure how to compose it to select only one code value and exclude the records with a repeating employee value. If anyone has an answer or could point me to another thread that addresses this problem I would appreciate it very much.

Most databases require every column in the SELECT list that is not inside an aggregate function to appear in the GROUP BY clause, which is why you received an error message.
For your situation if it does not matter what code value is returned, then you can use an aggregate function and group by:
SELECT EmpID, name, MIN(code) code
FROM employees
GROUP BY EmpID, name;
The group by is applied to both the EmpId and name, while the aggregate function is applied to the code column.
Note that because EmpID and name are functionally dependent on each other (as far as we can see from the sample you posted and the "repeating values" comment), the following two queries return the same results as the query above:
--- GROUP BY EmpID
------------------
SELECT EmpID, MIN(name) name, MIN(code) code
FROM employees
GROUP BY EmpID;
--- GROUP BY name
-----------------
SELECT MIN(EmpID) EmpID, name, MIN(code) code
FROM employees
GROUP BY name;
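To check the claim that the three queries agree, here is a minimal sketch that runs them against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for DB2 (an assumption made only so the example is runnable), and it adds ORDER BY 1 so the result lists can be compared directly.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (EmpID INTEGER, name TEXT, code INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (111111, "Williams", 1),
        (111111, "Williams", 2),
        (111112, "Davis", 3),
        (111113, "Gomez", 1),
        (111113, "Gomez", 3),
    ],
)

queries = {
    # Group by every non-aggregated column.
    "by_both": "SELECT EmpID, name, MIN(code) AS code "
               "FROM employees GROUP BY EmpID, name ORDER BY 1",
    # Group by EmpID only; name is wrapped in an aggregate.
    "by_id": "SELECT EmpID, MIN(name) AS name, MIN(code) AS code "
             "FROM employees GROUP BY EmpID ORDER BY 1",
    # Group by name only; EmpID is wrapped in an aggregate.
    "by_name": "SELECT MIN(EmpID) AS EmpID, name, MIN(code) AS code "
               "FROM employees GROUP BY name ORDER BY 1",
}

results = {label: conn.execute(sql).fetchall() for label, sql in queries.items()}
for label, rows in results.items():
    print(label, rows)
```

The three result sets match here only because each EmpID maps to exactly one name in the sample. If two employees shared a name, the GROUP BY name variant would merge them.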

Related

Combining two mostly identical rows in SQL

I have a table that contains data like below:
Name | ID   | Dept
-----+------+---------------
Joe  | 1001 | Accounting
Joe  | 1001 | Marketing
Mary | 1003 | Administration
Mary | 1009 | Accounting
Each person is uniquely identified by the combination of Name and ID. I want the resulting table to combine rows that have the same Name and ID and put their departments together, separated by commas, in alphabetical order. So the result would be:
Name | ID   | Dept
-----+------+----------------------
Joe  | 1001 | Accounting, Marketing
Mary | 1003 | Administration
Mary | 1009 | Accounting
I am not sure how to approach this. So far I have this, which doesn't really do what I need:
SELECT Name, ID, COUNT(*)
FROM employees
GROUP BY Name, ID
I know COUNT(*) is irrelevant here, but I am not sure what to do. Any help is appreciated! By the way, I am using PostgreSQL and I am new to the language.
PostgreSQL has an aggregate function for string concatenation, string_agg (see the aggregate functions section of the PostgreSQL documentation). Try the following:
SELECT Name, ID, string_agg(Dept, ', ' ORDER BY Dept ASC) AS Departments
FROM employees
GROUP BY Name, ID
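To make it concrete what the string_agg query computes, here is a minimal sketch that reproduces the same grouping and ordered concatenation in plain Python on the sample rows from the question. It is only an illustration of the semantics, not a replacement for the query. Python's default string sort is assumed to match the database collation for these values.

```python
from collections import defaultdict

# Sample rows from the question: (Name, ID, Dept).
rows = [
    ("Joe", 1001, "Accounting"),
    ("Joe", 1001, "Marketing"),
    ("Mary", 1003, "Administration"),
    ("Mary", 1009, "Accounting"),
]

# GROUP BY Name, ID: collect the departments of each (Name, ID) pair.
groups = defaultdict(list)
for name, emp_id, dept in rows:
    groups[(name, emp_id)].append(dept)

# string_agg(Dept, ', ' ORDER BY Dept ASC): sort within the group, then join.
result = [
    (name, emp_id, ", ".join(sorted(depts)))
    for (name, emp_id), depts in sorted(groups.items())
]

for row in result:
    print(row)
```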

How to use Max while taking other values from another column?

I am new to SQL and have a problem picking the biggest value of a column for every manager_id, along with the other information in the same row.
Let me show the example - consider this table:
name   | manager_id | sales
-------+------------+------
John   | 1          | 100
David  | 1          | 80
Selena | 2          | 26
Leo    | 1          | 120
Frank  | 2          | 97
Sara   | 2          | 105
and the result I am expecting would be like this:
name | manager_id | top_sales
-----+------------+----------
Leo  | 1          | 120
Sara | 2          | 105
I tried using MAX, but then I must group by manager_id and cannot select the name of the salesperson.
select manager_id, max(sales) as top_sales
from table
group by manager_id ;
This is just an example; the actual query is very long and pulls information from several tables. I know I can join the same table to itself, but because the query is so long, with multiple conditions across different tables, that gets unwieldy. I also don't want to create a temporary table: it should be done in one single query. I did actually solve this, but the query is very long because the inner join I used repeats the original query twice.
My question is: can I use MAX and still get the value from the name column, or is there another method to solve this?
Appreciate all help
You can use row_number() in a CTE to get the row with the highest sales for each manager, as below:
with MaxSales as (
    select name, manager_id, sales,
           row_number() over (partition by manager_id order by sales desc) as rownumber
    from table
)
select name, manager_id, sales
from MaxSales
where rownumber = 1;
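As a quick check of the row_number() approach, here is a minimal sketch that runs it on the sample data using Python's built-in sqlite3 module. It assumes an SQLite build with window function support (3.25 or later). The table name sales_by_person is hypothetical, since the question only uses the placeholder "table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# sales_by_person is a hypothetical name standing in for the question's table.
conn.execute(
    "CREATE TABLE sales_by_person (name TEXT, manager_id INTEGER, sales INTEGER)"
)
conn.executemany(
    "INSERT INTO sales_by_person VALUES (?, ?, ?)",
    [
        ("John", 1, 100),
        ("David", 1, 80),
        ("Selena", 2, 26),
        ("Leo", 1, 120),
        ("Frank", 2, 97),
        ("Sara", 2, 105),
    ],
)

top = conn.execute(
    """
    WITH MaxSales AS (
        SELECT name, manager_id, sales,
               -- Number the rows of each manager, highest sales first.
               ROW_NUMBER() OVER (
                   PARTITION BY manager_id ORDER BY sales DESC
               ) AS rownumber
        FROM sales_by_person
    )
    SELECT name, manager_id, sales AS top_sales
    FROM MaxSales
    WHERE rownumber = 1
    ORDER BY manager_id
    """
).fetchall()
print(top)
```

If two salespeople under the same manager tie for the highest sales, row_number() keeps only one of them, chosen arbitrarily. Using rank() instead would return every tied row.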

selecting duplicate columns in psql then sorting by row id

I have tried searching the web and whilst there are plenty of answers for finding duplicates, I am yet to stumble on one that allows me to find all the duplicates within a column (i.e where the same 'name' occurs more than once) and then only select the lowest row id (which would be the first duplicate name entered).
So the table's description (inserted from a file):
create table customer(id int, name varchar);
id | name
1 | Darren
2 | Mark
3 | Julie
4 | Mark
5 | Julie
The query:
CREATE VIEW AS
SELECT COUNT(name), name
FROM customer
GROUP BY name
HAVING COUNT(name) > 1
Result (the order is never guaranteed, I want Mark to always come first as he has the lowest id):
Julie
Mark
Now the issue is, if I select id I have to include it in the GROUP BY. Doing that means no duplicates get selected, since every id is unique. And without selecting id I can't ORDER BY id.
I hope I am clear, if not I can re-word or supply more information.
Try a nested query. The inner query does the SELECT/GROUP BY and also computes MIN(id) for each name; the outer query selects the columns you want and sorts by that minimum id.
CREATE VIEW AS
SELECT CNT_NAME, NAME
FROM
(
SELECT COUNT(name) CNT_NAME, name, min(id) min_id
FROM customer
GROUP BY name
HAVING COUNT(name) > 1
) AS alias
ORDER BY MIN_ID
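The nested query can be tried on the sample data with the sketch below. It uses Python's built-in sqlite3 module in place of PostgreSQL (an assumption for runnability) and runs only the SELECT, leaving out the CREATE VIEW wrapper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Darren"), (2, "Mark"), (3, "Julie"), (4, "Mark"), (5, "Julie")],
)

duplicates = conn.execute(
    """
    SELECT cnt_name, name
    FROM (
        -- One row per duplicated name, carrying its lowest id.
        SELECT COUNT(name) AS cnt_name, name, MIN(id) AS min_id
        FROM customer
        GROUP BY name
        HAVING COUNT(name) > 1
    ) AS alias
    ORDER BY min_id
    """
).fetchall()
print(duplicates)
```

Mark comes first because his lowest id (2) is smaller than Julie's lowest id (3).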

SQL get differences in one column by ID

It's hard for me to word what I want which is why I've had trouble researching this issue. What I want is to look at a table by id and see if another column changes:
id name
---- ------
1 Al
2 Mia
1 Al
2 Jean
In the example, I don't care about id 1 because the name always stayed as Al but I care about id 2 because there is a record with the name Mia but then, that id 2 also has a record with the name Jean. I was thinking of using group by somehow but that doesn't work. Any ideas?
Try this:
SELECT id
FROM mytable
GROUP BY id
HAVING MIN(name) <> MAX(name)
This will select all ids that have at least two different name values.
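Here is a minimal sketch of the HAVING MIN(name) <> MAX(name) check on the sample rows, run with Python's built-in sqlite3 module (the choice of database is an assumption; the question does not name one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "Al"), (2, "Mia"), (1, "Al"), (2, "Jean")],
)

# If the smallest and largest name within an id differ,
# that id has more than one distinct name.
changed_ids = conn.execute(
    """
    SELECT id
    FROM mytable
    GROUP BY id
    HAVING MIN(name) <> MAX(name)
    """
).fetchall()
print(changed_ids)
```

MIN and MAX ignore NULLs, so an id whose only variation is between a name and NULL would not be reported by this query.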

How to add aggregate value to SELECT?

I'm selecting data from multiple tables and I also need to get maximum "timestamp" on those tables. I will need that to create custom cache control.
tbl_name tbl_surname
id | name id | surname
--------- ------------
0 | John 0 | Doe
1 | Jane 1 | Tully
... ...
I have the following query:
SELECT name, surname FROM tbl_name, tbl_surname WHERE tbl_name.id = tbl_surname.id
and I need to add the following info to the result set:
SELECT MAX(ora_rowscn) FROM (SELECT ora_rowscn FROM tbl_name
UNION ALL
SELECT ora_rowscn FROM tbl_surname);
I tried to use UNION, but I get an error (mixing group and not-single-group data, or something like that). I know why I cannot use the union.
I don't want to split this into 2 calls, because I need the timestamp of the current snapshot I took from DB for my cache management. And between select and the call for MAX the DB could change.
Here is result I want:
John | Doe | 123456
Jane | Tully | 123456
where 123456 is approximate time of last change (insert, update, delete) of tables tbl_name and tbl_surname.
I have read only access to DB, so I cannot create triggers, stored procedures, extra tables etc...
Thanks for any suggestions.
EDIT: The value *ora_rowscn* is assigned per block of rows. So in one table this value can differ per row. I need the maximal value from both (all) tables involved in query.
Try:
SELECT name,
surname,
max(greatest(tbl_name.ora_rowscn, tbl_surname.ora_rowscn)) over () as max_rowscn
FROM tbl_name, tbl_surname
WHERE tbl_name.id = tbl_surname.id
There's no need to aggregate here - just include both ora_rowscn values in your query and take the greatest of the two for each row:
SELECT
n.name,
n.ora_rowscn as n_ora_rowscn,
s.surname,
s.ora_rowscn as s_ora_rowscn,
greatest(n.ora_rowscn, s.ora_rowscn) as last_ora_rowscn
FROM tbl_name n
join tbl_surname s on n.id = s.id
BTW, I've replaced your old-style joins with ANSI-style joins, which are more readable, IMHO.
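The two answers above return different things, which the following sketch tries to make visible. ora_rowscn is an Oracle pseudocolumn and cannot be reproduced outside Oracle, so this example uses an ordinary, hypothetical column named rowscn with made-up values, and runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). SQLite's two-argument max() plays the role of Oracle's greatest().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# rowscn is a hypothetical ordinary column standing in for Oracle's ora_rowscn.
conn.execute("CREATE TABLE tbl_name (id INTEGER, name TEXT, rowscn INTEGER)")
conn.execute("CREATE TABLE tbl_surname (id INTEGER, surname TEXT, rowscn INTEGER)")
conn.executemany(
    "INSERT INTO tbl_name VALUES (?, ?, ?)",
    [(0, "John", 100), (1, "Jane", 123456)],
)
conn.executemany(
    "INSERT INTO tbl_surname VALUES (?, ?, ?)",
    [(0, "Doe", 90), (1, "Tully", 120)],
)

rows = conn.execute(
    """
    SELECT n.name,
           s.surname,
           -- Per-row value: the larger of the two rowscn values in this row.
           max(n.rowscn, s.rowscn) AS row_greatest,
           -- Whole-result value: the largest per-row value, repeated on every row.
           max(max(n.rowscn, s.rowscn)) OVER () AS max_rowscn
    FROM tbl_name n
    JOIN tbl_surname s ON n.id = s.id
    ORDER BY n.id
    """
).fetchall()
for row in rows:
    print(row)
```

In this sketch only the windowed column carries the same value on every row, which is the shape of the result the question asks for; the per-row column varies from row to row.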