Sum distinct by separate ID column - sql

I have some data of the form:
ID Value
A 2
B 2
C 3
A 2
A 2
C 3
B 2
I want to sum value by distinct IDs.
select sum(distinct value) from table would give the sum of 2 and 3 = 5. I don't want that; I want the sum of value for each ID, i.e. A=2, B=2, C=3. There are 3 distinct IDs, so sum(2,2,3) = 7.
In 'sql-ish' I want something like select sum(distinct value by ID) from table. Is this possible?

Get the distinct combinations of ID and Value in a subquery and then the sum of Values:
SELECT SUM(Value) sum_value
FROM (SELECT DISTINCT ID, Value FROM tablename) t
Another way to do it is with SUM() window function:
SELECT DISTINCT SUM(MAX(Value)) OVER() sum_value
FROM tablename
GROUP BY ID
See the demo.
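As a rough way to check the subquery approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption, since the question does not name a database), and the table name tablename is taken from the answer above.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (ID TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [("A", 2), ("B", 2), ("C", 3), ("A", 2), ("A", 2), ("C", 3), ("B", 2)],
)

# Collapse duplicate (ID, Value) rows first, then sum what is left.
sum_by_id = conn.execute(
    "SELECT SUM(Value) FROM (SELECT DISTINCT ID, Value FROM tablename) t"
).fetchone()[0]

# For contrast: SUM(DISTINCT Value) deduplicates the values themselves.
sum_distinct = conn.execute(
    "SELECT SUM(DISTINCT Value) FROM tablename"
).fetchone()[0]

print(sum_by_id, sum_distinct)  # 7 5
```

Note that the subquery only gives one row per ID when each ID always carries the same Value, as in the sample data.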

Related

Use window functions to select the value from a column based on the sum of another column, in an aggregate query

Consider this data (View on DB Fiddle):
id dept value
1 A 5
1 A 5
1 B 7
1 C 5
2 A 5
2 A 5
2 B 15
2 A 2
The base query I am running is pretty simple. Just get the total value by id and the most frequent dept.
SELECT
id,
MODE() WITHIN GROUP(ORDER BY dept) AS dept_freq,
SUM(value) AS value
FROM test
GROUP BY id
;
id dept_freq value
1 A 22
2 A 27
But I also need to get, for each id, the dept that concentrates the greatest value (so the greatest sum of value by id and dept, not the highest individual value in the original table).
Is there any way to use window functions to achieve that and do it directly in the base query above?
The expected output for this particular example would be:
id dept_freq dept_value value
1 A A 22
2 A B 27
I could achieve that with the query below and then join it with the results of the base query above:
SELECT * FROM(
SELECT
*,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY value DESC) as row
FROM (
SELECT id, dept, SUM(value) AS value
FROM test
GROUP BY id, dept
) AS alias1
) AS alias2
WHERE alias2.row = 1
;
id dept value row
1 A 10 1
2 B 15 1
But it is not easy to read/maintain and also seems pretty inefficient. So I thought it should be possible to achieve this using window functions directly in the base query, which may also help Postgres come up with a better query plan that makes fewer passes over the data. But none of my attempts using OVER (PARTITION BY ...) and FILTER worked.
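You can fetch the dept with the greatest total using the first_value() window function. Because the requirement is the greatest sum per dept rather than the highest individual value, compute each dept's total first with SUM() OVER (PARTITION BY id, dept) and order by that total. Adding this before your mode() grouping should do it:
SELECT
    id,
    highest_value_dept,
    MODE() WITHIN GROUP (ORDER BY dept) AS dept_freq,
    SUM(value) AS value
FROM (
    SELECT
        id,
        dept,
        value,
        -- dept whose summed value is greatest within each id
        FIRST_VALUE(dept) OVER (PARTITION BY id ORDER BY dept_total DESC) AS highest_value_dept
    FROM (
        SELECT id, dept, value,
               -- total value per (id, dept), repeated on every row of that pair
               SUM(value) OVER (PARTITION BY id, dept) AS dept_total
        FROM test
    ) t
) s
GROUP BY 1, 2;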
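One way to check the "greatest summed value per dept" requirement against the sample data is the sketch below, which runs in Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained: it has no MODE() WITHIN GROUP, so dept_freq is left out and only the dept_value and value columns are computed. The aliases dept_total, totals and ranked are illustrative names.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, dept TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "A", 5), (1, "A", 5), (1, "B", 7), (1, "C", 5),
     (2, "A", 5), (2, "A", 5), (2, "B", 15), (2, "A", 2)],
)

query = """
SELECT id, dept_value, SUM(value) AS value
FROM (
    SELECT id, value,
           -- dept with the greatest per-dept total inside each id
           FIRST_VALUE(dept) OVER (
               PARTITION BY id ORDER BY dept_total DESC
           ) AS dept_value
    FROM (
        SELECT id, dept, value,
               -- per-(id, dept) total repeated on every row
               SUM(value) OVER (PARTITION BY id, dept) AS dept_total
        FROM test
    ) AS totals
) AS ranked
GROUP BY id, dept_value
ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'A', 22), (2, 'B', 27)]
```

For id 1 the single largest value belongs to dept B (7), but dept A has the larger total (10), which is why the ordering uses the per-dept total rather than the raw value. If two depts tie on the total, the pick between them is arbitrary unless a tiebreaker is added to the ORDER BY.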

PostgreSQL: Create array by grouping values of the same id

Given the following input data:
id category
1 A
1 B
2 A
2 R
2 C
3 Z
I am aiming to get the following output table:
id categories
1 {"A","B"}
2 {"A","R","C"}
3 {"Z"}
using the following query:
SELECT DISTINCT id,
ARRAY(SELECT DISTINCT category::VARCHAR FROM test) AS categories
FROM my_table
But what I get is the following table:
id categories
1 {"A","B","R","C","Z"}
2 {"A","B","R","C","Z"}
3 {"A","B","R","C","Z"}
How can I obtain the desired output?
Note: The GROUP BY clause did not work in this case as I'm not using an aggregation function.
What about using the JSON_AGG aggregation function?
SELECT id,
JSON_AGG(category) AS category
FROM tab
GROUP BY id
ORDER BY id
Check the demo here.
Assuming the table is named test:
select distinct id,
array(select distinct category::varchar from test b where b.id = a.id) as categories
from test a
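The underlying idea in both answers is that the categories have to be collected per id, either by an aggregate with GROUP BY or by correlating the subquery on id. A minimal sketch of the aggregate form, using Python's built-in sqlite3 module as a stand-in engine (an assumption): SQLite has no array type, so GROUP_CONCAT plays the role that ARRAY_AGG would typically play in PostgreSQL, and the string is split in Python.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, category TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (2, "R"), (2, "C"), (3, "Z")],
)

# One row per id; the aggregate only sees the categories of that id.
rows = conn.execute(
    "SELECT id, GROUP_CONCAT(DISTINCT category) "
    "FROM test GROUP BY id ORDER BY id"
).fetchall()

# Split the comma-joined string; sorted() because the aggregate's
# element order is not guaranteed.
categories = {id_: sorted(joined.split(",")) for id_, joined in rows}
print(categories)
```

The original query returned every category for every id because its subquery was not tied to the outer row; grouping (or the b.id = a.id condition) is what restricts the collection to one id.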

Find duplicate values only if separate column id differs

I have the following table:
id item
1 A
2 A
3 B
4 C
3 H
1 E
I'm looking to obtain duplicate values from the id column only when the item column differs in value. The end result should be:
1 A
1 E
3 B
3 H
I've attempted:
select id, items, count(*)
from table
group by id, items
HAVING count(*) > 1
But this is giving only duplicate values from the id column and not taking into account the items column.
Any suggestions will be greatly appreciated.
You can use a window function for this, which is generally far more efficient than a self-join:
SELECT
t.id,
t.items,
t.count
from (
SELECT *,
COUNT(*) OVER (PARTITION BY t.id) AS count
FROM YourTable t
) t
WHERE t.count > 1;
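To see the window-count approach on the sample rows, here is a small sketch using Python's built-in sqlite3 module (SQLite is an assumption; the question does not name a database). The table name YourTable follows the answer, the column is called item as in the sample table, and id_count and counted are illustrative aliases.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (3, "H"), (1, "E")],
)

query = """
SELECT id, item
FROM (
    SELECT id, item,
           -- number of rows sharing this id
           COUNT(*) OVER (PARTITION BY id) AS id_count
    FROM YourTable
) AS counted
WHERE id_count > 1
ORDER BY id, item
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'A'), (1, 'E'), (3, 'B'), (3, 'H')]
```

COUNT(*) counts rows per id, so an id that repeats with the same item would also be returned; if that can happen in the real data, the rows would need deduplicating first.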

Get Count Based on Combinations of Values from Second Column

I have a table format like below:
Id Code
1 A
1 B
2 A
3 A
3 C
4 A
4 B
I am trying to get count of code combinations like below:
Code Count
A,B 2 -- Row 1,2 and Row 6,7
A 1 -- Row 3
A,C 1 -- Row 4
I am unable to get the combination result. All I can do is group by, but that does not give me the count of IDs per combination.
You need to aggregate the rows, somehow, and do that twice. The code looks something like this:
select codes, count(*) as num_ids
from (select id, group_concat(code order by code) as codes
      from t
      group by id
     ) id
group by codes;
group_concat() might be spelled listagg() or string_agg() depending on the database.
In SQL Server, use string_agg():
select codes, count(*) as num_ids
from (select id, string_agg(code, ',') within group (order by code) as codes
      from t
      group by id
     ) id
group by codes;
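The two-level aggregation can be tried on the sample rows with the sketch below, which uses Python's built-in sqlite3 module (SQLite is an assumption, chosen only to keep the example self-contained). The codes are fed to GROUP_CONCAT from an ordered subquery; SQLite does not formally guarantee that order, although newer releases accept an ORDER BY inside the aggregate itself. The aliases ordered and per_id are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, code TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "C"), (4, "A"), (4, "B")],
)

query = """
SELECT codes, COUNT(*) AS num_ids
FROM (
    -- first aggregation: one combined code string per id
    SELECT id, GROUP_CONCAT(code) AS codes
    FROM (SELECT id, code FROM t ORDER BY id, code) AS ordered
    GROUP BY id
) AS per_id
-- second aggregation: how many ids share each combination
GROUP BY codes
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```

Sorting the codes before concatenating matters: without it, the same set of codes could be written as both A,B and B,A and be counted as two different combinations.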

group by oracle

I have a table as below:
id value
-------------------------
1 1
5 1
7 1
8 4
I can't get to the table below:
id value
-------------------------
1 1
8 4
The SQL is
select id,value from table_1 group by id_a
All you have here is a simple MIN() aggregate.
SELECT MIN(id) AS id, value FROM table_1 GROUP BY value
Try this:
select min(id), id_a from table_1 group by id_a
SELECT T.value,MIN(id) AS MIN_ID
FROM table_1 T
GROUP BY T.value;
GROUP BY is normally used together with one or more aggregate functions: count, min, max, sum, avg, etc. These functions operate on a group of rows at a time. When you select an aggregate function alongside non-aggregated columns, those columns must appear in the GROUP BY clause.
The below will give you the correct answer:
select min(id) id, value from table_1 group by value
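The MIN() plus GROUP BY pattern can be checked on the sample rows with a short sketch in Python's built-in sqlite3 module. SQLite stands in for Oracle here (an assumption), which is fine for this query because it only uses standard aggregate syntax.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [(1, 1), (5, 1), (7, 1), (8, 4)],
)

# One row per distinct value, keeping the smallest id in each group.
rows = conn.execute(
    "SELECT MIN(id) AS id, value FROM table_1 GROUP BY value ORDER BY value"
).fetchall()
print(rows)  # [(1, 1), (8, 4)]
```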