SQL: Max of summed list from subgroup

I've been looking around for a solution to this and found many questions about finding the max of a summed group, but none of them solve my problem, so I decided to ask a new question.
My data is grouped in three levels and looks like this:
+--------+---------+----+
| Sektor | Sektion | n |
+--------+---------+----+
| 1 | a | 9 |
| 1 | b | 14 |
| 1 | a | 6 |
| 2 | d | 4 |
| 2 | d | 7 |
| 2 | f | 10 |
| 2 | e | 100|
| 3 | g | 59 |
| 4 | h | 200|
+--------+---------+----+
I would like to find the "sektion" with highest summed n for each "sektor".
I tried several approaches, but none of them solved my problem. The closest I got was:
select
sektor, sektion, n
from
table
where
n = (select max(n) from table i where i.sektor = table.sektor)
GROUP BY sektor, sektion, n
ORDER BY n DESC
This would return
+--------+---------+----+
| Sektor | Sektion | n |
+--------+---------+----+
| 1 | b | 14 |
| 2 | e | 100|
| 3 | g | 59 |
| 4 | h | 200|
+--------+---------+----+
The problem is that I don't get the max of the summed n for each group, only the max of a single row. Sektor 1 should return sektion a with 15 (9 + 6) instead.
Am I close to the answer, or far away? It seems like I just need to sum before taking the max, but I'm not sure how.
Desired:
+--------+---------+----+
| Sektor | Sektion | n |
+--------+---------+----+
| 1 | a | 15 |
| 2 | e | 100|
| 3 | g | 59 |
| 4 | h | 200|
+--------+---------+----+

You could try the query below; it works for me:
Select Distinct
sektor, Sektion, n
from
table t
where
n = (select max(n) from table where Sektor = t.Sektor)
Looking at your question again, it looks like all you were missing was a GROUP BY in your subquery.
Also, since your comment suggests that you want the max based on the sum, this may work for that scenario:
Select Sektor, Sektion, SUM(n) n
from Table t
Group by Sektor, Sektion
Having Sum(n) = (Select max(n) from (Select SUM(n) n from Table where Sektor = t.Sektor group by Sektion) a)
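To check the sum-then-max idea end to end, here is a minimal sketch that runs it against the question's sample data with Python's built-in sqlite3 module. It swaps the correlated HAVING subquery for a CTE: first sum n per (Sektor, Sektion), then keep the rows whose sum equals the largest sum in their Sektor. The table name data is a hypothetical stand-in for the question's table (called table there); if two sektionen tie, both are returned.

```python
import sqlite3

# Sample rows from the question: (sektor, sektion, n)
ROWS = [
    (1, "a", 9), (1, "b", 14), (1, "a", 6),
    (2, "d", 4), (2, "d", 7), (2, "f", 10), (2, "e", 100),
    (3, "g", 59), (4, "h", 200),
]


def top_sektion_per_sektor(rows):
    """Return (sektor, sektion, total) for the sektion(s) with the largest summed n per sektor."""
    con = sqlite3.connect(":memory:")
    # "data" is a hypothetical table name; the question's table was called "table".
    con.execute("CREATE TABLE data (sektor INTEGER, sektion TEXT, n INTEGER)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH sums AS (                -- step 1: sum n per (sektor, sektion)
            SELECT sektor, sektion, SUM(n) AS total
            FROM data
            GROUP BY sektor, sektion
        )
        SELECT s.sektor, s.sektion, s.total
        FROM sums AS s
        JOIN (                        -- step 2: the largest sum within each sektor
            SELECT sektor, MAX(total) AS best
            FROM sums
            GROUP BY sektor
        ) AS m ON m.sektor = s.sektor AND m.best = s.total
        ORDER BY s.sektor
        """
    ).fetchall()
    con.close()
    return result


print(top_sektion_per_sektor(ROWS))
# [(1, 'a', 15), (2, 'e', 100), (3, 'g', 59), (4, 'h', 200)]
```

Summing first and only then comparing against the per-Sektor maximum is exactly the step the original attempt was missing.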

Related

Merge groups if they contain the same value

I have the following table:
+-----+----+---------+
| grp | id | sub_grp |
+-----+----+---------+
| 10 | A2 | 1 |
| 10 | B4 | 2 |
| 10 | F1 | 2 |
| 10 | B3 | 3 |
| 10 | C2 | 4 |
| 10 | A2 | 4 |
| 10 | H4 | 5 |
| 10 | K0 | 5 |
| 10 | Z3 | 5 |
| 10 | F1 | 5 |
| 10 | A1 | 5 |
| 10 | A | 6 |
| 10 | B | 6 |
| 10 | B | 7 |
| 10 | C | 7 |
| 10 | C | 8 |
| 10 | D | 8 |
| 20 | A | 1 |
| 20 | B | 1 |
| 20 | B | 2 |
| 20 | C | 2 |
| 20 | C | 3 |
| 20 | D | 3 |
+-----+----+---------+
Within every grp, my goal is to merge all the sub_grp sharing at least one id.
More than 2 sub_grp can be merged together.
The expected result should be:
+-----+----+---------+
| grp | id | sub_grp |
+-----+----+---------+
| 10 | A2 | 1 |
| 10 | B4 | 2 |
| 10 | F1 | 2 |
| 10 | B3 | 3 |
| 10 | C2 | 1 |
| 10 | A2 | 1 |
| 10 | H4 | 2 |
| 10 | K0 | 2 |
| 10 | Z3 | 2 |
| 10 | F1 | 2 |
| 10 | A1 | 2 |
| 10 | A | 6 |
| 10 | B | 6 |
| 10 | B | 6 |
| 10 | C | 6 |
| 10 | C | 6 |
| 10 | D | 6 |
| 20 | A | 1 |
| 20 | B | 1 |
| 20 | B | 1 |
| 20 | C | 1 |
| 20 | C | 1 |
| 20 | D | 1 |
+-----+----+---------+
Here is a SQL Fiddle with the test values: http://sqlfiddle.com/#!9/13666c/2
I am trying to solve this either with a stored procedure or queries.
This is an evolution from my previous problem: Merge rows containing same values
My understanding of the problem
Merge sub_grp (for a given grp) if any one of the IDs in one sub_grp matches any one of the IDs in another sub_grp. A given sub_grp can be merged with only one other sub_grp (the earliest in ascending order).
Disclaimer
This code may work, but it is untested because the OP did not provide DDL or data scripts.
Solution
UPDATE t
SET sub_grp = m.new_sub_grp
FROM tbl AS t
-- Inner join excludes sub_grps with no matching sub_grp, so there is nothing to update for them.
INNER JOIN
-- For each grp, sub_grp combination return the earliest matching sub_grp
( SELECT a.grp, a.sub_grp, MIN(b.sub_grp) AS new_sub_grp
FROM tbl AS a
INNER JOIN tbl AS b
-- b.sub_grp < a.sub_grp: only look at earlier sub-groups, avoiding "double linking"
ON b.grp = a.grp AND b.sub_grp < a.sub_grp AND b.ID = a.ID
-- Only return one record per grp, sub_grp combo
GROUP BY a.grp, a.sub_grp ) AS m
ON m.grp = t.grp AND m.sub_grp = t.sub_grp;
You can re-number sub groups afterwards as a separate update statement with the help of DENSE_RANK window function.
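A single pass of the UPDATE only moves each sub_grp into one other sub_grp, so chains such as 8 → 7 → 6 in grp 10 need the merge step to be repeated until nothing changes. The sketch below is an illustration under assumptions, not the answer's code: it runs in Python's built-in sqlite3 module, snapshots the "earliest earlier sub_grp sharing an id" moves into a temp table (moves is a hypothetical name), applies them, and loops until no moves remain.

```python
import sqlite3

# Sample rows from the question: (grp, id, sub_grp)
SAMPLE = [
    (10, "A2", 1), (10, "B4", 2), (10, "F1", 2), (10, "B3", 3),
    (10, "C2", 4), (10, "A2", 4), (10, "H4", 5), (10, "K0", 5),
    (10, "Z3", 5), (10, "F1", 5), (10, "A1", 5), (10, "A", 6),
    (10, "B", 6), (10, "B", 7), (10, "C", 7), (10, "C", 8),
    (10, "D", 8), (20, "A", 1), (20, "B", 1), (20, "B", 2),
    (20, "C", 2), (20, "C", 3), (20, "D", 3),
]


def merge_sub_groups(rows):
    """Repeat the 'merge into the earliest matching sub_grp' step until nothing changes."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (grp INTEGER, id TEXT, sub_grp INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    while True:
        # Snapshot the moves first so every row of a sub_grp moves together
        # (a row-by-row correlated UPDATE on tbl could split a sub_grp mid-statement).
        con.execute("DROP TABLE IF EXISTS temp.moves")
        con.execute(
            """
            CREATE TEMP TABLE moves AS
            SELECT a.grp, a.sub_grp, MIN(b.sub_grp) AS new_sub_grp
            FROM tbl AS a
            JOIN tbl AS b
              ON b.grp = a.grp AND b.id = a.id AND b.sub_grp < a.sub_grp
            GROUP BY a.grp, a.sub_grp
            """
        )
        (pending,) = con.execute("SELECT COUNT(*) FROM moves").fetchone()
        if pending == 0:
            break  # no sub_grp shares an id with an earlier sub_grp any more
        con.execute(
            """
            UPDATE tbl
            SET sub_grp = (SELECT m.new_sub_grp FROM moves AS m
                           WHERE m.grp = tbl.grp AND m.sub_grp = tbl.sub_grp)
            WHERE EXISTS (SELECT 1 FROM moves AS m
                          WHERE m.grp = tbl.grp AND m.sub_grp = tbl.sub_grp)
            """
        )
    result = con.execute("SELECT grp, id, sub_grp FROM tbl ORDER BY rowid").fetchall()
    con.close()
    return result
```

On the sample data this needs two passes: the first moves sub_grp 4 into 1, 5 into 2, 7 into 6 and 8 into 7; the second moves the former 8 rows on into 6 (and likewise finishes grp 20). The result matches the expected table in the question. Each pass lowers at least one sub_grp, so the loop always terminates.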

Using a Group BY or PARTITION on multiple columns where records are grouped together if ANY of the columns return a match

Let's say I have the following table:
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars
1 | A | V | F | 10
2 | A | W | G | 20
3 | B | W | H | 30
4 | B | X | I | 40
5 | C | Y | F | 50
6 | C | Z | J | 60
If I use GROUP BY or OVER (PARTITION BY) on Match_criteria_1, Match_criteria_2 and Match_criteria_3, I end up with six separate groups/partitions:
SELECT *, sum(Dollars) OVER (PARTITION BY Match_criteria_1, Match_criteria_2, Match_Criteria_3) AS Total_Dollars
FROM My_table
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars | Total_Dollars
1 | A | V | F | 10 | 10
2 | A | W | G | 20 | 20
3 | B | W | H | 30 | 30
4 | B | X | I | 40 | 40
5 | C | Y | F | 50 | 50
6 | C | Z | J | 60 | 60
As you can see, none of the records have the same Match_criteria_1, Match_criteria_2, and Match_criteria_3.
But what if I wanted to group records that have the same Match_criteria_1, Match_criteria_2 OR Match_criteria_3?
Using my example, Record 1 matches Record 2 on Match_criteria_1, Record 2 matches Record 3 on Match_criteria_2, Record 3 matches Record 4 on Match_criteria_1, Record 5 matches Record 1 on Match_criteria_3, and Record 6 matches Record 5 on Match_criteria_1 (so there is a sort of transitive property going on). The desired result is then:
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars | Total_Dollars
1 | A | V | F | 10 | 210
2 | A | W | G | 20 | 210
3 | B | W | H | 30 | 210
4 | B | X | I | 40 | 210
5 | C | Y | F | 50 | 210
6 | C | Z | J | 60 | 210
where Total_Dollars is the sum over every record, because all six records match each other through transitivity. Records 1 and 6 have no match criteria in common, but they are still grouped together because both match Record 5.
Perhaps I am understanding this incorrectly, but I think you could just get the total dollars for all rows that match other rows and join to that? I'm not sure exactly what your expected output should be if a row doesn't match. This answer will still include that row, but the total dollars would not include the amount in that row.
SELECT test.*, total_dollars
FROM test,
(SELECT sum(dollars) as total_dollars
FROM (select distinct test.* FROM test
JOIN test test2 ON (ARRAY[test.match_criteria_1, test.match_criteria_2, test.match_criteria_3] && ARRAY[test2.match_criteria_1, test2.match_criteria_2, test2.match_criteria_3])
AND test.record_id != test2.record_id order by 1
) sub)
sub2;
I added another row that shouldn't match any of the others:
insert into test VALUES (7, 'M', 'N', 'O', 100);
and here are the results:
record_id | match_criteria_1 | match_criteria_2 | match_criteria_3 | dollars | total_dollars
-----------+------------------+------------------+------------------+---------+---------------
1 | A | V | F | 10 | 210
2 | A | W | G | 20 | 210
3 | B | W | H | 30 | 210
4 | B | X | I | 40 | 210
5 | C | Y | F | 50 | 210
6 | C | Z | J | 60 | 210
7 | M | N | O | 100 | 210
(7 rows)
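The query above sums every row that matches any other row, so the unrelated record 7 also gets 210. Grouping by transitive matches is a connected-components problem. One way to express it in SQL is a recursive CTE that collects, for each record, every record reachable through shared criteria, and labels each group by its smallest record_id. Below is a minimal sketch of that technique using Python's sqlite3 module on the question's rows plus record 7. The lowercase table and column names follow the answer. The reachability set grows quadratically with group size, so this is meant for small tables.

```python
import sqlite3

# The question's rows plus the extra unmatched record 7 from the answer:
# (record_id, match_criteria_1, match_criteria_2, match_criteria_3, dollars)
RECORDS = [
    (1, "A", "V", "F", 10), (2, "A", "W", "G", 20), (3, "B", "W", "H", 30),
    (4, "B", "X", "I", 40), (5, "C", "Y", "F", 50), (6, "C", "Z", "J", 60),
    (7, "M", "N", "O", 100),
]


def transitive_totals(rows):
    """Return (record_id, dollars, total_dollars), summing over transitively matching records."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_table (record_id INTEGER PRIMARY KEY, match_criteria_1 TEXT,"
        " match_criteria_2 TEXT, match_criteria_3 TEXT, dollars INTEGER)"
    )
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE
        edges(src, dst) AS (          -- direct matches on ANY of the three criteria
            SELECT a.record_id, b.record_id
            FROM my_table AS a
            JOIN my_table AS b
              ON a.match_criteria_1 = b.match_criteria_1
              OR a.match_criteria_2 = b.match_criteria_2
              OR a.match_criteria_3 = b.match_criteria_3
        ),
        reach(src, dst) AS (          -- every record reachable from src (UNION stops cycles)
            SELECT record_id, record_id FROM my_table
            UNION
            SELECT r.src, e.dst FROM reach AS r JOIN edges AS e ON e.src = r.dst
        ),
        comp(record_id, comp_id) AS ( -- label each group by its smallest record_id
            SELECT src, MIN(dst) FROM reach GROUP BY src
        ),
        totals(comp_id, total) AS (   -- sum dollars per group
            SELECT c.comp_id, SUM(t.dollars)
            FROM my_table AS t JOIN comp AS c ON c.record_id = t.record_id
            GROUP BY c.comp_id
        )
        SELECT t.record_id, t.dollars, tot.total
        FROM my_table AS t
        JOIN comp AS c ON c.record_id = t.record_id
        JOIN totals AS tot ON tot.comp_id = c.comp_id
        ORDER BY t.record_id
        """
    ).fetchall()
    con.close()
    return result
```

With this grouping, records 1 to 6 get a total of 210, while record 7 forms its own group and gets 100.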

Group rows by an incrementing column in PostgreSQL

I have the single-column table shown below. I want to separate the rows into runs with no gap in x: here 1-3, 5-6 and 8-9 (the missing values are 4 and 7).
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
| 5 |
| 6 |
| 8 |
| 9 |
+---+
I would like the result to look like this: a table with two columns, a and b, giving the start and end of each gap-free range of x, with a new row for every gap. How would I go about it in PostgreSQL?
+---+---+
| a | b |
+---+---+
| 1 | 3 |
| 5 | 6 |
| 8 | 9 |
+---+---+
You can compare the sequence with gaps to a sequence without gaps:
select min(x) as a, max(x) as b
from
(
select x,
x-row_number() over (order by x) as dummy
from tab
) as dt
group by dummy
order by a
 x | row_number | x - row_number
---+------------+----------------
 1 |          1 |              0   -- same value for consecutive values without gaps
 2 |          2 |              0
 3 |          3 |              0
 5 |          4 |              1
 6 |          5 |              1
 8 |          6 |              2
 9 |          7 |              2
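The same "x minus row_number" trick can be checked with Python's sqlite3 module. This is a sketch that assumes SQLite 3.25 or newer, which is the first version with ROW_NUMBER(). Inside a gap-free run, x and row_number both go up by 1, so their difference stays constant and can be used as the grouping key.

```python
import sqlite3


def gap_free_ranges(values):
    """Return (a, b) start/end pairs for each run of consecutive integers in values."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tab (x INTEGER)")
    con.executemany("INSERT INTO tab VALUES (?)", [(v,) for v in values])
    # x - ROW_NUMBER() is constant within a run, so it works as a group key.
    ranges = con.execute(
        """
        SELECT MIN(x) AS a, MAX(x) AS b
        FROM (
            SELECT x, x - ROW_NUMBER() OVER (ORDER BY x) AS dummy
            FROM tab
        ) AS dt
        GROUP BY dummy
        ORDER BY a
        """
    ).fetchall()
    con.close()
    return ranges


print(gap_free_ranges([1, 2, 3, 5, 6, 8, 9]))  # [(1, 3), (5, 6), (8, 9)]
```

This assumes the x values are distinct. If x can contain duplicates, replacing ROW_NUMBER() with DENSE_RANK() keeps the difference constant within a run.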

Find other columns' values based on the maximum of one column, grouped by a particular column

I have data like this:
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 45 | 3 | A |
| 78 | 4 | A |
| 52 | 5 | A |
| 24 | 6 | A |
| 22 | 1 | B |
| 22 | 2 | B |
| 34 | 3 | B |
| 37 | 4 | B |
| 52 | 5 | B |
| 34 | 6 | B |
| 13 | 1 | C |
| 30 | 2 | C |
| 57 | 3 | C |
| 111 | 4 | C |
| 35 | 5 | C |
+-------+---------+--------+
I want to find the Mindif and Device of the row with the maximum Count for each Device.
The output should look like:
+-------+---------+--------+
| Count | Mindif | Device |
+-------+---------+--------+
| 78 | 4 | A |
| 52 | 5 | B |
| 111 | 4 | C |
+-------+---------+--------+
You can use a query like this:
SELECT t1.Count, t1.Mindif, t1.Device
FROM mytable AS t1
JOIN (
SELECT Device, MAX(Count) AS Count
FROM mytable
GROUP BY Device
) AS t2 ON t1.Device = t2.Device AND t1.Count = t2.Count
The query uses a derived table that returns the max Count value per Device. Joining it back to the original table gives the desired result.
Using a window function (rank within each Device):
SELECT Count, Mindif, Device
FROM
(SELECT Count, Mindif, Device,
rank() over (partition by Device order by Count desc) as r
FROM table) S
WHERE S.r = 1;
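OR
A semi join with the per-Device MAX (LEFT SEMI JOIN is Hive / Spark SQL syntax):
SELECT a.* FROM table a
LEFT SEMI JOIN
(SELECT Device, MAX(Count) Cnt
FROM table
GROUP BY Device) b ON (a.Device = b.Device AND a.Count = b.Cnt)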
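Here is a minimal sketch of the first approach (a derived table with the per-Device MAX, joined back to the table), run against the question's data with Python's built-in sqlite3 module. The table name mytable comes from the answer. The Count column is quoted only as a precaution, because count is also an aggregate function name. If two rows in a Device tie for the maximum Count, both are returned.

```python
import sqlite3

# Sample rows from the question: (Count, Mindif, Device)
DEVICE_ROWS = [
    (45, 3, "A"), (78, 4, "A"), (52, 5, "A"), (24, 6, "A"),
    (22, 1, "B"), (22, 2, "B"), (34, 3, "B"), (37, 4, "B"), (52, 5, "B"), (34, 6, "B"),
    (13, 1, "C"), (30, 2, "C"), (57, 3, "C"), (111, 4, "C"), (35, 5, "C"),
]


def max_count_rows(rows):
    """Return (Count, Mindif, Device) for the row(s) with the highest Count per Device."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE mytable ("Count" INTEGER, Mindif INTEGER, Device TEXT)')
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT t1."Count", t1.Mindif, t1.Device
        FROM mytable AS t1
        JOIN (                        -- max Count per Device
            SELECT Device, MAX("Count") AS max_count
            FROM mytable
            GROUP BY Device
        ) AS t2 ON t1.Device = t2.Device AND t1."Count" = t2.max_count
        ORDER BY t1.Device
        """
    ).fetchall()
    con.close()
    return result


print(max_count_rows(DEVICE_ROWS))  # [(78, 4, 'A'), (52, 5, 'B'), (111, 4, 'C')]
```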

Select distinct combinations of values

I have a table with two INT columns, X and Y. I want to group the X values by their combination of Y values, so that X values with exactly the same set of Y values form one group, and I also want to see how many X values share each combination.
I tried using SUM(POWER(2, Y)), but that generates numbers that are too big, because Y can go up to about 300 in some cases.
+--------------+--------------+
| X | Y |
+--------------+--------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 4 |
| 1 | 6 |
| 2 | 1 |
| 2 | 2 |
| 2 | 4 |
| 2 | 6 |
| 3 | 2 |
| 3 | 3 |
| 3 | 5 |
| 4 | 2 |
| 4 | 3 |
| 4 | 5 |
| 5 | 2 |
| 5 | 3 |
| 5 | 6 |
+--------------+--------------+
I want the result to look something like:
+--------------+--------------+
| X | COUNT |
+--------------+--------------+
| 1 | 2 |
| 3 | 2 |
| 5 | 1 |
+--------------+--------------+
Based on your description (though not on your sample data), the following query should work:
select X, count(distinct Y)
from TBL
group by X
Thanks for trying to help. I realize that it might have been hard to understand what I was trying to do.
Anyway, I ended up solving it with the checksum_agg aggregate function.
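CHECKSUM_AGG is SQL Server-specific, and because it is a checksum, different Y sets can in principle produce the same value. A different technique compares the sets exactly, in relational-division style. Two X values have the same Y set when the number of Y values they share equals the size of both sets. Below is a minimal sketch of that idea using the question's data and Python's sqlite3 module. The table name tbl is hypothetical (the answer used TBL). The smallest X in each group serves as its label, which matches the X column in the desired output.

```python
import sqlite3

# (X, Y) rows from the question
PAIRS = [
    (1, 1), (1, 2), (1, 4), (1, 6),
    (2, 1), (2, 2), (2, 4), (2, 6),
    (3, 2), (3, 3), (3, 5),
    (4, 2), (4, 3), (4, 5),
    (5, 2), (5, 3), (5, 6),
]


def group_identical_y_sets(pairs):
    """Return (representative X, number of X values) for each distinct set of Y values."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (X INTEGER, Y INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", pairs)
    result = con.execute(
        """
        WITH sizes AS (               -- how many distinct Y values each X has
            SELECT X, COUNT(DISTINCT Y) AS n FROM tbl GROUP BY X
        ),
        same_set AS (                 -- (x1, x2) pairs whose Y sets are identical
            SELECT a.X AS x1, b.X AS x2
            FROM tbl AS a
            JOIN tbl AS b ON b.Y = a.Y
            JOIN sizes AS sa ON sa.X = a.X
            JOIN sizes AS sb ON sb.X = b.X
            GROUP BY a.X, b.X, sa.n, sb.n
            HAVING COUNT(DISTINCT a.Y) = sa.n AND sa.n = sb.n
        ),
        rep AS (                      -- smallest X with the same set labels the group
            SELECT x1, MIN(x2) AS rep_x FROM same_set GROUP BY x1
        )
        SELECT rep_x AS X, COUNT(*) AS cnt
        FROM rep
        GROUP BY rep_x
        ORDER BY rep_x
        """
    ).fetchall()
    con.close()
    return result


print(group_identical_y_sets(PAIRS))  # [(1, 2), (3, 2), (5, 1)]
```

The self-join grows with the number of X values that share each Y, so it may be slow on large tables. In exchange, it never merges two different sets.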