Use 'group by' on one column but get data of multiple columns - sql

How can you manage to make a 'group by' on one column and still get data (the real data, not a 'sum') on the others?
Let me show an example of what I'd like to do:
Suppose Table A, with index on 'Group', a simple select * from A gives:
Group            Album
---------------- ---------------
ABBA             Waterloo
AC/DC            Back in Black
ABBA             Voulez-vous
ABBA             Super Trooper
Imagine Dragons  Night Visions
AC/DC            Highway to Hell
ABBA             The Visitors
I'd like the end result to look as follows (knowing that a group cannot have more than 4 albums, at least for now):
Group            Album1          Album2          Album3          Album4
---------------- --------------- --------------- --------------- ---------------
ABBA             Waterloo        Voulez-vous     Super Trooper   The Visitors
AC/DC            Back in Black   Highway to Hell Null            Null
Imagine Dragons  Night Visions   Null            Null            Null
So far, the closest I've come to make what I want is something like the following:
select tab4.GROUP,
tab1.ALBUM as PN1,
tab2.ALBUM as PN2,
tab3.ALBUM as PN3,
tab4.ALBUM as PN4
from
(
select A.GROUP, A.ALBUM
from A
where A.ROWID in
(select max(ROWID) from A
where GROUP in (select GROUP from A A group by A.GROUP having count(*) <= 4)
group by GROUP
)
) tab4
left join
(
select A.GROUP, A.ALBUM
from A A
where A.ROWID in
(select max(ROWID) from A
where GROUP in (select GROUP from A A group by A.GROUP having count(*) <= 3)
group by GROUP
)
) tab3 on tab4.GROUP = tab3.GROUP
left join
(
select A.GROUP, A.ALBUM
from A A
where A.ROWID in
(select max(ROWID) from A
where GROUP in (select GROUP from A A group by A.GROUP having count(*) <= 2)
group by GROUP
)
)tab2 on tab4.GROUP = tab2.GROUP
left join
(
select A.GROUP, A.ALBUM
from A A
where A.ROWID in
(select max(ROWID) from A
where GROUP in (select GROUP from A A group by A.GROUP having count(*) <= 1)
group by GROUP
)
) tab1 on tab4.GROUP = tab1.GROUP;
I know why the SQL request above is wrong: the max(rowid) stays the same whatever HAVING count(*) condition is applied.
A pivot could probably be used, but I honestly don't see how, as I have only one table and need to get all the data.
As a further clarification, I don't need the result in a specific order, and I can limit myself to 4 albums because I know no 'Group' will have more than that ... but I'd appreciate something generic.
EDIT: Ok, seems I have forgotten to clarify that I'm on Oracle 10g (damn this legacy code ^^) so newer functions like PIVOT won't work.
Also, I'm not looking for a string aggregation like LISTAGG but really for separate columns.

@Alex Poole got it right: I was not only missing the 10g equivalent of PIVOT, but also ROW_NUMBER().
So the answer to my problem is as follows:
select
tab1.group_name,
MAX(CASE WHEN tab1.rank_number = 1 THEN tab1.album_name ELSE NULL END) AS ALBUM_1,
MAX(CASE WHEN tab1.rank_number = 2 THEN tab1.album_name ELSE NULL END) AS ALBUM_2,
MAX(CASE WHEN tab1.rank_number = 3 THEN tab1.album_name ELSE NULL END) AS ALBUM_3,
MAX(CASE WHEN tab1.rank_number = 4 THEN tab1.album_name ELSE NULL END) AS ALBUM_4
from (
select group_name, album_name,
row_number() over (partition by group_name order by album_name) as rank_number
from tablea
) tab1
group by tab1.group_name;
Not sure if my title is the best for the kind of problem I had, guess I'll keep it as it is since it revolves around group by as well.

I believe this should work on 10g:
with r as (
select
group_,
album,
row_number() over (partition by group_ order by album) r
from
tq84_table_a
)
select
r.group_,
max(case when r.r=1 then r.album end) album1,
max(case when r.r=2 then r.album end) album2,
max(case when r.r=3 then r.album end) album3,
max(case when r.r=4 then r.album end) album4
from
r
group by
r.group_;
I don't have a 10g installation at hand right now, so I can't test it.
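For readers without an Oracle instance at hand, the row_number() plus MAX(CASE ...) pattern used in both answers can be tried out with a minimal sketch in Python's built-in sqlite3 module. This assumes a SQLite build of 3.25 or later (needed for window functions) and a hypothetical table named table_a with columns group_name and album_name. It is not Oracle 10g itself, only the same standard SQL run elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the sample rows come from the question.
conn.execute("CREATE TABLE table_a (group_name TEXT, album_name TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [
        ("ABBA", "Waterloo"),
        ("AC/DC", "Back in Black"),
        ("ABBA", "Voulez-vous"),
        ("ABBA", "Super Trooper"),
        ("Imagine Dragons", "Night Visions"),
        ("AC/DC", "Highway to Hell"),
        ("ABBA", "The Visitors"),
    ],
)

pivot_sql = """
SELECT group_name,
       MAX(CASE WHEN rn = 1 THEN album_name END) AS album1,
       MAX(CASE WHEN rn = 2 THEN album_name END) AS album2,
       MAX(CASE WHEN rn = 3 THEN album_name END) AS album3,
       MAX(CASE WHEN rn = 4 THEN album_name END) AS album4
FROM (
    -- Number the albums inside each group, alphabetically.
    SELECT group_name, album_name,
           ROW_NUMBER() OVER (PARTITION BY group_name ORDER BY album_name) AS rn
    FROM table_a
) numbered
GROUP BY group_name
ORDER BY group_name
"""

pivot_rows = conn.execute(pivot_sql).fetchall()
for row in pivot_rows:
    print(row)
```

The albums land in alphabetical order because of ORDER BY album_name inside the window, which is acceptable here since the question states that the order does not matter. Groups with fewer than four albums get NULL (None in Python) in the remaining columns.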

Related

How to get rid of VIEW in this request

CREATE VIEW A1 AS
SELECT client_ID , COUNT(dog_id)
FROM test_clients
GROUP BY client_ID
HAVING COUNT(dog_id)=2;
CREATE VIEW A2 AS
SELECT filial , COUNT(A1.client_ID)
FROM A1
JOIN test_clients USING (client_ID)
GROUP BY filial
HAVING COUNT(A1.client_ID)>10;
SELECT COUNT(filial)
FROM A2;
As far as I understand, this can be done through a subquery, but how?
It boils down to:
SELECT count(*)
FROM (
SELECT 1
FROM (
SELECT client_id
FROM test_clients
GROUP BY 1
HAVING count(dog_id) = 2
) a1
JOIN test_clients USING (client_id)
GROUP BY filial
HAVING count(*) > 10
) a2;
Assuming filial is defined NOT NULL.
Probably faster to use a window function and get rid of the self-join:
SELECT count(*)
FROM (
SELECT 1
FROM (
SELECT filial
, count(dog_id) OVER (PARTITION BY client_id) AS dog_ct
FROM test_clients
) a1
WHERE dog_ct = 2
GROUP BY filial
HAVING count(*) > 10
) a2;
Depending on your exact table definition we might be able to optimize a bit further ...
A slight refactor of Erwin's suggestion, just for you to play around with...
The outer query works because...
the inner query happens first
the WHERE clause happens next
then the GROUP BY and HAVING clauses
then the SELECT clause (so the COUNT() OVER ())
finally the DISTINCT
SELECT
DISTINCT
COUNT(filial) OVER ()
FROM
(
SELECT
filial,
client_id,
COUNT(dog_id) OVER (PARTITION BY client_id) AS client_dog_ct
FROM
test_clients
)
count_dogs
WHERE
client_dog_ct = 2
GROUP BY
filial
HAVING
COUNT(DISTINCT client_id) > 10
You may or may not want the COUNT(DISTINCT client_id); it's not clear from the question. So, play with that too.
I'm not saying it's any better, just that it's different and might help your learning.
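To see why the choice between count(*) and COUNT(DISTINCT client_id) matters, here is a small sketch using Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). The data is made up for illustration: a client with two dogs contributes two rows, so a filial with only 6 such clients already has 12 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_clients (client_id INTEGER, dog_id INTEGER, filial TEXT NOT NULL)"
)

# Made-up data: (filial, first client id, number of clients, dogs per client).
layout = [("north", 1, 11, 2), ("south", 100, 6, 2), ("east", 200, 3, 1)]
rows, dog_id = [], 0
for filial, first_client, n_clients, dogs_each in layout:
    for client_id in range(first_client, first_client + n_clients):
        for _ in range(dogs_each):
            dog_id += 1
            rows.append((client_id, dog_id, filial))
conn.executemany("INSERT INTO test_clients VALUES (?, ?, ?)", rows)

# Same shape as the window-function answer; only the HAVING expression varies.
template = """
SELECT count(*)
FROM (
    SELECT filial
    FROM (
        SELECT filial, client_id,
               count(dog_id) OVER (PARTITION BY client_id) AS dog_ct
        FROM test_clients
    ) a1
    WHERE dog_ct = 2
    GROUP BY filial
    HAVING {having} > 10
) a2
"""

filials_by_rows = conn.execute(template.format(having="count(*)")).fetchone()[0]
filials_by_clients = conn.execute(
    template.format(having="count(DISTINCT client_id)")
).fetchone()[0]
print(filials_by_rows, filials_by_clients)
```

With this data the row-based variant reports 2 filials (north with 22 rows, south with 12), while the client-based variant reports only 1 (north with 11 clients). The original views count rows, so the first variant is the faithful translation; the second is what you want if "more than 10 clients" is the real requirement.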

select value based on max of other column

I have a few questions about a table I'm trying to make in Postgres.
The following table is my input:
id  area  count  function
--  ----  -----  ---------
1   100   20     living
1   200   30     industry
2   400   10     living
2   400   10     industry
2   400   20     education
3   150   1      industry
3   150   1      education
I want to group by id, summing area and count over each group, and pick the dominant function, i.e. the one with the max area. When areas are equal, the choice should be based on max count; when both area and count are equal, it should be based on a priority among functions (I still have to decide whether education takes priority over industry or vice versa). So the result should be:
id  area  count  function
--  ----  -----  ---------
1   300   50     industry
2   1200  40     education
3   300   2      industry
I tried a lot of things and maybe it's easy, but I don't get it. Can someone help me get the right SQL?
One method uses row_number() and conditional aggregation:
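select id, sum(area) as area, sum(count) as count,
max(function) filter (where seqnum = 1) as function
from (select t.*,
-- count and function act as tie-breakers when areas are equal
row_number() over (partition by id order by area desc, count desc, function desc) as seqnum
from t
) t
group by id;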
Another method uses distinct on:
select distinct on (id) id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
function
from t
order by id, area desc, count desc, function desc;
Use a scalar sub-query for "function".
select t.id, sum(t.area), sum(t.count),
(
select "function"
from the_table
where id = t.id
order by area desc, count desc, "function" desc
limit 1
) as "function"
from the_table as t
group by t.id order by t.id;
You can use sum as a window function:
select distinct on (t.id)
id,
sum(area) over (partition by id) as area,
sum(count) over (partition by id) as count,
( select function from tbl_test where tbl_test.id = t.id order by area desc, count desc limit 1 ) as function
from tbl_test t;
This is how you get the function for each group based on id:
select yt1.id, yt1.function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null;
(we ensure that no yt2 exists with the same id but a higher area)
This would work nicely, but several rows might share the max area while having different functions. To cope with this issue, let's ensure that exactly one is chosen:
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id;
Now, let's join this to our main table:
select yourtable.id, sum(yourtable.area), sum(yourtable.count), t.function
from yourtable
join (
select yt1.id, max(yt1.function) as function
from yourtable yt1
left join yourtable yt2
on yt1.id = yt2.id and yt1.area < yt2.area
where yt2.id is null
group by yt1.id
) t
on yourtable.id = t.id
group by yourtable.id, t.function;
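The full tie-breaking rule from the question (max area, then max count, then a priority among functions) can be checked against the sample data with a short sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). The table name t is a placeholder, and ordering the function names descending is only an assumption standing in for the undecided priority; it happens to rank industry above education, which matches the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "count" and "function" are quoted because they double as SQL keywords.
conn.execute('CREATE TABLE t (id INTEGER, area INTEGER, "count" INTEGER, "function" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 100, 20, "living"),
        (1, 200, 30, "industry"),
        (2, 400, 10, "living"),
        (2, 400, 10, "industry"),
        (2, 400, 20, "education"),
        (3, 150, 1, "industry"),
        (3, 150, 1, "education"),
    ],
)

dominant_sql = """
SELECT id,
       SUM(area)    AS area,
       SUM("count") AS "count",
       MAX(CASE WHEN seqnum = 1 THEN "function" END) AS "function"
FROM (
    -- seqnum = 1 marks the dominant row: area, then count, then function name.
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY area DESC, "count" DESC, "function" DESC
           ) AS seqnum
    FROM t
) ranked
GROUP BY id
ORDER BY id
"""

dominant_rows = conn.execute(dominant_sql).fetchall()
for row in dominant_rows:
    print(row)
```

If the priority should not be alphabetical, the last ORDER BY term could be replaced by a CASE expression that maps each function to an explicit rank.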

Put together two selects into one

Could you help me put the second select into the first one? I need to calculate the rate of type in the first select. The second select works fine.
First select:
WITH "global" AS (
SELECT
m.id
,json_build_array(
ce.payload->>'Name',
ce.payload->>'Date',
ce.payload->>'Type',
ce.payload->>'Rate',
row_number() over (partition by m.id order by ce.payload->>'Date' desc)) as "value"
FROM public."events" ce
LEFT OUTER JOIN "external"."mapping" m
ON ce.id=m.id
WHERE ce.type IN ('cs_calls','pc_calls')
AND coalesce(ce.payload ->> 'Name', '')!=''
AND m.id IS NOT NULL
)
SELECT
id,
value
FROM "global"
Second select:
select
id,
cast(issue as float)/cast(total_count as float) as Rate
from (select
id,
sum(case when type='Issue' then 1 else 0 end) as issue,
count(*) total_count
from events
GROUP BY id)
If id is the way to join these tables, then you can try the following (the WITH "global" AS (...) definition from the first select still has to precede it):
select
g.id,
g.value,
((issue * 1.0) / total_count) as Rate
from
(
select
id,
sum(case when type='Issue' then 1 else 0 end) as issue,
count(*) total_count
from events
group by
id
) e
join global g
on e.id = g.id
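The `issue * 1.0` in the answer (like the casts to float in the question) is there because dividing two integers truncates. A minimal sketch with Python's sqlite3 module shows the difference; SQLite truncates integer division the same way Postgres does. The events table here is a made-up stand-in with only the id and type columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up stand-in for the events table: only the columns the rate needs.
conn.execute("CREATE TABLE events (id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "Issue"), (1, "Other"), (1, "Other"), (1, "Other"), (2, "Issue"), (2, "Issue")],
)

rate_sql = """
SELECT id,
       issue / total_count       AS truncated_rate,  -- integer / integer
       issue * 1.0 / total_count AS rate             -- forced to floating point
FROM (
    SELECT id,
           SUM(CASE WHEN type = 'Issue' THEN 1 ELSE 0 END) AS issue,
           COUNT(*) AS total_count
    FROM events
    GROUP BY id
) e
ORDER BY id
"""

rate_rows = conn.execute(rate_sql).fetchall()
for row in rate_rows:
    print(row)
```

For id 1 (one issue out of four events) the truncated column is 0 while the floating-point column is 0.25.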

Take precedence on a specific value from a table

For each person's distinct record that has a toyota,
only take toyota and filter out that person's other cars
else bring all cars.
The actual script will not match my logic above. I was trying to simplify my question by using random names and car brands, but the objective was the same since I wanted to get a specific address code and filter out the rest if it did exist for other vendor names (see below). Thank you, GMB.
GPMEM.dbo.PM00200 a -- Vendor Master
LEFT JOIN GPMEM.dbo.PM30200 b -- Historical/Paid Transactions
ON a.VENDORID = b.VENDORID
LEFT JOIN GPMEM.dbo.PM20000 c -- Open/Posted Transactions
ON a.VENDORID = c.VENDORID
LEFT JOIN (
SELECT d.*,
rank() over(
partition by d.VENDORID
order by case when d.ADRSCODE = 'ACH' THEN 0 ELSE 1 END
)rn
FROM GPMEM.dbo.PM00300 d
) d -- Vendor Address Master
ON a.VENDORID = d.VENDORID
WHERE
d.rn = 1
You can use window functions:
select colA, colB
from (
select
t.*,
rank() over(
partition by colA
order by case when colB = 'Toyota' then 0 else 1 end
) rn
from mytable t
) t
where rn = 1
The trick lies in the order by clause in the over() clause of the window function rank(): if a person has a Toyota, it will be ranked first, and their (possible) other cars will be ranked second. If they have no Toyota, all their cars will be ranked first.
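The ranks that this ORDER BY produces can be inspected with a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or later). The people and cars are made up; the table and column names follow the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (colA TEXT, colB TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("alice", "Toyota"),
        ("alice", "Honda"),
        ("bob", "Ford"),
        ("bob", "BMW"),
        ("carol", "Toyota"),
    ],
)

ranked = conn.execute(
    """
    SELECT colA, colB,
           RANK() OVER (
               PARTITION BY colA
               ORDER BY CASE WHEN colB = 'Toyota' THEN 0 ELSE 1 END
           ) AS rn
    FROM mytable
    ORDER BY colA, colB
    """
).fetchall()

# Keeping rn = 1 reproduces the filter of the outer query.
kept = [(person, car) for person, car, rn in ranked if rn == 1]
print(ranked)
print(kept)
```

Because rank() gives tied rows the same rank, both of bob's cars get rank 1 and survive the filter, whereas row_number() would keep only one of them. Note that the string comparison is case-sensitive here, so 'Toyota' and 'toyota' are different values.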
You can do this with filtering logic:
select t.*
from t
where t.colb = 'toyota' or
not exists (select 1 from t t2 where t2.cola = t.cola and t2.colb = 'toyota');
If I were to use window functions for this, I would simply count the toyotas:
select t.*
from (select t.*,
sum(case when colb = 'toyota' then 1 else 0 end) over (partition by cola) as num_toyotas
from t
) t
where colb = 'toyota' or num_toyotas = 0;

Group By Retrieve 4 Values

I have the following query
SELECT Cod ,
MIN(Id) AS id_Min,
-- retrieve value min in the middle as id_Min_Middle,
-- retrieve value max in the middle as id_Max_Middle,
MAX(Id) AS id_Max,
COUNT(*) AS Tot
FROM Table a ( NOLOCK )
GROUP BY Cod
HAVING COUNT(*)=4
How could I retrieve the values between min and max as I have done for min and max?
If I use SUM(Id) - (MIN(Id) + MAX(Id)), I get the sum of the two middle values, but not the individual values I want.
EXAMPLES
Cod      | Id
Stack    | 10
Stack    | 15
Stack    | 11
Stack    | 40
Overflow | 1
Overflow | 120
Overflow | 15
Overflow | 100
Required output
Cod      | Min | Min_In_The_Middle | Max_In_The_Middle | Max
Stack    | 10  | 11                | 15                | 40
Overflow | 1   | 15                | 100               | 120
This needs just one [Table | [Clustered] Index] Scan:
SELECT pvt.Cod,
pvt.[1] AS MinValue,
pvt.[2] AS MinInterValue,
pvt.[3] AS MaxInterValue,
pvt.[4] AS MaxValue
FROM
(
SELECT x.Cod, x.ID, x.RowNumAsc
FROM
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID ASC) RowNumAsc,
ROW_NUMBER() OVER(PARTITION BY t.Cod ORDER BY t.ID DESC) RowNumDesc
FROM MyTable t
) x
WHERE x.RowNumAsc = 1 AND x.RowNumDesc = 4
OR x.RowNumAsc = 2 AND x.RowNumDesc = 3
OR x.RowNumAsc = 3 AND x.RowNumDesc = 2
OR x.RowNumAsc = 4 AND x.RowNumDesc = 1
) y
PIVOT ( MAX(y.ID) FOR y.RowNumAsc IN ([1], [2], [3], [4]) ) pvt;
Try using this, best of luck
WITH temp AS
(SELECT cod, MIN (ID) min_id, MAX (ID) max_id
FROM tab
GROUP BY cod
HAVING COUNT (ID) = 4)
SELECT temp.cod, temp.min_id,
(SELECT MIN (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.min_id)
GROUP BY cod) min_mid_id,
(SELECT MAX (ID)
FROM tab
WHERE cod = temp.cod AND ID NOT IN (temp.max_id)
GROUP BY cod) max_mid_id, temp.max_id
FROM temp;
I'm not sure what it means for your question to be tagged plsql and sql-server. But I'll assume you're working with a database system that supports CTEs and window functions.
To generalize what you've been trying to do, first assign row numbers to the rows, then use whatever technique you want to achieve the pivot:
;WITH OrderedValues as (
SELECT Cod,Id,ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Id) as rn,
COUNT(*) OVER (PARTITION BY Cod) as Cnt
FROM Table (NOLOCK)
), With4Values as (
SELECT * from OrderedValues where Cnt=4
)
SELECT Cod,
--However you want to do the pivot. Here I'll use MAX/CASE
MAX(CASE WHEN rn=1 THEN Id END) as Value1,
MAX(CASE WHEN rn=2 THEN Id END) as Value2,
MAX(CASE WHEN rn=3 THEN Id END) as Value3,
MAX(CASE WHEN rn=4 THEN Id END) as Value4
FROM
With4Values
GROUP BY
Cod
You can hopefully see that this is more easily extended to more columns than answering your overly specific questions about 3 rows, or 4 rows. But if you need to deal with an arbitrary number of columns, you'll have to switch to dynamic SQL.
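As a quick check of the row-number-then-pivot approach against the sample data, here is a sketch in Python's sqlite3 module (assuming SQLite 3.25 or later; this is not SQL Server, so the NOLOCK hint is dropped). The table name vals is a placeholder, and the extra 'Short' group is made up to show that groups without exactly four rows are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (cod TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO vals VALUES (?, ?)",
    [
        ("Stack", 10), ("Stack", 15), ("Stack", 11), ("Stack", 40),
        ("Overflow", 1), ("Overflow", 120), ("Overflow", 15), ("Overflow", 100),
        ("Short", 5), ("Short", 6), ("Short", 7),  # only three rows: excluded
    ],
)

four_values_sql = """
WITH ordered AS (
    SELECT cod, id,
           ROW_NUMBER() OVER (PARTITION BY cod ORDER BY id) AS rn,
           COUNT(*)     OVER (PARTITION BY cod)             AS cnt
    FROM vals
)
SELECT cod,
       MAX(CASE WHEN rn = 1 THEN id END) AS min_value,
       MAX(CASE WHEN rn = 2 THEN id END) AS min_in_the_middle,
       MAX(CASE WHEN rn = 3 THEN id END) AS max_in_the_middle,
       MAX(CASE WHEN rn = 4 THEN id END) AS max_value
FROM ordered
WHERE cnt = 4
GROUP BY cod
ORDER BY cod
"""

four_values = conn.execute(four_values_sql).fetchall()
for row in four_values:
    print(row)
```

Extending this to five or six values only takes another MAX(CASE ...) line per column and a different cnt filter.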
I understand you want to exclude the extreme values and find min and max for the rest.
This is what I think of, but I had no chance to run and test it...
WITH Extremes AS ( SELECT Cod, MAX(ID) AS Id_Max, MIN(ID) AS Id_Min
FROM [Table] a GROUP BY Cod)
SELECT
e.Cod,
e.Id_Min,
MIN(a.Id) AS id_Min_Middle,
MAX(a.Id) AS id_Max_Middle,
e.Id_Max
FROM Extremes e
LEFT JOIN [Table] a ON a.Cod = e.Cod AND a.Id > e.Id_Min AND a.Id < e.Id_Max
GROUP BY e.Cod, e.Id_Min, e.Id_Max