How to use partition on a join to get a count

I'm confused about how to get a count on a join without using group by.
I know I can get the desired results using group by, but the table joins are long and the select list has many columns with case statements, so I was hoping to avoid that.
I'm sure I've seen this done before using partition over, but I can't find a good example of using it on a join. Maybe it's not possible!?
I've tried:
select
p.FirstName,
p.Surname,
count(pr.RelativePersonId) over (partition by pr.RelativePersonId) as [RelativesOnRecord]
from People p
left join PersonRelatives pr
on p.PersonId = pr.PersonId
For my tables:
People
PersonId | FirstName | Surname
1 | Jim | Bo
2 | Harry | Bo
3 | Strong | Bo
PersonRelatives
Id | PersonId | RelativePersonId
1 | 1 | 2
2 | 1 | 3
Where I'm trying to get
PersonId | FirstName | Surname | RelativesOnRecord
1 | Jim | Bo | 2
I also tried joining with a SELECT TOP 1, but that just gives me one row and therefore a count of one. Is this even possible without group by?

It seems you are partitioning by the wrong column. You want the number of relatives for each person from People, right? Use:
count(pr.RelativePersonId) over (partition by pr.PersonId) as [RelativesOnRecord]
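To see what the corrected partition does, here is a small sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or later, for window functions) stands in for the asker's database, and the DISTINCT is an assumption added here because a window function does not collapse rows: without it, Jim would appear once per relative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PersonId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
    CREATE TABLE PersonRelatives (Id INTEGER PRIMARY KEY, PersonId INTEGER, RelativePersonId INTEGER);
    INSERT INTO People VALUES (1, 'Jim', 'Bo'), (2, 'Harry', 'Bo'), (3, 'Strong', 'Bo');
    INSERT INTO PersonRelatives VALUES (1, 1, 2), (2, 1, 3);
""")

# The window count is computed per joined row; DISTINCT then removes the
# repeated rows that the join produces for people with several relatives.
query = """
    SELECT DISTINCT
        p.PersonId, p.FirstName, p.Surname,
        COUNT(pr.RelativePersonId) OVER (PARTITION BY pr.PersonId) AS RelativesOnRecord
    FROM People p
    LEFT JOIN PersonRelatives pr ON p.PersonId = pr.PersonId
    ORDER BY p.PersonId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

People without relatives all fall into the NULL partition of pr.PersonId, where COUNT(pr.RelativePersonId) is 0 because COUNT skips NULLs.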

Based on your example, you want aggregation:
select p.PersonId, p.FirstName, p.Surname, count(*) as [RelativesOnRecord]
from People p join
PersonRelatives pr
on p.PersonId = pr.PersonId
group by p.PersonId, p.FirstName, p.Surname;
You could use apply or a correlated subquery, but window functions do not seem appropriate here.
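The correlated-subquery alternative mentioned here could look roughly like the following. It is a sketch run against SQLite via Python's sqlite3 module, using the question's sample rows; the query shape is one possible reading of the suggestion, not necessarily what the answerer had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (PersonId INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT);
    CREATE TABLE PersonRelatives (Id INTEGER PRIMARY KEY, PersonId INTEGER, RelativePersonId INTEGER);
    INSERT INTO People VALUES (1, 'Jim', 'Bo'), (2, 'Harry', 'Bo'), (3, 'Strong', 'Bo');
    INSERT INTO PersonRelatives VALUES (1, 1, 2), (2, 1, 3);
""")

# The subquery is evaluated once per person, so the outer query needs
# neither a join nor a GROUP BY.
query = """
    SELECT p.PersonId, p.FirstName, p.Surname,
           (SELECT COUNT(*)
            FROM PersonRelatives pr
            WHERE pr.PersonId = p.PersonId) AS RelativesOnRecord
    FROM People p
    ORDER BY p.PersonId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Unlike the inner join with GROUP BY, this form also returns people with no relatives, with a count of 0.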

Related

SQL multiple join question, can't join 5 tables, problem with max

I got 6 tables:
Albums
id_album | title | id_band | year |
Bands
id_band | name |style | origin
composers
id_musician | id_song
members
id_musician | id_band | instrument
musicians
id_musician | name | birth | death | gender
songs
id_song | title | duration | id_album
I need to write a query that gets the six bands with the most members and, for each of those bands, the longest song's duration and its title.
So far, I can get the biggest bands:
SELECT bands.name, COUNT(id_musician) AS numberMusician
FROM bands
INNER JOIN members USING (id_band)
GROUP BY bands.name
ORDER BY numberMusician DESC
LIMIT 6;
I can also get the longest songs:
SELECT MAX(duration), songs.title, id_album, id_band
FROM SONGs
INNER JOIN albums USING (id_album)
GROUP BY songs.title, id_album, id_band
ORDER BY MAX(duration) DESC
The problem occurs when I am trying to write a subquery to get the band with the corresponding song and its duration. Trying to do it with inner joins also gets me undesired results. Could someone help me?
I have tried to put the subquery in the where, but I can't find how to do it due to MAX.
Thanks

I find that using lateral joins makes the query easier to write. You already have the join logic right, so we just need to correlate the bands with the musicians and the songs.
So:
select b.name, m.*, s.*
from bands b
cross join lateral (
select count(*) as cnt_musicians
from members m
where m.id_band = b.id_band
) m
cross join lateral (
select s.title, s.duration
from songs s
inner join albums a using (id_album)
where a.id_band = b.id_band
order by s.duration desc limit 1
) s
order by m.cnt_musicians desc
limit 6
For each band, subquery m counts the number of musicians (its where clause correlates to the outer query), while s retrieves the longest song, using correlation, order by and limit. The outer query just combines the information, then orders the bands and selects the top 6.
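On an engine without LATERAL, the same per-band correlation can be approximated with scalar correlated subqueries. The sketch below does that in SQLite through Python's sqlite3 module. It is a swapped-in technique, not the lateral join itself, and the band names, songs and the limit of 2 are invented purely to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bands (id_band INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE members (id_musician INTEGER, id_band INTEGER);
    CREATE TABLE albums (id_album INTEGER PRIMARY KEY, id_band INTEGER);
    CREATE TABLE songs (id_song INTEGER PRIMARY KEY, title TEXT, duration INTEGER, id_album INTEGER);
    -- Hypothetical sample data
    INSERT INTO bands VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO members VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
    INSERT INTO albums VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO songs VALUES
        (1, 'A-short', 120, 10), (2, 'A-long', 300, 10),
        (3, 'B-only', 200, 20),
        (4, 'G-one', 250, 30), (5, 'G-two', 180, 30);
""")

# Each scalar subquery correlates on b.id_band, playing the role of one
# column produced by the lateral subqueries m and s.
query = """
    SELECT b.name,
           (SELECT COUNT(*) FROM members m
            WHERE m.id_band = b.id_band) AS cnt_musicians,
           (SELECT s.title FROM songs s JOIN albums a USING (id_album)
            WHERE a.id_band = b.id_band
            ORDER BY s.duration DESC LIMIT 1) AS title,
           (SELECT MAX(s.duration) FROM songs s JOIN albums a USING (id_album)
            WHERE a.id_band = b.id_band) AS duration
    FROM bands b
    ORDER BY cnt_musicians DESC
    LIMIT ?
"""
top_bands = conn.execute(query, (2,)).fetchall()  # the question would use 6
for row in top_bands:
    print(row)
```

The lateral version stays more compact when several columns come from the same correlated row, since one subquery can return both title and duration.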

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU) STU
LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim@email.com
1 sarah@email.com
2 paul@email.com
2 tim@email.com
3 bill@email.com
3 frank@email.com
3 joyce@email.com
4 greg@email.com
5 tony@email.com
5 sam@email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim@email.com
2 paul@email.com
3 frank@email.com
4 greg@email.com
5 sam@email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.

I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery to give you control over which email you get: the longest, shortest, oldest, or whatever. Without an ORDER BY, it returns one arbitrary email per student, which is what you are asking for.
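The effect of adding an ORDER BY can be sketched outside SQL Server as well. The example below uses ROW_NUMBER() in SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for OUTER APPLY with TOP (1), and picks the alphabetically first email. The table names stu and par are simplified, and student 6, who has no parent row, is a hypothetical addition to show that the outer join keeps such students.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stu (id INTEGER PRIMARY KEY);
    CREATE TABLE par (id INTEGER, em TEXT);
    INSERT INTO stu VALUES (1), (2), (3), (4), (5), (6);  -- student 6 is hypothetical
    INSERT INTO par VALUES
        (1, 'jim@email.com'), (1, 'sarah@email.com'),
        (2, 'paul@email.com'), (2, 'tim@email.com'),
        (3, 'bill@email.com'), (3, 'frank@email.com'), (3, 'joyce@email.com'),
        (4, 'greg@email.com'),
        (5, 'tony@email.com'), (5, 'sam@email.com');
""")

# Number each student's emails alphabetically, then keep only the first.
query = """
    SELECT id, em
    FROM (
        SELECT s.id, p.em,
               ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY p.em) AS rn
        FROM stu s
        LEFT JOIN par p ON p.id = s.id
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Changing the ORDER BY inside the OVER clause changes which email wins, in the same way an ORDER BY inside the applied subquery would.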

If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.
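As a quick check of what "greatest" means here, the following sketch runs the same correlated subquery in SQLite through Python's sqlite3 module, with the question's sample emails. The unqualified table names stu and par are a simplification of db1.stu and db1.par.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stu (id INTEGER PRIMARY KEY);
    CREATE TABLE par (id INTEGER, em TEXT);
    INSERT INTO stu VALUES (1), (2), (3), (4), (5);
    INSERT INTO par VALUES
        (1, 'jim@email.com'), (1, 'sarah@email.com'),
        (2, 'paul@email.com'), (2, 'tim@email.com'),
        (3, 'bill@email.com'), (3, 'frank@email.com'), (3, 'joyce@email.com'),
        (4, 'greg@email.com'),
        (5, 'tony@email.com'), (5, 'sam@email.com');
""")

# MAX on a text column returns the value that sorts last.
query = """
    SELECT s.id AS student_id,
           (SELECT MAX(p.em) FROM par p WHERE p.id = s.id) AS parent_email
    FROM stu s
    ORDER BY s.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Swapping MAX for MIN would return the email that sorts first instead.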

Summarize Null Values in Table with Group By

I have two tables:
Person(ID, Name)
Sports(person_ID, Sport)
The Problem: Sport can have NULL values. And if that is the case then if I group by ID the sport should be NULL.
SELECT p.ID, p.Name, s.Sport
FROM Person p
INNER JOIN Sports s ON p.ID=s.person_id
GROUP BY p.ID
Without the Group By the table looks like this:
p.ID p.Name s.Sport
1 tom soccer
1 tom NULL
2 lisa golf
2 lisa soccer
3 tim golf
3 tim NULL
What I want now:
1 tom NULL
2 lisa golf
3 tim NULL
But what I get:
1 tom soccer
2 lisa golf
3 tim golf
I've tried subselects and ifs but I couldn't get anything to work. Thanks in advance!

Here is a query which should generate your expected result set, though as @jarlh has pointed out, it isn't clear why Lisa should play golf over soccer.
SELECT
p.ID,
p.Name,
CASE WHEN COUNT(CASE WHEN s.Sport IS NULL THEN 1 END) > 0
THEN NULL ELSE MIN(s.Sport) END AS Sport
FROM Person p
INNER JOIN Sports s
ON p.ID = s.person_id
GROUP BY
p.ID,
p.name;
Note that I group by both the ID and name, which would be required on many databases (though perhaps not SQLite).
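The conditional aggregation can be verified against the question's sample rows. This sketch loads them into an in-memory SQLite database through Python's sqlite3 module; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Sports (person_id INTEGER, Sport TEXT);
    INSERT INTO Person VALUES (1, 'tom'), (2, 'lisa'), (3, 'tim');
    INSERT INTO Sports VALUES
        (1, 'soccer'), (1, NULL),
        (2, 'golf'), (2, 'soccer'),
        (3, 'golf'), (3, NULL);
""")

# The inner CASE yields 1 only for NULL sports, so COUNT tells us whether
# the group contains a NULL; if it does, the whole group reports NULL.
query = """
    SELECT p.ID, p.Name,
           CASE WHEN COUNT(CASE WHEN s.Sport IS NULL THEN 1 END) > 0
                THEN NULL ELSE MIN(s.Sport) END AS Sport
    FROM Person p
    INNER JOIN Sports s ON p.ID = s.person_id
    GROUP BY p.ID, p.Name
    ORDER BY p.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Python shows SQL NULL as None, so tom and tim come back with None and lisa with golf.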

Aggregate functions such as MIN() skip NULL values, so MIN() alone cannot return NULL here. You could map NULL to an empty string first (the result is then '' rather than NULL):
SELECT p.ID, p.Name, min(ifnull(s.Sport,''))
FROM Person p
INNER JOIN Sports s ON p.ID=s.person_id
GROUP BY p.ID, p.name

Assuming the version of SQLite you are using supports ROW_NUMBER(), try the query below. It numbers the rows for each person ordered by s.Sport ASC and keeps only the first row per person. SQLite sorts NULL before other values in ascending order, so if a person has a NULL sport, that row comes first. You don't need GROUP BY:
;with cte as (
select p.ID, p.Name, s.Sport,
ROW_NUMBER() OVER (PARTITION BY p.ID ORDER BY s.Sport ASC) AS rn
FROM Person p INNER JOIN Sports s ON p.ID=s.person_id
)
select *
from cte
where rn=1

You can do this with a correlated subquery, avoiding the join in the outer query:
select p.*,
(select s.sport
from sports s
where s.person_id = p.id
order by (s.sport is null) desc, s.sport asc
limit 1
) as min_sport
from person p;
This may prove useful under some circumstances. With an index on sports(person_id, sport), it might be faster than the group by, depending on the data (lots of people, few sports per person).
Also, this is slightly different from your query because it returns all people, even those with no sports.
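The difference in which people are returned is easy to see with one extra row. The sketch below runs a correlated subquery of this shape in SQLite through Python's sqlite3 module, limited to one row per person. Person 4, ann, who has no rows in Sports, is a hypothetical addition to the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Sports (person_id INTEGER, Sport TEXT);
    INSERT INTO Person VALUES (1, 'tom'), (2, 'lisa'), (3, 'tim'), (4, 'ann');  -- ann is hypothetical
    INSERT INTO Sports VALUES
        (1, 'soccer'), (1, NULL),
        (2, 'golf'), (2, 'soccer'),
        (3, 'golf'), (3, NULL);
""")

# (s.sport IS NULL) evaluates to 1 or 0, so sorting it descending puts a
# NULL sport first; otherwise the alphabetically first sport wins.
query = """
    SELECT p.ID, p.Name,
           (SELECT s.Sport
            FROM Sports s
            WHERE s.person_id = p.ID
            ORDER BY (s.Sport IS NULL) DESC, s.Sport ASC
            LIMIT 1) AS min_sport
    FROM Person p
    ORDER BY p.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

ann appears with None even though she has no Sports rows at all, which the inner-join versions would not return.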

How to use LISTAGG to concatenate from multiple rows?

I have report query along these lines in APEX 5.0:
WITH inner_table AS
( select distinct
i.ID
,i.name
,i.lastname
,case i.gender
when 'm' then 'Male'
when 'f' then 'Female'
end gender
,i.username
,b.name region
,i.address
,i.city city
,i.EMAIL
,r.name as "ROLE"
,ie.address as "region_location"
,case
when i.gender='m' THEN 'blue'
when i.gender='f' THEN '#F6358A'
END i_color
,b.course as COURSE
,si.city UNIVERSITY
,case
when i.id in (select app_user from scholarship) then 'check'
else 'close'
end as scholarship,
case
when i.id in (select ieur.app_user from ie_user_role ieur where role=4) then 'Admin'
else ''
end admin,
apex_item.checkbox(10, i.id, 'UNCHECKED onclick="highlightRow(this);"') as Del_usr
from app_users i left join regions b on (i.region=b.id)
left join ie_user_role ur on (i.id = ur.app_user)
left join ie_roles r on(ur.role = r.id)
left join user_house uh on (i.id=uh.app_user)
left join reg_location ie on (uh.house=ie.id)
left join study_list sl on i.id = sl.insan
left join study_institute si on sl.institute = si.id
left join course c on sl.course = c.id
where i.is_active='Y'
order by
i.name,i.lastname,i.username,region, city, i.EMAIL)
SELECT * FROM inner_table where (scholarship = :P5_SCHOLARSHIP or :P5_SCHOLARSHIP is null)
I might get results like this:
|---------------------|------------------|-------|------------------|
| Name | Lastname | ... | Course |
|---------------------|------------------|-------|------------------|
| Some | User | ... | Course1 |
|---------------------|------------------|-------|------------------|
| Some | User | ... | Course2 |
|---------------------|------------------|-------|------------------|
But I would like the courses listed in a single row instead of the row repeating, like so:
|---------------------|------------------|-------|------------------|
| Name | Lastname | ... | Course |
|---------------------|------------------|-------|------------------|
| Some | User | ... | Course1, Course2 |
|---------------------|------------------|-------|------------------|
I tried using LISTAGG, and I didn't note down my attempts, so unfortunately I can't post that now. I basically tried:
,LISTAGG(b.course, ', ') within group (order by b.course) as COURSE
Then adding GROUP BY using COURSE, but in that case the whole query is affected by GROUP BY and I have to apply the other columns correctly, right? Otherwise it results in "ORA-00937: not a single-group group function". I got lost a bit there.
Other thing I tried is using a subquery table with same LISTAGG line above, and got wanted output from subquery, but then joining to the rest of the query didn't provide expected results.
I think I could use a bit of SQL help here for LISTAGG when joining multiple tables.
Thanks.

When you use an aggregate function (that collapses multiple rows into one) you need a GROUP BY clause, so you'd need something like this:
SELECT i.username,
LISTAGG( c.course, ', ' ) WITHIN GROUP ( ORDER BY c.course )
FROM app_users i
...
LEFT JOIN course c on sl.course = c.id
GROUP BY i.username
Basically, anything that's not being aggregated, needs to be in the GROUP BY clause. Try it in a much simpler query until you get the hang of it, then make your big one.
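Following the advice to try it on a much simpler query first, here is a reduced sketch run in SQLite through Python's sqlite3 module. SQLite has no LISTAGG, so group_concat is swapped in as the closest equivalent; the three-table schema and the user and course names are hypothetical simplifications of the question's joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE study_list (insan INTEGER, course INTEGER);
    CREATE TABLE course (id INTEGER PRIMARY KEY, course TEXT);
    -- Hypothetical sample data
    INSERT INTO app_users VALUES (1, 'some_user'), (2, 'other_user');
    INSERT INTO course VALUES (1, 'Course1'), (2, 'Course2'), (3, 'Course3');
    INSERT INTO study_list VALUES (1, 1), (1, 2), (2, 3);
""")

# username is the only non-aggregated column, so it is the only column
# that has to appear in GROUP BY.
query = """
    SELECT i.username, group_concat(c.course, ', ') AS courses
    FROM app_users i
    LEFT JOIN study_list sl ON i.id = sl.insan
    LEFT JOIN course c ON sl.course = c.id
    GROUP BY i.username
"""
# This query does not fix the order in which group_concat joins the
# values, so the names are sorted in Python before comparing.
courses = {user: sorted(joined.split(', '))
           for user, joined in conn.execute(query)}
print(courses)
```

In Oracle, the WITHIN GROUP (ORDER BY ...) clause of LISTAGG makes the order explicit, which this stand-in does not.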

What you want is LISTAGG with an analytical window function. Then remove duplicates using distinct. Here is my sample result/ data: http://sqlfiddle.com/#!4/6e8e3f/3
Select DISTINCT name, last_name, other columns,
LISTAGG(course, ', ') WITHIN GROUP (ORDER BY course)
OVER (PARTITION BY name, last_name) as "Course"
FROM inner_table;

SQL Removing Duplicate rows

I've been trying to remove duplicates using HAVING count(*) > 1, group by, distinct and sub queries but can't get any of these to work..
SELECT UserID, BuildingNo
FROM Staff INNER JOIN TblBuildings ON Staff.StaffID =
TblBuildingsStaffID
GROUP BY TblStaff.User_Code, BuildingNo
What I get is..
StaffID1 | BuildingNo1
StaffID1 | BuildingNo2
StaffID2 | BuildingNo2
StaffID3 | BuildingNo1
StaffID3 | BuildingNo2
I'm trying to get it so it just displays staff with one building number (if they have two regardless of which it shows) like:
StaffID1 | BuildingNo1
StaffID2 | BuildingNo2
StaffID3 | BuildingNo1
It can't be too hard.. I've tried CTE's left joining the building to the staff table, these come up NULL for some reason when I try this
Any help would be great!

Don't group by BuildingNo, then you can use having to filter out the groups you want.
SELECT s.UserID, min(b.BuildingNo) as buildingno
FROM Staff s
JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
GROUP BY s.UserID
having count(distinct b.BuildingNo) = 1;
The min() aggregate is required because buildingno is not part of the group by clause. But as the having() clause only returns those with one building, it doesn't change anything.
If you want to display all staff members, and simply pick one (arbitrary) building, then simply leave out the having condition.
If you want to include staff members without a building you need a left join:
SELECT s.UserID, min(b.BuildingNo) as buildingno
FROM Staff s
LEFT JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
GROUP BY s.UserID;
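The three variants described in this answer can be compared side by side. The sketch below runs them in SQLite through Python's sqlite3 module. The column TblBuildingsStaffID is taken from the answer's query, while the user IDs, building numbers and the unassigned staff member u4 are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (StaffID INTEGER PRIMARY KEY, UserID TEXT);
    CREATE TABLE TblBuildings (TblBuildingsStaffID INTEGER, BuildingNo TEXT);
    -- Hypothetical sample data; u4 has no building
    INSERT INTO Staff VALUES (1, 'u1'), (2, 'u2'), (3, 'u3'), (4, 'u4');
    INSERT INTO TblBuildings VALUES
        (1, 'B1'), (1, 'B2'), (2, 'B2'), (3, 'B1'), (3, 'B2');
""")

def buildings(join_kind, having=""):
    """Run the grouped query with the given join type and optional HAVING."""
    query = f"""
        SELECT s.UserID, MIN(b.BuildingNo) AS buildingno
        FROM Staff s
        {join_kind} TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
        GROUP BY s.UserID
        {having}
        ORDER BY s.UserID
    """
    return conn.execute(query).fetchall()

# Only staff with exactly one distinct building
only_single = buildings("JOIN", "HAVING COUNT(DISTINCT b.BuildingNo) = 1")
# Every staff member that has a building, reduced to one building each
one_each = buildings("JOIN")
# Additionally keeps staff without any building
with_unassigned = buildings("LEFT JOIN")

print(only_single, one_each, with_unassigned, sep="\n")
```

Because MIN() is used, the building that is kept is the one that sorts first, not a random one.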

Use ROW_NUMBER() with PARTITION BY in your query to avoid duplicates:
WITH CTE AS (
SELECT ROW_NUMBER() OVER (PARTITION BY s.UserID ORDER BY b.BuildingNo) AS Num,
s.UserID, b.BuildingNo
FROM Staff s INNER JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
)
SELECT UserID, BuildingNo FROM CTE
WHERE Num = 1

Try this:
SELECT distinct UserID, BuildingNo
FROM Staff INNER JOIN TblBuildings ON Staff.StaffID =
TblBuildingsStaffID
