SQL Removing Duplicate rows - sql

I've been trying to remove duplicates using HAVING COUNT(*) > 1, GROUP BY, DISTINCT and subqueries, but I can't get any of these to work.
SELECT UserID, BuildingNo
FROM Staff INNER JOIN TblBuildings ON Staff.StaffID = TblBuildingsStaffID
GROUP BY UserID, BuildingNo
What I get is:
StaffID1 | BuildingNo1
StaffID1 | BuildingNo2
StaffID2 | BuildingNo2
StaffID3 | BuildingNo1
StaffID3 | BuildingNo2
I'm trying to get it to display each staff member with just one building number (if they have two, it doesn't matter which one is shown), like:
StaffID1 | BuildingNo1
StaffID2 | BuildingNo2
StaffID3 | BuildingNo1
It can't be too hard. I've tried CTEs and left joining the building table to the staff table, but the building columns come up NULL for some reason when I try this.
Any help would be great!

Don't group by BuildingNo; then you can use HAVING to keep only the groups you want:
SELECT s.UserID, min(b.BuildingNo) as buildingno
FROM Staff s
JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
GROUP BY s.UserID
having count(distinct b.BuildingNo) = 1;
The min() aggregate is required because BuildingNo is not part of the GROUP BY clause. Since the HAVING clause only returns staff with exactly one distinct building, min() simply returns that building.
If you want to display all staff members and simply pick one (arbitrary) building for each, leave out the HAVING condition.
If you also want to include staff members without a building, you need a left join:
SELECT s.UserID, min(b.BuildingNo) as buildingno
FROM Staff s
LEFT JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
GROUP BY s.UserID;
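To see how GROUP BY with MIN() behaves, here is a minimal, self-contained sketch. The column types and the sample rows are assumptions made up for illustration (the question does not show the real schema); only the table and column names come from the queries above, and StaffID4 is a hypothetical staff member without a building.

```sql
-- Hypothetical schema and data, mirroring the output shown in the question.
CREATE TABLE Staff (StaffID INT PRIMARY KEY, UserID VARCHAR(20));
CREATE TABLE TblBuildings (TblBuildingsStaffID INT, BuildingNo VARCHAR(20));

INSERT INTO Staff VALUES (1, 'StaffID1');
INSERT INTO Staff VALUES (2, 'StaffID2');
INSERT INTO Staff VALUES (3, 'StaffID3');
INSERT INTO Staff VALUES (4, 'StaffID4');  -- assumed: has no building

INSERT INTO TblBuildings VALUES (1, 'BuildingNo1');
INSERT INTO TblBuildings VALUES (1, 'BuildingNo2');
INSERT INTO TblBuildings VALUES (2, 'BuildingNo2');
INSERT INTO TblBuildings VALUES (3, 'BuildingNo1');
INSERT INTO TblBuildings VALUES (3, 'BuildingNo2');

-- One row per staff member; MIN() picks the lowest building number,
-- and the LEFT JOIN keeps staff without any building (BuildingNo is NULL).
SELECT s.UserID, MIN(b.BuildingNo) AS BuildingNo
FROM Staff s
LEFT JOIN TblBuildings b ON s.StaffID = b.TblBuildingsStaffID
GROUP BY s.UserID
ORDER BY s.UserID;
```

With this sample data the query returns StaffID1 and StaffID3 with BuildingNo1, StaffID2 with BuildingNo2, and StaffID4 with NULL.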

Use ROW_NUMBER() with PARTITION BY in your query to avoid duplicates:
WITH CTE AS (
SELECT ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY BuildingNo) AS Num, UserID, BuildingNo
FROM Staff INNER JOIN TblBuildings ON Staff.StaffID = TblBuildingsStaffID
)
SELECT * FROM CTE
WHERE Num = 1

Try this:
SELECT DISTINCT UserID, BuildingNo
FROM Staff INNER JOIN TblBuildings ON Staff.StaffID = TblBuildingsStaffID

Related

Mysql subquery with "in" problem to associate parent table

I'm trying to write a query that selects the contact information (table invoice_contacts) and, for each contact, the address (table invoice_adresses) that is used most often in the invoice table (invoice_comptas).
For example, I have two contacts:
Mike
John
Mike has 2 addresses:
Paris
London
Mike has 1 invoice with Paris and 5 invoices with London, so I want the London address to be associated with Mike.
I tried a query with a subquery that counts, for each of the contact's addresses, how many invoices use it (NB_ADRESSES) and keeps only the largest (ORDER BY NB_ADRESSES DESC and LIMIT 1). It seems right, but I get an error at where ia2.ID_CONTACT = ic.ID_CONTACT: ic.ID_CONTACT is not found (and I need to correlate the subquery with the contact).
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE in (
select ia3.ID_ADRESSE
from (
select ia2.ID_ADRESSE,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
where ia2.ID_CONTACT = ic.ID_CONTACT
group by ia2.ID_ADRESSE
order by NB_ADRESSES desc
limit 1
) as ia3
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
I have also tried EXISTS and INNER JOIN instead of IN, but didn't get good results, so this query seems like the best approach to me; I just can't find the solution.
I hope you can help me :)
Thanks
UPDATE:
I finally found a solution with this query:
select ic.*,
ia.*
from invoice_contacts ic
left join invoice_adresses ia on ia.ID_CONTACT = ic.ID_CONTACT
and ia.ID_ADRESSE = (
select ia3.ID_ADRESSE
from (
select ia2.*,
count(*) as NB_ADRESSES
from invoice_adresses ia2
left join invoice_comptas ico on ico.ID_ADRESSE_CONTACT = ia2.ID_ADRESSE
group by ia2.ID_ADRESSE
) as ia3
where ia3.ID_CONTACT = ic.ID_CONTACT
order by NB_ADRESSES desc
limit 1
)
group by ic.ID_CONTACT
order by CONTACT_TITRE asc
Thanks
Let me rephrase the problem as finding, for each contact, the contact/address combination that appears most often on the invoices.
I find it hard to follow your query and your table naming. But this is the idea:
select contact, address
from (select contact, address, count(*) as cnt,
row_number() over (partition by contact order by count(*) desc) as seqnum
from invoices
group by contact, address
) ca
where seqnum = 1;
The subquery counts the number of times a given address (or city, if you prefer) occurs for each contact. The row_number() enumerates these, so the most common one has a value of "1". The outer query then chooses the most common value.
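Applied to the tables from the question, the same idea might look like the sketch below. The column lists are cut down to a minimum; VILLE, ID_COMPTA and the placement of CONTACT_TITRE in invoice_contacts are assumptions, as are the sample rows (John has no address here). Window functions require MySQL 8.0 or later.

```sql
-- Hypothetical minimal versions of the asker's tables (the real ones have more columns).
CREATE TABLE invoice_contacts (ID_CONTACT INT PRIMARY KEY, CONTACT_TITRE VARCHAR(50));
CREATE TABLE invoice_adresses (ID_ADRESSE INT PRIMARY KEY, ID_CONTACT INT, VILLE VARCHAR(50));
CREATE TABLE invoice_comptas  (ID_COMPTA INT PRIMARY KEY, ID_ADRESSE_CONTACT INT);

INSERT INTO invoice_contacts VALUES (1, 'Mike'), (2, 'John');
INSERT INTO invoice_adresses VALUES (10, 1, 'Paris'), (11, 1, 'London');
-- 1 invoice for Paris, 5 invoices for London
INSERT INTO invoice_comptas VALUES (100, 10), (101, 11), (102, 11), (103, 11), (104, 11), (105, 11);

SELECT ic.CONTACT_TITRE, r.VILLE, r.NB_INVOICES
FROM invoice_contacts ic
LEFT JOIN (
    -- Count invoices per address and rank each contact's addresses by that count.
    SELECT ia.ID_CONTACT, ia.VILLE,
           COUNT(ico.ID_ADRESSE_CONTACT) AS NB_INVOICES,
           ROW_NUMBER() OVER (PARTITION BY ia.ID_CONTACT
                              ORDER BY COUNT(ico.ID_ADRESSE_CONTACT) DESC) AS seqnum
    FROM invoice_adresses ia
    LEFT JOIN invoice_comptas ico ON ico.ID_ADRESSE_CONTACT = ia.ID_ADRESSE
    GROUP BY ia.ID_CONTACT, ia.ID_ADRESSE, ia.VILLE
) r ON r.ID_CONTACT = ic.ID_CONTACT AND r.seqnum = 1
ORDER BY ic.CONTACT_TITRE;
```

With this data, Mike is paired with London (5 invoices) and John is returned with NULLs because of the outer join. Putting seqnum = 1 in the ON clause rather than in a WHERE clause is what keeps contacts without any address in the result.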

How to use partition on a join to get a count

I'm confused about how to get a count on a join without using GROUP BY.
I know I can get the desired results using GROUP BY, but the table joins are long and there are lots of selected columns with CASE statements, so I was hoping to avoid that.
I'm sure I've seen this done before using OVER (PARTITION BY ...), but I can't find a good example of using it on a join. Maybe it's not possible!?
I've tried:
select
p.FirstName,
p.Surname,
count(pr.RelativePersonId) over (partition by pr.RelativePersonId) as [RelativesOnRecord]
from People p
left join PersonRelatives pr
on p.PersonId = pr.PersonId
For my tables:
People
PersonId | FirstName | Surname
1 | Jim | Bo
2 | Harry | Bo
3 | Strong | Bo
PersonRelatives
Id | PersonId | RelativePersonId
1 | 1 | 2
2 | 1 | 3
Where I'm trying to get
PersonId | FirstName | Surname | RelativesOnRecord
1 | Jim | Bo | 2
I also tried joining with a SELECT TOP 1, but that gives me only one row and therefore only one count. Is this even possible without GROUP BY?
It seems you are partitioning by the wrong column. You want the number of relatives for each person in People, right? Use:
count(pr.RelativePersonId) over (partition by pr.PersonId) as [RelativesOnRecord]
Based on your example, you want aggregation:
select p.PersonId, p.FirstName, p.Surname, count(*) as [RelativesOnRecord]
from People p join
PersonRelatives pr
on p.PersonId = pr.PersonId
group by p.PersonId, p.FirstName, p.Surname;
You could use apply or a correlated subquery, but window functions do not seem appropriate here.
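A correlated subquery, as mentioned above, could be sketched as follows. The table and column names and the sample rows are those from the question; the column types are assumptions. The count is computed per person without a GROUP BY on the outer query.

```sql
-- Sample tables from the question (column types assumed).
CREATE TABLE People (PersonId INT PRIMARY KEY, FirstName VARCHAR(50), Surname VARCHAR(50));
CREATE TABLE PersonRelatives (Id INT PRIMARY KEY, PersonId INT, RelativePersonId INT);

INSERT INTO People VALUES (1, 'Jim', 'Bo'), (2, 'Harry', 'Bo'), (3, 'Strong', 'Bo');
INSERT INTO PersonRelatives VALUES (1, 1, 2), (2, 1, 3);

-- The subquery runs once per person and counts that person's relatives.
SELECT p.PersonId, p.FirstName, p.Surname,
       (SELECT COUNT(*)
        FROM PersonRelatives pr
        WHERE pr.PersonId = p.PersonId) AS RelativesOnRecord
FROM People p
ORDER BY p.PersonId;
```

This returns every person, with 2 for Jim and 0 for Harry and Strong. To list only people who have relatives on record, as in the desired output, one option is to add WHERE EXISTS (SELECT 1 FROM PersonRelatives pr WHERE pr.PersonId = p.PersonId) to the outer query.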

Oracle sql - referencing tables

My school task was to get the names of the actors in my movie database who play in the movies with the highest rating.
I made it this way and it works :
select name,surname
from actor
where ACTORID in(
select actorid
from actor_movie
where MOVIEID in (
select movieid
from movie
where RATINGID in (
select ratingid
from rating
where PERCENT_CSFD = (
select max(percent_csfd)
from rating
)
)
)
);
The output is:
Gary Oldman
Sigourney Weaver
...but I'd also like to add the movie and its rating to this select. They are accessible in the inner selects, but I don't know how to join them to the outer select, where I can only work with rows from the Actor table.
Thank you for your answers.
You just need to join the tables properly. Afterwards you can simply add the columns you'd like to select. The final select could look like this:
select ac.name, ac.surname, ra.percent_csfd -- add further columns from the joined tables as needed
from actor ac
inner join actor_movie amo
on amo.actorid = ac.actorid
inner join movie mo
on amo.movieid = mo.movieid
inner join rating ra
on ra.ratingid = mo.ratingid
where ra.PERCENT_CSFD =
(select max(percent_csfd)
from rating)
A slightly different way to get your result could be something like:
select *
from
(
select name, surname, percent_csfd, rank() over (order by percent_csfd desc) as rnk
from actor
inner join actor_movie
using (actorId)
inner join movie
using (movieId)
inner join rating
using (ratingId)
)
where rnk = 1
This uses rank() to rank the rows by rating and then filters for the movie(s) with the highest rating. rank() is used rather than row_number() because row_number() gives tied rows different numbers and would therefore return only a single row, even when several actors or movies share the top rating.

Find the latest date of two tables with matching primary keys

I have two tables, each keyed by customer and holding the contact dates for one category. I am trying to find the most recent contact date for each person, regardless of which table it's in. For example:
CustomerService columns: CustomerKey, DateContacted
CustomerOutreach columns: CustomerKey, DateContacted
And I'm just trying to find the very latest date for each person.
Use something like this.
You need to combine the two tables, which you can do with a UNION. A customer may appear several times in the combined result, but you can then group by CustomerKey and take the MAX of DateContacted:
SELECT * INTO #TEMP FROM (
SELECT
CustomerKey
, DateContacted
FROM CustomerService CS
UNION
SELECT
CustomerKey
, DateContacted
FROM CustomerOutreach CO
) AS Combined
SELECT
CustomerKey
, MAX(DateContacted) AS LatestDateContacted
FROM #TEMP
GROUP BY
CustomerKey
Join your tables on the primary keys and use a conditional projection. Note that the inner join only returns customers that appear in both tables:
Select cs.CustomerKey,
CASE WHEN cs.DateContacted <= co.DateContacted
THEN co.DateContacted
ELSE cs.DateContacted END AS DateContacted
from CustomerService cs inner join CustomerOutreach co
on cs.CustomerKey = co.CustomerKey
I would do something like this.
Select b.customerKey, b.dateContacted
from (
select a.customerKey, a.DateContacted, Row_Number() over (Partition by customerKey order by DateContacted desc) as RN
from (
Select c.customerKey,
case when o.DateContacted is null or s.DateContacted > o.DateContacted then s.DateContacted else o.DateContacted end as DateContacted
from Customer c
left outer join customerService s on c.customerKey = s.customerKey
left outer join customerOutreach o on c.customerKey = o.customerKey
where s.customerKey is not null or o.customerKey is not null
)a
)b
where b.RN = 1
This solution should also avoid duplicate rows when both tables have the same maximum DateContacted.
http://sqlfiddle.com/#!3/ca968/1
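For comparison, the union approach can also be written as a single statement with a derived table instead of a temp table. This is only a sketch: the column types and the sample rows are assumptions, and AllContacts and LatestDateContacted are names chosen for illustration.

```sql
-- Hypothetical sample data; customer 2 only appears in CustomerService,
-- customer 3 only in CustomerOutreach.
CREATE TABLE CustomerService  (CustomerKey INT, DateContacted DATE);
CREATE TABLE CustomerOutreach (CustomerKey INT, DateContacted DATE);

INSERT INTO CustomerService  VALUES (1, '2020-01-15'), (2, '2020-03-01');
INSERT INTO CustomerOutreach VALUES (1, '2020-02-10'), (3, '2020-01-05');

-- Stack both tables, then take the latest date per customer.
-- UNION ALL is enough here because MAX() ignores duplicate rows anyway.
SELECT CustomerKey, MAX(DateContacted) AS LatestDateContacted
FROM (
    SELECT CustomerKey, DateContacted FROM CustomerService
    UNION ALL
    SELECT CustomerKey, DateContacted FROM CustomerOutreach
) AS AllContacts
GROUP BY CustomerKey
ORDER BY CustomerKey;
```

With this data the result is 2020-02-10 for customer 1, 2020-03-01 for customer 2 and 2020-01-05 for customer 3. Unlike a join between the two tables, this also covers customers that exist in only one of them.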

writing sql to find aggregation

I have these relations
Entry
-----
id
creationdate
grade
Subject
------
id
name
and join table
Entry_Subjects
------------
entry_id
subject_id
I need to write the SQL to find the average grade of the entries belonging to a particular subject (say 'java') on a particular creationdate.
I tried the following, assuming the id for Subject 'java' is 2:
SELECT creationdate,
avg(grade)
FROM (SELECT *
FROM Entry
WHERE id IN
(SELECT id
FROM Entry_Subjects
WHERE subject_id =2
)
)
GROUP BY creationdate;
I get the error:
subquery in FROM must have an alias
I tried to correct this but couldn't. Can somebody tell me why this error occurs? My database knowledge is not that good.
You probably want JOINs instead of nested SELECTs:
SELECT
creationdate,
AVG(grade)
FROM Entry e
INNER JOIN Entry_Subjects f
ON e.id = f.entry_id
INNER JOIN Subject s
ON f.subject_id = s.id
WHERE s.name = 'java' --this is where you replace 'java' with a variable to search by name
GROUP BY creationdate
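As for the error message itself: PostgreSQL, which produces this exact message, requires every subquery in the FROM clause to be given a name. The sketch below keeps the structure of the original query and adds an alias (sub is an arbitrary name). It also selects entry_id rather than id in the inner query, since Entry_Subjects as described has no id column. Column types and sample rows are assumptions.

```sql
-- Schema from the question (types assumed) with a few made-up rows.
CREATE TABLE Entry (id INT PRIMARY KEY, creationdate DATE, grade INT);
CREATE TABLE Subject (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE Entry_Subjects (entry_id INT, subject_id INT);

INSERT INTO Subject VALUES (1, 'python'), (2, 'java');
INSERT INTO Entry VALUES (1, '2020-01-01', 80), (2, '2020-01-01', 90), (3, '2020-01-02', 70);
INSERT INTO Entry_Subjects VALUES (1, 2), (2, 2), (3, 1);

SELECT sub.creationdate, AVG(sub.grade) AS avg_grade
FROM (SELECT *
      FROM Entry
      WHERE id IN (SELECT entry_id
                   FROM Entry_Subjects
                   WHERE subject_id = 2)
     ) AS sub  -- the alias the error message asks for
GROUP BY sub.creationdate;
```

With this data, only the two 'java' entries from 2020-01-01 qualify, so the query returns a single row whose average is that of 80 and 90.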
This can also be done using analytic functions. DISTINCT is needed because a window function returns one row per entry rather than one row per date:
select distinct a.creationdate, avg(a.grade) over (partition by a.creationdate) as avg_grade
from entry a,subject b,entry_subjects c
where a.id=c.entry_id and b.id=c.subject_id
and upper(b.name)='JAVA';