Find multiple column duplicates then list them all - sql

I find duplicated database records like so:
select s.name, r.name Region, c.name Country
from supplier s
join region r on r.id = s.regionid
join region c on c.id = isnull(r.pid, r.id)
group by s.name, r.name, c.name
having count(s.name) >1
What's the best way to list them all (so if there are two duplicates, the record appears twice, and so on)?

The easiest way is to create an in-line query from your Find-dups query and join to a "without-a-group-by" query.
select s.name, r.name Region, c.name Country
from supplier s
join region r on r.id = s.regionid
join region c on c.id = isnull(r.pid, r.id)
inner join (select s.name, r.name Region, c.name Country
from supplier s
join region r on r.id = s.regionid
join region c on c.id = isnull(r.pid, r.id)
group by s.name, r.name, c.name
having count(s.name) >1 ) dups
ON s.name = dups.name
and r.name = dups.region
and c.name = dups.country

I think this should do it:
with C as (
select
s.name,
r.name Region,
c.name Country,
count(*) over (
partition by s.name, r.name, c.name
) as ct
from supplier s
join region r on r.id = s.regionid
join region c on c.id = isnull(r.pid, r.id)
)
select
name, Region, Country
from C
where ct > 1;
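To compare the two answers above, here is a minimal sketch using Python's built-in sqlite3 module. The single `supplier` table with a `region` column is a simplified, hypothetical stand-in for the supplier/region joins in the question, and the window-function query assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical, simplified schema: one table instead of the supplier/region joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    INSERT INTO supplier (name, region) VALUES
        ('Acme', 'North'), ('Acme', 'North'), ('Acme', 'South'), ('Bolt', 'North');
""")

# Approach 1: join every row back to the grouped "find-dups" query.
join_back = conn.execute("""
    SELECT s.id, s.name, s.region
    FROM supplier s
    JOIN (SELECT name, region
          FROM supplier
          GROUP BY name, region
          HAVING COUNT(*) > 1) dups
      ON s.name = dups.name AND s.region = dups.region
    ORDER BY s.id
""").fetchall()

# Approach 2: count the rows in each (name, region) partition, keep partitions > 1.
windowed = conn.execute("""
    WITH c AS (
        SELECT id, name, region,
               COUNT(*) OVER (PARTITION BY name, region) AS ct
        FROM supplier
    )
    SELECT id, name, region FROM c WHERE ct > 1 ORDER BY id
""").fetchall()

print(join_back)              # every row that belongs to a duplicated group
print(windowed == join_back)  # both approaches list the same rows
```

On this sample data both queries list the two ('Acme', 'North') rows. The window version reads the table once, which may matter on large tables.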

Related

Writing a subquery in SQL, how to combine two queries?

My goal is to find the top 10 cities within the top 10 countries. I successfully used this query:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country, C.city
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
WHERE
country IN ('India', 'China', 'United States', 'Japan', 'Mexico', 'Brazil', 'Russian Federation', 'Phillipines', 'Turkey', 'Indonesia')
GROUP BY
C.city, D.country
ORDER BY
number_of_customers DESC
LIMIT 10
But I would like to use a subquery rather than listing the countries which I found using a previous query:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
GROUP BY
D.country
ORDER BY
number_of_customers DESC
LIMIT 10
How can I combine these two queries correctly? I keep getting different errors when I try to replace the list of countries with the second query I posted. I apologize if this is a stupid question; I am a beginner.
My attempt:
SELECT
COUNT(A.customer_id) AS number_of_customers,
D.country, C.city
FROM
customer A
INNER JOIN
address B ON A.address_id = B.address_id
INNER JOIN
city C ON B.city_id = C.city_id
INNER JOIN
country D ON C.country_ID = D.country_ID
WHERE
country IN (SELECT COUNT(A.customer_id) AS number_of_customers, D.country
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
GROUP BY D.country
ORDER BY number_of_customers DESC
LIMIT 10)
GROUP BY
C.city, D.country
ORDER BY
number_of_customers DESC
LIMIT 10
But I get an error
subquery has too many columns
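The error occurs because a subquery used with IN must return exactly one column. For instance, in PostgreSQL you can use WITH queries (common table expressions); see the documentation: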
WITH top_countries AS (
SELECT count(A.customer_id) AS number_of_customers,
D.country AS country
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
GROUP BY D.country
ORDER BY number_of_customers DESC
LIMIT 10
)
SELECT count(A.customer_id) AS number_of_customers,
D.country, C.city
FROM customer A
INNER JOIN address B ON A.address_id = B.address_id
INNER JOIN city C ON B.city_id = C.city_id
INNER JOIN country D ON C.country_ID = D.country_ID
WHERE D.country IN (SELECT tc.country FROM top_countries tc)
GROUP BY C.city,D.country
ORDER BY number_of_customers DESC
LIMIT 10
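A CTE is not strictly needed: the subquery can also stay inline if it selects only the country and moves the count into its ORDER BY. The sketch below shows this with Python's sqlite3 module; the flattened `customer(city, country)` table and its rows are made up for illustration, and the limit is 2 instead of 10 to keep the data small. PostgreSQL and SQLite accept LIMIT inside an IN subquery; MySQL, as far as I know, does not.

```python
import sqlite3

# Hypothetical flattened table standing in for customer/address/city/country.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    INSERT INTO customer (city, country) VALUES
        ('Delhi', 'India'), ('Delhi', 'India'), ('Mumbai', 'India'),
        ('Beijing', 'China'), ('Beijing', 'China'),
        ('Oslo', 'Norway');
""")

# The IN subquery returns a single column; the count is used only for ordering.
top_cities = conn.execute("""
    SELECT COUNT(*) AS number_of_customers, country, city
    FROM customer
    WHERE country IN (SELECT country
                      FROM customer
                      GROUP BY country
                      ORDER BY COUNT(*) DESC
                      LIMIT 2)
    GROUP BY city, country
    ORDER BY number_of_customers DESC, city
""").fetchall()

for row in top_cities:
    print(row)
```

Here Norway falls outside the top two countries, so Oslo is not listed.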

How do I query all distinct rows with only their highest values?

I have been trying to query each city's most popular genre. I am only trying to get the rows that I have highlighted. I tried using MAX() with a GROUP BY, but it gave me a syntax error.
My CTE query is as follows; it's based on the DBeaver sample dataset:
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
I tried the following query.
I don't have a dataset to test this on, but you should be able to just add a ROW_NUMBER() function to your CTE to get the values you are looking for. Such as:
with q_table
as
( select City, Genre, count(*) as counts
,ROW_NUMBER() OVER(partition by City order by count(*) desc) RN
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
SELECT City, Genre, Counts
from q_table
WHERE RN=1
Order BY City
This use of MAX should work.
Edit: added the inner join. Thanks to Gordon Linoff for the observation that my original answer didn't actually achieve anything.
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
SELECT a.City, a.Genre, a.counts
FROM q_table a
INNER JOIN (
SELECT City, MAX(counts) counts
FROM q_table
GROUP BY City
) b ON a.City = b.City AND a.counts = b.counts;
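The ROW_NUMBER() and MAX() answers above differ when two genres tie for first place in a city: ROW_NUMBER() keeps exactly one row per city, while the MAX() join keeps every tied row. Here is a minimal sketch with Python's sqlite3 module; the `purchase` table and its rows are invented for illustration, the extra `genre` tie-breaker in the ORDER BY is my addition to make the result deterministic, and SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

# Hypothetical table: one row per purchased track, already reduced to city and genre.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase (city TEXT, genre TEXT);
    INSERT INTO purchase VALUES
        ('Boston', 'Rock'), ('Boston', 'Rock'), ('Boston', 'Jazz'),
        ('Chicago', 'Jazz'), ('Chicago', 'Rock');
""")

# Shared CTE: purchases per (city, genre), like q_table in the question.
counts_cte = """
    WITH q_table AS (
        SELECT city, genre, COUNT(*) AS counts
        FROM purchase
        GROUP BY city, genre
    )
"""

# ROW_NUMBER(): exactly one row per city; ties are broken by genre name.
one_per_city = conn.execute(counts_cte + """
    SELECT city, genre, counts
    FROM (SELECT city, genre, counts,
                 ROW_NUMBER() OVER (PARTITION BY city
                                    ORDER BY counts DESC, genre) AS rn
          FROM q_table)
    WHERE rn = 1
    ORDER BY city
""").fetchall()

# MAX() join: every genre that reaches the city's maximum, ties included.
with_ties = conn.execute(counts_cte + """
    SELECT a.city, a.genre, a.counts
    FROM q_table a
    JOIN (SELECT city, MAX(counts) AS counts FROM q_table GROUP BY city) b
      ON a.city = b.city AND a.counts = b.counts
    ORDER BY a.city, a.genre
""").fetchall()

print(one_per_city)
print(with_ties)
```

In this data Chicago has a 1-1 tie, so the first query returns one Chicago row and the second returns two. Which behaviour is right depends on whether tied genres should all be reported.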
Try this:
with q_table
as
( select City, Genre, count(*) as counts
from
(select c.City, g.Name as Genre
from bus5dwr.dbeaver_sample.Customer c
inner join bus5dwr.dbeaver_sample.Invoice i
on i.CustomerId = c.CustomerId
inner join bus5dwr.dbeaver_sample.InvoiceLine il
on il.InvoiceId = i.InvoiceId
inner join bus5dwr.dbeaver_sample.track t
on t.TrackId = il.TrackId
inner join bus5dwr.dbeaver_sample.Genre g
on g.GenreId = t.GenreId
where Country = 'USA'
) as t2
group by City, Genre)
select q.City, q.Genre, q.counts
from q_table q
where q.counts = (select max(q2.counts) from q_table q2 where q2.City = q.City)

Query, joining, SQL Server example (Concert) with a couple of related tables

I don't know how to restore what I wrote before; I apologise.
Using TOP:
SELECT TOP 1
PID, NAME, AGE
FROM (
SELECT
p.*, h.HID
FROM Performer p
INNER JOIN Concert c
ON c.PID = p.PID
INNER JOIN Hall h
ON h.HID = c.HID
INNER JOIN Tickets t
ON t.CID = c.CID
GROUP BY p.PID, p.NAME, p.AGE, h.HID, h.CAPACITY
HAVING COUNT(t.TID) = h.CAPACITY
) t
GROUP BY PID, NAME, AGE
ORDER BY COUNT(*) DESC
This should return the expected result:
;with Cte1 AS (
select C.CID, P.Name AS PerformerName, H.Name AS HallName, H.Capacity, H.HID
from #Performer P
inner join #Concert C on C.PID = P.PID
inner join #Hall H on H.HID = C.HID
)
, Cte2 AS (
select C.CID, H.HID, COUNT(*) SellCount
from #Concert C
inner join #Hall H on H.HID = C.HID
inner join #Tickets T on T.CID = C.CID
group by C.CID, H.HID
)
select Cte1.CID, Cte1.PerformerName, Cte1.HallName, Cte2.SellCount
from Cte1 inner join Cte2 on Cte2.CID = Cte1.CID AND Cte2.HID = Cte1.HID
where Cte1.Capacity = Cte2.SellCount
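Both answers appear to look for concerts whose number of sold tickets equals the hall's capacity. Assuming that reading is right, the core of it can be sketched with Python's sqlite3 module as a single GROUP BY with a HAVING filter. The lowercase table names, columns and sample rows are hypothetical, modelled on the names used in the answers.

```python
import sqlite3

# Hypothetical schema and data modelled on the Hall/Concert/Tickets names above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hall (hid INTEGER PRIMARY KEY, name TEXT, capacity INTEGER);
    CREATE TABLE concert (cid INTEGER PRIMARY KEY, pid INTEGER, hid INTEGER);
    CREATE TABLE tickets (tid INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO hall VALUES (1, 'Small', 2), (2, 'Big', 3);
    INSERT INTO concert VALUES (10, 100, 1), (11, 101, 2);
    INSERT INTO tickets (cid) VALUES (10), (10), (11);
""")

# A concert is sold out when its ticket count equals the hall capacity.
sold_out = conn.execute("""
    SELECT c.cid, h.name, COUNT(t.tid) AS sold
    FROM concert c
    JOIN hall h ON h.hid = c.hid
    JOIN tickets t ON t.cid = c.cid
    GROUP BY c.cid, h.name, h.capacity
    HAVING COUNT(t.tid) = h.capacity
""").fetchall()

print(sold_out)
```

Concert 10 sold 2 tickets in a hall of capacity 2, so it is the only row returned; concert 11 sold 1 of 3.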

Identifying percentage in Fact Table

I am new to programming and could not find an answer.
I have the following dimension tables and fact table:
Customer: CustomerId, HomeRegion
Regions: RegionId, RegionName
MyTime: id, MyHour
Fact table: CustomerId, RegionId, TimeId, FactId
I must produce a report with the columns: HomeRegion, Hour, RegionName, UserPercentage.
As shown in the example, only 3.67% of the people whose home region is A move to B at 9am, and so on.
I need to create a similar one.
The problem is obtaining UserPercentage. Here is the code I have so far:
SELECT c.HomeRegion, mt.myhour as Time, r.RegionName as CurrentRegion,
(SELECT COUNT(*)
/*number of users who move from their home
region to CurrentRegion at specific time*/
)/COUNT(c.CustomerId)*100 as UserPercentage
FROM dbo.FactTable ft
inner join dbo.Customer c
ON ft.CustomerId = c.CustomerId
inner join dbo.myTime mt
ON ft.TimeId = mt.ID
inner join dbo.Regions r
ON ft.RegionId = r.RegionId
WHERE mt.myhour = '09'
GROUP BY c.HomeRegion, mt.myhour, r.RegionName
ORDER BY c.HomeRegion, r.RegionName
Using analytic (window) functions:
* there is no need to select or group by myHour, since it is a constant here
* this assumes a customer is located in only one region at a time (if not, the selection would be much harder)
select HomeRegion, CurrentRegion,
count(*) * 1.0 / sum(count(*)) over () as overall_share,
count(*) * 1.0 / sum(count(*)) over (partition by HomeRegion) as homeregion_share
from
(SELECT c.HomeRegion, r.RegionName as CurrentRegion, c.CustomerId as CUST
FROM dbo.FactTable ft
inner join dbo.Customer c
ON ft.CustomerId = c.CustomerId
inner join dbo.myTime mt
ON ft.TimeId = mt.ID
inner join dbo.Regions r
ON ft.RegionId = r.RegionId
WHERE mt.myhour = '09'
GROUP BY c.HomeRegion, r.RegionName, c.CustomerId) uni_users
GROUP by HomeRegion, CurrentRegion
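As a small worked example of the per-home-region share, here is a sketch using Python's sqlite3 module. The `visit` table (one row per customer at the chosen hour) and its rows are hypothetical; the counts are computed in a CTE first and then divided by a windowed SUM per home region. SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

# Hypothetical table: where each customer is at the chosen hour.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (customer_id INTEGER, home_region TEXT, current_region TEXT);
    INSERT INTO visit VALUES
        (1, 'A', 'A'), (2, 'A', 'B'), (3, 'A', 'B'), (4, 'A', 'B'),
        (5, 'B', 'A'), (6, 'B', 'B');
""")

# Count customers per (home, current) pair, then divide by the home-region total.
shares = conn.execute("""
    WITH counts AS (
        SELECT home_region, current_region, COUNT(*) AS n
        FROM visit
        GROUP BY home_region, current_region
    )
    SELECT home_region, current_region,
           100.0 * n / SUM(n) OVER (PARTITION BY home_region) AS pct
    FROM counts
    ORDER BY home_region, current_region
""").fetchall()

for row in shares:
    print(row)
```

Of the four customers whose home region is A, three are in B, giving 75%. Multiplying by 100.0 rather than 100 avoids integer division, which would otherwise truncate the result to 0 in engines such as SQL Server.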
Try something like this in your comment area.
SELECT (TMP1.[Count] * 100.0) / TMP2.Total AS Percentage
FROM
(
SELECT COUNT(*) AS [Count]
FROM dbo.FactTable ft
inner join dbo.Customer c ON ft.CustomerId = c.CustomerId
inner join dbo.Regions r ON ft.RegionId = r.RegionId
WHERE
r.RegionName IN ('A','B','C','D','E') AND
c.HomeRegion IN ('A','B','C','D','E')
) AS TMP1
CROSS JOIN (SELECT COUNT(CustomerId) AS Total FROM dbo.Customer) AS TMP2

database sql join question

I have 2 tables called
Location (id, name)
Person (id, name, location_id)
A person has a location_id which joins these tables. I would like a SQL query that gives me each location and the count of rows in the Person table for that location.
I could do something like this and then add up the records in code, but I want a way to get only one row per location with the count of people in that location:
SELECT l.*, r.id from Location l
inner join Person r
on r.location_id = l.id
order by l.name asc
You want to use aggregates and the GROUP BY clause
SELECT l.id, l.name, count(r.id)
FROM Location l
INNER JOIN Person r on r.location_id = l.id
GROUP BY l.id, l.name
ORDER BY l.name asc
Try:
Select L.Name, Count(*) PersonsCount
From Location L
Join Person P On P.Location_Id = L.Id
Group By L.Name
or if you want to see Locations with zero counts,
Select L.Name, Count(P.Id) PersonsCount
From Location L
Left Join Person P On P.Location_Id = L.Id
Group By L.Name
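One detail to watch with the LEFT JOIN variant: COUNT(*) counts the NULL-extended row that the join produces for an empty location, so it reports 1 rather than 0, whereas COUNT of a Person column skips NULLs. A minimal sketch with Python's sqlite3 module (the sample rows are made up):

```python
import sqlite3

# Tables as described in the question; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, location_id INTEGER);
    INSERT INTO location VALUES (1, 'Paris'), (2, 'Rome');
    INSERT INTO person VALUES (1, 'Ann', 1), (2, 'Bob', 1);
""")

# Rome has no people: COUNT(*) still sees one joined row, COUNT(p.id) sees none.
rows = conn.execute("""
    SELECT l.name, COUNT(*) AS star_count, COUNT(p.id) AS person_count
    FROM location l
    LEFT JOIN person p ON p.location_id = l.id
    GROUP BY l.id, l.name
    ORDER BY l.name
""").fetchall()

for row in rows:
    print(row)
```

For Rome, `star_count` is 1 and `person_count` is 0, so the column-based count is the one to use when empty locations should show zero.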
SELECT lo.name, COUNT(*)
FROM LOCATION lo
JOIN PERSON p ON p.location_id = lo.id
GROUP BY lo.name
ORDER BY lo.name
Try this:
select count(*), Location.name, Location.id from Location, Person where Person.location_id = Location.id group by Location.id, Location.name