How do I find a duplicate in SQL?

I have a query that selects 3 columns. Each row should be a unique combination of county, city, and zip. However, I have reason to believe I'm getting a duplicate somewhere. How do I find the duplicate? COUNT()? This is in MS SQL Server. Any help would be most appreciated. --Jason
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
ORDER BY County

You could use GROUP BY and HAVING:
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
GROUP BY COUNTY, CITY, ZIP
HAVING COUNT(1) >1
ORDER BY County
If you want the full row details, you can join back to a subquery that uses the GROUP BY and HAVING clauses:
SELECT x.*
FROM MoratoriumLocations x
INNER JOIN(
SELECT COUNTY, CITY, ZIP
FROM MoratoriumLocations
WHERE MoratoriumID=20
GROUP BY COUNTY, CITY, ZIP
HAVING COUNT(1) >1
) dups ON dups.County = x.County
AND dups.City = x.City
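AND dups.Zip = x.Zip
-- repeat the filter here, otherwise matching rows from other moratoriums are returned too
WHERE x.MoratoriumID=20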
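To try the GROUP BY/HAVING subquery and the join back to it without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The sample rows and the ID column are hypothetical. The sketch repeats the MoratoriumID = 20 filter in the outer query: without it, a row with the same county, city and zip under a different MoratoriumID would also be returned.

```python
import sqlite3

# Hypothetical sample rows: (ID, MoratoriumID, County, City, Zip).
# The ID column is an assumption, added only to make the output easy to check.
ROWS = [
    (1, 20, "Kent", "Dover", "19901"),
    (2, 20, "Kent", "Dover", "19901"),   # duplicate of row 1
    (3, 20, "Sussex", "Lewes", "19958"),
    (4, 21, "Kent", "Dover", "19901"),   # same place, different moratorium
]


def find_duplicates(conn):
    """Return full rows for MoratoriumID 20 whose County/City/Zip occurs more than once."""
    sql = """
        SELECT x.*
        FROM MoratoriumLocations x
        INNER JOIN (
            SELECT County, City, Zip
            FROM MoratoriumLocations
            WHERE MoratoriumID = 20
            GROUP BY County, City, Zip
            HAVING COUNT(*) > 1
        ) dups ON dups.County = x.County
              AND dups.City = x.City
              AND dups.Zip = x.Zip
        WHERE x.MoratoriumID = 20   -- keeps row 4 out of the result
        ORDER BY x.ID
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MoratoriumLocations "
    "(ID INTEGER PRIMARY KEY, MoratoriumID INTEGER, County TEXT, City TEXT, Zip TEXT)"
)
conn.executemany("INSERT INTO MoratoriumLocations VALUES (?, ?, ?, ?, ?)", ROWS)

for row in find_duplicates(conn):
    print(row)
```

Note that the equality join never matches rows where County, City or Zip is NULL, so NULL-containing duplicates need separate handling.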

See Preben's answer for how to find dups.
To avoid dups altogether, consider creating a unique index.
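A minimal sketch of the unique-index idea, again using Python's sqlite3 module as a stand-in for SQL Server. The index name and the decision to include MoratoriumID in the key are assumptions (the question only says the combination should be unique within one moratorium). Any existing duplicates have to be removed before such an index can be created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MoratoriumLocations (MoratoriumID INTEGER, County TEXT, City TEXT, Zip TEXT)"
)
# Hypothetical index name and key: one row per place within a moratorium.
conn.execute(
    "CREATE UNIQUE INDEX UX_MoratoriumLocations_Place "
    "ON MoratoriumLocations (MoratoriumID, County, City, Zip)"
)


def try_insert(row):
    """Insert a row and return False if the unique index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO MoratoriumLocations VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert((20, "Kent", "Dover", "19901")))  # True
print(try_insert((20, "Kent", "Dover", "19901")))  # False: duplicate rejected
print(try_insert((21, "Kent", "Dover", "19901")))  # True: different moratorium
```

One dialect difference: SQLite treats NULLs as distinct in a unique index, whereas SQL Server treats them as equal, so there a second row with the same NULL key is rejected.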

I would suggest window functions:
SELECT ml.*
FROM (SELECT ml.*, COUNT(*) OVER (PARTITION BY County, City, Zip) as cnt
FROM MoratoriumLocations ml
WHERE MoratoriumID = 20
) ml
ORDER BY cnt DESC, County, City, Zip;
This will show the complete rows with duplicates, which can help you understand them better.
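Here is that window-function query run against hypothetical sample data, using SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The extra `WHERE cnt > 1` is an optional addition, not part of the answer above, that keeps only the duplicated rows.

```python
import sqlite3  # COUNT(*) OVER (...) needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MoratoriumLocations (MoratoriumID INTEGER, County TEXT, City TEXT, Zip TEXT)"
)
conn.executemany(
    "INSERT INTO MoratoriumLocations VALUES (?, ?, ?, ?)",
    [  # hypothetical sample data
        (20, "Kent", "Dover", "19901"),
        (20, "Kent", "Dover", "19901"),
        (20, "Sussex", "Lewes", "19958"),
    ],
)

rows = conn.execute(
    """
    SELECT ml.*
    FROM (SELECT ml.*, COUNT(*) OVER (PARTITION BY County, City, Zip) AS cnt
          FROM MoratoriumLocations ml
          WHERE MoratoriumID = 20) ml
    WHERE cnt > 1   -- optional: keep only the duplicated rows
    ORDER BY cnt DESC, County, City, Zip
    """
).fetchall()

for row in rows:
    print(row)  # each row carries its group size in the last column
```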

Related

Show 'MULTIPLE_VALUES' instead of LISTAGG if there are > 1 records after GROUP BY

Imagine the table below:
I want to get the total population in each country but also I'd like to see the name of a city if that city is the only city in the country.
I could run something like
select
min(Country),
listagg(City, ',') within group (order by City) as City,
sum(Population) as Population
from table1
group by Country
but what I want is this ('MULTIPLE' is just an example of the text I'd like to see instead of the list of cities):
How can I do that?
I haven't been able to find any solution, and my only idea is to use CASE with COUNT, but it won't work.
P.S. Sorry for the formatting
Just count the cities or compare min and max city:
select
country,
case when min(city) = max(city) then min(city) else 'multiple' end as city,
sum(population) as population
from cities
group by country
order by country;
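A quick check of the MIN/MAX comparison on hypothetical data, using SQLite via Python's sqlite3. For comparison it also computes the asker's CASE + COUNT idea as a second column (the alias `city2` is made up), which gives the same answer. MIN, MAX and COUNT(DISTINCT ...) all ignore NULL cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (country TEXT, city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("Iceland", "Reykjavik", 100),
        ("Norway", "Oslo", 300),
        ("Norway", "Bergen", 200),
    ],
)

# city:  MIN/MAX comparison from the answer above
# city2: CASE + COUNT(DISTINCT ...) variant, shown for comparison
rows = conn.execute(
    """
    SELECT country,
           CASE WHEN MIN(city) = MAX(city) THEN MIN(city) ELSE 'multiple' END AS city,
           CASE WHEN COUNT(DISTINCT city) = 1 THEN MIN(city) ELSE 'multiple' END AS city2,
           SUM(population) AS population
    FROM cities
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(rows)
```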
Assuming the same city won't appear twice within a country:
CODE 1
WITH list_of_cities_ as (
SELECT
COUNTRY,
city,
count(city) over (partition by country) AS TOTAL_Cities,
SUM(POPULATION) over (partition by country) AS TOTAL_POPULATION
FROM table_
)
select
DISTINCT
country,
case when TOTAL_Cities > 1 THEN 'Multiple' Else city end as city,
TOTAL_POPULATION
from list_of_cities_
CODE 2
If you are interested in getting the list of all cities instead of the keyword "Multiple":
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=68efe08ab72ef63afa5813925e7f99e0
SELECT
COUNTRY,
STRING_AGG(City,',') as LIST_OF_CITIES,
SUM(POPULATION) AS TOTAL_POPULATION
FROM table_
GROUP BY 1
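SQLite only added STRING_AGG in version 3.44, so this sketch of the same idea uses SQLite's long-standing `group_concat(X, separator)` aggregate instead, run through Python's sqlite3 on hypothetical data. It groups by the column name rather than by position. The order of names inside the concatenated string is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_ (country TEXT, city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO table_ VALUES (?, ?, ?)",
    [  # hypothetical sample data
        ("Iceland", "Reykjavik", 100),
        ("Norway", "Oslo", 300),
        ("Norway", "Bergen", 200),
    ],
)

rows = conn.execute(
    """
    SELECT country,
           group_concat(city, ',') AS list_of_cities,   -- STRING_AGG equivalent
           SUM(population) AS total_population
    FROM table_
    GROUP BY country
    ORDER BY country
    """
).fetchall()
print(rows)
```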

Column must appear in group by or aggregate function in nested query

I have the following table.
Fights (fight_year, fight_round, winner, fid, city, league)
I am trying to query the following:
For each year that appears in the Fights table, find the city that held the most fights. For example, if in year 1992, Jersey held more fights than any other city did, you should print out (1992, Jersey)
Here's what I have so far, but I keep getting the following error. I am not sure how I should construct my GROUP BY clauses.
ERROR: column, 'ans.fight_round' must appear in the GROUP BY clause or be used in an aggregate function. Line 3 from (select *
select fight_year, city, max(*)
from (select *
from (select *
from fights as ans
group by (fight_year)) as l2
group by (ans.city)) as l1;
In Postgres, I would recommend aggregation and distinct on:
select distinct on (fight_year) fight_year, city, count(*) cnt
from fights
group by fight_year, city
order by fight_year, count(*) desc
This counts how many fights each city had each year and retains the city with the most fights per year.
If you want to allow ties, then use window functions:
select fight_year, city, cnt
from (
select fight_year, city, count(*) cnt,
rank() over(partition by fight_year order by count(*) desc) rn
from fights
group by fight_year, city
) f
where rn = 1
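The ranking query can be tried with SQLite via Python's sqlite3 on hypothetical data. SQLite has no DISTINCT ON, so only the RANK() version is shown. This sketch computes the counts in an inner derived table and ranks them one level up, which is equivalent to ranking by `count(*)` directly.

```python
import sqlite3  # RANK() needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fights (fight_year INTEGER, fight_round INTEGER, winner TEXT,"
    " fid INTEGER, city TEXT, league TEXT)"
)
# Hypothetical sample data: Jersey leads 1992, Austin and Boston tie in 1993.
conn.executemany(
    "INSERT INTO fights VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1992, 1, "A", 1, "Jersey", "L1"),
        (1992, 2, "B", 2, "Jersey", "L1"),
        (1992, 1, "C", 3, "Boston", "L1"),
        (1993, 1, "D", 4, "Boston", "L1"),
        (1993, 1, "E", 5, "Austin", "L1"),
    ],
)

rows = conn.execute(
    """
    SELECT fight_year, city, cnt
    FROM (
        SELECT fight_year, city, cnt,
               RANK() OVER (PARTITION BY fight_year ORDER BY cnt DESC) AS rn
        FROM (SELECT fight_year, city, COUNT(*) AS cnt
              FROM fights
              GROUP BY fight_year, city) c
    ) f
    WHERE rn = 1          -- rank 1 per year, ties share it
    ORDER BY fight_year, city
    """
).fetchall()
print(rows)  # [(1992, 'Jersey', 2), (1993, 'Austin', 1), (1993, 'Boston', 1)]
```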
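Window functions, as used by @GMB, are the easiest way, but you can also try this alternative with plain GROUP BY and HAVING (it also keeps ties):
select fight_year, city
from fights f
group by fight_year, city
having count(*) >= all (select count(*) from fights f2
where f2.fight_year = f.fight_year
group by f2.city)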

Group By query - any other way?

I have the following table that contains the following data:
http://img513.imageshack.us/img513/9039/mycities.png
The CREATE statement and the inserts are at http://snipt.org/xoKl .
The table is a list of cities and each city belongs to a region and a country and each city has a founding date. The goal here is to get for each "Country / Region" pair a list of the oldest cities. We need the oldest city on the east coast of Canada, the oldest city on the west coast of the U.S and so on ...
The query that I use right now is:
SELECT * FROM MyCities
INNER JOIN
(SELECT Country, Region, MIN(FoundingDate) AS CityFoundingDate
FROM MyCities
GROUP BY Country, Region ) AS subquery
ON subquery.CityFoundingDate = MyCities.FoundingDate
AND MyCities.Country = subquery.Country
AND MyCities.Region = subquery.Region
I just want to know whether there are other ways to write this group by query or not. :-)
Is this query efficient or not?
Looking forward to a discussion.
What about this?
select country, region, city from MyCities mc1
where foundingDate <= ALL (
select foundingDate from MyCities as mc2
where mc1.country = mc2.country and mc1.region = mc2.region
)
How about something like this?
It should work in Oracle (although I can't test it right now):
SELECT country, region, city, foundingdate
FROM (
SELECT country, region, city, foundingdate, MIN(foundingdate) OVER (PARTITION BY country, region) min_date
FROM mycities) WHERE foundingdate=min_date
But what if there are two cities founded in the same year in the same country/region?
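On the question of ties: both the window-function query and the correlated-subquery query return every city that shares the earliest date. A sketch with hypothetical rows, run through Python's sqlite3. Because SQLite does not support `ALL`, the correlated version is rewritten here with `NOT EXISTS` (a swapped-in equivalent, valid as long as FoundingDate is never NULL). Dates are stored as ISO-8601 text so that string comparison orders them chronologically.

```python
import sqlite3  # MIN() OVER (...) needs SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyCities (City TEXT, Region TEXT, Country TEXT, FoundingDate TEXT)")
conn.executemany(
    "INSERT INTO MyCities VALUES (?, ?, ?, ?)",
    [  # hypothetical data
        ("Alpha", "East", "Canada", "1608-07-03"),
        ("Beta", "East", "Canada", "1608-07-03"),  # ties with Alpha
        ("Gamma", "East", "Canada", "1642-05-17"),
        ("Delta", "West", "USA", "1769-07-16"),
    ],
)

# Window version: compare each row with the minimum date of its partition.
window_sql = """
    SELECT Country, Region, City
    FROM (SELECT Country, Region, City, FoundingDate,
                 MIN(FoundingDate) OVER (PARTITION BY Country, Region) AS min_date
          FROM MyCities) t
    WHERE FoundingDate = min_date
    ORDER BY Country, Region, City
"""

# Correlated version: keep a city if no older city exists in its country/region.
not_exists_sql = """
    SELECT Country, Region, City
    FROM MyCities mc1
    WHERE NOT EXISTS (SELECT 1 FROM MyCities mc2
                      WHERE mc2.Country = mc1.Country
                        AND mc2.Region = mc1.Region
                        AND mc2.FoundingDate < mc1.FoundingDate)
    ORDER BY Country, Region, City
"""

print(conn.execute(window_sql).fetchall())
print(conn.execute(not_exists_sql).fetchall())
```

To keep exactly one city per pair instead, rank with ROW_NUMBER() and a tie-breaking ORDER BY.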

SQL Max and Sum

Below is my query that I am using:
SELECT
County,
Code,
Sum(PaidAmount) AS TotalPaid
FROM
Counties
GROUP BY
County,
Code
It returns the set:
County Code TotalPaid
Brown 99 210.21
Lyon 73 322.22
Lyon 88 533.22
Lincoln 22 223.21
What I am looking for is a query that will return the rows that show the County and the Code for the Max TotalPaid for each County. An example of the result set that I need is shown below (notice that Lyon, 73 is removed since Lyon, 88 has a higher TotalPaid amount):
County Code TotalPaid
Brown 99 210.21
Lyon 88 533.22
Lincoln 22 223.21
I wasn't able to test this, but RANK should solve this:
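SELECT County, Code, TotalPaid
FROM
(SELECT x.County, x.Code, x.TotalPaid
,RANK() OVER
(PARTITION BY x.County ORDER BY x.TotalPaid DESC) AS PaidRank
FROM
(SELECT
County,
Code,
Sum(PaidAmount) AS TotalPaid
FROM
Counties
GROUP BY
County,
Code) x
) ranked
-- a window function can't be filtered in the WHERE of its own SELECT, hence the extra level
WHERE PaidRank = 1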
I think you need to do something like the following. I was called away before I could review what I've written, but hopefully it will give you enough of a pointer. Some RDBMSes won't allow the "where (County, TotalPaid) in (select value, value)" construct, but you can work around this.
select
County,
Code,
TotalPaid
from (SELECT
County,
Code,
Sum(PaidAmount) AS TotalPaid
FROM
Counties
GROUP BY
County,
Code ) tbl
where (County, TotalPaid) in (select County,
max(TotalPaid)
from (SELECT
County,
Sum(PaidAmount) AS TotalPaid
FROM
Counties
GROUP BY
County,
Code ) tbl2
group by County)
SELECT
c.County,
c.Code,
Sum(c.PaidAmount) AS TotalPaid
FROM
Counties c
GROUP BY
c.County,
c.Code
HAVING
Sum(c.PaidAmount) >= ALL (select sum(c2.PaidAmount) from counties c2 where c2.county = c.county group by c2.code)
This one should work, although I haven't tested it.
You'll have to use window functions to do this. While what you want is easily expressed in English, it's unfortunately not easily expressed in SQL. This should do what you need:
select
County, Code, TotalPaid
from
(
SELECT
County,
Code,
sum(PaidAmount) AS TotalPaid,
row_number() over (partition by County order by sum(PaidAmount) desc) AS rn
FROM
Counties
GROUP BY
County, Code
) source
-- window functions are not allowed in WHERE, so filter on the computed rn instead
where rn = 1
Here's an updated solution:
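select c1.County, c1.Code, c1.TotalPaid
from (
select County, Code, sum(PaidAmount) TotalPaid
from Counties
group by County, Code) c1
inner join (
select County, max(TotalPaid) TotalPaid
from (
select County, sum(PaidAmount) TotalPaid
from Counties
group by County, Code) t
group by County) c2
on c1.County=c2.County and c1.TotalPaid=c2.TotalPaid;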
Note: if several codes share the maximum total for a county, this returns all of those rows.
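A sketch that checks the RANK() approach and the join-to-the-maximum approach against each other on hypothetical rows, using Python's sqlite3 (SQLite 3.25+ for RANK). Each (County, Code) pair has a single payment here, so each sum equals the TotalPaid shown in the question's sample output. Both approaches keep ties; swapping RANK() for ROW_NUMBER() would keep exactly one row per county.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Counties (County TEXT, Code INTEGER, PaidAmount REAL)")
conn.executemany(
    "INSERT INTO Counties VALUES (?, ?, ?)",
    [  # hypothetical payments, one per (County, Code)
        ("Brown", 99, 210.21),
        ("Lyon", 73, 322.22),
        ("Lyon", 88, 533.22),
        ("Lincoln", 22, 223.21),
    ],
)

# Rank the per-(County, Code) totals inside each county and keep rank 1.
rank_sql = """
    SELECT County, Code, TotalPaid
    FROM (SELECT County, Code, TotalPaid,
                 RANK() OVER (PARTITION BY County ORDER BY TotalPaid DESC) AS PaidRank
          FROM (SELECT County, Code, SUM(PaidAmount) AS TotalPaid
                FROM Counties GROUP BY County, Code) x) ranked
    WHERE PaidRank = 1
    ORDER BY County
"""

# Aggregate first, then join each total to its county's maximum total.
join_sql = """
    WITH totals AS (
        SELECT County, Code, SUM(PaidAmount) AS TotalPaid
        FROM Counties GROUP BY County, Code)
    SELECT t.County, t.Code, t.TotalPaid
    FROM totals t
    JOIN (SELECT County, MAX(TotalPaid) AS TotalPaid FROM totals GROUP BY County) m
      ON t.County = m.County AND t.TotalPaid = m.TotalPaid
    ORDER BY t.County
"""

print(conn.execute(rank_sql).fetchall())
print(conn.execute(join_sql).fetchall())
```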

Select count / duplicates

I have a table with all U.S. zip codes. Each row contains the city and state name for the zip code. I'm trying to get a list of cities that show up in multiple states. This wouldn't be a problem if there weren't several zip codes in the same city...
So basically, I just want each city/state combination to count as 1, instead of counting it once per zip code (7 times, say, if that city/state has 7 zip codes)...
I'm not really sure how to do this. I know I need to use COUNT, but how do I tell MySQL to count a given city/state combination only once?
SELECT City, Count(City) As theCount
FROM (Select City, State From tblCityStateZips Group By City, State) As C
GROUP By City
HAVING Count(City) > 1
This would return all cities, with their counts, that appear in more than one state:
Greenville 39
Greenwood 2
GreenBriar 3
etc.
First group on state and city, then group the result on city:
select City
from (
select State, City
from ZipCode
group by State, City
) x
group by City
having count(*) > 1
Will this do the trick?
Select CityName, Count(Distinct State) as StateCount
From CityStateTable
Group by CityName
HAVING Count(Distinct State) > 1
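A sketch on hypothetical zip-code rows (Python's sqlite3 standing in for MySQL) showing that COUNT(DISTINCT State) and the group-twice approach find the same city: two zip codes in the same city and state count only once. The zip values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblCityStateZips (Zip TEXT, City TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO tblCityStateZips VALUES (?, ?, ?)",
    [  # hypothetical rows: Springfield has two zips in IL and one in MO
        ("00001", "Springfield", "IL"),
        ("00002", "Springfield", "IL"),
        ("00003", "Springfield", "MO"),
        ("00004", "Peoria", "IL"),
        ("00005", "Peoria", "IL"),
    ],
)

# One pass: count distinct states per city.
rows = conn.execute(
    """
    SELECT City, COUNT(DISTINCT State) AS StateCount
    FROM tblCityStateZips
    GROUP BY City
    HAVING COUNT(DISTINCT State) > 1
    """
).fetchall()

# Two passes: collapse to city/state pairs, then group on city.
nested = conn.execute(
    """
    SELECT City
    FROM (SELECT State, City FROM tblCityStateZips GROUP BY State, City) x
    GROUP BY City
    HAVING COUNT(*) > 1
    """
).fetchall()

print(rows, nested)
```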
Try using SELECT DISTINCT to collapse the zip codes first:
SELECT City FROM (SELECT DISTINCT City, State FROM tblCityStateZips) AS cs
GROUP BY City HAVING COUNT(*) > 1
You probably should have created a separate table for zip codes to avoid the duplication.
You want to look into the GROUP BY clause and aggregate functions such as COUNT.