How can I count the non-unique combinations of values in MySQL?

I have a table with some legacy data that I suspect may be a little messed up. It is a many-to-many join table.
LIST_MEMBERSHIPS
----------------
list_id
address_id
I'd like to run a query that will count the occurrences of each list_id-address_id pair and show the occurrence count for each from highest to lowest number of occurrences.
I know it's got to involve COUNT() and GROUP BY, right?

select list_id, address_id, count(*) as count
from LIST_MEMBERSHIPS
group by 1, 2
order by 3 desc
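You may find it useful to add the following line between the GROUP BY and ORDER BY clauses, so that only pairs occurring more than once are shown:
having count > 1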

select count(*), list_id, address_id
from LIST_MEMBERSHIPS
group by list_id, address_id
order by count(*) desc
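To try the duplicate-pair query without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption: it runs on SQLite, not MySQL, although GROUP BY and HAVING behave the same way for this query. The sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LIST_MEMBERSHIPS (list_id INTEGER, address_id INTEGER)")

# Hypothetical data: (1, 10) occurs three times, (2, 20) twice, (3, 30) once.
conn.executemany(
    "INSERT INTO LIST_MEMBERSHIPS VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 10), (2, 20), (2, 20), (3, 30)],
)

# Count every list_id/address_id pair, keep only the repeated ones,
# and sort from the most to the least frequent.
dupes = conn.execute(
    """
    SELECT list_id, address_id, COUNT(*) AS cnt
    FROM LIST_MEMBERSHIPS
    GROUP BY list_id, address_id
    HAVING COUNT(*) > 1
    ORDER BY cnt DESC
    """
).fetchall()

for list_id, address_id, cnt in dupes:
    print(f"list {list_id}, address {address_id}: {cnt} rows")
```

Dropping the HAVING line lists every pair, including the ones that occur only once.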

Related

how can I select rows that column does NOT have more than 1 value?

I am very new to SQL and I am wondering how to solve this issue. For example, my table looks as follows:
As you can see in the table, item_id 1 appears in both city_id 1 and 2, and so does item_id 4, but I want to get all the items that appear in only one city_id.
In this example, these would be item_id 2 (appearing only in city_id 2) and item_id 3 (appearing in city_id 1).
Use aggregation on item_id and count distinct values of city_id. The having clause can be used to filter on aggregates.
select item_id from mytable group by item_id having count(distinct city_id) = 1
You can use the following query:
SELECT item_id
FROM table_name
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
In case you want to see the city_id too, you can use this query:
SELECT item_id, MIN(city_id) AS city_id
FROM example
GROUP BY item_id
HAVING COUNT(DISTINCT city_id) = 1
Since each remaining item_id has only one distinct city_id, you can use MIN or MAX to get that id.
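Here is a small sketch of the COUNT(DISTINCT ...) = 1 filter, run through Python's sqlite3 module as a stand-in for MySQL. The question's table is not shown, so the rows below are hypothetical and reconstructed from its description: items 1 and 4 are in two cities, and items 2 and 3 are in one. Item 3 appears twice in the same city to show that repeated rows do not affect the distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (item_id INTEGER, city_id INTEGER)")
conn.executemany(
    "INSERT INTO example VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (3, 1), (3, 1), (4, 1), (4, 2)],
)

# Group by item, keep groups whose city_id has exactly one distinct value,
# and use MIN to surface that single city_id.
single_city = conn.execute(
    """
    SELECT item_id, MIN(city_id) AS city_id
    FROM example
    GROUP BY item_id
    HAVING COUNT(DISTINCT city_id) = 1
    ORDER BY item_id
    """
).fetchall()

print(single_city)  # [(2, 2), (3, 1)]
```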
You want all the item_ids that have only one distinct city_id:
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
It works by counting the distinct values that city_id has for each item_id. For item_ids that repeat many times but always with the same city_id, the count of distinct city_id values is 1, and a HAVING clause can filter for these. HAVING is like a WHERE clause that runs after the GROUP BY operation is completed. Conceptually, it is equivalent to this:
SELECT item_id
FROM
(
SELECT item_id, count(distinct city_id) as cdci
FROM table
GROUP BY item_id
) x
WHERE cdci = 1
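The following sketch checks that the HAVING form and the derived-table form return the same items. It uses Python's sqlite3 module as a stand-in for MySQL. The table name mytable and its rows are hypothetical. The name table is avoided because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item_id INTEGER, city_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (3, 1), (3, 1), (4, 1), (4, 2)],
)

# HAVING filters the groups after GROUP BY has run.
having_form = conn.execute(
    """
    SELECT item_id
    FROM mytable
    GROUP BY item_id
    HAVING COUNT(DISTINCT city_id) = 1
    ORDER BY item_id
    """
).fetchall()

# Same filter written as a plain WHERE over a grouped derived table.
derived_form = conn.execute(
    """
    SELECT item_id
    FROM (
        SELECT item_id, COUNT(DISTINCT city_id) AS cdci
        FROM mytable
        GROUP BY item_id
    ) x
    WHERE cdci = 1
    ORDER BY item_id
    """
).fetchall()

print(having_form == derived_form)  # True
```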
If you want the city_id too, you can either take the MAX city_id (which is safe here because each remaining group has only one city):
SELECT item_id, MAX(city_id) as city_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
or you could join this query back to the item table as a subquery:
SELECT t.*
FROM
(
SELECT item_id
FROM table
GROUP BY item_id
HAVING count(distinct city_id) = 1
) x
INNER JOIN
table t
ON x.item_id = t.item_id
This technique is the general way to run a GROUP BY that identifies a particular set of rows and then bring in the rest of the data for those rows. You can't always put every other column you want inside a MAX, because each MAX is computed independently and can mix up values from different rows. You also can't add the extra columns to your GROUP BY, because that would subdivide the groups and give the wrong results. Doing the grouping in a subquery and joining it back is a typical way to get all the row data when you have to group in order to find which rows are interesting.
In your case, this form of query returns all the duplicated rows (whereas the GROUP BY/MAX form won't). If you don't want the duplicate rows, you can change the top line to SELECT DISTINCT t.*. However, don't make a habit of slapping DISTINCT in to get rid of duplicated rows. If your tables don't have duplicates to start with, but you suddenly get duplicated rows after writing a JOIN, look up what a Cartesian product is in database queries and how to prevent it.
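Here is a sketch of the join-back pattern, again using Python's sqlite3 module as a stand-in for MySQL. The table mytable, its extra stock column, and the rows are all hypothetical. Item 3 has a duplicated row so that the effect of DISTINCT is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item_id INTEGER, city_id INTEGER, stock INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 7), (2, 2, 3), (3, 1, 4), (3, 1, 4), (4, 1, 1), (4, 2, 2)],
)

# The subquery finds the interesting item_ids; the join brings back full rows.
# {distinct} is only ever "" or "DISTINCT" below, never user input.
JOIN_BACK = """
    SELECT {distinct} t.*
    FROM (
        SELECT item_id
        FROM mytable
        GROUP BY item_id
        HAVING COUNT(DISTINCT city_id) = 1
    ) x
    INNER JOIN mytable t ON x.item_id = t.item_id
    ORDER BY t.item_id, t.city_id, t.stock
"""

all_rows = conn.execute(JOIN_BACK.format(distinct="")).fetchall()
unique_rows = conn.execute(JOIN_BACK.format(distinct="DISTINCT")).fetchall()

print(all_rows)     # [(2, 2, 3), (3, 1, 4), (3, 1, 4)]
print(unique_rows)  # [(2, 2, 3), (3, 1, 4)]
```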
You just need a GROUP BY on item_id with a HAVING clause:
Select item_id from table group by item_id
having count(distinct city_id) = 1
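Also, if you want to keep more of the input's row-level data, you can rank the rows within each item_id with a window function (MySQL 8.0 or later). The rn alias can't be referenced in the WHERE clause of the same query, so filter on it in an outer query:
Select item_id, city_id
From (
Select item_id, city_id, rank()
over(partition by item_id order by city_id) as rn
From table
) ranked
Where rn = 1;
Note that this returns a row for every item_id (more than one when rank() produces ties). On its own, it does not restrict the result to items that appear in only one city.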
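Here is a sketch of the corrected window-function query, using Python's sqlite3 module as a stand-in for MySQL. It assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added. The rows are hypothetical. Item 3's tied rows both get rank 1, and items 1 and 4 still appear even though they occur in two cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item_id INTEGER, city_id INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (3, 1), (3, 1), (4, 1), (4, 2)],
)

# The window function is computed in the inner query; its alias rn
# can only be filtered on one level up.
first_ranked = conn.execute(
    """
    SELECT item_id, city_id
    FROM (
        SELECT item_id, city_id,
               RANK() OVER (PARTITION BY item_id ORDER BY city_id) AS rn
        FROM mytable
    ) ranked
    WHERE rn = 1
    ORDER BY item_id, city_id
    """
).fetchall()

print(first_ranked)  # [(1, 1), (2, 2), (3, 1), (3, 1), (4, 1)]
```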

select and count where regexp group by count

You can see what I'm trying to do here:
select *, count(*) as count
from `songs`
where `band` REGEXP '^[^[:alpha:]]'
group by `band`
order by `band` asc
bands can be:
avenged sevenfold
3 days grace
led zeppelin
98 mute
back street boys
beastie boys
I need this to select the bands whose first character is not a letter, and count how many rows exist for each band.
Unfortunately, my current query just seems to group together all of the bands that match the REGEXP.
You can't select columns that are neither in the GROUP BY clause nor inside a group function (COUNT, MAX, ...). MySQL enforces this when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default since 5.7.5.
The WHERE clause is fine, because it removes unneeded rows before grouping, and the condition is not on a group value (the result of a group function).
ASC is the default sort order, so you don't need to specify it.
select band, count(*) as count
from songs
where band REGEXP '^[^[:alpha:]]'
group by band
order by band
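SQLite has no built-in REGEXP implementation, so this sketch registers one with create_function. It uses Python's sqlite3 module as a stand-in for MySQL. Python's re module does not understand the POSIX class [[:alpha:]], so the ASCII-only pattern ^[^A-Za-z] is used as an approximation. The songs rows are hypothetical.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite turns "X REGEXP Y" into regexp(Y, X): pattern first, then value.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE songs (title TEXT, band TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [
        ("song a", "3 days grace"),
        ("song b", "3 days grace"),
        ("song c", "98 mute"),
        ("song d", "led zeppelin"),
        ("song e", "avenged sevenfold"),
    ],
)

# WHERE drops non-matching rows first; GROUP BY then yields one row per band.
counts = conn.execute(
    """
    SELECT band, COUNT(*) AS song_count
    FROM songs
    WHERE band REGEXP '^[^A-Za-z]'
    GROUP BY band
    ORDER BY band
    """
).fetchall()

print(counts)  # [('3 days grace', 2), ('98 mute', 1)]
```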
Does doing the selection after the group help?
select `band`, count(*) as count
from `songs`
group by `band`
having `band` REGEXP '^[^[:alpha:]]'
order by `band` asc
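Filtering on the grouped column in HAVING should give the same result as filtering in WHERE. WHERE simply discards the non-matching rows before they are grouped. Here is a sketch comparing the two, under the same stand-in assumptions: Python's sqlite3, a hand-registered REGEXP function, the ASCII pattern ^[^A-Za-z], and hypothetical rows.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite turns "X REGEXP Y" into regexp(Y, X): pattern first, then value.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE songs (title TEXT, band TEXT)")
conn.executemany(
    "INSERT INTO songs VALUES (?, ?)",
    [
        ("song a", "3 days grace"),
        ("song b", "3 days grace"),
        ("song c", "98 mute"),
        ("song d", "beastie boys"),
    ],
)

# Filter individual rows before grouping.
where_rows = conn.execute(
    """SELECT band, COUNT(*) AS song_count FROM songs
       WHERE band REGEXP '^[^A-Za-z]'
       GROUP BY band ORDER BY band"""
).fetchall()

# Group everything first, then filter the groups on the grouped column.
having_rows = conn.execute(
    """SELECT band, COUNT(*) AS song_count FROM songs
       GROUP BY band
       HAVING band REGEXP '^[^A-Za-z]'
       ORDER BY band"""
).fetchall()

print(where_rows == having_rows)  # True
```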
Also you appear to be selecting columns that aren't in the group clause. Try:
select `band`, count(*) as count
from `songs`
where `band` REGEXP '^[^[:alpha:]]'
group by `band`
order by `band` asc

How do I select those records where the group by clause returns 2 or more?

I'd like to return a list of items of only those that have two or more in the group:
select count(item_id) from items group by type_id;
Specifically, I'd like to know the values of item_id when the count(item_id) == 2.
You're asking for something that's not particularly possible without a subquery.
Basically, you want to list all values in a column while aggregating on that same column. You can't do this: aggregating on a column makes it impossible to list all the individual values from that column.
What you can do is find all type_id values which have an item_id count equal to 2, then select all item_ids from records matching those type_id values:
SELECT item_id
FROM items
WHERE type_id IN (
SELECT type_id
FROM items
GROUP BY type_id
HAVING COUNT(item_id) = 2
)
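Here is a sketch of the IN-subquery approach, using Python's sqlite3 module as a stand-in for MySQL. The items rows are hypothetical. In this data, type 10 is the only type with exactly two items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)],
)

# Inner query: types with exactly two items. Outer query: their item_ids.
pair_items = conn.execute(
    """
    SELECT item_id
    FROM items
    WHERE type_id IN (
        SELECT type_id
        FROM items
        GROUP BY type_id
        HAVING COUNT(item_id) = 2
    )
    ORDER BY item_id
    """
).fetchall()

print(pair_items)  # [(1,), (2,)]
```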
This is best expressed using a join rather than a WHERE IN clause, but the idea is the same no matter how you approach it. You may also want to select distinct item_ids, in which case you'll need the DISTINCT keyword before item_id in the outer query.
If your SQL dialect includes GROUP_CONCAT() (MySQL and SQLite both do), that could be used to generate a list of items without the inner query. However, the results differ: the subquery approach returns one item id per row, whereas GROUP_CONCAT() returns all the ids of a group as a single comma-separated string.
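The join form mentioned above might look like the following sketch. It uses hypothetical items rows and Python's sqlite3 as a stand-in for MySQL. It joins the items table to the grouped subquery as a derived table and should return the same ids as the IN version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)],
)

# Derived table of qualifying types, joined back to items.
joined = conn.execute(
    """
    SELECT i.item_id
    FROM items i
    INNER JOIN (
        SELECT type_id
        FROM items
        GROUP BY type_id
        HAVING COUNT(item_id) = 2
    ) t ON i.type_id = t.type_id
    ORDER BY i.item_id
    """
).fetchall()

# The WHERE ... IN form, for comparison.
with_in = conn.execute(
    """
    SELECT item_id FROM items
    WHERE type_id IN (
        SELECT type_id FROM items GROUP BY type_id HAVING COUNT(item_id) = 2
    )
    ORDER BY item_id
    """
).fetchall()

print(joined == with_in)  # True
```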
SELECT type_id, GROUP_CONCAT(item_id), COUNT(item_id) as number
FROM items
GROUP BY type_id
HAVING number = 2
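Here is a sketch of the GROUP_CONCAT() variant. SQLite also provides GROUP_CONCAT, so Python's sqlite3 serves as a stand-in for MySQL here. The rows are hypothetical. The order of the ids inside the concatenated string is not guaranteed, so the printout splits and sorts them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, type_id INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 30), (5, 30), (6, 30)],
)

# One row per qualifying type; the ids arrive as a single string.
rows = conn.execute(
    """
    SELECT type_id, GROUP_CONCAT(item_id) AS ids, COUNT(item_id) AS number
    FROM items
    GROUP BY type_id
    HAVING number = 2
    """
).fetchall()

for type_id, ids, number in rows:
    # ids is a comma-separated string such as "1,2"
    print(type_id, sorted(int(i) for i in ids.split(",")), number)
```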
Try this sql query:
select count(item_id) from items group by type_id having count(item_id)=2;

MySQL query to return only duplicate entries with counts

I have a legacy MySQL table called lnk_lists_addresses with columns list_id and address_id. I'd like to write a query that reports all the cases where the same list_id-address_id combination appears more than once in the table with a count.
I tried this...
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
ORDER BY count(*) DESC
LIMIT 20
It works, sort of, because there are fewer than 20 duplicates. But how would I return only the counts greater than 1?
I tried adding "WHERE count(*) > 1" before and after GROUP BY but got errors saying the statement was invalid.
SELECT count(*), list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING count(*)>1
ORDER BY count(*) DESC
Combining mine and Todd.Run's answers for a more "complete" answer: you want to use the HAVING clause.
http://dev.mysql.com/doc/refman/5.1/en/select.html
You want to use a "HAVING" clause. Its use is explained in the MySQL manual.
http://dev.mysql.com/doc/refman/5.1/en/select.html
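SELECT count(*) AS total, list_id, address_id
FROM lnk_lists_addresses
GROUP BY list_id, address_id
HAVING total > 1
ORDER BY total DESC
LIMIT 20
If you name the COUNT() field, you can refer to that alias later in the statement, in HAVING and ORDER BY. You cannot use it in WHERE, which is evaluated before the rows are grouped.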
EDIT: forgot about HAVING (>_<)
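Here is a sketch illustrating why the WHERE version fails and HAVING works. It uses Python's sqlite3 as a stand-in for MySQL. The exact error message differs between the two databases, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lnk_lists_addresses (list_id INTEGER, address_id INTEGER)")
conn.executemany(
    "INSERT INTO lnk_lists_addresses VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 20), (3, 30), (3, 30), (3, 30)],
)

# WHERE is evaluated before grouping, so it cannot filter on COUNT(*).
where_failed = False
try:
    conn.execute(
        """SELECT COUNT(*) AS total, list_id, address_id
           FROM lnk_lists_addresses
           WHERE total > 1
           GROUP BY list_id, address_id"""
    ).fetchall()
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE version rejected:", exc)

# HAVING is evaluated after grouping, so the aggregate alias can be used.
having_rows = conn.execute(
    """SELECT COUNT(*) AS total, list_id, address_id
       FROM lnk_lists_addresses
       GROUP BY list_id, address_id
       HAVING total > 1
       ORDER BY total DESC
       LIMIT 20"""
).fetchall()

print(having_rows)  # [(3, 3, 30), (2, 1, 10)]
```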

Fetch one row per account id from list

I have a table with game scores, allowing multiple rows per account id: scores (id, score, accountid). I want a list of the top 10 scorer ids and their scores.
Can you provide an sql statement to select the top 10 scores, but only one score per account id?
Thanks!
select accountid, max(score) from scores group by accountid order by max(score) desc limit 10;
First limit the selection to the highest score for each account id.
Then take the top ten scores.
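SELECT TOP 10 AccountId, Score
FROM Scores s1
WHERE AccountId NOT IN
(SELECT s2.AccountId FROM Scores s2
WHERE s2.AccountId = s1.AccountId AND s2.Score > s1.Score)
ORDER BY Score DESC
TOP 10 is SQL Server syntax; in MySQL, drop TOP 10 and add LIMIT 10 at the end. If an account has several rows tied for its highest score, each of those rows is returned.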
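Here is a sketch of the corrected correlated subquery, using Python's sqlite3 as a stand-in. SQLite has no TOP, so LIMIT 10 is used instead. The scores rows are hypothetical. Account 1 has two rows tied at 80 to show that ties produce more than one row per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER, accountid INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, 80, 1), (2, 80, 1), (3, 50, 1), (4, 70, 2), (5, 90, 3), (6, 60, 3)],
)

# Keep a row only if no row for the same account has a higher score.
best = conn.execute(
    """
    SELECT s1.accountid, s1.score
    FROM scores s1
    WHERE s1.accountid NOT IN (
        SELECT s2.accountid
        FROM scores s2
        WHERE s2.accountid = s1.accountid AND s2.score > s1.score
    )
    ORDER BY s1.score DESC, s1.accountid
    LIMIT 10
    """
).fetchall()

print(best)  # [(3, 90), (1, 80), (1, 80), (2, 70)]
```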
Try this:
select top 10 accountid,
max(score)
from scores
group by accountid
order by max(score) desc
PostgreSQL has the DISTINCT ON clause, which works this way:
SELECT DISTINCT ON (accountid) id, score, accountid
FROM scoretable
ORDER BY score DESC
LIMIT 10;
I don't think it's standard SQL though, so expect other databases to do it differently.
SELECT accountid, MAX(score) as top_score
FROM Scores
GROUP BY accountid
ORDER BY top_score DESC
LIMIT 0, 10
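Here is a sketch of the GROUP BY/MAX answer, run through Python's sqlite3 as a stand-in for MySQL. SQLite, like MySQL, accepts the alias in ORDER BY. The rows are hypothetical; account 1 has two rows tied at 80, yet the query returns exactly one row per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (id INTEGER, score INTEGER, accountid INTEGER)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [(1, 80, 1), (2, 80, 1), (3, 50, 1), (4, 70, 2), (5, 90, 3), (6, 60, 3)],
)

# One row per account (its best score), best accounts first.
top = conn.execute(
    """
    SELECT accountid, MAX(score) AS top_score
    FROM Scores
    GROUP BY accountid
    ORDER BY top_score DESC
    LIMIT 10
    """
).fetchall()

print(top)  # [(3, 90), (1, 80), (2, 70)]
```

The trade-off is that only grouped or aggregated columns come back. Getting the winning row's id needs a join back or a window function.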
That should work fine in MySQL, which accepts a column alias such as top_score in ORDER BY; ORDER BY MAX(score) DESC is an equivalent alternative.
I believe that PostgreSQL (at least 8.3) requires the DISTINCT ON expressions to match the leftmost ORDER BY expressions. That is, you can't use DISTINCT ON (accountid) when you have ORDER BY score DESC. To fix this, add accountid to the start of the ORDER BY:
SELECT DISTINCT ON (accountid) *
FROM scoretable
ORDER BY accountid, score DESC
LIMIT 10;
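Using this method allows you to select all the columns in a table. It will only return one row per accountid, even if there are duplicate 'max' values for score. However, because the rows are ordered by accountid first, LIMIT 10 returns the first ten accounts by id rather than the ten highest scores. To rank by score, wrap this query in a subquery and order the outer query by score DESC.
This was useful for me, as I was not looking for the maximum score (which is easy to find with the max() function) but for the most recent time a score was entered for an accountid.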
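SQLite has no DISTINCT ON, so this sketch swaps in a different technique, ROW_NUMBER() (it assumes SQLite 3.25 or newer). It keeps one full row per account, then orders the surviving rows by score so that LIMIT 10 picks the highest scores. Ties within an account are broken by the lower id, which is an arbitrary choice made for this example. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scoretable (id INTEGER, score INTEGER, accountid INTEGER)")
conn.executemany(
    "INSERT INTO scoretable VALUES (?, ?, ?)",
    [(1, 80, 1), (2, 80, 1), (3, 50, 1), (4, 70, 2), (5, 90, 3), (6, 60, 3)],
)

# Number the rows inside each account (best score first), keep row 1,
# then rank the kept rows by score across accounts.
best_rows = conn.execute(
    """
    SELECT id, score, accountid
    FROM (
        SELECT id, score, accountid,
               ROW_NUMBER() OVER (
                   PARTITION BY accountid ORDER BY score DESC, id
               ) AS rn
        FROM scoretable
    ) ranked
    WHERE rn = 1
    ORDER BY score DESC
    LIMIT 10
    """
).fetchall()

print(best_rows)  # [(5, 90, 3), (1, 80, 1), (4, 70, 2)]
```

For the "most recent entry" case, the inner ORDER BY would use a timestamp column instead of score. No such column exists in this hypothetical table.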