SQL Select Distinct - sql

I think this is very easy, I was hoping for verification.
I have 2 columns: ID & DocumentNumber. It's a one-to-many relationship, one ID can have many document numbers.
I need to get ID's where all DocumentNumbers belonging to it are unique.
Is this what Group By is for, in conjunction with Distinct? Is it as simple as Grouping By the ID

You can (as you're suspecting) do it using a simple GROUP BY/HAVING and using DISTINCT;
SELECT id FROM documents
GROUP BY id
HAVING COUNT(DocumentNumber) = COUNT(DISTINCT DocumentNumber)
An SQLfiddle to test with.

Related

Validate that only one value exists

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID Count of Distinct MID values
200345 2
304334 3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.
You can use the following using COUNT with DISTINCT and HAVING:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
demo on dbfiddle.uk

what does Group By multiple columns means?

I use oracle 11g , so i read alot of artics about it but i dont understand
how exactly its happened in database , so lets say that have two tables:
select * from Employee
select * from student
so when we want to make group by in multi columns :
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
so my question is: what exactly happened in database ? i mean the query count(*) do first in every column in group by and then sort it ? or what? can any one explain it in details ?.
SQL is a descriptive language, not a procedural language.
What the query does is determine all rows in the original data where the group by keys are the same. It then reduces them to one row.
For example, in your data, these all have the same data:
subject year name
English 1 Harsh
English 1 Pratik
English 1 Ramesh
You are saying to group by subject, year, so these become:
Subject Year Count(*)
English 1 3
Often, this aggregation is implemented using sorting. However, that is up to the database -- and there are many other algorithms. You cannot assume that the database will sort the data. But, if it easier for you to think of it, you can think of the data being sorted by the group by keys, in order to identify the groups. Just one caution, the returned values are not necessarily in any particular order (unless your query includes an order by).

How to identify the most common category referencing the same element?

I have two relations: Location(category,item) and Item(item)
Each item can be listed under multiple categories.
What SQL query can be used in figuring out which two categories, from Location(category,item) most frequently contain the same item?
note: I am looking for a SQL statement but I tagged this question as algorithm / math, as I am willing to accept a solution in the form of an algorithm in case a SQL query can not be provided.
You can do this easily in SQL with join and group by. Join the location table to itself on item, then count the matches. Order by this descending and choose the first one, if you want the pair with the most matches:
select l1.category, l2.category, count(*) as cnt
from location l1 join
location l2
on l1.item = l2.item and
l1.category < l2.category
group by l1.category, l2.category
order by count(*) desc
limit 1;
Note that this assumes that category, item is unique in location. Otherwise, you can use this select:
select l1.category, l2.category, count(distinct l1.item) as cnt

SQL - Get answer from query as a single number

The following code returns a couple of numbers, identifying people who take part in more than three activities.
SELECT pnr
FROM Participates
GROUP BY pnr
HAVING count(activities)>3;
I want the answer to be the number of people who participate in more than three activities though, i.e. "4", instead of four unique numbers. What to do?
Access supports derived tables.
SELECT COUNT(*) AS NumberOfParticipants FROM
(
SELECT pnr
FROM Participates
GROUP BY pnr
HAVING count(activities)>3
) T
You will need a WHERE clause on the pnr field to uniquely identify one of your groupings:
SELECT COUNT(pnr)
FROM Participates
GROUP BY pnr
WHERE pnr = 'whatever'
HAVING COUNT(activities)>3
The order of my clauses might be wrong
Select Count(Distinct pnr)
From Participates
Having Count(activities) > 3

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key name cityKey
1 Alice 1
2 Bob 2
3 Charles 2
4 David 1
Table: City
-------------
key name
1 Albany
2 Berkeley
3 Chico
I'd like to do a query on the People (with some WHERE clause) that returns
the number of matching people in each city
the key for the city
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count cityKey cityName
2 1 Albany
2 2 Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the group by - which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" Stackoverflow and numerous other sites & forums.
However, I've read that throwing in
that last part of the GROUP BY clause
(City.name) just to make it work is
wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)