Validate that only one value exists - sql

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID      Count of Distinct MID values
200345   2
304334   3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.

You can use COUNT with DISTINCT together with a HAVING clause:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
demo on dbfiddle.uk
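If you want to try this locally, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) rather than SQL Server, and the sample rows are made up for illustration; the query itself is the one above with an alias and an ORDER BY added.

```python
import sqlite3

# Hypothetical sample data: EID 100001 is consistent (one MID),
# EID 200345 has two different MIDs, EID 304334 has three.
rows = [
    (100001, 7), (100001, 7), (100001, 7),
    (200345, 1), (200345, 1), (200345, 2),
    (304334, 4), (304334, 5), (304334, 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (EID INTEGER, MID INTEGER)")
conn.executemany("INSERT INTO table_name (EID, MID) VALUES (?, ?)", rows)

# Count the distinct MIDs per EID and keep only EIDs with more than one.
query = """
    SELECT EID, COUNT(DISTINCT MID) AS mid_count
    FROM table_name
    GROUP BY EID
    HAVING COUNT(DISTINCT MID) > 1
    ORDER BY EID
"""
result = conn.execute(query).fetchall()
print(result)
```

Only the two inconsistent EIDs come back; the consistent one is filtered out by HAVING.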

Related

ORDER BY an aggregated column in Report Builder 3.0

In Report Builder 3.0, I retrieved some items and counted them using a Count aggregate. Now I want to order them from highest to lowest. How do I use ORDER BY on the aggregated column? The picture below shows the column that I want to order by; it is ticked.
Pic
The code is very simple, as shown below:
SELECT DISTINCT act_id, NameOfAct
FROM Acts
Your picture indicates you also want a Total row at the bottom:
SELECT
COALESCE(NameOfAct,'Total') NameOfAct,
COUNT(DISTINCT act_id) c
FROM Acts
GROUP BY ROLLUP(NameOfAct)
ORDER BY
CASE WHEN NameOfAct is null THEN 1 ELSE 0 END,
c DESC;
Result of example data:
NameOfAct count
-------------- -------
Act_B 3
Act_A 2
Act_Z 1
Total 6
Try it with example rows at: http://sqlfiddle.com/#!18/dbd6c/2
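For engines without GROUP BY ROLLUP, a similar Total row can be approximated with UNION ALL. The sketch below is a swapped-in technique, not the ROLLUP query itself: it assumes SQLite via Python's sqlite3, hypothetical rows matching the example counts, and a helper column is_total that I made up to keep the Total row last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT)")
# Hypothetical rows: Act_A twice, Act_B three times, Act_Z once.
conn.executemany(
    "INSERT INTO Acts (act_id, NameOfAct) VALUES (?, ?)",
    [(1, "Act_A"), (2, "Act_A"), (3, "Act_B"),
     (4, "Act_B"), (5, "Act_B"), (6, "Act_Z")],
)

# Per-name counts plus one extra 'Total' row; is_total sorts it to the bottom.
query = """
    SELECT NameOfAct, c
    FROM (
        SELECT NameOfAct, COUNT(DISTINCT act_id) AS c, 0 AS is_total
        FROM Acts
        GROUP BY NameOfAct
        UNION ALL
        SELECT 'Total', COUNT(DISTINCT act_id), 1
        FROM Acts
    ) AS t
    ORDER BY is_total, c DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```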
I looked at the Pic. So you might have duplicate acts with the same name. And you want to know the number of acts that have the same unique name.
You might want to group the results by name:
GROUP BY NameOfAct
And include the act names and their counts in the query results:
SELECT NameOfAct, COUNT(*) AS ActCount
(Since the act_id column is not included in the groups, you need to omit it in the SELECT. The DISTINCT is also not necessary anymore, since all groups are unique already.)
Finally, you can sort the data (probably descending to get the acts with the largest count on top):
ORDER BY ActCount DESC
Your complete query would become something like this:
SELECT NameOfAct, COUNT(*) AS ActCount
FROM Acts
GROUP BY NameOfAct
ORDER BY ActCount DESC
Edit:
By the way, you use field "act_id" in your SELECT clause. That's somewhat confusing. If you want to know counts, you want to look at either the complete table data or group the table data into smaller groups (with the GROUP BY clause). Then you can use aggregate functions to get more information about those groups (or the whole table), like counts, average values, minima, maxima...
Single record information, like an act's ID in your case, is typically not important if you want to use statistic/aggregate methods on grouped data. Suppose your query returns an act name which is used 10 times. Then you have 10 records in your table, each with a unique act_id, but with the same name.
If you need just one act_id that represents each group / act name (and assuming act_id is an autonumbering field), you might include the latest / largest act_id value in the query using the MAX aggregate function:
SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
(The rest of the query remains the same.)
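Put together, the grouped query with the extra MAX column can be tried like this. It is only a sketch: it runs on SQLite through Python's sqlite3 (not Report Builder), and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Acts (act_id INTEGER PRIMARY KEY, NameOfAct TEXT)")
# Hypothetical rows: the same act name appears under several act_id values.
conn.executemany(
    "INSERT INTO Acts (act_id, NameOfAct) VALUES (?, ?)",
    [(1, "Act_A"), (2, "Act_A"), (3, "Act_B"),
     (4, "Act_B"), (5, "Act_B"), (6, "Act_Z")],
)

# One row per act name, its count, and the largest act_id in the group,
# sorted so the most frequent name comes first.
query = """
    SELECT NameOfAct, COUNT(*) AS ActCount, MAX(act_id) AS LatestActId
    FROM Acts
    GROUP BY NameOfAct
    ORDER BY ActCount DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```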

How can BigQuery SQL give different DISTINCT and GROUP BY results?

We seem to be getting two different, mutually incompatible results from legacy SQL and standard SQL in Google BigQuery.
Here is our standard SQL Query...which gives an answer with 218,529 rows.
SELECT DISTINCT(EID)
FROM test.ourBQtable
Here is our legacy SQL Query...
SELECT COUNT(EID) AS Total, EID
FROM [ourBQproject:test.ourBQtable]
GROUP BY EID
ORDER BY Total DESC
This shows results that look like the table below, yet it also returns 218,529 rows:
Total  EID
376    jb+qLvHMm5JrMkNybAi6uC75FzgsGcNQhJ19IeWFDcQ=
352    JGqNBgicm+mpcYBS4K7AI2WXI3xaSgMkktb+7oOjjnQ=
How is it possible to have what appears to be duplicate EIDs (376 of them as shown in one case in the table) - but when using the DISTINCT(EID) command - the number of rows doesn't decrease? Shouldn't DISTINCT be filtering out all the duplicate rows? Do we really have duplicate rows?
What are we missing in our understanding?
Your code appears to be working exactly correctly.
DISTINCT EID is saying that there are 218,529 different values of EID. This should be returning one row for each of the 218,529 different EIDs.
When you use GROUP BY, you are getting one row for each of the EIDs. In this case, you get the same number.
Try running this query:
SELECT COUNT(*) as num_rows, COUNT(DISTINCT EID) as num_eids
FROM test.ourBQtable;
This will show the number of rows in the table and the number of distinct values of EID (ignoring NULL values).
The two queries below are equivalent and return the same number of rows: one per unique EID.
SELECT DISTINCT EID
FROM test.ourBQtable
and
SELECT EID
FROM test.ourBQtable
GROUP BY EID
That explains why the number of output rows is the same.
Now, in the second query you added COUNT(EID):
SELECT COUNT(EID) AS Total, EID
FROM test.ourBQtable
GROUP BY EID
This does not change the number of output rows; it only adds, for each EID, the count of rows in test.ourBQtable with that EID. (If you sum all these counts, you get the total number of rows in the original table, assuming EID is never NULL.)
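The same relationship can be checked on a tiny table. This is a sketch on SQLite via Python's sqlite3, not on BigQuery, and the EID values are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ourBQtable (EID TEXT)")
# Hypothetical data: 6 rows, but only 3 different EIDs.
conn.executemany(
    "INSERT INTO ourBQtable (EID) VALUES (?)",
    [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)],
)

distinct_rows = conn.execute("SELECT DISTINCT EID FROM ourBQtable").fetchall()
grouped_rows = conn.execute(
    "SELECT COUNT(EID) AS Total, EID FROM ourBQtable "
    "GROUP BY EID ORDER BY Total DESC"
).fetchall()
num_rows, num_eids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT EID) FROM ourBQtable"
).fetchone()

# Both queries return one row per distinct EID.
print(len(distinct_rows), len(grouped_rows))
# The per-EID counts add up to the table's row count (no NULL EIDs here).
print(sum(total for total, _ in grouped_rows), num_rows, num_eids)
```

The duplicates are in the underlying rows, not in the query output: each output row of the GROUP BY query summarizes several table rows.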

SELECT TOP 1 is returning multiple records

I shall link my database down below.
I have a query called 'TestMonday1' which returns the student with the fewest 'NoOfFrees' and inserts the result into the lesson table. Running the query should help explain what I mean. The problem I'm having is that my SQL code has 'SELECT TOP 1', yet if the query returns two students who have the same number of frees, it returns both records. With this being a timetable planner, it should only ever return one result. I'll also put the code below.
Many thanks
Code:
INSERT INTO Lesson ( StudentID, LessonStart, LessonEnd, DayOfWeek )
SELECT TOP 1 Availability.StudentID, Availability.StartTime,
Availability.EndTime, Availability.DayOfWeek
FROM Availability
WHERE
Availability.StartTime='16:00:00' AND
Availability.EndTime='18:00:00' AND
Availability.DayOfWeek='Monday' AND
LessonTaken IS NULL
ORDER BY
Availability.NoOfFrees;
This happens because Access returns all records in case of ties in ORDER BY (all records returned have the same values of fields used in ORDER BY).
You can add another field to ORDER BY to make sure there are no ties. StudentID looks like a good candidate (though I don't know your schema; replace it with something else if that suits better):
ORDER BY
Availability.NoOfFrees, Availability.StudentID;
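To see why the tiebreaker helps, here is a small pure-Python simulation of the tie-keeping behaviour described above. It does not run Access; top_with_ties is a hypothetical helper I wrote to mimic "TOP n returns all tied rows", and the data is made up.

```python
def top_with_ties(rows, n, key):
    """Mimic a TOP n that keeps ties: return the first n rows in sort order,
    plus any further rows whose sort key equals that of the n-th row."""
    ordered = sorted(rows, key=key)
    if len(ordered) <= n:
        return ordered
    cutoff = key(ordered[n - 1])
    return [row for row in ordered if key(row) <= cutoff]

# Hypothetical (StudentID, NoOfFrees) pairs; students 12 and 7 tie on NoOfFrees.
availability = [(12, 2), (7, 2), (3, 5)]

# ORDER BY NoOfFrees: both tied students come back.
by_frees = top_with_ties(availability, 1, key=lambda r: r[1])
# ORDER BY NoOfFrees, StudentID: the sort key is now unique, so one row.
by_frees_then_id = top_with_ties(availability, 1, key=lambda r: (r[1], r[0]))

print(by_frees)
print(by_frees_then_id)
```

The tiebreaker works only because StudentID makes every sort key distinct; a second column that can itself tie would not be enough.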

access select distinct on certain column

I have a table with some search results. The search results may be repeated because each result may be found using a different metric. I then want to query this table and select only the distinct results using the ID column. So to summarize: I have a table with an ID column, but the IDs may be repeated, and I want to select only one of each ID with MS Access SQL. How should I go about doing this?
OK, I have some more info after trying a couple of the suggestions. The Mins and Maxes won't work because the column they are operating on cannot be shown. I get an error like "You tried to execute a query that does not include the specified expression...". I now have all my data sorted; here is what it looks like:
ID   Description  searchScore
97   test         1
97   test         .95
120  ball         .94
97   test         .8
120  ball         .7
So the problem is that, since the rows were put into the table using different search criteria, I have duplicated rows with different scores. What I want to do is select only one of each ID, sorted by the searchScore descending. Any ideas?
SELECT DISTINCT ID
FROM Search_Table;
Based on the last update to your question, the following query seems appropriate.
SELECT ID, [Description], Max(searchScore)
FROM Search_Table
GROUP BY ID, [Description];
However that's nearly the same as Gordon's suggestion from yesterday, so I'm unsure whether this is what you want.
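For what it's worth, here is that GROUP BY query run against the sample rows from the question. This is a sketch on SQLite via Python's sqlite3 rather than Access; the alias bestScore and the ORDER BY are my additions to get the descending sort the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Search_Table (ID INTEGER, Description TEXT, searchScore REAL)"
)
# The sample rows from the question.
conn.executemany(
    "INSERT INTO Search_Table (ID, Description, searchScore) VALUES (?, ?, ?)",
    [(97, "test", 1.0), (97, "test", 0.95), (120, "ball", 0.94),
     (97, "test", 0.8), (120, "ball", 0.7)],
)

# One row per ID/Description with its best score, highest score first.
query = """
    SELECT ID, Description, MAX(searchScore) AS bestScore
    FROM Search_Table
    GROUP BY ID, Description
    ORDER BY bestScore DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

This collapses to one row per ID only because each ID has a single Description; if descriptions differed within an ID, each combination would get its own row.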
Here is a way where you can get one of the search criteria:
select id, min(search_criteria)
from t
group by id
This will always return the first one alphabetically. You can also easily get the last one using max().
You could also use:
select id, first(search_criteria)
from t
group by id

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key  name     cityKey
1    Alice    1
2    Bob      2
3    Charles  2
4    David    1
Table: City
-------------
key  name
1    Albany
2    Berkeley
3    Chico
I'd like to run a query on the Person table (with some WHERE clause) that returns:
the number of matching people in each city
the key for the city
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count  cityKey  cityName
2      1        Albany
2      2        Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it so that you count persons per city key in a sub-select and then join that to the City table for the name, but it's debatable whether that would be better. It's a matter of style and opinion, I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
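A quick way to compare the two styles is to run both on the sample tables. The sketch below assumes SQLite via Python's sqlite3; I quoted "key" as an identifier and added ORDER BY and the aliases person_count / ct for a stable, readable result, so it is an adaptation rather than the exact queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
''')

# Style 1: join first, then group by every non-aggregated column.
grouped = conn.execute('''
    SELECT COUNT(Person."key") AS person_count, City."key", City.name
    FROM Person
    LEFT JOIN City ON Person.cityKey = City."key"
    GROUP BY City."key", City.name
    ORDER BY City."key"
''').fetchall()

# Style 2: count per cityKey in a sub-select, then join to City for the name.
subselect = conn.execute('''
    SELECT PC.ct, City."key", City.name
    FROM City
    JOIN (SELECT COUNT(Person."key") AS ct, cityKey
          FROM Person
          GROUP BY cityKey) AS PC
      ON City."key" = PC.cityKey
    ORDER BY City."key"
''').fetchall()

print(grouped)
print(subselect)
```

Both return the same rows here; Chico is absent in each because no person lives there.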
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites & forums.
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
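The order of the steps above can be observed with a small experiment: WHERE removes rows before the groups are formed, and HAVING removes whole groups afterwards. This is a sketch on SQLite via Python's sqlite3 using the question's sample tables; the specific filters (excluding David, requiring at least two people) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
''')

base = '''
    SELECT City.name, COUNT(*) AS n
    FROM Person
    JOIN City ON Person.cityKey = City."key"
    {where}
    GROUP BY City."key", City.name
    HAVING COUNT(*) >= 2
    ORDER BY City.name
'''

# Without WHERE, both cities have two people and pass the HAVING filter.
unfiltered = conn.execute(base.format(where="")).fetchall()

# With WHERE, David is dropped before grouping, so Albany's group has only
# one row left and HAVING then discards that group.
filtered = conn.execute(
    base.format(where="WHERE Person.name <> 'David'")
).fetchall()

print(unfiltered)
print(filtered)
```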
Your query would only work on a permissive database such as MySQL, because you group on Person.cityKey but select city.key. Most other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
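To check that the three forms really agree, here is a sketch that fills in the elided FROM/JOIN part with an inner join on the question's sample tables. It assumes SQLite via Python's sqlite3; the join condition and the ORDER BY are my additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
''')

join = 'FROM Person JOIN City ON Person.cityKey = City."key"'

# Form 1: group on the person-side key, aggregate both city columns.
q1 = conn.execute(f'''
    SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name)
    {join}
    GROUP BY Person.cityKey
    ORDER BY Person.cityKey
''').fetchall()

# Form 2: list every non-aggregated column in the GROUP BY.
q2 = conn.execute(f'''
    SELECT COUNT(Person."key"), City."key", City.name
    {join}
    GROUP BY Person.cityKey, City."key", City.name
    ORDER BY Person.cityKey
''').fetchall()

# Form 3: group on the city key, aggregate only the name.
q3 = conn.execute(f'''
    SELECT COUNT(Person."key"), City."key", MAX(City.name)
    {join}
    GROUP BY City."key"
    ORDER BY City."key"
''').fetchall()

print(q1, q2, q3)
```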
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)