SQL: Getting a value multiple times - sql

I keep getting the same value multiple times and I don't know what I am doing wrong. It's probably something very simple, but nothing seems to work for me. I need it for a school project and I have only been doing this for about a week.
This is my code:
select hobby
from preshobby
order by hobby asc
When I click execute I get the same value a couple of times. For example:
Wrestling
Wlking
Walking
Walking
Walking
Walking
Touch Football
Tennis
I need the result to be in ascending order and each value should only appear once.

Use distinct:
select distinct hobby
from preshobby
order by hobby
Note that you don't need to specify asc with order by as ascending is the default sort order in most versions of SQL.
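A minimal sketch of the effect of distinct, run against an in-memory SQLite database through Python's sqlite3 module. The sample rows are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with hypothetical sample data containing duplicates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE preshobby (hobby TEXT)")
rows = [("Walking",), ("Tennis",), ("Walking",), ("Wrestling",),
        ("Walking",), ("Touch Football",)]
conn.executemany("INSERT INTO preshobby (hobby) VALUES (?)", rows)

# DISTINCT collapses repeated values; ORDER BY sorts ascending by default.
distinct_hobbies = [r[0] for r in conn.execute(
    "SELECT DISTINCT hobby FROM preshobby ORDER BY hobby")]
print(distinct_hobbies)
```

Each hobby appears once in the result, no matter how many rows hold it.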

Your table probably has many entries with repeated hobbies, so you need to group them like this:
select hobby
from preshobby
group by hobby
order by hobby asc

You are selecting every value stored in the hobby column. Since many people share the same hobby, the query returns repeated values. Use distinct like this:
select distinct hobby from table_name;
The default order is asc, so you only need to specify a direction if you want descending.

Related

Validate that only one value exists

I have a table with two relevant columns. I'll call them EID and MID. They are not unique.
In theory, if the data is set up correctly, there will be many records for each EID and every one of those records should have the same MID.
There are situations where someone may manually update data incorrectly and I need to be able to quickly identify if there is a second MID for any EID.
Ideally, I'd have a query that returns how many MIDs for each EID, but only showing results where there is more than 1 MID. Below is what I'd like the results to look like.
EID Count of Distinct MID values
200345 2
304334 3
I've tried several different forms of queries, but I can't seem to figure out how to reach this result. We're on SQL Server.
You can use the following using COUNT with DISTINCT and HAVING:
SELECT EID, COUNT(DISTINCT MID)
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
demo on dbfiddle.uk
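As a rough check of that query, here is a sketch that runs it on SQLite through Python's sqlite3 module (a stand-in for SQL Server, which the question uses). The rows are invented so that two EIDs have more than one MID; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (EID INTEGER, MID INTEGER)")
# Hypothetical data: EID 100001 is clean, the other two have conflicting MIDs.
conn.executemany("INSERT INTO table_name VALUES (?, ?)", [
    (100001, 7), (100001, 7),
    (200345, 1), (200345, 2), (200345, 1),
    (304334, 4), (304334, 5), (304334, 6),
])

query = """
SELECT EID, COUNT(DISTINCT MID) AS mid_count
FROM table_name
GROUP BY EID
HAVING COUNT(DISTINCT MID) > 1
ORDER BY EID
"""
suspect = conn.execute(query).fetchall()
print(suspect)
```

Only the EIDs with more than one distinct MID come back; the clean EID is filtered out by HAVING.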

What does GROUP BY on multiple columns mean?

I use Oracle 11g. I have read a lot of articles about this, but I don't understand
exactly how it happens in the database. Let's say we have two tables:
select * from Employee
select * from student
When we want to group by multiple columns:
SELECT SUBJECT, YEAR, Count(*)
FROM Student
GROUP BY SUBJECT, YEAR;
So my question is: what exactly happens in the database? Does count(*) run first on every column in the group by, and then the result gets sorted? Or something else? Can anyone explain it in detail?
SQL is a declarative language, not a procedural language.
What the query does is determine all rows in the original data where the group by keys are the same. It then reduces them to one row.
For example, in your data, these rows all have the same subject and year:
subject year name
English 1 Harsh
English 1 Pratik
English 1 Ramesh
You are saying to group by subject, year, so these become:
Subject Year Count(*)
English 1 3
Often, this aggregation is implemented using sorting. However, that is up to the database, and there are many other algorithms. You cannot assume that the database will sort the data. But if it is easier for you, you can think of the data as being sorted by the group by keys in order to identify the groups. One caution: the returned rows are not necessarily in any particular order (unless your query includes an order by).
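To make the "rows with the same keys collapse into one row" idea concrete, here is a small sketch using SQLite through Python's sqlite3 module rather than Oracle. The three English/1 rows come from the example above; the other rows and names are hypothetical. The Counter at the end mimics the grouping by hand.

```python
import sqlite3
from collections import Counter

students = [
    ("English", 1, "Harsh"), ("English", 1, "Pratik"), ("English", 1, "Ramesh"),
    ("English", 2, "Asha"),                        # hypothetical extra rows
    ("Maths", 1, "Ravi"), ("Maths", 1, "Meena"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (subject TEXT, year INTEGER, name TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", students)

# One output row per distinct (subject, year) pair; ORDER BY makes order explicit.
grouped = conn.execute("""
    SELECT subject, year, COUNT(*)
    FROM Student
    GROUP BY subject, year
    ORDER BY subject, year
""").fetchall()
print(grouped)

# The same result computed by hand: count rows per (subject, year) key.
by_hand = Counter((subject, year) for subject, year, _ in students)
manual = sorted((s, y, n) for (s, y), n in by_hand.items())
print(manual)
```

How the database finds the groups internally (sorting, hashing, or something else) does not change this result.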

Take the value of the first row met in non-aggregate expressions

I have a query like this:
SELECT PlayerID, COUNT(PlayerID) AS "MatchesPlayed", Name, Role, Team,
SUM(Goals) As "TotalGoals", SUM(Autogoals) As "TotalAutogoals",
SUM(...)-2*SUM(...)+2*SUM(...) AS Score, ...
FROM raw_ordered
GROUP BY PlayerID
ORDER BY Score DESC
where in raw_ordered each row describes the performance of some player in some match, in reverse chronological order.
Since I'm grouping by PlayerID what I get from this query is a table where each row provides the cumulative data about some player. Now, there's no problem with columns with aggregate functions; my problem is with the Team column.
A player may change team during a season; what I'm interested in here is the last Team he played with, so I'd like to have a way to tell SELECT to take the first value met in each group for the Team column (or, in general, for non-aggregate-function columns).
Unfortunately, I don't seem to find any (easy) way to do this in SQLite: the documentation of SELECT says:
If the expression is an aggregate expression, it is evaluated across all rows in the group. Otherwise, it is evaluated against a single arbitrarily chosen row from within the group.
with no suggestion about how to alter this behavior, and I can't find anything among the aggregate functions that just takes the first value it encounters.
Any idea?
SQLite does not have a 'first' aggregate function; you would have to implement it yourself.
However, the documentation is out of date. Since SQLite 3.7.11, if there is a MIN() or MAX(), the record from which that minimum/maximum value comes is guaranteed to be chosen.
Therefore, just add MAX(MatchDate) to the SELECT column list.
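A small sketch of that behavior, assuming a MatchDate column exists (the question does not show the real schema, so the columns and rows here are hypothetical). It needs SQLite 3.7.11 or later, which any recent Python bundles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE raw_ordered
                (PlayerID INTEGER, Team TEXT, Goals INTEGER, MatchDate TEXT)""")
# Hypothetical matches: player 1 moved from Red to Blue during the season.
conn.executemany("INSERT INTO raw_ordered VALUES (?, ?, ?, ?)", [
    (1, "Red", 1, "2012-01-10"),
    (1, "Blue", 0, "2012-03-05"),
    (1, "Red", 2, "2012-02-01"),
    (2, "Green", 3, "2012-01-15"),
    (2, "Green", 1, "2012-02-20"),
])

# With a single MAX() in the query, the bare column Team is taken from the
# row that holds the maximum MatchDate in each group.
latest = conn.execute("""
    SELECT PlayerID, Team, SUM(Goals) AS TotalGoals, MAX(MatchDate) AS LastMatch
    FROM raw_ordered
    GROUP BY PlayerID
    ORDER BY PlayerID
""").fetchall()
print(latest)
```

Note that this relies on SQLite-specific handling of bare columns; other databases reject or treat such a query differently.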
SELECT PlayerID, COUNT(PlayerID) AS "MatchesPlayed", Name, Role,
(SELECT r2.Team FROM raw_ordered AS r2 WHERE r2.PlayerID = raw_ordered.PlayerID ORDER BY r2.some_date DESC LIMIT 1) AS team,
SUM(Goals) As "TotalGoals", SUM(Autogoals) As "TotalAutogoals",
SUM(...)-2*SUM(...)+2*SUM(...) AS Score, ...
FROM raw_ordered
GROUP BY PlayerID
ORDER BY Score DESC
Presumably you have some way in your table to order the output such that you can use a subquery to achieve your goal.

SQL query: Using DISTINCT/UNIQUE and SUM() in one statement

THE PROBLEM
I have four game servers collecting data and putting it into a single table. Unfortunately, this causes four entries per stat for one player.
WHAT I WANT
I want one record per player, with SUM() applied to certain columns.
I will post a statement that I wrote which doesn't work but you'll get the point of what I would like to accomplish.
SELECT DISTINCT( Name ),
Max(Level),
Class,
Sum(Kills),
Sum(Deaths),
Sum(Points),
Sum(TotalTime),
Sum(TotalVisits),
CharacterID
FROM Characters
WHERE Name LIKE '$value%'
OR CharacterID LIKE '$value'
ORDER BY Name ASC;
Let me start by saying that having duplicate rows in your database is truly less than ideal. A fully normalized database makes data much easier to manipulate without random anomalies popping up.
However, to answer your question, this is simply what dweiss said put into code (using group by):
SELECT
name,
MAX(Level),
Class,
SUM(Kills),
SUM(Deaths),
SUM(Points),
SUM(TotalTime),
SUM(TotalVisits),
CharacterID
FROM
Characters
WHERE
Name LIKE '$value%' OR CharacterID LIKE '$value'
GROUP BY
name,
class,
characterid
ORDER BY
Name ASC
;
I'm assuming name, class, characterID, are all the same for each player, because that's the only way to get those values in there. Otherwise you'll have to use aggregate functions on them as well.
Instead of using distinct, you could group by your non-aggregated fields (ie name, class, characterid). This way you can use your aggregates: max, sum, and you will still have your distinct characters!
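Here is a reduced sketch of the group by approach on SQLite via Python's sqlite3 module. The table is cut down to a few columns and the rows are invented (four server rows for one character, two for another); the WHERE clause is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Characters
                (Name TEXT, Level INTEGER, Class TEXT,
                 Kills INTEGER, Deaths INTEGER, CharacterID INTEGER)""")
# Hypothetical data: each row is one server's record for a character.
conn.executemany("INSERT INTO Characters VALUES (?, ?, ?, ?, ?, ?)", [
    ("Aria", 10, "Mage", 5, 2, 101),
    ("Aria", 12, "Mage", 3, 1, 101),
    ("Aria", 11, "Mage", 0, 4, 101),
    ("Aria", 12, "Mage", 2, 0, 101),
    ("Borin", 7, "Warrior", 1, 1, 102),
    ("Borin", 8, "Warrior", 4, 2, 102),
])

# Group by the non-aggregated columns, aggregate everything else.
totals = conn.execute("""
    SELECT Name, MAX(Level), Class, SUM(Kills), SUM(Deaths), CharacterID
    FROM Characters
    GROUP BY Name, Class, CharacterID
    ORDER BY Name ASC
""").fetchall()
print(totals)
```

Each character ends up as a single row, with the per-server numbers summed and the highest level kept.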

JOIN on another table after GROUP BY and COUNT

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key name cityKey
1 Alice 1
2 Bob 2
3 Charles 2
4 David 1
Table: City
-------------
key name
1 Albany
2 Berkeley
3 Chico
I'd like to do a query on the Person table (with some WHERE clause) that returns
the number of matching people in each city
the key for the city
the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count cityKey cityName
2 1 Albany
2 2 Berkeley
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it so that a sub-select counts the persons per city key and is then joined to the City table for the name, but it's debatable whether that would be better. It's a matter of style and opinion, I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand; you've got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites and forums.
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
the required tables are joined
the composite dataset is filtered through the WHERE clause
the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
they are then filtered again, through the HAVING clause
finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
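The difference between the WHERE step and the HAVING step in that order can be seen in a short sketch, run on SQLite through Python's sqlite3 module with the Person rows from the question. The filter conditions are made up for illustration, and "key" is double-quoted only because it is a keyword in some databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Person ("key" INTEGER, name TEXT, cityKey INTEGER)')
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    (1, "Alice", 1), (2, "Bob", 2), (3, "Charles", 2), (4, "David", 1),
])

# WHERE removes individual rows before any grouping happens.
where_only = conn.execute("""
    SELECT cityKey, COUNT(*) AS n
    FROM Person
    WHERE name <> 'Alice'
    GROUP BY cityKey
    ORDER BY cityKey
""").fetchall()

# HAVING then removes whole groups, based on the aggregated value.
where_and_having = conn.execute("""
    SELECT cityKey, COUNT(*) AS n
    FROM Person
    WHERE name <> 'Alice'
    GROUP BY cityKey
    HAVING COUNT(*) > 1
    ORDER BY cityKey
""").fetchall()

print(where_only)
print(where_and_having)
```

City 1 survives the WHERE step with one remaining person, then is dropped by HAVING because its group count is no longer above 1.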
Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
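As a quick check that the three forms agree, here is a sketch that runs them on SQLite via Python's sqlite3 module using the Person and City rows from the question. The elided parts of the queries are filled with the LEFT JOIN from the question, which is an assumption; an ORDER BY on the second column is added so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Person ("key" INTEGER, name TEXT, cityKey INTEGER)')
conn.execute('CREATE TABLE City ("key" INTEGER, name TEXT)')
conn.executemany("INSERT INTO Person VALUES (?, ?, ?)", [
    (1, "Alice", 1), (2, "Bob", 2), (3, "Charles", 2), (4, "David", 1),
])
conn.executemany("INSERT INTO City VALUES (?, ?)", [
    (1, "Albany"), (2, "Berkeley"), (3, "Chico"),
])

# Shared FROM/JOIN part, assumed to match the question's query.
join = 'FROM Person LEFT JOIN City ON Person.cityKey = City."key"'

variants = [
    f'SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name) {join} '
    'GROUP BY Person.cityKey ORDER BY 2',
    f'SELECT COUNT(Person."key"), City."key", City.name {join} '
    'GROUP BY Person.cityKey, City."key", City.name ORDER BY 2',
    f'SELECT COUNT(Person."key"), City."key", MAX(City.name) {join} '
    'GROUP BY City."key" ORDER BY 2',
]
results = [conn.execute(sql).fetchall() for sql in variants]
for rows in results:
    print(rows)
```

All three return the same two rows here because each city key maps to exactly one city name.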
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)