SQL: Select distinct one column but include all fields

How can I select distinct values of one column (user) and still output the rest of the fields for each of those values?
Input:
user age country
--------------------------------
Tom 34 US
Tom 32 EN
Dick 29 MX
Dick 29 DE
Harry 15 CA
Desired output (one row per distinct user, picking one row for the rest of the fields, plus a count):
user age country count
--------------------------------------
Tom 34 US 2
Dick 29 MX 2
Harry 15 CA 1
Any help would be appreciated!

SELECT USER, AGE, MAX(COUNTRY), COUNT(*)
FROM yourTable
GROUP BY USER, AGE
You could change MAX to MIN as well; it just picks a different country. There's no need for DISTINCT here, because GROUP BY already produces one row per group.
You could normalize the values with a function such as SUBSTRING, but I'm not sure the rest of the data always follows a pattern like US vs. USS. If the values are longer than 2 or 3 characters, or if they start to differ beyond a specific character position, you may get wrong query results.
Update, based on the comments:
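SELECT USER, MAX(AGE), MAX(COUNTRY), COUNT(*)
FROM yourTable
GROUP BY USER;
-- Each MAX() is evaluated independently, so AGE and COUNTRY may come from different source rows.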
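Here is a minimal sketch for trying the updated query. It uses Python's built-in sqlite3 module and an in-memory copy of the sample data. The table name yourTable is a placeholder. The ORDER BY user is added only so the output order is deterministic. SQLite accepts user as a plain column name, but other databases may require quoting it.

```python
import sqlite3

SAMPLE = [
    ("Tom", 34, "US"),
    ("Tom", 32, "EN"),
    ("Dick", 29, "MX"),
    ("Dick", 29, "DE"),
    ("Harry", 15, "CA"),
]


def one_row_per_user():
    """Group by user, taking MAX() of the other columns plus a row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (user TEXT, age INTEGER, country TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", SAMPLE)
    rows = conn.execute(
        """
        SELECT user, MAX(age), MAX(country), COUNT(*)
        FROM yourTable
        GROUP BY user
        ORDER BY user
        """
    ).fetchall()
    conn.close()
    return rows


print(one_row_per_user())
# [('Dick', 29, 'MX', 2), ('Harry', 15, 'CA', 1), ('Tom', 34, 'US', 2)]
```

MAX(country) compares strings, so 'US' beats 'EN' and 'MX' beats 'DE'. With this data each output row happens to match a real input row, but that is not guaranteed in general.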

SELECT user, age, country, COUNT(*) AS c_rec FROM
(
SELECT DISTINCT user, age, SUBSTRING(country, 1, 2) AS country FROM yourTable
) T
GROUP BY user, age, country
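Sometimes all the other fields must come from one real row. A common alternative for that case (not one of the answers above) is to number each user's rows with ROW_NUMBER() and keep the first. This sketch makes two assumptions:

- It needs SQLite 3.25 or later, which added window functions.
- It uses insertion order (rowid) as a hypothetical tie-breaker for which row counts as "first".

```python
import sqlite3

SAMPLE = [
    ("Tom", 34, "US"),
    ("Tom", 32, "EN"),
    ("Dick", 29, "MX"),
    ("Dick", 29, "DE"),
    ("Harry", 15, "CA"),
]


def first_row_per_user():
    """Keep one complete row per user (the earliest inserted) plus that user's row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (user TEXT, age INTEGER, country TEXT)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", SAMPLE)
    rows = conn.execute(
        """
        SELECT user, age, country, cnt
        FROM (
            SELECT user, age, country,
                   -- 1 for the earliest-inserted row of each user
                   ROW_NUMBER() OVER (PARTITION BY user ORDER BY rowid) AS rn,
                   -- total number of rows for that user
                   COUNT(*) OVER (PARTITION BY user) AS cnt
            FROM yourTable
        ) AS t
        WHERE rn = 1
        ORDER BY user
        """
    ).fetchall()
    conn.close()
    return rows


print(first_row_per_user())
# [('Dick', 29, 'MX', 2), ('Harry', 15, 'CA', 1), ('Tom', 34, 'US', 2)]
```

Changing the window's ORDER BY (for example to age DESC) changes which row is kept, without mixing values from different rows.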

Related

How to return all names that appear multiple times in table

Suppose I have the following schema:
student(name, siblings)
The table stores names and siblings. Each name appears in as many rows as that person has siblings. For instance, the table could contain:
Jack, Lucy
Jack, Tim
Meaning that Jack has Lucy and Tim as his siblings.
I want to identify an SQL query that reports the names of all students who have 2 or more siblings. My attempt is the following:
select name
from student
where count(name) >= 1;
I'm not sure I'm using count correctly in this SQL query. Can someone please help with identifying the correct SQL query for this?
You're almost there:
select name
from student
group by name
having count(*) > 1;
HAVING is like a WHERE clause that runs after grouping is done, so in it you can use things the grouping makes available (like counts and other aggregates). Group on name and count, filtering for > 1 if you want two or more siblings. Don't use >= 1, because that would also include students with only one sibling. That gives you the names you want.
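One way to check the GROUP BY / HAVING query is to run it against an in-memory SQLite database from Python. The rows below are the six-row jack/jane/john example used later in this thread, and the ORDER BY is added only for a stable result.

```python
import sqlite3

ROWS = [
    ("jack", "lucy"), ("jack", "tim"),
    ("jane", "bill"), ("jane", "fred"), ("jane", "tom"),
    ("john", "dave"),
]


def names_with_several_siblings():
    """Names that occur in more than one row, i.e. students with 2+ siblings."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", ROWS)
    query = """
        SELECT name
        FROM student
        GROUP BY name
        HAVING COUNT(*) > 1
        ORDER BY name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(names_with_several_siblings())  # ['jack', 'jane']
```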
This will just deliver "Jack" as a single result (in the example data from the question). If you then want all the detail, like who Jack's siblings are, you can join your grouped, filtered list of names back to the table:
select *
from
student
INNER JOIN
(
select name
from student
group by name
having count(*) > 1
) morethanone ON morethanone.name = student.name
You can't avoid this "joining back", because grouping throws the detail away in order to create the groups. The only way to get the detail back is to take the list of names the grouping gave you and use it to filter the original detail data again.
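The join-back query can be sketched in SQLite via Python's sqlite3 module. This version selects student.* rather than *, so the name column from the derived table isn't returned a second time. The ORDER BY is only for a deterministic result.

```python
import sqlite3

ROWS = [
    ("jack", "lucy"), ("jack", "tim"),
    ("jane", "bill"), ("jane", "fred"), ("jane", "tom"),
    ("john", "dave"),
]


def siblings_of_repeated_names():
    """Detail rows for every name that appears more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", ROWS)
    rows = conn.execute(
        """
        SELECT student.*
        FROM student
        INNER JOIN (
            -- the grouped, filtered list of names
            SELECT name
            FROM student
            GROUP BY name
            HAVING COUNT(*) > 1
        ) AS morethanone ON morethanone.name = student.name
        ORDER BY student.name, student.siblings
        """
    ).fetchall()
    conn.close()
    return rows


print(siblings_of_repeated_names())
# [('jack', 'lucy'), ('jack', 'tim'), ('jane', 'bill'), ('jane', 'fred'), ('jane', 'tom')]
```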
Full disclosure: it's a bit of a lie to say "can't avoid doing this". SQL Server (like most modern databases) supports window functions, which effectively perform a grouping in the background and join it back to the detail. Such a query would look like:
select student.*, count(*) over(partition by name) n
from student
And for a table like this:
jack, lucy
jack, tim
jane, bill
jane, fred
jane, tom
john, dave
It would produce:
jack, lucy, 2
jack, tim, 2
jane, bill, 3
jane, fred, 3
jane, tom, 3
john, dave, 1
The jack rows get n = 2 because there are two jack rows; there are 3 jane rows and 1 john row. You could then wrap all that in a subquery and filter for n > 1, which removes john:
select *
from
(
select student.*, count(*) over(partition by name) n
from student
) x
where x.n > 1
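The window-function version can be reproduced in SQLite, which supports COUNT(*) OVER (PARTITION BY ...) from version 3.25. This sketch runs both the unfiltered query and the filtered one on the six-row example. The ORDER BY clauses are added only for stable output.

```python
import sqlite3

ROWS = [
    ("jack", "lucy"), ("jack", "tim"),
    ("jane", "bill"), ("jane", "fred"), ("jane", "tom"),
    ("john", "dave"),
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", ROWS)
    return conn


def with_counts(conn):
    """Every detail row plus n, the number of rows that share its name."""
    return conn.execute(
        """
        SELECT student.*, COUNT(*) OVER (PARTITION BY name) AS n
        FROM student
        ORDER BY name, siblings
        """
    ).fetchall()


def repeated_only(conn):
    """Same rows, keeping only names that appear more than once."""
    return conn.execute(
        """
        SELECT *
        FROM (
            SELECT student.*, COUNT(*) OVER (PARTITION BY name) AS n
            FROM student
        ) AS x
        WHERE x.n > 1
        ORDER BY x.name, x.siblings
        """
    ).fetchall()


conn = make_db()
print(with_counts(conn))
# [('jack', 'lucy', 2), ('jack', 'tim', 2), ('jane', 'bill', 3), ('jane', 'fred', 3), ('jane', 'tom', 3), ('john', 'dave', 1)]
print(repeated_only(conn))
# [('jack', 'lucy', 2), ('jack', 'tim', 2), ('jane', 'bill', 3), ('jane', 'fred', 3), ('jane', 'tom', 3)]
```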
If SQL Server didn't have window functions, it would look more like:
select *
from
student
INNER JOIN
(
select name, count(*) as n
from student
group by name
) x ON x.name = student.name
COUNT(*) OVER (PARTITION BY name) is like a mini "group by name and return the count, then automatically join back to the detail rows using name as the key", i.e. a shorthand for the latter query.
You can do:
select distinct s1.name
from student as s1
where exists (
select 1
from student as s2
where s1.name = s2.name and s1.siblings != s2.siblings
)
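An EXISTS test is evaluated once per outer row, so a student with several siblings matches once for each of their rows. This SQLite sketch runs the EXISTS query with and without DISTINCT to show the difference. The parameter name distinct is just an illustration switch.

```python
import sqlite3

ROWS = [
    ("jack", "lucy"), ("jack", "tim"),
    ("jane", "bill"), ("jane", "fred"), ("jane", "tom"),
    ("john", "dave"),
]


def names_via_exists(distinct):
    """Names with another row of the same name but a different sibling."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, siblings TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", ROWS)
    select = "SELECT DISTINCT s1.name" if distinct else "SELECT s1.name"
    query = select + """
        FROM student AS s1
        WHERE EXISTS (
            SELECT 1  -- the select list inside EXISTS is irrelevant
            FROM student AS s2
            WHERE s1.name = s2.name AND s1.siblings <> s2.siblings
        )
        ORDER BY s1.name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(names_via_exists(distinct=True))   # ['jack', 'jane']
print(names_via_exists(distinct=False))  # ['jack', 'jack', 'jane', 'jane', 'jane']
```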
I think the best approach is the one Caius Jard described. If you also want to see how many siblings each name has, you can select the count as well:
SELECT name, COUNT(*) AS Occurrences
FROM student
GROUP BY name
HAVING (COUNT(*) > 1)
I wanted to share another solution I came up with:
select distinct s1.name
from student s1, student s2
where s1.name = s2.name and s1.siblings != s2.siblings;

SQLite query to get table based on values of another table

I'm not sure what title best reflects my question, so I'll just describe what I want.
There is a table with fields:
id, name, city
It contains the following rows:
1 John London
2 Mary Paris
3 John Paris
4 Samy London
I want to get a result like this:
        London  Paris
Total   2       2
John    1       1
Mary    0       1
Samy    1       0
So I need to take every unique name and count how many rows it has for each unique value of another field (city).
I also want the total count for each city.
A simple way to do it is:
1) Get a list of unique names:
SELECT DISTINCT name FROM mytable
2) Get a list of unique cities:
SELECT DISTINCT city FROM mytable
3) Run a query for every name and city:
SELECT COUNT(*) FROM mytable WHERE name = some_name AND city = some_city
4) Get the total for each city:
SELECT COUNT(*) FROM mytable WHERE city = some_city
(I didn't test these queries, so there may be errors; they're only meant to show the idea.)
With 3 names and 2 cities that's 3 * 2 = 6 queries to the DB, but with 100 names and 100 cities it's 100 * 100 = 10,000 queries, which may take a long time.
Also, the names and cities change, so I can't write a query with predefined names or cities. Every day there are new ones, so instead of London and Paris it may be Moscow, Turin and Berlin. The same goes for names.
How can I get such a table with one or two queries against the original table in SQLite?
(I'm using SQLite on Android.)
You can get the per-name results with conditional aggregation. As for the total row, unfortunately SQLite does not support the WITH ROLLUP clause, which would generate it automatically.
One workaround is union all and an additional column for ordering:
select name, london, paris
from (
select name, sum(city = 'London') london, sum(city = 'Paris') paris, 1 prio
from mytable
group by name
union all
select 'Total', sum(city = 'London'), sum(city = 'Paris'), 0
from mytable
) t
order by prio, name
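The UNION ALL query above can be run as-is against SQLite from Python, using the question's four sample rows. In SQLite a comparison such as city = 'London' evaluates to 1 or 0, which is why sum() works as a conditional count. The prio column puts the Total row first.

```python
import sqlite3


def city_counts():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?, ?)",
        [(1, "John", "London"), (2, "Mary", "Paris"), (3, "John", "Paris"), (4, "Samy", "London")],
    )
    rows = conn.execute(
        """
        select name, london, paris
        from (
            -- one row per name; each comparison yields 1 or 0
            select name, sum(city = 'London') london, sum(city = 'Paris') paris, 1 prio
            from mytable
            group by name
            union all
            -- the totals row, given a lower prio so it sorts first
            select 'Total', sum(city = 'London'), sum(city = 'Paris'), 0
            from mytable
        ) t
        order by prio, name
        """
    ).fetchall()
    conn.close()
    return rows


print(city_counts())
# [('Total', 2, 2), ('John', 1, 1), ('Mary', 0, 1), ('Samy', 1, 0)]
```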
Actually the subquery might not be necessary:
select name, sum(city = 'London') london, sum(city = 'Paris') paris, 1 prio
from mytable
group by name
union all
select 'Total', sum(city = 'London'), sum(city = 'Paris'), 0
from mytable
order by prio, name
@GMB gave me the idea of using GROUP BY. Since I'm doing this with SQLite on Android, my query looks like this:
SELECT name,
COUNT(CASE WHEN city = :london THEN 1 END) as countLondon,
COUNT(CASE WHEN city = :paris THEN 1 END) as countParis
FROM table2 GROUP BY name
where :london and :paris are bound parameters, and countLondon and countParis map to fields of the response class.
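Because the set of cities changes daily, one option is to build the conditional-aggregation query at run time from SELECT DISTINCT city. This is a Python/sqlite3 sketch rather than Android code, and pivot_counts and quote_ident are hypothetical helper names. It uses three statements: the distinct-city lookup, the per-name query and a totals query. City values are bound as parameters, but column aliases are identifiers, so they are quoted instead.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier by wrapping it in double quotes and doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_counts(conn):
    """Return (header, per-name rows, per-city totals), with one count column per distinct city."""
    cities = [row[0] for row in conn.execute("SELECT DISTINCT city FROM mytable ORDER BY city")]
    if not cities:
        return ["name"], [], ()
    # One conditional count per city, in the same order as the bound parameters.
    cols = ", ".join(
        f"COUNT(CASE WHEN city = ? THEN 1 END) AS {quote_ident(city)}" for city in cities
    )
    cur = conn.execute(f"SELECT name, {cols} FROM mytable GROUP BY name ORDER BY name", cities)
    header = [desc[0] for desc in cur.description]
    rows = cur.fetchall()
    # Same columns without GROUP BY: a single row of totals per city.
    totals = conn.execute(f"SELECT {cols} FROM mytable", cities).fetchone()
    return header, rows, totals


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "John", "London"), (2, "Mary", "Paris"), (3, "John", "Paris"), (4, "Samy", "London")],
)
header, rows, totals = pivot_counts(conn)
print(header)  # ['name', 'London', 'Paris']
print(rows)    # [('John', 1, 1), ('Mary', 0, 1), ('Samy', 1, 0)]
print(totals)  # (2, 2)
```

When a new city such as Berlin appears in the data, the next call simply gains a Berlin column; no query text has to be edited by hand.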

How to find people in a database who live in the same cities?

I'm new to SQL and need help with what seems like an easy question, but I get confused when I think it through.
I have the following table:
ID NAME CITY
---------------------
1 John new york
2 Sam new york
3 Tom boston
4 Bob boston
5 Jan chicago
6 Ted san francisco
7 Kat boston
I want a query that returns all the people who live in a city that another person registered in the database also lives in.
The answer, for the table I showed above, would be:
ID NAME CITY
---------------------
1 John new york
2 Sam new york
3 Tom boston
4 Bob boston
7 Kat boston
This is really a two part question:
What cities have more than one user located in them?
What users live in that subset of cities?
Let's answer it in two parts. Let's also make the simplifying assumption (not stated in your question) that the Users table has only one entry per user per city.
To find cities with more than one user:
SELECT City FROM Users GROUP BY City HAVING COUNT(*) > 1
Now, let's find all the users for those cities:
SELECT ID, Name, City FROM Users
WHERE City IN (SELECT City FROM Users GROUP BY CITY HAVING COUNT(*) > 1)
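Here is a minimal SQLite sketch of the two-part approach: the inner GROUP BY/HAVING finds the shared cities, and the outer IN keeps the users living in them. The table name Users follows the answer, the rows are the question's sample data, and the ORDER BY ID is added only for stable output.

```python
import sqlite3

PEOPLE = [
    (1, "John", "new york"), (2, "Sam", "new york"),
    (3, "Tom", "boston"), (4, "Bob", "boston"),
    (5, "Jan", "chicago"), (6, "Ted", "san francisco"),
    (7, "Kat", "boston"),
]


def shared_city_ids():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", PEOPLE)
    query = """
        SELECT ID, Name, City FROM Users
        WHERE City IN (SELECT City FROM Users GROUP BY City HAVING COUNT(*) > 1)
        ORDER BY ID
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(shared_city_ids())  # [1, 2, 3, 4, 7]
```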
I would use EXISTS:
SELECT t.*
FROM persons t
WHERE EXISTS (SELECT 1 FROM persons t1 WHERE t1.city = t.city AND t1.name <> t.name);
To avoid a correlated subquery, which can lead to a nested loop, you could perform a self join against the grouped cities:
SELECT id, name, city
FROM persons
JOIN (SELECT city
FROM persons
GROUP BY city HAVING count(*) > 1) AS cities
USING (city);
This might be the most performant solution.
This returns the rows whose city appears more than once:
SELECT persons.*
FROM persons
WHERE (SELECT COUNT(*) FROM persons AS p WHERE p.CITY = persons.CITY) > 1
This is just a different flavor of what others have posted:
SELECT ID,
name,
city
FROM (SELECT DISTINCT
ID,
name,
city,
COUNT(1) OVER (PARTITION BY city) AS cityCount
FROM persons) t
WHERE cityCount > 1
This can be expressed many ways. Here is one possible way:
select * from persons p
where exists (
select 1 from persons p2
where p2.city = p.city and p2.name <> p.name
)
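Because these answers are meant to be equivalent, one way to check them is to run each against the same data and compare the IDs. This SQLite sketch assumes a persons(id, name, city) table holding the question's rows. Window functions need SQLite 3.25+. The EXISTS variant compares names, so it assumes no two people in the same city share a name.

```python
import sqlite3

PEOPLE = [
    (1, "John", "new york"), (2, "Sam", "new york"),
    (3, "Tom", "boston"), (4, "Bob", "boston"),
    (5, "Jan", "chicago"), (6, "Ted", "san francisco"),
    (7, "Kat", "boston"),
]

QUERIES = {
    "exists": """
        SELECT p.id FROM persons p
        WHERE EXISTS (SELECT 1 FROM persons p2 WHERE p2.city = p.city AND p2.name <> p.name)
    """,
    "join_using": """
        SELECT id FROM persons
        JOIN (SELECT city FROM persons GROUP BY city HAVING COUNT(*) > 1) AS cities
        USING (city)
    """,
    "correlated_count": """
        SELECT persons.id FROM persons
        WHERE (SELECT COUNT(*) FROM persons AS p WHERE p.city = persons.city) > 1
    """,
    "window": """
        SELECT id FROM (
            SELECT id, COUNT(*) OVER (PARTITION BY city) AS cityCount FROM persons
        ) AS t
        WHERE cityCount > 1
    """,
}


def run_all():
    """Run every query and return {label: sorted list of ids}."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?)", PEOPLE)
    results = {
        label: sorted(row[0] for row in conn.execute(sql))
        for label, sql in QUERIES.items()
    }
    conn.close()
    return results


for label, ids in run_all().items():
    print(label, ids)  # each prints [1, 2, 3, 4, 7]
```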

Hive - How to combine a group by over columns A and B and a distinct over column C

I need to create a query that selects, from a particular table, the users who have more than one distinct email. To distinguish users, I group them by two fields: name and age. Let's see this with an example.
So I have a table like this:
name age email phone
----------------------------------
Andy 20 Andy#du 1234
Berni 21 Berni#du 2345
Carol 22 Carol#du 3456
Andy 20 Andy#du 4321
Berni 21 Berni#et 2345
Dody 28 Dodi#du 7869
Carol 22 Carol#pt 3456
What I want to get is:
Berni 21 Berni#du, Berni#et
Carol 22 Carol#du, Carol#pt
Note that Andy also appears twice in the table, but with the same email (only the phone number differs). Because of him I need a DISTINCT over email, so that only users with two different emails are selected.
This query solves the issue and gives the desired result:
select * from
(
select aux.name,
aux.age,
concat_ws(',',collect_set(email)) as email
FROM
(select a.name, a.age, a.email
FROM TestUsers a
RIGHT JOIN
(select name,
age
FROM TestUsers
GROUP BY
name,
age
having count(*) > 1
)b
ON a.name = b.name
AND a.age = b.age
)aux
GROUP BY aux.name,
aux.age
)tr
where locate(",",tr.email) > 0;
But I'm sure there must be a more efficient way than checking whether the email field contains a comma (which means more than one email).
Does anyone have a better approach?
If I understand correctly, you should be able to do this using a having clause:
select tu.name, tu.age,
concat_ws(',', collect_list(tu.email)) as emails
from (select distinct tu.name, tu.age, tu.email
from TestUsers tu
) tu
group by tu.name, tu.age
having count(*) > 1;
Actually, because collect_set() removes duplicates, this should work without a subquery:
select tu.name, tu.age,
concat_ws(',', collect_set(tu.email)) as emails
from testusers tu
group by tu.name, tu.age
having min(tu.email) <> max(tu.email);
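Hive is hard to run inline, so this sketch uses SQLite as a stand-in, and it is an analogy rather than Hive code. GROUP_CONCAT(DISTINCT email) plays the role of concat_ws(',', collect_set(email)), and the HAVING MIN(email) <> MAX(email) test is the same idea as the last query. GROUP_CONCAT does not guarantee an order, so the sketch returns each email list as a set.

```python
import sqlite3

USERS = [
    ("Andy", 20, "Andy#du", 1234),
    ("Berni", 21, "Berni#du", 2345),
    ("Carol", 22, "Carol#du", 3456),
    ("Andy", 20, "Andy#du", 4321),
    ("Berni", 21, "Berni#et", 2345),
    ("Dody", 28, "Dodi#du", 7869),
    ("Carol", 22, "Carol#pt", 3456),
]


def users_with_several_emails():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TestUsers (name TEXT, age INTEGER, email TEXT, phone INTEGER)")
    conn.executemany("INSERT INTO TestUsers VALUES (?, ?, ?, ?)", USERS)
    rows = conn.execute(
        """
        SELECT name, age, GROUP_CONCAT(DISTINCT email) AS emails
        FROM TestUsers
        GROUP BY name, age
        HAVING MIN(email) <> MAX(email)  -- at least two different emails
        ORDER BY name
        """
    ).fetchall()
    conn.close()
    # Concatenation order is unspecified, so compare the emails as sets.
    return [(name, age, set(emails.split(","))) for name, age, emails in rows]


print(users_with_several_emails())
```

Andy is excluded because both of his rows carry the same email, so MIN and MAX are equal.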

SQL: Join with complete participation

Suppose that I have a table Person(Name, Hobby) and there are 3 hobbies in total. The table's values look like this:
Amy | Stamp Collection
Kevin | Mountain Biking
Kevin | Stamp Collection
Ron | Mountain Biking
Here, Kevin has both the hobbies Mountain Biking and Stamp Collection. I need to write a query to retrieve Kevin.
How can I get the person who has all the hobbies?
Thanks
SELECT Name
FROM Person
GROUP BY Name
HAVING COUNT(DISTINCT Hobby) = (SELECT COUNT(DISTINCT Hobby) FROM Person)
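This "has every hobby" check is a form of relational division. The SQLite sketch below compares COUNT(*) with COUNT(DISTINCT Hobby) in the HAVING clause. WITH_DUPLICATE is a hypothetical variant of the sample that lists Amy's hobby twice, to show why counting distinct hobbies is the safer choice if duplicate (Name, Hobby) rows can occur.

```python
import sqlite3

SAMPLE = [
    ("Amy", "Stamp Collection"),
    ("Kevin", "Mountain Biking"),
    ("Kevin", "Stamp Collection"),
    ("Ron", "Mountain Biking"),
]
# Hypothetical extra row: Amy listed twice with the same hobby.
WITH_DUPLICATE = SAMPLE + [("Amy", "Stamp Collection")]


def people_with_all_hobbies(rows, distinct):
    """Names whose hobby count equals the number of distinct hobbies in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Name TEXT, Hobby TEXT)")
    conn.executemany("INSERT INTO Person VALUES (?, ?)", rows)
    counted = "COUNT(DISTINCT Hobby)" if distinct else "COUNT(*)"
    query = f"""
        SELECT Name
        FROM Person
        GROUP BY Name
        HAVING {counted} = (SELECT COUNT(DISTINCT Hobby) FROM Person)
        ORDER BY Name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


print(people_with_all_hobbies(SAMPLE, distinct=False))          # ['Kevin']
print(people_with_all_hobbies(WITH_DUPLICATE, distinct=False))  # ['Amy', 'Kevin']
print(people_with_all_hobbies(WITH_DUPLICATE, distinct=True))   # ['Kevin']
```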
Note: not tested, but it should be correct in Oracle SQL.
You can try this:
SELECT *
FROM
(
SELECT p.name,
count(distinct p.hobby) cnt
FROM Person p
GROUP BY p.name
) p2
WHERE p2.cnt = (SELECT count(distinct Hobby)
FROM Person)
If you need the name and hobby count of the person with the most hobbies, use this:
SELECT count(*) as count, Name FROM Person group by Name order by count(*) desc limit 1
This gets you that person's name and the number of hobbies they have. To list all of a person's hobbies, add the Hobby field to the query and remove the grouping and the limit:
SELECT Name, Hobby FROM Person where Name = {name}
(replace {name}, curly braces included, with the person's name as a quoted string, e.g. 'Kevin')