SQL "GROUP BY": same PersonID, different PersonNames. Eliminate duplicates

I have a (rather dirty) data source (Excel) that looks like this:
ID  | Name                  | Subject   | Grade
123 | Smith, Joe R.         | MATH      | 2.0
123 | Smith, Joe Rodriguez  | FRENCH    | 3.0
234 | Doe, Mary Jane D.     | BIOLOGY   | 2.5
234 | Doe, Mary Jane Dawson | CHEMISTRY | 2.5
234 | Doe, Mary Jane        | FRENCH    | 3.5
My application's output should look like this:
Smith, Joe R.
123
MATH | 2.0
FRENCH | 3.0
So basically I want to run a query (just for the ID/Person parent 'container') along the lines of:
SELECT DISTINCT ID, Name FROM MyTable
or
SELECT ID, Name FROM MyTable GROUP BY ID
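Of course, neither of these works: the DISTINCT query still returns one row per distinct ID/Name pair, and most databases reject the GROUP BY query because Name is neither grouped nor aggregated.
I would like to 'combine' rows with the same ID and ignore/truncate the other records that have the same ID but a different Name (we all know they're the same person, since ID is our identifier; it's clearly just a typo or dirty data).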
Can this be done by a single SELECT query?

If you don't really care which value shows up in the name field, use MAX() or MIN():
SELECT ID,
MAX(Name) AS Name
FROM [YourTable]
GROUP BY ID
Here's a working example to play with: https://data.stackexchange.com/stackoverflow/q/116699/
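A minimal sketch of the MAX() approach, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the asker's Excel/SQL source. The table name MyTable and the sample rows come from the question; using SQLite is an assumption.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# One row per ID; MAX() deterministically picks one spelling of Name
# (the largest string under SQLite's default binary collation).
people = conn.execute(
    "SELECT ID, MAX(Name) AS Name FROM MyTable GROUP BY ID ORDER BY ID"
).fetchall()

for person_id, name in people:
    print(person_id, name)
```

With this data, MAX() keeps "Smith, Joe Rodriguez" and "Doe, Mary Jane Dawson"; swapping in MIN() would keep the shortest prefix spellings instead.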

You can take the MIN or MAX value of Name:
SELECT ID, Max(Name)
FROM MyTable
GROUP BY ID

SELECT A.ID, A.NAME, T.Subject, T.Grade
FROM (SELECT ID, MIN(NAME) AS NAME
FROM MyTable
GROUP BY ID) A
LEFT JOIN MyTable T on A.ID = T.ID
This will give you something like:
123 | Smith, Joe R.  | MATH      | 2.0
123 | Smith, Joe R.  | FRENCH    | 3.0
234 | Doe, Mary Jane | BIOLOGY   | 2.5
234 | Doe, Mary Jane | CHEMISTRY | 2.5
234 | Doe, Mary Jane | FRENCH    | 3.5
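The joined rows can then be folded into the nested layout shown at the top of the question (name, ID, then one line per subject). This is a sketch assuming SQLite as the database and Python as the application language, neither of which the question specifies; `render` is a hypothetical helper, and `ORDER BY ... T.rowid` relies on SQLite's implicit rowid to keep the original row order.

```python
import sqlite3
from itertools import groupby

# Assumption: SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# Derived table A picks one canonical name per ID; the join brings back every subject row.
rows = conn.execute(
    """
    SELECT A.ID, A.Name, T.Subject, T.Grade
    FROM (SELECT ID, MIN(Name) AS Name FROM MyTable GROUP BY ID) AS A
    LEFT JOIN MyTable AS T ON A.ID = T.ID
    ORDER BY A.ID, T.rowid
    """
).fetchall()


def render(rows):
    """Hypothetical helper: name, ID, then one 'SUBJECT | grade' line per subject."""
    lines = []
    # Rows are sorted by ID, so groupby sees each person's rows contiguously.
    for (person_id, name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        lines.append(name)
        lines.append(str(person_id))
        for _, _, subject, grade in group:
            lines.append(f"{subject} | {grade}")
    return "\n".join(lines)


report = render(rows)
print(report)
```

Keeping the grouping in application code means the SQL stays a plain join; the database only has to decide which name represents each ID.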

If you don't care which name you keep, you can use a MAX() or MIN() aggregate to pick just one name:
SELECT ID, MAX(Name) as Name
FROM MyTable GROUP BY ID

Related

How to query: "for which do these values apply"?

I'm trying to match and align data; in other words, count occurrences and then list the values for which those occurrences occur.
Or, in a question: "How many times does each ID value occur, and for what names?"
For example, with this input
Name ID
-------------
jim 123
jim 234
jim 345
john 123
john 345
jane 234
jane 345
jan 45678
I want the output to be:
count ID name name name
------------------------------------
3 345 jim john jane
2 123 jim john
2 234 jim jane
1 45678 jan
Or similarly, the input could be (note that the ID values are not aligned):
jim   john  jane  jan
----------------------------
123   345   234   45678
234   123   345
345
but that seems to complicate things.
The closest I have gotten to the desired result in SQL is:
select ID, count(ID) as count
from table
group by ID
order by count desc
which outputs
ID count
------------
345 3
123 2
234 2
45678 1
I'd appreciate any help.
You seem to want a pivot. In SQL, you have to specify the number of columns in advance (unless you construct the query as a string).
But the idea is:
select ID, count(*) as cnt,
max(case when seqnum = 1 then name end) as name_1,
max(case when seqnum = 2 then name end) as name_2,
max(case when seqnum = 3 then name end) as name_3
from (select t.*,
row_number() over (partition by id order by id) as seqnum -- arbitrary ordering
from table t
) t
group by ID
order by cnt desc;
If you have an unknown number of columns, you can aggregate the values into an array instead (array_agg with an inner order by, as written here, is PostgreSQL syntax):
select ID, count(*) as cnt,
array_agg(name order by name) as names
from table t
group by ID
order by cnt desc
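A runnable sketch of both ideas, assuming SQLite 3.25 or newer (for window functions) via Python's sqlite3 module. The table name `people` is hypothetical, since `table` is a reserved word. SQLite has no array_agg, so group_concat() is swapped in for the variable-width version; its element order is not guaranteed. The pivot numbers names by `name` rather than by `id` so the result is deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; "people" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, id INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("jim", 123), ("jim", 234), ("jim", 345), ("john", 123),
     ("john", 345), ("jane", 234), ("jane", 345), ("jan", 45678)],
)

# Fixed-width pivot: number the names within each id, then spread them into columns.
pivot = conn.execute(
    """
    SELECT id, COUNT(*) AS cnt,
           MAX(CASE WHEN seqnum = 1 THEN name END) AS name_1,
           MAX(CASE WHEN seqnum = 2 THEN name END) AS name_2,
           MAX(CASE WHEN seqnum = 3 THEN name END) AS name_3
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seqnum
          FROM people AS p) AS t
    GROUP BY id
    ORDER BY cnt DESC, id
    """
).fetchall()

# Variable width: group_concat() builds a comma-separated string instead of an array.
lists = conn.execute(
    "SELECT id, COUNT(*) AS cnt, group_concat(name, ',') AS names "
    "FROM people GROUP BY id ORDER BY cnt DESC, id"
).fetchall()

for row in pivot:
    print(row)
```

The pivot needs one CASE column per possible name, which is why it only fits when the maximum count is known in advance.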
The query would look similar to this, if that's what you're looking for:
SELECT
name,
id,
COUNT(id) as count
FROM
dataSet
WHERE
dataSet.name = 'input'
AND dataSet.id = 'input'
GROUP BY
name,
id

SQL Server convert row values to columns

I have an SQL table like this:
Name1 Name2 Department1 Department2 Location1 Location2
----------------------------------------------------------------------
Jhon Alex IT Marketing London Seattle
Mark Dan Sales R&D Paris Tokyo
How can I query these results in this format:
Name Department Location
---------------------------------------
Jhon IT London
Alex Marketing Seattle
Mark Sales Paris
Dan R&D Tokyo
Use CROSS APPLY:
SELECT cc.name, cc.department, cc.location
FROM t
CROSS APPLY
(
    VALUES (t.name1, t.department1, t.location1),
           (t.name2, t.department2, t.location2)
) AS cc (name, department, location);
OUTPUT:
name department location
Jhon IT London
Alex Marketing Seattle
Mark Sales Paris
Dan R&D Tokyo
You could try to use SQL Server's UNPIVOT operator, but honestly a plain union query might even perform better:
SELECT Name1 AS Name, Department1 AS Department, Location1 AS Location FROM yourTable
UNION ALL
SELECT Name2, Department2, Location2 FROM yourTable;
Regarding your expected ordering: your original table has no id column recording which name pair each row belongs to, so what I have written above might be the best we can do here.
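A sketch of the UNION ALL unpivot, assuming SQLite through Python's sqlite3 module (SQLite has no CROSS APPLY or UNPIVOT, but UNION ALL works the same way). To address the ordering concern, it assumes SQLite's implicit rowid can stand in for the missing id column and adds a constant slot number per branch; `row_id` and `slot` are hypothetical helper columns.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; rowid stands in for a real id column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (Name1 TEXT, Name2 TEXT, Department1 TEXT, "
    "Department2 TEXT, Location1 TEXT, Location2 TEXT)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Jhon", "Alex", "IT", "Marketing", "London", "Seattle"),
        ("Mark", "Dan", "Sales", "R&D", "Paris", "Tokyo"),
    ],
)

# Each UNION ALL branch emits one (Name, Department, Location) triple per source row;
# ordering by (row_id, slot) keeps each pair together in its original order.
rows = conn.execute(
    """
    SELECT rowid AS row_id, 1 AS slot,
           Name1 AS Name, Department1 AS Department, Location1 AS Location
    FROM yourTable
    UNION ALL
    SELECT rowid, 2, Name2, Department2, Location2
    FROM yourTable
    ORDER BY row_id, slot
    """
).fetchall()

# Drop the helper columns before displaying.
unpivoted = [(name, dept, loc) for _, _, name, dept, loc in rows]
for row in unpivoted:
    print(row)
```

In SQL Server, an IDENTITY column on the source table could play the role that rowid plays here.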
Try this:
DECLARE @TestDemo TABLE (
    Name1 VARCHAR(10), Name2 VARCHAR(10),
    Department1 VARCHAR(10), Department2 VARCHAR(10),
    Location1 VARCHAR(10), Location2 VARCHAR(10)
)
INSERT INTO @TestDemo VALUES('Jhon','Alex','IT','Marketing','London','Seattle')
INSERT INTO @TestDemo VALUES('Mark','Dan','Sales','R&D','Paris','Tokyo')
SELECT Name1 AS Name, Department1 AS Department, Location1 AS Location FROM @TestDemo
UNION ALL
SELECT Name2 AS Name, Department2 AS Department, Location2 AS Location FROM @TestDemo

SQL Server group by? [duplicate]

This question already has answers here:
Retrieving last record in each group from database - SQL Server 2005/2008
(2 answers)
Closed 4 years ago.
I'm not sure how to word my question, so perhaps an example would be best. I'm looking for a function or statement that would produce the following result from a single table: for each name, return the row with the largest id.
ID NAME ADDRESS
1 JOHN DOE 123 FAKE ST.
2 JOHN DOE 321 MAIN ST.
3 JOHN DOE 333 2ND AVE.
4 MARY JANE 222 1ST. AVE
5 MARY JANE 444 POPLAR ST.
6 SUZY JO 999 8TH AVE.
DESIRED RESULT
3 JOHN DOE 333 2ND AVE.
5 MARY JANE 444 POPLAR ST.
6 SUZY JO 999 8TH AVE.
One option is to use the row_number window function, which assigns a sequential number to each row of the result set. The OVER clause defines the grouping and ordering: here you partition by (group on) the name field and order by the id field descending. Finally, filter on rn = 1, which keeps the row with the largest id in each group.
select *
from (
select *, row_number() over (partition by name order by id desc) rn
from yourtable
) t
where rn = 1
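A runnable sketch of the same query, assuming SQLite 3.25 or newer (for window functions) through Python's sqlite3 module; the table name yourtable and the rows come from the question. The outer ORDER BY id is added only to make the output order predictable.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        (1, "JOHN DOE", "123 FAKE ST."),
        (2, "JOHN DOE", "321 MAIN ST."),
        (3, "JOHN DOE", "333 2ND AVE."),
        (4, "MARY JANE", "222 1ST. AVE"),
        (5, "MARY JANE", "444 POPLAR ST."),
        (6, "SUZY JO", "999 8TH AVE."),
    ],
)

# rn = 1 marks the highest id within each name partition.
latest = conn.execute(
    """
    SELECT id, name, address
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
          FROM yourtable) AS t
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

for row in latest:
    print(row)
```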

SQL statements without group by

If I wanted to find all values in a table that occur more than twice without using group by, how would I do that? I understand how to do this with group by and was curious how to do it without group by (EDIT: could you do this with join?).
For example, if I had last names in a certain zip code and wanted to find the last names that occur more than twice, how would I do this without GROUP BY in SQL statements?
I tried
select name, count(*) from population order by name asc having count(*) > 2;
but that doesn't do what I want it to. Any suggestions?
Since this is tagged only as sql, a general solution seems to be wanted. Since the SQL:2003 revision, it is fair to say that this can be solved with window functions:
SELECT name FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) rn,
name
FROM population
) s
WHERE rn = 3
See a sample fiddle here.
Anyway, the fact that this can be solved without GROUP BY doesn't mean that it should be :)
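A runnable sketch of the window-function approach, assuming SQLite 3.25 or newer via Python's sqlite3 module. The rows are borrowed from the sample data in the correlated-subquery answer, with the column renamed to `name` to match this query. Any name that reaches row number 3 occurs more than twice, and each such name is returned exactly once.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; sample rows reused from the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (id INTEGER, zip TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        (101, "12345", "John"), (102, "12345", "John"), (103, "12345", "John"),
        (104, "12345", "Ram"), (105, "12345", "Kelly"), (106, "12345", "Kelly"),
        (107, "45678", "Krishna"), (108, "45678", "Krishna"), (109, "45678", "Krishna"),
        (110, "45678", "David"), (111, "45678", "David"),
    ],
)

# No GROUP BY: number the rows within each name and keep the third one, if any.
frequent = [
    row[0]
    for row in conn.execute(
        """
        SELECT name FROM (
            SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS rn, name
            FROM population
        ) AS s
        WHERE rn = 3
        ORDER BY name
        """
    )
]
print(frequent)
```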
A correlated subquery might work for you; please check.
Assuming the data set below:
Id Zip Lastname
--- ----- --------
101 12345 John
102 12345 John
103 12345 John
104 12345 Ram
105 12345 Kelly
106 12345 Kelly
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna
110 45678 David
111 45678 David
Query
select * from test.population pop1
where 2 < (select count(*) from test.population pop2
where pop1.Lastname = pop2.Lastname and pop1.Zip = pop2.Zip)
The output of the above query is:
Id Zip Lastname
--- ------ --------
101 12345 John
102 12345 John
103 12345 John
107 45678 Krishna
108 45678 Krishna
109 45678 Krishna
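A runnable check of the correlated subquery, assuming SQLite via Python's sqlite3 module. The `test.` schema prefix is dropped because an in-memory SQLite database has no `test` schema, and the zip column is named Zip to match the data set above.

```python
import sqlite3

# Assumption: SQLite stands in for the original database; schema prefix omitted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (Id INTEGER, Zip TEXT, Lastname TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        (101, "12345", "John"), (102, "12345", "John"), (103, "12345", "John"),
        (104, "12345", "Ram"), (105, "12345", "Kelly"), (106, "12345", "Kelly"),
        (107, "45678", "Krishna"), (108, "45678", "Krishna"), (109, "45678", "Krishna"),
        (110, "45678", "David"), (111, "45678", "David"),
    ],
)

# For each row, count the rows sharing its Lastname and Zip; keep it if that count exceeds 2.
result = conn.execute(
    """
    SELECT * FROM population AS pop1
    WHERE 2 < (SELECT COUNT(*) FROM population AS pop2
               WHERE pop1.Lastname = pop2.Lastname AND pop1.Zip = pop2.Zip)
    ORDER BY Id
    """
).fetchall()

for row in result:
    print(row)
```

Unlike the ROW_NUMBER() version, this returns every qualifying row rather than one row per name, at the cost of re-running the inner count for each outer row.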

SQL: Add counters in select

I have a table which contains names:
Name
----
John Smith
John Smith
Sam Wood
George Wright
John Smith
Sam Wood
I want to create a select statement which shows this:
Name
'John Smith 1'
'John Smith 2'
'Sam Wood 1'
'George Wright 1'
'John Smith 3'
'Sam Wood 2'
In other words, I want to add a separate counter for each name. Is there a way to do it without using cursors?
Use ROW_NUMBER():
SELECT Name, ROW_NUMBER() OVER(Partition BY Name ORDER BY Name) as [Rank]
FROM MyTable
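A sketch that produces the exact strings from the question, assuming SQLite 3.25 or newer via Python's sqlite3 module. The table has no ordering column, so SQLite's implicit rowid (insertion order) is assumed to stand in for one. It decides which duplicate gets 1, 2, 3, and the final ORDER BY keeps the original row order. `rid`, `rn`, and `Label` are hypothetical column names, and `||` is SQLite's string concatenation operator.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's database; rowid acts as the ordering column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Name TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?)",
    [(n,) for n in ["John Smith", "John Smith", "Sam Wood",
                    "George Wright", "John Smith", "Sam Wood"]],
)

# Number each name's occurrences in insertion order, then append the counter to the name.
labels = [
    row[0]
    for row in conn.execute(
        """
        SELECT Name || ' ' || rn AS Label
        FROM (SELECT rowid AS rid, Name,
                     ROW_NUMBER() OVER (PARTITION BY Name ORDER BY rowid) AS rn
              FROM MyTable)
        ORDER BY rid
        """
    )
]
print(labels)
```

Without some column that records the original order, the numbering among identical names is arbitrary, which is why the query above orders by rowid instead of by Name.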
Doing:
select name, count(*) as total from table group by name;
will get you something that looks like this:
name          | total
--------------+------
John Smith    | 3
Sam Wood      | 2
George Wright | 1
This isn't really what you wanted, though. ROW_NUMBER(), as ck pointed out, is what you want, but not all databases support it: MySQL before version 8.0 doesn't, for example. If you're using an older MySQL, this might help:
ROW_NUMBER() in MySQL