TSQL, counting pairs of values in a table

Given a table in the format of
ID Forename Surname
1 John Doe
2 Jane Doe
3 Bob Smith
4 John Doe
How would you go about getting the following output?
Forename Surname Count
John Doe 2
Jane Doe 1
Bob Smith 1
For a single column I would just use COUNT, but I'm unsure how to apply it to multiple columns.

SELECT Forename, Surname, COUNT(*) FROM YourTable GROUP BY Forename, Surname

I think this should work:
SELECT Forename, Surname, COUNT(1) AS Num
FROM YourTable
GROUP BY Forename, Surname
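A minimal sketch to check the two-column GROUP BY locally, assuming Python's built-in sqlite3 module and an in-memory table named YourTable (the placeholder name used in the answers) loaded with the sample rows. SQLite stands in for SQL Server here only so the query can be run; the GROUP BY syntax is the same in T-SQL.

```python
import sqlite3


def count_name_pairs():
    """Count how many rows share each (Forename, Surname) combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (ID INTEGER, Forename TEXT, Surname TEXT)")
    conn.executemany(
        "INSERT INTO YourTable VALUES (?, ?, ?)",
        [(1, "John", "Doe"), (2, "Jane", "Doe"), (3, "Bob", "Smith"), (4, "John", "Doe")],
    )
    # Listing both columns in GROUP BY makes each distinct name pair one group;
    # COUNT(*) then counts the rows that fall into that group.
    rows = conn.execute(
        "SELECT Forename, Surname, COUNT(*) AS Num "
        "FROM YourTable "
        "GROUP BY Forename, Surname "
        "ORDER BY Num DESC, Forename"
    ).fetchall()
    conn.close()
    return rows


for row in count_name_pairs():
    print(row)
```

The ORDER BY is only there to make the output order predictable; without it, the order of groups is not guaranteed.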


How do you return a specific row per group by some criteria in BigQuery?

I have a table of people with firstname, surname, and age. I would like to retrieve the oldest person in each family (by surname). I don't want to return just the surname and age (via MAX(age) and GROUP BY surname); I want the entire row.
For example if my data is:
firstname, surname, age
john, smith, 31
sally, smith, 33
bob, smith, 34
john, wayne, 35
bob, wayne, 31
I would like my query to return:
firstname, surname, age
bob, smith, 34
john, wayne, 35
Consider the approach below:
select surname, array_agg(struct(firstname, age) order by age desc limit 1)[offset(0)].*
from your_table
group by surname
If applied to the sample data in your question, the output is one row per surname: smith, bob, 34 and wayne, john, 35.
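SQLite has no ARRAY_AGG with STRUCT, so the sketch below swaps in a different technique, ROW_NUMBER() over a partition, to return the whole row of the oldest person per surname. It assumes Python's sqlite3 module backed by SQLite 3.25 or later (needed for window functions) and a hypothetical table name, people; it is not the BigQuery answer itself, just a portable way to check the expected result.

```python
import sqlite3


def oldest_per_surname():
    """Return the full row of the oldest person for each surname."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (firstname TEXT, surname TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [("john", "smith", 31), ("sally", "smith", 33), ("bob", "smith", 34),
         ("john", "wayne", 35), ("bob", "wayne", 31)],
    )
    # Number the rows inside each surname from oldest to youngest and keep rank 1.
    rows = conn.execute(
        """
        SELECT firstname, surname, age
        FROM (
            SELECT firstname, surname, age,
                   ROW_NUMBER() OVER (PARTITION BY surname ORDER BY age DESC) AS rn
            FROM people
        )
        WHERE rn = 1
        ORDER BY surname
        """
    ).fetchall()
    conn.close()
    return rows


print(oldest_per_surname())
```

Like the ARRAY_AGG(... LIMIT 1) answer, ROW_NUMBER keeps exactly one row per surname, so if two people tie for oldest only one of them is returned.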

How to return a count of duplicates and unique values into a column in Access

I currently have this table:
I'm looking for a query that finds duplicate last names and returns the count in a new column, excluding any null/unique values as in the table below, or alternatively returns Yes/No in the third column.
When I try to enter the query into the Access database, I keep getting run-time error 3141.
The code that I tried in order to get the first option is:
SELECT first_name, last_name, COUNT (last_name) AS Duplicates
FROM table
GROUP BY last_name, first_name
HAVING COUNT(last_name)=>0
You can use a correlated subquery. But I would recommend showing 1 rather than 0 for unique last names:
select t.*,
(select count(*)
from t as t2
where t2.last_name = t.last_name
) as Duplicates
from t;
If you really want zero instead of 1, then one method is:
select t.*,
(select iif(count(*) = 1, 0, count(*))
from t as t2
where t2.last_name = t.last_name
) as Duplicates
from t;
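A minimal sketch of the correlated-subquery idea, assuming Python's sqlite3 module. SQLite stands in for Access, so CASE WHEN is used in place of Access's IIf; the rows in SAMPLE are hypothetical, because the question's table is not reproduced in the text.

```python
import sqlite3

# Hypothetical sample rows (the question's original table is not available).
SAMPLE = [("Ann", "Lee"), ("Bob", "Lee"), ("Cal", "Diaz"), ("Dee", None)]


def last_name_counts(rows, zero_for_unique=False):
    """Attach to every row the number of rows that share its last name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (first_name TEXT, last_name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # CASE plays the role of Access's IIf: report 0 instead of 1 for unique names.
    count_expr = (
        "CASE WHEN COUNT(*) = 1 THEN 0 ELSE COUNT(*) END" if zero_for_unique else "COUNT(*)"
    )
    # Correlated subquery: the inner query is evaluated once per outer row of t.
    sql = (
        "SELECT t.first_name, t.last_name, "
        f"(SELECT {count_expr} FROM t AS t2 WHERE t2.last_name = t.last_name) AS Duplicates "
        "FROM t"
    )
    result = {first: (last, dup) for first, last, dup in conn.execute(sql)}
    conn.close()
    return result


print(last_name_counts(SAMPLE))
print(last_name_counts(SAMPLE, zero_for_unique=True))
```

Rows with a NULL last name get 0, because NULL = NULL is never true in SQL, so the subquery matches nothing for them.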

Alias scoping in SQL

I'm having an issue with a complex query on an SQLite3 database that I think has to do with a misunderstanding on my part of how to refer to columns in a results table returned by a select statement, especially when aliases are involved.
Here is an example table - a list of movie IDs with a row for each actor working on the movie:
CREATE TABLE movie_actor (imdb_id TEXT, actor TEXT);
INSERT INTO movie_actor VALUES('44r4', 'John Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jane Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jacob Doe');
INSERT INTO movie_actor VALUES('55r5', 'John Doe');
INSERT INTO movie_actor VALUES('55r5', 'Jane Doe');
INSERT INTO movie_actor VALUES('55r5', 'Nathan Deer');
INSERT INTO movie_actor VALUES('66r6', 'Bob Duck');
INSERT INTO movie_actor VALUES('66r6', 'John Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jane Doe');
INSERT INTO movie_actor VALUES('77r7', 'John Doe');
I am trying to find out how many times each pair of actors worked with each other across all movies. I decided to go about this with a self-join, but ran into issues where I would get record pairs such as "John Doe, Jane Doe, 3" and "Jane Doe, John Doe, 3". This is really the same thing, and I wanted to count only the first version. This is the code that resulted:
SELECT
CASE WHEN d.actor_1 > d.actor_2 THEN d.actor_1 ELSE d.actor_2 END d.actor_1,
CASE WHEN d.actor_2 > d.actor_1 THEN d.actor_2 ELSE d.actor_1 END d.actor_2,
FROM (
SELECT c.actor_1 AS actor_1, c.actor_2 AS actor_2, COUNT(*) AS v
FROM (
SELECT a.actor AS actor_1, b.actor AS actor_2
FROM movie_actor a JOIN movie_actor b ON a.imdb_id=b.imdb_id
) AS c
WHERE c.actor_1 <> c.actor_2
GROUP BY c.actor_1, c.actor_2
) AS d
This doesn't run, but I can't figure out why. My assumption is that I am not using aliases properly, but I really don't know. Any ideas?
We get a simpler query if we add the condition a.actor < b.actor. This excludes pairs of identical actors and, at the same time, removes the need to swap actors.
SELECT
    a.actor AS actor_1, b.actor AS actor_2, COUNT(*) AS v
FROM
    movie_actor a
    INNER JOIN movie_actor b
        ON a.imdb_id = b.imdb_id
WHERE
    a.actor < b.actor
GROUP BY a.actor, b.actor
ORDER BY COUNT(*) DESC, a.actor, b.actor
Note: a join conceptually produces every combination of rows that satisfies the join condition. Therefore, for movie 55r5 (which has 3 actors), the self-join first generates the following 3 x 3 = 9 pairs:
John Doe John Doe
John Doe Jane Doe
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Jane Doe
Jane Doe Nathan Deer
Nathan Deer John Doe
Nathan Deer Jane Doe
Nathan Deer Nathan Deer
Then the WHERE clause excludes every pair where a.actor >= b.actor, and we get:
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Nathan Deer
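The two steps above (all 9 join pairs, then the a.actor < b.actor filter) can be checked with a small sketch, assuming Python's sqlite3 module and only the three 55r5 rows from the question's movie_actor table. The helper name self_join_pairs is hypothetical.

```python
import sqlite3


def self_join_pairs(rows):
    """Return (all self-join pairs, pairs kept by a.actor < b.actor)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie_actor (imdb_id TEXT, actor TEXT)")
    conn.executemany("INSERT INTO movie_actor VALUES (?, ?)", rows)
    join = (
        "SELECT a.actor, b.actor FROM movie_actor a "
        "JOIN movie_actor b ON a.imdb_id = b.imdb_id"
    )
    all_pairs = conn.execute(join).fetchall()
    # Keeping only a.actor < b.actor drops self-pairs and one of each mirrored pair.
    kept = conn.execute(join + " WHERE a.actor < b.actor ORDER BY a.actor, b.actor").fetchall()
    conn.close()
    return all_pairs, kept


movie_55r5 = [("55r5", "John Doe"), ("55r5", "Jane Doe"), ("55r5", "Nathan Deer")]
all_pairs, kept = self_join_pairs(movie_55r5)
print(len(all_pairs))  # 9
print(kept)
```

The comparison is a plain string comparison, so "Jane Doe" < "John Doe" because "a" sorts before "o".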
Generate the distinct pairs first, then count them.
select actor_1, actor_2, count(*)
from (select distinct a.imdb_id, a.actor as actor_1, b.actor as actor_2
from movie_actor a
inner join movie_actor b on a.imdb_id = b.imdb_id
where a.actor < b.actor) x
group by actor_1, actor_2
order by actor_1, actor_2;
actor_1       actor_2       count(*)
------------  ------------  --------
Bob Duck      Jane Doe      1
Bob Duck      Jermaine Doe  1
Bob Duck      John Doe      1
Jacob Doe     Jane Doe      1
Jacob Doe     Jermaine Doe  1
Jacob Doe     John Doe      1
Jane Doe      Jermaine Doe  2
Jane Doe      John Doe      3
Jane Doe      Nathan Deer   1
Jermaine Doe  John Doe      2
John Doe      Nathan Deer   1
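A sketch that runs the distinct-pairs query above against the full sample data, assuming Python's sqlite3 module. The extra ("44r4", "John Doe") row used at the end is hypothetical; it shows why the DISTINCT in the inner query matters when an actor is listed twice for the same movie.

```python
import sqlite3

MOVIE_ACTOR = [
    ("44r4", "John Doe"), ("44r4", "Jane Doe"), ("44r4", "Jermaine Doe"), ("44r4", "Jacob Doe"),
    ("55r5", "John Doe"), ("55r5", "Jane Doe"), ("55r5", "Nathan Deer"),
    ("66r6", "Bob Duck"), ("66r6", "John Doe"), ("66r6", "Jermaine Doe"), ("66r6", "Jane Doe"),
    ("77r7", "John Doe"),
]

PAIR_QUERY = """
select actor_1, actor_2, count(*)
from (select distinct a.imdb_id, a.actor as actor_1, b.actor as actor_2
      from movie_actor a
      inner join movie_actor b on a.imdb_id = b.imdb_id
      where a.actor < b.actor) x
group by actor_1, actor_2
order by actor_1, actor_2
"""


def pair_counts(rows):
    """Map (actor_1, actor_2) to the number of distinct movies they share."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movie_actor (imdb_id TEXT, actor TEXT)")
    conn.executemany("INSERT INTO movie_actor VALUES (?, ?)", rows)
    result = {(a1, a2): n for a1, a2, n in conn.execute(PAIR_QUERY)}
    conn.close()
    return result


counts = pair_counts(MOVIE_ACTOR)
print(counts[("Jane Doe", "John Doe")])  # 3
# A hypothetical duplicate listing leaves the counts unchanged thanks to DISTINCT.
print(pair_counts(MOVIE_ACTOR + [("44r4", "John Doe")]) == counts)  # True
```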

selecting a row using MIN or ROWNUM

I have an Oracle table, similar to the one below, which stores people's last name, first name and age. People with the same last name belong to the same family.
LastName FirstName Age
1 miller charls 20
2 miller john 30
3 anderson peter 45
4 Bates andy 50
5 anderson gary 60
6 williams mark 15
I need to write an Oracle SQL query to select the youngest person from each family. The output should select rows 1, 3, 4 and 6.
How do I do this?
Another way, a bit shorter:
select lastname
, max(firstname) keep(dense_rank first order by age) as first_name
, max(age) keep(dense_rank first order by age) as age
from your_table_name
group by lastname
order by lastname
LASTNAME FIRST_NAME AGE
-------- ---------- ----------
Bates andy 50
anderson peter 45
miller charls 20
williams mark 15
DENSE_RANK() is a ranking function that generates sequential numbers, and tied rows get the same number. I prefer DENSE_RANK() here because a family can have twins, etc.
SELECT Lastname, FirstName, Age
FROM (
SELECT Lastname, FirstName, Age,
DENSE_RANK() OVER (PARTITION BY Lastname ORDER BY Age) rn
FROM tableName
) a
WHERE a.rn = 1
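A sketch of the DENSE_RANK approach, assuming Python's sqlite3 module backed by SQLite 3.25 or later (for window functions) and a hypothetical table name, family. The extra twin row ("williams", "lucy", 15) is hypothetical and only illustrates how ties are handled. SQLite's default BINARY collation sorts "Bates" before "anderson", which matches the order in the Oracle output above.

```python
import sqlite3

FAMILY = [
    ("miller", "charls", 20), ("miller", "john", 30), ("anderson", "peter", 45),
    ("Bates", "andy", 50), ("anderson", "gary", 60), ("williams", "mark", 15),
]


def youngest_per_family(rows):
    """Return every person whose age ranks first (youngest) within their last name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE family (LastName TEXT, FirstName TEXT, Age INTEGER)")
    conn.executemany("INSERT INTO family VALUES (?, ?, ?)", rows)
    # DENSE_RANK gives tied ages the same rank, so twins both get rn = 1.
    result = conn.execute(
        """
        SELECT LastName, FirstName, Age
        FROM (
            SELECT LastName, FirstName, Age,
                   DENSE_RANK() OVER (PARTITION BY LastName ORDER BY Age) AS rn
            FROM family
        ) a
        WHERE a.rn = 1
        ORDER BY LastName, FirstName
        """
    ).fetchall()
    conn.close()
    return result


print(youngest_per_family(FAMILY))
# With a hypothetical twin added, both williams rows are returned.
print(youngest_per_family(FAMILY + [("williams", "lucy", 15)]))
```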
With standard SQL I would do it like this:
select *
from family f1
where (
select count(*)
from family f2
where f2.lastname = f1.lastname
and f2.age <= f1.age) <= 1
order by lastname;
This query lets you pick the x youngest or oldest members of a family: change f2.age <= f1.age to f2.age >= f1.age to rank by oldest instead, and change <= 1 to e.g. <= 10 to get the 10 youngest (or oldest) in each family.
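A sketch of the correlated-count query with the limit passed as a parameter, assuming Python's sqlite3 module and the sample rows from the question (the helper name youngest_n and the twin row are hypothetical). It also checks one behaviour worth knowing about: ties inflate the count.

```python
import sqlite3

FAMILY = [
    ("miller", "charls", 20), ("miller", "john", 30), ("anderson", "peter", 45),
    ("Bates", "andy", 50), ("anderson", "gary", 60), ("williams", "mark", 15),
]


def youngest_n(rows, n=1):
    """Keep rows that have at most n family members (themselves included) aged <= them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE family (lastname TEXT, firstname TEXT, age INTEGER)")
    conn.executemany("INSERT INTO family VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT *
        FROM family f1
        WHERE (SELECT COUNT(*)
               FROM family f2
               WHERE f2.lastname = f1.lastname
                 AND f2.age <= f1.age) <= ?
        ORDER BY lastname, age
        """,
        (n,),
    ).fetchall()
    conn.close()
    return result


print(youngest_n(FAMILY))        # youngest per family
print(youngest_n(FAMILY, n=2))   # two youngest per family
print(youngest_n(FAMILY + [("williams", "lucy", 15)]))  # twins: no williams row
```

Unlike the DENSE_RANK version, this query drops tied rows: if two family members share the youngest age, each of them sees a count of 2, so with <= 1 neither is returned.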

SQL GROUP BY one field and list latest value of two other fields at the same time

Consider this data:
id firstname lastname registration_date
101126423 foo bar 2010-06-17 13:31:00.000
101126423 foo bar 2010-06-17 13:31:00.000
101126423 foo bar jr 2010-06-18 12:13:00.000
101152718 john doe 2010-02-26 19:08:00.000
101152718 john doe 2010-02-26 19:08:00.000
101152718 john doe 2010-02-26 19:08:00.000
You can have customers with the same id but a different firstname/lastname! I want to get all distinct ids, but with the latest firstname/lastname (based on registration_date).
For my example I would get:
id firstname lastname
101126423 foo bar jr
101152718 john doe
So far I got:
SELECT DISTINCT id, firstname, lastname
FROM member
but it's obviously not working... I've tried other queries with no success so far. Maybe HAVING can help me, but I have never used it...
I use SQL Server 2008 in this project.
A couple options for you:
Option 1:
;with cte as(
select id, max(registration_date) lastReg
from member
group by id
)
select distinct m.id, m.firstname, m.lastname
from member m
join cte c on m.id=c.id
and m.registration_date = c.lastReg
Option 2:
;with cte as(
select id, firstname, lastname,
row_number() over(partition by id order by registration_date desc) as rn
from member
)
select id, firstname, lastname
from cte
where rn = 1
The biggest difference between the two, with regard to their results, is how they handle the case where the most recent registration time is duplicated for an id with multiple names. In this case, Option 1 will return both names that have the latest registration date, while Option 2 will return only one of them, and which one is not deterministic. An example of this case is (a slight tweak of your sample data):
id firstname lastname registration_date
101126423 foo bar 2010-06-17 13:31:00.000
101126423 foo bar 2010-06-18 12:13:00.000
101126423 foo bar jr 2010-06-18 12:13:00.000
101152718 john doe 2010-02-26 19:08:00.000
101152718 john doe 2010-02-26 19:08:00.000
101152718 john doe 2010-02-26 19:08:00.000
--Option 1 result:
id firstname lastname
101126423 foo bar
101126423 foo bar jr
101152718 john doe
--Option 2 result (possibility 1):
id firstname lastname
101126423 foo bar
101152718 john doe
--Option 2 result (possibility 2):
id firstname lastname
101126423 foo bar jr
101152718 john doe
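A sketch that runs both options against the tweaked data, assuming Python's sqlite3 module backed by SQLite 3.25 or later (for ROW_NUMBER). SQLite does not need T-SQL's leading semicolon before WITH, so it is dropped here. Dates are stored as ISO-style text, which sorts chronologically as plain strings; the helper name run_member_query is hypothetical.

```python
import sqlite3

OPTION_1 = """
WITH cte AS (
    SELECT id, MAX(registration_date) AS lastReg
    FROM member
    GROUP BY id
)
SELECT DISTINCT m.id, m.firstname, m.lastname
FROM member m
JOIN cte c ON m.id = c.id AND m.registration_date = c.lastReg
ORDER BY m.id, m.lastname
"""

OPTION_2 = """
WITH cte AS (
    SELECT id, firstname, lastname,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY registration_date DESC) AS rn
    FROM member
)
SELECT id, firstname, lastname
FROM cte
WHERE rn = 1
ORDER BY id
"""

TWEAKED = [
    (101126423, "foo", "bar", "2010-06-17 13:31:00.000"),
    (101126423, "foo", "bar", "2010-06-18 12:13:00.000"),
    (101126423, "foo", "bar jr", "2010-06-18 12:13:00.000"),
    (101152718, "john", "doe", "2010-02-26 19:08:00.000"),
    (101152718, "john", "doe", "2010-02-26 19:08:00.000"),
    (101152718, "john", "doe", "2010-02-26 19:08:00.000"),
]


def run_member_query(query, rows):
    """Load rows into an in-memory member table and run the given query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE member (id INTEGER, firstname TEXT, lastname TEXT, registration_date TEXT)"
    )
    conn.executemany("INSERT INTO member VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run_member_query(OPTION_1, TWEAKED))  # both tied names for 101126423
print(run_member_query(OPTION_2, TWEAKED))  # only one of them
```

If a deterministic pick is needed for Option 2, a tiebreaker column can be appended to the window's ORDER BY.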