Alias scoping in SQL - sql

I'm having an issue with a complex query on an SQLite3 database that I think has to do with a misunderstanding on my part of how to refer to columns in a results table returned by a select statement, especially when aliases are involved.
Here is an example table - a list of movie IDs with a row for each actor working on the movie:
CREATE TABLE movie_actor (imdb_id TEXT, actor TEXT);
INSERT INTO movie_actor VALUES('44r4', 'John Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jane Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('44r4', 'Jacob Doe');
INSERT INTO movie_actor VALUES('55r5', 'John Doe');
INSERT INTO movie_actor VALUES('55r5', 'Jane Doe');
INSERT INTO movie_actor VALUES('55r5', 'Nathan Deer');
INSERT INTO movie_actor VALUES('66r6', 'Bob Duck');
INSERT INTO movie_actor VALUES('66r6', 'John Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jermaine Doe');
INSERT INTO movie_actor VALUES('66r6', 'Jane Doe');
INSERT INTO movie_actor VALUES('77r7', 'John Doe');
I am trying to find out the how many times each pair of actors worked with each other across all movies. I decided to go about this with a self-join, but ran into issues where I would get record pairs such as "John Doe, Jane Doe, 3" and "Jane Doe, John Doe, 3" - this is really the same thing, and I wanted to only count the first version. This is the code that resulted:
SELECT DISTINCT
CASE WHEN d.actor_1 > d.actor_2 THEN d.actor_1 ELSE d.actor_2 END d.actor_1,
CASE WHEN d.actor_2 > d.actor_1 THEN d.actor_2 ELSE d.actor_1 END d.actor_2,
d.v
FROM (
SELECT c.actor_1 AS actor_1, c.actor_2 AS actor_2, COUNT(*) AS v
FROM (
SELECT a.actor AS actor_1, b.actor AS actor_2
FROM movie_actor a JOIN movie_actor b ON a.imdb_id=b.imdb_id
) AS c
WHERE c.actor_1 <> c.actor_2
GROUP BY c.actor_1, c.actor_2
HAVING COUNT(*) > 2
ORDER BY COUNT(*) DESC
LIMIT 20
)
AS d
This doesn't run, but I can't figure out why. My assumption is that I am not using aliases properly, but I really don't know. Any ideas?
(SQL Fiddle link here)

We get a simpler query, if we add the condition a.actor < b.actor. This excludes pairs with equal actors and at the same time it removed the need of swapping actors.
SELECT
a.actor AS actor_1, b.actor AS actor_2, COUNT(*) AS v
FROM
movie_actor a
INNER JOIN movie_actor b
ON a.imdb_id = b.imdb_id
WHERE
a.actor < b.actor
GROUP BY a.actor, b.actor
ORDER BY COUNT(*) DESC, a.actor, b.actor
LIMIT 20
Note: SQL always creates a cross product when joining, i.e. it creates all possible combinations of records that match the join condition. Therefore for imdb 55r5 (including 3 actors) it will first generate the following 3 x 3 = 9 pairs:
John Doe John Doe
John Doe Jane Doe
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Jane Doe
Jane Doe Nathan Deer
Nathan Deer John Doe
Nathan Deer Jane Doe
Nathan Deer Nathan Deer
Then the WHERE-clause excludes all a >= b pairs and we get
John Doe Nathan Deer
Jane Doe John Doe
Jane Doe Nathan Deer

Generate the distinct pairs first, then count them.
select actor_1, actor_2, count(*)
from (select distinct a.imdb_id, a.actor as actor_1, b.actor as actor_2
from movie_actor a
inner join movie_actor b on a.imdb_id = b.imdb_id
where a.actor < b.actor) x
group by actor_1, actor_2
order by actor_1, actor_2;
actor_1 actor_2 count(*)
---------- ---------- ----------
Bob Duck Jane Doe 1
Bob Duck Jermaine D 1
Bob Duck John Doe 1
Jacob Doe Jane Doe 1
Jacob Doe Jermaine D 1
Jacob Doe John Doe 1
Jane Doe Jermaine D 2
Jane Doe John Doe 3
Jane Doe Nathan Dee 1
Jermaine D John Doe 2
John Doe Nathan Dee 1

Related

How to return a count of duplicates and unique values into a column in Access

I currently have this table:
First_Name
Last_Name
Jane
Doe
John
Smith
Bob
Smith
Alice
Smith
And I'm looking to get the table to look for duplicates in the last name and return a value into a new column and exclude any null/unique values like the table below, or return a Yes/No into the third column.
First_Name
Last_Name
Duplicates
Jane
Doe
0
John
Smith
3
Bob
Smith
3
Alice
Smith
3
OR
First_Name
Last_Name
Duplicates
Jane
Doe
No
John
Smith
Yes
Bob
Smith
Yes
Alice
Smith
Yes
When I'm trying to enter the query into the Access Database, I keep getting the run-time 3141 error.
The code that I tried in order to get the first option is:
SELECT first_name, last_name, COUNT (last_name) AS Duplicates
FROM table
GROUP BY last_name, first_name
HAVING COUNT(last_name)=>0
You can use a subquery. But I would recommend 1 instead of 0:
select t.*,
(select count(*)
from t as t2
where t2.last_name = t.last_name
)
from t;
If you really want zero instead of 1, then one method is:
select t.*,
(select iif(count(*) = 1, 0, count(*))
from t as t2
where t2.last_name = t.last_name
)
from t;

SQL Query: How to select multiple instances of a single item without collapsing into a group?

I'm trying to do with following with an SQL query in Impala. I've got a single data table that has (among other things) two columns with values that intersect multiple times. For example, let's say we have a table with two columns for related names and phone numbers:
Names Phone Numbers
John Smith (123) 456-7890
Rob Johnson (123) 456-7890
Greg Jackson (123) 456-7890
Tom Green (123) 456-7890
Jack Mathis (123) 456-7890
John Smith (234) 567-8901
Rob Johnson (234) 567-8901
Joe Wolf (234) 567-8901
Mike Thomas (234) 567-8901
Jim Moore (234) 567-8901
John Smith (345) 678-9012
Rob Johnson (345) 678-9012
Toby Ellis (345) 678-9012
Sam Wharton (345) 678-9012
Bob Thompson (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (456) 789-0123
Kelly Howe (456) 789-0123
Hank Rehms (456) 789-0123
Jim Fellows (456) 789-0123
What I need to get from this table is a selection of each item from the Name column that has multiple entries from the Phone Numbers column associated with it, like this:
Names Phone Numbers
John Smith (123) 456-7890
John Smith (234) 567-8901
John Smith (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (123) 456-7890
Rob Johnson (234) 567-8901
Rob Johnson (345) 678-9012
Rob Johnson (456) 789-0123
This is the query I've got so far, but it's not quite giving me the results I'm looking for:
SELECT a.name, a.phone_number, b.phone_number, b.count1
FROM databasename a
INNER JOIN (
SELECT phone_number, COUNT(phone_number) as count1
FROM databasename
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number;
Any ideas on how to improve my query to get the results I'm looking for?
Thank you.
Working with your query...
This generates a subset by name of users having more than 1 phone number it then joins back to the entire set based on name returning all phone numbers for users having more than 1 phone number. however if a user has the same phone number listed more than once it would get returned. to eliminate those if needed, add distinct to the count in the inline view.
SELECT a.name, a.phone_number
FROM databasename a
INNER JOIN (
SELECT name, COUNT(phone_number) as count1
FROM databasename
GROUP BY name
having COUNT(phone_number) > 1
) b
on a.name = b.name
Order by a.name, a.phone_Number
One method is to use exists:
select t.*
from tablename t
where exists (select 1 from tablename t2 where t2.name = t.name and t2.phonenumber <> t.phonenumber)
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.name = x.name
AND y.phone <> x.phone;

How do I transpose multiple rows to columns in SQL

My first time reading a question on here.
I am working at a university and I have a table of student IDs and their supervisors, some of the students have one supervisor and some have two or three depending on their subject.
The table looks like this
ID Supervisor
1 John Doe
2 Peter Jones
2 Sarah Jones
3 Peter Jones
3 Sarah Jones
4 Stephen Davies
4 Peter Jones
4 Sarah Jones
5 John Doe
I want to create a view that turns that into this:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe
2 Peter Jones Sarah Jones
3 Peter Jones Sarah Jones
4 Stephen Davies Peter Jones Sarah Jones
5 John Doe
I have looked at PIVOT functions, but don't think it matches my needs.
Any help is greatly appreciated.
PIVOT was the right clue, it only needs a little 'extra' :)
DECLARE #tt TABLE (ID INT,Supervisor VARCHAR(128));
INSERT INTO #tt(ID,Supervisor)
VALUES
(1,'John Doe'),
(2,'Peter Jones'),
(2,'Sarah Jones'),
(3,'Peter Jones'),
(3,'Sarah Jones'),
(4,'Stephen Davies'),
(4,'Peter Jones'),
(4,'Sarah Jones'),
(5,'John Doe');
SELECT
*
FROM
(
SELECT
ID,
'Supervisor ' + CAST(ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Supervisor) AS VARCHAR(128)) AS supervisor_id,
Supervisor
FROM
#tt
) AS tt
PIVOT(
MAX(Supervisor) FOR
supervisor_id IN ([Supervisor 1],[Supervisor 2],[Supervisor 3])
) AS piv;
Result:
ID Supervisor 1 Supervisor 2 Supervisor 3
1 John Doe NULL NULL
2 Peter Jones Sarah Jones NULL
3 Peter Jones Sarah Jones NULL
4 Peter Jones Sarah Jones Stephen Davies
5 John Doe NULL NULL
You will notice that the assignment to Supervisor X is done by ordering by the Supervisor-VARCHAR. If you want the ordering done differently, you might want to include an [Ordering] column; then change to ROW_NUMBER() OVER(PARTITION BY ID ORDER BY [Ordering]). Eg an [Ordering] column could be an INT IDENTITY(1,1). I'll leave that as an excercise to you if that's what's really needed.

Access 2007 SQL Merge tables without creating duplicates

I would like to add the unique values of tblA to tblB without creating duplicate values based on multiple fields. In the following example, FirstName and LastName determine a duplicate, Foo and Source are irrelevant.
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
tblB:
FirstName LastName Foo Source
John Doe 1 B
Bob Smith 5 B
Steve Smith 4 B
This is the result I want:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
Bob Smith 5 B
Here's an equivalent of the code I've tried:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
LEFT JOIN tblA AS A ON A.FirstName = B.FirstName AND A.LastName = B.LastName
WHERE A.FirstName IS NULL
And this is the result I get:
tblA:
FirstName LastName Foo Source
John Doe 1 A
Jane Doe 2 A
Steve Smith 3 A
Bill Johnson 2 A
John Doe 1 B
Bob Smith 5 B
Steve Smith from tblB is ignored, which is good. John Doe from tblB is added, which is bad. I've spent way too much time on this and I've inspected the data every way I can think of to ensure John Doe in tblA and tblB are the same first and last name. Any ideas on what could be going wrong?
Update: FYI, on my real tblB, about 10,000 of 30,000 should be moved to tblA. This is actually moving over 21,000. The problem is this is one step of a common process.
When I try:
SELECT tbb.*
FROM tbb
LEFT JOIN tba
ON (tbb.FirstName = tba.FirstName)
AND (tbb.LastName = tba.LastName)
WHERE (((tba.LastName) Is Null));
The only line returned is:
Bob Smith 5 B
Is it possible that John Doe has a hidden character?
Edit : Sorry, it doesn't work on Access2007
You have many way to do that :
INSERT INTO tblA
SELECT B.* FROM tblB AS B
WHERE B.firstname, B.lastname NOT IN (select firstname, lastname from tblA)
Or
INSERT INTO tblA
SELECT * FROM tblB
MINUS
SELECT * FROM tblA
This one works in Access.
You can run it to infinity - it won't add more rows than needed:
INSERT INTO tblA
SELECT B.*
FROM tblB AS B
WHERE (((B.FirstName) Not In (select firstname from tblA))
AND ((B.LastName) Not In (select firstname from tblA)))

TSQL, counting pairs of values in a table

Given a table in the format of
ID Forename Surname
1 John Doe
2 Jane Doe
3 Bob Smith
4 John Doe
How would you go about getting the output
Forename Surname Count
John Doe 2
Jane Doe 1
Bob Smith 1
For a single column I would just use count, but am unsure how to apply that for multiple ones.
SELECT Forename, Surname, COUNT(*) FROM YourTable GROUP BY Forename, Surname
I think this should work:
SELECT Forename, Surname, COUNT(1) AS Num
FROM T
GROUP BY Forename, Surname