SQL query doesn't retrieve correct result with count - sql

I have these tables
Actor: id | name
Acting: actor_id | movie_id
Movie: id | title
I have this query, which returns the number of movies each actor has acted in:
SELECT a.name AS name, COUNT(ag.actor_id)
FROM actor a
LEFT JOIN acting ag ON a.id = ag.actor_id
GROUP BY a.id, a.name
ORDER BY COUNT(*) ASC
LIMIT 10;
name | count
--------------------------+-------
Bianca Brigitte VanDamme | 1
Karin Konoval | 1
Keri Maletto | 1
Terence Bernie Hines | 1
Jean Stapleton | 1
Kyle Hebert | 1
Brandon Middleton | 1
Timothy Webber | 1
Dana Hanna | 1
Travis Betz | 1
After inserting a new actor into the Actor table, I run the same query, but the output does not include the new actor.
INSERT INTO Actor
VALUES (5000, 'Jeremy Bearimy')
-- Run same code
SELECT a.name ........
FROM ...
I get this result:
name | count
--------------------------+-------
Bianca Brigitte VanDamme | 1
Karin Konoval | 1
Keri Maletto | 1
Terence Bernie Hines | 1
Jean Stapleton | 1
Kyle Hebert | 1
Brandon Middleton | 1
Timothy Webber | 1
Dana Hanna | 1
Travis Betz | 1
When I add a HAVING clause to see which actors have acted in 0 movies, I do get the new actor back, so I don't know why that actor doesn't appear in the first query's result.
SELECT a.name AS name, COUNT(ag.actor_id)
FROM actor a
LEFT JOIN acting ag ON a.id = ag.actor_id
GROUP BY a.id, a.name
HAVING COUNT(ag.actor_id) = 0
ORDER BY COUNT(*) ASC
LIMIT 10;
Output:
name | count
----------------+-------
Jeremy Bearimy | 0

Note your order by clause in the first query:
ORDER BY COUNT(*) ASC
Actors that are in 1 movie will have a COUNT(*) of 1. I think that is obvious.
Actors that are in 0 movies will also have a COUNT(*) of 1. Why? Because there is one row in the group even if the columns from the second table are NULL.
You are then limiting to 10 results. There is no second ORDER BY key, so the 10 returned rows are an arbitrary mix, starting with the actors that are in 0 or 1 movies.
If you instead used:
ORDER BY COUNT(ag.actor_id) ASC
then the actors with zero movies would appear before those with 1 movie.
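The difference between the two counts can be checked on a tiny data set. This is a minimal sketch using Python's built-in sqlite3 module; the two-actor table contents and the movie id are made up for illustration, and other databases should behave the same way for this query.

```python
import sqlite3

# Hypothetical miniature of the question's schema (rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE acting (actor_id INTEGER, movie_id INTEGER);
    INSERT INTO actor  VALUES (1, 'Karin Konoval'), (5000, 'Jeremy Bearimy');
    INSERT INTO acting VALUES (1, 10);
""")

rows = conn.execute("""
    SELECT a.name,
           COUNT(*)           AS row_count,    -- counts rows in the group
           COUNT(ag.actor_id) AS movie_count   -- counts non-NULL actor_id values
    FROM actor a
    LEFT JOIN acting ag ON a.id = ag.actor_id
    GROUP BY a.id, a.name
    ORDER BY COUNT(ag.actor_id) ASC, a.name
""").fetchall()

for row in rows:
    print(row)
```

The actor with no movies has a row_count of 1 but a movie_count of 0, which is why sorting on COUNT(*) cannot put that actor ahead of the actors with one movie.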

From the last output, it looks like the actor 'Jeremy Bearimy' has no rows in the acting table.

Your query returns the first 10 actors, and they all have 1 movie, so you have at least 10 actors with 1 movie. Because of the LIMIT, the actors that don't have movies are not guaranteed to appear in the result.
Remove the LIMIT 10 and you will see all the actors, with or without a movie.

Related

In a query (no editing of tables) how do I join data without any similarities?

I have a query that returns a table; here's an example.
Name |Age |Hair |Happy | Sad |
Jon | 15 | Black |NULL | NULL|
Kyle | 18 |Blonde |YES |NULL |
Brad | 17 | Blue |NULL |YES |
Name and age come from one table in a database, hair color comes from a second which is joined, and happy and sad come from a third table. My goal would be to make the first line of the chart like this:
Name |Age |Hair |Happy |Sad |
Jon | 15 |Black |Yes |Yes |
Basically I want to get rid of the rows under the first and get the non NULL data joined to the right. The problem is that there is no column where the Yes values are on the Jon row, so I have no idea how to get them there. Any suggestions?
PS. With the data I am using I can't just put a 'YES' in the 'Jon' row and call it a day, I would need to find the specific value from the lower rows and somehow get that value in the boxes that are NULL.
Do you just want COALESCE()?
COALESCE(Happy, 'Yes') as happy
COALESCE() replaces a NULL value with another value.
If you want to join on a NULL value, work with nested selects. The inner select produces an id for NULLs, and the outer select joins on that id:
select COALESCE(x.Happy, yn_table.description) as happy, ...
from
(select
t1.Happy,
CASE WHEN t1.Happy is null THEN 1 END as happy_id
from t1 ...) x
left join yn_table
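on x.happy_id = yn_table.id
If you apply an ORDER BY to the query, you can then wrap it in an outer query and select the first row relative to this order with WHERE rownum = 1 (Oracle assigns rownum before sorting, so the filter has to sit outside the ordered query). If you don't apply an ORDER BY, the order is not guaranteed.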
After reading your new comment...
the sense is that in my real data the yes under the other names will be a number of a piece of equipment. I want the numbers of the equipment in one row instead of having like 8 rows with only 4 ' yes' values and the rest null.
... I come to the conclusion that this is an XY problem.
You are asking about a detail you think will solve your problem, instead of explaining the problem and asking how to solve it.
If you want to store several pieces of equipment per person, you need three tables.
You need a Person table, an Article table and a junction table relating articles to persons to equip them. Let's call this table Equipment.
Person
------
PersonId (Primary Key)
Name
optional attributes like age, hair color
Article
-------
ArticleId (Primary Key)
Description
optional attributes like weight, color etc.
Equipment
---------
PersonId (Primary Key, Foreign Key to table Person)
ArticleId (Primary Key, Foreign Key to table Article)
Quantity (optional, if each person can have only one of each article, we don't need this)
Let's say we have
Person: PersonId | Name
1 | Jon
2 | Kyle
3 | Brad
Article: ArticleId | Description
1 | Hat
2 | Bottle
3 | Bag
4 | Camera
5 | Shoes
Equipment: PersonId | ArticleId | Quantity
1 | 1 | 1
1 | 4 | 1
1 | 5 | 1
2 | 3 | 2
2 | 4 | 1
Now Jon has a hat, a camera and shoes. Kyle has 2 bags and one camera. Brad has nothing.
You can query the persons and their equipment like this
SELECT
p.PersonId, p.Name, a.ArticleId, a.Description AS Equipment, e.Quantity
FROM
Person p
LEFT JOIN Equipment e
ON p.PersonId = e.PersonId
LEFT JOIN Article a
ON e.ArticleId = a.ArticleId
ORDER BY p.Name, a.Description
The result will be
PersonId | Name | ArticleId | Equipment | Quantity
---------+------+-----------+-----------+---------
3 | Brad | NULL | NULL | NULL
1 | Jon | 4 | Camera | 1
1 | Jon | 1 | Hat | 1
1 | Jon | 5 | Shoes | 1
2 | Kyle | 3 | Bag | 2
2 | Kyle | 4 | Camera | 1
See example: http://sqlfiddle.com/#!4/7e05d/2/0
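With this design, getting each person's equipment onto a single row becomes an aggregation over the junction table. The following is only a sketch using Python's built-in sqlite3 module and SQLite's GROUP_CONCAT; on Oracle the corresponding aggregate would be LISTAGG. The order of items inside each concatenated string is not guaranteed by SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person  (PersonId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Article (ArticleId INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Equipment (
        PersonId  INTEGER REFERENCES Person(PersonId),
        ArticleId INTEGER REFERENCES Article(ArticleId),
        Quantity  INTEGER,
        PRIMARY KEY (PersonId, ArticleId)
    );
    INSERT INTO Person  VALUES (1, 'Jon'), (2, 'Kyle'), (3, 'Brad');
    INSERT INTO Article VALUES (1, 'Hat'), (2, 'Bottle'), (3, 'Bag'),
                               (4, 'Camera'), (5, 'Shoes');
    INSERT INTO Equipment VALUES (1, 1, 1), (1, 4, 1), (1, 5, 1),
                                 (2, 3, 2), (2, 4, 1);
""")

# One row per person; the equipment descriptions are collapsed into one string.
rows = conn.execute("""
    SELECT p.Name, GROUP_CONCAT(a.Description, ', ') AS Equipment
    FROM Person p
    LEFT JOIN Equipment e ON p.PersonId = e.PersonId
    LEFT JOIN Article a   ON e.ArticleId = a.ArticleId
    GROUP BY p.PersonId, p.Name
    ORDER BY p.Name
""").fetchall()

for name, equipment in rows:
    print(name, "|", equipment)
```

Brad still appears thanks to the LEFT JOINs, with a NULL (None in Python) equipment list.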
Since you tagged the question with the oracle tag, you could just use NVL(), which allows you to specify a value that would replace a NULL value in the column you select from.
Assuming that you want the 1st row because it contains the smallest age:
- wrap your query inside a CTE
- in another CTE get the 1st row of the query
- in another CTE get the max values of Happy and Sad of your query (for your sample data they both are 'YES')
- cross join the last 2 CTEs.
with
cte as (
<your query here>
),
firstrow as (
select name, age, hair from cte
order by age
fetch first row only
),
maxs as (
select max(happy) happy, max(sad) sad
from cte
)
select f.*, m.*
from firstrow f cross join maxs m
You can try this:
SELECT A.Name,
A.Age,
B.Hair,
C.Happy,
C.Sad
FROM A
INNER JOIN B
ON A.Name = B.Name
INNER JOIN C
ON A.Name = C.Name
(Assuming that Name is the key column in all 3 tables)

Novice seeking help, Max Aggregate not returning expected results

I'm still very new to MS-SQL. I have a simple table and query that is getting the best of me. I know it will be something fundamental I'm overlooking.
I've changed the field names but the idea is the same.
So the idea is that every time someone signs up they get a RegID, Name, and Team. The names are unique, so in the data below, yes, John changed teams. And that's my trouble.
Football Table
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 101 | Bill | Blue |
| 102 | Tom | Green |
| 103 | John | Green |
+------------+----------+---------+
With the query at the bottom using the Max_RegID, I was expecting to get back only one record.
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 103 | John | Green |
+------------+----------+---------+
Instead I get back the rows below, which seem to include the Max_RegID but also one row for each team. What am I doing wrong?
+------------+----------+---------+
| Max_RegID | Name | Team |
+------------+----------+---------+
| 100 | John | Red |
| 103 | John | Green |
+------------+----------+---------+
My Query
SELECT
Max(Football.RegID) AS Max_RegID,
Football.Name,
Football.Team
FROM
Football
GROUP BY
Football.RegID,
Football.Name,
Football.Team
EDIT* Removed the WHERE statement
The reason you're getting the results that you are is because of the way you have your GROUP BY clause structured.
When you're using any aggregate function, MAX(X), SUM(X), COUNT(X), or what have you, you're telling the SQL engine that you want the aggregate value of column X for each unique combination of the columns listed in the GROUP BY clause.
In your query as written, you're grouping by all three of the columns in the table, telling the SQL engine that each tuple is unique. Therefore the query is returning ALL of the values, and you aren't actually getting the MAX of anything at all.
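The effect of the GROUP BY column list can be seen directly on the sample data. A minimal sketch using Python's built-in sqlite3 module (the question is about MS-SQL, but this grouping behaviour is standard SQL); the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Football (RegID INTEGER PRIMARY KEY, Name TEXT, Team TEXT);
    INSERT INTO Football VALUES
        (100, 'John', 'Red'), (101, 'Bill', 'Blue'),
        (102, 'Tom', 'Green'), (103, 'John', 'Green');
""")

# Grouping by every column: each row is its own group, so MAX() changes nothing.
grouped_by_all = conn.execute("""
    SELECT MAX(RegID), Name, Team
    FROM Football
    GROUP BY RegID, Name, Team
    ORDER BY RegID
""").fetchall()

# Grouping by Name only: one group per person, MAX() picks the latest RegID.
grouped_by_name = conn.execute("""
    SELECT MAX(RegID), Name
    FROM Football
    GROUP BY Name
    ORDER BY Name
""").fetchall()

print(grouped_by_all)
print(grouped_by_name)
```

The first query returns all four rows unchanged; the second returns one row per name, with 103 for John.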
What you actually want in your results is the maximum RegID for each distinct value in the Name column and also the Team that goes along with that (RegID,Name) combination.
To accomplish that you need to find the MAX(RegID) for each Name in an initial data set, and then use that list of RegIDs to add the values for Name and Team in a secondary data set.
Caveat (per comments from @HABO): This is premised on the assumption that RegID is a unique number (an IDENTITY column, value from a SEQUENCE, or something of that sort). If there are duplicate values, this will fail.
The most straightforward way to accomplish that is with a sub-query. The sub-query below gets your unique RegIDs, then joins to the original table to add the other values.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, gets the list of IDs
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
GROUP BY
f2.Name
) AS sq
ON
sq.Max_RegID = f.RegID;
EDIT: Sorry. I just re-read the question. To get just the single record for the MAX(RegID), just take the GROUP BY out of the sub-query, and you'll just get the current maximum value, which you can use to find the values in the rest of the columns.
SELECT
f.RegID
,f.Name
,f.Team
FROM
Football AS f
JOIN
(--The sub-query, sq, now gets the MAX ID
SELECT
MAX(f2.RegID) AS Max_RegID
FROM
Football AS f2
) AS sq
ON
sq.Max_RegID = f.RegID;
Use row_number()
select * from
(SELECT
Football.RegID AS Max_RegID,
Football.Name,
Football.Team, row_number() over(partition by name order by Football.RegID desc) as rn
FROM
Football
WHERE
Football.Name = 'John')a
where rn=1
You can simply edit your query this way:
SELECT *
FROM
Football f
WHERE
f.Name = 'John' and
f.Max_RegID = (SELECT Max(f2.Max_RegID) FROM Football f2 WHERE f2.Name = 'John'
)
or
if you are on SQL Server, simply use this
select top 1 * from Football f
where f.Name = 'John'
order by Max_RegID desc
or
if mysql then
select * from Football f
where f.Name = 'John'
order by Max_RegID desc
Limit 1
You need a self join:
select f1.*
from Football f inner join
Football f1
on f1.name = f.name
where f.Max_RegID = 103;
After revisiting the question, the sample data suggests a subquery:
select f.*
from Football f
where name = (select top (1) f1.name
from Football f1
order by f1.Max_RegID desc
);

How can I find all columns A whose subcategories B are all related to the same column C?

I'm trying to better understand relational algebra and am having trouble solving the following type of question:
Suppose there is a column A (Department), a column B (Employees) and a column C (Managers). How can I find all of the departments who only have one manager for all of their employees? An example is provided below:
Department | Employees | Managers
-------------+-------------+----------
A | John | Bob
A | Sue | Sam
B | Jim | Don
B | Alex | Don
C | Jason | Xie
C | Greg | Xie
In this table, the result I should get is all tuples containing departments B and C, because all of their employees are managed by the same person (Don and Xie respectively). Department A, however, would not be returned because its employees have multiple managers.
Any help or pointers would be appreciated.
Such problems usually call for a self-join.
Joining the relation onto itself on Department, then filtering out the tuples where the Managers are equal would yield us all the unwanted tuples, which we can just subtract from the original relations.
Here's how I'd do it:
First we make two copies of table T, call them T1 and T2, then take the cross product of T1 and T2. From the result we select all the rows where T1.Manager /= T2.Manager but T1.Department=T2.Department, yielding us these tuples:
T1.Department | T1.Employees| T1.Managers | T2.Managers | T2.Employees | T2.Department
--------------+-------------+-------------+-------------+--------------+--------------
A | John | Bob | Sam | Sue | A
A | Sue | Sam | Bob | John | A
Departments B and C aren't present because within them T1.Manager always equals T2.Manager.
Then we project this result onto Department and subtract those departments from the original set to get the answer.
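The self-join-and-subtract idea translates to SQL fairly directly. Below is a minimal sketch using Python's built-in sqlite3 module, with EXCEPT standing in for relational difference; the table name T and the singular column names Employee and Manager are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (Department TEXT, Employee TEXT, Manager TEXT);
    INSERT INTO T VALUES
        ('A', 'John', 'Bob'),  ('A', 'Sue', 'Sam'),
        ('B', 'Jim', 'Don'),   ('B', 'Alex', 'Don'),
        ('C', 'Jason', 'Xie'), ('C', 'Greg', 'Xie');
""")

rows = conn.execute("""
    SELECT *
    FROM T
    WHERE Department IN (
        SELECT Department FROM T           -- all departments
        EXCEPT                             -- minus ...
        SELECT T1.Department               -- ... departments with two different managers
        FROM T AS T1
        JOIN T AS T2 ON T1.Department = T2.Department
        WHERE T1.Manager <> T2.Manager
    )
    ORDER BY Department, Employee
""").fetchall()

for row in rows:
    print(row)
```

Only the tuples of departments B and C remain, since department A is the only one that survives the inequality filter in the self-join.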
If your RDBMS supports common table expressions:
with C as (
select department, manager, count(*) as cnt
from A
group by department, manager
),
B as (
select department, count(*) as cnt
from A group by department
)
select A.*
from A
join C on A.department = C.department
join B on A.department = B.department
where B.cnt = C.cnt;

Generating a hierarchy

I got the following question at a job interview and it completely stumped me, so I'm wondering if anybody out there can help explain it to me. Say I have the following table:
employees
--------------------------
id | name | reportsTo
--------------------------
1 | Alex | 2
2 | Bob | NULL
3 | Charlie | 5
4 | David | 2
5 | Edward | 8
6 | Frank | 2
7 | Gary | 8
8 | Harry | 2
9 | Ian | 8
The question was to write a SQL query that returned a table with a column for each employee's name and a column showing how many people are above that employee in the organization: i.e.,
hierarchy
--------------------------
name | hierarchyLevel
--------------------------
Alex | 1
Bob | 0
Charlie | 3
David | 1
Edward | 2
Frank | 1
Gary | 2
Harry | 1
Ian | 2
I can't even figure out where to begin writing this as a SQL query (a cursor, maybe?). Can anyone help me out in case I get asked a similar question to this again? Thanks.
The simplest example would be to use a (real or temporary) table, and add one level at a time (fiddle):
INSERT INTO hierarchy
SELECT id, name, 0
FROM employees
WHERE reportsTo IS NULL;
WHILE ((SELECT COUNT(1) FROM employees) <> (SELECT COUNT(1) FROM hierarchy))
BEGIN
INSERT INTO hierarchy
SELECT e.id, e.name, h.hierarchylevel + 1
FROM employees e
INNER JOIN hierarchy h ON e.reportsTo = h.id
AND NOT EXISTS(SELECT 1 FROM hierarchy hh WHERE hh.id = e.id)
END
Other solutions will be slightly different for each RDBMS. As one example, in SQL Server, you can use a recursive CTE to expand it (fiddle):
;WITH expanded AS
(
SELECT id, name, 0 AS level
FROM employees
WHERE reportsTo IS NULL
UNION ALL
SELECT e.id, e.name, level + 1 AS level
FROM expanded x
INNER JOIN employees e ON e.reportsTo = x.id
)
SELECT *
FROM expanded
ORDER BY id
Other solutions include recursive stored procedures, or even using dynamic SQL to iteratively increase the number of joins until everybody is accounted for.
Of course all these examples assume there are no cycles and everyone can be traced up the chain to a head honcho (reportsTo = NULL).
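For anyone who wants to try the recursive approach without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. SQLite spells it WITH RECURSIVE, but the shape of the query is the same as the CTE above; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, reportsTo INTEGER);
    INSERT INTO employees VALUES
        (1, 'Alex', 2), (2, 'Bob', NULL), (3, 'Charlie', 5),
        (4, 'David', 2), (5, 'Edward', 8), (6, 'Frank', 2),
        (7, 'Gary', 8), (8, 'Harry', 2), (9, 'Ian', 8);
""")

rows = conn.execute("""
    WITH RECURSIVE expanded(id, name, level) AS (
        -- Anchor: the person at the top reports to nobody.
        SELECT id, name, 0
        FROM employees
        WHERE reportsTo IS NULL
        UNION ALL
        -- Recursive step: anyone reporting to an already-placed employee
        -- sits one level deeper.
        SELECT e.id, e.name, x.level + 1
        FROM expanded x
        JOIN employees e ON e.reportsTo = x.id
    )
    SELECT name, level
    FROM expanded
    ORDER BY id
""").fetchall()

for name, level in rows:
    print(f"{name:8} | {level}")
```

As with the other variants, this assumes the reporting chain has no cycles; a cycle would make the recursion run without terminating.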

SQL Join the count to a query

I have created a database of my movies and another with my actors in each movie
Columns are:
ID
Actor
ImdbActorID
ImdbMovieID
Character
Example:
47105 | Howard McGillin | nm0569294 | tt0111333 | Adult Prince Derek
47106 | Michelle Nicastro | nm0629264 | tt0111333 | Adult Princess Odette
47108 | John Cleese | nm0000092 | tt0111333 | Jean-Bob
when my webapp queries a specific movie:
Select * from actors where ImdbMovieID='tt0111333'
I get that list. My problem is that I would like to add a column with the total number of movies I have for each actor, so I don't have to programmatically run a query for each actor.
I've thought of joining the same table to itself with the count, but I don't know if that will even work. What stumps me is having that WHERE clause.
Thanks everyone Stuart and Aaron.
Select ImdbActorID, Character,Actor,ImdbMovieID,cnt from Actors
Join (Select ImdbActorID as Act2, count(*) as cnt
from Actors group by ImdbActorID) as x on Actors.ImdbActorID=x.Act2
where ImdbMovieID='tt0111333'
order by cnt desc
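The reason the WHERE clause does not interfere is that the counts are computed in the derived table over all rows, and the movie filter is applied only to the outer rows. A minimal sketch with Python's built-in sqlite3 module follows; the fourth row (ID 1, movie 'tt_other', character 'Some Role') is made up so that one actor has more than one movie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Actors (
        ID INTEGER PRIMARY KEY, Actor TEXT, ImdbActorID TEXT,
        ImdbMovieID TEXT, "Character" TEXT
    );
    INSERT INTO Actors VALUES
        (47105, 'Howard McGillin',   'nm0569294', 'tt0111333', 'Adult Prince Derek'),
        (47106, 'Michelle Nicastro', 'nm0629264', 'tt0111333', 'Adult Princess Odette'),
        (47108, 'John Cleese',       'nm0000092', 'tt0111333', 'Jean-Bob'),
        -- Made-up extra row so that one actor appears in two movies.
        (1,     'John Cleese',       'nm0000092', 'tt_other',  'Some Role');
""")

rows = conn.execute("""
    SELECT a.Actor, a."Character", x.cnt
    FROM Actors a
    JOIN (
        -- Counts are taken over the whole table, before any movie filter.
        SELECT ImdbActorID, COUNT(*) AS cnt
        FROM Actors
        GROUP BY ImdbActorID
    ) AS x ON a.ImdbActorID = x.ImdbActorID
    WHERE a.ImdbMovieID = ?
    ORDER BY x.cnt DESC, a.Actor
""", ("tt0111333",)).fetchall()

for row in rows:
    print(row)
```

Passing the movie id as a bound parameter (the ? placeholder) rather than concatenating it into the SQL string is a design choice of this sketch, not something taken from the original query.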