I have a question on which is a better method in terms of speed.
I have a database with 2 tables that looks like this:
Table2
UniqueID Price
1 100
2 200
3 300
4 400
5 500
Table1
UniqueID User
1 Tom
2 Tom
3 Jerry
4 Jerry
5 Jerry
I would like to get the max price for each user, and I am now faced with 2 choices:
Use Max, or use the Inner Join approach suggested in the following post: Getting max value from rows and joining to another table
Which method is more efficient?
The answer to your question is to try both methods, and see which performs faster on your data in your environment. Unless you have a large amount of data, the difference is probably not important.
In this case, the traditional method of group by is probably better:
select u.user, max(p.price)
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
group by u.user;
For such a query, you want an index on table2(uniqueid, price), and perhaps on table1(uniqueid, user) as well. This depends on the database engine.
Instead of a join, I would suggest not exists:
select u.user, p.price
from table1 u join
table2 p
on u.uniqueid = p.uniqueid
where not exists (select 1
from table1 u2 join
table2 p2
on u2.uniqueid = p2.uniqueid
where u2.user = u.user and p2.price > p.price
);
Do note that these do not do exactly the same things. The first will return one row per user, no matter what. This version can return multiple rows, if there are multiple rows with the same price. On the other hand, it can return other columns from the rows with the maximum price, which is convenient.
Because your data structure requires a join in the subquery, I think you should stick with the group by approach.
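To make the comparison concrete, here is a minimal sketch that runs both approaches against the sample data from the question. It assumes SQLite through Python's sqlite3 module purely for convenience (the question does not name an engine), and the NOT EXISTS subquery is correlated on the user so that each user keeps their own maximum.

```python
import sqlite3

# In-memory database holding the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (uniqueid INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE table2 (uniqueid INTEGER PRIMARY KEY, price INTEGER);
    INSERT INTO table1 VALUES (1,'Tom'),(2,'Tom'),(3,'Jerry'),(4,'Jerry'),(5,'Jerry');
    INSERT INTO table2 VALUES (1,100),(2,200),(3,300),(4,400),(5,500);
""")

# Approach 1: join, then aggregate per user.
group_by_rows = conn.execute("""
    SELECT u.user, MAX(p.price)
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    GROUP BY u.user
    ORDER BY u.user
""").fetchall()

# Approach 2: keep a row only if no higher price exists for the same user.
not_exists_rows = conn.execute("""
    SELECT u.user, p.price
    FROM table1 u JOIN table2 p ON u.uniqueid = p.uniqueid
    WHERE NOT EXISTS (
        SELECT 1
        FROM table1 u2 JOIN table2 p2 ON u2.uniqueid = p2.uniqueid
        WHERE u2.user = u.user AND p2.price > p.price
    )
    ORDER BY u.user
""").fetchall()

print(group_by_rows)
print(not_exists_rows)
```

On this data both queries return the same rows because no user has a tied maximum. Timing them on your own data and engine is still the only reliable way to pick one.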
Related
I am trying to select some data from different tables using join.
First, here is my SQL (MS) query:
SELECT Polls.pollID,
Members.membername,
Polls.polltitle, (SELECT COUNT(*) FROM PollChoices WHERE pollID=Polls.pollID) AS 'choices',
(SELECT COUNT(*) FROM PollVotes WHERE PollVotes.pollChoiceID = PollChoices.pollChoicesID) AS 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID;
And the tables involved in this query are here:
The query returns this result:
pollID | membername | polltitle | choices | votes
---------+------------+-----------+---------+-------
10000036 | TestName | Test Title| 2 | 0
10000036 | TestName | Test Title| 2 | 1
Any help will be greatly appreciated.
Your INNER JOIN with PollChoices brings in more than 1 row for a given poll, because poll 10000036 has 2 choices, as the choices column indicates.
You can change the query to use GROUP BY and get the counts.
In case you don't have entries for each member in the PollVotes or Polls table, you need to use LEFT JOIN instead of INNER JOIN.
SELECT Polls.pollID,
Members.membername,
Polls.polltitle,
COUNT(DISTINCT PollChoices.pollChoicesID) as 'choices',
COUNT(PollVotes.pollvoteId) as 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID
INNER JOIN PollVotes
ON PollVotes.pollChoiceID = PollChoices.pollChoicesID
AND PollVotes.memberID = Members.memberID
GROUP BY Polls.pollID,
Members.membername,
Polls.polltitle
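For illustration, here is a small runnable sketch of the GROUP BY idea. It assumes SQLite through Python's sqlite3 module, and the column lists and sample rows are guesses, since the original table diagram is not shown. It uses LEFT JOIN so choices without votes are kept, and COUNT(DISTINCT ...) so a choice with several votes is not counted more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns the query needs.
conn.executescript("""
    CREATE TABLE Members (memberID INTEGER PRIMARY KEY, membername TEXT);
    CREATE TABLE Polls (pollID INTEGER PRIMARY KEY, memberID INTEGER, polltitle TEXT);
    CREATE TABLE PollChoices (pollChoicesID INTEGER PRIMARY KEY, pollID INTEGER);
    CREATE TABLE PollVotes (pollvoteId INTEGER PRIMARY KEY, pollChoiceID INTEGER);
    INSERT INTO Members VALUES (1, 'TestName');
    INSERT INTO Polls VALUES (10000036, 1, 'Test Title');
    INSERT INTO PollChoices VALUES (1, 10000036), (2, 10000036);
    INSERT INTO PollVotes VALUES (1, 2);  -- one vote, cast for choice 2
""")

poll_rows = conn.execute("""
    SELECT Polls.pollID, Members.membername, Polls.polltitle,
           COUNT(DISTINCT PollChoices.pollChoicesID) AS choices,
           COUNT(PollVotes.pollvoteId) AS votes
    FROM Polls
    INNER JOIN Members ON Polls.memberID = Members.memberID
    LEFT JOIN PollChoices ON PollChoices.pollID = Polls.pollID
    LEFT JOIN PollVotes ON PollVotes.pollChoiceID = PollChoices.pollChoicesID
    GROUP BY Polls.pollID, Members.membername, Polls.polltitle
""").fetchall()

print(poll_rows)  # one row per poll: 2 choices, 1 vote
```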
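You are getting 1 row for each PollChoices record, because the result of Polls INNER JOIN Members is joined to PollChoices and each poll has multiple choices. You may be expecting the SELECT COUNT(*) sub-queries to act as a GROUP BY clause, but they don't: a scalar sub-query is evaluated for each output row and does not collapse rows.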
If that doesn't make sense, add a bare minimum of sample data and the expected result and we can help more.
This query result is telling you the number of votes per choice in each poll.
In your example, this voter named TestName answered the poll (with ID 10000036) and gave one choice 1 vote, and the second choice 0 votes. This is why you are getting two rows in your result.
I'm not sure if you are expecting just one row because you didn't specify what data, exactly, you are trying to select. However, if you are trying to see the number of votes that TestName has submitted, for each choice where the vote count is greater than 0, then you will have to modify your query like this:
select * from
(SELECT Polls.pollID,
Members.membername,
Polls.polltitle, (SELECT COUNT(*) FROM PollChoices WHERE pollID=Polls.pollID) AS 'choices',
(SELECT COUNT(*) FROM PollVotes WHERE PollVotes.pollChoiceID = PollChoices.pollChoicesID) AS 'votes'
FROM Polls
INNER JOIN Members
ON Polls.memberID = Members.memberID
INNER JOIN PollChoices
ON PollChoices.pollID = Polls.pollID) as mysubquery where votes <> 0;
Let's say there are 2 or more tables.
Table A: aID, name, birthday
Table B: bID, petType, petName
Table C: cID, stackOverFlowUsername
I want to get something like aID, name, birthday, number of cats a person has, stack overflow's username
We can
use joins to join all 3 tables select * from tableA... tableB... tableC...
use multiple select statements, select a.*, (select count(*) from tableB where petType = 'cat') as numberOfCats, (select...) as stackUsername from tableA a
or other ways that I didn't know
My question is when is the right situation to use select, joins or is there even better methods?
Update:
Here is another question. If I have 3 stackoverflow accounts, Tom has 1 and Peter has 2,
using
A left join B left join C
will return a total of 6 rows
select a.*, select count(*) from tableB where..., select top 1 stackOverFlowUsername from tableC
returns 3 rows because there are 3 people
Can I use joins to achieve something similar if I only want one row of data for each person in tableA regardless how many stackoverflow accounts he/she has?
Thanks
A sub-select in the select list (case 2) might be evaluated once for every result row, while joined tables, views, and subselects are calculated only once, which saves memory and joining time (with pre-built indices). Once you are used to speaking SQL, you will find that the JOIN syntax is frequently easier to read.
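As for getting one row per person with joins: one option is to aggregate each child table in a derived table first and join the aggregates. The sketch below assumes SQLite via Python's sqlite3 module. The aID column in tableB and tableC is a hypothetical foreign key (the question does not say how the tables are linked), and the third person and all sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (aID INTEGER PRIMARY KEY, name TEXT, birthday TEXT);
    -- aID in tableB/tableC is an assumed link back to tableA
    CREATE TABLE tableB (bID INTEGER PRIMARY KEY, aID INTEGER, petType TEXT, petName TEXT);
    CREATE TABLE tableC (cID INTEGER PRIMARY KEY, aID INTEGER, stackOverFlowUsername TEXT);
    INSERT INTO tableA VALUES (1,'Tom','1990-01-01'),(2,'Peter','1991-02-02'),(3,'Ann','1992-03-03');
    INSERT INTO tableB VALUES (1,1,'cat','Kitty'),(2,1,'cat','Tiger'),(3,2,'dog','Rex');
    INSERT INTO tableC VALUES (1,1,'tom_so'),(2,2,'peter_a'),(3,2,'peter_b');
""")

# Plain joins multiply rows: one row per (pet, account) combination.
plain_join_count = conn.execute("""
    SELECT COUNT(*)
    FROM tableA a
    LEFT JOIN tableB b ON b.aID = a.aID
    LEFT JOIN tableC c ON c.aID = a.aID
""").fetchone()[0]

# Pre-aggregated derived tables have at most one row per person,
# so the outer join cannot multiply rows.
per_person = conn.execute("""
    SELECT a.aID, a.name, COALESCE(cats.n, 0) AS numberOfCats, acct.username
    FROM tableA a
    LEFT JOIN (SELECT aID, COUNT(*) AS n
               FROM tableB WHERE petType = 'cat' GROUP BY aID) cats
           ON cats.aID = a.aID
    LEFT JOIN (SELECT aID, MIN(stackOverFlowUsername) AS username
               FROM tableC GROUP BY aID) acct
           ON acct.aID = a.aID
    ORDER BY a.aID
""").fetchall()

print(plain_join_count)
print(per_person)
```

MIN() is just one arbitrary way to pick a single username per person; any rule that yields one row per aID would do.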
I have two tables
Table A
type_uid, allowed_type_uid
9,1
9,2
9,4
1,1
1,2
24,1
25,3
Table B
type_uid
1
2
From table A I need to return
9
1
Using a WHERE IN clause I can return
9
1
24
SELECT
TableA.type_uid
FROM
TableA
INNER JOIN
TableB
ON TableA.allowed_type_uid = TableB.type_uid
GROUP BY
TableA.type_uid
HAVING
COUNT(distinct TableB.type_uid) = (SELECT COUNT(distinct type_uid) FROM TableB)
Join the two tables together, so that you only have the records matching the types you are interested in.
Group the result set by TableA.type_uid.
Check that each group has the same number of allowed_type_uid values as exist in TableB.type_uid.
distinct is required only if there can be duplicate records in either table. If both tables are known to only have unique values, the distinct can be removed.
It should also be noted that as TableA grows in size, this type of query will quickly degrade in performance. This is because indexes are not actually much help here.
It can still be a useful structure, but not one where I'd recommend running the queries in real-time. Rather use it to create another persisted/cached result set, and use this only to refresh those results as/when needed.
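The join, group, and count-check steps above can be tried on the sample data from the question. This is a minimal sketch that assumes SQLite via Python's sqlite3 module. The query is the GROUP BY / HAVING version, with an ORDER BY added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (type_uid INTEGER, allowed_type_uid INTEGER);
    CREATE TABLE TableB (type_uid INTEGER);
    INSERT INTO TableA VALUES (9,1),(9,2),(9,4),(1,1),(1,2),(24,1),(25,3);
    INSERT INTO TableB VALUES (1),(2);
""")

matching = [row[0] for row in conn.execute("""
    SELECT TableA.type_uid
    FROM TableA
    INNER JOIN TableB ON TableA.allowed_type_uid = TableB.type_uid
    GROUP BY TableA.type_uid
    -- keep only groups that matched every row of TableB
    HAVING COUNT(DISTINCT TableB.type_uid) =
           (SELECT COUNT(DISTINCT type_uid) FROM TableB)
    ORDER BY TableA.type_uid
""")]

print(matching)  # [1, 9]
```

Type 24 drops out because it only allows type 1, and type 25 never survives the join.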
Or a slightly cheaper version (resource wise):
SELECT
Data.type_uid
FROM
A AS Data
CROSS JOIN
B
LEFT JOIN
A
ON Data.type_uid = A.type_uid AND B.type_uid = A.allowed_type_uid
GROUP BY
Data.type_uid
HAVING
MIN(ISNULL(A.allowed_type_uid,-999)) != -999
Your explanation is not very clear. I think you want to get those type_uid's from table A where for all records in table B there is a matching A.Allowed_type_uid.
SELECT T2.type_uid
FROM (SELECT COUNT(*) as AllAllowedTypes FROM #B) as T1,
(SELECT #A.type_uid, COUNT(*) as AllowedTypes
FROM #A
INNER JOIN #B ON
#A.allowed_type_uid = #B.type_uid
GROUP BY #A.type_uid
) as T2
WHERE T1.AllAllowedTypes = T2.AllowedTypes
(Dems, you were faster than me :) )
Here's what I have:
Person
name = varchar
Helmet
person = foreignkey -> Person
is_safe = boolean
Now, for a batch job, I need to query (no ORM, just raw SQL) for all Person that have 0 Helmet that are safe. I could obviously just loop through each Person in the database, etc., but I need to do this in a single query and limit it to 100 at a time (there are novemdecillions of these suckers in the database), and remove each Person. I don't need the Helmet records for each to be attached in the result. I only need the Person records (naturally deleting will cascade), but I can't simply issue a DELETE in place of my SELECT because there are things I need to do elsewhere before deleting them.
I'm using Postgres, but I'd prefer to use a query that's more or less DB agnostic, if possible.
Here's what I've abstractly come up with:
SELECT * FROM person
WHERE (SELECT COUNT(*) FROM helmet
WHERE person_id = person.id AND is_safe = true) = 0
LIMIT 100
This is clearly not valid SQL, but I'm hoping there is a functionally equivalent, but valid version.
select *
from person
where id not in
(
select person_id
from helmet
where is_safe = true
)
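SELECT *
FROM (
SELECT p.*
FROM Person p
-- LEFT JOIN keeps people who own no helmets at all
LEFT JOIN Helmet h ON p.id = h.person
GROUP BY p.id
-- SUM over a boolean is not portable, so count safe helmets with CASE
HAVING SUM(CASE WHEN h.is_safe THEN 1 ELSE 0 END) = 0
) inner_select
LIMIT 100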
So, this query consists of two parts:
The workhorse of the query is the inner query. This query joins together each person with all his helmets. It then uses GROUP BY to relate all rows for a specific person together. Once we have a group, we can use aggregate functions on each group, and in this case we use SUM to count the number of helmets that are safe. The SUM is used by the HAVING clause to select only groups where the SUM of safe helmets (i.e. the number of safe helmets) is equal to zero.
The outer query applies the LIMIT to the result of the inner query. Strictly speaking, LIMIT is evaluated after GROUP BY and HAVING anyway, so the wrapper is optional, but it makes the intent explicit.
SELECT person.*
FROM person
LEFT JOIN (SELECT DISTINCT person_id FROM helmet WHERE is_safe = true) AS T2 ON person.id = T2.person_id
WHERE T2.person_id IS NULL
LIMIT 100
I ended up using:
SELECT *
FROM person p
WHERE NOT EXISTS
(
SELECT h.person_id
FROM helmet h
WHERE h.person_id = p.id
AND is_safe = true
)
LIMIT 100
which turns out to stop scanning the table once it finds 100 results that match.
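Here is a small sketch of that NOT EXISTS query on made-up data, to show which people it selects. It assumes SQLite via Python's sqlite3 module rather than Postgres, so is_safe is stored as 0/1 and compared with 1. The names and rows are hypothetical, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE helmet (id INTEGER PRIMARY KEY, person_id INTEGER, is_safe INTEGER);
    INSERT INTO person VALUES (1,'Alice'),(2,'Bob'),(3,'Carol'),(4,'Dave');
    -- Alice: one safe helmet; Bob: one unsafe helmet;
    -- Carol: no helmets; Dave: one safe and one unsafe helmet.
    INSERT INTO helmet VALUES (1,1,1),(2,2,0),(3,4,1),(4,4,0);
""")

no_safe_helmet = conn.execute("""
    SELECT p.id, p.name
    FROM person p
    WHERE NOT EXISTS (
        SELECT 1
        FROM helmet h
        WHERE h.person_id = p.id
          AND h.is_safe = 1
    )
    ORDER BY p.id
    LIMIT 100
""").fetchall()

print(no_safe_helmet)  # Bob (only unsafe helmets) and Carol (no helmets)
```

Note that people with no helmets at all are included, which matches "0 Helmet that are safe".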
I have two tables that I would like to join but I am getting an error from MySQL
Table: books
bookTagNum ShelfTagNum
book1 1
book2 2
book3 2
Table: shelf
shelfNum shelfTagNum
1 shelf1
2 shelf2
I want my results to be:
bookTagNum ShelfTagNum shelfNum
book1 shelf1 1
book2 shelf2 2
book3 shelf2 2
but instead I am also getting an extra result:
book1 shelf2 2
I think my query is doing a cross product instead of a join:
SELECT `books`.`bookTagNum` , `books`.`shelfNum` , `shelf`.`shelfTagNum` , `books`.`title`
FROM books, shelf
where `books`.`shelfNum`=`books`.`shelfNum`
ORDER BY `shelf`.`shelfTagNum` ASC
LIMIT 0 , 30
What am I doing wrong?
I think you want
where `books`.`shelfTagNum`=`shelf`.`shelfNum`
In order to match rows from the books and shelf tables, you need to have terms from each in your where clause - otherwise, you're just performing a no-operation check on the rows of books, since every row's shelfNum will be equal to its shelfNum.
As #fixme.myopenid.com suggests, you could also go the explicit JOIN route, but it's not necessary.
If you want to be sure you're doing a join instead of a cross product, you should state it explicitly in the SQL, thus:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books INNER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return only those rows which exist in both tables), or:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books LEFT OUTER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return all rows from books), or:
SELECT books.bookTagNum,books.shelfNum, shelf.shelfTagNum, books.title
FROM books RIGHT OUTER JOIN shelf ON books.shelfNum = shelf.shelfTagNum
ORDER BY shelf.shelfTagNum
(which will return all rows from shelf)
FYI: If you rewrite your names to be consistent, things get a lot easier to read.
Table 1: Book
BookID ShelfID BookName
1 1 book1
2 2 book2
3 2 book3
Table 2: Shelf
ShelfID ShelfName
1 shelf1
2 shelf2
Now, a query to extract books to shelves is
SELECT
b.BookName,
s.ShelfName
FROM
Book b
JOIN Shelf s ON s.ShelfID = b.ShelfID
To answer the original question:
> where `books`.`shelfNum`=`books`.`shelfNum`
> ^^^^^--------------^^^^^------------- books repeated - this is an error
The WHERE clause, as written, does nothing, and because your WHERE clause isn't limiting any rows, you are indeed getting the cross product.
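A quick way to see the difference is to count rows. This sketch uses the renamed Book and Shelf tables from above and assumes SQLite via Python's sqlite3 module instead of MySQL; the comma join and the explicit JOIN behave the same way in both for this case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Book (BookID INTEGER PRIMARY KEY, ShelfID INTEGER, BookName TEXT);
    CREATE TABLE Shelf (ShelfID INTEGER PRIMARY KEY, ShelfName TEXT);
    INSERT INTO Book VALUES (1,1,'book1'),(2,2,'book2'),(3,2,'book3');
    INSERT INTO Shelf VALUES (1,'shelf1'),(2,'shelf2');
""")

# Comparing a column with itself is always true for non-NULL values,
# so every book is paired with every shelf: 3 x 2 rows.
cross_product_count = conn.execute("""
    SELECT COUNT(*)
    FROM Book b, Shelf s
    WHERE b.ShelfID = b.ShelfID
""").fetchone()[0]

# Comparing across the two tables gives the intended join.
joined = conn.execute("""
    SELECT b.BookName, s.ShelfName
    FROM Book b
    JOIN Shelf s ON s.ShelfID = b.ShelfID
    ORDER BY b.BookID
""").fetchall()

print(cross_product_count)  # 6
print(joined)
```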
Check your SQL. Your where clause cannot possibly be books.shelfNum=books.shelfNum
And what are all those single quotes for?
Try this:
SELECT `books`.`bookTagNum` , `books`.`shelfNum` , `shelf`.`shelfTagNum` ,
`books`.`title`
FROM books, shelf
where `books`.`shelftagNum`=`shelf`.`shelfNum`
ORDER BY `shelf`.`shelfTagNum` ASC
LIMIT 0 , 30
Because the implicit JOIN condition was not properly stated the result was a cross product.
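As others have mentioned, the problem you faced was with your join condition, which you wrote in the WHERE clause. To specifically answer your question:
In MySQL, if you list tables separated by commas and omit the JOIN keyword, the result is an INNER JOIN/CROSS JOIN without a join condition, that is, a cross product that only the WHERE clause can restrict. Other databases are stricter about the keywords. For example, in PostgreSQL the comma form is a CROSS JOIN, and INNER JOIN requires an ON or USING clause.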
Re: http://dev.mysql.com/doc/refman/5.7/en/join.html
"In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise."