Please refer to this background question.
After constructing this COUNT, how would I then link each of these 'Prices' to, for instance, a column called 'Genre' in TableTwo?
e.g.
Table1: Prices, ID
Table2: Genre, ID
Example output:
PRICES, COUNT, Genre
--------------------
13.99, 2, Horror
52.00, 3, Comedy
1.99, 1, Romance
I hope this question is easy to follow, but I will try to elaborate further on request. Cheers!
EDIT:
Yes, this is a much simpler version of what I'm trying to do. As said in the previous question, I have a field whose instances I want to count. Now that I have that answer (from the previous question), I want to link it to another table that I have (to help me analyse some data a little better).
For the sake of example, let's say we have a Blockbuster branch that has 2 suppliers. In TableOne I have 'Title'. I have listed each unique value of Title and counted each one. (So the store has a unique title called 'Dead Man's Shoes' with 10 copies. It also has a unique title called 'Touch Of Evil' which, because it is more popular, has 100 copies.) I now want to see which supplier each of these two comes from (from TableTwo). Therefore:
Example output:
Title, Count, Supplier
------------------------------------
Dead Man's Shoes, 10, Supplier1
Touch Of Evil, 100, Supplier2
Does that help any better?
SELECT t1.Prices, COUNT(t1.ID) AS TheCount, t2.Genre
FROM Table1 AS t1
INNER JOIN Table2 AS t2
ON t1.ID = t2.ID
GROUP BY t1.Prices, t2.Genre
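A minimal, runnable sketch of the JOIN + GROUP BY query above, using Python's built-in sqlite3 module rather than the asker's database. The sample rows are invented to reproduce the example output, and the ORDER BY is added only so the row order is predictable.

```python
import sqlite3

# Hypothetical sample data shaped like Table1 (Prices, ID) and Table2 (Genre, ID).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Prices REAL, ID INTEGER);
CREATE TABLE Table2 (Genre TEXT, ID INTEGER);
INSERT INTO Table1 VALUES (13.99, 1), (13.99, 2), (52.00, 3), (52.00, 4), (52.00, 5), (1.99, 6);
INSERT INTO Table2 VALUES ('Horror', 1), ('Horror', 2), ('Comedy', 3), ('Comedy', 4),
                          ('Comedy', 5), ('Romance', 6);
""")

# Join first, then GROUP BY every non-aggregated column in the SELECT list.
rows = conn.execute("""
SELECT t1.Prices, COUNT(t1.ID) AS TheCount, t2.Genre
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t1.ID = t2.ID
GROUP BY t1.Prices, t2.Genre
ORDER BY TheCount DESC
""").fetchall()

for prices, count, genre in rows:
    print(prices, count, genre)
```

The Title/Count/Supplier version has the same shape: swap Prices for Title and Genre for Supplier, and keep every non-aggregated column in the GROUP BY.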
You have to use a JOIN.
Your query would look something like this:
SELECT * FROM prices JOIN genres ON ( prices.id = genres.id )
and the result would be what you desire. :)
More on this subject here.
Related
Dear all,
I am using MS Access and need to figure out the following:
Let's say I have a query with 2 tables (inner join).
Table1:
ITEM_ID, ITEM_NAME.
Table2:
ITEM_ID, CATEGORY_ID.
Multiple CATEGORY_IDs are assigned to one ITEM_ID.
The data set is really huge and very prone to performance issues.
I need to extract all ITEM_IDs that do not have a specific category ID (say "003") assigned.
I cannot simply use the criterion Not Like "003" on the CATEGORY_ID field: the data set is huge, and the query would extract every ITEM_ID + ITEM_NAME that has any other CATEGORY_ID value (say "001", "002", "004"-"999"), which causes performance issues.
Is there a way to do that? I need to identify materials in stock that are not assigned to category 003.
Please let me know; your help will be highly appreciated.
"n00b alert": please be patient with me, as I am a true beginner at creating MS Access queries.
Thank you, Petr J.
You would use NOT EXISTS:
select i.*
from items as i
where not exists (select 1
from categories as c
where c.item_id = i.item_id and c.category_id = "003"
);
For performance, you want an index on categories(item_id, category_id).
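To see why NOT EXISTS answers the question while a plain "category is not 003" filter does not, here is a small sketch using Python's sqlite3 in place of MS Access. The items/categories table names follow the answer above; the rows are made up, and SQLite needs single-quoted strings where Access accepts "003".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id TEXT PRIMARY KEY, item_name TEXT);
CREATE TABLE categories (item_id TEXT, category_id TEXT);
-- Composite index suggested above, so each NOT EXISTS probe is an index lookup.
CREATE INDEX ix_categories_item_cat ON categories (item_id, category_id);
INSERT INTO items VALUES ('A1', 'Bolt'), ('A2', 'Nut'), ('A3', 'Washer');
INSERT INTO categories VALUES
  ('A1', '001'), ('A1', '003'),
  ('A2', '001'), ('A2', '002'),
  ('A3', '004');
""")

# Anti-join: keep an item only if no '003' row exists for it.
missing_003 = conn.execute("""
SELECT i.item_id, i.item_name
FROM items AS i
WHERE NOT EXISTS (SELECT 1
                  FROM categories AS c
                  WHERE c.item_id = i.item_id AND c.category_id = '003')
ORDER BY i.item_id
""").fetchall()

# Naive filter: A1 still appears because it also has a non-'003' category.
naive = conn.execute("""
SELECT DISTINCT i.item_id
FROM items AS i
JOIN categories AS c ON c.item_id = i.item_id
WHERE c.category_id <> '003'
ORDER BY i.item_id
""").fetchall()

print(missing_003)
print(naive)
```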
You might want to try a left join:
SELECT t1.ITEM_ID, t1.ITEM_NAME
FROM Table1 AS t1
LEFT JOIN (SELECT ITEM_ID FROM Table2 WHERE CATEGORY_ID = "003") AS t2
ON t1.ITEM_ID = t2.ITEM_ID
WHERE t2.ITEM_ID Is Null;
I suggest creating an index on CATEGORY_ID and, if no referential integrity is in place, also on ITEM_ID (both in Table2).
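A runnable sketch of this LEFT JOIN / IS NULL anti-join with the two suggested Table2 indexes, again using Python's sqlite3 as a stand-in for Access. The index names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ITEM_ID TEXT PRIMARY KEY, ITEM_NAME TEXT);
CREATE TABLE Table2 (ITEM_ID TEXT, CATEGORY_ID TEXT);
-- The two indexes suggested above (names are illustrative).
CREATE INDEX ix_t2_category ON Table2 (CATEGORY_ID);
CREATE INDEX ix_t2_item ON Table2 (ITEM_ID);
INSERT INTO Table1 VALUES ('A1', 'Bolt'), ('A2', 'Nut'), ('A3', 'Washer');
INSERT INTO Table2 VALUES ('A1', '001'), ('A1', '003'), ('A2', '002'), ('A3', '004');
""")

# Items with a '003' row find a partner in t2; everything else gets NULLs,
# which is what the IS NULL test keeps.
rows = conn.execute("""
SELECT t1.ITEM_ID, t1.ITEM_NAME
FROM Table1 AS t1
LEFT JOIN (SELECT ITEM_ID FROM Table2 WHERE CATEGORY_ID = '003') AS t2
  ON t1.ITEM_ID = t2.ITEM_ID
WHERE t2.ITEM_ID IS NULL
ORDER BY t1.ITEM_ID
""").fetchall()

print(rows)
```

Both this form and NOT EXISTS express the same anti-join; which one is faster depends on the engine's optimizer, so it is worth timing both on the real data.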
Suppose I have an SQL problem like the following:
I have written the following SQL query:
SELECT BARS.bar
FROM SELLS JOIN
BARS
ON BARS.bar = SELLS.bar JOIN
DRINKS
ON DRINKS.drink = SELLS.drink
WHERE BARS.address = 'Nowowiejska' AND
DRINKS.type = 'Mineral Water' AND
SELLS.price < 3
But after running the query against a real database implemented in MS SQL Server, I found that there are some duplicate bar names, so I fixed my query using DISTINCT. However, I had no way of noticing the duplication before actually running the query against the actual database.
My question is: how can I tell that I need to use DISTINCT in my query?
Generally speaking, you just need to be aware of the cardinality of the relationships between your tables. In your example, if you want a result set that contains at most one record per bar, then you need to be aware that joining the BARS table to any other table that may contain multiple records for a single bar (e.g. SELLS) can also potentially produce multiple records for the same bar in your result set.
That said, I strongly agree with Gordon Linoff's comment on your question: if you structure your joins properly, I suspect that you will almost never have to use DISTINCT. I write a fair amount of SQL and I use DISTINCT so rarely that when I see it, I will typically review the query carefully to see whether it's really needed or whether it was used as a "hack" to cover up some incorrect joins.
There's a thing called a semi-join that's useful for problems like the one you're working on: where you want to query some table (SELLS) to see whether some particular data is present but don't actually need to return it. This is implemented in SQL Server by the keyword EXISTS. Here's an example of how you could use it for your problem:
-- Sample data from the question:
declare @Bars table (Bar varchar(32), [Address] varchar(32));
declare @Drinks table (Drink varchar(32), [Type] varchar(32));
declare @Sells table (Bar varchar(32), Drink varchar(32), Price money);
insert @Bars values ('A', 'Nowowiejska'), ('B', 'Oak Creek'), ('C', 'Greenfield');
insert @Drinks values ('San Pellegrino', 'Mineral Water');
insert @Sells values ('B', 'San Pellegrino', 2.99), ('C', 'San Pellegrino', 3.50);
-- List bars whose address is Nowowiejska or which sell mineral water for < $3.
select
B.Bar
from
@Bars B
where
B.[Address] = 'Nowowiejska' or
exists
(
select 1
from
@Drinks D
inner join @Sells S on D.Drink = S.Drink
where
S.Bar = B.Bar and
D.[Type] = 'Mineral Water' and
S.Price < 3
);
You can read an excellent introduction to joins here and more about EXISTS here.
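A small sketch of the duplication problem and the semi-join fix, run with Python's sqlite3 instead of SQL Server. To isolate the effect it uses the question's AND conditions (not the OR used in the T-SQL example above), and the sample rows are invented so that bar A sells two cheap mineral waters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bars (bar TEXT PRIMARY KEY, address TEXT);
CREATE TABLE Drinks (drink TEXT PRIMARY KEY, type TEXT);
CREATE TABLE Sells (bar TEXT, drink TEXT, price REAL);
INSERT INTO Bars VALUES ('A', 'Nowowiejska'), ('B', 'Oak Creek');
INSERT INTO Drinks VALUES ('San Pellegrino', 'Mineral Water'), ('Evian', 'Mineral Water');
INSERT INTO Sells VALUES ('A', 'San Pellegrino', 2.50), ('A', 'Evian', 2.80), ('B', 'Evian', 2.00);
""")

# Plain joins: one output row per matching Sells row, so bar A appears twice.
joined = conn.execute("""
SELECT Bars.bar
FROM Sells
JOIN Bars ON Bars.bar = Sells.bar
JOIN Drinks ON Drinks.drink = Sells.drink
WHERE Bars.address = 'Nowowiejska' AND Drinks.type = 'Mineral Water' AND Sells.price < 3
""").fetchall()

# Semi-join: each bar is tested once for the existence of a matching sale,
# so at most one row per bar comes back and no DISTINCT is needed.
semi = conn.execute("""
SELECT B.bar
FROM Bars AS B
WHERE B.address = 'Nowowiejska'
  AND EXISTS (SELECT 1
              FROM Sells AS S
              JOIN Drinks AS D ON D.drink = S.drink
              WHERE S.bar = B.bar AND D.type = 'Mineral Water' AND S.price < 3)
""").fetchall()

print(joined)
print(semi)
```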
To know if you need to use DISTINCT, you need to know if your joins will produce duplicates, which means you need to understand how they work.
First, you need to read the question more carefully. It is asking for bars that are on 'Nowowiejska' street as well as bars that sell 'Mineral Water' for < 3. Since you are only using AND in your query, you will only get bars that are on 'Nowowiejska' street AND sell 'Mineral Water' for < 3.
Here is what your query should look like:
SELECT DISTINCT Sells.bar
FROM Sells
LEFT OUTER JOIN Bars
ON Sells.bar = Bars.bar
LEFT OUTER JOIN DRINKS
ON Sells.drink = Drinks.drink
WHERE Bars.address = 'Nowowiejska'
OR
(
Drinks.type = 'Mineral Water'
AND
Sells.price < 3
)
Note the structure of the WHERE clause, which will allow BOTH bars that are on 'Nowowiejska' street AND bars that sell 'Mineral Water' for < 3.
Since it is possible for a bar to have an address of 'Nowowiejska' AND to sell 'Mineral Water' for less than 3, you need to allow for BOTH possibilities. The outer joins bring together every sale with its bar's address and its drink's type and price; the WHERE clause then filters that result set down to the desired criteria. Finally, DISTINCT ensures that a bar matching on more than one row is returned only once.
In short, use DISTINCT when it's possible that a "hit" will either match multiple criteria separated by OR, or match multiple records in one of the joined tables. One bar can't have multiple stored addresses on one street, and one bar shouldn't have the same drink stored twice (if either of these is true, you should immediately fire your DBA and/or developers). However, a single bar can sell several drinks, and it is entirely possible for a bar to be on the desired street and also offer the desired drink for less than the desired price. Each of those cases produces an extra row for the same bar, and you don't want those bars returned more than once.
I hope this helps and please feel free to comment if you need clarification.
EDIT
It is also possible to simply combine the two (essentially separate) queries with a union. I would recommend against doing this, since it's better to consolidate queries when possible, but I thought including this might help you better understand how the joins work.
SELECT Sells.bar
FROM Sells
JOIN Bars
ON Sells.bar = Bars.bar
WHERE Bars.address = 'Nowowiejska'
UNION
SELECT Sells.bar
FROM Sells
JOIN Drinks
ON Sells.drink = Drinks.drink
WHERE Drinks.type = 'Mineral Water'
AND Sells.price < 3
Note that UNION ALL preserves duplicates, while UNION does not.
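A quick way to see the UNION / UNION ALL difference is to run both forms of the "two separate queries" idea. This sketch uses Python's sqlite3 with the sample rows from the T-SQL example earlier plus one extra sale for bar A (an assumption made so that A satisfies both branches); the ORDER BY 1 is only there for a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bars (bar TEXT PRIMARY KEY, address TEXT);
CREATE TABLE Drinks (drink TEXT PRIMARY KEY, type TEXT);
CREATE TABLE Sells (bar TEXT, drink TEXT, price REAL);
INSERT INTO Bars VALUES ('A', 'Nowowiejska'), ('B', 'Oak Creek'), ('C', 'Greenfield');
INSERT INTO Drinks VALUES ('San Pellegrino', 'Mineral Water');
INSERT INTO Sells VALUES ('A', 'San Pellegrino', 2.50), ('B', 'San Pellegrino', 2.99),
                         ('C', 'San Pellegrino', 3.50);
""")

branches = """
SELECT Bars.bar FROM Bars WHERE Bars.address = 'Nowowiejska'
{op}
SELECT Sells.bar
FROM Sells JOIN Drinks ON Sells.drink = Drinks.drink
WHERE Drinks.type = 'Mineral Water' AND Sells.price < 3
ORDER BY 1
"""

# UNION removes the second copy of bar A; UNION ALL keeps it.
union_rows = conn.execute(branches.format(op="UNION")).fetchall()
union_all_rows = conn.execute(branches.format(op="UNION ALL")).fetchall()

print(union_rows)
print(union_all_rows)
```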
The way to know if you require distinct in your query, which IMO is not common, is to understand what constrains the rows in the tables to be unique, and following from that what effect your joins will have relative to that uniqueness.
Example: if I select bars from a bars table and the table constrains them to be unique, then by definition I never need DISTINCT for that select.
However, if I join that set to another table, then the join logic enters into the problem, and I have to understand the effect of the join on how many values are generated.
Lastly, separate the idea of an actual join ( tablea inner join tableb on ... ) from an existence check, aka semi-join ( from tablea where exists ( select * from tableb ...) ). It's very common for people starting out to write an inner join, which fetches ALL the matches, where they perhaps only needed to check whether rows exist, which an existence check does without fetching every match. If you rely on an inner join for this, you will get more rows than you probably need and may end up with DISTINCT as a workaround, whereas EXISTS would typically perform better and also remove the need for DISTINCT in the first place.
For example, bars that sell mineral water might be something like: bars where exists ( select * from drinks ... where <some criteria> )
Aside: in many cases, count() is also a poor substitute for exists() when you just want to test whether there are any matching rows.
My question is: how can I tell that I need to use DISTINCT in my query?
IF EXISTS(
SELECT BARS.bar
FROM SELLS, BARS, DRINKS
WHERE BARS.bar = SELLS.bar
AND DRINKS.drink = SELLS.drink
AND BARS.address = 'Nowowiejska'
AND DRINKS.type = 'Mineral Water'
AND SELLS.price < 3
GROUP BY BARS.bar
HAVING COUNT(*) > 1
)
SELECT DISTINCT_OR_NOT_DISTINCT ='You need DISTINCT here'
ELSE
SELECT DISTINCT_OR_NOT_DISTINCT ='You dont need DISTINCT here'
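The same idea can be wrapped into a general check that works for any query: compare the row count with the DISTINCT row count. This is a sketch in Python's sqlite3; `needs_distinct` is a hypothetical helper, and like the IF EXISTS check above it only inspects the data currently stored, so a False result does not prove that duplicates are impossible.

```python
import sqlite3


def needs_distinct(conn: sqlite3.Connection, query: str) -> bool:
    """Return True if `query` yields duplicate rows on the data currently stored.

    `query` is interpolated as a subquery, so it must be trusted SQL with no
    trailing semicolon.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM ({query})").fetchone()[0]
    unique = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM ({query}))"
    ).fetchone()[0]
    return total != unique


# Hypothetical data in which bar A sells two cheap mineral waters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bars (bar TEXT PRIMARY KEY, address TEXT);
CREATE TABLE Drinks (drink TEXT PRIMARY KEY, type TEXT);
CREATE TABLE Sells (bar TEXT, drink TEXT, price REAL);
INSERT INTO Bars VALUES ('A', 'Nowowiejska'), ('B', 'Oak Creek');
INSERT INTO Drinks VALUES ('San Pellegrino', 'Mineral Water'), ('Evian', 'Mineral Water');
INSERT INTO Sells VALUES ('A', 'San Pellegrino', 2.50), ('A', 'Evian', 2.80);
""")

bars_query = """
SELECT Bars.bar
FROM Sells
JOIN Bars ON Bars.bar = Sells.bar
JOIN Drinks ON Drinks.drink = Sells.drink
WHERE Bars.address = 'Nowowiejska' AND Drinks.type = 'Mineral Water' AND Sells.price < 3
"""
join_has_dupes = needs_distinct(conn, bars_query)
# bar is the primary key of Bars, so this can never produce duplicates.
pk_has_dupes = needs_distinct(conn, "SELECT bar FROM Bars")
print(join_has_dupes, pk_has_dupes)
```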
My question seems pretty simple, but unfortunately I couldn't find any satisfactory answer to it.
Please take a look at the following example:
Table Author defines an author, each Document has at least one author, and table Authors groups all the authors that a document has.
My question is: each time I insert a document, I need to verify whether my group of authors already exists in table Authors.
What is the best way to do this?
I suppose that I first need to verify whether this group already exists, and then either get its id (if it already exists) or create a new record in Authors (if the group doesn't exist).
Question:
Is this the correct logic for tables with a many-to-many multiplicity?
How can I check whether a group of values already exists in table Authors?
There is something like select * from Authors where a_id IN(1,2,3), but I need it in an exclusive way (only groups made up of exactly those authors). :S
Any help would be great.
Thanks
I would rather go with a solution with three tables:
Author
Document
rel_author_document
And rel_author_document will have a structure like:
author_fk
document_fk
You don't need to add a new group; you just need to associate authors with a document.
In rel_author_document you can even add additional columns like role, if you need something like that.
With this approach you can have two documents with the same group of authors, and this won't hurt your performance.
In case your question is for a homework assignment and you can't change table structure then:
Query the Authors table to see whether you have a group of authors with the same number of author_ids as in the WHERE condition, something like:
select count(1) cnt
from Authors
where a_id IN(1,2,3)
If cnt is different from the number of author_ids, then insert a new group
If cnt is equal then get the id of that group of Authors:
select ID
from Authors
where a_id = 1
or a_id = 2
or a_id = 3
group by 1
having count(1) = 3
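One caveat with the queries above: counting only the rows that match IN (1,2,3) also accepts a group that contains those three authors plus others. The sketch below, using Python's sqlite3 with an Authors(ID, a_id) table and invented rows, shows that superset match and one way to require an exact set: compare the group's total size as well as how many of its members are in the wanted list. It assumes each (ID, a_id) pair is stored once; `find_group` is a hypothetical helper, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (ID INTEGER, a_id INTEGER, PRIMARY KEY (ID, a_id));
INSERT INTO Authors VALUES
  (10, 1), (10, 2), (10, 3),
  (11, 1), (11, 2), (11, 3), (11, 4),
  (12, 1), (12, 2);
""")

# Filtering first with IN matches group 10 and also the superset group 11.
naive = conn.execute("""
SELECT ID FROM Authors WHERE a_id IN (1, 2, 3)
GROUP BY ID HAVING COUNT(*) = 3 ORDER BY ID
""").fetchall()


def find_group(conn, author_ids):
    """Return the ID of a group whose authors are exactly `author_ids`, else None."""
    ids = sorted(set(author_ids))
    placeholders = ", ".join("?" for _ in ids)
    # Whole group must have len(ids) members AND all of them must be wanted ones.
    sql = f"""
        SELECT ID
        FROM Authors
        GROUP BY ID
        HAVING COUNT(*) = ?
           AND SUM(CASE WHEN a_id IN ({placeholders}) THEN 1 ELSE 0 END) = ?
    """
    row = conn.execute(sql, [len(ids), *ids, len(ids)]).fetchone()
    return row[0] if row else None


print(naive)
print(find_group(conn, [1, 2, 3]), find_group(conn, [2, 4]))
```

If `find_group` returns None, the caller would insert a new group; otherwise it reuses the returned ID.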
If it's a "many-to-many" relationship, I believe the table structure you are looking for is:
Author (
a_id,
name,
birth_date,
domain)
Document (
d_id,
abstract,
year)
Auth_Doc (
a_id,
d_id
)
In terms of checking whether a group of authors already exists, it depends on how you're storing your data. If I were uploading a document and wanted to check whether a set of authors exists, I would do the following:
Say your data is stored as:
a_id | name
1 | john, jack, bob
SELECT * FROM Author WHERE name like '%john%'
I hope this helps,
Sohail
I have one table called gallery. For each row in gallery there are several rows in the table picture. One picture belongs to one gallery. Then there is the table vote. There each row is an upvote or a downvote for a certain gallery.
Here is the (simplified) structure:
gallery ( gallery_id )
picture ( picture_id, picture_gallery_ref )
vote ( vote_id, vote_value, vote_gallery_ref )
Now I want one query to give me the following information: all galleries with their own data fields, the number of pictures connected to each gallery, and the summed value of its votes.
Here is my query, but due to the multiple joining the aggregated values are not the right ones. (At least when there is more than one row of either pictures or votes.)
SELECT
*, SUM( vote_value ) as score, COUNT( picture_id ) AS pictures
FROM
gallery
LEFT JOIN
vote
ON gallery_id = vote_gallery_ref
LEFT JOIN
picture
ON gallery_id = picture_gallery_ref
GROUP BY gallery_id
Because I have noticed that COUNT( DISTINCT picture_id ) gives me the correct number of pictures I tried this:
( SUM( vote_value ) / GREATEST( COUNT( DISTINCT picture_id ), 1 ) ) AS score
It works in this example, but what if there were more joins in one query?
Just want to know whether there is a better or more 'elegant' way this problem can be solved. Also I'd like to know whether my solution is MySQL-specific or standard SQL?
This quote from William of Ockham applies here:
Entia non sunt multiplicanda praeter necessitatem
(Latin for "entities are not to be multiplied beyond necessity").
You should reconsider why you need this to be done in a single query. It's true that a single query has less overhead than multiple queries, but if that single query becomes too complex, both for you to develop and for the RDBMS to execute, then run separate queries.
Or just use subqueries...
I don't know if this is valid MySQL syntax, but you might be able to do something similar to:
SELECT
gallery.*, a.score, b.pictures
FROM gallery
LEFT JOIN
(
select vote_gallery_ref, sum(vote_value) as score
from vote
group by vote_gallery_ref
) a ON gallery_id = vote_gallery_ref
LEFT JOIN
(
select picture_gallery_ref, count(picture_id) as pictures
from picture
group by picture_gallery_ref
) b ON gallery_id = picture_gallery_ref
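Why pre-aggregating in subqueries fixes the totals: joining vote and picture directly multiplies each gallery's rows (votes × pictures) before SUM and COUNT run. A sketch with Python's sqlite3 and the simplified schema from the question; the rows are made up, and COALESCE is added so empty galleries show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gallery (gallery_id INTEGER PRIMARY KEY);
CREATE TABLE picture (picture_id INTEGER PRIMARY KEY, picture_gallery_ref INTEGER);
CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, vote_value INTEGER, vote_gallery_ref INTEGER);
INSERT INTO gallery VALUES (1), (2);
INSERT INTO picture VALUES (1, 1), (2, 1);
INSERT INTO vote VALUES (1, 1, 1), (2, 1, 1), (3, -1, 1);
""")

# Fan-out: gallery 1 becomes 3 votes x 2 pictures = 6 joined rows.
naive = conn.execute("""
SELECT gallery_id, SUM(vote_value) AS score, COUNT(picture_id) AS pictures
FROM gallery
LEFT JOIN vote ON gallery_id = vote_gallery_ref
LEFT JOIN picture ON gallery_id = picture_gallery_ref
GROUP BY gallery_id
ORDER BY gallery_id
""").fetchall()

# Each subquery returns at most one row per gallery, so nothing is multiplied.
fixed = conn.execute("""
SELECT g.gallery_id, COALESCE(a.score, 0) AS score, COALESCE(b.pictures, 0) AS pictures
FROM gallery AS g
LEFT JOIN (SELECT vote_gallery_ref, SUM(vote_value) AS score
           FROM vote GROUP BY vote_gallery_ref) AS a
  ON g.gallery_id = a.vote_gallery_ref
LEFT JOIN (SELECT picture_gallery_ref, COUNT(picture_id) AS pictures
           FROM picture GROUP BY picture_gallery_ref) AS b
  ON g.gallery_id = b.picture_gallery_ref
ORDER BY g.gallery_id
""").fetchall()

print(naive)
print(fixed)
```

This pattern is plain standard SQL rather than a MySQL feature, and it keeps working as more one-to-many tables are added, since each gets its own pre-aggregated subquery.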
How often do you add/change vote records?
How often do you add/remove picture records?
How often do you run this query for these totals?
It might be better to create total fields on the gallery table (total_pictures, total_votes, total_vote_values).
When you add or remove a record on the picture table you also update the total on the gallery table. This could be done using triggers on the picture table to automatically update the gallery table. It could also be done using a transaction combining two SQL statements to update the picture table and the gallery table. When you add a record on the picture table increment the total_pictures field on the gallery table. When you delete a record on the picture table decrement the total_pictures field.
Similarly, when a vote record is added or removed, or its vote_value changes, you update the total_votes and total_vote_values fields. Adding a record increments total_votes and adds its vote_value to total_vote_values. Deleting a record decrements total_votes and subtracts its vote_value from total_vote_values. Updating vote_value on a vote record should also update total_vote_values by the difference (subtract the old value, add the new value).
Your query now becomes trivial - it's just a straightforward query from the gallery table. But this is at the expense of more complex updates to the picture and vote tables.
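A minimal sketch of the trigger-based bookkeeping described above, using SQLite triggers through Python's sqlite3 (trigger syntax differs between databases, so treat this as illustrative). The trigger names and sample rows are hypothetical, and the update trigger assumes a vote never moves to a different gallery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gallery (gallery_id INTEGER PRIMARY KEY,
                      total_pictures INTEGER NOT NULL DEFAULT 0,
                      total_votes INTEGER NOT NULL DEFAULT 0,
                      total_vote_values INTEGER NOT NULL DEFAULT 0);
CREATE TABLE picture (picture_id INTEGER PRIMARY KEY, picture_gallery_ref INTEGER);
CREATE TABLE vote (vote_id INTEGER PRIMARY KEY, vote_value INTEGER, vote_gallery_ref INTEGER);

CREATE TRIGGER picture_ins AFTER INSERT ON picture BEGIN
  UPDATE gallery SET total_pictures = total_pictures + 1
  WHERE gallery_id = NEW.picture_gallery_ref;
END;
CREATE TRIGGER picture_del AFTER DELETE ON picture BEGIN
  UPDATE gallery SET total_pictures = total_pictures - 1
  WHERE gallery_id = OLD.picture_gallery_ref;
END;
CREATE TRIGGER vote_ins AFTER INSERT ON vote BEGIN
  UPDATE gallery SET total_votes = total_votes + 1,
                     total_vote_values = total_vote_values + NEW.vote_value
  WHERE gallery_id = NEW.vote_gallery_ref;
END;
CREATE TRIGGER vote_del AFTER DELETE ON vote BEGIN
  UPDATE gallery SET total_votes = total_votes - 1,
                     total_vote_values = total_vote_values - OLD.vote_value
  WHERE gallery_id = OLD.vote_gallery_ref;
END;
-- Apply only the difference: subtract the old value, add the new one.
CREATE TRIGGER vote_upd AFTER UPDATE OF vote_value ON vote BEGIN
  UPDATE gallery SET total_vote_values = total_vote_values - OLD.vote_value + NEW.vote_value
  WHERE gallery_id = NEW.vote_gallery_ref;
END;

INSERT INTO gallery (gallery_id) VALUES (1);
INSERT INTO picture VALUES (1, 1), (2, 1), (3, 1);
DELETE FROM picture WHERE picture_id = 3;
INSERT INTO vote VALUES (1, 1, 1), (2, -1, 1), (3, 5, 1);
UPDATE vote SET vote_value = 1 WHERE vote_id = 2;
DELETE FROM vote WHERE vote_id = 1;
""")

# The "trivial" read: totals come straight from the gallery row.
totals = conn.execute(
    "SELECT total_pictures, total_votes, total_vote_values FROM gallery WHERE gallery_id = 1"
).fetchone()

# Recompute from the base tables to confirm the triggers kept the totals in sync.
recount = (
    conn.execute("SELECT COUNT(*) FROM picture WHERE picture_gallery_ref = 1").fetchone()[0],
    conn.execute("SELECT COUNT(*) FROM vote WHERE vote_gallery_ref = 1").fetchone()[0],
    conn.execute("SELECT SUM(vote_value) FROM vote WHERE vote_gallery_ref = 1").fetchone()[0],
)
print(totals, recount)
```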
As Bill Karwin said, doing this all within one query is pretty ugly.
But, if you have to do it, joining and selecting non-aggregate data with aggregate data requires joining against subqueries (I haven't used SQL that much in the past few years so I actually forgot the proper term for this).
Let's assume your gallery table has additional fields name and state:
select g.gallery_id, g.name, g.state, i.num_pictures, j.sum_vote_values
from gallery g
inner join (
select g.gallery_id, count(p.picture_id) as 'num_pictures'
from gallery g
left join picture p on g.gallery_id = p.picture_gallery_ref
group by g.gallery_id) as i on g.gallery_id = i.gallery_id
left join (
select g.gallery_id, sum(v.vote_value) as 'sum_vote_values'
from gallery g
left join vote v on g.gallery_id = v.vote_gallery_ref
group by g.gallery_id
) as j on g.gallery_id = j.gallery_id
This will yield a result set that looks like:
gallery_id, name, state, num_pictures, sum_vote_values
1, 'Gallery A', 'NJ', 4, 19
2, 'Gallery B', 'NY', 3, 32
3, 'Empty gallery', 'CT', 0, NULL
I'm creating a small forum.
I'm attempting to run a SELECT ... JOIN ... query to pick up information on the individual posts, plus the last reply (if any). As part of my desire to do everything the hard way, this covers five tables (only the columns relevant to this issue are listed):
commentInfo: referenceID | referenceType | authorID | create
postit: id | title
postitInfo: referencePostitID | create | authorID
user: id | username | permission
userInfo: referenceUserID | title
So, I run this SELECT ... JOIN ... query to get the most recent topics and their last replies:
SELECT DISTINCT
t1.id, t1.title, t2.create, t2.lastEdit, t2.authorID, t3.username,
t4.title AS userTitle, t3.permission, t5.create AS commentCreate,
t5.authorID AS commentAuthor, t6.username AS commentUsername,
t6.permission AS commentPermission
FROM rantPostit AS t1
LEFT JOIN (rantPostitInfo AS t2)
ON ( t1.id = t2.referencePostitID)
LEFT OUTER JOIN (rantUser as t3, rantUserInfo as t4)
ON (t2.authorId = t3.id AND t4.referenceUserId = t2.authorId)
LEFT OUTER JOIN (rantCommentInfo as t5, rantUser as t6)
ON (t5.referenceType = 8 AND t5.referenceID = t1.id AND t6.id = t5.authorID)
ORDER BY t2.create DESC, t5.create DESC
Now, this returns the topic posts. Say I have two of them: it returns both fine. But if I have eight replies to the first, it returns 9 rows (one for each topic + reply pair, plus one for the topic with no replies). So I guess my issue is this: I don't know how to limit the rows returned by the final LEFT OUTER JOIN clause to just the most recent reply, or simply drop the older ones.
(Yes, I realize the ORDER BY... clause is messed up, as it'll first order it by the post create date, then by the comment create date. Yes, I realize I could simplify all my problems by adding two fields into postitInfo, lastCommentCreate and lastCommentCreateID, and have it update each time a reply is made, but... I like the hard way.)
So what am I doing wrong?
Or is this such an inane problem that I should be taken 'round the woodshed and beat with a hammer?
The splits between post and postInfo, and the user and userInfo tables, appear to be doing nothing much here except obfuscate things. To better see solutions, let's boil things down to their essence: a table Posts (with a primary key id, a creation date date, and other fields) and a table Comments (with a primary key id, a foreign key refId referencing Posts, a unique creation date date, and other fields); we want to see all posts, each with its most recent comment if any (the primary keys id of the table rows retrieved, and the other fields, can of course be contextually used in the SELECT to fetch and show more info yet, but that doesn't change the core structure, and simplifying things down to the core structure should help illustrate the solutions). I'm assuming the creation date of a comment is unique, otherwise "latest comment" can be ambiguous (of course, that ambiguity could be arbitrarily truncated in other ways, picking one item of the set of "latest comments" to a given post).
So, here's one approach:
SELECT Posts.id, Comments.id FROM Posts
LEFT OUTER JOIN Comments on (Posts.id = Comments.refId)
WHERE Comments.create IS NULL OR (
Comments.create = (SELECT create FROM Comments
WHERE refID = Posts.id
ORDER BY create DESC
LIMIT 1)
) /* add ORDER BY &c to taste;-) */
The idea: for each post, we want "a null comment" (when there have been no comments on it) or else the comment whose create date is the highest among those referencing the post; here, the inner SELECT takes care of finding that "highest" create date. In the same spirit, the inner select could be SELECT MAX(create) FROM Comments WHERE refID = Posts.id, which is probably preferable (shorter and more direct, and maybe faster).
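A runnable sketch of the MAX variant using Python's sqlite3 and the boiled-down Posts/Comments tables. The creation column is renamed `created` here because CREATE is an SQL keyword, and the timestamps are invented ISO strings so that text comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Comments (id INTEGER PRIMARY KEY,
                       refId INTEGER REFERENCES Posts (id),
                       created TEXT NOT NULL UNIQUE);
INSERT INTO Posts VALUES (1, 'First'), (2, 'Second'), (3, 'Quiet');
INSERT INTO Comments VALUES
  (10, 1, '2024-01-01 10:00'), (11, 1, '2024-01-02 09:00'), (12, 1, '2024-01-01 12:00'),
  (20, 2, '2024-01-03 08:00');
""")

# Keep posts without comments (NULL side of the outer join) or, otherwise,
# only the comment whose creation time equals that post's latest one.
latest = conn.execute("""
SELECT Posts.id, Comments.id
FROM Posts
LEFT OUTER JOIN Comments ON Posts.id = Comments.refId
WHERE Comments.created IS NULL
   OR Comments.created = (SELECT MAX(c2.created)
                          FROM Comments AS c2
                          WHERE c2.refId = Posts.id)
ORDER BY Posts.id
""").fetchall()

print(latest)
```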
It looks like the last LEFT JOIN is the only one that can return multiple rows. If that's true, you can just use LIMIT 5 to get the last five comments:
ORDER BY t5.create DESC
LIMIT 5
If not, a very simple solution would be to retrieve the comments with a separate query:
SELECT *
FROM rantCommentInfo t5
LEFT OUTER JOIN rantUser t6
ON t6.id = t5.authorID
WHERE t5.referenceType = 8
AND t5.referenceID = YourT1Id
ORDER BY t5.create DESC
LIMIT 5
I can't think of a way to do it in one query without ROW_NUMBER, which MySQL did not support before version 8.0.
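On engines that do have window functions (MySQL 8.0+, SQLite 3.25+), the "last N comments per post" query can be written in one statement with ROW_NUMBER. This is a sketch using Python's sqlite3 and the simplified Posts/Comments tables, with `created` standing in for the keyword column `create` and invented sample rows.

```python
import sqlite3

if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("window functions need SQLite 3.25 or newer")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE Comments (id INTEGER PRIMARY KEY, refId INTEGER, created TEXT NOT NULL);
INSERT INTO Posts VALUES (1, 'First'), (2, 'Second'), (3, 'Quiet');
INSERT INTO Comments VALUES
  (10, 1, '2024-01-01 10:00'), (11, 1, '2024-01-02 09:00'), (12, 1, '2024-01-01 12:00'),
  (20, 2, '2024-01-03 08:00');
""")

# Number each post's comments newest-first, then keep rn <= N in the ON clause
# so posts without comments still appear once with a NULL comment id.
last_n = conn.execute("""
SELECT p.id, ranked.id
FROM Posts AS p
LEFT JOIN (SELECT c.refId, c.id,
                  ROW_NUMBER() OVER (PARTITION BY c.refId ORDER BY c.created DESC) AS rn
           FROM Comments AS c) AS ranked
  ON ranked.refId = p.id AND ranked.rn <= ?
ORDER BY p.id, ranked.rn
""", (2,)).fetchall()

print(last_n)
```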