SQL difference between IN and JOIN - sql

First I need to say that it is safe to assume that I have no formal education in SQL although I have education in relational algebra.
I am investigating what would be the best approach to the following problem.
Our database is holding texts and keywords for every text.
Articles
id | text
Keywords
id | word
Articles_keywords
id_article | id_keyword
For the sake of this question, whoever answers can assume that the tables are indexed however they like.
So the problem is getting all articles that have a specific keyword.
I have talked with two groups of people who solve this in two different ways, and each claims that the other group's approach is wrong.
First solution using the IN operator:
SELECT * FROM Articles AS a WHERE a.id IN
(SELECT id_article FROM Articles_Keywords AS ak WHERE ak.id_keyword IN
(SELECT id FROM keywords AS k WHERE k.word = 'xyz'));
The other solution, of course, uses the JOIN operator:
SELECT * FROM Articles as a
JOIN Articles_Keywords as ak
ON a.id = ak.id_article
JOIN Keywords as k
ON k.id = ak.id_keyword
WHERE k.word = 'xyz';
Which approach is better and, above all, why?
Edit
In the Articles table the id column is unique and, just for the sake of this question, we can assume that there are no duplicate texts.
The same goes for the Keywords table.
In the Articles_keywords table the ordered pair (id_article, id_keyword) is unique.
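To make the comparison concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. The sample rows and the UNIQUE constraint on word are assumptions based on the Edit above, and only a.id is selected so the two results are directly comparable (with SELECT * the JOIN version would also return the columns of the other two tables). Under the stated uniqueness constraints both queries return the same articles; this says nothing about which one a given engine executes faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE Keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE Articles_Keywords (
        id_article INTEGER REFERENCES Articles(id),
        id_keyword INTEGER REFERENCES Keywords(id),
        PRIMARY KEY (id_article, id_keyword)  -- the pair is unique
    );
    -- Made-up sample data
    INSERT INTO Articles VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO Keywords VALUES (1, 'xyz'), (2, 'abc');
    INSERT INTO Articles_Keywords VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

in_query = """
    SELECT a.id FROM Articles AS a WHERE a.id IN
      (SELECT id_article FROM Articles_Keywords AS ak WHERE ak.id_keyword IN
        (SELECT id FROM Keywords AS k WHERE k.word = ?))
    ORDER BY a.id
"""
join_query = """
    SELECT a.id FROM Articles AS a
    JOIN Articles_Keywords AS ak ON a.id = ak.id_article
    JOIN Keywords AS k ON k.id = ak.id_keyword
    WHERE k.word = ?
    ORDER BY a.id
"""

in_ids = [row[0] for row in conn.execute(in_query, ("xyz",))]
join_ids = [row[0] for row in conn.execute(join_query, ("xyz",))]
print(in_ids, join_ids)
```

If word were not unique, or the pair (id_article, id_keyword) could repeat, the JOIN version could return the same article more than once, while the IN version never would.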

Related

How to select ALL results from a SQL query and eliminate the null values

Imagine I have four tables:
Agents
| agent_id | agent_name |
Teams
| team_id | team_name |agent_id |
Menu
| menu_id | menu_name |
Team_assignment
| menu_id | team_id|
I need to write a query that selects all agents that are assigned to all teams and all queues and disregards the ones that are not assigned to a queue. Note that every agent is always assigned to a team, but an agent is not necessarily assigned to a queue.
Since you stated that this is for a school project, I'll try to stay within the guidelines mentioned here: How do I ask and answer homework questions?
From what I can make out of your question, you basically want to select all the data from the different tables, joining them where a column from the first table a equals = a column from the second table b. Most commonly this is where the primary key of one table equals the foreign key of another table. Then you want to add conditions to your query, for example that some column from table 1 equals = some value.
Do you catch my drift? 😏
No?
You want to SELECT a.*, b.* (everything) FROM table Agents a, JOINing table Teams b ON column a.agent_id being equal to = column b.agent_id.
You probably want to JOIN another table, let's say Team_assignment c, ON column c.team_id being equal to = b.team_id.
You can JOIN more tables in the same way.
Sadly, I do not understand what you mean by "the ones that are not assigned to a queue", but it sounds like a condition that your query needs to match, so WHERE the hypothetical column a.is_assigned_to_queue equals = true AND, for example, a.agent_name IS NOT NULL.
If you got this far you should have caught my drift 😎, congrats. Hopefully this way you also got a better understanding of how building a query works, instead of me just handing you the answer and you learning nothing from it. Like this:
SELECT a.*, b.*, c.*, d.* FROM Agents a
JOIN Teams b ON a.agent_id = b.agent_id
JOIN Team_assignment c ON c.team_id = b.team_id
JOIN Menu d ON d.menu_id = c.menu_id
WHERE a.is_assigned_to_queue = true
AND a.agent_name IS NOT NULL;
Now, it is possible that copying and pasting the snippet above will not work. That is because I'm not an SQL expert and had to refresh my old memories of SQL by googling it, and because is_assigned_to_queue is a column I made up. But that's the nice part of actually learning it: being able to explain it to someone else :)
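As a rough check of the joins above, here is a small sketch using Python's sqlite3 module with made-up rows. It assumes that Menu holds the queues and that Team_assignment links teams to them, which the question does not state outright. It leaves out the hypothetical is_assigned_to_queue column, because an inner join already drops every agent whose team has no row in Team_assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Agents (agent_id INTEGER PRIMARY KEY, agent_name TEXT);
    CREATE TABLE Teams (team_id INTEGER PRIMARY KEY, team_name TEXT, agent_id INTEGER);
    CREATE TABLE Menu (menu_id INTEGER PRIMARY KEY, menu_name TEXT);
    CREATE TABLE Team_assignment (menu_id INTEGER, team_id INTEGER);
    -- Made-up data: Ann's team has a queue, Bob's team does not
    INSERT INTO Agents VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Teams VALUES (10, 'Support', 1), (20, 'Sales', 2);
    INSERT INTO Menu VALUES (100, 'Billing queue');
    INSERT INTO Team_assignment VALUES (100, 10);
""")

rows = conn.execute("""
    SELECT a.agent_name, b.team_name, d.menu_name
    FROM Agents a
    JOIN Teams b ON a.agent_id = b.agent_id
    JOIN Team_assignment c ON c.team_id = b.team_id
    JOIN Menu d ON d.menu_id = c.menu_id
    ORDER BY a.agent_id
""").fetchall()
print(rows)  # Bob is absent: his team has no Team_assignment row
```

If the agents without a queue should be kept and shown with NULLs instead, the last two joins would need to be LEFT JOINs.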

Finding entries with all relations in a relational database table

I am making a relational database using tags. The database has three tables:
object
match
tag
where match is a simple relation between an object and a tag (i.e. each entry consists of a primary key and two foreign keys). I want to structure a query where I can find all objects with all given tags, but am uncertain how to do it.
For instance, these are the three tables:
Object
Death becomes her
Billy Madison
Tag
Comedy
Horror
Match
1 | 1
1 | 2
2 | 1
Given that someone wants a horror-comedy, how do I structure the query to find only the objects with all matches? I realize this is elementary but I genuinely haven't found any answers. If the whole schema is off naturally feel free to point that out.
For the record I'm using Python, SQLAlchemy, and SQLite. Currently I've made a list of all tag IDs to find in Match.
Edit: For any future reference, I used astentx' solution with a slight modification to the query in order to access data from object right away:
select object.Length, object.title
from object
join match
on object.id = match.object
join tag
on match.tag = tag.id
join filter_tags
on tag.name = filter_tags.word
You can pass all your tags as an array and use the carray() function, or pass them as a comma-separated string and transform it into a table.
Then, for the AND condition, select the objects whose number of matching tags equals the number of tags you asked for:
select relation.obj_id
from relation
join tags
on relation.tag_id = tags.id
join <generated table>
on tags.value = <generated table>.value
group by relation.obj_id
having count(1) = (select count(1) from <generated table>)
Fiddle here.
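For anyone who wants to try this without the carray() extension, here is a minimal sketch in Python with sqlite3, using the sample data from the question. The temp table wanted stands in for <generated table>; that name, the objects table name, and the helper objects_with_all_tags are made up for illustration. The sketch also assumes (obj_id, tag_id) is unique in relation, otherwise the count comparison would be thrown off by duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE relation (obj_id INTEGER, tag_id INTEGER, UNIQUE (obj_id, tag_id));
    INSERT INTO objects VALUES (1, 'Death becomes her'), (2, 'Billy Madison');
    INSERT INTO tags VALUES (1, 'Comedy'), (2, 'Horror');
    INSERT INTO relation VALUES (1, 1), (1, 2), (2, 1);
    -- Hypothetical stand-in for <generated table>
    CREATE TEMP TABLE wanted (value TEXT PRIMARY KEY);
""")


def objects_with_all_tags(conn, tag_values):
    """Return titles of objects that carry every tag in tag_values."""
    conn.execute("DELETE FROM wanted")
    conn.executemany(
        "INSERT OR IGNORE INTO wanted VALUES (?)", [(v,) for v in tag_values]
    )
    rows = conn.execute("""
        SELECT objects.title
        FROM relation
        JOIN tags ON relation.tag_id = tags.id
        JOIN wanted ON tags.value = wanted.value
        JOIN objects ON objects.id = relation.obj_id
        GROUP BY relation.obj_id, objects.title
        HAVING COUNT(*) = (SELECT COUNT(*) FROM wanted)
        ORDER BY objects.title
    """).fetchall()
    return [title for (title,) in rows]


print(objects_with_all_tags(conn, ["Comedy", "Horror"]))
print(objects_with_all_tags(conn, ["Comedy"]))
```

The GROUP BY / HAVING part is what turns "has any of these tags" into "has all of these tags": only objects that matched as many rows as there are wanted tags survive.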

simple sql query containing 'in'-clause, checking for better syntax

I'm wondering about the best way to perform a SQL query.
I have a table which contains SUBJECTS which are related to ARTICLES (each article contains at least 2 SUBJECTS).
The user is searching for two SUBJECTS and I need to reply with all articles.
I have a table which looks like this:
SubjectID ---> key
ArticleID
My query is as follows:
SELECT ArticleID
FROM tblSubjectsInArticles
WHERE SubjectID = #pSubjectID1
AND ArticleID IN (SELECT ArticleID FROM tblSubjectsInArticles WHERE SubjectID = #pSubjectID2);
I have a feeling that there is a better way to perform this task, maybe a more efficient query or a different data structure. Maybe one of you knows a better way, or can reassure me that this is the best way. Thanks.
select distinct s1.ArticleID
from tblSubjectsInArticles s1
join tblSubjectsInArticles s2
on s1.ArticleID = s2.ArticleID
where s1.SubjectID = #pSubjectID1
and s2.SubjectID = #pSubjectID2
So you would join the table to itself on ArticleID and filter each copy on one of the two subjects. The DISTINCT makes sure your list contains only unique ArticleIDs.
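Here is a quick sketch that runs the original IN query and the self-join side by side in SQLite from Python. The rows are made up, the composite primary key is an assumption (the question only marks SubjectID as a key), and named parameters :s1 and :s2 stand in for the two subject parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblSubjectsInArticles (
        SubjectID INTEGER,
        ArticleID INTEGER,
        PRIMARY KEY (SubjectID, ArticleID)  -- assumed composite key
    );
    INSERT INTO tblSubjectsInArticles VALUES
        (1, 100), (2, 100), (1, 200), (3, 200), (2, 300), (1, 400), (2, 400);
""")

in_sql = """
    SELECT ArticleID
    FROM tblSubjectsInArticles
    WHERE SubjectID = :s1
      AND ArticleID IN (SELECT ArticleID FROM tblSubjectsInArticles
                        WHERE SubjectID = :s2)
    ORDER BY ArticleID
"""
join_sql = """
    SELECT DISTINCT s1.ArticleID
    FROM tblSubjectsInArticles s1
    JOIN tblSubjectsInArticles s2 ON s1.ArticleID = s2.ArticleID
    WHERE s1.SubjectID = :s1 AND s2.SubjectID = :s2
    ORDER BY s1.ArticleID
"""

params = {"s1": 1, "s2": 2}
in_ids = [row[0] for row in conn.execute(in_sql, params)]
join_ids = [row[0] for row in conn.execute(join_sql, params)]
print(in_ids, join_ids)
```

With a unique (SubjectID, ArticleID) pair the DISTINCT is not strictly needed, since each article can match each subject at most once.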

What is the difference between implicit/explicit joins? [duplicate]

Possible Duplicate:
Explicit vs implicit SQL joins
I understand that lots of people will shout at me now. But from my understanding
Say I have two tables
STUDENTS
student_id
firstname
surname
COURSES
course_id
name
student_id
So the courses table has a foreign key STUDENT_ID meaning that ONE student can have MANY courses yes?
OKAY.
From my understanding, if I want to select all the courses associated with ONE student I could do either of these:
SELECT *
FROM courses AS c, students AS s
WHERE c.student_id = s.student_id
AND s.student_id = 1;
OR
SELECT *
FROM courses AS c
JOIN students AS s ON c.student_id = s.student_id AND s.student_id = 1;
So what's the point of the JOIN when it's essentially EXACTLY the same as the WHERE?
I know my understanding is WRONG but I cannot find a simple answer.
Please enlighten me!
A FOREIGN KEY makes your life simpler when you insert data, because the database checks the integrity of the data for you. It was never meant to help you when retrieving data from the relations.
e.g. if you try to insert a course with student_id = 10 when no such student exists, the foreign key constraint will reject the insert.
For an inner join, JOIN ... ON is exactly the same as listing both tables in FROM and putting the condition in WHERE. Have a look at this question.
In short: there is no difference.
Longer explanation: the relational model is based on the "cartesian product". In the query
SELECT a.x , b.y
FROM table_a a, table_b b
;
every possible combination of rows from table_a and table_b is produced. If a contains 10 rows and b contains 100 rows, you would get 1000 rows. Everything you add to the WHERE clause restricts these results to only the pairs of rows that satisfy the WHERE clause. So in
SELECT a.x , b.y, ...
FROM table_a a, table_b b
WHERE a.x = b.y
;
you would get everything except the rows for which NOT (a.x = b.y).
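The row counts are easy to check. A small sketch with Python's sqlite3, filling table_a with 10 rows and table_b with 100 rows (the values 0..9 and 0..99 are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (x INTEGER)")
conn.execute("CREATE TABLE table_b (y INTEGER)")
conn.executemany("INSERT INTO table_a VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO table_b VALUES (?)", [(i,) for i in range(100)])

# No WHERE clause: every row of table_a paired with every row of table_b
all_pairs = conn.execute(
    "SELECT COUNT(*) FROM table_a a, table_b b"
).fetchone()[0]

# The WHERE clause keeps only the pairs that satisfy a.x = b.y
matching = conn.execute(
    "SELECT COUNT(*) FROM table_a a, table_b b WHERE a.x = b.y"
).fetchone()[0]

print(all_pairs, matching)
```

Conceptually the filter is applied to the full product; in practice the engine is free to avoid building all 1000 pairs.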
In practice, there are two kinds of WHERE-clause elements: those that relate two tables, and those that compare a column-expression to a constant. The JOIN-clause is a way to specify the first kind of restrictions.
There are some minor differences and complications (NULLs, outer joins), but for the time being the two constructs are equivalent.
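To illustrate both halves of that statement, here is a sketch in Python with sqlite3 and made-up students and courses. The first pair of queries shows the implicit and explicit inner join returning the same rows. The second pair shows the outer-join complication: with a LEFT JOIN, a condition in ON and the same condition in WHERE are no longer interchangeable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, firstname TEXT, surname TEXT);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY, name TEXT, student_id INTEGER);
    INSERT INTO students VALUES (1, 'Ada', 'L'), (2, 'Bob', 'M');
    INSERT INTO courses VALUES (10, 'Math', 1), (11, 'Art', 1), (12, 'Bio', 2);
""")

# Inner join: implicit (comma + WHERE) and explicit (JOIN ... ON) agree
implicit = conn.execute("""
    SELECT c.course_id FROM courses AS c, students AS s
    WHERE c.student_id = s.student_id AND s.student_id = 1
    ORDER BY c.course_id
""").fetchall()
explicit = conn.execute("""
    SELECT c.course_id FROM courses AS c
    JOIN students AS s ON c.student_id = s.student_id AND s.student_id = 1
    ORDER BY c.course_id
""").fetchall()

# Outer join: filtering in ON keeps unmatched students (with NULLs),
# filtering in WHERE removes them after the join
left_on = conn.execute("""
    SELECT s.student_id, c.name FROM students AS s
    LEFT JOIN courses AS c ON c.student_id = s.student_id AND c.name = 'Math'
    ORDER BY s.student_id
""").fetchall()
left_where = conn.execute("""
    SELECT s.student_id, c.name FROM students AS s
    LEFT JOIN courses AS c ON c.student_id = s.student_id
    WHERE c.name = 'Math'
    ORDER BY s.student_id
""").fetchall()

print(implicit, explicit)
print(left_on, left_where)
```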

Selecting only authors who have articles?

I've got two SQL Server tables, authors and articles, where the authors primary key (AuthorID) is a foreign key in the articles table, representing a simple one-to-many relationship between authors and articles. Now here's the problem: I need to issue a full-text search on the authors table based on the first name, last name, and biography columns. The full-text search is working awesome, ranking and all. Now I need to add one more criterion to my search: all the contributors without articles should be ignored by the search. To achieve that I chose to create a view with all the contributors that have articles and search against this view. So I created the view this way:
Create View vw_Contributors_With_Articles
AS
Select * from Authors
Where Authors.ContributorID
IN ( Select Distinct (Articles.ContributorId) From Articles)
It's working, but I really don't like the subquery thing. The join gets me all the redundant authorIDs; I tried DISTINCT but it didn't work with the biography column, as its type is ntext. GROUP BY wouldn't do it for me because I need all the columns, not any aggregate of them.
What do you think guys? How can I improve this?
An EXISTS copes with the potential duplicates when there are multiple articles per author, because each author is returned at most once:
Select * from Authors
Where EXISTS (SELECT *
FROM Articles
WHERE Articles.ContributorId = Authors.ContributorId)
Edit:
To clarify, you cannot use DISTINCT on ntext columns. So you cannot have a JOIN solution unless you use a derived table on articles in the JOIN and avoid joining to articles directly, or you convert the ntext to nvarchar(max).
EXISTS or IN is your only option.
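A quick way to see the duplicate problem is to run both forms on a tiny data set. This sketch uses SQLite through Python rather than SQL Server, so it does not reproduce the ntext restriction; the rows and the Name and ArticleID columns are made up. It only shows that EXISTS yields one row per author while a plain inner join yields one row per article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorId INTEGER);
    -- Ann has two articles, Bob has one, Cy has none
    INSERT INTO Authors VALUES (1, 'Ann', '...'), (2, 'Bob', '...'), (3, 'Cy', '...');
    INSERT INTO Articles VALUES (1, 1), (2, 1), (3, 2);
""")

exists_ids = [row[0] for row in conn.execute("""
    SELECT ContributorID FROM Authors
    WHERE EXISTS (SELECT * FROM Articles
                  WHERE Articles.ContributorId = Authors.ContributorID)
    ORDER BY ContributorID
""")]

join_ids = [row[0] for row in conn.execute("""
    SELECT Authors.ContributorID FROM Authors
    INNER JOIN Articles ON Articles.ContributorId = Authors.ContributorID
    ORDER BY Authors.ContributorID
""")]

print(exists_ids, join_ids)
```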
Edit 2:
...unless you really want to use a JOIN and you have SQL Server 2005 or higher, you can CAST and DISTINCT (aggregate) to avoid multiple rows in the output...
select DISTINCT
Authors.ContributorID,
Authors.AnotherColumn,
CAST(Authors.biography AS nvarchar(max)) AS biography,
Authors.YetAnotherColumn,
...
from
Authors
inner join
Articles on
Articles.ContributorID = Authors.ContributorID
You want an inner join
select
*
from
Authors
inner join
Articles on
Articles.ContributorID = Authors.ContributorID
This will return only authors who have an entry in the Articles table, matched by ContributorID. Note that an author with several articles appears once per article.
Select the distinct ContributorIDs from the Articles table to get the individual authors who have written an article, and join the Authors table to that query, so something like:
select distinct Articles.ContributorID, Authors.*
from Articles
join Authors on Articles.ContributorID = Authors.ContributorID