Is a "join table" the result of a SQL JOIN, or the table between a many-to-many - sql

This is a question about correctly "naming things". Specifically, how do you distinguish between:
the "between" table in a many-to-many relationship (e.g. users, users_questions, questions)
the (temporary) table that is created during a SQL JOIN (e.g. 'SELECT * FROM users INNER JOIN users_questions.user_id ON users.id WHERE users_question.question_id=37016694;`)

Lots of database designers use the term join table in your first sense: to implement a many-to-many relationship between entities. It's also called a junction table, association table, and other things. More info: https://en.wikipedia.org/wiki/Associative_entity
I've never heard the second sense used. (But, hey, I don't get out much. :-) If you're writing documentation, or teaching, I suggest you reserve the word table to mean an actual, physical, table. Avoid using the word table for a resultset unless you qualify it by saying virtual table or some such phrase. That way your readers and students won't waste time trying to find the definitions of these not-really-tables in your schema.

From relational point of view, a JOIN table (a table which resolves many-to-many relationship) is real physical table. Else it wouldn't survive between requests. In my opinion, the term "join table" was coined by MVC "Code First" developers who ignore physical entities in the DB realm, especially if they are not shown in DbContext.
In my opinion, again, we should honor relational realities.
So help me Codd.

Related

SQL relationship table

I have theoretical question about SQL design. When I have 10s of tables, between which I need relations m:n, is better approach to do relation table for each required pair or is it possible (or from performance view better) to have one relation table with columns (id,table1,row1,table2,row2), with integers in it?
I am an Informatics student and based on what I have learned in class it is always better to create a table in between the two tables in order for it to hold the primary keys of each of those two tables. This table will hold the relations like the following:
student: student_id, first_name, last_name
classes: class_id, name, teacher_id
student_classes: class_id, student_id # the relations table
Of course it would be better to have a more detailed example, at least for a smaller scope, which are tables, why do you need that relation etc.
But generally, in my opinion, database design should follow logic of your data, so relations should be where they are needed.
Other thing, if you will have only one table for all relations, it will be kind of bottle neck for the whole application. So then, how critical this situation is, also depends, on how are you going to use it in application (mostly reads, or lots of writes/updates/deletes, how many rows.. etc).
Also this consideration also could be somewhat even dependent on what RDBMS you're using.
Depends on the case, sometimes it's better to follow normal forms for designing (https://www.sqlshack.com/what-is-database-normalization-in-sql-server/), but after years working in software... from experience that is lost with time, the original design gets messy unless there is a team for that.

What is happening under the hood when a relationship is established between tables?

This question is not limited to Power BI, but it will help me explain my problem.
If you have more than one table in Power BI, you can establish a relationship between them by dragging a column from one table to the other like this:
And you can edit that relationship by clicking the occuring line:
And by the way, here are the structures of the two tables:
# Table1
A,B
1,abc
2,def
3,ghi
4,jkl
# Table2
A,C
1,abc
1,def
2,ghi
3,ghit
This works fine since column A in Table1 consists of unique values and can work as a primary key. And now you can head over to the Report tab, set up two tables, and slice and dice at your hearts desire either by clicking directly under A in Table1, or by introducing a slicer:
But the thing is that you can do that without having established a relationship between the tables. Delete the relationshiop under Relationships and go back to Report and select Home > Manage Relationships to see what I mean:
As the dialog box says 'There are no relationships defined yet.' But you can still subset one table by making selections in the other just like before (EDIT: This statement has been proven wrong in the answer from RADO) . I do know that you can highlight the slicer and select Format > Edit Interactions and deselect the tables associated with the slicer. But I'm still puzzled by the whole thing.
So is there something happening under the hood here that I'm not aware of? Or is the relationship between tables really defined by the very contents of the tables - in the sence that the existence of related values accross tables with the existence of a potential primary key (be it natural or synthetic) makes it possible to query them using SQL, dplyr verbs or any other form of querying techniques. And that you really do not need an explicitly defined relationship?
Or put in another way, does the establishment of a Power BI table relationship have a SQL equivalent? Perhaps like the following:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);
I'm sorry If I'm rambling a bit here, but I'm just very confused. And googling has so far only added to the confusion. So thank you for any insights!
Your statement "But you can still subset one table by making selections in the other just like before" is not correct. This is a key issue here.
Relations enable propagation of filter context in Power BI. That's a very loaded phrase, and you will have to learn what it means if you plan to use Power BI. It's the most important concept to understand.
To see what I mean, you will need to write DAX measures and try to manipulate them using your tables. You will immediately see the difference when you have or don't have relations.
How the whole system works (simplified):
PowerBI contains a language called "DAX". You will create measures in DAX, and PowerBI will then translate them into its internal language called xmSQL, which is a special flavor of SQL. In xmSQL, regular connection is translated into LEFT OUTER JOIN, like this:
SELECT SUM(Sales.Amount)
FROM Sales
LEFT OUTER JOIN Customer
ON Sales.Customer_Key = Customer.Customer_Key
By-directional relations are a bit more complex, but conceptually similar.
Overall, when you create relations between tables, you are telling PowerBI engine how to join the tables. The engine then also adds some optimizations to speed up the queries.
Every time you execute a DAX measure, click a slicer or a visual, PowerBI generates multiple xmSQL statements in the background, executes them, and then renders their results as visuals. You can see these SQL queries with some tools such as DAX Studio.
Note that it's not strictly necessary to establish relations between tables in PowerBI. You can imitate the same behavior using DAX (programmatically), but such "virtual" relations are more complex and can be substantially slower.
In the RM (relational model) & ERM (entity-relationship model) tables represent relation(ship)s/association. Hence, relational in "RM" & relationship in "ERM".
FKs (foreign keys) get erroneously called "relationships" in pseudo-ERM methods. A SQL FK constraint says subrows appear elsewhere as PK (primary key) or UNIQUE. A DBMS uses them to disallow invalid updates & to optimize queries.
Power BI "relationships" are not FKs. They are instructions on how to build queries.
When there is a FK we do often want to join on it. So we often want a Power BI relationship when there is a FK.
Create and manage relationships in Power BI Desktop
(See also its Download PDF link for Developer.)
PS We do not need constraints to hold or be declared or be known to query. The constraints (Including PKs, FKs, UNIQUE & cardinalities) are determined by the table meanings--(characteristic) predicates--& what business situations can arise. If constraints hold then we just sometimes get fewer rows than otherwise & some query pairs always return the same results when otherwise they wouldn't.
Foreign keys are not needed to join tables!
Is there any rule of thumb to construct SQL query from a human-readable description?
PS Cross join is inner join with a TRUE condition (or no condition in some DBMSs), period. Whether there is a "relationship" aka FK is irrelevant. If the condition is FK=PK or anything else other than TRUE then it's not a cross join; otherwise it is a cross join whether or not there is a FK between the tables. It's just that we frequently want PK=FK in a condition & tools can & do use the presence of a FK towards a default condition.
CROSS JOIN vs INNER JOIN in SQL Server 2008
You asked "What is happening under the hood?"
The simple answer is "Statements about relationships."
Many well meaning people draw ER diagrams and seem to either forget or be unaware of the fact that their ER diagrams are really "pictures of statements in language."
The problem is ambiguity.
Many well meaning people jump straight to ER diagrams without also expressing the logical statements on which their ER diagrams are based. In effect, this means that the person who draws the ER diagram seems to expect that the "reader" of the ER diagram will be able reconstruct the statements from which the ER diagram was drawn.
Here is an example to illustrate what I mean. My purpose is to show the linguistic basis of the "under the covers" relationship between Students and their Addresses.
So, what's under the covers is language!
A simple diagram
The statements from which the diagram is derived.
A more complex diagram
The statements from which the diagram is derived.

DB: advantages of relations

I always think that the relations between tables are needed to perform cross-table operations, such as join. But I noticed that I can inner join two tables that are not linked at all (hasn't any foreign keys).
So, my questions:
Are some differences (such as speed) in joining linked and not-linked tables?
What are the advantages/disadvantages of using relations bwtween tables?
Thank you in advance.
The primary advantage is that foreign key constraints ensure the relational integrity of the data.. ie it stops you from deleting something that has a related entry in another table
You only get a performance advantage if you create an index on your FK
The FK/PK relationship is a logical feature of the data that would exist even if it were not declared in a given database. You include FKs in a table precisely to establish these logical relationships and to make them visible in a way that makes useful inner joins possible. Declaring an FK as referencing a given PK has the advantage, as said in other answers, of preventing orphaned references, rows that reference a non existent PK.
Indexes can speed up joins. In a complicated query, the optimizer may have a lot of strategies to evaluate, and most of these will not use every available index. Good database systems have good optimizers. In most database systems, declaring a PK will create an index behind the scenes. Sometimes, but not always, creating an index on the FK with the same structure as the index n the PK will enable the optimizer to use a strategy called a merge-join. In certain circumstances a merge-join can be much faster than the alternatives.
When you join tables that are apprently unrelated, there are several cases.
One case is where you end up matching every row from table A with every row from table B. This is called a cartesian join. It takes a long time, and nearly always produces unintended results. One time in ten years I did an intentional cartesian join.
Another case is where both tables contain the same FK, and you match along those two FK. An example might be matching by ZIPCODE. Zipcodes are really FKs to some master zipcode table somewhere out there in post office land, even though most people who use zipcodes never realize that fact.
A third case is where there is a third table, a junction table, containing FKs that reference each of the two tables in question. This implements a many-to-many relationship. In this case, what you probably want to be doing is a three way join with two inner joins each of which has an FK/PK matchup as the join condition.
Either I'm telling a lot that you already know, or you would benefit by going through a basic tutorial on relational databases.
In relational database terms a relation is (more or less) the data structure you have called a table - it is not something that exists "between" tables. A important advantage of the relational model is that there are no predefined links or other navigational structures that limit the way data can be joined or otherwise combined. You are free to join relations (tables) in a query however you like.
What you are asking about is actually called a foreign key constraint. A foreign key is a type of constraint that helps ensure data integrity by preventing inconsistent values being populated in the database.

Why no many-to-many relationships?

I am learning about databases and SQL for the first time. In the text I'm reading (Oracle 11g: SQL by Joan Casteel), it says that "many-to-many relationships can't exist in a relational database." I understand that we are to avoid them, and I understand how to create a bridging entity to eliminate them, but I am trying to fully understand the statement "can't exist."
Is it actually physically impossible to have a many-to-many relationship represented?
Or is it just very inefficient since it leads to a lot of data duplication?
It seems to me to be the latter case, and the bridging entity minimizes the duplicated data. But maybe I'm missing something? I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship, either in the text or anywhere else I've searched. I've been searching all day and only finding the same information repeated: "don't do it, and use a bridging entity instead." But I like to ask why. :-)
Thanks!
Think about a simple relationship like the one between Authors and Books. An author can write many books. A book could have many authors. Now, without a bridge table to resolve the many-to-many relationship, what would the alternative be? You'd have to add multiple Author_ID columns to the Books table, one for each author. But how many do you add? 2? 3? 10? However many you choose, you'll probably end up with a lot of sparse rows where many of the Author_ID values are NULL and there's a good chance that you'll run across a case where you need "just one more." So then you're either constantly modifying the schema to try to accommodate or you're imposing some artificial restriction ("no book can have more than 3 authors") to force things to fit.
A true many-to-many relationship involving two tables is impossible to create in a relational database. I believe that is what they refer to when they say that it can't exist. In order to implement a many to many you need an intermediary table with basically 3 fields, an ID, an id attached to the first table and an id atached to the second table.
The reason for not wanting many-to-many relationships, is like you said they are incredibly inefficient and managing all the records tied to each side of the relationship can be tough, for instance if you delete a record on one side what happens to the records in the relational table and the table on the other side? Cascading deletes is a slippery slope, at least in my opinion.
Normally (pun intended) you would use a link table to establish many-to-many
Like described by Joe Stefanelli, let's say you had Authors and Books
SELECT * from Author
SELECT * from Books
you would create a JOIN table called AuthorBooks
Then,
SELECT *
FROM Author a
JOIN AuthorBooks ab
on a.AuthorId = ab.AuthorId
JOIN Books b
on ab.BookId = b.BookId
hope that helps.
it says that "many-to-many relationships can't exist in a relational database."
I suspect the author is just being controversial. Technically, in the SQL language, there is no means to explicitly declare a M-M relationship. It is an emergent result of declaring multiple 1-M relations to the table. However, it is a common approach to achieve the result of a M-M relationship and it is absolutely used frequently in databases designed on relational database management systems.
I haven't found a concrete reason (or better yet an example) that explains why to avoid the many-to-many relationship,
They should be used where they are appropriate to be used would be a more accurate way of saying this. There are times, such as the books and authors example given by Joe Stafanelli, where any other solution would be inefficient and introduce other data integrity problems. However, M-M relationships are more complicated to use. They add more work on the part of the GUI designer. Thus, they should only be used where it makes sense to use them. If you are highly confident that one entity should never be associated with more than one of some other entity, then by all means restrict it to a 1-M. For example, if you were tracking the status of a shipment, each shipment can have only a single status at any given time. It would over complicate the design and not make logical sense to allow a shipment to have multiple statuses.
Of course they can (and do) exist. That sounds to me like a soapbox statement. They are required for a great many business applications.
Done properly, they are not inefficient and do not have duplicate data either.
Take a look at FaceBook. How many many-to-many relationships exist between friends and friends of friends? That is a well-defined business need.
The statement that "many-to-many relationships can't exist in a relational database." is patently false.
Many-to-many relationships are in fact very useful, and also common. For example, consider a contact management system which allows you to put people in groups. One person can be in many groups, and each group can have many members.
Representation of these relations requires an extra table--perhaps that's what your book is really saying? In the example I just gave, you'd have a Person table (id, name, address etc) and a Group table (id, group name, etc). Neither contains information about who's in which group; to do that you have a third table (call it PersonGroup) in which each record contains a Person ID and a Group ID--that record represents the relation between the person and the group.
Need to find the members of a group? Your query might look like this (for the group with ID=1):
SELECT Person.firstName, Person.lastName
FROM Person JOIN PersonGroup JOIN Group
ON (PersonGroup.GroupID = 1 AND PersonGroup.PersonID = Person.ID);
It is correct. The Many to Many relationship is broken down into several One to Many relationships. So essentially, NO many to many relationship exists!
Well, of course M-M relationship does exist in relational databases and they also have capability of handling at some level through bridging tables, however as the degree of M-M relationship increases it also increases complexity which results in slow R-W cycles and latency.
It is recommended to avoid such complex M-M relationships in a Relational Database. Graph Databases are the best alternative and good at handling Many to Many relationship between objects and that's why social networking sites uses Graph databases for handling M-M relationship between User and Friends, Users and Events etc.
Let's invent a fictional relationship (many to many relationship) between books and sales table. Suppose you are buying books and for each book you buy needs to generate an invoice number for that book. Suppose also that the invoice number for a book can represent multiple sales to the same customer (not in reality but let's assume). We have a many to many relationship between books and sales entities.
Now if that's the case, how can we get information about only 1 book given that we have purchased 3 books since all books would in theory have the same invoice number? That introduces the main problem of using a many to many relationship I guess. Now if we add a bridging entity between Books and sales such that each book sold have only 1 invoice number, no matter how many books are purchases we can still correctly identify each books.
In a many-to-many relationship there is obvious redundancy as well as insert, update and delete anomaly which should be eliminated by converting it to 2 one-to-many relationship via a bridge table.
M:N relationships should not exist in database design. They are extremely inefficient and do not make for functional databases. Two tables (entities) with a many-to-many relationship (aircraft, airport; teacher, student) cannot both be children of each other, there would be no where to put foreign keys without an intersecting table. aircraft-> flight <- airport; teacher <- class -> student.
An intersection table provides a place for an entity that is dependent on two other tables, for example, a grade needs both a class and a student, a flight needs both an aircraft and an airport. Many-to-many relationships conceal data. Intersection tables reveal this data and create one-to-many relationships that can be more easily understood and worked with. So, the question arises, what table should the flight be in--aircraft or airport. Neither, they should be foreign keys in the intersection table, Flight.

“Relation” versus “relationship” in RDBMS/SQL?

Coming from question “Relation” versus “relationship”
What are definitions of "relation" vs. "relationship" in RDBMS (or database theory)?
Update:
I was somewhat perplexed by comment to my question:
"relation is a synonym for table, and
thus has a very precise meaning in
terms of the schema stored in the
computer"
Update2:
Had I answered incorrectly that question , in terms of RDBMS, having written that relation is one-side direction singular connection-dependence-link,
i.e. from one table to another while relationship implies (not necessarily explicitly) more than one link connection in one direction (from one table to another)?
A RELATION is a subset of the cartesian product of a set of domains (http://mathworld.wolfram.com/Relation.html). In everyday terms a relation (or more specifically a relation variable) is the data structure that most people refer to as a table (although tables in SQL do not necessarily qualify as relations).
Relations are the basis of the relational database model.
Relationships are something different. A relationship is a semantic "association among things".
Relation is a mathematical term referring to a concept from set theory. Basically, in RDBMS world, the "relational" aspect is that data is organized into tables which reflect the fact that each row (tuple) is related to all the others. They are all the same type of info.
But then, your have ER (Entity Relationship) which is a modeling methodology in which you identify objects and their relationships in the real world. Then each object is modelled as a table, and each relationship is modelled as a table that contains only foreign keys.
For instance, if you have 3 entities: Teacher, Student, Class; then you might also create a couple of tables to record these 2 relationships: TaughtBy and StudyingIn. The TaughtBy table would have a record with a Teacher ID and a Class ID to record that this class is taught by this teacher. And the StudyingIn table would have a Student ID and a Class ID to reflect that the student is taking this class.
That way, each student can be in many Classes, and each Teacher can be in many classes without needing to have a field which contains a list of class ids in any records. SQL cannot deal with field containing a list of things.
A relation is a table with columns and rows.
and
relationship is association between relations/tables
for example employee table has relation in branch its called relationship between employee table and branch table