Must a natural join be on a shared primary key? - sql

Suppose I perform A natural join B, where:
A's schema is: (aID), where (aID) is a primary key.
B's schema is: (aID,bID), where (aID, bID) is a composite primary key.
Would performing the natural join work in this case? Or is it necessary for A to have both aID and bID for this to work?

NATURAL JOIN returns rows with one copy each of the common input table column names and one copy each of the column names that are unique to an input table. It returns a table with all such rows that can be made by combining a row from each input table. That is regardless of how many common column names there are, including zero. When there are no common column names, that is a kind of CROSS JOIN aka CARTESIAN PRODUCT. When all the column names are common, that is a kind of INTERSECTION. All this is regardless of PKs, UNIQUE, FKs & other constraints.
NATURAL JOIN is important as a relational algebra operator. In SQL it can be used in a certain style of relational programming that is in a certain sense simpler than usual.
For a true relational result you would SELECT DISTINCT. Also, relations have no special NULL value, whereas SQL JOINs treat a NULL as not equal to a NULL; so if we treat NULL as just another value relationally, then SQL will sometimes not return the true relational result. (Specifically, this happens when a row from each argument has a NULL in one or more of the shared columns and the two rows have the same non-NULL value in every other shared column.)
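As a rough check of the points above, here is a minimal sketch using Python's built-in sqlite3 module. Tables A and B follow the schemas in the question; tables C, P and Q and all sample rows are assumptions made up for illustration. The behaviour shown is SQLite's, and other DBMSs may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (aID INTEGER PRIMARY KEY);
    CREATE TABLE B (aID INTEGER, bID INTEGER, PRIMARY KEY (aID, bID));
    -- Hypothetical tables, not from the question:
    CREATE TABLE C (cID INTEGER PRIMARY KEY);  -- shares no column name with A
    CREATE TABLE P (x INTEGER);                -- no keys, NULL allowed
    CREATE TABLE Q (x INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 10), (1, 11), (2, 20), (4, 40);
    INSERT INTO C VALUES (100), (200);
    INSERT INTO P VALUES (1), (NULL);
    INSERT INTO Q VALUES (1), (NULL);
""")

# One shared column name (aID): rows are matched on it, whatever the keys are.
natural_rows = conn.execute(
    "SELECT * FROM A NATURAL JOIN B ORDER BY 1, 2"
).fetchall()

# No shared column names: every row of A is paired with every row of C.
cross_rows = conn.execute("SELECT * FROM A NATURAL JOIN C").fetchall()

# All column names shared, with a NULL on both sides: the NULL rows do not match.
null_rows = conn.execute("SELECT * FROM P NATURAL JOIN Q").fetchall()

print(natural_rows)
print(len(cross_rows))
print(null_rows)
```

So A does not need a bID column for the join to work; the join simply matches on whatever column names the two tables have in common.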

A "natural" join uses the names of columns to match between tables. It uses any matching names, regardless of key definitions.
Hence,
select . . .
from a natural join b
will use AId, because that is the only column with the same name.
In my opinion, natural join is an abomination. For one thing, it ignores explicitly declared foreign key relationships. These are the "natural join" keys, regardless of their names.
Second, the join keys are not clear in the SELECT statement. This makes debugging the query much more difficult.
Third, I cannot think of any other SQL construct where adding a column to a table, or removing one, takes a working query and changes the number of rows in the result set.
Further, I often have common columns on my tables -- CreatedAt, CreatedOn, CreatedBy. Just the existence of these columns precludes using natural joins.
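The third and fourth points can be sketched as follows, using Python's built-in sqlite3 module. The table layout, the BId column and the CreatedAt default values are assumptions chosen for illustration; only the idea of a shared audit column comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (AId INTEGER PRIMARY KEY);
    CREATE TABLE b (BId INTEGER PRIMARY KEY, AId INTEGER REFERENCES a(AId));
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (10, 1), (11, 2);
""")

query = "SELECT COUNT(*) FROM a NATURAL JOIN b"

# Only AId is shared, so the natural join matches on AId alone.
before = conn.execute(query).fetchone()[0]

# Add an audit column with the same name to both tables (hypothetical values).
conn.executescript("""
    ALTER TABLE a ADD COLUMN CreatedAt TEXT DEFAULT '2020-01-01';
    ALTER TABLE b ADD COLUMN CreatedAt TEXT DEFAULT '2021-06-15';
""")

# The unchanged query now also requires CreatedAt to be equal on both sides.
after = conn.execute(query).fetchone()[0]

print(before, after)
```

The query text never changed, yet its row count did, because the set of join columns is recomputed from whatever names the two tables share.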

Related

Do the terms "row set" and "row" have different meanings in the SQL vocabulary?

Consider the relation enrolled(student, course) in which (student, course) is the primary key, and the relation paid(student, amount) where student is the primary key. Assume no null values and no foreign keys or integrity constraints.
Given the following four queries:
Query1:
select student from enrolled where student in (select student from paid)
Query2:
select student from paid where student in (select student from enrolled)
Query3:
select E.student from enrolled E, paid P where E.student = P.student
Query4:
select student from paid where exists
(select * from enrolled where enrolled.student = paid.student)
Which one of the following statements is correct?
(A) All queries return identical row sets for any database
(B) Query2 and Query4 return identical row sets for all databases but there exist databases for which Query1 and Query2 return different row sets
(C) There exist databases for which Query3 returns strictly fewer rows than Query2
(D) There exist databases for which Query4 will encounter an integrity violation at runtime
This question is otherwise simple, but I have an issue with it: if we treat the terms "row sets" and "rows" as equivalent we get one answer, but if we treat them as having different meanings we get another.
A few of my peers are of the opinion that the terms "row sets" and "rows" have different meanings. They say that in SQL, when we alter a table, we get the message that x rows were affected, so the tuples in a table are called rows (where duplicates are allowed), while a "row set" is the collection of tuples in a table with duplicates removed, based on the set theory (or relational algebra) concept.
Other peers are of the opinion that the terms "row sets" and "rows" mean the same thing. They say that in SQL, when we output a table, we get the message "x rows in set", and that the two terms are used interchangeably.
Personally, I feel that they mean the same. But I am not sure. Can anyone please confirm?
Please give a reference and definition for what your "peers" are saying. SQL is a corruption of relational theory, which is why SQL has 'rows' or 'records', not 'tuples'. Some queries in SQL can return duplicate rows ('bags' not 'sets') so I would avoid 'set' wrt SQL except where your query has explicit SELECT DISTINCT or the tables have declared keys such that you can be sure the result is a set.
(student, course) is the primary key, ... student is the primary key ... Assume no null values and no foreign keys or integrity constraints.
A primary key is an integrity constraint. I presume you mean no other constraints.
So the set of student values appearing in enrolled bears no connection to the set of student values appearing in paid. A given student value might appear in both or only one or only the other. student alone is not the primary key for enrolled so there might be repeated student values there. So
(A) All queries return identical row sets for any database
Some of those queries are liable to return duplicate rows (because they don't have SELECT DISTINCT). So False: not all queries return sets.
If you mean all queries report an identical set of student values for any database in which the schema holds: True. (Test that by using SELECT DISTINCT for all cases.)
(B) Query2 and Query4 return identical row sets ...
The question seems to be drawing on some subtle sense of 'row sets' that I'm not going to guess at. In general SQL queries do not return 'sets'. Avoid the term.
(C) There exist databases for which Query3 returns strictly fewer rows than Query2
I wouldn't like to guess. It depends very much on the particular DBMS and how it optimises queries wrt primary keys.
(D) There exist databases for which Query4 will encounter an integrity violation at runtime
False. Queries do not encounter integrity violations. It's only updates that can break integrity.
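One way to see the duplicate-row behaviour is to run the four queries against a small database. The sketch below uses Python's built-in sqlite3 module; the sample rows are made up for illustration, and the counts it prints describe this one data set under SQLite, not every possible database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrolled (student TEXT, course TEXT, PRIMARY KEY (student, course));
    CREATE TABLE paid (student TEXT PRIMARY KEY, amount INTEGER);
    -- Hypothetical data: s1 is enrolled in two courses and has paid,
    -- s2 is enrolled but has not paid, s3 has paid but is not enrolled.
    INSERT INTO enrolled VALUES ('s1', 'c1'), ('s1', 'c2'), ('s2', 'c1');
    INSERT INTO paid VALUES ('s1', 100), ('s3', 50);
""")

queries = {
    "Query1": "select student from enrolled where student in (select student from paid)",
    "Query2": "select student from paid where student in (select student from enrolled)",
    "Query3": "select E.student from enrolled E, paid P where E.student = P.student",
    "Query4": "select student from paid where exists "
              "(select * from enrolled where enrolled.student = paid.student)",
}

# Collect the student values each query returns, keeping duplicates.
results = {
    name: sorted(row[0] for row in conn.execute(sql))
    for name, sql in queries.items()
}

for name, students in results.items():
    print(name, students, "distinct:", sorted(set(students)))
```

On this data Query1 and Query3 each return s1 twice, while Query2 and Query4 return it once; after removing duplicates all four give the same set of students.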

SQL Server joins. please i want answer for this

Question: when joining more than 2 tables, what rules are true?
(1) All tables must be related to each other.
(2) Every table must be related to at least one table.
(3) A table does not have to be related.
(4) There must be one table that is directly related to all other tables.
Question: when using a join, which option is true about the joining columns in each table.
(1) Must have the same name
(2) Must have the same data type
(3) Must have the same name and data type
(4) Must have a PK FK relationship.
(5) Must have been joined in the past.
SQL offers multiple types of joins. One of them is the CROSS JOIN. The CROSS JOIN does not use any columns at all. Hence, requirements on columns are irrelevant.
If we limited the logic to INNER JOIN (which is a stretch given the wording), then the only condition that must be true is (2) in the first set.
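To make this concrete, here is a small sketch using Python's built-in sqlite3 module. The authors, books and reviews tables, their column names and their rows are all hypothetical. The inner join chains three tables on differently named columns with no declared keys, and reviews is related to books only, not to authors; the cross join uses no columns at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: no PRIMARY KEY or FOREIGN KEY is declared anywhere.
    CREATE TABLE authors (author_id INTEGER, name TEXT);
    CREATE TABLE books (book_id INTEGER, title TEXT, written_by INTEGER);
    CREATE TABLE reviews (review_id INTEGER, about_book INTEGER, verdict TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (10, 'SQL Basics', 1), (11, 'Joins', 2);
    INSERT INTO reviews VALUES (100, 10, 'good'), (101, 10, 'ok');
""")

# Each table is related to at least one other table in the chain.
inner_rows = conn.execute("""
    SELECT authors.name, books.title, reviews.verdict
    FROM authors
    JOIN books ON books.written_by = authors.author_id
    JOIN reviews ON reviews.about_book = books.book_id
    ORDER BY reviews.review_id
""").fetchall()

# CROSS JOIN pairs every author with every review; no join columns are involved.
cross_rows = conn.execute("SELECT * FROM authors CROSS JOIN reviews").fetchall()

print(inner_rows)
print(len(cross_rows))
```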

Is natural join the only elision of foreign key name?

Suppose you have a department table with DepartmentID as primary key, and an employee table with DepartmentID as a foreign key. You can then use the fact that these columns have the same name, to perform a natural join that allows you to omit the column name from the query. (I'm not commenting on whether you should or not - that's a matter of opinion - just noting the fact that this shorthand is part of SQL syntax.)
There are various other cases in SQL syntax where you might refer to the column names with expressions like employee.DepartmentID = department.DepartmentID. Are there any other cases where some kind of shorthand allows you to use the fact that the columns have the same name, to omit the column name?
SQL does not know directly about foreign keys; it just has foreign key constraints, which prevent you from creating invalid data. When you have a foreign key, you would want both a constraint and to do joins on it, but the database does not automatically derive one from the other.
Anyway, when you are using a join on two columns with the same names:
SELECT ...
FROM employee
JOIN department ON employee.DepartmentID = department.DepartmentID
then you can replace the ON clause with the USING clause:
SELECT ...
FROM employee
JOIN department USING (DepartmentID)
If there is a USING clause then each of the column names specified must exist in the datasets to both the left and right of the join-operator. For each pair of named columns, the expression "lhs.X = rhs.X" is evaluated for each row of the cartesian product as a boolean expression. Only rows for which all such expressions evaluates to true are included from the result set.
[…]
For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. This is the only difference between a USING clause and its equivalent ON constraint.
(Omitting the duplicate column matters only when you are using SELECT *. (I'm not commenting on whether you should or not – that's a matter of opinion – just noting the fact that this shorthand is part of SQL syntax.))
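The difference can be observed with a short sketch using Python's built-in sqlite3 module. The department and employee tables follow the question, but the extra columns (Name, EmployeeID), the sample rows and the column_names helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE employee (
        EmployeeID INTEGER PRIMARY KEY,
        DepartmentID INTEGER REFERENCES department(DepartmentID)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES (100, 1), (101, 1), (102, 2);
""")

def column_names(sql):
    """Hypothetical helper: return the result column names of a query."""
    cursor = conn.execute(sql)
    return [description[0] for description in cursor.description]

on_sql = ("SELECT * FROM employee JOIN department "
          "ON employee.DepartmentID = department.DepartmentID")
using_sql = "SELECT * FROM employee JOIN department USING (DepartmentID)"

on_cols = column_names(on_sql)
using_cols = column_names(using_sql)

# Both forms match the same rows; only the SELECT * column list differs.
on_count = len(conn.execute(on_sql).fetchall())
using_count = len(conn.execute(using_sql).fetchall())

print(len(on_cols), len(using_cols), on_count, using_count)
```

With ON, SELECT * carries the join column from both tables; with USING it appears once.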

SQL One-to-one relationship join

I have 2 tables one is an extension of the other so it is currently a simple one-to-one relationship (this is likely to become one-to-many in the future). I need to join from one table to another to pull a value out of another column in the extension.
so table A contains basic details including an id and table B uses a FK reference to the Id column in table A. I need to pull out column X from table B.
To add complexity sometimes there won't be a matching entry in table B but in that case it needs to return null. Also the value of X could be null.
I know I can use a left outer join but is there a more efficient way to perform the join?
Left outer join is the way. In order to make it most efficient, make sure you index the FK column in table B. It will be super-fast with the index.
You don't need to index the primary key in table A for this query (and most databases already index primary keys anyway).
The MySQL syntax for creating the index:
CREATE INDEX `fast_lookups` ON `table_b` (`col_name`);
You can name it whatever you like; I picked "fast_lookups".
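Putting the pieces together, here is a minimal sketch of the left outer join with the index in place. It uses Python's built-in sqlite3 module rather than MySQL, and the names table_a, table_b, a_id and x, as well as the sample rows, are assumptions standing in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, details TEXT);
    CREATE TABLE table_b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES table_a(id),
        x TEXT
    );
    -- Index the FK column so the lookup from table_a into table_b is cheap.
    CREATE INDEX fast_lookups ON table_b (a_id);

    INSERT INTO table_a VALUES (1, 'first'), (2, 'second'), (3, 'third');
    -- Row 1 has an x value, row 2 has a NULL x, row 3 has no extension row at all.
    INSERT INTO table_b VALUES (10, 1, 'foo'), (11, 2, NULL);
""")

rows = conn.execute("""
    SELECT a.id, b.x
    FROM table_a AS a
    LEFT OUTER JOIN table_b AS b ON b.a_id = a.id
    ORDER BY a.id
""").fetchall()

print(rows)
```

Note that a missing row in table_b and a NULL x both come back as NULL (None in Python); if the two cases need to be told apart, selecting b.id as well is one way to do it.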

Is using Null to represent a Value Bad Practice?

If I use null as a representation of everything in a database table, is that bad practice?
i.e.
I have the tables: myTable(ID) and myRelatedTable(ID,myTableID)
myRelatedTable.myTableID is a FK of myTable.ID
What I want to accomplish is: if myRelatedTable.myTableID is null then my business logic will interpret that as being linked to all myTable rows.
The reason I want to do this is because I have an unknown number of rows that could be inserted into myTable after the myRelatedTable row is created, and some rows in myRelatedTable need to reference all existing rows in myTable.
I think you might agree that it would be bad to use the number 3 to represent a value other than 3.
By the same reasoning it is therefore a bad idea to use NULL to represent anything other than the absence of a value.
If you disagree and twist NULL to some other purpose, the maintenance programmers that come after you will not be grateful.
Not a good idea, because then you cannot use the "related to all entries" fact in SQL queries at all. At some point, you'll probably want/need to do this.
Ideally there should be no nulls at all. There should be another table to represent the relation.
If you are going to assign special meanings, however, NULL should only ever mean "not assigned", i.e. no relationship exists. Use a negative number such as -1 if you want to trigger some business-layer trickery. It should be obvious to any developers who come across this in the future that -1 is an extraordinary value that should not be treated as normal.
I don't think NULL is the best way to do it, but you might use a separate tinyint column to indicate that the row in MyRelatedTable is related to everything in MyTable, e.g. MyRelatedTable.RelatedAll. That would make it more explicit for others who have to maintain it. Then you could do some sort of UNION query, e.g.
SELECT M.ID, R.ID AS RelatedTableID,....
FROM MyTable M INNER JOIN MyRelatedTable R ON R.myTableId = M.Id
UNION
SELECT M.ID, R.ID AS RelatedTableID,....
FROM MyTable M, MyRelatedTable R
WHERE R.RelatedAll = 1
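The RelatedAll flag idea above can be tried out with a short sketch using Python's built-in sqlite3 module. The column types, the sample rows and the related_pairs helper are assumptions; the elided select list in the query above is left out rather than guessed at, so only the two ID columns are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (ID INTEGER PRIMARY KEY);
    CREATE TABLE MyRelatedTable (
        ID INTEGER PRIMARY KEY,
        myTableID INTEGER REFERENCES MyTable(ID),
        RelatedAll INTEGER NOT NULL DEFAULT 0   -- 1 means "related to every MyTable row"
    );
    INSERT INTO MyTable VALUES (1), (2), (3);
    -- Row 100 points at one MyTable row; row 200 is flagged as related to all.
    INSERT INTO MyRelatedTable VALUES (100, 2, 0), (200, NULL, 1);
""")

RELATED_SQL = """
    SELECT M.ID, R.ID AS RelatedTableID
    FROM MyTable M INNER JOIN MyRelatedTable R ON R.myTableID = M.ID
    UNION
    SELECT M.ID, R.ID AS RelatedTableID
    FROM MyTable M CROSS JOIN MyRelatedTable R
    WHERE R.RelatedAll = 1
    ORDER BY 2, 1
"""

def related_pairs():
    """Hypothetical helper: (MyTable.ID, MyRelatedTable.ID) pairs."""
    return conn.execute(RELATED_SQL).fetchall()

pairs_before = related_pairs()

# A row added to MyTable later is picked up by the flagged row automatically.
conn.execute("INSERT INTO MyTable VALUES (4)")
pairs_after = related_pairs()

print(pairs_before)
print(pairs_after)
```

Because the "all rows" meaning lives in an explicit column, myTableID can keep its ordinary meaning, with NULL standing only for "no specific row".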
Yes, for the simple reason that NULL represents no value. Not a special value; not a blank value, but nothing.
If the foreign key is just a simple integer, and it's generated automatically, then you could use 0 to represent the "magic" value.
What you posted, namely that a NULL in a foreign key asserts a relationship with all the rows in the referenced table, is very non-standard. Off the top of my head, I think it's fraught with dangers.
What most people who use NULLs in FKs mean by it is that it asserts a relationship to NONE of the rows in the referenced table. This is common in the case of optional relationships, ones that can occur zero times.
Example: We have an HR database, with a table called "EMPLOYEES". We have two columns, called "EmpID" and "SupervisorID". (Many people call the first column simply "ID"). Every employee in the table has an entry under SupervisorID with the sole exception of the CEO of the company. The CEO has a NULL in the SupervisorID column, meaning that the CEO has no supervisor. The CEO is accountable to the BOD, but that isn't represented in SupervisorID.
What you might mean by a relationship with ALL the rows in the referenced table is this: There's a POSSIBLE relationship between the row in question and ANY ONE of the rows in the referenced table. When you start to get into questions about facts that are true in the real world but unknown to the database, you open a whole big can of worms.