Is natural join the only elision of foreign key name? - sql

Suppose you have a department table with DepartmentID as its primary key, and an employee table with DepartmentID as a foreign key. Because these columns have the same name, you can perform a natural join, which lets you omit the column name from the query. (I'm not commenting on whether you should or not - that's a matter of opinion - just noting the fact that this shorthand is part of SQL syntax.)
There are various other cases in SQL syntax where you might refer to the columns with expressions like employee.DepartmentID = department.DepartmentID. Are there any other cases where some kind of shorthand lets you exploit the fact that the columns have the same name to omit the column name?

SQL does not know directly about foreign keys; it just has foreign key constraints, which prevent you from creating invalid data. When you have a foreign key, you typically want both a constraint and joins on it, but the database does not automatically derive one from the other.
Anyway, when you are using a join on two columns with the same names:
SELECT ...
FROM employee
JOIN department ON employee.DepartmentID = department.DepartmentID
then you can replace the ON clause with the USING clause:
SELECT ...
FROM employee
JOIN department USING (DepartmentID)
If there is a USING clause then each of the column names specified must exist in the datasets to both the left and right of the join-operator. For each pair of named columns, the expression "lhs.X = rhs.X" is evaluated for each row of the cartesian product as a boolean expression. Only rows for which all such expressions evaluate to true are included in the result set.
[…]
For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. This is the only difference between a USING clause and its equivalent ON constraint.
(Omitting the duplicate column matters only when you are using SELECT *. And I'm not commenting on whether you should or not - that's a matter of opinion - just noting the fact that this shorthand is part of SQL syntax.)
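The difference is easy to see in a quick experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in database, with hypothetical employee/department rows, and runs the ON, USING and NATURAL JOIN forms, reporting the column names and row count of each SELECT * result. It reflects SQLite's behavior, which follows the rule quoted above; other engines may order the output columns differently.

```python
from __future__ import annotations

import sqlite3

# Hypothetical schema and data for the department/employee example.
SCHEMA = """
CREATE TABLE department (
    DepartmentID INTEGER PRIMARY KEY,
    DeptName     TEXT
);
CREATE TABLE employee (
    EmployeeID   INTEGER PRIMARY KEY,
    Name         TEXT,
    DepartmentID INTEGER REFERENCES department (DepartmentID)
);
INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 2), (12, 'Cy', 1);
"""

QUERIES = {
    "on": "SELECT * FROM employee JOIN department "
          "ON employee.DepartmentID = department.DepartmentID",
    "using": "SELECT * FROM employee JOIN department USING (DepartmentID)",
    "natural": "SELECT * FROM employee NATURAL JOIN department",
}


def run_all() -> dict[str, tuple[list[str], int]]:
    """Return (result column names, row count) for each join form."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    results = {}
    for name, sql in QUERIES.items():
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        results[name] = (columns, len(cur.fetchall()))
    conn.close()
    return results


if __name__ == "__main__":
    for name, (cols, n) in run_all().items():
        print(f"{name:8} {n} rows, columns: {cols}")
```

In SQLite the ON form returns five columns, with DepartmentID appearing twice, while USING and NATURAL JOIN return four; all three return the same three rows.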

Related

Must a natural join be on a shared primary key?

Suppose I perform A natural join B, where:
A's schema is: (aID), where (aID) is a primary key.
B's schema is: (aID,bID), where (aID, bID) is a composite primary key.
Would performing the natural join work in this case? Or is it necessary for A to have both aID and bID for this to work?
NATURAL JOIN returns rows with one copy each of the common input table column names and one copy each of the column names that are unique to an input table. It returns a table with all such rows that can be made by combining a row from each input table. That is regardless of how many common column names there are, including zero. When there are no common column names, that is a kind of CROSS JOIN aka CARTESIAN PRODUCT. When all the column names are common, that is a kind of INTERSECTION. All this is regardless of PKs, UNIQUE, FKs & other constraints.
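A small check of these claims, sketched with Python's sqlite3 module as a stand-in engine. Tables A, B, C and D and their rows are hypothetical: B has the composite key from the question, C shares no column name with A, and D shares all of A's column names. In SQLite at least, a NATURAL JOIN with no common columns adds no join condition, so it behaves as a cross join.

```python
from __future__ import annotations

import sqlite3


def natural_join_demo() -> dict[str, list[tuple]]:
    """Run A NATURAL JOIN B, C and D and return each result set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- A(aID) with aID as primary key; B(aID, bID) with a composite key.
        CREATE TABLE A (aID INTEGER PRIMARY KEY);
        CREATE TABLE B (aID INTEGER, bID TEXT, PRIMARY KEY (aID, bID));
        -- C shares no column names with A; D shares all of them.
        CREATE TABLE C (cID TEXT);
        CREATE TABLE D (aID INTEGER);
        INSERT INTO A VALUES (1), (2), (3);
        INSERT INTO B VALUES (1, 'x'), (1, 'y'), (2, 'x'), (4, 'z');
        INSERT INTO C VALUES ('p'), ('q');
        INSERT INTO D VALUES (2), (3), (5);
        """
    )
    out = {
        # One shared column (aID): rows match on aID; keys play no role.
        "A_B": conn.execute("SELECT * FROM A NATURAL JOIN B ORDER BY 1, 2").fetchall(),
        # No shared columns: every A row paired with every C row (3 x 2).
        "A_C": conn.execute("SELECT * FROM A NATURAL JOIN C ORDER BY 1, 2").fetchall(),
        # All columns shared: only rows present in both tables.
        "A_D": conn.execute("SELECT * FROM A NATURAL JOIN D ORDER BY 1").fetchall(),
    }
    conn.close()
    return out
```

So the join in the question works: it matches on aID alone and returns one row per matching B row, without A needing a bID column.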
NATURAL JOIN is important as a relational algebra operator. In SQL it can be used in a certain style of relational programming that is in a certain sense simpler than usual.
For a true relational result you would use SELECT DISTINCT. Also, relations have no special NULL value, whereas SQL JOINs treat a NULL as not equal to a NULL; so if we treat NULL as just another value relationally, SQL will sometimes not return the true relational result. (This happens when a row from each argument has NULL in some shared columns and the same non-NULL values in all the other shared columns: relationally those rows match, but SQL does not join them.)
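The sketch below, again using sqlite3 with hypothetical tables P and Q, shows both points: a duplicate row survives a plain NATURAL JOIN until DISTINCT removes it, and a row with NULL in a shared column is not matched. The last query uses SQLite's null-safe IS comparison (standard SQL spells it IS NOT DISTINCT FROM) to get the relational result.

```python
from __future__ import annotations

import sqlite3


def null_demo() -> dict[str, list[tuple]]:
    """Compare SQL NATURAL JOIN results with a relational-style result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE P (x INTEGER, y INTEGER);
        CREATE TABLE Q (x INTEGER, y INTEGER);
        INSERT INTO P VALUES (NULL, 1), (2, 1);
        -- Q deliberately holds a duplicate row, which a relation cannot.
        INSERT INTO Q VALUES (NULL, 1), (2, 1), (2, 1);
        """
    )
    # NULL = NULL is not true, so the (NULL, 1) rows never match here.
    natural = conn.execute("SELECT * FROM P NATURAL JOIN Q").fetchall()
    distinct = conn.execute("SELECT DISTINCT * FROM P NATURAL JOIN Q").fetchall()
    # IS treats two NULLs as equal, giving the relational answer.
    null_safe = conn.execute(
        "SELECT DISTINCT P.x, P.y FROM P JOIN Q ON P.x IS Q.x AND P.y IS Q.y"
    ).fetchall()
    conn.close()
    return {"natural": natural, "distinct": distinct, "null_safe": null_safe}
```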
A "natural" join uses the names of columns to match between tables. It uses any matching names, regardless of key definitions.
Hence,
select . . .
from a natural join b
will use aID, because that is the only column with the same name.
In my opinion, natural join is an abomination. For one thing, it ignores explicitly declared foreign key relationships. These are the "natural join" keys, regardless of their names.
Second, the join keys are not clear in the SELECT statement. This makes debugging the query much more difficult.
Third, adding or removing a column in a table can change the number of rows that a working NATURAL JOIN query returns; I cannot think of any other SQL construct where that happens.
Further, I often have common columns on my tables -- CreatedAt, CreatedOn, CreatedBy. Just the existence of these columns precludes using natural joins.
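The third and fourth points can be reproduced with a hypothetical pair of tables in sqlite3: count the NATURAL JOIN rows, add a CreatedBy column to both tables (with different, made-up default values), and count again, next to an explicit USING (DepartmentID) join for comparison.

```python
import sqlite3


def natural_join_counts():
    """Return (natural before, natural after, explicit after) row counts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE department (DepartmentID INTEGER PRIMARY KEY, DeptName TEXT);
        CREATE TABLE employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT,
                               DepartmentID INTEGER);
        INSERT INTO department VALUES (1, 'Sales'), (2, 'IT');
        INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 2), (12, 'Cy', 1);
        """
    )
    natural_sql = "SELECT COUNT(*) FROM employee NATURAL JOIN department"
    explicit_sql = "SELECT COUNT(*) FROM employee JOIN department USING (DepartmentID)"

    before = conn.execute(natural_sql).fetchone()[0]

    # An audit column with the same name in both tables silently becomes
    # part of the NATURAL JOIN condition.
    conn.executescript(
        """
        ALTER TABLE department ADD COLUMN CreatedBy TEXT DEFAULT 'admin';
        ALTER TABLE employee ADD COLUMN CreatedBy TEXT DEFAULT 'hr';
        """
    )
    after = conn.execute(natural_sql).fetchone()[0]
    explicit_after = conn.execute(explicit_sql).fetchone()[0]
    conn.close()
    return before, after, explicit_after
```

With these values the natural join drops from three rows to none, because it now also requires CreatedBy to match, while the explicit join still returns three.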

Does using an ALIAS for a table name cause any performance issues?

Let's say I have two tables:
Department (DeptID, DeptName)
Employee (EmpID, EmpName, DeptID)
When I join these two tables, I have to write Employee.DeptID = Department.DeptID.
Now suppose I restructure the design so that the tables look like this:
Department (DeptID, DeptName)
Employee (EmpID, EmpName, Emp_DeptID)
Now, with these column names, I don't have to use an alias; I can simply write DeptID = Emp_DeptID.
My question is: does using an alias hurt query performance in any way?
Actually, neither example in your question uses aliases.
Consider the following fragments:
1. FROM Department, Employee WHERE DeptID = emp_DeptID
2. FROM Department, Employee WHERE Department.DeptID = Employee.emp_DeptID
3. FROM Department AS D, Employee AS E WHERE D.DeptID = E.emp_DeptID
Example #3 uses aliases. Aliases assign a temporary alternate name to a relation in a query. They are used for two reasons: 1) to reduce typing and improve clarity when a table name must be typed many times and 2) to disambiguate when the same relation is used more than once in the query (in a self-join or correlated subquery, for instance).
Example #2 simply uses fully qualified column names, which are not table aliases.
Example #1 is something I have to admit is new to me: the idea that if a column name in the JOIN condition is unambiguously available from only one relation, you don't need to qualify it with the relation name. After decades of always qualifying join conditions, that feels "wrong" to me. The danger is that you might, in the future, add an identically named column to one of the relations, causing the query to fail (or worse, return incorrect results). But I admit to not always qualifying the column names in the selected column list, and that suffers from the same danger, so I think I'm just speaking from force of habit.
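The danger described above can be sketched with sqlite3 and the restructured tables from the question (the rows are hypothetical): the unqualified condition works until an identically named DeptID column is added to Employee, after which SQLite rejects the query as ambiguous, while the fully qualified version keeps working.

```python
import sqlite3


def unqualified_join_demo():
    """Return (result before, error after the change, qualified result after)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Department (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
        CREATE TABLE Employee (EmpID INTEGER PRIMARY KEY, EmpName TEXT,
                               Emp_DeptID INTEGER);
        INSERT INTO Department VALUES (1, 'Sales');
        INSERT INTO Employee VALUES (10, 'Ann', 1);
        """
    )
    unqualified = ("SELECT EmpName, DeptName FROM Department, Employee "
                   "WHERE DeptID = Emp_DeptID")
    qualified = ("SELECT EmpName, DeptName FROM Department, Employee "
                 "WHERE Department.DeptID = Employee.Emp_DeptID")
    before = conn.execute(unqualified).fetchall()

    # Hypothetical later schema change: Employee gains its own DeptID column.
    conn.execute("ALTER TABLE Employee ADD COLUMN DeptID INTEGER")
    try:
        conn.execute(unqualified)
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)  # SQLite reports the ambiguous column name
    qualified_after = conn.execute(qualified).fetchall()
    conn.close()
    return before, error, qualified_after
```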
Oh, and to answer your original question: fully qualifying by including the table names will probably have a tiny positive effect on the query analyzer (which converts your SQL into an executable query plan). That's because you're not forcing the analyzer to do the qualification for you. But that effect will likely be too small to measure compared to the time taken to actually perform the query.

PostgreSQL 9.3: GROUP BY without all columns

I have a problem with the following query:
SELECT
ee.id,
ee.column2,
ee.column3,
ee.column4,
SUM(ee.column5)
FROM
table1 ee
LEFT JOIN table2 epc ON ee.id = epc.id
WHERE
ee.id IN (6050)
GROUP BY ee.id
where column id is the primary key.
On version 8.4, the query returns an error saying that column2, column3 and column4 must appear in the GROUP BY clause.
This same query executes successfully on version 9.3.
Does anybody know why?
This was introduced in 9.1.
Quote from the release notes:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key, the result is unambiguous.
It is also explained with examples in the chapter about GROUP BY:
In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.
In a nutshell: if the group by clause contains a column that uniquely identifies the rows, it is sufficient to include that column only.
The SQL-99 standard introduced the concept of functionally dependent columns. A column is functionally dependent on another column when that other column (or set of columns) already uniquely defines it. So if you have a table with a primary key, then all other columns in that table are functionally dependent on that primary key.
So when you use GROUP BY and include the primary key of a table, you do not need to include the other columns of that table in the GROUP BY clause, because the primary key already identifies them uniquely.
This is also documented in GROUP BY Clause:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
(emphasis mine)
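A rough illustration of the functional-dependency argument, using Python's sqlite3 as a stand-in with made-up contents for table1 and table2. Note that SQLite accepts bare columns in aggregate queries even without a primary key, so it cannot reproduce the 8.4 error; the sketch only shows that, because id is the primary key, grouping by id alone produces the same rows as the longer GROUP BY list that 8.4 requires.

```python
import sqlite3


def group_by_demo():
    """Return (rows grouped by the PK only, rows grouped by every column)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE table1 (
            id INTEGER PRIMARY KEY,
            column2 TEXT, column3 TEXT, column4 TEXT,
            column5 INTEGER
        );
        CREATE TABLE table2 (id INTEGER);
        INSERT INTO table1 VALUES (1, 'a', 'b', 'c', 10), (2, 'd', 'e', 'f', 20);
        -- Two table2 rows for id 1, so the join duplicates that table1 row.
        INSERT INTO table2 VALUES (1), (1), (2);
        """
    )
    select_from = """
        SELECT ee.id, ee.column2, ee.column3, ee.column4, SUM(ee.column5)
        FROM table1 ee
        LEFT JOIN table2 epc ON ee.id = epc.id
    """
    # Short form: column2..column4 are functionally dependent on the PK.
    by_pk = conn.execute(select_from + " GROUP BY ee.id ORDER BY ee.id").fetchall()
    # Long form, which also works on versions before 9.1.
    by_all = conn.execute(
        select_from
        + " GROUP BY ee.id, ee.column2, ee.column3, ee.column4 ORDER BY ee.id"
    ).fetchall()
    conn.close()
    return by_pk, by_all
```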

How to use an expression in a join between two tables?

I have two tables. The first table (transaction) has two columns called supplier_code and local_commodity_code. The second table (local_feed_commodity_map) has two columns called local_commodity_code and local_commodity_desc. In the first table, the local_commodity_code field is made by concatenating the supplier_code from the first table and the local_commodity_code from the second table.
I split the concatenated column by using the following code:
SELECT
SUBSTR(T.LOCAL_COMMODITY_CODE, 1, INSTR(T.LOCAL_COMMODITY_CODE,'~')-1) LOCAL_COM_CODE
FROM OYSTER_WEB3.TRANSACTION T
So, I have the column named local_com_code after splitting.
Now I want to join these two tables using the newly generated column (local_com_code) and the local_commodity_code column from the second table. How can I do this using only a SELECT statement? I don't have permission to create, insert into, or update tables.
SELECT L.*, T.*
FROM (SELECT Supplier_Code,
Local_Commodity_Code,
SUBSTR(LOCAL_COMMODITY_CODE, 1, INSTR(LOCAL_COMMODITY_CODE,'~')-1)
LOCAL_COM_CODE
FROM OYSTER_WEB3.TRANSACTION
) T
JOIN Local_Feed_Commodity_Map L
ON L.Local_Commodity_Code = T.Local_Com_Code
Oracle does not accept the SQL-standard AS keyword before table aliases (it does accept it before column aliases), so I've not used AS anywhere, to maximize the chances of the code working.
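The same derived-table approach can be tried in sqlite3, which also provides substr and instr. This is a sketch with hypothetical rows in which the code before the '~' is the map's local_commodity_code; the OYSTER_WEB3 schema prefix is dropped, and transaction is quoted because it is a keyword in SQLite.

```python
import sqlite3


def commodity_join_demo():
    """Join on a code extracted from a concatenated column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE "transaction" (supplier_code TEXT, local_commodity_code TEXT);
        CREATE TABLE local_feed_commodity_map (
            local_commodity_code TEXT,
            local_commodity_desc TEXT
        );
        INSERT INTO "transaction" VALUES
            ('S1', 'C100~S1'), ('S2', 'C200~S2'), ('S1', 'C300~S1');
        INSERT INTO local_feed_commodity_map VALUES
            ('C100', 'Wheat'), ('C200', 'Barley');
        """
    )
    rows = conn.execute(
        """
        SELECT L.local_commodity_desc, T.supplier_code, T.local_com_code
        FROM (SELECT supplier_code,
                     local_commodity_code,
                     -- everything before the first '~'
                     substr(local_commodity_code, 1,
                            instr(local_commodity_code, '~') - 1) local_com_code
              FROM "transaction") T
        JOIN local_feed_commodity_map L
          ON L.local_commodity_code = T.local_com_code
        ORDER BY T.local_com_code
        """
    ).fetchall()
    conn.close()
    return rows
```

The C300 transaction has no entry in the map, so the inner join drops it.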
However, as I noted in a comment to the question, this is an appalling piece of schema design and should be fixed. It is ludicrous to pessimize all queries that have to work between these two tables by requiring the use of SUBSTR and INSTR like that. The Local_Commodity_Code in the Transaction table should be identical to the Local_Commodity_Code in the Local_Feed_Commodity_Map table so that both the primary key and the foreign key columns can be properly indexed (and referential integrity enforced).

SQL statement from two tables

I would like to know if it is possible to select certain columns from one table, and another column from a second table, which would relate to a non-imported column in the first table. I have to obtain this data from Access, and do not know if this is possible with Access, or SQL in general.
Assuming the following table structure:
CREATE TABLE tbl_1 (
pk_1 int,
field_1 varchar(25),
field_2 varchar(25)
);
CREATE TABLE tbl_2 (
pk_2 int,
fk_1 int,
field_3 varchar(25),
field_4 varchar(25)
);
You could use the following:
SELECT t1.field_1, t2.field_3
FROM tbl_1 t1
INNER JOIN tbl_2 t2 ON t1.pk_1 = t2.fk_1
WHERE t2.field_3 = 'Some String'
In regard to Bill's post, there are two ways to create JOINs within SQL queries:
Implicit - the join is created using the WHERE clause of the query, with multiple tables specified in the FROM clause.
Explicit - the join is created using the appropriate type of JOIN clause (INNER, LEFT, RIGHT, FULL).
It is always recommended that you use the explicit JOIN syntax as implicit joins can present problems once the query becomes more complex.
For example, if you later add an explicit join to a query that already uses an implicit join with multiple tables referenced in the FROM clause, the first table referenced in the FROM clause will not be visible to the explicitly joined table.
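To see that the two styles give the same result for a simple inner join, the sketch below runs the explicit query from Bill's post and its implicit equivalent against tbl_1 and tbl_2, using Python's sqlite3 with hypothetical rows. It does not attempt to reproduce the visibility problem described above.

```python
import sqlite3


def compare_join_styles():
    """Return (explicit JOIN result, implicit WHERE-join result)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbl_1 (pk_1 int, field_1 varchar(25), field_2 varchar(25));
        CREATE TABLE tbl_2 (pk_2 int, fk_1 int,
                            field_3 varchar(25), field_4 varchar(25));
        INSERT INTO tbl_1 VALUES (1, 'one', 'x'), (2, 'two', 'y');
        INSERT INTO tbl_2 VALUES (100, 1, 'Some String', 'a'),
                                 (101, 2, 'Other', 'b'),
                                 (102, 2, 'Some String', 'c');
        """
    )
    # Explicit: the join condition lives in the ON clause.
    explicit = conn.execute(
        """
        SELECT t1.field_1, t2.field_3
        FROM tbl_1 t1
        INNER JOIN tbl_2 t2 ON t1.pk_1 = t2.fk_1
        WHERE t2.field_3 = 'Some String'
        ORDER BY t1.field_1
        """
    ).fetchall()
    # Implicit: both tables in FROM, join condition mixed into WHERE.
    implicit = conn.execute(
        """
        SELECT t1.field_1, t2.field_3
        FROM tbl_1 t1, tbl_2 t2
        WHERE t1.pk_1 = t2.fk_1 AND t2.field_3 = 'Some String'
        ORDER BY t1.field_1
        """
    ).fetchall()
    conn.close()
    return explicit, implicit
```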
What you are looking for are JOINs:
http://en.wikipedia.org/wiki/Join_(SQL)
You need primary keys for the referenced data sets and foreign keys in the first table.
I'm not 100% sure I understand your question.
Is the following true:
Your first table is imported from somewhere else.
You are only importing some columns.
You want to build a query which references a column which you haven't imported.
If this is true, it's just not possible. As far as the Access query engine is concerned, the non-imported columns don't exist.
Why not just import them as well?
But primary keys make the query more efficient.