postgresql 9.3. Group by without all columns - sql

I have a problem with the following query:
SELECT
ee.id
ee.column2
ee.column3,
ee.column4,
SUM(ee.column5)
FROM
table1 ee
LEFT JOIN table2 epc ON ee.id = epc.id
WHERE
ee.id (6050)
GROUP BY ee.id
WHERE column id is the primary key.
On version 8.4, the query returns an error saying that column2, column3 and column4 don't exist in the group by clause.
This same query executes successfully on version 9.3.
Does anybody know why?

This was introduced in 9.1
Quote from the release notes:
Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)
The SQL standard allows this behavior, and because of the primary key, the result is unambiguous.
It is also explained with examples in the chapter about group by:
In this example, the columns product_id, p.name, and p.price must be in the GROUP BY clause since they are referenced in the query select list (but see below). The column s.units does not have to be in the GROUP BY list since it is only used in an aggregate expression (sum(...)), which represents the sales of a product. For each product, the query returns a summary row about all sales of the product.
In a nutshell: if the group by clause contains a column that uniquely identifies the rows, it is sufficient to include that column only.

The SQL-99 standard introduced the concept of functionally dependent columns. A column is functionally dependent on another column when that other column (or set of columns) already uniquely defines it. So if you have a table with a primary key, then all other columns in that table are functionally dependent on that primary key.
So when using a GROUP BY, and you include the primary key of a table, then you do not need to include the other columns of that same table in the GROUP BY-clause as they have already been uniquely identified by the primary key.
This is also documented in GROUP BY Clause:
When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.
(emphasis mine)

Related

Pivot multiple rows (2 columns) into a single row

I have a table where it has only 2 columns, the first columns is a name identifier and the second column is a value for this identifier (basically the table acts as default values), below is a screenshot of that table.
What I want is to convert the table from multiple rows into a single row and the values would be columns with the first column as column name. Example, the current values to be transformed into the below.
I read about the PIVOT operator, however it requires an aggregate function in the pivot clause but I don't think I can use an aggregate function in this case, its just setting row values as column values.
Is this possible with PIVOT or is there another construct I should use to achieve this?
There is already a correct technical answer, showing how to pivot in your case.
Let me explain why this "pivoting" is indeed an aggregation, at a logical level.
You have a group of four rows, and you want to generate a "summary row" for the group. (Imagine, in parallel, that you had several employees identified by employee id, in an additional column; each employee had up to four rows, for the same attributes. Then you are grouping by employee id, each group has up to four rows - fewer if there are missing attributes - and you want to get a "summary row" for each group.)
This is a form of aggregation. But for what aggregate function? You seem to only have one value for AGE, only one value for STATUS, etc.
In fact, you can think of AGE as existing in each of the four rows. When the CODE is 'AGE' then the value is 42, and when the CODE is something else then the value is NULL. You could use SUM(), AVG(), MIN(), MAX() over these four values (one is 42, the rest are NULL); they would all return the same answer, 42 - since all aggregate functions ignore NULL.
What if the values are strings, not numbers? Answer: same thing - except you can't use SUM() or AVG(). You still have MIN() and MAX(). In fact you could use other aggregate function too - they just have to be string aggregates. For example you could use LISTAGG(). Again, you are aggregating a single non-NULL string, the others are NULL, so the result will be just that one non-NULL string.
Before Oracle introduced the PIVOT operator in version 11.1 of the database, programmers were already able to pivot - using a conditional aggregation just like I explained. Something like
select max(case when code = 'AGE' then AGE end) as AGE,
...
from ...
group by EMPLOYEE_ID -- in the more general case
(in your simple case you don't need to group by anything.)
You can use pivot clause for that purpose, like below (Your table has only 2 columns and I assume you don't have any duplicate code)
select *
from Yourtable
pivot (
max(value) for code in (
'AGE' as AGE
, 'FIRST_NAME' as FIRST_NAME
, 'LAST_NAME' as LAST_NAME
, 'STATUS' as STATUS
)
)

Must a natural join be on a shared primary key?

Suppose I perform A natural join B, where:
A's schema is: (aID), where (aID) is a primary key.
B's schema is: (aID,bID), where (aID, bID) is a composite primary key.
Would performing the natural join work in this case? Or is it necessary for A to have both aID and bID for this to work?
NATURAL JOIN returns rows with one copy each of the common input table column names and one copy each of the column names that are unique to an input table. It returns a table with all such rows that can be made by combining a row from each input table. That is regardless of how many common column names there are, including zero. When there are no common column names, that is a kind of CROSS JOIN aka CARTESIAN PRODUCT. When all the column names are common, that is a kind of INTERSECTION. All this is regardless of PKs, UNIQUE, FKs & other constraints.
NATURAL JOIN is important as a relational algebra operator. In SQL it can be used in a certain style of relational programming that is in a certain sense simpler than usual.
For a true relational result you would SELECT DISTINCT. Also relations have no special NULL value whereas SQL JOINs treat a NULL as not equal to a NULL; so if we treat NULL as just another value relationally then SQL will sometimes not return the true relational result. (When both arguments have a NULL for each of some shared columns and both have the same non-NULL value for each other shared column.)
A "natural" join uses the names of columns to match between tables. It uses any matching names, regardless of key definitions.
Hence,
select . . .
from a natural join b
will use AId, because that is the only column with the same name.
In my opinion, natural join is an abomination. For one thing, it ignores explicitly declared foreign key relationships. These are the "natural join" keys, regardless of their names.
Second, the join keys are not clear in the SELECT statement. This makes debugging the query much more difficult.
Third, I cannot think of a SQL construct where adding a column or removing a column from a table takes a working query and changes the number of rows in the result set.
Further, I often have common columns on my tables -- CreatedAt, CreatedOn, CreatedBy. Just the existence of these columns precludes using natural joins.

Is natural join the only elision of foreign key name?

Suppose you have a department table with DepartmentID as primary key, and an employee table with DepartmentID as a foreign key. You can then use the fact that these columns have the same name, to perform a natural join that allows you to omit the column name from the query. (I'm not commenting on whether you should or not - that's a matter of opinion - just noting the fact that this shorthand is part of SQL syntax.)
There are various other cases in SQL syntax where you might refer to the column names with expressions like employee.DepartmentID = department.DepartmentID. Are there any other cases where some kind of shorthand allows you to use the fact that the columns have the same name, to omit the column name?
SQL does not know directly about foreign keys; it just has foreign key constraints, which prevent you from creating invalid data. When you have a foreign key, you would want both a constraint and to do joins on it, but the database does not automatically derive one from the other.
Anyway, when you are using a join on two columns with the same names:
SELECT ...
FROM employee
JOIN department ON employee.DepartmentID = department.DepartmentID
then you can replace the ON clause with the USING clause:
SELECT ...
FROM employee
JOIN department USING (DepartmentID)
If there is a USING clause then each of the column names specified must exist in the datasets to both the left and right of the join-operator. For each pair of named columns, the expression "lhs.X = rhs.X" is evaluated for each row of the cartesian product as a boolean expression. Only rows for which all such expressions evaluates to true are included from the result set.
[…]
For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. This is the only difference between a USING clause and its equivalent ON constraint.
(Omitting the duplicate column matters only when you are using SELECT *. (I'm not commenting on whether you should or not – that's a matter of opinion – just noting the fact that this shorthand is part of SQL syntax.))

How to renumber a table column

I have a SQLite table sorted by column ID. But I need to sort it by another numerical field called RunTime.
CREATE TABLE Pass_2 AS
SELECT RunTime, PosLevel, PosX, PosY, Speed, ID
FROM Pass_1
The table Pass_2 looks good, but I need to renumber the ID column from 1 .. n without resorting the records.
It is a principle of SQL databases that the underlying tables have no natural or guaranteed order to their records. You must specify the order in which you want to see the records when SELECTing from a table using an ORDER BY clause.
You can obtain the records you want using SELECT * FROM your_table ORDER BY RunTime, and that is the correct and reliable way to do this in any SQL database.
If you want to attempt to get the records in Pass_2 to "be" in RunTime order, you can add the ORDER BY clause to the SELECT you use to create the table but remember: you are not guaranteed to get the records back in the order in which they were added to the table.
When might you get the records back in a different order? This is most likely to happen when your query can be answered using columns in a covering index -- in that case the records are more likely to be returned in index order than any "natural" order (but again, no guarantees with an ORDER BY clause).
If you want a new ID column starting at 1, then use the ROW_NUMBER() function. Instead of ID in your query use this ROW_NUMBER() OVER(ORDER BY Runtime) AS ID.... This will replace the old ID column with a freshly calculated column

How to use an expression in a join between two tables?

I have two tables. In the 1st table (transaction) there are 2 columns called supplier_code and local_commodity_code. In the 2nd table (local_feed_commodity_map) there are two columns called local_commodity_code and local_commodity_desc. In 1st table, the local_commodity_code field is made by concatenating the supplier_code from 1st table and local_commodity_code from the 2nd table.
I split the concatenated column by using the following code:
SELECT
SUBSTR(T.LOCAL_COMMODITY_CODE, 1, INSTR(T.LOCAL_COMMODITY_CODE,'~')-1) LOCAL_COM_CODE
FROM OYSTER_WEB3.TRANSACTION T
So, I have the column named local_com_code after splitting.
Now I want to join these two tables using the newly generated column (local_com_code) and the local_commodity_code column from the 2nd table. How can I do this only using SELECT statement because I don't have permission for create, insert or update table.
SELECT L.*, T.*
FROM (SELECT Supplier_Code,
Local_Commodity_Code,
SUBSTR(LOCAL_COMMODITY_CODE, 1, INSTR(LOCAL_COMMODITY_CODE,'~')-1)
LOCAL_COM_CODE
FROM OYSTER_WEB3.TRANSACTION
) T
JOIN Local_Feed_Commodity_Map L
ON L.Local_Commodity_Code = T.Local_Com_Code
Oracle has an aversion to the SQL standard 'AS' keyword in some locations, so I've not used it anywhere to maximize the chances of the code working.
However, as I noted in a comment to the question, this is an appalling piece of schema design and should be fixed. It is ludicrous to pessimize all queries that have to work between these two tables by requiring the use of SUBSTR and INSTR like that. The Local_Commodity_Code in the Transaction table should be identical to the Local_Commodity_Code in the Local_Feed_Commodity_Map table so that both the primary key and the foreign key columns can be properly indexed (and referential integrity enforced).