I've a Oracle SQL Query that works fine in 12C, but not in 11g. I've given a similar example below. Please explain if this is a bug/enhancement fixed in 12C.
CREATE TABLE MSI_OWNER.VINOTH_TEST1
(
COL1 VARCHAR2(100 BYTE),
SAL NUMBER,
YEAR NUMBER
)
Insert into MSI_OWNER.VINOTH_TEST1 (COL1, SAL, YEAR) Values ('Vinoth', 100, 1);
Insert into MSI_OWNER.VINOTH_TEST1 (COL1, SAL, YEAR) Values ('Vinoth', 100, 2);
COMMIT;
SELECT col1,
(SELECT MAX (its)
FROM (SELECT MAX (year) its
FROM vinoth_test1 x
WHERE x.col1 = a.col1))
max_year,
sal
FROM vinoth_test1 a
GROUP BY col1, sal
Please note that, I've re written a different logic to fix this, but I wanted to know if this is a bug in 11g or an enhancement in 12C.
Error in 11g: ORA-00904: "A"."COL1": invalid identifier
In either database, you could write this as:
select col1, sal,
max(max(year)) over (partition by col1)
from vinoth_test1
group by col1, sal;
No subqueries are needed. As pointed out, you do not need the addiitonal level of subqueries. The innermost subquery just returns one row anyway.
Oracle has only allowed correlated references to the direct parent in a subquery -- not to higher level parents. That would seem to argue that your query would not work in any version of Oracle. However, I believe that Oracle 12c does some optimizations before imposing this rule. The documentation alludes to this:
Oracle performs a correlated subquery when a nested subquery
references a column from a table referred to a parent statement one
level above the subquery. . . . A
correlated subquery conceptually is evaluated once for each row
processed by the parent statement. However, the optimizer may choose
to rewrite the query as a join or use some other technique to
formulate a query that is semantically equivalent. Oracle resolves
unqualified columns in the subquery by looking in the tables named in
the subquery and then in the tables named in the parent statement.
I suspect that this optimization is removing your unnecessary subquery, and hence allowing the query to compile.
The Oracle documentation has always been explicit that correlation is permitted only one level deep (although there is no clear reason for that, and it is against the SQL Standard).
As Solomon Yakobson, one of the gurus there, has explained several times on OTN, in each new version, in sub-version 1 (as in, 10.1, 11.1), correlation at deeper levels worked OK, just like the OP noticed. It used to be "fixed" (the flexibility was taken back) in sub-version 2 (10.2, 11.2). 12.1 had the same "enhancement" (correlation at all levels), and 12.2 did NOT take that away - even though the documentation STILL says correlation is not permitted more than one level down. Especially since such limitations don't exist when we write queries with the WITH clause, it makes zero sense for Oracle to continue with that restriction.
https://docs.oracle.com/database/122/SQLRF/Using-Subqueries.htm#SQLRF52357
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement one level above the subquery [...]
Related
I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.
Since Oracle 12c, we can finally use the SQL standard row limiting clause like this:
SELECT * FROM t FETCH FIRST 10 ROWS ONLY
Now, in Oracle 12.1, there was a limitation that is quite annoying when joining tables. It's not possible to have two columns of the same name in the SELECT clause, when using the row limiting clause. E.g. this raises ORA-00918 in Oracle 12.1
SELECT t.id, u.id FROM t, u FETCH FIRST 10 ROWS ONLY
This is a restriction is documented in the manual for all versions 12.1, 12.2, 18.0:
The workaround is obviously to alias the columns
SELECT t.id AS t_id, u.id AS u_id FROM t, u FETCH FIRST 10 ROWS ONLY
Or to resort to "classic" pagination using ROWNUM or window functions.
Curiously, though, the original query with ambiguous ID columns runs just fine from Oracle 12.2 onwards. Is this a documentation bug, or an undocumented feature?
Seems in this case when you are using the row limiting clause, Oracle internally calling the ROW_NUMBER() function where it using the column name in OVER clause Like ROW_NUMBER OVER(ORDER BY ID). because of this you are getting the ORA-00918 error.
I noticed that you have an implicit join. It would be interesting to see if the problem goes away when joining explicitly. I'm wondering if behind the scenes Oracle is doing a join based on id=id and not using the table aliases you assigned them.
That would also explain the column aliases fixing the issue. Try explicitly joining; that could force oracle to use the table aliases and resolve the ambiguity it thinks it sees.
Here is the example SQL in question; The SQL should run on any Oracle DBMS (I'm running 11.2.0.2.0).
Note how the UUID values are different (one has 898 the other has 899) in the resultset despite being built from within the inline view/with clause. Further below you can see how DBMS_RANDOM.RANDOM() does not have this side effect.
SQL:
WITH data AS (SELECT SYS_GUID () uuid FROM DUAL)
SELECT uuid, uuid
FROM data
Output:
UUID UUID_1
F8FCA4B4D8982B55E0440000BEA88F11 F8FCA4B4D8992B55E0440000BEA88F11
In Contrast DBMS_RANDOM the results are the same
SQL:
WITH data AS (SELECT DBMS_RANDOM.RANDOM() rand FROM DUAL)
SELECT rand, rand
FROM data
Output:
RAND RAND_1
92518726 92518726
Even more interesting is I can change the behavior / stabilize sys_guid by including calls to DBMS_RANDOM.RANDOM:
WITH data AS (
SELECT SYS_GUID () uuid,
DBMS_RANDOM.random () rand
FROM DUAL)
SELECT uuid a,
uuid b,
rand c,
rand d
FROM data
SQL Fiddle That Stabilizes SYS_GUID:
http://sqlfiddle.com/#!4/d41d8/29409
SQL Fiddle That shows the odd SYS_GUID behavior:
http://sqlfiddle.com/#!4/d41d8/29411
The documentation gives a reason as to why you may see a discrepancy (emphasis mine):
Caution:
Because SQL is a declarative language, rather than an imperative (or procedural) one, you cannot know how many times a function invoked by a SQL statement will run—even if the function is written in PL/SQL, an imperative language.
If your application requires that a function be executed a certain number of times, do not invoke that function from a SQL statement. Use a cursor instead.
For example, if your application requires that a function be called for each selected row, then open a cursor, select rows from the cursor, and call the function for each row. This technique guarantees that the number of calls to the function is the number of rows fetched from the cursor.
Basically, Oracle doesn't specify how many times a function will be called inside a sql statement: it may be dependent upon the release, the environment, the access path among other factors.
However, there are ways to limit query rewrite as explained in the chapter Unnesting of Nested Subqueries:
Subquery unnesting unnests and merges the body of the subquery into the body of the statement that contains it, allowing the optimizer to consider them together when evaluating access paths and joins. The optimizer can unnest most subqueries, with some exceptions. Those exceptions include hierarchical subqueries and subqueries that contain a ROWNUM pseudocolumn, one of the set operators, a nested aggregate function, or a correlated reference to a query block that is not the immediate outer query block of the subquery.
As explained above, you can use ROWNUM pseudo-column to prevent Oracle from unnesting a subquery:
SQL> WITH data AS (SELECT SYS_GUID() uuid FROM DUAL WHERE ROWNUM >= 1)
2 SELECT uuid, uuid FROM data;
UUID UUID
-------------------------------- --------------------------------
1ADF387E847F472494A869B033C2661A 1ADF387E847F472494A869B033C2661A
The NO_MERGE hint "fixes" it. Prevents Oracle from re-writing the inline view.
WITH data AS (SELECT /*+ NO_MERGE */
SYS_GUID () uuid FROM DUAL)
SELECT uuid, uuid
FROM data
From the docs:
The NO_MERGE hint instructs the optimizer not to combine the outer
query and any inline view queries into a single query.This hint lets
you have more influence over the way in which the view is accessed.
SQL Fiddle with the NO_MERGE hint applied:
I'm still struggling to understand/articulate how the query is being re-written in such a way that sys_guid() would be called twice. Perhaps it is a bug; but I tend to assume it is a bug in my own thoughts/code.
Very interesting.
We can use the materialize hint to fix it to.
WITH data AS (SELECT /*+materialize*/SYS_GUID () uuid FROM DUAL)
SELECT uuid, uuid
FROM data;
1 F9440E2613761EC8E0431206460A934C F9440E2613761EC8E0431206460A934C
From my point of view, if we can change the result of a query just by adding a hint, there is an Oracle bug.
Maybe we have to ask metalink to check it...
Assuming I have the following aggregate functions:
AGG1
AGG2
AGG3
AGG4
Is it possible to write valid SQL (in a db agnostic way) like this:
SELECT [COL1, COL2 ....], AGG1(param1), AGG2(param2) FROM [SOME TABLES]
WHERE [SOME CRITERIA]
HAVING AGG3(param2) >-1 and AGG4(param4) < 123
GROUP BY COL1, COL2, ... COLN
ORDER BY COL1, COLN ASC
LIMIT 10
Where COL1 ... COLN are columns in the tables being queried, and param1 ... paramX are parameters passed to the AGG funcs.
Note: AGG1 and AGG2 are returned in the results as columns (but do not appear in the HAVING CLAUSE, and AGG3 and AGG4 appear in the HAVING CLAUSE but are not returned in the result set.
Ideally, I want a DB agnostic answer to the solution, but if I have to be tied to a db, I am using PostgreSQL (v9.x).
Edit
Just a matter of clarification: I am not opposed to using GROUP BY in the query. My SQL is not very good, so the example SQL above may have been slightly misleading. I have edited the pseudo sql statement above to hopefully make my intent more clear.
The main thing I wanted to find out was whether a select query that used AGG functions could:
Have agg functions values in the returned column without them being specified in a HAVING clause.
Have agg functions specified in a HAVING clause, but are not returned in the result set.
From the answers I have received so far, it would seem the answer to both questions is YES. The only think I have to do to correct my SQL is to add a GROUP BY clause to make sure that the returned rows are unique.
PostgreSQL major version include the first digit after the dot, thus "PostgreSQL (v9.x)" is not specific enough. As #kekekela said, there is no (cheap) completely db agnostic way. Even between PostgreSQL 9.0 and 9.1 there is an important syntactical difference.
If you had only the grouped values AGG1(param1), AGG2(param2) you would get away without providing an explicit GROUP BY clause. Since you mix grouped and non-grouped columns you have to provide a GROUP BY clause with all non-grouped columns that appear in the SELECT. That's true for any version of PostgreSQL. Read about GROUP BY and HAVING it in the manual.
Starting with version 9.1, however, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
Concerning parameters
Do you intend to feed a constant value to an aggregate function? What's the point? The docs tell us
An aggregate function computes a single result from multiple input rows.
Or do you want those parameters to be column names? That kind of dynamic SQL works as long as the statement is generated before committing to the database. Does not work for prepared statements or simple sql or plpgsql functions. You have to use EXECUTE in a plpgsql function for that purpose.
As safeguard against SQLi use the USING $1, $2 syntax for values and quote_ident() for your column or table names.
The only way to aggregate over columns without using GROUP BY is to use windowing functions. You left out details of your problem, but the following might be what you are looking for:
SELECT *
FROM (
SELECT [COL1, COL2 ....],
AGG1(param1) over (partition by some_grouping_column) as agg1,
AGG2(param2) over (partition by some_grouping_column) as agg2,
row_number() over () as rn
FROM [SOME TABLES]
WHERE [SOME CRITERIA]
ORDER BY COL1
) t
WHERE AGG3 >-1
AND AGG4 < 123
AND rn <= 10
ORDER BY col1
This is standard ANSI SQL and works on most database including PostgreSQL (since 8.4).
Note that you do not need to use the same grouping column for both aggregates in the partition by clause.
If you want to stick with ANSI SQL then you should use the row_number() function to limit the result. If you run this only on PostgreSQL (or other DBMS that support LIMIT in some way) move the LIMIT cause into the derived table (the inner query)
That should work from a high level perspective, except you'd need COL1, COL2 etc in a GROUP BY statement or else they won't be valid in the SELECT list. Having AGG1, etc in the SELECT list and not in the HAVING is not a problem.
As far as db agnostic, you're going to have to tweak syntax no matter what you do (the LIMIT for example is going to be different in PostgreSQL, SQL SERVER and Oracle that I know off the top of my head), but you could build logic to construct the statements properly for each provided your high-level representation is solid.
I have a doubt and question regarding alias in sql. If i want to use the alias in same query can i use it. For eg:
Consider Table name xyz with column a and b
select (a/b) as temp , temp/5 from xyz
Is this possible in some way ?
You are talking about giving an identifier to an expression in a query and then reusing that identifier in other parts of the query?
That is not possible in Microsoft SQL Server which nearly all of my SQL experience is limited to. But you can however do the following.
SELECT temp, temp / 5
FROM (
SELECT (a/b) AS temp
FROM xyz
) AS T1
Obviously that example isn't particularly useful, but if you were using the expression in several places it may be more useful. It can come in handy when the expressions are long and you want to group on them too because the GROUP BY clause requires you to re-state the expression.
In MSSQL you also have the option of creating computed columns which are specified in the table schema and not in the query.
You can use Oracle with statement too. There are similar statements available in other DBs too. Here is the one we use for Oracle.
with t
as (select a/b as temp
from xyz)
select temp, temp/5
from t
/
This has a performance advantage, particularly if you have a complex queries involving several nested queries, because the WITH statement is evaluated only once and used in subsequent statements.
Not possible in the same SELECT clause, assuming your SQL product is compliant with entry level Standard SQL-92.
Expressions (and their correlation names) in the SELECT clause come into existence 'all at once'; there is no left-to-right evaluation that you seem to hope for.
As per #Josh Einstein's answer here, you can use a derived table as a workaround (hopefully using a more meaningful name than 'temp' and providing one for the temp/5 expression -- have in mind the person who will inherit your code).
Note that code you posted would work on the MS Access Database Engine (and would assign a meaningless correlation name such as Expr1 to your second expression) but then again it is not a real SQL product.
Its possible I guess:
SELECT (A/B) as temp, (temp/5)
FROM xyz,
(SELECT numerator_field as A, Denominator_field as B FROM xyz),
(SELECT (numerator_field/denominator_field) as temp FROM xyz);
This is now available in Amazon Redshift
E.g.
select clicks / impressions as probability, round(100 * probability, 1) as percentage from raw_data;
Ref:
https://aws.amazon.com/about-aws/whats-new/2018/08/amazon-redshift-announces-support-for-lateral-column-alias-reference/
You might find W3Schools "SQL Alias" to be of good help.
Here is an example from their tutorial:
SELECT po.OrderID, p.LastName, p.FirstName
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen' AND p.FirstName='Ola'
Regarding using the Alias further in the query, depending on the database you are using it might be possible.