Duplicate columns in Oracle query using row limiting clause

Since Oracle 12c, we can finally use the SQL standard row limiting clause like this:
SELECT * FROM t FETCH FIRST 10 ROWS ONLY
Now, in Oracle 12.1, there was a limitation that is quite annoying when joining tables. It's not possible to have two columns of the same name in the SELECT clause, when using the row limiting clause. E.g. this raises ORA-00918 in Oracle 12.1
SELECT t.id, u.id FROM t, u FETCH FIRST 10 ROWS ONLY
This restriction is documented in the manual for versions 12.1, 12.2, and 18.0.
The workaround is obviously to alias the columns:
SELECT t.id AS t_id, u.id AS u_id FROM t, u FETCH FIRST 10 ROWS ONLY
Or to resort to "classic" pagination using ROWNUM or window functions.
Curiously, though, the original query with ambiguous ID columns runs just fine from Oracle 12.2 onwards. Is this a documentation bug, or an undocumented feature?

It seems that when you use the row limiting clause, Oracle internally calls the ROW_NUMBER() function and uses the column name in its OVER clause, as in ROW_NUMBER() OVER (ORDER BY ID). Because the name ID is ambiguous there, you get the ORA-00918 error.

I noticed that you have an implicit join. It would be interesting to see if the problem goes away when joining explicitly. I'm wondering if behind the scenes Oracle is doing a join based on id=id and not using the table aliases you assigned them.
That would also explain why the column aliases fix the issue. Try joining explicitly; that could force Oracle to use the table aliases and resolve the ambiguity it thinks it sees.

Related

Oracle SQL Query working in 12C but not in 11g

I have an Oracle SQL query that works fine in 12C, but not in 11g. I've given a similar example below. Please explain whether this is a bug in 11g or an enhancement in 12C.
CREATE TABLE MSI_OWNER.VINOTH_TEST1
(
COL1 VARCHAR2(100 BYTE),
SAL NUMBER,
YEAR NUMBER
);
Insert into MSI_OWNER.VINOTH_TEST1 (COL1, SAL, YEAR) Values ('Vinoth', 100, 1);
Insert into MSI_OWNER.VINOTH_TEST1 (COL1, SAL, YEAR) Values ('Vinoth', 100, 2);
COMMIT;
SELECT col1,
       (SELECT MAX (its)
          FROM (SELECT MAX (year) its
                  FROM vinoth_test1 x
                 WHERE x.col1 = a.col1))
          max_year,
       sal
  FROM vinoth_test1 a
GROUP BY col1, sal
Please note that I've rewritten this with different logic to fix it, but I wanted to know if this is a bug in 11g or an enhancement in 12C.
Error in 11g: ORA-00904: "A"."COL1": invalid identifier
In either database, you could write this as:
select col1, sal,
max(max(year)) over (partition by col1)
from vinoth_test1
group by col1, sal;
No subqueries are needed. As pointed out, you do not need the additional level of subqueries. The innermost subquery just returns one row anyway.
Oracle has only allowed correlated references to the direct parent in a subquery -- not to higher level parents. That would seem to argue that your query would not work in any version of Oracle. However, I believe that Oracle 12c does some optimizations before imposing this rule. The documentation alludes to this:
Oracle performs a correlated subquery when a nested subquery
references a column from a table referred to a parent statement one
level above the subquery. . . . A
correlated subquery conceptually is evaluated once for each row
processed by the parent statement. However, the optimizer may choose
to rewrite the query as a join or use some other technique to
formulate a query that is semantically equivalent. Oracle resolves
unqualified columns in the subquery by looking in the tables named in
the subquery and then in the tables named in the parent statement.
I suspect that this optimization is removing your unnecessary subquery, and hence allowing the query to compile.
The Oracle documentation has always been explicit that correlation is permitted only one level deep (although there is no clear reason for that, and it is against the SQL Standard).
As Solomon Yakobson, one of the gurus there, has explained several times on OTN, in each new version, in sub-version 1 (as in, 10.1, 11.1), correlation at deeper levels worked OK, just like the OP noticed. It used to be "fixed" (the flexibility was taken back) in sub-version 2 (10.2, 11.2). 12.1 had the same "enhancement" (correlation at all levels), and 12.2 did NOT take that away - even though the documentation STILL says correlation is not permitted more than one level down. Especially since such limitations don't exist when we write queries with the WITH clause, it makes zero sense for Oracle to continue with that restriction.
https://docs.oracle.com/database/122/SQLRF/Using-Subqueries.htm#SQLRF52357
Oracle performs a correlated subquery when a nested subquery references a column from a table referred to a parent statement one level above the subquery [...]

Count(*) with order by not working on PostgreSQL which works on Oracle

Below is a SQL query which works on Oracle but not on PostgreSQL.
select count(*) from users where id>1 order by username;
I know that ORDER BY has no meaning in this query, but why does it work on Oracle? Below is the error on PostgreSQL:
ERROR: column "users.username" must appear in the GROUP BY clause or be used in an aggregate function
Position: 48
SQLState: 42803
PostgreSQL version 9.6.3
As seen by Oracle's execution plan, there is no sorting after the rows are aggregated, which suggests that the SQL engine Oracle has implemented ignores that phrase.
Why doesn't it work in PostgreSQL -- because the people running Postgres know what they're doing ;) Just kidding, but that question would be highly speculative for me, without seeing the Oracle vs MySQL source. The bigger question is whether Oracle and MySQL allow this by coincidence, or because Oracle owns both.
Final note:
If you're going to ask why similar software applications behave differently, I think it's also important to include what version you're referring to. Even different versions of the same application may follow different instructions.
If you are looking only for the count of all records, then there is no need for an ORDER BY clause, because it has no meaning even in Oracle. In that case, remove the ORDER BY.
select count(*) from users where id>1
If you are looking for a count per username, then sorting on username is meaningful, and in that case you can use the following query.
select count(*) from users where id>1 group by username order by username;
Hope that clears up your doubt.
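To make the difference between the two queries concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption; the question is about Oracle and PostgreSQL). The users rows are made up, and username is added to the select list of the grouped query so the counts can be read.

```python
import sqlite3

# In-memory SQLite database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "root"), (2, "bob"), (3, "alice"), (4, "bob"), (5, "alice"), (6, "carol")],
)

# Without GROUP BY the aggregate collapses everything into one row,
# so there is nothing left to sort.
total = conn.execute("SELECT count(*) FROM users WHERE id > 1").fetchone()[0]

# With GROUP BY there is one row per username, and ORDER BY is meaningful.
per_user = conn.execute(
    "SELECT username, count(*) FROM users WHERE id > 1 "
    "GROUP BY username ORDER BY username"
).fetchall()

print(total)
print(per_user)
```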
You can use a with statement for doing things like MySQL in PostgreSQL.
with cnt (cnt1) AS ( select count(*) as cnt1 from sample )
select *, c.cnt1 as len from sample ,cnt as c;

How is this query working in Teradata?

I ran the query below on Teradata and it returned the expected result:
select column1 as c1Alias from my_table where column2 in ( c1Alias , 10 , 20 , 30) ;
But when I run the same query on Hive, it throws the exception below:
FAILED: SemanticException [Error 10004]: Line 1:44 Invalid table alias or column reference 'c1Alias': (possible column names are: .......)
I am not surprised that it fails on Hive, but I am surprised that it works on Teradata.
As I understand it, clauses are executed in the order WHERE >> SELECT, so an alias defined in the SELECT clause should not be available in the WHERE clause. Correct me if I am wrong here.
I would really like to know how this works in Teradata.
You're correct, logically any SELECT is processed in following order:
FROM
WHERE
GROUP BY
HAVING
OLAP functions
QUALIFY
create SELECT column list
SAMPLE
ORDER BY
Besides the proprietary QUALIFY/SAMPLE every DBMS will do it exactly the same.
When you add a filter to the WHERE-condition the column list is not yet created, thus using an alias should fail (and will fail in almost every other DBMS, afaik only Access allows using it similar to Teradata).
It's not failing because Teradata is older than Standard SQL, and this seems to be a relic of the query language Teradata implemented first.
But it's a nice extension (just never alias to an existing column name, to avoid confusing the optimizer and/or end user) and you get used to it very fast; it avoids lots of cut & paste or Derived Tables.
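For databases that follow the logical order above, the usual portable rewrite is to define the alias in a Derived Table and filter in the outer query. A minimal sketch, assuming SQLite (via Python's sqlite3) as a stand-in and made-up rows for my_table; the name dt is hypothetical:

```python
import sqlite3

# Hypothetical data for the my_table example from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 INTEGER, column2 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, 1), (2, 10), (3, 99), (4, 4), (5, 30)],
)

# The alias is created inside the derived table, so by the time the outer
# WHERE runs, c1Alias is an ordinary column and can be referenced anywhere.
rows = conn.execute("""
    SELECT c1Alias
    FROM (SELECT column1 AS c1Alias, column2 FROM my_table) dt
    WHERE column2 IN (c1Alias, 10, 20, 30)
    ORDER BY c1Alias
""").fetchall()

print(rows)
```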
The order of execution of SQL is explained very well over here:
https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/
An excerpt from the post for your quick reference: (Credits to the author for covering all 10 parts of SQL)
1. FROM, including JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. WINDOW functions
6. SELECT
7. DISTINCT
8. UNION
9. ORDER BY
10. LIMIT and OFFSET

Is it possible to have an SQL query that uses AGG functions in this way?

Assuming I have the following aggregate functions:
AGG1
AGG2
AGG3
AGG4
Is it possible to write valid SQL (in a db agnostic way) like this:
SELECT [COL1, COL2 ....], AGG1(param1), AGG2(param2) FROM [SOME TABLES]
WHERE [SOME CRITERIA]
HAVING AGG3(param2) >-1 and AGG4(param4) < 123
GROUP BY COL1, COL2, ... COLN
ORDER BY COL1, COLN ASC
LIMIT 10
Where COL1 ... COLN are columns in the tables being queried, and param1 ... paramX are parameters passed to the AGG funcs.
Note: AGG1 and AGG2 are returned in the results as columns (but do not appear in the HAVING clause), and AGG3 and AGG4 appear in the HAVING clause but are not returned in the result set.
Ideally, I want a DB agnostic answer to the solution, but if I have to be tied to a db, I am using PostgreSQL (v9.x).
Edit
Just a matter of clarification: I am not opposed to using GROUP BY in the query. My SQL is not very good, so the example SQL above may have been slightly misleading. I have edited the pseudo sql statement above to hopefully make my intent more clear.
The main thing I wanted to find out was whether a select query that used AGG functions could:
Have agg functions values in the returned column without them being specified in a HAVING clause.
Have agg functions specified in a HAVING clause, but are not returned in the result set.
From the answers I have received so far, it would seem the answer to both questions is YES. The only thing I have to do to correct my SQL is to add a GROUP BY clause to make sure that the returned rows are unique.
PostgreSQL major versions include the first digit after the dot, thus "PostgreSQL (v9.x)" is not specific enough. As @kekekela said, there is no (cheap) completely db agnostic way. Even between PostgreSQL 9.0 and 9.1 there is an important syntactical difference.
If you had only the aggregated values AGG1(param1), AGG2(param2) you would get away without providing an explicit GROUP BY clause. Since you mix aggregated and non-aggregated columns you have to provide a GROUP BY clause with all non-aggregated columns that appear in the SELECT. That's true for any version of PostgreSQL. Read about GROUP BY and HAVING in the manual.
Starting with version 9.1, however, once you list a primary key in the GROUP BY you can skip additional columns for this table and still use them in the SELECT list. The release notes for version 9.1 tell us:
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
Concerning parameters
Do you intend to feed a constant value to an aggregate function? What's the point? The docs tell us
An aggregate function computes a single result from multiple input rows.
Or do you want those parameters to be column names? That kind of dynamic SQL works as long as the statement is generated before committing to the database. Does not work for prepared statements or simple sql or plpgsql functions. You have to use EXECUTE in a plpgsql function for that purpose.
As safeguard against SQLi use the USING $1, $2 syntax for values and quote_ident() for your column or table names.
The only way to aggregate over columns without using GROUP BY is to use windowing functions. You left out details of your problem, but the following might be what you are looking for:
SELECT *
FROM (
    SELECT [COL1, COL2 ....],
           AGG1(param1) over (partition by some_grouping_column) as agg1,
           AGG2(param2) over (partition by some_grouping_column) as agg2,
           AGG3(param2) over (partition by some_grouping_column) as agg3,
           AGG4(param4) over (partition by some_grouping_column) as agg4,
           row_number() over (order by COL1) as rn
    FROM [SOME TABLES]
    WHERE [SOME CRITERIA]
) t
WHERE agg3 > -1
  AND agg4 < 123
  AND rn <= 10
ORDER BY col1
This is standard ANSI SQL and works on most databases, including PostgreSQL (since 8.4).
Note that you do not need to use the same grouping column for both aggregates in the partition by clause.
If you want to stick with ANSI SQL then you should use the row_number() function to limit the result. If you run this only on PostgreSQL (or another DBMS that supports LIMIT in some way), move the LIMIT clause into the derived table (the inner query).
That should work from a high level perspective, except you'd need COL1, COL2 etc in a GROUP BY statement or else they won't be valid in the SELECT list. Having AGG1, etc in the SELECT list and not in the HAVING is not a problem.
As far as db agnostic, you're going to have to tweak syntax no matter what you do (the LIMIT for example is going to be different in PostgreSQL, SQL SERVER and Oracle that I know off the top of my head), but you could build logic to construct the statements properly for each provided your high-level representation is solid.
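A minimal sketch of the point made above, assuming SQLite (through Python's sqlite3) as the test database and a made-up sales table. SUM and COUNT play the role of AGG1 and AGG2 (returned, not filtered on), while MAX and MIN play AGG3 and AGG4 (filtered on in HAVING, not returned). Note that GROUP BY comes before HAVING.

```python
import sqlite3

# Hypothetical table and rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, qty INTEGER, price REAL);
    INSERT INTO sales VALUES
        ('north', 'apple', 10, 1.0),
        ('north', 'apple', 20, 1.5),
        ('north', 'pear',   1, 9.0),
        ('south', 'apple',  5, 2.0),
        ('south', 'apple',  5, 3.0);
""")

rows = conn.execute("""
    SELECT region, product, SUM(qty) AS total_qty, COUNT(*) AS n_rows
    FROM sales
    WHERE qty > 0
    GROUP BY region, product
    HAVING MAX(price) > 1.2 AND MIN(price) < 5
    ORDER BY region, product
    LIMIT 10
""").fetchall()

# The north/pear group is dropped because MIN(price) = 9.0 is not < 5.
print(rows)
```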

Is there a method for paging using ANSI SQL only?

I know:
Firebird: FIRST and SKIP;
MySQL: LIMIT;
SQL Server: ROW_NUMBER();
Does anyone know an ANSI SQL way to perform result paging?
See Limit—with offset section on this page: http://troels.arvin.dk/db/rdbms/
BTW, Firebird also supports ROWS clause since version 2.0
No official way, no.*
Generally you'll want to have an abstracted-out function in your database access layer that will cope with it for you; give it a hint that you're on MySQL or PostgreSQL and it can add a 'LIMIT' clause to your query, or rownum over a subquery for Oracle and so on. If it doesn't know it can do any of those, fall back to fetching the lot and returning only a slice of the full list.
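Such an abstracted-out function might look roughly like the sketch below. The function names (paginate, fetch_page) and the dialect keys are hypothetical, only two syntax families are covered, and the rownum-over-a-subquery variant for older Oracle versions is left out.

```python
def paginate(sql, dialect, limit, offset=0):
    """Append a paging clause to a SELECT that already ends in ORDER BY.

    Returns None when the dialect is unknown, so the caller can fall back
    to fetching everything and slicing on the client.
    """
    # Coerce to int so nothing but numbers is ever spliced into the SQL.
    limit, offset = int(limit), int(offset)
    if dialect in ("mysql", "postgresql", "sqlite"):
        return f"{sql} LIMIT {limit} OFFSET {offset}"
    if dialect in ("oracle12c", "sqlserver2012", "db2"):
        return f"{sql} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
    return None


def fetch_page(rows, limit, offset=0):
    """Fallback: return only a slice of a fully fetched result list."""
    return rows[offset:offset + limit]


base = "SELECT * FROM t ORDER BY id"
print(paginate(base, "mysql", 10, 20))
print(paginate(base, "oracle12c", 10, 20))
print(fetch_page(list(range(100)), 10, 20))
```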
*: eta: there is now, in ANSI SQL:2003. But it's not globally supported, it often performs badly, and it's a bit of a pain because you have to move/copy your ORDER into a new place in the statement, which makes it harder to wrap automatically:
SELECT * FROM (
    SELECT thiscol, thatcol, ROW_NUMBER() OVER (ORDER BY mtime DESC, id) AS rownumber
    FROM sometable -- placeholder table name
) numbered
WHERE rownumber BETWEEN 10 AND 20 -- care, 1-based index
ORDER BY rownumber;
There is also the "FETCH FIRST n ROWS ONLY" suffix in SQL:2008 (and DB2, where it originated). But like the TOP prefix in SQL Server, and the similar syntax in Informix, you can't specify a start point, so you still have to fetch and throw away some rows.
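Here is a runnable sketch of the ROW_NUMBER() approach, assuming SQLite 3.25 or later (the first release with window functions) through Python's sqlite3 module. The items table and the page helper are made up for the example.

```python
import sqlite3

# 30 hypothetical rows; a higher id has a later mtime.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, mtime INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, 1000 + i) for i in range(1, 31)],
)


def page(first, last):
    """Return the rows whose 1-based row number lies in [first, last]."""
    return conn.execute("""
        SELECT id, mtime FROM (
            SELECT id, mtime,
                   ROW_NUMBER() OVER (ORDER BY mtime DESC, id) AS rownumber
            FROM items
        ) numbered
        WHERE rownumber BETWEEN ? AND ?
        ORDER BY rownumber
    """, (first, last)).fetchall()


rows = page(10, 20)  # BETWEEN is inclusive, so this yields 11 rows
print(rows)
```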
Nowadays there is a standard way, though not necessarily an ANSI standard (people gave many answers; I think this is the least verbose one):
SELECT * FROM t1
WHERE ID > :lastId
ORDER BY ID
FETCH FIRST 3 ROWS ONLY
It's not supported by all databases, though. Below is a list of the databases that support it:
MariaDB: Supported since 5.1 (usually, limit/offset is used)
MySQL: Supported since 3.19.3 (usually, limit/offset is used)
PostgreSQL: Supported since PostgreSQL 8.4 (usually, limit/offset is used)
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 7
Oracle: Supported since version 12c (uses subselects with the row_num function)
Microsoft SQL Server: Supported since 2012 (traditionally, top-N is used)
You can use the offset style of course, although you could have performance issues
SELECT * FROM t1
ORDER BY ID
OFFSET 0 ROWS
FETCH FIRST 3 ROWS ONLY
Support for this form differs:
MariaDB: Supported since 5.1
MySQL: Supported since 4.0.6
PostgreSQL: Supported since PostgreSQL 6.5
SQLite: Supported since version 2.1.0
Db2 LUW: Supported since version 11.1
Oracle: Supported since version 12c
Microsoft SQL Server: Supported since 2012
Yes (SQL ANSI 2003), feature E121-10, combined with the F861 feature you have :
ORDER BY column OFFSET n1 ROWS FETCH NEXT n2 ROWS ONLY;
Like:
SELECT Name, Address FROM Employees ORDER BY Salary OFFSET 2 ROWS
FETCH NEXT 2 ROWS ONLY;
Examples:
postgres:
https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=e25bb5235ccce77c4f950574037ef379
oracle:
https://dbfiddle.uk/?rdbms=oracle_21&fiddle=07d54808407b9dbd2ad209f2d0fe7ed7
sqlserver:
https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=e25bb5235ccce77c4f950574037ef379
db2:
https://dbfiddle.uk/?rdbms=db2_11.1&fiddle=e25bb5235ccce77c4f950574037ef379
YugabyteDB:
https://dbfiddle.uk/?rdbms=yugabytedb_2.8&fiddle=e25bb5235ccce77c4f950574037ef379
Unfortunately, MySQL does not support this syntax; you need something like this (n1 is still the offset and n2 the number of rows):
ORDER BY column LIMIT n2 OFFSET n1
But MariaDB does:
https://dbfiddle.uk/?rdbms=mariadb_10.6&fiddle=e25bb5235ccce77c4f950574037ef379
I know I'm very, very late to this question, but it's still one of the top results for this issue.
However, one response missing from this question is that I believe the "correct" ANSI SQL method for paging, at least if you want maximum portability, is not to use LIMIT/OFFSET/FIRST etc. at all, but instead to do something like:
SELECT *
FROM MyTable
WHERE ColumnA > ?
ORDER BY ColumnA ASC
Where ? is a parameter using a library that supports them (such as PDO in PHP).
The idea here is simple: when fetching the first page we pass a parameter that will match every possible row, e.g. if ColumnA is text, we would pass an empty string (''). We then read in as many results as we want, and then release the rest. This may mean some extra rows are fetched behind the scenes, but our priority here is compatibility.
In order to fetch the next page, we take the value of ColumnA from the last row in our results, and pass it in as the parameter, this way we will only fetch values that appear after it. To run the same query in the other direction, just swap > for < and ASC for DESC.
There are some important caveats of this approach:
Since we're using a condition, your DBMS is free to use an index to optimise the request, which can actually be faster than some "proper" pagination methods, as you eliminate rows rather than advancing past them.
This form of paging is more tightly anchored than row number based methods. When using row number offsets, if you offset into the table, but new rows are added that sort earlier than the current page, then it will cause results to be shifted into later pages. For example, if your current page's last row is mango but since fetching it rows are added for apple and carrot, then mango may now appear on the next page as well, as it has been shifted in the sort order. By using a condition of ColumnA > 'mango' this can't happen. This can be very useful in cases where you are sorting by a DATETIME with frequent updates occurring.
This trick can be made to work in both directions, by reversing the sort order as mentioned when going backwards (flip > to < and ASC to DESC) and passing in the value of ColumnA from the first row of each page of results, rather than the last. Note that if values were added to your table, it may mean that your first page may be shorter, but this is a fairly minor issue.
To be sure you're on the last (or first) page, you should fetch N + 1 rows, where N is the number of rows you want per page, this way you can detect whether there are more rows to fetch.
This method works best if you have a single column with only unique values, but it is still possible to use in more complex cases, so long as you can expand your ORDER BY clause (and WHERE condition) to include enough columns that every row is unique.
So it's not without a few catches, but it's by far the most compatible method as every SQL database will support it.
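A minimal sketch of this approach, assuming Python's sqlite3 module in place of PDO and a made-up fruit table with one unique text column. The helper name next_page is hypothetical. It reads N + 1 rows to detect whether more pages exist, then closes the cursor to release the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO fruit VALUES (?)",
    [(n,) for n in ("apple", "banana", "carrot", "kiwi", "mango", "peach", "plum")],
)


def next_page(last_seen, page_size):
    """Fetch the page that sorts after last_seen; '' matches every row."""
    cur = conn.execute(
        "SELECT name FROM fruit WHERE name > ? ORDER BY name ASC",
        (last_seen,),
    )
    # Read one row more than needed to learn whether another page exists.
    names = [row[0] for row in cur.fetchmany(page_size + 1)]
    cur.close()  # release whatever rows were not read
    return names[:page_size], len(names) > page_size


page1, more1 = next_page("", 3)
page2, more2 = next_page(page1[-1], 3)  # continue after the last row seen
page3, more3 = next_page(page2[-1], 3)
print(page1, more1)
print(page2, more2)
print(page3, more3)
```

Going backwards would use name < ? with ORDER BY name DESC and the first row of the current page as the parameter, as described above.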
Insert your results into a storage table, ordered how you'd like to display them, but with a new IDENTITY column.
Now SELECT from that table just the range of IDs you're interested in.
(Be sure to clean out the table when you're done)
Or do it on the client, as anything to do with presentation should not normally be done on the SQL Server (in my opinion)
Example using TOP (which is SQL Server syntax rather than ANSI SQL):
offset=41, fetchsize=10
SELECT TOP(10) *
FROM table1
WHERE table1.ID NOT IN (SELECT TOP(40) table1.ID FROM table1)
For paging we need a RowNo column to filter on, computed over a field such as id, along with two variables such as @PageNo and @PageRows. Counting the rows with an id less than or equal to the current one gives a 1-based RowNo, which matches the range filter. So I use this query:
SELECT *
FROM (
    SELECT *, (SELECT COUNT(1)
               FROM aTable ti
               WHERE ti.id <= t.id) As RowNo
    FROM aTable t) tr
WHERE
    tr.RowNo >= (@PageNo - 1) * @PageRows + 1
    AND
    tr.RowNo <= @PageNo * @PageRows
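A runnable sketch of the counting-subquery idea, assuming SQLite via Python's sqlite3 and a made-up aTable with ids 10 to 70. The page variables are passed as named parameters, and the subquery counts ids less than or equal to the current one so that RowNo starts at 1. The correlated COUNT runs once per row, so this is only reasonable for small tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aTable (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO aTable VALUES (?, ?)",
    [(i * 10, f"row{i}") for i in range(1, 8)],
)


def page(page_no, page_rows):
    """Return one page; page_no is 1-based."""
    return conn.execute("""
        SELECT id, label FROM (
            SELECT t.id, t.label,
                   (SELECT COUNT(1) FROM aTable ti WHERE ti.id <= t.id) AS RowNo
            FROM aTable t
        ) tr
        WHERE tr.RowNo >= (:page_no - 1) * :page_rows + 1
          AND tr.RowNo <= :page_no * :page_rows
        ORDER BY tr.RowNo
    """, {"page_no": page_no, "page_rows": page_rows}).fetchall()


print(page(1, 3))
print(page(2, 3))
print(page(3, 3))
```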
BTW, Troels, PostgreSQL supports Limit/Offset