This Oracle SQL SELECT shouldn't work. Why does it? - sql

I am debugging a query in Oracle 19c that is trying to sort a SELECT DISTINCT result by a field not in the query. (Note: This is the wrong way to do it. Do not do this.)
This query is trying to return a unique list of customer names sorted with the most recent sale date first. It returns an expected error, "ORA-01791: not a SELECTed expression".
SELECT DISTINCT CUSTOMER_NAME
FROM SALES
ORDER BY LAST_SALE_DATE DESCENDING;
It returns an error because the query is trying to order the result by a field that has not been selected. This makes sense so far.
However, if I simply add FETCH FIRST 6 ROWS ONLY to the query, it does not return an error (although the result is not correct so do not do this). But the question is why Oracle does not return an error message?
SELECT DISTINCT CUSTOMER_NAME
FROM SALES
ORDER BY LAST_SALE_DATE DESCENDING
FETCH FIRST 6 ROWS ONLY;
Why does adding FETCH FIRST 6 ROWS ONLY make this work?
Added: The incorrect query will return duplicates if there are multiple records with the same name and date. A search for something like "sql select distinct order by another column" will show several correct ways to do this.

The explain plan for the query shows what is happening.
The parser rewrites the fetch... clause - it adds analytic row_number to the select list, and uses an outer query with a filter on row numbers. It pushes the unique directive (the distinct directive from your select) into the subquery, which is not a valid transformation; I would consider this an obvious bug in Oracle's implementation of fetch....
The explain plan shows the step where the parser creates an inline view where it applies unique and it adds analytic row_number(). It doesn't show the projection (what columns are included in the view), and - critically - what it applies unique to. A little bit of experimentation suggests the answer though: it applies unique to the combination of customer_name and last_sales_date.
It's possible that this has been reported and perhaps fixed in recent versions - what is your Oracle version?

Related

why distinct and order by doesn't work together in sql query?

enter image description here I am learning how to order by is used in SQL query, then I learned that order by and distinctly don't work together but, when I try to do it practically it worked. I am so confused even after asking chatgpt what the relationship is between order by and distinct.
I learned that when executing SQL queries, the ORDER BY clause comes after the SELECT clause. This means that the database will first retrieve the data specified in the SELECT clause, and then sort it based on the criteria specified in the ORDER BY clause. If the column used in the ORDER BY clause is not present in the SELECT clause, the database will automatically include that column in the select and do order by on both the columns and give result column only column given in SELECT.
However, when using both DISTINCT and ORDER BY together, the outcome may not be what is expected. This is because DISTINCT acts on both the columns in the SELECT clause and the column in the ORDER BY clause. This may cause unexpected results, especially in MySQL.
I found that when I tried this in practice, it still produced the desired results, which makes me question if I learned something incorrectly or if there is missing information that I am unaware of.
I am using the MYSQL database.
It seems there are some spaces before your unique value. It will be appropriate to perform TRIM/RTRIM and remove these spaces in the DISTINC clause itself.
It should be something like this:
DISTINCT TRIM(value) AS trim_value
...
ORDER BY trim_value
Also, it is possible that these are not spaces but some other characters which need to be replace, too.

Unexpected SQL Behaviour with "SELECT TOP" Query

I'm using Microsoft SQL Server 2019 and when I execute:
SELECT TOP 10 *
FROM WideWorldImporters.Sales.Invoices
SELECT TOP 10 CustomerID
FROM WideWorldImporters.Sales.Invoices
It gives results:
Which is incorrect because those aren't the "top 10" customer IDs as displayed by the first query.
Full Screenshot:
Edit: The behaviour I expected above matches what actually happens in SQL Sever 2014. I suspect they changed the underlying implementation in SQL Server 2019, although it still satisfies the documented behaviour.
A TOP without ORDER BY is unpredictable. This is documented by Microsoft. From Microsoft docs:
When you use TOP with the ORDER BY clause, the result set is limited to the first N number of ordered rows. Otherwise, TOP returns the first N number of rows in an undefined order.
...
In a SELECT statement, always use an ORDER BY clause with the TOP clause. Because, it's the only way to predictably indicate which rows are affected by TOP.
See also how does SELECT TOP works when no order by is specified?
You have no ORDER BY so the top 10 results by an indeterminate order are returned. This ordering can change, from one execution to the next.
Tables in SQL represent unordered sets. If you want particular rows using TOP, you need to have an ORDER BY.
This is too long for a comment.
In SQL, table records are unordered. For this reason, a query like yours:
SELECT TOP 10 * FROM WideWorldImporters.Sales.Invoices
... will produce inconsistent results, because it is missing an ORDER BY clause. You need to tell your RDBMS which column should be used to order the records so in can define which TOP records should be returned. For example, something like:
SELECT TOP 10 * FROM WideWorldImporters.Sales.Invoices ORDER BY InvoiceID DESC
This result is absolutely normal, as your query doesn't specify an order. The top 10 returns the first 10 results.
If you don't specify any ordering clause, the results will be returned according to the previous operations of the SQL engine, which might not always be the same, and surely not what you expect them to be.

Strange issue with the Order By --SQL

Few days ago I came across a strange problem with the Order By , While creating a new table I used
Select - Into - From and Order By (column name)
and when I open that table see tables are not arranged accordingly.
I re-verified it multiple times to make sure I am doing the right thing.
One more thing I would like to add is till the time I don't use INTO, I can see the desired result but as soon as I create new table, I see there is no Order for tht column. Please help me !
Thanks in advance.. Before posting the question I did research for 3 days but no solution yet
SELECT
[WorkOrderID], [ProductID], [OrderQty], [StockedQty]
INTO
[AdventureWorks2012].[Production].[WorkOrder_test]
FROM
[AdventureWorks2012].[Production].[WorkOrder]
ORDER BY
[StockedQty]
SQL 101 for beginners: SELECT statements have no defined order unless you define one.
When i open that table
That likely issues a SELECT (TOP 1000 IIFC) without order.
While creating a new table i used Select - Into - From and Order By (column name)
Which sort of is totally irrelevant - you basically waste performance ordering the input data.
You want an order in a select, MAKE ONE by adding an order by clause to the select. The table's internal order is by clustered index, but an query can return results in any order it wants. Fundamental SQL issue, as I said in the first sentence. Any good book on sql covers that in one of the first chapters. SQL uses a set approach, sets have no intrinsic order.
Firstly T-SQL is a set based language and sets don't have orders. More over it also doesn't mean serial execution of commands i.e, the above query is not executed in sequence written but the processing order for a SELECT statement is as:
1.FROM
2.ON
3.JOIN
4.WHERE
5.GROUP BY
6.WITH CUBE or WITH ROLLUP
7.HAVING
8.SELECT
9.DISTINCT
10.ORDER BY
Now when you execute your query without into selected column data gets ordered based on the condition specified in 'Order By' clause but when Into is used format of new_table is determined by evaluating the expressions in the select list.(Remember order by clause has not been evaluated yet).
The columns in new_table are created in the order specified by the select list but rows cannot be ordered. It's a limitation of Into clause you can refer here:
Specifying an ORDER BY clause does not guarantee the rows are inserted
in the specified order.

SQL - Using MAX in a WHERE clause

Assume value is an int and the following query is valid:
SELECT blah
FROM table
WHERE attribute = value
Though MAX(expression) returns int, the following is not valid:
SELECT blah
FROM table
WHERE attribute = MAX(expression)
OF course the desired effect can be achieved using a subquery, but my question is why was SQL designed this way - is there some reason why this sort of thing is not allowed? Students coming from programming languages where you can always replace a data-type by a function call that returns that type find this issue confusing. Is there an explanation one can give them rather than just saying "that's the way it is"?
It's just because of the order of operations of a query.
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
WHERE just filters the rows returned by FROM. An aggregate function like MAX() can't have a result returned because it hasn't even been applied to anything.
That's also the reason, why you can't use aliases defined in the SELECT clause in a WHERE clause, but you can use aliases defined in FROM clause.
A where clause checks every row to see if it matches the conditions specified.
A max computes a single value from a row set. If you put a max, or any other aggregate function into a where clause, how can SQL server figure out what rows the max function can use until the where clause has finished it filter?
This deals with the order that SQL Server processes commands in. It runs the WHERE clause before a GROUP BY or any aggregate. Since a where clause runs first, SQL Server can't tell if a row will be included in an aggregate until it processes the where. That is what the HAVING clause is for. HAVING runs after the GROUP BY and the WHERE and can include MAX since you have already filtered out the rows you don't want to use. See http://www.bennadel.com/blog/70-SQL-Query-Order-of-Operations.htm for a good explanation of the order in which SQL commands run.
Maybe this work
SELECT blah
FROM table
WHERE attribute = (SELECT MAX(expresion) FROM table1)
The WHERE clause is specifically designed to test conditions against raw data (individual rows of the table). However, MAX is an aggregate function over multiple rows of data. Basically, without a sub-select, the WHERE clause knows nothing about any rows in the table except for the current row. So how can you determine the maximum value over a whole bunch of rows when you don't even know what those rows are?
Yes, it's a little bit of a simplification, especially when dealing with joins, but the same principle applies. WHERE is always row-by-row, so that's all it really knows about.
Even if you have a GROUP BY clause, the WHERE clause still only processes one row at a time in the raw data before grouping. It doesn't know the value of a column in any other rows, so it has no way of knowing which row has the maximum value.
Assuming this is MS SQL Server, the following would work.
SELECT TOP 1 blah
FROM table
ORDER BY expression DESC

SQL error ORA 01427

I am trying to update one of the columns in my table by collecting the values from another table in the data store using this query
UPDATE tablename PT
SET DID = (select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID)
Note: Both the column names in the table are the same as entered
I get ORA 01427 error. Any idea about the issue?
I am trying to understand the other posts of this ORA error
As you can see here
SQL Error: ORA-01427: single-row subquery returns more than one row
This means that your sub-query
select distinct(did) from datastore.get_dept_alias
where upper(ltrim(rtrim(deptalias))) = upper(ltrim(rtrim(PT."Dept Descr")))
AND cid = PT.CID)
is returning more than one row.
So, are you sure that distinct (did) is unique? Looks like it's not. I don't recommend using where rownum = 1 because you don't know which one of the values will be used to update; unless you use ORDER BY.
Your getting this error because your select statement can return more than one result. You can not update a single cell with a query that can potentially return more than one result.
A common approach to avoid this with many SQL languages is to use a top 1 or something like that to assure the engine that you will only return one result. Note that you have to do this even if you know the query will only return one result. Just because YOU know it doesn't mean that the engine knows it. The engine also has to protect you from future possibilities not just things as they are right this moment.
Update:
I noticed you updated your question to Oracle. So in that case you could limit the subquery to a single result using the where rownum = 1 clause. As other answer pointed out you'd have to use further logic to ensure that top 1 coming back is the right one. If you don't know which one is the right one then solve that first.
The thought also occurs to me that you might be misunderstanding what DISTINCT does. This ensures that the return results are unique - but there could still be multiple unique results.