Strange issue with the Order By --SQL - sql

Few days ago I came across a strange problem with the Order By , While creating a new table I used
Select - Into - From and Order By (column name)
and when I open that table see tables are not arranged accordingly.
I re-verified it multiple times to make sure I am doing the right thing.
One more thing I would like to add is till the time I don't use INTO, I can see the desired result but as soon as I create new table, I see there is no Order for tht column. Please help me !
Thanks in advance.. Before posting the question I did research for 3 days but no solution yet
SELECT
[WorkOrderID], [ProductID], [OrderQty], [StockedQty]
INTO
[AdventureWorks2012].[Production].[WorkOrder_test]
FROM
[AdventureWorks2012].[Production].[WorkOrder]
ORDER BY
[StockedQty]

SQL 101 for beginners: SELECT statements have no defined order unless you define one.
When i open that table
That likely issues a SELECT (TOP 1000 IIFC) without order.
While creating a new table i used Select - Into - From and Order By (column name)
Which sort of is totally irrelevant - you basically waste performance ordering the input data.
You want an order in a select, MAKE ONE by adding an order by clause to the select. The table's internal order is by clustered index, but an query can return results in any order it wants. Fundamental SQL issue, as I said in the first sentence. Any good book on sql covers that in one of the first chapters. SQL uses a set approach, sets have no intrinsic order.

Firstly T-SQL is a set based language and sets don't have orders. More over it also doesn't mean serial execution of commands i.e, the above query is not executed in sequence written but the processing order for a SELECT statement is as:
1.FROM
2.ON
3.JOIN
4.WHERE
5.GROUP BY
6.WITH CUBE or WITH ROLLUP
7.HAVING
8.SELECT
9.DISTINCT
10.ORDER BY
Now when you execute your query without into selected column data gets ordered based on the condition specified in 'Order By' clause but when Into is used format of new_table is determined by evaluating the expressions in the select list.(Remember order by clause has not been evaluated yet).
The columns in new_table are created in the order specified by the select list but rows cannot be ordered. It's a limitation of Into clause you can refer here:
Specifying an ORDER BY clause does not guarantee the rows are inserted
in the specified order.

Related

This Oracle SQL SELECT shouldn't work. Why does it?

I am debugging a query in Oracle 19c that is trying to sort a SELECT DISTINCT result by a field not in the query. (Note: This is the wrong way to do it. Do not do this.)
This query is trying to return a unique list of customer names sorted with the most recent sale date first. It returns an expected error, "ORA-01791: not a SELECTed expression".
SELECT DISTINCT CUSTOMER_NAME
FROM SALES
ORDER BY LAST_SALE_DATE DESCENDING;
It returns an error because the query is trying to order the result by a field that has not been selected. This makes sense so far.
However, if I simply add FETCH FIRST 6 ROWS ONLY to the query, it does not return an error (although the result is not correct so do not do this). But the question is why Oracle does not return an error message?
SELECT DISTINCT CUSTOMER_NAME
FROM SALES
ORDER BY LAST_SALE_DATE DESCENDING
FETCH FIRST 6 ROWS ONLY;
Why does adding FETCH FIRST 6 ROWS ONLY make this work?
Added: The incorrect query will return duplicates if there are multiple records with the same name and date. A search for something like "sql select distinct order by another column" will show several correct ways to do this.
The explain plan for the query shows what is happening.
The parser rewrites the fetch... clause - it adds analytic row_number to the select list, and uses an outer query with a filter on row numbers. It pushes the unique directive (the distinct directive from your select) into the subquery, which is not a valid transformation; I would consider this an obvious bug in Oracle's implementation of fetch....
The explain plan shows the step where the parser creates an inline view where it applies unique and it adds analytic row_number(). It doesn't show the projection (what columns are included in the view), and - critically - what it applies unique to. A little bit of experimentation suggests the answer though: it applies unique to the combination of customer_name and last_sales_date.
It's possible that this has been reported and perhaps fixed in recent versions - what is your Oracle version?

how to resolve this - group by changes the Order of items in SQL Server

I'm using SQL server 2014,I'm fetching data from a view.The order of items is getting changed once i use Group by ,how can i get the order back after using this Group by,There is one date column,but its not saving any time,So i can't sort it based on date also..
How can I display the data in the same order as it displayed before using Group by?Anyone have any idea please help?
Thanks
Tables and views are essentially unordered sets. To get rows in a specific order, you should always add an ORDER BY clause on the columns you wish to order on.
I'm assuming you previously selected from the VIEW without an ORDER BY clause. The order in which rows are returned from a SELECT statement without an ORDER BY statement is undefined. The order you are getting them in, can change due to any number of reasons (eg some are listed here).
Your problem stems from the mistake you made on relying on the order from a SELECT from a VIEW without an ORDER BY. You should have had an ORDER BY clause in your SELECT statement to begin with.
How can I display the data in the same order as it displayed before using Group by?
The answer: You can't if your initial statement did not have an ORDER BY clause.
The resolution: Determine the order you want the resultset in and add an ORDER BY clause on those columns, both in your initial query and the version with the GROUP BY clause.
Maybe you can use the row_number() function without any OVER and ORDER BY keywords? This should be done in a sub-select and when you group the data in the outer SELECT, use the AVG() function on the numbered column and ORDER the result by this. The problem is, that when you group rows, the original rows disappear. That's kind if the purpose of GROUP BY. ;) Depending on what you GROUP BY, what you're asking might be logically impossible.
EDIT:
Found this solution Googling: http://blog.sqlauthority.com/2015/05/05/sql-server-generating-row-number-without-ordering-any-columns/
So you can number rows like this to maintain the order of rows from the table before you GROUP BY:
row_number() OVER (ORDER BY (SELECT 1))
The only way you can enforce a specific order is to explicitly use a ORDER BY clause. Otherwise the order of rows is not guaranteed (take a look at this article for more details) and the database engine will return the rows based on "as fast as it can" or "as fast as it can retrieve them from disk" rule. So, order can also vary between executions of the same query in the span of a few seconds.
When doing a DISTINCT, GROUP BY or ORDER BY, SQL Server automatically does a SORT of the data based on an index it uses for that query.
Looking at the execution plan of your query will show you what index (and implicitly columns in that index) is being used to sort the data.

What is the order of execution for this SQL statement

I have below SQL Query :
SELECT TOP 5 C.CustomerID,C.CustomerName,C.CustomerSalary
FROM Customer C
WHERE C.CustomerSalary > 10000
ORDER BY C.CustomerSalary DESC
What will be execution order of the following with proper explanation ?
TOP Clause
WHERE Clause
ORDER BY Clause
Check out the documentation for the SELECT statement, in particular this section:
Logical Processing Order of the SELECT statement
The following steps show the logical processing order, or binding
order, for a SELECT statement. This order determines when the objects
defined in one step are made available to the clauses in subsequent
steps. For example, if the query processor can bind to (access) the
tables or views defined in the FROM clause, these objects and their
columns are made available to all subsequent steps. Conversely,
because the SELECT clause is step 8, any column aliases or derived
columns defined in that clause cannot be referenced by preceding
clauses. However, they can be referenced by subsequent clauses such as
the ORDER BY clause. Note that the actual physical execution of the
statement is determined by the query processor and the order may vary
from this list.
which gives the following order:
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
WHERE
ORDER BY
TOP
Here is a good article about that: http://blog.sqlauthority.com/2009/04/06/sql-server-logical-query-processing-phases-order-of-statement-execution/
Simply remember this phrase:-
Fred Jones' Weird Grave Has Several Dull Owls
Take the first letter of each word, and you get this:-
FROM
(ON)
JOIN
WHERE
GROUP BY
(WITH CUBE or WITH ROLLUP)
HAVING
SELECT
DISTINCT
ORDER BY
TOP
Hope that helps.
This is exact execution order, with your case.
1-FROM
2-WHERE
3-SELECT
4-ORDER BY
5-TOP
TOP, WHERE, and ORDER BY are not "executed" - they simply describe the desired result and the database query optimizer determines (hopefully) the best plan for the actual execution. The separation between "declaring the desired result" and how it is physically achieved is what makes SQL a "declarative" language.
Assuming there is an index on CustomerSalary, and the table is not clustered, your query will likely be executed as an index seek + table heap access, as illustrated in this SQL Fiddle (click on View Execution Plan at the bottom):
As you can see, first the correct CustomerSalary value is found through the Index Seek, then the row that value belongs to is retrieved from the table heap through RID Lookup (Row ID Lookup). The Top is just for show here (and has 0% cost), as is the Nested Loops for that matter - the starting index seek will return (at most) one row in any case. The whole query is rather efficient and will likely cost only a few I/O operations.
If the table is clustered, you'll likely have another index seek instead of the table heap access, as illustrated in this SQL Fiddle (note the lack of NONCLUSTERED keyword in the DDL SQL):
But beware: I was lucky this time to get the "right" execution plan. The query optimizer might have chosen a full table scan, which is sometimes actually faster on small tables. When analyzing query plans, always try to do that on realistic amounts of data!
Visit https://msdn.microsoft.com/en-us/library/ms189499.aspx for a better explanation.
The following steps show the logical processing order, or binding order, for a SELECT statement. This order determines when the objects defined in one step are made available to the clauses in subsequent steps. For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps. Conversely, because the SELECT clause is step 8, any column aliases or derived columns defined in that clause cannot be referenced by preceding clauses. However, they can be referenced by subsequent clauses such as the ORDER BY clause. Note that the actual physical execution of the statement is determined by the query processor and the order may vary from this list.
FROM
ON
JOIN
WHERE
GROUP BY
WITH CUBE or WITH ROLLUP
HAVING
SELECT
DISTINCT
ORDER BY
TOP
My $0,02 here.
There's two different concepts in action here: the logical execution order and the plan of query execution. An other was to see it is who answers the following questions:
How MSSQL understood my SQL Query?
What it'll do to execute it in the best possible way given the current schema and data?
The first question is answered by the logical execution order. Brian's answer show what it is. It's the way SQL understood your command: "FROM Customer table (aliased as C) consider only the rows WHERE the C.CustomerSalary > 10000, ORDER them BY C.CustomerSalary in descendent order and SELECT the columns listed for the TOP 5 rows". The resultset will obey that meaning
The second question's answer is the query execution plan - and it depends on your schema (table definitions, selectivity of data, quantity of rows in the customer table, defined indexes, etc) since is heavily dependant of SQL Server optimizer internal workings.
Here is the complete sequence for sql server :
1. FROM
2. ON
3. JOIN
4. WHERE
5. GROUP BY
6. WITH CUBE or WITH ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. ORDER BY
11. TOP
So from the above list, you can easily understand the execution sequence of TOP, WHERE and ORDER BY which is :
1. WHERE
2. ORDER BY
3. TOP
Get more information about it from Microsoft

SQL Server Implicit Order

i've got an issue due to database conception.
My data are grouped in a table which looks like :
IdGroup | IdValue
So for each group i've got the list of value.
Indeed, we should have had an order column or an id, but i can't.
Do you know anyway which can prove the order of the select value based on the insert order ?
I mean, if I inserted 1003,1001,1002 could i garantuee it to be retrieve in this order ?
IdGroup | IdValue
1 | 1003
1 | 1001
1 | 1002
Of course, using an order by doesn't seems to fit because i don't have any column usable.
Any idea ? Using a system proc or something like this.
Thanks a lot :)
Stop telling me to use an order by and altering the table, it doesn't fit and yes i know it's the good pratice to do... thanks :)
A couple of ideas:
DBCC PAGE (undocumented) can be used to look at the raw data pages of the table. It may be possible to determine insert order by looking at the low level information.
If you cannot alter the table, can you add a table to the database? If so, consider creating a table with an identity column and use a trigger on the original table to insert the records in the new table.
Also, you should include which version(s) of SQL Server are involved. Doing anything this unusual will very often be version specific.
You shouldn't rely on the data being returned in a particular order; use an ORDER BY clause to guarantee the order.
(Despite the fact that data appears to be returned in clustered index order, this might not always be the case).
Whilst some small scale tests will show that it returns it in what appears to be the right order, it just will not hold.
The golden rule remains - unless an order by clause is specified, there are no guarentees provided on the order of the returned data.
edit : If you place a non-clustered index on the idgroup column it is forced to add a hidden field, the uniqueifier since the values are the same - the problem it, you can't access it in an order by clause, but from a forensic perspective, you can determine the order it was inserted in.
As others have said, the only way to guarantee an ordering is with an ORDER BY clause. What isn't highlighted in their answers is that, the only place that this ORDER BY matters is in the SELECT statement. It doesn't* matter if you apply an ORDER BY clause during the INSERT statement; the system is free to return results from a select in whatever order it finds most efficient, unless an ORDER BY is specified at that time.
*There's a particular way to ensure what order IDENTITY values are assigned to a result set during an INSERT, using an ORDER BY, but I can't remember the exact details, and it still doesn't effect the order of SELECT.
Can you add the Created Date column? In this way you can get the records using Order by Clause Created Date. Moreover set it's default value Getdate()

Query Optimisation (ORDER BY)

I have an SQL query similar to below:
SELECT NAME,
MY_FUNCTION(NAME) -- carries out some string manipulation
FROM TITLES
ORDER BY NAME; -- has an index.
The TITLES table has approximately 12,000 records. At the moment the query takes over 5 minutes to execute but if I remove the ORDER BY clause then it executes within a couple of seconds.
Does anyone have any suggestions on how to be speed up this query.
If MY_FUNCTION is deterministic (i.e. always returns the same result for the same input value) then you could create an index on (NAME, MY_FUNCTION(NAME)) and it may help (or may not!)
In comments under the question, you say that it takes 2 seconds "to return N rows without the ORDER BY". That makes sense: without the ORDER BY you will just get the first N rows encountered, as soon as they are encountered. With the ORDER BY, the first N rows are returned only after the results have been sorted into the correct order.
If the query is being used in a situation where getting the first N rows fast is important (e.g. an online report with pagination) then you could try adding a FIRST_ROWS or FIRST_ROWS_n hint to the query, to try to persuade it to use the index. See Choosing an Optimizer Goal
Use the EXPLAIN statement to see where the issue is
EXPLAIN SELECT NAME, MY_FUNCTION(NAME) FROM TITLES ORDER BY NAME;
Sounds weird. What's name column type?
Have you checked for defective hardware errors? Maybe (just maybe) your query with the order by clause is using your index, and your index is located in a defective disk (it could be in a different disk from the table if they are located in different tablespaces).