How can I order entries in a UNION without ORDER BY? - sql

How can I be sure that my result set will have a first and b second? It would help me to solve a tricky ordering problem.
Here is a simplified example of what I'm doing:
SELECT a FROM A LIMIT 1
UNION
SELECT b FROM B LIMIT 1;

SELECT col
FROM
(
SELECT a col, 0 ordinal FROM A LIMIT 1
UNION ALL
SELECT b, 1 FROM B LIMIT 1
) t
ORDER BY ordinal

I don't think order is guaranteed, at least not across all DBMS.
What I've done in the past to control the ordering in UNIONs is:
(SELECT a, 0 AS Foo FROM A LIMIT 1)
UNION
(SELECT b, 1 AS Foo FROM B LIMIT 1)
ORDER BY Foo

Your result set with UNION will eliminate distinct values.
I can't find any proof in documentation, but from 10 years experience I can tell that UNION ALL does preserve order, at least in Oracle.
Do not rely on this, however, if you're building a nuclear plant or something like that.

No, the order of results in a SQL query is controlled only by the ORDER BY clause. It may be that you happen to see ordered results without an ORDER BY clause in some situation, but that is by chance (e.g. a side-effect of the optimiser's current query plan) and not guaranteed.
What is the tricky ordering problem?

I know for Oracle there is no way to guarantee which will come out first without an order by. The problem is if you try it it may come out in the correct order even for most of the times you run it. But as soon as you rely on it in production, it will come out wrong.

I would have thought not, since the database would most likely need to do an ORDER BY in order to the UNION.
UNION ALL might behave differently, but YMMV.

The short answer is yes, you will get A then B.

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

Fetch data sequentially using union operator

I am fetching data using Union operator. I want my output to be in the same order as my select queries are fetching but instead Union sorts it in alphabetical order. Can you suggest me a way to avoid getting it sorted by default.
Try to do it in a subquery like this:
select * from (select x , y ,z from table1
UNION ALL
select x,y,z from table2)
order by y
Thilo is correct: to be safe, a query should always explicitly order the results. Relying on implicit sorting has caused many problems in the past and will continue to cause more problems in the future.
The suggestion that UNION ALL will avoid the sorting is almost always correct. It should work in 11g and below. But 12c introduced concurrent execution of union all, which does not guarantee the order of results any more.
Even if implicit sorting works right now it's always a good idea to add an ORDER BY.
You should not rely on the order if you do not specify any ORDER BY clause.
UNION guaranties uniquines of the result. Pre 10g versions used sort to remove duplicates. Newer Oracle version also might (but not must) use a hash-table to remove duplicates - therefore the result is not necessarily sorted.
UNION ALL does not care about uniquines.
You can simply type:
select x , y ,z from table1
UNION ALL
select x,y,z from table2
order by y
Order by applies onto the whole result.

Does UNION ALL guarantee the order of the result set [duplicate]

This question already has answers here:
SQL Server UNION - What is the default ORDER BY Behaviour
(6 answers)
Closed 9 years ago.
Can I be sure that the result set of the following script will always be sorted like this O-R-D-E-R ?
SELECT 'O'
UNION ALL
SELECT 'R'
UNION ALL
SELECT 'D'
UNION ALL
SELECT 'E'
UNION ALL
SELECT 'R'
Can it be proved to sometimes be in a different order?
There is no inherent order, you have to use ORDER BY. For your example you can easily do this by adding a SortOrder to each SELECT. This will then keep the records in the order you want:
SELECT 'O', 1 SortOrder
UNION ALL
SELECT 'R', 2
UNION ALL
SELECT 'D', 3
UNION ALL
SELECT 'E', 4
UNION ALL
SELECT 'R', 5
ORDER BY SortOrder
You cannot guarantee the order unless you specifically provide an order by with the query.
No it does not. SQL tables are inherently unordered. You need to use order by to get things in a desired order.
The issue is not whether it works once when you try it out. The issue is whether you can trust this behavior. And you cannot. SQL Server does not even guarantee the ordering for this:
select *
from (select t.*
from t
order by col1
) t
It says here:
When ORDER BY is used in the definition of a view, inline function,
derived table, or subquery, the clause is used only to determine the
rows returned by the TOP clause. The ORDER BY clause does not
guarantee ordered results when these constructs are queried, unless
ORDER BY is also specified in the query itself.
A fundamental principle of the SQL language is that tables are not ordered. So, although your query might work in many databases, you should use the version suggested by BlueFeet to guarantee the ordering of results.
Try removing all of the ALLs, for example. Or even just one of them. Now consider that the type of optimization that has to happen there (and many other types) will also be possible when the SELECT queries are actual queries against tables, and are optimized separately. Without an ORDER BY, ordering within each query will be arbitrary, and you can't guarantee that the queries themselves will be processed in any order.
Saying UNION ALL with no ORDER BY is like saying "Just throw all the marbles on the floor." Maybe every time you throw all the marbles on the floor, they end up being organized by color. That doesn't mean the next time you throw them on the floor they'll behave the same way. The same is true for ordering in SQL Server - if you don't say ORDER BY then SQL Server assumes you don't care about order. You may see by coincidence a certain order being returned all the time, but many things can affect the arbitrary order that has been selected next time. Data changes, statistics changes, recompile, plan flush, upgrade, service pack, hotfix, trace flag... ad nauseum.
I will put this in large letters to make it clear:
You cannot guarantee an order without ORDER BY
Some further reading:
Bad habits to kick : relying on undocumented behavior
Also, please read this post by Conor Cunningham, a pretty smart guy on the SQL team.
No. You get the records in whatever way SQL Server fetches them for you. You can apply an order on a unioned result set by 1-based index thusly:
SELECT 1, 'O'
UNION ALL
SELECT 2, 'R'
UNION ALL
SELECT 3, 'D'
UNION ALL
SELECT 4, 'E'
UNION ALL
SELECT 5, 'R'
ORDER BY 1

How to speed up group-based duplication-count queries on unindexed tables

When I need to know the number of rows containing more than n duplicates for certain colulmn c, I can do it like this:
WITH duplicateRows AS (
SELECT COUNT(1)
FROM [table]
GROUP BY c
HAVING COUNT(1) > n
) SELECT COUNT(1) FROM duplicateRows
This leads to an unwanted behaviour: SQL Server counts all rows grouped by i, which (when no index is on this table) leads to horrible performance.
However, when altering the script such that SQL Server doesn't have to count all the rows doesn't solve the problem:
WITH duplicateRows AS (
SELECT 1
FROM [table]
GROUP BY c
HAVING COUNT(1) > n
) SELECT COUNT(1) FROM duplicateRows
Although SQL Server now in theory can stop counting after n + 1, it leads to the same query plan and query cost.
Of course, the reason is that the GROUP BY really introduces the cost, not the counting. But I'm not at all interested in the numbers. Is there another option to speed up the counting of duplicate rows, on a table without indexes?
The greatest two costs in your query are the re-ordering for the GROUP BY (due to lack of appropriate index) and the fact that you're scanning the whole table.
Unfortunately, to identify duplicates, re-ordering the whole table is the cheapest option.
You may get a benefit from the following change, but I highly doubt it would be significant, as I'd expect the execution plan to involve a sort again anyway.
WITH
sequenced_data AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY fieldC) AS sequence_id
FROM
yourTable
)
SELECT
COUNT(*)
FROM
sequenced_data
WHERE
sequence_id = (n+1)
Assumes SQLServer2005+
Without indexing the GROUP BY solution is the best, every PARTITION-based solution involving both table(clust. index) scan and sort, instead of simple scan-and-counting in GROUP BY case
If the only goal is to determine if there are ANY rows in ANY group (or, to rephrase that, "there is a duplicate inside the table, given the distinction of column c"), adding TOP(1) to the SELECT queries could perform some performance magic.
WITH duplicateRows AS (
SELECT TOP(1)
1
FROM [table]
GROUP BY c
HAVING COUNT(1) > n
) SELECT 1 FROM duplicateRows
Theoretically, SQL Server doesn't need to determine all groups, so as soon as the first group with a duplicate is found, the query is finished (but worst-case will take as long as the original approach). I have to say though that this is a somewhat imperative way of thinking - not sure if it's correct...
Speed and "without indexes" almost never go together.
Athough as others here have mentioned I seriously doubt that it will have performance benefits. Perhaps you could try restructuring your query with PARTITION BY.
For example:
WITH duplicateRows AS (
SELECT a.aFK,
ROW_NUMBER() OVER(PARTITION BY a.aFK ORDER BY a.aFK) AS DuplicateCount
FROM Address a
) SELECT COUNT(DuplicateCount) FROM duplicateRows
I haven't tested the performance of this against the actual group by clause query. It's just a suggestion of how you could restructure it in another way.

Clarification on "rownum"

I have a table Table1
Name Date
A 01-jun-2010
B 03-dec-2010
C 12-may-2010
When i query this table with the following query
select * from table1 where rownum=1
i got output as
Name Date
A 01-jun-2010
But in the same way when i use the following queries i do not get any output.
select * from table1 where rownum=2
select * from table1 where rownum=3
Someone please give me guidance why it works like that, and how to use the rownum.
Tom has an answer for many Oracle related questions
In short, rownum is available after the where clause has been applied and before the order by clause is applied.
In the case of RowNum=2, the predicate in the where clause will never evaluate to true as RowNum starts at 1 and only increases if records matching the predicate can be found.
Adding rownums is one of the last things done after the result set has been fetched from the database. This means that the first row will always have rownum 1. Rownum is better used when you want to limit the result set, for instance when doing paging.
See this for more: http://www.orafaq.com/wiki/ROWNUM
(Not an Oracle expert by any means)
From what I understand, rownum numbers the rows in a result set.
So, in your example:
select * from table1 where rownum=2
How many rows are there going to be in the result set? Therefore, what rownum would be assigned to such a row? Can you see now why no result is actually returned?
In general, you should avoid relying on rownum, or any features that imply an order to results. Try to think about working with the entire set of results.
With that being said, I believe the following would work:
select * from (select rownum as rn,table1.* from table1) as t where t.rn = 2
Because in that case, you're numbering the rows within the subquery.