Select Distinct without sorting - sql

I used a Select Distinct query, which resulted me a sorted data. Is there anyway that i dont get data sorted?

I'll try to elaborate a bit as to what's going on and why... though I agree with #vic's comment to the question...
Without explicitly stating an order (via an order by clause) there is absolutely no guarantee of any order in the result set.
Practically speaking, many queries will return a consistent order based on the query plan and how the data is actually stored and accessed... DO NOT RELY ON THIS!
Specifically, for a distinct query, the sql engine will sort the data so that it can be sure to remove any duplicates.
In short, if the order of the result set matters (even if the desired order is "random") you must ALWAYS explicitly state it. That said, from a purely set-based-math/sql standpoint, the order of the result shouldn't matter.

Put this at the end of your query. This will effectively randomize the results which then will appear to you non-sorted ;)
ORDER BY Rnd([ID]);
Replace the ID with primary key of the table. In Access SQL it is possible to call certain VB Functions directly. In this case the Rnd function can be called in a query and fed a seed value from the data being sorted.

I think sorting may have something to do with the way DISTINCT is determined.
The easiest way to return distinct values is to sort the selection set
returned by processing the SQL predicate and then
returning only the rows where the DISTINCT columns change value from the prior row.
In short,
DISTINCT requires a sort to be performed where duplicate rows are dropped.
That said, there is no guarantee that rows are returned to you in any particular
order unless you explicitly include an ORDER BY clause.

Related

Does ms access group by order the results?

I need to run a query that groups the result and orders it. When I used the following query I noticed that the results were ordered by the field name:
SELECT name, count(name)
FROM contacts
GROUP BY name
HAVING count(name)>1
Originally I planed on using the following query:
SELECT name, count(name)
FROM contacts
GROUP BY name
HAVING count(name)>1
ORDER BY name
I'm worried that order by significantly slows the running time.
Can I depend on ms-access to always order by the field I am grouping by, and eliminate the order by?
EDIT: I tried grouping different fields in other tables and it was always ordered by the grouped field.
I have found answers to this question to other SQL DBMSs, but not access.
How GROUP BY and ORDER BY work in general
Databases usually choose between sorting and hashing when creating groups for GROUP BY or DISTINCT operations. If they do choose sorting, you might get lucky and the sorting is stable between the application of GROUP BY and the actual result set consumption. But at some later point, this may break as the database might suddenly prefer an order-less hashing algorithm to produce groups.
In no database, you should ever rely on any implicit ordering behaviour. You should always use explicit ORDER BY. If the database is sophisticated enough, adding an explicit ORDER BY clause will hint that sorting is more optimal for the grouping operation as well, as the sorting can then be re-used in the query execution pipeline.
How this translates to your observation
I tried grouping different fields in other tables and it was always ordered by the grouped field.
Have you exhaustively tried all possible queries that could ever be expressed? I.e. have you tried:
JOIN
OUTER JOIN
semi-JOIN (using EXISTS or IN)
anti-JOIN (using NOT EXISTS or NOT IN)
filtering
grouping by many many columns
DISTINCT + GROUP BY (this will certainly break your ordering)
UNION or UNION ALL (which defeats this argument anyway)
I bet you haven't. And even if you tried all of the above, can you be sure there isn't a very peculiar configuration where the above breaks, just because you've observed the behaviour in some (many) experiments?
You cannot.
MS Access specific behaviour
As far as MS Access is concerned, consider the documentation on ORDER BY
Remarks
ORDER BY is optional. However, if you want your data displayed in sorted order, then you must use ORDER BY.
Notice the wording. "You must use ORDER BY". So, MS Acces is no different from other databases.
The answer
So your question about performance is going in the wrong direction. You cannot sacrifice correctness for performance in this case. Better tackle performance by using indexes.
Here is the MSDN documentation for the GROUP BY clause in Access SQL:
https://msdn.microsoft.com/en-us/library/bb177905(v=office.12).aspx
The page makes no reference to any implied or automatic ordering of results - if you do see desired ordering without an explicit ORDER BY then it is entirely coincidental.
The only way to guarantee the particular ordering of results in SQL is with ORDER BY.
There is a slight performance problem with using ORDER BY (in general) in that it requires the DBMS to get all of the results first before it outputs the first row of results (though the DBMS is free to use an "online sort" algorithm that sorts data as it gets each row from its backing store, it still needs to get the last row from the backing store before it can return the first row to the client (in case the last row from the backing-store happens to be the 1st result according to the ORDER BY) - however unless you're querying tens of thousands of rows in a latency-sensitive application this is not a problem - and as you're using Access already it's very clear that this is not a performance-sensitive application.

Oracle SQL returns rows in arbitrary fashion when no "order by" clause is used

Maybe someone can explain this to me, but when querying a data table from Oracle, where multiple records exist for a key (say a customer ID), the record that appears first for that customer can vary if there is no implicit "order by" statement enforcing the order by say an alternate field such as a transaction type. So running the same query on the same table could yield a different record ordering than from 10 minutes ago.
E.g., one run could yield:
Cust_ID, Transaction_Type
123 A
123 B
Unless an "order by Transaction_Type" clause is used, Oracle could arbitrarily return the following result the next time the query is run:
Cust_ID, Transaction_Type
123 B
123 A
I guess I was under the impression that there was a database default ordering of rows in Oracle which (perhaps) reflected the physical ordering on the disk medium. In other words, an arbitrary order that is immutable and would guarantee the same result when a query is rerun.
Does this have to do with the optimizer and how it decides where to most efficiently retrieve the data?
Of course the best practice from a programming perspective is to force whatever ordering is required, I was just a little unsettled by this behavior.
The order of rows returned to the application from a SELECT statement is COMPLETELY ARBITRARY unless otherwise specified. If you want, need, or expect rows to return in a certain order, it is the user's responsibility to specify such an order.
(Caveat: Some versions of Oracle would implicitly sort data in ascending order if certain operations were used, such as DISTINCT, UNION, MINUS, INTERSECT, or GROUP BY. However, as Oracle has implemented hash sorting, the nature of the sort of the data can vary, and lots of SQL relying on that feature broke.)
There is no default ordering, ever. If you don't specify ORDER BY, you can get the same result the first 10000 times, then it can change.
Note that this is also true even with ORDER BY for equal values. For example:
Col1 Col2
1 1
2 1
3 2
4 2
If you use ORDER BY Col2, you still don't know if row 1 or 2 will come first.
Just image the rows in a table like balls in a basket. Do the balls have an order?
I dont't think there is any DBMS that guarantees an order if ORDER BY is not specified.
Some might always return the rows in the order they were inserted, but that is an implementation side effect.
Some execution plans might cause the result set to be ordered even without an ORDER BY, but again this is an implementation side-effect that you should not rely on.
If an ORDER BY clause is not present the database (not just Oracle - any relational database) is free to return rows in whatever order it happens to find them. This will vary depending on the query plan chosen by the optimizer.
If the order in which the rows are returned matters you must use an ORDER BY clause. You may sometimes get lucky and the rows will come back in the order you want them to be even without an ORDER BY, but there is no guarantee that A) you will get lucky on other queries, and B) the order in which the rows are returned tomorrow will be the same as the order in which they're returned today.
In addition, updates to the database product may change the behavior of queries. We had to scramble a bit when doing a major version upgrade last year when we found that Oracle 10 returned GROUP BY results in a different order than did Oracle 9. Reason - no ORDER BY clause.
ORDER BY - when the order of the returned data really matters.
The simple answer is that the SQL standard says that there is no default order for queries that do not have an ORDER BY statement, so you should never assume one.
The real reason would probably relate to the hashes assigned to each row as it is pulled into the record set. There is no reason to assume consistent hashing.
if you don't use ORDER BY, the order is arbitrary; however, dependent on phisical storage and memory aspects.
so, if you repeat the same query hundreds of times in 10 minutes, you will get almost the same order everytime, because probably nothing changes.
Things that could change the "noorder order" are:
the executing plan - if is changed(you have pointed
that)
inserts and deletes on the tables involved in the query.
other things like presence in memory of the rows.(other querys on other tables could influence that)
When you get into parallel data retrieval I/O isn't it possible to get different sequences on different runs, even with no change to the stored data?
That is, in a multiprocessing environment the order of completion of parallel threads is undefined and can vary with what else is happening on the same shared processor.
As I'm new to Oracle database engine, I noticed this behavior in my SELECT statements that has no ORDER BY.
I've been using Microsoft SQL Server for years now. SQL Server Engine always will retrieve data ordered by the table's "Clustered Index" which is basically the Primary Key Index. SQL Server will always insert new data in a sequential order based on the clustered index.
So when you perform a select on a table without order by in SQL Server, it will always retrieve data ordered by primary key value.
ORDER BY can cause serious performance overhead, that's why you do not want to use it unless you are not happy with inconsistent results order.
I ended up with a conclusion that in ALL my Oracle queries I must use ORDER BY or I will end up with unpredicted order which will greatly effect my end-user reports.

How do database servers decide which order to return rows without any "order by" statements?

Kind of a whimsical question, always something I've wondered about and I figure knowing why it does what it does might deepen my understanding a bit.
Let's say I do "SELECT TOP 10 * FROM TableName". In short timeframes, the same 10 rows come back, so it doesn't seem random. They weren't the first or last created. In my massive sample size of...one table, it isn't returning the min or max auto-incrementing primary key value.
I also figure the problem gets more complex when taking joins into account.
My database of choice is MSSQL, but I figure this might be an interesting question regardless of the platform.
If you do not supply an ORDER BY clause on a SELECT statement you will get rows back in arbitrary order.
The actual order is undefined, and depends on which blocks/records are already cached in memory, what order I/O is performed in, when threads in the database server are scheduled to run, and so on.
There's no rhyme or reason to the order and you should never base any expectations on what order rows will be in unless you supply an ORDER BY.
If they're not ordered by the calling query, I believe they're just returned in the order they were read off disk. This may vary because of the types of joins used or the indexes that looked up the values.
You can see this if the table has a clustered index on it (and you're just selecting - a JOIN can re-order things) - a SELECT will return the rows in clustered-index-order, even without an ORDER BY clause.
There is a very detailed explanation with examples here: http://sqlserverpedia.com/blog/sql-server-bloggers/its-the-natural-order-of-things-not/
"How do database servers decide which order to return rows without any “order by” statements?"
They simply do not take any "decision" with respect to ordering. They see the user doesn't care about ordering, and so they don't care either. And thus they simply go out to find the requested rows. The order in which they find them is normally the order in which you get them. That order depends on user-unpredictable things like the chosen physical access paths, ordering of physical records inside the database's physical files, etc. etc.
Don't let yourself be misled by the ordering as you get it, in the case where you didn't explicitly specify an ordering in your query. If you don't specify an ordering in your query, no ordering in the result set is guaranteed, even if in practice results seem to suggest that some ordering appears to be adhered to by the server.

what's order when select data from database?

Suppose I have a table:
CREATE TABLE [tab] (
[name] varchar,
[order_by] int
)
There are 10 rows in the table, and all rows have same value for order_by (Let's say it's 0)
If I then issue following SQL:
select * from [tab] order by [order_by]
What's the order of the rows? What factor decides the row order in this case?
It's not defined. The database can spit them out in any order it chooses, and it can even change the order between queries if it feels like it (it probably won't do this, but you shouldn't rely on the order being consistent).
If your columns that you order by has no variation than there is no guaranteed order.
Any time you want a defined order, you need a good order by clause. I can't even imagine why anyone would use an orderby clause if there is no variation in the column being ordered or why you would even have a column that never has but one value.
There is no order in this case, since you did not specify an order.
My experience in real life is that when you don't specify any order (or specify one that doesn't actually result in sorting, as in this case) rows generally come out in the order they were added to the table. However, that is in no way guaranteed and I would never rely on it.
Generally speaking you can't depend on the order of records coming out of a table unless you specify an order by clause, and any records with the sames value(s) for the fields in an order by clause will not be sorted.
That being said, there are ways to make an educated guess as to the order of the records that will come out. Usually they will be emitted i the order of the table's clustered index. This is usually the primary key but not always. If there is no clustered index, then it will usually be insert order. Note that you can't depend on either of these things. SQL Server might be doing some optimizations that will change the order.
Typically your table has an identity column with a PKey. If that's the case then that would be the order in SQL Server 2008. Unfortunately, I've experienced older versions of SQL Server tending to give inconsistent results depending on whether you're connecting via OLEDB or ODBC.
If 'name' was a primary key, then the index would have a specified order (either ASC or DESC). And that's the order that I think you would see in this case. At least that's the behavior I've observed in SQL 2008.
If 'name' had no index then I don't believe the order would be predictable at all.
EDIT:
So even in the situation I described it looks like the order will not necessarily be reliable. There's a better explanation here: SQL best practice to deal with default sort order
I suppose the moral of the story is to specify an order if the order is important to you.
The 'natural' order of rows is the order in which the CLUSTERED index says they are in, and that is the order that rows are generally returned in if you don't specify an order. However, enterprise edition merry-go-round scans mean that you won't always get them in that order, and as a few people have said, you should never rely on that.
If you specify order, and the key you are ordering on is equal for a bunch of rows, then order is not guaranteed at all.

Does SELECT DISTINCT imply a sort of the results

Does including DISTINCT in a SELECT query imply that the resulting set should be sorted?
I don't think it does, but I'm looking for a an authoritative answer (web link).
I've got a query like this:
Select Distinct foo
From Bar
In oracle, the results are distinct but are not in sorted order. In Jet/MS-Access there seems to be some extra work being done to ensure that the results are sort. I'm assuming that oracle is following the spec in this case and MS Access is going beyond.
Also, is there a way I can give the table a hint that it should be sorting on foo (unless otherwise specified)?
From the SQL92 specification:
If DISTINCT is specified, then let TXA be the result of eliminating redundant duplicate values from TX. Otherwise, let TXA be TX.
...
4) If an is not specified, then the ordering of the rows of Q is implementation-dependent.
Ultimately the real answer is that DISTINCT and ORDER BY are two separate parts of the SQL statement; If you don't have an ORDER BY clause, the results by definition will not be specifically ordered.
No. There are a number of circumstances in which a DISTINCT in Oracle does not imply a sort, the most important of which is the hashing algorithm used in 10g+ for both group by and distinct operations.
Always specify ORDER BY if you want an ordered result set, even in 9i and below.
There is no "authoritative" answer link, since this is something that no SQL server guarantees.
You will often see results in order when using distinct as a side effect of the best methods of finding those results. However, any number of other things can mix up the results, and some server may hand back results in such a way as to not give them sorted even if it had to sort to get the results.
Bottom line: if your server doesn't guarantee something you shouldn't count on it.
Not to my knowledge, no. The only reason I can think of is that SQL Server would internally sort the data in order to detect and filter out duplicates, and thus return it in a "pre-sorted" manner. But I wouldn't rely on that "side effect" :-)
No, it is not implying a sort. In my experience, it sorts by the known index, which may happen to be foo.
Why be subtle? Why not specific Select Distinct foo from Bar Order by foo?
On at least one server I've used (probably either Oracle or SQL Server, about six years ago), SELECT DISTINCT was rejected if you didn't have an ORDER BY clause. It was accepted on the "other" server (Oracle or SQL Server). Your mileage may vary.
No, the results are not sorted. If you want to give it a 'hint', you can certainly supply an ORDER BY:
select distinct foo
from bar
order by foo
But keep in mind that you might want to sort on more than just alphabetically. Instead you might want to sort on criteria on other fields. See:
http://weblogs.sqlteam.com/jeffs/archive/2007/12/13/select-distinct-order-by-error.aspx
As the answers mostly say, DISTINCT does not mandate a sort - only ORDER BY mandates that. However, one standard way of achieving DISTINCT results is to sort; the other is to hash the values (which tends to lead to semi-random sequencing). Relying on the sort effect of DISTINCT would be foolish.
In my case (SQL server), as an example I had a list of countries with a numerical value X assigned against each. When I did a select distinct * from Table order by X, it ordered it by X but at the same time result set countries were also ordered which was not directly implemented.
From my experience, I'll say that distinct does imply an implicit sort.
Yes. Oracle does use a sort do calculate a distinct. You can see that if you look at the explain plan. The fact that it did a sort for that calculation does not in any way imply
that the result set will be sorted. If you want the result set sorted, you are required to use the ORDER BY clause.