ORDER BY in a SQL Server 2008 view

We have a view in our database that has an ORDER BY in it.
Now, I realize views generally don't order, because different people may use it for different things, and want it differently ordered. This view however is used for a VERY SPECIFIC use-case which demands a certain order. (It is team standings for a soccer league.)
The database is SQL Server 2008 Express, v.10.0.1763.0 on a Windows Server 2003 R2 box.
The view is defined as such:
CREATE VIEW season.CurrentStandingsOrdered
AS
SELECT TOP 100 PERCENT *, season.GetRanking(TEAMID) RANKING
FROM season.CurrentStandings
ORDER BY
GENDER, TEAMYEAR, CODE, POINTS DESC,
FORFEITS, GOALS_AGAINST, GOALS_FOR DESC,
DIFFERENTIAL, RANKING
It returns:
GENDER, TEAMYEAR, CODE, TEAMID, CLUB, NAME,
WINS, LOSSES, TIES, GOALS_FOR, GOALS_AGAINST,
DIFFERENTIAL, POINTS, FORFEITS, RANKING
Now, when I run a SELECT against the view, it orders the results by GENDER, TEAMYEAR, CODE, TEAMID. Notice that it is ordering by TEAMID instead of POINTS as the order by clause specifies.
However, if I copy the SQL statement and run it exactly as is in a new query window, it orders correctly as specified by the ORDER BY clause.

The order of rows returned by a view with an ORDER BY clause is never guaranteed. If you need a specific row order, you must specify it with an ORDER BY in the query that selects from the view.
See the note at the top of this Books Online entry.
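As a minimal sketch of that advice, using the view and column names from the question (RANKING is the alias the view already exposes), the ordering moves to the query that reads from the view:

```sql
-- Order in the outer query; this is the only ORDER BY that is honored
-- for the final result set.
SELECT *
FROM season.CurrentStandingsOrdered
ORDER BY
    GENDER, TEAMYEAR, CODE, POINTS DESC,
    FORFEITS, GOALS_AGAINST, GOALS_FOR DESC,
    DIFFERENTIAL, RANKING;
```

With the ORDER BY on the outer query, the TOP 100 PERCENT ... ORDER BY inside the view definition no longer serves a purpose.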

SQL Server 2005 ignores TOP 100 PERCENT by design.
Try TOP 2000000000 instead.
Now, I'll try to find a reference... I was at a seminar presented by Itzik Ben-Gan, who mentioned it.
Found some...
Kimberly L. Tripp
"TOP 100 Percent ORDER BY Considered Harmful"
In this particular case, the optimizer
recognizes that TOP 100 PERCENT
qualifies all rows and does not need
to be computed at all.

Just use:
"TOP (99) PERCENT" (note that this returns only about 99% of the rows, so it drops data)
or
"TOP (a number thousands of times larger than your row count, like 24682468123)"
It works! Just try it.

In SQL Server 2008, ORDER BY is ignored in views that use TOP 100 PERCENT. In prior versions of SQL Server, ORDER BY was only allowed if TOP 100 PERCENT was used, but a perfect order was never guaranteed. However, many assumed a perfect order was guaranteed. I infer that Microsoft does not want to mislead programmers and DBAs into believing there is a guaranteed order using this technique.
An excellent comparative demonstration of this behavior can be found here...
http://blog.sqlauthority.com/2009/11/24/sql-server-interesting-observation-top-100-percent-and-order-by
Oops, I just noticed that this was already answered. But checking out the comparative demonstration is worth a look anyway.

Microsoft has fixed this. You have to patch your SQL Server:
http://support.microsoft.com/kb/926292

I found an alternative solution.
My initial plan was to create a 'sort_order' column that would prevent users from having to perform a complex sort.
I used a windowed function ROW_NUMBER. In the ORDER BY clause, I specified the default sort order that I needed (just as it would have been in the ORDER BY of a SELECT statement).
I get several positive outcomes:
By default, the data is getting returned in the default sort order I originally intended (this is probably due to the windowed function having to sort the data prior to assigning the sort_order value)
Other users can sort the data in alternative ways if they choose to
The sort_order column is there for a very specific sort need, making it easier for users to sort the data should whatever tool they use rearrange the rowset.
Note: In my specific application, users are accessing the view via Excel 2010, and by default the data is presented to the user as I had hoped without further sorting needed.
Hope this helps those with a similar problem.
Cheers,
Ryan
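A rough sketch of that idea, applied to the standings view from the question. The view name season.CurrentStandingsWithSortOrder is made up for illustration; the columns and season.GetRanking come from the question. The returned order is still not guaranteed unless the caller sorts by sort_order, but the column makes that sort trivial.

```sql
-- Hypothetical view name; columns and season.GetRanking are from the question.
CREATE VIEW season.CurrentStandingsWithSortOrder
AS
SELECT
    s.*,
    season.GetRanking(s.TEAMID) AS RANKING,
    -- sort_order encodes the intended default ordering as a plain column
    ROW_NUMBER() OVER (
        ORDER BY s.GENDER, s.TEAMYEAR, s.CODE, s.POINTS DESC,
                 s.FORFEITS, s.GOALS_AGAINST, s.GOALS_FOR DESC,
                 s.DIFFERENTIAL, season.GetRanking(s.TEAMID)
    ) AS sort_order
FROM season.CurrentStandings AS s;
```

A caller that needs the standings order would then run SELECT * FROM season.CurrentStandingsWithSortOrder ORDER BY sort_order; and anyone else remains free to sort differently.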

Run a Profiler trace on your database and see the query that's actually being run when you query your view.
You also might want to consider using a stored procedure to return the data from your view, ordered correctly for your specific use case.

Related

Microsoft Access Count unique values per id

I have an Access database that has an id referring to a customer who has trucks of different sizes. Currently the table looks something like this:
id.....tire size
1......30
1......30
1......31
1......31
2......32
What I want to achieve is something like this:
id.....30.....31.....32
1......2......2......0
2......0......0......1
where it counts the number of occurrences of a specific tire size and inputs it into the respective tire size column.
In order to display the data as you have written it, you will need to do a crosstab query. The code below should achieve what you want
TRANSFORM Nz(Count([YourTable].[Tire Size]),0) AS [CountOfTire Size]
SELECT [YourTable].[ID]
FROM [YourTable]
GROUP BY [YourTable].[ID]
PIVOT [YourTable].[Tire Size];
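If the set of tire sizes is fixed and known in advance, a plain GROUP BY with conditional sums is one possible alternative to the crosstab. This is only a sketch in Access SQL: YourTable, ID and Tire Size are the placeholder names used above, and the Size30, Size31 and Size32 aliases are made up for illustration.

```sql
-- One output column per known tire size; IIf yields 1 for a match, else 0,
-- so each Sum counts the matching rows for that customer.
SELECT [YourTable].[ID],
       Sum(IIf([YourTable].[Tire Size] = 30, 1, 0)) AS Size30,
       Sum(IIf([YourTable].[Tire Size] = 31, 1, 0)) AS Size31,
       Sum(IIf([YourTable].[Tire Size] = 32, 1, 0)) AS Size32
FROM [YourTable]
GROUP BY [YourTable].[ID];
```

Unlike the crosstab, this does not pick up new tire sizes automatically; each new size needs its own column in the query.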
The first step would be a query like:
select tire_size, COUNT(id) from mytable
GROUP BY tire_size
(I put the "special magic" parts of that query in UPPER CASE for emphasis.)
In the MS-Access query-builder, grouping features are accessed by clicking the Totals button, which looks vaguely like an "E" (actually, a Greek capital "sigma" character, Σ). This adds a new "grouping" row to the query-builder grid.
This will produce (as you will quickly see) a row-by-row result with tire-size and the count of id's for that tire-size.
Many other variations of this are possible. Read the MS-Access on-line help which discusses this feature: they did a very good job with it.
The essential idea is the GROUP BY clause: this says that each distinct value of tire_size forms a "group." (Yes, you can GROUP BY more than one column, in which case each unique combination of values forms one group.) Then, you specify so-called "aggregate functions," such as COUNT(), AVG(), SUM(), to produce summary statistics for each group.
Every column that appears in the SELECT clause must either be listed in the GROUP BY clause or be wrapped in an aggregate function. (Which, if you think about it, makes perfect sense ...)
(Fortunately, MS-Access's query builder does a good job of "hiding" all that. You can build a grouping-query interactively, thanks to that "sigma" button. But it's useful then to look at the "SQL View" to see what it did in SQL terms.)
Use the GROUP BY clause
You'll need something like this:
SELECT
tire_size,
count(id)
FROM tablename
GROUP BY
tire_size
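The queries above count per tire size across all customers. To get one count per customer and tire size, as the question asks, both columns would go into the GROUP BY. A small sketch, assuming the same placeholder names (tablename, id, tire_size) and a made-up alias truck_count:

```sql
-- One row per (id, tire_size) combination with the number of matching rows.
SELECT
    id,
    tire_size,
    COUNT(*) AS truck_count
FROM tablename
GROUP BY
    id,
    tire_size;
```

This returns the counts in rows rather than columns; combinations that never occur simply do not appear, so there are no zero rows.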

Why is there no `select last` or `select bottom` in SQL Server like there is `select top`?

I know this might probably sound like a stupid question, but please bear with me.
In SQL-server we have
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by default), cool. If we want records to be sorted on any other column, we just specify that in the order by clause, something like this...
SELECT TOP N ... ORDER BY [ColumnName]
Even more cool. But what if I want the last row? I just write something like this...
SELECT TOP N ... ORDER BY [ColumnName] DESC
But there is a slight concern with that. I said concern and not issue because it isn't actually an issue. This way, I can get the last row based on that column, but what if I want the last row that was inserted? I know about SCOPE_IDENTITY, IDENT_CURRENT and @@IDENTITY, but consider a heap (a table without a clustered index) without any identity column, and multiple accesses from many places (please don't go into how and when these multiple operations are happening; that doesn't concern the main point). In this case there doesn't seem to be an easy way to find which row was actually inserted last. Some might answer this as
If you do a select * from [table] the last row shown in the sql result window will be the last one inserted.
To anyone thinking this: it is not actually the case, at least not always, and it is not something you can rely on (msdn, please read the Advanced Scanning section).
So the question boils down to this, as in the title itself. Why doesn't SQL Server have a
SELECT LAST
or say
SELECT BOTTOM
or something like that, where we don't have to specify the Order By and then it would give the last record inserted in the table at the time of executing the query (again I am not going into details about how would this result in case of uncommitted reads or phantom reads).
But if still, someone argues that we can't talk about this without talking about these read levels, then, for them, we could make it behave as the same way as TOP work but just the opposite. But if your argument is then we don't need it as we can always do
SELECT TOP N ... ORDER BY [ColumnName] DESC
then I really don't know what to say here. I know we can do that, but is there any relation-based reason, or some semantics-based reason, or some other reason why we don't have or can't have this SELECT LAST/BOTTOM? I am not looking for a way to do ORDER BY; I am looking for the reason why we don't have it or can't have it.
Extra
I don't know much about how NoSQL works, but I've worked (just a little bit) with MongoDB and Elasticsearch, and there too there doesn't seem to be anything like this. Is the reason they don't have it that no one ever had it before, or is it for some reason not plausible?
UPDATE
I don't need to be told whether to specify ORDER BY descending or not. Please read the question and understand my concern before answering or commenting. I know how I will get the last row. That's not even the question; the main question boils down to why there is no SELECT LAST/BOTTOM like its counterpart.
UPDATE 2
After the answers from Vladimir and Pieter, I just wanted to update that I know the order is not guaranteed if I do a SELECT TOP without ORDER BY. I know that what I wrote earlier in the question might give the impression that I don't know that's the case, but if you look a little further down, I have given a link to msdn and have mentioned that SELECT TOP without ORDER BY doesn't guarantee any ordering. So please don't add to your answer that my statement is wrong, as I have already clarified that myself a couple of lines later (where I provided the link to msdn).
You can think of it like this.
SELECT TOP N without ORDER BY returns some N rows, neither first, nor last, just some. Which rows it returns is not defined. You can run the same statement 10 times and get 10 different sets of rows each time.
So, if the server had a syntax SELECT LAST N, then result of this statement without ORDER BY would again be undefined, which is exactly what you get with existing SELECT TOP N without ORDER BY.
You have stressed in your question that you know and understand what I've written below, but I'll still keep it to make it clear for everyone reading this later.
Your first phrase in the question
In SQL-server we have SELECT TOP N ... now in that we can get the
first n rows in ascending order (by default), cool.
is not correct. With SELECT TOP N without ORDER BY you get N "random" rows. Well, not really random, the server doesn't jump randomly from row to row on purpose. It chooses some deterministic way to scan through the table, but there could be many different ways to scan the table and the server is free to change the chosen path when it wants. This is what is meant by "undefined".
The server doesn't track the order in which rows were inserted into the table, so again your assumption that results of SELECT TOP N without ORDER BY are determined by the order in which rows were inserted in the table is not correct.
So, the answer to your final question
why no select last/bottom like it's counterpart.
is:
without ORDER BY results of SELECT LAST N would be exactly the same as results of SELECT TOP N - undefined.
with ORDER BY result of SELECT LAST N ... ORDER BY X ASC is exactly the same as result of SELECT TOP N ... ORDER BY X DESC.
So, there is no point to have two key words that do the same thing.
There is a good point in Pieter's answer: the word TOP is somewhat misleading. It really means LIMIT the result set to some number of rows.
By the way, since SQL Server 2012 they added support for ANSI standard OFFSET:
OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
[
FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
]
Here adding another keyword was justified because it is ANSI standard AND it adds important functionality - pagination, which didn't exist before.
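For illustration, a small pagination sketch using that syntax (SQL Server 2012 or later). The table dbo.Customers and its columns are hypothetical; note that OFFSET/FETCH is part of the ORDER BY clause, so an explicit order is mandatory here too.

```sql
-- Hypothetical table and columns.
-- Page 3 with a page size of 10: skip the first 20 rows, return the next 10.
SELECT CustomerId, CustomerName
FROM dbo.Customers
ORDER BY CustomerId
OFFSET 20 ROWS
FETCH NEXT 10 ROWS ONLY;
```

The "last page" is again obtained by reversing the sort direction, the same way TOP N ... ORDER BY X DESC stands in for a LAST keyword.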
I would like to thank @Razort4x here for providing a very good link to MSDN in his question. The "Advanced Scanning" section there has an excellent example of a mechanism called "merry-go-round scanning", which demonstrates why the order of the results returned from a SELECT statement cannot be guaranteed without an ORDER BY clause.
This concept is often misunderstood and I've seen many questions here on SO that would greatly benefit if they had a quote from that link.
The answer to your question
Why doesn't SQL Server have a SELECT LAST or say SELECT BOTTOM or
something like that, where we don't have to specify the ORDER BY and
then it would give the last record inserted in the table at the time
of executing the query (again I am not going into details about how
would this result in case of uncommitted reads or phantom reads).
is:
The devil is in the details that you want to omit. To know which record was the "last inserted in the table at the time of executing the query" (and to know this in a somewhat consistent/non-random manner) the server would need to keep track of this information somehow. Even if it is possible in all scenarios of multiple simultaneously running transactions, it is most likely costly from the performance point of view. Not every SELECT would request this information (in fact very few or none at all), but the overhead of tracking this information would always be there.
So, you can think of it like this: by default the server doesn't do anything specific to know/keep track of the order in which the rows were inserted, because it affects performance, but if you need to know that you can use, for example, IDENTITY column. Microsoft could have designed the server engine in such a way that it required an IDENTITY column in every table, but they made it optional, which is good in my opinion. I know better than the server which of my tables need IDENTITY column and which do not.
Summary
I'd like to summarise that you can look at SELECT LAST without ORDER BY in two different ways.
1) When you expect SELECT LAST to behave in line with existing SELECT TOP. In this case result is undefined for both LAST and TOP, i.e. result is effectively the same. In this case it boils down to (not) having another keyword. Language developers (T-SQL language in this case) are always reluctant to add keywords, unless there are good reasons for it. In this case it is clearly avoidable.
2) When you expect SELECT LAST to behave as SELECT LAST INSERTED ROW. Which should, by the way, extend the same expectations to SELECT TOP to behave as SELECT FIRST INSERTED ROW or add new keywords LAST_INSERTED, FIRST_INSERTED to keep existing keyword TOP intact. In this case it boils down to the performance and added overhead of such behaviour. At the moment the server allows you to avoid this performance penalty if you don't need this information. If you do need it IDENTITY is a pretty good solution if you use it carefully.
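To make the IDENTITY suggestion concrete, here is a minimal sketch. The temp table #InsertLog and its columns are made up for illustration. Keep in mind that identity values are handed out when a row is inserted, not when its transaction commits, so under concurrent writers "last" is only approximate.

```sql
-- Hypothetical temp table; the IDENTITY column records insertion sequence.
CREATE TABLE #InsertLog (
    InsertId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Payload  NVARCHAR(200) NOT NULL
);

INSERT INTO #InsertLog (Payload) VALUES (N'first');
INSERT INTO #InsertLog (Payload) VALUES (N'second');
INSERT INTO #InsertLog (Payload) VALUES (N'third');

-- "Last inserted row" becomes an ordinary TOP ... ORDER BY ... DESC query.
SELECT TOP (1) InsertId, Payload
FROM #InsertLog
ORDER BY InsertId DESC;

DROP TABLE #InsertLog;
```

In a single session the final SELECT returns the 'third' row, because it received the highest identity value.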
There is no select last because there is no need for it. Consider a "select top 1 * from table" . Top 1 would get you the first row that is returned. And then the process stops.
But there are no guarantees about ordering if you don't specify an order by. So it may as well be any row in the dataset you get back.
Now do a "select last 1 * from table". Now the database will have to process all the rows in order to get you the last one.
And because ordering is non-deterministic, it may as well be the same result as from the select "top 1".
See now where the problem comes? Without an order by top and last are actually the same, only "last" will take more time. And with an order by, there's really only a need for top.
SELECT TOP N ...
now in that we can get the first n rows in ascending order (by
default), cool. If we want records to be sorted on any other column,
we just specify that in the order by clause, something like this...
What you say here is totally wrong and absolutely NOT how it works. There is no guarantee on what order you get. Ascending order on what ?
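-- Heap table: no index, so no defined row order
create table mytest(id int, id2 int)
insert into mytest(id,id2)values(1,5),(2,4),(3,3),(4,2),(5,1)
select top 1 * from mytest
select * from mytest
-- A clustered index on id2 changes how the rows are stored and scanned
create clustered index myindex on mytest(id2)
select top 1 * from mytest
select * from mytest
-- The new row has the lowest id2, so it sorts first in the clustered index
insert into mytest(id,id2)values(6,0)
select top 1 * from mytest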
Try this code line by line and see what you get with the last "select top 1".....you get in this case the last inserted record.
update
I think you understand that "select top 1 * from table" basically means: "Select a random row from the table".
So what would last mean? "Select the last random row from the table?" Wouldn't the last random row from a table be conceptually the same as saying any 1 random row from the table? And if that's true, top and last are the same, so there is no need for last.
Update 2
In hindsight I was happier with the syntax MySQL uses: LIMIT.
Top doesn't say anything about ordering it is only there to specify the number of rows to be returned.
Limits the rows returned in a query result set to a specified number of rows or percentage of rows in SQL Server 2014.
The reasons why SELECT LAST_INSERTED does not make sense.
It cannot be easily applied to non-heap tables.
Heap data can be freely moved by the DBMS, so its "natural" order is subject to change. To keep it, the system would need some additional mechanism, which seems to be a useless waste.
If really desired it can be simulated by adding some 'auto-increment' column.
SQL Server ordering is arbitrary unless otherwise stated. It's set-based, therefore you must define what your set is. Correct: SCOPE_IDENTITY() is the right way to capture the last inserted record, or the OUTPUT clause. Why would you do inserts on a heap that you need to reference chronologically anyway? That's super bad database design.
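A short sketch of both options mentioned above, SCOPE_IDENTITY() and the OUTPUT clause. The temp table #Events, the table variable @Inserted and the column names are hypothetical.

```sql
-- Hypothetical temp table with an identity column.
CREATE TABLE #Events (
    EventId INT IDENTITY(1, 1) NOT NULL PRIMARY KEY,
    Payload NVARCHAR(200) NOT NULL
);

-- Table variable that receives the rows produced by the INSERT.
DECLARE @Inserted TABLE (EventId INT, Payload NVARCHAR(200));

INSERT INTO #Events (Payload)
OUTPUT inserted.EventId, inserted.Payload INTO @Inserted
VALUES (N'something happened');

-- Option 1: the rows captured by OUTPUT (works for multi-row inserts too).
SELECT EventId, Payload FROM @Inserted;

-- Option 2: the last identity value generated in the current scope.
SELECT SCOPE_IDENTITY() AS LastIdentityInScope;

DROP TABLE #Events;
```

Both report what the current statement or scope inserted; neither answers "what was the last row anyone inserted into this table".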

When no 'Order by' is specified, what order does a query choose for your record set?

I was always under the impression that a query with no specified 'Order by' rule would order the results by what was specified within the where clause.
For instance, my where clause states:
WHERE RESULTS_I_AM_SEARCHING_FOR IN
ITEM 1
ITEM 2
ITEM 3
I would have imagined that the results returned for items 1, 2 and 3 would be in the order specified in the where, however this is not the case. Does anyone know what order it sorts them in when not specified?
Thanks and sorry for the really basic question!
Damon
If you don't specify an ORDER BY, then there is NO ORDER defined.
The results can be returned in an arbitrary order - and that might change over time, too.
There is no "natural order" or anything like that in a relational database (at least in all that I know of). The only way to get a reliable ordering is by explicitly specifying an ORDER BY clause.
Update: for those who still don't believe me - here are two excellent blog posts that illustrate this point (with code samples!):
Conor Cunningham (Architect on the Core SQL Server Engine team): No Seatbelt - Expecting Order without ORDER BY
Alexander Kuznetsov: Without ORDER BY, there is no default sort order (post in the Web Archive)
With SQL Server, if no ORDER BY is specified, the results are returned in the quickest way possible.
Therefore without an ORDER BY, make no assumptions about the order.
As it was already said you should never rely on the "default order" because it doesn't exist. Anyway if you still want to know some curious details about sql server implementation you can check this out:
http://exacthelp.blogspot.co.uk/2012/10/default-order-of-select-statement-in.html
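If the goal is to get rows back in the same order as the values listed in the IN clause, that order has to be spelled out in an ORDER BY as well. One common way is a CASE expression; the sketch below assumes a hypothetical table dbo.Items with a column ItemName, neither of which appears in the question.

```sql
-- Hypothetical table and column; the CASE maps each value to its position.
SELECT ItemName
FROM dbo.Items
WHERE ItemName IN ('ITEM 1', 'ITEM 2', 'ITEM 3')
ORDER BY
    CASE ItemName
        WHEN 'ITEM 1' THEN 1
        WHEN 'ITEM 2' THEN 2
        WHEN 'ITEM 3' THEN 3
    END;
```

The list in the WHERE clause only filters; it is the CASE in the ORDER BY that fixes the position of each item.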

Organize SQL Server database

I have a licensing database set up for storing my customers' records. However, when I need to find someone, it is hard since it is not in alphabetical order, and I cannot find an option to sort them in Visual Studio's Server Explorer.
Here is a picture, notice the first name letters I did not cut off, they are not in order: http://img822.imageshack.us/img822/4946/captureeg.png
So how do I fix this problem? Is there some secret button in VS I have to discover?
If using a T-SQL statement, you can rewrite the SQL with an ending of
ORDER BY Name DESC
This will sort the rows alphabetically in descending order, and then it will be easier to find someone. Or, when searching, add a search clause:
WHERE Name = 'Earl Smith'
If you comment with more specifics on how you are getting the table, that would be helpful as well.
Full query (and of course update customer_records to your table name):
SELECT * FROM customer_records ORDER BY Name DESC;
To be exact - this is by SQL standard. No set has an order UNLESS YOU IMPOSE ONE. Which means an ORDER BY part in a SELECT statement. If you don't do that, the order of the returned rows is technically arbitrary and at the discretion of the database server, which will return them in whatever order is fastest to compute.

Beginner SQL section: avoiding repeated expression

I'm entirely new at SQL, but let's say that on the StackExchange Data Explorer, I just want to list the top 15 users by reputation, and I wrote something like this:
SELECT TOP 15
DisplayName, Id, Reputation, Reputation/1000 As RepInK
FROM
Users
WHERE
RepInK > 10
ORDER BY Reputation DESC
Currently this gives an Error: Invalid column name 'RepInK', which makes sense, I think, because RepInK is not a column in Users. I can easily fix this by saying WHERE Reputation/1000 > 10, essentially repeating the formula.
So the questions are:
Can I actually use the RepInK "column" in the WHERE clause?
Do I perhaps need to create a virtual table/view with this column, and then do a SELECT/WHERE query on it?
Can I name an expression, e.g. Reputation/1000, so I only have to repeat the names in a few places instead of the formula?
What do you call this? A substitution macro? A function? A stored procedure?
Is there an SQL quicksheet, glossary of terms, language specification, anything I can use to quickly pick up the syntax and semantics of the language?
I understand that there are different "flavors"?
Can I actually use the RepInK "column" in the WHERE clause?
No, but you can rest assured that your database will evaluate (Reputation / 1000) once, even if you use it both in the SELECT fields and within the WHERE clause.
Do I perhaps need to create a virtual table/view with this column, and then do a SELECT/WHERE query on it?
Yes, a view is one option to simplify complex queries.
Can I name an expression, e.g. Reputation/1000, so I only have to repeat the names in a few places instead of the formula?
You could create a user defined function which you can call something like convertToK, which would receive the rep value as an argument and returns that argument divided by 1000. However it is often not practical for a trivial case like the one in your example.
Is there an SQL quicksheet, glossary of terms, language specification, anything I can use to quickly pick up the syntax and semantics of the language?
I suggest practice. You may want to start following the mysql tag on Stack Overflow, where many beginner questions are asked every day. Download MySQL, and when you think there's a question within your reach, try to go for the solution. I think this will help you pick up speed, as well as awareness of the language's features. There's no need to post the answer at first, because there are some pretty fast guns on the topic over here, but with some practice I'm sure you'll be able to bring home some points :)
I understand that there are different "flavors"?
The flavors are actually extensions to ANSI SQL. Database vendors usually augment the SQL language with extensions such as Transact-SQL and PL/SQL.
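You could simply rewrite the WHERE clause (Reputation is an integer column, so Reputation/1000 > 10 uses integer division and is equivalent to Reputation >= 11000):
where reputation >= 11000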
This won't always be convenient. Alternatively, you can use an inline view:
SELECT
a.DisplayName, a.Id, a.Reputation, a.RepInK
FROM
(
SELECT TOP 15
DisplayName, Id, Reputation, Reputation/1000 As RepInK
FROM
Users
ORDER BY Reputation DESC
) a
WHERE
a.RepInK > 10
Regarding something like named expressions, while there are several possible alternatives, the query optimizer is going to do best if you just write out the formula Reputation / 1000 long-hand. If you really need to run a whole group of queries using the same evaluated value, your best bet is to create a view with the field defined, but you wouldn't want to do that for a one-off query.
As an alternative, (and in cases where performance is not much of an issue), you could try something like:
SELECT TOP 15
DisplayName, Id, Reputation, RepInk
FROM (
SELECT DisplayName, Id, Reputation, Reputation / 1000 as RepInk
FROM Users
) AS u
WHERE u.RepInk > 10
ORDER BY Reputation DESC
though I don't believe that's supported by all SQL dialects and, again, the optimizer is likely to do a much worse job with this kind of thing (since it may run the SELECT against the full Users table and then filter that result). Still, for some situations this sort of query is appropriate (there's a name for this... I'm drawing a blank at the moment).
Personally, when I started out with SQL, I found the W3 schools reference to be my constant stopping-off point. It fits my style for being something I can glance at to find a quick answer and move on. Eventually, however, to really take advantage of the database it is necessary to delve into the vendors documentation.
Although SQL is "standardized", unfortunately (though, to some extent, fortunately), each database vendor implements its own version with its own extensions, which can lead to quite different syntax being the most appropriate (for a discussion of the incompatibilities of various databases on one issue, see the SQLite documentation on NULL handling). In particular, standard functions, e.g. for handling DATEs and TIMEs, tend to differ per vendor, and there are other, more drastic differences (particularly in not supporting subselects or properly handling JOINs). If you care for some of the details, this document provides both the standard forms and deviations for several major databases.
You CAN refer to RepInK in the Order By clause, but in the Where clause you must repeat the expression. But, as others have said, it will only be executed once.
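On SQL Server, which is what the Data Explorer runs, another way to name the expression once and reuse it in the WHERE clause is CROSS APPLY. This is a sketch rather than a recommendation; the aliases u and k are made up, and the table and columns are the ones from the question.

```sql
-- CROSS APPLY computes RepInK per row and exposes it under a name
-- that the WHERE clause is allowed to reference.
SELECT TOP 15
    u.DisplayName, u.Id, u.Reputation, k.RepInK
FROM Users AS u
CROSS APPLY (SELECT u.Reputation / 1000 AS RepInK) AS k
WHERE k.RepInK > 10
ORDER BY u.Reputation DESC;
```

CROSS APPLY is a T-SQL feature, so this form is not portable to every database; a derived table or view is the more widely supported route.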
There are good answers for the technical problem already, so I'll only address some of the rest of your questions.
If you're just working with the DataExplorer, you'll want to familiarize yourself with SQL Server syntax since that's what it's running. The best place to find that, of course, is MSDN's reference.
Yes, there are different variations in SQL syntax. For example, the TOP clause in the query you gave is SQL Server specific; in MySQL you'd use the LIMIT clause instead (and these keywords don't necessarily appear in the same spot in the query!).