Mssql query sequence is changing every time - sql

For Erp version upgrade we copied our database and restored to new one. We altered tables. (Added new columns or changed Width etc. We dropped indexes and recreated). After all we tested new database, caught a situation like this.
we have a query like
select top 5 * from table.
old database: when we run the query continuously, result order not changing
new database: when we run, order sequence of results changing.
I think there is a clustered index issue but how can I determine and solve it I don't know. Any help?

order is not guaranteed unless you use order by.
Old database may have clustered index created on table. Please check it. But even clustered index does not always guarantee ordering.

If you don't use the ORDER BY statement after the TOP statement the database engine does not guarantees the order of the resultset.
SQL Server wants to return qualified rows quickly in some cases and this mechanism called Row Goals and most probably your query execution plan uses this mechanism. On the other hand, the clustered index operator includes an ordered attribute and this shows us the retrieved rows are fetched in the ordered manner or not.

Related

In SQL, does the LIMIT returns the row which is inserted the last in chronological order?

Suppose, if following rows are inserted in chronological order into a table:
row1, row2, row3, row4, ..., row1000, row1001.
After a while, we delete/remove the latest row1001.
As in this post: How to get Top 5 records in SqLite?
If the below command is run:
SELECT * FROM <table> LIMIT 1;
Will it assuredly provide the "row1000"?
If no, then is there any efficient way to get the latest row(s)
without traversing through all the rows? -- i.e. without using
combination of ORDER BY and DESC.
[Note: For now I am using "SQLite", but it will be interesting for me to know about SQL in general as well.]
You're misunderstanding how SQL works. You're thinking row-by-row which is wrong. SQL does not "traverse rows" as per your concern; it operates on data as "sets".
Others have pointed out that relational database cannot be assumed to have any particular ordering, so you must use ORDER BY to explicitly specify ordering.
However (not mentioned yet is that), in order to ensure it performs efficiently, you need to create an appropriate index.
Whether you have an index or not, the correct query is:
SELECT <cols>
FROM <table>
ORDER BY <sort-cols> [DESC] LIMIT <no-rows>
Note that if you don't have an index the database will load all data and probably sort in memory to find the TOP n.
If you do have the appropriate index, the database will use the best index available to retrieve the TOP n rows as efficiently as possible.
Note that the sqllite documentation is very clear on the matter. The section on ORDER BY explains that ordering is undefined. And nothing in the section on LIMIT contradicts this (it simply constrains the number of rows returned).
If a SELECT statement that returns more than one row does not have an ORDER BY clause, the order in which the rows are returned is undefined.
This behaviour is also consistent with the ANSI standard and all major SQL implementations. Note that any database vendor that guaranteed any kind of ordering would have to sacrifice performance to the detriment of queries trying to retrieve data but not caring about order. (Not good for business.)
As a side note, flawed assumptions about ordering is an easy mistake to make (similar to flawed assumptions about uninitialised local variables).
RDBMS implementations are very likely to make ordering appear consistent. They follow a certain algorithm for adding data, a certain algorithm for retrieving data. And as a result, their operations are highly repeatable (it's what we love (and hate) about computers). So things repeatably look the same.
Theoretical examples:
Inserting a row results in the row being added to the next available free space. So data appears sequential. But an update would have to move the row to a new location if it no longer fits.
The DB engine might retrieve data sequentially from clustered index pages and seem to use clustered index as the 'natural ordering' ... until one day a page-split puts one of the pages in a different location. * Or a new version of the DMBS might cache certain data for performance, and suddenly order changes.
Real-world example:
The MS SQL Server 6.5 implementation of GROUP BY had the side-effect of also sorting by the group-by columns. When MS (in version 7 or 2000) implemented some performance improvements, GROUP BY would by default, return data in a hashed order. Many people blamed MS for breaking their queries when in fact they had made false assumptions and failed to ORDER BY their results as needed.
This is why the only guarantee of a specific ordering is to use the ORDER BY clause.
No. Table records have no inherent order. So it is undefined which row(s) to get with a LIMIT clause without an ORDER BY.
SQLite in its current implemantation may return the latest inserted row, but even if that were the case you must not rely on it.
Give a table a datetime column or some sortkey, if record order is important for you.
In SQL, data is stored in tables unordered. What comes out first one day might not be the same the next.
ORDER BY, or some other specific selection criteria is required to guarantee the correct value.

SQL server 2005 index order behavior

Well, after how many hours of troubleshooting and making queries, I found out that the queries were not the source of error but the database table itself. The ordering of tables are not what I expected to happen, it should be ascending.
I added data on tblCarFranchise table, as expected, this will happen. The ID column is well sorted in ascending order.
Next, I deleted row 2 with the ID = 12.
Then added another data row.
This is where the problem lies, it is inserted between 11 and 13! I don't exactly know why this is happening. Again, I added another row to see if it still goes in between 11 and 13, and this is the thing that happened:
Can someone tell me how to fix this? I have already searched for solutions like "SQL Server index ordering", "SQL Server index ascending" but the search results give me a SQL queries instead. -_-
Not really an answer, because this isn't really a question; however, my response would be too long to fit in the comments section. As Gordon already pointed out, tables are inherently unordered. Any order that you are seeing in other tables is probably the result of the following circumstances:
There is a clusted index on an identity column.
That clustered index is "fresh"; in other words, no rows have been modified since the last build of that clustered index.
There are no non-clustered indexes that the optimizer decided would be more efficient to use than the clustered index in retrieving data.
If you want rows to be returned in a certain order, then use an ORDER BY clause. Otherwise, don't worry about it. Add indexes as appropriate, and maintain them, but let the optimizer choose the best way to work with your tables.

optimize query with column in where clause

I have an sql query which fetch the first N rows in a table which is designed as a low-level queue.
select top N * from my_table where status = 0 order by date asc
The intention behind this query is as follows:
First, this question is intended to be database agnostic, as my implementation will support sql server, oracle, DB2 and sybase. The sql syntax above of "top N" is just an example.
The table can contain millions of rows.
N is a relatively small number in comparison, e.g. 100.
status is 0 when the row is in the queue. Later it is changed to 1 to indicate that it is in processing. After processing it is deleted. So it is expected that at least 90% of the rows in the table will be with status 0.
rows in the table should be fetched according to their date, hence the order by clause.
What is the optimal index to make this query works fastest?
I initially thought the index should be on (date, status), but I am not sure about it anymore. Since the status column will contain mostly zeros, is there an added-value to it? Will it be sufficient to index by (date) alone?
Or maybe it should be (status, date)?
I don't think there is an efficient solution that will be RDMS independent. For example, Oracle has bitmap indexes, SQLServer has partial indexes, and I don't see reasons not to use them if, for instance, Mysql or Sqlite has nothing similar. Also, historically SQLServer implements clustered tables (or IOT in Oracle world) way better than Oracle does, so having clustered index on date column may work perfectly for SQLServer, but not for Oracle.
I'd rather change approach a bit. If you say 90% of rows don't satisfy status=0 condition, why not try refactoring schema, and adding a new table (or materialized view) that holds only records you are interested in ? The number of new programmable objects required for keeping that table up-to-date and merging data with original table is relatively small even if RDMS doesn't support materialized view directly. Also, if it's possible to redesign underlying logic, so rows never updated, only inserted or deleted, then it will help avoiding lock contentions , and as a result , the whole system will have a better performance .
Have a clustered index on Date and a non clustered index on Status.

Is there a SQL ANSI way of starting a search at the end of table?

In a certain app I must constantly query data that are likely to be amongst the last inserted rows. Since this table is going to grow a lot, I wonder if theres a standard way of optimizing the queries by making them start the lookup at the table's end. I think I would get the same optmization if the database stored data for the table in a stack-like structure, so the last inserted rows would be searched first.
The SQL spec doesn't mention anything about maintaining the insertion order. In practice, most of decent DB's also doesn't maintain it. Then it stops here. Sorting the table first ain't going to make it faster. Just index the column(s) of interest (at least the ones which you use in the WHERE).
One of the "tenets" of a proper RDBMS is that this kind of matters shouldn't concern you or anyone else using the DB.
The DB engine is "free" to use whatever method it wants to store/retrieve records, so if you want to enforce a "top" behaviour do what other suggested: add a timestamp field to the table (or tables), add an index on it and query using it as a sort and/or query criteria (e.g.: you poll the table each minute, and ask for records with timestamp>=systime-1 minute)
There is no standard way.
In some databases you can specify the sort order on an index.
SQL Server allows you to write ASC or DESC on an index:
[ ASC | DESC ]
Determines the ascending or descending sort direction for the particular index column. The default is ASC.
In MySQL you can also write ASC or DESC when you create the index but currently this is ignored. It might be implemented in a future version.
Add a counter or a time field in your table, sort on it and get top rows.
In other words: You should forget the idea that SQL tables are accessed in any particular order by default. A seqscan does not mean the oldest rows will be searched first, only that all rows will be checked. If you want to optimize some search you add indexes on some fields. What you are looking for is probably indexes.
If your data is indexed, it won't matter. The index is doing a binary search, not a sequential scan.
Unless you're doing TOP 1 (or something like it), the SELECT will have to scan the whole table or index anyway.
According to Data Independence you shouldn't care. That said a clustered index would probably suit your needs if you typically look for a date range. (sorting acs/desc shouldn't matter but you should try it out.)
If you find that you really need it you can also shard your database to increase perf on the most recently added data.
If you have enough rows that its actually becomming a problem, and you know how many "the most recently inserted rows" should be, you could try a round-about method.
Note: Even for pretty big tables, this is less efficient, but once your main table gets big enough, I've seen this work wonders for user-facing performance.
Create a "staging" table that exactly mimics your table's structure. Whenever you insert into your main table, also insert into your "staging" area. Limit your "staging" area to n rows by using a trigger to delete the lowest id row in the table when a new row over your arbitrary maximum is reached (say, 10,000 or whatever your limit is).
Then, queries can hit that smaller table first looking for the information. Since the table is arbitrarilly limited to the last n rows, it's only looking in the most recent data. Only if that fails to find a match would your query (actually, at this point a stored procedure because of the decision making) hit your main table.
Some Gotchas:
1) Make sure your trigger(s) is(are) set up properly to maintain the correct concurrancy between your "main" and "staging" tables.
2) This can quickly become a maintenance nightmare if not handled properly- and depending on your scenario it be be a little finiky.
3) I cannot stress enough that this is only efficient/useful in very specific scenarios. If yours doesn't match it, use one of the other answers.
ISO/ANSI Standard SQL does not consider optimization at all. For example the widely recognized CREATE INDEX SQL DDL does not appear in the Standard. This is because the Standard makes no assumptions about the underlying storage medium and nor should it. I regularly use SQL to query data in text files and Excel spreadsheets, neither of which have any concept of database indexes.
You can't do this.
However, there is a way to do something that might be even better. Depending on the design of your table, you should be able to create an index that keeps things in almost the order of entry. For example, if you adopt the common practice of creating an id field that autoincrements, then that index is just about in chronological order.
Some RDBMSes permit you to declare a backwards index, that is, one that descends instead of ascending. If you create a backwards index on the ID field, and if the optimizer uses that index, it will look at the most recent entries first. This will give you a rapid response for the first row.
The next step is to get the optimizer to use the index. You need to use explain plan to see if the index is being used. If you ask for the rows in order of id descending, the optimizer will almost certainly use the backwards index. If not you may be able to use hints to guide the optimizer.
If you still need to avoid reading all the rows in order to avoid wasting time, you may be able to use the LIMIT feature to declare that you only want, say 10 rows, and no more, or 1 row and no more. That should do it.
Good luck.
If your table has a create date, then I'd reverse sort by that and take the top 1.

SQL 2008 - resultset order issue

We are using SQL Server 2008. We have a table called response which has a primary key called response_id. It also has a column called bid_id. When we execute the query
‘select * from response where bid_id = x’
without an ‘order by’ we are getting results in mostly ascending order (default), but once in a while in descending order (very random). Is it possible in sql that the same sql query might return a resultset in different order if executed several times without order by? We used to have SQL Server 2000 till 5 months ago and never faced this problem. Does SQL Server 2008 deal differently with sql queries without ‘order by’?
Thanks.
In any SQL statement, on any database, you need to define an ORDER BY clause to guarantee the order is consistent.
Do you mean this happens for varying values of x?
As OMG Ponies says the order is not guaranteed unless you specify one.
In terms of an explanation perhaps you are getting a clustered index scan when it thinks there will be a significant portion of rows matching the bid_id = x criteria, hence them being in order of response_id, but it is looking them up via a non clustered index on the bid_id column when it expects fewer records to match the condition. These different access strategies could lead to them being returned in different orders. Adding an ORDER BY would cause it to favour a strategy that will leave them sorted or it would add an additional sort step to the plan before the results get returned.
Always use ORDER BY if order is important.
What is cached in memory as opposed to what is on disk will affect the order in which the SQL Server engine gets the rows, thus affecting the order of the rows if ORDER BY is not used.
The following URL has an interesting, if not directly related, discussion on the topic of the cache.
http://blog.sqlauthority.com/2010/06/17/sql-server-data-pages-in-buffer-pool-data-stored-in-memory-cache/
This is a very common question. I wrote up a canned answer:
Without ORDER BY, there is no default sort order.