Faster Select all from databases in SQL Server 2008 R2

Faster Select all from databases in SQL Server 2008 R2 - sql

I have a table which has 1.7 mil rows in total in SQL Server 2008 R2.
And here is my concern, I need to display all 1.7 mil records in my program. The standard approach I used was to
select col1, col2, col3,... , col13 from table
However, in the application end (VB.NET), it would takes approx 1 minute to load all the records in DataGridView control.
Somehow, it wouldn't be nice if the user needs to wait for a minute for viewing.
My question here is, is there any approach that I should consider for a faster Select All statement? Eg: configuration, paging, or etc?
P/s: I have did some read up on indexing. If I'm not mistaken, index is more suitable in situation like select for specific record only rite?
Thanks for all advises and help !
Regards,
PC

I would suggest not returning all rows at once. Is someone going to be looking at each row?
A clustered index is faster to read from since all of the data is stored physically in order by the index. Since you are reading every column, make sure a clustered index is defined.
SQL 2008 R2 Clustered Index

You're right that an index won't help you retrieve all 1.7m records in the table faster. Indexes are lookup-oriented data structures that make it faster to find rows based on the rows' attributes - attributes meaning the values of particular columns or expressions computed in terms of the column values. They're usually some type of tree object that makes it quicker to filter rows in the table down to those matching a predicate, with the goal of avoiding a full table scan like the sort your application is doing.
But indexes are only useful when the number of rows to retrieve is significantly smaller than the total number of rows in the table. When you want to show all the rows, they're no help at all.
I'd suggest you reexamine your application requirements. Is it really necessary to retrieve every row on every page load? Do they change that frequently? Could you put some sort of NoSQL cache layer between the database and the application? Memcached could probably speed this up significantly.
I'm also assuming you really do need all 1.7m of these rows every time the application is used. What are you doing with them?

Related

Two indexes for same column and change the order

I have a large table in Microsoft SQL Server 2008. It has two indexes. One index having column A descending order and another index having the column A ascending with some other columns.
My application is doing below:
Select for the record.
If there is no record then insert
If find then update the records
Note that this table has millions of records.
The question is: Are these indexes affect the any select/insert/update performance?
Any suggestions?

Having the exact same two indexes with the only difference being the ordering will make no difference to the SQL engine and will just pick either (practially).
Imagine you 2 dictionaries of the english language, one sorts words from A to Z and the other from Z to A. The effort you will need to search for a particular word will be roughly the same in both cases.
A different case would be if you had 2 lists of people's data, one ordered by first name then last name and the other by last name first, then first name. If you have to look for "John Doe", the one that's ordered first by last name will be practically useless compared to the other one.
These examples are very simplified representations of indexes on SQL Server. Indexes store their data on a structure that's called a B-Tree, but for searching purposes these examples work to understand when will a index be useful or not.
Resuming: you can drop the first index and keep the one that has additional columns on it, since it can be used for more different scenarios and also all cases that would require the other one. Keeping an unuseful index brings additional maintenance tasks like keeping the index updated on every insert, update, delete and refreshing statistics, so you might want to drop it.
PD: As for the 2nd index being used, this greatly depends on the query you are using to access that particular table, it's joins and where clauses. You can use the "Display Estimated Execution Plan" having highlighted the query on SSMS to see the access plan of the engine to each object to perform the operation. If the index is used, you will see it there.

Thanks for all the answers. I explored the SQL Server Profiler and SQL Tuning Advisor for the SQL Server and ran them to get the recommended indexes. It recommended a single index with include options. I used the index and the performance has been improved.

optimize query with column in where clause

I have an sql query which fetch the first N rows in a table which is designed as a low-level queue.
select top N * from my_table where status = 0 order by date asc
The intention behind this query is as follows:
First, this question is intended to be database agnostic, as my implementation will support sql server, oracle, DB2 and sybase. The sql syntax above of "top N" is just an example.
The table can contain millions of rows.
N is a relatively small number in comparison, e.g. 100.
status is 0 when the row is in the queue. Later it is changed to 1 to indicate that it is in processing. After processing it is deleted. So it is expected that at least 90% of the rows in the table will be with status 0.
rows in the table should be fetched according to their date, hence the order by clause.
What is the optimal index to make this query works fastest?
I initially thought the index should be on (date, status), but I am not sure about it anymore. Since the status column will contain mostly zeros, is there an added-value to it? Will it be sufficient to index by (date) alone?
Or maybe it should be (status, date)?

I don't think there is an efficient solution that will be RDMS independent. For example, Oracle has bitmap indexes, SQLServer has partial indexes, and I don't see reasons not to use them if, for instance, Mysql or Sqlite has nothing similar. Also, historically SQLServer implements clustered tables (or IOT in Oracle world) way better than Oracle does, so having clustered index on date column may work perfectly for SQLServer, but not for Oracle.
I'd rather change approach a bit. If you say 90% of rows don't satisfy status=0 condition, why not try refactoring schema, and adding a new table (or materialized view) that holds only records you are interested in ? The number of new programmable objects required for keeping that table up-to-date and merging data with original table is relatively small even if RDMS doesn't support materialized view directly. Also, if it's possible to redesign underlying logic, so rows never updated, only inserted or deleted, then it will help avoiding lock contentions , and as a result , the whole system will have a better performance .

Have a clustered index on Date and a non clustered index on Status.

how to optimize sql server table for faster response?

i found a in a table there are 50 thousands records and it takes one minute when we fetch data from sql server table just by issuing a sql. there are one primary key that means a already a cluster index is there. i just do not understand why it takes one minute. beside index what are the ways out there to optimize a table to get the data faster. in this situation what i need to do for faster response. also tell me how we can write always a optimize sql. please tell me all the steps in detail for optimization.
thanks.

The fastest way to optimize indexes in table is to use SQL Server Tuning Advisor. Take a look http://www.youtube.com/watch?v=gjT8wL92mqE <-- here

Select only the columns you need, rather than select *. If your table has some large columns e.g. OLE types or other binary data (maybe used for storing images etc) then you may be transferring vastly more data off disk and over the network than you need.
As others have said, an index is no help to you when you are selecting all rows (no where clause). Using an index would be slower in such cases because of the index read and table lookup for each row, vs full table scan.

If you are running select * from employee (as per question comment) then no amount of indexing will help you. It's an "Every column for every row" query: there is no magic for this.
Adding a WHERE won't help usually for select * query too.
What you can check is index and statistics maintenance. Do you do any? Here's a Google search
Or change how you use the data...
Edit:
Why a WHERE clause usually won't help...
If you add a WHERE that is not the PK..
you'll still need to scan the table unless you add an index on the searched column
then you'll need a key/bookmark lookup unless you make it covering
with SELECT * you need to add all columns to the index to make it covering
for a many hits, the index will probably be ignored to avoid key/bookmark lookups.
Unless there is a network issue or such, the issue is reading all columns not lack of WHERE
If you did SELECT col13 FROM MyTable and had an index on col13, the index will probably be used.
A SELECT * FROM MyTable WHERE DateCol < '20090101' with an index on DateCol but matched 40% of the table, it will probably be ignored or you'd have expensive key/bookmark lookups

Irrespective of the merits of returning the whole table to your application that does sound an unexpectedly long time to retrieve just 50000 rows of employee data.
Does your query have an ORDER BY or is it literally just select * from employee?
What is the definition of the employee table? Does it contain any particularly wide columns? Are you storing binary data such as their CVs or employee photo in it?
How are you issuing the SQL and retrieving the results?
What isolation level are your select statements running at (You can use SQL Profiler to check this)
Are you encountering blocking? Does adding NOLOCK to the query speed things up dramatically?

Is there a SQL ANSI way of starting a search at the end of table?

In a certain app I must constantly query data that are likely to be amongst the last inserted rows. Since this table is going to grow a lot, I wonder if theres a standard way of optimizing the queries by making them start the lookup at the table's end. I think I would get the same optmization if the database stored data for the table in a stack-like structure, so the last inserted rows would be searched first.

The SQL spec doesn't mention anything about maintaining the insertion order. In practice, most of decent DB's also doesn't maintain it. Then it stops here. Sorting the table first ain't going to make it faster. Just index the column(s) of interest (at least the ones which you use in the WHERE).

One of the "tenets" of a proper RDBMS is that this kind of matters shouldn't concern you or anyone else using the DB.
The DB engine is "free" to use whatever method it wants to store/retrieve records, so if you want to enforce a "top" behaviour do what other suggested: add a timestamp field to the table (or tables), add an index on it and query using it as a sort and/or query criteria (e.g.: you poll the table each minute, and ask for records with timestamp>=systime-1 minute)

There is no standard way.
In some databases you can specify the sort order on an index.
SQL Server allows you to write ASC or DESC on an index:
[ ASC | DESC ]
Determines the ascending or descending sort direction for the particular index column. The default is ASC.
In MySQL you can also write ASC or DESC when you create the index but currently this is ignored. It might be implemented in a future version.

Add a counter or a time field in your table, sort on it and get top rows.
In other words: You should forget the idea that SQL tables are accessed in any particular order by default. A seqscan does not mean the oldest rows will be searched first, only that all rows will be checked. If you want to optimize some search you add indexes on some fields. What you are looking for is probably indexes.

If your data is indexed, it won't matter. The index is doing a binary search, not a sequential scan.
Unless you're doing TOP 1 (or something like it), the SELECT will have to scan the whole table or index anyway.

According to Data Independence you shouldn't care. That said a clustered index would probably suit your needs if you typically look for a date range. (sorting acs/desc shouldn't matter but you should try it out.)
If you find that you really need it you can also shard your database to increase perf on the most recently added data.

If you have enough rows that its actually becomming a problem, and you know how many "the most recently inserted rows" should be, you could try a round-about method.
Note: Even for pretty big tables, this is less efficient, but once your main table gets big enough, I've seen this work wonders for user-facing performance.
Create a "staging" table that exactly mimics your table's structure. Whenever you insert into your main table, also insert into your "staging" area. Limit your "staging" area to n rows by using a trigger to delete the lowest id row in the table when a new row over your arbitrary maximum is reached (say, 10,000 or whatever your limit is).
Then, queries can hit that smaller table first looking for the information. Since the table is arbitrarilly limited to the last n rows, it's only looking in the most recent data. Only if that fails to find a match would your query (actually, at this point a stored procedure because of the decision making) hit your main table.
Some Gotchas:
1) Make sure your trigger(s) is(are) set up properly to maintain the correct concurrancy between your "main" and "staging" tables.
2) This can quickly become a maintenance nightmare if not handled properly- and depending on your scenario it be be a little finiky.
3) I cannot stress enough that this is only efficient/useful in very specific scenarios. If yours doesn't match it, use one of the other answers.

ISO/ANSI Standard SQL does not consider optimization at all. For example the widely recognized CREATE INDEX SQL DDL does not appear in the Standard. This is because the Standard makes no assumptions about the underlying storage medium and nor should it. I regularly use SQL to query data in text files and Excel spreadsheets, neither of which have any concept of database indexes.

You can't do this.
However, there is a way to do something that might be even better. Depending on the design of your table, you should be able to create an index that keeps things in almost the order of entry. For example, if you adopt the common practice of creating an id field that autoincrements, then that index is just about in chronological order.
Some RDBMSes permit you to declare a backwards index, that is, one that descends instead of ascending. If you create a backwards index on the ID field, and if the optimizer uses that index, it will look at the most recent entries first. This will give you a rapid response for the first row.
The next step is to get the optimizer to use the index. You need to use explain plan to see if the index is being used. If you ask for the rows in order of id descending, the optimizer will almost certainly use the backwards index. If not you may be able to use hints to guide the optimizer.
If you still need to avoid reading all the rows in order to avoid wasting time, you may be able to use the LIMIT feature to declare that you only want, say 10 rows, and no more, or 1 row and no more. That should do it.
Good luck.

If your table has a create date, then I'd reverse sort by that and take the top 1.

effect of number of projections on query performance

I am looking to improve the performance of a query which selects several columns from a table. was wondering if limiting the number of columns would have any effect on performance of the query.

Reducing the number of columns would, I think, have only very limited effect on the speed of the query but would have a potentially larger effect on the transfer speed of the data. The less data you select, the less data that would need to be transferred over the wire to your application.

I might be misunderstanding the question, but here goes anyway:
The absolute number of columns you select doesn't make a huge difference. However, which columns you select can make a significant difference depending on how the table is indexed.
If you are selecting only columns that are covered by the index, then the DB engine can use just the index for the query without ever fetching table data. If you use even one column that's not covered, though, it has to fetch the entire row (key lookup) and this will degrade performance significantly. Sometimes it will kill performance so much that the DB engine opts to do a full scan instead of even bothering with the index; it depends on the number of rows being selected.
So, if by removing columns you are able to turn this into a covering query, then yes, it can improve performance. Otherwise, probably not. Not noticeably anyway.
Quick example for SQL Server 2005+ - let's say this is your table:
ID int NOT NULL IDENTITY PRIMARY KEY CLUSTERED,
Name varchar(50) NOT NULL,
Status tinyint NOT NULL
If we create this index:
CREATE INDEX IX_MyTable
ON MyTable (Name)
Then this query will be fast:
SELECT ID
FROM MyTable
WHERE Name = 'Aaron'
But this query will be slow(er):
SELECT ID, Name, Status
FROM MyTable
WHERE Name = 'Aaron'
If we change the index to a covering index, i.e.
CREATE INDEX IX_MyTable
ON MyTable (Name)
INCLUDE (Status)
Then the second query becomes fast again because the DB engine never needs to read the row.

Limiting the number of columns has no measurable effect on the query. Almost universally, an entire row is fetched to cache. The projection happens last in the SQL pipeline.
The projection part of the processing must happen last (after GROUP BY, for instance) because it may involve creating aggregates. Also, many columns may be required for JOIN, WHERE and ORDER BY processing. More columns than are finally returned in the result set. It's hardly worth adding a step to the query plan to do projections to somehow save a little I/O.
Check your query plan documentation. There's no "project" node in the query plan. It's a small part of formulating the result set.
To get away from "whole row fetch", you have to go for a columnar ("Inverted") database.

It can depend on the server you're dealing with (and, in the case of MySQL, the storage engine). Just for example, there's at least one MySQL storage engine that does column-wise storage instead of row-wise storage, and in this case more columns really can take more time.
The other major possibility would be if you had segmented your table so some columns were stored on one server, and other columns on another (aka vertical partitioning). In this case, retrieving more columns might involve retrieving data from different servers, and it's always possible that the load is imbalanced so different servers have different response times. Of course, you usually try to keep the load reasonably balanced so that should be fairly unusual, but it's still possible (especially if, for example, if one of the servers handles some other data whose usage might vary independently from the rest).

yes, if your query can be covered by a non clustered index it will be faster since all the data is already in the index and the base table (if you have a heap) or clustered index does not need to be touched by the optimizer

To demonstrate what tvanfosson has already written, that there is a "transfer" cost I ran the following two statements on a MSSQL 2000 DB from query analyzer.
SELECT datalength(text) FROM syscomments
SELECT text FROM syscomments
Both results returned 947 rows but the first one took 5 ms and the second 973 ms.
Also because the fields are the same I would not expect indexing to factor here.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas