I have a table which contains too much data (10 cr). For Paging purpose i have used
ROW_NUMBER() OVER (Order by id), but its getting very slow in select. So please give any alternate soluion? (Using verica)
You can create a temp table with an identity column. Good read
How to dynamically number rows in a SELECT Transact-SQL statement
Related
Few days ago I came across a strange problem with the Order By , While creating a new table I used
Select - Into - From and Order By (column name)
and when I open that table see tables are not arranged accordingly.
I re-verified it multiple times to make sure I am doing the right thing.
One more thing I would like to add is till the time I don't use INTO, I can see the desired result but as soon as I create new table, I see there is no Order for tht column. Please help me !
Thanks in advance.. Before posting the question I did research for 3 days but no solution yet
SELECT
[WorkOrderID], [ProductID], [OrderQty], [StockedQty]
INTO
[AdventureWorks2012].[Production].[WorkOrder_test]
FROM
[AdventureWorks2012].[Production].[WorkOrder]
ORDER BY
[StockedQty]
SQL 101 for beginners: SELECT statements have no defined order unless you define one.
When i open that table
That likely issues a SELECT (TOP 1000 IIFC) without order.
While creating a new table i used Select - Into - From and Order By (column name)
Which sort of is totally irrelevant - you basically waste performance ordering the input data.
You want an order in a select, MAKE ONE by adding an order by clause to the select. The table's internal order is by clustered index, but an query can return results in any order it wants. Fundamental SQL issue, as I said in the first sentence. Any good book on sql covers that in one of the first chapters. SQL uses a set approach, sets have no intrinsic order.
Firstly T-SQL is a set based language and sets don't have orders. More over it also doesn't mean serial execution of commands i.e, the above query is not executed in sequence written but the processing order for a SELECT statement is as:
1.FROM
2.ON
3.JOIN
4.WHERE
5.GROUP BY
6.WITH CUBE or WITH ROLLUP
7.HAVING
8.SELECT
9.DISTINCT
10.ORDER BY
Now when you execute your query without into selected column data gets ordered based on the condition specified in 'Order By' clause but when Into is used format of new_table is determined by evaluating the expressions in the select list.(Remember order by clause has not been evaluated yet).
The columns in new_table are created in the order specified by the select list but rows cannot be ordered. It's a limitation of Into clause you can refer here:
Specifying an ORDER BY clause does not guarantee the rows are inserted
in the specified order.
I have the very table in sql server 2008 with lot of data
|ID|Name|Column_1|Column_2|
|..|....|........|........|
more than 18,000 records. So i need to the the row with the lowest value of Column_1 that is date but could by any data type (that is unsorted) so I use these sentence
SELECT TOP 1 ID, Name from table ORDER BY Column_1 ASC
But this is very very slow. And i think that i don't need to to sort the whole table. My question es how to get the same date with out using TOP 1 and ORDER BY
I cannot see why 18,000 rows of information would cause too much of a slow down, but that is obviously without seeing what the data is you are storing.
If you are regularly going to be using the Column_1 field, then I would suggest you place a non-clustered index on it... that will speed up your query.
You can do it by "designing" your table via Sql Server Management Studio, or directly via TSQL...
CREATE INDEX IX_myTable_Column_1 ON myTable (Column_1 ASC)
More information on MSDN about creating indexes here
Update thanks to comments by #GarethD who helped me with this, as I wasn't actually aware of it.
As an extra part of the above TSQL statement, it will increase the speed of your queries if you include the names of the other columns that will be used within the index....
CREATE INDEX IX_myTable_Column_1 ON myTable (Column_1 ASC) INCLUDE (ID, Name)
As GarethD points out, using this SQLFiddle as proof, the execution plan is much quicker as it avoids a "RID" (or Row Identifier) lookup.
More information on MSDN about creating indexes with include columns here
Thank you #GarethD
Would this work faster? When I read this question, this was the code that came to mind:
Select top 1 ID, Name
from table
where Column_1 = (Select min(Column_1) from table)
I've researched and realize I have a unique situation.
First off, I am not allowed to post images yet to the board since I'm a new user, so see appropriate links below
I have multiple tables where a column (not always the identifier column) is sequentially numbered and shouldn't have any breaks in the numbering. My goal is to make sure this stays true.
Down and Dirty
We have an 'Event' table where we randomly select a percentage of the rows and insert the rows into table 'Results'. The "ID" column from the 'Results' is passed to a bunch of delete queries.
This more or less ensures that there are missing rows in several tables.
My problem:
Figuring out an sql query that will renumber the column I specify. I prefer to not drop the column.
Example delete query:
delete ItemVoid
from ItemTicket
join ItemVoid
on ItemTicket.item_ticket_id = itemvoid.item_ticket_id
where itemticket.ID in (select ID
from results)
Example Tables Before:
Example Tables After:
As you can see 2 rows were delete from both tables based on the ID column. So now I gotta figure out how to renumber the item_ticket_id and the item_void_id columns where the the higher number decreases to the missing value, and the next highest one decreases, etc. Problem #2, if the item_ticket_id changes in order to be sequential in ItemTickets, then
it has to update that change in ItemVoid's item_ticket_id.
I appreciate any advice you can give on this.
(answering an old question as it's the first search result when I was looking this up)
(MS T-SQL)
To resequence an ID column (not an Identity one) that has gaps,
can be performed using only a simple CTE with a row_number() to generate a new sequence.
The UPDATE works via the CTE 'virtual table' without any extra problems, actually updating the underlying original table.
Don't worry about the ID fields clashing during the update, if you wonder what happens when ID's are set that already exist, it
doesn't suffer that problem - the original sequence is changed to the new sequence in one go.
WITH NewSequence AS
(
SELECT
ID,
ROW_NUMBER() OVER (ORDER BY ID) as ID_New
FROM YourTable
)
UPDATE NewSequence SET ID = ID_New;
Since you are looking for advice on this, my advice is you need to redesign this as I see a big flaw in your design.
Instead of deleting the records and then going through the hassle of renumbering the remaining records, use a bit flag that will mark the records as Inactive. Then when you are querying the records, just include a WHERE clause to only include the records are that active:
SELECT *
FROM yourTable
WHERE Inactive = 0
Then you never have to worry about re-numbering the records. This also gives you the ability to go back and see the records that would have been deleted and you do not lose the history.
If you really want to delete the records and renumber them then you can perform this task the following way:
create a new table
Insert your original data into your new table using the new numbers
drop your old table
rename your new table with the corrected numbers
As you can see there would be a lot of steps involved in re-numbering the records. You are creating much more work this way when you could just perform an UPDATE of the bit flag.
You would change your DELETE query to something similar to this:
UPDATE ItemVoid
SET InActive = 1
FROM ItemVoid
JOIN ItemTicket
on ItemVoid.item_ticket_id = ItemTicket.item_ticket_id
WHERE ItemTicket.ID IN (select ID from results)
The bit flag is much easier and that would be the method that I would recommend.
The function that you are looking for is a window function. In standard SQL (SQL Server, MySQL), the function is row_number(). You use it as follows:
select row_number() over (partition by <col>)
from <table>
In order to use this in your case, you would delete the rows from the table, then use a with statement to recalculate the row numbers, and then assign them using an update. For transactional integrity, you might wrap the delete and update into a single transaction.
Oracle supports similar functionality, but the syntax is a bit different. Oracle calls these functions analytic functions and they support a richer set of operations on them.
I would strongly caution you from using cursors, since these have lousy performance. Of course, this will not work on an identity column, since such a column cannot be modified.
The typical way of selecting data is:
select * from my_table
But what if the table contains 10 million records and you only want records 300,010 to 300,020
Is there a way to create a SQL statement on Microsoft SQL that only gets 10 records at once?
E.g.
select * from my_table from records 300,010 to 300,020
This would be way more efficient than retrieving 10 million records across the network, storing them in the IIS server and then counting to the records you want.
SELECT * FROM my_table is just the tip of the iceberg. Assuming you're talking a table with an identity field for the primary key, you can just say:
SELECT * FROM my_table WHERE ID >= 300010 AND ID <= 300020
You should also know that selecting * is considered poor practice in many circles. They want you specify the exact column list.
Try looking at info about pagination. Here's a short summary of it for SQL Server.
Absolutely. On MySQL and PostgreSQL (the two databases I've used), the syntax would be
SELECT [columns] FROM table LIMIT 10 OFFSET 300010;
On MS SQL, it's something like SELECT TOP 10 ...; I don't know the syntax for offsetting the record list.
Note that you never want to use SELECT *; it's a maintenance nightmare if anything ever changes. This query, though, is going to be incredibly slow since your database will have to scan through and throw away the first 300,010 records to get to the 10 you want. It'll also be unpredictable, since you haven't told the database which order you want the records in.
This is the core of SQL: tell it which 10 records you want, identified by a key in a specific range, and the database will do its best to grab and return those records with minimal work. Look up any tutorial on SQL for more information on how it works.
When working with large tables, it is often a good idea to make use of Partitioning techniques available in SQL Server.
The rules of your partitition function typically dictate that only a range of data can reside within a given partition. You could split your partitions by date range or ID for example.
In order to select from a particular partition you would use a query similar to the following.
SELECT <Column Name1>…/*
FROM <Table Name>
WHERE $PARTITION.<Partition Function Name>(<Column Name>) = <Partition Number>
Take a look at the following white paper for more detailed infromation on partitioning in SQL Server 2005.
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope this helps however please feel free to pose further questions.
Cheers, John
I use wrapper queries to select the core query and then just isolate the ROW numbers that i wish to take from the query - this allows the SQL server to do all the heavy lifting inside the CORE query and just pass out the small amount of the table that i have requested. All you need to do is pass the [start_row_variable] and the [end_row_variable] into the SQL query.
NOTE: The order clause is specified OUTSIDE the core query [sql_order_clause]
w1 and w2 are TEMPORARY table created by the SQL server as the wrapper tables.
SELECT
w1.*
FROM(
SELECT w2.*,
ROW_NUMBER() OVER ([sql_order_clause]) AS ROW
FROM (
<!--- CORE QUERY START --->
SELECT [columns]
FROM [table_name]
WHERE [sql_string]
<!--- CORE QUERY END --->
) AS w2
) AS w1
WHERE ROW BETWEEN [start_row_variable] AND [end_row_variable]
This method has hugely optimized my database systems. It works very well.
IMPORTANT: Be sure to always explicitly specify only the exact columns you wish to retrieve in the core query as fetching unnecessary data in these CORE queries can cost you serious overhead
Use TOP to select only a limited amont of rows like:
SELECT TOP 10 * FROM my_table WHERE ID >= 300010
Add an ORDER BY if you want the results in a particular order.
To be efficient there has to be an index on the ID column.
EDIT: I'm still waiting for more answers. Thanks!
In SQL 2000 days, I used to use temp table method where you create a temp table with new identity column and primary key then select where identity column between A and B.
When SQL 2005 came along I found out about Row_Number() and I've been using it ever since...
But now, I found a serious performance issue with Row_Number().
It performs very well when you are working with not-so-gigantic result sets and sorting over an identity column. However, it performs very poorly when you are working with large result sets like over 10,000 records and sorting it over non-identity column. Row_Number() performs poorly even if you sort by an identity column if the result set is over 250,000 records. For me, it came to a point where it throws an error, "command timeout!"
What do you use to do paginate a large result set on SQL 2005?
Is temp table method still better in this case? I'm not sure if this method using temp table with SET ROWCOUNT will perform better... But some say there is an issue of giving wrong row number if you have multi-column primary key.
In my case, I need to be able to sort the result set by a date type column... for my production web app.
Let me know what you use for high-performing pagination in SQL 2005. And I'd also like to know a smart way of creating indexes. I'm suspecting choosing right primary keys and/or indexes (clustered/non-clustered) will play a big role here.
Thanks in advance.
P.S. Does anyone know what stackoverflow uses?
EDIT: Mine looks something like...
SELECT postID, postTitle, postDate
FROM
(SELECT postID, postTitle, postDate,
ROW_NUMBER() OVER(ORDER BY postDate DESC, postID DESC) as RowNum
FROM MyTable
) as DerivedMyTable
WHERE RowNum BETWEEN #startRowIndex AND (#startRowIndex + #maximumRows) - 1
postID: Int, Identity (auto-increment), Primary key
postDate: DateTime
EDIT: Is everyone using Row_Number()?
The row_number() technique should be quick. I have seen good results for 100,000 rows.
Are you using row_number() similiar to the following:
SELECT column_list
FROM
(SELECT column_list
ROW_NUMBER() OVER(ORDER BY OrderByColumnName) as RowNum
FROM MyTable m
) as DerivedTableName
WHERE RowNum BETWEEN #startRowIndex AND (#startRowIndex + #maximumRows) - 1
...and do you have a covering index for the column_list and/or an index on the 'OrderByColumnName' column?
Well, for your sample query ROW_COUNT should be pretty fast with thousands of rows, provided you have an index on your PostDate field. If you don't, the server needs to perform a complete clustered index scan on your PK, practically load every page, fetch your PostDate field, sort by it, determine the rows to extract for the result set and again fetch those rows. It's kind of creating a temp index over and over again (you might see an table/index spool in the plain).
No wonder you get timeouts.
My suggestion: set an index on PostDate DESC, this is what ROW_NUMBER will go over - (ORDER BY PostDate DESC, ...)
As for the article you are referring to - I've done pretty much paging and stuff with SQL Server 2000 in the past without ROW_COUNT and the approach used in the article is the most efficient one. It does not work in all circumstances (you need unique or almost unique values). An overview of some other methods is here.
.