Delete all but top n from database table in SQL - sql

What's the best way to delete all rows from a table in sql but to keep n number of rows on the top?

DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Edit:
Chris brings up a good performance hit since the TOP 10 query would be run for each row. If this is a one time thing, then it may not be as big of a deal, but if it is a common thing, then I did look closer at it.

I would select ID column(s) the set of rows that you want to keep into a temp table or table variable. Then delete all the rows that do not exist in the temp table. The syntax mentioned by another user:
DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Has a potential problem. The "SELECT TOP 10" query will be executed for each row in the table, which could be a huge performance hit. You want to avoid making the same query over and over again.
This syntax should work, based what you listed as your original SQL statement:
create table #nuke(NukeID int)
insert into #nuke(Nuke) select top 1000 id from article
delete article where not exists (select 1 from nuke where Nukeid = id)
drop table #nuke

Future reference for those of use who don't use MS SQL.
In PostgreSQL use ORDER BY and LIMIT instead of TOP.
DELETE FROM table
WHERE id NOT IN (SELECT id FROM table ORDER BY id LIMIT n);
MySQL -- well...
Error -- This version of MySQL does not yet support 'LIMIT &
IN/ALL/ANY/SOME subquery'
Not yet I guess.

Here is how I did it. This method is faster and simpler:
Delete all but top n from database table in MS SQL using OFFSET command
WITH CTE AS
(
SELECT ID
FROM dbo.TableName
ORDER BY ID DESC
OFFSET 11 ROWS
)
DELETE CTE;
Replace ID with column by which you want to sort.
Replace number after OFFSET with number of rows which you want to keep.
Choose DESC or ASC - whatever suits your case.

I think using a virtual table would be much better than an IN-clause or temp table.
DELETE
Product
FROM
Product
LEFT OUTER JOIN
(
SELECT TOP 10
Product.id
FROM
Product
) TopProducts ON Product.id = TopProducts.id
WHERE
TopProducts.id IS NULL

This really is going to be language specific, but I would likely use something like the following for SQL server.
declare #n int
SET #n = SELECT Count(*) FROM dTABLE;
DELETE TOP (#n - 10 ) FROM dTable
if you don't care about the exact number of rows, there is always
DELETE TOP 90 PERCENT FROM dTABLE;

I don't know about other flavors but MySQL DELETE allows LIMIT.
If you could order things so that the n rows you want to keep are at the bottom, then you could do a DELETE FROM table LIMIT tablecount-n.
Edit
Oooo. I think I like Cory Foy's answer better, assuming it works in your case. My way feels a little clunky by comparison.

I would solve it using the technique below. The example expect an article table with an id on each row.
Delete article where id not in (select top 1000 id from article)
Edit: Too slow to answer my own question ...

Refactored?
Delete a From Table a Inner Join (
Select Top (Select Count(tableID) From Table) - 10)
From Table Order By tableID Desc
) b On b.tableID = A.tableID
edit: tried them both in the query analyzer, current answer is fasted (damn order by...)

Better way would be to insert the rows you DO want into another table, drop the original table and then rename the new table so it has the same name as the old table

I've got a trick to avoid executing the TOP expression for every row. We can combine TOP with MAX to get the MaxId we want to keep. Then we just delete everything greater than MaxId.
-- Declare Variable to hold the highest id we want to keep.
DECLARE #MaxId as int = (
SELECT MAX(temp.ID)
FROM (SELECT TOP 10 ID FROM table ORDER BY ID ASC) temp
)
-- Delete anything greater than MaxId. If MaxId is null, there is nothing to delete.
IF #MaxId IS NOT NULL
DELETE FROM table WHERE ID > #MaxId
Note: It is important to use ORDER BY when declaring MaxId to ensure proper results are queried.

Related

Is there any better option to apply pagination without applying OFFSET in SQL Server?

I want to apply pagination on a table with huge data. All I want to know a better option than using OFFSET in SQL Server.
Here is my simple query:
SELECT *
FROM TableName
ORDER BY Id DESC
OFFSET 30000000 ROWS
FETCH NEXT 20 ROWS ONLY
You can use Keyset Pagination for this. It's far more efficient than using Rowset Pagination (paging by row number).
In Rowset Pagination, all previous rows must be read, before being able to read the next page. Whereas in Keyset Pagination, the server can jump immediately to the correct place in the index, so no extra rows are read that do not need to be.
For this to perform well, you need to have a unique index on that key, which includes any other columns you need to query.
In this type of pagination, you cannot jump to a specific page number. You jump to a specific key and read from there. So you need to save the unique ID of page you are on and skip to the next. Alternatively, you could calculate or estimate a starting point for each page up-front.
One big benefit, apart from the obvious efficiency gain, is avoiding the "missing row" problem when paginating, caused by rows being removed from previously read pages. This does not happen when paginating by key, because the key does not change.
Here is an example:
Let us assume you have a table called TableName with an index on Id, and you want to start at the latest Id value and work backwards.
You begin with:
SELECT TOP (#numRows)
*
FROM TableName
ORDER BY Id DESC;
Note the use of ORDER BY to ensure the order is correct
In some RDBMSs you need LIMIT instead of TOP
The client will hold the last received Id value (the lowest in this case). On the next request, you jump to that key and carry on:
SELECT TOP (#numRows)
*
FROM TableName
WHERE Id < #lastId
ORDER BY Id DESC;
Note the use of < not <=
In case you were wondering, in a typical B-Tree+ index, the row with the indicated ID is not read, it's the row after it that's read.
The key chosen must be unique, so if you are paging by a non-unique column then you must add a second column to both ORDER BY and WHERE. You would need an index on OtherColumn, Id for example, to support this type of query. Don't forget INCLUDE columns on the index.
SQL Server does not support row/tuple comparators, so you cannot do (OtherColumn, Id) < (#lastOther, #lastId) (this is however supported in PostgreSQL, MySQL, MariaDB and SQLite).
Instead you need the following:
SELECT TOP (#numRows)
*
FROM TableName
WHERE (
(OtherColumn = #lastOther AND Id < #lastId)
OR OtherColumn < #lastOther
)
ORDER BY
OtherColumn DESC,
Id DESC;
This is more efficient than it looks, as SQL Server can convert this into a proper < over both values.
The presence of NULLs complicates things further. You may want to query those rows separately.
On very big merchant website we use a technic compound of ids stored in a pseudo temporary table and join with this table to the rows of the product table.
Let me talk with a clear example.
We have a table design this way :
CREATE TABLE S_TEMP.T_PAGINATION_PGN
(PGN_ID BIGINT IDENTITY(-9 223 372 036 854 775 808, 1) PRIMARY KEY,
PGN_SESSION_GUID UNIQUEIDENTIFIER NOT NULL,
PGN_SESSION_DATE DATETIME2(0) NOT NULL,
PGN_PRODUCT_ID INT NOT NULL,
PGN_SESSION_ORDER INT NOT NULL);
CREATE INDEX X_PGN_SESSION_GUID_ORDER
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_GUID, PGN_SESSION_ORDER)
INCLUDE (PGN_SESSION_ORDER);
CREATE INDEX X_PGN_SESSION_DATE
ON S_TEMP.T_PAGINATION_PGN (PGN_SESSION_DATE);
We have a very big product table call T_PRODUIT_PRD and a customer filtered it with many predicates. We INSERT rows from the filtered SELECT into this table this way :
DECLARE #SESSION_ID UNIQUEIDENTIFIER = NEWID();
INSERT INTO S_TEMP.T_PAGINATION_PGN
SELECT #SESSION_ID , SYSUTCDATETIME(), PRD_ID,
ROW_NUMBER() OVER(ORDER BY --> custom order by
FROM dbo.T_PRODUIT_PRD
WHERE ... --> custom filter
Then everytime we need a desired page, compound of #N products we add a join to this table as :
...
JOIN S_TEMP.T_PAGINATION_PGN
ON PGN_SESSION_GUID = #SESSION_ID
AND 1 + (PGN_SESSION_ORDER / #N) = #DESIRED_PAGE_NUMBER
AND PGN_PRODUCT_ID = dbo.T_PRODUIT_PRD.PRD_ID
All the indexes will do the job !
Of course, regularly we have to purge this table and this is why we have a scheduled job which deletes the rows whose sessions were generated more than 4 hours ago :
DELETE FROM S_TEMP.T_PAGINATION_PGN
WHERE PGN_SESSION_DATE < DATEADD(hour, -4, SYSUTCDATETIME());
In the same spirit as SQLPro solution, I propose:
WITH CTE AS
(SELECT 30000000 AS N
UNION ALL SELECT N-1 FROM CTE
WHERE N > 30000000 +1 - 20)
SELECT T.* FROM CTE JOIN TableName T ON CTE.N=T.ID
ORDER BY CTE.N DESC
Tried with 2 billion lines and it's instant !
Easy to make it a stored procedure...
Of course, valid if ids follow each other.

SQL query to update top 1 record in table

Can anybody please tell me how to write query to update top 1 record in table ?
Thanks
YOU have to decide what the top record is in a table, by ordering against a column you decide on.
That said, you could do this in SQL Server:
UPDATE [YourTable]
SET [YourColumn] = SomeValue
WHERE [PrimaryKey] IN
(
SELECT TOP 1 [PrimaryKey]
FROM [YourTable]
ORDER BY [PrimaryKey] -- You need to decide what column you want to sort on
)
There is no "top 1 record" in a relational table. Records are not in any order unless you specify one in the query.
in SQLServer you can use the TOP keyword in your update expression:
http://msdn.microsoft.com/en-us/library/ms177523.aspx
When a TOP (n) clause is used with UPDATE, the update operation is performed on a random selection of 'n' number of rows.
In MS SQL, in addition to the existing answers, it is possible to UPDATE the TOP N rows returned through a Common Table Expression (CTE). You are able to define an ORDER BY if required, as well.
One advantage of this is that it will work even if your table doesn't have a primary key:
WITH Top1Foo AS
(
SELECT TOP 1 *
FROM Foos
ORDER BY Name DESC
)
UPDATE Top1Foo
SET Name = 'zzz';
There's a Example SqlFiddle here
However, the caveat mentioned by other users remains - using TOP implies there are more records meeting the selection criteria, and it risks updating an arbitrary record / records.

My tricky SQL Update query not working so well

I am trying to update a table in my database with another row from another table. I have two parameters one being the ID and another being the row number (as you can select which row you want from the GUI)
this part of the code works fine, this returns one column of a single row.
(SELECT txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = #ContentID) as test
Where rowid = #rowID)
its just when i try to add the update/set it won't work. I am probably missing something
UPDATE TBL_Content
Set TBL_Content.txtPageContent = (select txtPageContent
FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
from ARC_Content Where ContentID = #ContentID) as test
Where rowid = #rowID)
Thanks for the help! (i have tried top 1 with no avail)
I see a few issues with your update. First, I don't see any joining or selection criteria for the table that you're updating. That means that every row in the table will be updated with this new value. Is that really what you want?
Second, the row number between what is on the GUI and what you get back in the database may not match. Even if you reproduce the query used to create your list in the GUI (which is dangerous anyway, since it involves keeping the update and the select code always in sync), it's possible that someone could insert or delete or update a row between the time that you fill your list box and send that row number to the server for the update. It's MUCH better to use PKs (probably IDs in your case) to determine which row to use for updating.
That said, I think that the following will work for you (untested):
;WITH cte AS (
SELECT
txtPageContent,
ROW_NUMBER() OVER (ORDER BY ArchiveDate ASC) AS rowid
FROM
ARC_Content
WHERE
ContentID = #ContentID)
UPDATE
TC
SET
txtPageContent = cte.txtPageContent
FROM
TBL_Content TC
INNER JOIN cte ON
rowid = #rowID

How to select bottom most rows?

I can do SELECT TOP (200) ... but why not BOTTOM (200)?
Well not to get into philosophy what I mean is, how can I do the equivalent of TOP (200) but in reverse (from the bottom, like you'd expect BOTTOM to do...)?
SELECT
columns
FROM
(
SELECT TOP 200
columns
FROM
My_Table
ORDER BY
a_column DESC
) SQ
ORDER BY
a_column ASC
It is unnecessary. You can use an ORDER BY and just change the sort to DESC to get the same effect.
Sorry, but I don't think I see any correct answers in my opinion.
The TOP x function shows the records in undefined order. From that definition follows that a BOTTOM function can not be defined.
Independent of any index or sort order. When you do an ORDER BY y DESC you get the rows with the highest y value first. If this is an autogenerated ID, it should show the records last added to the table, as suggested in the other answers. However:
This only works if there is an autogenerated id column
It has a significant performance impact if you compare that with the TOP function
The correct answer should be that there is not, and cannot be, an equivalent to TOP for getting the bottom rows.
Logically,
BOTTOM (x) is all the records except TOP (n - x), where n is the count; x <= n
E.g. Select Bottom 1000 from Employee:
In T-SQL,
DECLARE
#bottom int,
#count int
SET #bottom = 1000
SET #count = (select COUNT(*) from Employee)
select * from Employee emp where emp.EmployeeID not in
(
SELECT TOP (#count-#bottom) Employee.EmployeeID FROM Employee
)
It would seem that any of the answers which implement an ORDER BY clause in the solution is missing the point, or does not actually understand what TOP returns to you.
TOP returns an unordered query result set which limits the record set to the first N records returned. (From an Oracle perspective, it is akin to adding a where ROWNUM < (N+1).
Any solution which uses an order, may return rows which also are returned by the TOP clause (since that data set was unordered in the first place), depending on what criteria was used in the order by
The usefulness of TOP is that once the dataset reaches a certain size N, it stops fetching rows. You can get a feel for what the data looks like without having to fetch all of it.
To implement BOTTOM accurately, it would need to fetch the entire dataset unordered and then restrict the dataset to the final N records. That will not be particularly effective if you are dealing with huge tables. Nor will it necessarily give you what you think you are asking for. The end of the data set may not necessarily be "the last rows inserted" (and probably won't be for most DML intensive applications).
Similarly, the solutions which implement an ORDER BY are, unfortunately, potentially disastrous when dealing with large data sets. If I have, say, 10 Billion records and want the last 10, it is quite foolish to order 10 Billion records and select the last 10.
The problem here, is that BOTTOM does not have the meaning that we think of when comparing it to TOP.
When records are inserted, deleted, inserted, deleted over and over and over again, some gaps will appear in the storage and later, rows will be slotted in, if possible. But what we often see, when we select TOP, appears to be sorted data, because it may have been inserted early on in the table's existence. If the table does not experience many deletions, it may appear to be ordered. (e.g. creation dates may be as far back in time as the table creation itself). But the reality is, if this is a delete-heavy table, the TOP N rows may not look like that at all.
So -- the bottom line here(pun intended) is that someone who is asking for the BOTTOM N records doesn't actually know what they're asking for. Or, at least, what they're asking for and what BOTTOM actually means are not the same thing.
So -- the solution may meet the actual business need of the requestor...but does not meet the criteria for being the BOTTOM.
First, create an index in a subquery according to the table's original order using:
ROW_NUMBER () OVER (ORDER BY (SELECT NULL) ) AS RowIndex
Then order the table descending by the RowIndex column you've created in the main query:
ORDER BY RowIndex DESC
And finally use TOP with your wanted quantity of rows:
SELECT TOP 1 * --(or 2, or 5, or 34)
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL) ) AS RowIndex, *
FROM MyTable) AS SubQuery
ORDER BY RowIndex DESC
All you need to do is reverse your ORDER BY. Add or remove DESC to it.
The problem with ordering the other way is that it often does not make good use of indices. It is also not very extendable if you ever need to select a number of rows that are not at the start or the end. An alternative way is as follows.
DECLARE #NumberOfRows int;
SET #NumberOfRows = (SELECT COUNT(*) FROM TheTable);
SELECT col1, col2,...
FROM (
SELECT col1, col2,..., ROW_NUMBER() OVER (ORDER BY col1) AS intRow
FROM TheTable
) AS T
WHERE intRow > #NumberOfRows - 20;
The currently accepted answer by "Justin Ethier" is not a correct answer as pointed out by "Protector one".
As far as I can see, as of now, no other answer or comment provides the equivalent of BOTTOM(x) the question author asked for.
First, let's consider a scenario where this functionality would be needed:
SELECT * FROM Split('apple,orange,banana,apple,lime',',')
This returns a table of one column and five records:
apple
orange
banana
apple
lime
As you can see: we don't have an ID column; we can't order by the returned column; and we can't select the bottom two records using standard SQL like we can do for the top two records.
Here is my attempt to provide a solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
SELECT TOP 2 * FROM #mytemptable ORDER BY tempID DESC
DROP TABLE #mytemptable
And here is a more complete solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
DELETE FROM #mytemptable WHERE tempID <= ((SELECT COUNT(*) FROM #mytemptable) - 2)
ALTER TABLE #mytemptable DROP COLUMN tempID
SELECT * FROM #mytemptable
DROP TABLE #mytemptable
I am by no means claiming that this is a good idea to use in all circumstances, but it provides the desired results.
You can use the OFFSET FETCH clause.
SELECT COUNT(1) FROM COHORT; --Number of results to expect
SELECT * FROM COHORT
ORDER BY ID
OFFSET 900 ROWS --Assuming you expect 1000 rows
FETCH NEXT 100 ROWS ONLY;
(This is for Microsoft SQL Server)
Official documentation:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-offset-fetch/
"Tom H" answer above is correct and it works for me in getting Bottom 5 rows.
SELECT [KeyCol1], [KeyCol2], [Col3]
FROM
(SELECT TOP 5 [KeyCol1],
[KeyCol2],
[Col3]
FROM [dbo].[table_name]
ORDER BY [KeyCol1],[KeyCol2] DESC) SOME_ALAIS
ORDER BY [KeyCol1],[KeyCol2] ASC
Thanks.
try this.
declare #floor int --this is the offset from the bottom, the number of results to exclude
declare #resultLimit int --the number of results actually retrieved for use
declare #total int --just adds them up, the total number of results fetched initially
--following is for gathering top 60 results total, then getting rid of top 50. We only keep the last 10
set #floor = 50
set #resultLimit = 10
set #total = #floor + #resultLimit
declare #tmp0 table(
--table body
)
declare #tmp1 table(
--table body
)
--this line will drop the wanted results from whatever table we're selecting from
insert into #tmp0
select Top #total --what to select (the where, from, etc)
--using floor, insert the part we don't want into the second tmp table
insert into #tmp1
select top #floor * from #tmp0
--using select except, exclude top x results from the query
select * from #tmp0
except
select * from #tmp1
I've come up with a solution to this that doesn't require you to know the number of row returned.
For example, if you want to get all the locations logged in a table, except the latest 1 (or 2, or 5, or 34)
SELECT *
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY CreatedDate) AS Row, *
FROM Locations
WHERE UserId = 12345) AS SubQuery
WHERE Row > 1 -- or 2, or 5, or 34
Querying a simple subquery sorted descending, followed by sorting on the same column ascending does the trick.
SELECT * FROM
(SELECT TOP 200 * FROM [table] t2 ORDER BY t2.[column] DESC) t1
ORDER BY t1.[column]
SELECT TOP 10*from TABLE1 ORDER BY ID DESC
Where ID is the primary key of the TABLE1.
SELECT columns FROM My_Table LIMIT 200 OFFSET (SELECT Count(*)-200 My_Table)

How to read the last row with SQL Server

What is the most efficient way to read the last row with SQL Server?
The table is indexed on a unique key -- the "bottom" key values represent the last row.
If you're using MS SQL, you can try:
SELECT TOP 1 * FROM table_Name ORDER BY unique_column DESC
select whatever,columns,you,want from mytable
where mykey=(select max(mykey) from mytable);
You'll need some sort of uniquely identifying column in your table, like an auto-filling primary key or a datetime column (preferably the primary key). Then you can do this:
SELECT * FROM table_name ORDER BY unique_column DESC LIMIT 1
The ORDER BY column tells it to rearange the results according to that column's data, and the DESC tells it to reverse the results (thus putting the last one first). After that, the LIMIT 1 tells it to only pass back one row.
If some of your id are in order, i am assuming there will be some order in your db
SELECT * FROM TABLE WHERE ID = (SELECT MAX(ID) FROM TABLE)
I think below query will work for SQL Server with maximum performance without any sortable column
SELECT * FROM table
WHERE ID not in (SELECT TOP (SELECT COUNT(1)-1
FROM table)
ID
FROM table)
Hope you have understood it... :)
I tried using last in sql query in SQl server 2008 but it gives this err:
" 'last' is not a recognized built-in function name."
So I ended up using :
select max(WorkflowStateStatusId) from WorkflowStateStatus
to get the Id of the last row.
One could also use
Declare #i int
set #i=1
select WorkflowStateStatusId from Workflow.WorkflowStateStatus
where WorkflowStateStatusId not in (select top (
(select count(*) from Workflow.WorkflowStateStatus) - #i ) WorkflowStateStatusId from .WorkflowStateStatus)
You can use last_value: SELECT LAST_VALUE(column) OVER (PARTITION BY column ORDER BY column)...
I test it at one of my databases and it worked as expected.
You can also check de documentation here: https://msdn.microsoft.com/en-us/library/hh231517.aspx
OFFSET and FETCH NEXT are a feature of SQL Server 2012 to achieve SQL paging while displaying results.
The OFFSET argument is used to decide the starting row to return rows from a result and FETCH argument is used to return a set of number of rows.
SELECT *
FROM table_name
ORDER BY unique_column desc
OFFSET 0 Row
FETCH NEXT 1 ROW ONLY
SELECT TOP 1 id from comission_fees ORDER BY id DESC
In order to retrieve the last row of a table for MS SQL database 2005, You can use the following query:
select top 1 column_name from table_name order by column_name desc;
Note: To get the first row of the table for MS SQL database 2005, You can use the following query:
select top 1 column_name from table_name;
If you don't have any ordered column, you can use the physical id of each lines:
SELECT top 1 sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot],
T.*
FROM MyTable As T
order by sys.fn_PhysLocFormatter(%%physloc%%) DESC
SELECT * from Employees where [Employee ID] = ALL (SELECT MAX([Employee ID]) from Employees)
This is how you get the last record and update a field in Access DB.
UPDATE compalints SET tkt = addzone &'-'& customer_code &'-'& sn where sn in (select max(sn) from compalints )
If you have a Replicated table, you can have an Identity=1000 in localDatabase and Identity=2000 in the clientDatabase, so if you catch the last ID you may find always the last from client, not the last from the current connected database.
So the best method which returns the last connected database is:
SELECT IDENT_CURRENT('tablename')
Well I'm not getting the "last value" in a table, I'm getting the Last value per financial instrument. It's not the same but I guess it is relevant for some that are looking to look up on "how it is done now". I also used RowNumber() and CTE's and before that to simply take 1 and order by [column] desc. however we nolonger need to...
I am using SQL server 2017, we are recording all ticks on all exchanges globally, we have ~12 billion ticks a day, we store each Bid, ask, and trade including the volumes and the attributes of a tick (bid, ask, trade) of any of the given exchanges.
We have 253 types of ticks data for any given contract (mostly statistics) in that table, the last traded price is tick type=4 so, when we need to get the "last" of Price we use :
select distinct T.contractId,
LAST_VALUE(t.Price)over(partition by t.ContractId order by created ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
from [dbo].[Tick] as T
where T.TickType=4
You can see the execution plan on my dev system it executes quite efficient, executes in 4 sec while the exchange import ETL is pumping data into the table, there will be some locking slowing me down... that's just how live systems work.
It is very simple
select top 10 * from TableName order by 1 desc
SELECT * FROM TABLE WHERE ID = (SELECT MAX(ID) FROM TABLE)
I am pretty sure that it is:
SELECT last(column_name) FROM table
Becaause I use something similar:
SELECT last(id) FROM Status