Update where select, guarantee atomicity - sql

I have a T-SQL query like this:
UPDATE
[MyTable]
SET
[MyField] = #myValue
WHERE
[Id] =
(
SELECT TOP(1)
[Id]
FROM
[MyTable]
WHERE
[MyField] IS NULL
-- AND other conditions on [MyTable]
ORDER BY
[Id] ASC
)
It seems that this query is not atomic (the select of 2 concurrent executions can return the same Id twice).
Edit:
If I execute this query, the Id returned by the SELECT will not be available for the next execution (because [MyField] will not be NULL anymore). However, if I execute this query twice at the same time, both executions could return the same Id (and the second UPDATE would overwrite the first one).
I've read that one solution to avoid that is to use a SERIALIZABLE isolation level. Is that the best / fastest / most simple way ?

As I can see, UPDLOCK would be enough (test code confirms that)

This would result the top 1 value but if its the same you have the potential to return more than one value.
Try using DISTINCT
I understand that top(1) should only return 1 row but the question states its returning more than one row. So i thought it might be because the value is the same. So you could possibly use something like this
SELECT DISTINCT TOP 1 name FROM [Class];
Its either that or where clause to narrow results

Calculate the max ID first then use it in a cross join for the the update.
SQL DEMO
WITH cte as (
SELECT TOP 1 ID
FROM [MyTable]
WHERE MYFIELD IS NULL
ORDER BY ID
)
UPDATE t
SET [ID] = cte.[ID]
FROM [MyTable] t
CROSS JOIN cte;
OUTPUT

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

SQL query to update top 1 record in table

Can anybody please tell me how to write query to update top 1 record in table ?
Thanks
YOU have to decide what the top record is in a table, by ordering against a column you decide on.
That said, you could do this in SQL Server:
UPDATE [YourTable]
SET [YourColumn] = SomeValue
WHERE [PrimaryKey] IN
(
SELECT TOP 1 [PrimaryKey]
FROM [YourTable]
ORDER BY [PrimaryKey] -- You need to decide what column you want to sort on
)
There is no "top 1 record" in a relational table. Records are not in any order unless you specify one in the query.
in SQLServer you can use the TOP keyword in your update expression:
http://msdn.microsoft.com/en-us/library/ms177523.aspx
When a TOP (n) clause is used with UPDATE, the update operation is performed on a random selection of 'n' number of rows.
In MS SQL, in addition to the existing answers, it is possible to UPDATE the TOP N rows returned through a Common Table Expression (CTE). You are able to define an ORDER BY if required, as well.
One advantage of this is that it will work even if your table doesn't have a primary key:
WITH Top1Foo AS
(
SELECT TOP 1 *
FROM Foos
ORDER BY Name DESC
)
UPDATE Top1Foo
SET Name = 'zzz';
There's a Example SqlFiddle here
However, the caveat mentioned by other users remains - using TOP implies there are more records meeting the selection criteria, and it risks updating an arbitrary record / records.

How to select bottom most rows?

I can do SELECT TOP (200) ... but why not BOTTOM (200)?
Well not to get into philosophy what I mean is, how can I do the equivalent of TOP (200) but in reverse (from the bottom, like you'd expect BOTTOM to do...)?
SELECT
columns
FROM
(
SELECT TOP 200
columns
FROM
My_Table
ORDER BY
a_column DESC
) SQ
ORDER BY
a_column ASC
It is unnecessary. You can use an ORDER BY and just change the sort to DESC to get the same effect.
Sorry, but I don't think I see any correct answers in my opinion.
The TOP x function shows the records in undefined order. From that definition follows that a BOTTOM function can not be defined.
Independent of any index or sort order. When you do an ORDER BY y DESC you get the rows with the highest y value first. If this is an autogenerated ID, it should show the records last added to the table, as suggested in the other answers. However:
This only works if there is an autogenerated id column
It has a significant performance impact if you compare that with the TOP function
The correct answer should be that there is not, and cannot be, an equivalent to TOP for getting the bottom rows.
Logically,
BOTTOM (x) is all the records except TOP (n - x), where n is the count; x <= n
E.g. Select Bottom 1000 from Employee:
In T-SQL,
DECLARE
#bottom int,
#count int
SET #bottom = 1000
SET #count = (select COUNT(*) from Employee)
select * from Employee emp where emp.EmployeeID not in
(
SELECT TOP (#count-#bottom) Employee.EmployeeID FROM Employee
)
It would seem that any of the answers which implement an ORDER BY clause in the solution is missing the point, or does not actually understand what TOP returns to you.
TOP returns an unordered query result set which limits the record set to the first N records returned. (From an Oracle perspective, it is akin to adding a where ROWNUM < (N+1).
Any solution which uses an order, may return rows which also are returned by the TOP clause (since that data set was unordered in the first place), depending on what criteria was used in the order by
The usefulness of TOP is that once the dataset reaches a certain size N, it stops fetching rows. You can get a feel for what the data looks like without having to fetch all of it.
To implement BOTTOM accurately, it would need to fetch the entire dataset unordered and then restrict the dataset to the final N records. That will not be particularly effective if you are dealing with huge tables. Nor will it necessarily give you what you think you are asking for. The end of the data set may not necessarily be "the last rows inserted" (and probably won't be for most DML intensive applications).
Similarly, the solutions which implement an ORDER BY are, unfortunately, potentially disastrous when dealing with large data sets. If I have, say, 10 Billion records and want the last 10, it is quite foolish to order 10 Billion records and select the last 10.
The problem here, is that BOTTOM does not have the meaning that we think of when comparing it to TOP.
When records are inserted, deleted, inserted, deleted over and over and over again, some gaps will appear in the storage and later, rows will be slotted in, if possible. But what we often see, when we select TOP, appears to be sorted data, because it may have been inserted early on in the table's existence. If the table does not experience many deletions, it may appear to be ordered. (e.g. creation dates may be as far back in time as the table creation itself). But the reality is, if this is a delete-heavy table, the TOP N rows may not look like that at all.
So -- the bottom line here(pun intended) is that someone who is asking for the BOTTOM N records doesn't actually know what they're asking for. Or, at least, what they're asking for and what BOTTOM actually means are not the same thing.
So -- the solution may meet the actual business need of the requestor...but does not meet the criteria for being the BOTTOM.
First, create an index in a subquery according to the table's original order using:
ROW_NUMBER () OVER (ORDER BY (SELECT NULL) ) AS RowIndex
Then order the table descending by the RowIndex column you've created in the main query:
ORDER BY RowIndex DESC
And finally use TOP with your wanted quantity of rows:
SELECT TOP 1 * --(or 2, or 5, or 34)
FROM (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL) ) AS RowIndex, *
FROM MyTable) AS SubQuery
ORDER BY RowIndex DESC
All you need to do is reverse your ORDER BY. Add or remove DESC to it.
The problem with ordering the other way is that it often does not make good use of indices. It is also not very extendable if you ever need to select a number of rows that are not at the start or the end. An alternative way is as follows.
DECLARE #NumberOfRows int;
SET #NumberOfRows = (SELECT COUNT(*) FROM TheTable);
SELECT col1, col2,...
FROM (
SELECT col1, col2,..., ROW_NUMBER() OVER (ORDER BY col1) AS intRow
FROM TheTable
) AS T
WHERE intRow > #NumberOfRows - 20;
The currently accepted answer by "Justin Ethier" is not a correct answer as pointed out by "Protector one".
As far as I can see, as of now, no other answer or comment provides the equivalent of BOTTOM(x) the question author asked for.
First, let's consider a scenario where this functionality would be needed:
SELECT * FROM Split('apple,orange,banana,apple,lime',',')
This returns a table of one column and five records:
apple
orange
banana
apple
lime
As you can see: we don't have an ID column; we can't order by the returned column; and we can't select the bottom two records using standard SQL like we can do for the top two records.
Here is my attempt to provide a solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
SELECT TOP 2 * FROM #mytemptable ORDER BY tempID DESC
DROP TABLE #mytemptable
And here is a more complete solution:
SELECT * INTO #mytemptable FROM Split('apple,orange,banana,apple,lime',',')
ALTER TABLE #mytemptable ADD tempID INT IDENTITY
DELETE FROM #mytemptable WHERE tempID <= ((SELECT COUNT(*) FROM #mytemptable) - 2)
ALTER TABLE #mytemptable DROP COLUMN tempID
SELECT * FROM #mytemptable
DROP TABLE #mytemptable
I am by no means claiming that this is a good idea to use in all circumstances, but it provides the desired results.
You can use the OFFSET FETCH clause.
SELECT COUNT(1) FROM COHORT; --Number of results to expect
SELECT * FROM COHORT
ORDER BY ID
OFFSET 900 ROWS --Assuming you expect 1000 rows
FETCH NEXT 100 ROWS ONLY;
(This is for Microsoft SQL Server)
Official documentation:
https://www.sqlservertutorial.net/sql-server-basics/sql-server-offset-fetch/
"Tom H" answer above is correct and it works for me in getting Bottom 5 rows.
SELECT [KeyCol1], [KeyCol2], [Col3]
FROM
(SELECT TOP 5 [KeyCol1],
[KeyCol2],
[Col3]
FROM [dbo].[table_name]
ORDER BY [KeyCol1],[KeyCol2] DESC) SOME_ALAIS
ORDER BY [KeyCol1],[KeyCol2] ASC
Thanks.
try this.
declare #floor int --this is the offset from the bottom, the number of results to exclude
declare #resultLimit int --the number of results actually retrieved for use
declare #total int --just adds them up, the total number of results fetched initially
--following is for gathering top 60 results total, then getting rid of top 50. We only keep the last 10
set #floor = 50
set #resultLimit = 10
set #total = #floor + #resultLimit
declare #tmp0 table(
--table body
)
declare #tmp1 table(
--table body
)
--this line will drop the wanted results from whatever table we're selecting from
insert into #tmp0
select Top #total --what to select (the where, from, etc)
--using floor, insert the part we don't want into the second tmp table
insert into #tmp1
select top #floor * from #tmp0
--using select except, exclude top x results from the query
select * from #tmp0
except
select * from #tmp1
I've come up with a solution to this that doesn't require you to know the number of row returned.
For example, if you want to get all the locations logged in a table, except the latest 1 (or 2, or 5, or 34)
SELECT *
FROM
(SELECT ROW_NUMBER() OVER (ORDER BY CreatedDate) AS Row, *
FROM Locations
WHERE UserId = 12345) AS SubQuery
WHERE Row > 1 -- or 2, or 5, or 34
Querying a simple subquery sorted descending, followed by sorting on the same column ascending does the trick.
SELECT * FROM
(SELECT TOP 200 * FROM [table] t2 ORDER BY t2.[column] DESC) t1
ORDER BY t1.[column]
SELECT TOP 10*from TABLE1 ORDER BY ID DESC
Where ID is the primary key of the TABLE1.
SELECT columns FROM My_Table LIMIT 200 OFFSET (SELECT Count(*)-200 My_Table)

Need a row count after SELECT statement: what's the optimal SQL approach?

I'm trying to select a column from a single table (no joins) and I need the count of the number of rows, ideally before I begin retrieving the rows. I have come to two approaches that provide the information I need.
Approach 1:
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
Then
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
Or Approach 2
SELECT my_table.my_col, ( SELECT COUNT ( my_table.my_col )
FROM my_table
WHERE my_table.foo = 'bar' ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
I am doing this because my SQL driver (SQL Native Client 9.0) does not allow me to use SQLRowCount on a SELECT statement but I need to know the number of rows in my result in order to allocate an array before assigning information to it. The use of a dynamically allocated container is, unfortunately, not an option in this area of my program.
I am concerned that the following scenario might occur:
SELECT for count occurs
Another instruction occurs, adding or removing a row
SELECT for data occurs and suddenly the array is the wrong size.
-In the worse case, this will attempt to write data beyond the arrays limits and crash my program.
Does Approach 2 prohibit this issue?
Also, Will one of the two approaches be faster? If so, which?
Finally, is there a better approach that I should consider (perhaps a way to instruct the driver to return the number of rows in a SELECT result using SQLRowCount?)
For those that asked, I am using Native C++ with the aforementioned SQL driver (provided by Microsoft.)
If you're using SQL Server, after your query you can select the ##RowCount function (or if your result set might have more than 2 billion rows use the RowCount_Big() function). This will return the number of rows selected by the previous statement or number of rows affected by an insert/update/delete statement.
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
SELECT ##Rowcount
Or if you want to row count included in the result sent similar to Approach #2, you can use the the OVER clause.
SELECT my_table.my_col,
count(*) OVER(PARTITION BY my_table.foo) AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
Using the OVER clause will have much better performance than using a subquery to get the row count. Using the ##RowCount will have the best performance because the there won't be any query cost for the select ##RowCount statement
Update in response to comment: The example I gave would give the # of rows in partition - defined in this case by "PARTITION BY my_table.foo". The value of the column in each row is the # of rows with the same value of my_table.foo. Since your example query had the clause "WHERE my_table.foo = 'bar'", all rows in the resultset will have the same value of my_table.foo and therefore the value in the column will be the same for all rows and equal (in this case) this the # of rows in the query.
Here is a better/simpler example of how to include a column in each row that is the total # of rows in the resultset. Simply remove the optional Partition By clause.
SELECT my_table.my_col, count(*) OVER() AS 'Count'
FROM my_table
WHERE my_table.foo = 'bar'
There are only two ways to be 100% certain that the COUNT(*) and the actual query will give consistent results:
Combined the COUNT(*) with the query, as in your Approach 2. I recommend the form you show in your example, not the correlated subquery form shown in the comment from kogus.
Use two queries, as in your Approach 1, after starting a transaction in SNAPSHOT or SERIALIZABLE isolation level.
Using one of those isolation levels is important because any other isolation level allows new rows created by other clients to become visible in your current transaction. Read the MSDN documentation on SET TRANSACTION ISOLATION for more details.
Approach 2 will always return a count that matches your result set.
I suggest you link the sub-query to your outer query though, to guarantee that the condition on your count matches the condition on the dataset.
SELECT
mt.my_row,
(SELECT COUNT(mt2.my_row) FROM my_table mt2 WHERE mt2.foo = mt.foo) as cnt
FROM my_table mt
WHERE mt.foo = 'bar';
If you're concerned the number of rows that meet the condition may change in the few milliseconds since execution of the query and retrieval of results, you could/should execute the queries inside a transaction:
BEGIN TRAN bogus
SELECT COUNT( my_table.my_col ) AS row_count
FROM my_table
WHERE my_table.foo = 'bar'
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
ROLLBACK TRAN bogus
This would return the correct values, always.
Furthermore, if you're using SQL Server, you can use ##ROWCOUNT to get the number of rows affected by last statement, and redirect the output of real query to a temp table or table variable, so you can return everything altogether, and no need of a transaction:
DECLARE #dummy INT
SELECT my_table.my_col
INTO #temp_table
FROM my_table
WHERE my_table.foo = 'bar'
SET #dummy=##ROWCOUNT
SELECT #dummy, * FROM #temp_table
Here are some ideas:
Go with Approach #1 and resize the array to hold additional results or use a type that automatically resizes as neccessary (you don't mention what language you are using so I can't be more specific).
You could execute both statements in Approach #1 within a transaction to guarantee the counts are the same both times if your database supports this.
I'm not sure what you are doing with the data but if it is possible to process the results without storing all of them first this might be the best method.
If you are really concerned that your row count will change between the select count and the select statement, why not select your rows into a temp table first? That way, you know you will be in sync.
Why don't you put your results into a vector? That way you don't have to know the size before hand.
You might want to think about a better pattern for dealing with data of this type.
No self-prespecting SQL driver will tell you how many rows your query will return before returning the rows, because the answer might change (unless you use a Transaction, which creates problems of its own.)
The number of rows won't change - google for ACID and SQL.
IF (##ROWCOUNT > 0)
BEGIN
SELECT my_table.my_col
FROM my_table
WHERE my_table.foo = 'bar'
END
Just to add this because this is the top result in google for this question.
In sqlite I used this to get the rowcount.
WITH temptable AS
(SELECT one,two
FROM
(SELECT one, two
FROM table3
WHERE dimension=0
UNION ALL SELECT one, two
FROM table2
WHERE dimension=0
UNION ALL SELECT one, two
FROM table1
WHERE dimension=0)
ORDER BY date DESC)
SELECT *
FROM temptable
LEFT JOIN
(SELECT count(*)/7 AS cnt,
0 AS bonus
FROM temptable) counter
WHERE 0 = counter.bonus

Delete all but top n from database table in SQL

What's the best way to delete all rows from a table in sql but to keep n number of rows on the top?
DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Edit:
Chris brings up a good performance hit since the TOP 10 query would be run for each row. If this is a one time thing, then it may not be as big of a deal, but if it is a common thing, then I did look closer at it.
I would select ID column(s) the set of rows that you want to keep into a temp table or table variable. Then delete all the rows that do not exist in the temp table. The syntax mentioned by another user:
DELETE FROM Table WHERE ID NOT IN (SELECT TOP 10 ID FROM Table)
Has a potential problem. The "SELECT TOP 10" query will be executed for each row in the table, which could be a huge performance hit. You want to avoid making the same query over and over again.
This syntax should work, based what you listed as your original SQL statement:
create table #nuke(NukeID int)
insert into #nuke(Nuke) select top 1000 id from article
delete article where not exists (select 1 from nuke where Nukeid = id)
drop table #nuke
Future reference for those of use who don't use MS SQL.
In PostgreSQL use ORDER BY and LIMIT instead of TOP.
DELETE FROM table
WHERE id NOT IN (SELECT id FROM table ORDER BY id LIMIT n);
MySQL -- well...
Error -- This version of MySQL does not yet support 'LIMIT &
IN/ALL/ANY/SOME subquery'
Not yet I guess.
Here is how I did it. This method is faster and simpler:
Delete all but top n from database table in MS SQL using OFFSET command
WITH CTE AS
(
SELECT ID
FROM dbo.TableName
ORDER BY ID DESC
OFFSET 11 ROWS
)
DELETE CTE;
Replace ID with column by which you want to sort.
Replace number after OFFSET with number of rows which you want to keep.
Choose DESC or ASC - whatever suits your case.
I think using a virtual table would be much better than an IN-clause or temp table.
DELETE
Product
FROM
Product
LEFT OUTER JOIN
(
SELECT TOP 10
Product.id
FROM
Product
) TopProducts ON Product.id = TopProducts.id
WHERE
TopProducts.id IS NULL
This really is going to be language specific, but I would likely use something like the following for SQL server.
declare #n int
SET #n = SELECT Count(*) FROM dTABLE;
DELETE TOP (#n - 10 ) FROM dTable
if you don't care about the exact number of rows, there is always
DELETE TOP 90 PERCENT FROM dTABLE;
I don't know about other flavors but MySQL DELETE allows LIMIT.
If you could order things so that the n rows you want to keep are at the bottom, then you could do a DELETE FROM table LIMIT tablecount-n.
Edit
Oooo. I think I like Cory Foy's answer better, assuming it works in your case. My way feels a little clunky by comparison.
I would solve it using the technique below. The example expect an article table with an id on each row.
Delete article where id not in (select top 1000 id from article)
Edit: Too slow to answer my own question ...
Refactored?
Delete a From Table a Inner Join (
Select Top (Select Count(tableID) From Table) - 10)
From Table Order By tableID Desc
) b On b.tableID = A.tableID
edit: tried them both in the query analyzer, current answer is fasted (damn order by...)
Better way would be to insert the rows you DO want into another table, drop the original table and then rename the new table so it has the same name as the old table
I've got a trick to avoid executing the TOP expression for every row. We can combine TOP with MAX to get the MaxId we want to keep. Then we just delete everything greater than MaxId.
-- Declare Variable to hold the highest id we want to keep.
DECLARE #MaxId as int = (
SELECT MAX(temp.ID)
FROM (SELECT TOP 10 ID FROM table ORDER BY ID ASC) temp
)
-- Delete anything greater than MaxId. If MaxId is null, there is nothing to delete.
IF #MaxId IS NOT NULL
DELETE FROM table WHERE ID > #MaxId
Note: It is important to use ORDER BY when declaring MaxId to ensure proper results are queried.