SQL Server 2008 - Select disjoint rows

I have two concurrent processes and two queries, e.g.:
select top 10 * into #tmp_member
from member
where status = 0
order by member_id
and then
update member
set process_status = 1
from member inner join #tmp_member m
on member.member_id=m.member_id
I'd like each process to select different rows: if a row has already been selected by the first process, it should not appear in the second process's result list.
Do I have to play around with locks? UPDLOCK, ROWLOCK, READPAST hints maybe? Or is there a more straightforward solution?
Any help is appreciated,
cheers,
b

You need hints.
See my answer here: SQL Server Process Queue Race Condition
However, you can shorten your two queries into a single statement with the OUTPUT clause. Otherwise you'll need a transaction too (assuming each process executes the two statements above one after the other):
update m
set process_status = 1
OUTPUT Inserted.member_id
from
(
SELECT top 10
process_status, member_id
from member WITH (ROWLOCK, READPAST, UPDLOCK)
where status = 0
order by member_id
) m
Summary: if you want multiple processes to
select 10 rows where status = 0
set process_status = 1
return a resultset in a safe, concurrent fashion
...then use this code.
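Here is a minimal way to check this behaviour by hand, assuming the member table from the question. Run the batch below in two query windows. In the first window, execute it only up to the UPDATE so that its transaction stays open. Then run the whole batch in the second window.

Because of READPAST, the second session should skip the rows the first one has locked and return the next 10 member_id values instead of waiting. Note that the WHERE clause only filters on status = 0, so once the first transaction commits, a later run could claim the same rows again. Adding a condition such as process_status = 0 would prevent that; this assumes that is how unclaimed rows are marked.

```sql
-- Session 1: run down to the UPDATE only, leaving the transaction open.
-- Session 2: run the whole batch while session 1 is still open.
BEGIN TRAN;

-- Claim up to 10 rows, skipping rows another session has locked.
UPDATE m
SET process_status = 1
OUTPUT inserted.member_id
FROM
(
    SELECT TOP (10) process_status, member_id
    FROM member WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE status = 0
    ORDER BY member_id
) AS m;

-- Session 1: run this last, after session 2 has returned its batch.
COMMIT TRAN;
```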

The problem is that your SELECT/UPDATE pair is not atomic: the second process might select the same first 10 rows after the first process has selected them but before it has updated them.
You can use the OUTPUT clause on the UPDATE statement so that selecting and updating happen in one atomic statement. See the documentation for details, but basically you can write something like:
DECLARE @MyTableVar table (member_id INT);
UPDATE TOP (10) member
SET process_status = 1
OUTPUT inserted.member_id
INTO @MyTableVar
WHERE status = 0;
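After that, @MyTableVar should contain all the updated member IDs. Note that UPDATE TOP (n) does not accept an ORDER BY, so which 10 rows get updated is not defined.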
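If the order of the claimed rows matters, you can put the TOP ... ORDER BY in a CTE and update through it, keeping OUTPUT ... INTO. This is a sketch that assumes SQL Server 2005 or later and the member table from the question. The table variable name @claimed is arbitrary. The locking hints are borrowed from the previous answer so that concurrent sessions skip each other's locked rows instead of waiting.

```sql
-- Receives the IDs of the rows this call claimed.
DECLARE @claimed TABLE (member_id INT);

-- The CTE picks the next 10 rows in member_id order; updating the CTE
-- updates the underlying member rows.
WITH next_batch AS
(
    SELECT TOP (10) member_id, process_status
    FROM member WITH (ROWLOCK, READPAST, UPDLOCK)
    WHERE status = 0
    ORDER BY member_id
)
UPDATE next_batch
SET process_status = 1
OUTPUT inserted.member_id INTO @claimed;

-- Work with the claimed rows.
SELECT member_id FROM @claimed ORDER BY member_id;
```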

To meet your goal of having multiple processes work on the member table, you will not need to "play around with locks". You will need to change from the #tmp_member table to a global temp table or a permanent table. The table will also need a column to track which process is managing each member row.
You will need a method to give each process that uses the table some kind of ID. The first query is then modified to exclude any entries in the table made by other processes, and the second query is modified to include only the entries made by this process.
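Here is a minimal sketch of this approach in T-SQL, assuming member_id is an INT. The claim table dbo.member_claim, its columns and the worker ID 'worker-1' are all hypothetical.

Making member_id the primary key of the claim table means a member row can only ever be owned by one process. If two processes try to claim the same rows at the same moment, one of the INSERTs may fail with a key violation and can simply be retried. Either way, two processes never process the same members.

```sql
-- Hypothetical permanent claim table, created once.
IF OBJECT_ID(N'dbo.member_claim', N'U') IS NULL
    CREATE TABLE dbo.member_claim
    (
        member_id  INT         NOT NULL PRIMARY KEY,  -- one owner per member row
        process_id VARCHAR(50) NOT NULL               -- which process owns it
    );

DECLARE @process_id VARCHAR(50) = 'worker-1';  -- hypothetical ID for this process

-- First query: claim up to 10 members that no process has claimed yet.
INSERT INTO dbo.member_claim (member_id, process_id)
SELECT TOP (10) m.member_id, @process_id
FROM member AS m
WHERE m.status = 0
  AND NOT EXISTS (SELECT 1
                  FROM dbo.member_claim AS c
                  WHERE c.member_id = m.member_id)
ORDER BY m.member_id;

-- Second query: update only the rows claimed by this process.
UPDATE m
SET process_status = 1
FROM member AS m
INNER JOIN dbo.member_claim AS c
    ON c.member_id = m.member_id
WHERE c.process_id = @process_id;
```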

Related

PostgreSQL Update and return

Let's say I have a table called t in Postgres:
id | group_name | state
-----------------------------
1 | group1 | 0
2 | group1 | 0
3 | group1 | 0
I need to update the state of a row by ID, while also returning some things:
The old state
The remaining number of rows in the same group that have state = 0
I've got a query to do this as follows:
UPDATE t AS updated SET state = 1
FROM t as original
WHERE
updated.id = original.id AND
updated.id = :some_id
RETURNING
updated.state AS new_state,
original.state AS old_state,
(
SELECT COUNT(*) FROM t
WHERE
group_name = updated.group_name AND
state = 0
) as remaining_count;
However, it seems like the subquery within RETURNING is executed before the update has completed, leaving me with a remaining_count that is off by 1.
Additionally, I'm not sure how this behaves when concurrent queries are run. If we update two of these rows at the same time, is it possible that they would both return the same remaining_count?
Is there a more elegant solution to this? Perhaps some sort of window/aggregate function?
The subquery indeed runs without seeing the change made by the UPDATE. Every part of the statement, including subqueries in RETURNING, works from the snapshot taken when the statement started, so the statement's own changes are not visible to it. Nevertheless, it's an easy fix: just add a condition to the subquery's WHERE clause that filters out the ID you just updated, making your query something like this:
UPDATE t AS updated SET state = 1
FROM t as original
WHERE
updated.id = original.id AND
updated.id = :some_id
RETURNING
updated.state AS new_state,
original.state AS old_state,
(
SELECT COUNT(*) FROM t
WHERE
group_name = updated.group_name AND
state = 0 AND
t.id <> :some_id /* this is what I changed */
) as remaining_count;
Concurrency-wise, I'm not sure what the behavior would be, TBH; best I can do is point you at the relevant docs.
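One way to make concurrent calls return consistent counts is sketched below. It is an assumption on my part, not something from this answer. Under PostgreSQL's default READ COMMITTED level, lock the group's rows with SELECT ... FOR UPDATE in the same transaction before running the UPDATE.

A second transaction working on the same group then waits at the FOR UPDATE until the first one commits. Its UPDATE, being a new statement, takes a fresh snapshot and sees the committed change. Rows inserted into the group concurrently are not covered by these locks. :some_id is the question's placeholder, for example a psql variable.

```sql
BEGIN;

-- Lock every row in the group of the row being updated (in id order, so
-- concurrent transactions lock rows in the same order). Another transaction
-- that reaches this point for the same group waits here until we commit.
SELECT id
FROM t
WHERE group_name = (SELECT group_name FROM t WHERE id = :some_id)
ORDER BY id
FOR UPDATE;

-- Same UPDATE as above; as a new statement it gets a fresh snapshot,
-- so changes committed by a transaction we waited for are visible.
UPDATE t AS updated SET state = 1
FROM t AS original
WHERE updated.id = original.id
  AND updated.id = :some_id
RETURNING
    original.state AS old_state,
    updated.state AS new_state,
    (SELECT count(*) FROM t
     WHERE group_name = updated.group_name
       AND state = 0
       AND t.id <> :some_id) AS remaining_count;

COMMIT;
```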
You could try (non-recursive) WITH queries, aka Common Table Expressions (CTEs). Their general structure is as follows:
WITH auxiliary_query_name AS (
auxiliary_query_expression
)
[, another_auxiliary_query_name AS (
another_auxiliary_query_expression
)]
primary_query_expression;
Data-modifying auxiliary queries run concurrently with each other and with primary_query_expression, in an unpredictable order. All of them work from the same snapshot, so none of them can see the others' effects on the underlying tables. However, you can refer to auxiliary_query_name from within primary_query_expression and from later auxiliary queries. That reference returns the rows from the auxiliary query's RETURNING clause, which is how results are passed from one query to the next. Some finer points apply, but that's the gist of it. A data-modifying CTE is also executed exactly once, no matter how often it is referenced.
Regarding your query specifically, assume that what you want in the end is the ID of each updated item, its old state, its new state, the group it belongs to, and how many other items of that group are left to update. I believe the following would achieve this. I slightly modified the original query to update multiple items at once, to show where this approach shines (besides the clear sequence, its performance advantages are moot if you update only a single item at a time).
WITH updated_t AS (
UPDATE t AS updated SET state = 1
FROM t AS original
WHERE
updated.id = original.id AND
updated.id = ANY(:array_of_IDs) -- I changed this; IN does not accept an array
RETURNING
updated.id,
original.state AS old_state,
updated.state AS new_state,
updated.group_name
),
remaining AS (
SELECT t.group_name, count(*) AS remaining_count
-- we need to JOIN then filter out the updated rows because
-- all WITH in a statement share the same snapshot, thus have
-- the same starting "view" of base tables.
FROM t LEFT JOIN updated_t
ON t.id = updated_t.id
WHERE updated_t.id IS NULL
AND t.group_name IN (SELECT DISTINCT group_name FROM updated_t)
AND t.state = 0
GROUP BY t.group_name -- qualified: updated_t also has a group_name column
)
SELECT
updated_t.id,
updated_t.group_name,
updated_t.old_state,
updated_t.new_state,
COALESCE(remaining.remaining_count, 0) AS remaining_count
FROM updated_t
LEFT JOIN remaining -- a group with nothing left has no row in remaining
ON updated_t.group_name = remaining.group_name;
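For the single-ID case in the question, the same idea can be sketched with the count moved into the main query. This sketch assumes a scratch copy of the question's sample data in a temporary table named t, and the literal 2 stands in for :some_id. Because the main query sees the same snapshot as the CTE, row 2 still looks unchanged there, so it is excluded by ID.

```sql
-- Scratch copy of the question's sample data.
CREATE TEMP TABLE t (id int PRIMARY KEY, group_name text, state int);
INSERT INTO t VALUES (1, 'group1', 0), (2, 'group1', 0), (3, 'group1', 0);

WITH updated_t AS (
    UPDATE t AS updated
    SET state = 1
    FROM t AS original
    WHERE updated.id = original.id
      AND updated.id = 2                 -- stands in for :some_id
    RETURNING updated.id, updated.group_name,
              original.state AS old_state, updated.state AS new_state
)
SELECT u.id, u.old_state, u.new_state,
       (SELECT count(*)
        FROM t                           -- still seen as before the UPDATE
        WHERE t.group_name = u.group_name
          AND t.state = 0
          AND t.id <> u.id) AS remaining_count
FROM updated_t AS u;
-- returns one row: id = 2, old_state = 0, new_state = 1, remaining_count = 2
```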

SQL Server Concurrency in update

I have a TABLE:
id status mod runat
1 0 null null
2 0 null null
3 0 null null
4 0 null null
And I call this query twice, at the same time:
UPDATE TABLE
SET
status = 1,
mod = GETDATE()
OUTPUT INSERTED.id
WHERE id = (
SELECT TOP (1) id
FROM TABLE
WHERE STATUS = 0
AND NOT EXISTS(SELECT * FROM TABLE WHERE STATUS = 1)
AND COALESCE(runat, GETDATE()) <= GETDATE()
ORDER BY ID ASC)
And sometimes I get:
1
1
instead of:
1
NULL
Why? Isn't the UPDATE statement transactional?
Short answer
Add WITH (UPDLOCK, HOLDLOCK) to the SELECT:
UPDATE TABLE
SET
status = 1,
mod = GETDATE()
OUTPUT INSERTED.id
WHERE id = (
SELECT TOP (1) id
FROM TABLE WITH (UPDLOCK, HOLDLOCK)
WHERE STATUS = 0
AND NOT EXISTS(SELECT * FROM TABLE WHERE STATUS = 1)
AND COALESCE(runat, GETDATE()) <= GETDATE()
ORDER BY ID ASC)
Explanation
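Because you are using a subquery to get the ID, there are basically two operations here: a read (the SELECT) and an update. When 1 is returned twice, it just means both sessions read the row before either update was completed. Under the default READ COMMITTED isolation level, the SELECT takes only short-lived shared locks, which don't stop another session from reading the same row. If you add UPDLOCK, the first session's SELECT takes update (U) locks and holds them until its transaction ends; HOLDLOCK additionally applies serializable range locking. Update locks are incompatible with each other, so the second SELECT has to wait for the first session to release its lock before it can execute.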
More information
Exactly what happens will depend on the locking scheme of your database and the locks issued by other statements. This kind of update can even lead to deadlocks under certain circumstances.
Because the statements run so fast, it's hard to see which locks they are holding. A good trick to effectively slow things down is to:
Open a session and run the first statement with a BEGIN TRAN statement at the start of it (don't include a COMMIT or ROLLBACK).
Run a query on sys.dm_tran_locks to see which locks are being held.
Open a second session, run the second statement, and see what happens. If your locking scheme is set up correctly, it should wait for the first one to finish before it does anything.
Switch back to the first session and COMMIT to simulate it finishing.
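For the sys.dm_tran_locks step, a query along these lines lists the locks held or requested in the current database. The choice of columns is just one reasonable option. While the first transaction is open, you should see its U or X locks. Once the second session is blocked, you should also see a row with request_status = WAIT.

```sql
-- Locks held or requested in the current database, ordered by session.
SELECT request_session_id,
       resource_type,         -- e.g. KEY, PAGE, OBJECT
       resource_description,  -- identifies the locked key or page
       request_mode,          -- e.g. S, U, X, RangeS-U
       request_status         -- GRANT, or WAIT while blocked
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id, resource_type;
```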
Locking and data contention are complex areas with lots of possible solutions. The following guide has a lot of information and should give you everything you need to decide how to approach this issue:
https://learn.microsoft.com/en-us/sql/relational-databases/sql-server-transaction-locking-and-row-versioning-guide?view=sql-server-2017

SQL Update only updates one row

This code only updates one row. Why? I think it has to do with one of the subqueries, maybe the WHERE ... IN in the UPDATE statement, but I am not sure.
UPDATE [sde].[sy1].[Valve_evw]
SET [sde].[sy1].[Valve_evw].[MA]
= (SELECT [sde].[sy1].[Valve_Join_evw].[MC]
FROM [sde].[sy1].[Valve_Join_evw])
WHERE [sde].[sy1].[Valve_evw].[PrimaryKey]
IN (SELECT [sde].[sy1].[Valve_Join_evw].[PrimaryKey]
FROM [sde].[sy1].[Valve_Join_evw]
WHERE [sde].[sy1].[Valve_Join_evw].[MA]
!= [sde].[sy1].[Valve_Join_evw].[MC])
Context:
What I am trying to do is update the MA column in Valve_evw using the MC column in Valve_Join_evw. The PrimaryKey in Valve_evw references the same rows as the PrimaryKey in Valve_Join_evw. That is, a single row in Valve_Join_evw will have the same PrimaryKey as a single row in Valve_evw, so that equivalence can be used to update the records in Valve_evw. The MA column is also equivalent in both tables. [Note: The Valve_Join_evw table is created with ESRI mapping software using the spatial relationship between Valve_evw and a separate table; this is how the duplicate rows come about.]
I am using database views (hence the '_evw') in SQL Server with a default INSTEAD OF UPDATE trigger. This combination, views and trigger, prevents the use of table joins to do this update. I have also tried MERGE but that will not work either. Therefore I am stuck with the ANSI standard, hence the sub-queries. This script runs with no errors but it only updates a single row whereas there are about 9000 thousand rows in the tables.
The output message:
(1 row(s) affected)
(0 row(s) affected)
First of all, let's reduce the eye-hurting SQL to what it really is:
update sde.sy1.valve_evw
set ma = (select mc from sde.sy1.valve_join_evw)
where primarykey in (select primarykey from sde.sy1.valve_join_evw where ma <> mc)
WHERE clause
We look for all primarykey values in valve_join_evw where a record's ma <> mc. We update all valve_evw records with such a primarykey.
SET clause
For a record we want to update, we set ma to the value found with:
select mc from sde.sy1.valve_join_evw
But this query has no WHERE clause, so what value does it select to fill the record's ma field? It selects all mc values from valve_join_evw, so the DBMS probably picks one of them arbitrarily. (It would be better if it raised an error.)
Conclusion
It is very easy to see which records the statement will update.
Which primarykey:
select primarykey from sde.sy1.valve_join_evw where ma <> mc
Which rows:
select *
from sde.sy1.valve_evw
where primarykey in (select primarykey from sde.sy1.valve_join_evw where ma <> mc)
As to the SET clause: add a WHERE clause to your subquery that relates the record to select to the record to update (same ma? same primarykey?), e.g.:
set ma =
(
select mc
from sde.sy1.valve_join_evw vj
where vj.primarykey = valve_evw.primarykey
and vj.ma = valve_evw.ma
)
I recommend running the SELECT statement first; when you are happy with the records it retrieves, use the same WHERE clause for the UPDATE statement.
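Following that advice, a preview for the final script below might look like this, assuming the same views and default schema as that script. It shows each row that would be updated next to the value it would receive. If any PrimaryKey matches more than one Valve_Join_evw row, the scalar subquery fails here just as it would in the UPDATE.

```sql
-- Same WHERE clause as the UPDATE; new_ma is what SET would assign.
SELECT v.[PrimaryKey],
       v.[MA] AS current_ma,
       (
           SELECT j.[MC]
           FROM [Valve_Join_evw] AS j
           WHERE j.[PrimaryKey] = v.[PrimaryKey]
       ) AS new_ma
FROM [Valve_evw] AS v
WHERE v.[PrimaryKey] IN
(
    SELECT [PrimaryKey]
    FROM [Valve_Join_evw]
    WHERE [MA] != [MC]
);
```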
Here is what the final script looks like.
UPDATE [Valve_evw]
SET [Valve_evw].[MA] =
(
SELECT [Valve_Join_evw].[MC]
FROM [Valve_Join_evw]
WHERE [Valve_Join_evw].[PrimaryKey] = [Valve_evw].[PrimaryKey]
)
WHERE [Valve_evw].[PrimaryKey]
IN (
SELECT [Valve_Join_evw].[PrimaryKey]
FROM [Valve_Join_evw]
WHERE [Valve_Join_evw].[MA]
!= [Valve_Join_evw].[MC]
);
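The question notes that Valve_Join_evw can contain duplicate rows, so it may be worth checking which PrimaryKey values appear more than once there before running this script. For those keys the correlated subquery returns more than one row. SQL Server rejects that with error 512 ("Subquery returned more than 1 value"), so you would need to decide which MC value should win. A minimal check, using the names from the script above:

```sql
-- PrimaryKey values with more than one row in the join view,
-- and how many different MC values they carry.
SELECT [PrimaryKey],
       COUNT(*)             AS join_rows,
       COUNT(DISTINCT [MC]) AS distinct_mc_values
FROM [Valve_Join_evw]
GROUP BY [PrimaryKey]
HAVING COUNT(*) > 1
ORDER BY join_rows DESC;
```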

SQL Server Empty Result

I have a valid SQL SELECT which returns an empty result up until a specific transaction has taken place in the environment.
Is there something available in SQL itself that will allow me to return a 0 instead of an empty dataset? Something similar to ISNULL('', 0). Obviously I tried that, and it didn't work.
PS. Sadly I don't have access to the database, or the environment, I have an agent installed that is executing these queries so I'm limited to solving this problem with just SQL.
FYI: Take any select and run it where the "condition" is not fulfilled (where LockCookie='777777777' for example.) If that condition is never met, the result is empty. But at some point the query will succeed based on a set of operations/tasks that happen. But I would like to return 0, up until that event has occurred.
You can store your result in a temp table and check @@ROWCOUNT:
select ID
into #T
from YourTable
where SomeColumn = @SomeValue
if @@rowcount = 0
select 0 as ID
else
select ID
from #T
drop table #T
If you want this as one query with no temp table you can wrap your query in an outer apply against a dummy table with only one row.
select isnull(T.ID, D.ID) as ID
from (values(0)) as D(ID)
outer apply
(
select ID
from YourTable
where SomeColumn = @SomeValue
) as T
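If a single value is enough, another option is to wrap the column in an aggregate. An aggregate query without GROUP BY always returns exactly one row, so ISNULL finally has a row to act on. This also explains why ISNULL alone did not help: with no rows, there is nothing to apply it to. The table variable and the LockCookie column below are hypothetical stand-ins for the real query. Once matches exist, this returns only the largest matching ID, not all of them.

```sql
-- Hypothetical stand-in for the real table; it is empty, like the
-- real result before the awaited transaction has happened.
DECLARE @YourTable TABLE (ID INT, LockCookie VARCHAR(20));
DECLARE @SomeValue VARCHAR(20) = '777777777';

-- MAX over zero rows yields one row containing NULL; ISNULL turns it into 0.
SELECT ISNULL(MAX(ID), 0) AS ID
FROM @YourTable
WHERE LockCookie = @SomeValue;
```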
An alternative is to check from application code, using the row count of the DataSet:
DsData.Tables[0].Rows.Count > 0
Make sure that your query matches your conditions.

Check number of records in a database table other than count(*)

I want to check if there are any records in a table for a certain entry. I used COUNT(*) to check the number of records and got it to work. However, when the number of records for an entry is very high, my page loads slowly.
I guess COUNT(*) is causing the problem, but how do I check if the records exist without using it? I only want to check whether any records exist for the entry and then execute some code. Please help me find an alternative solution for this.
Thanks for any help.
There are several ways that may work. You can use exists, which lets the database optimise the method to get the answer:
if exists(select * from ...)
You can use TOP 1 so that the database can stop after finding the first match:
if (select count(*) from (select top 1 * from ...) as x) > 0
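Here is a slightly fuller sketch of the EXISTS form shown above. A hypothetical orders table stands in for your own; it is a table variable, so the script runs as-is. With an index on the filtered column, the database can stop at the first matching row instead of counting them all.

```sql
-- Hypothetical stand-in for the real table.
DECLARE @orders TABLE (order_id INT, customer_id INT);
INSERT INTO @orders (order_id, customer_id)
VALUES (1, 10), (2, 10), (3, 20);

DECLARE @customer_id INT = 10;  -- the "entry" to check

-- EXISTS only needs to find one matching row; no count is produced.
IF EXISTS (SELECT 1 FROM @orders WHERE customer_id = @customer_id)
    PRINT 'At least one record exists for this entry';
ELSE
    PRINT 'No records for this entry';
```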
Use SELECT TOP 1 and check whether a row is returned.
You can try selecting the first entry for given condition.
SELECT id FROM table WHERE <condition> LIMIT 1
I'm not sure if this will be quicker but you can try.
Another possible solution: how do you use COUNT? COUNT(*)? If so, try using COUNT(id) instead. As I remember, this should be faster.
I would recommend testing whether at least one record that meets your criteria exists in the table, then continuing accordingly. For example:
IF EXISTS
(
SELECT TOP 1 Table_Name --Or Your ColumnName
FROM INFORMATION_SCHEMA.Tables -- Or your TableName
)
BEGIN
PRINT 'At least one record exists in table'
END
I found this on CodeProject. It's quite handy.
-- Author: Md. Marufuzzaman
SELECT SYS_OBJ.NAME AS "TABLE NAME"
, SYS_INDX.ROWCNT AS "ROW COUNT"
FROM SYSOBJECTS SYS_OBJ, SYSINDEXES SYS_INDX
WHERE SYS_INDX.ID = SYS_OBJ.ID
AND INDID IN(0,1) --Heap (0) or clustered index (1): one row count per table
AND XTYPE = 'U' --User tables only
--You may find other tables will need to be omitted, e.g. the diagrams table:
AND SYS_OBJ.NAME <> 'SYSDIAGRAMS'
ORDER BY SYS_INDX.rowcnt DESC --Largest tables first
--The following line adds up all the row count results and places
--the final result into a separate result set [below the first resulting table].
--COMPUTE was removed in SQL Server 2012; drop this line on newer versions.
COMPUTE SUM(SYS_INDX.ROWCNT)
GO
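On current versions of SQL Server, SYSOBJECTS and SYSINDEXES are kept only for backward compatibility and COMPUTE no longer exists. A roughly equivalent sketch reads the catalog views sys.tables and sys.partitions instead. Like the original, it reads stored metadata rather than scanning the tables, so the counts are cheap to get but may be approximate.

```sql
-- Row count per user table from catalog metadata (no table scans).
SELECT SCHEMA_NAME(t.schema_id) AS schema_name,
       t.name                   AS table_name,
       SUM(p.rows)              AS row_count   -- summed over partitions
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id
   AND p.index_id IN (0, 1)                    -- heap or clustered index only
WHERE t.name <> 'sysdiagrams'
GROUP BY t.schema_id, t.name
ORDER BY row_count DESC;
```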
You should use
select count(1) from
If you use (*), it will expand all the columns and then count.