Manage ValidFrom and ValidTo inside a transaction in SQL Server

I use a temporal table in SQL Server to store some historical data.
According to the documentation, every row affected within a transaction is stamped with the same time, namely the time the transaction began. I'm not happy with this behavior: I need to run 2-3 different jobs and each job must get a different timestamp or some logic will fail, but everything still has to roll back if any one job fails.
For example, the result in the database looks like this:

Column A    ValidFrom
--------    ----------
Result1     timestamp1
Result2     timestamp1
Result3     timestamp1
What I need is for the results of the 3 jobs to have different timestamps, even though the 3 jobs run in the same transaction:
Column A    ValidFrom
--------    ----------
Result1     timestamp1
Result2     timestamp2
Result3     timestamp3
Is there any way to roll back some actions on a temporal table without running them in a transaction? Or is there any way to change the system time so that the actions don't all get the same timestamp?
Currently I temporarily disable the transaction when running those jobs, but that is too risky if one of the jobs fails.
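For reference, a minimal sketch of the behavior being described, using illustrative table and column names (dbo.Jobs, ColumnA):

-- a system-versioned temporal table (illustrative definition)
CREATE TABLE dbo.Jobs
(
    Id        int IDENTITY PRIMARY KEY,
    ColumnA   nvarchar(50),
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON);

BEGIN TRANSACTION;
    INSERT INTO dbo.Jobs (ColumnA) VALUES ('Result1');  -- job 1
    INSERT INTO dbo.Jobs (ColumnA) VALUES ('Result2');  -- job 2
    INSERT INTO dbo.Jobs (ColumnA) VALUES ('Result3');  -- job 3
COMMIT;
-- all three rows end up with the same ValidFrom: the transaction start time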

Related

Synchronization of queries to a MariaDB database

Because of some high-availability considerations, I design a system where multiple processes will communicate/synchronize via the database (most likely MariaDB, but I am open to looking into PostgreSQL and MySQL options).
One of the requirements identified is that a process must take a piece of work from the database, without letting another process take the same piece of work concurrently.
Specifically, here is the race condition I have in mind:
Process A starts a SQL transaction and runs SELECT * FROM requests WHERE ReservedTS IS NULL ORDER BY CreatedTS LIMIT 100. Here ReservedTS and CreatedTS are DATETIME columns storing the time the piece of work was created by a work submitter process and reserved by a work executor process correspondingly.
Process B starts a transaction, runs the same query and gets the same set of results.
Process A runs UPDATE requests SET ReservedTS=NOW() WHERE id IN (<list of IDs selected above>) AND ReservedTS IS NULL.
Process B runs the same UPDATE; however, because its transaction has its own snapshot of the data, ReservedTS still appears NULL to Process B, so the items get reserved twice.
Process A commits the transaction.
Process B commits the transaction, overwriting the values of process A.
Could you please help to resolve the above data race?
You can easily do that by using exclusive locks:
For simplicity, a test table:
CREATE TABLE t1 (id int not null auto_increment primary key, reserved int);
INSERT INTO t1 (reserved) VALUES (0), (0);
Process A:
BEGIN;
SELECT id, reserved FROM t1 WHERE id=2 AND reserved=0 FOR UPDATE;
UPDATE t1 SET reserved=1 WHERE id=2 AND reserved=0;
COMMIT;
If Process B tries to update the same entry before Process A has finished its transaction, it has to wait until the lock is released (or a timeout occurs):
update t1 set reserved=1 where id=2 and reserved=0;
Query OK, 0 rows affected (12.04 sec)
Rows matched: 0 Changed: 0 Warnings: 0
And as you can see, Process B didn't update anything.
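If blocking between workers is also a concern, newer versions offer another option. Here is a minimal sketch assuming a server with SKIP LOCKED support (MariaDB 10.6+ or MySQL 8.0+), where a second worker steps over rows already locked by the first instead of waiting:

BEGIN;
-- each worker locks its own batch; rows locked by others are skipped, not waited on
SELECT id FROM requests
WHERE ReservedTS IS NULL
ORDER BY CreatedTS
LIMIT 100
FOR UPDATE SKIP LOCKED;

UPDATE requests SET ReservedTS = NOW()
WHERE id IN ( /* the ids returned above */ );

COMMIT;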

In Sybase, how would I lock a stored procedure that is executing and alter the table that the stored procedure returns?

I have a table as follows:
id status
-- ------
1 pass
1 fail
1 pass
1 na
1 na
Also, I have a stored procedure that returns a table with the top 100 records having status 'na'. The stored procedure can be called by multiple nodes in an environment, and I don't want them to fetch duplicate data. So I want to lock the stored procedure while it is executing, set the status of the records it returns to 'In Progress', return that table, and then release the lock so that different nodes don't fetch the same data. How would I accomplish this?
There is already a solution provided for a similar question in MS SQL, but it raises errors when used in Sybase.
Assuming Sybase ASE ...
The bigger issue you'll likely want to consider is whether you want a single process to lock the entire table while you're grabbing your top 100 rows, or if you want other processes to still access the table?
Another question is whether you'd like multiple processes to concurrently pull 100 rows from the table without blocking each other?
I'm going to assume that you a) don't want to lock the entire table and b) you may want to allow multiple processes to concurrently pull rows from the table.
1 - if possible, make sure the table is using datarows locking (default is usually allpages); this will reduce the granularity of locks to the row level (as opposed to page level for allpages); the table will need to be datarows if you want to allow multiple processes to concurrently find/update rows in the table
2 - make sure the lock escalation setting on the table is high enough to ensure a single process's 100 row update doesn't lock the table (sp_setpglockpromote for allpages, sp_setrowlockpromote for datarows); the key here is to make sure your update doesn't escalate to a table-level lock!
3 - when it comes time to grab your set of 100 rows you'll want to ... inside a transaction ... update the 100 rows with a status value that's unique to your session, select the associated id's, then update the status again to 'In Progress'
The gist of the operation looks like the following:
declare @mysession varchar(10)
select @mysession = convert(varchar(10), @@spid)  -- replace @@spid with anything that
                                                  -- uniquely identifies your session

set rowcount 100          -- limit the update to 100 rows

begin tran get_my_rows

-- start with an update so that we get exclusive access to the desired rows;
-- update the first 100 rows you find with your @@spid
update mytable
set    status = @mysession   -- need to distinguish your locked rows from
                             -- other processes; if we used 'In Progress'
                             -- we wouldn't be able to distinguish between
                             -- rows updated earlier in the day or updated
                             -- by other/concurrent processes
from   mytable readpast      -- 'readpast' allows your query to skip over
                             -- locks held by other processes but it only
                             -- works for datarows tables
where  status = 'na'

-- select your reserved id's and send back to the client/calling process
select id
from   mytable
where  status = @mysession

-- update your rows with a status of 'In Progress'
update mytable
set    status = 'In Progress'
where  status = @mysession

commit            -- close out txn and release our locks

set rowcount 0    -- set back to default of 'unlimited' rows
Potential issues:
if your table is large and you don't have an index on status then your queries could take longer than necessary to run; by making sure lock escalation is high enough and you're using datarows locking (so the readpast works) you should see minimal blocking of other processes regardless of how long it takes to find the desired rows
with an index on the status column, consider that all of these updates are going to force a lot of index updates which is probably going to lead to some expensive deferred updates
if using datarows and your lock escalation is too low then an update could lock the entire table, which would cause another (concurrent) process to readpast the table lock and find no rows to process
if using allpages you won't be able to use readpast so concurrent processes will block on your locks (ie, they won't be able to read around your lock)
if you've got an index on status, and several concurrent processes locking different rows in the table, there could be a chance for deadlocks to occur (likely in the index tree of the index on the status column) which in turn would require your client/application to be coded to expect and address deadlocks
To think about:
if the table is relatively small such that table scanning isn't a big cost, you could drop any index on the status column and this should reduce the performance overhead of deferred updates (related to updating the indexes)
if you can work with a session specific status value (eg, 'In Progress - @mysession') then you could eliminate the 2nd update statement (could come in handy if you're incurring deferred updates on an indexed status column)
if you have another column(s) in the table that you could use to uniquely identify your session's rows (eg, last_updated_by_spid = @@spid, last_updated_date = @mydate - where @mydate is initially set to getdate()) then your first update could set the status = 'In Progress', the select would use @@spid and @mydate for the where clause, and the second update would not be needed [NOTE: This is, effectively, the same thing Gordon is trying to address with his session column.]
assuming you can work with a session specific status value, consider using something that will allow you to track, and fix, orphaned rows (eg, a row's status remains 'In Progress - @mysession' because the calling process died and never came back to (re)set the status)
if you can pass the id list back to the calling program as a single string of concatenated id values you could use the method I outline in this answer to append the id's into a @variable during the first update, allowing you to set status = 'In Progress' in the first update and also allowing you to eliminate the select and the second update
how would you tell which rows have been orphaned? you may want the ability to update a (small)datetime column with the getdate() of when you issued your update; then, if you would normally expect the status to be updated within, say, 5 minutes, you could have a monitoring process that looks for orphaned rows where status = 'In Progress' and it's been more than, say, 10 minutes since the last update (a sketch of such a query follows this list)
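A sketch of such a monitoring query, assuming the reserving update also sets a last_updated datetime column to getdate() (the column name is illustrative):

-- find rows reserved more than 10 minutes ago that never progressed
select id, status, last_updated
from   mytable
where  status like 'In Progress%'
and    last_updated < dateadd(mi, -10, getdate())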
If the datarows, readpast, and lock escalation settings and/or the deadlock potential are too much, and you can live with brief table-level locks on the table, you could have the process obtain an exclusive table-level lock before performing the update and select statements; the exclusive lock would need to be obtained within a user-defined transaction in order to 'hold' the lock for the duration of your work; a quick example:
begin tran get_my_rows
-- request an exclusive table lock; wait until it's granted
lock table mytable in exclusive mode
update ...
select ...
update ...
commit
I'm not 100% sure how to do this in Sybase. But, the idea is the following.
First, add a new column to the table that represents the session or connection used to change the data. You will use this column to provide isolation.
Then, update the rows:
update top (100) t
set status = 'in progress',
session = @session
where status = 'na'
order by ?; -- however you define the "top" records
Then, you can return or process the 100 ids that are "in progress" for the given connection.
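A minimal sketch of that follow-up select, assuming the new column is called session and @session holds the connection identifier (both names are illustrative):

-- return the ids this connection just reserved
select id
from   t
where  status = 'in progress'
and    session = @session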
Create another table, proc_lock, that has one row
When control enters the stored procedure, start a transaction and do a select for update on the row in proc_lock (see this link). If that doesn't work for Sybase, then you could try the technique from this answer to lock the row.
Before the procedure exits, make sure to commit the transaction.
This will ensure that only one user can execute the proc at a time. When the second user tries to execute the proc, it will block until the first user's lock on the proc_lock row is released (e.g., when the transaction is committed).
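A rough sketch of that pattern in Sybase-style T-SQL, assuming a one-row table proc_lock with a single column lock_id (names are illustrative); taking an exclusive row lock via an update is one way to do it if SELECT ... FOR UPDATE is not available:

create table proc_lock (lock_id int primary key)
insert into proc_lock values (1)

-- inside the stored procedure:
begin tran serialize_proc
    -- exclusive row lock; concurrent callers block here until the commit below
    update proc_lock set lock_id = lock_id where lock_id = 1

    -- ... select the top 100 'na' rows, mark them 'In Progress', return them ...

commit  -- releases the lock so the next caller can proceed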

Select and insert at the same moment SQL 2016

I have a simple table in my application with columns
ID | VALUE | DATE | ITEMID
Items are constantly inserting data into this table through my web service; I only insert data that is 5 or more minutes old. At the same time, when I am using my web application, I am selecting some max, min, and current values for items. From time to time I get
Transaction (Process ID 62) was deadlocked on lock resources with
another process and has been chosen as the deadlock victim. Rerun the
transaction.
Is there any way to get the data so that, at the moment I am selecting, I don't have to care about the inserts?
You can use the WITH (NOLOCK) hint; this will read uncommitted data.
See the answer here
What is "with (nolock)" in SQL Server?
And here
http://sqlserverplanet.com/tsql/using-with-nolock
If using SQL Server 2016:
Problem:
When we write a SELECT statement, shared locks are held on the resources involved - say TableX, from which the max, min and current values are being selected. If at that moment you INSERT data into the locked table (TableX), then because the INSERT requires an exclusive lock on the resource (TableX), the INSERT statement will wait for the SELECT to finish.
Solution:
As suggested in posts below, we can use WITH (NOLOCK) or WITH (READ UNCOMMITTED) table hints so that SELECT statements don't hold a lock on the underlying resources (TableX). But the problem is, this SELECT might read uncommitted data that was changed during a transaction and then later was rolled back to its initial state (also called reading dirty data).
If you want to make sure that the SELECT only reads committed data and also doesn't block the INSERTs writing to TableX, then turn on READ_COMMITTED_SNAPSHOT at the database level.
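A minimal sketch of enabling it, assuming the database is called MyDb (the ALTER needs exclusive access to the database, so WITH ROLLBACK IMMEDIATE is one common way to apply it):

-- enable row versioning so readers no longer block writers
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- after this, plain READ COMMITTED SELECTs read the last committed row versions
SELECT ITEMID, MAX(VALUE) AS MaxValue, MIN(VALUE) AS MinValue
FROM table_name
GROUP BY ITEMID;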
You can change the SELECT query to avoid deadlocks as:
SELECT ITEMID, MAX(VALUE), MIN(VALUE)
FROM table_name WITH (NOLOCK)
GROUP BY ITEMID, DATE

Procedure containing transaction leading to deadlock when executed in parallel in SQL Server

Current scenario: we have a T-SQL procedure which fetches unprocessed data from Table1 and dumps into Table2 with certain manipulations on each record. The entire process is being executed in a transaction and a flag is maintained to signify if a record has been processed.
A SQL Server job has been created to execute this procedure at an interval of 2 min. (another job has been created to load Table1 with fresh data at an interval of 1 min)
If the first execution of the job takes more than 2 minutes, the same job gets triggered again, and this is leading to a deadlock.
The transaction within the T-SQL procedure is per record, so shouldn't the table be released after each record is processed, such that even if the same job is triggered again it would be able to read the unprocessed rows?
Database isolation level maintained is "Read committed snapshot"
We were under the assumption that SQL Server would allow the process to run in parallel. What are we doing wrong here?
Note: we are a bunch of naive developers experimenting with SQL Server. Let us know if we are being dumb!
Here is the gist of what the procedure does
pr_process_data
begin
    select column1, column2, column3 ... from table1
    fetch into cursor
    open cursor
    while fetch status = 0
    begin loop
        begin transaction
            "queries to validate datatype, length of all the columns"
            if validation succeeds
                update table1 set valid flag = 'y'
            else
                update table1 set valid flag = 'n'
        commit
    end loop
    insert into table2 select * from table1 where valid = 'y'
end
The number of columns may vary anywhere between 20 and 40, and the number of validations can be anywhere between 3 and 5.
Hope the example is clear enough!
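For what it's worth, one commonly used T-SQL pattern for this kind of overlap is to claim rows with UPDLOCK and READPAST hints so that a second execution simply skips the rows the first one is still holding. A minimal sketch, assuming Table1 has a processed/claimed flag as described above (the flag names are illustrative):

BEGIN TRANSACTION;

-- claim unprocessed rows; READPAST skips rows already locked by another run,
-- UPDLOCK keeps them locked until COMMIT so two jobs can't claim the same rows
UPDATE t
SET    claimed = 1                         -- illustrative "claimed" flag
FROM   Table1 AS t WITH (UPDLOCK, READPAST)
WHERE  t.processed = 0;

-- ... validate the claimed rows and insert the valid ones into Table2 ...

COMMIT;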

How can I determine the actual database row insertion order?

I have a multithreaded process which inserts several records into a single table. The inserts are performed in a stored procedure, with the sequence being generated INTO a variable, and that variable is later used inside of an INSERT.
Given that I'm not doing mysequence.nextval inside the INSERT itself, it makes me think that it is possible for two concurrent processes to grab a sequence in one order, then do the inserts in the reverse order. If this is the case, then the sequence numbers will not reflect the true order of insertion.
I also record sysdate in a DATE column for each of my inserts, but I've noticed that the dates for two records often match and I need to sort by the sequence number to break the tie. Given the previous issue, though, this doesn't seem to guarantee the actual insert order.
How can I determine the absolute order of insertion into the database?
The DATE datatype only goes down to seconds, whereas TIMESTAMP supports fractional seconds (up to nanosecond precision). Would that address the problem?
According to Oracle's docs:
TIMESTAMP: Year, month, and day values of date, as well as hour, minute, and second values of time, where fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values of fractional_seconds_precision are 0 to 9. The default is 6. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size varies from 7 to 11 bytes, depending on the precision. This datatype contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It contains fractional seconds but does not have a time zone.
Whereas date does not:
DATE: Valid date range from January 1, 4712 BC to December 31, 9999 AD. The default format is determined explicitly by the NLS_DATE_FORMAT parameter or implicitly by the NLS_TERRITORY parameter. The size is fixed at 7 bytes. This datatype contains the datetime fields YEAR, MONTH, DAY, HOUR, MINUTE, and SECOND. It does not have fractional seconds or a time zone.
Of course, having said that, I am not sure why it matters when the records were written, but that is a way that might solve your problem.
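As an illustration, a fractional-second column could be added like this (table and column names are illustrative):

-- store the insert time with microsecond precision instead of whole seconds
ALTER TABLE mytable ADD (inserted_at TIMESTAMP(6) DEFAULT SYSTIMESTAMP);

-- then order retrieval by it
SELECT * FROM mytable ORDER BY inserted_at;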
Sequence should be thread safe:
create table ORDERTEST (
ORDERID number not null ,
COLA varchar2(10) ,
INSERTDATE date default sysdate,
constraint ORDERTEST_pk primary key (orderid)
) ;
create sequence ORDERTEST_seq start with 1 nocycle nocache ;
insert into ORDERTEST (ORDERID, COLA, INSERTDATE)
select ORDERTEST_SEQ.NEXTVAL , substr(OBJECT_NAME,1,10), sysdate
from USER_OBJECTS
where rownum <= 5; --just to limit results
select *
from ORDERTEST
order by ORDERID desc ;
ORDERID COLA INSERTDATE
---------------------- ---------- -------------------------
5 C_COBJ# 16-JUL-10 12.15.36
4 UNDO$ 16-JUL-10 12.15.36
3 CON$ 16-JUL-10 12.15.36
2 I_USER1 16-JUL-10 12.15.36
1 ICOL$ 16-JUL-10 12.15.36
now in a different session:
insert into ORDERTEST (ORDERID, COLA, INSERTDATE)
select ORDERTEST_SEQ.NEXTVAL , substr(OBJECT_NAME,1,10), sysdate
from USER_OBJECTS
where rownum <= 5; --just to limit results
select *
from ORDERTEST
order by ORDERID desc ;
5 rows inserted
ORDERID COLA INSERTDATE
---------------------- ---------- -------------------------
10 C_COBJ# 16-JUL-10 12.17.23
9 UNDO$ 16-JUL-10 12.17.23
8 CON$ 16-JUL-10 12.17.23
7 I_USER1 16-JUL-10 12.17.23
6 ICOL$ 16-JUL-10 12.17.23
The Oracle sequence is thread safe:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/views.htm#ADMIN020
"If two users are accessing the same sequence concurrently, then the sequence numbers each user receives might have gaps because sequence numbers are also being generated by the other user." the numbers may not be 1,2,3,4,5 (as in my example --> if you fear this you can up the cache)
this can also help, although they do not cite their source:
http://forums.oracle.com/forums/thread.jspa?threadID=910428
"the sequence is incremented immediately and permanently, whether you commit or roll back the transaction. Concurrent access of NextVal on a sequence will always return separate values to each caller."
If your concern is that the inserts will be out of order and you need the sequence value, use the RETURNING clause:
declare
x number ;
begin
insert into ORDERTEST (ORDERID, COLA, INSERTDATE)
values( ORDERTEST_SEQ.NEXTVAL , 'abcd', sysdate)
returning orderid into x;
dbms_output.put_line(x);
end;
--11
then you know it got inserted right then and there.
There are several effects going on. Today's computers can execute so many operations per second that the timers can't keep up. Also, getting the current time is a somewhat expensive operation, so you get gaps that can last several milliseconds until the value changes. That's why you get the same sysdate for different rows.
Now to solve your insert problem. Calling nextval on a sequence is guaranteed to remove this value from the sequence. If two threads call nextval several times, you can get interleaved numbers (i.e. thread 1 will see 1 3 4 7 and thread 2 will see 2 5 6 8) but you can be sure that each thread will get different numbers.
So even if you don't use the result of nextval immediately, you should be safe. As for the "absolute" insert order in the database, this might be hard to tell. For example, a DB could keep the rows in a cache before writing them to disk. The rows could be reordered to optimize disk access. But as long as you assign the results from nextval to your rows in the order in which you insert them, this shouldn't matter and they should always appear to be inserted in order.
While there may be some concept of insertion order into a database, there is certainly no concept of retrieval order. Any rows that come back from the database will come back in whatever order the DB sees fit to return them in, and this may or may not have ANYTHING to do with the order they were inserted into the database. Also, the order that rows are inserted into the DB may have little to nothing to do with how they are physically stored on disk.
Relying upon any order from a DB query without the use of an ORDER BY clause is folly. If you wish to be certain of any order, you need to maintain that relationship at a formal level (sequences, timestamps, whatever) in your logic when creating the records for insertion.
If the transactions are separate, you can determine this from the ora_rowscn pseudo-column for the table.
[Edit]
Some more detail, and I'll delete my answer if this is not of use - unless you created the table with the non-default "rowdependencies" clause, you'll have other rows from the block tagged with the scn, so this may be misleading. If you really want this information without an application change you'll have to rebuild the table with this clause.
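A minimal sketch of reading it back (assuming the table was created with ROWDEPENDENCIES for per-row accuracy, and that the rows are committed):

-- approximate commit order, and commit time, per row from the commit SCN
SELECT t.*, ora_rowscn, scn_to_timestamp(ora_rowscn) AS approx_commit_time
FROM   mytable t
ORDER  BY ora_rowscn;

Note that scn_to_timestamp can only map SCNs that are still within the database's retention window, so it is most useful for recent rows.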
Given your description of the issue you're trying to resolve, I would think that the sequences would be fine. If you have two processes that call the same stored procedure and the second (chronological) one finishes first for some reason, is that actually relevant? I would think the order in which the procedures were called (which will be reflected by the sequence (unless you're using RAC)) would be more meaningful than the order in which they were written to the database.
If you're really worried about the sequence the rows were inserted in, then you need to look at when the commits were issued, not when the inserts statements were issued. Otherwise you have the following scenario as a possibility:
Transaction 1 is started
Transaction 2 is started
Transaction 3 is started
Transaction 2 inserts
Transaction 1 inserts
Transaction 3 inserts
Transaction 3 commits
Transaction 1 commits
Transaction 2 commits
In this case Transaction 1 was started first, Transaction 2 inserted first, and Transaction 3 committed first. The sequence number gives you a good idea of the order in which the transactions were started. A timestamp field will give you an idea of when the inserts were issued. The only reliable way to get an order for the commits is to serialize writes to the table, which is generally a bad idea (it removes scalability).
You should (a) add the timestamp to each record, and (b) move the sequence NEXTVAL to the INSERT statement.
That way, when you query the table, you can ORDER BY timestamp, id, which will effectively be the order in which the rows were actually inserted.
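A minimal sketch of that combination, assuming a TIMESTAMP column (here inserted_at) has been added per (a), and reusing the mysequence name from the question (other names are illustrative):

-- (b) generate the id inside the INSERT itself, and capture a fractional-second time
INSERT INTO mytable (id, some_column, inserted_at)
VALUES (mysequence.NEXTVAL, 'some value', SYSTIMESTAMP);

-- read back in effective insertion order
SELECT * FROM mytable ORDER BY inserted_at, id;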