Suppose I am about to start a project using ASP.NET and SQL Server 2005. I have to design the concurrency requirements for this application. I am planning to add a TimeStamp column to each table. While updating a table I will check that the TimeStamp column is the same as it was when the row was selected.
Will this approach suffice? Or are there any shortcomings to this approach under any circumstances?
Please advise.
Thanks
Lijo
First of all, the approach you describe in your question is, in my opinion, the best one for an ASP.NET application with MS SQL Server as the database. There is no locking in the database, which makes it perfect for permanently disconnected clients like web clients.
As one can read from some of the answers, there is a misunderstanding in the terminology. We all mean using Microsoft SQL Server 2008 or higher to hold the database. If you open the topic "rowversion (Transact-SQL)" in the MS SQL Server 2008 documentation, you will find the following:
"timestamp is the synonym for the
rowversion data type and is subject to
the behavior of data type synonym." …
"The timestamp syntax is deprecated.
This feature will be removed in a
future version of Microsoft SQL
Server. Avoid using this feature in
new development work, and plan to
modify applications that currently use
this feature."
So the timestamp data type is a synonym for the rowversion data type in MS SQL. It holds a 64-bit counter that exists internally in every database and can be read as @@DBTS. After every modification of a row in any table of the database, the counter is incremented.
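For example, you can read the current counter of a database directly:
SELECT @@DBTS AS CurrentRowVersion; -- the last-used rowversion value in the current database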
As I read your question, I understand "TimeStamp" to be the name of a column of the rowversion data type. I personally prefer the name RowUpdateTimeStamp. In AzManDB (see Microsoft Authorization Manager with the store as DB) one can see such a name. Sometimes a ChildUpdateTimeStamp was also used to trace hierarchical RowUpdateTimeStamp structures (maintained with triggers).
I implemented this approach in my last project and am very happy with it. Generally you do the following:
Add a RowUpdateTimeStamp column of type rowversion to every table of your database (it will be shown in Microsoft SQL Server Management Studio as timestamp, which is the same).
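A minimal sketch of such a table, assuming hypothetical columns for the dbo.Software table used in the examples below:
CREATE TABLE dbo.Software
(
    Id int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name varchar(100) NOT NULL,
    Manufacturer_Id int NOT NULL,
    RowUpdateTimeStamp rowversion NOT NULL -- maintained by the engine on every insert/update of the row
);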
You should construct all your SQL SELECT queries that send results to the client so that you send an additional RowVersion value together with the main data. If you have a SELECT with JOINs, you should send as the RowVersion the maximum of the RowUpdateTimeStamp values from both tables, like
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN s.RowUpdateTimeStamp
ELSE m.RowUpdateTimeStamp
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
Or cast the data like the following
SELECT s.Id AS Id
,s.Name AS SoftwareName
,m.Name AS ManufacturerName
,CASE WHEN s.RowUpdateTimeStamp > m.RowUpdateTimeStamp
THEN CAST(s.RowUpdateTimeStamp AS bigint)
ELSE CAST(m.RowUpdateTimeStamp AS bigint)
END AS RowUpdateTimeStamp
FROM dbo.Software AS s
INNER JOIN dbo.Manufacturer AS m ON s.Manufacturer_Id=m.Id
to hold RowUpdateTimeStamp as bigint, which corresponds to the ulong data type of C#. If you use OUTER JOINs or JOINs over many tables, the MAX(RowUpdateTimeStamp) construct over all the tables looks a little more complex. Because MS SQL doesn't support a function like MAX(a,b,c,d,e), the corresponding construct could look like the following:
(SELECT MAX(rv)
FROM (SELECT table1.RowUpdateTimeStamp AS rv
UNION ALL SELECT table2.RowUpdateTimeStamp
UNION ALL SELECT table3.RowUpdateTimeStamp
UNION ALL SELECT table4.RowUpdateTimeStamp
UNION ALL SELECT table5.RowUpdateTimeStamp) AS maxrv) AS RowUpdateTimeStamp
All disconnected clients (web clients) receive and hold not only the rows of data, but also the RowVersion (type ulong) of each data row.
When trying to modify data from the disconnected client, the client should send back to the server the RowVersion corresponding to the original data. The spSoftwareUpdate stored procedure could look like
CREATE PROCEDURE dbo.spSoftwareUpdate
    @Id int,
    @SoftwareName varchar(100),
    @originalRowUpdateTimeStamp bigint, -- used for the optimistic concurrency mechanism
    @NewRowUpdateTimeStamp bigint OUTPUT
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    -- ExecuteNonQuery() returns -1, but it is not an error;
    -- one should test @NewRowUpdateTimeStamp for DBNull instead.
    SET NOCOUNT ON;

    UPDATE dbo.Software
    SET Name = @SoftwareName
    WHERE Id = @Id AND RowUpdateTimeStamp <= @originalRowUpdateTimeStamp;

    -- @@ROWCOUNT still refers to the UPDATE above: if no row was modified,
    -- the subquery returns nothing and @NewRowUpdateTimeStamp stays NULL.
    SET @NewRowUpdateTimeStamp = (SELECT RowUpdateTimeStamp
                                  FROM dbo.Software
                                  WHERE (@@ROWCOUNT > 0) AND (Id = @Id));
END
The code of the dbo.spSoftwareDelete stored procedure looks much the same. If you don't switch on NOCOUNT, you can produce a DBConcurrencyException automatically generated in a lot of scenarios. Visual Studio gives you the possibility to use optimistic concurrency via the "Use optimistic concurrency" checkbox in the Advanced Options of the TableAdapter or DataAdapter.
If you look at the dbo.spSoftwareUpdate stored procedure carefully, you will find that I use RowUpdateTimeStamp <= @originalRowUpdateTimeStamp in the WHERE clause instead of RowUpdateTimeStamp = @originalRowUpdateTimeStamp. I do so because the value of @originalRowUpdateTimeStamp which the client has is typically constructed as MAX(RowUpdateTimeStamp) over more than one table, so it can be that RowUpdateTimeStamp < @originalRowUpdateTimeStamp. Either you use strict equality = and reproduce here the same complex JOIN statement as in your SELECT, or you use the <= construct like me and stay exactly as safe as before.
By the way, one can construct a very good ETag value based on RowUpdateTimeStamp, which can be sent in an HTTP header to the client together with the data. With the ETag you can implement intelligent data caching on the client side.
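A possible sketch of deriving such an ETag on the server side (CONVERT with style 1 keeps the 0x prefix; the id value is just an example):
DECLARE @Id int = 1; -- hypothetical row id

SELECT CONVERT(varchar(20), RowUpdateTimeStamp, 1) AS ETag -- e.g. '0x00000000000007D1'
FROM dbo.Software
WHERE Id = @Id;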
I can't write the whole code here, but you can find a lot of examples on the Internet. I want only to repeat one more time that, in my opinion, optimistic concurrency based on rowversion is the best way for most ASP.NET scenarios.
In SQL Server, a recommended approach for this type of situation is to create a column of type rowversion and use it to check whether any of the fields in that row have changed.
SQL Server guarantees that if any value in the row changes (or a new row is inserted), its rowversion column will automatically be updated to a different value. Letting the database handle this for you is much more reliable than trying to do it yourself.
In your update statements you simply need to add a WHERE clause to check that the rowversion value is the same as it was when you first retrieved the row. If not, someone else has changed the row (i.e., it's dirty).
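A minimal sketch of such a guarded update, assuming a hypothetical dbo.Account table with a RowVersion column:
DECLARE @Id int = 1,
        @NewName varchar(100) = 'New name',
        @originalRowVersion binary(8) = 0x00000000000007D1; -- the value read together with the row

UPDATE dbo.Account
SET Name = @NewName
WHERE Id = @Id
  AND RowVersion = @originalRowVersion;

-- if another writer changed the row in the meantime, the WHERE clause no longer matches
IF @@ROWCOUNT = 0
    RAISERROR('Concurrency conflict: the row was changed by someone else.', 16, 1);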
Also, from that page:
The timestamp syntax is deprecated. This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
I'm not sure that concurrency should be handled in the database like this. The database itself should be able to manage isolation and transactional behavior, but the threading behavior ought to be done in code.
The rowversion suggestion is correct, I would say, but it's disappointing to see that timestamp will be deprecated soon. Some of my OLD applications use it for reasons other than checking concurrency.
I am using SSDT 2017 and I am working on a solution that basically gets a full result set from a query into a variable (1 column only: AccountID), and I need to include the values in that object variable in a query, something like this:
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
I tried with an expression but I get an error, so I am not sure if there's a better way. I also tried For Each Loop Container logic, but since I have millions of records in the object variable I don't think that's the best way.
Any idea?
It doesn't work that way. Where "it" is going to be a host of things.
The SSIS data types are primitive types (boolean, date, numbers) or Object. The only supported operations for Object is a null check and enumeration.
SSIS parameterization is only for equality based substitutions. There is no concept of a list data type in SQL so there's no analog in SSIS.
I have millions of records in the object variable
Even if you converted your list to a string and used string concatenation, the next problem you're going to run into is the string length limit of 4000 characters.
What is the way?
Let's reset the problem: You have a non-trivial set of identities from a source system. That set of ids needs to be used as the basis for a subsequent extract.
Is the source of identities and the actual data on the same server?
While you can empty the ocean with a teaspoon, it's not the correct tool. Same holds true here. Move the query that identifies the recordset to be extracted into a filter condition for your source.
i.e.
Load dataset into @AccountIDObjectVariable
SELECT
OA.AccountId
FROM
dbo.OutstandingAccount AS OA;
Extract that isn't working
"SELECT * FROM dbo.account WHERE AccountID IN (" + #AccountIDObjectVariable + ")"
is rewritten as
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM dbo.OutstandingAccount AS OA WHERE OA.AccountID = A.AccountID);
There are two reasonable approaches for solving this
Pull it all
If the source ids list and the source table are of similar orders of magnitude, it might be easier to just bring it all down and use the account id generating query in a Lookup Task. If AccountID exists, then it's the data you want. Yes you pulled more than you wanted but you likely would have burned more cycles and complexity trying to selectively pull what you wanted.
Push and pull
This approach is going to work for SQL Server and I have no idea about any other database. Well, I suppose Sybase would be the same given database paternity.
Open SSMS and create a global temporary table on the database where dbo.account lives. Do not disconnect from SSMS.
IF OBJECT_ID('tempdb..##SO_66961235') IS NOT NULL
BEGIN
DROP TABLE ##SO_66961235;
END
GO
CREATE TABLE ##SO_66961235
(
AccountID int NOT NULL
);
Modify the Connection manager to set the RetainSameConnection Property to true for the database connection to dbo.account
Execute SQL Task - Make Temp Table
Use the connection to the account database and the above query. This will ensure the table exists for future sessions of SSIS to work.
DataFlow Load IDs
In the dataflow properties, set DelayValidation to True
Use your source query to generate the list of IDs and select the temporary table as the destination. You might need to have a second connection manager to this system running and pointed at tempdb, it's been a long time since I've done this. Same rule about RetainSameConnection will hold true though.
When this data flow completes, then we will have a temporary table on the data source server that we can reference.
Dataflow 2 Get Data
Again, DelayValidation to true.
Source will be a query
SELECT * FROM dbo.account AS A WHERE EXISTS (SELECT * FROM ##SO_66961235 AS OA WHERE OA.AccountID = A.AccountID);
What's with all the delay validation?
When the SSIS package starts, the first thing it does is ensure all the pieces are in place for it to run successfully: not only that the pieces are in place, but that the shape of the data is still the same. A temporary table won't exist when the package starts, so the package will fail with a VS_NEEDSNEWMETADATA error. Setting DelayValidation tells SSIS not to check metadata until the component actually gets the signal to start. Since we defined the precursor Execute SQL Task to create the table, the validation should succeed.
I used global temporary tables here. You can use local scoped temporary tables but it makes the already fiddly design process much more so. Were it me, I'd have a package parameter controlling a boolean that uses a global temp table for development sessions and local temp table for actual run-time operations but that's beyond the scope of this question.
create procedure tempsproc
as
select t1.c1
from #t
join t2 on #t.c2 = t3.c3
The select clause references a table which is not mentioned in the from clause.
I have heard of deferred name resolution but I do not see how the above select could ever work no matter what tables are existing at runtime.
The on clause also references tables which are not mentioned in the from clause.
The above SQL compiles without error.
The problem only comes to light at run time - not what you want
What do I need to do in order for the above procedure to be rejected by SQL Server at compile time?
PS: this is on a SQL Server 2008 R2 sp3 system
This could work (from the SQL compiler's perspective) if a column named t1 were added to either of the tables and a type method named c1 were also added to the database.
Since the compiler does not know what names and methods might be added in the future, the rules of Deferred-Name Resolution say that it has to accept it.
In short, it's not a syntax-error so it will NOT be rejected.
Problems like this really should be picked up during debugging, but if you want to catch them when you are parsing and saving your stored procedures, you can get it mostly done with something like this:
create procedure tempsproc
as
select t1.c1
from #t
join t2 on #t.c2 = t3.c3
go
BEGIN TRANSACTION
EXEC tempsproc;
ROLLBACK TRANSACTION
go
TYPE METHODS
I have been asked to explain about the Type Methods, so here it is.
Most of the newer data types, such as XML and Spatial, include special methods that can only be used on these data types. Here's an example from Microsoft:
CREATE TABLE SpatialTable
( id int IDENTITY (1,1),
GeomCol1 geometry,
GeomCol2 AS GeomCol1.STAsText() );
GO
INSERT INTO SpatialTable (GeomCol1)
VALUES (geometry::STGeomFromText('LINESTRING (100 100, 20 180, 180 180)', 0));
This shows two very different method formats (GeomCol1.STAsText() and geometry::STGeomFromText('...')), which highlights another point.
This syntax was designed to comply with certain standards (OGC in this case). The XML methods have to comply with a different standard. Other data types added in the future may have to comply with still other standards, which means that they have to be pretty flexible about what the allowable syntax for a method might be, including whether or not it has any parentheses.
Finally, you may wonder, "But don't they know ahead of time what data-type methods exist?" Surprisingly, the answer is "NO", because SQL Server allows new data-types to be added to existing servers and databases. So, for instance, XML data-types were originally an optional add-on(extension) to SQL Server (they were then built-in to the next major release). Further I believe (not sure) that customers and third-parties can also make their own and add them to an existing database.
I've seen almost every post concerning this question but haven't found the best answer. Some of them recommend using Identity, while others recommend triggers to implement an incrementing integer column.
I'd also like to use triggers, as there will be many deletes happening in my table in this case. In addition, I mainly come from the Interbase DBMS, where I used to create a before-insert trigger on the table, and this issue has been bugging me since I migrated from Interbase to MS SQL Server.
This is how I did in Interbase
CREATE trigger currency_bi for currency
active before insert position 0
AS
declare variable m integer;
begin
select max(id)+1 from currency into :m;
if (:m is NULL ) then m=1;
new.id=:m;
end
So, as I will use this frequently, what is the best way to create a trigger that increments an integer column using max(id)+1?
Don't use triggers to do this, it will either kill the performance or cause all sorts of concurrency problems, depending on your use of transactions and locking.
It's better to use one of the mechanisms available in the engine: the identity property or a sequence object.
If you're running a newer version of SQL Server with the sequence feature available, use a sequence. It will allow you to reserve a range of ids from the client application and assign them to new rows on the client before sending them to the server for insert.
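A minimal sketch, assuming SQL Server 2012 or later and placeholder object names:
CREATE SEQUENCE dbo.CurrencyIdSeq AS int START WITH 1 INCREMENT BY 1;

-- assign an id explicitly on insert
INSERT INTO dbo.Currency (Id, Name)
VALUES (NEXT VALUE FOR dbo.CurrencyIdSeq, 'EUR');

-- or reserve a whole range of 100 ids for the client application
DECLARE @first sql_variant;
EXEC sys.sp_sequence_get_range
    @sequence_name = N'dbo.CurrencyIdSeq',
    @range_size = 100,
    @range_first_value = @first OUTPUT;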
Always use the Identity option, because, as you said, you frequently delete records; in that case a trigger will sometimes give wrong information (depending on the isolation level).
Suppose one transaction deletes the record with the highest id just before, or at the same time as, your trigger fires. The trigger then reads the deleted highest record, which a moment later no longer exists.
So when you run a select query, it shows a gap, which is wrong.
SQL Server provides a built-in mechanism for this type of situation with the auto-identity option.
http://mrbool.com/understanding-auto-increment-in-sql-server/29171
You don't need to bother about this. Another drawback of triggers is that when a multi-row insert happens, the trigger fires only once, after the whole insert statement.
Try to never use a trigger for this, as it is harmful and hard to control. If you still want to compute the value yourself, do it in your insert statement rather than in a trigger.
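For comparison, a minimal sketch of the identity approach (the table is a placeholder):
CREATE TABLE dbo.Currency
(
    Id int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- assigned atomically by the engine
    Name varchar(50) NOT NULL
);

INSERT INTO dbo.Currency (Name) VALUES ('EUR');
SELECT SCOPE_IDENTITY() AS NewId; -- the id generated in this scope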
Is it possible to find date+time when a row was inserted into a table in SQL Server 2005?
Does SQL Server log insert commands?
Whenever I create a table, I always include the following two columns:
CreatedBy varchar(255) default system_user,
CreatedAt datetime default getdate()
Although this uses a bit of extra space, I've found that the information proves very, very useful over time.
Your question is about the log. The answer is "yes". However, whether you can get the information depends on your recovery mode. If simple, then the records are overwritten for the next transaction. If bulk or full, then the information is in the log, at least since the last incremental backup.
You can derive the insert date as long as you are using the CDC-created functions to pull the actual data records.
So, for example, you can pull something like:
DECLARE @from_lsn binary(10), @to_lsn binary(10);
SET @from_lsn = sys.fn_cdc_get_min_lsn('name_of_your_cdc_instance_on_cdc_table');
SET @to_lsn = sys.fn_cdc_get_max_lsn();
SELECT * FROM
cdc.fn_cdc_get_net_changes_name_of_your_cdc_instance_on_cdc_table
(
    @from_lsn,
    @to_lsn,
    N'all'
);
You can use the built-in CDC function sys.fn_cdc_map_lsn_to_time to convert a Log Sequence Number to a datetime. Below is a use case:
SELECT sys.fn_cdc_map_lsn_to_time(__$start_lsn), * FROM
cdc.fn_cdc_get_net_changes_name_of_your_cdc_instance_on_cdc_table
(
    @from_lsn,
    @to_lsn,
    N'all'
);
You can have an InsertDate column with a default of getdate() on your table; that would be the easiest approach.
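A minimal sketch, assuming a hypothetical dbo.Orders table:
ALTER TABLE dbo.Orders
ADD InsertDate datetime NOT NULL
    CONSTRAINT DF_Orders_InsertDate DEFAULT (getdate());
-- rows inserted without naming InsertDate get the current time automatically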
On SQL Server 2008 you can use CDC (Change Data Capture) to track changed data on your table:
Change data capture records insert, update, and delete activity that is applied to a SQL Server table. This makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables. Table-valued functions are provided to allow systematic access to the change data by consumers.
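Enabling CDC takes two steps; here is a hedged sketch with a placeholder table name:
-- enable CDC for the database, then for one table
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL; -- NULL: no gating role is required to query the change data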
Suppose I have a database table that has a datetime column recording the last time the row was updated or inserted. Which would be preferable:
Have a trigger update the field.
Have the program that's doing the insertion/update set the field.
The first option seems to be the easiest since I don't even have to recompile to do it, but that's not really a huge deal. Other than that, I'm having trouble thinking of any reasons to do one over the other. Any suggestions?
The first option can be more robust because the database will be maintaining the field. This comes with the possible overhead of using triggers.
If you could have other apps writing to this table in the future, via their own interfaces, I'd go with a trigger so you're not repeating that logic anywhere else.
If your app is pretty much it, or any other apps would access the database through the same datalayer, then I'd avoid that nightmare that triggers can induce and put the logic directly in your datalayer (SQL, ORM, stored procs, etc.).
Of course you'd have to make sure your time-source (your app, your users' pcs, your SQL server) is accurate in either case.
Regarding why I don't like triggers:
Perhaps I was rash by calling them a nightmare. Like everything else, they are appropriate in moderation. If you use them for very simple things like this, I could get on board.
It's when the trigger code gets complex (and expensive) that triggers start to cause lots of problems. They are a hidden tax on every insert/update/delete query you execute (depending on the type of trigger). If that tax is acceptable then they can be the right tool for the job.
You didn't mention option 3: use a stored procedure to update the table. The procedure can set timestamps as desired.
Perhaps that's not feasible for you, but I didn't see it mentioned.
As long as I'm using a DBMS in whose triggers I trust, I'd always go with the trigger option. It allows the DBMS to take care of as many things as possible, which is usually a good thing.
It would make sure, under any circumstances, that the timestamp column has the correct value. The overhead would be negligible.
The only thing that would be against triggers is portability. If that's not an issue, I don't think there is a question which direction to go.
I would say trigger, just in case someone uses something besides your app to update the table. You probably also want to have a LastUpdatedBy column and use SUSER_SNAME() for it; this way you can see who did the update.
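A hedged sketch of such a trigger (the table and column names are assumptions; with the default RECURSIVE_TRIGGERS setting of OFF, the inner UPDATE does not re-fire the trigger):
CREATE TRIGGER dbo.trAccountUpdate ON dbo.Account
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- stamp every updated row with the time and the login that changed it
    UPDATE a
    SET LastUpdated   = GETDATE(),
        LastUpdatedBy = SUSER_SNAME()
    FROM dbo.Account AS a
    JOIN inserted AS i ON i.Id = a.Id;
END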
I'm a proponent of stored procedures for everything. Your update proc could contain a GETDATE() for the column.
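For example, a minimal sketch of such a proc (the names are placeholders):
CREATE PROCEDURE dbo.spAccountUpdate
    @Id int,
    @Name varchar(100)
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.Account
    SET Name = @Name,
        LastUpdated = GETDATE() -- the proc, not the caller, stamps the time
    WHERE Id = @Id;
END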
And I don't like triggers for this kind of update. Lack of visibility of triggers tends to cause confusion.
This sounds like business logic to me ... I would be more disposed to putting this in the code. Let the database manage the storage of data ... No more and no less.
Triggers are a blessing and a curse.
Blessing: You can use them to enable all kinds of custom constraint checking and data management without backend systems knowledge or changes.
Curse: You don't know what's happening behind your back. Concurrency issues/deadlocks from additional objects brought into transactions that were not originally expected. Phantom behavior, including session environment changes and unreliable rowcounts. Excessive triggering of conditions... additional hotspot/performance penalties.
The answer to this question (update dates implicitly (trigger) or explicitly (code)) usually weighs heavily on context. For example, if you are using the last change date as an informational field, you might want to change it only when a 'user' actually makes salient changes to a row, versus an automated process that simply updates some sort of internal marker users don't care about.
If you are using the trigger for change synchronization or you have no control over code that is executing a trigger makes a lot more sense.
My advice on trigger use is to be careful. Most systems allow you to filter execution based on the operation and the fields changed. Proper use of 'before' vs. 'after' triggers can have significant performance impacts.
Finally, a few systems are capable of executing a single trigger on multiple changes (multiple rows affected within a transaction); your code should be prepared to apply itself as a bulk update to multiple rows.
Normally I'd say do it database side, but it depends on your application. If you're using LINQ-to-SQL you can just set the field as Timestamp and have your DAL use the Timestamp field for concurrency. It handles it for you automatically, so having to repeat code is a non-event.
If you're writing your DAL yourself though, then I'd be more likely to handle this on the database side as it makes writing user interfaces far more flexible - although, I'd likely do this in a stored procedure that has "public" access and the tables locked down - you don't want just any clown coming along and bypassing your stored procedure by writing to the tables directly... unless you plan on making your DAL a standalone component that any future application must use to access the database, in which case, you could code it directly into the DAL - of course, you should only do this if you can guarantee that everyone accessing the database is doing so through your DAL component.
If you're going to allow "public" access to the database to insert into tables, then you'll have to go with the trigger because otherwise anyone can insert/update a single field in the table and the updated field could never get updated.
I would have the date maintained at the database, i.e., by a trigger, stored procedure, etc. In most database-driven applications the user app is not going to be the only means by which business users get at data. There are reporting tools, extracts, user SQL, etc. There are also updates and corrections done by the DBA for which the application won't be providing the date.
But honestly the #1 reason I wouldn't do it from the application is you have no control over the date/time on the client machine. They might be rolling it back to get more days out of a trial license on something or may just want to do bad things to your program.
You can do this without the trigger if your database supports default values on the fields. For example, in SQL Server 2005 I have a table with a field created like this:
create table dbo.Repository
(
...
last_updated datetime default getdate(),
...
)
then the insert code just leaves that field out of the insert field list.
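For example (assuming the remaining columns have defaults or allow NULLs):
-- last_updated is omitted from the column list, so its default getdate() applies
INSERT INTO dbo.Repository (filename, content)
VALUES ('example.cfg', 'some content');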
I forgot that only worked for the first insert - I do have an update trigger as well, to update the date fields and put a copy of the updated record in my history table - which I would post ... but the editor keeps erroring out on my code ...
Finally:
create trigger dbo.Repository_Upd on dbo.Repository instead of update
as
--**************************************************************************
-- Trigger: Repository_Upd
-- Author: Ron Savage
-- Date: 09/28/2008
--
-- Description:
-- This trigger sets the last_updated and updated_by fields before the update
-- and puts a copy of the updated row into the Repository_History table.
--
-- Modification History:
-- Date Init Comment
-- 10/22/2008 RS Blocked .prm files from updating the history as they
-- get updated every time the cfg file is run.
-- 10/21/2008 RS Updated the insert into the history table to use the
-- d.last_updated field from the Repository table rather
-- than getdate() to avoid micro second differences.
-- 09/28/2008 RS Created.
--**************************************************************************
begin
--***********************************************************************
-- Update the record but fill in the updated_by, updated_system and
-- last_updated date with current information.
--***********************************************************************
update cr set
cr.filename = i.filename,
cr.created_by = i.created_by,
cr.created_system = i.created_system,
cr.create_date = i.create_date,
cr.updated_by = user,
cr.updated_system = host_name(),
cr.last_updated = getdate(),
cr.content = i.content
from
Repository cr
JOIN Inserted i
on (i.config_id = cr.config_id);
--***********************************************************************
-- Put a copy in the history table
--***********************************************************************
declare @extention varchar(3);
select @extention = lower(right(filename,3)) from Inserted;
if (@extention <> 'prm')
begin
Insert into Repository_History
select
i.config_id,
i.filename,
i.created_by,
i.created_system,
i.create_date,
user as updated_by,
host_name() as updated_system,
d.last_updated,
d.content
from
Inserted i
JOIN Repository d
on (d.config_id = i.config_id);
end
end
Ron