My project uses EF (tested with version 4 using the self-tracking template and with version 5 using the default templates, all database-first) against SQL Server 2012. Each database table has a rowversion (timestamp) column defined.
Using EF as it comes out of the box, my database-update code looks like this:
using (var db = new MyContext())
{
    // db.Entry(myInstance).State = EntityState.Modified;
    db.SaveChanges();
}
does not trigger any rowversion conflicts. I run parallel clients: each reads the same record, makes a change to it, and then writes it back to the database. All updates are accepted; no concurrency check is applied.
Do I have to work with stored procedures for my update commands (with a WHERE clause that checks my rowversion value) to have EF acknowledge the "built-in" concurrency, or is there another way (configuration, specific method calls) to make my code work?
Here's the answer (I've given it a couple of weeks of proof-of-concept work):
rowversion (or the timestamp field type) in SQL Server is a column like any other (except that it is mandatory and self-incrementing). The database handles its value on updates (i.e. increments it), but the database does not compare its value before an update.
EF, on the other side, lets you define a ConcurrencyMode for each field (in the edmx). You could, if you wanted, mark all your fields with ConcurrencyMode=Fixed (instead of the default None) and thus include all of them in the update's WHERE clause (comparing the entity's original values with the record's current values in the database). But it's easier to set one field per entity, i.e. the RowVersion field, to that mode, especially since the only party maintaining it for you is the database.
Once you've done that, you're sure to get the System.Data.OptimisticConcurrencyException on conflicting updates. You still have to keep away from EF workflows that manipulate your object sets for you, such as using myObjectSet.Attach(myEntity) in a pattern that fetches the current data from the database and merges your changes into it. Since the RowVersion field is normally left unchanged by your code, the update then runs with the current value from the database and will not raise the concurrency exception.
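To make that concrete, here is a minimal sketch of the update path, assuming the EF5 DbContext template and a RowVersion property marked ConcurrencyMode=Fixed in the edmx. MyContext, MyEntities, Id, Name and the dto type are placeholder names; with the EF4 ObjectContext/self-tracking templates you would catch System.Data.OptimisticConcurrencyException directly instead.

using System.Data.Entity.Infrastructure;
using System.Linq;

public static class UpdateExample
{
    public static void Save(MyDto dto)
    {
        using (var db = new MyContext())
        {
            var entity = db.MyEntities.Single(e => e.Id == dto.Id);

            // Apply the client's changes...
            entity.Name = dto.Name;

            // ...but tell the change tracker that the RowVersion the client
            // originally read is the "original" value, so the generated
            // UPDATE ... WHERE compares against it rather than the value
            // just re-read from the database.
            db.Entry(entity).Property(e => e.RowVersion).OriginalValue = dto.RowVersion;

            try
            {
                db.SaveChanges();
            }
            catch (DbUpdateConcurrencyException)
            {
                // Wraps System.Data.OptimisticConcurrencyException: another
                // client changed the row since dto.RowVersion was read.
                // Reload, merge, or surface the conflict to the user here.
            }
        }
    }
}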
Are you looking to handle concurrency? If so, take a look at this link:
http://www.asp.net/mvc/tutorials/getting-started-with-ef-5-using-mvc-4/handling-concurrency-with-the-entity-framework-in-an-asp-net-mvc-application
Typically to expose version data you'd have to add a column of type rowversion, but this operation would take quite a while on a large table. I did it anyway in a dev sandbox environment, and indeed it took a while, but I also noticed that the column was populated with some meaningful-looking initial value. I expected it to be all 0's or 1's to indicate that each row is in some sort of "initial" state (after all, there was no history before this), but what I saw were what looked like accurate values for each row (they were all different, non-default-looking values).
Where did they come from? It seems like the rowversion is being tracked behind the scenes anyway, regardless of whether you've exposed it in a column. If so, can I get at it directly without adding the column? Like maybe some kind of system function I can call directly? I really want to avoid downtime, and I also have a huge number of existing queries so migration to a different table/view/combo is not an option (as suggested in other related questions).
The rowversion value is generated when a row in a table that has a rowversion (a.k.a. timestamp) column is modified. The rowversion counter is database-scoped, and the last generated value can be retrieved via @@DBTS.
Since the value is incremented only when a table with a rowversion column is modified, I don't think you'll be able to use @@DBTS to avoid the downtime.
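For reference, reading that database-scoped counter is straightforward; here's a small sketch (the connection string is a placeholder). Note that it returns a single value for the whole database, not anything per row, which is why it can't stand in for the column.

using System.Data.SqlClient;

public static class DbtsExample
{
    public static byte[] GetLastRowVersion(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT @@DBTS;", conn))
        {
            conn.Open();
            // @@DBTS is the last rowversion value generated anywhere in the
            // current database -- it says nothing about individual rows.
            return (byte[])cmd.ExecuteScalar();
        }
    }
}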
I'm currently reading over implementing optimistic concurrency checks in DB2. I've been mainly reading http://www.ibm.com/developerworks/data/library/techarticle/dm-0801schuetz/ and http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/index.jsp?topic=%2Fcom.ibm.db2.luw.admin.dbobj.doc%2Fdoc%2Fc0051496.html (as well as some other IBM docs).
Is RID necessary when you have an ID column already? In the two links they always mention using RID and the row change token; however, RID is just a row ID, so I'm not clear why I need to use it when the row change token seems like SQL Server's rowversion (except that it's tracked per page rather than per row).
It seems as long as I have a row-change-timestamp column, then my row change token granularity will be good enough to prevent most false positives.
Thanks.
The way I read the first article is that you can use any of those features, you don't need to use all of them. In particular, it appears that the row-change-timestamp is derived from RID() and ROW CHANGE TOKEN:
Time-based update detection: This feature is added to SQL using the RID_BIT() and ROW CHANGE TOKEN. To support this feature, the table needs to have a new generated column defined to store the timestamp values. This can be added to existing tables using the ALTER TABLE statement, or the column can be defined when creating a new table. The column's existence also affects the behavior of optimistic locking in that the column is used to improve the granularity of the ROW CHANGE TOKEN from page level to row level, which could greatly benefit optimistic locking applications.
... among other things, the timestamp actually increases the granularity compared to the ROW CHANGE TOKEN, so it makes it easier to deal with updates.
For a number of reasons, please make sure to set the db time to UTC, as DB2 doesn't track timezone (so if you're somewhere that uses DST, the same timestamp can happen twice).
(As a side note, RID() isn't stable on all platforms. On the iSeries version, at least, it changes if somebody re-orgs the table, and you may not always get the results you expect when using it with joins. I'm also not sure about use with mirroring...)
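For what it's worth, the update pattern from the article boils down to reading the token alongside the row and then making the UPDATE conditional on it. Here's a hedged sketch over the generic ADO.NET classes; the schema, table and column names are made up, the connection is assumed to be open, and the exact parameter-marker style depends on your DB2 provider, so check the statements against the DB2 docs.

using System.Data.Common;

public static class Db2OptimisticUpdate
{
    // How the token would be fetched alongside the row in the first place.
    public const string ReadSql =
        "SELECT ROW CHANGE TOKEN FOR t AS change_token, t.name " +
        "FROM myschema.mytable t WHERE t.id = ?";

    // The optimistic update: it only succeeds if the token hasn't moved.
    public const string UpdateSql =
        "UPDATE myschema.mytable t SET name = ? " +
        "WHERE id = ? AND ROW CHANGE TOKEN FOR t = ?";

    public static bool TryUpdate(DbConnection conn, int id, string newName, long changeToken)
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = UpdateSql;
            AddParameter(cmd, newName);
            AddParameter(cmd, id);
            AddParameter(cmd, changeToken);

            // Zero rows affected means somebody changed (or deleted) the row
            // since the token was read -- the optimistic check failed.
            return cmd.ExecuteNonQuery() == 1;
        }
    }

    private static void AddParameter(DbCommand cmd, object value)
    {
        var p = cmd.CreateParameter();
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}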
Are you aware that if you update multiple rows in the same SQL statement execution, they will get the same timestamp (if the timestamp is updated in that statement)?
This means that a timestamp column is probably a bad choice for a unique row identifier.
I have been working with ORMs for the last couple of years and, on a personal project, am now, frustratingly, finding myself struggling with simple ADO.NET.
I have a database with tables storing both transactional and slowly changing data. Data to update / insert is sourced via the network.
I am trying to use the disconnected Data Adapter paradigm in ADO.NET, in relatively generic DB classes to allow for many / all ADO.NET database implementations.
My problem is, due to the potential size of the database tables, I don't want to perform an Adapter.Fill into memory (as pretty much every reference and tutorial will demonstrate), but rather use a delta DataSet to push only new / modified data back to the database.
If I perform a DbDataAdapter.FillSchema on a DataSet, I get a schema and data tables I can populate; however, all data, regardless of what I put in my key fields, is treated as a new row when I update the table using Adapter.Update.
Am I using the correct ADO.NET classes to perform such a batch UPDATE / INSERT (by "batch" I mean not having to do it in a loop, rather than what any given database may actually be performing under the hood)?
The issue here turned out to be the RowState of the data row.
When a DataSet is filled through the DataAdapter, changes made to existing rows set each row's RowState to Modified.
Data that is added to the DataSet is seen as a new row, and its RowState will be set to Added.
RowState is a read-only property, so it cannot be set to Modified directly.
Therefore, all updating data received from a client should be added as follows:
// ... where dataTransferObject.IsNew == false
DataRow row = table.NewRow();
row["Id"] = dataTransferObject.Id;            // set key fields on the row
table.Rows.Add(row);                          // add the row to the table
row.AcceptChanges();                          // change RowState from Added to Unchanged
row["MyData1"] = dataTransferObject.MyData1;  // set the remaining fields;
row["MyData2"] = dataTransferObject.MyData2;  // these assignments flip the
row["MyData3"] = dataTransferObject.MyData3;  // RowState to Modified
I am also ignoring the CommandBuilder and hand-crafting my Insert, Update and Select statements.
This data can now be persisted to the database using adapter.Update.
I acknowledge that this is far from an elegant solution, but it is a working one. I will update this answer if I find a nicer method.
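For context, here is a rough sketch of the adapter setup that goes with the snippet above, with hand-written commands instead of a CommandBuilder. The connection string, table name and parameter types/sizes are placeholders (only Id and MyData1..3 come from the snippet); the point is the SourceColumn mappings that let adapter.Update bind row values into the commands.

using System.Data;
using System.Data.SqlClient;

public static class DeltaPersistence
{
    public static void Persist(DataTable table, string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var adapter = new SqlDataAdapter())
        {
            adapter.InsertCommand = new SqlCommand(
                "INSERT INTO MyTable (Id, MyData1, MyData2, MyData3) " +
                "VALUES (@Id, @MyData1, @MyData2, @MyData3)", conn);
            adapter.UpdateCommand = new SqlCommand(
                "UPDATE MyTable SET MyData1 = @MyData1, MyData2 = @MyData2, " +
                "MyData3 = @MyData3 WHERE Id = @Id", conn);

            foreach (SqlCommand cmd in new[] { adapter.InsertCommand, adapter.UpdateCommand })
            {
                cmd.Parameters.Add("@Id", SqlDbType.Int).SourceColumn = "Id";
                cmd.Parameters.Add("@MyData1", SqlDbType.NVarChar, 255).SourceColumn = "MyData1";
                cmd.Parameters.Add("@MyData2", SqlDbType.NVarChar, 255).SourceColumn = "MyData2";
                cmd.Parameters.Add("@MyData3", SqlDbType.NVarChar, 255).SourceColumn = "MyData3";
            }

            // Rows that were Added + AcceptChanges'd + modified go out as UPDATEs;
            // rows still in the Added state go out as INSERTs.
            adapter.Update(table);
        }
    }
}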
I have a table which has an id (primary key with auto-increment), a uid (a key referring to a user's id, for example) and something else which won't matter for my question.
I want to have, let's call it, a separate auto-increment sequence on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with a value of 10 in uid. I will add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explicit and clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know about a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. It's necessary to lock the whole table: since you're trying to restrict INSERTs, there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
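To make the second option concrete, here's a hedged sketch of a per-user counter kept in its own table and bumped atomically outside the inserting transaction. The table and column names (user_id_counters, next_id) are made up, a counter row per userid is assumed to already exist, the connection is assumed open, and LAST_INSERT_ID(expr) is a MySQL-specific trick.

using System;
using System.Data.Common;

public static class PerUserIdAllocator
{
    public static long NextId(DbConnection conn, int userId)
    {
        using (var update = conn.CreateCommand())
        {
            // Atomic on its own: LAST_INSERT_ID(next_id + 1) stashes the new
            // value for this connection while incrementing the counter.
            update.CommandText =
                "UPDATE user_id_counters " +
                "SET next_id = LAST_INSERT_ID(next_id + 1) WHERE userid = @uid";
            var p = update.CreateParameter();
            p.ParameterName = "@uid";
            p.Value = userId;
            update.Parameters.Add(p);
            update.ExecuteNonQuery();
        }

        using (var read = conn.CreateCommand())
        {
            // LAST_INSERT_ID() is connection-scoped, so this reads the value
            // stashed by the UPDATE above even under heavy concurrency.
            read.CommandText = "SELECT LAST_INSERT_ID()";
            return Convert.ToInt64(read.ExecuteScalar());
        }
    }
}

The allocated value is then used as the id in the normal INSERT; as noted below, rollbacks and failed inserts will still leave gaps.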
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than NOT MS/Oracle) the general logic is simple...
Start a transaction (often one is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
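As an illustration of what the stored-procedure route's core write might look like, here's a hedged sketch issued through ADO.NET (property_amendments, property_id, amendment_id and details are placeholder names, and the connection is assumed open). As the earlier answer points out, the MAX()+1 still has to be serialized -- via a table lock, a serializable transaction, or an engine-specific equivalent -- or two concurrent inserts can collide.

using System.Data.Common;

public static class AmendmentWriter
{
    public static void Insert(DbConnection conn, int propertyId, string details)
    {
        using (var tx = conn.BeginTransaction())
        using (var cmd = conn.CreateCommand())
        {
            cmd.Transaction = tx;

            // Compute the next amendment_id for this property inline;
            // COALESCE handles the "first amendment" case.
            cmd.CommandText =
                "INSERT INTO property_amendments (property_id, amendment_id, details) " +
                "SELECT @pid, COALESCE(MAX(amendment_id), 0) + 1, @details " +
                "FROM property_amendments WHERE property_id = @pid";

            AddParameter(cmd, "@pid", propertyId);
            AddParameter(cmd, "@details", details);

            cmd.ExecuteNonQuery();
            tx.Commit();
        }
    }

    private static void AddParameter(DbCommand cmd, string name, object value)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}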
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
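For illustration, a small sketch of what the multi-part key and the "latest version" query might look like. The table name amendments and the details column are made up; entity_id and version_number follow the naming above, and the column types are only indicative.

public static class MultiPartKeySketch
{
    public const string Ddl =
        "CREATE TABLE amendments ( " +
        "  entity_id      INT          NOT NULL, " +
        "  version_number TIMESTAMP    NOT NULL, " +
        "  details        VARCHAR(255), " +
        "  PRIMARY KEY (entity_id, version_number) )";

    // The most recent version of every entity -- the query you'd materialize
    // into a "latest version" table only if it proves too slow.
    public const string LatestVersions =
        "SELECT a.* FROM amendments a " +
        "JOIN (SELECT entity_id, MAX(version_number) AS version_number " +
        "      FROM amendments GROUP BY entity_id) latest " +
        "  ON latest.entity_id = a.entity_id " +
        " AND latest.version_number = a.version_number";
}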
I want to setup a mechanism for tracking DB schema changes, such the one described in this answer:
For every change you make to the database, you write a new migration. Migrations typically have two methods: an "up" method in which the changes are applied and a "down" method in which the changes are undone. A single command brings the database up to date, and can also be used to bring the database to a specific version of the schema.
My question is the following: Is every DDL command in an "up" method reversible? In other words, can we always provide a "down" method? Can you imagine any DDL command that can not be "down"ed?
Please, do not consider the typical data migration problem where during the "up" method we have loss of data: e.g. changing a field type from datetime (DateOfBirth) to int (YearOfBirth) we are losing data that can not be restored.
In SQL Server, every DDL command that I know of has an up/down pair.
Other than loss of data, every migration I've ever done is reversible. That said, Rails offers a way to mark a migration as "destructive":
Some transformations are destructive in a manner that cannot be reversed. Migrations of that kind should raise an ActiveRecord::IrreversibleMigration exception in their down method.
See the API documentation here.
Yes, you've identified cases where you lose data, either by transforming it or simply DROP COLUMN in the "up" migration.
Another example is that you could drop a SEQUENCE object, thus losing its state. The "down" migration would recreate the sequence, but it would start over at 1. This could cause duplicate values to be generated by the sequence. Not a problem if you're performing a migration on an empty database, and you want the sequence to start at 1 anyway, but if you have some number of rows of data, you'd want the sequence to be reset to the greatest value currently in use, which is hard to do reliably, unless you have an exclusive lock on that table.
Any other DDL that is dependent on the state of data in the database has similar problems. That's probably not a good schema design in the first place, I'm just trying to think of any cases that fit your question.
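For concreteness, here is a hedged sketch of what a reversible up/down pair can look like, written in the style of an EF Code First migration since EF is the context elsewhere in this thread (the class, table and column names are made up). It is reversible precisely because neither direction depends on existing data; the moment a direction would need state that has been thrown away (dropped values, a sequence's current position), it stops being cleanly reversible.

using System.Data.Entity.Migrations;

public partial class AddCustomerPhone : DbMigration
{
    public override void Up()
    {
        AddColumn("dbo.Customers", "Phone", c => c.String(maxLength: 32, nullable: true));
        CreateIndex("dbo.Customers", "Phone");
    }

    public override void Down()
    {
        // Undo in the reverse order of Up(); no data is consulted, so the
        // schema round-trips cleanly (any values stored in Phone are lost,
        // which is the data-loss caveat the question already sets aside).
        DropIndex("dbo.Customers", new[] { "Phone" });
        DropColumn("dbo.Customers", "Phone");
    }
}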