Concurrency with Linq To Sql Stored Procedures - sql

I come from the ASP.NET world, where we'd use an ObjectDataSource hooked up to a data access layer and set its ConflictDetection property to "CompareAllValues". There is an OldValuesParameterFormatString on the ObjectDataSource that you use to identify old value parameters.
The SQL procedure that does an update would then require both new params and old params and that was it... Super simple to implement; the ODS handled the old values for you.
I've moved over to Linq to SQL and WinForms. I've created a WCF service that is our business layer and I have a stored procedure that will update some table. In the data context designer I see that there is an Update Check property on my class columns. I'm not directly updating the table from the class; rather, I'm calling a stored procedure to do the update. Is there some way to retain the original values, perhaps from the data context, similar to the way an ObjectDataSource would?

Are you using stored procedures directly (through a SqlCommand) or through LINQ to SQL? LINQ to SQL supports using stored procs for all its database access. You might want to look at Updating our Database using Stored Procedures, part 7 of Scott Guthrie's blog post series about LINQ to SQL. You can set up the use of sprocs through the DBML designer or in code using a DataContext partial class. The idea is that you send both the new and original values (e.g. Name and OriginalName) to the sproc so it can do its concurrency checking.
If you are using the sproc directly and not through LINQ to SQL, and all you want is to get the object's original values, you can obtain them by using Table<T>.GetOriginalEntityState() like this:
Order modifiedOrder = db.Orders.First(); // using first Order as example
modifiedOrder.Name = "new name"; // modifying the Order
Order originalOrder = db.Orders.GetOriginalEntityState(modifiedOrder);
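To make the "send new and original values" idea concrete, here is a rough in-memory sketch of the check such a sproc performs. No database is involved, and `Order`, `OrderStore` and `TryUpdate` are made-up names for illustration, not part of LINQ to SQL. A real sproc would put the comparison in its WHERE clause and report the affected row count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity; a record gives value-based equality for free.
public record Order(int Id, string Name, decimal Total);

// Stands in for the table plus the update sproc (an assumption, not a real API).
public class OrderStore
{
    private readonly Dictionary<int, Order> _rows = new();

    public void Insert(Order order) => _rows[order.Id] = order;

    // Mirrors "UPDATE ... WHERE Id = @Id AND Name = @OriginalName AND ...":
    // the write only happens if the stored row still equals the original values.
    public bool TryUpdate(Order original, Order updated)
    {
        if (!_rows.TryGetValue(original.Id, out var current) || current != original)
            return false; // someone else changed or deleted the row

        _rows[original.Id] = updated;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var store = new OrderStore();
        var loaded = new Order(1, "old name", 10m);
        store.Insert(loaded);

        // The first writer succeeds because the row is unchanged since it was read.
        Console.WriteLine(store.TryUpdate(loaded, loaded with { Name = "new name" }));

        // The second writer still holds the stale original, so the check fails.
        Console.WriteLine(store.TryUpdate(loaded, loaded with { Name = "other name" }));
    }
}
```

With LINQ to SQL, the `original` argument would be whatever GetOriginalEntityState() returned for the modified entity.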

Not the answer you may be looking for, but since you mentioned using WCF as a business layer along with LINQ2SQL, I felt it is my obligation to point out this article for your reference:
http://www.sidarok.com/web/blog/content/2008/05/26/linq-to-sql-with-wcf-in-a-multi-tiered-action-part-1.html
The article uses ASP.NET as the main presentation layer, but considering your background, that might actually make it easier to understand.
I personally handled the same sort of development as you are doing right now (WinForms for the client, WCF for the business logic layer, LINQ2SQL for data access), but being a complete beginner to WCF and LINQ2SQL at the time, I was basically forced to drop the retention of original values. This article comes closest to your needs, though to be honest I've not seen anything that works with stored procedures.

Ditch the sprocs and create a new DBML file for your tables.
Drag your tables in and bam! LinqToSql will create entity classes for you with methods for updating (creating.. etc).
LinqToSql has a number of approaches for concurrency. One of the overloaded Attach() methods (used for updates) requires 2 params: the original entity and the new entity. LinqToSql will do what the ObjectDataSource used to do and compare the old values with the new and throw concurrency exceptions (it even makes handling concurrency exceptions much easier. +10 ..but that's not your question).
Use this reference, particularly the section towards the bottom called With Complete Entities. It makes a lot of sense and shows how the different concurrency approaches are used and how to handle the exceptions.
Have fun.

Related

How to avoid SQL statements spreading everywhere in your app?

I have a medium-sized app written in Ruby, which makes pretty heavy use of an RDBMS. As our code grows, I have found that ugly SQL statements are spreading to all modules and methods in my app and are embedded in a lot of application logic. I am not sure if this is bad; however, my gut tells me this is quite ugly...
So generally, in any language, how do you manage your SQL statements? Or do you think it is harmful for maintainability to have many SQL statements embedded in the application logic? Why or why not?
Thanks.
SQL is a language for accessing databases. Often, it gets confused as being the API into the data store for a larger application. In fact, you should design a real API between the data store and the app.
This means several things.
For accessing data stored in tables, you want to go through views in the database, rather than directly access the tables.
For data modification steps, you want to wrap insert/update/delete in stored procedures. This has secondary benefits, where you can handle constraints and triggers in the stored procedure and better log what is happening.
For security, you want to include database security as part of your security architecture. Giving all users full access may not be the best approach.
Unfortunately, it is easy to write a simple app that uses a database directly, whether in java or ruby or VBA or whatever. This grows into a bigger app, and then the maintenance problems arise.
I would suggest an incremental approach to fixing this. Go through the code and create views where you have nasty select statements. You'll probably find you need many fewer views than selects (the views can be re-used -- a good thing).
Find places where data is being modified, and change these to stored procedures. I always return status from the stored procedure for error checking and put log information into a table called something like splog or _spcalls.
If you want to limit permissions for different users of your app, then you might be interested in this.
Leaving the raw SQL statements in the code is a problem. Just wait until you want to rename a column and you have to find all the places where this breaks the code.
Yes, this is not optimal - maintenance becomes a nightmare; it's hard to forecast and determine which code must change when underlying DB changes occur. This is why it is good practice to create a data access layer (DAL) to encapsulate CRUD operations away from the application logic. There is often a business logic layer (BLL) between the application logic and the DAL to enforce business rules/logic.
Google "data access layer" "business logic layer" and even "n-tier architecture" to learn more.
If you are concerned about the SQL statements littered around your application logic, maybe consider implementing them as Stored Procedures?
That way you will only be including the procedure name and any parameters that need to be passed to it in your code.
It has other benefits too, a common one being easier to re-use in multiple files.
There is much debate about speed and security of Stored Procedure and you will never get a definitive answer about that so I won't even open that can of worms.
Here is how you do this with Java: Create a class that encapsulates all access to the database. Add a method to the class for each query you need to run.
The answer for ruby will be similar to this.
It depends on the architecture of your application but a simple solution is to keep each sql in a file, qry.sql. For each Ruby module (or whatever is used in Ruby to aggregate related code) you can keep a folder SQL with these files. So, the collection of SQL folder/files form the data access layer of your application. The Ruby code provides the business layer. If your data model changes (field names, etc), you can do greps to identify the sql files that need changes. Anyway, definitely separate SQL from your logic code.

WCF data serialization : can it go faster?

This question is sort of a sequel to that question.
When we want to build a WCF service which works with some kind of data, it's natural that we want it to be fast and efficient. In order to achieve that, we have to make sure all segments of data road trip work as fast as they could, from data storage back-end such as SQL Server, to a WCF client who requested that data.
While seeking an answer to that previous question, we learned, thanks to Slauma and others who contributed through comments, that the time-consuming part of Entity Framework's (first) large query is object materialization and attaching entities to the context when the result from the database is returned. We have seen that everything works much faster on subsequent queries.
Assuming those large queries are used as read-only operations, we came to the conclusion that we could set the EF MergeOption to NoTracking, yielding better first-query performance. What NoTracking does is tell EF to create a separate object for each record retrieved from the database, even when records have the same key. This causes additional processing if we have .Include() statements in our query, and leads to a much larger amount of data being returned.
The data may be so big that we could easily ask ourselves - did we really help our cause by using NoTracking option, even if we made the query faster (and maybe only the first one, depending on the number of .Include() statements, because subsequent queries without NoTracking option with multiple .Include() statements run faster simply because NoTracking option causes a lot more objects to be created when data returns from the server)?
The biggest problem is how to efficiently serialize this amount of data - and deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true because I am sending EF 4.x generated POCOs to my client and vice versa), do we want to generate even more data (thanks to NoTracking)? To be honest, I have not yet seen the data from the NoTracking query (~11,000 objects, not counting navigation properties obtained via .Include()) arrive at the client. The last time I tried to pull this off, the timeout of 00:10:00 was triggered (!)
So if you are still reading this wall of text, tell me how to solve this situation. Which serializer should I use to achieve acceptable results? Currently, if I don't use the NoTracking option, the serialization, transport and deserialization of ~11,000 objects, via a wsHttpBinding-like custom binding on the local machine, take ~5 seconds. What's scary to me is that this large table is most likely going to contain ~500,000 records eventually.
Have you considered creating a View Model for your object and doing a projection in the select statement? That should be a lot faster:
    var result = from person in DB.Entities.Persons
                     .Include("District")
                     .Include("District.City")
                     .Include("District.City.State")
                     .Include("Nationality")
                 select new PersonViewModel()
                 {
                     // Copy only the values the client needs into a flat object.
                     Name = person.Name,
                     City = person.District.City,
                     State = person.District.City.State,
                     Nationality = person.Nationality.Name
                 };
This would require you to create a PersonViewModel class to hold the flattened data.
You might be able to further speed up things by creating a database view and letting Entity Framework select directly from there.
If you really want the front-end to populate a grid with 500,000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming that much data with acceptable performance. What you're basically doing here is:
Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET Objects -> UI
While this could easily be reduced to:
Database -> TDS -> UI
Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command Query Separation pattern. Use a technology suitable for high-performance querying of data and link that directly to your app. Then use a command strategy to implement your business logic.
OData services might also provide a better way to link your UI directly to the data, as it can be used to quickly query your data allowing you to implement quick filtering without the user really noticing.
If the security settings prohibit direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from either a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
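As a minimal sketch of materializing the objects yourself: the reader below maps rows straight into a flat view model and streams them with yield return. The `PersonViewModel` shape and the column names are assumptions for illustration, and a DataTable stands in for the real query result so the example runs without a database.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Flat, hypothetical view model; the property names are assumptions.
public class PersonViewModel
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class PersonReader
{
    // Streams one view model per row; nothing is tracked or buffered.
    public static IEnumerable<PersonViewModel> Read(IDataReader reader)
    {
        int name = reader.GetOrdinal("Name"); // resolve column positions once
        int city = reader.GetOrdinal("City");

        while (reader.Read())
        {
            yield return new PersonViewModel
            {
                Name = reader.GetString(name),
                City = reader.IsDBNull(city) ? null : reader.GetString(city)
            };
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for the result of a view or query.
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("City", typeof(string));
        table.Rows.Add("Ann", "Oslo");
        table.Rows.Add("Bob", DBNull.Value);

        using (IDataReader reader = table.CreateDataReader())
        {
            foreach (var person in PersonReader.Read(reader))
                Console.WriteLine(person.Name + " / " + (person.City ?? "(none)"));
        }
    }
}
```

Against a real database, the same `Read` method would take the reader returned by a command's ExecuteReader(); only the source of the IDataReader changes.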
There are a lot of alternatives to Entity Framework, created specifically because EF isn't cut out for large datasets. See FluentData, DapperDotNet, Massive or PetaPoco. You might want to use these side by side with Entity Framework to handle your large, flat data queries.
I use Json.Net's implementation of Bson in my RIA application. More info here.
I yield return an IEnumerable, as I read from the database and serialize the rows. I find the speed to be acceptable and I return Entities with roughly 20 properties. This approach should minimize the concurrent memory use on the server.
Based on what I have gathered by looking at various reviews and performance benchmarks, I would choose protobuf-net as a serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
Although not completely an answer to this question, jessehouwing had the best answer and I am marking it as accepted.

Micro ORM - maintaining your SQL query strings

I will not go into the details of why I am exploring the use of Micro ORMs at this stage, except to say that I feel powerless when I use a full-blown ORM. There are too many things going on in the background that happen automatically, and not all of them are the best possible choices. I was quite ready to go back to raw database access, but I found out about the three new guys on the block: Dapper, PetaPoco and Massive. So I decided to give the low-level approach a go with a pet project. It is not relevant, but so far, I am using PetaPoco.
In any case, I am having trouble deciding how to go about maintaining the SQL strings that I will use from the higher levels. There are three main solutions that I can think of:
Sprinkle the SQL queries wherever I need them. This is the least infrastructure heavy method. However, it suffers in both maintainability and testability areas.
Limit the query usage to some service classes. This helps maintainability and is still light on the infrastructure I need to implement. It may also be possible to build these service classes so that they are easy to mock for testing purposes.
Prepare some classes to make the system somewhat flexible. I have started on this path. I implemented a Repository interface and a database-dependent Repository class. I have also built some tiny interfaces to capture SQL queries that can be passed to my Repository's GetMany() method. All the queries are implemented as individual classes right now, and I will probably need a little more interface around this to add some level of database independence - and maybe for some flexibility in decorating queries into paged and sorted queries (again, this would also make them a little more flexible in handling different databases).
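A minimal sketch of what the third option could look like, with queries as small classes and paging added by a decorator. Every name here (`ISqlQuery`, `ActiveUsersQuery`, `PagedQuery`) is hypothetical, the `@0` placeholder assumes PetaPoco-style positional arguments, and the OFFSET/FETCH clause assumes SQL Server 2012 or later.

```csharp
using System;

// Hypothetical interface for a query that can be handed to a repository's GetMany().
public interface ISqlQuery
{
    string Sql { get; }
    object[] Args { get; }
}

// One query, one class; the SQL text lives in exactly one place.
public sealed class ActiveUsersQuery : ISqlQuery
{
    public string Sql => "SELECT Id, Name FROM Users WHERE Active = @0 ORDER BY Name";
    public object[] Args => new object[] { true };
}

// Decorator: wraps any query that already has an ORDER BY and appends a paging clause.
// Another database would need its own decorator with its own syntax.
public sealed class PagedQuery : ISqlQuery
{
    private readonly ISqlQuery _inner;
    private readonly int _skip;
    private readonly int _take;

    public PagedQuery(ISqlQuery inner, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        _inner = inner;
        _skip = (page - 1) * pageSize;
        _take = pageSize;
    }

    // The integers are written into the text, so the inner query's arguments stay untouched.
    public string Sql => _inner.Sql + " OFFSET " + _skip + " ROWS FETCH NEXT " + _take + " ROWS ONLY";
    public object[] Args => _inner.Args;
}

public static class Program
{
    public static void Main()
    {
        ISqlQuery query = new PagedQuery(new ActiveUsersQuery(), page: 3, pageSize: 20);
        Console.WriteLine(query.Sql);
    }
}
```

The trade-off is visible even at this size: each new capability (sorting, filtering, another dialect) is another small class to maintain.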
What I am mainly worried about right now is that I have entered the slippery slope of writing all the functions needed for a full blown ORM, but badly. For example, it feels sensible right now that I write or find a library to convert linq calls into SQL statements so that I can massage my queries easily or write extenders that can decorate any query I pass to it, etc. But that is a large task, and is already done by the big guys, so I am resisting the urge to go there. I also want to retain control over what queries I send to the database - by explicitly writing them.
So what is the suggestion? Should I go #2 option, or try to stumble along on option #3? I am certain I cannot show any code written in the first option to anyone without blushing. Is there any other approach you can recommend?
EDIT: After I asked the question, I realized there is another option, somewhat orthogonal to these three: stored procedures. There seem to be a few advantages to putting all your queries inside the database as stored procedures. They are kept in a central location and not spread through the code (though maintenance is an issue - the parameters may get out of sync). The reliance on database dialect is solved automatically: if you move databases, you port all your stored procedures, and you are done. And there are also the security benefits.
With the stored procedure option, alternatives 1 and 2 seem a little more suitable. There do not seem to be enough entities to warrant option 3, but it is still possible to separate the procedure call commands from the database access code.
I've implemented option 3 without stored procedures, and option 2 with stored procedures, and it seems like the latter is more suitable for me (in case anyone is interested in the outcome of the question).
I would say put the sql where you would have put the equivalent LINQ query, or the sql for DataContext.ExecuteQuery. As for where that is... well, that is up to you and depends on how much separation you want. - Marc Gravell, creator of Dapper
See Marc's opinion on the matter
I think the key point is that you shouldn't really be re-using the SQL. If your logic is re-used, then it should be wrapped in a method that can be called from multiple places.
I know you've accepted your answer already but I still wanted to show you a nice alternative that may be helpful in your case as well. Now or in the future.
When using stored procedures it's wise to use T4
I tend to use stored procedures on my project even though it's not using PetaPoco, Dapper or Massive (project started before these were here). It uses BLToolkit instead. Anyway. Instead of writing my methods to run stored procedures and write code to provide stored procedure parameters, I've written a T4 template that generates the code for me.
Whenever stored procedures change (some may be added/removed, parameters added/removed/renamed/retyped), my code will break on compilation because method calls will not match their signature any more.
I keep my stored procedures in a file (so they get version controlled). If you work in a multi-developer team it may be sensible to have stored procedures each in its own file. It makes updates much less painful. I've experienced that on some project and it worked ok as long as number of SPs is not huge. You can restructure them into folders based on the entity they're related to.
Anyway. Maintenance is limited to the stored procedures; the code change is just a simple click of a button in Visual Studio that regenerates all T4 templates at once. You don't have to search for the methods that use those procedures. Any mismatches are reported as compile errors. One thing less to worry about.
So instead of writing
    using (var db = new DbManager())
    {
        return db
            .SetSpCommand(
                "Person_SaveWithRelations",
                db.Parameter("@Name", name),
                db.Parameter("@Email", email),
                db.Parameter("@Birth", birth),
                db.Parameter("@ExternalID", exId))
            .ExecuteObject<Person>();
    }
and having a bunch of magic strings I can just simply write:
    using (var db = new DataManager())
    {
        return db
            .Person
            .SaveWithRelations(name, email, birth, exId)
            .ExecuteObject<Person>();
    }
This is nicer and cleaner, breaks at compile time, and provides IntelliSense, so it's also faster while developing.
The good thing is that stored procedures may become very complex and may do many things. In my example above I check some data, insert a person record and some related ones as well, and in the end return the newly inserted Person record. Inserts and updates should usually return the data that was added or changed, to reflect the actual state.
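To give a feel for what a template could emit, here is a hand-written sketch of one generated wrapper. This is not BLToolkit's API or the actual template output: `IProcExecutor`, `PersonProcedures` and `RecordingExecutor` are hypothetical names, and the executor is a fake so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstraction over "run this stored procedure and map one row".
public interface IProcExecutor
{
    T ExecuteObject<T>(string procName, IReadOnlyDictionary<string, object> parameters);
}

public class Person
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// The kind of class a template could generate: one typed method per stored procedure.
// Procedure and parameter names appear here once instead of at every call site.
public sealed class PersonProcedures
{
    private readonly IProcExecutor _executor;

    public PersonProcedures(IProcExecutor executor) => _executor = executor;

    public Person SaveWithRelations(string name, string email, DateTime birth, int externalId) =>
        _executor.ExecuteObject<Person>("Person_SaveWithRelations", new Dictionary<string, object>
        {
            ["@Name"] = name,
            ["@Email"] = email,
            ["@Birth"] = birth,
            ["@ExternalID"] = externalId
        });
}

// Fake executor that only records the call, so no database is needed.
public sealed class RecordingExecutor : IProcExecutor
{
    public string LastProc;
    public IReadOnlyDictionary<string, object> LastParameters;

    public T ExecuteObject<T>(string procName, IReadOnlyDictionary<string, object> parameters)
    {
        LastProc = procName;
        LastParameters = parameters;
        return default(T);
    }
}

public static class Program
{
    public static void Main()
    {
        var executor = new RecordingExecutor();
        new PersonProcedures(executor).SaveWithRelations("Ann", "ann@example.com", new DateTime(1990, 1, 1), 42);
        Console.WriteLine(executor.LastProc + " called with " + executor.LastParameters.Count + " parameters");
    }
}
```

If a procedure gains, loses or retypes a parameter, regenerating the wrapper changes the method signature, and every stale call site then fails to compile.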

What will be the benefits of NHibernate in a data retrieval only scenario?

We have been suggested to use NHibernate for our project. But the point is our application will only be retrieving the data from the database so are there any benefits of using NHibernate in this scenario?
One more thing: what is the query execution plan in NHibernate? Does it support something like prepared statements, the so-called precompiled statements?
I agree with the answers above, but there is one more reason for using NHibernate: you're completely independent of the underlying database system. You can switch from MySQL to Oracle and the only thing you have to do is change the settings. The access to the database stays exactly the same.
NHibernate is useful if you need to map data from a database table into a .NET class. Even if you're only doing select queries, it still might be useful if you need to pass the data objects to a client tier (web page, desktop app, etc.) for display. Working with plain objects can be easier than working with a DataSet or other ADO.NET data class in a presentation layer.
NHibernate does have the ability to parse/pre-compile queries if you put them in the mapping file.
The benefit for using NHibernate in a read only scenario is that you would not need to map the results of queries back to .net objects as the runtime would do this for you. Also, it provides a more object oriented query syntax (you can also use LINQ), and you can take advantage of lazy loading.
I don't believe NHibernate can use prepared statements unless you are having it call stored procedures.

Improving my data access layer

I am putting some heavy thought into rewriting the data access layer in my software (if you could even call it that). This was really my first project that uses a database, and things were done in an improper manner.
In my project all of the data that is being pulled is stored in an ArrayList. Some of the data is converted from the ArrayList into a typed object before being put back into an ArrayList.
Also, there is no central set of queries in the application. This means that some queries are copied and pasted, which I want to eliminate as well. This application has some custom objects that are very standard to the application, and some queries that are very standard to those objects.
I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if multiple objects are returned, return a list of those objects. Is this a good approach?
Also, if this is a good way of doing things, how should I return the data from the database? I am currently using SqlDataReader.Read and filling an ArrayList. I am sure that this is not the best method to use here; I am just not really clear on how to improve this.
The reason for all of this is that I want to centralize all of the database operations into a few classes, rather than have them spread out among all of the classes in the project.
You should use an ORM. "Not doing so is stealing from your customers" - Ayende
One thing comes to mind right off the bat. Is there a reason you use ArrayLists instead of generics? If you're using .NET 1.1 I could understand, but it seems that one area where you could gain performance is to remove ArrayLists from the picture and stop converting and casting between types.
Another thing you might think about which can help a lot when designing data access layers is an ORM. NHibernate and LINQ to SQL do this very well. In general, the N-tier approach works well for what it seems like you're trying to accomplish. For example, performing data access in a class library with specific methods that can be reused is far better than "copy-pasting" the same queries all over the place.
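As a rough sketch of the generics suggestion, assuming the existing code loops over a data reader: a single helper can return a typed List<T> instead of an ArrayList, with the caller supplying the row mapping. `User` and `DataMapper` are made-up names, and a DataTable stands in for the database so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Hypothetical typed object standing in for one of the application's custom objects.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DataMapper
{
    // Reads every row into a typed list; the caller supplies the row-to-object mapping.
    public static List<T> ReadList<T>(IDataReader reader, Func<IDataRecord, T> map)
    {
        var items = new List<T>();
        while (reader.Read())
            items.Add(map(reader));
        return items;
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for a query result.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Ann");
        table.Rows.Add(2, "Bob");

        using (IDataReader reader = table.CreateDataReader())
        {
            // No casts at the call site: the list is already List<User>.
            List<User> users = DataMapper.ReadList(reader, row => new User
            {
                Id = row.GetInt32(0),
                Name = row.GetString(1)
            });
            Console.WriteLine(users.Count + " users, first is " + users[0].Name);
        }
    }
}
```

Putting helpers like this in one class library gives the queries and the mapping a single home, which is the centralization the question asks about.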
I hope this helps.
It really depends on what you are doing. If it is a growing application with user interfaces and the like, you're right, there are better ways.
I am currently developing in ASP.NET MVC, and I find Linq to SQL really comfortable. Linq to SQL uses code generation to create a collection of code classes that model your data.
ScottGu has a really nice introduction to Linq to SQL on his blog:
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
Over the past few projects I have used a base class that does all my ADO.NET work and that all other data access classes inherit from. So my UserDB class inherits the DataAccessBase class. At the moment my UserDB class actually takes the data returned from the database and populates a User object, which is then returned to the calling Business Object. If multiple objects are returned, they come back as a generic list, i.e. a List<User> is returned.
There is a good article by Damon Armstrong (search Google for Damon Armstrong) which demonstrates how this can be achieved:
http://www.simple-talk.com/dotnet/.net-framework/.net-application-architecture-the-data-access-layer/
However, I have now started to move all of this over to the Entity Framework, as it performs much better and saves on all those manual CRUD operations. I was going to use LINQ to SQL, but as it seems likely to be dead in the water very soon, I thought it would be best to invest my time in the next ORM.
"I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if there is a case of multiple objects being returned, return a list of those object. Is this a good approach?"
I'm a Java developer, but I believe that the language-agnostic answer is "yes".
Have a look at Martin Fowler's "Patterns Of Enterprise Application Architecture". I believe that technologies like LINQ were born for this.