I’m looking for a way to have fine-grained control over what is saved using Entity Framework, rather than the whole ObjectContext.SaveChanges(). My scenario is pretty straightforward, and I’m quite amazed it isn't catered for in EF – it's pretty basic in NHibernate and every other data access paradigm I’ve seen. I’m generating a bunch of data (in a WPF UI) and allowing the user to fine-tune what is proposed and choose what is actually committed to the database. For the proposed entities I’m:
getting a bunch of reference entities (e.g. languages) via my ObjectContext,
creating the proposed entities and assigning these reference entities to them (as navigation properties), so by virtue of their relationship to the reference entities they’re implicitly added to the ObjectContext,
trying to create and save individual entities based on the proposed entities.
I figure this should be really simple and trivial, but everything I’ve tried has hit a brick wall. If I set up another ObjectContext and add just the entity I need, it then tries to add the whole graph and fails because the graph belongs to the other ObjectContext. I’ve tried MergeOption.NoTracking on my reference entities to get Attach/AddObject not to navigate through them and build a graph, to no avail. I’ve removed the navigation properties from the reference entities. I’ve tried AcceptAllChanges; that works, but it's pretty useless in practice as I do still want to track and save other entities. In a simple test, I can create two of my proposed entities, AddObject the one I want to save, Detach the one I don’t, and then call SaveChanges; this works, but again it's not great in practice (a sketch of this workaround follows the links below). Following are a few links to some of the nifty ideas which in the end don’t help, but which illustrate the complexity of EF for something so simple. I’m really looking for a SaveSingle/SaveAtomic method, and I think that’s a pretty reasonable and basic ask for any DAL, let alone a cutting-edge ORM.
Saving a single entity instead of the entire context
www.codeproject.com/KB/architecture/attachobjectgraph.aspx?fid=1534536&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=3071122&fr=1
bernhardelbl.spaces.live.com/blog/cns!DB54AE2C5D84DB78!238.entry
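For reference, here's roughly what that AddObject/Detach workaround looks like (EF1-era ObjectContext API; the entity set, type, and property names are invented for illustration – in the real scenario the second entity ends up tracked implicitly via its navigation properties rather than an explicit AddObject):

```csharp
using (var context = new MyEntities())
{
    var keep = new ProposedThing { Name = "save me" };
    var discard = new ProposedThing { Name = "not this one" };

    // Both entities end up tracked by the context...
    context.AddObject("ProposedThings", keep);
    context.AddObject("ProposedThings", discard);

    // ...so detach the one we don't want persisted; it stays in
    // memory but is no longer tracked by the context.
    context.Detach(discard);

    context.SaveChanges(); // only 'keep' is inserted
}
```

It works, but it forces you to know about and detach every entity you don't want saved, which doesn't scale.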
I'll answer this myself – so far I've found no solution for EF1. EF4 will allow you to implement this with self-tracking entities, i.e. you'll need to roll your own classes with T4 templates, so there's a bit of a learning curve there (see link at end).
For now, we've decided to give our domain objects interfaces (which irks me, as I really like working with POCO classes in NHibernate/WCF, which kills the need for this) and implement 'proposed' entities which we work with until the user decides to commit to the database, at which point we map them to an EntityObject.
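A rough sketch of that interim pattern (all names here are invented for illustration, not our actual classes):

```csharp
using System.Linq;

// A plain 'proposed' object, deliberately kept outside any
// ObjectContext so nothing is tracked until the user commits.
public interface IProposedArticle
{
    string Title { get; set; }
    string LanguageCode { get; set; }
}

public class ProposedArticle : IProposedArticle
{
    public string Title { get; set; }
    public string LanguageCode { get; set; }
}

public static class ArticleCommitter
{
    // On commit, map the proposed object onto a fresh EntityObject in
    // a short-lived context, so only this one entity gets saved.
    public static void Commit(IProposedArticle proposed)
    {
        using (var context = new MyEntities())
        {
            var language = context.Languages
                .First(l => l.Code == proposed.LanguageCode);

            var entity = new Article { Title = proposed.Title, Language = language };
            context.AddObject("Articles", entity);
            context.SaveChanges();
        }
    }
}
```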
Some actual answers here:
http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/32b04a36-0579-4d6f-af48-9cb670a3d9ff
For a hobby project I am building an application to keep track of my money: register everything that comes in and goes out. I am using SQLite as the database backend.
I have two data access models in mind.
Creating one master object as a sort of database connector, which contains methods that execute the queries and provide the required sets of data as lists of objects
Having the objects that need data execute the queries themselves
Which one of these is 'the best' and why? Or are there different, better models out there?
The latter option is better. In the first option, you would end up having to touch your universal data access object for just about any update to the code that wasn't purely a change in display logic. If you have different data access objects, then you will have much more testable, maintainable code.
I suggest you read up a bit on the model-view-controller paradigm. The wikipedia article on it is a good start: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller.
Also, you didn't say which language/platform you are coding in, but most platforms have numerous options for auto-generating a starting point for your data access classes from your database. You may find something like that useful.
Much of a muchness, really; the thing to avoid is having the "same" SQL sprinkled all over your code base.
The key point is this: you've just added a new column to Table1. When you do Find in Files for "Table1", how many hits are you going to get, and where?
If you use one class and there are a lot of DB operations, it's going to get very messy very quickly. But if you have one interface (say IModel) with one implementation, you can swap backends very easily (see the sketch below).
So: how many DB operations are there, and how likely is it that you will move away from SQLite?
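A minimal sketch of that idea (all names are invented for illustration):

```csharp
using System;
using System.Collections.Generic;

public class Transaction
{
    public DateTime Date { get; set; }
    public decimal Amount { get; set; }
    public string Description { get; set; }
}

// One interface describing the data operations the app needs; the
// backend (SQLite today, something else tomorrow) hides behind it.
public interface IModel
{
    IList<Transaction> GetTransactions(DateTime from, DateTime to);
    void AddTransaction(Transaction transaction);
}

// SQLite-backed implementation; swapping backends means writing
// another class with the same interface, not touching the callers.
public class SqliteModel : IModel
{
    private readonly string _connectionString;

    public SqliteModel(string connectionString)
    {
        _connectionString = connectionString;
    }

    public IList<Transaction> GetTransactions(DateTime from, DateTime to)
    {
        // ... run the SELECT against SQLite and map rows to Transactions ...
        throw new NotImplementedException();
    }

    public void AddTransaction(Transaction transaction)
    {
        // ... run the INSERT against SQLite ...
        throw new NotImplementedException();
    }
}
```

A Find in Files for any table name then leads to exactly one implementation class, which is the point made above.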
I have a client/project that I am working on and their solution is MASSIVE, using .NET 3.5/VS2008/MVC 2/EF/LINQ-SQL.
The problem is (well, where do I begin?) that everything is overly abstracted. It causes a lot of headaches just to create a new entity: a service repository, then a data repository, models, DTOs... the list goes on.
One of the problems I currently face is that if I need to build an IQueryable based on a related entity, LINQ-SQL can't translate that code. I can't even begin to explain why it's not possible.
Anyway, to cut to the chase: I've been given the task of finding a micro-type ORM which is easy to integrate into the solution, and which does the job right without all the hassle we are going through.
We also need it to work with IQueryable scenarios and to pull up related objects/entities easily.
Performance is another thing; it's currently pretty slow, but I do understand why.
My mind is currently all over the place, so apologies in advance if I don't make much sense.
I see that Dapper has a good reputation. I am leaning towards it, but how easy is it for me to integrate into a solution of:
ASP.NET/Some top layer
Infrastructure/LINQ-SQL
DTOs
ServiceRepository
DataRepository
About 96% of the DTOs/repos/entities/whatever are interfaced.
Personally, I would have gone down the route of using a DataReader and sprocs, parsing out the results. But I want something quick, clean (if possible), and less hassle.
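From what I've seen, a minimal Dapper call looks roughly like this (the type, table, and column names are my own invention; Dapper maps columns to properties by name):

```csharp
using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using Dapper; // adds the Query<T>() extension methods to IDbConnection

public class HistoryRow
{
    public int Id { get; set; }
    public string Description { get; set; }
    public DateTime OccurredAt { get; set; }
}

public static class HistoryQueries
{
    public static List<HistoryRow> RecentHistories(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            // Parameterised SQL in, strongly typed objects out; no
            // context, mapping files, or repository layers required.
            return connection.Query<HistoryRow>(
                @"SELECT Id, Description, OccurredAt
                  FROM Histories
                  WHERE OccurredAt >= @Since",
                new { Since = DateTime.UtcNow.AddDays(-30) })
                .ToList();
        }
    }
}
```

Though as far as I can tell, Dapper returns materialised results (IEnumerable&lt;T&gt;) rather than a composable IQueryable, which matters for the grid scenario below.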
I am emphasising IQueryable because some sections of the website need to search/filter through a LARGE table (histories). Using the Telerik grid control gives us the flexibility of composing the LINQ-SQL query before executing it.
But one of the problems I mentioned earlier is exactly that: LINQ-SQL cannot translate a scenario where there is a ViewModel containing flattened-out data and perhaps a DTO that contains the related entity we need.
Bleh. Complicated! Confused. I want to curl up and cry at times!
Thanks for your responses (and comfort!)
I am creating a framework that works with Core Data. One of the requirements for using my framework on your Core Data classes is that any entity you want to have the framework's capabilities must be a sub-entity and subclass of an entity I provide. For the sake of this question I will call that entity Foo.
Today I realized that Core Data stores all objects that are sub-entities of Foo in a single table called ZFOO. I'm worried about Core Data's performance if someone with massive data sets wants to use the framework, since ALL sub-entities of the Foo class will be stored in one enormous ZFOO table.
Any opinions or recommendations would be highly appreciated.
I worked with @deathbob on this project as the iOS lead. In our instance I had multiple classes which contained the attributes "remote_id" and "remote_update". I initially set the tables up using subclasses: I had a "RemoteEntity" abstract entity which contained those attributes, and a bunch of other entities which inherited from it, each with their own attributes. I thought we would end up with a bunch of tables, each with remote_id, remote_update, and their custom attributes. Instead we ended up with the massive table you describe.
The fix was pretty simple: don't set up the inheritance through the GUI. Instead, include all attributes for each entity, including the shared ones, in the Core Data modeller (this means "remote_id" and "remote_update" will appear in each entity). That said, we can still use a subclass in code. After generating your model classes, create the parent entity's class by hand; it must not exist in the GUI. It should inherit from NSManagedObject, and in the .m file the properties should use @dynamic instead of @synthesize. Now that you have the parent class, adjust the child classes: set their parent class to RemoteEntity (in my example) instead of NSManagedObject, then remove any properties that appear in the superclass (in my example, "remote_id" and "remote_update").
Here is an example of my super class https://gist.github.com/1121689.
I hope this helps, hat tip to @deathbob for pointing this out.
Last year I worked on a project that did the same thing: we stored everything in Core Data, and everything in Core Data inherited from a single class which had some common attributes.
We had somewhere between 1k and 10k records in Core Data, and performance degraded to the point where we rewrote it and removed the common ancestor. As I recall, simple searches were taking multiple seconds, and insertions/updates were pretty crappy too. It was only after things had gotten painfully slow that we cracked the DB open and noticed that, under the covers, Core Data was storing everything in one table.
Sorry, I don't remember specific numbers. The big takeaway was that we had to redo it because it was too slow; not "too slow for high-frequency trading" too slow, but "the app crashed on load while trying to populate the initial view out of Core Data" too slow.
So, with the grain of salt that this was on older iOS and older hardware, I would say definitely do not do this.
Hindsight is a wonderful thing.
As people are still reading this Q&A and referring to it in their questions and thinking that nothing has changed, I'd like to add a few comments for clarity and to provide a "modern" or more recent response.
Core Data is a powerful beast, but you must learn to control the beast, and thanks to the pioneers who have answered previously and the improvements that Apple has made to the framework (in particular in iOS 5), it is a lot easier to do today than it was a couple of years ago.
Initially I'd recommend learning how to prepare a solid and robust data model. There is a huge amount of information on this so I will leave it to the reader to investigate. As the previous answers mention, it is important to learn to prepare all relationships in the data model.
Beyond that, there are a number of mechanisms to control the size of the data set you fetch. It has not been better explained to me than in a book from The Pragmatic Bookshelf, "Core Data, 2nd Edition: Data Storage and Management for iOS, OS X, and iCloud" (Jan 2013) by Marcus S. Zarra, in particular Chapter 4, titled "Performance Tuning".
Read it.
Really newbie question coming up. Is there a standard (or good) way to deal with not needing all of the information that a database table contains loaded into every associated object? I'm thinking in the context of web pages, where you're only going to use the objects to build a single page rather than an application with longer-lived objects.
For example, let's say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand, if you're displaying a specific article, you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent-articles sidebar).
Some techniques I can think of:
Don't worry about it, just load everything from the database every time.
Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
Use one class, but set unused properties to null at creation if a field is not needed, and be careful.
Give the objects access to the database so they can load some fields on demand.
Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)
1) Loading the whole object is, unfortunately, what ORMs do by default. That is why hand-tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). By and large, though, the ORM projects I've seen result in a lot of lazy approaches, pulling or updating way more data than needed.
2) Different models (entities), depending on the operation. I prefer this one. It may add more classes to the domain, but to me it is the cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client and another for internal operations. If you use inheritance, you can do this well. For example, CustomerBase -> Customer: CustomerBase might have an ID, name and address, while Customer extends it to add other info, even stuff like passwords (see the sketch below). For list operations (list all customers) you can return CustomerBase with a custom query, but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize; most frameworks have whitelists of attributes they will and won't serialize. Use them.
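A quick sketch of that split (CustomerBase and Customer are from the description above; the members are invented for illustration):

```csharp
using System;

// Lightweight base: safe to serialize and cheap to fetch for lists.
public class CustomerBase
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
}

// Full model: adds fields that only individual CRUD operations should
// ever touch, and that must never leak into list/serialization paths.
public class Customer : CustomerBase
{
    public string PasswordHash { get; set; }
    public DateTime CreatedAt { get; set; }
}
```

A "list all customers" query projects into CustomerBase; Create/Retrieve/Update/Delete load and save the full Customer.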
3) Dangerous; special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not once per field (except for BLOBs).
You have a number of methods to solve your issue.
Use Stored Procedures in your database to remove the rows or columns you don't want. This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or Subsonic. There are many other ORM tools for .NET. Ruby has it built in with Rails. Java uses Hibernate.
Write embedded queries in your website. Don't forget to parameterize them, or you will open yourself up to hackers (see the sketch after this list). This option is usually frowned upon because of the mingling of SQL and code. It is also the easiest to break.
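As an illustration of that last option done safely, here is a parameterised embedded query in ADO.NET (the table and column names are placeholders):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

public static class ArticleQueries
{
    public static List<string> TitlesByAuthor(string connectionString, string authorName)
    {
        var titles = new List<string>();

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Title FROM Article WHERE Author = @Author", connection))
        {
            // User input travels as a parameter and is never concatenated
            // into the SQL string, which blocks SQL injection.
            command.Parameters.AddWithValue("@Author", authorName);
            connection.Open();

            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    titles.Add(reader.GetString(0));
            }
        }

        return titles;
    }
}
```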
From your list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Well, unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes are often called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without actually declaring any new class. E.g., using Linq-2-Sql, the expression data.Articles.Select(a => new { a.Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here are usually called "proxy objects", and/or their properties are referred to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. E.g., with NHibernate you can have lazy properties by simply throwing lazy=true into your mapping, and proxies are created automatically.
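To make the proxy idea concrete, here is a hand-rolled version (an ORM like NHibernate generates the equivalent for you; all names here are invented):

```csharp
using System;

// FullContents is only fetched the first time it is read; Title is
// available immediately. ORM proxies automate exactly this pattern.
public class ArticleProxy
{
    private readonly int _id;
    private readonly Func<int, string> _loadFullContents; // e.g. a DB call
    private string _fullContents;

    public ArticleProxy(int id, string title, Func<int, string> loadFullContents)
    {
        _id = id;
        Title = title;
        _loadFullContents = loadFullContents;
    }

    public string Title { get; private set; }

    public string FullContents
    {
        get
        {
            if (_fullContents == null)
                _fullContents = _loadFullContents(_id); // hit the DB on demand
            return _fullContents;
        }
    }
}
```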
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.
I am putting some heavy thought into rewriting the data access layer in my software (if you can even call it that). This was really my first project that uses a database, and things were done in an improper manner.
In my project, all of the data being pulled is stored in an ArrayList. Some of the data is converted from the ArrayList into a typed object before being put back into an ArrayList.
Also, there is no central set of queries in the application, which means some queries are copied and pasted; I want to eliminate that as well. This application has some custom objects that are very standard to the application, and some queries that are very standard to those objects.
I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if multiple objects are returned, return a list of those objects. Is this a good approach?
Also, if this is a good way of doing things, how should I return the data from the database? I am currently using SqlDataReader.Read and filling an ArrayList. I am sure this is not the best method to use here; I am just not really clear on how to improve it.
The reason for all of this is that I want to centralize all of the database operations into a few classes, rather than have them spread out amongst all of the classes in the project.
You should use an ORM. "Not doing so is stealing from your customers" - Ayende
One thing comes to mind right off the bat. Is there a reason you use ArrayLists instead of generics? If you're using .NET 1.1 I could understand, but it seems that one area where you could gain performance is to remove ArrayLists from the picture and stop converting and casting between types.
Another thing you might think about which can help a lot when designing data access layers is an ORM. NHibernate and LINQ to SQL do this very well. In general, the N-tier approach works well for what it seems like you're trying to accomplish. For example, performing data access in a class library with specific methods that can be reused is far better than "copy-pasting" the same queries all over the place.
I hope this helps.
It really depends on what you are doing. If it is a growing application with user interfaces and the like, you're right, there are better ways.
I am currently developing in ASP.NET MVC, and I find Linq to SQL really comfortable. Linq to SQL uses code generation to create a collection of code classes that model your data.
ScottGu has a really nice introduction to Linq to SQL on his blog:
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
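As a taste of what that looks like (assuming a generated DataContext; MyDataContext, its Users table, and the properties are illustrative):

```csharp
using System;
using System.Linq;

// MyDataContext and its Users table would be generated from the
// database by the LINQ to SQL designer.
using (var db = new MyDataContext())
{
    var recentUsers = from u in db.Users
                      where u.CreatedAt >= DateTime.Today.AddDays(-7)
                      orderby u.CreatedAt descending
                      select u;

    foreach (var user in recentUsers)
        Console.WriteLine("{0}: {1}", user.Id, user.Name);
}
```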
Over the past few projects I have used a base class which does all my ADO.NET work and which all my other data access classes inherit from. So my UserDB class inherits the DataAccessBase class. At the moment my UserDB class takes the data returned from the database and populates a User object, which is then returned to the calling business object. If multiple objects are returned, a generic list, i.e. List<Users>, is returned.
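A stripped-down sketch of that shape (the SQL, mapping, and member names are illustrative, not my actual code):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Base class owning the ADO.NET plumbing that every data access
// class would otherwise duplicate.
public abstract class DataAccessBase
{
    private readonly string _connectionString;

    protected DataAccessBase(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Runs a query and lets the subclass map each row to a typed object.
    protected List<T> ExecuteList<T>(string sql, Func<IDataRecord, T> map)
    {
        var results = new List<T>();
        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                    results.Add(map(reader));
            }
        }
        return results;
    }
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Table-specific class: knows its SQL and its mapping, nothing else.
public class UserDB : DataAccessBase
{
    public UserDB(string connectionString) : base(connectionString) { }

    public List<User> GetAllUsers()
    {
        return ExecuteList("SELECT Id, Name FROM Users",
            record => new User
            {
                Id = record.GetInt32(0),
                Name = record.GetString(1)
            });
    }
}
```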
There is a good article by Damon Armstrong (search Google for Damon Armstrong) which demonstrates how this can be achieved:
""http://www.simple-talk.com/dotnet/.net-framework/.net-application-architecture-the-data-access-layer/""
However, I have now started to move all of this over to the Entity Framework, as it performs much better and saves on all those manual CRUD operations. I was going to use LINQ to SQL, but as it seems it will be dead in the water very soon, I thought it would be best to invest my time in the next ORM.
"I am really just not sure if I should create a layer between my objects and the class that reads and writes to the database. This layer would take the data that comes from the database, type it as the proper object, and if there is a case of multiple objects being returned, return a list of those object. Is this a good approach?"
I'm a Java developer, but I believe that the language-agnostic answer is "yes".
Have a look at Martin Fowler's "Patterns Of Enterprise Application Architecture". I believe that technologies like LINQ were born for this.