How to effectively refresh many to many relationship - sql

Lets say I have entity A, which have many to many relationship with another entities of type A. So on entity A, I have collection of A. And lets say I have to "update" this relationships according to some external service - from time to time I receive notification that relations for certain entity has changed, and array of IDs of current related entities - some relations can be new, some existing, some of existing no longer there... How can I effectively update my database with EF ?
Some ideas:
eager load entity with its related entities, do foreach on collection of IDs from external service, and remove/add as needed. But this is not very effective - need to load possibly hundreds of related entities
clear current relations and insert new. But how ? Maybe perform delete by stored procedure, and then insert by "fake" objects
a.Related.Add(new A { Id = idFromArray })
but can this be done in transaction ? (call to stored procedure and then inserts done by SaveChanges)
or is there any 3rd way ?
Thanx.

Well, "from time to time" does not sound like a situation to think much about performance improvement (unless you mean "from millisecond to millisecond") :)
Anyway, the first approach is the correct idea to do this update without a stored procedure. And yes, you must load all old related entities because updating a many-to-many relationship goes only though EFs change detection. There is no exposed foreign key you could leverage to update the relations without having loaded the navigation properties.
An example how this might look in detail is here (fresh question from yesterday):
Selecting & Updating Many-To-Many in Entity Framework 4
(Only the last code snippet before the "Edit" section is relevant to your question and the Edit section itself.)
For your second solution you can wrap the whole operation into a manually created transaction:
using (var scope = new TransactionScope())
{
using (var context = new MyContext())
{
// ... Call Stored Procedure to delete relationships in link table
// ... Insert fake objects for new relationships
context.SaveChanges();
}
scope.Complete();
}

Ok, solution found. Of course, pure EF solution is the first one proposed in original question.
But, if performance matters, there IS a third way, the best one, although it is SQL server specific (afaik) - one procedure with table-valued parameter. All new related IDs goes in, and the stored procedure performs delete and inserts in transaction.
Look for the examples and performance comparison here (great article, i based my solution on it):
http://www.sommarskog.se/arrays-in-sql-2008.html

Related

What does Include() do in LINQ?

I tried to do a lot of research but I'm more of a db guy - so even the explanation in the MSDN doesn't make any sense to me. Can anyone please explain, and provide some examples on what Include() statement does in the term of SQL query?
Let's say for instance you want to get a list of all your customers:
var customers = context.Customers.ToList();
And let's assume that each Customer object has a reference to its set of Orders, and that each Order has references to LineItems which may also reference a Product.
As you can see, selecting a top-level object with many related entities could result in a query that needs to pull in data from many sources. As a performance measure, Include() allows you to indicate which related entities should be read from the database as part of the same query.
Using the same example, this might bring in all of the related order headers, but none of the other records:
var customersWithOrderDetail = context.Customers.Include("Orders").ToList();
As a final point since you asked for SQL, the first statement without Include() could generate a simple statement:
SELECT * FROM Customers;
The final statement which calls Include("Orders") may look like this:
SELECT *
FROM Customers JOIN Orders ON Customers.Id = Orders.CustomerId;
I just wanted to add that "Include" is part of eager loading. It is described in Entity Framework 6 tutorial by Microsoft. Here is the link:
https://learn.microsoft.com/en-us/aspnet/mvc/overview/getting-started/getting-started-with-ef-using-mvc/reading-related-data-with-the-entity-framework-in-an-asp-net-mvc-application
Excerpt from the linked page:
Here are several ways that the Entity Framework can load related data into the navigation properties of an entity:
Lazy loading. When the entity is first read, related data isn't retrieved. However, the first time you attempt to access a navigation property, the data required for that navigation property is automatically retrieved. This results in multiple queries sent to the database — one for the entity itself and one each time that related data for the entity must be retrieved. The DbContext class enables lazy loading by default.
Eager loading. When the entity is read, related data is retrieved along with it. This typically results in a single join query that retrieves all of the data that's needed. You specify eager loading by using the Include method.
Explicit loading. This is similar to lazy loading, except that you explicitly retrieve the related data in code; it doesn't happen automatically when you access a navigation property. You load related data manually by getting the object state manager entry for an entity and calling the Collection.Load method for collections or the Reference.Load method for properties that hold a single entity. (In the following example, if you wanted to load the Administrator navigation property, you'd replace Collection(x => x.Courses) with Reference(x => x.Administrator).) Typically you'd use explicit loading only when you've turned lazy loading off.
Because they don't immediately retrieve the property values, lazy loading and explicit loading are also both known as deferred loading.
Think of it as enforcing Eager-Loading in a scenario where your sub-items would otherwise be lazy-loading.
The Query EF is sending to the database will yield a larger result at first, but on access no follow-up queries will be made when accessing the included items.
On the other hand, without it, EF would execute separte queries later, when you first access the sub-items.
include() method just to include the related entities.
but what happened on sql is based on the relationship between those entities which you are going to include what the data you going to fetch.
your LINQ query decides what type of joins have to use, there could be left outer joins there could be inner join there could be right joins etc...
#Corey Adler
Remember that you should use .Include() and .ThenInclude() only when returning the object (NOT THE QUERYABLE) with the "other table property".
As a result, it should only be used when returning APIs' objects, not in your intra-application.

Yii: combination of CDbCommand and ActiveRecord queries in one Controller/Action

I was wondering if it is a good (acceptable) practice to combine those to ways of retrieving/updating database data?
For example, in my database I have two tables (Books and Users) and one "many-to-many" table Books_Users. When a user rates a book, the Books_Users table should be updated (a new record with a book_id and a user_id should be whether inserted or deleted).
I googled ways of doing it using AR methods only, but I haven't found any good solution. I ended up using CDbCommand execute() and very simple SQL-query like INSERT INTO books_users(book_id, user_id) VALUES(:bid , :uid); in a BookController action.
The point is that all my models extend CActiveRecord, and I use AR methods all the way.
So here is the question: is that kind of blending of different approaches could be used without remorse, or I should get rid of it immediately and write the code in some "proper way"?
Yii does support Many_TO_Many relations (to some degree) and this support has been improving through the 1.1.x releases http://www.yiiframework.com/doc/guide/database.arr.
Generally i don't think you will have to use CDbCommand & get dirty with SQL, you shouldn't face any problems doing it with AR specially the retrieval part, However, Insertion (Create/Update) Could be a problem (not a huge one though) since it can be solved with some triggers either on database level (database triggers) or App level (Model afterCreate() & afterUpdate()) to automate populating/updating the middle table (pivot) records.
Another (cleaner) way would be to use this extension: http://www.yiiframework.com/extension/cadvancedarbehavior/ which should do the job for you.
Last thing: take a look at this question and this one for related inquires.

Best practice for many to many data mapping

I am looking to find the best practice for many to many mapping in database.
For example I have two tables to be mapped. I create third table to store this mapping.
On UI I have several A to be mapped(or not) multiple with B. And I see two solutions for now:
1 - On every update for every record from A I will delete all mapped data for it and insert new data mapping.
Advantage: I store only mapped data.
Disadvantage: I need to use delete and insert statement every time.
2 - I need to add new bit column to AB table with name isMapped. And I will store all mapping for every record from A to every record from B. On save mapping action I will use only update statement.
Advantage: No need to delete and insert every time.
Disadvantage: Need to store unnecessary records.
Can you offer me best solution?
Thanks
between the 2 options you have listed I would go with option no 1, isMapped is not meaningful, if they are not mapped the records should not exists in the first place.
you still have one more option though:
DELETE FROM AB where Not in the new map
INSERT INTO AB FROM (New map) where NOT in AB
if these are a lot of maps I would delete and insert from the new mapping, otherwise I would just delete all then insert like you are suggesting.
I'd say anytime you see the second bullet point in your #2 scenario
"Need to store unnecessary records"
that's your red flag not to use that scenario.
Your data is modeled correctly in scenario 1, i.e. mapppings exist in the mapping table when there are mappings between records in A and B and mappings do not exist in the mapping table when there is not a mapping between those records in A and B.
Also, the underlying mechanics of an update statement are a delete and then an insert, so you are not really saving the database any work by issuing one over the other.
Lastly, speaking of saving the database work, don't try and do it at this stage. This is what they are designed for. :)
Implementing your data model correctly as you are in Scenario 1 is the best optimization you can make.
Once you have the basic normalized structure in place and have some test data, then you can start testing performance and refactoring if necessary. Adding indexes, changing data structures, etc.

Designing an append-only data access layer with LINQ to SQL

I have an application in mind which dicates database tables be append-only; that is, I can only insert data into the database but never update or delete it. I would like to use LINQ to SQL to build this.
Since tables are append-only but I still need to be able to "delete" data, my thought is that each table Foo needs to have a corresponding FooDeletion table. The FooDeletion table contains a foreign key which references a Foo that has been deleted. For example, the following tables describe the state "Foos 1, 2, and 3 exist, but Foo 2 and Foo 3 have been deleted".
Foo FooDeletion
id id fooid
---- -------------
1 1 2
2 2 3
3
Although I could build an abstraction on top of the data access layer which (a) prevents direct access to LINQ to SQL entities and (b) manages deletions in this manner, one of my goals is to keep my data access layer as thin as possible, so I'd prefer to make the DataContext or entity classes do the work behind the scenes. So, I'd like to let callers use Table<Foo>.DeleteOnSubmit() like normal, and the DAL knows to add a row to FooDeletion instead of deleting a row from Foo.
I've read through "Implementing Business Logic" and "Customizing the Insert, Update, and Delete Behavior of Entity Classes", but I can't find a concrete way to implement what I want. I thought I could use the partial method DataContext.DeleteFoo() to instead call ExecuteDynamicInsert(FooDeletion), but according to this article, "If an inapplicable method is called (for example, ExecuteDynamicDelete for an object to be updated), the results are undefined".
Is this a fool's errand? Am I making this far harder on myself than I need to?
You have more than one option - you can either:
a) Override SubmitChanges, take the change set (GetChangeSet()) and translate updates and deletes into inserts.
b) Use instead-of triggers db-side to change the updates/delete behavior.
c) Add a new Delete extension method to Table that implements the behavior you want.
d) ...or combine a+b+c as needed...
if you want a big-boy enterprise quality solution, you'd put it in the database - either b) from above or CRUD procedures <- my preference... triggers are evil.
If this is a small shop, not a lot of other developers or teams, or data of minimal value such that a second or third app trying to access the data isn't likely than stick with whatever floats your boat.

How can one delete an entity in nhibernate having only its id and type?

I am wondering how can one delete an entity having just its ID and type (as in mapping) using NHibernate 2.1?
If you are using lazy loading, Load only creates a proxy.
session.Delete(session.Load(type, id));
With NH 2.1 you can use HQL. Not sure how it actually looks like, but something like this: note that this is subject to SQL injection - if possible use parametrized queries instead with SetParameter()
session.Delete(string.Format("from {0} where id = {1}", type, id));
Edit:
For Load, you don't need to know the name of the Id column.
If you need to know it, you can get it by the NH metadata:
sessionFactory.GetClassMetadata(type).IdentifierPropertyName
Another edit.
session.Delete() is instantiating the entity
When using session.Delete(), NH loads the entity anyway. At the beginning I didn't like it. Then I realized the advantages. If the entity is part of a complex structure using inheritance, collections or "any"-references, it is actually more efficient.
For instance, if class A and B both inherit from Base, it doesn't try to delete data in table B when the actual entity is of type A. This wouldn't be possible without loading the actual object. This is particularly important when there are many inherited types which also consist of many additional tables each.
The same situation is given when you have a collection of Bases, which happen to be all instances of A. When loading the collection in memory, NH knows that it doesn't need to remove any B-stuff.
If the entity A has a collection of Bs, which contains Cs (and so on), it doesn't try to delete any Cs when the collection of Bs is empty. This is only possible when reading the collection. This is particularly important when C is complex of its own, aggregating even more tables and so on.
The more complex and dynamic the structure is, the more efficient is it to load actual data instead of "blindly" deleting it.
HQL Deletes have pitfalls
HQL deletes to not load data to memory. But HQL-deletes aren't that smart. They basically translate the entity name to the corresponding table name and remove that from the database. Additionally, it deletes some aggregated collection data.
In simple structures, this may work well and efficient. In complex structures, not everything is deleted, leading to constraint violations or "database memory leaks".
Conclusion
I also tried to optimize deletion with NH. I gave up in most of the cases, because NH is still smarter, it "just works" and is usually fast enough. One of the most complex deletion algorithms I wrote is analyzing NH mapping definitions and building delete statements from that. And - no surprise - it is not possible without reading data from the database before deleting. (I just reduced it to only load primary keys.)