Given a list of objects in C#, push them to RavenDB without knowing which ones already exist (batch processing)

Suppose I have 1,000 documents with a complex data structure, for example a Car class with three properties: Make, Model, and Id.
What is the most efficient way in C# to push these documents to RavenDB (preferably in a batch) without having to query the collection for each one to find out which to update and which to insert? At the moment I am doing it like this, which is totally inefficient.
Note: _session is a wrapper around IDocumentSession, where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var page = 0;
const int total = 30;
do
{
var paged = sales.Skip(page*total).Take(total);
if (!paged.Any()) return;
foreach (var sale in paged)
{
var current = sale;
var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
if (existing != null)
existing = current;
else
_session.Add(current);
}
_session.Commit();
page++;
} while (true);
}

Your session code doesn't seem to match the RavenDB API (we don't have Add or Commit).
Here is how you do this in RavenDB:
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
// IEnumerable<T> has no ForEach method, so loop explicitly.
foreach (var sale in sales)
{
session.Store(sale);
}
session.SaveChanges();
}

Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Make;
This is how you enable change tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uninteresting piece of code, I recommend using AutoMapper:
// Maps onto the existing, tracked instance instead of creating a new one
Mapper.Map(current, existing);
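To make the difference concrete, here is a minimal sketch in plain C# with no RavenDB involved. The Sale class and the ReferenceDemo helper are made up for illustration; the only point is that reassigning a local variable never touches the instance a session would be tracking, while copying properties does.

```csharp
using System;

// Hypothetical document class, modeled on the question's description.
public class Sale
{
    public string Id { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

public static class ReferenceDemo
{
    // Reassigning the local variable leaves the tracked instance untouched.
    public static Sale Reassign(Sale tracked, Sale incoming)
    {
        var existing = tracked;
        existing = incoming;   // only the local variable now points elsewhere
        return tracked;        // still holds its old values
    }

    // Copying properties mutates the tracked instance itself,
    // which is what a change tracker can detect.
    public static Sale CopyInto(Sale tracked, Sale incoming)
    {
        tracked.Make = incoming.Make;
        tracked.Model = incoming.Model;
        return tracked;
    }
}
```

Calling Reassign returns an object whose Model is unchanged, whereas CopyInto returns the same object with the incoming values applied.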
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: if you fetch a document by its Id, you will always want to use Load!
The main difference is that Load goes through the session's local cache and Query does not (the same applies to the equivalent NHibernate methods).
Ok, so now I can answer your question:
If I understand you correctly, you want to save a bunch of Sale instances to your database, adding those that don't exist yet and updating those that do. Right?
One way is to correct your sample code with the hints above and let it work. However, that will issue one unnecessary request (Session.Load(existingId)) per iteration. You can easily avoid that if you set up an index that selects the Ids of all documents in your Sales collection. Before you loop through your items, you can then load all the existing Ids at once.
However, I would like to know what you actually want to do. What is your domain/use-case?
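As a rough sketch of the "load the existing Ids first" idea, the split into updates and inserts can be done in memory once the Ids are known. This is plain C#; the SalePartitioner helper and the Sale class are hypothetical, and existingIds is assumed to have been fetched from the database in a single request beforehand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document class, modeled on the question's description.
public class Sale
{
    public string Id { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
}

public static class SalePartitioner
{
    // Splits incoming sales into those whose Id is already known and those that are new.
    // existingIds is assumed to come from one up-front query, not one lookup per sale.
    public static (List<Sale> ToUpdate, List<Sale> ToInsert) Split(
        IEnumerable<Sale> incoming, IEnumerable<string> existingIds)
    {
        var known = new HashSet<string>(existingIds); // O(1) membership checks
        var toUpdate = new List<Sale>();
        var toInsert = new List<Sale>();
        foreach (var sale in incoming)
        {
            if (known.Contains(sale.Id)) toUpdate.Add(sale);
            else toInsert.Add(sale);
        }
        return (toUpdate, toInsert);
    }
}
```

The hash set keeps the whole pass linear in the number of sales, instead of scanning the Id list once per item.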

This is what works for me right now. Note: The InjectFrom method comes from Omu.ValueInjecter (nuget package)
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var ids = sales.Select(i => i.Id);
var existingSales = _ravenSession.Load<Sale>(ids);
existingSales.ForEach(s => s.InjectFrom(sales.Single(i => i.Id == s.Id)));
var existingIds = existingSales.Select(i => i.Id);
var nonExistingSales = sales.Where(i => !existingIds.Any(x => x == i.Id));
nonExistingSales.ForEach(i => _ravenSession.Store(i));
_ravenSession.SaveChanges();
}

EF Core include related ids but not related entities

Before I go creating my own SQL scripts by hand for this, I have a scenario where I want to get the ids of a foreign key, but not the entirety of the foreign entities, using EF Core.
Right now, I'm getting the ids manually by looping through the related entities and extracting the ids one at a time, like so:
List<int> ClientIds = new List<int>();
for (var i = 0; i < Clients.Count; i++){
ClientIds.Add(Clients.ElementAt(i).Id);
}
To my understanding, this will either cause data returns much larger than needed (my entity + every related entity) or a completely separate query to be run for each related entity I access, which obviously I don't want to do if I can avoid it.
Is there a straightforward way to accomplish this in EF Core, or do I need to head over the SQL side and handle it myself?
Model:
public class UserViewModel {
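public UserViewModel(UserModel userModel){
// Build the list locally; the ClientIds property is only an IEnumerable<int>
var clientIds = new List<int>();
for (var i = 0; i < userModel.Clients.Count; i++){
clientIds.Add(userModel.Clients.ElementAt(i).Id);
}
ClientIds = clientIds;
//...all the other class assignments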
}
public IEnumerable<int> ClientIds {get;set;}
//...all the other irrelevant properties
}
Basically, I need my front-end to know which Client to ask for later.
It looks like you are trying to query this from within the parent entity. I.e.
public class Parent
{
public virtual ICollection<Client> Clients { get; set; }
public void SomeMethod()
{
// ...
List<int> ClientIds = new List<int>();
for (var i = 0; i < Clients.Count; i++)
{
ClientIds.Add(Clients.ElementAt(i).Id);
}
// ...
}
}
This is not ideal because unless your Clients were eager loaded when the Parent was loaded, this would trigger a lazy load to load all of the Clients data when all you want is the IDs. Still, it's not terrible as it would only result in one DB call to load the clients.
If they are already loaded, there is a more succinct way to get the IDs:
List<int> ClientIds = Clients.Select(x => x.Id).ToList();
Otherwise, if you have business logic involving the Parent and its Clients and want to be more selective about when and how the data is loaded, it is better to let the entity definition represent only the data state and basic rules about the data. Move the selective business logic out of the entity into a business logic container that scopes the DbContext and queries the entities for exactly what it needs.
For instance, if the calling code went and did this:
var parent = _context.Parents.Single(x => x.ParentId == parentId);
parent.SomeMethod(); // which resulted in checking the Client IDs...
The simplest way to avoid the extra DB call is to ensure the related entities are eager loaded.
var parent = _context.Parents
.Include(x => x.Clients)
.Single(x => x.ParentId == parentId);
parent.SomeMethod(); // which resulted in checking the Client IDs...
The problem with this approach is that it still loads all details of all the Clients, and you end up defaulting to eager loading everything all of the time, because the code might call something like SomeMethod() that expects to find related entity details. This is the use case for lazy loading, but that carries the performance overhead of ad-hoc DB hits and requires that the entity's DbContext is always available to perform the read.
Instead, you can move the logic out of the entity and into the caller, or into another container that takes the relevant details, and have it project just the data it needs from the entities in one efficient query:
var parentDetails = _context.Parents
.Where(x => x.ParentId == parentId)
.Select(x => new
{
x.ParentId,
// other details from parent or related entities...
ClientIds = x.Clients.Select(c => c.Id).ToList()
}).Single();
// Do logic that SomeMethod() would have done here, or pass these
// loaded details to a method / service to do the work rather than
// embedding it in the Entity.
This doesn't load a Parent entity, but rather executes a query to load just the details about the parent and related entities that we need. In this example it is projected into an anonymous type to hold the information we can later consume, but if you are querying the data to send to a view then you can project it directly into a view model or DTO class to serialize and send.
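For the DTO variant mentioned above, a minimal sketch could look like the following. ParentSummaryDto and ParentQueries are hypothetical names, and the entity classes are simplified stand-ins. The method takes an IQueryable<Parent>, so in an application you would presumably pass _context.Parents, while the same code can be exercised against an in-memory list via AsQueryable().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the entities discussed above.
public class Client
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Parent
{
    public int ParentId { get; set; }
    public List<Client> Clients { get; set; } = new List<Client>();
}

// Hypothetical DTO carrying only what the front end needs.
public class ParentSummaryDto
{
    public int ParentId { get; set; }
    public List<int> ClientIds { get; set; }
}

public static class ParentQueries
{
    // Projects straight into the DTO so that no full Client data is requested.
    public static ParentSummaryDto GetSummary(IQueryable<Parent> parents, int parentId)
    {
        return parents
            .Where(p => p.ParentId == parentId)
            .Select(p => new ParentSummaryDto
            {
                ParentId = p.ParentId,
                ClientIds = p.Clients.Select(c => c.Id).ToList()
            })
            .Single();
    }
}
```

Keeping the projection in a query helper like this, rather than in the entity or the view model constructor, is what lets the caller decide how much data to load.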

Why does SQLite in memory return a different result when using AsNoTracking?

I was writing tests using SQLite in-memory database with XUnit and ASP.NET Core 3.1 and found strange behavior.
Let's say that we have a User model and we want to change the property IsActive to false:
var u = new User {Id = Guid.NewGuid(), IsActive = true};
_db.Users.Add(u);
_db.SaveChanges();
u.IsActive = false;
// Returns false
var isActive = _db.Users.Single(x => x.Id == u.Id).IsActive;
// Returns true
var isActiveNoTracking = _db.Users.AsNoTracking().Single(x => x.Id == u.Id).IsActive;
// Fails.
Assert.Equal(isActive, isActiveNoTracking);
I get a different result depending on whether AsNoTracking() is called. Why is this happening? Isn't AsNoTracking() supposed to stop tracking changes made to the fetched object, not to mess with data that was already changed?
If I call SaveChanges() after changing the property then it is all good (as expected):
var u = new User {Id = Guid.NewGuid(), IsActive = true};
_db.Users.Add(u);
_db.SaveChanges();
u.IsActive = false;
_db.SaveChanges();
// Returns false
var isActive = _db.Users.Single(x => x.Id == u.Id).IsActive;
// Returns false
var isActiveNoTracking = _db.Users.AsNoTracking().Single(x => x.Id == u.Id).IsActive;
// Success.
Assert.Equal(isActive, isActiveNoTracking);
So I am confused, I'm not sure when SQLite in-memory actually commits changes. Sometimes you can fetch changes from db without calling SaveChanges() but sometimes you cannot.
Here is code related to db
public class SqliteInMemoryAppDbContext : AppDbContext
{
public SqliteInMemoryAppDbContext(IConfiguration configuration) : base(configuration)
{
}
protected override void OnConfiguring(DbContextOptionsBuilder options)
{
var connection = new SqliteConnection("DataSource=:memory:");
connection.Open();
options.UseSqlite(connection);
}
}
// I create db context for each test like this and dispose it after each test.
var _db = new SqliteInMemoryAppDbContext(null);
_db.Database.EnsureDeleted();
_db.Database.EnsureCreated();
"So I am confused, I'm not sure when SQLite in-memory actually commits changes. Sometimes you can fetch changes from db without calling SaveChanges() but sometimes you cannot."
This is impossible. For something to be saved to the DB you need to call SaveChanges. What happens here is that you see local objects and assume they are stored in your DB. I generally suggest using a DB query tool to learn how it works, because it can be confusing at first.
Entity Framework has some local objects that it stores. For example at your first query.
// returns true, because it checks the db due to no tracking
_db.Users.AsNoTracking().Where(x => x.IsActive).OrderBy(x=>x.Username).ToList()[0].IsActive
// returns false, it finds the local reference
_db.Users.Where(x => x.IsActive).OrderBy(x=>x.Username).ToList()[0].IsActive
As you can see from the comments above, the behavior differs depending on the command. It's not about when changes are saved to the DB; that happens only when you call SaveChanges. What is confusing you is when the 'queries' you write with EF reflect the DB and when they reflect local objects.
For SQL at least, I like to work with SQL Profiler to see what queries EF sends to the database. For example, in your case a query with WHERE and ORDER BY is sent to the DB.
EDIT:
As for how to understand when the DB is called, I suggest reading here.
To summarize: AsNoTracking always materializes a new entity from the database row, so it shows what is actually stored. A tracking query also goes to the database, but if the context already tracks an entity with the same key, it returns that local instance, including its unsaved changes.
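The behavior can be imitated with a toy model in plain C#. This is not how EF Core is implemented; ToyContext and its members are invented purely to illustrate the idea of a cache of tracked instances sitting in front of the persisted rows.

```csharp
using System;
using System.Collections.Generic;

public class User
{
    public Guid Id { get; set; }
    public bool IsActive { get; set; }
}

// Toy stand-in for a context: persisted rows plus a cache of tracked instances.
public class ToyContext
{
    private readonly Dictionary<Guid, bool> _rows = new Dictionary<Guid, bool>();    // "database" state
    private readonly Dictionary<Guid, User> _tracked = new Dictionary<Guid, User>(); // tracked instances

    public void Add(User user) => _tracked[user.Id] = user;

    // Writes the current state of every tracked instance to the "database".
    public void SaveChanges()
    {
        foreach (var user in _tracked.Values) _rows[user.Id] = user.IsActive;
    }

    // Tracking read: the row must exist, but an already tracked instance wins.
    public User Find(Guid id)
    {
        if (!_rows.ContainsKey(id)) throw new KeyNotFoundException();
        if (_tracked.TryGetValue(id, out var tracked)) return tracked;
        var user = new User { Id = id, IsActive = _rows[id] };
        _tracked[id] = user;
        return user;
    }

    // No-tracking read: always builds a fresh instance from the persisted row.
    public User FindNoTracking(Guid id) => new User { Id = id, IsActive = _rows[id] };
}
```

With this model, changing IsActive on a tracked user without calling SaveChanges makes Find report the new value and FindNoTracking report the old one, which mirrors the failing assertion in the question.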

Update Document with external object

I have a database containing Song objects. The Song class has more than 30 properties.
My music tagging application makes changes to a song on the file system.
It then does a lookup in the database using the filename.
Now I have a Song object that I created in my tagging application by reading the physical file, and a Song object that I have just retrieved from the database and want to update.
I thought I could just grab the ID from the database object, replace the database object with my local Song object, set the saved ID, and store it.
But Raven complains that I am replacing the object with a different object.
Do I really need to copy every single property over, like this?
dbSong.Artist = songfromFilesystem.Artist;
dbSong.Album = songfromFileSystem.Album;
Or are there other possibilities.
thanks,
Helmut
Edit:
I was a bit too optimistic. The suggestion below works only in a test program.
When doing it in my original code I get the following exception:
Attempted to associate a different object with id 'TrackDatas/3452'
This is produced by the following code:
try
{
originalFileName = Util.EscapeDatabaseQuery(originalFileName);
// Lookup the track in the database
var dbTracks = _session.Advanced.DocumentQuery<TrackData, DefaultSearchIndex>().WhereEquals("Query", originalFileName).ToList();
if (dbTracks.Count > 0)
{
track.Id = dbTracks[0].Id;
_session.Store(track);
_session.SaveChanges();
}
}
catch (Exception ex)
{
log.Error("UpdateTrack: Error updating track in database {0}: {1}", ex.Message, ex.InnerException);
}
I first look up a song in the database and get a TrackData object in dbTracks.
The track object is also of type TrackData. I just assign it the ID from the object I retrieved and try to store it, which gives the above error.
I would think that the above message tells me that the objects are of different types, which they aren't.
The same error happens if I use AutoMapper.
Any idea?
You can do what you're trying: replace an existing object using just the ID. If it's not working, you might be doing something else wrong. (In which case, please show us your code.)
When it comes to updating existing objects in Raven, there are a few options:
Option 1: Just save the object using the same ID as an existing object:
var song = ... // load it from the file system or whatever
song.Id = "Songs/5"; // Set it to an existing song ID
DbSession.Store(song); // Overwrites the existing song
Option 2: Manually update the properties of the existing object.
var song = ...;
var existingSong = DbSession.Load<Song>("Songs/5");
existingSong.Artist = song.Artist;
existingSong.Album = song.Album;
Option 3: Dynamically update the existing object:
var song = ...;
var existingSong = DbSession.Load<Song>("Songs/5");
existingSong.CopyFrom(song);
Where you've got some code like this:
// Inside Song.cs
public virtual void CopyFrom(Song other)
{
var props = typeof(Song)
.GetProperties(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance)
.Where(p => p.CanWrite);
foreach (var prop in props)
{
var source = prop.GetValue(other);
prop.SetValue(this, source);
}
}
If you find yourself having to do this often, use a library like AutoMapper.
AutoMapper can automatically copy one object to another with a single line of code.
Now that you've posted some code, I see 2 things:
First, is there a reason you're using the Advanced.DocumentQuery syntax?
// This is advanced query syntax. Is there a reason you're using it?
var dbTracks = _session.Advanced.DocumentQuery<TrackData, DefaultSearchIndex>().WhereEquals("Query", originalFileName).ToList();
Here's how I'd write your code using standard LINQ syntax:
var escapedFileName = Util.EscapeDatabaseQuery(originalFileName);
// Find the ID of the existing track in the database.
var existingTrackId = _session.Query<TrackData, DefaultSearchIndex>()
.Where(t => t.Query == escapedFileName)
.Select(t => t.Id)
.FirstOrDefault(); // null when no track matches
if (existingTrackId != null)
{
track.Id = existingTrackId;
_session.Store(track);
_session.SaveChanges();
}
Finally, #2: what is track? Was it loaded via session.Load or session.Query? If so, that's not going to work, and it's causing your problem. If track is loaded from the database, you'll need to create a new object and save that:
var escapedFileName = Util.EscapeDatabaseQuery(originalFileName);
// Find the ID of the existing track in the database.
var existingTrackId = _session.Query<TrackData, DefaultSearchIndex>()
.Where(t => t.Query == escapedFileName)
.Select(t => t.Id)
.FirstOrDefault(); // null when no track matches
if (existingTrackId != null)
{
var newTrack = new Track(...);
newTrack.Id = existingTrackId;
_session.Store(newTrack);
_session.SaveChanges();
}
This error means you already have a different object in the session with the same ID. The fix for me was to use a new session.

Updating complex type with ef code first

I have a complex type called account, which contains a list of licenses.
Licenses in turn contains a list of domains (a domain is a simple id + url string).
In my repository I have this code
public void SaveLicense(int accountId, License item)
{
Account account = GetById(accountId);
if (account == null)
{
return;
}
if (item.Id == 0)
{
account.Licenses.Add(item);
}
else
{
ActiveContext.Entry(item).State = EntityState.Modified;
}
ActiveContext.SaveChanges();
}
When I try to save an updated License (with modified domains) what happens is that strings belonging straight to the license get updated just fine.
However no domains get updated.
I should mention that what I have done is allow the user to add and remove domains in the user interface. Any new domains get id=0 and any deleted domains are simply not in the list.
so what I want is
Any domains that are in the list and database and NOT changed - nothing happens
Any domains that are in the list and database, but changed in the list - database gets updated
Any domains with id=0 should be inserted (added) into database
Any domains NOT in the list but that are in the database should be removed
I have played a bit with it with no success but I have a sneaky suspicion that I am doing something wrong in the bigger picture so I would love tips on if I am misunderstanding something design-wise or simply just missed something.
Unfortunately updating object graphs - entities with other related entities - is a rather difficult task and there is no very sophisticated support from Entity Framework to make it easy.
The problem is that setting the state of an entity to Modified (or generally to any other state) only influences the entity that you pass into DbContext.Entry and only its scalar properties. It has no effect on its navigation properties and related entities.
You must handle this object graph update manually by loading the entity that is currently stored in the database including the related entities and by merging all changes you have done in the UI into that original graph. Your else case could then look like this:
//...
else
{
var licenseInDb = ActiveContext.Licenses.Include(l => l.Domains)
.SingleOrDefault(l => l.Id == item.Id);
if (licenseInDb != null)
{
// Update the license (only its scalar properties)
ActiveContext.Entry(licenseInDb).CurrentValues.SetValues(item);
// Delete domains from DB that have been deleted in UI
foreach (var domainInDb in licenseInDb.Domains.ToList())
if (!item.Domains.Any(d => d.Id == domainInDb.Id))
ActiveContext.Domains.Remove(domainInDb);
foreach (var domain in item.Domains)
{
var domainInDb = licenseInDb.Domains
.SingleOrDefault(d => d.Id == domain.Id);
if (domainInDb != null)
// Update existing domains
ActiveContext.Entry(domainInDb).CurrentValues.SetValues(domain);
else
// Insert new domains
licenseInDb.Domains.Add(domain);
}
}
}
ActiveContext.SaveChanges();
//...
You can also try out this project called "GraphDiff" which intends to do this work in a generic way for arbitrary detached object graphs.
The alternative is to track all changes in some custom fields in the UI layer and then evaluate the tracked state changes when the data get posted back to set the appropriate entity states. Because you are in a web application it basically means that you have to track changes in the browser (most likely requiring some Javascript) while the user changes values, adds new items or deletes items. In my opinion this solution is even more difficult to implement.
This should be enough to do what you are looking to do. Let me know if you have more questions about the code.
public void SaveLicense(License item)
{
if (item.Id == 0)
{
// New license (id=0, as in the question): insert it
ActiveContext.Licenses.Add(item);
}
else if (item.Id > 0)
{
var currentItem = ActiveContext.Licenses
.Single(t => t.Id == item.Id);
// Copy the scalar values onto the tracked entity
ActiveContext.Entry(currentItem).CurrentValues.SetValues(item);
}
ActiveContext.SaveChanges();
}

Raven DB: How can I delete all documents of a given type

More specifically in Raven DB, I want to create a generic method with a signature like;
public void Clear<T>() {...
Then have Raven DB clear all documents of the given type.
I understand from other posts by Ayende to similar questions that you'd need an index in place to do this as a batch.
I think this would involve creating an index that maps each document type - this seems like a lot of work.
Does anyone know an efficient way of creating a method like the above that will do a set delete directly in the database?
I assume you want to do this from the .NET client. If so, use the standard DocumentsByEntityName index:
var indexQuery = new IndexQuery { Query = "Tag:" + collectionName };
session.Advanced.DocumentStore.DatabaseCommands.DeleteByIndex(
"Raven/DocumentsByEntityName",
indexQuery,
new BulkOperationOptions { AllowStale = true });
var hilo = session.Advanced.DocumentStore.DatabaseCommands.Get("Raven/Hilo/", collectionName);
if (hilo != null) {
session.Advanced.DocumentStore.DatabaseCommands.Delete(hilo.Key, hilo.Etag);
}
Where collectionName is the actual name of your collection.
The first operation deletes the items. The second deletes the HiLo file.
Also check out the official documentation - How to delete or update documents using index.
After much experimentation I found the answer to be quite simple, although far from obvious;
public void Clear<T>()
{
session.Advanced.DocumentStore.DatabaseCommands.PutIndex(indexName, new IndexDefinitionBuilder<T>
{
Map = documents => documents.Select(entity => new {})
});
session.Advanced.DatabaseCommands.DeleteByIndex(indexName, new IndexQuery());
}
Of course you almost certainly wouldn't define your index and do your delete in one go, I've put this as a single method for the sake of brevity.
My own implementation defines the indexes on application start as recommended by the documentation.
If you wanted to use this approach to actually index a property of T then you would need to constrain T. For example if I have an IEntity that all my document classes inherit from and this class specifies a property Id. Then a 'where T : IEntity' would allow you to use that property in the index.
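To illustrate the constraint, here is a small sketch in plain C#. IEntity, Song, and EntityProjections are hypothetical names, and the example projects over an in-memory sequence rather than inside an index definition; the same lambda shape is what the constraint would make legal there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical base interface that every document class implements.
public interface IEntity
{
    string Id { get; }
}

public class Song : IEntity
{
    public string Id { get; set; }
    public string Title { get; set; }
}

public static class EntityProjections
{
    // The 'where T : IEntity' constraint is what makes entity.Id compile for any T.
    public static List<string> SelectIds<T>(IEnumerable<T> documents) where T : IEntity
    {
        return documents.Select(entity => entity.Id).ToList();
    }
}
```

Without the constraint, the compiler knows nothing about T and the lambda could only produce something like the empty anonymous object used in Clear<T>() above.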
It's been said in other places, but it's also worth noting that once you define a static index Raven will probably use it, this can cause your queries to seemingly not return data that you've inserted:
RavenDB Saving to disk query
I had this problem as well and this is the solution that worked for me. I'm only working in a test project, so this might be slow for a bigger db, but Ryan's answer didn't work for me.
public static void ClearDocuments<T>(this IDocumentSession session)
{
var objects = session.Query<T>().ToList();
while (objects.Any())
{
foreach (var obj in objects)
{
session.Delete(obj);
}
session.SaveChanges();
objects = session.Query<T>().ToList();
}
}
You can do that using:
http://blog.orangelightning.co.uk/?p=105