We seem to have verified that RavenDB is returning stale results even when we use various flavors of "WaitForNonStaleResults". Below is a fully functional sample (written as a standalone test so that you can copy/paste it and run it as is).
public class Cart
{
public virtual string Email { get; set; }
}
[Test]
public void StandaloneTestForPostingOnStackOverflow()
{
var testDocument = new Cart { Email = "test@abc.com" };
var documentStore = new EmbeddableDocumentStore { RunInMemory = true };
documentStore.Initialize();
using (var session = documentStore.OpenSession())
{
using (var transaction = new TransactionScope())
{
session.Store(testDocument);
session.SaveChanges();
transaction.Complete();
}
using (var transaction = new TransactionScope())
{
var documentToDelete = session
.Query<Cart>()
.Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
.First(c => c.Email == testDocument.Email);
session.Delete(documentToDelete);
session.SaveChanges();
transaction.Complete();
}
RavenQueryStatistics statistics;
var actualCount = session
.Query<Cart>()
.Statistics(out statistics)
.Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
.Count(c => c.Email == testDocument.Email);
Assert.IsFalse(statistics.IsStale);
Assert.AreEqual(0, actualCount);
}
}
We have tried every flavor of WaitForNonStaleResults and there is no change. Waiting for non-stale results seems to work fine for the update, but not for the delete.
Update
Some things which I have tried:
Using separate sessions for each action. Outcome: no difference. Same successes and failures.
Putting Thread.Sleep(500) before the final query. Outcome: success. If I sleep the thread for half a second, the count comes back zero like it should.
Re: my comment above on stale results, AllowNonAuthoritiveInformation wasn't working. Needing to put WaitForNonStaleResults in each query, which is the usual "answer" to this issue, feels like a massive "code smell" (as much as I normally hate the term, it seems completely appropriate here).
The only real solution I've found so far is:
var store = new DocumentStore(); // do whatever
store.DatabaseCommands.DisableAllCaching();
Performance suffers accordingly, but I think slower performance is far less of a sin than unreliable, if not outright inaccurate, results.
This is an old question, but I recently ran across this problem as well. I was able to work around it by changing a convention on the DocumentStore used by the session to make it always wait for non-stale results as of the last write:
session.DocumentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.AlwaysWaitForNonStaleResultsAsOfLastWrite;
This made it so that I didn't have to customize every query run afterwards. That said, I believe this only works for queries; it definitely doesn't work on patches, as I found out through testing.
I would also be careful with this and only use it around the code that needs it, as it can cause performance issues. You can set the store back to its default with the following:
session.DocumentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.None;
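As a sketch of that pattern (documentStore here stands for the store behind the session; the try/finally just guarantees the reset):
documentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.AlwaysWaitForNonStaleResultsAsOfLastWrite;
try
{
    // queries that must see the latest writes go here
}
finally
{
    documentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.None;
}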
The problem isn't related to deletes; it is related to using TransactionScope. The problem here is that DTC transactions complete asynchronously.
To fix this issue, what you need to do is call:
session.Advanced.AllowNonAuthoritiveInformation = false;
This forces RavenDB to wait for the transaction to complete.
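In the context of the test above, that flag would be set on the session before the transactions run; a minimal sketch:
using (var session = documentStore.OpenSession())
{
    // Force RavenDB to wait for the DTC transaction to fully commit
    // before treating query results as authoritative.
    session.Advanced.AllowNonAuthoritiveInformation = false;

    // ... store / delete / count code from the test above ...
}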
Related
I want to sync data from a remote API, something like 1M records, but the whole process takes about 5 minutes.
As a user experience, that's very bad; I want the whole process to take less than a second.
I mainly use a .NET 6 Web API with SQLite and EF Core.
I searched a lot and used BulkInsert and BulkSaveChangesAsync, and it still takes a long time.
That's still a very bad user experience. I tried the commented-out solutions below and had the same problem. I want to make it fast enough that the user doesn't notice a sync running in the background.
Note: I also stopped all indexes while inserting the data to make the process faster, and still had the same problem.
Note: my app is a monolith.
I know I could use something like an Azure Function, but that would be over-engineering.
I want the simplest way to solve this. I've searched YouTube, GitHub and Stack Overflow and found nothing that helps as much as I'd like.
Note: I'm writing the data to two tables.
First table: contains only 5 rows.
Second table: contains 3 rows.
public async Task<IEnumerable<DatumEntity>> SyncCities()
{
var httpClient = _httpClientFactory.CreateClient("Cities");
var httpResponseMessage = await httpClient.GetAsync(
"API_KEY_WITH_SOME_CREDS");
if (httpResponseMessage.IsSuccessStatusCode)
{
using var contentStream =
await httpResponseMessage.Content.ReadAsStreamAsync();
var result = await JsonSerializer.DeserializeAsync<Result>(contentStream);
var datums = result!.Data;
if (datums.Any())
{
//First solution
//_context.Datums.AddRange(datums);
//await _context.SaveChangesAsync();
//second solution
//await _context.BulkInsertAsync(datums);
//await _context.BulkSaveChangesAsync();
//Thread solution
//ThreadPool.QueueUserWorkItem(async delegate
//{
// _context.Datums.AddRange(datums);
// await _context.BulkSaveChangesAsync();
//});
}
return datums;
}
return Enumerable.Empty<DatumEntity>();
}
Tried: BulkInsert, the ThreadPool, stopping all indexes, and a lot of other things; nothing helped as much as I thought it would.
I want the whole process to take less than a second, because the user does not move away from the application and the wait is a bad user experience.
This ThreadPool-based approach solved the issue for me:
if (datums.Any())
{
ThreadPool.QueueUserWorkItem(async _ =>
{
using (var scope = _serviceScopeFactory.CreateScope())
{
var context = scope.ServiceProvider
.GetRequiredService<CitiesDbContext>();
context.Datums.AddRange(datums);
await context.SaveChangesAsync();
};
});
}
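For completeness, this assumes an IServiceScopeFactory has been injected into the containing service, roughly like this (the class name is illustrative):
public class CitySyncService
{
    private readonly IHttpClientFactory _httpClientFactory;
    private readonly IServiceScopeFactory _serviceScopeFactory;

    public CitySyncService(IHttpClientFactory httpClientFactory, IServiceScopeFactory serviceScopeFactory)
    {
        _httpClientFactory = httpClientFactory;
        _serviceScopeFactory = serviceScopeFactory;
    }

    // SyncCities (shown above) lives here; it returns the data immediately
    // and lets the queued work item write it in the background.
}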
This code always fails with a ConcurrencyException:
[Test]
public void EventOrderingCode_Fails_WithConcurrencyException()
{
Guid id = Guid.NewGuid();
using (var scope1 = new TransactionScope())
using (var session = DataAccess.NewOpenSession)
{
session.Advanced.UseOptimisticConcurrency = true;
session.Advanced.AllowNonAuthoritativeInformation = false;
var ent1 = new CTEntity
{
Id = id,
Name = "George"
};
using (var scope2 = new TransactionScope(TransactionScopeOption.RequiresNew))
{
session.Store(ent1);
session.SaveChanges();
scope2.Complete();
}
var ent2 = session.Load<CTEntity>(id);
ent2.Name = "Gina";
session.SaveChanges();
scope1.Complete();
}
}
It fails at the last session.SaveChanges, stating that a non-current etag is being used. If I use Required instead of RequiresNew for scope2 (i.e. the same transaction), it works.
Now, since I load the entity (ent2), it should be using the newest etag, unless it is some cached value attached to scope1 (but I have disabled caching). So I do not understand why this fails.
I really need this setup. In the production code the outer TransactionScope is created by NServiceBus, and the inner is for controlling an aspect of event ordering. It cannot be the same Transaction.
And I need the optimistic concurrency too, in case other threads use the entity at the same time.
BTW: This is using Raven 2.0.3.0
Since no one else has answered, I had better give it a go myself.
It turns out this was human error. Due to a bad configuration of our IoC container, DataAccess.NewOpenSession gave me the same session all the time (across other tests). In other words, Raven works as expected :)
Before I found this out I also experimented with using TransactionScopeOption.Suppress instead of RequiresNew. That also worked; I just had to make sure that whatever I did in the suppressed scope could not fail, which was a valid option in my case.
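For reference, the Suppress variant only changes the option on the inner scope of the test above:
using (var scope2 = new TransactionScope(TransactionScopeOption.Suppress))
{
    session.Store(ent1);
    session.SaveChanges();
    scope2.Complete();
}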
I'm using RavenDB 2.5. I need to wait for the non-stale index first, and if it times out after 15 seconds, query the stale index rather than throw a timeout exception. Here is my code.
RavenQueryStatistics stats;
var result = queryable.Statistics(out stats).Take(maxPageSize).ToList();
if (stats.IsStale)
{
try
{
return queryable.Customize(x => x.WaitForNonStaleResultsAsOfLastWrite(TimeSpan.FromSeconds(15))).ToList();
}
catch (Exception)
{
return result;
}
}
else
{
return result;
}
I need to add an extension method to make the above code work for all queries, for example:
public static List<T> ToList<T>(this IRavenQueryable<T> queryable)
I may also need to add extension methods to override .All(), .Any(), .Contains(), .Count(), .ToList(), .ToArray(), .ToDictionary(), .First(), .FirstOrDefault(), .Single(), .SingleOrDefault(), .Last(), .LastOrDefault(), etc.
I wonder if there is a better solution for this. What's the best practice?
Does RavenDB have an AOP-style hook so that, when the timeout exception is thrown, we can switch the query to the stale index and return the stale results?
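For illustration, the kind of extension method I have in mind would look roughly like this (an untested sketch; the method name and the maxPageSize parameter are mine):
public static List<T> ToListAllowStale<T>(this IRavenQueryable<T> queryable, int maxPageSize)
{
    RavenQueryStatistics stats;
    var result = queryable.Statistics(out stats).Take(maxPageSize).ToList();
    if (!stats.IsStale)
        return result;

    try
    {
        return queryable
            .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite(TimeSpan.FromSeconds(15)))
            .Take(maxPageSize)
            .ToList();
    }
    catch (Exception)
    {
        // Timed out waiting for the non-stale index; fall back to the stale results.
        return result;
    }
}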
Depending on your requirements, I would prefer to have this as two separate calls from the end-user client. First issue a query without waiting for non-stale results and show the results to the end-user immediately. If the results are stale, make that visible to the end-user and issue a second query to the server where you wait for non-stale results.
That way the end-user will always get something to see quickly without having to wait potentially 15 seconds even for stale results.
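A minimal sketch of that split, reusing the queryable and maxPageSize from the question (the staleness flag would be returned to the client with the first result set):
// Call 1: return whatever the index has right now, plus a staleness flag.
RavenQueryStatistics stats;
var quickResults = queryable.Statistics(out stats).Take(maxPageSize).ToList();
var isStale = stats.IsStale;

// Call 2 (issued by the client only if it saw isStale == true):
var freshResults = queryable
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite(TimeSpan.FromSeconds(15)))
    .Take(maxPageSize)
    .ToList();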
You can force the document store to always wait for the last write; then you can use queries without Customize instructions:
documentStore.Conventions.DefaultQueryingConsistency = ConsistencyOptions.QueryYourWrites;
Note: if the index is very busy, or you write something to the DB and query related data immediately, using async instead of a timeout works better.
//deal with very busy index
using (var session = documentStore.OpenAsyncSession())
{
var result = await session.Query<...>()
.Where(x => ...)
.ToListAsync();
}
//write then read
using (var session = documentStore.OpenAsyncSession())
{
await session.StoreAsync(entity);
await session.SaveChangesAsync();
//query related data of entity
var result = await session.Query<...>()
.Where(x => ...)
.ToListAsync();
}
I am working on an app and need to keep track of how many views a page has, almost like how SO does it. It is a value used to determine how popular a given page is.
I am concerned that writing to the DB every time a new view needs to be recorded will impact performance. I know this is borderline premature optimization, but I have experienced the problem before. Anyway, the value doesn't need to be real time; it is OK if it is delayed by 10 minutes or so. I was thinking that caching the data and doing one large write every X minutes should help.
I am running on Windows Azure, so the AppFabric cache is available to me. My original plan was to create some sort of compound key (PostID:UserID) and tag the key with "pageview". AppFabric allows you to get all keys by tag, so I could let them build up and do one bulk insert into my table instead of many small writes. The table looks like this, but is open to change.
int PageID | guid userID | DateTime ViewTimeStamp
The website would still read the value from the database; only the writes would be delayed. Make sense?
I just read that the Windows Azure AppFabric cache does not support tag-based searches, which pretty much negates my idea.
My question is, how would you accomplish this? I am new to Azure, so I am not sure what my options are. Is there a way to use the cache without tag-based searches? I am just looking for advice on how to delay these writes to SQL.
You might want to take a look at http://www.apathybutton.com (and the Cloud Cover episode it links to), which talks about a highly scalable way to count things. (It might be overkill for your needs, but hopefully it gives you some options.)
You could keep a queue in memory and, on a timer, drain the queue, collapse the queued items by totaling the counts per page, and write them in one SQL batch/round trip. For example, using a table-valued parameter (TVP) you could write the queued totals with one stored procedure call.
That of course doesn't guarantee the view counts get written, since they live in memory and are written lazily, but page counts shouldn't be critical data and crashes should be rare.
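A rough sketch of that TVP call, assuming a user-defined table type dbo.PageViewTotal (PageID int, ViewCount int) and a stored procedure dbo.AddPageViews that accepts it (both names are invented here):
// queuedTotals: the collapsed per-page counts, e.g. a Dictionary<int, int> of PageID -> hits
var table = new DataTable();
table.Columns.Add("PageID", typeof(int));
table.Columns.Add("ViewCount", typeof(int));
foreach (var total in queuedTotals)
    table.Rows.Add(total.Key, total.Value);

using (var con = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.AddPageViews", con))
{
    cmd.CommandType = CommandType.StoredProcedure;
    var tvp = cmd.Parameters.AddWithValue("@views", table);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.PageViewTotal";
    con.Open();
    cmd.ExecuteNonQuery();
}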
You might want to have a look at how the "diagnostics" feature in Azure works. Not because you would use diagnostics for what you are doing, but because it deals with a similar problem and may provide some inspiration. I am just about to implement a data auditing feature that logs to table storage, so I also want to delay and batch the updates, and I have taken a lot of inspiration from diagnostics.
The way Diagnostics in Azure works is that each role starts a little background "transfer" thread. Whenever you write any traces, they get stored in a list in local memory, and the background thread will (by default) batch all the requests up and transfer them to table storage every minute.
In your scenario, I would let each role instance keep track of a count of hits and then use a background thread to update the database every minute or so.
I would probably use something like a static ConcurrentDictionary (or one hanging off a singleton) on each web role, with each hit incrementing the counter for the page identifier. You'd need some thread-handling code to allow multiple requests to update the same counter in the list. Alternatively, just allow each "hit" to add a new record to a shared thread-safe list.
Then have a background thread, once per minute, update the database with the number of hits per page since last time and reset the local counters to 0 (or empty the shared list if you are going with that approach); again, be careful about the multithreading and locking.
The important thing is to make sure your database update is atomic: if you read the current count from the database, increment it, and then write it back, two different web role instances may do this at the same time and one update will be lost.
EDIT:
Here is a quick sample of how you could go about this.
using System.Collections.Concurrent;
using System.Data.SqlClient;
using System.Threading;
using System;
using System.Collections.Generic;
using System.Linq;
class Program
{
static void Main(string[] args)
{
// You would put this in your Application_start for the web role
Thread hitTransfer = new Thread(() => HitCounter.Run(new TimeSpan(0, 0, 1))); // You'd probably want the transfer to happen once a minute rather than once a second
hitTransfer.Start();
//Testing code - this just simulates various web threads being hit and adding hits to the counter
RunTestWorkerThreads(5);
Thread.Sleep(5000);
// You would put the following line in your Application shutdown
HitCounter.StopRunning(); // You could do some cleverer stuff with aborting threads, joining the thread etc but you probably won't need to
Console.WriteLine("Finished...");
Console.ReadKey();
}
private static void RunTestWorkerThreads(int workerCount)
{
Thread[] workerThreads = new Thread[workerCount];
for (int i = 0; i < workerCount; i++)
{
workerThreads[i] = new Thread(
(tagname) =>
{
Random rnd = new Random();
for (int j = 0; j < 300; j++)
{
HitCounter.LogHit(tagname.ToString());
Thread.Sleep(rnd.Next(0, 5));
}
});
workerThreads[i].Start("TAG" + i);
}
foreach (var t in workerThreads)
{
t.Join();
}
Console.WriteLine("All threads finished...");
}
}
public static class HitCounter
{
private static System.Collections.Concurrent.ConcurrentQueue<string> hits;
private static object transferlock = new object();
private static volatile bool stopRunning = false;
static HitCounter()
{
hits = new ConcurrentQueue<string>();
}
public static void LogHit(string tag)
{
hits.Enqueue(tag);
}
public static void Run(TimeSpan transferInterval)
{
while (!stopRunning)
{
Transfer();
Thread.Sleep(transferInterval);
}
}
public static void StopRunning()
{
stopRunning = true;
Transfer();
}
private static void Transfer()
{
lock(transferlock)
{
var tags = GetPendingTags();
var hitCounts = from tag in tags
group tag by tag
into g
select new KeyValuePair<string, int>(g.Key, g.Count());
WriteHits(hitCounts);
}
}
private static void WriteHits(IEnumerable<KeyValuePair<string, int>> hitCounts)
{
// NOTE: I don't usually use sql commands directly and have not tested the below
// The idea is that the update should be atomic so even though you have multiple
// web servers all issuing similar update commands, potentially at the same time,
// they should all commit. I do urge you to test this part as I cannot promise this code
// will work as-is
//using (SqlConnection con = new SqlConnection("xyz"))
//{
// foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
// {
// var cmd = con.CreateCommand();
// cmd.CommandText = "update hits set count = count + @count where tag = @tag";
// cmd.Parameters.AddWithValue("@count", hitCount.Value);
// cmd.Parameters.AddWithValue("@tag", hitCount.Key);
// cmd.ExecuteNonQuery();
// }
//}
Console.WriteLine("Writing....");
foreach (var hitCount in hitCounts.OrderBy(h => h.Key))
{
Console.WriteLine(String.Format("{0}\t{1}", hitCount.Key, hitCount.Value));
}
}
private static IEnumerable<string> GetPendingTags()
{
List<string> hitlist = new List<string>();
var currentCount = hits.Count();
for (int i = 0; i < currentCount; i++)
{
string tag = null;
if (hits.TryDequeue(out tag))
{
hitlist.Add(tag);
}
}
return hitlist;
}
}
Given 1000 documents with a complex data structure, e.g. a Car class that has three properties: Make, Model, and an Id property.
What is the most efficient way in C# to push these documents to RavenDB (preferably in a batch) without having to query the Raven collection individually to find out which to update and which to insert? At the moment I am going about it like so, which is totally inefficient.
Note: _session is a wrapper over IDocumentSession where Commit calls SaveChanges and Add calls Store.
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var page = 0;
const int total = 30;
do
{
var paged = sales.Skip(page*total).Take(total);
if (!paged.Any()) return;
foreach (var sale in paged)
{
var current = sale;
var existing = _session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);
if (existing != null)
existing = current;
else
_session.Add(current);
}
_session.Commit();
page++;
} while (true);
}
Your session code doesn't seem to track with the RavenDB API (we don't have Add or Commit).
Here is how you do this in RavenDB
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
sales.ForEach(session.Store);
session.SaveChanges();
}
Your code sample doesn't work at all. The main problem is that you cannot just switch out the references and expect RavenDB to recognize that:
if (existing != null)
existing = current;
Instead you have to update each property one-by-one:
existing.Model = current.Model;
existing.Make = current.Make;
This is the way you can facilitate change tracking in RavenDB and many other frameworks (e.g. NHibernate). If you want to avoid writing this uninteresting piece of code, I recommend using AutoMapper:
existing = Mapper.Map<Sale>(current, existing);
Another problem with your code is that you use Session.Query where you should use Session.Load. Remember: if you query for a document by its id, you always want to use Load!
The main difference is that one uses the local cache and the other does not (the same applies to the equivalent NHibernate methods).
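For example (a minimal sketch):
// Query goes through an index and can therefore be stale:
var viaQuery = session.Query<Sale>().FirstOrDefault(s => s.Id == current.Id);

// Load fetches the document directly by id and uses the session's cache:
var viaLoad = session.Load<Sale>(current.Id);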
OK, so now I can answer your question:
If I understand you correctly, you want to save a bunch of Sale instances to your database, where each should either be added if it doesn't exist or updated if it does. Right?
One way is to correct your sample code with the hints above and let it work. However, that will issue one unnecessary request (Session.Load(existingId)) for each iteration. You can easily avoid that if you set up an index that selects all the Ids of the documents inside your Sales collection; before you loop through your items, you can then load all the existing Ids.
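As a rough, untested sketch (the index class name is mine), such an index could look like:
public class Sales_Ids : AbstractIndexCreationTask<Sale>
{
    public Sales_Ids()
    {
        Map = sales => from sale in sales
                       select new { sale.Id };
    }
}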
However, I would like to know what you actually want to do. What is your domain/use-case?
This is what works for me right now. Note: the InjectFrom method comes from Omu.ValueInjecter (NuGet package).
private void PublishSalesToRaven(IEnumerable<Sale> sales)
{
var ids = sales.Select(i => i.Id);
var existingSales = _ravenSession.Load<Sale>(ids);
existingSales.ForEach(s => s.InjectFrom(sales.Single(i => i.Id == s.Id)));
var existingIds = existingSales.Select(i => i.Id);
var nonExistingSales = sales.Where(i => !existingIds.Any(x => x == i.Id));
nonExistingSales.ForEach(i => _ravenSession.Store(i));
_ravenSession.SaveChanges();
}