RavenDB Paging Behaviour

I have the following test for Skip/Take:
[Test]
public void RavenPagingBehaviour()
{
    const int count = 2048;
    var eventEntities = PopulateEvents(count);
    PopulateEventsToRaven(eventEntities);

    using (var session = Store.OpenSession(_testDataBase))
    {
        var queryable =
            session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
        var entities = queryable.ToArray();

        foreach (var eventEntity in entities)
        {
            eventEntity.Key = "Modified";
        }

        session.SaveChanges();

        queryable = session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
        entities = queryable.ToArray();

        foreach (var eventEntity in entities)
        {
            Assert.AreEqual(eventEntity.Key, "Modified");
        }
    }
}
PopulateEventsToRaven simply adds 2048 very simple documents to the database.
The first Skip/Take combination gets the first 1024 documents, modifies them and then commits the changes.
The next Skip/Take combination again asks for the first 1024 documents, but this time it gets documents 1024 to 2048 and hence fails the test. Why is this? I would expect the first 1024 again.
Edit: I have verified that if I don't modify the documents, the behaviour is fine.

The problem is that you don't specify an order by, which means that RavenDB is free to choose which items to return; those aren't necessarily going to be the same items it returned in the previous call.
Use an OrderBy and it will be consistent.
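For example, a minimal sketch based on the test above (it assumes EventEntity exposes an immutable property, here called Id, to sort on; ordering by Key would not help in this particular test, because Key is the field being modified between the two queries):

var queryable = session.Query<EventEntity>()
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
    .OrderBy(x => x.Id)  // any immutable, indexed field gives stable page boundaries
    .Skip(0)
    .Take(1024);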

Related

Hibernate Search manual indexing throw a "org.hibernate.TransientObjectException: The instance was not associated with this session"

I use Hibernate Search 5.11 in my Spring Boot 2 application to provide full-text search.
This library requires documents to be indexed.
When my app is launched, I try to manually re-index the data of an indexed entity (MyEntity.class) every five minutes (for a specific reason, due to my server context).
So I try to index the data of MyEntity.class.
MyEntity.class has a property attachedFiles, which is a HashSet, filled via a @OneToMany() join with lazy loading enabled:
@OneToMany(mappedBy = "myEntity", cascade = CascadeType.ALL, orphanRemoval = true)
private Set<AttachedFile> attachedFiles = new HashSet<>();
I wrote the required indexing process, but an exception is thrown on "fullTextSession.index(result);" when the attachedFiles property of a given entity contains one or more items:
org.hibernate.TransientObjectException: The instance was not associated with this session
In this case, the debugger shows a message like "Unable to load [...]" on the entity's HashSet value.
If the HashSet is empty (not null, just empty), no exception is thrown.
My indexing method :
private void indexDocumentsByEntityIds(List<Long> ids) {
    final int BATCH_SIZE = 128;

    Session session = entityManager.unwrap(Session.class);
    FullTextSession fullTextSession = Search.getFullTextSession(session);
    fullTextSession.setFlushMode(FlushMode.MANUAL);
    fullTextSession.setCacheMode(CacheMode.IGNORE);

    CriteriaBuilder builder = session.getCriteriaBuilder();
    CriteriaQuery<MyEntity> criteria = builder.createQuery(MyEntity.class);
    Root<MyEntity> root = criteria.from(MyEntity.class);
    criteria.select(root).where(root.get("id").in(ids));
    TypedQuery<MyEntity> query = fullTextSession.createQuery(criteria);
    List<MyEntity> results = query.getResultList();

    int index = 0;
    for (MyEntity result : results) {
        index++;
        try {
            fullTextSession.index(result); //index each element
            if (index % BATCH_SIZE == 0 || index == ids.size()) {
                fullTextSession.flushToIndexes(); //apply changes to indexes
                fullTextSession.clear(); //free memory since the queue is processed
            }
        } catch (TransientObjectException toEx) {
            LOGGER.info(toEx.getMessage());
            throw toEx;
        }
    }
}
Does someone have an idea?
Thanks!
This is probably caused by the "clear" call you have in your loop.
In essence, what you're doing is:
load all entities to reindex into the session
index one batch of entities
remove all entities from the session (fullTextSession.clear())
try to index the next batch of entities, even though they are not in the session anymore... ?
What you need to do is to only load each batch of entities after the session clearing, so that you're sure they are still in the session when you index them.
There's an example of how to do this in the documentation, using a scroll and an appropriate batch size: https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#search-batchindex-flushtoindexes
Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear.
Thanks for the explanations @yrodiere, they helped me a lot!
I chose your alternative solution:
Alternatively, you can just split your ID list in smaller lists of 128 elements, and for each of these lists, run a query to get the corresponding entities, reindex all these 128 entities, then flush and clear.
...and everything works perfectly!
Good catch!
See the code solution below:
private List<List<Object>> splitList(List<Object> list, int subListSize) {
    List<List<Object>> splittedList = new ArrayList<>();
    if (!CollectionUtils.isEmpty(list)) {
        int i = 0;
        int nbItems = list.size();
        while (i < nbItems) {
            int maxLastSubListIndex = i + subListSize;
            int lastSubListIndex = (maxLastSubListIndex > nbItems) ? nbItems : maxLastSubListIndex;
            List<Object> subList = list.subList(i, lastSubListIndex);
            splittedList.add(subList);
            i = lastSubListIndex;
        }
    }
    return splittedList;
}
private void indexDocumentsByEntityIds(Class<Object> clazz, String entityIdPropertyName, List<Object> ids) {
    Session session = entityManager.unwrap(Session.class);
    List<List<Object>> splittedIdsLists = splitList(ids, 128);

    for (List<Object> splittedIds : splittedIdsLists) {
        FullTextSession fullTextSession = Search.getFullTextSession(session);
        fullTextSession.setFlushMode(FlushMode.MANUAL);
        fullTextSession.setCacheMode(CacheMode.IGNORE);

        Transaction transaction = fullTextSession.beginTransaction();

        CriteriaBuilder builder = session.getCriteriaBuilder();
        CriteriaQuery<Object> criteria = builder.createQuery(clazz);
        Root<Object> root = criteria.from(clazz);
        criteria.select(root).where(root.get(entityIdPropertyName).in(splittedIds));
        TypedQuery<Object> query = fullTextSession.createQuery(criteria);
        List<Object> results = query.getResultList();

        int index = 0;
        for (Object result : results) {
            index++;
            try {
                fullTextSession.index(result); //index each element
                if (index == splittedIds.size()) {
                    fullTextSession.flushToIndexes(); //apply changes to indexes
                    fullTextSession.clear(); //free memory since the queue is processed
                }
            } catch (TransientObjectException toEx) {
                LOGGER.info(toEx.getMessage());
                throw toEx;
            }
        }
        transaction.commit();
    }
}

Update context in SQL Server from ASP.NET Core 2.2

_context.Update(v);
_context.SaveChanges();
When I use this code, SQL Server adds a new record instead of updating the existing one.
[HttpPost]
public IActionResult PageVote(List<string> Sar)
{
    string name_voter = ViewBag.getValue = TempData["Namevalue"];
    int count = 0;
    foreach (var item in Sar)
    {
        count = count + 1;
    }
    if (count == 6)
    {
        Vote v = new Vote()
        {
            VoteSarparast1 = Sar[0],
            VoteSarparast2 = Sar[1],
            VoteSarparast3 = Sar[2],
            VoteSarparast4 = Sar[3],
            VoteSarparast5 = Sar[4],
            VoteSarparast6 = Sar[5],
        };
        var voter = _context.Votes.FirstOrDefault(u => u.Voter == name_voter && u.IsVoted == true);
        if (voter == null)
        {
            v.IsVoted = true;
            v.Voter = name_voter;
            _context.Add(v);
            _context.SaveChanges();
            ViewBag.Greeting = "رای شما با موفقیت ثبت شد"; // "Your vote was registered successfully"
            return RedirectToAction(nameof(end));
        }
        v.IsVoted = true;
        v.Voter = name_voter;
        _context.Update(v);
        _context.SaveChanges();
        return RedirectToAction(nameof(end));
    }
    else
    {
        return View(_context.Applicants.ToList());
    }
}
You need to tell the DbContext about your entity. If you do var vote = new Vote(), that vote has no Id. The DbContext sees this and thinks you want to Add a new entity, so it simply does that. The DbContext tracks all the entities that you load from it, but since this is just a new instance, it has no idea about it.
To actually perform an update, you have two options:
1 - Load the Vote from the database in some way; if you get an Id, use that to find it.
// Loads the current vote by its id (or whatever other field..)
var existingVote = context.Votes.Single(p => p.Id == id_from_param);
// Perform the changes you want..
existingVote.SomeField = "NewValue";
// Then call save normally.
context.SaveChanges();
2 - Or if you don't want to load it from Db, you have to manually tell the DbContext what to do:
// create a new "vote"...
var vote = new Vote
{
    // Since it's an update, you must have the Id somehow.. so you must set it manually
    Id = id_from_param,
    // do the changes you want. Be careful, because this can cause data loss!
    SomeField = "NewValue"
};
// This is you telling the DbContext: Hey, I control this entity.
// I know it exists in the DB and it's modified
context.Entry(vote).State = EntityState.Modified;
// Then call save normally.
context.SaveChanges();
Either of those two approaches should fix your issue, but I suggest you read a little bit more about how Entity Framework works. This is crucial for the success (and performance) of your apps. Especially option 2 above can cause many issues. There's a reason why the DbContext keeps track of entities, so you don't have to; it's very complicated and things can go south fast.
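Applied to the PageVote action above, option 1 would look roughly like this sketch (it assumes the row located by Voter and IsVoted is the one you want to overwrite):

var voter = _context.Votes.FirstOrDefault(u => u.Voter == name_voter && u.IsVoted == true);
if (voter == null)
{
    // no previous vote: insert the new entity
    v.IsVoted = true;
    v.Voter = name_voter;
    _context.Add(v);
}
else
{
    // previous vote found: change the tracked entity that was loaded, not a new instance
    voter.VoteSarparast1 = Sar[0];
    voter.VoteSarparast2 = Sar[1];
    voter.VoteSarparast3 = Sar[2];
    voter.VoteSarparast4 = Sar[3];
    voter.VoteSarparast5 = Sar[4];
    voter.VoteSarparast6 = Sar[5];
}
_context.SaveChanges();
return RedirectToAction(nameof(end));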
Some links for you:
ChangeTracker in Entity Framework Core
Working with Disconnected Entity Graph in Entity Framework Core

Why can I not use Continuation when using a proxy class to access MS CRM 2013?

So I have a standard service reference proxy class for MS CRM 2013 (i.e. right-click, Add Service Reference, etc...). I then ran into the limitation that CRM data calls are capped at 50 results, and I wanted to get the full list of results. I found two methods; one looks more correct, but doesn't seem to work. I was wondering why it doesn't and/or whether there is something I'm doing incorrectly.
Basic setup and process
crmService = new CrmServiceReference.MyContext(new Uri(crmWebServicesUrl));
crmService.Credentials = System.Net.CredentialCache.DefaultCredentials;
var accountAnnotations = crmService.AccountSet.Where(a => a.AccountNumber == accountNumber).Select(a => a.Account_Annotation).FirstOrDefault();
Using Continuation (something I want to work, but looks like it doesn't)
while (accountAnnotations.Continuation != null)
{
    accountAnnotations.Load(crmService.Execute<Annotation>(accountAnnotations.Continuation.NextLinkUri));
}
Using that method, .Continuation is always null and accountAnnotations.Count is always 50 (but there are more than 50 records).
After struggling with .Continuation for a while, I've come up with the following alternative method (but it seems "not good"):
var accountAnnotationData = accountAnnotations.ToList();
var accountAnnotationFinal = accountAnnotations.ToList();
var index = 1;
while (accountAnnotationData.Count == 50)
{
    accountAnnotationData = (from a in crmService.AnnotationSet
                             where a.ObjectId.Id == accountAnnotationData.First().ObjectId.Id
                             select a).Skip(50 * index).ToList();
    accountAnnotationFinal = accountAnnotationFinal.Union(accountAnnotationData).ToList();
    index++;
}
So the second method seems to work, but for any number of reasons it doesn't seem like the best approach. Is there a reason .Continuation is always null? Is there some setup step I'm missing, or some nicer way to do this?
The way to get the records from CRM is to use paging. Here is an example with a QueryExpression, but you can also use FetchXML if you want.
// Query using the paging cookie.
// Define the paging attributes.
// The number of records per page to retrieve.
int fetchCount = 3;
// Initialize the page number.
int pageNumber = 1;
// Initialize the number of records.
int recordCount = 0;
// Define the condition expression for retrieving records.
ConditionExpression pagecondition = new ConditionExpression();
pagecondition.AttributeName = "address1_stateorprovince";
pagecondition.Operator = ConditionOperator.Equal;
pagecondition.Values.Add("WA");
// Define the order expression to retrieve the records.
OrderExpression order = new OrderExpression();
order.AttributeName = "name";
order.OrderType = OrderType.Ascending;
// Create the query expression and add condition.
QueryExpression pagequery = new QueryExpression();
pagequery.EntityName = "account";
pagequery.Criteria.AddCondition(pagecondition);
pagequery.Orders.Add(order);
pagequery.ColumnSet.AddColumns("name", "address1_stateorprovince", "emailaddress1", "accountid");
// Assign the pageinfo properties to the query expression.
pagequery.PageInfo = new PagingInfo();
pagequery.PageInfo.Count = fetchCount;
pagequery.PageInfo.PageNumber = pageNumber;
// The current paging cookie. When retrieving the first page,
// pagingCookie should be null.
pagequery.PageInfo.PagingCookie = null;
Console.WriteLine("#\tAccount Name\t\t\tEmail Address");while (true)
{
// Retrieve the page.
EntityCollection results = _serviceProxy.RetrieveMultiple(pagequery);
if (results.Entities != null)
{
// Retrieve all records from the result set.
foreach (Account acct in results.Entities)
{
Console.WriteLine("{0}.\t{1}\t\t{2}",
++recordCount,
acct.EMailAddress1,
acct.Name);
}
}
// Check for more records, if it returns true.
if (results.MoreRecords)
{
// Increment the page number to retrieve the next page.
pagequery.PageInfo.PageNumber++;
// Set the paging cookie to the paging cookie returned from current results.
pagequery.PageInfo.PagingCookie = results.PagingCookie;
}
else
{
// If no more records are in the result nodes, exit the loop.
break;
}
}
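Since FetchXML is mentioned as an alternative, here is a hedged sketch of the same paging loop expressed as FetchXML (it reuses _serviceProxy and the same entity, filter, order and page size as the QueryExpression above; the paging cookie must be XML-escaped before being embedded in the fetch element):

int page = 1;
string pagingCookie = null;
while (true)
{
    // Build the paging-cookie attribute only after the first page has been retrieved.
    string cookieAttribute = string.IsNullOrEmpty(pagingCookie)
        ? string.Empty
        : string.Format(" paging-cookie='{0}'", System.Security.SecurityElement.Escape(pagingCookie));

    string fetchXml = string.Format(@"
        <fetch version='1.0' mapping='logical' page='{0}' count='3'{1}>
          <entity name='account'>
            <attribute name='name' />
            <attribute name='emailaddress1' />
            <order attribute='name' ascending='true' />
            <filter type='and'>
              <condition attribute='address1_stateorprovince' operator='eq' value='WA' />
            </filter>
          </entity>
        </fetch>", page, cookieAttribute);

    EntityCollection results = _serviceProxy.RetrieveMultiple(new FetchExpression(fetchXml));
    foreach (Entity acct in results.Entities)
    {
        Console.WriteLine("{0}\t{1}",
            acct.GetAttributeValue<string>("name"),
            acct.GetAttributeValue<string>("emailaddress1"));
    }

    if (!results.MoreRecords)
    {
        break;
    }
    // Carry the returned cookie into the next page request.
    pagingCookie = results.PagingCookie;
    page++;
}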

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an expression tree instead of a Func, so that it's treated as IQueryable instead of IEnumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();
    }
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
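For reference, a hedged sketch of what the async variant might look like (this assumes the StreamAsync/MoveNextAsync members exposed on the async session's Advanced options in 2.5-era and later clients; check the documentation for the exact API of your client version):

using (var session = store.OpenAsyncSession())
{
    var query = session.Query<User>("Users/ByActive").Where(x => x.Active);

    // StreamAsync returns an async enumerator over the matching documents
    var enumerator = await session.Advanced.StreamAsync(query);
    while (await enumerator.MoveNextAsync())
    {
        User activeUser = enumerator.Current.Document;
    }
}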
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in tandem with Skip(n) to get all the results:
RavenQueryStatistics stats;
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;

do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults;
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);

return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even less than the default 128) for human consumption, then either something is wrong or your problem requires different thinking than a truckload of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good articles about indexing here and facets here.
If you need to perform data aggregation, create a map/reduce index which results in aggregated data, e.g.:
Index (map/reduce):
// Map
from post in docs.Posts
select new { post.Author, Count = 1 }

// Reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count").OrderBy(x => x.Author).ToList();
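If you prefer to define that index from code rather than in the studio, a sketch using the client's AbstractIndexCreationTask (the Post document class, its Author property and the AuthorPostStats result class are assumed from the map/reduce above):

public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        // Map: emit one entry per post with a count of 1
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };

        // Reduce: sum the counts per author
        Reduce = results => from result in results
                            group result by result.Author into g
                            select new
                            {
                                Author = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}

Register it at startup with IndexCreation.CreateIndexes(typeof(Posts_ByUser_Count).Assembly, store), then query it with session.Query<AuthorPostStats, Posts_ByUser_Count>().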
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

Having difficulty with a specific Future<T> query and collections with NHibernate

For the sake of example, I am removing non-queried and non-essential data just to figure out how to do the initial query here.
I have a model structure like this.
class Path {
    Guid Id { get; protected set; }
    IList<Step> Steps { get; set; }
    void AddStep(Step entity) {
        // wire up bidirectional association
    }
}

class Step {
    Guid Id { get; protected set; }
    Path Path { get; set; }
    // other data irrelevant
}
Now assuming 50000 Paths, each with 5000 Steps... I do realize I don't want to return all of them at once, but putting a limit on my query fetch isn't my real problem.
Here is the exact query I am attempting to use. I am getting this exception:
NHibernate.QueryException : duplicate alias: lpStep
----> System.ArgumentException : An item with the same key has already been added.
I'm not entirely sure how to handle this scenario. If I use a flat-out Fetch on the Path query, I get Select N+1 errors from the NHibernate Profiler.
I do have batching enabled, but as far as I am aware that only really applies to inserts, not retrievals. In any case, I am getting back these errors and am not sure how to handle them. Any ideas?
using (var Transaction = Session.BeginTransaction()) {
    Path lpPath = null;
    Step lpStep = null;

    var lpPaths = Session.QueryOver<Path>(() => lpPath)
        .Take(50)
        .Future<Path>();

    var lpSteps = Session.QueryOver<Step>(() => lpStep)
        .JoinAlias(() => lpPath.Steps, () => lpStep)
        .Where(o => o.Path.Id == lpPath.Id)
        .Take(12)
        .Future<Step>();

    Transaction.Commit();

    foreach (var path in lpPaths) {
        Console.WriteLine("{0} fetched {1} Steps",
            path.Id, path.Steps.Count);
    }
}
I basically want to say:
Select (50) Paths; also, as a separate select but part of the same trip, select the first (12) Steps that belong to the previously selected Paths.
But if I use a flat-out join, I get 110 rows, whereas I expect 2 result sets, 1 of 50 rows and 1 of 600 rows.
Can someone explain to me what I am doing wrong?
Mind you, I can do some minor alterations and the query runs, but it isn't 'optimized'. I can get the data I want, but it takes multiple trips and lazy loading. I can optimize the actual Path selection easily enough, but it is those blasted Steps. If I just take the restrictive where clause out of the lpSteps query, it returns only the first 12 Steps overall, not 12 Steps for each Path.
I've looked at some of the other Stack Overflow posts on Future<T> and found them to look a lot like this, so I don't understand why it isn't working. I suspect that what is going on is this:
lpPaths runs.
lpSteps tries to run; the first one succeeds.
lpSteps then tries to run again, finds it cannot redefine lpPaths.
Apocalypse.
I'm really hoping someone smarter than me can enlighten me on the absolute most optimal way to write this.
I can't really understand what your use case is. Why do you only need the first 12 Steps of each Path? What about processing the Steps in batches instead:
IList<Guid> pathIds;
while ((pathIds = Session.QueryOver<Path>()
        .Where(...)                   // your Path filter; it should also exclude already-processed Paths,
                                      // otherwise this loop keeps fetching the same first 100
        .Select(path => path.Id)
        .Take(100)
        .List<Guid>()).Count > 0)
{
    int batch = 0;
    const int batchsize = 600;
    IList<Step> steps;
    while ((steps = Session.QueryOver<Step>()
            .WhereRestrictionOn(step => step.Path.Id).IsIn(pathIds.ToArray())
            .Where(step => step. ...) // your Step filter
            .Skip(batch * batchsize)
            .Take(batchsize)
            .List<Step>()).Count > 0)
    {
        DoSomething(steps);
        batch++;
    }
}