For the sake of example, I am stripping out non-queried and non-essential data just to figure out how to do the initial query here.
I have a model structure like this:
public class Path {
    public Guid Id { get; protected set; }
    public IList<Step> Steps { get; set; }

    public void AddStep(Step entity) {
        // wire up the bidirectional association
    }
}

public class Step {
    public Guid Id { get; protected set; }
    public Path Path { get; set; }
    // other data irrelevant here
}
Now assume 50,000 Paths, each with 5,000 Steps... I do realize I don't want to return all of them at once, but putting a limit on my query fetch isn't my real problem.
Here is the exact query I am attempting to use (it appears below). This is the exception I am getting:
NHibernate.QueryException : duplicate alias: lpStep
----> System.ArgumentException : An item with the same key has already been added.
I'm not entirely sure how to handle this scenario. If I use a flat-out Fetch on the Path query, I get SELECT N+1 alerts from the NHibernate Profiler.
I do have batching enabled, but as far as I am aware that only really applies to inserts, not retrievals. In any case, I am getting these errors and am not sure how to handle them. Any ideas?
using (var transaction = Session.BeginTransaction()) {
    Path lpPath = null;
    Step lpStep = null;

    var lpPaths = Session.QueryOver<Path>(() => lpPath)
        .Take(50)
        .Future<Path>();

    var lpSteps = Session.QueryOver<Step>(() => lpStep)
        .JoinAlias(() => lpPath.Steps, () => lpStep)
        .Where(o => o.Path.Id == lpPath.Id)
        .Take(12)
        .Future<Step>();

    transaction.Commit();

    foreach (var path in lpPaths) {
        Console.WriteLine("{0} fetched {1} Steps",
            path.Id, path.Steps.Count);
    }
}
I basically want to say:
Select (50) Paths; also, as a separate select but part of the same round trip, select the first (12) Steps that belong to the previously selected Paths.
But if I use a flat-out join, I get 110 rows, whereas I expect two result sets: one of 50 rows and one of 600 rows.
Can someone explain to me what I am doing wrong?
Mind you, with some minor alterations the query runs, but it isn't optimized: I can get the data I want, but it takes multiple trips and lazy loading. I can optimize the actual Path selection easily enough; it is those blasted Steps. If I just take the restrictive where clause out of the lpSteps query, it returns only the first 12 Steps overall, not 12 Steps for each Path.
I've looked at some of the other Stack Overflow posts on Future<T> and found them to look a lot like this, so I don't understand why it isn't working. I suspect that what is going on is this:
lpPaths runs.
lpSteps tries to run; the first one succeeds.
lpSteps then tries to run again, and finds it cannot redefine lpPaths.
Apocalypse.
I'm really hoping someone smarter than me can enlighten me on the absolute most optimal way to write this.
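For what it's worth, the closest compiling shape I've found ties the Steps query to the same 50 Paths through a detached subquery instead of the shared JoinAlias, so both futures still go out in one round trip. A sketch (note it fetches all Steps of those Paths, not the first 12 per Path, which is exactly the part I can't express; and without an order by, the subquery's 50 is not guaranteed to match the first query's 50):

Path lpPath = null;

// detached subquery: ids of the same 50 Paths the first future selects
var pathIds = QueryOver.Of<Path>()
    .Select(p => p.Id)
    .Take(50);

var lpPaths = Session.QueryOver<Path>(() => lpPath)
    .Take(50)
    .Future<Path>();

var lpSteps = Session.QueryOver<Step>()
    .WithSubquery.WhereProperty(s => s.Path.Id).In(pathIds)
    .Future<Step>();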
I can't really understand what your use case is. Why do you only need the first 12 Steps of each Path? What about processing the Steps in batches instead?
IList<Guid> pathIds;
int pathPage = 0;
while ((pathIds = Session.QueryOver<Path>()
    .Where(...)                          // your Path filter here
    .Select(path => path.Id)
    .Skip(pathPage++ * 100)
    .Take(100)
    .List<Guid>()).Count > 0)
{
    int batch = 0;
    const int batchSize = 600;
    IList<Step> steps;
    while ((steps = Session.QueryOver<Step>()
        .WhereRestrictionOn(step => step.Path.Id).IsIn(pathIds.ToArray())
        .Where(step => step. ...)        // your Step filter here
        .Skip(batch * batchSize)
        .Take(batchSize)
        .List<Step>()).Count > 0)
    {
        DoSomething(steps);
        batch++;
    }
}
I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a where clause should be an expression tree (Expression<Func<T, bool>>) rather than a plain Func<T, bool>, so that it is treated as IQueryable (translated on the server) rather than IEnumerable (filtered in memory). So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and use them as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries and Lucene queries, and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 results by default. However, you can use it together with Skip(n) to page through everything:
var points = new List<T>();
List<T> nextGroupOfPoints;
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
RavenQueryStatistics stats;
do
{
    nextGroupOfPoints = session.Query<T>()
        .Statistics(out stats)
        .Where(whereClause)
        .Skip(i * ElementTakeCount + skipResults)
        .Take(ElementTakeCount)
        .ToList();
    i++;
    skipResults += stats.SkippedResults; // account for documents skipped server-side
    points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have only a few calls issued over them.
If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than hauling a truckload of documents out of the data store.
RavenDB indexing is quite sophisticated. There is a good article about indexing here, and one about facets here.
If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:
Index:
// map
from post in docs.Posts
select new { post.Author, Count = 1 }

// reduce
from result in results
group result by result.Author into g
select new
{
    Author = g.Key,
    Count = g.Sum(x => x.Count)
}
Query:
var stats = session.Query<AuthorPostStats>("Posts/ByUser/Count")
    .Where(x => x.Author == author)
    .ToList();
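For completeness, the same index can also be defined from code. A sketch using RavenDB's AbstractIndexCreationTask (the Post and AuthorPostStats types are assumed to exist in your model):

public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        // map: emit one entry with Count = 1 per post
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };

        // reduce: sum the counts per author
        Reduce = results => from result in results
                            group result by result.Author into g
                            select new { Author = g.Key, Count = g.Sum(x => x.Count) };
    }
}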
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
// or, with a filter on an indexed field:
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex : AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session: Querying: How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
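To illustrate the note above (a sketch, not part of the original answer; store is assumed to be your IDocumentStore): to persist a change to a streamed document, load it through a session first so the change is tracked:

using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        var streamed = enumerator.Current;
        using (var writeSession = store.OpenSession())
        {
            // re-load by document key so the write session tracks the entity
            var tracked = writeSession.Load<User>(streamed.Key);
            tracked.Active = false; // this change IS tracked
            writeSession.SaveChanges();
        }
    }
}

(In practice you would batch the writes rather than open a session per document.)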
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.
I need to find the number of documents in the Raven database so that I can properly page the documents out. I had the following implementation:
public int Getcount<T>()
{
IQueryable<T> queryable = from p in _session.Query<T>().Customize(x =>x.WaitForNonStaleResultsAsOfLastWrite())
select p;
return queryable.Count();
}
But if the count is too large, it times out.
I tried the method suggested in the FAQs:
public int GetCount<T>()
{
//IQueryable<T> queryable = from p in _session.Query<T>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
// select p;
//return queryable.Count();
RavenQueryStatistics stats;
var results = _session.Query<T>()
.Statistics(out stats);
return stats.TotalResults;
}
This always returns 0.
What am I doing wrong?
stats.TotalResults is 0 because the query was never executed; the statistics are only populated when the query actually runs. Try this instead (Take(0) runs the query without pulling any documents back):
var results = _session
.Query<T>()
.Statistics(out stats)
.Take(0)
.ToArray();
The strange syntax to get the statistics tripped me up as well. I can see why the query needs to be run in order to populate the statistics object, but the syntax is a bit verbose, IMO.
I have written the following extension method for use in my unit tests. It helps keep the code terse.
Extension Method
public static int QuickCount<T>(this IRavenQueryable<T> results)
{
RavenQueryStatistics stats;
results.Statistics(out stats).Take(0).ToArray();
return stats.TotalResults;
}
Unit Test
...
db.Query<T>().QuickCount().ShouldBeGreaterThan(128);
...
I have the following test for Skip/Take:
[Test]
public void RavenPagingBehaviour()
{
const int count = 2048;
var eventEntities = PopulateEvents(count);
PopulateEventsToRaven(eventEntities);
using (var session = Store.OpenSession(_testDataBase))
{
var queryable =
session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
var entities = queryable.ToArray();
foreach (var eventEntity in entities)
{
eventEntity.Key = "Modified";
}
session.SaveChanges();
queryable = session.Query<EventEntity>().Customize(x => x.WaitForNonStaleResultsAsOfLastWrite()).Skip(0).Take(1024);
entities = queryable.ToArray();
foreach (var eventEntity in entities)
{
Assert.AreEqual(eventEntity.Key, "Modified");
}
}
}
PopulateEventsToRaven simply adds 2048 very simple documents to the database.
The first Skip/Take combination gets the first 1024 documents, modifies them, and then commits the changes.
The next Skip/Take combination again asks for the first 1024 documents, but this time it gets documents 1024 to 2048, and hence the test fails. Why is this? I would expect the first 1024 again.
Edit: I have verified that if I don't modify the documents, the behaviour is fine.
The problem is that you don't specify an order by, which means RavenDB is free to choose which items to return; they aren't necessarily going to be the same items it returned in the previous call.
Use an OrderBy and it will be consistent.
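For example (a sketch; any stable field works as the sort key, assuming EventEntity exposes an Id):

var queryable = session.Query<EventEntity>()
    .Customize(x => x.WaitForNonStaleResultsAsOfLastWrite())
    .OrderBy(e => e.Id) // stable sort => stable pages
    .Skip(0)
    .Take(1024);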
I want to fetch hierarchical/tree data, something like the sample below, from a table with the following definition.
Tree Table:

Id    | ParentId
------+---------
Work1 | null
Work2 | Work1
Work3 | Work2
...
Required query result data (no need to be tabbed): if I pick 'Work1', I should get the complete set of Ids under its root, something like below. If I pick 'Work2', I should likewise get the complete set of Ids above and below it in its tree.
> Work1
----------
> Work2
----------
> Work3
---------
What is the best way in NHibernate to fetch the data in the above scenario in an optimized manner?
To find out what the "best way" is, more information about the actual scenario would be needed. What kind of "optimization" are you looking for? The minimal amount of data (only the rows you are really going to need), the minimal number of SQL queries (preferably one round trip to the database), or something else?
Scenario 1: Menu or tree structure that is loaded once and kept in memory for longer periods of time (not a list that updates every few seconds). Small number of rows in the table (small is relative but I'd say anything below 200).
In this case I would just get the whole table with one query like this:
var items = session.Query<Work>()
.Fetch(c => c.ParentWork)
.Fetch(c => c.ChildWorks).ToList();
var item = session.Get<Work>(id);
This will result in a single SQL query which simply loads all the rows from the table. item will contain the complete tree (parents, grandparents, children, etc.).
Scenario 2: Large number of rows and only a fraction of them needed. Only a few levels in the hierarchy are to be expected.
In this case, just load the item and let NHibernate do the rest with lazy loading, or force it to load everything by writing a recursive method that traverses parents and children. This will cause an N+1 select, which may or may not be slower than scenario 1 (depending on your data).
Here is a quick hack demonstrating this:
var item = session.Get<Work>(id);
Work parent = item.ParentWork;
Work root = item;
// find the root item
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// scan the whole tree
this.ScanChildren(root);
// -----
private void ScanChildren(Work item)
{
if (item == null)
{
return;
}
foreach (Work child in item.ChildWorks)
{
string name = child.Name; // touch a property to force the lazy load
this.ScanChildren(child);
}
}
Edit:
Scenario 3: Huge amount of data. Minimal number of queries and minimal amount of data.
In this case, I would think not in terms of a tree structure but of layers of data that we load one after another.
var work = repo.Session.Get<Work>(id);
// get root of that Work
Work parent = work.ParentWork;
Work root = work;
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// Get all the Works for each level
IList<Work> worksAll = new List<Work>() { root };
IList<Work> worksPerLevel = new List<Work>() { root };
// get each level until we don't have any more Works in the next level
int count = worksPerLevel.Count;
while (count > 0)
{
worksPerLevel = this.GetChildren(session, worksPerLevel);
// add this level's Works to our list of all Works
foreach (Work c in worksPerLevel)
{
    worksAll.Add(c);
}
count = worksPerLevel.Count;
}
// here you can get the names of the Works or whatever
foreach (Work c in worksAll)
{
string s = c.Name;
}
// this method gets the Works in the next level and returns them
private IList<Work> GetChildren(ISession session, IList<Work> worksPerLevel)
{
IList<Work> result = new List<Work>();
// get the IDs for the works in this level
IList<int> ids = worksPerLevel.Select(c => c.Id).ToList();
// use a WHERE IN clause to get the Works
// with the ParentId of Works in the current level
result = session.QueryOver<Work>()
.Where(
NHibernate.Criterion.Restrictions.InG<int>(
NHibernate.Criterion.Projections.Property<Work>(
c => c.ParentWork.Id),
ids)
)
.Fetch(c => c.ChildWorks).Eager // this will prevent the N+1 problem
.List();
return result;
}
This solution will not cause an N+1 problem, because we use an eager load for the children, so NHibernate will know the state of the child lists and not hit the DB again. You will only get x+y selects, where x is the number of selects needed to find the root Work and y is the number of levels (the max depth of the tree).
Can I use a Criteria query to execute a T-SQL command that selects the max value of a column in a table?
'select @cus_id = max(id) + 1 from customers'
Ta
Ollie
Use a projection:
session.CreateCriteria(typeof(Customer))
    .SetProjection(Projections.Max("Id"))
    .UniqueResult();
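For reference, the same projection via QueryOver reads like this (a sketch, assuming a Customer entity with an int Id):

var maxId = session.QueryOver<Customer>()
    .Select(Projections.Max<Customer>(c => c.Id))
    .SingleOrDefault<int>();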
Max(id) + 1 is a very bad way to generate ids. If that's your goal, find another way to generate them.
Edit: in answer to LnDCobra:
it's bad because it's hard to make sure that the max(id) you got is still the max(id) when you do the insert. If another process inserts a row, your insert will have the same id, and your insert will fail. (Or, conversely, the other process's insert will fail if your insert happened first.)
To prevent this, you have to prevent any other inserts/make your get and subsequent insert atomic, which generally means locking the table, which will hurt performance.
If you only lock against writes, the other process gets max(id), which is the same max(id) you got. You do your insert and release the lock; it inserts a duplicate id and fails. Or it tries to lock too, in which case it waits on you. If you lock against reads too, everybody waits on you. If it locks against writes also, then it doesn't insert the duplicate id, but it does wait on your read and your write.
(And it breaks encapsulation: you should let the RDBMS figure out its ids, not the client programs that connect to it.)
Generally, this strategy will either:
* break
* require a bunch of "plumbing" code to make it work
* significantly reduce performance
* or all three
and it will be slower, less robust, and require more hard-to-maintain code than just using the RDBMS's built-in sequences or generated auto-increment ids.
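For example, NHibernate can map the id so the database generates it. A minimal sketch using mapping-by-code (the Customer entity and its Name property are assumptions):

public class CustomerMap : ClassMapping<Customer>
{
    public CustomerMap()
    {
        // let the RDBMS generate the id (identity column or native sequence)
        Id(x => x.Id, m => m.Generator(Generators.Native));
        Property(x => x.Name);
    }
}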
The best approach is to create an additional Sequences table, where you can maintain each sequence's target and value.
public class Sequence : Entity
{
public virtual long? OwnerId { get; set; }
public virtual SequenceTarget SequenceTarget { get; set; }
public virtual bool IsLocked { get; set; }
public virtual long Value { get; set; }
public void GenerateNextValue()
{
Value++;
}
}
public class SequenceTarget : Entity
{
public virtual string Name { get; set; }
}
public long GetNewSequenceValueForZZZZ(long ZZZZId)
{
var target =
Session
.QueryOver<SequenceTarget>()
.Where(st => st.Name == "DocNumber")
.SingleOrDefault();
if (target == null)
{
throw new EntityNotFoundException(typeof(SequenceTarget));
}
return GetNewSequenceValue(ZZZZId, target);
}
protected long GetNewSequenceValue(long? ownerId, SequenceTarget target)
{
var seqQry =
Session
.QueryOver<Sequence>()
.Where(seq => seq.SequenceTarget == target);
if (ownerId.HasValue)
{
seqQry.Where(seq => seq.OwnerId == ownerId.Value);
}
var sequence = seqQry.SingleOrDefault();
if (sequence == null)
{
throw new EntityNotFoundException(typeof(Sequence));
}
// re-read the sequence, in case it was already in the session
Session.Refresh(sequence);
// flip the IsLocked field so we acquire a lock on the record
// (configure dynamic update, so only this one field is updated)
sequence.IsLocked = !sequence.IsLocked;
Session.Update(sequence);
// force the update to the DB
Session.Flush();
// now that we hold the lock, re-read the record
Session.Refresh(sequence);
// generate the new value
sequence.GenerateNextValue();
// set the dummy field back
sequence.IsLocked = !sequence.IsLocked;
// update sequence & force changes to DB
Session.Update(sequence);
Session.Flush();
return sequence.Value;
}
OwnerId is for when you need to maintain different sequences for the same entity, based on some kind of owner. For example, if you need to maintain document numbering within a contract, then OwnerId will be the contractId.
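A hypothetical usage sketch (the surrounding transaction matters: the row lock taken by the flush is only held until the transaction ends):

using (var tx = Session.BeginTransaction())
{
    // contractId identifies which contract's numbering we are advancing
    long nextDocNumber = GetNewSequenceValueForZZZZ(contractId);
    // ... assign nextDocNumber to the new document ...
    tx.Commit();
}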