Best way to fetch tree data in NHibernate

I want to fetch hierarchical/tree data like the example below from a table with the following definition.
Tree Table:
Id    | ParentId
------+---------
Work1 | null
Work2 | Work1
Work3 | Work2
...
Required query result data (no need for indentation): if I pick 'Work1', I should get all the Ids under that root, something like below. If I pick 'Work2', I should likewise get the Ids above and below it in the tree.
> Work1
----------
> Work2
----------
> Work3
---------
What is the best way in NHibernate to fetch the data in the above scenario in an optimized manner?

To find out what the "best way" is, more information regarding the actual scenario would be needed. What kind of "optimization" are you looking for? Minimal amount of data (only the rows you are really going to need) or minimal number of SQL queries (preferably one roundtrip to the database) or any other?
Scenario 1: Menu or tree structure that is loaded once and kept in memory for longer periods of time (not a list that updates every few seconds). Small number of rows in the table (small is relative but I'd say anything below 200).
In this case I would just get the whole table with one query like this:
var items = session.Query<Work>()
.Fetch(c => c.ParentWork)
.Fetch(c => c.ChildWorks).ToList();
var item = session.Get<Work>(id);
This will result in a single SQL query which simply loads all the rows from the table. item will contain the complete tree (parents, grandparents, children, etc.).
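For reference, here is a minimal sketch of the Work entity and a mapping-by-code class that these snippets appear to assume; the property, table and column names are my assumptions based on the question, and the Id is typed as int only to match the GetChildren example further down (the sample data actually shows string-like ids):
using System.Collections.Generic;
using NHibernate.Mapping.ByCode;
using NHibernate.Mapping.ByCode.Conformist;

// Hypothetical entity shape assumed by the queries in this answer.
public class Work
{
    public virtual int Id { get; set; }
    public virtual string Name { get; set; }
    public virtual Work ParentWork { get; set; }                      // the ParentId column
    public virtual IList<Work> ChildWorks { get; set; } = new List<Work>();
}

// Sketch of a mapping-by-code class; adjust table/column names and the id generator to your schema.
public class WorkMap : ClassMapping<Work>
{
    public WorkMap()
    {
        Table("Tree");
        Id(x => x.Id, m => m.Generator(Generators.Identity));
        Property(x => x.Name);
        ManyToOne(x => x.ParentWork, m => m.Column("ParentId"));
        Bag(x => x.ChildWorks,
            m =>
            {
                m.Key(k => k.Column("ParentId"));
                m.Inverse(true);
                m.Lazy(CollectionLazy.Lazy);
            },
            r => r.OneToMany());
    }
}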
Scenario 2: Large number of rows and only a fraction of rows needed. Only few levels in the hierarchy are to be expected.
In this case, just load the item and let NHibernate do the rest with lazy loading, or force it to load everything by writing a recursive method that traverses parents and children. This will cause an N+1 select, which may or may not be slower than scenario 1 (depending on your data).
Here is a quick hack demonstrating this:
var item = session.Get<Work>(id);
Work parent = item.ParentWork;
Work root = item;
// find the root item
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// scan the whole tree
this.ScanChildren(root);
// -----
private void ScanChildren(Work item)
{
if (item == null)
{
return;
}
foreach (Work child in item.ChildWorks)
{
string name = child.Name;
this.ScanChildren(child);
}
}
Edit:
Scenario 3: Huge amount of data. Minimal number of queries and minimal amount of data.
In this case, I would think not of a tree structure but of having layers of data that we load one after another.
var work = session.Get<Work>(id);
// get root of that Work
Work parent = work.ParentWork;
Work root = work;
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// Get all the Works for each level
IList<Work> worksAll = new List<Work>() { root };
IList<Work> worksPerLevel = new List<Work>() { root };
// get each level until we don't have any more Works in the next level
int count = worksPerLevel.Count;
while (count > 0)
{
worksPerLevel = this.GetChildren(session, worksPerLevel);
// add the Works to our list of all Works
foreach (Work w in worksPerLevel)
{
worksAll.Add(w);
}
count = worksPerLevel.Count;
}
// here you can get the names of the Works or whatever
foreach (Work c in worksAll)
{
string s = c.Name;
}
// this methods gets the Works in the next level and returns them
private IList<Work> GetChildren(ISession session, IList<Work> worksPerLevel)
{
IList<Work> result = new List<Work>();
// get the IDs for the works in this level
IList<int> ids = worksPerLevel.Select(c => c.Id).ToList();
// use a WHERE IN clause do get the Works
// with the ParentId of Works in the current level
result = session.QueryOver<Work>()
.Where(
NHibernate.Criterion.Restrictions.InG<int>(
NHibernate.Criterion.Projections.Property<Work>(
c => c.ParentWork.Id),
ids)
)
.Fetch(c => c.ChildWorks).Eager // this will prevent the N+1 problem
.List();
return result;
}
This solution will not cause an N+1 problem, because we use an eager load for the children, so NHibernate will know the state of the child lists and not hit the DB again. You will only get x+y selects, where x is the number of selects needed to find the root Work and y is the number of levels (the max depth of the tree).
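For comparison, roughly the same per-level query could be written with the LINQ provider instead of QueryOver. This is an untested sketch under the same assumptions about the Work entity:
// Same "one query per level" idea as GetChildren, using NHibernate.Linq.
// Requires: using System.Linq; using NHibernate.Linq;
private IList<Work> GetChildrenLinq(ISession session, IList<Work> worksPerLevel)
{
    // ids of the Works in the current level
    List<int> ids = worksPerLevel.Select(c => c.Id).ToList();

    // WHERE ParentId IN (...), eagerly fetching each returned Work's own children
    return session.Query<Work>()
        .Where(w => ids.Contains(w.ParentWork.Id))
        .FetchMany(w => w.ChildWorks)
        .ToList();
}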

Related

How do I perform a graph query and join?

I apologize for the title, I don't exactly know how to word it. But essentially, this is a graph-type query but I know RavenDB's graph functionality will be going away so this probably needs to be solved with Javascript.
Here is the scenario:
I have a bunch of documents of different types, call them A, B, C, D. Each of these particular types of documents have some common properties. The one that I'm interested in right now is "Owner". The owner field is an ID which points to one of two other document types; it can be a Group or a User.
The Group document has a 'Members' field containing IDs, each of which points either to a User or to another Group. Something like the sketch below.
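(The example itself is missing from the question; the following is a purely hypothetical sketch of the document shapes described above, written as .NET classes, with every class and property name assumed from the prose.)
using System.Collections.Generic;

// Hypothetical document shapes; names are assumptions.
public class User
{
    public string Id { get; set; }              // e.g. "user:john#castleblack.com"
}

public class Group
{
    public string Id { get; set; }              // e.g. "group:the-nights-watch"
    public List<string> Members { get; set; }   // each entry is a user:... or group:... id
}

// Documents of type A, B, C, D each carry an Owner field.
public class A
{
    public string Id { get; set; }
    public string Owner { get; set; }           // a user:... or group:... id
}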
It's worth noting that the documents in play have custom IDs that begin with their entity type. For example Users and Groups begin with user: and group: respectively. Example IDs look like this: user:john#castleblack.com or group:the-nights-watch. This comes into play later.
What I want to be able to do is the following type of query:
"Given that I have either a group id or a user id, return all documents of type a, b, or c where the group/user id is equal to or is a descendant of the document's owner."
In other words, I need to be able to return all documents that are owned by a particular user or group either explicitly or implicitly through a hierarchy.
I've considered solving this a couple different ways with no luck. Here are the two approaches I've tried:
Using a function within a query
With Dejan's help in an email thread, I was able to devise a function that would walk its way down the ownership graph. What this attempted to do was build a flat array of IDs representing the explicit and implicit owners (i.e. root + descendants):
declare function hierarchy(doc, owners){
owners = owners || [];
if(doc != null) {
let ownerId = id(doc)
if(ownerId.startsWith('user:')) {
owners.push(ownerId);
} else if(ownerId.startsWith('group:')) {
owners.push(ownerId);
doc.Members.forEach(m => {
let owner = load(m, 'Users') || load(m, 'Groups');
owners = hierarchy(owner, owners);
});
}
}
return owners;
}
I had two issues with this. First, I don't actually know how to use this in a query lol. I tried to use it as part of the where clause but apparently that's not allowed:
from #all_docs as d
where hierarchy(d) = 'group:my-group-d'
// error: method hierarchy not allowed
Second, if I tried to use it in the select statement instead, I got an error that I had exceeded the number of allowed statements.
As a custom index
I tried the same idea through a custom index. Essentially, I tried to create an index that would produce an array of IDs using roughly the same function above, so that I could just query where my id was in that array
map('#all_docs', function(doc) {
function hierarchy(n, graph) {
while(n != null) {
let ownerId = id(n);
if(ownerId.startsWith('user:')) {
graph.push(ownerId);
return graph;
} else if(ownerId.startsWith('group:')){
graph.push(ownerId);
n.Members.forEach(g => {
let owner = load(g, 'Groups') || load(g, 'Users');
hierarchy(owner, graph);
});
return graph;
}
}
}
function distinct(value, index, self){ return self.indexOf(value) === index; }
let ownerGraph = []
if(doc.Owner) {
let owner = load(doc.Owner, 'Groups') || load(doc.Owner, 'Users');
ownerGraph = hierarchy(owner, ownerGraph).filter(distinct);
}
return { Owners: ownerGraph };
})
// error: recursion is not allowed by the javascript host
The problem with this is that I'm getting an error that recursion is not allowed.
So I'm stumped now. Am I going about this wrong? I feel like this could be a subquery of sorts or a filter by function, but I'm not sure how to do that either. Am I going to have to do this in two separate queries (i.e. two round-trips), one to get the IDs and the other to get the docs?
Update 1
I've revised my attempt at the index to the following and I'm not getting the recursion error anymore, but assuming my queries are correct, it's not returning anything
// Entity/ByOwnerGraph
map('#all_docs', function(doc) {
function walkGraph(ownerId) {
let owners = []
let idsToProcess = [ownerId]
while(idsToProcess.length > 0) {
let current = idsToProcess.shift();
if(current.startsWith('user:')){
owners.push(current);
} else if(current.startsWith('group:')) {
owners.push(current);
let group = load(current, 'Groups')
if(!group) { continue; }
idsToProcess.concat(group.Members)
}
}
return owners;
}
let owners = [];
if(doc.Owner) {
owners.concat(walkGraph(doc.Owner))
}
return { Owners: owners };
})
// query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners = "group:my-group-id"
// alternate query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners ALL IN ("group:my-group-id")
I still can't use this approach in a query either as I get the same error that there are too many statements.

Entity Framework, update multiple fields more efficiently

Using Entity Framework, I am updating about 300 rows and 9 columns roughly every 30 seconds. Below is how I am currently doing it. My question is: how can I make the code more efficient?
Every once in a while I can feel the impact of this on my database, and I just want to make it as efficient as possible.
// FOREACH OF MY 300 ROWS
var original = db.MarketDatas.FirstOrDefault(x => x.BBSymbol == targetBBsymbol);
if (original != null)
{
//if (original.BBSymbol.ToUpper() == "NOH7 INDEX")
//{
// var x1 = 1;
//}
original.last_price = marketDataItem.last_price;
original.bid = marketDataItem.bid;
original.ask = marketDataItem.ask;
if (marketDataItem.px_settle_last_dt_rt != null)
{
original.px_settle_last_dt_rt = marketDataItem.px_settle_last_dt_rt;
}
if (marketDataItem.px_settle_actual_rt != 0)
{
original.px_settle_actual_rt = marketDataItem.px_settle_actual_rt;
}
original.chg_on_day = marketDataItem.chg_on_day;
if (marketDataItem.prev_close_value_realtime != 0)
{
original.prev_close_value_realtime = marketDataItem.prev_close_value_realtime;
}
if (marketDataItem.px_settle_last_dt_rt != null)
{
DateTime d2 = (DateTime)marketDataItem.px_settle_last_dt_rt;
if (d1.Day == d2.Day)
{
//market has settled
original.settled = "yes";
}
else
{
//market has NOT settled
original.settled = "no";
}
}
if (marketDataItem.updateTime.Year != 1)
{
original.updateTime = marketDataItem.updateTime;
}
db.SaveChanges();
}
Watching what is being hit in the debugger...
SELECT TOP (1)
[Extent1].[MarketDataID] AS [MarketDataID],
[Extent1].[BBSymbol] AS [BBSymbol],
[Extent1].[Name] AS [Name],
[Extent1].[fut_Val_Pt] AS [fut_Val_Pt],
[Extent1].[crncy] AS [crncy],
[Extent1].[fut_tick_size] AS [fut_tick_size],
[Extent1].[fut_tick_val] AS [fut_tick_val],
[Extent1].[fut_init_spec_ml] AS [fut_init_spec_ml],
[Extent1].[last_price] AS [last_price],
[Extent1].[bid] AS [bid],
[Extent1].[ask] AS [ask],
[Extent1].[px_settle_last_dt_rt] AS [px_settle_last_dt_rt],
[Extent1].[px_settle_actual_rt] AS [px_settle_actual_rt],
[Extent1].[settled] AS [settled],
[Extent1].[chg_on_day] AS [chg_on_day],
[Extent1].[prev_close_value_realtime] AS [prev_close_value_realtime],
[Extent1].[last_tradeable_dt] AS [last_tradeable_dt],
[Extent1].[fut_notice_first] AS [fut_notice_first],
[Extent1].[updateTime] AS [updateTime]
FROM [dbo].[MarketDatas] AS [Extent1]
WHERE ([Extent1].[BBSymbol] = @p__linq__0) OR (([Extent1].[BBSymbol] IS NULL) AND (@p__linq__0 IS NULL))
It seems it updates the same thing multiple times, if I am understanding it correctly.
UPDATE [dbo].[MarketDatas]
SET [last_price] = @0, [chg_on_day] = @1, [updateTime] = @2
WHERE ([MarketDataID] = @3)
UPDATE [dbo].[MarketDatas]
SET [last_price] = @0, [chg_on_day] = @1, [updateTime] = @2
WHERE ([MarketDataID] = @3)
You can reduce this to 2 round trips.
Don't call SaveChanges() inside the loop. Move it outside and call it after you are done processing everything.
Write the select in such a way that it retrieves all the originals in one go and pushes them to a memory collection, then retrieve from that for each item you are updating/inserting.
Code:
// use this as your source
// to retrieve an item later use TryGetValue
var originals = db.MarketDatas
.Where(x => arrayOftargetBBsymbol.Contains(x.BBSymbol))
.ToDictionary(x => x.BBSymbol, y => y);
// iterate over changes you want to make
foreach(var change in changes){
MarketData original = null;
// is there an existing entity
if(originals.TryGetValue(change.targetBBsymbol, out original)){
// update your original
}
}
// save changes all at once
db.SaveChanges();
You could just execute db.SaveChanges() after your foreach loop. I think that would do exactly what you are asking for.
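A minimal sketch of that restructuring, based on the code in the question (the outer loop and the marketDataItems collection are assumptions here, since the question only hints at them with a comment):
// Assumed outer loop over the ~300 incoming items.
foreach (var marketDataItem in marketDataItems)
{
    // the question looks the row up by targetBBsymbol; marketDataItem.BBSymbol is assumed here
    var original = db.MarketDatas.FirstOrDefault(x => x.BBSymbol == marketDataItem.BBSymbol);
    if (original == null)
    {
        continue;
    }

    original.last_price = marketDataItem.last_price;
    original.bid = marketDataItem.bid;
    original.ask = marketDataItem.ask;
    // ... copy the remaining fields exactly as in the question ...
}

// One SaveChanges call (a single transaction) instead of one call per row.
db.SaveChanges();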
It seems it updates the same thing multiple times, if I am
understanding it correctly.
Entity Framework performs a database round-trip for every entity to update.
Just check the parameter value, they will be different.
how can I make the code more efficient
The major problem is that your current solution is not scalable.
It works well when you only have a few entities to update, but it will get worse and worse as the number of items to update in a batch increases.
It's often better to implement this kind of logic entirely in the database, but perhaps you cannot do that here.
Disclaimer: I'm the owner of the project Entity Framework Extensions
This library can make your code more efficient by allowing you to save multiple entities at once. All bulk operations are supported:
BulkSaveChanges
BulkInsert
BulkUpdate
BulkDelete
BulkMerge
BulkSynchronize
Example:
// Easy to use
context.BulkSaveChanges();
// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);
// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);
// Customize Primary Key
context.BulkMerge(customers, operation => {
operation.ColumnPrimaryKeyExpression =
customer => customer.Code;
});

RavenDB : Use "Search" only if given within one query

I have a situation where my user is presented with a grid which, by default, just gets the first 15 results. However, they may type in a name and search for an item across all pages.
Alone, either of these works fine, but I am trying to figure out how to make them work as a single query. This is basically what it looks like...
// find a filter if the user is searching
var filters = request.Filters.ToFilters();
// determine the name to search by
var name = filters.Count > 0 ? filters[0] : null;
// we need to be able to catch some query statistics to make sure that the
// grid view is complete and accurate
RavenQueryStatistics statistics;
// try to query the items listing as quickly as we can, getting only the
// page we want out of it
var items = RavenSession
.Query<Models.Items.Item>()
.Statistics(out statistics) // output our query statistics
.Search(n => n.Name, name)
.Skip((request.Page - 1) * request.PageSize)
.Take(request.PageSize)
.ToArray();
// we need to store the total results so that we can keep the grid up to date
var totalResults = statistics.TotalResults;
return Json(new { data = items, total = totalResults }, JsonRequestBehavior.AllowGet);
The problem is that if no name is given, it does not return anything, which is not the desired result. (Searching by 'null' doesn't work, obviously.)
Do something like this:
var q = RavenSession
.Query<Models.Items.Item>()
.Statistics(out statistics); // output our query statistics
if(string.IsNullOrEmpty(name) == false)
q = q.Search(n => n.Name, name);
var items = q.Skip((request.Page - 1) * request.PageSize)
.Take(request.PageSize)
.ToArray();
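Putting that together with the code from the question, the whole thing might look roughly like this (same assumptions about request, filters and RavenSession as in the question):
// find a filter if the user is searching
var filters = request.Filters.ToFilters();
var name = filters.Count > 0 ? filters[0] : null;

RavenQueryStatistics statistics;
var query = RavenSession
    .Query<Models.Items.Item>()
    .Statistics(out statistics); // output our query statistics

// only add the Search clause when the user actually typed a name
if (string.IsNullOrEmpty(name) == false)
    query = query.Search(n => n.Name, name);

var items = query
    .Skip((request.Page - 1) * request.PageSize)
    .Take(request.PageSize)
    .ToArray();

return Json(new { data = items, total = statistics.TotalResults }, JsonRequestBehavior.AllowGet);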

How to query an index with a subcollection that has date ranges in RavenDB?

I prepared the full test case here: https://gist.github.com/pkrakowiak/cc8addf5725193a01f2d
There are Location documents. Each location can have zero or more sponsors during some time periods (represented by the IList<Sponsorship> Sponsors property). I need to return only those locations that are sponsored on a particular day (say 15th of March in my example). So such location must have at least one Sponsorship instance that matches the following query: .Where(x => x.Sponsors.Any(s => s.From <= today && s.To >= today))
I prepared two tests, one is not using an index explicitly: CanGetCurrentlySponsoredLocations, and one which uses a static index that I created: CanGetCurrentlySponsoredLocationsUsingStaticIndex. The first one will pass, the second one will fail. The question is - how do I make the second test pass? What sort of modifications do I need to apply to my Locations_ByCoordinates index?
In case you are wondering where the index name came from or what the reviews are - just ignore them. :) They are leftovers from other things that I was testing.
Update
I took this question first to the official RavenDB Google group: https://groups.google.com/forum/?fromgroups=#!topic/ravendb/ySUPXqkpTA8 Sadly, it did not bring me a solution.
The simplest index that will pass your unit test is:
private class Locations_ByCoordinates : AbstractIndexCreationTask<Location>
{
public Locations_ByCoordinates()
{
Map = locations => from l in locations
from s in l.Sponsors
select new
{
Sponsors_From = s.From,
Sponsors_To = s.To
};
}
}
You might want to pick a better name, since the coordinates aren't indexed.
I'm not sure what your other test CanSortOnSponsorshipStatus is all about though.
UPDATE
To include locations that have no sponsors, use the DefaultIfEmpty LINQ extension method. This will make sure that all locations have at least one index entry.
private class Locations_ByCoordinates : AbstractIndexCreationTask<Location>
{
public Locations_ByCoordinates()
{
Map = locations => from l in locations
from s in l.Sponsors
.DefaultIfEmpty(new Sponsorship
{
From = DateTime.MinValue,
To = DateTime.MaxValue
})
select new
{
Sponsors_From = s.From,
Sponsors_To = s.To
};
}
}
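For completeness, a sketch of how the test might then query that index. It assumes the RavenDB LINQ provider's default field naming, under which the Sponsors.Any(...) predicate is translated onto the Sponsors_From / Sponsors_To fields defined in the map:
// 'session' is an IDocumentSession; 'today' is the date under test
// (15th of March from the question; the year is assumed here).
var today = new DateTime(2014, 3, 15);
var sponsored = session
    .Query<Location, Locations_ByCoordinates>()
    // in a unit test you may also want to wait for the index to become non-stale
    .Where(x => x.Sponsors.Any(s => s.From <= today && s.To >= today))
    .ToList();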

Having difficulty with specific Future<T> query and collections with NHibernate

For the sake of example, I am removing non-queried and non-essential data just to figure out how to do the initial query here.
I have a model structure like this.
class Path {
Guid Id { get; protected set; }
IList<Step> Steps { get; set; }
void AddStep(Step entity) {
// write up bidirectional association
}
}
class Step {
Guid Id { get; protected set; }
Path Path { get; set; }
// other data irreleveent
}
Now assume 50,000 Paths, each with 5,000 Steps... I do realize I don't want to return all of them at once, but putting a limit on my query fetch isn't my real problem.
Here is the exact query I am attempting to use; I am getting this exception:
NHibernate.QueryException : duplicate alias: lpStep
----> System.ArgumentException : An item with the same key has already been added.
I'm not entirely sure how to handle this scenario. If I use a flat-out Fetch on the Path query, I get Select N+1 warnings from the NHibernate Profiler.
I do have batching enabled, but as far as I am aware that only really applies to inserts, not retrievals. In any case I am getting back these errors and am not sure how to handle them. Any ideas?
using (var Transaction = Session.BeginTransaction()) {
Path lpPath = null;
Step lpStep = null;
var lpPaths = Session.QueryOver<Path>(() => lpPath)
.Take(50)
.Future<Path>();
var lpSteps = Session.QueryOver<Step>(() => lpStep)
.JoinAlias(() => lpPath.Steps, () => lpStep)
.Where(o => o.Path.Id == lpPath.Id)
.Take(12)
.Future<Step>();
Transaction.Commit();
foreach (var path in lpPaths) {
Console.WriteLine("{0} fetched {1} Steps",
path.Id, path.Steps.Count);
}
}
I basically want to say:
Select (50) Paths; also, as a separate select but part of the same trip to the database, select the first (12) Steps that belong to the previously selected Paths.
But if I use a flat out join, I get 110 rows, whereas I expect to have 2 tables, 1 of 50 rows, 1 of 600 rows.
Can someone explain to me what I am doing wrong?
Mind you, I can do some minor alterations and the query runs, but it isn't 'optimized'. I can get the data I want, but it takes multiple trips and lazy loading. I can optimize the actual Path selection easily enough, but it is those blasted Steps. If I just take the restrictive where clause out of the lpSteps query, it returns only the first 12 Steps overall, not 12 Steps for each selected Path.
I've looked at some of the other stack overflow posts on Future<T> and found them to look a lot like this. So I don't understand why it isn't working. I suspect that what is going on is this..
lpPaths runs.
lpSteps tries to run, first one succeeds.
lpSteps then tries to run again, finds it cannot redefine lpPaths.
Apocalypse
I'm really hoping someone smarter than me can enlighten me on the absolute most optimal way to write this.
I can't really understand what your use case is. Why do you only need the first 12 Steps of each Path? What about processing the Steps in batches:
IList<Guid> pathIds;
// page through the Paths, 100 at a time
while ((pathIds = Session.QueryOver<Path>()
    .Where(...)
    .Select(path => path.Id)
    .Take(100)
    .List<Guid>()).Count > 0)
{
    int batch = 0;
    const int batchSize = 600;
    IList<Step> steps;
    // page through the Steps of the current Paths, 600 at a time
    while ((steps = Session.QueryOver<Step>()
        .WhereRestrictionOn(step => step.Path.Id).IsIn(pathIds.ToArray())
        .Where(step => step. ...)
        .Skip(batch * batchSize)
        .Take(batchSize)
        .List<Step>()).Count > 0)
    {
        DoSomething(steps);
        batch++;
    }
}