How do perform a graph query and join? - ravendb

I apologize for the title, I don't exactly know how to word it. But essentially, this is a graph-type query but I know RavenDB's graph functionality will be going away so this probably needs to be solved with Javascript.
Here is the scenario:
I have a bunch of documents of different types, call them A, B, C, D. Each of these particular types of documents have some common properties. The one that I'm interested in right now is "Owner". The owner field is an ID which points to one of two other document types; it can be a Group or a User.
The Group document has a 'Members' field which contains an ID which either points to a User or another Group. Something like this
It's worth noting that the documents in play have custom IDs that begin with their entity type. For example Users and Groups begin with user: and group: respectively. Example IDs look like this: user:john#castleblack.com or group:the-nights-watch. This comes into play later.
What I want to be able to do is the following type of query:
"Given that I have either a group id or a user id, return all documents of type a, b, or c where the group/user id is equal to or is a descendant of the document's owner."
In other words, I need to be able to return all documents that are owned by a particular user or group either explicitly or implicitly through a hierarchy.
I've considered solving this a couple different ways with no luck. Here are the two approaches I've tried:
Using a function within a query
With Dejan's help in an email thread, I was able to devise a function that would walk it's way down the ownership graph. What this attempted to do was build a flat array of IDs which represented explicit and implicit owners (i.e. root + descendants):
declare function hierarchy(doc, owners){
owners = owners || [];
while(doc != null) {
let ownerId = id(doc)
if(ownerId.startsWith('user:')) {
owners.push(ownerId);
} else if(ownerId.startsWith('group:')) {
owners.push(ownerId);
doc.Members.forEach(m => {
let owner = load(m, 'Users') || load(m, 'Groups');
owners = hierarchy(owner, owners);
});
}
}
return owners;
}
I had two issues with this. 1. I don't actually know how to use this in a query lol. I tried to use it as part of the where clause but apparently that's not allowed:
from #all_docs as d
where hierarchy(d) = 'group:my-group-d'
// error: method hierarchy not allowed
Or if I tried anything in the select statement, I got an error that I have exceeded the number of allowed statements.
As a custom index
I tried the same idea through a custom index. Essentially, I tried to create an index that would produce an array of IDs using roughly the same function above, so that I could just query where my id was in that array
map('#all_docs', function(doc) {
function hierarchy(n, graph) {
while(n != null) {
let ownerId = id(n);
if(ownerId.startsWith('user:')) {
graph.push(ownerId);
return graph;
} else if(ownerId.startsWith('group:')){
graph.push(ownerId);
n.Members.forEach(g => {
let owner = load(g, 'Groups') || load(g, 'Users');
hierarchy(owner, graph);
});
return graph;
}
}
}
function distinct(value, index, self){ return self.indexOf(value) === index; }
let ownerGraph = []
if(doc.Owner) {
let owner = load(doc.Owner, 'Groups') || load(doc.Owner, 'Users');
ownerGraph = hierarchy(owner, ownerGraph).filter(distinct);
}
return { Owners: ownerGraph };
})
// error: recursion is not allowed by the javascript host
The problem with this is that I'm getting an error that recursion is not allowed.
So I'm stumped now. Am I going about this wrong? I feel like this could be a subquery of sorts or a filter by function, but I'm not sure how to do that either. Am I going to have to do this in two separate queries (i.e. two round-trips), one to get the IDs and the other to get the docs?
Update 1
I've revised my attempt at the index to the following and I'm not getting the recursion error anymore, but assuming my queries are correct, it's not returning anything
// Entity/ByOwnerGraph
map('#all_docs', function(doc) {
function walkGraph(ownerId) {
let owners = []
let idsToProcess = [ownerId]
while(idsToProcess.length > 0) {
let current = idsToProcess.shift();
if(current.startsWith('user:')){
owners.push(current);
} else if(current.startsWith('group:')) {
owners.push(current);
let group = load(current, 'Groups')
if(!group) { continue; }
idsToProcess.concat(group.Members)
}
}
return owners;
}
let owners = [];
if(doc.Owner) {
owners.concat(walkGraph(doc.Owner))
}
return { Owners: owners };
})
// query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners = "group:my-group-id"
// alternate query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners ALL IN ("group:my-group-id")
I still can't use this approach in a query either as I get the same error that there are too many statements.

Related

Select from IndexedDB database, using multiple indexes

Each of my IndexedDB objects have the following format:
object:
{
objectName: "someName"
objectTest: "someTest"
objectAge: "someAge"
}
Then I have the following indexes already set:
storeObject.createIndex( "by_objectName", "objectName", { unique: false } );
storeObject.createIndex( "by_objectTest", "objectTest", { unique: false } );
storeObject.createIndex( "by_objectAge", "objectAge", { unique: false } );
So first I wanted to loop through all my objects by objectName (this is working):
var openedIndex = storeObject.index("by_objectName");
var numItemsInIndex = openedIndex.count();
if (openedIndex) {
var curCursor = openedIndex.openCursor();
curCursor.onsuccess = function(evt) {
var cursor = evt.target.result;
if (cursor) {
//do something
cursor.continue();
}
}
}
So the above code is taking all the objects and they are sorted by objectName.
How can I take all the objects that have objectTest: "certainValue" and the sorting by objectName to remain the same as in the above example. I need to filter the list of result before the line if (openedIndex) { because I need to use the numItemsInIndex later in the loop.
So in other words, if this was relational database, how to implement:
SELECT * FROM objects WHERE objectTest = "certainValue" SORT BY objectName
You can create an index on two properties at once by defining the keyPath parameter to createIndex as an array. Use the property you wish to sort by as the first item in the array. For example, see my other posts, like this one: In IndexedDB, is there a way to make a sorted compound query?

RavenDB : Use "Search" only if given within one query

I have a situation where my user is presented with a grid, and it, by default, will just get the first 15 results. However they may type in a name and search for an item across all pages.
Alone, either of these works fine, but I am trying to figure out how to make them work as a single query. This is basically what it looks like...
// find a filter if the user is searching
var filters = request.Filters.ToFilters();
// determine the name to search by
var name = filters.Count > 0 ? filters[0] : null;
// we need to be able to catch some query statistics to make sure that the
// grid view is complete and accurate
RavenQueryStatistics statistics;
// try to query the items listing as quickly as we can, getting only the
// page we want out of it
var items = RavenSession
.Query<Models.Items.Item>()
.Statistics(out statistics) // output our query statistics
.Search(n => n.Name, name)
.Skip((request.Page - 1) * request.PageSize)
.Take(request.PageSize)
.ToArray();
// we need to store the total results so that we can keep the grid up to date
var totalResults = statistics.TotalResults;
return Json(new { data = items, total = totalResults }, JsonRequestBehavior.AllowGet);
The problem is that if no name is given, it does not return anything; Which is not the desired result. (Searching by 'null' doesn't work, obviously.)
Do something like this:
var q= RavenSession
.Query<Models.Items.Item>()
.Statistics(out statistics); // output our query statistics
if(string.IsNullOrEmpty(name) == false)
q = q.Search(n => n.Name, name);
var items = q.Skip((request.Page - 1) * request.PageSize)
.Take(request.PageSize)
.ToArray();

SQL to Magento model understanding

Understanding Magento Models by reference of SQL:
select * from user_devices where user_id = 1
select * from user_devices where device_id = 3
How could I perform the same using my magento models? getModel("module/userdevice")
Also, how can I find the number of rows for each query
Following questions have been answered in this thread.
How to perform a where clause ?
How to retrieve the size of the result set ?
How to retrieve the first item in the result set ?
How to paginate the result set ? (limit)
How to name the model ?
You are referring to Collections
Some references for you:
http://www.magentocommerce.com/knowledge-base/entry/magento-for-dev-part-5-magento-models-and-orm-basics
http://alanstorm.com/magento_collections
http://www.magentocommerce.com/wiki/1_-_installation_and_configuration/using_collections_in_magento
lib/varien/data/collection/db.php and lib/varien/data/collection.php
So, assuming your module is set up correctly, you would use a collection to retrieve multiple objects of your model type.
Syntax for this is:
$yourCollection = Mage::getModel('module/userdevice')->getCollection()
Magento has provided some great features for developers to use with collections. So your example above is very simple to achieve:
$yourCollection = Mage::getModel('module/userdevice')->getCollection()
->addFieldToFilter('user_id', 1)
->addFieldToFilter('device_id', 3);
You can get the number of objects returned:
$yourCollection->count() or simply count($yourCollection)
EDIT
To answer the question posed in the comment: "what If I do not require a collection but rather just a particular object"
This depends if you still require both conditions in the original question to be satisfied or if you know the id of the object you wish to load.
If you know the id of the object then simply:
Mage::getModel('module/userdevice')->load($objectId);
but if you wish to still load based on the two attributes:
user_id = 1
device_id = 3
then you would still use a collection but simply return the first object (assuming that only one object could only ever satisfy both conditions).
For reuse, wrap this logic in a method and place in your model:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
->setCurPage(1)
->setPageSize(1)
;
foreach ($collection as $obj) {
return $obj;
}
return false;
}
You would call this as follows:
$userId = 1;
$deviceId = 3;
Mage::getModel('module/userdevice')->loadByUserDevice($userId, $deviceId);
NOTE:
You could shorten the loadByUserDevice to the following, though you would not get the benefit of the false return value should no object be found:
public function loadByUserDevice($userId, $deviceId)
{
$collection = $this->getResourceCollection()
->addFieldToFilter('user_id', $userId)
->addFieldToFilter('device_id', $deviceId)
;
return $collection->getFirstItem();
}

Proper Way to Retrieve More than 128 Documents with RavenDB

I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...
It was my understanding that one could retrieve more documents than the 128 default setting by doing this:
session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;
And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:
public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
using (IDocumentSession session = GetRavenSession())
{
return session.Query<T>().Where(whereClause).ToList();
}
}
However, that only returns 128 documents. Why?
Note, here is the code that calls the above method:
RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);
If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:
return session.Query<T>().Where(whereClause).Take(200).ToList();
Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?
For my app, I need to retrieve thousands of documents (that have very little data in them). We keep these documents in memory and used as the data source for charts.
** EDIT **
I tried using int.MaxValue in my Take():
return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();
And that returns 1024. Argh. How do I get more than 1024?
** EDIT 2 - Sample document showing data **
{
"Header_ID": 3525880,
"Sub_ID": "120403261139",
"TimeStamp": "2012-04-05T15:14:13.9870000",
"Equipment_ID": "PBG11A-CCM",
"AverageAbsorber1": "284.451",
"AverageAbsorber2": "108.442",
"AverageAbsorber3": "886.523",
"AverageAbsorber4": "176.773"
}
It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:
var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
while (enumerator.MoveNext())
{
User activeUser = enumerator.Current.Document;
}
}
There is support for standard RavenDB queries, Lucence queries and there is also async support.
The documentation can be found here. Ayende's introductory blog article can be found here.
The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:
<add key="Raven/MaxPageSize" value="5000"/>
For more info, see: http://ravendb.net/docs/intro/safe-by-default
The Take(n) function will only give you up to 1024 by default. However, you can use it in pair with Skip(n) to get all
var points = new List<T>();
var nextGroupOfPoints = new List<T>();
const int ElementTakeCount = 1024;
int i = 0;
int skipResults = 0;
do
{
nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
i++;
skipResults += stats.SkippedResults;
points = points.Concat(nextGroupOfPoints).ToList();
}
while (nextGroupOfPoints.Count == ElementTakeCount);
return points;
RavenDB Paging
Number of request per session is a separate concept then number of documents retrieved per call. Sessions are short lived and are expected to have few calls issued over them.
If you are getting more then 10 of anything from the store (even less then default 128) for human consumption then something is wrong or your problem is requiring different thinking then truck load of documents coming from the data store.
RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.
If you have need to perform data aggregation, create map/reduce index which results in aggregated data e.g.:
Index:
from post in docs.Posts
select new { post.Author, Count = 1 }
from result in results
group result by result.Author into g
select new
{
Author = g.Key,
Count = g.Sum(x=>x.Count)
}
Query:
session.Query<AuthorPostStats>("Posts/ByUser/Count")(x=>x.Author)();
You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.
var query = session.Query<User, MyUserIndex>();
var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);
using (var enumerator = session.Advanced.Stream<User>(query))
{
while (enumerator.MoveNext())
{
var user = enumerator.Current.Document;
// do something
}
}
Example index:
public class MyUserIndex: AbstractIndexCreationTask<User>
{
public MyUserIndex()
{
this.Map = users =>
from u in users
select new
{
u.IsDeleted,
u.Username,
};
}
}
Documentation: What are indexes?
Session : Querying : How to stream query results?
Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.
Other note: you may get the following exception if you do not specify the index to use.
InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.

Best way to fetch tree data in NHibernate

I want to fetch Hierarchical/Tree data something like below from a Table which has following definiton.
Tree Table:
"""""""""""
Id |ParentId
"""""""""""
Work1|null
Work2|Work1
Work3|Work2
...
Required Query result Data (no need to be tabbed)- If I Pick 'Work1' I should complete Ids which are under its root something like below. If I pick 'Work2' then also I should complete Ids above and below its root.
> Work1
----------
> Work2
----------
> Work3
---------
What is the best way in NHibernate to fetch data in the above scenario in optimized manner.
To find out what the "best way" is, more information regarding the actual scenario would be needed. What kind of "optimization" are you looking for? Minimal amount of data (only the rows you are really going to need) or minimal number of SQL queries (preferably one roundtrip to the database) or any other?
Scenario 1: Menu or tree structure that is loaded once and kept in memory for longer periods of time (not a list that updates every few seconds). Small number of rows in the table (small is relative but I'd say anything below 200).
In this case I would just get the whole table with one query like this:
var items = session.Query<Work>()
.Fetch(c => c.ParentWork)
.Fetch(c => c.ChildWorks).ToList();
var item = session.Get<Work>(id);
This will result in a single SQL query which simply loads all the rows from the table. item will contain the complete tree (parents, grandparents, children, etc.).
Scenario 2: Large number of rows and only a fraction of rows needed. Only few levels in the hierarchy are to be expected.
In this case, just load the item and let NHibernate to the rest with lazy loading or force it to load everything by writing a recursive method to traverse parents and children. This will cause a N+1 select, which may or may not be slower than scenario 1 (depending on your data).
Here is a quick hack demonstrating this:
var item = session.Get<Work>(id);
Work parent = item.ParentWork;
Work root = item;
// find the root item
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// scan the whole tree
this.ScanChildren(root);
// -----
private void ScanChildren(Work item)
{
if (item == null)
{
return;
}
foreach (Work child in item.ChildWorks)
{
string name = child.Name;
this.ScanChildren(child);
}
}
Edit:
Scenario 3: Huge amount of data. Minimal number of queries and minimal amount of data.
In this case, I would think not of a tree structure but of having layers of data that we load one after another.
var work = repo.Session.Get<Work>(id);
// get root of that Work
Work parent = work.ParentWork;
Work root = work;
while (parent != null)
{
root = parent;
parent = parent.ParentWork;
}
// Get all the Works for each level
IList<Work> worksAll = new List<Work>() { root };
IList<Work> worksPerLevel = new List<Work>() { root };
// get each level until we don't have any more Works in the next level
int count = worksPerLevel.Count;
while (count > 0)
{
worksPerLevel = this.GetChildren(session, worksPerLevel);
// add the Works to our list of all Works
worksPerLevel.ForEach(c => worksAll.Add(c));
count = worksPerLevel.Count;
}
// here you can get the names of the Works or whatever
foreach (Work c in worksAll)
{
string s = c.Name;
}
// this methods gets the Works in the next level and returns them
private IList<Work> GetChildren(ISession session, IList<Work> worksPerLevel)
{
IList<Work> result = new List<Work>();
// get the IDs for the works in this level
IList<int> ids = worksPerLevel.Select(c => c.Id).ToList();
// use a WHERE IN clause do get the Works
// with the ParentId of Works in the current level
result = session.QueryOver<Work>()
.Where(
NHibernate.Criterion.Restrictions.InG<int>(
NHibernate.Criterion.Projections.Property<Work>(
c => c.ParentWork.Id),
ids)
)
.Fetch(c => c.ChildWorks).Eager // this will prevent the N+1 problem
.List();
return result;
}
This solution will not cause a N+1 problem, because we use an eager load for the children, so NHibernate will know the state of the child lists and not hit the DB again. You will only get x+y selects, where x is the number of selects to find the root Work and y is the number of levels (max depth of he tree).