RavenDB: Search occurrences in text is slow

I would like to find the occurrences of a word in a text.
I have a class like this:
public class Page
{
public string Id { get; set; }
public string BookId { get; set; }
public string Content { get; set; }
public int PageNumber { get; set; }
}
My index looks like this:
class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>
{
public class ReduceResult
{
public string PageId { get; set; }
public int Count { get; set; }
public string Word { get; set; }
public string Content { get; set; }
}
public Pages_SearchOccurrence()
{
Map = pages => from page in pages
let words = page.Content
.ToLower()
.Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
from w in words
select new
{
page.Content,
PageId = page.Id,
Count = 1,
Word = w
};
Reduce = results => from result in results
group result by new { PageId = result.PageId, result.Word } into g
select new
{
Content = g.First().Content,
PageId = g.Key.PageId,
Word = g.Key.Word,
Count = g.ToList().Count()
};
Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
}
}
Finally, my query looks like this:
using (var session = documentStore.OpenSession())
{
RavenQueryStatistics stats;
var occurence = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>()
.Statistics(out stats)
.Where(x => x.Word == "works")
.ToList();
}
But RavenDB is very slow (or my query is wrong).
stats.IsStale is true, and Raven Studio takes too long and returns only a few results.
I have 1000 Page documents, each with a Content of about 1000 words.
Why is my query not working well, and how can I find the occurrences of a word in a page?
Thank you for your help!

You are doing it wrong. You should set the Content field as Analyzed and use RavenDB's Search() operator. The slowness is most likely because of the amount of un-optimized work your index code is doing.
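As a rough in-memory illustration of what a per-page word count has to compute (plain LINQ to Objects, not RavenDB index code; the OccurrenceSketch class and CountWord helper are hypothetical names), the reduce step sums the mapped Count values instead of counting the items in the group. That matters in a map/reduce index because the reduce function can be run again over its own output:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Page
{
    public string Id { get; set; }
    public string Content { get; set; }
}

// Hypothetical helper: in-memory stand-in for the map/reduce logic.
public static class OccurrenceSketch
{
    private static readonly string[] Separators = { " ", "\n", ",", ";" };

    // Returns (PageId, Count) pairs for one word, highest count first.
    public static List<KeyValuePair<string, int>> CountWord(IEnumerable<Page> pages, string word)
    {
        var wanted = word.ToLowerInvariant();

        // "Map": one entry per word occurrence, each with Count = 1.
        var mapped = from page in pages
                     from w in page.Content.ToLowerInvariant()
                                   .Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                     select new { PageId = page.Id, Word = w, Count = 1 };

        // "Reduce": group by page and word, then SUM the counts.
        var reduced = from m in mapped
                      group m by new { m.PageId, m.Word } into g
                      select new { g.Key.PageId, g.Key.Word, Count = g.Sum(x => x.Count) };

        return reduced.Where(r => r.Word == wanted)
                      .OrderByDescending(r => r.Count)
                      .Select(r => new KeyValuePair<string, int>(r.PageId, r.Count))
                      .ToList();
    }
}
```

Calling OccurrenceSketch.CountWord(pages, "works") returns one entry per page that contains the word, ordered by how often it occurs there.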

I have found a partial solution.
Perhaps I was not clear: my goal is to find the occurrences of a word in each page.
I want the hit count of a word per page, and I would like to order by this count.
I changed my index like this:
class Pages_SearchOccurrence : AbstractIndexCreationTask<Page, Pages_SearchOccurrence.ReduceResult>{
public class ReduceResult
{
public string Content { get; set; }
public string PageId { get; set; }
public string Count { get; set; }
public string Word { get; set; }
}
public Pages_SearchOccurrence()
{
Map = pages => from page in pages
let words = page.Content.ToLower().Split(new string[] { " ", "\n", ",", ";" }, StringSplitOptions.RemoveEmptyEntries)
from w in words
select new
{
page.Content,
PageId = page.Id,
Count = 1,
Word = w
};
Index(x => x.Content, Raven.Abstractions.Indexing.FieldIndexing.Analyzed);
Index(x => x.PageId, Raven.Abstractions.Indexing.FieldIndexing.NotAnalyzed);
}
}
Finally, my new query looks like this:
using (var session = documentStore.OpenSession())
{
var query = session.Query<Pages_SearchOccurrence.ReduceResult, Pages_SearchOccurrence>()
.Search((x) => x.Word, "works")
.AggregateBy(x => x.PageId)
.CountOn(x => x.Count)
.ToList()
.Results
.FirstOrDefault();
var listFacetValues = query.Value.Values;
var finalResult = listFacetValues.GroupBy(x => x.Hits).OrderByDescending(x => x.Key).Take(5).ToList();
}
finalResult gives me groups of FacetValue, each of which has a Hits property
(the Hits and Count properties of my FacetValue are the same here).
The Hits property gives me the result I want, but this code does not feel correct to me, and Raven Studio does not like it either.
Do you have a better solution?

Related

Take function doesn't work and cannot be sent to RavenDB for query

Query Code:
var query = session.IndexQuery<App_OrgSearch.IndexResult, App_OrgSearch>();
var organizationUnitResults = query.Statistics(out stats)
.Skip(0)
.Take(5)
.AsProjection<Org>().ToList();
public static IRavenQueryable<TResult> IndexQuery<TResult, TIndex>(this IDocumentSession session)
where TIndex : AbstractIndexCreationTask, new()
{
return session.Query<TResult, TIndex>();
}
App_OrgSearch is the index I defined as below:
public class App_OrgSearch : AbstractIndexCreationTask<Org, App_OrgSearch.IndexResult>
{
public class IndexResult
{
public string Id { get; set; }
public string BusinessName { get; set; }
public string ShortName { get; set; }
public IList<string> Names { get; set; }
public List<string> PhoneNumbers { get; set; }
public List<OrganizationUnitPhone> OrganizationUnitPhones { get; set; }
}
public App_OrgSearch()
{
Map = docs => from doc in docs
select new
{
Id = doc.Id,
Names = new List<string>
{
doc.BusinessName,
doc.ShortName,
},
BusinessName = doc.BusinessName,
ShortName = doc.ShortName,
PhoneNumbers = doc.OrganizationUnitPhones.Where(x => x != null && x.Phone != null).Select(x => x.Phone.Number),
};
Indexes.Add(x => x.Names, FieldIndexing.Analyzed);
}
}
I have 27 records in the database. I want to take 5, but the query returns all 27 records. Why does the Take function not work?
Your sample code seems wrong.
var query = session.IndexQuery<App_OrgSearch.IndexResult, App_OrgSearch>();
var organizationUnitResults = organizationUnitsQuery.Statistics(out stats)
What is organizationUnitsQuery? You named the query variable query, and there is no IndexQuery method on the session.

RavenDB trouble adding wherein to spatial intersect query

I have an object Partner that looks like this:
public class Partner
{
public double Latitude { get; set; }
public double Longitude { get; set; }
public string ServiceKeyIds { get; set; }
public double WorkingRadius { get; set; }
public float Rating { get; set; }
}
I then have a projection LocalPartner that looks like this:
public class LocalPartner : Partner
{
public string WorkingRadiusShape { get; set; }
}
My index PartnersByLocation looks like this:
public class PartnersByLocation : AbstractIndexCreationTask<Partner,LocalPartner>
{
public PartnersByLocation()
{
Map = partners => from doc in partners
where doc.ServiceKeyIds != null
select new
{
doc.ServiceKeyIds,
WorkingRadiusShape = string.Format("Circle({0},{1}, d={2})", doc.Latitude, doc.Longitude, doc.WorkingRadius)
}.Boost((float)doc.Rating);
Spatial(x => x.WorkingRadiusShape, options => options.Geography.Default());
Index(x => x.ServiceKeyIds, FieldIndexing.NotAnalyzed);
}
}
I try to query the index as follows:
public LocalPartnerList GetNearbyPartners(double lat, double lng, string serviceIds, double rad = 25, int page = 0, int resultsPerPage = 10)
{
RavenQueryStatistics stats;
var point = string.Format(CultureInfo.InvariantCulture, "POINT ({0} {1})", lng, lat);
var query = session.Advanced.LuceneQuery<LocalPartner, PartnersByLocation>().Statistics(out stats).Spatial(x => x.WorkingRadiusShape, c => c.Intersects(point));
if (serviceIds != null && serviceIds.Any())
{
query = query.AndAlso().WhereIn("ServiceKeyIds", serviceIds.SafeSplit());
}
var raw = query.ToString();
query = query.Skip(page * resultsPerPage).Take(resultsPerPage);
var result = new LocalPartnerList();
result.LocalPartners = query.SelectFields<LocalPartner>().ToList();
result.Results = stats.TotalResults;
result.Page = page;
result.NumberPages = (int)Math.Ceiling((double)result.Results / (double)resultsPerPage);
result.Radius = (int)rad;
return result;
}
This query works if I call the method and serviceIds is null. It returns the expected results, but as soon as I apply the .AndAlso().WhereIn() clause, the query returns an empty set.
I am certain that there are matching records. I should note that the ServiceKeyIds property of Partner holds a string of comma-separated strings, and the serviceIds passed to the method is the same, and SafeSplit() is an extension method that returns an array of strings.
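One possible explanation, offered as an assumption rather than a confirmed diagnosis: if a NotAnalyzed field stores the whole comma-separated string as a single term, a WhereIn over individual ids can only match partners that have exactly one id. The plain C# sketch below involves no RavenDB at all (TermMatchSketch and its methods are hypothetical); it only contrasts exact matching against the unsplit value with exact matching against one term per id, which is what an index would see if the map emitted the split values instead of the raw string.

```csharp
using System;
using System.Linq;

// Hypothetical helper: models exact-term matching outside of any index.
public static class TermMatchSketch
{
    // Exact match against the unsplit field value (one term per document).
    public static bool MatchesWholeValue(string storedValue, string[] wantedIds)
    {
        return wantedIds.Contains(storedValue);
    }

    // Exact match against one term per id, as if the value had been split at index time.
    public static bool MatchesSplitTerms(string storedValue, string[] wantedIds)
    {
        var terms = storedValue.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                               .Select(t => t.Trim());
        return terms.Any(t => wantedIds.Contains(t));
    }
}
```

If this is indeed the cause, checking the indexed terms for ServiceKeyIds in the studio should show full comma-separated strings rather than individual ids.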

How to count all posts belonging to multiple tags in NHibernate?

I have a many to many relationship:
A post can have many tags
A tag can have many posts
Models:
public class Post
{
public virtual string Title { get; set; }
public virtual string Content{ get; set; }
public virtual User User { get; set; }
public virtual ICollection<Tag> Tags { get; set; }
}
public class Tag
{
public virtual string Title { get; set; }
public virtual string Description { get; set; }
public virtual User User { get; set; }
public virtual ICollection<Post> Posts { get; set; }
}
I want to count all posts that belong to multiple tags but I don't know how to do this in NHibernate. I am not sure if this is the best way to do this but I used this query in MS SQL:
SELECT COUNT(*)
FROM
(
SELECT Posts.Id FROM Posts
INNER JOIN Users ON Posts.UserId=Users.Id
LEFT JOIN TagsPosts ON Posts.Id=TagsPosts.PostId
LEFT JOIN Tags ON TagsPosts.TagId=Tags.Id
WHERE Users.Username='mr.nuub' AND (Tags.Title in ('c#', 'asp.net-mvc'))
GROUP BY Posts.Id
HAVING COUNT(Posts.Id)=2
)t
But NHibernate does not allow subqueries in the from clause. It would be great if someone could show me how to do this in HQL.
I found a way to get this result without a subquery, and it works with NHibernate LINQ. It was not that easy, because NHibernate supports only a subset of LINQ expressions, but anyway.
Query:
var searchTags = new[] { "C#", "C++" };
var result = session.Query<Post>()
.Select(p => new {
Id = p.Id,
Count = p.Tags.Where(t => searchTags.Contains(t.Title)).Count()
})
.Where(s => s.Count >= 2)
.Count();
It produces the following SQL statement:
select cast(count(*) as INT) as col_0_0_
from Posts post0_
where (
select cast(count(*) as INT)
from PostsToTags tags1_, Tags tag2_
where post0_.Id=tags1_.Post_id
and tags1_.Tag_id=tag2_.Id
and (tag2_.Title='C#' or tag2_.Title='C++'))>=2
you should be able to build your user restriction into this, I hope.
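As an in-memory statement of the intended semantics (LINQ to Objects, not NHibernate; the simplified PostRow type and the TagCountSketch helper are hypothetical), the user restriction can be expressed as an extra Where before the projection. Whether NHibernate's LINQ provider translates the equivalent expression is not verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened view of a post: owner name plus tag titles.
public class PostRow
{
    public int Id { get; set; }
    public string UserName { get; set; }
    public List<string> TagTitles { get; set; }
}

public static class TagCountSketch
{
    // Counts posts by the given user that carry every one of the search tags.
    // Assumes a post never lists the same tag twice and searchTags has no duplicates.
    public static int CountPostsWithAllTags(IEnumerable<PostRow> posts, string userName, string[] searchTags)
    {
        return posts
            .Where(p => p.UserName == userName)
            .Select(p => new
            {
                p.Id,
                Count = p.TagTitles.Count(t => searchTags.Contains(t))
            })
            .Where(s => s.Count >= searchTags.Length)
            .Count();
    }
}
```

Comparing against searchTags.Length instead of a literal 2 keeps the condition correct when the number of search tags changes.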
The following is my test setup and the random data that was generated:
public class Post
{
public Post()
{
Tags = new List<Tag>();
}
public virtual void AddTag(Tag tag)
{
this.Tags.Add(tag);
tag.Posts.Add(this);
}
public virtual string Title { get; set; }
public virtual string Content { get; set; }
public virtual ICollection<Tag> Tags { get; set; }
public virtual int Id { get; set; }
}
public class PostMap : ClassMap<Post>
{
public PostMap()
{
Table("Posts");
Id(p => p.Id).GeneratedBy.Native();
Map(p => p.Content);
Map(p => p.Title);
HasManyToMany<Tag>(map => map.Tags).Cascade.All();
}
}
public class Tag
{
public Tag()
{
Posts = new List<Post>();
}
public virtual string Title { get; set; }
public virtual string Description { get; set; }
public virtual ICollection<Post> Posts { get; set; }
public virtual int Id { get; set; }
}
public class TagMap : ClassMap<Tag>
{
public TagMap()
{
Table("Tags");
Id(p => p.Id).GeneratedBy.Native();
Map(p => p.Description);
Map(p => p.Title);
HasManyToMany<Post>(map => map.Posts).LazyLoad().Inverse();
}
}
test run:
var sessionFactory = Fluently.Configure()
.Database(FluentNHibernate.Cfg.Db.MsSqlConfiguration.MsSql2012
.ConnectionString(@"Server=.\SQLExpress;Database=TestDB;Trusted_Connection=True;")
.ShowSql)
.Mappings(m => m.FluentMappings
.AddFromAssemblyOf<PostMap>())
.ExposeConfiguration(cfg => new SchemaUpdate(cfg).Execute(false, true))
.BuildSessionFactory();
using (var session = sessionFactory.OpenSession())
{
var t1 = new Tag() { Title = "C#", Description = "C#" };
session.Save(t1);
var t2 = new Tag() { Title = "C++", Description = "C/C++" };
session.Save(t2);
var t3 = new Tag() { Title = ".Net", Description = "Net" };
session.Save(t3);
var t4 = new Tag() { Title = "Java", Description = "Java" };
session.Save(t4);
var t5 = new Tag() { Title = "lol", Description = "lol" };
session.Save(t5);
var t6 = new Tag() { Title = "rofl", Description = "rofl" };
session.Save(t6);
var tags = session.Query<Tag>().ToList();
var r = new Random();
for (int i = 0; i < 1000; i++)
{
var post = new Post()
{
Title = "Title" + i,
Content = "Something awesome" + i,
};
var manyTags = r.Next(1, 3);
while (post.Tags.Count() < manyTags)
{
var index = r.Next(0, 6);
if (!post.Tags.Contains(tags[index]))
{
post.AddTag(tags[index]);
}
}
session.Save(post);
}
session.Flush();
/* query test */
var searchTags = new[] { "C#", "C++" };
var result = session.Query<Post>()
.Select(p => new {
Id = p.Id,
Count = p.Tags.Where(t => searchTags.Contains(t.Title)).Count()
})
.Where(s => s.Count >= 2)
.Count();
var resultOriginal = session.CreateSQLQuery(@"
SELECT COUNT(*)
FROM
(
SELECT count(Posts.Id)P FROM Posts
LEFT JOIN PostsToTags ON Posts.Id=PostsToTags.Post_id
LEFT JOIN Tags ON PostsToTags.Tag_id=Tags.Id
WHERE Tags.Title in ('c#', 'C++')
GROUP BY Posts.Id
HAVING COUNT(Posts.Id)>=2
)t
").List()[0];
var isEqual = result == (int)resultOriginal;
}
As you can see at the end, I test against your original query (without the users), and it returns the same count.
In HQL:
var hql = "select count(p) from Post p where p in " +
"(select t.Post from Tag t group by t.Post having count(t.Post) > 1)";
var result = session.CreateQuery(hql).UniqueResult<long>();
You can add additional criteria to the subquery if you need to specify tags or other criteria.
Edit: In the future I should read questions through to the last words. I would have seen "in HQL"...
After some searching, I realized that RowCount removes any grouping in the query ( https://stackoverflow.com/a/8034921/1236044 ). I found a solution using QueryOver and a subquery, which I post here for information.
I find this solution interesting because it offers some modularity and separates the counting from the subquery itself, which can be reused as is.
var searchTags = new[] { "tag1", "tag3" };
var userNames = new[] { "mr.nuub" };
Tag tagAlias = null;
Post postAlias = null;
User userAlias = null;
var postsSubquery =
QueryOver.Of<Post>(() => postAlias)
.JoinAlias(() => postAlias.Tags, () => tagAlias)
.JoinAlias(() => postAlias.User, () => userAlias)
.WhereRestrictionOn(() => tagAlias.Title).IsIn(searchTags)
.AndRestrictionOn(() => userAlias.UserName).IsIn(userNames)
.Where(Restrictions.Gt(Projections.Count<Post>(p => tagAlias.Title), 1));
var numberOfPosts = session.QueryOver<Post>()
.WithSubquery.WhereProperty(p => p.Id).In(postsSubquery.Select(Projections.Group<Post>(p => p.Id)))
.RowCount();
Hope this will help

Indexing list count within last month

I would like to query the first 10 documents from a collection in RavenDB, ordered by the number of items in a sublist that satisfy a constraint. This is my entity:
public class Post
{
public string Title { get; set; }
public List<Like> Likes { get; set; }
}
public class Like
{
public DateTime Created { get; set; }
}
I've tried with the following query:
var oneMonthAgo = DateTime.Today.AddMonths(-1);
session
.Query<Post>()
.OrderByDescending(x => x.Likes.Count(y => y.Created > oneMonthAgo))
.Take(10);
Raven complains that the count should be done at index time rather than query time. I've tried moving the count to an index using the following code:
public class PostsIndex : AbstractIndexCreationTask<Post>
{
public PostsIndex()
{
var month = DateTime.Today.AddMonths(-1);
Map = posts => from doc in posts
select
new
{
doc.Title,
LikeCount = doc.Likes.Count(x => x.Created > month),
};
}
}
When adding this index, Raven throws an error 500.
What should I do?
You can do this by creating a Map/Reduce index to flatten the Posts/Likes and then query over that.
The index:
public class PostLikesPerDay : AbstractIndexCreationTask<Post, PostLikesPerDay.Result>
{
public PostLikesPerDay()
{
Map = posts => from post in posts
from like in post.Likes
select new Result
{
Title = post.Title,
Date = like.Created,
Likes = 1
};
Reduce = results => from result in results
group result by new
{
result.Title,
result.Date.Date
}
into grp
select new Result
{
Title = grp.Key.Title,
Date = grp.Key.Date,
Likes = grp.Sum(l => l.Likes)
};
}
public class Result
{
public string Title { get; set; }
public DateTime Date { get; set; }
public int Likes { get; set; }
}
}
And the query:
using (var session = store.OpenSession())
{
var oneMonthAgo = DateTime.Today.AddMonths(-1);
var query = session.Query<PostLikesPerDay.Result, PostLikesPerDay>()
.Where(y => y.Date > oneMonthAgo)
.OrderByDescending(p => p.Likes)
.Take(10);
foreach (var post in query)
{
Console.WriteLine("'{0}' has {1} likes on {2:d}", post.Title, post.Likes, post.Date);
}
}
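Note that each row produced by this index covers one post on one day, so ordering by Likes ranks single days. If a total over the whole month is wanted, the per-day rows still have to be summed per post. The sketch below shows that last step in memory with plain LINQ to Objects; it is not RavenDB code, and the LikesSketch class, DayRow type and TopPosts helper are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LikesSketch
{
    // Hypothetical stand-in for one per-post, per-day index row.
    public class DayRow
    {
        public string Title { get; set; }
        public DateTime Date { get; set; }
        public int Likes { get; set; }
    }

    // Sums the per-day rows newer than 'since' into one total per title, highest first.
    public static List<KeyValuePair<string, int>> TopPosts(IEnumerable<DayRow> rows, DateTime since, int take)
    {
        return rows
            .Where(r => r.Date > since)
            .GroupBy(r => r.Title)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Sum(r => r.Likes)))
            .OrderByDescending(kv => kv.Value)
            .Take(take)
            .ToList();
    }
}
```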

Ravendb Live projections produces null values (Ravendb Lucene, Multimap, Live projections)

First, I am sorry for my English; I will try to explain my problem as simply as I can.
I have spent a lot of time trying to solve a multimap index and live projection problem. I have read a lot on Stack Overflow, Google, Ayende's blog, etc., but could not solve it.
What I want:
I have an app and want a Twitter-like search, where the search box searches multiple sources such as tweet content, user names and hashtags. When I get results, I want to apply a transform and shape the index results into a FullSearchResult model. I also want to know where each result was found: in a post, in a user or in a tag?
The problem:
I have 3 types of documents (Post, User, Tag) and a multimap index. When I create my multimap index with TransformResults, all my results come back with null values. (I query my documents through the multimap index with full-text search.)
My Docs
public class Post
{
public string Id { get; set; }
public long SqlDbId { get; set; }
public string Title { get; set; }
public string Content { get; set; }
public string ContentAsHtml { get; set; }
public Status Status { get; set; }
public DenormalizedUser User { get; set; }
public DenormalizedTagCollection Tags { get; set; }
}
public class User
{
public string Id { get; set; }
public string Name { get; set; }
public string Surname { get; set; }
public string Email { get; set; }
public string MobileNumber { get; set; }
}
public class Tag
{
public string Id { get; set; }
public long SqlDbId { get; set; }
public string TagName { get; set; }
public DenormalizedUser TagInserterDenormalizedUser { get; set; }
public bool IsSystemTag { get; set; }
public Status Status { get; set; }
}
public class FullSearchIndex : AbstractMultiMapIndexCreationTask<FullSearchResult>
{
public FullSearchIndex()
{
AddMap<Post>(posts => from post in posts
let tags = post.Tags
where post.Status == Status.Active
select new
{
UserId = post.User != null ? post.User.Id.ToString() : (string)null,
PostId = post.Id,
TagIds = tags != null ? tags.Select(tag => tag.Id).ToArray() : new string[0],
SearchQuery = new object[]
{
post.Title,
post.Content,
post.Tags != null ? tags.Select(x => x.TagName).ToArray() : new string[0]
},
Source = SearchResultSource.ResultIsFromPost
});
AddMap<User>(users => from user in users
select new
{
UserId = user.Id,
PostId = (string)null,
TagIds = new string[0],
SearchQuery = new object[]
{
user.Name,
user.Surname
},
Source = SearchResultSource.ResultIsFromUser
});
AddMap<Tag>(tags => from tag in tags
where tag.Status == Status.Active
select new
{
UserId = (string)null,
PostId = (string)null,
TagIds = new string[] { tag.Id },
SearchQuery = new object[]
{
tag.TagName
},
Source = SearchResultSource.ResultIsFromTag
});
Index(searchResult => searchResult.SearchQuery, FieldIndexing.Analyzed);
TransformResults = (clientSideDatabase, results) =>
from result in results
let post = clientSideDatabase.Load<Post>(result.PostId)
let tags = clientSideDatabase.Load<Tag>(result.TagIds)
let user = clientSideDatabase.Load<User>(result.UserId)
select new
{
PostId = post != null ? post.Id : (string)null,
PostTitle = post != null ? post.Title : (string)null,
PostContent = post != null ? post.Content : (string)null,
PostTags = tags != null ? tags.Select(x => x.TagName).ToArray() : (string[])null,
UserId = user != null ? user.Id : (string)null,
UserName = user != null ? user.Name : (string)null,
UserSurname = user != null ? user.Surname : (string)null,
UserEmail = user != null ? user.Email : (string)null,
UserMobileNumber = user != null ? user.MobileNumber : (string)null
};
}
}
When I query using the multimap index and Lucene search, I get 4 results. However, all values are null.
query = "Tag50";
session.Query<FullSearchResult, FullSearchIndex>()
.Search(resultItem => resultItem.SearchQuery, query)
.As<FullSearchResultViewModel>()
.ToList();
Nkaya,
You assumed that the input of TransformResults is the output of the maps, but that isn't the case.
The input to TransformResults is the actual documents. The output of the maps is used to generate the index for searching, not to shape what the input to TransformResults looks like.