Lucene Query Syntax

I'm trying to use Lucene to query a domain that has the following structure:
Student 1-------* Attendance *---------1 Course
The data in the domain is summarised below:
Course.name   Attendance.mandatory   Student.name
------------------------------------------------
cooking       N                      Bob
art           Y                      Bob
If I execute the query "courseName:cooking AND mandatory:Y" it returns Bob, because Bob is attending the cooking course, and Bob is also attending a mandatory course. However, what I really want to query for is "students attending a mandatory cooking course", which in this case would return nobody.
Is it possible to formulate this as a Lucene query? I'm actually using Compass, rather than Lucene directly, so I can use either CompassQueryBuilder or Lucene's query language.
For the sake of completeness, the domain classes themselves are shown below. These classes are Grails domain classes, but I'm using the standard Compass annotations and Lucene query syntax.
@Searchable
class Student {
    @SearchableProperty(accessor = 'property')
    String name

    static hasMany = [attendances: Attendance]

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Set<Attendance> getAttendances() {
        return attendances
    }
}

@Searchable(root = false)
class Attendance {
    static belongsTo = [student: Student, course: Course]

    @SearchableProperty(accessor = 'property')
    String mandatory = "Y"

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Course getCourse() {
        return course
    }
}

@Searchable(root = false)
class Course {
    @SearchableProperty(accessor = 'property', name = "courseName")
    String name

    @SearchableId(accessor = 'property')
    Long id
}

What you are trying to do is sometimes known as "scoped search" or "XML search": the ability to search based on a set of related sub-elements. Lucene does not support this natively, but there are some tricks that can make it work.
You can put all of the course data associated with a student in a single field, and increase the term position by a fixed gap (say 100) between the terms of each course. A proximity search, using phrase queries or span queries whose slop is smaller than the gap, then only matches attributes that belong to a single course. This is how Solr supports multi-valued fields (the positionIncrementGap setting on a field type).
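To make the position-gap trick concrete, here is a small plain-Java simulation rather than real Lucene code. The token names (course_cooking, mandatory_y), the gap of 100 and the simplified near() distance check are assumptions for illustration; in Lucene itself the gap comes from an Analyzer's getPositionIncrementGap(String) and the proximity check from a sloppy phrase query or a SpanNearQuery.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (not Lucene code): every course a student attends becomes one
// "value" of a single multi-valued field, and positions jump by GAP between values.
class PositionGapDemo {
    // Assumed gap; in Lucene this is what Analyzer#getPositionIncrementGap(String) returns.
    static final int GAP = 100;

    // Builds term -> positions for one student's field, one inner list per attendance.
    static Map<String, List<Integer>> index(List<List<String>> values) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += GAP; // push the next attendance's terms far away from the previous ones
            }
            for (String term : values.get(v)) {
                postings.computeIfAbsent(term, k -> new ArrayList<>()).add(pos);
                pos++;
            }
        }
        return postings;
    }

    // Simplified proximity check in the spirit of a span/phrase query with slop:
    // true if some occurrence of a and some occurrence of b are at most `slop` positions apart.
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int slop) {
        for (int pa : postings.getOrDefault(a, List.of())) {
            for (int pb : postings.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For Bob, course_cooking and mandatory_y end up more than 100 positions apart, so a check with a small slop fails, while course_art and mandatory_y sit next to each other. Whatever slop you use must stay below the gap, otherwise terms from different courses can match again.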

Another workaround is to add a fake (derived) getter and index it.
Something like:
@SearchableProperty
String getCourseMandatory() {
    return course.name + "_" + mandatory
}
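The derived-getter workaround boils down to indexing one combined value per attendance. The sketch below models that in plain Java (Java 16+ for records); the Attendance record, the courseMandatory() method and the "_" separator are hypothetical stand-ins, not Compass API.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Plain-Java illustration (not Compass code) of the derived-getter idea: index one value per
// attendance that combines course name and mandatory flag, so both must hold on the same attendance.
class CompositeKeyDemo {
    // Hypothetical stand-in for the Attendance domain object.
    record Attendance(String courseName, String mandatory) {
        // What a derived getter such as getCourseMandatory() would return.
        String courseMandatory() {
            return courseName + "_" + mandatory;
        }
    }

    // All derived values for one student, as they would appear in a multi-valued index field.
    static Set<String> indexedValues(List<Attendance> attendances) {
        return attendances.stream().map(Attendance::courseMandatory).collect(Collectors.toSet());
    }

    // "Is the student attending a mandatory cooking course?" becomes a single-value lookup.
    static boolean matches(List<Attendance> attendances, String course, String mandatory) {
        return indexedValues(attendances).contains(course + "_" + mandatory);
    }
}
```

Pick a separator that cannot appear in course names; otherwise two different (course, flag) pairs could produce the same combined value.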

Try
+courseName:cooking +mandatory:Y
We use pretty similar queries and this works for us:
+ProdLineNum:1920b +HouseBrand:1
This selects everything in product line 1920b that is also a house brand (generic).

You can just build the query as a text string and then parse it to get your Query object. I presume you have seen the Apache Lucene "Query Parser Syntax" documentation?
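When queries are built as text and then parsed, user-supplied values should be escaped so that characters such as + or : are not read as query syntax. The helper below is a plain-Java sketch; its character list mirrors what Lucene's static QueryParser.escape(String) escapes, and that method is the better choice when Lucene is on the classpath. The requireAll() helper is hypothetical.

```java
// Plain-Java sketch of building a Lucene query string safely from field/value pairs.
class QueryStringDemo {
    // Characters that are part of the classic query parser syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds a query in which every clause is required, e.g. +courseName:cooking +mandatory:Y.
    static String requireAll(String[][] clauses) {
        StringBuilder sb = new StringBuilder();
        for (String[] fieldAndValue : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('+').append(fieldAndValue[0]).append(':').append(escape(fieldAndValue[1]));
        }
        return sb.toString();
    }
}
```

For example, requireAll(new String[][] {{"courseName", "cooking"}, {"mandatory", "Y"}}) yields +courseName:cooking +mandatory:Y, ready to pass to a query parser.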

Related

Using lucene to search data differently for different users conditionally

Consider that the entities on which I need to perform text search are as follows:
Sample{
int ID, //Unique ID
string Name,//Searchable field
string Description //Searchable field
}
Now, I have several such entities, which are shared by all users, but each user can associate different tags, notes, etc. with any of these entities. For simplicity, let's say a user can add tags to a Sample entity.
UserSampleData{
int ID, //Sample ID
int UserID, //For condition
string tags //Searchable field
}
When a user performs a search, I want to search for the given string in the Name and Description fields and in the tags the current user has associated with that Sample. I am pretty new to Lucene indexing, and I can't figure out how to design an index, or the queries, for such a situation. I need the results sorted by relevance to the search query. The following approaches crossed my mind, but I have a feeling there could be better solutions:
1. Query the two entities, Sample and UserSampleData, separately and somehow merge the two result sets. For results that appear in both, combine the match scores, perhaps by averaging.
2. Flatten the data by combining both entities, which produces multiple entries for the same ID.
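Approach 1 could be sketched like this in plain Java: each query yields an id-to-score map, and the maps are merged with scores averaged where an id appears in both. The class and method names are hypothetical, and keeping the single score for ids found by only one query is an assumption the question leaves open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of approach 1: merge the hits of a Sample query and a UserSampleData query.
// Assumption: ids found by only one query keep that query's score.
class ScoreMergeDemo {
    static Map<Integer, Double> merge(Map<Integer, Double> sampleHits, Map<Integer, Double> userHits) {
        Map<Integer, Double> merged = new HashMap<>(sampleHits);
        // Average the two scores where both queries matched the same sample id.
        userHits.forEach((id, score) -> merged.merge(id, score, (a, b) -> (a + b) / 2));
        return merged;
    }

    // Sample ids ordered by merged score, best first.
    static List<Integer> ranked(Map<Integer, Double> merged) {
        List<Integer> ids = new ArrayList<>(merged.keySet());
        ids.sort((x, y) -> Double.compare(merged.get(y), merged.get(x)));
        return ids;
    }
}
```

Raw Lucene scores from two different queries are not normalized against each other, so averaging them is a heuristic rather than a principled relevance measure.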
You could use Lucene's JoinUtil class, but you must rename the "ID" field of the UserSampleData documents to SAMPLE_ID (or any name other than "ID").
Below is an example:
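import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.*;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;
import org.apache.lucene.util.Version;

final DirectoryReader r = DirectoryReader.open(dir); // dir: the Directory holding both document types
final Version version = Version.LUCENE_47; // your Lucene version
final IndexSearcher searcher = new IndexSearcher(r);
final String fromField = "SAMPLE_ID"; // join key on the UserSampleData docs matched by fromQuery
final boolean multipleValuesPerDocument = false;
final String toField = "ID"; // key on the Sample docs that the join returns
String querystr = "UserID:xxxx AND yourQueryString"; // the UserID condition plus your query string
Query fromQuery = new QueryParser(version, "NAME", new WhitespaceAnalyzer(version)).parse(querystr);
final Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, searcher, ScoreMode.None);
final TopDocs topDocs = searcher.search(joinQuery, 10);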
Check bug LUCENE-4824 (https://issues.apache.org/jira/browse/LUCENE-4824). I don't know whether it is fixed in the current version of Lucene; if not, I think you must convert your ID fields to String.
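To see what the join query computes, here is a plain-Java model with no Lucene involved. The records are hypothetical mirrors of the two document types, and a naive substring check on tags stands in for the parsed query string.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Plain-Java model (no Lucene) of a query-time join:
// 1. the "from" query selects UserSampleData docs (this user's docs whose tags contain a word),
// 2. their SAMPLE_ID values are collected,
// 3. Sample docs whose ID is in that set are returned.
class JoinDemo {
    record Sample(int id, String name, String description) {}
    record UserSampleData(int sampleId, int userId, String tags) {}

    static List<Sample> join(List<Sample> samples, List<UserSampleData> userData, int userId, String word) {
        Set<Integer> sampleIds = userData.stream()
                .filter(u -> u.userId() == userId && u.tags().contains(word)) // naive stand-in for fromQuery
                .map(UserSampleData::sampleId)                                // fromField values
                .collect(Collectors.toSet());
        return samples.stream()
                .filter(s -> sampleIds.contains(s.id()))                      // toField match
                .collect(Collectors.toList());
    }
}
```

Note that in this model, as with the Lucene example, the text condition is only evaluated against the UserSampleData side; matching Name and Description needs a separate clause on the Sample documents.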
I think what you have here is relational data, and handling relational data is not simple with Lucene. There is a useful blog post on this.

How should I filter the results of a RavenDB Lucene search?

Say I had a User class like this:
public class User
{
public bool IsActive {get;set;}
public string[] Tags{get;set;}
public string Description {get;set;}
}
I would like to use RavenDB to search for the set of users that match the following criteria:
IsActive = true
Tags contains both 'hello' and 'world'
Description has the following phrase 'abject failure'
I have researched the Lucene Query syntax, I have even got some stuff working, but it all feels dreadfully clunky with lots of combinatorial string-building to create a text-based lucene query string. I hesitate to put my code up here because it is quite smelly.
I think what I want to do is submit a Lucene search for the Description and Tags and then filter it with a Where clause for the IsActive field, perhaps like this Filter RavenDB Search Results. But I got lost.
I am using the latest official release (960), so all the groovy stuff that comes after it is not available to me yet. For example, this solution is verboten, as 960 does not appear to support the .As<T>() extension.
Question
How do I construct the required Index and Query to perform a search that combines:
a single constraint, eg IsActive
a collection constraint, eg Tags
a free-text constraint eg Description
to return a strongly typed list of User objects?
Thank you for any code examples or pointers.
You query it like this:
var results = (from u in Session.Query<User>("YourUserIndex")
               where u.IsActive && u.Tags.Any(t => t == "hello") && u.Tags.Any(t => t == "world")
               select u)
    .Search(x => x.Description, "abject failure")
    .ToList();
Where YourUserIndex looks like this:
from u in docs.Users
select new { u.IsActive, u.Tags, u.Description };
And you need to mark the Description field as analyzed.
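The three constraints in that query can be modeled in memory to make the intended semantics explicit. This is plain Java, not RavenDB; the record and method names are hypothetical, and the case-insensitive substring check is only a rough stand-in for a phrase search on an analyzed field.

```java
import java.util.List;
import java.util.Locale;

// Plain-Java model (not RavenDB) of the three combined constraints: an exact flag,
// a collection that must contain every requested tag, and a phrase in free text.
class UserFilterDemo {
    record User(boolean isActive, List<String> tags, String description) {}

    static boolean matches(User u, List<String> requiredTags, String phrase) {
        return u.isActive()
                // like Tags.Any(t => t == "hello") && Tags.Any(t => t == "world")
                && u.tags().containsAll(requiredTags)
                // naive stand-in for a phrase search on the analyzed Description field
                && u.description().toLowerCase(Locale.ROOT).contains(phrase.toLowerCase(Locale.ROOT));
    }
}
```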

NHibernate / Localization / Lookup tables

I want to add localization support to my domain object. I have the following:
class Person
{
int Id;
City city;
}
class City
{
int Id;
string Name;
}
All cities are saved in a lookup db table Cities. I would like to have:
Person p = PeopleService.GetPersonById(1);
//Assert p.City.Name == 'London' if culture == 'en-us'
I don't like doing
string Name { get { return ILocalizationProvider.Get(typeof(City), Id); } } // City.Name
I came by this article:
http://ayende.com/Blog/archive/2006/12/26/LocalizingNHibernateContextualParameters.aspx
Yet I don't know whether it's supported in NH 2.1 or not.
How can I instruct NH to cache all cities in 2nd-level cache to avoid joins each time for the same locale?
Is there an easy and neat way to treat database lookup tables and localization in NHibernate?
In the article it says "Please note that this is no longer supported behavior in NHibernate 2.1 and up. It was a hack to begin with, and it isn't guaranteed to continue working."
So this will not work in your scenario. I would recommend this: http://nhforge.org/wikis/howtonh/localization-techniques.aspx

Nhibernate Tag Cloud Query

This has been a two-week battle for me so far, with no luck. :(
Let me first state my objective: to be able to search for entities that are tagged "foo" and "bar". You wouldn't think that was too hard, right?
I know this can be done easily with HQL, but since this is a dynamically built search query, that is not an option. First, some code:
public class Foo
{
public virtual int Id { get;set; }
public virtual IList<Tag> Tags { get;set; }
}
public class Tag
{
public virtual int Id { get;set; }
public virtual string Text { get;set; }
}
It is mapped as a many-to-many because the Tag class is used by many different types, hence there is no bidirectional reference.
So I build my detached criteria up using an abstract filter class. Let's assume, for simplicity, that I am just searching for Foos with the tags "Apples" (TagId 1) and "Oranges" (TagId 3). This would look something like:
SQL:
SELECT ft.FooId
FROM Foo_Tags ft
WHERE ft.TagId IN (1, 3)
GROUP BY ft.FooId
HAVING COUNT(DISTINCT ft.TagId) = 2; /*Number of items we are looking for*/
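The SQL above is a classic relational-division query. The same logic in plain Java makes each clause visible; the FooTag record and method name are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory version of WHERE ... IN / GROUP BY / HAVING COUNT(DISTINCT ...):
// keep only the Foo ids that are linked to every requested tag id.
class AllTagsDemo {
    // One row of the Foo_Tags link table.
    record FooTag(int fooId, int tagId) {}

    static Set<Integer> foosWithAllTags(List<FooTag> rows, Set<Integer> tagIds) {
        Map<Integer, Set<Integer>> tagsPerFoo = new HashMap<>();
        for (FooTag r : rows) {
            if (tagIds.contains(r.tagId())) {                                    // WHERE ft.TagId IN (...)
                tagsPerFoo.computeIfAbsent(r.fooId(), k -> new HashSet<>())      // GROUP BY ft.FooId
                          .add(r.tagId());                                       // a set keeps tag ids DISTINCT
            }
        }
        Set<Integer> result = new TreeSet<>();
        tagsPerFoo.forEach((fooId, tags) -> {
            if (tags.size() == tagIds.size()) {                                  // HAVING COUNT(DISTINCT) = n
                result.add(fooId);
            }
        });
        return result;
    }
}
```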
Criteria
var idsIn = new List<int>() {1, 3};
var dc = DetachedCriteria.For(typeof(Foo), "f")
    .CreateCriteria("Tags", "t")
    .Add(Restrictions.InG("t.Id", idsIn))
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("f.Id"))
        .Add(Projections.RowCount(), "RowCount")
        .Add(Projections.GroupProperty("f.Id")))
    .ProjectionCriteria.Add(Restrictions.Eq("RowCount", idsIn.Count));
var c = Session.CreateCriteria(typeof(Foo)).Add(Subqueries.PropertyIn("Id", dc));
Basically this is creating a DC that projects a list of Foo Ids which have all the tags specified.
This compiled in NH 2.0.1 but didn't work: it complained that it couldn't find property "RowCount" on class Foo.
After reading this post I was hopeful that this might be fixed in 2.1.0 so I upgraded. To my extreme disappointment I discovered that ProjectionCriteria has been removed from DetachedCriteria and I cannot figure out how to make the dynamic query building work without DetachedCriteria.
So I tried to think how to write the same query without needing the infamous HAVING clause. It can be done with multiple joins on the tag table. "Hooray," I thought, "that's pretty simple." So I rewrote it to look like this:
var idsIn = new List<int>() {1, 3};
var dc = DetachedCriteria.For(typeof(Foo), "f")
    .CreateCriteria("Tags", "t1").Add(Restrictions.Eq("t1.Id", idsIn[0]))
    .CreateCriteria("Tags", "t2").Add(Restrictions.Eq("t2.Id", idsIn[1]));
This was a vain attempt to produce the SQL below, which would do the job (I realise it's not quite correct):
SELECT f.Id
FROM Foo f
JOIN Foo_Tags ft1
ON ft1.FooId = f.Id
AND ft1.TagId = 1
JOIN Foo_Tags ft2
ON ft2.FooId = f.Id
AND ft2.TagId = 3
Unfortunately I fell at the first hurdle with this attempt, receiving the exception "Duplicate Association Path". From what I've read, this seems to be an ancient and still very real bug/limitation.
What am I missing?
I am starting to curse NHibernate's name for making what you would think is such a simple and common query so difficult. Please help, anyone who has done this before: how did you get around NHibernate's limitations?
Forget reputation and a bounty. If someone does me a solid on this I will send you a 6 pack for your trouble.
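The one-join-per-tag SQL in the question is equivalent to intersecting, for each requested tag, the set of Foo ids linked to it. A plain-Java sketch with hypothetical names; returning an empty set when no tags are requested is a design choice of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Each extra JOIN on Foo_Tags narrows the candidate Foos to those that also carry one more tag.
class TagIntersectionDemo {
    // foosPerTag: tag id -> ids of the Foos linked to that tag
    static Set<Integer> foosWithAllTags(Map<Integer, Set<Integer>> foosPerTag, List<Integer> tagIds) {
        if (tagIds.isEmpty()) {
            return Set.of(); // design choice of this sketch: no requested tags means no matches
        }
        Set<Integer> result = new HashSet<>(foosPerTag.getOrDefault(tagIds.get(0), Set.of()));
        for (int tagId : tagIds.subList(1, tagIds.size())) {
            result.retainAll(foosPerTag.getOrDefault(tagId, Set.of())); // one more join
        }
        return result;
    }
}
```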
I managed to get it working like this:
var dc = DetachedCriteria.For<Foo>( "f")
.CreateCriteria("Tags", "t")
.Add(Restrictions.InG("t.Id", idsIn))
.SetProjection(Projections.SqlGroupProjection("{alias}.FooId", "{alias}.FooId having count(distinct t1_.TagId) = " + idsIn.Count,
new[] { "Id" },
new IType[] { NHibernateUtil.Int32 }));
The only problem here is the hard-coded alias t1_ in count(distinct t1_.TagId), but I think NHibernate should generate the same alias every time for this DetachedCriteria, so you should be safe hard-coding it.
Ian,
Since I'm not sure what db backend you are using, can you do some sort of a trace against the produced SQL query and take a look at the SQL to figure out what went wrong?
I know I've done this in the past to understand how Linq-2-SQL and Linq-2-Entities have worked, and been able to tweak certain cases to improve the data access, as well as to understand why something wasn't working as initially expected.

NHibernate projection help

I'm having a problem creating a projection for my NHibernate DetachedCriteria object.
I have a class Spa which is linked to table Address.
Address has a field called City which is a string.
public class Spa : IAggregateRoot
{
[BelongsTo("AddressID", Cascade = CascadeEnum.All)]
public Address Address { get; set; }
}
My ultimate goal is to get a distinct list of City names.
If I could get all spas with distinct cities, I would be happy too.
All my attempts have been for naught, and I haven't found any helpful posts.
So far I've tried:
DetachedCriteria query = DetachedCriteria.For<Spa>()
    .CreateAlias("Address", "A");
query.SetProjection(
    Projections.Distinct(Projections.ProjectionList()
        .Add(Projections.Alias(Projections.Property("Address"), "A"))));
var Spas = ActiveRecordMediator<Spa>.FindAll(query);
I know the above is not correct, just trying to find somewhere to start.
Any help would be appreciated.
Also, any simple projection tutorials would be appreciated; I can't seem to find anything straightforward out there.
I also tried the following, but got a cast error; I'm looking into it:
DetachedCriteria query = DetachedCriteria.For<Spa>()
.CreateAlias("Address", "A")
.SetProjection(Projections.Distinct(Projections.Property("A.City")));
It seems to me there are two parts to your question.
1. What should my DetachedCriteria look like?
If you are not performing any other aggregations, GROUP BY should provide the same results as DISTINCT. This is the query I would use:
var query = DetachedCriteria.For<Spa>()
.CreateAlias("Address", "A")
.SetProjection(Projections.GroupProperty("A.City"));
2. How do I execute it with Castle ActiveRecord?
I have never used ActiveRecord, but based on the method signatures, I would expect something like this to work:
var cities = ActiveRecordMediator<string>.FindAll(query);
If you have access to the NHibernate session, you could also execute it this way:
var cities = query.GetExecutableCriteria(session).List<string>();
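The claim in part 1, that GROUP BY gives the same values as DISTINCT when no other aggregate is involved, can be checked with an in-memory model. The records are hypothetical stand-ins for the Spa and Address entities, and cities are assumed to be non-null (Collectors.groupingBy rejects null keys).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// In-memory illustration: grouping by City yields the same values as SELECT DISTINCT City.
class DistinctCityDemo {
    record Address(String city) {}
    record Spa(Address address) {}

    // Analogous to Projections.Distinct(Projections.Property("A.City")).
    static Set<String> distinctCities(List<Spa> spas) {
        return spas.stream()
                .map(s -> s.address().city())
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Analogous to Projections.GroupProperty("A.City"): one group per city; the keys are the result.
    static Set<String> groupedCities(List<Spa> spas) {
        Map<String, List<Spa>> groups = spas.stream()
                .collect(Collectors.groupingBy(s -> s.address().city()));
        return new TreeSet<>(groups.keySet());
    }
}
```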