I'm interested in switching our database full-text search to Lucene. I'm using Hibernate, so I guess it would be smart to use Hibernate Search. I have a problem though.
Each of our records has a list of info texts and titles in different languages, and I need to be able to search within a single language as well as across all languages.
I could probably do it in plain Lucene, but I don't know how well that would work with our current transactions. So using Hibernate Search, and letting Hibernate deal with the index, would be much better.
Is it possible to create such fields in the index to search the way I described?
class Record {
    List<Info> infos;
}
class Info {
    String title;
    String infoText;
    String langCode;
}
Can I do it like this? Create getters in Record such as:
public String getEnglishTitle(){...}
public String getFullInfos(){...}
And then put index annotations on these getters, so that the necessary fields end up in the index?
I would write a custom FieldBridge for the infos property. That gives you full control over which fields you add to the index; for example, you could use field names of the form text. plus the language code. This lets you decide dynamically which language to search. Remember that you have to think about the analyzers too: a custom per-field analyzer would work.
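To make the field-naming idea concrete, here is a minimal sketch in plain Java. It does not call Hibernate Search or Lucene; it only computes which field names and values a bridge could write. The class name LanguageFieldNames, the title./text. prefixes and the text.all catch-all field are assumptions made for illustration, not something the answer prescribes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: plain Java only, no Hibernate Search or Lucene types.
class LanguageFieldNames {

    // Mirrors the Info class from the question.
    static class Info {
        final String title;
        final String infoText;
        final String langCode;

        Info(String title, String infoText, String langCode) {
            this.title = title;
            this.infoText = infoText;
            this.langCode = langCode;
        }
    }

    // Assumed naming scheme: "title.<lang>" and "text.<lang>" per language,
    // plus "text.all" as a catch-all for searching across all languages.
    // Values are assumed to be non-null.
    static Map<String, String> toIndexFields(List<Info> infos) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Info info : infos) {
            append(fields, "title." + info.langCode, info.title);
            append(fields, "text." + info.langCode, info.infoText);
            append(fields, "text.all", info.title + " " + info.infoText);
        }
        return fields;
    }

    // Concatenates values that land in the same field, separated by a space.
    private static void append(Map<String, String> fields, String name, String value) {
        fields.merge(name, value, (a, b) -> a + " " + b);
    }
}
```

Inside a real bridge, each map entry would become one field added to the Lucene Document, and the analyzer would be chosen per field name. Searching one language then means querying text. plus that language code; searching everything means querying the catch-all field.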
I'm using Sitecore 8 and I'm looking for a way to run a full-text search over all my Sitecore content. I have a solution in place, but I feel there's got to be a better way to do this.
My approach:
I have a computed field that merges all text fields into a single field. Before I execute a search, I tokenize my search text and build an ORed predicate to match on that field.
I do not like this approach because it gets really complicated if I need to boost items that match the title over those that match the body, i.e. I lose field-level boosting.
FYI: my code is very similar to this SO post.
Thanks
Sitecore already maintains a full text field, _content, that contains all the text fields. You can run your search against that. You can even create computed fields that add to _content (such as the datasource content example here).
So assuming you are building a LINQ query for your full text search, and have already filtered on templates, latest version, location, etc., adding your search terms to the query would look something like this:
var terms = SearchTerm.Split();
var currentExpression = PredicateBuilder.True<SiteSearchResultItem>();
foreach (var term in terms)
{
    // Content is mapped to _content
    currentExpression = PredicateBuilder.And(currentExpression, x => x.Content.Contains(term));
}
query = query.Where(currentExpression);
Typically you would want to AND search terms rather than ORing them.
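For readers more at home in Java, the same AND-combination can be sketched with java.util.function.Predicate. This is plain Java over a string, not Sitecore or Lucene code; the class name AllTermsMatcher is hypothetical, and String.contains is a case-sensitive substring check, which is only a stand-in for matching against an analyzed index field.

```java
import java.util.function.Predicate;

// Hypothetical helper, plain Java: mirrors the AND-combination of the snippet above.
class AllTermsMatcher {

    // Builds a predicate that is true only when the content contains every term.
    static Predicate<String> allTerms(String searchText) {
        // Start from "always true", the same role PredicateBuilder.True plays above.
        Predicate<String> combined = content -> true;
        for (String term : searchText.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // blank search text yields no terms
            }
            combined = combined.and(content -> content.contains(term));
        }
        return combined;
    }
}
```

Swapping and for or (and starting from "always false") would give the ORed variant from the question, which matches far more documents as the number of terms grows.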
You are right that field level boosting is lost in this. In the end, Lucene is not a great solution for creating a quality full-text site search. If this is an important requirement, you may want to look at Coveo or even something like a Google Site Search.
I've got a basic search working, and I'm highlighting using FastVectorHighlighter. When you ask the highlighter for a "best fragment" you have a few overloads of getBestFragment(s) to choose from, documented here. I'm now using the simplest one, like this:
highlightedText = highlighter.getBestFragment(fieldQuery, searcher.getIndexReader(),
        scoreDoc.doc, "description", 100);
So I'm highlighting the match from the "description" field. My query however searches another field, "notes". How do I include that in the highlighting? There is an overload that takes a Set<String> matchedFields and one String storedField, but I don't understand the docs. The doc for the method says:
it is advisable that all matchedFields share the same source as storedField or are at least a prefix of it.
What does that mean? How do I index the "notes" and "description" Strings, and what do I pass for matchedFields and storedField?
That call, I believe, is intended to highlight against multiple indexed forms of the same content. That is, you have one stored full-text content field, but you have indexed it in several different ways to expand how you can search it. Perhaps you have one indexed field that uses standard analysis, another with language-specific stemming, another that uses ngrams, and another indexing metaphones.
If you want to highlight two different stored fields, you need two calls to getBestFragment, one per field. Alternatively, you could use a different highlighter that can highlight multiple stored fields at the same time, such as PostingsHighlighter.
I want some fields, such as URLs, to be indexed and stored but not analyzed. The Field class had a constructor for this:
Field(String name, String value, Field.Store store, Field.Index index)
But this constructor has been deprecated since Lucene 4, and it is suggested to use StringField or TextField objects instead. However, they don't have any constructor that lets me specify how the field is indexed. So can it be done?
The correct way to index and store an un-analyzed field, as a single token, is to use StringField. It is designed to handle atomic strings, such as ID numbers, URLs, etc. You can specify whether it is stored, similarly to Lucene 3.x.
Such as:
new StringField("myUrl", "http://stackoverflow.com/questions/19042587/how-to-prevent-a-field-from-not-analyzing-in-lucene", Field.Store.YES)
Hello, you are totally right in what you are saying. With the new fields provided by Lucene you cannot achieve what you want.
You can either continue using Field as you described, or write your own field class by implementing the IndexableField interface. There you can decide for yourself which behaviors you want your field to have.
How do I index and search custom fields using Lucene or Hibernate Search? I cannot find a way to index the custom fields. They are dynamic.
"Custom fields" here means fields that users can edit; they are not hard-coded.
Any help would be appreciated!
Query of Custom Fields
Just use the projection API:
FullTextQuery hibernateQuery = fullTextSession
    .createFullTextQuery(luceneQuery)
    .setProjection("myField1", "myField2");
List results = hibernateQuery.list();
Using projections you get to read any field as long as it's STORED.
If it matches some property name of your indexed entities it will be materialized after being converted to the appropriate type (if you have a TwoWayFieldBridge); if not you will get the String value.
If for some reason you need to bypass this conversion or just want to have fun decoding the raw Lucene Document you can open an IndexReader directly.
Indexing Custom Fields
When defining a FieldBridge you get to add as many fields as you like to the indexed Document, and you can name each of them as you like.
The method parameter name is a hint - useful for example to scope the field name - but you can ignore it.
An example FieldBridge implementation writing multiple fields is the DateSplitBridge in the documentation.
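As a rough illustration of using the name parameter to scope field names, the sketch below is plain Java and does not call the Hibernate Search API. It assumes the user-editable fields are available as a Map from field name to value; the class ScopedCustomFields and the dot separator are hypothetical choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes the field names a bridge could write for dynamic fields.
class ScopedCustomFields {

    // name is the hint passed to the bridge; it is used here as a prefix
    // so that user-defined names cannot collide with regular entity fields.
    static Map<String, String> scope(String name, Map<String, String> customFields) {
        Map<String, String> scoped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : customFields.entrySet()) {
            if (entry.getValue() == null) {
                continue; // nothing to index for an empty custom field
            }
            scoped.put(name + "." + entry.getKey(), entry.getValue());
        }
        return scoped;
    }
}
```

In an actual bridge, each resulting entry would be added to the Document as its own field. A query or projection would then address it by the scoped name, and a projection only works if the field was stored.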
I read that Lucene has an internal query language where one specifies field:value pairs and combines them using boolean operators.
I read all about it on their website and it works just fine in Luke; I can do things like
field1:value1 AND field2:value2
and it will return seemingly correct results.
My problem is: how do I pass this whole Lucene query into the API? I've seen QueryParser, but I have to specify a field. Does this mean I still have to manually parse my input string (fields, values, parentheses, etc.), or is there a way to feed the whole thing in and let Lucene do its thing?
I'm using Lucene.NET, but since it's a method-by-method port of the original Java, any advice is appreciated.
Are you asking whether you need to force your user to enter the field? If so, the query parser has a default field. Here's a little more info. As long as you have a default field that will do the job, they don't need to specify fields.
If you're asking how to get a Query object from the String, you need the parse method. It understands about fields, and the default field, etc. mentioned earlier. You just need to make sure that the query parser and the index builder are both using the same analysis.