How to index field with comma-separated text value - hibernate search - lucene

I am implementing a book search function based on Hibernate Search 3.2.
The Book object contains a field called authornames. Its value is a list of names separated by commas, e.g. "John Will, Robin Rod, James Timerberland".
@Field(index = org.hibernate.search.annotations.Index.UN_TOKENIZED, store = Store.YES)
@FieldBridge(impl = CollectionToCSVBridge.class)
private Set<String> authornames;
I need each name to be UN_TOKENIZED, so that users can search for books by a single author name: John Will, Robin Rod or James Timerberland.
I used Luke to check the index, and the value in the authornames field is stored as "John Will, Robin Rod, James Timerberland", but I get no results when querying "authornames:John Will".
Can anybody tell me how to do this?

I guess CollectionToCSVBridge concatenates all the names with ", " into one larger string.
You should keep them separate instead and add each element individually to the index:
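import java.util.Collection;
import org.apache.lucene.document.Document;
import org.hibernate.search.bridge.FieldBridge;
import org.hibernate.search.bridge.LuceneOptions;
// CollectionToMultiFieldBridge is a hypothetical name for the replacement of CollectionToCSVBridge
public class CollectionToMultiFieldBridge implements FieldBridge {
    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if ( value == null ) {
            return;
        }
        if ( !( value instanceof Collection ) ) {
            throw new IllegalArgumentException( "This FieldBridge only supports collections." );
        }
        Collection<?> objects = (Collection<?>) value;
        for ( Object object : objects ) {
            // one field instance per element, so each author name is indexed as its own term
            // (for a Set<String>, toString() is all the conversion needed)
            luceneOptions.addFieldToDocument( name, object.toString(), document );
        }
    }
}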
See also https://forum.hibernate.org/viewtopic.php?f=9&t=1015286&start=0
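To see why indexing each element separately fixes the search, here is a small library-free Java model of an untokenized field. It assumes, as UN_TOKENIZED implies, that each indexed string becomes one exact term that only an identical query term can match. MiniIndex and its methods are hypothetical illustrations, not Hibernate Search or Lucene API.

```java
import java.util.*;

// Hypothetical model of an untokenized field: every indexed value is one exact term.
class MiniIndex {
    // term -> ids of the documents that contain exactly that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
    }

    // Exact-match lookup, the way a term query behaves on an untokenized field.
    Set<Integer> termQuery(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Mimics a CSV-style bridge: the whole collection becomes a single term.
    static MiniIndex indexAsCsv(int docId, Collection<String> names) {
        MiniIndex index = new MiniIndex();
        index.add(docId, String.join(", ", names));
        return index;
    }

    // Mimics the per-element bridge: each name becomes its own term.
    static MiniIndex indexPerElement(int docId, Collection<String> names) {
        MiniIndex index = new MiniIndex();
        for (String name : names) {
            index.add(docId, name);
        }
        return index;
    }
}
```

With the CSV variant, a lookup for "John Will" finds nothing because the only term is the full joined string; with the per-element variant, it finds the book. Separately, note that in Lucene's query-parser syntax the field prefix in authornames:John Will applies only to John, so an exact TermQuery on the value "John Will" is probably the safer way to search an untokenized field.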


Java8 Streams - Compare Two Lists' object values and add value to sub object of first list?

I have two classes:
public class ClassOne {
private String id;
private String name;
private String school;
private String score; //Default score is null
..getters and setters..
}
public class ClassTwo {
private String id;
private String marks;
..getters and setters..
}
And I have two lists of the above classes:
List<ClassOne> listOne;
List<ClassTwo> listTwo;
How can I compare the two lists and assign marks from listTwo to the score of listOne wherever the IDs are equal? I know we can do it with two for loops, but I want to implement it using Java 8 streams.
List<ClassOne> result = new ArrayList<>();
for(ClassOne one : listOne) {
for(ClassTwo two : listTwo) {
if(one.getId().equals(two.getId())) {
one.setScore(two.getMarks());
result.add(one);
}
}
}
return result;
How can I implement this using Java8 lambda and streams?
Let listOne.size() be N and listTwo.size() be M.
Then the two-for-loops solution has complexity O(M*N).
We can reduce it to O(M+N) by indexing listTwo by id.
Case 1 - assuming listTwo has no objects with the same id
// pair each id with its marks
Map<String, String> marksIndex = listTwo.stream().collect(Collectors.toMap(ClassTwo::getId, ClassTwo::getMarks));
// go through the list of `ClassOne`s and look up marks in the index
listOne.forEach(o1 -> o1.setScore(marksIndex.get(o1.getId())));
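A self-contained sketch of Case 1, using trimmed-down versions of the question's ClassOne and ClassTwo (only the fields involved are kept; ScoreMerger is a hypothetical helper name). Building the map costs O(M) and each lookup is O(1) on average, which gives O(M+N) overall. Collectors.toMap throws an IllegalStateException on duplicate ids, which is why Case 1 assumes unique ids; it also rejects null marks with a NullPointerException.

```java
import java.util.*;
import java.util.stream.*;

// Minimal stand-ins for the question's classes (assumed, only the fields used here).
class ClassOne {
    private final String id;
    private String score; // null until a matching ClassTwo is found
    ClassOne(String id) { this.id = id; }
    String getId() { return id; }
    String getScore() { return score; }
    void setScore(String score) { this.score = score; }
}

class ClassTwo {
    private final String id;
    private final String marks;
    ClassTwo(String id, String marks) { this.id = id; this.marks = marks; }
    String getId() { return id; }
    String getMarks() { return marks; }
}

class ScoreMerger {
    // Index listTwo by id once, then do one map lookup per ClassOne.
    static void assignScores(List<ClassOne> listOne, List<ClassTwo> listTwo) {
        Map<String, String> marksIndex = listTwo.stream()
                .collect(Collectors.toMap(ClassTwo::getId, ClassTwo::getMarks));
        // ids without marks get null, which is also the default score
        listOne.forEach(o1 -> o1.setScore(marksIndex.get(o1.getId())));
    }
}
```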
Case 2 - assuming listTwo has objects with the same id
final Map<String, List<ClassTwo>> marksIndex = listTwo.stream()
        .collect(Collectors.groupingBy(ClassTwo::getId, Collectors.toList()));
final List<ClassOne> result = listOne.stream()
        // getOrDefault avoids a NullPointerException for ids that have no marks
        .flatMap(o1 -> marksIndex.getOrDefault(o1.getId(), Collections.emptyList()).stream().map(o2 -> {
            // make a copy of the ClassOne instance to avoid overwriting scores
            ClassOne copy = copy(o1);
            copy.setScore(o2.getMarks());
            return copy;
        }))
        .collect(Collectors.toList());
To implement the copy method, you can either create a new object and copy the fields one by one or, as I prefer in such cases, follow the Builder pattern, which also results in more "functional" code.
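One way to fill in copy(o1) without a Builder is a copy constructor. The sketch below runs Case 2 end to end with assumed, trimmed-down versions of ClassOne and ClassTwo (ScoreExpander is a hypothetical name). It uses getOrDefault so that a ClassOne whose id has no marks contributes nothing instead of throwing, and, like the original two loops, it only returns entries that found at least one match.

```java
import java.util.*;
import java.util.stream.*;

class ClassOne {
    private final String id;
    private final String name;
    private String score;
    ClassOne(String id, String name) { this.id = id; this.name = name; }
    // Copy constructor: a plain alternative to a Builder for the copy step.
    ClassOne(ClassOne other) {
        this(other.id, other.name);
        this.score = other.score;
    }
    String getId() { return id; }
    String getName() { return name; }
    String getScore() { return score; }
    void setScore(String score) { this.score = score; }
}

class ClassTwo {
    private final String id;
    private final String marks;
    ClassTwo(String id, String marks) { this.id = id; this.marks = marks; }
    String getId() { return id; }
    String getMarks() { return marks; }
}

class ScoreExpander {
    // Case 2: one ClassOne copy per matching ClassTwo; the originals stay untouched.
    static List<ClassOne> expand(List<ClassOne> listOne, List<ClassTwo> listTwo) {
        Map<String, List<ClassTwo>> marksIndex = listTwo.stream()
                .collect(Collectors.groupingBy(ClassTwo::getId));
        return listOne.stream()
                .flatMap(o1 -> marksIndex.getOrDefault(o1.getId(), Collections.emptyList())
                        .stream()
                        .map(o2 -> {
                            ClassOne copy = new ClassOne(o1);
                            copy.setScore(o2.getMarks());
                            return copy;
                        }))
                .collect(Collectors.toList());
    }
}
```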
The following code copies marks from ClassTwo to score in ClassOne when both ids are equal. It doesn't need the intermediate List<ClassOne> result, although it is still O(M*N):
listOne.forEach(one -> listTwo.stream()
        .filter(two -> two.getId().equals(one.getId()))
        .findFirst()
        .ifPresent(two -> one.setScore(two.getMarks())));
This should work.
Map<String, String> collect = listTwo.stream().collect(Collectors.toMap(ClassTwo::getId, ClassTwo::getMarks));
listOne
.stream()
.filter(item -> collect.containsKey(item.getId()))
.forEach(item -> item.setScore(collect.get(item.getId())));

Should I have redundant method names between related classes?

I'm building a sports product. I have 3 classes
class Team {
getName // ex: Los Angeles Lakers
getShortName // ex: Lakers
getAbbrName // ex: LAL
}
class Match {
Team getHomeTeam
Team getAwayTeam
Play[] plays
}
class Play {
Match getMatch
String description // "Kobe Bryant scores a 3 pointer!"
}
A team is just any sports team. A match is a sports match between two teams. During a match, plays happen that get attached to that match.
I have the need to get the home/away team's name, short name, and abbreviated names, given a Match or Play. Which option do you prefer and why?
Option 1 - callers need to do this. Ex:
class SomeCaller {
foo() {
Play play = // somehow get a play;
Match match = play->getMatch;
Team home = match->getHomeTeam;
Team away = match->getAwayTeam;
String homeTeamName = home->getName;
String homeTeamShortName = home->getShortName;
String homeTeamAbbrName = home->getAbbrName;
String awayTeamName = away->getName;
String awayTeamShortName = away->getShortName;
String awayTeamAbbrName = away->getAbbrName;
// do something with the team names
}
}
Option 2 - Add the same methods to both classes
class Match {
Team getHomeTeam
Team getAwayTeam
Play[] plays
String getHomeTeamName() {
Team homeTeam = getHomeTeam();
return homeTeam->getName();
}
// same as above...
String getHomeTeamShortName()
String getHomeTeamAbbrName()
String getAwayTeamName()
String getAwayTeamShortName()
String getAwayTeamAbbrName()
}
class Play {
Match getMatch
String getHomeTeamName() {
Match match = getMatch;
return match->getHomeTeamName();
}
// same as above...
String getHomeTeamShortName()
String getHomeTeamAbbrName()
String getAwayTeamName()
String getAwayTeamShortName()
String getAwayTeamAbbrName()
}
Keep in mind I would want to get name, short name, abbr name, for both home and away teams given a Match or Play object, so there's going to be lots of method duplication with option 2.
Option 1 is preferred of the two.
Typically you wouldn't have references in both directions, from Play to Match and from Match to Play.
class Team {
getName // ex: Los Angeles Lakers
getShortName // ex: Lakers
getAbbrName // ex: LAL
}
class Match {
Team getHomeTeam
Team getAwayTeam
Play[] plays
}
class Play {
String description // "Kobe Bryant scores a 3 pointer!"
}
Now, you won't be able to get the team names with only a Play, but if you have access to all the matches you can do something like this:
foreach(match in matches)
    foreach(play in match.plays)
        if(play == desiredPlay)
            doSomething
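In Java, that scan might look like the sketch below. Team, Match, Play and MatchLookup are simplified stand-ins assumed for illustration. The cost is a walk over every play of every match, which is the price of dropping the Play-to-Match back-reference.

```java
import java.util.*;

// Simplified stand-ins for the question's classes (assumed for illustration).
class Team {
    final String name, shortName, abbrName;
    Team(String name, String shortName, String abbrName) {
        this.name = name;
        this.shortName = shortName;
        this.abbrName = abbrName;
    }
}

class Play {
    final String description;
    Play(String description) { this.description = description; }
}

class Match {
    final Team homeTeam, awayTeam;
    final List<Play> plays;
    Match(Team homeTeam, Team awayTeam, List<Play> plays) {
        this.homeTeam = homeTeam;
        this.awayTeam = awayTeam;
        this.plays = plays;
    }
}

class MatchLookup {
    // Walks every match and every play, like the foreach pseudocode above.
    static Optional<Match> findMatch(List<Match> matches, Play desiredPlay) {
        for (Match match : matches) {
            for (Play play : match.plays) {
                if (play == desiredPlay) { // identity check, as in the pseudocode
                    return Optional.of(match);
                }
            }
        }
        return Optional.empty();
    }
}
```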
If this isn't acceptable, because you need to get the team names from a play alone, without knowing the matches, you would put a reference to the teams directly in your play, i.e.:
class Play {
Team getHomeTeam
Team getAwayTeam
String description // "Kobe Bryant scores a 3 pointer!"
}
This couples your code more than option 2, but less than option 1.
This is all because of the "Law of Demeter" (https://en.wikipedia.org/wiki/Law_of_Demeter), which in a nutshell says: if what you really want is something the Match knows (its Teams), you shouldn't go through the Match; you should know the Team directly.
The link has a decent example that I suggest you read. :)
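A minimal Java sketch of that last design, where Play holds its teams directly so a caller never has to know about Match. The classes and the Scoreboard.headline helper are hypothetical illustrations, not part of the question's code.

```java
// Hypothetical classes: a Play that references its teams directly.
class Team {
    private final String name;
    private final String shortName;
    private final String abbrName;
    Team(String name, String shortName, String abbrName) {
        this.name = name;
        this.shortName = shortName;
        this.abbrName = abbrName;
    }
    String getName() { return name; }
    String getShortName() { return shortName; }
    String getAbbrName() { return abbrName; }
}

class Play {
    private final Team homeTeam;
    private final Team awayTeam;
    private final String description;
    Play(Team homeTeam, Team awayTeam, String description) {
        this.homeTeam = homeTeam;
        this.awayTeam = awayTeam;
        this.description = description;
    }
    Team getHomeTeam() { return homeTeam; }
    Team getAwayTeam() { return awayTeam; }
    String getDescription() { return description; }
}

class Scoreboard {
    // The caller only deals with Play and Team; Match never appears here.
    static String headline(Play play) {
        return play.getHomeTeam().getAbbrName() + " vs " + play.getAwayTeam().getAbbrName()
                + ": " + play.getDescription();
    }
}
```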

Get a value from an array based on the values of other arrays (VB.Net)

Suppose that I have two arrays:
Dim RoomName() As String = {"RoomA", "RoomB", "RoomC", "RoomD", "RoomE"}
Dim RoomType() As Integer = {1, 2, 2, 2, 1}
I want to get a value from the "RoomName" array based on a criterion on the "RoomType" array. For example, to get a "RoomName" with "RoomType = 2", the algorithm should pick a random index among those whose "RoomType" is 2 (here indexes 1 to 3 only) and return the single name at that index.
Is there any way to solve this problem using arrays, or is there a better way to do it? Thank you very much for your time :)
Note: the code examples below use C#, but hopefully you can read the intent for VB.NET.
A simpler way would be to have a structure/class that contains both name and type properties, e.g.:
public class Room
{
public string Name { get; set; }
public int Type { get; set; }
public Room(string name, int type)
{
Name = name;
Type = type;
}
}
Then given a set of rooms you can find those of a given type using a simple linq expression:
var match = rooms.Where(r => r.Type == 2).Select(r => r.Name).ToList();
Then you can find a random entry from within the set of matching room names (see below)
However, if you want to stick with the parallel arrays, one way is to find the matching indexes in the type array, look up the corresponding names, and then pick one of those names using a random function.
var matchingTypeIndexes = new List<int>();
int matchingTypeIndex = -1;
do
{
matchingTypeIndex = Array.IndexOf(roomType, 2, matchingTypeIndex + 1);
if (matchingTypeIndex > -1)
{
matchingTypeIndexes.Add(matchingTypeIndex);
}
} while (matchingTypeIndex > -1);
List<string> matchingRoomNames = matchingTypeIndexes.Select(typeIndex => roomName[typeIndex]).ToList();
Then to find a random entry of those that match (from one of the lists generated above):
var posn = new Random().Next(matchingRoomNames.Count);
Console.WriteLine(matchingRoomNames[posn]);
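For comparison, the same parallel-array idea sketched in Java, assuming the two arrays have equal length (RoomPicker.pickRandomRoom is a hypothetical name). Unlike the snippet above, it returns an empty Optional when no room has the requested type instead of indexing into an empty list, and passing the Random in makes the choice reproducible.

```java
import java.util.*;
import java.util.stream.*;

class RoomPicker {
    // Returns a random name whose parallel type equals wantedType, or empty if none match.
    static Optional<String> pickRandomRoom(String[] roomNames, int[] roomTypes,
                                           int wantedType, Random random) {
        if (roomNames.length != roomTypes.length) {
            throw new IllegalArgumentException("roomNames and roomTypes must be parallel arrays");
        }
        // Collect the names at every index whose type matches.
        List<String> matching = IntStream.range(0, roomTypes.length)
                .filter(i -> roomTypes[i] == wantedType)
                .mapToObj(i -> roomNames[i])
                .collect(Collectors.toList());
        if (matching.isEmpty()) {
            return Optional.empty(); // nothing to choose from
        }
        return Optional.of(matching.get(random.nextInt(matching.size())));
    }
}
```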

SharePoint 2010 - Custom calculated column

In a document library I need a custom calculated column, because the default Excel-style formulas don't provide the functionality I need.
I created a custom field inheriting from SPFieldText, which I can then customize at will. The question is: how can my custom field access the values of the other fields in the document library?
In other words, in the overridden GetValidatedString method, how can I return a value that depends on the values of other fields of the same record? How do I implement getFieldValue() below:
public class MyCustomField : SPFieldText
{
....
public override string GetValidatedString(object value)
{
string value1 = getFieldValue("Column-Name1");
string value2 = getFieldValue("Column-Name2");
return value1 + ", " + value2; // any arbitrary operation on field values
}
}
Thanks!
You should be able to grab other values from the form using the Item property of the FormComponent or the Item property of the ItemContext.
Either of these should work from the FieldControl class:
if ((this.ControlMode == SPControlMode.New) || (this.ControlMode == SPControlMode.Edit))
{
    // via the Item property inherited from FormComponent
    object obj = this.Item["Name"];
    string name = (obj != null) ? obj.ToString() : null;
    // via the item of the current ItemContext
    object obj2 = base.ItemContext.Item["Name"];
    string name2 = (obj2 != null) ? obj2.ToString() : null;
}
where "Name" is the internal name of the field that you wish to retrieve.

Find all available values for a field in lucene .net

If I have a field x, that can contain a value of y, or z etc, is there a way I can query so that I can return only the values that have been indexed?
Example
Possible values for x: test1, test2, test3, test4
Item 1 : Field x = test1
Item 2 : Field x = test2
Item 3 : Field x = test4
Item 4 : Field x = test1
The required query would return:
test1, test2, test4
I've implemented this before as an extension method:
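public static class ReaderExtensions
{
    public static IEnumerable<string> UniqueTermsFromField(
        this IndexReader reader, string field)
    {
        // positions the enumerator at the first term >= (field, "")
        var termEnum = reader.Terms(new Term(field));
        try
        {
            do
            {
                var currentTerm = termEnum.Term();
                // null: no terms left in the index; other field: past the last term of this field
                if (currentTerm == null || currentTerm.Field() != field)
                    yield break;
                yield return currentTerm.Text();
            } while (termEnum.Next());
        }
        finally
        {
            termEnum.Close();
        }
    }
}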
You can use it very easily like this:
var allPossibleTermsForField = reader.UniqueTermsFromField("FieldName");
That will return you what you want.
EDIT: I was skipping the first term above, due to some absent-mindedness. I've updated the code accordingly to work properly.
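The extension method relies on the term dictionary being sorted by field name and then by term text: seeking to (field, "") lands on the field's first term, and the first term belonging to another field marks the end. The Java sketch below models that ordering with a TreeSet; FieldTerm and TermDictionary are hypothetical classes, not Lucene.Net types.

```java
import java.util.*;

// Hypothetical model of a Lucene-style term: ordered by field, then by text.
class FieldTerm implements Comparable<FieldTerm> {
    final String field;
    final String text;
    FieldTerm(String field, String text) { this.field = field; this.text = text; }

    @Override
    public int compareTo(FieldTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }
}

class TermDictionary {
    // A sorted set keeps exactly one entry per distinct (field, text) pair.
    private final TreeSet<FieldTerm> terms = new TreeSet<>();

    void add(String field, String text) {
        terms.add(new FieldTerm(field, text));
    }

    List<String> uniqueTermsFromField(String field) {
        List<String> values = new ArrayList<>();
        // Seek to the first term >= (field, ""), then walk forward.
        for (FieldTerm term : terms.tailSet(new FieldTerm(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // reached the next field: no more values for this one
            }
            values.add(term.text);
        }
        return values;
    }
}
```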
TermEnum te = indexReader.Terms(new Term("fieldx"));
do
{
Term t = te.Term();
if (t==null || t.Field() != "fieldx") break;
Console.WriteLine(t.Text());
} while (te.Next());
If you are using Solr, you can use facets to return up to N values of a field, provided the field is indexed as a string or with KeywordTokenizer and no filters. This means that the field is not tokenized but indexed as it is.
Just set the following parameters on the request:
facet=true
facet.field=fieldname
facet.limit=N //the number of values you want to retrieve
I think a WildcardQuery searching on field 'x' and value of '*' would do the trick.
I once used Lucene 2.9.2, and there I used the FieldCache approach described in the book "Lucene in Action" (Manning):
String[] fieldValues = FieldCache.DEFAULT.getStrings(indexReader, fieldname);
The array fieldValues contains all values in the index for field fieldname (Example: ["NY", "NY", "NY", "SF"]), so it is up to you now how to process the array. Usually you create a HashMap<String,Integer> that sums up the occurrences of each possible value, in this case NY=3, SF=1.
Maybe this helps. It is quite slow and memory-consuming for very large indexes (e.g. 1,000,000 documents), but it works.
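The counting step described above can be sketched in plain Java, assuming fieldValues is the array returned by FieldCache.DEFAULT.getStrings (one entry per document). Null entries, which may appear for documents without a value in the field, are skipped defensively. FieldValueCounter is a hypothetical name.

```java
import java.util.*;

class FieldValueCounter {
    // Sums up how many documents carry each distinct value of the field.
    static Map<String, Integer> countValues(String[] fieldValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : fieldValues) {
            if (value != null) { // skip documents without a value in this field
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the example array ["NY", "NY", "NY", "SF"] this yields NY=3 and SF=1, and the map's key set is the list of distinct values the question asks for.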