How can I search lucene for "John J" and get people like "John Jameson" not just people with John? - lucene

For reasons out of my control, I must do this with a global search. I've taken to converting a search term "John J" into (John AND J), which works for anyone whose last name doesn't start with the same letter as their first name.
How can I make the search for "John J" become "find all people who have John and then another, different J in the field"?
Thanks for your time.

You may want to try out Wildcard Query. For example:
Term term = new Term("secondName", "J*");
Query query = new WildcardQuery(term);
I am assuming you have different fields for the first and second names. You can create a boolean query that combines one query for the first name with one for the second name.
Documentation for WildcardQuery: http://lucene.apache.org/core/6_2_0/core/org/apache/lucene/search/WildcardQuery.html
I hope this helps.

Since you mentioned that this is a type-ahead input, PrefixQuery might help:
new PrefixQuery(new Term("lastName","J"));
This will return all documents with lastName starting with "J".
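To get results where firstName starts with "John" and lastName starts with "J", you can combine two prefix queries in a BooleanQuery:
// Both clauses use Occur.MUST, so a document has to match both prefixes.
BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder();
booleanQueryBuilder.add(new PrefixQuery(new Term("firstName", "John")), BooleanClause.Occur.MUST);
booleanQueryBuilder.add(new PrefixQuery(new Term("lastName", "J")), BooleanClause.Occur.MUST);
Query query = booleanQueryBuilder.build();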
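If the search has to go through a query parser rather than hand-built query objects, the same two-field prefix search can be written as a query string. The following is a minimal sketch in plain Java with no Lucene dependency. The class name TypeAheadQuery is hypothetical, the field names firstName and lastName are the ones assumed in the answer above, and lowercasing the terms assumes the index was built with a lowercasing analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class TypeAheadQuery {
    // Assumed field names, borrowed from the answer above; adjust to your schema.
    private static final String[] FIELDS = {"firstName", "lastName"};

    /** Builds one required prefix clause per token, pairing tokens with fields in order. */
    static String toQueryString(String input) {
        List<String> clauses = new ArrayList<>();
        String[] tokens = input.trim().split("\\s+");
        for (int i = 0; i < tokens.length && i < FIELDS.length; i++) {
            // Keep letters and digits only so no query-syntax characters leak through.
            String term = tokens[i].replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) {
                clauses.add("+" + FIELDS[i] + ":" + term + "*");
            }
        }
        return String.join(" ", clauses);
    }
}
```

For the input "John J" this produces +firstName:john* +lastName:j*, which the classic query parser reads as two required prefix clauses.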

Related

Can I clear the stopword list in lucene.net in order for exact matches to work better?

When dealing with exact matches I'm given a real world query like this:
"not in education, employment, or training"
Converted to a Lucene query with stopwords removed gives:
+Content:"? ? education employment ? training"
Here's a more contrived example:
"there is no such thing"
Converted to a Lucene query with stopwords removed gives:
+Content:"? ? ? ? thing"
My goal is to have searches like these match only the exact match as the user entered it.
Could one solution be to clear the stop word list? Would this have adverse effects? If so, what? (My Google-fu failed.)
This all depends on the analyzer you are using. The StandardAnalyzer uses stop words and strips them out; in fact, the StopAnalyzer is where the StandardAnalyzer gets its stop words from.
Use the WhitespaceAnalyzer, or create your own by inheriting from the one that most closely suits your needs and modifying it to do what you want.
Alternatively, if you like the StandardAnalyzer, you can new one up with a custom stop word list:
//This is what the default stop word list is in case you want to use or filter this
var defaultStopWords = StopAnalyzer.ENGLISH_STOP_WORDS_SET;
//create a new StandardAnalyzer with custom stop words
var sa = new StandardAnalyzer(
Version.LUCENE_29, //depends on your version
new HashSet<string> //pass in your own stop word list
{
"hello",
"world"
});
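To see why the stop word list matters for exact phrases, here is a small plain-Java simulation (not Lucene.NET code) of what stop word removal does to the two example queries. The class name StopWordGaps is hypothetical, and the word list is an assumed hand copy of the default English stop words, so check it against your Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopWordGaps {
    // Assumed copy of the default English stop word list.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"));

    /** Replaces each stop word with "?" to mimic the position gaps in the phrase query. */
    static String withGaps(String phrase, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : phrase.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the phrase starts with punctuation
            }
            out.add(stopWords.contains(token) ? "?" : token);
        }
        return String.join(" ", out);
    }

    static String withDefaultStopWords(String phrase) {
        return withGaps(phrase, STOP_WORDS);
    }
}
```

With the assumed list, "there is no such thing" collapses to ? ? ? ? thing, while passing an empty stop word set keeps every word, which is the effect of clearing the list.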

Hibernate Search with Lucene Query for first and last name together

I have attempted this with phrase, wildcard and keyword queries but nothing really works perfectly.
...
@Field(name = "firstLastName", index = org.hibernate.search.annotations.Index.YES, analyze = Analyze.NO, store = Store.NO)
public String getFirstLastName() {
return this.firstLastName;
}
...
Now I want to query this field and return the correct results if a user types John Smith, Smith John or Smith Jo* or John Smi*....
junction = junction.should(qb.keyword().wildcard().onField("firstLastName")
.matching("John Smith*").createQuery());
If I search for just Smith or John given a keyword query, I get a hit. I am not analyzing the field as I didn't think I needed to but I tried it both ways with no success...
Several issues here:
You need to use an analyzer, if only to split the strings on whitespace. Define an analyzer and assign it to your field.
You can't use wildcard queries if you want the strings to be analyzed: wildcard queries are not analyzed. You should use an EdgeNGramFilter instead.
This answer to a very similar question will probably help: Hibernate Search: How to use wildcards correctly?
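The idea behind the edge n-gram suggestion can be sketched in plain Java without Hibernate Search or Lucene. At index time every token is expanded into all of its leading substrings; at query time the input is only split and lowercased, and every query token must be one of the indexed terms. The class name EdgeNGramSketch and the minimum gram size of 1 are assumptions made for illustration; a real EdgeNGramFilter is configured with its own minimum and maximum sizes.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class EdgeNGramSketch {
    /** Index side: every leading substring of every whitespace-separated token. */
    static Set<String> indexTerms(String fieldValue) {
        Set<String> terms = new HashSet<>();
        for (String token : fieldValue.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            for (int len = 1; len <= token.length(); len++) {
                terms.add(token.substring(0, len));
            }
        }
        return terms;
    }

    /** Query side: split and lowercase only; every query token must be an indexed term. */
    static boolean matches(String fieldValue, String userInput) {
        Set<String> terms = indexTerms(fieldValue);
        for (String token : userInput.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty() && !terms.contains(token)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the prefixes are already in the index, "Smith Jo" and "John Smi" both match "John Smith" in either word order without any wildcard in the query.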

how to search on part of string with Filterrific

I'm implementing the Filterrific gem for a tournament calendar application.
I took the code from the demo 'Student' application and adapted it to the needs of the tournament calendar application.
I noticed that the search function only matches at the beginning of the string and not anywhere inside it.
For example, when I have a tournament called: 'Hamburger Michel 2016', it will find the tournament when I start my search query with 'ham', but when I type 'michel', it will not find the tournament.
I tried to solve this by replacing '*' with '%' in the search scope, like this:
terms = terms.map { |e|
  (e.gsub('%', '%') + '%').gsub(/%+/, '%')
}
But that didn't solve the issue.
Is there a way to search on a part of a string instead of a literal string?
Thanks for your help,
Anthony
Start the search value with *. So, if you want to search for anything that contains michel use: *michel
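The reason a leading * helps is easier to see once the term-to-pattern mapping is written out. The sketch below is plain Java rather than Ruby, and it assumes the search scope turns every * into %, appends one trailing %, and collapses repeated % signs, which is what the snippet in the question appears to be derived from. The class name LikePattern is hypothetical.

```java
final class LikePattern {
    /** Maps a user-typed term to a SQL LIKE pattern: "*" becomes "%" and one "%" is appended. */
    static String toLikePattern(String term) {
        return (term.replace('*', '%') + "%").replaceAll("%+", "%");
    }
}
```

Under that assumption "ham" becomes ham%, which only matches at the start of the name, while "*michel" becomes %michel%, which matches anywhere in it.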

finding a pattern in a set of string

Let's say I have a set of documents that each contain a person's name, such as a driver's license, a passport, an invoice, etc.
For each document I have a process that uses OCR (Optical Character Recognition) to extract the person's name. Since the extraction process may introduce errors, I need to find the "correct name" in that set of strings.
So let's say I have the following strings as a person's name: "John"; "J0hn"; "JOHN"; "10hn"; "+o-."; "john smith".
As a person, you can tell that the person's name is John, as it is the most common occurrence.
What is the best way to do this? Is there an algorithm to find the most common occurrence in a set of strings?
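One simple starting point, offered as a sketch rather than the best possible method, is a majority vote after case folding: lowercase each candidate, count how often each form occurs, and return the most frequent one. The class name MostCommonName is hypothetical. This does not correct OCR substitutions such as 0 for o; that would need a fuzzier comparison such as edit distance.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class MostCommonName {
    /** Returns the most frequent candidate after trimming and lowercasing; the first seen wins a tie. */
    static String mostCommon(List<String> candidates) {
        // LinkedHashMap keeps insertion order, which makes tie-breaking deterministic.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String candidate : candidates) {
            counts.merge(candidate.trim().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best; // null when the list is empty
    }
}
```

For the six strings in the question, "John" and "JOHN" fold to the same form and give it two votes, so the result is john.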

SQL query to bring all results regardless of punctuation with JSF

So I have a database with articles in them and the user should be able to search for a keyword they input and the search should find any articles with that word in it.
So for example, if someone were to search for the word Alzheimer's, I would want it to return articles with the word spelled either way, regardless of the apostrophe, so:
Alzheimer's
Alzheimers
results should all be returned. At the moment it searches for the exact spelling of the word and won't bring results back if the word has punctuation.
So what I have at the minute for the query is:
private static final String QUERY_FIND_BY_SEARCH_TEXT = "SELECT o FROM EmailArticle o where UPPER(o.headline) LIKE :headline OR UPPER(o.implication) LIKE :implication OR UPPER(o.summary) LIKE :summary";
And the user's input is called 'searchText' which comes from the input box.
public static List<EmailArticle> findAllEmailArticlesByHeadlineOrSummaryOrImplication(String searchText) {
Query query = entityManager().createQuery(QUERY_FIND_BY_SEARCH_TEXT, EmailArticle.class);
String searchTextUpperCase = "%" + searchText.toUpperCase() + "%";
query.setParameter("headline", searchTextUpperCase);
query.setParameter("implication", searchTextUpperCase);
query.setParameter("summary", searchTextUpperCase);
List<EmailArticle> emailArticles = query.getResultList();
return emailArticles;
}
So I would like to bring back all results for Alzheimer's regardless of whether there is an apostrophe or not. I think I have given enough information, but if you need more just say. I'm not really sure where to go with it or how to do it. Is it possible to just replace or remove all punctuation, or just apostrophes, from a user search?
In my view, you should change your query.
You should alter your table and add a FULLTEXT index on your columns (headline, implication, summary).
You should also use MATCH ... AGAINST rather than a LIKE query and, most importantly, read about the SOUNDEX() function, which is very handy here.
All I can give you is a native query example:
SELECT o.* FROM email_article o WHERE MATCH(o.headline, o.implication, o.summary) AGAINST('your-text') OR SOUNDEX(o.headline) LIKE SOUNDEX('your-text') OR SOUNDEX(o.implication) LIKE SOUNDEX('your-text') OR SOUNDEX(o.summary) LIKE SOUNDEX('your-text') ;
It won't give you results like a Google search, but it works to some extent. Let me know what you think.
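On the question of stripping apostrophes from the user's search text, a minimal sketch of that step in plain Java could look like the following. The class name SearchTextNormalizer is hypothetical. It removes straight and curly apostrophes, uppercases the text, and wraps it in % for the LIKE parameters used in the question.

```java
import java.util.Locale;

final class SearchTextNormalizer {
    /** Drops straight and curly apostrophes, uppercases, and wraps the result in % for LIKE. */
    static String toLikeParameter(String searchText) {
        String stripped = searchText.replaceAll("['\\u2019]", "");
        return "%" + stripped.toUpperCase(Locale.ROOT) + "%";
    }
}
```

Both "Alzheimer's" and "Alzheimers" turn into %ALZHEIMERS%. This only finds articles whose stored text is compared in the same apostrophe-free form, for example through an extra column holding a normalized copy; such a column is an assumption and is not part of the schema shown in the question.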