I'm working with Lucene 4.6 and I'm trying to find records that contain "keyword1" in "field1" and "keyword2" in "field2".
I wrote the following query:
Query q = MultiFieldQueryParser.parse(
    Version.LUCENE_46,
    new String[] {keyword1, keyword2},
    new String[] {"field1", "field2"},
    new StandardAnalyzer(Version.LUCENE_46)
);
That gives me some results, but I want something like %keyword1%, %keyword2% in SQL.
Thanks for your answers. In case I have a field with the value "Lucene Game Lucene" and I'm looking for that document using the keyword "Game", I can't get that result with either keyword* or *keyword. Does anyone have any idea about this?
You can use WildcardQuery. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. \ is the escape character.
You can also use a leading wildcard, for example *nix, but that can be very slow on large indexes, because Lucene needs to scan the entire list of terms.
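To make the * and ? semantics concrete, here is a minimal Python sketch that applies such a pattern to a small list of terms. It is not Lucene code: the function names and the term list are made up for illustration, and it simply checks every term in turn, which is roughly what a leading wildcard forces on a real index.

```python
import re

def wildcard_to_regex(pattern):
    # Translate * (any sequence), ? (any single character) and
    # backslash (escape the next character) into a compiled regex.
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def matching_terms(pattern, terms):
    # The pattern must match the whole term, not just a part of it.
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]

# Hypothetical indexed terms (already lowercased, as an analyzer might do).
terms = ["game", "games", "gate", "lucene", "endgame"]
print(matching_terms("game*", terms))
print(matching_terms("ga?e", terms))
print(matching_terms("*game", terms))
```

Note that the pattern is compared against individual terms, so the case of the pattern has to agree with how the terms were stored.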
[edit]
If you need a leading wildcard in the query parser, make sure to call setAllowLeadingWildcard(true) on the QueryParser.
WildcardQuery in Lucene lets you search for keyword% (in SQL terms). For the other way around, some work has to be done during indexing: you need to index the terms in reversed form (in another field) and then run the query drowyek%.
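The reversed-term idea can be sketched in a few lines of Python. This is only an illustration of the principle, not Lucene code: the sorted list stands in for the term dictionary of the extra reversed field, and all names are hypothetical.

```python
import bisect

def build_reversed_index(terms):
    # Store every distinct term reversed, in sorted order,
    # the way a term dictionary for a "reversed" field might look.
    return sorted(t[::-1] for t in set(terms))

def terms_ending_with(suffix, reversed_index):
    # A suffix search on the original terms becomes a prefix search
    # on the reversed terms, so we can jump straight to the first hit.
    prefix = suffix[::-1]
    start = bisect.bisect_left(reversed_index, prefix)
    hits = []
    for rev in reversed_index[start:]:
        if not rev.startswith(prefix):
            break  # sorted order: no further terms can match
        hits.append(rev[::-1])
    return hits

index = build_reversed_index(["keyword", "password", "word", "lucene"])
print(terms_ending_with("word", index))
```

The point of the design is that only the matching stretch of the sorted list is visited, instead of every term.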
I'm trying to find the fastest way to search millions (80+ million) of records in PostgreSQL (version 9.4) over multiple columns.
I would like to try and use standard PostgreSQL, and not Solr etc.
I'm currently testing full text search, following https://blog.lateral.io/2015/05/full-text-search-in-milliseconds-with-postgresql/.
It works, but I would like some more flexible way to search.
Currently, if I have a column containing e.g. "Volvo" and one containing "Blue", I am able to find the record with the search string "volvo blue", but I would also like to find the record using "volvo blu", as if I used LIKE with '%blu%'.
Is that possible with full text search?
The only option for something like this is the pg_trgm contrib module.
This enables you to create a GIN or GiST index that indexes all sequences of three characters, which can be used for a search with the similarity operator %.
Two notes:
Using the % operator may return “false positive” results, so be sure to add a second condition (e.g. with LIKE) that eliminates those.
A trigram search works well with longer search strings, but performs badly with short search strings because of the many false positive results.
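As a rough illustration of what a trigram comparison does, here is a Python sketch. It assumes, as I understand pg_trgm's documented behaviour, that text is lowercased, each word is padded with two leading spaces and one trailing space, and similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are mine, and the real module may differ in details.

```python
import re

def trigrams(text):
    # Split into alphanumeric words, pad each one, and collect
    # every run of three consecutive characters.
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    # Shared trigrams divided by all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(sorted(trigrams("blu")))
print(similarity("blu", "blue"))
print(similarity("blu", "volvo"))
```

A three-letter search string yields only four trigrams, which gives a feel for why short search strings produce many false positives.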
If that is not good enough for your purposes, you'll have to resort to a third-party solution.
I'm using Examine in Umbraco to query Lucene index of content nodes. I have a field "completeNodeText" that is the concatenation of all the node properties (to keep things simple and not search across multiple fields).
I'm accepting user-submitted search terms. When the search term is multiple words (i.e., "firstterm secondterm"), I want the resulting query to be an OR query: bring me back results where completeNodeText contains firstterm OR secondterm.
I want:
{+completeNodeText:"firstterm ? secondterm"}
but instead, I'm getting:
{+completeNodeText:"firstterm secondterm"}
If I search for "firstterm OR secondterm" instead of "firstterm secondterm", then the generated query is correctly: {+completeNodeText:"firstterm ? secondterm"}
I'm using the following API calls:
var searcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = searcher.CreateSearchCriteria();
var query = searchCriteria.Field("completeNodeText", term).Compile();
Is there an easy way to force Examine to generate this "OR" query? Or do I have to construct the raw query manually, by calling the StandardAnalyzer to tokenize the user input and concatenating a query from the tokens, bypassing the entire Examine fluent query API?
I don't think that question mark means what you think it means.
It looks like you are generating a PhraseQuery, but you want two disjoint TermQueries. In Lucene query syntax, a phrase query is enclosed in quotes.
"firstterm secondterm"
A phrase query looks for precisely that phrase, with the two terms appearing consecutively and in order. Placing an OR within a phrase query does not perform any sort of boolean logic; it is treated as the word "OR". The question mark is a placeholder used in PhraseQuery.toString() to represent a removed stop word (see LUCENE-1396). You are still performing a phrase query, but now it expects a three-word phrase: firstterm, followed by a removed stop word, followed by secondterm.
To simply search for two separate terms, get rid of the quotes.
firstterm secondterm
This will search for any document with either of those terms (with a higher score given to documents with both).
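The difference between the two kinds of query can be sketched in plain Python. This is a toy model with whitespace tokenization and made-up function names, not what Lucene or Examine actually executes, and it ignores scoring and stop words.

```python
def tokenize(text):
    # Very simplified analysis: lowercase and split on whitespace.
    return text.lower().split()

def phrase_match(doc, phrase):
    # True only if the phrase tokens appear consecutively and in order.
    d, p = tokenize(doc), tokenize(phrase)
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))

def any_term_match(doc, query):
    # True if the document contains at least one of the query terms.
    return bool(set(tokenize(doc)) & set(tokenize(query)))

doc = "secondterm shows up here long before firstterm does"
print(phrase_match(doc, "firstterm secondterm"))
print(any_term_match(doc, "firstterm secondterm"))
```

The document above contains both words but not as a consecutive phrase, so only the term-based match finds it.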
With multiple start points, if I provide the full name (value) of the key 'NAME', the Cypher query works.
This Works:
start n=node:na('NAME:("JERI, MICHAEL M", "ANDREW, TONNA", "JILLSO, DAVID")')
return n.NAME
Say I wish to use wildcards on the NAME key, something like this:
start n=node:na('NAME:("JERI*", "ANDREW*", "JILLSO*")')
return n.NAME
This doesn't work. It gives me zero rows.
It would be great if someone could help me with the correct way to achieve this.
I think this may be due to the double quotes, making Lucene query parser 3.6.2 (used in Neo4j 1.9) parse the terms as phrases instead of single terms. And wildcards are only supported for single terms, not phrases.
I am using Lucene 2.9.2 (.NET doesn't have a Lucene 3).
"tag:C#" gets me the same results as "tag:c". How do I allow 'C#' to be a search word? I tried changing Field.Index.ANALYZED to Field.Index.NOT_ANALYZED, but that gave me no results.
I assume I need to escape each tag. How might I do that?
The problem isn't the query, it's the analyzer you are using, which removes the "#" from both the query and the field (if you are using the same analyzer for insertion, which you should be).
You will need to find an analyzer that preserves special characters like that or write a custom one.
Edit: Check out KeywordAnalyzer - it might just do the trick:
"Tokenizes" the entire stream as a single token. This is useful for data like zip codes, ids, and some product names.
According to the Java documentation for Lucene 2.9.2, '#' is not a special character that needs escaping in the query. Can you check (e.g. by opening the index with Luke) how the value 'C#' is actually stored in the index?
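The analyzer effect described in the answers above can be mimicked with a small Python sketch. The two functions are stand-ins I made up: one drops punctuation and lowercases, loosely like a standard analyzer, and the other keeps the whole input as one token, like the KeywordAnalyzer description quoted above. Neither is Lucene code.

```python
import re

def standard_like_tokens(text):
    # Lowercase and keep only runs of letters and digits,
    # so punctuation such as '#' disappears.
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_tokens(text):
    # Treat the entire input as a single, unmodified token.
    return [text]

print(standard_like_tokens("C#"))
print(standard_like_tokens("c"))
print(keyword_tokens("C#"))
```

With the first function "C#" and "c" end up as the same token, which would explain identical results for both searches. The second keeps "C#" intact, but then the query side has to produce exactly the same token.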
Is there a way to perform a FULLTEXT search which returns literals found within words?
I have been using MATCH(col) AGAINST('+literal*' IN BOOLEAN MODE) but it fails if the text is like:
blah,blah,literal,blah
blahliteralblah
blah,blah,literal
Please note that there is no space after the commas.
I want all three cases above to be returned.
I think it would be better to fetch the array of entries and then do the text manipulation (in this case a search) on the fetched data!
Any text manipulation or complex query takes more resources, and if your database contains a lot of data, the query becomes too slow! Moreover, if you are running your query on a shared server, that makes the performance problem worse!
Once you have fetched the data from the database, you can easily do what you want with a regex!
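A minimal Python sketch of that "fetch first, filter afterwards" approach could look like this. The rows are hard-coded to stand in for data already fetched from the database, and the function name is hypothetical.

```python
import re

def filter_rows(rows, literal):
    # Escape the search text so it is matched literally,
    # then keep every row that contains it anywhere.
    pattern = re.compile(re.escape(literal), re.IGNORECASE)
    return [row for row in rows if pattern.search(row)]

# Stand-in for rows already fetched from the database.
rows = [
    "blah,blah,literal,blah",
    "blahliteralblah",
    "blah,blah,literal",
    "nothing to see here",
]
print(filter_rows(rows, "literal"))
```

This finds the text inside words and next to commas alike, at the cost of transferring every row to the application first.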
UPDATE: My suggestion is the same even if you are running your script on a dedicated server! However, if you want to perform a full-text search for the word "literal" in BOOLEAN MODE as you have described, you can remove the + operator (because you are searching for only one word) and construct the query as follows:
SELECT listOfColumnNames
FROM tableName
WHERE MATCH (colName)
AGAINST ('literal*' IN BOOLEAN MODE);
However, even if you add the AND operator, your query works fine: tested on Apache Server with MySQL 5.1!
I suggest you read the documentation about full-text search in boolean mode.
The only problem with this query is that it doesn't match the word "literal" when it is a substring inside another word, for example "textliteraltext".
As you noticed, you can't use the * operator at the beginning of a word!
So, to accomplish what you are trying to do, the fastest and easiest way is to follow Paul's suggestion and use the % placeholder:
SELECT listOfColumnNames
FROM tableName
WHERE colName LIKE '%literal%';
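To try the LIKE approach without a MySQL server at hand, here is a small sketch using Python's built-in sqlite3 module. SQLite is not MySQL, so take it only as a demonstration of the '%literal%' pattern itself. The table and column names are made up.

```python
import sqlite3

rows = [
    "blah,blah,literal,blah",
    "blahliteralblah",
    "blah,blah,literal",
    "nothing to see here",
]

# In-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (col_name TEXT)")
conn.executemany("INSERT INTO entries VALUES (?)", [(r,) for r in rows])

def like_search(connection, literal):
    # Wrap the search text in % on both sides and bind it as a parameter.
    pattern = "%" + literal + "%"
    cursor = connection.execute(
        "SELECT col_name FROM entries WHERE col_name LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [record[0] for record in cursor]

print(like_search(conn, "literal"))
```

All three cases from the question come back, including the one where the text sits inside a longer word.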