Is there a full match in Lucene? - lucene

The PhraseQuery still matches partial strings. For example:
Query: "a test"
String 1: "this is a test"
String 2: "a test"
I want my query to match only String 1, not String 2. Is there a way to do this in Lucene?

Related

Exception when SQL query has multiple word string to W10 desktop search index

I've cobbled together a basic PowerShell script to query W10's Windows Desktop Search (WDS) index. Here are the relevant bits:
$query = "
SELECT System.DateModified, System.ItemPathDisplay
FROM SystemIndex
WHERE CONTAINS(System.Search.Contents, '$($text)')
"
$objConnection = New-Object -ComObject adodb.connection
$objrecordset = New-Object -ComObject adodb.recordset
$objrecordset.CursorLocation = 3
$objconnection.open("Provider=Search.CollatorDSO;Extended Properties='Application=Windows';")
$objrecordset.open($query, $objConnection, $adOpenStatic)
Until now my tests have used single words and everything worked. But when I started using two words, it fell apart with the following error:
Searching for 'and then'...
SELECT System.DateModified, System.ItemPathDisplay
FROM SystemIndex
WHERE CONTAINS(System.Search.Contents, 'and then')
Exception from HRESULT: 0x80040E14
At D:\searchSystemIndex.ps1:72 char:1
+ $objrecordset.open($query, $objConnection, $adOpenStatic)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], COMException
+ FullyQualifiedErrorId : System.Runtime.InteropServices.COMException
Using Explorer to query the index using content:"and then" works fine.
Any ideas?
According to the documentation for Windows Search SQL Syntax and the examples in the CONTAINS predicate, if you want to search for a literal phrase with "multiple words or included spaces" you need to quote the phrase inside the query:
Type: Phrase
Description: Multiple words or included spaces.
Examples
...WHERE CONTAINS('"computer software"')
So in your example you probably want:
$text = "and then"
$query = "
SELECT System.DateModified, System.ItemPathDisplay
FROM SystemIndex
WHERE CONTAINS(System.Search.Contents, '`"$($text)`"')
"
#                                       ^^        ^^
#                                       quoted search phrase
(note the quotes are prefixed with a backtick as the quote would otherwise terminate your entire query string.)
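The quoting rule above can be sketched outside PowerShell as well. The following is a minimal Python illustration that only builds the query text; it does not talk to the Windows Search index. The helper names `quote_phrase` and `build_query` are hypothetical, and the handling of embedded quote characters (dropping double quotes, doubling single quotes) is an assumption rather than documented Windows Search behavior.

```python
def quote_phrase(text: str) -> str:
    """Wrap text in double quotes so CONTAINS sees it as a single phrase."""
    # Assumption: embedded double quotes are dropped and single quotes are
    # doubled so that they cannot end the surrounding SQL string literal.
    cleaned = text.replace('"', "").replace("'", "''")
    return f'"{cleaned}"'


def build_query(condition: str) -> str:
    """Place an already quoted search condition into the SELECT statement."""
    return (
        "SELECT System.DateModified, System.ItemPathDisplay\n"
        "FROM SystemIndex\n"
        f"WHERE CONTAINS(System.Search.Contents, '{condition}')"
    )


if __name__ == "__main__":
    print(build_query(quote_phrase("and then")))
```

The last line of the printed query should carry the phrase inside both the single-quoted SQL literal and the inner double quotes, which is the shape the documentation example uses.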
If you're not looking for the exact phrase "and then", and you just want results that contain both "and" and "then", it looks like you need to do something like this:
Type: Boolean
Description: Words, phrases, and wildcard strings combined by using the Boolean operators AND, OR, or NOT. Enclose the Boolean terms in double quotation marks.
Example:
...WHERE CONTAINS('"computer monitor" AND "software program" AND "install component"')
...WHERE CONTAINS(' "computer" AND "software" AND "install" ' )
$query = "
SELECT System.DateModified, System.ItemPathDisplay
FROM SystemIndex
WHERE CONTAINS(System.Search.Contents, '`"and`" AND `"then`"')
"
#                                      ^^^^^^^^^^^^^^^^^^^^^^
#                                      multiple independent words

Need to use Replace in string with pentaho with REGEx

I want to use the "Replace in String" step in Pentaho version 8 with a regular expression. My search string will contain a regex, and the "Replace With" field should contain "$10$3", meaning: insert the $1 group of the regex, then a literal 0, then the $3 group.
Example:
Input: Ron-234-GR
Output: Ron-2340-GR
Code for reference
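As a separate illustration (not the asker's code, and in Python rather than Pentaho), the intended replacement can be sketched as follows. The pattern is hypothetical, since the question does not show its regex; it is chosen only so that group 1 is "Ron-234" and group 3 is "-GR". Whether Pentaho reads "$10" as group 1 followed by 0 or as group 10 is not established here; Python avoids the ambiguity with the `\g<1>` form.

```python
import re

# Hypothetical pattern: three groups, with an (often empty) middle group,
# so that "group 1, literal 0, group 3" reproduces the example output.
PATTERN = re.compile(r"^([A-Za-z]+-\d+)(\s*)(-[A-Za-z]+)$")


def append_zero(value: str) -> str:
    # \g<1>0 means "group 1, then a literal 0".
    # A bare \10 would be read by Python as a reference to group 10.
    return PATTERN.sub(r"\g<1>0\g<3>", value)


if __name__ == "__main__":
    print(append_zero("Ron-234-GR"))  # Ron-2340-GR
```

Input that does not match the pattern is returned unchanged.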

How do I replace duplicate whitespaces in a String in Kotlin?

Say I have a string: "Test   me".
How do I convert it to: "Test me"?
I've tried using:
string?.replace("\\s+", " ")
but it appears that \\s is an illegal escape in Kotlin.
The replace function in Kotlin has overloads for both plain-string and Regex patterns.
"Test   me".replace("\\s+", " ")
This looks for the literal string \s+ instead of treating it as a regex, which is the problem.
"Test   me".replace("\\s+".toRegex(), " ")
This line replaces multiple whitespaces with a single space.
Note the explicit toRegex() call, which makes a Regex from a String, thus specifying the overload with Regex as pattern.
There's also an overload which allows you to produce the replacement from the matches. For example, to replace them with the first whitespace encountered, use this:
"Test\n\n me".replace("\\s+".toRegex()) { it.value[0].toString() }
By the way, if the operation is repeated, consider moving the pattern construction out of the repeated code for better efficiency:
val pattern = "\\s+".toRegex()
for (s in strings)
result.add(s.replace(pattern, " "))

Neo4j: Lucene phrase matching using Cypher (fuzzy)

In Lucene, a Phrase is a group of words surrounded by double quotes such as "hello dolly".
I would like to be able to do the CYPHER equivalent of this Lucene fuzzy query:
"hello dolly"~0.1
This finds my "hello dolly" node:
START n=node:node_auto_index("name:\"hello dolly\"~0.1") RETURN n
This doesn't:
START n=node:node_auto_index("name:\"hella dolly\"~0.1") RETURN n
Splitting the search phrase by whitespace into Single Terms does work:
START n=node:node_auto_index("name:hella~0.1 AND name:dolly~0.1") return n
However, my data might contain strings like "HelloDolly", which I would like to match successfully against my "hello dolly" node.
EDIT:
Some other attempts:
START n=node:node_auto_index("name:hello\\ dolly") RETURN n
----> does work (finds my "hello dolly" node), but is not fuzzy
START n=node:node_auto_index("name:hello\\ dolly~0.00001") RETURN n
----> doesn't work (finds nothing)
Try this one:
START n=node:node_auto_index("name:hella\\ dolly~0.1") RETURN n
It's an old question but this may help others:
START n=node:node_auto_index('name:"hella dolly"~0.1') RETURN n

What is the difference between like vs contains in visual basic

I have this code:
If (string1 Like string2) AND string3.Contains(string4) Then
What is the difference between the two?
I thought Like was a kind of Contains, but I am not sure... being a C# coder.
Taking a look at the documentation, it would appear that the Like keyword has a bit more comparison logic than a simple .Contains() operation. The second string in the Like operation isn't just a string, but a pattern (like a regular expression). For example:
testCheck = "F" Like "[A-Z]"
In this operation testCheck will evaluate to True because the first string matches (or is included in) the pattern identified by the second string.
Like is more powerful because it compares a string against a pattern: http://msdn.microsoft.com/de-de/library/swf8kaxw.aspx
? Any single character
* Zero or more characters
# Any single digit (0–9)
[ charlist ] Any single character in charlist
[! charlist ] Any single character not in charlist
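To make the difference concrete, here is a rough Python sketch that translates the pattern characters listed above into a regular expression and compares the result with a plain substring test. The function names `like_to_regex` and `vb_like` are hypothetical, the sketch assumes case-sensitive comparison, and it does not try to reproduce every edge case of the Visual Basic operator (empty character lists are rejected, for instance).

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a Like-style pattern into an equivalent regex string."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            out.append(".")        # any single character
        elif ch == "*":
            out.append(".*")       # zero or more characters
        elif ch == "#":
            out.append("[0-9]")    # any single digit
        elif ch == "[":
            end = pattern.find("]", i + 1)
            if end == -1:
                raise ValueError("unterminated [charlist]")
            body = pattern[i + 1:end]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            if not body:
                raise ValueError("empty charlist is not handled in this sketch")
            # Escape characters that are special inside a regex class,
            # but keep '-' so that ranges such as A-Z still work.
            for special in ("\\", "^", "["):
                body = body.replace(special, "\\" + special)
            out.append("[" + ("^" if negate else "") + body + "]")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def vb_like(text: str, pattern: str) -> bool:
    """Return True when the whole text matches the Like-style pattern."""
    return re.fullmatch(like_to_regex(pattern), text, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(vb_like("F", "[A-Z]"))    # pattern match on the whole string
    print("A" in "CAT")             # substring test, the analogue of Contains
    print(vb_like("CAT", "A"))      # Like needs the whole string to match
    print(vb_like("CAT", "*A*"))    # wildcards give Contains-like behavior
```

The point of the comparison is that a substring test succeeds whenever the fragment occurs anywhere, while the pattern match has to account for the entire string, so a Contains-like result needs explicit `*` wildcards on both sides.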