Issue using wildcards in index with multiple start points - lucene

With multiple start points, if i provide the full name (value) of the key 'Name' the cypher query works.
This Works:
start n=node:na('NAME:("JERI, MICHAEL M", "ANDREW, TONNA", "JILLSO, DAVID")')
return n.NAME
Say, if i wish to use wildcards on Name key, something like this:
start n=node:na('NAME:("JERI*", "ANDREW*", "JILLSO*")')
return n.NAME
This doesn't work. It gives me zero rows.
It would be great if someone could help me with the correct way to achieve this.

I think this may be due to the double quotes, making Lucene query parser 3.6.2 (used in Neo4j 1.9) parse the terms as phrases instead of single terms. And wildcards are only supported for single terms, not phrases.

Related

SQL Server full text search and spaces

I have a column with a product names. Some names look like ‘ab-cd’ ‘ab cd’
Is it possible to use full text search to get these names when user types ‘abc’ (without spaces) ? The like operator is working for me, but I’d like to know if it’s possible to use full text search.
If you want to use FTS to find terms that are adjacent to each other, like words separated by a space you should use a proximity term.
You can define a proximity term by using the NEAR keyword or the ~ operator in the search expression, as documented here.
So if you want to find ab followed immediately by cd you could use the expression,
'NEAR((ab,cd), 0)'
searching for the word ab followed by the word cd with 0 terms in-between.
No, unfortunately you cannot make such search via full-text. You can only use LIKE in that case LIKE ('ab%c%')
EDIT1:
You can create a view (WITH SCHEMABINDING!) with some id and column name in which you want to search:
CREATE VIEW dbo.ftview WITH SCHEMABINDING
AS
SELECT id,
REPLACE(columnname,' ','') as search_string
FROM YourTable
Then create index
CREATE UNIQUE CLUSTERED INDEX UCI_ftview ON dbo.ftview (id ASC)
Then create full-text search index on search_string field.
After that you can run CONTAINS query with "abc*" search and it will find what you need.
EDIT2:
But it wont help if search_string does not start with your search term.
For example:
ab c d -> abcd and you search cd
No. Full Text Search is based on WORDS and Phrases. It does not store the original text. In fact, depending on configuration it will not even store all words - there are so called stop words that never go into the index. Example: in english the word "in" is not selective enough to be considered worth storing.
Some names look like ‘ab-cd’ ‘ab cd’
Those likely do not get stored at all. At least the 2nd example is actually 2 extremely short words - quite likely they get totally ignored.
So, no - full text search is not suitable for this.

Operator LIKE in LUCENE

I'm working in Lucene 4.6 and i'm trying to look for records that contains "keyword1" in "field1" and "keyword2" in "field2"
I wrote following query:
Query q = MultifieldQueryParser.parse(
Version.Lucene_46,
new String[] {keyword1, keyword2},
new String[]{"field1","field2"},
new StandardAnalyzer()
);
That gives me some results but I want to have something like %keyword1% , %keyword2% in SQL.
Thanks for your answers. In case I have a field with the value "Lucene Game Lucene" and I'm looking for that document using the keyword "Game" I can't get that result using keyword neither keyword Who have any idea about this?
You can use WildcardQuery. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. \ is the escape character.
You can also use the wildcard as prefix, for example *nix, but that can very slow on large indexes, because Lucene needs to scan the entire list of Terms.
[edit]
If you need a prefix wildcard in the queryparser, make sure to call setAllowLeadingWildcard(true)
on the QueryParser As can be seen here
WildcardQuery in Lucene provides the possibility to search for keyword%. For the other way arount there is some work to be done during indexing. You need to index the terms in reversed form (in an other field) and perform the query drowyek%.

SQL exclusion regexp does not work, why?

I have some sentences in db. I want to select the ones that don't have urls in them.
So what I do is
select ID, SENTENCE
from SENTENCE_TABLE
where regexp_like(SENTENCE, '[^http]');
However after the query is executed the sentences that appear in the results pane still have urls. I tried a lot of other combinations without any success.
Can somebody explain or give a good link where it is explained how regexps actually work in SQL.
How can I filter(exclude) actual words in db with SQL query?
You're over-complicating this. Just use a standard LIKE.
select ID, SENTENCE
from SENTENCE_TABLE
where SENTENCE not like '%http%';
regexp_like(SENTENCE, '[^http]') will match everything but h, t and p separately. I like the PSOUG page on regular expressions in Oracle but I would also recommend reading the documentation.
To respond to your comment you can use REGEXP_LIKE, there's just no point.
select ID, SENTENCE
from SENTENCE_TABLE
where not regexp_like(SENTENCE, 'http');
This looks for the string http rather than the letters individually.
[^http] would match any character except h or t or t or p..So this would match any string that doesn't contain h or t or t or p anywhere in the string
It should be where not regexp_like(SENTENCE, '^http');..this would match anything that doesn`t start with http

How to find strings which are similar to given string in SQL server?

I have a SQL server table which contains several string columns. I need to write an application which gets a string and search for similar strings in SQL server table.
For example, if I give the "مختار" or "مختر" as input string, I should get these from SQL table:
1 - مختاری
2 - شهاب مختاری
3 - شهاب الدین مختاری
I've searched the net for a solution but I have found nothing useful. I've read this question , but this will not help me because:
I am using MS SQL Server not MySQL
my table contents are in Persian, so I can't use Levenshtein distance and similar methods
I prefer an SQL Server only solution, not an indexing or daemon based solution.
The best solution would be a solution which help us sort result by similarity, but, its optional.
Do you have any suggestion for that?
Thanks
MSSQL supports LIKE which seems like it should work. Is there a reason it's not suitable for your program?
SELECT * FROM table WHERE input LIKE '%مختار%'
Hmm.. considering that you read the other post you probably know about the like operator already... maybe your problem is "getting the string and searching for something similar"?
--This part searches for a string you want
declare #MyString varchar(max)
set #MyString = (Select column from table
where **LOGIC TO FIND THE STRING GOES HERE**)
--This part searches for that string
select searchColumn, ABS(Len(searchColumn) - Len(#MyString)) as Similarity
from table where data LIKE '%' + #MyString + '%'
Order by Similarity, searchColumn
The similarity part is something like the thing you posted. If the strings are "more similar" meaning that they have a similar length, they will be higher on the results query.
The absolute part can be avoided obviously but I did it just in case.
Hope that helps =-)
Besides like operator, you can use the condition WHERE instr(columnname, search) > 0; however this is generally slower. What it does is return the starting position of a string within another string. thus if searching in ABCDEFG for CD it would return 3. 3>0, so the record would be returned. However in the case you've described, like seems to be the best solution.
The general problem is that in languages where the same letter has different writing form in the beginning, middle and at the end of word, and thus - different codes - we can try to use specific Persian collations, but in general this will not help.
The second option - is to use SQL FTS abilities, but again - if it has not special language module for the language - it is much less useful.
And most general way - to use your own language processing - which is very complex task at all. The next keywords and google can help to understand the size of the problem: DLP, words and terms, bi-gramms, n-gramms, grammar and morphology inflection
Try to use the Built-in Soundex() And Difference() functions. I hope they work fine for Persian.
Look at the following reference:
http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/
Similarity() function helps you to sort result by similarity (as you asked in your question) and it is also possible using algorithms different from Levenshtein edit distance depends on the Value for #method Algorithm:
0 The Levenshtein edit distance algorithm
1 The Jaccard similarity coefficient algorithm
2 A form of the Jaro-Winkler distance algorithm
3 Longest common subsequence algorithm
Like operator may not do what he is asking for. Like for example, if i have a record value "please , i want to ask a question' in my database record. and lets say on my query, i want to find a match similarity like this 'Can i ask a question, please'. like operator may do this using like %[your senttence] or [your sentence]% but it is not advisable to use it for string similarity cos sentences may change and all your like logic may not fetch the matching records. It is advisable to use naive bayes text classification for similarities assigning labels to your sentences or you can try the semantic search function in MSSQL server

Search literal within a word

Is there a way to perform a FULLTEXT search which returns literals found within words?
I have been using MATCH(col) AGAINST('+literal*' IN BOOLEAN MODE) but it fails if the text is like:
blah,blah,literal,blah
blahliteralblah
blah,blah,literal
Please Note that there is no space after commas.
I want all three cases above to be returned.
I think that should be better fetching the array of entries and then perform a text manipulation over the fetched data (in this case a search)!
Because any text manipulation or complex query take more resources and if your database contains a lot of data, the query become too slow! Moreover, if you are running your
query on a shared server, that increases the performance issues!
You can easily accomplish what you are trying to do with regex, once you have fetched the data from the database!
UPDATE: My suggestion is the same even if you are running your script on a dedicated server! However, if you want to perform a full-text search of the word "literal" in BOOLEAN MODE like you have described, you can remove the + operator (because you are searching only one word) and construct the query as follow:
SELECT listOfColumsNames WHERE
MATCH (colName)
AGAINST ('literal*' IN BOOLEAN MODE);
However, even if you add the AND operator, your query works fine: tested on Apache Server with MySQL 5.1!
I suggest you to read the documentation about the full-text search in boolean mode.
The only one problem of this query is that doesn't matches the word "literal" if it is a sub-string inside an other word, for example: "textliteraltext".
As you noticed, you can't use the * operator at the beginning of the word!
So, to accomplish what you are trying to do, the fastest and easiest way is to follow the suggestion of Paul, using the % placeholder:
SELECT listOfColumsNames
WHERE colName LIKE '%literal%';