Lucene query with same term exact, wildcard and fuzzy match

I am trying to get Lucene results ordered by exact Title match first, then Title wildcard, then Title fuzzy, then FieldX wildcard, then FieldX fuzzy, and so on, but the results do not come back in the expected order (fuzzy matches rank higher).
Query:
Title:term^3 Title:term*^2 Title:term^1.9~ Field1:term*^1.6 Field1:term^1.5~ Field2:term*^1.4 Field2:term^1.3~...
How can I get the required result ordering with Lucene?

Related

PostgreSQL - Query keyword patterns in columns in table

We all know that in SQL we can query a column (let's say a column "breeds") for a certain word like "dog" with a query like this:
select breeds
from myStackOverflowDBTable
where breeds = 'dog'
However, say I had many more columns with much more data, say millions of records, and I did not want to find a word but rather the most common keyword pattern or wildcard expression, using a query like this:
SELECT *
FROM myStackOverflowDBTable
WHERE address LIKE '%alb%'
Is there an efficient way to find these "patterns" inside the columns using SQL? I need to find the most common substring, so to speak. Per the query above, say the wildcard string "alb" appeared the most in a "location" column that had words like Albany, Albuquerque, Alabama. Querying for the word "alb" directly would yield 0 results, but querying on that wildcard pattern would yield many. I want to find the most frequent wildcard/keyword pattern/regex expression/substring (however you want to define it) for a given column. Is there an easy way to do this without running a million test queries manually?

Well, if you want to find three-character patterns, you could extract all 3-character substrings, aggregate, and count:
select substr(t.address, gs.i, 3) as ngram_3, count(*)
from t cross join lateral
generate_series(1, length(t.address) - 2, 1) gs(i)
group by ngram_3
order by count(*) desc
limit 100;
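The same counting idea can be sketched outside the database. This is a minimal Python illustration, not part of the original answer: the function name `top_ngrams` and the sample list are hypothetical, and the lowercasing step is an assumption added so that "Alb" and "alb" count as the same pattern.

```python
from collections import Counter

def top_ngrams(values, n=3, limit=100):
    """Count every length-n substring across the given strings."""
    counts = Counter()
    for value in values:
        text = value.lower()  # assumption: patterns are compared case-insensitively
        # Start positions 0 .. len(text) - n cover every full n-gram.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts.most_common(limit)

# Hypothetical sample values taken from the question's example.
addresses = ["Albany", "Albuquerque", "Alabama"]
top = top_ngrams(addresses)
```

For millions of rows the SQL version keeps the work inside the database; the Python version is only meant to make the logic easy to follow.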

Sql Server Contains search not Giving Result as expected

select * from table1 where contains(searchWord,"*comfort*")
I want a result such as
Uncomfortable
where the search word is in the middle, but it only shows
comfort xyz

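You do not need the CONTAINS function here. CONTAINS searches for precise or fuzzy (less precise) matches to single words and phrases, words within a certain distance of one another, or weighted matches in SQL Server. It works on whole words and word prefixes, so it does not match "comfort" in the middle of "Uncomfortable".
For the required result, a simple LIKE predicate is enough: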
select * from table1 where searchWord like '%comfort%'
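As a quick way to see the substring behaviour of LIKE, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for SQL Server here purely as an assumption for illustration, and the three sample rows are made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (searchWord text)")
conn.executemany(
    "insert into table1 (searchWord) values (?)",
    [("Uncomfortable",), ("comfort xyz",), ("sofa",)],  # made-up sample rows
)

# '%comfort%' matches the substring anywhere in the value.
rows = [
    row[0]
    for row in conn.execute(
        "select searchWord from table1 where searchWord like '%comfort%'"
    )
]
conn.close()
```

Both "Uncomfortable" and "comfort xyz" come back, because the leading and trailing % let the match start and end anywhere in the string.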

Oracle 'Contains' / 'Group' function return incorrect value

I have this query:
SELECT last_name, SCORE(1)
FROM Employees
WHERE CONTAINS(last_name, '%sul%', 1) > 0
It produces output below:
The question is:
Why does SCORE(1) produce 9? As I recall, the CONTAINS function returns the number of occurrences of search_string (in this case '%sul%').
I expected the output to be:
Sullivan 1
Sully 1
But when I try this syntax:
SELECT last_name, SCORE(1)
FROM Employees
WHERE CONTAINS(last_name, 'sul', 1) >0;
It returns 0 rows selected.
Also, can someone please explain what the third parameter is for?
Thanks in advance :)

The reason your second query returns no rows is that you are searching for the word sul. CONTAINS does not do a pattern search unless you tell it to; it searches for the words you specify in the second parameter. To look for patterns you have to use wildcards, as you did in your first example.
Now, the third parameter of CONTAINS: it is a label, used only to tie the CONTAINS clause to a SCORE operator. You should supply it when you use SCORE in your SELECT list. Its importance is clearer when there are multiple SCORE operators.
Quoting directly from the documentation:
label
Specify a number to identify the score produced by the query.
Use this number to identify the CONTAINS clause which returns this
score.
Example
Single CONTAINS
When the SCORE operator is called (for example, in a SELECT clause),
the CONTAINS clause must reference the score label value as in the
following example:
SELECT SCORE(1), title from newsindex
WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
Multiple CONTAINS
Assume that a news database stores and indexes the title and body of
news articles separately. The following query returns all the
documents that include the words Oracle in their title and java in
their body. The articles are sorted by the scores for the first
CONTAINS (Oracle) and then by the scores for the second CONTAINS
(java).
SELECT title, body, SCORE(10), SCORE(20) FROM news WHERE CONTAINS
(news.title, 'Oracle', 10) > 0 OR CONTAINS (news.body, 'java', 20) > 0
ORDER BY SCORE(10), SCORE(20);

The Oracle Text Scoring Algorithm does not score by simply counting the number of occurrences. It uses an inverse frequency algorithm based on Salton's formula.
Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.
Think of a Google search. If you search for the term Oracle, you will not (directly) find any result that explains your scoring value, so that term is "noise" for your purposes. But if you search for Oracle Text Scoring Algorithm, you will find your answer in the first Google result.
As for your other questions, I think @Incognito has already answered them well.
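To make the inverse frequency idea concrete, here is a generic sketch in Python. It is not Oracle's exact formula or scaling; the function `inverse_frequency_score`, the weighting `tf * (1 + log(N / n))`, and the tiny corpus are all assumptions chosen only to show that a term found in every document scores lower than a term found in few.

```python
import math

def inverse_frequency_score(term, document, corpus):
    """Generic inverse-frequency weight (illustrative, not Oracle's formula)."""
    tf = document.lower().split().count(term)  # occurrences in this document
    docs_with_term = sum(1 for d in corpus if term in d.lower().split())
    if tf == 0 or docs_with_term == 0:
        return 0.0
    # Terms present in fewer documents get a larger log factor.
    return tf * (1 + math.log(len(corpus) / docs_with_term))

# Made-up document set: "oracle" appears everywhere, "scoring" only once.
corpus = [
    "oracle text scoring",
    "oracle database",
    "oracle text index",
    "oracle cloud",
]
common = inverse_frequency_score("oracle", corpus[0], corpus)
rare = inverse_frequency_score("scoring", corpus[0], corpus)
```

In this toy corpus the rare term ends up with the higher score even though both terms occur once in the document, which is why a score is not a simple occurrence count.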

Lucene - search a subset with boolean query

I have an index with a field (e.g. field1) and two rows: one is "short greg" and the other is "great greg".
If I search (using Luke) with: field1:g* field1:greg
the result contains both rows, but with the same score!
This is because both contain a word starting with G.
My expectation is that "great greg" gets the highest score, because G* should add more weight to the score of "GREAT GREG".
The question is: how do I write this query?
Thanks anyway

I'm not sure, but the identical score might be because your query is equivalent to:
field1:g* OR field1:greg
I would try in Luke:
+field1:g* +field1:greg
(which is equivalent to
field1:g* AND field1:greg)

SQL :: How to find records with most common in search string (Tags field)

We have tbl_Articles:
id title tags
=================================
1 article1 science;
2 article2 art;
3 article3 sports;art;
I am looking for a query that returns the records from tbl_Articles that have the most words in common with a specific tags string (e.g. politics;art;):
Example: Select (something from tbl_Articles) where Tags has words in common with "politics;art;"
Result:
tbl_Articles
id title tags
=================================
2 article2 art;
3 article3 sports;art;

Are you looking for this?
select a.*
from articles a
where ';'+tags+';' like '%;politics;%' and
';'+tags+';' like '%;art;%'
Notice that I put the separator at the beginning and end of both sides, so that a search for "art" does not also match a tag like "smart".
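The separator trick can be sketched in Python as well. This is an illustration only: the helper `count_common_tags` is hypothetical, and unlike the AND query above it counts how many of the wanted tags a row shares, which is one possible reading of "most words in common".

```python
def count_common_tags(tags, wanted):
    """Count how many tags from `wanted` appear as whole tags in `tags`."""
    wrapped = ";" + tags + ";"  # delimiters on both ends, as in the SQL
    return sum(
        1
        for tag in wanted.split(";")
        if tag and ";" + tag + ";" in wrapped  # whole-tag match only
    )

# Rows copied from the question's sample table.
articles = [
    (1, "article1", "science;"),
    (2, "article2", "art;"),
    (3, "article3", "sports;art;"),
]
wanted = "politics;art;"

# Keep rows sharing at least one tag, highest overlap first.
matches = sorted(
    (row for row in articles if count_common_tags(row[2], wanted) > 0),
    key=lambda row: count_common_tags(row[2], wanted),
    reverse=True,
)
```

Because the comparison includes the delimiters, a tag such as "smart" is not mistaken for "art".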

You can accomplish this task with the LIKE predicate, but please note that it only works on character patterns.
I think you are looking for a full-text query. FREETEXT returns not only the exact wording but also words with the nearest meanings.
For more details, look up the following predicates and functions:
FREETEXT
FREETEXTTABLE
CONTAINS
CONTAINSTABLE
The performance benefit of Full-Text Search is best realized when querying a large amount of unstructured text data. A LIKE query (for example, ‘%microsoft%’) against millions of rows of text data can take minutes to return, whereas a full-text query (for ‘microsoft’) can take only seconds or less against the same data, depending on the number of rows returned.
