How to search a prefix containing "-" in an AWS CloudSearch structured query - asp.net-mvc-4

I am working with the Amazon CloudSearch API. I am using the "prefix" operator to search, and it works fine up to a point.
Working query:
AWSURL/search?q=
(and
(and searchtype:'Event' server:1 )
( and
( or eventownerid:70 eventcreatedbyuserid:70 )
)
(prefix 'event')
)&size=10&q.parser=structured
Non-working query (issue):
AWSURL/search?q=
(and
(and searchtype:'Event' server:1 )
( and
( or eventownerid:70 eventcreatedbyuserid:70 )
)
(prefix 'event-1')
)&size=10&q.parser=structured
When I pass a search term containing "-", it stops returning results.
I have also tried URL-encoding the special character, but no luck with that either.
What do I need to do to include "-" in a prefix query?
Also, is there any way to search by any part of a word across all fields of a document? For example, if a field contains the text "this is GreatSearchTech", I want that document in the results when I pass "Search" as the search term.
Thank you in advance.
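For reference, a minimal sketch (Python with requests; the endpoint URL is a placeholder, field names are taken from the query above) of building the structured query and letting the HTTP library percent-encode it. Whether prefix 'event-1' matches anything also depends on how the analyzed text field tokenized the hyphen at index time:

import requests

# Placeholder: use your CloudSearch domain's search endpoint.
endpoint = "https://search-yourdomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com/2013-01-01/search"

structured_q = (
    "(and "
    "(and searchtype:'Event' server:1) "
    "(and (or eventownerid:70 eventcreatedbyuserid:70)) "
    "(prefix 'event-1')"
    ")"
)

# requests percent-encodes the parameters, including spaces and quotes.
params = {"q": structured_q, "q.parser": "structured", "size": 10}
response = requests.get(endpoint, params=params)
print(response.json())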

Related

Lucene syntax as objects instead of query string in Azure search

I would like to send the filter as a syntax tree, and not a query string, to Azure Search. Is that possible?
All I can find is to send the filter as a string.
I have a filter syntax like ( State eq 1 ) or ( Domain eq 'Love' ) but I'd like to send it parameterised to Azure search instead of as a string.
(It's a security thing - I'd prefer not to have to escape/wash the indata but instead let Microsoft/Azure/Lucene take care of the details as they know more about the inner workings than I do.)
Basically: I'd like to
filter =
Or (
Equal( "State", stateValue ),
Equal( "FieldName", domainValue )
)
Instead of doing it like this:
filter = $"( 'State' eq {MyStringEscapeFunction(stateValue)} ) " +
"or ( 'Love' eq {MyStringEscapeFunction(domainValue)} )"
Filters in Azure Cognitive Search must be specified via the $filter parameter using OData syntax.
https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter
Your example filter is valid OData, provided that you have an index where State is a number and Domain is text.
$filter=(State eq 1) or (Domain eq 'Love')
If I understand your question correctly, you have an application where the values 1 and 'Love' are inputs from end users. The Azure Search API will validate that the filter values are valid according to the datatype. Other than that, you are responsible for validating input to your application.
For example, assume that your input parameters are s and d for State and Domain, respectively. You risk someone trying to manipulate your filter to return results you did not intend:
yourpage.aspx?s=1&d=Love%27%20or%20Domain%20eq%20%27Hate
This could potentially cause your $filter query to become:
$filter=(State eq 1) or (Domain eq 'Love' or Domain eq 'Hate')
You are responsible for implementing validation: build a layer that validates end-user input before using it in a $filter query, so that the state and domain values are limited to valid values before the OData filter is created. See examples here:
https://learn.microsoft.com/en-us/aspnet/core/mvc/models/validation?view=aspnetcore-7.0
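For illustration, a minimal sketch of such a validation layer (written in Python for brevity; the parameter names and the rule that a single quote inside an OData string literal is escaped by doubling it are the only assumptions):

def build_filter(state_value: str, domain_value: str) -> str:
    # State is a number in the index, so reject anything that isn't an integer.
    state = int(state_value)  # raises ValueError for invalid input

    # In OData string literals a single quote is escaped by doubling it,
    # so an injected quote cannot terminate the literal.
    domain = domain_value.replace("'", "''")

    return f"(State eq {state}) or (Domain eq '{domain}')"

# The injection attempt from above stays harmlessly inside the string literal:
# build_filter("1", "Love' or Domain eq 'Hate")
# -> "(State eq 1) or (Domain eq 'Love'' or Domain eq ''Hate')"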

Amazon cloudsearch: Fuzzy search by field with boto3

How can I do a fuzzy search by field in Amazon CloudSearch?
I tried this, but it does not work:
cloudsearch.search(
    query="1976~100",
    queryParser='simple',
    partial=True,
    # queryOptions='{"fields":["passport_number"]}',
    queryOptions='{"operators":["fuzzy"],"fields":["passport_number"]}',
    returnFields="cognito_id,pk"
)
I also tried this:
cloudsearch.search(
    query="(near field=passport_number '1976')",
    queryParser='structured',
    partial=True,
    returnFields="cognito_id,pk"
)
But this does not work either.
You can use lucene as the query parser along with wildcards:
cloudsearch.search(
    query='passport_number:*1976*',
    queryParser='lucene',
    partial=True,
    returnFields="cognito_id,pk"
)
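Note that the search call lives on the cloudsearchdomain client, which must be created with your domain's search endpoint; a minimal sketch (the endpoint URL is a placeholder):

import boto3

cloudsearch = boto3.client(
    "cloudsearchdomain",
    endpoint_url="https://search-yourdomain-xxxxxxxxxx.us-east-1.cloudsearch.amazonaws.com",  # placeholder
)

response = cloudsearch.search(
    query="passport_number:*1976*",
    queryParser="lucene",
    partial=True,
    returnFields="cognito_id,pk",
)
for hit in response["hits"]["hit"]:
    print(hit["id"], hit.get("fields"))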

Full text search returning too many irrelevant results and causing poor performance

I'm using the full text search feature from Postgres and for the most part it works fine.
I have a column in my database table called documentFts that is basically the tsvector version of the body field (a text column), and it's indexed with a GIN index.
Here's my query:
select
count(*) OVER() AS full_count,
id,
url,
(("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) as "finalScore",
ts_headline('english_unaccent', title, websearch_to_tsquery($4, $1)) as title,
ts_headline('english_unaccent', body, websearch_to_tsquery($4, $1)) as body,
"possibleEncoding",
"responseYear"
from "Entries"
where
"language" = $3 and
"documentFts" ## websearch_to_tsquery($4, $1)
order by (("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) desc limit 20 offset $2;
The dictionary is english_unaccent because I created one based on english that uses the unaccent extension:
CREATE TEXT SEARCH CONFIGURATION english_unaccent (
COPY = english
);
ALTER TEXT SEARCH CONFIGURATION english_unaccent
ALTER MAPPING FOR hword, hword_part, word WITH unaccent,
english_stem;
I did the same for other languages.
And then I did this to my Entries db:
ALTER TABLE "Entries"
ADD COLUMN "documentFts" tsvector;
UPDATE
"Entries"
SET
"documentFts" = (setweight(to_tsvector('english_unaccent', coalesce(title)), 'A') || setweight(to_tsvector('english_unaccent', coalesce(body)), 'C'))
WHERE
"language" = 'english';
I have a column in my table with the language of the entry, hence the "language" = 'english'.
So, the problem I'm having is that for words like animal, anime or animation, they all go into the vector as anim, which means that if I search for any of those words I get results with all of those variations.
That returns a HUGE dataset, which makes the query quite slow compared to searches that return fewer items. Also, if I search for Anime, my first results contain Animal and Animated, and the first result that actually contains the word Anime is the 12th one.
Shouldn't animation be transformed to animat in the vector and animal just be animal as the other variations for it are animals or animalia?
I've been searching for a solution to this without much luck. Is there any way I can improve this? I'm happy to install extensions, reindex the column, or whatever it takes.
There are so many little details to this. The best solution depends on the exact situation and exact requirements.
Two simple options:
Simple tweak 1
If you want rows where title or body contain a word starting with 'Anime' (exactly, matched case-insensitively) to sort first, add an ORDER BY expression like:
ORDER BY unaccent(concat_ws(' ', title, body)) !~* ('\m' || f_regexp_escape($1))
       , (("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) DESC
Where the auxiliary function f_regexp_escape() escapes special regexp characters and is defined here:
Escape function for regular expression or LIKE patterns
That expression is rather expensive, but since it's only applied to filtered results, the effect is limited.
You may have to fine-tune, as other search terms present other difficulties. Think of 'body' / 'bodies' stemming to 'bodi' ...
Simple tweak 2
To remove English stemming completely, base yours on the 'simple' TEXT SEARCH CONFIGURATION:
CREATE TEXT SEARCH CONFIGURATION simple_unaccent (
COPY = simple
);
Etc.
Then the actual language of the text is irrelevant. The index gets substantially bigger, and the search is done on literal spellings. You can now widen the search with prefix matching like:
WHERE "documentFts" @@ to_tsquery('simple_unaccent', $1 || ':*')
Again, you'll have to fine-tune. The simple example only works for single-word patterns. And I doubt you want to get rid of stemming altogether. Probably too radical.
See:
Get partial match from GIN indexed TSVECTOR column
Proper solution: Synonym dictionary
You need access to the installation directory of the Postgres server for this, so it's typically not possible with hosted services.
To overrule some of the stemmer's decisions, add your own set of synonym rules. Create a mapping file at $SHAREDIR/tsearch_data/my_synonyms.syn; that's /usr/share/postgresql/13/tsearch_data/my_synonyms.syn in my Linux installation.
Let it contain (case insensitive by default):
anime anime
Then:
CREATE TEXT SEARCH DICTIONARY my_synonym (
TEMPLATE = synonym,
SYNONYMS = my_synonyms
);
There is a chapter with instructions in the manual. One quote:
A synonym dictionary can be used to overcome linguistic problems, for example, to prevent an English stemmer dictionary from reducing the word “Paris” to “pari”. It is enough to have a Paris paris line in the synonym dictionary and put it before the english_stem dictionary.
Then:
CREATE TEXT SEARCH CONFIGURATION my_english_unaccent (
COPY = english
);
ALTER TEXT SEARCH CONFIGURATION my_english_unaccent
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, my_synonym, english_stem; -- added my_synonym!
You have to re-populate your column "documentFts" using my_english_unaccent. While you are at it, use a proper lower-case column name like document_fts, and consider a GENERATED column. See:
Computed / calculated / virtual / derived columns in PostgreSQL
Are PostgreSQL column names case-sensitive?
Now, searching for Anime (or ánime, for that matter) won't find animal any more. And searching for animal won't find Anime.
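Once the column has been re-populated with my_english_unaccent, the query from the question can be run against the new configuration. A minimal sketch with psycopg2 (the connection string is a placeholder, table and column names are from the question, and the configuration name is passed as a parameter cast to regconfig):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT id, url,
               ("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery(%(cfg)s::regconfig, %(q)s)) AS "finalScore"
        FROM "Entries"
        WHERE "language" = %(lang)s
          AND "documentFts" @@ websearch_to_tsquery(%(cfg)s::regconfig, %(q)s)
        ORDER BY "finalScore" DESC
        LIMIT 20
        """,
        {"cfg": "my_english_unaccent", "q": "Anime", "lang": "english"},
    )
    rows = cur.fetchall()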

Escaping special characters & Encoding unsafe and reserved characters Lucene query syntax Azure Search

I have words "C&&K", "So`am`I" , "Ant||Man", "A*B==AB", "Ant+Man" in index of azure search.
According to Doc for Escaping special characters + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / I need to prefixing them with backslash (\) And for unsafe and reserved characters need to encode them in URL.
for "C&&K" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%5C%26%5C%26K~&queryType=full
for "So`am`I" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=So%5C%60am%5C%60I~&queryType=full
for "Ant||Man" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=A%5C*B%3D%3DAB~&queryType=full
for "A*B==AB" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=A%5C*B%3D%3DAB~&queryType=full
for "Ant+Man" my search url => /indexes/{index-name}/docs?api-version=2017-11-11&search=Ant%5C%2BMan~&queryType=full
For all of them I get no search results; I get "value": [].
for "C&&K" I have also tried
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%5C%26%26K~&queryType=full
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=C%26%5C%26K~&queryType=full
for "So`am`I" I have also tried
url => /indexes/{index-name}/docs?api-version=2017-11-11&search=So%60am%60I~&queryType=full
It does not work. What am I doing wrong here?
With standard analysis, all of these would be indexed as multiple terms. Fuzzy queries, however, are not analyzed, so it will attempt to find it as a single term. That is, when you index "Ant||Man", after analysis, you end up with the terms "ant" and "man" in the index. When you search for Ant||Man, it will analyze it in much the same way as at index time, but when searching for Ant||Man~, the query won't be analyzed, and since no terms like that exist in the index, you won't get any matches. Similarly, for "A*B==AB" you get the terms "b" and "ab" ("a" is a stop word with default analysis).
So, try the queries without the ~.
In addition to femtoRgon's response, you may want to consider using a custom analyzer that does not index these as multiple terms if you would always like them to be searchable as they are. There is documentation on custom analyzers here, and you can use the Analyze API to test to make sure a given analyzer works as you expect.
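If it helps, a minimal sketch of calling the Analyze API with Python requests to see which terms a given analyzer produces (the service name, index name, key, and the standard.lucene analyzer are placeholders/assumptions):

import requests

service = "your-service"    # placeholder
index = "your-index"        # placeholder
api_key = "YOUR-ADMIN-KEY"  # placeholder; the Analyze API requires an admin key

url = f"https://{service}.search.windows.net/indexes/{index}/analyze?api-version=2017-11-11"
body = {"text": "Ant||Man", "analyzer": "standard.lucene"}

response = requests.post(url, json=body, headers={"api-key": api_key})
print(response.json())  # with the standard analyzer this should show the tokens "ant" and "man"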

SQL Server Full Text Search does not return the exact search word

I searched for the word "alevi" on a full-text indexed column.
But it returns rows that include "alevi" and, in addition, rows with "alev", which is the form without the suffix ("alev-i" in Turkish).
SELECT * FROM MYTABLE where (CONTAINS(MYCOLUMN,'alevi'))
I want to return only rows that include "alevi".
I tried FREETEXT, N'alevi', and '"alevi"',
but it still returns "alev".
I don't want results with or without the suffix / plural suffix.
Thanks in advance.
This is due to the stemming functionality of Full Text search, applied to your language. You can disable it by specifying a neutral language:
SELECT *
FROM MYTABLE
WHERE FREETEXT(MYCOLUMN, 'alevi', LANGUAGE 0x0)
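And if the query is issued from application code, pass the search term as a parameter instead of concatenating it into the SQL; a minimal sketch with pyodbc (the connection string is a placeholder, table and column names are from the question):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=MyDb;Trusted_Connection=yes"  # placeholder
)
cursor = conn.cursor()

term = "alevi"
cursor.execute(
    "SELECT * FROM MYTABLE WHERE FREETEXT(MYCOLUMN, ?, LANGUAGE 0x0)",
    term,
)
rows = cursor.fetchall()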