I have a full-text index over a list of urls;
http://example.com:8080/Api
http://example.com:8080/Api/CustomController
http://example.com:8080/
And I am trying to search it with a full-text query in SQL:
SELECT * FROM urls WHERE CONTAINS(urls.Url, '"http://localhost:8080/"')
But that gives me;
http://example.com:8080/Api
http://example.com:8080/Api/CustomController
http://example.com:8080/
when I only want
http://example.com:8080/
How can I do this? I still need the capabilities that CONTAINS offers, like the OR, AND and *-wildcard
You cannot perform an exact match with contains.
You could rewrite your query as such:
SELECT * FROM urls WHERE urls.Url = 'http://localhost:8080/'
OR CONTAINS(...)
Related
I'm using the full text search feature from Postgres and for the most part it works fine.
I have a column in my database table called documentFts that is basically the tsvector version of the body field, which is a text column, and it is indexed with a GIN index.
Here's my query:
select
count(*) OVER() AS full_count,
id,
url,
(("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) as "finalScore",
ts_headline('english_unaccent', title, websearch_to_tsquery($4, $1)) as title,
ts_headline('english_unaccent', body, websearch_to_tsquery($4, $1)) as body,
"possibleEncoding",
"responseYear"
from "Entries"
where
"language" = $3 and
"documentFts" @@ websearch_to_tsquery($4, $1)
order by (("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) desc limit 20 offset $2;
The dictionary is english_unaccent because I created one based on english that uses the unaccent extension by using:
CREATE TEXT SEARCH CONFIGURATION english_unaccent (
COPY = english
);
ALTER TEXT SEARCH CONFIGURATION english_unaccent
ALTER MAPPING FOR hword, hword_part, word WITH unaccent,
english_stem;
I did the same for other languages.
And then I did this to my Entries db:
ALTER TABLE "Entries"
ADD COLUMN "documentFts" tsvector;
UPDATE
"Entries"
SET
"documentFts" = (setweight(to_tsvector('english_unaccent', coalesce(title)), 'A') || setweight(to_tsvector('english_unaccent', coalesce(body)), 'C'))
WHERE
"language" = 'english';
I have a column in my table with the language of the entry, hence the "language" = 'english'.
So, the problem I'm having is that for words like animal, anime or animation, they all go into the vector as anim, which means that if I search for any of those words I get results with all of those variations.
That returns a HUGE dataset that causes the query to be quite slow compared to searches that return fewer items. And also, if I search for Anime, my first results contain Animal, Animated and the first result that has the word Anime is the 12th one.
Shouldn't animation be transformed to animat in the vector and animal just be animal as the other variations for it are animals or animalia?
I've been searching for a solution to this without much luck. Is there any way I can improve this? I'm happy to install extensions, reindex the column, or whatever.
There are so many little details to this. The best solution depends on the exact situation and exact requirements.
Two simple options:
Simple tweak 1
If you want to sort first the rows where title or body contains a word starting with 'Anime' (exactly), matched case-insensitively, add an ORDER BY expression like:
ORDER BY unaccent(concat_ws(' ', title, body)) !~* ('\m' || f_regexp_escape($1))
, (("urlScore" / 100) + ts_rank("documentFts", websearch_to_tsquery($4, $1))) DESC
Where the auxiliary function f_regexp_escape() escapes special regexp characters and is defined here:
Escape function for regular expression or LIKE patterns
That expression is rather expensive, but since it's only applied to filtered results, the effect is limited.
You may have to fine-tune, as other search terms present other difficulties. Think of 'body' / 'bodies' stemming to 'bodi' ...
Simple tweak 2
To remove English stemming completely, base yours on the 'simple' TEXT SEARCH CONFIGURATION:
CREATE TEXT SEARCH CONFIGURATION simple_unaccent (
COPY = simple
);
Etc.
Then the actual language of the text is irrelevant. The index gets substantially bigger, and the search is done on literal spellings. You can now widen the search with prefix matching like:
WHERE "documentFts" @@ to_tsquery('simple_unaccent', ($1 || ':*'))
Again, you'll have to fine-tune. The simple example only works for single-word patterns. And I doubt you want to get rid of stemming altogether. Probably too radical.
See:
Get partial match from GIN indexed TSVECTOR column
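For multi-word input, one possible workaround is to build the prefix query string in application code before passing it to to_tsquery(). The sketch below is an assumption, not part of the answer above: the class and method names are hypothetical, and it simply ANDs every word as a prefix term after stripping everything except letters and digits, so that to_tsquery operators cannot sneak in from user input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns free-form user input into a to_tsquery()
// string in which every word is a prefix term, e.g. "anime:* & fan:*".
final class PrefixQuerySketch {

    static String toPrefixTsQuery(String userInput) {
        List<String> terms = new ArrayList<>();
        for (String raw : userInput.trim().split("\\s+")) {
            // Keep only letters and digits; this drops &, |, !, :, quotes, etc.
            String word = raw.replaceAll("[^\\p{L}\\p{N}]", "");
            if (!word.isEmpty()) {
                terms.add(word + ":*");
            }
        }
        // An empty result means there was nothing searchable in the input.
        return String.join(" & ", terms);
    }

    public static void main(String[] args) {
        System.out.println(toPrefixTsQuery("anime fan"));
    }
}
```

The resulting string would be bound as the query parameter in place of ($1 || ':*'). The caller still has to handle the empty-string case, since an empty query matches nothing.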
Proper solution: Synonym dictionary
You need access to the installation drive of the Postgres server for this. So typically not possible with most hosted services.
To overrule some of the stemmer's decisions, add your own set of synonym rules. Create a mapping file in $SHAREDIR/tsearch_data/my_synonyms.syn. That's /usr/share/postgresql/13/tsearch_data/my_synonyms.syn in my Linux installation.
Let it contain (case insensitive by default):
anime anime
Then:
CREATE TEXT SEARCH DICTIONARY my_synonym (
TEMPLATE = synonym,
SYNONYMS = my_synonyms
);
There is a chapter with instructions in the manual. One quote:
A synonym dictionary can be used to overcome linguistic problems, for example, to prevent an English stemmer dictionary from reducing the word “Paris” to “pari”. It is enough to have a Paris paris line in the synonym dictionary and put it before the english_stem dictionary.
Then:
CREATE TEXT SEARCH CONFIGURATION my_english_unaccent (
COPY = english
);
ALTER TEXT SEARCH CONFIGURATION my_english_unaccent
ALTER MAPPING FOR hword, hword_part, word
WITH unaccent, my_synonym, english_stem; -- added my_synonym!
You have to update your column "documentFts" with my_english_unaccent. While you are at it, use a proper lower-case column name like document_fts, and consider a GENERATED column. See:
Computed / calculated / virtual / derived columns in PostgreSQL
Are PostgreSQL column names case-sensitive?
Now, searching for Anime (or ánime, for that matter) won't find animal any more. And searching for animal won't find Anime.
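Why the order in the mapping matters can be illustrated with a toy model of the dictionary chain: each token is offered to the dictionaries in turn, and the first one that recognizes it decides the lexeme. The sketch below is not how Postgres is implemented. The class name is hypothetical, the "stemmer" is a hard-coded table containing only the stems reported in the question, and the unaccent step is left out.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Toy model of a text search dictionary chain (illustration only).
final class DictionaryChainSketch {

    // Stand-in for my_synonym: recognizes only words listed in the .syn file.
    static final Map<String, String> SYNONYMS = Map.of("anime", "anime");

    // Stand-in for english_stem, limited to the stems quoted in the question.
    static final Map<String, String> TOY_STEMS =
            Map.of("anime", "anim", "animal", "anim", "animation", "anim");

    static Optional<String> synonym(String token) {
        return Optional.ofNullable(SYNONYMS.get(token));
    }

    static Optional<String> stem(String token) {
        // A stemmer recognizes every word; unknown words pass through as-is here.
        return Optional.of(TOY_STEMS.getOrDefault(token, token));
    }

    static final List<Function<String, Optional<String>>> WITH_SYNONYM =
            List.of(DictionaryChainSketch::synonym, DictionaryChainSketch::stem);
    static final List<Function<String, Optional<String>>> STEM_ONLY =
            List.of(DictionaryChainSketch::stem);

    // The first dictionary that recognizes the token wins.
    static String lexeme(String token, List<Function<String, Optional<String>>> chain) {
        String lower = token.toLowerCase(Locale.ROOT);
        for (Function<String, Optional<String>> dictionary : chain) {
            Optional<String> hit = dictionary.apply(lower);
            if (hit.isPresent()) {
                return hit.get();
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(lexeme("Anime", STEM_ONLY));
        System.out.println(lexeme("Anime", WITH_SYNONYM));
        System.out.println(lexeme("animal", WITH_SYNONYM));
    }
}
```

With the synonym dictionary in front, "Anime" keeps its own lexeme while "animal" still falls through to the stemmer, so the two no longer collide in the index.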
To make it clearer, I have these fields:
Columntobesearch
aword1 bword1
aword2 bword2
aword3 bword4
Now what I want to do is search using the SQL wildcard, so what I did is this:
%searchbox%
I placed two wildcards, one on each end of my search term, but it only matches the first word in the field.
When I search for 'aword', all of the rows show up, but when I search for 'bword', nothing shows. Please help.
Here is my full code:
$Input=Input::all();
$makethis=Input::flash();
$soptions=Input::get('soptions');
$searchbox=Input::get('searchbox');
$items = Gamefarm::where('roost_hen', '=',Input::get('sex'))
->where($soptions, 'LIKE','%' . $searchbox . '%')
->paginate(12);
If you use MySQL, you can try this:
<?php
$q = Input::get('searchbox');
$results = DB::table('table')
->whereRaw("MATCH(columntobesearch) AGAINST(? IN BOOLEAN MODE)",
array($q)
)->get();
Of course, you need to prepare your table for full-text search in your migration file with:
DB::statement('ALTER TABLE table ADD FULLTEXT search(columntobesearch)');
Anyway, this is not the most scalable or efficient way to do full-text search.
For scalable and reliable full-text search, I strongly recommend looking at Elasticsearch and using a Laravel package for that task.
I searched for the word "alevi" on a full-text indexed column.
But it returns rows that contain "alevi" and also rows that contain "alev", which is the same word without its suffix ("alev-i" in Turkish).
SELECT * FROM MYTABLE where (CONTAINS(MYCOLUMN,'alevi'))
I want to return only rows that contain "alevi".
I tried FREETEXT, N'alevi', and '"alevi"',
but it still returns "alev".
I don't want matches that add or drop a suffix, including plural suffixes.
Thanks in advance.
This is due to the stemming functionality of Full Text search, applied to your language. You can disable it by specifying a neutral language:
SELECT *
FROM MYTABLE
WHERE FREETEXT(MYCOLUMN, 'alevi', LANGUAGE 0x0)
So I have a database with articles in them and the user should be able to search for a keyword they input and the search should find any articles with that word in it.
So, for example, if someone were to search for the word Alzheimer's, I would want it to return articles with the word spelled either way, regardless of the apostrophe, so
Alzheimer's
Alzheimers
should both be returned. At the moment it searches for the exact spelling and won't bring results back if the punctuation differs.
So what I have at the minute for the query is:
private static final String QUERY_FIND_BY_SEARCH_TEXT = "SELECT o FROM EmailArticle o where UPPER(o.headline) LIKE :headline OR UPPER(o.implication) LIKE :implication OR UPPER(o.summary) LIKE :summary";
And the user's input is called 'searchText' which comes from the input box.
public static List<EmailArticle> findAllEmailArticlesByHeadlineOrSummaryOrImplication(String searchText) {
Query query = entityManager().createQuery(QUERY_FIND_BY_SEARCH_TEXT, EmailArticle.class);
String searchTextUpperCase = "%" + searchText.toUpperCase() + "%";
query.setParameter("headline", searchTextUpperCase);
query.setParameter("implication", searchTextUpperCase);
query.setParameter("summary", searchTextUpperCase);
List<EmailArticle> emailArticles = query.getResultList();
return emailArticles;
}
So I would like to bring back all results for alzheimer's regardless of whether there is an apostrophe or not. I think I have given enough information, but if you need more, just say. I'm not really sure where to go with it or how to do it. Is it possible to just replace or remove all punctuation, or just apostrophes, from a user search?
In my view, you should change your query.
You should alter your table and add a FULLTEXT index on your columns (headline, implication, summary).
You should also use MATCH ... AGAINST rather than LIKE and, most importantly, read about the SOUNDEX() function, which is very handy.
All I can give you is a native query example:
SELECT o.* FROM email_article o WHERE MATCH(o.headline, o.implication, o.summary) AGAINST('your-text') OR SOUNDEX(o.headline) LIKE SOUNDEX('your-text') OR SOUNDEX(o.implication) LIKE SOUNDEX('your-text') OR SOUNDEX(o.summary) LIKE SOUNDEX('your-text') ;
It won't give you results like Google search, but it works to some extent. Let me know what you think.
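On the narrower question of ignoring apostrophes in the user's input, a minimal sketch of the Java side could look like the following. The class and method names are hypothetical, and it only normalizes the search text. It assumes the stored text is compared with apostrophes removed as well (for example through the database's REPLACE function in a native query), otherwise a stored "Alzheimer's" still would not match.

```java
import java.util.Locale;

// Hypothetical helper for building the LIKE parameter from user input.
final class SearchTextSketch {

    // Removes the straight apostrophe and the typographic one (U+2019).
    static String stripApostrophes(String text) {
        return text.replace("'", "").replace("\u2019", "");
    }

    // Upper-cases and wraps in % so it can be bound to UPPER(column) LIKE :param.
    static String toLikePattern(String searchText) {
        return "%" + stripApostrophes(searchText).toUpperCase(Locale.ROOT) + "%";
    }

    public static void main(String[] args) {
        System.out.println(toLikePattern("Alzheimer's"));
        System.out.println(toLikePattern("Alzheimers"));
    }
}
```

Both inputs produce the same pattern, so the value can be bound to the headline, implication, and summary parameters in place of searchTextUpperCase.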
I would like to use Lucene to index/search text. The text can contain mistyped words, names, etc. What is the simplest way of getting Lucene to find a document containing
"this is Licene"
when user searches for
"Lucene"?
This is only for a demo app, so we need the simplest solution.
Lucene's fuzzy queries are based on Levenshtein edit distance.
Use a fuzzy query in the QueryParser, with syntax like:
Lucene~0.5
Or create a FuzzyQuery, passing in the maximum number of edits, something like:
Query query = new FuzzyQuery(new Term("field", "lucene"), 1);
Note: FuzzyQuery, in Lucene 4.x, does not support greater edit distances than 2.
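To see why a maximum of 1 edit is enough for the example in the question, here is a plain dynamic-programming sketch of the classic Levenshtein distance. It is an illustration only, with a hypothetical class name: Lucene does not evaluate fuzzy queries this way internally, and this version does not count transpositions as a single edit.

```java
// Illustrative Levenshtein distance (insert, delete, substitute), two-row DP.
final class EditDistanceSketch {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "licene" differs from "lucene" by one substitution (i -> u).
        System.out.println(distance("licene", "lucene"));
    }
}
```

Since the indexed term "licene" is one substitution away from "lucene", a fuzzy query with a maximum of 1 edit is expected to match it.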
Another option you could try is using the Lucene SpellChecker:
http://lucene.apache.org/core/6_4_0/suggest/org/apache/lucene/search/spell/SpellChecker.html
It works out of the box and is very easy to use:
SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
// To index a field of a user index:
spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field));
// To index a file containing words:
spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")));
String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);
By default it uses LevensteinDistance, but you can provide your own custom edit distance.