SQL Server CONTAINS search not giving the expected result - sql

select * from table1 where contains(searchWord,"*comfort*")
I want a result such as Uncomfortable, with the search word in the middle, but it only shows comfort xyz.

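You do not need the CONTAINS function here. In SQL Server, CONTAINS searches for precise or fuzzy (less precise) matches to single words and phrases, words within a certain distance of one another, or weighted matches. Its wildcard only supports prefix terms such as "comfort*", so it cannot find comfort in the middle of Uncomfortable.
A simple LIKE predicate gives the required result: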
select * from table1 where searchWord like '%comfort%'
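To see the substring behaviour of LIKE without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, and the sample rows are made up for illustration; the LIKE '%comfort%' predicate itself is the one from the answer.

```python
import sqlite3

# In-memory stand-in for table1 (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (searchWord TEXT)")
conn.executemany(
    "INSERT INTO table1 (searchWord) VALUES (?)",
    [("Uncomfortable",), ("comfort xyz",), ("unrelated",)],
)

# %comfort% matches the substring anywhere in the value,
# including in the middle of a longer word.
rows = conn.execute(
    "SELECT searchWord FROM table1 "
    "WHERE searchWord LIKE '%comfort%' "
    "ORDER BY searchWord"
).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

Both Uncomfortable and comfort xyz come back, while the unrelated row is filtered out. Note that a leading % usually prevents an ordinary index from being used, so this can be slow on large tables.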

Related

Parsing location from a search query in postgresql

I have a table of location data that is stored in json format with an attributes column that contains data as below:-
{
"name" : "Common name of a place or a postcode",
"other_name":"Any aliases",
"country": "country"
}
This is indexed as follows:-
CREATE INDEX location_jsonb_ts_vector
ON location
USING gin (jsonb_to_tsvector('simple'::regconfig, attributes,'["string","numeric"]'::jsonb));
I can search this for a location using the query:-
SELECT *
FROM location
WHERE jsonb_to_tsvector('simple'::regconfig, attributes, '["string", "numeric"]'::jsonb) @@ plainto_tsquery('place name')
This works well if just using place names. But I want to search using more complex text strings such as:-
'coffee shops with wifi near charing cross'
'all restaurants within 10 miles of swindon centre'
'london nightlife'
I want to get the location found first and then strip it from the search text and go looking for the items in other tables using my location record to narrow down the scope.
This does not work with my current search mechanism as the intent and requirement pollute the text search vector and can cause odd results. I know this is a NLP problem and needs proper parsing of the search string, but this is for a small proof of concept and needs to work entirely in postgres via SQL or PL/PGSQL.
How can I modify my search to get better matches? I've tried splitting the text into keywords and looking for them individually, but that risks not bringing back the right results unless the keywords are combined. For example, "Kings Cross" will bring back "Kings".
I've come up with a cheap and cheerful solution:-
WITH tsv AS (
SELECT to_tsquery('english', 'football | matches | in | swindon') AS search_vector,
'football matches in swindon' AS search_text
)
SELECT * FROM
(
SELECT attributes,
position(lower(ATTRIBUTES->>'name1') IN lower(search_text)) AS name1_position
FROM location,tsv
WHERE jsonb_to_tsvector('simple'::regconfig, attributes, '["string", "numeric"]'::jsonb) @@ search_vector
) loc
ORDER BY name1_position DESC
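The idea of finding the location first and then stripping it from the search text can be sketched outside the database as follows. This is only an illustration of the logic, not a PL/pgSQL solution: the function name extract_location and the list of names are hypothetical, and the longest-name-first rule is my assumption for avoiding the "Kings" versus "Kings Cross" problem.

```python
import re

def extract_location(search_text, location_names):
    """Return (location, remaining_text) for the longest known name
    found as whole words, or (None, search_text) if none is found."""
    text = search_text.lower()
    # Try longer names first so "Kings Cross" wins over "Kings".
    for name in sorted(location_names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        match = re.search(pattern, text)
        if match:
            # Cut the location out and normalise the leftover whitespace.
            remaining = (text[:match.start()] + text[match.end():]).split()
            return name, " ".join(remaining)
    return None, search_text

names = ["Kings", "Kings Cross", "Swindon", "Charing Cross"]
print(extract_location("coffee shops with wifi near charing cross", names))
```

The remaining text ("coffee shops with wifi near") is what would then be used to search the other tables, scoped by the matched location record.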

SQL query find few strings in different columns in a table row (restrictive)

I have a table like this one (in SQL Server):
field_name | field_descriptor | tag1 | tag2  | tag3 | tag4 | tag5
house      | your home        | home | house | null | null | null
car        | first car        | car  | wheel | null | null | null
...        | ...              | ...  | ...   | ...  | ...  | ...
I'm developing a wiki with a search bar, which should be able to handle a query with more than one search string. When a user enters a second string (separated by a space), the query should return only the results that match both strings (if any exist) in any column, and likewise for a three-string search.
This is easy to do for one string with a simple SELECT with ORs.
I tried doing it in the frontend in JS with libraries like match-sorter, but that is heavy for a table with more than 100,000 rows, and there will be more in the future.
I thought the query should do the heavy work, but maybe there is no simple way of doing it.
Thanks in advance!
I tried doing the heavy work with all results in the frontend, using filtering and libraries like match-sorter. It works, but it takes several seconds and blocks the frontend.
I tried to create a simple OR/AND query, but the number of possibilities with 3 search strings (there could be 1, 2 or 3), each matching any column, is overwhelming.
You can use STRING_SPLIT to get a separate row per search word from the search words string. Then only select rows where all search words have a match.
The query should look like this:
select *
from mytable t
where exists
(
select null
from (select value from string_split(@search, ' ')) search
having min(case when search.value in (t.tag1, t.tag2, t.tag3, t.tag4, t.tag5) then 1 else 0 end) = 1
);
Unfortunately, SQL Server seems to have a flaw (or even a bug) here and reports:
Msg 8124 Level 16 State 1 Line 8
Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.
Demo: https://dbfiddle.uk/kNL1PVOZ
I don't have more time at hand right now, so you may use this query as a starting point to get the final query.
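One way to express "every search word matches some tag" without an aggregate over an outer reference is a double negation: keep a row if no search word exists that is missing from its tags. The sketch below runs against SQLite through Python's sqlite3 module purely so it can be executed anywhere; the search_words temp table stands in for STRING_SPLIT, the sample rows are hypothetical, and I have not verified the equivalent statement on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (field_name TEXT, tag1 TEXT, tag2 TEXT, "
    "tag3 TEXT, tag4 TEXT, tag5 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("house", "home", "house", None, None, None),
        ("car", "car", "wheel", None, None, None),
    ],
)

def search(conn, search_text):
    """Return field_name for rows whose tags contain every search word."""
    words = search_text.split()
    # Stand-in for STRING_SPLIT: one row per search word.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_words (word TEXT)")
    conn.execute("DELETE FROM search_words")
    conn.executemany("INSERT INTO search_words VALUES (?)", [(w,) for w in words])
    # COALESCE keeps NULL tags from turning NOT IN into UNKNOWN,
    # which would silently let non-matching rows through.
    sql = """
        SELECT t.field_name
        FROM mytable t
        WHERE NOT EXISTS (
            SELECT 1
            FROM search_words s
            WHERE s.word NOT IN (
                COALESCE(t.tag1, ''), COALESCE(t.tag2, ''),
                COALESCE(t.tag3, ''), COALESCE(t.tag4, ''),
                COALESCE(t.tag5, '')
            )
        )
        ORDER BY t.field_name
    """
    return [row[0] for row in conn.execute(sql)]

print(search(conn, "car wheel"))
```

This matches whole tag values exactly; partial matches would need LIKE inside the inner condition. An empty search string returns every row, because there is no word that could fail to match.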

PostgreSQL - Query keyword patterns in columns in table

We all know that in SQL we can query a column (let's say, column "breeds") for a certain word like "dog" via a query like this:
select breeds
from myStackOverflowDBTable
where breeds = 'dog'
However, say I had many more columns with much more data, say millions of records, and I did not want to find a word, but rather the most common keyword pattern or wildcard expression, a query like this:
SELECT *
FROM myStackOverflowDBTable
WHERE address LIKE '%alb%'
Is there an efficient way to find these 'patterns' inside the columns using SQL? I need to find the most common substring, so to speak. Per the query above, say the wildcard string "alb" appeared most often in a "location" column that had words like Albany, Albuquerque and Alabama. Querying for those words directly would obviously yield 0 results, but querying on the wildcard pattern would yield many. I want to find the most frequently repeated wildcard/keyword pattern/regex/substring (however you want to define it) for a given column. Is there an easy way to do this without running a million test queries by hand?
Well, if you want to find three character patterns, you could extract all 3-character patterns, aggregate and count:
select substr(t.address, gs.i, 3) as ngram_3, count(*)
from t cross join lateral
-- start positions 1 .. length - 2 cover every full 3-character window
generate_series(1, length(address) - 2, 1) gs(i)
group by ngram_3
order by count(*) desc
limit 100;
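The same extract, aggregate and count idea can be sketched in Python to check what the query is expected to produce. The function name top_ngrams and the sample addresses are hypothetical; unlike the SQL above, this version lowercases the text first, which is my own assumption so that "Alb" and "alb" count as one pattern.

```python
from collections import Counter

def top_ngrams(values, n=3, limit=100):
    """Count every n-character window across all values and
    return the most frequent ones as (ngram, count) pairs."""
    counts = Counter()
    for value in values:
        text = value.lower()
        # Windows start at 0 .. len - n, i.e. len - n + 1 positions.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts.most_common(limit)

addresses = ["Albany", "Albuquerque", "Alabama"]
print(top_ngrams(addresses, n=3, limit=5))
```

Counting all windows of a fixed length is linear in the total text size, so it scales far better than probing candidate patterns one query at a time.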

Oracle 'Contains' / 'Group' function return incorrect value

I have this query:
SELECT last_name, SCORE(1)
FROM Employees
WHERE CONTAINS(last_name, '%sul%', 1) > 0
It produces output below:
The question is:
Why does SCORE(1) produce 9? As I recall, the CONTAINS function returns the number of occurrences of search_string (in this case '%sul%').
I expect the output should be:
Sullivan 1
Sully 1
But when I try this syntax:
SELECT last_name, SCORE(1)
FROM Employees
WHERE CONTAINS(last_name, 'sul', 1) >0;
It returns 0 rows selected.
And can someone please explain to me what the third parameter is for?
Thanks in advance :)
The reason your second query returns no rows is that you are looking for the word sul in your search. CONTAINS will not do a pattern search unless you tell it to; it searches for the words you specify in the second parameter. To look for patterns, you have to use wildcards, as you did in your first example.
Now, coming to the third parameter in CONTAINS: it is a label, and is just used to label the SCORE operator. You should use the third parameter when you use SCORE in your SELECT list. Its importance is clearer when there are multiple SCORE operators.
Quoting directly from the documentation:
label
Specify a number to identify the score produced by the query.
Use this number to identify the CONTAINS clause which returns this
score.
Example
Single CONTAINS
When the SCORE operator is called (for example, in a SELECT clause),
the CONTAINS clause must reference the score label value as in the
following example:
SELECT SCORE(1), title from newsindex
WHERE CONTAINS(text, 'oracle', 1) > 0 ORDER BY SCORE(1) DESC;
Multiple CONTAINS
Assume that a news database stores and indexes the title and body of
news articles separately. The following query returns all the
documents that include the words Oracle in their title and java in
their body. The articles are sorted by the scores for the first
CONTAINS (Oracle) and then by the scores for the second CONTAINS
(java).
SELECT title, body, SCORE(10), SCORE(20) FROM news WHERE CONTAINS
(news.title, 'Oracle', 10) > 0 OR CONTAINS (news.body, 'java', 20) > 0
ORDER BY SCORE(10), SCORE(20);
The Oracle Text Scoring Algorithm does not score by simply counting the number of occurrences. It uses an inverse frequency algorithm based on Salton's formula.
Inverse frequency scoring assumes that frequently occurring terms in a document set are noise terms, and so these terms are scored lower. For a document to score high, the query term must occur frequently in the document but infrequently in the document set as a whole.
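As far as I recall from the Oracle Text Reference, the formula is 3f(1 + log(N/n)), where f is the number of times the term occurs in the document, N is the total number of rows, and n is the number of rows containing the term, with the result capped at 100. Treat the constants, the base-10 logarithm and the capping in this sketch as assumptions rather than a statement of what Oracle actually computes; the function name salton_score is hypothetical.

```python
import math

def salton_score(term_freq, total_docs, docs_with_term):
    """Inverse-frequency score: grows with occurrences in the document,
    shrinks as more documents in the set contain the term.
    Formula and base-10 log are assumptions taken from memory."""
    if term_freq <= 0 or docs_with_term <= 0:
        return 0.0
    raw = 3 * term_freq * (1 + math.log10(total_docs / docs_with_term))
    return min(raw, 100.0)

# One occurrence of a term found in 1 of 100 rows.
print(salton_score(1, 100, 1))
# Same single occurrence, but the term is in half of the rows.
print(salton_score(1, 100, 50))
```

Under these assumptions a single occurrence can score well above 1, and the same occurrence scores lower when the term is common across the set, which is why the score is not a simple occurrence count.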
Think of a Google search. If you search for the term Oracle, you will not (directly) find any result that helps explain your scoring question, so we can consider this term "noise" relative to your expectations. But if you search for the term Oracle Text Scoring Algorithm, you will find your answer in the first Google result.
As for your other questions, I think @Incognito has already answered them well.

search criteria difference between Like vs Contains() in oracle

I created a table with two columns and inserted two rows.
id name
1 narsi reddy
2 narei sia
One column is a plain NUMBER type and the other is a CLOB, so I decided to put an index on it. I then queried it using CONTAINS.
query:
select * from emp where contains(name,'%a%e%')>0
2 narei sia
I expected both rows to come back, but only one did. If I run the same pattern with LIKE, I get what I wanted.
query:
select * from emp where name like '%a%e%'
ID NAME
1 (CLOB) narsi reddy
2 (CLOB) narei sia
2 rows selected
Finally I understood that LIKE searches the whole document or paragraph, but CONTAINS looks within individual words.
So how can I get the required output?
LIKE and CONTAINS are fundamentally different methods for searching.
LIKE is a very simple string pattern matcher - it recognises two wildcards (%) and (_) which match zero-or-more, or exactly-one, character respectively. In your case, %a%e% matches two records in your table - it looks for zero or more characters followed by a, followed by zero or more characters followed by e, followed by zero or more characters. It is also very simplistic in its return value: it either returns "matched" or "not matched" - no shades of grey.
CONTAINS is a powerful search tool that uses a context index, which builds a kind of word tree which can be searched using the CONTAINS search syntax. It can be used to search for a single word, a combination of words, and has a rich syntax of its own, such as boolean operators (AND, NEAR, ACCUM). It is also more powerful in that instead of returning a simple "matched" or "not matched", it returns a "score", which can be used to rank results in order of relevance; e.g. CONTAINS(col, 'dog NEAR cat') will return a higher score for a document where those two words are both found close together.
I believe that your CONTAINS query is matching 'narei sia' because the pattern '%a%e%' matches the word 'narei'. It does not match against 'narsi reddy' because neither word, taken individually, matches the pattern.
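The difference between matching the whole string and matching word by word can be modelled roughly in Python. This is only an approximation for illustration: real Oracle Text tokenization and wildcard expansion are more involved, and the helper names below are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a LIKE-style pattern (% and _) into an anchored regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE | re.DOTALL)

def matches_whole_string(text, pattern):
    """LIKE-style: the pattern is applied to the entire value."""
    return like_to_regex(pattern).match(text) is not None

def matches_any_word(text, pattern):
    """Rough model of word-level matching: the pattern must fit
    a single whitespace-separated word."""
    regex = like_to_regex(pattern)
    return any(regex.match(word) for word in text.split())

for name in ["narsi reddy", "narei sia"]:
    print(name, matches_whole_string(name, "%a%e%"), matches_any_word(name, "%a%e%"))
```

In this model 'narsi reddy' matches only at the whole-string level, because its 'a' and 'e' sit in different words, while 'narei sia' matches both ways.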
I assume you want to use CONTAINS instead of LIKE for performance reasons. I am not by any means an expert on CONTAINS query expressions, but I don't see a simple way to do the exact search you want, since you are looking for letters that can be in the same word or different words, but must occur in a given order. I think it may be best to do a combination of the two techniques:
WHERE CONTAINS(name,'%a% AND %e%') > 0
AND name LIKE '%a%e%'
I think this would allow the text index to be used to find candidate matches (anything which has at least one word containing 'a' and at least one word containing 'e'). These would then be filtered by the LIKE condition, enforcing the requirement that 'a' precede 'e' in the string.