Dictionary-like search using SQL Server FULL-TEXT - sql

I am trying to create a dictionary for my website.
Searching for 'server' using FREETEXTTable & Rank DESC returns:
1. name server - A program or server that maps human-readable names..
2. server - One who serves; a waitress or waiter.
3. server - A tray for dishes; a salver.
4. ...
'server' is obviously closer to 'server' than 'name server'. How do I fix the ranking?
I cannot just reverse the order to ASC because there are even worse matches.
Top 3 results for 'God' are 'act of God', 'Lamb of God', 'Le God'..
Edit: Sorry for any confusion. 'name server', 'server', 'server', ... are in a single column called 'word'; this is the column that is queried with full-text search. The definitions are in the next column, 'definition', and are returned as query results.

I think you can use UNION to fix the ordering of your results, like this:
select 0 as sort_order, t.* from your_table_name t where col_name = 'server'
union all
select 1, t.* from your_table_name t where col_name like '%server%' and col_name <> 'server'
order by sort_order, col_name
This returns the exact matches first, followed by the partial matches. The sort_order column is needed because a UNION on its own does not guarantee the order of its branches.
Clarification: by col_name I mean the column that holds your words.
Say your table structure is
Dictionary
(
c_word,
c_definition,
c_synonyms
)
then the query becomes
select 0 as sort_order, d.* from Dictionary d where c_word = 'server'
union all
select 1, d.* from Dictionary d where c_word like '%server%' and c_word <> 'server'
order by sort_order, c_word
so that the rows where c_word exactly matches 'server' come first, followed by the partial matches.
For a dynamic query, replace 'server' with the variable that holds the requested search keyword.

I used the PATINDEX function to do this at one point. Something like the following:
SELECT
Word,
Definition
FROM
FREETEXTTABLE(Dictionary, Word, @search, 20) AS Matches
INNER JOIN Dictionary ON
Matches.[KEY] = Dictionary.ID
ORDER BY
CASE PATINDEX('%' + @search + '%', Word)
WHEN 0 THEN 1000
ELSE PATINDEX('%' + @search + '%', Word)
END
It doesn't perform too badly, since the full-text index is used first to get a smaller result set (at most 20 rows in this case). PATINDEX returns the starting position of a pattern within an expression, or 0 if the pattern is not found. That can happen if you're also searching definitions, if your search matches on a synonym or stemmed word (e.g. you search for "took" so "take" is returned), or if your search involves multiple words. The CASE expression sorts those results to the end.
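The ordering idea can be tried outside SQL Server. The following is only a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in: the whole table plays the role of the full-text candidate set, instr() plays the role of PATINDEX, and the tie-break on word length is my own addition, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dictionary (ID INTEGER PRIMARY KEY, Word TEXT, Definition TEXT)"
)
conn.executemany(
    "INSERT INTO Dictionary (Word, Definition) VALUES (?, ?)",
    [
        ("name server", "A program or server that maps human-readable names."),
        ("server", "One who serves; a waitress or waiter."),
        ("server", "A tray for dishes; a salver."),
        ("take", "To get into one's hands."),
    ],
)


def ranked(search):
    # instr() returns 0 when the term is absent, so 0 is mapped to a large
    # value to push those rows to the end. Shorter words break ties
    # (an assumption added for this sketch).
    sql = """
        SELECT Word, Definition
        FROM Dictionary
        ORDER BY
            CASE instr(lower(Word), lower(:s))
                WHEN 0 THEN 1000
                ELSE instr(lower(Word), lower(:s))
            END,
            length(Word),
            ID
    """
    return conn.execute(sql, {"s": search}).fetchall()


words = [word for word, _ in ranked("server")]
print(words)
```

Both 'server' rows sort ahead of 'name server', and the row that does not contain the term at all ends up last.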

Related

Full Text Search Using Multiple Partial Words

I have a sql server database that has medical descriptions in it. I've created a full text index on it, but I'm still figuring out how this works.
The easiest example to give is if there is a description of Hypertensive heart disease
Now they would like to be able to type hyp hea as a search term and have it return that.
So from what I've read it seems like my query needs to be something like
DECLARE @Term VARCHAR(100)
SET @Term = 'NEAR(''Hyper*'',''hea*'')'
SELECT * FROM Icd10Codes WHERE CONTAINS(Description, @Term)
If I take the wild card out for Hypertensive and heart, and type out the full words it works, but adding the wild card in returns nothing.
If it makes any difference I'm using Sql Server 2017
So it was a weird syntax issue that didn't cause an error, but stopped the search from working.
I changed it to
SELECT * FROM Icd10Codes where CONTAINS(description, '"hyper*" NEAR "hea*"')
The key here is that I needed double quotes (") and not two single quotes. I had assumed it was two single quotes, the first escaping the second, but it is actually a double-quote character. The above query returns the results exactly as expected.
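If the term comes from user input such as hyp hea, the search condition has to be assembled with those double quotes in place. A minimal sketch of that step follows; build_contains_term is a hypothetical helper name, and the result is meant to be passed as a parameter to CONTAINS rather than concatenated into the statement.

```python
def build_contains_term(user_input):
    """Turn 'hyp hea' into '"hyp*" NEAR "hea*"' for a CONTAINS search condition."""
    # Drop double quotes so the input cannot break out of a quoted prefix term.
    words = user_input.replace('"', " ").split()
    # Each word becomes a double-quoted prefix term; terms are joined with NEAR.
    return " NEAR ".join('"{}*"'.format(w) for w in words)


term = build_contains_term("hyp hea")
print(term)  # "hyp*" NEAR "hea*"
```

An empty input produces an empty string, which CONTAINS would reject, so the caller would need to handle that case separately.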
this will work:
SELECT * FROM Icd10Codes where SOUNDEX(description)=soundex('Hyp');
SELECT * FROM Icd10Codes where DIFFERENCE(description,'hyp hea')>=2;
You could try a like statement. You can find a thorough explanation here.
Like so:
SELECT * FROM Icd10Codes WHERE Description LIKE '%hyp%hea%';
And then instead of putting the String in there just use a variable.
If you need to search for separated partial words, as in an array of search terms, it gets a bit tricky, since you need to dynamically build the SQL statement.
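Building the statement dynamically could look roughly like this. It is only a sketch, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server; build_like_query is a hypothetical helper, and the table and column names are treated as trusted identifiers while the search terms are bound as parameters.

```python
import sqlite3


def build_like_query(table, column, terms):
    # One "column LIKE ?" clause per term, joined with AND.
    # Only the terms are user input; they are passed as bound parameters.
    where = " AND ".join("{} LIKE ?".format(column) for _ in terms)
    sql = "SELECT {} FROM {} WHERE {}".format(column, table, where)
    params = ["%{}%".format(t) for t in terms]
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Icd10Codes (Description TEXT)")
conn.executemany(
    "INSERT INTO Icd10Codes VALUES (?)",
    [
        ("Hypertensive heart disease",),
        ("Hypertensive renal disease",),
        ("Heart failure",),
    ],
)

sql, params = build_like_query("Icd10Codes", "Description", "hyp hea".split())
rows = [r[0] for r in conn.execute(sql, params)]
print(sql)
print(rows)
```

The helper expects at least one term; with an empty list the WHERE clause would be incomplete.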
MSSQL provides a few features for full text search. You can find those here. One of them is the CONTAINS keyword:
SELECT column FROM table WHERE CONTAINS (column , 'string1 string2 string3');
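For me, this had more mileage (note that this is PostgreSQL via Supabase, not SQL Server).
Create a generated column that combines the fields into a single full-text search vector,
so that company, broker, first name and last name are all searchable.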
ALTER TABLE profiles ADD COLUMN fts tsvector generated always as (to_tsvector('english', coalesce(profiles.company, '') || ' ' || coalesce(profiles.broker, '') || ' ' || coalesce(profiles.firstname, '') || ' ' || coalesce(profiles.lastname, '') || ' ' )) stored;
let { data, error } = await supabase.from('profiles')
.select()
.textSearch('fts',str)

Find substring in string

Is it possible to check if a specific substring which is in SQL Server column, is contained in a user provided string?
Example :
SELECT * FROM Table WHERE 'random words to check, which are in a string' CONTAINS Column
From my understanding, CONTAINS can't do such kind of search.
EDIT :
I have a fully indexed text and would like to search (by the fastest method) if a string provided by me contains words that are present in a column.
You can use LIKE:
SELECT * FROM YourTable t
WHERE 'random words ....' LIKE '%' + t.column + '%'
Or
SELECT * FROM YourTable t
WHERE t.column LIKE '%random words ....%'
It depends on what you mean: the first query selects the records whose column value is contained in the provided string; the second does the opposite and selects the records whose column contains the provided string.
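The two directions are easy to mix up, so here is a small sketch that runs both. It assumes SQLite through Python's sqlite3 (which concatenates with || where SQL Server uses +); the column name col and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (col TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?)",
    [("words",), ("absent",), ("some random words to check here",)],
)
provided = "random words to check"

# Direction 1: the column value occurs inside the provided string.
column_in_string = [
    r[0]
    for r in conn.execute(
        "SELECT col FROM YourTable WHERE ? LIKE '%' || col || '%' ORDER BY col",
        (provided,),
    )
]

# Direction 2: the provided string occurs inside the column value.
string_in_column = [
    r[0]
    for r in conn.execute(
        "SELECT col FROM YourTable WHERE col LIKE '%' || ? || '%' ORDER BY col",
        (provided,),
    )
]

print(column_in_string)
print(string_in_column)
```

In the first direction, any % or _ stored in the column acts as a wildcard, which may or may not be what you want.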
Just use the LIKE syntax together with % around the string you are looking for:
SELECT
*
FROM
table
WHERE
Column LIKE '%some random string%'
This will return all rows in the table table in which the column Column contains the text "some random string".
1) If you want rows where the column ends with the string, put % in front of it in your WHERE clause:
WHERE
Column LIKE '%some random string'
2) If you want rows where the column contains the string anywhere, use
WHERE
Column LIKE '%some random string%'
3) If you want rows where the column starts with the string, use
WHERE
Column LIKE 'some random string%'

Get total number of users where usernames have different case

I have a SQL table where usernames appear in different cases, for example "ACCOUNTS\Ninja.Developer" or "ACCOUNTS\ninja.developer".
I want to find how many records have a username in which both the first and the last name are capitalized. How can I use a regex to find the total?
x table
User
"ACCOUNTS\James.McAvoy"
"ACCOUNTS\michael.fassbender"
"ACCOUNTS\nicholas.hoult"
"ACCOUNTS\Oscar.Isaac"
Do you want something like this?
select count(*)
from t
where name rlike 'ACCOUNTS\[A-Z][a-z0-9]*[.][A-Z][a-z0-9]*'
Of course, different databases implement regular expressions differently, so the actual comparator may not be rlike.
In SQL Server, you can do:
select count(*)
from t
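where name like 'ACCOUNTS\[A-Z]%.[A-Z]%';
You need to make sure the comparison uses a case-sensitive collation. With many collations a range like [A-Z] may also match lowercase letters, so a binary collation (for example COLLATE Latin1_General_BIN) is the safer choice.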
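To see the effect of a case-sensitive pattern match without touching collations, here is a rough sketch that assumes SQLite (via Python's sqlite3), whose GLOB operator is case-sensitive. GLOB uses * instead of %, so this illustrates the idea rather than the SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [
        (r"ACCOUNTS\James.McAvoy",),
        (r"ACCOUNTS\michael.fassbender",),
        (r"ACCOUNTS\nicholas.hoult",),
        (r"ACCOUNTS\Oscar.Isaac",),
    ],
)

# GLOB is case-sensitive: * matches any run of characters and
# [A-Z] matches exactly one uppercase ASCII letter.
pattern = r"ACCOUNTS\[A-Z]*.[A-Z]*"
(capitalized,) = conn.execute(
    "SELECT count(*) FROM t WHERE name GLOB ?", (pattern,)
).fetchone()
print(capitalized)
```

With the four sample rows from the question, only the James.McAvoy and Oscar.Isaac rows satisfy the pattern.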
In most cases in MS SQL string collation is case insensitive so we need some trick. Here is an example:
declare @accts table(acct varchar(100))
--sample data
insert @accts values
('ACCOUNTS\James.McAvoy'),
('ACCOUNTS\michael.fassbender'),
('ACCOUNTS\nicholas.hoult'),
('ACCOUNTS\Oscar.Isaac')
;with accts as (
select
--cleanup and split values
left(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)-1) frst,
substring(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)+1,100) last --everything after the first dot
from @accts
)
,groups as (--add comparison columns
select frst, last,
case when CAST(frst as varbinary(max)) = CAST(lower(frst) as varbinary(max)) then 'lower' else 'Upper' end frstCase, --circumvent the case-insensitive collation
case when CAST(last as varbinary(max)) = CAST(lower(last) as varbinary(max)) then 'lower' else 'Upper' end lastCase
from accts
)
--and gather fruit
select frstCase, lastCase, count(frst) cnt
from groups
group by frstCase,lastCase
Your question is a little vague but;
You might be looking for the DISTINCT command.
REF
I don't think you need regex.
Maybe do something like:
Get distinct names from Table X as Table A
Use inputs table A as where clause on Table X
count
union
I hope this helps,
Rhys
Given your example set you can use a combination of techniques. First if the user name always begins with "ACCOUNTS\" then you can use substr to select the characters that start after the "\" character.
For the first name:
Then you can use a regex function to see if it matches against [A-Z] or [a-z] assuming your username must start with an alpha character.
For the last name:
Use the instr function on the substr and search for the character '.' and again apply the regex function to match against [A-Z] or [a-z] to see if the last name starts with an upper or a lower character.
To total:
Select all matches where both first and last match against upper and do a count. Repeat for the lower matches and you'll have both totals.
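The steps above can be sketched outside the database as well. This is a minimal illustration in Python, assuming every username has the "ACCOUNTS\" prefix and a dot between first and last name; classify is a hypothetical helper name.

```python
import re
from collections import Counter

users = [
    r"ACCOUNTS\James.McAvoy",
    r"ACCOUNTS\michael.fassbender",
    r"ACCOUNTS\nicholas.hoult",
    r"ACCOUNTS\Oscar.Isaac",
]


def case_of(name):
    # "upper" if the name starts with an uppercase ASCII letter, else "lower".
    return "upper" if re.match(r"[A-Z]", name) else "lower"


def classify(user):
    # Take what follows the backslash, then split at the first dot.
    first, _, last = user.split("\\", 1)[1].partition(".")
    return case_of(first), case_of(last)


totals = Counter(classify(u) for u in users)
print(totals[("upper", "upper")], totals[("lower", "lower")])
```

For the sample data this gives one total for the fully capitalized names and one for the fully lowercase names; mixed cases would show up under their own keys.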

Find similar entries in SQL column and rank by frequency

I have a column of 10k URIs in my SQLite database. I would like to identify which of these URIs are subdomains of the same website.
For instance, for the given set...
1. daiquiri.rum.cu
2. mojito.rum.cu
3. cubalibre.rum.cu
4. americano.campari.it
5. negroni.campari.it
6. hemingway.com
... I would like to run a query that returns:
Website | Occurrences
----------------------------
rum.cu | 3
campari.it | 2
hemingway.com | 1
That is, the domain names / patterns that were matched, ranked by the number of times they were found in the database.
The heuristic I would use is: for every URI with 3+ domains, replace the first domain with '%' and execute the pseudoquery: COUNT(uris from website where uris LIKE '%.remainderofmyuri').
Note that I don't care much about execution speed (in fact, not at all). The number of entries is within the range of 10k-100k.
The only problem is to find the domain. To find an algorithm, imagine your URLs with an additional dot in front (like '.negroni.campari.it' and '.hemingway.com'). You see then that the domain is always the string that comes after the second dot from the right. All we have to do is look for that occurrence and strip the part of the string before it. Unfortunately, however, SQLite's string functions are rather poor. There is no function that gives you the second occurrence of a dot, not even when counting from the left. So the algorithm is great for most DBMSs, but it isn't for SQLite. We need another approach. (I am writing this anyhow, to show how to usually approach the problem.)
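Outside the database, the "second dot from the right" idea is short to write down. The following is only a sketch in Python; registered_domain is a hypothetical helper, and it assumes that a domain is always the last two dot-separated labels, just like the description above.

```python
from collections import Counter

uris = [
    "daiquiri.rum.cu",
    "mojito.rum.cu",
    "cubalibre.rum.cu",
    "americano.campari.it",
    "negroni.campari.it",
    "hemingway.com",
]


def registered_domain(uri):
    # Keep what follows the second dot from the right; strings with
    # fewer than two dots are returned unchanged.
    return ".".join(uri.split(".")[-2:])


counts = Counter(registered_domain(u) for u in uris)
ranking = counts.most_common()
for site, occurrences in ranking:
    print(site, occurrences)
```

The same assumption means that suffixes with two labels, such as co.uk, would be grouped incorrectly.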
Here is the SQLite solution: The difference between a domain and the subdomains is that in the domain there is exactly one dot, whereas a subdomain has at least two. So when there is more than one dot, we must remove the first part including the first dot in order to get to the domain. Moreover we want this to work even with sub domains like abc.def.geh.ijk.com, so we must do this recursively.
with recursive cte(uri) as
(
select uri from uris
union all
select substr(uri, instr(uri, '.') + 1) as uri from cte where instr(uri, '.') > 0
)
select uri, count(*)
from cte
where length(uri) = length(replace(uri,'.','')) + 1 -- domains only
group by uri
order by count(*) desc;
Here we generate 'daiquiri.rum.cu' and 'rum.cu' and 'cu' from 'daiquiri.rum.cu' etc. So for every uri we get the domain (here 'rum.cu') and some other strings. At last we filter with LENGTH to get those strings that have exactly one dot - the domains. The rest is group by and count.
Here is the SQL fiddle: http://sqlfiddle.com/#!5/c1f35/37.
select x.site, count(*)
from mytable a
inner join
(
select 'rum.cu' as site
union all select 'campari.it'
union all select 'hemingway.com'
) x on a.url like '%' + x.site + '%'
group by x.site -- EDIT I missed out the GROUP BY on the first go - sorry!
(This is how I'd do it in SQL-Server; not sure how SQLite differs in syntax.)
'mytable' is your table, which has a column called url containing 'mojito.rum.cu' etc. I haven't put the '%.' in the LIKE because that would miss out hemingway.com. However, you could get around that by using this line instead:
) x on a.url like '%.' + x.site + '%' or a.url = x.site
You may not need the final + '%'; I put it in to catch URLs like 'hemingway.com/some-page.html'. If you don't have URLs like that you can skip it.
EDIT for dynamic names
select x.site, count(*)
from mytable a
inner join
(
select distinct substr(url, instr(url, '.') + 1) as site
from mytable
where url like '%.%.%'
union
select distinct url
from mytable
where url like '%.%' and url not like '%.%.%'
) x on a.url like '%' + x.site + '%'
group by x.site
Something like that should do it. I haven't tested that the INSTR() function is correct. You may need to add or subtract 1 from the offset it generates when you test it. It may not be the fastest query but it should work.

sql select a record containing a phrase

I want to query a record containing a phrase.
for example: I want the search to return the record: 'The needle in the haystack' with the search phrase 'needle haystack'
The query will work if I just have 'needle' or just 'haystack' using like% in the where clause.
Is there a way to search with the phrase 'needle haystack'?
SELECT * FROM table WHERE phrase LIKE '%needle%' AND phrase LIKE '%haystack%'
Replace phrase with LOWER(phrase) if you want the search to be case-insensitive (depends on the DB engine and other things, though).
You can also try this:
select * from Suppliers where patindex(REPLACE('%' + 'YOUR SEARCH STRING' + '%',' ','%'),CompanyName) > 0
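The two answers behave differently when the words appear in another order. Here is a small sketch of both, assuming SQLite through Python's sqlite3 and a made-up phrases table: one LIKE per word joined with AND matches the words in any order, while a single pattern with the spaces replaced by % only matches them in the order they were typed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrases (phrase TEXT)")
conn.executemany(
    "INSERT INTO phrases VALUES (?)",
    [
        ("The needle in the haystack",),
        ("A haystack hides the needle",),
        ("Just a needle",),
    ],
)
search = "needle haystack"
words = search.split()

# One LIKE per word, joined with AND: the words may appear in any order.
where = " AND ".join("phrase LIKE ?" for _ in words)
any_order = [
    r[0]
    for r in conn.execute(
        "SELECT phrase FROM phrases WHERE " + where + " ORDER BY rowid",
        ["%" + w + "%" for w in words],
    )
]

# Single pattern '%needle%haystack%': the words must appear in this order.
in_order = [
    r[0]
    for r in conn.execute(
        "SELECT phrase FROM phrases WHERE phrase LIKE ? ORDER BY rowid",
        ("%" + "%".join(words) + "%",),
    )
]

print(any_order)
print(in_order)
```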