Find exact match using full-text search - sql

Using the Sql Server 2008 how can you actually find an exact string match using full-text search. I'm having a real hard time with this and I just couldn't find a satisfactory solution anywhere online.
For example, if I'm searching for the string "Bojan Skrchevski" I want the first result to be exactly that.
So far I've tried formatting the string like: "Bojan* NEAR Skrchevski*" and call CONTAINSTABLE to get results, but this string is formatted to return more results as Bojana and Bojananana etc. I also tried to ORDER BY RANK, but still no success.
Furthermore, in my string I have a number sequence like: "3 1 7", but with the current formatting it also returns "7 1 3" etc.
Example:
DECLARE #var varchar(4000);
SET #var = '"Oxford*" NEAR 24 NEAR 7 NEAR 5 NEAR "London*"'
SELECT [Key] FROM CONTAINSTABLE(dbo.[MyTable], [MyField], #var);
I want to be able to get the exact ordering. Not to get "Oxford 7 24 5 London" as a result.
How do I format the string to accomplish this correctly?

There's 2 options
1)
This will get all items which have Mountain in their name
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain');
GO
2)
This will get all items which have these 3 strings in Document no matter what order
SELECT Title
FROM Production.Document
WHERE FREETEXT (Document, 'vital safety components' );
It depends on what you really want but I couldn't understand completely.
If I'm missing the point please post a sample and what the result should be.
kr,
Kristof

Perhaps one approach could be to select several results with the full-text search and then SELECT the specific one from those results. But maybe there could be a better solution to this.
I tried this approach and it actually worked. It also works a lot faster then to just SELECT the value.

Related

Why would SQL truncate the right-side of a string when using the right function?

I have a database with MANY out-of-date file locations. The difference between the out-of-date file locations and the correct locations is simply the left-side of the address. So, I am attempting to take the left-side off and replace it with the correct string. But, I can't get there because my query is altering the right-side of the address.
This query is made using "vfpoledb."
SELECT RIGHT(LINK,LEN(LINK)-8) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
This query returns the following:
EXP1:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447
EXP2:
77
EXP3:
\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
LINK:
\\SERVER\SHARES\DATA\QMS\QMS DATA\TRACKING FILES\REMOTEENTRIES\V46145 216447-002A.PDF
I don't understand why EXP1 and EXP3 are giving different results. EXP3 is what I'm looking for EXP1 to return. If I could get that, I could append the correct left-hand-side and create an update query to fix everything.
Edit:
Even when changing the query to:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
The link still cuts off at the same point, which is odd because expression_3 which still uses Right(), but manually provides the length instead of using Len() does not do this.
Furthermore, it seems that when I run the query to include all results:
SELECT RIGHT(LINK,LEN(LINK)) ,LEN(LINK)-8,RIGHT(LINK,77),LINK
FROM LINKSTORE
WHERE 1=1
All values returned by Exp1 are equal in length even though Exp2 and Link are different in size.
So back to the problem, how can I run a query to replace the left-side with the correct server if I can't separate them out?
OK this is tricky, I did some Foxpro 20 years ago but don't have it to hand.
Your SELECT statement looks OK to me. In the comments under the question Thomas G created this DbFiddle which shows that in a 'normal' dbms, your SELECT statement gives the result you are expecting: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=37047d2b7efb91aaa029fa0fb98eea24
So the problem must be something FoxPro/dBase specific rather than a problem with your SELECT statement.
Reading up I see people say that with FoxPro always use ALLTRIM() when using RIGHT() or LEN() on table fields because the data gets returned padded with spaces. I don't see how that would cause the exact bug you're seeing but you could try this maybe:
SELECT RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8) ,LEN(ALLTRIM(LINK))-8,RIGHT(ALLTRIM(LINK),77),ALLTRIM(LINK)
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
edit: OK I got a better idea - are there other rows in your result set?
According to this: https://www.tek-tips.com/viewthread.cfm?qid=1706948 ... when you do SELECT (expr) in FoxPro whatever the length of the expr in the first row becomes that max length for that 'field' and so all subsequent rows get truncated to that length. Makes sense in a crazy 1970s sort of way.
So perhaps you have a row of data above the one we are talking about which comes out at 68 chars long and so every subsequent value gets truncated to that length.
The way around it is to pad your expression results with CAST or PADR:
SELECT PADR(RIGHT(ALLTRIM(LINK),LEN(ALLTRIM(LINK))-8),100),LEN(ALLTRIM(LINK))-8,PADR(RIGHT(ALLTRIM(LINK),77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"
Or same without the ALLTRIM()
SELECT PADR(RIGHT(LINK,LEN(LINK)-8),100),LEN(LINK)-8,PADR(RIGHT(LINK,77),100),LINK
FROM LINKSTORE
WHERE DOCLBL = "V46145002A"

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only few are not numeric.
The problem is that it's not working well. For example if a row contains the value '123456789' and i look for '567', it's not returning this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
sql server 2012.
Thanks.
Full text search doesn't support leading wildcards
In my setup, these return the same
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach
The problem is probably that you are using a wrong tool since Full-text queries perform linguistic searches and it seems like you want to use simple "like" condition.
If you want to get a solution to your needs then you can post DDL+DML+'desired result'
You can do this:
....your_query.... LIKE '567%' ;
This will return all the rows that have a number 567 in the beginning, end or in between somewhere.
99% You're missing % after and before the string you search in the LIKE clause.
es:
SELECT * FROM t WHERE att LIKE '66'
is the same as as using WHERE att = '66'
if you write:
SELECT * FROM t WHERE att LIKE '%66%'
will return you all the lines containing 2 'sixes' one after other

How to write SQL query with many % wildcard characters

I have a coloumn in Sql Server table as:
companystring = {"CompanyId":0,"CompanyType":1,"CompanyName":"Test
215","TradingName":"Test 215","RegistrationNumber":"Test
215","Email":"test215#tradeslot.com","Website":"Test
215","DateStarted":"2012","CompanyValidationErrors":[],"CompanyCode":null}
I want to query the column to search for
companyname like '%CompanyName":"%test 2%","%'
I want to know if I'm querying correctly, because for some search string it does not yield the proper result. Could anyone please help me with this?
Edit: I have removed the format bold
% is a special character that means a wildcard. If you want to find the actual character inside a string, you need to escape it.
DECLARE #d TABLE(id INT, s VARCHAR(32));
INSERT #d VALUES(1,'foo%bar'),(2,'fooblat');
SELECT id, s FROM #d WHERE s LIKE 'foo[%]%'; -- returns only 1
SELECT id, s FROM #d WHERE s LIKE 'foo%'; -- returns both 1 and 2
Depending on your platform, you might be able to use some combination of regular expressions and/or lambda expressions which are built into its main libraries. For example, .NET has LINQ , which is a powerful tool that abstracts querying and which provides leveraging for searches.
It looks like you have JSON data stored in a column called "companystring". If you want to search within the JSON data from SQL things get very tricky.
I would suggest you look at doing some extra processing at insert/update to expose the properties of the JSON you want to search on.
If you search in the way you describe, you would actually need to use Regular Expressions or something else to make it reliable.
In your example you say you want to search for:
companystring like '%CompanyName":"%test 2%","%'
I understand this as searching inside the JSON for the string "test 2" somewhere inside the "CompanyName" property. Unfortunately this would also return results where "test 2" was found in any other property after "CompanyName", such as the following:
-- formatted for readability
companystring = '{
"CompanyId":0,
"CompanyType":1,
"CompanyName":"Test Something 215",
"TradingName":"Test 215",
"RegistrationNumber":"Test 215",
"Email":"test215#tradeslot.com",
"Website":"Test 215",
"DateStarted":"2012",
"CompanyValidationErrors":[],
"CompanyCode":null}'
Even though "test 2" isn't in the CompanyName, it is in the text following it (TradingName), which is also followed by the string "," so it would meet your search criteria.
Another option would be to create a view that exposes the value of CompanyName using a column defined as follows:
LEFT(
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring)),
CHARINDEX('"', SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))) - 1
) AS CompanyName
Then you could query that view using WHERE CompanyName LIKE '%test 2%' and it would work, although performance could be an issue.
The logic of the above is to get everything after "CompanyName":":
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))
Up to but not including the first " in the sub-string (which is why it is used twice).

SQL select statement

in a string array i have a variable amount of values.
how will an sql statement look if i want to select all the records that are equal with the array variables.
it will look something this:
SELECT * FROM users WHERE username='"+ variable amount of elements in the String array like: Steven, Mike, John ..... +"'"
You might be looking for the IN( ) operator.
SELECT * FROM users WHERE username IN ('chris', 'bob', 'bill');
This? (obviously the "OR username='xxx' is repeated for as many items as you require)
SELECT * FROM users WHERE username='item1' OR username='item2' OR username='item3'
Every one above me is corrrect! There are multiple ways of doing this! A simple google search for 'mysql statement variables' yield tons of help full results including:
http://www.daniweb.com/forums/thread126895.html
Just treat your array like a standard varible with [a number] on the end!
Two tips come from this:
Google / Search your problem first 9 times out of 10 someone else has had the same problem and found a working solution!
And be prepared to look at the answers on here with in 5 mins. This forums probably the quickest you'll ever see. The only thing that slows the community down is typing ! :P
Hope that Helps,
Andy
Another approach, although less efficient, can be used if parsing the string apart is problematic...
DECLARE #listOfNames VARCHAR(2000)
SET #ListOfNames = 'JOE,KELLIE,PETE'
SELECT * FROM users WHERE charindex(","+userName+",",","+#listOfNames+",") > 0
This approach is slower than either of the other answers, but can save you from using dynamic SQL to build a SQL query based on the list of names passed in...

Google Style Search Suggestions with Levenshtein Edit Distance

Ok guys working on search suggestions using jQuery-UI AutoComplete with results from sql-sever 2008 db. Using AdventureWorks DB Products table for testing. I want to search across 2 fields in this example. ProductNumber and Name.
I asked 2 questions earlier relating to this...here and here
and ive come up with this so far...
CREATE procedure [dbo].[procProductAutoComplete]
(
#searchString nvarchar(100)
)
as
begin
declare #param nvarchar(100);
set #param = LOWER(#searchString);
WITH Results(result)
AS
(
select TOP 10 Name as 'result'
from Production.Product
where LOWER(Name) like '%' + #param + '%' or (0 <= dbo.lvn(#param, LOWER (Name), 6))
union
select TOP 10 ProductNumber as 'result'
from Production.Product
where LOWER(ProductNumber) like '%' + #param + '%' or (0 <= dbo.lvn(#param, LOWER(ProductNumber), 6))
)
SELECT TOP 20 * from Results
end;
My problem now is ordering of the results...I am getting the correct results but they are just ordered by the Name or product number and are not relevant to the input string...
for example I can search for product Number starting with "BZ-" and the top returned results are ProductNums starting with "A" although I do get more relevant results elsewhere in the list..
any ideas for sorting the results in terms of relevance to the search string??
EDIT:
in regards to the tql implementation of the levenschtein distance found here(linked to in previous question)...
I am wondering what would be the best way to determine the MAX value to send to the function (6 in my example above)
Would it be best to choose an arbitrary value based on what "seems" to work well for my given data set? or would it be best to adjust it dynamically based on the length of the input string...
My initial thoughs were that the value to should be inverely proportional to the length of the searchString...so as the search string grows and becomes more specific..the tolerance decreases...thoughts??
The Full Text Search feature seems to be the way go when using SQL Server
The relevance is the result of dbo.lvn(). It returns the amount of operations need to transform one string into the other. So the answer is simple:
ORDER BY dbo.lvn(#param, LOWER (Name), 6)
But this won't work in combination to the LIKE as this does not return any relevance value. But the usage of LIKE is not a good idea at all. If someone is tiping "tooth" to buy "toothpaste" he would get "bluetooth" as proposal.
To make devlim faster read here:
https://stackoverflow.com/a/14261807/318765