Google Style Search Suggestions with Levenshtein Edit Distance - sql

I'm working on search suggestions using jQuery UI Autocomplete with results from a SQL Server 2008 database. I'm using the AdventureWorks Production.Product table for testing, and I want to search across two fields in this example: ProductNumber and Name.
I asked 2 questions earlier relating to this...here and here
and I've come up with this so far:
CREATE procedure [dbo].[procProductAutoComplete]
(
    @searchString nvarchar(100)
)
as
begin
    declare @param nvarchar(100);
    set @param = LOWER(@searchString);
    WITH Results(result)
    AS
    (
        select TOP 10 Name as 'result'
        from Production.Product
        where LOWER(Name) like '%' + @param + '%' or (0 <= dbo.lvn(@param, LOWER(Name), 6))
        union
        select TOP 10 ProductNumber as 'result'
        from Production.Product
        where LOWER(ProductNumber) like '%' + @param + '%' or (0 <= dbo.lvn(@param, LOWER(ProductNumber), 6))
    )
    SELECT TOP 20 * from Results
end;
My problem now is the ordering of the results. I am getting the correct results, but they are just ordered by Name or ProductNumber and not by relevance to the input string.
For example, I can search for a product number starting with "BZ-" and the top results are product numbers starting with "A", although I do get more relevant results elsewhere in the list.
Any ideas for sorting the results by relevance to the search string?
EDIT:
Regarding the T-SQL implementation of the Levenshtein distance found here (linked to in the previous question)...
I am wondering what would be the best way to determine the MAX value to send to the function (6 in my example above).
Would it be best to choose an arbitrary value based on what "seems" to work well for my given data set? Or would it be best to adjust it dynamically based on the length of the input string?
My initial thought was that the value should be inversely proportional to the length of the search string, so as the search string grows and becomes more specific, the tolerance decreases. Thoughts?

The Full Text Search feature seems to be the way to go when using SQL Server.

The relevance is the result of dbo.lvn(). It returns the number of operations needed to transform one string into the other. So the answer is simple:
ORDER BY dbo.lvn(@param, LOWER(Name), 6)
But this won't work in combination with LIKE, because LIKE does not return any relevance value. Using LIKE this way is not a good idea anyway: if someone types "tooth" to buy "toothpaste", they would get "bluetooth" as a suggestion.
To make the Levenshtein function faster, read here:
https://stackoverflow.com/a/14261807/318765
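The idea of ordering by edit distance can be tried outside the database first. The following is a minimal Python sketch, not the dbo.lvn implementation from the question: `levenshtein` and `suggest` are hypothetical helper names, and the sample product numbers are made up for illustration. It assumes that candidates further away than `max_distance` are dropped and the rest are sorted by distance, smallest first.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(search, candidates, max_distance=6, limit=10):
    """Return candidates within max_distance, ordered by edit distance."""
    needle = search.lower()
    scored = []
    for candidate in candidates:
        distance = levenshtein(needle, candidate.lower())
        if distance <= max_distance:
            scored.append((distance, candidate))
    scored.sort()  # smallest distance first, ties broken alphabetically
    return [candidate for _, candidate in scored[:limit]]


# Hypothetical sample data: the closest product numbers come first.
print(suggest("BZ-", ["AR-5381", "BZ-100", "Adjustable Race", "BZ-2"]))
# ['BZ-2', 'BZ-100', 'AR-5381']
```

Because the distance is computed once per candidate and reused for both filtering and sorting, the same value serves as threshold check and relevance score, which is what the ORDER BY suggestion amounts to.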

Related

How to find rows by filtering a specific text using Full text search in MS SQL 2012

I have a requirement to find rows by filtering on a specific text using Full Text Search in MS SQL. The first requirement is to find rows by searching for the text within the XML column, and the second is to find rows by searching for the text within the JSON column (nvarchar data type). The following conditions should return a result.
XML Column
Criteria 1. Where Contains(XMLData,"1")
Criteria 2. Where Contains(XMLData,"/1/")
Criteria 3. Where Contains(XMLData,"<field>1</field>")
JSONDATA Column :
Criteria 1. Where Contains(JSONData,"1")
Criteria 2. Where Contains(JSONData,"/1/")
Criteria 3. Where Contains(JSONData,"PortalId:1")
My current implementation uses the queries below, which have performance issues when run against thousands of records. Is there any approach other than the code below?
XML QUERY
SELECT *
FROM SettingsData
WHERE CAST(XMLData AS nvarchar(max)) LIKE '%/' + CONVERT(VARCHAR, '1') + '/%'
JSON QUERY
SELECT *
FROM SettingsData
WHERE JSONData LIKE '%/' + CONVERT(VARCHAR, '1') + '/%'
Here is a sample table for this question.
http://sqlfiddle.com/#!18/f65ef/1
I do not think that a full-text search would help you. It seems you are looking for arbitrary fragments, even technical ones such as /1/.
Try this for XML
DECLARE @SearchFor VARCHAR(100)='1';
SELECT *
FROM SettingsData
WHERE xmldata.exist(N'//*[contains(text()[1],sql:variable("@SearchFor"))]')=1;
It will check any node's internal text() if it contains the search phrase. But any value with a 1 inside is found (e.g. any unrelated number which has a 1 somewhere.) You might search for text()="1" and perform the contains only if the string length exceeds a certain minimum.
Something like
WHERE xmldata.exist(N'//*[text()[1]=sql:variable("@SearchFor") or (string-length(text()[1])>=3 and contains(text()[1],concat("/",sql:variable("@SearchFor"),"/")))]')=1;
Json is - up to now - nothing more than a string and must be parsed. With v2016 Microsoft introduced JSON support, but you are on v2012. The problem with a LIKE search on a JSON-string might be, that you would find the 1 even as a part of an element's name. The rest is as above...
SELECT *
FROM SettingsData
WHERE jsondata LIKE '%' + @SearchFor + '%';
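The pitfall with a plain substring search on a JSON string can be shown with a small Python sketch. This is only an illustration of the problem, not something SQL Server 2012 can run: `contains_value` is a hypothetical helper and the sample documents are invented. It assumes that only values, never property names, should count as a hit.

```python
import json


def contains_value(document, needle):
    """Return True if any scalar value (not a property name) contains needle."""
    if isinstance(document, dict):
        return any(contains_value(value, needle) for value in document.values())
    if isinstance(document, list):
        return any(contains_value(item, needle) for item in document)
    return needle in str(document)


raw = '{"Field1": "abc", "PortalId": 7}'

# A substring search, like LIKE '%1%', hits the "1" inside the name "Field1".
print("1" in raw)

# Parsing first and looking only at values avoids that false positive.
print(contains_value(json.loads(raw), "1"))
```

On SQL Server 2012 the equivalent parsing would have to happen in the application or at insert time, since the built-in JSON functions only arrived with v2016.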

Extract a substring from a text field

New to TSQL and SQL generally, please pardon if this is really basic:
I am working with a new-to-me-database that has ignored some best practices. Relevant to this discussion, some data is stored in a generalized note field, including loyalty numbers. The good news is that the loyalty numbers are at least stored consistently within the note.
So, a simplified example from the note table might be:
I have verified that every Loyalty Number is stored consistently ("Loyalty Number ####"), but obviously this is not ideal. I want to extract the Loyalty Number for every primary key that has them, then create a new field that stores the Loyalty Number.
What I'm having trouble with is the following: How do I run a query that will give me each primary key then, if there is a loyalty number return it, if not leave it null or say something like no result found. E.g., turn the above into something like.
It's trivially easy to construct something like "select primary_key, note from note_table where note like '%Loyalty Number%'", but that doesn't do the job of clipping down to just the loyalty number (and leaving out extraneous text). The uniformity of the data means I could probably do this in Excel, but I'm wondering if it's possible in TSQL. Thanks in advance for your help.
Give something like this a try using case with substring and charindex:
select id,
    case when note like '%Loyalty Number [0-9][0-9][0-9][0-9]%'
        then 'Loyalty Number ' +
            substring(note,
                charindex('Loyalty Number', note) + Len('Loyalty Number ') + 1, 4)
    end as Note
from note
SQL Fiddle Demo
The CASE expression checks whether a Loyalty Number exists in the note. SUBSTRING then cuts the number out of the note field, using CHARINDEX to find the starting position. Note that LEN ignores trailing spaces, so LEN('Loyalty Number ') is 14, and the + 1 is what skips the space before the digits. This hard-codes a length of 4 characters for the loyalty number. Given your comments, this should work. If you have a dynamic number of characters, you'll need to modify this slightly.
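The same find-the-marker-then-cut logic can be sketched in Python to check the offsets. This is a hedged illustration rather than a replacement for the query: `MARKER` and `loyalty_number` are hypothetical names, the fixed width of 4 digits is the same assumption as in the query, and the sample notes follow the "Loyalty Number ####" format described in the question.

```python
MARKER = "Loyalty Number "  # the trailing space is part of the marker here


def loyalty_number(note, width=4):
    """Return the digits after the marker, or None if the note has none."""
    start = note.find(MARKER)          # plays the role of CHARINDEX (0-based here)
    if start == -1:
        return None
    begin = start + len(MARKER)        # first character after the marker
    digits = note[begin:begin + width]  # plays the role of SUBSTRING(..., 4)
    if len(digits) == width and digits.isdigit():
        return digits
    return None


notes = [
    "Customer Since 2012. Loyalty Number 4747",
    "Loyalty Number 2209",
    "Loyalty Number 2234.Customer Since 2009",
    "Pending Order",
]
for note in notes:
    print(loyalty_number(note))
```

Notes without a loyalty number yield None, which corresponds to the NULL that the CASE expression returns when no WHEN branch matches.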
Building on @segeddes's answer, here's the rest of the code, which will update your new LoyaltyNumber column.
Working SQL Fiddle: http://sqlfiddle.com/#!3/36e46/8
UPDATE note_table
SET LoyaltyNumber =
    CASE
        WHEN note LIKE '%Loyalty Number [0-9][0-9][0-9][0-9]%'
        THEN SUBSTRING(note, CHARINDEX('Loyalty Number', note)
            + LEN('Loyalty Number ') + 1, 4)
        ELSE 'Regular Customer'
    END
FROM note_table
Table Definition and CRUD
CREATE TABLE note_table (
    id int identity(1,1),
    Note VarChar(500),
    LoyaltyNumber varchar(20)
)
Insert Into note_table(Note) Values
    ('Customer Since 2012. Loyalty Number 4747'),
    ('Loyalty Number 2209'),
    ('Loyalty Number 2234.Customer Since 2009'),
    ('Pending Order');

Search half the refID in SQL

I have a SQL table which holds a unique REFID (int) and many other columns. I want to find a row by searching for part of the REFID, so if someone searches for 0001, then 50001, 00015... come up.
I have tried:
SELECT TOP 10 REFID
FROM Tablename
where REFID LIKE '%' + cast(0001 as varchar(10)) +'%'
However, the problem is that it also gives me 150100, whereas I want the digits 0001 to appear in that order.
'0001' is passed in as a parameter from my C# application. I know I can convert the '0001' to string/varchar before sending it to SQL, but I was looking for a way to do it within the SQL so I can pass in the int from the C# application.
Code:
SELECT TOP 10 REFID
FROM Tablename
where REFID LIKE '%0001%'
0001 is a number, and when converted to varchar() it becomes '1'.
The following works with any number, but only if you know beforehand that your search expression is always four characters long.
SELECT TOP 10 REFID
FROM Tablename
where REFID LIKE '%' + RIGHT('0000' + CAST(0001 AS VARCHAR(4)), 4) +'%'
We don't know how you are building your SQL statement, so we may need more information in order to help. Where do you get your '0001' value from? Is it a variable? Is it a parameter in a stored procedure? Is it inside a function in a different programming language?
You need to compare the REFID to a string value, not an int: as the comments point out, CAST(0001 AS VARCHAR(10)) returns 1, not 0001.
SELECT TOP 10 REFID
FROM Tablename
where REFID LIKE '%0001%'
EDIT: you have bigger issues too, like how to search for an integer value stored without leading zeroes, but if you are passing in a parameter you need to either make it varchar, or convert it to varchar in your query body, like so (assuming, of course, that you are always searching for a four-digit string):
SET @SearchParam_char = RIGHT('000' + CAST(@searchParam_Int AS VARCHAR(10)), 4)
I have found:
CAST(0001 AS varchar(10)) returns '1', because 0001 === 1 (thanks to ALEX K.).
SQL strips the leading zeros, and there is no way of keeping them if you don't know the intended length.
My solution: I will send a string from my application and let SQL search using that string.
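Why the leading zeros disappear, and what padding to a known width restores, can be seen in a short Python sketch. The REFID values and the helper name `like_fragment` are made up for illustration, and `zfill` only approximates the RIGHT('0000' + CAST(...), 4) expression from the answers above (it pads but never truncates).

```python
def like_fragment(search_value, width=4):
    """Zero-pad an integer search value back to a fixed width."""
    return str(search_value).zfill(width)


ref_ids = [50001, 150100, 10001, 15]  # hypothetical REFID values

as_int = int("0001")  # the integer 1: the leading zeros are gone

# Searching for the unpadded text '1' matches every REFID containing a 1.
naive = [ref for ref in ref_ids if str(as_int) in str(ref)]

# Padding back to four characters only matches REFIDs containing '0001'.
padded = [ref for ref in ref_ids if like_fragment(as_int) in str(ref)]

print(naive)   # [50001, 150100, 10001, 15]
print(padded)  # [50001, 10001]
```

The padding only works because the width is fixed in advance, which is why passing the search text as a string from the application is the more robust choice.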

How to write SQL query with many % wildcard characters

I have a column in a SQL Server table as:
companystring = {"CompanyId":0,"CompanyType":1,"CompanyName":"Test 215","TradingName":"Test 215","RegistrationNumber":"Test 215","Email":"test215@tradeslot.com","Website":"Test 215","DateStarted":"2012","CompanyValidationErrors":[],"CompanyCode":null}
I want to query the column to search for
companyname like '%CompanyName":"%test 2%","%'
I want to know if I'm querying correctly, because for some search string it does not yield the proper result. Could anyone please help me with this?
Edit: I have removed the format bold
% is a special character that means a wildcard. If you want to find the actual character inside a string, you need to escape it.
DECLARE @d TABLE(id INT, s VARCHAR(32));
INSERT @d VALUES(1,'foo%bar'),(2,'fooblat');
SELECT id, s FROM @d WHERE s LIKE 'foo[%]%'; -- returns only 1
SELECT id, s FROM @d WHERE s LIKE 'foo%'; -- returns both 1 and 2
Depending on your platform, you might be able to use some combination of regular expressions and/or lambda expressions which are built into its main libraries. For example, .NET has LINQ, a powerful tool that abstracts querying and can be leveraged for searches.
It looks like you have JSON data stored in a column called "companystring". If you want to search within the JSON data from SQL things get very tricky.
I would suggest you look at doing some extra processing at insert/update to expose the properties of the JSON you want to search on.
If you search in the way you describe, you would actually need to use Regular Expressions or something else to make it reliable.
In your example you say you want to search for:
companystring like '%CompanyName":"%test 2%","%'
I understand this as searching inside the JSON for the string "test 2" somewhere inside the "CompanyName" property. Unfortunately this would also return results where "test 2" was found in any other property after "CompanyName", such as the following:
-- formatted for readability
companystring = '{
    "CompanyId":0,
    "CompanyType":1,
    "CompanyName":"Test Something 215",
    "TradingName":"Test 215",
    "RegistrationNumber":"Test 215",
    "Email":"test215@tradeslot.com",
    "Website":"Test 215",
    "DateStarted":"2012",
    "CompanyValidationErrors":[],
    "CompanyCode":null}'
Even though "test 2" isn't in the CompanyName, it is in the text following it (TradingName), which is also followed by the string "," so it would meet your search criteria.
Another option would be to create a view that exposes the value of CompanyName using a column defined as follows:
LEFT(
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring)),
CHARINDEX('"', SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))) - 1
) AS CompanyName
Then you could query that view using WHERE CompanyName LIKE '%test 2%' and it would work, although performance could be an issue.
The logic of the above is to get everything after "CompanyName":":
SUBSTRING(companystring, CHARINDEX('"CompanyName":"', companystring) + LEN('"CompanyName":"'), LEN(companystring))
Up to but not including the first " in the sub-string (which is why it is used twice).
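The false positive described above can be reproduced with a small Python sketch. It is only an approximation: `like_matches` is a hypothetical helper that translates % and _ into a regular expression and ignores bracket classes and ESCAPE, and the case-insensitive match assumes a case-insensitive collation. `company_name_contains` parses the JSON instead and looks only at the CompanyName property.

```python
import json
import re


def like_matches(text, pattern):
    """Simplified LIKE: % matches any run of characters, _ matches one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts), re.IGNORECASE | re.DOTALL)
    return regex.fullmatch(text) is not None


def company_name_contains(text, needle):
    """Parse the JSON and search only inside the CompanyName property."""
    name = json.loads(text).get("CompanyName") or ""
    return needle.lower() in name.lower()


row = ('{"CompanyId":0,"CompanyType":1,"CompanyName":"Test Something 215",'
       '"TradingName":"Test 215","RegistrationNumber":"Test 215",'
       '"Email":"test215@tradeslot.com","Website":"Test 215",'
       '"DateStarted":"2012","CompanyValidationErrors":[],"CompanyCode":null}')

# The wildcard pattern matches, because "Test 215" in TradingName satisfies it.
print(like_matches(row, '%CompanyName":"%test 2%","%'))

# Looking only at the CompanyName value does not match.
print(company_name_contains(row, "test 2"))
```

Extracting the property at insert or update time, as suggested earlier, gives the database the same narrow value to search without having to parse on every query.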

Find exact match using full-text search

Using SQL Server 2008, how can you actually find an exact string match using full-text search? I'm having a really hard time with this and I just couldn't find a satisfactory solution anywhere online.
For example, if I'm searching for the string "Bojan Skrchevski" I want the first result to be exactly that.
So far I've tried formatting the string like "Bojan* NEAR Skrchevski*" and calling CONTAINSTABLE to get results, but with this formatting it also returns results such as Bojana and Bojananana. I also tried to ORDER BY RANK, but still no success.
Furthermore, my string contains a number sequence like "3 1 7", but with the current formatting it also returns "7 1 3" etc.
Example:
DECLARE @var varchar(4000);
SET @var = '"Oxford*" NEAR 24 NEAR 7 NEAR 5 NEAR "London*"'
SELECT [Key] FROM CONTAINSTABLE(dbo.[MyTable], [MyField], @var);
I want to be able to get the exact ordering. Not to get "Oxford 7 24 5 London" as a result.
How do I format the string to accomplish this correctly?
There are 2 options:
1)
This will get all items which have Mountain in their name:
SELECT Name, ListPrice
FROM Production.Product
WHERE ListPrice = 80.99
AND CONTAINS(Name, 'Mountain');
GO
2)
This will get all items which have these 3 strings in Document, no matter in what order:
SELECT Title
FROM Production.Document
WHERE FREETEXT (Document, 'vital safety components' );
It depends on what you really want, which I couldn't completely understand.
If I'm missing the point, please post a sample and what the result should be.
kr,
Kristof
Perhaps one approach could be to select several results with the full-text search and then SELECT the specific one from those results. But maybe there is a better solution to this.
I tried this approach and it actually worked. It also works a lot faster than just SELECTing the value.
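The two-step approach can be sketched in Python as a broad filter followed by an exact, order-sensitive check. This is an illustration under assumptions, not how the full-text engine works: `broad_match` stands in for the prefix-term search, `contains_phrase` for the second SELECT, and the sample rows are invented.

```python
def broad_match(text, terms):
    """Step 1: every term must prefix some word, in any order."""
    words = text.lower().split()
    return all(any(word.startswith(term.lower()) for word in words)
               for term in terms)


def contains_phrase(text, phrase):
    """Step 2: the phrase's words must appear contiguously and in order."""
    words = text.lower().split()
    wanted = phrase.lower().split()
    count = len(wanted)
    return any(words[i:i + count] == wanted
               for i in range(len(words) - count + 1))


rows = ["Oxford 7 24 5 London", "Oxford 24 7 5 London", "Cambridge 24 7 5"]
terms = ["Oxford", "24", "7", "5", "London"]

candidates = [row for row in rows if broad_match(row, terms)]
exact = [row for row in candidates if contains_phrase(row, "Oxford 24 7 5 London")]

print(candidates)  # ['Oxford 7 24 5 London', 'Oxford 24 7 5 London']
print(exact)       # ['Oxford 24 7 5 London']
```

The first step keeps the expensive exact comparison away from most rows, which is consistent with the speed-up reported above.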