I have an Employee table with these columns:
EmployeeId
Fullname
Phone
Department
Team
Function
Manager
I have a form with a search text input, where a user can type values for one column or for several of them, for example:
a user can search by Fullname only
a user can search by combining the Fullname + Phone + Team
What is the difference between Full text search and Fuzzy Search in SQL Server in this case?
So you have two options:
Using full text search:
If the data is huge and you are looking for a scalable search, this method is preferable, although it is harder to maintain. I suggest you add a computed column to the table and put a full text index on it:
alter table tablename
add cmptcolumn as concat_ws(',', EmployeeId, FullName, PhoneNumber, ...)
--full text catalog
CREATE FULLTEXT CATALOG catalogName AS DEFAULT;
-- full text index
create fulltext index on tablename (cmptcolumn) key index PK_tablename -- KEY INDEX is required and must name a unique, non-nullable index on the table (typically the primary key); PK_tablename is a placeholder
-- search :
select * from tablename
where contains(cmptcolumn, 'SearchString');
With full text search you can also search for synonyms and related word forms:
select * from tablename
where freetext(cmptcolumn, 'SearchString');
Read more about the different full text search options here.
Using a search query, in which again you can either benefit from the computed column or search each column separately:
select *
from tablename
where (Fullname like '%'+@fullNameSearchString+'%' or @fullNameSearchString is null)
and (Department = @DepartmentSearchString or @DepartmentSearchString is null)
and ...
While the first method is a faster way to search inside strings, the second method provides more accurate results. However, FREETEXT looks for the meaning of the word as well, so it might be slower.
In the second method, whichever way you go (with or without the computed column), having an index on the column(s) is a necessity to improve performance; however, a LIKE '%...%' predicate usually cannot use an index as it should.
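For example, a minimal sketch of supporting indexes for the second method, assuming the table and columns from the question (the leading-wildcard LIKE on Fullname will still scan, but the equality filters on Department or Team can seek):
-- Hypothetical supporting indexes; adjust key/INCLUDE columns to the filters you actually use.
CREATE INDEX IX_Employee_Department ON dbo.Employee (Department) INCLUDE (Fullname, Phone, Team);
CREATE INDEX IX_Employee_Team ON dbo.Employee (Team) INCLUDE (Fullname, Phone, Department);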
Create a stored procedure and pass the columns comma separated; check this example:
CREATE PROCEDURE [dbo].Create_Sp
@SearchColumn varchar(500) = null
AS
BEGIN
DECLARE @SQL nvarchar(max)
SELECT @SQL = N'select ' + @SearchColumn + ' from #Employee'
EXECUTE sp_executesql @SQL
END
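Because the column list is concatenated straight into dynamic SQL, it is worth validating it first. A minimal sketch, assuming SQL Server 2017+ (for STRING_SPLIT and STRING_AGG) and assuming the target is the Employee table from the question rather than a temp table:
DECLARE @SearchColumn nvarchar(500) = N'FullName, Phone';
DECLARE @Safe nvarchar(max), @SQL nvarchar(max);

-- Keep only names that actually exist on the table, and quote them.
SELECT @Safe = STRING_AGG(QUOTENAME(c.name), ',')
FROM STRING_SPLIT(@SearchColumn, ',') s
JOIN sys.columns c
  ON c.name = LTRIM(RTRIM(s.value))
 AND c.object_id = OBJECT_ID(N'dbo.Employee');

SET @SQL = N'select ' + @Safe + N' from dbo.Employee';
EXECUTE sp_executesql @SQL;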
Well, we know that the main difference between fuzzy and full text is "exactly what I'm looking for" versus "similarities." As that applies to SQL Server and to your example, it is a full-text search on Fullname as opposed to mining similarities across three different columns, which might give a lot of noise depending on how clean your data is or how "guided" the end-user experience is in terms of entering those values.
With a full-text search on Fullname, I'm pretty sure it's a straightforward answer:
SELECT * FROM tblEmployee WHERE FullName = @FullNameParameter;
With fuzzy search it can get very gross and nasty, as without a clearer understanding of what the UI is doing I am forced to assume we want to check similarities on all parameters for all fields. To be honest, I'm pretty sure this query is the absolute worst, but it does demonstrate the idea for you.
SELECT * FROM tblEmployee WHERE FullName LIKE '%'+@Parameter1+'%'
UNION
SELECT * FROM tblEmployee WHERE FullName LIKE '%'+@Parameter2+'%'
UNION
SELECT * FROM tblEmployee WHERE FullName LIKE '%'+@Parameter3+'%'
UNION
SELECT * FROM tblEmployee WHERE Phone LIKE '%'+@Parameter1+'%'
UNION
SELECT * FROM tblEmployee WHERE Phone LIKE '%'+@Parameter2+'%'
UNION
SELECT * FROM tblEmployee WHERE Phone LIKE '%'+@Parameter3+'%'
UNION
SELECT * FROM tblEmployee WHERE Team LIKE '%'+@Parameter1+'%'
UNION
SELECT * FROM tblEmployee WHERE Team LIKE '%'+@Parameter2+'%'
UNION
SELECT * FROM tblEmployee WHERE Team LIKE '%'+@Parameter3+'%'
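Since UNION already removes duplicate rows, a roughly equivalent but more compact form of that chain is a single OR list; a sketch using the same hypothetical parameters:
SELECT * FROM tblEmployee
WHERE FullName LIKE '%'+@Parameter1+'%' OR FullName LIKE '%'+@Parameter2+'%' OR FullName LIKE '%'+@Parameter3+'%'
   OR Phone LIKE '%'+@Parameter1+'%' OR Phone LIKE '%'+@Parameter2+'%' OR Phone LIKE '%'+@Parameter3+'%'
   OR Team LIKE '%'+@Parameter1+'%' OR Team LIKE '%'+@Parameter2+'%' OR Team LIKE '%'+@Parameter3+'%';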
A very good solution is searloc. It is a CLR library with zero dependencies and many features.
It supports full text search, phonetic matching for all languages, keyboard matching, fuzzy search, multi-column search, and many others.
And most importantly, it is very fast: it needs just a few milliseconds for millions of records.
Let's say I have a SQL table with an int PK column and an nvarchar(max). In the nvarchar(max) column, I have a bunch of table entries that all look like this:
SOME_PEOPLE_LIKE_APPLES
SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE
SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER
.
.
.
SOME_ANTS_HATE_SYRUP
SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES
There are millions of these rows. Now let's say my goal is to find the row with the most overlap for an input searchTerm. So in this case, if I input SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN, the returned entry would be the third entry from the table above, SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON.
I have a SPROC that does this very naively, it goes through the entire table as follows:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE searchTerm LIKE phrase + '%'
-- phrase is the row entry being searched against
-- searchTerm is the phrase we're searching for
I then ORDER BY length and pick the TOP only
Would there be a way to speed this up, perhaps by doing some indexing?
If this is confusing, think of it as tableRowEntry + wildcard = searchTerm
I'm on MSSQL 2008 if that makes any difference
If there is an index on your NVARCHAR column, a LIKE 'Something%' search will be able to use it and should be pretty fast.
If there is a wildcard at the beginning, you are out of luck. But in your case this should work.
You might use an indexed, persisted computed column storing the length of the string. In that case you might reduce the workload enormously by filtering out all strings which are too short or too long.
If there are certain words in your search terms which appear often but not everywhere, you might use side columns again and filter like AND IncludePEOPLE=1 AND IncludeMOON=1 (a sketch follows the example below).
UPDATE
Here is an example
CREATE TABLE Phrase(ID INT IDENTITY
,Phrase NVARCHAR(100)
,PhraseLength AS LEN(Phrase) PERSISTED);
CREATE INDEX IX_Phrase_Phrase ON Phrase(Phrase);
CREATE INDEX IX_Phrase_PhraseLength ON Phrase(PhraseLength);
INSERT INTO Phrase
VALUES
('SOME_PEOPLE_LIKE_APPLES')
,('SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER')
,('SOME_ANTS_HATE_SYRUP')
,('SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES');
DECLARE @SearchTerm NVARCHAR(100)=N'SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN';
--This uses the index (checked against execution plan)
SELECT TOP 1 *
FROM Phrase
WHERE @SearchTerm LIKE Phrase + '%'
ORDER BY PhraseLength DESC;
--This might be even better, check with your high row count.
SELECT TOP 1 *
FROM Phrase
WHERE Phrase=LEFT(@SearchTerm,PhraseLength)
ORDER BY PhraseLength DESC;
GO
--Clean-Up
DROP TABLE Phrase;
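And a sketch of the side-column idea mentioned above, added to the same Phrase table (the flag column names are hypothetical; @SearchTerm is the variable from the example). For this prefix-match task, the safe way to use the flags is to rule out phrases containing a frequent word that the search term does not contain:
-- Persisted flags for frequent words; each can be indexed and used as a cheap pre-filter.
ALTER TABLE Phrase ADD IncludePEOPLE AS CASE WHEN Phrase LIKE '%PEOPLE%' THEN 1 ELSE 0 END PERSISTED;
ALTER TABLE Phrase ADD IncludeMOON AS CASE WHEN Phrase LIKE '%MOON%' THEN 1 ELSE 0 END PERSISTED;
CREATE INDEX IX_Phrase_WordFlags ON Phrase(IncludePEOPLE, IncludeMOON);

-- A phrase containing a word that the search term lacks can never be a prefix of it.
SELECT TOP 1 *
FROM Phrase
WHERE (IncludePEOPLE = 0 OR @SearchTerm LIKE '%PEOPLE%')
  AND (IncludeMOON = 0 OR @SearchTerm LIKE '%MOON%')
  AND Phrase = LEFT(@SearchTerm, PhraseLength)
ORDER BY PhraseLength DESC;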
The best solution here is to create a full-text search index:
https://msdn.microsoft.com/en-us/library/ms142571.aspx
Full-text search is optimized for this task; once the index is created you can use full-text queries with the CONTAINS full-text function to find the matches efficiently:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE CONTAINS(phrase, @searchPhrase)
Full-text search not only allows custom optimization through query hints like OPTIMIZE FOR, it also supports operators such as AND and OR within the search terms, and a variety of other text-searching goodies, like being able to find spelling variations of the same word automatically, filter by relevance, etc.
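A minimal sketch of the setup that query assumes (the catalog and key-index names are hypothetical; the table needs a unique, non-nullable, single-column key index, typically the primary key on the int id):
CREATE FULLTEXT CATALOG PhraseCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON X (phrase) KEY INDEX PK_X; -- PK_X = name of the primary key index on X

-- A prefix term keeps the spirit of the original "phrase + '%'" comparison:
SELECT DISTINCT phrase, LEN(phrase) AS l, [id]
FROM X
WHERE CONTAINS(phrase, '"SOME_PEOPLE_LIKE_APPLES*"');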
First I apologize for the poor formatting here.
Second I should say up front that changing the table schema is not an option.
So I have a table defined as follows:
Pin varchar
OfferCode varchar
Pin will contain data such as:
abc,
abc123
OfferCode will contain data such as:
123
123~124~125
I need a query to check for a count of a Pin/OfferCode combination and when I say OfferCode, I mean an individual item delimited by the tilde.
For example, if there is one row that looks like abc, 123 and another that looks like abc, 123~124, and I search for a count of Pin=abc, OfferCode=123, I want to get a count = 2.
Obviously I can do a query similar to this:
SELECT count(1) from MyTable (nolock) where OfferCode like '%' + @OfferCode + '%' and Pin = @Pin
Using LIKE here is very expensive, and I'm hoping there may be a more efficient way.
I'm also looking into using a split string solution. I have a table-valued function SplitString(string, delim) that will return a table OutParam, but I'm not quite sure how to apply this to a table column vs a string. Would this even be worthwhile pursuing? It seems like it would be much more expensive, but I'm unable to get a working solution to compare to the LIKE solution.
Your like/% solution is open to a bug if you had offer codes other than 3 digits (if there was offer code 123 and 1234, searching for like '%123%' would return both, which is wrong). You can use your string function this way:
SELECT Pin, count(1)
FROM MyTable (nolock)
CROSS APPLY SplitString(OfferCode,'~') OutParam
WHERE OutParam.Value = @OfferCode and Pin = @Pin
GROUP BY Pin
If you have a relatively small table you can probably get away with this. If you are working with a large number of rows or encountering performance problems, it would be more effective to normalize it as RedFilter suggested.
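If a reference implementation of the splitter helps, here is a minimal inline table-valued function returning a Value column, matching the SplitString(string, delim) signature used above (an XML-based sketch; on SQL Server 2016+ the built-in STRING_SPLIT could be used instead):
CREATE FUNCTION dbo.SplitString (@List varchar(8000), @Delim char(1))
RETURNS TABLE
AS
RETURN
(
    -- Turn 'a~b~c' into '<v>a</v><v>b</v><v>c</v>' and shred it; assumes the
    -- input contains no characters that would need XML escaping.
    SELECT x.node.value('.', 'varchar(8000)') AS Value
    FROM (SELECT CAST('<v>' + REPLACE(@List, @Delim, '</v><v>') + '</v>' AS xml) AS doc) d
    CROSS APPLY d.doc.nodes('/v') AS x(node)
);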
using like here is very expensive and I'm hoping there may be a more efficient way
The efficient way is to normalize the schema and put each OfferCode in its own row.
Then your query is more like (although you may need to use an intersection table depending on your schema):
select count(*)
from MyTable
where OfferCode = @OfferCode
and Pin = @Pin
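A sketch of what that normalized side table might look like (the names are hypothetical); it keeps one row per Pin/individual OfferCode pair, so COUNT(*) matches the per-row semantics of the question:
CREATE TABLE PinOfferCode
(
    Pin varchar(50) NOT NULL,
    OfferCode varchar(50) NOT NULL
);
CREATE INDEX IX_PinOfferCode ON PinOfferCode (Pin, OfferCode);

-- Populate once from the existing table using the splitter, then keep it in sync.
INSERT INTO PinOfferCode (Pin, OfferCode)
SELECT t.Pin, s.Value
FROM MyTable t
CROSS APPLY dbo.SplitString(t.OfferCode, '~') s;

SELECT COUNT(*)
FROM PinOfferCode
WHERE OfferCode = @OfferCode
  AND Pin = @Pin;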
Here is one way to use like for this problem, which is standard for getting exact matches when searching delimited strings while avoiding the '%123%' matches '123' and '1234' problem:
-- Create some test data
declare @table table (
Pin varchar(10) not null
, OfferCode varchar(100) not null
)
insert into @table select 'abc', '123'
insert into @table select 'abc', '123~124'
-- Mock some proc params
declare @Pin varchar(10) = 'abc'
declare @OfferCode varchar(10) = '123'
-- Run the actual query
select count(*) as Matches
from @table
where Pin = @Pin
-- Append delimiters to find exact matches
and '~' + OfferCode + '~' like '%~' + @OfferCode + '~%'
As you can see, we're adding the delimiters to the searched string and also to the search string in order to find matches, thus avoiding the bugs mentioned in other answers.
I highly doubt that a string splitting function will yield better performance over like, but it may be worth a test or two using some of the more recently suggested methods. If you still have unacceptable performance, you have a few options:
Updated:
Try an index on OfferCode (or on a persisted computed column of '~' + OfferCode + '~'; a sketch follows this list). Contrary to the myth that SQL Server won't use an index with LIKE and wildcards, this might actually help.
Check out full text search.
Create a normalized version of this table using a string splitter. Use this table to run your counts. Update this table according to some schedule or event (trigger, etc.).
If you have some standard search terms, pre-calculate the counts for these and store them on some regular basis.
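A sketch of the first option above, a persisted computed column holding the delimited value plus an index that leads on Pin (the column and index names are hypothetical):
ALTER TABLE MyTable ADD OfferCodeDelimited AS ('~' + OfferCode + '~') PERSISTED;
CREATE INDEX IX_MyTable_Pin_OfferCodeDelimited ON MyTable (Pin, OfferCodeDelimited);

-- Seek on Pin; the LIKE on the delimited value is then a residual predicate.
SELECT COUNT(*) AS Matches
FROM MyTable
WHERE Pin = @Pin
  AND OfferCodeDelimited LIKE '%~' + @OfferCode + '~%';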
Actually, the LIKE condition is going to have much less cost than doing any sort of string manipulation and comparison.
http://www.simple-talk.com/sql/performance/the-seven-sins-against-tsql-performance/
I want to search a text in a table without knowing its attributes.
Example: I have a table Customer, and I want to search for a record which contains 'mohit' in any field, without knowing its column name.
You are looking for Full Text Indexing
Example using CONTAINS:
select ColumnName from TableName
Where Contains(Col1,'mohit') OR contains(col2,'mohit')
NOTE: You can convert the above full-text query into a dynamic query using the column names taken from the sys.columns query below.
Also check below
FIX: Full-Text Search Queries with CONTAINS Clause Search Across Columns
You can also get all the column names from the query below:
Select Name From sys.Columns Where Object_Id =
(Select Object_Id from sys.Tables Where Name = 'TableName')
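A hedged sketch of that dynamic query, building one CONTAINS predicate per string column found in sys.columns (assumes SQL Server 2017+ for STRING_AGG and that the columns are covered by a full-text index; swap CONTAINS for LIKE if they are not; 'Customer' is the table from the example):
DECLARE @Search nvarchar(100) = N'mohit';
DECLARE @Predicate nvarchar(max), @SQL nvarchar(max);

-- One CONTAINS(...) per (n)varchar column of the table.
SELECT @Predicate = STRING_AGG(N'CONTAINS(' + QUOTENAME(c.name) + N', @Search)', N' OR ')
FROM sys.columns c
JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID(N'dbo.Customer')
  AND t.name IN (N'varchar', N'nvarchar');

SET @SQL = N'SELECT * FROM dbo.Customer WHERE ' + @Predicate + N';';
EXEC sp_executesql @SQL, N'@Search nvarchar(100)', @Search = @Search;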
Double-WildCard LIKE statements will not speed up the query.
If you want to make a full search on the table, you must surely know the structure of the table. Assuming the table has the fields id, name, age, and address, your SQL query would be:
SELECT * FROM `Customer`
WHERE `id` LIKE '%mohit%'
OR `name` LIKE '%mohit%'
OR `age` LIKE '%mohit%'
OR `address` LIKE '%mohit%';
Mohit, I'm glad you devised the solution by yourself.
Anyway, whenever you face an unknown table or database again, I think the code snippet I just posted here will be very welcome.
Ah, one more thing: the answers given did not address your problem, did they?
I have created this little sample table, Person.
The Id is the primary key and has Identity = True.
Now when I try to create a FullText Catalog, I'm unable to search for the Id.
If I follow this wizard, I'll end up with a catalog where it's only possible to search for a person's name. I would really like to know how I can make it possible to do a full-text search for both the Id and the Name.
edit
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, 'anders' );
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, '1' );
I would like them to return the same result; the first returns id = 1, name = Anders, while the second query doesn't return anything.
edit 2
Looks like the problem is in using int, but isn't it possible to trick full text into supporting it?
edit 3
Created a view where I convert the int to an nvarchar: CONVERT(nvarchar(50), Id) AS PersonId. This made it possible for me to select that column when creating the Full Text Catalog, but it still won't let me find it when searching for the Id.
From reading your question, I am not sure that you understand the purpose of a full-text index. A full-text index is intended to search TEXT (one or more columns) on a table. And not just as a replacement for:
SELECT *
FROM table
WHERE col1 LIKE 'Bob''s Pizzaria%'
OR col2 LIKE 'Bob''s Pizzaria%'
OR col3 LIKE 'Bob''s Pizzaria%'
It also allows you to search for variations of "Bob's Pizzaria" like "Bobs Pizzeria" (in case someone misspells Pizzeria or forgot to put in the ' or over-zealous anti-SQL-injection code stripped the ') or "Robert's Pizza" or "Bob's Pizzeria" or "Bob's Pizza", etc. It also allows you to search "in the middle" of a text column (char, varchar, nchar, nvarchar, etc.) without the dreaded "%Bob%Pizza%" that eliminates any chance of using a traditional index.
Enough with the lecture, however. To answer your specific question, I would create a separate column (not a "computed column") "IdText varchar(10)" and then an AFTER INSERT trigger something like this:
UPDATE t
SET IdText = CAST(Id AS varchar(10))
FROM table AS t
INNER JOIN inserted i ON i.Id = t.Id
If you don't know what "inserted" is, see this MSDN article or search Stack Overflow for "trigger inserted". Then you can include the IdText column in your full-text index.
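For completeness, a sketch of the whole trigger around that UPDATE, assuming the table from the question is dbo.Person:
CREATE TRIGGER trg_Person_IdText
ON dbo.Person
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the new rows' int Id into the full-text-indexable IdText column.
    UPDATE t
    SET IdText = CAST(t.Id AS varchar(10))
    FROM dbo.Person AS t
    INNER JOIN inserted AS i ON i.Id = t.Id;
END;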
Again, from your example I am not sure that a full-text index is what you should use here but then again your actual situation might be something different and you just created this example for the question. Also, full-text indexes are relatively "expensive" so make sure you do a cost-benefit analysis. Here is a Stack Overflow question about full-text index usage.
Why not just do a query like this?
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, '1' )
OR ID = 1
You can leave the "OR ID = 1" part off by first checking whether the search term is a number.
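A sketch of that check, assuming SQL Server 2012+ for TRY_CAST and a hypothetical @Search parameter:
DECLARE @Search nvarchar(100) = N'1';

SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT(*, @Search)
   OR Id = TRY_CAST(@Search AS int); -- TRY_CAST yields NULL when @Search is not a number, so this branch simply never matches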
I am new to SQL programming.
I have a table job where the fields are id, position, category, location, salary range, description, refno.
I want to implement a keyword search from the front end. The keyword can reside in any of the fields of the above table.
This is the query I have tried, but it returns so many duplicate rows:
SELECT
a.*,
b.catname
FROM
job a,
category b
WHERE
a.catid = b.catid AND
a.jobsalrange = '15001-20000' AND
a.jobloc = 'Berkshire' AND
a.jobpos LIKE '%sales%' OR
a.jobloc LIKE '%sales%' OR
a.jobsal LIKE '%sales%' OR
a.jobref LIKE '%sales%' OR
a.jobemail LIKE '%sales%' OR
a.jobsalrange LIKE '%sales%' OR
b.catname LIKE '%sales%'
For a single keyword on VARCHAR fields you can use LIKE:
SELECT id, category, location
FROM table
WHERE
(
category LIKE '%keyword%'
OR location LIKE '%keyword%'
)
For the description you're usually better off adding a full-text index and doing a Full-Text Search (MyISAM only):
SELECT id, description
FROM table
WHERE MATCH (description) AGAINST('keyword1 keyword2')
SELECT
*
FROM
yourtable
WHERE
id LIKE '%keyword%'
OR position LIKE '%keyword%'
OR category LIKE '%keyword%'
OR location LIKE '%keyword%'
OR description LIKE '%keyword%'
OR refno LIKE '%keyword%';
Ideally, have a keyword table containing the fields:
Keyword
Id
Count (possibly)
with an index on Keyword. Create an insert/update/delete trigger on the other table so that, when a row is changed, every keyword is extracted and put into (or replaced in) this table.
You'll also need a table of words to not count as keywords (if, and, so, but, ...).
In this way, you'll get the best speed for queries wanting to look for the keywords and you can implement (relatively easily) more complex queries such as "contains Java and RCA1802".
"LIKE" queries will work but they won't scale as well.
Personally, I wouldn't use the LIKE string comparison on the ID field or any other numeric field. It doesn't make sense for a search for ID# "216" to return 16216, 21651, 3216087, 5321668..., and so on and so forth; likewise with salary.
Also, if you want to use prepared statements to prevent SQL injections, you would use a query string like:
SELECT * FROM job WHERE `position` LIKE CONCAT('%', ? ,'%') OR ...
I will explain the method I usually prefer:
First of all, you need to take into consideration that with this method you will sacrifice memory with the aim of gaining computation speed.
Second, you need to have the right to edit the table structure.
1) Add a field (I usually call it "digest") where you store all the data from the table.
The field will look like:
"n-n1-n2-n3-n4-n5-n6-n7-n8-n9" etc.. where n is a single word
I achieve this using a regular expression thar replaces " " with "-".
This field is the result of all the table data "digested" in one sigle string.
2) Use the LIKE statement %keyword% on the digest field:
SELECT * FROM table WHERE digest LIKE %keyword%
you can even build a qUery with a little loop so you can search for multiple keywords at the same time looking like:
SELECT * FROM table WHERE
digest LIKE %keyword1% AND
digest LIKE %keyword2% AND
digest LIKE %keyword3% ...
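If the database is MySQL 5.7 or later, the digest field can be maintained automatically as a stored generated column instead of by application code; a sketch with hypothetical column names (CONCAT_WS approximates the "-"-joined format described above):
ALTER TABLE `job`
    ADD COLUMN `digest` VARCHAR(1000)  -- size as needed for your data
        AS (CONCAT_WS('-', `position`, `category`, `location`, `description`, `refno`)) STORED;

SELECT * FROM `job`
WHERE `digest` LIKE '%keyword1%'
  AND `digest` LIKE '%keyword2%';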
You can find another, simpler option in a thread here: Match Against... with more detailed help in 11.9.2. Boolean Full-Text Searches.
This is just in case someone needs a more compact option. It requires creating a FULLTEXT index on the table, which can be accomplished easily.
Information on how to create Indexes (MySQL): MySQL FULLTEXT Indexing and Searching
In the FULLTEXT index you can have more than one column listed; the result would be an SQL statement using an index named search:
SELECT *, MATCH (`column`) AGAINST('+keyword1* +keyword2* +keyword3*') AS relevance
FROM `documents` USE INDEX(search)
WHERE MATCH (`column`) AGAINST('+keyword1* +keyword2* +keyword3*' IN BOOLEAN MODE)
ORDER BY relevance;
I tried with multiple columns, with no luck. Even though multiple columns are allowed in an index, you still need an index for each column you want to use in a MATCH ... AGAINST statement.
Depending on your criteria you can use either option.
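A sketch of the index setup that MATCH ... AGAINST statement assumes, one FULLTEXT index per column you intend to search (MySQL; the names follow the example above):
ALTER TABLE `documents` ADD FULLTEXT INDEX `search` (`column`);

SELECT *, MATCH (`column`) AGAINST ('+keyword1* +keyword2*' IN BOOLEAN MODE) AS relevance
FROM `documents`
WHERE MATCH (`column`) AGAINST ('+keyword1* +keyword2*' IN BOOLEAN MODE)
ORDER BY relevance DESC;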
I know this is a bit late, but what I did for our application is this. Hope it will help someone, though. It works for me:
SELECT * FROM `landmarks`
WHERE `landmark_name` LIKE '%keyword%'
OR `landmark_description` LIKE '%keyword%'
OR `landmark_address` LIKE '%keyword%';