I have found this on microsoft support (http://technet.microsoft.com/en-us/library/ms142547(v=SQL.105).aspx) :
SELECT candidate_name,SSN FROM candidates WHERE CONTAINS(candidate_resume,”SQL Server”)
Is it possible to call the script with "c" (candidate that have learned C language)?
Because c is single character, it doesn't work with fulltext search. Only if I set stoplist to off as following, the script returns data:
ALTER FULLTEXT INDEX ON [ADNEOM.BE].[adneom].[T_CANDIDATE] SET STOPLIST = OFF
But the data returned include c# and c++ too, and I want only C.
In addition, I don't think that disable system stoplist is a good idea.
From technical point of view, I see nothing that prevents you from doing a fulltext search on "c" character, except for minimal pattern length threshold mentioned in the question, so the answer is Yes, it is possible.
I assume that candidate_resume is a text field and like operator is applicable, so you may consider another solution:
select candidate_name, SSN from candidates where candidate_resume like '%C%';
But, before doing that you should consider that searching by one letter will give you tons of false-positive results. For example, my answer contains it 14 times.
If you have detailed wanted and unwanted patterns list, you can add it to the query:
--positive cases list
... where (candidate_resume like '%c/%' or candidate_resume like '%c,%' or ...
--negative cases list
...) and candidate_resume not like '%C#% and candidate_resume not like '%mvc% and ...
Disclaimer: Having lots of such clauses will slow down your query.
Disclaimer: Maybe you'll need "%c/%", not '%c/%', I don't remember unfortunately. In that case feel free to edit my post or add a comment so I fix it.
Related
In Sql Server, I have a table containing 46 million rows.
In "Title" column of table, I want make search. The word may be at any index of field value.
For example:
Value in table: BROTHERS COMPANY
Search string: ROTHER
I want this search to match the given record. This is exactly what LIKE '%ROTHER%' do. However, LIKE '%%' usage should not be used on large tables because of performance issues. How can I achieve it?
Though I don't know your requirements, your best approach may be to challenge them. Middle-of-the-string searches are usually not very practical. If you can get your users to perform prefix searches (broth%) then you can easily use Full Text's wildcard search (CONTAINS(*, '"broth*"')). Full Text can also handle suffix searches (%rothers) with a little extra work.
But when it comes to middle-of-the-string searches with SQL Server, you're stuck using LIKE. However you may be able to improve performance of LIKE by using a binary collation as explained in this article. (I hate to post a link without including its content but it is way too long of an article to post here and I don't understand the approach enough to sum it up.)
If that doesn't help and if middle-of-the-string searches are that important of a requirement then you should consider using a different search solution like Lucene.
Add Full-Text index if you want.
You can search the table using CONTAINS:
SELECT *
FROM YourTable
WHERE CONTAINS(TableColumnName, 'SearchItem')
My title sounds complicated, but the situation is very simple. People search on my site using a term such as "blackfriday".
When they conduct the search, my SQL code needs to look in various places such as a ProductTitle and ProductDescription field to find this term. For example:
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%blackfriday%' OR
ProductDescription LIKE '%blackfriday%'
However, the term appears differently in the database fields. It is most like to appear with a space between the words as such "Black Friday USA 2015". So without going through and adding more combinations to the WHERE clause such as WHERE ProductTitle LIKE '%Black-Friday%', is there a better way to accomplish this kind of fuzzy searching?
I have full-text search enabled on the above fields but its really not that good when I use the CONTAINS clause. And of course other terms may not be as neat as this example.
I should start by saying that "variations (of a string)" is a bit vague. You could mean plurality, verb tenses, synonyms, and/or combined words (or, ignoring spaces and punctuation between 2 words) like the example you posted: "blackfriday" vs. "black friday" vs "black-friday". I have a few solutions of which 1 or more together may work for you depending on your use case.
Ignoring punctuation
Full Text searches already ignore punctuation and match them to spaces. So black-friday will match black friday whether using FREETEXT or CONTAINS. But it won't match blackfriday.
Synonyms and combined words
Using FREETEXT or FREETEXTTABLE for your full text search is a good way to handle synonyms and some matching of combined words (I don't know which ones). You can customize the thesaurus to add more combined words assuming it's practical for you to come up with such a list.
Handling combinations of any 2 words
Maybe your use case calls for you to match poorly formatted text or hashtags. In that case I have a couple of ideas:
Write the full text query to cover each combination of words using a dictionary. For example your data layer can rewrite a search for black friday as CONTAINS(*, '"black friday" OR "blackfriday"'). This may have to get complex, for example would black friday treehouse have to be ("black friday" OR "blackfriday") AND ("treehouse" OR "tree house")? You would need a dictionary to figure out that "treehouse" is made up of 2 words and thus can be split.
If it's not practical to use a dictionary for the words being searched for (I don't know why, maybe acronyms or new memes) you could create a long query to cover every letter combination. So searching for do-re-mi could be "do re mi" OR "doremi" OR "do remi" OR "dore mi" OR "d oremi" OR "d o remi" .... Yes it will be a lot of combinations, but surprisingly it may run quickly because of how full text efficiently looks up words in the index.
A hack / workaround if searching for multiple variations is very important.
Define which fields in the DB are searchable (e.g ProductTitle, ProductDescription)
Before saving these fields in the DB, replace each space (or consecutive spaces by a placeholder e.g "%")
Search the DB for variation matches employing the placeholder
Do the reverse process when displaying these fields on your site (i.e replace placeholder with space)
Alternatively you can enable regex matching for your users (meaning they can define a regex either explicitly or let your app build one from their search term). But it is slower and probably error-prone to do it this way
After looking into everything, I have settled for using SQL's FREETEXT full-text search. Its not ideal, or accurate, but for now it will have to do.
My answer is probably inadequate but do you have any scenarios which wont be addressed by query below.
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%black%friday%' OR
ProductDescription LIKE '%black%friday%'
For some background information, I'm creating an application that searches against a couple of indexed tables to retrieve some records. It isn't overtly complex to the point of say Google, but it's good enough for the purpose it serves, barring this strange issue.
I'm using the Contains() function, and it's going very well, except when the search contains strings of numbers. Now, I'm only passing in a string -- nowhere numerical datatypes being passed in -- only characters. We're searching against a collection of emails, each appended with a custom ID when shot off from a workflow. So while testing, we decided to search via number strings.
In our test, we isolated a number 0042600006, which belongs to one and only one email subject. However, when using our query we are getting results for 0042600001, 0042600002, etc. The query is this as follows (with some generic columns standing in):
SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006')
We've tried every possible combination: '0042600006*', '"0042600006"' and '"0042600006*"'.
I think it's just a limitation of the function, but I thought this would probably be the best place for answers. Thanks in advance.
Asked this same question recently. Please see the insightful answer someone left me here
Essentially what this user says to do is to turn off the noise words (Microsoft has included integers 0-9 as noise in the Full Text Search). Hope you can use this awesome tool with integers as I now am!
try to add language 1033 as an additional parameter. that worked with my solution.
SELECT description, subject FROM tableA WHERE CONTAINS((subject), '0042600006', language 1033)
try using
SELECT description, subject FROM tableA WHERE CONTAINS((subject), '%0042600006%')
Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by #Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
You can also try the SubString(), to eliminate the first 3 or so characters . Below is an example of how that can be achieved
SELECT Fname, Lname
FROM Table1 ,Table2
WHERE substr(Table1.Fname, 1,3) || substr(Table1.Lname,1 ,3) = substr(Table2.Fname, 1,3) || substr(Table2.Lname, 1 , 3))
ORDER BY Table1.Fname;
I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your #variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(#name, Name) AND
Nickname = COALESCE(#nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for #name and null for #nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The coalesce operator turns the null into an identity evaluation, which is always true and doesn't affect the where clause.
Search and normalization can be at odds with each other. So probably first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row with a single key getting you the resume. then you can throw something like Lucene in front of that to give you a full text index of those rows, the way that works is, you ask it for "x" in this view and it returns to you the key. Its a great solution and come recommended by joel himself on the podcast within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example lets imagine a Resume that will composed of several fields:
List item
Name,
Adreess,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your framework of reference and think of the Whole resume as what it is a Single Document and you just want to search said document.
This is where tools like Sphinx Search do. They create a FULL TEXT index of your 'document' and then you can query sphinx and it will give you back where in the Database that record was found.
Really good search results.
Don't worry about this tools not being part of your RDBMS it will save you a lot of headaches to use the appropriate model "Documents" vs the incorrect one "TABLES" for this application.