How to handle a "keyword search" via Stored Procedure? - sql-server-2005

I'm creating a self-help FAQ type application and one of the requirements is that the end user has to be able to search for FAQ topics. I have three models of note, listed below with their relevant (i.e. searchable) columns:
Topic: Name, Description
Question: Name, Answer
Problem: Name, Solution
All three tables are linked to Topic via a TopicID column. The idea is to provide a single textbox where the user can enter a search query, either as a sentence (e.g. "How do I perform X") or a phrase (e.g. "Performing X" or "Perform X"), and return all Topics/Questions/Problems that contain any of the entered words in either the name or the description/answer/solution field. Only those columns will ever be searchable, and I don't have to worry about filtering out common words like "How" (it would be nice but isn't a requirement, as this is a fuzzy match rather than an exact match).
For reasons outside of my control, I have to use a Stored Procedure. My question is what the most appropriate way to handle a search like this would be. I've seen similar questions regarding multiple columns, but here the number of columns is not variable: there are always exactly two searchable columns per table. The issue is that the search query could, in theory, be nearly anything: a sentence, a phrase, or a comma-separated list of terms (e.g. "x,y,z"). So would I have to split the search query into components (e.g. split on whitespace) and then search each pair of columns for every term? Is that reasonably easy to do in SQL Server? The alternative, which is a little messier, is to pull all the data back and then split the query and filter the results in the server-side code, as there shouldn't ever be that many items entered, but I would feel a little dirty doing something like that ;-)

I suggest creating a new full-text catalog and assigning the tables and columns to it. Make sure the catalog is updated at the right frequency for your needs.
You can then query the catalog using the FREETEXT predicate. It sounds like you need to match inflected forms such as the 'ing' suffix, so I suggest FREETEXT over CONTAINS in this case.
The search term can be a variable, so it is easy to fit into a stored procedure.
declare @token varchar(256);
select @token = 'perform';
select * from Problem
where freetext(Name, @token)
or freetext(Solution, @token);
--this will match 'perform' and 'performing'
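For comparison, the split-and-match approach described in the question can be sketched outside the database. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite table as a stand-in for SQL Server, plain LIKE matching instead of full-text search, and the helper names `split_terms` and `search_problems` plus the `ProblemID` column and sample rows are made up for the example.

```python
import re
import sqlite3

def split_terms(query):
    """Split a free-form query on whitespace and commas, dropping empties and duplicates."""
    terms = []
    for term in re.split(r"[\s,]+", query.strip()):
        term = term.lower()
        if term and term not in terms:
            terms.append(term)
    return terms

def search_problems(conn, query):
    """Return (ProblemID, Name) rows whose Name or Solution contains any term."""
    terms = split_terms(query)
    if not terms:
        return []
    # One "Name LIKE ? OR Solution LIKE ?" pair per term, all joined with OR.
    clause = " OR ".join(["Name LIKE ? OR Solution LIKE ?"] * len(terms))
    params = []
    for term in terms:
        # Terms are bound as parameters, never concatenated into the SQL text.
        params += [f"%{term}%", f"%{term}%"]
    sql = f"SELECT ProblemID, Name FROM Problem WHERE {clause} ORDER BY ProblemID"
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Problem (ProblemID INTEGER PRIMARY KEY, Name TEXT, Solution TEXT)")
conn.executemany(
    "INSERT INTO Problem (Name, Solution) VALUES (?, ?)",
    [
        ("Performing X", "Open the X dialog."),
        ("Printer offline", "Restart the spooler."),
        ("Resetting Y", "You can perform a reset from the menu."),
    ],
)
print(search_problems(conn, "perform"))
```

Very short terms such as "I" match almost everything with this approach, which is one reason the full-text route with its built-in noise-word handling is attractive.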

Related

SQL Server Efficient Search for LIKE '%str%'

In Sql Server, I have a table containing 46 million rows.
I want to search the "Title" column of this table. The search word may appear at any position in the field value.
For example:
Value in table: BROTHERS COMPANY
Search string: ROTHER
I want this search to match the given record. This is exactly what LIKE '%ROTHER%' does. However, LIKE '%str%' should not be used on large tables because of performance issues. How can I achieve this?
Though I don't know your requirements, your best approach may be to challenge them. Middle-of-the-string searches are usually not very practical. If you can get your users to perform prefix searches (broth%) then you can easily use Full Text's wildcard search (CONTAINS(*, '"broth*"')). Full Text can also handle suffix searches (%rothers) with a little extra work.
But when it comes to middle-of-the-string searches with SQL Server, you're stuck using LIKE. However you may be able to improve performance of LIKE by using a binary collation as explained in this article. (I hate to post a link without including its content but it is way too long of an article to post here and I don't understand the approach enough to sum it up.)
If that doesn't help and if middle-of-the-string searches are that important of a requirement then you should consider using a different search solution like Lucene.
You could also add a full-text index to the table.
You can search the table using CONTAINS:
SELECT *
FROM YourTable
WHERE CONTAINS(TableColumnName, 'SearchItem')

SQL Server Text Searching

I have a business requirement where we need to do some crazy name matching against records stored in the database, and I was wondering if there is an easy way to do it using SQL Server.
Name Stored in the DB : Austin K
Name to be Matched from UI : Austin Kierland
That's just a sample. In reality, there could be whole lot of different permutations and combinations.
If it were the other way round, I could have used a wildcard, but in this case the name in the database is shorter than the search criteria.
Any suggestions?
Realistically - no. Databases were meant for comparing absolute values, not for messy comparisons. The way they store their data internally just isn't fit for really messy matching. Actually even a superpowerful dedicated search engine like Google, that has a LOT of messy matching features, wouldn't be able to pull off your example without prior knowledge.
I don't know how the requirement is precisely worded, but I'd either shoot down the feature request as "technically impossible", or implement a rule set that defines which messy matches are tried. For your example, you could easily 'hard code' that multiple searches are executed when capitalized words are entered, shortening them to a single letter. No idea if that's a solution to your problem though.
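A rule like the one described above could be sketched as follows. This is only one possible rule, written in Python for illustration; the function name `name_variants` is hypothetical, and the choice to keep the first word whole while abbreviating later words is an assumption based on the "Austin K" example.

```python
def name_variants(full_name):
    """Return the name plus variants with the trailing words cut to their initial."""
    words = full_name.split()
    if not words:
        return []
    variants = [" ".join(words)]
    # Abbreviate from the last word backwards, always keeping the first word whole.
    for i in range(len(words) - 1, 0, -1):
        shortened = words[:i] + [w[0] for w in words[i:]]
        candidate = " ".join(shortened)
        if candidate not in variants:
            variants.append(candidate)
    return variants

print(name_variants("Austin Kierland"))
```

Each variant would then be run as its own exact-match or prefix search against the stored names, accepting that short variants produce more false positives.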
You can do a normal search using the LIKE operator, which determines whether a character string matches a specified pattern. The problem you will run into is the likelihood of returning multiple records or the wrong people. I've had a similar requirement myself for a business app, and the best solution is to require other qualifying values rather than just the name. If you do a partial name search without other qualifying data, you are certain to come across false positives and/or multiple records. In my case I built a web service that checks eligibility. It allows a text search on first and last name, but also requires date of birth, primary person SSN, and gender, which ensures that the matching person is indeed the person intended. If my situation were like yours, with name as the only search criterion, my recommendation to the business would be that we cannot perform the search until qualifying data is entered into the database; otherwise there is no accurate way to query for the results they are looking for.

Multiple searches within a search result set (stored procedure)

Multiple searches within a search result set while using all the search terms used in that session.
For example, I have a table User (UserId, UserName, UserAddress, UserCity)
What I am trying to do is search all the columns in the table, for example using a user's name (which might give me a result set with more than one result). I then want to be able to search within that result set again using a new search term (the first search term does not necessarily have to be in the search field), but this time it must search only within the result set of the first search. This might go on, narrowing down the result set until what is required is found.
Sorry if my request sounds confusing. I've tried and still have no clue where to start. I've tried googling and browsed through this website, but couldn't find what I am really looking for.
I want to be able to search within the result-set again using a new
search term [...], but this time, it must search within the result set
of the 1st search. This might go on breaking down the result-set until
what is required is found.
It seems to me that you have not yet understood that SQL is a declarative language, not an imperative one. And yes, there are stored procedures, but these are a procedural extension to SQL and don't alter the fact that SQL is essentially declarative.
So instead of "breaking down the result-set until what is required is found", you specify all criteria at once, and preferably do so without resorting to a stored procedure until you've understood non-procedural SQL.
To give you an example, a query using multiple predicates (facts about the desired result specified in a WHERE clause) might look like this:
SELECT UserId FROM [User]
WHERE UserName LIKE 'cook%'
AND UserAddress LIKE 'sesam%'
AND UserCity = 'Hamburg';
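Applied to the "search within results" scenario, the same idea means keeping the list of terms entered so far and issuing one query in which every term must match. A minimal sketch, assuming Python with an in-memory SQLite table in place of SQL Server; the function name `build_refined_query` and the sample rows are hypothetical:

```python
import sqlite3

SEARCH_COLUMNS = ("UserName", "UserAddress", "UserCity")

def build_refined_query(terms):
    """Build one SELECT in which every term entered so far must match some column."""
    per_term = "(" + " OR ".join(f"{col} LIKE ?" for col in SEARCH_COLUMNS) + ")"
    # No terms yet means no restriction at all.
    where = " AND ".join([per_term] * len(terms)) or "1 = 1"
    # One bound parameter per column per term, in placeholder order.
    params = [f"%{term}%" for term in terms for _ in SEARCH_COLUMNS]
    return f'SELECT UserId FROM "User" WHERE {where} ORDER BY UserId', params

def run_search(conn, terms):
    sql, params = build_refined_query(terms)
    return [row[0] for row in conn.execute(sql, params)]

# Hypothetical sample data, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "User" (UserId INTEGER PRIMARY KEY, UserName TEXT, '
             "UserAddress TEXT, UserCity TEXT)")
conn.executemany(
    'INSERT INTO "User" (UserName, UserAddress, UserCity) VALUES (?, ?, ?)',
    [
        ("cook", "Sesame Street 1", "Hamburg"),
        ("cooper", "Main Road 5", "Hamburg"),
        ("baker", "Sesame Street 9", "Berlin"),
    ],
)

session_terms = ["hamburg"]
print(run_search(conn, session_terms))   # first search
session_terms.append("sesam")
print(run_search(conn, session_terms))   # refined search, all terms applied at once
```

The application only has to remember the terms for the session; the database never needs to hold an intermediate result set.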

How do i include other fields in a lucene search?

Let's use emails as an example document. You have the subject, the body, the sender, and let's say we can also tag them (as Gmail does).
From my understanding of QueryParser, I give it ONE field and the parser type, and whatever the user enters is searched only in that field. I notice it will look in the subject or body field if I write fieldName: text to search. However, how do I make a regular query such as "funny SO question unicorn" find results with some of those strings in the subject and the others in the body? At the moment, because I knew it would be easy, I made a field called ALL and combined all the other fields into it, but I would like to know how to do this properly, especially since my next app depends on text search.
Use MultiFieldQueryParser. You can specify a list of fields to be searched using the following constructor.
MultiFieldQueryParser(Version matchVersion, String[] fields, Analyzer analyzer)
This will generate a query as if you have created multiple queries on different fields. This partially addresses your problem. This, still, will not match one term matching in field1 and another matching in field2. For this, as you have rightly pointed out, you will need to combine all the fields in one single field and search in that field. Nevertheless, you will find MultiFieldQueryParser useful when query terms do not cross the field boundaries.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. It makes testing convenient. Note that when you create your SQL queries in code, you should not concatenate user input into them. Use your @variables!
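Generating the search in code does not have to mean a wall of if statements. A minimal sketch in Python, assuming a hypothetical Resume table with Name, City and Skill columns: column names come from a fixed whitelist, and user-supplied values are only ever passed as parameters.

```python
# Hypothetical whitelist of searchable columns; anything else is rejected.
ALLOWED_COLUMNS = {"Name", "City", "Skill"}

def build_search(filters):
    """Turn {column: value} into (sql, params), skipping filters whose value is None."""
    clauses, params = [], []
    for column, value in sorted(filters.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsearchable column: {column}")
        if value is None:
            continue  # this field was left empty in the search form
        clauses.append(f"{column} = ?")
        params.append(value)  # bound as a parameter, never concatenated
    sql = "SELECT ResumeId FROM Resume"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

print(build_search({"City": "Hamburg", "Name": None, "Skill": "SQL"}))
```

Because the builder is a plain function returning the SQL text and the parameter list, each combination of fields can be unit tested without a database.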
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(@name, Name) AND
Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
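The COALESCE function turns the null parameter into a comparison of the column with itself, which is true for every row where that column is not NULL and so doesn't filter those rows. Rows where the column itself is NULL are dropped, because NULL = NULL does not evaluate to true.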
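The COALESCE pattern, and the concern raised in the question about NULL values in the data, can be tried out with a small experiment. This sketch assumes Python with an in-memory SQLite database in place of SQL Server (named parameters stand in for the stored procedure's variables), and the second query is a commonly used NULL-tolerant alternative rather than something from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Nickname TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("brian", "bri"), ("brian", None), ("anna", "ann")],
)

# The pattern from the answer: a NULL parameter compares the column with itself.
COALESCE_SEARCH = """
    SELECT Name, Nickname FROM Users
    WHERE Name = COALESCE(:name, Name)
      AND Nickname = COALESCE(:nick, Nickname)
"""

# Alternative that also keeps rows whose column value is NULL.
NULL_SAFE_SEARCH = """
    SELECT Name, Nickname FROM Users
    WHERE (:name IS NULL OR Name = :name)
      AND (:nick IS NULL OR Nickname = :nick)
"""

def search(sql, name=None, nick=None):
    return conn.execute(sql, {"name": name, "nick": nick}).fetchall()

# The brian row with a NULL nickname is missing from the first result
# because NULL = NULL is not true; the second query returns both rows.
print(search(COALESCE_SEARCH, name="brian"))
print(search(NULL_SAFE_SEARCH, name="brian"))
```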
Search and normalization can be at odds with each other. So the first step would probably be to create some kind of "view" that shows all the searchable fields as a single row, with a single key that gets you the resume. Then you can put something like Lucene in front of that view to give you a full-text index of those rows. The way that works is that you ask it for "x" in this view and it returns the key. It's a great solution and comes recommended by Joel himself on the podcast, within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume that is composed of several fields:
Name,
Address,
Education (this could be a table on its own) or
Work experience (this could grow into its own table, where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead, you could change your frame of reference and think of the whole resume as what it is, a single document, and you just want to search that document.
This is what tools like Sphinx Search do. They create a full-text index of your 'document'; you can then query Sphinx and it will tell you where in the database that record was found.
The search results are really good.
Don't worry about these tools not being part of your RDBMS. Using the appropriate model ("documents") instead of the incorrect one ("tables") for this application will save you a lot of headaches.