Full text search on normalized database - sql

I have a normalized SQL Server 2005 database. An example table looks something like this:
This is abbreviated. However, the normal query syntax simply uses joins to show the location as city state zip and the name of the customer and so on.
I would like to implement full-text search on those values. So if LocationID = 43, which is Phoenix AZ, I would like the user to be able to search for 'Phoenix' or 'AZ' and get the associated rows. Similarly, if they search for 'Smith Phoenix', they should get all orders for a customer with a name similar to Smith in Phoenix.
My question is: should I use a view or a UDF to build a table that replaces the value 43 with 'Phoenix AZ', and then implement full-text search from there?
How do I implement fulltext search on a normalized database?

You need to add the full text index on the table that has the string values. Then use CONTAINS or FREETEXT along with your joins.
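A minimal sketch of the "search the string tables, then join" idea, written in Python against an in-memory SQLite database so it can be run anywhere. The table and column names (Orders, Customers, Locations, Name, City, State) are assumptions, since the original table is not shown, and a plain LIKE stands in for the full-text predicate. On SQL Server each LIKE condition would instead be a CONTAINS or FREETEXT predicate on the full-text indexed columns.

```python
import sqlite3


def build_demo_db():
    """Create a small normalized schema (hypothetical names) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Locations (LocationID INTEGER PRIMARY KEY, City TEXT, State TEXT);
        CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, LocationID INTEGER);
        INSERT INTO Customers VALUES (1, 'John Smith'), (2, 'Ann Jones');
        INSERT INTO Locations VALUES (43, 'Phoenix', 'AZ'), (44, 'Tucson', 'AZ');
        INSERT INTO Orders VALUES (100, 1, 43), (101, 2, 43), (102, 1, 44);
    """)
    return conn


def search_orders(conn, query):
    """Return order IDs where every search term matches the customer or the location."""
    sql = (
        "SELECT o.OrderID FROM Orders o "
        "JOIN Customers c ON c.CustomerID = o.CustomerID "
        "JOIN Locations l ON l.LocationID = o.LocationID "
    )
    terms = query.split()
    # One condition per term; LIKE is only a stand-in for CONTAINS/FREETEXT.
    conditions = ["(c.Name LIKE ? OR l.City LIKE ? OR l.State LIKE ?)" for _ in terms]
    # Each term is bound three times, once per column it is checked against.
    params = [f"%{term}%" for term in terms for _ in range(3)]
    if conditions:
        sql += "WHERE " + " AND ".join(conditions) + " "
    sql += "ORDER BY o.OrderID"
    return [row[0] for row in conn.execute(sql, params)]
```

With the sample rows, `search_orders(build_demo_db(), "Smith Phoenix")` only returns the order placed by Smith in Phoenix, because each term has to match somewhere in the joined row.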

Honestly, for something like this, I'd use Lucene.NET (assuming a .NET front end, or just Lucene for the back end). While you could search on each of those items, I've found that SQL Server full-text search is more of a pain than it's worth.
With Lucene, you create indexes when you add/edit/delete items in the DB, and then search those indexes (each item is a document with fields which you specify).
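To make the "document with fields" idea concrete, here is a toy in-memory inverted index in Python. It is not Lucene and does not use Lucene's API; the class name `TinyIndex` and the field names are made up for illustration. It only shows the workflow described above: re-index an item whenever it is added, edited or deleted in the database, then search the index rather than the tables.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: each item is a document made of named text fields."""

    def __init__(self):
        self.docs = {}                    # doc_id -> {field: text}
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids

    def add(self, doc_id, fields):
        """Index a new document, or re-index an edited one."""
        self.delete(doc_id)
        self.docs[doc_id] = fields
        for field, text in fields.items():
            for token in text.lower().split():
                self.postings[(field, token)].add(doc_id)

    def delete(self, doc_id):
        """Remove a document from the index; unknown ids are ignored."""
        fields = self.docs.pop(doc_id, None)
        if fields is None:
            return
        for field, text in fields.items():
            for token in text.lower().split():
                self.postings[(field, token)].discard(doc_id)

    def search(self, query):
        """Return sorted doc ids where every term occurs in at least one field."""
        result = None
        for token in query.lower().split():
            matches = set()
            for (_field, indexed_token), ids in self.postings.items():
                if indexed_token == token:
                    matches |= ids
            result = matches if result is None else result & matches
        return sorted(result or [])
```

For example, after `idx.add(1, {"customer": "John Smith", "location": "Phoenix AZ"})`, a search for `"Smith Phoenix"` finds document 1 because each term matches one of its fields. A real Lucene index adds analysis, scoring and on-disk storage on top of this basic structure.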


Select normalized strings

I have a table of company names that appear to have been entered through a free-text box. As a result, many companies end up with 3-5 entries, such as A Good Company, A Good Company LLC, AA Good Company, etc.
I know that if I were looking for one company I could use LIKE with % wildcards to get all the variations, but I would like to insert them into a new company table with just one row covering all the variations, so that I can use it as a reference table going forward. Is there a way to do this within SQL, or in an outside application for that matter?

How to re-rank documents based on their attributes rather than just their field relevance?

I'm trying to use Solr to re-rank document results based on relevance to the user who is searching. For example, if I search for joann*, this could return documents where the Name field is anything from joanna to joanne. What I'm trying to do is also favor documents that match on certain attributes I have, for example both of us having the field Location = "NYC".
So my question is twofold: is there a way to grab and handle a user's information when they make a query, and is there a way to re-rank based on these additional field values? Would this look more like writing some code or just an expanded query?
It looks to me like you are describing exactly the functionality that Query Re-Ranking provides. Did you check that out?

Solr: Search in multiple fields BUT STOP if documents match was found

I want to search in multiple fields in Solr.
(I know the concept of copy fields and I know the (e)dismax search handler.)
So I have an ordered list of fields that I want the terms to be searched against.
1.) SKU
2.) Name
3.) Description
4.) Summary
and so on.
Now, when the query matches a term, let's say in the SKU field, I want this match and no further searches in the subsequent fields.
Only, if there are NO matches at all in the first field (SKU field), the second field (in this case "name") should be used and so on.
Is this possible with Solr?
Do I have to implement my own Lucene Search Handler for this?
Any advice is welcome!
Thank you,
I think your case requires executing 4 different searches. If you implement your very own SearchHandler, you could avoid the penalty of accumulating search results across 4 different requests. That is, you would send one query, and the custom SearchHandler would execute the 4 searches and prepare one result set.
If my guess is right, you want to rank the results based on the order of the fields. If so, then you can just use a standard query like
q=sku:(query)^4 OR name:(query)^3 OR description:(query)^2 OR summary:(query)
this will rank the results by the order of the fields.
Hope it helps.
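The two approaches from the answers above can be sketched on the client side in Python. This is only an illustration under assumptions: `run_query` is a hypothetical callable that sends a query string to Solr and returns a list of hits, and the term is inserted as-is without any escaping of Solr query syntax.

```python
def boosted_query(term, fields):
    """Build one query that ranks earlier fields higher via descending boosts."""
    total = len(fields)
    parts = []
    for position, field in enumerate(fields):
        boost = total - position
        clause = f"{field}:({term})"
        if boost > 1:
            # The last field keeps the default boost, as in the query above.
            clause += f"^{boost}"
        parts.append(clause)
    return " OR ".join(parts)


def first_matching_field(term, fields, run_query):
    """Query the fields in priority order and stop at the first one with hits.

    run_query is a hypothetical function: query string in, list of hits out.
    """
    for field in fields:
        hits = run_query(f"{field}:({term})")
        if hits:
            return field, hits
    return None, []
```

`boosted_query` always costs one request but mixes matches from all fields into one ranked list, whereas `first_matching_field` gives the strict "stop at the first field that matches" behaviour at the price of up to one request per field.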

Apache SOLR search by category

I am using apache-solr-1.4.1 and jdk1.6.0_14.
I have the following scenario.
I have 3 categories of data indexed in SOLR i.e. CITIES, STATES, COUNTRIES.
When I query data from SOLR I need the search result from SOLR based on the following criteria:
In a single query to SOLR I need data fetched from SOLR grouped by each category with a predefined results count for each category.
How can I specify this condition in SOLR?
I have tried to use SOLR Field Collapsing feature, but I am not able to get the desired output from SOLR.
Please suggest.
My solution is not exactly what you have asked for, but it is my take on what SOLR does best, which is full-text search. Instead of grouping the results by "category", I'd suggest you order the results by relevance score but also provide a facet count for the category values. In my experience users expect a "search" to behave like Google, with the best matches at the top. Deviating from this norm confuses the user in most cases.
If you want exactly what you have asked for (actual results grouped by category), then you could use a relational database and do a GROUP BY, or write a custom function query with SOLR (I cannot advise on this as I've never done it).
More info: index the data with the appropriate fields, e.g. name, population, etc. But also add a field called "category", which would have a value of either CITIES, STATES or COUNTRIES. Then perform a standard SOLR search, which will return results in order of relevance, i.e. best matches at the top. As part of the request, you can specify facet.field=category, which will return counts of the search results for each of the given categories (in the "facet" results section). In the UI you can then create a link for each category facet that performs the original search plus &fq=category:CITIES, etc., thus restricting results to just that category. See the faceting overview on the SOLR wiki for more info.
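As a rough illustration of the request described above, the following Python helper assembles the query string for a faceted search and, optionally, for the drill-down link of one category. The function name and the `rows` default are assumptions; the field name `category` comes from the answer. Only the query string is built here, so the host and handler path of the Solr instance are left out.

```python
from urllib.parse import urlencode


def facet_search_params(user_query, category=None, rows=10):
    """Build the query string for a relevance-ordered search with category facets."""
    params = [
        ("q", user_query),
        ("rows", str(rows)),
        ("facet", "true"),            # switch faceting on
        ("facet.field", "category"),  # count hits per category value
    ]
    if category is not None:
        # Drill-down link: same search, restricted to a single category.
        params.append(("fq", f"category:{category}"))
    return urlencode(params)
```

Calling `facet_search_params("paris")` gives the query string for the initial search, and `facet_search_params("paris", category="CITIES")` gives the one behind the "CITIES" facet link.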

How to design a database table structure for storing and retrieving search statistics?

I'm developing a website with a custom search function and I want to collect statistics on what the users search for.
It is not a full text search of the website content, but rather a search for companies with search modes like:
by company name
by area code
by provided services
How to design the database for storing statistics about the searches?
What information is most relevant and how should I query for them?
Well, it's dependent on how the different search modes work, but generally I would say that a table with 3 columns would work:
SearchType SearchValue Count
Whenever someone does a search, say they search for "Company Name: Initech", first query to see if there are any rows in the table with SearchType = "Company Name" (or whatever enum/id value you've given this search type) and SearchValue = "Initech". If there is already a row for this, UPDATE the row by incrementing the Count column. If there is not already a row for this search, insert a new one with a Count of 1.
By doing this, you'll have a fair amount of flexibility for querying it later. You can figure out what the most popular searches for each type are:
... WHERE SearchType = 'Some Search Type' ORDER BY Count DESC
You can figure out the most popular search types:
... GROUP BY SearchType ORDER BY SUM(Count) DESC
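The update-or-insert logic and the two reporting queries can be sketched as follows, using Python with an in-memory SQLite database purely so the example is runnable. The table name `SearchStats` and the function names are assumptions; the three columns are the ones suggested above.

```python
import sqlite3


def open_stats_db():
    """Create the three-column statistics table (the table name is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SearchStats ("
        " SearchType TEXT NOT NULL,"
        " SearchValue TEXT NOT NULL,"
        " Count INTEGER NOT NULL,"
        " PRIMARY KEY (SearchType, SearchValue))"
    )
    return conn


def record_search(conn, search_type, search_value):
    """Increment the counter for this search, inserting the row on first use."""
    cur = conn.execute(
        "UPDATE SearchStats SET Count = Count + 1"
        " WHERE SearchType = ? AND SearchValue = ?",
        (search_type, search_value),
    )
    if cur.rowcount == 0:
        # No existing row was updated, so this is the first such search.
        conn.execute(
            "INSERT INTO SearchStats (SearchType, SearchValue, Count) VALUES (?, ?, 1)",
            (search_type, search_value),
        )
    conn.commit()


def top_searches(conn, search_type, limit=10):
    """Most popular search values for one search type."""
    return conn.execute(
        "SELECT SearchValue, Count FROM SearchStats"
        " WHERE SearchType = ? ORDER BY Count DESC, SearchValue LIMIT ?",
        (search_type, limit),
    ).fetchall()


def top_search_types(conn):
    """Search types ordered by their total number of searches."""
    return conn.execute(
        "SELECT SearchType, SUM(Count) AS Total FROM SearchStats"
        " GROUP BY SearchType ORDER BY Total DESC, SearchType"
    ).fetchall()
```

Under concurrent load the separate UPDATE and INSERT steps can race, so a production version would likely wrap them in a transaction or use whatever upsert statement the target database offers.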
This is a pretty general question but here's what I would do:
Option 1
If you want to strictly separate all three search types, then create a table for each. For company name, you could simply store the CompanyID (assuming your website is maintaining a list of companies) and a search count. For area code, store the area code and a search count. If the area code doesn't exist, insert it. Provided services is most dependent on your setup. The most general way would be to store key words and a search count, again inserting if not already there.
Optionally, you could store search date information as well. As an example, you'd have a table with Provided Services Keyword and a unique ID. You'd have another table with an FK to that ID and a SearchDate. That way you could make sense of the data over time while minimizing storage.
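The keyword table plus dated search table from Option 1 might look like this. It is a sketch in Python with an in-memory SQLite database; the table names `ServiceKeyword` and `ServiceKeywordSearch`, the function names, and the choice of ISO date strings for SearchDate are all assumptions made for the example.

```python
import sqlite3


def open_keyword_db():
    """Create a keyword table and a per-search log table that points at it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE ServiceKeyword (
            KeywordID INTEGER PRIMARY KEY,
            Keyword TEXT NOT NULL UNIQUE
        );
        CREATE TABLE ServiceKeywordSearch (
            KeywordID INTEGER NOT NULL REFERENCES ServiceKeyword(KeywordID),
            SearchDate TEXT NOT NULL
        );
    """)
    return conn


def log_keyword_search(conn, keyword, search_date):
    """Store the keyword once, then add one dated row per search."""
    conn.execute("INSERT OR IGNORE INTO ServiceKeyword (Keyword) VALUES (?)", (keyword,))
    (keyword_id,) = conn.execute(
        "SELECT KeywordID FROM ServiceKeyword WHERE Keyword = ?", (keyword,)
    ).fetchone()
    conn.execute(
        "INSERT INTO ServiceKeywordSearch (KeywordID, SearchDate) VALUES (?, ?)",
        (keyword_id, search_date),
    )
    conn.commit()


def searches_per_day(conn, keyword):
    """Number of searches for a keyword, grouped by date."""
    return conn.execute(
        "SELECT s.SearchDate, COUNT(*) FROM ServiceKeywordSearch s"
        " JOIN ServiceKeyword k ON k.KeywordID = s.KeywordID"
        " WHERE k.Keyword = ? GROUP BY s.SearchDate ORDER BY s.SearchDate",
        (keyword,),
    ).fetchall()
```

Because the keyword text is stored only once and each search adds just an ID and a date, the log stays compact while still allowing counts over any time range.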
Option 2
Treat all searches the same. One table with a Keyword column and a count column, incorporating SearchDate if needed.