SQL query for a search engine

My project is based on questions and answers (Stack Overflow style).
I need to allow users to search previously asked questions.
The Questions table would be like this:
Questions
-------------------------------------------------
id | questions
-------------------------------------------------
1  | How to cook pasta?
2  | How to Drive a car?
3  | When did Napoleon die?
Now, when I search for something, I would type something like this:
When did Brazil win the world cup?
Let's say I split this string on spaces into an array of strings.
What is the best SELECT query to fetch all questions containing those strings, ignoring case for each word, and sorting the results by the least frequently mentioned word? Why sort that way?
Because there will be many questions containing When, and, will, how, etc., but not many questions containing Brazil, so Brazil would be the key word.
I'm using SQL Server 2008.
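The ordering asked for here (rarest matching word first) can be sketched outside the database with a small inverted index. This is a minimal in-memory Python illustration under stated assumptions, not SQL Server code: `tokenize`, `build_index` and `search` are hypothetical helpers, words are lower-cased and split on non-alphanumeric characters (which gives the case-insensitive match), and rows 4 and 5 are made-up sample data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and return its words (ASCII letters and digits)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(questions):
    """Map each word to the set of question ids whose text contains it."""
    index = defaultdict(set)
    for qid, text in questions.items():
        for word in tokenize(text):
            index[word].add(qid)
    return index


def search(query, index):
    """Return ids of questions sharing at least one word with the query.

    Results are ordered by the rarest query word each question contains
    (the word found in the fewest questions first), then by how many query
    words it matches, then by id.
    """
    best = {}  # qid -> (document frequency of rarest matched word, matches)
    for word in set(tokenize(query)):
        ids = index.get(word)
        if not ids:
            continue  # the word appears in no question
        df = len(ids)
        for qid in ids:
            rarest, count = best.get(qid, (df, 0))
            best[qid] = (min(rarest, df), count + 1)
    return sorted(best, key=lambda qid: (best[qid][0], -best[qid][1], qid))


# Rows 1-3 come from the question; rows 4 and 5 are made-up sample data.
QUESTIONS = {
    1: "How to cook pasta?",
    2: "How to Drive a car?",
    3: "When did Napoleon die?",
    4: "When did Rome fall?",
    5: "Which players were born in Brazil?",
}

INDEX = build_index(QUESTIONS)
print(search("When did Brazil win the world cup?", INDEX))  # [5, 3, 4]
```

The Brazil question comes first even though it matches only one word, because "brazil" appears in fewer questions than "when" or "did". The full-text engines recommended in the answers typically weigh rare terms more heavily in their ranking, so they cover this at scale.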

You really don't want to be doing this in raw SQL.
I suggest you look into the full-text search options for your database; this might be a good place to start.

In MySQL you have full-text indexes and the MATCH() ... AGAINST() function, which allow exactly this;
in SQL Server you should use the CONTAINS() predicate.
Find more info at:
http://msdn.microsoft.com/en-us/library/ms142571.aspx
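As a rough sketch of turning the user's typed words into something CONTAINS or CONTAINSTABLE can use: the Python below builds an OR-ed search condition and pairs it with a query ordered by the RANK column that CONTAINSTABLE returns. It assumes a full-text index already exists on Questions.questions with `id` as its key, and a DB-API driver using `?` placeholders (such as pyodbc); `contains_condition` and `SEARCH_SQL` are hypothetical names.

```python
import re

# CONTAINSTABLE returns [KEY] (the full-text key value) and [RANK] columns.
# Assumes Questions.id is the key of the full-text index on Questions.questions,
# and a DB-API driver with "?" (qmark) parameters, e.g. pyodbc.
SEARCH_SQL = """
SELECT q.id, q.questions, ft.[RANK]
FROM Questions AS q
JOIN CONTAINSTABLE(Questions, questions, ?) AS ft ON q.id = ft.[KEY]
ORDER BY ft.[RANK] DESC;
"""


def contains_condition(user_text):
    """Turn free text into a CONTAINS/CONTAINSTABLE search condition.

    Each distinct word becomes a double-quoted simple term and the terms are
    joined with OR. Only word characters are kept, so quotes, parentheses or
    operators typed by the user cannot change the structure of the condition.
    """
    words = dict.fromkeys(re.findall(r"\w+", user_text))  # dedupe, keep order
    if not words:
        raise ValueError("search text contains no words")
    return " OR ".join(f'"{word}"' for word in words)


print(contains_condition("When did Brazil win the world cup?"))
# "When" OR "did" OR "Brazil" OR "win" OR "the" OR "world" OR "cup"
```

With pyodbc this would be run as `cursor.execute(SEARCH_SQL, contains_condition(text))`. Passing the condition as a parameter, rather than splicing it into the SQL text, keeps user input out of the statement itself.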

Your approach is not the best one. Take a look at the open-source Apache Solr project. http://lucene.apache.org/solr/
Solr is a standalone enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
Advanced Full-Text Search Capabilities
Optimized for High Volume Web Traffic
Standards Based Open Interfaces - XML, JSON and HTTP
Comprehensive HTML Administration Interfaces
Server statistics exposed over JMX for monitoring
Linearly scalable, auto index replication, auto failover and recovery
Near Real-time indexing
Flexible and Adaptable with XML configuration
Extensible Plugin Architecture
Take a look at the Detailed Features page, especially the Query section. Everything you need for your app is there.
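To make "query it via HTTP GET" concrete, here is a minimal Python sketch that builds a Solr select URL and reads the JSON response. The host, port, and core name are assumptions. 8983 is Solr's usual default port, but which field `q` searches depends on your schema. `solr_select_url` and `solr_search` are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr address; the host, core name and schema (which field "q"
# searches by default) depend on your installation.
SOLR_BASE = "http://localhost:8983/solr"


def solr_select_url(user_text, rows=10, core=None):
    """Build a select-handler URL asking Solr for JSON results."""
    path = f"{SOLR_BASE}/{core}/select" if core else f"{SOLR_BASE}/select"
    params = {"q": user_text, "rows": rows, "wt": "json"}
    return f"{path}?{urlencode(params)}"


def solr_search(user_text, rows=10, core=None):
    """Run the query over HTTP GET and return the matching documents."""
    with urlopen(solr_select_url(user_text, rows, core)) as response:
        return json.load(response)["response"]["docs"]


print(solr_select_url("When did Brazil win?", core="questions"))
# http://localhost:8983/solr/questions/select?q=When+did+Brazil+win%3F&rows=10&wt=json
```

Solr handles tokenizing, case folding and relevance ranking server-side, so the application only has to URL-encode the user's text.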

Related

FILESTREAM/FILETABLE Clarifications for Implementation

Recently our team was looking at FILESTREAM to expand the capabilities of our proprietary application. The main purpose of this app is managing the various PDFs, images and documents for all of the parts we manufacture. Our ASP application uses a few third-party tools to allow viewing of these files. We currently have 980GB of data on the file server. We also have around 200GB of binary data in SQL Server that we would like to extract, since it is not performing well, so FILESTREAM seems to be a good compromise between the two major data storage/access issues.
A few things are not exactly clear to us:
1. Can FILESTREAM store its data on a drive that is not locally attached? We already have a file server with RAID 10 (1.5TB drives). This server stores all of the documents right now; would we have to move these drives to the SQL Server for FILESTREAM? That would be a tough pill to swallow, since the server is also doubling as the application server (two VMs on one physical server).
2. FILETABLE stores the common metadata about the files, but where is the full-text part stored to allow searching files such as .doc/.docx? Is it separate? Can we freely add our own criteria to search by? If so, any links that clarify this would be appreciated.
3. Can a FILETABLE be referenced from another table with a foreign key?
Thank you in advance
EDIT: For those having these questions, this video covers everything and more about FILESTREAM from 2008 to 2012 and the caveats to consider (I would seriously rep him if I could): http://channel9.msdn.com/Events/TechDays/Techdays-2012-the-Netherlands/2270
In conclusion, we will not be using FILESTREAM, as accommodating it would require far too large an investment.
EDIT 2:
Update to #1 - After carefully assessing FileTable in addition to FILESTREAM, we found a winning combination. We did have to move the files over to the new server (it wasn't too painful, since they were on the same VM). Honestly, it took more time to write an extraction tool to dump the binary data from SQL Server to the file system.
Update to #2 - This is separate, but again Bob had an excellent webinar explaining it: http://channel9.msdn.com/Events/TechEd/Europe/2012/DBI411
Update to #3 - Using TFT inheritance, we recycled our existing Docs table (minus the huge binary blobs), which required very few changes in our legacy apps. This was a huge win for the developer team.
The location where FileTable files are stored has to be local, or at least must appear local to SQL Server, so a clever SAN driver might trick it. Since FileTables are built on FILESTREAM, I imagine the limitations are the same.
FileTables are searched via the CONTAINSTABLE function, which is documented on MSDN; AFAIK the search criteria use the same syntax as full-text search.
For all intents and purposes, a FileTable is a typical table, so it can be joined, searched, or whatever. The only catch is that you have to use some SQL Server functions to turn the FILESTREAM GUIDs into something more useful, like a file path.

Grails app: own query language for data stored in DB and files + full text search (Hibernate Search, Compass etc)

I have an application which stores short descriptive data in a DB and lots of related textual data in text files.
I would like to add "advanced search" for the DB. I was thinking about adding my own query language, like JIRA does (JIRA Query Language). Then I thought about having full-text search across those text files (lower priority).
I'm wondering which tool would suit me better, so I can implement this faster and more simply.
Most of all, I want to give users the ability to write their own queries instead of using form elements to specify search filters.
Thanks
UPD. I store dates in the DB, and most varchar fields contain one-word strings.
UPD2. Apache Derby is used right now.
Take a look at the Searchable plugin for Grails.
http://www.grails.org/plugin/searchable
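If you do go the route the question considers, writing your own JQL-style language for the DB fields, the core is a small translator from the user's query to a parameterized WHERE clause. A minimal Python sketch follows. The field names, the `~` "contains" operator (borrowed from JQL), and the helpers `tokenize` and `to_sql` are all hypothetical. The `?` placeholders match JDBC PreparedStatement parameters, which Derby supports.

```python
import re

# Hypothetical whitelist: query-language field -> database column.
FIELDS = {"status": "status", "author": "author", "created": "created_at", "title": "title"}

# A token is a "quoted value", a comparison operator, or a bare word/value.
TOKEN = re.compile(r'\s*(?:"([^"]*)"|(!=|<=|>=|=|~|<|>)|([\w.-]+))')


def tokenize(query):
    """Split a query into (kind, text) tokens; reject anything unrecognised."""
    query, pos, tokens = query.strip(), 0, []
    while pos < len(query):
        match = TOKEN.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        quoted, op, word = match.groups()
        if quoted is not None:
            tokens.append(("value", quoted))
        elif op is not None:
            tokens.append(("op", op))
        else:
            tokens.append(("word", word))
        pos = match.end()
    return tokens


def to_sql(query):
    """Translate e.g. 'status = open AND title ~ "disk"' into SQL.

    Grammar: clause ((AND | OR) clause)*, where a clause is: field op value.
    There are no parentheses; the output keeps SQL's own precedence (AND
    before OR). Returns (where_clause, params): columns come only from
    FIELDS and values become "?" parameters, never SQL text.
    """
    tokens, sql, params, i = tokenize(query), [], [], 0
    while True:
        if i + 3 > len(tokens):
            raise ValueError("expected: field operator value")
        (k1, field), (k2, op), (k3, value) = tokens[i:i + 3]
        if k1 != "word" or k2 != "op" or k3 == "op":
            raise ValueError(f"malformed clause at token {i}")
        column = FIELDS.get(field.lower())
        if column is None:
            raise ValueError(f"unknown field: {field}")
        if op == "~":  # "contains"; % and _ in the value still act as wildcards
            sql.append(f"{column} LIKE ?")
            params.append(f"%{value}%")
        else:
            sql.append(f"{column} {op} ?")
            params.append(value)
        i += 3
        if i == len(tokens):
            return " ".join(sql), params
        kind, joiner = tokens[i]
        if kind != "word" or joiner.upper() not in ("AND", "OR"):
            raise ValueError(f"expected AND or OR, got {joiner!r}")
        sql.append(joiner.upper())
        i += 1


print(to_sql('status = open AND title ~ "disk full"'))
# ('status = ? AND title LIKE ?', ['open', '%disk full%'])
```

Whitelisting field names is the key design choice here. Users can write free-form queries, but they can only ever reach the columns you expose.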

Would server traffic software (something like Piwik or Google) make a good case for using NoSQL?

We are trying to develop company-specific tracking software, but we are not interested in Google or Piwik. Essentially, we would also have JavaScript tracking code. Would the data it captures be best suited to a traditional RDBMS, or could we use a NoSQL solution?
Any thoughts or ideas welcome.
Creating XML files could do the trick for a NoSQL solution, but web analytics can encompass a very large amount of data depending on your "tracking software". You'll need some sort of relational data solution if you want to properly analyse the data and see trends, such as how many unique visitors are using a specific browser.
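The "unique visitors per browser" example is a single aggregate query once the tracker's hits are stored relationally. Here is a minimal sketch using Python's built-in sqlite3. The `page_views` schema and the sample rows are hypothetical, assuming the JavaScript tracker sends one row per page view with a visitor id.

```python
import sqlite3

# Hypothetical schema: one row per page view sent by the JavaScript tracker.
SCHEMA = """
CREATE TABLE page_views (
    visitor_id TEXT NOT NULL,   -- e.g. a cookie value set by the tracker
    browser    TEXT NOT NULL,
    url        TEXT NOT NULL,
    viewed_at  TEXT NOT NULL    -- ISO-8601 timestamp
)
"""


def unique_visitors_by_browser(conn):
    """Count distinct visitors per browser, most popular browser first."""
    return conn.execute(
        """
        SELECT browser, COUNT(DISTINCT visitor_id) AS visitors
        FROM page_views
        GROUP BY browser
        ORDER BY visitors DESC, browser
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("v1", "Firefox", "/", "2012-05-01T10:00"),
        ("v1", "Firefox", "/about", "2012-05-01T10:01"),
        ("v2", "Chrome", "/", "2012-05-01T10:05"),
        ("v3", "Chrome", "/pricing", "2012-05-01T11:00"),
        ("v4", "IE", "/", "2012-05-02T09:00"),
    ],
)
print(unique_visitors_by_browser(conn))
# [('Chrome', 2), ('Firefox', 1), ('IE', 1)]
```

Note that visitor v1 viewed two pages but is counted once, which is exactly what COUNT(DISTINCT ...) provides.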

How best to develop the SQL to support search functionality in a web application?

As with many (business) web applications, the customer wants a form that will search across each data field. The form could have 15-20 different fields where the user can select or enter values to be used in the SQL (a stored procedure).
These are quite typical user requests that almost every application has to deal with.
The real issue at hand is how to provide the user with this type of interface AND still get fast SQL access. The fields above could span 15 different tables, and the respective SQL statements (usually abstracted into a stored procedure) will have as many joins. The data always has to be brought back to a grid-type view as well as some report format (often Excel).
I/we are finding that these SQL statements are slow and hard to optimize, as the user can enter anywhere from 1 to 15 different search criteria.
How should this be done? Looking for suggestions/ideas as to how existing large applications deal with these requirements.
Does it really come down to trying to optimize the sql within the stored procedure?
thx
No, you need to employ a real search engine technology to make fulltext search have good performance. No SQL predicate (e.g. LIKE '%pattern%') is going to be scalable.
You don't identify which brand of RDBMS you're using, but every major brand of RDBMS has its own full-text search capability:
Microsoft SQL Server: Full-Text Search
Oracle: Oracle Text (formerly ConText)
MySQL: FULLTEXT index (MyISAM only before MySQL 5.6, which added InnoDB support)
PostgreSQL: Text-Search data types and index types
SQLite: Full-Text Search (FTS)
IBM DB2: Text Search
There are also third-party solutions for indexing text, such as:
Apache Lucene / Solr
Sphinx Search
Xapian
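The answer above targets free-text matching. For the exact-match and range fields on such a form, one common approach (not from the answer) is to add a predicate only for the criteria the user actually filled in. That way each combination of criteria becomes its own simple query instead of one catch-all statement. A minimal Python/sqlite3 sketch follows, with a hypothetical `orders`/`customers` schema and field names. In SQL Server the same idea is usually implemented inside the stored procedure by building the statement and running it with sp_executesql, so that each combination gets its own cached plan.

```python
import sqlite3

# Hypothetical whitelist: search-form field -> predicate with a "?" placeholder.
FILTERS = {
    "customer": "c.name = ?",
    "status": "o.status = ?",
    "min_total": "o.total >= ?",
    "max_total": "o.total <= ?",
}

BASE_SQL = "SELECT o.id FROM orders AS o JOIN customers AS c ON c.id = o.customer_id"


def build_search(criteria):
    """Return (sql, params) filtering only on the fields the user filled in."""
    where, params = [], []
    for field, predicate in FILTERS.items():
        value = criteria.get(field)
        if value is not None and value != "":  # blank form fields are skipped
            where.append(predicate)
            params.append(value)
    sql = BASE_SQL
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql + " ORDER BY o.id", params


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         status TEXT, total REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 'open', 250.0), (11, 1, 'closed', 80.0),
                              (12, 2, 'open', 40.0);
    """
)
sql, params = build_search({"customer": "Acme", "min_total": 100})
print(sql)
print(conn.execute(sql, params).fetchall())  # [(10,)]
```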

SQL Server Full-Text Search - Getting word occurrences/locations in text?

Suppose I have SQL Server (2005/2008) create a full-text index on one of my tables.
I wish to use my own custom search engine (one a little more tuned to my needs than Full-Text Search).
To use it, however, I need SQL Server to provide me with the word positions and other data required by the search engine.
Is there any way to query the "index" for this data instead of just getting search results?
Thanks
Roey
No. And if you could, what happens if Microsoft decides to change their internal data structures? Your code would break.
What are you trying to achieve?
You shouldn't rely on SQL Server's internal data structures; they are tailored specifically for SQL Server's use and aren't accessible for querying anyway.
If you want a fast indexer then you will probably have more success using a pre-written one rather than trying to write your own. Give Lucene.Net a try.
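If you do keep a custom engine, it can compute the word positions it needs from the table's own text instead of reading SQL Server's internal structures. A minimal Python sketch follows, assuming rows arrive as (key, text) pairs read with an ordinary SELECT. `positional_index` is a hypothetical helper and the rows are made up. Engines such as Lucene.Net typically maintain this kind of positional data for you.

```python
import re
from collections import defaultdict


def positional_index(rows):
    """Map each lower-cased word to {row_id: [0-based word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for row_id, text in rows:
        for position, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][row_id].append(position)
    return index


# Made-up rows standing in for (key, text) pairs read from the table.
ROWS = [
    (1, "How to cook pasta?"),
    (2, "How to drive a car? Drive slowly."),
]
INDEX = positional_index(ROWS)
print(dict(INDEX["drive"]))  # {2: [2, 5]}
```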