Can you perform an AND search of keywords using FREETEXT() on SQL Server 2005? - sql-server-2005

There is a request to make the SO search default to an AND style functionality over the current OR when multiple terms are used.
The official response was:
not as simple as it sounds; we use SQL Server 2005's FREETEXT() function, and I can't find a way to specify AND vs. OR -- can you?
So, is there a way?
There are a number of resources on it I can find, but I am not an expert.

As far as I've seen, it is not possible to do AND when using FREETEXT() under SQL 2005 (nor 2008, afaik).
A FREETEXT query ignores Boolean, proximity, and wildcard operators by design. However you could do this:
WHERE FREETEXT('You gotta love MS-SQL') > 0
AND FREETEXT('You gotta love MySQL too...') > 0
Or that's what I think :)
-- The idea is make it evaluate to Boolean, so you can use boolean operators. Don't know if this would give an error or not. I think it should work. But reference material is pointing to the fact that this is not possible by design.
The use of CONTAINS() instead of FREETEXT() could help.

OK, this change is in -- we now use CONTAINS() with implicit AND instead of FREETEXT() and its implicit OR.

I just started reading about freetext so bear with me. If what you are trying to do is allow searches for a tag, say VB, also find things tagged as VB6, Visual Basic, VisualBasic and VB.Net, wouldn't those values be set as synonyms in the DB's Thesaurus rather than query parameters?
If that is indeed the case, this link on MSDN explains how to add items to the Thesaurus.

Related

Pure SQL queries in Rails view?

I am asked to display some sort of data in my Rails App view with pure SQL query without help of ActiveRecord. This is done for the Application owner to be able to implement some third-party reporting tool (Pentaho or something).
I am bad in SQL and I am not really sure if that is even possible to do something similar. Does anyone have any suggestions?
If you must drop down to pure SQL, you can make life more pleasant by using 'find by sql':
bundy = MyModel.find_by_sql("SELECT my_models.id as id, my_articles.title as title from my_models, my_articles WHERE foo = 3 AND ... ...")
or similar. This will give you familiar objects which you can access using dot notation as you'd expect. The elements in the SELECT clause are available, as long as you use 'as' with compound parameters to give them a handle:
puts bundy.id
puts bundy.title
To find out what all the results are for a row, you can use 'attributes':
bundy.first.attributes
Also you can construct your queries using ActiveRecord, then call 'to_sql' to give you the raw SQL you're using.
sql = MyModel.joins(:my_article).where(id: [1,2,3,4,5]).to_sql
for example.
Then you could call:
MyModel.find_by_sql(sql)
Better, though, may be to just use ActiveRecord, then pass the result of 'to_sql' into whatever the reporting tool is that you need to use with it. Then your code maintains its portability and maintainability.

Using cast on a view in a Derby database that has multiple columns

This question is a quick one. I know that you could say something like SELECT CAST(someColumn AS otherType) FROM myView, where myView is a view in a Derby database, but is it combinable with other statements in a SELECT statement? In other words, could I say something like SELECT columnA, CAST(columnB AS otherType), someOtherColumns FROM myView, or would you have to cast YOUR ENTIRE QUERY? //I have been looking online for something that might help, but with no examples of this. I have even tried this on w3schools.com, but it was complaining that the database is read-only (somehow, cast() was changing the database!). Otherwise, I have found no working examples where someone has done something like what I am asking here.
I have found an answer to that question: yes, you can say something like I have said above in Derby. Here is an example: http://www.ibm.com/developerworks/library/os-ad-trifecta6/ //although I still couldn't use those methods on a database on w3schools.com!
You do not have to cast the entire freaking table; you could just cast one column at a time! Heck, in Derby, you could say DOUBLE(23), and it would return something like 23.000000!
I don't know what either Derby or w3schools.com means, but you need to understand that CAST is a function and all functions have syntax that is particular to whatever environment you are in. Many functions follow ANSI standard syntax but many others do not, especially in database languages. It doesn't make sense to use a function "on an entire query".
Here is a link to the Derby version 10.10 documentation for your DOUBLE function, a function that creates a floating point value (which is why you see it displayed as "23.000000"). When you get to that page, scroll down to the functions described in the section labeled "Built-in functions". You'll also see the documentation for the CAST function as well.
And note that although I've never heard of Derby before today, having access to the documentation like this means I'd fell comfortable developing code for it if the need ever came up.
In other words, always consult the documentation for the correct version of the database you are using!

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by #Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
You can also try the SubString(), to eliminate the first 3 or so characters . Below is an example of how that can be achieved
SELECT Fname, Lname
FROM Table1 ,Table2
WHERE substr(Table1.Fname, 1,3) || substr(Table1.Lname,1 ,3) = substr(Table2.Fname, 1,3) || substr(Table2.Lname, 1 , 3))
ORDER BY Table1.Fname;

Good SQL search tool?

FreeTextTable is really great for searching, as it actually returns a relevancy score for each item it finds.
The problem is, it doesn't support the logical operator AND, so if I have 10 items with the word 'ice' in it, but not 'cream', and vice versa, then 20 results will be returned, when in this scenario 0 should've been returned.
Are there any alternative tools to search a SQL Server database? Or should I just write my own code to provide 'AND' functionality (I.E. doing two seperate searches in the scenario 'Ice'Cream' (splitting each search by spaces))
You can try SQL Search from RedGate.
It is a free tool (though not open source) - I have used it before and it is very powerful.
There is also a free SQL Search tool from ApexSQL you can try. It integrates into SSMS and can also show relationship diagrams and help with safely removing/renaming objects in your database. They do require you to leave email but the product itself is completely free. ApexSQL Search
Since you have full text search enabled to use FREETEXTTABLE perhaps you could make use of CONTAINS instead? (I have to be honest, I've not used full text search myself).
It would appear you can query like this:
SELECT Name, Price FROM Product
WHERE CONTAINS(Name, 'ice')
AND CONTAINS(Name, 'cream')

Regular expression to match common SQL syntax?

I was writing some Unit tests last week for a piece of code that generated some SQL statements.
I was trying to figure out a regex to match SELECT, INSERT and UPDATE syntax so I could verify that my methods were generating valid SQL, and after 3-4 hours of searching and messing around with various regex editors I gave up.
I managed to get partial matches but because a section in quotes can contain any characters it quickly expands to match the whole statement.
Any help would be appreciated, I'm not very good with regular expressions but I'd like to learn more about them.
By the way it's C# RegEx that I'm after.
Clarification
I don't want to need access to a database as this is part of a Unit test and I don't wan't to have to maintain a database to test my code. which may live longer than the project.
Regular expressions can match languages only a finite state automaton can parse, which is very limited, whereas SQL is a syntax. It can be demonstrated you can't validate SQL with a regex. So, you can stop trying.
SQL is a type-2 grammar, it is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. Database engine in general is too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.
As far as I know this is beyond regex and your getting close to the dark arts of BnF and compilers.
http://savage.net.au/SQL/
Same things happens to people who want to do correct syntax highlighting. You start cramming things into regex and then you end up writing a compiler...
I had the same problem - an approach that would work for all the more standard sql statements would be to spin up an in-memory Sqlite database and issue the query against it, if you get back a "table does not exist" error, then your query parsed properly.
Off the top of my head: Couldn't you pass the generated SQL to a database and use EXPLAIN on them and catch any exceptions which would indicate poorly formed SQL?
Have you tried the lazy selectors. Rather than match as much as possible, they match as little as possible which is probably what you need for quotes.
To validate the queries, just run them with SET NOEXEC ON, that is how Entreprise Manager does it when you parse a query without executing it.
Besides if you are using regex to validate sql queries, you can be almost certain that you will miss some corner cases, or that the query is not valid from other reasons, even if it's syntactically correct.
I suggest creating a database with the same schema, possibly using an embedded sql engine, and passing the sql to that.
I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object_name etc until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into syntax parsing errors that indicate incorrect syntax and errors relating to tables name and permissions etc..
Add to that the problem of different RDBMSs and even different versions allowing different syntaxes and I think you really have to go to the db engine for this task.
There are ANTLR grammars to parse SQL. It's really a better idea to use an in memory database or a very lightweight database such as sqlite. It seems wasteful to me to test whether the SQL is valid from a parsing standpoint, and much more useful to check the table and column names and the specifics of your query.
The best way is to validate the parameters used to create the query, rather than the query itself. A function that receives the variables can check the length of the strings, valid numbers, valid emails or whatever. You can use regular expressions to do this validations.
public bool IsValid(string sql)
{
string pattern = #"SELECT\s.*FROM\s.*WHERE\s.*";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
return rgx.IsMatch(sql);
}
I am assuming you did something like .\* try instead [^"]* that will keep you from eating the whole line. It still will give false positives on cases where you have \ inside your strings.