SQL Server 2005 full-text case sensitivity problem

I seem to have a weird bug in Microsoft SQL Server 2005 where FREETEXT() searches are somewhat case-sensitive despite the collation being case-insensitive (Latin1_General_CI_AS).
First off, LIKE queries are perfectly case-insensitive, so
WHERE column LIKE '%word%'
and
WHERE column LIKE '%Word%'
return the same results.
Also, FREETEXT searches are in fact case-insensitive to some extent; for instance
WHERE FREETEXT(column, 'Word')
will return results with different cases.
BUT
WHERE FREETEXT(column, 'word')
while still returning case-insensitive matches for word, gives a different result set.
Or, as I found out after some investigation, searching for word gives all matches for different cases of word, but searching for Word gives the same PLUS inflectional results.
Or, to use one of the actual cases I found: searching for marketingleader returns all results containing that word, independent of case, whereas searching for Marketingleader returns those plus results that contain just leader, which don't show up when searching for the lower-case form.
Has anyone got any idea what is causing this, and how I could turn on inflectional/fuzzy searching for lower-case words as well?
Any help would be appreciated.

Use CONTAINS, the alternative to FREETEXT, where the inflectional results are optional:
CONTAINS (Transact-SQL)
Oops, I just saw that you mention CONTAINS in your question, but does it behave the same way as FREETEXT in the provided examples?

Related

SQL: LIKE and Contains — Different results

I am using the CONTAINS function in MS SQL Server Express to select data. However, when I selected the same data with the LIKE operator, I realised that CONTAINS was missing a few rows.
I rebuilt the indexes, but it didn't help.
SQL: brs.SearchText like '%aprilis%' and CONTAINS(brs.SearchText, '*aprilis*')
The CONTAINS function missed rows like:
22-28.aprīlis
[1.aprīlis]
Sīraprīlis
PS. If I search directly with CONTAINS(brs.SearchText, '*22-28.aprīlis*'), it finds them.
CONTAINS is built on the full-text index. It supports words, phrases, and prefix matches on words, but not suffix matches. So you can match words that start with 'aprilis', but not words that end with it or contain it somewhere in the middle. You might be able to take advantage of a thesaurus for these terms.
This is explained in more detail in the documentation.
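The difference between a substring match and a word-prefix match can be sketched outside the database. The snippet below is only a toy model in Python, not how SQL Server tokenizes text: the `fold` and `words` helpers are hypothetical, the word breaker simply splits on non-alphanumeric characters, and accent folding is assumed because the LIKE query in the question matched 'aprīlis' when searching for 'aprilis'.

```python
import re
import unicodedata


def fold(text):
    """Lowercase and strip diacritics (assumed accent-insensitive comparison)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def words(text):
    """Toy word breaker: split on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^0-9a-z]+", fold(text)) if w]


def like_match(text, term):
    """Substring match anywhere in the text, in the spirit of LIKE '%term%'."""
    return fold(term) in fold(text)


def prefix_match(text, term):
    """Word-prefix match, in the spirit of a full-text prefix term."""
    return any(w.startswith(fold(term)) for w in words(text))


# "Sīraprīlis" is taken from the question; the second row is made up.
for row in ["Sīraprīlis", "aprīlis 22"]:
    print(row, like_match(row, "aprilis"), prefix_match(row, "aprilis"))
```

In this model 'Sīraprīlis' passes the substring test but fails the prefix test, because 'aprilis' sits in the middle of a single word.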

Searching "AND" in lucene index

I have Lucene indexes built with StandardAnalyzer. One field in the index contains the value "AND".
When I try to search for the field value AND using MultiFieldQueryParser, the search results in an error.
E.g.: field1:* AND field2:AND
field1:* AND field2:"AND"
I have tried escaping, but that escapes the field value. I have also tried double quotes ("AND"), but could not get the correct value.
Any advice in this regard would be helpful.
Thanks in advance.
I suspect that there are probably two issues in play here:
Query syntax: I think you'll get further by putting the "and" in lower case, because Boolean operators in the standard query parser must be in upper case. In any case, since one of the steps of the standard analyser is to lowercase terms, this shouldn't affect matching.
Stop words: I suspect that "and" is excluded from the set of analysed terms by the standard analyser's stop word list. You could get around this by giving the standard analyser a different stop word list that doesn't exclude "and" as a term.
Good luck,
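To illustrate the stop-word point, here is a rough Python sketch of what an analyser with a stop list does to a term. It is not Lucene's API: `analyze` and `STOP_WORDS` are hypothetical names, and the stop list is a small illustrative set rather than any analyser's real default.

```python
import re

# Assumption: a small illustrative stop list, not Lucene's actual default set.
STOP_WORDS = frozenset({"a", "an", "and", "or", "the", "of", "to"})


def analyze(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]


print(analyze("AND"))                          # the term is removed entirely
print(analyze("AND", stop_words=frozenset()))  # kept once the stop list is empty
```

If the indexed value is removed at analysis time, no query syntax can find it, which is why changing the stop list (and reindexing) is the part that matters.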

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? I.e.:
Dinosores
(Poor sore dino). How can we search while allowing for spelling errors? We could associate the common misspellings we see in searches with the correct spelling, and then search on the original term plus the corrected term, but this is time-consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (documented on the same page as SOUNDEX), which compares the SOUNDEX values of two strings and rates their similarity (4 being the most similar, 0 the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow article also details possible sources for implementations of fuzzy logic in T-SQL. One respondent also suggested full-text indexing as an option that you might want to investigate.
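For the multi-word case from the question, one possible approach is to compare each search word phonetically against each word of the text. The Python sketch below implements standard American Soundex, which may differ from SQL Server's SOUNDEX in edge cases; `soundex` and `phonetic_match` are hypothetical helper names, and the sample rows are the two records from the question.

```python
import re

# Letter-to-digit table for standard American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = re.findall(r"[a-z]", word.lower())
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


def phonetic_match(query, text):
    """True if any word of the query sounds like any word of the text."""
    text_codes = {soundex(w) for w in re.findall(r"[A-Za-z]+", text)}
    return any(soundex(w) in text_codes
               for w in re.findall(r"[A-Za-z]+", query))


rows = ["Hello my name is Tom I like dinosaurs to talk about SQL.",
        "SQL is amazing. I really like SQL."]
print([row for row in rows if phonetic_match("Dinosores wrocks", row)])
```

This matches with OR semantics: a row is returned as soon as one search word has a phonetic match. In the database the same idea would need the text split into words first, for example in a separate keyword table.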
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: most SQL engines have nothing built in that does dictionary-based correction of "fat fingers". SOUNDEX does work as a tool for finding words that sound alike, and so corrects phonetic misspellings, but it encodes only the first letter and the consonants that follow it. A typo that drops or changes a consonant, such as "Dinosaus" (missing the R), or one that changes the first letter, produces a different code and would not match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
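The spell-checker-style lookup described above can be sketched in a few lines of Python. This is one common choice, the optimal string alignment distance (substitutions, insertions, deletions and adjacent transpositions), and not necessarily what any search engine uses; `edit_distance`, `suggest`, the sample dictionary and the `max_distance` cut-off are all assumptions for illustration.

```python
def edit_distance(a, b):
    """Optimal string alignment distance between two strings."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def suggest(word, dictionary, max_distance=2):
    """Return the closest entry of a non-empty dictionary, or None if too far."""
    word = word.lower()
    best = min(dictionary, key=lambda known: edit_distance(word, known.lower()))
    return best if edit_distance(word, best.lower()) <= max_distance else None


known_terms = ["dinosaurs", "database", "sql"]  # hypothetical search dictionary
print(suggest("Dinosayrs", known_terms))
```

Scanning the whole dictionary for every keyword is fine for a small term list; a large one would need an index or pre-filter to stay fast.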
You can also try SUBSTR() to compare only the first 3 or so characters of each name. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;

Searching numeric strings with Full-Text Search in SQL 2005

I'm using SQL Full-Text Search and have a stored procedure that uses the FREETEXTTABLE function.
This all works great, however, I have noticed that if I search for something such as 'Chapter 19' the 19 seems as if it is thrown away and the search only searches on 'Chapter'.
Also if I search for just '19' I get no results. I know the columns I have indexed contain a '19' in multiple rows.
Is this the intended behaviour? To not index numerics?
If so, then I suppose I'll have to live with it, but if not I'll be happy to post any T-SQL if anyone thinks I'm doing anything wrong.
Thanks.
P.S. I've googled this and have found nothing on searching numerics with full-text search.
I eventually found the reason behind this.
Numerics are treated as noise words in SQL Server. You can allow searching on numerics by removing the numeric entries from the appropriate noise file for your language.
Noise files are found in the FTData directory of your SQL Server install.
The English noise files are noiseENU.txt and noiseENG.txt.
Hope this helps someone.

Search literal within a word

Is there a way to perform a FULLTEXT search which returns literals found within words?
I have been using MATCH(col) AGAINST('+literal*' IN BOOLEAN MODE) but it fails if the text is like:
blah,blah,literal,blah
blahliteralblah
blah,blah,literal
Please note that there is no space after the commas.
I want all three cases above to be returned.
I think it would be better to fetch the entries into an array and then perform the text manipulation (in this case a search) on the fetched data.
Any text manipulation or complex query takes more resources, and if your database contains a lot of data, the query becomes too slow. Moreover, if you are running your query on a shared server, the performance problems get worse.
You can easily accomplish what you are trying to do with a regex once you have fetched the data from the database.
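As a minimal sketch of that idea, assuming the rows have already been fetched into a list of strings (the `rows_containing` helper and the sample list are hypothetical), the filtering could look like this in Python:

```python
import re


def rows_containing(rows, literal):
    """Keep the rows in which the literal occurs anywhere, ignoring case."""
    pattern = re.compile(re.escape(literal), re.IGNORECASE)
    return [row for row in rows if pattern.search(row)]


# The first three rows are the cases from the question; the last is a control.
fetched = ["blah,blah,literal,blah", "blahliteralblah",
           "blah,blah,literal", "nothing to see here"]
print(rows_containing(fetched, "literal"))
```

`re.escape` keeps characters in the search term from being interpreted as regex syntax. Note that this approach transfers every candidate row to the application before filtering.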
UPDATE: My suggestion is the same even if you are running your script on a dedicated server. However, if you want to perform a full-text search for the word "literal" in BOOLEAN MODE as you have described, you can remove the + operator (because you are searching for only one word) and construct the query as follows:
SELECT listOfColumnNames
FROM tableName
WHERE MATCH (colName)
AGAINST ('literal*' IN BOOLEAN MODE);
Even if you keep the + (AND) operator, though, your query works fine: tested on an Apache server with MySQL 5.1.
I suggest you read the documentation on full-text search in boolean mode.
The only problem with this query is that it doesn't match the word "literal" when it is a substring inside another word, for example "textliteraltext".
As you noticed, you can't use the * operator at the beginning of a word.
So, to accomplish what you are trying to do, the fastest and easiest way is to follow Paul's suggestion and use the % placeholder:
SELECT listOfColumnNames
FROM tableName
WHERE colName LIKE '%literal%';