SQL query to search for words relative to other words in the table - sql

I have a list of words in a table. I want to find all records such as book and books, or pen and pens; that is, every word that also appears with a trailing 's'. The query should return both the word without the 's' and the word with it.
A query like "SELECT * FROM words WHERE word LIKE '%s'" is not what I want.
The schema definition is:
words = <word, part_of_speech>
I have to search on 'word'.
How can I do this?
The result could be:
book
books
pen
pens
It's something like: if there is a value in the column such as 'word' and there is another value 'word'+'s', then show the rows for both 'word' and 'word'+'s'.
I'm using sqlite.

SELECT word FROM words WHERE word LIKE 'book%'
will match 'book', 'books', 'bookmark', etc

if you want to search for only a specific suffix then try
SELECT
*
FROM
words
WHERE
word = '%s'
OR word = '%s' || 's' -- '%s' is a placeholder for the search word; change 's' to any suffix you want to try
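For the "show both 'word' and 'word'+'s'" requirement in the question, a self-referencing query may be closer to what was asked. Below is a minimal sketch using Python's sqlite3 module. It assumes the words table from the question; the sample rows and the function name words_with_plural are made up for illustration.

```python
import sqlite3

def words_with_plural(conn):
    """Return every word that has a 'word' + 's' partner, plus that partner."""
    query = """
        SELECT w.word
        FROM words AS w
        WHERE EXISTS (SELECT 1 FROM words AS p WHERE p.word = w.word || 's')
           OR EXISTS (SELECT 1 FROM words AS b WHERE b.word || 's' = w.word)
        ORDER BY w.word
    """
    return [row[0] for row in conn.execute(query)]

# Illustrative data only; the real table would already exist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, part_of_speech TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("book", "noun"), ("books", "noun"), ("pen", "noun"),
     ("pens", "noun"), ("glass", "noun"), ("bookmark", "noun")],
)
result = words_with_plural(conn)
print(result)
```

Words like 'glass' or 'bookmark' are left out because neither has a partner that differs only by a trailing 's'.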

Google the "Porter Stemming Algorithm" and apply it to your data before you load it. This algorithm is about as close as you can get to converting not just plurals but many other forms of a word to a single word, e.g., "scholarly" becomes "scholar" and stuff like that.
If that does not meet your quality standards, because it will not trap for "mice" and other examples given in other answers, you will have to find a "stemming file". I know of no free ones (which does not mean there are none), but the one we use at my shop is part of a commercial package, so I've never had to find a free one.
At any rate, once you have applied the stemming to the words on the way in, you no longer have to search for multiple versions of a word, you just search for the stem.
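The "stem on the way in, then search for the stem" idea could look roughly like this. Note that naive_stem below is a toy stand-in that only strips one trailing 's'; it is not the Porter algorithm, and the stem column is an assumption that is not part of the original schema.

```python
import sqlite3

def naive_stem(word):
    """Toy stand-in for a real stemmer: lowercase and strip one trailing 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the stem is stored next to the original word at load time.
conn.execute("CREATE TABLE words (word TEXT, stem TEXT)")
for w in ["book", "books", "pen", "pens", "glass"]:
    conn.execute("INSERT INTO words VALUES (?, ?)", (w, naive_stem(w)))

def search(term):
    """Stem the search term the same way, then compare stems only."""
    rows = conn.execute(
        "SELECT word FROM words WHERE stem = ? ORDER BY word",
        (naive_stem(term),),
    )
    return [r[0] for r in rows]

print(search("books"))
```

With a real stemmer in place of naive_stem, the search side stays the same: one equality comparison on the stem.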

Related

Sphinx search SQL to get exact word

The keywords column in my table looks like this:
dog,dogs,dog and cat,dogs and cats
It's a comma-separated list of words.
I'm trying to match an exact word. A query like the one below returns results, but not the ones I need: it matches all of these keywords, whereas I only want the post IDs where the keyword is exactly dog.
SELECT id FROM {$CONF['sphinx_index']} WHERE MATCH('#keywords ",dog" | ",dog," | "dog,"')
In SQL this would be easy: WHERE (keywords LIKE 'dog,%' OR keywords LIKE '%,dog,%' OR keywords LIKE '%,dog'), but I can't find a solution for Sphinx. Does anyone have an idea how to write this query?
CONCAT('_sep_ ',REPLACE(REPLACE(keywords,' ','_space_'),',',' _sep_ '),' _sep_') AS keywords
And then search for "_sep_ dog _sep_" to match dog, or "_sep_ dogs_space_and_space_cats _sep_" to match dogs and cats.
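To see what that CONCAT/REPLACE expression produces, here is a small Python sketch of the same transformation, plus a rough model of a phrase match over whitespace-separated tokens. The function names are made up, and the matcher only imitates exact phrase matching; it is not how Sphinx is implemented.

```python
def encode_keywords(keywords):
    """Mirror the SQL: spaces become _space_, commas become ' _sep_ ',
    and the whole string is wrapped in _sep_ markers."""
    body = keywords.replace(" ", "_space_").replace(",", " _sep_ ")
    return "_sep_ " + body + " _sep_"

def phrase_matches(encoded, tag):
    """Rough model of searching the phrase "_sep_ <tag> _sep_" in the encoded text."""
    doc = encoded.split()
    phrase = ["_sep_", tag.replace(" ", "_space_"), "_sep_"]
    return any(doc[i:i + 3] == phrase for i in range(len(doc) - 2))

encoded = encode_keywords("dog,dogs,dog and cat,dogs and cats")
print(encoded)
print(phrase_matches(encoded, "dog"))
print(phrase_matches(encode_keywords("dogs,dog and cat"), "dog"))
```

Because every tag ends up as a single token fenced by _sep_ on both sides, "dog" no longer matches inside "dogs" or "dog and cat".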
You could just add the comma to charset_table, which would mean it is indexed as part of words. But you would also need to enable infix or prefix matching to be able to match partial words, i.e. dog,dogs,dogs would be indexed as one word.
An alternative might be to turn spaces into, say, underscores (already in charset_table) and index each tag as a word:
dog dogs dogs_and_cats dog_and_cat
That sort of thing. You can then just match whole tags as words.

SQL Server full text search and spaces

I have a column with product names. Some names look like ‘ab-cd’ ‘ab cd’
Is it possible to use full text search to get these names when the user types ‘abc’ (without spaces)? The LIKE operator is working for me, but I’d like to know if it’s possible to use full text search.
If you want to use FTS to find terms that are adjacent to each other, like words separated by a space, you should use a proximity term.
You can define a proximity term by using the NEAR keyword or the ~ operator in the search expression, as described in the SQL Server documentation for CONTAINS.
So if you want to find ab followed immediately by cd, you could use the expression
'NEAR((ab,cd), 0)'
which searches for the words ab and cd with 0 terms in between. To require that ab comes before cd, pass TRUE as the third argument: 'NEAR((ab,cd), 0, TRUE)'.
No, unfortunately you cannot do such a search via full-text. In that case you can only use LIKE, e.g. LIKE 'ab%c%'
EDIT1:
You can create a view (WITH SCHEMABINDING!) with some id and the column in which you want to search:
CREATE VIEW dbo.ftview WITH SCHEMABINDING
AS
SELECT id,
REPLACE(columnname,' ','') AS search_string
FROM dbo.YourTable -- schema-bound views require two-part object names
Then create the index:
CREATE UNIQUE CLUSTERED INDEX UCI_ftview ON dbo.ftview (id ASC)
Then create a full-text index on the search_string field.
After that you can run a CONTAINS query with "abc*" as the search term and it will find what you need.
EDIT2:
But it won't help if search_string does not start with your search term.
For example:
ab c d -> abcd, and you search for cd
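The limitation can be illustrated with a small Python model. It assumes the space-free search_string is indexed as a single word and that a prefix term such as "abc*" behaves like a starts-with test on that word; the function names are invented for this sketch.

```python
def search_string(name):
    """Same idea as REPLACE(columnname, ' ', '') in the view."""
    return name.replace(" ", "")

def prefix_term_matches(name, term):
    """Model a prefix term like "abc*" against the single space-free word."""
    return search_string(name).lower().startswith(term.lower())

names = ["ab cd", "ab c d", "xy ab"]
hits = [n for n in names if prefix_term_matches(n, "abc")]
print(hits)
print(prefix_term_matches("ab c d", "cd"))
```

Searching for "abc" finds the first two names, but "cd" finds nothing because no stored string starts with it.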
No. Full Text Search is based on WORDS and phrases. It does not store the original text. In fact, depending on configuration, it will not even store all words: there are so-called stop words that never go into the index. Example: in English the word "in" is not selective enough to be considered worth storing.
Some names look like ‘ab-cd’ ‘ab cd’
Those likely do not get stored at all. At least the 2nd example is actually 2 extremely short words - quite likely they get totally ignored.
So, no - full text search is not suitable for this.

How to tackle efficient searching of a string that could have multiple variations?

My title sounds complicated, but the situation is very simple. People search on my site using a term such as "blackfriday".
When they conduct the search, my SQL code needs to look in various places such as a ProductTitle and ProductDescription field to find this term. For example:
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%blackfriday%' OR
ProductDescription LIKE '%blackfriday%'
However, the term appears differently in the database fields. It is most likely to appear with a space between the words, such as "Black Friday USA 2015". So without going through and adding more combinations to the WHERE clause such as WHERE ProductTitle LIKE '%Black-Friday%', is there a better way to accomplish this kind of fuzzy searching?
I have full-text search enabled on the above fields, but it's really not that good when I use the CONTAINS clause. And of course other terms may not be as neat as this example.
I should start by saying that "variations (of a string)" is a bit vague. You could mean plurality, verb tenses, synonyms, and/or combined words (or, ignoring spaces and punctuation between 2 words) like the example you posted: "blackfriday" vs. "black friday" vs "black-friday". I have a few solutions of which 1 or more together may work for you depending on your use case.
Ignoring punctuation
Full Text searches already ignore punctuation and treat it like a space. So black-friday will match black friday whether using FREETEXT or CONTAINS. But it won't match blackfriday.
Synonyms and combined words
Using FREETEXT or FREETEXTTABLE for your full text search is a good way to handle synonyms and some matching of combined words (I don't know which ones). You can customize the thesaurus to add more combined words assuming it's practical for you to come up with such a list.
Handling combinations of any 2 words
Maybe your use case calls for you to match poorly formatted text or hashtags. In that case I have a couple of ideas:
Write the full text query to cover each combination of words using a dictionary. For example your data layer can rewrite a search for black friday as CONTAINS(*, '"black friday" OR "blackfriday"'). This may have to get complex, for example would black friday treehouse have to be ("black friday" OR "blackfriday") AND ("treehouse" OR "tree house")? You would need a dictionary to figure out that "treehouse" is made up of 2 words and thus can be split.
If it's not practical to use a dictionary for the words being searched for (I don't know why, maybe acronyms or new memes) you could create a long query to cover every letter combination. So searching for do-re-mi could be "do re mi" OR "doremi" OR "do remi" OR "dore mi" OR "d oremi" OR "d o remi" .... Yes it will be a lot of combinations, but surprisingly it may run quickly because of how full text efficiently looks up words in the index.
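As a rough sketch of the query-rewriting idea, the data layer could generate the spacing variants for words that are already split (by a dictionary or by the user's punctuation) and join them into a CONTAINS condition. The function names are made up, the words are assumed to be plain alphanumeric tokens with no quotes, and this only joins whole words; it does not try every letter-level split.

```python
from itertools import product

def spacing_variants(words):
    """All ways to join adjacent words with either a space or nothing."""
    if not words:
        return []
    variants = []
    for seps in product([" ", ""], repeat=len(words) - 1):
        text = words[0]
        for sep, word in zip(seps, words[1:]):
            text += sep + word
        variants.append(text)
    return variants

def contains_condition(words):
    """Build the search condition string for CONTAINS(*, '<condition>')."""
    return " OR ".join('"{}"'.format(v) for v in spacing_variants(words))

print(contains_condition(["black", "friday"]))
print(spacing_variants(["do", "re", "mi"]))
```

The number of variants doubles with each extra word, so long inputs would need a cap.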
A hack / workaround if searching for multiple variations is very important.
Define which fields in the DB are searchable (e.g ProductTitle, ProductDescription)
Before saving these fields in the DB, replace each space (or run of consecutive spaces) with a placeholder, e.g. "%"
Search the DB for variation matches employing the placeholder
Do the reverse process when displaying these fields on your site (i.e replace placeholder with space)
Alternatively you can enable regex matching for your users (meaning they can define a regex either explicitly or let your app build one from their search term). But it is slower and probably error-prone to do it this way
After looking into everything, I have settled on using SQL Server's FREETEXT full-text search. It's not ideal, or accurate, but for now it will have to do.
My answer is probably inadequate, but do you have any scenarios which won't be addressed by the query below?
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%black%friday%' OR
ProductDescription LIKE '%black%friday%'
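One scenario worth checking is false positives. The sketch below runs the same pattern against a few made-up titles, using SQLite through Python's sqlite3 module instead of SQL Server, so case handling may differ from your collation (SQLite's LIKE is case-insensitive for ASCII by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductTitle TEXT)")
titles = [
    "Black Friday USA 2015",
    "black-friday deals",
    "blackfriday",
    "Blackboard sale ends Friday",   # unrelated, but contains black ... friday
    "Friday black tie event",        # words in the other order
]
conn.executemany("INSERT INTO Products VALUES (?)", [(t,) for t in titles])

matched = [
    row[0]
    for row in conn.execute(
        "SELECT ProductTitle FROM Products "
        "WHERE ProductTitle LIKE '%black%friday%' ORDER BY rowid"
    )
]
print(matched)
```

The pattern covers the three spellings of the term, but it also picks up "Blackboard sale ends Friday", and the leading % generally prevents an ordinary index from being used for the lookup.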

CONTAINSTABLE adding 'of*' returns no results

I have a CONTAINSTABLE query on SQL Server 2008:
SELECT contacts.*, [Rank] FROM
CONTAINSTABLE(Contacts, SearchName, '("department*") AND ("work*")') tmp
JOIN contacts on contacts.contactid = tmp.[key]
WHERE contacts.deleted = 0
This returns 1 result as expected, however if the user has entered "of" in their search criteria the query returns no results:
SELECT contacts.*, [Rank] FROM
CONTAINSTABLE(Contacts, SearchName, '("department*") AND ("of*") AND ("work*")') tmp
JOIN contacts on contacts.contactid = tmp.[key]
WHERE contacts.deleted = 0
The full name of the contact record is "department of work and pensions".
The same happens if the user includes "and" in their search. Why are these words breaking the query, and is there a way around it, or do I have to strip out the words before executing the search?
You need to learn about stop words. These are explained well in the documentation.
The short explanation, though, is that all full text engines keep a list of words that are not indexed. Prominent among these words are things like "of", "the", and similar non-content containing words. Of course, you can configure the server to actually recognize these words. And this is very important in some applications: stop words happen to be very useful when trying to determine the language of a document.
In any case, the word "of" is in the stop word list. So it is not indexed and you cannot find it using CONTAINSTABLE. If you need to search for it, you can implement your own custom stop word list and rebuild the index.
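If you go the route of stripping the words before executing the search, it could look something like this Python sketch. STOP_WORDS here is a small hypothetical subset rather than the server's actual list, and build_search_condition is an invented helper name.

```python
# Hypothetical subset; the real stop list lives in the server configuration.
STOP_WORDS = {"of", "and", "the", "in", "a", "an"}

def build_search_condition(user_input):
    """Turn user input into '("term1*") AND ("term2*")', skipping stop words.

    Only plain alphanumeric terms are kept, so quotes and operators typed
    by the user cannot end up inside the search condition.
    """
    terms = [
        t for t in user_input.lower().split()
        if t.isalnum() and t not in STOP_WORDS
    ]
    return " AND ".join('("{}*")'.format(t) for t in terms)

condition = build_search_condition("department of work")
print(condition)
```

If the input consists only of stop words, the result is an empty string, so the caller should skip the full-text query in that case. The condition itself is best passed to CONTAINSTABLE as a parameter.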

SQL Server CONTAINS with digits gives no results

I have a database table which is full-text indexed, and I use the CONTAINS function to perform a search query on it.
When I do:
SELECT * FROM Plants WHERE CONTAINS(Plants.Description, '"Plant*" AND "one*"');
I get back all correct results matching a description with the words "Plant" and "one".
Some plants are named like "Plant 1", "Plant 2" etc., and this is the problem.
When I do this, I get no results:
SELECT * FROM Plants WHERE CONTAINS(Plants.Description, '"Plant*" AND "1*"');
Anyone know why?
There is a list of commonly-used words that are not indexed in a keyword search, such as "and" and "the".
I believe the text "1" also appears in that list. Therefore it doesn't appear in the index, and can't be found with the CONTAINS clause.
If I recall correctly, there is an admin interface to allow you to edit that list of common words. I tried editing it once, a few years ago, and I recall having trouble telling the difference after I did.
Daan is correct. You need another * before the 1. Placing wildcards on either side of a search term searches the entire string for the search term regardless of its position.