Sphinx search SQL to get exact word

The keywords column in my table looks like this:
dog,dogs,dog and cat,dogs and cats
It's a comma-separated list of words.
I am trying to match an exact word. A query like the one below returns results, but not the ones I need: it matches all of these keywords, whereas I only need the post IDs where the keyword is exactly dog.
SELECT id FROM {$CONF['sphinx_index']} WHERE MATCH('#keywords ",dog" | ",dog," | "dog,"')
In SQL this would be easy: WHERE (keywords LIKE 'dog,%' OR keywords LIKE '%,dog,%' OR keywords LIKE '%,dog'), but I can't find a solution for Sphinx. Does anyone have an idea how to write this query?

CONCAT('_sep_ ',REPLACE(REPLACE(keywords,' ','_space_'),',',' _sep_ '),' _sep_') AS keywords
And then search for "_sep_ dog _sep_" to match dog, or "_sep_ dogs_space_and_space_cats _sep_" to match dogs and cats.
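To make the effect of that expression concrete, here is a minimal Python sketch that mirrors the CONCAT/REPLACE transformation and builds the matching phrase for one tag. The function names encode_keywords and exact_tag_phrase are made up for illustration; the real transformation would still run in the SQL that feeds the Sphinx index.

```python
def encode_keywords(keywords: str) -> str:
    """Mirror the CONCAT/REPLACE expression: one token per tag, wrapped in _sep_."""
    body = keywords.replace(" ", "_space_").replace(",", " _sep_ ")
    return "_sep_ " + body + " _sep_"


def exact_tag_phrase(tag: str) -> str:
    """Build the quoted phrase to place inside MATCH() for one whole tag."""
    return '"_sep_ ' + tag.replace(" ", "_space_") + ' _sep_"'


# What the indexed keywords field would contain for the sample row.
indexed = encode_keywords("dog,dogs,dog and cat,dogs and cats")
print(indexed)

# Phrases to search for; each spans a full tag between two separators.
print(exact_tag_phrase("dog"))
print(exact_tag_phrase("dogs and cats"))
```

Because every tag becomes a single token bounded by _sep_ tokens, the phrase for dog cannot match inside dogs or dog and cat.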

You could just add the comma to charset_table, which would mean it is indexed as part of words. But you would also need to enable infix or prefix matching to be able to match parts of words, because dog,dogs,dogs would be indexed as one word.
An alternative might be to turn spaces into, say, underscores (already in charset_table) and index each tag as a word:
dog dogs dogs_and_cats dog_and_cat
That sort of thing. You can then just match whole tags as words.

Related

In Postgres search for multiple words which are partially present in a string

I'm implementing a rudimentary form of the search. I would like to find all products names that contain all words partially in a search query.
So if I have these two products:
Deodorant with a cucumber flavor
Deodorant with apple flavor
I want each individual word in the list of words to be partially present in the string. If any word is not present partially I should discard that row.
The search query cucumb deod should match only Deodorant with a cucumber flavor.
You have to split the words in your search string and combine them in your query so that in result it looks like this:
... where name like '%cucumb%' and name like '%deod%' ...
Of course, in your code you would create a parameterized query, e.g.:
... where name like ? and name like ? ...
and set the parameters' values accordingly ('%cucumb%' and '%deod%' in the example), depending on the language / API / framework used.
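As a rough sketch of that approach, the snippet below splits the search string, builds one LIKE ? condition per word, and binds the wildcard-wrapped values as parameters. It uses Python's built-in sqlite3 module only so that it can be run as-is; the products table and the helper name build_partial_word_query are assumptions, not part of the question.

```python
import sqlite3


def build_partial_word_query(search: str):
    """Return (sql, params) requiring every word to appear somewhere in name."""
    words = search.split()
    if not words:
        raise ValueError("search string contains no words")
    where = " AND ".join("name LIKE ?" for _ in words)
    params = [f"%{word}%" for word in words]
    return f"SELECT name FROM products WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Deodorant with a cucumber flavor",), ("Deodorant with apple flavor",)],
)

sql, params = build_partial_word_query("cucumb deod")
rows = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(rows)
```

Two caveats: SQLite's LIKE is case-insensitive for ASCII, whereas in Postgres you would likely want ILIKE for the same behaviour, and any % or _ typed by the user still acts as a wildcard unless it is escaped.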

SQL Server full text search and spaces

I have a column with product names. Some names look like ‘ab-cd’ or ‘ab cd’.
Is it possible to use full text search to get these names when user types ‘abc’ (without spaces) ? The like operator is working for me, but I’d like to know if it’s possible to use full text search.
If you want to use FTS to find terms that are adjacent to each other, like words separated by a space you should use a proximity term.
You can define a proximity term by using the NEAR keyword or the ~ operator in the search expression, as documented here.
So if you want to find ab followed immediately by cd you could use the expression,
'NEAR((ab,cd), 0)'
searching for the word ab followed by the word cd with 0 terms in-between.
No, unfortunately you cannot run such a search via full-text. In that case you can only use LIKE, for example LIKE ('ab%c%').
EDIT1:
You can create a view (WITH SCHEMABINDING!) with some id and column name in which you want to search:
CREATE VIEW dbo.ftview WITH SCHEMABINDING
AS
SELECT id,
       REPLACE(columnname, ' ', '') AS search_string
FROM dbo.YourTable  -- schema-bound views require two-part object names
Then create index
CREATE UNIQUE CLUSTERED INDEX UCI_ftview ON dbo.ftview (id ASC)
Then create full-text search index on search_string field.
After that you can run CONTAINS query with "abc*" search and it will find what you need.
EDIT2:
But it won't help if search_string does not start with your search term.
For example:
ab c d becomes abcd, and a search for cd will not find it.
No. Full Text Search is based on words and phrases. It does not store the original text. In fact, depending on configuration, it will not even store all words: there are so-called stop words that never go into the index. For example, in English the word "in" is not selective enough to be considered worth storing.
Some names look like ‘ab-cd’ ‘ab cd’
Those likely do not get stored at all. At least the 2nd example is actually 2 extremely short words - quite likely they get totally ignored.
So, no - full text search is not suitable for this.

How to tackle efficient searching of a string that could have multiple variations?

My title sounds complicated, but the situation is very simple. People search on my site using a term such as "blackfriday".
When they conduct the search, my SQL code needs to look in various places such as a ProductTitle and ProductDescription field to find this term. For example:
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%blackfriday%' OR
ProductDescription LIKE '%blackfriday%'
However, the term appears differently in the database fields. It is most likely to appear with a space between the words, as in "Black Friday USA 2015". So, without going through and adding more combinations to the WHERE clause such as WHERE ProductTitle LIKE '%Black-Friday%', is there a better way to accomplish this kind of fuzzy searching?
I have full-text search enabled on the above fields, but it's really not that good when I use the CONTAINS clause. And of course other terms may not be as neat as this example.
I should start by saying that "variations (of a string)" is a bit vague. You could mean plurality, verb tenses, synonyms, and/or combined words (or, ignoring spaces and punctuation between 2 words) like the example you posted: "blackfriday" vs. "black friday" vs "black-friday". I have a few solutions of which 1 or more together may work for you depending on your use case.
Ignoring punctuation
Full Text searches already ignore punctuation and match them to spaces. So black-friday will match black friday whether using FREETEXT or CONTAINS. But it won't match blackfriday.
Synonyms and combined words
Using FREETEXT or FREETEXTTABLE for your full text search is a good way to handle synonyms and some matching of combined words (I don't know which ones). You can customize the thesaurus to add more combined words assuming it's practical for you to come up with such a list.
Handling combinations of any 2 words
Maybe your use case calls for you to match poorly formatted text or hashtags. In that case I have a couple of ideas:
Write the full text query to cover each combination of words using a dictionary. For example your data layer can rewrite a search for black friday as CONTAINS(*, '"black friday" OR "blackfriday"'). This may have to get complex, for example would black friday treehouse have to be ("black friday" OR "blackfriday") AND ("treehouse" OR "tree house")? You would need a dictionary to figure out that "treehouse" is made up of 2 words and thus can be split.
If it's not practical to use a dictionary for the words being searched for (I don't know why, maybe acronyms or new memes) you could create a long query to cover every letter combination. So searching for do-re-mi could be "do re mi" OR "doremi" OR "do remi" OR "dore mi" OR "d oremi" OR "d o remi" .... Yes it will be a lot of combinations, but surprisingly it may run quickly because of how full text efficiently looks up words in the index.
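A minimal Python sketch of that brute-force idea follows: it strips punctuation from the term, enumerates every way of placing spaces between the remaining letters, and joins the variants into a CONTAINS-style condition. The helper names spacing_variants and contains_condition are hypothetical, and the number of variants doubles with each extra letter, so this is only plausible for short terms.

```python
from itertools import product


def spacing_variants(term: str) -> list[str]:
    """Every way to insert or omit a space between the letters of term."""
    letters = "".join(ch for ch in term if ch.isalnum())
    if not letters:
        return []
    variants = []
    # One choice per gap between adjacent letters: no space or one space.
    for gaps in product(("", " "), repeat=len(letters) - 1):
        rest = "".join(gap + ch for gap, ch in zip(gaps, letters[1:]))
        variants.append(letters[0] + rest)
    return variants


def contains_condition(term: str) -> str:
    """Join the variants as quoted phrases for a CONTAINS search condition."""
    return " OR ".join(f'"{variant}"' for variant in spacing_variants(term))


variants = spacing_variants("do-re-mi")
print(len(variants))             # 2 ** 5 variants for six letters
print(contains_condition("ab"))  # "ab" OR "a b"
```

In practice the generated condition would be passed to the query as a bound parameter rather than concatenated into the SQL text.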
A hack / workaround if searching for multiple variations is very important.
Define which fields in the DB are searchable (e.g ProductTitle, ProductDescription)
Before saving these fields in the DB, replace each space (or run of consecutive spaces) with a placeholder, e.g. "%"
Search the DB for variation matches employing the placeholder
Do the reverse process when displaying these fields on your site (i.e replace placeholder with space)
Alternatively, you can enable regex matching for your users (meaning they can define a regex explicitly, or your app can build one from their search term). But it is slower and probably more error-prone to do it this way.
After looking into everything, I have settled on using SQL Server's FREETEXT full-text search. It's not ideal, or accurate, but for now it will have to do.
My answer is probably inadequate, but do you have any scenarios that won't be addressed by the query below?
SELECT *
FROM dbo.Products
WHERE ProductTitle LIKE '%black%friday%' OR
ProductDescription LIKE '%black%friday%'
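To see what that pattern does and does not catch, here is a small sketch that runs it against a few made-up titles. It uses SQLite through Python's sqlite3 module purely for convenience; the sample rows are invented, and SQL Server's case sensitivity depends on the column collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductTitle TEXT)")
titles = [
    "Black Friday USA 2015",
    "blackfriday deals",
    "Black-Friday sale",
    "Black cat toys, ships Friday",  # unrelated, but still matches the pattern
    "Cyber Monday",
]
conn.executemany("INSERT INTO Products VALUES (?)", [(t,) for t in titles])

matched = [
    row[0]
    for row in conn.execute(
        "SELECT ProductTitle FROM Products "
        "WHERE ProductTitle LIKE ? ORDER BY rowid",
        ("%black%friday%",),
    )
]
print(matched)
```

The pattern covers the spaced, joined, and hyphenated forms, but it also accepts any text where black appears somewhere before friday, and a leading-wildcard LIKE typically cannot use an ordinary index.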

FREETEXT queries in SQL Server 2008 not phrase matching

I have a full-text indexed table in SQL Server 2008 that I am trying to query for an exact phrase match using FREETEXT. I don't believe using CONTAINS or LIKE is appropriate for this, because in other cases the query might not be exact (the user doesn't surround the phrase in double quotes), and in general I want the flexibility of FREETEXT.
According to the documentation[MSDN] for FREETEXT:
If freetext_string is enclosed in double quotation marks, a phrase match is instead performed; stemming and thesaurus are not performed.
which would lead me to believe a query like this:
SELECT Description
FROM Projects
WHERE FREETEXT(Description, '"City Hall"')
would only return results where the term "City Hall" appears in the Description field, but instead I get results like this:
1 Design of handicap ramp at Manning Hall.
2 Antenna investigation. Client: City of Cranston Engineering Dept.
3 Structural investigation regarding fire damage to International Tennis Hall of Fame.
4 Investigation Roof investigation for proposed satellite design on Herald Hall.
... etc
Obviously those results include at least one of the words in my phrase, but not the phrase itself. What's worse, I had thought the results would be ranked but the two results I actually wanted (because they include the actual phrase) are buried.
SELECT Description
FROM Projects
WHERE Description LIKE '%City Hall%'
1 Major exterior and interior renovation of the existing city hall for Quincy Massachusetts
2 Cursory structural investigation of Pawtucket City Hall tower plagued by leaks.
I'm sure this is a case of me not understanding the documentation, but is there a way to achieve what I'm looking for? Namely, to be able to pass in a search string without quotes and get exactly what I'm getting now or with quotes and get only that exact phrase?
As you said, FREETEXT looks up every word in your phrase, not the phrase as a whole. For that you need to use the CONTAINS predicate, like this:
SELECT Description
FROM Projects
WHERE CONTAINS(Description, '"City Hall"')
If you want to get the rank of the results, you have to use CONTAINSTABLE. It works roughly the same, but it returns a table with two columns: [Key], which contains the primary key of the searched table, and [Rank], which gives you the rank of the result.

sql query relative searching to previous searched words

I have a list of words in a table. I want to find all records that contain, e.g., book and books, pen and pens; that is, all words that also appear with a trailing 's'. The query should show both the word without the 's' and the word with the 's'.
A query like "SELECT * FROM words WHERE word LIKE '%s'" is not what I want.
schema definition is,
words = <word, part_of_speech>
I have to search on 'word'
How can I do this?
The result could be,
book
books
pen
pens
It's something like this: if there is a value in the column such as 'word' and there is another value 'word'+'s', then show the rows for both 'word' and 'word'+'s'.
I'm using sqlite.
SELECT word FROM words WHERE word LIKE 'book%'
will match 'book', 'books', 'bookmark', etc
If you want to search only for a specific suffix, bind the base word as a parameter and try:
SELECT *
FROM words
WHERE word = ?
   OR word = ? || 's'  -- change 's' to any suffix you want to try
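The question also asks for every word whose 's' form is present in the table, without naming the word in advance. One possible sketch, not taken from the answers here, uses two EXISTS subqueries against the same table. It runs on SQLite via Python's sqlite3 module; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, part_of_speech TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [
        ("book", "noun"), ("books", "noun"),
        ("pen", "noun"), ("pens", "noun"),
        ("bus", "noun"), ("glass", "noun"), ("cat", "noun"),
    ],
)

# Keep a row if its word + 's' exists, or if it is some other word + 's'.
PAIR_QUERY = """
SELECT w.word
FROM words AS w
WHERE EXISTS (SELECT 1 FROM words AS p WHERE p.word = w.word || 's')
   OR EXISTS (SELECT 1 FROM words AS b WHERE b.word || 's' = w.word)
ORDER BY w.word
"""

result = [row[0] for row in conn.execute(PAIR_QUERY)]
print(result)
```

Words that merely end in 's' (bus, glass) are left out because their shorter form is not in the table, and cat is left out because cats is missing.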
Google the "Porter Stemming Algorithm" and apply it to your data before you load it. This algorithm is as close as you can get to converting not just plurals but many other forms of word to a single word. e.g., "scholarly" becomes "scholar" and stuff like that.
If that does not meet your quality standards, because it will not trap for "mice" and other examples given in other answers, you will have to find a "stemming file". I know of no free ones (which does not mean there are none), but the one we use at my shop is part of a commercial package, so I've never had to find a free one.
At any rate, once you have applied the stemming to the words on the way in, you no longer have to search for multiple versions of a word, you just search for the stem.