For some reason I cannot find the answer on Google! With the SQL CONTAINS function, how can I tell it to match at the beginning of a string? I.e., I am looking for the full-text equivalent of
LIKE 'some_term%'.
I know I can use LIKE, but since I already have the full-text index set up, and the table is expected to have thousands of rows, I would prefer to use CONTAINS.
Thanks!
You want something like this:
Rather than specify multiple terms, you can use a 'prefix term' if the
terms begin with the same characters. To use a prefix term, specify
the beginning characters, then add an asterisk (*) wildcard to the end
of the term. Enclose the prefix term in double quotes. The following
statement returns the same results as the previous one.
-- Search for all terms that begin with 'storm'
SELECT StormID, StormHead, StormBody FROM StormyWeather
WHERE CONTAINS(StormHead, '"storm*"')
http://www.simple-talk.com/sql/learn-sql-server/full-text-indexing-workbench/
You can use CONTAINS with a LIKE subquery to match only at the start:
SELECT *
FROM (
    SELECT *
    FROM myTable
    WHERE CONTAINS(edition, '"Alice in wonderland"')
) AS S1
WHERE S1.edition LIKE 'Alice in wonderland%'
This way, the slow LIKE predicate only runs against the smaller set returned by the full-text search.
The only solution I can think of is to actually prepend a unique word to the beginning of every field in the table.
E.g. update every row so that 'xfirstword ' appears at the start of the text (e.g. in Field1). Then you can search with CONTAINS(Field1, 'NEAR((xfirstword, "TERM*"), 0)').
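A minimal sketch of that workaround, assuming a table named MyTable with an nvarchar column Field1 (the table name is made up):

-- one-time update: prepend the marker word to every row
UPDATE MyTable SET Field1 = 'xfirstword ' + Field1;

-- rows whose text starts with a word beginning with TERM
SELECT *
FROM MyTable
WHERE CONTAINS(Field1, 'NEAR((xfirstword, "TERM*"), 0)');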
Pretty crappy solution, especially as we know that the full-text index stores the actual position of each word in the text (see this link for details: http://msdn.microsoft.com/en-us/library/ms142551.aspx).
I am facing a similar issue. This is the workaround I have implemented.
I created another table and pulled in only the rows matching LIKE 'some_term%'.
Then, on this new table, I implemented the full-text search.
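A rough sketch of that workaround, assuming the source table is MyTable with a text column MyColumn (both names are made up):

-- copy only the prefix-matching rows into a new table
SELECT *
INTO MyPrefixTable
FROM MyTable
WHERE MyColumn LIKE 'some_term%';
-- then create a full-text index on MyPrefixTable and query it with CONTAINS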
Please let me know if you have tried another, better approach.
We have a database where our customer has typed "Bob's" one time and "Bob’s" another time. (Note the slight difference between the single-quote and apostrophe.)
When someone searches for "Bob's" or "Bob’s", I want to find all cases regardless of what they used for the apostrophe.
The only thing I can come up with is looking at people's queries and replacing every occurrence of one or the other with (’|'') (Note the escaped single quote) and using SIMILAR TO.
SELECT * from users WHERE last_name SIMILAR TO 'O(’|'')Dell'
Is there a better way, ideally some kind of setting that allows these to be interchangeable?
You can use regexp matching:
with a_table(str) as (
values
('Bob''s'),
('Bob’s'),
('Bobs')
)
select *
from a_table
where str ~ 'Bob[''’]s';
str
-------
Bob's
Bob’s
(2 rows)
Personally I would replace all apostrophes in a table with one query (I had the same problem in one of my projects).
If you find that both of the cases above are valid and represent the same information, then you might consider taking care of your data before it arrives in the database. That means you could replace one sign with the other in your application code or in a BEFORE INSERT trigger.
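A minimal sketch of that one-time normalization, assuming the column in question is users.last_name (as in the example above):

-- replace the typographic apostrophe with a plain single quote
UPDATE users
SET last_name = replace(last_name, '’', '''')
WHERE last_name LIKE '%’%';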
If you have more cases like the one you've mentioned, then spelling them out in LIKE patterns would unfortunately be the way to go.
You could also consider giving your customer hints: fetch records from the database and return the closest matches, if there are any, to avoid such problems.
I'm afraid there is no setting that makes these two symbols interchangeable in Postgres queries. At least I'm not familiar with one.
I have a column with product names. Some names look like ‘ab-cd’ or ‘ab cd’.
Is it possible to use full-text search to get these names when the user types ‘abc’ (without spaces)? The LIKE operator works for me, but I'd like to know if it's possible with full-text search.
If you want to use FTS to find terms that are adjacent to each other, like words separated by a space you should use a proximity term.
You can define a proximity term by using the NEAR keyword or the ~ operator in the search expression, as documented here.
So if you want to find ab followed immediately by cd you could use the expression,
'NEAR((ab,cd), 0)'
searching for the word ab followed by the word cd with 0 terms in-between.
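For example, a minimal sketch against a hypothetical Products table whose ProductName column is covered by a full-text index:

-- find names where 'ab' is immediately followed by 'cd', e.g. 'ab cd' or 'ab-cd'
SELECT ProductName
FROM Products
WHERE CONTAINS(ProductName, 'NEAR((ab, cd), 0)');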
No, unfortunately you cannot do such a search via full-text. In that case you can only use LIKE, e.g. LIKE 'ab%c%'.
EDIT1:
You can create a view (WITH SCHEMABINDING!) with some id and the column in which you want to search:
CREATE VIEW dbo.ftview WITH SCHEMABINDING
AS
SELECT id,
       REPLACE(columnname, ' ', '') AS search_string
FROM dbo.YourTable -- SCHEMABINDING requires two-part table names
Then create index
CREATE UNIQUE CLUSTERED INDEX UCI_ftview ON dbo.ftview (id ASC)
Then create full-text search index on search_string field.
After that you can run CONTAINS query with "abc*" search and it will find what you need.
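For instance, a sketch of those last two steps, assuming a full-text catalog named ftCatalog already exists (the catalog name is made up):

CREATE FULLTEXT INDEX ON dbo.ftview (search_string)
    KEY INDEX UCI_ftview
    ON ftCatalog;

-- prefix search against the space-stripped string
SELECT id, search_string
FROM dbo.ftview
WHERE CONTAINS(search_string, '"abc*"');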
EDIT2:
But it won't help if search_string does not start with your search term.
For example:
'ab c d' becomes 'abcd', and a search for 'cd' will not match that prefix.
No. Full-text search is based on WORDS and phrases. It does not store the original text. In fact, depending on configuration it will not even store all words: there are so-called stop words that never go into the index. Example: in English the word "in" is not selective enough to be considered worth storing.
Some names look like ‘ab-cd’ ‘ab cd’
Those likely do not get stored at all. At least the 2nd example is actually 2 extremely short words, which quite likely get ignored entirely.
So, no - full text search is not suitable for this.
I have a Users table containing about 500,000 rows of user data.
The full name of the user is stored in 4 columns, each of type nvarchar(50).
I have a computed column called UserFullName that combines the 4 columns.
I have a stored procedure that searches the Users table by name using the LIKE operator, as below:
Select *
From Users
Where UserFullName like N'%' + @FullName + N'%'
I have a performance issue while executing this SP; it takes a long time :(
Is there any way to overcome the poor performance of the LIKE operator?
Not while still using the LIKE operator in that fashion. The % at the start means your search needs to read every row and look for a match. If you really need that kind of search you should look into using a full-text index.
Make sure your computed column is indexed; that way the values won't have to be computed each time you SELECT.
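A hedged sketch of such an index, assuming the computed column is deterministic and precise so it can be indexed (the index name is made up):

-- materializes the computed value in the index so it is not recomputed per row
CREATE INDEX IX_Users_UserFullName ON dbo.Users (UserFullName);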
Also, depending on your indexing, using PATINDEX might be quicker, but really you should use a fulltext index for this kind of thing:
http://msdn.microsoft.com/en-us/library/ms187317.aspx
If you use an index it will help.
So you can give the column an id or something, like this:
ALTER TABLE tablename ADD UNIQUE INDEX (id);
Take a look at this article: http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning .
It clearly describes how LIKE works in terms of performance.
You are likely facing these issues because the whole table has to be scanned due to the leading % symbol.
You could try creating a list of substrings (in a separate table, for example) representing k-mers and searching for them without a preceding %. An index on such a column would also help. You can read more about k-mers here: https://en.m.wikipedia.org/wiki/K-mer .
This way the index is not defeated, and the search will be more efficient.
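A very rough sketch of that idea against the Users table from the question, using a hypothetical helper table of 3-character substrings (UserID is an assumed key; populating the helper table is not shown):

-- helper table holding every 3-character substring (k-mer) of each full name
CREATE TABLE UserNameKmers (
    UserID int NOT NULL,
    Kmer   nvarchar(3) NOT NULL
);
CREATE INDEX IX_UserNameKmers_Kmer ON UserNameKmers (Kmer);

-- seek on the first 3 characters of the search term, then verify with LIKE
SELECT DISTINCT u.*
FROM Users AS u
JOIN UserNameKmers AS k ON k.UserID = u.UserID
WHERE k.Kmer = LEFT(@FullName, 3)
  AND u.UserFullName LIKE N'%' + @FullName + N'%';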
I need to filter out records based on some text matching in an nvarchar(1000) column.
The table has more than 400 thousand records and is growing. For now, I am using a LIKE condition:
SELECT *
FROM table_01
WHERE Text LIKE '%A1%'
   OR Text LIKE '%B1%'
   OR Text LIKE '%C1%'
   OR Text LIKE '%D1%'
Is there any preferred workaround?
SELECT *
FROM table_01
WHERE Text LIKE '%[A-Z]1%'
This will check whether the text contains A1, B1, C1, D1, ...
Reference: using the LIKE condition in SQL Server
You can try the following if you know the exact position of your substring:
SELECT *
FROM table_01
WHERE SUBSTRING(Text, 1, 2) IN ('A1', 'B1', 'C1', 'D1')
Have a look at LIKE on msdn.
You could reduce the number filters by combining more details into a single LIKE clause.
SELECT *
FROM table_01
WHERE Text LIKE '%[ABCD]1%'
If you can create a FULLTEXT INDEX on that column of your table (which calls for some research into performance and space requirements), then you will probably see a big improvement in text-matching performance. You can go to this link to see what FULLTEXT SEARCH is
and this link to see how to create a FULLTEXT INDEX.
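As an illustration, a hedged sketch of what the query could look like once a full-text index exists on the Text column (assuming none of the terms are stop words):

SELECT *
FROM table_01
WHERE CONTAINS([Text], '"A1" OR "B1" OR "C1" OR "D1"');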
I needed to do this so that I could allow two different databases in a filter for the DatabaseName column in an SQL Server Profiler Trace Template.
All you can do is fill in the body of a Like clause.
Using the reference in John Hartscock's answer, I found out that the like clause uses a sort of limited regex pattern.
For the OP's scenario, MSMS has the solution.
Assuming I want databases ABCOne, ABCTwo, and ABCThree, I come up with what are essentially independent whitelists for each character:
Like ABC[OTT][NWH][EOR]%
Which is easily extensible to any set of strings. It won't be ironclad: that pattern would also match ABCOwe, ABCTnr, or ABCOneHippopotamus, but if you're filtering a limited set of possible values there's a good chance you can make it work.
You could alternatively use the [^] operator to present a blacklist of unacceptable characters.
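For example, a made-up blacklist pattern in the same style:

Like ABC[^XYZ]%

which matches ABC-prefixed names whose fourth character is anything except X, Y, or Z.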
The following two queries return the same (expected) result when I query my database:
SELECT * FROM articles
WHERE content LIKE '%Euskaldunak%'
SELECT * FROM articles
WHERE MATCH (content) AGAINST ('+"Euskaldunak"' IN BOOLEAN MODE)
The text in the content field that it's searching looks like this: "...These Euskaldunak, or newcomers..."
However, the following query on the same table returns the expected single result:
SELECT * FROM articles
WHERE content LIKE '%PCC%'
And the following query returns an empty result:
SELECT * FROM articles
WHERE MATCH (content) AGAINST ('+"PCC"' IN BOOLEAN MODE)
The text in the content field that matches this result looks like this: "...Portland Community College (PCC) is the largest..."
I can't figure out why searching for "Euskaldunak" works with that MATCH...AGAINST syntax but "PCC" doesn't. Does anyone see something that I'm not seeing?
(Also: "PCC" is not a common phrase in this field - no other rows contain the word, so the natural language search shouldn't be excluding it.)
Your fulltext minimum word length is probably set too high. I think the default is 4, which would explain what you are seeing. Set it to 1 if you want all words indexed regardless of length.
Run this query:
show variables like 'ft_min_word_len';
If the value is greater than 3 and you want to get hits on words shorter than that, edit your /etc/my.cnf and add or update this line in the [mysqld] section using a value appropriate for your application:
ft_min_word_len = 1
Then restart MySQL and rebuild your fulltext indexes and you should be all set.
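For example, one way to rebuild the full-text index on a MyISAM table (using the articles table from the question) is:

REPAIR TABLE articles QUICK;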
There are two things I can think of right away. The first is that your ft_min_word_len value is set to more than 3 characters. Any "word" shorter than ft_min_word_len will not get indexed.
The second is that more than 50% of your records contain the 'PCC' string. A full-text search that matches more than 50% of the records is considered irrelevant and returns nothing.
Full-text indexes have different rules than regular string indexes. For example, there is a stop-word list, so certain common words like "to", "the", and "and" don't get indexed.