MySql Regular Expression Select Columns Matching (dynamic)multiple Values within Stored Procedure - sql

I am trying to generate a query where I want to select columns(text) matching multiple values.
eg: I have two columns, id and description. suppose my first row contains description column with value
Google is website and an awesome
search engine
, second row with description column value
Amazon website is an awesome eCommerce
store
I have created a query
Select * from table_name where
description REGEXP 'Website \|
Search'
It returns both the rows i.e with both google and amazon, but i want to return only google as i want those rows with both the words website and search also the number of words to be matched is also not fixed, basically the query I am creating is for a search drop down,
All the words that are passed should be present in the column, the order of the words present in the column is not important. If there are other better options besides using regex , please do point out.
Editing: the number of words that are passed are dynamic and not known, the user may pass additional words to be matched against the column. I would be using the query within a stored Procedure

Really don;t think the regex solution is going to be good for you from a performance point of view. Think you should be looking for FULL text searches.
Specifically you need to create a full text index with something like this in the table definition:
create table testTable
(
Id int auto_increment not null,
TextCol varchar(500)
fulltext(TextCol)
);
Then your query gets easier:
select * from testTable where Match(TextCol) against ('web')
AND Match(TextCol) against ('server')
Strongly suggest you read the MySQL docs regarding FULLTEXT matching and there are lots of little tricks and features that will be useful in this task (including more efficient ways to run the query above)
Edit: Perhaps Boolean mode will help you to an easy solution like this:
Match(textCol) against ('web+ Server+' in boolean mode)
All you have to do is build the against string so I think this can be done in an SP with out dynamic SQL

Related

Optimising LIKE expressions that start with wildcards

I have a table in a SQL Server database with an address field (ex. 1 Farnham Road, Guildford, Surrey, GU2XFF) which I want to search with a wildcard before and after the search string.
SELECT *
FROM Table
WHERE Address_Field LIKE '%nham%'
I have around 2 million records in this table and I'm finding that queries take anywhere from 5-10s, which isn't ideal. I believe this is because of the preceding wildcard.
I think I'm right in saying that any indexes won't be used for seek operations because of the preceeding wildcard.
Using full text searching and CONTAINS isn't possible because I want to search for the latter parts of words (I know that you could replace the search string for Guil* in the below query and this would return results). Certainly running the following returns no results
SELECT *
FROM Table
WHERE CONTAINS(Address_Field, '"nham"')
Is there any way to optimise queries with preceding wildcards?
Here is one (not really recommended) solution.
Create a table AddressSubstrings. This table would have multiple rows per address and the primary key of table.
When you insert an address into table, insert substrings starting from each position. So, if you want to insert 'abcd', then you would insert:
abcd
bcd
cd
d
along with the unique id of the row in Table. (This can all be done using a trigger.)
Create an index on AddressSubstrings(AddressSubstring).
Then you can phrase your query as:
SELECT *
FROM Table t JOIN
AddressSubstrings ads
ON t.table_id = ads.table_id
WHERE ads.AddressSubstring LIKE 'nham%';
Now there will be a matching row starting with nham. So, like should make use of an index (and a full text index also works).
If you are interesting in the right way to handle this problem, a reasonable place to start is the Postgres documentation. This uses a method similar to the above, but using n-grams. The only problem with n-grams for your particular problem is that they require re-writing the comparison as well as changing the storing.
I can't offer a complete solution to this difficult problem.
But if you're looking to create a suffix search capability, in which, for example, you'd be able to find the row containing HWilson with ilson and the row containing ABC123000654 with 654, here's a suggestion.
WHERE REVERSE(textcolumn) LIKE REVERSE('ilson') + '%'
Of course this isn't sargable the way I wrote it here. But many modern DBMSs, including recent versions of SQL server, allow the definition, and indexing, of computed or virtual columns.
I've deployed this technique, to the delight of end users, in a health-care system with lots of record IDs like ABC123000654.
Not without a serious preparation effort, hwilson1.
At the risk of repeating the obvious - any search path optimisation - leading to the decision whether an index is used, or which type of join operator to use, etc. (independently of which DBMS we're talking about) - works on equality (equal to) or range checking (greater-than and less-than).
With leading wildcards, you're out of luck.
The workaround is a serious preparation effort, as stated up front:
It would boil down to Vertica's text search feature, where that problem is solved. See here:
https://my.vertica.com/docs/8.0.x/HTML/index.htm#Authoring/AdministratorsGuide/Tables/TextSearch/UsingTextSearch.htm
For any other database platform, including MS SQL, you'll have to do that manually.
In a nutshell: It relies on a primary key or unique identifier of the table whose text search you want to optimise.
You create an auxiliary table, whose primary key is the primary key of your base table, plus a sequence number, and a VARCHAR column that will contain a series of substrings of the base table's string you initially searched using wildcards. In an over-simplified way:
If your input table (just showing the columns that matter) is this:
id |the_search_col |other_col
42|The Restaurant at the End of the Universe|Arthur Dent
43|The Hitch-Hiker's Guide to the Galaxy |Ford Prefect
Your auxiliary search table could contain:
id |seq|search_token
42| 1|Restaurant
42| 2|End
42| 3|Universe
43| 1|Hitch-Hiker
43| 2|Guide
43| 3|Galaxy
Normally, you suppress typical "fillers" like articles and prepositions and apostrophe-s , and split into tokens separated by punctuation and white space. For your '%nham%' example, however, you'd probably need to talk to a linguist who has specialised in English morphology to find splitting token candidates .... :-]
You could start by the same technique that I use when I un-pivot a horizontal series of measures without the PIVOT clause, like here:
Pivot sql convert rows to columns
Then, use a combination of, probably nested, CHARINDEX() and SUBSTRING() using the index you get from the CROSS JOIN with a series of index integers as described in my post suggested above, and use that very index as the sequence for the auxiliary search table.
Lay an index on search_token and you'll have a very fast access path to a big table.
Not a stroll in the park, I agree, but promising ...
Happy playing -
Marco the Sane

Select rows in SQL with regex values that match input?

Rather than selecting rows based on whether their string value equals a given regular expression input, I want to select rows with regular expressions that match a given string input.
As far as purpose, I'm am trying to identify website names from input URLs.
TABLE
WEBSITE REGEX
The New York Times ^.+\.nytimes.com.*$
Is there a good way to do this? I'm using postgres, and I was hoping to avoid large loops.
Thanks!
This seems to work fine:
CREATE TABLE Sites
(
SiteName text,
RegEx text
);
INSERT INTO Sites VALUES ('NY Times', '^.+\.nytimes.com.*$');
Then you can do:
SELECT * FROM Sites
WHERE 'http://www.nytimes.com/Foo' ~ RegEx;
Fiddle
Keep in mind this might start to get slow if you had a lot of rows, as it's going to have to do a sequential table scan each time and run the regular expression against each row. A better approach might be to parse the URL first and normalize it in some way, then look for an exact match in the table.

How to overcome ( SQL Like operator ) performance issues?

I have Users table contains about 500,000 rows of users data
The Full Name of the User is stored in 4 columns each have type nvarchar(50)
I have a computed column called UserFullName that's equal to the combination of the 4 columns
I have a Stored Procedure searching in Users table by name using like operatior as below
Select *
From Users
Where UserFullName like N'%'+#FullName+'%'
I have a performance issue while executing this SP .. it take long time :(
Is there any way to overcome the lack of performance of using Like operator ?
Not while still using the like operator in that fashion. The % at the start means your search needs to read every row and look for a match. If you really need that kind of search you should look into using a full text index.
Make sure your computed column is indexed, that way it won't have to compute the values each time you SELECT
Also, depending on your indexing, using PATINDEX might be quicker, but really you should use a fulltext index for this kind of thing:
http://msdn.microsoft.com/en-us/library/ms187317.aspx
If you are using index it will be good.
So you can give the column an id or something, like this:
Alter tablename add unique index(id)
Take a look at that article http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning .
It easily describes how LIKE works in terms of performance.
Likely, you are facing such issues because your whole table shoould be traversed because of first % symbol.
You should try creating a list of substrings(in a separate table for example) representing k-mers and search for them without preceeding %. Also, an index for such column would help. Please read more about KMer here https://en.m.wikipedia.org/wiki/K-mer .
This will not break an index and will be more efficient to search.

How to use like condition with multiple values in sql server 2005?

I need to filter out records based on some text matching in nvarchar(1000) column.
Table has more than 400 thousands records and growing. For now, I am using Like condition:-
SELECT
*
FROM
table_01
WHERE
Text like '%A1%'
OR Text like '%B1%'
OR Text like '%C1%'
OR Text like '%D1%'
Is there any preferred work around?
SELECT
*
FROM
table_01
WHERE
Text like '%[A-Z]1%'
This will check if the texts contains A1, B1, C1, D1, ...
Reference to using the Like Condition in SQL Server
You can try the following if you know the exact position of your sub string:
SELECT
*
FROM
table_01
WHERE
SUBSTRING(Text,1,2) in ('B1','C1','D1')
Have a look at LIKE on msdn.
You could reduce the number filters by combining more details into a single LIKE clause.
SELECT
*
FROM
table_01
WHERE
Text like '%[ABCD]1%'
If you can create a FULLTEXT INDEX on that column of your table (that assumes a lot of research on performance and space), then you are probably going to see a big improvement on performance on text matching. You can go to this link to see what FULLTEXT SEARCH is
and this link to see how to create a FULLTEXT INDEX.
I needed to do this so that I could allow two different databases in a filter for the DatabaseName column in an SQL Server Profiler Trace Template.
All you can do is fill in the body of a Like clause.
Using the reference in John Hartscock's answer, I found out that the like clause uses a sort of limited regex pattern.
For the OP's scenario, MSMS has the solution.
Assuming I want databases ABCOne, ABCTwo, and ABCThree, I come up with what is essentially independent whitelists for each character:
Like ABC[OTT][NWH][EOR]%
Which is easily extensible to any set of strings. It won't be ironclad, that last pattern would also match ABCOwe, ABCTnr, or ABCOneHippotamus, but if you're filtering a limited set of possible values there's a good chance you can make it work.
You could alternatively use the [^] operator to present a blacklist of unacceptable characters.

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as dynamic WHERE clauses. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. Makes testing convenient. Note, when you create your sql queries in code, don't concatenate in user input. Use your #variables!
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally for name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
Name = COALESCE(#name, Name) AND
Nickname = COALESCE(#nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for #name and null for #nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The coalesce operator turns the null into an identity evaluation, which is always true and doesn't affect the where clause.
Search and normalization can be at odds with each other. So probably first thing would be to get some kind of "view" that shows all the fields that can be searched as a single row with a single key getting you the resume. then you can throw something like Lucene in front of that to give you a full text index of those rows, the way that works is, you ask it for "x" in this view and it returns to you the key. Its a great solution and come recommended by joel himself on the podcast within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example lets imagine a Resume that will composed of several fields:
List item
Name,
Adreess,
Education (this could be a table on its own) or
Work experience (this could grow to its own table where each row represents a previous job)
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINS.
Instead you could change your framework of reference and think of the Whole resume as what it is a Single Document and you just want to search said document.
This is where tools like Sphinx Search do. They create a FULL TEXT index of your 'document' and then you can query sphinx and it will give you back where in the Database that record was found.
Really good search results.
Don't worry about this tools not being part of your RDBMS it will save you a lot of headaches to use the appropriate model "Documents" vs the incorrect one "TABLES" for this application.