I'm trying to find a way to match a query to a regular expression in a database. As far as I can tell (although I'm no expert), while most DBMS like MySQL have a regex option for searching, you can only do something like:
Find all rows in Column 1 that match the regex in my query.
What I want to be able to do is the opposite, i.e.:
Find all rows in Column 1 such that the regex in Column 1 matches my query.
Simple example - say I had a database structured like so:
+----------+-----------+
| Column 1 | Column 2 |
+----------+-----------+
| [a-z]+ | whatever |
+----------+-----------+
| [\w]+ | whatever |
+----------+-----------+
| [0-9]+ | whatever |
+----------+-----------+
So if I queried "dog", I would want it to return the rows with [a-z]+ and [\w]+, and if I queried 123, it would return the row with [0-9]+.
If you know of a way to do this in SQL, a short SELECT example or a link with an example would be much appreciated.
For MySQL (and may be other databases too):
SELECT * FROM table WHERE "dog" RLIKE(`Column 1`)
In PostgreSQL it would be:
SELECT * FROM table WHERE 'dog' ~ "Column 1";
Related
I have the following table:
postgres=# \d so_rum;
Table "public.so_rum"
Column | Type | Collation | Nullable | Default
-----------+-------------------------+-----------+----------+---------
id | integer | | |
title | character varying(1000) | | |
posts | text | | |
body | tsvector | | |
parent_id | integer | | |
Indexes:
"so_rum_body_idx" rum (body)
I wanted to do phrase search query, so I came up with the below query, for example:
select id from so_rum
where body ## phraseto_tsquery('english','Is it possible to toggle the visibility');
This gives me the results, which only match's the entire text. However, there are documents, where the distance between lexmes are more and the above query doesn't gives me back those data. For example: 'it is something possible to do toggle between the. . . visibility' doesn't get returned. I know I can get it returned with <2> (for example) distance operator by giving in the to_tsquery, manually.
But I wanted to understand, how to do this in my sql statement itself, so that I get the results first with distance of 1 and then 2 and so on (may be till 6-7). Finally append results with the actual count of the search words like the following query:
select count(id) from so_rum
where body ## to_tsquery('english','string & string . . . ')
Is it possible to do in a single query with good performance?
I don't see a canned solution to this. It sounds like you need to use plainto_tsquery to get all the results with all the lexemes, and then implement your own custom ranking function to rank them by distance between the lexemes, and maybe filter out ones with the wrong order.
I have a database of industry-specific terms, each of which may have zero or more synonyms. Users of the system can search for terms by keyword and the results should include any term that contains the keyword or that has at least one synonym that contains the keyword. The result should then include the term and ONLY ONE of the matching synonyms.
Here's the setup... I have a term table with 2 fields: id and term. I also have a synonym table with 3 fields: id, termId, and synonym. So there would data like:
term Table
id | term
-- | -----
1 | dog
2 | cat
3 | bird
synonym Table
id | termId | synonym
-- | ------ | --------
1 | 1 | canine
2 | 1 | man's best friend
3 | 2 | feline
A keyword search for (the letter) "i" should return the following as a result:
id | term | synonym
-- | ------ | --------
1 | dog | canine <- because of the "i" in "canine"
2 | cat | feline <- because of the "i" in "feline"
3 | bird | <- because of the "i" in "bird"
Notice how, even though both "dog" synonyms contain the letter "i", only one was returned in the result (doesn't matter which one).
Because I need to return all matches from the term table regardless of whether or not there's a synonym and I need no more than 1 matching synonym, I'm using an OUTER APPLY as follows:
<!-- language: sql -->
SELECT
term.id,
term.term,
synonyms.synonym
FROM
term
OUTER APPLY (
SELECT
TOP 1
term.id,
synonym.synonym
FROM
synonym
WHERE
term.id = synonym.termId
AND synonym.synonym LIKE #keyword
) AS synonyms
WHERE
term.term LIKE #keyword
OR synonyms.synonym LIKE #keyword
There are indexes on term.term, synonym.termId and synonym.synonym. #Keyword is always something like '%foo%'. The problem is that, with close to 50,000 terms (not that much for databases, I know, but...), the performance is horrible. Any thoughts on how this can be done more efficiently?
Just a note, one thing I had thought to try was flattening the synonyms into a comma-delimited list in the term table so that I could get around the OUTER APPLY. Unfortunately though, that list can easily exceed 900 characters which would then prevent SQL Server from adding an index to that column. So that's a no-go.
Thanks very much in advance.
You've got a lot of unnecessary logic in there. There's no telling how SQL server is creating an execution path. It's simpler and more efficient to split this up into two separate db calls and then merge them in your code:
Get matches based on synonyms:
SELECT
term.id
,term.term
,synonyms.synonym
FROM
term
INNER JOIN synonyms ON term.termId = synonyms.termId
WHERE
synonyms.synonym LIKE #keyword
Get matches based on terms:
SELECT
term.id
,term.term
FROM
term
WHERE
term.term LIKE #keyword
For "flattening the synonyms into a comma-delimited list in the term table: - Have you considered using Full Text Search feature? It would be much faster even when your data goes on becoming bulky.
You can put all synonyms (as comma delimited) in "synonym" column and put full text index on the same.
If you want to get results also with the synonyms of the words, I recommend you to use Freetext. This is an example:
SELECT Title, Text, * FROM [dbo].[Post] where freetext(Title, 'phone')
The previous query will match the words with ‘phone’ by it’s meaning, not the exact word. It will also compare the inflectional forms of the words. In this case it will return any title that has ‘mobile’, ‘telephone’, ‘smartphone’, etc.
Take a look at this article about SQL Server Full Text Search, hope it helps
I need to test if any part of a column value is in a given string, instead of whether the string is part of a column value.
For instance:
This way, I can find if any of the rows in my table contains the string 'bricks' in column:
SELECT column FROM table
WHERE column ILIKE '%bricks%';
But what I'm looking for, is to find out if any part of the sentence "The ships hung in the sky in much the same way that bricks don’t" is in any of the rows.
Something like:
SELECT column FROM table
WHERE 'The ships hung in the sky in much the same way that bricks don’t' ILIKE '%' || column || '%';
So the row from the first example, where the column contains 'bricks', will show up as result.
I've looked through some suggestions here and some other forums but none of them worked.
Your simple case can be solved with a simple query using the ANY construct and ~*:
SELECT *
FROM tbl
WHERE col ~* ANY (string_to_array('The ships hung in the sky ... bricks don’t', ' '));
~* is the case insensitive regular expression match operator. I use that instead of ILIKE so we can use original words in your string without the need to pad % for ILIKE. The result is the same - except for words containing special characters: %_\ for ILIKE and !$()*+.:<=>?[\]^{|}- for regular expression patterns. You may need to escape special characters either way to avoid surprises. Here is a function for regular expressions:
Escape function for regular expression or LIKE patterns
But I have nagging doubts that will be all you need. See my comment. I suspect you need Full Text Search with a matching dictionary for your natural language to provide useful word stemming ...
Related:
IN vs ANY operator in PostgreSQL
PostgreSQL LIKE query performance variations
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
This query:
SELECT
regexp_split_to_table(
'The ships hung in the sky in much the same way that bricks don’t',
'\s' );
gives a following result:
| regexp_split_to_table |
|-----------------------|
| The |
| ships |
| hung |
| in |
| the |
| sky |
| in |
| much |
| the |
| same |
| way |
| that |
| bricks |
| don’t |
Now just do a semijoin against a result of this query to get desired results
SELECT * FROM table t
WHERE EXISTS (
SELECT * FROM (
SELECT
regexp_split_to_table(
'The ships hung in the sky in much the same way that bricks don’t',
'\s' ) x
) x
WHERE t.column LIKE '%'|| x.x || '%'
)
First: I'm using Access 2010.
What I need to do is pull everything in a field out that is NOT a certain string. Say for example you have this:
00123457*A8V*
Those last 3 characters that are bolded are just an example; that portion can be any combination of numbers/letters and from 2-4 characters long. The 00123457 portion will always be the same. So what I would need to have returned by my query in the example above is the "A8V".
I have a vague idea of how to do this, which involved using the Right function, with (field length - the last position in that string). So what I had was
SELECT Right(Facility.ID, (Len([ID) - InstrRev([ID], "00123457")))
FROM Facility;
Logically in this mind it would work, however Access 2010 complains that I am using the Right function incorrectly. Can someone here help me figure this out?
Many thanks!
Why not use a replace function?
REPLACE(Facility.ID, "00123457", "")
You are missing a closing square bracket in here Len([ID)
You also need to reverse this "00123457" in InStrRev(), but you don't need InStrRev(), just InStr().
If I understand correctly, you want the last three characters of the string.
The simple syntax: Right([string],3) will yield the results you desire.
(http://msdn.microsoft.com/en-us/library/ms177532.aspx)
For example:
(TABLE1)
| ID | STRING |
------------------------
| 1 | 001234567A8V |
| 2 | 008765432A8V |
| 3 | 005671234A8V |
So then you'd run this query:
SELECT Right([Table1.STRING],3) AS Result from Table1;
And the Query returns:
(QUERY)
| RESULT |
---------------
| A8V |
| A8V |
| A8V |
EDIT:
After seeing the need for the end string to be 2-4 characters while the original, left portion of the string is 00123457 (8 characters), try this:
SELECT Right([Table1].[string],(Len([Table1].[string])-'8')) AS Result
FROM table1;
I have some table 'TableName' like Id,Name:
1 | something
2 | _something
3 | something like that
4 | some_thing
5 | ...
I want to get all rows from this table where name containes 'some'.
I have 2 ways:
SELECT * FROM TableName WHERE Name like '%some%'
Result is table :
1 | something
2 | _something
3 | something like that
4 | some_thing
But if I use CONTAINS function
SELECT * FROM TableName WHERE CONTAINS(Name,'"*some*"')
I get only
1 | something
3 | something like that
What should I do to make CONTAINS function work properly?
The last time I looked (admittedly SQL Server 2000) CONTAINS didn't support wildcard matching at the beginning of words, only at the end. Also, you might need to check your noise files to see if the "_" character is being ignored.
Also see
How do you get leading wildcard full-text searches to work in SQL Server?
http://doc.ddart.net/mssql/sql70/ca-co_15.htm
If you read this article you will see that * means prefix this means that word must start with this, but like means the word contains key phrase.
Best Regards,
Iordan
Try this:
SELECT * FROM TableName WHERE CONTAINS(Name,'some')