SELECT if string contains column value - sql

Manufacturer
==========================
id  name
--------------------------
1   Company Inc.
2   Google Test.
3   3M (UNITY) USA. INC.
4   CE EE
Say I have a string 'Google Test. 1257 SCS RANDOM 31233DD' and I want to find all rows in table manufacturer where the name is part of the given string:
SELECT * FROM manufacturer
WHERE 'Google Test. 1257 SCS RANDOM 31233DD' ILIKE '%' || name || '%'
Correctly returns:
id name
--------------------------
2 Google Test.
But when I do:
SELECT * FROM manufacturer
WHERE '3dad QTICE EEN ' ILIKE '%' || name || '%'
it returns:
id name
--------------------------
4 CE EE
I don't want partial matches like this. The name should not match in the middle of a word. I tried substring():
SELECT * from manufacturer
WHERE SUBSTRING('Google Test. 1257 SCS RANDOM 31233DD' from name) != '';
But I get:
ERROR: invalid regular expression: quantifier operand invalid
Unfortunately I don't have the exact spec to go by, since I am querying an external database. But from what I have seen, the column is varchar(256). All values are uppercase and use plain spaces. All start with either a letter or a number and end with either a number, a letter, or a special character, e.g. 'CLEVLAND DRILL (GREEN)'. There are special characters in the values, such as ,.()&/
I am not really concerned about efficiency, as long as one query doesn't take over 50 ms.
Right now there are about 10,000+ entries, but that could definitely grow over time.

One method with LIKE is to add spaces to the beginning and end of both the string and the name:
SELECT *
FROM manufacturer
WHERE ' ' || '3dad QTICE EEN ' || ' ' ILIKE '% ' || name || ' %'
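The padding idea can be illustrated outside the database with a minimal Python sketch. The helper `padded_contains` and the in-memory `manufacturers` dict are hypothetical stand-ins for the table. `str.casefold()` approximates ILIKE's case-insensitivity. Unlike LIKE, this sketch treats no character in the name as a wildcard.

```python
def padded_contains(haystack: str, name: str) -> bool:
    """Mimic: ' ' || haystack || ' ' ILIKE '% ' || name || ' %'.

    Padding both sides with a space means the name can only match
    whole space-separated words, including at the very start or end.
    """
    padded_haystack = " " + haystack.casefold() + " "
    padded_name = " " + name.casefold() + " "
    return padded_name in padded_haystack


# Hypothetical in-memory copy of the sample table.
manufacturers = {1: "Company Inc.", 2: "Google Test.", 3: "3M (UNITY) USA. INC.", 4: "CE EE"}

for text in ["Google Test. 1257 SCS RANDOM 31233DD", "3dad QTICE EEN "]:
    hits = [mfr_id for mfr_id, name in manufacturers.items() if padded_contains(text, name)]
    print(repr(text), "->", hits)
# 'Google Test. 1257 SCS RANDOM 31233DD' -> [2]
# '3dad QTICE EEN ' -> []
```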
If you need more sophisticated matching, then you might need to use regular expressions with word boundaries.

All the values start with either character or a number and end with either number, char, or special character. ... There are special characters in the value, such as ,.()&/.
I suggest the regular expression match operator ~. Carefully define boundaries and escape special characters in name:
Create once:
CREATE OR REPLACE FUNCTION f_regexp_escape(text)
RETURNS text AS
$func$
SELECT regexp_replace($1, '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')
$func$ LANGUAGE sql IMMUTABLE;
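The same escaping can be sketched in Python with `re.sub` and the same character class, for illustration only. `regexp_escape` is a hypothetical name and not part of PostgreSQL. Each metacharacter gets a leading backslash, while letters, digits and spaces pass through unchanged.

```python
import re

# Same set as the SQL function: ! $ ( ) * + . : < = > ? [ \ ] ^ { | } -
_SPECIAL = re.compile(r"([!$()*+.:<=>?\[\\\]^{|}-])")


def regexp_escape(text: str) -> str:
    """Prefix every regex metacharacter in text with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(regexp_escape("3M (UNITY) USA. INC."))  # 3M \(UNITY\) USA\. INC\.
```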
Then:
SELECT * FROM manufacturer
WHERE '3dad QTICE EEN ' ~ ('\m' || f_regexp_escape(name) || '( |$)')
How? Why?
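\m .. beginning of a word. This works because the values start with either a character or a number.
( |$) .. a space or the end of the string. We need this because the values end with either a number, a character, or a special character, and \M (end of word) would not match after a trailing special character such as . or ).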
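The boundary logic can be checked quickly in Python. This sketch rests on two assumptions. First, `re.escape()` stands in for `f_regexp_escape()`. Second, Python's `re` has no `\m`, but every name starts with a letter or digit, so a leading `\b` behaves like PostgreSQL's `\m` here. Matching is case-sensitive, like `~`. `matches_name` is a hypothetical helper.

```python
import re


def matches_name(text: str, name: str) -> bool:
    """Sketch of: text ~ ('\\m' || f_regexp_escape(name) || '( |$)')."""
    # \b before a leading letter/digit = start of a word (like \m);
    # ( |$) = the name must be followed by a space or the end of the text.
    pattern = r"\b" + re.escape(name) + r"( |$)"
    return re.search(pattern, text) is not None


print(matches_name("3dad QTICE EEN ", "CE EE"))                               # False
print(matches_name("Google Test. 1257 SCS RANDOM 31233DD", "Google Test."))   # True
print(matches_name("ACME 3M (UNITY) USA. INC.", "3M (UNITY) USA. INC."))      # True
print(matches_name("Google Test.X", "Google Test."))  # False: no space or end after the name
```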
The content of manufacturer.name is the core of the pattern. You want the literal meaning of all its characters, so strip any special meaning by escaping properly. This is true for LIKE (few special characters) as well as for the regular expression match operator ~ (more special characters). It is often overlooked and quite a pitfall. That is what tripped you up, along with the tricky definition of the boundaries. Read this:
Escape function for regular expression or LIKE patterns
And then use the function f_regexp_escape() as demonstrated. A name like:
3M (UNITY) USA. INC.
becomes:
3M \(UNITY\) USA\. INC\.
It might be convenient to store ready-escaped patterns in table manufacturer, maybe as an additional column, and maybe with the added padding, like this:
\m3M \(UNITY\) USA\. INC\.( |$)
Or generate the pattern on the fly like demonstrated.
This way name can be a single word or a whole phrase, and end with any characters. But start and end never match in the middle of a "word" on the other side.
There is an arsenal of other pattern matching tools in Postgres:
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
If your table is big, consider the full text search infrastructure with optimized indexes and phrase search capability:
How to search hyphenated words in PostgreSQL full text search?

To solve this problem you really need to use a regex, because adding a space on either side of the name will not match at the beginning or end of the string. With a regex, we can check for that situation too. For example:
SELECT *
FROM manufacturer
WHERE 'Google Test. 1257 36700 SCS RANDOM WORD 31233DD' ~ ('(^| )' || name || '( |$)');
Output:
id name
2 Google Test.
Query:
SELECT *
FROM manufacturer
WHERE '3dad QTICE EEN ' ~ ('(^| )' || name || '( |$)');
Output:
There are no results to be displayed.
Query:
SELECT *
FROM manufacturer
WHERE 'CE EE ' ~ ('(^| )' || name || '( |$)');
Output:
id name
4 CE EE
Demo on dbfiddle
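A small Python sketch reproduces the three outputs. `manufacturers` is a hypothetical in-memory copy of the sample table, and Python's `re` stands in for PostgreSQL's `~`. As in the queries above, the name is spliced into the pattern without escaping.

```python
import re

# Hypothetical in-memory copy of the sample table.
manufacturers = {1: "Company Inc.", 2: "Google Test.", 3: "3M (UNITY) USA. INC.", 4: "CE EE"}


def find_ids(text: str) -> list[int]:
    """Return ids whose name satisfies text ~ ('(^| )' || name || '( |$)')."""
    return [
        mfr_id
        for mfr_id, name in manufacturers.items()
        if re.search("(^| )" + name + "( |$)", text)  # name is NOT escaped here
    ]


print(find_ids("Google Test. 1257 36700 SCS RANDOM WORD 31233DD"))  # [2]
print(find_ids("3dad QTICE EEN "))                                  # []
print(find_ids("CE EE "))                                           # [4]
```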
Update
Because the name values in the table can contain characters that have special meaning in a regex, they need to be escaped before the name is included into the regex. You can do this with REGEXP_REPLACE e.g.
REGEXP_REPLACE(name, '([\\.+*?[^\]$(){}=!<>|:\-#])', '\\\1', 'g')
So your query should be:
SELECT *
FROM manufacturer
WHERE 'Google Test. 1257 36700 SCS RANDOM WORD 31233DD' ~ ('(^| )' || REGEXP_REPLACE(name, '([\\.+*?[^\]$(){}=!<>|:\-#])', '\\\1', 'g') || '( |$)');
Updated demo
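Python can show why the escaping matters. This is a sketch: `escape` and `word_match` are hypothetical helpers, `escape` reuses the same character class, and `re` stands in for PostgreSQL's regex engine. Without escaping, the parentheses in 3M (UNITY) USA. INC. form a capture group and the dots match any character. As a result, the literal name is missed while a near-miss is accepted.

```python
import re

# Same character class as the REGEXP_REPLACE call, in Python syntax.
_SPECIAL = re.compile(r"([\\.+*?\[^\]$(){}=!<>|:\-#])")


def escape(name: str) -> str:
    """Prefix each regex metacharacter in name with a backslash."""
    return _SPECIAL.sub(r"\\\1", name)


def word_match(text: str, name: str, escaped: bool) -> bool:
    body = escape(name) if escaped else name
    return re.search("(^| )" + body + "( |$)", text) is not None


text = "ACME 3M (UNITY) USA. INC."
print(word_match(text, "3M (UNITY) USA. INC.", escaped=False))    # False
print(word_match(text, "3M (UNITY) USA. INC.", escaped=True))     # True
print(word_match("Google Test1", "Google Test.", escaped=False))  # True: '.' matched '1'
print(word_match("Google Test1", "Google Test.", escaped=True))   # False
```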

Related

SQL Using LIKE and ANY at the same time

I have a table with a column feature of type text and a text array (text[]) named args. I need to select from the table those rows in which the feature column contains at least one of the elements of the args array.
I've tried different options, including this:
SELECT * FROM myTable WHERE feature LIKE '%' + ANY (args) + '%';
But that does not work.
The simple solution is to use the regular expression match operator ~ instead, which works with the strings in args as is (without concatenating wildcards):
SELECT *
FROM tbl
WHERE feature ~ ANY(args);
string ~ 'pattern' is mostly equivalent to string LIKE '%pattern%', but not exactly, as LIKE uses different (and fewer) special characters than ~. See:
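The following Python sketch shows the difference, assuming Python's `re` behaves like PostgreSQL's `~` for these simple patterns. The same string acts as a literal in the LIKE version and as a regex in the `~` version. The helper names are hypothetical.

```python
import re


def like_contains(feature: str, arg: str) -> bool:
    """feature LIKE '%' || arg || '%' (assumes arg has no % or _ wildcards)."""
    return arg in feature


def regex_contains(feature: str, arg: str) -> bool:
    """feature ~ arg: arg is interpreted as a regular expression."""
    return re.search(arg, feature) is not None


def matches_any(feature: str, args: list[str]) -> bool:
    """feature ~ ANY(args): true if at least one pattern matches."""
    return any(regex_contains(feature, a) for a in args)


print(like_contains("abc", "a.c"))                      # False: '.' is a literal dot for LIKE
print(regex_contains("abc", "a.c"))                     # True: '.' matches any character
print(matches_any("full text search", ["xyz", "text"]))  # True
```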
Escape function for regular expression or LIKE patterns
If that subtle difference is not acceptable, here is an exact implementation of what you are asking for:
SELECT *
FROM tbl t
WHERE t.feature LIKE ANY (SELECT '%' || a || '%' FROM unnest(t.args) a);
Unnest the array, pad each element with wildcards, and use LIKE ANY with the resulting set.
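The following hedged Python sketch makes the exact LIKE semantics concrete. It translates a LIKE pattern into a regex: % becomes .*, _ becomes ., and a backslash escapes the next character (PostgreSQL's default LIKE escape). It then applies the result to each padded array element, mirroring LIKE ANY over unnest(). `like_to_regex`, `like` and `like_any` are hypothetical helpers.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a LIKE pattern into an equivalent Python regex body."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Backslash escapes the next character (default LIKE escape).
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")  # any sequence of zero or more characters
        elif ch == "_":
            out.append(".")  # exactly one character
        else:
            out.append(re.escape(ch))  # everything else is literal
        i += 1
    return "".join(out)


def like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value; DOTALL lets '.' match newlines too.
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


def like_any(feature: str, args: list[str]) -> bool:
    """feature LIKE ANY (SELECT '%' || a || '%' FROM unnest(args) a)."""
    return any(like(feature, "%" + a + "%") for a in args)


print(like_any("abc", ["a.c"]))             # False: '.' is literal in LIKE
print(like_any("x a.c y", ["zzz", "a.c"]))  # True
print(like_any("abc", []))                  # False: ANY over an empty set
```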
See:
IN vs ANY operator in PostgreSQL

UpperCase Replace(Split_Part())

Community,
I need help removing the underscores '_' and making the name readable, with the first letter of the first name and of the last name uppercase, while also removing the number. I hope this makes sense. I am running Presto and using Query Fabric. Is there a better way to write this?
Email Address
Full_Metal_Jacket#movie.com
TOP_GUN2#movie.email.com
Needed Outcome
Full Metal Jacket
Top Gun
Partially working solution:
,REPLACE(SPLIT_PART(T.EMAIL, '#', 1),'_',' ') Name
Something like this:
,LOWER(REPLACE(UPPER(SPLIT_PART(T.EMAIL, '#', 1)),'_',' '))Name
Try this:
WITH t(email) AS (
VALUES 'Full_Metal_Jacket#movie.com', 'TOP_GUN2#movie.email.com'
)
SELECT array_join(
transform(
split(regexp_extract(email, '(^[^0-9#]+)', 1), '_'),
part -> upper(substr(part, 1, 1)) || lower(substr(part, 2))),
' ')
FROM t;
How it works:
Extract the non-numeric prefix up to the # with a regex, via regexp_extract.
Split the prefix on _ to produce an array.
Transform the array by capitalizing the first letter of each element and lowercasing the rest.
Finally, join them all together with a space using the array_join function.
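The same four steps can be sketched in plain Python to check the expected output. `email_to_name` is a hypothetical helper. `re.match` with the same character class plays the role of `regexp_extract`, and `None` plays the role of SQL NULL when there is no prefix.

```python
import re
from typing import Optional


def email_to_name(email: str) -> Optional[str]:
    # regexp_extract(email, '(^[^0-9#]+)', 1): leading run of non-digit, non-'#' chars
    m = re.match(r"[^0-9#]+", email)
    if m is None:
        return None  # regexp_extract returns NULL when nothing matches
    parts = m.group(0).split("_")  # split(..., '_')
    # transform(...): upper-case the first letter, lower-case the rest
    words = [p[:1].upper() + p[1:].lower() for p in parts]
    return " ".join(words)  # array_join(..., ' ')


for email in ["Full_Metal_Jacket#movie.com", "TOP_GUN2#movie.email.com"]:
    print(email_to_name(email))
# Full Metal Jacket
# Top Gun
```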
Update:
Here's another variant without involving transform and the intermediate array:
regexp_replace(
replace(regexp_extract(email, '(^[^0-9#]+)', 1), '_', ' '),
'(\w)(\w*)',
x -> upper(x[1]) || lower(x[2]))
Like the approach above, it first extracts the non-numeric prefix, then it replaces underscores with spaces with the replace function, and finally, it uses regexp_replace to process each word. The (\w)(\w*) regular expression captures the first letter of the word and the rest of the word into two separate capture groups. The x -> upper(x[1]) || lower(x[2]) lambda expression then capitalizes the first letter (first capture group, x[1]) and lowercases the rest (second capture group, x[2]).
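Python's `re.sub` also accepts a function as the replacement, so the lambda-based variant translates almost one-to-one. This is a sketch, and the helper name is hypothetical. The match object is indexed with `group(1)` and `group(2)`, where the Presto lambda uses `x[1]` and `x[2]`.

```python
import re
from typing import Optional


def email_to_name_regex(email: str) -> Optional[str]:
    m = re.match(r"[^0-9#]+", email)  # regexp_extract(email, '(^[^0-9#]+)', 1)
    if m is None:
        return None
    spaced = m.group(0).replace("_", " ")  # replace(..., '_', ' ')
    # regexp_replace(..., '(\w)(\w*)', x -> upper(x[1]) || lower(x[2]))
    return re.sub(r"(\w)(\w*)", lambda x: x.group(1).upper() + x.group(2).lower(), spaced)


print(email_to_name_regex("Full_Metal_Jacket#movie.com"))  # Full Metal Jacket
print(email_to_name_regex("TOP_GUN2#movie.email.com"))     # Top Gun
```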

Full Text Search Using Multiple Partial Words

I have a SQL Server database that has medical descriptions in it. I've created a full text index on it, but I'm still figuring out how this works.
The easiest example to give: there is a description of Hypertensive heart disease.
Now they would like to be able to type hyp hea as a search term and have it return that.
So from what I've read it seems like my query needs to be something like
DECLARE @Term VARCHAR(100)
SET @Term = 'NEAR(''Hyper*'',''hea*'')'
SELECT * FROM Icd10Codes WHERE CONTAINS(Description, @Term)
If I take the wildcards out and type the full words Hypertensive and heart, it works, but adding the wildcards returns nothing.
If it makes any difference, I'm using SQL Server 2017.
It turned out to be a weird syntax issue that didn't cause an error but stopped the search from working.
I changed it to
SELECT * FROM Icd10Codes where CONTAINS(description, '"hyper*" NEAR "hea*"')
The key here is that I needed double quotes ", not two single quotes. I assumed it was two single quotes, the first escaping the second, but it was actually double quotes. The above query returns exactly the expected results.
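The working form uses one quoted prefix term per word, so the search condition can be built from what the user types. In this hedged Python sketch, `build_prefix_near` is hypothetical. It keeps only letters and digits from each word, which is an assumption made to keep quotes and full-text operators out of the condition. The result would then be passed as the parameter for CONTAINS rather than concatenated into the SQL.

```python
import re


def build_prefix_near(user_input: str) -> str:
    """Turn 'hyp hea' into '"hyp*" NEAR "hea*"' for CONTAINS().

    Assumption: only ASCII letters and digits are kept from each word.
    Returns an empty string when nothing usable remains; the caller should
    skip the query then, because CONTAINS rejects an empty search condition.
    """
    words = (re.sub(r"[^0-9A-Za-z]", "", w) for w in user_input.split())
    terms = [f'"{w}*"' for w in words if w]
    return " NEAR ".join(terms)


print(build_prefix_near("hyp hea"))  # "hyp*" NEAR "hea*"
```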
This will work:
SELECT * FROM Icd10Codes where SOUNDEX(description)=soundex('Hyp');
SELECT * FROM Icd10Codes where DIFFERENCE(description,'hyp hea')>=2;
You could try a LIKE statement. You can find a thorough explanation here.
Like so:
SELECT * FROM Icd10Codes WHERE Description LIKE '%hyp hea%';
And then, instead of putting the string in there, just use a variable.
If you need to search for separated partial words, as in an array of search terms, it gets a bit tricky, since you need to dynamically build the SQL statement.
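The following Python sketch shows that dynamic construction, with the standard-library sqlite3 module as a stand-in database. SQL Server would use its own driver and parameter markers. Only the number of LIKE conditions is built dynamically, and the terms themselves are bound as parameters. SQLite's LIKE is case-insensitive for ASCII, similar to SQL Server's default case-insensitive collations. `search_partial_words` and the sample rows are hypothetical, and % or _ inside a term would still act as wildcards.

```python
import sqlite3


def search_partial_words(conn: sqlite3.Connection, terms: list[str]) -> list[str]:
    """Return descriptions that contain every term (one LIKE per term)."""
    if not terms:
        return []
    where = " AND ".join("Description LIKE ?" for _ in terms)
    sql = f"SELECT Description FROM Icd10Codes WHERE {where} ORDER BY Description"
    params = [f"%{t}%" for t in terms]  # terms are bound, never spliced into SQL
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Icd10Codes (Description TEXT)")
conn.executemany(
    "INSERT INTO Icd10Codes VALUES (?)",
    [("Hypertensive heart disease",), ("Hypotension",), ("Heart failure",)],
)
print(search_partial_words(conn, ["hyp", "hea"]))  # ['Hypertensive heart disease']
```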
MSSQL provides a few features for full text search. You can find those here. One of them is the CONTAINS keyword:
SELECT column FROM table WHERE CONTAINS (column , 'string1 string2 string3');
For me, this had more mileage:
create a generated column that combines the fields into one tsvector for full text search,
so first name / last name / company are all searchable.
ALTER TABLE profiles ADD COLUMN fts tsvector generated always as (to_tsvector('english', coalesce(profiles.company, '') || ' ' || coalesce(profiles.broker, '') || ' ' || coalesce(profiles.firstname, '') || ' ' || coalesce(profiles.lastname, '') || ' ' )) stored;
let { data, error } = await supabase.from('profiles')
.select()
.textSearch('fts',str)

Not Understanding Where Statement

I am new to SQL, and I'm looking through some code for a small database for a medical office. What does the following mean, and what will it do? Patient I get; it's a field in the DB. This code is repeated for each of the fields.
WHERE ( LOWER ( "Patient" ) LIKE ( '%' || LOWER ( :Patient ) || '%' )
It does a case insensitive comparison looking for rows where the "Patient" column contains the substring passed in the :Patient parameter.
LOWER converts both sides to lower case.
|| is the ANSI SQL string concatenation operator.
% in a LIKE pattern is a wildcard meaning "match any set of zero or more characters".
So if :Patient was Smith the expression works out as
WHERE LOWER ( "Patient" ) LIKE '%smith%'
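The expression can be tried directly with Python's sqlite3 module, which also supports the named :Patient parameter style. The `visits` table and its rows are hypothetical. The WHERE clause is the one from the question, with its parentheses balanced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE visits ("Patient" TEXT)')
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("John Smith",), ("SMITHERS, Ann",), ("Jane Doe",)],
)

rows = conn.execute(
    """SELECT "Patient" FROM visits
       WHERE ( LOWER ( "Patient" ) LIKE ( '%' || LOWER ( :Patient ) || '%' ) )
       ORDER BY "Patient" """,
    {"Patient": "Smith"},  # bound to :Patient, so this behaves like LIKE '%smith%'
).fetchall()
print([r[0] for r in rows])  # ['John Smith', 'SMITHERS, Ann']
```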

SQL (MySQL): Match first letter of any word in a string?

(Note: This is for MySQL's SQL, not SQL Server.)
I have a database column with values like "abc def GHI JKL". I want to write a WHERE clause that includes a case-insensitive test for any word that begins with a specific letter. For example, that value would test true for the letters a, d, g and j, because there's a 'word' beginning with each of those letters. The application is for a search that offers to find records that have only words beginning with the specified letter. Also note that there is no fulltext index for this table.
You can use a LIKE operation. If your words are space-separated, append a space to the start of the string to give the first word a chance to match:
SELECT
StringCol
FROM
MyTable
WHERE
' ' + StringCol LIKE '% ' + MyLetterParam + '%'
Where MyLetterParam could be something like this:
'[acgj]'
To look for more than a space as a word separator, you can expand that technique. The following would treat TAB, CR, LF, space and NBSP as word separators.
WHERE
' ' + StringCol LIKE '%['+' '+CHAR(9)+CHAR(10)+CHAR(13)+CHAR(160)+'][acgj]%'
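Note that this is not standard SQL: + as string concatenation and [...] character classes inside a LIKE pattern are SQL Server (T-SQL) features. In MySQL, use CONCAT() for concatenation; its LIKE supports only the % and _ wildcards, so a character class needs REGEXP.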
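The separator-based test can be sketched in Python, with a regex character class standing in for the bracket classes in the LIKE pattern above. `has_word_starting_with` is a hypothetical helper. As in the SQL, the text is prefixed with a space so the first word can match, and re.IGNORECASE provides the case-insensitivity.

```python
import re

# Word separators from the example: space, TAB, LF, CR, NBSP (CHAR(160)).
SEPARATOR_CLASS = r"[ \t\n\r\u00a0]"


def has_word_starting_with(text: str, letters: str) -> bool:
    """True if some word in text starts with one of `letters`, ignoring case."""
    padded = " " + text  # lets the first word match, like ' ' + StringCol
    pattern = SEPARATOR_CLASS + "[" + re.escape(letters) + "]"
    return re.search(pattern, padded, re.IGNORECASE) is not None


print(has_word_starting_with("abc def GHI JKL", "acgj"))  # True
print(has_word_starting_with("abc def GHI JKL", "c"))     # False
```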
Using the REGEXP operator:
SELECT * FROM `articles` WHERE `body` REGEXP '[[:<:]][acgj]'
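It returns records where column body contains a word starting with a, c, g or j (case-insensitively, assuming a case-insensitive collation). The [[:<:]] word-start marker belongs to the regex library MySQL used before 8.0; MySQL 8.0 and later use ICU, where you would write '\\b[acgj]' instead.
Be aware, though: this is not a very good idea if you expect any heavy load, because it cannot use an index and scans every row.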
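For comparison, here is a rough Python analogue of the REGEXP version. This sketch assumes Python's `\b` is close enough to MySQL's `[[:<:]]`, since both treat letters, digits and underscore as word characters. `has_word_start` is a hypothetical helper. Unlike the space-based LIKE test, a word boundary also occurs after punctuation such as a hyphen.

```python
import re


def has_word_start(body: str, letters: str = "acgj") -> bool:
    """Rough analogue of: body REGEXP '[[:<:]][acgj]' (case-insensitive)."""
    # \b followed by a word character can only be the start of a word.
    pattern = r"\b[" + re.escape(letters) + "]"
    return re.search(pattern, body, re.IGNORECASE) is not None


print(has_word_start("abc def GHI JKL"))  # True
print(has_word_start("def xyz"))          # False
print(has_word_start("re-cap"))           # True: '-' counts as a word boundary
```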
Check the Pattern Matching and Regular Expressions sections of the MySQL Reference Manual.