How do I compare multiple fields to multiple substrings? - sql

I'm working on a Presto query that checks multiple fields against multiple substrings to see if at least one field contains any of the given substrings. For example, let's say I want to check whether column1, column2, or column3 contains test, inactive, or deprecated.
I could write multiple LIKE comparisons for each field and substring, but it seems a bit repetitive.
-- Functional, but cumbersome
SELECT *
FROM table
WHERE
column1 LIKE '%test%' OR column1 LIKE '%inactive%' OR column1 LIKE '%deprecated%'
OR column2 LIKE '%test%' OR column2 LIKE '%inactive%' OR column2 LIKE '%deprecated%'
OR column3 LIKE '%test%' OR column3 LIKE '%inactive%' OR column3 LIKE '%deprecated%'
I can simplify it a bit with regexp_like() but it's still a bit repetitive.
-- Functional, less cumbersome
SELECT *
FROM table
WHERE
REGEXP_LIKE(column1, 'test|inactive|deprecated')
OR REGEXP_LIKE(column2, 'test|inactive|deprecated')
OR REGEXP_LIKE(column3, 'test|inactive|deprecated')
Ideally I'd like to have a single comparison that covers each field and substring.
-- Non functional pseudocode
SELECT *
FROM table
WHERE (column1, column2, column3) LIKE ('%test%', '%inactive%', '%deprecated%')
Is there a simple way to compare multiple fields to multiple substrings?

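You could search on a concatenation of the three columns. In Presto, concatenate with ||; because a NULL operand makes the whole result NULL, wrap nullable columns in COALESCE.
SELECT *
FROM table
WHERE
REGEXP_LIKE(
COALESCE(column1, '') || ' ' || COALESCE(column2, '') || ' ' || COALESCE(column3, ''),
'test|inactive|deprecated')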
You could also put the words you're matching against as rows in a new MatchWord table, so that you can add or remove words without changing your query.
SELECT
*
FROM
Data d
WHERE
EXISTS(
SELECT
*
FROM MatchWord w
WHERE
d.column1 || ' ' || d.column2 || ' ' || d.column3 LIKE '%' || w.word || '%'
)
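The word-table idea can be tried out end to end. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine rather than Presto. The table names Data and MatchWord follow the answer above; the sample rows and the helper name matching_ids are made up for illustration. SQLite concatenates with || as Presto does, and the COALESCE calls are an added guard so that a NULL column does not turn the whole concatenation into NULL.

```python
import sqlite3

def matching_ids(rows, words):
    """Return ids of rows where any column contains any of the words."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE Data (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)"
    )
    con.execute("CREATE TABLE MatchWord (word TEXT)")
    con.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)", rows)
    con.executemany("INSERT INTO MatchWord VALUES (?)", [(w,) for w in words])
    query = """
        SELECT d.id
        FROM Data d
        WHERE EXISTS (
            SELECT 1
            FROM MatchWord w
            WHERE COALESCE(d.column1, '') || ' ' ||
                  COALESCE(d.column2, '') || ' ' ||
                  COALESCE(d.column3, '')
                  LIKE '%' || w.word || '%'
        )
        ORDER BY d.id
    """
    ids = [r[0] for r in con.execute(query)]
    con.close()
    return ids

# Hypothetical sample data: rows 3 and 4 each have a NULL column.
sample_rows = [
    (1, "prod server", "active", "eu"),
    (2, "test server", "active", "us"),
    (3, "prod server", None, "deprecated"),
    (4, "prod server", "inactive", None),
]
print(matching_ids(sample_rows, ["test", "inactive", "deprecated"]))
```

Adding or removing a word only changes the contents of MatchWord; the query text stays the same.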

Related

How to search a row to see if any of the columns contain a certain string? In SQL

I have a table with 10 columns that each contain string values. I want to run a query that will return any of the rows which have any column value that matches a given string or set of strings.
Is there any way to do this?
The DBMS is MsSQL
If you want an exact match, you can use the IN keyword and check all columns:
SELECT *
FROM tablename
WHERE 'your string' IN (column1, column2, ...)
If you want a partial match, then you have to use LIKE:
SELECT *
FROM tablename
WHERE column1 LIKE '%your string%' or column2 LIKE '%your string%' ...
Or you can concatenate all the columns and do a single LIKE check:
SELECT *
FROM tablename
WHERE CONCAT(column1,'#',column2,'#',column3,'#',...) LIKE '%your string%'
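The '#' separator in that concatenation matters: without it, a search string can match across the boundary between two columns. Here is a small sketch of the difference, using Python's sqlite3 as a convenient stand-in for the DBMS (SQLite spells concatenation as || rather than CONCAT). The table t and its single row are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
con.execute("INSERT INTO t VALUES ('foo', 'bar')")

# No separator: 'foo' || 'bar' is 'foobar', which contains 'oba'
# even though neither column does.
without_sep = con.execute(
    "SELECT COUNT(*) FROM t WHERE column1 || column2 LIKE '%oba%'"
).fetchone()[0]

# With a separator: 'foo#bar' does not contain 'oba'.
with_sep = con.execute(
    "SELECT COUNT(*) FROM t WHERE column1 || '#' || column2 LIKE '%oba%'"
).fetchone()[0]

print(without_sep, with_sep)
```

The separator only helps if it is a character that cannot occur in the search string itself.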

WHERE for filtering out data based on multiple conditions

Suppose I had a large table TBL_LARGE_TABLE with, say, 100 columns (column1, column2,...column100, all nullable) and my client gave me a query so I could filter out certain rows:
SELECT * FROM TBL_LARGE_TABLE
WHERE
COLUMN2='00123'
AND
(COLUMN3 LIKE '%garbage%' OR COLUMN3 LIKE '%trash%')
AND
COLUMN100='0';
Now, I want to put the data from TBL_LARGE_TABLE into another table TBL_ANOTHER_LARGE_ONE. What would be the best way to insert from TBL_LARGE_TABLE into TBL_ANOTHER_LARGE_ONE, excluding all the rows that pass the above SELECT statement? I don't want to delete any data; I want the original table to stay as it is. I just want to select the exact opposite of the SELECT statement above.
INSERT INTO TBL_ANOTHER_LARGE_ONE
SELECT *
FROM TBL_LARGE_TABLE
WHERE NOT
(
COLUMN2='00123'
AND
(COLUMN3 LIKE '%garbage%' OR COLUMN3 LIKE '%trash%')
AND
COLUMN100='0'
)
-- Equivalent form with the NOT pushed inside (De Morgan's laws)
INSERT INTO TBL_ANOTHER_LARGE_ONE
SELECT *
FROM TBL_LARGE_TABLE
WHERE
COLUMN2 <> '00123'
OR
(COLUMN3 NOT LIKE '%garbage%' AND COLUMN3 NOT LIKE '%trash%')
OR
COLUMN100 <> '0'
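Because the columns are nullable, one caveat is worth checking: a row for which the filter evaluates to NULL is returned neither by the original SELECT nor by its NOT, so it would be left out of the copy. The sketch below illustrates this with Python's sqlite3 as a stand-in engine. The three-column table tbl and its rows are invented, and wrapping the columns in COALESCE is just one possible way to treat NULL as "did not match".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tbl (id INTEGER, column2 TEXT, column3 TEXT, column100 TEXT)"
)
con.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [
        (1, "00123", "some garbage", "0"),  # matches the client's filter
        (2, "99999", "fine", "1"),          # clearly does not match
        (3, "00123", None, "0"),            # column3 is NULL: filter is unknown
    ],
)

predicate = """
    column2 = '00123'
    AND (column3 LIKE '%garbage%' OR column3 LIKE '%trash%')
    AND column100 = '0'
"""

# Same filter, but NULL columns are compared as empty strings.
null_safe_predicate = """
    COALESCE(column2, '') = '00123'
    AND (COALESCE(column3, '') LIKE '%garbage%'
         OR COALESCE(column3, '') LIKE '%trash%')
    AND COALESCE(column100, '') = '0'
"""

def ids(where):
    """Return the ids of the rows that satisfy the given WHERE clause."""
    sql = f"SELECT id FROM tbl WHERE {where} ORDER BY id"
    return [r[0] for r in con.execute(sql)]

selected = ids(predicate)                      # rows the client's filter picks
negated = ids(f"NOT ({predicate})")            # plain negation
null_safe = ids(f"NOT ({null_safe_predicate})")  # negation that keeps NULL rows

print(selected, negated, null_safe)
```

Row 3 appears in neither of the first two results; only the COALESCE variant copies it.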

SQL SELECT LIKE containing only specific words

I have this query:
SELECT * FROM mytable
WHERE column1 LIKE '%word1%'
AND column1 LIKE '%word2%'
AND column1 LIKE '%word3%'
I need to modify this query to return records for which column1 contains word1 word2 and word3 and nothing else! no other words, just these words.
Example: searching for samsung galaxy s3 should return any combination of samsung s3 galaxy but NOT samsung galaxy s3 lte
Assuming that column1 contains space separated words, and you only want to match on whole words, something like:
SELECT * FROM
(select ' ' + REPLACE(column1,' ','  ') + ' ' as column1 from mytable) t
WHERE
column1 like '% word1 %' AND
column1 like '% word2 %' AND
column1 like '% word3 %' AND
REPLACE(REPLACE(REPLACE(column1,
' word1 ',''),
' word2 ',''),
' word3 ','') = ''
Note that this construction does allow the same word to appear multiple times. It's not clear from the question whether that should be allowed.
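To see why the padding step works, here is a sketch of the same construction run through Python's sqlite3 (a stand-in for the asker's DBMS, with || instead of + for concatenation). Each single space is doubled and the value is wrapped in spaces, so every word ends up with its own leading and trailing space; removing the three padded words must then leave an empty string. The sample values come from the question's samsung galaxy s3 example, and the column aliases original and padded are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE mytable (column1 TEXT)")
con.executemany(
    "INSERT INTO mytable VALUES (?)",
    [
        ("samsung galaxy s3",),
        ("s3 galaxy samsung",),
        ("samsung galaxy s3 lte",),  # extra word: must not match
        ("samsung galaxy",),         # missing word: must not match
    ],
)

query = """
    SELECT original
    FROM (SELECT column1 AS original,
                 ' ' || REPLACE(column1, ' ', '  ') || ' ' AS padded
          FROM mytable) t
    WHERE padded LIKE '% samsung %'
      AND padded LIKE '% galaxy %'
      AND padded LIKE '% s3 %'
      AND REPLACE(REPLACE(REPLACE(padded,
              ' samsung ', ''),
              ' galaxy ', ''),
              ' s3 ', '') = ''
    ORDER BY original
"""
matches = [r[0] for r in con.execute(query)]
print(matches)
```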
It would be a far better design if these words were stored as separate rows in a separate table that relates back to mytable. We could then use more normal SQL to satisfy this query. Your example looks like it's some kind of tagging example. Having a table storing each tag as a separate row (with an ordinal position also recorded, if required) would turn this into a simple relational division problem.
A way to count how many times a word appears in a column is the expression:
(LEN(column2) - LEN(REPLACE(column2,'word','')))/LEN('word')
but this would again revert back to matching subsequences of larger words as well as the word itself, without more work.
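The counting expression can be checked quickly. The sketch below evaluates it through Python's sqlite3, where the functions are spelled length and replace instead of LEN and REPLACE; the helper name count_occurrences and the sample strings are assumptions for illustration.

```python
import sqlite3

def count_occurrences(text, word):
    """Count occurrences of word in text with the length/replace trick."""
    con = sqlite3.connect(":memory:")
    (n,) = con.execute(
        "SELECT (length(:text) - length(replace(:text, :word, '')))"
        " / length(:word)",
        {"text": text, "word": word},
    ).fetchone()
    con.close()
    return n

# 'sword' and 'words' are counted too, because the match is on substrings.
print(count_occurrences("word sword words", "word"))
```

As the answer says, the count includes occurrences inside larger words, so it is not a whole-word count without further work.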
Try This
SELECT * FROM mytable
WHERE column1 LIKE 'word1'
AND column1 LIKE 'word2'
AND column1 LIKE 'word3'
In MySQL you can use REGEXP:
SELECT * FROM mytable
WHERE column1 regexp '^word[1-3]$';
In PostgreSQL you can use the SIMILAR TO keyword.
I think Oracle also supports regular expressions.

Is there a single SQL (or its variations) function to check not equals for multiple columns at once?

Just as I can check if a column does not equal one of the strings given in a set.
SELECT * FROM table1 WHERE column1 NOT IN ('string1','string2','string3');
Is there a single function with which I can make sure that multiple columns do not equal a single string? Maybe like this:
SELECT * FROM table1 WHERE EACH(column1,column2,column3) <> 'string1';
Such that it gives the same effect as:
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
If not, what's the most concise way to do so?
I believe you can just reverse the columns and constants in your first example:
SELECT * FROM table1 WHERE 'string1' NOT IN (column1, column2, column3);
This assumes you are using SQL Server.
UPDATE:
A few people have pointed out potential null comparison problems (even though your desired query would have the same potential problem). This could be worked around by using COALESCE in the following way:
SELECT * FROM table1 WHERE 'string1' NOT IN (
COALESCE(column1,'NA'),
COALESCE(column2,'NA'),
COALESCE(column3,'NA')
);
You should replace 'NA' with a value that will not match whatever 'string1' is. If you do not allow nulls for columns 1,2 and 3 this is not even an issue.
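The NULL problem and the COALESCE workaround can be illustrated with a small sketch, here using Python's sqlite3 as a stand-in for SQL Server. The table name table1 and the value 'string1' follow the question; the three sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE table1 (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)"
)
con.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),        # no column equals 'string1'
        (2, "a", "string1", "c"),  # one column equals 'string1'
        (3, "a", None, "c"),       # a NULL column, nothing equals 'string1'
    ],
)

# Plain NOT IN: the NULL in row 3 makes the comparison unknown,
# so the row is filtered out.
plain = [r[0] for r in con.execute(
    "SELECT id FROM table1"
    " WHERE 'string1' NOT IN (column1, column2, column3) ORDER BY id"
)]

# COALESCE replaces NULL with a value that cannot equal 'string1'.
guarded = [r[0] for r in con.execute(
    """SELECT id FROM table1
       WHERE 'string1' NOT IN (
           COALESCE(column1, 'NA'),
           COALESCE(column2, 'NA'),
           COALESCE(column3, 'NA'))
       ORDER BY id"""
)]

print(plain, guarded)
```

Row 3 is dropped by the plain form and kept by the COALESCE form; the same row would also be dropped by the three separate <> comparisons.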
No, there is no standard SQL way to do this. Barring any special constraints on what the string fields contain there's no more concise way to do it than you've already hit upon (col1 <> 'String1' AND col2 <> 'String2').
Additionally, this kind of requirement is often an indication that you have a flaw in your database design and that you're storing the same information in several different columns. If that is true in your case then consider refactoring if possible into a separate table where each column becomes its own row.
The most concise way to do this is
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
Yes, I cut & pasted that from your original question. :-)
I'm more concerned about why you want to compare against all three columns. It sounds like you might have a table that needs normalization. What do column1, column2 and column3 actually hold? Are they something like phone1, phone2, and phone3? Perhaps those three columns should actually be in a subtable.

A reverse IN statement?

I feel like I am either overthinking this or it's not possible but is there a way to do something like a reverse IN statement in SQL?
Instead of saying:
WHERE column_name NOT IN (x, y, z)
I want to have three columns exclude the same value like:
WHERE column1 NOT LIKE 'X' AND column2 NOT LIKE 'X' AND column3 NOT LIKE 'X'
Is it possible to do this more efficiently with less code?
Edit: I am using a string value. Instead of nulls our DB has a space value, ' '.
I used the suggested comment and changed to:
WHERE ' ' NOT IN (column1, column2, column3)
and it worked perfectly.
Is it possible to do this more efficiently with less code?
You can shorten the expression to:
where ' ' not in (column_1, column_2, column_3)
But in most databases, this will have little impact on performance. Such a construct will probably not use an index.
I cannot readily think of a way of expressing this that will use an index (in most databases). Obviously, if this is something you often need to do, you could use a function-based index.
A possibility is to concatenate the columns, like
CONCAT(column1, column2, column3) NOT LIKE '%X%'
another one is to use the suggestion of IronMan and Gordon Linoff, like
'X' not in (column1, column2, column3)
The first approach works in most cases, but not when any of the columns is null (ISNULL is a remedy for that problem, but makes the code less appealing). The second approach should work in all cases, but it only tests for equality: it will not catch a column in which 'X' is merely part of the value rather than the whole value.
Operator LIKE uses pattern matching, while IN does not.
You can use NOT LIKE or NOT IN but you cannot substitute IN by LIKE or vice versa.
You can use the following technique:
DECLARE @A VARCHAR(10) = 'A', @B VARCHAR(10) = 'A', @C VARCHAR(10) = 'B'
SELECT id, COUNT(*) FROM (VALUES (@A), (@B), (@C)) T (id) GROUP BY id HAVING COUNT(*) > 1
Now you could rewrite your WHERE clause as follows:
WHERE NOT EXISTS(SELECT id, COUNT(*) FROM (VALUES (column1), (column2), (column3)) T (id) GROUP BY Id HAVING COUNT(*) > 1)
Not sure if it simplifies things, but if the above WHERE statement is true, that will ensure that all 3 columns have different values.