A reverse IN statement? - sql

I feel like I am either overthinking this or it's not possible but is there a way to do something like a reverse IN statement in SQL?
Instead of saying:
WHERE column_name NOT IN (x, y, z)
I want to have three columns exclude the same value like:
WHERE column1 NOT LIKE 'X' AND column2 NOT LIKE 'X' AND column3 NOT LIKE 'X'
Is it possible to do this more efficiently with less code?
Edit: I am using a string value. Instead of nulls our DB has a space value, ''.
I used the suggested comment and changed to:
WHERE '' NOT IN (column1, column2, column3)
and it worked perfectly

Is it possible to do this more efficiently with less code?
You can shorten the expression to:
where ' ' not in (column_1, column_2, column_3)
But in most databases, this will have little impact on performance. Such a construct will probably not use an index.
I cannot readily think of a way of expressing this that will use an index (in most databases). Obviously, if this is something you often need to do, you could use a function-based index.
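As a rough check that the shortened form behaves like the chained comparisons, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for the asker's database, and the table t and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical table and data; SQLite is a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),  # no blank column: should be kept
        (2, "a", " ", "c"),  # column2 is a single space: should be dropped
        (3, " ", " ", " "),  # every column is blank: should be dropped
    ],
)

# The long form: one comparison per column.
chained = conn.execute(
    "SELECT id FROM t "
    "WHERE column1 <> ' ' AND column2 <> ' ' AND column3 <> ' ' ORDER BY id"
).fetchall()

# The short form: the constant on the left, the columns in the IN list.
reversed_in = conn.execute(
    "SELECT id FROM t WHERE ' ' NOT IN (column1, column2, column3) ORDER BY id"
).fetchall()

print(chained, reversed_in)
```

Both queries keep only the row in which no column equals the single space.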

A possibility is to concatenate the columns, like
CONCAT(column1, column2, column3) NOT LIKE '%X%'
another one is to use the suggestion of IronMan and Gordon Linoff, like
'X' not in (column1, column2, column3)
The first approach works in most cases, but not when any of the columns is null (isnull is a remedy for that problem, but makes the code less appealing). The second approach should work in all cases, except when you need to catch the left operand appearing as part of one of the right-hand values rather than being equal to one of them.
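To make the difference between the two approaches concrete, here is a small sketch with Python's sqlite3 module. It assumes a made-up table t, and it uses the || operator for concatenation because older SQLite versions have no CONCAT function. Row 2 contains X only as part of a longer value.

```python
import sqlite3

# Hypothetical data; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),    # X appears nowhere
        (2, "aXb", "b", "c"),  # X is a substring of column1, not the whole value
        (3, "X", "b", "c"),    # column1 is exactly X
    ],
)

# Approach 1: substring search over the concatenated columns.
concat_like = conn.execute(
    "SELECT id FROM t WHERE column1 || column2 || column3 NOT LIKE '%X%' ORDER BY id"
).fetchall()

# Approach 2: exact comparison of the constant against each column.
not_in = conn.execute(
    "SELECT id FROM t WHERE 'X' NOT IN (column1, column2, column3) ORDER BY id"
).fetchall()

print(concat_like)  # row 2 is excluded here
print(not_in)       # row 2 is kept here
```

So the choice depends on whether a partial match should count as a hit.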

Operator LIKE uses pattern matching, while IN does not.
You can use NOT LIKE or NOT IN but you cannot substitute IN by LIKE or vice versa.

You can use following technique:
DECLARE @A VARCHAR(10) = 'A', @B VARCHAR(10) = 'A', @C VARCHAR(10) = 'B'
SELECT id, COUNT(*) FROM (VALUES (@A), (@B), (@C)) T (id) GROUP BY id HAVING COUNT(*) > 1
Now, you could have rewritten your where statement as following:
WHERE NOT EXISTS(SELECT id, COUNT(*) FROM (VALUES (column1), (column2), (column3)) T (id) GROUP BY Id HAVING COUNT(*) > 1)
Not sure if it simplifies things, but if the above WHERE statement is true, that will ensure that all 3 columns have different values.

Related

How do I compare multiple fields to multiple substrings?

I'm working on a Presto query that checks multiple fields against multiple substrings to see if at least one field contains any of the given substrings. For example, let's say I want to check if either column1, column2, or column3 contain test, inactive, or deprecated.
I could write multiple LIKE comparisons for each field and substring, but it seems a bit repetitive.
-- Functional, but cumbersome
SELECT *
FROM table
WHERE
column1 LIKE '%test%' OR column1 LIKE '%inactive%' OR column1 LIKE '%deprecated%'
OR column2 LIKE '%test%' OR column2 LIKE '%inactive%' OR column2 LIKE '%deprecated%'
OR column3 LIKE '%test%' OR column3 LIKE '%inactive%' OR column3 LIKE '%deprecated%'
I can simplify it a bit with regexp_like() but it's still a bit repetitive.
-- Functional, less cumbersome
SELECT *
FROM table
WHERE
REGEXP_LIKE(column1, 'test|inactive|deprecated')
OR REGEXP_LIKE(column2, 'test|inactive|deprecated')
OR REGEXP_LIKE(column3, 'test|inactive|deprecated')
Ideally I'd like to have a single comparison that covers each field and substring.
-- Non functional pseudocode
SELECT *
FROM table
WHERE (column1, column2, column3) LIKE ('%test%', '%inactive%', '%deprecated%')
Is there a simple way to compare multiple fields to multiple substrings?
You could search on a concatenation of the three columns.
SELECT *
FROM table
WHERE
REGEXP_LIKE(column1 || ' ' || column2 || ' ' || column3, 'test|inactive|deprecated')
Also, you could put the words you're matching against as rows in a new MatchWord table; you can then add or remove words without changing your query.
SELECT
*
FROM
Data d
WHERE
EXISTS(
SELECT
*
FROM MatchWord w
WHERE
d.column1 || ' ' || d.column2 || ' ' || d.column3 LIKE '%' || w.word || '%'
)
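Here is a minimal sketch of the MatchWord idea, run against SQLite through Python's sqlite3 module rather than Presto. The table names data and match_word and all rows are hypothetical. The COALESCE calls are my addition: concatenating a NULL column would otherwise turn the whole string into NULL, and the row would never match.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)")
conn.execute("CREATE TABLE match_word (word TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "test run", "x", "y"),           # contains 'test'
        (2, "ok", "fine", "good"),           # contains none of the words
        (3, "a", None, "deprecated api"),    # contains 'deprecated', plus a NULL column
    ],
)
conn.executemany(
    "INSERT INTO match_word VALUES (?)",
    [("test",), ("inactive",), ("deprecated",)],
)

# One EXISTS covers every column and every word; NULL columns become ''.
sql = """
SELECT d.id
FROM data d
WHERE EXISTS (
    SELECT 1
    FROM match_word w
    WHERE COALESCE(d.column1, '') || ' ' ||
          COALESCE(d.column2, '') || ' ' ||
          COALESCE(d.column3, '') LIKE '%' || w.word || '%'
)
ORDER BY d.id
"""
matches = conn.execute(sql).fetchall()
print(matches)
```

Adding or removing a search term is then an INSERT or DELETE on match_word, with no change to the query text.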

SQL query to find columns having at least one non null value

I am developing a data validation framework where I have this requirement of checking that the table fields should have at least one non-null value i.e they shouldn't be completely empty having all values as null.
For a particular column, I can easily check using
select count(distinct column_name) from table_name;
If it's greater than 0 I can tell that the column is not empty. I already have a list of columns. So, I can execute this query in the loop for every column but this would mean a lot of requests and it is not the ideal way.
What is the better way of doing this? I am using Microsoft SQL Server.
I would not recommend using count(distinct), because it incurs overhead for removing duplicate values. A plain count(column_name) is enough, since it counts only the non-null values.
You can construct the query for counts using a query like this:
select count(col1) as col1_cnt, count(col2) as col2_cnt, . . .
from t;
If you have a list of columns you can do this as dynamic SQL. Something like this:
declare @sql nvarchar(max);
-- assumes @list holds a comma-separated list of column names
select @sql = concat('select ',
string_agg(concat('count(', quotename(s.value), ') as cnt_', s.value), ', '),
' from t'
)
from string_split(@list, ',') s;
exec sp_executesql @sql;
This might not quite work if your columns have special characters in them, but it illustrates the idea.
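The same idea, one statement that counts every column in a single pass, can also be assembled on the client side. This is only a sketch under assumptions: it uses Python's sqlite3 module instead of SQL Server, the table t and the column list are made up, and quote_ident is a hypothetical helper for quoting identifiers.

```python
import sqlite3

# Hypothetical table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(None, 2, None), (None, None, None), (None, 4, 7)],
)

columns = ["col1", "col2", "col3"]  # the list of columns to validate


def quote_ident(name):
    """Quote an identifier so unusual column names do not break the SQL."""
    return '"' + name.replace('"', '""') + '"'


# Build one SELECT with a COUNT per column; COUNT(col) skips NULLs.
select_list = ", ".join(
    f"COUNT({quote_ident(c)}) AS {quote_ident('cnt_' + c)}" for c in columns
)
sql = f"SELECT {select_list} FROM t"

counts = dict(zip(columns, conn.execute(sql).fetchone()))
non_empty = [c for c in columns if counts[c] > 0]
print(counts, non_empty)
```

Only one request goes to the database, no matter how many columns are in the list.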
You should probably use exists since you aren't really needing a count of anything.
You don't indicate how you want to consume the results of multiple counts; however, one thing you could do is use concat_ws to return a list of the columns meeting your criteria.
The following sample table has 5 columns, 3 of which have a value on at least 1 row.
create table t (col1 int, col2 int, col3 int, col4 int, col5 int)
insert into t select null,null,null,null,null
insert into t select null,2,null,null,null
insert into t select null,null,null,null,5
insert into t select null,null,null,null,6
insert into t select null,4,null,null,null
insert into t select null,6,7,null,null
You can name the result of each case expression and concatenate them. Only the columns that have a non-null value are included, because concat_ws ignores the nulls returned by the case expressions.
select Concat_ws(', ',
case when exists (select * from t where col1 is not null) then 'col1' end,
case when exists (select * from t where col2 is not null) then 'col2' end,
case when exists (select * from t where col3 is not null) then 'col3' end,
case when exists (select * from t where col4 is not null) then 'col4' end,
case when exists (select * from t where col5 is not null) then 'col5' end)
Result:
col2, col3, col5
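For anyone who wants to try this without a SQL Server instance, here is a sketch that replays the same sample data in SQLite through Python's sqlite3 module. It is an adaptation, not the query above: the EXISTS flags are computed in SQL, and the names are joined in Python because older SQLite versions lack concat_ws.

```python
import sqlite3

# Same sample rows as above, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (None, None, None, None, None),
        (None, 2, None, None, None),
        (None, None, None, None, 5),
        (None, None, None, None, 6),
        (None, 4, None, None, None),
        (None, 6, 7, None, None),
    ],
)

columns = ["col1", "col2", "col3", "col4", "col5"]

# One EXISTS per column; each can stop at the first non-null value it finds.
select_list = ", ".join(
    f"EXISTS (SELECT 1 FROM t WHERE {c} IS NOT NULL)" for c in columns
)
flags = conn.execute(f"SELECT {select_list}").fetchone()

# Keep the names whose flag is 1, mirroring what concat_ws does with nulls.
result = ", ".join(c for c, flag in zip(columns, flags) if flag)
print(result)
```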
I asked a similar question about a decade ago. The best way of doing this in my opinion would meet the following criteria.
Combine the requests for multiple columns together so they can all be calculated in a single scan.
If the scan encounters a not null value in every column under consideration allow it to exit early without reading the rest of the table/index as reading subsequent rows won't change the result.
This is quite a difficult combination to get in practice.
The following might give you the desired behaviour
SELECT DISTINCT TOP 2 ColumnWithoutNull
FROM YourTable
CROSS APPLY (VALUES(CASE WHEN b IS NOT NULL THEN 'b' END),
(CASE WHEN c IS NOT NULL THEN 'c' END)) V(ColumnWithoutNull)
WHERE ColumnWithoutNull IS NOT NULL
OPTION ( HASH GROUP, MAXDOP 1, FAST 1)
If it gives you a plan like this: [execution plan image not preserved]
A hash match usually reads all of its build input first, which means no short-circuiting of the scan can happen. If the optimiser gives you the operator in "flow distinct" mode, however, it won't do this, and query execution can potentially stop as soon as TOP receives its first two rows, signalling that a NOT NULL value has been found in both columns.
But there is no hint to request the mode for hash aggregate so you are dependent on the whims of the optimiser as to whether you will get this in practice. The various hints I have added to the query above are an attempt to point it in that direction however.

'In' clause in SQL server with multiple columns

I have a component that retrieves data from database based on the keys provided.
However, I want my Java application to get the data for all keys in a single database hit to speed things up.
I can use an 'in' clause when I have only one key.
When working with more than one key, I can use the query below in Oracle:
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 'I' and CODE1 = 'CORE'
together
However, using the 'in' clause as above gives the error below in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of AND and OR.
SELECT *
FROM <table_name>
WHERE
(value_type = 'I' and CODE1 = 'COMM')
OR (value_type = 'I' and CODE1 = 'CORE')
(In this case, you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in Oracle with multiple fields.)
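Since the list of key pairs usually comes from the application, the AND/OR pattern can be generated with bound parameters. The sketch below uses Python's sqlite3 module purely for illustration (the question is about Java and SQL Server); the table codes and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?)",
    [(1, "I", "COMM"), (2, "I", "CORE"), (3, "I", "OTHER"), (4, "X", "COMM")],
)

keys = [("I", "COMM"), ("I", "CORE")]  # the composite keys to fetch
if not keys:
    raise ValueError("at least one key pair is needed to build the predicate")

# One "(value_type = ? AND CODE1 = ?)" group per key pair, joined with OR.
predicate = " OR ".join("(value_type = ? AND CODE1 = ?)" for _ in keys)

# Flatten the pairs so they line up with the placeholders, left to right.
params = [value for pair in keys for value in pair]

rows = conn.execute(
    f"SELECT id FROM codes WHERE {predicate} ORDER BY id", params
).fetchall()
print(rows)
```

Only the placeholders are interpolated into the SQL text; the key values themselves travel as parameters, and everything is fetched in one round trip.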
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.value_type = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it, depending on the case, like inner joins and the like.
If you have under 1000 tuples you want to check against and you're using SQL Server 2008+, you can use a table value constructor and join against it. You can only specify up to 1000 rows in a table value constructor, hence the 1000-tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is going to be unique; otherwise you'll get duplicate rows. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXISTS instead of JOIN. That may have different performance characteristics, and it will avoid the problem of producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. your user lacks the permission to create temporary tables, and then using a table value constructor may be your best choice. Use EXISTS or JOIN, depending on which gives you better performance on your database.
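The duplicate-row difference between JOIN and EXISTS is easy to see with a key list that repeats a pair. This sketch runs in SQLite via Python's sqlite3 module; it uses a CTE named wanted in place of the inline table value constructor, and the table codes is hypothetical.

```python
import sqlite3

# Hypothetical table; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?)",
    [(1, "I", "COMM"), (2, "I", "CORE"), (3, "I", "OTHER")],
)

# The key list deliberately contains ('I', 'COMM') twice.
wanted_cte = "WITH wanted(a, b) AS (VALUES ('I', 'COMM'), ('I', 'CORE'), ('I', 'COMM')) "

join_rows = conn.execute(
    wanted_cte
    + "SELECT c.id FROM codes c "
    + "JOIN wanted w ON w.a = c.value_type AND w.b = c.CODE1 "
    + "ORDER BY c.id"
).fetchall()

exists_rows = conn.execute(
    wanted_cte
    + "SELECT c.id FROM codes c "
    + "WHERE EXISTS (SELECT 1 FROM wanted w "
    + "              WHERE w.a = c.value_type AND w.b = c.CODE1) "
    + "ORDER BY c.id"
).fetchall()

print(join_rows)    # row 1 appears once per matching key
print(exists_rows)  # each table row appears at most once
```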
Normally you can not do it, but can use the following technique.
SELECT * FROM <table_name>
where (value_type+'/'+CODE1) IN (('I'+'/'+'COMM'),('I'+'/'+'CORE'));
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.value_type AND T.Code = DT.CODE1
That way, you avoid storing data in your code (with the persistent table version) and can easily modify the filters without changing the code.
I think you can try this, combining AND and OR at the same time.
SELECT
*
FROM
<table_name>
WHERE
value_type = 'I'
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string, and pass your values also combined as strings.
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
The other way is to do multiple ands and ors.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in the future. In my case I found this solution (not the full code, just an example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have an index on Coupon and Campaign in the table for fast searching.
Compute it in MS SQL:
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');
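One thing to watch with concatenated keys like the one above: if the delimiter can occur inside the data, two different pairs can collapse into the same string. Here is a small sketch in SQLite via Python's sqlite3 module (|| is SQLite's concatenation operator, and the rows are made up to provoke the collision).

```python
import sqlite3

# Hypothetical rows chosen so that two different pairs concatenate identically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (id INTEGER, value_type TEXT, CODE1 TEXT)")
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?)",
    [
        (1, "A", "B|C"),   # the pair we actually want
        (2, "A|B", "C"),   # a different pair that also becomes 'A|B|C'
    ],
)

# Concatenated comparison: both rows produce the string 'A|B|C'.
concat_rows = conn.execute(
    "SELECT id FROM codes WHERE value_type || '|' || CODE1 IN ('A|B|C') ORDER BY id"
).fetchall()

# Column-by-column comparison: only the intended pair matches.
pairwise_rows = conn.execute(
    "SELECT id FROM codes WHERE value_type = 'A' AND CODE1 = 'B|C' ORDER BY id"
).fetchall()

print(concat_rows, pairwise_rows)
```

If you go the concatenation route, pick a delimiter that cannot appear in either column.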

Using LIKE in an Oracle IN clause

I know I can write a query that will return all rows that contain any number of values in a given column, like so:
Select * from tbl where my_col in (val1, val2, val3,... valn)
but if val1, for example, can appear anywhere in my_col, which has datatype varchar(300), I might instead write:
select * from tbl where my_col LIKE '%val1%'
Is there a way of combining these two techniques? I need to search for some 30 possible values that may appear anywhere in the free-form text of the column.
Combining these two statements in the following ways does not seem to work:
select * from tbl where my_col LIKE ('%val1%', '%val2%', 'val3%',....)
select * from tbl where my_col in ('%val1%', '%val2%', 'val3%',....)
What would be useful here would be a LIKE ANY predicate as is available in PostgreSQL
SELECT *
FROM tbl
WHERE my_col LIKE ANY (ARRAY['%val1%', '%val2%', '%val3%', ...])
Unfortunately, that syntax is not available in Oracle. You can expand the quantified comparison predicate using OR, however:
SELECT *
FROM tbl
WHERE my_col LIKE '%val1%' OR my_col LIKE '%val2%' OR my_col LIKE '%val3%', ...
Or alternatively, create a semi join using an EXISTS predicate and an auxiliary array data structure (see this question for details):
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT 1
-- Alternatively, store those values in a temp table:
FROM TABLE (sys.ora_mining_varchar2_nt('%val1%', '%val2%', '%val3%'/*, ...*/))
WHERE t.my_col LIKE column_value
)
For true full-text search, you might want to look at Oracle Text: http://www.oracle.com/technetwork/database/enterprise-edition/index-098492.html
REGEXP_LIKE with the 'i' match parameter will do a case-insensitive regexp search.
select * from Users where Regexp_Like (User_Name, 'karl|anders|leif','i')
This will be executed as a full table scan, just like the LIKE ... OR solution, so the performance will be really bad if the table is not small. If it's not used often at all, it might be OK.
If you need some kind of performance, you will need Oracle Text (or some external indexer).
To get substring indexing with Oracle Text you will need a CONTEXT index. It's a bit involved, as it's made for indexing large documents and text using a lot of smarts. If you have particular needs, such as substring searches in numbers and all words (including "the", "an", "a", spaces, etc.), you need to create custom lexers to remove some of the smart stuff...
If you insert a lot of data, Oracle Text will not make things faster, especially if you need the index to be updated within the transactions and not periodically.
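One caveat with the 'karl|anders|leif' style of pattern from earlier in this answer: if the search values can contain regex metacharacters, they need escaping before being joined with |. The sketch below shows the idea with Python's re module. Note the assumption: Python's regex syntax and escaping rules are not identical to Oracle's, so this illustrates the pitfall rather than giving Oracle-ready code, and the names are made up.

```python
import re

values = ["karl", "anders", "a.b"]            # hypothetical search terms
names = ["Karl Svensson", "axb", "A.B Corp", "Leif"]

# Naive alternation: the dot in 'a.b' matches any character.
naive = re.compile("|".join(values), re.IGNORECASE)

# Escaped alternation: every term is matched literally.
escaped = re.compile("|".join(re.escape(v) for v in values), re.IGNORECASE)

naive_hits = [n for n in names if naive.search(n)]
escaped_hits = [n for n in names if escaped.search(n)]

print(naive_hits)    # includes 'axb' because '.' acts as a wildcard
print(escaped_hits)  # 'axb' is no longer matched
```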
No, you cannot do this. The values in the IN clause must be exact matches. You could modify the select like this:
SELECT *
FROM tbl
WHERE my_col LIKE '%val1%'
OR my_col LIKE '%val2%'
OR my_col LIKE '%val3%'
...
If the val1, val2, val3... are similar enough, you might be able to use regular expressions in the REGEXP_LIKE operator.
Yes, you can use this query. Replace 'Specialist' and 'Developer' with any comma-separated strings you want, and replace the employees table with your own table.
SELECT * FROM employees em
WHERE EXISTS (select 1 from table(sys.dbms_debug_vc2coll('Specialist', 'Developer')) mt
where em.job like ('%' || mt.column_value || '%'));
Why my query is better than the accepted answer: You don't need a CREATE TABLE permission to run it. This can be executed with just SELECT permissions.
In Oracle you can use regexp_like as follows:
select *
from table_name
where regexp_like (name, '^(value-1|value-2|value-3....)');
The caret (^) matches the beginning of the line, and the pipe (|) indicates an OR operation.
This one is pretty fast :
select * from listofvalue l
inner join tbl on tbl.mycol like '%' || l.value || '%'
Just to add to @Lukas Eder's answer.
An improvement to avoid creating tables and inserting values
(we could use select from dual and unpivot to achieve the same result "on the fly"):
with all_likes as
(select * from
(select '%val1%' like_1, '%val2%' like_2, '%val3%' like_3, '%val4%' as like_4, '%val5%' as like_5 from dual)
unpivot (
united_columns for subquery_column in ("LIKE_1", "LIKE_2", "LIKE_3", "LIKE_4", "LIKE_5"))
)
select * from tbl
where exists (select 1 from all_likes where tbl.my_col like all_likes.united_columns)
I prefer this
WHERE CASE WHEN my_col LIKE '%val1%' THEN 1
WHEN my_col LIKE '%val2%' THEN 1
WHEN my_col LIKE '%val3%' THEN 1
ELSE 0
END = 1
I'm not saying it's optimal but it works and it's easily understood. Most of my queries are adhoc used once so performance is generally not an issue for me.
select * from tbl
where exists (select 1 from all_likes where all_likes.value = substr(tbl.my_col,0, length(tbl.my_col)))
You can put your values in ODCIVARCHAR2LIST and then join it as a regular table.
select tabl1.* FROM tabl1 LEFT JOIN
(select column_value txt from table(sys.ODCIVARCHAR2LIST
('%val1%','%val2%','%val3%')
)) Vals ON tabl1.column LIKE Vals.txt WHERE Vals.txt IS NOT NULL
You don't need a collection type as mentioned in https://stackoverflow.com/a/6074261/802058. Just use a subquery:
SELECT *
FROM tbl t
WHERE EXISTS (
SELECT 1
FROM (
SELECT 'val1%' AS val FROM dual
UNION ALL
SELECT 'val2%' AS val FROM dual
-- ...
-- or simply use a subquery here
)
WHERE t.my_col LIKE val
)

Is there a single SQL (or its variations) function to check not equals for multiple columns at once?

Just as I can check if a column does not equal one of the strings given in a set.
SELECT * FROM table1 WHERE column1 NOT IN ('string1','string2','string3');
Is there a single function that lets me make sure that multiple columns do not equal a single string? Maybe like this.
SELECT * FROM table1 WHERE EACH(column1,column2,column3) <> 'string1';
Such that it gives the same effect as:
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
If not, what's the most concise way to do so?
I believe you can just reverse the columns and constants in your first example:
SELECT * FROM table1 WHERE 'string1' NOT IN (column1, column2, column3);
This assumes you are using SQL Server.
UPDATE:
A few people have pointed out potential null comparison problems (even though your desired query would have the same potential problem). This could be worked around by using COALESCE in the following way:
SELECT * FROM table1 WHERE 'string1' NOT IN (
COALESCE(column1,'NA'),
COALESCE(column2,'NA'),
COALESCE(column3,'NA')
);
You should replace 'NA' with a value that will not match whatever 'string1' is. If you do not allow nulls for columns 1,2 and 3 this is not even an issue.
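To see the null problem and the COALESCE workaround side by side, here is a sketch in SQLite through Python's sqlite3 module (an assumption on my part, since the answer targets SQL Server; the three-valued logic involved is standard SQL). The table and rows are invented.

```python
import sqlite3

# Hypothetical data with one NULL column in row 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "a", "b", "c"),        # no match, no NULL
        (2, "a", None, "c"),       # no match, but column2 is NULL
        (3, "string1", "b", "c"),  # column1 matches
    ],
)

# Chained <> comparisons: a NULL column makes the whole condition unknown.
chained = conn.execute(
    "SELECT id FROM table1 WHERE column1 <> 'string1' "
    "AND column2 <> 'string1' AND column3 <> 'string1' ORDER BY id"
).fetchall()

# Reversed NOT IN: same behaviour, row 2 is filtered out as well.
plain = conn.execute(
    "SELECT id FROM table1 WHERE 'string1' NOT IN (column1, column2, column3) ORDER BY id"
).fetchall()

# COALESCE replaces NULL with a sentinel that cannot equal 'string1'.
coalesced = conn.execute(
    "SELECT id FROM table1 WHERE 'string1' NOT IN ("
    "COALESCE(column1, 'NA'), COALESCE(column2, 'NA'), COALESCE(column3, 'NA')"
    ") ORDER BY id"
).fetchall()

print(chained, plain, coalesced)
```

Whether row 2 should come back at all depends on how you want to treat nulls; the COALESCE version treats them as "not equal".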
No, there is no standard SQL way to do this. Barring any special constraints on what the string fields contain there's no more concise way to do it than you've already hit upon (col1 <> 'String1' AND col2 <> 'String2').
Additionally, this kind of requirement is often an indication that you have a flaw in your database design and that you're storing the same information in several different columns. If that is true in your case then consider refactoring if possible into a separate table where each column becomes its own row.
The most concise way to do this is
SELECT * FROM table1 WHERE column1 <> 'string1'
AND column2 <> 'string1'
AND column3 <> 'string1';
Yes, I cut & pasted that from your original question. :-)
I'm more concerned about why you want to compare against all three columns. It sounds like you might have a table that needs normalization. What are the actual columns behind column1, column2 and column3? Are they something like phone1, phone2, and phone3? Perhaps those three columns should actually be in a subtable.