How to combine IN operator with LIKE condition (or best way to get comparable results) - sql

I need to select rows where a field begins with one of several different prefixes:
select * from table
where field like 'ab%'
or field like 'cd%'
or field like "ef%"
or...
What is the best way to do this using SQL in Oracle or SQL Server? I'm looking for something like the following statements (which are incorrect):
select * from table where field like in ('ab%', 'cd%', 'ef%', ...)
or
select * from table where field like in (select foo from bar)
EDIT:
I would like to see how this is done with either giving all the prefixes in one SELECT statement, of having all the prefixes stored in a helper table.
Length of the prefixes is not fixed.

Joining your prefix table with your actual table would work in both SQL Server & Oracle.
DECLARE #Table TABLE (field VARCHAR(32))
DECLARE #Prefixes TABLE (prefix VARCHAR(32))
INSERT INTO #Table VALUES ('ABC')
INSERT INTO #Table VALUES ('DEF')
INSERT INTO #Table VALUES ('ABDEF')
INSERT INTO #Table VALUES ('DEFAB')
INSERT INTO #Table VALUES ('EFABD')
INSERT INTO #Prefixes VALUES ('AB%')
INSERT INTO #Prefixes VALUES ('DE%')
SELECT t.*
FROM #Table t
INNER JOIN #Prefixes pf ON t.field LIKE pf.prefix

you can try regular expression
SELECT * from table where REGEXP_LIKE ( field, '^(ab|cd|ef)' );

If your prefix is always two characters, could you not just use the SUBSTRING() function to get the first two characters of "field", and then see if it's in the list of prefixes?
select * from table
where SUBSTRING(field, 1, 2) IN (prefix1, prefix2, prefix3...)
That would be "best" in terms of simplicity, if not performance. Performance-wise, you could create an indexed virtual column that generates your prefix from "field", and then use the virtual column in your predicate.

Depending on the size of the dataset, the REGEXP solution may or may not be the right answer. If you're trying to get a small slice of a big dataset,
select * from table
where field like 'ab%'
or field like 'cd%'
or field like "ef%"
or...
may be rewritten behind the scenes as
select * from table
where field like 'ab%'
union all
select * from table
where field like 'cd%'
union all
select * from table
where field like 'ef%'
Doing three index scans instead of a full scan.
If you know you're only going after the first two characters, creating a function-based index could be a good solution as well. If you really really need to optimize this, use a global temporary table to store the values of interest, and perform a semi-join between them:
select * from data_table
where transform(field) in (select pre_transformed_field
from my_where_clause_table);

You can also try like this, here tmp is temporary table that is populated by the required prefixes. Its a simple way, and does the job.
select * from emp join
(select 'ab%' as Prefix
union
select 'cd%' as Prefix
union
select 'ef%' as Prefix) tmp
on emp.Name like tmp.Prefix

Related

Redshift - Extract value matching a condition in Array

I have a Redshift table with the following column
How can I extract the value starting by cat_ from this column please (there is only one for each row and at different position in the array)?
I want to get those results:
cat_incident
cat_feature_missing
cat_duplicated_request
Thanks!
There is no easy way to extract multiple values from within one column in SQL (or at least not in the SQL used by Redshift).
You could write a User-Defined Function (UDF) that returns a string containing those values, separated by newlines. Whether this is acceptable depends on what you wish to do with the output (eg JOIN against it).
Another option is to pre-process the data before it is loaded into Redshift, to put this information in a separate one-to-many table, with each value in its own row. It would then be trivial to return this information.
You can do this using tally table (table with numbers). Check this link on information how to create this table: http://www.sqlservercentral.com/articles/T-SQL/62867/
Here is example how you would use it. In real life you should replace temporary #tally table with a permanent one.
--create sample table with data
create table #a (tags varchar(500));
insert into #a
select 'blah,cat_incident,mcr_close_ticket'
union
select 'blah-blah,cat_feature_missing,cat_duplicated_request';
--create tally table
create table #tally(n int);
insert into #tally
select 1
union select 2
union select 3
union select 4
union select 5
;
--get tags
select * from
(
select TRIM(SPLIT_PART(a.tags, ',', t.n)) AS single_tag
from #tally t
inner join #a a ON t.n <= REGEXP_COUNT(a.tags, ',') + 1 and n<1000
)
where single_tag like 'cat%'
;
Thanks!
In the end I managed to do it with the following query:
SELECT SUBSTRING(SUBSTRING(tags, charindex('cat_', tags), len(tags)), 0, charindex(',', SUBSTRING(tags, charindex('cat_', tags), len(tags)))) tags
FROM table

select TableData where ColumnData start with list of strings

Following is the query to select column data from table, where column data starts with a OR b OR c. But the answer i am looking for is to Select data which starts with List of Strings.
SELECT * FROM Table WHERE Name LIKE '[abc]%'
But i want something like
SELECT * FROM Table WHERE Name LIKE '[ab,ac,ad,ae]%'
Can anybody suggest what is the best way of selecting column data which starts with list of String, I don't want to use OR operator, List of strings specifically.
The most general solution you would have to use is this:
SELECT *
FROM Table
WHERE Name LIKE 'ab%' OR Name LIKE 'ac%' OR Name LIKE 'ad%' OR Name LIKE 'ae%';
However, certain databases offer some regex support which you might be able to use. For example, in SQL Server you could write:
SELECT *
FROM Table
WHERE NAME LIKE 'a[bcde]%';
MySQL has a REGEXP operator which supports regex LIKE operations, and you could write:
SELECT *
FROM Table
WHERE NAME REGEXP '^a[bcde]';
Oracle and Postgres also have regex like support.
To add to Tim's answer, another approach could be to join your table with a sub-query of those values:
SELECT *
FROM mytable t
JOIN (SELECT 'ab' AS value
UNION ALL
SELECT 'ac'
UNION ALL
SELECT 'ad'
UNION ALL
SELECT 'ae') v ON t.vame LIKE v.value || '%'

How to detect numbers in string in many observations?

I would like to detect all numbers from Name to gain ID. I use this code:
select Name, ID
from [my_table]
where [Name] like ('%000234%')
But I need that working for many names. I tried Name like in (000234, 000235, ...), but it doesn't work. is it possible to gain whole list of IDs searching by Names?
If you want to select names for a specific range of numbers
select Name, ID
from [my_table]
where [Name] like '%00023[4-9]%'
Which will search for names 000234 - 000239
For other wild card reference
Insert data to temp table. Then you can merge it with the main table.
DECLARE #TempTable TABLE (TempName NVARCHAR(10))
INSERT INTO #TempTable
VALUES
('000234'),
('000235')
SELECT
A.Name,
A.ID
FROM
[my_table] A INNER JOIN
#TempTable B ON A.Name LIKE '%' + B.TempName + '%'
Added test table for clarity.
CREATE TABLE #Test (
NameString varchar(25) )
INSERT INTO #Test (NameString)
SELECT 'Mike387592'
UNION ALL
SELECT 'Nancy2387'
UNION ALL
SELECT 'Tim0088297234'
WITH CTE AS (
SELECT
NameString
,PATINDEX('%[0-9]%', NameString) AS [Start]
,LEN(NameString) AS [End]
FROM #Test )
SELECT
NameString
,SUBSTRING(NameString, [Start], [End])
FROM CTE
If you have a slew of names in a table that have nonsequential numbers that don't have a pattern shared amongst them, one way to gather that data is to use an IN predicate and to dynamically construct your query.
E.G. You want names with numbers 000412, 001523 & 001687.
You would dynamically generate a query like this:
SELECT Name, ID
FROM [my_table]
WHERE Name IN ( '000412', '001523', '001687' )
Dynamic queries could be generated from the database or by the software calling to the database, but are primarily discouraged because they pose a security threat and aren't reusable. Nevertheless, this is an option.
If you must use the LIKE predicate because there are other characters surrounding your numeric string, something like this would be the generated query:
SELECT Name, ID
FROM [my_table]
WHERE Name LIKE '%000412%' OR
Name LIKE '%001523%' OR
Name LIKE '%001687%'

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five digit identifier that is most descriptive that the three digit codes that the row constructor has. The table icd_patient has a 100 million rows so for performance's sake, I would like to filer the rows on this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select icd from code_list)
)
select distinct pat_id from patient_codes
The problem is, however, is that in the icd_patient table all of the icd columns are five digit and more descriptive. If I look at the execution plan of this query it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this if course has a large performance impact because of the substring expression in the where clause. Does something akin to like in exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
#sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN #sp AS sp
ON p.pat_id LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE #p dbo.StringPatterns;
INSERT #p VALUES('707'),('250');
EXEC dbo.whatever #sp = #p;
Something like like in does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
Because like with a constant initial substring is a special case for SQL Server. This does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Saying he fixes it using substring, completely changes what it would return while it remains non sarged.
Any "fix" should exactly match results. The actual fix is to join the cte so the five characters match or put three characters in the cte and match that in a join or put 4 characters in the cte where the fourth is "%" and join matching by using LIKE
Using a "like" that starts with "%" increases the complexity of the search, but it would still use the index to find the value because parsing the index should use less reading by only getting the full table row when a search is successful.

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundreds of known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the id's into a TEMPORARY TABLE to perform massive SELECT's and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table valued user defined function that takes a comma delimited string of IDs, and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell #IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(#IDs) AS [IDs]
ON [IDs].id = myTable.id
END
The fastest is to put the ids in another table and JOIN
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique