sql, strategies to find out string contains certain texts

sql, strategies to find out string contains certain texts - sql

I want to select any data that contains 800, 805, 888... (there are 8 pattern texts) in the column.
Do I have to use like statement 8 times for each one or is there a faster way?
Example:
SELECT * FROM caller,
WHERE id LIKE '%805%' OR id LIKE'%800' OR ... ;
(PS. I am not allowed to create another table, just using sql queries.)

LIKE is for strings, not for numbers. Assuming id is actually a number, you first need to cast it to a string in order to be able to apply a LIKE condition on it.
But once you do that, you can use an array for that:
SELECT *
FROM caller
WHERE id::text LIKE ANY (array['%805%', '%800', '81%']);

Use any() with an array of searched items:
with test(id, col) as (
values
(1, 'x800'),
(2, 'x855'),
(3, 'x900'),
(4, 'x920')
)
select *
from test
where col like any(array['%800', '%855'])
id | col
----+------
1 | x800
2 | x855
(2 rows)
This is shorter to write but not faster to execute I think.

Related

array column is returned as as string instead of array

I have array column in the pgsql db, when i select the column
SELECT scopes FROM users
it Returns data like this '{"admin","agent"}' instead of ["admin","agent"].
Is there a way to do this.
I expect it to be ["admin","agent"] and not '{"admin","agent"}'

Seems like it matches the common PostgreSql syntax for arrays, ex. a column can also be defined as an array to introduce another dimension.
Example input
INSERT INTO sal_emp
VALUES ('Bill',
'{10000, 10000, 10000, 10000}',
'{{"meeting", "lunch"}, {"training", "presentation"}}');
INSERT INTO sal_emp
VALUES ('Carol',
'{20000, 25000, 25000, 25000}',
'{{"breakfast", "consulting"}, {"meeting", "lunch"}}');
Becomes
SELECT * FROM sal_emp;
name | pay_by_quarter | schedule
-------+---------------------------+-------------------------------------------
Bill | {10000,10000,10000,10000} | {{meeting,lunch},{training,presentation}}
Carol | {20000,25000,25000,25000} | {{breakfast,consulting},{meeting,lunch}}
(2 rows)
From https://www.postgresql.org/docs/9.1/arrays.html
Is there a way to do this.
If the output isn't for ingestion by PostgreSql, then String substitution is an option.
SELECT REPLACE(REPLACE('<your input>', '{', '['), '}', ']') ..;

Select Where Like regular expression

I need to create a SQL Query.
This query need to select from a table where a column contains regular expression.
For example, I have those values:
TABLE test (name)
XHRTCNW
DHRTRRR
XHRTCOP
CPHCTPC
CDDHRTF
PEOFOFD
I want to select all the data who have "HRT" after 1 char (value 1, 2 and 3 - Values who looks like "-HRT---") but not those who might have "HRT" after 1 char (value 5).
So I'm not sure how to do it because a simple
SELECT *
FROM test
WHERE name LIKE "%HRT%"
will return value 1, 2, 3 and 5.
Sorry if I'm not really clear with what I want/need.

You can also change the pattern. Instead of using % which means zero-or-more anything, you can use _ which means exactly one.
SELECT * FROM test WHERE name like '_HRT%';

You can use substring.
SELECT * FROM test WHERE substring(name from 2 for 3) = 'HRT'

Are the names always 7 letters? Do:
SELECT substring (2, 4, field) from sometable
That will just select the 2-4th characters and then you can use like "%HRT"

Humanized or natural number sorting of mixed word-and-number strings

Following up on this question by Sivaram Chintalapudi, I'm interested in whether it's practical in PostgreSQL to do natural - or "humanized" - sorting " of strings that contain a mixture of multi-digit numbers and words/letters. There is no fixed pattern of words and numbers in the strings, and there may be more than one multi-digit number in a string.
The only place I've seen this done routinely is in the Mac OS's Finder, which sorts filenames containing mixed numbers and words naturally, placing "20" after "3", not before it.
The collation order desired would be produced by an algorithm that split each string into blocks at letter-number boundaries, then ordered each part, treating letter-blocks with normal collation and number-blocks as integers for collation purposes. So:
'AAA2fred' would become ('AAA',2,'fred') and 'AAA10bob' would become ('AAA',10,'bob'). These can then be sorted as desired:
regress=# WITH dat AS ( VALUES ('AAA',2,'fred'), ('AAA',10,'bob') )
regress-# SELECT dat FROM dat ORDER BY dat;
dat
--------------
(AAA,2,fred)
(AAA,10,bob)
(2 rows)
as compared to the usual string collation ordering:
regress=# WITH dat AS ( VALUES ('AAA2fred'), ('AAA10bob') )
regress-# SELECT dat FROM dat ORDER BY dat;
dat
------------
(AAA10bob)
(AAA2fred)
(2 rows)
However, the record comparison approach doesn't generalize because Pg won't compare ROW(..) constructs or records of unequal numbers of entries.
Given the sample data in this SQLFiddle the default en_AU.UTF-8 collation produces the ordering:
1A, 10A, 2A, AAA10B, AAA11B, AAA1BB, AAA20B, AAA21B, X10C10, X10C2, X1C1, X1C10, X1C3, X1C30, X1C4, X2C1
but I want:
1A, 2A, 10A, AAA1BB, AAA10B, AAA11B, AAA20B, AAA21B, X1C1, X1C3, X1C4, X1C10, X1C30, X2C1, X10C10, X10C2
I'm working with PostgreSQL 9.1 at the moment, but 9.2-only suggestions would be fine. I'm interested in advice on how to achieve an efficient string-splitting method, and how to then compare the resulting split data in the alternating string-then-number collation described. Or, of course, on entirely different and better approaches that don't require splitting strings.
PostgreSQL doesn't seem to support comparator functions, otherwise this could be done fairly easily with a recursive comparator and something like ORDER USING comparator_fn and a comparator(text,text) function. Alas, that syntax is imaginary.
Update: Blog post on the topic.

Building on your test data, but this works with arbitrary data. This works with any number of elements in the string.
Register a composite type made up of one text and one integer value once per database. I call it ai:
CREATE TYPE ai AS (a text, i int);
The trick is to form an array of ai from each value in the column.
regexp_matches() with the pattern (\D*)(\d*) and the g option returns one row for every combination of letters and numbers. Plus one irrelevant dangling row with two empty strings '{"",""}' Filtering or suppressing it would just add cost. Aggregate this into an array, after replacing empty strings ('') with 0 in the integer component (as '' cannot be cast to integer).
NULL values sort first - or you have to special case them - or use the whole shebang in a STRICT function like #Craig proposes.
Postgres 9.4 or later
SELECT data
FROM alnum
ORDER BY ARRAY(SELECT ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai
FROM regexp_matches(data, '(\D*)(\d*)', 'g') x)
, data;
db<>fiddle here
Postgres 9.1 (original answer)
Tested with PostgreSQL 9.1.5, where regexp_replace() had a slightly different behavior.
SELECT data
FROM (
SELECT ctid, data, regexp_matches(data, '(\D*)(\d*)', 'g') AS x
FROM alnum
) x
GROUP BY ctid, data -- ctid as stand-in for a missing pk
ORDER BY regexp_replace (left(data, 1), '[0-9]', '0')
, array_agg(ROW(x[1], CASE x[2] WHEN '' THEN '0' ELSE x[2] END)::ai)
, data -- for special case of trailing 0
Add regexp_replace (left(data, 1), '[1-9]', '0') as first ORDER BY item to take care of leading digits and empty strings.
If special characters like {}()"', can occur, you'd have to escape those accordingly.
#Craig's suggestion to use a ROW expression takes care of that.
BTW, this won't execute in sqlfiddle, but it does in my db cluster. JDBC is not up to it. sqlfiddle complains:
Method org.postgresql.jdbc3.Jdbc3Array.getArrayImpl(long,int,Map) is
not yet implemented.
This has since been fixed: http://sqlfiddle.com/#!17/fad6e/1

I faced this same problem, and I wanted to wrap the solution in a function so I could re-use it easily. I created the following function to achieve a 'human style' sort order in Postgres.
CREATE OR REPLACE FUNCTION human_sort(text)
RETURNS text[] AS
$BODY$
/* Split the input text into contiguous chunks where no numbers appear,
and contiguous chunks of only numbers. For the numbers, add leading
zeros to 20 digits, so we can use one text array, but sort the
numbers as if they were big integers.
For example, human_sort('Run 12 Miles') gives
{'Run ', '00000000000000000012', ' Miles'}
*/
select array_agg(
case
when a.match_array[1]::text is not null
then a.match_array[1]::text
else lpad(a.match_array[2]::text, 20::int, '0'::text)::text
end::text)
from (
select regexp_matches(
case when $1 = '' then null else $1 end, E'(\\D+)|(\\d+)', 'g'
) AS match_array
) AS a
$BODY$
LANGUAGE sql IMMUTABLE;
tested to work on Postgres 8.3.18 and 9.3.5
No recursion, should be faster than recursive solutions
Can be used in just the order by clause, don't have to deal with primary key or ctid
Works for any select (don't even need a PK or ctid)
Simpler than some other solutions, should be easier to extend and maintain
Suitable for use in a functional index to improve performance
Works on Postgres v8.3 or higher
Allows an unlimited number of text/number alternations in the input
Uses just one regex, should be faster than versions with multiple regexes
Numbers longer than 20 digits are ordered by their first 20 digits
Here's an example usage:
select * from (values
('Books 1', 9),
('Book 20 Chapter 1', 8),
('Book 3 Suffix 1', 7),
('Book 3 Chapter 20', 6),
('Book 3 Chapter 2', 5),
('Book 3 Chapter 1', 4),
('Book 1 Chapter 20', 3),
('Book 1 Chapter 3', 2),
('Book 1 Chapter 1', 1),
('', 0),
(null::text, 0)
) as a(name, sort)
order by human_sort(a.name)
-----------------------------
|name | sort |
-----------------------------
| | 0 |
| | 0 |
|Book 1 Chapter 1 | 1 |
|Book 1 Chapter 3 | 2 |
|Book 1 Chapter 20 | 3 |
|Book 3 Chapter 1 | 4 |
|Book 3 Chapter 2 | 5 |
|Book 3 Chapter 20 | 6 |
|Book 3 Suffix 1 | 7 |
|Book 20 Chapter 1 | 8 |
|Books 1 | 9 |
-----------------------------

Adding this answer late because it looked like everyone else was unwrapping into arrays or some such. Seemed excessive.
CREATE FUNCTION rr(text,int) RETURNS text AS $$
SELECT regexp_replace(
regexp_replace($1, '[0-9]+', repeat('0',$2) || '\&', 'g'),
'[0-9]*([0-9]{' || $2 || '})',
'\1',
'g'
)
$$ LANGUAGE sql;
SELECT t,rr(t,9) FROM mixed ORDER BY t;
t | rr
--------------+-----------------------------
AAA02free | AAA000000002free
AAA10bob | AAA000000010bob
AAA2bbb03boo | AAA000000002bbb000000003boo
AAA2bbb3baa | AAA000000002bbb000000003baa
AAA2fred | AAA000000002fred
(5 rows)
(reverse-i-search)`OD': SELECT crypt('richpass','$2$08$aJ9ko0uKa^C1krIbdValZ.dUH8D0R0dj8mqte0Xw2FjImP5B86ugC');
richardh=>
richardh=> SELECT t,rr(t,9) FROM mixed ORDER BY rr(t,9);
t | rr
--------------+-----------------------------
AAA2bbb3baa | AAA000000002bbb000000003baa
AAA2bbb03boo | AAA000000002bbb000000003boo
AAA2fred | AAA000000002fred
AAA02free | AAA000000002free
AAA10bob | AAA000000010bob
(5 rows)
I'm not claiming two regexps are the most efficient way to do this, but rr() is immutable (for fixed length) so you can index it. Oh - this is 9.1
Of course, with plperl you could just evaluate the replacement to pad/trim it in one go. But then with perl you've always got just-one-more-option (TM) than any other approach :-)

The following function splits a string into an array of (word,number) pairs of arbitrary length. If the string begins with a number then the first entry will have a NULL word.
CREATE TYPE alnumpair AS (wordpart text,numpart integer);
CREATE OR REPLACE FUNCTION regexp_split_numstring_depth_pairs(instr text)
RETURNS alnumpair[] AS $$
WITH x(match) AS (SELECT regexp_matches($1, '(\D*)(\d+)(.*)'))
SELECT
ARRAY[(CASE WHEN match[1] = '' THEN '0' ELSE match[1] END, match[2])::alnumpair] || (CASE
WHEN match[3] = '' THEN
ARRAY[]::alnumpair[]
ELSE
regexp_split_numstring_depth_pairs(match[3])
END)
FROM x;$$ LANGUAGE 'sql' IMMUTABLE;
allowing PostgreSQL's composite type sorting to come into play:
SELECT data FROM alnum ORDER BY regexp_split_numstring_depth_pairs(data);
and producing the expected result, as per this SQLFiddle. I've adopted Erwin's substitution of 0 for the empty string in all strings beginning with a number so that numbers sort first; it's cleaner than using ORDER BY left(data,1), regexp_split_numstring_depth_pairs(data).
While the function is probably horrifically slow it can at least be used in an expression index.
That was fun!

create table dat(val text)
insert into dat ( VALUES ('BBB0adam'), ('AAA10fred'), ('AAA2fred'), ('AAA2bob') );
select
array_agg( case when z.x[1] ~ E'\\d' then lpad(z.x[1],10,'0') else z.x[1] end ) alnum_key
from (
SELECT ctid, regexp_matches(dat.val, E'(\\D+|\\d+)','g') as x
from dat
) z
group by z.ctid
order by alnum_key;
alnum_key
-----------------------
{AAA,0000000002,bob}
{AAA,0000000002,fred}
{AAA,0000000010,fred}
{BBB,0000000000,adam}
Worked on this for almost an hour and posted without looking -- I see Erwin arrived at a similar place. Ran into the same "could not find array type for data type text[]" trouble as #Clodoaldo. Had a lot of trouble getting the cleanup exercise to not agg all the rows until I thought of grouping by the ctid (which feels like cheating really -- and doesn't work on a psuedo table as in the OP example WITH dat AS ( VALUES ('AAA2fred'), ('AAA10bob') )
...). It would be nicer if array_agg could accept a set-producing subselect.

I'm not a RegEx guru, but I can work it to some extent. Enough to produce this answer.
It will handle up to 2 numeric values within the content. I don't think OSX goes further than that, if it even handles 2.
WITH parted AS (
select data,
substring(data from '([A-Za-z]+).*') part1,
substring('a'||data from '[A-Za-z]+([0-9]+).*') part2,
substring('a'||data from '[A-Za-z]+[0-9]+([A-Za-z]+).*') part3,
substring('a'||data from '[A-Za-z]+[0-9]+[A-Za-z]+([0-9]+).*') part4
from alnum
)
select data
from parted
order by part1,
cast(part2 as int),
part3,
cast(part4 as int),
data;
SQLFiddle

The following solution is a combination of various ideas presented in other answers, as well as some ideas from the classic solution:
create function natsort(s text) returns text immutable language sql as $$
select string_agg(r[1] || E'\x01' || lpad(r[2], 20, '0'), '')
from regexp_matches(s, '(\D*)(\d*)', 'g') r;
$$;
The design goals of this function were simplicity and pure string operations (no custom types and no arrays), so it can easily be used as a drop-in solution, and is trivial to be indexed over.
Note: If you expect numbers with more than 20 digits, you'll have to replace the hard-coded maximum length 20 in the function with a suitable larger length. Note that this will directly affect the length of the resulting strings, so don't make that value larger than needed.

Is it possible to query a comma separated column for a specific value?

I have (and don't own, so I can't change) a table with a layout similar to this.
ID | CATEGORIES
---------------
1 | c1
2 | c2,c3
3 | c3,c2
4 | c3
5 | c4,c8,c5,c100
I need to return the rows that contain a specific category id. I starting by writing the queries with LIKE statements, because the values can be anywhere in the string
SELECT id FROM table WHERE categories LIKE '%c2%';
Would return rows 2 and 3
SELECT id FROM table WHERE categories LIKE '%c3%' and categories LIKE '%c2%'; Would again get me rows 2 and 3, but not row 4
SELECT id FROM table WHERE categories LIKE '%c3%' or categories LIKE '%c2%'; Would again get me rows 2, 3, and 4
I don't like all the LIKE statements. I've found FIND_IN_SET() in the Oracle documentation but it doesn't seem to work in 10g. I get the following error:
ORA-00904: "FIND_IN_SET": invalid identifier
00904. 00000 - "%s: invalid identifier"
when running this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories); (example from the docs) or this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories) <> 0; (example from Google)
I would expect it to return rows 2 and 3.
Is there a better way to write these queries instead of using a ton of LIKE statements?

You can, using LIKE. You don't want to match for partial values, so you'll have to include the commas in your search. That also means that you'll have to provide an extra comma to search for values at the beginning or end of your text:
select
*
from
YourTable
where
',' || CommaSeparatedValueColumn || ',' LIKE '%,SearchValue,%'
But this query will be slow, as will all queries using LIKE, especially with a leading wildcard.
And there's always a risk. If there are spaces around the values, or values can contain commas themselves in which case they are surrounded by quotes (like in csv files), this query won't work and you'll have to add even more logic, slowing down your query even more.
A better solution would be to add a child table for these categories. Or rather even a separate table for the catagories, and a table that cross links them to YourTable.

You can write a PIPELINED table function which return a 1 column table. Each row is a value from the comma separated string. Use something like this to pop a string from the list and put it as a row into the table:
PIPE ROW(ltrim(rtrim(substr(l_list, 1, l_idx - 1),' '),' '));
Usage:
SELECT * FROM MyTable
WHERE 'c2' IN TABLE(Util_Pkg.split_string(categories));
See more here: Oracle docs

Yes and No...
"Yes":
Normalize the data (strongly recommended) - i.e. split the categorie column so that you have each categorie in a separate... then you can just query it in a normal faschion...
"No":
As long as you keep this "pseudo-structure" there will be several issues (performance and others) and you will have to do something similar to:
SELECT * FROM MyTable WHERE categories LIKE 'c2,%' OR categories = 'c2' OR categories LIKE '%,c2,%' OR categories LIKE '%,c2'
IF you absolutely must you could define a function which is named FIND_IN_SET like the following:
CREATE OR REPLACE Function FIND_IN_SET
( vSET IN varchar2, vToFind IN VARCHAR2 )
RETURN number
IS
rRESULT number;
BEGIN
rRESULT := -1;
SELECT COUNT(*) INTO rRESULT FROM DUAL WHERE vSET LIKE ( vToFine || ',%' ) OR vSET = vToFind OR vSET LIKE ('%,' || vToFind || ',%') OR vSET LIKE ('%,' || vToFind);
RETURN rRESULT;
END;
You can then use that function like:
SELECT * FROM MyTable WHERE FIND_IN_SET (categories, 'c2' ) > 0;

For the sake of future searchers, don't forget the regular expression way:
with tbl as (
select 1 ID, 'c1' CATEGORIES from dual
union
select 2 ID, 'c2,c3' CATEGORIES from dual
union
select 3 ID, 'c3,c2' CATEGORIES from dual
union
select 4 ID, 'c3' CATEGORIES from dual
union
select 5 ID, 'c4,c8,c5,c100' CATEGORIES from dual
)
select *
from tbl
where regexp_like(CATEGORIES, '(^|\W)c3(\W|$)');
ID CATEGORIES
---------- -------------
2 c2,c3
3 c3,c2
4 c3
This matches on a word boundary, so even if the comma was followed by a space it would still work. If you want to be more strict and match only where a comma separates values, replace the '\W' with a comma. At any rate, read the regular expression as:
match a group of either the beginning of the line or a word boundary, followed by the target search value, followed by a group of either a word boundary or the end of the line.

As long as the comma-delimited list is 512 characters or less, you can also use a regular expression in this instance (Oracle's regular expression functions, e.g., REGEXP_LIKE(), are limited to 512 characters):
SELECT id, categories
FROM mytable
WHERE REGEXP_LIKE('c2', '^(' || REPLACE(categories, ',', '|') || ')$', 'i');
In the above I'm replacing the commas with the regular expression alternation operator |. If your list of delimited values is already |-delimited, so much the better.

Searching Technique in SQL (Like,Contain)

I want to compare and select a field from DB using Like keyword or any other technique.
My query is the following:
SELECT * FROM Test WHERE name LIKE '%xxxxxx_Ramakrishnan_zzzzz%';
but my fields only contain 'Ramakrishnan'
My Input string contain some extra character xxxxxx_Ramakrishnan_zzzzz
I want the SQL query for this. Can any one please help me?

You mean you want it the other way round? Like this?
Select * from Test where 'xxxxxx_Ramakrishnan_zzzzz' LIKE '%' + name + '%';

You can use the MySQL functions, LOCATE() precisely like,
SELECT * FROM WHERE LOCATE("Ramakrishnan",input) > 0

Are the xxxxxx and zzzzz bits always 6 and 5 characters? If so, then this is doable with a bit of string cutting.
with Test (id,name) as (
select 1, 'Ramakrishnan'
union
select 2, 'Coxy'
union
select 3, 'xxxxxx_Ramakrishnan_zzzzz'
)
Select * from Test where name like '%'+SUBSTRING('xxxxxx_Ramakrishnan_zzzzz', 8, CHARINDEX('_',SUBSTRING('xxxxxx_Ramakrishnan_zzzzz',8,100))-1)+'%'
Results in:
id name
1 Ramakrishnan
3 xxxxxx_Ramakrishnan_zzzzz
If they are variable lengths, then it will be a horrible construction of SUBSTRING,CHARINDEX, REVERSE and LEN functions.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

sql, strategies to find out string contains certain texts - sql

LIKE is for strings, not for numbers. Assuming id is actually a number, you first need to cast it to a string in order to be able to apply a LIKE condition on it. But once you do that, you can use an array for that: SELECT * FROM caller WHERE id::text LIKE ANY (array['%805%', '%800', '81%']);

Use any() with an array of searched items: with test(id, col) as ( values (1, 'x800'), (2, 'x855'), (3, 'x900'), (4, 'x920') ) select * from test where col like any(array['%800', '%855']) id | col ----+------ 1 | x800 2 | x855 (2 rows) This is shorter to write but not faster to execute I think.

Related

array column is returned as as string instead of array

Select Where Like regular expression

Humanized or natural number sorting of mixed word-and-number strings

Is it possible to query a comma separated column for a specific value?

Searching Technique in SQL (Like,Contain)

Categories

Resources