substring matching in sql


I have a query
select distinct(tad.ASP_NAME)
from TABLE_ASP_DETAILS tad
where tad.ASSIGNED_FE_LAST_NAME = 'asurekam2'
where the matching ASSIGNED_FE_LAST_NAME value is SureKAM2, and the query above should return SureKAM2.
Similarly,
select distinct(tad.ASP_NAME)
from TABLE_ASP_DETAILS tad
where tad.ASSIGNED_FE_LAST_NAME = 'ABT_Dallas1_TX'
should return ABT from the table.
So basically I want "contains"-like functionality on my input string: it should find the matching value in tad.ASP_NAME, where the ASP name is a substring of the input string. For example:
any input such as ACS_ITALY_CATANIA, ACS_ITALY_BARI or ACS_xxxxx should find the ACS value in tad.ASP_NAME;
ADNTELECOM_Sayedur_Rahman or ADNTELECOM_Reza_Bin_Mujib should find the ADNTELECOM value in tad.ASP_NAME.

This seems to do what you want:
where lower('asurekam2') like '%' || lower(tad.ASSIGNED_FE_LAST_NAME) || '%'
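A minimal sketch of this reversed LIKE (the input string is the haystack, the column value is the needle), using Python's built-in sqlite3 module as a stand-in for Oracle, since SQLite also concatenates with || and has lower(). The table contents are hypothetical, taken from the values in the question, and the sketch matches against asp_name because that is the column the question wants to search. One caveat: a _ or % stored in the column would act as a LIKE wildcard.

```python
import sqlite3

# Hypothetical sample data built from the values mentioned in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_asp_details (asp_name TEXT)")
conn.executemany(
    "INSERT INTO table_asp_details VALUES (?)",
    [("SureKAM2",), ("ABT",), ("ACS",), ("ADNTELECOM",)],
)

def find_asp_names(input_string):
    # Reverse the usual LIKE: the input string is the haystack and the
    # stored column value is the needle, compared case-insensitively.
    sql = (
        "SELECT DISTINCT asp_name FROM table_asp_details "
        "WHERE lower(?) LIKE '%' || lower(asp_name) || '%' "
        "ORDER BY asp_name"
    )
    return [row[0] for row in conn.execute(sql, (input_string,))]

print(find_asp_names("asurekam2"))       # ['SureKAM2']
print(find_asp_names("ABT_Dallas1_TX"))  # ['ABT']
print(find_asp_names("ACS_ITALY_BARI"))  # ['ACS']
```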

Would this do? You didn't provide a test case, so I improvised. You need lines 6-8.
SQL> with table_asp_details(asp_name, assigned_fe_last_name) as
2 (select 'ACS_ITALY_CATANIA', 'ACS_Dallas1_TX' from dual union all
3 select 'ACS_ITALY_BARI', 'ACS_Dallas1_TX' from dual
4 )
5 -- this is what you need
6 select distinct regexp_substr(asp_name, '[[:alpha:]]+') result
7 from table_asp_details tad
8 where tad.assigned_fe_last_name = 'ACS_Dallas1_TX';
RESULT
-----------------
ACS
SQL>
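The key step is regexp_substr(asp_name, '[[:alpha:]]+'), which returns the first run of letters. A rough Python equivalent, for illustration only: leading_letters is a hypothetical helper, and [^\W\d_] (a word character that is not a digit or underscore) approximates [[:alpha:]], whose exact meaning in Oracle depends on the character set.

```python
import re

def leading_letters(asp_name):
    # Roughly REGEXP_SUBSTR(asp_name, '[[:alpha:]]+'): the first run of letters.
    # [^\W\d_] = word characters minus digits and underscore, i.e. (roughly) letters.
    match = re.search(r"[^\W\d_]+", asp_name)
    return match.group() if match else None

names = ["ACS_ITALY_CATANIA", "ACS_ITALY_BARI", "ADNTELECOM_Sayedur_Rahman"]
print(sorted({leading_letters(n) for n in names}))  # ['ACS', 'ADNTELECOM']
```

Like the SQL version, this takes the prefix up to the first non-letter, so it only works when the ASP name is the leading alphabetic part of the value.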

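I think you are looking for something like this, but it's hard to say with so few details (the 'i' match parameter makes the comparison case-insensitive, which the asurekam2/SureKAM2 example needs):
select distinct(tad.ASP_NAME)
from TABLE_ASP_DETAILS tad
where regexp_like('asurekam2', tad.ASSIGNED_FE_LAST_NAME, 'i')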

Related

PLSQL - order by string with REGEX

I'm trying to sort the result set of a query where the column is VARCHAR2.
I've tried using just:
ORDER BY
UPPER(SERVER_NAME) ASC
But I get inconsistent results, for example:
120157
777555
AKO
a20064
Elilikes
kagan
1200165_DAVID
As you can see, 1200165_DAVID appears last. I also tried using a regular expression, like so:
ORDER BY
(CASE WHEN REGEXP_LIKE(UPPER(SERVER_NAME), '^[0-9]+$') THEN 1 ELSE 2 END) ASC,
UPPER(SERVER_NAME) ASC
But I get the same results. I would like to get the following ordering, if possible:
120157
1200165_DAVID
777555
a20064
AKO
Elilikes
kagan
Please advise.

Three things.
First: why do you want 1200165_DAVID to appear AFTER 120157? If you order alphabetically, it should appear before it (the two strings first differ at the fourth character, where '0' sorts before '1').
Second: Running your query on your test data, I get the correct result. So I am inclined to believe either your query is different from what you reported, or there is some other error somewhere.
Third: you may have who-knows-what characters in your data. Select str and dump(str) side by side (or whatever your expression is called; I like to use str in my test data) to see what characters are in each string. Look especially at those that seem to be sorted "out of order".
with
inputs ( str ) as (
select '120157' from dual union all
select '777555' from dual union all
select 'AKO' from dual union all
select 'a20064' from dual union all
select 'Elilikes' from dual union all
select 'kagan' from dual union all
select '1200165_DAVID' from dual
)
select str from inputs
order by upper(str);
STR
-------------
1200165_DAVID
120157
777555
a20064
AKO
Elilikes
kagan
7 rows selected.
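The ordering above is what a plain binary (code-point) comparison of UPPER(str) produces. This Python sketch reproduces it with sorted(..., key=str.upper) and then, using a hypothetical invisible non-breaking space (U+00A0) in front of one value, shows how a single stray character pushes a row to the end. dump_codes is a hypothetical stand-in for DUMP(); it lists code points but does not mimic DUMP's output format.

```python
def binary_upper_sort(values):
    # Like ORDER BY UPPER(col) under a binary (code-point) collation,
    # which is what the output above appears to use.
    return sorted(values, key=str.upper)

def dump_codes(value):
    # Rough stand-in for Oracle's DUMP(): the code point of every character.
    return [ord(ch) for ch in value]

data = ["120157", "777555", "AKO", "a20064", "Elilikes", "kagan", "1200165_DAVID"]
print(binary_upper_sort(data))
# ['1200165_DAVID', '120157', '777555', 'a20064', 'AKO', 'Elilikes', 'kagan']

# Hypothetical: the same row with an invisible non-breaking space in front.
hidden = "\u00a01200165_DAVID"
print(binary_upper_sort(data[:-1] + [hidden])[-1] == hidden)  # True: it sorts last
print(dump_codes(hidden)[:3])  # [160, 49, 50]
```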

This is too long for a comment.
Your data appears to contain characters that you don't recognize. In particular, the first character is suspicious.
I would suggest that you run a query like this:
select ASCII(SUBSTR(server_name, 1, 1)) as first_char_ascii,
'|' || SUBSTR(server_name, 1, 1) || '|' as first_char,
COUNT(*), min(server_name), max(server_name)
from t
group by SUBSTR(server_name, 1, 1)
order by count(*) asc;
Then you will see what characters are actually at the beginning of the string. My guess is you will find at least one interesting character. You will then need to modify the data (or the query) to handle that.
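The same first-character report, sketched in Python with collections.Counter on hypothetical data; a code point above 127 in the output is exactly the kind of "interesting character" this query is meant to surface. first_char_report is a made-up helper name.

```python
from collections import Counter

def first_char_report(values):
    # Mirrors the query above: one row per distinct first character with its
    # code point, the character between bars, and the count, least frequent
    # first (ORDER BY COUNT(*) ASC).
    counts = Counter(v[0] for v in values if v)
    rows = [(ord(ch), "|" + ch + "|", n) for ch, n in counts.items()]
    return sorted(rows, key=lambda row: row[2])

# Hypothetical data: one value starts with a non-breaking space (U+00A0).
names = ["120157", "121", "777555", "\u00a01200165_DAVID"]
for row in first_char_report(names):
    print(row)
```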

Querying substrings against a list of values

I'm reading from a dataset that I unfortunately don't have the access to modify. It has concatenated strings of values, and I want to select records for which any of those substrings (as split by a given character) matches any of the values in a specific list. I'll be passing the queries in via Python, so it won't be compared against a static list.
For example, the table looks like:
CrappyColumn
-----------
1;2
4
1
2;1
1;3
2
And I might want to return anything that has 2 or 4 in it. So, my result should be:
1;2
4
2
2;1
I have played with regexp_substr and gotten something that actually works; however, it just runs indefinitely (as much as 10 minutes before I give up) when I run it on the full dataset (which only includes about three thousand records with values that are often a couple hundred characters long). I need something that works in a reasonable amount of time for repeated execution.
I realize that, even with a variable comparison list, I could just write my Python code to parse the list and construct multiple LIKE conditions, but that seems inefficient, and I assume there is a better way.
And here's what I've done that takes too long:
SELECT DISTINCT CrappyColumn
FROM
(SELECT DISTINCT CrappyColumn, regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) as UGH
FROM CrappyTable
CONNECT BY regexp_substr(CrappyColumn, '[^;]+', 1, LEVEL) IS NOT NULL)
WHERE UGH IN ('2', '4')
Is there a better, faster, cleaner way to accomplish this?
EDIT - RESOLUTION:
Thanks to vkp's help, here is what I implemented:
regexp_like(SITE_ID, '^(2|4)(:)|(:)(2|4)(:)|(:)(2|4)$|^(2|4)$')
I modified it for my final product so that it can handle values of more than one character, by changing [2|4] to (2|4). This works when searching for numbers that aren't single-digit.

You can use LIKE:
select t.*
from crappytable t
where ';' || crappycolumn || ';' like '%;2;%' or
';' || crappycolumn || ';' like '%;4;%';
You seem to know that storing lists of values in a single column is a bad idea, so I'll spare the harangue ;)
EDIT:
If you don't like like, you can use regexp_like() like this:
where regexp_like(';' || crappycolumn || ';', ';2;|;4;')
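Since the comparison list comes from Python, the OR-ed LIKE version can be generated with one bind parameter per value instead of pasting values into the SQL. A minimal sketch using sqlite3 as a stand-in (Oracle drivers use a different placeholder style than ?); build_like_query is a hypothetical helper, and a % or _ inside a search value would act as a wildcard.

```python
import sqlite3

def build_like_query(values):
    # One wrapped-LIKE test per search value, OR-ed together; the values
    # themselves are passed as bind parameters, never pasted into the SQL.
    if not values:
        raise ValueError("need at least one value to search for")
    condition = " OR ".join(["';' || crappycolumn || ';' LIKE ?"] * len(values))
    sql = f"SELECT crappycolumn FROM crappytable WHERE {condition} ORDER BY rowid"
    params = [f"%;{v};%" for v in values]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crappytable (crappycolumn TEXT)")
conn.executemany(
    "INSERT INTO crappytable VALUES (?)",
    [(v,) for v in ["1;2", "4", "1", "2;1", "1;3", "2"]],
)
sql, params = build_like_query(["2", "4"])
matches = [row[0] for row in conn.execute(sql, params)]
print(matches)  # ['1;2', '4', '2;1', '2']
```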

A simpler method would be to use regexp_like to check if the list has 2 or 4 in it.
select *
from tablename
where regexp_like(crappycolumn,'^[2|4][^0-9]|[^0-9][2|4][^0-9]|[^0-9][2|4]$|^[2|4]$')
^[2|4][^0-9] - Starts with 2 or 4 not followed by a digit.
[^0-9][2|4][^0-9] - 2 or 4 neither preceded nor followed by a digit.
[^0-9][2|4]$ - Ends with 2 or 4 not preceded by a digit.
^[2|4]$ - 2 or 4 is the only character in the string.
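One thing to watch: inside square brackets | is not alternation but just another character, so [2|4] matches '2', '|' or '4' (which is also why the asker switched to (2|4) for multi-character values). Python's re treats this bracket expression the same way, so a quick check works; the '|' row is a hypothetical addition to the question's data.

```python
import re

# The answer's pattern, with [2|4] as written ...
as_written = re.compile(r"^[2|4][^0-9]|[^0-9][2|4][^0-9]|[^0-9][2|4]$|^[2|4]$")
# ... and the same pattern with the bracket expression reduced to [24].
tightened = re.compile(r"^[24][^0-9]|[^0-9][24][^0-9]|[^0-9][24]$|^[24]$")

rows = ["1;2", "4", "1", "2;1", "1;3", "2", "|"]
print([r for r in rows if as_written.search(r)])  # ['1;2', '4', '2;1', '2', '|']
print([r for r in rows if tightened.search(r)])   # ['1;2', '4', '2;1', '2']
```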

Another form of regexp_like(). This regex looks for 2 or 4 only when preceded by the beginning of the line or a semicolon, and followed by a semicolon or the end of the line:
SQL> with crappy_tbl(crappy_col) as (
select '1;2' from dual union
select '4' from dual union
select '1' from dual union
select '2;1' from dual union
select '1;3' from dual union
select '2' from dual union
select '22;;44;' from dual
)
select crappy_col
from crappy_tbl
where regexp_like(crappy_col, '(^|;)(2|4)(;|$)');
CRAPPY_
-------
1;2
2
2;1
4
SQL>
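Because the search values arrive from Python, the (^|;)(...)(;|$) pattern can be built from a list, escaping each value so characters like . are taken literally. A sketch using sqlite3 as a stand-in: Connection.create_function registers a Python regexp() that SQLite's REGEXP operator calls. build_pattern and regexp are hypothetical helpers, and Python's regex dialect is not identical to Oracle's; in Oracle the idea would be to bind the pattern into regexp_like instead.

```python
import re
import sqlite3

def build_pattern(values):
    # '(^|;)(v1|v2|...)(;|$)', escaping each value so regex metacharacters are literal.
    return "(^|;)(" + "|".join(re.escape(v) for v in values) + ")(;|$)"

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, then the value.
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE crappy_tbl (crappy_col TEXT)")
conn.executemany(
    "INSERT INTO crappy_tbl VALUES (?)",
    [(v,) for v in ["1;2", "4", "1", "2;1", "1;3", "2", "22;;44;"]],
)
pattern = build_pattern(["2", "4"])
rows = conn.execute(
    "SELECT crappy_col FROM crappy_tbl WHERE crappy_col REGEXP ? ORDER BY rowid",
    (pattern,),
).fetchall()
print(pattern)               # (^|;)(2|4)(;|$)
print([r[0] for r in rows])  # ['1;2', '4', '2;1', '2']
```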

Finding rows that don't contain numeric data in Oracle

I am trying to locate some problematic records in a very large Oracle table. The column should contain all numeric data even though it is a varchar2 column. I need to find the records which don't contain numeric data (The to_number(col_name) function throws an error when I try to call it on this column).

I was thinking you could use a REGEXP_LIKE condition with a regular expression that finds any non-numerics. I hope this helps!
SELECT * FROM table_with_column_to_search WHERE REGEXP_LIKE(varchar_col_with_non_numerics, '[^0-9]+');

To get an indicator:
DECODE(TRANSLATE(your_number, ' 0123456789', ' '), NULL, 'number', 'contains char')
For example:
SQL> select DECODE( TRANSLATE('12345zzz_not_numberee',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"contains char"
and
SQL> select DECODE( TRANSLATE('12345',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
and
SQL> select DECODE( TRANSLATE('123405',' 0123456789',' '), NULL, 'number','contains char')
2 from dual
3 /
"number"
Oracle 11g has regular expressions, so you could use this to get the values that are actual numbers:
SQL> SELECT colA
2 FROM t1
3 WHERE REGEXP_LIKE(colA, '^[[:digit:]]+$');
COLA
----------
47845
48543
12
...
If there is a non-numeric value like '23g', it will just be ignored.

In contrast to SGB's answer, I prefer writing a regexp that defines the actual format of my data and negating it. This allows me to define formats like $DDD,DDD,DDD.DD.
In the OPs simple scenario, it would look like
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^[0-9]+$');
which finds every value that is not an unsigned integer. If you want to accept negative integers too, it's an easy change: just add an optional leading minus.
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+$');
Accepting decimal numbers as well:
SELECT *
FROM table_with_column_to_search
WHERE NOT REGEXP_LIKE(varchar_col_with_non_numerics, '^-?[0-9]+(\.[0-9]+)?$');
The same goes for any other format. You will generally already have patterns that validate input data, so when you want to find data that does not match a format, it's simpler to negate that format than to come up with another one; with SGB's approach that would be a bit tricky if you want more than just positive integers.
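The "define the format, then negate it" idea with the three patterns above, sketched in Python on hypothetical sample values. not_matching is a made-up helper; it skips None because in Oracle NOT REGEXP_LIKE(NULL, ...) evaluates to NULL, so NULL rows are not returned either.

```python
import re

# The three formats from this answer, anchored with ^...$ as in the SQL.
FORMATS = {
    "unsigned integer": r"^[0-9]+$",
    "optionally signed integer": r"^-?[0-9]+$",
    "optionally signed decimal": r"^-?[0-9]+(\.[0-9]+)?$",
}

def not_matching(values, pattern):
    # Python analogue of WHERE NOT REGEXP_LIKE(col, pattern); None is skipped
    # because Oracle would not return NULL rows for this predicate.
    return [v for v in values if v is not None and re.search(pattern, v) is None]

sample = ["123", "-45", "3.14", "1,23", "12a", ".5", None]
for name, pattern in FORMATS.items():
    print(f"{name}: {not_matching(sample, pattern)}")
```

Each step up in the format accepts more values, so fewer rows are reported as bad.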

Use this
SELECT *
FROM TableToSearch
WHERE NOT REGEXP_LIKE(ColumnToSearch, '^-?[0-9]+(\.[0-9]+)?$');

After doing some testing, I came up with this solution; let me know whether it helps.
Add the two conditions below to your query and it will find the records that don't contain numeric data:
and REGEXP_LIKE(<column_name>, '\D') -- this selects non-numeric data
and not REGEXP_LIKE(<column_name>, '^[-]{1}\d{1}') -- this filters out negative (-) values

Starting with Oracle 12.2, the TO_NUMBER function has an optional DEFAULT ... ON CONVERSION ERROR clause, which catches the conversion error and returns the default value instead.
This can be used to test for numeric values: return NULL when the conversion fails and keep only the non-NULL results. (Whether a value such as '1,23' converts depends on the session's NLS_NUMERIC_CHARACTERS setting.)
Example
with num as (
select '123' vc_col from dual union all
select '1,23' from dual union all
select 'RV12P2000' from dual union all
select null from dual)
select
vc_col
from num
where /* filter numbers */
vc_col is not null and
to_number(vc_col DEFAULT NULL ON CONVERSION ERROR) is not null
;
VC_COL
---------
123
1,23
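The same "convert, and fall back to NULL instead of raising" pattern, sketched in Python for illustration. to_number_or_none is a hypothetical helper, and Python's float() rules are not Oracle's: it ignores locale, so it rejects '1,23', and it accepts strings such as 'nan'.

```python
def to_number_or_none(text):
    # Same spirit as TO_NUMBER(text DEFAULT NULL ON CONVERSION ERROR):
    # return None instead of raising. float() follows Python's rules, not Oracle's.
    if text is None:
        return None
    try:
        return float(text)
    except ValueError:
        return None

values = ["123", "1,23", "RV12P2000", None]
# Rows that are present but do not convert: the ones the question is hunting for.
problem_rows = [v for v in values if v is not None and to_number_or_none(v) is None]
print(problem_rows)  # ['1,23', 'RV12P2000']
```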

From http://www.dba-oracle.com/t_isnumeric.htm
LENGTH(TRIM(TRANSLATE(<char_column>, ' +-.0123456789', ' '))) is null
If anything is left in the string after the TRIM, it must contain non-numeric characters.

I've found this useful:
select translate('your string','_0123456789','_') from dual
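If the result is NULL, it's numeric (ignoring floating-point numbers).
However, I'm a bit baffled why the underscore is needed. Without it, the following also returns null:
select translate('s123','0123456789', '') from dual
(The reason: Oracle treats the empty string '' as NULL, and TRANSLATE returns NULL whenever one of its arguments is NULL; the underscore keeps the third argument non-empty.)
There is also one of my favorite tricks. It only checks that the string contains no letters, so it is not perfect if the string contains characters like "*" or "#":
SELECT 'is a number' FROM dual WHERE UPPER('123') = LOWER('123')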

After doing some testing, building upon the suggestions in the previous answers, there seem to be two usable solutions.
Method 1 is fastest, but less powerful in terms of matching more complex patterns.
Method 2 is more flexible, but slower.
Method 1 - fastest
I've tested this method on a table with 1 million rows.
It seems to be 3.8 times faster than the regex solutions.
The REPLACE of '0' solves the problem that TRANSLATE maps 0 to a space (so a value such as '10' would come back as ' ' and be reported as non-numeric), and it does not seem to slow down the query.
SELECT *
FROM <table>
WHERE TRANSLATE(replace(<char_column>,'0',''),'0123456789',' ') IS NOT NULL;
Method 2 - slower, but more flexible
I've compared the speed of putting the negation inside or outside the regex. Both are equally slower than the TRANSLATE solution. As a result, ciuly's approach seems most sensible when using regex.
SELECT *
FROM <table>
WHERE NOT REGEXP_LIKE(<char_column>, '^[0-9]+$');
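Why Method 1 needs the REPLACE, sketched in Python with a hypothetical oracle_translate helper that emulates TRANSLATE's rules (extra from-characters are deleted, and '' counts as NULL). This is an illustration under those assumptions, not Oracle itself; it also reproduces the empty-third-argument puzzle from the previous answer.

```python
def oracle_translate(text, from_chars, to_chars):
    # Hypothetical emulation of Oracle TRANSLATE: from_chars[i] becomes to_chars[i];
    # characters of from_chars beyond the end of to_chars are deleted.
    # Oracle treats '' as NULL, so empty inputs and empty results become None.
    # Assumes from_chars contains no repeated characters.
    if not text or not from_chars or not to_chars:
        return None
    table = {
        ord(ch): (to_chars[i] if i < len(to_chars) else None)
        for i, ch in enumerate(from_chars)
    }
    return text.translate(table) or None

def flagged_naive(value):
    # TRANSLATE(col, '0123456789', ' ') IS NOT NULL
    return oracle_translate(value, "0123456789", " ") is not None

def flagged_method1(value):
    # TRANSLATE(REPLACE(col, '0', ''), '0123456789', ' ') IS NOT NULL
    if value is None:
        return False  # NULL IS NOT NULL is false
    without_zeros = value.replace("0", "") or None  # REPLACE can yield '', i.e. NULL
    return oracle_translate(without_zeros, "0123456789", " ") is not None

print(flagged_naive("10"), flagged_method1("10"))    # True False
print(flagged_naive("12a"), flagged_method1("12a"))  # True True
```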

You can use this check:
create or replace function to_n(c varchar2) return number is
begin return to_number(c);
-- any conversion failure returns a sentinel; a genuine -123456 value would also be reported
exception when others then return -123456;
end;
/
select id, n from t where to_n(n) = -123456;

I tried ordering by the problematic column, and that let me find the rows with bad values in it.
SELECT
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND,
D.COL1 AS COL1
FROM
VW_DATA_ALL_GC D
WHERE
(D.PERIOADA IN (:pPERIOADA)) AND
(D.FORM = 62)
AND D.COL1 IS NOT NULL
-- AND REGEXP_LIKE(D.COL1, '[[:alpha:]]')
-- AND REGEXP_LIKE(D.COL1, '[[:digit:]]')
-- AND REGEXP_LIKE(TO_CHAR(D.COL1), '[^0-9]+')
GROUP BY
D.UNIT_CODE,
D.CUATM,
D.CAPITOL,
D.RIND ,
D.COL1
ORDER BY
D.COL1

Is it possible to query a comma separated column for a specific value?

I have (and don't own, so I can't change) a table with a layout similar to this.
ID | CATEGORIES
---------------
1 | c1
2 | c2,c3
3 | c3,c2
4 | c3
5 | c4,c8,c5,c100
I need to return the rows that contain a specific category id. I started by writing the queries with LIKE conditions, because the values can be anywhere in the string:
SELECT id FROM table WHERE categories LIKE '%c2%';
would return rows 2 and 3.
SELECT id FROM table WHERE categories LIKE '%c3%' and categories LIKE '%c2%';
would again get me rows 2 and 3, but not row 4.
SELECT id FROM table WHERE categories LIKE '%c3%' or categories LIKE '%c2%';
would get me rows 2, 3, and 4.
I don't like all the LIKE statements. I've found FIND_IN_SET() in the Oracle documentation but it doesn't seem to work in 10g. I get the following error:
ORA-00904: "FIND_IN_SET": invalid identifier
00904. 00000 - "%s: invalid identifier"
when running this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories); (example from the docs) or this query: SELECT id FROM table WHERE FIND_IN_SET('c2', categories) <> 0; (example from Google)
I would expect it to return rows 2 and 3.
Is there a better way to write these queries instead of using a ton of LIKE statements?

You can, using LIKE. You don't want to match for partial values, so you'll have to include the commas in your search. That also means that you'll have to provide an extra comma to search for values at the beginning or end of your text:
select
*
from
YourTable
where
',' || CommaSeparatedValueColumn || ',' LIKE '%,SearchValue,%'
But this query will be slow: with a leading wildcard, LIKE cannot use an ordinary index on the column, so every row has to be scanned.
And there's always a risk: if there are spaces around the values, or if values can themselves contain commas (in which case they are surrounded by quotes, as in CSV files), this query won't work and you'll have to add even more logic, slowing it down further.
A better solution would be to add a child table for these categories, or better still a separate table for the categories and a table that cross-links them to YourTable.
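A minimal sketch of that cross-link (junction) table, using sqlite3 as a stand-in; the table name item_category and the helper functions are hypothetical. The comma-separated lists from the question are split once at load time, after which "any of" and "all of" searches become plain equality tests, and c1 no longer matches c100.

```python
import sqlite3

raw = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_category (item_id INTEGER, category TEXT, "
    "PRIMARY KEY (item_id, category))"
)
# Split each comma-separated list once, at load time, into one row per category.
conn.executemany(
    "INSERT INTO item_category VALUES (?, ?)",
    [(item_id, cat.strip()) for item_id, cats in raw.items() for cat in cats.split(",")],
)

def ids_with_any(categories):
    marks = ",".join("?" * len(categories))
    sql = (f"SELECT DISTINCT item_id FROM item_category "
           f"WHERE category IN ({marks}) ORDER BY item_id")
    return [r[0] for r in conn.execute(sql, categories)]

def ids_with_all(categories):
    marks = ",".join("?" * len(categories))
    sql = (f"SELECT item_id FROM item_category WHERE category IN ({marks}) "
           f"GROUP BY item_id HAVING COUNT(DISTINCT category) = ? ORDER BY item_id")
    return [r[0] for r in conn.execute(sql, [*categories, len(set(categories))])]

print(ids_with_any(["c2"]))        # [2, 3]
print(ids_with_all(["c2", "c3"]))  # [2, 3]
print(ids_with_any(["c2", "c3"]))  # [2, 3, 4]
```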

You can write a PIPELINED table function that returns a one-column table, where each row is one value from the comma-separated string. Use something like this to pop a value from the list and pipe it out as a row:
PIPE ROW(ltrim(rtrim(substr(l_list, 1, l_idx - 1),' '),' '));
Usage (a collection of scalars exposes its elements as COLUMN_VALUE):
SELECT * FROM MyTable
WHERE 'c2' IN (SELECT column_value FROM TABLE(Util_Pkg.split_string(categories)));
See more here: Oracle docs

Yes and No...
"Yes":
Normalize the data (strongly recommended), i.e. split the categories column so that you have each category in a separate... then you can just query it in the normal fashion.
"No":
As long as you keep this "pseudo-structure" there will be several issues (performance and others) and you will have to do something similar to:
SELECT * FROM MyTable WHERE categories LIKE 'c2,%' OR categories = 'c2' OR categories LIKE '%,c2,%' OR categories LIKE '%,c2'
If you absolutely must, you could define a function named FIND_IN_SET like the following:
CREATE OR REPLACE Function FIND_IN_SET
( vSET IN varchar2, vToFind IN VARCHAR2 )
RETURN number
IS
rRESULT number;
BEGIN
rRESULT := -1;
SELECT COUNT(*) INTO rRESULT FROM DUAL WHERE vSET LIKE ( vToFind || ',%' ) OR vSET = vToFind OR vSET LIKE ('%,' || vToFind || ',%') OR vSET LIKE ('%,' || vToFind);
RETURN rRESULT;
END;
/
You can then use that function like:
SELECT * FROM MyTable WHERE FIND_IN_SET (categories, 'c2' ) > 0;

For the sake of future searchers, don't forget the regular expression way:
with tbl as (
select 1 ID, 'c1' CATEGORIES from dual
union
select 2 ID, 'c2,c3' CATEGORIES from dual
union
select 3 ID, 'c3,c2' CATEGORIES from dual
union
select 4 ID, 'c3' CATEGORIES from dual
union
select 5 ID, 'c4,c8,c5,c100' CATEGORIES from dual
)
select *
from tbl
where regexp_like(CATEGORIES, '(^|\W)c3(\W|$)');
ID CATEGORIES
---------- -------------
2 c2,c3
3 c3,c2
4 c3
This matches when c3 is delimited by non-word characters (\W matches anything other than a letter, digit or underscore), so it still works even if the comma is followed by a space. If you want to be stricter and match only where a comma separates values, replace the '\W' with a comma. At any rate, read the regular expression as:
match a group of either the beginning of the line or a non-word character, followed by the target search value, followed by a group of either a non-word character or the end of the line.
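A quick Python check of both variants (Python's \W behaves the same way for this ASCII data). Rows 6 and 7 are hypothetical additions: 6 has a space after the comma, and 7 contains c30, which must not count as c3.

```python
import re

rows = {1: "c1", 2: "c2,c3", 3: "c3,c2", 4: "c3", 5: "c4,c8,c5,c100",
        6: "c2, c3", 7: "c30"}

non_word = re.compile(r"(^|\W)c3(\W|$)")  # the answer's pattern
comma_only = re.compile(r"(^|,)c3(,|$)")  # stricter: only commas separate values

print([i for i, cats in rows.items() if non_word.search(cats)])    # [2, 3, 4, 6]
print([i for i, cats in rows.items() if comma_only.search(cats)])  # [2, 3, 4]
```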

As long as the comma-delimited list is 512 bytes or less, you can also use a regular expression in this instance (the pattern argument of Oracle's regular expression functions, e.g., REGEXP_LIKE(), is limited to 512 bytes, and here the list itself becomes the pattern):
SELECT id, categories
FROM mytable
WHERE REGEXP_LIKE('c2', '^(' || REPLACE(categories, ',', '|') || ')$', 'i');
In the above I'm replacing the commas with the regular expression alternation operator |. If your list of delimited values is already |-delimited, so much the better.
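The same "turn the list into the pattern" trick in Python, for illustration; list_contains is a hypothetical helper, and Python has no 512-byte pattern limit. Like the SQL, it does not escape the list items, so a category containing regex metacharacters would change the meaning of the pattern.

```python
import re

def list_contains(value, csv_list):
    # Mirrors REGEXP_LIKE(value, '^(' || REPLACE(csv_list, ',', '|') || ')$', 'i').
    # List items are NOT escaped, exactly as in the SQL version.
    pattern = "^(" + csv_list.replace(",", "|") + ")$"
    return re.search(pattern, value, re.IGNORECASE) is not None

print(list_contains("c2", "c2,c3"))          # True
print(list_contains("C2", "c3,c2"))          # True (case-insensitive, like 'i')
print(list_contains("c1", "c4,c8,c5,c100"))  # False: c1 is not c100
```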

Searching Technique in SQL (Like,Contain)

I want to compare and select a field from DB using Like keyword or any other technique.
My query is the following:
SELECT * FROM Test WHERE name LIKE '%xxxxxx_Ramakrishnan_zzzzz%';
but my field only contains 'Ramakrishnan'.
My input string contains some extra characters: xxxxxx_Ramakrishnan_zzzzz
I want the SQL query for this. Can anyone please help me?

You mean you want it the other way round? Like this (SQL Server syntax, where + concatenates strings; use || in Oracle or standard SQL)?
Select * from Test where 'xxxxxx_Ramakrishnan_zzzzz' LIKE '%' + name + '%';

You can use the MySQL function LOCATE(substr, str), which returns the position of substr in str (0 if it is absent), like this:
SELECT * FROM Test WHERE LOCATE(name, 'xxxxxx_Ramakrishnan_zzzzz') > 0;

Are the xxxxxx and zzzzz bits always 6 and 5 characters? If so, then this is doable with a bit of string cutting.
with Test (id,name) as (
select 1, 'Ramakrishnan'
union
select 2, 'Coxy'
union
select 3, 'xxxxxx_Ramakrishnan_zzzzz'
)
Select * from Test where name like '%'+SUBSTRING('xxxxxx_Ramakrishnan_zzzzz', 8, CHARINDEX('_',SUBSTRING('xxxxxx_Ramakrishnan_zzzzz',8,100))-1)+'%'
Results in:
id name
1 Ramakrishnan
3 xxxxxx_Ramakrishnan_zzzzz
If they vary in length, you will need an unwieldy combination of the SUBSTRING, CHARINDEX, REVERSE and LEN functions.
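The variable-length case mostly comes down to finding the first and the last underscore, which is what the SUBSTRING/CHARINDEX/REVERSE/LEN combination does in T-SQL. A Python sketch of that logic, assuming the prefix and suffix themselves contain no underscores; middle_segment is a hypothetical helper.

```python
def middle_segment(text, sep="_"):
    # The part between the first and the last separator, whatever the prefix
    # and suffix lengths; None if the text has fewer than two separators.
    first = text.find(sep)
    last = text.rfind(sep)
    if first == -1 or first == last:
        return None
    return text[first + 1:last]

key = middle_segment("xxxxxx_Ramakrishnan_zzzzz")
names = ["Ramakrishnan", "Coxy", "xxxxxx_Ramakrishnan_zzzzz"]
print(key)                             # Ramakrishnan
print([n for n in names if key in n])  # ['Ramakrishnan', 'xxxxxx_Ramakrishnan_zzzzz']
```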
