Oracle SQL: compare strings and find matching sub-strings

I have colon-separated tags associated with two different entities in two tables. I would like to do sub-string matching on the tags to relate the entities.
Table 1 - Table of issues
Issue ---------------- Tag
Issue 1 -------------- Dual UOM:Serial Control:Material Issue
Issue 2 -------------- Validity rule:Effectivity date
Table 2 - Table of Tests
Test ----------------- Tag
Test 1 --------------- Inventory:Outbound:Material Issue
Test 2 --------------- Items:Single UOM
Test 3 --------------- Items:Dual UOM
Test 4 --------------- Recipe:Validity Rule
Test 5 --------------- Formula:Version control:date
Test 6 --------------- Formula:Effectivity date
Now, for each issue in table 1, I need to compare its associated tag with the tags in table 2 and find applicable tests.
In the above example,
Issue 1 - Matched tests will be Test 1, Test 3
Issue 2 - Matched tests will be Test 4, Test 5
All the tags associated to the issues and tests will come from a common tag master.
Any help in providing the sql code snippet that would do this sub-string to sub-string matching is much appreciated.

Here's one option: split issues into rows and compare them to test tags, using the INSTR function. Note that letter case must match. If it doesn't (in reality), use lower or upper function.
Read comments within code (which is split into several parts, to improve readability).
Sample data first:
SQL> with
  -- sample data
  issues (issue, tag) as
    (select 1, 'Dual UOM:Serial Control:Material Issue' from dual union all
     select 2, 'Validity Rule:Effectivity date'         from dual
    ),
  tests (test, tag) as
    (select 1, 'Inventory:Outbound:Material Issue' from dual union all
     select 2, 'Items:Single UOM'                  from dual union all
     select 3, 'Items:Dual UOM'                    from dual union all
     select 4, 'Recipe:Validity Rule'              from dual union all
     select 5, 'Formula:Version control:date'      from dual union all
     select 6, 'Formula:Effectivity date'          from dual
    ),
Split issues into rows (splitiss), then compare them to test tags (temp):
  -- split issues into rows ...
  splitiss as
    (select issue,
            tag,
            regexp_substr(tag, '[^:]+', 1, column_value) val
     from issues cross join
          table(cast(multiset(select level from dual
                              connect by level <= regexp_count(tag, ':') + 1
                             ) as sys.odcinumberlist))
    ),
  -- ... and now compare them to test tags
  temp as
    (select i.issue, i.tag issue_tag, i.val, t.test, t.tag test_tag,
            instr(t.tag, i.val) ins
     from splitiss i cross join tests t
    )
Return the result:
  -- return only test tags which match to separate issues (INS > 0)
  select t.issue,
         t.issue_tag,
         listagg(t.test, ', ') within group (order by t.test) matched_tests
  from temp t
  where t.ins > 0
  group by t.issue, t.issue_tag;
     ISSUE ISSUE_TAG                              MATCHED_TESTS
---------- -------------------------------------- --------------------
         1 Dual UOM:Serial Control:Material Issue 1, 3
         2 Validity Rule:Effectivity date         4, 6
SQL>
P.S. I believe you posted wrong test tags for issue #2; should be 4, 6, not 4, 5.
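One caveat with plain INSTR: it matches anywhere inside the test tag, so a shorter issue tag (say a hypothetical 'UOM') would match both 'Single UOM' and 'Dual UOM'. If whole tags are supposed to match, a possible variation is to wrap both sides in the delimiter before comparing, and to upper-case both sides so that 'Validity rule' and 'Validity Rule' are treated alike. This is only a sketch on the same sample data; the MATCHES subquery name is my own, and the DISTINCT is there so a test is not listed twice when several tags of one issue hit it.

```sql
with
  issues (issue, tag) as
    (select 1, 'Dual UOM:Serial Control:Material Issue' from dual union all
     select 2, 'Validity rule:Effectivity date'         from dual
    ),
  tests (test, tag) as
    (select 1, 'Inventory:Outbound:Material Issue' from dual union all
     select 2, 'Items:Single UOM'                  from dual union all
     select 3, 'Items:Dual UOM'                    from dual union all
     select 4, 'Recipe:Validity Rule'              from dual union all
     select 5, 'Formula:Version control:date'      from dual union all
     select 6, 'Formula:Effectivity date'          from dual
    ),
  -- split issue tags into rows, same technique as above
  splitiss as
    (select issue,
            regexp_substr(tag, '[^:]+', 1, column_value) val
     from issues cross join
          table(cast(multiset(select level from dual
                              connect by level <= regexp_count(tag, ':') + 1
                             ) as sys.odcinumberlist))
    ),
  -- whole-tag, case-insensitive comparison: ':TAG:' must occur in ':TAG1:TAG2:'
  matches as
    (select distinct i.issue, t.test
     from splitiss i join tests t
       on instr(':' || upper(t.tag) || ':', ':' || upper(i.val) || ':') > 0
    )
select issue,
       listagg(test, ', ') within group (order by test) matched_tests
from matches
group by issue;
```

On the sample data this should give the same pairs as before (1, 3 for issue 1 and 4, 6 for issue 2), even with the lower-case 'Validity rule' from the question.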

Thanks, this worked
I did break the tags into rows and then used substr,instr matching.

Related

How can I create a right aligned string with n digits from a number without the character for the sign?

I have integers in the range between 1 and 999 and want to create a right aligned string from them, without a space for the sign.
This is possible, for example with
select
substr(to_char(theNumber, '999'), 2) as number3digits
from
theTable;
The substr(...,2) removes the space that is provided for the sign. I am wondering if there is a still shorter way, without the need to use substr to achieve the same result.
You didn't provide sample data so I'm just wondering what's wrong with pure TO_CHAR?
SQL> with test (col) as
  (select 1   from dual union all
   select 857 from dual union all
   select 34  from dual
  )
  select substr(to_char(col, '999'), 2) result_rn,
         lpad(col, 3, ' ') result_lf
  from test;
RESULT_RN       RESULT_LF
--------------- ---------------
  1               1
857             857
 34              34
SQL>
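To make the extra sign position visible, here is a small sketch that wraps both results in brackets. The column aliases are made up for the illustration; the sample numbers are the same as above.

```sql
with test (col) as
  (select 1   from dual union all
   select 857 from dual union all
   select 34  from dual
  )
select col,
       -- TO_CHAR reserves a leading position for the sign: 4 characters
       '[' || to_char(col, '999') || ']' with_sign_slot,
       -- LPAD pads to exactly 3 characters, no sign position
       '[' || lpad(col, 3, ' ')   || ']' padded
from test;
```

For col = 1 the first expression yields [   1] and the second [  1]. Keep in mind that LPAD silently truncates values longer than the requested width, so this only suits the stated range of 1 to 999.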

How to process a column that holds a comma-separated or range string values in Oracle

Using Oracle 12c DB, I have the following table data example that I need assistance with using SQL and PL/SQL.
Table data is as follows:
Table Name: my_data
ID ITEM ITEM_LOC
------- ----------- ----------------
1 Item-1 0,1
2 Item-2 0,1,2,3,4,7
3 Item-3 0-48
4 Item-4 0,1,2,3,4,5,6,7,8
5 Item-5 1-33
6 Item-6 0,1
7 Item-7 0,1,5,8
Using the data above within the my_data table, what is the best way to process ITEM_LOC? I need to use the values in this column as individual values, i.e.:
0,1 means the SQL needs to return either 0 or 1
or as range values, i.e.:
0-48 means the SQL needs to return a value between 0 and 48.
The returned values for both scenarios should commence from lowest to highest and can't be re-used once processed.
Based on the above, it would be great to have a function that takes the ID and returns an individual value from ITEM_LOC that hasn't been used, based on my description above. This could be a comma-separated string value or a range string value.
Desired result for ID = 2 could be 7. For this ID = 2, ITEM_LOC = 7 could not be used again.
Desired result for ID = 5 could be 31. For this ID = 5, ITEM_LOC = 31 could not be used again.
For the ITEM_LOC data that could not be used again, against that ID, I am looking at holding another table to hold this or perhaps separate all data into separate rows with a new column called VALUE_USED.
This query shows how to extract list of ITEM_LOC values based on whether they are comma-separated (which means "take exactly those values") or dash-separated (which means "find all values between starting and end point"). I modified your sample data a little bit (didn't feel like displaying ~50 values if 5 of them do the job).
lines #1 - 6 represent sample data.
the first select (lines #7 - 15) splits comma-separated values into rows
the second select (lines #17 - 26) uses a hierarchical query which adds 1 to the starting value, up to item's end value.
SQL> with my_data (id, item, item_loc) as
  2    (select 2, 'Item-2', '0,2,4,7' from dual union all
  3     select 7, 'Item-7', '0,1,5'   from dual union all
  4     select 3, 'Item-3', '0-4'     from dual union all
  5     select 8, 'Item-8', '5-8'     from dual
  6    )
  7  select id,
  8         item,
  9         regexp_substr(item_loc, '[^,]+', 1, column_value) loc
 10  from my_data
 11    cross join table(cast(multiset
 12      (select level from dual
 13       connect by level <= regexp_count(item_loc, ',') + 1
 14      ) as sys.odcinumberlist))
 15  where instr(item_loc, '-') = 0
 16  union all
 17  select id,
 18         item,
 19         to_char(to_number(regexp_substr(item_loc, '^\d+')) + column_value - 1) loc
 20  from my_data
 21    cross join table(cast(multiset
 22      (select level from dual
 23       connect by level <= to_number(regexp_substr(item_loc, '\d+$')) -
 24                           to_number(regexp_substr(item_loc, '^\d+')) + 1
 25      ) as sys.odcinumberlist))
 26  where instr(item_loc, '-') > 0
 27  order by id, item, loc;
        ID ITEM   LOC
---------- ------ ----------------------------------------
         2 Item-2 0
         2 Item-2 2
         2 Item-2 4
         2 Item-2 7
         3 Item-3 0
         3 Item-3 1
         3 Item-3 2
         3 Item-3 3
         3 Item-3 4
         7 Item-7 0
         7 Item-7 1
         7 Item-7 5
         8 Item-8 5
         8 Item-8 6
         8 Item-8 7
         8 Item-8 8
16 rows selected.
SQL>
I don't know what you meant by saying that "item_loc could not be used again". Used where? If you use the above query in, for example, cursor FOR loop, then yes - those values would be used only once as every loop iteration fetches next item_loc value.
As others have said, it's a bad idea to store data in this way. You very likely could have input like this, and you likely could need to display the data like this, but you don't have to store the data the way it is input or displayed.
I'm going to store the data as individual LOC elements based on the input. I assume the data contains only integers separated by commas, or pairs of integers separated by a hyphen. Whitespace is ignored. The comma-separated list does not have to be in any order. In pairs, if the left integer is greater than the right integer I return no LOC element.
create table t as
with input(id, item, item_loc) as (
select 1, 'Item-1', ' 0,1' from dual union all
select 2, 'Item-2', '0,1,2,3,4,7' from dual union all
select 3, 'Item-3', '0-48' from dual union all
select 4, 'Item-4', '0,1,2,3,4,5,6,7,8' from dual union all
select 5, 'Item-5', '1-33' from dual union all
select 6, 'Item-6', '0,1' from dual union all
select 7, 'Item-7', '0,1,5,8,7 - 11' from dual
)
select distinct id, item, loc from input, xmltable(
'let $item := if (contains($X,",")) then ora:tokenize($X,"\,") else $X
for $i in $item
let $j := if (contains($i,"-")) then ora:tokenize($i,"\-") else $i
for $k in xs:int($j[1]) to xs:int($j[count($j)])
return $k'
passing item_loc as X
columns loc number path '.'
);
Now to "use" an element I just delete it from the table:
delete from t where rowid = (
select min(rowid) keep (dense_rank first order by loc)
from t
where id = 7
);
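Since the question asked for a function that takes the ID and hands back one unused value, the delete above could be wrapped roughly like this. It is only a sketch: NEXT_LOC is a name I made up, it assumes table T as created above with at most one row per (ID, LOC), and it does nothing special about concurrent sessions.

```sql
-- Hypothetical helper (not part of the answer above): removes the lowest
-- remaining LOC for the given ID from T and returns it, or NULL if none is left.
create or replace function next_loc (p_id in t.id%type)
  return t.loc%type
is
  l_loc t.loc%type;
begin
  delete from t
  where  id  = p_id
  and    loc = (select min(loc) from t where id = p_id)
  returning loc into l_loc;

  -- nothing deleted means every value for this ID has been used already
  if sql%rowcount = 0 then
    return null;
  end if;

  return l_loc;
end next_loc;
/

-- usage from PL/SQL; the caller decides when to commit
declare
  l_loc t.loc%type;
begin
  l_loc := next_loc(7);
  dbms_output.put_line('Next LOC for ID 7: ' || l_loc);
end;
/
```

Because the function performs DML, it has to be called from PL/SQL as shown; calling it inside a SELECT raises ORA-14551.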
To return the data in the same format it was input, use MATCH_RECOGNIZE:
select id, item, listagg(item_loc, ',') within group(order by first_loc) item_loc
from t
match_recognize(
partition by id, item order by loc
measures a.loc first_loc,
a.loc || case count(*) when 1 then null else '-'||b.loc end item_loc
pattern (a b*)
define b as loc = prev(loc) + 1
)
group by id, item;
ID ITEM ITEM_LOC
1 Item-1 0-1
2 Item-2 0-4,7
3 Item-3 0-48
4 Item-4 0-8
5 Item-5 1-33
6 Item-6 0-1
7 Item-7 1,5,7-11
Note that the output here will not be exactly like the input, because any consecutive integers will be compressed into a pair.

SQL query: where any dump value is greater than 96

I recently encountered a couple of values that are causing exceptions in our application. A look into the db revealed that we may have imported erroneous data (which must not be modified!). We have now found that the cause of this error lies in Unicode problems.
To find all relevant error records, I dumped the values I had already identified (manually) and could see that the problematic values lie above the value 96. An example:
Typ=96 Len=10: 83,85,49,89,36,73,219,190,159,87
Here 219,190,159 are problematic. This can be obtained through select dump(col) from table; however, I would like to select only records where one of the values in the dump exceeds 99, pretty much like (pseudo code) length(string(value)) for value in dump_record > 96. Is there any way to do this? Thanks folks.
This is how I understood the question. See if it helps; follow comments within code:
SQL> with
  data (col) as
    -- sample data:
    -- 123LITTlefOOT results in Typ=96 Len=13: 49,50,51,76,73,84,84,108,101,102,79,79,84
    -- where marked values are > 96                                 --- --- ---
    --
    -- BIG997FOOT results in Typ=96 Len=10: 66,73,71,57,57,55,70,79,79,84 which is OK
    (select '123LITTlefOOT' from dual union all
     select 'BIG997FOOT'    from dual
    ),
  test as
    -- DUMP of sample data
    (select col,
            dump(col) dmp
     from data
    ),
  -- remove TYP=XX Len=yy
  test2 as
    (select col,
            dmp,
            trim(substr(dmp, instr(dmp, ':') + 1)) tdmp
     from test
    ),
  -- split TDMP into rows
  trows as
    (select col,
            dmp,
            regexp_substr(tdmp, '[^,]+', 1, column_value) str
     from test2 cross join table(cast(multiset(select level from dual
                                               connect by level <= regexp_count(tdmp, ',') + 1
                                              ) as sys.odcinumberlist))
    )
  select distinct col, dmp
  from trows
  where to_number(str) > 96;
COL           DMP
------------- ------------------------------------------------------------
123LITTlefOOT Typ=1 Len=13: 49,50,51,76,73,84,84,108,101,102,79,79,84
SQL>
Regex totally did it, I used the simple below solution:
SELECT t.* FROM table t WHERE regexp_like(dump(t.error_col), '\d{3}');
Thanks for the quick replies!
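A word of caution on the one-liner: '\d{3}' is applied to the whole DUMP string, so the header of any value that is 100 or more bytes long (e.g. Len=100:) matches too, and so do the ASCII lower-case letters d to z (codes 100 to 122). If the goal is really "bytes outside 7-bit ASCII", a tighter pattern might look like the sketch below. It assumes the default decimal DUMP format; the sample rows and the use of chr(219) to fabricate a high byte are my own.

```sql
with data (col) as
  (select 'BIG997FOOT'         from dual union all   -- plain ASCII
   select '123LITTlefOOT'      from dual union all   -- lower-case ASCII (108,101,102)
   select 'SU1Y$I' || chr(219) from dual             -- ends with byte 219
  )
select col, dump(col) dmp
from data
-- ':.*' skips the Typ/Len header; the alternation covers 128-129, 130-199 and 200-255
where regexp_like(dump(col), ':.*(12[89]|1[3-9]\d|2\d\d)');
```

With these three rows only the last one should be returned, since 108, 101 and 102 no longer qualify.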

How to get all substring occurrences between some characters?

What I'm trying to get is the part of a column's text that lies between certain characters ($$ to be exact). The trick is that those delimiters can occur more than twice (but always in pairs: if there are more than two, the text looks like $$xxx$$ ... $$yyy$$), and I need to get each part separately.
When I try this, it works as long as the pattern occurs only once:
regexp_substr(txt,'\$\$(.*)\$\$',1,1,null,1)
But let's say the column text is: $$xxx$$ ... $$yyy$$
Then it gives me: xxx$$ ... $$yyy
What I need is to get them on separate lines, like:
xxx
yyy
I couldn't get that to work. How can I do it?
You could use a recursive query that matches the first occurrence and then removes that from the string for the next iteration of the recursive query.
Assuming your table and column are called tbl and txt:
with cte(match, txt) as (
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from tbl
where regexp_like(txt,'\$\$(.*?)\$\$')
union all
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from cte
where regexp_like(txt,'\$\$(.*?)\$\$')
)
select match from cte
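The important change compared to the question's attempt is the non-greedy quantifier: (.*) grabs as much as it can up to the last $$, while (.*?) stops at the first closing $$. A quick way to see the difference, using the sample string from the question (the column aliases are just labels for this illustration):

```sql
select regexp_substr('$$xxx$$ ... $$yyy$$', '\$\$(.*)\$\$',  1, 1, null, 1) greedy,
       regexp_substr('$$xxx$$ ... $$yyy$$', '\$\$(.*?)\$\$', 1, 1, null, 1) lazy_first,
       regexp_substr('$$xxx$$ ... $$yyy$$', '\$\$(.*?)\$\$', 1, 2, null, 1) lazy_second
from dual;
```

GREEDY comes back as xxx$$ ... $$yyy, whereas LAZY_FIRST is xxx and LAZY_SECOND is yyy.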
One could also use CONNECT BY to "loop" through the elements surrounded by the double dollar signs, returning the data inside (the 2nd grouping). This method handles NULL elements (ID 7, element 2) and since the dollar signs are consumed as the regex moves from left to right, characters in between the groups are not falsely matched.
SQL> with tbl(id, txt) as (
select 1, '$$xxx$$' from dual union all
select 2, '$$xxx$$ ... $$yyy$$' from dual union all
select 3, '' from dual union all
select 4, '$$xxx$$abc$$yyy$$' from dual union all
select 5, '$$xxx$$ ... $$yyy$$ ... $$www$$ ... $$zzz$$' from dual union all
select 6, '$$aaa$$$$bbb$$$$ccc$$$$ddd$$' from dual union all
select 7, '$$aaa$$$$$$$$ccc$$$$ddd$$' from dual
)
select id, level, regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level,null,2) element
from tbl
connect by regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level) is not null
and prior txt = txt
and prior sys_guid() is not null
order by id, level;
ID LEVEL ELEMENT
---------- ---------- -------------------------------------------
1 1 xxx
2 1 xxx
2 2 yyy
3 1
4 1 xxx
4 2 yyy
5 1 xxx
5 2 yyy
5 3 www
5 4 zzz
6 1 aaa
6 2 bbb
6 3 ccc
6 4 ddd
7 1 aaa
7 2
7 3 ccc
7 4 ddd
18 rows selected.
SQL>

Populate a numbered row until reaching a specific value with another column

I have a table full of account numbers and loan terms (the loan term is in months).
What I need to do is generate numbered rows for each account number, with row numbers less than or equal to the loan term. I've attached a screenshot below:
[Screenshot: Example]
So for this specific example, I will need 48 numbered rows for this account number, as the term is only 48 months.
Thanks for the help!
with
test_data ( account_nmbr, term ) as (
select 'ABC200', 6 from dual union all
select 'DEF100', 8 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select level as row_nmbr, term, account_nmbr
from test_data
connect by level <= term
and prior account_nmbr = account_nmbr
and prior sys_guid() is not null
order by account_nmbr, row_nmbr -- If needed
;
ROW_NMBR TERM ACCOUNT_NMBR
-------- ---------- ------------
1 6 ABC200
2 6 ABC200
3 6 ABC200
4 6 ABC200
5 6 ABC200
6 6 ABC200
1 8 DEF100
2 8 DEF100
3 8 DEF100
4 8 DEF100
5 8 DEF100
6 8 DEF100
7 8 DEF100
8 8 DEF100
In Oracle 12, you can use the LATERAL clause for the same:
with
test_data ( account_nmbr, term ) as (
select 'ABC200', 6 from dual union all
select 'DEF100', 8 from dual
)
-- End of simulated inputs (for testing purposes only, not part of the solution).
-- SQL query begins BELOW THIS LINE.
select l.row_nmbr, t.term, t.account_nmbr
from test_data t,
lateral (select level as row_nmbr from dual connect by level <= term) l
order by account_nmbr, row_nmbr -- If needed
;