How to get all substring occurrences between some characters? - sql

What I'm trying to get is the part of a column's text that sits between certain delimiter characters ($$ to be exact). The catch is that the delimiters can occur more than twice (always in pairs, e.g. $$xxx$$ ... $$yyy$$), and I need each enclosed part separately.
When the pattern occurs only once, this works without a problem:
regexp_substr(txt,'\$\$(.*)\$\$',1,1,null,1)
But let's say the column text is: $$xxx$$ ... $$yyy$$
Then it gives me: xxx$$ ... $$yyy
What I need is to get them on separate lines, like:
xxx
yyy
I couldn't get that to work. How can I do it?

You could use a recursive query that matches the first occurrence (with a non-greedy .*? so the match stops at the nearest closing $$) and then removes that occurrence from the string for the next iteration of the recursive query.
Assuming your table and column are called tbl and txt:
with cte(match, txt) as (
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from tbl
where regexp_like(txt,'\$\$(.*?)\$\$')
union all
select regexp_substr(txt,'\$\$(.*?)\$\$', 1, 1, null, 1),
regexp_replace(txt,'\$\$(.*?)\$\$', '', 1, 1)
from cte
where regexp_like(txt,'\$\$(.*?)\$\$')
)
select match from cte
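The key change from the question's pattern is the non-greedy `.*?`. The difference can be sketched outside the database, here with Python's `re` module as a stand-in for Oracle's regex engine (the function names are illustrative only, and the two engines differ in other details):

```python
import re

def greedy_matches(text):
    # .* is greedy: it runs on to the LAST closing $$ in the text
    return re.findall(r'\$\$(.*)\$\$', text)

def lazy_matches(text):
    # .*? is non-greedy: it stops at the NEAREST closing $$
    return re.findall(r'\$\$(.*?)\$\$', text)

sample = '$$xxx$$ ... $$yyy$$'
print(greedy_matches(sample))  # ['xxx$$ ... $$yyy']
print(lazy_matches(sample))    # ['xxx', 'yyy']
```

The greedy result reproduces the problem from the question; the lazy one yields one element per delimited pair.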

One could also use CONNECT BY to "loop" through the elements surrounded by the double dollar signs, returning the data inside (the 2nd capture group). This method handles NULL elements (ID 7, element 2), and since the dollar signs are consumed as the regex moves from left to right, characters in between the groups are not falsely matched.
SQL> with tbl(id, txt) as (
select 1, '$$xxx$$' from dual union all
select 2, '$$xxx$$ ... $$yyy$$' from dual union all
select 3, '' from dual union all
select 4, '$$xxx$$abc$$yyy$$' from dual union all
select 5, '$$xxx$$ ... $$yyy$$ ... $$www$$ ... $$zzz$$' from dual union all
select 6, '$$aaa$$$$bbb$$$$ccc$$$$ddd$$' from dual union all
select 7, '$$aaa$$$$$$$$ccc$$$$ddd$$' from dual
)
select id, level, regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level,null,2) element
from tbl
connect by regexp_substr(txt,'(\$\$(.*?)\$\$)',1,level) is not null
and prior txt = txt
and prior sys_guid() is not null
order by id, level;
ID LEVEL ELEMENT
---------- ---------- -------------------------------------------
1 1 xxx
2 1 xxx
2 2 yyy
3 1
4 1 xxx
4 2 yyy
5 1 xxx
5 2 yyy
5 3 www
5 4 zzz
6 1 aaa
6 2 bbb
6 3 ccc
6 4 ddd
7 1 aaa
7 2
7 3 ccc
7 4 ddd
18 rows selected.
SQL>

Related

How to return a numeric substring given exceptions?

I need to extract numeric sequences of 5 or 6 digits, excluding those that are immediately preceded by the letter sequences 'CR', 'MRLID_' or 'GEO_'. The first hyphen cannot be used as a search key.
My example:
SELECT REGEXP_SUBSTR('84830-Soc_Dem-Carousel-CR236666',
'([^CR\d{6}]+)|([^MRLID_\d{5}]+)|([GEO_\d{5}]+)\d{5,6}',
1,
1,
'i')
FROM dual
If the input string has the following form:
'McCombo_Mar-Apr11119_mcd_installs;759678/;CR759428-Soc_Dem-Multi_roll_15sec-R27?<MRLID_12345>%GEO_78934?]ysl_fraw_blackopium_display_aw'
the value 759678 needs to be extracted. It is a numeric sequence of 5 or 6 digits that can be located anywhere in the string, so it cannot be picked out by a separator.
Use REGEXP_SUBSTR with a regex that captures a number of 5 to 6 digits that is not preceded by a letter or digit and not followed by a letter, digit, or underscore.
with DATA as (
select '84830-Soc_Dem-Carousel-CR236666' String from dual
union all select 'McCombo_Mar-Apr11119_mcd_installs;759678/;CR759428-Soc_Dem-Multi_roll_15sec-R27?<MRLID_12345>%GEO_78934?]ysl_fraw_blackopium_display_aw' from dual
union all select 'Couriers_75942-Soc_Dem-Multi_roll_15sec-R27_McCombo_Mar-Apr19_mcd_installs' from dual
)
select REGEXP_SUBSTR(String, '(^|[^[:alnum:]])([[:digit:]]{5,6})([^[:alnum:]_]|$)',1,1,'',2) as ID
from DATA;
| ID |
| :----- |
| 84830 |
| 759678 |
| 75942 |
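To see how the three parts of that pattern interact, here is a rough Python equivalent. It is a sketch only: `extract_id` and `ID_PATTERN` are made-up names, and the POSIX classes are written out as ASCII ranges, which is an assumption about the input.

```python
import re

# (start of string or a non-alphanumeric char)
# (5 or 6 digits, captured)
# (a char that is not alphanumeric or underscore, or end of string)
ID_PATTERN = re.compile(r'(?:^|[^0-9A-Za-z])([0-9]{5,6})(?:[^0-9A-Za-z_]|$)')

def extract_id(text):
    """Return the first standalone 5- or 6-digit number, or None."""
    match = ID_PATTERN.search(text)
    return match.group(1) if match else None

print(extract_id('84830-Soc_Dem-Carousel-CR236666'))  # 84830
```

The leading and trailing groups are what reject digits glued to a prefix such as CR or to further digits; an underscore directly before the digits is allowed, which is why 75942 is found in the third sample row.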
Use regexp_replace instead to replace -.* with blank:
select regexp_replace('84830-Soc_Dem-Carousel-CR236666', '-.*','') from dual
How about this?
SQL> with test (col) as
(select 'McCombo_Mar-Apr19_mcd_installs;759678/;CR759428-Soc_Dem-Multi_roll_15sec-R27?<MRLID_12345>%GEO_78934?]ysl_fraw_blackopium_display_aw' from dual)
select val
from (select regexp_substr(col, '[[:alnum:]_]+', 1, level) val
from test
connect by level <= regexp_count(col, '[[:alnum:]_]+')
)
where regexp_like(val, '^\d+$');
VAL
--------------------------------------------------------------------------------
759678
SQL>
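The subquery returns all alphanumeric substrings (that's regexp_substr).
The main query keeps only the values that consist solely of digits (that's regexp_like). It doesn't check the length; to enforce 5 or 6 digits, the pattern could be tightened to '^\d{5,6}$'.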
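The same two-step idea (tokenize, then filter) can be sketched in Python. This is an illustration of the logic, not the Oracle implementation, and `numeric_tokens` is a hypothetical helper name:

```python
import re

def numeric_tokens(text):
    # Step 1: split the text into runs of letters, digits and underscores
    tokens = re.findall(r'[0-9A-Za-z_]+', text)
    # Step 2: keep only the tokens made up entirely of digits
    return [token for token in tokens if re.fullmatch(r'[0-9]+', token)]

print(numeric_tokens('84830-Soc_Dem-Carousel-CR236666'))  # ['84830']
```

Because the underscore counts as part of a token, values such as MRLID_12345 and GEO_78934 stay attached to their prefix and are filtered out in step 2.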

How to modify Column Data in BigQuery Table

I've been searching online, but I can only find how to add, remove, or change a column in a table. Basically, I need to go through an entire column of email addresses in BigQuery and append a second email address to each row.
ID|Name |email
1 |Name1|email1#address.com
2 |Name2|email2#address.com
3 |Name3|email3#address.com
4 |Name4|email4#address.com
5 |Name5|email5#address.com
6 |Name6|email6#address.com
What I am looking for is a script that goes through the column and appends a second email address, separated by a comma, so that it looks like this:
ID|Name |email
1 |Name1|email1#address.com,secondemail#address.com
2 |Name2|email2#address.com,secondemail#address.com
3 |Name3|email3#address.com,secondemail#address.com
4 |Name4|email4#address.com,secondemail#address.com
5 |Name5|email5#address.com,secondemail#address.com
6 |Name6|email6#address.com,secondemail#address.com
All of the existing data should remain intact. Please let me know if this is possible. Note that "secondemail#address.com" is a single fixed address; it doesn't change per user. I just need this format for a business reason.
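The query below is for BigQuery Standard SQL. It returns every column unchanged except email, which is replaced by the concatenated value: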
#standardSQL
SELECT * REPLACE(email || ',secondemail#address.com' AS email)
FROM `project.dataset.table`
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 'Name1' name, 'email1#address.com' email UNION ALL
SELECT 2, 'Name2', 'email2#address.com' UNION ALL
SELECT 3, 'Name3', 'email3#address.com' UNION ALL
SELECT 4, 'Name4', 'email4#address.com' UNION ALL
SELECT 5, 'Name5', 'email5#address.com' UNION ALL
SELECT 6, 'Name6', 'email6#address.com'
)
SELECT * REPLACE(email || ',secondemail#address.com' AS email)
FROM `project.dataset.table`
with output
Row id name email
1 1 Name1 email1#address.com,secondemail#address.com
2 2 Name2 email2#address.com,secondemail#address.com
3 3 Name3 email3#address.com,secondemail#address.com
4 4 Name4 email4#address.com,secondemail#address.com
5 5 Name5 email5#address.com,secondemail#address.com
6 6 Name6 email6#address.com,secondemail#address.com

Oracle SQL compare strings and find matching sub-strings

I have colon-separated tags associated with two different entities in two tables. I would like to do substring matching on the tags to relate the entities.
Table 1 - Table of issues
Issue ---------------- Tag
Issue 1 -------------- Dual UOM:Serial Control:Material Issue
Issue 2 -------------- Validity rule:Effectivity date
Table 2 - Table of Tests
Test ----------------- Tag
Test 1 --------------- Inventory:Outbound:Material Issue
Test 2 --------------- Items:Single UOM
Test 3 --------------- Items:Dual UOM
Test 4 --------------- Recipe:Validity Rule
Test 5 --------------- Formula:Version control:date
Test 6 --------------- Formula:Effectivity date
Now, for each issue in table 1, I need to compare its associated tags with the tags in table 2 and find the applicable tests.
In the example above:
Issue 1 - Matched tests will be Test 1, Test 3
Issue 2 - Matched tests will be Test 4, Test 5
All the tags associated with the issues and tests come from a common tag master.
Any help with a SQL code snippet that does this substring-to-substring matching is much appreciated.
Here's one option: split issues into rows and compare them to test tags, using the INSTR function. Note that letter case must match. If it doesn't (in reality), use lower or upper function.
Read comments within code (which is split into several parts, to improve readability).
Sample data first:
SQL> with
-- sample data
issues (issue, tag) as
(select 1, 'Dual UOM:Serial Control:Material Issue' from dual union all
select 2, 'Validity Rule:Effectivity date' from dual
),
tests (test, tag) as
(select 1, 'Inventory:Outbound:Material Issue' from dual union all
select 2, 'Items:Single UOM' from dual union all
select 3, 'Items:Dual UOM' from dual union all
select 4, 'Recipe:Validity Rule' from dual union all
select 5, 'Formula:Version control:date' from dual union all
select 6, 'Formula:Effectivity date' from dual
),
Split issues into rows (splitiss), compare them to test tags (temp):
-- split issues into rows ...
splitiss as
(select issue,
tag,
regexp_substr(tag, '[^:]+', 1, column_value) val
from issues cross join
table(cast(multiset(select level from dual
connect by level <= regexp_count(tag, ':') + 1
) as sys.odcinumberlist))
),
-- ... and now compare them to test tags
temp as
(select i.issue, i.tag issue_tag, i.val, t.test, t.tag test_tag,
instr(t.tag, i.val) ins
from splitiss i cross join tests t
)
Return the result:
-- return only test tags which match to separate issues (INS > 0)
select t.issue,
t.issue_tag,
listagg(t.test, ', ') within group (order by t.test) matched_tests
from temp t
where t.ins > 0
group by t.issue, t.issue_tag;
ISSUE ISSUE_TAG MATCHED_TESTS
---------- -------------------------------------- --------------------
1 Dual UOM:Serial Control:Material Issue 1, 3
2 Validity Rule:Effectivity date 4, 6
SQL>
P.S. I believe you posted wrong test tags for issue #2; should be 4, 6, not 4, 5.
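The split-then-INSTR logic can be checked quickly outside the database. The following Python sketch mirrors it under the same case-sensitive assumption; `TESTS` and `matched_tests` are illustrative names, not part of the answer's SQL:

```python
# Sample test tags from the question, keyed by test number
TESTS = {
    1: 'Inventory:Outbound:Material Issue',
    2: 'Items:Single UOM',
    3: 'Items:Dual UOM',
    4: 'Recipe:Validity Rule',
    5: 'Formula:Version control:date',
    6: 'Formula:Effectivity date',
}

def matched_tests(issue_tag, tests):
    # Split the issue's tag on ':' (the splitiss step) ...
    parts = issue_tag.split(':')
    # ... and keep tests whose tag contains any part (the INSTR > 0 step)
    return sorted(test_id for test_id, test_tag in tests.items()
                  if any(part in test_tag for part in parts))

print(matched_tests('Dual UOM:Serial Control:Material Issue', TESTS))  # [1, 3]
print(matched_tests('Validity Rule:Effectivity date', TESTS))          # [4, 6]
```

One difference from the SQL: this sketch lists each test once, whereas the LISTAGG version would repeat a test number if two parts of the same issue tag both matched it.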
Thanks, this worked
I did break the tags into rows and then used substr,instr matching.

How to process a column that holds a comma-separated or range string values in Oracle

Using Oracle 12c DB, I have the following table data example that I need assistance with using SQL and PL/SQL.
Table data is as follows:
Table Name: my_data
ID ITEM ITEM_LOC
------- ----------- ----------------
1 Item-1 0,1
2 Item-2 0,1,2,3,4,7
3 Item-3 0-48
4 Item-4 0,1,2,3,4,5,6,7,8
5 Item-5 1-33
6 Item-6 0,1
7 Item-7 0,1,5,8
Given the data above in the my_data table, what is the best way to process ITEM_LOC? I need to use the values in this column as individual values, i.e.:
0,1 means the SQL needs to return either 0 or 1, or
range values, i.e.:
0-48 means the SQL needs to return a value between 0 and 48.
The returned values for both scenarios should be handed out from lowest to highest, and a value can't be reused once processed.
Based on the above, it would be great to have a function that takes the ID and returns an individual value from ITEM_LOC that hasn't been used, based on my description above. This could be a comma-separated string value or a range string value.
Desired result for ID = 2 could be 7. For this ID = 2, ITEM_LOC = 7 could not be used again.
Desired result for ID = 5 could be 31. For this ID = 5, ITEM_LOC = 31 could not be used again.
For the ITEM_LOC data that could not be used again, against that ID, I am looking at holding another table to hold this or perhaps separate all data into separate rows with a new column called VALUE_USED.
This query shows how to extract the list of ITEM_LOC values based on whether they are comma-separated (which means "take exactly those values") or dash-separated (which means "find all values between the start and end points"). I modified your sample data a little bit (I didn't feel like displaying ~50 values if 5 of them do the job).
Lines #1 - 6 represent sample data.
The first select (lines #7 - 15) splits comma-separated values into rows.
The second select (lines #17 - 26) uses a hierarchical query which adds 1 to the starting value, up to the item's end value.
SQL> with my_data (id, item, item_loc) as
2 (select 2, 'Item-2', '0,2,4,7' from dual union all
3 select 7, 'Item-7', '0,1,5' from dual union all
4 select 3, 'Item-3', '0-4' from dual union all
5 select 8, 'Item-8', '5-8' from dual
6 )
7 select id,
8 item,
9 regexp_substr(item_loc, '[^,]+', 1, column_value) loc
10 from my_data
11 cross join table(cast(multiset
12 (select level from dual
13 connect by level <= regexp_count(item_loc, ',') + 1
14 ) as sys.odcinumberlist))
15 where instr(item_loc, '-') = 0
16 union all
17 select id,
18 item,
19 to_char(to_number(regexp_substr(item_loc, '^\d+')) + column_value - 1) loc
20 from my_data
21 cross join table(cast(multiset
22 (select level from dual
23 connect by level <= to_number(regexp_substr(item_loc, '\d+$')) -
24 to_number(regexp_substr(item_loc, '^\d+')) + 1
25 ) as sys.odcinumberlist))
26 where instr(item_loc, '-') > 0
27 order by id, item, loc;
ID ITEM LOC
---------- ------ ----------------------------------------
2 Item-2 0
2 Item-2 2
2 Item-2 4
2 Item-2 7
3 Item-3 0
3 Item-3 1
3 Item-3 2
3 Item-3 3
3 Item-3 4
7 Item-7 0
7 Item-7 1
7 Item-7 5
8 Item-8 5
8 Item-8 6
8 Item-8 7
8 Item-8 8
16 rows selected.
SQL>
I don't know what you meant by saying that "item_loc could not be used again". Used where? If you use the above query in, for example, cursor FOR loop, then yes - those values would be used only once as every loop iteration fetches next item_loc value.
As others have said, it's a bad idea to store data in this way. You very likely could have input like this, and you likely could need to display the data like this, but you don't have to store the data the way it is input or displayed.
I'm going to store the data as individual LOC elements based on the input. I assume the data contains only integers separated by commas, or pairs of integers separated by a hyphen. Whitespace is ignored. The comma-separated list does not have to be in any order. In pairs, if the left integer is greater than the right integer I return no LOC element.
create table t as
with input(id, item, item_loc) as (
select 1, 'Item-1', ' 0,1' from dual union all
select 2, 'Item-2', '0,1,2,3,4,7' from dual union all
select 3, 'Item-3', '0-48' from dual union all
select 4, 'Item-4', '0,1,2,3,4,5,6,7,8' from dual union all
select 5, 'Item-5', '1-33' from dual union all
select 6, 'Item-6', '0,1' from dual union all
select 7, 'Item-7', '0,1,5,8,7 - 11' from dual
)
select distinct id, item, loc from input, xmltable(
'let $item := if (contains($X,",")) then ora:tokenize($X,"\,") else $X
for $i in $item
let $j := if (contains($i,"-")) then ora:tokenize($i,"\-") else $i
for $k in xs:int($j[1]) to xs:int($j[count($j)])
return $k'
passing item_loc as X
columns loc number path '.'
);
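The parsing rules stated above (comma-separated integers or hyphenated pairs, whitespace ignored, nothing returned when the left bound exceeds the right, duplicates removed) can be sketched in Python as follows. `expand_item_loc` is a hypothetical helper, and the sketch assumes well-formed input just as the answer does:

```python
def expand_item_loc(item_loc):
    """Expand a string like '0,1,5,8,7 - 11' into sorted distinct integers."""
    values = set()
    for token in item_loc.split(','):
        if '-' in token:
            # A pair "low-high"; int() tolerates surrounding whitespace
            low, high = (int(part) for part in token.split('-'))
            values.update(range(low, high + 1))  # empty when low > high
        else:
            values.add(int(token))
    return sorted(values)

print(expand_item_loc('0,1,5,8,7 - 11'))  # [0, 1, 5, 7, 8, 9, 10, 11]
```

Using a set plays the role of `select distinct` in the query: the 8 that appears both on its own and inside 7 - 11 is stored once.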
Now to "use" an element I just delete it from the table:
delete from t where rowid = (
select min(rowid) keep (dense_rank first order by loc)
from t
where id = 7
);
To return the data in the same format it was input, use MATCH_RECOGNIZE:
select id, item, listagg(item_loc, ',') within group(order by first_loc) item_loc
from t
match_recognize(
partition by id, item order by loc
measures a.loc first_loc,
a.loc || case count(*) when 1 then null else '-'||b.loc end item_loc
pattern (a b*)
define b as loc = prev(loc) + 1
)
group by id, item;
ID ITEM ITEM_LOC
1 Item-1 0-1
2 Item-2 0-4,7
3 Item-3 0-48
4 Item-4 0-8
5 Item-5 1-33
6 Item-6 0-1
7 Item-7 1,5,7-11
Note that the output here will not be exactly like the input, because any consecutive integers will be compressed into a pair.
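The compression that MATCH_RECOGNIZE performs here (a run starts at one value and is extended while each next value is exactly one higher) can be illustrated with a short Python sketch; `compress_locs` is an invented name for illustration:

```python
def compress_locs(locs):
    """Collapse integers into 'a-b' runs, e.g. [0,1,2,3,4,7] -> '0-4,7'."""
    runs = []
    for loc in sorted(set(locs)):
        if runs and loc == runs[-1][1] + 1:
            runs[-1][1] = loc        # consecutive: extend the current run
        else:
            runs.append([loc, loc])  # gap: start a new run
    return ','.join(str(first) if first == last else f'{first}-{last}'
                    for first, last in runs)

print(compress_locs([0, 1, 2, 3, 4, 7]))        # 0-4,7
print(compress_locs([1, 5, 7, 8, 9, 10, 11]))   # 1,5,7-11
```

As in the query output, two consecutive values such as 0 and 1 come back as the pair 0-1 rather than as 0,1.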

SQL to distinct part of the string - Oracle SQL

I have a table, table1, with a column line of type CLOB.
Here are the values:
seq line
------------------------------
1 ISA*00*TEST
ISA*00*TEST1
GS*123GG*TEST*456:EHE
ST*ERT*RFR*EDRR*EER
GS*123GG*TEST*456:EHE
-------------------------------
2 ISA*01*TEST
GS*124GG*TEST*456:EHE
GS*125GG*TEST*456:EHE
ST*ERQ*RFR*EDRR*EER
ST*ERW*RFR*EDRR*EER
ST*ERR*RFR*EDRR*EER
I am trying to find the distinct values of the substring before the second star, with a count for each.
The output would be:
distinct_line_value count
ISA*00 2
GS*123GG 2
ST*ERT 1
ISA*01 1
GS*124GG 1
GS*125GG 1
ST*ERQ 1
ST*ERW 1
ST*ERR 1
Any ideas how I can get the distinct values based on the text up to the second star?
Here's one option:
Test case:
SQL> select * from test;
SEQ LINE
---------- --------------------------------------------------
1 ISA*00*TEST
ISA*00*TEST1
GS*123GG*TEST*456:EHE
ST*ERT*RFR*EDRR*EER
GS*123GG*TEST
2 ISA*01*TEST
GS*124GG*TEST*456:EHE
GS*125GG*TEST*456:EHE
ST*ERQ*RFR*EDRR*EER
ST*E
Query (see the comments within the code). REGEXP_SUBSTR is crucial here, along with its 'm' match parameter, which treats the input string as multiple lines. Note that the result keeps the second star; subtract 1 from the INSTR result to drop it:
SQL> with
-- split CLOB values to rows
inter as
(select seq,
regexp_substr(line, '^.*$', 1, column_value, 'm') res
from test,
table(cast(multiset(select level from dual
connect by level <= regexp_count(line, chr(10)) + 1
) as sys.odcinumberlist))
),
-- convert CLOB to VARCHAR2 (so that SUBSTR works)
inter2 as
(select to_char(res) res from inter)
-- the final result
select substr(res, 1, instr(res, '*', 1, 2)) val, count(*)
from inter2
group by substr(res, 1, instr(res, '*', 1, 2))
order by 1;
VAL COUNT(*)
-------------------------------------------------- ----------
GS*123GG* 2
GS*124GG* 1
GS*125GG* 1
ISA*00* 2
ISA*01* 1
ST*ERQ* 1
ST*ERR* 1
ST*ERT* 1
ST*ERW* 1
9 rows selected.
SQL>
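For comparison, the same split-and-count logic can be sketched in Python. This is not the Oracle implementation: `prefix_through_second_star` and `count_prefixes` are hypothetical names, and the sample data is taken from the question rather than from the test table above.

```python
from collections import Counter

def prefix_through_second_star(line):
    # Position of the second '*', mirroring INSTR(res, '*', 1, 2)
    first = line.find('*')
    second = line.find('*', first + 1) if first != -1 else -1
    # Keep the text up to and including the second star, like the SQL above
    return line[:second + 1] if second != -1 else None

def count_prefixes(clobs):
    counts = Counter()
    for clob in clobs:
        for line in clob.splitlines():  # one row per line of the CLOB
            prefix = prefix_through_second_star(line.strip())
            if prefix is not None:      # skip lines without two stars
                counts[prefix] += 1
    return counts

clobs = [
    'ISA*00*TEST\nISA*00*TEST1\nGS*123GG*TEST*456:EHE\n'
    'ST*ERT*RFR*EDRR*EER\nGS*123GG*TEST*456:EHE',
    'ISA*01*TEST\nGS*124GG*TEST*456:EHE\nGS*125GG*TEST*456:EHE\n'
    'ST*ERQ*RFR*EDRR*EER\nST*ERW*RFR*EDRR*EER\nST*ERR*RFR*EDRR*EER',
]
for prefix, count in sorted(count_prefixes(clobs).items()):
    print(prefix, count)
```

Lines with fewer than two stars are skipped in this sketch; in the SQL version INSTR returns 0 for them, so they would fall into a NULL group instead.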