Extract delimited data from a column into multiple rows with SQLite [duplicate]


I'm struggling to convert
a | a1,a2,a3
b | b1,b3
c | c2,c1
to:
a | a1
a | a2
a | a3
b | b1
b | b2
c | c2
c | c1
Here are data in sql format:
CREATE TABLE data(
"one" TEXT,
"many" TEXT
);
INSERT INTO "data" VALUES('a','a1,a2,a3');
INSERT INTO "data" VALUES('b','b1,b3');
INSERT INTO "data" VALUES('c','c2,c1');
The solution is probably recursive Common Table Expression.
Here's an example which does something similar to a single row:
WITH RECURSIVE list( element, remainder ) AS (
    SELECT NULL AS element, '1,2,3,4,5' AS remainder
    UNION ALL
    SELECT
        CASE
            WHEN INSTR( remainder, ',' )>0 THEN
                SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
            ELSE
                remainder
        END AS element,
        CASE
            WHEN INSTR( remainder, ',' )>0 THEN
                SUBSTR( remainder, INSTR( remainder, ',' )+1 )
            ELSE
                NULL
        END AS remainder
    FROM list
    WHERE remainder IS NOT NULL
)
SELECT * FROM list;
(originally from this blog post: https://blog.expensify.com/2015/09/25/the-simplest-sqlite-common-table-expression-tutorial)
It produces:
element | remainder
-------------------
NULL | 1,2,3,4,5
1 | 2,3,4,5
2 | 3,4,5
3 | 4,5
4 | 5
5 | NULL
The problem is thus to apply this to each row in a table.

Yes, a recursive common table expression is the solution:
with x(one, firstone, rest) as (
    select one,
           substr(many, 1, instr(many, ',') - 1) as firstone,
           substr(many, instr(many, ',') + 1) as rest
    from data
    where many like '%,%'
    UNION ALL
    select one,
           substr(rest, 1, instr(rest, ',') - 1) as firstone,
           substr(rest, instr(rest, ',') + 1) as rest
    from x
    where rest like '%,%'
    LIMIT 200
)
select one, firstone from x
UNION ALL
select one, rest from x where rest not like '%,%'
ORDER by one;
Output:
a|a1
a|a2
a|a3
b|b1
b|b3
c|c2
c|c1
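
One caveat worth checking before relying on this query: its base case only selects rows whose many column contains a comma, so a row holding a single value never enters the CTE at all. The sketch below runs the same approach through Python's sqlite3 module against the question's data plus one hypothetical single-value row ('d', 'd1') that is not in the original post; the LIMIT safeguard is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data("one" TEXT, "many" TEXT);
    INSERT INTO data VALUES ('a', 'a1,a2,a3');
    INSERT INTO data VALUES ('b', 'b1,b3');
    INSERT INTO data VALUES ('c', 'c2,c1');
    INSERT INTO data VALUES ('d', 'd1');  -- hypothetical row without a comma
""")

# Same shape as the answer's query, with single-quoted string literals.
query = """
WITH x(one, firstone, rest) AS (
    SELECT one,
           substr(many, 1, instr(many, ',') - 1),
           substr(many, instr(many, ',') + 1)
    FROM data
    WHERE many LIKE '%,%'
    UNION ALL
    SELECT one,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM x
    WHERE rest LIKE '%,%'
)
SELECT one, firstone FROM x
UNION ALL
SELECT one, rest FROM x WHERE rest NOT LIKE '%,%'
ORDER BY one
"""

rows = conn.execute(query).fetchall()
for one, value in rows:
    print(one, value)

# The single-value row is silently dropped by this approach.
missing = [r for r in rows if r[0] == "d"]
```

If single-value rows can occur in your table, appending the delimiter before splitting (as the next answer does) avoids the gap.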

Check my answer in How to split comma-separated value in SQLite?.
This will give you the transformation in a single query rather than having to apply to each row.
-- using your data table, assuming that b3 is supposed to be b2
WITH split(one, many, str) AS (
    SELECT one, '', many||',' FROM data
    UNION ALL
    SELECT one,
           substr(str, 0, instr(str, ',')),
           substr(str, instr(str, ',')+1)
    FROM split WHERE str != ''
)
SELECT one, many FROM split WHERE many != '' ORDER BY one;
a|a1
a|a2
a|a3
b|b1
b|b2
c|c2
c|c1
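
As a quick way to try this query, here is a minimal sketch that runs it through Python's sqlite3 module. It loads the data exactly as posted in the question, so the second b row comes back as b3 rather than b2. The extra row ('d', 'd1') is a hypothetical addition to show that appending the delimiter lets single-value rows through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data("one" TEXT, "many" TEXT);
    INSERT INTO data VALUES ('a', 'a1,a2,a3');
    INSERT INTO data VALUES ('b', 'b1,b3');
    INSERT INTO data VALUES ('c', 'c2,c1');
    INSERT INTO data VALUES ('d', 'd1');  -- hypothetical single-value row
""")

# substr(str, 0, n) in SQLite returns the first n-1 characters,
# which is what strips the comma from each extracted element.
query = """
WITH split(one, many, str) AS (
    SELECT one, '', many || ',' FROM data
    UNION ALL
    SELECT one,
           substr(str, 0, instr(str, ',')),
           substr(str, instr(str, ',') + 1)
    FROM split
    WHERE str != ''
)
SELECT one, many FROM split WHERE many != '' ORDER BY one
"""

split_rows = conn.execute(query).fetchall()
for one, value in split_rows:
    print(f"{one}|{value}")
```

The order of values within one group is not guaranteed by ORDER BY one alone; add a position counter to the CTE if the original order matters.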

Related

REGEXP to validate a specific number

How can I search for a specific number in an array using REGEXP?
I have an array and need to verify if it has a specific number.
Ex: [5,2,1,4,6,19] and I am looking for number 1, but just the number 1 and not any number that contain the digit 1.
I had to do this:
case when REGEXP_INSTR(JSON_QUERY(MY_JSON_COLUMN,'$.path') , '[[]{1}[1][,]')<>0
or REGEXP_INSTR(JSON_QUERY(MY_JSON_COLUMN,'$.path') , '[,]{1}[1][,]{1}')<>0
or REGEXP_INSTR(JSON_QUERY(MY_JSON_COLUMN,'$.path') , '[,]{1}[1][]]')<>0
or REGEXP_INSTR(JSON_QUERY(MY_JSON_COLUMN,'$.path') , '[[]{1}[1][]]') <>0
then 'DIGIT_ONE' else 'NO_DIGIT_ONE'
end
Is there anything simpler?

You can use
(^|\D)1(\D|$)
This will search for a 1 that is not adjacent to any other digit.
Details
(^|\D) - start of string or non-digit
1 - a 1 char
(\D|$) - non-digit or end of string.
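
The pattern can be tried outside the database. This is a small sketch using Python's re module, which supports the same \D class; the helper name has_one and the sample strings are only for illustration.

```python
import re

# Pattern from the answer: a 1 with no digit directly before or after it.
ONE = re.compile(r"(^|\D)1(\D|$)")


def has_one(text: str) -> bool:
    """Return True if text contains the number 1 as a standalone value."""
    return ONE.search(text) is not None


samples = ["[5,2,1,4,6,19]", "[5,2,4,6,19]", "[11]", "[1]", "[1,11]"]
results = {s: has_one(s) for s in samples}
for text, found in results.items():
    print(text, "DIGIT_ONE" if found else "NO_DIGIT_ONE")
```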

Do NOT use regular expressions, use a proper JSON parser and then filter for the number you want:
SELECT my_json_column,
       CASE
         WHEN JSON_EXISTS( my_json_column, '$?(@.path[*] == 1)' )
         THEN 'DIGIT ONE'
         ELSE 'NO DIGIT ONE'
       END AS has_one
FROM table_name;
or (if you are using Oracle 12.1 and cannot use path filter expressions with JSON_EXISTS, which are only available from Oracle 12.2):
SELECT my_json_column,
       CASE
         WHEN EXISTS(
           SELECT 'X'
           FROM JSON_TABLE(
                  t.my_json_column,
                  '$.path[*]'
                  COLUMNS (
                    value NUMBER PATH '$'
                  )
                )
           WHERE value = 1
         )
         THEN 'DIGIT ONE'
         ELSE 'NO DIGIT ONE'
       END AS has_one
FROM table_name t;
Which, for the sample data:
CREATE TABLE table_name (
my_json_column CHECK ( my_json_column IS JSON )
) AS
SELECT '{"path":[5,2,1,4,6,19],"not_this_path":[1,2,3,4,5]}' FROM DUAL UNION ALL
SELECT '{"path":[5,2,4,6,19],"not_this_path":[1,2,3,4,5]}' FROM DUAL UNION ALL
SELECT '{"path":[11],"not_this_path":[1]}' FROM DUAL UNION ALL
SELECT '{"path":[2],"not_this_path":[1]}' FROM DUAL UNION ALL
SELECT '{"path":[1,11]}' FROM DUAL;
Both output:
MY_JSON_COLUMN | HAS_ONE
:-------------------------------------------------- | :-----------
{"path":[5,2,1,4,6,19],"not_this_path":[1,2,3,4,5]} | DIGIT ONE
{"path":[5,2,4,6,19],"not_this_path":[1,2,3,4,5]} | NO DIGIT ONE
{"path":[11],"not_this_path":[1]} | NO DIGIT ONE
{"path":[2],"not_this_path":[1]} | NO DIGIT ONE
{"path":[1,11]} | DIGIT ONE
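
The same "parse, then filter" idea can be sketched in application code. The example below uses Python's json module on the sample documents from this answer; the function name has_one and its parameters are hypothetical.

```python
import json


def has_one(raw: str, key: str = "path", wanted: int = 1) -> bool:
    """Parse the JSON document and test membership in the array under `key`."""
    values = json.loads(raw).get(key, [])
    return wanted in values


documents = [
    '{"path":[5,2,1,4,6,19],"not_this_path":[1,2,3,4,5]}',
    '{"path":[5,2,4,6,19],"not_this_path":[1,2,3,4,5]}',
    '{"path":[11],"not_this_path":[1]}',
    '{"path":[2],"not_this_path":[1]}',
    '{"path":[1,11]}',
]
labels = ["DIGIT ONE" if has_one(d) else "NO DIGIT ONE" for d in documents]
for doc, label in zip(documents, labels):
    print(doc, "|", label)
```

Because the document is parsed first, a 1 under not_this_path or inside 11 cannot produce a false match.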

Alternatively, with a little bit more typing (a little bit? Am I kidding?!), splitting the string into rows and comparing values to the search string:
SQL> with test (col) as
2 (select '[5,2,1,4,6,19]' from dual)
3 select t.col,
4 case when '&par_search_string' in
5 (select regexp_substr(substr(col, 2, length(col) - 2), '[^,]+', 1, level) val
6 from test
7 connect by level <= regexp_count(col, ',') + 1
8 )
9 then 'Search string exists'
10 else 'Search string does not exist'
11 end result
12 from test t;
Enter value for par_search_string: 1
COL RESULT
-------------- ----------------------------
[5,2,1,4,6,19] Search string exists
SQL> /
Enter value for par_search_string: 24
COL RESULT
-------------- ----------------------------
[5,2,1,4,6,19] Search string does not exist
SQL>

SQL How to perform multiple look-ups from a list, in one query

We have a weird database table (wt) for which I can construct a query that can return a single row with these fields:
wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9
Then, there's another lookup table (dt) that holds descriptions for these numbers, you could imagine it like this:
id desc
5 "flour"
12 "cups"
9 "barley"
What I need to end up with is the numbers from wt, each along with its description from dt.
I can do three simple queries, one to look up each of my three thing_ values (select desc from dt where id = ), but I was hoping to do it all in one query.
Is there a way to do this?
Even better, is there a way to write my query so that it gets my single row of thing ids and combines them with their descriptions? I think the fundamental challenge is that my thing ids are not one per row; they come back as fields in a single row, which makes it hard to join against them.
Michael

You seem to want conditional aggregation:
select
    max(case when id = 5 then descr end) descr_5,
    max(case when id = 12 then descr end) descr_12,
    max(case when id = 9 then descr end) descr_9
from dt
where id in (5, 12, 9)
Note that desc is a SQL keyword, hence a poor choice for a column name. I renamed it descr in the query.
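
To see conditional aggregation in action, here is a minimal sketch using Python's sqlite3 module. The dt table layout and the descr column name follow this answer's renaming and are assumptions about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dt (id INTEGER PRIMARY KEY, descr TEXT);
    INSERT INTO dt VALUES (5, 'flour'), (12, 'cups'), (9, 'barley');
""")

# Each CASE yields NULL for non-matching rows, and max() ignores NULLs,
# so every output column collapses to the one matching description.
pivot = conn.execute("""
    SELECT max(CASE WHEN id = 5  THEN descr END) AS descr_5,
           max(CASE WHEN id = 12 THEN descr END) AS descr_12,
           max(CASE WHEN id = 9  THEN descr END) AS descr_9
    FROM dt
    WHERE id IN (5, 12, 9)
""").fetchone()
print(pivot)
```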

You will need multiple joins to the dt table to get the description of each of the "things" you want in a single row:
SELECT thing_a_id, dta.desc AS thing_a_desc,
thing_b_id, dtb.desc AS thing_b_desc,
thing_c_id, dtc.desc AS thing_c_desc
FROM wt
JOIN dt dta ON dta.id = wt.thing_a_id
JOIN dt dtb ON dtb.id = wt.thing_b_id
JOIN dt dtc ON dtc.id = wt.thing_c_id
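
A runnable sketch of the triple self-join, using Python's sqlite3 module. The table definitions are hypothetical stand-ins for the real wt and dt tables, and the description column is named descr here to avoid quoting the reserved word desc.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dt (id INTEGER PRIMARY KEY, descr TEXT);
    INSERT INTO dt VALUES (5, 'flour'), (12, 'cups'), (9, 'barley');
    CREATE TABLE wt (thing_a_id INTEGER, thing_b_id INTEGER, thing_c_id INTEGER);
    INSERT INTO wt VALUES (5, 12, 9);
""")

# One alias of dt per id column: each join resolves one description.
joined = conn.execute("""
    SELECT wt.thing_a_id, dta.descr AS thing_a_desc,
           wt.thing_b_id, dtb.descr AS thing_b_desc,
           wt.thing_c_id, dtc.descr AS thing_c_desc
    FROM wt
    JOIN dt dta ON dta.id = wt.thing_a_id
    JOIN dt dtb ON dtb.id = wt.thing_b_id
    JOIN dt dtc ON dtc.id = wt.thing_c_id
""").fetchone()
print(joined)
```

With inner joins, a wt row disappears if any of its three ids has no match in dt; LEFT JOIN keeps the row and returns NULL for the missing description.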

I love to play with common table expressions (CTE), this is an ideal candidate for one.
In the example below, descriptions and dataset are substitutes for the actual tables you use. I am just building them in memory rather than as actual tables.
In the "breakdown" CTE I am splitting up the CSV value from dataset into multiple rows.
In the last part of the select I am converting everything after the = sign to a number, and then matching that on id from the descriptions CTE. The resulting dataset is I believe what you requested.
WITH
descriptions AS
(SELECT 5 AS id, 'flour' AS description FROM DUAL
UNION ALL
SELECT 12 AS id, 'cups' AS description FROM DUAL
UNION ALL
SELECT 9 AS id, 'barley' AS description FROM DUAL),
dataset AS
(SELECT 'wt.thing_a_id = 5, wt.thing_b_id = 12, wt.thing_c_id = 9' AS result FROM DUAL),
breakdown ( result, REMAINDER ) AS
(SELECT TRIM( SUBSTR( result
, 1
, INSTR( result || ',', ',' ) - 1 ) ) AS result
, TRIM( SUBSTR( result, INSTR( result || ',', ',' ) + 1 ) || ',' ) AS REMAINDER
FROM dataset
UNION ALL
SELECT TRIM( SUBSTR( REMAINDER
, 1
, INSTR( REMAINDER, ',' ) - 1 ) )
, SUBSTR( REMAINDER, INSTR( REMAINDER || ',', ',' ) + 1 ) AS REMAINDER
FROM breakdown
WHERE REMAINDER IS NOT NULL)
SELECT result, TO_NUMBER( TRIM( SUBSTR( result, INSTR( result, '=' ) + 1 ) ) ) AS id, description
FROM breakdown
LEFT OUTER JOIN descriptions
ON TO_NUMBER( TRIM( SUBSTR( breakdown.result, INSTR( breakdown.result, '=' ) + 1 ) ) ) =
descriptions.id
Results:
RESULT              ID  DESCRIPTION
wt.thing_a_id = 5    5  flour
wt.thing_b_id = 12  12  cups
wt.thing_c_id = 9    9  barley

How to remove duplicated values from attribute

I have a column with duplicated values in a single cell. How can I remove the duplicated values using only SQL or PL/SQL?
| Test
-+--------------------------------------------------------------------
| 999999999(10145) 999999999(10145) 999999999(10145) 999999999(10145)
|--------------------------------------------------------------------
| 113307425(2) 310122174(2) 310122174(2) 113307425(2)

Use a regular expression with a back-reference to match the repeating terms:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
Query:
SELECT REGEXP_REPLACE( value, '([^ ]+)( \1)+', '\1' ) AS replaced_value
FROM test_data
Output:
| REPLACED_VALUE |
| :------------- |
| 9999999(12345) |
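
The back-reference pattern only collapses duplicates that sit next to each other. The sketch below reproduces it with Python's re module (an assumption for illustration; Oracle's REGEXP_REPLACE uses the same \1 syntax here) and shows what happens when a duplicate is separated by another term.

```python
import re

# Same pattern as the query: a term followed by one or more copies of itself.
ADJACENT_DUP = re.compile(r"([^ ]+)( \1)+")


def collapse_adjacent(value: str) -> str:
    """Replace each run of identical adjacent terms with a single term."""
    return ADJACENT_DUP.sub(r"\1", value)


all_same = collapse_adjacent(
    "9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)"
)
mixed = collapse_adjacent("113307425(2) 310122174(2) 310122174(2) 113307425(2)")
print(all_same)
print(mixed)  # the trailing 113307425(2) survives: it is not adjacent to its twin
```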
Updated: For new data in the 6th edit:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL UNION ALL
SELECT '113307425(2) 310122174(2) 310122174(2) 113307425(2)' FROM DUAL;
Query:
Use a recursive sub-query factoring clause to find the terms in the string and then use DISTINCT to remove the duplicates and the LISTAGG to concatenate them back into a single string.
WITH bounds ( id, value, start_pos, end_pos ) AS (
  SELECT ROWID,
         value,
         1,
         INSTR( value, ' ', 1 )
  FROM   test_data
UNION ALL
  SELECT id,
         value,
         end_pos + 1,
         INSTR( value, ' ', end_pos + 1 )
  FROM   bounds
  WHERE  end_pos > 0
),
strings ( id, value ) AS (
  SELECT DISTINCT
         id,
         CASE end_pos
           WHEN 0
           THEN SUBSTR( value, start_pos )
           ELSE SUBSTR( value, start_pos, end_pos - start_pos )
         END
  FROM   bounds
)
SELECT LISTAGG( value, ' ' ) WITHIN GROUP ( ORDER BY value ) AS unique_values
FROM   strings
GROUP BY id
Output:
| UNIQUE_VALUES |
| :------------------------ |
| 9999999(12345) |
| 113307425(2) 310122174(2) |
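
The split, DISTINCT, and LISTAGG steps map onto three short operations in application code. This Python sketch mirrors them, including the ORDER BY value sort, which is why the original term order is not preserved; the function name unique_terms is hypothetical.

```python
def unique_terms(value: str) -> str:
    """Split on whitespace, drop duplicates, and re-join in sorted order."""
    terms = value.split()          # find the terms (the bounds CTE)
    distinct = set(terms)          # remove duplicates (SELECT DISTINCT)
    return " ".join(sorted(distinct))  # concatenate in order (LISTAGG ... ORDER BY value)


rows = [
    "9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)",
    "113307425(2) 310122174(2) 310122174(2) 113307425(2)",
]
unique_values = [unique_terms(r) for r in rows]
for line in unique_values:
    print(line)
```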

Oracle allows for recursive subquery factoring that can be harnessed to apply regexp based substitutions repeatedly:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
WITH rep(n,s,n_maxrep) AS (
    SELECT 1
         , value
         , 1 + LENGTH(REGEXP_REPLACE(value, '[^ ]', ''))
      FROM test_data
    UNION ALL
    SELECT n+1
         , REGEXP_REPLACE ( s, '([^ ]+)(( [^ ]+)*)( \1)+', '\1\2' )
         , n_maxrep
      FROM rep
     WHERE n <= n_maxrep
)
SELECT s FROM rep WHERE n = n_maxrep;
Explanation
The query repeatedly applies the same basic regex-based replacement of a single verb duplicate to the original column. A 'verb' in this context is a maximal sequence of consecutive non-space characters. The duplicates may be next to each other or be separated by other verbs.
The maximum possible number of such replacements is known beforehand: n-1 for n verbs, when all verbs are identical. This is equivalent to the number of occurrences of the separating character in the original value.
Everything else is syntactic sugar. Oracle builds the nested chain of subqueries on its own.
Note that the limit n_maxrep is actually 1 + <number_of_separator_occurrences>. This is necessary as the base case ( n=1 ) does no replacement.
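
The repeated substitution can be sketched as a plain loop. The Python version below reuses the answer's pattern with the re module but stops as soon as a pass changes nothing, instead of counting to n_maxrep; that stopping rule is a simplification of mine, not part of the Oracle query.

```python
import re

# A term, any number of intervening terms, then one or more copies of the first term.
DUP = re.compile(r"([^ ]+)(( [^ ]+)*)( \1)+")


def drop_duplicates(value: str) -> str:
    """Apply the replacement until the string stops changing."""
    while True:
        reduced = DUP.sub(r"\1\2", value)
        if reduced == value:
            return value
        value = reduced


separated = drop_duplicates("113307425(2) 310122174(2) 310122174(2) 113307425(2)")
repeated = drop_duplicates(
    "9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)"
)
print(separated)
print(repeated)
```

Note that the pattern does not anchor on term boundaries at the start of the match, so a term that merely ends with a later term (for example "ab b") is also reduced; check that this cannot occur in your data.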

Replace string with random text - Oracle SQL

I have a table table1 with 1 column - edi_value which is of type CLOB.
These are the entries:
seq edi_message
1 ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
2 ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
Please note - there can be varying number of lines, from 3 to 500.
What I'm looking for is the following conditions:
Leave the text before the first * on each line unchanged. For example, GS and ST should not change; ONLY the text after the first * should be randomized.
Replace digits [0-9] with random digits, consistently: if 0 is replaced with 1, then it should be 1 throughout.
Replace letters [A-Za-z] with random letters, consistently: if A is replaced with W, then it should be W throughout.
Leave special characters as they are.
Each character or digit should map to ONLY one random character or digit.
Output can be:
seq edi_message
1 ISA*11* *11* *13*4030111101 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130A1~
2 ISA*11* *11* *13*30234320023 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130W1~
How can this be achieved in Oracle SQL?

You can use translate with a helper function for generating random strings (though @LukStorms has a much neater SQL solution for that using LISTAGG), along with a method to tokenise and then re-concatenate the values into lines (I use a pure SQL method here for demonstration):
create or replace function f(p_low integer, p_high integer)
return varchar as
  r varchar(2000) := '';
  x integer;
begin
  for i in p_low..p_high loop
    x := dbms_random.value(0,length(r)+1);
    r := substr(r,1,x)||chr(i)||substr(r,x+1);
  end loop;
  return r;
end;
/
select * from table1;
| EDI_VALUE |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A1~ |
| ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A |
with t as (select f(48,57)||f(65,90) translate_chars from dual)
select (select new_value
from (select substr(sys_connect_by_path(r_line,'
'),2) new_value, connect_by_isleaf isleaf
from (select lvl
, substr(line,1,instr(line,'*')-1)||
translate(substr(line,instr(line,'*'))
,'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
,(select translate_chars from t)) r_line
from (select level lvl
, regexp_substr(edi_value,'^.*$',1,level,'m') line
from (select table1.edi_value from dual)
connect by level <= regexp_count(edi_value,'^.*$',1,'m')))
start with lvl=1 connect by lvl=(prior lvl)+1)
where isleaf=1)
from table1;
| (SELECTNEW_VALUEFROM(SELECTSUBSTR(SYS_CONNECT_BY_PATH(R_LINE,''),2)NEW_VALUE,CONNECT_BY_ISLEAFISLEAFFROM(SELECTLVL,SUBSTR(LINE,1,INSTR(LINE,'*')-1)||TRANSLATE(SUBSTR(LINE,INSTR(LINE,'*')),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',(SELECTTRANSLATE_CHARSFR |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*66* *66* *67*1935006626 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G0~ |
| ISA*66* *66* *67*32471742247 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G |

You can use CTEs with CONNECT BY to generate the strings for the letters and numbers.
Then use the ordered and scrambled strings in the translate.
A CROSS APPLY can be used to REGEX split the message into parts.
Then only translate those that start with a *.
And use LISTAGG to glue the parts back together.
WITH
NUMS as
(
select
LISTAGG(n, '') WITHIN GROUP (ORDER BY n) as n_from,
LISTAGG(n, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as n_to
from (select level-1 n from dual connect by level <= 10)
),
LETTERS as
(
select
LISTAGG(c, '') WITHIN GROUP (ORDER BY c) as c_from,
LISTAGG(c, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as c_to
from (select chr(ascii('A')+level-1 ) c from dual connect by level <= 26)
)
SELECT ca.scrambled as scrambled_message
FROM table1 t
CROSS JOIN NUMS
CROSS JOIN LETTERS
CROSS APPLY
(
SELECT LISTAGG(CASE WHEN part like '*%' then translate(part, n_from||c_from, n_to||c_to) else part end, '') WITHIN GROUP (ORDER BY lvl) as scrambled
FROM
(
SELECT
level AS lvl,
REGEXP_SUBSTR(t.edi_message,'[*]\S+|[^*]+',1,level,'m') AS part
FROM dual
CONNECT BY level <= regexp_count(t.edi_message, '[*]\S+|[^*]+')+1
) parts
) ca;
Example output:
SCRAMBLED_MESSAGE
-----------------------------------------------------------------------------------------------------------
ISA*99* *99* *92*3525999959 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925V9~
ISA*99* *99* *92*25023205502 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925W9~
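
The core of both answers is a fixed one-to-one translation table applied only to the part of each line after the first *. The Python sketch below shows that idea in isolation; the function names build_table and scramble are hypothetical, and, like the SQL answers, it only remaps digits and uppercase letters.

```python
import random
import string


def build_table(rng: random.Random) -> dict:
    """Build a one-to-one mapping: digits to shuffled digits, A-Z to shuffled A-Z."""
    digits = list(string.digits)
    letters = list(string.ascii_uppercase)
    new_digits = digits[:]
    new_letters = letters[:]
    rng.shuffle(new_digits)
    rng.shuffle(new_letters)
    return str.maketrans("".join(digits + letters), "".join(new_digits + new_letters))


def scramble(message: str, table: dict) -> str:
    """Translate everything after the first '*' on each line; keep the prefix."""
    out = []
    for line in message.split("\n"):
        head, star, tail = line.partition("*")
        out.append(head + star + tail.translate(table))
    return "\n".join(out)


message = (
    "ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~\n"
    "GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~\n"
    "ST*824*021390001*005010X186A1~"
)
table = build_table(random.Random())
scrambled = scramble(message, table)
print(scrambled)
```

Building the table once and reusing it for every message keeps the mapping consistent across rows, which is what the CROSS JOIN against the NUMS and LETTERS CTEs achieves in the SQL version.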

comma Separated List

I have a procedure with a parameter that takes a comma-separated value,
so when I enter Parameter = '1,0,1'
I want it to return 'One, Zero, One'.

You could use the REPLACE function.
For example,
SQL> WITH DATA(str) AS(
2 SELECT '1,0,1' FROM dual
3 )
4 SELECT str,
5 REPLACE(REPLACE(str, '0', 'Zero'), '1', 'One') new_str
6 FROM DATA;
STR NEW_STR
----- ------------------------------------------------------------
1,0,1 One,Zero,One
SQL>
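
Nested REPLACE works on characters rather than list items, so a value such as 10 would come out as 'OneZero'. If multi-digit values can appear, mapping each item after splitting is safer. This is a minimal Python sketch of that idea; the WORDS lookup and the function name to_words are hypothetical.

```python
# Hypothetical lookup: extend with whatever values the procedure accepts.
WORDS = {"0": "Zero", "1": "One"}


def to_words(csv: str) -> str:
    """Map each comma-separated item to its word, leaving unknown items as-is."""
    items = [item.strip() for item in csv.split(",")]
    return ",".join(WORDS.get(item, item) for item in items)


print(to_words("1,0,1"))
print(to_words("10,1"))
```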

This query splits the list into numbers, converts the numbers into words, and joins them back together with the listagg function:
with t1 as (select '7, 0, 11, 132' col from dual),
t2 as (select level lvl,to_number(regexp_substr(col,'[^,]+', 1, level)) col
from t1 connect by regexp_substr(col, '[^,]+', 1, level) is not null)
select listagg(case
when col=0 then 'zero'
else to_char(to_date(col,'j'), 'jsp')
end,
', ') within group (order by lvl) col
from t2
Output:
COL
-------------------------------------------
seven, zero, eleven, one hundred thirty-two
The limitation of this solution is that values must be between 0 and 5373484 (because 5373484 is the maximum value that to_date accepts with the 'j' format).
If you need higher values you can find hints in this article.
