I have a column with duplicated values in single cell, please tell me how can i remove duplicated values using sql or pl/sql only.
| Test
-+--------------------------------------------------------------------
| 999999999(10145) 999999999(10145) 999999999(10145) 999999999(10145)
|--------------------------------------------------------------------
| 113307425(2) 310122174(2) 310122174(2) 113307425(2)
Use a regular expression with a back-reference to match the repeating terms:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
Query:
SELECT REGEXP_REPLACE( value, '([^ ]+)( \1)+', '\1' ) AS replaced_value
FROM test_data
Output:
| REPLACED_VALUE |
| :------------- |
| 9999999(12345) |
db<>fiddle here
Updated: For new data in the 6th edit:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL UNION ALL
SELECT '113307425(2) 310122174(2) 310122174(2) 113307425(2)' FROM DUAL;
Query:
Use a recursive sub-query factoring clause to find the terms in the string and then use DISTINCT to remove the duplicates and the LISTAGG to concatenate them back into a single string.
WITH bounds ( id, value, start_pos, end_pos ) AS (
SELECT ROWID,
value,
1,
INSTR( value, ' ', 1 )
FROM test_data
UNION ALL
SELECT id,
value,
end_pos + 1,
INSTR( value, ' ', end_pos + 1 )
FROM bounds
WHERE end_pos > 0
),
strings ( id, value ) AS (
SELECT DISTINCT
id,
CASE end_pos
WHEN 0
THEN SUBSTR( value, start_pos )
ELSE SUBSTR( value, start_pos, end_pos - start_pos )
END
FROM bounds
)
SELECT LISTAGG( value, ' ' ) WITHIN GROUP ( ORDER BY value ) AS unique_values
FROM strings
GROUP BY id
Output:
| UNIQUE_VALUES |
| :------------------------ |
| 9999999(12345) |
| 113307425(2) 310122174(2) |
db<>fiddle here
Oracle allows for recursive subquery factoring that can be harnessed to apply regexp based substitutions repeatedly:
CREATE TABLE test_data ( value ) AS
SELECT '9999999(12345) 9999999(12345) 9999999(12345) 9999999(12345)' FROM DUAL;
WITH rep(n,s,n_maxrep) AS (
SELECT 1
, value
, 1 + LENGTH(REGEXP_REPLACE(value, '[^ ]', ''))
FROM test_data
UNION ALL
SELECT n+1
, REGEXP_REPLACE ( s, '([^ ]+)(( [^ ]+)*)( \1)+', '\1\2' )
, n_maxrep
FROM rep
WHERE n <= n_maxrep
)
SELECT s FROM rep WHERE n = n_maxrep;
Explanation
The query repeatedly applies the same basic regex-based replacement of a single verb duplicate. to the original column. A 'verb' in this context is the maximal sequence of consecutive non-space chars. The duplicates may be next to each other or be separated by other verbs.
The maximum possible number of such replacements is known beforehand: n-1 for n verbs, when all verbs are identical. This is equivalent to the number of occurrences of the separating character in the original value.
Everything else is syntactic sugar. Oracle builds the nested chain of subqueries on its own.
Note that the limit n_maxrep is actually 1 + <number_of_separator_occurrences>. This is necessary as the base case ( n=1 ) does no replacement.
Related
I'm trying to return the list of "words" (separated by spaces) containing a certain substring within a string as part of an Oracle Sql query. Would like to return the result as a comma separated list. Separate rows for each match would also work.
Example String in [text_col] field:
some words 123-asdf-789A and also this one 456-asdf-555A more words etc.
Desired result: 123-asdf-789A, 456-asdf-555A
This is what I have so far but it only returns the first result and the fact that it's two separate regular expressions makes it difficult to concatenate all matches as I would like to do.
CONCAT(REGEXP_SUBSTR(text_col, ''(([^[:space:]]+)\asdf)'', 1, 1, ''i'', 1),
REGEXP_SUBSTR(text_col, ''\asdf([^[:space:]]+)'', 1, 1, ''i'', 1))
You can use some regexp functions together as :
with tab(str) as
(
select 'some words 123-asdf-789A and also this one 456-asdf-555A more words etc' from dual
), t as
(
select regexp_substr(str,'[^[:space:]]+',1,level) as str, level as lvl
from tab
connect by level <= regexp_count(str,'[:space:]')
)
select listagg(str,',') within group (order by lvl) as "Result"
from t
where regexp_like(str,'-');
Result
---------------------------------
123-asdf-789A,456-asdf-555A
Demo
first split by spaces (through [:space:] posix) and take the ones containing dash characters, and finally concatenate by listagg() function
Use a recursive sub-query factoring clause and iterate through all the matches concatenating the string as you go:
Oracle Setup:
CREATE TABLE test_data ( value ) AS
SELECT 'some words 123-asdf-789A and also this one 456-asdf-555A more words etc.' FROM DUAL UNION ALL
SELECT 'some words without the expected sub-string' FROM DUAL UNION ALL
SELECT 'asdf asdf-123 456-asdf 78-asdf-90' FROM DUAL
Query:
WITH matches ( value, idx, cnt, match ) AS (
SELECT value,
0,
REGEXP_COUNT( value, '\S*asdf\S*' ),
CAST( NULL AS VARCHAR2(4000) )
FROM test_data
UNION ALL
SELECT value,
idx + 1,
cnt,
CASE idx WHEN 0 THEN '' ELSE match || ' ' END
|| REGEXP_SUBSTR( value, '\S*asdf\S*', 1, idx + 1 )
FROM matches
WHERE idx < cnt
)
SELECT value, match
FROM matches
WHERE idx = cnt;
Output:
VALUE | MATCH
:----------------------------------------------------------------------- | :--------------------------------
some words without the expected sub-string | null
some words 123-asdf-789A and also this one 456-asdf-555A more words etc. | 123-asdf-789A 456-asdf-555A
asdf asdf-123 456-asdf 78-asdf-90 | asdf asdf-123 456-asdf 78-asdf-90
db<>fiddle here
I have a table table1 with 1 column - edi_value which is of type CLOB.
These are the entries:
seq edi_message
1 ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
2 ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~
GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~
ST*824*021390001*005010X186A1~
Please note - there can be varying number of lines, from 3 to 500.
What I'm looking for is the following conditions:
Ignore text before first * in each line, for every line, before the first *, it should not change. For ex. GS, ST should not change. ONLY after the first * should randomize
Replace numbers [0-9] with random numbers, for ex. if 0 is replaced with 1, then it should be 1 througout.
Replace text [A-Za-z] with random text, for ex. if A is replaced with W, then it should be replaced with W throughout
Leave special characters as is
One character/number should ONLY map to one random character/number
Output can be:
seq edi_message
1 ISA*11* *11* *13*4030111101 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130A1~
2 ISA*11* *11* *13*30234320023 *QQ*102030234 *101010*1313*U*11311*111143121*1*V*>~
GS*WE*3122000233*102030234*01101010*1313*43121*X*113111~
ST*300*101241111*113111X130W1~
How can this be achieved in Oracle SQL?
You can use translate with a helper function for generating random strings (though #LukStorms has a much neater SQL solution for that using LISTAGG), along with a method to tokenise and then re-concatenate the values into lines (I use a pure SQL method here for demonstration):
create or replace function f(p_low integer, p_high integer)
return varchar as
r varchar(2000) := '';
x integer;
begin
for i in p_low..p_high loop
x := dbms_random.value(0,length(r)+1);
r := substr(r,1,x)||chr(i)||substr(r,x+1);
end loop;
return r;
end;
/
select * from table1;
| EDI_VALUE |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*00* *00* *08*9254110060 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A1~ |
| ISA*00* *00* *08*56789876678 *ZZ*123456789 *041216*0805*U*00501*000095071*0*P*>~<br> GS*AG*5137624388*123456789*20041216*0805*95071*X*005010~<br> ST*824*021390001*005010X186A |
with t as (select f(48,57)||f(65,90) translate_chars from dual)
select (select new_value
from (select substr(sys_connect_by_path(r_line,'
'),2) new_value, connect_by_isleaf isleaf
from (select lvl
, substr(line,1,instr(line,'*')-1)||
translate(substr(line,instr(line,'*'))
,'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'
,(select translate_chars from t)) r_line
from (select level lvl
, regexp_substr(edi_value,'^.*$',1,level,'m') line
from (select table1.edi_value from dual)
connect by level <= regexp_count(edi_value,'^.*$',1,'m')))
start with lvl=1 connect by lvl=(prior lvl)+1)
where isleaf=1)
from table1;
| (SELECTNEW_VALUEFROM(SELECTSUBSTR(SYS_CONNECT_BY_PATH(R_LINE,''),2)NEW_VALUE,CONNECT_BY_ISLEAFISLEAFFROM(SELECTLVL,SUBSTR(LINE,1,INSTR(LINE,'*')-1)||TRANSLATE(SUBSTR(LINE,INSTR(LINE,'*')),'0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',(SELECTTRANSLATE_CHARSFR |
| :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ISA*66* *66* *67*1935006626 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G0~ |
| ISA*66* *66* *67*32471742247 *VV*098532471 *650902*6763*K*66360*666613640*6*P*>~<br> GS*GZ*3084295877*098532471*96650902*6763*13640*I*663606~<br> ST*795*690816660*663606I072G |
db<>fiddle here
You can use CTE's with a CONNECT to generate the strings for the letters and numbers.
Then use the ordered and scrambled strings in the translate.
A CROSS APPLY can be used to REGEX split the message into parts.
Then only translate those that start with a *.
And use LISTAGG to glue the parts back together.
WITH
NUMS as
(
select
LISTAGG(n, '') WITHIN GROUP (ORDER BY n) as n_from,
LISTAGG(n, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as n_to
from (select level-1 n from dual connect by level <= 10)
),
LETTERS as
(
select
LISTAGG(c, '') WITHIN GROUP (ORDER BY c) as c_from,
LISTAGG(c, '') WITHIN GROUP (ORDER BY DBMS_RANDOM.VALUE) as c_to
from (select chr(ascii('A')+level-1 ) c from dual connect by level <= 26)
)
SELECT ca.scrambled as scrambled_message
FROM table1 t
CROSS JOIN NUMS
CROSS JOIN LETTERS
CROSS APPLY
(
SELECT LISTAGG(CASE WHEN part like '*%' then translate(part, n_from||c_from, n_to||c_to) else part end, '') WITHIN GROUP (ORDER BY lvl) as scrambled
FROM
(
SELECT
level AS lvl,
REGEXP_SUBSTR(t.edi_message,'[*]\S+|[^*]+',1,level,'m') AS part
FROM dual
CONNECT BY level <= regexp_count(t.edi_message, '[*]\S+|[^*]+')+1
) parts
) ca;
A test on db<>fiddle here
Example output:
SCRAMBLED_MESSAGE
-----------------------------------------------------------------------------------------------------------
ISA*99* *99* *92*3525999959 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925V9~
ISA*99* *99* *92*25023205502 *PP*950525023 *959595*9292*A*99299*999932909*9*J*>~
GS*WQ*2900555022*950525023*59959595*9292*32909*I*992999~
ST*255*959039999*992999I925W9~
I'm struggling to convert
a | a1,a2,a3
b | b1,b3
c | c2,c1
to:
a | a1
a | a2
a | a3
b | b1
b | b2
c | c2
c | c1
Here are data in sql format:
CREATE TABLE data(
"one" TEXT,
"many" TEXT
);
INSERT INTO "data" VALUES('a','a1,a2,a3');
INSERT INTO "data" VALUES('b','b1,b3');
INSERT INTO "data" VALUES('c','c2,c1');
The solution is probably recursive Common Table Expression.
Here's an example which does something similar to a single row:
WITH RECURSIVE list( element, remainder ) AS (
SELECT NULL AS element, '1,2,3,4,5' AS remainder
UNION ALL
SELECT
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM list
WHERE remainder IS NOT NULL
)
SELECT * FROM list;
(originally from this blog post: https://blog.expensify.com/2015/09/25/the-simplest-sqlite-common-table-expression-tutorial)
It produces:
element | remainder
-------------------
NULL | 1,2,3,4,5
1 | 2,3,4,5
2 | 3,4,5
3 | 4,5
4 | 5
5 | NULL
the problem is thus to apply this to each row in a table.
Yes, a recursive common table expression is the solution:
with x(one, firstone, rest) as
(select one, substr(many, 1, instr(many, ',')-1) as firstone, substr(many, instr(many, ',')+1) as rest from data where many like "%,%"
UNION ALL
select one, substr(rest, 1, instr(rest, ',')-1) as firstone, substr(rest, instr(rest, ',')+1) as rest from x where rest like "%,%" LIMIT 200
)
select one, firstone from x UNION ALL select one, rest from x where rest not like "%,%"
ORDER by one;
Output:
a|a1
a|a2
a|a3
b|b1
b|b3
c|c2
c|c1
Check my answer in How to split comma-separated value in SQLite?.
This will give you the transformation in a single query rather than having to apply to each row.
-- using your data table assuming that b3 is suppose to be b2
WITH split(one, many, str) AS (
SELECT one, '', many||',' FROM data
UNION ALL SELECT one,
substr(str, 0, instr(str, ',')),
substr(str, instr(str, ',')+1)
FROM split WHERE str !=''
) SELECT one, many FROM split WHERE many!='' ORDER BY one;
a|a1
a|a2
a|a3
b|b1
b|b2
c|c2
c|c1
I am new to oracle and to this forum. I have searched and found answers on how to do this with a column of just numbers but this has txt at the beginning then a sequenced number.
I have a table that has a varchar2 column named myid which has characters with a number at the end which is in order the number at the end is always 6 digits with leading zeros.
Hello_002190
Hello_002188
Bye_000187
Bye_000185
Bye_000184
Get_008133
Get_008131
Gone_001112
Gone_001110
Gone_001109
I need an Oracle SQL script that will show me all the missing rows.
The result for the above should be:
Hello_002189
Bye_000186
Get_008132
Gone_001111
Thanks in advance for the help
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( value ) AS
SELECT 'Hello_002190' FROM DUAL UNION ALL
SELECT 'Hello_002188' FROM DUAL UNION ALL
SELECT 'Bye_000187' FROM DUAL UNION ALL
SELECT 'Bye_000185' FROM DUAL UNION ALL
SELECT 'Bye_000184' FROM DUAL UNION ALL
SELECT 'Get_008133' FROM DUAL UNION ALL
SELECT 'Get_008131' FROM DUAL UNION ALL
SELECT 'Gone_001112' FROM DUAL UNION ALL
SELECT 'Gone_001110' FROM DUAL UNION ALL
SELECT 'Gone_001109' FROM DUAL;
Query 1:
WITH data ( prefix, suffix ) AS (
SELECT SUBSTR( value, 1, INSTR( value, '_' ) ),
TO_NUMBER( SUBSTR( value, INSTR( value, '_' ) + 1 ) )
FROM table_name
),
bounds ( prefix, min_suffix, max_suffix ) AS (
SELECT prefix, MIN( suffix ), MAX( suffix )
FROM data
GROUP BY prefix
)
SELECT prefix || TO_CHAR( column_value, 'FM000000' ) AS missing_value
FROM bounds b
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT b.min_suffix + LEVEL - 1
FROM DUAL
CONNECT BY b.min_suffix + LEVEL - 1 <= b.max_suffix
) AS SYS.ODCINUMBERLIST
)
)
MINUS
SELECT value FROM table_name
Results:
| MISSING_VALUE |
|---------------|
| Bye_000186 |
| Get_008132 |
| Gone_001111 |
| Hello_002189 |
Below is the interview question, can some please help me resolve it?
select 'a1b2c3d4e5f6g7' from dual;
Output is sum of given integer number(1+2+3+4+5+6+7)=28.
Any help?
Use a Regex to keep only the numbers,then connect by to add each number
With T
as (select regexp_replace('a1b2c3d4e5f6g7', '[A-Za-z]') as col from dual)
select sum(val)
From
(
select substr(col,level,1) val from t connect by level <= length(col)
)
FIDDLE
Since it is only 1 digit numbers you can use SUBSTR() to extract every other character:
SQL Fiddle
Oracle 11g R2 Schema Setup:
Query 1:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7' from dual
)
SELECT SUM( TO_NUMBER( SUBSTR( value, 2*LEVEL, 1 ) ) ) AS total
FROM data
CONNECT BY 2 * LEVEL <= LENGTH( value )
Results:
| TOTAL |
|-------|
| 28 |
However, if you have two digit numbers then you can do:
Query 2:
WITH data ( value ) AS (
select 'a1b2c3d4e5f6g7h8i9j10' from dual
)
SELECT SUM( TO_NUMBER( REGEXP_SUBSTR( value, '\d+', 1, LEVEL ) ) ) AS total
FROM data
CONNECT BY LEVEL <= REGEXP_COUNT( value, '\d+' )
Results:
| TOTAL |
|-------|
| 55 |
You can use regexp_substr to extract exactly the numbers, then just sum them:
with t as (select 'a1b2c3d4e5f6g7' expr from dual)
select sum(regexp_substr(t.expr, '[0-9]+',1, level)) as col
from dual
connect by level < regexp_instr(t.expr, '[0-9]+',1, level);
example:
select sum(regexp_substr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level)) as col
from dual
connect by level < regexp_instr('a1b2c3d4e5f6g7r22g4', '[0-9]+',1, level);
Result:
54
This solution works with numbers with more than 1 digit and it doesn't matter how many characters are between the numbers:
with t as (select 'a1b2c3d4e5f6g7' as str from dual)
select sum(to_number(regexp_substr(str,'[0-9]+',1,level)))
from t
connect by regexp_substr(str,'[0-9]+',1,level) is not null