Remove duplicate values from an Oracle SQL query's output

I want to remove duplicated values from the result of a SQL query in Oracle 10g. I am using a regular expression to remove the letters from the result.
Original value = 1A,1B,2C,2F,4A,4z,11A,11B
Current Sql query
select REGEXP_REPLACE( tablex.column, '[A-Za-z]' , '' )
from db1
gives me the following output
1,1,2,3,4,4,11,11
How can I remove the duplicates from the output so that it shows only unique values,
i.e.
1,2,3,4,11

Assuming that your table contains strings with values separated with commas.
You can try something like this:
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', ',</n><n>')||',</n></r>'
).extract('//n[not(preceding::n = .)]/text()').getstringval(), ',')
from tablex;
After applying your regexp_replace, it builds an XMLType from the result and then uses XPath to keep only the first occurrence of each value.
If you also want to sort the values (and still use the XML approach), then you need XSL:
select rtrim(xmltype('<r><n>' ||
replace(REGEXP_REPLACE( col, '[A-Za-z]' , '' ), ',', '</n><n>')||'</n></r>'
).extract('//n[not(preceding::n = .)]')
.transform(xmltype('<?xml version="1.0" ?><xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"><xsl:template match="/"><xsl:for-each select="//n[not(preceding::n = .)]"><xsl:sort select="." data-type="number"/><xsl:value-of select="."/>,</xsl:for-each></xsl:template></xsl:stylesheet>'))
.getstringval(), ',')
from tablex;
But you can also try a different approach, such as splitting the tokens into rows and then collecting them again:
select rtrim(xmlagg(xmlelement(e, n || ',') order by to_number(n))
.extract('//text()'), ',')
from(
SELECT distinct rn, trim(regexp_substr(col, '[^,]+', 1, level)) n
FROM (select row_number() over (order by col) rn ,
REGEXP_REPLACE( col, '[A-Za-z]' , '' ) col
from tablex) t
CONNECT BY instr(col, ',', 1, level - 1) > 0
)
group by rn;
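For readers who find the XML steps hard to follow, here is a minimal Python sketch of the same logic: strip the letters, split on commas, keep the first occurrence of each value, and optionally sort numerically. It is not Oracle code and not part of the answer above; the function name unique_numbers and the sort flag are made up for illustration.

```python
import re

def unique_numbers(value, sort=False):
    # Same effect as REGEXP_REPLACE(col, '[A-Za-z]', ''): drop every letter.
    stripped = re.sub(r"[A-Za-z]", "", value)
    # Split into tokens and keep only the first occurrence of each one,
    # which is what the XPath filter not(preceding::n = .) does.
    tokens = list(dict.fromkeys(stripped.split(",")))
    if sort:
        # Numeric sort, like data-type="number" in xsl:sort or to_number(n).
        tokens.sort(key=int)
    return ",".join(tokens)

print(unique_numbers("1A,1B,2C,2F,4A,4z,11A,11B"))
```

Without sorting, the values come back in the order of their first appearance, which matches the first query; passing sort=True corresponds to the XSL and xmlagg variants.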

Related

Split one column with weird string into multiple columns by specific delimiter in single select sql

I am hoping for a simpler solution, although I already have a working one.
It is hard for me to accept that this is the only way, so my hope is that a good SQL power user may have a better idea.
Background:
A simple table looking like that:
weirdstring            | ID
----------------------------
A;GHL+BH;BC,NA-NB,[AB] | 1
B;GHL+BH;BC,NA-NB,[AB] | 2
C;GHL+BH;BC,NA-NB,[AB] | 3
CREATE TABLE TESTTABLE (weirdstring varchar(MAX),
ID int);
INSERT INTO TESTTABLE
VALUES ('A;GHL+BH;BC,NA-NB,[AB]', 1);
INSERT INTO TESTTABLE
VALUES ('B;GHL+BH;BC,NA-NB,[AB]', 2);
INSERT INTO TESTTABLE
VALUES ('C;GHL+BH;BC,NA-NB,[AB]', 3);
All I need in the end is the first 3 "letter-groups" (1-3 letters each) from weirdstring (e.g. for ID 1: A, GHL and BH; the rest of the string is not important for now) in separate columns:
ID | weirdstring            | group1 | group2 | group3
------------------------------------------------------
1  | A;GHL+BH;BC,NA-NB,[AB] | A      | GHL    | BH
2  | B;GHL+BH;BC,NA-NB,[AB] | B      | GHL    | BH
3  | C;GHL+BH;BC,NA-NB,[AB] | C      | GHL    | BH
What has been done so far:
First, change all weird delimiters (;+- and potentially more) in the string to commas and eliminate the brackets around the "letter-groups", using daisy-chained REPLACE calls. This turns A;GHL+BH;BC,NA-NB,[AB] into A,GHL,BH,BC,NA,NB,AB.
Then, split the new string into columns, using the comma as the delimiter.
The query used is:
SELECT t1.ID,
t1.weirdstring,
t2.group1,
t2.group2,
t2.group3
FROM TESTTABLE t1
LEFT JOIN (SELECT grp1.ID,
grp1.weirdstring AS group1,
grp2.weirdstring AS group2,
grp3.weirdstring AS group3
FROM (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s1
WHERE ROWNUM = 1) grp1
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s2
WHERE ROWNUM = 2) grp2 ON grp1.ID = grp2.ID
LEFT JOIN (SELECT ID,
weirdstring
FROM (SELECT ID,
weirdstring,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) AS ROWNUM
FROM (SELECT ID,
value AS weirdstring
FROM TESTTABLE
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(weirdstring, '[', ''), ']', ''), ';', ','), '+', ','), '-', ','), '.', ','), ',')
WHERE weirdstring IS NOT NULL) splitted ) s3
WHERE ROWNUM = 3) grp3 ON grp3.ID = grp2.ID) t2 ON t1.ID = t2.ID;
But I could not believe how large the query became for such a small task (at least I believe it is small). I am on an older version (14) of SQL Server, so I cannot use STRING_SPLIT with its third parameter (enable_ordinal). Syntax:
STRING_SPLIT ( string , separator [ , enable_ordinal ] )
Note: https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver16 : The enable_ordinal argument and ordinal output column are currently supported in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics (serverless SQL pool only). Beginning with SQL Server 2022 (16.x) Preview, the argument and output column are available in SQL Server.
Is there a shorter way to achieve the same result? I know this topic has been discussed many times, but I could not find a solution to my specific problem. Thanks in advance for any kind of help!
It seems that you are using SQL Server 2017 (v.14), so one option is the following JSON-based approach. The idea is to transform the stored text into a valid JSON array using TRANSLATE() for character replacement, and then read the expected parts of the string with JSON_VALUE(). For example, A;GHL+BH;BC,NA-NB,[AB] becomes ["A","GHL","BH","BC","NA","NB","","AB",""]; the brackets produce empty elements, but these come after the first three positions and do not affect the result:
SELECT
weirdstring,
JSON_VALUE(jsonweirdstring, '$[0]') AS group1,
JSON_VALUE(jsonweirdstring, '$[1]') AS group2,
JSON_VALUE(jsonweirdstring, '$[2]') AS group3
FROM (
SELECT
weirdstring,
CONCAT('["', REPLACE(TRANSLATE(weirdstring, ';+-,[]', '######'), '#', '","'), '"]') AS jsonweirdstring
FROM TESTTABLE
) t
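To make the string transformation easier to see, here is a small Python sketch that mirrors the TRANSLATE / REPLACE / CONCAT steps and then reads the first three array elements. It only illustrates the logic and is not T-SQL; the function name first_groups and its count parameter are assumptions made for this example.

```python
import json

def first_groups(weirdstring, count=3):
    # Mirror TRANSLATE(weirdstring, ';+-,[]', '######'):
    # every delimiter character is mapped to '#', one for one.
    translated = weirdstring.translate(str.maketrans(";+-,[]", "######"))
    # Mirror CONCAT('["', REPLACE(..., '#', '","'), '"]').
    json_text = '["' + translated.replace("#", '","') + '"]'
    # Mirror JSON_VALUE(..., '$[0]'), '$[1]', '$[2]'.
    items = json.loads(json_text)
    return json_text, items[:count]

json_text, groups = first_groups("A;GHL+BH;BC,NA-NB,[AB]")
print(json_text)
print(groups)
```

Because the mapping is one character to one character, adjacent delimiters such as ",[" yield empty array elements. That is harmless here, since only positions 0 to 2 are read, but it would matter if later groups were needed.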

Split a Column with Delimited Values and Compare Each Value

I have a column that contains multiple values in a delimited (comma-separated) format -
id | code
-------------
1  | 11,19,21
2  | 55,87,33
3  | 3,11
4  | 11
I want to be able to compare against each value inside the 'code' column, as below -
SELECT id FROM myTbl WHERE code = '11'
This should return -
1
3
4
I've tried the solution below, but it does not work for all cases -
SELECT id FROM myTbl WHERE POSITION('11' IN code) <> 0
This works for a 2-digit number like '11', because POSITION returns a value <> 0 when it finds a match. But it fails when searching for, say, '3', because the rows with 'id' 2 and 3 are both returned (the '33' in row 2 also contains '3').
Here is a link that describes the POSITION function in Redshift.
Any other approach that will solve this problem?
You can count the matches of the value as a whole element of the list:
SELECT id FROM myTbl WHERE regexp_count(code, '(^|,)11(,|$)') > 0
I think we can use regexp_substr() as follows:
select tb.id from myTbl tb where '11' in (
select regexp_substr( (select code from myTbl where id=tb.id),'[^,]+', 1, LEVEL) from dual
connect by regexp_substr((select code from myTbl where id=tb.id) , '[^,]+', 1, LEVEL) is not null);
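Use the split_part() function. The query below checks only the first three values, so extend the list if a row can hold more: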
SELECT distinct id
FROM myTbl
WHERE '11' in ( split_part( code||',' , ',', 1 ),
split_part( code||',' , ',', 2 ),
split_part( code||',' , ',', 3 ) )
This is a very, very bad data model. You should be storing this information in a junction/association table, with one row per value.
But, if you have no choice, you can use like:
SELECT id
FROM myTbl
WHERE ',' || code || ',' LIKE '%,11,%';
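The delimiter-wrapping trick can be tried out quickly with SQLite from Python, since SQLite supports the same || concatenation and LIKE syntax. This is only a sketch: it runs on an in-memory SQLite database rather than Redshift, the helper name ids_containing is invented for the example, and the value is assumed to contain no LIKE wildcards (% or _).

```python
import sqlite3

def ids_containing(code_value):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTbl (id INTEGER, code TEXT)")
    conn.executemany(
        "INSERT INTO myTbl VALUES (?, ?)",
        [(1, "11,19,21"), (2, "55,87,33"), (3, "3,11"), (4, "11")],
    )
    # Wrap both the column and the searched value in commas so that
    # only whole list elements can match.
    pattern = "%," + code_value + ",%"
    rows = conn.execute(
        "SELECT id FROM myTbl WHERE ',' || code || ',' LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]

print(ids_containing("11"))
print(ids_containing("3"))
```

Searching for '3' no longer matches the '33' in row 2, because the pattern requires a comma on both sides of the value.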

How to combine return results of query in one row

I have a table that stores personnel codes.
When I select from this table, I get a 3-row result such as:
2129,3394,3508,3534
2129,3508
4056
I want the select to combine the result into one row, such as:
2129,3394,3508,3534,2129,3508,4056
or, with distinct values only:
2129,3394,3508,3534,4056
You should ideally avoid storing CSV data at all in your tables. That being said, for your first result set we can try using STRING_AGG:
SELECT STRING_AGG(col, ',') AS output
FROM yourTable;
Your second requirement is more tricky, and we can try going through a table to remove duplicates:
WITH cte AS (
SELECT DISTINCT VALUE AS col
FROM yourTable t
CROSS APPLY STRING_SPLIT(t.col, ',')
)
SELECT STRING_AGG(col, ',') WITHIN GROUP (ORDER BY CAST(col AS INT)) AS output
FROM cte;
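As a rough illustration of what the two queries compute, here is a Python sketch using the three sample rows from the question. The function name combine and the distinct flag are hypothetical and exist only for this example.

```python
def combine(rows, distinct=False):
    # Split every CSV row into its values, like CROSS APPLY STRING_SPLIT.
    values = [value for row in rows for value in row.split(",")]
    if distinct:
        # DISTINCT, then ORDER BY CAST(col AS INT).
        values = sorted(set(values), key=int)
    # Join everything back into one string, like STRING_AGG(col, ',').
    return ",".join(values)

rows = ["2129,3394,3508,3534", "2129,3508", "4056"]
print(combine(rows))
print(combine(rows, distinct=True))
```

The sketch keeps the input row order in the non-distinct case. In SQL, STRING_AGG without WITHIN GROUP (ORDER BY ...) does not guarantee any particular order.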
I solved this by using STUFF and FOR XML PATH:
SELECT
STUFF((SELECT ',' + US.remain_uncompleted
FROM Table_request US
WHERE exclusive = 0 AND reqact = 1 AND reqend = 0
FOR XML PATH('')), 1, 1, '')
Thank you Tim

Using Oracle REGEXP_SUBSTR to extract uppercase data separated by underscores

sample column data:
Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:
Error in CREATE_ACC_STATEMENT() [line 16]
I am looking for a way to extract only the uppercase words (table names) separated by underscores. I want the whole table name, the maximum is 3 underscores and the minimum is 1 underscore. I would like to ignore any capital letters that are initcap.
You can just use regexp_substr():
select regexp_substr(str, '[A-Z_]{3,}', 1, 1, 'c')
from (select 'Failure on table TOLL_USR_TRXN_HISTORY' as str from dual) x;
The pattern says to find substrings with capital letters or underscores, at least 3 characters long. The 1, 1 means start from the first position and return the first match. The 'c' makes the search case-sensitive.
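The pattern can be checked against the sample messages with Python's re module, which treats [A-Z_]{3,} the same way for this input. The sketch below also includes a stricter pattern that requires between one and three underscores, as the question asks; that stricter pattern and the names LOOSE, STRICT and table_name are my own assumptions, not part of the answer above.

```python
import re

# The pattern from the answer: 3 or more capital letters or underscores.
LOOSE = re.compile(r"[A-Z_]{3,}")
# Assumed stricter variant: uppercase words joined by 1 to 3 underscores.
STRICT = re.compile(r"\b[A-Z]+(?:_[A-Z]+){1,3}\b")

def table_name(message, pattern=LOOSE):
    # Return the first match, like regexp_substr(str, pattern, 1, 1, 'c').
    match = pattern.search(message)
    return match.group(0) if match else None

for line in (
    "Failure on table TOLL_USR_TRXN_HISTORY:",
    "Failure on table DOCUMENT_IMAGES:",
    "Error in CREATE_ACC_STATEMENT() [line 16]",
):
    print(table_name(line), table_name(line, STRICT))
```

Both patterns also return CREATE_ACC_STATEMENT for the third message, so if only table names are wanted, the "Failure on table" prefix would need to be part of the check.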
You may use a SQL SELECT statement like the following for each individual line of your text
(Failure on table TOLL_USR_TRXN_HISTORY in the case below):
select regexp_replace(q.word, '[^a-zA-Z0-9_]+', '') as word
from
(
select substr(str,nvl(lag(spc) over (order by lvl),1)+1*sign(lvl-1),
abs(decode(spc,0,length(str),spc)-nvl(lag(spc) over (order by lvl),1))) word,
nvl(lag(spc) over (order by lvl),1) lg
from
(
with tab as
( select 'Failure on table TOLL_USR_TRXN_HISTORY' str from dual )
select instr(str,' ',1,level) spc, str, level lvl
from tab
connect by level <= 10
)
) q
where lg > 0
and upper(regexp_replace(q.word, '[^a-zA-Z0-9_]+', ''))
= regexp_replace(q.word, '[^a-zA-Z0-9_]+', '')
and ( nvl(length(regexp_substr(q.word,'_',1,1)),0)
+ nvl(length(regexp_substr(q.word,'_',1,2)),0)
+ nvl(length(regexp_substr(q.word,'_',1,3)),0)) > 0
and nvl(length(regexp_substr(q.word,'_',1,4)),0) = 0;
An alternative way to get only the table name from the error message. The query below works only if the table name comes at the end of the message, as in the example:
with t as( select 'Failure on table TOLL_USR_TRXN_HISTORY:' as data from dual)
SELECT RTRIM(substr(data,instr(data,' ',-1)+1),':') from t
New query for all messages:
select replace (replace ( 'Failure on table TOLL_USR_TRXN_HISTORY:
Failure on table DOCUMENT_IMAGES:' , 'Failure on table', ' ' ),':',' ') from dual

Regexp_replace processing result

I have a string with groups of numbers, and I'd like to turn it into a string whose groups all have a constant length. Right now I use two regexp_replace calls: the first prepends 10 zeros to each group, and the second cuts each group down to its last 10 digits:
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace(
regexp_replace(txt, '(\d+)','0000000000\1')
,'\d+(\d{10})','\1') from s ;
But I'd like to use only one regex, something like
regexp_replace(txt, '(\d+)',lpad('\1',10,'0'))
But it doesn't work: lpad is executed before the regexp. Do you have any ideas?
With a slightly different approach, you can try the following:
with s(id, txt) as
(
select rownum, txt
from (
select '1030123:12031:1341' as txt from dual union all
select '1234:0123456789:1341' from dual
)
)
SELECT listagg(lpad(regexp_substr(s.txt, '[^:]+', 1, lines.column_value), 10, '0'), ':') within group (order by column_value) txt
FROM s,
TABLE (CAST (MULTISET
(SELECT LEVEL FROM dual CONNECT BY instr(s.txt, ':', 1, LEVEL - 1) > 0
) AS sys.odciNumberList )) lines
group by id
TXT
-----------------------------------
0001030123:0000012031:0000001341
0000001234:0123456789:0000001341
This uses CONNECT BY to split every string on the separator ':', then uses LPAD to pad each part to 10 characters, and finally aggregates the padded parts back into one string per row.
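The split, pad and re-join steps can be sketched in a few lines of Python, which may help when checking the expected output. This is an illustration of the logic only, not Oracle code; the function name pad_groups and its width and sep parameters are assumptions for the example.

```python
def pad_groups(txt, width=10, sep=":"):
    # Split on the separator, left-pad every part with zeros to the
    # requested width (like LPAD(part, 10, '0')), then join again.
    return sep.join(part.zfill(width) for part in txt.split(sep))

print(pad_groups("1030123:12031:1341"))
print(pad_groups("1234:0123456789:1341"))
```

The padding is applied per part after the split, which is the step that lpad('\1', 10, '0') cannot do inside a single regexp_replace, because lpad runs on the literal text '\1' before the regular expression is evaluated.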
This works for non-empty sequences (empty ones, e.g. in 123::456, are left as they are):
with s(txt) as ( select '1030123:12031:1341' from dual)
select regexp_replace (regexp_replace (txt,'(\d+)',lpad('0',10,'0') || '\1'),'0*(\d{10})','\1')
from s
;