Concat columns from multiple tables into one row without duplicates

I need to concatenate two columns from different tables, delimited with ";", into one row without duplicates.
Table 1:
Name
John;Sue
Table 2:
Name
Mary;John
Desired output
Names
John;Sue;Mary
I tried:
select listagg(a.Name, ';') within group (order by a.Name) as Names
from Table1 a
join Table2 b on a.id = b.id;
but I get an "ORA-01489: result of string concatenation is too long" error.
How to do that properly in Oracle?

You can do it with simple string functions:
WITH t1_positions (id, name, spos, epos) AS (
SELECT id,
name,
1,
INSTR(name, ';', 1)
FROM table1
UNION ALL
SELECT id,
name,
epos + 1,
INSTR(name, ';', epos + 1)
FROM t1_positions
WHERE epos > 0
),
t1_strings (id, item) AS (
SELECT id,
CASE epos
WHEN 0
THEN SUBSTR(name, spos)
ELSE SUBSTR(name, spos, epos - spos)
END
FROM t1_positions
),
t2_positions (id, name, spos, epos) AS (
SELECT id,
name,
1,
INSTR(name, ';', 1)
FROM table2
UNION ALL
SELECT id,
name,
epos + 1,
INSTR(name, ';', epos + 1)
FROM t2_positions
WHERE epos > 0
),
t2_strings (id, item) AS (
SELECT id,
CASE epos
WHEN 0
THEN SUBSTR(name, spos)
ELSE SUBSTR(name, spos, epos - spos)
END
FROM t2_positions
)
SELECT id,
LISTAGG(item, ';') WITHIN GROUP (ORDER BY item) AS name
FROM (SELECT * FROM t1_strings
UNION
SELECT * FROM t2_strings)
GROUP BY id;
Which, for the sample data:
CREATE TABLE Table1 (id, name) AS
SELECT 1, 'John;Sue' FROM DUAL;
CREATE TABLE Table2 (id, name) AS
SELECT 1, 'Mary;John' FROM DUAL;
Outputs:
ID NAME
1  John;Mary;Sue
Note: you can do it with regular expressions; however, for a large dataset, that is likely to be an order of magnitude slower.
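To make the spos/epos walk in the recursive CTE easier to follow, here is a minimal Python sketch of the same logic. It is only an illustration, not Oracle code, and `split_positions` and `merge_names` are hypothetical helper names. `str.find` stands in for INSTR (it returns -1 where INSTR would return 0), and a set plus `sorted` stands in for UNION followed by LISTAGG ... ORDER BY item:

```python
def split_positions(value: str, delim: str = ";") -> list[str]:
    """Split value the way the t1_positions/t1_strings CTEs do."""
    items = []
    spos = 0  # 0-based here; Oracle's INSTR/SUBSTR positions are 1-based
    while True:
        epos = value.find(delim, spos)  # -1 plays the role of INSTR returning 0
        if epos == -1:
            items.append(value[spos:])  # CASE epos WHEN 0 THEN SUBSTR(name, spos)
            return items
        items.append(value[spos:epos])  # ELSE SUBSTR(name, spos, epos - spos)
        spos = epos + 1  # next search starts after the delimiter (epos + 1)


def merge_names(*values: str, delim: str = ";") -> str:
    """UNION of all split items (removes duplicates), sorted like ORDER BY item."""
    unique = set()
    for value in values:
        unique.update(split_positions(value, delim))
    return delim.join(sorted(unique))


print(merge_names("John;Sue", "Mary;John"))  # John;Mary;Sue
```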
Update
How to do that properly in Oracle?
Do not store delimited strings; store the data in first normal form (1NF) instead:
CREATE TABLE table1 (id, name) AS
SELECT 1, 'John' FROM DUAL UNION ALL
SELECT 1, 'Sue' FROM DUAL;
CREATE TABLE table2 (id, name) AS
SELECT 1, 'Mary' FROM DUAL UNION ALL
SELECT 1, 'John' FROM DUAL;
Then the query is simply:
SELECT id,
LISTAGG(name, ';') WITHIN GROUP (ORDER BY name) AS name
FROM (SELECT * FROM table1
UNION
SELECT * FROM table2)
GROUP BY id;
db<>fiddle here

Presuming those are names and the result doesn't exceed 4000 characters (which is the LISTAGG limit), one option is to do this (read the comments within the code):
SQL> with
2 -- sample data
3 table1 (id, name) as
4 (select 1, 'John;Sue' from dual union all
5 select 2, 'Little;Foot' from dual),
6 table2 (id, name) as
7 (select 1, 'Mary;John' from dual),
8 --
9 union_jack (id, name) as
10 -- union those two tables
11 (select id, name from table1
12 union
13 select id, name from table2
14 ),
15 distname as
16 -- distinct names
17 (select distinct
18 id,
19 regexp_substr(name, '[^;]+', 1, column_value) name
20 from union_jack cross join
21 table(cast(multiset(select level from dual
22 connect by level <= regexp_count(name, ';') + 1
23 ) as sys.odcinumberlist))
24 )
25 select id,
26 listagg(d.name, ';') within group (order by d.name) as names
27 from distname d
28 group by id;
ID NAMES
---------- ------------------------------
1 John;Mary;Sue
2 Foot;Little
SQL>
If it really exceeds 4000 characters, switch to XMLAGG; lines #25 onward would then be:
25 select id,
26 rtrim(xmlagg (xmlelement (e, d.name || ';') order by d.name).extract
27 ('//text()'), ';') as names
28 from distname d
29 group by id;
ID NAMES
---------- ------------------------------
1 John;Mary;Sue
2 Foot;Little
SQL>

You can use an XML-style technique before applying LISTAGG() to get distinct names, such as:
WITH t AS
(
SELECT RTRIM(DBMS_XMLGEN.CONVERT(
XMLAGG(
XMLELEMENT(e,name||';')
).EXTRACT('//text()').GETCLOBVAL() ,1),
';') AS name
FROM ( SELECT t1.name||';'||t2.name AS name
FROM table1 t1 JOIN table2 t2 ON t1.id=t2.id )
)
SELECT LISTAGG(REGEXP_SUBSTR(name,'[^;]+',1,level),';')
WITHIN GROUP (ORDER BY 0) AS "Names"
FROM t
CONNECT BY level <= REGEXP_COUNT(name,';')
Demo

Related

Remove duplicate from strings in sql [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 months ago.
I have 2 cols
ID Value
ab^bc^ab^de
mn^mn^op
I want the output as
ID Value
ab^bc^de
mn^op
Can someone please help me with this? ✋ I have around 500 rows in the table.
I tried using stuff and other approaches, but errors keep popping up.

You can use a recursive query and simple string functions (which are much faster than regular expressions, but a little more to type) to split the string and then, in later Oracle versions, re-aggregate it using LISTAGG(DISTINCT ...):
WITH bounds ( rid, value, spos, epos ) AS (
SELECT ROWID, value, 1, INSTR(value, '^', 1)
FROM table_name
UNION ALL
SELECT rid, value, epos + 1, INSTR(value, '^', epos + 1)
FROM bounds
WHERE epos > 0
)
SELECT LISTAGG(
DISTINCT
CASE epos
WHEN 0
THEN SUBSTR(value, spos)
ELSE SUBSTR(value, spos, epos - spos)
END,
'^'
) WITHIN GROUP (ORDER BY spos) AS unique_values
FROM bounds
GROUP BY rid;
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'mn^mn^op' FROM DUAL UNION ALL
SELECT 'ab^bc^ab^de' FROM DUAL UNION ALL
SELECT 'one^two^three^one^two^one^four' FROM DUAL;
Outputs:
UNIQUE_VALUES
ab^bc^de
mn^op
ab^bc^de
one^two^three^four
If you are using an earlier version of Oracle that does not support DISTINCT in LISTAGG, then you can aggregate twice:
WITH bounds ( rid, value, spos, epos ) AS (
SELECT ROWID, value, 1, INSTR(value, '^', 1)
FROM table_name
UNION ALL
SELECT rid, value, epos + 1, INSTR(value, '^', epos + 1)
FROM bounds
WHERE epos > 0
),
words (rid, word, spos) AS (
SELECT rid,
CASE epos
WHEN 0
THEN SUBSTR(value, spos)
ELSE SUBSTR(value, spos, epos - spos)
END,
spos
FROM bounds
),
unique_words ( rid, word, spos ) AS (
SELECT rid,
word,
MIN(spos)
FROM words
GROUP BY rid, word
)
SELECT LISTAGG(word, '^') WITHIN GROUP (ORDER BY spos) AS unique_values
FROM unique_words
GROUP BY rid;
Which gives the same output.
fiddle
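The two-step aggregation (keep each word's MIN(spos), then order by it) can be sketched in Python as follows. This only illustrates the logic and is not Oracle code; `unique_tokens_first_seen` is a hypothetical name, and `str.split` stands in for the recursive INSTR/SUBSTR walk:

```python
def unique_tokens_first_seen(value: str, delim: str = "^") -> str:
    first_pos: dict[str, int] = {}
    for pos, word in enumerate(value.split(delim)):
        # setdefault keeps only the first position, like MIN(spos) ... GROUP BY rid, word
        first_pos.setdefault(word, pos)
    # re-join in order of first appearance, like LISTAGG(...) WITHIN GROUP (ORDER BY spos)
    return delim.join(sorted(first_pos, key=first_pos.get))


for v in ["ab^bc^ab^de", "mn^mn^op", "one^two^three^one^two^one^four"]:
    print(unique_tokens_first_seen(v))
```

For the sample strings this prints ab^bc^de, mn^op and one^two^three^four, matching the SQL output.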

For example:
Sample data:
SQL> with
2 test (col) as
3 (select 'ab^bc^ab^de' from dual union all
4 select 'mn^mn^op' from dual
5 ),
Split values into rows:
6 temp as
7 (select
8 col,
9 regexp_substr(col, '[^\^]+', 1, column_value) val,
10 column_value lvl
11 from test cross join
12 table(cast(multiset(select level from dual
13 connect by level <= regexp_count(col, '\^') + 1
14 ) as sys.odcinumberlist))
15 )
Aggregate them back, using only distinct values:
16 select col,
17 listagg(val, '^') within group (order by lvl) as result
18 from (select col, val, min(lvl) lvl
19 from temp
20 group by col, val
21 )
22 group by col;
COL RESULT
----------- --------------------
ab^bc^ab^de ab^bc^de
mn^mn^op mn^op
SQL>

Other solutions, if your Oracle version is recent enough to have LISTAGG DISTINCT:
with data(s) as (
select 'ab^bc^ab^de' from dual union all
select 'mn^mn^op' from dual
),
splitted(s, l, r) as (
select s, level, regexp_substr(s,'[^\^]+',1,level) from data
connect by regexp_substr(s,'[^\^]+',1,level) is not null and s = prior s and prior sys_guid() is not null
)
select s, listagg(distinct r, '^') within group(order by l) as r from splitted
group by s
;
Better still, if you have a PK, use it:
with data(id, s) as (
select 1, 'ab^bc^ab^de' from dual union all
select 2, 'mn^mn^op' from dual
),
splitted(id, l, r) as (
select id, level, regexp_substr(s,'[^\^]+',1,level) from data
connect by regexp_substr(s,'[^\^]+',1,level) is not null and id = prior id and prior sys_guid() is not null
)
select id, listagg(distinct r, '^') within group(order by l) as r from splitted
group by id
;

Just for fun, using XML:
with data(s) as (
select 'ab^bc^ab^de' from dual union all
select 'mn^mn^op' from dual
)
select *
from data,
xmltable(
q'{string-join( for $atom in distinct-values((ora:tokenize($X,"\^"))) order by $atom return $atom, "^" )}'
passing s as "X"
columns
column_value varchar2(64) path '.'
)
;
(or fn:tokenize, depending on the DB version)

How can I eliminate duplicate data in multiple columns query

I just asked a question about how to eliminate duplicate data in a column:
How can I eliminate duplicate data in column
The code below can delete duplicates in a column:
with data as
(
select 'apple, apple, apple, apple' col from dual
)
select listagg(col, ',') within group(order by 1) col
from (
select distinct regexp_substr(col, '[^,]+', 1, level) col
from data
connect by level <= regexp_count(col, ',')
)
My next question: I do not know how to eliminate duplicate data in multiple columns:
select 'apple, apple, apple' as col1,
'prince,prince,princess' as col2,
'dog, cat, cat' as col3
from dual;
I would like to get this result:
COL1 COL2 COL3
----- ---------------- --------
apple prince, princess dog, cat

You may use a combination like this:
select
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('apple, apple, apple','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('apple, apple, apple',',') + 1
)
) as str1,
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('prince,prince,princess','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('prince,prince,princess',',') + 1
)
) as str2,
(
select listagg(str,',') within group (order by 0)
from
(
select distinct trim(regexp_substr('dog, cat, cat','[^,]+', 1, level)) as str
from dual
connect by level <= regexp_count('dog, cat, cat',',') + 1
)
) as str3
from dual;
STR1 STR2 STR3
------ --------------- --------
apple prince,princess cat,dog
Rextester Demo
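The per-column logic (split on the comma, TRIM each token, keep distinct values, re-join) can be sketched in Python like this. It is an illustration only, not Oracle code, and `dedupe_list` is a hypothetical name. Note that `ORDER BY 0` in the SQL leaves the order unspecified, whereas this sketch sorts the tokens so the result is deterministic:

```python
def dedupe_list(value: str, delim: str = ",") -> str:
    # TRIM each token; empty results are dropped, as LISTAGG ignores NULLs
    tokens = {tok.strip() for tok in value.split(delim) if tok.strip()}
    # DISTINCT comes from the set; sorted() fixes the output order
    return delim.join(sorted(tokens))


row = {"col1": "apple, apple, apple", "col2": "prince,prince,princess", "col3": "dog, cat, cat"}
print({col: dedupe_list(val) for col, val in row.items()})
# {'col1': 'apple', 'col2': 'prince,princess', 'col3': 'cat,dog'}
```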

Oracle SQL Replace multiple characters in different positions

I'm using Oracle 11g and I'm having trouble replacing multiple characters based on positions mentioned in a different table. For example:
Table 1
PRSKEY POSITION CHARACTER
123 3 ć
123 9 ć
Table 2
PRSKEY NAME
123 Becirovic
I have to replace the NAME in Table 2 to Bećirović.
I've tried REGEXP_REPLACE, but it can't replace characters at more than one position at a time. Is there an easy way to do this?

Here's another way to do it.
with tab1 as (select 123 as prskey, 3 as position, 'ć' as character from dual
union select 123, 9, 'ć' from dual),
tab2 as (select 123 as prskey, 'Becirovic' as name from dual)
select listagg(nvl(tab1.character, namechar)) within group(order by lvl)
from
(select prskey, substr(name, level, 1) as namechar, level as lvl
from tab2
connect by level <= length(name)
) splitname
left join tab1 on position = lvl and tab1.prskey = splitname.prskey
;
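The split, left join and re-aggregate idea in the query above boils down to the following Python sketch. It only illustrates the logic and is not Oracle code; `replace_positions` is a hypothetical name, and the replacements are passed as a position-to-character mapping standing in for Table 1:

```python
def replace_positions(name: str, replacements: dict[int, str]) -> str:
    # one list element per character, like CONNECT BY LEVEL <= LENGTH(name)
    chars = list(name)
    for position, character in replacements.items():
        # POSITION is 1-based, as in Table 1; untouched characters keep their
        # original value, like NVL(tab1.character, namechar)
        chars[position - 1] = character
    # glue back in character order, like LISTAGG(...) WITHIN GROUP (ORDER BY lvl)
    return "".join(chars)


print(replace_positions("Becirovic", {3: "ć", 9: "ć"}))  # Bećirović
```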

A simple solution using a cursor:
create table t1 (
prskey int,
pos int,
character char(1)
);
create table t2
(
prskey int,
name varchar2(100)
);
insert into t1 values (1, 1, 'b');
insert into t1 values (1, 3, 'e');
insert into t2 values (1, 'dear');
begin
for t1rec in (select * from t1) loop
update t2
set name = substr(name, 1, t1rec.pos - 1) || t1rec.character || substr(name, t1rec.pos + 1, length(name) - t1rec.pos)
where t2.prskey = t1rec.prskey;
end loop;
end;
/

I would prefer a PL/SQL approach, but since your question is tagged only 'sql', I made this monster:
with t as (
select 123 as id, 3 as pos, 'q' as new_char from dual
union all
select 123 as id, 6 as pos, 'z' as new_char from dual
union all
select 123 as id, 9 as pos, '1' as new_char from dual
union all
select 456 as id, 1 as pos, 'A' as new_char from dual
union all
select 456 as id, 4 as pos, 'Z' as new_char from dual
),
t1 as (
select 123 as id, 'Becirovic' as str from dual
union all
select 456 as id, 'Test' as str from dual
)
select listagg(out_text) within group (order by pos)
from(
select id, pos, new_char, str, prev, substr(str,prev,pos-prev)||new_char as out_text
from(
select id, pos, new_char, str, nvl(lag(pos) over (partition by id order by pos)+1,1) as prev
from (
select t.id, pos, new_char, str
from t, t1
where t.id = t1.id
) q
) a
) w
group by id
Result:
Beqirzvi1
AesZ

Finding sequence in data and grouping by it

Data in the Phone_number column of my Temp_table looks like this:
1234560200
1234560201
1234560202
2264540300
2264540301
2264540302
2264540303
2264540304
2264540305
2264540306
I want to find sequences in the last 4 digits and get the first and last number of each sequence. For example:
The first 3 rows form the sequence 0200, 0201, 0202, so First = 0200 and Last = 0202.
The final output of this query should be:
First Last
0200 0202
0300 0306
I tried the query below, but I'm not sure about this approach:
WITH get_nxt_range AS
(
select substr(a.PHONE_NUMBER,7,4) range1,
LEAD(substr(a.PHONE_NUMBER,7,4)) OVER (ORDER BY a.PHONE_NUMBER ) nxt_range
from Temp_table a
)
SELECT range1,nxt_range FROM get_nxt_range
WHERE nxt_range = range1 +1
ORDER BY range1

One method to get sequences is to use the difference of row numbers approach. This works in your case as well:
select substr(phone_number, 1, 6),
min(substr(phone_number, 7, 4)), max(substr(phone_number, 7, 4))
from (select t.*,
(row_number() over (order by phone_number) -
row_number() over (partition by substr(phone_number, 1, 6) order by phone_number)
) as grp
from temp_table t
) t
group by substr(phone_number, 1, 6), grp;

I think something like this might work:
select
min (substr (phone_number, -4, 4)) as first,
max (substr (phone_number, -4, 4)) as last
from temp_table
group by
substr (phone_number, -4, 2)

SELECT DISTINCT
COALESCE(
first_in_sequence,
LAG( first_in_sequence ) IGNORE NULLS OVER ( ORDER BY phone_number )
) AS first_in_sequence,
COALESCE(
last_in_sequence,
LAG( last_in_sequence ) IGNORE NULLS OVER ( ORDER BY phone_number )
) AS last_in_sequence
FROM (
SELECT phone_number,
CASE phone_number
WHEN LAG( phone_number ) OVER ( ORDER BY phone_number ) + 1
THEN NULL
ELSE phone_number
END AS first_in_sequence,
CASE phone_number
WHEN LEAD( phone_number ) OVER ( ORDER BY phone_number ) - 1
THEN NULL
ELSE phone_number
END AS last_in_sequence
FROM temp_table
);
Update:
CREATE TABLE phone_numbers ( phone_number ) AS
select 1234560200 from dual union all
select 1234560201 from dual union all
select 1234560202 from dual union all
select 2264540300 from dual union all
select 2264540301 from dual union all
select 2264540302 from dual union all
select 2264540303 from dual union all
select 2264540304 from dual union all
select 2264540305 from dual union all
select 2264540306 from dual;
SELECT MIN( phone_number ) AS first_in_sequence,
MAX( phone_number ) AS last_in_sequence
FROM (
SELECT phone_number,
phone_number - ROW_NUMBER() OVER ( ORDER BY phone_number ) AS grp
FROM phone_numbers
)
GROUP BY grp;
Output:
FIRST_IN_SEQUENCE LAST_IN_SEQUENCE
----------------- ----------------
2264540300 2264540306
1234560200 1234560202
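The `phone_number - ROW_NUMBER()` trick works because, within a run of consecutive values, the value and its row number both grow by 1, so their difference stays constant and only changes at a gap. A minimal Python sketch of that logic (an illustration only, not Oracle code; `consecutive_ranges` is a hypothetical name, and it assumes the numbers contain no duplicates):

```python
def consecutive_ranges(numbers: list[int]) -> list[tuple[int, int]]:
    groups: dict[int, list[int]] = {}
    for rn, n in enumerate(sorted(numbers), start=1):  # ROW_NUMBER() OVER (ORDER BY n)
        # n - rn stays constant while values are consecutive and jumps at each gap
        groups.setdefault(n - rn, []).append(n)
    # MIN/MAX per group, like GROUP BY grp
    return [(min(g), max(g)) for g in groups.values()]


phones = [1234560200, 1234560201, 1234560202,
          2264540300, 2264540301, 2264540302, 2264540303,
          2264540304, 2264540305, 2264540306]
print(consecutive_ranges(phones))
# [(1234560200, 1234560202), (2264540300, 2264540306)]
```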

If 1234560201, 1234560203, 1234560204 should count as two separate ranges (0201 on its own and 0203 to 0204), then this should work:
with tt as (
select substr(PHONE_NUMBER,7,4) id from Temp_table
),
t as (
select
t1.id,
case when t3.id is null then 1 else 0 end start,
case when t2.id is null then 1 else 0 end "end"
from tt t1
-- no next adjacent element - we have an end of interval
left outer join tt t2 on t2.id - 1 = t1.id
-- not previous adjacent element - we have a start of interval
left outer join tt t3 on t3.id + 1 = t1.id
-- select starts and ends only
where t2.id is null or t3.id is null
)
-- find nearest end record for each start record (it may be the same record)
select t1.id, (select min(id) from t where id >= t1.id and "end" = 1)
from t t1
where t1.start = 1
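The self-join logic above (a start has no predecessor, an end has no successor, and each start is paired with the nearest end at or after it) can be sketched in Python as follows. This is an illustration only, not Oracle code; `ranges_by_neighbours` is a hypothetical name and the inputs are plain integers rather than the substrings used in the SQL:

```python
def ranges_by_neighbours(ids: list[int]) -> list[tuple[int, int]]:
    present = set(ids)
    # a start has no previous adjacent element (the t3 outer join finds nothing)
    starts = sorted(i for i in present if i - 1 not in present)
    # an end has no next adjacent element (the t2 outer join finds nothing)
    ends = sorted(i for i in present if i + 1 not in present)
    # pair each start with the nearest end at or after it (may be the same value)
    return [(s, min(e for e in ends if e >= s)) for s in starts]


print(ranges_by_neighbours([201, 203, 204]))  # [(201, 201), (203, 204)]
```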

I see others have already answered your question.
I just want to propose my own variant of how to solve this task:
with list_num (phone_number) as (
select 1234560200 from dual union all
select 1234560201 from dual union all
select 1234560202 from dual union all
select 2264540300 from dual union all
select 2264540301 from dual union all
select 2264540302 from dual union all
select 2264540303 from dual union all
select 2264540304 from dual union all
select 2264540305 from dual union all
select 2264540306 from dual)
select root as from_value,
max(phone_number) keep (dense_rank last order by lvl) as to_value
from
(select phone_number, level as lvl, CONNECT_BY_ROOT phone_number as root
from
(select phone_number,
decode(phone_number-lag (phone_number) over(order by phone_number),1,1,0) as start_value
from list_num) b
connect by nocycle phone_number = prior phone_number + 1
start with start_value = 0)
group by root
having count(1) > 1
If you need only the last 4 digits, just apply substr:
substr(root,7,4) as from_value,
substr(max(phone_number) keep (dense_rank last order by lvl),7,4) as to_value
Thanks.

split string into several rows

I have a table with a string which contains several delimited values, e.g. a;b;c.
I need to split this string and use its values in a query. For example, I have the following table:
str
a;b;c
b;c;d
a;c;d
I need to group by each individual value from the str column to get the following result:
str count(*)
a 2
b 2
c 3
d 2
Is it possible to implement this using a single SELECT query? I cannot create temporary tables to extract the values into and then query against them.

From your comment on @PrzemyslawKruglej's answer:
Main problem is with internal query with connect by, it generates astonishing amount of rows
The amount of rows generated can be reduced with the following approach:
/* test table populated with sample data from your question */
SQL> create table t1(str) as(
2 select 'a;b;c' from dual union all
3 select 'b;c;d' from dual union all
4 select 'a;c;d' from dual
5 );
Table created
-- The number of rows generated depends solely on the longest
-- string.
-- If (say) the longest string contains 3 words (not counting the separator `;`)
-- and we have 100 rows in our table, then we will end up with 300 rows
-- for further processing, no more.
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select count(regexp_substr(t1.str, '[^;]+', 1, o.ocr)) as generated_for_3_rows
from t1
cross join occurrence o;
Result: For three rows where the longest one is made up of three words, we will generate 9 rows:
GENERATED_FOR_3_ROWS
--------------------
9
Final query:
with occurrence(ocr) as(
select level
from ( select max(regexp_count(str, '[^;]+')) as mx_t
from t1 ) t
connect by level <= mx_t
)
select res
, count(res) as cnt
from (select regexp_substr(t1.str, '[^;]+', 1, o.ocr) as res
from t1
cross join occurrence o)
where res is not null
group by res
order by res;
Result:
RES CNT
----- ----------
a 2
b 2
c 3
d 2
SQLFiddle Demo
Find out more about the regexp_count() (11g and up) and regexp_substr() regular expression functions.
Note: Regular expression functions are relatively expensive to compute, and when it comes to processing a very large amount of data, it might be worth switching to plain PL/SQL. Here is an example.
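The row-count argument above (every row is cross-joined with 1..max occurrences, and only the non-NULL tokens survive) can be sketched in Python. This only illustrates the logic and is not Oracle code; `count_tokens` is a hypothetical name:

```python
from collections import Counter


def count_tokens(rows: list[str], delim: str = ";") -> Counter:
    # '[^;]+' only matches non-empty tokens, so empty pieces are dropped
    split_rows = [[tok for tok in row.split(delim) if tok] for row in rows]
    # MAX(REGEXP_COUNT(str, '[^;]+')): how many occurrences to try per row
    mx = max((len(tokens) for tokens in split_rows), default=0)
    counts: Counter = Counter()
    for tokens in split_rows:
        for ocr in range(1, mx + 1):  # CROSS JOIN occurrence: len(rows) * mx candidates
            if ocr <= len(tokens):  # REGEXP_SUBSTR returns NULL past the last token
                counts[tokens[ocr - 1]] += 1
    return counts


print(sorted(count_tokens(["a;b;c", "b;c;d", "a;c;d"]).items()))
# [('a', 2), ('b', 2), ('c', 3), ('d', 2)]
```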

This is ugly, but it seems to work. The problem with CONNECT BY splitting over several rows is that it returns duplicate rows. I managed to get rid of them, but you'll have to test it:
WITH
data AS (
SELECT 'a;b;c' AS val FROM dual
UNION ALL SELECT 'b;c;d' AS val FROM dual
UNION ALL SELECT 'a;c;d' AS val FROM dual
)
SELECT token, COUNT(1)
FROM (
SELECT DISTINCT token, lvl, val, p_val
FROM (
SELECT
regexp_substr(val, '[^;]+', 1, level) AS token,
level AS lvl,
val,
NVL(prior val, val) p_val
FROM data
CONNECT BY regexp_substr(val, '[^;]+', 1, level) IS NOT NULL
)
WHERE val = p_val
)
GROUP BY token;
TOKEN COUNT(1)
-------------------- ----------
d 2
b 2
a 2
c 3

SELECT NAME, COUNT(NAME)
FROM (SELECT REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) NAME
      FROM dual
      CONNECT BY REGEXP_SUBSTR('a;b;c', '[^;]+', 1, LEVEL) IS NOT NULL
      UNION ALL
      SELECT REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) NAME
      FROM dual
      CONNECT BY REGEXP_SUBSTR('b;c;d', '[^;]+', 1, LEVEL) IS NOT NULL
      UNION ALL
      SELECT REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) NAME
      FROM dual
      CONNECT BY REGEXP_SUBSTR('a;c;d', '[^;]+', 1, LEVEL) IS NOT NULL)
GROUP BY NAME;
NAME COUNT(NAME)
----- -----------
d 2
a 2
b 2
c 3
