SQL - Splitting a single column into multiple rows

I have a record that looks like this in the database (As an example).
ID, Name, Brand
1, 'Bike', 'Schwinn'
2, 'Car', 'Ford, Honda, Chevy'
3, 'Bike', 'Schwinn, Trex'
4, 'Car', 'Honda'
I need to export the data and create multiple records where Brand has multiple entries. I also need to increase the ID on output so I don't have duplicates. (I can use a sequence for this and would start it above the max ID value in the db.)
My output would look like
ID, Name, Brand
1, Bike, Schwinn
2, Car, Ford
Sequence.nextval, Car, Honda
Sequence.nextval, Car, Chevy
3, Bike, Schwinn
Sequence.nextval, Bike, Trex
4, Car, Honda
I would like to try to do this with a SQL statement. Basically I'm dumping this data as a csv file via straight SQL.
My difficulty is trying to loop/split through the Brand column.

You can use the following select statement:
with test_tab (ID, Name, Brand) as (
select 1, 'Bike', 'Schwinn' from dual union all
select 2, 'Car', 'Ford, Honda, Chevy' from dual union all
select 3, 'Bike', 'Schwinn, Trex' from dual union all
select 4, 'Car', 'Honda' from dual)
--------------------
-- End of Data Preparation
--------------------
select case when level <> 1 then <your_sequence>.nextval else id end as id,
name,
trim(regexp_substr(Brand, '[^,]+', 1, level)) BRAND
from test_tab
connect by regexp_substr(Brand, '[^,]+', 1, level) is not null
and prior Brand = Brand
and prior sys_guid() is not null;
The output would be:
ID NAME BRAND
---------------------
2 Car Ford
5 Car Honda
6 Car Chevy
4 Car Honda
1 Bike Schwinn
3 Bike Schwinn
7 Bike Trex
You can write the insert statement as:
Insert into <destination_table>
select case when level <> 1 then <your_sequence>.nextval else id end as id,
name,
trim(regexp_substr(Brand, '[^,]+', 1, level)) BRAND
from <source_table>
connect by regexp_substr(Brand, '[^,]+', 1, level) is not null
and prior Brand = Brand
and prior sys_guid() is not null;
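PS: If ID is unique, you can replace and prior Brand = Brand with and prior ID = ID. With the Brand condition, two source rows that hold the same Brand string connect to each other and produce duplicate rows; connecting on the unique ID avoids that.
select case when level <> 1 then <your_sequence>.nextval else id end as id,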
name,
trim(regexp_substr(Brand, '[^,]+', 1, level)) BRAND
from <source_table>
connect BY regexp_substr(Brand, '[^,]+', 1, level) is not null
and prior ID = ID
and prior sys_guid() is not null;
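To make the row-generation logic easier to follow outside the database, here is a minimal Python sketch of what the query does. It is an illustration only, not part of the answer: the function name split_brands and the counter standing in for the sequence are assumptions, and the start value 5 is simply one above the max ID in the sample data.

```python
from itertools import count

# Sample rows from the question: (id, name, comma-separated brands).
rows = [
    (1, "Bike", "Schwinn"),
    (2, "Car", "Ford, Honda, Chevy"),
    (3, "Bike", "Schwinn, Trex"),
    (4, "Car", "Honda"),
]


def split_brands(rows, first_new_id):
    """Yield one (id, name, brand) tuple per brand.

    The first brand keeps the original id (LEVEL = 1 in the query);
    every further brand takes the next value of a counter that stands
    in for the database sequence (hypothetical stand-in).
    """
    seq = count(first_new_id)
    for row_id, name, brands in rows:
        # Similar to regexp_substr(brand, '[^,]+', 1, level) plus trim():
        # tokens between adjacent commas are skipped.
        tokens = [b.strip() for b in brands.split(",") if b]
        for level, brand in enumerate(tokens, start=1):
            yield (row_id if level == 1 else next(seq), name, brand)


result = list(split_brands(rows, first_new_id=5))
for r in result:
    print(r)
```

Unlike the hierarchical query, this sketch emits rows in source order, so the new IDs are handed out in the order shown in the question.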

Related

Separate house number and addition in oracle SQL

I have a problem with my Oracle SQL string and don't get the correct result.
I have a table with Housenumber and addition in one field, i.e. 16f
As result I want it in 2 Fields:
Housenumber Addition
16 f
Housenumber is a Number (1 or more digits)
Addition is a Letter
I have the same problem with the fields Ortsname and Ortszusatz, and there it works. But I can't get it to work with the Housenumber: the result is a duplication of my entries.
WITH TEST_DATA AS
(SELECT distinct '*' Ort, Nummer FROM adresses)
SELECT
Houseid,
Streetid,
Gemeindeschl,
Gemeinde,
Bundesland,
Landkreis,
REGEXP_SUBSTR(t.Ort, '[^,]+', 1, 1) Ort,
REGEXP_SUBSTR(t.Ort, '[^,]+', 1, 2) Ortszusatz,
Strasse,
regexp_substr(t.Nummer, '[^0-9,]',1, 1) Housenumber,
regexp_substr(t.Nummer, '[^A-Z,]',1, 2) Addition,
Objektkl,
Lng,
Lat,
Plz
FROM adresses T
As you said - fetch "digits" and "letters" separately:
with addresses (houseid, housenumber) as
  (select 1, '16f' from dual union all
   select 2, '20' from dual
  )
select houseid,
       regexp_substr(housenumber, '[[:digit:]]+') housenumber,
       regexp_substr(housenumber, '[[:alpha:]]+') addition
from addresses;

   HOUSEID HOUSENUMBER     ADDITION
---------- --------------- ---------------
         1 16              f
         2 20
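The same "first run of digits, first run of letters" idea can be checked quickly outside the database. The following is a small Python sketch, not part of the answer; the function name split_house_number is made up for illustration, and the pattern assumes ASCII letters only (the Oracle class [[:alpha:]] is broader).

```python
import re


def split_house_number(value):
    """Return (house_number, addition) taken from a string like '16f'.

    Either part is None when the string has no digits or no letters,
    which corresponds to the NULL that regexp_substr returns.
    """
    digits = re.search(r"[0-9]+", value)      # first run of digits
    letters = re.search(r"[A-Za-z]+", value)  # first run of ASCII letters
    return (
        digits.group() if digits else None,
        letters.group() if letters else None,
    )


print(split_house_number("16f"))
print(split_house_number("20"))
```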

Process distinct comma-separated value in oracle

I have a column with the following data:
Brand
-------------
Audi, Opel, Ford
Skoda, Renault
Audi, BMW
Audi, Volkswagen, Opel
Toyota, Hyundai
I would like a query that automatically assigns the data to groups as follows:
Brand
-------------------
Audi, Opel, Ford, BMW, Volkswagen
Skoda, Renault
Toyota, Hyundai
Note that if we insert another record into the table like this ...
Toyota, BMW
... the required output would be:
Brand
-------------------
Audi, Opel, Ford, BMW, Volkswagen, Toyota, Hyundai
Skoda, Renault
This is an interesting and difficult problem, obscured by your poor data model (which violates First Normal Form). Normalizing the data - and de-normalizing at the end - is trivial, it's just an annoyance (and it will make the query much slower). The interesting part: the input groups are the nodes of a graph, two nodes are connected if they have a "make" in common. You need to find the connected components of the graph; this is the interesting problem.
Here is a complete solution (creating the testing data on the fly, in the first factored subquery in the with clause). Question for you though: even assuming that this solution works for you and you put it in production, who is going to maintain it in the future?
EDIT: It occurred to me that my original query can be simplified. Here is the revised version.
with
sample_data (brand) as (
select 'Audi, Opel, Ford' from dual union all
select 'Skoda, Renault' from dual union all
select 'Audi, BMW' from dual union all
select 'Audi, Volkswagen, Opel' from dual union all
select 'Toyota, Hyundai' from dual union all
select 'Tesla' from dual
)
, prep (id, brand) as (
select rownum, brand
from sample_data
)
, fnf (id, brand) as (
select p.id, ca.brand
from prep p cross apply
( select trim(regexp_substr(p.brand, '[^,]+', 1, level)) as brand
from dual
connect by level <= regexp_count(p.brand, '[^,]+')
) ca
)
, g (b1, b2) as (
select distinct fnf1.brand, fnf2.brand
from fnf fnf1 join fnf fnf2 on fnf1.id = fnf2.id
)
, cc (rt, brand) as (
select min(connect_by_root b1), b2
from g
connect by nocycle b1 = prior b2
group by b2
)
select listagg(brand, ', ') within group (order by null) as brand
from cc
group by rt;
Output:
BRAND
---------------------------------------------
Audi, BMW, Ford, Opel, Volkswagen
Hyundai, Toyota
Renault, Skoda
Tesla
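For readers who want to see the connected-components idea on its own, here is a hedged Python sketch. It uses a union-find (disjoint-set) structure, which is a different technique from the hierarchical query above, and the function name group_brands is an assumption made for illustration. Brands within each output group are sorted alphabetically, which happens to match the output shown above.

```python
def group_brands(rows):
    """Merge comma-separated brand lists that share at least one brand.

    Each brand is a node; brands listed in the same row are connected.
    Returns one sorted, comma-separated string per connected component.
    """
    parent = {}

    def find(x):
        # Register unseen brands as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for row in rows:
        brands = [b.strip() for b in row.split(",") if b.strip()]
        for brand in brands:
            find(brand)  # make sure single-brand rows are registered too
        for other in brands[1:]:
            union(brands[0], other)

    groups = {}
    for brand in list(parent):
        groups.setdefault(find(brand), []).append(brand)
    return sorted(", ".join(sorted(g)) for g in groups.values())


sample = [
    "Audi, Opel, Ford",
    "Skoda, Renault",
    "Audi, BMW",
    "Audi, Volkswagen, Opel",
    "Toyota, Hyundai",
    "Tesla",
]
for line in group_brands(sample):
    print(line)
```

Adding the row 'Toyota, BMW' from the question to the sample merges the Toyota group into the Audi group, as the question requires.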
This is a standard connected components problem. You can find a fast PL/SQL solution for production use here: http://orasql.org/2017/09/29/connected-components/
Or, for purely educational purposes, you can use an SQL-only solution:
https://gist.github.com/xtender/b6e5cac4dec461c0121145b0e62c5cf5
with t(Brand) as (
select 'Audi, Opel, Ford' brand from dual union all
select 'Skoda, Renault' from dual union all
select 'Audi, BMW' from dual union all
select 'Audi, Volkswagen, Opel' from dual union all
select 'Toyota, Hyundai' from dual union all
select 'Tesla' from dual union all
select 'A'||level||', A'||(level+1) from dual connect by level<=500 union all
select 'B'||level||', B'||(level+1) from dual connect by level<=500 union all
select 'C'||level||', C'||(level+1) from dual connect by level<=500
)
,split_tab as (
select
dense_rank()over(order by t.brand) rn
,x.*
from t,
xmltable(
'ora:tokenize(concat(",",.),",")[position()>1]'
passing t.brand
columns
n for ordinality
,name varchar2(20) path 'normalize-space(.)'
) x
)
,pairs as (
select
t1.rn, t1.name name1, t2.name name2
from split_tab t1
,split_tab t2
where t1.rn=t2.rn
)
select listagg(x,',')within group(order by x)
from (
select x, min(root) grp
from (
select distinct connect_by_root(name1) root, name1 x
from pairs
connect by nocycle
prior name1 = name2
)
group by x
)
group by grp
/
PS. I've split my solution into the smallest possible steps, so you can run each CTE separately to see how the results are built up.

Sort strings/words alphabetically separated by comma within a column in SQL (entire column)

Let's say that I have a table with the following data:
(there are 1000+ more rows like this)
Bird
----------------------------
Sparrow, Eagle, Crow
Woodpecker, Sparrow
Crow, Eagle
etc. etc.
I want the final column to be sorted out alphabetically. Something like this:
Bird
--------------------
Crow, Eagle, Sparrow
Sparrow, Woodpecker
Crow, Eagle
etc. etc.
I need a SQL query that can do that, possibly in SQL Developer.
Here is an Oracle solution using Common Table Expressions (CTEs) to break the problem down. Not sure if this will help, but maybe it will give you an idea or a starting point that you can apply to your environment.
-- Set up original data set
with bird_tbl(id, unsorted_list) as (
select 1, 'Sparrow, Eagle, Crow' from dual union all
select 2, 'Woodpecker, Sparrow' from dual union all
select 3, 'Crow, Eagle' from dual
),
-- Split the list into a row for each element
split_tbl(id, bird) as (
select id, regexp_substr(unsorted_list, '(.*?)(, |$)', 1, level, null, 1)
from bird_tbl
connect by level <= regexp_count(unsorted_list, ', ')+1
and prior id = id
and prior sys_guid() is not null
)
-- select * from split_tbl;
-- Rebuild the sorted row
select id, listagg(bird, ', ')
within group (order by bird) sorted_list
from split_tbl
group by id;
ID SORTED_LIST
---------- --------------------
1 Crow, Eagle, Sparrow
2 Sparrow, Woodpecker
3 Crow, Eagle
EDIT: Here's how to apply to your situation. Just replace <your_primary_key> with the primary key column name, <your_column_name> with the name of the column that contains the unsorted list and <your_table_name> with the name of the table.
with split_tbl(<your_primary_key>, <your_column_name>) as (
select <your_primary_key>, regexp_substr(<your_column_name>, '(.*?)(, |$)', 1, level, null, 1)
from <your_table_name>
connect by level <= regexp_count(<your_column_name>, ', ')+1
and prior <your_primary_key> = <your_primary_key>
and prior sys_guid() is not null
)
-- select * from split_tbl;
-- Rebuild the sorted row
select <your_primary_key>, listagg(<your_column_name>, ', ')
within group (order by <your_column_name>) sorted_list
from split_tbl
group by <your_primary_key>;
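The split, sort, and re-join steps above can also be sketched in a few lines of Python to sanity-check the expected result. This is only an illustration: the function name sort_list is made up, the separator is assumed to be exactly ', ', and Python compares strings by code point, which may differ from the database's sort settings for non-ASCII data.

```python
def sort_list(value, sep=", "):
    """Split a delimited list, sort its elements, and join them again."""
    return sep.join(sorted(value.split(sep)))


birds = ["Sparrow, Eagle, Crow", "Woodpecker, Sparrow", "Crow, Eagle"]
sorted_birds = [sort_list(b) for b in birds]
for line in sorted_birds:
    print(line)
```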

How do I separate and parse out data from multiple columns into separate rows (Oracle)

I have columns with multiple values delimited by a comma in each column and row. I am trying to separate them out into separate rows. If I have a null value for one of them (as shown below), I still want to include the null value as long as one of the other values is present for that particular row.
What I'm given
First_Name (John, ,Phil)
Last_Name (Smith,No, )
Location (CA,GA,NY)
What I want
(John, Smith, CA)
( , No, GA)
(Phil, ,NY)
I've tried using the regexp_substr method but it's not returning any rows that have a null in any one of the 3 columns listed above.
with
inputs ( id, first_name, last_name, location ) as (
select 101, 'John,,Phil' , 'Smith,No,' , 'CA,GA,NY' from dual union all
select 102, 'Jo,Al,Ed,Li', 'Ng,Tso,,Roth', ',ZZ,,BB' from dual
)
-- End of simulated inputs (for testing only, not part of the solution).
-- SQL query begins BELOW THIS LINE. Use your actual table and column names.
select id,
regexp_substr(first_name, '([^,]*)(,|$)', 1, level, null, 1) as first_name,
regexp_substr(last_name , '([^,]*)(,|$)', 1, level, null, 1) as last_name,
regexp_substr(location , '([^,]*)(,|$)', 1, level, null, 1) as location
from inputs
connect by level <= regexp_count(first_name, ',') + 1
and prior id = id
and prior sys_guid() is not null
;
  ID FIRST_NAME  LAST_NAME    LOCATION
---- ----------- ------------ --------
 101 John        Smith        CA
 101             No           GA
 101 Phil                     NY
 102 Jo          Ng
 102 Al          Tso          ZZ
 102 Ed
 102 Li          Roth         BB
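The key point of the query above is that empty positions are kept, so the three columns stay aligned by position. A minimal Python sketch of the same positional logic follows; it is an illustration under assumptions, not part of the answer. The function name explode is made up, and, like the query, it takes the number of output rows from first_name.

```python
def explode(row_id, first_name, last_name, location):
    """Split three comma-delimited strings into aligned rows.

    Empty positions become None (NULL in the query). The row count comes
    from first_name, like regexp_count(first_name, ',') + 1.
    """
    columns = [c.split(",") for c in (first_name, last_name, location)]
    row_count = len(columns[0])
    result = []
    for i in range(row_count):
        values = [(c[i] or None) if i < len(c) else None for c in columns]
        result.append((row_id, *values))
    return result


for row in explode(101, "John,,Phil", "Smith,No,", "CA,GA,NY"):
    print(row)
```

A pattern such as '[^,]+' would skip the empty positions instead, which shifts later values to the wrong row; that is why the query uses '([^,]*)(,|$)'.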
You can try something like this.
SET SERVEROUTPUT ON;
DECLARE
TYPE etype IS TABLE OF VARCHAR2(100);
erec etype;
BEGIN
for rec IN ( SELECT first_name,last_name,location FROM Table1 )
LOOP
WITH fname
AS (SELECT LEVEL lvl,
REGEXP_SUBSTR(rec.first_name, '[^,]+', 1, LEVEL)First_name
FROM DUAL
CONNECT BY REGEXP_SUBSTR(rec.first_name, '[^,]+', 1, LEVEL) IS NOT NULL),
lname
AS (SELECT LEVEL lvl,
REGEXP_SUBSTR(rec.last_name, '[^,]+', 1, LEVEL)Last_Name
FROM DUAL
CONNECT BY REGEXP_SUBSTR(rec.last_name, '[^,]+', 1, LEVEL) IS NOT NULL),
loc
AS (SELECT LEVEL lvl,
REGEXP_SUBSTR(rec.location, '[^,]+', 1, LEVEL)Location
FROM DUAL
CONNECT BY REGEXP_SUBSTR(rec.location, '[^,]+', 1, LEVEL) IS NOT NULL)
SELECT first_name
||','
|| last_name
||','
|| location BULK COLLECT INTO erec
FROM fname fn
FULL OUTER join lname ln
ON fn.lvl = ln.lvl
FULL OUTER join loc lo
ON ln.lvl = lo.lvl;
FOR i IN 1..erec.COUNT
LOOP
DBMS_OUTPUT.PUT_LINE(erec(i));
END LOOP;
END LOOP;
END;
/

PRIOR in SELECT list

I can't understand what it adds to the result of the query. From the book that I'm learning:
If you prefix a column name with PRIOR in the
select list (SELECT PRIOR EMPLOYEE_ID, ...), you specify the “prior” row’s value.
SELECT PRIOR EMPLOYEE_ID, MANAGER_ID, LPAD(' ', LEVEL * 2) || EMPLOYEES.JOB_ID
FROM EMPLOYEES
START WITH EMPLOYEE_ID = 100
CONNECT BY PRIOR EMPLOYEE_ID = MANAGER_ID;
The only difference I see, is that it adds a NULL value in the first row and increments IDs of employees by 1.
PRIOR just takes a value from the previous (parent) record in the traversed hierarchy.
I think the best way to understand how it works is to play with a simple hierarchy:
create table qwerty(
id int,
name varchar2(100),
parent_id int
);
insert all
into qwerty values( 1, 'Grandfather', null )
into qwerty values( 2, 'Father', 1 )
into qwerty values( 3, 'Son', 2 )
into qwerty values( 4, 'Grandson', 3 )
select 1234 from dual;
The below query traverses the above hierarchy:
select level, t.*
from qwerty t
start with name = 'Grandfather'
connect by prior id = parent_id
     LEVEL         ID NAME                  PARENT_ID
---------- ---------- -------------------- ----------
         1          1 Grandfather
         2          2 Father                        1
         3          3 Son                           2
         4          4 Grandson                      3
If we add "PRIOR name" to the above query, then the name of the "parent" is displayed. This value is taken from the previous record in the hierarchy (from LEVEL-1):
select level, prior name as parent_name, t.*
from qwerty t
start with name = 'Grandfather'
connect by prior id = parent_id;
     LEVEL PARENT_NAME                  ID NAME                  PARENT_ID
---------- -------------------- ---------- -------------------- ----------
         1                               1 Grandfather
         2 Grandfather                   2 Father                        1
         3 Father                        3 Son                           2
         4 Son                           4 Grandson                      3
The PRIOR operator returns a value from the parent row in a hierarchy built using the CONNECT BY clause.
WITH hierarchy(id, parent_id, value) AS (
SELECT 1, NULL, 'root' FROM dual UNION ALL
SELECT 2, 1, 'child 1' FROM dual UNION ALL
SELECT 3, 1, 'child 2' FROM dual UNION ALL
SELECT 4, 3, 'grand child 1' FROM dual
)
SELECT
hierarchy.*, LEVEL depth, PRIOR value
FROM
hierarchy
START WITH
parent_id IS NULL
CONNECT BY
PRIOR id = parent_id
This simple query connects the rows from the root to the leaves. The PRIORVALUE column returns the VALUE of each row's parent row (its predecessor within the hierarchy): the parent of 'grand child 1' is 'child 2', and the parent of 'child 1' is 'root'. The first row within the hierarchy, 'root' (LEVEL = 1), has no parent, so PRIOR returns NULL.
If you connect the hierarchy in the opposite direction, from a leaf to the root, the PRIOR operator will return the child row that was used to connect the row you're looking at.
The LEVEL column shows the depth of each row within the hierarchy.
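As a rough mental model, the traversal can be written as a small recursive function in Python, where "prior" is simply the row that led to the current one. This is a sketch for illustration only: the function name walk is made up, it assumes the data contains no cycles, and it says nothing about how the database actually executes the query.

```python
rows = [
    {"id": 1, "parent_id": None, "value": "root"},
    {"id": 2, "parent_id": 1, "value": "child 1"},
    {"id": 3, "parent_id": 1, "value": "child 2"},
    {"id": 4, "parent_id": 3, "value": "grand child 1"},
]


def walk(rows, prior=None, level=1):
    """Depth-first traversal from the root down to the leaves.

    Mirrors START WITH parent_id IS NULL CONNECT BY PRIOR id = parent_id.
    Yields (id, value, depth, prior_value) for every row reached.
    """
    prior_id = prior["id"] if prior else None
    for row in rows:
        if row["parent_id"] == prior_id:
            # PRIOR value: the value of the row we came from, None at the root.
            prior_value = prior["value"] if prior else None
            yield (row["id"], row["value"], level, prior_value)
            yield from walk(rows, row, level + 1)


result = list(walk(rows))
for r in result:
    print(r)
```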