How to split comma-separated values into multiple rows in an Oracle table

SELECT year, movietitle, director, actorname
FROM films11
WHERE actorname like '%Christina Ricci%'
order by year asc;
produces the following in Oracle SQL Developer from the original data schema.
I want to transform the whole table so that the actor name becomes the primary key (as in the second table).
This way the query
SELECT year, movietitle, director, actorname
FROM films11
WHERE actorname like '%Christina Ricci%'
order by year asc;
will return only the rows for the searched actor, either by creating a new view or by changing the data schema completely (as in the third table).

Step 1: "How to blow up a database"
From SQL Fiddle (Oracle 11g R2 Schema Setup):
Query 1:
select * from films11
Results:
| YEAR | DIRECTOR | MOVIETITLE | ACTORNAME |
|------|----------|------------|----------------|
| 2000 | dir1 | title1 | act1,act2 |
| 2001 | dir2 | title2 | act1,act2,act3 |
| 2002 | dir1 | title3 | act4 |
Query 2:
select YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
from films11 YT
join (select level as lvl
from dual
connect by level <= (select max(regexp_count(actorname,',')+1) from films11)
) lvl on lvl.lvl <= regexp_count(YT.actorname,',')+1
order by YT.year, YT.movietitle, actorname
This joins each row to a generated list of positions, one per actor in the list (a bounded Cartesian product):
Results:
| YEAR | MOVIETITLE | ACTORNAME |
|------|------------|-----------|
| 2000 | title1 | act1 |
| 2000 | title1 | act2 |
| 2001 | title2 | act1 |
| 2001 | title2 | act2 |
| 2001 | title2 | act3 |
| 2002 | title3 | act4 |
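If you want to try the same split outside Oracle, here is a minimal sketch using Python's sqlite3 module. SQLite has neither CONNECT BY nor REGEXP_SUBSTR, so this swaps in a different technique: a recursive CTE that repeatedly peels off the text before the next comma. It assumes the three sample rows from Query 1 and, unlike the Oracle query, silently drops empty tokens (e.g. from "a,,b").

```python
import sqlite3

# Sample rows from Query 1: (year, director, movietitle, actorname)
FILMS = [
    (2000, "dir1", "title1", "act1,act2"),
    (2001, "dir2", "title2", "act1,act2,act3"),
    (2002, "dir1", "title3", "act4"),
]

# Recursive CTE (SQLite substitute for CONNECT BY + REGEXP_SUBSTR):
# each step takes the text before the next comma as one actor and keeps
# the remainder. A trailing comma is appended so the last name is handled
# like the others.
SPLIT_SQL = """
WITH RECURSIVE split(year, movietitle, actor, rest) AS (
    SELECT year, movietitle, '', actorname || ',' FROM films11
    UNION ALL
    SELECT year, movietitle,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT year, movietitle, actor AS actorname
FROM split
WHERE actor <> ''          -- drops the seed row and any empty tokens
ORDER BY year, movietitle, actorname
"""


def split_actors(rows=FILMS):
    """Return one (year, movietitle, actorname) tuple per actor."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE films11 (year INTEGER, director TEXT, "
        "movietitle TEXT, actorname TEXT)"
    )
    conn.executemany("INSERT INTO films11 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(SPLIT_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in split_actors():
        print(row)
```

The output rows should match the Query 2 results above.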
You run it ONCE and use it to move everything into a normalized database.
Here is the full script to change your schema to something more convenient (note that identity columns require Oracle 12c or later):
CREATE TABLE actors(
id_actor NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
act_name VARCHAR2(100)
)
;
CREATE TABLE directors(
id_director NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
dir_name VARCHAR2(100)
)
;
CREATE TABLE movies(
id_movie NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
mov_year NUMBER,
mov_name VARCHAR2(100),
director_id NUMBER
)
;
CREATE TABLE playedby(
movie_id NUMBER,
actor_id NUMBER
)
;
INSERT INTO directors (dir_name)
SELECT DISTINCT director dir_name
FROM films11
;
INSERT INTO movies (mov_year, mov_name, director_id)
SELECT year mov_year, movietitle mov_name, directors.id_director director_id
FROM films11
INNER JOIN directors ON directors.dir_name = films11.director
;
INSERT INTO actors (act_name)
SELECT DISTINCT t.actorname act_name
FROM (
SELECT YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
FROM films11 YT
JOIN (SELECT level AS lvl
FROM dual
CONNECT BY level <= (SELECT MAX(REGEXP_COUNT(actorname,',')+1) FROM films11)
) lvl ON lvl.lvl <= REGEXP_COUNT(YT.actorname,',')+1
) t
;
INSERT INTO playedby (movie_id, actor_id)
SELECT movies.id_movie movie_id, actors.id_actor actor_id
FROM (
SELECT YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
FROM films11 YT
JOIN (SELECT level AS lvl
FROM dual
CONNECT BY level <= (SELECT MAX(REGEXP_COUNT(actorname,',')+1) FROM films11)
) lvl ON lvl.lvl <= REGEXP_COUNT(YT.actorname,',')+1
) t
INNER JOIN actors ON t.actorname = actors.act_name
INNER JOIN movies ON t.year = movies.mov_year AND t.movietitle = movies.mov_name
;
After that you can just run a SELECT like this:
Query 3:
SELECT mov_year, mov_name, dir_name, act_name
FROM movies
INNER JOIN directors ON directors.id_director = movies.director_id
INNER JOIN playedby ON movies.id_movie = playedby.movie_id
INNER JOIN actors ON playedby.actor_id = actors.id_actor
WHERE act_name like '%act2%'
order by mov_year asc
Results:
| MOV_YEAR | MOV_NAME | DIR_NAME | ACT_NAME |
|----------|----------|----------|----------|
| 2000 | title1 | dir1 | act2 |
| 2001 | title2 | dir2 | act2 |
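The whole migration can be sketched end to end in SQLite as well. This is a translation under assumptions, not the Oracle script itself: INTEGER PRIMARY KEY stands in for the identity columns, and a view with the hypothetical name film_actor holds the recursive-CTE split so that both the actors and playedby INSERTs can reuse it.

```python
import sqlite3

FILMS = [
    (2000, "dir1", "title1", "act1,act2"),
    (2001, "dir2", "title2", "act1,act2,act3"),
    (2002, "dir1", "title3", "act4"),
]

# Normalized schema; INTEGER PRIMARY KEY plays the role of the Oracle identity columns.
SCHEMA = """
CREATE TABLE actors    (id_actor    INTEGER PRIMARY KEY, act_name TEXT);
CREATE TABLE directors (id_director INTEGER PRIMARY KEY, dir_name TEXT);
CREATE TABLE movies    (id_movie    INTEGER PRIMARY KEY, mov_year INTEGER,
                        mov_name TEXT, director_id INTEGER);
CREATE TABLE playedby  (movie_id INTEGER, actor_id INTEGER);

-- film_actor (hypothetical view name): one row per (year, movietitle, actor)
CREATE VIEW film_actor AS
WITH RECURSIVE split(year, movietitle, actor, rest) AS (
    SELECT year, movietitle, '', actorname || ',' FROM films11
    UNION ALL
    SELECT year, movietitle,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split WHERE rest <> ''
)
SELECT year, movietitle, actor AS actorname FROM split WHERE actor <> '';
"""

# One-off migration, mirroring the four INSERT ... SELECT statements above
MIGRATION = """
INSERT INTO directors (dir_name) SELECT DISTINCT director FROM films11;
INSERT INTO movies (mov_year, mov_name, director_id)
    SELECT f.year, f.movietitle, d.id_director
    FROM films11 f JOIN directors d ON d.dir_name = f.director;
INSERT INTO actors (act_name) SELECT DISTINCT actorname FROM film_actor;
INSERT INTO playedby (movie_id, actor_id)
    SELECT m.id_movie, a.id_actor
    FROM film_actor fa
    JOIN actors a ON a.act_name = fa.actorname
    JOIN movies m ON m.mov_year = fa.year AND m.mov_name = fa.movietitle;
"""

# Query 3, with the LIKE pattern passed as a parameter
QUERY3 = """
SELECT mov_year, mov_name, dir_name, act_name
FROM movies
JOIN directors ON directors.id_director = movies.director_id
JOIN playedby  ON movies.id_movie = playedby.movie_id
JOIN actors    ON playedby.actor_id = actors.id_actor
WHERE act_name LIKE ?
ORDER BY mov_year
"""


def migrate_and_search(pattern="%act2%"):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE films11 (year INTEGER, director TEXT, "
        "movietitle TEXT, actorname TEXT)"
    )
    conn.executemany("INSERT INTO films11 VALUES (?, ?, ?, ?)", FILMS)
    conn.executescript(SCHEMA + MIGRATION)
    rows = conn.execute(QUERY3, (pattern,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(migrate_and_search())
```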

A better way to aggregate into a default value

For this example I have three tables (individual, business, and ind_to_business). Individual has information on people. Business has information on businesses. And ind_to_business records which people are linked to which business. Here is their DDL:
CREATE TABLE individual
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE business
(
ID INTEGER PRIMARY KEY,
NAME VARCHAR2(100) NOT NULL,
ENTERPRISE_ID VARCHAR2(25) NOT NULL UNIQUE
);
CREATE TABLE ind_to_business
(
ID INTEGER PRIMARY KEY,
IND_ID REFERENCES individual(id),
BUS_ID REFERENCES business(id),
START_DT DATE NOT NULL,
END_DT DATE
);
I'm looking for the best way to display one row for each person. If they are linked to one business, I want to display the business's ENTERPRISE_ID. If they are linked to more than one business, I want to display the default value 'Multiple'. They will always be linked to a business, so no LEFT JOIN is necessary. They can also be linked to the same business more than once (leaving and coming back); multiple records for the same business should be aggregated into one.
So for the following sample data:
Individual:
+----+------------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+------------+---------------+
| 1 | John Smith | 53a23B7 |
| 2 | Jane Doe | 63f2a35 |
+----+------------+---------------+
Business:
+----+----------+---------------+
| ID | NAME | ENTERPRISE_ID |
+----+----------+---------------+
| 3 | ABC Corp | 2a34d9b |
| 4 | XYZ Inc | 34bf21e |
+----+----------+---------------+
ind_to_business
+----+--------+--------+-------------+-------------+
| ID | IND_ID | BUS_ID | START_DT | END_DT |
+----+--------+--------+-------------+-------------+
| 5 | 1 | 3 | 01-JAN-2000 | 31-DEC-2002 |
| 6 | 1 | 3 | 01-JAN-2015 | |
| 7 | 2 | 3 | 01-JAN-2000 | |
| 8 | 2 | 4 | 01-MAR-2006 | 05-JUN-2010 |
| 9 | 2 | 4 | 15-DEC-2019 | |
+----+--------+--------+-------------+-------------+
I would expect the following output:
+---------+------------+------------+
| IND_ID | NAME | LINKED_BUS |
+---------+------------+------------+
| 53a23B7 | John Smith | 2a34d9b |
| 63f2a35 | Jane Doe | Multiple |
+---------+------------+------------+
Here is my current query:
SELECT DISTINCT
sub.ind_id,
sub.name,
DECODE(sub.bus_count, 1, sub.bus_id, 'Multiple') AS LINKED_BUS
FROM (SELECT i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID,
COUNT(DISTINCT b.enterprise_id) OVER (PARTITION BY i.id) AS BUS_COUNT
FROM individual i
INNER JOIN ind_to_business i2b ON i.id = i2b.ind_id
INNER JOIN business b ON i2b.bus_id = b.id) sub;
My query works, but it runs on a large dataset and takes a long time. I'm wondering if anyone has ideas on how to improve it so that there isn't so much wasted processing (i.e., needing a DISTINCT on the final result, or computing COUNT(DISTINCT) in the inline view only to use that value in the DECODE above).
I've also created a DBFiddle for this question.
Thanks in advance for any input.
You could try using a correlated subquery. This removes the need for the outer DISTINCT:
SELECT
i.enterprise_id ind_id,
i.name,
(
SELECT DECODE(COUNT(DISTINCT b.enterprise_id), 1, MIN(b.enterprise_id), 'Multiple')
FROM ind_to_business i2b
INNER JOIN business b ON i2b.bus_id = b.id
WHERE i2b.ind_id = i.id
) linked_bus
FROM individual i;
You can join with the aggregated ind_to_business per individual. One way to do this:
select i.enterprise_id as ind_id, i.name, coalesce(b.enterprise_id, 'Multiple') as linked_bus
from individual i
join
(
select
ind_id,
case when min(bus_id) = max(bus_id) then min(bus_id) else null end as bus_id
from ind_to_business
group by ind_id
) ib on ib.ind_id = i.id
left join business b on b.id = ib.bus_id
order by i.id;
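To check the MIN/MAX trick against the sample data, here is a small sketch using Python's sqlite3 module (SQLite rather than Oracle, with dates stored as ISO text; they are not used by the query). A person whose links all point to one business gets that business's ENTERPRISE_ID; anyone else gets NULL from the CASE and then 'Multiple' from COALESCE.

```python
import sqlite3

# Sample data from the question
INDIVIDUALS = [(1, "John Smith", "53a23B7"), (2, "Jane Doe", "63f2a35")]
BUSINESSES = [(3, "ABC Corp", "2a34d9b"), (4, "XYZ Inc", "34bf21e")]
LINKS = [
    (5, 1, 3, "2000-01-01", "2002-12-31"),
    (6, 1, 3, "2015-01-01", None),
    (7, 2, 3, "2000-01-01", None),
    (8, 2, 4, "2006-03-01", "2010-06-05"),
    (9, 2, 4, "2019-12-15", None),
]

QUERY = """
SELECT i.enterprise_id AS ind_id,
       i.name,
       COALESCE(b.enterprise_id, 'Multiple') AS linked_bus
FROM individual i
JOIN (
    -- one row per person; bus_id is NULL unless every link is the same business
    SELECT ind_id,
           CASE WHEN MIN(bus_id) = MAX(bus_id) THEN MIN(bus_id) END AS bus_id
    FROM ind_to_business
    GROUP BY ind_id
) ib ON ib.ind_id = i.id
LEFT JOIN business b ON b.id = ib.bus_id
ORDER BY i.id
"""


def linked_business(links=LINKS):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
        CREATE TABLE business   (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
        CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER,
                                      bus_id INTEGER, start_dt TEXT, end_dt TEXT);
    """)
    conn.executemany("INSERT INTO individual VALUES (?, ?, ?)", INDIVIDUALS)
    conn.executemany("INSERT INTO business VALUES (?, ?, ?)", BUSINESSES)
    conn.executemany("INSERT INTO ind_to_business VALUES (?, ?, ?, ?, ?)", links)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(linked_business())
```

Because the grouping happens once per ind_id before the join, no DISTINCT or COUNT(DISTINCT) is needed.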
First, use a subquery to get all the needed dimensions (deduplicated with DISTINCT), then do the final aggregation with a CASE expression.
select
ind_id,
name,
case
when count(*) > 1 then 'Multiple'
else min(bus_id)
end as linked_bus
from
(
select
distinct i.enterprise_id as ind_id,
i.name,
b.enterprise_id as bus_id
from individual i
join ind_to_business i2b
on i.id = i2b.ind_id
join business b
on i2b.bus_id = b.id
) vals
group by
ind_id,
name
order by
ind_id
There's no need to use DISTINCT twice. You could use subquery factoring: put the inline view in a WITH clause and make the data set DISTINCT in the subquery itself.
WITH data AS
(
SELECT distinct
i.enterprise_id AS IND_ID,
i.name,
b.enterprise_id AS BUS_ID
FROM individual i
JOIN ind_to_business i2b ON i.id = i2b.ind_id
JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
name,
case
when count(*) = 1 then MIN(bus_id)
else 'Multiple'
end AS LINKED_BUS
FROM data
GROUP BY ind_id, name;
IND_ID NAME LINKED_BUS
---------- ---------- -------------------------
53a23B7 John Smith 2a34d9b
63f2a35 Jane Doe Multiple
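The WITH-clause version can be checked the same way. Because the CTE is already DISTINCT on (ind_id, name, bus_id), COUNT(*) in the outer query counts distinct businesses per person, so John's two ABC Corp stints collapse into one. A sketch in SQLite via Python's sqlite3, assuming the sample data from the question:

```python
import sqlite3

INDIVIDUALS = [(1, "John Smith", "53a23B7"), (2, "Jane Doe", "63f2a35")]
BUSINESSES = [(3, "ABC Corp", "2a34d9b"), (4, "XYZ Inc", "34bf21e")]
# (id, ind_id, bus_id); dates omitted because the query does not use them
LINKS = [(5, 1, 3), (6, 1, 3), (7, 2, 3), (8, 2, 4), (9, 2, 4)]

QUERY = """
WITH data AS (
    -- deduplicate first: one row per (person, business)
    SELECT DISTINCT i.enterprise_id AS ind_id, i.name, b.enterprise_id AS bus_id
    FROM individual i
    JOIN ind_to_business i2b ON i.id = i2b.ind_id
    JOIN business b ON i2b.bus_id = b.id
)
SELECT ind_id,
       name,
       CASE WHEN COUNT(*) = 1 THEN MIN(bus_id) ELSE 'Multiple' END AS linked_bus
FROM data
GROUP BY ind_id, name
ORDER BY ind_id
"""


def linked_business_cte(links=LINKS):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE individual (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
        CREATE TABLE business   (id INTEGER PRIMARY KEY, name TEXT, enterprise_id TEXT);
        CREATE TABLE ind_to_business (id INTEGER PRIMARY KEY, ind_id INTEGER, bus_id INTEGER);
    """)
    conn.executemany("INSERT INTO individual VALUES (?, ?, ?)", INDIVIDUALS)
    conn.executemany("INSERT INTO business VALUES (?, ?, ?)", BUSINESSES)
    conn.executemany("INSERT INTO ind_to_business VALUES (?, ?, ?)", links)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(linked_business_cte())
```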

SQL - Join two tables and count different things

I have two tables TEILNEHMERKURS and KURS.
Kurs:
| Bezeichnung |
| ---------------- |
| Java Programming |
| Java Programming |
| Database |
and the second Table TEILNEHMERKURS
| Bezeichnung |
| ---------------- |
| Database |
| Java Programming |
| Database |
I need a statement that generates the following output:
| Bezeichnung | Count in Table Kurs |Count in Table Teilnehmerkurs
| ---------------- |-------------------- |-----------------------------
| Database | 1 |2
| Java Programming | 2 |1
I tried the following statement:
select k.bezeichnung, count(k.bezeichnung), count(tk.bezeichnung)
from kurs k
left join teilnehmerkurs tk on tk.kursnr = k.kursnr
group by k.bezeichnung;
and my actual output is:
| Bezeichnung | Count in Table Kurs |Count in Table Teilnehmerkurs
| ---------------- |-------------------- |-----------------------------
| Database | 2 |2
| Java Programming | 2 |1
You can GROUP BY each table separately and then FULL JOIN the results.
-- Table variables are declared with @ (# is for temporary tables)
DECLARE @Kurs TABLE (Bezeichnung VARCHAR(50))
INSERT INTO @Kurs VALUES
('Java Programming'),
('Java Programming'),
('Database')
DECLARE @TEILNEHMERKURS TABLE (Bezeichnung VARCHAR(50))
INSERT INTO @TEILNEHMERKURS VALUES
('Database'),
('Java Programming'),
('Database')
SELECT COALESCE(K.Bezeichnung, T.Bezeichnung) Bezeichnung
, K.[Count in Table Kurs]
, T.[Count in Table Teilnehmerkurs]
FROM
(SELECT Bezeichnung, COUNT(*) [Count in Table Kurs] FROM @Kurs GROUP BY Bezeichnung ) K
FULL JOIN
(SELECT Bezeichnung, COUNT(*) [Count in Table Teilnehmerkurs] FROM @TEILNEHMERKURS GROUP BY Bezeichnung) T
ON K.Bezeichnung = T.Bezeichnung
Result:
Bezeichnung Count in Table Kurs Count in Table Teilnehmerkurs
-------------------- ------------------- -----------------------------
Database 1 2
Java Programming 2 1
This can be tricky. A pretty fool-proof method is union all and group by:
select Bezeichnung, sum(inkurs), sum(inTeilnehmerkurs)
from ((select Bezeichnung, count(*) as inkurs, 0 as inTeilnehmerkurs
from kurs
group by Bezeichnung
) union all
(select Bezeichnung, 0 as inkurs, count(*) as inTeilnehmerkurs
from Teilnehmerkurs
group by Bezeichnung
)
) kt
group by Bezeichnung;
If you use a join to bring the results together, then you have to be careful about missing Bezeichnung in either table. If you use a full outer join, then you need to use coalesce().
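The UNION ALL approach runs in SQLite too, apart from the parentheses around the branches, which SQLite does not accept in a compound SELECT. A sketch with Python's sqlite3 using the question's rows; the extra "SQL" course in the second check is hypothetical and shows that a course missing from one table still gets a 0 rather than disappearing.

```python
import sqlite3

QUERY = """
SELECT Bezeichnung,
       SUM(in_kurs)           AS count_in_kurs,
       SUM(in_teilnehmerkurs) AS count_in_teilnehmerkurs
FROM (
    -- count each table on its own, padding the other count with 0
    SELECT Bezeichnung, COUNT(*) AS in_kurs, 0 AS in_teilnehmerkurs
    FROM kurs GROUP BY Bezeichnung
    UNION ALL
    SELECT Bezeichnung, 0, COUNT(*)
    FROM teilnehmerkurs GROUP BY Bezeichnung
) kt
GROUP BY Bezeichnung
ORDER BY Bezeichnung
"""

KURS = ["Java Programming", "Java Programming", "Database"]
TEILNEHMERKURS = ["Database", "Java Programming", "Database"]


def count_per_course(kurs, teilnehmerkurs):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kurs (Bezeichnung TEXT)")
    conn.execute("CREATE TABLE teilnehmerkurs (Bezeichnung TEXT)")
    conn.executemany("INSERT INTO kurs VALUES (?)", [(b,) for b in kurs])
    conn.executemany("INSERT INTO teilnehmerkurs VALUES (?)", [(b,) for b in teilnehmerkurs])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(count_per_course(KURS, TEILNEHMERKURS))
```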
You could use UNION to obtain all the Bezeichnung values from both tables and then LEFT JOIN the counts:
select t.Bezeichnung, tk.count_in_kurs, tt.count_in_teilnehmerkurs
from (
select Bezeichnung
from Kurs
union
select Bezeichnung
from TEILNEHMERKURS) t
left join (
select Bezeichnung, count(*) as count_in_kurs
from kurs
group by Bezeichnung
) tk on t.Bezeichnung = tk.Bezeichnung
left join (
select Bezeichnung, count(*) as count_in_teilnehmerkurs
from TEILNEHMERKURS
group by Bezeichnung
) tt on t.Bezeichnung = tt.Bezeichnung

How to Inner Join Three Tables and Return Maximum of Value in Third

I have three tables as follows:
First table:
ordenes
id_orden | date | total | id_usuario
1 |15-may|50 | 1
2 |20-may|60 | 2
Second table:
usuario
id_usuario | name | phone
1 | abc | 999
2 | def | 888
Third table:
estado
id_orden | edo
1 | c
1 | b
1 | a
2 | b
2 | a
And this is the desired result:
Results:
id_orden | date | total | id_usuario | name | phone | maxedo
1 |15-may|50 | 1 | abc | 999 | c
2 |20-may|60 | 2 | def | 888 | b
maxedo needs to be the maximum edo value from the third table for each order.
How do I do this?
The code sample below gives you the result.
CREATE TABLE #ordenes(id_orden int, datevalue date, total int, id_usuario int)
INSERT INTO #ordenes
VALUES
(1,'20160515',50,1),
(2,'20160520',60,2)
CREATE TABLE #usuario(id_usuario int, name varchar(10), phone int)
INSERT INTO #usuario
VALUES
(1,'abc',999),
(2,'def',888)
CREATE TABLE #estado(id_orden int, edo char(1))
INSERT INTO #estado
VALUES
(1,'c'),
(1,'b'),
(1,'a'),
(2,'b'),
(2,'a')
SELECT id_orden,datevalue,total,id_usuario,name,phone,edo as maxedo
FROM
(SELECT o.id_orden,o.datevalue,o.total,o.id_usuario,u.name,u.phone,e.edo,ROW_NUMBER() OVER(PARTITION BY o.id_orden ORDER BY e.edo DESC) as rnk
FROM #ordenes o
JOIN #usuario u
on o.id_usuario = u.id_usuario
join #estado e
on o.id_orden = e.id_orden) as t
where rnk = 1
The following should do the job. I've included aliases using AS so you even get the column titles you want.
SELECT
oe.id_orden AS id_orden,
oe.date AS date,
oe.total AS total,
u.id_usuario AS id_usuario,
u.name AS name,
u.phone AS phone,
oe.maxedo AS maxedo
FROM usuario u
INNER JOIN
(SELECT
o.id_orden AS id_orden,
o.date AS date,
o.total AS total,
o.id_usuario AS id_usuario,
e.maxedo AS maxedo
FROM ordenes o
INNER JOIN
(SELECT
id_orden AS id_orden,
MAX(edo) AS maxedo
FROM estado
GROUP BY id_orden) e
ON e.id_orden=o.id_orden) oe
ON u.id_usuario=oe.id_usuario
In order of processing (which is not how SQL actually executes, but is a useful way of breaking it down into steps), it:
Creates a table of maximum edo values per order (NB: MAX also works on text, using alphabetical order);
Links this to ordenes using id_orden;
Joins this to the usuario data using id_usuario; and
Publishes this as a table in the required format.
The problem can be split into the following three steps:
Step 1: Calculate maximum edo for each id_orden in the table estado:
Select id_orden, max(edo) maxedo
From estado
Group By id_orden;
Result:
| id_orden | maxedo |
| 1 | c |
| 2 | b |
Step 2: Join the two tables ordenes and usuario on the key "id_usuario":
Select o.id_orden, o.date, o.total, o.id_usuario, u.name, u.phone
From ordenes o Join usuario u
On o.id_usuario = u.id_usuario;
Result:
id_orden | date | total | id_usuario | name | phone
1 |15-may|50 | 1 | abc | 999
2 |20-may|60 | 2 | def | 888
Step 3: Join the results of step 1 and step 2 on the key id_orden:
Select a.id_orden, a.date, a.total, a.id_usuario, a.name, a.phone, b.maxestado AS maxedo
From (Select o.id_orden, o.date, o.total, o.id_usuario, u.name, u.phone
From ordenes o Inner Join usuario u
On o.id_usuario = u.id_usuario ) a
Join (Select id_orden, max(edo) maxestado
From estado
Group By id_orden) b
On a.id_orden = b.id_orden;
Result:
id_orden | date | total | id_usuario | name | phone | maxedo
1 |15-may|50 | 1 | abc | 999 | c
2 |20-may|60 | 2 | def | 888 | b
SQLFiddle example: http://sqlfiddle.com/#!5/a79a1/2
:)
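The three steps collapse into a single query. Here is a sketch in SQLite via Python's sqlite3, using the sample rows from the question with the date kept as the plain text shown ('15-may'); the column is quoted as "date" only to be safe about the name.

```python
import sqlite3

QUERY = """
-- step 3: combine the order/user rows with the per-order maximum
SELECT o.id_orden, o."date", o.total, o.id_usuario, u.name, u.phone, e.maxedo
FROM ordenes o
-- step 2: order + user details
JOIN usuario u ON u.id_usuario = o.id_usuario
JOIN (
    -- step 1: highest edo per order (MAX on text compares alphabetically)
    SELECT id_orden, MAX(edo) AS maxedo
    FROM estado
    GROUP BY id_orden
) e ON e.id_orden = o.id_orden
ORDER BY o.id_orden
"""


def orders_with_max_edo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ordenes (id_orden INTEGER, "date" TEXT, total INTEGER, id_usuario INTEGER);
        CREATE TABLE usuario (id_usuario INTEGER, name TEXT, phone INTEGER);
        CREATE TABLE estado  (id_orden INTEGER, edo TEXT);
        INSERT INTO ordenes VALUES (1, '15-may', 50, 1), (2, '20-may', 60, 2);
        INSERT INTO usuario VALUES (1, 'abc', 999), (2, 'def', 888);
        INSERT INTO estado  VALUES (1, 'c'), (1, 'b'), (1, 'a'), (2, 'b'), (2, 'a');
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in orders_with_max_edo():
        print(row)
```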
I think you need to get max(edo) from the third table, grouped by id_orden. Try this:
select temp.*, max(edo) as maxedo
from estado
inner join(
select ordenes.*,usuario.name,usuario.phone
from ordenes,usuario
where ordenes.id_usuario = usuario.id_usuario
) as temp
on temp.id_orden = estado.id_orden
group by temp.id_orden, temp.date, temp.total, temp.id_usuario, temp.name, temp.phone

Find Min Value and value of a corresponding column for that result

I have a table of user data in my SQL Server database and I am attempting to summarize the data. Basically, I need some MIN, MAX and SUM values, grouped by some columns.
Here is a sample table:
Member ID | Name | DateJoined | DateQuit | PointsEarned | Address
00001 | Leyth | 1/1/2013 | 9/30/2013 | 57 | 123 FirstAddress Way
00002 | James | 2/1/2013 | 7/21/2013 | 34 | 4 street road
00001 | Leyth | 2/1/2013 | 10/15/2013| 32 | 456 LastAddress Way
00003 | Eric | 2/23/2013 | 4/14/2013 | 15 | 5 street road
I'd like the summarized table to show the results like this:
Member ID | Name | DateJoined | DateQuit | PointsEarned | Address
00001 | Leyth | 1/1/2013 | 10/15/2013 | 89 | 123 FirstAddress Way
00002 | James | 2/1/2013 | 7/21/2013 | 34 | 4 street road
00003 | Eric | 2/23/2013 | 4/14/2013 | 15 | 5 street road
Here is my query so far:
Select MemberID, Name, Min(DateJoined), Max(DateQuit), SUM(PointsEarned), Min(Address)
From Table
Group By MemberID, Name
Min(Address) works this time: it retrieves the address that corresponds to the earliest DateJoined. However, if we swapped the two addresses in the original table, we would still retrieve "123 FirstAddress Way", which would no longer correspond to the 1/1/2013 DateJoined.
For almost everything you can use a simple GROUP BY, but needing "the address from the same row as the minimum DateJoined" is a little trickier. You can solve it in several ways; one is a subquery that looks up the address for each member:
SELECT
X.*,
(select Address
from #tmp t2
where t2.MemberID = X.memberID and
t2.DateJoined = (select MIN(DateJoined)
from #tmp t3
where t3.memberID = X.MemberID)) AS Address
FROM
(select MemberID,
Name,
MIN(DateJoined) as DateJoined,
MAX(DateQuit) as DateQuit,
SUM(PointsEarned) as PointEarned
from #tmp t1
group by MemberID,Name
) AS X
Another option is a subquery with a JOIN:
SELECT
X.*,
J.Address
FROM
(select
MemberID,
Name,
MIN(DateJoined) as DateJoined,
MAX(DateQuit) as DateQuit,
SUM(PointsEarned) as PointEarned
from #tmp t1
group by MemberID,Name
) AS X
JOIN #tmp J ON J.MemberID = X.MemberID AND J.DateJoined = X.DateJoined
You could rank your rows according to the date, and select the minimal one:
SELECT t.member_id,
name,
date_joined,
date_quit,
points_earned,
address AS address
FROM (SELECT member_id,
name,
MIN (date_joined) AS date_joined,
MAX (date_quit) AS date_quit,
SUM (points_earned) AS points_earned
FROM my_table
GROUP BY member_id, name) t
JOIN (SELECT member_id,
address,
RANK() OVER (PARTITION BY member_id ORDER BY date_joined) AS rk
FROM my_table) addr ON addr.member_id = t.member_id AND rk = 1
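Another option joins to one subquery for the earliest DateJoined (with its address) and one for the latest DateQuit:
SELECT st.memberid, st.name, m1.datejoined, m2.datequit, SUM(st.pointsearned) AS pointsearned, m1.Address
from SAMPLEtable st
LEFT JOIN ( SELECT s1.memberid
, s1.datejoined
, s1.address
FROM sampletable s1
WHERE s1.datejoined = (SELECT MIN(s2.datejoined) FROM sampletable s2 WHERE s2.memberid = s1.memberid)
) m1 ON st.memberid = m1.memberid
LEFT JOIN ( SELECT memberid
, MAX(datequit) AS datequit
FROM sampletable
GROUP BY memberid
) m2 ON st.memberid = m2.memberid
GROUP BY st.memberid, st.name, m1.datejoined, m2.datequit, m1.Address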

Distinct on two columns (separately) and MIN on another column

I've tried to strip this problem down to the bare bones; I hope I've still captured the essence of what I'm trying to achieve in the original query!
Code to generate the tables and data can be found here.
SQL flavour is Microsoft SQL Server 2000 (although I've been running this stripped down test case on MySQL)
The original table
+-----------+----------+----------+
| master_id | slave_id | distance |
+-----------+----------+----------+
| 1 | 1 | 0.1 |
| 1 | 3 | 10 |
| 2 | 2 | 3 |
| 3 | 2 | 2 |
+-----------+----------+----------+
Description of what is required
I would like to select slave_id master_id pairs with MIN(distance) with no duplicates of either master_id or slave_id.
The desired results table
+-----------+----------+----------+
| master_id | slave_id | distance |
+-----------+----------+----------+
| 1 | 1 | 0.1 |
| 3 | 2 | 2 |
+-----------+----------+----------+
My Attempt
SELECT
join_table.master_id,
join_table.slave_id,
join_table.distance
FROM join_table
INNER JOIN
(
SELECT
slave_id,
MIN(distance) AS distance
FROM join_table
GROUP BY slave_id
) AS self_join
ON self_join.slave_id = join_table.slave_id
AND self_join.distance = join_table.distance
What's wrong with my attempt
This query produces duplicates of master_id
Any help will be very much appreciated.
This should give the correct result:
select distinct t.master_id,
t.slave_id,
t.distance
from join_table t
inner join
(
SELECT ID, min(Distance) dist
FROM
(
SELECT master_ID ID, MIN(distance) AS Distance
FROM join_table
GROUP BY master_ID
UNION
SELECT slave_ID ID, MIN(distance) AS Distance
FROM join_table
GROUP BY slave_ID
) src
GROUP BY ID
) md
on t.distance = md.dist
and (t.master_id = md.id or t.slave_id = md.id)
See SQL Fiddle with Demo
If I got you right, here is what I would do:
SELECT DISTINCT t.master_id
,t.slave_id
,t.distance
FROM your_table t
INNER JOIN
(
SELECT master_id id, min(distance) distance
FROM your_table
GROUP BY master_id
UNION
SELECT slave_id id, min(distance) distance
FROM your_table
GROUP BY slave_id
) sub
ON (sub.id = t.master_id AND sub.distance = t.distance)
OR (sub.id = t.slave_id AND sub.distance = t.distance)
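The OR conditions in the last query keep a row when its distance is the minimum for either its master_id or its slave_id, so with this sample data (1, 3, 10) also matches, through slave_id 3. A stricter reading, sketched here in SQLite via Python's sqlite3, keeps a row only when it is the minimum for both its master and its slave. This is an alternative interpretation, not a general one-to-one assignment: a master whose closest slave is closer to another master simply drops out, as master 2 does here. It uses only GROUP BY and joins, so it should carry over to SQL Server 2000, but that has not been tested.

```python
import sqlite3

# Sample rows: (master_id, slave_id, distance)
ROWS = [(1, 1, 0.1), (1, 3, 10), (2, 2, 3), (3, 2, 2)]

QUERY = """
SELECT t.master_id, t.slave_id, t.distance
FROM join_table t
JOIN (SELECT master_id, MIN(distance) AS d FROM join_table GROUP BY master_id) m
  ON m.master_id = t.master_id AND m.d = t.distance   -- best row for its master
JOIN (SELECT slave_id, MIN(distance) AS d FROM join_table GROUP BY slave_id) s
  ON s.slave_id = t.slave_id AND s.d = t.distance     -- and best row for its slave
ORDER BY t.master_id, t.slave_id
"""


def mutual_minimum_pairs(rows=ROWS):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE join_table (master_id INTEGER, slave_id INTEGER, distance REAL)"
    )
    conn.executemany("INSERT INTO join_table VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(mutual_minimum_pairs())
```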