Using Oracle to combine three tables into one with PIVOT

I have three Oracle SQL select queries which return the following results.
First select query returns result:
user_id | user_name |
---------|-----------|
1 | user_1 |
2 | user_2 |
3 | user_3 |
4 | user_4 |
second select query returns result:
exam_id | exam_name |
---------|-----------|
1 | exam_1 |
2 | exam_2 |
3 | exam_3 |
and the third select query returns result:
exam_id | user_id | exam_date |
---------|---------|-----------|
1 | 1 | 2017 |
1 | 2 | 2018 |
1 | 3 | 2017 |
2 | 3 | 2018 |
I would like to combine these queries to get result:
user_id | user_name | exam_1 | exam_2 | exam_3 |
---------|-----------|--------|--------|--------|
1 | user_1 | 2017 | | |
2 | user_2 | 2018 | | |
3 | user_3 | 2017 | 2018 | |
4 | user_4 | | | |
I would be grateful for any help.
Thank you @shrek for helping me out here. I managed to build the variable holding the pivot values but couldn't use that variable inside the PIVOT clause. I got help for that here, and the final version (for 11g) looks like this:
variable x REFCURSOR
DECLARE
exam_ids VARCHAR2(255);
BEGIN
SELECT
LISTAGG(''''
|| exam_id
|| ''' AS "'
|| exam_name
|| '"',',') WITHIN GROUP(
ORDER BY
exam_id ASC
)
INTO exam_ids
FROM
exam;
OPEN :x FOR 'SELECT
*
FROM
(
SELECT
u.user_id,
u.user_name,
e.exam_id,
eu.exam_date
FROM
users u
LEFT JOIN exam_user eu ON u.user_id = eu.user_id
LEFT JOIN exam e ON e.exam_id = eu.exam_id
ORDER BY
u.user_id
)
PIVOT ( MAX ( exam_date )
FOR exam_id
IN ( ' || EXAM_IDS || ' )
)
ORDER BY
1';
END;
/
print x
That works in SQL Developer and SQL*Plus, but not when the database is accessed from a PHP file. For that I needed to create a procedure that could then be called from PHP. Here is the problem I hit when trying to use the code above from a PHP file, and the resolution.

This should get you going -
CREATE TABLE users
(user_id varchar2(9), user_name varchar2(11))
;
INSERT ALL
INTO users (user_id, user_name)
VALUES ('1', 'user_1')
INTO users (user_id, user_name)
VALUES ('2', 'user_2')
INTO users (user_id, user_name)
VALUES ('3', 'user_3')
INTO users (user_id, user_name)
VALUES ('4', 'user_4')
SELECT * FROM dual
;
CREATE TABLE exam
(exam_id varchar2(9), exam_name varchar2(11))
;
INSERT ALL
INTO exam (exam_id, exam_name)
VALUES ('1', 'exam_1')
INTO exam (exam_id, exam_name)
VALUES ('2', 'exam_2')
INTO exam (exam_id, exam_name)
VALUES ('3', 'exam_3')
SELECT * FROM dual
;
CREATE TABLE exam_user
(exam_id varchar2(9), user_id varchar2(9), exam_date varchar2(11))
;
INSERT ALL
INTO exam_user (exam_id, user_id, exam_date)
VALUES ('1', '1', '2017')
INTO exam_user (exam_id, user_id, exam_date)
VALUES ('1', '2', '2018')
INTO exam_user (exam_id, user_id, exam_date)
VALUES ('1', '3', '2017')
INTO exam_user (exam_id, user_id, exam_date)
VALUES ('2', '3', '2018')
SELECT * FROM dual
;
Query -
SELECT * FROM (
SELECT U.USER_ID, U.USER_NAME, E.EXAM_NAME,EU.EXAM_DATE
FROM USERS U, EXAM E, EXAM_USER EU
WHERE U.USER_ID = EU.USER_ID(+)
AND E.EXAM_ID(+) = EU.EXAM_ID
ORDER BY U.USER_ID
)
PIVOT (MAX(EXAM_DATE) FOR EXAM_NAME IN ('exam_1' as exam_1, 'exam_2' as exam_2,'exam_3' as exam_3))
order by 1
;
Output -
USER_ID USER_NAME EXAM_1 EXAM_2 EXAM_3
1 user_1 2017 (null) (null)
2 user_2 2018 (null) (null)
3 user_3 2017 2018 (null)
4 user_4 (null) (null) (null)
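The dynamic column list built with LISTAGG in the 11g version can also be sketched outside Oracle. The following is a minimal illustration only, assuming Python's built-in sqlite3 module as a stand-in database (SQLite has no PIVOT, so conditional aggregation with MAX(CASE ...) is swapped in). The helper name pivot_exam_dates is hypothetical; the tables and rows are the ones from the setup script.

```python
import sqlite3

# In-memory stand-in for the Oracle tables (assumption: SQLite instead of Oracle)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, user_name TEXT);
    CREATE TABLE exam (exam_id TEXT, exam_name TEXT);
    CREATE TABLE exam_user (exam_id TEXT, user_id TEXT, exam_date TEXT);
    INSERT INTO users VALUES ('1','user_1'),('2','user_2'),('3','user_3'),('4','user_4');
    INSERT INTO exam VALUES ('1','exam_1'),('2','exam_2'),('3','exam_3');
    INSERT INTO exam_user VALUES
        ('1','1','2017'),('1','2','2018'),('1','3','2017'),('2','3','2018');
""")

def pivot_exam_dates(conn):
    """Hypothetical helper: one output column per row of the exam table."""
    exams = conn.execute(
        "SELECT exam_id, exam_name FROM exam ORDER BY exam_id"
    ).fetchall()
    select_list = ["u.user_id AS user_id", "u.user_name AS user_name"]
    for _, name in exams:
        # The exam id is bound as a parameter; the exam name becomes a quoted identifier
        alias = name.replace('"', '""')
        select_list.append(
            'MAX(CASE WHEN eu.exam_id = ? THEN eu.exam_date END) AS "{}"'.format(alias)
        )
    sql = (
        "SELECT " + ", ".join(select_list) + " "
        "FROM users u LEFT JOIN exam_user eu ON eu.user_id = u.user_id "
        "GROUP BY u.user_id, u.user_name ORDER BY u.user_id"
    )
    cur = conn.execute(sql, [exam_id for exam_id, _ in exams])
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()

header, rows = pivot_exam_dates(conn)
print(header)
for row in rows:
    print(row)
```

Adding a fourth row to exam would add a fourth column without touching the query text, which is the point of building the column list from the data.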

Related

How do I coalesce NULLs across multiple rows in BigQuery?

I have the following table:
Date |event_number| customer_id1 | customer_age | customer_gender
10/01/2020 | 1 | abc | NULL | NULL
10/01/2020 | 2 | abc | NULL | male
10/01/2020 | 3 | abc | 45 | NULL
10/01/2020 | 1 | def | 30 | NULL
I want to run a SQL query each day to look for new combinations of customer_id1, customer_age, customer_gender.
Output should look like this:
query_run_time | customer_id1 | customer_age | customer gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
Query run time is the date the query was run. If the combination (customer_id1, customer_age, customer_gender) is already in the table I don't want to insert the row.
Thanks
You can use window functions to assign internal row numbers and then merge the two sub-queries on them, e.g. like this:
SELECT COALESCE(a.customer_id, b.customer_id) as customer_id
, customer_age
, customer_gender
FROM (
SELECT customer_id, customer_age
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_age ) AS row_no
FROM customer_event
WHERE customer_age IS NOT NULL
) a
FULL JOIN (
SELECT customer_id, customer_gender
, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY customer_gender ) AS row_no
FROM customer_event
WHERE customer_gender IS NOT NULL
) b ON b.customer_id = a.customer_id
AND b.row_no = a.row_no
ORDER BY COALESCE(a.customer_id, b.customer_id)
, COALESCE(a.row_no, b.row_no)
Schema and Test Data
CREATE TABLE customer_event (
event_number INT NOT NULL,
customer_id VARCHAR(10) NOT NULL,
customer_age INT,
customer_gender VARCHAR(10)
);
INSERT INTO customer_event VALUES
( 1, 'abc', NULL, NULL ),
( 2, 'abc', NULL, 'male' ),
( 3, 'abc', 45 , NULL ),
( 4, 'abc', 50 , 'female' ),
( 5, 'abc', 27 , NULL ),
( 1, 'def', 30 , NULL );
Output
customer_id customer_age customer_gender
abc 27 female
abc 45 male
abc 50 (null)
def 30 (null)
The above is from testing with PostgreSQL 9.6 on SQL Fiddle.
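Use aggregate functions with GROUP BY. MAX ignores NULLs, so each column collapses to a single non-NULL value per customer: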
SELECT query_run_time, customer_id, MAX(customer_age) customer_age,
MAX(customer_gender)customer_gender
FROM tbl
GROUP BY query_run_time, customer_id
FIDDLE DEMO
Output
query_run_time | customer_id1 | customer_age | customer gender
11/01/2020 | abc | 45 | male
11/01/2020 | def | 30 | NULL
I suspect that what you really want is the most recent value for each column. Here is one method:
select date, customer_id1,
array_agg(customer_age ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as age,
array_agg(customer_gender ignore nulls order by event_number desc limit 1)[safe_ordinal(1)] as gender
from t
group by date, customer_id1;
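The difference between "largest value" and "most recent non-NULL value" is easy to miss, so here is a small sketch that runs both against the test data from the earlier answer. It assumes Python's built-in sqlite3 module rather than BigQuery, and replaces ARRAY_AGG ... IGNORE NULLS with correlated sub-queries; the names LATEST_SQL and MAX_SQL are only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_event (
        event_number INTEGER NOT NULL,
        customer_id TEXT NOT NULL,
        customer_age INTEGER,
        customer_gender TEXT
    );
    INSERT INTO customer_event VALUES
        (1, 'abc', NULL, NULL),
        (2, 'abc', NULL, 'male'),
        (3, 'abc', 45, NULL),
        (4, 'abc', 50, 'female'),
        (5, 'abc', 27, NULL),
        (1, 'def', 30, NULL);
""")

# Most recent non-NULL value per column, ordered by event_number
LATEST_SQL = """
SELECT c.customer_id,
       (SELECT e.customer_age FROM customer_event e
         WHERE e.customer_id = c.customer_id AND e.customer_age IS NOT NULL
         ORDER BY e.event_number DESC LIMIT 1) AS customer_age,
       (SELECT e.customer_gender FROM customer_event e
         WHERE e.customer_id = c.customer_id AND e.customer_gender IS NOT NULL
         ORDER BY e.event_number DESC LIMIT 1) AS customer_gender
FROM (SELECT DISTINCT customer_id FROM customer_event) c
ORDER BY c.customer_id
"""

# Plain MAX per column: also drops NULLs, but picks the largest value, not the latest
MAX_SQL = """
SELECT customer_id, MAX(customer_age), MAX(customer_gender)
FROM customer_event
GROUP BY customer_id
ORDER BY customer_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
largest = conn.execute(MAX_SQL).fetchall()
print("latest :", latest)
print("largest:", largest)
```

With a single non-NULL value per column, as in the question's sample, both queries agree; they only diverge once a customer has several different non-NULL values.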

Get records having the same value in 2 columns but a different value in a 3rd column

I am having trouble writing a query that will return all records where 2 columns have the same value but a different value in a 3rd column. I am looking for the records where the Item_Type and Location_ID are the same, but the Sub_Location_ID is different.
The table looks like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 1 | 00001 | 20 | 78 |
| 2 | 00001 | 110 | 124 |
| 3 | 00001 | 110 | 124 |
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
The result I am trying to get would look like this:
+---------+-----------+-------------+-----------------+
| Item_ID | Item_Type | Location_ID | Sub_Location_ID |
+---------+-----------+-------------+-----------------+
| 4 | 00002 | 3 | 18 |
| 5 | 00002 | 3 | 25 |
+---------+-----------+-------------+-----------------+
I have been trying to use the following query:
SELECT *
FROM Table1
WHERE Item_Type IN (
SELECT Item_Type
FROM Table1
GROUP BY Item_Type
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
)
But it returns all records with the same Item_Type and a different Sub_Location_ID, not all records with the same Item_Type AND Location_ID but a different Sub_Location_ID.
This should do the trick...
-- some test data...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
BEGIN DROP TABLE #TestData; END;
CREATE TABLE #TestData (
Item_ID INT NOT NULL PRIMARY KEY,
Item_Type CHAR(5) NOT NULL,
Location_ID INT NOT NULL,
Sub_Location_ID INT NOT NULL
);
INSERT #TestData (Item_ID, Item_Type, Location_ID, Sub_Location_ID) VALUES
(1, '00001', 20, 78),
(2, '00001', 110, 124),
(3, '00001', 110, 124),
(4, '00002', 3, 18),
(5, '00002', 3, 25);
-- adding a covering index will eliminate the sort operation...
CREATE NONCLUSTERED INDEX ix_indexname ON #TestData (Item_Type, Location_ID, Sub_Location_ID, Item_ID);
-- the actual solution...
WITH
cte_count_group AS (
SELECT
td.Item_ID,
td.Item_Type,
td.Location_ID,
td.Sub_Location_ID,
cnt_grp_2 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID),
cnt_grp_3 = COUNT(1) OVER (PARTITION BY td.Item_Type, td.Location_ID, td.Sub_Location_ID)
FROM
#TestData td
)
SELECT
cg.Item_ID,
cg.Item_Type,
cg.Location_ID,
cg.Sub_Location_ID
FROM
cte_count_group cg
WHERE
cg.cnt_grp_2 > 1
AND cg.cnt_grp_3 < cg.cnt_grp_2;
You can use EXISTS:
select t.*
from Table1 t
where exists (select 1
from Table1 t1
where t.Item_Type = t1.Item_Type and
t.Location_ID = t1.Location_ID and
t.Sub_Location_ID <> t1.Sub_Location_ID
);
SQL Server has no row-value (tuple) IN, so you can emulate it with a little trick, assuming '#' is a character that cannot occur in Item_Type:
SELECT *
FROM Table1
WHERE Item_Type+'#'+Cast(Location_ID as varchar(20)) IN (
SELECT Item_Type+'#'+Cast(Location_ID as varchar(20))
FROM Table1
GROUP BY Item_Type, Location_ID
HAVING COUNT (DISTINCT Sub_Location_ID) > 1
);
The downside is that the expression in the WHERE clause is non-sargable.
I think you can use exists:
select t1.*
from table1 t1
where exists (select 1
from table1 tt1
where tt1.Item_Type = t1.Item_Type and
tt1.Location_ID = t1.Location_ID and
tt1.Sub_Location_ID <> t1.Sub_Location_ID
);
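To check that the approaches agree on the sample data, here is a small sketch that assumes Python's built-in sqlite3 module in place of SQL Server. It runs the EXISTS form next to a variant of the asker's original query that groups by both Item_Type and Location_ID and joins back to the table; GROUPED_SQL and EXISTS_SQL are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (
        Item_ID INTEGER PRIMARY KEY,
        Item_Type TEXT NOT NULL,
        Location_ID INTEGER NOT NULL,
        Sub_Location_ID INTEGER NOT NULL
    );
    INSERT INTO Table1 VALUES
        (1, '00001', 20, 78),
        (2, '00001', 110, 124),
        (3, '00001', 110, 124),
        (4, '00002', 3, 18),
        (5, '00002', 3, 25);
""")

# Group on BOTH columns, then join back to fetch the full rows
GROUPED_SQL = """
SELECT t.Item_ID, t.Item_Type, t.Location_ID, t.Sub_Location_ID
FROM Table1 t
JOIN (SELECT Item_Type, Location_ID
      FROM Table1
      GROUP BY Item_Type, Location_ID
      HAVING COUNT(DISTINCT Sub_Location_ID) > 1) g
  ON g.Item_Type = t.Item_Type AND g.Location_ID = t.Location_ID
ORDER BY t.Item_ID
"""

# Same result with a correlated EXISTS
EXISTS_SQL = """
SELECT t.Item_ID, t.Item_Type, t.Location_ID, t.Sub_Location_ID
FROM Table1 t
WHERE EXISTS (SELECT 1
              FROM Table1 t1
              WHERE t1.Item_Type = t.Item_Type
                AND t1.Location_ID = t.Location_ID
                AND t1.Sub_Location_ID <> t.Sub_Location_ID)
ORDER BY t.Item_ID
"""

grouped = conn.execute(GROUPED_SQL).fetchall()
exists = conn.execute(EXISTS_SQL).fetchall()
print(grouped)
print(exists)
```

Items 2 and 3 share Item_Type and Location_ID but also share Sub_Location_ID, so neither query returns them.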

Identifying duplicate records in SQL along with primary key

I have a business case scenario where I need to do a lookup into our SQL "Users" table to find out email addresses which are duplicated. I was able to do that by the below query:
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
ORDER BY
DuplicateEmails DESC
I get an output like this:
user_email DuplicateEmails
--------------------------------
abc@gmail.com 2
xyz@yahoo.com 3
Now I am asked to list each duplicate record on a row of its own and display some additional properties such as first name, last name and userID. All this information is stored in the same "Users" table. I am having difficulty doing so. Can anyone help me or point me in the right direction?
My output needs to look like this:
user_email DuplicateEmails FirstName LastName UserID
------------------------------------------------------------------------------
abc@gmail.com 2 Tim Lentil timLentil
abc@gmail.com 2 John Doe johnDoe12
xyz@yahoo.com 3 brian boss brianTheBoss
xyz@yahoo.com 3 Thomas Hood tHood
xyz@yahoo.com 3 Mark Brown MBrown12
There are several ways you could do this. Here is one using a cte.
with FoundDuplicates as
(
SELECT
user_email, COUNT(*) as DuplicateEmails
FROM
Users
GROUP BY
user_email
HAVING
COUNT(*) > 1
)
select fd.user_email
, fd.DuplicateEmails
, u.FirstName
, u.LastName
, u.UserID
from Users u
join FoundDuplicates fd on fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC
Use count() over( Partition by ), example
You can solve it like:
DECLARE @T TABLE
(
UserID VARCHAR(20),
FirstName NVARCHAR(45),
LastName NVARCHAR(45),
UserMail VARCHAR(45)
);
INSERT INTO @T (UserMail, FirstName, LastName, UserID) VALUES
('abc@gmail.com', 'Tim', 'Lentil', 'timLentil'),
('abc@gmail.com', 'John', 'Doe', 'johnDoe12'),
('xyz@yahoo.com', 'brian', 'boss', 'brianTheBoss'),
('xyz@yahoo.com', 'Thomas', 'Hood', 'tHood'),
('xyz@yahoo.com', 'Mark', 'Brown', 'MBrown12');
SELECT *, COUNT (1) OVER (PARTITION BY UserMail) MailCount
FROM @T;
Results:
+--------------+-----------+----------+---------------+-----------+
| UserID | FirstName | LastName | UserMail | MailCount |
+--------------+-----------+----------+---------------+-----------+
| timLentil | Tim | Lentil | abc@gmail.com | 2 |
| johnDoe12 | John | Doe | abc@gmail.com | 2 |
| brianTheBoss | brian | boss | xyz@yahoo.com | 3 |
| tHood | Thomas | Hood | xyz@yahoo.com | 3 |
| MBrown12 | Mark | Brown | xyz@yahoo.com | 3 |
+--------------+-----------+----------+---------------+-----------+
Use a window function like this:
SELECT u.*
FROM (SELECT u.*, COUNT(*) OVER (PARTITION BY user_email) as numDuplicateEmails
FROM Users u
) u
WHERE numDuplicateEmails > 1
ORDER BY numDuplicateEmails DESC;
I think this will also work.
WITH cte AS (
SELECT
*
,DuplicateEmails = ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY user_email)
FROM Users
)
-- ROW_NUMBER() > 1 skips the first row of each email, so only the extra copies are returned
SELECT * FROM cte
WHERE DuplicateEmails > 1
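For a quick way to try the CTE-and-join approach without a SQL Server instance, here is a minimal sketch assuming Python's built-in sqlite3 module. The Users rows come from the desired output in the question; the extra 'solo@example.com' row is a hypothetical addition to show that non-duplicated emails are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (
        UserID TEXT PRIMARY KEY,
        FirstName TEXT,
        LastName TEXT,
        user_email TEXT
    );
    INSERT INTO Users (user_email, FirstName, LastName, UserID) VALUES
        ('abc@gmail.com', 'Tim', 'Lentil', 'timLentil'),
        ('abc@gmail.com', 'John', 'Doe', 'johnDoe12'),
        ('xyz@yahoo.com', 'brian', 'boss', 'brianTheBoss'),
        ('xyz@yahoo.com', 'Thomas', 'Hood', 'tHood'),
        ('xyz@yahoo.com', 'Mark', 'Brown', 'MBrown12'),
        -- hypothetical row, not in the question: an email that occurs only once
        ('solo@example.com', 'Ann', 'Single', 'annSingle');
""")

DUPLICATES_SQL = """
WITH FoundDuplicates AS (
    SELECT user_email, COUNT(*) AS DuplicateEmails
    FROM Users
    GROUP BY user_email
    HAVING COUNT(*) > 1
)
SELECT fd.user_email, fd.DuplicateEmails, u.FirstName, u.LastName, u.UserID
FROM Users u
JOIN FoundDuplicates fd ON fd.user_email = u.user_email
ORDER BY fd.DuplicateEmails DESC, fd.user_email, u.UserID
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
for row in duplicates:
    print(row)
```

Every row of a duplicated email is listed, including the first one, and each row carries the group's count.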

How to split comma-separated values into multiple rows in Oracle table

SELECT year, movietitle, director, actorname
FROM films11
WHERE actorname like '%Christina Ricci%'
order by year asc;
produces the following in ORACLE SQL Developer from the original data schema.
I want to transform the whole table so that the primary key becomes the actor name. (like in the second table)
This way the query
SELECT year, movietitle, director, actorname
FROM films11
WHERE actorname like '%Christina Ricci%'
order by year asc;
will produce only the searched item (either create a new view, or change the data schema completely.) (third table)
Step 1 : "How to blow up a database"
From :
SQL Fiddle
Oracle 11g R2 Schema Setup:
Query 1:
select * from films11
Results:
| YEAR | DIRECTOR | MOVIETITLE | ACTORNAME |
|------|----------|------------|----------------|
| 2000 | dir1 | title1 | act1,act2 |
| 2001 | dir2 | title2 | act1,act2,act3 |
| 2002 | dir1 | title3 | act4 |
Query 2:
select YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
from films11 YT
join (select level as lvl
from dual
connect by level <= (select max(regexp_count(actorname,',')+1) from films11)
) lvl on lvl.lvl <= regexp_count(YT.actorname,',')+1
order by YT.year, YT.movietitle, actorname
With a nice Cartesian product :
Results:
| YEAR | MOVIETITLE | ACTORNAME |
|------|------------|-----------|
| 2000 | title1 | act1 |
| 2000 | title1 | act2 |
| 2001 | title2 | act1 |
| 2001 | title2 | act2 |
| 2001 | title2 | act3 |
| 2002 | title3 | act4 |
You run it ONCE and use it to move everything to a normalized DB
Here is the full script to change your schema to something more convenient...
CREATE TABLE actors(
id_actor NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
act_name VARCHAR2(100)
)
;
CREATE TABLE directors(
id_director NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
dir_name VARCHAR2(100)
)
;
CREATE TABLE movies(
id_movie NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
mov_year NUMBER,
mov_name VARCHAR2(100),
director_id NUMBER
)
;
CREATE TABLE playedby(
movie_id NUMBER,
actor_id NUMBER
)
;
INSERT INTO directors (dir_name)
SELECT DISTINCT director dir_name
FROM films11
;
INSERT INTO movies (mov_year, mov_name, director_id)
SELECT year mov_year, movietitle mov_name, directors.id_director director_id
FROM films11
INNER JOIN directors ON directors.dir_name = films11.director
;
INSERT INTO actors (act_name)
SELECT DISTINCT t.actorname act_name
FROM (
SELECT YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
FROM films11 YT
JOIN (SELECT level AS lvl
FROM dual
CONNECT BY level <= (SELECT MAX(REGEXP_COUNT(actorname,',')+1) FROM films11)
) lvl ON lvl.lvl <= REGEXP_COUNT(YT.actorname,',')+1
) t
;
INSERT INTO playedby (movie_id, actor_id)
SELECT movies.id_movie movie_id, actors.id_actor actor_id
FROM (
SELECT YT.year, YT.movietitle,
REPLACE(REGEXP_SUBSTR(YT.actorname||',','.*?,',1,lvl.lvl),',','') AS actorname
FROM films11 YT
JOIN (SELECT level AS lvl
FROM dual
CONNECT BY level <= (SELECT MAX(REGEXP_COUNT(actorname,',')+1) FROM films11)
) lvl ON lvl.lvl <= REGEXP_COUNT(YT.actorname,',')+1
) t
INNER JOIN actors ON t.actorname = actors.act_name
INNER JOIN movies ON t.year = movies.mov_year AND t.movietitle = movies.mov_name
;
After that, a plain select with joins returns only the matching actor:
Query 3:
SELECT mov_year, mov_name, dir_name, act_name
FROM movies
INNER JOIN directors ON directors.id_director = movies.director_id
INNER JOIN playedby ON movies.id_movie = playedby.movie_id
INNER JOIN actors ON playedby.actor_id = actors.id_actor
WHERE act_name like '%act2%'
order by mov_year asc
Results:
| MOV_YEAR | MOV_NAME | DIR_NAME | ACT_NAME |
|----------|----------|----------|----------|
| 2000 | title1 | dir1 | act2 |
| 2001 | title2 | dir2 | act2 |
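The splitting step from Query 2 can be tried outside Oracle as well. The sketch below assumes Python's built-in sqlite3 module and swaps the REGEXP_SUBSTR / CONNECT BY technique for a recursive CTE that peels one name off the front of the list per iteration; the CTE name split and its columns actor and rest are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films11 (year INTEGER, director TEXT, movietitle TEXT, actorname TEXT);
    INSERT INTO films11 VALUES
        (2000, 'dir1', 'title1', 'act1,act2'),
        (2001, 'dir2', 'title2', 'act1,act2,act3'),
        (2002, 'dir1', 'title3', 'act4');
""")

SPLIT_SQL = """
WITH RECURSIVE split(year, movietitle, actor, rest) AS (
    -- seed row: nothing extracted yet, list terminated with a trailing comma
    SELECT year, movietitle, '', actorname || ',' FROM films11
    UNION ALL
    -- each step moves the text before the first comma into actor
    SELECT year, movietitle,
           substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT year, movietitle, actor AS actorname
FROM split
WHERE actor <> ''
ORDER BY year, movietitle, actorname
"""

actors = conn.execute(SPLIT_SQL).fetchall()
for row in actors:
    print(row)
```

The result has one row per film and actor, the same shape that feeds the actors and playedby inserts in the script above.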

Select rows into columns and show a flag in the column

Trying to get an output like the below:
| UserFullName | JAVA | DOTNET | C | HTML5 |
|--------------|--------|--------|--------|--------|
| Anne San | | | | |
| John Khruf | 1 | 1 | | 1 |
| Mary Jane | 1 | | | 1 |
| George Mich | | | | |
This shows the roles of a person. A person could have 0 or N roles. When a person has a role, I am showing a flag, like '1'.
Actually I have 2 blocks of code:
Block #1: The tables and a simple output which generates more than one row per person.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE AvailableRoles
(
id int identity primary key,
CodeID varchar(5),
Description varchar(500),
);
INSERT INTO AvailableRoles
(CodeID, Description)
VALUES
('1', 'JAVA'),
('2', 'DOTNET'),
('3', 'C'),
('4', 'HTML5');
CREATE TABLE PersonalRoles
(
id int identity primary key,
UserID varchar(100),
RoleID varchar(5),
);
INSERT INTO PersonalRoles
(UserID, RoleID)
VALUES
('John.Khruf', '1'),
('John.Khruf', '2'),
('Mary.Jane', '1'),
('Mary.Jane', '4'),
('John.Khruf', '4');
CREATE TABLE Users
(
UserID varchar(20),
EmployeeType varchar(1),
EmployeeStatus varchar(1),
UserFullName varchar(500),
);
INSERT INTO Users
(UserID, EmployeeType, EmployeeStatus, UserFullName)
VALUES
('John.Khruf', 'E', 'A', 'John Khruf'),
('Mary.Jane', 'E', 'A', 'Mary Jane'),
('Anne.San', 'E', 'A', 'Anne San'),
('George.Mich', 'T', 'A', 'George Mich');
Query 1:
SELECT
A.UserFullName,
B.RoleID
FROM
Users A
LEFT JOIN PersonalRoles B ON B.UserID = A.UserID
WHERE
A.EmployeeStatus = 'A'
ORDER BY
A.EmployeeType ASC,
A.UserFullName ASC
Results:
| UserFullName | RoleID |
|--------------|--------|
| Anne San | (null) |
| John Khruf | 1 |
| John Khruf | 2 |
| John Khruf | 4 |
| Mary Jane | 1 |
| Mary Jane | 4 |
| George Mich | (null) |
Block #2: An attempt to convert the rows into columns to be used in the final result
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE AvailableRoles
(
id int identity primary key,
CodeID varchar(5),
Description varchar(500),
);
INSERT INTO AvailableRoles
(CodeID, Description)
VALUES
('1', 'JAVA'),
('2', 'DOTNET'),
('3', 'C'),
('4', 'HTML5');
Query 1:
SELECT
*
FROM
(
SELECT CodeID, Description
FROM AvailableRoles
) d
PIVOT
(
MAX(CodeID)
FOR Description IN (Java, DOTNET, C, HTML5)
) piv
Results:
| Java | DOTNET | C | HTML5 |
|--------|--------|-------|--------|
| 1 | 2 | 3 | 4 |
Any help in mixing both blocks to show the top output will be welcome. Thanks.
Another option without PIVOT operator is:
select u.UserFullName,
max(case when a.CodeID='1' then '1' else '' end) JAVA,
max(case when a.CodeID='2' then '1' else '' end) DOTNET,
max(case when a.CodeID='3' then '1' else '' end) C,
max(case when a.CodeID='4' then '1' else '' end) HTML5
from
Users u
LEFT JOIN PersonalRoles p on (u.UserID = p.UserID)
LEFT JOIN AvailableRoles a on (p.RoleID = a.CodeID)
group by u.UserFullName
order by u.UserFullName
SQLFiddle: http://sqlfiddle.com/#!3/630c3/19
You can try this.
SELECT *
FROM
(
select u.userfullname,
case when p.roleid is not null then 1 end as roleid,
a.description
from users u
left join personalroles p
on p.userid = u.userid
left join availableroles a
on a.codeid = p.roleid
) d
PIVOT
(
MAX(roleID)
FOR Description IN (Java, DOTNET, C, HTML5)
) piv
Fiddle