SQL occurrences of combinations into a matrix

This seems like a rather common thing to do/query, but I'm not sure what is the best way to approach this and I cannot find similar examples. Basically I have 5 different systems where I've extracted the unique user IDs from the user log table. I want to know the overlap of users across the systems. The resulting tables are like this (user IDs are shared between systems):
sysA
-----
user1
user2
user3
sysB
-----
user2
user3
user4
user5
sysC
-----
user5
Now the output should be like this:
sysA sysB sysC sysD sysE count_distinct(userkey)
1 0 0 0 0 1
1 1 0 0 0 2
1 0 1 0 0 0
etc.
I tried doing this with GROUP BY CUBE, which Oracle supports and which seemed useful in this case, but I couldn't get it to work: it seems I would need to join every possible combination first in order to get the right result. Another thing I tried is this:
SELECT sysA_flag, sysB_flag, sysC_flag, sysD_flag, sysE_flag, COUNT(*)
FROM (
SELECT DISTINCT userId, 1 sysA_flag
FROM sysA_input_table
) sysA
FULL OUTER JOIN (
SELECT DISTINCT userId, 1 sysB_flag
FROM sysB_input_table
) sysB
ON sysA.userId = sysB.userId
FULL OUTER JOIN (
SELECT DISTINCT userId, 1 sysC_flag
FROM sysC_input_table
) sysC
ON sysA.userId = sysC.userId
FULL OUTER JOIN (
SELECT DISTINCT userId, 1 sysD_flag
FROM sysD_input_table
) sysD
ON sysA.userId = sysD.userId
FULL OUTER JOIN (
SELECT DISTINCT userId, 1 sysE_flag
FROM sysE_input_table
) sysE
ON sysA.userId = sysE.userId
GROUP BY (sysA_flag, sysB_flag, sysC_flag, sysD_flag, sysE_flag)
In principle this gives the right output, but doesn't give all combinations (only for sysA). It might work by expanding on this, but it seems like an inefficient way to do it.
How should this be done in a proper way?

To answer the "how to do it" part of your question, a straightforward approach would be something like this to start with:
with sysA as (
select 'user1' as user_id from dual union all
select 'user2' from dual union all
select 'user3' from dual
),
sysB as
(
select 'user2' from dual union all
select 'user3' from dual union all
select 'user4' from dual union all
select 'user5' from dual
),
sysC as
(
select 'user5' from dual
), cte as (
select * from (
select t1.*,'a' sys from sysa t1
union all
select t1.*,'b' from sysb t1
union all
select t1.*,'c' from sysc t1
) x
pivot (count(*) for sys in ('a' as a,'b' as b,'c' as c))
),
combinations as (select * from (select level-1 as a from dual connect by level <= 2) t1
cross join (select level-1 as b from dual connect by level <= 2) t2
cross join (select level-1 as c from dual connect by level <= 2) t3
)
select t1.a,t1.b,t1.c,count(t2.user_id) as user_count
from combinations t1
left join cte t2 on t1.a = t2.a and t1.b = t2.b and t1.c = t2.c
group by t1.a,t1.b,t1.c
--having t1.a + t1.b + t1.c = 2
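For anyone who wants to try the idea outside Oracle, here is a minimal sketch of the same steps (one 0/1 flag per system and user, then a count per flag combination, with empty combinations filled in as 0). It uses Python's sqlite3 module as a stand-in, so there is no PIVOT; the `membership` table, the `overlap_matrix` function and the `f0`, `f1`, ... aliases are names I made up for illustration.

```python
import sqlite3
from itertools import product

# Sample data from the question.
SYSTEMS = {
    "sysA": ["user1", "user2", "user3"],
    "sysB": ["user2", "user3", "user4", "user5"],
    "sysC": ["user5"],
}

def overlap_matrix(systems):
    """Return {(flag per system, in sorted name order): distinct user count}."""
    names = sorted(systems)
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE membership (sys TEXT, user_id TEXT)")
    con.executemany(
        "INSERT INTO membership VALUES (?, ?)",
        [(name, user) for name in names for user in systems[name]],
    )
    # Conditional aggregation: one 0/1 flag column per system for every user.
    flag_exprs = ", ".join(
        f"MAX(CASE WHEN sys = ? THEN 1 ELSE 0 END) AS f{i}"
        for i in range(len(names))
    )
    flag_cols = ", ".join(f"f{i}" for i in range(len(names)))
    sql = (
        f"SELECT {flag_cols}, COUNT(*) "
        f"FROM (SELECT user_id, {flag_exprs} FROM membership GROUP BY user_id) AS per_user "
        f"GROUP BY {flag_cols}"
    )
    counts = {tuple(row[:-1]): row[-1] for row in con.execute(sql, names)}
    con.close()
    # Combinations that no user matches are reported with a count of 0.
    return {
        combo: counts.get(combo, 0)
        for combo in product((0, 1), repeat=len(names))
    }

matrix = overlap_matrix(SYSTEMS)
```

Because each user collapses to a single row before counting, duplicate log entries per system do not inflate the counts.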

Related

Can "value in list or list is empty" be written shorter?

Given this SQL:
select * from table1
where
table1.columnFoo = 123
and
(
some_value is null
or
some_value in (select column1 from table2 where table1.colX=table2.colY)
or
not exists (select column1 from table2 where table1.colX=table2.colY)
);
-- some_value is a constant or an input parameter in an (PL/)SQL procedure
-- if it is non null, then we want to filter by it. Except if the list selection is empty.
Is there a way to write the "in list or list is empty" part shorter?
Preferably in a way that contains the list only once (see the Don't Repeat Yourself principle).
I'm interested in Oracle SQL or PL/SQL, but other information is also welcome.
As requested, an MRE that works in SQL*Plus:
create table table1 as select 1 id, 'one' name , 12 price from dual
union select 2 , 'two' , 22 from dual
union select 3 , 'thr' , 33 from dual;
create table table2 as select 1 id1, 88 idX, sysdate-1 validDate from dual -- valid
union select 1 , 99 , sysdate+2 from dual -- these two are not valid (yet)
union select 2 , 99 , sysdate+3 from dual;
var some_value number
--exec :some_value := 3 -- uncomment for non null values
with cte as (select id1,idX from table2 where validDate<sysdate)
select * from table1
where
table1.price > 10
and
(
:some_value is null
or
:some_value in (select idX from cte where table1.id=cte.id1)
or
not exists (select idX from cte where table1.id=cte.id1)
);
From Oracle 12, you could use a LATERAL join with conditional aggregation:
SELECT t1.*
FROM table1 t1
CROSS JOIN LATERAL(
SELECT 1 AS matched
FROM table2 t2
WHERE t1.colX=t2.colY
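-- The NULL check belongs in HAVING: with a CROSS JOIN, a t1 row disappears
-- whenever the lateral view returns no row, even if :some_value is null.
HAVING :some_value IS NULL
OR COUNT(*) = 0
OR COUNT(CASE t2.column1 WHEN :some_value THEN 1 END) > 0
) t2
WHERE t1.columnFoo = 123;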
Or a similar technique using EXISTS:
select *
from table1
WHERE columnFoo = 123
AND ( :some_value is null
OR EXISTS(
SELECT 1
FROM table2
WHERE table1.colX=colY
HAVING COUNT(*) = 0
OR COUNT(CASE column1 WHEN :some_value THEN 1 END) > 0
)
);
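To check the "list mentioned only once" idea against the MRE, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. Two things are my own assumptions: the `is_valid` column replaces the `validDate < sysdate` test, and the aggregate is written as a scalar subquery instead of EXISTS ... HAVING, because older SQLite versions reject HAVING without GROUP BY. The function name `filter_rows` is hypothetical.

```python
import sqlite3

def filter_rows(some_value):
    """Return the table1 ids that pass the 'in list or list is empty' filter."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER, name TEXT, price INTEGER);
        INSERT INTO table1 VALUES (1, 'one', 12), (2, 'two', 22), (3, 'thr', 33);
        CREATE TABLE table2 (id1 INTEGER, idX INTEGER, is_valid INTEGER);
        INSERT INTO table2 VALUES (1, 88, 1), (1, 99, 0), (2, 99, 0);
    """)
    sql = """
        SELECT t1.id
        FROM table1 t1
        WHERE t1.price > 10
          AND (:v IS NULL
               -- An aggregate without GROUP BY always yields exactly one row,
               -- so the list is scanned once and both tests share it.
               OR (SELECT COUNT(*) = 0
                          OR COUNT(CASE WHEN t2.idX = :v THEN 1 END) > 0
                   FROM table2 t2
                   WHERE t2.id1 = t1.id AND t2.is_valid = 1))
        ORDER BY t1.id
    """
    ids = [row[0] for row in con.execute(sql, {"v": some_value})]
    con.close()
    return ids
```

With the MRE data only id 1 has a non-empty list (containing 88), so a non-null value other than 88 filters out id 1 and keeps the rows whose list is empty.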

Move columns to rows

I need to append the values of the third column to the first column, so that the first column also includes the third column's values.
Current status:
Desired Results:
You want UNION ALL:
SELECT t.entity, t.activity
FROM t
UNION ALL
SELECT t.entity_2, t.activity_2
FROM t;
If you have a lot of data, you may not want to scan the table multiple times -- which is what union all does.
Instead:
select (case when n.n = 1 then entity
when n.n = 2 then entity_2
end) as entity,
(case when n.n = 1 then activity
when n.n = 2 then activity_2
end) as activity
from t cross join
(select 1 as n from dual union all
select 2 as n from dual
) n;
In Oracle 12C+, this is simplified using lateral joins:
select s.entity, s.activity
from t cross join lateral
(select t.entity, t.activity from dual union all
select t.entity_2, t.activity_2 from dual
) s;
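If you want to see the single-scan CROSS JOIN variant run, here is a minimal sketch in Python's sqlite3 module (a stand-in, not Oracle). The sample rows are invented, since the original tables were posted as images, and `unpivot_pairs` is a hypothetical helper name.

```python
import sqlite3

def unpivot_pairs(rows):
    """Turn (entity, activity, entity_2, activity_2) rows into (entity, activity) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (entity TEXT, activity TEXT, entity_2 TEXT, activity_2 TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT CASE n.n WHEN 1 THEN t.entity   ELSE t.entity_2   END AS entity,
               CASE n.n WHEN 1 THEN t.activity ELSE t.activity_2 END AS activity
        FROM t
        -- Two-row helper: each row of t is emitted once per value of n.
        CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2) AS n
        ORDER BY n.n, t.rowid
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

pairs = unpivot_pairs([("e1", "a1", "e2", "a2"), ("e3", "a3", "e4", "a4")])
```

The ORDER BY is only there to reproduce the order that UNION ALL would typically give (all first-pair rows, then all second-pair rows); drop it if order does not matter.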
select entity, activity from <table>
union all
select entity_2, activity_2 from <table>
In general:
select col1,col2 from table1
union all
select col3,col4 from table1;

Remove duplicates using only where condition

Today I got a problem from a friend.
Problem: write a SQL query using UNION ALL (not UNION) that uses the WHERE clause to eliminate duplicates.
I cannot use a GROUP BY expression.
I cannot use the UNIQUE or DISTINCT keywords.
Input -
id(Table 1)
1
2
fk_id(Table 2)
1
1
2
I gave him the query below as a solution:
select id from
(
select id , row_number() over(partition by id order by id) rn from
(
select id from T1
union all
select fk_ID id from T2
)
)where rn = 1;
Output -
id
1
2
which generates unique IDs.
Now he has added a twist: I also cannot use row_number(). I have to use only the WHERE condition. I am writing the query on an Oracle database.
Please suggest.
Thanks in advance.
From its name and the data shown, we can assume that id in table t1 is unique.
From its name and the data shown, we can assume that fk_id in table t2 is a foreign key to t1.id.
So the union of the IDs in the two tables is simply the set of IDs that we find in table t1.
As we are forced to use UNION ALL on the two tables, though, we can use a pseudo UNION ALL not adding anything:
select id from t1
union all
select fk_id from t2 where 1 = 2;
If t2.fk_id were not a foreign key referencing t1.id, we would use NOT EXISTS or NOT IN in the WHERE clause instead. For this to give a result without duplicates, however, t2 itself must contain no duplicates to begin with. (As you show that duplicate values do exist in t2, this approach would not work for your data.) Here is a query for the unique values from t1 plus the unique values from t2 that do not reference a t1 value:
select id from t1
union all
select fk_id from t2 where fk_id not in (select id from t1);
In a more generic case, where you can have duplicates in both tables, this could be a way.
test data:
create table table1(id) as (
select 1 from dual union all
select 1 from dual union all
select 2 from dual union all
select 2 from dual union all
select 1 from dual
);
create table table2(fk_id) as (
select 1 from dual union all
select 1 from dual union all
select 1 from dual union all
select 3 from dual union all
select 4 from dual union all
select 1 from dual union all
select 4 from dual union all
select 2 from dual
);
query:
with tab1_union_all_tab2 as (
select 'tab1'||rownum as uniqueId, id from table1 UNION ALL
select 'tab2'||rownum , fk_id from table2
)
select id
from tab1_union_all_tab2 u1
where not exists ( select 1
from tab1_union_all_tab2 u2
where u1.id = u2.id
and u1.uniqueId < u2.uniqueId
)
result:
ID
----------
3
4
1
2
This should clarify the idea behind it:
with tab1_union_all_tab2 as (
select 'tab1'||rownum as uniqueId, id from table1 UNION ALL
select 'tab2'||rownum , fk_id from table2
)
select uniqueId, id,
( select nvl(listagg ( uniqueId, ', ') within group ( order by uniqueId), 'NO DUPLICATES')
from tab1_union_all_tab2 u2
where u1.id = u2.id
and u1.uniqueId < u2.uniqueId
) duplicates
from tab1_union_all_tab2 u1
UNIQUEID ID DUPLICATES
---------- ---------- --------------------------------------------------
tab11 1 tab12, tab15, tab21, tab22, tab23, tab26
tab12 1 tab15, tab21, tab22, tab23, tab26
tab13 2 tab14, tab28
tab14 2 tab28
tab15 1 tab21, tab22, tab23, tab26
tab21 1 tab22, tab23, tab26
tab22 1 tab23, tab26
tab23 1 tab26
tab24 3 NO DUPLICATES
tab25 4 tab27
tab26 1 NO DUPLICATES
tab27 4 NO DUPLICATES
tab28 2 NO DUPLICATES
As rightly observed by Thorsten Kettner, you can easily edit this to use rowid instead of building a unique id by concatenating a string and the rownum:
with tab1_union_all_tab2 as (
select rowid uniqueId, id from table1 UNION ALL
select rowid , fk_id from table2
)
select id
from tab1_union_all_tab2 u1
where not exists ( select 1
from tab1_union_all_tab2 u2
where u1.id = u2.id
and u1.uniqueId < u2.uniqueId
)
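The same "keep only the row that has no later twin" idea can be tried in SQLite through Python's sqlite3 module. This is only a sketch under one assumption of mine: SQLite rowids restart in every table, so I pair each rowid with a source number (`src`) to get a key that is unique across the UNION ALL. The function name `distinct_ids` is hypothetical, and the final `sorted` in Python is just there to make the output deterministic.

```python
import sqlite3

def distinct_ids(ids1, ids2):
    """De-duplicate the UNION ALL of two id lists using only a WHERE clause."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (id INTEGER)")
    con.execute("CREATE TABLE table2 (fk_id INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in ids1])
    con.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in ids2])
    sql = """
        WITH u AS (
            SELECT 1 AS src, rowid AS rid, id FROM table1
            UNION ALL
            SELECT 2, rowid, fk_id FROM table2
        )
        SELECT u1.id
        FROM u AS u1
        -- Keep a row only if no row with the same id has a larger (src, rid) key.
        WHERE NOT EXISTS (
            SELECT 1
            FROM u AS u2
            WHERE u2.id = u1.id
              AND (u2.src > u1.src OR (u2.src = u1.src AND u2.rid > u1.rid))
        )
    """
    result = sorted(row[0] for row in con.execute(sql))
    con.close()
    return result

unique_ids = distinct_ids([1, 1, 2, 2, 1], [1, 1, 1, 3, 4, 1, 4, 2])
```

Exactly one row per id survives, namely the one with the largest key, so no DISTINCT, GROUP BY or row_number() is needed.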
Write a WHERE clause for the second SELECT in the UNION ALL, such as where id != fk_id.

Is there something equivalent to putting an order by clause in a derived table?

This is sybase 15.
Here's my problem.
I have 2 tables.
t1.jobid t1.date
------------------------------
1 1/1/2012
2 4/1/2012
3 2/1/2012
4 3/1/2012
t2.jobid t2.userid t2.status
-----------------------------------------------
1 100 1
1 110 1
1 120 2
1 130 1
2 100 1
2 130 2
3 100 1
3 110 1
3 120 1
3 130 1
4 110 2
4 120 2
I want to find all the people whose status for THEIR two most recent jobs is 2.
My plan was to take the top 2 of a derived table that joined t1 and t2 and was ordered by date backwards for a given user. So the top two would be the most recent for a given user.
So that would give me that individual's most recent job numbers. Not everybody is in every job.
Then I was going to make an outer query that joined against the derived table, searching for status 2s with a having sum(status) = 4 or something like that. That would find the people with two status 2s.
But Sybase won't let me use an order by clause in the derived table.
Any suggestions on how to go about this?
I can always write a little program to loop through all the users, but I was going to try to make one horrendous SQL statement out of it.
Juicy one, no?
You could rank the rows in the subquery by adding an extra column using a window function. Then select the rows that have the appropriate ranks within their groups.
I've never used Sybase, but the documentation seems to indicate that this is possible.
With Table1 As
(
Select 1 As jobid, '1/1/2012' As [date]
Union All Select 2, '4/1/2012'
Union All Select 3, '2/1/2012'
Union All Select 4, '3/1/2012'
)
, Table2 As
(
Select 1 jobid, 100 As userid, 1 as status
Union All Select 1,110,1
Union All Select 1,120,2
Union All Select 1,130,1
Union All Select 2,100,1
Union All Select 2,130,2
Union All Select 3,100,1
Union All Select 3,110,1
Union All Select 3,120,1
Union All Select 3,130,1
Union All Select 4,110,2
Union All Select 4,120,2
)
, MostRecentJobs As
(
Select T1.jobid, T1.date, T2.userid, T2.status
, Row_Number() Over ( Partition By T2.userid Order By T1.date Desc ) As JobCnt
From Table1 As T1
Join Table2 As T2
On T2.jobid = T1.jobid
)
Select *
From MostRecentJobs As M2
Where Not Exists (
Select 1
From MostRecentJobs As M1
Where M1.userid = M2.userid
And M1.JobCnt <= 2
And M1.status <> 2
)
And M2.JobCnt <= 2
I'm using a number of features here which do exist in Sybase 15. First, I'm using common table expressions both for my sample data and to group my queries together. Second, I'm using the ranking function Row_Number to order the jobs by date.
It should be noted that in the example data you gave, no user satisfies the requirement of having their two most recent jobs both be of status "2".
__
Edit
If you are using a version of Sybase that does not support ranking functions (e.g. Sybase 15 prior to 15.2), then you need to simulate the ranking function using counts.
Create Table #JobRnks
(
jobid int not null
, userid int not null
, status int not null
, [date] datetime not null
, JobCnt int not null
, Primary Key ( jobid, userid, [date] )
)
Insert #JobRnks( jobid, userid, status, [date], JobCnt )
Select T1.jobid, T1.userid, T1.status, T1.[date], Count(T2.jobid)+ 1 As JobCnt
From (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T1
Left Join (
Select T1.jobid, T2.userid, T2.status, T1.[date]
From #Table2 As T2
Join #Table1 As T1
On T1.jobid = T2.jobid
) As T2
On T2.userid = T1.userid
And T2.[date] > T1.[date] -- count the more recent jobs, so that JobCnt = 1 is the most recent job
Group By T1.jobid, T1.userid, T1.status, T1.[date]
Select *
From #JobRnks As J1
Where Not Exists (
Select 1
From #JobRnks As J2
Where J2.userid = J1.userid
And J2.JobCnt <= 2
And J2.status <> 2
)
And J1.JobCnt <= 2
The reason for using the temp table here is for performance and ease of reading. Technically, you could plug in the query for the temp table into the two places used as a derived table and achieve the same result.
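Here is a minimal sketch of the count-based ranking, run through Python's sqlite3 module rather than Sybase. The assumptions are mine: the dates are rewritten in ISO format so that they compare correctly as text, the columns are called `job_date` and `job_rank`, and the final filter insists on exactly two ranked rows that both have status 2 (a variation on the Not Exists check above, which would also accept a user with a single job). The function name is hypothetical.

```python
import sqlite3

JOBS = [(1, "2012-01-01"), (2, "2012-04-01"), (3, "2012-02-01"), (4, "2012-03-01")]
ASSIGNMENTS = [
    (1, 100, 1), (1, 110, 1), (1, 120, 2), (1, 130, 1),
    (2, 100, 1), (2, 130, 2),
    (3, 100, 1), (3, 110, 1), (3, 120, 1), (3, 130, 1),
    (4, 110, 2), (4, 120, 2),
]

def users_with_last_two_status_2(jobs, assignments):
    """Return the users whose two most recent jobs both have status 2."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (jobid INTEGER, job_date TEXT)")
    con.execute("CREATE TABLE t2 (jobid INTEGER, userid INTEGER, status INTEGER)")
    con.executemany("INSERT INTO t1 VALUES (?, ?)", jobs)
    con.executemany("INSERT INTO t2 VALUES (?, ?, ?)", assignments)
    sql = """
        WITH joined AS (
            SELECT t2.userid, t2.status, t1.job_date
            FROM t2 JOIN t1 ON t1.jobid = t2.jobid
        ),
        ranked AS (
            -- Rank = 1 + number of more recent jobs of the same user.
            SELECT x.userid, x.status,
                   (SELECT COUNT(*) FROM joined y
                    WHERE y.userid = x.userid
                      AND y.job_date > x.job_date) + 1 AS job_rank
            FROM joined x
        )
        SELECT userid
        FROM ranked
        WHERE job_rank <= 2
        GROUP BY userid
        HAVING COUNT(*) = 2 AND MIN(status) = 2 AND MAX(status) = 2
        ORDER BY userid
    """
    users = [row[0] for row in con.execute(sql)]
    con.close()
    return users

result = users_with_last_two_status_2(JOBS, ASSIGNMENTS)
```

On the sample data from the question the result is empty, which matches the observation above that no user qualifies.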

SQL Select Condition Question

I have a quick question about a select statement condition.
I have the following table with the following items. What I need to get is the object id that matches both type id's.
TypeId ObjectId
1 10
2 10
1 11
So I need to get object 10 because it matches both type id 1 and type id 2.
SELECT ObjectId
FROM Table
WHERE TypeId = 1
AND TypeId = 2
Obviously this doesn't work because it won't match both conditions for the same row. How do I perform this query?
Also note that I may pass in 2 or more type id's to narrow down the results.
Self-join:
SELECT t1.ObjectId
FROM Table AS t1
INNER JOIN Table AS t2
ON t1.ObjectId = t2.ObjectId
AND t1.TypeId = 1
AND t2.TypeId = 2
Not sure how you want the behavior to work when passing in values, but that's a start.
I upvoted the answer from @Cade Roux, and that's how I would do it.
But FWIW, here's an alternative solution:
SELECT ObjectId
FROM Table
WHERE TypeId IN (1, 2)
GROUP BY ObjectId
HAVING COUNT(*) = 2;
Assuming uniqueness over TypeId, ObjectId.
Re the comment from @Josh that he may need to search for three or more TypeId values:
The solution using JOIN requires a join per value you're searching for. The solution above using GROUP BY may be easier if you find yourself searching for an increasing number of values.
This code is written with Oracle in mind. It should be general enough for other flavors of SQL.
select t1.ObjectId from Table t1
join Table t2 on t2.TypeId = 2 and t1.ObjectId = t2.ObjectId
where t1.TypeId = 1;
To add additional TypeIds, you just have to add another join:
select t1.ObjectId from Table t1
join Table t2 on t2.TypeId = 2 and t1.ObjectId = t2.ObjectId
join Table t3 on t3.TypeId = 3 and t1.ObjectId = t3.ObjectId
join Table t4 on t4.TypeId = 4 and t1.ObjectId = t4.ObjectId
where t1.TypeId = 1;
Important note: as you add more joins, performance will suffer a LOT.
In regards to Bill's answer you can change it to the following to get rid of the need to assume uniqueness:
SELECT ObjectId
FROM (SELECT DISTINCT ObjectId, TypeId FROM Table) t -- some databases require an alias on a derived table
WHERE TypeId IN (1, 2)
GROUP BY ObjectId
HAVING COUNT(*) = 2;
His way of doing it scales better as the number of types gets larger.
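To illustrate how the GROUP BY approach grows with the number of type ids, here is a small sketch using Python's sqlite3 module. It counts distinct TypeId values in HAVING, which avoids both the uniqueness assumption and the extra derived table. The table name `object_types` and the function `objects_with_all_types` are hypothetical.

```python
import sqlite3

def objects_with_all_types(rows, wanted_types):
    """Return the ObjectIds that have every TypeId in wanted_types."""
    wanted = sorted(set(wanted_types))
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE object_types (TypeId INTEGER, ObjectId INTEGER)")
    con.executemany("INSERT INTO object_types VALUES (?, ?)", rows)
    # One placeholder per wanted type, so the list can be any length.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT ObjectId
        FROM object_types
        WHERE TypeId IN ({placeholders})
        GROUP BY ObjectId
        -- DISTINCT keeps duplicate (TypeId, ObjectId) rows from inflating the count.
        HAVING COUNT(DISTINCT TypeId) = ?
        ORDER BY ObjectId
    """
    ids = [row[0] for row in con.execute(sql, wanted + [len(wanted)])]
    con.close()
    return ids

# Rows from the question plus one duplicate row.
matches = objects_with_all_types([(1, 10), (2, 10), (1, 11), (1, 10)], [1, 2])
```

Searching for more type ids only lengthens the parameter list; the shape of the query stays the same, whereas the join version needs one more join per value.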
Try this
Sample Input:(Case 1)
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 1,11
select * from @t
Sample Input:(Case 2)
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 3,10 union all select 4,10 union all
select 5,10 union all select 6,10 union all
select 1,11 union all select 2,11 union all
select 3,11 union all select 4,11 union all
select 5,11 union all select 1,12 union all
select 2,12 union all select 3,12 union all
select 4,12 union all select 5,12 union all
select 6,12
select * from @t
Sample Input:(Case 3)[Duplicate entries are there]
declare @t table(Typeid int,ObjectId int)
insert into @t
select 1,10 union all select 2,10 union all
select 1,10 union all select 2,10 union all
select 3,10 union all select 4,10 union all
select 5,10 union all select 6,10 union all
select 1,11 union all select 2,11 union all
select 3,11 union all select 4,11 union all
select 5,11 union all select 1,12 union all
select 2,12 union all select 3,12 union all
select 4,12 union all select 5,12 union all
select 6,12 union all select 3,12
For case 1, the output should be 10
For case 2 & 3, the output should be 10 and 12
Query:
select X.ObjectId from
(
select
T.ObjectId
,count(ObjectId) cnt
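from(select distinct ObjectId,Typeid from @t)T
where T.Typeid in(select Typeid from @t)
group by T.ObjectId )X
-- count the distinct type ids; max(Typeid) only equals that count when the ids run 1..n without gaps
join (select count(distinct Typeid) maxcnt from @t)Y
on X.cnt = Y.maxcnt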