SQL Query Logic suggestion - sql

I'm stuck on a peculiar scenario and need help with the logic for a SQL query. I've tried my best but keep getting stuck.
I have a list of companies along with the corresponding set of directors for each. Let's assume company 'x' has 5 directors (aa, bb, cc, dd, ee). I need to find out whether any other company in the list has exactly the same set of directors (i.e. aa, bb, cc, dd, ee are all present in company 'z' too). If even one director differs, the pair should not be considered a match.
Let's consider a simple example:
company director
-------------------
a xx
a yy
b zz
b xx
c xx
c yy
Required output (since a and c have the same set of directors):
company1 company2 director
---------------------------
a c xx
a c yy
Logic tried so far:
I duplicated the input table for comparison and performed a simple inner join. It fetches values, but the real problem is grouping the company names, which gets troublesome with every iteration.
Can anyone help with this? I'd be really thankful.

I can think of a hack using listagg
with x as (
select 'A' as company, 1 as director from dual union all
select 'A' as company, 2 as director from dual union all
select 'B' as company, 1 as director from dual union all
select 'B' as company, 3 as director from dual union all
select 'B' as company, 4 as director from dual union all
select 'C' as company, 1 as director from dual union all
select 'C' as company, 2 as director from dual union all
select 'D' as company, 4 as director from dual union all
select 'E' as company, 4 as director from dual union all
select 'F' as company, 5 as director from dual union all
select 'F' as company, 4 as director from dual union all
select 'G' as company, 4 as director from dual union all
select 'G' as company, 5 as director from dual
) , companies as (
select company,
listagg(director,',') within group (order by director) directors
from x
group by company
)
select directors,
listagg(company,',') within group (order by company) as companies
from companies
group by directors
having count(*) > 1
and the result is
DIRECTORS COMPANIES
------------------------------ ------------------------------
1,2 A,C
4 D,E
4,5 F,G
P.S.: if you need to fetch the data for further manipulation and the string types are not usable, you can use COLLECT instead of LISTAGG, but you would have to define a custom TYPE with a MAP or ORDER method to be able to group by the collected list of values.
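For comparison, here is the same grouping idea outside the database, as a minimal Python sketch. It reuses the sample rows from the WITH clause above; the function name `companies_sharing_directors` is just an illustrative choice. A `frozenset` stands in for the sorted LISTAGG string, which also avoids one weakness of string keys: a director value that itself contains a comma could make two different sets look identical.

```python
from collections import defaultdict

# Same sample rows as the WITH clause above: (company, director)
rows = [
    ("A", 1), ("A", 2),
    ("B", 1), ("B", 3), ("B", 4),
    ("C", 1), ("C", 2),
    ("D", 4), ("E", 4),
    ("F", 5), ("F", 4),
    ("G", 4), ("G", 5),
]

def companies_sharing_directors(rows):
    """Return {director_set: sorted companies} for sets used by 2+ companies."""
    directors_by_company = defaultdict(set)
    for company, director in rows:
        directors_by_company[company].add(director)

    companies_by_set = defaultdict(list)
    for company, directors in directors_by_company.items():
        # frozenset is hashable and order-independent, so it plays the role
        # of the sorted LISTAGG string used as the grouping key.
        companies_by_set[frozenset(directors)].append(company)

    # Keep only director sets shared by more than one company (HAVING COUNT(*) > 1).
    return {
        key: sorted(companies)
        for key, companies in companies_by_set.items()
        if len(companies) > 1
    }

matches = companies_sharing_directors(rows)
for directors, companies in sorted(matches.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(directors), companies)
```

With the sample data this reports the same three groups as the query: A and C, D and E, F and G.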

My SQL is a bit rusty but I'd try something like this:
SELECT c1, c2 FROM
(select company as c0, count(director) as cd0 from data group by company) ALPHA
JOIN
(select count(d.director) as d0, d.company as c1, d.director as d1, e.company as c2, e.director as d2 from data d LEFT JOIN data e ON d.director = e.director
WHERE d.company <> e.company
group by c1, c2) BETA
ON ALPHA.cd0 = BETA.d0 AND ALPHA.c0 = BETA.c1
If you SELECT * above, you'll see that either column d1 or d2 gives you xx but not both xx and yy in the result set. However, it should be straightforward enough to write an outer query that gives you that.
Explanation: I'm computing the join of a company's directors against other companies' directors and checking the number of matches against the count of directors for the company itself. In your example:
Company a has 2 directors, and both match against only company c.
Company b has 2 directors, but neither a nor c matches both of them.
Company c has 2 directors, and both match against company a.
This makes a and c a perfect match.
I tested against some more dummy data, as well as companies with 3 or 4 directors, and it worked on those too.
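The counting idea described above can be tried out with Python's built-in sqlite3 module. This is only a sketch under a few assumptions: the table is called `data` as in the query above, each (company, director) pair appears once, and the CTE names `counts` and `shared` are made up for the example. One deliberate difference from the query above: the shared count is compared against the director count of both companies, so a company whose directors are merely a subset of another's is not reported as a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (company TEXT, director TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", "xx"), ("a", "yy"), ("b", "zz"), ("b", "xx"), ("c", "xx"), ("c", "yy")],
)

query = """
WITH counts AS (
    -- number of directors per company
    SELECT company, COUNT(*) AS n FROM data GROUP BY company
),
shared AS (
    -- number of directors each pair of companies has in common;
    -- d.company < e.company keeps each pair once and drops self-pairs
    SELECT d.company AS c1, e.company AS c2, COUNT(*) AS n
    FROM data d
    JOIN data e ON d.director = e.director AND d.company < e.company
    GROUP BY d.company, e.company
)
SELECT s.c1, s.c2, d.director
FROM shared s
JOIN counts k1 ON k1.company = s.c1 AND k1.n = s.n  -- all of c1's directors are shared
JOIN counts k2 ON k2.company = s.c2 AND k2.n = s.n  -- all of c2's directors are shared
JOIN data d ON d.company = s.c1                     -- list the shared directors
ORDER BY s.c1, s.c2, d.director
"""
result = conn.execute(query).fetchall()
print(result)
```

For the example in the question this returns the rows (a, c, xx) and (a, c, yy), which is the requested output.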

How about this:
WITH dirlist AS (
SELECT company, LISTAGG(director, ',') WITHIN GROUP (ORDER BY director) directors
FROM companies
GROUP BY company
)
SELECT DISTINCT c1.company, c2.company, c1.directors
FROM dirlist c1, dirlist c2
WHERE c1.directors=c2.directors;
Edit: I can see now that I haven't answered your question exactly. This query only detects matches, without eliminating duplicate pairs or self-joins. Hope it helps a bit.

A slight update on the answer by GoranM
WITH dirlist AS (
SELECT company, LISTAGG(director, ',') WITHIN GROUP (ORDER BY director) directors
FROM companies
GROUP BY company
)
SELECT c1.company, MAX(c2.company), COUNT(c1.directors) AS DuplicateDirList
FROM dirlist c1, dirlist c2
WHERE c1.directors=c2.directors
GROUP BY c1.company
HAVING COUNT(c1.directors) >1
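This gives you each company whose director list is shared with at least one other company, together with the alphabetically last company that has that list. Note that MAX(c2.company) can be the company itself, because the self-join is not excluded.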

Related

ORACLE SQL group based on values in a reference table

The customer and account tables share a globally scoped sequence (SEQ_NO); both tables draw from it and increment it.
In the customer table, SEQ_NO 1 is the beginning of one customer's data and SEQ_NO 238 is the beginning of another customer's data.
The other table is the account table. All accounts whose SEQ_NO falls inside a customer's boundary should get the same group (I want to group those accounts under the same customer so that I can use LISTAGG to concatenate the account IDs). For example, the accounts from SEQ_NO 2 to SEQ_NO 224 (inclusive) should be assigned to the same group.
Is there a SQL way to do that? The worst case I was considering is to define an Oracle type and do it with a function.
Any help is appreciated.
If I understand your question correctly, you want to be able to assign rows in the account table to groups, one per customer, so that you can then aggregate based on these groups.
So, the question is how to identify to which customer each account belongs, based on the sequence boundaries given in the first table ("customer") and the specific account numbers in the second table ("account").
This can be done in plain SQL, and relatively easily. You need a join between the accounts table and a subquery based on the customers table. The subquery must show the first and the last sequence number allocated to each client; to do that, you can use the lead analytic function. A bit of care must be taken regarding the last customer, for whom there is no upper limit for the sequence numbers.
You didn't provide test data in a usable format, so I created sample data in the with clause below (which is not part of the query - it's just there as a placeholder for test data).
with
customer (cust_id, seq_no) as (
select 101, 1 from dual union all
select 102, 34 from dual union all
select 200, 58 from dual union all
select 130, 90 from dual
)
, account (acct_id, seq_no) as (
select 1003, 3 from dual union all
select 1005, 11 from dual union all
select 1007, 33 from dual union all
select 1008, 60 from dual union all
select 1103, 77 from dual union all
select 1140, 92 from dual union all
select 1145, 99 from dual
)
select c.cust_id,
listagg(a.acct_id, ',') within group (order by a.acct_id) as acct_list
from (
select cust_id, seq_no as lower_no,
lead(seq_no) over (order by seq_no) - 1 as upper_no
from customer
) c
left outer join account a
on a.seq_no between c.lower_no and nvl(c.upper_no, a.seq_no)
group by c.cust_id
order by c.cust_id
;
OUTPUT
CUST_ID ACCT_LIST
------- --------------------
101 1003,1005,1007
102
130 1140,1145
200 1008,1103
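The boundary lookup that the lead-based subquery performs can also be sketched procedurally, which may help when checking the SQL result. The Python below is an illustration only: it reuses the sample data from the WITH clause, and `group_accounts` is a made-up name. Each account goes to the customer with the largest starting SEQ_NO that is not greater than the account's SEQ_NO, so the last customer needs no upper limit.

```python
from bisect import bisect_right

customers = [(101, 1), (102, 34), (200, 58), (130, 90)]   # (cust_id, seq_no)
accounts = [(1003, 3), (1005, 11), (1007, 33), (1008, 60),
            (1103, 77), (1140, 92), (1145, 99)]           # (acct_id, seq_no)

def group_accounts(customers, accounts):
    """Map each cust_id to the sorted acct_ids inside its SEQ_NO range."""
    ordered = sorted(customers, key=lambda c: c[1])
    starts = [seq_no for _, seq_no in ordered]            # lower bound of each range
    grouped = {cust_id: [] for cust_id, _ in ordered}     # keeps customers with no accounts
    for acct_id, seq_no in accounts:
        # Index of the last customer whose starting SEQ_NO is <= this SEQ_NO.
        idx = bisect_right(starts, seq_no) - 1
        if idx >= 0:                                      # skip accounts before the first customer
            grouped[ordered[idx][0]].append(acct_id)
    return {cust_id: sorted(ids) for cust_id, ids in grouped.items()}

acct_lists = group_accounts(customers, accounts)
for cust_id in sorted(acct_lists):
    print(cust_id, ",".join(str(a) for a in acct_lists[cust_id]))
```

The printed lists agree with the OUTPUT above, including the empty list for customer 102.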

SQL grouping/counting on a string split function

OK, so my original query is this:
select people, count(*)
from table
group by people
but some rows contain multiple people, so this aggregation does not give pure counts for A, B and C; it also counts each combination separately:
A 10
B 5
A, B 1
A, C 2
C 15
A, B, C 3
etc.
This works to get the full list of individuals in legacy sql
select split(people,",") as person
from table
But I cannot use the group by on it
select split(people,",") as person, count(*)
from table
group by person
gives the error
Cannot group by an aggregate.
I feel like the solution is a subquery, somehow, but I'm not sure how to execute it
Try wrapping it in an outer query:
select person, count(*)
from(
select split(people,",") as person
from table
) t
group by person
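To see what numbers the split-then-group query should produce, here is a small Python sketch of the same logic applied to the grouped counts listed in the question. `grouped_counts` and `count_individuals` are names invented for this illustration. The values in the example are separated by a comma plus a space, so the sketch strips whitespace after splitting; a plain split on "," would otherwise treat "B" and " B" as different people.

```python
from collections import Counter

# The grouped output shown in the question: people string -> row count
grouped_counts = {"A": 10, "B": 5, "A, B": 1, "A, C": 2, "C": 15, "A, B, C": 3}

def count_individuals(grouped_counts):
    """Split each people string and credit its row count to every person in it."""
    totals = Counter()
    for people, n in grouped_counts.items():
        for person in people.split(","):
            totals[person.strip()] += n   # strip the space left after the comma
    return totals

totals = count_individuals(grouped_counts)
print(dict(totals))
```

For the listed rows this gives 16 for A, 9 for B and 20 for C.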

Case statement to determine if I should union

I currently want to do some sort of conditional union. Given the following example:
SELECT age, name
FROM users
UNION
SELECT 25 AS age, 'Betty' AS name
Say I want to union the second statement only if the count of 'users' is >= 2, and otherwise not union the two.
In summary, I want to append a row to the result only if the table has 2 or more rows.
You could use an ugly hack something like this, but I think Tim's answer is better:
SELECT age, name
FROM users
UNION ALL
SELECT 25, 'Betty'
WHERE (SELECT COUNT(*) FROM users) > 1;
If it's in a stored-procedure you could use If...Else:
IF (SELECT COUNT(*) FROM users) < 2
BEGIN
SELECT age, name
FROM users
END
ELSE
SELECT age, name
FROM users
UNION ALL
SELECT 25 AS age, 'Betty' AS name
Otherwise you could try something like this:
SELECT age, name
FROM users
UNION ALL
SELECT TOP 1 25 AS age, 'Betty' AS name
FROM users
WHERE (SELECT COUNT(*) FROM users) >= 2
Note that I've used UNION ALL since it doesn't seem that you want to eliminate duplicates.
Played around here: http://sqlfiddle.com/#!6/a7540/2323/0
Edit: Instead of my second approach, I prefer Zohar's. So if you can use IF...ELSE, prefer that; otherwise use WHERE (SELECT COUNT(*) FROM users) > 1 without a table.
Something like the following should work:
SELECT age, name
FROM users
UNION ALL
SELECT age, name
FROM (SELECT 25 AS age, 'Betty' AS name) x
CROSS APPLY (SELECT COUNT(*) FROM users) y(cnt)
WHERE y.cnt >= 2
The second part of the UNION ALL returns no rows if the users table has fewer than 2 records.
SELECT age
, name
FROM users
UNION
SELECT 25 As age
, 'Betty' As name
WHERE EXISTS (
SELECT Count(*)
FROM users
HAVING Count(*) >= 2
)
;
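The WHERE-on-a-count variant is easy to try with Python's built-in sqlite3 module. This is a sketch, not a statement about every database: SQLite accepts a SELECT with a WHERE clause and no FROM, but some engines require a dummy table there. The `run` helper and the sample users are made up for the test.

```python
import sqlite3

QUERY = """
SELECT age, name FROM users
UNION ALL
SELECT 25, 'Betty'
WHERE (SELECT COUNT(*) FROM users) >= 2
"""

def run(users):
    """Load the given (age, name) rows into a fresh table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (age INTEGER, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows

one_user = run([(30, "Al")])               # fewer than 2 rows: Betty is not appended
two_users = run([(30, "Al"), (41, "Cy")])  # 2 rows: Betty is appended
print(one_user)
print(sorted(two_users))
```

With one user only that user comes back; with two users the extra (25, 'Betty') row is added to the result.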

How to Correctly Sum Totals from a Table That Must be Joined to Another Table that Causes Duplicates

I have two tables like the following:
PAY_TABLE
EMPLID PAY
123 100
123 150
123 150
DEDUCTION_TABLE
EMPLID DEDUCTION
123 15
123 30
and I want a result like the following:
TOTAL_PAY
400
I would like to get that result with a fairly simple query, and I feel like I'm missing an obvious way to do it, but I can't seem to figure out what it is.
For instance, this query returns 800 because every row in the PAY_TABLE is being duplicated when joined to the DEDUCTION_TABLE:
SELECT SUM(PAY) AS TOTAL_PAY
FROM PAY_TABLE JOIN DEDUCTION_TABLE USING(EMPLID);
And this query returns 250 because the DISTINCT keyword causes the second 150 value in the PAY_TABLE to be ignored:
SELECT SUM(DISTINCT PAY) AS TOTAL_PAY
FROM PAY_TABLE JOIN DEDUCTION_TABLE USING(EMPLID);
There are probably several ways to do this, but I am looking for the simplest way to return a result of 400.
Here is some code to create the example tables to make it easier:
WITH
PAY_TABLE AS (
SELECT 123 AS EMPLID, 100 AS PAY FROM DUAL
UNION ALL
SELECT 123, 150 FROM DUAL
UNION ALL
SELECT 123, 150 FROM DUAL
),
DEDUCTION_TABLE AS (
SELECT 123 AS EMPLID, 15 AS DEDUCTION FROM DUAL
UNION ALL
SELECT 123, 30 FROM DUAL
)
It's unclear exactly what you need, since your example doesn't make use of the DEDUCTION_TABLE table, but I believe what you'll want is to aggregate before you JOIN:
;with pay AS (SELECT EmplID,SUM(PAY) AS Pay
FROM PAY_TABLE
GROUP BY EmplID
)
,ded AS (SELECT EmplID,SUM(DEDUCTION) AS Ded
FROM DEDUCTION_TABLE
GROUP BY EmplID
)
SELECT *
FROM pay
LEFT JOIN ded
ON pay.EmplID = ded.EmplID
Assuming you need the join to DEDUCTION_TABLE just to ensure that there is a deduction for the employee:
SELECT SUM(P.PAY) AS TOTAL_PAY
FROM PAY_TABLE P
WHERE EXISTS (SELECT NULL FROM DEDUCTION_TABLE D
WHERE D.EMPLID = P.EMPLID);
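The difference between joining first and aggregating first can be checked with the example data, here using Python's built-in sqlite3 module as a stand-in for Oracle. The lower-case table names and the `total_pay` / `total_deduction` aliases are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay_table (emplid INTEGER, pay INTEGER);
    CREATE TABLE deduction_table (emplid INTEGER, deduction INTEGER);
    INSERT INTO pay_table VALUES (123, 100), (123, 150), (123, 150);
    INSERT INTO deduction_table VALUES (123, 15), (123, 30);
""")

# Join first: each pay row is repeated once per deduction row.
naive = conn.execute("""
    SELECT SUM(pay) FROM pay_table JOIN deduction_table USING (emplid)
""").fetchone()[0]

# Aggregate first: each side is reduced to one row per employee before the join.
pre_aggregated = conn.execute("""
    SELECT p.total_pay, d.total_deduction
    FROM (SELECT emplid, SUM(pay) AS total_pay
          FROM pay_table GROUP BY emplid) p
    LEFT JOIN (SELECT emplid, SUM(deduction) AS total_deduction
               FROM deduction_table GROUP BY emplid) d
           ON d.emplid = p.emplid
""").fetchone()

print(naive)           # 800
print(pre_aggregated)  # (400, 45)
```

Because both subqueries return a single row per EMPLID, the join can no longer multiply rows, and both totals stay correct.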

Count instances from a selected set of 2 unified columns

Let's say I have a table called "matches" where I save 2 teams from a soccer match.
-------------------------
| home_club | away_club |
-------------------------
| | |
-------------------------
And I have a query that returns all the clubs from that table, both the home and away clubs through UNION:
SELECT home_club AS clubs FROM matches UNION SELECT away_club FROM matches
Now I have a results set called "clubs" and I wish to count how many times each has appeared in the "matches" table. How do I go about doing this?
If you want to know how many times each has appeared in the matches table, then you need to get rid of the union in the subquery. The union is going to remove duplicates.
Here is how you would get a count:
select club, count(*) as NumAppears
from (SELECT home_club AS club FROM matches
UNION ALL
SELECT away_club FROM matches
) m
group by club;
Note the UNION is replaced with UNION ALL.
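The effect of UNION versus UNION ALL on the counts can be seen in a quick sqlite3 sketch in Python. The fixtures below are hypothetical, since the question gives no sample rows, and `appearances` is a helper invented for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (home_club TEXT, away_club TEXT)")
# Hypothetical fixtures, not taken from the question.
conn.executemany(
    "INSERT INTO matches VALUES (?, ?)",
    [("Ajax", "Porto"), ("Porto", "Celtic"), ("Ajax", "Celtic"), ("Porto", "Ajax")],
)

def appearances(set_operator):
    """Count clubs over the combined home/away list using the given set operator."""
    sql = f"""
        SELECT club, COUNT(*) FROM (
            SELECT home_club AS club FROM matches
            {set_operator}
            SELECT away_club FROM matches
        ) GROUP BY club ORDER BY club
    """
    return conn.execute(sql).fetchall()

with_union = appearances("UNION")          # duplicates removed before counting
with_union_all = appearances("UNION ALL")  # every appearance kept
print(with_union)      # [('Ajax', 1), ('Celtic', 1), ('Porto', 1)]
print(with_union_all)  # [('Ajax', 3), ('Celtic', 2), ('Porto', 3)]
```

With UNION every club is counted once no matter how often it played, which is why the subquery needs UNION ALL.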
Try the following:
select sum(count) as Matches, Club from
(select count(*) as count, home_club as Club from matches group by home_club
union all
select count(*) as count, away_club as Club from matches group by away_club ) a
group by a.Club
SELECT
M.clubs,
COUNT(M.clubs) [count]
FROM
(
SELECT
home_club AS clubs
FROM
matches
UNION ALL
SELECT
away_club
FROM
matches
) M
GROUP BY
M.clubs