Form and count the most frequently watched pairs (SQL, Oracle)

I am creating a database for a video replay site, and I have a table of users and a table of viewing history. I need an SQL query that finds the most-watched pairs of videos. Example: user 1 watched videos 12, 43, 50, 66, 78; user 2 watched 12, 43, 45, 50; user 3 watched 12, 35, 50, 66, 78; user 4 watched 33, 66, 69, 78.
So the two most-viewed pairs are (12,50) and (66,78).
But I can't even work out how to form these pairs so that I can count them.
So my question is: how do I form all possible pairs and count the number of views of each?

A self join is the right way to do this. I think the simplest form of the query is:
select vh.*
from (select vh1.movie as movie1, vh2.movie as movie2, count(*) as cnt,
rank() over (order by count(*) desc) as seqnum
from viewing_history vh1 inner join
viewing_history vh2
on vh1.userid = vh2.userid and vh1.movie < vh2.movie
group by vh1.movie, vh2.movie
) vh
where seqnum = 1;

In the solution below, I create a subquery to simulate input data. In your application, instead of viewing_history you should use your viewing history table. I don't see how the "users" table is relevant in this problem. The second subquery, which I named movie_pairs, is an inner join of the viewing history with itself - that's how you create the pairs. I went beyond that - with that in hand, I went on to identify the pairs that are viewed together most often.
with
viewing_history ( userid, movie ) as (
select 1, 12 from dual union all
select 1, 43 from dual union all
select 1, 50 from dual union all
select 1, 66 from dual union all
select 1, 78 from dual union all
select 2, 12 from dual union all
select 2, 43 from dual union all
select 2, 45 from dual union all
select 2, 50 from dual union all
select 3, 12 from dual union all
select 3, 35 from dual union all
select 3, 50 from dual union all
select 3, 66 from dual union all
select 3, 78 from dual union all
select 4, 33 from dual union all
select 4, 66 from dual union all
select 4, 69 from dual union all
select 4, 78 from dual
),
-- end test data, query begins here (but include the keyword WITH from above)
movie_pairs ( movie1, movie2, ct ) as (
select a.movie, b.movie, count(*)
from viewing_history a inner join viewing_history b
on a.userid = b.userid and a.movie < b.movie
group by a.movie, b.movie
)
select movie1, movie2
from movie_pairs
where ct = (select max(ct) from movie_pairs)
order by movie1, movie2 -- ORDER BY is optional
;
Output:
MOVIE1 MOVIE2
---------- ----------
12 50
66 78

Related

SQL Lookup to another table to do aggregation and merge it back to the outer query

Goal: look up rows in Table B from Table A and apply an aggregate function, filtering on date fields of Table B compared against date fields of Table A.
Scenario:
I have a car table (contains CAR_ID, Car_START_DT, Car_END_DT) and a car_payments table (contains CAR_ID, Car_Payment_DT, Car_Payment_Amt).
For every car in the car table, I would like to look up the car_payments table by CAR_ID and count the Car_Payment_Amt records whose Car_Payment_DT falls between Car_START_DT and Car_END_DT (from the car table).
In my attempt, I created a subquery on car_payments with COUNT(Car_Payment_Amt) GROUP BY CAR_ID and joined it to the car table on CAR_ID. It returns the results, but the subquery takes longer than expected as the data grows.
How can I do this efficiently in SQL? From what I found, people suggest a correlated subquery but say it can be a performance bottleneck. Are there any other options?
Just use a simple join (table and column names as given in the question):
select
c.CAR_ID, count(cp.Car_Payment_Amt) as pmt_count
from
car c
left join
car_payments cp on cp.CAR_ID = c.CAR_ID
and cp.Car_Payment_DT between c.Car_START_DT and c.Car_END_DT
group by
c.CAR_ID
You can also do it without grouping, using a correlated scalar subquery:
SELECT c.CAR_ID,
(Select Count(*) From payments
Where CAR_ID = c.CAR_ID And PAY_DATE Between c.START_DATE And c.END_DATE) "NO_OF_PAYS"
FROM cars c
With the following sample data:
WITH
cars (CAR_ID, START_DATE, END_DATE) AS
(
Select 1, To_Date('01.01.2023', 'dd.mm.yyyy'), To_Date('04.01.2023', 'dd.mm.yyyy') From Dual Union All
Select 2, To_Date('01.01.2023', 'dd.mm.yyyy'), To_Date('06.01.2023', 'dd.mm.yyyy') From Dual Union All
Select 3, To_Date('03.01.2023', 'dd.mm.yyyy'), To_Date('08.01.2023', 'dd.mm.yyyy') From Dual Union All
Select 4, To_Date('03.01.2023', 'dd.mm.yyyy'), To_Date('10.01.2023', 'dd.mm.yyyy') From Dual Union All
Select 5, To_Date('05.01.2023', 'dd.mm.yyyy'), To_Date('12.01.2023', 'dd.mm.yyyy') From Dual Union All
Select 6, To_Date('07.01.2023', 'dd.mm.yyyy'), To_Date('14.01.2023', 'dd.mm.yyyy') From Dual
),
payments (CAR_ID, PAY_DATE, AMOUNT) AS
(
Select 1, To_Date('01.01.2023', 'dd.mm.yyyy'), 100 From Dual Union All
Select 1, To_Date('03.01.2023', 'dd.mm.yyyy'), 100 From Dual Union All
Select 1, To_Date('05.01.2023', 'dd.mm.yyyy'), 100 From Dual Union All
Select 1, To_Date('06.01.2023', 'dd.mm.yyyy'), 100 From Dual Union All
Select 3, To_Date('05.01.2023', 'dd.mm.yyyy'), 300 From Dual Union All
Select 3, To_Date('08.01.2023', 'dd.mm.yyyy'), 300 From Dual Union All
Select 3, To_Date('11.01.2023', 'dd.mm.yyyy'), 300 From Dual Union All
Select 5, To_Date('11.01.2023', 'dd.mm.yyyy'), 500 From Dual Union All
Select 5, To_Date('13.01.2023', 'dd.mm.yyyy'), 500 From Dual Union All
Select 6, To_Date('06.01.2023', 'dd.mm.yyyy'), 600 From Dual Union All
Select 6, To_Date('12.01.2023', 'dd.mm.yyyy'), 600 From Dual
)
... the query returns:
-- Result:
-- CAR_ID NO_OF_PAYS
-- ---------- ----------
-- 1 2
-- 2 0
-- 3 2
-- 4 0
-- 5 1
-- 6 1
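As a cross-check of the expected counts, here is a small Python sketch of the same logic on the sample data above. It assumes CAR_ID is unique in the cars table; the names `cars`, `payments` and `payments_in_range` are illustrative only.

```python
from datetime import date

# car_id -> (start_date, end_date), copied from the sample data.
cars = {
    1: (date(2023, 1, 1), date(2023, 1, 4)),
    2: (date(2023, 1, 1), date(2023, 1, 6)),
    3: (date(2023, 1, 3), date(2023, 1, 8)),
    4: (date(2023, 1, 3), date(2023, 1, 10)),
    5: (date(2023, 1, 5), date(2023, 1, 12)),
    6: (date(2023, 1, 7), date(2023, 1, 14)),
}

# (car_id, pay_date) pairs; amounts are irrelevant for counting.
payments = [
    (1, date(2023, 1, 1)), (1, date(2023, 1, 3)),
    (1, date(2023, 1, 5)), (1, date(2023, 1, 6)),
    (3, date(2023, 1, 5)), (3, date(2023, 1, 8)), (3, date(2023, 1, 11)),
    (5, date(2023, 1, 11)), (5, date(2023, 1, 13)),
    (6, date(2023, 1, 6)), (6, date(2023, 1, 12)),
]


def payments_in_range(cars, payments):
    """Count payments per car whose date lies within the car's date range."""
    # Start every car at 0 so cars without payments still appear,
    # as they do with the LEFT JOIN and the scalar subquery.
    counts = {car_id: 0 for car_id in cars}
    for car_id, pay_date in payments:
        if car_id in cars:
            start, end = cars[car_id]
            if start <= pay_date <= end:  # BETWEEN is inclusive on both ends
                counts[car_id] += 1
    return counts


print(payments_in_range(cars, payments))
# {1: 2, 2: 0, 3: 2, 4: 0, 5: 1, 6: 1}
```

Both SQL variants should agree with this result; which one is faster depends on the data volume and available indexes, so it is worth comparing execution plans on your own tables.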

How to get the max count of an attribute with 3 tables?

I need to query which author sold the most books and how many books the author sold.
select a.firstname ||''|| a.lastname as fullname,
max(count(datesold))
from author a,
transaction t,
book b
where a.authorid = b.authorid
and b.bookid = t.bookid
group by
a.firstname,
a.lastname;
It gave me the error "ORA-00937: not a single-group group function".
Any idea what is the issue here?
Nesting the aggregates as max(count(...)) collapses all groups into a single row, so the non-aggregated fullname expression can no longer be selected next to it; that is what raises the error.
With some sample data:
with
author (authorid, firstname, lastname) as
(select 1, 'Stephen', 'King' from dual union all
select 2, 'Jo' , 'Nesbo' from dual),
book (bookid, authorid) as
(select 100, 1 from dual union all
select 200, 1 from dual union all
select 300, 2 from dual
),
transaction (trans_id, bookid) as
(select 1, 100 from dual union all
select 2, 100 from dual union all
select 3, 100 from dual union all
select 4, 300 from dual
),
the query uses the RANK analytic function, which ranks authors by their number of rows in the transaction table (that is, by how many books they sold). Finally, it fetches the row(s) that rank highest:
temp as
(select a.firstname || ' ' || a.lastname AS fullname,
count(t.bookid) cnt,
rank() over (order by count(t.bookid) desc) rnk
from author a join book b on a.authorid = b.authorid
join transaction t on t.bookid = b.bookid
group by a.firstname, a.lastname
)
select fullname, cnt
from temp
where rnk = 1;
FULLNAME CNT
------------- ----------
Stephen King 3
You can use:
select MAX(a.firstname ||' '|| a.lastname) as fullname,
COUNT(datesold)
from author a
INNER JOIN book b
ON (a.authorid = b.authorid)
INNER JOIN transaction t
ON (b.bookid = t.bookid)
GROUP BY
a.authorid
ORDER BY
COUNT(datesold) DESC
FETCH FIRST ROW ONLY;
Do not aggregate by firstname and lastname as there are many people in the world with identical names and you do not want to count everyone with the same name as a single person.
Which, for the sample data:
CREATE TABLE author (authorid, firstname, lastname, dateofbirth) AS
SELECT 1, 'Alice', 'Adams', DATE '1900-01-01' FROM DUAL UNION ALL
SELECT 2, 'Alice', 'Adams', DATE '1910-01-01' FROM DUAL UNION ALL
SELECT 3, 'Betty', 'Baron', DATE '1920-01-01' FROM DUAL UNION ALL
SELECT 4, 'Carol', 'Corrs', DATE '1930-01-01' FROM DUAL UNION ALL
SELECT 5, 'Carol', 'Corrs', DATE '1940-01-01' FROM DUAL;
CREATE TABLE book (bookid, authorid) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 2, 2 FROM DUAL UNION ALL
SELECT 3, 3 FROM DUAL UNION ALL
SELECT 4, 4 FROM DUAL UNION ALL
SELECT 5, 5 FROM DUAL;
CREATE TABLE transaction (bookid, datesold) AS
SELECT 1, DATE '1970-01-01' FROM DUAL UNION ALL
SELECT 1, DATE '1970-01-02' FROM DUAL UNION ALL
SELECT 1, DATE '1970-01-03' FROM DUAL UNION ALL
SELECT 1, DATE '1970-01-04' FROM DUAL UNION ALL
SELECT 3, DATE '1970-01-01' FROM DUAL UNION ALL
SELECT 4, DATE '1970-01-01' FROM DUAL UNION ALL
SELECT 4, DATE '1970-01-02' FROM DUAL UNION ALL
SELECT 5, DATE '1970-01-01' FROM DUAL UNION ALL
SELECT 5, DATE '1970-01-02' FROM DUAL UNION ALL
SELECT 5, DATE '1970-01-03' FROM DUAL;
Outputs:
FULLNAME    COUNT(DATESOLD)
----------- ---------------
Alice Adams 4
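The warning about grouping by name can be illustrated with a short Python sketch over the same sample data. It is only a demonstration of the counting logic; the variable and function names are invented for the example.

```python
from collections import Counter

# authorid -> (firstname, lastname); two pairs of authors share a name.
authors = {
    1: ("Alice", "Adams"),
    2: ("Alice", "Adams"),
    3: ("Betty", "Baron"),
    4: ("Carol", "Corrs"),
    5: ("Carol", "Corrs"),
}
books = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}   # bookid -> authorid
sales = [1, 1, 1, 1, 3, 4, 4, 5, 5, 5]   # one bookid per transaction row


def sales_by_author_id(books, sales):
    """Count sales per author id (the GROUP BY a.authorid approach)."""
    return Counter(books[bookid] for bookid in sales)


def sales_by_name(authors, books, sales):
    """Count sales per (firstname, lastname), merging authors who share a name."""
    return Counter(authors[books[bookid]] for bookid in sales)


print(sales_by_author_id(books, sales).most_common(1))
# [(1, 4)]  -> author 1 (Alice Adams) sold 4 books
print(sales_by_name(authors, books, sales).most_common(1))
# [(('Carol', 'Corrs'), 5)]  -> two different people merged into one
```

Grouping by name reports "Carol Corrs" with 5 sales, although no single author sold more than 4, which is why the query groups by authorid.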

How to get mean of exams by client with 2 tables?

I know only basic SQL, and now I need to write an analytic query that I can't work out yet.
I have 2 tables in my Oracle database, client and exams:
I have tried many ways to get the mean number of exams per client, but with no success yet.
The expected result is:
exams = 13
clients = 6
13/6 = 2.1666...
How can I do that?
If you have clients who have not taken any exams then you want:
SELECT AVG(COUNT(e.nu_ordem)) AS avg_exames_by_client
FROM cliente c
LEFT OUTER JOIN exames e
ON (c.id = e.id_cliente)
GROUP BY c.id;
or:
SELECT (SELECT COUNT(*) FROM exames) / (SELECT COUNT(*) FROM cliente)
AS avg_exames_by_client
FROM DUAL;
Which, for the sample data:
CREATE TABLE cliente (id PRIMARY KEY) AS
SELECT 1 FROM DUAL UNION ALL
SELECT 2 FROM DUAL UNION ALL
SELECT 3 FROM DUAL UNION ALL
SELECT 4 FROM DUAL UNION ALL
SELECT 5 FROM DUAL UNION ALL
SELECT 6 FROM DUAL;
CREATE TABLE exames (nu_ordem PRIMARY KEY, id_cliente) AS
SELECT 1, 1 FROM DUAL UNION ALL
SELECT 2, 5 FROM DUAL UNION ALL
SELECT 3, 5 FROM DUAL UNION ALL
SELECT 4, 2 FROM DUAL UNION ALL
SELECT 5, 6 FROM DUAL UNION ALL
SELECT 6, 1 FROM DUAL UNION ALL
SELECT 7, 1 FROM DUAL UNION ALL
SELECT 8, 4 FROM DUAL UNION ALL
SELECT 9, 5 FROM DUAL UNION ALL
SELECT 10, 3 FROM DUAL UNION ALL
SELECT 11, 6 FROM DUAL UNION ALL
SELECT 12, 2 FROM DUAL UNION ALL
SELECT 13, 1 FROM DUAL;
Both output:
AVG_EXAMES_BY_CLIENT
2.166666666666666667
If you then add a couple of clients but no more exams:
INSERT INTO cliente (id)
SELECT 7 FROM DUAL UNION ALL
SELECT 8 FROM DUAL
Then the average is:
AVG_EXAMES_BY_CLIENT
1.625
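You can try the formula below to get the result. Note that it only counts clients that appear in the exams table, so clients with no exams are left out of the average: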
SELECT COUNT(*)/COUNT(DISTINCT id_cliente)
FROM exams;
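The difference between the two denominators can be checked with a few lines of Python on the sample data above. This is a sketch for illustration; the function names are not from the original answers.

```python
# id_cliente of each of the 13 exam rows, in nu_ordem order.
exam_clients = [1, 5, 5, 2, 6, 1, 1, 4, 5, 3, 6, 2, 1]
clients = [1, 2, 3, 4, 5, 6]


def avg_over_all_clients(clients, exam_clients):
    """COUNT(exams) / COUNT(clients): clients without exams lower the mean."""
    return len(exam_clients) / len(clients)


def avg_over_clients_with_exams(exam_clients):
    """COUNT(*) / COUNT(DISTINCT id_cliente): ignores clients without exams."""
    return len(exam_clients) / len(set(exam_clients))


print(avg_over_all_clients(clients, exam_clients))           # 13/6, about 2.1667
print(avg_over_clients_with_exams(exam_clients))             # also 13/6

# Add two clients with no exams, as in the INSERT above.
print(avg_over_all_clients(clients + [7, 8], exam_clients))  # 1.625
print(avg_over_clients_with_exams(exam_clients))             # still 13/6
```

Which definition is right depends on whether clients with no exams should count toward the mean.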

Efficient way to pull counts for all permutations of a field

I have an oracle DB w/ a table that contains records associated to a person (based on an ID). The records are categorized as category = 1, 2, or 3.
I would like to pull as follows:
- # of people with only a category 1 record (no category=2 or 3)
- # of people with only a category 2 record (no category=1 or 3)
- # of people with only a category 3 record (no category=1 or 2)
- # of people with both category 1 & 2 records (no category=3)
- # of people with both category 1 & 3 records (no category=2)
- # of people with all category records 1,2, & 3
- # of people with both a category 2 & 3 records (no category=1)
I could only think of the following solution (modified for each case):
select count(*) from table1
where id in (select id from table1 where category=1)
and id not in (select id from table1 where category=2)
and id not in (select id from table1 where category=3)
But, I believe this is a highly inefficient way of doing this, was wondering if anyone had quicker/better way of getting this info.
Thanks!
One way to do this is to bring the categories together, using listagg() and then reaggregate:
select categories, count(*)
from (select listagg(t1.category, ',') within group (order by t1.category) as categories, personid
from table1 t1
group by personid
) x
group by categories;
EDIT:
If you need distinct values:
select categories, count(*)
from (select listagg(t1.category, ',') within group (order by t1.category) as categories, personid
from (select distinct t1.category, t1.personid from table1 t1) t1
group by personid
) x
group by categories;
Here is a query that, for each ID, shows the count of distinct categories and the MIN and MAX category. This query can be used as a sub-query in further processing (you didn't explain exactly HOW you want the results to be presented). When the COUNT is 1, the single category is the one in the MIN_CAT column; when the COUNT is 3, all three categories are present for that ID; and when the COUNT is 2, the two categories that are present are in the MIN_CAT and MAX_CAT columns. Whatever else you need to do from here should be very simple; for example, you can now GROUP BY CT, MIN_CAT, MAX_CAT and count IDs.
I do a count(distinct category) to allow the possibility of non-unique (id, category) - as illustrated in the sample data I include in a WITH clause (which is NOT part of the SQL query!)
with
test_data ( id, category ) as (
select 101, 3 from dual union all
select 101, 1 from dual union all
select 101, 3 from dual union all
select 104, 2 from dual union all
select 105, 2 from dual union all
select 105, 2 from dual union all
select 105, 1 from dual union all
select 106, 1 from dual union all
select 106, 2 from dual union all
select 106, 3 from dual union all
select 106, 3 from dual
)
select id,
count(distinct category) as ct,
min(category) as min_cat,
max(category) as max_cat
from test_data
group by id
;
ID CT MIN_CAT MAX_CAT
--- -- ------- -------
101 2 1 3
105 2 1 2
104 1 2 2
106 3 1 3
Oracle Setup:
CREATE TABLE test_data ( id, category ) as
select 101, 3 from dual union all
select 101, 1 from dual union all
select 101, 3 from dual union all
select 104, 2 from dual union all
select 105, 2 from dual union all
select 105, 2 from dual union all
select 105, 1 from dual union all
select 106, 1 from dual union all
select 106, 2 from dual union all
select 106, 3 from dual union all
select 106, 3 from dual union all
select 107, 1 from dual union all
select 107, 3 from dual;
Query:
SELECT c1,
c2,
c3,
LTRIM(
DECODE( c1, 1, ',1' ) || DECODE( c2, 1, ',2' ) || DECODE( c3, 1, ',3' ),
','
) AS categories,
COUNT(1) AS num_people,
LISTAGG( id, ',' ) WITHIN GROUP ( ORDER BY id ) AS people
FROM ( SELECT DISTINCT * FROM test_data )
PIVOT ( COUNT(1) FOR category IN ( 1 AS c1, 2 AS c2, 3 AS c3 ) )
GROUP BY c1, c2, c3;
Output:
C1 C2 C3 CATEGORIES NUM_PEOPLE PEOPLE
-- -- -- ---------- ---------- ----------
0 1 0 2 1 104
1 0 1 1,3 2 101,107
1 1 0 1,2 1 105
1 1 1 1,2,3 1 106
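The idea shared by these answers (reduce each person to the set of categories they have, then count people per set) can be sketched in Python to verify the expected numbers. The names `rows` and `combination_counts` are illustrative.

```python
from collections import Counter, defaultdict

# (id, category) rows from the Oracle setup above, duplicates included.
rows = [
    (101, 3), (101, 1), (101, 3),
    (104, 2),
    (105, 2), (105, 2), (105, 1),
    (106, 1), (106, 2), (106, 3), (106, 3),
    (107, 1), (107, 3),
]


def combination_counts(rows):
    """Map each distinct category combination to the number of people having it."""
    categories = defaultdict(set)
    for person_id, category in rows:
        # A set ignores duplicate (id, category) rows, like SELECT DISTINCT.
        categories[person_id].add(category)
    # Sorted tuples make (1, 3) and (3, 1) the same key,
    # like ORDER BY inside LISTAGG.
    return Counter(tuple(sorted(cats)) for cats in categories.values())


print(combination_counts(rows))
# Counter({(1, 3): 2, (2,): 1, (1, 2): 1, (1, 2, 3): 1})
```

Combinations that nobody has, such as category 1 only, do not appear in the result, which matches the SQL output above.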

Distinct LISTAGG that is inside a subquery in the SELECT list

Here is a minimal working example of what I'm trying to do and what I'm getting:
I have a query as follows:
/*
with tran_party as -- ALL DUMMY DATA ARE IN THESE CTE FOR YOUR REFERENCE
(select 1 tran_party_id, 11 transaction_id, 101 team_id_redirect
from dual
union all
select 2, 11, 101 from dual
union all
select 3, 11, 102 from dual
union all
select 4, 12, 103 from dual
union all
select 5, 12, 103 from dual
union all
select 6, 12, 104 from dual
union all
select 7, 13, 104 from dual
union all
select 8, 13, 105 from dual),
tran as
(select 11 transaction_id, 1001 account_id, 1034.93 amount from dual
union all
select 12, 1001, 2321.89 from dual
union all
select 13, 1002, 3201.47 from dual),
account as
(select 1001 account_id, 111 team_id from dual
union all
select 1002, 112 from dual),
team as
(select 101 team_id, 'UUU' as team_code from dual
union all
select 102, 'VV' from dual
union all
select 103, 'WWW' from dual
union all
select 104, 'XXXXX' from dual
union all
select 105, 'Z' from dual)
-- */
-- The Actual Query
select a.account_id,
t.transaction_id,
(select listagg (tm_redir.team_code, ', ')
within group (order by tm_redir.team_code)
from tran_party tp_redir
inner join team tm_redir
on tp_redir.team_id_redirect = tm_redir.team_id
inner join tran t_redir
on tp_redir.transaction_id = t_redir.transaction_id
where t_redir.account_id = a.account_id
and t_redir.transaction_id != t.transaction_id)
as teams_redirected
from tran t inner join account a on t.account_id = a.account_id;
NOTE: tran_party.team_id_redirect is a foreign key that references team.team_id.
Current output:
ACCOUNT_ID TRANSACTION_ID TEAMS_REDIRECTED
---------- -------------- ----------------
1001 11 WWW, WWW, XXXXX
1001 12 UUU, UUU, VV
1002 13
Expected output:
I want the repeated items in TEAMS_REDIRECTED column to be selected only once, like this:
ACCOUNT_ID TRANSACTION_ID TEAMS_REDIRECTED
---------- -------------- ----------------
1001 11 WWW, XXXXX
1001 12 UUU, VV
1002 13
What I tried:
Instead of selecting from tran_party directly, I wrote an inline view that selects distinct values from tran_party like this:
select a.account_id,
t.transaction_id,
(select listagg (tm_redir.team_code, ', ')
within group (order by tm_redir.team_code)
from (select distinct transaction_id, team_id_redirect -- Note this inline view
from tran_party) tp_redir
inner join team tm_redir
on tp_redir.team_id_redirect = tm_redir.team_id
inner join tran t_redir
on tp_redir.transaction_id = t_redir.transaction_id
where t_redir.account_id = a.account_id
and t_redir.transaction_id != t.transaction_id)
as teams_redirected
from tran t inner join account a on t.account_id = a.account_id;
While this does give me the expected output, when I use this solution in my actual code, it takes about 13 seconds to retrieve just one row. Thus I cannot use what I already tried.
Any help will be appreciated.
The following method removes the need for the inline view that eliminates duplicates. It applies REGEXP_REPLACE and RTRIM to the LISTAGG result to collapse repeated values in the aggregated list, which works because the list is ordered and duplicates are therefore adjacent. Thus, it won't do more than one scan.
Add this piece to your code:
RTRIM(REGEXP_REPLACE(listagg (tm_redir.team_code, ',')
WITHIN GROUP (ORDER BY tm_redir.team_code),
'([^,]+)(,\1)+', '\1'),
',')
Modified query:
with tran_party as -- ALL DUMMY DATA ARE IN THESE CTE FOR YOUR REFERENCE
(select 1 tran_party_id, 11 transaction_id, 101 team_id_redirect
from dual
union all
select 2, 11, 101 from dual
union all
select 3, 11, 102 from dual
union all
select 4, 12, 103 from dual
union all
select 5, 12, 103 from dual
union all
select 6, 12, 104 from dual
union all
select 7, 13, 104 from dual
union all
select 8, 13, 105 from dual),
tran as
(select 11 transaction_id, 1001 account_id, 1034.93 amount from dual
union all
select 12, 1001, 2321.89 from dual
union all
select 13, 1002, 3201.47 from dual),
account as
(select 1001 account_id, 111 team_id from dual
union all
select 1002, 112 from dual),
team as
(select 101 team_id, 'UUU' as team_code from dual
union all
select 102, 'VV' from dual
union all
select 103, 'WWW' from dual
union all
select 104, 'XXXXX' from dual
union all
select 105, 'Z' from dual)
-- The Actual Query
select a.account_id,
t.transaction_id,
(SELECT RTRIM(
REGEXP_REPLACE(listagg (tm_redir.team_code, ',')
WITHIN GROUP (ORDER BY tm_redir.team_code),
'([^,]+)(,\1)+', '\1'),
',')
from tran_party tp_redir
inner join team tm_redir
on tp_redir.team_id_redirect = tm_redir.team_id
inner join tran t_redir
on tp_redir.transaction_id = t_redir.transaction_id
where t_redir.account_id = a.account_id
and t_redir.transaction_id != t.transaction_id)
AS teams_redirected
from tran t inner join account a on t.account_id = a.account_id
/
ACCOUNT_ID TRANSACTION_ID TEAMS_REDIRECTED
---------- -------------- --------------------
1001 11 WWW,XXXXX
1001 12 UUU,VV
1002 13
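To see what the pattern does to a sorted, comma-separated list, here is a small Python sketch using the same regular expression. Python's `re` engine is not Oracle's, so treat this as an illustration of the pattern's behaviour rather than a guarantee about REGEXP_REPLACE; the function names are made up for the example.

```python
import re

# Same pattern as in the answer: a value followed by one or more
# ",<same value>" repetitions is replaced by a single copy.
PATTERN = r'([^,]+)(,\1)+'


def dedupe_with_regex(sorted_list):
    """Collapse adjacent duplicates in a sorted comma-separated string."""
    return re.sub(PATTERN, r'\1', sorted_list)


def dedupe_by_splitting(csv):
    """Reference implementation: split, de-duplicate, sort, and re-join."""
    return ','.join(sorted(set(csv.split(','))))


print(dedupe_with_regex('WWW,WWW,XXXXX'))   # WWW,XXXXX
print(dedupe_with_regex('UUU,UUU,VV'))      # UUU,VV

# Caveat: the back-reference also matches a prefix of the next value,
# so a code that is a prefix of its neighbour gets swallowed.
print(dedupe_with_regex('AB,ABC'))          # ABC   (AB is lost)
print(dedupe_by_splitting('AB,ABC'))        # AB,ABC
```

With the sample team codes no code is a prefix of another, so the regex output matches the expected result. If your real codes can share prefixes, it is worth testing the pattern against them first; on Oracle 19c and later, LISTAGG(DISTINCT ...) avoids the regex altogether.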