Conditional probability in SQL - sql

I think I have end up in a bit of a dead end.
Let's say I have a dataset, which is fairly easy -
person_id and book_id. Which is pretty much factual table that says person X bought books A, B and C.
I know how to find out how many persons have bought Book X and Book Y together.
This is
select a.book_id as B1, b.book_id as B2, count(b.person_id) as
Bought_Together
from dbo.data a
cross join dbo.data b
where a.book_id != b.book_id and a.person_id = b.person_id
group by a.book_id, b.book_id
Yet again this is where my brain decided to shut down. I know that I would probably need to do it so that
count(b.person_id) / all the people that bought book A * 100
but im not entirely sure.
I hope I was clear enough.
EDIT1: I'm using SQL Server 2017 currently, so i think the correct answer is T-SQL?.
In the end the format should be something similliar to this. Also there is no cases where person A could have bought three copies of book X.
Book1 Book2 HowManyPeopleBoughtBook2
1 2 50%
1 3 7%
2 3 15%
2 1 40%
3 1 60%
3 2 20%
EDIT2: Let it be said there is hundreds of thousands of rows in the database. Yes this is bit related to a data science course i am taking - hence huge amounts of data.

You can extend your logic to do this:
select a.book_id as B1, b.book_id as B2,
count(b.book_id) as bought_second_book,
count(b.book_id) * 1.0 / book_cnt as ratio_Bought_Together
from (select a.*, count(*) over (partition by a.book_id) as book_cnt
from dbo.data a
) a left join
dbo.data b
on a.person_id = b.person_id and a.book_id <> b.book_id
group by a.book_id, b.book_id, a.book_cnt;
This assumes that people buy a book only once. If there are duplicates, then count(distinct) would adjust for that.

If you would like to generate all possible combinations of the pairs of books bought together along with the percentage of the persons who bought that combination the following can help
create table data1(book_id int, person_id int)
insert into data1
select *
from (values(1,300)
,(2,300)
,(2,301)
,(1,301)
,(3,301)
)t(book_id,person_id)
with books
as (select distinct book_id
from data1 a
)
,tot_persons
as (select count(distinct person_id) as tot_cnt
from data1
)
,pairs
as (
select a.book_id as col1 /* This block generates all possible pair combinations of books*/
,b.book_id as col2
from books a
join books b
on a.book_id<b.book_id
)
select a.col1,a.col2
,count(b.person_id)*100/(select tot_cnt from tot_persons) as percent_of_persons_buying_both
from pairs a
join data1 b
on a.col1=b.book_id
where exists(select 1
from data1 b1
where b.person_id=b1.person_id
and a.col2=b1.book_id)
group by a.col1,a.col2

On my phone, apologies for typo's
SELECT
SUM(bought_b) * 100.0 / COUNT(*)
FROM
(
SELECT
person_id,
MAX(CASE WHEN book_id = 'A' THEN 1 END) AS bought_a,
MAX(CASE WHEN book_id = 'B' THEN 1 END) AS bought_b
FROM
data
WHERE
book_id IN ('A', 'B')
GROUP BY
person_id
)
person_stats
WHERE
bought_a = 1
On my phone, apologies for typo's
EDIT : just saw that you want all combinations, just just one set combination.
WITH
book AS
(
SELECT DISTINCT book_id FROM data
)
SELECT
book_a_id,
book_b_id,
bought_b * 100.0 / bought_b
FROM
(
SELECT
book_a.book_id AS book_a_id,
book_b.book_id AS book_b_id,
COUNT(DISTINCT data_a.person_id) AS bought_a,
COUNT(DISTINCT data_b.person_id) AS bought_b
FROM
book AS book_a
CROSS JOIN
book AS book_b
INNER JOIN
data AS data_a
ON data_a.book_id = book_a.book_id
LEFT JOIN
data AS data_b
ON data_b.book_id = book_b.book_id
GROUP BY
book_a.book_id,
book_b.book_id
)
stats

Related

Most popular pairs of shops for workers from each company

I've got 2 tables, one with sales and one with companies:
Sales Table
Transaction_Id Shop_id Sale_date Client_ID
92356 24234 11.09.2018 12356
92345 32121 11.09.2018 32121
94323 24321 11.09.2018 21231
94278 45321 11.09.2018 42123
Company table
Client_ID Company_name
12345 ABC
13322 ABC
32321 BCD
22221 BCD
What I want to achieve is distinct count of Clients from each Company for each pair of shops(Clients who had at least 1 transaction in both of shops) :
Shop_Id_1 Shop_id_2 Company_name Count(distinct Client_id)
12356 12345 ABC 31
12345 14278 ABC 23
14323 12345 BCD 32
14278 12345 BCD 43
I think that I have to use self join, but my queries even with filter for one week is killing DB, any thoughts on that? I'm using Microsoft SQL server 2012.
Thanks
I think this is a self-join and aggregation, with a twist. The twist is that you want to include the company in each sales record, so it can be used in the self-join:
with sc as (
select s.*, c.company_name
from sales s join
companies c
on s.client_id = c.client_id
)
select sc1.shop_id, sc2.shop_id, sc1.company_name, count(distinct sc1.client_id)
from sc sc1 join
sc sc2
on sc1.client_id = sc2.client_id and
sc1.company_name = sc2.company_name
group by sc1.shop_id, sc2.shop_id, sc1.company_name;
I think there are some issues with your question. I interpreted it as such that the company table contains the shop ID's, not the ClienId's.
First you can create a solution to get the shops as rows for each company. Here I chose a maximum of 5 shops per company. Don't forget the semicolon in the previous statement before the cte's.
WITH CTE_Comp AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY CompanyName ORDER BY ShopID) AS RowNumb
FROM Company AS C
)
SELECT C1.ShopID,
C2.ShopID AS ShopID_2,
C3.ShopID AS ShopID_3,
C4.ShopID AS ShopID_4,
C5.ShopID AS ShopID_5,
C1.CompanyName
INTO ShopsByCompany
FROM CTE_Comp AS C1
LEFT JOIN CTE_Comp AS C2 ON C1.CompanyName= C2.CompanyName AND RowNumb = 2
LEFT JOIN CTE_Comp AS C2 ON C1.CompanyName= C3.CompanyName AND RowNumb = 3
LEFT JOIN CTE_Comp AS C2 ON C1.CompanyName= C4.CompanyName AND RowNumb = 4
LEFT JOIN CTE_Comp AS C2 ON C1.CompanyName= C5.CompanyName AND RowNumb = 5
WHERE C1.RowNumb = 1
After that, in a few steps, I think you could get the desired result:
WITH ClientsPerShop AS
(
SELECT ShopID,
COUNT (DISTINCT ClientID) AS TotalClients
FROM Sales
GROUP BY ShopID
)
, ClienstsPerCompany AS
(
SELECT CompanyName,
SUM (TotalClients) AS ClientsPerComp
FROM Company AS C
INNER JOIN ClientsPerShop AS CPS ON C.ShopID = CPS.ShopID
GROUP BY CompanyName
)
SELECT *
FROM ClienstsPerCompany AS CPA
INNER JOIN ShopsByCompany AS SBC ON SBC.CompanyName = CPA.CompanyName
Hopefully this will bring you closer to your solution, best of luck!

How can I show all "article" which have more than 3 "bids"?

I wanna show the "ArticleName" of all "offers" that have more than 3 "bids". The number of the "Bids" should be output.
I don't know how I can write it down. But I think I know the Logic. It should count the same number of the Table "bid" and the column "OID" and in the end it should paste the number which is more than 3.
Picture:
Well that's easy enough:
Select ArticleName
, count(*) NumberOfBids
from Offer o
join Bid b
on b.oid = o.oid
group by ARticleName
having count(*) >= 3
SELECT * FROM (
SELECT o.ArticleName, count(b.BID) as numberOfBids
FROM Offer as o INNER JOIN bid as b ON o.oid = b.oid
GROUP BY o.ArticleName
) as c
WHERE c.numberOfBids > 3

SQL joining two tables with common row

I have 2 tables in sybase
Account_table
Id account_code
1 A
2 B
3 C
Associate_table
id account_code
1 A
1 B
1 C
2 A
2 B
3 A
3 C
I have this sql query
SELECT * FROM account_table account, associate_table assoc
WHERE account.account_code = assoc.account_code
This query will return 7 rows. What I want is to return the rows from associate_table that is only common to the 3 accounts like this:
account id account_code Assoc Id
1 A 1
2 B 1
3 C 1
Can anyone help what kind of join should I do?
SELECT b.id account_id,a.code account_code,a.id assoc_id
FROM associate a,
account b
WHERE a.code = b.code
AND a.id IN (SELECT a.id
FROM associate a,
account b
WHERE a.code = b.code
GROUP BY a.id
HAVING Count(*) = (SELECT Count(*)
FROM account));
NOTE: this query works only if you have unique values in Id and account_code columns in account table. And also, your associate_table should contain unique combination of (id, account,code). i.e., associate table should not contain (1,A) or any pair twice.
Try this
SELECT AC.ID,AC.account_code,ASS.ID
FROM account_table AC INNER JOIN associate_table AS ASS ON AC.account_code = ASS.account_code
OK so far answer is accepted I'll post simpler one:
SELECT *
FROM account_table AS account,
associate_table AS assoc
WHERE account.account_code = assoc.account_code
HAVING (
SELECT
COUNT(*)
FROM associate_table assoc_2
WHERE assoc_2.id = assoc.id
) = 3
here 3 is the number of codes account table has, if it's gonna be dynamic (changing over time),
you can use (SELECT COUNT(*) FROM account_table) instead of exact number. Also I'm sure it will be cached by database engine, so requires less resources

I need a SQL query for comparing column values against rows in the same table

I have a table called BB_BOATBKG which holds passengers travel details with columns Z_ID, BK_KEY and PAXSUM where:
Z_ID = BookingNumber* LegNumber
BK_KEY = BookingNumber
PAXSUM = Total number passengers travelled in each leg for a particular booking
For Example:
Z_ID BK_KEY PAXSUM
001234*01 001234 2
001234*02 001234 3
001287*01 001287 5
001287*02 001287 5
002323*01 002323 7
002323*02 002323 6
I would like to get a list of all Booking Numbers BK_KEY from BB_BOATBKG where the total number of passengers PAXSUM is different in each leg for the same booking
Example, For Booking number A, A*Leg01 might have 2 Passengers, A* Leg02 might have 3 passengers
Dependent of your RDBMs there might be several options availible. A solution that should work for most is:
SELECT A.Z_ID, A.BK_KEY, A.PAXSUM
FROM BB_BOATBKG A
JOIN (
SELECT BK_KEY
FBB_BOATBKGROM BB_BBK_KEY
GROUP BY BK_KEY
HAVING COUNT( DISTINCT PAXSUM ) > 1
) B
ON A.BK_KEY = B.BK_KEY
If your DBMS support OLAP functions, have a look at RANK() OVER (...)
It's a little counterintuitive, but you could join the table to itself on {BK_KEY, PAXSUM} and pull out only the records whose joined result is null.
I think this does it:
SELECT
a.BK_KEY
FROM
BB_BOATBKG a
LEFT OUTER JOIN BB_BOATBKG b ON a.BK_KEY = b.BK_KEY AND a.PAXSUM = b.PAXSUM
WHERE
b.Z_ID IS NULL
GROUP BY
a.BK_KEY
Edit: I think I missed anything beyond the trivial case. I think you can do it with some really nasty subselecting though, a la:
SELECT
b.BK_KEY
FROM
(
SELECT
a.BK_KEY,
Count = COUNT(*)
FROM
(
SELECT
a.BK_KEY,
a.PAXSUM
FROM
BB_BOATBKG a
GROUP BY
a.BK_KEY,
a.PAXSUM
HAVING
COUNT(*) = 1
) a
GROUP BY
a.BK_KEY
) b
INNER JOIN
(
SELECT
c.BK_KEY,
Count = COUNT(*)
FROM
BB_BOATBKG c
GROUP BY
c.BK_KEY
) c ON b.BK_KEY = c.BK_KEY AND b.Count = c.Count

SQL Oracle Select Group where all members are borrowed

I have two tables: Car and CarBorrowed.
The Car table contains all cars in the car pool with an ID and a group the car belongs to. For example:
ID 1, Car 1, Group Renault
ID 2, Car 3, Group Renault
ID 3, Car 4, Group VW
ID 4, Car 6, Group BMW
ID 5, Car 7, Group BMW
The CarBorrow table contains all cars which are borrowed on a particular day
Car 1, Borrowed on 23.08.2012
Car 3, Borrowed on 23.08.2012
Car 5, Borrowed on 23.08.2012
Now I want all groups, where no cars are left (today= 23.08.2012). So I should get "Group Renault"
First, join the tables, so we have for every car its borrows(a day).
select c.id, c.GroupName, cb.day
from car c
left join (select * from CarBorrow where day = '23 Aug 2012') cb
on (c.id = cb.id)
All cars not borrowed will have null at day.
After this, we shoud select all Groups that does not have nulls.
Bellow an trick to get it:
select GroupName
FROM(
select c.id, c.GroupName, cb.day
from car c
left join (select * from CarBorrow where day = '23 Aug 2012') cb
on (c.id = cb.id)
)
group by GroupName
having count(day) = count(*)
(Days that are null are not counted by COUNT)
SELECT distinct(D1.CARGROUP)
FROM den_car d1
MINUS
(SELECT D.CARGROUP
FROM den_car d
WHERE d.id IN (SELECT c.ID
FROM den_car c
MINUS
SELECT b.id
FROM den_car_borrow b
WHERE B.DATE_BORROW = TO_DATE (SYSDATE)))
This may be optimized but the idea is simple: Find the borrowed ones, subtract it from all cars. Then find the remaning groups.
Hope it helps. (By the way of course there are lots of other ways to do it.)
Hmmm . . . One way to approach this query is to count the cars in a group and also count the cars on a particular day, then take the groups where the borrowed equals the available:
select borrowed.BorrowedOn, available.CarGroup
from (select c.CarGroup, count(*) as cnt
from car c
group by c.CarGroup
) available left outer join
(select c.CarGroup, cb.BorrowedOn, count(*) as cnt
from CarBorrowed cb join
Car c
on cb.CarId = c.CarId
group by c.CarGroup, cb.BorrowedOn
) borrowed
on available.CarGroup = borrowed.CarGroup
where available.cnt = borrowed.cnt
By the way, "Group" is a bad name for a column, since it is a SQL reserved word. I've renamed it to CarGroup.
If the same car can be borrowed more than once on a given day, then change the count(*) in the second subquery to count(distinct cb.carId).
If you want just one day, you can add a clause to the WHERE clause.
I think I have a solution now:
select x.groupname from
(select a.groupname, count(*) as cnt from car a group by a.groupname) x
inner join
(
select b.groupname, count(*) as cnt from car b where b.carid in (select caraid from modavail where day ='23.08.2012')
group by b.groupname
) y
on x.groupname = y.groupname
where x.cnt = y.cnt and y.cnt ! = 0 ORDER BY GROUPNAME;
Thanks for your help!!!!