Find the pairs that appear the most times in StoreID and each STOREID has only this pair SQL

I want to find the pair of MOVIEIDs that appears in the most STOREIDs.
Additionally, each counted STOREID should have only this pair as its MOVIEIDs. My table has 2 columns: STOREID and MOVIEID.
For example:
STOREID | MOVIEID
--------|---------
1 | a
1 | b
1 | c
2 | a
2 | b
3 | a
3 | b
5 | a
5 | b
In this case the answer would be pair: (a,b) 3 times.
Thanks in advance!

As far as I understand, you want to consider only stores that sell exactly one pair of movies. This makes it a lot simpler. First you group by store and keep only the groups with two movies. Generating pairs would be tricky if there were more than two movies per store; you would need window functions. For two, however, you can get both movies with aggregate functions: one with MIN and the other with MAX. These functions also ensure that the same pair always has the same order. For example, the pair (a,b) will always be (a,b) and never (b,a).
SELECT COUNT(*), MOVIE_1, MOVIE_2
FROM (
    SELECT MIN(MOVIEID) MOVIE_1
          ,MAX(MOVIEID) MOVIE_2
          ,STOREID
    FROM STORE_MOVIES -- your table
    GROUP BY STOREID
    HAVING COUNT(*) = 2
) MOVIE_PAIRS
GROUP BY MOVIE_1, MOVIE_2
ORDER BY COUNT(*) DESC
FETCH FIRST ROW ONLY;
The HAVING COUNT(*) = 2 filter assumes that each combination of STOREID and MOVIEID is unique.
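The MIN/MAX approach above can be tried out with a minimal sketch like the following. It uses Python's built-in sqlite3 module purely as a convenient test bed, so FETCH FIRST ROW ONLY is swapped for SQLite's LIMIT 1; the function name and the UNIQUE constraint are assumptions added for illustration.

```python
import sqlite3

# Sample rows from the question: (STOREID, MOVIEID)
SAMPLE = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
          (3, "a"), (3, "b"), (5, "a"), (5, "b")]

def most_common_exclusive_pair(rows):
    """Return (count, movie_1, movie_2) for the most frequent pair among
    stores that stock exactly two movies (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint encodes the assumption behind HAVING COUNT(*) = 2.
    conn.execute(
        "CREATE TABLE STORE_MOVIES ("
        "STOREID INTEGER, MOVIEID TEXT, UNIQUE (STOREID, MOVIEID))"
    )
    conn.executemany("INSERT INTO STORE_MOVIES VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT COUNT(*) AS TIMES, MOVIE_1, MOVIE_2
        FROM (
            SELECT MIN(MOVIEID) AS MOVIE_1, MAX(MOVIEID) AS MOVIE_2, STOREID
            FROM STORE_MOVIES
            GROUP BY STOREID
            HAVING COUNT(*) = 2        -- only stores with exactly one pair
        ) AS MOVIE_PAIRS
        GROUP BY MOVIE_1, MOVIE_2
        ORDER BY TIMES DESC
        LIMIT 1                        -- SQLite spelling of FETCH FIRST ROW ONLY
    """).fetchone()
    conn.close()
    return result

print(most_common_exclusive_pair(SAMPLE))  # (3, 'a', 'b')
```

Store 1 is skipped because it stocks three movies, which leaves stores 2, 3 and 5 with the pair (a,b).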

Although the request seems unusual, the design behind it is not ours to question. I have done this with a 3-part self-join on your movies table. The first instance (m1) joins to the second (m2) on the same store, but only where the second movie is greater than the (m1) movie. This prevents comparing (a,b) vs (b,a). Then I join to (m3) on the same store, looking for any movie other than the two already paired. This is an intentional LEFT JOIN, as not all stores will have more than 2 movies. In that case the value at (m3) will be NULL (non-existent), so I am looking for rows where m3.storeId IS NULL. Note that comparing only m2.movieId < m3.movieId would not be enough: for a store with (a,b,c), the pairs (a,c) and (b,c) would slip through because nothing is greater than c. The JOIN between (m1) and (m2) requires the first and second to exist. Finally, tacking on the HAVING will show only those pairs that appear at multiple stores.
select
      m1.movieID as Movie1,
      m2.movieID as Movie2,
      count(*) TimesPaired
   from
      movies m1
         JOIN movies m2
            on m1.storeId = m2.storeId
            AND m1.movieId < m2.movieId
         LEFT JOIN movies m3
            on m1.storeId = m3.storeId
            AND m3.movieId <> m1.movieId
            AND m3.movieId <> m2.movieId
   where
      m3.storeId IS NULL
   group by
      m1.movieID,
      m2.movieID
   having
      count(*) > 1
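
As a hedged sketch of the self-join idea, the snippet below runs it against SQLite from Python. Here the third join looks for any other movie in the same store, so a store with three movies contributes no pair at all. The function name and the extra store 4 are assumptions added to exercise that case.

```python
import sqlite3

# Question's sample plus a hypothetical store 4 that also stocks a, b and c.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
        (3, "a"), (3, "b"), (5, "a"), (5, "b"),
        (4, "a"), (4, "b"), (4, "c")]

def exclusive_pairs(rows):
    """Pairs stocked by more than one store that carries nothing else."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (storeId INTEGER, movieId TEXT)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT m1.movieId, m2.movieId, COUNT(*) AS TimesPaired
        FROM movies m1
        JOIN movies m2
          ON m1.storeId = m2.storeId
         AND m1.movieId < m2.movieId          -- keeps (a,b), drops (b,a)
        LEFT JOIN movies m3
          ON m3.storeId = m1.storeId
         AND m3.movieId NOT IN (m1.movieId, m2.movieId)
        WHERE m3.storeId IS NULL              -- no third movie in the store
        GROUP BY m1.movieId, m2.movieId
        HAVING COUNT(*) > 1
        ORDER BY TimesPaired DESC
    """).fetchall()
    conn.close()
    return result

print(exclusive_pairs(ROWS))  # [('a', 'b', 3)]
```

Stores 1 and 4 both hold three movies, so neither (a,c) nor (b,c) is reported even though each occurs twice.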

Related

COUNT with multiple LEFT joins [duplicate]

This question already has answers here:
Two SQL LEFT JOINS produce incorrect result
(3 answers)
Closed 12 months ago.
I am having some troubles with a count function. The problem is given by a left join that I am not sure I am doing correctly.
Variables are:
Customer_name (buyer)
Product_code (what the customer buys)
Store (where the customer buys)
The datasets are:
Customer_df (list of customers and product codes of their purchases)
Store1_df (list of product codes per week, for Store 1)
Store2_df (list of product codes per day, for Store 2)
Final output desired:
I would like to have a table with:
col1: Customer_name;
col2: Count of items purchased in store 1;
col3: Count of items purchased in store 2;
Filters: date range
My query looks like this:
SELECT
    DISTINCT
    C.customer_name,
    C.product_code,
    COUNT(S1.product_code) AS s1_sales,
    COUNT(S2.product_code) AS s2_sales
FROM customer_df C
LEFT JOIN store1_df S1 USING(product_code)
LEFT JOIN store2_df S2 USING(product_code)
GROUP BY
    customer_name, product_code
HAVING
    S1_sales > 0
    OR S2_sales > 0
The output I expect is something like this:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
--------------|--------------|---------------------|--------------------
Luigi | 120012 | 4 | 8
James | 100022 | 6 | 10
But instead, I get:
Customer_name | Product_code | Store1_weekly_sales | Store2_weekly_sales
--------------|--------------|---------------------|--------------------
Luigi | 120012 | 290 | 60
James | 100022 | 290 | 60
It works when I use COUNT(DISTINCT product_code) instead of COUNT(product_code), but I would like to avoid that because I want to be able to aggregate over different timespans (e.g. if I count distinct values over more than 1 week of data, I will not get the right numbers).
My hypotheses are:
I am joining the tables in the wrong way
There is a problem when joining two datasets with different time aggregations
What am I doing wrong?

The reason, as Philipxy indicated, is a common one. You are getting a Cartesian result from your data, which bloats your numbers. To simplify, let's consider a single customer purchasing one item from two stores. The first store has 3 purchases, the second store has 5 purchases. Your total count is 3 * 5 = 15, because each entry in the first store is joined to every matching entry in the second. So the 1st purchase is joined to second-store rows 1-5, then the 2nd purchase is joined to second-store rows 1-5, and so on; you can see the bloat. If each store's aggregates are pre-queried per customer, the join sees AT MOST one record per customer per store (and per product, as per your desired outcome).
select
      c.customer_name,
      AllCustProducts.Product_Code,
      coalesce( PQStore1.SalesEntries, 0 ) Store1SalesEntries,
      coalesce( PQStore2.SalesEntries, 0 ) Store2SalesEntries
   from
      customer_df c
         -- now, we need all possible UNIQUE instances of
         -- a given customer and product to prevent duplicates
         -- for subsequent queries of sales per customer and store
         JOIN
         ( select distinct customerid, product_code
              from store1_df
           union
           select distinct customerid, product_code
              from store2_df ) AllCustProducts
            on c.customerid = AllCustProducts.customerid
         -- NOW, we can join to a pre-query of sales at store 1
         -- by customer id and product code. You may also want to
         -- get sum( SalesDollars ) if available, just add respectively
         -- to each sub-query below.
         LEFT JOIN
         ( select
                 s1.customerid,
                 s1.product_code,
                 count(*) as SalesEntries
              from
                 store1_df s1
              group by
                 s1.customerid,
                 s1.product_code ) PQStore1
            on AllCustProducts.customerid = PQStore1.customerid
            AND AllCustProducts.product_code = PQStore1.product_code
         -- now, same pre-aggregation to store 2
         LEFT JOIN
         ( select
                 s2.customerid,
                 s2.product_code,
                 count(*) as SalesEntries
              from
                 store2_df s2
              group by
                 s2.customerid,
                 s2.product_code ) PQStore2
            on AllCustProducts.customerid = PQStore2.customerid
            AND AllCustProducts.product_code = PQStore2.product_code
No GROUP BY or HAVING is needed in the outer query, since each pre-aggregate returns at most one record per unique combination of customer and product. As for filtering by date range, I would just add a WHERE clause inside each of the AllCustProducts, PQStore1, and PQStore2 subqueries.
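
To see the 3 * 5 bloat and the effect of pre-aggregating, here is a small sketch using Python's sqlite3 module. The customerid column in the store tables follows the answer's assumption, the data is invented to match the "3 purchases and 5 purchases" example, and the product dimension is collapsed to a single product for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_df (customerid INTEGER, customer_name TEXT);
    CREATE TABLE store1_df (customerid INTEGER, product_code INTEGER);
    CREATE TABLE store2_df (customerid INTEGER, product_code INTEGER);
    INSERT INTO customer_df VALUES (1, 'Luigi');
""")
# One customer, one product: 3 sales rows in store 1 and 5 in store 2.
conn.executemany("INSERT INTO store1_df VALUES (?, ?)", [(1, 120012)] * 3)
conn.executemany("INSERT INTO store2_df VALUES (?, ?)", [(1, 120012)] * 5)

# Joining both detail tables directly multiplies the rows (3 * 5 = 15).
naive = conn.execute("""
    SELECT c.customer_name,
           COUNT(s1.product_code), COUNT(s2.product_code)
    FROM customer_df c
    LEFT JOIN store1_df s1 ON s1.customerid = c.customerid
    LEFT JOIN store2_df s2 ON s2.product_code = s1.product_code
    GROUP BY c.customer_name
""").fetchone()

# Pre-aggregating each store first leaves at most one row per customer.
fixed = conn.execute("""
    SELECT c.customer_name,
           COALESCE(p1.SalesEntries, 0), COALESCE(p2.SalesEntries, 0)
    FROM customer_df c
    LEFT JOIN (SELECT customerid, COUNT(*) AS SalesEntries
               FROM store1_df GROUP BY customerid) p1
           ON p1.customerid = c.customerid
    LEFT JOIN (SELECT customerid, COUNT(*) AS SalesEntries
               FROM store2_df GROUP BY customerid) p2
           ON p2.customerid = c.customerid
""").fetchone()

print(naive)  # ('Luigi', 15, 15)
print(fixed)  # ('Luigi', 3, 5)
```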

Filter with SQL Server by Group ID

I have two tables and I need to filter the data by filter id, depending on how each filter id relates to its filter group id.
For example, I have these two tables:
Table 1:
ItemID | FilterID
-------|---------
3 | 122
3 | 123
3 | 4
17 | 123
Table 2:
FilterID | FilterGroupID
---------|--------------
122 | 5
123 | 5
4 | 1
If I search by filter id = 123, then every item id with this filter needs to be returned.
If I search by two or more filter ids that belong to different group ids, I need to get only the item ids that match a filter from every one of those groups.
Desired output:
first input: 123 -> return item id = 3 and item id = 17
second input: 123,4 -> return item id = 3, because filter id 123 belongs to group id 5 and filter id 4 belongs to group id 1, and item id 3 is the only one that has both filters.
third input: 122,123 -> return item id = 3 and item id = 17, because both filter ids belong to the same group.
I am getting a little lost with this query and would be glad to get some help.
I'll try to simplify it: let's say we have a filter group for size and a filter group for color. If I filter by size S or M, then I need to get all items with these sizes. If I add a color such as blue, the result is narrowed to items with size S or M and color blue. So a filter from a different group may cut some results.

It seems that you want every ItemID that has at least one matching filter from each FilterGroupID present in your filter input. So within each group you have OR logic, and between groups you have AND logic.
If you store your input in a table variable or table-valued parameter, then you can just use normal relational division techniques.
This then becomes a question of Relational Division With Remainder, with multiple divisors.
There are many ways to slice this cake. Here is one option:
- Join the filter input to the groups, to get each filter's group ID.
- Use a combination of DENSE_RANK and MAX to get the total number of distinct groups (you can't use COUNT(DISTINCT ...) in a window function, so we need to hack it). You can change this step to use a subquery instead of window functions; it may be faster or slower.
- Join the main table, and filter out any ItemIDs whose number of distinct groups differs from the overall total.
SELECT
  t1.ItemID
FROM (
    SELECT *,
      TotalGroups = MAX(dr) OVER ()
    FROM (
        SELECT
          fi.FilterID,
          t2.FilterGroupID,
          dr = DENSE_RANK() OVER (ORDER BY t2.FilterGroupID)
        FROM #Filters fi
        JOIN Table2 t2 ON t2.FilterID = fi.FilterID
    ) fi
) fi
JOIN Table1 t1 ON t1.FilterID = fi.FilterID
GROUP BY
  t1.ItemID
HAVING COUNT(DISTINCT fi.FilterGroupID) = MAX(fi.TotalGroups);
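
The subquery variant mentioned in the steps above can be sketched as follows. This is not the SQL Server query itself: it runs on SQLite through Python's sqlite3 module, passes the filter input as bound parameters instead of a #Filters table, and the function name matching_items is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ItemID INTEGER, FilterID INTEGER);
    CREATE TABLE Table2 (FilterID INTEGER, FilterGroupID INTEGER);
    INSERT INTO Table1 VALUES (3, 122), (3, 123), (3, 4), (17, 123);
    INSERT INTO Table2 VALUES (122, 5), (123, 5), (4, 1);
""")

def matching_items(conn, filter_ids):
    """Items that match at least one requested filter in every requested group."""
    marks = ",".join("?" for _ in filter_ids)
    sql = f"""
        SELECT t1.ItemID
        FROM Table1 t1
        JOIN Table2 t2 ON t2.FilterID = t1.FilterID
        WHERE t1.FilterID IN ({marks})
        GROUP BY t1.ItemID
        -- groups matched by this item = groups present in the filter input
        HAVING COUNT(DISTINCT t2.FilterGroupID) = (
            SELECT COUNT(DISTINCT FilterGroupID)
            FROM Table2
            WHERE FilterID IN ({marks})
        )
        ORDER BY t1.ItemID
    """
    # The placeholder list appears twice in the SQL, so bind the ids twice.
    params = list(filter_ids) * 2
    return [row[0] for row in conn.execute(sql, params)]

print(matching_items(conn, [123]))       # [3, 17]
print(matching_items(conn, [123, 4]))    # [3]
print(matching_items(conn, [122, 123]))  # [3, 17]
```

The three calls reproduce the three desired outputs listed in the question.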

Grouping and Summing Totals in a Joined Table

I have two tables, Medication and Inventory. I'm trying to SELECT all the details below from both tables, but the INVENTORY table has multiple rows per medication ID with different BRANCH_NO values (the primary key in INVENTORY is actually the composite key BRANCH_NO, MEDICATION_ID).
I need to total up the stock for each MEDICATION_ID, join the tables in one SELECT command, and display all the information for each med (there are 5) with the total for that med at the end of each row. But I'm getting muddled trying GROUP BY and SUM, and at one point PARTITION. Help please, I'm new to this.
Below is the latest non-working version, but it doesn't display
Medication Name
Medication Desc
Manufacturer
Pack Size
as I hoped it might.
SELECT I.MEDICATION_ID,
SUM(I.STOCK_LEVEL)
FROM INVENTORY I
INNER JOIN (SELECT MEDICATION_NAME, SUBSTR(MEDICATION_DESC,1,20) "Medication Description",
MANUFACTURER, PACK_SIZE FROM MEDICATION) M ON MEDICATION_ID=I.MEDICATION_ID
GROUP BY I.MEDICATION_ID;
For the data imagine I want this sort of output:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 10
2 Bravo 20
3 Charlie 20
1 Alpha 30
4 Delta 10
5 Echo 20
5 Echo 40
2 Bravo 10
grouping and totalling into this:
MEDICATION_ID MEDICATION_NAME STOCK_LEVEL OtherColumns.....
1 Alpha 40
2 Bravo 30
3 Charlie 20
4 Delta 10
5 Echo 60
I can get this when it's just one table, but when I'm trying to join tables and also SELECT other columns it's just not working.
Thanks in advance guys. I appreciate it may be a simple solution, but it will be a big help.

You need to list every non-aggregated column explicitly in both the SELECT and GROUP BY lists. (By the way, there is no need for a nested query; if you do keep it, note that it does not select MEDICATION_ID, so the join condition has nothing to match on.)
SELECT I.MEDICATION_ID, M.MEDICATION_NAME, SUM(I.STOCK_LEVEL) AS STOCK_LEVEL,
       SUBSTR(M.MEDICATION_DESC,1,20) "Medication Description", M.MANUFACTURER, M.PACK_SIZE
  FROM INVENTORY I
  JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
 GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, SUBSTR(M.MEDICATION_DESC,1,20),
          M.MANUFACTURER, M.PACK_SIZE;
This way, you'll be able to return all the listed columns.
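
A minimal sketch of the join-then-group pattern, run on SQLite through Python's sqlite3 module with the stock figures from the question. The branch numbers and manufacturer names are invented for illustration, and only some of the columns are included. SQLite is more lenient than stricter engines about ungrouped columns, so the sketch lists them all in GROUP BY anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MEDICATION (
        MEDICATION_ID INTEGER PRIMARY KEY,
        MEDICATION_NAME TEXT,
        MANUFACTURER TEXT
    );
    CREATE TABLE INVENTORY (
        BRANCH_NO INTEGER,
        MEDICATION_ID INTEGER,
        STOCK_LEVEL INTEGER,
        PRIMARY KEY (BRANCH_NO, MEDICATION_ID)
    );
""")
# Manufacturer names are hypothetical placeholders.
conn.executemany("INSERT INTO MEDICATION VALUES (?, ?, ?)", [
    (1, "Alpha", "Maker A"), (2, "Bravo", "Maker B"), (3, "Charlie", "Maker C"),
    (4, "Delta", "Maker D"), (5, "Echo", "Maker E"),
])
# (BRANCH_NO, MEDICATION_ID, STOCK_LEVEL); branch numbers are hypothetical.
conn.executemany("INSERT INTO INVENTORY VALUES (?, ?, ?)", [
    (1, 1, 10), (1, 2, 20), (1, 3, 20), (2, 1, 30),
    (1, 4, 10), (1, 5, 20), (2, 5, 40), (2, 2, 10),
])

totals = conn.execute("""
    SELECT I.MEDICATION_ID, M.MEDICATION_NAME, M.MANUFACTURER,
           SUM(I.STOCK_LEVEL) AS STOCK_LEVEL
    FROM INVENTORY I
    JOIN MEDICATION M ON M.MEDICATION_ID = I.MEDICATION_ID
    -- every non-aggregated column from SELECT is repeated here
    GROUP BY I.MEDICATION_ID, M.MEDICATION_NAME, M.MANUFACTURER
    ORDER BY I.MEDICATION_ID
""").fetchall()

for med_id, name, maker, stock in totals:
    print(med_id, name, stock)
```

This prints one row per medication with totals 40, 30, 20, 10 and 60, matching the grouped table in the question.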

Find 'Most Similar' Items in Table by Foreign Key

I have a child table with a number of charact/value pairs for a given 'material' (MaterialID). Any material can have a number of charact values and may have several of the same name (see id's 2,3).
The table has a large number of records (8+ million). What I'm trying to do is find the materials that are the most similar to a supplied material. That is, when I supply a MaterialID, I would like an ordered list of the most similar other materials (those with the most matching charact/value pairs).
I've done some research, but I may be missing some key terms or just not conceptualizing the problem correctly.
Any hints as to how to go about this would be very much appreciated.
ID MaterialID Charact Value
1 1 ROT_DIR CCW
2 1 SPECIAL_FEATURE CATALOG_CP
3 1 SPECIAL_FEATURE CHROME
4 1 SCHEDULE 80
5 2 BEARING_TYPE SB
6 2 SCHEDULE 80
7 3 ROT_DIR CCW
8 3 SPECIAL_FEATURE CATALOG_HSB
9 3 BEARING_TYPE SP
10 4 NDE_STYLE W_FAN
11 4 BEARING_TYPE SB
12 4 ROT_DIR CW*

You can do this with a self join:
select t.materialid, count(*) as nummatches
from t join
     t tmat
     on t.Charact = tmat.Charact and t.value = tmat.value
where tmat.materialid = @MaterialId
group by t.materialid
order by nummatches desc;
Notes:
You might want to exclude the specified material itself, by adding and t.MaterialId <> tmat.MaterialId to the where clause.
If you want all materials, then make the join a left join and move the where condition to the on clause.
If you want only one material with the most matches, use select top 1.
If you want all materials with the most matches when there are ties, use select top (1) with ties.
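
As a rough sketch, the self join can be exercised on the sample rows with Python's sqlite3 module. The table name t comes from the answer; the function name is an assumption, the supplied material is excluded from its own results, and a secondary sort on MaterialID is added only to make tie order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ID INTEGER, MaterialID INTEGER, Charact TEXT, Value TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    (1, 1, "ROT_DIR", "CCW"),
    (2, 1, "SPECIAL_FEATURE", "CATALOG_CP"),
    (3, 1, "SPECIAL_FEATURE", "CHROME"),
    (4, 1, "SCHEDULE", "80"),
    (5, 2, "BEARING_TYPE", "SB"),
    (6, 2, "SCHEDULE", "80"),
    (7, 3, "ROT_DIR", "CCW"),
    (8, 3, "SPECIAL_FEATURE", "CATALOG_HSB"),
    (9, 3, "BEARING_TYPE", "SP"),
    (10, 4, "NDE_STYLE", "W_FAN"),
    (11, 4, "BEARING_TYPE", "SB"),
    (12, 4, "ROT_DIR", "CW*"),
])

def similar_materials(conn, material_id):
    """Other materials ranked by the number of shared charact/value pairs."""
    return conn.execute("""
        SELECT t.MaterialID, COUNT(*) AS nummatches
        FROM t
        JOIN t AS tmat
          ON t.Charact = tmat.Charact AND t.Value = tmat.Value
        WHERE tmat.MaterialID = ?
          AND t.MaterialID <> tmat.MaterialID   -- skip the material itself
        GROUP BY t.MaterialID
        ORDER BY nummatches DESC, t.MaterialID
    """, (material_id,)).fetchall()

print(similar_materials(conn, 1))  # [(2, 1), (3, 1)]
```

Material 2 shares SCHEDULE 80 with material 1 and material 3 shares ROT_DIR CCW; material 4 shares nothing and is therefore absent from the inner-join result.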

Get MAX() on repeating IDs

This is what my query results currently look like. How can I get the MAX() value for each unique id?
For example:
for 5267139 it is 8
for 5267145 it is 4
5267136 5
5267137 8
5267137 2
5267139 8
5267139 5
5267139 3
5267141 4
5267141 3
5267145 4
5267145 3
5267146 1
5267147 2
5267152 3
5267153 3
5267155 8
SELECT DISTINCT st.ScoreID, st.ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st
ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
ORDER BY st.ScoreID, st.ScoreTrackingTypeID DESC

GROUP BY partitions your table into separate blocks based on the column(s) you specify. You can then apply an aggregate function (MAX in this case) to each block, using the syntax below:
SELECT First_column, MAX(Second_column) AS Max_second_column
FROM Table
GROUP BY First_column
EDIT: Based on the query above, it looks like you don't really need the ScoreTrackingType table at all, but leaving it in place, you could use:
SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
GROUP BY st.ScoreID
ORDER BY st.ScoreID
The GROUP BY obviates the need for DISTINCT, MAX gives you the value you are looking for, and the ORDER BY still applies; since there is now only a single ScoreTrackingTypeID value for each ScoreID, you can drop that column from the ordering.
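
A quick way to check the GROUP BY / MAX behaviour is a sketch like this one, which loads the rows from the question into SQLite via Python's sqlite3 module. It queries the ScoreTracking table alone, following the remark that the ScoreTrackingType table is not really needed; the variable names are assumptions.

```python
import sqlite3

# (ScoreID, ScoreTrackingTypeID) rows as shown in the question
ROWS = [
    (5267136, 5), (5267137, 8), (5267137, 2), (5267139, 8), (5267139, 5),
    (5267139, 3), (5267141, 4), (5267141, 3), (5267145, 4), (5267145, 3),
    (5267146, 1), (5267147, 2), (5267152, 3), (5267153, 3), (5267155, 8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ScoreTracking (ScoreID INTEGER, ScoreTrackingTypeID INTEGER)"
)
conn.executemany("INSERT INTO ScoreTracking VALUES (?, ?)", ROWS)

# One output row per ScoreID, carrying the largest type id in that group.
max_per_id = dict(conn.execute("""
    SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
    FROM ScoreTracking st
    GROUP BY st.ScoreID
    ORDER BY st.ScoreID
"""))

print(max_per_id[5267139])  # 8
print(max_per_id[5267145])  # 4
```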
