find similarity of merchant with customers - sql

I have a table in SQL Server 2012 with these columns:
user_id, merchant_id
I want to find the top 5 most similar partners for each merchant.
Similarity is simply defined as the normalized number of overlapping customers.
I cannot find any solution to this problem.

The following query counts the number of common customers for two merchants:
select t.merchantid as m1, t2.merchantid as m2, count(*) as common_customers
from table t join
table t2
on t.customerid = t2.customerid and t.merchantid <> t2.merchantid
group by t.merchantid, t2.merchantid;
The following gets the top five based on the raw counts:
select *
from (select t.merchantid as m1, t2.merchantid as m2, count(*) as common_customers,
row_number() over (partition by t.merchantid order by count(*) desc) as seqnum
from table t join
table t2
on t.customerid = t2.customerid and t.merchantid <> t2.merchantid
group by t.merchantid, t2.merchantid
) mm
where seqnum <= 5;
I do not know what you mean by "normalized". The term "normalized" in statistics would often not change the ordering of values (but would result in the sum of the squares being 1), so this may do what you want.
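If "normalized" is meant as something like the Jaccard index (common customers divided by the number of customers that either merchant has), the ranking can change, because large merchants no longer win automatically. That reading is only an assumption about what the question intends. A minimal sketch under that assumption follows; the table name visits and the sample rows are made up, and the SQL is run through Python's sqlite3 module purely so it can be tried without a server. The same CTEs and ROW_NUMBER() should carry over to SQL Server 2012.

```python
import sqlite3

# Hypothetical table name (visits) and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (customerid INTEGER, merchantid TEXT);
    INSERT INTO visits VALUES
        (1, 'A'), (2, 'A'), (3, 'A'),
        (1, 'B'), (2, 'B'),
        (3, 'C'), (4, 'C');
""")

# Assumption: similarity = |common| / |union| (Jaccard index).
QUERY = """
WITH pairs AS (            -- one row per distinct (customer, merchant)
    SELECT DISTINCT customerid, merchantid FROM visits
),
sizes AS (                 -- number of distinct customers per merchant
    SELECT merchantid, COUNT(*) AS n FROM pairs GROUP BY merchantid
),
common AS (                -- overlapping customers per ordered merchant pair
    SELECT p1.merchantid AS m1, p2.merchantid AS m2,
           COUNT(*) AS common_customers
    FROM pairs p1
    JOIN pairs p2
      ON p1.customerid = p2.customerid
     AND p1.merchantid <> p2.merchantid
    GROUP BY p1.merchantid, p2.merchantid
),
scored AS (                -- union size = n1 + n2 - common
    SELECT c.m1, c.m2,
           1.0 * c.common_customers
               / (s1.n + s2.n - c.common_customers) AS similarity
    FROM common c
    JOIN sizes s1 ON s1.merchantid = c.m1
    JOIN sizes s2 ON s2.merchantid = c.m2
),
ranked AS (
    SELECT m1, m2, similarity,
           ROW_NUMBER() OVER (PARTITION BY m1
                              ORDER BY similarity DESC, m2) AS seqnum
    FROM scored
)
SELECT m1, m2, similarity
FROM ranked
WHERE seqnum <= 5
ORDER BY m1, seqnum
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample data, A and B share 2 of 3 distinct customers (similarity 2/3), and A and C share 1 of 4 (0.25). B and C have no overlap, so that pair produces no row at all.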

Related

JOINING the Same Tables in SQL

I have a table with 4 columns such as Customer ID, Person ID, Year, Unit Cost.
I want to join the table to itself so that every year in the table appears for every combination of Customer ID and Person ID. If there is no data in the table for a given Customer ID, Person ID and year, then I want the Cost to be NULL.
Expected Data:
If I understand correctly, you can use a cross join to generate the rows and a left join to bring in the existing data:
select pc.*, y.*, t.cost
from (select distinct customer, personid from customer_table) pc cross join
(select distinct year from customer_table) y left join
customer_table t
on t.customer = pc.customer and t.personid = pc.personid and t.year = y.year;
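To see what the cross join plus left join produces, here is a small sketch run against an in-memory SQLite database from Python. The sample rows are invented, and the column names follow the query above rather than the original table, so treat them as assumptions.

```python
import sqlite3

# Invented sample data; column names follow the answer's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_table (
        customer TEXT, personid INTEGER, year INTEGER, cost REAL
    );
    INSERT INTO customer_table VALUES
        ('c1', 1, 2019, 10.0),
        ('c1', 1, 2020, 12.0),
        ('c2', 7, 2020, 5.0);
""")

rows = conn.execute("""
    SELECT pc.customer, pc.personid, y.year, t.cost
    FROM (SELECT DISTINCT customer, personid FROM customer_table) pc
    CROSS JOIN (SELECT DISTINCT year FROM customer_table) y
    LEFT JOIN customer_table t
      ON t.customer = pc.customer
     AND t.personid = pc.personid
     AND t.year = y.year
    ORDER BY pc.customer, pc.personid, y.year
""").fetchall()

for row in rows:
    print(row)
```

Customer c2 has no 2019 row in the table, so the result contains ('c2', 7, 2019, None): the cross join generates the combination and the left join leaves the cost as NULL.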

Calculate variable of max amount in a group

I am having difficulty with the following exercise. I need to find how frequently an id is not the max_id in the group with the largest amount. Only groups that contain at least two different people should be considered.
The data comes from two different tables: max_id, user and amount come from table1 (which I will call a); id and group come from table2 (b).
From the text above, the conditions should be
(1) a.id<>b.max_id /* is not */
(2) people in group >=2
(3) a.id<> id of max amount
The dataset looks like
(a)
max_id user amount
(b)
group email
From a previous exercise, I had to compute distinct people as follows:
select distinct a.user,
a.max_id,
b.id
from table1 as a
inner join table2 as b
on b.id=a.max_id
where
b.max_id is not null
and b.time is null
No information from amount was required in the exercise above. This is the main difference between the two exercises, but the structure and fields are quite similar.
Now I need to edit the code above to find how frequently an id is not the max_id in the group with the largest amount. This only makes sense if a group has at least two different persons/users.
I think I will need to join tables to get the id of max amount in a group and count people in a group, but I do not know how to do it.
Any help would be greatly appreciated. Thank you.
Data sample
max_id  user  amount  id  group    email
12      1     -2000   12  house    email1
312     1     0       54  work     email1
11      32    -213    11  house    email32
41      13    -43     78  work     email13
312     53    -650    34  work     email53
1       67    -532    43  defense  email67
64      76    -9650   98  work     email76
As I understand it, based on what the exercise asks and on the code above, I should find the rows where id<>max_id in groups (i.e. house, work, defense) that have more than 2 users.
Then what I need to select is id <> id of max amount.
I hope this makes it a bit clearer.
Assuming you have a query such as
select t.User, m.Email, m.Model, m.Amount
from my_table m
inner join (
select user, max(amount) max_amount
from my_table
group by user
) t on t.user = m.user
and t.max_amount = m.amount
you can obtain the max id for each amount using
select max(id), Amount
from (
select m.id, t.User, m.Email, m.Model, m.Amount
from my_table m
inner join (
select user, max(amount) max_amount
from my_table
group by user
) t on t.user = m.user
and t.max_amount = m.amount
) k
group by Amount
and you should obtain the values of id that are not equal to the max id with
select mm.id, t.User, mm.Email, mm.Model, mm.Amount
from my_table mm
inner join (
select user, max(amount) max_amount
from my_table
group by user
) t on t.user = mm.user
and t.max_amount = mm.amount
inner join (
select max(k.id) max_id, k.Amount
from (
select m.id, t.User, m.Email, m.Model, m.Amount
from my_table m
inner join (
select user, max(amount) max_amount
from my_table
group by user
) t on t.user = m.user
and t.max_amount = m.amount
) k
group by k.Amount
) kk ON kk.max_id <> mm.id
and based on your last sample the query should be
select m.*
from my_table m
inner join (
select my_groups, count(distinct user)
from my_table
group by my_groups
having count(distinct user) > 2
) t on t.my_groups = m.my_groups
and m.max_id <> m.id
PS: group is a reserved word, so I use my_groups for the column name.

Return only the highest-valued row

I'm trying to find a way to return only the highest-valued row from a SQL query.
I have a query that joins two tables and then counts how many times the id matches across the two tables (within 'athlete' the id column is unique).
SELECT t.athlete_id, count(a.id) as 'Number of activities' FROM training_session t
INNER JOIN athlete a ON t.athlete_id = a.id
WHERE t.athlete_id = a.id
GROUP BY a.id
The following table is returned
athlete_id Number of activities
1 4
2 1
3 1
4 1
5 1
6 1
The problem is that I only want to return the row with the highest number of activities. According to the table above this should be
athlete_id = 1, since it has the greatest number of activities.
I would appreciate some pointers on how I could improve my query to meet this requirement.
Use ORDER BY and LIMIT:
SELECT t.athlete_id, count(*) as `Number of activities`
FROM training_session t INNER JOIN
athlete a
ON t.athlete_id = a.id
GROUP BY t.athlete_id
ORDER BY COUNT(*) DESC
LIMIT 1;
I don't think a JOIN is needed for this query:
SELECT t.athlete_id, COUNT(*) as `Number of activities`
FROM training_session t
GROUP BY t.athlete_id
ORDER BY COUNT(*) DESC
LIMIT 1;
And if you want all rows in the event of ties, then this requires a bit more work. I would recommend ranking functions:
SELECT *
FROM (SELECT t.athlete_id, COUNT(*) as `Number of activities`,
RANK() OVER (ORDER BY COUNT(*) DESC) as seqnum
FROM training_session t
GROUP BY t.athlete_id
) t
WHERE seqnum = 1;
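To illustrate the difference between LIMIT 1 and the ranking approach when there is a tie, here is a small sketch using Python's sqlite3 module with invented data. It assumes an SQLite build with window functions (3.25 or later). The counts are computed in an inner subquery before ranking, which is a slight restructuring of the query above rather than a copy of it.

```python
import sqlite3

# Invented data: athletes 1 and 2 are tied with two sessions each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (
        id INTEGER PRIMARY KEY, athlete_id INTEGER
    );
    INSERT INTO training_session (athlete_id)
    VALUES (1), (1), (2), (2), (3);
""")

# ORDER BY ... LIMIT 1 keeps exactly one row, even when there is a tie.
limit_rows = conn.execute("""
    SELECT athlete_id, COUNT(*) AS activities
    FROM training_session
    GROUP BY athlete_id
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchall()

# RANK() assigns 1 to every row that shares the highest count.
rank_rows = conn.execute("""
    SELECT athlete_id, activities
    FROM (SELECT athlete_id, activities,
                 RANK() OVER (ORDER BY activities DESC) AS seqnum
          FROM (SELECT athlete_id, COUNT(*) AS activities
                FROM training_session
                GROUP BY athlete_id) counts) ranked
    WHERE seqnum = 1
    ORDER BY athlete_id
""").fetchall()

print(limit_rows)   # a single row; which tied athlete it is, is not guaranteed
print(rank_rows)    # both tied athletes
```

The LIMIT 1 version returns one of the tied athletes arbitrarily, while the RANK() version returns both athlete 1 and athlete 2.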

Join two tables but only get most recent associated record

I am having a hard time constructing an SQL query that joins each record to its rows in another (associated) table but keeps only the row that is considered the latest (most recent).
The image below describes my two tables (Inventory and Sales): the Inventory table contains all the items and the Sales table contains all the transaction records. Inventory.Id is related to Sales.Inventory_Id. The Wanted result is the output that I am trying to produce.
My objective is to associate the sales records with the inventory but get only the most recent transaction for each item.
Using a plain join (left, right or inner) doesn't produce the result I am looking for, because I don't know how to add a condition that filters the join down to the most recent data. Is this doable, or should I change my table schema?
Thanks.
You can use APPLY
Select Item,Sales.Price
From Inventory I
Cross Apply(Select top 1 Price
From Sales S
Where I.id = S.Inventory_Id
Order By Date Desc) as Sales
WITH Sales_Latest AS (
SELECT *,
MAX(Date) OVER(PARTITION BY Inventory_Id) Latest_Date
FROM Sales
)
SELECT i.Item, s.Price
FROM Inventory i
INNER JOIN Sales_Latest s ON (i.Id = s.Inventory_Id)
WHERE s.Date = s.Latest_Date
Think carefully about what results you expect if there are two prices in Sales for the same date.
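As a concrete illustration of that caveat, the sketch below (Python's sqlite3, invented data) puts two sales of the same item on the same date. The MAX(Date) filter returns both, whereas ROW_NUMBER() with an explicit tie-breaker returns exactly one. The Sales.Id column used as the tie-breaker is an assumption; the question does not show the full schema.

```python
import sqlite3

# Invented data. Sales.Id is an assumed surrogate key used as a tie-breaker.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory (Id INTEGER PRIMARY KEY, Item TEXT);
    CREATE TABLE Sales (
        Id INTEGER PRIMARY KEY, Inventory_Id INTEGER, Price REAL, Date TEXT
    );
    INSERT INTO Inventory VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO Sales (Inventory_Id, Price, Date) VALUES
        (1, 1.00, '2020-01-01'),
        (1, 1.50, '2020-02-01'),
        (1, 1.75, '2020-02-01'),  -- same date as the previous sale
        (2, 9.00, '2020-01-15');
""")

# Filtering on the latest date keeps every sale made on that date.
max_date_rows = conn.execute("""
    WITH Sales_Latest AS (
        SELECT *, MAX(Date) OVER (PARTITION BY Inventory_Id) AS Latest_Date
        FROM Sales
    )
    SELECT i.Item, s.Price
    FROM Inventory i
    INNER JOIN Sales_Latest s ON i.Id = s.Inventory_Id
    WHERE s.Date = s.Latest_Date
    ORDER BY i.Item, s.Price
""").fetchall()

# ROW_NUMBER() with a tie-breaker keeps exactly one sale per item.
row_number_rows = conn.execute("""
    SELECT Item, Price
    FROM (SELECT i.Item, s.Price,
                 ROW_NUMBER() OVER (PARTITION BY i.Id
                                    ORDER BY s.Date DESC, s.Id DESC) AS rownum
          FROM Inventory i
          INNER JOIN Sales s ON i.Id = s.Inventory_Id) latest
    WHERE rownum = 1
    ORDER BY Item
""").fetchall()

print(max_date_rows)    # 'Pen' appears twice
print(row_number_rows)  # one row per item
```

Which behaviour is right depends on the data: if two prices on the same date are both meaningful, the first result is the honest one; otherwise a deterministic tie-breaker is needed.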
I would just use a correlated subquery:
select Item, Price
from Inventory i
inner join Sales s
on i.id = s.Inventory_Id
and s.Date = (select max(Date) from Sales where Inventory_Id = i.id)
select * from
(
select i.name,
row_number() over (partition by i.id order by s.date desc) as rownum,
s.price,
s.date
from inventory i
left join sales s on i.id = s.inventory_id
) tmp
where rownum = 1

How to get a master table record by count of detail table records without top(1)

I have a master table (Team) and a detail table (TeamMember). TeamMember has a FK to Team.
I need to get the Team record for the team that has the most team members. At first I had
SELECT team.name
FROM team
INNER JOIN (SELECT TOP 1 COUNT(*) AS membercount,
teamID
FROM teammember
GROUP BY teamID
ORDER BY Count(*) DESC) AS team_with_most_members
ON team.id = team_with_most_members.teamID
I was informed that I cannot use TOP(1) in my queries. Anyone have an idea how I can do it without?
Thanks!
Team
ID, Name
TeamMember
ID, TeamID, UserID
This one is crude but it works:
SELECT t.name
FROM team AS t
JOIN teammember AS tm ON tm.teamID = t.ID
GROUP BY t.Name
HAVING COUNT(tm.id) = (SELECT MAX(members) FROM (SELECT COUNT(id) members FROM teammember GROUP BY teamid) AS sub)
This makes me feel dirty. It will return a single team name even if there is a tie - if you want all rows in the event of a tie, use DENSE_RANK() instead of ROW_NUMBER().
SELECT t.ID, t.Name FROM
(
SELECT
TeamID, rn = ROW_NUMBER() OVER (ORDER BY c DESC)
FROM
(
SELECT TeamID, c = COUNT(*)
FROM dbo.TeamMember GROUP BY TeamID
) AS x
) AS y
INNER JOIN dbo.Team AS t
ON y.TeamID = t.ID
WHERE y.rn = 1; -- **EDIT** forgot the most important part!
I'd really stand up and challenge the "no TOP 1" rule. Ask the person who told you it was for performance reasons to compare the performance of your existing query with any of the kludges we've come up with.
TOP 1 is the cleanest way. Here's a really convoluted way that might work:
SELECT ID FROM (
SELECT ID, Tally, MAX(Tally) over () AS MaxTally
FROM (SELECT t1.ID,
COUNT(t2.ID) AS Tally
FROM #Team t1
JOIN #TeamMember t2
ON t2.TeamID = t1.ID
GROUP BY t1.ID) x
) y WHERE Tally = MaxTally
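For the window-maximum idea to single out the largest team, the maximum has to be taken over all teams (an empty OVER clause); partitioning by the team ID would compare each team only with itself. A minimal sketch of that variant, run through Python's sqlite3 with invented data and ordinary tables instead of the temp tables above:

```python
import sqlite3

# Invented data: Red has 3 members, Green 2, Blue 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Team (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE TeamMember (
        ID INTEGER PRIMARY KEY, TeamID INTEGER, UserID INTEGER
    );
    INSERT INTO Team VALUES (1, 'Red'), (2, 'Blue'), (3, 'Green');
    INSERT INTO TeamMember (TeamID, UserID) VALUES
        (1, 10), (1, 11), (1, 12),
        (2, 20),
        (3, 30), (3, 31);
""")

rows = conn.execute("""
    SELECT ID, Name
    FROM (SELECT ID, Name, Tally,
                 MAX(Tally) OVER () AS MaxTally   -- max across all teams
          FROM (SELECT t.ID, t.Name, COUNT(tm.ID) AS Tally
                FROM Team t
                JOIN TeamMember tm ON tm.TeamID = t.ID
                GROUP BY t.ID, t.Name) x) y
    WHERE Tally = MaxTally
""").fetchall()

print(rows)
```

With this data only the Red team is returned. If two teams were tied for the most members, both would come back, which may or may not be what is wanted.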