Returning customized results with SQL Server - sql

This is a little complicated, so I'm going to break it down. I'm trying to get results but can't figure out what the query should look like. The premise is this: a user has purchased a specific set of items, their gear. When they go to my site, they see kits or setups that users have submitted. I want to show them only setups that consist entirely of gear they've purchased. They don't need to see setups with gear that they do not have. I hope this makes sense. Here's what my tables look like:
[Gear]
gearID is the unique key
has a list of all the gear (mics, heads, effects) with unique id's for each
[Kits]
kitID is the unique key
has a list of all the user submitted kits
[KitGearLink]
This table connects the [Kits] and [Gear] with each other. So this is the table that lists out what gear the user submitted kit has.
[Users]
userID is the unique key
list of all the users
[UserGear]
Links the [users] and [gear] table together. This is what stores what the user's personal gear consists of.
So how do I pull up records for each user that show all the kits that will work with the gear they have? If a kit has something the user doesn't own, it shouldn't be shown to them. Any ideas?
Thanks guys!

Perhaps something like this:
SELECT *
FROM Kits k
WHERE k.KitID NOT IN (
    SELECT DISTINCT kg.KitID
    FROM KitGearLink kg
    LEFT JOIN (
        SELECT ug0.GearID
        FROM UserGear ug0
        WHERE ug0.UserID = @userParam
    ) ug ON kg.GearID = ug.GearID
    WHERE ug.GearID IS NULL
)
For a given user ID, the subquery returns the kits that fail the gear ID join between user and kit (kits containing a piece of gear the user doesn't have). This is then used to filter the list of kits in the system.
Edit: Introduced second sub-query to filter user gear by user id parameter before the left join occurs, see comments.
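To make the anti-join above concrete, here is a minimal runnable sketch. It uses Python's sqlite3 module as a stand-in for SQL Server, so the parameter marker is ? rather than a T-SQL variable. The sample rows and the helper names make_db and kits_for_user are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
SCHEMA = """
CREATE TABLE Kits (kitID INTEGER PRIMARY KEY);
CREATE TABLE KitGearLink (kitID INTEGER, gearID INTEGER);
CREATE TABLE UserGear (userID INTEGER, gearID INTEGER);
INSERT INTO Kits VALUES (10), (11), (12);
INSERT INTO KitGearLink VALUES (10, 1), (10, 2), (11, 1), (11, 3), (12, 2);
INSERT INTO UserGear VALUES (1, 1), (1, 2), (2, 3);
"""

QUERY = """
SELECT k.kitID
FROM Kits k
WHERE k.kitID NOT IN (
    -- kits containing at least one piece of gear the user lacks
    SELECT kg.kitID
    FROM KitGearLink kg
    LEFT JOIN (
        SELECT ug0.gearID FROM UserGear ug0 WHERE ug0.userID = ?
    ) ug ON kg.gearID = ug.gearID
    WHERE ug.gearID IS NULL
)
ORDER BY k.kitID
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def kits_for_user(conn, user_id):
    """Return the IDs of kits made up only of gear the user owns."""
    return [row[0] for row in conn.execute(QUERY, (user_id,))]


if __name__ == "__main__":
    conn = make_db()
    print(kits_for_user(conn, 1))  # user 1 owns gear 1 and 2
    print(kits_for_user(conn, 2))  # user 2 owns only gear 3
```

With this sample data, user 1 gets kits 10 and 12, and user 2 gets nothing because every kit needs gear they lack. Note that a kit with no rows in KitGearLink would pass the filter for every user, which may or may not be what you want.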

OK, you want a list of kits where all the gear in the kit has at some point been associated with a specific user. Your best bet is probably a stored procedure that uses a table variable. This assumes you are asking for kits for a specific user. You could get a complete list for all users, but that is more costly and complex.
DECLARE @UserId AS int
SELECT @UserId = YOURUSER  -- placeholder: replace with the actual user ID
DECLARE @tblUserKits TABLE (
    UserId int NULL,
    KitId int NULL,
    GearId int NULL
)
-- GET ALL KITS THAT A USER HAS AT LEAST ONE PART OF
INSERT INTO @tblUserKits
SELECT DISTINCT u.UserId, k.KitId, k.GearId
FROM UserGear u
INNER JOIN KitGearLink k ON u.GearId = k.GearId
WHERE u.UserId = @UserId
-- DELETE ALL KITS THAT THE USER IS MISSING A PART FOR
DELETE FROM @tblUserKits WHERE KitId IN
    (SELECT DISTINCT t.KitId
     FROM @tblUserKits t
     INNER JOIN KitGearLink k ON t.KitId = k.KitId
     -- match only this user's gear, not gear owned by other users
     LEFT OUTER JOIN UserGear u ON k.GearId = u.GearId AND u.UserId = @UserId
     WHERE u.GearId IS NULL)
-- Finally return a list of kits
SELECT DISTINCT KitId FROM @tblUserKits

How to design for higher scalability, in SQL Server databases that need set operations?

Imagine a movie application that recommends the next movie to users based on this very simple algorithm:
Movie should be new to user
User has not marked the movie as "not interested"
This is a simple design of SQL Server's database:
Movies:
Id bigint
Name nvarchar(100)
SeenMovies:
Id bigint
UserId bigint
MovieId bigint
NotInterestedFlags:
Id bigint
UserId bigint
MovieId bigint
To get the next movie we run this query:
select top 1 *
from Movies
where Id not in
(
select MovieId
from SeenMovies
where UserId = 89283
)
and Id not in
(
select MovieId
from NotInterestedFlags
where UserId = 89283
)
This design gets slower and slower as usage of the application and the amount of data grow.
So, with an imaginary database of 100K movies and over 10 million customers, how should this design change to scale horizontally?
The following is something like the code I would recommend.
I assume SeenMovies and NotInterestedFlags are clustered, or at the very least indexed, on UserId. And that Movies is clustered on MovieId. If not, adding such indexes will be the first place to start.
I certainly don't see any reason why there should be poor performance per individual query with the sort of volumes you're talking about, because once we have confined the query to a specific user, both SeenMovies and NotInterestedFlags should only have at most a few thousand rows each for that user.
SELECT TOP 1
    Movies.*
FROM
    Users
CROSS JOIN
    Movies
WHERE
    NOT EXISTS
    (
        SELECT NULL
        FROM SeenMovies
        WHERE SeenMovies.UserId = Users.Id
          AND SeenMovies.MovieId = Movies.Id
    )
    AND NOT EXISTS
    (
        SELECT NULL
        FROM NotInterestedFlags
        WHERE NotInterestedFlags.UserId = Users.Id
          AND NotInterestedFlags.MovieId = Movies.Id
    )
    AND Users.Id = 89283
If this still performs poorly even with appropriate indexes, the only other thing I can imagine is first UNIONing the MovieId entries in SeenMovies and NotInterestedFlags for that UserId, and then EXCEPTing these against Movies, which might provide better performance.
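As a rough illustration of the UNION-then-EXCEPT idea, here is a small sketch run against SQLite through Python's sqlite3 module. It is not tuned for SQL Server: SQLite uses LIMIT 1 where T-SQL would use TOP 1, and whether this form is actually faster depends on the indexes and the query plan. The sample rows and the helper names make_db and next_movie are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; the schema follows the question.
SCHEMA = """
CREATE TABLE Movies (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SeenMovies (Id INTEGER PRIMARY KEY, UserId INTEGER, MovieId INTEGER);
CREATE TABLE NotInterestedFlags (Id INTEGER PRIMARY KEY, UserId INTEGER, MovieId INTEGER);
INSERT INTO Movies VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
INSERT INTO SeenMovies VALUES (1, 89283, 1), (2, 5, 3), (3, 6, 1), (4, 6, 2);
INSERT INTO NotInterestedFlags VALUES (1, 89283, 2), (2, 6, 3), (3, 6, 4);
"""

QUERY = """
SELECT Id FROM (
    SELECT Id FROM Movies
    EXCEPT
    SELECT MovieId FROM (
        -- everything this user has already seen or rejected
        SELECT MovieId FROM SeenMovies WHERE UserId = :uid
        UNION
        SELECT MovieId FROM NotInterestedFlags WHERE UserId = :uid
    )
)
ORDER BY Id
LIMIT 1
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def next_movie(conn, user_id):
    """Return the lowest movie Id the user has neither seen nor rejected."""
    row = conn.execute(QUERY, {"uid": user_id}).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    print(next_movie(conn, 89283))  # movies 1 and 2 are excluded
    print(next_movie(conn, 6))      # every movie is excluded
```

For user 89283 this returns movie 3, and for user 6, who has seen or rejected everything, it returns None.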
On the other hand, if the problem is that the overall performance of the system is degrading under the load of many users, you might have to look at pre-preparing a list for every user of un-seen and un-blacklisted movies, from which you query the TOP 1.
And then, when a user watches a movie or blacklists it (or a new movie is added), this new table is modified at the same time as the separate SeenMovies and NotInterestedFlags tables.
Again, if that doesn't help performance enough, then you'd have to look at implementing a daily batch job, maybe, that pre-prepares a list of say 10 unseen and non-blacklisted movies per user, and this table is then queried and offered to the user one-at-a-time.
I think frankly though, if there is a prospect of you having 10 million users, you could probably afford an expert to write the code or evaluate the existing system.
Create a cache for each user with "shortlisted" movies. An indexed view might work well for that. The point is not to run the full query each time a user wants to see the list but occasionally update the shortlist. Those individual lists together with User flags tables can be scaled horizontally by some user attribute. User Location might be a good choice here for cloud migration in the future.

SQL with nested joins and sums

Hoping someone can give me a little bit of help with a query that I'm stuck on.
Using MS-Sql server 2012
This is part of a larger query but for the purposes of my questions I'm only concerned with 4 tables: Account, user, product, productstats
And a simplified layout of each table is as follows:
account: id, parentaccountID, name
user: id, accountID, email
product: id, accountID
productstats: id, productID, views
So user links to the account table, and the account table can link to itself via the parentaccountID field. The product table links to the account table, and the productstats table links to the product table.
The productstats contains statistics on each product. In my example above we have how many times someone has viewed a product.
I want to get the sum of all product views under each parent account, including its child accounts. However, when people search for an account, they can search by either account.name or user.email.
So if they search by user.email, I want to include all products from that user's account, and any child or parent account(s) that it's part of.
One note - the parent/child account structure is only 1 level deep. Meaning an account is either the parent or the child, it's never both. parent accounts have a null value for ParentAccountID.
SELECT a2.ParentAccountID, a.id, a.Name, SUM(ps.PageViews)
FROM account a
LEFT JOIN account a2 ON a.id = a2.ParentAccountID
LEFT JOIN product p ON a.id = p.AccountID OR p.AccountID = a2.ID
LEFT JOIN ProductStatistic ps ON p.id = ps.ProductID
WHERE a.Name LIKE 'test'
GROUP BY a.id, a2.ParentAccountID, a.Name
That's a simplified version of the query - I haven't even included the user table yet since i haven't gotten it working this far yet.
The values I get back on that query are:
ParentAccountID =4, ID =4, name=test, sum=1617
When I run the following query
SELECT SUM(pageviews) FROM ProductStatistic WHERE ProductID IN (
SELECT id FROM product WHERE AccountID IN (4, 32, 112, 3757, 3794))
I get 453 back as the result. Those account IDs are the parent account ID and its 4 child accounts. I have no idea how it's getting 1617, since that's not even a multiple of 453.
When you break up your query into some smaller parts, it will become a lot clearer.
First obtain the accounts involved
Then determine the relevant products
Only then join in the stats table to obtain the view counts
Have a look at this SQL Fiddle.
[EDIT]
Added a new fiddle that addresses your comments. It's not so simple anymore, but I think it does what you need.
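The fiddle itself is not reproduced here, so the following is only a sketch of the accounts, then products, then stats idea, run against SQLite through Python's sqlite3 module with made-up rows. It also shows where the inflated total comes from: joining the parent to each child repeats the parent's own products once per child. The helper names make_db, naive_total and stepwise_total are hypothetical, and the tables use the simplified layout from the question.

```python
import sqlite3

# Hypothetical sample data using the simplified layout from the question.
SCHEMA = """
CREATE TABLE account (id INTEGER PRIMARY KEY, parentaccountID INTEGER, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, accountID INTEGER);
CREATE TABLE productstats (id INTEGER PRIMARY KEY, productID INTEGER, views INTEGER);
INSERT INTO account VALUES (4, NULL, 'test'), (32, 4, 'child a'),
                           (112, 4, 'child b'), (7, NULL, 'other');
INSERT INTO product VALUES (1, 4), (2, 32), (3, 112), (4, 7);
INSERT INTO productstats VALUES (1, 1, 100), (2, 2, 10), (3, 3, 5), (4, 4, 1000);
"""

# Same join shape as the query in the question: the parent row is repeated
# once per child, so the parent's own products are summed once per child.
NAIVE = """
SELECT SUM(ps.views)
FROM account a
LEFT JOIN account a2 ON a.id = a2.parentaccountID
LEFT JOIN product p ON a.id = p.accountID OR p.accountID = a2.id
LEFT JOIN productstats ps ON p.id = ps.productID
WHERE a.id = ?
"""

# Step 1: map every account to its top-level parent (itself if it has none).
# Step 2: pick the products of those accounts. Step 3: only then add the stats.
STEPWISE = """
WITH family AS (
    SELECT id AS accountID, COALESCE(parentaccountID, id) AS rootID
    FROM account
)
SELECT SUM(ps.views)
FROM family f
JOIN product p ON p.accountID = f.accountID
JOIN productstats ps ON ps.productID = p.id
WHERE f.rootID = ?
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def naive_total(conn, account_id):
    return conn.execute(NAIVE, (account_id,)).fetchone()[0]


def stepwise_total(conn, root_id):
    return conn.execute(STEPWISE, (root_id,)).fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    print(naive_total(conn, 4))     # parent's 100 views counted twice
    print(stepwise_total(conn, 4))  # each product counted once
```

With these rows the naive join gives 215 (2 x 100 for the parent's product plus 10 and 5 for the children) while the stepwise version gives 115. That also explains why the inflated figure need not be a multiple of the correct one.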

Issues with subqueries for stored procedure

The query I am trying to perform is
With getusers As
(Select userID from userprofspecinst_v where institutionID IN
    (select institutionID, professionID from userprofspecinst_v where userID=@UserID)
and professionID IN
    (select institutionID, professionID from userprofspecinst_v where userID=@UserID))
select username from user where userID IN (select userID from getusers)
Here's what I'm trying to do. Given a userID and a view which contains the userID and the IDs of their institution and profession, I want to get the list of other userIDs who have the same institutionID and professionID. Then, with that list of userIDs, I want to get the usernames that correspond to each userID from another table (user). The error I get when I try to create the procedure is, "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.". Am I taking the correct approach to building this query?
The following query should do what you want to do:
SELECT u.username
FROM user AS u
INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
INNER JOIN (SELECT institutionID, professionID FROM userprofspecinst_v
            WHERE userID = @userID) AS ProInsts
    ON (up.institutionID = ProInsts.institutionID
        AND up.professionID = ProInsts.professionID)
Effectively, the crucial part is the last INNER JOIN: it creates a derived table containing the institution IDs and profession IDs that the given user ID belongs to. We then get all matching rows in the view with the same institution ID and profession ID (the ON condition) and link these back to the user table on the corresponding user IDs (the first JOIN).
You can either run this for each user id you are interested in, or JOIN onto the result of a query (your getusers) (it depends on what database engine you are running).
If you aren't familiar with JOIN's, Jeff Atwood's introductory post is a good starting place.
The JOIN statement effectively allows you to exploit the logical links between your tables. The userID, institutionID and professionID are all candidates for foreign keys, so rather than having to constantly subquery each table and piece the results together, you can link all the tables together and filter down to the rows you want. It's usually a cleaner, more maintainable approach (although that is opinion).
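Here is a small runnable sketch of that join, using SQLite through Python's sqlite3 module rather than SQL Server. The view is modelled as a plain table, the table name user is quoted, and DISTINCT is added so a user who matches on several institution/profession pairs is listed once; these choices, the sample rows, and the helper names make_db and colleagues are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data; the view is stood in for by a plain table.
SCHEMA = """
CREATE TABLE "user" (userID INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE userprofspecinst_v (userID INTEGER, institutionID INTEGER, professionID INTEGER);
INSERT INTO "user" VALUES (1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
INSERT INTO userprofspecinst_v VALUES (1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 1);
"""

QUERY = """
SELECT DISTINCT u.username
FROM "user" AS u
INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
INNER JOIN (
    -- the (institution, profession) pairs of the given user
    SELECT institutionID, professionID
    FROM userprofspecinst_v
    WHERE userID = ?
) AS ProInsts
    ON up.institutionID = ProInsts.institutionID
   AND up.professionID = ProInsts.professionID
ORDER BY u.username
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def colleagues(conn, user_id):
    """Usernames sharing both institution and profession with user_id."""
    return [row[0] for row in conn.execute(QUERY, (user_id,))]


if __name__ == "__main__":
    conn = make_db()
    print(colleagues(conn, 1))  # alice and bob share institution 1, profession 1
```

Note that the given user appears in their own result, since they trivially match themselves; add a condition such as u.userID <> ? if that is not wanted.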

SQL Stored procedure

I have 3 tables:
tbl_Image from which a list of all images will be obtained
A user table from which User ID will be obtained
and an association table of Image and Member called tbl_MemberAssociation.
My workflow is that a user can upload an image, which is stored in the image table. All users can then view this image and select one of three choices provided along with it. If a user selects an option, it is added to the association table. No user can view the same image more than once, so there will be no multiple entries.
Now I want to find the % of match by getting the list of members who chose the same option and those who chose a different option, across all the common images for which they have provided an option.
I.e., say 3 users A, B and C view an image of the Taj Mahal. A and B chose "Beautiful" and C chose "Not Good". For another image, say the Indian flag, A, B and C all chose "Salute". Then for user A, B is a 100% match (since they selected the same option both times), and C is a 50% match (one the same out of 2).
So this is my scenario, in which I have to find all matches for the currently logged-in user.
Please help me.... I am totally stuck on this procedure.
I have made some assumptions about the actual structure of your tables, but if I understand what you are looking for then I think this query will get the results you are wanting. You may have to make a few modifications to match your table structures.
SELECT
matches.UserName,
CAST(matches.SameRatings AS FLOAT) / CAST(ratings.UserRatingCount AS FLOAT) AS MatchPercent
FROM
tbl_User
CROSS APPLY
(
SELECT
COUNT(*) UserRatingCount
FROM
tbl_MemberAssociation
WHERE
UserId = tbl_User.UserId
) ratings
CROSS APPLY
(
SELECT
u1.UserId,
u1.UserName,
COUNT(*) AS SameRatings
FROM
tbl_MemberAssociation ma
INNER JOIN
tbl_MemberAssociation ma1
ON
ma.ImageId = ma1.ImageId
AND ma.Rating = ma1.Rating
AND ma.UserId <> ma1.UserId
INNER JOIN
tbl_User u1
ON
ma1.userId = u1.UserId
WHERE
ma.UserId = tbl_User.UserId
GROUP BY
u1.UserId,
u1.UserName
) matches
WHERE
tbl_User.UserId = @UserId
ORDER BY
MatchPercent DESC
@UserId could be passed as an input to the stored procedure.
The 1st CROSS APPLY, "ratings", gets a count of the total number of ratings for the logged-in user.
The 2nd CROSS APPLY, "matches", gets, for each other user in the database, a count of the ratings that match the logged-in user's.
The result set uses the counts calculated by the two CROSS APPLY queries to compute the match percentage between the logged in user and the other users who have rated the same images as the logged in user.
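To check the arithmetic against the A/B/C example from the question, here is a sketch run on SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so the two counts are expressed as a self-join plus a scalar subquery; the idea is the same but this is not the query above verbatim. The column names (UserId, UserName, ImageId, Rating) are the same guesses the answer makes, and the sample rows and helper names make_db and match_percent are made up for illustration.

```python
import sqlite3

# Hypothetical sample data reproducing the A/B/C example from the question.
SCHEMA = """
CREATE TABLE tbl_User (UserId INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE tbl_MemberAssociation (UserId INTEGER, ImageId INTEGER, Rating TEXT);
INSERT INTO tbl_User VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO tbl_MemberAssociation VALUES
    (1, 100, 'Beautiful'), (2, 100, 'Beautiful'), (3, 100, 'Not Good'),
    (1, 200, 'Salute'),    (2, 200, 'Salute'),    (3, 200, 'Salute');
"""

QUERY = """
SELECT u1.UserName,
       -- matching ratings divided by the logged-in user's total ratings
       COUNT(*) * 1.0 / (SELECT COUNT(*)
                         FROM tbl_MemberAssociation
                         WHERE UserId = :uid) AS MatchPercent
FROM tbl_MemberAssociation ma
JOIN tbl_MemberAssociation ma1
  ON ma.ImageId = ma1.ImageId
 AND ma.Rating = ma1.Rating
 AND ma.UserId <> ma1.UserId
JOIN tbl_User u1 ON ma1.UserId = u1.UserId
WHERE ma.UserId = :uid
GROUP BY u1.UserId, u1.UserName
ORDER BY MatchPercent DESC, u1.UserName
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def match_percent(conn, user_id):
    """Return (UserName, fraction of matching ratings) for other users."""
    return conn.execute(QUERY, {"uid": user_id}).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(match_percent(conn, 1))  # matches for user A
```

For user A this yields B at 1.0 and C at 0.5, matching the 100% and 50% in the question. The value is a fraction, so multiply by 100 for a percentage. The denominator is every rating the logged-in user has made, not only the images both users rated, and users with no matching rating at all do not appear in the result.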

How to match/compare values in two resultsets in SQL Server 2008?

I'm working on a employee booking application. I've got two different entities Projects and Users that are both assigned a variable number of Skills.
I've got a Skills table with the various skills (columns: id, name)
I register the user skills in a table called UserSkills (with two foreign key columns: fk_user and fk_skill)
I register the project skills in another table called ProjectSkills (with two foreign key columns: fk_project and fk_skill).
A project can require maybe 6 different skills, and users set up their skills as well when registering.
The tricky part is when I have to find users for my projects based on their skills. I'm only interested in users that have ALL the skills required by the project. Users are of course allowed to have more skills than required.
The following code will not work, (and even if it did, would not be very performance friendly), but it illustrates my idea:
SELECT * FROM Users u WHERE
( SELECT us.fk_skill FROM UserSkills us WHERE us.fk_user = u.id )
>=
( SELECT ps.fk_skill FROM ProjectSkills ps WHERE ps.fk_project = [some_id] )
I'm thinking about making my own function that takes two TABLE variables and then working out the comparison in that (kind of a modified IN function), but I'd rather find a solution that's more performance friendly.
I'm developing on SQL Server 2008.
I really appreciate any ideas or suggestions on this. Thanks!
SELECT *
FROM Users u
WHERE NOT EXISTS
    (
        -- a skill the project requires...
        SELECT NULL
        FROM ProjectSkills ps
        WHERE ps.fk_project = @someid
          AND NOT EXISTS
            (
                -- ...that this user does not have
                SELECT NULL
                FROM UserSkills us
                WHERE us.fk_user = u.id
                  AND us.fk_skill = ps.fk_skill
            )
    )
-- Assumes existence of variable @ProjectId, specifying
-- which project to analyze
SELECT us.fk_user
FROM UserSkills us
INNER JOIN ProjectSkills ps
    ON ps.fk_skill = us.fk_skill
    AND ps.fk_project = @ProjectId
GROUP BY us.fk_user
HAVING COUNT(*) = (SELECT COUNT(*)
                   FROM ProjectSkills
                   WHERE fk_project = @ProjectId)
You'd want to test and debug this, as I have no test data to run it through. Ditto for indexing to optimize it.
(Now to post, and see if someone's come up with a better way--there should be something more subtle and effective than this.)
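Since neither answer came with test data, here is a small sketch that runs both approaches (the double NOT EXISTS and the count comparison) side by side on SQLite through Python's sqlite3 module. The sample rows and the helper names make_db, by_not_exists and by_count are made up for illustration; the column names follow the question, and the count version assumes there are no duplicate rows in UserSkills or ProjectSkills.

```python
import sqlite3

# Hypothetical sample data; column names follow the question.
SCHEMA = """
CREATE TABLE Users (id INTEGER PRIMARY KEY);
CREATE TABLE UserSkills (fk_user INTEGER, fk_skill INTEGER);
CREATE TABLE ProjectSkills (fk_project INTEGER, fk_skill INTEGER);
INSERT INTO Users VALUES (1), (2), (3);
INSERT INTO UserSkills VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3);
INSERT INTO ProjectSkills VALUES (1, 1), (1, 2), (2, 3);
"""

# "There is no required skill that the user lacks."
NOT_EXISTS = """
SELECT u.id
FROM Users u
WHERE NOT EXISTS (
    SELECT 1 FROM ProjectSkills ps
    WHERE ps.fk_project = :pid
      AND NOT EXISTS (
          SELECT 1 FROM UserSkills us
          WHERE us.fk_user = u.id AND us.fk_skill = ps.fk_skill
      )
)
ORDER BY u.id
"""

# "The user's matching skills are as many as the project requires."
BY_COUNT = """
SELECT us.fk_user
FROM UserSkills us
JOIN ProjectSkills ps
  ON ps.fk_skill = us.fk_skill AND ps.fk_project = :pid
GROUP BY us.fk_user
HAVING COUNT(*) = (SELECT COUNT(*) FROM ProjectSkills WHERE fk_project = :pid)
ORDER BY us.fk_user
"""


def make_db():
    """Create an in-memory database with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def by_not_exists(conn, project_id):
    return [r[0] for r in conn.execute(NOT_EXISTS, {"pid": project_id})]


def by_count(conn, project_id):
    return [r[0] for r in conn.execute(BY_COUNT, {"pid": project_id})]


if __name__ == "__main__":
    conn = make_db()
    for pid in (1, 2, 3):
        print(pid, by_not_exists(conn, pid), by_count(conn, pid))
```

Both versions agree for projects 1 and 2. They differ for a project with no required skills (project 3 here): the NOT EXISTS form returns every user, while the count form returns nobody, so pick whichever behaviour fits the application.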