SQL database design pattern for user favorites? - sql

Asked this on the database site, but it seems to be really slow moving. I'm new to SQL and databases in general; the only thing I have worked on with an SQL database used one-to-many relationships. I want to know the easiest way to implement a "favorites" mechanism for users in my DB, similar to what loads of sites like YouTube offer. Users are of course unique, so one user can have many favorites, but one item can also be favorited by many users. Is this considered a many-to-many relationship? What is the typical design pattern for doing this? Many-to-many relationships look like a headache (I'm using SQLAlchemy, so my tables are interacted with like objects), but this seems to be a fairly common feature on sites, so I was wondering what the most straightforward way to go about it is. Thanks

Yes, this is a classic many-to-many relationship. Usually, the way to deal with it is to create a link table, so in say, T-SQL you'd have...
create table [user] -- "user" is a reserved word in T-SQL, so it has to be bracketed
(
user_id int identity primary key
-- other user columns
)
create table item
(
item_id int identity primary key
-- other item columns
)
create table userfavoriteitem
(
user_id int not null foreign key references [user](user_id),
item_id int not null foreign key references item(item_id),
-- other information about favoriting you want to capture
primary key (user_id, item_id) -- a user can favorite a given item only once
)
To see who favorited what, all you need to do is run a query on the userfavoriteitem table which would now be a data mine of all sorts of useful stats about what items are popular and who liked them.
select ufi.item_id
from userfavoriteitem ufi
where ufi.user_id = @user_id -- parameter holding the id of the user you are interested in
Or you can even get the most popular items on your site using the query below, though if you have a lot of users this will get slow, and the results should be saved in a separate table that a scheduled job on the backend refreshes every so often...
select top 10 ufi.item_id, count(*) as favorite_count
from userfavoriteitem ufi
group by ufi.item_id
order by favorite_count desc
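Since the asker works from Python, here is a minimal sketch of the same link-table idea using the standard-library sqlite3 module. It assumes an in-memory SQLite database instead of SQL Server, and the sample rows and the helper names favorites_of and most_popular are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE userfavoriteitem (
        user_id INTEGER NOT NULL REFERENCES user(user_id),
        item_id INTEGER NOT NULL REFERENCES item(item_id),
        PRIMARY KEY (user_id, item_id)
    );
    -- Illustrative sample data only.
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO item VALUES (10, 'video A'), (20, 'video B');
    INSERT INTO userfavoriteitem VALUES (1, 10), (1, 20), (2, 10), (3, 10);
""")


def favorites_of(user_id):
    """Return the item ids a given user has favorited."""
    rows = conn.execute(
        "SELECT item_id FROM userfavoriteitem WHERE user_id = ? ORDER BY item_id",
        (user_id,),
    )
    return [r[0] for r in rows]


def most_popular(limit=10):
    """Return (item_id, favorite_count) pairs, most favorited first."""
    return conn.execute(
        "SELECT item_id, COUNT(*) AS n FROM userfavoriteitem "
        "GROUP BY item_id ORDER BY n DESC, item_id LIMIT ?",
        (limit,),
    ).fetchall()
```

With the sample rows above, favorites_of(1) gives [10, 20] and most_popular() gives [(10, 3), (20, 1)].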

I've never seen any explicitly-for-database design patterns (except a couple of trivial misuses of the phrase 'design pattern' when it became fashionable some years ago).
M:M relationships are OK: use a link table (aka association table etc etc). Your example of a User and Favourite sounds like M:M indeed.
create table LinkTable
(
Id int IDENTITY(1, 1), -- PK of this table
IdOfTable1 int, -- PK of table 1
IdOfTable2 int -- PK of table 2
)
...and create a UNIQUE index on (IdOfTable1, IdOfTable2). Or do away with the Id column and make the PK on (IdOfTable1, IdOfTable2) instead.
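To see what the composite key buys you, the following sketch (Python's sqlite3 against an in-memory SQLite database, which is an assumption; the helper name link is hypothetical) uses the second variant, with the primary key on the pair of ids, and shows that a duplicate pair is rejected by the database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LinkTable ("
    " IdOfTable1 INTEGER NOT NULL,"
    " IdOfTable2 INTEGER NOT NULL,"
    " PRIMARY KEY (IdOfTable1, IdOfTable2))"
)


def link(id1, id2):
    """Insert a link; return False if that exact pair is already linked."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO LinkTable VALUES (?, ?)", (id1, id2))
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling link(1, 2) twice succeeds the first time and returns False the second time, so the application never has to check for duplicates on its own.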

Related

How do you model a typical friend-to-friend relationship in a relational database?

What is the appropriate way of modeling a friend-to-friend relationship in a relational database? I'll detail out what I mean by a friend-to-friend relationship below.
friend-to-friend relationship
Assuming we have a table with persons. These persons can be friends with each other. Being a friend means that a person and another person are connected through a relationship that only includes those two persons.
My best, not satisfactory, idea so far
A table containing persons
A table containing the relationships
A table containing the many-to-many relationship between persons and relationships
As in the image below.
[Image: proposed data model]
This seems fine from a data-modeling perspective, but when writing a SQL query that would check whether two given person_ids are friends or not, the query becomes rather complex, which makes me think this is not the appropriate way to do it.
I would recommend a table with two person_id columns and maybe a relationship type id for different relationships.
create table individuals (
individual_id int primary key,
full_name varchar(255)
);
create table relationships (
from_individual_id int references individuals(individual_id),
type varchar(255) check (type in ('FRIEND OF')), -- should be a lookup table
to_individual_id int references individuals(individual_id),
primary key (from_individual_id, to_individual_id, type),
check (from_individual_id <> to_individual_id)
);
I can think that you are my friend, but you don't think I am your friend...
But generally "from" thinks "to" is a friend and vice versa, so you'll need to add the reverse relationship as another row.
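A rough sketch of how the two-row approach keeps the "are these two friends?" check simple. It uses Python's sqlite3 with an in-memory SQLite database (an assumption), and the helper names befriend and are_friends and the sample people are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE individuals (individual_id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE relationships (
        from_individual_id INTEGER NOT NULL REFERENCES individuals(individual_id),
        type TEXT NOT NULL CHECK (type IN ('FRIEND OF')),
        to_individual_id INTEGER NOT NULL REFERENCES individuals(individual_id),
        PRIMARY KEY (from_individual_id, to_individual_id, type),
        CHECK (from_individual_id <> to_individual_id)
    );
    -- Illustrative sample data only.
    INSERT INTO individuals VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
""")


def befriend(a, b):
    """Store the friendship in both directions inside one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO relationships VALUES (?, 'FRIEND OF', ?)",
            [(a, b), (b, a)],
        )


def are_friends(a, b):
    """True if a row exists saying that a considers b a friend."""
    row = conn.execute(
        "SELECT 1 FROM relationships WHERE from_individual_id = ? "
        "AND to_individual_id = ? AND type = 'FRIEND OF'",
        (a, b),
    ).fetchone()
    return row is not None


befriend(1, 2)
```

Because both directions are stored, are_friends(1, 2) and are_friends(2, 1) are each a single-row lookup on the primary key, while are_friends(1, 3) is False.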
I would go for [Friend - Friend - Relationship] link table.
[Friend]---[Link_table]---[Relationship]
|
[Friend]
:)
EDIT:
Where for example [Table]: columns...
[Friend]: PK_Friend, Name, LastName
[Link]: PK_Link, PK_Friend1, PK_Friend2, PK_RelationShip
[RelationShip]: PK_RelationShip, RelationShipDescription

In SQL Server I need to change data structure of relationships (FK)

Ok I wasn't entirely sure what to title this question, so here's the situation.
I'm big on data integrity... Meaning as many constraints and rules that I can use I want to use in SQL Server and not rely on the application.
So I have a website that has a business directory, and those businesses can create a post.
So I have two tables like this:
tbl_Business ( BusinessID, Title, etc. )
tbl_Business_Post ( PostID, BusinessID, PostTitle, etc. )
There's a FK relationship for the column BusinessID between the two tables. A post cannot exist in the tbl_Business_Post table without the BusinessID existing in the tbl_Business table.
So pretty standard...
I've recently added classifieds to the site. So now I have two more tables:
tbl_Classified ( ClassifiedID, SellerID, ClassifiedTitle, etc. )
tbl_Classified_Seller ( SellerID, SellerName, etc. )
What I'm wanting to do is take advantage of my tbl_Business_Post table to include classifieds in that as well. Think of its usage like a feed... So the site will show recent posts from businesses and classifieds all in one feed.
Here's where I need guidance.
I was tempted to remove the FK relationship on the tbl_Business_Posts...
I thought about creating another separate Posts table that holds the classifieds posts.
Is there a way to make a conditional FK relationship based on a column? For example, if it's a business posting, the BusinessID must exist in the Business table, or if it's a classifieds post, the SellerID must exist in the Seller table?
Or should I create a separate table to hold the classifieds posts and UNION both the tables on the query?
You might question why I have a "Posts" table and that's hard to explain... but I do need it for the way the site is organized and how the feed works.
It's just that the posts table is perfect and I wanted to combine all posts and organize them by type (Ie: 'business', 'classified', 'etc.') as there might be more later.
So it comes down to, what's the best way to organize this to sustain data integrity from SSMS?
Thank you for guidance.
======== EDIT =========
Full explanation of tbl_Business_Post
PostID PK
Post_Type int <-- 1-21 is business types, 22 for classified type
BusinessID INT <-- This is the FK currently for the tbl_Business
SiblingID INT <-- This is the ID of the related item they're posting on. So for example, if they post a story about one of their products, this is the ProductID, if it's a service, this is the ServiceID.
Post_Title <-- Depending on the post, this could be a Product title, a service title, etc.
So if I changed the structure so it's as follows:
PostID PK
Post_Type int
BusinessID INT <-- this is populated on insert if it's a business.
SellerID INT <-- This is populated on insert if it's a classified seller
SiblingID INT <-- This is either the ClassifiedID or ProductID, ServiceID, etc., depending on post type.
So leaning toward Peter's 1st solution/example... interested in the proper way to create check constraints or triggers on this so that if the type is 1-21, it makes sure BusinessID exists in the Business table, or if it's type 22, make sure the SellerID exists in the seller table.
Even going further with this:
If Post_Type = 22, I should make sure that not only is the Seller in the seller table, but the SiblingID is also the ClassifiedID in the Classified table.
1) There's no way to do this kind of conditional FK you're thinking of. What you need here is basically a FK from tbl_Business_Post which points logically to one of two tables, depending on the value in another column of tbl_Business_Post. This situation is what people encounter quite often. But in a relational DB this is not a very native idea.
So OK, this cannot be enforced with a FK. Instead, you can probably enforce this with a trigger or check constraint on tbl_Business_Post.
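As an illustration of the trigger route, here is a small sketch using Python's sqlite3 and SQLite trigger syntax (an assumption: SQL Server triggers are written differently, so treat this as the shape of the idea rather than T-SQL). The trigger name trg_post_parent and the helper try_insert are hypothetical, and only inserts are covered; updates and deletes on the parent tables would need their own triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Business (BusinessID INTEGER PRIMARY KEY);
    CREATE TABLE tbl_Classified_Seller (SellerID INTEGER PRIMARY KEY);
    CREATE TABLE tbl_Business_Post (
        PostID INTEGER PRIMARY KEY,
        Post_Type INTEGER NOT NULL,
        BusinessID INTEGER,
        SellerID INTEGER
    );
    -- Reject a post whose parent row (chosen by Post_Type) does not exist.
    CREATE TRIGGER trg_post_parent BEFORE INSERT ON tbl_Business_Post
    BEGIN
        SELECT RAISE(ABORT, 'business post needs an existing BusinessID')
        WHERE NEW.Post_Type BETWEEN 1 AND 21
          AND NOT EXISTS (SELECT 1 FROM tbl_Business
                          WHERE BusinessID = NEW.BusinessID);
        SELECT RAISE(ABORT, 'classified post needs an existing SellerID')
        WHERE NEW.Post_Type = 22
          AND NOT EXISTS (SELECT 1 FROM tbl_Classified_Seller
                          WHERE SellerID = NEW.SellerID);
    END;
    -- Illustrative sample data only.
    INSERT INTO tbl_Business VALUES (1);
    INSERT INTO tbl_Classified_Seller VALUES (7);
""")


def try_insert(post_type, business_id=None, seller_id=None):
    """Attempt to insert a post; return False if the trigger rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO tbl_Business_Post (Post_Type, BusinessID, SellerID) "
                "VALUES (?, ?, ?)",
                (post_type, business_id, seller_id),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

A business post (types 1-21) pointing at business 1 is accepted, one pointing at a missing business is rejected, and the same holds for type 22 and sellers.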
2) Alternatively, you can do the below.
Create some table tbl_Basic_Post, put there all columns which pertain to the post itself (e.g. PostTitle) and not to the parent entity which this post record belongs/points to (Business or Classified). Then create two other tables which point via a FK to the tbl_Basic_Post table like e.g.
tbl_Business_Post.Basic_Post_ID (FK)
tbl_Classified_Post.Basic_Post_ID (FK)
Put in these two tables the columns which are Business_Post/Classified_Post-specific
(you see, this is basically inheritance in relational DB terms).
Also, make each of these two tables have FKs to their respective parent tables
tbl_Business and tbl_Classified too. Now these FKs become unconditional (in your sense).
To get business posts you join tbl_Basic_Post and tbl_Business_Post.
To get classified posts you join tbl_Basic_Post and tbl_Classified_Post.
Both approaches have their pros and cons.
Approach 1) is simple, does not lead to the creation of too many tables; but it's not trivial to enforce the data integrity.
Approach 2) does not require anything special to enforce data integrity but leads to the creation of more tables.
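Approach 2) might look roughly like the sketch below, written with Python's sqlite3 against an in-memory SQLite database (an assumption). The column lists are trimmed to the bare minimum, the sample rows are invented, and the feed helper is a hypothetical name for the combined query the asker wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tbl_Business (BusinessID INTEGER PRIMARY KEY);
    CREATE TABLE tbl_Classified (ClassifiedID INTEGER PRIMARY KEY);
    -- Columns common to every kind of post live here.
    CREATE TABLE tbl_Basic_Post (
        Basic_Post_ID INTEGER PRIMARY KEY,
        PostTitle TEXT NOT NULL
    );
    -- One subtype table per kind of post, each with ordinary foreign keys.
    CREATE TABLE tbl_Business_Post (
        Basic_Post_ID INTEGER PRIMARY KEY REFERENCES tbl_Basic_Post(Basic_Post_ID),
        BusinessID INTEGER NOT NULL REFERENCES tbl_Business(BusinessID)
    );
    CREATE TABLE tbl_Classified_Post (
        Basic_Post_ID INTEGER PRIMARY KEY REFERENCES tbl_Basic_Post(Basic_Post_ID),
        ClassifiedID INTEGER NOT NULL REFERENCES tbl_Classified(ClassifiedID)
    );
    -- Illustrative sample data only.
    INSERT INTO tbl_Business VALUES (1);
    INSERT INTO tbl_Classified VALUES (50);
    INSERT INTO tbl_Basic_Post VALUES (100, 'New product'), (101, 'Bike for sale');
    INSERT INTO tbl_Business_Post VALUES (100, 1);
    INSERT INTO tbl_Classified_Post VALUES (101, 50);
""")


def feed():
    """All posts in one feed, each tagged with the kind of post it is."""
    return conn.execute("""
        SELECT p.Basic_Post_ID, p.PostTitle,
               CASE WHEN b.Basic_Post_ID IS NOT NULL THEN 'business'
                    WHEN c.Basic_Post_ID IS NOT NULL THEN 'classified'
               END AS kind
        FROM tbl_Basic_Post p
        LEFT JOIN tbl_Business_Post b ON b.Basic_Post_ID = p.Basic_Post_ID
        LEFT JOIN tbl_Classified_Post c ON c.Basic_Post_ID = p.Basic_Post_ID
        ORDER BY p.Basic_Post_ID
    """).fetchall()
```

The feed query needs no UNION, and every reference is checked by a plain foreign key. What this sketch does not enforce is that a basic post has exactly one subtype row.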

Create foreign key with non unique column in SQL Server

I am dealing with a table that contains both cars and owners (table CO). I am creating another table to contain attributes for an owner (table OwnerAttributes), that a user can assign to through a GUI. My problem lies in the fact that owners are not unique and since I am using SQL Server I cannot create a foreign key on them. There is an id in the table, but it identifies the car and owner as a whole.
The idea I had to get around this problem is to create a new table (table Owners) that contains distinct owners, and then adding a trigger to table CO that would update the Owners with any changes. I can then use table Owners for my OwnerAttributes table and solve my problem.
The question I want answered is if there is a better way to do this?
I am using a preexisting database, that is heavily used by an old application. The application is hooked up to use the table CO for owners and cars. There also exists several other tables that use the CO table. I wish I could split the table into Owners and Cars, but the company doesn't want me to spend all my time doing it as there are several more features I need to add to the application.
Your thoughts on the Owners table are on the right track! Your problem is because your schema is not normalized. It's the fact you're storing two things (cars, and owners) in one table (your table CO).
You are correct that you should make an Owner table, but you should then remove the Owner information from the CO table entirely, and replace it with a foreign key to the Owners table.
So you want something like this:
CREATE TABLE Owner (
ownerID int identity(1,1) not null primary key,
FirstName varchar(255),
LastName varchar(255)
/* other fields here */
)
GO
CREATE TABLE Car (
carID int identity(1,1) not null primary key,
ownerID int not null references Owner(ownerID)
/* other fields go here */
)
GO
/* a convenience, read only view to replace your old CAR OWNER table */
CREATE VIEW Car_Owner AS
SELECT c.*, o.FirstName, o.LastName FROM Car c INNER JOIN Owner o ON c.ownerID = o.ownerID
Now, you have everything properly normalized in SQL. A view has given you back the car_owner as one thing in a pseudo-table.
But the real answer is, normalize your schema. Let SQL do what it does best (relate things to other things). Combining the two things on one table will just lead to more problems like you're encountering downstream.
Hopefully this answer seems helpful and not condescending, which is what I was going for! I have learned the hard way that this approach (normalize everything, let the database do some extra work to retrieve/display/insert it) is the only one that works out in the end.
You should create an Owner table, a Car table, and an OwnerCar table (if a person can have several cars). The Owner table contains the fields that describe the owner (owner properties).

What is the preferred way of saving dynamic lists in database?

In our application a user can create different lists (like SharePoint). For example, a user can create a list of cars (name, model, brand) and a list of students (name, dob, address, nationality), etc.
Our application should be able to query on different columns of the list so we can't just serialize each row and save it in one row.
Should I create a new table at runtime for each newly created list? If this was the best solution then probably Microsoft SharePoint would have done it as well I suppose?
Should I use the following schema
Lists (Id, Name)
ListColumns (Id, ListId, Name)
ListRows (Id, ListId)
ListData(RowId, ColumnId, Value)
Though a single row will create as many rows in list data table as there are columns in the list, this just doesn't feel right.
Have you dealt with this situation? How did you handle it in database?
What you did is called EAV (Entity-Attribute-Value model).
For a list with 3 columns and 1000 entries:
1 record in Lists
3 records in ListColumns
and 3000 Entries in ListData
This is fine. I'm not a fan of creating tables on-the-fly because it could mess up your database and you would have to "generate" your SQL queries dynamically. I would get a strange feeling when users could CREATE/DROP/ALTER Tables in my database!
Another nice feature of the EAV model is that you could merge two lists easily without dropping and altering a table.
Edit:
I think you need another table called ListRows that tells you which ListData records belong together in a row!
Well, I've experienced something like this before. I don't want to share the actual table schema, so let's do some thought exercises using some of the suggested table structures:
Let's have a lists table containing a list of all my lists
Let's also have a columns table containing the metadata (column names)
Now we need a values table which contains the column values
We also need a rows table which contains a list of all the rows, otherwise it gets very difficult to work out how many rows there actually are
To keep things simple, let's just make everything a string (VARCHAR) and have a go at coming up with some queries:
Counting all the rows in a table
SELECT COUNT(*) FROM [rows]
JOIN [lists]
ON [rows].list_id = [Lists].id
WHERE [Lists].name = 'Cars'
Hmm, not too bad, compared to:
SELECT COUNT(*) FROM [Cars]
Inserting a row into a table
BEGIN TRANSACTION
DECLARE @row_id INT
DECLARE @list_id INT
SELECT @list_id = id FROM [lists] WHERE name = 'Cars'
INSERT INTO [rows] (list_id) VALUES (@list_id)
SELECT @row_id = @@IDENTITY
DECLARE @column_id INT
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Make'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Rover')
-- === Need one of these for each column ===
SELECT @column_id = id FROM [columns]
WHERE name = 'Model'
AND list_id = @list_id
INSERT INTO [values] (column_id, row_id, value)
VALUES (@column_id, @row_id, 'Metro')
COMMIT TRANSACTION
Um, starting to get a little bit hairy compared to:
INSERT INTO [Cars] ([Make], [Model]) VALUES ('Rover', 'Metro')
Simple queries
I'm now getting bored of constructing tediously complex SQL statements, so maybe you can have a go at coming up with equivalent queries for the following statements:
SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
WHERE [Owners].Age > 50
SELECT [Cars].[Make], [Cars].[Model], [Owners].[Name] FROM [Cars]
JOIN [Owners] ON [Owners].id = [Cars].owner_id
JOIN [Addresses] ON [Addresses].id = [Owners].address_id
WHERE [Addresses].City = 'London'
I hope you are beginning to get the idea...
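To make the point concrete, here is one possible EAV equivalent of the first exercise (SELECT [Model] FROM [Cars] WHERE [Make] = 'Rover'), sketched with Python's sqlite3 and an in-memory SQLite database. The table names list_columns, list_rows and list_values are renamed stand-ins for the columns, rows and values tables described above (chosen to avoid keyword clashes), and the sample data and the helper select_where are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE list_columns (id INTEGER PRIMARY KEY, list_id INTEGER, name TEXT);
    CREATE TABLE list_rows (id INTEGER PRIMARY KEY, list_id INTEGER);
    CREATE TABLE list_values (column_id INTEGER, row_id INTEGER, value TEXT);
    -- Illustrative sample data: one 'Cars' list with three rows.
    INSERT INTO lists VALUES (1, 'Cars');
    INSERT INTO list_columns VALUES (1, 1, 'Make'), (2, 1, 'Model');
    INSERT INTO list_rows VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO list_values VALUES
        (1, 1, 'Rover'), (2, 1, 'Metro'),
        (1, 2, 'Rover'), (2, 2, 'Mini'),
        (1, 3, 'Ford'),  (2, 3, 'Fiesta');
""")

# Each referenced column costs two extra joins: one to find the column's id
# and one to fetch the value stored for that column in the current row.
EAV_SELECT = """
    SELECT want_v.value
    FROM lists l
    JOIN list_rows r ON r.list_id = l.id
    JOIN list_columns want_c ON want_c.list_id = l.id AND want_c.name = ?
    JOIN list_values want_v ON want_v.row_id = r.id AND want_v.column_id = want_c.id
    JOIN list_columns filt_c ON filt_c.list_id = l.id AND filt_c.name = ?
    JOIN list_values filt_v ON filt_v.row_id = r.id AND filt_v.column_id = filt_c.id
    WHERE l.name = ? AND filt_v.value = ?
    ORDER BY r.id
"""


def select_where(list_name, wanted, filter_col, filter_value):
    """EAV version of: SELECT wanted FROM list_name WHERE filter_col = filter_value."""
    rows = conn.execute(EAV_SELECT, (wanted, filter_col, list_name, filter_value))
    return [r[0] for r in rows]
```

select_where("Cars", "Model", "Make", "Rover") returns ['Metro', 'Mini']. A one-line query on a real Cars table has turned into a five-join statement, and the other two exercises only get worse.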
In short - I've experienced this before and I can assure you that creating a database inside a database in this way is definitely a Bad Thing.
If you need to do anything but the most basic querying on these lists (and literally I mean "Can I have all the items in this list please?"), you should try and find an alternative.
As long as each user pretty much has their own database I'll definitely recommend the CREATE TABLE approach. Even if they don't I'd still recommend that you at least consider it.
Perhaps a potential solution would be for the creation of a list to issue CREATE TABLE statements for those entities/lists?
It sounds like the db structure or schema can change at runtime, or at the user's command, so perhaps something like this might help?
User wants to create a new list of an entity never seen before. Call it Computer.
User defines the attributes (screensize, CpuSpeed, AmountRAM, NumberOfCores)
System allows the user to create it in the UI
System generally lets them all be strings, unless it can tell that all supplied values are indeed dates or numbers
Build the CREATE scripts and execute them against the DB
Insert the data that the user defined into that new table
Properly coded, we're working with the requirements given: let users create new entities. There was no mention of scale here. Of course, this requires all input to be sanitized, queries parameterized, actions logged, etc.
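A minimal sketch of what "sanitized and parameterized" could mean here, using Python's sqlite3 (SQLite and every helper name below are assumptions, not part of the answer). Table and column names cannot be passed as query parameters, so the sketch whitelists them with a strict pattern before building the CREATE statement, and all values go through ordinary placeholders.

```python
import re
import sqlite3

# Only plain identifiers are accepted: letters, digits and underscores.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,62}$")


def quote_ident(name):
    """Validate a user-supplied table or column name and quote it."""
    if not IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return '"' + name + '"'


def create_list_table(conn, entity, attributes):
    """Create one table per user-defined list; every attribute is stored as text."""
    if not attributes:
        raise ValueError("a list needs at least one attribute")
    cols = ", ".join(f"{quote_ident(a)} TEXT" for a in attributes)
    conn.execute(
        f"CREATE TABLE {quote_ident(entity)} (id INTEGER PRIMARY KEY, {cols})"
    )


def insert_row(conn, entity, values):
    """Insert one row; names are validated, values are bound as parameters."""
    names = ", ".join(quote_ident(k) for k in values)
    marks = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT INTO {quote_ident(entity)} ({names}) VALUES ({marks})",
        tuple(values.values()),
    )
```

For the Computer example, create_list_table(conn, "Computer", ["ScreenSize", "CpuSpeed"]) followed by insert_row(conn, "Computer", {"ScreenSize": "15", "CpuSpeed": "3.2"}) yields an ordinary table that can be queried with plain SQL.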
The negative comment below doesn't actually give any good reasons, but creates a bit of FUD. I'd be interested in addressing any concerns with this potential solution. We haven't heard about scale, security, performance, or usage (internal LAN vs. internet).
You should absolutely not dynamically create tables when your users create lists. That isn't how databases are meant to work.
Your schema is correct, and the pluralization is, in my opinion, also correct, though I would remove the camel case and call them lists, list_columns, list_rows and list_data.
I would further improve upon your schema by skipping rows and columns tables, they serve no purpose. Simply have a row/column number attached to each cell, and keep things sparse: Don't bother holding empty cells in the database. You retain the ability to query/sort based on row/column, your queries will be (potentially very much) faster because the number of list_cells will be reduced, and you won't have to do any crazy joining to link your data back to its table.
Here is the complete schema:
create table lists (
id int primary key,
name varchar(25) not null
);
create table list_cells (
id int primary key,
list_id int not null references lists(id)
on delete cascade on update cascade,
row int not null,
col int not null,
data varchar(25) not null
);
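A sketch of how the sparse cell table could be read back into rows, using Python's sqlite3 with an in-memory SQLite database (an assumption). The columns are named row_no and col_no here to stay clear of keyword clashes, the UNIQUE constraint is an addition to the schema above, and the sample data and the read_grid helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lists (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE list_cells (
        id INTEGER PRIMARY KEY,
        list_id INTEGER NOT NULL REFERENCES lists(id)
            ON DELETE CASCADE ON UPDATE CASCADE,
        row_no INTEGER NOT NULL,
        col_no INTEGER NOT NULL,
        data TEXT NOT NULL,
        UNIQUE (list_id, row_no, col_no)  -- at most one value per cell
    );
    -- Illustrative sample data: the cell (row 1, column 1) is left empty.
    INSERT INTO lists VALUES (1, 'Cars');
    INSERT INTO list_cells (list_id, row_no, col_no, data) VALUES
        (1, 0, 0, 'Rover'), (1, 0, 1, 'Metro'),
        (1, 1, 0, 'Ford');
""")


def read_grid(list_id, width):
    """Rebuild the rows of a list; cells that were never stored come back as None."""
    grid = {}
    cells = conn.execute(
        "SELECT row_no, col_no, data FROM list_cells "
        "WHERE list_id = ? ORDER BY row_no, col_no",
        (list_id,),
    )
    for row_no, col_no, data in cells:
        grid.setdefault(row_no, [None] * width)[col_no] = data
    return [grid[r] for r in sorted(grid)]
```

With the sample cells, read_grid(1, 2) returns [['Rover', 'Metro'], ['Ford', None]]. Note that a row with no stored cells at all does not appear, which is the price of dropping the rows table.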
It sounds like you might have SharePoint already deployed in your environment.
Consider integrating your application with SharePoint, and have it be your datastore. No need to recreate all the things you like about SharePoint when you could leverage it.
It'd take a bit of configuring, but you could call SharePoint web services to CRUD your list data for you.
Inserting list data into SharePoint via web services
Reading SharePoint lists via web services
SharePoint 2010 can also expose lists via OData, which would be simple to consume from any application.

Records linked to any table?

Hi, I'm struggling a bit with this and could use some ideas...
Say my database has the following tables:
Customers
Suppliers
SalesInvoices
PurchaseInvoices
Currencies
etc etc
I would like to be able to add a "Notes" record to ANY type of record
The Notes table would look like this
NoteID Int (PK)
NoteFK Int
NoteFKType Varchar(3)
NoteText varchar(100)
NoteDate Datetime
Where NoteFK is the PK of a customer or supplier etc and NoteFKType says what type of record the note is against
Now I realise that I cannot add a FK which references multiple tables without NoteFK needing to be present in all tables.
So how would you design the above ?
The note FK needs to be in any of the above tables
Cheers,
Daniel
You have to accept the limitation that you cannot teach the database about this foreign key constraint. So you will have to do without the integrity checking (and cascading deletes).
Your design is fine.
It is easily extensible to extra tables, you can have multiple notes per entity, and the target tables do not even need to be aware of the notes feature.
An advantage that this design has over using a separate notes table per entity table is that you can easily run queries across all notes, for example "most recent notes", or "all notes created by a given user".
As for the argument of that table growing too big, splitting it into, say, five tables will shrink each table to about a fifth of the size, but this will not make any difference for index-based access. Databases are built to handle big tables (as long as they are properly indexed).
I think your design is ok, if you can accept the fact, that the db system will not check whether a note is referencing an existing entity in other table or not. It's the only design I can think of that doesn't require duplication and is scalable to more tables.
The way you designed it, when you add another entity type that you'd like to have notes for, you won't have to change your model. Also, you don't have to include any additional columns in your existing model, or additional tables.
To ensure data integrity, you can create set of triggers or some software solution that will clean notes table once in a while.
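The periodic clean-up could be as simple as the following sketch, written with Python's sqlite3 against an in-memory SQLite database (an assumption). The type codes 'CUS' and 'SUP', the key column names, the PARENTS mapping and the helper purge_orphan_notes are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE Suppliers (SupplierID INTEGER PRIMARY KEY);
    CREATE TABLE Notes (
        NoteID INTEGER PRIMARY KEY,
        NoteFK INTEGER NOT NULL,
        NoteFKType TEXT NOT NULL,
        NoteText TEXT
    );
    -- Illustrative sample data: the second note points at a missing customer.
    INSERT INTO Customers VALUES (1);
    INSERT INTO Suppliers VALUES (1);
    INSERT INTO Notes (NoteFK, NoteFKType, NoteText) VALUES
        (1, 'CUS', 'ok'), (2, 'CUS', 'orphan'), (1, 'SUP', 'ok');
""")

# Fixed, developer-maintained mapping of type code -> (table, key column).
# These names never come from user input, so formatting them into SQL is safe.
PARENTS = {
    "CUS": ("Customers", "CustomerID"),
    "SUP": ("Suppliers", "SupplierID"),
}


def purge_orphan_notes():
    """Delete notes whose parent row no longer exists; return how many were removed."""
    removed = 0
    with conn:
        for code, (table, pk) in PARENTS.items():
            cur = conn.execute(
                f"DELETE FROM Notes WHERE NoteFKType = ? "
                f"AND NoteFK NOT IN (SELECT {pk} FROM {table})",
                (code,),
            )
            removed += cur.rowcount
    return removed
```

Run from a scheduled job, this removes the one orphaned note in the sample data and leaves the other two alone. Adding a new entity type only means adding an entry to the mapping.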
I would think twice before doing what you suggest. It might seem simple and elegant in the short term, but if you are truly interested in data integrity and performance, then having separate notes tables for each parent table is the way to go. Over the years, I've approached this problem using the solutions found in the other answers (triggers, GUIDs, etc.). I've come to the conclusion that the added complexity and loss of performance aren't worth it. By having separate note tables for each parent table, with appropriate foreign key constraints, lookups and joins will be simple and fast. When combining the related items into one table, join syntax becomes ugly and your notes table will grow to be huge and slow.
I agree with Michael McLosky, to a degree.
The question in my mind is: What is the technical cost of having multiple notes tables?
In my mind, it is preferable to consolidate the same functionality into a single table. It also makes reporting and other further development simpler. Not to mention keeping the list of tables smaller and easier to manage.
It's a balancing act; you need to try to predetermine both the benefits and the costs of doing something like this. My personal preference is database referential integrity. Application management of integrity should, in my opinion, be limited to business logic. The database should ensure the data is always consistent and valid...
To actually answer your question...
The option I would use is a check constraint using a User Defined Function to check the values. This works in Microsoft SQL Server...
CREATE TABLE Test_Table_1 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_2 (id INT IDENTITY(1,1), val INT)
GO
CREATE TABLE Test_Table_3 (fk_id INT, table_name VARCHAR(64))
GO
CREATE FUNCTION id_exists (@id INT, @table_name VARCHAR(64))
RETURNS INT
AS
BEGIN
-- BEGIN/END blocks keep each ELSE attached to the intended IF
IF (@table_name = 'Test_Table_1')
BEGIN
IF EXISTS(SELECT * FROM Test_Table_1 WHERE id = @id)
RETURN 1
END
ELSE IF (@table_name = 'Test_Table_2')
BEGIN
IF EXISTS(SELECT * FROM Test_Table_2 WHERE id = @id)
RETURN 1
END
RETURN 0
END
GO
ALTER TABLE Test_Table_3 WITH CHECK ADD CONSTRAINT
CK_Test_Table_3 CHECK ((dbo.id_exists(fk_id,table_name)=(1)))
GO
ALTER TABLE [dbo].[Test_Table_3] CHECK CONSTRAINT [CK_Test_Table_3]
GO
INSERT INTO Test_Table_1 SELECT 1
GO
INSERT INTO Test_Table_1 SELECT 2
GO
INSERT INTO Test_Table_1 SELECT 3
GO
INSERT INTO Test_Table_2 SELECT 1
GO
INSERT INTO Test_Table_2 SELECT 2
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_1'
GO
INSERT INTO Test_Table_3 SELECT 3, 'Test_Table_2'
GO
In that example, the final insert statement would fail.
You can get FK referential integrity, at the cost of having one column in the notes table for each other table.
create table Notes (
id int PRIMARY KEY,
note varchar (whatever),
customer_id int NULL REFERENCES Customer (id),
product_id int NULL REFERENCES Product (id)
)
Then you'll need a constraint to make sure that you have only one of the columns set.
Or maybe not, maybe you might want a note to be able to be associated with both a customer and a product. Up to you.
This design would require adding a new column to Notes if you want to add another referencing table.
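The "only one of the columns set" rule can be written as a CHECK constraint. The sketch below uses Python's sqlite3 with an in-memory SQLite database (an assumption; it relies on SQLite evaluating comparisons to 0 or 1, so other databases would need a CASE expression instead). The helper try_note and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY);
    CREATE TABLE Product (id INTEGER PRIMARY KEY);
    CREATE TABLE Notes (
        id INTEGER PRIMARY KEY,
        note TEXT NOT NULL,
        customer_id INTEGER REFERENCES Customer (id),
        product_id INTEGER REFERENCES Product (id),
        -- Exactly one of the two parent columns must be filled in.
        CHECK ((customer_id IS NOT NULL) + (product_id IS NOT NULL) = 1)
    );
    -- Illustrative sample data only.
    INSERT INTO Customer VALUES (1);
    INSERT INTO Product VALUES (1);
""")


def try_note(note, customer_id=None, product_id=None):
    """Attempt to insert a note; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Notes (note, customer_id, product_id) VALUES (?, ?, ?)",
                (note, customer_id, product_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A note attached to customer 1 or to product 1 is accepted; a note attached to both, to neither, or to a customer that does not exist is rejected by the database.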
You could add a GUID field to the Customers, Suppliers, etc. tables. Then in the Notes table, change the foreign key to reference that GUID.
This does not help for data integrity. But it makes M-to-N relationships easily possible to any number of tables and it saves you from having to define a NoteFKType column in the Notes table.
You can easily implement "multi"-foreign key with triggers. Triggers will give you very flexible mechanism and you can do any integrity checks you wish.
Why don't you do it the other way around and have a foreign key in the other tables (Customer, Supplier, etc.) to NoteID? This way you have a one-to-one mapping.