Model 'friends' table in a relational db for social network - sql

I'm rebuilding a project which I did a while ago, and instead of using MongoDB I'll be using PostgreSQL.
The project focuses on a social network, pretty much like Facebook, where users can post, send friend requests, etc.
Now, suppose I want to model a "friends" table. Here is what I thought I could do:
There is going to be a buddy_requests table.
The idea is the sender is the one who sends the buddy request, and the receiver is the one who should approve it.
Now, when the receiver accepts the request, there is another table called "buddies" which is being updated:
Of course, if user A is friends with user B, then user B is friends with user A
Now, there are a few ways of updating this buddies table which I thought about, and I would like to know which is the right way to do it (if it's right at all...)
1. Whenever a new request is approved, insert 2 new rows: user_id for user A and buddy_id for user B, and vice versa, user_id for user B and buddy_id for user A.
   The why: whenever I want to query for all the friends of a specific user, I can search in only 1 column (user_id) and it is guaranteed that I retrieve all of their friends.
2. Whenever a new request is approved, insert 1 new row: user_id for user A and buddy_id for user B.
   The why: less space usage for the table, but querying for all the friends requires going over 2 columns and taking the distinct values (because we added only the sender to user_id).
3. Instead of having only 1 value in buddy_id, keep a JSON of all the friends' ids of user_id.
   The why: querying only 1 column, but saving a lot of data in the other one.
4. Use MongoDB to store this kind of data and not use a 'buddies' table.
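To make the first option concrete, here is a minimal sketch of approving a request and inserting both rows in one transaction. It uses Python's built-in sqlite3 module so it runs anywhere; in PostgreSQL the SQL would be nearly the same. The column names sender_id, receiver_id and status, and the helper names accept_request and buddies_of, are assumptions made for illustration, not something taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: the question only names the two tables.
    CREATE TABLE buddy_requests (
        sender_id   INTEGER NOT NULL,
        receiver_id INTEGER NOT NULL,
        status      TEXT NOT NULL DEFAULT 'pending',
        PRIMARY KEY (sender_id, receiver_id)
    );
    CREATE TABLE buddies (
        user_id  INTEGER NOT NULL,
        buddy_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, buddy_id),
        CHECK (user_id <> buddy_id)
    );
""")


def accept_request(conn, sender_id, receiver_id):
    """Approve a pending request and insert both directions atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE buddy_requests SET status = 'accepted' "
            "WHERE sender_id = ? AND receiver_id = ? AND status = 'pending'",
            (sender_id, receiver_id),
        )
        if cur.rowcount != 1:
            raise ValueError("no pending request found")
        conn.executemany(
            "INSERT INTO buddies (user_id, buddy_id) VALUES (?, ?)",
            [(sender_id, receiver_id), (receiver_id, sender_id)],
        )


def buddies_of(conn, user_id):
    """All friends of a user, found by searching a single column."""
    rows = conn.execute(
        "SELECT buddy_id FROM buddies WHERE user_id = ? ORDER BY buddy_id",
        (user_id,),
    )
    return [row[0] for row in rows]


conn.execute("INSERT INTO buddy_requests (sender_id, receiver_id) VALUES (1, 2)")
conn.commit()
accept_request(conn, 1, 2)
print(buddies_of(conn, 1), buddies_of(conn, 2))
```

Doing the update and both inserts in one transaction means the two directions cannot drift apart if something fails halfway.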
I would like your honest opinions on this take, and if you have any other recommendations which offer better solution I would gladly read them.
Thank you!

Related

How to implement One-To-Many relationship with the same table?

I have a table called user. Now I am building an app where a user can have many users as friends. So I think I should create a new table called friends_list and implement one-to-many relationship where (user) is one and (friends_list) is many. Then to get the list of friends the user has, I could do select * from friends where userId = XXXx.
Is this the best approach? Or is there another better way to create a relationship with the same table?
Your approach is the best approach. You want to represent an n-m relationship (one user has many friends and vice versa).
There are some considerations. If "friendship" is symmetric (that is A friends B automatically means that B friends A), then you probably want to include both in the table when one is inserted. You may also want to prevent a user from self-friending.
If you want to retrieve user's friends as a list, you can do so using a string concatenation function. In Standard SQL, this is:
select listagg(friendid, ',') within group (order by friendid) as friendids
from friends
where userid = XXX;
Different databases have different function names for listagg().
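As a rough sketch of the considerations above, the snippet below builds the friends table with a CHECK constraint against self-friending and reads one user's friends back as a comma-separated list. It runs on SQLite through Python's sqlite3 module, where the aggregate is called group_concat rather than listagg. The helper name add_friendship is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friends (
        userid   INTEGER NOT NULL,
        friendid INTEGER NOT NULL,
        PRIMARY KEY (userid, friendid),
        CHECK (userid <> friendid)  -- prevents a user from self-friending
    )
""")


def add_friendship(conn, a, b):
    """Symmetric friendship: store both directions in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO friends (userid, friendid) VALUES (?, ?)",
            [(a, b), (b, a)],
        )


add_friendship(conn, 1, 2)
add_friendship(conn, 1, 3)

try:
    add_friendship(conn, 4, 4)
    self_friend_allowed = True
except sqlite3.IntegrityError:
    self_friend_allowed = False

# SQLite's counterpart of listagg(); the order inside the list is not guaranteed.
friend_ids = conn.execute(
    "SELECT group_concat(friendid, ',') FROM friends WHERE userid = ?", (1,)
).fetchone()[0]
print(friend_ids, self_friend_allowed)
```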
An approach, and clearly NOT the best, would be to use an extra column to store a comma-separated list of the friends' user_ids.
While the answer to the question "who are my friends?" would not be too difficult to get, the reverse question ("who has me as a friend?") would have to rely on LIKE '%XXXx%' conditions, so you shouldn't expect very fast response times.
Another drawback would be some complexity with the relationship maintenance while editing the friend list.
Therefore, the two-table schema is both the most semantically correct and the most reliable one.

Hasura SELECT permissions for table relationships

I'm building a forum. I have a really simple database setup:
Users: id, display_name, email, profile
Posts: id, title, content, user_id
The user_id is a foreign key to the Users table.
Permissions:
For inserting/updating, X-Hasura-User-Id must equal id in the Users table and user_id in the Posts table (so users can only modify their own profile and posts).
For selecting, I have it so a user can read any post, but they can only select the row of the User if id = X-Hasura-User-Id. This is so a User can only read their profile data.
However, for selecting, I obviously need the user to be able to access display_name in the Users table, to display the post's author.
Now I can obviously make it so for select, they only have access to this field, and everything works fine. I can return a GQL query that displays the posts and the author.
But doesn't this also mean that a user can just run a query to the Users table and get a list of all the display_names, essentially showing how many users I have?
Is there a way to set it up so that a user can only select their own info from the User's table, but like, if the query is 'coming from' the server, it can access the display_name? I know there are Admin roles etc but I don't think this applies here.
But doesn't this also mean that a user can just run a query to the Users table and get a list of all the display_names, essentially showing how many users I have?
Yes
Is there a way to set it up so that a user can only select their own info from the User's table, but like, if the query is 'coming from' the server, it can access the display_name?
No
It's a valid concern to worry about data leakage in terms of how many users you have. But in general I would not worry about it.
However, there are a few things you could do to prevent this problem.
What you can do is:
Limit the number of rows per request (https://hasura.io/docs/1.0/graphql/manual/deployment/production-checklist.html#limit-number-of-rows-returned)
Make sure users are not allowed to run aggregation queries (https://hasura.io/docs/1.0/graphql/manual/queries/aggregation-queries.html#aggregate-fields)
You can also create a VIEW that joins the display_name onto the posts table.
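The VIEW idea could look roughly like the sketch below. It is plain SQL run on SQLite through Python's sqlite3 module, so it only shows the shape of the view. The view name posts_with_author and the column alias author_display_name are made up for illustration, and exposing the view in Hasura with its own select permission is an assumption about your setup that you would have to configure separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id           INTEGER PRIMARY KEY,
        display_name TEXT NOT NULL,
        email        TEXT NOT NULL,
        profile      TEXT
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        content TEXT NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id)
    );

    -- Hypothetical view: posts plus the author's display_name only,
    -- so email and profile never leave the users table.
    CREATE VIEW posts_with_author AS
    SELECT p.id, p.title, p.content, p.user_id,
           u.display_name AS author_display_name
    FROM posts AS p
    JOIN users AS u ON u.id = p.user_id;

    INSERT INTO users VALUES (1, 'alice', 'alice@example.com', 'private bio');
    INSERT INTO posts VALUES (10, 'Hello', 'First post', 1);
""")

cursor = conn.execute("SELECT * FROM posts_with_author")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```

A user could still list the display names of everyone who has posted by reading the view, but not the users who have never posted, and none of the private columns.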

SQL - how to get out chained data?

I have 4 tables which were auto generated for me:
User
Challenge
Exercise
Challenge_Exercise
One User may have many Challenges, and one Challenge will have many Exercises.
What I noticed is that the Challenge table has a reference to its parent User (called user_id), but Exercise does not have a reference in its table to Challenge; their relation is stored in Challenge_Exercise as challenge_id and exercise_id.
My question is, how would I take out every Exercise that is linked to a specific user? For instance User with id = 1?
SELECT *
FROM exercise,
challenge_exercise,
challenge
WHERE challenge.user_id = 1
AND challenge_exercise.challenge_id = challenge.id
AND challenge_exercise.exercise_id = exercise.id;
What I'm doing here is a join; you could also do it explicitly with inner joins (google it if you wanna know more).
The Challenge_Exercise table is needed because you have a many-to-many relationship, which means each challenge can have multiple exercises, but also each exercise can belong to multiple challenges. It's standard to add an extra table in that case, so you don't have redundant data; this table is often called a junction table.
If you want background just google it, there is tons of material on this topic.
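For comparison, here is a sketch of the same lookup written with explicit INNER JOINs, run against a tiny SQLite database through Python's sqlite3 module. The sample rows and the name column on exercise are invented for the demo. DISTINCT is added because one exercise can sit in several challenges of the same user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY);
    CREATE TABLE challenge (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES user(id)
    );
    CREATE TABLE exercise (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE challenge_exercise (
        challenge_id INTEGER NOT NULL REFERENCES challenge(id),
        exercise_id  INTEGER NOT NULL REFERENCES exercise(id),
        PRIMARY KEY (challenge_id, exercise_id)
    );

    -- Invented sample data: user 1 owns challenges 10 and 11.
    INSERT INTO user VALUES (1), (2);
    INSERT INTO challenge VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO exercise VALUES (100, 'push-ups'), (101, 'squats'), (102, 'plank');
    INSERT INTO challenge_exercise VALUES (10, 100), (10, 101), (11, 101), (12, 102);
""")

rows = conn.execute(
    """
    SELECT DISTINCT e.id, e.name
    FROM challenge AS c
    INNER JOIN challenge_exercise AS ce ON ce.challenge_id = c.id
    INNER JOIN exercise AS e ON e.id = ce.exercise_id
    WHERE c.user_id = ?
    ORDER BY e.id
    """,
    (1,),
).fetchall()
print(rows)
```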

sql Database to save different contact details for a message sending site

I am working for a project to create a database for saving different persons contact details in SQL.
For example,
Person X saves 10 contacts, person Y saves 15 contacts, person Z saves 20 contacts and so on.
I can't create separate tables to save the contacts of X, Y, Z and so on, so I just want to know the alternative way to do that. Is there an easy method to save the different contacts, and an easy method to retrieve them?
I'm just a student, I don't know much about sql and don't have much experience in this. So I need your help to know much about this.
You need one table of contacts, with a user ID column,
and another table of users (persons), with a FK between them.
This is better than a separate table per person.
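A minimal sketch of that layout, using SQLite through Python's sqlite3 module. The column names contact_name and phone, the sample rows and the helper contacts_of are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE contacts (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES users(id),  -- owner of the contact
        contact_name TEXT NOT NULL,
        phone        TEXT NOT NULL
    );

    INSERT INTO users (id, name) VALUES (1, 'X'), (2, 'Y');
    INSERT INTO contacts (user_id, contact_name, phone) VALUES
        (1, 'Alice', '555-0100'),
        (1, 'Bob',   '555-0101'),
        (2, 'Carol', '555-0102');
""")


def contacts_of(conn, user_id):
    """All contacts saved by one user, however many there are."""
    return conn.execute(
        "SELECT contact_name, phone FROM contacts "
        "WHERE user_id = ? ORDER BY contact_name",
        (user_id,),
    ).fetchall()


print(contacts_of(conn, 1))
```

Every person's contacts live in the same table; the user_id column is what tells them apart.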

Facebook database design?

I have always wondered how Facebook designed the friend <-> user relation.
I figure the user table is something like this:
user_email PK
user_id PK
password
I figure there is a table with the user's data (sex, age, etc.), connected via the user email, I would assume.
How does it connect all the friends to this user?
Something like this?
user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N
Probably not. Because the number of users is unknown and will expand.
Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.
Somewhat useful example:
Table Name: User
Columns:
UserID PK
EmailAddress
Password
Gender
DOB
Location
TableName: Friends
Columns:
UserID PK FK
FriendID PK FK
(This table features a composite primary key made up of the two foreign
keys, both pointing back to the user table. One ID will point to the
logged in user, the other ID will point to the individual friend
of that user)
Example Usage:
Table User
--------------
UserID  EmailAddress  Password  Gender  DOB       Location
------------------------------------------------------------
1       bob@bob.com   bobbie    M       1/1/2009  New York City
2       jon@jon.com   jonathan  M       2/2/2008  Los Angeles
3       joe@joe.com   joseph    M       1/2/2007  Pittsburgh
Table Friends
---------------
UserID  FriendID
----------------
1       2
1       3
2       3
This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two ways, so you would not need a row in the table such as (2,1) or (3,2) because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two way, you would need to also have those rows to indicate the two-way relationship.
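With one row per friendship, looking up a user's friends has to check both columns. The sketch below shows one way to do that on SQLite through Python's sqlite3 module, using the Bob/Jon/Joe rows from the example. The CHECK that forces the smaller id into UserID is an extra assumption of mine to keep (2,1) from being stored next to (1,2); the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (
        user_id   INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id),
        CHECK (user_id < friend_id)  -- assumption: smaller id always goes first
    );
    INSERT INTO friends VALUES (1, 2), (1, 3), (2, 3);
""")


def add_friendship(conn, a, b):
    """Store one row per pair, normalised so the smaller id comes first."""
    low, high = min(a, b), max(a, b)
    with conn:
        conn.execute("INSERT INTO friends VALUES (?, ?)", (low, high))


def friends_of(conn, user_id):
    """A user's friends can appear in either column, so look at both."""
    rows = conn.execute(
        """
        SELECT friend_id FROM friends WHERE user_id = ?
        UNION ALL
        SELECT user_id FROM friends WHERE friend_id = ?
        ORDER BY 1
        """,
        (user_id, user_id),
    )
    return [row[0] for row in rows]


print(friends_of(conn, 2))
```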
TL;DR:
They use a stack architecture with cached graphs for everything above the MySQL bottom of their stack.
Long Answer:
I did some research on this myself because I was curious how they handle their huge amount of data and search it in a quick way. I've seen people complaining about custom made social network scripts becoming slow when the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections - not even trying to bother about group permissions and likes and wall posts - it quickly turned out that this approach is flawed. So I've spent some time searching the web on how to do it better and came across this official Facebook article:
TAO: Facebook’s Distributed Data Store for the Social Graph
TAO: The power of the graph.
I really recommend that you watch the presentation in the first link above before you continue reading. It's probably the best explanation of how FB works behind the scenes you can find.
The video and article tell you a few things:
They're using MySQL at the very bottom of their stack
Above the SQL DB there is the TAO layer which contains at least two levels of caching and is using graphs to describe the connections.
I could not find anything on what software / DB they actually use for their cached graphs
Let's take a look at this, friend connections are top left:
Well, this is a graph. :) It doesn't tell you how to build it in SQL. There are several ways to do it, but this site has a good number of different approaches. Attention: Consider that a relational DB is what it is: it's designed to store normalised data, not a graph structure. So it won't perform as well as a specialised graph database.
Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.
I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.
Here is my disappointing test for just finding friends of friends:
DB Schema:
CREATE TABLE IF NOT EXISTS `friends` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`friend_id` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Friends of Friends Query:
(
select friend_id
from friends
where user_id = 1
) union (
select distinct ff.friend_id
from
friends f
join friends ff on ff.user_id = f.friend_id
where f.user_id = 1
)
I really recommend that you create some sample data with at least 10k user records, each of them having at least 250 friend connections, and then run this query. On my machine (i7 4770k, SSD, 16 GB RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized, I'm not a DB genius (suggestions are welcome). However, if this scales linearly you're already at 1.8 seconds for just 100k users, 18 seconds for 1 million users.
This might still sound OKish for ~100k users, but consider that you just fetched friends of friends and didn't do any more complex query like "display me only posts from friends of friends + do the permission check if I'm allowed or NOT allowed to see some of them + do a sub query to check if I liked any of them". You want to let the DB do the check on whether you liked a post already or not, or you'll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.
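If you want to repeat an experiment like this yourself, the following sketch generates random sample data and times the friends-of-friends query. It is not the original benchmark: it uses SQLite in memory through Python's sqlite3 module instead of MySQL, it adds a composite primary key (which also acts as an index on user_id), and the function names and default sizes are my own choices. Timings from it are not comparable with the numbers above.

```python
import random
import sqlite3
import time


def build_db(n_users, friends_per_user, seed=0):
    """Create an in-memory friends table filled with random directed edges."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friends ("
        " user_id INTEGER NOT NULL,"
        " friend_id INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, friend_id))"
    )
    rows = []
    for user in range(1, n_users + 1):
        # Draw one extra candidate so the user themselves can be dropped if picked.
        picks = rng.sample(range(1, n_users + 1), friends_per_user + 1)
        picks = [p for p in picks if p != user][:friends_per_user]
        rows.extend((user, p) for p in picks)
    with conn:
        conn.executemany("INSERT INTO friends VALUES (?, ?)", rows)
    return conn


# Same shape as the query in the answer: direct friends plus their friends.
FOF_QUERY = """
    SELECT friend_id FROM friends WHERE user_id = :uid
    UNION
    SELECT ff.friend_id
    FROM friends AS f
    JOIN friends AS ff ON ff.user_id = f.friend_id
    WHERE f.user_id = :uid
"""


def friends_of_friends(conn, user_id):
    return {row[0] for row in conn.execute(FOF_QUERY, {"uid": user_id})}


conn = build_db(n_users=1000, friends_per_user=25)
start = time.perf_counter()
result = friends_of_friends(conn, 1)
elapsed = time.perf_counter() - start
print(f"{len(result)} ids in {elapsed:.4f} s")
```

Raise n_users and friends_per_user towards the 10k / 250 mentioned above to see how the timing changes on your own machine.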
I think my answer answers the question how Facebook designed their friends relationship very well but I'm sorry that I can't tell you how to implement it in a way it will work fast. Implementing a social network is easy but making sure it performs well is clearly not - IMHO.
I've started experimenting with OrientDB to do the graph-queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.
How can I create a well performing social network site?
Update 2021-04-10: I'll probably never ever write the article ;) but here are a few bullet points how you could try to scale it:
Use different read and write repositories
Build specific read repositories based on faster non-relational DB systems made for that purpose, don't be afraid of denormalizing data. Write to a normalized DB but read from specialized views.
Use eventual consistency
Take a look at CQRS
For a social network, graph-based read repositories might also be a good idea.
Use Redis as a read repository in which you store whole serialized data sets
If you combine the points from the above list in a smart way you can build a very well performing system. The list is not a "todo" list, you'll still have to understand, think and adapt it! https://microservices.io/ is a nice site that covers a few of the topics I mentioned before.
What I do is store events that are generated by aggregates and use projections and handlers to write to different DBs as mentioned above. The cool thing about this is that I can rebuild my data as needed at any time.
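To give a feel for the "store events, rebuild read data at any time" idea, here is a deliberately tiny in-memory sketch. The event classes and the function project_friend_lists are hypothetical names of mine, a plain Python list stands in for the event store, and a dict stands in for a read repository such as Redis. It is not the author's actual setup.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FriendshipCreated:
    a: int
    b: int


@dataclass(frozen=True)
class FriendshipRemoved:
    a: int
    b: int


def project_friend_lists(events):
    """Replay the whole event log into a denormalised read model.

    The result maps each user id to a sorted list of friend ids.
    """
    friends = defaultdict(set)
    for event in events:
        if isinstance(event, FriendshipCreated):
            friends[event.a].add(event.b)
            friends[event.b].add(event.a)
        elif isinstance(event, FriendshipRemoved):
            friends[event.a].discard(event.b)
            friends[event.b].discard(event.a)
    return {user: sorted(ids) for user, ids in friends.items()}


# The append-only log is the source of truth; read models are derived from it.
event_log = [
    FriendshipCreated(1, 2),
    FriendshipCreated(1, 3),
    FriendshipRemoved(1, 2),
]
read_model = project_friend_lists(event_log)
print(read_model)
```

Because the read model is derived purely from the log, it can be thrown away and rebuilt, or rebuilt in a different shape, whenever needed.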
Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:
My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.
Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.
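A small sketch of such an edges table with data attached to each edge, run on SQLite through Python's sqlite3 module. The column names became_friends_on and approved, and the sample rows, are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE edges (
        user_id           INTEGER NOT NULL REFERENCES users(user_id),
        friend_id         INTEGER NOT NULL REFERENCES users(user_id),
        became_friends_on TEXT,                        -- data about the edge
        approved          INTEGER NOT NULL DEFAULT 0,  -- 0 = pending, 1 = approved
        PRIMARY KEY (user_id, friend_id)
    );

    INSERT INTO users VALUES (1, 'bob'), (2, 'jon'), (3, 'joe');
    INSERT INTO edges VALUES (1, 2, '2009-01-01', 1), (1, 3, NULL, 0);
""")

# Only approved edges count as friendships.
approved_friends = conn.execute(
    """
    SELECT u.name, e.became_friends_on
    FROM edges AS e
    JOIN users AS u ON u.user_id = e.friend_id
    WHERE e.user_id = ? AND e.approved = 1
    """,
    (1,),
).fetchall()
print(approved_friends)
```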
It's most likely a many to many relationship:
FriendList (table)
user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel
EDIT
The user table probably doesn't have user_email as a PK, possibly as a unique key though.
users (table)
user_id PK
user_email
password
Take a look at these articles describing how LinkedIn and Digg are built:
http://hurvitz.org/blog/2008/06/linkedin-architecture
http://highscalability.com/scaling-digg-and-other-web-applications
There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:
http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html
Also, there's this article that talks about non-relational databases and how they're used by some companies:
http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher-level concepts that most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.
There are a lot of links on the first two articles that should give you some more insight.
UPDATE 10/20/2014
Murat Demirbas wrote a summary on
TAO: Facebook's distributed data store for the social graph (ATC'13)
F4: Facebook's warm BLOB storage system (OSDI'14)
http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html
HTH
It's not possible to retrieve a user's friends data from an RDBMS in constant time once the data set grows past half a billion,
so Facebook implemented this using a hash database (NoSQL) and open-sourced the database, called Cassandra.
So every user has their own key, with the friends' details in a queue; to see how Cassandra works, look at this:
http://prasath.posterous.com/cassandra-55
It's a type of graph database:
http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html
It's not related to relational databases.
Google for graph databases.
You're looking for foreign keys. Basically you can't have an array in a database unless it has its own table.
Example schema:
Users Table
userID PK
other data
Friends Table
userID -- FK to the Users table representing the user that has a friend.
friendID -- FK to the Users table representing the user id of the friend
Probably there is a table, which stores the friend <-> user relation, say "frnd_list", having fields 'user_id','frnd_id'.
Whenever a user adds another user as a friend, two new rows are created.
For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').
Now when showing the friends list to a particular user, a simple SQL query would do: "select frnd_id from frnd_list where user_id="
followed by the id of the logged-in user (stored as a session attribute).
Regarding the performance of a many-to-many table, if you have 2 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB.
Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.
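The storage estimate above can be checked with a few lines of arithmetic. This counts only the two 4-byte ids per row; row headers and indexes, which any real database adds on top, are ignored.

```python
users = 200_000_000
avg_friends = 200
bytes_per_row = 2 * 4  # two 32-bit ints: user id and friend id

total_bytes = users * avg_friends * bytes_per_row
total_gib = total_bytes / 2**30  # binary gigabytes

print(f"{total_bytes:,} bytes = {total_gib:.1f} GiB")
# prints "320,000,000,000 bytes = 298.0 GiB"
```

So the "just under 300GB" figure holds when gigabytes are counted in binary units; in decimal units it is 320 GB.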