Activity streams / feeds / news in social network database schema - sql

I have a goal to implement database schema for simple \ typical social network.
I have read many threads \ answers but have couple open questions.
So we have User table (userId, name and etc). We can make some Actions (reply, like, follow and etc). I want to implement some log for all activities and do it as PULL-MODEL. So we write entry in Activity table for any action. Schema for this table is (id, ownerId, actionType, targetId, time) where ownerId is User's id, who made action. actionType is reply, follow or other action. targetId is id of user or post and depends on actionType. When User get his activities we just do query by friends ids. So it is clear for me. My questions are:
1) In case if I follow User and unfollow him, what I should do? Should I make two entries in Activity table or I should remove the first followAction entry? What is the best practice?
2) It is clear foe me do query by friend ids so I get all activities of my friends. But in case any not my friend liked my photo and I must get event that "Some not my friends liked my photo". So, what are good solutions there for this case. May be I must to change my current schema?
Releated questions :
How to implement the activity stream in a social network
Database Design - "Push" Model, or Fan-out-on-write
What's the best manner of implementing a social activity stream?
Thanks you all for good answers.

First, it may be better to split each kind of action into its own table, rather than having all actions in one table, distinguished by types. This makes your metadata about each action more flexible; as you say, the target ID depends on the action; without splitting them out into other tables, it's harder to write constraints on what the data should be.
Second - on your question #1, I think you're confusing a log of user actions with user status. You may need both; you might want two separate data structures. For example, if a user follows and then unfollows, the status is that they aren't following, but the log of actions is that they followed, then unfollowed. So I think you should be careful to have a separate data structure that captures current status of certain relationships, apart from actions. Then the problem becomes simpler, you log all actions as they happen, and update status accordingly.
For question #2, the photo should be its own data object, with "likes" split out into a different table; users like posts. Then of all of the users who like a post, they can easily be grouped into two categories; friends (those who have a friend relationship to the poster) and non-friends.

Related

Friends of friends use case - Redis vs a Graph Databse

I made a small social network as part of an assignemt for school. Our next task is to implement a functionality into the project which will use a non-relational database. We were suggested to go for Redis or ElasticSearch.
It is clear to me, that I could use ElasticSearch to look for people and groups based on their names etc.
But at the moment I am more interested in making a potential friend finder which suggests friends based on common friends of the two users and maybe the groups they are part of.
My question is: Is this a good use-case for Redis or would it be much much better to use a Graph database for something like this?
This is how I imagined it:
I have a Set of registered users called "users" stored inside redis
For each user I have a Set which keeps track of their friends e.g.
"user:1:friends"
I also have a SortedSet of potential friends stored for each user
e.g. "user:1:potential"
Let's say a user I am not friends with will add one of my friends to their friends-list. When this happens I would take all the sets of friends of my friend and check if my friends new friend is part of each of the sets. If not, then I would increment the score assigned to his id in the sets of potential friends of my friends friends which are not friends with the new guy.
All in all this seems to me like a lot of work which is why I am not sure wether this is even a good idea.
So again- Would a graph database just be much better for somehting like this?
Considering you'd have anything to implement (inlcuding setting up the no sql db) I would definitely go with graphDBs.
Friends of friends (and more generally suggestions) is a basic use case for this type of DB. It's were they show their full potential.
I'd suggest that you have a look at Neo4J: http://neo4j.com
And their social network use case: http://neo4j.com/use-cases/social-network/

should database verify if user is authorized to perform action

Should database be verifying if user is authorized to perform certain action?
Two examples:
1)User is enrolled in 30 teams max and it can see scoresheet of these teams only. I'm passing in userid and teamid to the stored procedure and fetching the scoresheet only if user is authorized to view the scoresheet. Is it more appropriate to only pass in only teamid and check beforehand what all teams user is enrolled in? Should I do both?
2)Currently I'm passing in userid of the poster and the commentid of the comment to be deleted and I'm deleting comment only if both criteria is met - userid matches to the poster id and commentid matches to the commentid - just to make sure user is deleting his own comment and not somebody else's. Is it an overkill?
Multiple layers of validation is best practice and it doesn't seem like your methods would cause additional overhead. Just make sure to limit connecting to the database once, I've found that the most costly part of running database queries is the connection and cursors.
http://msdn.microsoft.com/en-us/library/aa174437%28v=sql.80%29.aspx
Security experts will tell you that No amount of security is enough! But at the same time you have to find a balance b/w security and unnecessary layers of protection that are bound to affect your application's performance.
Answering your 2nd question first: It is a good idea to pass both userid as well as commentid, and matching both, so that you accidentally don't delete all comments by a particular user.
Coming to your 1st question now: As I understand it, you want users only part of the team to be able to view the team's scoresheet, right? In order to do so passing only the teamid of all the teams the user is a part of will do. I am not sure what you mean by authorization here!
NOTE:
I have answered your question from a theoretical view with no idea about your Table structure or whats written in your Stored Procedures.
Your frontend is a much more friendlier (libraries, frameworks, best practices) environment to implement whatever access restrictions or authorization that you could possibly have in mind. Adding another layer inside the database just adds a lot of complexity and duplicate implementation of your access restrictions.
I would only consider doing it if clients connect and execute commands directly against the database.
So, rely on the ids provided by the application and spend your energy on sanitizing user input and implementing a sane authentication model. You will need it.

ORM: authorization via reachability

We are building a webapplication which uses a database. Also we use an object relational mapper to access the database. One aspect of authorization in the webapplication is that the user may access an object referred to by an URL. The URL contains a unique id (for example the Primary Key) to a specific record in the database. Consider the following example.
a user may belong to many groups and a group may have many users (many-to-many).
a survey belongs to a group (many-to-one).
a survey may have multiple questions. (many-to-one).
Say we have the following URL: http://app.local/question/edit/10. This means we want to edit question with PK 10. Now, we want to verify if the logged in user may access question with PK 10. This can be done by retrieving this question, then it's survey then it's group and then all its users. If any of the users is the same as the logged in user the logged in user may access the question.
To generalize this a bit; we want to check if a record is reachable from another record by the known many-to-one or many-to-many relations. So if there is a many-to-one relation (like with a survey and a question then we should check if a user is reachable from the question through the survey and then through the group. The group has a many-to-many relation with the user so we should check if any (not all) of the users is the same as the logged in user.
If a table has multiple many-to-one relations, say; we can attach a CSS template to a survey and this template also belongs to a group then we have to check if a user is reachable from all many-to-one relations (thus the group and the template). The same holds of course for multiple many-to-many relations.
Are there Object Relation Mappers which support this behaviour? And what is this behaviour called, maybe reachability? Does Propel (for PHP) support this behaviour? I think this reachability can be done in any of the following two ways:
Execute a query to get each "parent", uses many queries)
Join all necessary tables to see if a record exists (the reachable users matches the logged in user) in one query.
Furthermore this behaviour of the ORM should support nested sets, thus if a group contains nested set behaviour it should also try to reach a user through the group's parent.
I don't think this kind of behaviour should be restricted to authorization; objects should simply be able to see if they can reach another object.
Note that I do not mean persistence by reachability: http://jpaobjects.sourceforge.net/m2-site/main/documentation/docbkx/html/user-guide/ch08s03.html.
Or... am I simply looking at this authorization wrong and is there a far better and different approach with an ORM?
I've handled this in the past using nested resources in Ruby on Rails (which uses the Active Record ORM). Rather than http://app.local/question/10/edit, the URI would be http://app.local/survey/5/questions/10/edit
In the controller you load both the question and survey. You check authorization by comparing the survey to the authenticated user's group memberships. One way to engineer this would be to embed this logic into the User class. For example, in the controller you have question and survey (and the relationship between the two is well understood by the ORM, i.e. question.survey). You could then check access as user.hasAccess?(question), which would be a relatively easy method to write. Pseudocode:
class User < ActiveRecord::Base
def hasAccess?(question)
return question.group.users.include?(self)
Yes, this will result in several queries behind the scenes, but ORMs do the work. I do it this way because you're left with solid schema and easy to read code. Don't optimize until you actually have a performance problem.

The REST-way to check/uncheck like/unlike favorite/unfavorite a resource

Currently I am developing an API and within that API I want the signed in users to be able to like/unlike or favorite/unfavorite two resources.
My "Like" model (it's a Ruby on Rails 3 application) is polymorphic and belongs to two different resources:
/api/v1/resource-a/:id/likes
and
/api/v1/resource-a/:resource_a_id/resource-b/:id/likes
The thing is: I am in doubt what way to choose to make my resources as RESTful as possible. I already tried the next two ways to implement like/unlike structure in my URL's:
Case A: (like/unlike being the member of the "resource")
PUT /api/v1/resource/:id/like maps to Api::V1::ResourceController#like
PUT /api/v1/resource/:id/unlike maps to Api::V1::ResourceController#unlike
and case B: ("likes" is a resource on it's own)
POST /api/v1/resource/:id/likes maps to Api::V1::LikesController#create
DELETE /api/v1/resource/:id/likes maps to Api::V1::LikesController#destroy
In both cases I already have a user session, so I don't have to mention the id of the corresponding "like"-record when deleting/"unliking".
I would like to know how you guys have implemented such cases!
Update April 15th, 2011: With "session" I mean HTTP Basic Authentication header being sent with each request and providing encrypted username:password combination.
I think the fact that you're maintaining application state on the server (user session that contains the user id) is one of the problems here. It's making this a lot more difficult than it needs to be and it's breaking a REST's statelessness constraint.
In Case A, you've given URIs to operations, which again is not RESTful. URIs identify resources and state transitions should be performed using a uniform interface that is common to all resources. I think Case B is a lot better in this respect.
So, with these two things in mind, I'd propose something like:
PUT /api/v1/resource/:id/likes/:userid
DELETE /api/v1/resource/:id/likes/:userid
We also have the added benefit that a user can only register one 'Like' (they can repeat that 'Like' as many times as they like, and since the PUT is idempotent it has the same result no matter how many times it's performed). DELETE is also idempotent, so if an 'Unlike' operation is repeated many times for some reason then the system remains in a consistent state. Of course you can implement POST in this way, but if we use PUT and DELETE we can see that the rules associated with these verbs seem to fit our use-case really well.
I can also imagine another useful request:
GET /api/v1/resource/:id/likes/:userid
That would return details of a 'Like', such as the date it was made or the ordinal (i.e. 'This was the 50th like!').
case B is better, and here have a good sample from GitHub API.
Star a repo
PUT /user/starred/:owner/:repo
Unstar a repo
DELETE /user/starred/:owner/:repo
You are in effect defining a "like" resource, a fact that a user resource likes some other resource in your system. So in REST, you'll need to pick a resource name scheme that uniquely identifies this fact. I'd suggest (using songs as the example):
/like/user/{user-id}/song/{song-id}
Then PUT establishes a liking, and DELETE removes it. GET of course finds out if someone likes a particular song. And you could define GET /like/user/{user-id} to see a list of the songs a particular user likes, and GET /like/song/{song-id} to see a list of the users who like a particular song.
If you assume the user name is established by the existing session, as #joelittlejohn points out, and is not part of the like resource name, then you're violating REST's statelessness constraint and you lose some very important advantages. For instance, a user can only get their own likes, not their friends' likes. Also, it breaks HTTP caching, because one user's likes are indistinguishable from another's.

Typical normalization security issue in web applications

i am currently having a problem, i guess a lot of people have run into before and i would like to know how you handled it.
So, imagine you have 10.000 Users on your App. ( each one has an own user/pw login to administrate his stuff ).
Imagine further, that you have a growing normalized SQL-tablestructure in the backend, with tables like: Users, Orders, OrderPositions, Invoices, etc.
So, to show/edit/delete stuff of a table which isn't the usertable itself, u'll probably have links like these, to let ypur users interact with the application.
~/Orders/EditOrder?id=12
~/Orders/ShowOrderPosition?orderId=12&posId=443
Ok, and now the problem:
How, do i prevent in a "none-complex"-way, that user A has access ( show/edit/delete ) the data of user B.
Example:
User B calls:
~/Orders/ShowOrderPosition?orderId=12&posId=443
which is an order of user A, so user B should have no access to it.
So, in my code i would need to have a UserIdentity-check before or within every single SQL-statement, like:
select * from OrderPosition op, Order o, User u
where op.Id = :orderId
and op.Fk_OrderId = :orderpositionId
and o.Id = :orderId
and o.Fk_User = :userId
Only this way i can make sure, that the data belongs to the requesting user.
To reach the usertable will of course get far more complex, the deeper the usertable-connection is "buried" in the normalization ( imagine tables like payments or invoices, connected to the order-table... )
Question:
What is your approach to deal with this, concidering: Low complexity, DRY and performance
( Hope u understand what i mean ;) )
This is a bit like a multi-tenant application - I have gone down this route and denormalized an ID onto all those tables that require this kind of check (a tenant ID, in your case, sounds like the user id).
I then created an interface that contains this field only and applied it to all those classes in my model layer that required this access.
In my base data access (repository) class, where all the select/update/delete calls go through, I then check to see if the class if of the type of that interface, and I then check that the current access matches that ID.
Of course, this depends on how your code is structured, and how simple/complex making this global kind of change will be...
Never expose ids.
And if you have to: encrypt them.
Performance
for ultimate performance you will have to denormalize to the point that reading the row and comparing with some application level variable would give you an answer on what kind of rights the user has (this is fairly fast and if your DAO/BAO level is well organized plugging it in will keep it relatively DRY and at relatively low complexity.) NOTE: complexity is also a function of your security model, once you start to implement inheritable, positive and negative, role-based access rules then it can not be really simple.
DRY
another route to take (which is very seldomly taken these days) is to use your database roles to manage security; this might get complicated but will offer unparalleled security (as it will be ensured at the DB level and not application level. Complexity should go down, at the application code level, if you manage to encapsulate all of your access paths into VIEWS, which might require quite a bit of re-tailoring at the database level. However(!), it might be possible to implement security model with very little changes to the application code - by renaming existing tables and replacing them with secured views)
Don't use your internal ID column, encrypted or not, it'll come back to bite you one day.
Create a random, unique, string (GUID, whatever), which contains the link between the user and the data he's requesting. So, instead of having, for user 34567:
Edit order
Create a record {"5dsfwe8frf823jrf",34567,12} in a temporary table and show:
Edit order
When the users clicks the link, fetch 34567,12 from your temporary table.
The string 5dsfwe8frf823jrf is impossible to guess = no security risk.