Patterns for building social network type applications? - sql

I need to design / architect / develop a web based social network type application.
Basic functionality:
- users create accounts on the system
- users agree to "friend" each other
- users create content within system
- users specify which friends may view/edit content that they created
Surely this core functionality has been created many times before? Are there any best practice patterns out there for how to implement this sort of thing?
I'm most interested in how the database for this would look.
What would this look like from a SQL perspective (any database)?
What would this look like from a NOSQL perspective (any NOSQL database)?
The thing I am most interested in, is how is the question of "content visibility" solved within the database? i.e. how does the database/application ensure that only approved friends may see the user created content?
Thanks

First thing to get out of the way is the database. An SQL one would just look like a normalised SQL database. What else could it look like? A NoSQL database would look like a bunch of name-value pair files.
There are three approaches to building a social web site, to be chosen after, and only after, you do a shed load of research on existing popular and unpopular ones to ascertain their architecture, the markets that they are aimed at, and the particular services that they offer to these markets.
1. Roll your own from scratch (and/or use a framework), like Facebook, Bebo, Myspace et al. This is obviously the longest route to getting there, but it does mean you have something to sell when you do. The platform, the membership and the USP are all yours to sell to Rupert Murdoch or whomever.
2. Use a CMS that lends itself to social sites and use the basic functionality, plus the plug-ins, plus your own inspiration to hit your target market. In this area Drupal is often used (I have used it successfully as well), but Joomla, Xaraya and many others, both free and paid for, can be used. Yep, more research. There is less to sell here when Rupert gives you a bell, as the base tool is probably GPL'd.
3. Use one of the provided systems where you sign up and then use the tools to build your own, with all the goodies provided. These are known as white label sites. Start looking here. Here you have little to sell on if someone wants to take you over.
How is "content visibility" handled? Initially, of course, the site builder makes a decision on who can see content: owners only, friends, registered users, the general public, etc. But this decision must fit in with the aims and policies of the site. The best way to handle this is through Role-Based Access Control (RBAC); see here for details.
When you say you "need to design / architect / develop" is this because of an overwhelming inner urge or because someone is paying you?
Either way, remember the social web space is very, very crowded. If you are just building another YouTube or Facebook, then you are unlikely to be able to generate the critical mass of users required to make such a site commercially successful.
If it is for a niche market not already catered for, e.g. "The Peckham and Brockley Exotic Bird Fanciers Club", then you know what your market is and what features will be required, so whichever of the above options you deem the easiest and cheapest can be used, but that is up to you to analyse and execute.
You may of course have an idea for a social site that is mainstream and is not covered by the other, i.e. you have spotted the mythological "gap in the market". In this case go for it but prepare to be disappointed. Or not.

Your design should be maintainable. This is what I have in my project.
1.) Application.Infrastructure
Base classes for all business objects, business object collections and data-access classes, plus my custom attributes, utilities as extension methods, and a generic validation framework. This determines the overall behavior and organization of my final .NET application.
2.) Application.DataModel
Typed Dataset for the Database.
TableAdapters extended to incorporate Transactions and other features I may need.
3.) Application.DataAccess
Data access classes.
Actual place where Database actions are queried using underlying Typed Dataset.
4.) Application.DomainObjects
Business objects and Business object collections.
Enums.
5.) Application.BusinessLayer
Provides manager classes accessible from Presentation layer.
HttpHandlers.
My own Page base class.
More things go here..
6.) Application.WebClient or Application.WindowsClient
My presentation layer
Takes references from Application.BusinessLayer and Application.BusinessObjects.
Application.BusinessObjects are used across the application and they travel across all layers whenever needed [except Application.DataModel and Application.Infrastructure].
All my queries are defined only in Application.DataModel.
Application.DataAccess returns or takes business objects as part of any data-access operation. Business objects are created with the help of reflection attributes. Each business object is marked with an attribute mapping it to the target table in the database, and properties within the business object are marked with attributes mapping them to the target column in the respective database table.
My validation framework lets me validate each field with the help of a designated ValidationAttribute.
My framework heavily uses attributes to automate most of the tedious tasks like mapping and validation. I can also add a new feature as a new aspect in the framework.
A sample business object would look like this in my application.
User.cs
[TableMapping("Users")]
public class User : EntityBase
{
#region Constructor(s)
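// The constructor name must match the class name (User), and the
// collection it initializes is the Books property declared below.
public User()
{
Books = new BookCollection();
}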
#endregion
#region Properties
#region Default Properties - Direct Field Mapping using DataFieldMappingAttribute
private System.Int32 _UserId;
private System.String _FirstName;
private System.String _LastName;
private System.String _UserName;
private System.Boolean _IsActive;
[DataFieldMapping("UserID")]
[DataObjectFieldAttribute(true, true, false)]
[NotNullOrEmpty(Message = "UserID From Users Table Is Required.")]
public override int Id
{
get
{
return _UserId;
}
set
{
_UserId = value;
}
}
[DataFieldMapping("UserName")]
[Searchable]
[NotNullOrEmpty(Message = "Username Is Required.")]
public string UserName
{
get
{
return _UserName;
}
set
{
_UserName = value;
}
}
[DataFieldMapping("FirstName")]
[Searchable]
public string FirstName
{
get
{
return _FirstName;
}
set
{
_FirstName = value;
}
}
[DataFieldMapping("LastName")]
[Searchable]
public string LastName
{
get
{
return _LastName;
}
set
{
_LastName = value;
}
}
[DataFieldMapping("IsActive")]
public bool IsActive
{
get
{
return _IsActive;
}
set
{
_IsActive = value;
}
}
#endregion
#region One-To-Many Mappings
public BookCollection Books { get; set; }
#endregion
#region Derived Properties
public string FullName { get { return this.FirstName + " " + this.LastName; } }
#endregion
#endregion
public override bool Validate()
{
bool baseValid = base.Validate();
bool localValid = Books.Validate();
return baseValid && localValid;
}
}
BookCollection.cs
/// <summary>
/// The BookCollection class is designed to work with lists of instances of Book.
/// </summary>
public class BookCollection : EntityCollectionBase<Book>
{
/// <summary>
/// Initializes a new instance of the BookCollection class.
/// </summary>
public BookCollection()
{
}
/// <summary>
/// Initializes a new instance of the BookCollection class.
/// </summary>
public BookCollection (IList<Book> initialList)
: base(initialList)
{
}
}

A graph database like http://www.neo4j.org is a choice to look at. It lends itself very well to both the social network (e.g. http://blog.neo4j.org/2009/09/social-networks-in-database-using-graph.html) and the ACL-based security (e.g. http://wiki.neo4j.org/content/ACL ).

You should first study the existing social networks out there (Facebook, Myspace, etc). There is a fair amount of information available about how they are implemented.

The key to success for social networks is not the technology on which they are based but the problems they solve for their users. If the users like it, you're doomed for success even if your technology is crap.
[EDIT] How is it implemented? Check any SQL-based user role system. In this case, every user is also a role which can be added as "allowed to access" to any object. Depending on how many objects you have and how fine grained the control should be, that can mean that you have a table with three columns: OBJECT, USER, ACCESS_TYPE where ACCESS_TYPE can be one of OWNER, READ (friend), WRITE (close friend).
This table will become pretty large, but a few hundred million rows is no longer uncommon for today's databases.
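A minimal sketch of the three-column table described above, using Python's built-in sqlite3 module. The table and column names (content, content_acl, object_id, user_id, access_type) and the sample rows are assumptions made for illustration; the answer only specifies the OBJECT, USER, ACCESS_TYPE columns and the three access types.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE content_acl (
    object_id   INTEGER NOT NULL REFERENCES content(id),
    user_id     INTEGER NOT NULL,
    access_type TEXT NOT NULL
                CHECK (access_type IN ('OWNER', 'READ', 'WRITE')),
    PRIMARY KEY (object_id, user_id)
);
""")

conn.execute("INSERT INTO content VALUES (1, 'holiday photos')")
conn.executemany(
    "INSERT INTO content_acl VALUES (?, ?, ?)",
    [(1, 10, "OWNER"), (1, 11, "READ"), (1, 12, "WRITE")],
)


def visible_content(user_id):
    """Return ids of all content for which the user has any ACL row."""
    rows = conn.execute(
        "SELECT c.id FROM content c "
        "JOIN content_acl a ON a.object_id = c.id "
        "WHERE a.user_id = ? ORDER BY c.id",
        (user_id,),
    )
    return [r[0] for r in rows]


def can_edit(user_id, object_id):
    """Only OWNER and WRITE rows grant edit rights; READ does not."""
    row = conn.execute(
        "SELECT 1 FROM content_acl "
        "WHERE user_id = ? AND object_id = ? "
        "AND access_type IN ('OWNER', 'WRITE')",
        (user_id, object_id),
    ).fetchone()
    return row is not None
```

Because visibility is decided by the join, a user with no row for an object never sees it, and the application does not need a separate filtering pass.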

As Aaroon pointed out, you should first ask yourself what problem you want to solve.
What content do you want people to share? Should it really be visible only to friends? It is much easier and scalable if you make content publicly visible, because what content is displayed is not dependent on who is watching the page and you can cache it easily. Publicly available user-generated content attracts new users.
If you want to restrict access and give the user the opportunity to attach groups of friends to a resource, I would go with simple group-based access control. Let each resource have a group of users who can edit the resource and a group of users who can see it.
That way each resource has two single-value attributes, and each user belongs to a finite number of groups. You can attach the view-group and edit-group attributes to a document stored in a NoSQL database, a search engine like Lucene/Sphinx, or a row in a SQL database. When querying for content available to the user, pass all groups the user belongs to (in SQL you would use an IN clause, in Sphinx setFilter('view-group', array(2,3,4))). The database would return only content available to the user. Because you are attaching only 2 integer values (view-group and edit-group) to a document, you can store them in memory, which makes the search fast and scalable.
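The SQL variant of this could be sketched as follows, again with sqlite3. The schema (group_member, resource, view_group, edit_group) and the sample data are hypothetical; only the idea of two group attributes per resource and an IN clause over the user's groups comes from the answer.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_member (group_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
CREATE TABLE resource (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    view_group INTEGER NOT NULL,
    edit_group INTEGER NOT NULL
);
""")
conn.executemany(
    "INSERT INTO group_member VALUES (?, ?)",
    [(2, 100), (3, 100), (4, 200)],
)
conn.executemany(
    "INSERT INTO resource VALUES (?, ?, ?, ?)",
    [(1, "trip notes", 2, 3), (2, "private diary", 4, 4)],
)


def groups_of(user_id):
    """All group ids the user belongs to."""
    rows = conn.execute(
        "SELECT group_id FROM group_member WHERE user_id = ? ORDER BY group_id",
        (user_id,),
    )
    return [r[0] for r in rows]


def viewable_titles(user_id):
    """Titles of resources whose view_group is one of the user's groups."""
    groups = groups_of(user_id)
    if not groups:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    placeholders = ",".join("?" * len(groups))
    rows = conn.execute(
        f"SELECT title FROM resource WHERE view_group IN ({placeholders}) ORDER BY id",
        groups,
    )
    return [r[0] for r in rows]
```

Only the placeholders are interpolated into the query string; the group ids themselves are still passed as bound parameters.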

In the end it looks like Elgg or Dolphin might meet our requirements. These appear to be PHP frameworks for rolling your own social network. I looked at the Facebook platform but nowhere did it clearly explain just what it is - it appears to be the facebook code but perhaps it is only the code for an addon API or something.

Related

Design Message System: concern about public and private members

For the interview of an oo design question: design message system, I am having trouble understanding what are some uses of public and private members/method for each class.
Long story short. Say we define the user class as the follows.
class User {
public:
string account_name;
string info;
vector<User> friend_list;
vector<Chat> chat_list;
void friend_request(User friend_target);
private:
string system_user_id; // internal id, not exposed to the end user
};
I am wondering, should there be any private member in the first place?
Here, I defined system_user_id to be private because it shouldn't be exposed to the real user of the system. What do you guys think?
Another thing that I find helpful is considering encapsulation as it applies to clients of the user class, not just the external user.
A client could be an internal user. Imagine that in a couple months your user is the most popular thing on the internet and you have a team of developers working on your system.
They see a user class with a public account_name. They would like to change the name of the account so they update it directly. But what if a valid update requires synchronization with a datastore or something? The user class design has allowed the client (internal) the ability to create incorrect code!!
The same could go for friend_list and chat_list. If your user is being used in a concurrent environment, you might need some sort of locking. If you expose the lists directly, you give your internal clients the option of creating race conditions, while if the lists were private and encapsulated you could better protect your internal clients.
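The point about protecting internal clients can be sketched like this. The example is in Python rather than the question's C++ purely for brevity, and the save callback, the rename method and the friends accessor are hypothetical names. Note that the leading underscore is only a convention in Python; in C++ the same members would sit under private:.

```python
import threading


class User:
    """State is kept behind methods so every update goes through one place."""

    def __init__(self, account_name, save):
        self._account_name = account_name
        self._friends = []
        self._lock = threading.Lock()
        self._save = save  # hypothetical callback that persists the new name

    @property
    def account_name(self):
        # Read-only from the outside: there is no setter.
        return self._account_name

    def rename(self, new_name):
        """Validate, persist, then update, so callers cannot skip a step."""
        if not new_name:
            raise ValueError("account name must not be empty")
        with self._lock:
            self._save(new_name)
            self._account_name = new_name

    def add_friend(self, other):
        """Add a friend once; the lock guards concurrent modifications."""
        with self._lock:
            if other not in self._friends:
                self._friends.append(other)

    def friends(self):
        """Return an immutable snapshot rather than the internal list."""
        with self._lock:
            return tuple(self._friends)
```

A caller that wants to change the name has to call rename, which is where the datastore synchronization lives, and nobody outside the class can mutate the friend list without taking the lock.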

Writing a Custom Hive Provider using objects as datasource

I'm trying to create a hive provider that would be able to work against some objects.
An object may look something like this
public class MyContent
{
public System.Collections.Generic.List<ContentExample> Content { get; set; }
}
public class ContentExample
{
public string Title { get; set; }
public string Text { get; set; }
}
public class MyFiles
{
public System.Collections.Generic.List<FileExample> Files { get; set; }
}
public class FileExample
{
public System.IO.FileInfo File { get; set; }
}
I've downloaded and checked the two Hive providers from the Visual Studio Gallery (Umbraco 5 Hive Provider and Umbraco 5 Simple Hive Provider), but the lack of documentation is a bit disturbing. I also downloaded some of the other example hives, like the Wordpress hive provider, but that one is rather different from the ones in the Visual Studio Gallery.
The Idea
I'm used to working with stuff like ObjectDataSource; the example above could be complemented with full CRUD if required.
Now, I assume one Hive provider would be able to serve different parts of Umbraco with content (right?). Just set up a new Repository and go? I have no clue how to connect all the parts, or even how to get the data into the provider yet.
Any help in how I could bring all pieces together?
Thanks
The first step is to take a step back and evaluate your business requirements. Will you allow users to update the information with forms in the frontend? Do you need a tree editor for the content in the backoffice? Do you need to work with data outside of the built-in ORM?
If the answer to these is no, a hive provider is overkill. Evaluate solutions using either a simple surface controller or just a custom document type. Umbraco 5 is a full EAV/CR system, so unlike some CMS products, you'll be able to represent any RDBMS structure you can imagine.
ContentExample could be represented by a document type called 'Article', which has properties Title and Text. Just by defining this document type we're instantly given add and edit forms for our back office users in our content section. We can even restrict which nodes are able to have children of type 'Article', e.g. News.
In the same way, an upload control is a field type that allows you to attach files to your document.
So what's the point of a custom hive provider?
The goal of a custom hive provider is to unify CRUD actions for data access layers.
As a result data can be stored in the baked-in nhibernate orm, custom tables, rss feeds, or even flat files, while still using a common interface to retrieve and update it. If this sounds like what you're aiming for, read on.
Going back to the business requirements, specifically: where do you want to actually store the data? Given that you have some fields and properties related to flat file storage, let's say that one TypedEntity (a model) is equivalent to one file and write some pseudocode.
The first step is, as you say, 'get the data into the repository.' This involves going back to that VS template and filling in the 'not implemented' methods with your logic for storing and retrieving data.
protected override void PerformAddOrUpdate(TypedEntity entity)
{
// step 1: serialize the typed entity to xml
// step 2: write the file to the hdd, making sure that the file name is named using the hive id so that we can pull it back later.
}
Once you've written the data access layer, or DAL, you can hook it up in the hive config, giving it a URL to match, e.g. rather than matching content:\\, yours might match on file-manager:\\
We can allow our backoffice users to be able to add new entities (indirectly, new files) by writing a custom tree, and we can display the results to our front-end users via macros.

Advice on how I should return more than a simple type from my service layer

In my mvc3 application, I have things setup like:
1. repositories for each entity
2. service class for each entity that wraps the bare nhibernate db calls with business logic
Now for example, a class that registers a user, I want the service class to return something more than a boolean or user object if the user can register successfully.
Is this good practice?
Reason being, a person may fail to register correctly for reasons like:
1. duplicate email address in the system
2. duplicate username
3. etc.
So my method may look like:
public User Register(User newUser)
{
// check for a user with the same email
// check for a user with the same username
// validation checks etc.
return user;
}
I am thinking of creating a UserRegistrationResponse object so I can return back a much richer return value.
So something like:
public UserRegistrationResponse Register(User user)
{
..
return userRegistrationResponse;
}
This way I can return a user-friendly response that I can propagate to the UI layer, and still get the user object and other information etc.
Comments on this approach?
I guess the only other way would be to throw exceptions, but is that really a good idea? The idea is for me to be able to re-use these service classes, for example in a RESTful service layer I will need in the future.
This is very common. 10 out of 10 WCF projects I've worked on in the past 3 years used this pattern. This includes legacy projects at three different companies, green field development and mvc/webforms projects.
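A minimal sketch of the response-object pattern from the question. It is written in Python rather than C# only to keep it short and runnable; RegistrationResult, the error codes and the in-memory list standing in for the repository are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    username: str
    email: str


@dataclass
class RegistrationResult:
    """Carries either the registered user or the reasons registration failed."""
    user: Optional[User] = None
    errors: List[str] = field(default_factory=list)

    @property
    def succeeded(self):
        return not self.errors


class UserService:
    def __init__(self):
        self._users = []  # in-memory stand-in for the real repository

    def register(self, new_user):
        errors = []
        if any(u.email == new_user.email for u in self._users):
            errors.append("duplicate_email")
        if any(u.username == new_user.username for u in self._users):
            errors.append("duplicate_username")
        if errors:
            # Expected business failures are returned, not thrown.
            return RegistrationResult(errors=errors)
        self._users.append(new_user)
        return RegistrationResult(user=new_user)
```

The caller (an MVC controller today, a REST endpoint later) decides how to present each error code, which keeps the service class reusable across both.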

WCF Multiple Interface

I really want to get my head around this WCF technology, and it seems the last months of information cramming have somewhat distorted my overall concept of how I should build my client/server application.
I'd appreciate it if someone out there could shed some light on best practices for developing my app and implementing a duplex WCF service with multiple interfaces.
General outline: I want to develop an app where users connect to a server and, let's say, add contacts to an SQL database. I have discovered many ways of doing this, but would ultimately like to know I'm heading down the right path when it comes time to develop the app further.
Some models I have discovered are:
The client has its own LINQ to SQL classes and handles all data access itself. BAD. Really slow: overheads with LINQ and SQL connections, on top of a poor implementation of the LINQ select command.
Another model was to develop the service to implement the LINQ to SQL commands used for CRUD operations. However, this still doesn't provide live data updates to other clients connected to the service.
So I made a basic app where, when a client logs in to the service, their callback channel gets added to the callback list. When a client feeds a new contact to the service, it invokes a callback to all channel clients with the new contact, and the client-side function takes care of adding the contact to the right spot.
So now I want to implement a User object and perhaps two more business objects, say Project and Item. My idea is to create my service like this:
[Serializable]
[DataContract]
[ServiceBehavior(
ConcurrencyMode = ConcurrencyMode.Single,
InstanceContextMode = InstanceContextMode.PerCall)]
public class Project: IProject
{
[DataMember()]
public int projectID;
public int Insert(objSubItem _objSubItem)
{
// code here
}
etc and
[ServiceContract(
Name = "Project",
Namespace = "",
SessionMode = SessionMode.Required,
CallbackContract = typeof(IProjectCallback))]
public interface IProject
{
/// <summary>
/// Inserting a Project record to the database
/// </summary>
/// <param name="_project">Project from Client</param>
/// <return>ProjectID back to the client if -1 then fail</return>
[OperationContract()]
int Insert(Project _project);
and
public interface IProjectCallback
{
/// <summary>
/// Notifies the clients that a Project has been added
/// </summary>
/// <param name="_project">Inserted Project</param>
[OperationContract(IsOneWay = true)]
void NotifyProjectInserted(Project _project);
}
Obviously I have other CRUD functions, and functions to ensure that both client and server data records are read-only while being edited.
Now, if I have multiple objects, what is the best way to lay it out?
I'm thinking of creating a service.cs, an Iservice.cs and an IserviceCallback to negotiate the client channel population. Should I also use partial classes of the service to implement IProject and IUser, to properly invoke the service callbacks as well as invoking the object's insert?
Would I do it like this:
[ServiceContract(Name = "Service",
Namespace = "",
SessionMode = SessionMode.Required,
CallbackContract = typeof(IServiceCallBack))]
[ServiceKnownType(typeof(Project))]
[ServiceKnownType(typeof(User))]
public interface IService
{
// code here
}
and also
[ServiceBehavior(
ConcurrencyMode = ConcurrencyMode.Single,
InstanceContextMode = InstanceContextMode.PerCall)]
public partial class Service : IUser
{
public int Insert(User _User)
{
//
}
}
public partial class Service : IProject
{
public int Insert(Project _project)
{
// code here
}
}
public partial class Service : IService
{
// functions here
}
}
The approach feels right for one interface, but I feel that I need some "best practice" assistance.
Many thanks in advance,
Chris Leach
Hi Richard,
I appreciate your response. As you can see, this is my first post here and my third ever on any programming forum. I have lived my programming life very close to Google, as shown by my Google autofill history, but it's time to start asking questions of my own, so thank you for your assistance so far.
I really want to understand an overall approach to managing data consistency in a distributed client/service application. I am looking into Telerik ORM and also Entity Framework as a solution, exposing the entities through a WCF service, but I lack the understanding to implement data consistency amongst the clients. I have managed to develop a netDualTcp chat application and have used a list of client callback contexts to send join/leave and chat functions.
I lack the overall picture. One option seems to be an in-memory (static) version of all of the tables in my SQL database, with the clients binding directly to these lists if that is possible. Alternatively, it seems best for my custom user controls to handle the connections, so the server is aware of who has a particular user control open and can direct changes to those clients that are registered to the callback contract. That way the clients don't have to load the entire project every time they open the application.
I am thinking of a multi-purpose application, such as a contact/grant application program, where users will be using different parts of the application and do not always need to access all of the information at one time. When the user first logs in, I am hoping that the service will attach a callback contract for the client and that several bits of information are loaded back to the client on authentication, such as a basic state, i.e. if they are an admin they get notifications etc. Once they are logged in they are presented with a blank canvas, but then begin to load custom user controls into a docking-panel type interface.
I guess this is where I become a little stuck about how best to manage concurrency and consistency whilst minimizing load/data transfer times to the client and freeing up CPU time on both client and server. I know in programming there are multiple ways of doing this, but I would like to know what the people on this forum feel is the best approach to this type of solution. I understand it's a deep topic, but I feel I have come this far and a guiding hand would be appreciated. Thanks again
Generally I find taking a non-abstract view of a service gets me to the right place. What is it that consumers of my service are going to need to do?
I obviously have internal domain objects that are used by my business layer to create and manipulate the data. However, the way the business layer does things isn't necessarily the best way to partition functionality for my service.
So for example, if any project should have at least one user in it then when you create the project you should send over at least one user at the same time. The service operations need to encapsulate all of the data required to carry out a self contained business transaction.
Similarly, the death knell of many distributed systems is latency: they require lots of round trips to complete something. So, for example, you want to be able to add a user to a project; in reality you probably want to add a number of users to a project. Therefore, you should model the operation to accept a list of users, not a single one that must be invoked multiple times.
So a project service should allow you to do all the things related to a project, or projects, through a service contract. If users can live independently of projects then also have a user service. If they cannot, then don't have a user service, as everything needs to be project focussed.
Business transactions are often more than straightforward CRUD operations on domain entities, and the service should model them rather than reflecting the data model.
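The two design rules above (send everything a business transaction needs in one call, and accept lists to avoid round trips) might look like this. It is a language-neutral sketch written in Python, not WCF code; ProjectService, create_project and add_users are hypothetical names, and a dictionary stands in for the database.

```python
class ProjectService:
    """Operations are modelled on business transactions, not on table rows."""

    def __init__(self):
        self._projects = {}  # in-memory stand-in for the database
        self._next_id = 1

    def create_project(self, name, user_ids):
        """A project must have at least one user, so both arrive in one call."""
        if not user_ids:
            raise ValueError("a project needs at least one user")
        project_id = self._next_id
        self._next_id += 1
        self._projects[project_id] = {"name": name, "users": set(user_ids)}
        return project_id

    def add_users(self, project_id, user_ids):
        """Accept a batch of users: one round trip instead of one per user."""
        users = self._projects[project_id]["users"]
        users.update(user_ids)
        return sorted(users)
```

With this shape, a client that wants to add twenty users makes a single call, and it cannot leave a project in the invalid "no users yet" state between two separate calls.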

Web Service Contract Design - Single-Responsibility

I'm curious as to see how most developers go about designing the contracts to their web services. I am quite new to service architecture and especially new to WCF.
In short, I'd like to find out what type of objects you are returning in your operations, and does each operation in your service return the same object?
For example consider the following:
Currently, all services I create inherit from a ServiceBase object that looks similar to:
public abstract class AppServiceBase<TDto> : DisposableObjectBase where TDto : IDto
{
protected IAppRequest Request { get; set; }
protected IAppResponse<TDto> Response { get; set; }
}
Response represents the return object which composes something like:
public interface IAppResponse<TDto> where TDto : IDto
{
List<TDto> Data { get; }
ValidationResults ValidationResults { get; }
RequestStatus Status { get; }
}
Therefore, any derived service would return a response composed of the same object.
Now initially, I felt this would be a good design as it forces each service to be responsible for a single object. For the most part this has worked out, but as my services grow, I've found myself questioning this design.
Take this for example:
You have music service you're writing and one of your services would be "Albums".
So you write basic CRUD operations and they pretty much all return a collection of AlbumDto.
What if you want to write an operation that returns the types of albums. (LP, Single, EP, etc)
So you have an object AlbumTypesDto. Would you create a new service just for this object or have your Albums service return many different objects?
I can imagine a complex service with several varying return types being cumbersome and poorly designed, yet writing a whole new service for what may be only one or two service operations seems like overkill.
What do you think?
It is a good idea to design your services around your domain problem. By exposing a CRUD pattern on the service, essentially you are using services for data access. The risk of this is your business logic will end up on whatever is consuming your service.
Your service should expose methods relevant to the problem you are trying to solve (which typically map loosely onto the operations in the UI).
From here you will see your data contracts start to fit more naturally to the problem you are trying to solve, instead of creating "one size fits all" contracts.
For a good starting point, Google "Domain Driven Design"; there is plenty of reference material on it.