Getting a 5-level relation with one MySQL query - sql

Thank you in advance for your answers. Here is what I need to do.
I have a table friendship (id, user_id, friend_id, status, timestamp).
Let's say I am the user with user_id=43 and I am visiting the profile of the user with user_id=15.
The profile should show the chain of friendships that connects us.
For example, say I am friends with the user with user_id=3, and that user is friends with the user whose profile I am visiting.
Then on the website I will see:
Connection
MyIcon->UserIcon(15)->UserIcon(3)->UserIcon(i am visiting)
The chain should be shown only when every friendship in it has status=1.
Can anybody tell me what the query should look like?

With plain MySQL, there is no native way to do this. You have to either decide how deep you want to look, and use that amount of JOIN operations to see if you can 'reach' from one user id to the other, or you could give the community contributed Graph engine a whirl:
http://openquery.com/products/graph-engine
(As far as I know this involves using a non-official binary; perhaps it is already available as a plug-in, but I am not sure about that.)
With that engine, you can do it in a single simple query:
SELECT * FROM foo WHERE latch = 1 AND origid = 15 AND destid = 43;
And this would then return one row for each link you have to travel to reach from user 15 to user 43. You'd use the application code to display it nicely.
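The fixed-depth JOIN approach described above can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for MySQL, the timestamp column is left out, the sample rows are made up, and the `links` view is a hypothetical helper that exposes each accepted friendship (status = 1) in both directions. The query covers exactly two hops (me -> one intermediate friend -> target); every extra level needs one more JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friendship (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    friend_id INTEGER NOT NULL,
    status    INTEGER NOT NULL
);
-- Made-up sample data: 43-3 and 3-15 are accepted, 43-7 is still pending.
INSERT INTO friendship (user_id, friend_id, status) VALUES
    (43, 3, 1),
    (3, 15, 1),
    (43, 7, 0),
    (7, 15, 1);
-- Hypothetical helper view: accepted friendships, readable in both directions.
CREATE VIEW links AS
    SELECT user_id AS a, friend_id AS b FROM friendship WHERE status = 1
    UNION
    SELECT friend_id, user_id FROM friendship WHERE status = 1;
""")

def two_hop_paths(conn, me, target):
    """Return (me, intermediate, target) chains of exactly two accepted links."""
    return conn.execute(
        """
        SELECT l1.a, l1.b, l2.b
        FROM links AS l1
        JOIN links AS l2 ON l2.a = l1.b
        WHERE l1.a = ? AND l2.b = ?
        """,
        (me, target),
    ).fetchall()

paths = two_hop_paths(conn, 43, 15)
print(paths)
```

The chain through user 7 is not returned because the 43-7 friendship does not have status = 1.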

Had you modeled this as a Nested Set hierarchy instead of the Adjacency List model you have, this query would be trivial. As it is, you're looking at having to use recursion, which isn't natural to a relational database.
For some great information on modeling hierarchies, check out Joe Celko's book.

You might look at this answer to a question about recursive selection to see a hack you can do on the MySQL side of things. It shows how to create a hierarchy for selection.
In MySQL (ANSI SQL), there is no "native" way to perform such a query.

Related

How to avoid defining two extra tables for an object that can have only 3 possible values and is part of a many-to-many relationship?

I tried to search this online but found this question quite difficult to formulate in a concise & intelligible way.
I am developing an application which enables users to choose from 3 types of authentication: Password, Fingerprint & Face Recognition. Each user may opt for several of these 3, and I need to store their picks in a relational database. So theoretically, there exists a many-to-many relationship between users and authentication_types.
I know this seems quite trivial and I am probably overanalysing things, but what would be the optimal way to model this at a relational database level? What I am trying to avoid, but which seems to be the only reasonable solution in a relational DB setting, is to create a table for login types (say LoginTypes) in which to store the 3 login types mentioned above, and an intermediary table for the many-to-many relationship (say UsersLoginTypes).
What's a little frustrating for me is that for only 3 types of login, I need to create one table to store them and another one for the many-to-many relationship. And any time I want to get the login types chosen by a user, I cannot simply select the user and extract the login types from the user's object; I need to make a query that involves two other tables (LoginTypes & UsersLoginTypes). Am I missing a simpler solution here?
I thought of maybe assigning each login type a digit (eg. Password - 1, Fingerprint - 2, Face Recognition - 3) and have a field in the User's model for the login types, where to store a string containing the digits corresponding to what the user chose. And eventually, this is perhaps what I would go for if no better solution exists.
PS. I am using Ruby on Rails with ActiveRecord, if this changes something.
In 1997, I once normalised a relational database model to death. It worked and was extremely flexible, but it invariably ground to a halt whenever you wanted to run an unforeseen query, and formulating that query in the first place was already very tedious. (Of course, that was at a time when you always wrote your SQL manually; BI tools were a thing of the future.)
So: a (master) table users, a (lookup) table login_types and a (child/intermediate) table active_users_authentications, as in your first shot, is the correct way of modelling it relationally.
But if you want the system to be efficient/performant (and you don't need any further details for the authentication configurations, which you would store in active_users_authentications, of course), I for one would find it absolutely legitimate to have three Boolean (Yes/No) columns in the users table and call them has_pwd_auth, has_fgnpr_auth, and has_facerec_auth.
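A minimal sketch of the three-Boolean-column variant, assuming SQLite (through Python's sqlite3) as a stand-in for whatever database sits behind ActiveRecord. The column names come from the answer; the `name` column, the sample users and the `auth_types` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id               INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,             -- hypothetical column
    has_pwd_auth     INTEGER NOT NULL DEFAULT 0,
    has_fgnpr_auth   INTEGER NOT NULL DEFAULT 0,
    has_facerec_auth INTEGER NOT NULL DEFAULT 0
);
INSERT INTO users (name, has_pwd_auth, has_fgnpr_auth, has_facerec_auth) VALUES
    ('alice', 1, 0, 1),
    ('bob',   1, 0, 0);
""")

def auth_types(conn, name):
    """Read the three flags for one user in a single query, without joins."""
    row = conn.execute(
        "SELECT has_pwd_auth, has_fgnpr_auth, has_facerec_auth "
        "FROM users WHERE name = ?",
        (name,),
    ).fetchone()
    labels = ("password", "fingerprint", "face_recognition")
    return [label for label, flag in zip(labels, row) if flag]

alice_auth = auth_types(conn, "alice")
bob_auth = auth_types(conn, "bob")
print(alice_auth, bob_auth)
```

The trade-off is the one the answer names: a fourth authentication type means a schema change, whereas the lookup-plus-intermediate-table design only needs a new row.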

What is this form of database called?

I'm new to databases and I'm thinking of creating one for a website. I started with SQL, but I really am not sure if I'm using the right kind of database.
Here's the problem:
What I have right now is the first option. That means my query result looks something like this:
user_id  photo_id  photo_url
0        0         abc.jpg
0        1         123.jpg
0        2         lol.png
etc.. But to me that seems a little bit inefficient when the database becomes BIG. So the thing I want is the second option shown in the picture. Something like this, then:
user_id photos
0 {abc.jpg, 123.jpg, lol.png}
Or something like that:
user_id photo_ids
0 {0, 1, 2}
I couldn't find anything like that; I could only find ordinary SQL. Is there any way to do something like that^ (even if it isn't considered a "database")? If not, why is SQL more efficient for those kinds of situations? How can I make it more efficient?
Thanks in advance.
Your initial approach to having a user_id, photo_id, photo_url is correct. This is the normalized relationship that most database management systems use.
The following relationship is called "one to many," as a user can have many photos.
You may want to go as far as separating the photo details and just providing a reference table between the users and photos.
The reason your second approach is inefficient is because databases are not designed to search or store multiple values in a single column. While it's possible to store data in this fashion, you shouldn't.
If you wanted to locate a particular photo for a user using your second approach, you would have to search using LIKE, which will most likely not make use of any indexes. The process of extracting or listing those photos would also be inefficient.
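To make the LIKE problem concrete, here is a small sketch using SQLite through Python's sqlite3 as a stand-in for any SQL database. The table names `packed` and `photos`, the index name and the sample rows are assumptions made up for this example. It compares searching for photo id 1 in a packed text column with the same search on one row per photo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Second approach from the question: all ids packed into one text column.
CREATE TABLE packed (user_id INTEGER PRIMARY KEY, photo_ids TEXT NOT NULL);
INSERT INTO packed VALUES (0, '0,1,2'), (1, '10,11');

-- First approach: one row per (user, photo), with an index on photo_id.
CREATE TABLE photos (
    user_id  INTEGER NOT NULL,
    photo_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, photo_id)
);
CREATE INDEX idx_photos_photo_id ON photos (photo_id);
INSERT INTO photos VALUES (0, 0), (0, 1), (0, 2), (1, 10), (1, 11);
""")

# Substring search: '%1%' also matches '10' and '11', so user 1 is a false hit.
packed_hits = [r[0] for r in conn.execute(
    "SELECT user_id FROM packed WHERE photo_ids LIKE '%1%' ORDER BY user_id")]

# Equality search on a real column: exact, and able to use the index.
normalized_hits = [r[0] for r in conn.execute(
    "SELECT user_id FROM photos WHERE photo_id = ? ORDER BY user_id", (1,))]

print(packed_hits, normalized_hits)
```

Besides being slower, the packed column returns a wrong answer here unless the pattern is made more elaborate to respect the delimiters.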
You can read more about basic database principles here.
Your first example looks like a traditional relational database, where a table stores a single record per row in a standard 1:1 key-value attribute set. This is how data is stored in RDBMSs like Oracle, MySQL and SQL Server. Your second example looks more like a document database or NoSQL database, where data is stored in nested data objects (like hashes and arrays). This is how data is stored in database systems like MongoDB.
There are benefits and costs to storing data in either model. With relational databases, where data is spread across multiple tables and linked by keys, it is easy to get at data from multiple angles and aggregate it for multiple purposes. With document databases, data is typically more difficult to join in single queries, but much faster to retrieve, and also typically formatted for quicker application use.
For your application, the latter (document database model) might be best if you only care about referencing a user's images when you have a user ID. This would not be ideal for say, querying for all images of category 'profile pic' or for all images uploaded after a certain date. You could probably accomplish your task with either database type, and choosing the right database will always depend on the application(s) that it will be used for, but as a general rule-of-thumb, relational databases are more flexible and hard to go wrong with.
What you want (user -> (photo1, photo2, ...)) is essentially what an INDEX gives you:
When you run your query, the database looks up the user in the index on the photos table and gets the list of photos to fetch. It does not scan the whole table, so the lookup is optimised.
I would do something like this:
One User - One Photo
Use Users_Table, with all the columns that every user will have. If each user has only one photo, just add a photo_url column to this table.
One User - Many Photos
If one user can have multiple photos, create a separate table for photos that contains only the UserID from Users_Table, the Photo_ID and the Photo_File.
Many Users - Many Photos
If one photo can be assigned to multiple users, create a separate Photos table with Photo_ID and Photo_File, plus a third table User_Photos that holds the UserID from Users_Table and the Photo_ID from the Photos table.
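The many-to-many case can be sketched like this, with SQLite through Python's sqlite3 standing in for the real database. The table and column names follow the answer; the `UserName` column and the sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users_Table (
    UserID   INTEGER PRIMARY KEY,
    UserName TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE Photos (
    Photo_ID   INTEGER PRIMARY KEY,
    Photo_File TEXT NOT NULL
);
-- Link table: one row per (user, photo) assignment.
CREATE TABLE User_Photos (
    UserID   INTEGER NOT NULL REFERENCES Users_Table (UserID),
    Photo_ID INTEGER NOT NULL REFERENCES Photos (Photo_ID),
    PRIMARY KEY (UserID, Photo_ID)
);
INSERT INTO Users_Table VALUES (1, 'ann'), (2, 'ben');
INSERT INTO Photos VALUES (1, 'abc.jpg'), (2, '123.jpg');
INSERT INTO User_Photos VALUES (1, 1), (1, 2), (2, 2);
""")

# All photo files assigned to user 1.
photos_of_ann = [r[0] for r in conn.execute(
    """
    SELECT p.Photo_File
    FROM User_Photos AS up
    JOIN Photos AS p ON p.Photo_ID = up.Photo_ID
    WHERE up.UserID = ?
    ORDER BY p.Photo_ID
    """, (1,))]

# All users that photo 2 is assigned to.
users_of_photo_2 = [r[0] for r in conn.execute(
    """
    SELECT u.UserName
    FROM User_Photos AS up
    JOIN Users_Table AS u ON u.UserID = up.UserID
    WHERE up.Photo_ID = ?
    ORDER BY u.UserID
    """, (2,))]

print(photos_of_ann, users_of_photo_2)
```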

Sql design question - many tables or not?

15 ECTS credits worth of database design down the bin.. I really can't come up with the best design solution for my problem.
Which is this: Basically I'm making a tool that gathers a lot of information concerning the user. At most the user would fill in 50 fields of data, ranging from simple checkboxes to text input. I'm designing the db right now (with MySQL) and can't decide whether to use a single User table with all of those fields, or to have a table for each category of input.
One example would be "type of payment". This one has three options, and if I went the "table" way I would add a table paymentType and give it binary fields for each payment type. Then I would need an id table to identify which paymentType the user has chosen, whereas if I use a single user table, the data would already be there.
The site will probably see a lot of users (tv, internet and radio marketing) so I'm concerned which alternative would be the best.
I'll be happy to provide more details if you need more to base a decision.
Thanks for reading.
Read this article "Database Normalization Basics", and come back here if you still have questions. It should help a lot.
The most fundamental idea behind these decisions, as you will see in this article, is that each table should represent one and only one "thing", and each field should relate directly and only to that thing.
In your payment types example, it probably makes sense to break it out into a separate table if you anticipate the need to store additional information about each payment type.
Create your "Type of Payment" table; there's no real question there. That's proper normalization and the power behind using relational databases. One of the many reasons to do so is the ability to update a Type of Payment record and not have to touch the related data in your users table. Your join between the two tables will allow your app to see the updated type of payment info by changing it in just the 1 place.
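The "update in just one place" point can be illustrated with a short sketch. SQLite through Python's sqlite3 is used as a stand-in for MySQL, and the table names, column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE users (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    payment_type_id INTEGER NOT NULL REFERENCES payment_types (id)
);
INSERT INTO payment_types VALUES (1, 'Visa'), (2, 'Invoice');
INSERT INTO users VALUES (1, 'ann', 1), (2, 'ben', 1), (3, 'cat', 2);
""")

# Rename one payment type; no row in users is touched.
with conn:
    conn.execute("UPDATE payment_types SET name = ? WHERE id = ?", ("Visa Card", 1))

# The join picks up the new name for every user that references it.
users_with_payment = conn.execute(
    """
    SELECT u.name, pt.name
    FROM users AS u
    JOIN payment_types AS pt ON pt.id = u.payment_type_id
    ORDER BY u.id
    """).fetchall()

print(users_with_payment)
```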
Regarding your other fields, they may not be as clear cut. The question to ask yourself about each field is "does this field relate only to a user or does it have meaning and possible use in its own right?". If you can never imagine a field having meaning outside of the context of a user you're safe leaving it as a field on the user table, otherwise do the primary key-foreign key relationship and put the information in its own table.
If you are building a form with variable inputs, I wouldn't recommend building it as one table. This is inflexible and dirty.
Normalization is the key. However, if you end up with a key/value setup, or effectively a scalar type implementation across many tables, you may hit a performance boundary if you can't cache
a) the form definition from table data and
b) the joined result of storage (either a caching view or otherwise),
or if you
c) don't build in proper sharding.
In this KVP setup, you might want to look at something like CouchDB or a less table-driven storage format.
You may also want to look at trickier setups such as serialized object storage and cache-tables if your internal data is heavily relative to other data already in the database
50 columns is a lot. Have you considered a table that stores values like a property sheet? This would only be useful if you didn't need to regularly query the values it contains.
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'PaymentType', 'Visa');
INSERT INTO UserProperty(UserID, Name, Value)
VALUES(1, 'TrafficSource', 'TV');
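A rough sketch of how such a property sheet might be read back, using SQLite through Python's sqlite3 as a stand-in. The table and its two rows come from the answer; the column types, the primary key and the helper names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserProperty (
    UserID INTEGER NOT NULL,
    Name   TEXT NOT NULL,
    Value  TEXT NOT NULL,
    PRIMARY KEY (UserID, Name)      -- assumed: one value per property per user
);
INSERT INTO UserProperty (UserID, Name, Value) VALUES (1, 'PaymentType', 'Visa');
INSERT INTO UserProperty (UserID, Name, Value) VALUES (1, 'TrafficSource', 'TV');
""")

def load_properties(conn, user_id):
    """Collect all property rows of one user into a dict."""
    rows = conn.execute(
        "SELECT Name, Value FROM UserProperty WHERE UserID = ?", (user_id,))
    return dict(rows.fetchall())

def users_with(conn, name, value):
    """Filtering by a property means matching on both Name and Value."""
    return [r[0] for r in conn.execute(
        "SELECT UserID FROM UserProperty WHERE Name = ? AND Value = ? "
        "ORDER BY UserID", (name, value))]

props = load_properties(conn, 1)
visa_users = users_with(conn, "PaymentType", "Visa")
print(props, visa_users)
```

Loading one user's sheet is cheap, but every additional filter condition needs another pass over (or self-join on) the same table, which is why the answer limits this design to values that are not queried regularly.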
I think I figured out a great way of solving this. Thanks to a friend of mine for suggesting this!
I have three tables, Field {IdField, FieldName, FieldType}, FieldInput {IdInput, IdField, IdUser} and User { IdUser, UserName... etc }
This way it becomes very easy to see what a user has answered, the solution is somewhat scalable and it provides a good overview. I will constrain the alternatives in another layer, farther away from the db. I believe it's a tradeoff worth doing.
Any suggestions or criticism of this solution?

Facebook database design?

I have always wondered how Facebook designed the friend <-> user relation.
I figure the user table is something like this:
user_email PK
user_id PK
password
I figure there is a table with the user's data (sex, age, etc.), connected via the user email, I would assume.
How does it connect all the friends to this user?
Something like this?
user_id
friend_id_1
friend_id_2
friend_id_3
friend_id_N
Probably not. Because the number of users is unknown and will expand.
Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.
Somewhat useful example:
Table Name: User
Columns:
UserID PK
EmailAddress
Password
Gender
DOB
Location
TableName: Friends
Columns:
UserID PK FK
FriendID PK FK
(This table features a composite primary key made up of the two foreign
keys, both pointing back to the user table. One ID will point to the
logged in user, the other ID will point to the individual friend
of that user)
Example Usage:
Table User
--------------
UserID  EmailAddress  Password  Gender  DOB       Location
------------------------------------------------------------
1       bob@bob.com   bobbie    M       1/1/2009  New York City
2       jon@jon.com   jonathan  M       2/2/2008  Los Angeles
3       joe@joe.com   joseph    M       1/2/2007  Pittsburgh
Table Friends
---------------
UserID  FriendID
----------------
1       2
1       3
2       3
This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two ways, so you would not need a row in the table such as (2,1) or (3,2) because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two way, you would need to also have those rows to indicate the two-way relationship.
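When each friendship is stored only once, as in the example rows above, looking up a user's friends has to check both columns. A minimal sketch, assuming SQLite through Python's sqlite3 as a stand-in database; the `friends_of` helper is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friends (
    UserID   INTEGER NOT NULL,
    FriendID INTEGER NOT NULL,
    PRIMARY KEY (UserID, FriendID)
);
-- Same rows as the example: each pair is stored in one direction only.
INSERT INTO Friends VALUES (1, 2), (1, 3), (2, 3);
""")

def friends_of(conn, user_id):
    """The user may sit in either column, so combine both directions."""
    rows = conn.execute(
        """
        SELECT FriendID FROM Friends WHERE UserID = ?
        UNION
        SELECT UserID FROM Friends WHERE FriendID = ?
        ORDER BY 1
        """,
        (user_id, user_id),
    )
    return [r[0] for r in rows]

friends_of_jon = friends_of(conn, 2)
print(friends_of_jon)
```

Jon (2) appears as UserID in one row and as FriendID in another, and the UNION returns both Bob (1) and Joe (3).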
TL;DR:
They use a stack architecture with cached graphs for everything above the MySQL bottom of their stack.
Long Answer:
I did some research on this myself because I was curious how they handle their huge amount of data and search it in a quick way. I've seen people complaining about custom made social network scripts becoming slow when the user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections - not even trying to bother about group permissions and likes and wall posts - it quickly turned out that this approach is flawed. So I've spent some time searching the web on how to do it better and came across this official Facebook article:
TAO: Facebook’s Distributed Data Store for the Social Graph
TAO: The power of the graph.
I really recommend watching the presentation at the first link above before you continue reading. It's probably the best explanation of how FB works behind the scenes you can find.
The video and article tell you a few things:
They're using MySQL at the very bottom of their stack
Above the SQL DB there is the TAO layer which contains at least two levels of caching and is using graphs to describe the connections.
I could not find anything on what software / DB they actually use for their cached graphs
Let's take a look at this, friend connections are top left:
Well, this is a graph. :) It doesn't tell you how to build it in SQL; there are several ways to do it, but this site has a good number of different approaches. Attention: Consider that a relational DB is what it is: it's meant to store normalised data, not a graph structure. So it won't perform as well as a specialised graph database.
Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.
I can't tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.
Here is my disappointing test for just finding friends of friends:
DB Schema:
CREATE TABLE IF NOT EXISTS `friends` (
`id` int(11) NOT NULL,
`user_id` int(11) NOT NULL,
`friend_id` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Friends of Friends Query:
(
  select friend_id
  from friends
  where user_id = 1
) union (
  select distinct ff.friend_id
  from friends f
  join friends ff on ff.user_id = f.friend_id
  where f.user_id = 1
)
I really recommend that you create some sample data with at least 10k user records, each of them having at least 250 friend connections, and then run this query. On my machine (i7 4770k, SSD, 16 GB RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized; I'm not a DB genius (suggestions are welcome). However, if this scales linearly you're already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.
This might still sound OKish for ~100k users, but consider that you just fetched friends of friends and didn't do any more complex query like "display only posts from friends of friends + do the permission check on whether I'm allowed or NOT allowed to see some of them + do a sub query to check if I liked any of them". You want to let the DB check whether you already liked a post, or you'll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.
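For anyone who wants to try the friends-of-friends query on a tiny data set first, here is a sketch that runs it in SQLite through Python's sqlite3. Several things are assumptions and not part of the benchmark above: SQLite stands in for MySQL, the parentheses around the two SELECTs are dropped because SQLite does not accept them in a UNION, the composite index is an addition of mine, and the six sample rows are made up. It says nothing about timing at scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    friend_id INTEGER NOT NULL
);
-- Assumed index so both lookups can be answered from (user_id, friend_id).
CREATE INDEX idx_friends_user_friend ON friends (user_id, friend_id);
INSERT INTO friends (user_id, friend_id) VALUES
    (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 6);
""")

def friends_and_friends_of_friends(conn, user_id):
    """Direct friends plus their friends, de-duplicated by UNION."""
    rows = conn.execute(
        """
        SELECT friend_id
        FROM friends
        WHERE user_id = ?
        UNION
        SELECT DISTINCT ff.friend_id
        FROM friends AS f
        JOIN friends AS ff ON ff.user_id = f.friend_id
        WHERE f.user_id = ?
        """,
        (user_id, user_id),
    ).fetchall()
    return sorted(r[0] for r in rows)

reachable = friends_and_friends_of_friends(conn, 1)
print(reachable)
```

User 6 is three hops away from user 1 and is therefore not part of the result.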
I think my answer answers the question how Facebook designed their friends relationship very well but I'm sorry that I can't tell you how to implement it in a way it will work fast. Implementing a social network is easy but making sure it performs well is clearly not - IMHO.
I've started experimenting with OrientDB to do the graph-queries and mapping my edges to the underlying SQL DB. If I ever get it done I'll write an article about it.
How can I create a well performing social network site?
Update 2021-04-10: I'll probably never ever write the article ;) but here are a few bullet points how you could try to scale it:
Use different read and write repositories
Build specific read repositories based on faster non-relational DB systems made for that purpose, don't be afraid of denormalizing data. Write to a normalized DB but read from specialized views.
Use eventual consistency
Take a look at CQRS
For a social network, graph-based read repositories might also be a good idea.
Use Redis as a read repository in which you store whole serialized data sets
If you combine the points from the above list in a smart way you can build a very well performing system. The list is not a "todo" list; you'll still have to understand, think and adapt it! https://microservices.io/ is a nice site that covers a few of the topics I mentioned before.
What I do is store events that are generated by aggregates and use projections and handlers to write to different DBs as mentioned above. The cool thing about this is that I can rebuild my data as needed at any time.
Have a look at the following database schema, reverse engineered by Anatoly Lubarsky:
My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.
Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.
It's most likely a many to many relationship:
FriendList (table)
user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel
EDIT
The user table probably doesn't have user_email as a PK, possibly as a unique key though.
users (table)
user_id PK
user_email
password
Take a look at these articles describing how LinkedIn and Digg are built:
http://hurvitz.org/blog/2008/06/linkedin-architecture
http://highscalability.com/scaling-digg-and-other-web-applications
There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:
http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html
Also, there's this article that talks about non-relational databases and how they're used by some companies:
http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php
You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher-level concepts that most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.
There are a lot of links on the first two articles that should give you some more insight.
UPDATE 10/20/2014
Murat Demirbas wrote a summary on
TAO: Facebook's distributed data store for the social graph (ATC'13)
F4: Facebook's warm BLOB storage system (OSDI'14)
http://muratbuffalo.blogspot.com/2014/10/facebooks-software-architecture.html
HTH
It's not possible to retrieve a user's friends data from an RDBMS in constant time for data sets of more than half a billion,
so Facebook implemented this using a hash database (NoSQL), and they open-sourced that database under the name Cassandra.
So every user has its own key and the friends details in a queue; to see how Cassandra works, look at this:
http://prasath.posterous.com/cassandra-55
It's a type of graph database:
http://components.neo4j.org/neo4j-examples/1.2-SNAPSHOT/social-network.html
It's not related to relational databases.
Google for graph databases.
You're looking for foreign keys. Basically you can't have an array in a database unless it has its own table.
Example schema:
Users Table
userID PK
other data
Friends Table
userID -- FK to the Users table, representing the user that has a friend.
friendID -- FK to the Users table, representing the user id of the friend
Probably there is a table, which stores the friend <-> user relation, say "frnd_list", having fields 'user_id','frnd_id'.
Whenever a user adds another user as a friend, two new rows are created.
For instance, suppose my id is 'deep9c' and I add a user having id 'akash3b' as my friend, then two new rows are created in table "frnd_list" with values ('deep9c','akash3b') and ('akash3b','deep9c').
Now when showing the friends-list to a particular user, a simple sql would do that: "select frnd_id from frnd_list where user_id="
where is the id of the logged-in user (stored as a session-attribute).
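The two-rows-per-friendship scheme described above can be sketched as follows. SQLite through Python's sqlite3 is only a stand-in; the table and column names and the two user ids are taken from the answer, while the column types, the primary key and the helper names `add_friend` and `friends_of` are assumptions. Both rows are written in one transaction so the relation cannot end up one-sided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE frnd_list (
    user_id TEXT NOT NULL,
    frnd_id TEXT NOT NULL,
    PRIMARY KEY (user_id, frnd_id)
)
""")

def add_friend(conn, user_id, frnd_id):
    """Insert both directions of the friendship atomically."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.executemany(
            "INSERT INTO frnd_list (user_id, frnd_id) VALUES (?, ?)",
            [(user_id, frnd_id), (frnd_id, user_id)],
        )

def friends_of(conn, user_id):
    """The simple lookup: only one column needs to be checked."""
    return [r[0] for r in conn.execute(
        "SELECT frnd_id FROM frnd_list WHERE user_id = ? ORDER BY frnd_id",
        (user_id,))]

add_friend(conn, "deep9c", "akash3b")
deep_friends = friends_of(conn, "deep9c")
akash_friends = friends_of(conn, "akash3b")
print(deep_friends, akash_friends)
```

Compared with storing each pair once, this doubles the row count but keeps the friends-list query to a single indexed lookup.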
Regarding the performance of a many-to-many table, if you have 2 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB.
Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.
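The storage estimate above can be checked with a few lines of arithmetic. This sketch counts only the raw payload of two 32-bit integers per row, one row per (user, friend) pair; row headers, indexes and replication are deliberately ignored, so a real table would be larger.

```python
# Figures taken from the estimate above.
users = 200_000_000
avg_friends = 200
bytes_per_row = 2 * 4            # two 32-bit integers: user id and friend id

total_rows = users * avg_friends
total_bytes = total_rows * bytes_per_row

gb = total_bytes / 10**9         # decimal gigabytes
gib = total_bytes / 2**30        # binary gigabytes (GiB)

print(f"{total_rows:,} rows, {gb:.0f} GB, {gib:.1f} GiB")
```

The result is 320 GB in decimal units, or about 298 GiB, which matches the "just under 300GB" figure when it is read in binary units.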

How can I automatically determine table schema(s) from a set of queries?

Is there any tool which will take a set of CRUD queries and generate a 'good enough'
table schema for that set?
E.g. I can provide input like this:
insert username, password
insert username, realname
select password where username=?
update password where username=?
update realname where username=?
With this input, the tool should be able to make 1, 2 or 3 tables, and take care of _id's
and indexing.
To put it another way, I'm looking for a tool with which I can design a set of queries assuming a single table with infinitely many columns, and the tool processes them and generates the actual databases/tables/columns, plus a high-level language module with a function call for each query.
Oh yes, I'm trying to fire my db designer (-:
Have you considered using an ORM solution like Hibernate? This requires an initial set of mappings between the application class model (for example the User class) and the database schema representation (e.g. the USER table).
An ORM solution may support advanced mapping scenarios where an object maps to more than one table in the schema. Also, newer versions of Hibernate support generating the database schema from the mappings (search for the hbm2ddl tool).
You're asking for the impossible.
How would the tool know that username should have an index on it, much less a unique index?
How would it know the data types of the columns?
How would it know any domain constraints — for example, a hypothetical sex column must be either male or female, not crimson?
Wouldn't it be pretty vulnerable to typos, leaving you with a username and a user_name column?
Databases require design for a reason (well, many). Questions of normalization, for example, are going to be very difficult for a tool, which can't understand your problem domain, to answer.
That said, what you're asking for is, as Aleris answered, an ORM, although it isn't automatic. You didn't specify which language you are using, but surely there is one (or more) for yours.