I know that, generally speaking, if a one-to-one relationship exists between two documents I should consider embedding one document within the other. I do, however, have a few scenarios where this doesn't feel right, mainly when I need to query on properties of the embedded document. What I have done instead is to relate the ids (primary keys) of the two documents by convention.
For example, a User has a PasswordResetLog. The user and the log are represented by separate documents. If the user document id is 'users/123' then the corresponding password reset log document id is 'passwordresetlog/123'.
Since I almost always have access to the userId I can easily load documents associated with it in this manner.
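To make the convention concrete, here is a minimal sketch in Python. The helper name `related_id` is made up for illustration and is not part of any database client; it only shows how the related id can be derived from the user id without a query.

```python
def related_id(user_id: str, collection: str) -> str:
    """Derive a related document id from a user id such as 'users/123'."""
    # Split 'users/123' into the collection prefix and the numeric key.
    _prefix, sep, key = user_id.partition("/")
    if not sep or not key:
        raise ValueError(f"unexpected id format: {user_id!r}")
    # Reuse the same key under the related collection's prefix.
    return f"{collection}/{key}"


print(related_id("users/123", "passwordresetlog"))
```

The derived id can then be passed straight to a load-by-id call, which is why no extra index is needed for this lookup.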
My first question is: will this create fragmented indexes for the documents where I specifically set the ids by convention? The document ids are sequential, but I cannot always guarantee that they'll be created in sequential order.
My second question is: Instead of using this convention, should I just add a property UserId on each document that is in a one-to-one relationship with User, and add an index for this property?
Related
I've been trying to get my head around NoSQL, and I do see the benefits to embedding data in documents.
What I can't understand, and hope someone can clear up, is how to store data if it must be relational.
For example.
I have many users. They are all buying a product. So every time they buy a product, we add it under the user's document in Mongo, so it's embedded and it's all great.
The problem I have is when something in reference to that product changes.
Let's say user A buys a car called "Porsche". Then, we add a reference to that under the user's profile. However, in a strange turn of events Porsche gets purchased by Ferrari.
What do you do now, update each and every record and change the name from Porsche to Ferrari?
Typically in SQL, we would create 3 tables. One for users, one for Cars (description, model etc) & one for mapping users to purchases.
Do you do the same thing for Mongo? It seems like if you go down this route, you are trying to make Mongo do things the SQL way, which is not what it's intended for.
I can understand how certain data is great for embedding (addresses, contact details, comments, etc.) but what happens when you need to reference data that can and needs to change on a regular basis?
I hope this question is clear.
DBRefs and manual references were made specifically to solve this issue. Instead of copying the data into each document and then needing to update every copy when something changes, you can store a reference to a document in another collection. Here is the MongoDB documentation for details.
References in Mongo
Then all you would need to do is update the reference collection and the change would be reflected in all downstream locations.
When I used the Mongoose library for Node.js, the result was three collections, similar to the three tables you might create in SQL. You can use ObjectIds as foreign keys and enrich them either on the client side or on the backend. There is still no joining, but you can run an 'in' query for the ids and then enrich the objects that way. Mongoose can do this automatically by 'populating'.
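As a rough illustration of the reference approach described in the answers above, here is a sketch in plain Python. The dictionaries stand in for two collections, and the names `cars`, `users` and `populate` are hypothetical; this is not the MongoDB or Mongoose API, only the idea behind it.

```python
# Hypothetical in-memory stand-ins for two collections.
cars = {
    "car1": {"_id": "car1", "name": "Porsche"},
    "car2": {"_id": "car2", "name": "Fiat"},
}
users = {
    "userA": {"_id": "userA", "purchases": ["car1"]},
    "userB": {"_id": "userB", "purchases": ["car1", "car2"]},
}


def populate(user: dict, car_collection: dict) -> dict:
    """Return a copy of the user with purchase ids replaced by car documents."""
    # Equivalent in spirit to an 'in' query on the ids followed by enrichment.
    resolved = [car_collection[car_id] for car_id in user["purchases"]]
    return {**user, "purchases": resolved}


# The rename happens in exactly one place: the referenced car document.
cars["car1"]["name"] = "Ferrari"

for user in users.values():
    names = [car["name"] for car in populate(user, cars)["purchases"]]
    print(user["_id"], names)
```

Because users store only ids, no user document has to be touched when the car's name changes.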
I'm working on an app where part of it involves people liking and commenting on pictures other people posted. Obviously I want the user to be notified when someone comments on or likes their picture, but I also want that user to be able to see the pictures that they posted. This brings up a couple of structuring questions.
I have a table that stores an image with its ID, the image, other info such as likes/comments, date posted info, and finally the userID of the user that posted the image:
Here's that table structure:
Image Posts Table: |postID|image|misc. image info|userID|
The userID is used to grab information from the user's entry in the user table for notifications. Now when that user looks at a page containing his own posts I have two options:
1.) Query the Image Posts Table for any image containing that user's userID.
2.) Create a table for each user and put a postID of each image they posted :
Said User's Table: |postID|
I would assume that the second option would be more efficient because I don't have to query a table with a large amount of entries. Are there any more efficient ways to do this?
Obviously I should read up on good database design so do any of you have any good recommendations?
Multiple tables of identical structure almost never make sense. Writing queries using your 2nd option would become ugly in short order. Stick with one large table; databases are designed to handle tables with many rows.
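To show what option 1 looks like in practice, here is a small sketch using Python's built-in `sqlite3` module. The table and column names (`image_posts`, `post_id`, `user_id`) and the index name are assumptions chosen for the example, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_posts ("
    " post_id INTEGER PRIMARY KEY,"
    " image TEXT NOT NULL,"
    " user_id INTEGER NOT NULL)"
)
# An index on user_id lets the lookup avoid scanning the whole table.
conn.execute("CREATE INDEX idx_image_posts_user ON image_posts (user_id)")
conn.executemany(
    "INSERT INTO image_posts (image, user_id) VALUES (?, ?)",
    [("a.png", 1), ("b.png", 2), ("c.png", 1)],
)


def posts_for_user(connection: sqlite3.Connection, user_id: int) -> list:
    """Fetch all posts made by one user from the single shared table."""
    cursor = connection.execute(
        "SELECT post_id, image FROM image_posts WHERE user_id = ? ORDER BY post_id",
        (user_id,),
    )
    return cursor.fetchall()


print(posts_for_user(conn, 1))
```

With the index in place, the size of the table matters much less than the number of rows that belong to the user being queried.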
I would recommend against manually storing the userID, as Parse will do its own internal magic if you just set a property called user to the current user. Internally it stores the ID and marks it as an object reference. They may or may not have extra optimizations in there for query performance.
Given that the system is designed around the concept of references, you should keep to just the two tables/classes you mentioned.
When you query the Image Posts table you can just add a where expression using the current user (again it internally gets the ID and searches on that). It is a fully indexed search so should perform well.
The other advantage is that when you query the Image Posts table you can use the include method to include the User object it is linked to, avoiding a 2nd query. This is only available if you store a reference instead of manually extracting and storing the userID.
Have a look at the AnyPic sample app on the tutorial page as it is very similar to what you mention and will demonstrate the ideas.
Let's say I have a table EmailQueue that is used to build out emails to send to users, using several unrelated processes. The emails can contain an ever-growing number of different 'content items', but for now let's just say we have News, Events, and Offers. Each of these 'content items' is already populated in its own respective table and will be selectively added to a user's email.
Now I have a design decision to make.
1
I could keep with a normalized pattern and create a mapping table for each of the 'content items' that an email can contain.
|EmailId|NewsId| , |EmailId|OfferId| , ...
The main issue I see with this design is that there is a good bit of overhead every time a new 'content type' is integrated into the email system, both in the database and in the object mapping.
OR
2
I could create 1 mapping table that has a Type reference.
|EmailId|ContentID|ContentType|
Of course the big issue here is that there is no referential integrity. I feel object mapping would be much easier to handle and adding a new object only requires adding a new ContentType row (and of course the required object mapping code).
Is one of these solutions better than the other? Or is there a solution better than both of these that I am unaware of?
I'm leaning towards using method 2, mainly because this project needs to be rapidly developed, but worried I may regret that decision down the road.
Note: We are using SubSonic as our data access ORM, which does a decent (not perfect) job of handling object graphs through keyed relations. I will still likely need to map the active record 'content' objects to domain objects though.
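For reference, option 2 could be sketched like this with Python's built-in `sqlite3` module. The table names `content_types` and `email_content` are assumptions for the example. Note that only the ContentType value is constrained here; ContentID still has no foreign key to the News, Events or Offers tables, which is exactly the referential integrity gap mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE content_types (name TEXT PRIMARY KEY);
    CREATE TABLE email_content (
        email_id     INTEGER NOT NULL,
        content_id   INTEGER NOT NULL,
        content_type TEXT NOT NULL REFERENCES content_types(name),
        PRIMARY KEY (email_id, content_id, content_type)
    );
    INSERT INTO content_types (name) VALUES ('News'), ('Events'), ('Offers');
    """
)

# A known content type is accepted.
conn.execute("INSERT INTO email_content VALUES (1, 42, 'News')")

# An unknown content type is rejected by the foreign key on content_type.
try:
    conn.execute("INSERT INTO email_content VALUES (1, 7, 'Coupons')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("unknown type rejected:", rejected)
```

Adding a new content type is then a single row in `content_types`, at the cost of the database being unable to verify that a given ContentID exists in the matching content table.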
I am working on a design where I can have flexible attributes for users and I am confused how to continue the design of the schema.
I made a table where I kept system needed information:
Table name: users
id
username
password
Now, I wish to create a profile table with a one-to-one relation to users, holding all the other attributes such as email, first name, last name, etc. My question is: is there a way to add a third table that makes profiles flexible? In other words, if my client needs to create a new attribute, he/she won't need any changes to the code.
You're looking for a normalized table. That is a table that has user_id, key, value columns which produce a 1:N relationship between User & this new table. Look into http://en.wikipedia.org/wiki/Database_normalization for a little more information. Performance isn't amazing with normalized tables and it can take some interesting planning for optimization of your code but it's a very standard practice.
Keep the fixed parts of the profile in a standard table to make it easy to query, add constraints, etc.
For the configurable parts it sounds like you are looking for an entity-attribute-value model. The extra configurability comes at a high cost though: everything will have to be stored as strings and you will have to do any data validation in the application, not in the database.
How will these attributes be used? Are they simply a bag of data or would the user expect that the system would do something with these values? Are there ever going to be any reports against them?
If the system must do something with these attributes then you should make them columns since code will have to be written anyway that does something special with the values. However, if the customers just want them to store data then an EAV might be the ticket.
If you are going to implement an EAV, I would suggest adding a DataType column to your attributes table. This enables you to do some rudimentary validation on the entered data and dynamically change the control used for entry.
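A minimal sketch of that DataType idea, in Python: the attribute definitions below are hypothetical and would normally be rows in the attributes table rather than a literal in code. Values are stored as strings and parsed according to the declared type.

```python
from datetime import date

# Hypothetical attribute definitions: attribute name -> declared data type.
# In a real EAV these would be loaded from the attributes table.
ATTRIBUTE_TYPES = {"nickname": "string", "age": "int", "birthday": "date"}

# One parser per supported data type; each raises ValueError on bad input.
PARSERS = {"string": str, "int": int, "date": date.fromisoformat}


def validate(attribute: str, raw_value: str):
    """Parse a stored string value according to the attribute's data type."""
    data_type = ATTRIBUTE_TYPES[attribute]
    return PARSERS[data_type](raw_value)


print(validate("age", "42"))
print(validate("birthday", "2001-02-03"))
```

The validation code never names a particular attribute; it only dispatches on the data type, so new attributes can be added as data without a code change.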
If you are going to use an EAV, then the one rule you must follow is to never write any code where you specify a particular attribute. If these custom attributes are nothing more than a wad of data, then an EAV for this one portion of your system will work. You could even consider creating an XML column to store these attributes. SQL Server actually has an XML data type but all databases have some form of large text data type that will also work. On reports, the data would only ever be spit out. You would never place specific values in specific places on reports nor would you ever do any kind of numerical operation against the data.
The price of an EAV is vigilance and discipline. You, the other developers, and especially the report writers must have the discipline never to filter on a specific attribute, no matter how much pressure you get from management. The moment a client wants to filter or do operations on a specific attribute, it must become a first-class attribute as a column. If you feel that this kind of discipline cannot be maintained, then I would simply create columns for each attribute, which would mean an adjustment to code but will create less of a mess down the road.
Let's say I have a big Solr index that holds ~150M documents.
I also have 100,000 users, and each user has documents that he or she has saved.
My questions:
What is the best way to store those document IDs (the documents that each user saved)?
If I decide to store the IDs in Mongo or MySQL, what is the best way to allow users to perform searches on their documents, given that I store only the IDs in Mongo/MySQL but the actual information is in Solr?
Thanks.
You can add a field username_s to each document that is being indexed. This field contains the username of the user who may access the document. You can also use an array of users if you would like to give more people access to the document.
Then in your backend you can add &fq=username_s:User to the query. Even if there are 100 million documents indexed, only those that belong to the user are returned.
/core/select?q=*:*&fq=username_s:<User>
You could store all documents for all users in the same core. If you leave the field "id" blank, a unique id can be generated for you by Solr, provided it is configured to do so (for example with a UUID update processor).
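As a sketch of how a backend might build that request, here is a small Python helper. The function name `user_query_url` and the base URL are assumptions for the example; only the `q` and `fq` parameters and the `username_s` field come from the answer above.

```python
from urllib.parse import urlencode


def user_query_url(base_url: str, username: str, query: str = "*:*") -> str:
    """Build a Solr select URL restricted to one user's documents."""
    # Quote the username so spaces or special characters stay inside the term.
    escaped = username.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": query, "fq": f'username_s:"{escaped}"'}
    # urlencode percent-encodes the parameter values for use in a URL.
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core location; adjust to your own Solr host and core name.
print(user_query_url("http://localhost:8983/solr/core", "alice"))
```

Keeping the user restriction in `fq` rather than in `q` means it does not affect relevance scoring, and Solr can cache the filter separately from the main query.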