Best Practice for Storing this in MongoDB?

As part of a research project for work, I am doing some (greatly simplified) comparisons between SQL Server and MongoDB. I am familiar with SQL Server, but this is my first foray into Mongo. I am curious what the best practice is for the following.
Imagine a Blog site. Users can log in and post blogs. They can also comment on blogs, or reply to other comments. The table structure in SQL looks roughly like this:
Users
========
id
Name
Password

Blogs
=========
id
Title
Content
AuthorUserId

Comments
=========
id
Content
AuthorUserId
BlogId
ParentCommentId
Straightforward enough, I guess. If ParentCommentId is NULL, the comment is a direct reply to the Blog; otherwise, ParentCommentId points to the comment being replied to.
I use a nifty little recursive function to delete a comment that's far down in the tree, which also deletes any child comments associated with it.
So in Mongo, I currently have a Users collection with the same fields.
The part I'm wondering about is Blogs/Comments.
My initial impulse was to store comments as a Subcollection of the Blog. The problem comes when Comments start replying to other Comments. There is no practical limit on the "depth" of a reply tree. So if I store Comment replies as a Subcollection of a Comment, and so forth, even after say 4 replies, we are at: Blog -> Comment -> Comment -> Comment -> Comment -> Comment, or a Sub-sub-sub-sub-subcollection.
Since Mongo doesn't seem to have a recursive query/delete, this gets unworkable fast.
So here's where I'm stumped. Aside from working through some basic tutorials, I have never worked with Mongo before, so I'm not really sure how to accomplish this without mimicking a relational structure, which seems to be defeating the purpose of using a non-relational database a bit.
So uh. Help?
TIA.
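One commonly described alternative to nesting is to keep comments in their own flat collection, where each comment stores its parent's id plus an array of all its ancestors' ids. The sketch below models that shape with plain Python dicts, so it runs without a MongoDB server. The field names (_id, blog_id, parent_id, ancestors) and the helpers add_comment and delete_subtree are assumptions for illustration, not something from the question.

```python
def add_comment(comments, comment_id, blog_id, author_user_id, content, parent_id=None):
    """Append a comment document to the flat list that stands in for a collection."""
    ancestors = []
    if parent_id is not None:
        parent = next((c for c in comments if c["_id"] == parent_id), None)
        if parent is None:
            raise KeyError(f"unknown parent comment {parent_id}")
        # a reply inherits its parent's ancestors and adds the parent itself
        ancestors = parent["ancestors"] + [parent_id]
    doc = {
        "_id": comment_id,
        "blog_id": blog_id,
        "author_user_id": author_user_id,
        "content": content,
        "parent_id": parent_id,
        "ancestors": ancestors,
    }
    comments.append(doc)
    return doc


def delete_subtree(comments, comment_id):
    """Remove a comment and all of its descendants; return how many were removed."""
    before = len(comments)
    # No recursion needed: a descendant is any document that lists
    # comment_id among its ancestors. Against a real MongoDB collection
    # the same idea would be a filter along the lines of
    # {"$or": [{"_id": comment_id}, {"ancestors": comment_id}]}.
    comments[:] = [
        c for c in comments
        if c["_id"] != comment_id and comment_id not in c["ancestors"]
    ]
    return before - len(comments)
```

The trade-off is that each comment carries a little redundant data, but reading or deleting a whole reply tree becomes a single flat filter no matter how deep the thread is.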

Related

2sxc Knowledge Management solution hurdles

I'm evaluating 2sxc as a possible platform for implementing a knowledge management solution but we're in a bit of a rush. Our alternative is DNN Live Articles.
So far I really like the look of 2sxc, but I have questions regarding our possible use of it.
The main questions I have are around hierarchical lists like nested Categories and permissions.
I've installed some of the apps, such as FAQs with Categories, but I can't find anything yet where the categories are nested. I tried creating a Content Type with two fields: the first is the Category Name and the second is the Parent Category. For the second I created a new Content Type Field with a Data Type of Entity, but the only options for Input Type are default and Content Block Items. It works, but when you create a new category, the list that comes up in the Parent Category field covers just about everything. I'm not sure I understand the concept behind this.
The second issue is permissions. Does this system somehow incorporate permissions? We'd like to lock down knowledge articles by category, but I haven't seen any implementations that showcase how one would do this.

Regarding #1 I don't understand your question, sorry :)
Regarding #2: there is no rule-based security, so you can't say "items with category X may be edited, but category Y may not"
BUT: you can easily implement this in your UI, if your main concern is user guidance and not "bad people with very good IT skills"

Best solution for relational db design with different types of content

Edit: title is pretty broad. It would be more accurate as "Best solution for relational db design where users can post content, and the available fields would vary depending on the type of content".
I'm working on a website where users can post different types of content. All posts have text and a feed_id value relating it to another (many posts to one feed). Based on the idea that some posts will only contain text, what would be the best solution?
Ideas I've thought of so far:
1. Add a table for each special type that refers to a post, leaving unreferenced posts to be text posts.
   Problem: How would text-only posts be queried?
2. Add a table for each type, including text posts.
   Problem: Doesn't seem very efficient. For every post found in text_posts, the post would again have to be found in posts to get the text field. Then again, this was already the case with special types like pictures. Is there a proper way to accomplish this with JOIN?
3. Add a foreign key in each post entry to each special type, which can be null.
   Problem: Lots of null fields, and rules have to be maintained by the application.
4. Add a field representing the type of post.
   Cons: Rules have to be maintained by the application (case sensitivity, posts should only have a valid type), and it is harder to query.
   Pro: This field could also represent different views for similar content (i.e. I could make a value for post_type called "blog_entry" which is exactly the same as a text post, but have the application show the content differently). This pro has a bit of a code smell to it, though...
Also, if it makes any difference, my application is written in PHP.
Edit: It seems that my first solution, "Add a table for each special type that refers to a post, leaving unreferenced posts to be text posts", would work fine if I queried all posts like this:
SELECT posts.title, posts.text, picture_posts.id, picture_posts.src FROM posts
LEFT JOIN picture_posts
ON picture_posts.post_id=posts.id
ORDER BY posts.date DESC
and then some pseudo-code for the application would look like this:
print(posts.title, posts.text);
if (picture_posts.id is not null) {
    showPicture(picture_posts.src);
}
Should I use this design?
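For what it's worth, the LEFT JOIN approach from the edit can be tried end to end. This is a rough sketch in Python with the built-in sqlite3 module (the question's application is PHP, so treat it as a stand-in). The column list for posts and picture_posts is guessed from the query above, and make_posts_db and load_feed are hypothetical helper names.

```python
import sqlite3


def make_posts_db() -> sqlite3.Connection:
    """Create posts plus one special-type table (picture_posts)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            feed_id INTEGER NOT NULL,
            title TEXT NOT NULL,
            text TEXT NOT NULL,
            date TEXT NOT NULL
        );
        CREATE TABLE picture_posts (
            id INTEGER PRIMARY KEY,
            post_id INTEGER NOT NULL UNIQUE REFERENCES posts(id),
            src TEXT NOT NULL
        );
        """
    )
    return conn


def load_feed(conn: sqlite3.Connection) -> list:
    """Return all posts, newest first, with picture data where it exists."""
    rows = conn.execute(
        """
        SELECT posts.title, posts.text,
               picture_posts.id AS picture_id, picture_posts.src
        FROM posts
        LEFT JOIN picture_posts ON picture_posts.post_id = posts.id
        ORDER BY posts.date DESC
        """
    ).fetchall()
    feed = []
    for title, text, picture_id, src in rows:
        item = {"title": title, "text": text}
        # picture_id is NULL (None) for plain text posts
        if picture_id is not None:
            item["picture_src"] = src
        feed.append(item)
    return feed
```

Text-only posts simply come back with NULLs in the picture columns, which answers the "how would text-only posts be queried" worry for a single special type. Each additional special type would need its own LEFT JOIN.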

That's a very open-ended question. If you are doing this as an academic exercise, I'd recommend reading about database normalization and database normal forms. If you are doing this to produce a functional website, it sounds like you are reinventing several wheels from scratch and your time would be better spent searching for an off-the-shelf solution that meets your needs.

Database M:N relationship

I'm taking a class on database management systems (absolute beginner) and I'm working on a database for a very simple blog system.
I have a question regarding the M:N relationship between blog posts and the categories they belong to (one blog post can be in several categories).
The relevant part of the schema looks like this:
[Image: schema diagram]
I know that this schema somehow allows adding a blog post that doesn't belong to any category. However, I don't know why that is. Could someone please explain this to me?
Thanks.

It's probably a combination of two things. One would be a lack of referential integrity in your database design, i.e. you need foreign keys. The other would be that your front-end application is allowing blogs without categories to be posted.

Because you can add a blog_posts record without having to add an associated post_cat record.
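A small sketch of that point, using Python's sqlite3 module. The original diagram is not available, so the columns here are assumptions; only the table names blog_posts and post_cat come from the answer above, and categories, make_blog_db and uncategorized_posts are made up for the example.

```python
import sqlite3


def make_blog_db() -> sqlite3.Connection:
    """Create blog_posts, categories and the post_cat junction table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(
        """
        CREATE TABLE blog_posts (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE categories (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE post_cat (
            post_id INTEGER NOT NULL REFERENCES blog_posts(id),
            category_id INTEGER NOT NULL REFERENCES categories(id),
            PRIMARY KEY (post_id, category_id)
        );
        """
    )
    return conn


def uncategorized_posts(conn: sqlite3.Connection) -> list:
    """Return (id, title) for posts that have no row in post_cat."""
    return conn.execute(
        """
        SELECT p.id, p.title
        FROM blog_posts AS p
        LEFT JOIN post_cat AS pc ON pc.post_id = p.id
        WHERE pc.post_id IS NULL
        ORDER BY p.id
        """
    ).fetchall()
```

The foreign keys only guarantee that every post_cat row points at an existing post and an existing category. Nothing in the junction table forces a post to have at least one such row, so that rule has to be enforced by the application (or by something like a trigger).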

How would you model a read/follow system?

So I have the following domain model:
Article, which is basically a blog post and currently an Entity.
Now, I'd like to add the following features:
When a user views the article (in their browser), an API call is made to "flag" the blog post as read.
With some computation, I should then be able to determine which articles haven't been read yet.
When a user posts a comment on an article, an API call is made to "flag" the blog post as followed.
With some computation, I should then be able to determine whether any new comments have been posted since the user's latest comment.
Basically, both features (read and follow) share the same attributes: an article id, a user id and a read/action date.
Note that if an Article is followed and then read, the read date should be used.
Therefore, I thought I could use the same object and add an extra attribute to mark it as followed.
Do you have any design ideas?
Note that there are many articles and users. I'm using Doctrine2 and MySQL, but this applies to any language.
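To make the "same object with an extra attribute" idea concrete, here is one possible shape for it, sketched with Python's sqlite3 module instead of Doctrine2 and MySQL. Everything named here (the article_user_state and comments tables, their columns, and the three helper functions) is an assumption for illustration. It also assumes that posting a comment counts as having read the article.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article_user_state (
    article_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    action_date TEXT NOT NULL,           -- ISO 8601 string, sorts chronologically
    followed INTEGER NOT NULL DEFAULT 0, -- 1 once the user has commented
    PRIMARY KEY (article_id, user_id)
);
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    posted_at TEXT NOT NULL
);
"""


def record_action(conn, article_id, user_id, when, follow=False):
    """Store a read (follow=False) or a comment (follow=True) as one row per pair."""
    cur = conn.execute(
        "UPDATE article_user_state "
        "SET action_date = ?, followed = MAX(followed, ?) "
        "WHERE article_id = ? AND user_id = ?",
        (when, int(follow), article_id, user_id),
    )
    if cur.rowcount == 0:  # first time this user touches this article
        conn.execute(
            "INSERT INTO article_user_state "
            "(article_id, user_id, action_date, followed) VALUES (?, ?, ?, ?)",
            (article_id, user_id, when, int(follow)),
        )


def unread_articles(conn, user_id, article_ids):
    """Return the given article ids that have no state row for this user."""
    seen = {
        row[0]
        for row in conn.execute(
            "SELECT article_id FROM article_user_state WHERE user_id = ?",
            (user_id,),
        )
    }
    return [a for a in article_ids if a not in seen]


def has_new_comments(conn, article_id, user_id):
    """True if a followed article has comments by others after the last action."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM comments AS c
        JOIN article_user_state AS s ON s.article_id = c.article_id
        WHERE s.article_id = ? AND s.user_id = ? AND s.followed = 1
          AND c.user_id <> s.user_id
          AND c.posted_at > s.action_date
        """,
        (article_id, user_id),
    ).fetchone()
    return row[0] > 0
```

Because a later read overwrites action_date while leaving followed set, the "followed and then read, use the read date" rule falls out of the single row without extra logic.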

To ensure your application scales well, I'd do your computations locally when the events are triggered, i.e. someone adds a comment and that causes the system to check who has an investment in that new comment. Otherwise you end up with a scheduled task processing all the data, which will run fine at first but will have a rapidly growing workload as the relations between users, articles and comments increase.
You can also look into using the Map/Reduce pattern. Ayende has a good introductory article on this, which is in almost the same application domain as you describe (articles, comments, etc.).
As for the event of marking an article or comment as read by a particular user, this is neither an article thing nor a user thing. If you were using a document database and stored this data against a user, it could build up quite a bit of data over time. I'd be more tempted to store the data either in a new entity or against the article (as in theory an article will have an initial burst of interest and then dip to a level representing its popularity).
Hopefully some of that might help.

Generate webpages directly from database or cache?

[I'm not asking about the architecture of SO, but it would be helpful to the question.]
On SO, when a user clicks on his/her name and clicks on "responses", they see other users' responses to comment threads, questions, and answers in which they have participated. I've had the sneaking suspicion that I've missed certain responses out there, which made me wonder: if you had to build that thing, would you pull everything dynamically from the database every time a user requested it? Or would you modify it when there is new related activity in the application? Or would you build it in a nightly daemon process?
I imagine that the real answer is that it's dynamically constructed every time, but that the tables are denormalized in such a way so as to make the thing less time-consuming. How would you build it?
I'm asking about any platform, of course, not only on .Net.

I would pull it dynamically from the database every time. I think this gives you the best result from a user-experience standpoint, and I would apply the principle that premature optimization is evil. Later, if there were performance issues, I would look into caching.
I think doing it as a daemon/push process would actually result in more overall work being done. That is the updates would happen more frequently than the users are requesting the info.

Obviously, when an answer or comment is posted, you'll want to identify the user that should be informed in their responses tab. Then just add a row to a responses table containing the response text, timestamp, and the user to which it belongs. That way you can dynamically generate the tab with a simple
select * from responses where user=<userid> order by time desc limit 30
or something like that.
p.s. Extra credit to anyone that can write a query that will remove old responses - assume that each person should have the last 30 responses in their responses tab.
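One way the responses table and the "extra credit" cleanup might look, sketched with Python's sqlite3 module. The id column, the index, and the function names are additions for the example; only user, time and the response text come from the description above. The cleanup relies on SQLite accepting ORDER BY and LIMIT inside a correlated subquery, which not every database allows.

```python
import sqlite3


def make_responses_db() -> sqlite3.Connection:
    """Create the responses table with an index matching the tab query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE responses (
            id INTEGER PRIMARY KEY,
            user INTEGER NOT NULL,
            time INTEGER NOT NULL,
            text TEXT NOT NULL
        );
        CREATE INDEX responses_user_time ON responses (user, time);
        """
    )
    return conn


def latest_responses(conn, user_id, limit=30):
    """Return (time, text) for the newest responses belonging to a user."""
    return conn.execute(
        "SELECT time, text FROM responses "
        "WHERE user = ? ORDER BY time DESC, id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


def prune_old_responses(conn, keep=30):
    """Delete everything except each user's newest `keep` rows; return rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM responses
        WHERE id NOT IN (
            SELECT r2.id
            FROM responses AS r2
            WHERE r2.user = responses.user
            ORDER BY r2.time DESC, r2.id DESC
            LIMIT ?
        )
        """,
        (keep,),
    )
    conn.commit()
    return cur.rowcount
```

Running the prune from a periodic job, rather than on every insert, keeps the write path to a single INSERT per response.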

I expect that userid would be a natural option for the clustered index. If you have an "Active" boolean field then you don't need to worry much about locks; the table could be write-only except to update the (unindexed) Active column. I bet it already works that way, since it appears that everything is recoverable.
Don't need no stinking extra-credit response remover.

I would assume this is denormalized in the database. The Comment table probably has both an answer_id and an answer_uid, so the SQL to find comments on your answers just runs against the Comment table. The same setup would work on the Answer table: each answer has a question_id and a question_uid.
Having said that, these are probably the same table with a response_to_id and a response_to_uid, which makes lots of code simpler and makes the "recent" tab a single select as well. In fact, the only difference between the two selects is that one uses the uid and the other uses the response_to_uid.
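A guess at what that single-table layout could look like, again using Python's sqlite3 module. Only uid, response_to_id and response_to_uid come from the answer. The table name posts, the other columns, and the helpers make_posts_table and posts_for are hypothetical, and none of this is known to be how SO actually stores it.

```python
import sqlite3

# the only columns callers may filter on; guards the f-string below
FILTER_COLUMNS = {"uid", "response_to_uid"}


def make_posts_table() -> sqlite3.Connection:
    """One table for questions, answers and comments alike."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            uid INTEGER NOT NULL,                        -- author
            response_to_id INTEGER REFERENCES posts(id), -- post being answered
            response_to_uid INTEGER,                     -- author of that post (denormalized)
            body TEXT NOT NULL,
            time INTEGER NOT NULL
        )
        """
    )
    return conn


def posts_for(conn, column, uid):
    """column='uid' gives the "recent" tab, 'response_to_uid' the "responses" tab."""
    if column not in FILTER_COLUMNS:
        raise ValueError(f"unsupported filter column: {column}")
    sql = f"SELECT id, body FROM posts WHERE {column} = ? ORDER BY time DESC"
    return conn.execute(sql, (uid,)).fetchall()
```

Copying the parent's author into response_to_uid is what removes the join: the responses tab never has to look up who wrote the post being answered.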

I'd say that your UI and your database should both be driven by your Application Domain; so they will reflect each other based on their common provenance there.
Some quick notes to illustrate, using simplified Object Role Modeling as discussed by Fowler et al.
Entities
Users
Questions
Answers
Comments
Entity Roles
(Note: In Object Role Modeling, most Roles are reflexive. Some, e.g. booleans here, are monopolar)
Question has User
Question has QuestionVersions
Question has Answers
Question has Comments
Answer has AnswerVersions
Answer has Comments
Question has User
QuestionVersion has Text
QuestionVersion has Timestamp
QuestionVersion has IsDeleted (could be inferred from a non-NULL timestamp, e.g.)
QuestionVersion has DeletedByUser
QuestionVersion has DeletedTimestamp
Answer has User
AnswerVersion has Text
AnswerVersion has Timestamp
AnswerVersion has IsDeleted
AnswerVersion has DeletedByUser
AnswerVersion has DeletedTimestamp
Comment has Text
Comment has User
Comment has Timestamp
Comment IsDeleted (boolean)
(note - no versions on comments)
I think that's the basics. These assertions drive ERDs in ORM. Hopefully it's self-evident how they drive the User Stories as well.
I don't think an implementation of a normalized design like this would require denormalization, especially since I think it's clear (from behavior) that the queries behind the UI displays are cached and refreshed about once per minute.
