I'm writing an application that displays posts, much like a simple forum system.
Posts can be marked as urgent by users when they submit them, but a moderator must approve the "urgent" status. A post is still displayed even if its urgent status has not been approved; it just appears as a normal post until the moderator approves the urgent status, at which point the post gets special treatment.
I have considered two approaches for this:
1) Have two flags in the posts table: one to say the user has requested urgent status, and a second to indicate whether an admin has approved it. Only if both are true will the post be shown as urgent.
2) Have two tables: a pending-requests table holds all the pending urgent approvals. Once an admin approves urgent status, I would delete the request from the pending table and update the posts table so that the urgent field becomes true for that post.
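For concreteness, here is roughly what the two options could look like in SQL (just a sketch; all table and column names are my own invention, and the two variants are alternatives, not meant to coexist):

    -- Option 1: two flags on the posts table.
    CREATE TABLE posts (
        id               INT PRIMARY KEY,
        body             TEXT NOT NULL,
        urgent_requested BOOLEAN NOT NULL DEFAULT FALSE,  -- set by the user
        urgent_approved  BOOLEAN NOT NULL DEFAULT FALSE   -- set by the moderator
    );
    -- A post gets special treatment only when both flags are true:
    SELECT id, body FROM posts WHERE urgent_requested AND urgent_approved;

    -- Option 2: a single urgent flag on posts, plus a pending table.
    CREATE TABLE pending_urgent_requests (
        post_id INT PRIMARY KEY  -- references posts(id)
    );
    -- Approval = set posts.urgent to true and delete the pending row,
    -- ideally in one transaction.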
I'm not sure if either approach is better than the other.
The first solution means I only have one table to worry about, but it ends up with more fields. I'm not sure whether this actually makes querying any slower, considering that the posts table will be the most-queried table in the app.
The second solution keeps the posts table leaner but adds another table to deal with (not that this is hard).
I'm leaning towards the second solution, but I wonder if I'm overanalysing things and making my life more complicated than it needs to be. Advice?
Definitely 1). The additional table just messes things up. One extra status field is enough, with values such as 0 = normal, 1 = urgent_requested, 2 = urgent_approved.
You could query with status = 1 to find posts needing approval, and if you order by status descending, you naturally get the urgent messages up front.
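A sketch of that (MySQL-flavoured; the created_at column is my own assumption):

    -- 0 = normal, 1 = urgent_requested, 2 = urgent_approved
    ALTER TABLE posts ADD COLUMN status TINYINT NOT NULL DEFAULT 0;

    -- Moderation queue: everything still awaiting approval.
    SELECT id, body FROM posts WHERE status = 1;

    -- Listing: approved-urgent posts float to the top.
    SELECT id, body FROM posts ORDER BY status DESC, created_at DESC;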
There's another solution that comes to mind :)
You can have a table of post statuses, and in your Posts table a column which references a status.
This approach has several advantages: you can seamlessly add more statuses in the future, or you can even have another table holding rules for how statuses can change (workflow).
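Something like this, as a rough sketch (all names are mine):

    CREATE TABLE post_statuses (
        id   INT PRIMARY KEY,
        name VARCHAR(32) NOT NULL UNIQUE
    );
    INSERT INTO post_statuses (id, name)
    VALUES (0, 'normal'), (1, 'urgent_requested'), (2, 'urgent_approved');

    ALTER TABLE posts ADD COLUMN status_id INT NOT NULL DEFAULT 0;
    ALTER TABLE posts ADD FOREIGN KEY (status_id) REFERENCES post_statuses (id);

    -- A new status later on is just a new row, no schema change:
    INSERT INTO post_statuses (id, name) VALUES (3, 'archived');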
The second approach is cleaner in terms of design, and it will probably end up using less disk space. The first approach is less "pure", but from a maintenance and coding point of view it's simpler; hence I'd go with the first.
Also, it's great to see someone thinking about design before they go off and write reams of code. :) Can't tell you how many messed-up projects I've seen, where a single hour of thinking about the design would've saved many hours of effort for all involved...
I think option 1 is the best. The only thing you need to do is create an index on the two fields.
Option 2 adds too much complexity.
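Assuming option 1's two flag columns, the index would be something like:

    -- One composite index serves both the "is it urgent?" filter
    -- and the moderator's pending-approval queue.
    CREATE INDEX idx_posts_urgent ON posts (urgent_requested, urgent_approved);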
I have a MySQL query just for things like this. I will post it as soon as I remember/find the correct syntax.
Description
I am developing an app which has Posts in it and for each post users can comment and like.
I am running a PG db on the server side and the data is structured in three different tables: post, post_comments (with ref to post), post_likes (with ref to post).
In the feed of the app I want to display all the posts with the comment count, the last comment, the number of likes, and the last user that liked the post.
I was wondering what the best approach is to create the API calls, and I currently have two ideas in mind:
First Idea
Make one large request using a query with multiple joins and parse the result accordingly.
The downside I see in this approach is that the query will be very heavy, which will affect the load time of the user's feed, as it will have to scan post_comments, post_likes, etc., count all the rows, and then also retrieve the last rows.
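Roughly, I imagine that query would look something like this in Postgres (just a sketch; everything beyond the three table names is my guess, including a created_at column on each table):

    SELECT p.id,
           COALESCE(c.comment_count, 0) AS comment_count,
           lc.body                      AS last_comment,
           COALESCE(l.like_count, 0)    AS like_count,
           ll.user_id                   AS last_liker
    FROM post p
    LEFT JOIN (SELECT post_id, COUNT(*) AS comment_count
               FROM post_comments GROUP BY post_id) c ON c.post_id = p.id
    LEFT JOIN (SELECT post_id, COUNT(*) AS like_count
               FROM post_likes GROUP BY post_id) l ON l.post_id = p.id
    LEFT JOIN (SELECT DISTINCT ON (post_id) post_id, body
               FROM post_comments
               ORDER BY post_id, created_at DESC) lc ON lc.post_id = p.id
    LEFT JOIN (SELECT DISTINCT ON (post_id) post_id, user_id
               FROM post_likes
               ORDER BY post_id, created_at DESC) ll ON ll.post_id = p.id
    ORDER BY p.created_at DESC
    LIMIT 20;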
Second Idea
Add an extra table which I will call post_meta that will store those exact parameters I need and update them when needed.
This approach will make the retrieval query much lighter and faster (faster loading time), but it will increase the time taken to add and update comments and likes.
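Something like this (again a sketch, column names invented):

    CREATE TABLE post_meta (
        post_id       INT PRIMARY KEY REFERENCES post (id),
        comment_count INT NOT NULL DEFAULT 0,
        last_comment  TEXT,
        like_count    INT NOT NULL DEFAULT 0,
        last_liker_id INT
    );

    -- On each new comment, in the same transaction as the INSERT:
    UPDATE post_meta
    SET comment_count = comment_count + 1,
        last_comment  = 'the new comment body'
    WHERE post_id = 1;

    -- The feed then becomes a cheap single join:
    SELECT p.id, m.comment_count, m.last_comment, m.like_count, m.last_liker_id
    FROM post p JOIN post_meta m ON m.post_id = p.id
    ORDER BY p.created_at DESC
    LIMIT 20;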
I was wondering if someone could give me some insight into the preferred way to tackle this problem.
Thanks
I'm trying to design a system where an administrator will have to approve changes to the data and other various administrative tasks -- add a user, add an admin etc.
My idea is to have a notification table that contains these notifications, but the problem is that a notification can be any of the previously mentioned types, i.e. its data is stored in one of many tables. Here is a picture to describe my current plan -- note, I'm sure it's not a proper ER diagram.
[diagram: a notifications table referencing the various pending tables]
Also, the data goes into a pending table that reflects the table it will eventually wind up in, provided the data is approved -- it's a staging ground of sorts. So a pending_user is a user that is not yet in the user table. And as you can see, the user table, amongst others, is not shown here, but one can use one's imagination.
I'm concerned that the multiple null values in the pending table will have adverse effects that I'm not totally aware of, such as increased space usage and possibly increased query time. Also, I'm not sure how I'll implement the retrieval of these notifications. My naive approach is to select the first X notifications, analyze the rows to find the non-null column, retrieve the appropriate data, and then load all the data into a response.
Is there a more straightforward pattern for this type of problem?
Thanks in advance for any help.
I think the traditional way is to provide various levels of access/read/write rights to users. These access rights define what actions a user can and can't perform. In this traditional approach, if a user has access to a certain function, he can use it without further approval.
Also, traditionally there is some kind of audit log that contains a trace of all important changes to the data. With such a log it is possible to know who made a change (and when).
If you need to build a two-stage system, where a change has to go through approval, I'd add a flag column to each important table to indicate that the values in a given row are not final and still have to be approved. The table would store all historical changes to the data, and with the help of this flag the system would know which variant is the latest approved version and which is pending approval.
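As a rough sketch of what I mean, for a users table (all names invented):

    CREATE TABLE users (
        id         INT NOT NULL,
        version    INT NOT NULL,
        name       TEXT NOT NULL,
        approved   BOOLEAN NOT NULL DEFAULT FALSE,  -- the flag column
        changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (id, version)
    );

    -- Latest approved variant of user 42:
    SELECT * FROM users
    WHERE id = 42 AND approved
    ORDER BY version DESC
    LIMIT 1;

    -- The admin's work queue: everything still pending approval.
    SELECT * FROM users WHERE NOT approved;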
I would not try to make a single universal table that holds data related to changes in many different tables. Each table is different, and the approval process for each table is likely to be different. I doubt that you'll have more than a dozen entities important enough to go through this approval process.
On my website there exists a group of 'power users' who are fantastic and add lots of content to my site.
However, their prolific activity has led to their profile pages slowing down a lot. For 95% of the other users, the SPROC that returns the data is very quick. It's only for this group of power users that the very same SPROC is slow.
How does one go about optimising the query for this group of users?
You can assume that the right indexes have already been constructed.
EDIT: OK, I think I have been a bit too vague. To rephrase the question: how can I optimise my site to enhance performance for these 5% of users? Given that this SPROC is the same one used for every user and is already well optimised, I am guessing the next steps are to explore caching possibilities at the data and application layers?
EDIT2: The only difference between my power users and the rest is the amount of stuff they have added. So I guess the bottleneck is simply the sheer number of records being fetched. An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
I think you summed it up here:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
Implement paging so that it only fetches 100 at a time or something?
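Something like this inside the SPROC, assuming SQL Server 2012 or later (table, column, and parameter names are all made up):

    SELECT id, title, created_at
    FROM items
    WHERE user_id = @UserId
    ORDER BY created_at DESC
    OFFSET @PageSize * @PageNumber ROWS
    FETCH NEXT @PageSize ROWS ONLY;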
Well, you can't optimize a query for a specific result set and leave it unchanged for everyone else, if you know what I mean. I'm guessing there's only one query to change, so you will be optimizing it for every type of user; this optimization scenario is therefore no different from any other. Figure out what the problem is: is too much data being returned? Are calculations taking too long because of the amount of data? Where exactly is the slowdown? Those are the questions you need to ask yourself.
However, I see you talking about profile pages being slow. Since you believe the query that returns that information is already optimized (it works fine for 95% of users), you might consider some form of caching of the profile page content. In general, profile pages do not have to supply real-time information.
Caching can be done in a lot of ways, far too many to cover in this answer. But to give you one small example: you could work with a temp table. Your 'profile query' returns information from that temp table, information that is already calculated. Because that query is simple, it won't take much time to execute. Meanwhile, you make sure that the temp table gets refreshed periodically.
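To sketch that idea (names invented; here the "temp table" is really a small permanent summary table, refreshed off the hot path by a scheduled job such as SQL Server Agent):

    CREATE TABLE profile_stats (
        user_id      INT PRIMARY KEY,
        item_count   INT NOT NULL,
        last_item_at DATETIME NULL,
        refreshed_at DATETIME NOT NULL
    );

    -- The periodic refresh recomputes the expensive aggregates:
    DELETE FROM profile_stats;
    INSERT INTO profile_stats (user_id, item_count, last_item_at, refreshed_at)
    SELECT user_id, COUNT(*), MAX(created_at), GETDATE()
    FROM items
    GROUP BY user_id;

    -- The profile page then does a single-row lookup:
    SELECT item_count, last_item_at FROM profile_stats WHERE user_id = @UserId;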
Just a couple of ideas. I hope they're useful to you.
Edit:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
An obvious help here will be to limit the number of results inside the query, or to apply a form of pagination (in the DAL, not the UI/BLL!).
You could limit the profile display so that it only shows the most recent 200 items. If your power users want to see more, they can click a button and get the rest of their items. At that point, they would expect a slower response.
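If you go this route, keyset ("seek") paging scales better for the deep pages than OFFSET, since it never rescans everything before the page. A sketch in SQL Server syntax, names invented:

    -- First page: the most recent 200 items.
    SELECT TOP (200) id, title, created_at
    FROM items
    WHERE user_id = @UserId
    ORDER BY created_at DESC, id DESC;

    -- "Show more": seek past the last row the client saw.
    SELECT TOP (200) id, title, created_at
    FROM items
    WHERE user_id = @UserId
      AND (created_at < @LastSeenCreatedAt
           OR (created_at = @LastSeenCreatedAt AND id < @LastSeenId))
    ORDER BY created_at DESC, id DESC;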
Partition/separate the data for those users; then the tables in question will be used only by them.
In a clustered environment, I believe SQL Server recognises this and spreads the load to compensate; in a single-server environment, however, I'm not entirely sure how it does the optimisation.
So essentially (greatly simplified of course) ...
If you have a table called "Articles", have two tables: "Articles" and "Top5PercentArticles".
Because the data is now separated into two smaller subsets, the indexes are smaller, and the read and write requests on any single table will drop.
It's not ideal from a business-layer point of view, as you would then need some way to track which data is stored in which table, but that's a completely separate problem altogether.
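As a minimal sketch of the split (all names invented; whether this actually beats one well-indexed table is something you'd have to measure):

    CREATE TABLE Articles            (Id INT PRIMARY KEY, AuthorId INT, Title NVARCHAR(200));
    CREATE TABLE Top5PercentArticles (Id INT PRIMARY KEY, AuthorId INT, Title NVARCHAR(200));

    -- A view reunifies the subsets for code that doesn't care about the split
    -- (in SQL Server, run the CREATE VIEW in its own batch):
    CREATE VIEW AllArticles AS
        SELECT Id, AuthorId, Title FROM Articles
        UNION ALL
        SELECT Id, AuthorId, Title FROM Top5PercentArticles;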
Failing that, your only option beyond tuning execution plans is to scale up your server platform.
I'm going to build a microblogging web service (for school, so don't blast me for the lack of a new idea), and I worry that the DB could often be overloaded (a user can follow other users or even tags, so I suppose the SELECT will be heavy: fetch the 20 latest messages matching all the tags and users being followed).
My idea is to create another table and store in it only a statusID and a userID (who should pick up the message). The danger is that if some tag or user has many followers, there will be a lot of records with that statusID. So, is this a good idea? Or would an M2M relation be better (one status -> many receivers)?
I think most databases can easily handle large record sets. The responsibility for making it perform lies in your design, with properly set-up indexes. If you create the right indexes, the SELECT clauses should perform really well.
I'd go with a users table, a messages table, and a table for the M2M relationship between users and messages.
You can then do one select to find all of the users a user is following, and then a second select to get all of the messages of interest (sorting and limiting the results as appropriate). Extending this to tagging should be pretty simple.
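A sketch of those two steps in generic SQL (schema invented):

    -- 1) Who does user 42 follow?
    SELECT followed_id FROM follows WHERE follower_id = 42;

    -- 2) The 20 latest messages from those users (here folded into a subquery;
    --    the app could equally pass the id list from step 1).
    SELECT m.id, m.user_id, m.body, m.created_at
    FROM messages m
    WHERE m.user_id IN (SELECT followed_id FROM follows WHERE follower_id = 42)
    ORDER BY m.created_at DESC
    LIMIT 20;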
This design should be fine for large numbers of users and messages, as long as you index the right columns. If you got massive, you could also move the users and messages tables onto different servers, or add read-only replicas. I wouldn't even worry about that for the moment; you'd need to be huge.
When implementing Collabinate (http://www.collabinate.com), a service-based engine for microblogging and shared activity streams, I used a graph database. The fact that people create posts and follow other people lends itself to a graph structure. With the right relationships and algorithms, this can be a very efficient and performant solution.
This is not an SO Meta question; I am using SO only as an example.
On Stack Overflow, each answer, each comment, each question, and each vote has an effect that produces a badge at some point in time. I mean that after every action, a list of queries is tested.
E.g. if Mr. A upvotes Mr. B's answer, we have to check: has Mr. B's answer been upvoted 100 times? If so, give Mr. B a badge. Has Mr. A cast his 100th upvote? If so, give him a badge.
This means I have to run at least 100 queries/if-else checks for each action.
Now for my real-life example: I have an application where I receive online data from an attendance machine. When a user shows his card to the machine, I receive this and store it as a record. Based on this record I have multiple calculations: is he late? Has he been late for 3 days in a row? Is he on the right shift (day shift/night shift)? Is today a holiday? Is this overtime? Is he early? Etc., etc., etc.
What is the best strategy for this kind of requirement?
Update:
Can the SO team guide us on this?
Use queues and workflows. This way you decouple the moment of the update from the actual notifications, allowing the system to scale. Tightly coupled, trigger-based, or similar solutions cannot scale, as each update has to wait for all the interested parties to react to the notification. Designing the processing engine around workflows lets you easily add steps and notification consumers by changing data, without changing the schema.
For instance, see how MSDN uses queues to handle similar problems with MSDN content: Building the MSDN Aggregation System.
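If a full queueing product is overkill, even a plain table can act as the queue. A hypothetical sketch (MySQL-flavoured, names invented):

    -- The request path only records that something happened:
    CREATE TABLE action_events (
        id           BIGINT AUTO_INCREMENT PRIMARY KEY,
        action_type  VARCHAR(32) NOT NULL,  -- 'upvote', 'card_swipe', ...
        actor_id     INT NOT NULL,
        target_id    INT NULL,
        created_at   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        processed_at TIMESTAMP NULL
    );

    -- A background worker drains it in batches and runs the badge/attendance rules:
    SELECT id, action_type, actor_id, target_id
    FROM action_events
    WHERE processed_at IS NULL
    ORDER BY id
    LIMIT 100;

    -- ...evaluate the rules for each row, then mark the batch done:
    UPDATE action_events
    SET processed_at = CURRENT_TIMESTAMP
    WHERE processed_at IS NULL AND id <= 100;  -- 100 = last id of the fetched batch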
Couldn't you just use "flags" (other tables, other columns, whatever) to indicate when those special cases occur? That way you would only have to do one lookup (per special case) rather than a ton of lookups and/or joins. You could record the changes (third day late, etc.) on insert.
Also, which checks to run depends on a threshold.
E.g. has a person been absent for the last 3 days? That check is required only when the person has already been absent for 2 days.
I mean, you need not check everything every time.
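For the attendance example, a cheap guard query can decide whether the heavier rule needs to run at all (sketch, MySQL-flavoured, names invented):

    -- Today's swipe just came in late. Run the "3 days in a row" rule
    -- only if the previous two days were late as well:
    SELECT COUNT(*) AS prior_late_days
    FROM attendance
    WHERE employee_id = 7
      AND status = 'late'
      AND work_date IN (CURRENT_DATE - INTERVAL 1 DAY,
                        CURRENT_DATE - INTERVAL 2 DAY);
    -- prior_late_days = 2 -> three in a row, fire the rule;
    -- anything less       -> skip the heavier checks entirely.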
Also, how much of the info needs to be updated immediately? SO doesn't update things in real time.
Maybe you should use two databases with online replication between them: one for ingesting real-time data and nothing else, while in the second you run the heavy calculations (for example, recalculate all the lateness figures every 10 minutes or on request). Locate these databases on different servers.