Optimising a query for Top 5% of users - sql-server-2005

On my website there is a group of 'power users' who are fantastic at adding lots of content to the site.
However, their prolific activity has led to their profile pages slowing down a lot. For the other 95% of users, the SPROC that returns the data is very quick. It's only for this group of power users that the very same SPROC is slow.
How does one go about optimising the query for this group of users?
You can assume that the right indexes have already been constructed.
EDIT: OK, I think I have been a bit too vague. To rephrase the question: how can I optimise my site to improve performance for these 5% of users? Given that this SPROC is the same one used for every user, and that it is already well optimised, I am guessing the next steps are to explore caching possibilities at the data and application layers?
EDIT2: The only difference between my power users and the rest is the amount of content they have added, so I guess the bottleneck is simply the sheer number of records being fetched. An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).

I think you summed it up here:
An average user adds about 200 items
to my site. These power users add over
10,000 items. On their profile, I am
showing all the items they have added
(you can scroll through them).
Implement paging so that it only fetches 100 at a time or something?
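A minimal sketch of that paging idea, using Python's sqlite3 as a stand-in for the real database (the `items` table and column names are hypothetical). On SQL Server 2005 the same idea is usually expressed with ROW_NUMBER() OVER (ORDER BY ...) in a CTE, since LIMIT/OFFSET syntax isn't available there:

```python
import sqlite3

# In-memory stand-in for the real database; table and column names are
# hypothetical. On SQL Server 2005 the equivalent paging query would use
# ROW_NUMBER() OVER (ORDER BY id) filtered to a row-number range.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO items (user_id, title) VALUES (?, ?)",
    [(1, f"item {i}") for i in range(10_000)],  # a "power user" with 10,000 items
)

def fetch_page(conn, user_id, page, page_size=100):
    """Return one page of a user's items instead of all of them at once."""
    offset = page * page_size
    cur = conn.execute(
        "SELECT id, title FROM items WHERE user_id = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (user_id, page_size, offset),
    )
    return cur.fetchall()

first_page = fetch_page(conn, user_id=1, page=0)
print(len(first_page))  # 100 rows, no matter how many items the user has
```

The key point is that the cost of rendering the profile page becomes proportional to the page size, not to the user's total item count.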

Well, you can't optimize a query for a specific result set and leave it unchanged for everyone else, if you know what I mean. I'm guessing there's only one query to change, so you will be optimizing it for every type of user; this optimization scenario is therefore no different from any other. Figure out what the problem is: is too much data being returned? Are calculations taking too long because of the amount of data? Where exactly is the cause of the slowdown? Those are the questions you need to ask yourself.
However I see you talking about profile pages being slow. When you think the query that returns that information is already optimized (because it works for 95%), you might consider some form of caching of the profile page content. In general, profile pages do not have to supply real-time information.
Caching can be done in a lot of ways, far too many to cover in this answer. But to give you one small example: you could work with a temp table. Your 'profile query' then returns information from that temp table, information that has already been calculated. Because that query is simple, it won't take much time to execute. Meanwhile, you make sure that the temp table gets refreshed periodically.
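A small sketch of that precomputed-table idea, again using sqlite3 as a stand-in (the `items` and `profile_summary` tables are hypothetical names). Reads hit the summary table; a periodic job rebuilds it:

```python
import sqlite3

# Hypothetical schema: reads hit a precomputed summary table that a
# periodic job refreshes, so the profile query itself stays trivial.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE profile_summary (user_id INTEGER PRIMARY KEY, item_count INTEGER);
""")
conn.executemany("INSERT INTO items (user_id) VALUES (?)", [(1,)] * 250 + [(2,)] * 10)

def refresh_summary(conn):
    """Run periodically (e.g. every few minutes), not on every page view."""
    conn.execute("DELETE FROM profile_summary")
    conn.execute(
        "INSERT INTO profile_summary (user_id, item_count) "
        "SELECT user_id, COUNT(*) FROM items GROUP BY user_id"
    )

def profile_item_count(conn, user_id):
    """The 'profile query': a cheap primary-key lookup, never an aggregate."""
    row = conn.execute(
        "SELECT item_count FROM profile_summary WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0

refresh_summary(conn)
print(profile_item_count(conn, 1))  # 250
```

The trade-off is staleness: the profile shows numbers as of the last refresh, which, as noted above, is usually acceptable for profile pages.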
Just a couple of ideas. I hope they're useful to you.
Edit:
An average user adds about 200 items to my site. These power users add over 10,000 items.
On their profile, I am showing all the
items they have added (you can scroll
through them).
An obvious help here would be to limit the number of results inside the query, or to apply a form of pagination (in the DAL, not the UI/BLL!).

You could limit the profile display so that it only shows the most recent 200 items. If your power users want to see more, they can click a button and get the rest of their items. At that point, they would expect a slower response.

Partition or separate the data for those users, so that the tables in question are used only by them.
In a clustered environment I believe SQL Server recognises this and spreads the load to compensate; in a single-server environment, however, I'm not entirely sure how it does the optimisation.
So essentially (greatly simplified of course) ...
If you have a table called "Articles", have two tables: "Articles" and "Top5PercentArticles".
Because the data is now separated into two smaller subsets, the indexes are smaller and the read and write requests on a single table in the database will drop.
It's not ideal from a business-layer point of view, as you would then need some way to know which data is stored in which table, but that's a separate problem altogether.
Failing that, your only option beyond tuning execution plans is to scale up your server platform.

Implementation of last comment and comments count SQL

Description
I am developing an app which has posts, and for each post users can comment and like.
I am running a PostgreSQL database on the server side, and the data is structured in three tables: post, post_comments (with a reference to post), and post_likes (with a reference to post).
In the feed of the app I want to display all the posts with the comments count, last comment, number of likes and the last user that liked the post.
I was wondering what is the best approach to create the API calls, and currently have two ideas in mind:
First Idea
Make one large request using a query with multiple joins and parse the query accordingly.
The downside I see in this approach is that the query will be very heavy, which will affect the load time of the user's feed: it will have to run over post_comments, post_likes, etc., count all the rows, and then also retrieve the last rows.
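For what it's worth, the single-query version doesn't have to join everything into one huge row set. A sketch with correlated subqueries, using sqlite3 in place of PostgreSQL but keeping the table names from the question (the column names are assumptions):

```python
import sqlite3

# Hypothetical columns on the tables named in the question:
# post, post_comments, post_likes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE post_comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
CREATE TABLE post_likes (id INTEGER PRIMARY KEY, post_id INTEGER, user_name TEXT);
""")
conn.execute("INSERT INTO post (id, body) VALUES (1, 'hello')")
conn.executemany("INSERT INTO post_comments (post_id, body) VALUES (?, ?)",
                 [(1, 'first'), (1, 'second')])
conn.executemany("INSERT INTO post_likes (post_id, user_name) VALUES (?, ?)",
                 [(1, 'ann'), (1, 'bob')])

# One query per feed load: correlated subqueries avoid the row explosion
# that joining post_comments and post_likes together would cause.
feed = conn.execute("""
SELECT p.id,
       (SELECT COUNT(*) FROM post_comments c WHERE c.post_id = p.id) AS comment_count,
       (SELECT body FROM post_comments c WHERE c.post_id = p.id
        ORDER BY c.id DESC LIMIT 1)                                  AS last_comment,
       (SELECT COUNT(*) FROM post_likes l WHERE l.post_id = p.id)    AS like_count,
       (SELECT user_name FROM post_likes l WHERE l.post_id = p.id
        ORDER BY l.id DESC LIMIT 1)                                  AS last_liker
FROM post p
""").fetchall()
print(feed)  # [(1, 2, 'second', 2, 'bob')]
```

With indexes on post_comments(post_id) and post_likes(post_id), each subquery is an index lookup per post, which may be fast enough before reaching for denormalization.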
Second Idea
Add an extra table, which I will call post_meta, that stores exactly the parameters I need, updated when needed.
This approach makes the retrieval query much lighter and faster (faster loading time), but increases the cost of adding and updating comments and likes.
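The second idea can be sketched like this (sqlite3 again standing in for PostgreSQL; the post_meta columns are assumptions). The essential point is that each write updates the meta row in the same transaction, so the feed read becomes a single-table lookup:

```python
import sqlite3

# Hypothetical post_meta row per post, maintained on every write.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post_meta (
    post_id INTEGER PRIMARY KEY,
    comment_count INTEGER DEFAULT 0,
    last_comment TEXT,
    like_count INTEGER DEFAULT 0,
    last_liker TEXT
);
""")
conn.execute("INSERT INTO post_meta (post_id) VALUES (1)")

def add_comment(conn, post_id, body):
    # In the real app, the INSERT into post_comments and this UPDATE
    # would run in one transaction (or a trigger would do the UPDATE).
    with conn:
        conn.execute(
            "UPDATE post_meta SET comment_count = comment_count + 1, "
            "last_comment = ? WHERE post_id = ?", (body, post_id))

def add_like(conn, post_id, user_name):
    with conn:
        conn.execute(
            "UPDATE post_meta SET like_count = like_count + 1, "
            "last_liker = ? WHERE post_id = ?", (user_name, post_id))

add_comment(conn, 1, "first!")
add_like(conn, 1, "ann")
# The feed read is now a trivial single-table select:
row = conn.execute("SELECT comment_count, last_comment, like_count, last_liker "
                   "FROM post_meta WHERE post_id = 1").fetchone()
print(row)  # (1, 'first!', 1, 'ann')
```

This is the classic read-heavy/write-light trade: feeds are typically read far more often than posts are commented on, which is what makes the extra write cost attractive.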
Was wondering if someone could give me some insights about the preferred way to tackle this problem.
Thanks

Best way to handle multiple lists in redis?

I am building a site that allows users to view articles and do some activities on them (vote, comment, ...). I am using MySQL as the main storage. In order to improve performance, I am considering using Redis (4.x) to handle some view activities, such as top/hot articles.
I am going to use one sorted set, called topArticleSortedSet, to store top articles; this set will be updated frequently, every time a user votes or comments on a certain article.
Each user will log in and follow some topics, and I also need to filter and display the articles in topArticleSortedSet based on the topics a user follows.
There is of course scroll paging as well.
For those reasons, I intend to create one topArticleSortedSet per user, so that each user has one independent list. But I don't know if this is best practice, because there might be millions of logged-in users accessing my site (that would mean millions of sets, each holding around 1,000 article items).
Can anyone give me some advice please?
I think you should keep to one Set, and filter it for each user, instead of having a Set per user. Here is why:
My understanding is that the Set has to be updated each time someone reads an article (probably by incrementing a counter).
Let's say you have n users, each one reading p articles per day. So you have to update the Set n*p times a day.
In the "single" set option, you will need to update just one set when there is an article read. So it makes a total of n*p updates. In the "one set per user" architecture, you will need to do n*p*n updates, which is much bigger.
Of course, filtering a single Set will take some time, longer than accessing a Set designed for one user. But on average, I guess it would take much less time than n operations. Basically, you need to know which is faster: filtering one Set, or updating n Sets?
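A pure-Python stand-in for the single-set-plus-filter approach (no Redis server needed to follow the idea). In Redis itself the same shape would be one sorted set updated with ZINCRBY on each vote/comment, plus a set of article IDs per topic, combined with ZINTERSTORE when a user loads their feed; the data and function names below are illustrative:

```python
# Pure-Python model of the "single sorted set" option.
top_articles = {}    # article_id -> score (the one global sorted set)
article_topics = {}  # article_id -> set of topics it belongs to

def record_activity(article_id, topics, weight=1):
    """One update per vote/comment, regardless of how many users exist."""
    top_articles[article_id] = top_articles.get(article_id, 0) + weight
    article_topics.setdefault(article_id, set()).update(topics)

def top_for_user(followed_topics, limit=10):
    """Filter the single set by the user's followed topics, then rank."""
    matching = [
        (aid, score) for aid, score in top_articles.items()
        if article_topics.get(aid, set()) & followed_topics
    ]
    matching.sort(key=lambda pair: pair[1], reverse=True)
    return [aid for aid, _ in matching[:limit]]

record_activity("a1", {"tech"}, weight=5)
record_activity("a2", {"sport"}, weight=9)
record_activity("a3", {"tech", "sport"}, weight=7)
print(top_for_user({"tech"}))   # ['a3', 'a1']
```

Note how each activity touches exactly one entry, while the per-user-set design would touch one entry per follower of the article's topics.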

Gain a Customized report

Goal:
Display the result based on the picture below in Reporting Services 2008 R2.
Problem:
How should I do it?
You also have to remember that in reality the list contains lots of data, maybe millions of rows.
In terms of the report itself, this should be a fairly standard implementation.
You'll need to create one Tablix, with one Group for Customer (one row) and one Group for Artist (two rows: one for the headers and one for the Artist name), then a detail row for the Title.
It looks like you need more formatting options for the Customers Textbox - you could merge the cells in the Customer header row, then insert a Rectangle, which will give you more options to move objects around in the row.
For large reports you've got a few options:
Processing large reports: http://msdn.microsoft.com/en-us/library/ms159638(v=sql.105).aspx
Report Snapshots: http://msdn.microsoft.com/en-us/library/ms156325(v=sql.105).aspx
Report Caching: http://msdn.microsoft.com/en-us/library/ms155927(v=sql.105).aspx
I would recommend scheduling a Snapshot overnight to offload the processing to a quiet time, then making sure the report has sensible pagination set up so that not too much data has to be handled at once when the report is viewed (i.e. not trying to view thousands of rows at one time in Report Manager).
Another option would be to set up an overnight Subscription that could save the report to a fileshare or send it as an email.
Basically you're thinking about reducing the amount of processing that needs to be done at peak times and processing the report once for future use to reduce overall resource usage.
I would use a List with text boxes inside for that kind of display.
In addition, you may consider adding a page break after each customer.
Personally, I experienced lots of performance issues when dealing with thousands of rows, not to mention millions.
My advice is to reconsider the report's main purpose: if the report is for exporting, then don't use SSRS for that.
If the report is for viewing, then perhaps it is possible to narrow down the data using parameters, per the user's choice.
Last thing, I wish you Good luck :)

Performance: SQL vs. Post

I have 2 webpages, one will contain summary event info (i.e. Artist, City, Date) and the other will contain additional detailed event info (i.e. Time, Lat/Long, Full Address). Users will see the summary info, and be able to click to be taken to the detailed info.
As far as performance is concerned, is it best to preload all of the information once on my first page and then simply pass the additional detailed info to the second page via a POST? Or should I query on the first page for just the data I need, and then on the second page run a query again, this time bringing back the additional data?
I'm open to any and all suggestions. Thanks.
Query for the data you need on the first page, then get the second page with its own query. That way you don't risk showing the user stale data (which is really important), plus your application has fewer moving parts (which is a nice feature) and you can use GET for the detail page so the user can bookmark it (also a good thing). The database will cache the information from the first query, so it probably won't be doing a lot of repeated work for the second one.
Typically even when performance is a major concern, you would like to have the caching be orthogonal to your application logic. There are non-intrusive mechanisms like web proxies and using a Hibernate 2nd level cache that are worth considering before you commit yourself to baking caching into your application code.
No answer can be 100% correct, since it can change based on: (1) the amount of data you have (i.e. how many results on the summary page, and how many events total in your DB); (2) the number of servers you have; (3) the amount of traffic you have; (4) your software architecture (i.e. what database, web server, etc.); (5) how your site is used (i.e. will users typically click on 1 or 2 results from the summary page, or on 20 or 30?); and (6) whether you are willing to sacrifice accuracy for speed (i.e. how often does the data change, and how critical is it for your users to get the absolute latest info?).
PROBABLY your best bet is to just load the info you need for each page.

When should I be concerned about transaction size?

I have a feature where we need to merge user records. The user chooses a record to keep and a record to discard.
We have 59 tables that reference user records in our database, so for each one I need to run:
UPDATE (Table) set UserNo=(userToKeep) WHERE UserNo=(userToDiscard)
and then DELETE the userToDiscard and their user prefs (118).
Should I be worried about the transaction size? This is MS-SQL 2005.
Is there anything I could do?
Thanks
Have you tested how long the process actually takes? How often are users merged?
If you have indexes on the user ID in each of these tables (and I would think that would be the natural thing to do anyway), then even with 59 tables it shouldn't take too long to perform those updates and deletes. If you only actually merge users a couple of times a week, then a little blip like that shouldn't be an issue. At worst, someone has to wait an extra couple of seconds to do something once or twice a week.
Another option would be to save these user merge requests in a table and do the actual work in a nightly process (or whenever "off-hours" is for your application). You would need to make it clear to the users of your application though that merges do not take effect immediately. You would also need to account for a lot of possible contingencies: what if the same user is set to merge with two different users that night, etc.
It depends on how large your user table is, and what indexes you have in place.
Merging users does not sound like a feature that would be used very often. Given that, there's a 98% probability you shouldn't worry about transaction size (the remaining 2% is reserved for possible deadlocks).
Generally, transactions should be the smallest size they need to be, to minimize contention and possible deadlock situations (although making them too small can cause overhead as well). Would the queries that go against these tables give incorrect results if some of the rows were changed first and others later? Depending on your application, this could cause a business problem.
Any idea how many rows will be updated in each table? If each user could have millions of rows in a table, you might need to be more careful than if there are a handful of rows in each table.
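The merge described in the question can be sketched as one transaction across all referencing tables, so readers never see a half-merged user. Here sqlite3 stands in for MS-SQL 2005, and two hypothetical tables stand in for the 59 real ones:

```python
import sqlite3

# Hypothetical tables standing in for the 59 real referencing tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (UserNo INTEGER PRIMARY KEY);
CREATE TABLE orders (id INTEGER PRIMARY KEY, UserNo INTEGER);
CREATE TABLE comments (id INTEGER PRIMARY KEY, UserNo INTEGER);
CREATE TABLE user_prefs (id INTEGER PRIMARY KEY, UserNo INTEGER);
""")
conn.executemany("INSERT INTO users (UserNo) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO orders (UserNo) VALUES (?)", [(2,), (2,)])
conn.execute("INSERT INTO comments (UserNo) VALUES (2)")
conn.execute("INSERT INTO user_prefs (UserNo) VALUES (2)")

REFERENCING_TABLES = ["orders", "comments"]  # 59 entries in the real schema

def merge_users(conn, keep, discard):
    # One transaction: either the whole merge happens or none of it does.
    with conn:
        for table in REFERENCING_TABLES:
            conn.execute(
                f"UPDATE {table} SET UserNo = ? WHERE UserNo = ?",
                (keep, discard))
        conn.execute("DELETE FROM user_prefs WHERE UserNo = ?", (discard,))
        conn.execute("DELETE FROM users WHERE UserNo = ?", (discard,))

merge_users(conn, keep=1, discard=2)
print(conn.execute("SELECT COUNT(*) FROM orders WHERE UserNo = 1").fetchone()[0])  # 2
```

With an index on UserNo in each table, each statement touches only the discarded user's rows, which keeps the transaction short even with 59 tables.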