I have two web pages: one will contain summary event info (e.g. Artist, City, Date) and the other will contain additional detailed event info (e.g. Time, Lat/Long, Full Address). Users will see the summary info and can click through to the detailed info.
As far as performance is concerned, is it best to preload all of the information once on my first page and then simply pass the additional detailed info to the second page via a POST? Or should the first page query only for the exact data it needs, and then the second page run another query that brings back the additional data?
I'm open to any and all suggestions. Thanks.
Query for the data you need on the first page, then have the second page run its own query. That way you don't risk showing the user stale data (which is really important), your application has fewer moving parts (which is a nice feature), and you can use GET for the detail page so the user can bookmark it (also a good thing). The database will cache the information from the first query, so it probably won't be doing a lot of repeated work for the second one.
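For illustration, here is a minimal sketch of the two queries, assuming a hypothetical Events table; the table and column names are made up for the example, not taken from your post:

    -- Summary page: fetch only what the listing needs
    SELECT EventId, Artist, City, EventDate
    FROM Events
    ORDER BY EventDate;

    -- Detail page: fetch by id taken from the GET parameter (e.g. ?id=42),
    -- passed as a bound parameter rather than concatenated into the SQL
    SELECT EventId, Artist, City, EventDate,
           EventTime, Latitude, Longitude, FullAddress
    FROM Events
    WHERE EventId = @EventId;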
Typically, even when performance is a major concern, you want caching to be orthogonal to your application logic. There are non-intrusive mechanisms, such as web proxies or a Hibernate second-level cache, that are worth considering before you commit yourself to baking caching into your application code.
No answer can be 100% correct, because the right choice depends on:
(1) the amount of data you have (i.e. how many results on the summary page, and how many events in total in your DB)
(2) how many servers you have
(3) how much traffic you get
(4) your software architecture (i.e. which database, web server, etc.)
(5) how your site is used (i.e. will users typically click on 1 or 2 results from the summary page, or more like 20 or 30?)
(6) whether you are willing to sacrifice accuracy for speed (i.e. how often does the data change, and how critical is it for your users to get the absolute latest info?)
PROBABLY your best bet is to just load the info you need for each page.
Let's assume we have a ticketing system web page that displays tickets (spread across multiple pages). The same page also contains a search form that allows filtering.
Those tickets can be modified at any time (delete, update, insert).
So I'm a bit confused: how should the internal architecture look? I've been thinking about it for a while and I haven't found a clear path.
From my point of view there are 2 ways:
Use something like an in-memory database and store all the data there. That makes it very easy to filter content and display the requested items. But this solution implies storing a lot of useless data in RAM, like closed or resolved tickets, and those tickets still need to be there because they can be requested.
Use the database for every search, page display, etc. That means a lot of queries: every search and every page view (per user) results in a database query. Isn't this a bit too much?
Which solution is better? Are there any better solutions? Are my concerns unfounded?
You said: "But this solution implies storing a lot of useless data in RAM, like closed or resolved tickets, and those tickets still need to be there because they can be requested."
If those tickets should be there because they can be requested, then it's not really useless data, is it?
It sounds like a good use case for a hybrid in-memory/persistent database. Keep the open/displayed tickets in in-memory tables. When closed, move them to persistent tables.
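As a rough sketch of that idea (all table and column names here are assumptions, not from your post), the "close" operation could move rows from the hot table to the archive table in one transaction:

    -- OpenTickets: small, frequently searched table (e.g. an in-memory /
    -- memory-optimized table, depending on your database).
    -- ClosedTickets: ordinary persistent table for resolved/closed tickets.
    BEGIN TRANSACTION;

    INSERT INTO ClosedTickets (TicketId, Title, Status, ClosedAt)
    SELECT TicketId, Title, Status, SYSUTCDATETIME()
    FROM OpenTickets
    WHERE Status IN ('closed', 'resolved');

    DELETE FROM OpenTickets
    WHERE Status IN ('closed', 'resolved');

    COMMIT;

Searches on the ticket page then hit only the small open-ticket table, and a lookup of an old ticket falls back to the archive (or to a UNION ALL view over both).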
I've developed a React Native application where users can log in, choose different items from a list, see an item's details (profile), and add/delete/edit posts attached to an item.
Now my user base has grown, and therefore I have decided to introduce new database tables to log each action my users take, so that I can later analyze the accumulated data and optimize things like usability.
My first question is: is there any convention or standard that lists the data to collect in such a case (like log time, action, ...)? I don't want to lose useful data by noticing its value too late.
And: at what intervals should the app send the users' log data to my remote server (async requests after each action, daily, before logout, ...)? Is there any gold standard?
Actually it's more about how much data you would like to collect and whether that matches your privacy terms and conditions. If you're going to store the data on some server other than yours to analyse it, it is highly recommended that you don't refer to user ids there, clearly for privacy reasons.
About when the right time to log data is: again, it depends on what you would like to track. If you are tracking how many minutes users spend on a screen or how they interact with posts, you may need to send that data to your server regularly, depending on whether you want to analyse it instantly to improve the user experience (show more relevant posts) or only use it later. If the data you need to analyse is not that much, you can send it after each action; if you're planning to track huge amounts of data that you don't need right away, you could send it in time frames when you don't have a big load on your server (to save bandwidth you might choose night time, although it's a little more complicated than that).
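Regarding the first question (which fields to collect), there is no single standard I know of, but a log record along these lines covers most later analyses. This is only an illustrative sketch: every name here is made up, and the anonymised UserRef is one way to avoid storing raw user ids off-site:

    CREATE TABLE UserActionLog (
        LogId      BIGINT IDENTITY PRIMARY KEY,
        UserRef    CHAR(64)     NOT NULL,  -- hashed/anonymised user reference, not the raw id
        Action     VARCHAR(50)  NOT NULL,  -- e.g. 'view_item', 'add_post', 'edit_post', 'delete_post'
        ItemId     INT          NULL,      -- the item/post the action refers to, if any
        Screen     VARCHAR(50)  NULL,      -- which screen the action happened on
        AppVersion VARCHAR(20)  NULL,
        ActionTime DATETIME2    NOT NULL,  -- when it happened on the device
        ReceivedAt DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()  -- when the server received it
    );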
I am currently recording basic page views on a website using a single column, incrementing by one on each page load.
This gives a limited, very general view of the most visited pages, without taking into account pages being repeatedly loaded by a visitor, or being visited by search bots, etc.
Without worrying about these, I would like to efficiently track webpage visits, to allow querying for more detail, such as the most popular page today, or most popular this week.
Storing each view as an individual record would surely become inefficient quickly, and the data I need doesn't require that level of detail.
In order to answer your question, you would have to provide your storage requirements and limitations, as well as the information that you want to store to identify page views.
In terms of pure storage efficiency, your existing logging is, I'd say, the most efficient way of storing page views. Realistically, though, this data is not very meaningful without other pieces of information that give you a better picture; as you mentioned, tracking the user, IP address, and other non-sensitive information will give you a better panorama of the activity on your site.
I would suggest an approach that gives you both meaningful information and analytics capability in the following form:
Keep a log of all page views in a table that stores information such as the following (see the sketch after this list):
IP
Page (either the address, or if you're using MVC, the Controller and Action)
User Agent
IsMobileRequest? (Optional, in MVC you can access it through the Request.Browser.IsMobileDevice property)
TimeStamp
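A sketch of such a log table; the names and types are assumptions, so adjust them to taste:

    CREATE TABLE PageViews (
        PageViewId      BIGINT IDENTITY PRIMARY KEY,
        IpAddress       VARCHAR(45)  NOT NULL,  -- 45 characters also covers IPv6
        Page            VARCHAR(200) NOT NULL,  -- URL, or Controller/Action if you use MVC
        UserAgent       VARCHAR(400) NULL,
        IsMobileRequest BIT          NULL,      -- e.g. from Request.Browser.IsMobileDevice
        ViewedAt        DATETIME2    NOT NULL DEFAULT SYSUTCDATETIME()
    );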
Additionally, you can have another table that stores a summary of the visits to all pages for a given period (for example, by month). A SQL Server job updates it every month, retrieving records from the log table, filtering them, creating a summary record in the monthly summary table, and then deleting them from the PageViews log table. This table would look similar to the one you already have, with maybe a few additional columns for figures such as distinct IP count, most popular browser, number of mobile visits, and perhaps an average visit hour range (all of them calculated by the job from the log table).
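The monthly job could then be little more than an aggregate-and-delete, roughly like this (SQL Server 2012+ syntax; the summary table and its columns are only an illustration, and the "most popular browser" calculation is omitted):

    DECLARE @MonthStart DATE = DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1);

    INSERT INTO MonthlyPageStats (Page, YearMonth, ViewCount, DistinctIps, MobileViews)
    SELECT Page,
           FORMAT(ViewedAt, 'yyyy-MM'),
           COUNT(*),
           COUNT(DISTINCT IpAddress),
           SUM(CASE WHEN IsMobileRequest = 1 THEN 1 ELSE 0 END)
    FROM PageViews
    WHERE ViewedAt < @MonthStart
    GROUP BY Page, FORMAT(ViewedAt, 'yyyy-MM');

    DELETE FROM PageViews
    WHERE ViewedAt < @MonthStart;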
This way, you always have detailed information about your page visits for the last month plus statistical summaries of the monthly activity of your site, using your available storage effectively and giving you a richer source of analysis about your site's users.
We have data stored in a data warehouse as follows:
Price
Date
Product Name (varchar(25))
We currently only have four products. That changes very infrequently (on average once every 10 years). Once every business day, four new data points are added representing the day's price for each product.
On the website, a user can request this information by entering a date range and selecting one or more product names. Analytics shows that the feature is not heavily used (about 10 user requests per week).
It was suggested that the data warehouse should daily push (SFTP) a CSV file containing all data (currently 6718 rows of this data and growing by four each day) to the web server. Then, the web server would read data from the file and display that data whenever a user made a request.
Usually, the push would only be once a day, but more than one push could be possible to communicate (infrequent) price corrections. Even in the price correction scenario, all data would be delivered in the file. What are problems with this approach?
Would it be better to have the web server make a request to the data warehouse per user request? Or does this have issues such as a greater chance for network errors or performance issues?
Would it be better to have the web server make a request to the data warehouse per user request?
Yes, it would. You have very little data, so there is no need to try to 'cache' it in some way, quite apart from the fact that a CSV file might not be the best way to do so.
There is nothing stopping you from making these requests from the web server to the database server. With as little data as this, performance will not be an issue, and even if it were to become one as everything grows, there is a lot to be gained on the database side (indexes, etc.) that will let you carry on this way for the next 100 years.
The number of requests from your users (also extremely small) does not need any special treatment either, so again, a direct query is best.
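To make it concrete, the per-request query is about as small as queries get. This is only a sketch, assuming a hypothetical DailyPrices table matching the three columns described in the question:

    SELECT ProductName, PriceDate, Price
    FROM DailyPrices
    WHERE PriceDate BETWEEN @FromDate AND @ToDate           -- from the date-range form
      AND ProductName IN (@Product1, @Product2, @Product3)  -- selected products, bound as parameters
    ORDER BY PriceDate, ProductName;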
Or does this have issues such as a greater chance for network errors or performance issues?
Well, it might, but that would not justify the CSV method. Examples, and why you need not worry about them:
The connection with the database server is down.
This is an issue for both methods, but with only one connection per day, the chance of a 1-in-10,000 failure might seem to favour the once-a-day method. These issues should not come up very often, though, and when they do you should be able to handle them (retry the request, show the user a message). This is what enormous numbers of websites do, so trust me when I say this will not be an issue. Also, think about what it would mean if your daily update failed: that would be the bigger problem!
Performance issues
As said, given the amount of data and the number of requests, this is not a problem. Even if it becomes one, it is a problem you should be able to catch at a different level: use a caching system (not a CSV file) on the database server, use a caching system on the web server, or fix your indexes so performance stops being a problem.
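For example, a single covering index along these lines (names assumed, matching the query sketched above) keeps the date-range lookup cheap no matter how much the table grows:

    CREATE INDEX IX_DailyPrices_Product_Date
        ON DailyPrices (ProductName, PriceDate)
        INCLUDE (Price);   -- covering: the query never has to touch the base table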
BUT:
It is far from strange to want your data warehouse separated from your web system. If this is a requirement, and it surely could be, the best thing you can do is re-create your warehouse database (the one I just defended as being good enough to query directly) on another machine. You might get good results with a master-slave setup:
Your data warehouse is the master database: it sends all changes to the slave but is otherwise inaccessible.
Your second database (even on your web server, if you like) gets all updates from the master and is read-only; you can only query it for data.
Your web server cannot connect to the data warehouse, but it can connect to the slave to read information. Even if there were an injection hack, it wouldn't matter much, as the slave is read-only.
Now there is never a moment when you have to update the queried database yourself (the master-slave replication keeps it up to date at all times), and there is no chance that queries from the web server put your warehouse in danger. Profit!
I don't really see how SQL injection could be a real concern. I assume you have some calendar-type field that the user fills in to get data out. If that is the only form, just ensure that the only value it accepts is a date, and something like DROP TABLE isn't possible. As for getting access to the database, that is another issue; however, a separate file holding just the connection function should do fine in most cases, so that a user can't, say, open your web page in an HTML viewer and see your database connection string.
As for the CSV, I would say querying the database per user request, especially when the feature is only used ~10 times weekly, would be much more efficient than the CSV. The CSV is overkill: you only have ~10 users attempting to get some information each week, so exporting an updated CSV every day would be too much work for such a small payoff.
EDIT:
Also, if an attack is a big concern (which really depends on the nature of the business, the data being stored, and the visitors you receive), you could always create a backup as another option. I don't really see a reason for this as your question is currently stated, but even with the best security an attack could happen; that mainly depends on whether the attackers want the information you have.
On my website there is a group of 'power users' who are fantastic and add lots of content to my site.
However, their prolific activity has led to their profile pages slowing down a lot. For 95% of the other users, the SPROC that returns the data is very quick. It is only for this group of power users that the very same SPROC is slow.
How does one go about optimising the query for this group of users?
You can assume that the right indexes have already been constructed.
EDIT: OK, I think I have been a bit too vague. To rephrase the question: how can I optimise my site to improve performance for these 5% of users? Given that this SPROC is the same one used for every user and is already well optimised, I am guessing the next steps are to explore caching possibilities at the data and application layers?
EDIT2: The only difference between my power users and the rest of the users is the amount of stuff they have added, so I guess the bottleneck is just the sheer number of records being fetched. An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
I think you summed it up here:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
Implement paging so that it only fetches 100 at a time or something?
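A minimal sketch of what that could look like in T-SQL (OFFSET/FETCH, SQL Server 2012+), with made-up table and column names:

    SELECT ItemId, Title, CreatedAt
    FROM Items
    WHERE UserId = @UserId
    ORDER BY CreatedAt DESC
    OFFSET @PageNumber * 100 ROWS   -- @PageNumber starts at 0
    FETCH NEXT 100 ROWS ONLY;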
Well, you can't optimize a query for one specific result set and leave it unchanged for the rest, if you know what I mean. I'm guessing there's only one query to change, so you will be optimizing it for every type of user; this optimization scenario is therefore no different from any other. Figure out what the problem is: is too much data being returned? Are calculations taking too long because of the amount of data? Where exactly is the cause of the slowdown? Those are the questions you need to ask yourself.
However, I see you are talking about profile pages being slow. Since you think the query that returns that information is already optimized (because it works for 95% of users), you might consider some form of caching of the profile page content. In general, profile pages do not have to supply real-time information.
Caching can be done in a lot of ways, far too many to cover in this answer. But to give you one small example: you could work with a temp table that holds pre-calculated information. Your 'profile query' returns information from that table, and because the information is already calculated, the query is simple and won't take much time to execute. Meanwhile, you make sure that the table is periodically refreshed.
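A rough sketch of that idea, with made-up names (here the "temp table" is just an ordinary pre-computed summary table, refreshed by a scheduled job):

    -- Refreshed periodically, e.g. by a SQL Agent job
    TRUNCATE TABLE ProfileSummary;

    INSERT INTO ProfileSummary (UserId, ItemCount, LastItemDate, RefreshedAt)
    SELECT UserId, COUNT(*), MAX(CreatedAt), SYSUTCDATETIME()
    FROM Items
    GROUP BY UserId;

    -- The profile page then runs only a trivial lookup:
    SELECT ItemCount, LastItemDate
    FROM ProfileSummary
    WHERE UserId = @UserId;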
Just a couple of ideas. I hope they're useful to you.
Edit:
An average user adds about 200 items to my site. These power users add over 10,000 items. On their profile, I am showing all the items they have added (you can scroll through them).
An obvious help here will be to limit the number of results inside the query, or to apply a form of pagination (in the DAL, not the UI/BLL!).
You could limit the profile display so that it only shows the most recent 200 items. If your power users want to see more, they can click a button and get the rest of their items. At that point, they would expect a slower response.
Partition or separate the data for those users; then the tables in question will be used only by them.
In a clustered environment I believe SQL Server recognises this and spreads the load to compensate; in a single-server environment, however, I'm not entirely sure how it handles the optimisation.
So essentially (greatly simplified of course) ...
If you have a table called "Articles", have two tables instead: "Articles" and "Top5PercentArticles".
Because the data is now separated into two smaller subsets, the indexes are smaller and the read and write requests on any single table in the database will drop.
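Greatly simplified, and with all names and columns purely illustrative, the split plus a view to recombine the two tables for reads could look like this:

    CREATE TABLE Articles (
        ArticleId INT PRIMARY KEY,
        UserId    INT NOT NULL,
        Title     VARCHAR(200) NOT NULL
    );

    CREATE TABLE Top5PercentArticles (
        ArticleId INT PRIMARY KEY,
        UserId    INT NOT NULL,
        Title     VARCHAR(200) NOT NULL
    );
    GO

    -- Optional: a view so read queries can still see everything in one place
    CREATE VIEW AllArticles AS
        SELECT ArticleId, UserId, Title FROM Articles
        UNION ALL
        SELECT ArticleId, UserId, Title FROM Top5PercentArticles;

Writes would still have to be routed to the right table, which is exactly the business-layer bookkeeping mentioned below.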
It's not ideal from a business-layer point of view, as you would then need some way of knowing which data is stored in which table, but that's a completely separate problem altogether.
Failing that, your only option beyond tuning execution plans is to scale up your server platform.