My members will have the ability to customise their profile page with a number of widgets, each of which displays different data, such as a list of music or a list of the people they are following.
Several widgets include:
- List of media they have uploaded
- List of people they are following
- List of people following them
- HTML/Text widget
- Media Statistics (num downloads etc)
- Comments widget for other members to leave comments
Some widgets will have to page the data returned because there could be hundreds of results.
I haven't done any optimisation yet, so it's doing a lot of DB work to return all the data. What would be the most efficient way to retrieve the data? Would one DB call per widget be acceptable? There could be around 5-20 widgets per page.
If you need more information about my situation please feel free to ask.
Paul
Short answer: It depends.
Start off from the unoptimised state, then use SQL profiler or a C# profiler like dotTrace to work out the best places to make improvements. Set a realistic goal to work towards (e.g. 'less than 800 milliseconds to load the page').
Generally I find performance starts to suffer after about 20-30 database calls in a request, but this is going to depend on your server, the location of the database etc.
There are many things that you can try: pre-caching, eager fetching using joins rather than separate selects, etc. Nothing is going to guarantee better performance, though, unless it is applied intelligently.
For a page with lots of widgets, a common design pattern is to load each widget asynchronously using AJAX, rather than loading the entire page in one go.
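As a rough sketch of that pattern (assuming ASP.NET MVC here; the controller, repository, and partial view names are all hypothetical), each widget gets its own lightweight endpoint, and the page's script requests them after the shell has rendered:

    using System.Collections.Generic;
    using System.Web.Mvc;

    public class Comment { public string Author; public string Text; }

    public interface ICommentRepository
    {
        IList<Comment> GetPage(int memberId, int page, int pageSize);
    }

    // Hypothetical controller: one small endpoint per widget. The
    // profile page renders empty placeholders immediately, and
    // client-side script fetches each widget separately, so one slow
    // widget never blocks the rest of the page.
    public class WidgetController : Controller
    {
        private readonly ICommentRepository _comments;

        public WidgetController(ICommentRepository comments)
        {
            _comments = comments;
        }

        public ActionResult Comments(int memberId, int page = 1)
        {
            // One paged query per widget request.
            var model = _comments.GetPage(memberId, page, pageSize: 10);
            return PartialView("_CommentsWidget", model);
        }
    }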
Since you've cut your work up into widgets, the proper thing to do would be for each widget to issue a single query for all its required data. This would also be the case even if you retrieved widgets via AJAX (which, as cbp noted, is not a bad idea).
Secondly, I would set up some kind of mechanism for each widget to register its existence, and then, after all widgets have registered, I would fire a single query that combines all the widget queries. (Technically it's again multiple queries, but in a single round trip; see MultiCriteria and MultiQuery in the NHibernate reference.)
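NHibernate's Futures are one way to get that single round trip without much ceremony. A minimal sketch (the Media and Follow entities and their property names are invented for illustration):

    using System.Linq;
    using NHibernate;
    using NHibernate.Criterion;

    // Each widget registers its query as a Future; nothing hits the
    // database yet. Enumerating the first result sends every pending
    // query in a single round trip.
    var uploads = session.CreateCriteria<Media>()
        .Add(Restrictions.Eq("OwnerId", memberId))
        .SetMaxResults(10)
        .Future<Media>();

    var followers = session.CreateCriteria<Follow>()
        .Add(Restrictions.Eq("FollowedId", memberId))
        .SetMaxResults(10)
        .Future<Follow>();

    var uploadList = uploads.ToList();     // triggers the batched round trip
    var followerList = followers.ToList(); // already in memory

Note that Futures only batch when the ADO.NET driver supports multiple queries per command; otherwise NHibernate quietly falls back to one round trip per query.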
Also, do not forget that lazy loads are hidden DB retrievals, and you can take a huge performance hit by using lazy loading in a situation where an eager load is proper (for example, Foo.Bar.Name where you always show the Bar.Name value when you present the Foo entity).
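Using that same Foo/Bar example, the eager version is a one-line change per query with the LINQ provider (a sketch, assuming NHibernate.Linq is available):

    using System.Linq;
    using NHibernate.Linq;

    // Fetches each Foo and its Bar in one round trip, instead of one
    // hidden SELECT per Foo.Bar.Name access while rendering the list.
    var foos = session.Query<Foo>()
        .Fetch(f => f.Bar)
        .ToList();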
Performance degradation can occur even with fewer than 20-30 database calls per request; it depends on the size and complexity of your entities, queries, and filters, as well as the size of the data sets retrieved.
I've been reading the article The Vietnam of Computer Science by Ted Neward. Although there's much I don't understand or haven't fully grasped, I was struck by a thought while reading this paragraph.
The Partial-Object Problem and the Load-Time Paradox
It has long been known that network traversal, such as that done when making a traditional SQL request, takes a significant amount of time to process. ... This cost is clearly non-trivial, so as a result, developers look for ways to minimize this cost by optimizing the number of round trips and data retrieved.
In SQL, this optimization is achieved by carefully structuring the SQL request, making sure to retrieve only the columns and/or tables desired, rather than entire tables or sets of tables. For example, when constructing a traditional drill-down user interface, the developer presents a summary display of all the records from which the user can select one, and once selected, the developer then displays the complete set of data for that particular record. Given that we wish to do a drill-down of the Persons relational type described earlier, for example, the two queries to do so would be, in order (assuming the first one is selected):
SELECT id, first_name, last_name FROM person;
SELECT * FROM person WHERE id = 1;
In particular, take notice that only the data desired at each stage of the process is retrieved–in the first query, the necessary summary information and identifier (for the subsequent query, in case first and last name wouldn’t be sufficient to identify the person directly), and in the second, the remainder of the data to display. ... This notion of being able to return a part of a table (though still in relational form, which is important for reasons of closure, described above) is fundamental to the ability to optimize these queries this way–most queries will, in fact, only require a portion of the complete relation.
Skeleton Screens
Skeleton Screens are a relatively new UI design concept introduced in 2013 by Luke Wroblewski in this paper. It encourages avoiding spinners as loading indicators and instead gradually building UI elements during load time, which makes the user feel as if things are progressing quickly, even if loading is, in fact, slower than with a traditional loading indicator.
Here is a Skeleton Screen in use in the Microsoft Teams chat app. It displays this while waiting for stored chat logs to arrive from the database.
Utilizing Skeleton Screen Style Data Loading as a Paradigm for Data Retrieval
While Neward's paper focuses on Object-Relational Mappers, it prompted this thought about structuring data retrieval.
The paragraph quoted above indicates a struggle with querying too much data at one time, increasing data retrieval times until all the indicated data is gathered. What if, similar to Neward's SQL example above, smaller chunks of data were retrieved, as necessary, and loaded piecemeal into the application?
This would necessitate a fundamental shift in database querying logic, I think. It would obviously be ridiculous to implement this by hand in application code; having your everyday developer write a multi-layered retrieval scheme just to retrieve a single object would be insane. Rather, there would need to be some sort of built-in method whereby the developer can indicate which properties are considered required (username, Id, permissions, roles, etc.) and must be retrieved fully before moving forward, and which properties are ancillary. After all, many users navigate through an application faster than all the data can populate, if they are familiar with the application and just need to reach a certain page. All they need is enough data to be loaded, which is the point of this scheme.
On the database side, there would probably be a series of smaller retrievals rather than one large one. I know this is probably more expensive, although I'm not certain of the technicalities, and while database performance may suffer, application performance might improve, at least as perceived by the user.
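To make the shape of the scheme concrete, here is a rough sketch of the "required first, ancillary later" split (every type and method here is invented purely for illustration):

    using System.Threading.Tasks;

    // Hypothetical sketch: block only on the data the page cannot
    // render without, then fill in ancillary data as it arrives.
    public record CoreProfile(int Id, string Username, string[] Roles);
    public record ProfileExtras(byte[] Avatar, int DownloadCount);

    public class ProfilePage
    {
        public async Task ShowAsync(int userId)
        {
            // Small, fast retrieval: the "required" set.
            CoreProfile core = await LoadRequiredAsync(userId);
            RenderSkeleton(core); // the user can already navigate

            // Larger, slower retrievals run after the skeleton is visible.
            ProfileExtras extras = await LoadAncillaryAsync(userId);
            RenderDetails(extras);
        }

        // Stubs standing in for two separate, smaller database retrievals.
        private Task<CoreProfile> LoadRequiredAsync(int id) =>
            Task.FromResult(new CoreProfile(id, "paul", new[] { "member" }));
        private Task<ProfileExtras> LoadAncillaryAsync(int id) =>
            Task.FromResult(new ProfileExtras(new byte[0], 0));
        private void RenderSkeleton(CoreProfile core) { /* draw placeholders */ }
        private void RenderDetails(ProfileExtras extras) { /* fill them in */ }
    }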
Conclusion
Imagine pulling up your Instagram and having the first series of photos (the ones you can see) load twice as quickly as before. Photos are likely your first priority. It wouldn't matter if your username, notification indicator, and profile picture take a few extra seconds to populate, since you have already been fed your expected data and have begun consuming as a user. Contrast that with having structural data loaded first. Nobody cares about seeing their username or the company logo load. They want to see content.
I don't know if this is a terrible idea or something that has already been considered, but I'd love to get some feedback on it.
What do you think?
Is it possible, from a technical standpoint?
I need help regarding generating meta tags from database and setting them in different controller actions.
I have a table in the DB where I store meta information (keywords, description) for each controller action. I want to select these values in every action and set the tags fetched from the DB using registerMetaTag().
What I want to know is how much these queries will affect page load time, and whether there is a better approach for doing this.
Thanks,
Mark
This will be nearly unnoticeable if your database is set up traditionally; each query will add on the order of ten-thousandths of a second to the load time.
For such low-frequency data, though, you should be caching heavily, since you know it will not change often. That way the performance hit is negligible, as the values get pulled from a file, memory store, or memory table, depending on how your caching is set up.
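The question is about Yii, but cache-aside is the same shape in any stack; here's a C# sketch of the idea (MetaInfo and the DB lookup are stand-ins):

    using System;
    using System.Runtime.Caching;

    public class MetaInfo
    {
        public string Keywords;
        public string Description;
    }

    public static class MetaTags
    {
        // Cache-aside: check the cache first, fall back to one small
        // indexed DB lookup, then keep the row cached for an hour.
        public static MetaInfo Get(string controllerAction)
        {
            string key = "meta:" + controllerAction;
            if (MemoryCache.Default.Get(key) is MetaInfo cached)
                return cached;

            MetaInfo meta = LoadMetaFromDb(controllerAction); // hypothetical lookup
            MemoryCache.Default.Set(key, meta, DateTimeOffset.Now.AddHours(1));
            return meta;
        }

        private static MetaInfo LoadMetaFromDb(string action) =>
            new MetaInfo { Keywords = "...", Description = "..." };
    }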
This is all a generalization of course, but then so was the question. If you've got any special set up or more specific optimization issues just comment or open a new question.
P.S.
Don't micro-optimize. Just do it, analyse the impact, decide if it needs performance improvement, and to what degree.
http://www.codinghorror.com/blog/2009/01/the-sad-tragedy-of-micro-optimization-theater.html
I have an application that allows the user to drill down through data from a single large table with many columns. It works like this:
There is a list of distinct top-level table values on the screen.
User clicks on it, then the list changes to the distinct next-level values for whatever was clicked on.
User clicks on one of those values, taken to 3rd level values, etc.
There are about 50 attributes they could go through, but it usually ends up only being 3 or 4. But since those 3 or 4 vary among the 50 possible attributes, I have to persist the selections to the browser. Right now I do it in a hideous and bulky hidden form. It works, but it is delicate and suboptimal. In order for it to work, the value of whatever level attribute is on the screen is populated in the appropriate place on the hidden form on the click event, and then a jQuery Ajax POST submits the form. Ugly.
I have also looked at Backbone.js, but I don't want to roll another toolkit into this project while there may be some other simple convention that I'm missing. Is there a standard Rails Way of doing something like this, or just some better way period?
Possible Approaches to Single-Table Drill-Down
If you want to perform column selections from a single table with a large set of columns, there are a few basic approaches you might consider.
Use a client-side JavaScript library to display/hide columns on demand. For example, you might use DataTables to dynamically adjust which columns are displayed based on what's relevant to the last value (or set of values) selected.
You can use a form in your views to pass relevant column names into the session or the params hash, and inspect those values to decide which columns to render in the view when drilling down to the next level.
Your next server-side request could include a list of columns of interest, and your controller could use those column names to build a custom query using SELECT or #pluck. Such queries involve tainted input, so sanitize it thoroughly and handle with care (see the sketch after this list)!
If your database supports views, users could select pre-defined or dynamic views from the next controller action, which may or may not be more performant. It's at least an idea worth pursuing, but you'd have to benchmark this carefully, and make sure you don't end up with SQL injections or an unmanageable number of pre-defined views to maintain.
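This answer is Rails-flavoured, but the whitelisting idea in the third approach is language-agnostic. A minimal C# sketch (the column names and table are made up):

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class DrillDown
    {
        // Only column names on this list can ever reach the SQL string.
        private static readonly HashSet<string> AllowedColumns =
            new HashSet<string> { "region", "product_line", "year", "status" };

        public static string BuildQuery(IEnumerable<string> requested)
        {
            // Tainted input is filtered against the whitelist,
            // never interpolated into the query raw.
            var safe = requested.Where(AllowedColumns.Contains).ToList();
            if (safe.Count == 0)
                throw new ArgumentException("No valid columns requested.");
            return "SELECT DISTINCT " + string.Join(", ", safe) + " FROM big_table";
        }
    }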
Some Caveats
There are generally trade-offs between memory and latency when deciding whether to handle this sort of feature client-side or server-side. It's also generally worth revisiting the business logic behind having a huge denormalized table, and investigating whether the problem domain can be broken down into a more manageable set of RESTful resources.
Another thing to consider is that Rails won't stop you from doing things that violate the basic resource-oriented MVC pattern. From your question, there is an implied assumption that you don't have a canonical representation for each data resource; approaching Rails this way often increases complexity. If that complexity is truly necessary to meet your application's requirements then that's fine, but I'd certainly recommend carefully assessing your fundamental design goals to see if the functional trade-offs and long-term maintenance burdens are worth it.
I've found questions similar to yours on Stack Overflow; there doesn't appear to be an API or style anyone mentions for persisting across requests. The best you can do seems to be storage in classes or some iteration on what you're already doing:
1) Persistence in memory between sessions/requests
2) Coping with request persistence design-wise
3) Using class caching
I have a question. I am considering using a document store for some types of objects (e.g. product data). My criterion for using a document store is that the object has a detail page, so a fast read of the entire object is necessary (for example, a product with all its attributes, images, comments, etc.). My criterion for using SQL is displaying lists (e.g. the N newest, most popular, etc.).
Some objects meet both criteria; products are an example. So is it normal practice to store the info used to render lists on index pages in a SQL database, and the other data in a document store?
If denormalization gives you the performance you need, go ahead with it. But you have to ensure that you have a way to deal with updates to denormalized data. Your options in MongoDB are:
multiple queries to avoid denormalization
embedded docs (sketched after this list)
database references
Make your choice.
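To illustrate the embedded-docs option with the official C# driver (Product and Comment are hypothetical types echoing the products example from the question):

    using System.Collections.Generic;
    using MongoDB.Bson;
    using MongoDB.Driver;

    public class Comment
    {
        public string Author;
        public string Text;
    }

    public class Product
    {
        public ObjectId Id;
        public string Name;
        // Embedded: the whole detail page comes back in a single read.
        public List<Comment> Comments = new List<Comment>();
    }

    // One round trip fetches everything the detail page needs:
    // var products = new MongoClient().GetDatabase("shop")
    //                                 .GetCollection<Product>("products");
    // var product = products.Find(p => p.Id == id).FirstOrDefault();

The trade-off is the update problem mentioned above: if the same data is embedded in several documents, every copy has to be rewritten when it changes.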
The main idea is that MongoDB was created for denormalization and embedding. On one of my past projects I did SQL denormalization to get better performance, but I don't like SQL denormalization because it duplicates a lot of data (if you have a one-to-many relation, for example). The second step was rewriting the data access layer to MongoDB. In MongoDB, for some difficult pages where I needed to load multiple documents, I created a denormalized document (with embedded collections and plain data from different documents) to fit the page content. Now all my problem pages work fast, like Facebook ;).
But there are possible problems, because you have to maintain the denormalized document every time the source data changes. Also, all my denormalized data updates run asynchronously, so some data can be stale at any given moment, but that's normal practice. Even Stack Overflow uses denormalization: sometimes when I open a question I see an answer, but when I go back to the question list and refresh the page, the question still shows no answers.
If I need denormalization, I choose MongoDB.
Most of what I hear about NHibernate's lazy loading is that it's better to use it than not to use it. It seems to make sense to minimize database access in an effort to reduce bottlenecks. But few things come without trade-offs; it certainly limits design slightly, by forcing you to have virtual properties. And I've noticed that some developers turn lazy loading off on certain often-used objects.
This makes me wonder if there are some definite situations where data-access performance is hurt by using lazy-loading.
So I wonder, when and in what situations should I avoid lazy-loading one of my NHibernate-persisted objects?
Is the downside to lazy loading merely additional processing time, or can NHibernate's lazy loading also increase data-access time (for instance, by making additional round trips to the database)?
Thanks!
There are clear performance tradeoffs between eager and lazy loading objects from a database.
If you use eager loading, you suck a ton of data in a single query, which you can then cache. This is most common on application startup. You are trading memory consumption for database round trips.
If you use lazy loading, you suck a minimal amount of data in a single query, but any time you need more information related to that initial data, it requires more queries to the database, and database round trips are very often the major performance bottleneck in most applications.
So, in general, you always want to retrieve exactly the data you will need for the entire "unit of work", no more, no less. In some cases, you may not know exactly what you need (because the user is working through a wizard or something similar) and in that case it probably makes sense to lazy load as you go.
If you are using an ORM and are focused on adding features quickly, planning to come back and optimize performance later (which is extremely common and a good way to do things), then having lazy loading as the default is the correct way to go. If you later find, through performance profiling/analysis, that you have one query to get an object and then N more queries to get its N related objects, you can change that piece of code to use eager loading so it hits the database once instead of N+1 times (the N+1 problem is a well-known downside of lazy loading).
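The N+1 pattern is easy to see with NHibernate's LINQ provider (a sketch; the Order/Lines names are assumed):

    using System;
    using System.Linq;
    using NHibernate.Linq;

    // Lazy default: 1 query for the orders, plus N more queries,
    // one per order, as each Lines collection is first touched.
    var orders = session.Query<Order>().ToList();
    foreach (var order in orders)
        Console.WriteLine(order.Lines.Count); // each access lazy-loads

    // Eager fix after profiling: one round trip for orders and lines.
    var fetched = session.Query<Order>()
        .FetchMany(o => o.Lines)
        .ToList();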
The usual tradeoff for lazy loading is that you make a smaller hit on the database up front, but you end up making more hits on it long-term.
Without lazy loading, you'll grab an entire object graph up front, sucking down a large chunk of data at once. This could potentially cause lag in your UI, and so it is often discouraged. However, if you have a common object graph (not just a single object, otherwise it wouldn't matter!) that you know will be accessed frequently, and top to bottom, then it makes sense to pull it down in one go.
As an example, if you're doing an order management system, you probably won't pull down all the lines of every order, or all the customer information, on a summary screen. Lazy loading prevents this from happening.
I can't think of a good example for not using it offhand, but I'm sure there are cases where you'd want to do a big load of an object graph, say, on application initialization, in order to avoid lags in processing further down the line.
The short version is this:
Development is simpler if you use lazy loading. You just traverse object relationships in a natural OO way, and you get what you need when you ask for it.
Performance is generally better if you figure out what you need before you ask for it, and ask for it in one trip to the database.
For the past few years we've been focusing on quick development times. Now that we have a solid app and userbase, we're optimizing our data access.
If you are using a web service between the client and the server to handle database access with NHibernate, lazy loading can be problematic: the object will be serialized and sent over the web service, and any subsequent access to objects further down the object graph needs a new trip to the database server via additional web service calls. In such a case lazy loading may not be a good fit. A word of caution if you turn lazy loading off: be careful what you fetch. It's far too easy not to think it through and end up fetching almost the whole database.
I have seen many performance problems arising from wrong loading-behaviour configuration in Hibernate, and the situation is much the same with NHibernate, I think. My recommendation is to always map relations as lazy, and then use eager fetching statements in your queries, like fetch joins. This ensures you are not loading too much data, and you avoid firing too many SQL queries.
It is easy to make a lazy relation eager in a query; it is nearly impossible the other way round.
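A sketch of that recommendation with an HQL fetch join (the Order/Lines/Customer names are assumed): the mapping stays lazy, and only this query opts into eagerness:

    using NHibernate.Transform;

    var orders = session
        .CreateQuery("from Order o left join fetch o.Lines where o.Customer.Id = :id")
        .SetParameter("id", customerId)
        .SetResultTransformer(Transformers.DistinctRootEntity) // de-dupe roots from the join
        .List<Order>();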