Lazy loading data a la Skeleton Screens. Is it possible?

I've been reading the article The Vietnam of Computer Science by Ted Neward. Although there's much in it I don't understand or haven't fully grasped, I was struck by a thought while reading this paragraph.
The Partial-Object Problem and the Load-Time Paradox
It has long been known that network traversal, such as that done when making a traditional SQL request, takes a significant amount of time to process. ... This cost is clearly non-trivial, so as a result, developers look for ways to minimize this cost by optimizing the number of round trips and data retrieved.
In SQL, this optimization is achieved by carefully structuring the SQL request, making sure to retrieve only the columns and/or tables desired, rather than entire tables or sets of tables. For example, when constructing a traditional drill-down user interface, the developer presents a summary display of all the records from which the user can select one, and once selected, the developer then displays the complete set of data for that particular record. Given that we wish to do a drill-down of the Persons relational type described earlier, for example, the two queries to do so would be, in order (assuming the first one is selected):
SELECT id, first_name, last_name FROM person;
SELECT * FROM person WHERE id = 1;
In particular, take notice that only the data desired at each stage of the process is retrieved–in the first query, the necessary summary information and identifier (for the subsequent query, in case first and last name wouldn’t be sufficient to identify the person directly), and in the second, the remainder of the data to display. ... This notion of being able to return a part of a table (though still in relational form, which is important for reasons of closure, described above) is fundamental to the ability to optimize these queries this way–most queries will, in fact, only require a portion of the complete relation.
Skeleton Screens
Skeleton Screens are a relatively new UI design concept introduced in 2013 by Luke Wroblewski in this paper. The idea is to avoid spinners as loading indicators and instead gradually build up the UI elements during load time, which makes the user feel as if things are progressing quickly, even if loading is, in fact, slower than with a traditional loading indicator.
The Microsoft Teams chat app, for example, uses a Skeleton Screen while waiting for stored chat logs to arrive from the database.
Utilizing Skeleton Screen Style Data Loading as a Paradigm for Data Retrieval
While Neward's paper focuses on Object Relational Mappers, it prompted this thought about structuring data retrieval.
The paragraph quoted above describes the struggle of querying too much data at once: retrieval time grows until all of the requested data has been gathered. What if, as in Neward's SQL example above, smaller chunks of data were retrieved as necessary and loaded piecemeal into the application?
This would necessitate a fundamental shift in database querying logic, I think. It would obviously be ridiculous to implement this in application code; having your everyday developer write a multi-layered retrieval scheme to fetch a single object would be insane. Rather, there would need to be some built-in method whereby the developer can indicate which properties are required (username, id, permissions, roles, etc.) and must be fully retrieved before moving forward, and which properties are ancillary (a sketch of what this might look like follows). After all, many users navigate through an application faster than all the data can populate; if they are familiar with the application and just need to reach a certain page, all they need is enough data to be loaded, which is the point of this scheme.
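To make the idea concrete, here is a minimal sketch of what such a required/ancillary annotation might look like in application code. Everything in it (loadEntity, fetchColumns, the endpoint shape, the field lists) is hypothetical:

// Hypothetical sketch only: these names are invented, not an existing API.
async function fetchColumns(entity, id, columns) {
  // Stand-in for a driver call that issues something like:
  //   SELECT <columns> FROM <entity> WHERE id = ?
  const res = await fetch(`/api/${entity}/${id}?fields=${columns.join(',')}`);
  return res.json();
}

const userSpec = {
  entity: 'user',
  required: ['id', 'username', 'permissions', 'roles'], // block rendering on these
  ancillary: ['avatarUrl', 'notificationCount', 'bio'], // fill in as they arrive
};

async function loadEntity(spec, id, render) {
  // Phase 1: fetch only the required columns and render immediately.
  const core = await fetchColumns(spec.entity, id, spec.required);
  render(core);

  // Phase 2: fetch the ancillary columns in the background and merge
  // them into the UI whenever they arrive.
  fetchColumns(spec.entity, id, spec.ancillary).then((extras) =>
    render({ ...core, ...extras })
  );

  return core;
}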
On the database side, there would probably be a series of smaller retrievals rather than one large one. I know this is probably more expensive, although I'm not certain of the technicalities, and while database performance may suffer, application performance might improve, at least as perceived by the user.
Conclusion
Imagine pulling up Instagram and having the first series of photos (the ones you can see) load twice as quickly as before. The photos are likely your first priority. It wouldn't matter if your username, notification indicator, and profile picture take a few extra seconds to populate, since you have already received the content you expected and have begun consuming it. Contrast that with loading the structural data first: nobody cares about seeing their username or the company logo load. They want to see content.
I don't know if this is a terrible idea or something that has already been considered, but I'd love to get some feedback on it.
What do you think?
Is it possible, from a technical standpoint?

Related

Good practice to fetch detail api data in react-redux app

What's the best practice for fetching detail data in a React app when you are dealing with multiple master-detail views?
For example, if you have:
- a /rest/departments API which returns the list of departments
- a /rest/departments/:departmentId/employees API which returns all employees within a department
To fetch all departments I use:
componentDidMount() {
  this.props.dispatch(fetchDepartments());
}
but then I'll need logic to fetch all employees per department. Would it be a good idea to call the employee action creator for each department in the department reducer logic?
Dispatching employee actions in the render method does not look like a good idea to me.
Surely it is a bad idea to call an employee action creator inside the department reducer, as reducers should be pure functions; you should do it in your fetchDepartments action creator, as sketched below.
Anyway, if you need to get all the employees for every department (not just the selected one), it is not ideal to make many API calls: if possible, I would ask the backend developers for an endpoint that returns the array of departments and, for each department, an embedded array of employees, if the numbers aren't too big of course...
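A minimal sketch of that first approach, using the redux-thunk pattern (the endpoint paths come from the question; the action types and the fetchJson helper are invented for illustration):

// Assumes the redux-thunk middleware is installed.
const fetchJson = (url) => fetch(url).then((res) => res.json());

export function fetchDepartments() {
  return async (dispatch) => {
    const departments = await fetchJson('/rest/departments');
    dispatch({ type: 'DEPARTMENTS_SUCCESS', departments });

    // Kick off one employees request per department here, in the action
    // creator -- never in the reducer, which must stay pure.
    departments.forEach(async (dept) => {
      const employees = await fetchJson(
        `/rest/departments/${dept.id}/employees` // assumes departments carry an id
      );
      dispatch({ type: 'EMPLOYEES_SUCCESS', departmentId: dept.id, employees });
    });
  };
}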
Big old "It depends"
This is something where, in the end, you will need to pick an approach and see how it works out with your specific data and user needs. It also depends on network conditions such as latency. In a very well-networked environment, such as the top-3 insurance company I was a net admin for, you can achieve super-low-latency network calls, and multiple network requests behave very differently than they would over a residential internet connection. Even then, you have to consider a wide range of possibilities. And you ALWAYS need to consider your end goals.
(Not to get too deep into the technical aspects, but latency can fairly accurately be defined as "the time you are waiting for a network request to actually start sending data". A classic example of where this matters is online first-person shooter gaming: you click shoot, the data is not transmitted as fast as you would like because the network is still waiting to send it, and then you die. A classic example where bandwidth matters more than latency is downloading or uploading large files: if you have to wait a second or two for the data to start moving, but once it moves you can download a GB in seconds, then oh well, I'll take it.)
Currently, I have our website making multiple calls to load dynamic menus and dynamic content. It is very small data, done in three separate calls, over the internet. It's "ok", but I would not say it is "good". Since users are waiting for all of it before they can even start, I might as well throw it all into a single network call. Also, if two calls go fine but the third chokes a bit, the user may start to navigate, and then more menus pop in, which is not ideal. This is why, regardless, you have to think about your specific needs and the range of use cases likely to apply. (I am currently re-writing the entire site anyway.)
As a previous (in my opinion good) answer stated, it probably makes sense to have the whole data set shot to you in one gulp. It appears to me this is an internal, or at least commercial, app with a decent network and, much more importantly, no risk of losing customers because your stuff did not load super fast.
That said, if things do not work out well with that, especially if you are talking large data sets, then consider a lazy loading architecture. For example, your user cannot get to an employee until they see the departments. So it may be ok, depending on your network and large data size, to load departments, and then after it returns initiate an asynchronous load of the employee data. The employee data is now being loaded while your user browses the department names.
A huge question you may want to clarify is whether any employee list data is rendered WITH the departments. In one of my cases, I have a work order system that loads after login, lazily, and when it finishes it throws a badge on the Work Order menu showing how many orders are outstanding. Since I do not have a lot of orders, it is basically a one-second wait. No biggie; it is not as if the user has to wait for it before beginning work. If you wanted a badge per department, though, it could get weird: loading by department, you could have multiple badges popping in at random. That may cause user confusion, and it is probably a good choice to load it all in one large chunk. If the user has to wait anyway, it may mean one less call from a user asking "is it ok that it is doing this?". Especially with workplace software, it is more acceptable to have to wait for an initial load at the beginning of the work day.
To be clear, with all of these complications to consider, it is extremely important that you develop with the best coding practices you can manage. That way, you can code one solution, and if it does not meet your performance or user needs, it is not a nightmare to change. In the general case with small data, I would just load it in one big gulp to start with, and only complicate things from there if load times become a problem. Complicating code from the beginning for no clearly needed reason is a good way to clutter it up to the point of making it completely unwieldy to maintain.
On a third note, if you are dealing with enterprise size data sets, that is a whole different thing. Then you have to deal with pagination, and yes it gets a bit more complicated.
Regards,
DB
I'm not sure what fetchDepartments does exactly, but I'd ensure the actual fetch request is executed from a Redux middleware. By doing it from middleware, you can fingerprint / cache / debounce all your requests and make a single one across the app no matter how many components request the thing.
In general, middleware is the best place to handle asynchronous side effects.
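As a rough sketch (the { type: 'FETCH', ... } action shape and the in-flight map are assumptions for illustration, not a real library API):

// Deduplicates concurrent requests for the same URL: however many
// components dispatch a FETCH for it, only one network call is made.
const inFlight = new Map();

const fetchMiddleware = (store) => (next) => (action) => {
  if (action.type !== 'FETCH') return next(action);

  if (!inFlight.has(action.url)) {
    const promise = fetch(action.url)
      .then((res) => res.json())
      .finally(() => inFlight.delete(action.url));
    inFlight.set(action.url, promise);
  }

  // Every caller shares the same pending promise; the FETCH action
  // itself is handled here and not forwarded to the reducers.
  inFlight
    .get(action.url)
    .then((data) => store.dispatch({ type: action.onSuccess, data }));
};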

Is there a Rails convention to persisting lots of query data to the browser?

I have an application that allows the user to drill down through data from a single large table with many columns. It works like this:
- There is a list of distinct top-level table values on the screen.
- The user clicks on one, and the list changes to the distinct next-level values for whatever was clicked.
- The user clicks on one of those values, is taken to the third-level values, and so on.
There are about 50 attributes they could go through, but it usually ends up being only 3 or 4. Since those 3 or 4 vary among the 50 possible attributes, I have to persist the selections to the browser. Right now I do it in a hideous and bulky hidden form. It works, but it is delicate and suboptimal. For it to work, the value of whatever level attribute is on the screen is written into the appropriate place in the hidden form on the click event, and then a jQuery Ajax POST submits the form. Ugly.
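In SQL terms, each level of the drill-down amounts to a DISTINCT query filtered by the selections made so far, something like this (attr1, attr2, ... are invented stand-ins for the ~50 attribute columns):

-- Level 1: distinct top-level values
SELECT DISTINCT attr1 FROM big_table;

-- Level 2: distinct next-level values for whatever was clicked
SELECT DISTINCT attr2 FROM big_table WHERE attr1 = 'chosen value';

-- Level 3: narrow further with each selection
SELECT DISTINCT attr3 FROM big_table
WHERE attr1 = 'chosen value' AND attr2 = 'next value';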
I have also looked at Backbone.js, but I don't want to roll another toolkit into this project while there may be some simple convention I'm missing. Is there a standard Rails Way of doing something like this, or just some better way, period?
Possible Approaches to Single-Table Drill-Down
If you want to perform column selections from a single table with a large set of columns, there are a few basic approaches you might consider.
- Use a client-side JavaScript library to show or hide columns on demand. For example, you might use DataTables to dynamically adjust which columns are displayed based on what's relevant to the last value (or set of values) selected.
- Use a form in your views to pass the relevant column names into the session or the params hash, and inspect those values to decide which columns to render when drilling down to the next level.
- Have the next server-side request include a list of columns of interest, and let your controller use those column names to build a custom query using SELECT or #pluck. Such queries often involve tainted input, so sanitize it thoroughly and handle with care!
- If your database supports views, users could select pre-defined or dynamic views from the next controller action, which may or may not be more performant. It's at least an idea worth pursuing, but you'd have to benchmark it carefully and make sure you don't end up with SQL injection or an unmanageable number of pre-defined views to maintain.
Some Caveats
There are generally trade-offs between memory and latency when deciding whether to handle this sort of feature client-side or server-side. It's also generally worth revisiting the business logic behind having a huge denormalized table, and investigating whether the problem domain can't be broken down into a more manageable set of RESTful resources.
Another thing to consider is that Rails won't stop you from doing things that violate the basic resource-oriented MVC pattern. From your question, there is an implied assumption that you don't have a canonical representation for each data resource; approaching Rails this way often increases complexity. If that complexity is truly necessary to meet your application's requirements then that's fine, but I'd certainly recommend carefully assessing your fundamental design goals to see if the functional trade-offs and long-term maintenance burdens are worth it.
I've found questions similar to yours on Stack Overflow; there doesn't appear to be an API or style anyone mentions for persisting across requests. The best you can do seems to be storage in classes or some iteration on what you're already doing:
1) Persistence in memory between sessions/requests
2) Coping with request persistence design-wise
3) Using class caching

Select * vs specific columns & loading object properties

I always thought SELECT * was bad and that you should always return only the columns you are going to use. One of the reasons for this is that the DB can return the result without hitting any tables if all the columns needed are in the index.
I have a factory class that loads the properties of a Product object. It loads all the properties every time GetProduct is called.
Many of the pages won't use all of the Product properties, even though they will all be loaded from the database because of the SELECT *.
Is there any design advice/guidelines on this?
The trade-off here is between squeezing out every last bit of potential performance and code maintainability. There is no question that bringing back columns you won't use wastes some CPU cycles. The question becomes: how many? Then you have to consider which is more expensive, your wasted CPU cycles or your programmers' time for building and maintaining the code.
If you are working on a system with huge performance requirements, then it may very well pay to optimize your ORM / factory code. On the other hand, if you're building a departmental line-of-business app and you've got scores or hundreds of ORM classes, maybe you are better off keeping it simple for the programmers (and the people who have to pay for them) and not worrying about a few cycles. This is even more the case if you use a framework that scaffolds up most of your ORM code for you with code generation, like Entity Framework (or many others)...
If you are building your system without any kind of code-generating framework, and your data access layer is pretty close to bare-metal SQL, then only bringing back what you need is good advice. If you are building an app that is going to be used by thousands or millions of people simultaneously, then by all means tune your SQL from the outset. If, on the other hand, you work in a shop that uses ORM frameworks and RAD or agile, then hand-writing dozens of SQL statements is counterproductive.
I'd definitely avoid SELECT *. Just retrieve the data you know you'll need. I'd prefer to write a dozen queries to the same table, where each one refers to just the few columns I need for a particular purpose, rather than write one query that retrieves all the columns and just use that everywhere.
Even if you know you need every column currently in a table, list each one explicitly. That way, if someone adds half a dozen more columns to the table in the future, all your old queries won't suddenly be retrieving more data than is needed.
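For a hypothetical product table, the difference looks like this:

-- Fragile: silently grows when columns are added, and defeats covering indexes.
SELECT * FROM product WHERE id = 42;

-- Robust: each call site names exactly the columns it uses.
SELECT id, name, unit_price FROM product WHERE id = 42;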

NHibernate Eager Loading - Lots of unrelated data

My members will have the ability to customise their profile page with any number of widgets, each of which displays different data, such as a list of music or a list of people they are following.
Several widgets include:
- List of media they have uploaded
- List of people they are following
- List of people following them
- Html/Text widget
- Media Statistics (num downloads etc)
- Comments widget for other members to leave comments
Some widgets will have to page the data returned because there could be hundreds of results.
I haven't done any optimisation yet, so it is doing lots of DB work to return all the data. What would be the most efficient way to retrieve it? Would one DB call per widget be acceptable? There could be around 5-20 widgets per page.
If you need more information about my situation please feel free to ask.
Paul
Short answer: It depends.
Start off from the unoptimised state, then use SQL profiler or a C# profiler like dotTrace to work out the best places to make improvements. Set a realistic goal to work towards (e.g. 'less than 800 milliseconds to load the page').
Generally I find performance starts to suffer after about 20-30 database calls in a request, but this is going to depend on your server, the location of the database etc.
There are many things you can try: pre-caching, eager fetching using joins rather than separate selects, etc. Nothing is going to guarantee better performance, though, unless it is applied intelligently.
For a page with lots of widgets, a common design pattern is to load each widget asynchronously using AJAX, rather than loading the entire page in one go.
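A minimal sketch of that pattern (the widget names, element ids, and endpoints are invented):

// Render the page shell immediately, then let each widget fill
// itself in with its own asynchronous request.
const widgets = ['uploads', 'following', 'followers', 'comments'];

widgets.forEach(async (name) => {
  const res = await fetch(`/profile/widgets/${name}?page=1`);
  document.getElementById(`widget-${name}`).innerHTML = await res.text();
});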
Since you've cut your work up into widgets, the proper thing to do would be for each widget to issue a single query covering all its required functionality. This would be the case even if you retrieved widgets via AJAX (which, as cbp noted, is not a bad idea).
Secondly, I would set up some kind of mechanism for each widget to register its existence, and after all widgets have registered I would fire a single query that includes all the widget queries. (Technically it is again multiple queries, but in a single round trip; see MultiCriteria and MultiQuery in the NHibernate reference.)
Also do not forget that lazy loads are hidden DB retrievals, and you can take a huge performance hit by using lazy loading in a situation where an eager load is proper (for example Foo.Bar.Name, where you always show the Bar.Name value when you present the Foo entity).
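In raw SQL terms, the difference looks roughly like this (foo and bar follow the Foo.Bar.Name example above; the column names are invented):

-- Lazy loading Foo.Bar.Name issues one extra query per row (N+1):
SELECT id, bar_id FROM foo;
SELECT name FROM bar WHERE id = 1;
SELECT name FROM bar WHERE id = 2;
-- ...and so on, one per foo row...

-- Eager loading via a join fetches the same data in one round trip:
SELECT foo.id, bar.name
FROM foo JOIN bar ON bar.id = foo.bar_id;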
Performance degradation can occur even with fewer than 20-30 database calls per request; it depends on the size and complexity of your entities, queries, and filters, as well as the size of the data sets retrieved.

Is it bad to not use normalised tables in this database?

I recently learned about normalisation in my informatics class, and I'm currently developing a multiplayer game that uses SQLite as its backend database.
Some information on it:
The simplified structure looks a bit like the following:
player_id | level | exp | money | inventory
---------------------------------------------------------
1 | 3 | 120 | 400 | {item a; item b; item c}
Okay. As you can see, I'm storing a table/array in string form in the "inventory" column. This is against normalisation.
But the thing is: Making an extra table for the inventory of players brings only disadvantages for me!
The only points where I access the database are:
- When a player joins the game and his profile is loaded
- When a player's profile is saved
When a player joins, I load his data from the DB and store it in memory. I only write to the DB about every five minutes, when the player is saved. So there are actually very few SQL queries in my script.
If I used an extra table for the inventory, then upon loading I would have to:
- Perform a more expensive and probably more data-intensive query to fetch all items from the inventory table which belong to player X
- Walk through the results and convert them into a table for storage in memory
And upon saving:
- Delete all items from the inventory table which belong to player X (the player might have dropped or sold some items)
- Walk through the in-memory table and perform a query for each item the player owns
If I kept all the player data in one table:
- I'd only have one query for saving and loading
- Everything would be in one place
- I would only have to (de)serialize the tables upon loading and saving, in my script
What should I do now?
Do my arguments and situation justify working against normalisation?
Are you saying that you think parsing a string out of "inventory" doesn't take any time or effort? Because everything you would need to do to store or retrieve inventory items from a subtable is something you also need to do with this string, and with the string you don't have any database tools to help you do it.
Also, if you had a separate subtable for inventory items, you could add and remove items in real time, meaning that if the app crashes or the user disconnects, they don't lose anything.
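A minimal sketch of what that subtable might look like in SQLite (all names are invented):

-- One row per owned item instead of a serialized string in the player row.
CREATE TABLE player (
  player_id INTEGER PRIMARY KEY,
  level     INTEGER NOT NULL,
  exp       INTEGER NOT NULL,
  money     INTEGER NOT NULL
);

CREATE TABLE inventory (
  player_id INTEGER NOT NULL REFERENCES player(player_id),
  item_id   INTEGER NOT NULL,
  quantity  INTEGER NOT NULL DEFAULT 1,
  PRIMARY KEY (player_id, item_id)
);

-- Picking up or dropping an item becomes a single real-time statement:
INSERT INTO inventory (player_id, item_id) VALUES (1, 7);
DELETE FROM inventory WHERE player_id = 1 AND item_id = 7;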
There are a lot of possible answers, but the one that works for you is the one to choose. Keep in mind, your choice may need to change over time.
If the amount of data you need to persist is small (i.e., it fits into a single table row), you only need to update it infrequently, and you have no reason to care about subsets of it, then your approach makes sense. As time goes on, as your players gain more items and you add more personalization to the game, you may begin to push up against the limits of SQLite, and you'll need to evolve your design. If you discover that you need to query the item list to determine which players have which items, you'll need to evolve your design.
It's generally considered a good idea to get your data architecture right early, but there's no point in sitting in meetings today trying to guess how you'll use your software in 5-10 years. Better to get a design that meets this year's needs, and then plan to re-evaluate the design again after a year.
What's going to happen when you have one hundred thousand items in your inventory and you only want to bring back two?
If this is something that you're throwing together for a one-off class and won't ever use again, then yes, the quick and dirty route might be the quicker option for you.
However if this is something you're going to be working on for a few months, then you're going to run into long-term issues with that design decision.
No, your arguments aren't valid. They basically boil down to "I want to do all of this processing in my client code instead of in SQL and then just write it all to a single field", because you are still doing all of the exact same processing to generate the string. By doing this you give up the ability to easily load a small portion of the list, and you lose the relationship to the actual item table, which could contain more information about the items (I assume you're hard-coding it all based on names instead of using internal item IDs, which is a really bad idea, IMO).
Don't do it. Long term the approach you are wanting to take will generate a lot more work for you as your needs evolve.
Another case of premature optimization.
You are trying to optimize something for which you don't have any performance metrics. What is the target platform? Even the crappiest computers nowadays can run at least hundreds of your read operations per second. You can add better hardware as you gain users, then move to the cloud, and only when you reach the problem space that Google, Twitter, and Facebook are dealing with should you consider denormalizing. Even then, the best solution is some sort of key-value database.
Maybe you should check the Wikipedia article on Database Normalization to remind yourself why a normalized database is a good thing.
You should also think about the items. Are the items unique for every user, or could user1 have item1 and user2 have item1 too? If you later want to change item1, you would have to go through your whole table and check which users have this item. If you normalized your table, this would be much easier.
But in the end, I think the answer is: it depends.
Do my arguments and situation justify working against normalisation?
Not based on what I've seen so far.
Normalized database designs (appropriately indexed, and with efficient use of the database via UPSERTs, transactions, etc.) on general-purpose engines will generally outperform application code, except where that code is very tightly optimized. Typically, such code abandons some feature of the general-purpose RDBMS engine, such as one of the ACID properties or referential integrity.
If you want very simple data access (you tout one table, one query as a benefit), perhaps you should look at a document-centric database like MongoDB or CouchDB.
The reason that you use any technology is to leverage the technology's advantages. SQL has many advantages that you seem to not want to use, and that's fine, if you don't need them. In Neal Stephenson's Zodiac, the main character mentions that few things bought from a hardware store are used for their intended purpose. Software's like that, too. What counts is that it works, and it works nearly 100% of the time, and it works fast enough.
And yet, I can't help but think that someday you're going to have some overpowered item released into the wild, and you're going to want to deal with the problem at the database layer. Say you accidentally gave out some superinstakillmegadeathsword inventory items that kill everything within 50 meters on use (wielder included), and you want to remove those things from play. As an apology to the people who lose their superinstakillmegadeathsword items, you want to give them 100 money for each superinstakillmegadeathsword you take away.
With a properly normalized database structure, that's a trivial task. With a denormalized structure, it's quite a bit harder and slower. A normalized database will also be easier to expand on in the future.
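Assuming a normalized layout along the lines of the earlier sketch (the item id here is invented), the entire cleanup is two statements:

-- Refund 100 money per confiscated sword...
UPDATE player
SET money = money + 100 * COALESCE(
  (SELECT SUM(quantity) FROM inventory
   WHERE inventory.player_id = player.player_id
     AND item_id = 666), 0);

-- ...then remove every copy of the item from play.
DELETE FROM inventory WHERE item_id = 666;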
So are you sure you don't want to normalize your database?