Good practice for fetching detail API data in a react-redux app

What's the best practice for fetching detail data in a React app when you are dealing with multiple master-detail views?
For example, if you have:
- a /rest/departments API which returns the list of departments
- a /rest/departments/:departmentId/employees API which returns all employees within a department.
To fetch all departments I use:
    componentDidMount() {
      this.props.dispatch(fetchDepartments());
    }
But then I'll need some logic to fetch all the employees for each department. Would it be a good idea to call the employee action creator for each department inside the department reducer logic?
Dispatching employee actions in the render method does not look like a good idea to me.

It is certainly a bad idea to call an employee action creator inside the department reducer, as reducers should be pure functions; you should do it in your fetchDepartments action creator instead.
Anyway, if you need to get all the employees for every department (not just the selected one), it is not ideal to make many API calls: if possible, I would ask the backend developers for an endpoint that returns the array of departments and, for each department, an embedded array of employees, provided the numbers aren't too big, of course.
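If the per-department requests do stay on the client, the chaining usually lives in a thunk-style action creator rather than in a reducer. Here is a minimal sketch assuming redux-thunk is installed; the file name, action types, and response shapes are assumptions for illustration, not the asker's actual code:

    // departmentsActions.ts (hypothetical file)
    import { Dispatch } from 'redux';

    const departmentsLoaded = (departments: { id: string }[]) =>
      ({ type: 'DEPARTMENTS_LOADED', departments } as const);

    const employeesLoaded = (departmentId: string, employees: unknown[]) =>
      ({ type: 'EMPLOYEES_LOADED', departmentId, employees } as const);

    // Thunk: fetch the departments, then fire one employees request per department.
    export const fetchDepartmentsWithEmployees = () => async (dispatch: Dispatch) => {
      const departments: { id: string }[] =
        await fetch('/rest/departments').then(res => res.json());
      dispatch(departmentsLoaded(departments));

      // Detail requests run in parallel; the store updates as each one arrives.
      await Promise.all(
        departments.map(async dept => {
          const employees = await fetch(`/rest/departments/${dept.id}/employees`)
            .then(res => res.json());
          dispatch(employeesLoaded(dept.id, employees));
        })
      );
    };

The component then only dispatches fetchDepartmentsWithEmployees() from componentDidMount, just like the snippet in the question.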

Big old "It depends"
This is something where, in the end, you will need to pick an approach and see how it works out with your specific data and user needs. Network conditions, such as latency, factor in as well. In a very well-networked environment, such as the top-3 insurance company I was a network admin for, you can achieve very low-latency calls; in such a case, making multiple network requests behaves very differently than it would over a typical home internet connection. Even then, you have to consider a wide range of possibilities, and you ALWAYS need to consider your end goals.
(Not to get too far into the technical aspects, but latency can fairly accurately be defined as the time you spend waiting for a network request to actually start sending data. A classic example of where this matters is online first-person shooter gaming: you click shoot, the data is not transmitted as fast as you would like because the network is still waiting to send it, and then you die. A classic example where bandwidth matters more than latency is downloading or uploading large files: if you have to wait a second or two for the data to start moving, but once it moves you can download a GB in seconds, then oh well, I'll take it.)
Currently, I have our website making multiple calls to load dynamic menus and dynamic content. It is very small data, done in three separate calls, over the internet. It's "ok", but I would not say it is "good". Since users are waiting for all of it before anything really starts, I might as well throw it all into a single network call. Also, if two calls go fine and the third chokes a bit, the user may start to navigate, and then more menus pop in, which is not ideal. This is why, regardless, you have to think about your specific needs and what range of use cases is likely to apply. (I am currently re-writing the entire site anyway.)
As a previous (in my opinion "good") answer stated, it probably makes sense to have the whole data set shot to you in one gulp. It appears to me this is an internal, or at least commercial, app with a decent network and, much more importantly, no risk of losing customers because your content did not load super fast.
That said, if things do not work out well with that, especially if you are talking about large data sets, then consider a lazy-loading architecture. For example, your user cannot get to an employee until they see the departments. So it may be OK, depending on your network and data size, to load the departments and then, after that returns, initiate an asynchronous load of the employee data. The employee data is then loading while your user browses the department names.
A huge question you may want to clarify is whether or not any employee list data is rendered WITH the departments. In one of my cases, I have a work order system that I load after login, but lazily, and when it is loaded it puts a badge on the Work Order menu showing how many are outstanding. Since I do not have a lot of orders, it is basically a one-second wait. No biggie; it is not as if the user has to wait for it to load to begin work. If you wanted a badge per department, then it may get weird: if you load by department, you could have multiple badges popping in at random. That may cause user confusion, and it is probably a better choice to load everything in one large chunk. If the user has to wait anyway, it may mean one less support call from a user asking "is it ok that it is doing this?". Especially with software for the workplace, it is more acceptable to have to wait for an initial load at the beginning of the work day.
To be clear, with all of these complications to consider, it is extremely important that you develop with coding practices as good as you can manage. That way you can code one solution, and if it does not meet your performance or user needs, it is not a nightmare to change. In the general case with small data, I would just load it in one big gulp to start, and if there are problems with load times, complicate it from there. Complicating code from the beginning for no clearly needed reason is a good way to clutter it up to the point of making it completely unwieldy to maintain.
On a third note, if you are dealing with enterprise-size data sets, that is a whole different thing. Then you have to deal with pagination, and yes, it gets a bit more complicated.
Regards,
DB

I'm not sure what fetchDepartments does exactly, but I'd ensure the actual fetch request is executed from Redux middleware. By doing it from middleware, you can fingerprint / cache / debounce all your requests and make a single one across the app, no matter how many components request the same thing.
In general, middleware is the best place to handle asynchronous side effects.
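As an illustration of that idea (not the asker's code), here is a minimal middleware sketch that de-duplicates in-flight requests by URL; the API_FETCH action shape is an assumption made up for the example:

    // apiMiddleware.ts (hypothetical) - caches in-flight requests by URL.
    import { Middleware } from 'redux';

    const inFlight = new Map<string, Promise<unknown>>();

    export const apiMiddleware: Middleware = store => next => (action: any) => {
      if (action.type !== 'API_FETCH') return next(action);

      // Reuse an existing request for the same URL instead of firing a duplicate.
      if (!inFlight.has(action.url)) {
        const request = fetch(action.url)
          .then(res => res.json())
          .finally(() => inFlight.delete(action.url));
        inFlight.set(action.url, request);
      }

      inFlight.get(action.url)!.then(data =>
        store.dispatch({ type: action.onSuccess, payload: data })
      );

      return next(action);
    };

Any component can then dispatch something like { type: 'API_FETCH', url: '/rest/departments', onSuccess: 'DEPARTMENTS_LOADED' } without worrying about duplicate network calls.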

Related

Lazy loading data a la Skeleton Screens. Is it possible?

I've been reading the article The Vietnam of Computer Science by Ted Neward. Although there's much I don't understand, or have not fully grasped, I was struck by a thought while reading this paragraph.
The Partial-Object Problem and the Load-Time Paradox
It has long been known that network traversal, such as that done when making a traditional SQL request, takes a significant amount of time to process. ... This cost is clearly non-trivial, so as a result, developers look for ways to minimize this cost by optimizing the number of round trips and data retrieved.
In SQL, this optimization is achieved by carefully structuring the SQL request, making sure to retrieve only the columns and/or tables desired, rather than entire tables or sets of tables. For example, when constructing a traditional drill-down user interface, the developer presents a summary display of all the records from which the user can select one, and once selected, the developer then displays the complete set of data for that particular record. Given that we wish to do a drill-down of the Persons relational type described earlier, for example, the two queries to do so would be, in order (assuming the first one is selected):
    SELECT id, first_name, last_name FROM person;
    SELECT * FROM person WHERE id = 1;
In particular, take notice that only the data desired at each stage of the process is retrieved–in the first query, the necessary summary information and identifier (for the subsequent query, in case first and last name wouldn’t be sufficient to identify the person directly), and in the second, the remainder of the data to display. ... This notion of being able to return a part of a table (though still in relational form, which is important for reasons of closure, described above) is fundamental to the ability to optimize these queries this way–most queries will, in fact, only require a portion of the complete relation.
Skeleton Screens
Skeleton Screens are a relatively new UI design concept introduced in 2013 by Luke Wroblewski in this paper. The idea is to avoid spinners as loading indicators and to instead gradually build up the UI elements during load time, which makes the user feel as if things are progressing quickly, even if it is, in fact, slower than using a traditional loading indicator.
The Microsoft Teams chat app, for example, displays a skeleton screen while waiting for stored chat logs to arrive from the database.
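In practice a skeleton screen is often just a placeholder component rendered while the data is still undefined; a minimal React sketch, with made-up component and prop names:

    // ChatList.tsx (hypothetical) - renders grey placeholder rows until messages arrive.
    import React from 'react';

    const MessageSkeleton = () => (
      // A grey block roughly shaped like a chat message, styled via CSS.
      <div className="skeleton-row" aria-hidden="true" />
    );

    export const ChatList = ({ messages }: { messages?: { id: string; text: string }[] }) =>
      messages === undefined
        ? <>{Array.from({ length: 5 }, (_, i) => <MessageSkeleton key={i} />)}</>
        : <>{messages.map(m => <div key={m.id}>{m.text}</div>)}</>;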
Utilizing Skeleton Screen Style Data Loading as a Paradigm for Data Retrieval
While Neward's paper focuses on Object-Relational Mappers, reading it gave me a thought about structuring data retrieval.
The paragraph quoted above describes the struggle of querying too much data at one time, which increases retrieval time until all of the requested data has been gathered. What if, similar to Neward's SQL example above, smaller chunks of data were retrieved as necessary and loaded piecemeal into the application?
This would necessitate a fundamental shift in database querying logic, I think. It would obviously be ridiculous to implement this in application code: having your everyday developer write a multi-layered retrieval scheme just to fetch a single object would be insane. Rather, there would need to be some built-in method whereby the developer can indicate which properties are considered required (username, ID, permissions, roles, etc.) and must be retrieved fully before moving forward, and which properties are ancillary. After all, many users navigate through an application faster than all the data can populate if they are familiar with it and just need to get to a certain page. All they need is for enough data to be loaded, which is the point of this scheme.
On the database side, there would probably be a series of smaller retrievals rather than one large one. I know this is probably more expensive, although I'm not certain of the technicalities, and while database performance may suffer, application performance might improve, at least as perceived by the user.
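Without any database changes, the client-side half of this idea can be approximated today by splitting a resource into a blocking "required" request and a non-blocking "ancillary" one. A hedged sketch; the endpoints, field names, and the fields query parameter are all made up for illustration:

    // Hypothetical split of a user resource into required and ancillary fields.
    interface RequiredUser { id: string; username: string; roles: string[] }
    interface AncillaryUser { avatarUrl?: string; notificationCount?: number }

    async function loadUser(id: string, onAncillary: (extra: AncillaryUser) => void) {
      // Block only on the fields the next screen actually needs.
      const required: RequiredUser =
        await fetch(`/api/users/${id}?fields=required`).then(r => r.json());

      // Let the rest trickle in; the UI updates whenever it arrives.
      fetch(`/api/users/${id}?fields=ancillary`)
        .then(r => r.json())
        .then(onAncillary);

      return required;
    }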
Conclusion
Imagine pulling up Instagram and having the first series of photos (the ones you can actually see) load twice as quickly as before. The photos are likely your first priority. It wouldn't matter if your username, notification indicator, and profile picture take a few extra seconds to populate, since you have already been given the data you expected and have begun consuming it as a user. Contrast that with having the structural data loaded first: nobody cares about seeing their username or the company logo load. They want to see content.
I don't know if this is a terrible idea or something that has already been considered, but I'd love to get some feedback on it.
What do you think?
Is it possible, from a technical standpoint?

How do we make APIs with slow dependencies faster?

Recently I attended two different job interviews, and one of the questions they asked was something like this:
1- You need to create an API that will use some microservices that are very slow; some of them take a couple of seconds to respond (let's say 2 seconds). We have to do our best to make our API very reliable in terms of latency. What would you do to make this system fast?
2- This led to other questions, such as: if I choose to cache some data, what do I have to do to avoid serving a stale cache? For example, what if I cached the user's personal info and they just updated their profile?
3- Finally, if it is not a read operation, what do we have to do so that services that take a long time do not impact the user experience? In this case, imagine that it's a write operation.
How would you answer these questions?
The question is a little vague, but I'll try to throw a couple of solutions out there.
Before jumping to a cache, I would first ask questions about the data set. For instance, how large is it and how often does it change? If the data set isn't large, you can probably store all of it in memory indefinitely and, on updates, update individual records in the cache.
Of course, when we say we store it in a cache, we also have to keep data retrieval in mind. If retrieval requires grabbing the data in many different ways and the data set is large, caching may not be as great a solution. This addresses the first and second questions you've posted, as far as possible without further information from the interviewer. This is really where you need to tease out requirements from the interviewer to see if you're on the right track.
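For the stale-cache part of question 2, one common pattern is to refresh (or evict) the cached entry in the same code path that performs the update. A rough in-process sketch; in a real system the Map would be something like Redis, and the database below is a stand-in I made up:

    // Write-through cache keyed by user id; illustration only.
    interface Profile { name: string; email: string }

    const db = new Map<string, Profile>();        // stand-in for the real database
    const userCache = new Map<string, Profile>(); // stand-in for Redis/memcached

    async function updateProfile(userId: string, profile: Profile): Promise<void> {
      db.set(userId, profile);          // persist the change
      userCache.set(userId, profile);   // refresh the cache in the same path
      // Alternatively: userCache.delete(userId) and let the next read repopulate it.
    }

    async function getProfile(userId: string): Promise<Profile | undefined> {
      const cached = userCache.get(userId);
      if (cached) return cached;        // cache hit
      const fresh = db.get(userId);     // cache miss: go to the database
      if (fresh) userCache.set(userId, fresh);
      return fresh;
    }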
Now, finally, for the third question, I think the interviewer is trying to get you to write asynchronously to something like a queuing mechanism, which lets the user get a quick response while your system takes its time processing the request. A follow-up question here may be about how long a write can take to be processed, and that will lead to a series of more domain-specific questions. Again, you'll have to dig into the requirements to see what kind of trade-offs the interviewer wants you to make, because there is no silver bullet.
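To make the asynchronous-write idea concrete, here is a simplified in-process sketch: accept the write, enqueue it, answer immediately, and let a background worker call the slow dependency. In production the array would be a message broker or queue service, and the slow service call below is simulated:

    // Accept writes instantly, process them in the background; illustration only.
    type WriteJob = { userId: string; payload: unknown };
    const queue: WriteJob[] = [];

    function acceptWrite(job: WriteJob): { status: 'accepted' } {
      queue.push(job);              // the caller gets a response right away
      return { status: 'accepted' };
    }

    async function callSlowService(_job: WriteJob): Promise<void> {
      // Simulate a ~2 second downstream dependency.
      await new Promise(resolve => setTimeout(resolve, 2000));
    }

    // Background worker: drains the queue one job at a time.
    async function worker(): Promise<void> {
      while (true) {
        const job = queue.shift();
        if (job) {
          await callSlowService(job);
        } else {
          await new Promise(resolve => setTimeout(resolve, 100)); // idle briefly
        }
      }
    }

Starting worker() once at boot gives you a crude version of what a real queue consumer would do.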

data population and manipulation in frontend vs. backend

I have a very general question. For example, I have an employee table which includes name, address, age, gender, dept, etc. When a user wants to see an overview of employee information, I do not need to extract all of the columns from the DB. I want to show the overview first, grouping employees' names per dept, and then a user can select one particular employee if they want more detail.
To implement this, which approach is better?
1) Provide two different APIs which produce two different results by running two different queries. If I take this approach, even though I have to call two different APIs, it seems efficient.
2) Provide one API which produces one result including all of the employee-related columns, even though the overview does not need the full detail. Once I get this data, I can re-format the employee information on the front end by manipulating the already extracted data according to different needs.
Normally, which approach do developers take between 1) and 2)? I think 1) makes sense, but the API will be very specialized rather than generalized. Is it desirable to manipulate data received from the back end (RESTful) on the front end (Angular 2)?
Which approach is preferred: creating a relatively larger number of specialized APIs, or manipulating the data on the front end after fetching everything at once? If there are criteria for choosing, what should I base the decision on?
Is this the correct way of thinking about it? If someone has an idea about it, could you give some guidance?
That's a very interesting discussion. If you need a highly optimized application, then you would need to go with the first option. Also, if the employee has some column that can be accessed only by a privileged user, you can imagine that client-side manipulation won't stop a malicious user from seeing it; in that case you must implement the first option as well.
The drawback is that at some point you could have too many APIs doing very similar things, making it confusing and unproductive.
About the second option: as you pointed out, having generalized APIs is a good thing, so in order to speed up the development process and have a cleaner set of APIs, you can accept wasting a bit of network traffic in return.
You need to balance your needs.
Is it optimization of resources, like speed in terms of database queries and network requests?
Or is it development time? Do you have enough time to implement all these things and optimizations?
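For what it's worth, option 1 often comes down to just a summary endpoint and a detail endpoint. A hedged sketch using Express-style routes; the route paths, column list, and the db helper are assumptions for illustration:

    // Two specialized endpoints: overview vs. detail; illustration only.
    import express from 'express';

    // Stand-in for a real database client.
    const db = { query: async (_sql: string, _params?: unknown[]) => [] as unknown[] };

    const app = express();

    // Summary endpoint: only the columns the overview screen needs.
    app.get('/api/employees/summary', async (_req, res) => {
      const rows = await db.query('SELECT id, name, dept FROM employee');
      res.json(rows);
    });

    // Detail endpoint: the full record for one selected employee.
    app.get('/api/employees/:id', async (req, res) => {
      const rows = await db.query('SELECT * FROM employee WHERE id = $1', [req.params.id]);
      res.json(rows[0]);
    });

    app.listen(3000); // start the sketch server

Whether that extra endpoint is worth it versus one general employees call is exactly the resources-versus-development-time balance described above.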

Good Reading on Rails Database Optimizations: Caching IDs in a column, denormalization?

I'm looking for some (more than intro-level) database optimizations, not just for Rails but for SQL in general.
In Rails, how/when do you start using these?
counter_cache
caching IDs in a table (user has_many roles, so the user table gets a serialized role_ids column, since every request requires fetching the roles).
Are these types of things necessary, or when are they necessary, given that you can also just store these things (roles for user in this example) in memcached?
Right now I have an app where in every request a User has_many Role objects, and maybe a few preferences (where preferences are in another table). Is it good/bad to cache those preference/role ids in the user table?
Thanks for the tips!
Caching is a good thing if you need it and a terrible thing if you don't. Is your app showing strain? Have you benchmarked it? Have you isolated the bottleneck to the point where you could select an optimization if you had a cartload of them? The reason my answer is a question is that no two scaling issues have the same solution so without a much clearer idea where your app is hung up, it's hard to say whether caching will help.
On a side note, unless the principal function of your app queries users and roles, you will probably find the traffic down that path is not particularly dense. Typically you hit it once, squirrel the information away in the session and off you go. If you put a lot of work into denormalizing and caching, you may not find it pays the dividends you expect. And don't forget you now have to test for the cache-hit and cache-miss scenarios to make sure everything functions identically.
And more directly to your question:
memcached is a good LRU cache. There are other good KV stores like Redis and Tokyo Cabinet.
Counter caching is low hanging fruit if (and only if) you have a frequently accessed counter. For example, if you need to know how many friends a user has, but don't need to know anything about the friends, you can just denormalize friends.count.
I'm not sure what caching IDs would buy you unless the roles remained pretty static. You have to develop a strategy for freshening them when the roles change and ... well, then you have to test the cache-hit and cache-miss scenarios.
My experience is that the biggest bang you can get for your optimization buck is caching pages. After that, fragments, then down to database-level and application-level tweaking.
Have you profiled your app?

Most optimized way to store crawler states?

I'm currently writing a web crawler (using the python framework scrapy).
Recently I had to implement a pause/resume system.
The solution I implemented is of the simplest kind and, basically, stores links when they get scheduled, and marks them as 'processed' once they actually are.
Thus, I'm able to fetch those links when resuming the spider (obviously there is a little bit more stored than just a URL: a depth value, the domain the link belongs to, etc.), and so far everything works well.
Right now, I've just been using a MySQL table to handle that storage, mostly for fast prototyping.
Now I'd like to know how I could optimize this, since I believe a database shouldn't be the only option here. By optimize, I mean using a very simple and lightweight system that can still handle a large amount of data written in a short time.
For now, it should be able to handle the crawling of a few dozen domains, which means storing a few thousand links a second...
Thanks in advance for suggestions
The fastest way of persisting things is typically to just append them to a log -- such a totally sequential access pattern minimizes disk seeks, which are typically the largest part of the time costs for storage. Upon restarting, you re-read the log and rebuild the memory structures that you were also building on the fly as you were appending to the log in the first place.
Your specific application could be further optimized since it doesn't necessarily require 100% reliability -- if you miss writing a few entries due to a sudden crash, ah well, you'll just crawl them again. So, your log file can be buffered and doesn't need to be obsessively fsync'ed.
I imagine the search structure would also fit comfortably in memory (if it's only for a few dozen sites you could probably just keep a set with all their URLs, no need for bloom filters or anything fancy) -- if it didn't, you might have to keep in memory only a set of recent entries, and periodically dump that set to disk (e.g., merging all entries into a Berkeley DB file); but I'm not going into excruciating details about these options since it does not appear you will require them.
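The pattern is language-agnostic (and translates directly to Python); to stay consistent with the other examples in this write-up, here is a minimal sketch in TypeScript/Node, with a made-up file name and record shape:

    // Append-only crawl log: buffered writes, rebuilt into memory on restart.
    import * as fs from 'fs';
    import * as readline from 'readline';

    const LOG_PATH = 'crawl-state.log';                         // assumed location
    const seen = new Set<string>();                             // in-memory URL set
    const log = fs.createWriteStream(LOG_PATH, { flags: 'a' }); // buffered, not fsync'ed

    // Record a scheduled/processed URL as one JSON line appended to the log.
    export function record(url: string, status: 'scheduled' | 'processed'): void {
      seen.add(url);
      log.write(JSON.stringify({ url, status }) + '\n');
    }

    // On restart, replay the log sequentially to rebuild the in-memory set.
    export async function restore(): Promise<void> {
      if (!fs.existsSync(LOG_PATH)) return;
      const lines = readline.createInterface({ input: fs.createReadStream(LOG_PATH) });
      for await (const line of lines) {
        if (line.trim()) seen.add(JSON.parse(line).url);
      }
    }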
There was a talk at PyCon 2009 that you may find interesting, Precise state recovery and restart for data-analysis applications by Bill Gribble.
Another quick way may be to use pickle to serialize your application state to disk.