Is there a Rails convention for persisting lots of query data to the browser? - ruby-on-rails-3

I have an application that allows the user to drill down through data from a single large table with many columns. It works like this:
There is a list of distinct top-level table values on the screen.
User clicks on it, then the list changes to the distinct next-level values for whatever was clicked on.
User clicks on one of those values, taken to 3rd level values, etc.
There are about 50 attributes they could go through, but it usually ends up only being 3 or 4. But since those 3 or 4 vary among the 50 possible attributes, I have to persist the selections to the browser. Right now I do it in a hideous and bulky hidden form. It works, but it is delicate and suboptimal. In order for it to work, the value of whatever level attribute is on the screen is populated in the appropriate place on the hidden form on the click event, and then a jQuery Ajax POST submits the form. Ugly.
I have also looked at Backbone.js, but I don't want to roll another toolkit into this project while there may be some other simple convention that I'm missing. Is there a standard Rails Way of doing something like this, or just some better way period?

Possible Approaches to Single-Table Drill-Down
If you want to perform column selections from a single table with a large set of columns, there are a few basic approaches you might consider.
Use a client-side JavaScript library to display/hide columns on demand. For example, you might use DataTables to dynamically adjust which columns are displayed based on what's relevant to the last value (or set of values) selected.
You can use a form in your views to pass the relevant column names into the session or the params hash, and inspect those values to decide which columns to render in the view when drilling down to the next level.
Your next server-side request could include a list of columns of interest, and your controller could use those column names to build a custom query using SELECT or #pluck. Such queries often involve tainted objects, so sanitize that input thoroughly and handle with care!
If your database supports views, users could select pre-defined or dynamic views from the next controller action, which may or may not be more performant. It's at least an idea worth pursuing, but you'd have to benchmark this carefully, and make sure you don't end up with SQL injections or an unmanageable number of pre-defined views to maintain.
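As a rough illustration of the "column names from the request" approach, here is a minimal sketch in Python with the standard sqlite3 module rather than ActiveRecord. The table name facts, the columns in ALLOWED_COLUMNS and the helper distinct_values are all made up for the example. The point is only that identifiers are checked against a whitelist while values are bound as parameters.

```python
import sqlite3

# Hypothetical whitelist: the only column names a request may reference.
ALLOWED_COLUMNS = {"region", "category", "product"}


def distinct_values(conn, next_column, selections):
    """Return distinct values of next_column among rows matching selections.

    selections maps column name -> chosen value for the levels drilled so far.
    """
    # Identifiers cannot be bound as parameters, so check them against the whitelist.
    for column in [next_column, *selections]:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column!r}")

    sql = f"SELECT DISTINCT {next_column} FROM facts"
    if selections:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in selections)
    sql += f" ORDER BY {next_column}"
    # Values are always bound as parameters, never interpolated into the SQL.
    return [row[0] for row in conn.execute(sql, list(selections.values()))]


def demo_connection():
    """Build a small in-memory table to drill through (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (region TEXT, category TEXT, product TEXT)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            ("East", "Tools", "Hammer"),
            ("East", "Tools", "Saw"),
            ("East", "Toys", "Kite"),
            ("West", "Tools", "Drill"),
        ],
    )
    return conn
```

With this shape, the browser only has to send back the small dictionary of selections made so far (for example as query parameters) instead of a hidden form with a slot for every possible attribute.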
Some Caveats
There are generally trade-offs between memory and latency when deciding whether to handle this sort of feature client-side or server-side. It's also generally worth revisiting the business logic behind having a huge denormalized table, and investigating whether the problem domain can't be broken down into a more manageable set of RESTful resources.
Another thing to consider is that Rails won't stop you from doing things that violate the basic resource-oriented MVC pattern. From your question, there is an implied assumption that you don't have a canonical representation for each data resource; approaching Rails this way often increases complexity. If that complexity is truly necessary to meet your application's requirements then that's fine, but I'd certainly recommend carefully assessing your fundamental design goals to see if the functional trade-offs and long-term maintenance burdens are worth it.

I've found questions similar to yours on Stack Overflow; there doesn't appear to be an API or style anyone mentions for persisting across requests. The best you can do seems to be storage in classes or some iteration on what you're already doing:
1) Persistence in memory between sessions/requests
2) Coping with request persistence design-wise
3) Using class caching

Related

Lazy loading data a la Skeleton Screens. Is it possible?

I've been reading the article The Vietnam of Computer Science by Ted Neward. Although there's much I don't understand, or have not fully grasped, I was struck by a thought while reading this paragraph.
The Partial-Object Problem and the Load-Time Paradox
It has long been known that network traversal, such as that done when making a traditional SQL request, takes a significant amount of time to process. ... This cost is clearly non-trivial, so as a result, developers look for ways to minimize this cost by optimizing the number of round trips and data retrieved.
In SQL, this optimization is achieved by carefully structuring the SQL request, making sure to retrieve only the columns and/or tables desired, rather than entire tables or sets of tables. For example, when constructing a traditional drill-down user interface, the developer presents a summary display of all the records from which the user can select one, and once selected, the developer then displays the complete set of data for that particular record. Given that we wish to do a drill-down of the Persons relational type described earlier, for example, the two queries to do so would be, in order (assuming the first one is selected):
SELECT id, first_name, last_name FROM person;
SELECT * FROM person WHERE id = 1;
In particular, take notice that only the data desired at each stage of the process is retrieved–in the first query, the necessary summary information and identifier (for the subsequent query, in case first and last name wouldn’t be sufficient to identify the person directly), and in the second, the remainder of the data to display. ... This notion of being able to return a part of a table (though still in relational form, which is important for reasons of closure, described above) is fundamental to the ability to optimize these queries this way–most queries will, in fact, only require a portion of the complete relation.
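For reference, the two quoted queries can be sketched as a pair of functions. This is a small Python illustration using the standard sqlite3 module; the biography column and the sample rows are assumptions added so that the detail query has something extra to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us inspect column names on each row
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT,"
    " last_name TEXT, biography TEXT)"  # biography is a hypothetical 'heavy' column
)
conn.executemany(
    "INSERT INTO person (first_name, last_name, biography) VALUES (?, ?, ?)",
    [("Ada", "Lovelace", "long text ..."), ("Alan", "Turing", "long text ...")],
)


def list_people(conn):
    """Summary query: only the identifier and the columns shown in the list."""
    return conn.execute(
        "SELECT id, first_name, last_name FROM person ORDER BY id"
    ).fetchall()


def load_person(conn, person_id):
    """Detail query: the full row, fetched only for the selected record."""
    return conn.execute(
        "SELECT * FROM person WHERE id = ?", (person_id,)
    ).fetchone()


summary = list_people(conn)
detail = load_person(conn, summary[0]["id"])
```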
Skeleton Screens
Skeleton Screens are a relatively new UI design concept introduced in 2013 by Luke Wroblewski in this paper. It encourages avoiding spinners to indicate loading and instead gradually building UI elements during load time, which makes the user feel as if progress is being made, even if loading is, in fact, slower than when using a traditional loading indicator.
Here is a Skeleton Screen in use in the Microsoft Teams chat app. It displays this while waiting for stored chat logs to arrive from the database.
Utilizing Skeleton Screen Style Data Loading as a Paradigm for Data Retrieval
While Neward's paper is focusing on Object Relational Mappers, I had this thought about structuring data retrieval.
The paragraph quoted above indicates a struggle with querying too much data at one time, increasing data retrieval times until all the indicated data is gathered. What if, similar to Neward's SQL example above, smaller chunks of data were retrieved, as necessary, and loaded piecemeal into the application?
This would necessitate a fundamental shift in database querying logic, I think. It would obviously be a ridiculous suggestion to have this implemented in application code. To have your everyday developer write a multi-layered retrieval scheme to retrieve a single object would be insane. Rather, there would need to be some sort of built-in method whereby the developer can indicate which properties are considered to be required (username, Id, permissions, roles, etc...) which must be retrieved fully before moving forward, and which properties are ancillary. After all, many app users navigate through an application faster than all the data can populate, if they are familiar with the application and just need to navigate to a certain page. All they need is enough data to be loaded, which is the point of this scheme.
On the database side, there would probably be a series of smaller retrievals, rather than a large one. I know this is probably more expensive, although I'm not certain of the technicalities, and while database performance may suffer, application performance might improve, at least as perceived by the user.
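To make the idea a bit more concrete, here is a hand-written sketch of "required first, ancillary later" in Python with sqlite3. It is exactly the kind of application-level code the post argues should eventually be built in, so treat it purely as an illustration. The users table, the REQUIRED/ANCILLARY split and the load_in_stages helper are all assumptions.

```python
import sqlite3

# Hypothetical split: which columns must arrive before the UI can proceed.
REQUIRED = ("id", "username", "role")
ANCILLARY = ("avatar_url", "bio")


def make_demo_db():
    """Create an in-memory users table with one illustrative row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, role TEXT,"
        " avatar_url TEXT, bio TEXT)"
    )
    conn.execute(
        "INSERT INTO users VALUES (1, 'sam', 'admin', '/img/sam.png', 'Hello')"
    )
    return conn


def load_in_stages(conn, user_id):
    """Yield the record twice: after the required fields, then when complete."""
    record = {}
    for stage, columns in (("required", REQUIRED), ("ancillary", ANCILLARY)):
        # Column names come from the constants above, never from user input.
        sql = f"SELECT {', '.join(columns)} FROM users WHERE id = ?"
        row = conn.execute(sql, (user_id,)).fetchone()
        record.update(zip(columns, row))
        # The caller can start rendering as soon as the first stage is yielded.
        yield stage, dict(record)
```

A caller would iterate over load_in_stages and render after the first item. Note that this issues two round trips instead of one, which is the cost the paragraph above anticipates.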
Conclusion
Imagine pulling up your Instagram and having the first series of photos (the ones you can see) load twice as quickly as before. Photos are likely your first priority. It wouldn't matter if your username, notification indicator, and profile picture take a few extra seconds to populate, since you have already been fed your expected data and have begun consuming as a user. Contrast that with having structural data loaded first. Nobody cares about seeing their username or the company logo load. They want to see content.
I don't know if this is a terrible idea or something that has already been considered, but I'd love to get some feedback on it.
What do you think?
Is it possible, from a technical standpoint?

Is it better to use fewer tables with more columns or vice versa?

I'm trying to figure out how to determine the best balance in structuring a database. I want to be able to store the information from several different forms submitted by different people, sometimes multiple times (such as a yearly update). I'm stuck between having a different table for each form, or a combination of form and element definition and element value tables.
Example A: There are three types of form with different information, so there are four tables: [Customers], plus [FormA], [FormB], and [FormC], which each hold the data associated with their respective forms and are all FKed to [Customers].
Example B: Same three forms, but this time there are five different tables. [FormDescriptions] defines the form names, types, etc and has three entries, one for each form. [Forms] FKs to [Customers] and [FormDescriptions] and uses these in combination with the submission date to distinguish individual submissions. [FormElements] defines all the elements from the three forms, with a FK on FormDescriptions and a unique elementID. [ElementValues] FKs to [FormElements] and [Forms] and stores the value of the selected element on the selected form.
My question is, is one of these methods inherently better than the other, and if not, in which situations is each better than the other? As much why or why not that you want to include is appreciated.
"My question is, is one of these methods inherently better than the other, and if not, in which situations is each better than the other? As much why or why not that you want to include is appreciated."
Your option two is (your personalized variant of) the EAV antipattern. If you use this, and you expect (now or later) the system to do anything "intelligent" with the data, you'll find yourself in serious trouble. And things as basic as "rigorous data validation to catch data entry errors" already qualifies as "intelligent". So only use it if you can reasonably anticipate that the system will only be used for just merely storing the data, and that it will be unlikely for there ever to be a request to start processing/manipulating the data in "intelligent ways".
If you ever run into requests to start doing "intelligent" things with an EAV database, you'll find that whatever development time you thought you gained by working from a super duper generic information model, you'll lose orders of magnitude more time coding all the "intelligent" things required, i.e. reinstating the data structures in code that you refused to reflect in the DB.
Googling for "EAV antipattern" (try to locate the book by Bill Karwin) should provide you with more than enough info on why not to do it.
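A small sketch of what the answer above is warning about, in Python with sqlite3. The schema is a cut-down version of Example B (only [FormElements] and [ElementValues]; the other tables are omitted), and the element names and sample values are made up. It shows two things: getting one row per submission back requires a hand-written pivot, and because every value is stored as text, the database happily accepts 'forty' as an age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FormElements (elementID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ElementValues (formID INTEGER, elementID INTEGER, value TEXT);
    INSERT INTO FormElements VALUES (1, 'age'), (2, 'email');
    INSERT INTO ElementValues VALUES
        (10, 1, '42'),    (10, 2, 'a@example.com'),
        (11, 1, 'forty'), (11, 2, 'b@example.com');
    """
)

# Reassembling one row per submission means pivoting the value rows by hand,
# with one CASE expression per element the form defines.
PIVOT_SQL = """
SELECT v.formID,
       MAX(CASE WHEN e.name = 'age'   THEN v.value END) AS age,
       MAX(CASE WHEN e.name = 'email' THEN v.value END) AS email
FROM ElementValues v
JOIN FormElements e ON e.elementID = v.elementID
GROUP BY v.formID
ORDER BY v.formID
"""
submissions = conn.execute(PIVOT_SQL).fetchall()


def forms_with_invalid_age(rows):
    """Validation the schema cannot do for us: find non-numeric ages."""
    return [form_id for form_id, age, _email in rows if not age.isdigit()]
```

With a dedicated table per form, the same check could be a column type or a CHECK constraint instead of application code.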
There are 2 factors to consider here:
Performance
Flexibility
If your system is such that it will frequently require you to add more forms in the future, method 2 is better. You won't have to add additional tables or columns. Your forms are data driven. It will add a little overhead for generating forms and saving them as key-value pairs.
On the other hand, if your system won't require many changes to forms, the first method can work.
Also consider usage of data after forms are submitted. Are you going to run analytics, reports on this data? Are these reports specific to forms? That will favor method 1.

Best practice when using multiple forms - vb.net

What is the best practice for having many different menus/screens/forms in a visual basic program? Would it be to just make a new form for each menu or screen that I want? Or are there other better options?
I am not trying to make this overly complicated; I have a group project to work on and we all have different skill levels. That said, it has piqued my curiosity, so I figured it wouldn't hurt to ask before I got started.
I can see this question being closed pretty quickly as being too open ended so allow me to get in my key gripe on this before that happens... no .Visible property for TabControl pages? Seriously, Microsoft??
Which brings me to the key point. If the forms are in some way related but not necessarily identical I prefer to use a single form with different tabs, despite that glaring shortcoming in the control. (Which you don't have to look far to find workarounds for on SO, but a workaround is still a workaround.) Dynamically manipulating controls at run time is another side of this coin, though one that I tend to use more rarely... but that's just a personal thing.
In a recent application, for instance, I had lists of several types of objects. They were related, but performed quite different functions and the user wouldn't really need to look at more than one list at once. As a result I used one form with a tab for each object list to keep the users' display less cluttered.
Similarly when doing a GL app recently I had the journal header and journal line entries (which go to different tables in the back-end database) in separate parts of the one form. On the other hand asset creation was sufficiently different that I created a different form, despite the creation process sharing some of the underlying data. (That is, journal line data.)
I don't believe in the concept of "best practice" because what's a good practice in one situation may be a very bad one in another. However the "rules of thumb" that I use are:
- Keep the number of forms to a minimum to keep overhead low and reduce maintenance BUT
- If there is no logical "tie" between two functions, don't be afraid to make a new form because trying to maintain one form which performs 7 different roles is a guaranteed path to madness and frustration, especially if you break something inadvertently.
Yes, the two rules conflict, but in a way I see this aspect of design as being akin to database normalisation; there's a sweet spot between over-normalising (a separate form for each and every display) and under-normalising (trying to shoe-horn too many unrelated functions into one form). At the very least the rules always give me pause to think "do I need this form, or does it relate to something that I've already done?"
And the third rule of thumb is, obviously... always look at it from the point of view of your user. Are they going to feel like you're bouncing them around too much? Do all of the forms share a look and feel and, more importantly, control layout so that they always know where to find something?
All of these things will vary from app to app, and there's never one size that will fit all IMHO.
In my case, when I am dealing with multiple forms, I use MDI Parent Form to avoid multiple items in the windows task bar.
Another unusual solution is to set each form's ShowInTaskbar property to false.

How should I (if I should at all) implement Generic DB Tables without falling into the Inner-platform effect?

I have a db model like this:
tb_Computer (N - N) tb_Computer_Peripheral (N - 1) tb_Peripheral
Each computer has N peripherals. But each peripheral is different in nature, and will have different fields. A keyboard will have model, language, etc, and a network card has specification about speed and such.
But I don't think it's viable to create as many tables as there are peripherals, because one day someone will come up with a very specific peripheral and I don't want them to be unable to add it just because it is neither a keyboard nor a network card.
Is it a bad practice to create a field data inside tb_Peripheral which contains JSON data about a specific peripheral?
I could even create a tb_PeripheralType with specific information about which data a specific type of peripheral has.
I read about this in many places and found everywhere that this is a bad practice, but I can't think of any other way to implement this the way I want, completely dynamic.
What is the best way to achieve what I want? Is the current model wrong? What would you do ?
It's not a question of "good practices" or "bad practices". Making things completely dynamic has an upside and a downside. You have outlined the upside fairly well.
The downside of a completely dynamic design is that the process of turning the data into useful information is not nearly as routine as it is with a database that pins down the semantics of the data within the scope of the design.
Can you build a report and a report generating process that will adapt itself to the new structure of the data when you begin to add data about a new kind of peripheral? If you end up stuck with doing maintenance on the application when requirements change, what have you gained by making the database design completely dynamic?
PS: If the changes to the database design consist only of adding new tables, the "ripple effect" on your existing applications will be negligible.
I can think of four options.
The first is to create a table peripherals that would have all the information you could want about peripherals. This would have NULLs in the columns where the field is not appropriate to the type. When a new peripheral is added, you would have to add the descriptive columns.
The second is to create a separate table for each peripheral.
The third is to encode the information in something like JSON.
The fourth is to store the data as pairs. So each peripheral would have many different rows.
There are also hybrids of these approaches. For instance, you could store common fields in a single table (à la (1)) and then have key-value pairs for other values.
The question is how this information is going to be used. I do most of my work directly in SQL, so the worst option for me is (3). I don't want to parse strange information formats to get something potentially useful to a SQL query.
Option (4) is the most flexible, but it also requires more work to get a complete picture of all the possible attributes.
If I were starting from scratch, and I had a pretty good idea of what fields I wanted, then I would start with (1), a single table for peripherals. If I had requirements where peripherals and attributes would be changing fairly regularly, then I would seriously consider (4). If the tables are only being used by applications, then I might consider (3), but I would probably reject it anyway.
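Here is a rough sketch of the hybrid mentioned above: common fields in tb_Peripheral and everything else as name/value pairs, using Python with sqlite3. The table tb_Peripheral_Attribute, the attribute names and the sample rows are assumptions for the example, not part of the model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tb_Peripheral (id INTEGER PRIMARY KEY, type TEXT, model TEXT);
    -- Hypothetical side table: one row per (peripheral, attribute) pair.
    CREATE TABLE tb_Peripheral_Attribute (
        peripheral_id INTEGER REFERENCES tb_Peripheral(id),
        name TEXT,
        value TEXT,
        PRIMARY KEY (peripheral_id, name)
    );
    INSERT INTO tb_Peripheral VALUES
        (1, 'keyboard', 'K100'), (2, 'keyboard', 'K60'), (3, 'network card', 'N1');
    INSERT INTO tb_Peripheral_Attribute VALUES
        (1, 'keys', '104'), (1, 'language', 'en-US'),
        (2, 'keys', '61'),
        (3, 'speed_mbps', '1000');
    """
)


def attributes(conn, peripheral_id):
    """Collect the type-specific pairs for one peripheral into a dict."""
    rows = conn.execute(
        "SELECT name, value FROM tb_Peripheral_Attribute WHERE peripheral_id = ?",
        (peripheral_id,),
    )
    return dict(rows)


def models_with_minimum(conn, peripheral_type, attribute, minimum):
    """Filter on a pair value; the CAST is needed because values are text."""
    sql = """
        SELECT p.model
        FROM tb_Peripheral p
        JOIN tb_Peripheral_Attribute a ON a.peripheral_id = p.id
        WHERE p.type = ? AND a.name = ? AND CAST(a.value AS INTEGER) >= ?
        ORDER BY p.model
    """
    return [row[0] for row in conn.execute(sql, (peripheral_type, attribute, minimum))]
```

The data stays queryable from plain SQL, but every typed comparison needs a cast, and getting "a complete picture" of one peripheral takes an extra query or a pivot.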
There is only one question to answer when you do this sort of design. Whether it's JSON, a serialised object, XML, or heaven forbid a CSV doesn't really matter.
Do you want to consume the data outside of the API that knows the structure?
For example, do you want to use SQL to get all peripherals of type keyboard with a number-of-keys property >= 102?
If you do, it gets messy, much messier than extra tables.
It's no different from having a table of PDFs or docs and trying to find all the ones that have more than 10 pages.
It gets even funnier if you want to version the content as your application evolves.
Have a look at a NoSQL back end; it's designed for stuff like this, and a relational database is not.
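A minimal sketch of the keyboard example with the JSON stored as an opaque text column, in Python with sqlite3 and json. The data column and the sample records are assumptions. Many databases offer JSON functions that could push part of this into SQL; the sketch deliberately does not rely on them, to show what the consumer has to do when only the application knows the structure.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tb_Peripheral (id INTEGER PRIMARY KEY, type TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO tb_Peripheral (type, data) VALUES (?, ?)",
    [
        ("keyboard", json.dumps({"model": "K100", "keys": 104})),
        ("keyboard", json.dumps({"model": "K60", "keys": 61})),
        # An older record written before 'keys' existed (a versioning problem).
        ("keyboard", json.dumps({"model": "K-old"})),
    ],
)


def keyboards_with_at_least(conn, min_keys):
    """SQL can only narrow by type; the real filter runs in application code."""
    models = []
    query = "SELECT data FROM tb_Peripheral WHERE type = 'keyboard' ORDER BY id"
    for (raw,) in conn.execute(query):
        data = json.loads(raw)
        # Missing keys must be handled explicitly for every property we touch.
        if data.get("keys", 0) >= min_keys:
            models.append(data["model"])
    return models
```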

Avoid loading unnecessary data from db into objects (web pages)

Really newbie question coming up. Is there a standard (or good) way to deal with not needing all of the information that a database table contains loaded into every associated object. I'm thinking in the context of web pages where you're only going to use the objects to build a single page rather than an application with longer lived objects.
For example, let's say you have an Article table containing id, title, author, date, summary and fullContents fields. You don't need the fullContents to be loaded into the associated objects if you're just showing a page containing a list of articles with their summaries. On the other hand, if you're displaying a specific article you might want every field loaded for that one article and maybe just the titles for the other articles (e.g. for display in a recent articles sidebar).
Some techniques I can think of:
1. Don't worry about it, just load everything from the database every time.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle).
3. Use one class but set unused properties to null at creation if that field is not needed and be careful.
4. Give the objects access to the database so they can load some fields on demand.
5. Something else?
All of the above seem to have fairly major disadvantages.
I'm fairly new to programming, very new to OOP and totally new to databases so I might be completely missing the obvious answer here. :)
1) Loading the whole object is, unfortunately, what ORMs do by default. That is why hand-tuned SQL performs better. But most objects don't need this optimization, and you can always delay optimization until later. Don't optimize prematurely (but do write good SQL/HQL and use good DB design with indexes). By and large, though, the ORM projects I've seen result in a lot of lazy approaches, pulling or updating way more data than needed.
2) Different Models (Entities), depending on operation. I prefer this one. May add more classes to the object domain, but to me, is cleanest and results in better performance and security (especially if you are serializing to AJAX). I sometimes use one model for serializing an object to a client, and another for internal operations. If you use inheritance, you can do this well. For example CustomerBase -> Customer. CustomerBase might have an ID, name and address. Customer can extend it to add other info, even stuff like passwords. For list operations (list all customers) you can return CustomerBase with a custom query but for individual CRUD operations (Create/Retrieve/Update/Delete), use the full Customer object. Even then, be careful about what you serialize. Most frameworks have whitelists of attributes they will and won't serialize. Use them.
3) Dangerous, special cases will cause bugs in your system.
4) Bad for performance. Hit the database once, not for each field (Except for BLOBs).
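The CustomerBase -> Customer idea from option 2 might look roughly like this in Python with dataclasses and sqlite3. The customer table, the email and password_hash fields and the to_public_dict helper are assumptions for the sketch. The whitelist is written by hand here; in a real framework you would use its own serialization whitelist.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CustomerBase:
    """Light model used for list views and for anything sent to a client."""
    id: int
    name: str
    address: str


@dataclass
class Customer(CustomerBase):
    """Full model for individual CRUD operations (fields are illustrative)."""
    email: str
    password_hash: str


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT,"
        " email TEXT, password_hash TEXT)"
    )
    conn.execute(
        "INSERT INTO customer VALUES (1, 'Ann', '1 Main St', 'ann@example.com', 'x9f')"
    )
    return conn


def list_customers(conn):
    """List operation: a custom query that selects only the base columns."""
    rows = conn.execute("SELECT id, name, address FROM customer ORDER BY id")
    return [CustomerBase(*row) for row in rows]


def get_customer(conn, customer_id):
    """Single-record operation: load the full object."""
    row = conn.execute(
        "SELECT id, name, address, email, password_hash FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()
    return Customer(*row) if row else None


def to_public_dict(customer):
    """Serialize through an explicit whitelist, even for a full Customer."""
    return {field: getattr(customer, field) for field in ("id", "name", "address")}
```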
You have a number of methods to solve your issue.
Use Stored Procedures in your database to remove the rows or columns you don't want. This can work great but takes up some space.
Use an ORM of some kind. For .NET you can use Entity Framework, NHibernate, or Subsonic. There are many other ORM tools for .NET. Ruby on Rails has one built in (Active Record). Java commonly uses Hibernate.
Write embedded queries in your website. Don't forget to parametrize them or you will open yourself up to hackers. This option is usually frowned upon because of the mingling of SQL and code. Also, it is the easiest to break.
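For the third option, parametrizing just means passing values separately from the SQL text. A minimal sketch in Python with sqlite3 follows; the article table and the titles_by_author helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO article (title, author) VALUES (?, ?)",
    [("Intro", "ann"), ("Notes", "bob")],
)


def titles_by_author(conn, author):
    """The author value is bound with '?', so it is never parsed as SQL."""
    rows = conn.execute(
        "SELECT title FROM article WHERE author = ? ORDER BY id", (author,)
    )
    return [title for (title,) in rows]


# A typical injection attempt is treated as an ordinary (non-matching) string.
hostile_input = "ann' OR '1'='1"
```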
From your list, options 1, 2 and 4 are probably the most commonly used ones.
1. Don't worry about it, just load everything from the database every time: Well, unless your application is under heavy load or you have some extremely heavy fields in your tables, use this option and save yourself the hassle of figuring out something better.
2. Have several different, possibly inherited, classes for each table and create the appropriate one for the situation (e.g. SummaryArticle, FullArticle): Such classes would often be called "view models" or something similar, and depending on your data access strategy, you might be able to get hold of such objects without actually declaring any new class. E.g., using LINQ to SQL, the expression data.Articles.Select(a => new { a.Title, a.Author }) will give you a collection of anonymously typed objects with the properties Title and Author. The generated SQL will be similar to select Title, Author from Article.
4. Give the objects access to the database so they can load some fields on demand: The objects you describe here would usually be called "proxy objects" and/or their properties referred to as being "lazy loaded". Again, depending on your data access strategy, creating proxies might be hard or easy. E.g., with NHibernate, you can have lazy properties by simply throwing in lazy=true in your mapping, and proxies are automatically created.
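To show what "lazy loaded" means without any ORM, here is a hand-rolled sketch in Python with sqlite3, using the Article example from the question. The class layout and the load_count counter are assumptions for illustration; an ORM such as NHibernate generates the equivalent proxy for you.

```python
import sqlite3


class Article:
    """Holds the cheap fields eagerly and fetches fullContents on first access."""

    def __init__(self, conn, article_id, title, summary):
        self._conn = conn
        self.id = article_id
        self.title = title
        self.summary = summary
        self._full_contents = None
        self._loaded = False
        self.load_count = 0  # only here so the behaviour can be observed

    @property
    def full_contents(self):
        if not self._loaded:
            row = self._conn.execute(
                "SELECT fullContents FROM Article WHERE id = ?", (self.id,)
            ).fetchone()
            self._full_contents = row[0]
            self._loaded = True
            self.load_count += 1
        return self._full_contents


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Article (id INTEGER PRIMARY KEY, title TEXT,"
        " summary TEXT, fullContents TEXT)"
    )
    conn.execute("INSERT INTO Article VALUES (1, 'Hello', 'Short', 'Very long body')")
    return conn


def list_articles(conn):
    """List query that never touches the heavy column."""
    rows = conn.execute("SELECT id, title, summary FROM Article ORDER BY id")
    return [Article(conn, *row) for row in rows]
```

The caveat from the other answer still applies: this costs one extra query per object whose heavy field is touched, so it suits large, rarely needed fields rather than every column.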
Your question does not mention how you are actually mapping data from your database to objects now, but if you are not using any ORM framework at the moment, do have a look at NHibernate and Entity Framework - they are both pretty solid solutions.