I am trying to create a QTreeView to display data from a SQL database. This is a large database, so simply loading the data into a QStandardItemModel seems prohibitive.
None of Qt's pre-built SQL model classes are sufficient for the task. Therefore it seems necessary to subclass QAbstractItemModel.
In the first place, I can find no examples where this is done, so I am wondering whether it is the correct approach.
Implementing QAbstractItemModel::data is pretty straightforward. I am uncertain how to implement QAbstractItemModel::parent.
Qt's "Simple Tree Model Example" example would be informative, but in that example the tree structure is represented in memory with the TreeItem class. I could copy that, but if I am going to duplicate the database structure, it would be just as easy to use QStandardItemModel. If I need to maintain a separate data structure (in addition to the database and the QAbstractItemModel subclass) to represent the tree structure, is there any advantage to subclassing QAbstractItemModel over just using a QStandardItemModel?
The challenge in the tree structure is to always be able to identify a model index's parent (i.e., overloading the parent() method). In the Simple Tree example, this is done by storing the three structure in a separate data structure. For large SQL queries this is impractical. For the right database structure, you might be able to calculate the proper parent node given the child, but that is not a guarantee. The only alternative I can imagine is passing a quint32 to QAbstractItemModel::createIndex which encodes the item's parent.
One performance consideration that might be useful. After giving up on sublcassing QAbstractItemModel, I tried populating a QStandardItemModel from the database. I loaded about 1200 items into the model, and four child items to each item with two separate database calls. This took about 3 seconds on a 2009 laptop. That is faster than I had been expecting. (And there would be performance gains if I used a single query instead of repeated queries.)
In the end I went another route: having several QTableViews in a the GUI, with signals and slots to show different aspects of the data. My code is much simpler, and the proper functionality is in place, so this feels like the "right" solution.
Related
I have two data structures that I need to store in a database. At this point, I'm relatively sure that SQL and any relational database types wouldn't work, but I'm also not sure what alternatives I have and/or which of those alternatives would be best. If there is a reasonable way to implement these structures in mySQL or something similar, I'm open to the idea.
Structure 1:
A nested tree diagram, where nodes are not defined ahead of time, and are instead generated from the data. I have a lot of strings that I need to separate into trees such that each branch node on the tree is empty and each leaf node contains a maximum of 200 strings, all beginning with the same prefix. I would use SQL, but considering I will regularly have upwards of 9.45x10^55 nodes (branch and leaf), I can't use the tree traversal method; adding a single node would take too much time.
Structure 2:
I have an array of the leaf nodes from the above structure, however, every leaf node has its own data associated with it, yet not contained within it.
From my (extremely limited) understanding of SQL, the second structure can be implemented in mySQL or something similar. The problem is, I need to be able to retrieve individual nodes from the 2nd structure, instead of the entire array of nodes. Also, I don't know the length of the array ahead of time, so I can't simply make a table with a certain number of columns available for each node: I'd end up having over 9.09x10^55 columns, when I will regularly be only using 5 or less.
If you have any recommendations as to what kind of database I could use to implement these structures relatively easily, or any advice pertaining to the implementation itself, it would be greatly appreciated.
I am trying to design an income return tax software.
What is the best way to represent/store a form with hundreds of questions in a model?
Just for this example, I need at least 6 models (T4, T4A(OAS), T4A(P), T1032, UCCB, T4E) which possibly contain hundreds of fields.
Is it by creating hundred of fields? Storing values in a map? An Array?
One very generic approach could be XML
XML allows you to
nest your data to any degree
combine values and meta information (attributes and elements)
describe your data in detail with XSD
store it externally
maintain it easily
even combine it with additional information (look at processing instructions)
and (last but not least) store the real data in almost the same format as the modell...
and (laster but even not leaster :-) ) there is XSLT to transform your XML data into any other format (such as HTML for nice presentation)
There is high support for XML in all major languages and database systems.
Another way could be a typical parts list (or bill of materials/BOM)
This tree structure is - typically - implemented as a table with a self-referenced parentID. Working with such a table needs a lot of recursion...
It is very highly recommended to store your data type-safe. Either use a character storage format and a type identifier (that means you have to cast all your values here and there), or you use different type-safe side tables via reference.
Further more - if your data is to be filled from lists - you should define a datasource to load a selection list dynamically.
Conclusio
What is best for you mainly depends on your needs: How often will the modell change? How many rules are there to guarantee data's integrity? Are you using a RDBMS? Which language/tools are you using?
With a case like this, the monolithic aggregate is probably unavoidable (unless you can deduce common fields). I'm going to exclude RDBMS since the topic seems to focus more on lower-level data structures and a more proprietary-style solution, though that could be a very valid option that can manage all these fields.
In this case, I think it ceases to become so much about formalities as just daily practicalities.
Probably worst from that standpoint in this case is a formal object aggregating fields, like a class or struct with a boatload of data members. Those tend to be the most awkward and the most unattractive as monoliths, since they tend to have a static nature about them. Depending on the language, declaration/definition/initialization could be separate which means 2-3 lines of code to maintain per field. If you want to read/write these fields from a file, you have to write a separate line of code for each and every field, and maintain and update all that code if new fields added or existing ones removed. If you start approaching anything resembling polymorphic needs in this case, you might have to write a boatload of branching code for each and every field, and that too has to be maintained.
So I'd say hundreds of fields in a static kind of aggregate is, by far, the most unmaintainable.
Arrays and maps are effectively the same thing to me here in a very language-agnostic sense provided that you need those key/value pairs, with only potential differences in where you store the keys and what kind of algorithmic complexity is involved. Whatever you do, probably a key search in this monolith should be logarithmic time or better. 'Maps/associative arrays' in most languages tend to inherently have this quality.
Those can be far more suitable, and you can achieve the kind of runtime flexibility that you like on top of those (like being able to manage these from a file and add the fields on the fly with no pre-existing knowledge). They'll be far more forgiving here.
So if the choice is between a bunch of fields in a class and something resembling a map, I'd suggest going for a map. The dynamic nature of it will be far more forgiving for these kinds of cases and will typically far outweigh the compile-time benefits of, say, checking to make sure a field actually exists and producing a syntax error otherwise. That kind of checking is easy to add back in and more if we just accept that it will occur at runtime.
An exception that might make the field solution more appealing is if you involve reflection and more dynamic techniques to generate an object with the appropriate fields on the fly. Then you get back those dynamic benefits and flexibility at runtime. But that might be more unwieldy to initialize the structure, could involve leaning a lot more heavily on heavy-duty (and possibly very computationally-expensive) introspection and type manipulation and code generation mechanisms, and also end up with more funky code that's hard to maintain.
So I think the safest bet is the map or associative array, and a language that lets you easily add new fields, inspect existing ones, etc. with very fast turnaround. If the language doesn't inherently have that quality, you could look to an external file to dynamically add fields, and just maintain the file.
I'm working on a first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple addresses), etc...
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object), and un-shattering them when responding to requests. So:
requires a lot of modelling (separate models for JSON object and SQL tables).
inconsistencies creep in between the models: e.g. user_id (in SQL) vs. userID (in JSON)
marshaling stuff between one model and the other is very time consuming (tedious, error-prone and pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those query-able fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old fashioned, but since this an old question maybe that’s okay. I employ (as much as possible) stored procedures in the dB. This promotes better encapsulation than one typically finds with a code layer outside of the dB that has to “know about” dB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least 6 competing ESBs) and there will be conflicts. Also, usually the stored procedure scripting will benefit from some sort of IDE that will helps maintain contextual awareness of the dB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.
I have a book structure with Chapter, Subchapter, Section, Subsection, Article and unknown number of subarticles, sub-subarticles, sub-sub-subarticles etc.
What's the best way to structure this?
One table with child-parent relationships, multiple tables?
Thank you.
To determine whether there are seperate tables or one-big-table involved, you should take a close look at each item - chapter, subchapter, etc. - and decide if they carry different attributes from the others. Does a chapter carry something different from a sub-chapter?
If so, then you're looking at seperate tables for Chapter, SubChapter, Section, SubSection, Article. Article still feels hierarchical to me with your sub- sub-sub- sub-sub-sub- etc.
If not, then maybe it is one big table with parent/child, but it looks like you may be talking about 'names' for the depth of the hierarchy which leans me toward seperate tables again.
Also consider how you'll query and what you'll be searching for.
There are a couple of methods to save a tree structure in a relational database. The most commonly used are using parent pointers and nested sets.
The first has a very easy data structure, namely a pointer to the respective parent element on each object), and is thus easy to implement. On the downside it is not easy to make some queries on it as the tree can not be fully traversed. You would need a self-join per layer.
The nested set is easier to query (when you have understood how it works) but is harder to update. Many writes require additional updates to other objects ion the tree which might make it harder to be transitionally save.
A third variant is that of the materialized path which I personally consider a good compromise between the former two.
That said, if you want to store arbitrary size trees (e.g,. for sections, sub-sections, sub-sub-sections, ...) you should use one of the mentioned tree implementations. If you have a very limited maximum depth (e.g max 3 layers) you could get away with creating an explicit data structure. But as things always get more complex than initially though, I'd advise you to use a real tree implementation.
In my project (ASP.NET MVC + NHibernate) I have all my entities, lets say Documents, described by set of custom metadata. Metadata is contained in a structure that can have multiple tags, categories etc. These terms have the most importance for users seeking the document they want, so it has an impact on views as well as underlying data structures, database querying etc.
From view side of application, what interests me the most are the string values for the terms. Ideally I would like to operate directly on the collections of strings like that:
class MetadataAsSeenInViews
{
public IList<string> Categories;
public IList<string> Tags;
// etc.
}
From model perspective, I could use the same structure, do the simplest-possible ORM mapping and use it in queries like "fetch all documents with metadata exactly like this".
But that kind of structure could turn out useless if the application needs to perform complex database queries like "fetch all documents, for which at least one of categories is IN (cat1, cat2, ..., catN) OR at least one of tags is IN (tag1, ..., tagN)". In that case, for performance reasons, we would probably use numeric keys for categories and tags.
So one can imagine a structure opposite to MetadataAsSeenInViews that operates on numeric keys and provide complex mappings of integers to strings and other way round. But that solution doesn't really satisfy me for several reasons:
it smells like single responsibility violation, as we're dealing with database-specific issues when just wanting to describe Document business object
database keys are leaking through all layers
it adds unnecessary complexity in views
and I believe it doesn't take advantage of what can good ORM do
Ideally I would like to have:
single, as simple as possible metadata structure (ideally like the one at the top) in my whole application
complex querying issues addressed only in the database layer (meaning DB + ORM + at less as possible additional code for data layer)
Do you have any ideas how to structure the code and do the ORM mappings to be as elegant, as effective and as performant as it is possible?
I have found that it is problematic to use domain entities directly in the views. To help decouple things I apply two different techniques.
Most importantly I'm using separate ViewModel classes to pass data to views. When the data corresponds nicely with a domain model entity, AutoMapper can ease the pain of copying data between them, but otherwise a bit of manual wiring is needed. Seems like a lot of work in the beginning but really helps out once the project starts growing, and is especially important if you haven't just designed the database from scratch. I'm also using an intermediate service layer to obtain ViewModels in order to keep the controllers lean and to be able to reuse the logic.
The second option is mostly for performance reasons, but I usually end up creating custom repositories for fetching data that spans entities. That is, I create a custom class to hold the data I'm interested in, and then write custom LINQ (or whatever) to project the result into that. This can often dramatically increase performance over just fetching entities and applying the projection after the data has been retrieved.
Let me know if I haven't been elaborate enough.
The solution I've finally implemented don't fully satisfy me, but it'll do by now.
I've divided my Tags/Categories into "real entities", mapped in NHibernate as separate entities and "references", mapped as components depending from entities they describe.
So in my C# code I have two separate classes - TagEntity and TagReference which both carry the same information, looking from domain perspective. TagEntity knows database id and is managed by NHibernate sessions, whereas TagReference carries only the tag name as string so it is quite handy to use in the whole application and if needed it is still easily convertible to TagEntity using static lookup dictionary.
That entity/reference separation allows me to query the database in more efficient way, joining two tables only, like select from articles join articles_tags ... where articles_tags.tag_id = X without joining the tags table, which will be joined too when doing simple fully-object-oriented NHibernate queries.