What's the best database for my data structure? - sql

I have two data structures that I need to store in a database. At this point, I'm relatively sure that SQL and relational databases in general wouldn't work, but I'm also not sure what alternatives I have or which of them would be best. If there is a reasonable way to implement these structures in MySQL or something similar, I'm open to the idea.
Structure 1:
A nested tree diagram, where nodes are not defined ahead of time, and are instead generated from the data. I have a lot of strings that I need to separate into trees such that each branch node on the tree is empty and each leaf node contains a maximum of 200 strings, all beginning with the same prefix. I would use SQL, but considering I will regularly have upwards of 9.45x10^55 nodes (branch and leaf), I can't use the tree traversal method; adding a single node would take too much time.
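One way to read the splitting rule described above is as a prefix tree (trie) whose leaves are buckets of at most 200 strings. The sketch below assumes something the question does not state: an overfull leaf is split by extending its prefix by one character. The names `Node`, `insert`, `leaves` and `MAX_LEAF` are hypothetical, and this is an in-memory illustration, not a storage design.

```python
from dataclasses import dataclass, field

MAX_LEAF = 200  # maximum strings per leaf, as in the question


@dataclass
class Node:
    prefix: str
    strings: list = field(default_factory=list)   # only non-empty while this is a leaf
    children: dict = field(default_factory=dict)  # next character -> child Node

    @property
    def is_leaf(self) -> bool:
        return not self.children


def _split(leaf: Node) -> None:
    """Turn an overfull leaf into an empty branch whose children share a longer prefix."""
    n = len(leaf.prefix)
    if all(len(s) == n for s in leaf.strings):
        return  # every string equals the prefix; no longer prefix can separate them
    strings, leaf.strings = leaf.strings, []
    for s in strings:
        key = s[n:n + 1]  # "" for a string that equals the prefix itself
        leaf.children.setdefault(key, Node(leaf.prefix + key)).strings.append(s)
    for child in list(leaf.children.values()):
        if len(child.strings) > MAX_LEAF:
            _split(child)


def insert(node: Node, s: str) -> None:
    """Insert s (which must start with node.prefix), splitting a leaf that overflows."""
    while not node.is_leaf:
        # Descend by the character that follows the current prefix.
        key = s[len(node.prefix):len(node.prefix) + 1]
        node = node.children.setdefault(key, Node(node.prefix + key))
    node.strings.append(s)
    if len(node.strings) > MAX_LEAF:
        _split(node)


def leaves(node: Node):
    """Yield every leaf below node, depth first."""
    if node.is_leaf:
        yield node
    else:
        for child in node.children.values():
            yield from leaves(child)


root = Node("")
for i in range(1000):
    insert(root, f"{i:04d}")
print(sorted(leaf.prefix for leaf in leaves(root)))  # ten leaves, prefixes '00' to '09'
```

Only leaves hold strings; branch nodes stay empty, matching the question's description.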
Structure 2:
I have an array of the leaf nodes from the above structure; however, every leaf node has its own data associated with it, but not contained within it.
From my (extremely limited) understanding of SQL, the second structure can be implemented in MySQL or something similar. The problem is that I need to be able to retrieve individual nodes from the second structure instead of the entire array of nodes. Also, I don't know the length of the array ahead of time, so I can't simply make a table with a fixed number of columns, one for each node: I'd end up with over 9.09x10^55 columns, when I will regularly be using only 5 or fewer.
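In a relational database, an array of unknown length is normally stored as rows rather than columns: one row per (leaf, item) pair, keyed by the leaf's id, so a single leaf's data can be fetched without reading any other leaf. A minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leaf (
    leaf_id INTEGER PRIMARY KEY,
    prefix  TEXT NOT NULL
);
-- One row per data item, so a leaf can have any number of items.
CREATE TABLE leaf_data (
    leaf_id INTEGER NOT NULL REFERENCES leaf(leaf_id),
    seq     INTEGER NOT NULL,          -- position of the item within the leaf's array
    value   TEXT,
    PRIMARY KEY (leaf_id, seq)
);
""")
conn.executemany("INSERT INTO leaf VALUES (?, ?)", [(1, "ab"), (2, "ac")])
conn.executemany(
    "INSERT INTO leaf_data VALUES (?, ?, ?)",
    [(1, 0, "x"), (1, 1, "y"), (2, 0, "z")],
)


def data_for_leaf(leaf_id: int) -> list:
    """Fetch one leaf's data; the primary key index makes this a direct lookup."""
    rows = conn.execute(
        "SELECT value FROM leaf_data WHERE leaf_id = ? ORDER BY seq", (leaf_id,)
    )
    return [value for (value,) in rows]


print(data_for_leaf(1))  # ['x', 'y']
```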
If you have any recommendations as to what kind of database I could use to implement these structures relatively easily, or any advice pertaining to the implementation itself, it would be greatly appreciated.

Related

Best DB to represent a tree structure

Which is the best DB for a tree structure?
I have different kinds of objects which can have parent or child objects. The structure of these objects is dynamic, e.g. some can have a 'name' field while others don't, some can have a 'menu' field and others an 'image' field.
One element can have 1000 fields (or attributes) while another can have just 1.
An SQL database is ruled out, because it cannot be schemaless.
Currently, I am storing this in MongoDB, but I think it is not the most appropriate choice, because I cannot have unlimited children or parents in one document (it's limited to 16 MB), so I have to make a separate document for every object, and then one of the greatest advantages of MongoDB is lost.
Another solution might be a graph DB. I'm not familiar with them, but they seem the perfect solution; a tree is a graph after all.
So what do you think?
A graph database sounds like the right answer. Please consider looking at TinkerPop which is an open source graph technology stack. It enables connection to most any graph database (Neo4j, Titan, OrientDB, Bitsy, etc.) in an agnostic way. Obviously, that enables you to try out different graph implementations to find the right one for you.
While far from being as performant as true graph databases, there's even a MongoDB implementation of a graph. I'd recommend starting with a simple in-memory TinkerGraph and a Gremlin REPL to begin your learning process.
Take a look at the graph databases. Neo4j is leading here.

Using QAbstractItemModel to represent data in a SQL database

I am trying to create a QTreeView to display data from a SQL database. This is a large database, so simply loading the data into a QStandardItemModel seems prohibitive.
None of Qt's pre-built SQL model classes are sufficient for the task. Therefore it seems necessary to subclass QAbstractItemModel.
In the first place, I can find no examples where this is done, so I am wondering whether it is the correct approach.
Implementing QAbstractItemModel::data is pretty straightforward. I am uncertain how to implement QAbstractItemModel::parent.
Qt's "Simple Tree Model Example" would be informative, but in that example the tree structure is represented in memory with the TreeItem class. I could copy that, but if I am going to duplicate the database structure, it would be just as easy to use QStandardItemModel. If I need to maintain a separate data structure (in addition to the database and the QAbstractItemModel subclass) to represent the tree structure, is there any advantage to subclassing QAbstractItemModel over just using a QStandardItemModel?
The challenge in the tree structure is to always be able to identify a model index's parent (i.e., overriding the parent() method). In the Simple Tree example, this is done by storing the tree structure in a separate data structure. For large SQL queries this is impractical. With the right database structure, you might be able to calculate the proper parent node given the child, but that is not guaranteed. The only alternative I can imagine is passing a quint32 to QAbstractItemModel::createIndex which encodes the item's parent.
One performance consideration that might be useful: after giving up on subclassing QAbstractItemModel, I tried populating a QStandardItemModel from the database. I loaded about 1200 items into the model, plus four child items for each item, using two separate database calls. This took about 3 seconds on a 2009 laptop, which is faster than I had expected. (And there would be performance gains if I used a single query instead of repeated queries.)
In the end I went another route: several QTableViews in the GUI, with signals and slots to show different aspects of the data. My code is much simpler, and the proper functionality is in place, so this feels like the "right" solution.
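The "calculate the parent from the child" idea can be sketched outside Qt in plain Python. Assume (hypothetically) that each row stores a parent pointer and that the model uses the row's primary key as the index's internal id; then parent() reduces to an indexed lookup of the parent id plus counting the parent's position among its siblings. The table and function names are illustrative; in a real model this logic would sit inside the C++ parent() override, whose results would go to createIndex() as the row and internal id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
CREATE INDEX item_parent ON item(parent_id);
""")
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(1, None, "root A"), (2, None, "root B"), (3, 2, "child 1"), (4, 2, "child 2")],
)


def row_of(item_id: int, parent_id) -> int:
    """Position of item_id among its siblings when ordered by id (the model's row)."""
    # "IS ?" also matches NULL, so top-level items (parent_id NULL) work too.
    (row,) = conn.execute(
        "SELECT COUNT(*) FROM item WHERE parent_id IS ? AND id < ?",
        (parent_id, item_id),
    ).fetchone()
    return row


def parent_of(item_id: int):
    """Return (parent_id, parent_row), or None for a top-level item."""
    (parent_id,) = conn.execute(
        "SELECT parent_id FROM item WHERE id = ?", (item_id,)
    ).fetchone()
    if parent_id is None:
        return None
    (grandparent_id,) = conn.execute(
        "SELECT parent_id FROM item WHERE id = ?", (parent_id,)
    ).fetchone()
    return parent_id, row_of(parent_id, grandparent_id)


print(parent_of(3))  # (2, 1)
```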
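The "single query" variant mentioned above might look like this sketch, assuming a hypothetical two-table schema: one LEFT JOIN returns every item with its children in a single round trip, and the rows are grouped in Python instead of issuing one query per item.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE child (id INTEGER PRIMARY KEY, item_id INTEGER REFERENCES item(id), name TEXT);
""")
conn.executemany("INSERT INTO item VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO child VALUES (?, ?, ?)",
    [(10, 1, "a1"), (11, 1, "a2"), (12, 2, "b1")],
)

# One round trip; LEFT JOIN keeps items that have no children.
rows = conn.execute("""
    SELECT item.id, item.name, child.name
    FROM item LEFT JOIN child ON child.item_id = item.id
    ORDER BY item.id, child.id
""")

tree = defaultdict(list)  # (item id, item name) -> list of child names
for item_id, item_name, child_name in rows:
    children = tree[(item_id, item_name)]  # creates the entry even for childless items
    if child_name is not None:
        children.append(child_name)
```

Each key would then become a top-level QStandardItem and each list entry one of its child items.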

Tree data structure persistence in Ruby

I have a project where I need to build and store large trees of data in Ruby. I am considering different approaches for serialization, deserialization and querying of trees, and I am wondering what would be the best way to go. My major constraints are read time, query efficiency and cross-version/cross-platform compatibility. The most frequent operation is to retrieve sets of nodes based on a combination of id/value and/or feature(s). Trees can be up to 15-20 levels deep. Moving subtrees is an uncommon procedure, but should be possible without too much black magic. Rails integration is not a primary concern. The options I thought about, along with some issues I'm concerned about, are the following:
Marshal the trees, and when needed load them into memory and query them in Ruby (inefficiency as tree grows, cross-version compatibility?)
Same as above, but use YAML (more cross-version compatible, but less efficient?)
Same as above, but use a custom XML parser (need to recreate objects from scratch each time the tree is loaded?)
Serialize the trees to XML, store them in an XML database (e.g. Sedna) and use XPath to query the trees (no experience with this approach, not sure about efficiency?)
Use adjacency lists to query trees stored in a schema-less database (inefficiency when counting descendants?)
Use materialized paths (potential of overfilling the max string length for deep trees?)
Use nested sets (complex SQL queries?)
Use the array of ancestors approach? Seems interesting in terms of querying efficiency according to the MongoDB page, but I haven't been able to find any serious discussion of this algorithm.
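A minimal in-memory sketch of the "array of ancestors" idea, assuming each node stores the full list of its ancestor ids (root first). Finding or counting all descendants becomes a membership test with no recursion; the cost is that moving a subtree means rewriting the array of every node in it. In a document store such as MongoDB the membership test would be a query on an indexed array field. All names here are hypothetical, and the move function assumes the new parent is not inside the moved subtree.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    ancestors: list  # ids from the root down to the direct parent


def make_child(parent: Node, child_id: str) -> Node:
    return Node(child_id, parent.ancestors + [parent.id])


def descendants(nodes: list, node_id: str) -> list:
    """All nodes below node_id: those whose ancestor array contains it."""
    return [n.id for n in nodes if node_id in n.ancestors]


def move_subtree(nodes: list, node: Node, new_parent: Node) -> None:
    """Re-parent node and rewrite the ancestor arrays of its whole subtree."""
    old_prefix = node.ancestors + [node.id]
    new_prefix = new_parent.ancestors + [new_parent.id, node.id]
    for n in nodes:
        if n.ancestors[: len(old_prefix)] == old_prefix:
            n.ancestors = new_prefix + n.ancestors[len(old_prefix):]
    node.ancestors = new_parent.ancestors + [new_parent.id]


root = Node("r", [])
a = make_child(root, "a")
b = make_child(a, "b")
c = make_child(root, "c")
nodes = [root, a, b, c]
print(descendants(nodes, "a"))  # ['b']
move_subtree(nodes, a, c)       # a, and b with it, now hang under c
print(descendants(nodes, "c"))  # ['a', 'b']
```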
Based on your experience, which approach would better fit with the constraints I have described? If I go for an XML database, are there ones that would be more suited for this project? Are there other approaches I have overlooked that would be more efficient? Thanks for your time.
Trees work really well with graph databases, such as neo4j: http://neo4j.org/learn/
Neo4j is a graph database, storing data in the nodes and relationships of a graph. The most generic of data structures, a graph elegantly represents any kind of data, preserving the natural structure of the domain.
Ruby has a good interface for the trees:
https://github.com/andreasronge/neo4j
Pacer is a JRuby library that enables very expressive graph traversals. Pacer allows you to create, modify and traverse graphs using very fast and memory-efficient stream processing. That also means that almost all processing is done in pure Java, so when it comes to the usual Ruby expressiveness vs. speed problem, you can have your cake and eat it too: it's very fast!
https://github.com/pangloss/pacer
Neography is similar to the neo4j.rb gem and was suggested by Ron in the comments (thanks Ron!)
https://github.com/maxdemarzi/neography
Since you are considering a SQL approach, here are some things to think about.
First, how big are the trees? For many applications, 10,000 leaves would seem big, yet this is small for a database. On any decent system (even a laptop), you should be able to store hundreds of thousands or millions of leaves in memory.
What a database buys you over other approaches is:
-- Not having to worry about memory/disk performance. When the data spills over to disk, you don't take a big hit on performance. By comparison, consider what happens when a hash table overflows memory.
-- Being able to add indexes to optimize performance.
-- Being able to alter your access path for the tree "just" by modifying SQL
With standard SQL, you can represent a tree node as a simple pair: the node's id and its parent's id. Then, with a simple join, you can move between parents and children. The problem is that the joins accumulate as you move up the tree.
Sigh. Different databases have different solutions for this. SQL Server has recursive CTEs, which let you traverse the tree. Oracle has another approach for tree structures (CONNECT BY).
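A minimal sketch of the recursive-CTE approach. It uses SQLite through Python's sqlite3 module rather than SQL Server; SQL Server's syntax is similar but omits the RECURSIVE keyword. The node table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 2, "a.1"), (4, 3, "a.1.x"), (5, 1, "b")],
)

# Walk from a node up to the root in one statement, however deep the tree is.
ANCESTORS = """
WITH RECURSIVE up(id, parent_id, name, depth) AS (
    SELECT id, parent_id, name, 0 FROM node WHERE id = ?
    UNION ALL
    SELECT n.id, n.parent_id, n.name, up.depth + 1
    FROM node AS n JOIN up ON n.id = up.parent_id
)
SELECT name FROM up ORDER BY depth
"""
path = [name for (name,) in conn.execute(ANCESTORS, (4,))]
print(path)  # ['a.1.x', 'a.1', 'a', 'root']
```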
This starts to get complicated.
Perhaps a better approach is to assign a "leaf" id based on the hierarchy in the tree. So, if this is a binary tree, then "10011" would be the node at right branch, left branch, left branch, right branch, right branch. At that node you would store information, such as whether it has children and whatever else. Getting the parent is easy, because you can just truncate the last digit.
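A small sketch of the path-as-id idea for the binary case, assuming "0" means left and "1" means right as in the example above, and that the root's id is the empty string. The function names are hypothetical.

```python
def parent(path: str) -> str:
    """The parent's id is the path with its last step removed."""
    if not path:
        raise ValueError("the root has no parent")
    return path[:-1]


def children(path: str) -> tuple:
    return path + "0", path + "1"  # left child, right child


def ancestors(path: str) -> list:
    """Every proper prefix of the path, nearest ancestor first (root is "")."""
    return [path[:i] for i in range(len(path) - 1, -1, -1)]


def is_descendant(path: str, of: str) -> bool:
    return len(path) > len(of) and path.startswith(of)


def describe(path: str) -> list:
    return ["right" if step == "1" else "left" for step in path]


print(describe("10011"))  # ['right', 'left', 'left', 'right', 'right']
print(parent("10011"))    # 1001
```

No lookup is needed to find a parent or any ancestor; the id itself carries that information.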
You can see how this would generalize to non-binary trees. Having any number of children could pose a little challenge.
I believe this may be related to the "array of ancestors" approach.
Thinking about it more, I believe this would work pretty well. I would then suggest that you define separate stored procedures for each action that you want:
usp_tree_FetchNode (NodeId)
usp_tree_GetParent (NodeId)
usp_tree_NodeDelete (NodeId)
usp_tree_FetchSubTree (NodeId)
etc. etc. etc.
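SQLite has no stored procedures, so this sketch uses Python functions loosely named after the procedures above as a stand-in, each wrapping SQL over a hypothetical table keyed by the path-style node id. Two assumptions to note: deleting a node here also deletes its subtree, and the prefix match uses substr(), which scans the table and is fine only for a sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (node_id TEXT PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO tree VALUES (?, ?)",
    [("", "root"), ("1", "R"), ("10", "RL"), ("100", "RLL"), ("11", "RR")],
)


def tree_fetch_node(node_id):
    row = conn.execute("SELECT payload FROM tree WHERE node_id = ?", (node_id,)).fetchone()
    return row[0] if row else None


def tree_get_parent(node_id):
    return node_id[:-1] if node_id else None  # no query needed: drop the last digit


def tree_fetch_subtree(node_id):
    """Every node whose id starts with node_id (including the node itself)."""
    rows = conn.execute(
        "SELECT node_id FROM tree WHERE substr(node_id, 1, ?) = ? ORDER BY node_id",
        (len(node_id), node_id),
    )
    return [r[0] for r in rows]


def tree_delete_node(node_id):
    """Delete a node together with its subtree, committing on success."""
    with conn:
        conn.execute(
            "DELETE FROM tree WHERE substr(node_id, 1, ?) = ?", (len(node_id), node_id)
        )


print(tree_fetch_subtree("1"))  # ['1', '10', '100', '11']
```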
Although SQL does not really support object-oriented programming, you can still organize your code with clean naming conventions and good function wrappers.
I actually think this might work and provide a pretty good method for developing the code. One nice side effect is that you can analyze the tree outside the application, which might suggest future enhancements.
Have you looked at the ancestry gem? I've used it for simple trees, and from the description it looks like it fits your requirements.

SQLite structure advice

I have a book structure with Chapter, Subchapter, Section, Subsection, Article and an unknown number of subarticles, sub-subarticles, sub-sub-subarticles etc.
What's the best way to structure this?
One table with child-parent relationships, multiple tables?
Thank you.
To determine whether there are separate tables or one big table involved, you should take a close look at each item (chapter, subchapter, etc.) and decide whether they carry different attributes from the others. Does a chapter carry something different from a sub-chapter?
If so, then you're looking at separate tables for Chapter, SubChapter, Section, SubSection and Article. Article still feels hierarchical to me with your sub-, sub-sub-, sub-sub-sub-, etc.
If not, then maybe it is one big table with parent/child, but it looks like you may be talking about 'names' for the depth of the hierarchy, which leans me toward separate tables again.
Also consider how you'll query and what you'll be searching for.
There are a couple of methods to save a tree structure in a relational database. The most commonly used are using parent pointers and nested sets.
The first has a very simple data structure, namely a pointer to the respective parent element on each object, and is thus easy to implement. On the downside, some queries are hard to write, because the tree cannot be traversed to an arbitrary depth with a plain query: you need a self-join per layer.
The nested set is easier to query (once you have understood how it works) but harder to update. Many writes require additional updates to other objects in the tree, which can make it harder to keep them transactionally safe.
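A sketch of the "self-join per layer" cost with parent pointers, using SQLite through Python and a hypothetical section table: going up two levels takes two joins, and a third level would need a third. Newer SQLite versions (3.8.3 and later) also support WITH RECURSIVE, which removes the fixed-depth limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES section(id),"
    " title TEXT)"
)
conn.executemany(
    "INSERT INTO section VALUES (?, ?, ?)",
    [(1, None, "Chapter 1"), (2, 1, "Section 1.1"), (3, 2, "Article 1.1.1")],
)

# One self-join per level: the article, its parent, and its grandparent.
row = conn.execute("""
    SELECT s.title, p.title, g.title
    FROM section AS s
    JOIN section AS p ON p.id = s.parent_id
    JOIN section AS g ON g.id = p.parent_id
    WHERE s.id = ?
""", (3,)).fetchone()
print(row)  # ('Article 1.1.1', 'Section 1.1', 'Chapter 1')
```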
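A minimal nested-set sketch in SQLite with a hypothetical table: each node stores lft/rgt bounds, so a whole subtree is one range query. Inserting a node, however, shifts the bounds of every node to its right, which is why updates are the harder part and why the renumbering should be committed together with the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ns (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
# Book -> Chapter 1 -> (Section 1.1, Section 1.2); Book -> Chapter 2
conn.executemany(
    "INSERT INTO ns VALUES (?, ?, ?)",
    [("Book", 1, 10), ("Chapter 1", 2, 7), ("Section 1.1", 3, 4),
     ("Section 1.2", 5, 6), ("Chapter 2", 8, 9)],
)


def subtree(name):
    """A node's descendants are the rows whose lft falls strictly inside its bounds."""
    rows = conn.execute("""
        SELECT c.name FROM ns AS p
        JOIN ns AS c ON c.lft > p.lft AND c.lft < p.rgt
        WHERE p.name = ? ORDER BY c.lft
    """, (name,))
    return [r[0] for r in rows]


def add_last_child(parent, name):
    """Insert as the parent's last child, shifting every bound at or past parent.rgt."""
    with conn:  # commit the renumbering and the insert together, or roll back on error
        (prgt,) = conn.execute("SELECT rgt FROM ns WHERE name = ?", (parent,)).fetchone()
        conn.execute("UPDATE ns SET rgt = rgt + 2 WHERE rgt >= ?", (prgt,))
        conn.execute("UPDATE ns SET lft = lft + 2 WHERE lft > ?", (prgt,))
        conn.execute("INSERT INTO ns VALUES (?, ?, ?)", (name, prgt, prgt + 1))


add_last_child("Chapter 1", "Section 1.3")
print(subtree("Chapter 1"))  # ['Section 1.1', 'Section 1.2', 'Section 1.3']
```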
A third variant is that of the materialized path which I personally consider a good compromise between the former two.
That said, if you want to store trees of arbitrary size (e.g., for sections, sub-sections, sub-sub-sections, ...), you should use one of the mentioned tree implementations. If you have a very limited maximum depth (e.g. max 3 layers), you could get away with creating an explicit data structure. But as things always get more complex than initially thought, I'd advise you to use a real tree implementation.

Tree Structure With History in SQL Server

I'm not sure if it's a duplicate question, but I couldn't find anything on this subject.
What's the best way to store a tree structure in a table, with the ability to store the history of changes to that structure?
Thank you for any help.
UPDATE:
I have Employees Table and I need to store the structure of branches, departments, sections, sub sections.. in a table.
I need to store historical information on employees' branches and departments, to be able to retrieve an employee's branch, department, section and sub section even if the structure has been changed.
UPDATE 2:
There's a solution to save the whole structure in a history table on every change in the structure, but is that the best approach?
UPDATE 3:
There's also an Orders table. I must store the employee's position, branch, department, section, sub section and other details on every order. That's the main reason for storing history, and it will be used very often. In other words, I should be able to show the DB data for any past day.
UPDATE 4:
Maybe using hierarchyid is an option?
What if a node is renamed? What should I do, if I need the old name on old orders?
I think you are looking for something like this. It provides a complete tree structure. This is used for a directory, but it can be used for anything: divisions, departments, sections, etc.
The History is separate, and it is best if you get your head around the Node structure before contemplating the history. For the History of any table, all that is required is the addition of a DateTime or TimeStamp column to the PK. The history stores before images of the current rows.
Functions such as (a) resolving the path of the tree and (b) finding the relevant history rows that were current at some point in time are performed using pure SQL. With MS SQL Server you can use recursive CTEs for (a), or write a simpler form in a stored proc.
For (b), I would implement a derived version of the table (Node, Department, whatever) that returns the rows that were current at the relevant point in time; then use that for the historic version of (a).
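A sketch of the before-image idea in SQLite, not the answer's linked data model: the history table has the same columns plus a timestamp in its primary key, a trigger copies the old row before each update, and an "as of" query picks the version that was current at a given time. The table names, the updated_at column and the ISO-date strings are assumptions. SQL Server has no BEFORE triggers, so there the copy would be done in an AFTER trigger from the deleted pseudo-table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id    INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES node(node_id),
    name       TEXT NOT NULL,
    updated_at TEXT NOT NULL            -- when this version became current
);
-- Same columns, with the timestamp added to the primary key.
CREATE TABLE node_history (
    node_id    INTEGER NOT NULL,
    updated_at TEXT    NOT NULL,
    parent_id  INTEGER,
    name       TEXT    NOT NULL,
    PRIMARY KEY (node_id, updated_at)
);
-- Before every update, keep the before image of the row.
CREATE TRIGGER node_keep_history BEFORE UPDATE ON node
BEGIN
    INSERT INTO node_history (node_id, updated_at, parent_id, name)
    VALUES (OLD.node_id, OLD.updated_at, OLD.parent_id, OLD.name);
END;
""")


def node_as_of(node_id, when):
    """The (parent_id, name) version of a node that was current at `when`."""
    return conn.execute("""
        SELECT parent_id, name FROM (
            SELECT parent_id, name, updated_at FROM node WHERE node_id = ?
            UNION ALL
            SELECT parent_id, name, updated_at FROM node_history WHERE node_id = ?
        )
        WHERE updated_at <= ?
        ORDER BY updated_at DESC
        LIMIT 1
    """, (node_id, node_id, when)).fetchone()


conn.execute("INSERT INTO node VALUES (1, NULL, 'Sales', '2012-01-01')")
conn.execute("UPDATE node SET name = 'Sales & Marketing', updated_at = '2012-06-01' WHERE node_id = 1")
print(node_as_of(1, "2012-03-01"))  # (None, 'Sales')
```

Only changed rows are copied, so an old order can still be shown with the department name that was current on its date.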
It is not necessary to copy and save the entire tree structure every time it changes.
Feel free to ask questions if you need any clarification.
Data Model
Tree Structure with History Data Model
Readers who are unfamiliar with the Relational Modelling Standard may find IDEF1X Notation useful.
Depending on how the historical information will be used will determine whether you need a temporal solution or simply an auditing solution. In a temporal solution, you would store a date range over which a given node applies in its given position and the "current" hierarchy is derived by using the current date and time to query for active nodes in order to report on the hierarchy. To say that this is a complicated solution is an understatement.
Given that we are talking about an employee hierarchy, my bet is that an auditing solution will suffice. In an auditing solution, you have one or more tables for the current hierarchy and store changes somewhere else that is accessible. Note that the "somewhere else" doesn't necessarily need to be in the database. If changes to the company hierarchy are infrequent, you could even use a seriously low-tech solution: create a report (or series of reports) of the company hierarchy and store those as PDFs. When someone wants to know what the hierarchy looked like last May, they can find the corresponding PDF printout.
However, if it is desired to have the audit trail be queryable, then you could consider something like SQL Server 2008's Change Tracking feature or a third-party solution which does something similar.
Remember that there is more to the question of "best" than the structure itself. There is a cost-benefit analysis of how much effort is required vs. the functionality it provides. If stakeholders need to query for the hierarchy at any moment in time and it fluctuates frequently (wouldn't want to work there :)) then a temporal solution may be best but will be more costly to implement. If the hierarchy changes infrequently and granular auditing is not necessary, then I'd be inclined to simply store PDFs of the hierarchy on a periodic basis. If there is a real need to query for changes or granular auditing is needed, then I'd look at an auditing tool.
Change Tracking
Doing a quick search, here are a couple of third-party solutions for auditing:
ApexSQL
Lumigent Technologies
There are several approaches; however, with your limited information, I would say to use a single table with a parent relationship. For the history, you can simply implement an audit of the table.
UPDATE: Based on your new information, I would not use a single-table data store. It looks like your hierarchical structure is much more complex than a simple tree, and you have some well-defined nodes in the structure, especially at the top before getting into the sections and sub-sections, so multi-table relations might be a better fit.
As far as the audit tables go, there are plenty of resources to find out what will work best for you; there are per-row and per-column audits, etc.
One thing to note is that you don't have to historize the tree structure. It only ever grows; it never gets smaller.
What changes is the USE of the nodes. Their data may change.
For example, a site.
/a/b/c
will be there forever. a may be a folder, b too, c a file. Later a may be a folder, b a tombstone (no data here at the moment), as may c. But the tree itself never changes.
Then, add a version number, and for every node a history / list of uses (node types) with a start and possibly (can be null) an end version.
The code for showing version X can then easily build out the "real" tree for that moment.
To support moves of nodes, have a "rehook" type that indicates the node was moved to another item at this version.
Voila.
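A minimal sketch of this versioning scheme with hypothetical names: paths are never deleted, each path has a list of uses (node type, start version, optional end version), and the tree for version X keeps the paths whose active use at X is not a tombstone. Treating the end version as exclusive is an assumption, and "rehook" moves are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Use:
    kind: str                   # e.g. "folder", "file", "tombstone"
    start: int                  # first version in which this use applies
    end: Optional[int] = None   # first version in which it no longer applies; None = current


# The set of paths only ever grows; what changes is how each path is used.
uses = {
    "/a":     [Use("folder", 1)],
    "/a/b":   [Use("folder", 1, 3), Use("tombstone", 3)],
    "/a/b/c": [Use("file", 2, 3), Use("tombstone", 3)],
}


def tree_at(version: int) -> dict:
    """Map each live path to its node type at `version`."""
    result = {}
    for path, history in uses.items():
        for use in history:
            if use.start <= version and (use.end is None or version < use.end):
                if use.kind != "tombstone":
                    result[path] = use.kind
                break
    return result


print(tree_at(2))  # {'/a': 'folder', '/a/b': 'folder', '/a/b/c': 'file'}
print(tree_at(3))  # {'/a': 'folder'}
```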