Best DB for representing a tree structure - sql

Which is the best DB for a tree structure?
I have different kinds of objects which can have parent or child objects. The structure of these objects is dynamic, e.g. some can have a 'name' field while others don't, some can have a 'menu' field and others an 'image' field.
One element can have 1000 fields (or attributes) while another can have just 1.
An SQL database is ruled out, because it cannot be schemaless.
Currently I am storing this in MongoDB, but I think it is not the most appropriate fit, because I cannot have unlimited children or parents in one document (it is limited to 16 MB), so I have to make a separate document for every object, and then one of the greatest advantages of MongoDB is lost.
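To be concrete, this is roughly what I do now: one document per object, with a reference to its parent (a sketch with pymongo; the collection and field names are just placeholders):

    from pymongo import MongoClient

    client = MongoClient()                 # assumes a local mongod is running
    nodes = client["treedb"]["nodes"]      # one document per object/node

    # Each node is schemaless: arbitrary fields plus a reference to its parent.
    root_id = nodes.insert_one({"name": "root", "parent": None}).inserted_id
    nodes.insert_one({"menu": ["a", "b"], "parent": root_id})
    nodes.insert_one({"image": "logo.png", "parent": root_id})

    # Fetching the children of a node means querying by the parent reference.
    children = list(nodes.find({"parent": root_id}))
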
Another solution might be a graph DB. I'm not familiar with them, but they seem like the perfect solution; a tree is a graph after all.
So what do you think?

A graph database sounds like the right answer. Please consider looking at TinkerPop which is an open source graph technology stack. It enables connection to most any graph database (Neo4j, Titan, OrientDB, Bitsy, etc.) in an agnostic way. Obviously, that enables you to try out different graph implementations to find the right one for you.
While far from being performant compared to true graph databases, there's even a MongoDB implementation of a graph. I'd recommend starting with a simple in-memory TinkerGraph and a Gremlin REPL to begin your learning process.
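If you would rather explore from a script than from the Groovy REPL, a rough sketch of the same kind of experiment from Python with the gremlinpython client could look like the following (this assumes a Gremlin Server is reachable locally; the labels and property names are made up):

    from gremlin_python.process.anonymous_traversal import traversal
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

    # Connect to a Gremlin Server (for example one backed by an in-memory TinkerGraph).
    g = traversal().withRemote(
        DriverRemoteConnection("ws://localhost:8182/gremlin", "g"))

    # Create two nodes with different, ad-hoc properties and link them.
    parent = g.addV("item").property("name", "root").next()
    child = g.addV("item").property("image", "logo.png").next()
    g.addE("child").from_(parent).to(child).iterate()

    # Walk the tree: all children of the root, with their properties.
    print(g.V(parent).out("child").valueMap().toList())
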

Take a look at graph databases. Neo4j is the leader here.

Related

What's the best database for my data structure?

I have two data structures that I need to store in a database. At this point, I'm relatively sure that SQL and any relational database types wouldn't work, but I'm also not sure what alternatives I have and/or which of those alternatives would be best. If there is a reasonable way to implement these structures in mySQL or something similar, I'm open to the idea.
Structure 1:
A nested tree diagram, where nodes are not defined ahead of time, and are instead generated from the data. I have a lot of strings that I need to separate into trees such that each branch node on the tree is empty and each leaf node contains a maximum of 200 strings, all beginning with the same prefix. I would use SQL, but considering I will regularly have upwards of 9.45x10^55 nodes (branch and leaf), I can't use the tree traversal method; adding a single node would take too much time.
Structure 2:
I have an array of the leaf nodes from the above structure; however, every leaf node has its own data associated with it, yet not contained within it.
From my (extremely limited) understanding of SQL, the second structure can be implemented in mySQL or something similar. The problem is, I need to be able to retrieve individual nodes from the 2nd structure, instead of the entire array of nodes. Also, I don't know the length of the array ahead of time, so I can't simply make a table with a certain number of columns available for each node: I'd end up having over 9.09x10^55 columns, when I will regularly be using only 5 or fewer.
If you have any recommendations as to what kind of database I could use to implement these structures relatively easily, or any advice pertaining to the implementation itself, it would be greatly appreciated.

Using QAbstractItemModel to represent data in a SQL database

I am trying to create a QTreeView to display data from a SQL database. This is a large database, so simply loading the data into a QStandardItemModel seems prohibitive.
None of Qt's pre-built SQL model classes are sufficient for the task. Therefore it seems necessary to subclass QAbstractItemModel.
In the first place, I can find no examples where this is done, so I am wondering whether it is the correct approach.
Implementing QAbstractItemModel::data is pretty straightforward. I am uncertain how to implement QAbstractItemModel::parent.
Qt's "Simple Tree Model Example" example would be informative, but in that example the tree structure is represented in memory with the TreeItem class. I could copy that, but if I am going to duplicate the database structure, it would be just as easy to use QStandardItemModel. If I need to maintain a separate data structure (in addition to the database and the QAbstractItemModel subclass) to represent the tree structure, is there any advantage to subclassing QAbstractItemModel over just using a QStandardItemModel?
The challenge in the tree structure is to always be able to identify a model index's parent (i.e., overloading the parent() method). In the Simple Tree example, this is done by storing the tree structure in a separate data structure. For large SQL queries this is impractical. For the right database structure, you might be able to calculate the proper parent node given the child, but that is not guaranteed. The only alternative I can imagine is passing a quint32 to QAbstractItemModel::createIndex which encodes the item's parent.
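For example, with a plain adjacency-list table, the lookup that parent() ultimately has to answer is a single query. A minimal sketch in Python/sqlite3, just to show the shape of the data (the table and column names are assumptions):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                     [(1, None, "root"), (2, 1, "child A"), (3, 1, "child B")])

    def parent_of(node_id):
        # The question parent() has to answer: given a child, which node is its parent?
        row = conn.execute(
            "SELECT parent_id FROM nodes WHERE id = ?", (node_id,)).fetchone()
        return row[0] if row else None

    assert parent_of(3) == 1
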
One performance consideration that might be useful. After giving up on subclassing QAbstractItemModel, I tried populating a QStandardItemModel from the database. I loaded about 1200 items into the model, and four child items for each item, with two separate database calls. This took about 3 seconds on a 2009 laptop. That is faster than I had been expecting. (And there would be performance gains if I used a single query instead of repeated queries.)
In the end I went another route: having several QTableViews in the GUI, with signals and slots to show different aspects of the data. My code is much simpler, and the proper functionality is in place, so this feels like the "right" solution.

Database access: one master database object or have objects call queries themselves?

For a hobby project I am building an application to keep track of my money. Register everything that comes in and goes out. I am using sqlite as a database backend.
I have two data access models in mind.
Creating one master object as a sort of database connector, which contains methods that execute the queries and provide the required sets of data as lists of objects
Having the objects that need data execute the queries themselves
Which one of these is 'the best' and why? Or are there different, better models out there?
The latter option is better. In the first option, you would end up having to touch your universal data access object for just about any update to the code that wasn't purely a change in display logic. If you have different data access objects, then you will have much more testable, maintainable code.
I suggest you read up a bit on the model-view-controller paradigm. The wikipedia article on it is a good start: http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller.
Also, you didn't say which language/platform you were coding in, but most platforms have numerous options for auto-generating a starting point for your data access classes from your database. You may find something like that useful.
Much of a muchness really; the thing to avoid is having the "same" SQL sprinkled all over your code base.
The key point is: you've just added a new column to Table1. When you do Find In Files for "Table1", how many hits are you going to get, and where?
If you use one class and there are a lot of DB operations, it's going to get very messy very quickly, but if you have one interface (say IModel) with one implementation, you can swap backends very easily.
So: how many DB operations are there, and how likely is it that you will move away from SQLite?
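A sketch of what that one interface (IModel) with one implementation could look like, written in Python over sqlite3 rather than any particular platform; the class and table names are invented:

    import sqlite3
    from abc import ABC, abstractmethod

    class TransactionStore(ABC):
        """The single interface ("IModel") the rest of the app talks to."""

        @abstractmethod
        def add(self, amount, description): ...

        @abstractmethod
        def all(self): ...

    class SqliteTransactionStore(TransactionStore):
        """One implementation; swapping backends means writing another subclass."""

        def __init__(self, path=":memory:"):
            self.conn = sqlite3.connect(path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS transactions "
                "(id INTEGER PRIMARY KEY, amount REAL, description TEXT)")

        def add(self, amount, description):
            self.conn.execute(
                "INSERT INTO transactions (amount, description) VALUES (?, ?)",
                (amount, description))
            self.conn.commit()

        def all(self):
            return self.conn.execute(
                "SELECT amount, description FROM transactions").fetchall()

    store = SqliteTransactionStore()
    store.add(-4.50, "coffee")
    print(store.all())
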

Tree data structure persistence in Ruby

I have a project where I need to build and store large trees of data in Ruby. I am considering different approaches for serialization, deserialization and querying of trees, and I am wondering what would be the best way to go. My major constraints are read time, query efficiency, and cross-version/cross-platform compatibility. The most frequent operation is to retrieve sets of nodes based on a combination of id/value and/or feature(s). Trees can be up to 15-20 levels deep. Moving subtrees is an uncommon procedure, but should be possible without too much black magic. Rails integration is not a primary concern. The options I thought about, along with some issues I'm concerned about, are the following:
Marshal the trees, and when needed load them into memory and query them in Ruby (inefficiency as tree grows, cross-version compatibility?)
Same as above, but use YAML (more cross-version compatible, but less efficient?)
Same as above, but use a custom XML parser (need to recreate objects from scratch each time the tree is loaded?)
Serialize the trees to XML, store them in an XML database (e.g. Sedna) and use XPath to query the trees (no experience with this approach, not sure about efficiency?)
Use adjacency lists to query trees stored in a schema-less database (inefficiency when counting descendants?)
Use materialized paths (potential to exceed the maximum string length for deep trees?)
Use nested sets (complex SQL queries?)
Use the array of ancestors approach? Seems interesting in terms of querying efficiency according to the MongoDB page, but I haven't been able to find any serious discussion of this algorithm.
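As far as I understand it, the array-of-ancestors idea boils down to something like this (a sketch with pymongo; the ids and field names are just placeholders):

    from pymongo import MongoClient

    nodes = MongoClient()["treedb"]["nodes"]   # assumes a local mongod is running

    # Each node stores the full list of its ancestors, root first.
    nodes.insert_one({"_id": "books", "parent": None, "ancestors": []})
    nodes.insert_one({"_id": "programming", "parent": "books", "ancestors": ["books"]})
    nodes.insert_one({"_id": "ruby", "parent": "programming",
                      "ancestors": ["books", "programming"]})

    # All descendants of a node is a single (indexable) query...
    descendants = list(nodes.find({"ancestors": "books"}))
    # ...and the ancestor chain of a node is already stored on the node itself.
    chain = nodes.find_one({"_id": "ruby"})["ancestors"]
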
Based on your experience, which approach would better fit with the constraints I have described? If I go for an XML database, are there ones that would be more suited for this project? Are there other approaches I have overlooked that would be more efficient? Thanks for your time.
Trees work really well with graph databases, such as neo4j: http://neo4j.org/learn/
Neo4j is a graph database, storing data in the nodes and relationships of a graph. The most generic of data structures, a graph elegantly represents any kind of data, preserving the natural structure of the domain.
There is a good Ruby interface for Neo4j:
https://github.com/andreasronge/neo4j
Pacer is a JRuby library that enables very expressive graph traversals. Pacer allows you to create, modify and traverse graphs using very fast and memory-efficient stream processing. That also means that almost all processing is done in pure Java, so when it comes to the usual Ruby expressiveness vs. speed problem, you can have your cake and eat it too; it's very fast!
https://github.com/pangloss/pacer
Neography is similar to the neo4j.rb gem, and was suggested by Ron in the comments (thanks, Ron!):
https://github.com/maxdemarzi/neography
Since you are considering a SQL approach, here are some things to think about.
First, how big are the trees? For many applications, 10,000 leaves would seem big. Yet this is small for a database. On any decent database system (even on a laptop), you should be able to store hundreds of thousands or millions of leaves in memory.
What a database buys you over other approaches is:
-- Not having to worry about memory/disk performance. When the data spills over to disk, you don't take a big hit on performance. By comparison, consider what happens when a hash table overflows memory.
-- Being able to add indexes to optimize performance.
-- Being able to alter your access path for the tree "just" by modifying SQL
One of the problems with standard SQL: you can represent a tree node as a simple record (a node id, a parent id, and a value), and with a simple join you can move between parents and leaves, but the joins accumulate as you move up the tree.
Sigh. Different databases have different solutions for this. SQL Server has recursive CTEs, which let you traverse the tree. Oracle has another approach for tree structures.
This starts to get complicated.
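For what it's worth, the recursive-CTE version of "fetch a whole subtree" looks roughly like this (sketched with Python and SQLite, which also supports WITH RECURSIVE; the schema is assumed):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
                     [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "a.1")])

    # One query walks the whole subtree instead of accumulating one join per level.
    subtree = conn.execute("""
        WITH RECURSIVE subtree(id, parent_id, label) AS (
            SELECT id, parent_id, label FROM nodes WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id, n.label
            FROM nodes n JOIN subtree s ON n.parent_id = s.id
        )
        SELECT id, label FROM subtree
    """, (1,)).fetchall()
    print(subtree)   # e.g. [(1, 'root'), (2, 'a'), (3, 'b'), (4, 'a.1')]
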
Perhaps a better approach is to assign a "leaf" id based on the hierarchy in the tree. So, if this is a binary tree, then "10011" would be the node reached by taking the right branch, then left, left, right, and right. There you would store information . . . such as whether it has children and whatever else. Getting the parent is easy, because you can just truncate the last digit.
You can see how this would generalize to non-binary trees. Having any number of children could pose a little challenge.
I believe this may be related to the "array of ancestors" approach.
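A tiny sketch of the encoding idea, just to show how cheap the parent and subtree operations become (plain Python; the encoding scheme here is only illustrative):

    # Each node's id encodes the path from the root: one character per branch taken.
    nodes = {
        "1":     "right child of the root",
        "10":    "left child of node 1",
        "100":   "left child of node 10",
        "1001":  "right child of node 100",
        "10011": "right child of node 1001 (the '10011' node from above)",
    }

    def parent_id(node_id):
        # Getting the parent is just truncating the last digit.
        return node_id[:-1] or None

    def subtree_ids(node_id):
        # The whole subtree is everything whose id starts with this prefix
        # (in SQL: WHERE id LIKE '10%').
        return [i for i in nodes if i.startswith(node_id)]

    assert parent_id("10011") == "1001"
    assert set(subtree_ids("10")) == {"10", "100", "1001", "10011"}
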
The more I think about it, the more I believe this would work pretty well. I would then suggest that you define separate stored procedures for each action that you want:
usp_tree_FetchNode (NodeId)
usp_tree_GetParent (NodeId)
usp_tree_NodeDelete (NodeId)
usp_tree_FetchSubTree (NodeId)
etc. etc. etc.
Although SQL does not really support object-oriented programming, you can still organize your code with clean naming conventions and good function wrappers.
I actually think this might work and provide a pretty good method for developing the code. One nice side effect is that you can analyze the tree outside the application, which might suggest future enhancements.
Have you looked at the ancestry gem? I've used it for simple trees, but from the description it looks like it fits your requirements.

Is Dataset an ORM?

I am a little bit confused about the DataSet compared to an ORM (NHibernate or Spring.NET). From my understanding, the ORM sits between the application layer and the database layer and generates the SQL commands for the application layer. Is this the same as what a DataSet does? What is the difference between a DataSet and an ORM? What are the advantages and disadvantages of these two approaches? I hope the experts here can explain.
Thanks,
Fakhrul
There is a BIG difference between them, first of all in the programming model they represent:
The DataSet is based on a Table Model.
An ORM (without specifying a particular product or framework) is based on, and tends toward, a Domain Model.
There is another kind of tool which can be used in data scenarios: a Data Mapper (e.g. iBATIS.NET).
As other answers before me have said, it's important to read what Microsoft says about the DataSet and, better, what Wikipedia says about ORM, but I think (this was the case for me at the beginning) that it matters more to understand the difference between them in terms of model. Understanding that will not only clarify the choices behind each, but will also make it much easier to approach and understand the tools themselves.
As a short explanation, it's possible to say:
Table Model
is a model which tends to represent tabular data in an in-memory structure as closely as possible (and as needed). So it's easy to find implementations which implement concepts such as Table, Columns and Relations; in fact the model concentrates on the table structure, so object orientation is based on that, not on the data itself. This model has its own advantages, but in some cases it can be heavy to manage and it can be difficult to apply concepts to the contained data. As previous answers say, implementations like the DataSet let you, or rather force you, to prepare (even if with a tool) the SQL instructions needed to perform actions over the data.
ORM
is a model (as mendelt says before me) where objects are mapped directly to database objects, principally tables and views (although it's possible to map functions and procedures too). This is generally done in one of two ways: with a mapping file which describes the mapping, or (in .NET or Java) with code attributes. This model is based on objects which represent the data, so object orientation can be applied to them as in normal programs, clearly with more attention and caution in certain cases, but generally, once you are confident with an ORM, it can be a really powerful tool! An ORM can also be heavy to manage if it is not designed and understood well, so it's important to understand the techniques, but I can say from my experience that an ORM is a really powerful tool. With an ORM, the tool itself is principally responsible for generating the SQL instructions needed as operations are performed on the objects in code, and in many cases ORMs have an intermediate language (like HQL) to perform operations on objects.
MAPPER
A mapper is a tool which doesn't do things the way an ORM does, but instead maps hand-written SQL instructions to an object model. This kind of tool can be a better solution when you need to write the SQL instructions by hand but still want to design an application object model to represent the data.
In this "model", objects are mapped to instructions and described in a mapping file (generally an XML file, as iBATIS.NET or iBATIS (Java) does). A mapper lets you define granular rules in the SQL instructions. In this scenario it is still easy to find some ORM concepts, for example session management.
ORMs and mappers let you apply some very interesting design patterns which are not so easy to apply in the same way to a Table Model, and in this case to a DataSet.
Excuse me for this long answer and for my poor English, but for me an answer like this is what helped me, in the past, to properly understand the difference between these models and then between the implementations.
The DataSet class is definitely not an ORM; an ORM maps relational data to an object-oriented representation.
It can be regarded as some kind of 'unit of work' though, since it keeps track of the rows that have to be deleted/updated/inserted.
ADO.NET DataSet: http://msdn.microsoft.com/en-us/library/zb0sdh0b(VS.80).aspx
ORM: http://en.wikipedia.org/wiki/Object-relational_mapping
(Examples: Developer Express XPO, DataObjects.NET)
An ORM is based on mapping between objects and tables. That is not the case for the DataSet: the DataSet is itself, in a way, a direct image of the tables. An ORM needs a minimum of SQL script, whereas to make real use of a DataSet you write the SQL clauses yourself. So in this respect the DataSet is not an ORM.
Have a look at both DataSet and ORM.
No, DataSets are not ORMs. They may look like ORMs because DataSets map tables to objects just as ORMs do; the main difference lies in what objects they map to.
Datasets have their own table and row object types that closely resemble the structure of the database. You're rebuilding part of the database's relational model in objects. Restricting these objects into something resembling a relational database gets around some of the problems inherent in mapping a database to an object model.
An ORM maps the tables and rows from the database into your own object model. The structure of your object model can be optimized for your application instead of resembling a relational database. The ORM takes care of the difficulties in transforming a relational model into an object model.
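A language-neutral way to picture that difference (sketched in Python rather than .NET, so treat it only as an analogy): the dataset style hands you generic rows shaped like the table, while the ORM style hands you domain objects of your own design.

    import sqlite3
    from dataclasses import dataclass

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, born INTEGER)")
    conn.execute("INSERT INTO person VALUES (1, 'Ada', 1815)")

    # Dataset-like: generic rows that mirror the table structure.
    conn.row_factory = sqlite3.Row
    row = conn.execute("SELECT * FROM person WHERE id = 1").fetchone()
    print(row["name"], row["born"])           # you think in tables and columns

    # ORM-like: the row is mapped onto a domain object of your own design.
    @dataclass
    class Person:
        id: int
        name: str
        born: int

        def age_in(self, year):               # behaviour lives with the data
            return year - self.born

    ada = Person(*row)                        # a real ORM would do this mapping for you
    print(ada.age_in(1852))
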
A DataSet is a DTO, a data transfer object. The DataSet itself can't do anything. You can use a DataAdapter (from whichever provider is used) to produce SQL or call predefined queries, though the DataSet still isn't doing anything itself.