I am trying to visually graph the relationships between components, where these components have many-to-many relationships with each other.
If anyone is familiar with the database diagram tool in SQL Server Management Studio, or the LINQ to SQL/Entity Framework designers in Visual Studio, they are very capable of finding a fairly optimal layout that cleanly represents the tables in a database and how they relate to each other.
While I'm not trying to represent a database schema, the concept of representing "components" or "items" and their relationships to one another is perfectly analogous. I've tried the springy and cose layouts, but they are essentially random and often don't position nodes well. breadthfirst is a better choice, but nodes are still placed suboptimally and edge crossings are not reduced.
Related
I want to create an application for creating graphical documents; each document consists of several geometric shapes (Ball, Brick, Cylinder, Cube).
So I created two diagrams for my application, as shown in this picture:
I want to know which diagram is better and why, and what the advantages and disadvantages of both approaches are.
Of course it depends on the requirements. But from a neutral standpoint the left one is definitely the better one, since it takes a better perspective for GraphicDocument: it doesn't need to know the concrete form of the elements, only that they are shapes. So you can easily extend it without having to change GraphicDocument.
In other words: the left is loosely coupled while the right one is tightly coupled.
As Thomas says, it depends, but the right diagram has the potential to be more specific. That diagram opens the door to expressing existential quantification, which is to say that there must be some number of Cubes in a Document, for example. If you don't care to express that, the left diagram is clearly more expressive with fewer symbols.
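To make the trade-off concrete, here is a minimal sketch of the loosely coupled (left-hand) design in Python; the class and method names are placeholders rather than anything taken from the diagrams:

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Common abstraction: GraphicDocument depends only on this interface."""

    @abstractmethod
    def draw(self) -> str:
        ...


class Ball(Shape):
    def draw(self) -> str:
        return "ball"


class Cube(Shape):
    def draw(self) -> str:
        return "cube"


class GraphicDocument:
    """Holds any number of shapes without knowing their concrete types."""

    def __init__(self):
        self.shapes = []

    def add(self, shape: Shape):
        self.shapes.append(shape)

    def render(self):
        return [shape.draw() for shape in self.shapes]


doc = GraphicDocument()
doc.add(Ball())
doc.add(Cube())
print(doc.render())  # ['ball', 'cube'] -- adding a Cylinder requires no change here
```

Adding a new shape type only means adding another Shape subclass; in the tightly coupled (right-hand) design, GraphicDocument itself would also need to change.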
I am writing 3D geometry visualization software for schools. I am designing my engine as an Entity-Component system, because it has served me well in games. In this case I have some specific requirements:
There is a limited number of different geometries I need to render, and I would like to render these in batches: all lines as one batch, all triangles as one batch, all planes as one batch, and so on. This works well even with transparent objects, since I am using depth peeling and don't need to sort them by distance.
One logical object will typically have more than one mesh associated with it: e.g. a plane entity has a border "child" entity whose body consists of four lines, and these lines all share the same material.
I would like to have a clean design, so I am trying to stay true to the no-code-in-components principle and to keeping a single structure per component type.
What I have now is a different component type for each type of geometry (point, line, plane, ...). The corresponding system stores a batch with a mesh plus instance data and renders it in one draw call. The instance data differs between geometry types, hence I decided to go with one component type per geometry type. (A bad design?)
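To make that concrete, here is a stripped-down sketch of the current setup (the names and fields are placeholders, not my real code):

```python
from dataclasses import dataclass, field


@dataclass
class LineComponent:
    """One component type per geometry type; pure data, no code."""
    start: tuple
    end: tuple
    color: tuple


@dataclass
class Entity:
    components: list = field(default_factory=list)


class LineRenderSystem:
    """Collects instance data from every LineComponent and draws one batch."""

    def render(self, entities):
        instances = [
            (c.start, c.end, c.color)
            for e in entities
            for c in e.components
            if isinstance(c, LineComponent)
        ]
        # stand-in for the real instanced draw call
        print(f"drawing {len(instances)} lines in one draw call")


line = Entity([LineComponent((0, 0, 0), (1, 0, 0), (1, 1, 1))])
LineRenderSystem().render([line])
```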
Question:
Now I'm wondering how to handle entities that seem to need multiple components of the same type, like the plane border, that has a body consisting of four lines.
I could think of several solutions, which all have drawbacks:
1. Make each line of the border entity an entity itself. Each would have a "line" component and a "child" component. That would model the border and its lines as five entities, with the four lines attached to the border entity via the "child" component. This seems like quite a waste of entities; some special entities would then have several dozen children.
2. Allow the border entity to have multiple components of the "line" type. This seems like a hack, since all the ECS articles I've seen discourage attaching multiple components of the same type to one entity.
3. Make a unified "geometry" component that may contain an arbitrary number of elementary geometries. That would introduce quite a bit of indirection, but it seems like the best solution to me at the moment (rough sketch below).
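Roughly what I mean by option 3, again only as a placeholder sketch:

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    start: tuple
    end: tuple


@dataclass
class Triangle:
    a: tuple
    b: tuple
    c: tuple


@dataclass
class GeometryComponent:
    """Unified component: one entity, arbitrarily many elementary geometries."""
    primitives: list = field(default_factory=list)  # Line, Triangle, ...


# A plane border would be one entity whose geometry component holds four lines.
border = GeometryComponent(primitives=[
    Line((0, 0, 0), (1, 0, 0)),
    Line((1, 0, 0), (1, 1, 0)),
    Line((1, 1, 0), (0, 1, 0)),
    Line((0, 1, 0), (0, 0, 0)),
])
```

The render system would then have to bucket these primitives by type before batching them, which is where the extra indirection comes in.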
Could someone help me sort these chaotic thoughts into a good solution? I'm sure I'm missing a straightforward approach, but I just couldn't find one yet.
I have a lot of programming experience (10+ years), but unfortunately I only recently started with Entity-Component systems, so I'm still struggling with the concept, it seems.
Thank you very much.
Which is the best DB for a tree structure?
I have different kinds of objects which can have parent or child objects. The structure of these objects is dynamic, e.g. some can have a 'name' field while others don't, some can have a 'menu' field and others an 'image' field.
One element can have 1000 fields (or attributes) while another can have just 1.
An SQL database is ruled out, because it cannot be schemaless.
Currently I am storing this in MongoDB, but I think it is not the most appropriate choice, because I cannot have unlimited children or parents in one document (it's limited to 16 MB), so I have to create a separate document for every object, and then one of the greatest advantages of MongoDB is lost.
Another solution might be a graph DB. I'm not familiar with them, but they seem like the perfect solution; a tree is a graph after all.
So what do you think?
A graph database sounds like the right answer. Please consider looking at TinkerPop which is an open source graph technology stack. It enables connection to most any graph database (Neo4j, Titan, OrientDB, Bitsy, etc.) in an agnostic way. Obviously, that enables you to try out different graph implementations to find the right one for you.
While far from performant compared to true graph databases, there's even a MongoDB implementation of a graph. I'd recommend starting with a simple in-memory TinkerGraph and a Gremlin REPL to begin your learning process.
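If you later want to drive the graph from Python, a minimal gremlinpython sketch could look like the following. It assumes a Gremlin Server reachable at ws://localhost:8182/gremlin, and the 'object', 'name', 'image' and 'child' labels are just examples for your kind of data:

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Connect to a running Gremlin Server (backed by TinkerGraph, Neo4j, etc.)
connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = traversal().withRemote(connection)

# Schemaless nodes: each vertex carries only the properties it actually has
parent = g.addV('object').property('name', 'menu-root').next()
child = g.addV('object').property('image', 'logo.png').next()

# Parent/child relationships are edges -- deeper trees are just more edges
g.V(parent).addE('child').to(__.V(child)).iterate()

# Walk the tree from the parent and read back whatever properties exist
print(g.V(parent).out('child').valueMap().toList())

connection.close()
```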
Take a look at graph databases. Neo4j is the leader here.
I am trying to create a QTreeView to display data from a SQL database. This is a large database, so simply loading the data into a QStandardItemModel seems prohibitive.
None of Qt's pre-built SQL model classes are sufficient for the task. Therefore it seems necessary to subclass QAbstractItemModel.
In the first place, I can find no examples where this is done, so I am wondering whether it is the correct approach.
Implementing QAbstractItemModel::data is pretty straightforward. I am uncertain how to implement QAbstractItemModel::parent.
Qt's "Simple Tree Model Example" example would be informative, but in that example the tree structure is represented in memory with the TreeItem class. I could copy that, but if I am going to duplicate the database structure, it would be just as easy to use QStandardItemModel. If I need to maintain a separate data structure (in addition to the database and the QAbstractItemModel subclass) to represent the tree structure, is there any advantage to subclassing QAbstractItemModel over just using a QStandardItemModel?
The challenge in a tree structure is to always be able to identify a model index's parent (i.e., to override the parent() method). In the Simple Tree example, this is done by storing the tree structure in a separate data structure. For large SQL queries this is impractical. For the right database structure you might be able to calculate the proper parent node given the child, but that is not guaranteed. The only alternative I can imagine is passing a quint32 to QAbstractItemModel::createIndex which encodes the item's parent.
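As a rough illustration of that last idea, here is a two-level sketch (in PyQt5 for brevity; the same shape applies in C++). The parent row is packed into each index's internal id, so no separate tree structure is kept, and the plain lists stand in for whatever your SQL queries return:

```python
from PyQt5.QtCore import QAbstractItemModel, QModelIndex, Qt


class TwoLevelModel(QAbstractItemModel):
    """Two-level tree whose parent() is recomputed from an integer packed
    into each index's internal id, so no separate tree structure is kept."""

    def __init__(self, groups, children, parent=None):
        super().__init__(parent)
        self._groups = groups        # e.g. top-level rows from one SQL query
        self._children = children    # dict: group row -> list of child strings

    def index(self, row, column, parent=QModelIndex()):
        if not parent.isValid():
            return self.createIndex(row, column, 0)              # 0 = top level
        return self.createIndex(row, column, parent.row() + 1)   # encode parent row

    def parent(self, index):
        pid = index.internalId()
        if pid == 0:                             # top-level items have no parent
            return QModelIndex()
        return self.createIndex(pid - 1, 0, 0)   # decode the stored parent row

    def rowCount(self, parent=QModelIndex()):
        if not parent.isValid():
            return len(self._groups)
        if parent.internalId() == 0:             # children only under top level
            return len(self._children.get(parent.row(), []))
        return 0

    def columnCount(self, parent=QModelIndex()):
        return 1

    def data(self, index, role=Qt.DisplayRole):
        if role != Qt.DisplayRole:
            return None
        pid = index.internalId()
        if pid == 0:
            return self._groups[index.row()]
        return self._children[pid - 1][index.row()]


# Hypothetical usage, with lists standing in for SQL query results:
# model = TwoLevelModel(["run 1", "run 2"], {0: ["a", "b"], 1: ["c"]})
# view = QTreeView(); view.setModel(model)
```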
One performance consideration that might be useful: after giving up on subclassing QAbstractItemModel, I tried populating a QStandardItemModel from the database. I loaded about 1200 items into the model, with four child items for each item, using two separate database calls. This took about 3 seconds on a 2009 laptop, which is faster than I had been expecting. (And there would be performance gains if I used a single query instead of repeated queries.)
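For anyone attempting the same thing, a sketch of that kind of bulk population with PyQt5 and sqlite3 looks roughly like this (the file, table and column names are invented for illustration):

```python
import sqlite3

from PyQt5.QtGui import QStandardItem, QStandardItemModel

conn = sqlite3.connect("data.sqlite")          # hypothetical database file
model = QStandardItemModel()

# One query for the top-level items, one for all of their children.
parents = {}
for item_id, name in conn.execute("SELECT id, name FROM items"):
    item = QStandardItem(name)
    model.appendRow(item)
    parents[item_id] = item

for parent_id, child_name in conn.execute("SELECT parent_id, name FROM children"):
    parents[parent_id].appendRow(QStandardItem(child_name))
```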
In the end I went another route: having several QTableViews in the GUI, with signals and slots to show different aspects of the data. My code is much simpler, and the proper functionality is in place, so this feels like the "right" solution.
So I have an interesting problem that's been the fruit of lots of good discussion in my group at work.
We have some scientific software producing SQLite files, and this software is basically a black box. We don't control its table designs, formats, etc. It's entirely conceivable that this black box's output could change, and our design needs to be able to handle that.
The SQLite files are entire databases which our users would like to query across. There are two ways we see of implementing this: one, create a single master database and a Python backend that appends the tables from each database to it; two, query across the separate databases' tables and unify the results in Python.
Both methods run into trouble when the black box alters its table structures, say for example renaming a column, splitting up a table, etc. We have to take this into account, and we've discussed translation tables that map queries from one table format to another.
We're interested in ease of implementation, how well the design handles a change in database/table layout, and speed. Also, a last dimension is how well it would work with existing Python web frameworks (Django doesn't support cross-database queries, and neither does SQLAlchemy, so we know we are in for a lot of programming.)
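For what it's worth, option two can at least be prototyped with plain sqlite3 and ATTACH DATABASE, which lets a single connection see several files at once (the glob pattern and the results table below are invented, and SQLite caps the number of attached databases, 10 by default):

```python
import glob
import sqlite3

conn = sqlite3.connect(":memory:")

# Attach each black-box output file under its own schema name
paths = sorted(glob.glob("runs/*.sqlite"))
for i, path in enumerate(paths):
    conn.execute(f"ATTACH DATABASE ? AS run{i}", (path,))

# Query across all attached databases and unify the results in one statement
union = " UNION ALL ".join(
    f"SELECT '{i}' AS run, * FROM run{i}.results" for i in range(len(paths))
)
for row in conn.execute(union):
    print(row)
```

This of course breaks as soon as the per-file table layouts diverge, which is exactly where the translation tables would have to come in.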
If you find yourself querying across databases, you should look into consolidating. Cross-database queries are evil.
If your queries are essentially relegated to individual databases, then you may want to stick with multiple databases, as clearly their separation is necessary.
You cannot accommodate arbitrary changes in a database's schema without categorizing and anticipating that change in some way. In the very best case with nontrivial changes, you can sometimes simply ignore new data or tables; in the worst case, your interpretation of the data will entirely break down.
I've encountered similar issues where users need data pivoted out of a normalized schema. The schema does NOT change. However, their required output format demands a fixed number of hierarchical levels. So although the database design accommodates all the changes they want to make, their chosen view of that data cannot be maintained in the face of those changes; it is impossible to maintain the output schema in the face of mere data change (not even schema change). This is not to say that it isn't a valid output or input schema, only that there are limits beyond which their chosen schema cannot be used. At that point they have to revise the output contract, and the pivoting program (which CAN anticipate this and generate new columns) then has a place to put the data in the output schema.
My point being: the semantics and interpretation of new columns and new tables (or the removal of columns and tables that existing logic may depend on) are nontrivial unless new columns or tables can be anticipated in some way. However, in the cases where they can be anticipated, there is usually a better database design that would eliminate that kind of schema change in the first place:
For instance, a particular database schema can contain any number of tables, all with the same structure (although there is no theoretical reason they could not be consolidated into a single table). A particular kind of table could have a set of columns all similarly named (although this "array" violates normalization principles and could be normalized into a common key/code/value schema).
Even in a data warehouse ETL situation, a new column has to be classified as either a fact or a dimensional attribute, and then, if it is a dimensional attribute, assigned to the dimension table it best belongs to. This could somewhat be automated for facts (obvious candidates would be scalars like decimal/numeric) by inspecting the metadata for unmapped columns, altering the DW table (yikes) and then loading appropriately. But for dimensions, I would be very leery of automating something like this.
So, in summary, I would say that schema changes in a good normalized database design are the hardest to accommodate, because: 1) the database design already anticipates and accommodates a good deal of change and flexibility, and 2) schema changes to such a design are unlikely to be easy to anticipate. Conversely, schema changes in a poorly normalized database design are actually easier to anticipate, because the shortcomings in the database design are more visible.
So, my question to you is: How well-designed is the database you are working from?
You say that you know that you are in for a lot of programming...
I'm not sure about that. I would go for a quick and dirty solution rather than a 'generic' one, because generic solutions like the entity-attribute-value model often have bad performance. Don't do client-side joining (unifying the results) inside your Python code, because that is very slow; use SQL for joining, it is designed for that purpose. Users can also build their own reports with all kinds of reporting tools that generate SQL statements. You don't have to do everything in your app; start by solving 80% of the problems, not 100%.
If something breaks because something inside the black box changes, you can define views for backward compatibility that keep your app functioning.
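For example, a compatibility view in sqlite3 might look like this (the table and column names are invented; the point is just to re-expose the old column name after the black box renames it):

```python
import sqlite3

conn = sqlite3.connect("master.sqlite")  # hypothetical consolidated database

# Suppose a new black-box release renamed 'temp_c' to 'temperature_celsius'.
# A view restores the old contract, so existing queries keep working unchanged.
conn.execute("""
    CREATE VIEW IF NOT EXISTS measurements_v1 AS
    SELECT id, run_id, temperature_celsius AS temp_c
    FROM measurements
""")
conn.commit()

# Old code keeps selecting from the view as if nothing changed
rows = conn.execute("SELECT temp_c FROM measurements_v1").fetchall()
```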
Maybe the scientific software will add a lot of new features, and maybe it will change its data model because of those new features? That is possible, but then you will have to change your application anyway to benefit from those new features.
It sounds to me as if your problem isn't really about MySQL or SQLite. It's about the sharing of data, and the contract that needs to exist between the supplier of data and the user of that same data.
To the extent that databases exist so that data can be shared, that contract is fundamental to everything about databases. When databases were first being built, and database theory was first being solidified, in the 1960s and 1970s, the sharing of data was the central purpose in building databases. Today, databases are frequently used where files would have served equally well. Your situation may be a case in point.
In your situation, you have a beggar's contract with your data suppliers. They can change the format of the data, and maybe even the semantics, and all you can do is suck it up and deal with it. This situation is by no means uncommon.
I don't know the specifics of your situation, so what follows could be way off target.
If it were up to me, I would want to build a database that was as generic, flexible, and stable as possible, without losing the essential features of structured and managed data. Maybe a design like a star schema would make sense, but I might adopt a very different design if I were actually in your shoes.
This leaves the problem of extracting the data from the databases you are given, transforming the data into the stable format the central database supports, and loading it into the central database. You are right in guessing that this involves a lot of programming. This process, known as "ETL" in data warehousing texts, is not the simplest of programming challenges.
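A bare-bones sketch of that ETL step with Python's sqlite3, just to show its shape (the file locations, the run_id tagging and the lack of real type mapping are all simplifying assumptions; a production loader would also apply your translation tables here):

```python
import glob
import os
import sqlite3

master = sqlite3.connect("master.sqlite")

for path in glob.glob("incoming/*.sqlite"):          # black-box output files
    run_id = os.path.splitext(os.path.basename(path))[0]
    src = sqlite3.connect(path)

    tables = [r[0] for r in src.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        cur = src.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]

        # Extract -> (trivial) transform: tag every row with its source run.
        rows = [(run_id, *row) for row in cur.fetchall()]

        # Load: naive, untyped target table; real code would map types and
        # reconcile renamed or split columns via the translation tables.
        master.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            f"({', '.join(['run_id'] + cols)})")
        master.executemany(
            f"INSERT INTO {table} VALUES ({', '.join('?' * (len(cols) + 1))})",
            rows)

    src.close()

master.commit()
```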
At least ETL collects all the hard problems in one place. Once you have the data loaded into a database that's built for your needs, and not for the needs of your suppliers, turning the data into valuable information should be relatively easy, at least at the programming or SQL level. There are even OLAP tools that make using the data as simple as a video game. There are challenges at that level, but they aren't the same kind of challenges I'm talking about here.
Read up on data warehousing, and especially data marts. The description may seem daunting to you at first, but it can be scaled down to meet your needs.