Creating simple ontologies/hierarchies with Google Protocol Buffer format - serialization

I want to know if it is possible to represent a hierarchy of concepts, like a very simple ontology, in Google Protocol Buffer format. If yes, can someone give me an example?

Protocol Buffers has support for nested types (the linked documentation contains an example), which could be used to represent a simple hierarchy, although I'm not sure how useful it would be.
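As a minimal sketch (the message and field names here are my own, not anything standard), a message type that refers to itself can encode a concept tree directly:

    syntax = "proto3";

    // A concept node that carries its own sub-concepts, so an entire
    // hierarchy serializes as a single nested message.
    message Concept {
      string name = 1;
      string description = 2;
      repeated Concept children = 3;  // recursive: the sub-concepts
    }

Note that this only captures a tree-shaped (is-a/has-part style) hierarchy; richer ontology features such as multiple parents or typed relations would need explicit ID fields cross-referencing messages, rather than nesting.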

Related

How exactly do references in flatbuffers work?

According to the "Writing a schema" guide for Google FlatBuffers, it is possible to share data using references: "Remember that you can share data (refer to the same string/table within a buffer), so factoring out repeating data into its own data structure may be worth it."
However, I don't quite understand how this is meant to be accomplished. I have a flatbuffer that I'm trying to reverse engineer, and I discovered that there are multiple offsets pointing to the same string value. When I compile the decoded JSON file again, there are multiple occurrences of that string. What exactly do I have to specify in the schema file to prevent this?
Thank you :)
JSON has no way to represent such references, so a buffer is "flattened" to a tree when output as JSON. Only at the binary level can a FlatBuffer represent a DAG. You can construct such a DAG simply by using a child offset twice in parent(s) while serializing.
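For illustration, a minimal C++ sketch - the Item/Root schema and the CreateItem/CreateRoot helpers are hypothetical, standing in for whatever flatc generates from your schema:

    // Hypothetical schema, compiled with flatc:
    //   table Item { label: string; }
    //   table Root { a: Item; b: Item; }
    //   root_type Root;
    #include "flatbuffers/flatbuffers.h"
    #include "root_generated.h"  // hypothetical generated header

    void buildWithSharedString() {
        flatbuffers::FlatBufferBuilder builder;

        // Serialize the string once...
        auto label = builder.CreateString("shared value");

        // ...then reference the same offset from two different tables.
        auto a = CreateItem(builder, label);
        auto b = CreateItem(builder, label);
        builder.Finish(CreateRoot(builder, a, b));

        // The binary buffer now stores "shared value" once (a DAG).
        // Decoding to JSON and re-compiling duplicates it, since JSON
        // can only express a tree.
    }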

Does it make sense to allow retrieval of data from OpenGL's context

I am trying to abstract some of OpenGL's concepts into an object-oriented style, wrapping elements like buffers, arrays, vertices etc. in objects that store their access IDs, data types, buffer sizes, used indices etc. and provide further simplifications of their usage.
But now I'm wondering: does anyone actually want to re-access data that was once pushed to the GPU? Are functions like glGetBufferSubData ever actually used for anything other than debugging? The documentation of these functions on the official wiki isn't very elaborate, and I have never seen them in any tutorial.
The general concept of GL is that everything can be queried. Reading back data that you yourself put there should be avoided, and is usually more expensive than keeping a local copy. However, there is also data generated by the GPU which you might want to read back. Examples of this are of course framebuffer contents, textures you rendered into, or vertex data which you stored via transform feedback into a buffer. So yes, there are real use cases for things like glGetBufferSubData() (although I prefer buffer mappings in most situations).
Whether you need support for such operations is another matter entirely, and one which I think is off-topic here and primarily opinion-based. The problem with abstractions built without the intended use case in mind is that one tends to over-abstract things. YMMV.
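To make the two read-back routes concrete, a minimal sketch (assuming a transform-feedback buffer that a draw call has already filled with `count` vec4 outputs; all setup omitted):

    #include <GL/glew.h>  // or your extension loader of choice
    #include <vector>

    std::vector<float> readBack(GLuint tfBuffer, GLsizei count) {
        std::vector<float> data(count * 4);
        glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfBuffer);

        // Route 1: copy the buffer contents into client memory.
        glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0,
                           data.size() * sizeof(float), data.data());

        // Route 2 (often preferred): map the buffer and read in place.
        // const float* p = static_cast<const float*>(
        //     glMapBufferRange(GL_TRANSFORM_FEEDBACK_BUFFER, 0,
        //                      data.size() * sizeof(float), GL_MAP_READ_BIT));
        // ... read from p ...
        // glUnmapBuffer(GL_TRANSFORM_FEEDBACK_BUFFER);

        glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, 0);
        return data;
    }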
I wrote a program to generate meshes using transform feedback, and needed to read the data in buffers to save the resulting mesh.
The transform feedback generated the data. It wasn't data that I originally pushed there.
So, yes.

What design patterns exist for marshalling JSON APIs to/from SQL?

I'm working on my first JSON-RPC/JSON-REST API. One of the conveniences of JSON is that it can easily represent structured data (a user may have multiple email addresses, multiple postal addresses, etc.).
For example, the Facebook Graph API nicely represents the kind of thing that's handy to return as JSON objects:
https://fbcdn-dragon-a.akamaihd.net/hphotos-ak-ash3/851559_339008529558010_1864655268_n.png
However, in implementing an API such as this with a relational database, we end up shattering structured objects into very many tables (at least one for each list in the JSON object) and un-shattering them when responding to requests. So:
it requires a lot of modelling (separate models for the JSON objects and the SQL tables);
inconsistencies creep in between the models, e.g. user_id (in SQL) vs. userID (in JSON);
marshalling stuff between one model and the other is very time-consuming (tedious, error-prone, pointless boilerplate).
What design-patterns exist to help in this situation?
I'm not sure you are looking for design patterns. I would look for tools that handle this better.
I assume that you want to be able to query these objects, and not just store them in TEXT fields. Many databases support XML fairly well, so I would convert the JSON to XML (with a library) and then store that in the database.
You may also want to consider a JSON document-based database. That will definitely get you where you want to go.
If you don't need to be able to query these, or only need to query a very small subset of fields, just store the objects as text, and extract those query-able fields into actual columns. This way you don't need to touch the majority of the data, but you can still query the few fields you care about. (Plus you can index them for speedier lookup.)
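As a sketch of that last approach (table and column names are hypothetical): keep the serialized object whole, and promote only the fields you query into real, indexable columns.

    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        email      VARCHAR(255) NOT NULL,  -- extracted, query-able field
        user_json  TEXT NOT NULL           -- the full serialized object
    );

    -- Index the promoted field for speedier lookup.
    CREATE INDEX idx_users_email ON users (email);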
I have always chosen to implement this functionality in a facade pattern. Since the point of the facade is to simplify (abstract) an underlying complexity as a boundary between two or more systems, it seemed like the perfect place to handle this.
I realize however that this does not quite answer the question. I am talking about the container for the marshalling while the question is about how to better manage the contents (the code that does the job).
My approach here is somewhat old-fashioned, but since this is an old question, maybe that's okay. I employ (as much as possible) stored procedures in the DB. This promotes better encapsulation than one typically finds with a code layer outside of the DB that has to "know about" the DB structure. What inevitably happens in the latter case is that more than one system will be written to do this (one large company I worked at had at least 6 competing ESBs), and there will be conflicts. Also, stored-procedure scripting usually benefits from some sort of IDE that helps maintain contextual awareness of the DB structure.
So this approach - even though it is not a pattern per se - makes managing the ORM a lot easier.
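For illustration, a minimal MySQL-flavored sketch (all names hypothetical): the procedure is the only thing that knows which tables make up a "user", so callers never touch the table structure.

    DELIMITER //
    CREATE PROCEDURE get_user(IN p_user_id INT)
    BEGIN
        -- One row per email address; the API layer folds these rows
        -- back into the user object's email list.
        SELECT u.user_id, u.name, e.address AS email
        FROM users u
        LEFT JOIN user_emails e ON e.user_id = u.user_id
        WHERE u.user_id = p_user_id;
    END //
    DELIMITER ;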

Organizing interconnected objects

This is a generic question; I don't know if it belongs on Programming or StackOverflow.
I'm writing a little simulation. Without going very deep into its details, consider that many kinds of entities are involved. They correspond to objects, since I'm using an OOP language.
There are Guys that inhabit the simulated world
There are Maps
A map has many Lots, that are pieces of land with some characteristics
There are Tribes (guys belong to tribes)
There is a generic class called Position to locate the elements
There are Bots in control of tribes that move guys around
There is a World that represents the simulated world
and so on.
If the simulated world were laid out as a database, the objects would be tables with lots of references, but in memory I have to use a different strategy. So, for example, a Tribe has an array of Guys as a property; the World has an array of Bots, of Tribes, of Maps; a Map has a Dictionary whose keys are Positions and whose values are Lots; a Guy has a Position that is where he stands.
The way I lay down such connections is pretty much arbitrary. For example, I could have an array of Guys in the World, or an Array of guys per Lot (the guys standing on a piece of land), or an array of Guys per Bot (with the Guys controlled by the bot).
Doing so, I also have to pass around a lot of objects. For example, a Bot must have information about the Map and opponent Guys to decide how to move its Guys.
As said, in a database I'd have a Guys table connected to the Lots table (indicating each Guy's position) and to the Tribes table (indicating which Tribe he belongs to), and it would also be easy to query "all the Guys in Position [1, 5]", "all the Guys of Tribe 123", "all the Guys controlled by Bot B standing on Lot b34 and not belonging to Tribe 456", and so on.
I've worked with APIs where, to get the simplest information, you had to make an instance of the CustomerContextCollection and pass it to CustomerQueryFactory to get back a CustomerInPlaceQuery to... When people criticize OOP and cite verbose abstractions that soon smell ridiculous, that's what I mean. I want to avoid such things and having to rely on deep abstractions and (anti-pattern) abstract contexts.
The question is: what is the preferred, clean way to manage entities and collections of entities that are deeply linked in multiple ways?
It depends on your definition of "clean". In my case, I define clean as: I can implement desired behavior in an obvious, efficient manner.
Building OOP software is not a data modeling exercise. I'd suggest stepping back a little. What does each one of those objects actually do? What methods are you going to implement?
Just because "guys are in a lot" doesn't mean that the lot object needs a collection of guys; it only needs one if there are operations on a lot that affect all the guys in it. And even then, it doesn't necessarily need a collection of guys - it needs a way to get the guys in the lot. This may be an internally stored collection, but it could also be a simple method that calls back into the world to find guys matching a criteria. The implementation of that lookup should be transparent to anyone.
From the tenor of your questions, it seems like you're thinking of this from a "how do I generate reports" perspective. Step back and think of the behaviors you're trying to implement first.
Another thing I find extremely valuable is to differentiate between Entities and Values. Entities are objects where identity matters - you may have two guys, both named "Chris", but they are two different objects and remain distinct despite having the same "key". Values, on the other hand, act like ints. From your above list, Position sounds a lot like a value - Position(0,0) is Position(0,0) regardless of which chunk of memory (identity) those bits are stored in. The distinction has a big effect on how you compare and store values vs. entities. For example, your Guy objects (entities) would store their Position as a simple member variable.
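A quick sketch of the distinction (names illustrative): the value type compares by content, the entity by identity.

    #include <string>
    #include <utility>

    // Value: equality is structural; equal Positions are interchangeable.
    struct Position {
        int x, y;
        bool operator==(const Position& o) const { return x == o.x && y == o.y; }
    };

    // Entity: identity matters; two Guys named "Chris" remain distinct.
    class Guy {
        int id_;            // the key that defines identity
        std::string name_;
        Position pos_;      // a value, stored as a simple member
    public:
        Guy(int id, std::string name, Position pos)
            : id_(id), name_(std::move(name)), pos_(pos) {}
        bool operator==(const Guy& o) const { return id_ == o.id_; }
    };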
A great reference I've found for how to think about such things is Eric Evans' "Domain-Driven Design" book. He's focused on business systems, but the discussions are very valuable for how you think about building OO systems in general.
I would say that no 'true' answer exists to your core question -- a best way to manage collections of entities that are linked in multiple ways. It really depends on the kind of application (simulation) - here are some thoughts:
Is execution time important?
If this is the case, there is really no way around analyzing how your simulator will iterate over (query) the objects in the pool: sketch out the basic simulation loop and check what kinds of events will require iterating over what kinds of model entities (I assume you are developing a discrete-event simulation?). Then you should organize the data structures in a way that optimizes the most frequent/time-consuming events (as opposed to "laying down the connections arbitrarily"). Additionally, you may want to use special data structures (such as k-d trees) to organize entities with properties that you need to query often (e.g., position data). For some typical problems, e.g. collision detection, there is also a whole range of approaches for solving them efficiently (so look for suitable libraries/frameworks, e.g. for multi-agent simulation).
How flexible do you want to make it?
If you really want to make it super-flexible and really don't want to decide on the hierarchy of the model entities, why not just use an in-memory database? As you already said, databases are easily applicable to your problem (and you can easily save the model state, which may also be useful).
How clean is clean enough?
If you want to be absolutely sure that the rest of your simulator is not affected by the design choices you make in regard to your model representation, hide it behind an interface (say, ModelWorld) which defines methods for all the types of queries your simulator may invoke (this is orthogonal to the second point and may help with the first, i.e. figuring out what kind of access pattern your simulator exhibits). This allows you to change implementations easily, without affecting any other parts of the simulator code.
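A sketch of what such an interface might look like in C++ (names hypothetical); the simulator codes against it, and the storage layout behind it can change freely:

    #include <vector>

    struct Guy;       // entity types defined elsewhere in the simulation
    struct Position;

    // One seam between the simulator and the model representation:
    // every query the simulator needs, with the storage hidden behind it.
    class ModelWorld {
    public:
        virtual ~ModelWorld() = default;
        virtual std::vector<Guy*> guysAt(const Position& pos) = 0;
        virtual std::vector<Guy*> guysOfTribe(int tribeId) = 0;
        virtual std::vector<Guy*> guysControlledBy(int botId) = 0;
    };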

Flexible Persistence Layer

I am designing an ASP.NET MVC 2 application. Currently I am leveraging Entity Framework 4 with switchable SQL Server and MySQL datastores.
A requirement recently surfaced for the application to allow user-defined models/entities to be manipulated. Now I'm unsure if a SQL/relational database is appropriate at all; instead of adding/removing 'Employee' objects, for example, the user should be able to define an 'Employee' and what properties it has - effectively adding/removing tables and columns on the fly, at runtime.
Is SQL unsuitable for this? Are there options which allow me to stay within a relational database structure and still satisfy this requirement? Within the Entity Framework, can I regenerate .edmx files 'on the fly' or are there alternatives which achieve similar goals?
I've looked briefly at other options like 'document-based' dbs and 'schema-free/no-sql' dbs, such as MongoDb. I've also looked at some serialization formats such as Google's Protocol Buffers, JSON, and XML. From your experience, are any of these particularly suitable for this purpose? Serialization performance is not a big concern.
The application is in its infancy and I have no time constraints. Essentially I am free to rewrite it as I please, so if scrapping and starting over is a better alternative, I am very open to this. What are your suggestions? Thanks in advance!
Before looking at options, I'd suggest (if you have not already done it :-) that you get a clear definition of exactly what users will be able to define. Once you have that, you can deduce the level of flexibility needed, and therefore the type of data store needed to do the job.
One other word of advice: if the clients demand to be able to create anything any way they want - walk away. I've dealt with clients and users at all levels, and one thing that is guaranteed is that users have no interest in the effective and efficient design of data, and will therefore always reduce the data to a pile of poo through sheer neglect.
You need to set some boundaries so that the data store behind the system maintains some integrity.