What is the difference between object-oriented and document databases?

I haven't used object-oriented databases, but when I use a document database (RavenDB), I store and read ordinary object-oriented classes without problems.

I went from db4o (an OODB) to RavenDB (a document DB). The big difference I found is that an object DB stores full objects: when an object is stored with another object inside it, that sub-object is stored in full, and it is the latest version of that object. A document DB still stores objects, but organizes them differently. An aggregate/root object stores only the parts of a sub-object it needs, so that the aggregate/root object is self-contained. When you retrieve the root object, you are not reaching out and grabbing the objects related to it.
An OODB would store a Team this way:
    TeamName
    City
    List<Player> // The entire player objects would be stored here
A document DB would store a team this way:
    TeamName
    City
    List<string> PlayerNames
PlayerNames would be stored here, because that's all the team object needs.
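To make the contrast concrete, here is a minimal sketch in plain Java. It uses no db4o or RavenDB API; the class names (`ObjectGraphTeam`, `TeamDocument`, `Player`) and the sample data are made up for illustration. It only shows the difference between holding a live reference to a sub-object and holding a self-contained copy of the parts the root needs.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: plain Java classes, no database API involved.
class TeamModelSketch {
    public static void main(String[] args) {
        Player ann = new Player("Ann", 7);

        // Object-database style: the team holds the full Player objects.
        ObjectGraphTeam graphTeam = new ObjectGraphTeam("Hawks", "Springfield");
        graphTeam.players.add(ann);

        // Document style: the team keeps a copy of only what it needs.
        TeamDocument doc = TeamDocument.from(graphTeam);

        // A later change to the player is visible through the object graph,
        // but not through the self-contained document.
        ann.name = "Anne";
        System.out.println(graphTeam.players.get(0).name); // live reference
        System.out.println(doc.playerNames.get(0));        // copied value
    }
}

class Player {
    String name;
    int number;

    Player(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

class ObjectGraphTeam {
    final String teamName;
    final String city;
    final List<Player> players = new ArrayList<>();

    ObjectGraphTeam(String teamName, String city) {
        this.teamName = teamName;
        this.city = city;
    }
}

class TeamDocument {
    final String teamName;
    final String city;
    final List<String> playerNames = new ArrayList<>();

    TeamDocument(String teamName, String city) {
        this.teamName = teamName;
        this.city = city;
    }

    // Copies only the player names out of the full object graph.
    static TeamDocument from(ObjectGraphTeam team) {
        TeamDocument doc = new TeamDocument(team.teamName, team.city);
        for (Player p : team.players) {
            doc.playerNames.add(p.name);
        }
        return doc;
    }
}
```

The trade-off is the one described above: the document is self-contained and cheap to load, but its copy of the player name has to be refreshed if the player changes.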
RavenDB has a good explanation of the theory of document DBs here:
http://ravendb.net/docs/theory/document-structure-design

Related

Expressing related documents as one object in RavenDB

I'm just getting started with RavenDB and have studied the documentation about related documents. The system that I am thinking about working on has two classes, Family and Member.
A Family can have many Members and a Member can belong to many Families. Members can also be retrieved and updated individually. From what I'm reading, the proper way to model this is to have one document for Family and another for Member, and to have Family contain a list of MemberIds.
Great!
The above approach is described on the RavenDB site at https://ravendb.net/docs/article-page/3.0/csharp/indexes/querying/handling-document-relationships
But this means that you handle each of the objects separately. You have to "fetch" each Member object individually with a separate Load command. Even if you use an Include to "prefetch" all of the related objects into the session, you still have to issue a separate Load command for each one.
This means that the Family object doesn't contain a list of Member objects but only a list of MemberIds.
Here is my question: Is there a way to have RavenDB return a complex object from related documents, so that when you get back a Family object, it contains a list of the related Member objects? I know that this can be done if you put the Member objects directly in the Family object, but that would not allow a Member object to be used elsewhere (in other families).
Furthermore, is there a way to tell RavenDB to "deconstruct" certain embedded lists into related documents instead of storing it in the same document?
It seems like this would be a very helpful thing to have.

You can use a transformer for that, yes.
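The idea behind such a projection can be sketched without any RavenDB code. The example below is plain Java with an in-memory `Map` standing in for the document store; `FamilyView`, `assemble`, and the document IDs are hypothetical names, not RavenDB API. It shows a Family that stores only member IDs being stitched together with the referenced Member documents into one object for the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a Map plays the role of the document store.
class FamilyViewSketch {
    public static void main(String[] args) {
        Map<String, Member> memberStore = new HashMap<>();
        memberStore.put("members/1", new Member("members/1", "Ada"));
        memberStore.put("members/2", new Member("members/2", "Grace"));

        Family family = new Family("families/1", "Lovelace");
        family.memberIds.add("members/1");
        family.memberIds.add("members/2");

        FamilyView view = FamilyView.assemble(family, memberStore);
        for (Member m : view.members) {
            System.out.println(view.familyName + ": " + m.name);
        }
    }
}

class Member {
    final String id;
    final String name;

    Member(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The stored document: it references members by ID only.
class Family {
    final String id;
    final String name;
    final List<String> memberIds = new ArrayList<>();

    Family(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

// The shape handed to the caller: a family with its members resolved.
class FamilyView {
    final String familyName;
    final List<Member> members;

    FamilyView(String familyName, List<Member> members) {
        this.familyName = familyName;
        this.members = members;
    }

    static FamilyView assemble(Family family, Map<String, Member> memberStore) {
        List<Member> resolved = new ArrayList<>();
        for (String memberId : family.memberIds) {
            Member member = memberStore.get(memberId);
            if (member != null) { // skip IDs that no longer resolve
                resolved.add(member);
            }
        }
        return new FamilyView(family.name, resolved);
    }
}
```

Members stay independent documents, so the same Member can appear in several families; only the read-side view is combined.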

How should I link archived objects to core data managed objects?

I need to persist up to a maximum of about 100 complex objects (call them Object A). I say complex because each object is composed of other nested hierarchical objects.
I decided against storing them in core data because of their complex object graphs, so I was thinking of using archiving for persisting these objects.
However, I need to form relationships between these objects and other managed objects in core data (call them Object B). The cardinality is one object A (archived) to many object Bs (core data).
My question is, what would be the best way of doing this? I thought of using UUIDs for each archived object A and storing references to those UUIDs as string attributes in core data for Object B.
But I understand there may be performance and storage penalties associated with doing this. Is there another type of object ID for Object A perhaps that I may use?

It seems that, for all the effort you are going to put into mapping between Core Data and your archived objects, it would be easier to put it all through Core Data. If this object has "complex" properties that make using Core Data undesirable, don't forget that Core Data has a Transformable attribute type.
This might be what you need.

Difference between DataSource and DataSet

I am currently working on a project whose main task is to read data stored in an SQL database and display it in a user-friendly form. The programming language is C++, and I am working in the Borland C++ Builder 6 environment, but I think the question in the title is independent of the programming language or libraries. When reading data from the database, I frequently meet these terms in class names without knowing exactly what they represent. I understand that they act as an interface to the data stored in the database. But why is there a need for two interface classes instead of one?

DataSource = How you connect to your database
DataSet = Structure of your database in memory
In more detail (from the Exam 70-516: TS: Accessing Data with Microsoft .NET Framework 4 book):
DataSource This is the primary property to which you assign your data. You can assign anything that implements the IList, IListSource, IBindingList, or IBindingListView interface. Some examples of items that can be assigned to the DataSource property are arrays (IList), lists (IList), data tables (IListSource), and data sets (IListSource).
DataSet is a memory-based, tabular, relational representation of data and is the primary disconnected data object. Conceptually, think of DataSet as an in-memory relational database, but it’s simply cached data and doesn’t provide any of the transactional properties (atomicity, consistency, isolation, durability) that are essential to today’s relational databases. DataSet contains a collection of DataTable and DataRelation objects.

Assuming you are talking about the .NET ecosystem, these two terms mean very different things.
A DataSet is a class representing relational data in the process memory (that is, outside the database) - normally populated from a database. It represents tables and relationships between them (say foreign key constraints).
DataSource is an attribute in data binding - assigning an object to a control on the DataSource property binds a source of data (such as a DataSet) to a control.

Where should I store virtual/calculated/complex object fields in my models?

I have models corresponding to database tables. For example, the House class has "color", "price", "square_feet", "real_estate_agent_id" columns.
It is very common for me to want to display the agent name when I display information about a house. As a result, my House class has the following fields:
class House {
    String color;
    Double price;
    Integer squareFeet;
    Integer realEstateAgentId;
    String realEstateAgentName;
}
I've been referring to realEstateAgentName as a virtual field, as it is pulled from a foreign table (join on real_estate_agent_id).
This doesn't feel right to me, as it mixes actual database columns with foreign object's properties. But it's quick, and in many cases it really works out well.
Other times I find myself doing something like this:
class House {
    String color;
    Double price;
    Integer squareFeet;
    Integer realEstateAgentId;
    RealEstateAgent realEstateAgent;
}
As you can see, I'm storing the actual object corresponding to the ID that is stored in the House table.
I tend to make the decision to store the entire object vs some key information associated with the ID (e.g. Name) depending on the likelihood I see of needing to access other information about the object it represents.
I have a few questions:
Of the two methods I've been mixing and matching, which is best? I'm leaning towards storing the id + the object, rather than pulling out just the properties from the foreign object that I think I may need. Of the two, this seems more "correct." But it's not perfect, because in many cases I don't have any need to hydrate the entire foreign object, and doing so would cause undue waste of resources or would not be feasible because of the amount of data or the number of joins that would be required when I don't have any use for all the info being brought in. Given that this is the case, it seems like a poor design choice because I will have lots of null fields that aren't really null in my database, but are so in memory simply because there was no need to populate them -- now I have to keep track of which ones I populated.
But is it best practice to store an ID alongside the object it represents? Should I even be storing the object as a property, or should it live externally in some map, with the ID being the key?
In an Object world it seems like the ID shouldn't even be stored as a property, with the foreign Object it represents being the logical replacement. But with everything being tightly coupled with a relational database it doesn't seem very feasible.
Is this frustrating impurity of my models/classes something I just have to live with, or are there patterns out there that address this by having some kind of fork or parent/child subclassing going on where one is a "pure" object while the other is flat like the database?
EDIT: I am looking for design suggestions here rather than specific ORM frameworks like Hibernate/nHibernate/etc. The particular language I'm working in does not have an ORM solution for my language version that I am satisfied with, and the examples were Java-esque but that's not what my source code is written in.

I can speak about Hibernate, because it is the ORM tool I am most familiar with. I believe that other ORM tools support similar behaviour to some extent.
Hibernate solves your problem with lazy loading. You add your agent as a property to the house, and by default, when the house object is loaded, the agent is represented by a proxy object generated by Hibernate, which contains only the ID. If you query some other property of the agent, Hibernate loads the full object in the background:
class House {
    String color;
    Double price;
    Integer squareFeet;
    RealEstateAgent realEstateAgent;
    // getters, setters,...
}

House house = (House) session.load(House.class, Long.valueOf(123L));
// at this point, house refers to a proxy object created by Hibernate
// in the background - no house or agent data has been loaded from DB
house.getId();
// house still refers to the proxy object
RealEstateAgent agent = house.getRealEstateAgent();
// house is now loaded, but agent not - it refers to a proxy object
String name = agent.getName(); // Now the agent data is loaded from DB
OTOH if you are sure that for a specific class you (almost) always need a specific property, you can specify eager loading in the ORM mapping for that property, in which case the property is loaded as soon as the containing object. In the mapping you can also specify whether you want a join query or a subselect query.
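Since the question asks for a design rather than a specific ORM, here is a minimal hand-rolled sketch of the same lazy-loading idea in plain Java. It is not Hibernate's proxy mechanism; `LazyRef`, the `loader` function, and the in-memory `agentTable` are hypothetical stand-ins. The reference always knows the foreign ID, loads the full object only on first access, and can report whether it has been populated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hand-rolled lazy reference; this is not Hibernate's proxy mechanism.
class LazyReferenceSketch {
    public static void main(String[] args) {
        // Stand-in for the real_estate_agent table.
        Map<Integer, RealEstateAgent> agentTable = new HashMap<>();
        agentTable.put(42, new RealEstateAgent(42, "Dana"));

        Function<Integer, RealEstateAgent> loader = id -> {
            System.out.println("loading agent " + id);
            return agentTable.get(id);
        };

        House house = new House("blue", 250000.0, 1800,
                new LazyRef<RealEstateAgent>(42, loader));

        System.out.println(house.agent.id());       // the ID is known without a load
        System.out.println(house.agent.isLoaded()); // nothing fetched yet
        System.out.println(house.agent.get().name); // first access triggers the load
        System.out.println(house.agent.get().name); // later accesses reuse the cached object
    }
}

class RealEstateAgent {
    final int id;
    final String name;

    RealEstateAgent(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

class House {
    final String color;
    final Double price;
    final Integer squareFeet;
    final LazyRef<RealEstateAgent> agent; // replaces the separate ID and object fields

    House(String color, Double price, Integer squareFeet, LazyRef<RealEstateAgent> agent) {
        this.color = color;
        this.price = price;
        this.squareFeet = squareFeet;
        this.agent = agent;
    }
}

// Holds a foreign key and resolves it to the full object on demand.
class LazyRef<T> {
    private final int id;
    private final Function<Integer, T> loader;
    private T value;
    private boolean loaded;

    LazyRef(int id, Function<Integer, T> loader) {
        this.id = id;
        this.loader = loader;
    }

    int id() {
        return id;
    }

    boolean isLoaded() {
        return loaded;
    }

    T get() {
        if (!loaded) {
            value = loader.apply(id);
            loaded = true;
        }
        return value;
    }
}
```

This keeps the ID and the object in one field, so there is no separate null object field to keep track of; the `isLoaded()` flag answers the "which ones did I populate" question directly.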

LINQ to SQL uses ID + Object and it works out well. I prefer that model as it's most flexible. Hibernate can do the same. One issue you will face is deep loading: when do you actually load the object and not just the ID? Both LINQ to SQL and Hibernate have lazy loading and give you control over this issue.
The Entity Framework, however, looks to give you complete control, letting you decide just how the data appears regardless of the physical underpinnings. It has not been fully realized yet, however.
There's really no impurity going on here. The problem is that you're trying to represent relational data in an object-oriented fashion. To get around the pains of developing like this, larger-scale projects are moving to Domain-Driven Design, where the underlying data is abstracted into logical groupings behind Repositories. Thinking of tables as classes can be problematic for large-scale solutions.
Just my 2 cents.

Hibernate, the most popular ORM tool in the Java ecosystem, usually allows you to do this:
class House {
    String color;
    Double price;
    Integer squareFeet;
    RealEstateAgent realEstateAgent;
}
This translates to a DB-table that looks like this: house(id, color, price, squareFeet, real_estate_agent_id)
If you need to print the name of the agent, you just traverse the object graph:
house.getRealEstateAgent().getName()
Through lazy loading, this is done quite efficiently. I wouldn't worry about the fact that an extra query trip to the database may have to be done until your stress tests prove this to be a problem.
Edit after your edit:
All the solutions out there have dealt with the paradigm mismatch (between the OO and Relational worlds) in a similar fashion. The designs have been made, the problem is solved. And yes, it remains a pain in the butt to deal with as an application developer but I suppose it is just the way it is as long as we want to use relational databases and object oriented persistence together.

how to model value object relationships?

context:
I have an entity Book. A book can have one or more Descriptions. Descriptions are value objects.
problem:
A description can be more specific than another description. Eg if a description contains the content of the book and how the cover looks it is more specific than a description that only discusses how the cover looks. I don't know how to model this and how to have the repository save it. It is not the responsibility of the book nor of the book description to know these relationships. Some other object can handle this and then ask the repository to save the relationships. But BookRepository.addMoreSpecificDescription(Description, MoreSpecificDescription) seems difficult for the repository to save.
How is such a thing handled in DDD?

The other two answers are one direction (+1 btw). I am coming in after your edit to the original question, so here are my two cents...
I define a Value Object as an object with two or more properties that can be (and is) shared among entities. It may be shared only within a single Aggregate Root; that's fine too. What matters is that it can be (and is) shared.
To use your example, you define a "Description" as a Value Object. That tells me that a "Description" with multiple properties can be shared among several Books. In the real world, that does not make sense, as we all know each book has unique descriptions written by whoever authored or published the book. Hehe. So, I would argue that Descriptions aren't really Value Objects, but are themselves additional Entity objects within your Book Aggregate Root boundary (you can have multiple entities within a single aggregate root). Even re-released books, newer revisions, etc. have slightly different descriptions describing that slight change.
I believe that answers your question - make the descriptions entity objects and protect them behind your main Book Entity Aggregate Root (e.g. Book.GetDescriptions()...). The rest of this answer addresses how I handle Value Objects in Repositories, for others reading this post...
For storing Value Objects in a repository, and retrieving them, we start to encroach onto the same territory I wrestled with myself when I switched from a "Database-first" modeling approach to a DDD approach. I wrestled with how to store a Value Object in the DB and retrieve it without an Identity, until I stepped back and realized what I was doing...
In Domain-Driven Design, you are modeling the Value Objects in your domain - not your data store. That is the key phrase. It means you are not designing the Value Objects to be stored as independent objects in the data store; you can store them however you like!
Let's take the common DDD example of a Value Object, an Address(). DDD presents a Mailing Address as the perfect Value Object example, as the definition of a Value Object is an object whose properties sum up to create the uniqueness of the object. If a property changes, it will be a different Value Object. And the same Value Object (the sum of its properties) can be shared among other Entities.
A Mailing Address is a location, a long/lat of a specific location on the planet. Multiple people can live at the address, and when someone moves, the new people to occupy the same Mailing Address now use the same Value Object.
So, I have a Person() object with a MailingAddress() object that has the address information in it. It is protected behind my Person() aggregate root with get/update/create methods/services.
Now, how do we store that and share it among the people in the same household? Ah, there lies DDD - you aren't modeling your data store straight from your domain model (even though that would be nice). With that said, you simply create a single table that represents your Person object, and it has the columns for your mailing address within it. It is the job of your Repository to re-hydrate that information back into your Person() and MailingAddress() objects from the data store, and to split it up during the Create/Update operations.
Yep, you'd have duplicate data now in your data store. Three Person() entities with the same mailing address now have three separate copies of that Value Object data - and that is ok! Value Objects are meant to be copied and destroyed quite easily. "Copy" is the operative word there in the DDD playbook.
So to sum up, Domain-Driven Design is about modeling your Domain to represent your actual business use of the objects. You model a Person() entity and a MailingAddress Value Object separately, as they are represented differently in your application. You persist them as copied data, that is, as additional columns in your Person table.
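A minimal sketch of that persistence approach, in plain Java: the `PersonRepository` below is a made-up in-memory stand-in (a `Map` of flat rows instead of a real table), and the class and column layout are assumptions for illustration. It shows the address being flattened into the person's row on save and re-hydrated into a MailingAddress value object on load.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: a Map of flat rows stands in for the Person table.
class ValueObjectPersistenceSketch {
    public static void main(String[] args) {
        PersonRepository repo = new PersonRepository();
        MailingAddress shared = new MailingAddress("1 Main St", "Springfield", "12345");
        repo.save(new Person(1, "Ann", shared));
        repo.save(new Person(2, "Bob", shared));

        Person ann = repo.findById(1);
        Person bob = repo.findById(2);
        System.out.println(ann.address.equals(bob.address)); // same value
        System.out.println(ann.address == bob.address);      // separate copies
    }
}

// Value object: no identity, equality is defined by its properties.
final class MailingAddress {
    final String street;
    final String city;
    final String postalCode;

    MailingAddress(String street, String city, String postalCode) {
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MailingAddress)) return false;
        MailingAddress other = (MailingAddress) o;
        return street.equals(other.street)
                && city.equals(other.city)
                && postalCode.equals(other.postalCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(street, city, postalCode);
    }
}

// Entity: identified by its id.
class Person {
    final int id;
    final String name;
    final MailingAddress address;

    Person(int id, String name, MailingAddress address) {
        this.id = id;
        this.name = name;
        this.address = address;
    }
}

class PersonRepository {
    // One flat row per person: name, street, city, postal code.
    private final Map<Integer, String[]> rows = new HashMap<>();

    // Splits the aggregate into columns of a single row.
    void save(Person p) {
        rows.put(p.id, new String[] {
            p.name, p.address.street, p.address.city, p.address.postalCode
        });
    }

    // Re-hydrates the Person and its MailingAddress from the row.
    Person findById(int id) {
        String[] row = rows.get(id);
        if (row == null) {
            return null;
        }
        return new Person(id, row[0], new MailingAddress(row[1], row[2], row[3]));
    }
}
```

Each person row carries its own copy of the address columns, so two people at the same address load equal but separate MailingAddress objects.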
All of the above is strict DDD. But DDD is meant to be "suggestions", not rules to live by. That's why you are free to do what I and many others have done: a kind of loose-DDD style. If you don't like the copied data, your only other option is to create a separate table for MailingAddress() with an Identity column on it, and update your MailingAddress() object to carry that identity, knowing you only use that identity to link it to other Person() objects that share it (I personally like a third, many-to-many relationship table, to keep the speed of the queries up). You would hide that Identity (e.g. with the internal modifier) from being exposed outside of your Aggregate Root/Domain, so other layers (such as the Application or UI) do not know of the Identity column of the MailingAddress, if possible. Also, I would create a dedicated Repository just for MailingAddress, and use your PersonService layer to combine them into the correct object, Person.MailingAddress().
Sorry for the rant... :)

First, I think that reviews should be entities.
Second, why are you trying to model relationships between reviews? I don't see a natural relationship between them. "More specific than" is too vague to be useful as a relationship.
If you're having difficulty modeling the situation, that suggests that maybe there is no relationship.

I agree with Jason. I don't know what your rationale is for making reviews value objects.
I would expect a BookReview to have BookReviewContentItems so that you could have a method on the BookReview to call to decide if it is specific enough, where the method decides based on querying its collection of content items.
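One way to read that suggestion, as a rough sketch only: the `ContentItem` enum, its values, and the "covers everything the other covers, plus more" rule are assumptions made up for illustration, not something the question specifies. The review answers the specificity question by querying its own collection of content items, so nothing has to be stored as a separate relationship.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative only: the content items and the comparison rule are assumptions.
class ReviewSpecificitySketch {
    public static void main(String[] args) {
        BookReview coverOnly = new BookReview(EnumSet.of(ContentItem.COVER));
        BookReview coverAndContent =
                new BookReview(EnumSet.of(ContentItem.COVER, ContentItem.CONTENT));

        System.out.println(coverAndContent.isMoreSpecificThan(coverOnly));
        System.out.println(coverOnly.isMoreSpecificThan(coverAndContent));
    }
}

// Hypothetical kinds of things a review can talk about.
enum ContentItem { COVER, CONTENT, AUTHOR }

class BookReview {
    private final Set<ContentItem> items;

    BookReview(Set<ContentItem> items) {
        // Defensive copy so the review's items cannot be changed from outside.
        EnumSet<ContentItem> copy = EnumSet.noneOf(ContentItem.class);
        copy.addAll(items);
        this.items = copy;
    }

    // Assumed rule: more specific means covering everything the other
    // review covers and at least one item more.
    boolean isMoreSpecificThan(BookReview other) {
        return items.containsAll(other.items) && items.size() > other.items.size();
    }
}
```

Because the relationship is computed from the items, the repository only has to save each review with its items.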
