How to model relationships in MongoDB? [duplicate] - sql

This question already has answers here:
MongoDB relationships: embed or reference?
(10 answers)
Closed 8 years ago.
I started my studies in MongoDB recently and I didn't understand much better how we make the relationship between the entities that we have in the system.
So, as I am used to make this relationships in the SQL way, I get kind of confused when I change the logic to think in NoSQL way.
I saw that MongoDB has to types of modeling: Embedded and Referenced.
If I understand correct, Referenced is like we do in SQL:
Example: 1-to-N
Create two tables to represent the entities, like User and Address.
Create an user object and an address object
Put the Address ID into an user object
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000"),
ObjectId("52ffc4a5d85242602e000001")
]
}
And the Embedded:
Create just one table, in this case User.
Create an user object and inside of it put the address object:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}]
}
So, my questions are:
To use the best features of a NoSQL Database like MongoDB, I have to use Embedded Modeling ?
In Embedded Modeling I just create only one Entity and the entity that is inside, the address object in this case, will not have an ID, since I didn't create a table ?

You're right with MongoDB. As you said, you have two ways to stock data.
The best way depends on what you need.
If your address object is going to be used somewhere else, put it in another table, if not, you can put it directly in your object.
If you're using MongoDB with NodeJS, you can use MongooseJS. It's a kind of framework which has the particularity to define schemas for your mongodb objects. It works fine particulary with embedded objects because it add an objectId for every object you embed.

Related

Is having two dependent resources not compliance with the RESTFul approach?

Context
In our project, we need to represent resources defined by the users. That is, every user can have different resources, with different fields, different validations, etc. So we have two different things to represent in our API:
Resource definition: this is just a really similar thing to a json schema, it contains the fields definitions of the resource and its limitations (like min and max value for numeric fields). For instance, this could be the resource definition for a Person:
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
Resource instance: this is just an instance of the specified resource. For instance, for the Person resource definition, we can have the following instances:
[
{
"firstName": "Elena",
"lastName": "Gomez",
},
{
"firstName": "Elena2",
"lastName": "Gomez2",
},
]
First opinion
So, it seems this kind of presents some conflicts with the Restful API approach. In particular, I think it has some problems with the Uniform Interface. When you get a resource, you should be able to handle the resource without any additional information. With this design, you need to make an additional request to first get the resource definition. Let's see this with an example:
Suppose you are our web client. And you are logged in as an user with the Person resource. To show a person in the UI, you first need to know the structure of the Person resource, that is, you to do the following request: GET /resource_definitions/person. And then, you need to request the person object: GET /resource/person/123.
Second opinion
Others seem to think that this is not a problem and that the design is still RESTful. Every time you ask for something to an API, you need to know the format previously, is not self-documented in the API, so it makes sense for this endpoint to behave the same as the others.
Question
So what do you think? Is the proposed solution compliance with the RESTful approach to API design?
The simple solution is to add a link:
{
"_links": {
"describedby": {
"href": "https://example.com/person.schema.json",
"type": "application/schema+json"
}
},
"firstName": "Elena",
"lastName": "Gomez"
}
You could also put this in a header. This is semantically equivalent:
Link: <https://example.com/person.schema.json>; rel="describedby" type="application/schema+json"
It does not violate the uniform interface if there is no standard for this kind of stuff, but there is. RDF e.g. JSON-LD and schema.org vocab can handle most of these types. Even for REST there is an RDF vocab called Hydra, though the community is not that active nowadays.
As of the actual problem, I would look around, maybe RDF technologies or graph technologies are better for it, though I am not sure how much connection there is in your graph. If it is just a few types and instances, then I would probably stick to REST.
Ohh I see meanwhile, you used an actual JSON schema. Then that part is certainly uniform interface compatible. As of the instances you need to add something like type: "https://example.com/person.schema.json" and you are ok. Maybe a vendor specific JSON derived MIME type which describes what "type" means in this context if you want to be super precise or just use JSON-LD instead. https://www.w3.org/2019/wot/json-schema Or an alternative more common solution is using RDFS and XSD with JSON-LD instead of JSON Schema.

JSON API relationship with meta information

I have an entity contract with relationship contract_contacts that should be presented in JSON API format.
To be more clear here's the structure of my entities:
Contract
id
name
ContractContact
contract_id
contact_id
type
comment
Contact
id
name
Possible JSON API output will look like:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"type": "contract_contacts",
"id": "1"
},
{
"type": "contract_contacts",
"id": "2"
}
]
}
}
}
}
This approach is not good enough - you have to create additional resource for relation where you will store your contact and comment with type. You have to include with 2 levels deep to get you contact fields. Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship
with id from above
The second approach is seems hacky to me because it will use meta and it's up to you how to use it. Example:
{
"data": {
"type": "contracts",
"id": "1",
"attributes": {
"name": "Contract 1"
},
"relationships": {
"contacts": {
"data": [
{
"meta": {
"comment": "comment 1",
"type": 1
},
"type": "contacts",
"id": "10"
},
{
"meta": {
"comment": "comment 2",
"type": 2
},
"type": "contacts",
"id": "11"
}
]
}
}
}
}
This approach will simplify the mess with api requests that was in previous example.
But is that correct to POST/PUT/PATCH with meta fields as they are not supposed to be changed from client (or supposed to be)? I'm confused with this part.
The relationship that you are describing is often referred to as a has-many-through relationship: A contract has a many contacts through a contract_contacts. These is defined as a relationship that links two resources through an intermediate resource.
JSON:API specification does not provide first-level support for these kind of relationship. You should instead model them through separate resources as described by you as your first option. This allows you to create, modify and delete your intermediate resource in the same way as any other resource. Doing so reduces the complexity as the intermediate resource is just another resource type as any other.
You mentioned two problems with doing so:
You have to include with 2 levels deep to get you contact fields.
This is true but shouldn't be an issue. include query parameter allows your client to sideload resources any many level deep as it needs. The response document might be a little bit bigger than it would be if the information of the intermediate resource is stored on the relationship itself but that shouldn't be relevant in production after gzip.
Also in this case to create contract frontend should work with both resources:
Create contract contact and get id
Then Create contract with relationship with id from above
This is true and a serious limitation of the current stable version of JSON:API specification (v1.0). It's not directly related to has-many-through relationships so. It's a general limitation of the specification, which does not support creating, modifying and/or deleting more than one resource with one request.
An official Atomic Operations extension is proposed for v1.1 of the specification to address that limitation. It's very likely that these one or a similar proposal will be included in the upcoming version.
It might be tempting to store the information of the intermediate model as meta data on the relationship. But doing so will introduce serious limitations which for I would strongly recommend to not take that path:
The JSON:API specification does not cover changing meta data. You would need to introduce your own specification to create or update these meta data.
Client-side libraries for the JSON:API specification do not expect such information to be available as meta data of the relationship. It's very likely that the consumers will have a hard time processing the information.
Storing information of the intermediate resource would lock you into using resource linkage to express relationship information in a resource document. You would not be able to use related resource links. These may introduce serious performance issues as resource linkage requires to always lookup the IDs of related resources in the database, which is not required if using related resource links.

SQL Database for Magic Cardgame

For school I am creating a deckbuilder website based on Magic the gathering. It's the project that decides if I get my degree or not. Trough the website from Deckbrew I have been able to get data like the following:
[
{
"name": "About Face",
"id": "about-face",
"url": "https://api.deckbrew.com/mtg/cards/about-face",
"store_url": "http://store.tcgplayer.com/magic/urzas-legacy/about-face",
"types": [
"instant"
],
"colors": [
"red"
],
"cmc": 1,
"cost": "{R}",
"text": "Switch target creature's power and toughness until end of turn.",
"formats": {
"commander": "legal",
"legacy": "legal",
"vintage": "legal"
},
"editions": [
{
"set": "Urza's Legacy",
"rarity": "common",
"artist": "Melissa A. Benson",
"multiverse_id": 12414,
"flavor": "The overconfident are the most vulnerable.",
"number": "73",
"layout": "normal",
"price": {
"low": 0,
"average": 0,
"high": 0
},
"url": "https://api.deckbrew.com/mtg/cards?multiverseid=12414",
"image_url": "http://mtgimage.com/multiverseid/12414.jpg",
"set_url": "https://api.deckbrew.com/mtg/sets/ULG",
"store_url": "http://store.tcgplayer.com/magic/urzas-legacy/about-face"
}
]
}
]
It's obvious that it's in jSon format. I have found the way to turn this into objects and the structure of the project is 4-layer MVC with entity framework and C#, which is working (kinda)...The problem is the database. I have been working on it for 2 months now and I am not getting any further. The thing I get stuck on is the database. I have not seen much on how to create databases and that's where it goes wrong. I don't get how to build the database. The creation itself would work if I figured out how to include certain things...
1) Formats: if the card is legal in a format, Formats is filled with: "legacy": "legal", "commander":"legal", ... so only the legal formats are included.
2) Types and colors are just plain arrays of words, but since I'm very bad with databases I don't even know how to figure this one out.
3) Editions is something completely different. It's an array of the object Edition which I believe has to have a table of its own. The problem here is that I thought I needed to use a foreign key but since it's an array of Editions I don't really know how to start doing that either.
4) and then there's Price: It always has 3 values: low, average and high which can be 0 if there's no price known.
So here you have it. To me this database is very complex or maybe I am making it too complex. Is there anybody who can help me to get this database organized so I can get on with my project, because I'm so lost at the moment that I feel I am not going to get this ready by the end of next month and that would be awful.
1: No, you should include all.
2: Table with colors, standard m:n binding table in between mapping the card table with the color table. Not knowing how to make a m:n relationship thing makes me thing you skipped all classes... this is fundamental and basic.
3: Seems like "cardedition" is the main table actually, and everything before is a master type table. Not sure- I don't really do magic at all, so I lack what is called domain knowledge. Are cards changed so multiple editions exist? Why is that an array in json?
3: magic values, 0,1,2,3. What is the question?
To me this database is very complex
I suggest you start from scratch (making things easier) and just have maybe 10 or so tables. Go step by step. Follow what you learned, go to 3rd of 4th normal form and go relational.

Without JOINs, what is the right way to handle data in document databases?

I understand that JOINs are either not possible or frowned upon in document databases. I'm coming from a relational database background and trying to understand how to handle such scenarios.
Let's say I have an Employees collection where I store all employee related information. The following is a typical employee document:
{
"id": 1234,
"firstName": "John",
"lastName": "Smith",
"gender": "Male",
"dateOfBirth": "3/21/1967",
"emailAddresses":[
{ "email": "johnsmith#mydomain.com", "isPrimary": "true" },
{ "email": "jsmith#someotherdomain.com", "isPrimary": "false" }
]
}
Let's also say, I have a separate Projects collection where I store project data that looks something like that:
{
"id": 444,
"projectName": "My Construction Project",
"projectType": "Construction",
"projectTeam":[
{ "_id": 2345, "position": "Engineer" },
{ "_id": 1234, "position": "Project Manager" }
]
}
If I want to return a list of all my projects along with project teams, how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc?
Is it two separate queries? One for projects and the other for people whose ID's appear in the projects collection?
If so, how do I then insert the data about people i.e. full names, email addresses? Do I then do a foreach loop in my app to update the data?
If I'm relying on my application to handle populating all the pertinent data, is this not a performance hit that would offset the performance benefits of document databases such as MongoDB?
Thanks for your help.
"...how do I handle making sure that I return all the pertinent information about individuals in the team i.e. full names, email addresses, etc? Is it two separate queries?"
It is either 2 separate queries OR you denormalize into the Project document. In our applications we do the 2nd query and keep the data as normalized as possible in the documents.
It is actually NOT common to see the "_id" key anywhere but on the top-level document. Further, for collections that you are going to have millions of documents in, you save storage by keeping the keys "terse". Consider "name" rather than "projectName", "type" rather than "projectType", "pos" rather than "position". It seems trivial but it adds up. You'll also want to put an index on "team.empId" so the query "how many projects has Joe Average worked on" runs well.
{
"_id": 444,
"name": "My Construction Project",
"type": "Construction",
"team":[
{ "empId": 2345, "pos": "Engineer" },
{ "empId": 1234, "pos": "Project Manager" }
]
}
Another thing to get used to is that you don't have to write the whole document every time you want to update an individual field or, say, add a new member to the team. You can do targeted updates that uniquely identify the document but only update an individual field or array element.
db.projects.update(
{ _id : 444 },
{ $addToSet : "team" : { "empId": 666, "position": "Minion" } }
);
The 2 queries to get one thing done hurts at first, but you'll get past it.
Mongo DB is a document storage database.
It supports High Availability, and Scalability.
For returning a list of all your projects along with project team(details),
according to my understanding, you will have to run 2 queries.
Since mongoDb do not have FK constraints, we need to maintain it at the program level.
Instead of FK constraints,
1) if the data is less, then we can embed the data as a sub document.
2) rather than normalized way of designing the db, in MongoDb we need to design according to the access pattern. i.e. the way we need to query the data more likely. (However time for update is more(slow), but at the user end the performance mainly depends on read activity, which will be better than RDBMS)
The following link provides a certificate course on mongo Db, free of cost.
Mongo DB University
They also have a forum, which is pretty good.

Find relation between two entities in Freebase

I am new in Freebase and I have a simple question . I would like to use Freebase KB to find relation between two entities. For example if I have name entities "Washington" and "United States" , I would like to send a query to Freebase and get :
Location/Location/Capital or Null in the case of No relation.
Thank you very much.
If you only want to go one ply out (ie nearest neighbors), this is pretty simple to do using the reflection API if you're using the online version of Freebase. If you're using the bulk downloads, you'll need to work with whatever query engine you're using (probably SPARQL unless you converted the RDF to something else).
If you want to find the shortest path(es) regardless of who far apart they are, it becomes a graph search algorithm.
EDIT: If you only want to find capitols, you can fill in your IDs in this query:
[{
"type": "/location/administrative_division_capital_relationship",
"capital": [{
"id": null
}],
"administrative_division": [{
"id": null
}],
"limit": 1
}]
Note that for Washington, D.C., this will return null because the data isn't in Freebase.
If you need to handle arbitrary properties, you'll need to use reflection. See https://developers.google.com/freebase/mql/ch03#reflection