How to approach graph modeling using Cypher - should I use property or node? - cypher

I have books and authors. In SQL, I would have two tables and then I would create relations between them.
How does this work in the graph world? Should books and authors be separate nodes or are authors just additional node properties?
I've come up with the following code, but I'm not sure if it is redundant. I added the author both as a property on the Book node and as a relationship to an Author node.
CREATE
(b1:Book {title: "The Catcher in the Rye", author: "J.D. Salinger"}),
(b2:Book {title: "The Great Gatsby", author: "F. Scott Fitzgerald"}),
(b3:Book {title: "The Old Man and the Sea", author: "Ernest Hemingway"}),
(b4:Book {title: "For Whom The Bell Tolls", author: "Ernest Hemingway"}),
(a1:Author {name: "J.D. Salinger"}),
(a2:Author {name: "F. Scott Fitzgerald"}),
(a3:Author {name: "Ernest Hemingway"}),
(a1)-[:WROTE]->(b1),
(a2)-[:WROTE]->(b2),
(a3)-[:WROTE]->(b3),
(a3)-[:WROTE]->(b4)
Is adding authors to Book nodes redundant?

The answer to this question in some ways is "it depends".
I think in general, you would start by not including the author names as properties, so you would just build a structure along the lines of:
(:Author {name: "x"})-[:WROTE]->(:Book {title: "y"})
I have seen cases where it makes sense to also store the information as a property to avoid an extra traversal, but in general I would start with the structure above and only duplicate the information if a very good reason arises.
Those reasons tend to be specific to particular implementations. In terms of general graph data modeling, I would start with the simple "Author wrote Book" structure. It also has the advantage that there is only one version of the information to maintain (in this case, the edge between the author and the book).
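To illustrate why the relationship alone is enough, here is a minimal in-memory sketch in Python (not Cypher, and not how a graph database stores data). The `wrote` edge list and the helper names `authors_of` and `books_by` are assumptions made for illustration, and the edges assume each book is linked to its actual author. The author of a book is derived by following the edge, so the Book record needs no `author` property.

```python
# Nodes are plain records; the WROTE relationship is the only place
# that links an author to a book.
authors = {"a1": "J.D. Salinger", "a2": "F. Scott Fitzgerald", "a3": "Ernest Hemingway"}
books = {
    "b1": "The Catcher in the Rye",
    "b2": "The Great Gatsby",
    "b3": "The Old Man and the Sea",
    "b4": "For Whom The Bell Tolls",
}
wrote = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a3", "b4")]


def authors_of(title):
    """Follow WROTE edges backwards from a book title to author names."""
    return [authors[a] for a, b in wrote if books[b] == title]


def books_by(name):
    """Follow WROTE edges forwards from an author name to book titles."""
    return [books[b] for a, b in wrote if authors[a] == name]
```

Renaming an author here touches one record, whereas a duplicated `author` property would have to be updated on every book as well.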


Is Keyed Node meant to be used with Lazy?

I am reading about optimization in the Elm Guide. It talks about keyed nodes, using US Presidents as an example:
import Html exposing (..)
import Html.Keyed as Keyed
import Html.Lazy exposing (lazy)
viewPresidents : List President -> Html msg
viewPresidents presidents =
Keyed.node "ul" [] (List.map viewKeyedPresident presidents)
viewKeyedPresident : President -> (String, Html msg)
viewKeyedPresident president =
( president.name, lazy viewPresident president )
viewPresident : President -> Html msg
viewPresident president =
li [] [ ... ]
It then gives this explanation:
Now the Virtual DOM implementation can recognize when the list is resorted. It first matches all the presidents up by key. Then it diffs those. We used lazy for each entry, so we can skip all that work. Nice! It then figures out how to shuffle the DOM nodes to show things in the order you want. So the keyed version does a lot less work in the end.
My confusion is this: If I don't use lazy inside the keyed nodes, the Virtual DOM still has to diff every entry of the list, even if it can match some keys. It seems keyed nodes' usefulness really depends on the lazy inside. Is my understanding correct?
Let's consider an example:
name: Apple, price: $3.2, pic: 🍏
name: Banana, price: $2, pic: 🍌
name: Orange, price: $2.8, pic: 🍊
Now let's imagine that the user sorts by price:
name: Banana, price: $2, pic: 🍌
name: Orange, price: $2.8, pic: 🍊
name: Apple, price: $3.2, pic: 🍏
Without keyed nodes, the diffing is going to look like this:
name: Apple -> Banana, price: $3.2 -> $2, pic: 🍏 -> 🍌
name: Banana -> Orange, price: $2 -> $2.8, pic: 🍌 -> 🍊
name: Orange -> Apple, price: $2.8 -> $3.2, pic: 🍊 -> 🍏
In this example that issues 9 replaceElement operations with 9 createTextElement operations (the exact semantics might work slightly differently, but I think the point stands).
The keyed version will understand that only the order changed and will issue a single removeChild and appendChild for the Apple node.
Hence all the performance savings are on the DOM side. This is not just about performance, either: if those list entries contained input elements and your cursor was in the Apple input, with keyed nodes it would stay in the Apple input, but without keys it would now be in the Banana input.
You are correct that without lazy the diffing still happens, but the diffing is generally the cheap part. The more expensive part is actually patching the DOM, which is what keyed helps prevent.
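The difference in DOM work can be sketched with a small Python model (Python rather than Elm, and not Elm's actual Virtual DOM algorithm). The functions `positional_changes` and `keyed_moves` are hypothetical names: the first counts fields that differ when rows are compared by position, the second counts rows that have to move once rows are matched by key.

```python
from bisect import bisect_left

old = [
    {"name": "Apple", "price": "$3.2", "pic": "🍏"},
    {"name": "Banana", "price": "$2", "pic": "🍌"},
    {"name": "Orange", "price": "$2.8", "pic": "🍊"},
]
# The user sorts by price.
new = sorted(old, key=lambda item: float(item["price"].lstrip("$")))


def positional_changes(before, after):
    """Count fields that differ when rows are compared index by index."""
    return sum(
        1
        for b, a in zip(before, after)
        for field in b
        if b[field] != a[field]
    )


def keyed_moves(before, after, key="name"):
    """Count rows that must move when rows are matched by key first.

    Rows that form the longest increasing run of old positions can stay
    where they are; every other row needs one move.
    """
    old_index = {row[key]: i for i, row in enumerate(before)}
    positions = [old_index[row[key]] for row in after]
    tails = []  # tails[k]: smallest tail of an increasing subsequence of length k + 1
    for p in positions:
        k = bisect_left(tails, p)
        if k == len(tails):
            tails.append(p)
        else:
            tails[k] = p
    return len(positions) - len(tails)
```

For the fruit list, `positional_changes(old, new)` gives 9 and `keyed_moves(old, new)` gives 1, matching the nine replacements versus the single move described above.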

Spacy NER - How to Identify People names using matcher patterns

I'm trying to identify people's names using the following matcher pattern in spaCy, but it is also identifying other words like 'My' and 'Name'. Can anyone help me identify the issue in the pattern?
person_pattern = [
{"label": "PERSON",
"pattern": [{"POS": "PROPN"}, {"ENT_TYPE": "PERSON"}],
"comment": "spaCy's built-in PERSON capture"
}]
Example:
My Name as in Google Record is Hannah, but i would like to modify Name as in AADHAR Hanna. My CDS ID is JANAN34
Result/Behavior:
text: My, pos_: PRON, ent_type_: PERSON
text: Name, pos_: NOUN, ent_type_: PERSON
I ran some sample code using your pattern and it seems that your pattern isn't matching anything, so the problem isn't with the Matcher. The problem seems to be with spaCy's NER models.
Your text is kind of unusual - "My Name as in..." is not normal capitalization, and the model seems to mistake it for an actual name. If you change "Name" to "name" then it's no longer detected as an entity.
I think this is just a case of your data not being similar to spaCy's training data, which is more like newspaper articles that use formal capitalization. The v3 models are somewhat sensitive to case changes at the moment because some data augmentation was accidentally left out when training them, but that should be resolved in the v3.1 release coming up soon.
If you have training data, you might look at training using spaCy's data augmentation to be more resilient to unusual data.

Product attributes db structure for e-commerce

Backstory:
I'm building an e-commerce web app (online store).
I've now reached the point of choosing a database system and an appropriate design.
I got stuck designing the structure for product attributes.
I've been considering either a NoSQL (MongoDB) or an SQL database system.
I need your advice and help.
The problem:
When you choose a product type (e.g. table), the store should show the filters for that type (e.g. height, material, etc.). When you choose another type, say "car", it shows the car-specific filter attributes (e.g. fuel, engine volume).
For example, on one popular online store, if you choose the data storage type you get filters for that type's attributes, such as hard drive size or connection type.
Question
What is the best approach for such a problem? I describe some below, but maybe you have your own thoughts on it.
MongoDB
Possible solution:
You can implement such a product attrs structure pretty easily.
You can create one collection with an attrs field for each product and put whatever you want there, as suggested here (field "details"):
https://docs.mongodb.com/ecosystem/use-cases/product-catalog/#non-relational-data-model
The structure will be
Problem:
With such a solution you don't have product types at all, so you can't filter products by type. Each product has its own arbitrary structure in the attrs field and doesn't follow any pattern.
Or maybe I can somehow go with this approach?
SQL
There are solutions like a single table, where all products are stored in one table and you end up with as many columns as there are attributes across all products taken together.
Or you create a new table for every product type.
But I won't consider these. The first is very bulky, and the second isn't very flexible and requires dynamic schema design.
Possible solution
There is one pretty flexible solution called EAV https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Our schema would be:
[Figure: EAV schema]
Such a design could also be implemented in MongoDB, but I'm not sure MongoDB was made for such a normalised structure.
Problem
The schema is going to get really huge and really hard to query and grasp.
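For reference, one common shape of an EAV schema can be sketched with Python's built-in sqlite3 module. This is not necessarily the schema pictured in the original post; the table names `product`, `attribute`, and `product_attribute_value`, and all sample rows, are assumptions for illustration. The query shows the querying cost mentioned above: each additional filter adds another join against the value table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_attribute_value (
        product_id   INTEGER NOT NULL REFERENCES product(id),
        attribute_id INTEGER NOT NULL REFERENCES attribute(id),
        value        TEXT NOT NULL,
        PRIMARY KEY (product_id, attribute_id)
    );
    INSERT INTO product VALUES (1, 'table'), (2, 'car'), (3, 'car');
    INSERT INTO attribute VALUES (1, 'material'), (2, 'fuel'), (3, 'volume');
    INSERT INTO product_attribute_value VALUES
        (1, 1, 'wood'), (2, 2, 'gas'), (2, 3, '3'), (3, 2, 'diesel'), (3, 3, '1.5');
    """
)

# Cars with fuel = gas AND volume = 3: one join per selected filter.
rows = conn.execute(
    """
    SELECT p.id
    FROM product p
    JOIN product_attribute_value fuel
      ON fuel.product_id = p.id AND fuel.attribute_id = 2 AND fuel.value = 'gas'
    JOIN product_attribute_value vol
      ON vol.product_id = p.id AND vol.attribute_id = 3 AND vol.value = '3'
    WHERE p.type = 'car'
    """
).fetchall()
matching_ids = [row[0] for row in rows]
```

Note that every value is stored as text in this sketch, which is another typical EAV trade-off: numeric comparisons need casting.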
If you choose an SQL database, take a look at PostgreSQL, which supports JSON features. You don't necessarily need to follow database normalization.
If you choose MongoDB, you need to store an attrs array of generic {key:"field", value:"value"} pairs.
{id:1, attrs:[{key: "prime", value: true}, {key:"height", value:2}, {key:"material", value:"wood"},{key:"color", "value":"brown"}]}
{id:2, attrs:[{key: "prime", value: true}, {key:"fuel", value:"gas"}, {key:"volume", "value":3}]}
{id:3, attrs:[{key: "prime", value: true}, {key:"fuel", value:"diesel"}, {key:"volume", "value":1.5}]}
Then define a multikey index like this:
db.collection.createIndex({"attrs.key":1, "attrs.value":1})
If you want to apply step-by-step filters, use MongoDB aggregation with the $elemMatch operator.
☑ Prime
☑ Fuel
☐ Other
...
☑ Volume 3
☐ Volume 1.5
Query representation
db.collection.aggregate([
{
$match: {
$and: [
{
attrs: {
$elemMatch: {
key: "prime",
value: true
}
}
},
{
attrs: {
$elemMatch: {
key: "fuel"
}
}
},
{
attrs: {
$elemMatch: {
key: "volume",
"value": 3
}
}
}
]
}
}
])
MongoPlayground
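The matching logic of that query can be mimicked in plain Python to see which documents survive the filters. This is only a sketch of the semantics, not MongoDB code; `elem_match` and `apply_filters` are hypothetical helper names, and the sample documents are the three shown above.

```python
products = [
    {"id": 1, "attrs": [{"key": "prime", "value": True}, {"key": "height", "value": 2},
                        {"key": "material", "value": "wood"}, {"key": "color", "value": "brown"}]},
    {"id": 2, "attrs": [{"key": "prime", "value": True}, {"key": "fuel", "value": "gas"},
                        {"key": "volume", "value": 3}]},
    {"id": 3, "attrs": [{"key": "prime", "value": True}, {"key": "fuel", "value": "diesel"},
                        {"key": "volume", "value": 1.5}]},
]


def elem_match(attrs, condition):
    """True if at least one array element satisfies every field in `condition`."""
    return any(all(a.get(k) == v for k, v in condition.items()) for a in attrs)


def apply_filters(docs, conditions):
    """Keep ids of documents whose attrs satisfy all conditions ($and of $elemMatch)."""
    return [d["id"] for d in docs if all(elem_match(d["attrs"], c) for c in conditions)]


# Prime checked, Fuel checked (any value), Volume 3 checked.
selected = apply_filters(
    products,
    [{"key": "prime", "value": True}, {"key": "fuel"}, {"key": "volume", "value": 3}],
)
```

The important detail is that key and value are tested on the same array element, which is what distinguishes `$elemMatch` from two independent conditions on `attrs.key` and `attrs.value`.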

Should the response body of GET all parent resource return a list of child resource?

Please bear with me if the title is a bit confusing; I will try my best to explain my question below.
Say I have the following two endpoints
api/companies (returns a list of all companies like below)
[{name: "company1", id: 1}, {name: "company2", id: 2}]
api/companies/{companyId}/employees (returns a list of all employees for a specific company like below)
[{name: "employee1", id: 1}, {name: "employee2", id: 2}]
What the client side needs is a list of companies, each of which has a list of employees. The result should look like this:
[
{
name: "company1",
id: 1,
employees: [ {name: "employee1", id: 1}, {name: "employee2", id: 2} ]
},
{
name: "company2",
id: 2,
employees: [ {name: "employee3", id: 3}, {name: "employee4", id: 4} ]
},
]
There are two ways I can think of to do this:
1. Get the list of companies first, then loop through it and make an API call for each company to get its list of employees. (I'm wondering whether this is the better design under the HATEOAS principle, if I understand it correctly: the smallest unit of resource of api/companies is a company, not an employee, so the client is expected to discover companies as the available resource, but not employees.)
"a REST client should then be able to use server-provided links dynamically to discover all the available actions and resources it needs"
2. Return a list of employees inside each company object in the api/companies response. Maybe add a boolean query parameter called responseHasEmployees, defaulting to false, so that when a user makes a GET to api/companies?responseHasEmployees=true, the response body has a list of employees inside each company object.
So my question is: which is the better way to achieve the client side's goal? (It doesn't necessarily have to be one of the two above.)
Extra info that might be helpful: companies and employees are stored in different tables, and employees table has a company_fk column.
Start by asking yourself a couple of questions:
Is this a common scenario?
Is it logical to request data in this way?
If so, it might make sense to make data available in this way.
Next, do you already have API calls implemented that pass variables?
Based on your HATEOAS principle, you probably shouldn't. Clients shouldn't need to know, or understand, variable values in your URL.
If not, stay away from it. Make it as clean for the client side as possible. You could make a third distinct endpoint, "api/companiesWithEmployees". This fits your HATEOAS principle: the client doesn't need to know anything about parameters or other workings of the API, only that it will get "Companies with Employees".
Also, the cost is minimal: one additional method in the code base. It's simpler for the client side at a low cost.
Next, think about some of the development consequences:
Are you opening the door to more specific API requests?
Are you able to maintain a hard line on the data you want accessible through the API?
Are you able to maintain your HATEOAS principle, in that clients know everything they need to know based on the API URL?
Next, incorporate scenarios like this into future API design:
Can you preemptively make similar API calls available? (E.g. Customers and Orders: would you simply make a single API call available that returns the two related to each other?)
Ultimately, my answer to your question would be to go ahead and make this a new API call. The overhead of setting up, testing, and maintaining this particular change seems extremely small, and the likelihood of data being requested in this way appears high.
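Whichever URL ends up serving it, the server-side work for such a combined response is small. The following Python sketch assumes the two tables have already been read into lists of dicts with a `company_fk` column, as described in the question; the function name `companies_with_employees` is hypothetical.

```python
from collections import defaultdict

companies = [{"id": 1, "name": "company1"}, {"id": 2, "name": "company2"}]
employees = [
    {"id": 1, "name": "employee1", "company_fk": 1},
    {"id": 2, "name": "employee2", "company_fk": 1},
    {"id": 3, "name": "employee3", "company_fk": 2},
    {"id": 4, "name": "employee4", "company_fk": 2},
]


def companies_with_employees(company_rows, employee_rows):
    """Build the nested response with one pass over each table."""
    by_company = defaultdict(list)
    for e in employee_rows:
        by_company[e["company_fk"]].append({"name": e["name"], "id": e["id"]})
    return [
        {"name": c["name"], "id": c["id"], "employees": by_company[c["id"]]}
        for c in company_rows
    ]


response = companies_with_employees(companies, employees)
```

Grouping in memory like this needs two queries in total, instead of one extra query (or one extra HTTP round trip) per company.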
I assume that the client you build is going to have an interface to view a list of companies where there will be an option to view employees of the company. So it is best to do it by pull on demand and not load the whole data at once.
If you can consider a property of your resource as a sub-resource, do not add the whole sub-resource data into the main resource API. You may include a referral link which can be used by the client to fetch the sub-resource data.
Here, in your case,
Main-Resource - Companies
Sub-Resource - Employees
Company name, contact number, address: these are properties of the company object and not sub-resources of a company, whereas employees can very well be considered a sub-resource.
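A company representation that carries a link instead of the embedded employee data might look like the Python sketch below. The `links` field name and the `company_summary` helper are assumptions for illustration; the path simply mirrors the employees endpoint from the question.

```python
def company_summary(company):
    """Return a company's own properties plus a link to its employees sub-resource."""
    return {
        "name": company["name"],
        "id": company["id"],
        # Hypothetical link field; the client follows it only when it needs employees.
        "links": {"employees": f"api/companies/{company['id']}/employees"},
    }


listing = [
    company_summary(c)
    for c in [{"id": 1, "name": "company1"}, {"id": 2, "name": "company2"}]
]
```

The list response stays small, and the client fetches employees on demand for the one company the user opens.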

RESTful API Design: PUT or POST for creating many-to-many relationships?

While designing a RESTful API, the following question came up:
The API supports GET (for queries), POST (for creating), PUT (for updates) and DELETE (for deleting).
Let's assume that an article and a shop both already exist in the database.
Now we need a REST call to link the article instance to the shop instance. Which of the following solutions is the cleanest REST design?
/shop/id/article/id/ --> with POST
/shop/id/article/id/ --> with PUT
/shoparticlerelation/ --> with POST (object with ids in body)
/shoparticlerelation/ --> with PUT (object with ids in body)
If there is no clear answer, or all solutions are equally good, that is also a valid answer, provided there is a clear argument why.
I presume in this situation you already have a collection of shops and a collection of articles, and you just wish to link two together.
One option is to expose a more database-like 'resource' that represents this link, with operations like
POST /shopArticleLinks HTTP/1.1
{ "shop" : xxx,
"article" : YYY
}
I would personally look to expose it as a property of the shops and/or articles in a more natural manner, like
PUT /shop/<ID> HTTP/1.1
{ /* existing details */
"articles": [ /* list of articles */ ]
}
I've used JSON there, but of course use whatever format you want. I've also stuck with PUT as you stated, but keep in mind that with PUT you should send a full replacement for the modified resource. PATCH can be used to send partial updates, but then you need to consider how you want to do that, maybe something like
PATCH /shops/<ID>/articleLinks HTTP/1.1
{ "add" : [],
"remove" : []
}
Don't forget that on the server side you can look at which articles are being referred to and ensure they have a proper back pointer.
Additional thoughts
Regarding the second method, where you expose the link as a property of the shop and/or article resources: keep in mind that it is perfectly acceptable (and in this case rather appropriate) that when you update the links in a given shop, the links in the corresponding articles are also updated.
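Keeping both sides in step on the server can be sketched in Python as follows. The in-memory `shops` and `articles` stores and the `patch_article_links` handler are hypothetical stand-ins for a real persistence layer; the handler takes the "add" and "remove" lists from the PATCH body shown earlier.

```python
# Hypothetical in-memory stores keyed by id.
shops = {1: {"articles": set()}}
articles = {8: {"shops": set()}, 9: {"shops": set()}}


def patch_article_links(shop_id, add=(), remove=()):
    """Apply an add/remove link update to a shop and mirror it on the articles."""
    for article_id in add:
        shops[shop_id]["articles"].add(article_id)
        articles[article_id]["shops"].add(shop_id)  # back pointer
    for article_id in remove:
        shops[shop_id]["articles"].discard(article_id)
        articles[article_id]["shops"].discard(shop_id)


patch_article_links(1, add=[8, 9])
patch_article_links(1, remove=[9])
```

After the two calls, shop 1 lists article 8 only, article 8 points back at shop 1, and article 9 has no shops. In a relational store the same effect usually comes for free from a single join table.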
/shop/id/article/id/
You cannot use this because at the moment you want to link them, this endpoint doesn't (or at least shouldn't) yet exist. It is the action of linking them together that should define this endpoint.
/shoparticlerelation/
You should not use this because a shoparticlerelation is not a resource / entity. Usually with REST, every named URL segment represents a resource that can be CRUD-ed. /shops is a good example and so is /articles, but this one isn't.
I suggest the following:
Define the following endpoints
/shops for POSTing new shops
/shops/id for operating on a single shop
/articles for POSTing new articles
/articles/id for operating on a single article
Then, to link them together, you can send a PATCH request to update a shop's articles or an article's shops:
PATCH /shops/1 HTTP/1.1
{
"attribute": "articles",
"operation": "add",
"value": "8" // the article id
}
and
PATCH /articles/9 HTTP/1.1
{
"attribute": "shops",
"operation": "add",
"value": "1" // the shop id
}
Based on your comments I made the assumption that an Article model has a list of Shops as attribute, and vice-versa, making this approach valid.
A PATCH request is used to modify an existing resource by specifying how and what to update. This is different from a PUT, because a PUT replaces the entire resource with the values from the request, whereas a PATCH only modifies (does not replace) a resource.
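The difference can be made concrete with a small Python sketch that treats a resource as a dict. The `put` and `patch` functions are hypothetical handlers, the shop's `name` value is made-up sample data, and only the "add" operation from the request format above is modelled.

```python
import copy


def put(resource, body):
    """PUT: the request body becomes the whole new representation."""
    return copy.deepcopy(body)


def patch(resource, change):
    """PATCH: apply one attribute/operation/value change to a copy of the resource."""
    if change["operation"] != "add":
        raise ValueError("only 'add' is sketched here")
    updated = copy.deepcopy(resource)
    updated[change["attribute"]].append(change["value"])
    return updated


shop = {"name": "Corner shop", "articles": ["3"]}
patched = patch(shop, {"attribute": "articles", "operation": "add", "value": "8"})
replaced = put(shop, {"articles": ["8"]})
```

Here `patched` keeps the name and the existing article and gains article "8", while `replaced` contains only what the PUT body sent, so the name is gone.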