ElasticSearch mapping for nested enumerable objects (i18n) - lucene

I'm at a loss as to how to map a document for search with the following structure:
{
"_id": "007ff234cb2248",
"ids": {
"source1": "123",
"source2": "456",
"source3": "789"
},
"names": [
{"en":"Example"},
{"fr":"exemple"},
{"es":"ejemplo"},
{"de":"Beispiel"}
],
"children" : [
{
"ids": {
"source1": "CXXIII",
"source2": "CDLVI",
"source3": "DCCLXXXIX"
},
"names": [
{"en":"Example Child"},
{"fr":"exemple enfant"},
{"es":"Ejemplo niño"},
{"de":"Beispiel Kindes"}
]
}
],
"relatives": {
// Typically no "ids" at this level.
"relation": "uncle",
"children": [
{
"ids": {
"source1": "0x7B",
"source2": "0x1C8",
"source3": "0x315"
},
"names": [
{"en":"Example Cousin"},
{"fr":"exemple cousine"},
{"es":"Ejemplo primo"},
{"de":"Beispiel Cousin"}
]
}
]
}
}
The child object may appear in the children section directly, or further nested in my document as uncle.children (cousins, in this case). The ids field is common to level one (the root), level two (the children and the uncle), and level three (the cousins). The naming structure is also common to levels one and three.
My use case is to search for IDs (nested objects) by prefix and by the whole ID, and also to search for child names, following an (as yet undefined) set of analyzer rules.
I haven't been able to find a way to map these in any useful way. I don't believe I'll have much success using the same technique for ids and names, as there's an extra level of mapping between names and the document root.
I'm not even certain that it is mappable. I believe, at least in principle, that the ids should be mappable as terms, and perhaps the names could be indexed as terms in some way, too.
I'm simply at a loss, and the documentation doesn't seem to cover anything like this level of complex mapping.
I have limited (read: no) control of the document as it's coming from the CouchDB river, and the upstream application already relies on this format, so I can't really change it.
I'd like to be able to search by the following pseudo conditions, all of which should match:
ID: "123"
ID by source (I don't know how best to mark this up in pseudo language)
ID prefix: "CDL"
Name: "Example", "Example Child"
Localized name (I don't even know how best to pseudo-mark this up!)
The specifics of tokenising and analysis I can figure out for myself, once I at least know how to map:
Objects when both the key and the value of the object properties are important
Enumerable objects when the key and value are important.

If the mapping from an ID to its children is 1-to-many, then you could store the children's names in a child field, as a field can have multiple values. Each document would then have an ID field, possibly a relation field, and zero or more child fields.
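As a rough illustration of the flattening suggested above, here is a minimal Python sketch. It assumes the document can be transformed somewhere before indexing, which may not be possible when it arrives straight from the CouchDB river. The function names and the field names (id, id_<source>, name, name_<lang>, child_id, child_name) are hypothetical, not anything ElasticSearch prescribes.

```python
from typing import Any, Dict, List


def flatten_entity(node: Dict[str, Any]) -> Dict[str, List[str]]:
    """Turn one node's "ids" object and "names" list into multi-valued fields."""
    flat: Dict[str, List[str]] = {"id": [], "name": []}
    for source, value in node.get("ids", {}).items():
        flat["id"].append(value)  # searchable regardless of source
        flat.setdefault(f"id_{source}", []).append(value)  # searchable by source
    for entry in node.get("names", []):
        for lang, text in entry.items():
            flat["name"].append(text)  # searchable regardless of language
            flat.setdefault(f"name_{lang}", []).append(text)  # localized name
    return flat


def flatten_document(doc: Dict[str, Any]) -> Dict[str, List[str]]:
    """Flatten the root plus direct children and relatives.children (cousins)."""
    flat = flatten_entity(doc)
    flat["child_id"] = []
    flat["child_name"] = []
    children = list(doc.get("children", []))
    children += doc.get("relatives", {}).get("children", [])
    for child in children:
        flat_child = flatten_entity(child)
        flat["child_id"] += flat_child["id"]
        flat["child_name"] += flat_child["name"]
    return flat


# Trimmed-down version of the document from the question
sample = {
    "ids": {"source1": "123", "source2": "456"},
    "names": [{"en": "Example"}, {"fr": "exemple"}],
    "children": [
        {"ids": {"source2": "CDLVI"}, "names": [{"en": "Example Child"}]}
    ],
    "relatives": {
        "relation": "uncle",
        "children": [
            {"ids": {"source1": "0x7B"}, "names": [{"en": "Example Cousin"}]}
        ],
    },
}
flat = flatten_document(sample)
print(flat["child_id"])  # ['CDLVI', '0x7B']
print(flat["name_en"])   # ['Example']
```

Each resulting field is a flat list of strings, so whole-ID, prefix, and name searches can each target a single field.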

Related

How to sort Redis list of objects using the properties of object

I have JSON data (see the example below) which I'm storing in a Redis list using 'rpush' with 'person' as the key.
Ex data:
[
{ "name": "john", "age": 30, "role": "developer" },
{ "name": "smith", "age": 45, "role": "manager" },
{ "name": "ram", "age": 35, "role": "tester" }
]
Now when I get this data using lrange person 0 -1, it gives me results as '[object Object]'.
So, to actually get them with property names, I stringify them before storing and parse them back into objects to use the object properties.
But the issue with converting to strings is that I'm not able to sort them by any property, say name, age or role.
My question is: how do I store this JSON in a Redis list and sort the items by any of the properties?
Thanks.
Very recently I posted an answer for a very similar question.
The easiest approach is to use the Redis Search module (which makes the approach portable to many clients / languages):
Store each needed object as separate key, following a prefixed key pattern (keys named prefix:something), and standard schema (all user keys are JSON, and all contain the field you want to sort).
Make a search index with FT.CREATE, with the ON JSON parameter to index JSON-type keys, and likely the PREFIX parameter to index just the needed keys, as well as an x AS y TYPE entry for each needed search field, where x is the JSON path of the field, y is the alias used in queries, and TYPE is the type of field (TEXT, TAG, NUMERIC, etc. -- see documentation), optionally adding SORTABLE to the fields that need to be sorted.
Use the FT.SEARCH command with any combination of "@field:value" search parameters, and optionally SORTBY.
Otherwise, it's possible to just get all keys that follow a pattern using KEYS command, and use manual language-specific sorting code. That is of course more involved, depends on language and available libraries, and is therefore less portable.
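The manual fallback could look roughly like the following in Python. This is only a sketch: the list of strings stands in for whatever the client returns from lrange person 0 -1, and the function name sort_people is made up for illustration.

```python
import json
from typing import Any, Dict, List


def sort_people(raw_items: List[str], field: str, reverse: bool = False) -> List[Dict[str, Any]]:
    """Parse the stringified objects and sort them by the given property."""
    people = [json.loads(item) for item in raw_items]
    return sorted(people, key=lambda person: person[field], reverse=reverse)


# Stand-in for the strings stored with rpush and read back with lrange
raw = [
    json.dumps({"name": "john", "age": 30, "role": "developer"}),
    json.dumps({"name": "smith", "age": 45, "role": "manager"}),
    json.dumps({"name": "ram", "age": 35, "role": "tester"}),
]

by_age = sort_people(raw, "age")
by_role = sort_people(raw, "role")
print([p["name"] for p in by_age])   # ['john', 'ram', 'smith']
print([p["name"] for p in by_role])  # ['john', 'smith', 'ram']
```

The sorting happens entirely on the client, so every read fetches and parses the whole list, which is the main cost compared with a search index.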

How do I require only one of a set of properties are specified with json schema

I have three properties: one, two, three.
If one of those properties is specified the other two must not be included. So this is a mutual exclusion rule.
I tried to write this rule in a concise way, but this appears not to work:
"oneOf": [
{
"required": ["one"],
"not": {"required": ["two", "three"]}
},
{
"required": ["two"],
"not": {"required": ["one", "three"]}
},
{
"required": ["three"],
"not": {"required": ["one", "two"]}
}
]
That will only throw an error if all three are specified together instead of just more than one. I almost want something like an enum, but for properties: to be able to say only one of this list of properties can be specified.
EDIT
Per user comments I removed the nots and that worked, but I'm really disappointed in the error message:
- (root): Must validate one and only one schema (oneOf)
- myObj.0: Must validate one and only one schema (oneOf)
Super not helpful. It doesn't say anything about which properties are failing validation. Is there a way to describe this in a way where users will get an error that looks more like:
- myObj.0: Must include one and only one of properties one, two, or three
Otherwise it kind of leaves you in the dark and forces you to review the actual schema instead of making it more obvious.
You can use maxProperties: 1 along with additionalProperties: false.
additionalProperties: false prevents the inclusion of properties you don't define, which means they have to use yours, but then maxProperties: 1 will require that they can only use one of them.
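To make the effect of the two keywords concrete, here is a small plain-Python sketch that mimics them by hand and returns the kind of message the question asks for. It does not use a JSON Schema library; the function name and the message wording are hypothetical. Note that maxProperties: 1 on its own still accepts an empty object, so minProperties: 1 would also be needed if exactly one property is required.

```python
from typing import Any, Dict, Optional, Sequence


def exclusive_property_error(
    obj: Dict[str, Any],
    allowed: Sequence[str] = ("one", "two", "three"),
) -> Optional[str]:
    """Return an error message, or None if obj passes both rules."""
    # Mimics additionalProperties: false (only the declared names are allowed)
    unknown = sorted(set(obj) - set(allowed))
    if unknown:
        return "Unknown properties: " + ", ".join(unknown)
    # Mimics maxProperties: 1 (at most one of the allowed names may be present)
    if len(obj) > 1:
        return "Must include at most one of properties " + ", ".join(allowed)
    return None


print(exclusive_property_error({"one": 1}))            # None
print(exclusive_property_error({"one": 1, "two": 2}))  # Must include at most one of properties one, two, three
```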

Can I compose two JSON Schemas in a third one?

I want to describe the JSON my API will return using JSON Schema, referencing the schemas in my OpenAPI configuration file.
I will need to have a different schema for each API method. Let’s say I support GET /people and GET /people/{id}. I know how to define the schema of a "person" once and reference it in both /people and /people/{id} using $ref.
[EDIT: See a (hopefully) clearer example at the end of the post]
What I don’t get is how to define and reuse the structure of my response, that is:
{
"success": true,
"result" : [results]
}
or
{
"success": false,
"message": [string]
}
Using anyOf (both for the success/error format check, and for the results, referencing various schemas (people-multi.json, people-single.json)), I can define a "root schema" api-response.json, and I can check the general validity of the JSON response, but it doesn’t allow me to check that the /people call returns an array of people and not a single person, for instance.
How can I define an api-method-people.json that would include the general structure of the response (from an external schema of course, to keep it DRY) and inject another schema in result?
EDIT: A more concrete example (hopefully presented in a clearer way)
I have two JSON schemas describing the response format of my two API methods: method-1.json and method-2.json.
I could define them like this (not a schema here, I’m too lazy):
method-1.json :
{
success: (boolean),
result: { id: (integer), name: (string) }
}
method-2.json :
{
success: (boolean),
result: [ (integer), (integer), ... ]
}
But I don’t want to repeat the structure (first level of the JSON), so I want to extract it in a response-base.json that would be somehow (?) referenced in both method-1.json and method-2.json, instead of defining the success and result properties for every method.
In short, I guess I want some kind of composition or inheritance, as opposed to inclusion (permitted by $ref).
So JSON Schema doesn’t allow this kind of composition, at least in a simple way or before draft 2019-09 (thanks @Relequestual!).
However, I managed to make it work in my case. I first separated the two main cases ("result" vs. "error") in two base schemas api-result.json and api-error.json. (If I want to return an error, I just point to the api-error.json schema.)
In the case of a proper API result, I define a schema for a given operation using allOf and $ref to extend the base result schema, and then redefine the result property:
{
"$schema": "…",
"$id": "…/api-result-get-people.json",
"allOf": [{ "$ref": "api-result.json" }],
"properties": {
"result": {
…
}
}
}
(Edit: I was previously using just $ref at the top level, but it doesn’t seem to work)
This way I can point to this api-result-get-people.json, and check the general structure (success key with a value of true, and a required result key) as well as the specific form of the result for this "get people" API method.
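If there are many API methods, the per-method schemas could be generated instead of written by hand. The following Python sketch builds the same allOf plus properties.result shape as above; the helper name make_result_schema, the example.com $id, and person.json are assumptions for illustration only.

```python
import json
from typing import Any, Dict


def make_result_schema(
    schema_id: str,
    result_schema: Dict[str, Any],
    base_ref: str = "api-result.json",
) -> Dict[str, Any]:
    """Extend the base result schema and constrain its "result" property."""
    return {
        "$id": schema_id,
        # allOf pulls in the shared structure (success flag, required result)
        "allOf": [{"$ref": base_ref}],
        # The method-specific part: what "result" must look like
        "properties": {"result": result_schema},
    }


# Hypothetical schema for GET /people: result is an array of person objects
people = make_result_schema(
    "https://example.com/schemas/api-result-get-people.json",
    {"type": "array", "items": {"$ref": "person.json"}},
)
print(json.dumps(people, indent=2))
```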

Product attributes db structure for e-commerce

Backstory:
I'm building an e-commerce web app (online store)
Now I got to the point of choosing a database system and an appropriate design.
I got stuck with developing a design for product attributes
I've been considering NoSQL (MongoDB) and SQL database systems
I need your advice and help
The problem:
When you choose a product type (e.g. table) it should show you the corresponding filters for such a type (e.g. height, material etc.). When you choose another type, say "car", it provides you with the car specific filter attributes (e.g. fuel, engine volume)
For example, on one popular online store, if you choose a data storage type you get a filter for that type's attributes, such as hard drive size or connection type
Question
What approach is the best for such a problem? I described some below, but maybe you have your own thoughts in regard to it
MongoDB
Possible solution:
You can implement such a product attrs structure pretty easily.
You can create one collection with a field attrs for each product and put there whatever you want, like they suggest here (field "details"):
https://docs.mongodb.com/ecosystem/use-cases/product-catalog/#non-relational-data-model
The structure will be
Problem:
With such a solution you don't have product types at all, so you can't filter the products by their types. Each product contains its own arbitrary structure in the attrs field and doesn't follow any pattern
Or maybe I can somehow go with this approach?
SQL
There are solutions like a single table, where all the products are stored in one table and you end up with as many columns as there are distinct attributes across all the products.
Or you create a new table for every product type.
But I won't consider these. The first is very bulky, and the second isn't very flexible and requires a dynamic schema design
Possible solution
There is one pretty flexible solution called EAV https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
Our schema would be:
EAV
Such a design could also be done in MongoDB, but I'm not sure it was made for such a normalised structure
Problem
The schema is going to get really huge and really hard to query and grasp
If you choose an SQL database, take a look at PostgreSQL, which supports JSON features. You don't necessarily need to follow database normalization.
If you choose MongoDB, you need to store attrs array with generic {key:"field", value:"value"} pairs.
{id:1, attrs:[{key: "prime", value: true}, {key:"height", value:2}, {key:"material", value:"wood"},{key:"color", "value":"brown"}]}
{id:2, attrs:[{key: "prime", value: true}, {key:"fuel", value:"gas"}, {key:"volume", "value":3}]}
{id:3, attrs:[{key: "prime", value: true}, {key:"fuel", value:"diesel"}, {key:"volume", "value":1.5}]}
Then you define Multi-key index like this:
db.collection.createIndex({"attrs.key":1, "attrs.value":1})
If you want to apply step-by-step filters, use a MongoDB aggregation with the $elemMatch operator
☑ Prime
☑ Fuel
☐ Other
...
☑ Volume 3
☐ Volume 1.5
Query representation
db.collection.aggregate([
{
$match: {
$and: [
{
attrs: {
$elemMatch: {
key: "prime",
value: true
}
}
},
{
attrs: {
$elemMatch: {
key: "fuel"
}
}
},
{
attrs: {
$elemMatch: {
key: "volume",
"value": 3
}
}
}
]
}
}
])
MongoPlayground
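Since the ticked filters change from request to request, the pipeline would normally be built in application code. Here is a minimal Python sketch of that idea; the helper build_attr_pipeline and the ANY_VALUE sentinel are hypothetical names, and the output is just the same $match / $elemMatch structure as the query above.

```python
from typing import Any, Dict, List

# Sentinel meaning "the attribute must exist, whatever its value"
ANY_VALUE = object()


def build_attr_pipeline(selected: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build one $elemMatch clause per ticked filter, combined with $and."""
    clauses = []
    for key, value in selected.items():
        match: Dict[str, Any] = {"key": key}
        if value is not ANY_VALUE:
            match["value"] = value
        clauses.append({"attrs": {"$elemMatch": match}})
    # $and needs a non-empty array, so no filters means no $match stage
    return [{"$match": {"$and": clauses}}] if clauses else []


# Prime ticked, Fuel ticked (any value), Volume 3 ticked
pipeline = build_attr_pipeline({"prime": True, "fuel": ANY_VALUE, "volume": 3})
print(pipeline)
```

With a driver such as PyMongo, the resulting list could be passed to the collection's aggregate method.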

REST API Design: Subcollection inheriting an editable variation of a resource

Bear with me, as I'm new to API design and Stack Overflow.
For my API design I have three main components:
key/value pairs (a singleton resource)
group (a collection of KVPs)
subgroup (a variation of the corresponding group)
In my API the user should be able to:
create, retrieve, update, and delete KVPs (basic CRUD actions)
organize KVPs into groups and access a listing of those groups
create subgroup variations of the groups that automatically inherit the parent groups' KVPs, and in those subgroups they can edit each KVP's data independently
For example, my Create a New Group request in my documentation looks like this:
{
"group": "age",
"kvps" : [
{
"key": "height",
"value": 5.2
}, {
"key": "weight",
"value": 150
}
],
"subgroups": [
"adult",
"teen",
"baby"
]
}
Using this as an example, I would like the subgroups "adult", "teen", and "baby" to have their own versions of the KVPs "height" and "weight" each with a different value (e.g. adult with a height of 6 and weight of 200, teen with a height of 5.2 and weight 140, etc.) that can be edited independently from each other.
My question: How should I structure my api so that a user could either:
have just editable kvps
have any number of editable kvps organized into any number of groups
or
have any number of editable kvps organized into groups and then further into subgroups
WITHOUT duplicating any of the data?
So far I have a uri structure like this:
/settings/kvps/{kvp_id} (CRUD on one single KVP)
/settings/groups/{group_id} (CRUD on one single group)
/settings/groups/{group_id}/kvps/{kvp_id}
/settings/groups/{group_id}/subgroups/{subgroup_id}
/settings/groups/{group_id}/subgroups/{subgroup_id}/kvps/{kvp_id}
So the kvps have 3 possible URIs, which seems messy to me, and from what I know editing the value of a kvp at
/settings/groups/1/subgroups/1/kvps/1
would also change its value at
/settings/groups/1/subgroups/2/kvps/1
/settings/groups/1/subgroups/3/kvps/1
/settings/groups/1/kvps/1
/settings/kvps/1
and so on, which is the opposite of what I want.
Any ideas?
Once again, I apologize if this is a very simple question that I am way off on. This is my first time designing an API.