How to sort a Redis list of objects using the properties of the object

I have JSON data (see the example below) which I'm storing in a Redis list using 'rpush' under the key 'person'.
Example data:
[
{ "name": "john", "age": 30, "role": "developer" },
{ "name": "smith", "age": 45, "role": "manager" },
{ "name": "ram", "age": 35, "role": "tester" },
]
Now when I get this data using lrange person 0 -1, it gives me the results as '[object Object]'.
So, to actually get at the property names, I store the objects stringified and parse them back into objects when I read them.
But the issue with converting to a string is that I'm not able to sort the list by any property, say name, age, or role.
My question is: how do I store this JSON in a Redis list and sort it by any of the properties?
Thanks.

Very recently I posted an answer to a very similar question.
The easiest approach is to use the Redis Search module (which makes the approach portable across many clients/languages):
Store each object as a separate key, following a prefixed key pattern (keys named prefix:something) and a consistent schema (all such keys hold JSON, and all contain the fields you want to sort by).
Create a search index with FT.CREATE, using the ON JSON parameter to index JSON-type keys, a PREFIX parameter to index just the needed keys, and x AS y clauses for every searchable field, where x is the JSON path and y is the field alias, each followed by the field type (TEXT, TAG, NUMERIC, etc. -- see the documentation), optionally adding SORTABLE to fields that need to be sorted on.
Use the FT.SEARCH command with any combination of "@field:value" search parameters, and optionally SORTBY. A sketch of these steps follows below.
Otherwise, it's possible to just fetch all keys that follow a pattern using the KEYS command and sort them with language-specific code. That is of course more involved, depends on the language and available libraries, and is therefore less portable.
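For illustration, here is a minimal sketch of those steps in raw Redis commands (the index name personIdx and the key prefix person: are my own choices, not anything standard):
JSON.SET person:1 $ '{"name": "john", "age": 30, "role": "developer"}'
JSON.SET person:2 $ '{"name": "smith", "age": 45, "role": "manager"}'
JSON.SET person:3 $ '{"name": "ram", "age": 35, "role": "tester"}'
FT.CREATE personIdx ON JSON PREFIX 1 person: SCHEMA $.name AS name TEXT SORTABLE $.age AS age NUMERIC SORTABLE $.role AS role TAG
FT.SEARCH personIdx '*' SORTBY age ASC
FT.SEARCH personIdx '@role:{developer}' SORTBY name ASC
The first search returns all three people ordered by age; the second filters on the role TAG field and sorts the matches by name.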

Related

Best Way to Iterate Through JSON Object Passed to Feature File in Karate

I have a JSON object like {"id1": 123, "id2": 234}, that I pass to a feature file in order to plugin each of the values above for a series of API requests like:
And path `/somePath/${id1}/detail`
And request {"id": "#(id1)", "action": "Reassign"}
My first thought was to use a Scenario Outline, but I couldn't figure out how to set the IDs in the Examples table so they were properly read. Then I looked in the documentation, and it seems like when I pass a JSON object like this, it should run the scenario for each value automatically. The problem is, I'm not sure how to set the variable, as the key changes each time.
Or maybe there is a much better way to do this I'm not seeing?
In order for it to work your keys need to be the same. In this case you actually have a single JSON object with two properties (id1, id2). Instead what it sounds like you want are two objects:
[
{ "id": 123, "path": "/mydir1/" },
{ "id": 456, "path": "/mydir2/" }
]
Also notice that it has to be a list/array of JSON objects that you are passing in, not just one blob like you have. And if each object needs a path, you should put that inside the object itself.
But, again, notice that each object in the list/array has the same structure, just with different data. You then use the common property names in your script, as in the sketch below.
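For example, a minimal sketch (the file name reassign.feature and the baseUrl variable are hypothetical; the path follows the question's example):
* def data = [ { "id": 123 }, { "id": 234 } ]
* def result = call read('reassign.feature') data
Karate will run reassign.feature once per item in the list, and each object's keys become variables in the called feature:
Scenario:
Given url baseUrl
And path '/somePath/' + id + '/detail'
And request { "id": "#(id)", "action": "Reassign" }
When method post
Then status 200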

Is there a way to add a default to a json schema array

I just want to understand whether there is a way to add a default set of values to an array. (I don't think there is.)
Ideally I would like something like the following to work, i.e. the fileTypes element defaults to an array of ["jpg", "png"]:
"fileTypes": {
"description": "The accepted file types.",
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": ["jpg", "png", "pdf"]
},
"default": ["jpg", "png"]
},
Of course, all that being said... the above actually does seem to validate as JSON Schema. However, in VS Code, for example, this default value does not populate when creating documents the way other defaults (such as those for strings) do.
It appears to be valid based on the spec.
9.2. "default"
There are no restrictions placed on the value of this keyword. When multiple occurrences of this keyword are applicable to a single sub-instance, implementations SHOULD remove duplicates.
This keyword can be used to supply a default JSON value associated with a particular schema. It is RECOMMENDED that a default value be valid against the associated schema.
See https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.9.2
It's up to the tooling to take advantage of that keyword in the JSON Schema, and it sounds like VS Code does not.
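For reference, a minimal complete schema around that fragment (a sketch; the surrounding object is my own framing) would be:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "fileTypes": {
      "description": "The accepted file types.",
      "type": "array",
      "minItems": 1,
      "items": { "type": "string", "enum": ["jpg", "png", "pdf"] },
      "default": ["jpg", "png"]
    }
  }
}
Whether the default is actually applied when editing a document is then entirely up to the consuming tool.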

Lucene - Custom Analyzer/Parser for JSON objects?

I have a requirement for a very specific Lucene implementation which stores multiple "Properties" fields holding serialized JSON strings.
Example:
Document:
ID: "99"
Text: "Lorepsum Ipsum"
Properties: "{
"lastModified": "1/2/2015",
"user": "johndoe",
"modifiedChars": 2,
"before": "text a",
"after": "text b",
}"
Properties:"{
"lastModified": "1/2/2013",
"user": "johncotton",
"modifiedChars": 6,
"before": "text aa",
"after": "text bbb",
}"
Properties: "{
"lastModified": "1/3/2015",
"user": "johnmajor",
"modifiedChars": 3,
"before": "text aa",
"after": "text b",
}"
I'm aware that ElasticSearch and Solr have implementations to lookup within JSON objects but I'm using Lucene's core API (3.0.5).
My goal is to use Lucene's API, with some added implementation, to search within the JSON strings. For example:
Building a type of BooleanQuery where at least one "Properties" field MUST match all the values in the query (e.g. the query +user:tom +modifiedChars:3 +before:"text a", etc.).
I have some ideas, but no clue where to begin. What I'm asking for is some high-level ideas for achieving such an implementation. A custom analyzer, maybe, to use with a query parser?
Consider it an open ended question. All suggestions are welcome.
If you will always search for the complete set of values...
Create a "property" field for each set. The value would just be the concatenated set of values ie "1/2/2015:johndoe:2:text a:text b".
Alternatively... create a separate doc for each set. This would allow you to search for different combinations of values without conflating the different sets.
Yes that might mean duplicating the Text field. If it's not big then I wouldn't care too much (especially if you're not using a "stored" field).
Do you need to combine text and property in your queries? ("text:ipsum AND property:xxx")
If not then put the text in yet another doc.
If the idea is to search in order to get the "ID" field, then some combination of the above ought to work. A sketch of the separate-doc approach follows below.
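As a concrete illustration of the "separate doc per set" idea, here is a rough sketch against the Lucene 3.0.x API (field names follow the example above; parsing the JSON into values is assumed to happen elsewhere):
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class PropertiesIndexSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // One Lucene document per "Properties" set, each carrying the parent ID.
        Document doc = new Document();
        doc.add(new Field("id", "99", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("user", "johndoe", Field.Store.NO, Field.Index.NOT_ANALYZED));
        doc.add(new NumericField("modifiedChars", Field.Store.NO, true).setIntValue(2));
        doc.add(new Field("before", "text a", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();

        // All clauses MUST match within a single set, because each set is its own doc.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("user", "johndoe")), BooleanClause.Occur.MUST);
        query.add(NumericRangeQuery.newIntRange("modifiedChars", 2, 2, true, true),
                BooleanClause.Occur.MUST);

        IndexSearcher searcher = new IndexSearcher(dir);
        TopDocs hits = searcher.search(query, 10);
        // Each hit's stored "id" field points back to the original record.
        for (ScoreDoc sd : hits.scoreDocs) {
            System.out.println(searcher.doc(sd.doc).get("id"));
        }
        searcher.close();
    }
}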

not_indexed field is stored in index

I'm trying to optimize my elasticsearch scheme.
I have a field which is a URL - I do not want to be able to query or filter by it, just retrieve it.
My understanding is that a field that is defined as "index":"no" is not indexed, but is still stored in the index.
(see slide 5 in http://www.slideshare.net/nitin_stephens/lucene-basics)
This should match to Lucene UnIndexed, right?
This confuses me. Is there a way to store some fields without them taking up more storage than simply their content, and without encumbering the index for the other fields?
What am I missing?
I'm new to posting on stack exchange but believe I can help a bit!
There are a few considerations here:
Analyzing
As you don't want to do extra work you should set "index": "no". This will mean the field will not be run through any tokenizers and filters.
Furthermore it will not be searchable when directing a query at the specific field: (no hits)
"query": {
"term": {
"url": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
(Here "url" is the field name.)
However the field will still be searchable in the _all field: (might have a hit)
"query": {
"term": {
"_all": "http://www.domain.com/exact/url/that/was/sent/to/elasticsearch"
}
}
_all field
By default every field gets put in the _all field. Set "include_in_all": "false" to stop that. This might not be an issue for you, as you are unlikely to search against the _all field with a URL by mistake.
I was working with a schema where countries were stored as two-letter codes, e.g. "NO" means Norway, and it is possible someone might run a search against the _all field with "NO", so I made sure to set "include_in_all": "false".
Note: Any query where you don't specify a field explicitly will be executed against the _all field.
Storing
By default, elasticsearch will store your entire document (unanalyzed, as you sent it) and this will be returned to you in a hit's _source field. If you turn this off (if your elasticsearch db is getting huge perhaps?) then you need to explicitly set "store": "yes" to store fields individually. (One thing to notice is that store takes yes or no and not true or false - it tripped me up!)
Note: if you do this you will need to request the fields you want returned to you explicitly. e.g.:
curl -XGET http://path/index_name/type_name/id?fields=url,another_field
finally...
I would leave elasticsearch to store your whole document (as the default) and use the following mapping.
"type_name": {
"properties": {
"url": {
"type": "string",
"index": "no",
"include_in_all": "false"
},
// other fields' mappings
}
}
Source: elasticsearch documentation
There are two ways to input data into the index: indexing and storing. Indexing a piece of data means that it is tokenized and placed in the inverted index, where it can be searched. Storing data means it is not tokenized or analyzed at all, and is not added to the inverted index; it is stored in an entirely separate area, in its full text form. It can not be searched against, but can be retrieved, in its original form, by its document ID.
The typical Lucene query process is to query against indexed data, and get the back Document IDs of matching documents, then to use those document IDs to retrieve the stored data for those documents, and display it to the user.
Data which is indexed, but not stored, is searchable, but can not be retrieved in its original form.
Data which is stored, but not indexed can be retrieved once you have found a hit, but is not searchable.
Data which is indexed and stored can be searched or retrieved.
Data which is neither can not be added to the index at all.
This is covered a bit in the Lucene FAQ.
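In Lucene 3.x terms (a sketch; the field names are illustrative), those combinations map directly onto the Field constructor flags:
// Indexed and stored: searchable and retrievable.
new Field("title", "Lorem Ipsum", Field.Store.YES, Field.Index.ANALYZED);
// Indexed but not stored: searchable, but the original text can't be fetched back.
new Field("body", "Lorem Ipsum", Field.Store.NO, Field.Index.ANALYZED);
// Stored but not indexed: returned with a hit, but never matched by a query.
new Field("url", "http://example.com/page", Field.Store.YES, Field.Index.NO);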
You are looking for the 'index' => 'not_analyzed' mapping option.
Also, if you use the _source, you do not have to specify the store => false option.
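For completeness, that option in the same mapping style would be a sketch like the following (though note that not_analyzed still indexes the field as a single token; it only skips analysis, which is different from "index": "no" above):
"url": {
"type": "string",
"index": "not_analyzed"
}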

ElasticSearch mapping for nested enumerable objects (i18n)

I'm at a loss as to how to map a document for search with the following structure:
{
"_id": "007ff234cb2248",
"ids": {
"source1": "123",
"source2": "456",
"source3": "789"
},
"names": [
{"en":"Example"},
{"fr":"exemple"},
{"es":"ejemplo"},
{"de":"Beispiel"}
],
"children" : [
{
"ids": {
"source1": "CXXIII",
"source2": "CDLVI",
"source3": "DCCLXXXIX",
}
names: [
{"en":"Example Child"},
{"fr":"exemple enfant"},
{"es":"Ejemplo niƱo"},
{"de":"Beispiel Kindes"}
]
}
],
"relatives": {
// Typically no "ids" at this level.
"relation": 'uncle',
"children": [
{
"ids": {
"source1": "0x7B",
"source2": "0x1C8",
"source3": "0x315"
},
"names": [
{"en":"Example Cousin"},
{"fr":"exemple cousine"},
{"es":"Ejemplo primo"},
{"de":"Beispiel Cousin"}
]
}
]
}
}
The child object may appear in the children section directly, or further nested in my document as uncle.children (cousins, in this case). The ids field is common to level one (the root), level two (the children and the uncle), and level three (the cousins); the naming structure is also common to levels one and three.
My use-case is to be able to search for IDs (nested objects) by prefix, and by the whole ID. And also to be able to search for child names, following an (as yet undefined) set of analyzer rules.
I haven't been able to find a way to map these in any useful way. I don't believe I'll have much success using the same technique for ids and names, as there's an extra level of mapping between names and the document root.
I'm not even certain that it is mappable. I believe, at least in principle, that the ids should be mappable as terms, and perhaps the names too, if I index them as terms in some way.
I'm simply at a loss, and the documentation doesn't seem to cover anything like this level of complex mapping.
I have limited (read: no) control of the document as it's coming from the CouchDB river, and the upstream application already relies on this format, so I can't really change it.
I'm looking for being able to search by the following pseudo conditions, all of which should match:
ID: "123"
ID by source (I don't know how best to mark this up in pseudo language)
ID prefix: "CDL"
Name: "Example", "Example Child"
Localized name (I don't even know how best to pseudo-mark this up!)
The specifics of tokenising and analysis I can figure out for myself, once I at least know how to map:
Objects where both the key and the value of the object properties are important
Enumerable objects where the key and value are important.
If the mapping from an ID to its children is 1-to-many, then you could store the children's names in a child field, as a field can have multiple values. Each document would then have an ID field, possibly a relation field, and zero or more child fields.
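A minimal sketch of such a flattened mapping (pre-2.x Elasticsearch syntax to match the CouchDB-river era; the field names child and relation are my own suggestion):
{
  "mappings": {
    "doc": {
      "properties": {
        "id": { "type": "string", "index": "not_analyzed" },
        "relation": { "type": "string", "index": "not_analyzed" },
        "child": { "type": "string" }
      }
    }
  }
}
Because Elasticsearch fields accept arrays, each document can then carry one child value per child name, and a prefix query against id would cover the prefix-matching requirement.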