Hi I am trying to understand json index vs text index in Cloudant. Now I know using
{ "index": {}, "type": "text" }
Will make the entire document searchable. But what is the difference between say,
{
"index": {
"fields": [
"title"
]
},
"type": "json"
}
and
{
"index": {
"fields": [
{
"name": "title",
"type": "string"
}
]
},
"name": "title-text",
"type": "text"
}
Thanks.
the json type:
leverages the Map phase of MapReduce
will build and query faster than a text type for a fixed key
no bookmark field
cannot use combination or array logical operators such as $regex as the basis of a query
only equality operators such as $eq, $gt, $gte, $lt, and $lte (but not $ne) can be used as the basis of a query
might end up doing more work in memory for complex queries
sorting fields must be indexed
the text type:
leverages a Lucene search index
permits indexing all fields in documents automatically with a single simple command
provides more flexibility to perform adhoc queries and sort across multiple keys
permits you to use any operator as a basis for query in a selector
type (:string, :number) sometimes need to be appended to sort field
from: https://docs.cloudant.com/cloudant_query.html
If you know exactly what data you want to look for, or you want to
keep storage and processing requirements to a minimum, you can specify
how the index is created, by making it of type json.
But for maximum possible flexibility when looking for data, you would
typically create an index of type text.
additional information:
https://developer.ibm.com/clouddataservices/docs/cloudant/get-started/use-cloudant-query/
Related
I have JSON data(see the ex below) which I'm storing in Redis list using 'rpush' with a key as 'person'.
Ex data:
[
{ "name": "john", "age": 30, "role": "developer" },
{ "name": "smith", "age": 45, "role": "manager" },
{ "name": "ram", "age": 35, "role": "tester" },
]
Now when I get this data using lrange person 0 -1, it gives me results as '[object object]'.
So, to actually get them with property names I'm storing them by stringifying them and parsing them back to objects to use the object properties.
But the issue with converting to a string is that I'm not able to sort them using any property, say name, age or role.
My question is, how do I store this JSON in Redis list and sort them using any of the properties.
Thanks.
Very recently I posted an answer for a very similar question.
The easiest approach is to use Redis Search module (which makese the approach portable to many clients / languages):
Store each needed object as separate key, following a prefixed key pattern (keys named prefix:something), and standard schema (all user keys are JSON, and all contain the field you want to sort).
Make a search index with FT.CREATE, with ON JSON parameter to search JSON-type keys, and likely PREFIX parameter to search just the needed keys, as well as x AS y parameters for all needed search fields, where x is field name, and y is type of field (TEXT, TAG, NUMERIC, etc. -- see documentation), optionally adding SORTABLE to the fields if they need to be sorted.
Use FT.SEARCH command with any combination of "#field=value" search parameters, and optionally SORTBY.
Otherwise, it's possible to just get all keys that follow a pattern using KEYS command, and use manual language-specific sorting code. That is of course more involved, depends on language and available libraries, and is therefore less portable.
I just want to understand if there is a way to add a default set of values to an array. (I don't think there is.)
So ideally I would like something like how you might imagine the following working. i.e. the fileTypes element defaults to an array of ["jpg", "png"]
"fileTypes": {
"description": "The accepted file types.",
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": ["jpg", "png", "pdf"]
},
"default": ["jpg", "png"]
},
Of course, all that being said... the above actually does seem to be validate as json schema however for example in VS code this default value does not populate like other defaults (like for strings) populate when creating documents.
It appears to be valid based on the spec.
9.2. "default"
There are no restrictions placed on the value of this keyword. When multiple occurrences of this keyword are applicable to a single sub-instance, implementations SHOULD remove duplicates.
This keyword can be used to supply a default JSON value associated with a particular schema. It is RECOMMENDED that a default value be valid against the associated schema.
See https://json-schema.org/draft/2020-12/json-schema-validation.html#rfc.section.9.2
It's up to the tooling to take advantage of that keyword in the JSON Schema and sounds like VS code is not.
can I have common type restrictions or a new type, which I could use for more properties in JSON scema? I am referencing some type properties, but I do not get what I would like to. For instance:
{
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "Common types",
"definitions": {
"YN": {
"description": "Y or N field (can be empty, too)",
"type": "string",
"minLength": 0,
"maxLength": 1,
"enum": [ "Y", "N", "" ]
},
"HHMM": {
"description": "Time in HHMM format (or empty).",
"type": "string",
"minLength": 0,
"maxLength": 4,
"pattern": "^[0-2][0-9][0-5][0-9]$|^$"
}
},
"properties" : {
"is_registered": {
"description": "User registered. (this description is overriden)",
"$ref": "#/definitions/YN"
},
"is_valid": {
"description": "User valid. (this description is overriden)",
"$ref": "#/definitions/YN"
},
"timeofday": {
"description": "User registered at HHMM. (this description is overriden)",
"$ref": "#/definitions/HHMM"
}
}
}
In the presented schema I have two strings with some restrictions (enum, pattern, etc.). I do not want to repeat these restrictions in every field of such type. Therefore, I have defined them in definitions and reused them. If type constraints changes, I change only the definitions.
However, there are two issues I have.
First, description is duplicated. If I load this into XMLSpy, only the description of the type is shown and not the description of the actual field. If the description of the type is empty, description of field is not used. I tried combining title, and description in a way that title would be used from common definition and description would be used from field description. It seems that always title and description are used from the common type definition. How could I use common type and the description of the field, which tells, what this field actually is.
Second, if description is inherited from definitions, can I just use common pattern or any other type property, and reference the pattern defined somehow in definitions or somewhere else?
In order to answer your question, first consider that JSON Schema does not have inheritance, only references.
For draft-04, using references mean that the WHOLE subschema object is replaced by the referenced schema object. This means your more specific field description is lost. (You can wrap them in allOf's but it probably won't do what you want in terms of generating documentation).
If you can move to draft 2019-09, $ref can be used alongside other keywords, as it is then classified as an applicator keyword. I don't know if the tooling your using will work as you expect though.
In terms of your second question, not built in to JSON Schema. $ref references can only be used when you have a schema (or subschema). If you want to de-duplicate common parts which are NOT schemas, a common approach is to use a templating engine and compile your schemas at build or runtime. I've seen this done at scale using jsonnet.
For {keyA:valueA},{KeyB:valueB} Is it possible to define in the schema, valueB must equal to valueA. In other words, copying down ValueA to ValueB?
I understand it causes duplication. But two different keys must be used to meet different standards.
For example, I want to use name as sample name in the schema below.
Schema
{
"$id": "sampleSchema",
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"name": {
"type": "string"
},
"sample name":{
"type":"string"
},
}
}
The data will be like:
{
"name":"example1",
"sample name":"example1"
}
JSON Schema does not support operations like this.
We call this "data consistency validation" because it tests that data in one place is consistent with how it's defined in another location.
Supporting these types of operations would be very difficult. It would probably require a general purpose programming language to support most of the cases that people would like to see.
For more information, see Scope of JSON Schema Validation.
As an alternative, some validators allow you to implement custom keywords, or implement events or hooks when an instance is being validated against a schema with a particular ID. You can use this to implement the functionality you're looking for.
I'm new to JSON.
I see in various examples of JSON, like the following, where complex values are prefixed with "type":"object", properties { }
{
"$schema": "http://json-schema.org/draft-06/schema#",
"motor" : {
"type" : "object",
"properties" : {
"class" : "string",
"voltage" : "number",
"amperage" : "number"
}
}
}
I have written JSON without type, object, and properties, like the following.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"motor" : {
"class" : "string",
"voltage" : "number",
"amperage" : "number"
}
}
and submitted to an on-line JSON schema validator with no errors.
What is the purpose of type:object, properties { }? Is it optional?
Yes it is optional, try removing it and use your validator.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"foo": "bar"
}
You actually don't even need to use the $schema keyword i.e. {} is valid json
I would start by understanding what json is, https://www.json.org/ is the best place to start but you may prefer something easier to read like https://www.w3schools.com/js/js_json_intro.asp.
A schema is just a template (or definition) to make sure you're producing valid json for the consumer
As an example let's say you have an application that parses some json and looks for a key named test_score and saves the value (the score) in a database in some table/column. For this example we'll call the table tests and the column score. Since a database column requires a type we'll choose a numeric type, i.e. integer for our score column.
A valid json example for this may look like
{
"test_score": 100
}
Following this example the application would parse the key test_score and save the value 100 to the tests.score database table/column.
But let's say a score is absent so you put in a string i.e "NA"
{
"test_score": "NA"
}
When the application attempts to save NA to the database it will error because NA is a string not an integer which the database expects.
If you put each of those examples into any online json validator they are valid json example. However, while it's valid json to use "NA" or 100 it is not valid for the actual application that needs to consume the json.
So now you may understand that the author of the json may wonder
What are the different valid types I can use as values for my test
score?
The responsibility then falls on the writers of the application to provide some sort of definition (i.e a schema) that the clients (authors) can reference so the author knows exactly how to structure the json so the application can process it accordingly. Having a schema also allows you to validate/test your json so you know it can be processed by the application without actually having to send your json through the application.
So putting it altogether let's say in the schema you see
"$test_score": {
"type": "integer",
"format": "tinyint"
},
The writer of the json now knows that they must pass an integer and the range is 0 to 255 because it's a tinyint. They no longer have to trial by error different values and see which ones the application process. This is a big benefit to having a schema.