Rename a field within an array in DocumentDB via pymongo

I am using DocumentDB (version 3.6) in AWS, with a Python Lambda for this task. The task is to rename a field that sits inside an array. Here is a sample JSON document; I need to rename 'version' to 'label'.
{
  "_id": "93ee62b2-4a1f-478f-a716-b4e2f435f27d",
  "egrouping": [
    {
      "type": "Video",
      "language_code": "eng",
      "hierarchy": {
        "pype": {
          "parent": "episode",
          "version": "1",
          "uuid": "933433-4a1f-478f-a716-b4e2f435f27d"
        }
      }
    },
    {
      "type": "Captions",
      "language_code": "eng",
      "hierarchy": {
        "pype": {
          "parent": "episode",
          "version": "1",
          "uuid": "943454-4a1f-478f-a716-b4e2f435f27d"
        }
      }
    }
  ]
}
The following is the code snippet I tried for renaming the 'version' field that is inside the array:
collection.aggregate([
    {
        '$project': {
            'egrouping': {
                '$map': {
                    'input': '$egrouping',
                    'as': 'egroup',
                    'in': {
                        'hierarchy.pype.label': '$$egroup.hierarchy.pype.version'
                    }
                }
            }
        }
    }
])
But I end up with this error:
"errorMessage": "Aggregation project operator not supported: '$map'",

Amazon DocumentDB does not support $map. For the complete list of APIs that DocumentDB supports, refer to https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html.
We are constantly working backwards from the APIs our customers are looking to use. You can keep an eye on our future launches here https://aws.amazon.com/documentdb/resources/
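If you are stuck on an engine version without $map, one workaround is to do the rename client-side with pymongo. Below is a minimal, untested sketch; the connection string, database and collection names are placeholders, and it simply walks matching documents and rewrites the egrouping array:

import pymongo

# Placeholder connection string and db/collection names -- adjust to your cluster.
client = pymongo.MongoClient("<your-documentdb-connection-string>")
collection = client["mydb"]["mycollection"]

# Rename 'version' to 'label' inside every egrouping element, document by document.
for doc in collection.find({"egrouping.hierarchy.pype.version": {"$exists": True}}):
    egrouping = doc["egrouping"]
    for item in egrouping:
        pype = item.get("hierarchy", {}).get("pype", {})
        if "version" in pype:
            pype["label"] = pype.pop("version")
    collection.update_one({"_id": doc["_id"]}, {"$set": {"egrouping": egrouping}})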

The $map operator is now supported in both Amazon DocumentDB versions, 3.6 and 4.0.
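Where $map is available, note that the projection in the question only outputs the new label field; to keep the sibling fields (type, language_code, parent, uuid) the element has to be rebuilt explicitly. An untested sketch of such a pipeline:

collection.aggregate([
    {
        '$project': {
            'egrouping': {
                '$map': {
                    'input': '$egrouping',
                    'as': 'egroup',
                    'in': {
                        'type': '$$egroup.type',
                        'language_code': '$$egroup.language_code',
                        'hierarchy': {
                            'pype': {
                                'parent': '$$egroup.hierarchy.pype.parent',
                                'label': '$$egroup.hierarchy.pype.version',
                                'uuid': '$$egroup.hierarchy.pype.uuid'
                            }
                        }
                    }
                }
            }
        }
    }
])

An aggregation only returns the reshaped documents; to persist the rename you would still need to write the results back (for example with a client-side update like the pymongo loop sketched above).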

Related

JSON Schema v7: formatMinimum & formatMaximum validate everything

I am using the ajv JSON Schema library (v7) and trying to validate a date against a minimum value. It looks pretty straightforward using formatMinimum/formatMaximum, but it seems that every date passes validation when these keywords are used.
Here's my schema
"some-date": {
"type": "object",
"properties": {
"data": {
"type": "object",
"properties": {
"value": {
"type": "string",
"format": "date-time",
"formatMinimum": "2021-03-10T14:25:00.000Z"
}
}
}
}
}
Here's the json:
{
  "some-date": {
    "data": {
      "value": "2011-03-10T14:25:00.000Z"
    }
  }
}
Here's how I am validating:
const ajv = new Ajv({allErrors: true})
require('ajv-formats')(ajv)
require('ajv-errors')(ajv)
require('ajv-keywords')(ajv)
const validate = ajv.compile(mySchema) // compile returns a validating function
const isValid = validate(myJSON)
I've tried it on JSONSchemalint and it validates the above JSON against the given schema. I have also tried several dates, and it validates everything.
Please let me know if I am missing something.
Thanks
I'm not sure where you're getting formatMinimum and formatMaximum from, but they are not standard keywords in the JSON Schema specification, under any version. Are they documented as supported keywords in the implementation that you are using?

Importing data to Contentful programmatically from a JSON file

I am trying to import some data programmatically into Contentful.
I am following the docs here
And running the command inside my integrated terminal
contentful space import --config config.json
Where the config file is
{
  "spaceId": "abc123",
  "managementToken": "112323132321adfWWExample",
  "contentFile": "./dataToImport.json"
}
And the dataToImport.json file is
{
  "data": [
    {
      "address": "11234 New York City"
    },
    {
      "address": "1212 New York City"
    }
  ]
}
The thing is, I don't understand what format my dataToImport.json should be in, or what is missing in this file or in my config file, so that the array of addresses from the .json file gets added as new entries to a content model that already exists in the Contentful UI (shown in the screenshot below).
I am not specifying the content model the data should go into, so I believe that is one issue, but I don't know how to do that. An example or repo would help me out greatly.
The types of data you can import are listed in their documentation.
Your JSON top level should say "entries", not "data", if new content of a content type is what you would like to import.
This is an example of a blog post entry, as per the content model of the tutorial they provide.
The only thing I haven't worked out yet is where the user id goes :D so I substituted a link to the content type 'person', also provided in their tutorial (I think it's called Gatsby Starter).
{"entries": [
{
"sys": {
"space": {
"sys": {
"type": "Link",
"linkType": "Space",
"id": "theSpaceIdToReceiveYourImport"
}
},
"type": "Entry",
"createdAt": "2019-04-17T00:56:24.722Z",
"updatedAt": "2019-04-27T09:11:56.769Z",
"environment": {
"sys": {
"id": "master",
"type": "Link",
"linkType": "Environment"
}
},
"publishedVersion": 149, -- these are not compulsory, you can skip
"publishedAt": "2019-04-27T09:11:56.769Z", -- you can skip
"firstPublishedAt": "2019-04-17T00:56:28.525Z", -- you can skip
"publishedCounter": 3, -- you can skip
"version": 150,
"publishedBy": { -- this is an example of a linked content
"sys": {
"type": "Link",
"linkType": "person",
"id": "personId"
}
},
"contentType": {
"sys": {
"type": "Link",
"linkType": "ContentType",
"id": "blogPost" -- here should be your content type 'RealtorProperties'
}
}
},
"fields": { -- here should go your content type fields, i can't see it in your post
"title": {
"en-US": "Test 1"
},
"slug": {
"en-US": "Test-1"
},
"description": {
"en-US": "some description"
},
"body": {
"en-US": "some body..."
},
"publishDate": {
"en-US": "2016-12-19"
},
"heroImage": { -- another example of a linked content
"en-US": {
"sys": {
"type": "Link",
"linkType": "Asset",
"id": "idOfTHisImage"
}
}
}
}
},
--another entry, ...]}
Have a look at this repo. I am also trying to figure this out. It looks like there are quite a lot of fields that need to be included in the JSON file. I was hoping there'd be a simple solution, but it seems you (me too, actually) will need to create scripts to "convert" your JSON file into data Contentful can read and import.
I'll let you know if I find anything better.
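For what it's worth, a small conversion script can generate the "entries" shape shown above from the simple address list. The sketch below is untested and assumes a content type id of "realtorProperties", a single "address" field, and an en-US locale; adjust these to match your space:

import json

SPACE_ID = "abc123"                    # same space id as in config.json
CONTENT_TYPE_ID = "realtorProperties"  # assumption: your content type's id
LOCALE = "en-US"                       # assumption: the space's default locale

with open("dataToImport.json") as f:
    raw = json.load(f)

entries = []
for item in raw["data"]:
    entries.append({
        "sys": {
            "space": {"sys": {"type": "Link", "linkType": "Space", "id": SPACE_ID}},
            "type": "Entry",
            "contentType": {"sys": {"type": "Link", "linkType": "ContentType", "id": CONTENT_TYPE_ID}},
        },
        "fields": {
            "address": {LOCALE: item["address"]}
        },
    })

with open("contentfulImport.json", "w") as f:
    json.dump({"entries": entries}, f, indent=2)

You would then point "contentFile" in config.json at contentfulImport.json and re-run contentful space import.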

Is there a way to get Step Functions input values into EMR step Args

We are running batch spark jobs using AWS EMR clusters. Those jobs run periodically and we would like to orchestrate those via AWS Step Functions.
As of November 2019, Step Functions supports EMR natively. When adding a step to the cluster, we can use the following config:
"Some Step": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--class",
"com.some.package.Class",
"JarUri",
"--startDate",
"$.time",
"--daysToLookBack",
"$.daysToLookBack"
]
}
}
},
"Retry" : [
{
"ErrorEquals": [ "States.ALL" ],
"IntervalSeconds": 1,
"MaxAttempts": 1,
"BackoffRate": 2.0
}
],
"ResultPath": "$.firstStep",
"End": true
}
Within the Args list of the HadoopJarStep, we would like to set arguments dynamically. For example, if the input of the state machine execution is:
{
  "time": "2020-01-08",
  "daysToLookBack": 2
}
The strings in the config starting with "$." should be replaced accordingly when executing the State Machine, and the step on the EMR cluster should run command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate 2020-01-08 --daysToLookBack 2. But instead it runs command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate $.time --daysToLookBack $.daysToLookBack.
Does anyone know if there is a way to do this?
Parameters let you define key-value pairs. Since the value of the "Args" key is an array, you won't be able to dynamically reference a specific element in the array; you would need to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray".
So for your use case, the best way to achieve this is to add a pre-processing state before calling this state. In the pre-processing state you can either call a Lambda function and format your input/output in code, or, for something as simple as adding a dynamic value to an array, you can use a Pass state to reformat the data. Then, inside your task state's Parameters, you can use JSONPath to get the array you defined in the pre-processor. Here's an example:
{
  "Comment": "A Hello World example of the Amazon States Language using Pass states",
  "StartAt": "HardCodedInputs",
  "States": {
    "HardCodedInputs": {
      "Type": "Pass",
      "Parameters": {
        "cluster": {
          "ClusterId": "ValueForClusterIdVariable"
        },
        "time": "ValueForTimeVariable",
        "daysToLookBack": "ValueFordaysToLookBackVariable"
      },
      "Next": "Pre-Process"
    },
    "Pre-Process": {
      "Type": "Pass",
      "Parameters": {
        "FormattedInputsForEmr": {
          "ClusterId.$": "$.cluster.ClusterId",
          "Args": [
            { "Arg1": "spark-submit" },
            { "Arg2": "--class" },
            { "Arg3": "com.some.package.Class" },
            { "Arg4": "JarUri" },
            { "Arg5": "--startDate" },
            { "Arg6.$": "$.time" },
            { "Arg7": "--daysToLookBack" },
            { "Arg8.$": "$.daysToLookBack" }
          ]
        }
      },
      "Next": "Some Step"
    },
    "Some Step": {
      "Type": "Pass",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
        "Step": {
          "Name": "FirstStep",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
          }
        }
      },
      "End": true
    }
  }
}
You can use the States.Array() intrinsic function. Your Parameters block becomes:
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args.$": "States.Array('spark-submit', '--class', 'com.some.package.Class', 'JarUri', '--startDate', $.time, '--daysToLookBack', '$.daysToLookBack')"
}
}
}
Intrinsic functions are documented here, but I don't think the documentation explains the usage very well. The code snippets provided in the Step Functions console are more useful.
Note that you can also do string formatting on the args using States.Format(). For example, you could construct a path using an input variable as the final path segment:
"Args.$": "States.Array('mycommand', '--path', States.Format('my/base/path/{}', $.someInputVariable))"

Why is ElasticSearch not able to search when special characters are present?

I have an ElasticSearch index with the below configuration:
{
  "my_ind": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "10000000"
          }
        },
        "number_of_shards": "3",
        "provided_name": "my_ind",
        "creation_date": "1539773409246",
        "analysis": {
          "analyzer": {
            "default": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "3wC7i-E_Q9mSDjnTN2gxrg",
        "version": {
          "created": "5061299"
        }
      }
    }
  }
}
I want to search for the below content with a plain search:
DL-1234170386456
This content is stored in the below field:
DNumber
This field has the following mapping:
{
  "DNumber": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
I am implementing this in Java. I came across ElasticSearch analyzers and tokenizers, so I made use of the "whitespace" tokenizer.
I am searching with the below query:
{
  "query": {
    "multi_match": {
      "query": "DL-1234170386456",
      "fields": [
        "_all"
      ],
      "type": "best_fields",
      "operator": "OR",
      "analyzer": "default",
      "slop": 0,
      "prefix_length": 0,
      "max_expansions": 50,
      "lenient": false,
      "zero_terms_query": "NONE",
      "boost": 1
    }
  }
}
What am I doing wrong?
After doing a lot of research and trial & error, I found the answer!
Some basic but important points:
We need to specify analyzers and tokenizers when creating the index / indexing the data.
The string in question, "DL-1234170386456", contains a special character ("-"), and ElasticSearch uses the Standard Analyzer by default.
The Standard Analyzer contains the Standard Tokenizer, which is based on the Unicode Text Segmentation algorithm.
Actual problem:
ElasticSearch was splitting the string ("DL-1234170386456") into two separate tokens, "DL" and "1234170386456".
Solution:
We need to specify the Whitespace Analyzer, which contains the Whitespace Tokenizer.
It splits text only where whitespace is encountered, so the string ("DL-1234170386456") is kept as a single token by ElasticSearch and we are able to find it.
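As a side note, with the mapping shown in the question, the keyword sub-field can already match the full value without re-indexing, since keyword fields are not analyzed. A sketch of such a query (field name taken from the question):

{
  "query": {
    "term": {
      "DNumber.keyword": "DL-1234170386456"
    }
  }
}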

How to specify an analyzer while creating an index in ElasticSearch

I'd like to specify an analyzer, name it, and use that name in a mapping while creating an index. I'm lost; my ES instance always returns an error message.
This is, roughly, what I'd like to do:
"settings": {
"mappings": {
"alfedoc": {
"properties": {
"id": { "type": "string" },
"alfefield": { "type": "string", "analyzer": "alfeanalyzer" }
}
}
},
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern":"\\s+"
}
}
}
}
But this does not seem to work; the ES instance always returns an error like
MapperParsingException[mapping [alfedoc]]; nested: MapperParsingException[Analyzer [alfeanalyzer] not found for field [alfefield]];
I tried putting the "analysis" branch of the dictionary in several places (inside the mapping, etc.), but to no avail. I guess a complete working example (which I couldn't find so far) would help me along. Probably I'm missing something rather basic.
"analysis" goes in the "settings" block, which goes either before or after the "mappings" block when creating an index.
"settings": {
"analysis": {
"analyzer": {
"alfeanalyzer": {
"type": "pattern",
"pattern": "\\s+"
}
}
}
},
"mappings": {
"alfedoc": { ... }
}
Here's a good, complete example: Example 1
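Putting the question's mapping together with the corrected settings, a complete create-index request might look like the following sketch (the index name alfeindex is a placeholder, and the string type matches the older ES version used in the question):

PUT /alfeindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "alfeanalyzer": {
          "type": "pattern",
          "pattern": "\\s+"
        }
      }
    }
  },
  "mappings": {
    "alfedoc": {
      "properties": {
        "id": { "type": "string" },
        "alfefield": { "type": "string", "analyzer": "alfeanalyzer" }
      }
    }
  }
}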