Turned off dynamic mapping in Elasticsearch, but the custom mapping still doesn't work?

My problem is that I have a JSON object like this:
{
"success": true,
"type": "message",
"body": {
"_id": "5215bdd32de81e0c0f000005",
"id": "411c79eb-a725-4ad9-9d82-2db54dfc80ee",
"type": "metaModel",
"title": "testchang",
"authorId": "5215bd552de81e0c0f000001",
"drawElems": [
{
"type": "App.draw.metaElem.ModelStartPhase",
"id": "27re7e35-550j",
"x": 60,
"y": 50,
"width": 50,
"height": 50,
"title": "problem engagement",
"isGhost": true,
"pointTo": "e88e2845-37a4-4c45-a030-d02a3c3e03f9",
"bindingId": "90f79d70-0afc-11e3-98d2-83967d2ad9a6",
"model": "meta",
"entityType": "phase",
"domainId": "411c79eb-a725-4ad9-9d82-2db54dfc80ee",
"authorId": "5215bd552de81e0c0f000001",
"userData": {},
"_id": "5215f4c5d89f629c1700000d"
},
{...}
]
}
}
I tried to define the following mapping to index only parts of this object:
String mapping = XContentFactory.jsonBuilder()
.startObject()
.startObject("domaindata").field("dynamic","false")
.startObject("properties")
.startObject("id").field("type","string").field("store","yes").endObject()
.startObject("type").field("type","string").field("store","yes").endObject()
.startObject("title").field("type","integer").field("store","yes").endObject()
.startObject("drawElems")
.startObject("properties")
.startObject("type").field("store","yes").field("type","string").endObject()
.startObject("title").field("store","yes").field("type","string").endObject()
.endObject().endObject().endObject().endObject().endObject().string();
After adding this mapping to my type with:
node.client().admin()
.indices().prepareCreate("test")
.addMapping("domaindata", mapping)
.execute().actionGet();
I still get the whole JSON object back in my index response, so it seems that my mapping does not work.
Could anybody help me? Thanks a lot!

The problem here is that a static mapping only means that fields not already present in the mapping won't be added to it, and thus won't be indexed either. But since they are part of the source document you sent, they are still returned as part of the _source field.
The same goes if you disable a specific object in the mapping ("enabled": false), as mentioned here. That object won't be parsed or indexed, but it will still be part of the stored _source field.
If you want to avoid storing part of the _source, you can use the source includes/excludes feature, as described here.
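As a rough illustration of the includes/excludes idea, the sketch below builds a mapping body for the "domaindata" type from the question as a plain Python dict and prints it as JSON. The list of included paths is an assumption based on the fields the question wants to keep, and "title" is typed as a string here because the sample document holds text in that field. How the body is sent to the cluster depends on the client and version, so that part is left out.

```python
import json

# Mapping body for the "domaindata" type (field list assumed from the question).
mapping = {
    "domaindata": {
        # Static mapping: unknown fields are not added to the mapping or indexed.
        "dynamic": False,
        # Restrict what is kept in the stored _source to the listed paths.
        "_source": {
            "includes": ["id", "type", "title", "drawElems.type", "drawElems.title"]
        },
        "properties": {
            "id": {"type": "string", "store": "yes"},
            "type": {"type": "string", "store": "yes"},
            "title": {"type": "string", "store": "yes"},
            "drawElems": {
                "properties": {
                    "type": {"type": "string", "store": "yes"},
                    "title": {"type": "string", "store": "yes"},
                }
            },
        },
    }
}

print(json.dumps(mapping, indent=2))
```

With a mapping along these lines, fields outside the included paths should be neither indexed nor kept in the stored _source.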

Azure Data Factory JSON syntax

In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
]
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just 1.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
In $cubeCaches[0][...] you're explicitly mapping the first element of the array to columns, which is why only one row lands in the sink.
I don't know a way to achieve what you intend with the Copy activity alone. I would use a Mapping Data Flow here, and inside it I would flatten the data (Flatten transformation) to get one row per object in the array.
Then, from this flattened dataset, you could use a Derived Column to map the JSON fields to the columns of your target, a Select to remove the unwanted original fields, and a Sink to write the result to your target location.
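To make the flatten step concrete, here is a minimal sketch in plain Python (not Data Factory code) of what unrolling the cubeCaches array amounts to: one output row per array element instead of only element 0. The function name and the choice of output columns are assumptions for illustration, and the sample is a trimmed version of the response from the question.

```python
def flatten_cube_caches(response):
    """Return one flat row per element of the 'cubeCaches' array."""
    rows = []
    for cache in response.get("cubeCaches", []):
        rows.append({
            "id": cache["id"],
            "projectId": cache["projectId"],
            "sourceName": cache["source"]["name"],
            "loadedState": cache["state"]["loadedState"],
            "rowCount": cache["rowCount"],
        })
    return rows


# Trimmed sample with the same shape as the REST response.
sample = {
    "total": 65,
    "cubeCaches": [
        {
            "id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
            "projectId": "15D91DD11E9D0C74B3319",
            "source": {"name": "12302021", "type": "cube"},
            "state": {"loadedState": "loaded"},
            "rowCount": 2726,
        },
        {
            "id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
            "projectId": "120D0C1480EF35D41B",
            "source": {"name": "All Clients (YTD)", "type": "cube"},
            "state": {"loadedState": "loaded"},
            "rowCount": 8146903,
        },
    ],
}

rows = flatten_cube_caches(sample)
for row in rows:
    print(row)
```

Each dictionary in rows corresponds to one row that would land in the SQL sink.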

What is equivalent to multiple types in OpenAPI 3.1? anyOf or oneOf?

I want to convert multiple types (supported in the latest drafts of JSON Schema, and therefore in OpenAPI v3.1) to anyOf or oneOf, but I am a bit confused about which of the two the types should be mapped to, or whether I can map to either.
PS: I do know how anyOf, oneOf, etc. work, but the behavior of multiple types is a little ambiguous. (I know the schema is invalid, but it is just an example focused on type conversion.)
{
"type": ["null", "object", "integer", "string"],
"properties": {
"prop1": {
"type": "string"
},
"prop2": {
"type": "string"
}
},
"enum": [2, 3, 4, 5],
"const": "sample const entry",
"exclusiveMinimum": 1.22,
"exclusiveMaximum": 50,
"maxLength": 10,
"minLength": 2,
"format": "int32"
}
I am converting it this way.
{
"anyOf": [{
"type": "null"
},
{
"type": "object",
"properties": {
"prop1": {
"type": "string"
},
"prop2": {
"type": "string"
}
}
},
{
"type": "integer",
"enum": [2, 3, 4, 5],
"exclusiveMinimum": 1.22,
"exclusiveMaximum": 50,
"format": "int32"
},
{
"type": "string",
"maxLength": 10,
"minLength": 2,
"const": "sample const entry"
}
]
}
anyOf gives you a closer match for the semantics than oneOf.
The problem (or benefit!) of oneOf is that it will fail if you happen to match two different cases.
That is unlikely to be what you want, given that the source of your conversion has those looser semantics.
Imagine converting ["integer","number"], for example: if the input was 1, you'd match both and fail using oneOf.
First of all, your example is not valid:
The initial schema doesn't match anything, it's an "impossible" schema. The "enum": [2, 3, 4, 5] and "const": "sample const entry" constraints are mutually exclusive, and so are "const": "sample const entry" and "maxLength": 10.
The rewritten schema is not equivalent to the original schema because the enum and const were moved from the root level into subschemas. Yes, this way the schema makes more sense and will sort of work (e.g. it will match the specified numbers, but not strings, because of the const vs. maxLength contradiction), but it's not the same as the original schema.
With regard to oneOf/anyOf:
It depends.
The choice between anyOf and oneOf depends on the context, i.e. whether an instance can match more than one subschema or exactly one subschema. In other words, whether a match against multiple subschemas is considered OK or an error. Nullable references typically need anyOf rather than oneOf, but other cases vary from schema to schema.
For example,
"type": ["number", "integer"]
corresponds to anyOf because there's an overlap - integer values are also valid "number" values in JSON Schema.
Whereas
"type": ["string", "integer"]
can be represented using either oneOf or anyOf. oneOf is semantically closer since strings and integers are totally different data types with no overlap. But technically anyOf also works, it's just there won't be more than one subschema match in this particular case.
In your example, all base type values are distinct with no overlap, so I would use oneOf, but technically anyOf will also work.
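The difference can be checked with a small hand-rolled sketch. It is not a JSON Schema validator: it only models the "type" keyword, and the helper names are made up for illustration. The point is that anyOf accepts one or more matching branches, while oneOf requires exactly one.

```python
def matches_type(value, type_name):
    """Simplified check of a value against a single JSON Schema 'type' name."""
    if type_name == "null":
        return value is None
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "object":
        return isinstance(value, dict)
    # bool is a subclass of int in Python, so handle it before the numeric types.
    if isinstance(value, bool):
        return type_name == "boolean"
    if type_name == "integer":
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if type_name == "number":
        return isinstance(value, (int, float))
    return False


def count_matches(value, types):
    return sum(matches_type(value, t) for t in types)


def any_of(value, types):
    # anyOf: valid when at least one branch matches.
    return count_matches(value, types) >= 1


def one_of(value, types):
    # oneOf: valid only when exactly one branch matches.
    return count_matches(value, types) == 1


print(any_of(1, ["integer", "number"]))  # 1 matches both branches, anyOf accepts it
print(one_of(1, ["integer", "number"]))  # two matches, so oneOf rejects it
print(one_of(1, ["string", "integer"]))  # disjoint types, exactly one match
```

For disjoint type lists the two combinators agree on every value, which is why either one works for ["string", "integer"].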

How to extract the field from JSON object with QueryRecord

I have been struggling with this problem for a long time. I need to create a new JSON flowfile using QueryRecord by taking an array (field ref) from input JSON field refs and skip the object field as shown in example below:
Input JSON flowfile
{
"name": "name1",
"desc": "full1",
"refs": {
"ref": [
{
"source": "source1",
"url": "url1"
},
{
"source": "source2",
"url": "url2"
}
]
}
}
QueryRecord configuration
JsonTreeReader set up with Infer Schema, and JsonRecordSetWriter
select name, description, (array[rpath(refs, '//ref[*]')]) as sources from flowfile
Output JSON (need)
{
"name": "name1",
"desc": "full1",
"references": [
{
"source": "source1",
"url": "url1"
},
{
"source": "source2",
"url": "url2"
}
]
}
But got error:
QueryRecord Failed to write MapRecord[{references=[Ljava.lang.Object;@27fd935f, description=full1, name=name1}] with schema ["name" : "STRING", "description" : "STRING", "references" : "ARRAY[STRING]"] as a JSON Object due to java.lang.ClassCastException: null
Try the following approach; in your case it should work:
1) Read your JSON file fully (I imitated it with a GenerateFlowFile processor and your example).
2) Add an EvaluateJsonPath processor, which will put the two header fields (name, desc) into attributes.
3) Add a SplitJson processor, which will split your JSON by the refs/ref groups (split by "$.refs.ref").
4) Add a ReplaceText processor, which will add your header fields (name, desc) to the split lines (replace the "[{]" value with "{"name":"${json.name}","desc":"${json.desc}",").
5) It's done.
Full process in my demo case:
Hope this helps.
Solution: use JoltTransformJSON to transform the JSON with a Jolt specification.
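For reference, the reshaping that any of these approaches has to produce can be written down in a few lines of plain Python. This is neither NiFi nor Jolt code, only a sketch of the target transformation: keep name and desc, and lift the refs.ref array up into a top-level references field. The function name is made up for illustration.

```python
import json


def reshape(record):
    """Lift the nested refs.ref array into a top-level 'references' field."""
    return {
        "name": record["name"],
        "desc": record["desc"],
        "references": record["refs"]["ref"],
    }


# Input flowfile content from the question.
flowfile = {
    "name": "name1",
    "desc": "full1",
    "refs": {
        "ref": [
            {"source": "source1", "url": "url1"},
            {"source": "source2", "url": "url2"},
        ]
    },
}

result = reshape(flowfile)
print(json.dumps(result, indent=2))
```

Whatever transformation is configured in the flow should yield this same structure for the sample input.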

How to use fit property of projection?

I assigned a geojson object to the fit property of the projection as described in the documentation: https://vega.github.io/vega/docs/projections/
I am always getting the error "Unsupported parameter object: {"type": "FeatureCollection"..."
I am assigning the geojson object as following: (my source data is topojson format)
spec.projections[0].fit = topojson.feature(
mapData,
mapData.objects.topology
);
The documentation explicitly says that this parameter should be a GeoJSON Feature or FeatureCollection. How am I supposed to use the fit property?
You have to use a reference to a data object.
"data": [
{
"name": "counties",
"url": "data/us-10m.json",
"format": {"type": "topojson", "feature": "counties" }
}
],
"projections": [
{
"name": "projection",
"type": "mercator",
"fit": {"signal": "data('counties')"},
"size": {"signal": "[width, height]"}
}
]
Here is an example from Vega that you can test in their editor:
https://github.com/vega/vega/blob/master/test/specs-valid/map-fit.vg.json
Also notice the "size" property. The fit does not get reflected without it.
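Applied to the situation in the question, where the TopoJSON is already in memory, the same idea might look like the sketch below. It manipulates the spec as a plain Python dict. The helper name and the data set name "map" are made up, and passing the topology inline through "values" together with a topojson "format" is an assumption rather than something the answer confirms.

```python
def use_data_reference_for_fit(spec, topology, object_name, data_name="map"):
    """Register the topology as a named data set and point 'fit' at it."""
    spec.setdefault("data", []).append({
        "name": data_name,
        "values": topology,  # inline TopoJSON (assumed to be accepted here)
        "format": {"type": "topojson", "feature": object_name},
    })
    projection = spec["projections"][0]
    # Reference the data set through a signal instead of assigning GeoJSON directly.
    projection["fit"] = {"signal": f"data('{data_name}')"}
    # 'fit' needs a size to fit into.
    projection.setdefault("size", {"signal": "[width, height]"})
    return spec


spec = {"projections": [{"name": "projection", "type": "mercator"}]}
# Placeholder topology; a real one would contain arcs and geometries.
map_data = {"type": "Topology", "objects": {"topology": {}}, "arcs": []}

spec = use_data_reference_for_fit(spec, map_data, "topology")
print(spec["projections"][0])
```

The object name "topology" mirrors mapData.objects.topology from the question.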

Freebase search_api and excluding results by specified type

Does anyone know how to exclude topics with specified type(s) using the search API and MQL?
For example, I'm trying to find all topics named "Voodoo People", exclude only those that have the composition and release types, and sort the results by score descending: http://tinyurl.com/3tjkb7y.
Sorting works perfectly, but I can't find any functionality for excluding :(
I tried to use mql_filter: http://tinyurl.com/644xkow, but the releases are still there.
One more question: I see that the type_strict param accepts the values "all", "any", and "should", but there is no "not" or "not in" value. Can the needed result be obtained in any other way?
The syntax that you're looking for is "optional" : "forbidden". In your query that would look like this:
[{
"search": {
"query": "Voodoo People",
"score": null,
"mql_filter": [{
"type": {
"id": "/music/release",
"optional": "forbidden"
}
}]
},
"name": null,
"id": null,
"type": [],
"/common/topic/notable_for": {
},
"limit": 15,
"sort": "-search.score"
}]