How to load nested JSON data into a single column in Apache Druid

I am trying to load nested JSON data into Apache Druid.
Data -->
{
  "a": "a_data",
  "b": "b_data",
  "c_blob_Column": { "aaaa": { "k": "sample", "c": "sample2" } }
}
Spec -->
{ "type" : "kafka", "dataSchema" : { "dataSource" : "blob", "parser" : { "type" : "string", "parseSpec" : { "format" : "json", "dimensionsSpec" : { "dimensions" : [ "a", "b", "c_blob_Column"
]
},
"timestampSpec": {
"column": "timestamp",
"format": "iso"
}
}
},
"metricsSpec" : [],
"granularitySpec" : {
"type" : "uniform",
"segmentGranularity" : "DAY",
"queryGranularity" : "none",
"rollup" : false
}
},
"ioConfig" : {
"topic":"blob_topic",
"consumerProperties":{
"bootstrap.servers":"<local server>"
},
"appendToExisting" : false,
"useEarliestOffset": true,
"taskDuration": "PT15M"
},
"tuningConfig" : {
"type" : "kafka",
"maxRowsPerSegment" : 5000000,
"maxRowsInMemory" : 25000
}
}
Output columns-->
a,b,c_blob_Column,__time
I am able to load the data, but the issue is that in the column c_blob_Column the data does not come through in JSON form. Could someone please help me figure out how to load the JSON blob data?

You can use a jq expression in the flattenSpec:
"flattenSpec": {
  "fields": [
    {
      "type": "jq",
      "name": "c_blob_Column",
      "expr": ".c_blob_Column | tojson"
    }
  ]
}
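For context, a minimal sketch of where this flattenSpec could sit inside the parseSpec of the spec above (the useFieldDiscovery flag is an assumed extra, not something stated in the answer):
"parseSpec": {
  "format": "json",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "jq", "name": "c_blob_Column", "expr": ".c_blob_Column | tojson" }
    ]
  },
  "dimensionsSpec": {
    "dimensions": ["a", "b", "c_blob_Column"]
  },
  "timestampSpec": {
    "column": "timestamp",
    "format": "iso"
  }
}
With tojson, the nested object is ingested as a JSON string in the c_blob_Column dimension.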

Related

How do I change MongoDB JSON data to an array

I need to update a MongoDB field containing an array of objects so that each JSON object becomes an array.
If I have something like this in MongoDB:
"designSectionContents" : [
{
"_id" : "5bae17ecbd7595540145ec98",
"type" : "subSection",
"columns" : [
{
"0" : {
"itemId" : "5b7465980783d9a37058f160",
"type" : "field"
}
},
{
"0" : {
"itemId" : "5b7465630783d9a37058f15c",
"type" : "field"
}
},
{
"0" : {
"itemId" : "5b7465810783d9a37058f15e",
"type" : "field"
}
}
],
"subSectionContentLayout" : {
"labelPlacement" : "Top",
"columns" : 3
}
}
]
I want to change the above snippet to the one below in MongoDB:
"designSectionContents" : [
{
"_id" : ObjectId("5bae17ecbd7595540145ec98"),
"type" : "subSection",
"columns" : [
[
{
"itemId" : "5b7465980783d9a37058f160",
"type" : "field"
}
],
[
{
"itemId" : "5b7465630783d9a37058f15c",
"type" : "field"
}
],
[
{
"itemId" : "5b7465810783d9a37058f15e",
"type" : "field"
}
]
]
}
]
The opening and closing curly braces have to be changed to an array.
This should work:
db.collection.aggregate([
{
"$project": {
"designSectionContents": {
"$map": {
"input": "$designSectionContents",
"as": "designSectionContent",
"in": {
"_id": "$$designSectionContent._id",
"type": "$$designSectionContent.type",
"columns": {
"$map": {
"input": "$$designSectionContent.columns",
"as": "inp",
"in": [
"$$inp.0"
]
}
}
}
}
}
}
}
]);
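If the reshaped documents need to be persisted rather than just returned, a rough sketch is to append an $out stage to the same pipeline; note that $out replaces the collection's contents with the pipeline output, so only the projected fields are kept:
db.collection.aggregate([
  {
    "$project": {
      "designSectionContents": {
        "$map": {
          "input": "$designSectionContents",
          "as": "d",
          "in": {
            "_id": "$$d._id",
            "type": "$$d.type",
            "columns": {
              "$map": { "input": "$$d.columns", "as": "inp", "in": [ "$$inp.0" ] }
            }
          }
        }
      }
    }
  },
  { "$out": "collection" }  // writes the reshaped documents back over the collection
]);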

mapper_parsing_exception in Elasticsearch (Reason: No type specified for field [X])

I wanted to provide explicit mappings for the fields in my document, so I defined a mapping for my index demo; it looks like this:
PUT /demo
{
  "mappings": {
    "properties": {
      "X" : {
        "X" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "Sub_X" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
After running the query, I get this error:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "No type specified for field [X]"
}
],
"type" : "mapper_parsing_exception",
"reason" : "Failed to parse mapping [_doc]: No type specified for field [X]",
"caused_by" : {
"type" : "mapper_parsing_exception",
"reason" : "No type specified for field [X]"
}
},
"status" : 400
}
The field X in the JSON document looks like:
"X" : {
"X" : [
"a"
],
"Sub_X" : [
[
"b"
]
]
},
Please help me out with this Elasticsearch mapper_parsing_exception error.
What you have is called a nested data type: you have X, which in turn contains X and Sub_X.
Mapping:
{
"properties": {
"X": {
"type": "nested"
}
}
}
Data:
{
"X": {
"X": [
"a"
],
"Sub_X": [
[
"b"
]
]
}
}
Query:
{
"query": {
"nested": {
"path": "X",
"query": {
"bool": {
"must": [
{ "match": { "X.X": "a" }},
{ "match": { "X.Sub_X": "b" }}
]
}
}
}
}
}
It outputs the document.
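The original mapping failed because the sub-fields of X were declared directly under X instead of inside a properties object. If explicit text/keyword sub-fields are still wanted, they can be combined with the nested type roughly like this (an untested sketch, not taken from the answer above):
PUT /demo
{
  "mappings": {
    "properties": {
      "X": {
        "type": "nested",
        "properties": {
          "X": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          },
          "Sub_X": {
            "type": "text",
            "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
          }
        }
      }
    }
  }
}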

Setting API Key required to true using serverless.template in AWS API Gateway

I am deploying an ASP.NET Core project on AWS Lambda, and I am struggling with making the API key required.
Here is my JSON template:
{
"AWSTemplateFormatVersion" : "2010-09-09",
"Transform" : "AWS::Serverless-2016-10-31",
"Description" : "An AWS Serverless Application that uses the ASP.NET Core framework running in Amazon Lambda.",
"Parameters" : {
"ShouldCreateBucket" : {
"Type" : "String",
"AllowedValues" : ["true", "false"],
"Description" : "If true then the S3 bucket that will be proxied will be created with the CloudFormation stack."
},
"BucketName" : {
"Type" : "String",
"Description" : "Name of S3 bucket that will be proxied. If left blank a name will be generated.",
"MinLength" : "0"
}
},
"Conditions" : {
"CreateS3Bucket" : {"Fn::Equals" : [{"Ref" : "ShouldCreateBucket"}, "true"]},
"BucketNameGenerated" : {"Fn::Equals" : [{"Ref" : "BucketName"}, ""]}
},
"Resources" : {
"AspNetCoreFunction" : {
"Type" : "AWS::Serverless::Function",
"Properties": {
"Handler": "SmartClockAPI::SmartClockAPI.LambdaEntryPoint::FunctionHandlerAsync",
"Runtime": "dotnetcore2.1",
"CodeUri": "",
"MemorySize": 256,
"Timeout": 30,
"Role": null,
"Policies": [ "AWSLambdaFullAccess","AmazonCognitoPowerUser","AmazonAPIGatewayAdministrator"],
"Environment" : {
"Variables" : {
"AppS3Bucket" : { "Fn::If" : ["CreateS3Bucket", {"Ref":"Bucket"}, { "Ref" : "BucketName" } ] }
}
},
"Events": {
"PutResource": {
"Type": "Api",
"Properties": {
"Path": "/{proxy+}",
"Method": "ANY"
}
}
}
}
},
"BasicUsagePlan" : {
"Type" : "AWS::ApiGateway::UsagePlan",
"Properties" : {
"UsagePlanName" : "Basic plan",
"Quota" : {
"Limit" : 100,
"Period" : "MONTH"
},
"Throttle" : {
"RateLimit" : 10,
"BurstLimit" : 10
}
}
},
"Bucket" : {
"Type" : "AWS::S3::Bucket",
"Condition" : "CreateS3Bucket",
"Properties" : {
"BucketName" : { "Fn::If" : ["BucketNameGenerated", {"Ref" : "AWS::NoValue" }, { "Ref" : "BucketName" } ] }
}
}
},
"Outputs" : {
"ApiURL" : {
"Description" : "API endpoint URL for Prod environment",
"Value" : { "Fn::Sub" : "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/" }
},
"S3ProxyBucket" : {
"Value" : { "Fn::If" : ["CreateS3Bucket", {"Ref":"Bucket"}, { "Ref" : "BucketName" } ] }
}
}
}
What I am trying to achieve is setting the API key required flag to true from the JSON template.
I was expecting some extra property on the proxy event where I could specify this value.
Any ideas?
Use AWS::ApiGateway::UsagePlanKey
{
"Type" : "AWS::ApiGateway::UsagePlanKey",
"Properties" : {
"KeyId" : String,
"KeyType" : String,
"UsagePlanId" : String
}
}
Add a DependsOn to the ApiKey, ApiUsagePlan, and ApiUsagePlanKey resources to ensure they are created in the correct order.
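As a rough sketch of how these pieces might fit into the template above (the Auth/ApiKeyRequired flag on the SAM Api event and the implicit ServerlessRestApi/Prod stage names are assumptions, not taken from the answer): the event on AspNetCoreFunction could require a key,
"PutResource": {
  "Type": "Api",
  "Properties": {
    "Path": "/{proxy+}",
    "Method": "ANY",
    "Auth": { "ApiKeyRequired": true }
  }
}
and the key and usage-plan key could sit alongside BasicUsagePlan in Resources:
"ApiKey": {
  "Type": "AWS::ApiGateway::ApiKey",
  "Properties": {
    "Enabled": true,
    "StageKeys": [ { "RestApiId": { "Ref": "ServerlessRestApi" }, "StageName": "Prod" } ]
  }
},
"BasicUsagePlanKey": {
  "Type": "AWS::ApiGateway::UsagePlanKey",
  "DependsOn": ["ApiKey", "BasicUsagePlan"],
  "Properties": {
    "KeyId": { "Ref": "ApiKey" },
    "KeyType": "API_KEY",
    "UsagePlanId": { "Ref": "BasicUsagePlan" }
  }
}
For the key to actually apply, BasicUsagePlan would also need an ApiStages entry pointing at the ServerlessRestApi Prod stage.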

Elasticsearch: POST bulk on elastic xpack roles

I have an Elastic cluster with xpack enabled.
I'd like to make a backup of all the xpack roles created:
GET _xpack/security/role
=> I get a big JSON, ex :
{
"kibana_dashboard_only_user": {
"cluster": [],
"indices": [
{
"names": [
".kibana*"
],
"privileges": [
"read",
"view_index_metadata"
]
}
],
"run_as": [],
"metadata": {
"_reserved": true
},
"transient_metadata": {
"enabled": true
}
},
"watcher_admin": {
"cluster": [
"manage_watcher"
],
"indices": [
{
"names": [
".watches",
".triggered_watches",
".watcher-history-*"
],
"privileges": [
"read"
]
}
],
"run_as": [],
"metadata": {
"_reserved": true
},
"transient_metadata": {
"enabled": true
}
},
....
}
And now I'd like to put it back into the cluster (or another one). I cannot just PUT it to _xpack/security/role. If I understand correctly, I have to use bulk:
$ curl --user elastic:password https://elastic:9200/_xpack/security/_bulk?pretty -XPOST -H 'Content-Type: application/json' -d '
{"index":{"_index": "_xpack/security/role"}}
{"ROOOOLE" : {"cluster" : [ ],"indices" : [{"names" : [".kibana*"],"privileges" : ["read","view_index_metadata"]}],"run_as" : [ ],"metadata" : {"_reserved" : true},"transient_metadata" : {"enabled" : true}}}
'
But I get an error:
{
"took" : 3,
"errors" : true,
"items" : [
{
"index" : {
"_index" : "_xpack/security/role",
"_type" : "security",
"_id" : null,
"status" : 400,
"error" : {
"type" : "invalid_index_name_exception",
"reason" : "Invalid index name [_xpack/security/role], must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]",
"index_uuid" : "_na_",
"index" : "_xpack/security/role"
}
}
}
]
}
Is there a way to do this easily? Or do I have to parse the JSON, and put each role one by one to:
_xpack/security/role/rolexxx
_xpack/security/role/roleyyy
...
More globally, is there a way to get all data of an index (config index), then upload it back or put it into another cluster?
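A rough sketch of the role-by-role approach described above (assuming jq is available; roles flagged "_reserved": true are built-in and will be rejected, so they would need to be skipped):
# export every role, then PUT each one back individually
curl -s --user elastic:password "https://elastic:9200/_xpack/security/role" > roles.json
for role in $(jq -r 'keys[]' roles.json); do
  jq -c --arg r "$role" '.[$r]' roles.json |
    curl --user elastic:password -XPUT "https://elastic:9200/_xpack/security/role/$role" \
         -H 'Content-Type: application/json' -d @-
done
The same pattern works against a different cluster by changing the host in the PUT; fields the role API does not accept may also need stripping from each role body first.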

Morphline config file not indexing Avro nested data

I am generating an index for my Avro data in Solr. Indexes only get generated for data elements at the root level, not for nested ones.
Below is a sample of my Avro schema (not all of it is included):
{
"type" : "record",
"name" : "abcd",
"namespace" : "xyz",
"doc" : "Schema Definition for Low Fare Search Shopping Request/Response Data",
"fields" : [ {
"name" : "ShopID",
"type" : "string"
}, {
"name" : "RqSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RqTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsSysTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "RsTimestamp",
"type" : [ "null", "string" ],
"default" : null
}, {
"name" : "Request",
"type" : {
"type" : "record",
"name" : "RequestStruct",
"fields" : [ {
"name" : "TransactionID",
"type" : [ "string", "null" ]
}, {
"name" : "AgentSine",
"type" : [ "string", "null" ]
}, {
"name" : "CabinPref",
"type" : [ {
"type" : "array",
"items" : {
"type" : "record",
"name" : "CabinStruct",
"fields" : [ {
"name" : "Cabin",
"type" : [ "string", "null" ]
}, {
"name" : "PrefLevel",
"type" : [ "string", "null" ]
} ]
}
}, "null" ]
}, {
"name" : "CountryCode",
"type" : [ "string", "null" ]
}, {
"name" : "PassengerStatus",
"type" : [ "string", "null" ]
}
...
How do I refer to "TransactionID" in my morphline config file? I tried all the options, but it does not generate an index for the nested data elements.
Below is a sample of my morphline config file.
extractAvroPaths {
  flatten : true
  paths : {
    ShopID : /ShopID
    RqSysTimestamp : /RqSysTimestamp
    RqTimestamp : /RqTimestamp
    RsSysTimestamp : /RsSysTimestamp
    RsTimestamp : /RsTimestamp
    TransactionID : "/Request/RequestStruct/TransactionID"
    AgentSine : "/Request/RequestStruct/AgentSine"
    Cabin : /Cabin
    PrefLevel : /PrefLevel
    CountryCode : /CountryCode
    FrequentFlyerStatus : /FrequentFlyerStatus
  }
}
The toAvro command expects a java.util.Map as input when converting to a nested Avro record, so this is my solution:
morphlines: [
{
id: convertJsonToAvro
importCommands: [ "org.kitesdk.**" ]
commands: [
# read the JSON blob
{ readJson: {} }
# java code: copy the parsed JSON's top-level entries onto the morphline record
{
java {
imports : """
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.kitesdk.morphline.base.Fields;
import java.io.IOException;
import java.util.Set;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
"""
code : """
String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
ObjectMapper mapper = new ObjectMapper();
Map<String, Object> map = null;
try {
map = (Map<String, Object>)mapper.readValue(jsonStr, Map.class);
} catch (IOException e) {
e.printStackTrace();
}
Set<String> keySet = map.keySet();
for (String o : keySet) {
record.put(o, map.get(o));
}
return child.process(record);
"""
}
}
# convert the extracted fields to an avro object
# described by the schema in this field
{ toAvro {
schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc
} }
#{ logInfo { format : "loginfo: {}", args : ["#{}"] } }
# serialize the object as avro
{ writeAvroToByteArray: {
format: containerlessBinary
} }
]
}
]