MultiLevel JSON in PIG - apache-pig

I am new to PIG scripting and working with JSONs. I am in the need of parsing multi-level json files in PIG. Say,
{
"firstName": "John",
"lastName" : "Smith",
"age" : 25,
"address" :
{
"streetAddress": "21 2nd Street",
"city" : "New York",
"state" : "NY",
"postalCode" : "10021"
},
"phoneNumber":
[
{
"type" : "home",
"number": "212 555-1234"
},
{
"type" : "fax",
"number": "646 555-4567"
}
]
}
I am able to parse a single level json through JsonLoader() and do join and other operations and get the desired results as JsonLoader('name:chararray,field1:int .....');
Is it possible to parse the above mentioned JSON file using the built-in JsonLoader() function of PIG 0.10.0. If it is. Please explain me how it is done and accessing fields of the particular JSON?

You can handle nested json loading with Twitter's Elephant Bird: https://github.com/kevinweil/elephant-bird
a = LOAD 'file3.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
This will parse the JSON into a map http://pig.apache.org/docs/r0.11.1/basic.html#map-schema the JSONArray gets parsed into a DataBag of maps.

It is possible by creating your own UDF. A simple UDF example is shown in below link
http://pig.apache.org/docs/r0.9.1/udf.html#udf-java

C = load 'path' using JsonLoader('firstName:chararray,lastName:chararray,age:int,address:(streetAddress:chararray,city:chararray,state:chararray,postalCode:chararray),
phoneNumber:{(type:chararray,number:chararray)}')

Related

Mongodb query problem, how to get the matching items of the $or operator

Thank you for first.
MongoDB Version:4.2.11
I have a piece of data like this:
{
"name":...,
...
"administration" : [
{"name":...,"job":...},
{"name":...,"job":...}
],
"shareholder" : [
{"name":...,"proportion":...},
{"name":...,"proportion":...},
]
}
I want to match some specified data through regular expressions:
For a example:
db.collection.aggregate([
{"$match" :
{
"$or" :
[
{"name" : {"$regex": "Keyword"}}
{"administration.name": {"$regex": "Keyword"}},
{"shareholder.name": {"$regex": "Keyword"}},
]
}
},
])
I want to set a flag when the $or operator successfully matches any condition, which is represented by a custom field, for example:{"name" : {"$regex": "Keyword"}}Execute on success:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" : "name"
}
},
{"administration.name" : {"$regex": "Keyword"}}Execute on success:"__regex_type__" : "administration.name"
I try do this:
{"$project" :
{
"_id":false,
"name" : true,
"__regex_type__" :
{
"$switch":
{
"branches":
[
{"case": {"$regexMatch":{"input":"$name","regex": "Keyword"}},"then" : "name"},
{"case": {"$regexMatch":{"input":"$administration.name","regex": "Keyword"}},"then" : "administration.name"},
{"case": {"$regexMatch":{"input":"$shareholder.name","regex": "Keyword"}},"then" : "shareholder.name"},
],
"default" : "Other matches"
}
}
}
},
But $regexMatch cannot match the array,I tried to use $unwind again, but returned the number of many array members, which did not meet my starting point.
I want to implement the same function as mysql this SQL statement in mongodb, like this:
SELECT name,administration.name,shareholder.name,(
CASE
WHEN name REGEXP("Keyword") THEN "name"
WHEN administration.name REGEXP("Keyword") THEN "administration.name"
WHEN shareholder.name REGEXP("Keyword") THEN "shareholder.name"
END
)AS __regex_type__ FROM db.mytable WHERE
name REGEXP("Keyword") OR
shareholder.name REGEXP("Keyword") OR
administration.name REGEXP("Keyword");
Maybe this method is stupid, but I don’t have a better solution.
If you have a better solution, I would appreciate it!!!
Thank you!!!
Since $regexMatch does not handle arrays, use $filter to filter individual array elements with $regexMatch, then use $size to see how many elements matched.
[{"$match"=>{"$or"=>[{"a"=>"test"}, {"arr.a"=>"test"}]}},
{"$project"=>
{"a"=>1,
"arr"=>1,
"src"=>
{"$switch"=>
{"branches"=>
[{"case"=>{"$regexMatch"=>{"input"=>"$a", "regex"=>"test"}},
"then"=>"a"},
{"case"=>
{"$gte"=>
[{"$size"=>
{"$filter"=>
{"input"=>"$arr.a",
"cond"=>
{"$regexMatch"=>{"input"=>"$$this", "regex"=>"test"}}}}},
1]},
"then"=>"arr.a"}],
"default"=>"def"}}}}]
[{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ad'), "a"=>"test", "src"=>"a"},
{"_id"=>BSON::ObjectId('5ffb2df748966813f82f15ae'),
"arr"=>[{"a"=>"test"}],
"src"=>"arr.a"}]

How to extract the field from JSON object with QueryRecord

I have been struggling with this problem for a long time. I need to create a new JSON flowfile using QueryRecord by taking an array (field ref) from input JSON field refs and skip the object field as shown in example below:
Input JSON flowfile
{
"name": "name1",
"desc": "full1",
"refs": {
"ref": [
{
"source": "source1",
"url": "url1"
},
{
"source": "source2",
"url": "url2"
}
]
}
}
QueryRecord configuration
JSONTreeReader setup as Infer Schema and JSONRecordSetWriter
select name, description, (array[rpath(refs, '//ref[*]')]) as sources from flowfile
Output JSON (need)
{
"name": "name1",
"desc": "full1",
"references": [
{
"source": "source1",
"url": "url1"
},
{
"source": "source2",
"url": "url2"
}
]
}
But got error:
QueryRecord Failed to write MapRecord[{references=[Ljava.lang.Object;#27fd935f, description=full1, name=name1}] with schema ["name" : "STRING", "description" : "STRING", "references" : "ARRAY[STRING]"] as a JSON Object due to java.lang.ClassCastException: null
Try the following approach, in your case it shoud work:
1) Read your JSON field fully (I imitated it with GenerateFlowFile processor with your example)
2) Add EvaluateJsonPath processor which will put 2 header fileds (name, desc) into the attributes:
3) Add SplitJson processor which will split your JSON byt refs/ref/ groups (split by "$.refs.ref"):
4) Add ReplaceText processor which will add you header fields (name, desc) to the split lines (replace "[{]" value with "{"name":"${json.name}","desc":"${json.desc}","):
5) It`s done:
Full process in my demo case:
Hope this helps.
Solution!: use JoltTransformJSON to transform JSON by Jolt specification. About this specification.

How to pass json parameters in jmeter?

In jmeter, I want to pass dynamic parameters. For simple json its easy to put ${value1} but if json structure is complex like array or with multiple values then what is the proper method to pass parameter dynamically. Please refer below json.
Below is json with parameter :
{
"squadName": "Super hero squad",
"homeTown": "Metro City",
"formed": 2016,
"secretBase": "Super tower",
"active": true,
"members": [
{
"name": "Molecule Man",
"age": 29,
"secretIdentity": "Dan Jukes",
"powers": [
"Radiation resistance",
"Turning tiny",
"Radiation blast"
]
},
{
"name": "Madame Uppercut",
"age": 39,
"secretIdentity": "Jane Wilson",
"powers": [
"Million tonne punch",
"Damage resistance",
"Superhuman reflexes"
]
},
{
"name": "Eternal Flame",
"age": 1000000,
"secretIdentity": "Unknown",
"powers": [
"Immortality",
"Heat Immunity",
"Inferno",
"Teleportation",
"Interdimensional travel"
]
}
]
}
=======
Now I have used below method to send parameter through csv config file.
Is there any other simple method to pass parameter through variables in Jmeter for complex json (5-6 level with array data) ?
CSV DATA config is the best to parameterize your test data.
If you want to customize the way you want to pick values from CSV you can use BeanShell /JSR223 sampler
here is one article that shows how to pick random values from CSV data config.

Filter an object array to modify json with circe

I am evaluating Circe and couldn't find out how to use filter for arrays to transform a JSON. I read the guide on its website and API doc, still no clue. Help much appreciated.
Sample data:
{
"Department" : "HR",
"Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age" : 30 }]
}
Task:
How to use a filter for Employees to transform the JSON to another JSON, for example, all employees with age older than 50?
For some reason I can't filter from data source before JSON is generated, in case you ask.
Thanks
One possible way of doing this is by
val data = """{"Department" : "HR","Employees" :[{ "name": "abc", "age": 25 }, {"name":"def", "age":30}]}"""
def ageFilter(j:Json): Json = j.withArray { x =>
Json.fromValues(x.filter(_.hcursor.downField("age").as[Int].map(_ > 26).getOrElse(false)))
}
val y: Either[ParsingFailure, Json] = parse(data).map( _.hcursor.downField("Employees").withFocus(ageFilter).top.get)
println(s"$y")

Mule:Dataweave Iteration not working

I am trying to take output from Salesforce & transform it to a json. here is my code:
%dw 1.0
%output application/json
payload map {
headerandlines:{ id : $.Id,
agreementLineID : $.LineItems__r.Id,
netPrice : $.LineItems__r.Price__c,
volume : $.Volume__c,
name : $.Name,
StartDate : $.Start_Date__c,
EndDate : $.End_Date__c,
poField : $.PO_Field__c,
ConsoleNumber : $.Console_Number__c,
Term : $.Term__c,
ownerID : $.OwnerId,
Unit : $.Unit__c,
siteNumber : $.Site_Num__c,
customerNumber : $.Customer_Num__c
}
}
input payload looks like this.. it is a collection of objects. Somehow after the transformation only the first object is sent & rest is clobbered.
[
{
"id": "DA0YAAW",
"LineID": [
"jGEAU",
"jBEAU",
"j6EAE"
],
"Price": [
"50000.0",
"12000.0",
"45000.0"
],
"netPrice": null,
"volume": null,
"name": " Test 2.24",
"StartDate": "2017-02-17",
"EndDate": "2018-02-17",
"poField": "123456",
"ConsoleNumber": "8888888",
"PaymentTerm": "thirty (30)",
"ownerID": “abcd”,
"OperatingUnit": " International Company",
"siteNumber": null,
"customerNumber": null
},
{
"id": "a37n0000000DAMAAA4",
"LineID": [
"JunEAE",
"JuiEAE",
"KdMEAU",
"JuYEAU"
],
"Price": [
"5000.0",
"8000.0",
"5000.0",
"5000.0"
],
"netPrice": null,
"volume": null,
"name": " Test 3.6",
"StartDate": "2017-03-06",
"EndDate": "2018-03-16",
"poField": "12345",
"ConsoleNumber": "123456-",
"PaymentTerm": "30 NET",
"ownerID": “dfgh”,
"OperatingUnit": ", inc.",
"siteNumber": null,
"customerNumber": null
},
….
]
When I call this code from the browser (using API testing) I get the complete payload with multiple objects. When I call this from another API I get only one 1 object indicating it is not looping through. I can confirm that the payload has multiple objects . Is there anything I am missing in terms of looping through this code to extract multiple objects? I assume that '$' notation is good enough for iteration.
#insaneyogi, your input is either incorrect or your dataweave is incorrect.
Here in the input you have specified id in the small. but in dataweave, it is mentioned in capital.
I think the problem here is with your Lineitem and Price type elements. They are collection within and element. In your data mapping $. will take care of the outer object. However, i think the mapping like LineItems__r.Price__c is not correct. It should have proper index , probably LineItems__r.Price__c[0]. Please try that and it should work. First change the input with single element for price or line-item and test.
It looks like the agreementLineID and netPrice are arrays and you need to loop through them with a map operator within the bigger outer map to get all the line items. That should work.