JSON Input in Pentaho Data Integration [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have JSON with a nested JSON array coming from MongoDB:
"{ "_id" : { ""$oid"" : ""54b76bce44ae90e9e919d6e1""} ,
"_class" : ""com.argusoft.hkg.nosql.model.HkEventDocument"" , "featureName" : "EVENT" ,
"instanceId" : 577 ,
"fieldValue" :
{ "reg_label_type" : "types" ,
"third_custom_field" : "tttttttttttttttt" ,"people_capacity1": "20"}
, "sectionList" :
[ { "sectionName" : "REGISTRATION" ,
"customFields" :
[ { "_id" : 577 , "fieldValue" :{ "multiselect1" : [ "[]"] , "datess" : { "$date" : "2015-01-16T18:30:00.000Z"}}}]} , { "sectionName" : "INVITATIONCARD" , "customFields" : [ ]}] , "franchiseId" : 2}";
I want to access the fieldValue fields in the JSON Input step. How can I do that?
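A rough sketch of how this could be configured, assuming the document above is exactly what the step receives: the JSON Input step selects each output field with a JSONPath expression in the Path column of its Fields tab, so the paths might look something like:

$.fieldValue.reg_label_type
$.fieldValue.people_capacity1
$.sectionList[*].customFields[*].fieldValue.datess

The [*] entries make the step emit one output row per array element; whether nested [*] paths resolve depends on the PDI version, so treat these as a starting point rather than a definitive answer.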

Related

I want to merge data using lodash [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last month.
[
{
"label": "google_da_desktop_1",
"data": [
0,
22.596473199939744,
32.932481810474904,
41.05015952885718,
47.99635538700579,
54.183927487925395,
59.82719604246855,
65.0546652236101,
69.95069924256464
],
"week": 1,
"group": "google_da_desktop",
"groupNo": 0
},
{
"label": "google_da_desktop_2",
"data": [
0,
22.596473199939744,
32.932481810474904,
41.05015952885718,
47.99635538700579,
54.183927487925395,
59.82719604246855,
65.0546652236101,
69.95069924256464
],
"week": 1,
"group": "google_da_desktop",
"groupNo": 0
}
]
I want to group these objects by their group field and sum their data arrays. Is there any way to do that?
I tried lodash groupBy, but it didn't work.
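For what it's worth, here is a minimal TypeScript sketch of one way this could be done with lodash; the Series interface and the sumByGroup helper are my own names, not from the question. It groups by the group field and then adds the data arrays element-wise:

import _ from "lodash";

interface Series {
  label: string;
  data: number[];
  week: number;
  group: string;
  groupNo: number;
}

// Group the rows by their "group" field, then sum the data arrays element-wise.
function sumByGroup(rows: Series[]) {
  const grouped = _.groupBy(rows, "group");
  return _.map(grouped, (items, group) => ({
    group,
    groupNo: items[0].groupNo,
    data: items.reduce<number[]>(
      (acc, item) => item.data.map((v, i) => v + (acc[i] ?? 0)),
      []
    ),
  }));
}

For the sample above this would return a single "google_da_desktop" entry whose data array is the element-wise sum of the two input arrays.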

Updating array field columns in BigQuery [duplicate]

This question already has an answer here:
Update values in struct arrays in BigQuery
(1 answer)
Closed 2 years ago.
Problem statement:
How do I update an array field in BigQuery?
Below is my table
Test_table
-------------------------------
file.fileName | file.count
-------------------------------
abc.txt. | 100
-------------------------------
From the above table, I need to update both the fileName and count fields.
Schema:
{
  "name": "file",
  "type": "record",
  "mode": "repeated",
  "fields": [
    { "name": "fileName", "type": "string", "mode": "nullable" },
    { "name": "count", "type": "string", "mode": "nullable" }
  ]
}
Can someone help me with how to execute an UPDATE query on this table?
Can't you do something like this?
update t
set file[safe_offset(1)].filename = ?,
file[safe_offset(1)].count = ?
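If BigQuery rejects assignments to individual array offsets like that, the usual workaround (and the approach in the duplicate linked above) is to rebuild the whole repeated field with ARRAY(SELECT AS STRUCT ...). A hedged sketch, reusing the table and column names from the question with made-up replacement values:

update Test_table
set file = array(
  -- to keep a field unchanged, select it from f instead of a literal, e.g. f.count
  select as struct 'new_name.txt' as fileName, '200' as count
  from unnest(file) as f
)
where true;

This rewrites every element of the file array; add a condition inside the subquery if only some elements should change.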

Return 0 instead of nothing for an attribute value that is not available in the collection

I have a document as follow:
{
"_id" : ObjectId("5491d65bf315c2726a19ffe0"),
"tweetID" : NumberLong(535063274220687360),
"tweetText" : "19 RT Toronto #SunNewsNetwork: WATCH: When it comes to taxes, regulations, and economic freedom, is Canada more \"American\" than America? http://t.co/D?",
"retweetCount" : 1,
"Added" : ISODate("2014-11-19T04:00:00.000Z"),
"tweetLat" : 0,
"tweetLon" : 0,
"url" : "http://t.co/DH0xj0YBwD ",
"sentiment" : 18
}
Now I want to get all documents like this where Added is between 2014-11-19 and 2014-11-23. Note that there might be no data for a given date, for example 2014-11-21, and that is where the problem starts: when that happens I want 0 as the sum of sentiment for that date instead of nothing being returned (I know I can check this in Java, but that is not reasonable). My code is as follows and works fine, except that for a date that is not available it returns nothing instead of 0:
// Match documents whose Added date falls between startDate and endDate
andArray.add(new BasicDBObject("Added", new BasicDBObject("$gte", startDate)));
andArray.add(new BasicDBObject("Added", new BasicDBObject("$lte", endDate)));
DBObject where = new BasicDBObject("$match", new BasicDBObject("$and", andArray));
stages.add(where);

// Group by the Added date and average the sentiment per date
DBObject groupFields = new BasicDBObject("_id", "$Added");
groupFields.put("value", new BasicDBObject("$avg", "$sentiment"));
DBObject groupBy = new BasicDBObject("$group", groupFields);
stages.add(groupBy);

// Rename the group _id to "Date" and keep only value and Date
DBObject project = new BasicDBObject("_id", 0);
project.put("value", 1);
project.put("Date", "$_id");
stages.add(new BasicDBObject("$project", project));

// Sort ascending by date
DBObject sort = new BasicDBObject("$sort", new BasicDBObject("Date", 1));
stages.add(sort);

AggregationOutput output = collectionG.aggregate(stages);
Now I want the value 0 for any date that is not present in the collection.
For example, consider 2014-11-21 in the following:
[ { "value" : 6.0 , "Date" : { "$date" : "2014-11-19T04:00:00.000Z"}} , { "value" : 20.0 , "Date" : { "$date" : "2014-11-20T04:00:00.000Z"}},{ "value" : 0 , "Date" : { "$date" : "2014-11-21T04:00:00.000Z"}}]
instead of:
[ { "value" : 6.0 , "Date" : { "$date" : "2014-11-19T04:00:00.000Z"}} , { "value" : 20.0 , "Date" : { "$date" : "2014-11-20T04:00:00.000Z"}}]
Is it possible to do that?
Why is checking in Java not reasonable? Setting average to 0 for 'nothing' is reasonable?
Depending on the context of your problem, one solution is for you to insert dummy records with 0 sentiment.
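Alternatively, since $group can only emit groups for dates that actually have documents, you can fill the gaps on the client after the aggregation runs. A minimal sketch with the legacy Java driver used in the question (it reuses the startDate, endDate and output variables from the code above, needs the java.util imports, and assumes startDate carries the same time-of-day as the stored Added values):

// Collect the aggregation results into a map keyed by date
Map<Date, Double> byDate = new HashMap<Date, Double>();
for (DBObject row : output.results()) {
    byDate.put((Date) row.get("Date"), ((Number) row.get("value")).doubleValue());
}

// Walk the requested range one day at a time, emitting 0.0 for missing days
List<DBObject> filled = new ArrayList<DBObject>();
Calendar day = Calendar.getInstance();
day.setTime(startDate);
while (!day.getTime().after(endDate)) {
    Double value = byDate.get(day.getTime());
    filled.add(new BasicDBObject("Date", day.getTime())
            .append("value", value != null ? value : 0.0));
    day.add(Calendar.DATE, 1);
}

filled then contains one entry per day in the range, with value 0.0 for the dates the pipeline skipped.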

JSON Bulk load with Apache Phoenix

I have a problem with loading data from JSON files. How can I load data from JSON files into a table in HBase?
Here is the JSON structure:
{ "_id" : { "$oid" : "53ba5e86eb07565b53374901"} , "_api_method" : "database.getSchools" , "id" : "0" , "date_insert" : "2014-07-07 11:47:02" , "unixdate" : 1404722822 , "city_id" : "1506490" , "response" : [ 1 , { "id" : 354053 , "title" : "шк. Аджамская"}]};
Help me please!
For your JSON format, you cannot use importtsv. I suggest you write a MapReduce job to parse your JSON data and put the data into HBase.
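As a rough illustration of that suggestion (plain HBase MapReduce, nothing Phoenix-specific), the mapper could look something like the sketch below. The class name JsonToHBaseMapper, the Jackson parser, and the column family "d" are my own assumptions, not anything from the question:

import java.io.IOException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Parses one JSON document per input line and emits an HBase Put keyed on _id.$oid.
public class JsonToHBaseMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        JsonNode doc = mapper.readTree(value.toString());
        byte[] rowKey = Bytes.toBytes(doc.get("_id").get("$oid").asText());
        Put put = new Put(rowKey);
        // Store a couple of the scalar fields; extend as needed for the rest.
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("city_id"),
                Bytes.toBytes(doc.get("city_id").asText()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("date_insert"),
                Bytes.toBytes(doc.get("date_insert").asText()));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}

The job would typically be wired up with TableMapReduceUtil so the emitted Puts land in the target HBase table.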

Merging 2 rows in a Pentaho Kettle transformation

In my KTR, the MongoDB JSON Input step gives the JSON as follows:
{ "_id" : { "$oid" : "525cf3a70fafa305d949ede0"} , "asset" :
"RO2500AS1" , "Salt Rejection" : "82%" , "Salt Passage" : "18%" ,
"Recovery" : "56.33%" , "Concentration Factor" : "2.3" , "status" :
"critical" , "Flow Alarm" : "High Flow"}
And one Table Input step which returns 2 rows:
In the Stream Lookup step, the key to look up is configured as asset = AssetName.
My final output is returning 2 JSONs:
{"data":[{"Estimated Cost":"USD
15","AssetName":"RO2500AS1","Description":"Pump
Maintenance","Index":1,"json":"{ \"_id\" : { \"$oid\" :
\"525cf3a70fafa305d949ede0\"} , \"asset\" : \"RO2500AS1\" , \"Salt
Rejection\" : \"82%\" , \"Salt Passage\" : \"18%\" , \"Recovery\" :
\"56.33%\" , \"Concentration Factor\" : \"2.3\" , \"status\" :
\"critical\" , \"Flow Alarm\" : \"High
Flow\"}","Type":"Service","DeadLine":"13 November 2013"}]}
{"data":[{"Estimated Cost":"USD
35","AssetName":"RO2500AS1","Description":"Heat
Sensor","Index":2,"json":"{ \"_id\" : { \"$oid\" :
\"525cf3a70fafa305d949ede0\"} , \"asset\" : \"RO2500AS1\" , \"Salt
Rejection\" : \"82%\" , \"Salt Passage\" : \"18%\" , \"Recovery\" :
\"56.33%\" , \"Concentration Factor\" : \"2.3\" , \"status\" :
\"critical\" , \"Flow Alarm\" : \"High
Flow\"}","Type":"Replacement","DeadLine":"26 November 2013"}]}
I want my final JSON output to merge the rows and show something like:
{"data": [{"Estimated Cost":"USD 15", "AssetName":"RO2500AS1",
"Description":"Pump Maintenance", "Index":1, "Type":"Service",
"DeadLine":"13 November 2013"}, {"Estimated Cost":"USD 35",
"AssetName":"RO2500AS1", "Description":"Heat Sensor", "Index":2,
"Type":"Replacement", "DeadLine":"26 November 2013"}],
"json":{ "_id" : "525cf3a70fafa305d949ede0"} , "asset" : "RO2500AS1"
, "Salt Rejection" : "82%" , "Salt Passage" : "18%" , "Recovery" :
"56.33%" , "Concentration Factor" : "2.3" , "status" : "critical" ,
"Flow Alarm" : "High Flow"}
which means merging the 2 rows.
Can anybody help, please?
You can use a Merge Join step after the Table Input. That will merge the rows from the MySQL output and you will have only one JSON as output.
You would want to use the Merge step for your purpose. Don't forget to sort the input streams.
Note: in this step rows are expected to be sorted on the specified key fields. When using the Sort step, this works fine. When you sort the data outside of PDI, you may run into issues with the internal case-sensitive/insensitive flag.