Pentaho: Getting only one row from JSON input file

I am getting one JSON file from SFTP and trying to insert it into Oracle, but in the preview section I am getting only one row, and only one row is inserted into the table. I already tried changing the number of preview rows to 10000, but nothing worked.
{"postal_code":"XX","build_id":"XX","categories":[],"closed":false,"closed_reasons":[],"email":["XX"],"external_link":{"facebook":[],"yelp":[""]},"hq":false,"location":{"lat":xxxxx,"lon":xxxxx},"metro":"Cxxxxx, IL","naics_codes":[{"category_code":"XX","category_description":"XX"},{"category_code":"XX","category_description":"xxxxx "},{"category_code":"XX","category_description":"xxxxx "},{"category_code":xxxxx","category_description":"XX"}],"name":"XX","place_id":"XX","place_ids":["xxx","xx"],"sic_codes":[{"category_code":"XX","category_description":"XX"},{"category_code":"XX","category_description":"XX"}]}

Your example json is:
{
"postal_code":"XX",
"build_id":"XX",
"categories":[],
"closed":false,
"closed_reasons":[],
"email":["XX"],
"external_link":{
"facebook":[],
"yelp":[""]
},
"hq":false,
"location":{
"lat":xxxxx,
"lon":xxxxx
},
"metro":"Cxxxxx, IL",
"naics_codes":[
{
"category_code":"XX",
"category_description":"XX"
},
{
"category_code":"XX",
"category_description":"xxxxx "
},
{
"category_code":"XX",
"category_description":"xxxxx "
},
{
"category_code":xxxxx",
"category_description":"XX"
}
],
"name":"XX",
"place_id":"XX",
"place_ids":["xxx","xx"],
"sic_codes":[
{
"category_code":"XX",
"category_description":"XX"
},
{
"category_code":"XX",
"category_description":"XX"
}
]
}
So, if that is the complete response you are getting from the SFTP server, Pentaho is behaving correctly, because it is only one record. If you want PDI to recognize the fields inside that JSON and split out its content, you need to specify the path to each field in the "Path" column on the "Fields" tab of the "Json Input" step.
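As a rough sketch (field names taken from your sample; the exact paths depend on which values you need), the "Fields" tab could contain entries like:

Name         Path             Type
postal_code  $.postal_code    String
name         $.name           String
metro        $.metro          String
lat          $.location.lat   Number
lon          $.location.lon   Number

A path such as $.naics_codes[*].category_code returns one value per array element, so mixing array paths with the scalar ones above changes the row count and PDI may complain that the data structure is not the same; in that case, read the array fields in a separate JSON Input step.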

Try using the input json file as follows:
{
"data" : [
{"key":"val","key":"val"},
{"key":"val","key":"val"},
{"key":"val","key":"val"},
{"id":"666","name":"jnit"}
]
}
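With that layout, the "Fields" tab paths would be along the lines of:

id    $.data[*].id
name  $.data[*].name

and the step should then return one row per element of the data array.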

Related

How to use Aggregate in AppSmith

In my MongoDB database, I can't insert a new record into the history. I want to insert the phone and the start date.
My model in Mongo is this, and the code I wrote is this:
{
"aggregate": "dids",
"pipeline": [
{
"$match": {
"_id": ObjectId({{sPhone.selectedOptionValue}})
}
}
],
"cursor": {
"batchSize": 10
}
}
How do I insert $addFields into the pipeline segment?
"$addFields": {
"history.contract_id": {{tableOfContracts.selectedRow._id}},
"history.start_date": {{dtpkStart.selectedDate}}
}
Yes, it is written below the $match segment, and it throws the error ('A pipeline stage specification object must contain exactly one field').
Any solution?
To insert data, you can consult the Appsmith and MongoDB documentation at the following links.
Link to the explanation of mongodb methods: https://www.mongodb.com/docs/manual/reference/insert-methods/
Link to query syntax: https://docs.appsmith.com/reference/datasources/querying-mongodb/mongo-syntax
Query configuration: (screenshot not included)
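For completeness: the error 'A pipeline stage specification object must contain exactly one field' usually means $addFields was placed inside the same object as $match. Each stage must be its own object in the pipeline array, so a sketch of the corrected query (keeping the bindings from the question) would be:
{
  "aggregate": "dids",
  "pipeline": [
    { "$match": { "_id": ObjectId({{sPhone.selectedOptionValue}}) } },
    { "$addFields": {
        "history.contract_id": {{tableOfContracts.selectedRow._id}},
        "history.start_date": {{dtpkStart.selectedDate}}
    } }
  ],
  "cursor": { "batchSize": 10 }
}
Note that $addFields inside an aggregate command only shapes the documents returned by the query; it does not write anything back to the collection. To actually persist the new history fields you need an insert or update command, which is what the documentation linked above covers.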

How to convert Json to CSV and send it to big query or google cloud bucket

I'm new to NiFi and I want to convert a big amount of JSON data to CSV format.
This is what I am doing at the moment but it is not the expected result.
These are the steps:
Processors to create the access_token and send the request body using InvokeHTTP (this part works fine, I won't name the processors since it produces the expected result), and getting the response body in JSON.
Example of json response:
[
{
"results":[
{
"customer":{
"resourceName":"customers/123456789",
"id":"11111111"
},
"campaign":{
"resourceName":"customers/123456789/campaigns/222456422222",
"name":"asdaasdasdad",
"id":"456456546546"
},
"adGroup":{
"resourceName":"customers/456456456456/adGroups/456456456456",
"id":"456456456546",
"name":"asdasdasdasda"
},
"metrics":{
"clicks":"11",
"costMicros":"43068982",
"impressions":"2079"
},
"segments":{
"device":"DESKTOP",
"date":"2021-11-22"
},
"incomeRangeView":{
"resourceName":"customers/456456456/incomeRangeViews/456456546~456456456"
}
},
{
"customer":{
"resourceName":"customers/123456789",
"id":"11111111"
},
"campaign":{
"resourceName":"customers/123456789/campaigns/222456422222",
"name":"asdasdasdasd",
"id":"456456546546"
},
"adGroup":{
"resourceName":"customers/456456456456/adGroups/456456456456",
"id":"456456456546",
"name":"asdasdasdas"
},
"metrics":{
"clicks":"11",
"costMicros":"43068982",
"impressions":"2079"
},
"segments":{
"device":"DESKTOP",
"date":"2021-11-22"
},
"incomeRangeView":{
"resourceName":"customers/456456456/incomeRangeViews/456456546~456456456"
}
},
....etc....
]
}
]
Now I am using:
===>SplitJson ($[].results[])==>JoltTransformJSON with this spec:
[{
"operation": "shift",
"spec": {
"customer": {
"id": "customer_id"
},
"campaign": {
"id": "campaign_id",
"name": "campaign_name"
},
"adGroup": {
"id": "ad_group_id",
"name": "ad_group_name"
},
"metrics": {
"clicks": "clicks",
"costMicros": "cost",
"impressions": "impressions"
},
"segments": {
"device": "device",
"date": "date"
},
"incomeRangeView": {
"resourceName": "keywords_id"
}
}
}]
==>> MergeContent (here is the problem which I don't know how to fix)
Merge Strategy: Defragment
Merge Format: Binary Concatenation
Attribute Strategy: Keep Only Common Attributes
Maximum Number of Bins: 5 (I tried 10, same result)
Delimiter Strategy: Text
Header: [
Footer: ]
Demarcator: ,
What is the result I get?
I get a JSON file that has parts of the JSON data.
Example: I have 50k customer_ids in 1 JSON file, so I would like to send this data into a BigQuery table and have all the ids under the same field "customer_id".
The MergeContent uses the split JSON files and combines them, but I still get 10k customer_ids per file, i.e. I have 5 files and not 1 file with 50k customer_ids.
After the MergeContent I use ==>>ConvertRecord with these settings:
Record Reader JsonTreeReader (Schema Access Strategy: InferSchema)
Record Writer CsvRecordWriter (
Schema Write Strategy: Do Not Write Schema
Schema Access Strategy: Inherit Record Schema
CSV Format: Microsoft Excel
Include Header Line: true
Character Set UTF-8
)
==>> UpdateAttribute (custom prop: filename: ${filename}.csv) ==>> PutGCSObject (and put the data into the Google Cloud bucket; this step works fine, I am able to put files there).
With this approach I am UNABLE to send data to BigQuery. (After MergeContent I tried using PutBigQueryBatch and used this command in the bq shell to get the schema I need:
bq show --format=prettyjson some_data_set.some_table_in_that_data_set | jq '.schema.fields'
I filled in all the fields as needed, and for Load file type I tried NEWLINE_DELIMITED_JSON, or CSV when I converted it to CSV. I am not getting errors, but no data is uploaded into the table.)
What am I doing wrong? I basically want to map the data in such a way that each field's data will be under the same field name.
The trick you are missing is using Records.
Instead of using X>SplitJson>JoltTransformJson>Merge>Convert>X, try just X>JoltTransformRecord>X with a JSON Reader and a CSV Writer. This skips a lot of inefficiency.
If you really need to split (and you should avoid splitting and merging unless totally necessary), you can use MergeRecord instead - again with a JSON Reader and CSV Writer. This would make your flow X>Split>Jolt>MergeRecord>X.
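For reference, a sketch (using the placeholder values from the sample response) of what one record should look like after the shift spec in the question, and the kind of CSV line the writer would then emit; with an inferred schema the actual column order may differ:
{
  "customer_id": "11111111",
  "campaign_id": "456456546546",
  "campaign_name": "asdaasdasdad",
  "ad_group_id": "456456456546",
  "ad_group_name": "asdasdasdasda",
  "clicks": "11",
  "cost": "43068982",
  "impressions": "2079",
  "device": "DESKTOP",
  "date": "2021-11-22",
  "keywords_id": "customers/456456456/incomeRangeViews/456456546~456456456"
}
customer_id,campaign_id,campaign_name,ad_group_id,ad_group_name,clicks,cost,impressions,device,date,keywords_id
11111111,456456546546,asdaasdasdad,456456456546,asdasdasdasda,11,43068982,2079,DESKTOP,2021-11-22,customers/456456456/incomeRangeViews/456456546~456456456
Once every record is flat like this, PutBigQueryBatch with CSV or NEWLINE_DELIMITED_JSON and a matching table schema should load all 50k rows into the same customer_id column.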

Get values from JSON using JSON_VALUE in SQL Server

I have the JSON below (sorry, I don't know how to format it!) and I am struggling to understand how to extract values at the different levels.
My code so far is this, which brings back the data required for the first few columns:
,JSON_VALUE(jsonstring,'$[0].addPoint') Addpoint
,JSON_VALUE(jsonstring,'$[0].department') department
,JSON_VALUE(jsonstring,'$[0].subBuilding') subBuilding
,JSON_VALUE(jsonstring,'$[0].buildingNumber') Buildingnumber
,JSON_VALUE(jsonstring,'$[0].buildingGroup') buildingGroup
However, I am not sure how I would get the columns below:
"mpan"
"serialnumber"
Can someone advise me as to what I am missing here? I haven't worked with JSON before; I have googled but can't find a definitive solution.
[
{
"addPoint":null,
"department":null,
"subBuilding":null,
"buildingNumber":"1",
"buildingName":null,
"buildingGroup":null,
"poBox":null,
"subStreet":"The Arches",
"subLocality":null,
"stateRegion":"Lancashire",
"subAdministrativeArea":null,
"administrativeArea":null,
"superAdministrativeArea":null,
"countryCode":"GBR",
"countryName":null,
"dpsZipPlus":"1B5",
"formattedAddress":"TEST,,MANC,Lancashire,66666",
"welshSubStreet":null,
"welshStreet":null,
"welshSubLocality":null,
"welshLocality":null,
"welshTown":null,
"geographicInformation":null,
"additionalItems":{
"item":[
{
"key":"DATASOURCE",
"value":"tu_REGISTER"
}
],
"tmp":null
},
"groupedAdditionalItems":null,
"persons":null,
"uprn":null,
"lpi":null,
"blpu":null,
"streetDescriptor":null,
"streetInformation":null,
"companyInformation":null,
"dnbCompanyInformation":null,
"onsPointerInformation":null,
"classification":null,
"osAl2Toid":null,
"osItnToid":null,
"osTopoToid":null,
"voaCtRecord":null,
"voaNdrRecord":null,
"apOSAPR":null,
"rmUDPRN":"2744498",
"mrOccCountSpecified":false,
"alias":null,
"utilitiesInformation":{
"fuelType":1,
"fuelTypeSpecified":true,
"gasInformation":null,
"electricityInformation":{
"meterPoint":[
{
"mpan":"162558070",
"meter":[
{
"serialNumber":"D07W05001",
"type":"N"
}
],
"profileType":"02",
"timeSwitchCode":"811",
"lineLossFactorId":"531",
"standardSettlementConfiguration":"0151",
"energisationStatus":"E",
"energisationEffectiveFromDate":{
"day":5,
"daySpecified":true,
"month":12,
"monthSpecified":true,
"year":2014,
"yearSpecified":true
},
"distributorId":"16",
"gspid":"_G"
}
],
"tmp":null
},
"tmp":null
},
"organisation":null,
"street":"Clive Street",
"town":"MANCH",
"postCode":"r4d 1ES",
"locality":null
}]
You can use the following paths:
$[0].utilitiesInformation.electricityInformation.meterPoint[0].mpan
$[0].utilitiesInformation.electricityInformation.meterPoint[0].meter[0].serialNumber
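Plugged into the same pattern as the columns you already have (jsonstring being the column used in the question), that would look roughly like this:
,JSON_VALUE(jsonstring,'$[0].utilitiesInformation.electricityInformation.meterPoint[0].mpan') mpan
,JSON_VALUE(jsonstring,'$[0].utilitiesInformation.electricityInformation.meterPoint[0].meter[0].serialNumber') serialNumber
Bear in mind that JSON_VALUE only returns the single scalar at the exact path you give it; if an address can have several meterPoint or meter entries and you need all of them, you would have to switch to OPENJSON with CROSS APPLY instead.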

Is there a way to use the graphLookup aggregation pipeline stage for arrays?

I am currently working on an application that uses MongoDB as the data repository. I am mainly concerned with the graphLookup query used to establish links between different people based on what flights they took. My document contains an array field that in turn contains key-value pairs. I need to establish the links based on one of the key:value pairs of that array.
I have already tried some aggregation pipeline queries with $graphLookup as one of the stages and they have all worked fine. But now that I am trying to use it with an array, I am drawing a blank.
Below is the array field from the first document:
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1550932676000,
"arrivalDateTimeMillis":1551019076000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA007_1550932676000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_8",
"departureMonthlyTemporalSpatialWindow":"DOH_2",
"arrivalWeeklyTemporalSpatialWindow":"LHR_8",
"arrivalMonthlyTemporalSpatialWindow":"LHR_2"
}
]
The other document has the field below:
"movementSegments":[
{
"carrierCode":"MO269",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
},
{
"carrierCode":"MO270",
"departureDateTimeMillis":1548254276000,
"arrivalDateTimeMillis":1548340676000,
"departurePort":"DOH",
"arrivalPort":"LHR",
"departurePortText":"HAMAD INTERNATIONAL AIRPORT",
"arrivalPortText":"LONDON HEATHROW",
"serviceNameText":"",
"serviceKey":"BA003_1548254276000",
"departurePortLatLong":"25.273056,51.608056",
"arrivalPortLatLong":"51.4706,-0.461941",
"departureWeeklyTemporalSpatialWindow":"DOH_4",
"departureMonthlyTemporalSpatialWindow":"DOH_1",
"arrivalWeeklyTemporalSpatialWindow":"LHR_4",
"arrivalMonthlyTemporalSpatialWindow":"LHR_1"
}
]
And I am running the query below:
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'carrierCode',
connectToField: 'carrierCode',
as: 'carrier_connections'
}
}
])
The above query creates an array field in the document, but there are no values in it. As per the expectation, both my documents should get linked based on the carrier number.
Just to be clear about the query, the documents contain an eventId field, and the match pipeline returns one document to me after the match stage.
Well, I don't know how I missed it, but here is the solution to my problem, which gives me the required results. The fix is that connectFromField and connectToField must use the full dotted path to the field inside the array (movementSegments.carrierCode); $graphLookup then matches against every element of that array:
db.person_events.aggregate([
{ $match: { eventId: "22446688" } },
{
$graphLookup: {
from: 'person_events',
startWith: '$movementSegments.carrierCode',
connectFromField: 'movementSegments.carrierCode',
connectToField: 'movementSegments.carrierCode',
as: 'carrier_connections'
}
}
])
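With the full dotted paths, the second document now matches on carrierCode MO269, so the output looks something like the abridged sketch below (since both documents live in the same collection, the starting document itself can also appear in the results):
{
  "eventId": "22446688",
  "movementSegments": [ { "carrierCode": "MO269", "serviceKey": "BA007_1550932676000" } ],
  "carrier_connections": [
    { "movementSegments": [ { "carrierCode": "MO269" }, { "carrierCode": "MO270" } ] }
  ]
}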

Matching values from response to values from DB

My response looks as follows:
{
"data":[
{
"foo":bar1
"user_email":"user#user.com",
"user_id":1
},
{
"foo":bar2
"user_email":"user#user.com",
"user_id":1
}
]
}
* def DBOutput = #fetching values from DB
* match response.data[*].foo contains [DBOutput1[0][0],DBOutput1[1][0]]
DBOutput1 has data as follows: [["bar1"],["bar2"]]
This match fails; for some reason the value passed into the expected list in the match statement is the literal text DBOutput1[0][0].
This is the error I am getting: actual: ["bar1","bar2"], expected: 'DBOutput1[0][0]'
You have some seriously malformed JSON in your example above. The below snippet works, just paste it into a new Scenario:
* def response =
"""
{
"data":[
{
"foo": "bar1",
"user_email":"user#user.com",
"user_id":1
},
{
"foo": "bar2",
"user_email":"user#user.com",
"user_id":1
}
]
}
"""
* match response.data[*].foo contains ['bar1', 'bar2']
Now it is up to you to fix your test; without knowing what your DBOutput is, no one can help further.
I iterated over the DBOutput and stored the values in a new list. I then matched the response against the list and it worked:
match response.data[*].foo contains ListFromDB
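A minimal sketch of that iteration in Karate (assuming DBOutput1 really has the shape [["bar1"],["bar2"]] described above):
* def DBOutput1 = [['bar1'], ['bar2']]
# take the first element of each row so the nested lists become one flat list
* def ListFromDB = karate.map(DBOutput1, function(x){ return x[0] })
# ListFromDB is now ['bar1', 'bar2']
* match response.data[*].foo contains ListFromDB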