How to Condense & Nest a (CSV) Payload in DataWeave 2.0?

I have a CSV payload of TV programs and episodes that I want to transform (nest and condense) into JSON, with the following conditions:
Merge consecutive Program lines (that are not followed by an Episode line) into a single Program carrying the start date of the first instance and the sum of the durations.
Episode lines after a Program line are nested under that Program.
INPUT
Channel|Name|Start|Duration|Type
ACME|Broke Girls|2018-02-01T00:00:00|600|Program
ACME|Broke Girls|2018-02-01T00:10:00|3000|Program
ACME|S03_8|2018-02-01T00:13:05|120|Episode
ACME|S03_9|2018-02-01T00:29:10|120|Episode
ACME|S04_1|2018-02-01T00:44:12|120|Episode
ACME|Lost In Translation|2018-02-01T02:01:00|1800|Program
ACME|Lost In Translation|2018-02-01T02:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:30:00|1800|Program
ACME|Photon|2018-02-01T05:00:00|1800|Program
ACME|Photon|2018-02-01T05:30:00|1800|Program
ACME|Miles & Smiles|2018-02-01T06:00:00|3600|Program
ACME|S015_1|2018-02-01T06:13:53|120|Episode
ACME|S015_2|2018-02-01T06:29:22|120|Episode
ACME|S015_3|2018-02-01T06:46:28|120|Episode
ACME|Ice Age|2018-02-01T07:00:00|300|Program
ACME|Ice Age|2018-02-01T07:05:00|600|Program
ACME|Ice Age|2018-02-01T07:15:00|2700|Program
ACME|S01_4|2018-02-01T07:17:17|120|Episode
ACME|S01_5|2018-02-01T07:32:11|120|Episode
ACME|S01_6|2018-02-01T07:47:20|120|Episode
ACME|My Girl Friday|2018-02-01T08:00:00|3600|Program
ACME|S05_7|2018-02-01T08:17:28|120|Episode
ACME|S05_8|2018-02-01T08:31:59|120|Episode
ACME|S05_9|2018-02-01T08:44:42|120|Episode
ACME|Pirate Bay|2018-02-01T09:00:00|3600|Program
ACME|S01_1|2018-02-01T09:33:12|120|Episode
ACME|S01_2|2018-02-01T09:46:19|120|Episode
ACME|Broke Girls|2018-02-01T10:00:00|1200|Program
ACME|S05_3|2018-02-01T10:13:05|120|Episode
ACME|S05_4|2018-02-01T10:29:10|120|Episode
OUTPUT
{
  "programs": [
    {
      "StartTime": "2018-02-01T00:00:00",
      "Duration": 3600,
      "Name": "Broke Girls",
      "episode": [
        { "name": "S03_8", "startDateTime": "2018-02-01T00:13:05", "duration": 120 },
        { "name": "S03_9", "startDateTime": "2018-02-01T00:29:10", "duration": 120 },
        { "name": "S04_1", "startDateTime": "2018-02-01T00:44:12", "duration": 120 }
      ]
    },
    {
      "StartTime": "2018-02-01T06:00:00",
      "Duration": 3600,
      "Name": "Miles & Smiles",
      "episode": [
        { "name": "S015_1", "startDateTime": "2018-02-01T06:13:53", "duration": 120 },
        { "name": "S015_2", "startDateTime": "2018-02-01T06:29:22", "duration": 120 },
        { "name": "S015_3", "startDateTime": "2018-02-01T06:46:28", "duration": 120 }
      ]
    },
    {
      "StartTime": "2018-02-01T07:00:00",
      "Duration": 3600,
      "Name": "Ice Age",
      "episode": [
        { "name": "S01_4", "startDateTime": "2018-02-01T07:17:17", "duration": 120 },
        { "name": "S01_5", "startDateTime": "2018-02-01T07:32:11", "duration": 120 },
        { "name": "S01_6", "startDateTime": "2018-02-01T07:47:20", "duration": 120 }
      ]
    },
    {
      "StartTime": "2018-02-01T08:00:00",
      "Duration": 3600,
      "Name": "My Girl Friday",
      "episode": [
        { "name": "S05_7", "startDateTime": "2018-02-01T08:17:28", "duration": 120 },
        { "name": "S05_8", "startDateTime": "2018-02-01T08:31:59", "duration": 120 },
        { "name": "S05_9", "startDateTime": "2018-02-01T08:44:42", "duration": 120 }
      ]
    },
    {
      "StartTime": "2018-02-01T09:00:00",
      "Duration": 3600,
      "Name": "Pirate Bay",
      "episode": [
        { "name": "S01_1", "startDateTime": "2018-02-01T09:33:12", "duration": 120 },
        { "name": "S01_2", "startDateTime": "2018-02-01T09:46:19", "duration": 120 }
      ]
    },
    {
      "StartTime": "2018-02-01T10:00:00",
      "Duration": 1200,
      "Name": "Broke Girls",
      "episode": [
        { "name": "S05_3", "startDateTime": "2018-02-01T10:13:05", "duration": 120 },
        { "name": "S05_4", "startDateTime": "2018-02-01T10:29:10", "duration": 120 }
      ]
    }
  ]
}

Give this a try; comments are embedded:
%dw 2.0
output application/dw
var data = readUrl("classpath://data.csv","application/csv",{separator:"|"})
var firstProgram = data[0].Name
---
// Identify the programs by adding a pc (program counter) field to every row
(data reduce (e, acc = {l: firstProgram, c: 0, d: []}) -> do {
    var next = acc.l != e.Name and e.Type == "Program"
    var counter = if (next) acc.c + 1 else acc.c
    ---
    {
        l: if (next) e.Name else acc.l,
        c: counter,
        d: acc.d + {(e), pc: counter}
    }
}).d
// Group the rows by the identifier of individual programs
groupBy $.pc
// Keep just the groups of rows; throw away the program identifier keys
pluck $
// Throw away the programs with no episodes
filter ($.*Type contains "Episode")
// Iterate over the programs
map do {
    // Sum the program duration
    var d = $ dw::core::Arrays::sumBy (e) -> if (e.Type == "Program") e.Duration else 0
    // Get the episodes and do a little cleanup
    var es = $ map ($ - "pc") filter ($.Type == "Episode")
    ---
    // Form the desired structure
    {
        ($[0] - "pc" - "Duration"),
        Duration: d,
        Episode: es
    }
}
NOTE1: I stored the contents in a file and read it with readUrl; you will need to adjust this to wherever your data actually comes from (see the sketch right after this note).
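For instance, a minimal sketch of a payload-based variant (declaring the pipe separator as a reader property on the input directive is an assumption on my part; check the CSV reader properties your runtime version supports):
%dw 2.0
// The CSV arrives as the Mule message payload instead of being read from a file;
// the pipe separator is declared as a reader property here.
input payload application/csv separator="|"
output application/dw
var data = payload
var firstProgram = data[0].Name
---
data // replace with the full reduce/groupBy/map pipeline shown above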
NOTE2: If possible, you may want to rethink your inputs and organize them better.
NOTE3: Studio will show errors (at least Studio 7.5.1 does). They are false positives; the code runs.
NOTE4: There are lots of steps because of the non-trivial input. The code could potentially be optimized, but I have spent enough time on it; I'll leave the optimization to you or somebody else from the community.
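One last hedged sketch, not part of the original answer: the pipeline above keeps the CSV field names (Name, Start, Type) and emits Episode with a capital E, while the desired OUTPUT uses StartTime, Duration, Name and a lowercase episode array. If the exact keys matter, a final mapping step along these lines should finish the job (programs here is assumed to hold the result of the transformation above):
%dw 2.0
output application/json
var programs = [] // assumption: bind this to the result of the previous transformation
---
{
    programs: programs map (p) -> {
        StartTime: p.Start,
        Duration: p.Duration,
        Name: p.Name,
        episode: p.Episode map (e) -> {
            name: e.Name,
            startDateTime: e.Start,
            duration: e.Duration
        }
    }
}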

JSON element extraction from response based on scenario outline examples or external file

This is my API response. I want to extract the value of the Id based on the displayNumber. The displayNumber is given in the list of values in the Examples table / CSV file.
{
  "Acc": [
    {
      "Id": "2b765368696b3441673633325",
      "code": "SGD",
      "val": 406030.83,
      "displayNumber": "8957",
      "curval": 406030.83
    },
    {
      "Id": "4e676269685a73787472355776764b50717a4",
      "code": "GBP",
      "val": 22.68,
      "displayNumber": "1881",
      "curval": 22.68
    },
    {
      "Id": "526e666d65366e67626244626e6266467",
      "code": "SGD",
      "val": 38404.44,
      "displayNumber": "1004",
      "curval": 38404.44
    }
  ],
  "combinations": [
    {
      "displayNumber": "3444",
      "Code": "SGD",
      "Ids": [
        { "Id": "2b765368696b34416736333254462" },
        { "Id": "4e676269685a7378747235577" },
        { "Id": "526e666d65366e6762624d" }
      ],
      "destId": "3678434b643530456962435272d",
      "curval": 3.85
    },
    {
      "displayNumber": "8957",
      "code": "SGD",
      "Ids": [
        { "Id": "3678434b6435304569624357" },
        { "Id": "4e676269685a73787472355776764b50717a4" },
        { "Id": "526e666d65366e67626244626e62664679" }
      ],
      "destId": "2b765368696b344167363332544",
      "curval": 406030.83
    },
    {
      "displayNumber": "1881",
      "code": "GBP",
      "Ids": [
        { "Id": "3678434b643530456962435275" },
        { "Id": "2b765368696b3441673" },
        { "Id": "526e666d65366e67626244626e626" }
      ],
      "destId": "4e676269685a7378747d",
      "curval": 22.68
    }
  ]
}
Examples
|displayNumber|
|8957|
|3498|
|4943|
The expression below works if I hard-code the value:
* def tempid = response
* def fromAccount = get[0] tempid.Acc[?(#.displayNumber==8957)].Id
I'm not sure how to make this comparison value (e.g. 1881) a variable that can be read from the Examples table (scenario outline) or a CSV file. I went through the documentation, which recommends Karate filters or maps, but I was not able to follow how to implement them.
You almost got it :-). This is the way you want to solve it:
Scenario Outline: Testing SO question for Navneeth
* def tempid = response
* def fromAccount = get[0] tempid.Acc[?(#.displayNumber == <displayNumber>)]
* print fromAccount
Examples:
|displayNumber|
|8957|
|1881|
|3444|
Since displayNumber is a string in the response, you need to pass the placeholder quoted in the expression, as '<displayNumber>'.
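Putting it all together, a sketch (the displayNumbers.csv file and its single displayNumber column are my assumptions; Karate can feed Examples rows dynamically from a CSV placed next to the feature file):
Scenario Outline: look up Id by displayNumber
* def tempid = response
* def fromAccount = get[0] tempid.Acc[?(#.displayNumber == '<displayNumber>')].Id
* print fromAccount

Examples:
| read('displayNumbers.csv') |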

Low-Fare endpoint returning JSON with missing operating.carrierCode

The Low-fare endpoint just recently started returning JSON with some flight segments missing a key-value pair for operating.carrierCode, for example:
"operating": {
"number": “5898"
This was the second flight segment of the first result (data[0].offerItems[0].services[0].segments[1].flightSegment) when I ran this search yesterday:
https://test.api.amadeus.com/v1/shopping/flight-offers?origin=LON&destination=PAE&departureDate=2019-10-29&returnDate=2019-11-13&adults=1&nonStop=false&max=50
Here it is in context:
{
  "data": [
    {
      "type": "flight-offer",
      "id": "1564934270644-1482530186",
      "offerItems": [
        {
          "services": [
            {
              "segments": [
                {
                  "flightSegment": {
                    "departure": {
                      "iataCode": "LHR",
                      "terminal": "2",
                      "at": "2019-10-29T09:20:00Z"
                    },
                    "arrival": {
                      "iataCode": "SFO",
                      "terminal": "I",
                      "at": "2019-10-29T13:30:00-07:00"
                    },
                    "carrierCode": "SN",
                    "number": "9101",
                    "aircraft": {
                      "code": "777"
                    },
                    "operating": {
                      "carrierCode": "UA",
                      "number": "9101"
                    },
                    "duration": "0DT11H10M"
                  },
                  "pricingDetailPerAdult": {
                    "travelClass": "ECONOMY",
                    "fareClass": "K",
                    "availability": 4,
                    "fareBasis": "KLP5ULGT"
                  }
                },
                {
                  "flightSegment": {
                    "departure": {
                      "iataCode": "SFO",
                      "terminal": "3",
                      "at": "2019-10-29T16:15:00-07:00"
                    },
                    "arrival": {
                      "iataCode": "PAE",
                      "at": "2019-10-29T18:32:00-07:00"
                    },
                    "carrierCode": "UA",
                    "number": "5898",
                    "aircraft": {
                      "code": "E7W"
                    },
                    "operating": {
                      "number": "5898"
                    },
                    "duration": "0DT2H17M"
                  },
                  "pricingDetailPerAdult": {
                    "travelClass": "ECONOMY",
                    "fareClass": "K",
                    "availability": 9,
                    "fareBasis": "KLP5ULGT"
                  }
                }
              ]
…
Is this a known bug? It was pretty easy to write a workaround, but I was surprised that this data was missing since it had been working correctly for months.
This is a common issue with the operating carrier, even in cryptic mode:
FLT/DATE RTNG CKIN TM DEP ARR TM EQP ML DURA DIST
UA5898 Y 29OCT SFOPAE 3 415P 632P E7W G 2:17 711
SFOPAE OPERATED BY SKYWEST DBA UNITED EXPRES
>
While shopping with the enterprise APIs, you can use the same free text describing the schedule data, retrieved by flight number.
I don't know of any parameter to get it in the API response, though.
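As for the workaround the question mentions, a minimal client-side sketch (my own assumption of its shape, not from the original posts) is to fall back to the marketing carrier when the operating one is absent:
// Returns the operating carrier code, falling back to the marketing
// carrierCode of the same segment when operating.carrierCode is missing.
function operatingCarrier(flightSegment) {
    const operating = flightSegment.operating || {};
    return operating.carrierCode || flightSegment.carrierCode;
}
// For the second segment above this returns "UA" (the marketing carrier).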

Rally Lookback API doesn't retrieve records newer than 1 week

I'm running some queries with Rally Lookback API and it seems that revisions newer than 1 week are not being retrieved:
λ date
Wed, Nov 28, 2018 2:26:45 PM
using the query below:
{
  "ObjectID": 251038028040,
  "__At": "current"
}
results:
{
  "_rallyAPIMajor": "2",
  "_rallyAPIMinor": "0",
  "Errors": [],
  "Warnings": [
    "Max page size limited to 100 when fields=true"
  ],
  "GeneratedQuery": {
    "find": {
      "ObjectID": 251038028040,
      "$and": [
        {
          "_ValidFrom": {
            "$lte": "2018-11-21T14:44:34.694Z"
          },
          "_ValidTo": {
            "$gt": "2018-11-21T14:44:34.694Z"
          }
        }
      ],
      "_ValidFrom": {
        "$lte": "2018-11-21T14:44:34.694Z"
      }
    },
    "limit": 10,
    "skip": 0,
    "fields": true
  },
  "TotalResultCount": 1,
  "HasMore": false,
  "StartIndex": 0,
  "PageSize": 10,
  "ETLDate": "2018-11-21T14:44:34.694Z",
  "Results": [
    {
      "_id": "5bfe7e3c3f1f4460feaeaf11",
      "_SnapshotNumber": 30,
      "_ValidFrom": "2018-11-21T12:22:08.961Z",
      "_ValidTo": "9999-01-01T00:00:00.000Z",
      "ObjectID": 251038028040,
      "_TypeHierarchy": [
        -51001,
        -51002,
        -51003,
        -51004,
        -51005,
        -51038,
        46772408020
      ],
      "_Revision": 268342830516,
      "_RevisionDate": "2018-11-21T12:22:08.961Z",
      "_RevisionNumber": 53
    }
  ],
  "ThreadStats": {
    "cpuTime": "15.463705",
    "waitTime": "0",
    "waitCount": "0",
    "blockedTime": "0",
    "blockedCount": "0"
  },
  "Timings": {
    "preProcess": 0,
    "findEtlDate": 88,
    "allowedValuesDisambiguation": 1,
    "mongoQuery": 1,
    "authorization": 3,
    "suppressNonRequested": 0,
    "compressSnapshots": 0,
    "allowedValuesHydration": 0,
    "TOTAL": 93
  }
}
Bear in mind that this artifact has, as of now, 79 revisions, with the latest revision pointing to 11/21/2018 02:41 PM CST per the Revisions tab in Rally Central.
One other thing: if I run the query a couple of minutes later, the ETL date seems to update, as if some sort of indexing were being run:
{
  "_rallyAPIMajor": "2",
  "_rallyAPIMinor": "0",
  "Errors": [],
  "Warnings": [
    "Max page size limited to 100 when fields=true"
  ],
  "GeneratedQuery": {
    "find": {
      "ObjectID": 251038028040,
      "$and": [
        {
          "_ValidFrom": {
            "$lte": "2018-11-21T14:45:50.565Z"
          },
          "_ValidTo": {
            "$gt": "2018-11-21T14:45:50.565Z"
          }
        }
      ],
      "_ValidFrom": {
        "$lte": "2018-11-21T14:45:50.565Z"
      }
    },
    "limit": 10,
    ... rest of the response omitted
Is there any reason why the Lookback API wouldn't be processing current data, instead of lagging a week behind the latest records?
It appears that your workspace's data is currently being "re-built". The _ETLDate is the date of the most-current revision in the LBAPI database and should eventually catch up to the current revision's date.
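If you need to detect that situation programmatically, one option (a sketch of my own, relying only on the ETLDate field visible in the responses above, not on anything from the original answer) is to compare ETLDate against the current time:
// Flags a Lookback response whose snapshot data is lagging behind "now",
// e.g. while the workspace is being re-built.
function isStale(lbapiResponse, maxLagHours) {
    var lagMs = Date.now() - new Date(lbapiResponse.ETLDate).getTime();
    return lagMs > maxLagHours * 60 * 60 * 1000;
}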

How to visualize array of objects in Kibana?

I have a few documents of the below format.
Document1:
{
  "_index": "myIndex",
  "preferenceCount": [
    { "name": "apple", "count": 1 },
    { "name": "mango", "count": 1 },
    { "name": "apple", "count": 1 }
  ]
}
Document2:
{
  "_index": "myIndex",
  "preferenceCount": [
    { "name": "mango", "count": 1 },
    { "name": "mango", "count": 1 },
    { "name": "orange", "count": 1 }
  ]
}
I want to visualize this data aggregated in such a way that I get the graph below (sorry for not uploading a picture):
apple: 2 (sum of count for name = apple across documents in time range)
mango: 3
orange: 1
I tried
sum(preferenceCount.count) groupBy (preferenceCount.name.keyword)
But that sums the counts across documents and displays the graph below:
apple: 3
mango: 6
orange: 3
Please let me know how I might achieve this.
Thanks!
I don't know Kibana, but in Vega-Lite you can extract the data from a property of the JSON document via the format object:
{
  "data": {
    "format": {"type": "json", "property": "preferenceCount"},
    ...
  }
}
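A fuller sketch of the aggregation the asker wants (the URL and file name are illustrative assumptions; note that format.property extracts the array from a single JSON document, so multiple documents would have to be merged into one response first):
{
  "data": {
    "url": "myIndexDocs.json",
    "format": {"type": "json", "property": "preferenceCount"}
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "name", "type": "nominal"},
    "y": {"aggregate": "sum", "field": "count", "type": "quantitative"}
  }
}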

Nested "for loop" searches in SQL - Azure CosmosDB

I am using Cosmos DB and have a document with the following simplified structure:
{
  "id1": "123",
  "stuff": [
    {
      "id2": "stuff",
      "a": {
        "b": {
          "c": {
            "d": [
              {
                "e": [
                  {
                    "id3": "things",
                    "name": "animals",
                    "classes": [
                      { "name": "ostrich", "meta": 1 },
                      { "name": "big ostrich", "meta": 1 }
                    ]
                  },
                  {
                    "id3": "default",
                    "name": "other",
                    "classes": [
                      { "name": "green trees", "meta": 1 },
                      { "name": "trees", "score": 1 }
                    ]
                  }
                ]
              }
            ]
          }
        }
      }
    }
  ]
}
My issue is: I have an array of these documents and need to search name to see if it matches my search word. For example, I want both "big trees" and "trees" to be returned if a user types in "trees".
So currently I push every document into an array and do the following:
For each document
    for each stuff
        for each a.b.c.d[0].e
            for each classes
                var splice = name.split(' ')
                if (splice.includes(searchWord))
                    return id1, id2 and id3
In Cosmos DB I am using SQL with the following code:
client.queryDocuments(
    collection,
    `SELECT * FROM root r`
).toArray((err, results) => { stuff });
This effectively brings every document in my collection into an array so I can perform the manual search described above.
This is going to cause issues once I have 1,000s or 1,000,000s of documents in the array, and I believe I should be leveraging the search mechanics available within Cosmos itself. Is anyone able to help me work out what SQL query would perform this type of search?
Having searched everything, is it also possible to search only the 5 latest documents?
Thanks for any insight in advance!
1. Is anyone able to help me to work out what SQL query would be able to perform this type of function?
According to your sample and description, I suggest using ARRAY_CONTAINS in Cosmos DB SQL. Please refer to my sample:
sample documents:
[
  {
    "id1": "123",
    "stuff": [
      {
        "id2": "stuff",
        "a": {
          "b": {
            "c": {
              "d": [
                {
                  "e": [
                    {
                      "id3": "things",
                      "name": "animals",
                      "classes": [
                        { "name": "ostrich", "meta": 1 },
                        { "name": "big ostrich", "meta": 1 }
                      ]
                    },
                    {
                      "id3": "default",
                      "name": "other",
                      "classes": [
                        { "name": "green trees", "meta": 1 },
                        { "name": "trees", "score": 1 }
                      ]
                    }
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  },
  {
    "id1": "456",
    "stuff": [
      {
        "id2": "stuff2",
        "a": {
          "b": {
            "c": {
              "d": [
                {
                  "e": [
                    {
                      "id3": "things2",
                      "name": "animals",
                      "classes": [
                        { "name": "ostrich", "meta": 1 },
                        { "name": "trees", "meta": 1 }
                      ]
                    },
                    {
                      "id3": "default2",
                      "name": "other",
                      "classes": [
                        { "name": "green trees", "meta": 1 },
                        { "name": "trees", "score": 1 }
                      ]
                    }
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  },
  {
    "id1": "789",
    "stuff": [
      {
        "id2": "stuff3",
        "a": {
          "b": {
            "c": {
              "d": [
                {
                  "e": [
                    {
                      "id3": "things3",
                      "name": "animals",
                      "classes": [
                        { "name": "ostrich", "meta": 1 },
                        { "name": "big", "meta": 1 }
                      ]
                    },
                    {
                      "id3": "default3",
                      "name": "other",
                      "classes": [
                        { "name": "big trees", "meta": 1 }
                      ]
                    }
                  ]
                }
              ]
            }
          }
        }
      }
    ]
  }
]
query:
SELECT DISTINCT c.id1, stuff.id2, e.id3 FROM c
JOIN stuff IN c.stuff
JOIN d IN stuff.a.b.c.d
JOIN e IN d.e
WHERE ARRAY_CONTAINS(e.classes, {name: "trees"}, true)
   OR ARRAY_CONTAINS(e.classes, {name: "big trees"}, true)
output:
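(The original answer showed a screenshot here. Deriving it by hand from the three sample documents above, the result should look like:)
[
  { "id1": "123", "id2": "stuff", "id3": "default" },
  { "id1": "456", "id2": "stuff2", "id3": "things2" },
  { "id1": "456", "id2": "stuff2", "id3": "default2" },
  { "id1": "789", "id2": "stuff3", "id3": "default3" }
]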
2. Having searched everything, is it also possible to search the 5 latest documents?
Per my research, features like LIMIT are not supported in Cosmos DB so far; however, TOP is. So if you can add a sort field (such as a date or an id), you can use SQL along these lines:
SELECT TOP 5 * FROM c ORDER BY c.sort DESC
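For instance, every Cosmos DB document already carries the system property _ts (last-modified time in epoch seconds), so if that ordering is acceptable, the five most recently changed documents can be fetched without adding your own sort field:
SELECT TOP 5 * FROM c ORDER BY c._ts DESC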