Transform array of records to summary or pivot using Ramda.js

[
{
"door_id": 324,
"action": "door open",
"timestamp": "2018-03-30 10:34:44",
"date": "2018-03-30"
},
{
"door_id": 324,
"action": "door close",
"timestamp": "2018-03-30 10:39:44",
"date": "2018-03-30"
},
{
"door_id": 324,
"action": "door open",
"timestamp": "2018-03-30 10:59:44",
"date": "2018-03-30"
},
{
"door_id": 325,
"action": "door open",
"timestamp": "2018-03-31 14:59:44",
"date": "2018-03-31"
},
{
"door_id": 325,
"action": "door close",
"timestamp": "2018-03-31 15:00:44",
"date": "2018-03-31"
}
]
I'm trying to transform this array of objects into the expected format below using Ramda.js.
The open and close actions will always come in sequence, but they won't necessarily make a complete set (e.g. there's a log for the door opening but no log for it closing, because the door is still open).
I'd prefer a step-by-step approach using mappers/partial functions.
const expected = [
{
"door_id": 324,
"date": "2018-03-30",
"status" : "Open",
"actions_set_count": 2,
"actions": [
{
"open": "2018-03-30 10:34:44",
"close": "2018-03-30 10:39:44",
"duration": 300
},
{
"open": "2018-03-30 10:59:44",
"close": null,
"duration": null
}
]
},
{
"door_id": 325,
"date": "2018-03-31",
"status" : "Closed",
"actions_set_count": 1,
"actions": [
{
"open": "2018-03-30 14:59:44",
"close": "2018-03-30 15:00:44",
"duration": 60
}
]
}
]
Here is what I have done so far, but it's far from complete:
const isOpen = R.propEq('action','door open')
const isClosed = R.propEq('action','door close')
R.pipe(
R.groupBy(R.prop('date')),
R.map(R.applySpec({
"date": R.pipe(R.head(), R.prop('date')),
"door_id": R.pipe(R.head(), R.prop('door_id')),
"open" : R.filter(isOpen),
"close" : R.filter(isClosed),
"sets": R.zip(R.filter(isOpen),R.filter(isClosed))
})),
)(logs)

In a transformation like this, when I can't think of something elegant, I fall back on reduce. Using groupBy (and, if necessary, sortBy) and values, we can put the data together in an order that allows us to then do a straightforward -- if a bit tedious -- reduction on it.
const duration = (earlier, later) =>
(new Date(later) - new Date(earlier)) / 1000
const transform = pipe(
groupBy(prop('door_id')),
map(sortBy(prop('timestamp'))), // Perhaps unnecessary, if data is already sorted
values,
map(reduce((
{actions, actions_set_count},
{door_id, action, timestamp, date}
) => ({
door_id,
date,
...(action == "door open"
? {
status: 'Open',
actions_set_count: actions_set_count + 1,
actions: actions.concat({
open: timestamp,
close: null,
duration: null
})
}
: {
status: 'Closed',
actions_set_count,
actions: [
...init(actions),
{
...last(actions),
close: timestamp,
duration: duration(last(actions).open, timestamp)
}
]
}
)
}), {actions: [], actions_set_count: 0}))
)
const doors = [
{door_id: 324, action: "door open", timestamp: "2018-03-30 10:34:44", date: "2018-03-30"},
{door_id: 324, action: "door close", timestamp: "2018-03-30 10:39:44", date: "2018-03-30"},
{door_id: 324, action: "door open", timestamp: "2018-03-30 10:59:44", date: "2018-03-30"},
{door_id: 325, action: "door open", timestamp: "2018-03-31 14:59:44", date: "2018-03-31"},
{door_id: 325, action: "door close", timestamp: "2018-03-31 15:00:44", date: "2018-03-31"}
]
console.log(transform(doors))
<script src="https://bundle.run/ramda#0.26.1"></script><script>
const {pipe, groupBy, prop, map, sortBy, values, reduce, init, last} = ramda </script>
There are other ways we could approach this. My first thought was to use splitEvery(2) to get these in open-close pairs, and then generate the actions. The trouble is that we would still need the actual original data to fill in the rest (door_id, date, etc.), so I ended up with reduce (a rough sketch of the splitEvery idea is included below).
Obviously this is far from elegant. Part of that is that the underlying transformation is not particularly elegant (why the actions_set_count field, which is just the length of actions?), nor is the data (why have both the date and timestamp fields?). But I suspect that I've also missed things which would make for a nicer implementation. I'd love to hear what those are.
Note that I chose to use the final date field rather than the initial one. Sometimes that's easier to do in a reduce call, and it sounds as though that's not important yet.
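For what it's worth, here is a rough sketch of that splitEvery(2) idea, under the assumption (stated in the question) that each door's logs strictly alternate open/close, with at most one unmatched trailing open. It reuses the duration helper above; toSummary and transform2 are just illustrative names:
const {pipe, groupBy, prop, map, sortBy, values, splitEvery, head, last} = ramda
const toSummary = logs => {
  // pair up the sorted logs: [[open, close], [open, close], ..., possibly [open]]
  const pairs = splitEvery(2, logs)
  const actions = pairs.map(([open, close]) => ({
    open: open.timestamp,
    close: close ? close.timestamp : null,
    duration: close ? duration(open.timestamp, close.timestamp) : null
  }))
  return {
    door_id: head(logs).door_id,
    date: last(logs).date,        // final date, per the note above
    status: last(logs).action === 'door open' ? 'Open' : 'Closed',
    actions_set_count: actions.length,
    actions
  }
}
const transform2 = pipe(
  groupBy(prop('door_id')),
  map(sortBy(prop('timestamp'))), // again, perhaps unnecessary
  values,
  map(toSummary)
)
console.log(transform2(doors))
It still needs head and last to pull door_id and date back out of the raw rows, which is the objection above, but it reads a little more declaratively than the reduce.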

JSON SQL column in Azure Data Factory

I have a JSON-type column in a SQL table, as in the example below. I want this JSON converted into separate columns, with drugs as the table name and the other attributes as column names. How can I do this with ADF or by any other means? Please guide me. The JSON below sits in a single column of a table called report, and I need to convert it into separate columns.
{
"drugs": {
"Codeine": {
"bin": "Y",
"name": "Codeine",
"icons": [
93,
103
],
"drug_id": 36,
"pathway": {
"code": "prodrug",
"text": "is **inactive**, its metabolites are active."
},
"targets": [],
"rxnorm_id": "2670",
"priclasses": [
"Analgesic/Anesthesiology"
],
"references": [
1,
16,
17,
100
],
"subclasses": [
"Analgesic agent",
"Antitussive agent",
"Opioid agonist",
"Phenanthrene "
],
"metabolizers": [
"CYP2D6"
],
"phenotype_ids": {
"metabolic": "5"
},
"relevant_genes": [
"CYP2D6"
],
"dosing_guidelines": [
{
"text": "Reduced morphine formation. Use label recommended age- or weight-specific dosing. If no response, consider alternative analgesics such as morphine or a non-opioid.",
"source": "CPIC",
"guidelines_id": 1
},
{
"text": "Analgesia: select alternative drug (e.g., acetaminophen, NSAID, morphine-not tramadol or oxycodone) or be alert to symptoms of insufficient pain relief.",
"source": "DPWG",
"guidelines_id": 362
}
],
"drug_report_notes": [
{
"text": "Predicted codeine metabolism is reduced.",
"icons_id": 58,
"sort_key": 58,
"references_id": null
},
{
"text": "Genotype suggests a possible decrease in exposure to the active metabolite(s) of codeine.",
"icons_id": 93,
"sort_key": 56,
"references_id": null
},
{
"text": "Professional guidelines exist for the use of codeine in patients with this genotype and/or phenotype.",
"icons_id": 103,
"sort_key": 50,
"references_id": null
}
]
}
}
}
Since this JSON is already in a SQL column, you don't need ADF to break it down into parts. You can use the JSON functions in SQL Server to do that.
Here is an example for the first few columns:
declare @json varchar(max) = '{
"drugs": {
"Codeine": {
"bin": "Y",
"name": "Codeine",
"icons": [
93,
103
],
"drug_id": 36,
"pathway": {
"code": "prodrug",
"text": "is **inactive**, its metabolites are active."
},
"targets": [],
"rxnorm_id": "2670",
"priclasses": [
"Analgesic/Anesthesiology"
],
"references": [
1,
16,
17,
100
],
"subclasses": [
"Analgesic agent",
"Antitussive agent",
"Opioid agonist",
"Phenanthrene "
],
"metabolizers": [
"CYP2D6"
],
"phenotype_ids": {
"metabolic": "5"
},
"relevant_genes": [
"CYP2D6"
],
"dosing_guidelines": [
{
"text": "Reduced morphine formation. Use label recommended age- or weight-specific dosing. If no response, consider alternative analgesics such as morphine or a non-opioid.",
"source": "CPIC",
"guidelines_id": 1
},
{
"text": "Analgesia: select alternative drug (e.g., acetaminophen, NSAID, morphine-not tramadol or oxycodone) or be alert to symptoms of insufficient pain relief.",
"source": "DPWG",
"guidelines_id": 362
}
],
"drug_report_notes": [
{
"text": "Predicted codeine metabolism is reduced.",
"icons_id": 58,
"sort_key": 58,
"references_id": null
},
{
"text": "Genotype suggests a possible decrease in exposure to the active metabolite(s) of codeine.",
"icons_id": 93,
"sort_key": 56,
"references_id": null
},
{
"text": "Professional guidelines exist for the use of codeine in patients with this genotype and/or phenotype.",
"icons_id": 103,
"sort_key": 50,
"references_id": null
}
]
}
}
}'
select JSON_VALUE(JSON_QUERY(@json,'$.drugs.Codeine'),'$.bin') as bin,
JSON_VALUE(JSON_QUERY(@json,'$.drugs.Codeine'),'$.name') as name,
JSON_VALUE(JSON_QUERY(@json,'$.drugs.Codeine'),'$.drug_id') as drug_id,
JSON_VALUE(JSON_QUERY(@json,'$.drugs.Codeine'),'$.icons[0]') as icon_1
You need to decide how to handle arrays, such as icons, where there are multiple values inside the same element.
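One possibility for those arrays -- sketched here only, assuming SQL Server 2016+ so that OPENJSON is available, and with report / json_col as placeholder table and column names -- is to shred the array into one row per element:
-- one row per element of the icons array, from the variable above
select j.[value] as icon
from openjson(@json, '$.drugs.Codeine.icons') as j;
-- same idea against the actual table (report and json_col are assumed names)
select r.*, j.[value] as icon
from report as r
cross apply openjson(r.json_col, '$.drugs.Codeine.icons') as j;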
References:
JSON_QUERY function
JSON_VALUE function

Karate: I get missing property in path $['data'] while using json filter path

I have gone through the Karate documentation and questions asked on Stack Overflow. There are 2 JSON arrays under resp.response.data. I am trying to retrieve and assert "bId": 81 in the JSON below, from resp.response.data[1], but I get this missing property error when retrieving the id value 81. Could you please help if I am missing something?
* def resp =
"""
{
"response": {
"data": [
{
"aDetails": {
"aId": 15,
"aName": "Test",
"dtype": 2
},
"values": [
{
"bId": 45,
"value": "red"
}
],
"mandatory": false,
"ballId": "1231231414"
},
{
"aDetails": {
"aId": 25,
"aName": "Description",
"dtype": 2
},
"values": [
{
"bId": 46,
"value": "automation"
},
{
"bId": 44,
"value": "NESTED ARRAY"
},
{
"bId": 57,
"value": "sfERjuD"
},
{
"bId": 78,
"value": "zgSyPdg"
},
{
"bId": 79,
"value": "NESTED ARRAY"
},
{
"bId": 80,
"value": "NESTED ARRAY"
},
{
"bId": 81,
"value": "NESTED ARRAY"
}
],
"mandatory": true,
"ballId": "1231231414"
}
],
"corId": "wasdf-242-efkn"
}
}
"""
* def expectedbID=81
* def RespValueId = karate.jsonPath(resp, "$.data[1][?(#.bId == '" + expectedbID + "')]")
* match RespValueId[0] == expectedbID
Maybe you are over-complicating things?
* match resp.response.data[1].values contains { bId: 81, value: 'NESTED ARRAY' }
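If you do want to keep the karate.jsonPath approach, a sketch of a path that should work (note it has to go through response, the filter uses @ rather than #, and bId is compared as a number; found is just an illustrative variable name):
* def expectedbID = 81
* def found = karate.jsonPath(resp, "$.response.data[1].values[?(@.bId == " + expectedbID + ")]")
* match found[0].bId == expectedbID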

How to Condense & Nest a (CSV) Payload in Dataweave 2.0?

I have a CSV payload of TV programs & episodes that I want to transform (nest & condense) into JSON, with the following conditions:
Merge consecutive Program lines (that are not followed by an Episode line) into one Program, with the start date of the first instance and the sum of the durations.
Episode lines after a Program line are nested under that Program.
INPUT
Channel|Name|Start|Duration|Type
ACME|Broke Girls|2018-02-01T00:00:00|600|Program
ACME|Broke Girls|2018-02-01T00:10:00|3000|Program
ACME|S03_8|2018-02-01T00:13:05|120|Episode
ACME|S03_9|2018-02-01T00:29:10|120|Episode
ACME|S04_1|2018-02-01T00:44:12|120|Episode
ACME|Lost In Translation|2018-02-01T02:01:00|1800|Program
ACME|Lost In Translation|2018-02-01T02:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:30:00|1800|Program
ACME|Photon|2018-02-01T05:00:00|1800|Program
ACME|Photon|2018-02-01T05:30:00|1800|Program
ACME|Miles & Smiles|2018-02-01T06:00:00|3600|Program
ACME|S015_1|2018-02-01T06:13:53|120|Episode
ACME|S015_2|2018-02-01T06:29:22|120|Episode
ACME|S015_3|2018-02-01T06:46:28|120|Episode
ACME|Ice Age|2018-02-01T07:00:00|300|Program
ACME|Ice Age|2018-02-01T07:05:00|600|Program
ACME|Ice Age|2018-02-01T07:15:00|2700|Program
ACME|S01_4|2018-02-01T07:17:17|120|Episode
ACME|S01_5|2018-02-01T07:32:11|120|Episode
ACME|S01_6|2018-02-01T07:47:20|120|Episode
ACME|My Girl Friday|2018-02-01T08:00:00|3600|Program
ACME|S05_7|2018-02-01T08:17:28|120|Episode
ACME|S05_8|2018-02-01T08:31:59|120|Episode
ACME|S05_9|2018-02-01T08:44:42|120|Episode
ACME|Pirate Bay|2018-02-01T09:00:00|3600|Program
ACME|S01_1|2018-02-01T09:33:12|120|Episode
ACME|S01_2|2018-02-01T09:46:19|120|Episode
ACME|Broke Girls|2018-02-01T10:00:00|1200|Program
ACME|S05_3|2018-02-01T10:13:05|120|Episode
ACME|S05_4|2018-02-01T10:29:10|120|Episode
OUTPUT
{
"programs": [
{
"StartTime": "2018-02-01T00:00:00",
"Duration": 3600,
"Name": "Broke Girls",
"episode": [
{
"name": "S03_8",
"startDateTime": "2018-02-01T00:13:05",
"duration": 120
},
{
"name": "S03_9",
"startDateTime": "2018-02-01T00:29:10",
"duration": 120
},
{
"name": "S04_1",
"startDateTime": "2018-02-01T00:44:12",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T06:00:00",
"Duration": 3600,
"Name": "Miles & Smiles",
"episode": [
{
"name": "S015_1",
"startDateTime": "2018-02-01T06:13:53",
"duration": 120
},
{
"name": "S015_2",
"startDateTime": "2018-02-01T06:29:22",
"duration": 120
},
{
"name": "S015_3",
"startDateTime": "2018-02-01T06:46:28",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T07:00:00",
"Duration": 3600,
"Name": "Ice Age",
"episode": [
{
"name": "S01_4",
"startDateTime": "2018-02-01T07:17:17",
"duration": 120
},
{
"name": "S01_5",
"startDateTime": "2018-02-01T07:32:11",
"duration": 120
},
{
"name": "S01_6",
"startDateTime": "2018-02-01T07:47:20",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T08:00:00",
"Duration": 3600,
"Name": "My Girl Friday",
"episode": [
{
"name": "S05_7",
"startDateTime": "2018-02-01T08:17:28",
"duration": 120
},
{
"name": "S05_8",
"startDateTime": "2018-02-01T08:31:59",
"duration": 120
},
{
"name": "S05_9",
"startDateTime": "2018-02-01T08:44:42",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T09:00:00",
"Duration": 3600,
"Name": "Pirate Bay",
"episode": [
{
"name": "S01_1",
"startDateTime": "2018-02-01T09:33:12",
"duration": 120
},
{
"name": "S01_2",
"startDateTime": "2018-02-01T09:46:19",
"duration": 120
}
]
},
{
"StartTime": "2018-02-01T10:00:00",
"Duration": 1200,
"Name": "Broke Girls",
"episode": [
{
"name": "S05_3",
"startDateTime": "2018-02-01T10:13:05",
"duration": 120
},
{
"name": "S05_4",
"startDateTime": "2018-02-01T10:29:10",
"duration": 120
}
]
}
]
}
Give this a try, comments are embedded:
%dw 2.0
output application/dw
var data = readUrl("classpath://data.csv","application/csv",{separator:"|"})
var firstProgram = data[0].Name
---
// Identify the programs by adding a field
(data reduce (e,acc={l: firstProgram, c:0, d: []}) -> do {
var next = acc.l != e.Name and e.Type == "Program"
var counter = if (next) acc.c+1 else acc.c
---
{
l: if (next) e.Name else acc.l,
c: counter,
d: acc.d + {(e), pc: counter}
}
}).d
// group by the identifier of individual programs
groupBy $.pc
// Get just the programs throw away the program identifiers
pluck $
// Throw away the programs with no episodes
filter ($.*Type contains "Episode")
// Iterate over the programs
map do {
// sum the program duration
var d = $ dw::core::Arrays::sumBy (e) -> if (e.Type == "Program") e.Duration else 0
// Get the episodes and do a little cleanup
var es = $ map $-"pc" filter ($.Type == "Episode")
---
// Form the desired structure
{
($[0] - "pc" - "Duration"),
Duration: d,
Episode: es
}
}
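For orientation, this is roughly what the tagged rows in .d look like after the reduce (hand-written, with only Name, Type and the added pc field shown): pc is the per-program counter that groupBy $.pc then keys on, and the filter afterwards drops the pc groups that contain no Episode rows.
[
  { Name: "Broke Girls", Type: "Program", pc: 0 },
  { Name: "Broke Girls", Type: "Program", pc: 0 },
  { Name: "S03_8", Type: "Episode", pc: 0 },
  { Name: "S03_9", Type: "Episode", pc: 0 },
  { Name: "S04_1", Type: "Episode", pc: 0 },
  { Name: "Lost In Translation", Type: "Program", pc: 1 },
  { Name: "Lost In Translation", Type: "Program", pc: 1 },
  { Name: "The Demolition Man", Type: "Program", pc: 2 }
]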
NOTE1: I stored the contents in a file and read it using readUrl; you will need to adjust this to wherever you get your data from.
NOTE2: Maybe you need to rethink your inputs and organize them better, if possible.
NOTE3: Studio will show errors (at least Studio 7.5.1 does). They are false positives; the code runs.
NOTE4: There are lots of steps because of the non-trivial input. Potentially the code could be optimized, but I have already spent enough time on it--I'll let you deal with the optimization, or somebody else from the community can help.

Rally Lookback API doesn't retrieve records newer than 1 week

I'm running some queries with the Rally Lookback API and it seems that revisions newer than 1 week are not being retrieved:
λ date
Wed, Nov 28, 2018 2:26:45 PM
using the query below:
{
"ObjectID": 251038028040,
"__At": "current"
}
results:
{
"_rallyAPIMajor": "2",
"_rallyAPIMinor": "0",
"Errors": [],
"Warnings": [
"Max page size limited to 100 when fields=true"
],
"GeneratedQuery": {
"find": {
"ObjectID": 251038028040,
"$and": [
{
"_ValidFrom": {
"$lte": "2018-11-21T14:44:34.694Z"
},
"_ValidTo": {
"$gt": "2018-11-21T14:44:34.694Z"
}
}
],
"_ValidFrom": {
"$lte": "2018-11-21T14:44:34.694Z"
}
},
"limit": 10,
"skip": 0,
"fields": true
},
"TotalResultCount": 1,
"HasMore": false,
"StartIndex": 0,
"PageSize": 10,
"ETLDate": "2018-11-21T14:44:34.694Z",
"Results": [
{
"_id": "5bfe7e3c3f1f4460feaeaf11",
"_SnapshotNumber": 30,
"_ValidFrom": "2018-11-21T12:22:08.961Z",
"_ValidTo": "9999-01-01T00:00:00.000Z",
"ObjectID": 251038028040,
"_TypeHierarchy": [
-51001,
-51002,
-51003,
-51004,
-51005,
-51038,
46772408020
],
"_Revision": 268342830516,
"_RevisionDate": "2018-11-21T12:22:08.961Z",
"_RevisionNumber": 53,
}
],
"ThreadStats": {
"cpuTime": "15.463705",
"waitTime": "0",
"waitCount": "0",
"blockedTime": "0",
"blockedCount": "0"
},
"Timings": {
"preProcess": 0,
"findEtlDate": 88,
"allowedValuesDisambiguation": 1,
"mongoQuery": 1,
"authorization": 3,
"suppressNonRequested": 0,
"compressSnapshots": 0,
"allowedValuesHydration": 0,
"TOTAL": 93
}
}
Bear in mind that this artifact has, as of now, 79 revisions, with the latest revision pointing to 11/21/2018 02:41 PM CST as per the revisions tab in Rally Central.
One other thing: if I run the query a couple of minutes later, the ETL date seems to update, as if some sort of indexing were being run:
{
"_rallyAPIMajor": "2",
"_rallyAPIMinor": "0",
"Errors": [],
"Warnings": [
"Max page size limited to 100 when fields=true"
],
"GeneratedQuery": {
"find": {
"ObjectID": 251038028040,
"$and": [
{
"_ValidFrom": {
"$lte": "2018-11-21T14:45:50.565Z"
},
"_ValidTo": {
"$gt": "2018-11-21T14:45:50.565Z"
}
}
],
"_ValidFrom": {
"$lte": "2018-11-21T14:45:50.565Z"
}
},
"limit": 10,
....... rest of the code omitted.
Is there any reason why the Lookback API isn't processing current data, instead of lagging a week behind the records?
It appears that your workspace's data is currently being "re-built". The _ETLDate is the date of the most-current revision in the LBAPI database and should eventually catch up to the current revision's date.

How to store JSON data in a meaningful way in Oracle

Using the Twitter API, I can get tweets like this:
{
"coordinates": null,
"created_at": "Mon Sep 24 03:35:21 +0000 2012",
"id_str": "250075927172759552",
"entities": {
"urls": [
],
"hashtags": [
{
"text": "freebandnames",
"indices": [
20,
34
]
}
],
"user_mentions": [
]
},
"in_reply_to_user_id_str": null,
"contributors": null,
"text": "Aggressive Ponytail #freebandnames",
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
},
"retweet_count": 0,
"profile_background_color": "C0DEED",
"verified": false,
"geo_enabled": true,
"time_zone": "Pacific Time (US & Canada)",
"description": "Born 330 Live 310",
"default_profile_image": false,
"profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
"statuses_count": 579,
"friends_count": 110,
"following": null,
"show_all_inline_media": false,
"screen_name": "sean_cummings"
},
"in_reply_to_screen_name": null,
"source": "Twitter for Mac",
"in_reply_to_status_id": null
}
You can see that this data is a natural fit for MongoDB; you can easily write the data there. But I want to store this data in an SQL database like Oracle. I don't know how to store nested parts like:
"entities": {
"urls": [
],
"hashtags": [
{
"text": "freebandnames",
"indices": [
20,
34
]
}
],
"user_mentions": [
]
Can you tell me how I should store such properties in Oracle? Should I create a new table for each nested property (which I am unwilling to do), or is there another way? Is there a magical way to store all the tweet data in one place, like it's done in NoSQL? Thanks.