JSON SQL column in azure data factory - sql

I have a JSON type SQL column in SQL table as below example. I want the below code to be converted into separate columns such as drugs as table name and other attribute as column name, how can I use adf or any other means please guide. The below code is a single column in a table called report where I need to convert this into separate columns .
{
"drugs": {
"Codeine": {
"bin": "Y",
"name": "Codeine",
"icons": [
93,
103
],
"drug_id": 36,
"pathway": {
"code": "prodrug",
"text": "is **inactive**, its metabolites are active."
},
"targets": [],
"rxnorm_id": "2670",
"priclasses": [
"Analgesic/Anesthesiology"
],
"references": [
1,
16,
17,
100
],
"subclasses": [
"Analgesic agent",
"Antitussive agent",
"Opioid agonist",
"Phenanthrene "
],
"metabolizers": [
"CYP2D6"
],
"phenotype_ids": {
"metabolic": "5"
},
"relevant_genes": [
"CYP2D6"
],
"dosing_guidelines": [
{
"text": "Reduced morphine formation. Use label recommended age- or weight-specific dosing. If no response, consider alternative analgesics such as morphine or a non-opioid.",
"source": "CPIC",
"guidelines_id": 1
},
{
"text": "Analgesia: select alternative drug (e.g., acetaminophen, NSAID, morphine-not tramadol or oxycodone) or be alert to symptoms of insufficient pain relief.",
"source": "DPWG",
"guidelines_id": 362
}
],
"drug_report_notes": [
{
"text": "Predicted codeine metabolism is reduced.",
"icons_id": 58,
"sort_key": 58,
"references_id": null
},
{
"text": "Genotype suggests a possible decrease in exposure to the active metabolite(s) of codeine.",
"icons_id": 93,
"sort_key": 56,
"references_id": null
},
{
"text": "Professional guidelines exist for the use of codeine in patients with this genotype and/or phenotype.",
"icons_id": 103,
"sort_key": 50,
"references_id": null
}
]
}

Since this json is already in a SQL column, you don't need ADF to break it down to parts. You can use JSON functions in SQL server to do that.
example of few first columns:
declare #json varchar(max) = '{
"drugs": {
"Codeine": {
"bin": "Y",
"name": "Codeine",
"icons": [
93,
103
],
"drug_id": 36,
"pathway": {
"code": "prodrug",
"text": "is **inactive**, its metabolites are active."
},
"targets": [],
"rxnorm_id": "2670",
"priclasses": [
"Analgesic/Anesthesiology"
],
"references": [
1,
16,
17,
100
],
"subclasses": [
"Analgesic agent",
"Antitussive agent",
"Opioid agonist",
"Phenanthrene "
],
"metabolizers": [
"CYP2D6"
],
"phenotype_ids": {
"metabolic": "5"
},
"relevant_genes": [
"CYP2D6"
],
"dosing_guidelines": [
{
"text": "Reduced morphine formation. Use label recommended age- or weight-specific dosing. If no response, consider alternative analgesics such as morphine or a non-opioid.",
"source": "CPIC",
"guidelines_id": 1
},
{
"text": "Analgesia: select alternative drug (e.g., acetaminophen, NSAID, morphine-not tramadol or oxycodone) or be alert to symptoms of insufficient pain relief.",
"source": "DPWG",
"guidelines_id": 362
}
],
"drug_report_notes": [
{
"text": "Predicted codeine metabolism is reduced.",
"icons_id": 58,
"sort_key": 58,
"references_id": null
},
{
"text": "Genotype suggests a possible decrease in exposure to the active metabolite(s) of codeine.",
"icons_id": 93,
"sort_key": 56,
"references_id": null
},
{
"text": "Professional guidelines exist for the use of codeine in patients with this genotype and/or phenotype.",
"icons_id": 103,
"sort_key": 50,
"references_id": null
}
]
}
}
}
select JSON_VALUE(JSON_QUERY(#json,'$.drugs.Codeine'),'$.bin') as bin,
JSON_VALUE(JSON_QUERY(#json,'$.drugs.Codeine'),'$.name') as name,
JSON_VALUE(JSON_QUERY(#json,'$.drugs.Codeine'),'$.drug_id') as drug_id,
JSON_VALUE(JSON_QUERY(#json,'$.drugs.Codeine'),'$.icons[0]') as icon_1
'
You need to decide how to handle arrays, such as icons, where there are multiple values inside the same element.
References:
JSON_QUERY function
JSON_VALUE function

Related

Karate: I get missing property in path $['data'] while using json filter path

I have gone through karate documentation and questions asked on stack overflow. There are 2 json arrays under resp.response.data. I am trying to retrieve and assert "bId": 81 in below json from the resp.response.data[1] but I get this missing property error while retrieving id value 81. Could you please help if I am missing something ?
* def resp =
"""
{
"response": {
"data": [
{
"aDetails": {
"aId": 15,
"aName": "Test",
"dtype": 2
},
"values": [
{
"bId": 45,
"value": "red"
}
],
"mandatory": false,
"ballId": "1231231414"
},
{
"aDetails": {
"aId": 25,
"aName": "Description",
"dtype": 2
},
"values": [
{
"bId": 46,
"value": "automation"
},
{
"bId": 44,
"value": "NESTED ARRAY"
},
{
"bId": 57,
"value": "sfERjuD"
},
{
"bId": 78,
"value": "zgSyPdg"
},
{
"bId": 79,
"value": "NESTED ARRAY"
},
{
"bId": 80,
"value": "NESTED ARRAY"
},
{
"bId": 81,
"value": "NESTED ARRAY"
}
],
"mandatory": true,
"ballId": "1231231414"
}
],
"corId": "wasdf-242-efkn"
}
}
"""
* def expectedbID=81
* def RespValueId = karate.jsonPath(resp, "$.data[1][?(#.bId == '" + expectedbID + "')]")
* match RespValueId[0] == expectedbID
Maybe you are over-complicating things ?
* match resp.response.data[1].values contains { bId: 81, value: 'NESTED ARRAY' }

Duplicate elements in AWS cloudwatch Embedded metrics

I am trying to log my service request.
First I try to get the service from my partner, upon failure I
try the same from my vendor, hence I need to add the same metrics under two different dimensions.
Following is my log structure, apparently, this is wrong as JSON does not support duplicate elements,
and AWS picks only the latest value in case of duplicates elements.
Kindly suggest the right way of doing this.
{
"_aws": {
"Timestamp": 1574109732004,
"CloudWatchMetrics": [{
"Namespace": "NameSpace1",
"Dimensions": [["Partner"]],
"Metrics": [{
"Name": "requestCount",
"Unit": "Count"
}, {
"Name": "requestFailure",
"Unit": "Count"
}, {
"Name": "responseTime",
"Unit": "Milliseconds"
}]
},
{
"Namespace": "NameSpace1",
"Dimensions": [["vendor"]],
"Metrics": [{
"Name": "requestCount",
"Unit": "Count"
}, {
"Name": "requestSuccess",
"Unit": "Count"
}, {
"Name": "responseTime",
"Unit": "Milliseconds"
}]
}]
},
"Partner": "partnerName",
"requestCount": 1,
"requestFailure": 1,
"responseTime": 1,
"vendor": "vendorName",
"requestCount": 2,
"requestSuccess": 2,
"responseTime": 2,
}
This will give you metrics separated by partner and vendor:
{
"Partner": "partnerName",
"vendor": "vendorName",
"_aws": {
"Timestamp": 1577179437354,
"CloudWatchMetrics": [
{
"Dimensions": [
[
"Partner"
],
[
"vendor"
]
],
"Metrics": [
{
"Name": "requestCount",
"Unit": "Count"
},
{
"Name": "requestFailure",
"Unit": "Count"
},
{
"Name": "requestSuccess",
"Unit": "Count"
},
{
"Name": "responseTime",
"Unit": "Milliseconds"
}
],
"Namespace": "NameSpace1"
}
]
},
"requestCount": 1,
"requestFailure": 1,
"requestSuccess": 1,
"responseTime": 2
}
Note that this will duplicate the metrics between the two dimensions (if partner registers failure it will be registered on the vendor failure metric also). If you need to avoid this, you can either:
have metric names specific to each type (like partnerRequestFailure and vendorRequestFailure)
or you need to publish separate json, one for partner and one for vendor.

Creating a resulting array with unique values in dataweave

Need to compare two arrays efficiently and create a third array with values that are only in the second array using Dataweave transformation in mule. I wanted to use the negation of contains the keyword in mule. But it was giving errors. Hope that I can use a filter and Contains to filter out the values.
arr1 =[
{
"leadId": 127,
"playerId": 334353
},
{
"leadId": 128,
"playerId": 334354
},
{
"leadId": 123,
"playerId": 43456
}
{
"leadId": 122,
"playerId": 43458
}
arr2 =[
{
"leadId": 127,
"name": "James"
},
{
"leadId": 129,
"name": "Joseph"
},
{
"leadId": 120,
"name": "Samuel"
},
{
"leadId": 122,
"name": "Gabriel",
}
Need resulting array as
arr3 = [
{
"leadId": 129,
"name": Joseph
},
{
"leadId": 120,
"name": Samuel
}
]
UPDATED - Including DataWeave 1 and 2
I'm not sure exactly how you have your arrays stored (in the payload or other variables etc), but the below script should give you enough to go on. Comparison is done based on the leadId only.
The transform actually works across both DW1 and DW2, you just need to change the header.
Mule 3 and DW1
%dw 1.0
%output application/json
---
payload.array2 filter (not (payload.array1.leadId contains $.leadId))
Mule 4 and DW2
%dw 2.0
output application/json
---
payload.array2 filter (not (payload.array1.leadId contains $.leadId))
Input
{
"array1": [
{
"leadId": 127,
"playerId": 334353
},
{
"leadId": 128,
"playerId": 334354
},
{
"leadId": 123,
"playerId": 43456
},
{
"leadId": 122,
"playerId": 43458
}
],
"array2": [
{
"leadId": 127,
"name": "James"
},
{
"leadId": 129,
"name": "Joseph"
},
{
"leadId": 120,
"name": "Samuel"
},
{
"leadId": 122,
"name": "Gabriel"
}
]
}
Output
[
{
"leadId": 129,
"name": "Joseph"
},
{
"leadId": 120,
"name": "Samuel"
}
]

How to store JSON data in a meaningful way in Oracle

Using Twitter API, I can get tweets like this :
{
"coordinates": null,
"created_at": "Mon Sep 24 03:35:21 +0000 2012",
"id_str": "250075927172759552",
"entities": {
"urls": [
],
"hashtags": [
{
"text": "freebandnames",
"indices": [
20,
34
]
}
],
"user_mentions": [
]
},
"in_reply_to_user_id_str": null,
"contributors": null,
"text": "Aggressive Ponytail #freebandnames",
"metadata": {
"iso_language_code": "en",
"result_type": "recent"
},
"retweet_count": 0,
"profile_background_color": "C0DEED",
"verified": false,
"geo_enabled": true,
"time_zone": "Pacific Time (US & Canada)",
"description": "Born 330 Live 310",
"default_profile_image": false,
"profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
"statuses_count": 579,
"friends_count": 110,
"following": null,
"show_all_inline_media": false,
"screen_name": "sean_cummings"
},
"in_reply_to_screen_name": null,
"source": "Twitter for Mac",
"in_reply_to_status_id": null
}
You can see that this data is perfect for MongoDB, you can easily write the data to there. I want to store this data on an SQL db like Oracle. I don't know how to store nested parts like :
"entities": {
"urls": [
],
"hashtags": [
{
"text": "freebandnames",
"indices": [
20,
34
]
}
],
"user_mentions": [
]
Can you tell me how I should write such properties on Oracle? Should I create a new table for each nested property(which I am unwilling to do) or is there another way? Is there a magical such that I can store all Tweet data in one place like it's done on NoSQL? Thanks.

Select game scores with a player's context from Postgres

I am developing a web app with rails and postgres as a shooting range game backend. It has an endpoint to save players scores, which writes the scores to DB and returns the first three places and the players' ranks with some context around it: a place before and a place after. For example if a player has rank 23, the method should return 1st, 2nd, 3rd places, the player rank itself and also two records with ranks 22 and 24 besides it. I don't store the rank in the DB it is calculated dynamically using following sql query:
SELECT RANK() OVER(ORDER BY score DESC) AS place, name, score
FROM scores
WHERE game_id=? AND rules_version=? AND game_mode=?
LIMIT 10
POST request with following data:
{
"players": [
{ "name": "Jack", "score": 100, "local_id": 1, "stats": {}},
{ "name": "Anna", "score": 200, "local_id": 2, "stats": {}}
]
}
should return following result set:
{
"scores": [
{
"name": "Piter",
"place": 1,
"score": 800
},
{
"name": "Lisa",
"place": 2,
"score": 700
},
{
"name": "Philip",
"place": 3,
"score": 600
},
{
"name": "Max",
"place": 25,
"score": 204
},
{
"name": "Anna",
"place": 26,
"score": 200,
"local_id": 2
},
{
"name": "Ashley",
"place": 27,
"score": 193
}
{
"name": "Norman",
"place": 36,
"score": 103
},
{
"name": "Jack",
"place": 37,
"score": 100,
"local_id": 1
},
{
"name": "Chris",
"place": 38,
"score": 95
}
]
}
Every field except local_id and as I said place is stored in DB. I can't figure out the right sql query to do such select. Please help.