How to add values in JSON to get a cumulative total using DataWeave - Mule

I have a JSON array that holds the memory, CPU, and number of replicas for each API in the system.
I need to calculate the total CPU and memory, where the CPU/memory for each API is its per-replica value multiplied by the number of replicas. Take a look at the following input JSON:
var payload = [
  {
    "id": "5a9b06b3-ed2c-4382-cf41427b6f56",
    "name": "api-1",
    "cpuReserved": 0.2,
    "cpuLimit": 0.5,
    "memReserved": 2,
    "memLimit": 2,
    "replicas": 1,
    "status": "RUNNING"
  },
  {
    "id": "79a90d5e-d042-9a6c-61cbe1341d04",
    "name": "api-2",
    "cpuReserved": 0.2,
    "cpuLimit": 1,
    "memReserved": 2,
    "memLimit": 2,
    "replicas": 2,
    "status": "RUNNING"
  },
  {
    "id": "15d0e51f-198c-948c-c4c864a5dd72",
    "name": "api-3",
    "cpuReserved": 0.2,
    "cpuLimit": 0.5,
    "memReserved": 2,
    "memLimit": 2,
    "replicas": 1,
    "status": "RUNNING"
  }
]
The output required is something like this:
"report": {
"count": 3,
"totalCPULimit": 3,
"totalCPUReserved": 0.8,
"totalMemReserved": 8,
"totalMemLimit": 8
}
Note: out of the 3 APIs in the input, 1 API has 2 replicas while the other 2 APIs have 1 replica each; for example, totalCPULimit = 0.5*1 + 1*2 + 0.5*1 = 3.
How can I generate this output using DataWeave in MuleSoft 4?

It's pretty crude, but it can get you to the desired output:
%dw 2.0
output application/json
// Per-API totals, weighted by the replica count
var totals = payload map {
  a: $.cpuLimit * $.replicas,
  b: $.cpuReserved * $.replicas,
  c: $.memReserved * $.replicas,
  d: $.memLimit * $.replicas
}
---
"report": {
  "count": sizeOf(payload),
  "totalCPULimit": sum(totals.a),
  "totalCPUReserved": sum(totals.b),
  "totalMemReserved": sum(totals.c),
  "totalMemLimit": sum(totals.d)
}

I found another way of doing it, derived from an XML transformation example:
%dw 2.0
output application/json
---
"report": {
  "count": sizeOf(payload),
  "totalCPULimit": sum(payload map ($.cpuLimit * $.replicas)),
  "totalCPUReserved": sum(payload map ($.cpuReserved * $.replicas)),
  "totalMemReserved": sum(payload map ($.memReserved * $.replicas)),
  "totalMemLimit": sum(payload map ($.memLimit * $.replicas))
}

Related

How to load a jsonl file into BigQuery when the file has mixed data fields as columns

During my workflow, after extracting the data from the API, the JSON has the following structure:
[
  {
    "fields": [
      {
        "meta": {
          "app_type": "ios"
        },
        "name": "app_id",
        "value": 100
      },
      {
        "meta": {},
        "name": "country",
        "value": "AE"
      },
      {
        "meta": {
          "name": "Top"
        },
        "name": "position",
        "value": 1
      }
    ],
    "metrics": {
      "click": 1,
      "price": 1,
      "count": 1
    }
  }
]
Then it is stored as .jsonl and put on GCS. However, when I load it into BigQuery for further extraction, the automatic schema inference returns the following error:
Error while reading data, error message: JSON parsing error in row starting at position 0: Could not convert value to string. Field: value; Value: 100
I want to convert it into the following structure:
app_type  app_id  country  position  click  price  count
ios       100     AE       Top       1      1      1
Is there a way to define a manual schema on BigQuery to achieve this result? Or do I have to preprocess the jsonl file before putting it on BigQuery?
One of the limitations in loading JSON data from GCS to BigQuery is that it does not support maps or dictionaries in JSON.
An invalid example would be:
"metrics": {
"click": 1,
"price": 1,
"count": 1
}
Your jsonl file should be something like this:
{"app_type":"ios","app_id":"100","country":"AE","position":"Top","click":"1","price":"1","count":"1"}
I already tested it and it works fine.
So wherever you process the conversion of the JSON files to jsonl files and store them on GCS, you will have to do some preprocessing.
You probably have two options:
precreate the target table with an app_id field as an INTEGER
preprocess the JSON file and enclose 100 in quotes, like "100", as sketched below
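For the preprocessing route, here is a minimal Python sketch (file names are placeholders, and the mapping is inferred from the example row above: a meta key of "name" replaces the field's own value, any other meta key becomes its own column, and every value is cast to a string):
import json

def flatten(record):
    # One flat row per record: field names become columns. Inferred from
    # the example above: a meta key of "name" replaces the field's own
    # value (position -> "Top"); any other meta key becomes its own
    # column (app_type -> "ios"); all values are cast to str.
    row = {}
    for field in record["fields"]:
        value = field["value"]
        for key, meta_value in field["meta"].items():
            if key == "name":
                value = meta_value
            else:
                row[key] = str(meta_value)
        row[field["name"]] = str(value)
    for key, value in record["metrics"].items():
        row[key] = str(value)
    return row

with open("input.json") as src, open("output.jsonl", "w") as dst:
    for record in json.load(src):
        dst.write(json.dumps(flatten(record)) + "\n")
Run against the sample input above, this writes exactly the jsonl line shown earlier.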

BigQuery nested object: No such field

I have a table with a predefined schema.
I'm trying to upload some data from Google Cloud Storage using the Python client. The file is newline-delimited JSON. Most of my lines don't have the field "passenger_origin.accuracy", but when the field is present I get the following errors:
Error while reading data, error message: JSON parsing error in row starting at position 2122510: No such field: driver_origin.accuracy. (error code: invalid)
Error while reading data, error message: JSON parsing error in row starting at position 2126317: No such field: passenger_origin.accuracy. (error code: invalid)
Example of an invalid row:
{
  "id": 1479443,
  "is_obsolete": 0,
  "seat_count": 1,
  "is_ticket_checked": 0,
  "score": 0.3709318902,
  "is_multimodal": 0,
  "fake_paths": 0,
  "passenger_origin": {
    "id": 2204,
    "poi_uuid": "15b4e52c-7c58-442c-98df-1eb06079f6bb",
    "user_id": 1987,
    "accuracy": 250.0,
    "disabled": 0,
    "last_update": "2017-03-10T15:15:39",
    "created": "2016-02-05T17:06:26",
    "modified_by_user": 1,
    "is_recurrent": 0,
    "source": 1,
    "hidden_by_user": 0,
    "kind": 2
  },
  "driver_origin": {
    "id": 412491,
    "poi_uuid": "47e90b6d-e178-4e02-9f02-f4ea5f8beaa1",
    "user_id": 71471,
    "disabled": 0,
    "last_update": "2017-11-02T10:09:09",
    "created": "2017-11-02T10:09:09",
    "modified_by_user": 0,
    "is_recurrent": 0,
    "source": 1,
    "hidden_by_user": 0,
    "kind": 2
  },
  "passenger_destination": {
    "id": 2203,
    "poi_uuid": "c531c3ca-47f0-4003-8098-1272fee8d018",
    "user_id": 1987,
    "accuracy": 250.0,
    "disabled": 0,
    "last_update": "2017-03-10T15:12:42",
    "created": "2016-02-05T17:06:19",
    "modified_by_user": 1,
    "is_recurrent": 0,
    "source": 1,
    "hidden_by_user": 0,
    "kind": 1
  }
}
The table is created before the upload of the data and has not been modified since. I don't understand why the upload is failing on these fields. Do the RECORD fields have to be REPEATED?
To ignore the fields that aren't present in the schema, use a combination of:
configuration.load.ignoreUnknownValues
configuration.load.maxBadRecords
Setting the first to true and the second to some arbitrarily-high number, e.g. 100000, will enable the load to succeed even if there are extra fields.
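With the Python client the question mentions, those two settings correspond to LoadJobConfig attributes; a minimal sketch (bucket and table names are placeholders):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    ignore_unknown_values=True,  # drop fields the schema doesn't declare
    max_bad_records=100000,      # tolerate rows that still fail to parse
    autodetect=False,            # keep the pre-created table's schema
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/rides.jsonl",   # placeholder source
    "my-project.my_dataset.rides",  # placeholder destination table
    job_config=job_config,
)
load_job.result()  # waits and raises if the load failed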
The problem was that configuration.load.autodetect was set to True. I set it to False and the problem was fixed.

How to schedule two processes if the arrival time is the same

In a single-CPU process scheduler, if two processes arrive at the same time, in what order will they execute in the case of FCFS, SJF, non-preemptive priority, and RR?
The following information is given about the processes:
{
  "Name": "P1",
  "ArrivalTime": 0,
  "Burst": 10,
  "Priority": 3
},
{
  "Name": "P2",
  "ArrivalTime": 0,
  "Burst": 1,
  "Priority": 1
},
{
  "Name": "P3",
  "ArrivalTime": 0,
  "Burst": 2,
  "Priority": 3
},
{
  "Name": "P4",
  "ArrivalTime": 0,
  "Burst": 1,
  "Priority": 4
},
{
  "Name": "P5",
  "ArrivalTime": 0,
  "Burst": 5,
  "Priority": 2
}
Technically speaking, two processes cannot arrive at the exact same time. Arrival of a process means that the process (its PCB) is added to a queue (any scheduling algorithm basically reads/writes/updates this queue and/or its elements). When you modify a data structure like a queue, you add one element at a time (in a multi-threaded environment, the threads adding elements to the queue would be synchronized), so simultaneous arrivals are still serialized into some insertion order. HTH.
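To see that tie-breaking in practice, here is a small Python sketch that orders these five processes under each non-preemptive policy, using the queue insertion index to break ties; it assumes a lower Priority number means higher priority, which the question does not state:
processes = [
    {"Name": "P1", "ArrivalTime": 0, "Burst": 10, "Priority": 3},
    {"Name": "P2", "ArrivalTime": 0, "Burst": 1,  "Priority": 1},
    {"Name": "P3", "ArrivalTime": 0, "Burst": 2,  "Priority": 3},
    {"Name": "P4", "ArrivalTime": 0, "Burst": 1,  "Priority": 4},
    {"Name": "P5", "ArrivalTime": 0, "Burst": 5,  "Priority": 2},
]

def run_order(key):
    # enumerate() supplies the insertion index, which breaks ties
    ranked = sorted(enumerate(processes), key=lambda ip: (key(ip[1]), ip[0]))
    return [p["Name"] for _, p in ranked]

print("FCFS:    ", run_order(lambda p: p["ArrivalTime"]))  # P1 P2 P3 P4 P5
print("SJF:     ", run_order(lambda p: p["Burst"]))        # P2 P4 P3 P5 P1
print("Priority:", run_order(lambda p: p["Priority"]))     # P2 P5 P1 P3 P4
# RR serves the FCFS queue order cyclically, one time quantum per turn.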

BigQuery: Create column of JSON datatype

I am trying to load JSON with the following schema into BigQuery:
{
  key_a: value_a,
  key_b: {
    key_c: value_c,
    key_d: value_d
  },
  key_e: {
    key_f: value_f,
    key_g: value_g
  }
}
The keys under key_e are dynamic, i.e. in one response key_e will contain key_f and key_g, and in another response it will instead contain key_h and key_i. New keys can be created at any time, so I cannot create a record with nullable fields for all possible keys.
Instead I want to create a column with a JSON datatype that can then be queried using the JSON_EXTRACT() function. I have tried loading key_e as a column with a STRING datatype, but its value is analysed as JSON and so the load fails.
How can I load a section of JSON into a single BigQuery column when there is no JSON datatype?
Having your JSON as a single string column inside BigQuery is definitely an option. If you have a large volume of data, this can end up with a high query price, as all your data will end up in one column, and the actual querying logic can become quite messy.
If you have the luxury of slightly changing your "design", I would recommend considering the one below, where you can employ REPEATED mode.
Table schema:
[
  {
    "name": "key_a",
    "type": "STRING"
  },
  {
    "name": "key_b",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key", "type": "STRING" },
      { "name": "value", "type": "STRING" }
    ]
  },
  {
    "name": "key_e",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      { "name": "key", "type": "STRING" },
      { "name": "value", "type": "STRING" }
    ]
  }
]
Example of JSON to load:
{"key_a": "value_a1", "key_b": [{"key": "key_c", "value": "value_c"}, {"key": "key_d", "value": "value_d"}], "key_e": [{"key": "key_f", "value": "value_f"}, {"key": "key_g", "value": "value_g"}]}
{"key_a": "value_a2", "key_b": [{"key": "key_x", "value": "value_x"}, {"key": "key_y", "value": "value_y"}], "key_e": [{"key": "key_h", "value": "value_h"}, {"key": "key_i", "value": "value_i"}]}
Please note: it should be newline-delimited JSON, so each row must be on one line.
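If you preprocess the data in Python, the repacking might look like this minimal sketch (file names are placeholders; it assumes one source object per line):
import json

def repack(obj):
    # Dynamic-key sub-objects become REPEATED key/value records, matching
    # the table schema above; top-level scalars pass through unchanged.
    out = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            out[key] = [{"key": k, "value": str(v)} for k, v in value.items()]
        else:
            out[key] = value
    return out

with open("source.jsonl") as src, open("load.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(repack(json.loads(line))) + "\n")
Applied to {"key_a": "value_a1", "key_b": {...}, "key_e": {...}} records, this produces lines in exactly the shape shown above.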
You can't do this directly with BigQuery, but you can make it work in two passes:
(1) Import your JSON data as a CSV file with a single string column.
(2) Transform each row to pack your "any-type" field into a string. Write a UDF that takes a string and emits the final set of columns you would like. Append the output of this query to your target table.
Example
I'll start with some JSON:
{"a": 0, "b": "zero", "c": { "woodchuck": "a"}}
{"a": 1, "b": "one", "c": { "chipmunk": "b"}}
{"a": 2, "b": "two", "c": { "squirrel": "c"}}
{"a": 3, "b": "three", "c": { "chinchilla": "d"}}
{"a": 4, "b": "four", "c": { "capybara": "e"}}
{"a": 5, "b": "five", "c": { "housemouse": "f"}}
{"a": 6, "b": "six", "c": { "molerat": "g"}}
{"a": 7, "b": "seven", "c": { "marmot": "h"}}
{"a": 8, "b": "eight", "c": { "badger": "i"}}
Import it into BigQuery as a CSV with a single STRING column (I called it 'blob'). I had to set the delimiter character to something arbitrary and unlikely (thorn -- 'þ') or it tripped over the default ','.
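If you script that import rather than using the web UI, a rough sketch with the current Python client (the bucket name is a placeholder; the table matches the JsonAnyKey.raw name used below):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter="þ",  # arbitrary character that never appears in the data
    quote_character="",   # disable quoting so braces and commas pass through
    schema=[bigquery.SchemaField("blob", "STRING")],
)

client.load_table_from_uri(
    "gs://my-bucket/raw.json",    # placeholder source file
    "my-project.JsonAnyKey.raw",  # single-column target table
    job_config=job_config,
).result()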
Verify your table imported correctly. You should see your simple one-column schema and the preview should look just like your source file.
Next, we write a query to transform it into your desired shape. For this example, we'd like the following schema:
a (INTEGER)
b (STRING)
c (STRING -- packed JSON)
We can do this with a UDF:
// Map a JSON string column ('blob') => { a (integer), b (string), c (json-string) }
bigquery.defineFunction(
  'extractAndRepack',  // Name of the function exported to SQL
  ['blob'],            // Names of input columns
  [{'name': 'a', 'type': 'integer'},  // Output schema
   {'name': 'b', 'type': 'string'},
   {'name': 'c', 'type': 'string'}],
  function (row, emit) {
    var parsed = JSON.parse(row.blob);
    var repacked = JSON.stringify(parsed.c);
    emit({a: parsed.a, b: parsed.b, c: repacked});
  }
);
And a corresponding query:
SELECT a, b, c FROM extractAndRepack(JsonAnyKey.raw)
Now you just need to run the query (selecting your desired target table) and you'll have your data in the form you like.
Row  a  b      c
1    0  zero   {"woodchuck":"a"}
2    1  one    {"chipmunk":"b"}
3    2  two    {"squirrel":"c"}
4    3  three  {"chinchilla":"d"}
5    4  four   {"capybara":"e"}
6    5  five   {"housemouse":"f"}
7    6  six    {"molerat":"g"}
8    7  seven  {"marmot":"h"}
9    8  eight  {"badger":"i"}
One way to do it is to load this file as CSV instead of JSON (and quote the values or eliminate newlines in the middle); then it will become a single STRING column inside BigQuery.
P.S. You are right that having a native JSON data type would have made this scenario much more natural, and the BigQuery team is well aware of it.

How to make an intersect in Solr?

I am implementing a Solr project; below is the structure of my indexed objects.
{
  "OBJECT_HEADER_ID": 173604,
  "CHARACTERISTIC_VALUE_ID": 143287,
  "OBJECT_TYPE_ID": 1,
  "SEQUENCE": 0,
  "CHARACTERISTIC_ID": 1488,
  "OBJECT_VARIANT_ID": 169941,
  "ID": "84445897",
  "TYPE": 0
},
{
  "OBJECT_HEADER_ID": 173604,
  "CHARACTERISTIC_VALUE_ID": 23502,
  "OBJECT_TYPE_ID": 1,
  "SEQUENCE": 0,
  "CHARACTERISTIC_ID": 992,
  "OBJECT_VARIANT_ID": 169941,
  "ID": "84445898",
  "TYPE": 0
}
And I need to make an intersect between the results of various subqueries, but I couldn't find anything on the web about how to write such a query. For example:
-> Get all results that have (CHARACTERISTIC_ID = 1488 and CHARACTERISTIC_VALUE_ID = 143287) INTERSECT BY OBJECT_VARIANT_ID WITH (CHARACTERISTIC_ID = 992 and CHARACTERISTIC_VALUE_ID = 23502).