Accessing values in JSON array - amazon-cloudwatch

I am following the instructions in the documentation on how to access JSON values in CloudWatch Logs Insights, where the recommendation is as follows:
JSON arrays are flattened into a list of field names and values. For example, to specify the value of instanceId for the first item in requestParameters.instancesSet, use requestParameters.instancesSet.items.0.instanceId.
ref
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
I am trying the following and getting nothing in return. The intellisense autofills up to processList.0 but no further
fields processList.0.vss
| sort @timestamp desc
| limit 1
The JSON I am working with is:
"processList": [
{
"vss": xxxxx,
"name": "aurora",
"tgid": xxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.01,
"id": xxxxx,
"rss": xxxxx
},
{
"vss": xxxx,
"name": "aurora",
"tgid": xxxxxx,
"vmlimit": "unlimited",
"parentID": 1,
"memoryUsedPc": 16.01,
"cpuUsedPc": 0.06,
"id": xxxxx,
"rss": xxxxx
}]

Have you tried the following?
fields @timestamp, processList.0.vss
| sort @timestamp desc
| limit 5
It may be a syntax error. If not, please post a couple of records' worth of the overall structure, with @timestamp included.

The reference link that you have posted also states the following.
CloudWatch Logs Insights can extract a maximum of 100 log event fields
from a JSON log. For extra fields that are not extracted, you can use
the parse command to parse these fields from the raw unparsed log
event in the message field.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
For very large JSON messages, the Insights intellisense may not parse all of the fields into named fields. The solution is to use parse on the complete JSON string of the field where you expect your data to be; in your example and mine that is processList.
I was able to extract the value of specific cpuUsedPc under processList by using a query like the following.
fields @timestamp, cpuUtilization.total, processList
| parse processList /"name":"RDS processes","tgid":.*?,"parentID":.*?,"memoryUsedPc":.*?,"cpuUsedPc":(?<RDSProcessesCPUUsedPc>.*?),/
| sort @timestamp asc
| display @timestamp, cpuUtilization.total, RDSProcessesCPUUsedPc
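Adapting the same idea to the processList in the question, a rough sketch (assuming the raw event is compact JSON with no spaces after the colons, as in the regex above, and that vss appears immediately before "name":"aurora" as in the sample; auroraVss is just an illustrative capture name) would be:
fields @timestamp, processList
| parse processList /"vss":(?<auroraVss>.*?),"name":"aurora"/
| sort @timestamp desc
| limit 1
| display @timestamp, auroraVss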

Related

How to extract the data present in {} in Splunk Search

Data in JSON format like {[]} gets extracted fine; however, data in {} as shown below doesn't behave the same way. How can fields and values be extracted from data in {}?
_raw data:
{"AlertEntityId": "abc#domai.com", "AlertId": "21-3-1-2-4--12", "AlertType": "System", "Comments": "New alert", "CreationTime": "2022-06-08T16:52:51", "Data": "{\"etype\":\"User\",\"eid\":\"abc#domai.com\",\"op\":\"UserSubmission\",\"tdc\":\"1\",\"suid\":\"abc#domai.com\",\"ut\":\"Regular\",\"ssic\":\"0\",\"tsd\":\"Jeff Nichols <jeff#Nichols.com>\",\"sip\":\"1.2.3.4\",\"srt\":\"1\",\"trc\":\"abc#domai.com\",\"ms\":\"Grok - AI/ML summary, case study, datasheet\",\"lon\":\"UserSubmission\"}"}
When I perform the query "| table Data", I get the result below, but how do I get the values of "eid" and "tsd"?
{"etype":"User","eid":"abc#domai.com","op":"UserSubmission","tdc":"1","suid":"abc#domai.com","ut":"Regular","ssic":"0","tsd":"Jeff Nichols <jeff#Nichols.com>","sip":"1.2.3.4","srt":"1","trc":"abc#domai.com","ms":"Grok - AI/ML summary, case study, datasheet","lon":"UserSubmission"}
| spath
by default this will parse the _raw field. If the data is in the field "Data", use:
| spath input=Data
After which eid and tsd will be in fields of the same name.
https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Spath
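Putting it together, a minimal sketch (the base search, your_index and your_sourcetype, is assumed; eid and tsd come from the sample event):
index=your_index sourcetype=your_sourcetype
| spath input=Data
| table _time eid tsd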

searching for a keyword in JSON having multiple fields

{
"TaskType": "kkkk",
"Status": "SUCCESS",
"jobID": "18056",
"DownloadFilePath": "https://abcd",
"accountId": "1234",
"customerId": "hhff"
}
This is sample data from the message column in the notification table. I need to search for a keyword inside 3 or 4 of the 6 fields.
One way is to use multiple OR statements like:
SELECT message AS note
FROM notification
WHERE message::jsonb->>'TaskType' LIKE '%1234%'
OR message::jsonb->>'jobId' LIKE '%1234%'
OR message::jsonb->>'Status' LIKE '%1234%';
The results are satisfactory, but is there a way to optimize the query in case I need to search in more fields?
You could just cast the message column to text and search inside the text:
SELECT message AS note
FROM notification
WHERE message::text like '%1234%'
You need to iterate through all key/value pairs:
select n.*
from notification n
where exists (select *
from jsonb_each_text(n.message) as m(key,value)
where key in ('TaskType', 'jobId', 'Status')
and value like '%1234%');
If you want to search in all values, just leave out the key in (..) part in the sub-select.
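For example, dropping the key filter turns it into a search across every value (a sketch based on the query above):
select n.*
from notification n
where exists (select *
              from jsonb_each_text(n.message) as m(key, value)
              where value like '%1234%');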
Searching in all values can be done using a JSON path operator in Postgres 12 or later:
select *
from notification
where message @@ '$.* like_regex "1234"'
This assumes that message is defined as jsonb - which it should be. If it's not, you need to cast it: message::jsonb
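With the cast spelled out, that would look roughly like this:
select *
from notification
where message::jsonb @@ '$.* like_regex "1234"';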

Parse JSON using Snowflake SQL

I have a JSON object that's written in a weird way.
> {"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {
> "name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }
Not sure how to parse something like this. The keys are addressIdNum and cancelledDateAt and the values are 12345678 and 2017-02-30T01:43:04.000Z respectively.
How do I parse this using Snowflake SQL?
Thanks for all your help!
Best,
Preet Rajdeo
If your input is ALWAYS in this form (two elements in an array, with the same fields in the same positions), you can combine the PARSE_JSON function and path access.
Just try this:
with input as (
select parse_json(
'{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }')
as json)
select json:custom[0].valueNum::integer, json:custom[1].valueAt::timestamp from input;
----------------------------------+-----------------------------------+
 JSON:CUSTOM[0].VALUENUM::INTEGER | JSON:CUSTOM[1].VALUEAT::TIMESTAMP |
----------------------------------+-----------------------------------+
 12345678                         | 2017-03-01 17:43:04               |
----------------------------------+-----------------------------------+
However, if the structure of your data might be different (e.g. elements in the array might be in a different order), it's probably best to write a JavaScript UDF in Snowflake to convert such messy data into something easier.
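If you prefer to stay in plain SQL, an order-independent variant is also possible with LATERAL FLATTEN instead of the JavaScript UDF mentioned above (a sketch; the output column names are made up):
with input as (
  select parse_json(
    '{"custom": [ { "name": "addressIdNum", "valueNum": 12345678}, {"name": "cancelledDateAt", "valueAt": "2017-02-30T01:43:04.000Z" }] }')
  as json)
select
  max(iff(f.value:name::string = 'addressIdNum', f.value:valueNum::integer, null)) as address_id_num,
  max(iff(f.value:name::string = 'cancelledDateAt', f.value:valueAt::timestamp, null)) as cancelled_date_at
from input, lateral flatten(input => input.json:custom) f;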

Query data inside an attribute array in a json column in Postgres 9.6

I have a table say types, which had a JSON column, say location that looks like this:
{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
} ...
]
}
Each row in the table has this data, and every row has an entry with "type": "state" in it. I want to extract the "value" of the entry whose "type" is "state" from every row in the table and put it in a new column. I checked out several questions on SO, like:
Query for element of array in JSON column
Index for finding an element in a JSON array
Query for array elements inside JSON type
but could not get it working. I do not need to query on this. I need the value of this column. I apologize in advance if I missed something.
create table t(data json);
insert into t values('{"attribute":[{"type": "state","value": "CA"},{"type": "distance","value": "200.00"}]}'::json);
select elem->>'value' as state
from t, json_array_elements(t.data->'attribute') elem
where elem->>'type' = 'state';
| state |
| :---- |
| CA |
dbfiddle here
I mainly use Redshift where there is a built-in function to do this. So on the off-chance you're there, check it out.
redshift docs
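For example, a rough Redshift sketch using its JSON functions (assuming location is stored as plain text and the state entry is always the first array element):
SELECT
  json_extract_path_text(
    json_extract_array_element_text(
      json_extract_path_text(location, 'attribute'), 0),
    'value') AS state
FROM types;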
It looks like Postgres has a similar function set:
https://www.postgresql.org/docs/current/static/functions-json.html
I think you'll need to chain three functions together to make this work.
SELECT
your_field::json->'attribute'->0->'value'
FROM
your_table
What I'm trying is a json extract by key name, followed by a json array extract by index (always the 1st, if your example is consistent with the full data), followed finally by another extract by key name.
Edit: got it working for your example
SELECT
'{ "attribute":[
{
"type": "state",
"value": "CA"
},
{
"type": "distance",
"value": "200.00"
}
]
}'::json->'attribute'->0->'value'
Returns "CA"
2nd edit: nested querying
@McNets has the right, better answer. But in this dive, I discovered you can nest queries in Postgres! How frickin' cool!
I stored the json as a text field in a dummy table and successfully ran this:
SELECT
(SELECT value FROM json_to_recordset(
my_column::json->'attribute') as x(type text, value text)
WHERE
type = 'state'
)
FROM dummy_table

Query and count on jsonb column

I'm new to the PostgreSQL (9.5) JSON world. Looking for help writing this query. Take this simplified table as an example.
CREATE TABLE activity_log (uri varchar,
data jsonb );
Example of data inside of 'data' column.
"{"ListingInputFilterBean":{"searchItems": [], "listingStatus": "ACTIVE"}"
"{"ListingInputFilterBean":{"searchItems": [{"name": "Dachshund", "type": "BREED"}], "listingStatus": "ACTIVE"}}"
"{"ListingInputFilterBean":{"searchItems": [{"name": "Lab", "type": "BREED"}, {"name": "Black Lab", "type": "CST"}], "listingStatus": "ACTIVE"}}"
The 'data' column is used to log specific sets of data for each URI call. In this case the searchItems array contain the items used in the search. I'm looking to write a query that finds the most searched for 'breed'. I'd like to count the number of times each 'name' is used when type is 'BREED'.
My initial approach was to pull back each of the 'searchItems' and turn those into a row set using jsonb_to_recordset, but I quickly got in over my head when reading the documentation (sorry, I'm a noob).
Any suggestions on how to write that SQL?
WITH log_activity(data) AS ( VALUES
('{"ListingInputFilterBean":{"searchItems": [], "listingStatus": "ACTIVE"}}'::JSONB),
('{"ListingInputFilterBean":{"searchItems": [{"name": "Dachshund", "type": "BREED"}], "listingStatus": "ACTIVE"}}'::JSONB),
('{"ListingInputFilterBean":{"searchItems": [{"name": "Lab", "type": "BREED"}, {"name": "Black Lab", "type": "CST"}], "listingStatus": "ACTIVE"}}'::JSONB)
)
SELECT search_item->>'name',count(search_item->>'name')
FROM
log_activity la,
jsonb_array_elements(la.data#>'{ListingInputFilterBean,searchItems}') as search_item
WHERE search_item->>'type' = 'BREED'
GROUP BY search_item;
Result:
name | count
-----------+-------
Lab | 1
Dachshund | 1
(2 rows)
Here you just need to iterate over the list of searchItems and group only those entries which match your criteria. The steps are the following:
1. Get the jsonb array of searchItems with the #> operator, which gets the JSON object at the specified path;
2. Iterate over the elements retrieved in step 1 with jsonb_array_elements(), a function which expands a JSON array to a set of JSON values;
3. count() the names where the searchItem's type is BREED; you can get the actual text value with the ->> operator.
UPDATE
With jsonb_to_recordset() it looks shorter, but you need to explicitly define the search_item columns' types:
SELECT search_item.name ,count(search_item.name)
FROM
log_activity la,
jsonb_to_recordset(la.data#>'{ListingInputFilterBean,searchItems}') as search_item(name text,type text)
WHERE search_item.type = 'BREED'
GROUP BY search_item.name;