How to extract data from array in a JSON message using CloudWatch Logs Insights? - amazon-cloudwatch

I log messages that are JSON objects. The JSON has an array that contains key/value pairs:
{
...
"arr": [{"key": "foo", "value": "bar"}, ...],
...
}
Now I want to filter for results that contain a specific key and extract the values for that key in the array.
I've tried using a regex, something like parse @message /.*"key":"my_specific_key","value":(?<value>.*}).*/ which extracts the value but also returns the rest of the message. It also doesn't filter the results.
How can I filter results and extract the values for a specific key?

If your log entries in the CloudWatch log group actually show up as JSON, you can reference the key directly anywhere you would use a field.
(You don't need the @; CloudWatch adds that automatically to all default fields.)
If you are using Python, you can use aws_lambda_powertools to do this as well, in a very slick way (and it's an actual AWS product).
If they show up in your log as a string, it may be an escaped string and you'll have to match it exactly, including spaces and so on. When you parse, you will want to do something like this:
If this is the string of your log message: '{"AKey" : "AValue", "Key2" : "Value2"}'
parse @message "{\"*\" : \"*\", \"*\" : \"*\"}" as akey, akey_value, key2, key2_value
Then you can filter or count or do anything else against those variables. parse is specifically a statement that matches a pattern and assigns each wildcard to a variable, one at a time, in order.
Though with a complex JSON, if your regex above works, then all you need is a filter statement:
fields @message
| parse @message ... your regex as value_var
| filter value_var like /some more regex/
If it's not a string in the log entry but actual JSON, you can just filter against the key:
filter a_key =~ "some value" (or a regex here)
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html
for more info
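To make the intended result concrete, here is a minimal Python sketch of the same filter-and-extract logic, run outside CloudWatch on raw message strings. It is not Logs Insights syntax. The sample messages, the "id" field, and both helper names are assumptions made up for illustration. The first helper parses the JSON; the second uses a tighter regex than the one in the question, capturing only up to the closing quote instead of the greedy .*} and allowing optional whitespace after the colons.

```python
import json
import re


def values_for_key(message, wanted_key):
    """Parse the JSON message and return the values stored under wanted_key in "arr"."""
    doc = json.loads(message)
    return [
        item.get("value")
        for item in doc.get("arr", [])
        if item.get("key") == wanted_key
    ]


def values_for_key_regex(message, wanted_key):
    """Regex variant: capture only the quoted value, not the rest of the message."""
    pattern = re.compile(
        r'"key":\s*"' + re.escape(wanted_key) + r'",\s*"value":\s*"(?P<value>[^"]*)"'
    )
    return pattern.findall(message)


# Hypothetical log messages shaped like the one in the question.
messages = [
    '{"id": 1, "arr": [{"key": "foo", "value": "bar"}, {"key": "other", "value": "x"}]}',
    '{"id": 2, "arr": [{"key": "other", "value": "y"}]}',
]

# Keep only the messages that contain the key, paired with the extracted values.
matches = []
for message in messages:
    values = values_for_key(message, "foo")
    if values:
        matches.append((json.loads(message)["id"], values))
```

The same idea carries over to the query: capture a bounded group, then filter on whether the captured field is present.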

Related

Splunk field extractor unable to extract all values

I want to extract 4 values out of one field, called msg, from a Splunk query; and the msg is in the form of:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
keys are always static but values are not, for instance, v2 could be XXX or XXYYZZ; similarly possible values for v3 just have unpredictable length.
I ran a query to get some sample results, hoping to use the Field Extractor to generate a regex, but the generated regex can't get all the values out. I guess that's because the values don't all have the same length?
Do I need to change my logging format by separating each key=value pair with a comma? Or am I not using the field extractor correctly?
[Update1]: A few sample data:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if that makes it easier to query and extract fields. Will adding a separator like a dot or pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, go try your regular expression(s) on regex101. The field extractor is often a good[ish] start, but it rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of each key-value pair is a run of contiguous non-whitespace characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing, but if your data's consistent, this will work.
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...
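The idea behind the edited rex calls, letting a value run until the next static key name, can be checked outside Splunk. The following is a rough Python sketch using the second sample line from the question; the function name and the single combined pattern are assumptions for illustration, and Python spells named groups (?P<name>...) rather than (?<name>...).

```python
import re

# Each value is matched lazily and terminated by the next (static) key name,
# so values such as k3 may contain spaces and commas.
PATTERN = re.compile(
    r"k1=(?P<k1>.+?)\s+k2=(?P<k2>.+?)\s+k3=(?P<k3>.+?)\s+k4=(?P<k4>\S+)\s+k5="
)


def extract_fields(msg):
    """Return a dict of k1..k4, or None if the message does not match."""
    match = PATTERN.search(msg)
    return match.groupdict() if match else None


sample = (
    "msg:Service call successful k1=SSSSSS k2=AAA "
    "k3=This could contain space and comma, like this one "
    "k4=YYYNNM k5=can be ignored"
)
fields = extract_fields(sample)
```

Because the keys are static, no change to the logging format is needed for this to work; a separator would only matter if a value could itself contain text such as " k4=".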

How do you convert text column data with Ruby JSON format ("key" => "value") to standard JSON?

I have data in the comment column of the payments table. The data is stored as plain text in the following format:
{"foo"=>"bar"}
I need to query the value of the specific "foo" key and tried the following:
select comment::json -> 'foo' from payments
but because the data stored is not in JSON format I get the following error:
invalid input syntax for type json DETAIL: Token "=" is invalid. CONTEXT: JSON data, line 1: {"foo"=>"bar"}
which refers to the => that Ruby uses for Hashes.
Is there a way to convert the text data to JSON data on-the-fly so I can then access the specific keys I need?
You can replace the => with a : to make that example a valid JSON value:
replace(comment, '=>', ':')::jsonb ->> 'foo'
It sounds like the data is technically valid Ruby, which means we can do something a bit clever:
require 'json'
def parse_data(data_string)
  # eval executes the string as Ruby code, so only use this on trusted data
  eval(data_string).to_json
end
This should do the trick so long as the data is trusted.
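If the conversion has to happen in application code without eval, the same replace idea as the SQL expression above can be sketched in Python. This is a naive illustration under stated assumptions: the helper name is made up, and the approach breaks if a key or value itself contains "=>" or if the text uses Ruby-only literals such as nil or symbols.

```python
import json


def ruby_hash_text_to_dict(text):
    """Turn Ruby hash text like {"foo"=>"bar"} into a Python dict.

    Naive: assumes "=>" never appears inside a key or value and that
    all keys and values are already valid JSON literals.
    """
    return json.loads(text.replace("=>", ":"))


row = ruby_hash_text_to_dict('{"foo"=>"bar"}')
value = row["foo"]
```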

Need Pentaho JSON without array

I want to output JSON data as a plain object rather than an array. I made the changes described in the Pentaho documentation, but the output is always an array, even for a single set of values. I am using PDI 9.1 and tested with the ktr from the link below:
https://wiki.pentaho.com/download/attachments/25043814/json_output.ktr?version=1&modificationDate=1389259055000&api=v2
The statement below is from https://wiki.pentaho.com/display/EAI/JSON+output
Another special case is when 'Nr. rows in a block' = 1.
If used with empty json block name output will looks like:
{
"name" : "item",
"value" : 25
}
My output looks like this:
{ "": [ {"name":"item","value":25} ] }
I resolved this myself. I added another JSON Input step and defined the path as below,
$.wellDesign[0] to get the first array element as a string object
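What that extra step does, picking the first element out of the wrapping array, can be illustrated outside PDI. The Python sketch below works on the output shown in the question, where the block name is empty; the function name and the single-row check are assumptions for illustration, not Pentaho behaviour.

```python
import json


def unwrap_single_block(text, block_name=""):
    """Return the only object inside the array stored under block_name."""
    doc = json.loads(text)
    rows = doc[block_name]
    if len(rows) != 1:
        raise ValueError("expected exactly one row, got %d" % len(rows))
    return rows[0]


pdi_output = '{ "": [ {"name":"item","value":25} ] }'
item = unwrap_single_block(pdi_output)
```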

How to Extract the value of resultSet returned from JDBC response (Via MEL) Mule ESB

I have a JDBC call to a stored procedure. It returns the response below, but I'm not sure how to extract the value of the result set.
Please find the response from DB
{updateCount1=4,resultSet1=[{XML_F5RYI-11YTR=<Customers><Customer1>John<Customer1><Customer2>Ganesh<Customer2><Customers>}],resultSet2[{SequenceNumber=94}],updateCount2=1, updateCount3=4}
I used the expression #[message.payload.get(0)]. It returned the result set below, which is not exactly the value required. I need the XML value of XML_F5RYI-11YTR.
{XML_F5RYI-11YTR=<Customers><Customer1>John<Customer1><Customer2>Ganesh<Customer2><Customers>}
Also tried like below
#[message.payload.get(0).XML_F5RYI-11YTR] but I get an error and am not able to extract the XML.
Could you please suggest how I can extract the XML from resultSet1?
In most cases, the way you did it should work. I think what is happening here is that the hyphen in the column name is interpreted by the MEL parser as a subtraction. So you could change yours to this syntax, and it should work:
#[message.payload.get(0)['XML_F5RYI-11YTR']]
Also you can omit "message", as payload is resolvable directly:
#[payload.get(0)['XML_F5RYI-11YTR']]
You could use array bracket syntax to access the first row in the result set, instead of the get method:
#[payload[0]['XML_F5RYI-11YTR']]
Finally, you might want to do something for each row returned from the database. If you use a collection-splitter or a for-each, your payload will be the map that represents the row, instead of a list of maps representing the whole result set:
<collection-splitter />
<logger message="#[payload['XML_F5RYI-11YTR']]" />
EDIT
To access the result set in the payload shown in the question, you would need to access it like so:
#[payload.resultSet1[0]['XML_F5RYI-11YTR']]
The database connector gives you a list of maps. The map keys will be the name of the columns. Therefore if you want to get updateCount1, you can use something like this:
#[payload.get('updateCount1')]
Rule of thumb: your database connector gives you a list of maps. I'm not sure what format it carries, but if you want the XML_F5RYI.. value, do the following:
take #[message.payload.get(0)], convert it to JSON or a map, and from that use #[message.payload.get("XML_F5RYI-11YTR")]
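The shape of the payload and the reason bracket access is needed can be shown with a plain Python analogy. This is not MEL: the dictionary below only mirrors the response printed in the question (including its XML exactly as shown there), and the variable names are assumptions for illustration. As with MEL, a key containing a hyphen cannot be written as a dotted property name, so it is looked up as a quoted string.

```python
# Python stand-in for the map returned by the stored procedure call.
payload = {
    "updateCount1": 4,
    "resultSet1": [
        {
            "XML_F5RYI-11YTR": "<Customers><Customer1>John<Customer1>"
            "<Customer2>Ganesh<Customer2><Customers>"
        }
    ],
    "resultSet2": [{"SequenceNumber": 94}],
    "updateCount2": 1,
    "updateCount3": 4,
}

# Equivalent of #[payload.resultSet1[0]['XML_F5RYI-11YTR']]
xml_value = payload["resultSet1"][0]["XML_F5RYI-11YTR"]

# Equivalent of #[payload.get('updateCount1')]
update_count = payload.get("updateCount1")
```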

filter result by taking out a matching regex in pig latin

I have some data that contains a URL string, each of which has some variety of substring embedded.
My goal is to get a set of results with the substring removed from the string:
e.g.
rawdata: {
id Long,
url String
}
here's some sample rawdata:
1,/213112341_v1.html
2,43524254243_v2.html
5,/000000_v3.html
5,/000000_v4.html
the result I want is:
1,/213112341.html
2,43524254243.html
5,/000000.html
So basically remove the subversion number (_v1|_v2|_v3|_v4) from the URL and create unique results.
How do I do that in pig?
Thanks,
Your best bet would be to do something like the following:
FOREACH data GENERATE id, CONCAT(REGEX_EXTRACT(url, '(/?[0-9]*)_', 1), '.html');
EDIT:
How about trying the following if the data is more complicated
FOREACH data GENERATE id, CONCAT(STRSPLIT(url, '_v[0-9]', 2).$0, '.html');
That should get everything before the version number, with the CONCAT adding the .html back in. If both the sections before and after the version number are more complicated, you could do something like:
FOREACH data GENERATE id, CONCAT(FLATTEN(STRSPLIT(url, '_v[0-9]',2)))
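As a way to check the expected output before wiring it into Pig, the transformation can be sketched in Python on the sample rows from the question. This is an illustration, not Pig Latin; the function name and the assumption that the version suffix always sits directly before ".html" are mine. It also covers the "unique results" part of the question, which in Pig would typically be a separate DISTINCT step.

```python
import re


def strip_version(url):
    """Remove a _v<digits> suffix that sits directly before .html."""
    return re.sub(r"_v[0-9]+(?=\.html$)", "", url)


rawdata = [
    (1, "/213112341_v1.html"),
    (2, "43524254243_v2.html"),
    (5, "/000000_v3.html"),
    (5, "/000000_v4.html"),
]

# Strip the version, then keep each (id, url) pair once, preserving order.
result = []
seen = set()
for row_id, url in rawdata:
    cleaned = (row_id, strip_version(url))
    if cleaned not in seen:
        seen.add(cleaned)
        result.append(cleaned)
```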