How to make a pie chart of these values in Splunk

I have the following query:
index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method=GET
| stats count(eval(searchmatch("found=true"))) as Hit, count(eval(searchmatch("found=false"))) as Miss
I need to make a pie chart of the two values, Hit and Miss rates.
The field where it is possible to distinguish the values is Message=[CACHE_NAME=RATE_SHOPPER some_other_strings method=GET found=false], where found can also be true.

Without knowing the structure of your data it's hard to say exactly what you need to do, but:
A pie chart is a single data series, so you need to use a transforming command to generate that series. PieChart Doc
If you have a field that denotes a hit or a miss (you could use an eval statement to create one if you don't already have it), you can use it to create the single series.
Let's say this field is called result:
| stats count by result
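If result doesn't exist yet, a minimal sketch of creating it with eval (assuming the found field from the question is already extracted as a field) would be:
| eval result=if(found="true", "Hit", "Miss")
| stats count by result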
Here is a link to the documentation for the Eval Command.
Good luck, hope you get the results you're looking for.

Since you seem to be concerned only about whether "found" equals either "hit" or "miss", try this:
index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method=GET found IN("hit","miss")
| stats count by found

Pie charts require a single field so it's not possible to graph the Hit and Miss fields in a pie. However, if the two fields are combined into one field with two possible values, then it will work.
index=app (splunk_server_group=bex OR splunk_server_group=default) sourcetype=rpm-web* host=rpm-web* "CACHE_NAME=RATE_SHOPPER" method=GET
| eval result=if(searchmatch("found=true"), "Hit", "Miss")
| stats count by result

Related

Splunk field extractor unable to extract all values

I want to extract 4 values out of one field, called msg, in a Splunk query; the msg is in the form of:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
Keys are always static but values are not; for instance, v2 could be XXX or XXYYZZ, and possible values for v3 have unpredictable length.
I queried some sample results and hoped to use the Field Extractor to generate a regex, but the generated regex can't get all the values out; I guess that's because the values don't all have the same length?
Do I need to change my logging format by separating each key=value pair with a comma? Or am I not using the Field Extractor correctly?
[Update1]: A few sample records:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if it makes it easier to query and extract fields. Will adding a separator like a dot or pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, go try your regular expression(s) on regex101 - the field extractor is often a good[ish] start, but rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of the key-value pair is contiguous characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing (see the sketch below), but if your data's consistent, this will work.
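A sequential version would look something like this (a sketch; each rex runs independently, so a reordered or missing key only leaves that one field empty rather than breaking the whole match):
| rex field=_raw "k1=(?<k1>\S+)"
| rex field=_raw "k2=(?<k2>\S+)"
| rex field=_raw "k3=(?<k3>\S+)"
| rex field=_raw "k4=(?<k4>\S+)"
| rex field=_raw "k5=(?<k5>\S+)"
Note that the updated samples show k3 values containing spaces, so \S+ is too narrow for k3; the EDIT below handles that case.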
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
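For reference, a persistent version of that extraction might look something like this (a minimal sketch; the sourcetype and stanza names here are made up):
props.conf:
[your_sourcetype]
REPORT-k1_k5 = extract_k1_k5
transforms.conf:
[extract_k1_k5]
REGEX = k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)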
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...

Change order of categorical bars in Plotly parallel categories

I am trying to visualize changes in gene expression as categorical variables (up, down, no change) over various timepoints.
I have a dataframe describing differential expression data that looks like this:
data = {'gene':['Svm3G0018840','Svm5G0011050','Svm9G0059770'],
'01h': ['nc','up','down'], '04h': ['up', 'down', 'nc'],'08h':['nc','down','up']}
df=pd.DataFrame.from_dict(data)
df=df.set_index('gene')
I can use this df to create the parallel plot using the following code:
fig = px.parallel_categories(herbdf, dimensions=['01h', '04h', '08h','24h','48h'],
labels={'01h':'', '04h':'', '08h':'','24h':'','48h':''})
fig.show()
However, the categories (up, down, nc) are not always in the same order for every time point, which makes the figure very difficult to read. I can change this in the interactive figure in a notebook, but then I only have the option to output the corrected figure as a low-quality png. I need the image in svg format, which means I need to use the line:
fig.write_image("/figs/herb_de_pp.svg")
But when I add this line to the code block to save the figure, I have no control over the order the categorical boxes end up in.
I have tried adding fig.update_* lines to solve this problem, such as:
fig.update_layout(xaxis={'categoryorder':'total descending'})
but this doesn't seem to change the output at all.
I could be missing something simple - any help would be much appreciated!
Parallel categories diagrams don't have xaxis/yaxis properties; you need to update the traces to change the category order within each dimension:
dimensions = ['01h', '04h', '08h','24h','48h']
...
fig.update_traces(dimensions=[{"categoryorder": "category descending"} for _ in dimensions])
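Putting it together with the sample data from the question (a minimal sketch; only the 01h/04h/08h columns exist in the sample, and writing svg assumes the kaleido package is installed):
import pandas as pd
import plotly.express as px

# Sample differential-expression data from the question
data = {'gene': ['Svm3G0018840', 'Svm5G0011050', 'Svm9G0059770'],
        '01h': ['nc', 'up', 'down'], '04h': ['up', 'down', 'nc'], '08h': ['nc', 'down', 'up']}
df = pd.DataFrame.from_dict(data).set_index('gene')

dimensions = ['01h', '04h', '08h']
fig = px.parallel_categories(df, dimensions=dimensions)
# Apply the same fixed category order to every dimension
fig.update_traces(dimensions=[{"categoryorder": "category descending"} for _ in dimensions])
fig.write_image("herb_de_pp.svg")  # static export; requires kaleido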
Not a great answer here, but something that I think will work in a pinch...
It looks like the order of the categories of each figure/column come from the order that they are in the original dataset. That is, in your first column, nc is the first unique item, then down is the second unique item, up is third.
So, if you can rearrange/sort your data so that the data shows up in the order you want it displayed, that should work.
Have your first row be nc | nc | nc | nc | nc, second row down | down | down | down | down, and third row up | up | up | up | up (assuming you actually have records like that). That should do it, but isn't very elegant...
Given the above solution, this is the line needed to sort the dataframe and produce the figure with ordered categories:
sorteddf = df.sort_values(by=['01h','04h','08h'], axis=0, ascending=False)

How can I put several extracted values from a Json in an array in Kusto?

I'm trying to write a query that returns the vulnerabilities found by "Built-in Qualys vulnerability assessment" in Log Analytics.
It was all going smoothly: I was getting the values from the properties JSON and turning them into separate strings, but I found out that some of the terms possess more than one value, and I need to get all of them in a single cell.
My query looks like this right now:
securityresources | where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey=extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id), IdAzure=tostring(properties.id)
| extend IdRecurso = tostring(properties.resourceDetails.id)
| extend NomeVulnerabilidade=tostring(properties.displayName),
Correcao=tostring(properties.remediation),
Categoria=tostring(properties.category),
Impacto=tostring(properties.impact),
Ameaca=tostring(properties.additionalData.threat),
severidade=tostring(properties.status.severity),
status=tostring(properties.status.code),
Referencia=tostring(properties.additionalData.vendorReferences[0].link),
CVE=tostring(properties.additionalData.cve[0].link)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| where status == "Unhealthy"
| project IdRecurso, IdAzure, NomeVulnerabilidade, severidade, Categoria, CVE, Referencia, status, Impacto, Ameaca, Correcao
Ignore the awkward names of the columns, for they are in Portuguese.
As you can see in the "Referencia" and "CVE" columns, I'm able to extract the value at a specific index of the array, but I want all the links in the whole array.
Without sample input and expected output it's hard to understand what you need, so trying to guess here...
I think that summarize make_list(...) by ... will help you (see this to learn how to use make_list)
If this is not what you're looking for, please delete the question and post a new one with minimal sample input (using the datatable operator) and expected output, and we'll gladly help.
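Guessing at the intent, a minimal sketch of that pattern (property and column names are taken from the query above; mv-expand unpacks the array so make_list can collect every link into a single cell):
securityresources
| where type =~ "microsoft.security/assessments/subassessments"
| extend IdAzure = tostring(properties.id)
| mv-expand ref = properties.additionalData.vendorReferences
| extend Referencia = tostring(ref.link)
| summarize Referencias = make_list(Referencia) by IdAzure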

How to select by elements in a UniData multivalued field

I'm trying to do an ad hoc search of records that contain duplicate values in the first and second elements of a multivalued UniData field. I was hoping something like this would work but I'm not having any luck.
LIST PERSON WITH EVAL "STATUS[1] = STATUS[2]"
After some testing, it looks like I stumbled across a way of reading that many characters of the field from the right. Interesting, but not useful for what I need.
LIST PERSON NAME EVAL "NAME[3]" COL.HDG 'Last3'
PERSON Name Last3
0001 Smith ith
Any ideas on how to correctly select on specific field elements?
Apparently the EXTRACT function will let me specify an element but I still can't get a selection on it to work properly.
LIST PERSON STATUS EVAL "EXTRACT(STATUS,1,2,0)" COL.HDG 'Status2'
PERSON STATUS Status2
0001 Added Processed
Processed
I would use EVAL with the #RECORD placeholder and dynamic array notation, as such (assuming that STATUS is in attribute 11):
Edit:
My previous answer below is how I would do this in UniVerse:
SELECT PERSON WITH EVAL "#RECORD<11,1>" EQ EVAL "#RECORD<11,2>"
Script Wolf's better way, which works in both UniVerse and UniData:
SELECT PERSON WITH EVAL "EXTRACT(#RECORD,11,1,0)" EQ EVAL "EXTRACT(#RECORD,11,2,0)"
Good Luck.
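To sanity-check the positions before selecting, something like this should list both elements side by side (a sketch following the same EXTRACT pattern as above):
LIST PERSON EVAL "EXTRACT(#RECORD,11,1,0)" COL.HDG 'Status1' EVAL "EXTRACT(#RECORD,11,2,0)" COL.HDG 'Status2'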

Need to query splunk using rest api call and pull mean and stdev

I am trying to query Splunk using the REST API with the following:
curl -u "<user>":"<pass>" -k https://splunkserver.com:8089/services/search/jobs/export -d'search=search index%3d"<index_name>" sourcetype%3d"access_combined_wcookie" starttime%3d06/02/2013:0:0:0 endtime%3d06/10/2013:0:0:0 uri_path%3d"<uri1>" OR uri_path%3d"<uri2>" user!%3d"-" referer!%3d"-" | eval Time %3d request_time_length%2f1000000 | stats stdev%28Time%29 as stdev, mean%28Time%29 as mean, count%28uri_path%29 as count by uri_path'
However I do not get the computed mean and stdev, I only see count. How can I add the mean and stdev?
The query looks about right. I tried a similar query on my end and it seemed to give me all 3 aggregates. The only thing I can think of is to make sure you have events that match the search criteria. It could be your time boundaries; try expanding those, or maybe removing one or both of them, to see if you get any data for mean and stdev.
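One thing that may also help debugging: let curl do the URL encoding so the search string stays readable (a sketch; server, credentials, and field names are the placeholders from the question):
curl -u "<user>":"<pass>" -k https://splunkserver.com:8089/services/search/jobs/export \
  --data-urlencode 'search=search index="<index_name>" sourcetype="access_combined_wcookie" starttime=06/02/2013:0:0:0 endtime=06/10/2013:0:0:0 uri_path="<uri1>" OR uri_path="<uri2>" user!="-" referer!="-" | eval Time=request_time_length/1000000 | stats stdev(Time) as stdev, mean(Time) as mean, count(uri_path) as count by uri_path'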