Splunk: break down results by matched search phrase

I'm searching for a few different search terms, and I would like stats grouped by which term matched:
"PhraseA" "PhraseB" "PhraseC" | timechart count by <which Phrase matched>
What should be in place of <which Phrase matched>? I will be building a stacked bar chart with the results.

Try creating a category field with eval and case, then use that field in your chart:
index=whatever_index "PhraseA" "PhraseB" "PhraseC"
| eval matched_phrase=case(searchmatch("PhraseA"), "PhraseA", searchmatch("PhraseB"), "PhraseB", searchmatch("PhraseC"), "PhraseC")
| timechart count by matched_phrase
There's more detail on these functions in the Splunk documentation.
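One wrinkle worth noting: case returns null when no clause matches, and an event matching several phrases gets the label of the first matching clause. A sketch with a true() catch-all added (the index name is a placeholder); note the explicit OR — listing the three quoted phrases side by side, as in the question, is an implicit AND and would only match events containing all three:

```
index=whatever_index "PhraseA" OR "PhraseB" OR "PhraseC"
| eval matched_phrase=case(searchmatch("PhraseA"), "PhraseA",
                           searchmatch("PhraseB"), "PhraseB",
                           searchmatch("PhraseC"), "PhraseC",
                           true(), "other")
| timechart count by matched_phrase
```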

Search using previous query's values

I am relatively new to Splunk, and I am attempting a query like the following. The snippets below each step show some of what I've attempted.
Query for initial set of events containing a string value
* "string_value"
Get list of distinct values for a specific field returned from step 1
* "string_value" | stats list(someField)
Search for events containing any of the specific field's values returned from step 2
* "string_value" | stats list(someField) as myList | search someField in myList
I'm not entirely certain this can be accomplished. I've read documentation on subsearches, foreach, and various aggregate functions, but I'm still unsure how to achieve what I need.
Other attempts:
someField IN [search * "string_value" | stats list(someField) as myList]
Thanks in advance.
You certainly can sequentially build a search like this, but you're likely better off doing it this way:
index=ndx sourcetype=srctp someField IN("my","list","of","values") "string_value"
| stats values(someField) as someField
The more you can put in your initial search, the better (in general).
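If the list of values genuinely has to come from a first search rather than being hard-coded, a subsearch can generate that filter for you. A minimal sketch, assuming the same hypothetical index and sourcetype names as above; the stats values(...) inside the brackets is rendered by Splunk as an OR of someField=value terms applied to the outer search:

```
index=ndx sourcetype=srctp
    [ search index=ndx sourcetype=srctp "string_value"
      | stats values(someField) as someField ]
| stats count by someField
```

Keep in mind that subsearches are capped by default at 10,000 results and a 60-second runtime, which is another reason pushing filters into the initial search is preferred.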

Splunk: I am trying to create report by writing a query but the values are not displaying under statistics. How can I resolve this?

I am new to the Splunk tool. I am trying to create a report using a query. The data is not loading under Statistics, but I can see the logs under Events. Is there something I'm missing in my query below:
index="cba_strat_risk" sourcetype IN ("kube:container:abc-service", "kube:container:xyz-service")
| stats count as count, count(eventtype="nix-all-logs") as success-count, count(eventtype="nix_errors") as error-count
| eval success_percentage=round(success-count/count*100,2)
| eval error_percentage=round(error-count/count*100,2)
| fields sourcetype eventtype success-count error-count success_percentage error_percentage
Also attaching the screenshot:
Please do let me know if I am missing something.
The Statistics tab is populated by transforming commands (stats, chart, timechart, and the like), which your query has. The problem is that the values shown are either null or zero.
First, avoid hyphens (a.k.a. minus signs) in field names. They only lead to parsing problems: in an eval expression, success-count is read as success minus count.
Second, the construct count(eventtype="nix-all-logs") won't work. To count the results of an expression, you must wrap it in eval, as in count(eval(eventtype="nix-all-logs")); in that form, count counts only the events for which the expression is true. An equivalent spelling is sum(eval(if(eventtype="nix-all-logs",1,0))).
I was able to make it work:
index="cba_strat_risk" sourcetype IN ("kube:container:abc-service", "kube:container:xyz-portal", "kube:container:zzz-landing")
| stats count as total_count, count(eval(eventtype="nix-all-logs")) as success_count, count(eval(eventtype="nix_errors")) as error_count by sourcetype
| eval success_percentage=round(success_count/total_count*100,2)
| eval error_percentage=round(error_count/total_count*100,2)
| rename sourcetype as Service, success_count as "Success Count", error_count as "Error Count", success_percentage as "Success Percentage", error_percentage as "Error Percentage"
| fields Service "Success Count" "Success Percentage" "Error Count" "Error Percentage"
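A quick way to convince yourself how count(eval(...)) behaves is to fabricate a few events with makeresults (the field values here are made up for illustration):

```
| makeresults count=4
| streamstats count as n
| eval eventtype=if(n<=3, "nix-all-logs", "nix_errors")
| stats count as total_count,
        count(eval(eventtype="nix-all-logs")) as success_count,
        count(eval(eventtype="nix_errors")) as error_count
```

This should give total_count=4, success_count=3, and error_count=1, since count(eval(...)) only counts events where the expression is true.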

Splunk: search a key in JSON

Could anyone help me with the below Splunk query?
I want to get the count of records by message.type, which can be either 'typeA' or 'typeB'.
I tried the query below, but it lists events instead of giving separate counts for typeA and typeB.
The messages are below.
message: name=app1,version=1, type=typeA,task=queryapp
message: name=app2,version=1, type=typeB,task=testapp
message: name=app1,version=1, type=typeB,task=issuefix
index=myapp message="name=app1"
| stats count by message.type
Ideally, you would modify the logs so that type is its own JSON field.
However, if you are stuck with
{"message" : "name=app1,version=1, type=typeA,task=queryapp"}
Then I suggest the following solution:
index=myapp message=*
| rex field=message "type=(?<myType>[a-zA-Z]+)"
| stats count by myType
The rex command here extracts a new Splunk field named myType from the existing message field using the supplied regular expression.
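You can sanity-check the extraction against one of the sample messages with makeresults before running it over the real index:

```
| makeresults
| eval message="name=app1,version=1, type=typeA,task=queryapp"
| rex field=message "type=(?<myType>[a-zA-Z]+)"
| table message myType
```

For this sample message, myType should come out as typeA.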

Alternative to subsearch to search more than million entries

Hi, I have a subsearch that gives me the required results but is very slow. I have more than a million log entries to search, which is why I am looking for an optimized solution. I have gone through answers to similar questions but have not been able to achieve what I need.
I have a log with transactions against an Entry_ID. Every entry has a mainEntry event and may or may not have a subEntry event.
I want to count, by version number, all the mainEntry logs that have a subEntry.
Sample query that I used:
index=index_a [search index=index_a ENTRY_FIELD="subEntry"| fields Entry_ID] Entry_FIELD="mainEntry" | stats count by version
Sample data (Index=index_a):
1) Entry_ID=abcd Entry_FIELD="mainEntry" version=1
   Entry_ID=abcd ENTRY_FIELD="subEntry"
2) Entry_ID=1234 Entry_FIELD="mainEntry" version=1
3) Entry_ID=xyz Entry_FIELD="mainEntry" version=2
4) Entry_ID=lmnop Entry_FIELD="mainEntry" version=1
   Entry_ID=lmnop ENTRY_FIELD="subEntry"
5) Entry_ID=ab123 Entry_FIELD="mainEntry" version=3
   Entry_ID=ab123 ENTRY_FIELD="subEntry"
Please help me optimize this.
It's not entirely clear what your sample data looks like.
Do events 1, 4, and 5 each have two occurrences of Entry_ID and Entry_FIELD (that is, the fields Entry_ID, Entry_FIELD, version, Entry_ID, Entry_FIELD)?
You can try something like the following, but I think you need to explain your data a bit better.
index=index_a Entry_FIELD="subEntry" OR Entry_FIELD="mainEntry"
| stats dc(Entry_FIELD) as Entry_FIELD_Count, values(version) as version by Entry_ID
| where Entry_FIELD_Count==2
| stats count by version
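An alternative that avoids the subsearch entirely, assuming the field names in the sample data are literal (ENTRY_FIELD on subEntry events, Entry_FIELD on mainEntry events — Splunk field names are case-sensitive): use eventstats to annotate each Entry_ID with whether a subEntry exists, then count only the annotated mainEntry events:

```
index=index_a Entry_FIELD="mainEntry" OR ENTRY_FIELD="subEntry"
| eventstats values(ENTRY_FIELD) as sub_marker by Entry_ID
| where Entry_FIELD="mainEntry" AND sub_marker="subEntry"
| stats count by version
```

eventstats sidesteps subsearch result limits, though it holds intermediate results in memory, so the stats-based approach above may scale further on very large result sets.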

Query to loop through data in splunk

I've below lines in my log:
...useremail=abc@fdsf.com id=1234 ....
...useremail=pqr@fdsf.com id=4565 ....
...useremail=xyz@fdsf.com id=5773 ....
1) Capture all those userids for the period from -1d@d to @d.
2) For each user, search from the beginning of the index until -1d@d and see if the userid is already present, by comparing the actual id field.
3) If it is not present, then add it into the counter.
4) Display this final count.
Can I achieve this in Splunk?
Thanks!
Yes, there are several ways to do this in Splunk, each varying in degrees of ease and ability to scale. I'll step through the subsearch method:
1) Capture all those userids for the period from -1d@d to @d
You want to first validate a search that returns only a list of ids, which will then be turned into a subsearch:
sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id
2) For each user, search from the beginning of the index until -1d@d and see if the userid is already present by comparing the actual id field
Construct a main search over a different timeframe that uses the subsearch from (1) to match against those ids (note that the subsearch must start with search):
sourcetype=<MY_SOURCETYPE> [search sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id] earliest=0 latest=-1d@d
This returns a raw dataset of all events from the start of the index up to, but not including, -1d@d that contain the ids from (1).
3) If it is not present, then add it into the counter
Revise that search with a NOT against the entire subsearch, and pipe the outer search to stats to see the ids it matched:
sourcetype=<MY_SOURCETYPE> NOT [search sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id] earliest=0 latest=-1d@d | stats values(id)
4) Display this final count.
Revise the last stats command to return a distinct count instead:
sourcetype=<MY_SOURCETYPE> NOT [search sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id] earliest=0 latest=-1d@d | stats dc(id)
Performance considerations:
The above method works reasonably well for datasets under a million rows on commodity hardware. The issue is that the subsearch is blocking, so the outer search has to wait for it, and subsearches are also subject to result-count and runtime limits. If you have larger datasets to deal with, alternative methods are needed to make this an efficient search.
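One such alternative is to drop the subsearch and compute each id's first appearance in a single pass, then count the ids first seen in the last day. A sketch, with sourcetype left as a placeholder as above:

```
sourcetype=<MY_SOURCETYPE> earliest=0 latest=@d
| stats min(_time) as first_seen by id
| where first_seen >= relative_time(now(), "-1d@d")
| stats count as new_ids
```

A single stats pass like this avoids both the subsearch's blocking behavior and its result-count cap, at the cost of scanning the whole index once.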
FYI, Splunk has a dedicated site where you can get answers to questions like this much faster: http://splunk-base.splunk.com/answers/