Query to loop through data in splunk - splunk

I have the following lines in my log:
...useremail=abc@fdsf.com id=1234 ....
...useremail=pqr@fdsf.com id=4565 ....
...useremail=xyz@fdsf.com id=5773 ....
I want to:
1) Capture all those userids for the period from -1d@d to @d
2) For each user, search from beginning of index until -1d@d & see if the userid is already present by comparing actual id field
3) If it is not present, then add it into the counter
4) Display this final count.
Can I achieve this in Splunk?
Thanks!

Yes, there are several ways to do this in Splunk, each varying in degrees of ease and ability to scale. I'll step through the subsearch method:
1) Capture all those userids for the period from -1d@d to @d
You want to first validate a search that returns only a list of ids, which will then be turned into a subsearch:
sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id
2) For each user, search from beginning of index until -1d@d & see if the userid is already present by comparing actual id field
Construct a main search with a different timeframe that uses the subsearch from (1) to match against those ids (note that the subsearch must start with search):
sourcetype=<MY_SOURCETYPE> [search sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d | stats values(id) AS id] earliest=0 latest=-1d@d
This returns the raw events from the start of the index up to, but not including, -1d@d that contain the ids from (1).
3) If it is not present, then add it into the counter
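The search in (2) only shows the ids that were already present. Simply negating its subsearch would return older ids that did not recur, which is the opposite of what is asked. To find the ids that are new, swap the two time ranges: search the period from -1d@d to @d in the outer search, put a NOT in front of a subsearch that collects every id seen before -1d@d, and pipe the outer search to stats to see the ids that remain:
sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d NOT [search sourcetype=<MY_SOURCETYPE> earliest=0 latest=-1d@d | stats values(id) AS id] | stats values(id)
4) Display this final count.
Revise the last stats command to return a distinct count instead:
sourcetype=<MY_SOURCETYPE> earliest=-1d@d latest=@d NOT [search sourcetype=<MY_SOURCETYPE> earliest=0 latest=-1d@d | stats values(id) AS id] | stats dc(id)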
Performance considerations:
The above method works reasonably well for datasets under 1 million rows, on commodity hardware. The issue is that the subsearch is blocking, thus the outer search needs to wait. If you have larger datasets to deal with, then alternative methods need to be employed to make this an efficient search.
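As an illustration of one possible alternative (not something the answer above spells out), the same count can be computed in a single pass by tracking when each id was first seen and counting the ids whose first sighting falls inside the window. In SPL this would roughly correspond to a stats min(_time) BY id followed by a filter. The Python sketch below only models that logic outside Splunk; the function name, the sample events, and the dates are all made up for the example.

```python
from datetime import datetime, timedelta


def count_new_ids(events, window_start, window_end):
    """Count ids whose earliest event falls inside [window_start, window_end)."""
    first_seen = {}
    for timestamp, user_id in events:
        if timestamp >= window_end:
            continue  # ignore anything after the window
        if user_id not in first_seen or timestamp < first_seen[user_id]:
            first_seen[user_id] = timestamp
    # An id is "new" when it was never seen before the window started.
    return sum(1 for ts in first_seen.values() if ts >= window_start)


# Hypothetical sample data: (event time, id) pairs.
today = datetime(2024, 1, 10)           # stands in for @d
yesterday = today - timedelta(days=1)   # stands in for -1d@d
events = [
    (datetime(2024, 1, 3, 8, 0), "1234"),   # seen before the window
    (datetime(2024, 1, 9, 9, 0), "1234"),   # seen again yesterday: not new
    (datetime(2024, 1, 9, 10, 0), "4565"),  # first seen yesterday: new
    (datetime(2024, 1, 9, 11, 0), "5773"),  # first seen yesterday: new
    (datetime(2024, 1, 9, 12, 0), "5773"),  # repeat inside the window
]
new_id_count = count_new_ids(events, yesterday, today)
print(new_id_count)
```

Because every event is read only once and nothing waits on a subsearch, this shape tends to scale better, at the cost of scanning the whole index.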
FYI, Splunk has a dedicated site where you can get answers to questions like this much faster: http://splunk-base.splunk.com/answers/

Related

Search using previous query's values

I am relatively new to Splunk, and I am attempting to perform a query like the following. The snippets below each step show some of what's been attempted.
Query for initial set of events containing a string value
* "string_value"
Get list of distinct values for a specific field returned from step 1
* "string_value" | stats list(someField)
Search for events containing any of the specific field's values returned from step 2
* "string_value" | stats list(someField) as myList | search someField in myList
I'm not entirely certain if this can be accomplished. I've read documents on subqueries, foreach, and various aggregate methods, though I am still uncertain on how to achieve what I need.
Other attempts:
someField IN [search * "string_value" | stats list(someField) as myList]
Thanks in advance.
You certainly can sequentially build a search like this, but you're likely better off doing it this way:
index=ndx sourcetype=srctp someField IN("my","list","of","values") "string_value"
| stats values(someField) as someField
The more you can put in your initial search, the better (in general).

Splunk join two queries based on result of first query

I have two Splunk queries like the ones below:
Query 1: index=mysearchstring1
Result: employid=123
Query 2: index=mysearchstring2
I want to use employid=123 from Query 1 in Query 2 to look up and return the final result.
Is this possible in Splunk?
It sounds like you're looking for a subsearch.
index=mysearchstring2 [ search index=mysearchstring1 | fields employid | format ]
Splunk will run the subsearch first and extract only the employid field. The results will be formatted into something like (employid=123 OR employid=456 OR ...) and that string will be appended to the main search before it runs.
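To make the expansion step concrete, here is a rough Python model of what happens to the subsearch results before the main search runs. It is only a sketch: the helper names are invented, and the string it builds follows the simplified form quoted above, so the exact quoting and parentheses Splunk emits may differ.

```python
def expand_subsearch(rows, field):
    """Turn subsearch result rows into an OR clause over one field."""
    terms = [f"{field}={row[field]}" for row in rows if field in row]
    return "(" + " OR ".join(terms) + ")"


def build_outer_search(base, rows, field):
    """Append the expanded clause to the main search string."""
    return f"{base} {expand_subsearch(rows, field)}"


# Hypothetical rows, as if returned by: search index=mysearchstring1 | fields employid
subsearch_rows = [{"employid": "123"}, {"employid": "456"}]
outer_search = build_outer_search("index=mysearchstring2", subsearch_rows, "employid")
print(outer_search)
```

The point of the model is that the main search never sees the subsearch itself, only the filter text generated from its results.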

SignalFX detector data().count() based on condition

Is it possible to implement count() on MTS based on a condition?
For instance:
We need to monitor the number of times RDS CPU utilization reaches 95% over the last 3 days.
A = data('CPU_Utilization').count(...when point > 95%).
detector(when(A > {number_of_times_breached}, lasting='3d')).publish(...)
Update.
Solution was found by my colleague:
A = data('CPU_Utilization').above({condition_value}, inclusive=True).count(...)
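You can use eval() with a boolean result inside count() in your SPL query. Note that stats requires an AS rename when the aggregation wraps eval; the field name breaches below is arbitrary.
Something like:
<your search> | stats count(eval(point>0.95)) AS breaches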
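For anyone unsure what a conditional count does, this small Python sketch models it: only the data points that satisfy the condition are counted. The function name and sample values are invented for illustration, and the comparison is strict here, whereas the SignalFlow solution above used inclusive=True.

```python
def count_breaches(points, threshold):
    """Count the data points strictly above the threshold."""
    return sum(1 for point in points if point > threshold)


# Hypothetical CPU utilization samples expressed as fractions.
cpu_points = [0.50, 0.96, 0.95, 0.99, 0.10]
breaches = count_breaches(cpu_points, 0.95)
print(breaches)
```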

Alternative to subsearch to search more than million entries

Hi, I have a subsearch that gives me the required results but is very slow. I have more than a million log entries to search, which is why I am looking for an optimized solution. I have gone through the answers to similar questions but have not been able to achieve what I need.
I have a log that records transactions against an entry_id; each one always has a main entry and may or may not have a subEntry.
I want to count, by version number, the mainEntry logs that have a subEntry.
Sample query that I used:
index=index_a [search index=index_a ENTRY_FIELD="subEntry"| fields Entry_ID] Entry_FIELD="mainEntry" | stats count by version
Sample data
Index=index_a
1) Entry_ID=abcd Entry_FIELD="mainEntry" version=1
Entry_ID=abcd ENTRY_FIELD="subEntry"
2)Entry_ID=1234 Entry_FIELD="mainEntry" version=1
3)Entry_ID=xyz Entry_FIELD="mainEntry" version=2
4)Entry_ID=lmnop Entry_FIELD="mainEntry" version=1
Entry_ID=lmnop ENTRY_FIELD="subEntry"
5)Entry_ID=ab123 Entry_FIELD="mainEntry" version=3
Entry_ID=ab123 ENTRY_FIELD="subEntry"
Please help in optimizing this
It's not entirely clear what your sample data looks like.
Is it that events 1, 4 and 5 have the fields Entry_ID, Entry_FIELD, version, Entry_ID, Entry_FIELD? That is, 2 occurrences of Entry_ID and Entry_FIELD?
You can try something like the following, but I think you need to explain your data a bit better.
index=index_a Entry_FIELD="subEntry" OR Entry_FIELD="mainEntry"
| stats dc(Entry_FIELD) as Entry_FIELD_Count by Entry_ID, version
| where Entry_FIELD_Count==2
| stats count by version
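The idea behind that search is to read all events once and group them, rather than run a subsearch first. Here is a Python sketch of the same idea on the sample data from the question. It rests on assumptions the thread leaves open: that subEntry lines are separate events with no version field, and that the field is spelled Entry_FIELD everywhere. Under those assumptions it groups by Entry_ID only and carries the version along, which differs slightly from the search above.

```python
from collections import Counter, defaultdict


def count_versions_with_sub_entry(events):
    """Count, per version, the entry ids that have both a main and a sub entry."""
    entry_types = defaultdict(set)
    versions = {}
    for event in events:
        entry_id = event["Entry_ID"]
        entry_types[entry_id].add(event["Entry_FIELD"])
        if "version" in event:
            versions[entry_id] = event["version"]
    return Counter(
        versions[entry_id]
        for entry_id, types in entry_types.items()
        if {"mainEntry", "subEntry"} <= types and entry_id in versions
    )


# Sample data from the question, with the field name normalized (assumption).
sample_events = [
    {"Entry_ID": "abcd", "Entry_FIELD": "mainEntry", "version": 1},
    {"Entry_ID": "abcd", "Entry_FIELD": "subEntry"},
    {"Entry_ID": "1234", "Entry_FIELD": "mainEntry", "version": 1},
    {"Entry_ID": "xyz", "Entry_FIELD": "mainEntry", "version": 2},
    {"Entry_ID": "lmnop", "Entry_FIELD": "mainEntry", "version": 1},
    {"Entry_ID": "lmnop", "Entry_FIELD": "subEntry"},
    {"Entry_ID": "ab123", "Entry_FIELD": "mainEntry", "version": 3},
    {"Entry_ID": "ab123", "Entry_FIELD": "subEntry"},
]
version_counts = count_versions_with_sub_entry(sample_events)
print(dict(version_counts))
```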

Splunk breakdown results by matched search phrase

I'm searching for a few different search terms, and I would like stats grouped by which term matched:
"PhraseA" "PhraseB" "PhraseC" | timechart count by <which Phrase matched>
What should be in place of <which Phrase matched>? I will be building a stacked bar chart with the results.
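Try creating a category field using eval and case, and use that in your chart. Note that the phrases need OR between them; terms listed side by side are ANDed, so only events containing all three would match:
index=whatever_index ("PhraseA" OR "PhraseB" OR "PhraseC")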
| eval matched_phrase=case(searchmatch("PhraseA"), "PhraseA", searchmatch("PhraseB"), "PhraseB", searchmatch("PhraseC"), "PhraseC")
| timechart count by matched_phrase
There is lots more good info in the Splunk documentation for these functions.
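One thing worth knowing about case() is that it returns the value for the first condition that is true, so an event containing more than one phrase is counted only under the earliest phrase in the list. The Python sketch below models that behaviour with a plain substring test, which is a simplification of searchmatch(); the function name and the sample events are made up.

```python
from collections import Counter


def matched_phrase(raw_event, phrases):
    """Return the first phrase found in the event text, or None."""
    for phrase in phrases:
        if phrase in raw_event:
            return phrase
    return None


phrases = ["PhraseA", "PhraseB", "PhraseC"]
# Hypothetical raw events.
raw_events = [
    "login ok PhraseA",
    "PhraseB timeout",
    "PhraseA then PhraseC",   # contains two phrases, counted under PhraseA
    "retry PhraseC",
    "no match here",          # the Splunk search would not return this event
]
phrase_counts = Counter(
    category
    for category in (matched_phrase(event, phrases) for event in raw_events)
    if category is not None
)
print(dict(phrase_counts))
```

If an event should count once for every phrase it contains, a different approach than case() would be needed.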