Splunk query reference field in joined data - splunk

Full disclosure: I am very new to Splunk, so I may explain my question incorrectly.
I have two data sources and was given a query to pull data from each of them individually. I am trying to join this data together so I can create some type of chart, but I am unsure if this would be a join, a search, etc.
My initial query is as follows:
This allows me to search through the mail logs by sender address and show all emails with a bcSendAction=1, which is a successful send.
index=mail sourcetype=barracuda [search index=mail sourcetype=barracuda bcSender="someemail#domain.com" | table bcMsgId] bcSendAction=1
The result of this search is as follows:
Now, my other search is a log that shows all of the sender email addresses during a certain time period. I would like to use the result of this (the email value) in the first search so that I don't have to hard-code the bcSender, but rather have it use the results from the other source.
// Returns an email address
index=mail sourcetype=sendmail_syslog *#sfdc.net |
rex field=from "<(?<from>.*)>" |
table from | dedup from
I was able to parse the log and pull out just the email addresses that I want to use to plug into my first search.
I followed a few emails and tutorials, but a lot of the joins I was seeing only used two different sources/datasets and didn't use the search as I did in my first query.
My attempt at this was something like:
index=mail sourcetype=sendmail_syslog *#sfdc.net
| rex field=from "<(?<from>.*)>"
| table from | dedup from
| join from
[search index=mail sourcetype=barracuda [search index=mail sourcetype=barracuda bcSender=from | table bcMsgId] bcSendAction=1]
I don't know that I am referencing the email from the first result set correctly.
Can someone point me in the right direction with how to approach this search?

If I understand your request properly, then you need 3 steps:
1. Get the sender addresses from index=mail sourcetype=sendmail_syslog.
2. Use these sender addresses to get a list of message IDs from index=mail sourcetype=barracuda.
3. Use these message IDs to finally get the events you are looking for.
This sounds like you need a subsearch (for getting the sender addresses) inside of another subsearch (for getting the messageID's), meaning your own attempt was pointing in the right direction already.
Try something along these lines:
index=mail sourcetype=barracuda bcSendAction=1
[ search
index=mail sourcetype=barracuda
[ search
index=mail sourcetype=sendmail_syslog *#sfdc.net
| rex field=from "<(?<bcSender>.*)>"
| stats count by bcSender
| fields bcSender
| format
]
| stats count by bcMsgId
| fields bcMsgId
| format
]
I cannot really verify it without having your data, but I'll try to explain what it's supposed to do. Let's start from the innermost subsearch.
Line 4 starts the innermost subsearch
Line 5 selects the events from which you generate the address list.
Line 6 extracts the addresses directly into the field bcSender. (We could extract it to the field from first and then rename it, but this is more direct.)
We need the fieldname to be bcSender for the outer search.
Line 7 is a different way to deduplicate by bcSender and at the same time reduce the amount of data which needs to be sent back from indexers to the searchhead (if you have a distributed environment).
Line 8 gets rid of all the fields we don't require. They would be problematic with the following format command.
Line 9 passes the results back to the enclosing search in such a way that they can be used as part of the search string.
Line 10, of course, closes the innermost subsearch.
Now let's have a look at the outer subsearch.
Line 2 starts the subsearch.
Line 3 selects the events from which we can get the message IDs. This is, of course, augmented by the enclosed subsearch we've just discussed.
Line 11 again is a way to dedup the message IDs.
Line 12 again limits things to the field we need.
Line 13 passes the found message IDs to the outermost (main) search in such a way that they become part of the search string.
Line 14, you already know, closes the subsearch.
And the outermost search:
Line 1 selects the data you are targeting and is augmented by what the subsearches pass to it.
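To see what the format command hands back to the enclosing search, it can help to run a subsearch on its own. The following is a minimal sketch that needs no indexed data: makeresults and streamstats only fabricate two rows, and the two addresses are the placeholder values used elsewhere in this thread, not real data.

```spl
| makeresults count=2
| streamstats count AS n
| eval bcSender=if(n=1, "someemail#domain.com", "anotheremail#domain.com")
| fields bcSender
| format
```

The result should be a single row with a field named search that contains something like ( ( bcSender="someemail#domain.com" ) OR ( bcSender="anotheremail#domain.com" ) ). That text is what gets spliced into the outer search.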

That one side of the join is a single field indicates it is a good candidate for a subsearch. Subsearches run first and their results then become part of the main search.
index=mail sourcetype=barracuda bcSendAction=1
[ search index=mail sourcetype=sendmail_syslog *#sfdc.net
| rex field=from "<(?<from>.*)>"
| fields from | rename from as bcSender | format ]
It's important that the result of the subsearch contain a field present in the main search. That's why I used rename.
After the subsearch runs, you get a search that's equivalent to this:
index=mail sourcetype=barracuda bcSendAction=1 (bcSender="someemail#domain.com" OR bcSender="anotheremail#domain.com")

Related

How to Build Splunk Search Query for below Scenario

I am able to get the multiple events (api's logs) in splunk dashboard like below
event-1:
{ "corrId":"12345", "traceId":"srh-1", "apiName":"api1" }
event-2:
{ "corrId":"69863", "traceId":"srh-2", "apiName":"api2" }
event-3:
{ "corrId":"12345", "traceId":"srh-3", "apiName":"api3" }
I want to retrieve the corrId (e.g. "corrId":"12345") dynamically from one event (API log) by providing the apiName, and then build a Splunk search query based on the retrieved corrId value, so that it pulls all the event logs that contain the same corrId ("corrId":"12345").
Output
In above scenario expected results would be like below
event-1:
{ "corrId":"12345", "traceId":"srh-1", "apiName":"api1" }
event-3:
{ "corrId":"12345", "traceId":"srh-3", "apiName":"api3" }
I am new to Splunk. Please help me work out how to fetch "corrId":"12345" dynamically by providing another field such as apiName, and how to build a Splunk search query based on that.
I have tried the query below, but with no luck.
index = "test_srh source=policy.log [ search index = "test_srh source=policy.log | rex field=_raw "apiName":|s+"(?<name>[^"]+)" | search name="api1" | table corrId]
This query gives event-1 log only but we need all other events which contain same corrId ("corrId":"12345"). Appreciate quick help here.
Given you're explicitly extracting the apiName field, I'll assume the corrId field is not automatically extracted, either. That means putting corrId="12345" in the base query won't work. Try index=test_srh source=policy.log corrId="12345" to verify that.
If the corrId field needs to be extracted then try this query.
index=test_srh source=policy.log
| rex "corrId\":\"(?<corrId>[^\"]+)"
| where [ search index=test_srh source=policy.log
| rex "apiName\":\"(?<name>[^\"]+)"
| search name="api1"
| rex "corrId\":\"(?<corrId>[^\"]+)"
| fields corrId | format ]
Note: I also corrected the regex to properly extract the apiName field.
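The two extraction patterns can be checked without touching the index. The sketch below, written with single-backslash escapes, fabricates one event with makeresults and uses the text of event-1 from the question as _raw. It only tests the regexes and says nothing about how the real data is indexed.

```spl
| makeresults
| eval _raw="{ \"corrId\":\"12345\", \"traceId\":\"srh-1\", \"apiName\":\"api1\" }"
| rex "apiName\":\"(?<name>[^\"]+)"
| rex "corrId\":\"(?<corrId>[^\"]+)"
| table name corrId
```

If both patterns match, the table should show name=api1 and corrId=12345. An empty column means the corresponding pattern does not fit the raw text, for example because the real events have whitespace after the colon.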

Search using Lookup from a single field CSV file

I have a list of usernames that I have to monitor and the list is growing every day. I read Splunk documentation and it seems like lookup is the best way to handle this situation.
The goal is for my query to leverage the lookup function and print out all the download events from all the users in the list.
Sample logs
index=proxy123 activity="download"
{
"machine":"1.1.1.1",
"username":"ABC#xyz.com",
"activity":"download"
}
{
"machine":"2.2.2.2",
"username":"ASDF#xyz.com",
"activity":"download"
}
{
"machine":"3.3.3.3",
"username":"GGG#xyz.com",
"activity":"download"
}
Sample Lookup (username.csv)
users
ABC#xyz.com
ASDF#xyz.com
BBB#xyz.com
Current query:
index=proxy123 activity="download" | lookup username.csv users OUTPUT users | where not isnull(users)
Result: 0 (which is not correct)
I probably don't understand lookup correctly. Can someone correct me and teach me the correct way?
In the lookup file, the name of the field is users, whereas in the event, it is username. Fortunately, the lookup command has a mechanism for renaming the fields during the lookup. Try the following
index=proxy123 activity="download" | lookup username.csv users AS username OUTPUT users | where isnotnull(users)
Now, depending on the volume of data you have in your index and how much data is being discarded when not matching a username in the CSV, there may be alternate approaches you can try, for example, this one using a subsearch.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS username | format ]
What happens here is that the subsearch (the bit in the []) is expanded first, in this case to the equivalent of (username="ABC#xyz.com" OR username="ASDF#xyz.com" OR username="BBB#xyz.com"). So your main search will turn into
index=proxy123 activity="download" (username="ABC#xyz.com" OR username="ASDF#xyz.com" OR username="BBB#xyz.com")
which may be more efficient than returning all the data in the index, then discarding anything that doesn't match the list of users.
This approach assumes that you have the username field extracted in the first place. If you don't, you can try the following.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS search | format ]
The expanded search will be equivalent to
index=proxy123 activity="download" ("ABC#xyz.com" OR "ASDF#xyz.com" OR "BBB#xyz.com")
which may be more suitable to your data.
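To preview what a subsearch will expand to before embedding it, the bracketed part can be run as a search of its own. This sketch assumes username.csv is already uploaded as a lookup file with the single column users, as described in the question.

```spl
| inputlookup username.csv
| rename users AS username
| format
```

The output should be one row whose search field holds the generated condition, roughly one username="..." clause per CSV row joined with OR. If that string looks wrong, the main search will be wrong in the same way.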

Splunk: search a string, if found only then look for another log with same request-id

I want to find a string (driving factor) and if found, only then look for another string with same x-request-id and extract some details out of it.
x-request-id=12345 "InterestingField=7850373" [this one is subset of very specific request]
x-request-id=12345 "veryCommonField=56789" [this one is a superSet of all kind of requests]
What I've tried:
index=myindex "InterestingField" OR "veryCommonField"
| transaction x-request-id
But the problem with the above is that this query also joins all the requests that contain only veryCommonField.
I want to avoid join, as joins perform poorly.
What I need:
list InterestingField, veryCommonField
Example:
The event below represents the beginning of every kind of request. We get thousands of such requests a day.
index=myIndex xrid=12345 "Request received for this. field1: 123 field2: test"
Of all the requests above, fewer than 100 fall into the category below.
index=myIndex xrid=12345 "I belong to blahBlah category. field3: 67583, field4: testing"
I don't want to search the whole superset of 1000k+ requests, only the 100 matching ones, because with an increased time span this search query will take very long.
If I'm understanding your use-case, the following may be helpful.
Using stats
index=myindex "InterestingField" OR "veryCommonField" | stats values(InterestingField), values(veryCommonField) by x-request-id
Using subsearch
index=myindex [ search index=myindex InterestingField=* | fields x-request-id | format ]
Depending on the number of results that match InterestingField, you can also use map, https://docs.splunk.com/Documentation/Splunk/8.0.3/SearchReference/Map
index=myindex InterestingField="*" | map maxsearches=0 "search index=myindex x-request-id=$x-request-id$ | stats values(InterestingField), values(veryCommonField) by x-request-id"
If you provide more thorough example events, we can assist you further.
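The stats variant on its own still returns request IDs that only ever had veryCommonField. One way to drop them is a where clause after the aggregation. The sketch below fabricates three events with makeresults so it can be run anywhere. It uses xrid as the request-ID field name (as in the example events in the question) and assumes both fields are already extracted.

```spl
| makeresults count=3
| streamstats count AS n
| eval xrid=if(n=3, "67890", "12345")
| eval InterestingField=if(n=1, "7850373", null())
| eval veryCommonField=if(n=1, null(), "56789")
| stats values(InterestingField) AS InterestingField values(veryCommonField) AS veryCommonField BY xrid
| where isnotnull(InterestingField)
```

Only xrid=12345 should remain, because 67890 never carried InterestingField. Against real data, the first five lines would be replaced by the base search. Note that this still reads every veryCommonField event, so the subsearch form may be faster when the interesting subset is small.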

Splunk lookuptable

I have a CSV with different kinds of IoCs in it, like email addresses, IPs, etc. I want to run a search on any of my indexes which would return each record that has any match with my list.
This is what I want to achieve:
index=* "item1" OR "item2" OR "item3"
Since I have a thousand items on my list this won't work. So, I uploaded my csv as a lookuptable and tried the following:
index=* [| inputlookup test.csv]
This returns nothing, but if I search for each item "manually" then I get results.
What am I missing?
It would help to know the format of your CSV, but this should help.
index=* [| inputlookup test.csv | format]
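One possible reason for empty results, assuming the CSV has a header row: format builds field="value" pairs from the column names. A column called, say, ioc (a hypothetical name) would expand to ioc="item1", which only matches events that have an extracted field named ioc. Renaming the column to search makes the values be inserted as bare search terms instead, closer to the "item1" OR "item2" form.

```spl
index=* [| inputlookup test.csv | rename ioc AS search | fields search | format]
```

Replace ioc with whatever the header in test.csv actually is.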
If you insist on using index=*, do yourself a favor and use a small time window.

Search with original text that was replaced earlier

I am gathering performance metrics for each API that we have. With the query below I get results such as
method response_time
Create Billing 2343.2323
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*" | rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)" | table method, response_time | replace "public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)" WITH "Create Billing" IN method
If the user clicks on an API name in a table cell to drill down further, it opens a new search with "Create Billing". Obviously this gives zero results, since we don't have any log with that string.
I want splunk to search with original text that was replaced earlier.
You can use click.value to get around this.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Viz/tokens