Search using a lookup from a single-field CSV file - Splunk

I have a list of usernames that I have to monitor, and the list is growing every day. I read the Splunk documentation, and it seems like a lookup is the best way to handle this situation.
The goal is for my query to leverage the lookup and print out all the download events from all the users in the list.
Sample logs
index=proxy123 activity="download"
{
"machine":"1.1.1.1",
"username":"ABC#xyz.com",
"activity":"download"
}
{
"machine":"2.2.2.2",
"username":"ASDF#xyz.com",
"activity":"download"
}
{
"machine":"3.3.3.3",
"username":"GGG#xyz.com",
"activity":"download"
}
Sample Lookup (username.csv)
users
ABC#xyz.com
ASDF#xyz.com
BBB#xyz.com
Current query:
index=proxy123 activity="download" | lookup username.csv users OUTPUT users | where not isnull(users)
Result: 0 (which is not correct)
I probably don't understand lookup correctly. Can someone correct me and teach me the correct way?

In the lookup file, the name of the field is users, whereas in the event, it is username. Fortunately, the lookup command has a mechanism for renaming the fields during the lookup. Try the following
index=proxy123 activity="download" | lookup username.csv users AS username OUTPUT users | where isnotnull(users)
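To sanity-check how many events are actually matching the lookup before you filter, you can tally matched versus unmatched events. A quick sketch using the same field names:
index=proxy123 activity="download"
| lookup username.csv users AS username OUTPUT users
| eval matched=if(isnotnull(users), "yes", "no")
| stats count by matched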
Now, depending on the volume of data you have in your index and how much data is being discarded when not matching a username in the CSV, there may be alternate approaches you can try, for example, this one using a subsearch.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS username | fields username ]
What happens here is that the subsearch (the bit in the []) will be expanded first, in this case to (username="ABC#xyz.com" OR username="ASDF#xyz.com" OR username="BBB#xyz.com"). So your main search will turn into
index=proxy123 activity="download" (username="ABC#xyz.com" OR username="ASDF#xyz.com" OR username="BBB#xyz.com")
which may be more efficient than returning all the data in the index, then discarding anything that doesn't match the list of users.
This approach assumes that you have the username field extracted in the first place. If you don't, you can try the following.
index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS search | format ]
This expanded search will be
index=proxy123 activity="download" ("ABC#xyz.com" OR "ASDF#xyz.com" OR "BBB#xyz.com")
which may be more suitable to your data.

How to Build Splunk Search Query for below Scenario

I am able to get multiple events (API logs) in the Splunk dashboard, like below:
event-1:
{ "corrId":"12345", "traceId":"srh-1", "apiName":"api1" }
event-2:
{ "corrId":"69863", "traceId":"srh-2", "apiName":"api2" }
event-3:
{ "corrId":"12345", "traceId":"srh-3", "apiName":"api3" }
I want to retrieve the corrId (e.g., "corrId":"12345") dynamically from one event (API log) by providing the apiName, and then build a Splunk search query based on the retrieved corrId value, so that it pulls all the event logs which contain the same corrId ("corrId":"12345").
Output
In the above scenario, the expected results would be as below:
event-1:
{ "corrId":"12345", "traceId":"srh-1", "apiName":"api1" }
event-3:
{ "corrId":"12345", "traceId":"srh-3", "apiName":"api3" }
I am new to Splunk. Please help me out here: how do I fetch "corrId":"12345" dynamically by providing another field like apiName, and build a Splunk search query based on that?
I have tried the below, but with no luck.
index = "test_srh source=policy.log [ search index = "test_srh source=policy.log | rex field=_raw "apiName":|s+"(?[^"]+)" | search name="api1" | table corrId]
This query gives the event-1 log only, but we need all the other events which contain the same corrId ("corrId":"12345"). Appreciate quick help here.
Given you're explicitly extracting the apiName field, I'll assume the corrId field is not automatically extracted, either. That means putting corrId="12345" in the base query won't work. Try index=test_srh source=policy.log corrId="12345" to verify that.
If the corrId field needs to be extracted then try this query.
index=test_srh source=policy.log
| rex "corrId\":\"(?<corrId>[^\"]+)"
| where [ search index=test_srh source=policy.log
| rex "apiName\":\"(?<name>[^\"]+)"
| search name="api1"
| rex "corrId\":\"(?<corrId>[^\"]+)"
| fields corrId | format ]
Note: I also corrected the regex to properly extract the apiName field.
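As a side note, since the sample events appear to be pure JSON, spath may be able to extract both fields without any rex at all. A sketch under that assumption (each event's _raw must be a single well-formed JSON object):
index=test_srh source=policy.log
| spath
| search [ search index=test_srh source=policy.log | spath | search apiName="api1" | fields corrId | format ]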

Best method to keep lookup file value fresh

Say, I have to monitor users' activities from 3 specific departments: Science, History, and Math.
The goal is to send an alert if any of the users in any of those departments download a file from site XYZ.
Currently, I have a lookup file for all the users from those three departments.
users
----------------------
user1#organization.edu
user2#organization.edu
user3#organization.edu
user4#organization.edu
user5#organization.edu
One problem: users can join, leave, or transfer to another department anytime.
Fortunately, those activities (join and leave) are tracked and they are Splunk-able.
index=directory status=*
-----------------------------------------------
{
"username":"user1#organization.edu",
"department":"Science",
"status":"added"
}
{
"username":"user1#organization.edu",
"department":"Science",
"status":"removed"
}
{
"username":"user2#organization.edu",
"department":"History",
"status":"added"
}
{
"username":"user3#organization.edu",
"department":"Math",
"status":"added"
}
{
"username":"MRROBOT#organization.edu",
"department":"Math",
"status":"added"
}
In this example, assuming I forgot to update the lookup file, I won't get an alert when MRROBOT#organization.edu downloads a file, and at the same time, I will still get an alert when user1#organization.edu downloads a file.
One solution that I could think of is to update the lookup manually using the inputlookup and outputlookup method, like:
| inputlookup users.csv | where users!="user1#organization.edu" | outputlookup users.csv
But I don't think this is an efficient method, especially since it's highly likely I might miss a user or two.
Is there a better way to keep the lookup file up-to-date? I googled around, and one suggestion is to use a cron job with curl to update the list. But I was wondering if there's a simpler or better alternative than that.
Here's a search that should automate the maintenance of the lookup file using the activity events in Splunk.
`comment("Read in the lookup file. Force them to have old timestamps")`
| inputlookup users.csv | rename users AS username | eval _time=1, status="added"
`comment("Add in activity events")`
| append [ search index=directory status=* ]
`comment("Keep only the most recent record for each user")`
| stats latest(_time) as _time, latest(status) as status by username
`comment("Throw out users with status of 'removed'")`
| where NOT status="removed"
`comment("Save the new lookup")`
| table username
| rename username AS users
| outputlookup users.csv
After the append command, you should have a list that looks like this:
user1#organization.edu added
user2#organization.edu added
user3#organization.edu added
user4#organization.edu added
user5#organization.edu added
user1#organization.edu added
user1#organization.edu removed
user2#organization.edu added
user3#organization.edu added
MRROBOT#organization.edu added
The stats command will reduce it to:
user4#organization.edu added
user5#organization.edu added
user1#organization.edu removed
user2#organization.edu added
user3#organization.edu added
MRROBOT#organization.edu added
with the where command further reducing it to:
user4#organization.edu added
user5#organization.edu added
user2#organization.edu added
user3#organization.edu added
MRROBOT#organization.edu added
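Schedule that search to run on an interval and the lookup should stay current without manual edits. The alert search can then read the refreshed lookup. Here is a minimal sketch; index=proxy, activity, site, and the username field are placeholder names, since the question doesn't show the download events:
index=proxy activity="download" site="XYZ"
    [ | inputlookup users.csv | rename users AS username | fields username ]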

Splunk lookup table

I have a CSV with different kinds of IoCs in it, like email addresses, IPs, etc. I want to run a search across any of my indexes which returns every record that matches anything in my list.
This is what I want to achieve:
index=* "item1" OR "item2" OR "item3"
Since I have a thousand items on my list, this won't work, so I uploaded my CSV as a lookup table and tried the following:
index=* [| inputlookup test.csv]
This returns nothing, but if I search for each item "manually" then I get results.
What am I missing?
It would help to know the format of your CSV, but this should help.
index=* [| inputlookup test.csv | format]
If you insist on using index=*, do yourself a favor and use a small time window.
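One common cause of the empty result: if the CSV's column header doesn't match any field extracted from your events, the field="value" terms generated by the subsearch will never match anything. In that case, the rename-to-search trick from the first answer above forces raw-text matching instead. A sketch, assuming the CSV column is named ioc:
index=* [| inputlookup test.csv | rename ioc AS search | format]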

Search with original text that was replaced earlier

I am gathering performance metrics for each API that we have. With the below query, I get results like:
method          response_time
Create Billing  2343.2323
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*" | rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)" | table method, response_time | replace "public com.xyz.services.billingservice.model.Billing com.xyz.services.billingservice.api.BillingApiImpl.createBilling(java.lang.String)” WITH "Create Billing” IN method
If the user clicks the API text in a table cell to drill down further, it will open a new search with "Create Billing", which obviously gives zero results since we don't have any log with that string.
I want Splunk to search with the original text that was replaced earlier.
You can use click.value to get around this.
http://docs.splunk.com/Documentation/SplunkCloud/6.6.3/Viz/tokens
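Alternatively, you can sidestep the problem in SPL by keeping the original string alongside the friendly label instead of replacing it in place, and then driving the drilldown off the preserved field. A sketch along those lines (display_method is a made-up field name):
index="dev-uw2" logger_name="*Aspect*" message="*ApiImpl*"
| rex field=message "PerformanceMetrics - method='(?<method>.*)' execution_time=(?<response_time>.*)"
| eval display_method=if(like(method, "%createBilling%"), "Create Billing", method)
| table display_method, method, response_time
In a dashboard you can then reference the hidden method column in the drilldown (for example via $row.method$) so the new search uses the original text.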

Is there a way to do an LDAP query to get records where a particular attribute is the same?

I am trying to find an example LDAP query where I can find records where a particular attribute matches one or more other records. For instance, a user object where the userid is different, but the employee IDs are the same. Is this even possible?
From a single LDAP query, no, unless you know the employeeID value you are looking for.
We created an LDAP tool, Duplicate Attribute Value Locater Tool, that will do this.
-jim
It's not possible to do subqueries within the filter itself. In this case, if I understand correctly, you'd like to find users that match:
objectClass of User
match on the value of employeeID
Out of the above subset, find all with a DISTINCT 'userid'
If you knew what userid to look for, or NOT look for, you could expand the initial AND clause to include finding, or not finding, that attribute:
userid not equal to 12345:
(&(objectClass=person)(employeeID=JSmith)(!(userid=12345)))
userid equal to 12345:
(&(objectClass=person)(employeeID=JSmith)(userid=12345))
I found this example for 'myattribute'. It needs some polish, and depending on the size of your directory, it could take a while to run. If that's the case, I'd break it up by attribute sections {attr=aa*, attr=ab*, attr=ac*, etc.}.
ldapsearch -x -h ldapserver.domain.com -b ou=myldap,o=mydomain.com "(&(myattribute=aa*))" myattribute | grep '^myattribute:' | sort | uniq -c | sort -n | awk '$1 > 1 { print }'