Group by part of URL using regex in Splunk

I have multiple URLs that all start with /api/net, and I want to group by the next couple of path segments separated by /, for example:
/api/net/abc/def?key=value
/api/net/c/d?key1=value1
/api/net/j/h?key2=value2
I have the regular expression below, which parses all the URLs, but I have to explicitly specify each required path in it.
| rex field=requestPath "(?<volga>.+?(\/abc\/def)|(\/c\/d)|(\/j\/h).+?)"
volga is a named capturing group. I want to group by volga without adding /abc/def, /c/d, /j/h to the regular expression, so that I can see how many distinct paths there are instead of hard-coding them.
There are other paths I wouldn't know to add, so I want to group by the next two segments split by / after "net" and ignore the rest of the URL. Let me know if this is unclear and I can explain more.

If I understand the question correctly, this regex will parse the URL and return the two domains as 'dom1' and 'dom2', respectively. Then you can group/sort on them.
... | rex field=requestPath "\/api\/net\/(?<dom1>[^\/]+)\/(?<dom2>[^\/\?]+)"
| stats values(*) as * by dom1,dom2
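If all you need is a count of events per path pair (as described in the question), a simpler tail works too:
| stats count by dom1, dom2
This lists every /api/net/<dom1>/<dom2> combination along with how many events matched it.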

Related

Splunk field extractor unable to extract all values

I want to extract 4 values out of one field, called msg, from a Splunk query; the msg field is in the form:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
The keys are always static but the values are not; for instance, v2 could be XXX or XXYYZZ, and the possible values for v3 have unpredictable length.
I ran a query to get some sample results and hoped to use the Field Extractor to generate a regex, but the generated regex can't get all the values out. I guess that's probably because the values don't have the same length?
Do I need to change my logging format, separating each key=value pair with a comma? Or am I not using the field extractor correctly?
[Update 1]: A few sample records:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if that makes it easier to query and extract fields. Would adding a separator like a dot or a pipe help?
Normally Splunk will pull key-value pairs out automatically.
However, when it doesn't, try your regular expression(s) on regex101 - the field extractor is often a good[ish] start, but rarely creates efficient (or complete) regular expressions.
An inline version of this would be as follows (presuming the "value" half of the key-value pair is contiguous characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls in case something is out of order or missing, but if your data is consistent, this will work.
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype.
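For a search-time extraction, a props.conf stanza on its own is enough. A minimal sketch, assuming a placeholder sourcetype name (adjust to yours):
# props.conf
[my_sourcetype]
EXTRACT-kvpairs = k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)
If you prefer the transforms.conf route, point a REPORT- entry in props.conf at a transforms.conf stanza containing the same REGEX.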
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...

Match partial string from list with field

I'm trying to check if a field contains a value from a list using Kusto in Log analytics/Sentinel in Azure.
The list contains top-level domains, but I only want matches for subdomains of those domains. For example, the list value example.com should match values such as forum.example.com or api.example.com.
I got the following code but it does exact matches only.
let domains = dynamic(["example.com", "amazon.com", "microsoft.com", "google.com"]);
DeviceNetworkEvents
| where RemoteUrl in~ (domains)
| project TimeGenerated, DeviceName, InitiatingProcessAccountUpn, RemoteUrl
I tried with endswith, but couldn't get that to work with the list.
It seems that has_any() would work for you:
let domains = dynamic(["example.com", "amazon.com", "microsoft.com", "google.com"]);
DeviceNetworkEvents
| where RemoteUrl has_any(domains)
| project TimeGenerated, DeviceName, InitiatingProcessAccountUpn, RemoteUrl
Note that you can also use has_any_index() to find out which item in the array was matched.
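For example, a minimal sketch (MatchedDomain is just an illustrative column name):
let domains = dynamic(["example.com", "amazon.com", "microsoft.com", "google.com"]);
DeviceNetworkEvents
| where RemoteUrl has_any (domains)
| extend MatchedDomain = tostring(domains[has_any_index(RemoteUrl, domains)]) // which list entry matched
| project TimeGenerated, DeviceName, InitiatingProcessAccountUpn, RemoteUrl, MatchedDomain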
In order to correctly match URLs against a list of domains, you need to build a regex from those domains and then use the matches regex operator.
Make sure you build the regex correctly, so that it does not also match values like these:
example.com.hacker.com
hackerexample.com
hacker.com/example.com
Etc...
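A sketch of that approach, hard-coding the pattern for readability; the (^|\.) prefix and the $ anchor are what keep out the look-alike values above (relax the $ if RemoteUrl can carry a path or port):
DeviceNetworkEvents
| where RemoteUrl matches regex @"(^|\.)(example\.com|amazon\.com|microsoft\.com|google\.com)$"
| project TimeGenerated, DeviceName, InitiatingProcessAccountUpn, RemoteUrl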

How can I put several extracted values from a JSON into an array in Kusto?

I'm trying to write a query that returns the vulnerabilities found by the "Built-in Qualys vulnerability assessment" in Log Analytics.
It was all going smoothly: I was getting the values from the properties JSON and turning them into separate strings. But I found out that some of the fields hold more than one value, and I need to get all of them into a single cell.
My query currently looks like this:
securityresources | where type =~ "microsoft.security/assessments/subassessments"
| extend assessmentKey=extract(@"(?i)providers/Microsoft.Security/assessments/([^/]*)", 1, id), IdAzure=tostring(properties.id)
| extend IdRecurso = tostring(properties.resourceDetails.id)
| extend NomeVulnerabilidade=tostring(properties.displayName),
Correcao=tostring(properties.remediation),
Categoria=tostring(properties.category),
Impacto=tostring(properties.impact),
Ameaca=tostring(properties.additionalData.threat),
severidade=tostring(properties.status.severity),
status=tostring(properties.status.code),
Referencia=tostring(properties.additionalData.vendorReferences[0].link),
CVE=tostring(properties.additionalData.cve[0].link)
| where assessmentKey == "1195afff-c881-495e-9bc5-1486211ae03f"
| where status == "Unhealthy"
| project IdRecurso, IdAzure, NomeVulnerabilidade, severidade, Categoria, CVE, Referencia, status, Impacto, Ameaca, Correcao
Ignore the awkward names of the columns, for they are in Portuguese.
As you can see in the "Referencia" and "CVE" columns, I'm able to extract the value at a specific index of the array, but I want all the links from the whole array.
Without sample input and expected output it's hard to understand what you need, so trying to guess here...
I think that summarize make_list(...) by ... will help you (see the make_list documentation to learn how to use it).
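For instance, building on the query in the question, a sketch that assumes you want one row per sub-assessment with every CVE link collected into a single cell:
securityresources
| where type =~ "microsoft.security/assessments/subassessments"
| extend IdAzure = tostring(properties.id)
| mv-expand cveEntry = properties.additionalData.cve // one row per CVE entry
| summarize CVE = make_list(tostring(cveEntry.link)) by IdAzure // collapse back to one row, links in an array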
If this is not what you're looking for, please delete the question, and post a new one with minimal sample input (using datatable operator), and expected output, and we'll gladly help.

I want to extract the string from the string and use it under a field

I want to extract a string from a string...and use it under a field named source.
I tried writing it like this, but with no luck:
index=cba_nemis Status: J source=*AAP_ENC_UX_B.*
| eval plan=upper(substr(source,57,2))
| regex source="AAP_ENC_UX_B.\w+\d+rp"
| stats count by plan, source
For example:
source=/p4products/nemis2/filehandlerU/encpr1/log/AAP_ENC_UX_B.az_in_aza_277U_ rp-20190722-054802.log
source=/p4products/nemis2/filehandlerU/encpr2/log/AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld-20190723-034121.log
I want to extract the string AAP_ENC_UX_B.az_in_aza_277U_ rp from the 1st and AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld from the 2nd, and put them under the column source along with the counts.
I want results like...
source counts
AAP_ENC_UX_B.az_in_aza_277U_ rp 1
AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld 1
You can use the rex command, which extracts a new field from an existing field by applying a regular expression.
...search...
| rex field=source ".+\/(?<source_v2>[\.\w\s]+)-.+"
| stats count by plan, source_v2
Be careful, though: I called the new field source_v2. What you asked for would overwrite the existing source field without your explicitly requesting it; just change source_v2 to source in my code if that is what you want.
The search takes this new source_v2 field into account. Try it and see if this is what you need; you can easily tweak it to get your expected results.

regular expression to pull words beginning with #

Trying to parse an SQL string and pull out the parameters.
Ex: "select * from table where [Year] between #Yr1 and #Yr2"
I want to pull out "#Yr1" and "#Yr2"
I have tried many patterns, such as the following, but none has worked:
matches = Regex.Matches(sSQL, "\b#\w*\b")
and
matches = Regex.Matches(sSQL, "\b\#\w*\b")
Any help?
You're trying to put a word boundary after the #, rather than before. Maybe this:
\w(#[A-Z0-9a-z]+)
or
\w(#[^\s]+)
I would have gone with
/^|\s(#\w+)\s|$/
or if you didn't want to include the #
/^|\s#(\w+)\s|$/
though I also like Joel's answer above, so maybe one of these:
/^|\s(#[^\s]+)\s|$/
/^|\s#([^\s]+)\s|$/
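For reference, a minimal sketch in VB.NET (matching the syntax in the question) that uses the simpler pattern #\w+, which sidesteps the boundary problem because # itself is not a word character:
Imports System.Text.RegularExpressions

Module ExtractParams
    Sub Main()
        Dim sSQL As String = "select * from table where [Year] between #Yr1 and #Yr2"
        ' a literal # followed by one or more word characters; no leading \b is needed
        Dim matches As MatchCollection = Regex.Matches(sSQL, "#\w+")
        For Each m As Match In matches
            Console.WriteLine(m.Value) ' prints #Yr1 then #Yr2
        Next
    End Sub
End Module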