How to break a text line into multiple events - Splunk

I am new to Splunk and I'm experimenting with source types and their regex settings. Let's say I send the following events to HEC:
curl -k https://utu:8088/services/collector/event/1.0 -H "Authorization: Splunk 21755979-ed43-4a1a-8962-e6e45ccf3ccf" -d '{"event": "splunk splunk splunk dog", "sourcetype": "hec_st"}'
curl -k https://utu:8088/services/collector/event/1.0 -H "Authorization: Splunk 21755979-ed43-4a1a-8962-e6e45ccf3ccf" -d '{"event": "splunk splunk splunk cat", "sourcetype": "hec_st"}'
hec_st is the source type with regex:
(splunk)\s+
with SHOULD_LINEMERGE=false
Why don't these settings break the string "splunk splunk splunk cat" into multiple events, like this:
splunk
splunk
splunk
cat
I always find this string as a single event. Thanks a lot in advance
T.

Firstly, the correct regex for your requirement is
([\s]+)
If you want to add it to props.conf, you can use the following stanza:
[hec_st]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
LINE_BREAKER = ([\s]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Miscellaneous
description = Split events by space
pulldown_type = 1
Below are the guidelines to follow when creating a LINE_BREAKER regex that defines event boundaries for incoming data.
Specifies a regex that determines how the raw text stream is broken into initial events, before line merging takes place.
This sets SHOULD_LINEMERGE = false and LINE_BREAKER to the user-provided regular expression.
Defaults to ([\r\n]+), meaning data is broken into an event for each line, delimited by any number of carriage return or newline characters.
The regex must contain a capturing group -- a pair of parentheses which defines an identified subcomponent of the match.
Wherever the regex matches, Splunk considers the start of the first capturing group to be the end of the previous event and considers the end of the first capturing group to be the start of the next event.
The contents of the first capturing group are discarded, and will not be present in any event. You are telling Splunk that this text comes between lines.
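The capturing-group rule above can be tried outside Splunk. The following is a minimal Python sketch that only approximates the documented rule; it is not Splunk's implementation, and the function name break_events is made up for illustration.

```python
import re


def break_events(raw: str, line_breaker: str) -> list[str]:
    """Approximate the LINE_BREAKER rule (illustrative, not Splunk code).

    The previous event ends where capturing group 1 starts, the next event
    begins where group 1 ends, and the text inside group 1 is discarded.
    """
    events = []
    start = 0
    for match in re.finditer(line_breaker, raw):
        events.append(raw[start:match.start(1)])
        start = match.end(1)
    events.append(raw[start:])
    # Drop empty strings left over when a match sits at the very start or end.
    return [event for event in events if event]


print(break_events("splunk splunk splunk cat", r"([\s]+)"))
```

Feeding the question's (splunk)\s+ to the same function shows the captured word itself being thrown away, which is consistent with the rule that the contents of group 1 are discarded.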
Input data:
My name is SV.
Output data (one event per word):
My
name
is
SV.

Alright, for anyone solving a similar problem: I was successful in the end with the following sourcetype stanza:
[hec_type]
BREAK_ONLY_BEFORE_DATE =
DATETIME_CONFIG =
LINE_BREAKER = ([\s+])
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Miscellaneous
description = Split events by space
pulldown_type = 1
With this stanza, a sentence like "My name is Tomas" is broken so that each word becomes its own event, as I wanted. What really helps is to experiment with the regex in an online editor first and then put it into props.

Related

Getting specific rows in a Powershell variable/array

I hope I'm able to ask my question as simply as possible. I am very new to working with PowerShell.
Now to my question:
I use Invoke-Sqlcmd to run a query, which puts Data in a variable, let's say $Data.
In this case I query for triggers in an SQL Database.
Then I kind of split the array to get more specific information:
$Data2 = $Data | Where {$_.table -like 'dbo.sportswear'}
$Data3 = $Data2 | Where {$_.event -match "Delete"}
So in the end I have a variable with these Indexes(?), I'm not sure if they are called indexes.
table
trigger_name
activation
event
type
status
definition
Now all I want is to check something in the definition.
So I create $Data4 = $Data3.definition; so far so good.
But now I have a big block of text and I want only the content of 2-3 specific rows.
When I used something like $Data4[1] or $Data4[1..100], I realized that PowerShell treats every char as a line/row.
But when I just write $Data4, it shows me the content nicely formatted with paragraphs, new lines and so on.
Does anyone have an idea how I can get specific rows or lines of my variable?
Thank you all :)
It appears $Data4 is a formatted string. Since it is a single string, any indexed element lookups return single characters (of type System.Char). If you want indexes to return longer substrings, you will need to split your string into multiple strings somehow or come up with a more sophisticated search mechanism.
If we assume the rows you are after are actual lines separated by line feed and/or carriage return, you can just split on those newline characters and use indexes to access your lines:
# Array indexing starts at 0 for line 1. So [1] is line 2.
# Outputs lines 2,3,4
($Data4 -split '\r?\n')[1..3]
# Outputs lines 2,7,20
($Data4 -split '\r?\n')[1,6,19]
-split uses regex to match characters and splits the string at every match, producing an array of substrings. \r matches a carriage return. \n matches a line feed. ? makes the preceding \r optional (zero or one occurrence), which is needed in case there are no carriage returns before your line feeds.
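For comparison, the same split-then-index idea can be sketched in Python. The trigger text below is invented sample data, not the asker's actual definition.

```python
import re

# Hypothetical multi-line text with mixed \r\n and \n line endings.
definition = (
    "CREATE TRIGGER trg_demo\r\n"
    "ON dbo.sportswear\n"
    "AFTER DELETE\r\n"
    "AS\n"
    "BEGIN\n"
    "END"
)

# Indexing the raw string yields single characters, as in the question.
single_char = definition[1]

# Split on an optional carriage return followed by a line feed.
lines = re.split(r"\r?\n", definition)

second_to_fourth = lines[1:4]                # lines 2, 3, 4
picked = [lines[i] for i in (1, 3, 5)]       # lines 2, 4, 6

print(second_to_fourth)
print(picked)
```

Note that Python slices exclude the end index, so lines[1:4] corresponds to PowerShell's [1..3].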

How to define what expect will send, based on what is the content of script output?

I have some script, that upon execution returns something like this:
1 - some option
2 - nice option
3 - bad option
4 - other option
What number do you choose?
and then it waits for input. I want expect to parse this text and always respond with the digit assigned to nice option. The script might change, so sometimes nice option might be option number 2 and sometimes option number 4. How could I do that?
Right now I am doing something like this:
expect -c 'spawn script.sh
set timeout 3600
expect "What number do you choose?"
send "2\r"
expect eof'
But if the script changes and nice option is no longer number 2, I will have a problem.
I believe that I found the solution, using only expect:
expect -c 'spawn script.sh
expect -re {(\d)\ - nice option}
send "$expect_out(1,string)\r"
expect eof'
expect -re matches using a regular expression (\d means "any digit"). Because \d is inside parentheses, it forms capturing group number 1. In expect you can reference up to 9 capturing groups outside the regex; they are saved in $expect_out(1,string), $expect_out(2,string) and so on up to $expect_out(9,string), while $expect_out(0,string) holds the whole matched string. So by sending $expect_out(1,string) instead of $expect_out(0,string), we send only the digit that matched rather than the whole matched string.
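The difference between group 0 and group 1 can be checked without running expect. This is a small Python sketch of the same capture, using the menu text from the question as sample input; it uses \d+ rather than \d so multi-digit option numbers would also be captured, which is a small deviation from the expect pattern above.

```python
import re

menu = """1 - some option
2 - nice option
3 - bad option
4 - other option
What number do you choose?"""

# Group 1 captures only the number in front of "nice option".
match = re.search(r"(\d+) - nice option", menu)

whole_match = match.group(0) if match else None   # like $expect_out(0,string)
choice = match.group(1) if match else None        # like $expect_out(1,string)

print(repr(whole_match), repr(choice))
```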

Match beginning of string with lookbehind and named group

I need help matching the full message with a lookbehind.
Let's say I have the following simplified string:
1 hostname Here is some Text
At the beginning I could have 1 or 2 digits followed by a space, which I would ignore.
Then I need the first word captured as "host".
Then I would like to look behind to the first space, so that capture group "message" has everything starting after the leading digits and space, i.e. "hostname Here is some Text".
My regex is:
^[1-9]\d{0,2}\s(?<host>[\w][\w\d\.#-]*)\s(?<message>(?<=\s).*$)
this gives me:
host = "hostname"
message = "Here is some Text"
I can't figure out what my lookbehind needs to look like.
Thanks for your help.
OK, I found it. No lookbehind is needed: make message the outer group and nest everything else, including the host group, inside it:
^[1-9]\d{0,2}\s(?<message>(?<host>[\w][\w\d\.#-]*)\s.*$)
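As a quick check, here is the nested-group pattern in Python. The only change is Python's (?P<name>...) syntax for named groups; the sample string is the one from the question.

```python
import re

# "message" is the outer group; "host" is nested inside it, so both
# groups start at the same position right after the leading digits.
pattern = re.compile(
    r"^[1-9]\d{0,2}\s(?P<message>(?P<host>[\w][\w\d.#-]*)\s.*$)"
)

m = pattern.match("1 hostname Here is some Text")
host = m.group("host") if m else None
message = m.group("message") if m else None

print(host)
print(message)
```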

Extracting Directory String from Text

I have a program that I am making with Visual Basic 2010 that will pull logs of corrupted files and give the user the location of the corrupted file(s) so they can be fixed. These logs are huge and vary depending on the amount of corruption.
I already have set in code to only pull the lines of text that are flagged as errors but, within these lines, there are directories that point to what file is corrupted. I need to know if there is any way to read these directories and put them into a RichTextBox. Here is an example of a line from a log file:
oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; on:[100]"\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f"; a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,
And here is the part that I need to pull from it:
C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f from this string
I'm pretty new to all of this, so bear with me please.
RegEx provides great flexibility for this sort of thing, but you need to establish a known pattern that defines where the path begins and ends. For instance, if it always is prefixed by on:[100]"\??\ and always ends with ";, then you could extract it with this RegEx pattern:
on:\[100\]"\\\?\?\\(.*?)";
Here's what the pattern means:
on:\[100\]"\\\?\?\\ - Matches must begin with on:[100]"\??\ exactly
The extra backslashes are necessary to escape all of the special characters which would otherwise have special meaning. In this case, [, ], \, and ? all have special meaning to RegEx, so they each need to be preceded by a backslash to escape them.
(.*?) - Matches can contain any number of any characters between the preceding on:[100]"\??\ and the following ";. The value of this portion of the input is captured as an unnamed group (i.e. group 1).
( - Begins a capturing group
. - Matches any character
* - Any number of times
? - Matches in a non-greedy fashion (i.e. only captures up through the first instance of whatever follows it in the pattern)
) - Ends the capturing group
"; - Matches must end with these two characters exactly
So, for instance:
' Requires: Imports System.Text.RegularExpressions
Dim input As String = "oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; on:[100]""\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f""; a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,"
Dim m As Match = Regex.Match(input, "on:\[100\]""\\\?\?\\(.*?)"";")
If m.Success Then
    Dim path As String = m.Groups(1).Value
End If
Or, if the input can contain multiple matches, you can loop through them like this:
For Each m As Match In Regex.Matches(input, "on:\[100\]""\\\?\?\\(.*?)"";")
    Dim path As String = m.Groups(1).Value
Next
That's just an example. Depending upon your needs, you could adjust the RegEx pattern as necessary. RegEx is very flexible, so as long as there's some logical way to recognize where the path is in the string, it should be possible to find it with a RegEx pattern. On a side note, since the pattern is, itself, just a string, it can be stored in a configuration setting outside of the code too, which is an added benefit.
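To try the pattern outside a VB project, the same extraction can be sketched in Python. It assumes, as above, that the path is always prefixed by on:[100]"\??\ and terminated by ";, and it uses the log line from the question as input.

```python
import re

# Raw string so the backslashes in the log line are kept literally.
line = r'''oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; on:[100]"\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f"; a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,'''

# Same pattern as the VB example; group 1 is the path itself.
path_pattern = re.compile(r'on:\[100\]"\\\?\?\\(.*?)";')

# findall returns the contents of group 1 for every match in the line.
paths = path_pattern.findall(line)

for path in paths:
    print(path)
```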

Limiting character input to specific characters

I'm making a fully working add and subtract program as a nice little easy project. One thing I would love to know is if there is a way to restrict input to certain characters (such as 1 and 0 for the binary inputs and A and B for the add or subtract inputs). I could always replace all characters that aren't these with empty strings to get rid of them, but doing something like this is quite tedious.
Here is some simple code to filter out the specified characters from a user's input:
local filter = "10abAB"
local input = io.read()
input = input:gsub("[^" .. filter .. "]", "")
The filter variable is just set to whatever characters you want to be allowed in the user's input. As an example, if you want to allow c, add c: local filter = "10abcABC".
Although I assume that you get input from io.read(), it is possible that you get it from somewhere else, so you can just replace io.read() with whatever you need there.
The third line of code in my example is what actually filters out the text. It uses string:gsub to do this, meaning that it could also be written like this:
input = string.gsub(input, "[^" .. filter .. "]", "")
The benefit of writing it like this is that it's clear that input is meant to be a string.
The gsub pattern is [^10abAB]. The ^ at the start of the set negates it, so the pattern matches any character that is not one of 1, 0, a, b, A or B, and each match is replaced with the empty string passed as the last argument, which removes it.
Bonus super-short one-liner that you probably shouldn't use:
local input = io.read():gsub("[^10abAB]", "")
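For readers who want to compare, the same negated-set filter can be sketched in Python. The function name keep_allowed is made up for illustration, and re.escape is added so that characters such as ] or - in the allowed list are treated literally; the Lua version above does not do that escaping.

```python
import re


def keep_allowed(text: str, allowed: str = "10abAB") -> str:
    """Remove every character that is not in `allowed` (must be non-empty)."""
    # Build a negated character class such as [^10abAB].
    pattern = "[^" + re.escape(allowed) + "]"
    return re.sub(pattern, "", text)


print(keep_allowed("1x0 yA-b2B"))
```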