Extracting Directory String from Text - vb.net

I have a program that I am making with Visual Basic 2010 that pulls logs of corrupted files and gives the user the location of the corrupted file(s) so they can be fixed. These logs are huge and vary in size depending on the amount of corruption.
I already have code that pulls only the lines of text that are flagged as errors, but within these lines there are directory paths that point to the corrupted file. Is there any way to read these paths and put them into a RichTextBox? Here is an example of a line from a log file:
oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; on:[100]"\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f"; a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,
And here is the part that I need to pull from it:
C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f
I'm pretty new to all of this, so bear with me please.

RegEx provides great flexibility for this sort of thing, but you need to establish a known pattern that defines where the path begins and ends. For instance, if the path is always prefixed by on:[100]"\??\ and always ends with ";, then you could extract it with this RegEx pattern:
on:\[100\]"\\\?\?\\(.*?)";
Here's what the pattern means:
on:\[100\]"\\\?\?\\ - Matches must begin with on:[100]"\??\ exactly
The extra backslashes are necessary to escape characters that would otherwise have special meaning. In this case, [, ], \, and ? all have special meaning to RegEx, so each of them needs to be preceded by a backslash.
(.*?) - Matches can contain any number of any characters between the preceding on:[100]"\??\ and the following ";. The value of this portion of the input is captured as an unnamed group (i.e. group 1).
( - Begins a capturing group
. - Matches any character
* - Any number of times
? - Matches in a non-greedy fashion (i.e. only captures up through the first instance of whatever follows it in the pattern)
) - Ends the capturing group
"; - Matches must end with these two characters exactly
So, for instance:
Imports System.Text.RegularExpressions  ' at the top of the code file
Dim input As String = "oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; on:[100]""\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f""; a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,"
Dim m As Match = Regex.Match(input, "on:\[100\]""\\\?\?\\(.*?)"";")
If m.Success Then
    ' Group 1 holds the path without the \??\ prefix
    Dim path As String = m.Groups(1).Value
End If
Or, if the input can contain multiple matches, you can loop through them like this:
For Each m As Match In Regex.Matches(input, "on:\[100\]""\\\?\?\\(.*?)"";")
    Dim path As String = m.Groups(1).Value
    ' Use path here, e.g. add it to your RichTextBox with its AppendText method
Next
That's just an example. Depending upon your needs, you could adjust the RegEx pattern as necessary. RegEx is very flexible, so as long as there's some logical way to recognize where the path is in the string, it should be possible to find it with a RegEx pattern. On a side note, since the pattern is, itself, just a string, it can be stored in a configuration setting outside of the code too, which is an added benefit.
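For comparison, here is a minimal Python sketch of the same extraction run over a whole log (Python is used only as a compact, runnable illustration; the answer itself is VB.NET). It assumes the log has already been split into lines, and it loosens the literal [100] to \[\d+\] on the assumption that the bracketed number can differ from line to line; keep the literal 100 if it never does. The function name extract_paths is hypothetical.

```python
import re

# Same pattern as the VB.NET answer, except that the literal [100] is
# loosened to \[\d+\] (assumption: the bracketed number can vary).
PATH_PATTERN = re.compile(r'on:\[\d+\]"\\\?\?\\(.*?)";')


def extract_paths(log_lines):
    """Return the captured path (group 1) of every match, in log order."""
    paths = []
    for line in log_lines:
        # finditer handles lines that contain more than one path
        for match in PATH_PATTERN.finditer(line):
            paths.append(match.group(1))
    return paths


# The example line from the question, written as raw strings so that
# backslashes such as \a in \amd64 stay literal.
sample_line = (
    r'oa = #0x238282b270->OBJECT_ATTRIBUTES {s:48; rd:NULL; '
    r'on:[100]"\??\C:\Windows\WinSxS\amd64_3ware.inf.resources_'
    r'31bf3856ad364e35_10.0.10130.0_en-us_ca9e7cc7a071e60f"; '
    r'a:(OBJ_CASE_INSENSITIVE)}, iosb = #0x238282b250, as = (null), fa = 0,'
)

for path in extract_paths([sample_line]):
    print(path)
```

Lines without a match simply contribute nothing, so the function can be fed the already-filtered error lines directly.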

Related

regex capture middle of url

I'm trying to figure out the basic regex to capture the middle of a Google URL stored in a SQL database.
For example, a few links:
https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234
https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789
What would be the regex to capture the text to get dodge+durango , or jeep+cherokee+crossover ? (It's alright that the + still be in there.)
My Attempts:
1)
\b[=.]\W\b\w{5}\b[+.]?\w{7}
But this clearly does not work: it is hard-coded so that it would only work for the Dodge Durango example (extracting "dodge+durango").
2) Using a positive lookahead:
[^+](?=&id)
but I am not fully sure how to use this, as it only grabs the one character before the & symbol.
How can I extract a string of (potentially) any length, with any number of + delimiters, between the "model=" and "&id" boundaries?
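It seems like you could use regexp_replace and refer to the match group in the replacement. Note, though, that regexp_replace returns the whole source string with only the matched part replaced, so on its own this call turns https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234 into https://www.google.com/cars/?year=2016&dodge+durangoid=1234 rather than returning just the model; the pattern would also have to match the text before and after the model for only the group to remain: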
regexp_replace(input, E'model=(.*?)([&\\s]|$)', E'\\1')
From the PostgreSQL documentation for regexp_replace:
The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, flags ]). The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the replacement string substituted for the matching substring. The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. Write \\ if you need to put a literal backslash in the replacement text. The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.
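Python's re.sub follows the same rule the documentation describes for regexp_replace: only the matched substring is replaced. This sketch uses Python's re module, not PostgreSQL, so it only illustrates that rule; the function name model_via_sub is hypothetical. The first call replaces just model=dodge+durango& and keeps the rest of the URL, while the second pattern also consumes the text before and after the model, so only group 1 is left.

```python
import re

url = "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234"

# Replaces only "model=dodge+durango&"; the rest of the URL is kept as-is.
partial = re.sub(r"model=(.*?)(&|\s|$)", r"\1", url, count=1)


def model_via_sub(u):
    """Consume everything around the model so that only group 1 survives."""
    return re.sub(r"^.*?model=([^&\s]*).*$", r"\1", u)


print(partial)
print(model_via_sub(url))
```

As with regexp_replace, a URL that contains no model= comes back unchanged from model_via_sub.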
I may be misunderstanding, but if you want to get the model, just select everything between model= and the ampersand (&).
regexp_matches(input, 'model=([^&]*)')
model=: Match literally
([^&]*): Capture
[^&]: any character that isn't an ampersand
*: zero or more times
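The same pattern, model=([^&]*), can be tried out in Python (an illustration only; the answer itself is PostgreSQL). The helper name extract_model is hypothetical. The plus signs are kept, which the question says is acceptable; a query-string parser such as urllib.parse.parse_qs would instead decode each + into a space.

```python
import re

# Capture everything after "model=" up to the next "&" (or the end).
MODEL = re.compile(r"model=([^&]*)")


def extract_model(url):
    """Return the model text, or None if the URL has no model= parameter."""
    match = MODEL.search(url)
    return match.group(1) if match else None


for u in (
    "https://www.google.com/cars/?year=2016&model=dodge+durango&id=1234",
    "https://www.google.com/cars/?year=2014&model=jeep+cherokee+crossover&id=6789",
):
    print(extract_model(u))
```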

Limiting character input to specific characters

I'm making a fully working add and subtract program as a nice little easy project. One thing I would love to know is if there is a way to restrict input to certain characters (such as 1 and 0 for the binary inputs and A and B for the add or subtract inputs). I could always replace all characters that aren't these with empty strings to get rid of them, but doing something like this is quite tedious.
Here is some simple code to filter out the specified characters from a user's input:
local filter = "10abAB"
local input = io.read()
input = input:gsub("[^" .. filter .. "]", "")
The filter variable is set to whatever characters you want to allow in the user's input. For example, if you also want to allow c and C, add them: local filter = "10abcABC".
Although I assume that you get input from io.read(), it is possible that you get it from somewhere else, so you can just replace io.read() with whatever you need there.
The third line of code in my example is what actually filters the text. It calls string.gsub using method syntax, which means it could also be written like this:
input = string.gsub(input, "[^" .. filter .. "]", "")
The benefit of writing it this way is that it's clear that input is meant to be a string.
The gsub pattern is [^10abAB]. The ^ at the start of the brackets negates the set, so the pattern matches any character that is not 1, 0, a, b, A, or B, and gsub replaces each such character with the empty string given as the last argument.
Bonus super-short one-liner that you probably shouldn't use:
local input = io.read():gsub("[^10abAB]", "")
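For readers outside Lua, here is a Python sketch of the same negated-set filter (an illustration, not the answer's code; the name keep_only is hypothetical). One difference worth knowing: in the Lua version the filter string is pasted straight into the set, so a filter containing characters such as ], - or % would need them escaped with %. The sketch uses re.escape for the same purpose.

```python
import re


def keep_only(text, allowed):
    """Return text with every character that is not in allowed removed."""
    if not allowed:
        return ""  # nothing is allowed, so everything is removed
    # Negated character class, like the Lua "[^" .. filter .. "]".
    # re.escape keeps characters such as ']', '^' or '-' in allowed literal.
    return re.sub("[^" + re.escape(allowed) + "]", "", text)


print(keep_only("1a0x2B b", "10abAB"))
```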

How to cut part of a string

How can I cut part out of this string...
"abb.c.d+de.ee+f.xxx+qaa.+.,,s,"
... when I know the position as follows:
The result is always between a "." (on its left side) and a "+" (on its right side).
I know the number of "." from the left side and the number of "+" from the right side that delimit the resulting string.
The problem is the right side, because I need to count the "+" from the end.
Say...
From the left side: the beginning is at the 4th "."
(this is easy), so the result is
"xxx+qaa.+.,,s,"
From the right side: the end is at the second "+" from the end!
"xxx[here]+qaa.+.,,s,"
so the result is
"xxx"
I tried to do this myself with .Substring and .IndexOf, but with no success...
Any ideas? Thanks.
You could use the StrReverse function to reverse the character sequence and then count + from the left (using the same method as counting the .).
To find the start of the substring, loop through the string from the left. Count the number of .s you have seen and stop when you've hit the number you want. Store the index in some variable like start.
Similarly to find the end of the substring, loop from the right and count +s.
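The counting approach described above can be sketched in Python (used here only as a compact, runnable illustration; the thread itself is VB.NET). Instead of reversing the string, it searches backwards with str.rindex, which amounts to counting + from the right. The function name cut_between is hypothetical.

```python
def cut_between(text, dot_count, plus_count):
    """Return the text after the dot_count-th '.' (counting from the left)
    and before the plus_count-th '+' (counting from the right).

    str.index / str.rindex raise ValueError if there are not enough
    '.' or '+' characters.
    """
    # Walk forward: each search starts just after the previous '.'.
    start = -1
    for _ in range(dot_count):
        start = text.index(".", start + 1)

    # Walk backward: each search is limited to the text before the previous '+'.
    end = len(text)
    for _ in range(plus_count):
        end = text.rindex("+", 0, end)

    return text[start + 1:end]


print(cut_between("abb.c.d+de.ee+f.xxx+qaa.+.,,s,", 4, 2))
```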
You can solve this problem using Regex:
Dim r As New Regex("^([^.]*\.){4}(?<value>.*)(\+[^+]*){2}$")
Dim m As Match = r.Match("abb.c.d+de.ee+f.xxx+qaa.+.,,s,")
Dim result As String = m.Groups("value").Value ' "xxx"
Explanation
^ Indicates the beginning of the string
([^.]*\.){4} This means any number (*) of characters that are not a period ([^.]) followed by a period (\.). Outside the brackets the period has to be escaped with the backslash, because otherwise it would be the any-character wildcard; inside [^.] it is already literal. The [^.]*\. is enclosed in (){4} to say that the pattern must repeat four times, so this part ends exactly at the fourth period. (With .* instead of [^.]*, the greedy .* can run past the fourth period: for "a.b.c.d.e.xxx+y+z" it would return "xxx" instead of "e.xxx".)
(?<value>.*) This specifies the placeholder for the text we are after. value is the name we are assigning to it. The .* specifies that the value is any number of any characters.
(\+[^+]*){2} This means a plus character (which has to be escaped) followed by any number of characters that are not a plus, repeated twice, so it covers exactly the last two plus signs.
$ Indicates the end of the string
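The difference between counting periods with [^.]* and with .* can be checked in Python (an illustration, not the answer's VB.NET; Python spells a named group (?P<value>...) rather than (?<value>...), and fullmatch replaces the ^ and $ anchors). STRICT, LOOSE and value_of are hypothetical names. Both patterns agree on the question's string, but only the [^.]* version starts at the fourth period when the string contains more periods.

```python
import re

# [^.]* stops each repetition at the next period, so {4} lands on the 4th one.
STRICT = re.compile(r"([^.]*\.){4}(?P<value>.*)(\+[^+]*){2}")
# .* can run past periods, so {4} may end on a later period.
LOOSE = re.compile(r"(.*\.){4}(?P<value>.*)(\+.*){2}")


def value_of(pattern, text):
    """Return the named group 'value', or None if the whole text doesn't match."""
    match = pattern.fullmatch(text)
    return match.group("value") if match else None


for text in ("abb.c.d+de.ee+f.xxx+qaa.+.,,s,", "a.b.c.d.e.xxx+y+z"):
    print(text, value_of(STRICT, text), value_of(LOOSE, text))
```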

Collect a word between two spaces in Objective-C

I'm trying to implement something similar to a spell check, but I need to get each word delimited by a space. E.g. for "HI HOW R U", I need to collect HI, HOW and so on as the user types, i.e. after the user types HI and a space, I need to collect HI and spell-check it.
Check the documentation for NSString. You want the message componentsSeparatedByString:.
I don't know objective-C, but I'm fairly sure it'll have a Regexp library - although it'd be straightforward to code it without one.
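Regexp: \b([^\s]+)\b
\b = word boundary (a zero-width position between a word character and a non-word character such as whitespace, a comma, a period or an exclamation mark)
\s = whitespace character
[...] = character set
[^...] = negated character set (any character(s) EXCEPT ...)
() = grouping construct; with the quantifier inside the group, the whole word is captured rather than only its last character
+ = one or more times (unlike *, this avoids empty matches at every boundary)
So the suggested expression starts matching at any word boundary, then matches every subsequent character that is not a whitespace character, then requires another word boundary.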
Your stated case is so simple you may just want to look for spaces (one char at a time) and get the substring, but RegExp is very widely used across a range of languages and platforms, and so it's fairly easy to find an expression when you need to - and one often does for common stuff like checking if zip codes, phone numbers, email addresses and so on are syntactically correct. So it's worth learning in any case. :)
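A Python sketch of both suggestions (an illustration only; the question is about Objective-C, where componentsSeparatedByString: plays the role of str.split). The helper word_just_completed is hypothetical: it returns the last word only when the text ends with a space, which is the moment the question wants to run the spell check. How it gets called on each keystroke is left out.

```python
import re

# \b([^\s]+)\b: one or more non-whitespace characters between word boundaries.
WORD = re.compile(r"\b([^\s]+)\b")


def words_in(text):
    """All words in text, found with the regular expression above."""
    return WORD.findall(text)


def word_just_completed(text):
    """If the last character typed was a space, return the word right
    before it; otherwise return None."""
    if not text.endswith(" "):
        return None
    words = text.split()  # splits on runs of whitespace, no empty strings
    return words[-1] if words else None


print(words_in("HI HOW R U"))
print(word_just_completed("HI "))
print(word_just_completed("HI HO"))
```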

How to extract the img tag in Mail Body in VB.NET

I stored the mail contents (mail body) in a database.
I would like to extract the value of the "src" attribute of all the image tags (<img>) in those mail contents.
One or more images may be included in a mail body.
Please let me know how I can achieve this in VB.NET.
Thanks.
You can use a regular expression. In the code below, SubjectString holds one mail body (HTML) loaded from the database:
Imports System.Text.RegularExpressions  ' at the top of the code file
Try
    Dim RegexObj As New Regex("<img[^>]+src=[""']([^""']+)[""']", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    Dim MatchResults As Match = RegexObj.Match(SubjectString)
    While MatchResults.Success
        ' The src attribute value is in MatchResults.Groups(1).Value
        MatchResults = MatchResults.NextMatch()
    End While
Catch ex As ArgumentException
    ' Syntax error in the regular expression (cannot happen with this fixed pattern)
End Try
Here's how it works:
<img[^>]+src=["']([^"']+)["']
Match the characters "<img" literally «<img»
Match any character that is not a ">" «[^>]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the characters "src=" literally «src=»
Match a single character that is either " or ' «["']»
Match the regular expression below and capture its match into backreference number 1 «([^"']+)»
Match a single character that is neither " nor ' «[^"']+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is either " or ' «["']»
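A Python sketch of the same pattern, for illustration only (the answer itself is VB.NET): re.findall returns the captured src value of every match, which is what the While/NextMatch loop walks through. The sample mail body is made up and image_sources is a hypothetical name. A regular expression is a pragmatic fit for simple mail HTML, but it can miss unusual markup (for example a > inside another attribute value); in that case an HTML parser, such as Python's html.parser, is the more robust choice.

```python
import re

# Python port of the VB.NET pattern: DOTALL mirrors RegexOptions.Singleline
# and IGNORECASE mirrors RegexOptions.IgnoreCase.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE | re.DOTALL)


def image_sources(mail_body):
    """Return the src value of every <img> tag, in document order."""
    return IMG_SRC.findall(mail_body)


# Made-up mail body with one double-quoted and one single-quoted src.
sample_body = (
    '<p>Hello</p>\n'
    '<img alt="logo" src="cid:image001.png">\n'
    "<IMG\nSRC='http://example.com/banner.gif' />"
)

print(image_sources(sample_body))
```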