Modify regex code, only one line - vb.net

I have this code:
Dim parts As New List(Of String)(Regex.Split(RichTextBox2.Text, "~\d"))
That splits lines in this format into parts:
~1Hello~2~3Bye~4~5Morning~6
So if I do MsgBox(parts(5)), it will show me "Morning".
I want to do the exact same thing, but now my line is arranged like this:
Hello, Bye, Morning,

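Change "~\d" to ", ?". The question mark after the space means that the space is optional. Note that "Morning" will then be parts(2) rather than parts(5). The old format produced extra empty strings before "Morning" (one before ~1, and one for each pair of adjacent markers such as ~2~3), while the new one does not.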
Alternatively, assuming that you are only looking for single words, instead of Regex.Split you could use Regex.Matches with the regular expression "\w+".
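Both suggestions are easy to check with a small sketch. It uses Python's re module only because it is quick to run. For patterns this simple, re.split and re.findall should behave like .NET's Regex.Split and Regex.Matches. The variable names are only for illustration.

```python
import re

line = "Hello, Bye, Morning,"

# ", ?" = a comma followed by an optional space.
# The trailing comma leaves an empty string at the end of the list.
parts = re.split(r", ?", line)
print(parts[2])  # Morning

# Alternative: collect the words themselves instead of splitting on separators.
words = re.findall(r"\w+", line)
print(words)  # ['Hello', 'Bye', 'Morning']
```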

Related

VBA replace certain carriage returns

Hi all.
I am used to programming VBA in Excel, but am new to the structures in Word.
I am working through a library of text files to update them. Many of them are either OCR documents, or were manually entered.
Each has a recurring pattern, the most common of which is unnecessary carriage returns.
For example, I am looking at several text files where there is a double return after each line. A search and replace of all double carriage returns removes all paragraph distinctions.
However, each line is approximately 30 characters long, and if I manually perform the following logic, it gives me a functional document.
If there is a double carriage return after 30+ characters, I replace them with a space.
If there were fewer than 30 characters prior to the double return, I replace them with a single return.
Can anyone help me with some rudimentary code that would help me get started on that? I could then modify it for each "pattern" of text documents I have.
For example:
In this case, there are more than
thirty characters per line. And I
will keep going to illustrate this
example.
This would be a new paragraph, and
would be separated by another of
the single returns.
I want code that would return:
In this case, there are more than thirty characters per line. And I will keep going to illustrate this example.
This would be a new paragraph, and would be separated by another of the single returns.
Let me know if anyone can throw something out that I can play with!
You can do this without code (which RegEx would require), simply by using Word's own wildcard Find/Replace tools, where:
Find = ([!^13]{30,})[^13]{1,}
Replace = \1^32
and, to clean up the residual multi-paragraph breaks:
Find = [^13]{2,}
Replace = ^p
You could, of course, record the above as a macro...
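The sketch below shows roughly how the two wildcard replacements translate to Python regular expressions. It assumes the files have been read as text with \n line endings. Word's ^13 is a paragraph mark, so this is only an approximation. reflow and min_len are hypothetical names, and the 30-character threshold comes from the question.

```python
import re

def reflow(text, min_len=30):
    """Approximate the two Word wildcard replacements on plain text."""
    # Like Find = ([!^13]{30,})[^13]{1,}, Replace = \1^32:
    # a line of at least min_len characters followed by one or more
    # newlines is treated as wrapped and joined to the next with a space.
    joined = re.sub(r"([^\n]{%d,})\n+" % min_len, r"\1 ", text)
    # Like Find = [^13]{2,}, Replace = ^p:
    # collapse any remaining run of newlines to a single newline.
    return re.sub(r"\n{2,}", "\n", joined)


sample = "\n\n".join([
    "In this case, there are more than",
    "thirty characters per line. And I",
    "will keep going to illustrate this",
    "example.",
    "This would be a new paragraph, and",
    "would be separated by another of",
    "the single returns.",
]) + "\n\n"

print(reflow(sample))
```

The wildcard version has the same limitation as this sketch. A paragraph whose last line happens to be 30 or more characters long will be joined to the next paragraph, so the threshold may need tuning for each "pattern" of document.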
Here is a RegEx that might work for you:
(\n\n)(?<!\.(\n\n))
The substitution is just a plain space. You can try it out (and tweak it) here: https://regex101.com/r/zG9GPw/4
This pattern tells the RegEx engine to look for the newline character \n occurring twice, like this: \n\n. Worth noting: this comes from your question and might be different in your files, e.g. it could be \r\n. The pattern also assumes that a valid line break will be preceded by a full stop, written \. in the pattern.
In RegEx the full stop is a single-character wildcard, so it needs to be escaped with a backslash. Conversely, n and r are normal characters; escaping them as \n and \r tells the RegEx engine they represent newline and carriage-return characters.
So the expression looks for a pair of newline characters. It then uses a negative look-behind to exclude any match where the preceding character was a full stop.
Anyway, it's all explained on the regex101 page linked above.
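Here is a minimal Python sketch of the same substitution. It uses (?<!\.)\n\n, which is equivalent to the answer's pattern. The only difference is that the look-behind sits before the two newlines instead of after them. The sketch assumes \n line endings; for \r\n files you would need to normalise line endings first or adjust the pattern. join_wrapped_lines is a hypothetical name.

```python
import re

# Same condition as (\n\n)(?<!\.(\n\n)): a double newline whose preceding
# character is not a full stop. Putting the look-behind in front of \n\n
# expresses that condition more directly.
JOIN_RE = re.compile(r"(?<!\.)\n\n")

def join_wrapped_lines(text):
    # Each matching double newline becomes a single space; double newlines
    # after a full stop are left alone as paragraph breaks.
    return JOIN_RE.sub(" ", text)


sample = (
    "In this case, there are more than\n\nthirty characters per line. And I\n\n"
    "will keep going to illustrate this\n\nexample.\n\n"
    "This would be a new paragraph, and\n\nwould be separated by another of\n\n"
    "the single returns.\n\n"
)
print(join_wrapped_lines(sample))
```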
You could also do a RegEx find and replace with Notepad++, which supports regular expressions without a plugin (set Search Mode to "Regular expression"). Its Find in Files tab lets you set a directory, filters (to target specific file types), and other options such as searching in sub-folders.
Other than that, as @MacroPod pointed out, you could also do this in MS Word, document by document, without any code :)
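You may prefer to script the multi-file run rather than use Notepad++'s Find in Files. A minimal Python sketch might look like the one below. fix_folder is a hypothetical helper, and the sketch makes two assumptions: the files are UTF-8 text matching *.txt, and the full-stop look-behind rule from the regex above is the one you want. Try it on copies of the files first.

```python
import re
from pathlib import Path

# The full-stop look-behind rule, as a compiled pattern.
JOIN_RE = re.compile(r"(?<!\.)\n\n")

def fix_folder(root, pattern="*.txt", recursive=True):
    """Apply JOIN_RE to every matching file under root; return changed paths.

    Assumes UTF-8 text files. Run it on a copy of the library first.
    """
    root = Path(root)
    # rglob also searches sub-folders, like Notepad++'s "In all sub-folders".
    files = sorted(root.rglob(pattern) if recursive else root.glob(pattern))
    changed = []
    for path in files:
        # Text mode translates \r\n to \n on read, so one pattern covers both.
        text = path.read_text(encoding="utf-8")
        new_text = JOIN_RE.sub(" ", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```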

Escape all commas in line except first and last

I have a CSV file which I'm trying to import to a SQL Server table. The file contains lines of 3 columns each, separated by a comma. The only problem is that some of the data in the second column contains an arbitrary number of commas. For example:
1281,I enjoy hunting, fishing, and boating,smith317
I would like to escape all occurrences of commas in each line except the first and the last, such that the result of this line would be:
1281,I enjoy hunting\, fishing\, and boating,smith317
I know I will need some type of regular expression to accomplish this task, but my knowledge of regular expressions is very limited. Currently, I'm trying to use Notepad++ find/replace with regex, but I am open to other ideas.
Any help would be greatly appreciated :-)
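Okay, this can be done manually. Do this:
With a normal (non-regex) find, replace every , with \, to escape everything.
Regex find ^(.*)(\\,) and replace it with $1, (the greedy .* makes this match the last \, on each line, so it restores the last comma).
Regex find (\\,)(.*)$ and replace it with ,$2 (the leftmost match starts at the first \, on each line, so this restores the first comma).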
Worked for me in Sublime Text 2.
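The three steps can be checked with a short Python sketch. re.M makes ^ and $ match at each line, which is roughly how Sublime Text applies them. escape_inner_commas is a hypothetical name. The sketch assumes the data contains no \, sequences beforehand, since those could not be told apart from the ones added in step 1.

```python
import re

def escape_inner_commas(text):
    # Step 1: normal (non-regex) replace escapes every comma.
    text = text.replace(",", "\\,")
    # Step 2: greedy (.*) runs to the LAST "\," on each line; unescape it.
    text = re.sub(r"^(.*)(\\,)", r"\1,", text, flags=re.M)
    # Step 3: the leftmost match begins at the FIRST "\," on each line; unescape it.
    text = re.sub(r"(\\,)(.*)$", r",\2", text, flags=re.M)
    return text


print(escape_inner_commas("1281,I enjoy hunting, fishing, and boating,smith317"))
# 1281,I enjoy hunting\, fishing\, and boating,smith317
```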

Objective C parse string for middle chars

This is a bit of a puzzler for me. I have a string that looks like:
fanspd<fanspd>3</fanspd>
doorinprocess<doorinprocess>0</doorinprocess>
timeremaining<timeremaining>0</timeremaining>
macaddr<macaddr>60:CB:FB:99:99:C1</macaddr>
ipaddr<ipaddr>10.0.0.6</ipaddr>
model<model>4.4eWHF</model>
softver: <softver>2.14.2</softver>
interlock1: <interlock1>0</interlock1>
interlock2: <interlock2>0</interlock2>
cfm: <cfm>2200</cfm>
power: <power>120</power>
inside: <house_temp>-99</house_temp>
<DNS1>10.0.0.1</DNS1>
attic: <attic_temp>76</attic_temp>
OA: <oa_temp>-99</oa_temp>
server response: <server_response>Ó£àêEE²ç©þ]kõ «jsÐ</server_response>
DIP Switches: <DIPS>11100</DIPS>
Remote Switch: <switch2>1111</switch2>
Setpoint:<Setpoint>0</Setpoint>
The string includes "\n" characters, so I have split it into corresponding lines that look like:
fanspd<fanspd>0</fanspd>
All I really want is the char(s) in the middle of the line. In the above example it would be 0.
I can match everything with regular expressions by doing the following:
(.*)(<[a-z]+>)(.*)(</[a-z]+>)
But what I'd like is something that would strip away all the junk and grab only the middle chars:
(!(.*)(!<[a-z]+>))(.*)(!(</[a-z]+>))
I've tried this and it does not work. I've also thought of using another componentsSeparatedByString: call on the NSString (splitting on either < or >). That would leave me with more parsing to do, though, and I think there should be a way to get just the chars in between the tags, with regular expressions, string comparison, or some similar approach.
Any suggestions or help would be greatly appreciated.
Thanks
Two things.
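Your regular expression does not escape the forward slash. (Strictly, / only needs escaping in tools that delimit patterns with slashes, such as Rubular; NSRegularExpression treats it as an ordinary character, but escaping it is harmless.)
Your regular expression seems overly complicated for what you are trying to do. In particular, ! has no special meaning in a regular expression, so (!...) just matches a literal !.
If all you want is that lone middle character, try this: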
<[a-z]+>(.*)<\/[a-z]+>
Here's a great tool to play around with:
http://rubular.com
Heck, you could probably even get away with:
<[a-z]+>(.*)<\/
EDIT:
I partially figured out your problem: some of the tags partway down contain characters other than a through z (for example house_temp, DNS1 and Setpoint). So here you go:
<.+>(.*)<\/.+>
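Here is a small Python sketch for trying the EDIT's pattern outside Rubular, run on a few lines from the question. It also shows a stricter variant that I swapped in myself: <([^<>/]+)>(.*?)</\1>. The backreference \1 requires the closing tag to repeat the opening tag's name. The lazy .*? stops at the first such closing tag. NSRegularExpression uses ICU syntax, which should accept both patterns, but the backslashes need doubling inside an Objective-C string literal. middle_value is a hypothetical name.

```python
import re

# Pattern from the EDIT above; "\/" is just an escaped "/".
LOOSE = re.compile(r"<.+>(.*)<\/.+>")
# Stricter variant (a substitution, not from the answer): the closing tag
# must repeat the opening tag's name (\1), and the lazy .*? stops at the
# first closing tag that does.
STRICT = re.compile(r"<([^<>/]+)>(.*?)</\1>")

def middle_value(line, pattern=LOOSE):
    m = pattern.search(line)
    if m is None:
        return None
    # The value is the last capturing group in either pattern.
    return m.group(m.re.groups)

lines = """fanspd<fanspd>3</fanspd>
macaddr<macaddr>60:CB:FB:99:99:C1</macaddr>
inside: <house_temp>-99</house_temp>
<DNS1>10.0.0.1</DNS1>
Setpoint:<Setpoint>0</Setpoint>""".splitlines()

for line in lines:
    print(middle_value(line), middle_value(line, STRICT))
```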

How Do I get this Split Function to Work? (VB.NET)

So, I made a program that, for the most part, converts numbers to letters. My problem before was that it was converting each individual digit instead of each number (e.g. 1-0-1 instead of 101). Someone suggested that I use the Split function:
Dim numbers As String() = DTB.Split(" ")
So now it reads each number all the way through, since it only splits where there's a space. My problem now is that it translates, for example, "[102, 103, 104]" as "[102,", "103," and "104]". Obviously, you can't convert "[102," or "104]" because they aren't actual numbers.
Does anyone have a solution on what I should do to get this to convert no matter the spacing? Would Regex be the way to go?
Use a regular expression with \d+; it will match each run of digits. So
12234abcsdf23434
will return two matches:
12234
23434
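Here is a minimal Python sketch of the same idea, run on the question's bracketed input. In VB.NET the corresponding call would be Regex.Matches(DTB, "\d+"), looping over each Match's Value. extract_numbers is a hypothetical name.

```python
import re

def extract_numbers(text):
    # \d+ matches each maximal run of digits, so brackets, commas and
    # spaces around the numbers are skipped regardless of spacing.
    return [int(n) for n in re.findall(r"\d+", text)]

print(extract_numbers("12234abcsdf23434"))  # [12234, 23434]
print(extract_numbers("[102, 103, 104]"))   # [102, 103, 104]
print(extract_numbers("[102,103,  104]"))   # [102, 103, 104]
```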

vb.net VB 2010 Underscore and small rectangles in string outputs?

I've made some good progress with my first attempt at a program, but have hit another roadblock. I'm taking standard output (as a string) from a console CMD window (results of dsquery piped to dsget) and have found small rectangles in the output. I tried using Regex to clean the little bastards, but it seems they are related to the _ (underscore), which I need to keep (to return 2000/NT logins). Odd thing is, when I copy the character and paste it into VS2K10 Express, it acts like a carriage return??
Any ideas on finding out what these little SOBs are, and how to remove them?
Going to try using the /U or /A CMD switch next.
The square is often used whenever a character cannot be displayed. The character could very well be a CR. You can use a regular expression to keep only normal characters, or remove the CR/LF characters using String.Replace.
You mentioned that you are using the String.Replace function, and I am wondering if you are replacing the wrong character or something like that. If all you're trying to do is remove a carriage return, I would skip regular expressions and stick with String.Replace.
Something like this should work:
strInputString = strInputString.Replace(Chr(13), "") ' Chr(13) is CR; use Chr(10) for LF
If not, could you post a line or two of code?
On a side note, this question might give some other examples:
Character replacement in strings in VB.NET
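It can help to find out what the little rectangles actually are before removing them. One option is to print the code point of every non-printable character. The sketch below is in Python for brevity; describe_unprintable is a hypothetical helper, and the sample string only stands in for the dsget output. A code point of U+000D is a carriage return (Chr(13)), and U+000A is a line feed (Chr(10)).

```python
import unicodedata

def describe_unprintable(s):
    """List (index, code point, Unicode name) for each non-printable character."""
    return [
        (i, f"U+{ord(c):04X}", unicodedata.name(c, "<no name>"))
        for i, c in enumerate(s)
        if not c.isprintable()
    ]

sample = "user_a\r\nuser_b"  # stand-in for a line of dsget output
print(describe_unprintable(sample))
# [(6, 'U+000D', '<no name>'), (7, 'U+000A', '<no name>')]

# Once identified, removing the carriage returns is a plain replace,
# like the VB.NET Replace(Chr(13), "") call.
print(repr(sample.replace("\r", "")))
```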