I have a long string of random letters and I need to remove a couple of the front letters a few at a time. By using the replace function, if I replace a piece of string that then repeats later on, it removes the piece of string entirely from the long string instead of just the beginning.
Is there a way to remove a piece of string without using the replace function? The code below might clear up some of the confusion.
Dim protein As String
protein = "GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
Dim IndexPosition
For Each index In protein
If index = "K" Or index = "R" Then
IndexPosition = InStr(protein, index)
Dim NextPosition = IndexPosition + 1
Dim NextLetter = Mid(protein, NextPosition, 0)
If NextLetter <> "P" Then
Dim PortionToCutOut = Mid(protein, 1, IndexPosition)
protein = Replace(protein, PortionToCutOut, "")
Console.WriteLine(PortionToCutOut)
End If
End If
Next index
Regex might be a simpler way to solve this:
Regex.Replace(protein, "^(.*?)[KR][^P]", "$1")
It means: "from the start of the string, take as few characters as possible (captured) up to the first occurrence of K or R followed by anything other than P, and replace all of that with the captured string".
GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETL
^^^^^^^^^^^^^^^^^
captured string||
               xx
Everything underlined with ^^^ is replaced by everything apart from the xx bit, so the K and the V after it are dropped.
It makes a single replacement, because that's how I interpreted your requirement when you said:
By using the replace function, if I replace a piece of string that then repeats later on, it removes the piece of string entirely from the long string instead of just the beginning
However, if you do want to replace all occurrences of "K or R followed by not P", it gets simpler:
Regex.Replace(protein, "[KR][^P]", "")
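This reads as "K or R followed by anything other than P, replace with nothing". Note that [^P] consumes the following character, so both the K or R and the letter after it are removed.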
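If the goal is what the question's loop attempts (print each piece from the front up to and including a K or R that is not followed by P), zero-width lookarounds let Regex.Split cut the string without consuming the letter after the K or R. This is a VB.NET sketch under that assumption; the module and function names are hypothetical:

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module FrontPieces
    ' Splits right after every K or R that is not followed by P.
    ' (?<=[KR]) = the position just after a K or R
    ' (?!P)     = the next letter is not P
    ' Both are zero-width, so no letters are removed from the pieces.
    Function SplitAfterKR(protein As String) As String()
        Dim pieces As String() = Regex.Split(protein, "(?<=[KR])(?!P)")
        ' A K or R at the very end creates a split point at the end,
        ' which would show up as a trailing empty string; drop it.
        Return pieces.Where(Function(p) p.Length > 0).ToArray()
    End Function
End Module
```

For example, SplitAfterKR("GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETL") returns the three pieces "GLSDGEWQQVLNVWGK", "VEADIAGHGQEVLIR" and "LFTGHPETL".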
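There are several issues with your code. The first is that you're reassigning protein inside the For Each loop that is iterating over it. Strings are immutable, so the loop keeps walking the original string while InStr searches the shortened one, and the positions you calculate no longer line up with the letter you're looking at.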
The second issue, less severe in immediate impact but just as important in my opinion, is that you're using almost exclusively legacy Visual Basic functions.
InStr should be replaced with IndexOf: https://learn.microsoft.com/en-us/dotnet/api/system.string.indexof
Mid should be replaced with Substring: https://learn.microsoft.com/en-us/dotnet/api/system.string.substring
The third issue is that you're not using the short-circuit operator OrElse in your conditional statement. Or evaluates the right-hand side of the condition regardless of whether the left-hand side is true, whereas OrElse doesn't bother evaluating the right-hand side if the left-hand side is true.
As for removing a piece of the string without using Replace, you'd use Substring for that as well.
Consider this example:
Dim protein = "GLSDGEWQQVLNVWGKVEADIAGHGQEVLIRLFTGHPETLEKFDKFKHLKTEAEMKASEDLKKHGTVVLTALGGILKKKEGHHEAELKPLAQSHATKHKIPIKYLEFISDAIIHVLHSKHRPGDFGADAQGAMTKALELFRNDIAAKYKELGFQG"
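Dim counter = 0
' Stop one short of the end so there is always a next letter to compare with
Do While counter < protein.Length - 1
    Dim currentLetter = protein(counter)
    Dim nextLetter = protein(counter + 1)
    If (currentLetter = "K"c OrElse currentLetter = "R"c) AndAlso nextLetter <> "P"c Then
        ' Drop the K/R; don't advance, because the next letter has just moved into this position
        protein = protein.Substring(0, counter) & protein.Substring(counter + 1)
    Else
        counter += 1
    End If
Loop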
Example: https://dotnetfiddle.net/vrhRdO
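Along the same Substring lines, and assuming the intent is to cut whole pieces off the front rather than delete single letters, a loop can find the next K or R with IndexOfAny and slice with Substring, never calling Replace. A sketch with hypothetical names:

```vbnet
Imports System
Imports System.Collections.Generic

Module CutFromFront
    ' Repeatedly cuts the front of the string up to and including the first
    ' K or R that is not followed by P, using only IndexOfAny and Substring.
    Function CutPieces(protein As String) As List(Of String)
        Dim pieces As New List(Of String)
        Dim rest As String = protein
        Dim searchFrom As Integer = 0
        Do
            Dim pos As Integer = rest.IndexOfAny({"K"c, "R"c}, searchFrom)
            If pos = -1 Then Exit Do
            If pos + 1 < rest.Length AndAlso rest(pos + 1) = "P"c Then
                ' K/R followed by P is not a cut point: keep searching after it
                searchFrom = pos + 1
                Continue Do
            End If
            pieces.Add(rest.Substring(0, pos + 1)) ' the front piece, K/R included
            rest = rest.Substring(pos + 1)         ' what is left after the cut
            searchFrom = 0
        Loop
        If rest.Length > 0 Then pieces.Add(rest)   ' tail with no further cut point
        Return pieces
    End Function
End Module
```

Because each piece is cut by position, a fragment that happens to repeat later in the string is left alone there, which is the problem the question ran into with Replace.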
I'm using this statement in VB.NET:
Raw_data = Alltext_line.Substring(Alltext_line.IndexOf("R|1"))
and I want to increase R|1 to R|2, R|3 and so on using a For loop.
I tried it many ways but I keep getting the error
string to double is invalid
Any help will be appreciated.
You must first extract the number from the string. If the text part ("R") is always separated from the number part by a "|", you can easily separate the two with Split:
Dim Alltext_line = "R|1"
Dim parts = Alltext_line.Split("|"c)
parts is a string array. If this results in two parts, the string has the expected shape, so we can try to convert the second part to a number, increase it, and then re-create the string using the increased number.
Dim n As Integer
If parts.Length = 2 AndAlso Integer.TryParse(parts(1), n) Then
Alltext_line = parts(0) & "|" & (n + 1)
End If
Note that the c in "|"c denotes a Char constant in VB.
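One likely cause of the "string to double" error (an assumption, since the failing loop isn't shown) is building the marker with +, as in "R|" + i. With Option Strict Off, VB treats + between a String and a number as numeric addition and tries to convert "R|" to Double. Concatenating with & avoids that. A sketch with a hypothetical sample line and function name:

```vbnet
Imports System
Imports System.Collections.Generic

Module MarkerLoop
    ' Returns the text from each "R|1", "R|2", ... "R|maxIndex" marker to the end of the line.
    Function SegmentsFrom(allTextLine As String, maxIndex As Integer) As List(Of String)
        Dim result As New List(Of String)
        For i As Integer = 1 To maxIndex
            ' & converts i to text; "R|" + i would attempt numeric addition instead
            Dim marker As String = "R|" & i
            Dim pos As Integer = allTextLine.IndexOf(marker, StringComparison.Ordinal)
            If pos >= 0 Then result.Add(allTextLine.Substring(pos))
        Next
        Return result
    End Function
End Module
```

One caveat: IndexOf("R|1") also matches the start of "R|10", so real data may need a check on the character after the marker.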
An alternative solution that takes advantage of String implementing IEnumerable(Of Char), so a string can be treated as a sequence of characters.
I'm using String.Concat() to join the resulting IEnumerable(Of Char) back into a string and CInt() to convert that string to an Integer, then adding 1 to its value.
Raw_data = "R|151"
Dim Result As String = Raw_data.Substring(0, 2) & (CInt(String.Concat(Raw_data.Skip(2))) + 1).ToString
This, of course, supposes that the source string is directly convertible to an Integer type.
If a value check is instead required, you can use Integer.TryParse() to perform the validation:
Dim ValuePart As String = Raw_data.Substring(2)
Dim Value As Integer = 0
If Integer.TryParse(ValuePart, Value) Then
Raw_data = Raw_data.Substring(0, 2) & (Value + 1).ToString
End If
If the left part can be variable (in size or content), the answer provided by Olivier Jacot-Descombes is covering this scenario already.
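' Requires Imports System.Text.RegularExpressions at the top of the file
Sub IncrVal()
    Dim s = "R|1"
    For x As Integer = 1 To 10
        ' Add 1 to every run of digits; the MatchEvaluator must return a String
        s = Regex.Replace(s, "[0-9]+", Function(m) CStr(Integer.Parse(m.Value) + 1))
        Console.WriteLine(s) ' R|2, R|3, ... R|11
    Next
End Sub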
So basically what I am trying to do in VB.NET is remove multiple nested parentheses, and all the text inside those parentheses, from a string. It's easy to do if there is just one set of parentheses, as in the first example below: I just find the index of "(" and ")", then use str.Remove(firstindex, lastindex), and keep looping until all parentheses have been removed from the string.
str = "This (3) is(fasdf) an (asdas) Example"
Desired output:
str = "This is an example"
However, I still can't figure out how to do it if there are multiple nested parentheses in the string.
str = "This ((dsd)sdasd) is ((sd)) an (((d))) an example"
Desired Outcome:
str = "This is an example"
This isn't really a tutorial site, so I shouldn't be answering this, but I couldn't resist.
As Ahmed said, you could use Regex.Replace, but I find regexes complex and impenetrable, so they would be difficult for someone else to maintain.
The following code has three loops. The outer loop, a While loop, runs the two inner loops as long as the character index is less than the length of the string.
The first inner loop searches for the first "open bracket" in a group, records its position and adds 1 to the number of "open brackets" (the depth). Any subsequent "open brackets" just add 1 to the depth. This carries on until the first loop finds a "close bracket".
Then the second loop searches for the same number of "close brackets" from that point where the first "close bracket" is found.
When the loop gets to the last "close bracket" in the group, all the characters from the first "open bracket" to the last "close bracket" in the group are removed. Then the While loop starts again if the current index position is not at the end of the updated inputString.
When the While loop finishes, any double spaces are removed and the updated string is returned from the function.
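Private Function RemoveBracketsAndContents(inputString As String) As String
    Dim i As Integer = 0
    While i < inputString.Length
        Dim bracketDepth As Integer = 0
        Dim firstBracketIndex As Integer = -1 ' -1 = no "(" seen yet (0 is a valid index)
        ' First inner loop: record the first "(" of the group and count nested "(" up to the first ")"
        Do
            If inputString(i) = "("c Then
                If firstBracketIndex = -1 Then
                    firstBracketIndex = i
                End If
                bracketDepth += 1
            End If
            i += 1
        Loop Until i = inputString.Length OrElse inputString(i) = ")"c
        If i = inputString.Length Then Exit While
        If firstBracketIndex = -1 Then
            ' Stray ")" with no "(" before it: skip it and keep scanning
            i += 1
            Continue While
        End If
        ' Second inner loop: keep counting "(" and ")" until the group closes (or the string ends)
        Do
            If inputString(i) = "("c Then
                bracketDepth += 1
            ElseIf inputString(i) = ")"c Then
                bracketDepth -= 1
            End If
            i += 1
        Loop Until bracketDepth = 0 OrElse i = inputString.Length
        ' Remove the whole group and resume scanning where it started
        inputString = inputString.Remove(firstBracketIndex, i - firstBracketIndex)
        i = firstBracketIndex
    End While
    ' Collapse the double spaces left where groups were removed
    While inputString.Contains("  ")
        inputString = inputString.Replace("  ", " ")
    End While
    Return inputString
End Function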
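For comparison, the Regex.Replace route mentioned above can stay fairly readable if it only ever removes innermost groups (a "(" followed by text containing no brackets, then a ")") and repeats until nothing changes. A sketch, assuming balanced parentheses (unmatched brackets are simply left in place); the names are hypothetical:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NestedParens
    ' Removes parenthesised groups, nested ones included, by deleting
    ' innermost groups over and over until the string stops changing.
    Function RemoveParenGroups(input As String) As String
        Dim innermost As New Regex("\([^()]*\)")
        Dim previous As String
        Do
            previous = input
            input = innermost.Replace(input, "")
        Loop While input <> previous
        ' Collapse runs of spaces left behind by the removed groups
        Return Regex.Replace(input, " {2,}", " ").Trim()
    End Function
End Module
```

Each pass peels off one nesting level, so "(((d)))" disappears after three passes.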
I have a difficult situation and so far no luck in finding a solution.
My VBA collects number figures like $80,000.50. and I'm trying to get VBA to remove the last period to make it look like $80,000.50 but without using right().
The problem is that after the last period there are hidden spaces or characters, which would be a whole new set of issues to handle, so I'm just looking for something like:
replace("$80,000.50.",".**.",".**")
Is this possible in VBA?
I can't leave a comment, so...
What about InStrRev?
Private Sub RemoveLastPeriod()
    Dim s As String
    s = "$80,000.50."
    ' Keep everything before the last "." (InStrRev returns 0 if there is none)
    If InStrRev(s, ".") > 0 Then s = Left(s, InStrRev(s, ".") - 1)
    Debug.Print s
End Sub
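The same InStrRev idea in VB.NET uses String.LastIndexOf and Substring. A sketch with a hypothetical function name; it assumes the stray trailing period is always present, since on a string like "$80,000.50" it would cut at the decimal point instead:

```vbnet
Imports System

Module LastPeriod
    ' Keeps everything before the last "." (like Left(s, InStrRev(s, ".") - 1) in VBA),
    ' so the trailing period and anything hidden after it are dropped.
    ' Returns the input unchanged if it contains no period.
    Function CutAtLastPeriod(s As String) As String
        Dim pos As Integer = s.LastIndexOf("."c)
        If pos < 0 Then Return s
        Return s.Substring(0, pos)
    End Function
End Module
```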
Mid + Find
You can use Mid and Find functions. Like so:
Find locates the first dot (.) character. If all the values you are collecting are currency amounts with 2 decimals, stored as text, this will work well.
The formula is: =MID(A2,1,FIND(".",A2)+2)
VBA solution
Function getStringToFirstOccurence(inputUser As String, FindWhat As String) As String
getStringToFirstOccurence = Mid(inputUser, 1, WorksheetFunction.Find(FindWhat, inputUser) + 2)
End Function
Other possible solutions, hints
Trim + Clean + Substitute(Char(160)): Chandoo - Untrimmable Spaces – Excel Formula
Ultimately, you can implement regular expressions in an Excel UDF: VBScript’s Regular Expression Support
How about:
Sub dural()
Dim r As Range
For Each r In Selection
s = r.Text
l = Len(s)
For i = l To 1 Step -1
If Mid(s, i, 1) = "." Then
r.Value = Mid(s, 1, i - 1) & Mid(s, i + 1)
Exit For
End If
Next i
Next r
End Sub
This will remove the last period and leave all the other characters intact.
EDIT#1:
This version does not require looping over the characters in the cell:
Sub qwerty()
Dim r As Range
For Each r In Selection
If InStr(r.Value, ".") > 0 Then r.Characters(InStrRev(r.Text, "."), 1).Delete
Next r
End Sub
Shortest Solution
Simply use the Val function. I assume this is meant to be a numerical figure anyway? Get rid of the commas and the dollar sign, then convert to a value; Val will ignore the second point and any other trailing characters! Robustness not tested, but it seems to work...
Dim myString as String
myString = "$80,000.50. junk characters "
' Remove commas and dollar signs, then convert to value.
Dim myVal as Double
myVal = Val(Replace(Replace(myString,"$",""),",",""))
' >> myVal = 80000.5
' If you're really set on getting a formatted string back, use Format:
myString = Format(myVal, "$#,##0.00")
' >> myString = $80,000.50
From the Documentation,
The Val function stops reading the string at the first character it can't recognize as part of a number. Symbols and characters that are often considered parts of numeric values, such as dollar signs and commas, are not recognized.
This is why we must first remove the dollar sign and the commas, and why it ignores all the junk after the second dot, or for that matter anything non-numerical at the end!
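A VB.NET sketch of the Val approach, mainly to show the format string: "#,##0.00" only pads the fractional part, whereas "000,000.00" forces six integer digits and would produce $080,000.50. The function name is hypothetical, and InvariantCulture is used here (an assumption) so the separators do not depend on the machine's regional settings:

```vbnet
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic

Module ValDemo
    ' Strips "$" and ",", lets Val stop at the first character it can't use
    ' (here the second "."), then formats the number again.
    Function CleanAmount(raw As String) As String
        Dim value As Double = Val(raw.Replace("$", "").Replace(",", ""))
        ' "$" is copied literally; "#,##0.00" gives a group separator and two decimals
        Return value.ToString("$#,##0.00", CultureInfo.InvariantCulture)
    End Function
End Module
```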
Working with Strings
Edit: I wrote this solution first but now think the above method is more comprehensive and shorter - left here for completeness.
Trim() removes leading and trailing spaces from a string (ordinary spaces only). Then you could simply use Left() to get rid of the last point...
' String with trailing spaces and a final dot
Dim myString as String
myString = "$80,000.50. "
' Get rid of whitespace at end
myString = Trim(myString)
' Might as well check if there is a final dot before removing it
If Right(myString, 1) = "." Then
myString = Left(myString, Len(myString) - 1)
End If
' >> myString = "$80,000.50"
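VBA's Trim only removes ordinary spaces, so a non-breaking space (Chr(160), mentioned in the hints above) left after the dot would stop the Right(myString, 1) check from seeing the period. A VB.NET sketch that trims an explicit set of hidden characters before checking for the final dot; which characters count as junk is an assumption to adjust:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TrimTrailing
    ' Assumed "hidden" trailing characters: space, tab, non-breaking space (U+00A0), CR, LF
    Private ReadOnly Junk As Char() = {" "c, ControlChars.Tab, ChrW(&HA0), ControlChars.Cr, ControlChars.Lf}

    ' Trims the junk, then removes one final "." only if there is one
    Function RemoveFinalDot(s As String) As String
        Dim trimmed As String = s.TrimEnd(Junk)
        If trimmed.Length > 0 AndAlso trimmed(trimmed.Length - 1) = "."c Then
            trimmed = trimmed.Substring(0, trimmed.Length - 1)
        End If
        Return trimmed
    End Function
End Module
```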
I have a macro that changes single quotes in front of a number to an apostrophe (or close single curly quote). Typically when you type something like "the '80s" in Word, the apostrophe in front of the "8" faces the wrong way. The macro below works, but it is incredibly slow (like 10 seconds per page). In a regular language (even an interpreted one), this would be a fast procedure. Any insights into why it takes so long in VBA on Word 2007? Or if someone has the find+replace skills to do this without iterating, please let me know.
Sub FixNumericalReverseQuotes()
Dim char As Range
Debug.Print "starting " + CStr(Now)
With Selection
total = .Characters.Count
' Will be looking ahead one character, so we need at least 2 in the selection
If total < 2 Then
Exit Sub
End If
For x = 1 To total - 1
a_code = Asc(.Characters(x))
b_code = Asc(.Characters(x + 1))
' We want to convert a single quote in front of a number to an apostrophe
' Trying to use all numerical comparisons to speed this up
If (a_code = 145 Or a_code = 39) And b_code >= 48 And b_code <= 57 Then
.Characters(x) = Chr(146)
End If
Next x
End With
Debug.Print "ending " + CStr(Now)
End Sub
Besides the two questions you asked (Why...? and How to do it without iterating?), there is an implied one: how to iterate properly through a Word object collection.
The answer is to use the obj.Next property rather than accessing items by index.
That is, instead of:
For i = 1 to ActiveDocument.Characters.Count
'Do something with ActiveDocument.Characters(i), e.g.:
Debug.Print ActiveDocument.Characters(i).Text
Next
one should use:
Dim ch as Range: Set ch = ActiveDocument.Characters(1)
Do
'Do something with ch, e.g.:
Debug.Print ch.Text
Set ch = ch.Next 'Note iterating
Loop Until ch is Nothing
Timing: 00:03:30 vs. 00:00:06, more than 3 minutes vs. 6 seconds.
Found on Google, link lost, sorry. Confirmed by personal exploration.
Modified version of @Comintern's "Array method":
Sub FixNumericalReverseQuotes()
Dim chars() As Byte
chars = StrConv(Selection.Text, vbFromUnicode)
Dim pos As Long
For pos = 0 To UBound(chars) - 1
If (chars(pos) = 145 Or chars(pos) = 39) _
And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
' Make the change directly in the selection so track changes is sensible.
' I have to use 213 instead of 146 for reasons I don't understand--
' probably has to do with encoding on Mac, but anyway, this shows the change.
Selection.Characters(pos + 1) = Chr(213)
End If
Next pos
End Sub
Maybe this?
Sub FixNumQuotes()
Dim MyArr As Variant, MyString As String, X As Long, Z As Long
Debug.Print "starting " + CStr(Now)
For Z = 145 To 146
MyArr = Split(Selection.Text, Chr(Z))
For X = LBound(MyArr) To UBound(MyArr)
If IsNumeric(Left(MyArr(X), 1)) Then MyArr(X) = "'" & MyArr(X)
Next
MyString = Join(MyArr, Chr(Z))
Selection.Text = MyString
Next
Selection.Text = Replace(Replace(Selection.Text, Chr(146) & "'", "'"), Chr(145) & "'", "'")
Debug.Print "ending " + CStr(Now)
End Sub
I am not 100% sure of your criteria. I have turned both an open and a close single quote into a ', but you can change that quite easily if you want.
It splits the string into an array on Chr(145), checks whether the first character of each element is numeric, and prefixes the element with a single quote if it is.
Then it joins the array back into a string on Chr(145) and repeats the whole thing for Chr(146). Finally, it looks through the string for a single quote and either of those curly quotes next to each other (because that has to be something we just created) and replaces the pair with just the single quote we want. This leaves any occurrence not next to a number intact.
This final replacement part is the bit you would change if you want something other than ' as the character.
I have been struggling with this for days now. My attempted solution was to use a regular expression on document.text. Then, using the matches in a document.range(start,end), replace the text. This preserves formatting.
The problem is that the start and end in the range do not match the index into text. I think I have found the discrepancy - hidden in the range are field codes (in my case they were hyperlinks). In addition, document.text has a bunch of BEL codes that are easy to strip out. If you loop through a range using the character method, append the characters to a string and print it you will see the field codes that don't show up if you use the .text method.
Amazingly you can get the field codes in document.text if you turn on "show field codes" in one of a number of ways. Unfortunately, that version is not exactly the same as what the range/characters shows - the document.text has just the field code, the range/characters has the field code and the field value. Therefore you can never get the character indices to match.
I have a working version where instead of using range(start,end), I do something like:
Set matchRange = doc.Range.Characters(myMatches(j).FirstIndex + 1)
matchRange.Collapse (wdCollapseStart)
Call matchRange.MoveEnd(WdUnits.wdCharacter, myMatches(j).Length)
matchRange.text = Replacement
As I say, this works but the first statement is dreadfully slow - it appears that Word is iterating through all of the characters to get to the correct point. In doing so, it doesn't seem to count the field codes, so we get to the correct point.
Bottom line, I have not been able to come up with a good way to match the indexing of the document.text string to an equivalent range(start,end) that is not a performance disaster.
Ideas welcome, and thanks.
This is a problem begging for regular expressions. Resolving the .Characters calls that many times is probably what is killing you in performance.
I'd do something like this:
Public Sub FixNumericalReverseQuotesFast()
Dim expression As RegExp
Set expression = New RegExp
Dim buffer As String
buffer = Selection.Range.Text
expression.Global = True
expression.MultiLine = True
expression.Pattern = "[" & Chr$(145) & Chr$(39) & "]\d"
Dim matches As MatchCollection
Set matches = expression.Execute(buffer)
Dim found As Match
For Each found In matches
buffer = Replace(buffer, found, Chr$(146) & Right$(found, 1))
Next
Selection.Range.Text = buffer
End Sub
NOTE: Requires a reference to Microsoft VBScript Regular Expressions 5.5 (or late binding).
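For plain strings, the same regular-expression idea looks like this in VB.NET, with a lookahead so the digit itself is not part of the replacement. In Windows-1252, Chr(145) and Chr(146) are the left and right single quotes, U+2018 and U+2019. Like the first two methods here, this works on plain text, so writing the result back to a Word range would not preserve formatting. A sketch with hypothetical names:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ApostropheFix
    ' Turns a straight quote (') or a left single quote (U+2018) that sits directly
    ' before a digit into a right single quote (U+2019).
    ' (?=\d) checks for the digit without consuming it.
    Function FixDecadeApostrophes(text As String) As String
        Return Regex.Replace(text, "['\u2018](?=\d)", ChrW(&H2019).ToString())
    End Function
End Module
```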
EDIT:
A solution that doesn't use the Regular Expressions library can still avoid working with Ranges. This can easily be converted to work with a byte array instead:
Sub FixNumericalReverseQuotes()
Dim chars() As Byte
chars = StrConv(Selection.Text, vbFromUnicode)
Dim pos As Long
For pos = 0 To UBound(chars) - 1
If (chars(pos) = 145 Or chars(pos) = 39) _
And (chars(pos + 1) >= 48 And chars(pos + 1) <= 57) Then
chars(pos) = 146
End If
Next pos
Selection.Text = StrConv(chars, vbUnicode)
End Sub
Benchmarks (100 iterations, 3 pages of text with 100 "hits" per page):
Regex method: 1.4375 seconds
Array method: 2.765625 seconds
OP method: (Ended task after 23 minutes)
About half as fast as the Regex, but still roughly 10ms per page.
EDIT 2: Apparently the methods above are not format safe, so method 3:
Sub FixNumericalReverseQuotesVThree()
Dim full_text As Range
Dim cached As Long
Set full_text = ActiveDocument.Range
full_text.Find.ClearFormatting
full_text.Find.MatchWildcards = True
cached = full_text.End
Do While full_text.Find.Execute("[" & Chr$(145) & Chr$(39) & "][0-9]")
full_text.End = full_text.Start + 2
full_text.Characters(1) = Chr$(96)
full_text.Start = full_text.Start + 1
full_text.End = cached
Loop
End Sub
Again, slower than both the above methods, but still runs reasonably fast (on the order of ms).
The following function was given to me via an answer that I asked earlier today.
What I'm trying to do is remove a character from a string in Excel using VBA. However, whenever the function runs, it ends up erasing the stored value and returning a #VALUE! error. I cannot figure out what is going on. Could anyone explain it, or suggest an alternative?
Function ReplaceAccentedCharacters(S As String) As String
Dim I As Long
With WorksheetFunction
For I = 1 To Len(S)
Select Case Asc(Mid(S, I, 1))
' Extraneous coding removed. Leaving the examples which
' do work and the one that is causing the problem.
Case 32
S = .Replace(S, I, 1, "-")
Case 94
S = .Replace(S, I, 1, "/")
' This is the coding that is generating the error.
Case 34
S = .Replace(S, I, 1, "")
End Select
Next I
End With
ReplaceAccentedCharacters = S
End Function
When the string contains a " (or character code 34 in Decimal, 22 in Hexadecimal... I used both) it is supposed to remove the quotation mark. However, instead, Excel ignores it, and still returns the " mark anyway.
I then tried to go ahead and replace the .Replace() clause with another value.
Case 34
S = .Replace(S, I, 1, "/")
End Select
Using the code above, the script indeed does replace the " with a /.
I ended up finding the following example here in Stack Overflow:
https://stackoverflow.com/a/7386565/692250
The answer given there uses essentially the same code as mine, and Excel still ignores the quotation mark. I even went so far as to expand the definition with curly braces and still got nothing.
Try this:
Function blah(S As String) As String
Dim arr, i
'array of [replace, with], [replace, with], etc
arr = Array(Chr(32), "-", Chr(94), "/", Chr(34), "")
For i = LBound(arr) To UBound(arr) Step 2
S = Replace(S, arr(i), arr(i + 1))
Next i
blah = S
End Function
This function was designed to replace one character with another, not to replace a character with nothing. When you replace a character with nothing, the string gets shorter, but the loop counter still runs up to the original length, so on the last iteration it looks for a character position beyond the end of the string. Mid returns nothing there, and ASC(<nothing>) raises an error. Other errors in the replacement routine will also occur whenever the length of the string changes while the code is running.
To modify the routine to replace a character with nothing, I would suggest the following:
In the Case statements:
Case 34
S = .Replace(S, I, 1, Chr(1))
And in the assignment statement:
ReplaceAccentedCharacters = Replace(S, Chr(1), "")
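Note that VBA's Replace is different from WorksheetFunction.Replace: the VBA function replaces occurrences of a substring, while the worksheet function replaces a number of characters at a given position.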
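The problem of the string changing length while it is being looped over can be avoided altogether by building a new string. A VB.NET sketch using StringBuilder and the mapping from the question (space to "-", "^" to "/", double quote removed); the function name is hypothetical:

```vbnet
Imports System
Imports System.Text

Module CharMap
    ' Builds a new string instead of editing s in place, so dropping a character
    ' can't shift the positions the loop still has to visit.
    Function ReplaceCharacters(s As String) As String
        Dim sb As New StringBuilder(s.Length)
        For Each c As Char In s
            Select Case c
                Case " "c
                    sb.Append("-"c)
                Case "^"c
                    sb.Append("/"c)
                Case """"c
                    ' Double quote: append nothing, i.e. remove it
                Case Else
                    sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function
End Module
```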