I want to extract individual numbers from a string. So for:
x = " 99 1.2 99.25 "
I want to get three individual numbers: 99, 1.2, and 99.25.
Here is my current code. It extracts the first occurring number, but I do not know how to use loops to get the three individual numbers.
Sub ExtractNumber()
Dim rng As Range
Dim TestChar As String
Dim IsNumber As Boolean
Dim i As Integer, StartChar As Integer, LastChar As Integer, NumChars As Integer
For Each rng In Selection
IsNumber = False
i = 1
Do While IsNumber = False And i <= Len(rng)
TestChar = Mid(rng, i, 1)
If IsNumeric(TestChar) = True Then
StartChar = i
IsNumber = True
End If
i = i + 1
Loop
IsNumber = False
Do While IsNumber = False And i <= Len(rng)
TestChar = Mid(rng, i, 1)
If IsNumeric(TestChar) = False Or i = Len(rng) Then
If i = Len(rng) Then
LastChar = i
Else
LastChar = i - 1
End If
IsNumber = True
End If
i = i + 1
Loop
NumChars = LastChar - StartChar + 1
rng.Offset(0, 1).Value = Mid(rng, StartChar, NumChars)
Next rng
End Sub
My previous attempt (input is stored in cell A6):
Dim x As String, y As Long, z As String, i As Long
x = Range("A6")
y = Len(x)
For i = 1 To Len(x)
If IsNumeric(Mid(x, i, 1)) Then
z = z & Mid(x, i, 1)
End If
Next i
MsgBox z
If speed is not an issue (for example, if the task is not intensive), you can use this:
Public Sub splitme()
Dim a As Variant
Dim x As String
Dim i As Long, j As Long
Dim b() As Double
x = "1.2 9.0 0.8"
a = Split(x, " ")
j = 0
ReDim b(100)
For i = 0 To UBound(a)
If (a(i) <> "") Then
b(j) = CDbl(a(i))
j = j + 1
End If
Next i
If j > 0 Then ReDim Preserve b(j - 1) 'j = 0 means no numbers were found
End Sub
Add error checking to suit your particular needs: b(100) raises a "Subscript out of range" error if the string contains more than 101 numbers, and CDbl raises a "Type mismatch" error on a token that is not a number.
If this is used inside a loop, on a large x, or both, consider other options such as the RegEx answer, because repeated calls to ReDim Preserve are generally best avoided.
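A minimal sketch along the same lines, assuming space-separated tokens: the result array is sized from the number of tokens, so there is no fixed b(100) limit and only one ReDim Preserve, and tokens that are not numeric are skipped instead of reaching CDbl. The function name ExtractNumbersFromString is hypothetical. Note that IsNumeric and CDbl both follow the system locale (and IsNumeric also accepts forms such as "1e3"), so "1.2" parses as expected only where "." is the decimal separator.

```vba
' Hypothetical helper: returns a 0-based Double array of the numeric tokens in s,
' or an empty array if there are none. Tokens are assumed to be space-separated.
Public Function ExtractNumbersFromString(ByVal s As String) As Variant
    Dim tokens() As String, nums() As Double
    Dim i As Long, n As Long, tok As String

    tokens = Split(Trim$(s), " ")
    If UBound(tokens) < 0 Then                 ' Split("") returns an empty array
        ExtractNumbersFromString = Array()
        Exit Function
    End If

    ReDim nums(0 To UBound(tokens))            ' at most one number per token
    For i = 0 To UBound(tokens)
        tok = tokens(i)
        If Len(tok) > 0 And IsNumeric(tok) Then  ' skip empty and non-numeric tokens
            nums(n) = CDbl(tok)
            n = n + 1
        End If
    Next i

    If n = 0 Then
        ExtractNumbersFromString = Array()
    Else
        ReDim Preserve nums(0 To n - 1)        ' a single resize at the end
        ExtractNumbersFromString = nums
    End If
End Function
```

For the string in the question, ExtractNumbersFromString(" 99 1.2 99.25 ") returns an array whose three elements are 99, 1.2 and 99.25.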
Rather than writing your own code to extract the numbers, why not use regular expressions? This website has a lot of great information and tutorials on regular expressions. It can be a bit baffling at first, but once you get the hang of it, it's a very powerful tool for solving problems of this type.
Below is an example of extracting the information you're after using a regular expression object.
Public Sub ExtractNumbers()
'Regular Expression Objects
Dim objRegEx As Object
Dim objMatches As Object
Dim Match As Object
'String variable for source string
Dim strSource As String
'Iteration variable
Dim i As Integer
'Create Regular Expression Object
Set objRegEx = CreateObject("VBScript.RegExp")
'Set objRegEx properties
objRegEx.Global = True '<~~ We want to find all matches
objRegEx.MultiLine = True '<~~ Allow line breaks in source string
objRegEx.IgnoreCase = False '<~~ Not strictly necessary for this example
'Below pattern matches an integer or decimal number 'word' within a string
' [+-]? optionally matches a + or - symbol
' \b matches the start of the word (placed after the sign: a \b before the sign
'    never matches between a space and + or -, so the sign would be dropped)
' [0-9]+ matches one or more digits in sequence
' (\.[0-9]+)? optionally matches a period/decimal point followed by one or more digits
' \b matches the end of the word
objRegEx.Pattern = "[+-]?\b[0-9]+(\.[0-9]+)?\b"
'Example String
strSource = "x= 99 10.1 20.6 Aardvark"
'Ensure that at least one match exists
If objRegEx.Test(strSource) Then
'Capture all matches in objMatches
Set objMatches = objRegEx.Execute(strSource)
'TODO: Do what you want to do with them
'In this example I'm just printing them to the Immediate Window
'Print using Match object and For..Each
For Each Match In objMatches
Debug.Print Match.Value
Next Match
'Print using numeric iteration (objMatches is a 0-based collection accessed via .Item)
For i = 0 To (objMatches.Count - 1)
Debug.Print objMatches.Item(i)
Next i
End If
End Sub
Each of the two print loops in this example writes the following to the Immediate window (so the list appears twice):
99
10.1
20.6
I was wondering how to remove duplicate names/text in a cell. For example:
Jean Donea Jean Doneasee
R.L. Foye R.L. Foyesee
J.E. Zimmer J.E. Zimmersee
R.P. Reed R.P. Reedsee D.E. Munson D.E. Munsonsee
While googling, I stumbled upon this macro:
Function RemoveDupes1(pWorkRng As Range) As String
'Updateby20140924
Dim xValue As String
Dim xChar As String
Dim xOutValue As String
Dim xDic As Object
Dim i As Long
Set xDic = CreateObject("Scripting.Dictionary")
xValue = pWorkRng.Value
For i = 1 To VBA.Len(xValue)
xChar = VBA.Mid(xValue, i, 1)
If Not xDic.Exists(xChar) Then 'keep only the first occurrence of each character
xDic(xChar) = ""
xOutValue = xOutValue & xChar
End If
Next
RemoveDupes1 = xOutValue
End Function
The macro runs, but it compares individual letters and removes every letter that repeats.
When I use the code on those names, the result looks like this:
Jean Dos
R.L Foyes
J.E Zimers
R.P edsDEMuno
Looking at the result, I can see it is not what I want, but I have no clue how to correct the code.
The desired output should look like:
Jean Donea
R.L. Foye
J.E. Zimmer
R.P. Reed
Any suggestions?
Thanks in advance.
Regex
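A regex can be built dynamically while iterating over the elements of the cell, so that it works like a Find tool and extracts only the shortest match. The pattern is \w*(OUTPUT_OF_EXTRACTELEMENT)\w*, for example \w*(Jean)\w*.
The code creates the RegExp object with CreateObject (late binding), so no reference is required; enable the Microsoft VBScript Regular Expressions 5.5 reference only if you change it to early binding.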
Code
Function EXTRACTELEMENT(Txt As String, n, Separator As String) As String
On Error GoTo ErrHandler:
EXTRACTELEMENT = Split(Application.Trim(Txt), Separator)(n - 1) 'Application.Trim also collapses repeated inner spaces
Exit Function
ErrHandler:
' error handling code
EXTRACTELEMENT = 0
On Error GoTo 0
End Function
Sub test()
Dim str As String, F_str As String, strPattern As String
Dim objRegExp As Object, objMatches As Object
Dim lastrow As Long, Row As Long, N_Elements As Long, k As Long
Set objRegExp = CreateObject("VBScript.RegExp") 'New regexp
lastrow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row
For Row = 1 To lastrow
str = Range("A" & Row)
F_str = ""
N_Elements = UBound(Split(str, " "))
If N_Elements > 0 Then
For k = 1 To N_Elements + 1
'Pattern for the k-th element, e.g. \w*(Jean)\w*
strPattern = "\w*(" & EXTRACTELEMENT(CStr(str), k, " ") & ")\w*"
With objRegExp
.Pattern = strPattern
.Global = True
End With
If objRegExp.test(str) Then 'test the cell text, not the pattern itself
Set objMatches = objRegExp.Execute(str)
If objMatches.Count > 1 Then
'The element repeats in the cell: add it once, unless F_str already contains it
If objRegExp.test(F_str) = False Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
ElseIf k <= 2 And objMatches.Count = 1 Then
F_str = F_str & " " & objMatches(0).Submatches(0)
End If
End If
Next k
Else
F_str = str
End If
Debug.Print Trim(F_str)
Next Row
End Sub
Note that you can replace the Debug.Print line to write to a target cell instead; for column B, use Cells(Row, 2) = Trim(F_str).
Explanation
Function
This UDF uses the Split function to return the nth element of a string whose elements are separated by spaces (" "), so every element in the cell can be compared.
Loops
The code loops k from 1 to the number of elements in each cell, and Row from 1 to lastrow.
Regex
The regex finds the matches in the cell, and the shortest element of each match is appended to a new string.
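Because the pattern is assembled from cell text, an element such as "R.L." is read as a regex in which each "." matches any character. A small helper (hypothetical, not part of the answer above) could escape the VBScript.RegExp metacharacters before the element goes into the pattern, for example strPattern = "\w*(" & EscapeRegexLiteral(EXTRACTELEMENT(CStr(str), k, " ")) & ")\w*". This is a sketch; the set of characters escaped is the usual regex metacharacter list.

```vba
' Hypothetical helper: prefix regex metacharacters with a backslash so the
' text is matched literally (e.g. "R.L." -> "R\.L\.").
Public Function EscapeRegexLiteral(ByVal s As String) As String
    Const SPECIALS As String = "\.^$|?*+()[]{}"
    Dim i As Long, ch As String, out As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If InStr(1, SPECIALS, ch, vbBinaryCompare) > 0 Then
            out = out & "\" & ch      ' metacharacter: escape it
        Else
            out = out & ch            ' ordinary character: copy as-is
        End If
    Next i
    EscapeRegexLiteral = out
End Function
```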
This solution assumes that 'see' (or some other three-letter string) is always at the end of the cell value. If that isn't the case, it won't work.
Function RemoveDupeInCell(dString As String) As String
Dim x As Long, ct As Long
Dim str As String
'define str as the first half of the cell value, ignoring the last three characters
str = Trim(Left(dString, WorksheetFunction.RoundUp((Len(dString) - 3) / 2, 0)))
'loop through the entire cell and count the number of instances of str
For x = 1 To Len(dString)
If Mid(dString, x, Len(str)) = str Then ct = ct + 1
Next x
'if it's more than one, set to str, otherwise error
If ct > 1 Then
RemoveDupeInCell = str
Else
RemoveDupeInCell = "#N/A"
End If
End Function
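A variant of the same idea, sketched under the same assumption (the cell holds "<name> <name><suffix>" with a fixed suffix such as "see"): strip the suffix, then return the first half only when the rest is exactly that half repeated with a space in between. The name RemoveRepeatedName is hypothetical. Unlike the function above, it returns the original text rather than #N/A when the pattern does not fit, which includes cells holding two different duplicated names, such as the "R.P. Reed ... D.E. Munsonsee" example.

```vba
' Hypothetical UDF: "Jean Donea Jean Doneasee" -> "Jean Donea".
' Assumes "<name> <name><suffix>"; any other value is returned unchanged.
Public Function RemoveRepeatedName(ByVal cellText As String, _
                                   Optional ByVal suffix As String = "see") As String
    Dim body As String, half As String
    body = Trim$(cellText)

    ' Drop the trailing suffix, if present (case-sensitive)
    If Len(suffix) > 0 And Right$(body, Len(suffix)) = suffix Then
        body = Left$(body, Len(body) - Len(suffix))
    End If

    ' "<name> <name>" always has odd length: 2 * Len(name) + 1
    If Len(body) Mod 2 = 1 Then
        half = Left$(body, (Len(body) - 1) \ 2)
        If body = half & " " & half Then
            RemoveRepeatedName = half
            Exit Function
        End If
    End If

    RemoveRepeatedName = cellText      ' pattern does not fit: return unchanged
End Function
```

In a worksheet it could be used as =RemoveRepeatedName(A1).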
I'm using VBA for Excel.
I have code that does the following:
Take an array of words (called Search_Terms)
I then have a function (see below) that receives the Search_Terms and a reference to a Cell in Excel.
The function then searches the text within the cell.
It finds all substrings that match the words in Search_Terms within the cell and changes their formatting.
The function shown below already works.
However, it is quite slow when I want to search several thousand cells with an array of 20 or 30 words.
I'm wondering if there is a more efficient or idiomatic way to do this (I'm not really familiar with VBA and I'm just hacking my way through).
Thank you!
Dim Search_Terms As Variant
Dim starting_numbers() As Integer ' this is an "array?" that holds the starting position of each matching substring
Dim length_numbers() As Integer 'This is an "array" that holds the length of each matching substring
Search_Terms = Array("word1", "word2", "word3")
Call change_all_matches(Search_Terms, c) ' "c" is a reference to a Cell in a Worksheet
Function change_all_matches(terms As Variant, ByRef c As Variant)
ReDim starting_numbers(1 To 1) As Integer ' reset the array
ReDim length_numbers(1 To 1) As Integer ' reset the array
response = c.Value
' This For-Loop Searches through the Text in the Cell and finds the starting position & length of each matching substring
For Each term In terms ' Iterate through each term
Start = 1
Do
pos = InStr(Start, response, term, vbTextCompare) 'See if we have a match
If pos > 0 Then
Start = pos + 1 ' keep looking for more substrings
starting_numbers(UBound(starting_numbers)) = pos
ReDim Preserve starting_numbers(1 To UBound(starting_numbers) + 1) As Integer ' Add each matching "starting position" to our array called "starting_numbers"
length_numbers(UBound(length_numbers)) = Len(term)
ReDim Preserve length_numbers(1 To UBound(length_numbers) + 1) As Integer
End If
Loop While pos > 0 ' Keep searching until we find no substring matches
Next
c.Select 'Select the cell
' This For-Loop iterates through the starting position of each substring and modifies the formatting of all matches
For i = 1 To UBound(starting_numbers)
If starting_numbers(i) > 0 Then
With ActiveCell.Characters(Start:=starting_numbers(i), Length:=length_numbers(i)).Font
.FontStyle = "Bold"
.Color = -4165632
.Size = 13
End With
End If
Next i
Erase starting_numbers
Erase length_numbers
End Function
The code below might be a bit faster (I hadn't measured it when I first posted; see the Edit below for timings)
What it does:
Turns off Excel features, as suggested by @Ron (ScreenUpdating, EnableEvents, Calculation)
Sets the used range and captures the last used column
Iterates through each column and applies an AutoFilter for each of the words
If there is more than one visible row (the first one being the header)
Iterates through all visible cells in currently auto-filtered column
Checks that the cell doesn't contain an error and is not empty (in this order, as separate checks)
When it finds the current filter word, it makes the changes
Moves to the next cell, then next filter word until all search words are done
Moves to the next column, repeats above process
Clears all filters, and turns Excel features back on
Option Explicit
Const ALL_WORDS = "word1,word2,word3"
Public Sub ShowMatches()
Dim ws As Worksheet, ur As Range, lc As Long, wrdArr As Variant, t As Double
t = Timer
Set ws = Sheet1
Set ur = ws.UsedRange
lc = ur.Columns.Count
wrdArr = Split(ALL_WORDS, ",")
enableXL False
Dim c As Long, w As Long, cVal As String, sz As Long, wb As String
Dim pos As Long, vr As Range, cel As Range, wrd As String
For c = 1 To lc
For w = 0 To UBound(wrdArr)
If ws.AutoFilterMode Then ur.AutoFilter 'clear filters
wrd = "*" & wrdArr(w) & "*"
ur.AutoFilter Field:=c, Criteria1:=wrd, Operator:=xlFilterValues
If ur.Columns(c).SpecialCells(xlCellTypeVisible).CountLarge > 1 Then
For Each cel In ur.Columns(c).SpecialCells(xlCellTypeVisible)
If Not IsError(cel.Value2) Then
If Len(cel.Value2) > 0 Then
cVal = cel.Value2: pos = 1: sz = Len(wrdArr(w))
Do While pos > 0
pos = InStr(pos, cVal, wrdArr(w), vbTextCompare)
If pos > 0 Then
wb = Mid(cVal, pos + sz, 1) 'char after the match ("" at the end of the text)
If wb = "" Or wb Like "[!a-zA-Z0-9]" Then 'format whole words only
With cel.Characters(Start:=pos, Length:=sz).Font
.Bold = True
.Color = -4165632
.Size = 11
End With
End If
pos = pos + sz 'keep searching after this match
End If
Loop
End If
End If
Next
End If
ur.AutoFilter 'clear filters
Next
Next
enableXL True
Debug.Print "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Private Sub enableXL(Optional ByVal opt As Boolean = True)
Application.ScreenUpdating = opt
Application.EnableEvents = opt
Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual)
End Sub
Your code uses ReDim Preserve twice in the first loop.
The impact on performance is slight for one cell, but for thousands of cells it becomes significant,
because ReDim Preserve copies the initial array into a new one with the new dimension, then discards the original.
Also, selecting and activating cells should be avoided: most of the time it is not needed, and it slows down execution.
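To illustrate the ReDim Preserve point in isolation, here is a sketch (a hypothetical helper, not code from either version above) that collects the start positions of whole-word matches and grows its array by doubling, so the number of ReDim Preserve calls grows with the logarithm of the number of matches instead of once per match. "Whole word" is assumed here to mean that the characters on both sides of the match are not ASCII letters or digits, which is slightly stricter than the trailing-character check in the answers.

```vba
' Hypothetical helper: start positions of whole-word matches of term in src.
' Case-insensitive. Returns a 0-based Long array, or an empty array.
Public Function FindWordPositions(ByVal src As String, ByVal term As String) As Variant
    Dim positions() As Long, n As Long, pos As Long
    Dim before As String, after As String

    If Len(term) = 0 Or Len(src) = 0 Then
        FindWordPositions = Array()
        Exit Function
    End If

    ReDim positions(0 To 7)                          ' small initial capacity
    pos = InStr(1, src, term, vbTextCompare)
    Do While pos > 0
        If pos > 1 Then before = Mid$(src, pos - 1, 1) Else before = ""
        after = Mid$(src, pos + Len(term), 1)        ' "" when the match ends src
        If Not (before Like "[a-zA-Z0-9]") And Not (after Like "[a-zA-Z0-9]") Then
            If n > UBound(positions) Then
                ReDim Preserve positions(0 To 2 * n - 1)   ' full: double the capacity
            End If
            positions(n) = pos
            n = n + 1
        End If
        pos = InStr(pos + 1, src, term, vbTextCompare)
    Loop

    If n = 0 Then
        FindWordPositions = Array()
    Else
        ReDim Preserve positions(0 To n - 1)         ' drop unused capacity
        FindWordPositions = positions
    End If
End Function
```

The same pattern (preallocate, double when full, trim once at the end) applies to the termStart/termLen arrays if sizing them to Len(response) up front is not desirable.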
Edit
I measured the performance of the two versions, plus an optimized version of the initial code:
Total cells: 3,060; each cell with 15 words, total search terms: 30
Initial code: Time: 69.797 sec
My Code: Time: 3.969 sec
Initial code optimized: Time: 3.438 sec
Initial code optimized:
Option Explicit
Const ALL_WORDS = "word1,word2,word3"
Public Sub TestMatches()
Dim searchTerms As Variant, cel As Range, t As Double
t = Timer
enableXL False
searchTerms = Split(ALL_WORDS, ",")
For Each cel In Sheet1.UsedRange
ChangeAllMatches searchTerms, cel
Next
enableXL True
Debug.Print "Time: " & Format(Timer - t, "0.000") & " sec"
End Sub
Public Sub ChangeAllMatches(ByRef terms As Variant, ByRef cel As Range)
Dim termStart() As Long 'this array holds starting positions of each match
Dim termLen() As Long 'this array holds lengths of each matching substring
Dim response As Variant, term As Variant, strt As Variant, pos As Long, i As Long
If IsError(cel.Value2) Then Exit Sub 'Do not process error
If Len(cel.Value2) = 0 Then Exit Sub 'Do not process empty cells
response = cel.Value2
If Len(response) > 0 Then
ReDim termStart(1 To Len(response)) As Long 'create arrays large enough
ReDim termLen(1 To Len(response)) As Long 'to accommodate any matches
i = 1: Dim wb As String
'The loop finds the starting position & length of each matched term
For Each term In terms 'Iterate through each term
strt = 1
Do
pos = InStr(strt, response, term, vbTextCompare) 'Check for match
If pos > 0 Then
wb = Mid(response, pos + Len(term), 1) 'Char after the match ("" at the end of the text)
If wb = "" Or wb Like "[!a-zA-Z0-9]" Then 'Whole-word matches only
termStart(i) = pos 'Add match starting pos to array
termLen(i) = Len(term) 'Add match len to array termLen()
i = i + 1
End If
strt = pos + 1 'Keep looking for more substrings
End If
Loop While pos > 0 'Keep searching until we find no more matches
Next
If i = 1 Then Exit Sub 'No matches in this cell (ReDim 1 To 0 would fail)
ReDim Preserve termStart(1 To i - 1) 'clean up array
ReDim Preserve termLen(1 To i - 1) 'remove extra items at the end
For i = 1 To UBound(termStart) 'Modify matches based on termStart()
If termStart(i) > 0 Then
With cel.Characters(Start:=termStart(i), Length:=termLen(i)).Font
.Bold = True
.Color = -4165632
.Size = 11
End With
End If
Next i
End If
End Sub
I am using this macro in Excel 2010 to find and replace words without losing cell formatting (for example, some words are bold and some are italic, so this macro makes sure the cell keeps the same formatting when a word is replaced):
' Replacement of characters in the range(s) with storing of original Font
' Arguments:
' Rng - range for replacement
' FindText - string being searched for
' ReplaceText - replacement string
' MatchCase - [False]/True, True to make the search case sensitive
Sub CharactersReplace(Rng As Range, FindText As String, ReplaceText As String, Optional MatchCase As Boolean)
Dim i&, j&, jj&, k&, v$, m&, x As Range
j = Len(FindText)
jj = Len(ReplaceText)
If Not MatchCase Then m = 1
For Each x In Rng.Cells
If VarType(x) = vbString Then
k = 0
i = 1
With x
v = .Value
While i <= Len(v) - j + 1
If StrComp(Mid$(v, i, j), FindText, m) = 0 Then
.Characters(i + k, j).Insert ReplaceText
k = k + jj - j
i = i + j
Else
i = i + 1
End If
Wend
End With
End If
Next
End Sub
' Testing subroutine
Sub Test_CharactersReplace()
CharactersReplace Range("A743:F764"), "Replace This", "With This", True
End Sub
When I run the macro, it doesn't work on cells that contain more than 255 characters.
I've searched online but found no real solution. Does anyone know how to solve this?
It's not a simple solution, but basically what you need to do is this:
get an array filled with values for the FontStyle of each character in the cell
use Replace to replace each instance of your "old string" with your "new string"
move the values in the array to reflect the change in string length
write back both the string and the Fontstyle array to your cell
I managed to write something that works. It's a bit long, but I don't know of any other way to do it.
Also, note that this only copies/replaces the font style (Bold, Italic, etc.); it won't replicate any changes to color, size, font name, etc. These could easily be incorporated, though, by adding more arrays and setting/amending their values inside the existing loops.
Public Sub RunTextChange()
Dim x As Range
For Each x In Range("A743:F764")
Call TextChange(x, "Replace This", "With This")
Next x
End Sub
Public Sub TextChange(TargetCell As Range, FindTxt As String, ReplaceTxt As String)
''''Variables for text and length
Dim text1 As Variant: Dim text_length As Long
text1 = TargetCell.Value: text_length = Len(text1)
'variables for lengths of find/replace strings and difference
Dim strdiff As Long: Dim ftlen As Long: Dim rtlen As Long
ftlen = Len(FindTxt): rtlen = Len(ReplaceTxt): strdiff = rtlen - ftlen
'font arrays and loop integers
Dim fonts1 As Variant: Dim x As Long: Dim z As Long
Dim fonts2 As Variant
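'nothing to do for an empty cell or an empty find string
If text_length = 0 Or ftlen = 0 Then Exit Sub
'set font array to length of string
ReDim fonts1(1 To text_length) As Variant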
'make font array to correspond to the fontstyle of each character in the cell
For x = 1 To text_length
fonts1(x) = TargetCell.Characters(Start:=x, Length:=1).Font.FontStyle
Next x
'detect first instance of find text- if not present, exit sub
z = InStr(text1, FindTxt)
If z = 0 Then Exit Sub
'continue loop as long as there are more instances of find string
Do While z > 0
'replace each instance of find string in turn (rather than all at once)
text1 = Left(text1, z - 1) & Replace(text1, FindTxt, ReplaceTxt, z, 1)
'if no difference between find and replace string lengths, there is no need to amend the fonts array
If Not strdiff = 0 Then
'otherwise, expand fonts array and push values forward (or back, if the replace string is shorter)
fonts2 = fonts1
ReDim Preserve fonts1(1 To text_length + strdiff) As Variant
For x = z + ftlen To text_length
fonts1(x + strdiff) = fonts2(x)
Next x
'set all the letters in the replacement string to the same font as the first letter in the find string
For x = z To z + rtlen - 1
fonts1(x) = fonts2(z)
Next x
End If
'change text_length to reflect new length of string
text_length = Len(text1)
'change z to search for next instance of find string - if none, will exit loop
z = InStr(z + rtlen, text1, FindTxt)
Loop
'change cell Value to new string
TargetCell.Value = text1
'change all characters to new font styles
For x = 1 To text_length
TargetCell.Characters(Start:=x, Length:=1).Font.FontStyle = fonts1(x)
Next x
End Sub
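The steps listed at the start of this answer can also be separated from the Excel calls, which makes them easy to test on their own. The sketch below (ReplaceKeepingStyles is a hypothetical name) takes the text and a 1-based array with one style value per character, and returns the new text together with a matching style array. It builds the output in a single left-to-right pass instead of shifting the array after each replacement. As in the Sub above, each replacement inherits the style of the first character it replaces, and matching is case-sensitive. Reading and writing the styles through Characters(Start:=x, Length:=1).Font.FontStyle would stay as in the answer's code.

```vba
' Hypothetical helper: replace every findTxt in src with replaceTxt while
' carrying a per-character style array along. styles must be 1-based with one
' entry per character of src. Returns the new text; newStyles receives the
' matching 1-based style array (or an empty array if the result is empty).
Public Function ReplaceKeepingStyles(ByVal src As String, ByVal styles As Variant, _
                                     ByVal findTxt As String, ByVal replaceTxt As String, _
                                     ByRef newStyles As Variant) As String
    Dim acc As Collection, out As String, i As Long, k As Long
    Dim tmp() As Variant
    Set acc = New Collection

    i = 1
    Do While i <= Len(src)
        If Len(findTxt) > 0 And Mid$(src, i, Len(findTxt)) = findTxt Then
            ' every replacement character inherits the style of the first replaced one
            For k = 1 To Len(replaceTxt)
                acc.Add styles(i)
            Next k
            out = out & replaceTxt
            i = i + Len(findTxt)
        Else
            acc.Add styles(i)                       ' unchanged character keeps its style
            out = out & Mid$(src, i, 1)
            i = i + 1
        End If
    Loop

    If acc.Count = 0 Then
        newStyles = Array()
    Else
        ReDim tmp(1 To acc.Count)
        For k = 1 To acc.Count
            tmp(k) = acc(k)
        Next k
        newStyles = tmp
    End If
    ReplaceKeepingStyles = out
End Function
```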
I get the impression that this is not possible in Word, but I figure that if I look for any 3-4 words that occur in the same sequence anywhere in a very long paper, I could find duplicates of the same phrases.
I copied and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document. There is a lot of different formatting, but I would be willing to temporarily remove it in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
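'Strip paragraph marks so a sentence that ends its paragraph matches the same sentence elsewhere
MyArray(n) = Trim(Replace(ActiveDocument.Sentences(i).Text, vbCr, ""))
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray) - 1
'Adjacent equal (non-empty) entries in the sorted array are duplicates
If Len(MyArray(i)) > 0 And MyArray(i) = MyArray(i + 1) Then
On Error Resume Next 'Col.Add fails if the key already exists; ignore it
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i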
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
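A different technique for step 3, swapped in here rather than taken from the code above: count each trimmed sentence in a late-bound Scripting.Dictionary and keep those seen more than once, which avoids sorting altogether. DuplicateItems is a hypothetical name. The comparison is case-insensitive because CompareMode is set to vbTextCompare, and vbCr is stripped first since Word sentence text can include the trailing paragraph mark. This is a sketch; the returned strings could then be fed to the existing highlight loop.

```vba
' Hypothetical helper (a Dictionary count, swapped in for the sort above):
' returns the non-empty items that occur more than once, ignoring case and
' paragraph marks (vbCr). Late-bound, so no Scripting Runtime reference is needed.
Public Function DuplicateItems(ByVal items As Variant) As Variant
    Dim seen As Object, dupes As Object, itm As Variant, s As String
    Set seen = CreateObject("Scripting.Dictionary")
    Set dupes = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare      ' must be set before any item is added
    dupes.CompareMode = vbTextCompare

    For Each itm In items
        s = Trim$(Replace(CStr(itm), vbCr, ""))
        If Len(s) > 0 Then
            If seen.Exists(s) Then
                If Not dupes.Exists(s) Then dupes.Add s, True   ' record each duplicate once
            Else
                seen.Add s, True
            End If
        End If
    Next itm

    DuplicateItems = dupes.Keys           ' 0-based array, in order of first repeat
End Function
```

In Word, the input array could be filled by looping over ActiveDocument.Sentences, as Sample does.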
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258-page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
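The Dictionary idea can also be tried outside Word on a plain string. This sketch (RepeatedChains is a hypothetical name) collapses repeated spaces, splits on spaces, turns every run of chainLen consecutive words into a key, and returns the chains that occur more than once. Unlike the Word version, it treats punctuation as part of a word, does not special-case line breaks, and returns the chain text rather than document ranges. It uses late binding, so no reference is needed.

```vba
' Hypothetical sketch: return each chain of chainLen consecutive words that
' occurs more than once in src. Words are separated by spaces; punctuation
' stays attached to its word. Case-insensitive (vbTextCompare).
Public Function RepeatedChains(ByVal src As String, ByVal chainLen As Long) As Variant
    Dim words() As String, seen As Object, repeats As Object
    Dim i As Long, j As Long, chain As String

    If chainLen < 1 Then
        RepeatedChains = Array()
        Exit Function
    End If

    ' Collapse runs of spaces so Split does not produce empty "words"
    Do While InStr(src, "  ") > 0
        src = Replace(src, "  ", " ")
    Loop
    words = Split(Trim$(src), " ")

    Set seen = CreateObject("Scripting.Dictionary")
    Set repeats = CreateObject("Scripting.Dictionary")
    seen.CompareMode = vbTextCompare
    repeats.CompareMode = vbTextCompare

    For i = 0 To UBound(words) - chainLen + 1
        chain = words(i)
        For j = 1 To chainLen - 1
            chain = chain & " " & words(i + j)
        Next j
        If seen.Exists(chain) Then
            If Not repeats.Exists(chain) Then repeats.Add chain, i  ' item = word index of first repeat
        Else
            seen.Add chain, i                                      ' item = word index of first occurrence
        End If
    Next i

    RepeatedChains = repeats.Keys
End Function
```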
I chose a rather lame approach, but it seems to work (at least if I understood the question correctly; sometimes I'm slow on the uptake).
I load the entire text into a string, split the individual words into an array, then loop through the array and build a search string containing three consecutive words each time.
Because any repeated group of 4 or more words also contains repeated 3-word groups, longer repeats are recognized automatically.
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
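Selection.HomeKey wdStory 'start every search at the top of the document
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindStop 'wdFindContinue would wrap around and make the loop below endless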
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
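Exit Sub 'normal end; do not fall through into the error handler
errHandler:
Stop 'pause so the error can be inspected in the debugger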
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.
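One detail in fRemove_Punctuation: it walks the punctuation list only once, so a search string such as "(end.)" loses the brackets but keeps the ".", because "." is checked before the ")" has been removed. A hypothetical variant (TrimPunctuation is not part of the answer) that keeps stripping from both ends until neither end is punctuation, using the same character set:

```vba
' Hypothetical variant of fRemove_Punctuation: repeatedly strips punctuation
' from both ends, so "(end.)" becomes "end". Inner punctuation is kept.
Public Function TrimPunctuation(ByVal s As String) As String
    Const PUNCT As String = ".,?!;:()"
    Do While Len(s) > 0
        If InStr(1, PUNCT, Left$(s, 1), vbBinaryCompare) > 0 Then
            s = Mid$(s, 2)                       ' drop a leading punctuation mark
        ElseIf InStr(1, PUNCT, Right$(s, 1), vbBinaryCompare) > 0 Then
            s = Left$(s, Len(s) - 1)             ' drop a trailing punctuation mark
        Else
            Exit Do                              ' both ends are clean
        End If
    Loop
    TrimPunctuation = s
End Function
```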
This might be a little tricky, even with VBA...
I have comma-separated lists in cells, based on start times at 5-minute intervals, and I need to remove times that are only 5 minutes apart.
The numbers are text, not time at this point. For example, one list would be 2210, 2215, 2225, 2230, 2240 (the start times).
In this case, 2215 and 2230 should be removed, but in other cases (the end times) I also need to remove the opposite numbers (i.e., 2210 and 2225).
Someone helped me with my specs:
A cell contains times: t(1), t(2), t(3), ... t(n). Starting at time t(1), each value in the list is examined. If t(x) is less than 6 minutes after t(x-1), delete t(x) and renumber t(x+1) to t(n).
Input:
2210, 2215, 2225, 2230, 2240
Output:
column1: 2210
column2: 2240
This does what I think you require.
Option Explicit
Sub DeleteSelectedTimes()
Dim RowCrnt As Long
RowCrnt = 2
Do While Cells(RowCrnt, 1).Value <> ""
Cells(RowCrnt, 1).Value = ProcessSingleCell(Cells(RowCrnt, 1).Value, 1)
Cells(RowCrnt, 2).Value = ProcessSingleCell(Cells(RowCrnt, 2).Value, -1)
RowCrnt = RowCrnt + 1
Loop
End Sub
Function ProcessSingleCell(ByVal CellValue As String, ByVal StepFactor As Long) As String
Dim CellList() As String
Dim CellListCrntStg As String
Dim CellListCrntNum As Long
Dim InxCrnt As Long
Dim InxEnd As Long
Dim InxStart As Long
Dim TimeCrnt As Long ' Time in minutes
Dim TimeLast As Long ' Time in minutes
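CellList = Split(CellValue, ",")
If UBound(CellList) < 0 Then
' Empty cell: nothing to process
ProcessSingleCell = CellValue
Exit Function
End If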
If StepFactor = 1 Then
InxStart = LBound(CellList)
InxEnd = UBound(CellList)
Else
InxStart = UBound(CellList)
InxEnd = LBound(CellList)
End If
CellListCrntStg = Trim(CellList(InxStart))
If (Not IsNumeric(CellListCrntStg)) Or InStr(CellListCrntStg, ".") <> 0 Then
' Either this sub-value is not numeric or it contains a decimal point
' Either way it cannot be a time.
ProcessSingleCell = CellValue
Exit Function
End If
CellListCrntNum = Val(CellListCrntStg)
If CellListCrntNum < 0 Or CellListCrntNum > 2359 Then
' This value is not a time formatted as hhmm
ProcessSingleCell = CellValue
Exit Function
End If
TimeLast = 60 * (CellListCrntNum \ 100) + (CellListCrntNum Mod 100)
For InxCrnt = InxStart + StepFactor To InxEnd Step StepFactor
CellListCrntStg = Trim(CellList(InxCrnt))
If (Not IsNumeric(CellListCrntStg)) Or InStr(CellListCrntStg, ".") <> 0 Then
' Either this sub-value is not numeric or it contains a decimal point
' Either way it cannot be a time.
ProcessSingleCell = CellValue
Exit Function
End If
CellListCrntNum = Val(CellListCrntStg)
If CellListCrntNum < 0 Or CellListCrntNum > 2359 Then
' This value is not a time formatted as hhmm
ProcessSingleCell = CellValue
Exit Function
End If
TimeCrnt = 60 * (CellListCrntNum \ 100) + (CellListCrntNum Mod 100)
If Abs(TimeCrnt - TimeLast) < 6 Then
' Delete unwanted time from list
CellList(InxCrnt) = ""
Else
' Current time becomes Last time for next loop
TimeLast = TimeCrnt
End If
Next
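CellValue = Join(CellList, ",")
' Collapse the empty entries left by deleted times
Do While InStr(CellValue, ",,") <> 0
CellValue = Replace(CellValue, ",,", ",")
Loop
' Remove a leading or trailing comma left by a deleted first or last entry
If Left(CellValue, 1) = "," Then CellValue = Mid(CellValue, 2)
If Right(CellValue, 1) = "," Then CellValue = Left(CellValue, Len(CellValue) - 1)
ProcessSingleCell = Trim(CellValue)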
End Function
Explanation
Sorry for the lack of instructions in the first version. I assumed this question was more about the technique for manipulating the data than about VBA.
DeleteSelectedTimes operates on the active worksheet. It would be easy to change to work on a specific worksheet or a range of worksheets if that is what you require.
DeleteSelectedTimes ignores the first row which I assume contains column headings. Certainly my test worksheet has headings in row 1. It then processes columns A and B of every row until it reaches a row with an empty column A.
ProcessSingleCell has two parameters: a string and a direction. DeleteSelectedTimes uses the direction so values in column A are processed left to right while values in column B are processed right to left.
I assume the #Value error occurred because ProcessSingleCell did not check that the string has the format "number,number,number". I have changed ProcessSingleCell so that, if the string is not in this format, it returns the string unchanged.
I have no clear idea of what you do or do not know so come back with more questions as necessary.
Still not clear on your exact requirements, but this might help get you started....
Sub Tester()
Dim arr
Dim out As String, x As Integer, c As Range
Dim n1 As Long, n2 As Long
For Each c In ActiveSheet.Range("A1:A10")
If InStr(c.Value, ",") > 0 Then
arr = Split(c.Value, ",")
x = LBound(arr)
out = ""
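Do
n1 = CLng(Trim(arr(x)))
n2 = CLng(Trim(arr(x + 1)))
'here's where your requirements get unclear...
'always keep n1; n2 is either skipped or examined on the next pass
out = out & IIf(Len(out) > 0, ", ", "") & Trim(arr(x))
'compare in minutes (hhmm -> 60*hh + mm) so 2255 -> 2300 counts as 5 apart
If (60 * (n2 \ 100) + (n2 Mod 100)) - (60 * (n1 \ 100) + (n1 Mod 100)) <= 5 Then
x = x + 2 'skip second number
Else
x = x + 1 'continue from the second number
End If
Loop While x <= UBound(arr) - 1
'pick up any last number
If x = UBound(arr) Then
out = out & IIf(Len(out) > 0, ", ", "") & Trim(arr(x))
End If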
c.Offset(0, 1).Value = out
End If
Next c
End Sub
Obviously many ways to skin this cat ... I like to use collections for this sort of thing:
Private Sub PareDownList()
Dim sList As String: sList = ActiveCell ' take list from active cell
Dim vList As Variant: vList = Split(sList, ",") ' convert to variant array
' load from var array into collection
Dim cList As New Collection
Dim i As Long
For i = 0 To UBound(vList): cList.Add (Trim(vList(i))): Next
' loop over collection removing unwanted entries
' (in reverse order, since we're removing items)
For i = cList.Count To 2 Step -1
If cList(i) - cList(i - 1) = 5 Then cList.Remove (i)
Next i
' loop to put remaining items back into a string fld
sList = cList(1)
For i = 2 To cList.Count
sList = sList + "," + cList(i)
Next i
' write the new string to the cell under the activecell
ActiveCell.Offset(1) = "'" + sList ' lead quote to ensure output cell = str type
End Sub
' If activecell contains: "2210, 2215, 2225, 2230, 2240"
' the cell below will get: "2210,2225,2240"
Note: this sample code should be enhanced with some extra validation and checking (as written, it assumes well-formed integer values separated by commas and relies on implicit string-to-integer conversions). Also, as written it converts "2210, 2215, 2220, 2225, 2230, 2240" into "2210,2240"; you'll need to tweak the loop and its counter when removing an item if that's not what you want.
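If the goal is the rule stated in the question (compare each time with the last time that was kept, not with its immediate neighbour), the loop can track the last kept value instead of removing items in place. A hypothetical sketch, assuming every entry is a valid hhmm value and that the list does not cross midnight. Times are converted to minutes so that 2255 and 2300 count as 5 minutes apart, and fromRight = True walks the list right to left to drop the earlier time of each close pair, as the question asks for the end times. PruneTimes and HhmmToMinutes are illustrative names.

```vba
' Hypothetical sketch of the rule in the question: drop a time when it is less
' than 6 minutes from the last time that was kept. Assumes valid hhmm entries
' and no midnight crossing. fromRight = True processes right to left.
Public Function PruneTimes(ByVal csv As String, Optional ByVal fromRight As Boolean = False) As String
    Dim parts() As String, kept As Collection, entry As Variant
    Dim i As Long, first As Long, last As Long, stp As Long
    Dim t As Long, lastKept As Long, result As String

    parts = Split(csv, ",")
    If UBound(parts) < 0 Then Exit Function       ' empty input -> ""

    If fromRight Then
        first = UBound(parts): last = 0: stp = -1
    Else
        first = 0: last = UBound(parts): stp = 1
    End If

    Set kept = New Collection
    lastKept = -100                               ' guarantees the first time is kept
    For i = first To last Step stp
        t = HhmmToMinutes(Trim$(parts(i)))
        If Abs(t - lastKept) >= 6 Then
            ' keep the output in the original left-to-right order
            If fromRight And kept.Count > 0 Then
                kept.Add Trim$(parts(i)), , 1     ' insert before the first item
            Else
                kept.Add Trim$(parts(i))
            End If
            lastKept = t
        End If
    Next i

    For Each entry In kept
        If Len(result) > 0 Then result = result & ", "
        result = result & entry
    Next entry
    PruneTimes = result
End Function

' "2240" -> 22 * 60 + 40 = 1360 minutes after midnight
Private Function HhmmToMinutes(ByVal hhmm As String) As Long
    Dim v As Long
    v = CLng(hhmm)
    HhmmToMinutes = 60 * (v \ 100) + (v Mod 100)
End Function
```

With this rule, "2210, 2215, 2220, 2225, 2230, 2240" becomes "2210, 2220, 2230, 2240" rather than "2210,2240", because 2220 is compared with the kept 2210 rather than with the removed 2215.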