Easy generic way to print a Dictionary to an Excel worksheet using VBA

I'm working with really intricate data, so I wrote this really nice function to print it to the debug area, i.e. the Immediate window, which you can open with Ctrl+G in the VBA editor inside Excel. I need a similar function to print this generic data (which contains numbers, strings, dictionaries and arrays) to a worksheet.
'call using: Call PrintDict(data)
' Where data can be a number, a string, a dictionary or an Array,
' with any of these inside.
Sub PrintDict(ByVal dicttoprint As Variant, Optional indent As Integer = 0, Optional wasdict As Boolean = False)
Dim i As Long
Dim j As Long
Dim indentStr As String
indentStr = Space$(indent) 'one space per indent level
Dim key
If (TypeName(dicttoprint) = "Dictionary") Then
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For Each key In dicttoprint.Keys
Debug.Print indentStr & key & " ";
Call PrintDict(dicttoprint.Item(key), indent + 2, True)
Next
ElseIf IsArray(dicttoprint) Then 'any 1-D array, not only Variant()
If (wasdict = True) Then
Debug.Print vbNewLine;
End If
For j = LBound(dicttoprint) To UBound(dicttoprint)
Call PrintDict(dicttoprint(j), indent + 2)
Next j
Else
Debug.Print indentStr & dicttoprint & " "
End If
End Sub
Edit1:
OK, I've been thinking about it. I have an idea, but I can't solve some corner cases...
Example expected output (one worksheet row per line, one cell per column):
key1:     |           |           |           |
          | key1.1:   | numvalue  |           |
          |           | numvalue  |           |
          |           | arr1Indx1 | numvalue  |
          |           | arr1Indx2 | numvalue  |
          |           | arr1Indx3 | numvalue  |
          | key1.2:   | numvalue  |           |
          |           | numvalue  |           |
key2:     | numvalue  |           |           |
key3:     |           |           |           |
          | arr2Indx1 | keyA.1:   | numvalue  |
          |           | keyA.2:   | strvalue  |
          | arr2Indx2 | numvalue  |           |
OK, I think this output now handles the corner cases. Any ideas on how to implement it?
I'm thinking of giving the function optional X, Y parameters and having it return the last Y. When printing text, the cursor naturally moves down; I don't know how to reproduce that through recursion on a worksheet.
Edit 2:
OK, here is a pseudocode idea. It is almost VBA, but I don't know how to make it work...
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Integer = 0, _
Optional coli As Integer = 0) As Integer
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys:
Cells(rowi, coli).Value = key
coli = coli + PrintToWS(data.Item(key), rowi+1, coli)
Next
ElseIf (TypeName(data) = "Variant()") Then
For j = LBound(data) To UBound(data)
coli = coli + PrintToWS(data(j), rowi+1, coli)
Next j
Else
Cells(rowi, coli).Value = data
coli = coli + 1
End If
PrintToWS = coli
End Function
Edit2:
Added it in a gist here

Solved. Code is below:
'Usage: PrintToWS yourdata
'Writes to the active sheet starting at A1 and returns the first unused row.
'The optional parameters are used internally by the recursion; leave them blank.
Function PrintToWS(ByVal data As Variant, _
Optional rowi As Long = 1, _
Optional coli As Long = 1, _
Optional wasdict As Long = 0) As Long
'Long rather than Integer, so rows past 32,767 don't overflow
Dim key
Dim j As Long
If (TypeName(data) = "Dictionary") Then
For Each key In data.Keys
Cells(rowi + wasdict, coli).Value = key
rowi = PrintToWS(data.Item(key), rowi + wasdict, coli + 1, 1)
wasdict = 0
Next
ElseIf IsArray(data) Then 'any 1-D array, not only Variant()
For j = LBound(data) To UBound(data)
rowi = PrintToWS(data(j), rowi, coli + 1)
Next j
Else
Cells(rowi, coli).Value = data
rowi = rowi + 1
End If
PrintToWS = rowi
End Function
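A usage sketch for PrintToWS, assuming the Microsoft Scripting Runtime is available (the dictionaries are created late bound with CreateObject, so no reference is needed). DemoPrintToWS and the sample keys are hypothetical and only loosely follow the layout from Edit 1. Because PrintToWS writes through the unqualified Cells, it targets whichever sheet is active, which is why the demo adds a fresh sheet first.

```vba
' Hypothetical demo: build nested data and write it with PrintToWS.
' Assumes PrintToWS from above is in the same VBA project.
Sub DemoPrintToWS()
    Dim data As Object, inner As Object
    Dim nextRow As Long

    ' Inner dictionary: one key holding an array, one holding a number
    Set inner = CreateObject("Scripting.Dictionary")
    inner.Add "key1.1:", Array(1, 2)
    inner.Add "key1.2:", 3

    ' Outer dictionary mixing a nested dictionary, a number and an array
    Set data = CreateObject("Scripting.Dictionary")
    data.Add "key1:", inner
    data.Add "key2:", 4
    data.Add "key3:", Array("a", "b")

    ' Worksheets.Add activates the new sheet, which PrintToWS then fills
    Worksheets.Add
    nextRow = PrintToWS(data)
    Debug.Print "First unused row: " & nextRow
End Sub
```

Note that array elements land one column to the right of where a scalar value would go, which leaves the column free for an index label like the arr1Indx1 cells in the expected output.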

Related

How to convert array from (x,y)(z) dimensions into (x,y) dimensions?

I am working with Bloomberg's API in VBA, and I want to take the arrays the API returns when I request historical data and put them into a table that has field names. However, the API gives me the array in this format: (x,y)(Z), which I cannot use for inserting into a table. I also want to be able to add another piece of data to the array while I convert it from one form to the other.
I have tried going through the Bloomberg array and copying each element into a different array, but the main issues are that I don't know how big the new array needs to be, or how to loop through the Bloomberg array without going out of bounds and getting an error. I have tried using UBound, but it does not work the way I intended.
This is the code I have tried for converting my array and then inserting it. It just inserts blank values and doesn't put anything useful into the table:
Sub mWriteToTable(vTableName As String, ByVal vArray As Variant, vCUSIPS As Variant, vFields As Variant)
On Error GoTo ErrorHandler
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim x As Long, y As Long
Dim TEST As String
Dim DataArray() As Variant
Set db = CurrentDb
Set rs = db.OpenRecordset(vTableName, dbOpenDynaset, dbSeeChanges)
TEST = ""
Dim xBound As Integer, yBound As Integer, ThirdBound As Integer, fieldcount As Integer, NewBoundY As Integer, Z As Integer
Dim Boundarynum As Integer
Boundarynum = 0
Dim Boundarynum1 As Integer
Boundarynum1 = 0
fieldcount = UBound(vFields, 1) + 1
xBound = UBound(vArray, 1)
yBound = UBound(vArray, 2)
NewBoundY = fieldcount * (fieldcount + 1)
ReDim DataArray(0 To 20, 0 To (xBound + 1))
'using a static size for the array for now. Will try and make it the same size as the bloomberg array
'TRANSFORMING ARRAY FROM BLOOMBERG
For x = 0 To xBound
For y = 0 To NewBoundY
For Boundarynum1 = 0 To yBound
On Error Resume Next
DataArray(Boundarynum, Boundarynum1) = vArray(x, y)(Boundarynum1)
Next
Boundarynum = Boundarynum + 1
Next
Next
'TRANSFORMING ARRAY FROM BLOOMBERG
'set CUSIP in array
y = 0
Dim counter As Integer
counter = 0
For Z = 0 To 20
If DataArray(Z, 0) = "" Then
Debug.Print ("")
counter = 1
ElseIf counter = 1 And DataArray(Z, 0) <> "" Then
y = y + 1
DataArray(Z, 3) = vCUSIPS(y)
counter = 0
Else
DataArray(Z, 3) = vCUSIPS(y)
End If
Next
'set CUSIP in array
For x = 0 To 20
With rs
.AddNew
For y = 0 To yBound
' On Error GoTo Line1
' If vArray(x, y) = "NA" Then
' TEST = "This is a test"
' End If
'Line1:
.fields(y) = DataArray(x, y)
Next
.Update
End With
Next
'Call fImmediateWindow(vArray)
ErrorHandler:
If Err.Number <> 0 Then
Dim vMsg As String
vMsg = "Error # " & Str(Err.Number) & " was generated by " & Err.Source & Chr(13) & "Error Line: " & Erl & Chr(13) & Err.Description
MsgBox vMsg, , "Error", Err.HelpFile, Err.HelpContext
End If
rs.Close
Set rs = Nothing
db.Close
Set db = Nothing
End Sub
This is the way the Bloomberg array looks when I get it. I'm not sure how to work around this; the array from the program above just comes out blank.
Each element of the Bloomberg array returns two sets of data, so the key is to give your array double the number of rows of the top-level Bloomberg array.
Sub ConvertBloombergTestData()
Dim r As Variant
r = getBloombergTestData
Dim Values As Variant
Dim n As Long
Dim j As Long
Dim Item
ReDim Values(1 To (UBound(r) + 1) * 2, 1 To 2)
For n = LBound(r) To UBound(r)
j = j + 1
Item = r(n, 0)
Values(j, 1) = Item(0)
Values(j, 2) = Item(1)
Item = r(n, 1)
j = j + 1
Values(j, 1) = Item(0)
Values(j, 2) = Item(1)
Next
End Sub
If we don't know how the arrays are nested but do know that the data comes in pairs, we can add all the values to a collection and build our array by iterating over the collection.
Sub Test()
Dim r As Variant, Values As Variant
r = getBloombergTestData
Values = ConvertBloombergArrayTo2d(r)
End Sub
Function ConvertBloombergArrayTo2d(BloombergArray)
Dim Map As New Collection
FlattenArray Map, BloombergArray
Dim Results As Variant
ReDim Results(1 To Map.Count / 2, 1 To 2)
Dim n As Long, j As Long
For n = 1 To Map.Count Step 2
j = j + 1
Results(j, 1) = Map.Item(n)
Results(j, 2) = Map.Item(n + 1)
Next
ConvertBloombergArrayTo2d = Results
End Function
Sub FlattenArray(Map As Collection, Element As Variant)
If Right(TypeName(Element), 2) = "()" Then
Dim Item
For Each Item In Element
FlattenArray Map, Item
Next
Else
Map.Add Element
End If
End Sub
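Neither answer shows getBloombergTestData. A hypothetical stand-in, assuming the shape the first answer relies on (a 2-D array whose every cell holds a two-element (date, value) array), makes both snippets runnable for testing. The dates and prices are made-up sample values, not real Bloomberg output.

```vba
' Hypothetical stand-in for the question's getBloombergTestData:
' r(x, y) holds a pair Array(date, value), i.e. the (x,y)(z) shape.
Function getBloombergTestData() As Variant
    Dim r(0 To 1, 0 To 1) As Variant
    r(0, 0) = Array(#1/2/2018#, 101.5)
    r(0, 1) = Array(#1/3/2018#, 102.25)
    r(1, 0) = Array(#1/2/2018#, 99.75)
    r(1, 1) = Array(#1/3/2018#, 98.5)
    getBloombergTestData = r
End Function
```

With this data, ConvertBloombergArrayTo2d returns a 4 x 2 array: one row per (date, value) pair, since four cells each contribute one pair.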

Generate list from strings and numbers in VBA

This question is based on this puzzle that I am trying to do in vba: https://codegolf.stackexchange.com/questions/166765/fun-with-strings-and-numbers
Basically, we have strings in column A and numbers in column B, and in column C we have to generate a list such that:
1) The total count of any string should be exactly equal to its corresponding number in the input data.
2) No string should be repeated adjacently in the sequence, and every string should appear in the output list.
3) The selection of the next string should be done randomly as long as it doesn't break the two rules above. Each solution should have a non-zero probability of being chosen.
4) If no combination is possible, the output should be just 0.
I tried the following, but I don't know how to solve the problem so that it doesn't break rule #2. Any input would be appreciated, thanks.
Sub generateList()
Application.ScreenUpdating = False
Dim fI As Long, totTimes As Long, i As Long, j As Long, fO As Long, tryCount As Long
Dim myArr()
Dim randNum As Long
OUT.Range("A1:A" & OUT.Rows.Count).Clear
fO = 1
With DATA
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
.Sort.SortFields.Clear
.Sort.SortFields.Add Key:=.Range("B2:B" & fI), SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With DATA.Sort
.SetRange DATA.Range("A1:B" & fI)
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
fI = .Range("A" & .Rows.Count).End(xlUp).Row
If fI < 2 Then MsgBox "No data!": Exit Sub
totTimes = 0: j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then j = j + 1
Next i
If j < 1 Then MsgBox "No valid data present. Make sure column B has numbers and column A some string.": Exit Sub
ReDim Preserve myArr(1 To j, 1 To 2)
j = 0
For i = 2 To fI
If Trim(.Range("A" & i).Value) <> "" And IsNumeric(.Range("B" & i).Value) Then
totTimes = totTimes + CLng(.Range("B" & i).Value)
j = j + 1
myArr(j, 1) = .Range("A" & i)
myArr(j, 2) = .Range("B" & i)
End If
Next i
Do While totTimes > 0
randNum = WorksheetFunction.RandBetween(1, j)
If myArr(randNum, 2) > 0 Then
totTimes = totTimes - 1
OUT.Range("A" & fO) = myArr(randNum, 1)
myArr(randNum, 2) = myArr(randNum, 2) - 1
fO = fO + 1
End If
tryAgain:
Loop
End With
Application.ScreenUpdating = True
OUT.Activate
MsgBox "Process Completed"
End Sub
I have a solution (that isn't based on yours, unfortunately) that gives correct results... some of the time. I think I know why it falls short; I just gave up on fixing it.
It's also terrible for golfing, since it's a rather large amount of code, and it's an unholy mishmash of different approaches and implementation ideas that I made up as I went (and I never cleaned it up properly)... but maybe some of this will inspire you to get further.
As per rule #3, I select each letter at random. That alone was hit and miss, so I moved to weighted probabilities, which is what the code further down uses, and it seems to work reasonably well. Occasionally there will be one letter too many for one of the elements, or there will be adjacent equal elements, so it doesn't actually solve the puzzle every time.
Ideas to remedy this problem:
1) Adjust the probability weights based on how often each letter has already been used. If you set dbg to True, you'll see that I implemented some calculations with that in mind, but I never got around to figuring out how to actually adjust the weights themselves.
2) Hardcode a check or two on how many letters from the largest element group have been used early in the result.
3) Change the rand section to make more than one pass (maybe best out of 3). The weights are sorted by "size", so doing 3 (or n) passes should increasingly favor the larger element groups.
4) Maybe a combination of the first and the last suggestion.
Here's the code:
Sub NonRepeatSort(v() As String)
Dim lElementCount As Long
Dim lElement As Element ' Largest
Dim tElement As Long ' Total element count
Dim tEleGroups As Long ' Number of groups of elements
Dim tEle As Element
Dim e As Element
Dim EleCol As New Collection
Dim dbg As Boolean
dbg = False
Dim s As String, res As String, previousRes As String, inputString As String
Dim lCounter As Long
For i = 1 To UBound(v)
' Check if element already exists (Set, because the collection holds objects)
Set tEle = Nothing
On Error Resume Next
Set tEle = EleCol.Item(v(i, 1))
On Error GoTo 0
' If not, create new
If tEle Is Nothing Then
Set tEle = New Element
With tEle
.SetName = v(i, 1)
.SetTotal = CLng(v(i, 2))
End With
EleCol.Add Item:=tEle, Key:=tEle.Name
End If
Next i
For Each e In EleCol
' Find the largest element
If e.Total > lElementCount Then
lElementCount = e.Total
Set lElement = e
End If
' Count total elements
tElement = tElement + e.Total
' And groups
tEleGroups = tEleGroups + 1
' Generate inputstring
For k = 1 To e.Total
inputString = inputString + e.Name
Next k
Next e
' If the largest group outnumbers all the others combined by more than 1, adjacent repeats are unavoidable (rule 2), so output 0 (rule 4)
If lElement.Total - (tElement - lElement.Total) > 1 Then
Debug.Print "0"
GoTo EndForSomeReason
End If
' Bubble sort - lowest to highest
' Adapted from https://stackoverflow.com/a/3588073/4604845
Dim tmpE As Element
For x = 1 To EleCol.Count - 1
For y = 1 To EleCol.Count
If EleCol.Item(x).Total > EleCol.Item(y).Total Then
Set tmpE = EleCol.Item(y)
EleCol.Remove y
EleCol.Add tmpE, tmpE.Name, x
End If
Next y
Next x
' Weighted probability array
Dim pArr() As Variant, tmpProb As Double
ReDim Preserve pArr(1 To 2, 1 To EleCol.Count)
For u = 1 To UBound(pArr, 2)
Set pArr(2, u) = EleCol.Item(u)
tmpProb = tmpProb + pArr(2, u).Freq(tElement)
pArr(1, u) = tmpProb
Next u
' The meat of it
Dim r As Long, lBool As Boolean, sLen As Long, o As Double, t As Long
For j = 1 To tElement
Do
' Reset loop control
lBool = False
' Generate a random number between 1 and 100 _
to decide which group we pick a letter from
r = Rand1To100
For i = 1 To UBound(pArr, 2)
If r <= pArr(1, i) And Not r > pArr(1, i) Then
If dbg Then Debug.Print "Probability match: " & pArr(2, i).Name
t = i
Exit For
End If
Next i
Set tEle = EleCol.Item(t)
If dbg Then Debug.Print "Name: " & tEle.Name
' If the random group is different from the previous result, proceed
If tEle.Name <> previousRes Then
lBool = True
Else
If dbg Then Debug.Print "This was also the previous result - skipping"
End If
' If the use-frequency for the random group is lower than _
how many times it appears in the string, proceed
If lBool Then
o = Round((tEle.Used / tElement) * 100, 5)
If dbg Then Debug.Print "Freq: " & tEle.Freq(tElement)
If dbg Then Debug.Print "Used: " & tEle.UsedFreqI()
If dbg Then Debug.Print "res%: " & Round((Len(res) / tElement) * 100, 1)
If dbg Then Debug.Print "o : " & o
' check use-frequency against modeled frequency
If o < tEle.Freq(tElement) Then
If dbg Then Debug.Print "Proceed with " & tEle.Name
lBool = True
Else
lBool = False
End If
End If
If dbg Then Debug.Print "----------"
lCounter = lCounter + 1
Loop While (Not lBool And lCounter < 1000)
tEle.IncrementUsed
res = res + tEle.Name
previousRes = tEle.Name
Next j
' Generate results
Debug.Print "INPUT : " & inputString
Debug.Print "RESULT: " & res
EndForSomeReason:
End Sub
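Function Rand1To100() As Long
Static seeded As Boolean
'Seed the generator once rather than on every call
If Not seeded Then Randomize: seeded = True
'Int(100 * Rnd) is uniform over 0..99; rounding 99 * Rnd + 1 gave 1 and 100 half the weight
Rand1To100 = Int(100 * Rnd) + 1
End Function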
Private Sub TestSort()
Dim v(1 To 4, 1 To 2) As String
v(1, 1) = "A"
v(1, 2) = "6"
v(2, 1) = "B"
v(2, 2) = "2"
v(3, 1) = "C"
v(3, 2) = "2"
v(4, 1) = "D"
v(4, 2) = "4"
Call NonRepeatSort(v)
End Sub
And you'll need this class module:
' * Class module named Element
Private pName As String
Private pTotal As Long
Private pUsed As Long
Private FrequencyCoefficient As Long ' Obsolete?
' Name
Public Property Get Name() As String
Name = pName
End Property
Public Property Let SetName(s As String)
pName = s
End Property
' Total
Public Property Get Total() As Long
Total = pTotal
End Property
Public Property Let SetTotal(t As Long)
pTotal = t
End Property
' Used
Public Property Get Used() As Long
Used = pUsed
End Property
Public Sub IncrementUsed()
pUsed = pUsed + 1
End Sub
' Freq coefficient
Public Property Get Freq(f As Long) As Double
' Where f is the total number of elements
'Freq = FrequencyCoefficient
Freq = Round((Me.Total / f) * 100, 5)
End Property
Private Property Let SetFreq(f As Long)
' Obsolete?
' Where f is the total number of elements
FrequencyCoefficient = Round((Me.Total / f) * 100)
End Property
' Used freq - internal
Public Property Get UsedFreqI() As Long
If Me.Used > 0 Then
UsedFreqI = Round((Me.Used / Me.Total) * 100)
'Debug.Print "UF: " & UsedFreqI
Else
UsedFreqI = 0
End If
End Property
' Used freq - external
Public Property Get UsedFreqE(f As Long) As Long
If Me.Used > 0 Then
UsedFreqE = Round((Me.Used / f) * 100)
Else
UsedFreqE = 0
End If
End Property
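A different way to guarantee rule #2, not taken from the answer above: at each step, pick at random only among the strings that differ from the previous one and still leave a solvable remainder. The remainder is solvable when no string's count exceeds the count of all the others plus one (the same test as the rule-4 check in NonRepeatSort), and a string whose count is exactly the others plus one must come first, so it cannot be the one just used. Every valid sequence only ever makes such safe picks, so each one keeps a non-zero probability (rule #3). RandomNoRepeat and Feasible are hypothetical names, and the inputs are assumed to be two parallel 0-based arrays of labels and counts rather than the sheet ranges.

```vba
' Sketch: random sequence with no adjacent repeats, or 0 if impossible (rule 4).
' labels/counts: parallel 0-based arrays, e.g. Array("A", "B"), Array(2, 1).
Function RandomNoRepeat(labels As Variant, counts As Variant) As Variant
    Dim n As Long, i As Long, k As Long, total As Long, prev As Long, nSafe As Long
    Dim c() As Long, safe() As Long, result() As String

    n = UBound(labels) - LBound(labels) + 1
    ReDim c(1 To n): ReDim safe(1 To n)
    For i = 1 To n
        c(i) = CLng(counts(LBound(counts) + i - 1))
        total = total + c(i)
    Next i
    If total = 0 Then RandomNoRepeat = 0: Exit Function
    If Not Feasible(c, total, 0) Then RandomNoRepeat = 0: Exit Function

    ReDim result(1 To total)
    Randomize
    For k = 1 To total
        ' Collect every group that can go next without making the rest unsolvable
        nSafe = 0
        For i = 1 To n
            If c(i) > 0 And i <> prev Then
                c(i) = c(i) - 1
                If Feasible(c, total - k, i) Then nSafe = nSafe + 1: safe(nSafe) = i
                c(i) = c(i) + 1
            End If
        Next i
        ' Pick uniformly among the safe groups (there is always at least one)
        prev = safe(Int(nSafe * Rnd) + 1)
        c(prev) = c(prev) - 1
        result(k) = CStr(labels(LBound(labels) + prev - 1))
    Next k
    RandomNoRepeat = result
End Function

' True if counts c() (summing to remaining) can be ordered with no adjacent
' repeats and without starting with group firstBan (0 = no restriction).
Private Function Feasible(c() As Long, ByVal remaining As Long, ByVal firstBan As Long) As Boolean
    Dim i As Long
    For i = LBound(c) To UBound(c)
        ' More than "all the others + 1" can never be separated
        If 2 * c(i) > remaining + 1 Then Exit Function
        ' Exactly "all the others + 1" must come first, so it can't be the banned group
        If i = firstBan And remaining > 0 And 2 * c(i) = remaining + 1 Then Exit Function
    Next i
    Feasible = True
End Function
```

The returned array is 1-based; writing it to column C is a simple loop that fills one cell per element.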

Get the value between the parentheses, multiple matches in one string

My spreadsheet has a column with values like this string:
some text (text1) some test (text2) (text1)
How do I get all values between parentheses? The result I am looking for is:
text1, text2
Even if text1, text2... textn is present in the cell multiple times, I need it in the result only once.
I found a function GetParen here: Get the value between the brackets
It is helpful, but it gives only the first value in parentheses and ignores the rest.
It seems unwieldy to have one User Defined Function for individual entries and another for a collective result of all entries.
Paste the following into a standard module code sheet.
Function getBracketedText(str As String, _
Optional pos As Integer = 0, _
Optional delim As String = ", ", _
Optional dupes As Boolean = False)
Dim tmp As String, txt As String, a As Long, b As Long, p As Long, arr() As Variant
tmp = str
ReDim arr(1 To 1)
For b = 1 To (Len(tmp) - Len(Replace(tmp, Chr(40), vbNullString)))
p = InStr(p + 1, tmp, Chr(40))
txt = Trim(Mid(tmp, p + 1, InStr(p + 1, tmp, Chr(41)) - (p + 1)))
If UBound(Filter(arr, txt, True)) < 0 Or dupes Then '<~~ check for duplicates within the array
a = a + 1
ReDim Preserve arr(1 To a)
arr(UBound(arr)) = txt
End If
Next b
If CBool(pos) Then
getBracketedText = arr(pos)
Else
getBracketedText = Join(arr, delim)
End If
End Function
Use like any other native worksheet function. There are optional parameters to retrieve an individual element or a collection as well as changing the default <comma><space> delimiter.
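One caveat: Filter does substring matching, so once text10 is in the array, a later text1 is treated as a duplicate and dropped. A sketch of an exact-match variant that swaps the Filter check for a Scripting.Dictionary (created late bound with CreateObject, so it assumes Windows with the Scripting Runtime). getBracketedTextExact is a hypothetical name, and it drops the pos and dupes options for brevity.

```vba
' Hypothetical exact-match variant of getBracketedText: returns each distinct
' bracketed value once, in order of first appearance, joined by delim.
Function getBracketedTextExact(s As String, Optional delim As String = ", ") As String
    Dim seen As Object, p As Long, q As Long, txt As String
    Set seen = CreateObject("Scripting.Dictionary")   ' keys keep insertion order
    p = InStr(1, s, "(")
    Do While p > 0
        q = InStr(p + 1, s, ")")
        If q = 0 Then Exit Do                          ' unmatched "(": stop scanning
        txt = Trim$(Mid$(s, p + 1, q - p - 1))
        If Not seen.Exists(txt) Then seen.Add txt, Empty
        p = InStr(q + 1, s, "(")
    Loop
    If seen.Count > 0 Then getBracketedTextExact = Join(seen.Keys, delim)
End Function
```

On the worksheet it is used the same way, e.g. =getBracketedTextExact(A1).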
    
This code works for me:
Sub takingTheText()
Dim iniP 'first parenthesis
Dim endP 'first closing parenthesis
Dim myText 'the text
Dim txtLen
Dim i
Dim tmp
Dim j
myText = Range("A1").Value
txtLen = Len(myText)
j = 0
Do 'Loop in the text
i = i + 1 'a counter
iniP = InStr(1, myText, "(", 1) 'find the first occurrence of (
endP = InStr(1, myText, ")", 1) 'same for )
tmp = tmp & Right(Left(myText, i), 1) 'collect the unwanted text before the (
If i = iniP Then 'here comes the work
j = j + 1 'the cell index
myText = Replace(myText, tmp, "") 'remove the unwanted text in front of the first (
tmp = Left(myText, endP - iniP - 1) 'reuse the var to store the useful text
Cells(1, 2).Value = Cells(1, 2).Value & Chr(10) & tmp 'store in cell B1
'If you want to store the values in separate cells, use the code below instead
'Cells(j, 2).Value = tmp
myText = Replace(myText, tmp & ")", "", 1, 1) 'remove the extracted text from the main text
tmp = Empty 'empty the var
i = 0 'reset the main counter
End If
Loop While endP <> 0
End Sub
Result:
Please check it and tell me if it's OK.
Edit #1
Cells(1, 2).Value = Cells(1, 2).Value & Chr(10) & tmp stores the results on separate lines inside the same cell, because of Chr(10) (you can also use Chr(13)). If you'd rather have spaces between the results, use Cells(1, 2).Value = Cells(1, 2).Value & " " & tmp, or put any other character between the & symbols instead.

Highlight (not delete) repeat sentences or phrases

I'm getting the impression that this isn't possible in Word, but I figure that if you look for any 3-4 words that appear in the same sequence anywhere in a very long paper, you could find duplicates of the same phrases.
I copied and pasted a lot of documentation from past papers and was hoping to find a simple way to find any repeated information in this 40+ page document. There is a lot of different formatting, but I would be willing to temporarily get rid of it in order to find repeated information.
To highlight all duplicate sentences, you can also use ActiveDocument.Sentences(i). Here is an example
LOGIC
1) Get all the sentences from the word document in an array
2) Sort the array
3) Extract Duplicates
4) Highlight duplicates
CODE
Option Explicit
Sub Sample()
Dim MyArray() As String
Dim n As Long, i As Long
Dim Col As New Collection
Dim itm
n = 0
'~~> Get all the sentences from the word document in an array
For i = 1 To ActiveDocument.Sentences.Count
n = n + 1
ReDim Preserve MyArray(n)
MyArray(n) = Trim(ActiveDocument.Sentences(i).Text)
Next
'~~> Sort the array
SortArray MyArray, 0, UBound(MyArray)
'~~> Extract Duplicates
For i = 1 To UBound(MyArray)
If i = UBound(MyArray) Then Exit For
If InStr(1, MyArray(i + 1), MyArray(i), vbTextCompare) Then
On Error Resume Next
Col.Add MyArray(i), """" & MyArray(i) & """"
On Error GoTo 0
End If
Next i
'~~> Highlight duplicates
For Each itm In Col
Selection.Find.ClearFormatting
Selection.HomeKey wdStory, wdMove
Selection.Find.Execute itm
Do Until Selection.Find.Found = False
Selection.Range.HighlightColorIndex = wdPink
Selection.Find.Execute
Loop
Next
End Sub
'~~> Sort the array
Public Sub SortArray(vArray As Variant, i As Long, j As Long)
Dim tmp As Variant, tmpSwap As Variant
Dim ii As Long, jj As Long
ii = i: jj = j: tmp = vArray((i + j) \ 2)
While (ii <= jj)
While (vArray(ii) < tmp And ii < j)
ii = ii + 1
Wend
While (tmp < vArray(jj) And jj > i)
jj = jj - 1
Wend
If (ii <= jj) Then
tmpSwap = vArray(ii)
vArray(ii) = vArray(jj): vArray(jj) = tmpSwap
ii = ii + 1: jj = jj - 1
End If
Wend
If (i < jj) Then SortArray vArray, i, jj
If (ii < j) Then SortArray vArray, ii, j
End Sub
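Step 3 works because sorting puts equal sentences next to each other, so one pass over neighbours finds every duplicate. Sample tests neighbours with InStr, which also flags a sentence that merely contains the previous one. A minimal sketch of the same pass with exact equality (the default binary comparison, matching the < used by SortArray); AdjacentDuplicates is a hypothetical helper, not part of the answer.

```vba
' Hypothetical helper: given an already sorted 1-D array, returns each value
' that appears more than once, listed a single time.
Function AdjacentDuplicates(sorted As Variant) As Collection
    Dim result As New Collection, i As Long
    For i = LBound(sorted) + 1 To UBound(sorted)
        If sorted(i) = sorted(i - 1) Then
            ' Only add on the first repeat of a run. VBA's And doesn't
            ' short-circuit, so the i - 2 lookup is guarded by a separate branch.
            If i = LBound(sorted) + 1 Then
                result.Add sorted(i)
            ElseIf sorted(i - 1) <> sorted(i - 2) Then
                result.Add sorted(i)
            End If
        End If
    Next i
    Set AdjacentDuplicates = result
End Function
```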
SNAPSHOTS: BEFORE / AFTER
I did not use my own DAWG suggestion, and I am still interested in seeing if someone else has a way to do this, but I was able to come up with this:
Option Explicit
Sub test()
Dim ABC As Scripting.Dictionary
Dim v As Range
Dim n As Integer
n = 5
Set ABC = FindRepeatingWordChains(n, ActiveDocument)
' This is a dictionary of word ranges (not the same as an Excel range) that contains the listing of each word chain/phrase of length n (5 from the above example).
' Loop through this collection to make your selections/highlights/whatever you want to do.
If Not ABC Is Nothing Then
For Each v In ABC
v.Font.Color = wdColorRed
Next v
End If
End Sub
' This is where the real code begins.
Function FindRepeatingWordChains(ChainLenth As Integer, DocToCheck As Document) As Scripting.Dictionary
Dim DictWords As New Scripting.Dictionary, DictMatches As New Scripting.Dictionary
Dim sChain As String
Dim CurWord As Range
Dim MatchCount As Integer
Dim i As Integer
MatchCount = 0
For Each CurWord In DocToCheck.Words
' Make sure there are enough remaining words in our document to handle a chain of the length specified.
If Not CurWord.Next(wdWord, ChainLenth - 1) Is Nothing Then
' Check for non-printing characters in the first/last word of the chain.
' This code will read a vbCr, etc. as a word, which is probably not desired.
' However, this check does not exclude these 'words' inside the chain, but it can be modified.
If CurWord <> vbCr And CurWord <> vbNewLine And CurWord <> vbCrLf And CurWord <> vbLf And CurWord <> vbTab And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCr And CurWord.Next(wdWord, ChainLenth - 1) <> vbNewLine And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbCrLf And CurWord.Next(wdWord, ChainLenth - 1) <> vbLf And _
CurWord.Next(wdWord, ChainLenth - 1) <> vbTab Then
sChain = CurWord
For i = 1 To ChainLenth - 1
' Add each word from the current word through the next ChainLength # of words to a temporary string.
sChain = sChain & " " & CurWord.Next(wdWord, i)
Next i
' If we already have our temporary string stored in the dictionary, then we have a match, assign the word range to the returned dictionary.
' If not, then add it to the dictionary and increment our index.
If DictWords.Exists(sChain) Then
MatchCount = MatchCount + 1
DictMatches.Add DocToCheck.Range(CurWord.Start, CurWord.Next(wdWord, ChainLenth - 1).End), MatchCount
Else
DictWords.Add sChain, sChain
End If
End If
End If
Next CurWord
' If we found any matching results, then return that list, otherwise return nothing (to be caught by the calling function).
If DictMatches.Count > 0 Then
Set FindRepeatingWordChains = DictMatches
Else
Set FindRepeatingWordChains = Nothing
End If
End Function
I have tested this on a 258 page document (TheStory.txt) from this source, and it ran in just a few minutes.
See the test() sub for usage.
You will need to reference the Microsoft Scripting Runtime to use the Scripting.Dictionary objects. If that is undesirable, small modifications can be made to use Collections instead, but I prefer the Dictionary as it has the useful .Exists() method.
I chose a rather crude approach, but it seems to work (at least if I understood the question correctly; sometimes I'm slow on the uptake).
I load the entire text into a string, split the individual words into an array, and loop through the array, each time building a search string from three consecutive words.
Because any repeated phrase of 4 or more words is made up of repeated 3-word groups, longer repeats are recognized automatically.
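The grouping idea can be checked outside Word. A minimal sketch, assuming plain text in a string: split it into words, count every run of n consecutive words in a late-bound Scripting.Dictionary, and keep the runs seen more than once. RepeatedPhrases is a hypothetical name; unlike the answer's code it doesn't strip punctuation, so "fox," and "fox" count as different words.

```vba
' Hypothetical helper: returns the n-word phrases that occur more than once.
Function RepeatedPhrases(ByVal txt As String, ByVal n As Long) As Collection
    Dim result As New Collection, counts As Object
    Dim words() As String, phrase As String, key As Variant
    Dim i As Long, j As Long

    Set counts = CreateObject("Scripting.Dictionary")
    counts.CompareMode = vbTextCompare      ' case-insensitive, like .MatchCase = False

    ' Normalise line breaks/tabs and collapse repeated spaces before splitting
    txt = Replace(Replace(Replace(txt, vbCr, " "), vbLf, " "), vbTab, " ")
    Do While InStr(txt, "  ") > 0
        txt = Replace(txt, "  ", " ")
    Loop
    words = Split(Trim$(txt), " ")

    ' Count every run of n consecutive words
    For i = LBound(words) To UBound(words) - n + 1
        phrase = words(i)
        For j = 1 To n - 1
            phrase = phrase & " " & words(i + j)
        Next j
        counts(phrase) = counts(phrase) + 1   ' a missing key reads as Empty, i.e. 0
    Next i

    ' Keep only the phrases seen more than once
    For Each key In counts.Keys
        If counts(key) > 1 Then result.Add key
    Next key
    Set RepeatedPhrases = result
End Function
```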
Option Explicit
Sub Find_Duplicates()
On Error GoTo errHandler
Dim pSingleLine As Paragraph
Dim sLine As String
Dim sFull_Text As String
Dim vArray_Full_Text As Variant
Dim sSearch_3 As String
Dim lSize_Array As Long
Dim lCnt As Long
Dim lCnt_Occurence As Long
'Create a string from the entire text
For Each pSingleLine In ActiveDocument.Paragraphs
sLine = pSingleLine.Range.Text
sFull_Text = sFull_Text & sLine
Next pSingleLine
'Load the text into an array
vArray_Full_Text = sFull_Text
vArray_Full_Text = Split(sFull_Text, " ")
lSize_Array = UBound(vArray_Full_Text)
For lCnt = 1 To lSize_Array - 1
lCnt_Occurence = 0
sSearch_3 = Trim(fRemove_Punctuation(vArray_Full_Text(lCnt - 1) & _
" " & vArray_Full_Text(lCnt) & _
" " & vArray_Full_Text(lCnt + 1)))
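Selection.HomeKey Unit:=wdStory 'start every search at the top of the document
With Selection.Find
.Text = sSearch_3
.Forward = True
.Replacement.Text = ""
.Wrap = wdFindStop 'wdFindContinue wraps around, so Do While .Execute would never end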
.Format = False
.MatchCase = False
Do While .Execute
lCnt_Occurence = lCnt_Occurence + 1
If lCnt_Occurence > 1 Then
Selection.Range.Font.Color = vbRed
End If
Selection.MoveRight
Loop
End With
Application.StatusBar = lCnt & "/" & lSize_Array
Next lCnt
Exit Sub 'normal exit: don't fall through into the error handler
errHandler:
Stop 'break into the debugger on error
End Sub
Public Function fRemove_Punctuation(sString As String) As String
Dim vArray(0 To 8) As String
Dim lCnt As Long
vArray(0) = "."
vArray(1) = ","
vArray(2) = ","
vArray(3) = "?"
vArray(4) = "!"
vArray(5) = ";"
vArray(6) = ":"
vArray(7) = "("
vArray(8) = ")"
For lCnt = 0 To UBound(vArray)
If Left(sString, 1) = vArray(lCnt) Then
sString = Right(sString, Len(sString) - 1)
ElseIf Right(sString, 1) = vArray(lCnt) Then
sString = Left(sString, Len(sString) - 1)
End If
Next lCnt
fRemove_Punctuation = sString
End Function
The code assumes a continuous text without bullet points.

Remove selected numbers from a comma-separated list in Excel?

This might be a little tricky, even with VBA...
I have comma-separated lists of start times at 5-minute intervals in cells, but I need to remove times that are only 5 minutes apart.
The numbers are text, not time at this point. For example, one list would be 2210, 2215, 2225, 2230, 2240 (the start times).
In this case, 2215 and 2230 should be removed, but in other cases (the end times) I also need to remove the opposite numbers (i.e., 2210 and 2225).
Someone helped me with my specs:
A cell contains times: t(1), t(2), t(3), ... t(n). Starting at time t(1), each value in the list is examined. If t(x) is less than 6 minutes after t(x-1), delete t(x) and renumber t(x+1) to t(n).
Input:
2210, 2215, 2225, 2230, 2240
Output:
column1: 2210
column2: 2240
This does what I think you require.
Option Explicit
Sub DeleteSelectedTimes()
Dim RowCrnt As Long
RowCrnt = 2
Do While Cells(RowCrnt, 1).Value <> ""
Cells(RowCrnt, 1).Value = ProcessSingleCell(Cells(RowCrnt, 1).Value, 1)
Cells(RowCrnt, 2).Value = ProcessSingleCell(Cells(RowCrnt, 2).Value, -1)
RowCrnt = RowCrnt + 1
Loop
End Sub
Function ProcessSingleCell(ByVal CellValue As String, ByVal StepFactor As Long) As String
Dim CellList() As String
Dim CellListCrntStg As String
Dim CellListCrntNum As Long
Dim InxCrnt As Long
Dim InxEnd As Long
Dim InxStart As Long
Dim TimeCrnt As Long ' Time in minutes
Dim TimeLast As Long ' Time in minutes
CellList = Split(CellValue, ",")
If StepFactor = 1 Then
InxStart = LBound(CellList)
InxEnd = UBound(CellList)
Else
InxStart = UBound(CellList)
InxEnd = LBound(CellList)
End If
CellListCrntStg = Trim(CellList(InxStart))
If (Not IsNumeric(CellListCrntStg)) Or InStr(CellListCrntStg, ".") <> 0 Then
' Either this sub-value is not numeric or if contains a decimal point
' Either way it cannot be a time.
ProcessSingleCell = CellValue
Exit Function
End If
CellListCrntNum = Val(CellListCrntStg)
If CellListCrntNum < 0 Or CellListCrntNum > 2359 Then
' This value is not a time formatted as hhmm
ProcessSingleCell = CellValue
Exit Function
End If
TimeLast = 60 * (CellListCrntNum \ 100) + (CellListCrntNum Mod 100)
For InxCrnt = InxStart + StepFactor To InxEnd Step StepFactor
CellListCrntStg = Trim(CellList(InxCrnt))
If (Not IsNumeric(CellListCrntStg)) Or InStr(CellListCrntStg, ".") <> 0 Then
' Either this sub-value is not numeric or if contains a decimal point
' Either way it cannot be a time.
ProcessSingleCell = CellValue
Exit Function
End If
CellListCrntNum = Val(CellListCrntStg)
If CellListCrntNum < 0 Or CellListCrntNum > 2359 Then
' This value is not a time formatted as hhmm
ProcessSingleCell = CellValue
Exit Function
End If
TimeCrnt = 60 * (CellListCrntNum \ 100) + (CellListCrntNum Mod 100)
If Abs(TimeCrnt - TimeLast) < 6 Then
' Delete unwanted time from list
CellList(InxCrnt) = ""
Else
' Current time becomes Last time for next loop
TimeLast = TimeCrnt
End If
Next
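CellValue = Join(CellList, ",")
' Collapse the empty slots left by deleted times
Do While InStr(CellValue, ",,") <> 0
CellValue = Replace(CellValue, ",,", ",")
Loop
' Strip a leading or trailing comma left when the first or last time was deleted
If Left(CellValue, 1) = "," Then CellValue = Mid(CellValue, 2)
If Right(CellValue, 1) = "," Then CellValue = Left(CellValue, Len(CellValue) - 1)
ProcessSingleCell = Trim(CellValue)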
End Function
Explanation
Sorry for the lack of instructions in the first version. I assumed this question was more about the technique for manipulating the data than about VBA.
DeleteSelectedTimes operates on the active worksheet. It would be easy to change to work on a specific worksheet or a range of worksheets if that is what you require.
DeleteSelectedTimes ignores the first row which I assume contains column headings. Certainly my test worksheet has headings in row 1. It then processes columns A and B of every row until it reaches a row with an empty column A.
ProcessSingleCell has two parameters: a string and a direction. DeleteSelectedTimes uses the direction so values in column A are processed left to right while values in column B are processed right to left.
I assume the #Value error is because ProcessSingleCell did not check that the string has the format "number,number,number". I have changed ProcessSingleCell so that if the string is not in this format, it does not change the string.
I have no clear idea of what you do or do not know, so come back with more questions as necessary.
Still not clear on your exact requirements, but this might help get you started...
Sub Tester()
Dim arr
Dim out As String, x As Integer, c As Range
Dim n1 As Long, n2 As Long
For Each c In ActiveSheet.Range("A1:A10")
If InStr(c.Value, ",") > 0 Then
arr = Split(c.Value, ",")
x = LBound(arr)
out = ""
Do
n1 = CLng(Trim(arr(x)))
n2 = CLng(Trim(arr(x + 1)))
'here's where your requirements get unclear...
out = out & IIf(Len(out) > 0, ", ", "")
If n2 - n1 <= 5 Then
out = out & n1 'skip second number
x = x + 2
Else
out = out & n1 'keep n1 only; n2 becomes n1 on the next pass, so adding it here would duplicate it
x = x + 1
End If
Loop While x <= UBound(arr) - 1
'pick up any last number
If x = UBound(arr) Then
out = out & IIf(Len(out) > 0, ", ", "") & Trim(arr(x))
End If
c.Offset(0, 1).Value = out
End If
Next c
End Sub
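Tester compares the raw hhmm numbers, so across an hour boundary it misjudges the gap: 2255 and 2300 differ by 45 as numbers but are only 5 minutes apart. ProcessSingleCell in the first answer avoids this by converting to minutes since midnight first. A minimal sketch of that conversion as a standalone helper (HhmmToMinutes is a hypothetical name, and it assumes a valid hhmm string without further validation):

```vba
' Hypothetical helper: minutes since midnight for an "hhmm" string, e.g. "2215" -> 1335.
Function HhmmToMinutes(ByVal hhmm As String) As Long
    Dim v As Long
    v = CLng(Trim$(hhmm))
    HhmmToMinutes = 60 * (v \ 100) + (v Mod 100)   ' hours * 60 + minutes
End Function
```

In Tester, testing HhmmToMinutes(arr(x + 1)) - HhmmToMinutes(arr(x)) <= 5 in place of n2 - n1 <= 5 would make the check safe across hour boundaries.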
Obviously many ways to skin this cat ... I like to use collections for this sort of thing:
Private Sub PareDownList()
Dim sList As String: sList = ActiveCell ' take list from active cell
Dim vList As Variant: vList = Split(sList, ",") ' convert to variant array
' load from var array into collection
Dim cList As New Collection
Dim i As Long
For i = 0 To UBound(vList): cList.Add (Trim(vList(i))): Next
' loop over collection removing unwanted entries
' (in reverse order, since we're removing items)
For i = cList.Count To 2 Step -1
If cList(i) - cList(i - 1) = 5 Then cList.Remove (i)
Next i
' loop to put the remaining items back into a string
sList = cList(1)
For i = 2 To cList.Count
sList = sList + "," + cList(i)
Next i
' write the new string to the cell under the activecell
ActiveCell.Offset(1) = "'" + sList ' lead quote to ensure output cell = str type
End Sub
' If activecell contains: "2210, 2215, 2225, 2230, 2240"
' the cell below will get: "2210,2225,2240"
Note: this sample code should be enhanced with some extra validation and checking (e.g. as written it assumes well-formed integer values separated by commas, and it relies on implicit string-to-integer conversions). Also, as written it will convert "2210, 2215, 2220, 2225, 2230, 2240" into "2210,2240", so you'll need to tweak the loop and the loop counter when removing an item if that's not what you want.