Get the sentence at the cursor when it contains multiple commas in Word VBA - vba

How do I get the sentence that the cursor is in, in MS Word with VBA, when that sentence contains multiple commas?
All the posts I've found say that, to get the sentence the cursor is in, you use the code:
Selection.Sentences(1)
The above works well with a sentence with only 1 comma. But if I have a sentence with multiple commas like this:
For example, tomorrow is Tuesday(e.g., not Wednesday) or Thursday.
where the cursor is placed somewhere in "For example", then Selection.Sentences(1) returns the text between the bars: "...(e.g.|, |n...".
I'm using the latest version of Word. I plan to run the code on an older version (2013, I think), which is where I first noticed the problem.

This code is better suited to explain why MS didn't solve your problem than it is to actually solve it. However - depending upon your circumstances - you may like to play with it.
Option Explicit
Sub SelectSentence()
' 30 Jan 2018
' list abbreviations containing periods only
' in sequence of their expected frequency of occurrence
Const Abbs As String = "e.g.,f.i.,etc.,i.e."
Dim Fun As String ' sentence to select
Dim Para As Range
Dim SelStart As Long ' location of selection
Dim Sp() As String ' array of Abbs
Dim Cp() As String ' array of encoded Abbs
With Selection
Set Para = .Paragraphs(1).Range
SelStart = .Start
End With
Sp = Split(Abbs, ",")
With Para
Application.ScreenUpdating = False
.Text = CleanString(.Text, Sp, Cp)
Fun = ActiveDocument.Range(SelStart, SelStart + 1).Sentences(1).Text
SelStart = InStr(.Text, Fun) + .Start - 1
.Text = OriginalString(.Text, Cp)
.SetRange SelStart, SelStart + Len(Fun) - 1
Application.ScreenUpdating = True
.Select
End With
Fun = Selection.Text
Debug.Print Fun
End Sub
Private Function CleanString(ByVal Txt As String, _
Abbs() As String, _
Cp() As String) As String
' 30 Jan 2018
Dim i As Integer
ReDim Cp(UBound(Abbs))
For i = 0 To UBound(Abbs)
If InStr(Txt, ".") = 0 Then Exit For
Cp(i) = AbbToTxt(Abbs(i))
Txt = Replace(Txt, Abbs(i), Cp(i))
Next i
ReDim Preserve Cp(i)
CleanString = Txt
End Function
Private Function AbbToTxt(ByVal Abb As String) As String
' 30 Jan 2018
' Chr(92) is the backslash; substitute a character that does not occur in your document.
' Apparently it must be a character with a code below 128.
' Use the same character in function 'TxtToAbb'.
AbbToTxt = Replace(Abb, ".", Chr(92))
End Function
Private Function OriginalString(ByVal Txt As String, _
Cp() As String) As String
' 30 Jan 2018
Dim i As Integer
For i = 0 To UBound(Cp) - 1
Txt = Replace(Txt, Cp(i), TxtToAbb(Cp(i)))
Next i
OriginalString = Txt
End Function
Private Function TxtToAbb(ByVal Txt As String) As String
' 30 Jan 2018
' use same character as function 'AbbToTxt'
TxtToAbb = Replace(Txt, Chr(92), ".")
End Function
For one, the code will only handle abbreviations which you program into it (see Const Abbs at the top of the code). For another, it will fail to recognise a period with dual meaning, such as "etc." found at the end of a sentence.
If you are allowed to edit the documents you work with, the better way of tackling your problem may well be to remove the offending periods with Find > Replace. After all, whoever understands "e.g." is also likely to understand "eg". Good Luck!
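The Find > Replace suggestion can also be scripted. The following is only a minimal sketch, assuming the macro runs inside Word against the active document; the procedure name and the list of abbreviation pairs are illustrative and not part of the answer above.

```vba
Sub RemoveAbbreviationPeriods()
    ' Hypothetical helper: replaces dotted abbreviations with dot-free forms
    ' so that Word no longer treats their periods as sentence ends.
    Dim pairs As Variant
    Dim i As Long

    ' Assumed list: search text followed by its replacement, in pairs.
    pairs = Array("e.g.", "eg", "i.e.", "ie")

    For i = LBound(pairs) To UBound(pairs) Step 2
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = pairs(i)
            .Replacement.Text = pairs(i + 1)
            .MatchCase = True
            .MatchWildcards = False
            .Forward = True
            .Wrap = wdFindStop
            .Execute Replace:=wdReplaceAll
        End With
    Next i
End Sub
```

Like the code above, this does not help with an abbreviation such as "etc." that also ends a sentence, so it is best tried on a copy of the document first.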

Related

VBA convert unusual string to Date

I wanted to scrape data from Yahoo as an exercise and then make a graph from it. I ran into a problem: when I scrape the dates, they are in a rather weird format:
?10? ?Aug?, ?2020
The question marks in the string are not really question marks; they are characters unknown to me, so I cannot remove them with Replace().
Then, when I try to use CDate() to convert this to a Date, the code fails with a "Type mismatch" error.
What I need is either a way to find out what those characters are, so that I can remove them with Replace(), or a way to convert even this weird format to a Date.
Alternatively, somehow improving the scraping procedure - so far I've been using for example
ie.document.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(3).innerText
to get the data - would also solve this problem.
If anyone wants to try scraping it too, here is an example URL:
https://finance.yahoo.com/quote/LAC/history?period1=1469404800&period2=1627171200&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true
An example of my code follows:
DateString = doc.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(j).innerText
LeftDateString = Clean_NonPrintableCharacters(DateString)
Worksheets("Stock_data").Range("A2").Value = CDate(LeftDateString)
With regexp:
Function GetDate(txt)
' set a reference to 'Microsoft VBScript Regular Expression 5.5' in Tools->References VBE menu
Dim re As New RegExp, retval(0 To 2), patterns, i, result
patterns = Array("\b\d\d\b", "\b[a-zA-Z]+\b", "\b\d{4}\b")
For i = 0 To 2
re.Pattern = patterns(i)
Set result = re.Execute(txt)
If result.Count = 0 Then Exit Function ' Execute never returns Nothing; if no day, month or year is found, GetDate() returns an empty value
retval(i) = result(0)
Next
GetDate = Join(retval)
End Function
Sub Usage()
For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?")
Debug.Print GetDate(txt)
Next
End Sub
Prints:
10 Aug 2020
13 Jul 2020
18 March 2021
Edit 2
Function GetDate2(txt)
' set a reference to 'Microsoft VBScript Regular Expression 5.5' in Tools->References VBE menu
Static re As RegExp, months As Collection
Dim result
If re Is Nothing Then 'do it once
Set re = New RegExp
re.Pattern = "[^a-zA-Z0-9]"
re.Global = True
Set months = New Collection
cnt = 1
For Each m In Split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", ",")
months.Add cnt, m
cnt = cnt + 1
Next
End If
result = Split(WorksheetFunction.Trim(re.Replace(txt, " ")))
For i = 0 To UBound(result)
If Not IsNumeric(result(i)) Then
result(i) = Left(LCase(result(i)), 3)
On Error Resume Next
result(i) = months(result(i))
On Error GoTo 0
End If
Next
result = Join(result)
If IsDate(result) Then GetDate2 = CDate(result)
End Function
Sub Usage2()
For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?", _
"01/12/2021", "04.18.2020", "15 10 20")
Debug.Print GetDate2(txt)
Next
End Sub
Prints:
10.08.2020
13.07.2020
18.03.2021
01.12.2021
18.04.2020
15.10.2020
Note: the order of dd and mm may vary with your regional settings.
I would use something like this. I've used your ? as literal question marks for this example, and I assumed they were all the same weird character. This outputs
10 Aug 2020
Sub d()
Dim d As String
d = "?10? ?Aug?, ?2020"
d = Replace(Replace(d, Chr(Asc(Left(d, 1))), vbNullString), ",", vbNullString)
Debug.Print d
End Sub
You could loop through each character in the string, check its ASCII value, and build your date string from that. Example:
Sub GetTheDate(sDate As String)
'97 - 122: lower case Ascii values
Dim i As Integer
Dim strDate As String
'loop through each char
For i = 1 To Len(sDate)
'check to see if it is numeric
If IsNumeric(Mid(sDate, i, 1)) Then
'numeric so add it to the string
strDate = strDate & Mid(sDate, i, 1)
Else
'check to see if it is a char a-z
If Asc(LCase(Mid(sDate, i, 1))) >= 97 And Asc(LCase(Mid(sDate, i, 1))) <= 122 Then
'it is a char from a-z so add it to the string
strDate = strDate & Mid(sDate, i, 1)
Else
'check for a space and add a comma - this sets up being able to use CDate()
If Mid(sDate, i, 1) = " " Then
strDate = strDate & ","
End If
End If
End If
Next i
'convert it and print it
Debug.Print CDate(strDate)
End Sub
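To find out what the unknown characters actually are, one option is to print the code point of every character in the scraped string. This is a minimal sketch; the function name DescribeChars is made up for illustration.

```vba
Function DescribeChars(ByVal s As String) As String
    ' Hypothetical helper: lists each character of s as a U+XXXX code point.
    Dim i As Long
    Dim code As Long
    Dim out As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it to get 0..65535.
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        out = out & "U+" & Right$("0000" & Hex$(code), 4) & " "
    Next i

    DescribeChars = RTrim$(out)
End Function
```

Calling Debug.Print DescribeChars(DateString) on the scraped value shows the codes of the stray characters. Once a code is known, it can be removed with Replace(DateString, ChrW(code), vbNullString) before the result is passed to CDate().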

MS Word VBA: Saving a document using the header

I have been trying to figure out a way to, after performing a mail merge, separate the documents into individual ones and name them after a specific item, preferably the first line of the header. I have only been able to find ways to split the document, but cannot figure out how to name it. Any help with how to write the VBA code to save a document as the header would be very much appreciated.
Since you already separated the documents, the code below might give them names using their first sentence.
Private Function DocName(Doc As Document) As String
' 23 Aug 2017
Const Illegals As String = "\:/;?*|<>"""
Static FaultCounter As Integer
Dim Fun As String
Dim Title As String
Dim Ch As String
Dim i As Integer
Title = Trim(Doc.Sentences(1))
For i = 1 To Len(Title)
Ch = Mid(Title, i, 1)
If (Asc(Ch) > 31) And (Asc(Ch) < 129) Then
If InStr(Illegals, Ch) = 0 Then Fun = Fun & Ch
End If
Next i
If Len(Fun) = 0 Then
FaultCounter = FaultCounter + 1
Fun = Format(FaultCounter, """Default File Name (""0"")""")
End If
DocName = Fun
End Function
Before saving the file you might want to check for duplicates. Use the Dir() function for that and add a number to duplicate names using the system I included above to name files where the first sentence might be empty.
You may also have to review the characters which aren't permitted in file names. I have simply excluded all below ASCII(32) and above ASCII(128), and then the known ones Windows doesn't like. You might want to modify that range further.
To call the above function, use code like this:
Private Sub GetName()
Debug.Print DocName(ActiveDocument)
End Sub
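The duplicate check with Dir() suggested in this answer could look roughly like the sketch below. The function name UniquePath and its parameters are assumptions made for illustration; the folder is expected to exist and to be passed without a trailing backslash.

```vba
Function UniquePath(ByVal folder As String, _
                    ByVal baseName As String, _
                    ByVal ext As String) As String
    ' Hypothetical helper: returns folder\baseName.ext, or the first
    ' numbered variant "baseName (n).ext" that does not exist yet.
    Dim candidate As String
    Dim n As Long

    candidate = folder & "\" & baseName & ext
    ' Dir returns an empty string when no file matches the path.
    Do While Len(Dir(candidate)) > 0
        n = n + 1
        candidate = folder & "\" & baseName & " (" & n & ")" & ext
    Loop

    UniquePath = candidate
End Function
```

The result of DocName(ActiveDocument) could be passed in as baseName, with ext set to, for example, ".doc".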
This is the code I have so far. I found it on a very helpful website, but it saves each document under the word "Reports", which I set for now while I'm trying to figure this out, followed by the number of the document.
Option Explicit
Sub splitter()
' splitter Macro
' Macro created by Doug Robbins to save each letter created by a mail merge as
' a separate file.
Application.ScreenUpdating = False
Dim Program As String
Dim DocName As String
Dim Letters As Integer, Counter As Integer
Letters = ActiveDocument.Sections.Count
Selection.HomeKey Unit:=wdStory
Counter = 1
While Counter < Letters
'program = ActiveDocument.MailMerge.DataSource.DataFields("Program_Outcomes_PlanReport_Name").Value
DocName = "Reports" & LTrim$(Str$(Counter)) 'Generic name of document
ActiveDocument.Sections.First.Range.Cut
Documents.Add
Selection.Paste
ActiveDocument.Sections(2).PageSetup.SectionStart = wdSectionContinuous
ActiveDocument.SaveAs filename:="E:\assessment rubrics\Templates" & "\" & DocName, FileFormat:=wdFormatDocument, LockComments:=False, Password:="", _
AddToRecentFiles:=False, WritePassword:="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:=False
ActiveWindow.Close
Counter = Counter + 1
Wend
Application.ScreenUpdating = True
End Sub

Excel VBA Custom Function Remove Words Appearing in One String From Another String

I am trying to remove words appearing in one string from a different string using a custom function. For instance:
A1:
the was why blue hat
A2:
the stranger wanted to know why his blue hat was turning orange
The ideal outcome in this example would be:
A3:
stranger wanted to know his turning orange
I need to have the cells in reference open to change so that they can be used in different situations.
The function will be used in a cell as:
=WORDREMOVE("cell with words needing remove", "cell with list of words being removed")
I have a list of 20,000 rows and have managed to find a custom function that can remove duplicate words (below) and thought there may be a way to manipulate it to accomplish this task.
Function REMOVEDUPEWORDS(txt As String, Optional delim As String = " ") As String
Dim x
'Updateby20140924
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .Count > 0 Then REMOVEDUPEWORDS = Join(.keys, delim)
End With
End Function
If you can guarantee that your words in both strings will be separated by spaces (no comma, ellipses, etc), you could just Split() both strings then Filter() out the words:
Function WORDREMOVE(ByVal strText As String, strRemove As String) As String
Dim a, w
a = Split(strText) ' Start with all words in an array
For Each w In Split(strRemove)
a = Filter(a, w, False, vbTextCompare) ' Remove every word found
Next
WORDREMOVE = Join(a, " ") ' Recreate the string
End Function
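One caveat with Filter() is that it matches substrings, so removing "the" would also drop a word such as "other". If only whole words should be removed, a dictionary lookup is one alternative. The sketch below assumes words are separated by single or multiple spaces; the name WordRemoveWhole is made up for illustration.

```vba
Function WordRemoveWhole(ByVal strText As String, _
                         ByVal strRemove As String) As String
    ' Hypothetical variant: removes only whole words, case-insensitively.
    Dim dict As Object
    Dim w As Variant
    Dim result As String

    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare

    ' Collect the words to remove as dictionary keys.
    For Each w In Split(strRemove, " ")
        If Len(w) > 0 Then dict(w) = True
    Next w

    ' Keep every word of the text that is not in the dictionary.
    For Each w In Split(strText, " ")
        If Len(w) > 0 Then
            If Not dict.Exists(w) Then result = result & " " & w
        End If
    Next w

    ' Drop the leading space added in the loop.
    WordRemoveWhole = Mid$(result, 2)
End Function
```

With the example from the question, =WordRemoveWhole(A2, A1) gives "stranger wanted to know his turning orange".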
You can also do this using Regular Expressions in VBA. The version below is case insensitive and assumes all words are separated only by space. If there is other punctuation, more examples would aid in crafting an appropriate solution:
Option Explicit
Function WordRemove(Str As String, RemoveWords As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.ignorecase = True
.Global = True
.Pattern = "\b(?:" & Join(Split(WorksheetFunction.Trim(RemoveWords)), "|") & ")\b\s*" ' \b keeps partial-word matches (e.g. "the" in "other") from being removed
WordRemove = .Replace(Str, "")
End With
End Function
My example is certainly not the best code, but it should work
Function WORDREMOVE(FirstCell As String, SecondCell As String)
Dim FirstArgument As Variant, SecondArgument As Variant
Dim FirstArgumentCounter As Integer, SecondArgumentCounter As Integer
Dim Checker As Boolean
WORDREMOVE = ""
FirstArgument = Split(FirstCell, " ")
SecondArgument = Split(SecondCell, " ")
For SecondArgumentCounter = 0 To UBound(SecondArgument)
Checker = False
For FirstArgumentCounter = 0 To UBound(FirstArgument)
If SecondArgument(SecondArgumentCounter) = FirstArgument(FirstArgumentCounter) Then
Checker = True
End If
Next FirstArgumentCounter
If Checker = False Then WORDREMOVE = WORDREMOVE & SecondArgument(SecondArgumentCounter) & " "
Next SecondArgumentCounter
' Strip the trailing space; skip this when every word was removed
If Len(WORDREMOVE) > 0 Then WORDREMOVE = Left(WORDREMOVE, Len(WORDREMOVE) - 1)
End Function

Find and replace all names of variables in VBA module

Let's assume that we have one module with only one Sub in it, and there are no comments. How can I identify all variable names? Is it possible to identify the names of variables that are not declared using Dim? I would like to identify them and replace each with a random name to obfuscate my code (O0011011010100101, for example); the replacement part is much easier.
List of characters which could be used in names of macros, functions and variables:
ABCDEFGHIJKLMNOPQRSTUVWXYZdefghijklmnopqrstuvwxyzg€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙ÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙
Below is a function I wrote recently:
Function randomName(n as integer) as string
y="O"
For i = 2 To n:
If Rnd() > 0.5 Then
y = y & "0"
Else
y = y & "1"
End If
Next i
randomName=y
End Function
To replace given strings in another string that represents the module's code, I use the sub below:
Sub substituteNames()
'count lines in "Module1" which is part of current workbook
linesCount = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.CountOfLines
'read code from module
code = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.Lines(StartLine:=1, Count:=linesCount)
inputStr = Array("name1", "name2", "name2") 'a hard-coded array of strings to replace
namesLength = 20 'length of new variables names
For i = LBound(inputStr) To UBound(inputStr)
outputString = randomName(namesLength-1)
code = Replace(code, inputStr(i), outputString)
Next i
Debug.Print code 'view code
End Sub
Then we simply substitute the old code with the new one, but how do I identify the strings that are variable names?
Edit
Using Option Explicit decreases the safety of my simple obfuscation method, because to reverse the changes you only have to follow the Dim statements and replace the ugly names with something normal. Apart from that, to make such substitution harder, I think it's a good idea to break the line in the middle of a variable name:
O0O000O0OO0O0000 _
0O00000O0OO0
Another simple method is replacing some strings with chains based on Chr functions, such as chr(104)&chr(101)&chr(108)&chr(108)&chr(111):
Sub stringIntoChrChain()
strInput = "hello"
strOutput = ""
For i = 1 To Len(strInput)
strOutput = strOutput & "chr(" & Asc(Mid(strInput, i, 1)) & ")&"
Next i
Debug.Print Mid(strOutput, 1, Len(strOutput) - 1)
End Sub
Comments like the ones below could make an impression on the user and make him think that he does not possess the right tool to deal with the macro, etc.:
'(k=Äó¬)w}ż^¦ů‡ÜOyúm=ěËnóÚŽb W™ÄQó’ (—*-ĹTIäb
'R“ąNPÔKZMţ†üÍQ‡
'y6ű˛Š˛ŁŽ¬=iýQ|˛^˙ ‡ńb ¬ĂÇr'ń‡e˘źäžŇ/âéç;1qýěĂj$&E!V?¶ßšÍ´cĆ$Âű׺Ůî’ﲦŔ?TáÄu[nG¦•¸î»éüĽ˙xVPĚ.|
'ÖĚ/łó®Üă9Ę]ż/ĹÍT¶Mµę¶mÍ
'q[—qëýY~Pc©=jÍ8˘‡,Ú+ń8ŐűŻEüńWü1ďëDZ†ć}ęńwŠbŢ,>ó’Űçµ™Š_…qÝăt±+‡ĽČg­řÍ!·eŠP âńđ:ŶOážű?őë®ÁšńýĎáËTbž}|Ö…ăË[®™
You can use a regular expression to find variable assignments by looking for the equals sign. You'll need to add a reference to the Microsoft VBScript Regular Expressions 5.5 and Microsoft Visual Basic for Applications Extensibility 5.3 libraries as I've used early binding.
Please be sure to back up your work and test this before using it. I could have gotten the regex wrong.
UPDATE:
I've refined the regular expressions so that it no longer catches datatypes of strongly typed constants (Const ImAConstant As String = "Oh Noes!" previously returned String). I've also added another regex to return those constants as well. The last version of the regex also mistakenly caught things like .Global = true. That was corrected. The code below should return all variable and constant names for a given code module. The regular expressions still aren't perfect, as you'll note that I was unable to stop false positives on double quotes. Also, my array handling could be done better.
Sub printVars()
Dim linesCount As Long
Dim code As String
Dim vbPrj As VBIDE.VBProject
Dim codeMod As VBIDE.CodeModule
Dim regex As VBScript_RegExp_55.RegExp
Dim m As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim i As Long
Dim j As Long
Dim isInDatatypes As Boolean
Dim isInVariables As Boolean
Dim datatypes() As String
Dim variables() As String
Set vbPrj = VBE.ActiveVBProject
Set codeMod = vbPrj.VBComponents("Module1").CodeModule
code = codeMod.Lines(1, codeMod.CountOfLines)
Set regex = New RegExp
With regex
.Global = True ' match all instances
.IgnoreCase = True
.MultiLine = True ' "code" var contains multiple lines
.Pattern = "(\sAs\s)([\w]*)(?=\s)" ' get list of datatypes we've used
' match any whole word after the word " As "
Set matches = .Execute(code)
End With
ReDim datatypes(matches.count - 1)
For i = 0 To matches.count - 1
datatypes(i) = matches(i).SubMatches(1) ' return second submatch so we don't get the word " As " in our array
Next i
With regex
.Pattern = "(\s)([^\.\s][\w]*)(?=\s\=)" ' list of variables
' begins with a space; next character is not a period (handles "with" assignments) or space; any alphanumeric character; repeat until... space
Set matches = .Execute(code)
End With
ReDim variables(matches.count - 1)
For i = 0 To matches.count - 1
isInDatatypes = False
isInVariables = False
' check to see if current match is a datatype
For j = LBound(datatypes) To UBound(datatypes)
If matches(i).SubMatches(1) = datatypes(j) Then
isInDatatypes = True
Exit For
End If
'Debug.Print matches(i).SubMatches(1)
Next j
' check to see if we already have this variable
For j = LBound(variables) To i
If matches(i).SubMatches(1) = variables(j) Then
isInVariables = True
Exit For
End If
Next j
' add to variables array
If Not isInDatatypes And Not isInVariables Then
variables(i) = matches(i).SubMatches(1)
End If
Next i
With regex
.Pattern = "(\sConst\s)(.*)(?=\sAs\s)" 'strongly typed constants
' match anything between the words " Const " and " As "
Set matches = .Execute(code)
End With
For i = 0 To matches.count - 1
'add one slot to end of array
j = UBound(variables) + 1
ReDim Preserve variables(j)
variables(j) = matches(i).SubMatches(1) ' again, return the second submatch
Next i
' print variables to immediate window
For i = LBound(variables) To UBound(variables)
If variables(i) <> "" And variables(i) <> Chr(34) Then ' for the life of me I just can't get the regex to not match doublequotes
Debug.Print variables(i)
End If
Next i
End Sub

optimal code for detecting formatting differences?

I need to compare two formatted strings. The text in the two of them is the same; only the formatting differs, meaning that some words are bold. The code should tell me whether the locations of the bold substrings differ, i.e. whether the strings are formatted differently.
So far I have tried a char-by-char approach, but it is far too slow.
It's plain legal running text in MS Word, with approximately 10-500 chars per string. Two people independently formatted the strings.
My code so far:
Function collectBold(r As Range) As String
Dim chpos As Integer
Dim ch As Variant
Dim str, strTemp As String
chpos = 1
Do
If r.Characters(chpos).Font.Bold Then
Do
Dim boold As Boolean
strTemp = strTemp + r.Characters(chpos)
chpos = chpos + 1
If (chpos < r.Characters.Count) Then boold = r.Characters(chpos).Font.Bold
Loop While (boold And chpos < r.Characters.Count)
str = str + Trim(strTemp) + "/"
strTemp = ""
Else: chpos = chpos + 1
End If
Loop While (chpos < r.Characters.Count)
collectBold = str
End Function
This code collects all bold substrings (strTemp) and merges them into one string (str), separating them with "/". The function is run on both strings to compare, and the two outputs are then checked for equality.
If you only need to see if they are different, this function will do it:
Function areStringsDifferent(range1 As Range, range2 As Range) As Boolean
Dim i As Integer, j As Integer
For i = 1 To range1.Words.Count
'check if the words are formatted differently
If Not range1.Words(i).Bold = range2.Words(i).Bold Then
areStringsDifferent = True
Exit Function
'words have the same formatting, but their characters may not
ElseIf range1.Words(i).Bold = wdUndefined Then
For j = 1 To range1.Words(i).Characters.Count
If Not range1.Words(i).Characters(j).Bold = range2.Words(i).Characters(j).Bold Then
areStringsDifferent = True
Exit Function
End If
Next
End If
Next
areStringsDifferent = False
End Function
It first checks whether the words are formatted differently. If they have the same format but that format is undefined, it looks at the individual characters of the word.