optimal code for detecting formatting differences? - vba

I need to compare two formatted strings. The text in both is the same; only the formatting differs, meaning that some words are bold. The code should tell me whether the locations of the bold substrings differ, i.e. whether the strings are formatted differently.
So far I have tried a character-by-character approach, but it is far too slow.
The text is plain legal text in MS Word, with roughly 10-500 characters per string. Two people formatted the strings independently.
My code so far:
Function collectBold(r As Range) As String
    Dim chpos As Long
    Dim total As Long
    Dim str As String, strTemp As String
    total = r.Characters.Count 'cache the count instead of re-reading it on every pass
    chpos = 1
    Do While chpos <= total
        If r.Characters(chpos).Font.Bold Then
            'collect one run of consecutive bold characters
            Do
                strTemp = strTemp & r.Characters(chpos).Text
                chpos = chpos + 1
                If chpos > total Then Exit Do
            Loop While r.Characters(chpos).Font.Bold
            str = str & Trim(strTemp) & "/"
            strTemp = ""
        Else
            chpos = chpos + 1
        End If
    Loop
    collectBold = str
End Function
This code collects all bold substrings (strTemp) and merges them into one string (str), separated by "/". The function is run on both strings, and the two outputs are then compared.

If you only need to see if they are different, this function will do it:
Function areStringsDifferent(range1 As Range, range2 As Range) As Boolean
Dim i As Integer, j As Integer
For i = 1 To range1.Words.Count
'check whether the words are formatted differently
If Not range1.Words(i).Bold = range2.Words(i).Bold Then
areStringsDifferent = True
Exit Function
'words have the same formatting, but individual characters may not
ElseIf range1.Words(i).Bold = wdUndefined Then
For j = 1 To range1.Words(i).Characters.Count
If Not range1.Words(i).Characters(j).Bold = range2.Words(i).Characters(j).Bold Then
areStringsDifferent = True
Exit Function
End If
Next
End If
Next
areStringsDifferent = False
End Function
It first checks whether the words are formatted differently. If they have the same format but that format is undefined (the word is only partly bold), it looks at the individual characters of the word.
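If you want to stay with a per-character comparison, here is a minimal sketch that builds a "bold mask" for each range (one character per position: "1" for bold, "0" otherwise) and compares the masks. BoldMask and SameBoldLayout are hypothetical names, not part of the code above. It iterates with For Each instead of indexing Characters(i) on every pass, which may help with speed, but I have not measured it.

```vba
' Sketch only: BoldMask and SameBoldLayout are illustrative names.
' Assumes both ranges contain the same text, as stated in the question.
Function BoldMask(ByVal r As Range) As String
    Dim ch As Range
    Dim mask As String
    For Each ch In r.Characters
        ' Font.Bold on a single character is either True (-1) or False (0)
        If ch.Font.Bold = True Then
            mask = mask & "1"
        Else
            mask = mask & "0"
        End If
    Next ch
    BoldMask = mask
End Function

Function SameBoldLayout(ByVal r1 As Range, ByVal r2 As Range) As Boolean
    ' Two ranges are formatted alike when their masks are identical
    SameBoldLayout = (BoldMask(r1) = BoldMask(r2))
End Function
```

Because the mask records positions rather than the bold text itself, two strings only compare equal when exactly the same character positions are bold.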

Related

Separate a String into Two parts 'name' and 'number' using VBA

I need to separate following strings into Name and Number: e.g.
evil333 into evil and 333
bili454 into bili and 454
elvis04 into elvis and 04
Split(String, "#") ' doesn't work here because the numbers are unknown
Similarly,
Mid(String, 1, String - #) ' doesn't work because the length of the number is unknown
So what would be the best way to start? I want to keep it as simple as possible.
Update:
For further info follow - https://youtu.be/zjF7oLLgtms
Two more ways for solving this:
Sub test()
Dim sInputString As String
Dim i As Integer
Dim lFirstNumberPos As Long
sInputString = "evil333"
'loop through text in input string
'if value IsNumeric (digit), stop looping
For i = 1 To Len(sInputString)
If IsNumeric(Mid(sInputString, i, 1)) Then
lFirstNumberPos = i
Exit For
End If
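Next i
'if no digit was found, treat the whole string as the name
If lFirstNumberPos = 0 Then lFirstNumberPos = Len(sInputString) + 1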
Dim Name As String
Dim Number As String
'return result
Name = Left$(sInputString, lFirstNumberPos - 1)
Number = Mid$(sInputString, lFirstNumberPos)
End Sub
Or another method:
Sub test2()
'for very long strings it may be better to use the InStr method
Dim sInputString As String
Dim lFirstNumberPos As Long
Dim i As Integer
sInputString = "evil333"
Dim lLoopedNumber As Long
lLoopedNumber = 0
lFirstNumberPos = Len(sInputString) + 1
'loop through digits 0-9 and keep the earliest position at which any digit is found
For i = 0 To 9
lLoopedNumber = InStr(1, sInputString, CStr(i), vbTextCompare)
If lLoopedNumber > 0 Then
lFirstNumberPos = Application.Min(lLoopedNumber, lFirstNumberPos)
End If
Next i
Dim Name As String
Dim Number As String
'return result
Name = Left$(sInputString, lFirstNumberPos - 1)
Number = Mid$(sInputString, lFirstNumberPos)
End Sub
You should use regular expressions (regex) to match the two parts of your strings. The following regex describes how to match the two parts:
/([a-z]+)([0-9]+)/
Their use in VBA is thoroughly explained in Portland Runner's answer to How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
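As a rough sketch of how that pattern could be applied in VBA, the following uses late binding (CreateObject), so no library reference is needed. The Sub name and the added ^ and $ anchors are my own assumptions; the anchors make the whole string match rather than a part of it.

```vba
' Sketch only: SplitNameAndNumber is an illustrative name.
Sub SplitNameAndNumber()
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^([a-z]+)([0-9]+)$" ' letters first, then digits
    re.IgnoreCase = True

    Set matches = re.Execute("evil333")
    If matches.Count > 0 Then
        Debug.Print matches(0).SubMatches(0) ' the name part
        Debug.Print matches(0).SubMatches(1) ' the number part, kept as text
    End If
End Sub
```

Keeping the number part as a string preserves leading zeros, which matters for inputs such as elvis04.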

Find anything but a number or "C"

I have the formula below, which finds the first space in C1. Instead, I would like the formula to find the first character that is neither "C" nor a digit, rather than only a space.
LEFT(C1, FIND(" ", C1, 1)-1)
For example, if C1 contains "C1234 - XXX", "C1234-XXX" or "C1234:XXX", I always want the function to find the first character that is not "C" or a digit (here, the digits 1234).
P.S.: I would like to keep using the FIND function, improved to meet the above conditions.
Please suggest.
Perhaps this:
'Creates a new string from a source string, keeping only the characters that are (isInString = True) or are not (isInString = False) present in strChars
'Examples of string of characters: 0123456789 -OR- {}[]<>\/|+*=-_(),.:;?!@#$%^&™®©~'" OR - combination of various characters
Public Function getNewStringFromString(ByVal strSource As Variant, ByVal strChars As Variant, Optional isInString As Boolean = True) As String
Dim strArr As Variant, iChar As Variant
getNewStringFromString = ""
If VarType(strSource) = vbString And VarType(strChars) = vbString Then
strSource = Trim(strSource): strChars = Trim(strChars)
If Len(strSource) > 0 And Len(strChars) > 0 Then
strArr = Split(StrConv(strSource, vbUnicode), vbNullChar)
For Each iChar In strArr
If (isInString Xor isInArray(iChar, strChars)) = False Then getNewStringFromString = getNewStringFromString + iChar
Next iChar
Erase strArr
End If
End If
End Function
Use as the following:
MsgBox getNewStringFromString(CStr(Range("C1")), "C0123456789")
Forgot to give you the code for the isInArray function. Here it is:
'To check if an element is within a specific Array, Object, Range, String, etc.
Public Function isInArray(ByVal itemSearched As Variant, ByVal aArray As Variant) As Boolean
Dim item As Variant
If VarType(aArray) >= vbArray Or VarType(aArray) = vbObject Or VarType(aArray) = vbDataObject Or TypeName(aArray) = "Range" Then
For Each item In aArray
If itemSearched = item Then
isInArray = True
Exit Function
End If
Next item
isInArray = False
ElseIf VarType(aArray) = vbString Then
isInArray = InStr(1, aArray, itemSearched, vbBinaryCompare) > 0 'Comparing character by character
Else
On Error Resume Next
isInArray = Not IsError(Application.Match(itemSearched, aArray, False))
Err.Clear: On Error GoTo 0
End If
End Function
Given your data format, where C is always the first character and the subsequent values are all digits, and given that you want to return the C followed by the digits, try:
="C" & LOOKUP(9E+307,VALUE(MID(A1,2,{1,2,3,4,5,6,7})))
If there might be more than 7 digits, you can either extend the array constant, or use a formula to create a larger array.
The formula looks for the largest integer in the string, starting at position 2. It effectively stops at the first non-digit, since any substring that includes a non-digit returns an error.
If the "non-digit" might be your decimal or thousands separator, you will need to replace it with something else, using nested SUBSTITUTE calls.
Replace ".", "," and space with "-":
=LOOKUP(1E+307,--SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(MID(A1,2,{1,2,3,4,5,6,7}),",","-"),".","-")," ","-"))
For a VBA solution, I would use regular expressions.
Option Explicit
Function getCnum(str As String)
Dim RE As Object
Const sPat As String = "(^C\d+).*"
Set RE = CreateObject("vbscript.regexp")
With RE
.Global = False
.MultiLine = True
.ignorecase = True
.Pattern = sPat
getCnum = .Replace(str, "$1")
End With
End Function
Note that this also validates the string by checking that the first letter is, in fact, a C (or c). If you want it to be case-sensitive, make the obvious change.

get sentence with cursor and multiple commas in word vba

How do I get the sentence that the cursor is in, using VBA in MS Word, when the sentence contains multiple commas?
All the posts I've found say that, to get the sentence the cursor is in, you should use the code:
Selection.Sentences(1)
The above works well with a sentence with only 1 comma. But if I have a sentence with multiple commas like this:
For example, tomorrow is Tuesday(e.g., not Wednesday) or Thursday.
where the cursor is set somewhere in "For example" then "Selection.Sentences(1)" returns between the bars "...(e.g.|, |n...".
I'm using the latest version of Word. I plan on launching the code on an older version (I think 2013) that I first noticed the problem on.
This code is better suited to explain why MS didn't solve your problem than it is to actually solve it. However - depending upon your circumstances - you may like to play with it.
Option Explicit
Sub SelectSentence()
' 30 Jan 2018
' list abbreviations containing periods only
' in sequence of their expected frequency of occurrence
Const Abbs As String = "e.g.,f.i.,etc.,i.e."
Dim Fun As String ' sentence to select
Dim Para As Range
Dim SelStart As Long ' location of selection
Dim Sp() As String ' array of Abbs
Dim Cp() As String ' array of encoded Abbs
With Selection
Set Para = .Paragraphs(1).Range
SelStart = .Start
End With
Sp = Split(Abbs, ",")
With Para
Application.ScreenUpdating = False
.Text = CleanString(.Text, Sp, Cp)
Fun = ActiveDocument.Range(SelStart, SelStart + 1).Sentences(1).Text
SelStart = InStr(.Text, Fun) + .Start - 1
.Text = OriginalString(.Text, Cp)
.SetRange SelStart, SelStart + Len(Fun) - 1
Application.ScreenUpdating = True
.Select
End With
Fun = Selection.Text
Debug.Print Fun
End Sub
Private Function CleanString(ByVal Txt As String, _
Abbs() As String, _
Cp() As String) As String
' 30 Jan 2018
Dim i As Integer
ReDim Cp(UBound(Abbs))
For i = 0 To UBound(Abbs)
If InStr(Txt, ".") = 0 Then Exit For
Cp(i) = AbbToTxt(Abbs(i))
Txt = Replace(Txt, Abbs(i), Cp(i))
Next i
ReDim Preserve Cp(i)
CleanString = Txt
End Function
Private Function AbbToTxt(ByVal Abb As String) As String
' 30 Jan 2018
' use a character for Chr(92) not occurring in your document.
' Apparently it must be a character with a code below 128.
' use same character as function 'TxtToAbb'
AbbToTxt = Replace(Abb, ".", Chr(92))
End Function
Private Function OriginalString(ByVal Txt As String, _
Cp() As String) As String
' 30 Jan 2018
Dim i As Integer
For i = 0 To UBound(Cp) - 1
Txt = Replace(Txt, Cp(i), TxtToAbb(Cp(i)))
Next i
OriginalString = Txt
End Function
Private Function TxtToAbb(ByVal Txt As String) As String
' 30 Jan 2018
' use same character as function 'AbbToTxt'
TxtToAbb = Replace(Txt, Chr(92), ".")
End Function
For one, the code will only handle abbreviations which you program into it (see Const Abbs at the top of the code). For another, it will fail to recognise a period with dual meaning, such as "etc." found at the end of a sentence.
If you are allowed to edit the documents you work with, the better way of tackling your problem may well be to remove the offending periods with Find > Replace. After all, whoever understands "e.g." is also likely to understand "eg". Good Luck!
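If you go the Find > Replace route, it can also be done from VBA. This is only a sketch: the Sub name is made up, and it handles a single abbreviation ("e.g." to "eg"), so you would repeat it for each abbreviation you care about. Try it on a copy of the document first.

```vba
' Sketch only: ReplaceAbbreviation is an illustrative name.
Sub ReplaceAbbreviation()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "e.g."
        .Replacement.Text = "eg"
        .Forward = True
        .Wrap = wdFindContinue
        .MatchCase = True
        .MatchWildcards = False ' treat the periods literally
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

After the periods are gone, Selection.Sentences(1) no longer has an abbreviation period to stop at inside the sentence.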

Function which Removes Only Non-ASCII characters in a column in access table

I have an Access table and I am writing VBA code to remove non-ASCII characters from it. I have tried the two functions below:
Public Function removeall(stringData As String) As String
Dim letter As Integer
Dim final As String
Dim i As Integer
For i = 1 To Len(stringData) 'loop thru each char in stringData
letter = Asc(Mid(stringData, i, 1)) 'find the char and assign asc value
Select Case letter 'Determine what type of char it is
Case Is < 91 And letter > 64 'is an upper case char
final = final & Chr(letter)
Case Is < 123 And letter > 96 'is an lower case char
final = final & Chr(letter)
Case Is = 32 'is a space
final = final & Chr(letter)
End Select
Next i
removeall = final
End Function
I also tried the function below:
Public Function Clean(InString As String) As String
'-- Returns only printable characters from InString
Dim x As Integer
For x = 1 To Len(InString)
If Asc(Mid(InString, x, 1)) > 31 And Asc(Mid(InString, x, 1)) < 127 Then
Clean = Clean & Mid(InString, x, 1)
End If
Next x
End Function
The problem is that the removeall function removes everything, including # and space characters, and the Clean function removes special characters as well.
I need a function that retains keyboard characters and removes all other characters.
Examples of strings in tables are :
1) "ATTACHMENT FEEDING TUBE FITS 5-18 ºFR# "
2) "CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEXº"
Any help would be greatly appreciated
Output should be like
1) "ATTACHMENT FEEDING TUBE FITS 5-18 FR"
2) "CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEX"
One approach would be to use a whitelist of accepted characters. e.g.
' You can set up your domain specific list:
Const Whitelist = "1234567890" & _
"qwertyuiopasdfghjklzxcvbnm" & _
"QWERTYUIOPASDFGHJKLZXCVBNM" & _
" `~!@#$%^&*()_-=+[]{};:""'|\<>?/ –"
Public Sub test()
Debug.Print Clean("ATTACHMENT FEEDING TUBE FITS 5-18 ºFR#")
Debug.Print Clean("CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEXº")
End Sub
Public Function isAllowed(char As String) As Boolean
isAllowed = InStr(1, Whitelist, char, vbBinaryCompare) > 0
End Function
Public Function Clean(dirty As String) As String
'-- Returns only printable characters from dirty
Dim x As Integer
Dim c As String
For x = 1 To Len(dirty)
c = Mid(dirty, x, 1)
If isAllowed(c) Then
Clean = Clean & c
End If
Next x
End Function
Alternate approach that preserves ALL ASCII characters, without working with a whitelist, in a single function:
Public Function RemoveNonASCII(str As String) As String
    Dim i As Long
    Dim code As Long
    For i = 1 To Len(str)
        code = AscW(Mid(str, i, 1))
        'AscW returns a signed Integer, so code points above 32767 come back negative
        If code >= 0 And code <= 127 Then 'It's an ASCII character
            RemoveNonASCII = RemoveNonASCII & Mid(str, i, 1) 'Append it
        End If
    Next i
End Function
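A different technique, not used in the answers above, is a regular expression that strips everything outside the printable ASCII range (codes 32 to 126). This is a sketch with a made-up function name. Unlike a function that keeps all ASCII, it also drops control characters such as tabs and line breaks, so use it only if that is what you want.

```vba
' Sketch only: StripNonPrintableAscii is an illustrative name.
Public Function StripNonPrintableAscii(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True            ' replace every match, not just the first
    re.Pattern = "[^\x20-\x7E]" ' anything that is not printable ASCII
    StripNonPrintableAscii = re.Replace(s, "")
End Function
```

For example, Debug.Print StripNonPrintableAscii("30ML LATEXº") drops the trailing º and leaves the rest of the string untouched.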

Find and replace all names of variables in VBA module

Let's assume that we have one module with only one Sub in it, and there are no comments. How can I identify all variable names? Is it possible to identify the names of variables that are not declared with Dim? I would like to identify them and replace each with a random name to obfuscate my code (O0011011010100101, for example); the replace part is much easier.
List of characters that can be used in the names of macros, functions and variables:
ABCDEFGHIJKLMNOPQRSTUVWXYZdefghijklmnopqrstuvwxyzg€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙ÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙
Below is a function I wrote recently:
Function randomName(n As Integer) As String
    Dim y As String
    Dim i As Integer
    y = "O" 'a name must start with a letter
    For i = 2 To n
        'append a random "0" or "1"
        If Rnd() > 0.5 Then
            y = y & "0"
        Else
            y = y & "1"
        End If
    Next i
    randomName = y
End Function
To replace the given strings in another string that holds the module's code, I use the Sub below:
Sub substituteNames()
'count lines in "Module1" which is part of current workbook
linesCount = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.CountOfLines
'read code from module
code = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.Lines(StartLine:=1, Count:=linesCount)
inputStr = Array("name1", "name2", "name2") 'some hardwritten array with string to replace
namesLength = 20 'length of new variables names
For i = LBound(inputStr) To UBound(inputStr)
outputString = randomName(namesLength-1)
code = Replace(code, inputStr(i), outputString)
Next i
Debug.Print code 'view code
End Sub
Then we simply substitute the old code with the new one, but how do I identify the strings that are variable names?
Edit
Using Option Explicit decreases the safety of my simple obfuscation method, because to reverse the changes you only have to follow the Dim statements and replace the ugly names with something normal. Apart from that, to make such substitution harder, I think it's a good idea to break the line in the middle of a variable name:
O0O000O0OO0O0000 _
0O00000O0OO0
Another simple method is to replace some strings with chains of Chr calls, e.g. chr(104)&chr(101)&chr(108)&chr(108)&chr(111):
Sub stringIntoChrChain()
strInput = "hello"
strOutput = ""
For i = 1 To Len(strInput)
strOutput = strOutput & "chr(" & Asc(Mid(strInput, i, 1)) & ")&"
Next i
Debug.Print Mid(strOutput, 1, Len(strOutput) - 1)
End Sub
Comments like the ones below could make an impression on the user and make him think that he does not possess the right tool to deal with the macro, etc.:
'(k=Äó¬)w}ż^¦ů‡ÜOyúm=ěËnóÚŽb W™ÄQó’ (—*-ĹTIäb
'R“ąNPÔKZMţ†üÍQ‡
'y6ű˛Š˛ŁŽ¬=iýQ|˛^˙ ‡ńb ¬ĂÇr'ń‡e˘źäžŇ/âéç;1qýěĂj$&E!V?¶ßšÍ´cĆ$Âű׺Ůî’ﲦŔ?TáÄu[nG¦•¸î»éüĽ˙xVPĚ.|
'ÖĚ/łó®Üă9Ę]ż/ĹÍT¶Mµę¶mÍ
'q[—qëýY~Pc©=jÍ8˘‡,Ú+ń8ŐűŻEüńWü1ďëDZ†ć}ęńwŠbŢ,>ó’Űçµ™Š_…qÝăt±+‡ĽČg­řÍ!·eŠP âńđ:ŶOážű?őë®ÁšńýĎáËTbž}|Ö…ăË[®™
You can use a regular expression to find variable assignments by looking for the equals sign. You'll need to add a reference to the Microsoft VBScript Regular Expressions 5.5 and Microsoft Visual Basic for Applications Extensibility 5.3 libraries as I've used early binding.
Please be sure to back up your work and test this before using it. I could have gotten the regex wrong.
UPDATE:
I've refined the regular expressions so that they no longer catch the datatypes of strongly typed constants (Const ImAConstant As String = "Oh Noes!" previously returned String). I've also added another regex to return those constants as well. The previous version of the regex also mistakenly caught things like .Global = true. That was corrected. The code below should return all variable and constant names for a given code module. The regular expressions still aren't perfect, as you'll note that I was unable to stop false positives on double quotes. Also, my array handling could be done better.
Sub printVars()
Dim linesCount As Long
Dim code As String
Dim vbPrj As VBIDE.VBProject
Dim codeMod As VBIDE.CodeModule
Dim regex As VBScript_RegExp_55.RegExp
Dim m As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim i As Long
Dim j As Long
Dim isInDatatypes As Boolean
Dim isInVariables As Boolean
Dim datatypes() As String
Dim variables() As String
Set vbPrj = VBE.ActiveVBProject
Set codeMod = vbPrj.VBComponents("Module1").CodeModule
code = codeMod.Lines(1, codeMod.CountOfLines)
Set regex = New RegExp
With regex
.Global = True ' match all instances
.IgnoreCase = True
.MultiLine = True ' "code" var contains multiple lines
.Pattern = "(\sAs\s)([\w]*)(?=\s)" ' get list of datatypes we've used
' match any whole word after the word " As "
Set matches = .Execute(code)
End With
ReDim datatypes(matches.count - 1)
For i = 0 To matches.count - 1
datatypes(i) = matches(i).SubMatches(1) ' return second submatch so we don't get the word " As " in our array
Next i
With regex
.Pattern = "(\s)([^\.\s][\w]*)(?=\s\=)" ' list of variables
' begins with a space; next character is not a period (handles "with" assignments) or space; any alphanumeric character; repeat until... space
Set matches = .Execute(code)
End With
ReDim variables(matches.count - 1)
For i = 0 To matches.count - 1
isInDatatypes = False
isInVariables = False
' check to see if current match is a datatype
For j = LBound(datatypes) To UBound(datatypes)
If matches(i).SubMatches(1) = datatypes(j) Then
isInDatatypes = True
Exit For
End If
'Debug.Print matches(i).SubMatches(1)
Next j
' check to see if we already have this variable
For j = LBound(variables) To i
If matches(i).SubMatches(1) = variables(j) Then
isInVariables = True
Exit For
End If
Next j
' add to variables array
If Not isInDatatypes And Not isInVariables Then
variables(i) = matches(i).SubMatches(1)
End If
Next i
With regex
.Pattern = "(\sConst\s)(.*)(?=\sAs\s)" 'strongly typed constants
' match anything between the words " Const " and " As "
Set matches = .Execute(code)
End With
For i = 0 To matches.count - 1
'add one slot to end of array
j = UBound(variables) + 1
ReDim Preserve variables(j)
variables(j) = matches(i).SubMatches(1) ' again, return the second submatch
Next i
' print variables to immediate window
For i = LBound(variables) To UBound(variables)
If variables(i) <> "" And variables(i) <> Chr(34) Then ' for the life of me I just can't get the regex to not match doublequotes
Debug.Print variables(i)
End If
Next i
End Sub
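For the names that are declared with Dim, a plain string-based pass over the module text may be enough. The sketch below is not part of the answer above, and DeclaredNames is a hypothetical helper. It assumes one statement per line, vbCrLf line breaks in the text returned by CodeModule.Lines, and no commas inside array bounds (a declaration like Dim a(1, 2) is not parsed properly).

```vba
' Sketch only: DeclaredNames is an illustrative helper, not a library function.
Function DeclaredNames(ByVal code As String) As Collection
    Dim result As New Collection
    Dim codeLines() As String
    Dim parts() As String
    Dim i As Long, j As Long
    Dim ln As String, nm As String

    codeLines = Split(code, vbCrLf)
    For i = LBound(codeLines) To UBound(codeLines)
        ln = Trim$(codeLines(i))
        If LCase$(Left$(ln, 4)) = "dim " Then
            ' "x As Long, y As String" -> one part per declared variable
            parts = Split(Mid$(ln, 5), ",")
            For j = LBound(parts) To UBound(parts)
                nm = Trim$(parts(j))
                If Len(nm) > 0 Then
                    nm = Split(nm, " ")(0) ' drop "As <type>"
                    nm = Split(nm, "(")(0) ' drop array parentheses
                    ' keep only tokens that start with a letter
                    If nm Like "[A-Za-z]*" Then result.Add nm
                End If
            Next j
        End If
    Next i
    Set DeclaredNames = result
End Function
```

You could call it as DeclaredNames(codeMod.Lines(1, codeMod.CountOfLines)) and loop over the returned Collection with For Each. Variables that are never declared still need the assignment-based regex approach.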