Extracting text from string between two identical characters using VBA - vba

Let's say I have the following string within a cell:
E. Stark, T. Lannister, A. Martell, P Baelish, B. Dondarrion, and J. Mormont. Increased levels of nudity across Westeros contributes to its sporadic seasonal climate. Nat. Proc. Aca. Sci. (2011) 3: 142-149.
And I want to extract only the title from this. The approach I am considering is to write a script that says "Pull text from this string, but only if it is more than 50 characters long." This way it only returns the title, and not stuff like " Stark, T" and " Martell, P". The code I have so far is:
Sub TitleTest()
Dim txt As String
Dim Output As String
Dim i As Integer
Dim rng As Range
Dim j As Integer
Dim k As Integer
j = 5
Set rng = Range("A" & j) 'text is in cell A5
txt = rng.Value 'txt is string
i = 1
While j <= 10 'there are five references between A5 and A10
k = InStr(i, txt, ".") - InStr(i, txt, ". ") + 1 'k is supposed to be the length of the string returned, but I can't differentiate one "." from the other.
Output = Mid(txt, InStr(i, txt, "."), k)
If Len(Output) < 100 Then
i = i + 1
ElseIf Len(Output) > 10 Then
Output = Mid(txt, InStr(i, txt, "."), InStr(i, txt, ". "))
Range("B5") = Output
j = j + 1
End If
Wend
End Sub
Of course, this would work well if I weren't trying to pull information from between two "." characters. Is there a way to write the InStr function so that it won't find the same character twice? Am I going about this in the wrong way?
Thanks in advance,
EDIT: Another approach that might work (if possible) is to have one delimiter be "any lowercase letter followed by a period" and the other a plain ".". Would this even be possible? I can't find any example of how this could be achieved...

Here you go, it works exactly as you wish. Judging from your code I am sure that you can adapt it for your needs quite quickly:
Option Explicit
Sub ExtractTextSub()
Debug.Print ExtractText("E. Stark, T. Lannister, A. Martell, P Baelish, B. Dondarrion, and J. Mormont. Increased levels of nudity across Westeros contributes to its sporadic seasonal climate. Nat. Proc. Aca. Sci. (2011) 3: 142-149.")
End Sub
Public Function ExtractText(str_text As String) As String
Dim arr As Variant
Dim l_counter As Long
arr = Split(str_text, ".")
For l_counter = LBound(arr) To UBound(arr)
If Len(arr(l_counter)) > 50 Then
ExtractText = arr(l_counter)
End If
Next l_counter
End Function
Edit: 5 votes in no time made me improve my code a bit :) This version returns the longest string instead of relying on the 50-character threshold. It also adds an error handler and a constant for the point, and appends a point to the end of the extract.
Option Explicit
Public Const STR_POINT = "."
Sub ExtractTextSub()
Debug.Print ExtractText("E. Stark, T. Lannister, A. Martell, P Baelish, B. Dondarrion, and J. Mormont. Increased levels of nudity across Westeros contributes to its sporadic seasonal climate. Nat. Proc. Aca. Sci. (2011) 3: 142-149.")
End Sub
Public Function ExtractText(str_text As String) As String
On Error GoTo ExtractText_Error
Dim arr As Variant
Dim l_counter As Long
Dim str_longest As String
arr = Split(str_text, STR_POINT)
For l_counter = LBound(arr) To UBound(arr)
If Len(arr(l_counter)) > Len(ExtractText) Then
ExtractText = arr(l_counter)
End If
Next l_counter
ExtractText = ExtractText & STR_POINT
On Error GoTo 0
Exit Function
ExtractText_Error:
MsgBox "Error " & Err.Number & ": " & Err.Description
End Function
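To apply the same idea to the cells from the question, a minimal sketch might look like the following. It assumes the references sit in A5:A10 on the active sheet and that the titles should go into column B. LongestSegment and FillTitles are hypothetical names, not part of the answer above, and the extra Trim only drops the leading space that Split leaves behind.

```vba
Option Explicit

' Hypothetical helper: returns the longest piece of sourceText between delimiters.
Public Function LongestSegment(ByVal sourceText As String, ByVal delimiter As String) As String
    Dim parts() As String
    Dim i As Long

    parts = Split(sourceText, delimiter)
    For i = LBound(parts) To UBound(parts)
        ' Compare trimmed lengths so a leading space does not count.
        If Len(Trim$(parts(i))) > Len(LongestSegment) Then
            LongestSegment = Trim$(parts(i))
        End If
    Next i
End Function

' Assumption: references are in A5:A10 of the active sheet, titles go to column B.
Public Sub FillTitles()
    Dim rowIndex As Long

    For rowIndex = 5 To 10
        Cells(rowIndex, "B").Value = LongestSegment(CStr(Cells(rowIndex, "A").Value), ".") & "."
    Next rowIndex
End Sub
```

Run FillTitles from the VBA editor, or call LongestSegment directly from another procedure.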

Related

Split text into 80 character lines, issue with last line

I'm trying to take a body of text and add line breaks at around 80 characters on each line. The issue I'm having is that an unwanted extra line break is added before the last line. For instance, this string should not have a line break at the end of the second-to-last line:
Alice was beginning to get very tired of sitting by her sister on the bank, and
of having nothing to do: once or twice she had peeped into the book her sister
was reading, but it had no pictures or conversations in it, and what is the use
of a book, thought Alice without pictures or
conversations?
should look like this (note "conversations" has been moved up):
Alice was beginning to get very tired of sitting by her sister on the bank, and
of having nothing to do: once or twice she had peeped into the book her sister
was reading, but it had no pictures or conversations in it, and what is the use
of a book, thought Alice without pictures or conversations?
Here's the code:
Sub StringChop()
Dim OrigString As String
Dim NewString As String
Dim counter As Long
Dim length As Long
Dim LastSpace As Long
Dim LineBreak As Long
Dim TempString As String
Dim TempNum As Long
OrigString = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book, thought Alice without pictures or conversations?"
length = Len(OrigString)
counter = 1
Do While counter < length
'Extract next 80 characters from last position
TempString = Mid(OrigString, counter, 80)
'Determine last space in string
LastSpace = InStrRev(TempString, " ")
'Determine first line break in string
LineBreak = InStr(TempString, vbNewLine)
'If line break exists in sentence...
'only count characters up to line break, and set counter to that amount
Select Case LastSpace 'What to do if there are spaces in sentence
Case Is > 0 'There are spaces in sentence
Select Case LineBreak 'What to do if there are line breaks in sentence
Case Is = 0
'From last counter position,
NewString = NewString & Mid(OrigString, counter, LastSpace) & vbNewLine
counter = counter + LastSpace
Case Is <> 0
NewString = NewString & Mid(OrigString, counter, LineBreak)
counter = counter + LineBreak
End Select
Case Is = 0 'There are no more spaces left in remaining sentence
NewString = NewString & Mid(OrigString, counter)
counter = length
End Select
Loop
Debug.Print NewString
End Sub
Word wrapping is an interesting problem. I wrote the following code once as an experiment. You might find it helpful:
Option Explicit
'Implements a dynamic programming approach to word wrap
'assumes fixed-width font
'a word is defined to be a white-space delimited string which contains no
'whitespace
'the cost of a line is the square of the number of blank spaces at the end
'of a line
Const INFINITY As Long = 1000000
Dim optimalCost As Long
Function Cost(words As Variant, i As Long, j As Long, L As Long) As Long
'words is a 0-based array of strings, assumed to have no white spaces
'i, j are indices in range 0,...,n, where n is UBOUND(words)+1
'L is the maximum length of a line
'Cost returns the cost of a line which begins with words(i) and ends with
'words(j-1). It returns INFINITY if the line is too short to hold the words
'or if j <= i
Dim k As Long
Dim sum As Long
If j <= i Or Len(words(i)) > L Then
Cost = INFINITY
Exit Function
End If
sum = Len(words(i))
k = i + 1
Do While k < j And sum <= L
sum = sum + 1 + Len(words(k)) 'for space
k = k + 1
Loop
If sum > L Then
Cost = INFINITY
Else
Cost = (L - sum) ^ 2
End If
End Function
Function WordWrap(words As Variant, L As Long) As String
'returns string consisting of words with spaces and
'line breaks inserted at the appropriate places
Dim v() As Long, d() As Long
Dim n As Long
Dim i As Long, j As Long
Dim candidate As Long
n = UBound(words) + 1
ReDim v(0 To n)
ReDim d(0 To n)
v(0) = 0
d(0) = -1
For j = 1 To n
v(j) = INFINITY 'until something better is found
i = j - 1
Do
candidate = v(i) + Cost(words, i, j, L)
If candidate < v(j) Then
v(j) = candidate
d(j) = i
End If
i = i - 1
Loop While i >= 0 And candidate < INFINITY
If v(j) = INFINITY Then
MsgBox "Some words are too long for the given length"
Exit Function
End If
Next j
optimalCost = v(n)
'at this stage, optimal path has been found
'just need to follow d() backwards, inserting line breaks
i = d(n) 'beginning of current line
WordWrap = words(n - 1)
j = n - 2
Do While i >= 0
Do While j >= i
WordWrap = words(j) & " " & WordWrap
j = j - 1
Loop
If i > 0 Then WordWrap = vbCrLf & WordWrap
i = d(i)
Loop
End Function
The above function expects an array of words. You would have to split a string before using it as input:
Sub test()
Dim OrigString As String
OrigString = "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book, thought Alice without pictures or conversations?"
Debug.Print WordWrap(Split(OrigString), 80)
End Sub
Output:
Alice was beginning to get very tired of sitting by her sister on the bank,
and of having nothing to do: once or twice she had peeped into the book
her sister was reading, but it had no pictures or conversations in it, and
what is the use of a book, thought Alice without pictures or conversations?
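For comparison, a simpler greedy wrap (not the dynamic programming method above) fills each line with as many words as fit and only breaks when the next word would exceed the limit. The sketch below is one way to write it; GreedyWrap is a hypothetical name, and a word longer than the limit is simply placed on a line of its own.

```vba
Option Explicit

' Hypothetical helper: greedy word wrap, breaking only when the next word will not fit.
Public Function GreedyWrap(ByVal sourceText As String, ByVal maxWidth As Long) As String
    Dim words() As String
    Dim currentLine As String
    Dim result As String
    Dim i As Long

    words = Split(sourceText, " ")
    For i = LBound(words) To UBound(words)
        If Len(words(i)) > 0 Then                       ' skip empties caused by double spaces
            If Len(currentLine) = 0 Then
                currentLine = words(i)                  ' first word on a line
            ElseIf Len(currentLine) + 1 + Len(words(i)) <= maxWidth Then
                currentLine = currentLine & " " & words(i)
            Else
                result = result & currentLine & vbNewLine
                currentLine = words(i)                  ' start the next line
            End If
        End If
    Next i

    ' The last line is appended without a trailing line break.
    GreedyWrap = result & currentLine
End Function
```

It takes the original string directly, for example Debug.Print GreedyWrap(OrigString, 80).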

Application.Match not exact value

I have a piece of code that looks for matches between two sheets (Sheet1 is the customer list and rData is a copied PDF with invoices). It is usually an exact match, but in some cases I only want to match the first 6 characters against rData.
Dim rData As Variant
Dim r As Variant
Dim r20 As Variant
Dim result As Variant
Dim i As Long
rData = ActiveWorkbook.Sheets(2).Range("A1:A60000")
r20 = ActiveWorkbook.Sheets(1).Range("C2:C33")
For Each r In r20
result = Application.Match(r, rData, 0)
If Not IsError(result) Then
For i = 1 To 5
If (result - i) > 0 Then
If (Left(Trim(rData(result - i, 1)), 3) = "418") Then
MsgBox "customer: " & r & ". invoice: " & rData(result - i, 1)
End If
End If
Next
For i = 1 To 15
If (result + i) > 0 Then
If (Left(Trim(rData(result + i, 1)), 3) = "418") Then
MsgBox "customer: " & r & ". invoice: " & rData(result + i, 1)
End If
End If
Next
End If
Next r
End Sub
The only part that is giving me a headache is result = Application.Match(r, rData, 0). How do I get a match when the match is not exact?
Sample of Sheet1
This is more or less what it looks like. Matching on CustomerNumber# is easy because it is the same on every invoice. BUT sometimes the invoice does not have it, so I search by CustomerName instead. Sometimes the names are in uppercase and sometimes there is extra text after them, so an exact match cannot be found.
Hope it makes sense.
To match the customer name from your customer list to the customer name in the invoice even if it has extra characters appended, you can use the wildcard * in Match().
You also have a typo in the Match() function. r20 should be rData.
This is your code with the fixes applied:
Sub Test()
'v4
Dim rData As Variant
Dim r As Variant
Dim r20 As Variant
Dim result As Variant
Dim i As Long
rData = ActiveWorkbook.Sheets(2).Range("A1:A60000")
r20 = ActiveWorkbook.Sheets(1).Range("C2:C33")
For Each r In r20
result = Application.Match(r & "*", rData, 0) ' <~ Fixed here
If Not IsError(result) Then
For i = 1 To 5
If (result - i) > 0 Then
If (Left(Trim(rData(result - i, 1)), 3) = "418") Then
MsgBox "customer: " & r & ". invoice: " & rData(result - i, 1)
End If
End If
Next
For i = 1 To 15
If (result + i) <= UBound(rData, 1) Then 'stay inside the array when scanning forwards
If (Left(Trim(rData(result + i, 1)), 3) = "418") Then
MsgBox "customer: " & r & ". invoice: " & rData(result + i, 1)
End If
End If
Next
End If
Next r
End Sub
Notes:
Match() is case insensitive, so it works with different capitalisations.
The data in Sheets(2) must all be text for Match() to work correctly with wildcards.
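The two notes above can be tried out on a small in-memory array before running against real invoice data. This is only a sketch: FindByPrefix is a hypothetical helper name and the sample strings in the usage line are made up for illustration.

```vba
Option Explicit

' Hypothetical helper: position (1-based) of the first entry that starts with
' customerName, or an error value if nothing matches.
Public Function FindByPrefix(ByVal customerName As String, ByVal invoiceLines As Variant) As Variant
    ' The trailing * lets Match() accept any characters after the name;
    ' Match() ignores case when comparing text.
    FindByPrefix = Application.Match(customerName & "*", invoiceLines, 0)
End Function

Public Sub FindByPrefixDemo()
    Dim hit As Variant

    ' Made-up sample data, standing in for the invoice column.
    hit = FindByPrefix("Acme Ltd", Array("Invoice 41812345", "ACME LTD (Stockholm)", "Total due"))

    If IsError(hit) Then
        Debug.Print "no match"
    Else
        Debug.Print "matched at position " & hit
    End If
End Sub
```

Keeping the result in a Variant matters here, because Application.Match returns an error value rather than a number when nothing is found.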
EDIT1: New better version
EDIT2: Refactored constants and made data ranges dynamic
EDIT3: Allows for any prefix to an invoice number of a fixed length
The following is a better, rewritten version of your code:
Sub MuchBetter()
'v3
Const s_InvoiceDataWorksheet As String = "Sheet2"
Const s_InvoiceDataColumn As String = "A:A"
Const s_CustomerWorksheet As String = "Sheet1"
Const s_CustomerStartCell As String = "C2"
Const s_InvoiceNumPrefix As String = "418"
Const n_InvoiceNumLength As Long = 8
Const n_InvScanStartOffset As Long = -5
Const n_InvScanEndOffset As Long = 15
Dim ƒ As Excel.WorksheetFunction: Set ƒ = Excel.WorksheetFunction ' Shortcut
With Worksheets(s_InvoiceDataWorksheet).Range(s_InvoiceDataColumn)
With .Parent.Range(.Cells(1), .Cells(Cells.Rows.Count).End(xlUp))
Dim varInvoiceDataArray As Variant
varInvoiceDataArray = ƒ.Transpose(.Cells.Value2)
End With
End With
With Worksheets(s_CustomerWorksheet).Range(s_CustomerStartCell)
With .Parent.Range(.Cells(1), .EntireColumn.Cells(Cells.Rows.Count).End(xlUp))
Dim varCustomerArray As Variant
varCustomerArray = ƒ.Transpose(.Cells.Value2)
End With
End With
Dim varCustomer As Variant
For Each varCustomer In varCustomerArray
Dim varCustomerIndex As Variant 'Variant, so it can hold the error value Match() returns when nothing is found
varCustomerIndex = Application.Match(varCustomer & "*", varInvoiceDataArray, 0)
If Not IsError(varCustomerIndex) _
And varCustomer <> vbNullString _
Then
Dim i As Long
For i = ƒ.Max(varCustomerIndex + n_InvScanStartOffset, 1) _
To ƒ.Min(varCustomerIndex + n_InvScanEndOffset, UBound(varInvoiceDataArray))
Dim strInvoiceNum As String
strInvoiceNum = Right$(Trim$(varInvoiceDataArray(i)), n_InvoiceNumLength)
If (Left$(strInvoiceNum, Len(s_InvoiceNumPrefix)) = s_InvoiceNumPrefix) Then
MsgBox "customer: " & varCustomer & ". invoice: " & strInvoiceNum
End If
Next
End If
Next varCustomer
End Sub
Notes:
It is a good idea to use constants so all literal values are typed once only and kept grouped together.
Using the RVBA naming convention greatly increases the readability of the code, and reduces the likelihood of bugs.
Using long, appropriately named variables makes the code essentially self-documenting.
Using .Value2 whenever reading cell values is highly recommended (it avoids implicit casting, which makes it slightly faster and eliminates certain issues caused by the casting).
Surprisingly, in VBA there are good reasons to put a variable declaration as close as possible to the first use of the variable. Two such reasons are 1) it improves readability, and 2) it simplifies future refactoring. Just remember that the variable is not reinitialised every time the Dim is encountered. Initialisation only occurs the first time.
The twin loops have been rolled into one according to the DRY principle.
Whilst the check for an empty customer name/number is not strictly necessary if you can guarantee it will never be so, it is good defensive programming as an empty value will cause erroneous results.
The negative index check inside the loop has been removed and replaced with the one-time use of the Max() worksheet function in the For statement.
The Min() worksheet function is also used in the For statement to avoid trying to read past the end of the array.
Always use worksheet functions on the WorksheetFunction object unless you are explicitly checking for errors, in which case use the Application object.
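The last note, on Application versus WorksheetFunction, can be illustrated with a short sketch. The procedure name and sample values are placeholders: the Application form hands back an error value that IsError can test, while the WorksheetFunction form raises a run-time error that has to be trapped.

```vba
Option Explicit

' Hypothetical demo of the two ways to call Match() from VBA.
Public Sub MatchErrorHandlingDemo()
    Dim names As Variant
    Dim viaApplication As Variant
    Dim viaWorksheetFunction As Double

    names = Array("alpha", "beta")

    ' Application.Match: a failed lookup comes back as an error value in a Variant.
    viaApplication = Application.Match("gamma", names, 0)
    Debug.Print "Application.Match returned an error value: " & IsError(viaApplication)

    ' WorksheetFunction.Match: a failed lookup raises a run-time error instead.
    On Error Resume Next
    viaWorksheetFunction = Application.WorksheetFunction.Match("gamma", names, 0)
    Debug.Print "WorksheetFunction.Match raised an error: " & (Err.Number <> 0)
    On Error GoTo 0
End Sub
```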

Excel VBA - Why does this arithmetic comparison of a split string containing numbers work?

I'm wondering why the below code works as I hoped for, considering that I'm splitting a string into an array (that's also defined as a string), and afterwards comparing it in an arithmetic (numeric) way.
Option Explicit
Sub test()
Dim str As String, arr() As String
Dim num As Integer, i As Integer
str = "12 9 30"
num = 20
arr() = Split(str, " ")
For i = LBound(arr) To UBound(arr)
If arr(i) > num Then
MsgBox (arr(i) & " is larger than " & num)
End If
Next i
End Sub
As intended the msgBox within the if statement is fired, showing that:
12 isn't larger than 20
9 isn't larger than 20
30 is larger than 20
I didn't think that such a comparison could work as hoped, as I'm basically comparing a string to an integer. I assume there's something I'm not aware of, but in that case, what is it?
PS. I was a bit in doubt regarding which forum to post in, but based my choice on this meta question
For the answer, please refer to the following article: https://msdn.microsoft.com/en-us/library/aa263418(v=vs.60).aspx
In short, if you compare a string to a variable of a numeric type, the string is converted to the Double* type.
*Double is based on the VB.NET comparison operators reference (https://learn.microsoft.com/en-us/dotnet/visual-basic/language-reference/operators/comparison-operators). VB 6.0, VBA and VB.NET are not the same thing; however, the comparison logic should be the same.
VBA seems to convert the data type implicitly at run time.
Consider the following code, which also works.
Sub test2()
Dim str As String, arr() As String, num As String
Dim i As Integer
str = "12 9 30"
num = 12 '\\ Note the way number is being passed.
arr() = Split(str, " ")
For i = LBound(arr) To UBound(arr)
If arr(i) = num Then
MsgBox (arr(i) & " is equal to " & num)
End If
Next i
End Sub
And then the one below, where the arithmetic operation coerces the strings to numbers at run time.
Sub test3()
Dim str As String, arr() As String, num As String
Dim i As Integer
str = "12 9 30"
num = 12
arr() = Split(str, " ")
For i = LBound(arr) To UBound(arr)
If (arr(i) - num) > 0 Then
MsgBox (arr(i) & " is greater than " & num)
End If
Next i
End Sub
I know this will not answer your question fully, but it might explain why you get the correct result. It is advisable to convert to the correct data type explicitly rather than relying on the defaults, e.g.
If CInt(arr(i)) > num Then
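To see why the explicit conversion matters, compare what happens when both operands are strings with what happens after converting. This is a small sketch with placeholder names; it uses CLng rather than CInt only to avoid overflow on larger values.

```vba
Option Explicit

' Hypothetical demo: string comparison versus numeric comparison.
Public Sub ComparisonDemo()
    Dim textValue As String
    Dim limit As Integer

    textValue = "9"
    limit = 20

    ' Both operands are strings, so they are compared character by character:
    ' "9" sorts after "2", which makes this True.
    Debug.Print textValue > "20"

    ' After an explicit conversion the comparison is numeric: 9 > 20 is False.
    Debug.Print CLng(textValue) > limit
End Sub
```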

Identify Capital Letter in grouped words and insert a comma and space

I am working on a task where I have to copy/paste the content from website into excel.
But the problem is when I copy/paste the content in excel, it appears like this :
Los AngelesNew YorkSilicon Valley
Consumer InternetMobileB2BEnterprise SoftwareE-CommerceMarketplacesSocial
Let's call Los Angeles an item which is merged with another item, New York. I want to separate these items so that the information is readable, like this:
Los Angeles, New York, Silicon Valley
Consumer Internet, Mobile, B2B, Enterprise Software, E-Commerce, Marketplaces, Social
Looking closer, I realized that on the website (for some technical reason) the commas between items could not be copied, so every item was merged with the previous one and starts with a capital letter.
Is there an intelligent way to solve this problem? There are hundreds of entries. This is how I think the problem can be solved:
Identify a capital letter which is not after a space and has small letter previous to it.
Insert a comma and space at that place and continue with the remaining string.
Please feel free to elaborate if this won't work and if there is an alternative solution. VBA code/ Excel Formula - anything that can help me automate it. Thanks.
With "B2B" it would be a bit tougher, but it works pretty well with the others:
Public Sub TestMe()
Debug.Print insert_a_space("Los AngelesNew YorkSilicon Valley")
Debug.Print insert_a_space("Consumer InternetMobileB2BEnterprise SoftwareE-CommerceMarketplacesSocial")
End Sub
Public Function insert_a_space(my_str As String)
Dim my_char As String
Dim l_counter As Long
Dim str_result As String
For l_counter = 1 To Len(my_str)
my_char = Mid(my_str, l_counter, 1)
If Asc(my_char) >= 65 And Asc(my_char) <= 90 Then
If l_counter > 1 Then
If Asc(Mid(my_str, (l_counter - 1), 1)) <> 32 And _
Asc(Mid(my_str, (l_counter - 1), 1)) <> 45 Then
str_result = str_result & ", "
End If
End If
End If
str_result = str_result & my_char
Next l_counter
insert_a_space = str_result
End Function
To use it, run TestMe, or call insert_a_space from a worksheet cell as a user-defined function and pass it the string. The function looks for capital letters (ASCII codes 65 to 90). If the character before the capital letter is neither a space (code 32) nor a hyphen (code 45), it adds a comma and a space to the result before that letter.
Edit:
Workaround for SaaS and B2B
The idea is to introduce an escape symbol: whenever the character before a capital letter is "\", no comma is inserted. The escape symbol is added by str_replace_me, where every exception has to be listed explicitly.
Public Sub TestMe()
Dim str_1 As String
Dim str_2 As String
str_1 = "Los AngelesNew YorkSilicon Valley"
str_2 = "Consumer InternetMobileB2BEnterprise SoftwareE-CommerceMarketplacesSocialSaaS"
Debug.Print insert_a_space(str_replace_me(str_1))
Debug.Print insert_a_space(str_replace_me(str_2))
End Sub
Public Function str_replace_me(my_str As String) As String
str_replace_me = Replace(my_str, "SaaS", "Saa\S")
str_replace_me = Replace(str_replace_me, "B2B", "B2\B")
End Function
Public Function insert_a_space(my_str As String)
Dim my_char As String
Dim l_counter As Long
Dim str_result As String
For l_counter = 1 To Len(my_str)
my_char = Mid(my_str, l_counter, 1)
If Asc(my_char) >= 65 And Asc(my_char) <= 90 Then
If l_counter > 1 Then
If Asc(Mid(my_str, (l_counter - 1), 1)) <> 32 And _
Asc(Mid(my_str, (l_counter - 1), 1)) <> 45 And _
Asc(Mid(my_str, (l_counter - 1), 1)) <> 92 Then
str_result = str_result & ", "
End If
End If
End If
str_result = str_result & my_char
Next l_counter
str_result = Replace(str_result, "\", "")
insert_a_space = str_result
End Function
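Paste this code into a VBA module:
Function AddSpaces(pValue As String) As String
Dim xOut As String
Dim xChar As String
Dim xPrev As String
Dim xAsc As Long
Dim i As Long
xOut = VBA.Left(pValue, 1)
For i = 2 To VBA.Len(pValue)
xChar = VBA.Mid(pValue, i, 1)
xPrev = VBA.Mid(pValue, i - 1, 1)
xAsc = VBA.Asc(xChar)
'Insert ", " before a capital letter unless it follows a space or a hyphen
If xAsc >= 65 And xAsc <= 90 And xPrev <> " " And xPrev <> "-" Then
xOut = xOut & ", " & xChar
Else
xOut = xOut & xChar
End If
Next
AddSpaces = xOut
End Function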
After that go to your spreadsheet and enter this formula =addspaces(A1)
Copy the formula to all the cells that you want to change.
You can copy the content from the website and paste it into Notepad. Then copy the content from Notepad and paste it into Excel.

When does VBA change variable type without being asked to?

I am getting a runtime error I don't understand in Excel 2011 for Mac under OS X 10.7.5. Here is a summary of the code:
Dim h, n, k as Integer
Dim report as Workbook
Dim r1 as Worksheet
Dim t, newline as String
Dim line() as String
newline = vbCr
'
' (code to get user input from a text box, to select a worksheet by number)
'
ReDim line(report.Sheets.Count + 10)
MsgBox "Array line has " & UBound(line) & " elements." '----> 21 elements
line = split(t, newline)
h = UBound(line)
MsgBox "Array line has " & h & " elements." '----> 16 elements
n = 0
MsgBox TypeName(n) '----> Integer
For k = h To 1 Step -1
If IsNumeric(line(k)) Then
n = line(k)
Exit For
End If
Next k
If n > 0 Then
MsgBox n '----> 7
MsgBox TypeName(n) '----> String
Set r1 = report.Sheets(n) '----> Runtime error "Subscript out of bounds"
So n is declared as an integer, but now VBA thinks it is a string and looks for a worksheet named "7". Is this a platform bug, or is there something I haven't learned yet?
It also surprises me that putting data into the dynamic array reduces its dimension, but perhaps that is normal, or perhaps for dynamic arrays Ubound returns the last used element instead of the dimension, although I have not seen that documented.
The first part of your question is answered by @ScottCraner in the comments: the correct syntax for declaring multiple strongly typed variables on one line is:
Dim h As Integer, n As Integer, k As Integer
'...
Dim t As String, newline As String
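A quick way to confirm this is to ask VBA for the type names. The function below is a hypothetical demo with placeholder variable names; it shows that in a declaration such as Dim a, b As Integer only the last variable is typed, and that the untyped one takes on whatever it is assigned.

```vba
Option Explicit

' Hypothetical demo function; the variable names are placeholders.
Public Function DescribeDeclarations() As String
    Dim a, b As Integer                 ' only b is an Integer; a is a Variant
    Dim c As Integer, d As Integer      ' both are Integers
    Dim report As String

    ' An uninitialised Variant reports "Empty".
    report = TypeName(a) & "/" & TypeName(b) & "/" & TypeName(c)

    a = "7"                             ' the Variant now holds a String
    b = "7"                             ' the typed Integer converts the text to 7
    DescribeDeclarations = report & " -> " & TypeName(a) & "/" & TypeName(b)
End Function
```

Calling Debug.Print DescribeDeclarations() in the Immediate window shows the type names before and after the assignments.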
So, I'll address the second part of your question, which is specific to UBound. Unless you've declared Option Base 1 at the top of the module, your arrays start at element 0 by default, not element 1. The Split function, however, always returns a 0-based array regardless of Option Base (if you split a vbNullString, you get an empty array with an LBound of 0 and a UBound of -1):
Private Sub ArrayBounds()
Dim foo() As String
'Always returns 3, regardless of Option Base:
foo = Split("zero,one,two,three", ",")
MsgBox UBound(foo)
ReDim foo(4)
'With Option Base 1 this shows 1,4
'With Option Base 0 (default) this shows 0,4
MsgBox LBound(foo) & "," & UBound(foo)
End Sub
That means this line is extremely misleading...
h = UBound(line)
MsgBox "Array line has " & h & " elements."
...because the Array line actually has h + 1 elements, which means that your loop here...
For k = h To 1 Step -1
If IsNumeric(line(k)) Then
n = line(k)
Exit For
End If
Next k
...is actually skipping element 0. You don't really even need the h variable at all - you can just make your loop parameter this...
For k = UBound(line) To LBound(line) Step -1
If IsNumeric(line(k)) Then
n = line(k)
Exit For
End If
Next k
...and not have to worry what the base of the array is.
BTW, not asked, but storing vbCr as a variable here...
newline = vbCr
...isn't necessary at all, and opens the door for all kinds of other problems if you intend that a "newline" is always vbCr. Just use the pre-defined constant vbCr directly.