How to skip splitting delimited string inside a string array in VBA

I have a VBA macro where I need Split to keep the quoted part together, so that splitStr(1) = str2,str3; splitStr(2) = str4; and so on. Currently I get this:
string1 = "str1,""str2,str3"",str4,str5,str6"
splitStr = Split(string1,",")
Debug.Print splitStr(1)
"str2
Thanks

The Split(expression [,delimiter] [,limit] [,compare]) function scans your string for every occurrence of the delimiter (a single character or a longer substring). It has no notion of quotation marks, so the comma inside the quotes is treated like any other comma.
One option is to use Split(string1,",") and repair the resulting array in splitStr, which initially looks as follows:
splitStr(0) = str1
splitStr(1) = "str2
splitStr(2) = str3"
splitStr(3) = str4
splitStr(4) = str5
splitStr(5) = str6
Then use the following code to "fix" your result the way you want:
splitStr(1) = Replace(splitStr(1), """", "") & "," & Replace(splitStr(2), """", "")
'splitStr(1) now contains str2,str3
'splitStr(2) still contains str3", and str4 stays at splitStr(3)
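The manual repair above could be generalized into a small helper that rejoins the pieces belonging to one quoted field. This is only a sketch: SplitOutsideQuotes is a hypothetical name (not a built-in VBA function), and it assumes that quotes are never escaped inside a field.

```vba
Option Explicit

' Hypothetical helper: splits on commas but keeps commas that sit
' between double quotes. Returns a 1-based Collection of fields
' with the surrounding quotes removed.
Public Function SplitOutsideQuotes(ByVal source As String) As Collection
    Dim pieces() As String
    Dim result As New Collection
    Dim current As String
    Dim insideQuotes As Boolean
    Dim quoteCount As Long
    Dim i As Long

    pieces = Split(source, ",")
    For i = LBound(pieces) To UBound(pieces)
        If insideQuotes Then
            current = current & "," & pieces(i)   ' still inside a quoted field
        Else
            current = pieces(i)                   ' start of a new field
        End If
        ' An odd number of quotes in this piece opens or closes a quoted field
        quoteCount = Len(pieces(i)) - Len(Replace(pieces(i), """", ""))
        If quoteCount Mod 2 = 1 Then insideQuotes = Not insideQuotes
        If Not insideQuotes Then result.Add Replace(current, """", "")
    Next i
    ' Assumption: with an unbalanced quote, keep the remainder as one field
    If insideQuotes Then result.Add Replace(current, """", "")

    Set SplitOutsideQuotes = result
End Function
```

With the string from the question, SplitOutsideQuotes(string1)(2) would be str2,str3 and item 3 would be str4. Note that a Collection is 1-based, unlike the 0-based array returned by Split.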

Split strings maintaining tokens within quotation marks
Try this approach, which temporarily converts the string to a Byte array so that you can loop through it faster than with string functions.
Method
All commas within quotation marks are temporarily converted to another character (e.g. a semicolon), so that you can split on commas as usual and afterwards change the semicolons back to commas. This assumes the original string contains no semicolons of its own.
See further explanations in the following code example:
Sub testByte()
    Const COMMA& = 44, QUOT& = 34, SEMI& = 59   ' character codes for comma, quotation mark, semicolon
    Dim b() As Byte, i As Long, bQU As Boolean
    Dim string1$, sTemp$
    ' [0] string to be split
    string1 = "str1,""str2,str3"",str4,str5,str6"
    ' [1] temporarily convert commas within quotation marks to another character using the Byte type
    b = string1                                 ' assign string to bytes (two bytes per character)
    For i = 0 To UBound(b) Step 2               ' check the first byte of each character
        If b(i) = QUOT Then bQU = Not bQU       ' toggle the "inside quotation" flag
        If bQU Then If b(i) = COMMA Then b(i) = SEMI ' change commas in quotation to temporary semicolons
    Next i
    sTemp = b                                   ' convert bytes back to normal string type
    ' [2] split as usual
    Dim splitStr As Variant
    splitStr = Split(sTemp, ",")
    ' [3] check all results
    For i = 0 To UBound(splitStr)               ' list all split items and ...
        Debug.Print i, Replace(splitStr(i), ";", ",") ' ... change semicolons back to commas
    Next i
End Sub
'Results in immediate window
'---------------------------
'0 str1
'1 "str2,str3"
'2 str4
'3 str5
'4 str6
Note
The variables string1 and sTemp are intentionally redundant, so that you can compare the original and the converted string while testing.

Related

How to split a Unicode string into readable characters?

I have a VBA function that splits a string and adds a space between each character. It works fine only for an ASCII string, but I want to do the same for the Tamil language. Since it is Unicode, the result is not readable. It splits even the auxiliary characters (upper dots, prefix and suffix auxiliary characters), which should not be separated in Tamil, Hindi, Kannada, Malayalam, or any other Indian language. So, how do I write a function that splits a Tamil word into readable characters?
Function AddSpace(Str As String) As String
Dim i As Long
For i = 1 To Len(Str)
AddSpace = AddSpace & Mid(Str, i, 1) & " "
Next i
AddSpace = Trim(AddSpace)
End Function
Adding Space is not the important point of this question. Splitting the Unicode string into an array from any of those languages is the requirement.
For example, the word, "பார்த்து" should be separated as "பா ர் த் து", not as "ப ா ர ் த ் த ு". As you can see, the first two letters "பா" (ப + ா) are combined. If I try to manually put a space in between them, I can't do it in any word processor. If you want to test, please put it in Notepad and add space between each character. It won't allow you to separate as ("ப ா"). So "பார்த்து" should be separated as "பா ர் த் து". It is the correct separation in Tamil like languages. This is the one that I am struggling to achieve in VBA.
The Character Code table for Tamil is here.
Tamil, Hindi, and many other Indian languages have (1) consonants, (2) independent vowels, (3) dependent vowel signs, and (4) two-part dependent vowel signs. The first two types are each a separate letter, so there are no issues with them. The last two are dependent and should not be separated from the character they are joined to. For example, the letter பா (ப + ா) contains one independent character (ப) and one dependent sign (ா).
If this info is not enough, please comment on what else I should post.
(Note: It is possible in C#.NET using the code from the MS link by @Codo)
You can assign a string to a Byte array, so the following might work:
Dim myBytes() As Byte
myBytes = "Tamilstring"
This generates two bytes for each character. You could then create a second byte array twice the size of the first by using Space$ to create a suitable string, and then use a For loop (Step 4) to copy two bytes at a time from the first array to the second. Finally, assign the second byte array back to a string.
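The byte-copy idea described above could be sketched as follows. AddSpaceViaBytes is a hypothetical name chosen for illustration. Note that this spaces out every UTF-16 code unit, so on its own it would still separate Tamil dependent signs from their base letters; it only illustrates the mechanics of the byte arrays.

```vba
Option Explicit

' Hypothetical helper: inserts a space after every UTF-16 code unit
' by copying byte pairs into a pre-filled, twice as large byte array.
Public Function AddSpaceViaBytes(ByVal source As String) As String
    Dim src() As Byte
    Dim dst() As Byte
    Dim result As String
    Dim i As Long

    If Len(source) = 0 Then Exit Function

    src = source                     ' two bytes per character
    dst = Space$(Len(source) * 2)    ' four bytes per character, all spaces for now

    ' Source advances 2 bytes per character, destination 4 bytes per character
    For i = 0 To UBound(src) Step 2
        dst(i * 2) = src(i)
        dst(i * 2 + 1) = src(i + 1)
    Next i

    result = dst                             ' convert bytes back to a string
    AddSpaceViaBytes = Left$(result, Len(result) - 1)   ' drop the trailing space
End Function
```

For example, AddSpaceViaBytes("abc") would give a b c.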
The problem you have is you are looking for what Unicode calls an extended grapheme cluster.
For a Unicode compatible regex engine that is simply /\X/
Not sure how you do that in VBA.
Referring to the link mentioned by @ScottCraner in the comments on the question and to the character codes for Tamil.
Check the result in cell A2. The cells highlighted in yellow are the dependent vowel signs, which are used in the DepVow string.
Sub Split_Unicode_String()
    'https://stackoverflow.com/questions/68774781/how-to-split-an-unicode-string-to-readable-characters
    Dim my_string As String     'input string
    Dim buff() As String        'array of input string characters
    Dim DepVow As String        'string of dependent vowel signs
    Dim newStr As String        'result string with spaces as desired
    Dim i As Long
    my_string = Range("A1").Value
    If Len(my_string) = 0 Then Exit Sub
    ReDim buff(Len(my_string) - 1) 'array of my_string characters
    For i = 1 To Len(my_string)
        buff(i - 1) = Mid$(my_string, i, 1)
        Cells(1, i + 2) = buff(i - 1)
        Cells(2, i + 2) = AscW(buff(i - 1)) 'used this for creating DepVow below
    Next i
    'Dependent signs that occur in the sample word; extend this string for other words
    DepVow = ChrW$(3006) & ChrW$(3021) & ChrW$(3009)
    newStr = ""
    i = LBound(buff)
    Do While i <= UBound(buff)
        newStr = newStr & buff(i)
        'Attach a following dependent sign, if any, to the current character
        'The bounds check avoids reading past the end of buff
        If i < UBound(buff) Then
            If InStr(1, DepVow, buff(i + 1), vbBinaryCompare) > 0 Then
                i = i + 1
                newStr = newStr & buff(i)
            End If
        End If
        newStr = newStr & " "
        i = i + 1
    Loop
    'result string in range A2
    Cells(2, 1) = Left$(newStr, Len(newStr) - 1)
End Sub
Try the algorithm below, which concatenates all the mark characters with the preceding letter characters.
Dim letters() As String
Dim n As Long, i As Long, ch As String
ReDim letters(0 To Len(Str) - 1)    'upper bound on the number of letters; assumes Str is not empty
For i = 1 To Len(Str)
    ch = Mid$(Str, i, 1)
    If n > 0 And AscW(ch) > 3005 And AscW(ch) < 3022 Then
        letters(n - 1) = letters(n - 1) & ch    'mark: append to the previous letter
    Else
        letters(n) = ch                         'letter: start a new entry
        n = n + 1
    End If
Next i
ReDim Preserve letters(0 To n - 1)
MsgBox Join(letters, ", ") 'shows பா, ர், த், து

select only text between quotes in VB

Is there a function that returns the text between quotes? The text before and between the quotes has variable length. I found the Mid function, but it requires a specified length. I would like to get the text between both pairs of quotes (APP_STATUS_RUNNING and PRIMARY) from this string:
string: Error gim_icon_cfg_1 Application with DBID 736 has status "APP_STATUS_RUNNING", runmode "PRIMARY"
Thank you
EDIT: I tried to send the output to a label, but I get the error BC30452: Operator '&' is not defined for types 'String' and '1-dimensional array of String'.
Dim output As String = myProcess.StandardOutput.ReadToEnd()
Dim StandardError As String = myProcess.StandardError.ReadToEnd()
Dim Splitted() As String
Splitted = Split(output, """")
Label1.text="Ahoj " & Splitted & "Error " & StandardError
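You can split your string using the quote as the delimiter: Split(InputString, """"). The odd-numbered elements of the output array are then the strings between quotes, and the even-numbered elements are the rest. To show a result in a label, concatenate a single element such as Splitted(1) rather than the whole array, which is what causes error BC30452.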
Option Explicit
Public Sub Example()
    Dim InputString As String
    InputString = "Error gim_icon_cfg_1 Application with DBID 736 has status ""APP_STATUS_RUNNING"", runmode ""PRIMARY"""
    Dim Splitted() As String
    Splitted = Split(InputString, """")
    ' between quotes (all odd numbers)
    Debug.Print Splitted(1) ' APP_STATUS_RUNNING
    Debug.Print Splitted(3) ' PRIMARY
    ' rest (all even numbers)
    Debug.Print Splitted(0) ' Error gim_icon_cfg_1 Application with DBID 736 has status
    Debug.Print Splitted(2) ' , runmode
End Sub
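If the number of quoted parts is not known in advance, the odd-index rule could be wrapped in a loop. The following is a minimal sketch in VBA; QuotedParts is a hypothetical name, and the sketch assumes that quotes are not escaped inside the quoted text.

```vba
Option Explicit

' Hypothetical helper: returns every piece of text that sits between
' a pair of double quotes, as a 1-based Collection.
Public Function QuotedParts(ByVal source As String) As Collection
    Dim parts() As String
    Dim result As New Collection
    Dim i As Long

    parts = Split(source, """")
    ' Odd indexes lie between quotes. Stopping at UBound - 1 skips a
    ' trailing odd piece that has an opening quote but no closing quote.
    For i = 1 To UBound(parts) - 1 Step 2
        result.Add parts(i)
    Next i

    Set QuotedParts = result
End Function
```

For the string from the question, QuotedParts(InputString)(1) would be APP_STATUS_RUNNING and item 2 would be PRIMARY.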
Or:
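The following code gives you what you are looking for.
It splits the string on spaces, keeps the words that contain a pair of quotes, and takes from each of them the element at index 1 after splitting on the quote (index 0 is the text before the opening quote, index 1 is the quoted text, and index 2 is the text after the closing quote). Note that this only works when the quoted text itself contains no spaces.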
Dim s As String = "Error gim_icon_cfg_1 Application With DBID 736 has status ""APP_STATUS_RUNNING"", runmode ""PRIMARY"""
Dim parts() As String = s.Split(" "c).Where(Function(el) el Like "*""*""*").Select(Function(el) el.Split(""""c)(1)).ToArray

Replace characters in an XML file of about 30000 characters

I have an XML file with about 30000 characters I want to put into a String variable. In that variable, I want to replace certain characters using Replace function.
In the below code neither Replace nor Instr detect the characters I'm looking for. When I shorten the XML to around 3000 characters, the code works.
Strings can be up to 2 billion characters long, so what's going on here?
The only thing that comes to mind is that str is a String and the Instr/Replace require a String Expression as an argument. Is there a limit on the length of String Expression these functions can handle?
Dim str As String
Dim arrInvalidChars() As String
Dim arrValidChars() As String
Dim i As Integer
arrInvalidChars = Split(Expression:="ä, ü, ö, ß, é, ç", Delimiter:=", ")
arrValidChars = Split(Expression:="ae, ue, oe, ss, e, c", Delimiter:=", ")
For i = LBound(arrInvalidChars) To UBound(arrInvalidChars)
On Error Resume Next
Debug.Print InStr(1, str, arrInvalidChars(i), vbTextCompare)
str = Replace(str, arrInvalidChars(i), arrValidChars(i), , , vbTextCompare)
On Error GoTo 0
Next i

VBA String Delimiter

How can I split a string in VBA after a certain amount of the same delimiter?
For example : {"Josh","Green"},{"Peter","John"}.
Here I would like {"Josh","Green"} as the first record in an array and {"Peter","John"} as the second. I want to avoid parsing the string character by character.
There are several ways to do this, my suggestion:
Replace },{ with something else before the split, to create a new delimiter.
For example:
Option Explicit
Sub Test()
Const c As String = "{""Josh"",""Green""},{""Peter"",""John""}"
Dim s As String
Dim v As Variant
s = Replace(c, "},{", "}#,#{", 1)
v = Split(s, "#,#")
Debug.Print v(0) '{"Josh","Green"}
Debug.Print v(1) '{"Peter","John"}
End Sub
That will split s into a Variant-array v with two strings, v(0) and v(1), instead of four strings, which you would get if you split the original string with just , as a delimiter.
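A variation on the same idea is to split directly on },{ and then restore the braces that the split removes. This is a sketch only: SplitRecords is a hypothetical name, and it assumes that the sequence },{ never occurs inside a record. The design choice is that no placeholder such as #,# is needed, so it cannot clash with data that happens to contain the placeholder.

```vba
Option Explicit

' Hypothetical helper: splits "{...},{...},{...}" into its records
' and puts back the braces consumed by the "},{" delimiter.
Public Function SplitRecords(ByVal source As String) As String()
    Dim parts() As String
    Dim i As Long

    parts = Split(source, "},{")
    For i = LBound(parts) To UBound(parts)
        If i > LBound(parts) Then parts(i) = "{" & parts(i)   ' restore opening brace
        If i < UBound(parts) Then parts(i) = parts(i) & "}"   ' restore closing brace
    Next i

    SplitRecords = parts
End Function
```

With the constant c from the example above, SplitRecords(c)(0) would be {"Josh","Green"} and SplitRecords(c)(1) would be {"Peter","John"}.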

Comparing character only to character at end of string

I am writing a program in Visual Basic 2010 that lists how many times a word of each length occurs in a user-inputted string. Although most of the program is working, I have one problem:
When looping through all of the characters in the string, the program checks whether there is a next character (such that the program does not attempt to loop through characters that do not exist). For example, I use the condition:
If letter = Microsoft.VisualBasic.Right(input, 1) Then
Where letter is the character, input is the string, and Microsoft.VisualBasic.Right(input, 1) extracts the rightmost character from the string. Thus, if letter is the rightmost character, the program will cease to loop through the string.
This is where the problem comes in. Let us say the string is "This sentence has five words". The rightmost character is an s, but an s is also the fourth and sixth character. That means that the first and second s will break the loop just as the last one will.
My question is whether there is a way to ensure that only the last s, or whatever character is the last one in the string, can break the loop.
There are a few methods you can use for this, one as Neolisk shows; here are a couple of others:
Dim breakChar As Char = "s"c
Dim str As String = "This sentence has five words"
str = str.Replace(".", " ")
str = str.Replace(",", " ")
str = str.Replace(vbTab, " ")
' other chars to replace
Dim words() As String = str.ToLower.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
For Each word As String In words
    If word.StartsWith(breakChar) Then Exit For
    Console.WriteLine("M1 Word: ""{0}"" Length: {1:N0}", word, word.Length)
Next
If you need to loop though chars for whatever reason, you can use something like this:
Dim breakChar As Char = "s"c
Dim str As String = "This sentence has five words"
str = str.Replace(".", " ")
str = str.Replace(",", " ")
str = str.Replace(vbTab, " ")
' other chars to replace
'method 2
Dim word As New System.Text.StringBuilder
Dim words As New List(Of String)
For Each c As Char In str.ToLower.Trim
    If c = " "c Then
        If word.Length > 0 Then 'support multiple white-spaces (double-space etc.)
            Console.WriteLine("M2 Word: ""{0}"" Length: {1:N0}", word.ToString, word.ToString.Length)
            words.Add(word.ToString)
            word.Clear()
        End If
    Else
        If word.Length = 0 AndAlso c = breakChar Then Exit For
        word.Append(c)
    End If
Next
If word.Length > 0 Then
    words.Add(word.ToString)
    Console.WriteLine("M2 Word: ""{0}"" Length: {1:N0}", word.ToString, word.ToString.Length)
End If
I wrote these specifically to break on the first letter in a word as you ask, adjust as needed.
VB.NET code to calculate how many times a word of each length occurs in a user-inputted string:
Dim sentence As String = "This sentence has five words"
Dim words() As String = sentence.Split(" "c)
Dim v = From word As String In words Group By L = word.Length Into Group Order By L
Line 2 may need to be adjusted to remove punctuation characters, trim extra spaces, etc.
In the above example, v(i).L contains the word length, and v(i).Group.Count contains how many words of this length were encountered. For debugging purposes, you also have v(i).Group, which is a collection of String containing all words belonging to this group.