Split text and number in VBA

I have column headers that I want to split:
Heading
XA 2009
WW YY 2010
XXA 2011
I want output like:
XA,
WW YY,
XXA
Earlier I was using the FIND function in an Excel formula, which was working fine:
=MID("XA 2009",1,FIND(" ","XA 2009",FIND(" ","XA 2009")+1)-1)
The output was XA,
WW YY
Now the requirement has changed and I have to do this in VBA.
I was trying to use InStr() instead of FIND, as FIND does not work in VBA:
Mid("XA 2009", 1, InStr(1, "XA 2009", " ", InStr(1, "XA 2009", "2")) - 1)
Now the output is XA,
WW instead of WW YY.
Can anyone suggest what I am doing wrong? I am pretty new to VBA.
I am using Excel 2013.

First, see the answer to the following SO question for the general approach and the prerequisites for using regex search in VBA:
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Now, as for your specific requirement, try the following pattern:
(\D*)\s+\d*\s+(\D*)\s+\d*\s+(\D*)\s+\d*
It will work for your precise example, but if the input string needs to be more general you might have to modify the pattern.
Some explanations:
\D* will match zero or more non-digit characters (letters, but also spaces and punctuation)
\s+ will match one or more whitespace characters
\d* will match zero or more digits
the (parentheses) form capture groups, so I used them to surround what you wish to extract from the input string.
If, for example, you know for sure that there is exactly one whitespace character between the tokens, you can use:
[\s]
So the pattern might look like:
(\D*)[\s]\d*[\s](\D*)[\s]\d*[\s](\D*)[\s]\d*
Also, this is a great tool for online pattern testing:
https://regex101.com/
This is the solution for your edited requirement:
In the VBA editor, go to Tools > References, tick the checkbox next to "Microsoft VBScript Regular Expressions 5.5" and press OK.
Add this code to the "ThisWorkbook" module:
Private Sub solution()
    Dim regEx As New RegExp
    Dim strPattern As String
    Dim strInput As String
    Dim myInput As Range
    Dim myOutput As Range

    Set myInput = ActiveSheet.Range("A1")    ' cell holding the heading text
    Set myOutput = ActiveSheet.Range("A2")   ' cell that receives the result

    strPattern = "(\D*)[\s]\d*[\s](\D*)[\s]\d*[\s](\D*)[\s]\d*"
    strInput = myInput.Value

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = strPattern
    End With

    ' Write the three captured text groups, separated by commas
    If regEx.Test(strInput) Then
        myOutput.Value = regEx.Replace(strInput, "$1, $2, $3")
    End If
End Sub

You probably had that formula in a specific cell, right? I think the following should do (note that quotes inside a VBA string literal have to be doubled):
Range("yourcell").Formula = "=MID(""XA 2009"",1,FIND("" "",""XA 2009"",FIND("" "",""XA 2009"")+1)-1)"
Just replace "yourcell" with the address of the cell you had the formula in. So if you had it in cell A1, for example, it should be Range("A1").
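If the year is always the last space-separated token of the heading, a plain string function may be enough and no regex is needed. The sketch below is only one possible approach under that assumption; the name HeaderWithoutYear is made up for illustration. It relies on InStrRev, which searches from the end of the string. Note also that the fourth argument of InStr is the comparison mode, not a second start position, so the nested FIND idiom from the worksheet formula does not translate one-to-one.

```vba
' Hypothetical helper: returns everything before the last space.
' Assumes the heading always ends with " <year>", e.g. "WW YY 2010".
Public Function HeaderWithoutYear(ByVal header As String) As String
    Dim lastSpace As Long

    header = Trim$(header)
    lastSpace = InStrRev(header, " ")       ' position of the last space, 0 if none

    If lastSpace > 0 Then
        HeaderWithoutYear = Left$(header, lastSpace - 1)
    Else
        HeaderWithoutYear = header          ' no space found, nothing to strip
    End If
End Function
```

With this in a standard module, HeaderWithoutYear("WW YY 2010") gives "WW YY" and HeaderWithoutYear("XA 2009") gives "XA".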

Related

extract airlines from flight numbers strings in excel

I have a problem extracting the two-character codes from a string formatted like:
"VA198-VA200-VA197"
I just want to get the string:
"VA-VA-VA"
Also, the data I have is not all in one format; some of it looks like:
"DL123-DL245"
or
"DL123-VA345-HU12-OZ123"
Does anyone know how to do this quickly in Excel? Thanks.
With data in A1, in B1 enter the array formula:
=TEXTJOIN("",TRUE,IF(ISERR(MID(A1,ROW(INDIRECT("1:100")),1)+0),MID(A1,ROW(INDIRECT("1:100")),1),""))
NOTE:
The formula strips out all numeric characters, leaving only the letters and the dashes.
Array formulas must be entered with Ctrl + Shift + Enter rather than just the Enter key. If this is done correctly, the formula will appear with curly braces around it in the Formula Bar.
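For workbooks where TEXTJOIN is not available, the same idea (drop every digit, keep everything else) can be sketched as a small VBA function. This is an illustrative alternative, not part of the formula answer; the name StripDigits is hypothetical.

```vba
' Hypothetical UDF: removes the digits 0-9 and keeps every other character,
' mirroring what the array formula does.
Public Function StripDigits(ByVal s As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        ' keep the character unless it is a single digit
        If Not (ch Like "[0-9]") Then result = result & ch
    Next i

    StripDigits = result
End Function
```

Used in a cell as =StripDigits(A1), the input "VA198-VA200-VA197" becomes "VA-VA-VA".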
There are a couple of ways you can approach this, depending on how many possible segments there are in your string. If we assume your flight number is in A1:
First Segment: =LEFT(A1,2)
Second Segment: =MID(A1,FIND("-",A1)+1,2)
Third Segment: =MID(A1,FIND("-",A1,FIND("-",A1)+1)+1,2)
You could then concatenate the three expressions together and add a fourth. The problem is that, based on your information, you can have anywhere from 1 to 4 (at least) codes, which means each segment after the first needs a conditional:
Second Segment: =IF(ISERR(FIND("-",A1)),"",MID(A1,FIND("-",A1)+1,2))
Adding in the separators, we end up with something like this for up to four segments:
=CONCATENATE(LEFT(A1,2),IF(ISERR(FIND("-",A1)),"",CONCATENATE("-",MID(A1,FIND("-",A1)+1,2))),IF(ISERR(FIND("-",A1,FIND("-",A1)+1)),"",CONCATENATE("-",MID(A1,FIND("-",A1,FIND("-",A1)+1)+1,2))),IF(ISERR(FIND("-",A1,FIND("-",A1,FIND("-",A1)+1)+1)),"",CONCATENATE("-",MID(A1,FIND("-",A1,FIND("-",A1,FIND("-",A1)+1)+1)+1,2))))
This will give you everything in one field.
Here is a VBA answer. It assumes all strings are structured in the same way, meaning two letters followed by numbers, with the items separated by "-". If one such string is in A1 and you want to write the result to B1:
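Sub BreakStrings()
    Dim d As Variant
    Dim i As Long

    d = Split(Range("A1").Value, "-")   ' one element per flight number
    For i = LBound(d) To UBound(d)
        d(i) = Left(d(i), 2)            ' keep only the two-letter airline code
    Next i
    Range("B1").Value = Join(d, "-")
End Sub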
User-defined function (UDF):
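Function GetVal(cell)
    With CreateObject("VBScript.RegExp")
        .Global = True
        ' two word characters, then everything up to the next "-" or the end
        .Pattern = "(\w{2})(.+?)(?=-|$)"
        GetVal = .Replace(cell, "$1")   ' keep only the two-character code
    End With
End Function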

convert an integer variable to a string with leading zeros in VBA

For starters, there are LOTS of questions that have been asked on this topic. However, all the ones I kept clicking on were in languages other than VBA, and I did not understand the syntax of those languages.
When I did a Google search I found this answer, which seemed promising. Ah, fiddlesticks! I just realized that answer is for VB, which probably explains why it is not working in my VBA.
Situation
I have a variable called DimScale that is an integer. I want to create a string called DimName that starts with "mm-" and is followed by the integer from DimScale with leading zeros, such that there are a minimum of 4 characters after "mm-".
IF DimScale = 25
Then DimName = "mm-0025"
IF DimScale = 235
Then DimName = "mm-0235"
Note: DimScale >= 1 and <= 9999
What I have tried
Dim Dimscale as Integer
Dim Dimension_Style_Name as String
String.Format("{0:0000}", DimScale)
Dimension_Style_Name = DimScale$
Dimension_Style_Name.Format("{0:0000}", DimScale)
I have also read the approach where DimScale gets converted to a string and is then sent through a loop that adds a leading zero until the length of the string equals 4 characters (in my case) for the integer part.
I have also seen the approach with If statements, where If DimScale < 10 Then "000" &, If DimScale < 100 Then "00" &, etc.
Is there a way to do it in VBA like the VB method?
Maybe:
DimName = "mm-" & Format(DimScale, "0000")
As per @MathieuGuindon's valuable (as usual) contribution:
Format (fully qualified: VBA.Strings.Format) takes a Variant parameter and returns a Variant. You can also use its little brother Format$, which returns a String instead of a Variant and so avoids an implicit conversion of the result.
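A minimal sketch of that suggestion, wrapped in a function so it can be tried from the Immediate window. The function name BuildDimName is made up for illustration; only Format$ and the "0000" format string come from the answer above.

```vba
' Hypothetical wrapper around the Format$ call from the answer.
' "0000" pads the number with leading zeros to at least four digits.
Public Function BuildDimName(ByVal scaleValue As Integer) As String
    BuildDimName = "mm-" & Format$(scaleValue, "0000")
End Function
```

For example, BuildDimName(25) returns "mm-0025" and BuildDimName(235) returns "mm-0235".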
I had a similar need to apply leading zeros (12 to 00012) to a specified range, but everything I had found thus far used an iterative cell-by-cell approach. I found an older but still valuable posting from Siddharth Rout. His posting pertains to case conversion (lower to upper case), but I found it adapted nicely to applying leading zeros.
Here is the link to Siddharth's posting:
Convert an entire range to uppercase without looping through all the cells
Here is the adaptation for applying leading zeros to a specified range:
Sub rngLeadingZeros(rng As Range, nbrZeros As Integer)
    ' Add leading zeros to a specified range.
    Dim strZeros As String

    ' build the format string required by the TEXT() function, e.g. "00000":
    strZeros = String(nbrZeros, "0")

    ' make sure the range is formatted as text ("@" is the Text number format):
    rng.NumberFormat = "@"

    ' apply the format to the whole range in one step:
    rng = Evaluate("index(text(" & rng.Address & ", """ & strZeros & """),)")
End Sub

Sub testZ()
    With ActiveSheet
        rngLeadingZeros .Range("e3:e9"), 5
    End With
End Sub

Extract first two digits that comes after some string in Excel

I have rows with values like the ones below. How can I extract the first two digits that come after the text 'ABCD' into another cell, with a formula or VBA? There may be a few characters in between, or sometimes none.
ABCD 10 sadkf sdfas
ABCD-20sdf asdf
ABCD 40
ABCD50 asdf
You can do this with a worksheet formula. No need for VBA.
Assuming you do not need to test for the presence of two digits:
=MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2)
If you need to test for the presence of two digits, you can try:
=IF(ISNUMBER(-RIGHT(MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),1)),MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),"Invalid")
In general, it is always a good idea to show some code on Stack Overflow. That way you show that you have tried something, and you give some direction for the answers.
Concerning the extraction of the first two digits, there are many ways to do this, ranging from RegEx to simply looping over the characters and checking each one of them.
This is the loop option:
Public Function ExtractTwoDigits(inputString As String) As Long
    Application.Volatile
    Dim cnt As Long
    Dim curChar As String
    Dim digits As String    ' digits collected so far

    For cnt = 1 To Len(inputString)
        curChar = Mid(inputString, cnt, 1)
        If IsNumeric(curChar) Then
            digits = digits & curChar
            If Len(digits) = 2 Then
                ExtractTwoDigits = CLng(digits)
                Exit Function
            End If
        End If
    Next cnt
    ExtractTwoDigits = -1   ' fewer than two digits found
End Function
Application.Volatile makes the function recalculate whenever the worksheet recalculates;
-1 is the answer if there are fewer than two digits in the inputString;
IsNumeric checks whether the string passed to it is numeric.
As a further step, you may try to make the function a bit more general, extracting the first 1, 3, 4 or 5 digits depending on a parameter that you pass. Something like =ExtractTwoDigits("tarato123ra2",4), returning 1232.
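The generalisation suggested in the last point could look roughly like the sketch below. The name ExtractFirstDigits and the optional howMany parameter are assumptions made for illustration. Like the loop version, it collects digits in order of appearance, even when other characters sit between them.

```vba
' Hypothetical generalisation of the loop version: returns the first
' howMany digits found in inputString as a number, or -1 if the string
' contains fewer digits than requested.
' howMany should stay at 9 or below so the result fits into a Long.
Public Function ExtractFirstDigits(ByVal inputString As String, _
                                   Optional ByVal howMany As Long = 2) As Long
    Dim cnt As Long
    Dim curChar As String
    Dim digits As String    ' digits collected so far

    For cnt = 1 To Len(inputString)
        curChar = Mid$(inputString, cnt, 1)
        If curChar Like "[0-9]" Then
            digits = digits & curChar
            If Len(digits) = howMany Then
                ExtractFirstDigits = CLng(digits)
                Exit Function
            End If
        End If
    Next cnt

    ExtractFirstDigits = -1
End Function
```

With this, =ExtractFirstDigits("tarato123ra2",4) returns 1232, and =ExtractFirstDigits("ABCD-20sdf asdf") returns 20.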
RegEx Version:
Public Function GetFirstTwoNumbers(ByVal strInput As String) As Integer
    Dim reg As New RegExp, matches As MatchCollection

    With reg
        .Global = True
        .Pattern = "(\d{2})"
    End With

    Set matches = reg.Execute(strInput)
    If matches.Count > 0 Then
        GetFirstTwoNumbers = CInt(matches(0).Value)   ' first pair of adjacent digits
    Else
        GetFirstTwoNumbers = -1                       ' no two adjacent digits found
    End If
End Function
You have to enable "Microsoft VBScript Regular Expressions 5.5" under Tools > References. The pattern (\d{2}) matches 2 digits; the return value is that number, or -1 if there is no match.
Note: it only extracts 2 successive digits.
If you place this function into a module, you can use it like a normal formula.
Here is a great site to get into RegEx.

How to sort excel values with numbers in end

I have a macro which reads file names from a folder. The problem is that when the file names are in a series like A1, A2 ... A200.pdf, Excel reads them in as A1, A10, A100, A101 ... A109, A11, A110 ... A119, A20.
How can I sort this so that the values in Excel come in the same order as the file names in the folder, or is there a way I can sort in Excel itself?
You can sort this in Excel with a helper column. Create a new column and calculate the length of each file name in it with =LEN(A1). Then use a two-level sort to sort your file names. Data -> Sort: use the length in the first level and the file names in the second level.
Another option is to use the RegEx object to extract the numeric digits captured inside the file name.
Option Explicit

Sub SortFileNames()
    Dim i As Long

    With Sheets("Sheet1") ' replace "Sheet1" with your sheet's name
        For i = 1 To .Cells(.Rows.Count, "A").End(xlUp).Row
            .Range("B" & i).Value = RetractNumberwithRegex(.Range("A" & i)) ' call the Regex function
        Next i
    End With
End Sub

'========================================================================
Function RetractNumberwithRegex(Rng As Range) As String
    ' function uses the Regex object to extract the numeric value inside the cell
    Dim Reg1 As Object
    Dim Matches As Object

    Set Reg1 = CreateObject("vbscript.regexp")
    With Reg1
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "[0-9]{1,20}" ' any size numeric string (up to 20 digits length)
    End With

    Set Matches = Reg1.Execute(Rng.Value2)
    If Matches.Count <> 0 Then
        RetractNumberwithRegex = Matches.Item(0)   ' first run of digits in the name
    End If
End Function
This is happening, of course, because of the different sorting algorithms used in the two cases (Windows Explorer and Excel). Refer to this article if you want to understand the details.
To solve your problem, one way is to pull out only the numeric part of the file names into a different cell (say column B) and then sort based on those numbers.
If I can assume that the pattern of the file names is AXXX.pdf, i.e. one letter A, then a number, then 4 characters for the file extension, you can use this formula:
=VALUE(MID(A1,2,LEN(A1)-5))
This works by pulling a number of characters out of the middle of the string. As per the assumption, the number starts at the 2nd position, which is why the second parameter is 2. To decide how many characters to pull, you know that all the characters except 'A' (1 char) and '.pdf' (4 chars) make up the number. So take the length of the whole name and subtract 5 characters. You get the number part, which you can sort on.
The best way is to change the file names in your Excel list to have leading zeroes. Instead of A19, refer to the file as A019 and it will sort correctly. Convert the file names using this formula in a helper column.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 3), 3)
Note that the 3 zeroes and string lengths of 3 are all related to each other. To create fixed length numbers of 4 digits, just use 4 zeroes and increase both string lengths to 4.
Copy the formula down from row 2 to the end. Copy the helper column, paste Values in place and, when everything is perfect, replace the original column with the helper.
To accommodate a fixed number of characters following the number, the above formula may be tweaked. The formula below accommodates 4 extra characters that follow the number, for example ".pdf" (including the period).
=Left($A2, 1) & Right("000" & Mid($A2, 2, 7), 7)
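The same padding idea can be sketched as a VBA user-defined function for cases where the extension length varies. This is an illustration only: the name PadFileNumber and its numDigits parameter are hypothetical, and it assumes the file name is one leading letter, then digits, then an optional extension.

```vba
' Hypothetical UDF: pads the number after the first letter with leading zeros,
' e.g. "A19.pdf" -> "A019.pdf" for numDigits = 3.
' Assumes the pattern <one letter><digits><rest>, as in the formulas above.
Public Function PadFileNumber(ByVal fileName As String, _
                              Optional ByVal numDigits As Long = 3) As String
    Dim i As Long

    ' advance i past the run of digits that starts at position 2
    i = 2
    Do While i <= Len(fileName)
        If Not (Mid$(fileName, i, 1) Like "[0-9]") Then Exit Do
        i = i + 1
    Loop

    ' the digits are Mid$(fileName, 2, i - 2); the rest starts at position i
    PadFileNumber = Left$(fileName, 1) & _
                    Format$(Val(Mid$(fileName, 2, i - 2)), String$(numDigits, "0")) & _
                    Mid$(fileName, i)
End Function
```

In a helper column, =PadFileNumber(A2) turns "A19.pdf" into "A019.pdf", which then sorts in the expected order.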

delete data in cell after specific character

I have data in cells A1:A1000. It is a list of names followed by a small note, like this:
sam" fast
nick" long
tom" quick
They all have " and a space after the name, and then the note. What I am trying to do is delete everything after the name.
I was playing around with macros to try and do this, but could not get anything to work. Any idea how I might do this?
Here is a nifty trick without macros:
Select the proper range (or just click on the A header to select the entire column), press Ctrl+F and click Replace. In Find what, type exactly "* and leave the Replace with box empty. Now click Replace All and ta-da!
It replaces everything after (and including) the quote with nothing, because * acts as a wildcard and the Replace with box is empty.
Edit: As suggested here is the VBA code for this:
Columns("A:A").Replace What:="""*", Replacement:="", LookAt:=xlPart
Easy! I don't know what version of Excel you are using, but in short you want to do a Convert Text to Columns and split the cells using a delimiter of ". This will leave you with two columns: one with the data you want and one you can just delete.
Here is the walkthrough in Office 2010:
Highlight column A
Find the Data menu
Find the Convert Text to Columns menu
Pick Delimited and hit Next
In the Other box, type "
Hit Finish
Done! Now you have all your names in column A and you can just delete column B.
To sum up, do a "Convert Text to Columns" and split the cells using a delimiter of ". Super easy and fast.
A few options:
Replace
Range("A1:A1000").Replace """*", vbNullString
If you need to manipulate the value further, the approaches below are more appropriate:
With Regex:
Dim cell As Range
With CreateObject("vbscript.regexp")
    .Pattern = "\""(.*)"    ' a double quote and everything after it
    .Global = True
    For Each cell In Range("A1:A1000")
        cell.Value = .Replace(cell.Value, vbNullString)
    Next cell
End With
Without Regex, splitting the string:
Dim cell As Range
For Each cell In Range("A1:A1000")
    If cell.Value <> vbNullString Then
        cell.Value = Split(cell.Value, """")(0)   ' keep the part before the first quote
    End If
Next cell
In case you want to keep your source data, you can also do it with a simple Excel formula in the next column. Assuming that your data is in column A, the following formula will return only the name: =LEFT(A1,SEARCH("""",A1)-1)
Sub Macro1()
    Dim Row As Long, Pos As Long
    Dim S As String

    For Row = 1 To 1000
        S = Range("A" & Row).Value
        Pos = InStr(S, Chr(34))     ' Chr(34) is the double-quote character
        If Pos > 0 Then Range("A" & Row).Value = Left(S, Pos - 1)
    Next
End Sub
Press Ctrl+F, click on the Replace tab, type "* in the Find what box and then click Replace All. There is no need to put anything in the Replace with box. Here you are replacing the quote and everything after it.