How to sort Excel values with numbers at the end - VBA

I have a macro which reads file names from a folder. The problem is that when the file names are in a series like A1, A2, ..., A200.pdf, they come into Excel in the order A1, A10, A100, A101, ..., A109, A11, A110, ..., A119, A20.
How can I sort them so that the order in Excel is the same as the order of the file names in the folder, or is there a way to sort them in Excel itself?

You can sort this in Excel with a helper column. Create a new column and calculate the length of each filename in it with "=LEN(A1)". Then use a two-level sort (Data -> Sort): sort by length in the first level and by filename in the second level. This works because all names share the same prefix and extension, so a shorter name always holds a smaller number.
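To check the idea outside Excel, here is a minimal Python sketch of the same two-level sort (length first, then name). It assumes every name has the same one-letter prefix and the same extension; the function name is illustrative only. Note that Python compares text case-sensitively, unlike Excel's default sort.

```python
def length_then_name(names):
    """Sort names by (length, text), like a two-level Excel sort on LEN then name."""
    # Assumes a shared prefix and extension, so a longer name means a bigger number.
    return sorted(names, key=lambda name: (len(name), name))


if __name__ == "__main__":
    files = ["A10.pdf", "A2.pdf", "A100.pdf", "A1.pdf", "A20.pdf", "A11.pdf"]
    print(length_then_name(files))
```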

Another option is to use the RegExp object to extract the digits inside each file name into column B, and then sort on column B.
Option Explicit
Sub SortFileNames()
Dim i As Long
With Sheets("Sheet1") ' replace "Sheet1" with your sheet's name
For i = 1 To .Cells(.Rows.Count, "A").End(xlUp).Row
.Range("B" & i).Value = RetractNumberwithRegex(.Range("A" & i)) ' call the Regex function
Next i
End With
End Sub
'========================================================================
Function RetractNumberwithRegex(Rng As Range) As String
' function uses the RegExp object to extract the first run of digits in the cell
Dim Reg1 As Object
Dim Matches As Object
Set Reg1 = CreateObject("vbscript.regexp")
With Reg1
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "[0-9]{1,20}" ' a run of 1 to 20 digits
End With
Set Matches = Reg1.Execute(Rng.Value2)
If Matches.Count <> 0 Then
RetractNumberwithRegex = Matches.Item(0)
End If
End Function
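The macro above only writes the digits into column B; the sort itself is then done on column B. A rough Python sketch of the same extract-then-sort logic (hypothetical helper names, not part of the answer), useful for checking the expected order:

```python
import re

# Plays the role of "[0-9]{1,20}" in RetractNumberwithRegex; search() returns the
# first match, like Matches.Item(0).
FIRST_NUMBER = re.compile(r"[0-9]+")


def first_number(name):
    """Return the first run of digits in name as an int, or None if it has none."""
    match = FIRST_NUMBER.search(name)
    return int(match.group()) if match else None


def sort_by_first_number(names):
    """Sort names numerically by their first embedded number; names without digits go last."""
    def key(name):
        number = first_number(name)
        # int() is what makes the comparison numeric instead of character by character
        return (number is None, number if number is not None else 0, name)
    return sorted(names, key=key)


if __name__ == "__main__":
    files = ["A10.pdf", "A2.pdf", "A100.pdf", "A1.pdf", "notes.txt"]
    print(sort_by_first_number(files))
```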

This happens because Windows Explorer and Excel use different sorting algorithms: Explorer compares the numbers inside file names by their numeric value, while Excel compares text character by character. Refer to this article if you want to understand the details.
One way to solve your problem is to pull out only the numeric part of each file name into a different column (say column B) and then sort on those numbers.
If we can assume that the file names follow the pattern AXXX.pdf, i.e. the letter A, then a number, then a 4-character extension (".pdf"), you can use this formula:
=VALUE(MID(A1,2,LEN(A1)-5))
This works by pulling a number of characters out of the middle of the string. By assumption, the number starts at the 2nd character, which is why the second argument is 2. To decide how many characters to pull, note that all characters except the 'A' (1 character) and '.pdf' (4 characters) make up the number, so take the length of the whole name and subtract 5. VALUE then turns that text into a number you can sort on.
Sorting on that helper column then gives the same order as in the folder.
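The MID arithmetic maps directly onto string slicing. A small Python sketch under the same AXXX.pdf assumption (the function name is hypothetical):

```python
def number_part(name):
    """Mimic =VALUE(MID(A1,2,LEN(A1)-5)) for names shaped like "A<number>.pdf"."""
    start = 1                  # MID(..., 2, ...): Excel counts from 1, Python from 0
    length = len(name) - 5     # drop the "A" (1 character) and ".pdf" (4 characters)
    return int(name[start:start + length])   # VALUE(...) turns the text into a number


if __name__ == "__main__":
    files = ["A10.pdf", "A2.pdf", "A1.pdf", "A100.pdf"]
    print(sorted(files, key=number_part))
```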

The best way is to change the file names in your Excel list to have leading zeroes. Instead of A19, refer to the file as A019 and it will sort correctly. Convert the file names using this formula in a helper column.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 3), 3)
Note that the 3 zeroes and the two string lengths of 3 must all match. To create fixed-length numbers of 4 digits, use 4 zeroes and increase both string lengths to 4.
Copy the formula down from row 2 to the end. Copy the helper column, paste Values in place and, when everything is perfect, replace the original column with the helper.
To accommodate a fixed number of characters following the number, the formula above can be tweaked. The formula below accommodates 4 extra characters after the number, for example ".pdf" (including the period); the 7 is the 3 digits plus those 4 characters.
=Left($A2, 1) & Right("000" & Mid($A2, 2, 7), 7)
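To see why the RIGHT("000" & ...) trick produces fixed-width numbers, here is a Python sketch of the same formula. `width` and `suffix_len` are hypothetical parameters standing for the number of zeroes and the number of characters after the number:

```python
def pad_name(name, width=3, suffix_len=4):
    """Mimic =LEFT($A2,1) & RIGHT("000" & MID($A2,2,7),7) (width=3, suffix_len=4)."""
    keep = width + suffix_len               # the 7 in the formula: 3 digits + ".pdf"
    rest = name[1:1 + keep]                 # MID($A2, 2, keep): everything after the letter
    padded = ("0" * width + rest)[-keep:]   # RIGHT("000" & rest, keep)
    return name[0] + padded                 # LEFT($A2, 1) & ...


if __name__ == "__main__":
    files = ["A1.pdf", "A10.pdf", "A100.pdf", "A2.pdf"]
    print(sorted(pad_name(f) for f in files))
```

Because every padded name has the same length, a plain text sort now matches the folder order.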

Related

Extract airline codes from flight number strings in Excel

I have a problem extracting the two-character codes from strings in a format like:
"VA198-VA200-VA197"
I just want to get the string:
"VA-VA-VA"
Also, my data is not all in one format; some of it looks like:
"DL123-DL245"
or
"DL123-VA345-HU12-OZ123"
Does anyone know how to do it fast in excel? Thanks.
With data in A1, in B1 enter the array formula:
=TEXTJOIN("",TRUE,IF(ISERR(MID(A1,ROW(INDIRECT("1:100")),1)+0),MID(A1,ROW(INDIRECT("1:100")),1),""))
NOTE:
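The formula strips out all numeric characters, leaving only the letters and the dashes. ROW(INDIRECT("1:100")) checks the first 100 characters of the string. TEXTJOIN is only available in Excel 2019 and Microsoft 365.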
Array formulas must be entered with Ctrl + Shift + Enter rather than just the Enter key. If this is done correctly, the formula will appear with curly braces around it in the Formula Bar.
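The same digit-stripping logic in Python, as a sketch for checking results outside Excel (the function name is made up for illustration):

```python
DIGITS = set("0123456789")


def strip_digits(code):
    """Keep every character that is not a digit, like the TEXTJOIN array formula."""
    return "".join(ch for ch in code if ch not in DIGITS)


if __name__ == "__main__":
    for s in ["VA198-VA200-VA197", "DL123-DL245", "DL123-VA345-HU12-OZ123"]:
        print(strip_digits(s))
```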
There are a couple of ways to approach this, depending on how many segments there can be in your string. If we assume your flight numbers are in A1:
First Segment: =LEFT(A1,2)
Second Segment: =MID(A1,FIND("-",A1)+1,2)
Third Segment: =MID(A1,FIND("-",A1,FIND("-",A1)+1)+1,2)
You could then concatenate the three expressions together and add a fourth. The problem is that, based on your information, you can have anywhere from 1 to 4 (at least) codes, which means each segment after the first needs a conditional:
Second Segment: =IF(ISERR(FIND("-",A1)),"",MID(A1,FIND("-",A1)+1,2))
Adding in the separators, we end up with something like this for up to four segments:
=CONCATENATE(LEFT(A1,2),IF(ISERR(FIND("-",A1)),"",CONCATENATE("-",MID(A1,FIND("-",A1)+1,2))),IF(ISERR(FIND("-",A1,FIND("-",A1)+1)),"",CONCATENATE("-",MID(A1,FIND("-",A1,FIND("-",A1)+1)+1,2))),IF(ISERR(FIND("-",A1,FIND("-",A1,FIND("-",A1)+1)+1)),"",CONCATENATE("-",MID(A1,FIND("-",A1,FIND("-",A1,FIND("-",A1)+1)+1)+1,2))))
This will give you everything in one field.
Here is a VBA answer, assuming all strings are structured the same way: two letters followed by numbers, with segments separated by "-". If one such string is in A1 and you want to write the result to B1:
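Sub BreakStrings()
Dim d As Variant, i As Long, strg As String
d = Split(Range("A1"), "-") 'e.g. "VA198-VA200-VA197" -> "VA198", "VA200", "VA197"
For i = LBound(d) To UBound(d)
d(i) = Left(d(i), 2) 'keep only the two-letter code
Next i
strg = Join(d, "-")
Range("B1") = strg
End Sub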
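The Split / Left / Join steps of BreakStrings translate directly; a Python sketch with an illustrative function name, under the same assumption that every segment starts with a two-letter code:

```python
def airline_codes(flights, sep="-"):
    """Split on sep, keep the first two characters of each part, and re-join."""
    return sep.join(part[:2] for part in flights.split(sep))


if __name__ == "__main__":
    print(airline_codes("DL123-VA345-HU12-OZ123"))
```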
As a user-defined function (UDF) using a regular expression, entered as =GetVal(A1):
Function GetVal(cell)
With CreateObject("VBScript.RegExp")
.Global = True: .Pattern = "(\w{2})(.+?)(?=-|$)"
GetVal = .Replace(cell, "$1")
End With
End Function
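The GetVal pattern also works with Python's re module, which is a convenient way to test it. Python writes the backreference as \1 where VBScript uses $1; the function name below is only illustrative:

```python
import re

# Same pattern as GetVal: capture two word characters, lazily match the rest of the
# segment up to (not including) the next "-" or the end, and keep only the capture.
SEGMENT = re.compile(r"(\w{2})(.+?)(?=-|$)")


def get_val(cell):
    return SEGMENT.sub(r"\1", cell)


if __name__ == "__main__":
    print(get_val("VA198-VA200-VA197"))
```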

How to put a character before a one-digit number in a cell with a VBA macro?

I would like to ask how to put a character, in this case 0, into a cell if the cell already contains a single digit.
To clarify what I mean: if the cell contains the number 5, I would like to put a 0 before it to get the result 05.
As far as I know, the cell format should be TEXT to avoid Excel's automatic correction. But this case is special because the column contains several different kinds of values: in the same column I have cells such as 1, 2, 3, AV, AR, IX, etc.
For example, I would like to select column K, find the numeric values with one digit (1, 2, 3, -9) and put a 0 before them to get two digits, like 01, 02, 03, …
Of course, with a macro. I know how to apply the Text format, but I do not know how to write the whole macro: select column K, format the whole column as text, find the one-digit numbers in the column, and put a 0 before them.
Does anybody know how to do that?
Many thanks in advance.
There are 2 solutions:
Format the numbers
Convert numbers to text and format them
1. Format the numbers
The advantage of this solution is that the numbers will still be numbers (not text) but formatted with leading zeros. Therefore you still can calculate with these numbers as before.
Public Sub ChangeNumberFormat()
ThisWorkbook.Worksheets("YourDesiredSheetName").Columns("K").NumberFormat = "00"
'this will keep them numbers but only change the format of them
End Sub
Note that you don't necessarily need VBA for this: you can just set the custom cell format 00 for column K (open Format Cells with Ctrl + 1). Text values such as AV are not affected by a number format.
2. Convert numbers to text and format them
If you really need to convert them to text, this is a possible solution. But I don't really recommend it, because once the numbers are converted to text you cannot calculate with them anymore.
The trick is to build the two-digit text with the Format function first, then set the cell format to Text and write the text back (see the comments in the code). Blanks and text values such as AV are skipped, because comparing text like "AV" with a number would raise a type mismatch error.
Option Explicit 'force variable declaration
Public Sub FixLeadingZerosInText()
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("YourDesiredSheetName") '<-- change your sheet name here
Dim lRow As Long
lRow = ws.Cells(ws.Rows.Count, "K").End(xlUp).Row 'find last used row in column K
Dim iCell As Range
Dim tmpText As String
For Each iCell In ws.Range("K1:K" & lRow) 'loop from row 1 to last used in column K
If Not IsEmpty(iCell.Value) And IsNumeric(iCell.Value) Then 'skip blanks and text such as AV, AR, IX
If iCell.Value < 10 And iCell.Value > -10 Then 'check if it is a one digit number
tmpText = Format(iCell.Value, "00") 'format the one digit number, e.g. 5 -> "05"
iCell.NumberFormat = "@" 'set the cell format to Text
iCell.Value = tmpText 're-write the formatted number as text
End If
End If
iCell.NumberFormat = "@" 'format all other cells in column K as text too
Next iCell
End Sub
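For reference, the decision logic of the macro sketched in Python. Cells are assumed to arrive as numbers, text, or None for blanks; unlike the macro, this sketch pads only whole numbers and ignores numeric text such as "5" (both are assumptions of the sketch, and the function name is hypothetical):

```python
def pad_single_digit(value):
    """Return whole numbers from -9 to 9 as two-digit text; return anything else unchanged."""
    # Text such as "AV" or "IX", blanks (None) and booleans are left alone,
    # similar to the macro's IsEmpty/IsNumeric guard.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return value
    # Only whole numbers strictly between -10 and 10 are padded.
    if -10 < value < 10 and value == int(value):
        n = int(value)
        sign = "-" if n < 0 else ""
        return sign + f"{abs(n):02d}"   # 5 -> "05", -9 -> "-09"
    return value


if __name__ == "__main__":
    column_k = [1, 2, 3, "AV", "AR", "IX", None, 12, -9]
    print([pad_single_digit(v) for v in column_k])
```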

Best way to populate an excel string column for fastest subsequent vba search (can I use metadata, etc?)

In a column with hundreds or even 1-2 thousand strings of approximately 40 characters, with one string per cell and many repeating entries, what is the best way to populate the column to conduct the fastest possible search later? The search should return a row number so that the corresponding row can be deleted.
Is there some way to append metadata or label to a cell/row for faster search? Is there some other mechanism that can identify cells that will make searching easier?
I'm new to VBA, and I want to set out on the best path before I get too far into the project and have to search through thousands of strings.
edit: Someone requested an example cell: The cells will have email addresses in them. I can control the email addresses on the server, so they will roughly be 40 characters long each. They will contain alphanumeric characters only.
Example of a fast way to implement a dictionary lookup. It assumes the data is on Sheet1, starting in column A, with the strings in column B:
Option Explicit
Public Sub SearchStrings()
Dim ur As Variant, r As Long, d As Object
Const COL_ID = 2
Set d = CreateObject("Scripting.Dictionary") 'late binding; or set a reference to Microsoft Scripting Runtime
d.CompareMode = vbTextCompare 'case-insensitive; use vbBinaryCompare for case-sensitive keys
ur = Sheet1.UsedRange.Columns(COL_ID) 'read strings from column COL_ID into an array
For r = LBound(ur) To UBound(ur) 'populate dictionary; Key = string (unique)
If Not IsError(ur(r, 1)) Then d(CStr(ur(r, 1))) = r 'Item = row number if UsedRange starts in row 1 (a repeated string keeps its last row)
Next
Debug.Print d.Keys()(2) 'prints the 3rd unique string (Keys() is zero-based)
Debug.Print d.Items()(2) 'prints the row number stored for that string
End Sub
If you want to keep duplicate strings, use this line instead:
If Not IsError(ur(r, 1)) Then d(COL_ID & "-" & r) = CStr(ur(r, 1))
which uses Key = column ID & "-" & row ID (for example "2-5") and Item = the string itself.
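A Python sketch of the same lookup idea, using a dict of row lists so duplicate addresses are kept (hypothetical function names; casefold() only approximates the dictionary's vbTextCompare mode). Returning the rows bottom-up is a common precaution so that deleting one row does not shift the rows still to be deleted:

```python
def index_rows(values, first_row=1):
    """Map each string (compared case-insensitively) to every row it appears in."""
    rows = {}
    for offset, value in enumerate(values):
        key = str(value).casefold()                    # approximates vbTextCompare
        rows.setdefault(key, []).append(first_row + offset)
    return rows


def rows_to_delete(rows, target):
    """Rows holding target, highest first, so deleting them in order keeps the others valid."""
    return sorted(rows.get(target.casefold(), []), reverse=True)


if __name__ == "__main__":
    column_b = ["alice@example.com", "bob@example.com", "ALICE@example.com"]
    index = index_rows(column_b, first_row=2)   # e.g. data starting in row 2
    print(rows_to_delete(index, "Alice@Example.com"))
```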

Extract 5-digit number from one column to another

I need help with extracting 5-digit numbers only from one column to another in Excel 2010. These numbers can be in any position of the string (beginning of the string, anywhere in the middle, or at the end). They can be within brackets or quotes like:
(15478) or "15478" or '15478' or [15478]
I need to ignore any numbers that are less than 5 digits and include numbers that start with 1 or more leading zeros (like 00052, 00278, etc.) and ensure that leading zeros are copied over to the next column. Could someone help me with either creating a formula or UDF?
Here is a formula-based alternative that will extract the first 5 digit number found in cell A1. I tend to prefer reasonably simple formula solutions over VBA in most situations as formulas are more portable. This formula is an array formula and thus must be entered with Ctrl+Shift+Enter. The idea is to split the string up into every possible 5 character chunk and test each one and return the first match.
=MID(A1,MIN(IF(NOT(ISERROR(("1"&MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),9999999999)),5)
Let's break this down. First, there is an expression, used several times, that returns an array of numbers from 1 up to 4 less than the length of your initial text. So for a string of length 10, the following returns {1,2,3,4,5,6}. Hereafter this formula will be referred to as rowlist. I used R1C1 notation to avoid potential circular references.
ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))
Next, we use that array to split the text into an array of 5-character chunks and test each chunk. The test is to prepend a "1" and append ".1", then check that the result is numeric. The prepend and append rule out chunks containing white space or a decimal point. We then check the character before and the character after each chunk to make sure they are not digits. Hereafter the formula below will be referred to as isnumarray.
NOT(ISERROR(("1"&MID(A1,rowlist,5)&".1")*1))
*ISERROR(MID(A1,rowlist+5,1)*1)
*ISERROR(MID(A1,rowlist-1,1)*1)
Next we need to find the first valid 5 digit number in the string by returning the current index from a duplicate of the rowlist formula and returning a large number for non-matches. Then we can use the MIN function to grab that first match. Hereafter the below will be referred to as minindex.
MIN(IF(isnumarray,rowlist,9999999999))
Finally we need to grab the numeric string that started at the index returned by the MIN function.
MID(A1,minindex,5)
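The four pieces (rowlist, isnumarray, minindex, MID) can be followed step by step in this Python sketch of the same window scan. It is an illustration, not a translation of Excel's number parsing: here a chunk counts as numeric only if all five characters are ASCII digits, and the function name is hypothetical.

```python
DIGITS = set("0123456789")


def first_five_digit_window(text):
    """Return the first standalone run of exactly five digits in text, or None."""
    # rowlist: every start position of a 5-character chunk (0-based here, 1-based in Excel)
    for start in range(len(text) - 4):
        chunk = text[start:start + 5]
        # Outside the string there is no character, which never counts as a digit
        before = text[start - 1] if start > 0 else ""
        after = text[start + 5] if start + 5 < len(text) else ""
        # isnumarray: the chunk is all digits and neither neighbour is a digit
        if all(c in DIGITS for c in chunk) and before not in DIGITS and after not in DIGITS:
            return chunk  # the first hit is what MIN(IF(...)) selects; MID returns it
    return None


if __name__ == "__main__":
    print(first_five_digit_window("order (00278) shipped"))
```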
The following UDF returns the first five-digit number in the string, including any leading zeros. If you need to detect whether there is more than one five-digit number, the modifications are trivial; the optional index argument already selects a later match (0 is the first). It returns a #VALUE! error if there is no such five-digit number.
Option Explicit
Function FiveDigit(S As String, Optional index As Long = 0) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.Pattern = "(?:\b|\D)(\d{5})(?:\b|\D)"
.Global = True
FiveDigit = .Execute(S)(index).submatches(0)
End With
End Function
As you can see from the discussion between Mark and myself, some of your specifications are unclear. But if you want to exclude decimal numbers whose decimal portion has five digits, the regex pattern in my code above should be changed to:
.Pattern = "(?:\d+\.\d+)|(?:\b|\D)(\d{5})(?:\b|\D)"
Note that each decimal number then also counts as a match (with an empty submatch), so the index argument counts those matches too.
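The FiveDigit pattern can be tried with Python's re module as well. This sketch uses the original pattern (not the decimal-excluding variant) and returns None where the UDF would return #VALUE!; the function name is illustrative:

```python
import re

# Pattern copied from the FiveDigit UDF. re.ASCII limits \d and \b to ASCII digits
# and word characters, closer to VBScript's behaviour.
FIVE_DIGIT = re.compile(r"(?:\b|\D)(\d{5})(?:\b|\D)", re.ASCII)


def five_digit(s, index=0):
    """Return the index-th standalone five-digit number in s as text, or None."""
    found = FIVE_DIGIT.findall(s)     # one capture group -> list of the 5-digit strings
    return found[index] if 0 <= index < len(found) else None


if __name__ == "__main__":
    print(five_digit("[00278] and '12345'", 1))
```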
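I just wrote this basic UDF for you; it should do the job.
It finds the first 5 consecutive digits in a string. The error checking is very crude: it simply returns "Error" if no 5 consecutive digits are found. Note that a longer run such as 123456 returns its first five digits, 12345.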
Public Function GET5DIGITS(value As String) As String
Dim sResult As String
Dim iLen As Integer
Dim i As Long
sResult = ""
iLen = 0
For i = 1 To Len(value)
If Mid(value, i, 1) Like "#" Then 'the character is a single digit 0-9
sResult = sResult & Mid(value, i, 1)
iLen = iLen + 1
Else
sResult = "" 'a non-digit breaks the run, so start again
iLen = 0
End If
If iLen = 5 Then Exit For 'stop at the first 5 consecutive digits
Next
If iLen = 5 Then
GET5DIGITS = Format(sResult, "00000") 'keeps leading zeros
Else
GET5DIGITS = "Error"
End If
End Function

Return a list of column headers across a row when cells have text

I want to get a list of column headers for each cell that contains a text value.
E.g.:
     A         B         C         ...   BC (desired output)
1    Header1   Header2   Header3
2    M                   T               Header1, Header3
3    T         MT                        Header1, Header2
4              TMW                       Header2
In the final product I want two output columns with formulas: one listing the headers of the cells that have values across 9 columns, and a second doing the same for the other 40-odd columns.
I have the vague notion that I might need to use INDEX, MATCH and IF functions - but as a novice have no idea how to string them together coherently.
Here I will make use of VBA's Join function. VBA functions aren't directly available in worksheet formulas, so I wrap Join in a user-defined function that exposes the same functionality:
Function JoinXL(arr As Variant, Optional delimiter As String = " ")
JoinXL = Join(arr, delimiter)
End Function
The formula in D2 is:
=JoinXL(IF(NOT(ISBLANK(A2:C2)),$A$1:$C$1&", ",""),"")
entered as an array formula (using Ctrl-Shift-Enter). It is then copied down.
Explanation:
NOT(ISBLANK(A2:C2)) detects which cells have text in them; returns this array for row 2: {TRUE,FALSE,TRUE}
IF(NOT(ISBLANK(A2:C2)),$A$1:$C$1&", ","") converts those Boolean values to the row 1 contents followed by a comma delimiter; it returns the array {"Header1, ","","Header3, "}.
JoinXL joins the contents of that array into a single string, "Header1, Header3, " (note the trailing delimiter).
If you want to use worksheet functions, and not VBA, I suggest returning each column header in a separate cell. You can do this by entering a formula such as the following in BC2; it must be array-entered:
=IFERROR(INDEX($A$1:$C$1,1,SMALL((LEN($A2:$C2)>0)*COLUMN($A2:$C2),COUNTBLANK($A2:$C2)+COLUMNS($A:A))),"")
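Adjust the range references A:C to reflect the columns actually used for your data. Be sure to use the same mixed address format as above. Do NOT change the $A:A reference, however: as you fill right it becomes $A:B, $A:C, ..., so COLUMNS returns 1, 2, 3, ... and picks the 1st, 2nd, 3rd matching header.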
Then fill right until you get blanks; and fill down as far as required.
You can reverse the logic to get a list of the "other" headers.
To array-enter a formula, after entering the formula into the cell or formula bar, hold down Ctrl+Shift while pressing Enter. If you did this correctly, Excel will place braces {...} around the formula.
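To see how SMALL and COUNTBLANK pick the k-th filled column, here is a Python sketch of a single cell of that formula as it is filled right. Cells are assumed to be strings, with "" for blanks, and the function name is hypothetical:

```python
def kth_header(headers, row, k):
    """Return the k-th header (k = 1, 2, ...) whose cell in row is non-empty, or ""."""
    # (LEN(cell)>0)*COLUMN(cell): filled cells keep their column number, blanks become 0
    flags = [col if len(cell) > 0 else 0 for col, cell in enumerate(row, start=1)]
    position = flags.count(0) + k        # COUNTBLANK(row) + COLUMNS($A:A)
    if position > len(flags):
        return ""                        # SMALL would error; IFERROR returns ""
    col = sorted(flags)[position - 1]    # SMALL(array, position)
    return headers[col - 1]              # INDEX(headers, 1, col)


if __name__ == "__main__":
    headers = ["Header1", "Header2", "Header3"]
    print([kth_header(headers, ["M", "", "T"], k) for k in (1, 2, 3)])
```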
If you really need the results as comma-separated values in two different columns, I would suggest the following User Defined Function.
To enter this User Defined Function (UDF), press Alt-F11 to open the Visual Basic Editor. Ensure your project is highlighted in the Project Explorer window. Then, from the top menu, select Insert/Module and paste the code below into the window that opens.
To use this User Defined Function (UDF), enter a formula like
=Headers($A2:$BA2,$A$1:$BA$1,TRUE)
in some cell or, to get the headers whose cells do NOT contain text:
=Headers($A2:$BA2,$A$1:$BA$1,FALSE)
=====================================================
Option Explicit
Function Headers(rData As Range, rHeaders As Range, Optional bTextPresent As Boolean = True) As String
Dim colHeaders As Collection
Dim vData, vHeaders
Const sDelimiter As String = ", "
Dim sRes() As String
Dim I As Long
vData = rData
vHeaders = rHeaders
Set colHeaders = New Collection
For I = 1 To UBound(vData, 2)
If (Len(vData(1, I)) > 0) = bTextPresent Then colHeaders.Add vHeaders(1, I)
Next I
If colHeaders.Count = 0 Then Exit Function 'nothing matched: return an empty string
ReDim sRes(1 To colHeaders.Count)
For I = 1 To colHeaders.Count
sRes(I) = colHeaders(I)
Next I
Headers = Join(sRes, sDelimiter)
End Function
==========================================
You should probably add some logic to the routine to ensure your range arguments are a single row, and that the two arguments are of the same size.
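A Python sketch of the Headers logic, including the size check suggested above. It is illustrative only; cells are assumed to be strings, with "" for blank:

```python
def headers(data_row, header_row, text_present=True, delimiter=", "):
    """Join the headers whose cell is non-empty (text_present=True) or empty (False)."""
    if len(data_row) != len(header_row):
        raise ValueError("data and header rows must be the same size")
    # Same test as the UDF: (Len(cell) > 0) = bTextPresent
    picked = [h for cell, h in zip(data_row, header_row) if (len(cell) > 0) == text_present]
    return delimiter.join(picked)    # an empty selection simply gives ""


if __name__ == "__main__":
    print(headers(["M", "", "T"], ["Header1", "Header2", "Header3"]))
```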