Excel VBA: Find cells with similar values in a column - vba

I have in column A multiple strings of numbers that look like:
2222222266622222266666222222222222266622666666222
2222266666622222222666662222666222222666222222222
2222266622226666662266622266622222222222222222666
2222222222222666222226662266622226666622222222666
2666662266622222222222222222222666222222666222666
2222266622222666666666662266622222222222222222222
6662266622226662222266622222666222222266622222222
2666622666666222666222222666222222222222222222222
2222266626662666222222266622222222222666222266622
and so on.
I'm trying to find the values that are 90% the same or another percentage that I would choose before I run the program.
The expected result should be in column B how many other cells would share the same structure as much as an percentage with column A, and if it is possible in next column or columns, the cells that gave that similitude
My first try:
Sub Similar()
Dim stNow As Date
Dim DATAsheet As Worksheet
Dim firstrow As Integer
Dim finalrow As Integer
Dim i As Integer
Dim j As Integer
Dim k As Integer
Dim l As Integer
Dim String_i, Len_i, String_j, Len_j
stNow = Now
Application.EnableEvents = False
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Set DATAsheet = Sheet1
DATAsheet.Select
firstrow = Cells(1, 2).End(xlDown).Row
finalrow = Cells(Rows.Count, 1).End(xlUp).Row
For i = firstrow To finalrow
For j = firstrow To finalrow
If i > 3 And j > 3 And i <> j Then
String_i = Cells(i, 1).Value
Len_i = Len(String_i)
String_j = Cells(j, 1).Value
Len_j = Len(String_j)
For k = 1 To Len_i
For l = 1 To Len_j
If Mid(String_i, k, 1) = Mid(String_j, l, 1) Then
Cells(j, 2).Value = Cells(j, 2).Value + 1
End If
Next l
Next k
End If
DoEvents
Next j
Application.StatusBar = "Loop 1/1 --- Done: " & Round((i / finalrow * 100), 0) & " %"
Next i
Application.StatusBar = ""
MsgBox "Done"
Application.EnableEvents = True
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
But it gaves me in column B the results:
259461.00
262794.00
262794.00
262794.00
259461.00
266123.00
259461.00
259461.00
Any help is appreciated.
Thanks!!!

VladMale first we will need lengt of this string - function LEN, and then we will need comparing substrings - function MID. Second function should be made in some loop from first to last character in a string, and every time it will match, some other cell should count how many times it mached and how many not. The posivite result we can divide by the string length * 100 and compare if it is more or less than 90%

Related

My (Vba) Code only works with 1 variable in the list, and gives only blanks back when multiple variables used in listbox

I've got a code that puts all the data of my Excel file (rows = 12,5k+ and columns = 97) in to a two-dimensional string. Then it loops through a certain column ("G") to list an listbox ("listbox1") with only unique findings.
Then in the Userform the user can choose to select some of the found items and transform it to another listbox ("Listbox2") Then when the user hits the button (CommandButton4) I would like the code to filter the array on only the rows where in column "G" it is the same as in one (or more) given criteria in listbox2.
It works when It has only one item in the listbox but when given two items in the listbox, it only returns everything blank.
Can some one please tell me what I'm doing wrong because I've no idea.
code:
Private Sub CommandButton4_Click()
Dim arr2() As Variant
Dim data As Variant
Dim B_List As Boolean
Dim i As Long, j As Long, q As Long, r As Long, LastColumn As Long, LastRow As Long
q = 1
r = 1
Dim ws As Worksheet
Set ws = ActiveWorkbook.Sheets("Sheet3")
Application.ScreenUpdating = False
Application.EnableEvents = False
With ThisWorkbook.Sheets("Sheet3")
LastRow = .Cells(Rows.Count, 2).End(xlUp).Row
LastColumn = .Cells(3, Columns.Count).End(xlToLeft).Column
ReDim arr2(1 To LastRow, 1 To LastColumn)
For i = 2 To LastRow
For j = 1 To LastColumn
arr2(i, j) = .Cells(i, j).Value
Next j
Next i
End With
For i = 1 To LastRow
For j = 0 To Me.ListBox2.ListCount - 1
If ListBox2.List(j) = arr2(i, 7) Then
'Later aan te passen
Else
For q = 1 To LastColumn
arr2(i, q) = ""
Next q
End If
Next j
Next i
Sheets("Sheet3").UsedRange.ClearContents
For i = LBound(arr2, 1) To UBound(arr2, 1)
If arr2(i, 2) <> "" Then
r = r + 1
For j = LBound(arr2, 2) To UBound(arr2, 2)
ThisWorkbook.Sheets("Sheet3").Cells(r, j).Value = arr2(i, j)
Next j
End If
Debug.Print i, j, arr2(i, 7)
Next i
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
The issue is your second nested-loop:
For i = 1 To LastRow
For j = 0 To Me.ListBox2.ListCount - 1
If ListBox2.List(j) = arr2(i, 7) Then
'Later aan te passen
Else
For q = 1 To LastColumn
arr2(i, q) = ""
Next q
End If
Next j
Next i
Suppose that our ListBox has 2 values, "First" and "Second". For each row, you do the following:
j = 0
ListBox2.List(0) = "First"
If Column G is "First", do nothing
Otherwise, make the whole Row Blank Including if Column G = "Second"
At this point, the only possible values for Column G are now "First" or Blank
j = 1
ListBox2.List(1) = "Second"
If Column G is "Second", do nothing But, this cannot happen, because you have already changed any "Second" Rows to Blank
Otherwise, make the whole Row Blank
At this point, the Row will always be Blank
I recommend having a Boolean test variable. Set it to False at the start of each Row-loop, and set it to True if you find a match. If it is still False after you check all ListBox items, then blank the row:
Dim bTest AS Boolean
For i = 1 To LastRow
bTest = False 'Reset for the Row
For j = 0 To Me.ListBox2.ListCount - 1
If ListBox2.List(j) = arr2(i, 7) Then
bTest = True 'We found a match!
Exit For 'No need to keep looking
End If
Next j
If Not bTest Then 'If we didn't find a match
For q = 1 To LastColumn
arr2(i, q) = "" 'Blank the row
Next q
End If
Next i

Deleting "empty" rows when they just "appear empty"

I can not manage to cleanse my data of the "empty" rows. There is no problem in deleting the "0" but those cells which are empty are not empty but have something like "null strings" in it.
Sub Reinigung()
Application.ScreenUpdating = False 
Application.EnableEvents = False 
ListeEnde3 = ThisWorkbook.Sheets("input").Cells(Rows.Count, 1).End(xlUp).Row
For Zeile1 = 2 To ListeEnde3
If ThisWorkbook.Sheets("input").Cells(Zeile1, 14) = "0" Or ThisWorkbook.Sheets("2018").Cells(Zeile1, 14) = "" Then
ThisWorkbook.Sheets("input").Rows(Zeile1).Delete
Zeile1 = Zeile1 - 1
Else
End If
Next
' ThisWorkbook.Sheets("import").Columns(14).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
That code just freezes my excel, if i leave out the
thisWorkbook.Sheets("2018").Cells(Zeile1, 14) = ""
part, it works and deletes all rows, where colum 14 contains a "0".
If I check the cells which appear blank with =isblank it returns "false". There is no "space" in the cell and no " ' ".
What to do?
edit
After the first tips my code looks like this now:
Sub Reinigung()
Dim ListeEnde3 As Long
Dim Zeile1 As Long
Application.ScreenUpdating = False 
Application.EnableEvents = False 
ListeEnde3 = ThisWorkbook.Sheets("import").Cells(Rows.Count, 1).End(xlUp).Row
For Zeile1 = ListeEnde3 To 2 Step -1
Set rngX = ThisWorkbook.Sheets("import").Cells(Zeile1, 14)
If (rngX = "0" Or rngX = "") Then 'or rngY = vbNullString
ThisWorkbook.Sheets("import").Rows(Zeile1).Delete
End If
Next Zeile1
' ThisWorkbook.Sheets("import").Columns(14).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
Excel still crashes / freezes (I waited for 5 minutes) but since the code runs "smoothly" with F8 I wanted to give it a shot with less data: It works!
If I am not reducing the data there are ~ 70000 rows to check. I let it run on 720 rows and it worked.
Any way to tweak the code in a way that it can handle the 70000+ rows? I didn't think that it would be too much.
Thanks!
You can use AutoFilter and delete the visible rows (not tested) :
Dim ws As Worksheet: Set ws = ThisWorkbook.Worksheets("import")
ws.UsedRange.AutoFilter 14, Array("=0", "="), xlFilterValues
ws.UsedRange.Offset(1).EntireRow.Delete
ws.AutoFilterMode = False
Another way is to simply use internal arrays and write out the new data set which has valid rows.
It is very fast.
If your dataset has formulas then you'll have to use extra code, but if it's constants only, then the below should do:
Sub Reinigung()
'Here I test with column E to Z, set Ranges appropriately
Application.ScreenUpdating = False
Application.EnableEvents = False
Dim ListeEnde3 As Long, x As Long, y As Long
'last row of data - set to column of non-blank data
ListeEnde3 = ThisWorkbook.Sheets("import").Cells(Rows.Count, 5).End(xlUp).Row
Dim ws As Worksheet
Set ws = ThisWorkbook.Sheets("import")
Dim startCell As Range
'set to whatever cell is the upper left corner of data
Set startCell = ThisWorkbook.Sheets("import").Range("E1")
Dim arr As Variant, arrToPrint() As Variant
'Get rightmost column of data instead of hardcoding to "Z"
'write dataset into an array
arr = ws.Range(startCell, ws.Range("Z" & ListeEnde3)).Value
x = UBound(arr) - LBound(arr) + 1 'num of rows of data
y = UBound(arr, 2) - LBound(arr, 2) + 1 'num of columns of data
ReDim arrToPrint(1 To x, 1 To y) 'array to hold valid/undeleted data
Dim i As Long, j As Long, printCounter As Long, arrayColumnToCheck as Long
arrayColumnToCheck = 14 - startCell.Column + 1 '14 is column N
For i = 1 To x
If arr(i, arrayColumnToCheck ) <> 0 And arr(i, arrayColumnToCheck ) <> vbNullString Then
printCounter = printCounter + 1
For j = 1 To y
'put rows to keep in arrToPrint
arrToPrint(printCounter, j) = arr(i, j)
Next j
End If
Next i
'Print valid rows to keep - only values will print - no formulas
startCell.Resize(printCounter, y).Value = arrToPrint
'Delete the rows with zero & empty cells off the sheet
startCell.Offset(printCounter).Resize(ListeEnde3 - printCounter, y).Delete xlShiftUp
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
You can add IsEmpty to your code to check the cells filling
Sub Reinigung()
Application.ScreenUpdating = False
Application.EnableEvents = False
ListeEnde3 = ThisWorkbook.Sheets("input").Cells(Rows.Count, 1).End(xlUp).Row
For Zeile1 = 2 To ListeEnde3
Set rngX = ThisWorkbook.Sheets("input").Cells(Zeile1, 14)
Set rngY = ThisWorkbook.Sheets("2018").Cells(Zeile1, 14)
If (rngX = "0" And (Not IsEmpty(rngX))) Or (rngY = "") Then
ThisWorkbook.Sheets("input").Rows(Zeile1).Delete
Zeile1 = Zeile1 - 1
End If
Next
' ThisWorkbook.Sheets("import").Columns(14).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
Application.ScreenUpdating = True
Application.EnableEvents = True
End Sub
NEVER a good idea to alter a loop counter: Zeile1 = Zeile1 - 1
Instead start at the end and use Step -1 in your loop to work backward.
You are in a infinite loop because the loop doesnt move forward. If Zeile=3 and there is a "" in row3 in the '2018' sheet, then it will always be stuck on the Zeile1 = 3 line. You will always be coming back to that "" on row 3 in '2018'sheet.
For Zeile1 = ListeEnde3 To 2 Step -1
Set rngX = ThisWorkbook.Sheets("input").Cells(Zeile1, 14)
Set rngY = ThisWorkbook.Sheets("2018").Cells(Zeile1, 14)
If (rngX = "0" Or rngY = "") Then 'or rngY = vbNullString
ThisWorkbook.Sheets("input").Rows(Zeile1).Delete
End If
Next Zeile1

What is the best way to combine rows in a large dataset in excel

a report I pull gives me an excel spreadsheet that splits the data for each entry across three rows in excel. I'm trying to figure out the best way to combine the three rows into one row so each field is in it's own column.
Each three row cluster is separated by a blank row and each of the data rows has five columns. The first cluster starts on row 4.
I have a macro (shown below) that does this correctly, but not efficiently. The spreadsheets I get have many (up to a million) rows in them.
I was originally using the cut and paste commands and that was really slow. I found that directly setting .value make it quite a bit faster but this is still way to slow.
I think that the right answer is to do all of the manipulation in memory and write to the actual excel range only once, but I'm at the limits of my VBA foo.
Option Explicit
Sub CombineRows()
Application.ScreenUpdating = False
Dim currentRow As Long
Dim lastRow As Long
Dim pasteColumn As Long
Dim dataRange As Range
Dim rowEmpty As Boolean
Dim firstOfGroup As Boolean
Dim data As Variant
Dim rw As Range
pasteColumn = 6
rowEmpty = True
firstOfGroup = True
currentRow = 4
lastRow = 30
Set dataRange = Range(Cells(currentRow, 1), Cells(lastRow, 5))
For Each rw In dataRange.Rows
Debug.Print rw.Row
If WorksheetFunction.CountA(Range(Cells(rw.Row, 1), Cells(rw.Row, 5))) = 0 Then
If rowEmpty Then Exit For
currentRow = rw.Row + 1
rowEmpty = True
Else
If Not rowEmpty Then
Range(Cells(currentRow, pasteColumn), Cells(currentRow, pasteColumn + 4)).value = Range(Cells(rw.Row, 1), Cells(rw.Row, 5)).value
Range(Cells(rw.Row, 1), Cells(rw.Row, 5)).value = ""
Debug.Print "pasteColumn:"; pasteColumn
If pasteColumn = 6 Then
pasteColumn = 11
ElseIf pasteColumn = 11 Then
pasteColumn = 6
End If
End If
rowEmpty = False
End If
Next
Application.ScreenUpdating = True
End Sub
Update: After I posted this, I noticed that I still had those Debug.Print statements in there. Once I removed those, the performance improved from execution times on the order of hours to a minute or two.
I still thing that this is unnecessarily slow so I'm still interested in any answer that can explain the right way to minimize the VBA <-> excel interactions.
If I understand correctly your question, you want to copy some data.
I recommend you to use an array.
Sub data()
Dim data() As String 'Create array
Dim column as integer
column = 0
For i = 0 To 100000 'See how many columns are in the line
If IsEmpty(Cells(rowNum, i+1)) = False Then
column = column + 1
Else
Exit For
End If
Next
ReDim date(column) As String 'Recreat the array, with the excat column numer
For i = 0 To column - 1
data(i, j) = Cells(rowNum, i + 1) 'Puts data into the array
Next
End sub()
And now you just have to insert the data from the array to the correct cell.
#Cubbi is correct. You can use an array to do all of your data manipulation and then write to the worksheet only once at the end. I've adapted your code to use an array to combine the three rows into a single row for each of the groups. Then at the end it selects "Sheet2" and pastes in the collected data. Note, this is not an in-place solution like yours, but it is super fast:
Option Explicit
Sub AutitTrailFormat()
Application.ScreenUpdating = False
Dim dataArray() As String
Dim currentRow As Long
Dim lastRow As Long
Dim pasteColumn As Long
Dim dataRange As Range
Dim rowEmpty As Boolean
Dim firstOfGroup As Boolean
Dim data As Variant
Dim rw As Range
Dim i, j, k As Long
Dim Destination As Range
pasteColumn = 6
rowEmpty = True
firstOfGroup = True
currentRow = 4
lastRow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row
Worksheets("Sheet1").Select
Set dataRange = Worksheets("Sheet1").Range(Cells(currentRow, 1), Cells(lastRow, 5))
data = dataRange.Value
ReDim dataArray(UBound(data, 1), 15)
j = 1
k = 1
For i = 1 To UBound(data, 1)
If data(i, 1) = "" And data(i, 2) = "" And data(i, 3) = "" And data(i, 4) = "" And data(i, 5) = "" Then
j = j + 1
k = 1
Else
dataArray(j, k + 0) = data(i, 1)
dataArray(j, k + 1) = data(i, 2)
dataArray(j, k + 2) = data(i, 3)
dataArray(j, k + 3) = data(i, 4)
dataArray(j, k + 4) = data(i, 5)
k = k + 5
End If
Next
Worksheets("Sheet2").Select
Set Destination = Worksheets("Sheet2").Range(Cells(1, 1), Cells(UBound(dataArray, 1), 16))
Destination.Value = dataArray
Application.ScreenUpdating = True
End Sub

Looping through more than 5.000 rows? Improvements needed

Can I improve the following code or is there another way to accomplish my goals in less time?
My goal is to concatenate countries and categories of unique brand names.
In the current structure one row represents one brand, one category and one country. In my output I want on row per brand, that has a concatenated cell with all countries and a concatenated cell with all categories.
My solution so far:
I have an Excel-Workbook that has three sheets "HelpSheet","Input" and "Output".
"HelpSheet"contains the list of brand names without duplicates. "Input"has the original data (one row, one entry). "Output"should be filled with one row per brand name.
"Input"approx. 25.000 rows.
"HelpSheet"approx. 5.000 rows.
EDITED: Now I am using variants to store my Range to avoid VBA/Worksheet overhead. Now I get an "Out of memory"-Error.
In VBA I wrote this:
Sub CellsTogether()
Dim ipRange As Variant
Dim hsRange As Variant
Dim countryCount As Long
Dim categoryCount As Long
Dim brandArray() As String
Dim categoryStr As String
Dim countryStr As String
Dim countryArr() As String
Dim categoryArr() As String
Dim identifier As String
Dim i As Long
Dim j As Long
Dim iRow As Long
Dim iCol As Long
Dim k As Long
Dim l As Long
Dim lastRow As Long
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Application.EnableEvents = False
ipRange = Worksheets("Input").Range("B3:N29316")
hsRange = Worksheets("HelpSheet").Range("A1:A4781")
countryCount = 1
categoryCount = 1
For j = LBound(hsRange, 1) To UBound(hsRange, 1)
For iRow = LBound(ipRange, 1) To UBound(ipRange, 1)
iCol = 1
If ipRange(iRow, iCol) = hsRange(j, 1) Then
ReDim Preserve countryArr(1 To countryCount)
ReDim Preserve categoryArr(1 To categoryCount)
For k = LBound(countryArr) To UBound(countryArr)
If countryArr(k) = ipRange(iRow, iCol + 2) Then
Exit For
Else
countryArr(UBound(countryArr)) = ipRange(iRow, iCol + 2)
countryCount = countryCount + 1
End If
Next k
For l = LBound(categoryArr) To UBound(categoryArr)
If categoryArr(l) = ipRange(iRow, iCol + 12) Then
Exit For
Else
categoryArr(UBound(categoryArr)) = ipRange(iRow, iCol + 12)
categoryCount = categoryCount + 1
End If
Next l
identifier = ipRange(iRow, iCol + 3)
End If
Next iRow
For k = LBound(countryArr) To UBound(countryArr)
countryStr = countryStr & countryArr(k) & Chr(10)
Next k
For k = LBound(categoryArr) To UBound(categoryArr)
categoryStr = categoryStr & categoryArr(k) & Chr(10)
Next k
Worksheets("Output").Cells(j + 2, 3).Value = hsRange(j, 1)
Worksheets("Output").Cells(j + 2, 6).Value = countryStr
Worksheets("Output").Cells(j + 2, 5).Value = categoryStr
Worksheets("Output").Cells(j + 2, 2).Value = identifier
Next j
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.EnableEvents = True
End Sub
However this takes a very long time to calculate.
Any improvements?

Copy a value down n-times

I have two tables: "VALUES" and "INSERT". In Table "INSERT" the cell "A1" contains a value that needs to be coped down as many times as there are rows > "" in "VALUES"
I have tried the following code but it does not work as I intended but copies the value of "A1" 4 times every other row in "VALUES".
What am I doing wrong? Thank you for your help!
Sub InsertValuePart()
Dim maxRow As Integer
Dim calcVal As String
Dim x As String
Dim i As Long
Sheets("VALUES").Select
maxRow = Cells(Rows.Count, 1).End(xlUp).row
Sheets("INSERT").Select
Cells(1, 1).Activate
x = Cells(1, 1).Value
Cells(2, 1).Select
For i = 1 To maxRow
Cells(i, 1).Value = x
i = i + 1
Next
Application.CutCopyMode = False
Range("A1").Select
End Sub
No need to define i as it is treated in your for command. Also a for command will automatically iterate over each value so using i = i + 1 will give you values of i = 1, 3, 5. If you do not include that then i = 1,2,3,....,maxRow. Also calcVal was not used anywhere so I got rid of it.
Sub InsertValuePart()
Dim maxRow As Integer
Dim x As String
maxRow = Sheets("VALUES").Cells(Rows.Count, 1).End(xlUp).Row
x = Sheets("INSERT").Cells(1, 1).Value
For i = 1 to maxRow
Sheets("INSERT").Cells(i,1).Value = x
Next
Sheets("INSERT").Range("A1").Select
End Sub
Another way to do it is listed below on one line:
Sub InsertValuePart()
Sheets("INSERT").Range("A1", "A" & CStr(Sheets("VALUES").Cells(Rows.Count, 1).End(xlUp).Row)).Value = Sheets("INSERT").Range("A1").Value
End Sub