Fast Way to Normalize Data with VBA (Excel) - vba

I'm currently trying to normalize data with VBA in Excel. Therefore, my workbook imports several csv files and wrote them in different worksheets all of them are built like this.
First row: Header
First column: x-Axis (for plotting)
Second column to nth column: y-values
Now I want to normalize all columns from 2 to n (dividing by maximum value of each column). Here is the function I'm using so far:
Sub NormalizeData(dataName)
cs = Worksheets(dataName).UsedRange.SpecialCells(xlCellTypeLastCell).Column
rs = Worksheets(dataName).UsedRange.SpecialCells(xlCellTypeLastCell).Row
For col = 2 To cs
maxValue = Application.WorksheetFunction.Max(Worksheets(dataName).Columns(col))
For r = 2 To rs
Worksheets(dataName).Cells(r, col) = Worksheets(dataName).Cells(r, col) / maxValue
Next r
Next col
End Sub
This approach works, but because of the amount of data, it's very slow. Is there any way to increase the speed? I already switched of the screen update.
Thanks you very much for your help!!!

Use another sheet and PasteSpecial.
Assuming ws1 contains your data and ws2 is currently unused:
with ws2.Range(.Cells(2,2), .Cells(rs, cs))
.value = maxValue
.copy
end with
ws1.Range(.Cells(2,2), .Cells(rs, cs)).PasteSpecial _
Operation:=xlPasteSpecialOperationDivide
Application.CutCopyMode = False

Here is a sub that normalizes the numbers in a rectangular range. You can decide on what range you want to normalize and then pass that range to this sub:
Sub NormalizeRange(R As Range)
'assumes that R is a rectangular range
'will throw an error if any column has max 0
Dim vals As Variant, maxes As Variant
Dim i As Long, j As Long, m As Long, n As Long
m = R.Rows.Count
n = R.Columns.Count
ReDim maxes(1 To n)
With Application.WorksheetFunction
For i = 1 To n
maxes(i) = .Max(R.Columns(i))
Next i
End With
vals = R.Value
For i = 1 To m
For j = 1 To n
vals(i, j) = vals(i, j) / maxes(j)
Next j
Next i
R.Value = vals
End Sub
This will be more efficient than what you currently has since it moves values between the spreadsheet and VBA in bulk transfers rather than via a large number of individual read/writes. It also avoids things like screen-updating issues and intermediate recalculation of functions depending on these values.

Related

VBA solution for very slow Countif formula

I have a solution that works in Excel with a countif formula (with the help of another Stackoverflow user).
Essentially what this countif formula does is count the first instance of an ID that exclusively exists with the classification type "DC". For example, as you can see in my snippet, 2232 is marked with as it is only exists with the classification "DC". Whilst in the case of 2240 it is marked as 0 as there are multiple classifications possible.
The formula in column D is the following:
=IF(IF(B2<>"DC",0,AND(COUNTIF(C$2:C$28,C2)=COUNTIF(A$2:A$28,A2),COUNTIF(A$2:A2,A2)=1)),1,0)
The problem that I am experiencing is that this is an extremely slow formula to process for Excel -- it takes roughly ~10-15 mins to complete. The database that I am running this on contains of roughly 150k~ lines.
I was wondering if it was possible to do this same process in VBA, but a lot faster and more efficient than the current processing time.
So I am using the following piece of VBA code to try to recreate the same results:
Sub MarkUniqueID()
Dim Ary As Variant, Nary As Variant
Dim r As Long
With ThisWorkbook.Sheets("sheet1")
Ary = .Range("A2", .Range("A" & Rows.Count).End(xlUp)).Value2
End With
ReDim Nary(1 To UBound(Ary), 1 To 1)
With CreateObject("scripting.dictionary")
For r = 1 To UBound(Ary)
If Not .Exists(Ary(r, 1)) Then
.Add Ary(r, 1), Nothing
Nary(r, 1) = 1
Else
Nary(r, 1) = 0
End If
Next r
End With
ThisWorkbook.Sheets("sheet1").Range("E2").Resize(r).Value = Nary
End Sub
Which runs the process much smoother it takes only a few ~seconds of my original time, however, I am not sure how I can add one more criteria into my array (i.e. only exclusively consider "DC"), as now the results are not what I want (see below).
Any pointers would be much appreciated!
You can use another dictionary to track which ID's should be excluded:
Sub MarkUniqueID()
Dim Ary As Variant, Nary() As Long, cls, id, k
Dim r As Long, dictIn As Object, dictOut As Object
Dim ws As Worksheet
Set dictIn = CreateObject("scripting.dictionary")
Set dictOut = CreateObject("scripting.dictionary")
Set ws = ThisWorkbook.Sheets("sheet1")
'pick up the classification and ID
Ary = ws.Range("B2:C" & ws.Cells(ws.Rows.Count, "B").End(xlUp).Row).Value
ReDim Nary(1 To UBound(Ary), 1 To 1)
For r = 1 To UBound(Ary, 1)
cls = Ary(r, 1)
id = CStr(Ary(r, 2))
If cls = "DC" Then
If Not dictIn.exists(id) Then dictIn.Add id, r
Else
If Not dictOut.exists(id) Then dictOut.Add id, True
End If
Next r
For Each k In dictIn
If Not dictOut.exists(k) Then Nary(dictIn(k), 1) = 1
Next k
ws.Range("E2").Resize(UBound(Nary, 1)).Value = Nary
End Sub

Gather data tidy in Excel using VBA

What the case is:
So I got a "results sample" in excel format that needs filtering and reshaping to look nice. It is a result that will be not identical all the time but it follows similar rules. I have to filter it further and make it a little more tidy. I have figured out the filtering part, but I am not sure how to sort the remaining data, in a tidy way.
What the situation is:
There are six columns involved.
Notice: Real deal is not THAT simple, but what I need can be demonstrated using such a simple example and then I can manage more complex stuff myself I suppose.
For our example we use columns from B to G
The data are set as pairs of a "title" and a value.
For instance, if you look the first example picture I provide, The first detais the pair B3 and C3.
As you can see, looking at the same picture, D3 and E3 is an empty pair.
Same goes for D4 - E4 and F4 - G4 and so on until a last one at B11 - C11.
Starting data example:
[
What I want to achieve:
I would like, using Visual Basic for Applications, to sort the data, starting from let's say for our example B3 (see second picture) and fill three SETS of two columns, (BC, DE, FG) if there are no data inside those cells.
Notice: If a cell like D3 is null then SURELY E3 will be null too so there can be just only one check. I mean we can check either value columns or title columns.
Notice2: The B,D,F or C,E,G columns DON'T have to be sorted. I just want all the not-null values of B,D,F and their respective values from C,E,G gathered together neat so printing will not need 30 pages but just a few (too many spaces between is causing it and I try to automate the cleanup)
Here's something to start with. The first double loop populates a VBA Collection with Range variables that refer to the Cells that contain the titles.
The associated values are obtained by using an offset. The middle double loop performs a bubble sort on the latter (highly inefficient - you might want to replace it with something else). The next if statement creates a 2nd sheet if it doesn't exist on which to write out the results (last loop).
Option Explicit
Sub GatherData()
Dim lastRow As Integer, lastCol As Integer
Dim r As Integer, c As Integer
Dim vals As Collection
Set vals = New Collection
With Sheets(1)
lastCol = .UsedRange.Columns(.UsedRange.Columns.Count).Column
lastRow = .UsedRange.Rows(.UsedRange.Rows.Count).row
For c = 1 To lastCol Step 2
For r = 1 To lastRow
If (Trim(Cells(r, c).Value) <> "") Then
vals.Add .Cells(r, c)
End If
Next
Next
End With
' Bubble Sort
Dim i As Integer, j As Integer
Dim vTemp As Range
For i = 1 To vals.Count - 1
For j = i + 1 To vals.Count
If vals(i).Value > vals(j).Value Then
Set vTemp = vals(j)
vals.Remove j
vals.Add vTemp, vTemp, i
End If
Next j
Next i
Dim sht2 As Worksheet
If ThisWorkbook.Worksheets.Count = 1 Then
Set sht2 = ThisWorkbook.Sheets.Add(After:=ThisWorkbook.Sheets(1))
Else
Set sht2 = Worksheets(2)
End If
With sht2
r = 3
c = 2
For i = 1 To vals.Count
.Cells(r, c).Value = vals(i).Value
.Cells(r, c + 1).Value = vals(i).Offset(, 1).Value
c = c + 2
If c = 8 Then
r = r + 1
c = 2
End If
Next
End With
End Sub
Here is a method using the Dictionary object. I use early binding which requires setting a reference to Microsoft Scripting Runtime. If you are going to be distributing this, you might want to convert this to late-binding.
We assume that your data is properly formed as you show it above. In other words, all the titles are in even numbered columns; and the results are in the adjacent cell.
We create the dictionary using the Title as the Key, and the adjacent cell value for the Dictionary item.
We collect the information
Transfer the Keys to a VBA array and sort alphabetically
create a "Results Array" and populate it in order
write the results to a worksheet.
I will leave formatting and header generation to you.
By the way, there is a constant in the code for the number of Title/Value pair columns. I have set it to 3, but you can vary that.
Enjoy
Option Explicit
Option Compare Text 'If you want the sorting to be case INsensitive
'set reference to Microsoft Scripting Runtime
Sub TidyData()
'Assume Titles are in even numbered columns
'Assume want ColPairs pairs of columns for output
'Use dictionary with Title as key, and Value as the item
Dim dctTidy As Dictionary
Dim arrKeys As Variant
Dim wsSrc As Worksheet, wsRes As Worksheet, rRes As Range
Dim vSrc As Variant, vRes As Variant
Dim LastRow As Long, LastCol As Long
Dim I As Long, J As Long, K As Long, L As Long
Dim V As Variant
'in Results
Const ColPairs As Long = 3
'Set Source and results worksheet and range
Set wsSrc = Worksheets("sheet1")
Set wsRes = Worksheets("sheet2")
Set rRes = wsRes.Cells(1, 2)
'Read source data into variant array
With wsSrc.Cells
LastRow = .Find(what:="*", after:=.Item(1, 1), _
LookIn:=xlValues, searchorder:=xlByRows, searchdirection:=xlPrevious).Row
LastCol = .Find(what:="*", after:=.Item(1, 1), _
LookIn:=xlValues, searchorder:=xlByColumns, searchdirection:=xlPrevious).Column
vSrc = .Range(.Cells(1, 1), .Cells(LastRow, LastCol))
End With
'Collect the data into a dictionary
Set dctTidy = New Dictionary
For I = 1 To UBound(vSrc, 1)
For J = 2 To UBound(vSrc, 2) Step 2
If vSrc(I, J) <> "" Then _
dctTidy.Add Key:=vSrc(I, J), Item:=vSrc(I, J + 1)
Next J
Next I
'For this purpose, we can do a simple sort on the dictionary keys,
' and then create our results array in the sorted order.
arrKeys = dctTidy.Keys
Quick_Sort arrKeys, LBound(arrKeys), UBound(arrKeys)
'Create results array
ReDim vRes(1 To WorksheetFunction.RoundUp(dctTidy.Count / ColPairs, 0), 1 To ColPairs * 2)
I = 0
J = 0
For Each V In arrKeys
K = Int(I / ColPairs) + 1
L = (J Mod ColPairs) * 2 + 1
vRes(K, L) = V
vRes(K, L + 1) = dctTidy(V)
I = I + 1
J = J + 1
Next V
'write the results
Set rRes = rRes.Resize(UBound(vRes, 1), UBound(vRes, 2))
With rRes
.Worksheet.Cells.Clear
.Value = vRes
.HorizontalAlignment = xlCenter
End With
End Sub
Sub Quick_Sort(ByRef SortArray As Variant, ByVal first As Long, ByVal last As Long)
Dim Low As Long, High As Long
Dim Temp As Variant, List_Separator As Variant
Low = first
High = last
List_Separator = SortArray((first + last) / 2)
Do
Do While (SortArray(Low) < List_Separator)
Low = Low + 1
Loop
Do While (SortArray(High) > List_Separator)
High = High - 1
Loop
If (Low <= High) Then
Temp = SortArray(Low)
SortArray(Low) = SortArray(High)
SortArray(High) = Temp
Low = Low + 1
High = High - 1
End If
Loop While (Low <= High)
If (first < High) Then Quick_Sort SortArray, first, High
If (Low < last) Then Quick_Sort SortArray, Low, last
End Sub
Assuming we got all variables set and initialized properly, in this example:
Sheets("sheetname").Select ' because stupid things can happen...
For i = 3 To 13
Let newrangeT = "B" & i '
Let newrangeV = "C" & i '
If Sheets("sheetname").Range(newrangeV) <> "" Then
values(Position) = Sheets("sheetname").Range(newrangeV)
titles(Position) = Sheets("sheetname").Range(newrangeT)
Position = Position + 1
Else
' Don't do anything if the fields are null
End If
Next i
Sheets("sheetname").Range("B1:G13").Clear
' We then get each data from the arrays with a For loop.
' We set a columnset variable to 1.
' We set a currentrow variable to 3.
' If columnset is 1 data will enter in B and C and columnset = columnset +1
' Then if columnset is 2 we set data to DE and columnset = columnset +1
' But if columnset is 2we set data to FG and columnset = 1 and currentrow = currentrow +1
' Iterating the arrays will result in a neat setting of the data, but it will add zeros for all the nulls. Thus we need an If statement that will exclude that values checking the TITLE array (that should contain a title instead). if the value is not 0 then... we run what I describe, otherwise we do nothing.
Putting the data in the array is half of the trick.
Then we clear the area.
We set two string variables to declare ranges (actually cell reference) for every cell iterated in the loop. Here I demonstrated only for column set B,C
but we have to do the same for the rest of the columns.
The If statement here checks for null. You might have different needs, so changing the if statement changes the filtering. Here I check if the cells are not null. If the cells of column C contain data, put those data in values array and the respective B data on titles array but where? Position starts as 1 and we then iterate it +1 each time it adds something.
You can set data from an array using this command:
' current_row is set to the first row of the spreadsheet we wanna fill.
Sheets("sheetname").Select ' because stupid things can happen...
newrangeV = "C" & current_row
Sheets("sheetname").Range(newrangeV) = values(j)
The rest is just putting things together.
In any case, I wanna thank both of the people involved in this question, because I might didn't got the solution, but I got an idea of how to do other stuff, like accidentally learning something new. Cheers.

Sorting by description for groups of rows on a single sheet, concept inquiry

So, I'm a bit baffled on how to move forward, and would like some collaboration to get me started. I'm not asking for someone to code this, but to verify my theoretical path forward.
Background:
I have a single worksheet with 30 activities. Each activity is 75 rows, with the first of the 75 rows having a cell with the description of the activity. Assuming the # of columns is irrelevent to this, the activities A, B, and C, would appear such as:
A1
A...
A75
B1
B...
B75
C1
C...
C75
Theoretical path forward:
Since I have a known row which starts each activity, I was thinking that I could:
.1) Copy the known cell from each row that I intend to sort by to another sheet (this isn't preferred, but is how I can think to do it).
.2) Once in the other sheet, Sort the activity descriptions.
.3) Once sorted, I want to copy each of the activities 75 rows, in order, to the sorted sheet, via Match or Find.
.4) Once completed, I would Copy the Activities from the new sheet, paste back into the original sheet, then delete the new sheet.
Question:
Does this sound appropriate? Is there possibly a better way to do this that immediately comes to mind?
I think you could save a lot of time copying and pasting if you used a 2 dimensional array. This program stops at each 75th row and loads up the 2d array "Activities" from column A. The array gets passed to bubblesort where it is sorted on the first value. Then it is returned and all output to column B. You might have to adjust the constants to match and if there is a blank row between activities there will be some other minor adjustments to the two main loops.
Option Compare Text
Const RowsPerActivity As Integer = 75 'setting
Const NumActivities As Integer = 30 'setting
Const RowtoSortOn As Integer = 1 'setting
Sub SortGroups()
Dim MySheet As Worksheet
Set MySheet = Worksheets("sheet1")
Dim Activities(1 To NumActivities, 1 To RowsPerActivity) As Variant
Dim CurActivity, CurDataRow As Integer
Dim LastRow As Integer
LastRow = MySheet.Cells(MySheet.Rows.Count, "A").End(xlUp).Row
For CurActivity = 1 To LastRow Step RowsPerActivity 'maybe +1 if there is a blank row between activities
For CurDataRow = 1 To RowsPerActivity Step 1
Activities((CurActivity \ 75) + 1, CurDataRow) = MySheet.Cells(CurActivity + CurDataRow - 1, 1).Value
Next CurDataRow
Next CurActivity
Call BubbleSort(Activities)
For CurActivity = 1 To LastRow Step RowsPerActivity 'maybe +1 if there is a blank row between activities
For CurDataRow = 1 To RowsPerActivity Step 1
MySheet.Cells(CurActivity + CurDataRow - 1, 2).Value = Activities((CurActivity \ 75) + 1, CurDataRow)
Next CurDataRow
Next CurActivity
End Sub
Sub BubbleSort(ByRef list() As Variant)
' Sorts an array using bubble sort algorithm
Dim First As Integer, Last As Long
Dim i As Long, j As Long, k As Long
Dim Temp As Variant
First = LBound(list, 1)
Last = UBound(list, 1)
For i = First To Last - 1
For j = i + 1 To Last
If list(i, RowtoSortOn) > list(j, RowtoSortOn) Then
For k = 1 To RowsPerActivity
Temp = list(j, k)
list(j, k) = list(i, k)
list(i, k) = Temp
Next k
End If
Next j
Next i
End Sub

Speed up Excel VBA search script

I need to search for duplicate values and mark them in an Excel spreadsheet. I have my data to verify in column D and the data where possible duplicates are in column K. I need to check for each row in column D all the rows in col. K.
This is my current script for this:
Sub MySub()
Dim ThisCell1 As Range
Dim ThisCell2 As Range
For Each ThisCell1 In Range("D1:D40000")
'This is the range of cells to check
For Each ThisCell2 In Range("K1:K40000")
'This is the range of cells to compare
If ThisCell1.Value = ThisCell2.Value Then
If ThisCell1.Value <> "" Then
ThisCell1.Interior.ColorIndex = 3
End If
Exit For
End If
Next ThisCell2
Next ThisCell1
End Sub
The problem with this is that it's VERY slow. I mean it takes hours to check the data which is not acceptable. Even when the range is set to 1:5000, it still takes 10-15 minutes to finish. Is there any way to make it faster?
A dictionary will be the fastest way to achieve what you are looking for. Don't forget to add a reference to the 'microsoft scripting runtime' in your project
Sub MySubFast()
Dim v1 As Variant
Dim dict As New Scripting.Dictionary
Dim c As Range
v1 = Range("D1:D40000").Value
For Each c In Range("K1:K40000")
If Not dict.Exists(c.Value) Then
dict.Add c.Value, c
End If
Next
Dim i As Long
For i = LBound(v1, 1) To UBound(v1, 1)
If v1(i, 1) <> "" Then
If dict.Exists(v1(i, 1)) Then
Range("D" & i).Interior.ColorIndex = 3
End If
End If
Next i
End Sub
note : this is an improvement of #Jeanno answer.
Use arrays instead of referencing objects (Ranges) way faster.
Sub MySubFast()
Dim v1 As Variant
Dim v2 As Variant
v1 = Range("D1:D40000").Value
v2 = Range("K1:K40000").Value
Dim i As Long, j As Long
For i = LBound(v1, 1) To UBound(v1, 1)
For j = LBound(v2, 1) To UBound(v2, 1)
If v1(i, 1) = v2(j, 1) Then
If v1(i, 1) <> "" Then
Range("D" & i).Interior.ColorIndex = 3
End If
Exit For
End If
Next j
Next i
End Sub
Aren't you just highlighting cells in column D if the value exists in column K? No need for VBA for this, just use conditional formatting.
Select column D (selecting the whole column is fine)
Add a conditional format using this formula: =COUNTIF($K:$K,$D1)>0
The conditional format will apply and update automatically as you change data in columns D and K, and it should be basically instant

Fastest way to check if two ranges are equal in excel vba [duplicate]

This question already has answers here:
How to compare two entire rows in a sheet
(11 answers)
Closed 8 years ago.
Imagine you have two sets of data and the number of rows and columns are the same. Now you want check if data in cells in one set is equal to data in cells with the same relative address in the other set. If thats true for all cells of a row, remove the row from both sets. I can code this very easily by comparing each cell and that's not good for large data sets. See code below for two columns where the two sets of data happen to be in the same sheet side by side with 300 in column offset between them.
Dim RngOb As Range
Dim c As Range
Range("A1", "B1").Select
set RngOb = Range(Selection, Selection.End(xlDown))
For Each c In RngOb.Rows
If c.Cells(1,1).Value = c.Offset(0, 300).Cells(1,1).Value Then
If c.Cells(1,2).Value = c.Offset(0, 300).Cells(1,2).Value Then
c.EntireRow.Delete
End If
End If
Next
My actual data has more than 100 columns and different number of columns from day to day. I'm looking for a smart, fast way to do this for large data sets. I highly appriciate answers, feedback and criticism. :D
Here is a simple way to compare two rows in isomorphic ranges.............in this example row #5 of each range:
Sub RowCompare()
Dim ary1() As Variant
Dim Range1 As Range, Range2 As Range, rr1 As Range, rr2 As Range
Set Range1 = Range("B9:F20")
Set Range2 = Range("I16:M27")
Set rr1 = Range1.Rows(5)
Set rr2 = Range2.Rows(5)
ary1 = Application.Transpose(Application.Transpose(rr1))
ary2 = Application.Transpose(Application.Transpose(rr2))
st1 = Join(ary1, ",")
st2 = Join(ary2, ",")
If st1 = st2 Then
MsgBox "the same"
Else
MsgBox "different"
End If
End Sub
If you have embedded commas in the cells, then choose another character in the JOIN
If I understand your problem correctly, the following code should allow you to do what you want. Within the code, you select the range you wish to process; the first column of each data set, and the number of columns within each data set.
It does assume only two data sets, as you wrote, although that could be expanded. And there are ways of automatically determining the dataset columns, if there is no other data in between.
Option Explicit
Option Base 0
Sub RemoveDups()
Dim I As Long, J As Long
Dim rRng As Range
Dim vRng As Variant, vRes() As Variant
Dim bRng() As Boolean
Dim aColumns, lColumns As Long
Dim colRowsDelete As Collection
'vRng to include from first to last column to be tested
Set rRng = Range("f1", Cells(Rows.Count, "F").End(xlUp)).Resize(columnsize:=100)
vRng = rRng
ReDim bRng(1 To UBound(vRng))
'columns to be tested
'Specify First column of each data set
aColumns = Array(1, 13)
'num columns in each data set
lColumns = 3
For I = 1 To UBound(vRng)
bRng(I) = vRng(I, aColumns(0)) = vRng(I, aColumns(1))
For J = 1 To lColumns - 1
bRng(I) = bRng(I) And (vRng(I, aColumns(0) + J) = vRng(I, aColumns(1) + J))
Next J
Next I
'Rows to Delete
Set colRowsDelete = New Collection
For I = 1 To UBound(bRng)
If bRng(I) = True Then colRowsDelete.Add Item:=I
Next I
'Delete the rows
If colRowsDelete.Count > 0 Then
Application.ScreenUpdating = False
For I = colRowsDelete.Count To 1 Step -1
rRng.Rows(colRowsDelete.Item(I)).EntireRow.Delete
Next I
End If
Application.ScreenUpdating = True
End Sub