My data looks like this. It has three columns: Town Code, Ward Code and the Population of that Ward:
| Town | Ward | Population |
| 1000 | 10001 | 20 |
| 1000 | 10002 | 30 |
| 1000 | 10003 | 40 |
| 1234 | 12341 | 50 |
| 1234 | 12342 | 35 |
I am not able to write VBA code that gives me the sum of the population of all Wards under a Town (i.e. the total Population of a Town). Considerations: I have a huge dataset. I have multiple sheets with the same kind of data, with different Town IDs (and a different number of Wards per Town). Any help would be appreciated.
Here is your solution:
How to perform SumIf using VBA on an array in Excel
Sub FasterThanSumifs()
'FasterThanSumifs Concatenates the criteria values from columns A and B -
'then uses simple IF formulas (plus 1 sort) to get the same result as a sumifs formula
'Columns A & B contain the criteria ranges, column C is the range to sum
'NOTE: The data is already sorted on columns A AND B
'Concatenate the 2 values as 1 - can be used to concatenate any number of values
With Range("D2:D25001")
.FormulaR1C1 = "=RC[-3]&RC[-2]"
.Value = .Value
End With
'If formula sums the range-to-sum where the values are the same
With Range("E2:E25001")
.FormulaR1C1 = "=IF(RC[-1]=R[-1]C[-1],RC[-2]+R[-1]C,RC[-2])"
.Value = .Value
End With
'Sort the range of returned values to place the largest values above the lower ones
Range("A1:E25001").Sort Key1:=Range("D1"), Order1:=xlAscending, _
Key2:=Range("E1"), Order2:=xlDescending, Header:=xlYes
Sheet1.Sort.SortFields.Clear
'If formula returns the maximum value for each concatenated value match &
'is therefore the equivalent of using a Sumifs formula
With Range("F2:F25001")
.FormulaR1C1 = "=IF(RC[-2]=R[-1]C[-2],R[-1]C,RC[-1])"
.Value = .Value
End With
End Sub
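As an alternative to the helper-column approach above, the per-town totals can also be accumulated in a Scripting.Dictionary in a single pass. This is only a sketch under assumptions that are not in the original post: the Town code is in column A and the Population in column C of the active sheet, row 1 is a header, and the results go to columns E:F. The procedure name TownTotals is made up for illustration.

```vba
Option Explicit

' Hypothetical helper: sums column C (Population) per Town code in column A
' of the active sheet and writes the totals to columns E:F.
' Assumes a header in row 1 and at least one data row.
Sub TownTotals()
    Dim data As Variant, totals As Object
    Dim lastRow As Long, i As Long, k As Variant

    lastRow = Cells(Rows.Count, "A").End(xlUp).Row
    data = Range("A2:C" & lastRow).Value          ' read the block once into an array
    Set totals = CreateObject("Scripting.Dictionary")

    For i = 1 To UBound(data, 1)
        ' Reading a missing key returns Empty (treated as 0); assigning creates the key
        totals(data(i, 1)) = totals(data(i, 1)) + data(i, 3)
    Next i

    Range("E1:F1").Value = Array("Town", "Total Population")
    i = 2
    For Each k In totals.Keys
        Cells(i, "E").Value = k
        Cells(i, "F").Value = totals(k)
        i = i + 1
    Next k
End Sub
```

With the sample data from the question this would give 90 for Town 1000 and 85 for Town 1234. To cover several sheets, the same procedure could be run once per worksheet; the data does not need to be sorted for this approach.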
I've got an Excel workbook that obtains data from an MS SQL database. One of the sheets is used to check the data against requirements and to highlight faults. In order to do that, I've got a requirements sheet where the requirement is in a named range; after updating the data I copy the conditional formatting of the table header to all data rows. That works pretty nicely so far. The problem comes when I have more than one set of requirements:
An (agreeably silly) example could be car racing, where requirements may exist for driver's license and min/max horsepower. When looking at the example, please imagine there are a few thousand rows and 71 columns presently...
+-----+--------+----------------+------------+---------+
| Car | Race | RequirementSet | Horsepower | License |
+-----+--------+----------------+------------+---------+
| 1 | Monaco | 2 | 200 | A |
+-----+--------+----------------+------------+---------+
| 2 | Monaco | 2 | 400 | B |
+-----+--------+----------------+------------+---------+
| 3 | Japan | 3 | 200 | C |
+-----+--------+----------------+------------+---------+
| 4 | Japan | 3 | 300 | A |
+-----+--------+----------------+------------+---------+
| 5 | Japan | 3 | 350 | B |
+-----+--------+----------------+------------+---------+
| 6 | Mexico | 1 | 200 | A |
+-----+--------+----------------+------------+---------+
The individual data now needs to be checked against the requirements set in another sheet:
+-------------+---------------+---------------+---------+
| Requirement | MinHorsepower | MaxHorsepower | License |
+-------------+---------------+---------------+---------+
| 1 | 200 | 250 | A |
+-------------+---------------+---------------+---------+
| 2 | 250 | 500 | B |
+-------------+---------------+---------------+---------+
| 3 | 250 | 400 | A |
+-------------+---------------+---------------+---------+
In order to relate back to my present situation, I am only looking at either the Monaco, Japan or Mexico Race, and there is only 1 record in the requirements sheet, where the value in e.g. Cell B2 is always the MinHorsepower and the value in C2 is always the MaxHorsepower. So these cells are a named range that I can access in my data sheet.
Now however I would like to obtain all races at once, and refer conditional formatting formulas to the particular requirement.
Focussing on "Horsepower" in Monaco (requirement set 2), I can now find out that the min Horsepower is 250 and the max is 500 - so I will colour that column for car 1 as red and for car 2 as green.
The formula is programmatically copied from the header row (the first conditional format rule is: if row(D1) = 1 then do nothing).
I can't decide what the best approach to the problem is. Ideally, the formula is readable, something like `AND(D2 >= MinHorsepower; D2 <= MaxHorsepower)`. I cannot imagine it being maintainable if I had to use Vlookup combined with Indirect and Match to match a column header in requirements for that particular requirement, especially when it comes to combining criteria like the min and max in the HP example above.
I am wondering if I should read the requirements table into a dictionary or something in VBA, and then use a function like
Public Function check(requirementId As Long, requirement As String)
which then in Excel I could use like =D2 >= check(c2, "MinHorsepower")
Playing around with this a little bit it appears to be pretty slow as opposed to the previous system where I could only have one requirement. It would be fantastic if you could help me out with a fresh approach to this problem. I'll update this question as I go along; I'm not sure if I managed to illustrate the example really well but the actual data wouldn't mean anything to you.
In any case, thanks for hanging in until here!
Edit 29 October 2016
I have found a solution as basis for mine. Using the following code I can add my whole requirements table to a dictionary, and access the requirement.
Using a class clsRangeToDictionary (based on Tim Williams's clsMatrix):
Option Explicit
Private m_array As Variant
Private dictRows As Object
Private dictColumns As Object
Public Sub Init(vArray As Variant)
Dim i As Long
Set dictRows = CreateObject("Scripting.Dictionary")
Set dictColumns = CreateObject("Scripting.Dictionary")
'add the row keys and positions. Skip the first row as it contains the column key
For i = LBound(vArray, 1) + 1 To UBound(vArray, 1)
dictRows.Add vArray(i, 1), i
Next i
'add the column keys and positions, skipping the first column
For i = LBound(vArray, 2) + 1 To UBound(vArray, 2)
dictColumns.Add vArray(1, i), i
Next i
' store the array for future use
m_array = vArray
End Sub
Public Function GetValue(rowKey, colKey) As Variant
If dictRows.Exists(rowKey) And dictColumns.Exists(colKey) Then
GetValue = m_array(dictRows(rowKey), dictColumns(colKey))
Else
Err.Raise 1000, "clsRangeToDictionary:GetValue", "The requested row key " & CStr(rowKey) & " or column Key " & CStr(colKey) & " does not exist"
End If
End Function
' return a zero-based array of RowKeys
Public Function RowKeys() As Variant
RowKeys = dictRows.Keys
End Function
' return a zero-based array of ColumnKeys
Public Function ColumnKeys() As Variant
ColumnKeys = dictColumns.Keys
End Function
I can now read the whole RequirementSet table into a dictionary and write a helper to obtain the particular requirement roughly so:
myDictionaryObject.GetValue(table1's RequirementSet, "MinHorsePower")
If someone could help me figure out how to put this into an answer giving the credit due to Tim Williams that'd be great.
I have a large data set which looks like this:
Employee ID |Job| Function| Level|Date of change
1 | x | a | A1 | 01/05/2014
1 | y | a | A1 | 02/04/2015
1 | y | a | A2 | 25/08/2015
1 | z | a | A3 | 27/12/2015
1 | z | c | A3 | 01/03/2016
2 | t | b | B1 | 12/05/2013
2 | v | b | B1 | 13/04/2014
2 | w | b | B3 | 12/01/2016
Each row contains a change in either job, function or level.
I need to create a table which puts together the latest change in level for each employee (so for employee 1, it would be row 4). So far I have used a combination of conditional formatting and pivots but I was wondering if there is a way to do this quicker in VBA.
Thanks!
Without VBA
This assumes that there are genuine dates in column E with format dd/mm/yyyy. In G1 enter the Array Formula:
=MAX(IF(A:A=1,E:E,""))
This gives the latest date for employee 1
Array formulas must be entered with Ctrl + Shift + Enter rather than just the Enter key.
Then in G2 enter:
=SUMPRODUCT(--(A1:A9=1)*(E1:E9=G1)*(ROW(1:9)))
This gives the row number of the record you are interested in.
From there you can use INDEX() to get any information from that row.
NOTE:
The formulas in G1 and G2 can be combined into a single cell if desired.
EDIT#1:
The same set of formulas should work with text values for the employee id as well as numbers.
Not sure this is the best solution, but since this is a one-off exercise and it did the trick I used this:
VBA to find all the rows where there was a change in level, and write a "yes" in column "F" where applicable:
Sub JLChange()
Dim Data As Worksheet
Dim i As Long
Dim lastrow As Long
Set Data = ThisWorkbook.Worksheets("Data")
lastrow = Data.Cells(Rows.Count, 1).End(xlUp).Row
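' Compare each row with the one above it; stop one row early so that
' nothing is written below the last data row
For i = 1 To lastrow - 1
    If Data.Cells(i + 1, 4).Value <> Data.Cells(i, 4).Value And _
       Data.Cells(i + 1, 1).Value = Data.Cells(i, 1).Value Then
        Data.Cells(i + 1, 6).Value = "Yes"
    Else
        Data.Cells(i + 1, 6).Value = "No"
    End If
Next i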
End Sub
For all the records in column "F", I used the formulas suggested by Gary's student to get the very last change.
Alternatively, you can copy-paste this database of changes in a new sheet, sort by ID and by date of change from newest to oldest, then use vlookup to get the first entry for each ID.
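The "last change in level per employee" step can also be done in one pass with a Scripting.Dictionary. The following is a rough sketch, not the method used above. It assumes the layout from the question (Employee ID in column A, Level in column D, Date of change in column E, header in row 1, at least two data rows) and that the rows are already sorted by employee and then by date. The procedure name LatestLevelChange is hypothetical.

```vba
Option Explicit

' Hypothetical sketch: remembers, for each employee, the sheet row of the most
' recent row whose Level differs from the previous row of the same employee.
' Assumes the data is sorted by Employee ID, then by Date of change.
Sub LatestLevelChange()
    Dim data As Variant, lastChange As Object
    Dim lastRow As Long, i As Long, k As Variant

    lastRow = Cells(Rows.Count, "A").End(xlUp).Row
    data = Range("A2:E" & lastRow).Value
    Set lastChange = CreateObject("Scripting.Dictionary")

    For i = 2 To UBound(data, 1)
        If data(i, 1) = data(i - 1, 1) And data(i, 4) <> data(i - 1, 4) Then
            ' Array index i corresponds to sheet row i + 1 because of the header
            lastChange(data(i, 1)) = i + 1
        End If
    Next i

    ' Print employee, sheet row, new level and date to the Immediate window
    For Each k In lastChange.Keys
        Debug.Print k, lastChange(k), _
                    Cells(lastChange(k), "D").Value, Cells(lastChange(k), "E").Value
    Next k
End Sub
```

Later rows overwrite earlier ones for the same employee, so only the latest change survives. Employees with no level change at all do not appear in the dictionary.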
Table 1:
G H I J K
| Lane | Bowler | Score | Score | Score | 1
|:-----------|------------:|:------------:|:------------:|:------------:|
| Lane 1 | Thomas| 100 | 100 | 100 | 2
| Lane 2 | column | 200 | 200 | 100 | 3
| Lane 3 | Mary | 300 | 300 | 100 | 4
| Lane 1 | Cool | 150 | 400 | 100 | 5
| Lane 2 | right | 160 | 500 | 100 | 6
| Lane 9 | Susan | 170 | 600 | 100 | 7
say I want to find the average for each Lane that appeared in table 2 and put them in column O:
Table 2:
N O
| Lane | Average | 1
|:-----------|------------:|
| Lane 1 | | 2
| Lane 2 | | 3
| Lane 3 | | 4
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7 )) for lane 1 (put this formula on cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7 )) for Lane 3 ("O4")
My first question is
What if I want to find the Average of ALL the lane together that appear in table 2. So average of Lane 1, Lane 2 and Lane 3 together (but not other lane, such as lane 9).
My attempt:
= Average(IF(G2:G7 = N2:N4, I2:K7)) why doesn't this work?
My second question is
I have done the "average of each individual Lane" using vba:
Dim i As Integer
For i = 2 To 4
    ' Column 15 is O; the scores are in columns I:K, i.e. columns 9 to 11
    Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C11))"
Next i
What if I did it using VBA without the .Formula method?
For Lane 1 only:
pseudo code:
Loop x from 2 to 7
    If cell N2 = Gx Then
        Sum = Sum + Ix + Jx + Kx
        totalEntries = totalEntries + 3
    End If
Average = Sum / totalEntries
Would this be slower than using the built-in .Formula? Is there an advantage to doing it this way instead?
The answer to the first question, why this FormulaArray
= Average(IF(G2:G7 = N2:N4, I2:K7)) doesn't work,
follows from how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array with the records from range $G$7:$G$12 complying with the condition = $N7. Fig. 1 shows the first part of the FormulaArray as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part G2:G7 = N2:N4, as it fails when trying to compare two arrays of different dimensions, resulting in #N/A (see Fig. 3).
However, even if the arrays had the same dimension, the result would not show the duplicated values, as the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with any of the conditions = $N7, = $N8 or = $N9 (adding the comparisons with + is equivalent to the OR operator).
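The same OR logic can be written as a plain VBA loop, which is roughly what the question's pseudo code describes. The function below is a hedged sketch, not part of the original answer: the name AverageForLanes and its three parameters are hypothetical, and it assumes that labels is a single column with as many rows as scores and that each range contains more than one cell.

```vba
Option Explicit

' Hypothetical UDF: averages every numeric cell in "scores" whose row label in
' "labels" matches any cell in "wanted" (an OR across the wanted list).
' Assumes labels and wanted are single columns and every range has 2+ cells.
Public Function AverageForLanes(labels As Range, scores As Range, _
                                wanted As Range) As Variant
    Dim lab As Variant, sc As Variant, w As Variant
    Dim want As Object, r As Long, c As Long
    Dim total As Double, n As Long

    lab = labels.Value: sc = scores.Value: w = wanted.Value

    ' Put the wanted lanes into a dictionary for quick membership tests
    Set want = CreateObject("Scripting.Dictionary")
    For r = 1 To UBound(w, 1)
        want(w(r, 1)) = True
    Next r

    For r = 1 To UBound(lab, 1)
        If want.Exists(lab(r, 1)) Then
            For c = 1 To UBound(sc, 2)
                ' Skip blanks and text so they do not distort the average
                If IsNumeric(sc(r, c)) And Not IsEmpty(sc(r, c)) Then
                    total = total + sc(r, c)
                    n = n + 1
                End If
            Next c
        End If
    Next r

    If n = 0 Then
        AverageForLanes = CVErr(xlErrDiv0)
    Else
        AverageForLanes = total / n
    End If
End Function
```

With the layout from the question it could be entered as =AverageForLanes($G$2:$G$7,$I$2:$K$7,$N$2:$N$4), which for the sample scores comes to 194. Note that the dictionary lookup is case-sensitive by default, unlike the = comparison in a worksheet formula.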
As regards the second question:
Performance should be weighed together with maintenance and efficiency.
The sample procedure just enters a formula which is hard coded and only works for this particular case, for example:
If the ranges need to be expanded, the macro has to be updated. With plain worksheet formulas you would still have to change the formula, but there would be no need to open the VBA editor.
If any of the columns before column G gets deleted because it has become obsolete, the macro needs to be updated, while worksheet formulas do not require any maintenance as they are adjusted automatically.
In reference to the macro without the .Formula method:
I find this redundant, as it amounts to writing an algorithm for something that can be done efficiently and accurately with an existing function. Such a macro does not add anything that is not already there.
I would only consider writing such a procedure when the workbook is very large and uses so many resources that it slows down significantly. Even then, the benefit does not come from having the macro write the formulas: the procedure must calculate the results and enter the resulting values instead of the formulas, which keeps the workbook light, fast and smooth for the end user.
To get the average of them all, just use
=AVERAGE(I2:K7)
As to the VBA, as it is all done on the same lines, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i
I have a spreadsheet like this:
Item Category | Customer Category
--------------|-------------------
A | Z
A | Z
A | Y
B | Y
B | Z
B | Z
etc.. on to > 35K rows.
I am trying to write a VBA module to count and merge rows which are the same in both columns, but keep coming up short.
I want output like:
Item Category | Customer Category | Count
--------------|--------------------|------
A | Z | 2
A | Y | 1
B | Y | 1
B | Z | 2
And so on...
It sounds so simple, and I found numerous examples on here sorting and counting much more complicated spreadsheets, but can't get mine to work.
Try this easy way using Advanced Filter.
Sub Test()
Dim rng As Range, lrow As Long
With Sheet1
.Columns("E:G").ClearContents: .Range("G1") = "Count"
Set rng = .Range("A1", .Range("B" & .Rows.Count).End(xlUp))
rng.AdvancedFilter Action:=xlFilterCopy, _
CopyToRange:=.Range("E1"), Unique:=True
lrow = .Range("E" & .Rows.Count).End(xlUp).Row
With .Range("G2", "G" & lrow)
.Formula = "=COUNTIFS(A:A,E2,B:B,F2)"
.Value = .Value
End With
End With
End Sub
I think this is OK with 35k rows. Every time you run the code, it summarizes the items in columns A and B into columns E to G. This is just something to get you started. And of course the easiest would be a Pivot Table, as commented by Tim. HTH
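If the COUNTIFS step turns out to be slow on larger data, the counting can also be done in memory with a Scripting.Dictionary. This is a sketch under the same layout as the code above (data in columns A:B with a header in row 1, output to columns E:G), but it works on the active sheet rather than Sheet1, and the procedure name CountPairs is made up for illustration.

```vba
Option Explicit

' Hypothetical alternative: counts identical (column A, column B) pairs on the
' active sheet and writes one row per pair, with its count, to columns E:G.
' Assumes a header in row 1 and at least one data row.
Sub CountPairs()
    Dim data As Variant, counts As Object
    Dim lastRow As Long, i As Long, k As Variant, parts As Variant

    lastRow = Cells(Rows.Count, "A").End(xlUp).Row
    data = Range("A2:B" & lastRow).Value
    Set counts = CreateObject("Scripting.Dictionary")

    For i = 1 To UBound(data, 1)
        ' A tab separates the two values so "AB"+"C" and "A"+"BC" stay distinct
        k = data(i, 1) & vbTab & data(i, 2)
        counts(k) = counts(k) + 1
    Next i

    Range("E1:G1").Value = Array("Item Category", "Customer Category", "Count")
    i = 2
    For Each k In counts.Keys
        parts = Split(k, vbTab)
        Cells(i, "E").Resize(1, 3).Value = Array(parts(0), parts(1), counts(k))
        i = i + 1
    Next k
End Sub
```

The pairs come out in the order in which they first appear in the data. The sketch assumes that the category values themselves never contain a tab character.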
Below is the table
Amt | Val | Location
230 | a | DEL
450 | b | KOL
670 | c | BLR
890 | d | DEL
111 | e | KOL
133 | a | KOL
155 | b | DEL
177 | c | BLR
199 | a | DEL
221 | b | BLR
243 | c | BLR
265 | d | KOL
287 | a | KOL
309 | b | DEL
331 | c | DEL
353 | d | KOL
375 | e | BLR
397 | a | BLR
419 | b | DEL
441 | c | KOL
Out of the values a, b, c, d and e, how do I find the top 2 values for each location, based on the total amount of all the a's, b's, c's, d's and e's?
I am able to get the sum of amounts of the top 2 values through a Pivot Table, for one location.
Please tell me how to get the top 2 values with their sum of amount for all locations simultaneously through VBA.
I have posted VBA code for the same, which gives the result for only one location.
Sorry, I am not able to upload the snapshot.
Say your data is in A1 thru C20. You have three unique locations: DEL, KOL, BLR.
In D1 enter:
=SUMPRODUCT(--(A$1:A$20)*(C$1:C$20=C1)) and copy down thru D3
In E1 enter:
=LARGE(D1:D3,1)
In E2 enter:
=LARGE(D1:D3,2)
Should look like:
EDIT:
based upon your comment, the highest two values for DEL would be:
=LARGE(IF(C1:C20="del",A1:A20),1)
and
=LARGE(IF(C1:C20="del",A1:A20),2)
These are array formulas that must be entered with Ctrl + Shift + Enter rather than just the Enter key.
The DMAX function returns the largest number in a column of a list or database, based on given criteria.
http://www.techonthenet.com/excel/formulas/dmax.php
1. Insert a Pivot Table.
2. Add Val to Row Labels.
3. Add Location to Column Labels.
4. Add Amt to the Values field (Sum of Amt).
Now, in the created Pivot Table:
1. In Column Labels, filter for only one location (e.g. BLR).
2. In the Row Labels filter, choose Value Filters and select Top 10... (the last item).
3. In place of the default 10, enter 2.
4. The table now consists of the top 2 Val items with their sum of Amt for the BLR location.
VBA Code for the Same:
Private Sub CommandButton1_Click()
    Dim wkbk As Workbook
    Dim rngSource As Range, dst As Range
    Dim LastRow As Long, LastCol As Long

    Set wkbk = ActiveWorkbook

    ' Source data: contiguous block starting at A1 on the first sheet
    With wkbk.Sheets(1)
        LastRow = .Range("A1").End(xlDown).Row
        LastCol = .Range("A1").End(xlToRight).Column
        Set rngSource = .Range("A1", .Cells(LastRow, LastCol))
    End With

    ' The pivot table is created at A1 on the second sheet
    Set dst = wkbk.Sheets(2).Range("A1")

    wkbk.Sheets(1).PivotTableWizard _
        SourceType:=xlDatabase, _
        SourceData:=rngSource, _
        TableDestination:=dst, _
        TableName:="Pivotinfo"

    With wkbk.Sheets(2).PivotTables("Pivotinfo")
        .PivotFields("Val").Orientation = xlRowField
        .PivotFields("Location").Orientation = xlColumnField
        With .PivotFields("Amt")
            .Orientation = xlDataField
            .Function = xlSum
        End With
        ' Hide DEL and KOL so that only BLR remains visible
        With .PivotFields("Location")
            .PivotItems("DEL").Visible = False
            .PivotItems("KOL").Visible = False
        End With
        ' Show only the top 2 Val items by Sum of Amt
        .PivotFields("Val").AutoShow xlAutomatic, xlTop, 2, "Sum of Amt"
    End With
End Sub
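To get the top 2 for every location at once without a Pivot Table, the sums can be collected in nested dictionaries. This is only a sketch: it assumes Amt, Val and Location are in columns A:C of the active sheet with a header in row 1, it prints to the Immediate window instead of writing to a sheet, and the procedure name TopTwoPerLocation is hypothetical.

```vba
Option Explicit

' Hypothetical sketch: for every Location, finds the two Val items with the
' largest summed Amt and prints them to the Immediate window.
' Assumes Amt, Val, Location in columns A:C, header in row 1, 1+ data rows.
Sub TopTwoPerLocation()
    Dim data As Variant, byLoc As Object, sums As Object
    Dim lastRow As Long, i As Long, loc As Variant, v As Variant
    Dim best1 As Variant, best2 As Variant, amt1 As Double, amt2 As Double

    lastRow = Cells(Rows.Count, "A").End(xlUp).Row
    data = Range("A2:C" & lastRow).Value
    Set byLoc = CreateObject("Scripting.Dictionary")

    ' Pass 1: sum Amt per (Location, Val) in one dictionary per location
    For i = 1 To UBound(data, 1)
        If Not byLoc.Exists(data(i, 3)) Then
            byLoc.Add data(i, 3), CreateObject("Scripting.Dictionary")
        End If
        Set sums = byLoc(data(i, 3))
        sums(data(i, 2)) = sums(data(i, 2)) + data(i, 1)
    Next i

    ' Pass 2: pick the two largest sums within each location
    For Each loc In byLoc.Keys
        Set sums = byLoc(loc)
        best1 = Empty: best2 = Empty: amt1 = 0: amt2 = 0
        For Each v In sums.Keys
            If IsEmpty(best1) Or sums(v) > amt1 Then
                best2 = best1: amt2 = amt1      ' old leader drops to second place
                best1 = v: amt1 = sums(v)
            ElseIf IsEmpty(best2) Or sums(v) > amt2 Then
                best2 = v: amt2 = sums(v)
            End If
        Next v
        Debug.Print loc, best1, amt1, best2, amt2
    Next loc
End Sub
```

For the table in the question this would report d (890) and b (883) for DEL, d (618) and b (450) for KOL, and c (1090) and a (397) for BLR. Ties are resolved in favour of the value encountered first.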