FormulaArray not averaging out all the specified entries - vba

Table 1 (columns G to K; the header is in row 1):
| Row | G: Lane | H: Bowler | I: Score | J: Score | K: Score |
|----:|:--------|----------:|:--------:|:--------:|:--------:|
| 2 | Lane 1 | Thomas | 100 | 100 | 100 |
| 3 | Lane 2 | column | 200 | 200 | 100 |
| 4 | Lane 3 | Mary | 300 | 300 | 100 |
| 5 | Lane 1 | Cool | 150 | 400 | 100 |
| 6 | Lane 2 | right | 160 | 500 | 100 |
| 7 | Lane 9 | Susan | 170 | 600 | 100 |
Say I want to find the average for each lane that appears in Table 2 and put the results in column O:
Table 2 (columns N and O; the header is in row 1):
| Row | N: Lane | O: Average |
|----:|:--------|-----------:|
| 2 | Lane 1 | |
| 3 | Lane 2 | |
| 4 | Lane 3 | |
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7)) for Lane 1 (in cell O2)
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7)) for Lane 2 (in cell O3)
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7)) for Lane 3 (in cell O4)
My first question is:
What if I want to find the average of ALL the lanes that appear in Table 2 together? That is, the average of Lane 1, Lane 2 and Lane 3 combined (but not of other lanes, such as Lane 9).
My attempt:
= AVERAGE(IF(G2:G7 = N2:N4, I2:K7)) Why doesn't this work?
My second question is
I have done the "average of each individual lane" using VBA:
Dim i As Long
For i = 2 To 4
    ' Column O is column 15; lanes are in column 7 (G), scores in columns 9 to 11 (I:K)
    Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C11))"
Next i
What if I did it in VBA without the .Formula method?
For Lane 1 only:
pseudo code:
For x = 2 To 7
    If N2 = Gx Then
        Sum = Sum + Ix + Jx + Kx
        totalEntries = totalEntries + 3
    End If
Next x
Average = Sum / totalEntries
Would this be slower than using the built-in .Formula? Is there an advantage to doing it this way instead?

The answer to the first question, why this FormulaArray
= AVERAGE(IF(G2:G7 = N2:N4, I2:K7))
doesn't work, is implicit in how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array of TRUE/FALSE values, one for each record in range $G$7:$G$12, indicating whether it complies with the condition = $N7. Fig. 1 shows the first part of the FormulaArray entered as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2).
3rd part: AVERAGE
Finally, the last part of the formula calculates the average of the scores complying with the condition = $N7.
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part, G2:G7 = N2:N4, as it fails when trying to compare two arrays of different dimensions, resulting in #N/A (see Fig. 3).
However, even if the arrays had the same dimensions, the result would not pick up the duplicated values, because the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with any of the conditions = $N7, = $N8 or = $N9 (here the + operator acts as a logical OR).
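For readers who want to check the OR logic outside Excel, here is a minimal Python sketch. It is not part of the original answer: it uses the sample data from Table 1 of the question, and the names `rows`, `wanted` and `average_for_lanes` are made up for illustration.

```python
# Sample data from Table 1 of the question: (lane, [score, score, score]).
rows = [
    ("Lane 1", [100, 100, 100]),
    ("Lane 2", [200, 200, 100]),
    ("Lane 3", [300, 300, 100]),
    ("Lane 1", [150, 400, 100]),
    ("Lane 2", [160, 500, 100]),
    ("Lane 9", [170, 600, 100]),
]


def average_for_lanes(rows, wanted):
    """Average every score whose lane equals ANY of the wanted lanes (logical OR)."""
    scores = [
        score
        for lane, lane_scores in rows
        if any(lane == w for w in wanted)  # same role as (G=N7)+(G=N8)+(G=N9)
        for score in lane_scores
    ]
    return sum(scores) / len(scores)


wanted = ["Lane 1", "Lane 2", "Lane 3"]
print(average_for_lanes(rows, wanted))  # 194.0
```

The Lane 9 row is skipped because it matches none of the wanted lanes, which is exactly what the summed conditions do in the FormulaArray.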
As regards the second question:
Performance is intrinsically tied to maintenance and efficiency.
The sample procedure just enters a hard-coded formula that only works for this particular case. For example:
If the ranges need to be expanded, the macro has to be updated. A worksheet formula may still have to be changed, but there is no need to open the VBA editor.
If any of the columns before column G is deleted because it has become obsolete, the macro needs to be updated, whereas worksheet formulas require no maintenance because they are updated automatically.
In reference to the macro without the .Formula method:
I find this redundant: it amounts to writing an algorithm for something that an existing function already does efficiently and accurately, so such a macro adds nothing that is not already there.
I would only see an advantage in writing such a procedure where the workbook is very large and uses so many resources that it slows down significantly. Even then, the benefit does not come from having the macro write the formulas: the procedure must calculate the results itself and enter the resulting values instead of the formulas, making the workbook light, fast and smooth for the end user.

To get the average of them all (every row in the range, Lane 9 included), just use
=AVERAGE(I2:K7)
As for the VBA, since each row's scores are all on the same row, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i


Confusing matching behaviour of pandas extract(all)

I have a strange problem. But first, I want to match a hierarchy-based string onto the value of a column in a pandas data frame and count the occurrence of the current node and all of its children.
| index | hierarchystr |
| ----- | --------------------- |
| 0 | level0level00level000|
| 1 | level0level01 |
| 2 | level0level02level021|
| 3 | level0level02level021|
| 4 | level0level02level020|
| 5 | level0level02level021|
| 6 | level1level02level021|
| 7 | level1level02level021|
| 8 | level1level02level021|
| 9 | level2level02level021|
Assume that there are 300k lines. Each node can have multiple children with again multiple children so on and so forth (here represented by level0-2 strings). Now I have a separate hierarchy where I extract the hierarchy strings from. Now to the problem:
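import pandas as pd

# example values; the real list holds thousands of node names
hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]
pat = "|".join(hstrs)
s = df.hierarchystr.str.extract('(' + pat + ')', expand=True)[0]
df1 = df.groupby(s).size().reset_index(name='Count')
# filter on the Count column, not on the whole frame
df1 = df1[df1['Count'] > 200]
size = len(df1)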
The number of matched substrings with more than 200 occurrences differs on every RUN! "level0" should match every row whose hierarchy string includes level0 and should build a group with all its subchildren, and that group's size needs to be greater than 200.
Edit:// levelX is just an example; I have thousands of nodes with different names and, again, thousands of different subchildren. The hstrs strings do not include each other, apart from the parent nodes. (E.g. "parent1" is included in "parent1subchild1" and "parent1subchild2")
I traced it back to a different order of the hierarchy strings in the array hstrs. So I changed the code and compare each substring individually:
matched = []
for hstr in hstrs:
    s = df.hierarchystr.str.extract('(' + hstr + ')', expand=True)
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        matched.append(hstr)
This is slow as hell, but the result stays the same no matter which order hstrs has. For efficiency, is it possible to do the same with only one regex matching group, all at once for all hstrs?
Edit://
expected output would be:
|index| 0 | Count |
|-----|---------------------|-------|
|0 |level0 | 5 |
|1 |level1 | 3 |
|2 |level0level01 | 1 |
|3 |level0level02 | 4 |
|4 |level0level02level021| 3 |
Edit2://
It has something to do with the ordering of hstrs. I think the cause is that the extract method stops after the first match. If the ordering is different, the hierarchy strings in pat are matched differently, which results in different sizes for each group. When a high hierarchy level (a short string) is matched first, the lower hierarchy levels in the same pat are not matched any more. But I don't know how to work around this behavior.
Edit3://
An alternative would be the following, but it is also slow as hell:
beforeset = []
for hstr in hstrs:
    s = df[df.hierarchystr.str.contains(hstr)]
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        beforeset.append(hstr)
Edit4://
I think what I am looking for is a way to do a "group_by" with "contains" or "is in" for the hstrs. I'd be glad for any idea. :)
Edit5://
Found a simple, but not satisfying alternative (but faster than the previous tries):
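from collections import Counter

# one entry per (row, hierarchy string) pair where the string occurs in the row
containing = [item for hierarchystr in df.hierarchystr for item in hstrs if item in hierarchystr]
containing = Counter(containing)
df1 = pd.DataFrame([containing]).T
nodeNamesWithOver200 = df1[df1 > 200].dropna().index.values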
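The ordering effect suspected in Edit2 can be reproduced with Python's `re` module alone, because regex alternation tries its alternatives from left to right and keeps the first one that matches. The following is only a minimal sketch on made-up data: the rows reuse the "parent1subchild1" style names mentioned in the first edit, and `first_match_counts` and `contains_counts` are hypothetical helper names. The patterns are passed through `re.escape`, which the code in the question does not do.

```python
import re
from collections import Counter

# Hypothetical sample rows and node names, modelled on the question's first edit.
rows = ["parent1subchild1", "parent1subchild1", "parent1subchild2", "parent2subchild1"]
hstrs = ["parent1", "parent1subchild1", "parent2"]


def first_match_counts(patterns, rows):
    """Count what one alternation pattern captures: at most one match per row."""
    pat = re.compile("(" + "|".join(re.escape(p) for p in patterns) + ")")
    return Counter(m.group(1) for m in map(pat.search, rows) if m)


def contains_counts(patterns, rows):
    """Count every pattern occurring in a row; the pattern order is irrelevant."""
    return Counter(p for row in rows for p in patterns if p in row)


short_first = first_match_counts(hstrs, rows)
long_first = first_match_counts(sorted(hstrs, key=len, reverse=True), rows)

print(short_first)  # parent1: 3, parent2: 1
print(long_first)   # parent1subchild1: 2, parent1: 1, parent2: 1
print(contains_counts(hstrs, rows))  # parent1: 3, parent1subchild1: 2, parent2: 1
```

With the short parent name first, the child pattern never gets a chance to match; with the long name first, the parent loses the rows that the child captured. Only the per-pattern count is independent of the order, and that is the same idea as the Counter approach in Edit5.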

Creating an Excel Lookup Table Sheet from a Comma Delimited and ID column

We exported a customer's table who was using AirTable to keep track of their client's information and locations in an attempt to import into a SQL database. Because of the way AirTable exports, the references to other tables in their "AirTable Base" are not via ID's, but exported in a single column as basically power labels for lack of a better explanation.
There's about 4,000 client rows in this table. Clients can have one or more locations. Excluding many of the other columns it looks like:
| Client_ID | Client_Name | ... | Locations
| 3456 | Acme Grocery | ... | "Memphis, TN","Orlando, FL","Philadelphia, PA"
| 3457 | Addition Financial | ... | "Miami, FL","Plano, TX","New York, NY"
| 3458 | Barros Pizza | ... | "Queen Creek, AZ"
We are trying to get the data ready for import into SQL, so we are attempting to find a formula/method which could take the Client_ID and then insert that into rows in a new data sheet made from the comma-delimited column. Using the above example the new data should look like the following:
| ClientInLocation_ID | Client_ID | Location |
| 10000 | 3456 | Memphis, TN |
| 10001 | 3456 | Orlando, FL |
| 10002 | 3456 | Philadelphia, PA |
| 10003 | 3457 | Miami, FL |
| 10004 | 3457 | Plano, TX |
| 10005 | 3457 | New York, NY |
| 10006 | 3458 | Queen Creek, AZ |
Doing so will allow us to then grab the unique locations, assign ID's to them and then replace the Location text with a Location_ID field.
I was thinking pivot tables, text to rows, etc. but perhaps I'm not experienced enough with them to pull this off. Also, any solutions can obviously exclude the ClientInLocation_ID auto increment as we could always have that autofilled once the other two fields are populated. Any help greatly appreciated.
There are many ways to tackle this problem. You can use PowerQuery (PQ) to do some of the lifting if you have an appropriate version of Excel. PQ is built into recently released Excel versions and is a free add-on for Excel 2013 and 2010 but is not available for anything older than Excel 2010. If you see a Power Query tab on the ribbon then you're good to go.
Use your data as the source for a new query and split the location column by the delimiter ",". To clarify, you are using three characters as the delimiter: the last quote of a location, the comma delimiting two locations, and the first quote of the second location. This puts one location in a cell, with subsequent locations in columns to the right.
Every cell in the first location column will have a quote in front of the text, and the cell holding the final location for that row will have a quote at the end of the text. These are easily cleared in PQ, but we're done here, so it's probably faster to click Close & Load to close the editor and use Ctrl+H in Excel to clear them.
Your data will automatically be converted into a table that is connected to your source data. That means that refreshing the table does two things: it wipes any edits you've made, and it updates the table with any changes in your source data. So either delete the query (if this is a one-and-done project) or copy the table to a new sheet (if you want to rapidly rebuild with new source data).
From there, I'd turn to VBA and use three nested For loops. The outer loop iterates every row in your data from the bottom up (Step -1). The middle loop iterates the columns to add new rows. The inner loop populates the rows.
This is quick, dirty, makes several assumptions and is in no way tested because it was written on my phone:
Option Explicit
Sub TransformTable()
    Dim ws As Worksheet
    Dim myTable As ListObject
    Dim rng As Range
    Dim j As Long
    Dim k As Long
    Dim l As Long
    Set ws = ActiveSheet
    Set myTable = ws.ListObjects(1)
    Application.ScreenUpdating = False
    ' Outer loop: every data row, from the bottom up
    For j = myTable.ListRows.Count To 2 Step -1
        ' Middle loop: add one new row per extra location found in row j
        For k = 1 To Application.WorksheetFunction.CountA(ws.Range(ws.Cells(j, 1), ws.Cells(j, myTable.ListColumns.Count))) - 3
            Set rng = ws.Cells(j, 1)
            myTable.ListRows.Add j + k
            ' Inner loop: copy the first two columns into the new row
            For l = 0 To 1
                rng.Offset(k, l) = rng.Offset(0, l)
            Next l
            ' Move the extra location into the new row and clear the old cell
            rng.Offset(k, 3) = rng.Offset(0, 3 + k)
            rng.Offset(0, 3 + k).Cells.Clear
        Next k
    Next j
    Application.ScreenUpdating = True
End Sub
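If the export is also available as text outside Excel, the same reshaping can be sketched in a few lines of Python. This is only an illustration under that assumption, not the approach described above: `explode_locations` and `clients` are made-up names, the sample rows are the three clients from the question, and the starting ID of 10000 is taken from the question's example output.

```python
import csv
import io


def explode_locations(clients, start_id=10000):
    """Yield (ClientInLocation_ID, Client_ID, Location) rows, one per location."""
    next_id = start_id
    for client_id, locations in clients:
        # csv.reader handles the quoted fields, so "Memphis, TN" stays in one piece.
        for fields in csv.reader(io.StringIO(locations)):
            for location in fields:
                yield (next_id, client_id, location)
                next_id += 1


# The three sample clients from the question: (Client_ID, Locations cell).
clients = [
    (3456, '"Memphis, TN","Orlando, FL","Philadelphia, PA"'),
    (3457, '"Miami, FL","Plano, TX","New York, NY"'),
    (3458, '"Queen Creek, AZ"'),
]

result = list(explode_locations(clients))
for row in result:
    print(row)
```

The key point is the same as with the Power Query delimiter: the split has to respect the quotes, otherwise the comma inside "Memphis, TN" would break a location in two.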

Excel/VBA/Conditional Formatting: Dictionary of Dictionaries

I've got an Excel workbook that obtains data from an MS SQL database. One of the sheets is used to check the data against requirements and to highlight faults. In order to do that, I've got a requirements sheet where the requirement is in a named range; after updating the data I copy the conditional formatting of the table header to all data rows. That works pretty nicely so far. The problem comes when I have more than one set of requirements:
An (admittedly silly) example could be car racing, where requirements may exist for the driver's license and min/max horsepower. When looking at the example, please imagine there are a few thousand rows and 71 columns at present...
+-----+--------+----------------+------------+---------+
| Car | Race | RequirementSet | Horsepower | License |
+-----+--------+----------------+------------+---------+
| 1 | Monaco | 2 | 200 | A |
+-----+--------+----------------+------------+---------+
| 2 | Monaco | 2 | 400 | B |
+-----+--------+----------------+------------+---------+
| 3 | Japan | 3 | 200 | C |
+-----+--------+----------------+------------+---------+
| 4 | Japan | 3 | 300 | A |
+-----+--------+----------------+------------+---------+
| 5 | Japan | 3 | 350 | B |
+-----+--------+----------------+------------+---------+
| 6 | Mexico | 1 | 200 | A |
+-----+--------+----------------+------------+---------+
The individual data now needs to be checked against the requirements set in another sheet:
+-------------+---------------+---------------+---------+
| Requirement | MinHorsepower | MaxHorsepower | License |
+-------------+---------------+---------------+---------+
| 1 | 200 | 250 | A |
+-------------+---------------+---------------+---------+
| 2 | 250 | 500 | B |
+-------------+---------------+---------------+---------+
| 3 | 250 | 400 | A |
+-------------+---------------+---------------+---------+
In order to relate back to my present situation, I am only looking at either the Monaco, Japan or Mexico Race, and there is only 1 record in the requirements sheet, where the value in e.g. Cell B2 is always the MinHorsepower and the value in C2 is always the MaxHorsepower. So these cells are a named range that I can access in my data sheet.
Now however I would like to obtain all races at once, and refer conditional formatting formulas to the particular requirement.
Focussing on "Horsepower" in Monaco (requirement set 2), I can now find out that the min Horsepower is 250 and the max is 500 - so I will colour that column for car 1 as red and for car 2 as green.
The formula is programmatically copied from the header row (the first conditional format rule is: if ROW(D1) = 1, do nothing).
I can't decide what the best approach to the problem is. Ideally, the formula is readable, something like `AND(D2 >= MinHorsepower; D2 <= MaxHorsepower)`. I cannot imagine it being maintainable if I had to use VLOOKUP combined with INDIRECT and MATCH to match a column header in requirements for that particular requirement, especially when it comes to combining criteria, as in the HP example with min and max above.
I am wondering if I should read the requirements table into a dictionary or something in VBA, and then use a function like
Public Function check(requirementId As Long, requirement$)
which then in Excel I could use like =D2 >= check(C2, "MinHorsepower")
Playing around with this a little bit it appears to be pretty slow as opposed to the previous system where I could only have one requirement. It would be fantastic if you could help me out with a fresh approach to this problem. I'll update this question as I go along; I'm not sure if I managed to illustrate the example really well but the actual data wouldn't mean anything to you.
In any case, thanks for hanging in until here!
Edit 29 October 2016
I have found a solution as basis for mine. Using the following code I can add my whole requirements table to a dictionary, and access the requirement.
Using a class clsRangeToDictionary (based on Tim Williams clsMatrix)
Option Explicit
Private m_array As Variant
Private dictRows As Object
Private dictColumns As Object
Public Sub Init(vArray As Variant)
Dim i As Long
Set dictRows = CreateObject("Scripting.Dictionary")
Set dictColumns = CreateObject("Scripting.Dictionary")
'add the row keys and positions. Skip the first row as it contains the column key
For i = LBound(vArray, 1) + 1 To UBound(vArray, 1)
dictRows.Add vArray(i, 1), i
Next i
'add the column keys and positions, skipping the first column
For i = LBound(vArray, 2) + 1 To UBound(vArray, 2)
dictColumns.Add vArray(1, i), i
Next i
' store the array for future use
m_array = vArray
End Sub
Public Function GetValue(rowKey, colKey) As Variant
If dictRows.Exists(rowKey) And dictColumns.Exists(colKey) Then
GetValue = m_array(dictRows(rowKey), dictColumns(colKey))
Else
Err.Raise 1000, "clsRangeToDictionary:GetValue", "The requested row key " & CStr(rowKey) & " or column Key " & CStr(colKey) & " does not exist"
End If
End Function
' return a zero-based array of RowKeys
Public Function RowKeys() As Variant
RowKeys = dictRows.Keys
End Function
' return a zero-based array of ColumnKeys
Public Function ColumnKeys() As Variant
ColumnKeys = dictColumns.Keys
End Function
I can now read the whole RequirementSet table into a dictionary and write a helper to obtain the particular requirement roughly so:
myDictionaryObject.GetValue(table1's RequirementSet, "MinHorsepower")
If someone could help me figure out how to put this into an answer giving the credit due to Tim Williams that'd be great.
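To make the lookup idea easier to follow, here is a minimal Python sketch of the same two-dictionary structure, using the requirements table from the question. It is an illustration only, not a translation of Tim Williams' class: `RangeLookup`, `get_value` and `horsepower_ok` are made-up names.

```python
class RangeLookup:
    """Look up a cell by row key and column key, like the VBA class above."""

    def __init__(self, table):
        header, *body = table
        # Column keys map to positions, skipping the first column (the row keys).
        self._cols = {name: i for i, name in enumerate(header) if i > 0}
        # Row keys map to the whole row, skipping the header row.
        self._rows = {row[0]: row for row in body}

    def get_value(self, row_key, col_key):
        if row_key not in self._rows or col_key not in self._cols:
            raise KeyError(f"row key {row_key!r} or column key {col_key!r} does not exist")
        return self._rows[row_key][self._cols[col_key]]


# Requirements table from the question.
requirements = RangeLookup([
    ["Requirement", "MinHorsepower", "MaxHorsepower", "License"],
    [1, 200, 250, "A"],
    [2, 250, 500, "B"],
    [3, 250, 400, "A"],
])


def horsepower_ok(requirement_set, horsepower):
    """True when horsepower lies within the min/max of the given requirement set."""
    low = requirements.get_value(requirement_set, "MinHorsepower")
    high = requirements.get_value(requirement_set, "MaxHorsepower")
    return low <= horsepower <= high


print(horsepower_ok(2, 200))  # car 1 in Monaco: False
print(horsepower_ok(2, 400))  # car 2 in Monaco: True
```

Reading the table once and answering each check from memory is what keeps this kind of helper cheap; the lookup itself is two dictionary accesses per call.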

Microsoft Report Builder - Row totals percent of column total

I'd really appreciate some help with Report Builder. As seen below, I have a report that shows the number of items. In my SQL query I have used a CASE statement to tag some of the items with a y or a n.
What I want to do is add a calculated cell that sums all the values of the items tagged with y and divide by the total and * 100 to find the percent of the rows tagged y of the total amount.
The answer I'm looking for is:
Apple | Y | 100
Pear | Y | 200
Orange| N | 500
Total | 800
Percent of Ys = 37.5% ((100+200)/800*100)
I'm new to report builder so please let me know if this doesn't make sense.
Many thanks.
You could add two more columns to your query, using similar logic as your CASE statement for the Y/N column. The first column is populated with the value only when the condition for "Y" is true, otherwise it is zero. The second column is populated with the value only when the condition for "N" is true, otherwise it is zero. This would give you a result set similar to this:
      |   | All | Y   | N
Apple | Y | 100 | 100 | 0
Pear  | Y | 200 | 200 | 0
Orange| N | 500 | 0   | 500
Total |   | 800 | 300 | 500
Then your calculation is something like this:
Percent of Ys = (Sum(Y) / Sum(All)) * 100
i.e.
Percent of Ys = (300 / 800) * 100 = 37.5%
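The arithmetic can be checked with a short Python sketch. It only mirrors the calculation on the three sample rows from the question and is not Report Builder expression syntax; `rows` and `percent_tagged` are made-up names.

```python
# Sample rows from the question: (item, tag, value).
rows = [("Apple", "Y", 100), ("Pear", "Y", 200), ("Orange", "N", 500)]


def percent_tagged(rows, tag="Y"):
    """Sum of the values carrying the tag, as a percentage of the grand total."""
    total = sum(value for _, _, value in rows)
    tagged = sum(value for _, row_tag, value in rows if row_tag == tag)
    return tagged / total * 100


print(percent_tagged(rows))  # 37.5
```

Summing the tagged values first and dividing afterwards is the part that matters: dividing only the last value by the total gives a different number.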

Qlikview - Scatter chart dot colors dimension setup not working

I have some data that I want to display in scatter chart. I have the following two dimensions:
Dimension1: This is each record in the table - say unique id for each row. So the number of dots should be equal to number of records.
Dimension2: This is a combination of 2 columns, tp and vc. The color of each dot is based on these 2 columns.
tp vc
1 a 1
2 b 2
3 c 1
So there will be dots of 3 colors based on the above tp and vc combinations. Then there are 3 expressions representing X and Y and Size of dot. I am not sure how to configure the dimensions to achieve the goal.
Thanks
You will need a calculated dimension: a concatenation expression, defined as =tp & vc in your case.
This will be your single dimension. Your x, y and size expressions then make up the remaining requirements for this chart.
This will give you three colors, one for each unique combination, and they will be labeled a1, b2 and c1.
id | tp | vc | x | y | size
1 | a | 1 | 3 | 5 | 7
2 | b | 2 | 1 | 2 | 10
3 | c | 1 | 9 | 5 | 5