Excel/VBA/Conditional Formatting: Dictionary of Dictionaries - vba

I've got an Excel workbook that obtains data from an MS SQL database. One of the sheets is used to check the data against requirements and to highlight faults. In order to do that, I've got a requirements sheet where the requirement is in a named range; after updating the data I copy the conditional formatting of the table header to all data rows. That works pretty nicely so far. The problem comes when I have more than one set of requirements:
An (admittedly silly) example could be car racing, where requirements may exist for the driver's license and min/max horsepower. When looking at the example, please imagine there are a few thousand rows and 71 columns at present...
+-----+--------+----------------+------------+---------+
| Car | Race | RequirementSet | Horsepower | License |
+-----+--------+----------------+------------+---------+
| 1 | Monaco | 2 | 200 | A |
+-----+--------+----------------+------------+---------+
| 2 | Monaco | 2 | 400 | B |
+-----+--------+----------------+------------+---------+
| 3 | Japan | 3 | 200 | C |
+-----+--------+----------------+------------+---------+
| 4 | Japan | 3 | 300 | A |
+-----+--------+----------------+------------+---------+
| 5 | Japan | 3 | 350 | B |
+-----+--------+----------------+------------+---------+
| 6 | Mexico | 1 | 200 | A |
+-----+--------+----------------+------------+---------+
The individual data now needs to be checked against the requirements set in another sheet:
+-------------+---------------+---------------+---------+
| Requirement | MinHorsepower | MaxHorsepower | License |
+-------------+---------------+---------------+---------+
| 1 | 200 | 250 | A |
+-------------+---------------+---------------+---------+
| 2 | 250 | 500 | B |
+-------------+---------------+---------------+---------+
| 3 | 250 | 400 | A |
+-------------+---------------+---------------+---------+
In order to relate back to my present situation, I am only looking at either the Monaco, Japan or Mexico Race, and there is only 1 record in the requirements sheet, where the value in e.g. Cell B2 is always the MinHorsepower and the value in C2 is always the MaxHorsepower. So these cells are a named range that I can access in my data sheet.
Now however I would like to obtain all races at once, and refer conditional formatting formulas to the particular requirement.
Focussing on "Horsepower" in Monaco (requirement set 2), I can now find out that the min Horsepower is 250 and the max is 500 - so I will colour that column for car 1 as red and for car 2 as green.
The formula is programmatically copied from the header row (the first conditional format rule is: if row(D1) = 1 then do nothing).
I can't decide what the best approach to the problem is. Ideally, the formula is readable, something like `AND(D2 >= MinHorsepower; D2 <= MaxHorsepower)`. I cannot imagine it being maintainable if I had to use Vlookup combined with Indirect and Match to match a column header in requirements for that particular requirement, especially when it comes to combining criteria as in the HP example with min and max above.
I am wondering if I should read the requirements table into a dictionary or something in VBA, and then use a function like
Public Function check(requirementId As Long, requirement As String) As Variant
which then in Excel I could use like =D2 >= check(c2, "MinHorsepower")
Playing around with this a little bit it appears to be pretty slow as opposed to the previous system where I could only have one requirement. It would be fantastic if you could help me out with a fresh approach to this problem. I'll update this question as I go along; I'm not sure if I managed to illustrate the example really well but the actual data wouldn't mean anything to you.
In any case, thanks for hanging in until here!
Edit 29 October 2016
I have found a solution as basis for mine. Using the following code I can add my whole requirements table to a dictionary, and access the requirement.
Using a class clsRangeToDictionary (based on Tim Williams' clsMatrix):
Option Explicit

Private m_array As Variant
Private dictRows As Object
Private dictColumns As Object

Public Sub Init(vArray As Variant)
    Dim i As Long
    Set dictRows = CreateObject("Scripting.Dictionary")
    Set dictColumns = CreateObject("Scripting.Dictionary")
    ' add the row keys and positions. Skip the first row as it contains the column keys
    For i = LBound(vArray, 1) + 1 To UBound(vArray, 1)
        dictRows.Add vArray(i, 1), i
    Next i
    ' add the column keys and positions, skipping the first column
    For i = LBound(vArray, 2) + 1 To UBound(vArray, 2)
        dictColumns.Add vArray(1, i), i
    Next i
    ' store the array for future use
    m_array = vArray
End Sub

Public Function GetValue(rowKey, colKey) As Variant
    If dictRows.Exists(rowKey) And dictColumns.Exists(colKey) Then
        GetValue = m_array(dictRows(rowKey), dictColumns(colKey))
    Else
        Err.Raise 1000, "clsRangeToDictionary:GetValue", "The requested row key " & CStr(rowKey) & " or column key " & CStr(colKey) & " does not exist"
    End If
End Function

' return a zero-based array of RowKeys
Public Function RowKeys() As Variant
    RowKeys = dictRows.Keys
End Function

' return a zero-based array of ColumnKeys
Public Function ColumnKeys() As Variant
    ColumnKeys = dictColumns.Keys
End Function
I can now read the whole RequirementSet table into a dictionary and write a helper to obtain the particular requirement roughly so:
myDictionaryObject.GetValue(table1's RequirementSet, "MinHorsepower")
If someone could help me figure out how to put this into an answer giving the credit due to Tim Williams that'd be great.
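The helper mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: the named range "RequirementsTable" (covering the header row and all requirement rows), the function name CheckRequirement and the sub ResetRequirements are hypothetical names, not part of the original workbook. It builds the clsRangeToDictionary instance once and reuses it, which should avoid re-reading the sheet for every cell.

```vba
' Standard module. Relies on the class clsRangeToDictionary shown above.
Option Explicit

' Cached lookup; Nothing until the first call (assumption: one requirements table)
Private reqs As clsRangeToDictionary

Public Function CheckRequirement(ByVal requirementId As Double, _
                                 ByVal requirementName As String) As Variant
    ' Build the lookup once and reuse it for every cell that calls the function
    If reqs Is Nothing Then
        Set reqs = New clsRangeToDictionary
        ' "RequirementsTable" is a hypothetical named range (headers + data)
        reqs.Init ThisWorkbook.Names("RequirementsTable").RefersToRange.Value
    End If
    ' Numeric cell values arrive as Double, so the Double key matches the row keys
    CheckRequirement = reqs.GetValue(requirementId, requirementName)
End Function

Public Sub ResetRequirements()
    ' Call after editing the requirements sheet so the cache is rebuilt
    Set reqs = Nothing
End Sub
```

In the data sheet this could be used as =AND(D2 >= CheckRequirement($C2; "MinHorsepower"); D2 <= CheckRequirement($C2; "MaxHorsepower")), with whatever list separator your locale uses. Note that Scripting.Dictionary keys are case-sensitive by default, so the column name has to match the header exactly.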

Related

Creating an Excel Lookup Table Sheet from a Comma Delimited and ID column

We exported a table from a customer who was using AirTable to keep track of their clients' information and locations, with the aim of importing it into a SQL database. Because of the way AirTable exports, the references to other tables in their "AirTable Base" are not via IDs, but are exported in a single column as, for lack of a better explanation, basically power labels.
There's about 4,000 client rows in this table. Clients can have one or more locations. Excluding many of the other columns it looks like:
| Client_ID | Client_Name | ... | Locations
| 3456 | Acme Grocery | ... | "Memphis, TN","Orlando, FL","Philadelphia, PA"
| 3457 | Addition Financial | ... | "Miami, FL","Plano, TX","New York, NY"
| 3458 | Barros Pizza | ... | "Queen Creek, AZ"
We are trying to get the data ready for import into SQL, so we are attempting to find a formula/method which could take the Client_ID and then insert that into rows in a new data sheet made from the comma-delimited column. Using the above example the new data should look like the following:
| ClientInLocation_ID | Client_ID | Location |
| 10000 | 3456 | Memphis, TN |
| 10001 | 3456 | Orlando, FL |
| 10002 | 3456 | Philadelphia, PA |
| 10003 | 3457 | Miami, FL |
| 10004 | 3457 | Plano, TX |
| 10005 | 3457 | New York, NY |
| 10006 | 3458 | Queen Creek, AZ |
Doing so will allow us to then grab the unique locations, assign ID's to them and then replace the Location text with a Location_ID field.
I was thinking pivot tables, text to rows, etc. but perhaps I'm not experienced enough with them to pull this off. Also, any solutions can obviously exclude the ClientInLocation_ID auto increment as we could always have that autofilled once the other two fields are populated. Any help greatly appreciated.
There are many ways to tackle this problem. You can use PowerQuery (PQ) to do some of the lifting if you have an appropriate version of Excel. PQ is built into recently released Excel versions and is a free add-on for Excel 2013 and 2010 but is not available for anything older than Excel 2010. If you see a Power Query tab on the ribbon then you're good to go.
Use your data as the source for a new query and split the location column by the delimiter ",". To clarify, you are using three characters as the delimiter: the last quote of a location, the comma delimiting two locations, and the first quote of the second location. This puts one location in a cell with subsequent locations in columns to the right.
Every cell in the first column will have a quote in front of the text, and the cell holding the final location for that row will have a quote at the end of the text. This is easily cleared in PQ, but we're done here, so it's probably faster to click Save & Load to close the editor and use Ctrl+H in Excel to clear them.
Your data will automatically be converted into a table that is connected to your source data. That means that refreshing the table does two things: it wipes any edits you've made and it updates the table with any changes in your source data. So either delete the query (if this is a one and done project) or copy the table to a new sheet (if you want to rapidly rebuild with new source data)
From there, I'd turn to VBA and use three nested For loops. The outer loop iterates every row in your data from the bottom up (Step -1). The middle loop iterates the columns to add new rows. The inner loop populates the rows.
This is quick, dirty, makes several assumptions and is in no way tested because it was written on my phone:
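Option Explicit

Sub TransformTable()
    ' Assumes the table starts in A1 (headers in row 1), has three leading
    ' columns, and holds the split locations in column 4 onwards.
    Dim ws As Worksheet
    Dim myTable As ListObject
    Dim rng As Range
    Dim j As Long
    Dim k As Long
    Dim l As Long
    Dim extra As Long

    Set ws = ActiveSheet
    Set myTable = ws.ListObjects(1)
    Application.ScreenUpdating = False
    ' Work from the bottom up so inserted rows do not shift rows still to be processed
    For j = myTable.ListRows.Count + 1 To 2 Step -1
        ' Number of locations beyond the first one in sheet row j
        extra = Application.WorksheetFunction.CountA( _
            ws.Range(ws.Cells(j, 4), ws.Cells(j, myTable.ListColumns.Count))) - 1
        For k = 1 To extra
            Set rng = ws.Cells(j, 1)
            ' Sheet row j + k is list row j + k - 1 because of the header row
            myTable.ListRows.Add j + k - 1
            ' Copy the first two columns (Client_ID, Client_Name) to the new row
            For l = 0 To 1
                rng.Offset(k, l).Value = rng.Offset(0, l).Value
            Next l
            ' Move the k-th extra location into the location column of the new row
            rng.Offset(k, 3).Value = rng.Offset(0, 3 + k).Value
            rng.Offset(0, 3 + k).Clear
        Next k
    Next j
    Application.ScreenUpdating = True
End Sub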
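If Power Query is not available, the same three-character delimiter idea can be applied directly in VBA with Split. The following is a rough sketch, not a tested solution: the sheet names "Clients" and "ClientInLocation" and the position of the Locations data in column D are assumptions made for illustration.

```vba
Option Explicit

Sub SplitLocationsToRows()
    ' Hypothetical layout: sheet "Clients" with Client_ID in column A and the
    ' quoted, comma-delimited locations in column D, headers in row 1.
    ' An existing sheet "ClientInLocation" receives the result.
    Dim src As Worksheet, dst As Worksheet
    Dim lastRow As Long, r As Long, outRow As Long
    Dim raw As String
    Dim parts() As String
    Dim i As Long

    Set src = ThisWorkbook.Worksheets("Clients")
    Set dst = ThisWorkbook.Worksheets("ClientInLocation")
    dst.Range("A1:B1").Value = Array("Client_ID", "Location")
    outRow = 2

    lastRow = src.Cells(src.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        raw = Trim$(CStr(src.Cells(r, "D").Value))
        If Len(raw) > 0 Then
            ' Split on the three-character delimiter: quote, comma, quote
            parts = Split(raw, """,""")
            For i = LBound(parts) To UBound(parts)
                dst.Cells(outRow, "A").Value = src.Cells(r, "A").Value
                ' Strip the quote left on the first and last item
                dst.Cells(outRow, "B").Value = Replace(parts(i), """", "")
                outRow = outRow + 1
            Next i
        End If
    Next r
End Sub
```

Because the split is on the quote-comma-quote sequence, the comma inside "Memphis, TN" is left alone. The ClientInLocation_ID column can then be autofilled as described in the question.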

How to eval a string containing column names?

As I cannot attach conditional formatting to a table, I need a generic function to check whether a set of records, or all records, contain errors, and to show these errors in forms and/or reports.
In the 'standard' approach, I have to define the rule for a field of a table every time I use that field in a control or report. This means repeating the same thing an annoying number of times, which invites errors and results in a maintenance nightmare.
So my idea is to define all the checks for all the tables and their fields in a CheckError table, like the following fragment for the table 'Persone':
TableName | FieldName | TestNumber | TestCode | TestMessage | ErrorType
Persone | CAP | 4 | len([CAP]) = 0 or isnull([CAP]) | CAP mancante | warning
Persone | Codice Fiscale | 1 | len([Codice Fiscale]) < 16 | Codice fiscale nullo o mancante | error
Persone | Data di nascita | 2 | (now() - [Data di nascita]) < 18 * 365 | Minorenne | info
Persone | mail | 5 | len([mail]) = 0 or isnull([mail]) | email mancante | warning
Persone | mail | 6 | (len([mail]) = 0 or isnull([mail])) | richiesto l'invio dei referti via e-mail, | error
| | | and [modalità ritiro referti] = ""e-mail"" | ma l'indirizzo e-mail è mancante |
Persone | Via | 3 | len([Via]) = 0 or isnull([Via]) | Indirizzo mancante | warning
Now, in each form or report that uses the table Persone, I want to set an 'onload' property to a function:
' to validate all fields in all rows and set the appropriate bg and fg color
Private Sub Form_Open(Cancel As Integer)
Call validazione.validazione(Form, "Persone", 0)
End Sub
' to validate all fields in the row identified by ID and set the appropriate bg and fg color
Private Sub Codice_Fiscale_LostFocus()
Call validazione.validazione(Form, "Persone", ID)
End Sub
So the function validazione, at a certain point, has exactly one row of the table Persone and the set of expressions described in the column [TestCode] above.
Now I need to logically evaluate the TestCode string against the table row to obtain a true or a false.
If true, I'll set the fg and bg color of the field as normal.
If false, I'll set the fg and bg color as per error, info or warning, as defined by the column [ErrorType] above.
All of the above is easy, ready, and running, except for the evaluation step:
How can I evaluate the test string against the table row to obtain a result?
Thank you
Paolo

Dynamically Resizing Table to last row?

I have a table that I would like to resize dynamically in VBA.
My current code is this:
Sub resizedata()
Dim ws As Worksheet
Dim ob As ListObject
Dim Lrow1 As Long
Lrow1 = Sheets("Sheet4").Cells(Rows.Count, "J").End(xlUp).Row
Set ws = ActiveWorkbook.Worksheets("Sheet4")
Set ob = ws.ListObjects("Table28")
ob.Resize ob.Range.Resize(Lrow1)
End Sub
I would like to add one condition onto this though...
The table should resize to before the first 0 in column J.
For instance:
+-------+--------+-------+
|Date(I)|Hours(J)| Sal(K)|
+-------+--------+-------+
| Aug | 150000 | 12356 |
| Sep | 82547 | 8755 |
| Oct | 92857 | 98765 |
| Nov | 10057 | 45321 |
| Dec | 0 | 0 |
| Jan | 0 | 0 |
+-------+--------+-------+
The above table's last row should be the November row because December is the first 0 value in column J.
Can anyone assist in revising my existing code?
Something like:
With Sheets("Sheet4")
Lrow1 = .Cells(.Rows.Count, "J").End(xlUp).Row
Do While .Cells(Lrow1, "J").Value=0
Lrow1 = Lrow1 - 1
Loop
End With
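Combined with the resize call from the question, a complete version might look like the sketch below. It scans column J from the top and stops at the first 0, rather than stepping back from the bottom. It assumes column J of the table holds only numbers or blanks (a blank also counts as 0 here); the sub name ResizeToFirstZero is hypothetical.

```vba
Option Explicit

Sub ResizeToFirstZero()
    Dim ws As Worksheet
    Dim ob As ListObject
    Dim headerRow As Long
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveWorkbook.Worksheets("Sheet4")
    Set ob = ws.ListObjects("Table28")

    headerRow = ob.HeaderRowRange.Row
    lastRow = ws.Cells(ws.Rows.Count, "J").End(xlUp).Row

    ' Walk down from the first data row until the first 0 (or blank) in column J
    r = headerRow + 1
    Do While r <= lastRow
        If ws.Cells(r, "J").Value = 0 Then Exit Do
        r = r + 1
    Loop

    ' r is now the first zero row (or one past the data); keep rows up to r - 1.
    ' A table needs at least one data row, so skip the resize otherwise.
    If r - 1 > headerRow Then
        ob.Resize ob.Range.Resize(r - headerRow)
    End If
End Sub
```

With the sample data, the loop stops on the December row, so the table ends on the November row. Counting rows from the header row also keeps the resize correct when the table does not start in row 1.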
This should help you to automatically resize the table.
With Worksheets("sheet-name").ListObjects("table-name")
.Resize .Range(1, 1).CurrentRegion
End With
Is there a particular reason you want to do this in VBA? You can simply create a defined name and use the following to create a self-adjusting range:
OFFSET(Sheet!$A$1,0,0,COUNTA(Sheet!$A:$A),COUNTA(Sheet!$1:$1))
OFFSET(reference, rows, cols, [height], [width])
The top line shows typical usage, 2nd line is the official syntax.
Go into Name Manager on the Formulas tab, click new, give it a name and paste the code into Refers to:.
One caveat, it looks as though your data has 0's where there are no values. If that is truly the case, you'll have to do a different test to determine height.
One benefit of this method is that it is always calculated. If the data changes then the size definition of your named range adjusts as needed.
I hope this is helpful, I use this a lot...

FormulaArray not averaging out all the specified entries

Table 1:
G H I J K
| Lane | Bowler | Score | Score | Score | 1
|:-----------|------------:|:------------:|:------------:|:------------:|
| Lane 1 | Thomas| 100 | 100 | 100 | 2
| Lane 2 | column | 200 | 200 | 100 | 3
| Lane 3 | Mary | 300 | 300 | 100 | 4
| Lane 1 | Cool | 150 | 400 | 100 | 5
| Lane 2 | right | 160 | 500 | 100 | 6
| Lane 9 | Susan | 170 | 600 | 100 | 7
say I want to find the average for each Lane that appeared in table 2 and put them in column O:
Table 2:
N O
| Lane | Average | 1
|:-----------|------------:|
| Lane 1 | | 2
| Lane 2 | | 3
| Lane 3 | | 4
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7 )) for lane 1 (put this formula on cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7 )) for Lane 3 ("O4")
My first question is
What if I want to find the Average of ALL the lane together that appear in table 2. So average of Lane 1, Lane 2 and Lane 3 together (but not other lane, such as lane 9).
My attempt:
= Average(IF(G2:G7 = N2:N4, I2:K7)) why doesn't this work?
My second question is
I have done the "average of each individual Lane" using vba:
.
Dim i As Integer
For i = 2 To 4
Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C11))"
Next i
.
What if I have done it using vba without the .formula method
For Lane 1 only:
pseudo code:
Loop x from 2 to 7
If cell N2 = Gx then
Sum = Sum + Ix + Jx + Kx
totalEntries = totalEntries + 3
End If
End Loop
Average = Sum / totalEntries
Would this be slower than using the built-in .Formula? Is there an advantage to doing it this way instead?
The answer to the first question, why this FormulaArray
= AVERAGE(IF(G2:G7 = N2:N4, I2:K7)) does not work,
is implicit in how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array with the records from range $G$7:$G$12 complying with the condition = $N7. Fig. 1 shows the first part of the FormulaArray as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part G2:G7 = N2:N4, as it fails when trying to compare two arrays of different dimensions, resulting in #N/A (see Fig. 3).
However, even if the arrays had the same dimensions, the result would not show the duplicated values, because the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with the conditions = $N7 + = $N8 + = $N9 (+ equivalent to operator OR)
As regards the second question:
Performance is intrinsically associated with maintenance and efficiency.
The sample procedure just enters a formula that is hard-coded and only works for this particular case. For example:
If the ranges need to be expanded, the macro has to be updated. With worksheet formulas you may still have to change the formula, but there is no need to open the VBA editor.
If any of the columns before column G is deleted because it has become obsolete, the macro needs to be updated, whereas the worksheet formulas require no maintenance because they are updated automatically.
In reference to the macro without the .Formula method:
I find this redundant, as it is like writing an algorithm to do something that can be done efficiently and accurately with an existing function. Such a macro adds nothing that is not already available.
I would consider writing such a procedure only when the workbook is very large and uses resources so heavily that performance suffers noticeably. Even then, the benefit does not come from having the procedure write the formulas: it must calculate the results and enter the resulting values instead of the formulas, which keeps the workbook light, fast and smooth for the end user.
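A procedure of that last kind, one that enters the calculated value rather than a formula, could be sketched as follows for the combined average of the lanes listed in Table 2. It uses the question's layout (lanes in G2:G7, scores in I2:K7, wanted lanes in N2:N4 on the active sheet) and assumes every score cell is filled; the output cell O5 and the sub name are hypothetical.

```vba
Option Explicit

Sub AverageListedLanes()
    Dim wanted As Object
    Dim data As Variant
    Dim c As Range
    Dim r As Long, col As Long
    Dim total As Double, n As Long

    ' Collect the lanes of Table 2 as dictionary keys (the OR condition)
    Set wanted = CreateObject("Scripting.Dictionary")
    For Each c In Range("N2:N4")
        wanted(c.Value) = True
    Next c

    ' Read Table 1 once: column 1 = lane, columns 3 to 5 = the three scores
    data = Range("G2:K7").Value
    For r = 1 To UBound(data, 1)
        If wanted.Exists(data(r, 1)) Then
            For col = 3 To 5
                total = total + data(r, col)
                n = n + 1
            Next col
        End If
    Next r

    ' Write the value, not a formula (O5 is a hypothetical output cell)
    If n > 0 Then Range("O5").Value = total / n
End Sub
```

The result is static: unlike the FormulaArray, it does not update when the scores change, so the procedure has to be run again after the data is edited.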
To get the average of them all, just use
=AVERAGE(I2:K7)
As to the VBA, as it is all done on the same lines, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i

Dynamically select a column from a generic list

I have a table that is 200 columns wide and need to return the data of a specific row and column but I won't know the column until runtime. I can easily get the row I want into either a list, an individual strongly typed object, or an Array through LINQ but I can't for the life of me figure out how to find the column I need.
So For instance (on a smaller scale) my table looks like this
GrowerKey | day1 | day2 | day3 | day4 |
-----------------------------------------
3 | 1 | 3 | 2 | 2 |
4 | 6 | 1 | 9 | 1 |
5 | 8 | 8 | 2 | 4 |
and I can get the row I want with something simple like this
Dim CleanRecord As List(Of Grower_Clean_Schedule) = (From key In eng.Grower_Clean_Schedules
Where key.Grower_Key = Grower_Key).ToList
how do I then return only the value of a specific column of that row (like say the value stored in "day2") When I won't know which column until runtime?
Something like this (starting with CleanRecord which you defined in your question):
' Requires Imports System.Reflection at the top of the file
Dim matchingRow = CleanRecord.First()
Dim props = matchingRow.GetType().GetProperties( _
    BindingFlags.Instance Or BindingFlags.Public)
' Pick the property whose name matches the column chosen at runtime
Dim myReturnVal = (From prop In props _
    Where prop.Name = "day2" _
    Select prop.GetValue(matchingRow, Nothing)).FirstOrDefault()
Return myReturnVal