VBA arrays from Google Apps Script

In Google Docs I saw someone with a script containing a function like this:
function createChargeList() {
  return [['No.', 'Name', 'Cooldown', 'Power', 'Energy Loss', 'Type', 'Damage Window Start'],
          [1.0, 'AerialAce', 240.0, 55.0, 33.0, 3.0, 190.0],
          [2.0, 'AirCutter', 270.0, 60.0, 50.0, 3.0, 180.0],
          [3.0, 'AncientPower', 350.0, 70.0, 33.0, 6.0, 285.0],
          [4.0, 'AquaJet', 260.0, 45.0, 33.0, 11.0, 170.0],
          [5.0, 'AquaTail', 190.0, 50.0, 33.0, 11.0, 120.0]];
}
I am trying to teach myself coding in VBA, and I'm having trouble understanding how something like the above code would translate into a VBA script for Excel.

Google Apps Script is based on JavaScript; VBA is based on Visual Basic. Hardcoding the data is possible, but you end up with a somewhat funky array, and you're then limited to VBA's clunky array methods (or lack thereof).
Function createChargeList()
    Dim var(5)
    ' Each element is a 1-based array built from an Excel array constant via the [ ] (Evaluate) shorthand
    var(0) = [{"No.", "Name", "Cooldown", "Power", "Energy Loss", "Type", "Damage Window Start"}]
    var(1) = [{1.0, "AerialAce", 240.0, 55.0, 33.0, 3.0, 190.0}]
    var(2) = [{2.0, "AirCutter", 270.0, 60.0, 50.0, 3.0, 180.0}]
    var(3) = [{3.0, "AncientPower", 350.0, 70.0, 33.0, 6.0, 285.0}]
    var(4) = [{4.0, "AquaJet", 260.0, 45.0, 33.0, 11.0, 170.0}]
    var(5) = [{5.0, "AquaTail", 190.0, 50.0, 33.0, 11.0, 120.0}]
    createChargeList = var
End Function
Or you could "hardcode" it into a Dictionary, a Collection, or an ArrayList (of ArrayLists). There are some benefits to using a Dictionary or ArrayList, because they expose additional methods. Here is a Dictionary example; this is not how I would structure it, but it shows how you could do it:
Function createChargeDict()
    Dim d As Object
    Set d = CreateObject("Scripting.Dictionary")
    d("No.") = Array(1#, 2#, 3#, 4#, 5#)
    d("Name") = Split("AerialAce,AirCutter,AncientPower,AquaJet,AquaTail", ",")
    'etc...
    Set createChargeDict = d
End Function
But it's going to be difficult to "hardcode" anything more than a very small amount of data, e.g. per Tim's answer (or likewise what I did above), and of course if anything changes you need to go in and edit the code.
Another option would be to save the data as a CSV or otherwise-delimited file and read it into an array/dict/list/collection at runtime.
What I would do (Opinion)
Another, more common approach -- as you inquired about in your comments -- would be to simply create a table on a worksheet (optionally hiding the worksheet so users can't mess it up), assign a Name to the table's range, and then refer to the range's .Value property, which returns an array.
The added benefit of this is that you can use all of the worksheet methods and functions against the range of cells (or against a Table/ListObject on a sheet), so you can do a lot more with it in terms of filtering, slicing, finding elements, matching/indexing, etc., and it's also easier to update.

VBA doesn't have the same kind of array-literal construct you find in JavaScript.
This is what you have if you want a single-line assignment:
Dim a
a = Array("a", "b", "c") '>>0-based 1-d array
a = ["a", "b"] '>> *1*-based 1-d array
a = [{"a", "b";"c","d"}] '>> *1*-based 2-d array
Note that the last one is not the same as your example (i.e., not an array of arrays).


What is the best way to initialise a NumPy masked array with an existing mask?

I was expecting to just say something like
ma.zeros(my_shape, mask=my_mask, hard_mask=True)
(where the mask is the correct shape) but ma.zeros (or ma.ones or ma.empty) rather surprisingly doesn't recognise the mask argument. The simplest I've come up with is
ma.array(np.zeros(my_shape), mask=my_mask, hard_mask=True)
which seems to involve unnecessary copying of lots of zeros. Is there a better way?
Make a masked array:
In [162]: x = np.arange(5); mask=np.array([1,0,0,1,0],bool)
In [163]: M = np.ma.MaskedArray(x,mask)
In [164]: M
Out[164]:
masked_array(data=[--, 1, 2, --, 4],
mask=[ True, False, False, True, False],
fill_value=999999)
Modify x, and see the result in M:
In [165]: x[-1] = 10
In [166]: M
Out[166]:
masked_array(data=[--, 1, 2, --, 10],
mask=[ True, False, False, True, False],
fill_value=999999)
In [167]: M.data
Out[167]: array([ 0, 1, 2, 3, 10])
In [169]: M.data.base
Out[169]: array([ 0, 1, 2, 3, 10])
M.data is a view of the array used to create it; there are no unnecessary copies.
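You can verify that no copy was made with np.shares_memory (a quick check of my own, not from the session above):
import numpy as np

x = np.arange(5)
M = np.ma.MaskedArray(x, mask=[1, 0, 0, 1, 0])
print(np.shares_memory(x, M))   # True -> M wraps x's buffer, no copy was made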
I haven't used functions like np.ma.zeros, but
In [177]: np.ma.zeros
Out[177]: <numpy.ma.core._convert2ma at 0x1d84a052af0>
_convert2ma is a Python class that takes a function name and returns a new callable. It does not add mask-specific parameters; study it yourself if necessary.
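For what it's worth, a small sketch of what np.ma.zeros does on the numpy versions I've looked at: it just builds an ordinary zeros array and views it as a MaskedArray, so an unexpected mask keyword falls through to np.zeros:
import numpy as np

z = np.ma.zeros((3,))
print(type(z))                       # <class 'numpy.ma.core.MaskedArray'>
print(z.mask)                        # False -- no per-element mask is set up

try:
    np.ma.zeros((3,), mask=[1, 0, 0])
except TypeError as err:
    print(err)                       # on the versions I've checked, np.zeros rejects the 'mask' kwarg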
np.ma.MaskedArray, the class that actually subclasses ndarray, takes a copy parameter:
copy : bool, optional
Whether to copy the input data (True), or to use a reference instead.
Default is False.
and the first line of its __new__ is
_data = np.array(data, dtype=dtype, copy=copy,
order=order, subok=True, ndmin=ndmin)
I haven't quite sorted out whether M._data is just a reference to the source data, or a view. In either case, it isn't a copy, unless you say so.
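So the construction from the question should not actually copy the zeros, since copy defaults to False. A quick check (my sketch, not part of the session above):
import numpy as np

z = np.zeros((1000, 1000))
M = np.ma.array(z, mask=np.zeros(z.shape, bool), hard_mask=True)
print(np.shares_memory(M, z))   # True -- the zeros buffer is reused, not copied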
I haven't worked a lot with masked arrays, but my impression is that, while they can be convenient, they shouldn't be used where you are concerned about performance. There's a lot of extra work required to maintain both the mask and the data. The extra time involved in copying the data array, if any, will be minor.

Math.Net Multiple Regression Is Wrong After the 4th Independent Variable

I am able to generate the correct intercept and coefficients for a multiple regression (Math.NET) with up to three independent variables. However, once a fourth independent variable is added, the returned values are nowhere near correct.
Using this code:
Dim i As Integer
Dim g(5)() As Double
g(0) = {1.0, 4.0, 3.2}
g(1) = {2.0, 5.0, 4.1}
g(2) = {3.0, 2.0, 2.5}
g(3) = {4.0, 3.0, 1.6}
g(4) = {4.0, 3.0, 1.6}
Dim d As Double() = {3.5, 5.6, 1.2, 15.2, 3.4, 4.2}
Dim p As Double() = MultipleRegression.QR(Of Double)(g, d, intercept:=True)
For i = 0 To UBound(p)
Debug.WriteLine(p(i))
Next
I get:
-2.45972222222223
1.13194444444445
3.11805555555555
-2.38888888888889
These are correct.
However, if I run the same code, but add a 4th independent variable as such:
Dim i As Integer
Dim g(5)() As Double
g(0) = {1.0, 4.0, 3.2, 5.3}
g(1) = {2.0, 5.0, 4.1, 2.4}
g(2) = {3.0, 2.0, 2.5, 3.6}
g(3) = {4.0, 3.0, 1.6, 2.1}
g(4) = {4.0, 3.0, 1.6, 2.1}
g(5) = {4.0, 3.0, 1.6, 2.1}
Dim d As Double() = {3.5, 5.6, 1.2, 15.2, 3.4, 4.2}
Dim p As Double() = MultipleRegression.QR(Of Double)(g, d, intercept:=True)
For i = 0 To UBound(p)
Debug.WriteLine(p(i))
Next
I get:
6.88018203734109E+17
-9.8476516475107E+16
-3.19472310972754E+16
-4.61094057074081E+16
-5.92835216238101E+16
These numbers are nowhere close to correct.
If anyone can provide any direction as to what I am doing wrong, I would be very appreciative. TIA
I have not worked out the math in detail, but looking at your problem intuitively: of the six observations, three (g(3), g(4), g(5)) have identical independent variables, yet their dependent values (15.2, 3.4, 4.2) are wildly different, so those observations carry little real predictive value. In effect, you are trying to estimate 5 parameters from only 4 distinct observations. That's not going to work well, and it results in numerical instability.
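Not Math.NET, but a quick way to see the problem is to check the rank of the design matrix; here is a sketch in Python with numpy, where the column of ones stands in for the intercept:
import numpy as np

X = np.array([[1.0, 4.0, 3.2, 5.3],
              [2.0, 5.0, 4.1, 2.4],
              [3.0, 2.0, 2.5, 3.6],
              [4.0, 3.0, 1.6, 2.1],
              [4.0, 3.0, 1.6, 2.1],
              [4.0, 3.0, 1.6, 2.1]])
A = np.column_stack([np.ones(len(X)), X])   # intercept column plus the 4 regressors
print(np.linalg.matrix_rank(A))             # 4 -- but 5 parameters are being fit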
I've changed your data very slightly, and it returns better values (I use C#). The problem is with the data, not the program.
double[][] g = new double [6][];
g[0] = new double[4] { 1.0, 4.0, 3.2, 5.3};
g[1] = new double[4] { 2.0, 5.0, 4.1, 2.4};
g[2] = new double[4] { 3.0, 2.0, 2.5, 3.6};
g[3] = new double[4] { 4.0, 3.0, 1.6, 2.12};
g[4] = new double[4] { 4.0, 3.0, 1.6, 2.11};
g[5] = new double[4] { 4.0, 3.0, 1.6, 2.1};
double[] d = new double[6] { 3.5, 5.6, 1.2, 15.2, 3.4, 4.2 };
var p = MultipleRegression.QR(g, d, true);
for (int i = 0; i < p.Length; i++) Console.WriteLine(p[i].ToString());
This returns:
-6386.81388888898
913.902777777791
297.597222222225
428.444444444452
550.000000000007

Testing Numpy operations

Whenever I need to test a moderately complex numpy expression, say,
c = np.multiply.outer(a, b)
d = np.einsum('kjij->ijk', c)
I end up doing hacks such as, e.g., setting a and b thus:
a = np.arange(9).reshape(3,3)
b = a / 10
so that I can then track what d contains.
This is ugly and not very convenient. Ideally, I would be able to do something like the following:
a = np.array(list("abcdefghi")).reshape(3,3)
b = np.array(list("ABCDEFGHI")).reshape(3,3)
c = np.add.outer(a, b)
d = np.einsum('kjij->ijk', c)
so that, e.g., d[0, 1, 2] could be seen to correspond to 'hB', which is much clearer than 0.7 (which is what the other assignment to a and b would give). This cannot be done, because the ufunc add does not accept characters.
In summary, once I start chaining a few transformations (an outer product, an einsum, broadcasting or slicing, etc.) I lose track and need to see for myself what my transformations are actually doing. That's when I need to run a few examples, and that's where my current method of doing so strikes me as suboptimal. Is there any standard, or better, way to do this?
In [454]: a = np.array(list("abcdefghi")).reshape(3,3)
...: b = np.array(list("ABCDEFGHI")).reshape(3,3)
np.add can't be used because add has not been defined for the string dtype:
In [455]: c = np.add.outer(a,b)
....
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('<U1')
But np.char has functions that apply Python string methods to ndarray elements (these aren't fast, just convenient):
Signature: np.char.add(x1, x2)
Docstring:
Return element-wise string concatenation for two arrays of str or unicode.
Using broadcasting I can perform your outer string concatenation:
In [457]: c = np.char.add(a[:,:,None,None], b[None,None,:,:])
In [458]: c.shape
Out[458]: (3, 3, 3, 3)
In [459]: c
Out[459]:
array([[[['aA', 'aB', 'aC'],
['aD', 'aE', 'aF'],
['aG', 'aH', 'aI']],
[['bA', 'bB', 'bC'],
['bD', 'bE', 'bF'],
['bG', 'bH', 'bI']],
....
[['iA', 'iB', 'iC'],
['iD', 'iE', 'iF'],
['iG', 'iH', 'iI']]]], dtype='<U2')
I was skeptical that einsum could handle this array, since normally einsum is used for np.dot-like sum-of-products calculations. But with this indexing it is just selecting a diagonal and rearranging axes, so it does work:
In [460]: np.einsum('kjij->ijk', c)
Out[460]:
array([[['aA', 'dA', 'gA'],
['bB', 'eB', 'hB'],
['cC', 'fC', 'iC']],
[['aD', 'dD', 'gD'],
['bE', 'eE', 'hE'],
['cF', 'fF', 'iF']],
[['aG', 'dG', 'gG'],
['bH', 'eH', 'hH'],
['cI', 'fI', 'iI']]], dtype='<U2')
The d from the numeric test case:
array([[[0. , 3. , 6. ],
[1.1, 4.1, 7.1],
[2.2, 5.2, 8.2]],
[[0.3, 3.3, 6.3],
[1.4, 4.4, 7.4],
[2.5, 5.5, 8.5]],
[[0.6, 3.6, 6.6],
[1.7, 4.7, 7.7],
[2.8, 5.8, 8.8]]])
The pattern with these numeric values is just as clear as with strings.
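For reference, that numeric d can be reproduced with the same recipe as the string version (add.outer instead of multiply.outer, so each entry pairs one element of a with one of b):
import numpy as np

a = np.arange(9).reshape(3, 3)
b = a / 10
d = np.einsum('kjij->ijk', np.add.outer(a, b))
print(d[0, 1, 2])   # 7.1 -> a[2,1]=7 paired with b[0,1]=0.1, the numeric twin of 'hB'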
I like to use distinct array shapes where possible, because it makes tracking dimensions across changes easier:
In [496]: a3 = np.arange(1,13).reshape(4,3)
...: b3 = np.arange(1,7).reshape(2,3) / 10
In [497]: c3 = np.add.outer(a3,b3)
In [498]: d3 = np.einsum('kjij->ijk', c3)
In [499]: c3.shape
Out[499]: (4, 3, 2, 3)
In [500]: d3.shape
Out[500]: (2, 3, 4)
In [501]: d3
Out[501]:
array([[[ 1.1, 4.1, 7.1, 10.1],
[ 2.2, 5.2, 8.2, 11.2],
[ 3.3, 6.3, 9.3, 12.3]],
[[ 1.4, 4.4, 7.4, 10.4],
[ 2.5, 5.5, 8.5, 11.5],
[ 3.6, 6.6, 9.6, 12.6]]])
This, for example, would raise an error if I tried 'kjik->ijk'.
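A quick sketch of that mismatch error, using the same a3/b3 as above:
import numpy as np

a3 = np.arange(1, 13).reshape(4, 3)
b3 = np.arange(1, 7).reshape(2, 3) / 10
c3 = np.add.outer(a3, b3)        # shape (4, 3, 2, 3)
np.einsum('kjij->ijk', c3)       # fine: the repeated j axes both have size 3
try:
    np.einsum('kjik->ijk', c3)   # repeated label k would need equal axis sizes (4 vs 3)
except ValueError as err:
    print(err)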
With numeric values I can perform the multiply.outer with einsum:
In [502]: c4 = np.multiply.outer(a3,b3)
In [503]: np.allclose(c4,np.einsum('ij,kl',a3,b3))
Out[503]: True
In [504]: d4 = np.einsum('kjij->ijk', c4)
In [505]: np.allclose(d4,np.einsum('kj,ij->ijk',a3,b3))
Out[505]: True
In [506]: d4
Out[506]:
array([[[0.1, 0.4, 0.7, 1. ],
[0.4, 1. , 1.6, 2.2],
[0.9, 1.8, 2.7, 3.6]],
[[0.4, 1.6, 2.8, 4. ],
[1. , 2.5, 4. , 5.5],
[1.8, 3.6, 5.4, 7.2]]])
That 'kj,ij->ijk' gives me a better idea of what is happening than the display of d does.
Another way to put it:
(4,3) + (2,3) => (2,3,4)
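That shape bookkeeping also suggests a direct broadcasting route to the same d3, which can serve as a cross-check (a sketch using the arrays above):
import numpy as np

a3 = np.arange(1, 13).reshape(4, 3)
b3 = np.arange(1, 7).reshape(2, 3) / 10
d3 = np.einsum('kjij->ijk', np.add.outer(a3, b3))
d3_alt = b3[:, :, None] + a3.T          # (2,3,1) + (3,4) broadcasts to (2,3,4)
print(np.allclose(d3, d3_alt))          # True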

Unit testing both Excel formulas and VB with the same tool?

Are there any tools that allow for unit testing both Excel formulas and Visual Basic forms within Excel? I'm finding methods that will do one or the other, but not both. Rubberduck for example looks promising for testing VBA, but does not appear to allow testing of formulas within Excel spreadsheets.
The only way I have found to "unit test" functions within Excel is to create a sheet within your workbook whose sole purpose is to be the validations page. Define various additional formulas within this sheet that look for edge cases, check additions, etc., within your workbook.
It is quite helpful to keep a comments field as well as a boolean success field that can be aggregated to put a custom-formatted message on your other input pages to cue the user to a failed "unit test".
This approach can work quite well in making your testing reusable as well as transparent to the end users of your workbooks.
It is possible to unit test both Excel formulas and VBA using FlyingKoala. FlyingKoala is an extension of xlwings.
xlwings offers a COM wrapper which provides the ability to execute VBA from Python (by getting Excel to run it). It's a great library/solution. The esteemed Felix from Zoomer Analytics has written a blog post, with examples, about unit testing VBA using xlwings.
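As a minimal sketch of that idea (the workbook and macro names here are placeholders, not from the question): xw.Book(...).macro(...) returns a callable wrapper around the VBA routine, which a Python test can then call and assert on.
import xlwings as xw

wb = xw.Book('my_workbook.xlsm')       # hypothetical workbook that is already open
add_two = wb.macro('AddTwoNumbers')    # hypothetical VBA function in that workbook
assert add_two(2, 3) == 5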
FlyingKoala uses a library (xlcalculator) to convert Excel formulas into Python, which can then be unit tested in Python's unittest framework. So it's possible to evaluate a formula and check it against a known goal value, whether that value lives in Excel or is pre-defined.
An example of unit testing formulas using FlyingKoala while Excel is running:
import unittest
import logging

import xlwings as xw
from flyingkoala import FlyingKoala
from pandas import DataFrame
from pandas import Series
from numpy import array
from numpy.testing import assert_array_equal
from pandas.testing import assert_series_equal

logging.basicConfig(level=logging.ERROR)


class Test_equation_1(unittest.TestCase):

    def setUp(self):
        self.workbook_name = r'growing_degrees_day.xlsm'
        if len(xw.apps) == 0:
            raise RuntimeError("We need an Excel workbook open for this unit test.")
        self.my_fk = FlyingKoala(self.workbook_name, load_koala=True)
        self.my_fk.reload_koala('')
        self.equation_name = xw.Range('Equation_1')
        if self.equation_name not in self.my_fk.koala_models.keys():
            model = None
            wb = xw.books[self.workbook_name]
            wb.activate()
            for name in wb.names:
                self.my_fk.load_model(self.equation_name)
                if self.equation_name == name.name:
                    model = xw.Range(self.equation_name)
                    self.my_fk.generate_model_graph(model)
            if model is None:
                return 'Model "%s" has not been loaded into cache, if named range exists check spelling.' % self.equation_name

    def test_Equation_1(self):
        """First type of test for Equation_1"""
        xw.books[self.workbook_name].sheets['Growing Degree Day'].activate()
        goal = xw.books[self.workbook_name].sheets['Growing Degree Day'].range(xw.Range('D2'), xw.Range('D6')).options(array).value
        tmin = xw.books[self.workbook_name].sheets['Growing Degree Day'].range(xw.Range('B2'), xw.Range('B6')).options(array).value
        tmax = xw.books[self.workbook_name].sheets['Growing Degree Day'].range(xw.Range('C2'), xw.Range('C6')).options(array).value
        inputs_for_DegreeDay = DataFrame({'T_min': tmin, 'T_max': tmax})
        result = self.my_fk.evaluate_koala_model('Equation_1', inputs_for_DegreeDay).to_numpy()
        assert_array_equal(goal, result)

    def test_Equation_1_predefined_goal(self):
        """Second type of test for Equation_1, against a pre-defined goal"""
        goal = Series([0.0, 0.0, 0.0, 0.0, 0.0, 5, 10, 15, 20])
        tmin = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
        tmax = [0, 5, 10, 15, 20, 25, 30, 35, 40]
        inputs_for_DegreeDay = DataFrame({'T_min': tmin, 'T_max': tmax})
        result = self.my_fk.evaluate_koala_model('Equation_1', inputs_for_DegreeDay)
        assert_series_equal(goal, result)

    def test_VBA_Equation_1(self):
        """
        The VBA function definition being called:

        Function VBA_Equation_1(T_min As Double, T_max As Double) As Double
            VBA_Equation_1 = Application.WorksheetFunction.Max(((T_max + T_min) / 2) - 10, 0)
        End Function
        """
        goal = 20
        vba_equation_1 = xw.books[self.workbook_name].macro('VBA_Equation_1')
        result = vba_equation_1(20.0, 40.0)
        self.assertEqual(goal, result)

Is it possible to alias multiple names in a numpy record array?

Suppose I construct a numpy record array like this
num_rows = <whatever>
data = np.zeros(
    (num_rows,),
    dtype={
        'names': ['apple', 'banana'],
        'formats': ['f8', 'f8']
    }
)
Now I can access data either by name or index.
For example, the following are the same:
data['banana'][0]
and
data[0]['banana']
etc.
Is there a way to alias different names?
For example, can I set things up so that there's another name manzana such that
data['manzana']
is the same thing as
data['apple']
?
['offsets' and 'titles' are 2 mechanisms for giving different names to fields]
There is an offsets parameter that can function in this way. Usually it is used to split another field into several pieces (e.g. an int into bytes), but it also works with identical fields. In effect it defines several fields with overlapping data.
In [743]: dt=np.dtype({'names':['apple','manzana','banana','guineo'],
'formats':['f8','f8','f8','f8'],
'offsets':[0,0,8,8]})
In [745]: np.zeros((3,),dtype=dt)
Out[745]:
array([(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
dtype={'names':['apple','manzana','banana','guineo'],
'formats':['<f8','<f8','<f8','<f8'],
'offsets':[0,0,8,8], 'itemsize':16})
In [746]: A=np.zeros((3,),dtype=dt)
In [747]: A['banana']=[1,2,3]
In [748]: A
Out[748]:
array([(0.0, 0.0, 1.0, 1.0),
(0.0, 0.0, 2.0, 2.0),
(0.0, 0.0, 3.0, 3.0)],
dtype={'names':['apple','manzana','banana','guineo'], 'formats':['<f8','<f8','<f8','<f8'], 'offsets':[0,0,8,8], 'itemsize':16})
In [749]: A['guineo']
Out[749]: array([ 1., 2., 3.])
In [750]: A['manzana']=[.1,.2,.3]
In [751]: A['apple']
Out[751]: array([ 0.1, 0.2, 0.3])
In [752]: A
Out[752]:
array([(0.1, 0.1, 1.0, 1.0),
(0.2, 0.2, 2.0, 2.0),
(0.3, 0.3, 3.0, 3.0)],
dtype={'names':['apple','manzana','banana','guineo'], 'formats':['<f8','<f8','<f8','<f8'], 'offsets':[0,0,8,8], 'itemsize':16})
There's another dtype parameter, titles, that is better suited to your needs and easier to understand:
http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
In [792]: dt1=np.dtype({'names':['apple','banana'],'formats':['f8','f8'], 'titles':['manzana', 'guineo'], 'offsets':[0,8]})
In [793]: A1=np.zeros((3,),dtype=dt1)
In [794]: A1
Out[794]:
array([(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)],
dtype=[(('manzana', 'apple'), '<f8'), (('guineo', 'banana'), '<f8')])
In [795]: A1['apple']=[1,2,3]
In [796]: A1['guineo']=[.1,.2,.3]
In [797]: A1
Out[797]:
array([(1.0, 0.1), (2.0, 0.2), (3.0, 0.3)],
dtype=[(('manzana', 'apple'), '<f8'), (('guineo', 'banana'), '<f8')])
In [798]: A1['banana']
Out[798]: array([ 0.1, 0.2, 0.3])
I have put @hpaulj's answer into a pair of simple functions and share them here in case someone wants to use them.
import numpy as np


def add_alias(arr, original, alias):
    """
    Adds an alias to the field with the name original to the array arr.
    Only one alias per field is allowed.
    """
    if arr.dtype.names is None:
        raise TypeError("arr must be a structured array. Use add_name instead.")
    descr = arr.dtype.descr
    try:
        index = arr.dtype.names.index(original)
    except ValueError:
        raise ValueError("arr does not have a field named '" + str(original)
                         + "'")
    if type(descr[index][0]) is tuple:
        raise ValueError("The field " + str(original) +
                         " already has an alias.")
    descr[index] = ((alias, descr[index][0]), descr[index][1])
    arr.dtype = np.dtype(descr)
    return arr


def add_name(arr, name):
    """
    Adds a name to the data of an unstructured array.
    """
    if arr.dtype.names is not None:
        raise TypeError("arr must not be a structured array. "
                        + "Use add_alias instead.")
    arr.dtype = np.dtype([(name, arr.dtype.name)])
    return arr
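A small usage sketch for the two helpers above (the field names are just examples):
import numpy as np

data = np.zeros((3,), dtype=[('apple', 'f8'), ('banana', 'f8')])
data = add_alias(data, 'apple', 'manzana')
data['manzana'] = [1.0, 2.0, 3.0]
print(data['apple'])            # [1. 2. 3.] -- same field under either name

plain = np.arange(3, dtype='f8')
named = add_name(plain, 'apple')
print(named['apple'])           # [0. 1. 2.]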