In OpenMDAO's ExecComp, is shape_by_conn compatible with has_diag_partials? - derivative

I have an om.ExecComp that performs a simple operation:
"d_sq = x**2 + y**2"
where x, y, and d_sq are always 1D np.arrays. I'd like to be able to use this with large arrays without allocating a large dense matrix. I'd also like the length of the array to be configured based on the shape of the connections.
However, if I specify x={"shape_by_conn": True} rather than x={"shape":100000}, even if I also have has_diag_partials=True, it attempts to allocate a 100000^2 array. Is there a way to make these two options compatible?

First, I'll note that you're using ExecComp a bit outside its intended purpose. That's not to say you're doing something totally invalid, but generally speaking ExecComp was designed for small, cheap calculations. Passing it giant arrays is not something we test for.
That being said, I think what you want will work. When you use shape_by_conn this way, you need to be sure to size both your inputs and your outputs. I've provided an example, along with a manually defined component that does the same thing. Since your equations are pretty simple, the manual component would be a little faster overall.
import numpy as np
import openmdao.api as om

class SparseCalc(om.ExplicitComponent):

    def setup(self):
        self.add_input('x', shape_by_conn=True)
        self.add_input('y', shape_by_conn=True)
        self.add_output('d_sq', shape_by_conn=True, copy_shape='x')

    def setup_partials(self):
        # when using shape_by_conn, you need to declare partials
        # in this secondary method
        md = self.get_io_metadata(iotypes='input')
        # everything should be the same shape, so just need this one
        x_shape = md['x']['shape']
        row_col = np.arange(x_shape[0])
        self.declare_partials('d_sq', 'x', rows=row_col, cols=row_col)
        self.declare_partials('d_sq', 'y', rows=row_col, cols=row_col)

    def compute(self, inputs, outputs):
        outputs['d_sq'] = inputs['x']**2 + inputs['y']**2

    def compute_partials(self, inputs, J):
        J['d_sq', 'x'] = 2*inputs['x']
        J['d_sq', 'y'] = 2*inputs['y']

if __name__ == "__main__":
    p = om.Problem()
    # use IVC here, because you have to have something to connect to
    # in order to use shape_by_conn. Normally IVC is not needed
    ivc = p.model.add_subsystem('ivc', om.IndepVarComp(), promotes=['*'])
    ivc.add_output('x', 3*np.ones(10))
    ivc.add_output('y', 2*np.ones(10))
    # p.model.add_subsystem('sparse_calc', SparseCalc(), promotes=['*'])
    p.model.add_subsystem('sparse_exec_calc',
                          om.ExecComp('d_sq = x**2 + y**2',
                                      x={'shape_by_conn':True},
                                      y={'shape_by_conn':True},
                                      d_sq={'shape_by_conn':True,
                                            'copy_shape':'x'},
                                      has_diag_partials=True),
                          promotes=['*'])
    p.setup(force_alloc_complex=True)
    p.run_model()
If you still find this isn't working as expected, please feel free to submit a bug report with a test case that shows the problem clearly (i.e. will raise the error you're seeing). In this case, the provided manual component can serve as a workaround.
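To make the memory argument concrete, here is a NumPy-only sketch (it does not use OpenMDAO, and names such as diag_partials and big_n are illustrative assumptions). It checks on a small case that the Jacobian of d_sq = x**2 + y**2 really is diagonal, then compares the storage a dense Jacobian would need against the diagonal entries alone for the 100000-element case from the question.

```python
import numpy as np

def d_sq(x, y):
    """Same elementwise expression as the ExecComp string."""
    return x**2 + y**2

def diag_partials(x, y):
    """Nonzero Jacobian entries only: one value per array element."""
    return 2.0 * x, 2.0 * y

# Small case: build the dense d(d_sq)/dx by central finite differences
# and compare it with the analytic diagonal.
n = 5
rng = np.random.default_rng(0)
x, y = rng.random(n), rng.random(n)
h = 1e-6
dense_dx = np.zeros((n, n))
for j in range(n):
    step = np.zeros(n)
    step[j] = h
    dense_dx[:, j] = (d_sq(x + step, y) - d_sq(x - step, y)) / (2 * h)

diag_dx, diag_dy = diag_partials(x, y)
off_diag_max = np.abs(dense_dx - np.diag(np.diag(dense_dx))).max()
diag_err = np.abs(np.diag(dense_dx) - diag_dx).max()

# Storage for the 100000-element case, assuming float64 entries.
big_n = 100_000
dense_bytes = big_n**2 * 8   # 80 GB for one dense sub-Jacobian
sparse_bytes = big_n * 8     # 0.8 MB for the diagonal values
print(off_diag_max, diag_err, dense_bytes / 1e9, sparse_bytes / 1e6)
```

This is why declaring rows and cols (or has_diag_partials=True) matters at this size: only the diagonal values need to be stored.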

Related

Is there a way to cache OpenMDAO component outputs to avoid duplicate executions?

I am writing a model that consists of five subsystems. The first subsystem generates data for the other subsystems from its inputs. It does not solve anything iteratively, so its outputs do not change during the computation. I want its compute method to be called only once, like an initialization. How can I write a model that calls it once in run_model and only once per driver iteration in run_driver?
It's a little hard to be sure without more details, but you mention "iterative", so I'm guessing you have a solver at the top level of your model, and there is a component that is not involved in that solver loop but is getting called each time the solver iterates.
The solution to this is to make a sub-group in your model that has just the components that need to iterate. Put your only-run-once component at the top of the model, along with that group. Put the iterative solver on the sub-group.
An alternative solution is to add a bit of caching to your component, so it checks its inputs to see if they have changed. If they have, re-run. If they have not, just keep the old answer.
Here is an example that includes both features. (Note: the solver in this example does not converge because it's a toy problem that doesn't have a valid physical solution. I just threw it together to illustrate the model structure and caching.)
import openmdao.api as om
# from openmdao.utils.assert_utils import assert_check_totals

class StingyComp(om.ExplicitComponent):

    def setup(self):
        self.add_input('x1', val=2.)
        self.add_input('x2', val=3.)
        self.add_output('x')
        self._input_hash = None

    def compute(self, inputs, outputs):
        x1 = inputs['x1'][0]  # pull the scalar out so you can hash it
        x2 = inputs['x2'][0]
        print("running StingyComp")
        current_input_hash = hash((x1, x2))
        if self._input_hash != current_input_hash:
            print(' ran compute')
            outputs['x'] = 2*x1 + x2**2
            self._input_hash = current_input_hash
        else:
            print(' skipped compute')

class NormalComp(om.ExplicitComponent):

    def setup(self):
        self.add_input('x1', val=2.)
        self.add_input('x2', val=3.)
        self.add_output('y')

    def compute(self, inputs, outputs):
        x1 = inputs['x1']
        x2 = inputs['x2']
        print("running normal Comp")
        outputs['y'] = x1 + x2

p = om.Problem()
p.model.add_subsystem('run_once1', NormalComp(), promotes=['*'])
p.model.add_subsystem('run_once2', StingyComp(), promotes=['*'])
sub_group = p.model.add_subsystem('sub_group', om.Group(), promotes=['*'])  # transparent group that could hold sub-solver
sub_group.add_subsystem('C1', om.ExecComp('f1 = f2**2 + 1.5 * x - y**2.5'), promotes=['*'])
sub_group.add_subsystem('C2', om.ExecComp('f2 = f1**2 + x**1.5 - 2.5*y'), promotes=['*'])
sub_group.nonlinear_solver = om.NewtonSolver(solve_subsystems=False)
sub_group.linear_solver = om.DirectSolver()
p.setup()
print('first run')
p.run_model()
print('second run, same inputs')
p.run_model()
p['x1'] = 10
p['x2'] = 27.5
print('third run, new inputs')
p.run_model()
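StingyComp hashes two scalars, which does not carry over directly to array-valued inputs. One possible variation, sketched here with plain NumPy, is to keep copies of the last inputs and compare them by value. InputCache is a hypothetical helper, not part of OpenMDAO.

```python
import numpy as np

class InputCache:
    """Remember the last inputs seen and report whether they changed."""

    def __init__(self):
        self._last = None

    def changed(self, **arrays):
        # Copy so later in-place edits by the caller do not alter the cache.
        current = {k: np.array(v, dtype=float, copy=True) for k, v in arrays.items()}
        same = (
            self._last is not None
            and self._last.keys() == current.keys()
            and all(np.array_equal(self._last[k], current[k]) for k in current)
        )
        self._last = current
        return not same

cache = InputCache()
x1 = np.array([2.0, 4.0])
x2 = np.array([3.0])
first = cache.changed(x1=x1, x2=x2)    # nothing cached yet
second = cache.changed(x1=x1, x2=x2)   # identical values
x1[0] = 10.0                           # in-place change
third = cache.changed(x1=x1, x2=x2)
print(first, second, third)
```

Inside a component's compute method, the check would look something like `if self._cache.changed(x1=inputs['x1'], x2=inputs['x2']):`, with the cache created in setup. Comparing values directly also sidesteps any concern about hash collisions.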

Efficient solving of generalised eigenvalue problems in python

Given an eigenvalue problem Ax = λBx what is the more efficient way to solve it out of the two shown here:
import numpy as np
import scipy as sp
import scipy.linalg  # ensures sp.linalg is loaded

def geneivprob(A,B):
    # Use scipy
    lamda, eigvec = sp.linalg.eig(A, B)
    return lamda, eigvec

def geneivprob2(A,B):
    # Reduce the problem to a standard symmetric eigenvalue problem
    Linv = np.linalg.inv(np.linalg.cholesky(B))
    C = Linv @ A @ Linv.transpose()
    #C = np.asmatrix((C + C.transpose())*0.5,np.float32)
    lamda,V = np.linalg.eig(C)
    return lamda, Linv.transpose() @ V
I saw the second version in a codebase and was wondering if it was better than simply using scipy.
There is no obvious advantage to the second approach. It may be better for some classes of matrices, so I suggest testing both on the problems you want to solve. Because the second method transforms the eigenvectors, it also changes how errors affect the solution, so it may have been chosen for numerical accuracy or convergence rather than efficiency.
Note also that the second method only works when B is symmetric positive definite, because it relies on a Cholesky factorization of B.
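For reference, here is a sketch of the Cholesky reduction written with triangular-system solves instead of an explicit inverse, and with a symmetric eigensolver. It assumes A is symmetric and B is symmetric positive definite. The function name geneig_cholesky and the test matrices are made up for illustration. For this class of problem, scipy.linalg.eigh(A, B) also accepts the second matrix directly, which is probably the first thing to try.

```python
import numpy as np

def geneig_cholesky(A, B):
    """Solve A x = lambda B x for symmetric A and symmetric positive-definite B."""
    L = np.linalg.cholesky(B)        # B = L @ L.T
    Y = np.linalg.solve(L, A)        # Y = L^{-1} A
    C = np.linalg.solve(L, Y.T).T    # C = Y L^{-T} = L^{-1} A L^{-T}
    C = 0.5 * (C + C.T)              # remove round-off asymmetry
    lam, V = np.linalg.eigh(C)       # symmetric solver, real eigenvalues
    X = np.linalg.solve(L.T, V)      # back-transform: x = L^{-T} v
    return lam, X

# Small random test problem (illustrative data only).
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                  # symmetric
N = rng.standard_normal((n, n))
B = N @ N.T + n * np.eye(n)          # symmetric positive definite

lam, X = geneig_cholesky(A, B)
residual = np.abs(A @ X - (B @ X) * lam).max()
print(residual)
```

The residual check confirms that each column of X satisfies A x = λ B x. Which route is faster or more accurate still depends on the matrices, so timing both on representative problems remains the real test.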

What does the tensorflow.python.eager.tape do in the implementation of tf.contrib.eager.custom_gradient?

I am going through TensorFlow Eager Execution from here and find it difficult to understand the customizing gradients part.
@tfe.custom_gradient
def logexp(x):
    e = tf.exp(x)
    def grad(dy):
        return dy * (1 - 1/(1 + e))
    return tf.log(1 + e), grad
First, it is difficult to make sense of what dy does in the gradient function.
Second, when I read the implementation of tf.contrib.eager.custom_gradient, I can't make sense of the working mechanism behind tape. The following code is borrowed from that implementation. Can anybody explain what tape does here?
from tensorflow.python.eager import tape
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gen_array_ops
from tensorflow.python.util import nest
from tensorflow.python.framework import ops as tf_ops

def my_custom_gradient(f):
    def decorated(*args, **kwargs):
        for x in args:
            print('args {0}'.format(x))
        input_tensors = [tf_ops.convert_to_tensor(x) for x in args]
        with tape.stop_recording():
            result, grad_fn = f(*args, **kwargs)
            flat_result = nest.flatten(result)
            flat_result = [gen_array_ops.identity(x) for x in flat_result]
        def actual_grad_fn(*outputs):
            print(*outputs)
            return nest.flatten(grad_fn(*outputs))
        tape.record_operation(
            f.__name__,  # the name of f, in this case logexp
            flat_result,
            input_tensors,
            actual_grad_fn)  # backward_function
        flat_result = list(flat_result)
        return nest.pack_sequence_as(result, flat_result)
    return decorated
I found the implementation of tape here, but I can't get much out of it because of the poor documentation.
I will answer each sub-question separately:
Re: what is dy: The primary use case for gradient functions is backpropagation. During backpropagation, the gradient function for each op must take the gradient(s) with respect to its output(s) and produce the gradient(s) with respect to its input(s). This is a direct consequence of the chain rule from calculus. For simple ops, the result is just dy multiplied by the op's own derivative (again from the chain rule). In this case, that is dy * (1 - 1/(1 + e)).
Re: custom_gradient is complicated: Yes, it is. The public API for gradient tapes is tfe.GradientTape, which should be much easier to understand and work with. You can find simple examples in its spec and its tests. A more complex "real world" example can be found here. If its basic workings are not clear, please ask specific questions. Also, we will soon publish a more detailed guide for working with gradients when executing eagerly.
The tape that is used to implement custom_gradient and GradientTape is a low level concept that wraps some C++ code. End users should not care about it (it is not exposed in the tfe namespace). It is used to build a "tape" of executed operations. Tape is similar to but simpler than regular TF graphs. It allows one to compute gradients between two tensors it recorded.
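To give a feel for the idea, here is a toy tape in pure Python. It is only a sketch of the concept, not how TensorFlow implements it, and ToyTape, Var, and scale are invented names. Each op records its output, its inputs, and a backward function. The gradient is then computed by walking the recorded entries in reverse, and the dy handed to each backward function is the gradient of the final target with respect to that op's output.

```python
import math

class Var:
    """Minimal stand-in for a tensor; hashed by identity."""
    def __init__(self, value):
        self.value = value

class ToyTape:
    """Records (name, output, inputs, backward_fn) in execution order."""

    def __init__(self):
        self.entries = []

    def record_operation(self, name, output, inputs, backward_fn):
        self.entries.append((name, output, inputs, backward_fn))

    def gradient(self, target, source):
        # Seed d(target)/d(target) = 1, then walk the tape backwards.
        grads = {target: 1.0}
        for name, output, inputs, backward_fn in reversed(self.entries):
            dy = grads.get(output)
            if dy is None:
                continue  # this op does not influence the target
            for inp, dx in zip(inputs, backward_fn(dy)):
                grads[inp] = grads.get(inp, 0.0) + dx
        return grads.get(source, 0.0)

def logexp(tape, x):
    e = math.exp(x.value)
    out = Var(math.log(1 + e))
    def grad(dy):
        return [dy * (1 - 1 / (1 + e))]  # same formula as in the question
    tape.record_operation('logexp', out, [x], grad)
    return out

def scale(tape, x, k):
    out = Var(k * x.value)
    tape.record_operation('scale', out, [x], lambda dy: [dy * k])
    return out

tape = ToyTape()
x = Var(0.0)
z = scale(tape, logexp(tape, x), 3.0)   # z = 3 * log(1 + exp(x))
dz_dx = tape.gradient(z, x)
print(dz_dx)
```

Here logexp's grad receives dy = 3.0, the gradient flowing back from scale, and multiplies it by its own local derivative.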

sklearn's `RandomizedSearchCV` not working with `np.random.RandomState`

I am trying to optimize a pipeline and wanted to try giving RandomizedSearchCV a np.random.RandomState object. I can't get it to work, but I can give it other distributions.
Is there a special syntax I can use to give RandomizedSearchCV a np.random.RandomState(0).uniform(0.1,1.0)?
from scipy import stats
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.grid_search import RandomizedSearchCV
# Generate data
x = np.random.normal(5,1,size=int(1e3))
# Make model
model = KernelDensity()
# Gridsearch for best params
# This one works
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":stats.uniform(0.1, 1)}, n_iter=30, n_jobs=2)
search_params.fit(x[:, None])
# RandomizedSearchCV(cv=None, error_score='raise',
# estimator=KernelDensity(algorithm='auto', atol=0, bandwidth=1.0, breadth_first=True,
# kernel='gaussian', leaf_size=40, metric='euclidean',
# metric_params=None, rtol=0),
# fit_params={}, iid=True, n_iter=30, n_jobs=2,
# param_distributions={'bandwidth': <scipy.stats._distn_infrastructure.rv_frozen object at 0x106ab7da0>},
# pre_dispatch='2*n_jobs', random_state=None, refit=True,
# scoring=None, verbose=0)
# This one doesn't work :(
search_params = RandomizedSearchCV(model, param_distributions={"bandwidth":np.random.RandomState(0).uniform(0.1, 1)}, n_iter=30, n_jobs=2)
# TypeError: object of type 'float' has no len()
What you observe is expected: the uniform method of an np.random.RandomState object draws a sample immediately when it is called, so it returns a float.
In contrast, scipy's stats.uniform() creates a frozen distribution that is sampled later. (I'm not sure it does what you expect in your case, though; be careful with the parameters, since stats.uniform(loc, scale) covers [loc, loc + scale], so stats.uniform(0.1, 1) samples from [0.1, 1.1].)
If you want to use something based on np.random.RandomState, you have to build your own class, as mentioned in the docs:
This example uses the scipy.stats module, which contains many useful distributions for sampling parameters, such as expon, gamma, uniform or randint. In principle, any function can be passed that provides a rvs (random variate sample) method to sample a value. A call to the rvs function should provide independent random samples from possible parameter values on consecutive calls.
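A minimal sketch of such a class might look like the following. SeededUniform is a hypothetical name, and the only requirement taken from the docs quoted above is that the object exposes an rvs method. The random_state keyword is accepted on the assumption that some scikit-learn versions pass it when sampling; this sketch ignores it and draws from its own generator.

```python
import numpy as np

class SeededUniform:
    """Uniform sampler on [low, high) driven by its own RandomState."""

    def __init__(self, low, high, seed=0):
        self.low = low
        self.high = high
        self._rng = np.random.RandomState(seed)

    def rvs(self, size=None, random_state=None):
        # random_state is accepted for compatibility but deliberately ignored
        # (an assumption of this sketch); samples come from self._rng.
        return self._rng.uniform(self.low, self.high, size=size)

dist = SeededUniform(0.1, 1.0, seed=0)
samples = [dist.rvs() for _ in range(5)]
print(samples)
```

It would then be passed as param_distributions={"bandwidth": SeededUniform(0.1, 1.0)}. Each call to rvs yields a fresh draw, which is what the float returned by RandomState(0).uniform(0.1, 1) cannot do.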

Derivative check with scalers

I have a problem in which I want to scale the design variables. I have added the scaler, but I want to check the derivative to make sure it is doing what I want it to do. Is there a way to check the scaled derivative? I have tried to use check_total_derivatives(), but the derivative is exactly the same regardless of what value I put for scaler:
from openmdao.api import Component, Group, Problem, IndepVarComp, ExecComp
from openmdao.drivers.pyoptsparse_driver import pyOptSparseDriver

class Scaling(Component):

    def __init__(self):
        super(Scaling, self).__init__()
        self.add_param('x', shape=1)
        self.add_output('y', shape=1)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = 1000. * params['x']**2 + 2

    def linearize(self, params, unknowns, resids):
        J = {}
        J['y', 'x'] = 2000. * params['x']
        return J

class ScalingGroup(Group):

    def __init__(self):
        super(ScalingGroup, self).__init__()
        self.add('x', IndepVarComp('x', 0.0), promotes=['*'])
        self.add('g', Scaling(), promotes=['*'])

p = Problem()
p.root = ScalingGroup()
# p.driver = pyOptSparseDriver()
# p.driver.options['optimizer'] = 'SNOPT'
p.driver.add_desvar('x', lower=0.005, upper=100., scaler=1000)
p.driver.add_objective('y')
p.setup()
p['x'] = 3.
p.run()
total = p.check_total_derivatives()
# Derivative is the same regardless of what the scaler is.
The scalers and adders are consistent in their behavior, so the check derivatives routines give results in unscaled terms to be more intuitive.
If you really want to see what impact the scaler is having when the NLP sees the scaled value and you're using SNOPT, you can add SNOPT's derivative check capability:
p.driver.opt_settings['Verify level'] = 3
SNOPT_print.out will contain, with the scaler set to 1:
Column x(j) dx(j) Element no. Row Derivative Difference approxn
1 3.00000000E+00 2.19E-06 Objective 6.00000000E+03 6.00000219E+03 ok
Or, if we change the x scaler to 1000:
Column x(j) dx(j) Element no. Row Derivative Difference approxn
1 3.00000000E+03 1.64E-03 Objective 6.00000000E+00 6.00000164E+00 ok
So in the units of the problem, which check_total_derivatives uses, the derivative doesn't change. But the scaled value as seen by the optimizer is changing.
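The two SNOPT rows can be reproduced with a few lines of arithmetic. This sketch assumes the optimizer sees x_scaled = scaler * x, which is consistent with the x(j) values printed above. The function names are illustrative.

```python
def y(x):
    return 1000.0 * x**2 + 2.0

def dy_dx(x):
    return 2000.0 * x

x = 3.0
results = {}
for scaler in (1.0, 1000.0):
    x_scaled = scaler * x               # value the optimizer sees
    # Chain rule: dy/dx_scaled = dy/dx * dx/dx_scaled = dy/dx / scaler
    dy_dx_scaled = dy_dx(x) / scaler
    results[scaler] = (x_scaled, dy_dx_scaled)
    print(scaler, x_scaled, dy_dx(x), dy_dx_scaled)
```

With scaler 1 the scaled derivative is 6000, and with scaler 1000 it is 6, matching the Derivative column in each case, while the unscaled dy/dx stays at 6000 throughout.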
Another way to see exactly what the optimizer is seeing from calc_gradient is to mimic the call to calc_gradient. This is not necessarily easy to figure out, but I thought I would paste it here for reference.
print(p.calc_gradient(list(p.driver.get_desvars().keys()),
                      list(p.driver.get_objectives().keys()) + list(p.driver.get_constraints().keys()),
                      dv_scale=p.driver.dv_conversions,
                      cn_scale=p.driver.fn_conversions))