I am trying to generate a sample of 100 scenarios (X, Y) where both X and Y are normally distributed X=N(50,5^2), Y=N(30,2^2) and X and Y are correlated Cov(X,Y)=0.4.
I have been able to generate 100 scenarios with the Cholesky decomposition:
# We do a Cholesky decomposition to generate correlated scenarios
using LinearAlgebra, Distributions
nScenarios = 100
Σ = [25 0.4; 0.4 4]
μ = [50, 30]
L = cholesky(Σ)
v = [rand(Normal(0, 1), nScenarios), rand(Normal(0, 1), nScenarios)]
X = zeros(1, nScenarios)
Y = zeros(1, nScenarios)
for i = 1:nScenarios
    # Use the lower factor L.L (Σ = L.L * L.L') so that Cov(X, Y) matches Σ
    X[1, i] = sum(L.L[1, j] * v[j][i] for j = 1:2) + μ[1]
    Y[1, i] = sum(L.L[2, j] * v[j][i] for j = 1:2) + μ[2]
end
However, I need the probability of each scenario, i.e. P(X=k and Y=p). My question is: how can we draw a sample from a given distribution together with the probability of each scenario?
Following the BatWannaBe explanation, normally I would do it like this:
julia> using Distributions
julia> d = MvNormal([50.0, 30.0], [25.0 0.4; 0.4 4.0])
FullNormal(
dim: 2
μ: [50.0, 30.0]
Σ: [25.0 0.4; 0.4 4.0]
)
julia> point = rand(d)
2-element Vector{Float64}:
52.807189619051485
32.693811008760676
julia> pdf(d, point)
0.0056519503173830515
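One caveat worth spelling out: pdf returns a probability density, not a probability. For a continuous distribution P(X=k and Y=p) is zero at any single point, so in scenario-based models a common convention (an assumption here, not something stated in the thread) is to give each of the N sampled scenarios weight 1/N. The sketch below is in Python rather than Julia and uses only the standard library; the helper names are made up for illustration. It shows the Cholesky sampling step and the bivariate normal density formula, and reproduces the density printed above for the same point.

```python
import math
import random

def cholesky_2x2(cov):
    """Entries of the lower-triangular L with L * L^T == cov (2x2 case)."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22

def sample_bivariate_normal(mu, cov, rng):
    """Draw one (x, y) as mu + L * z with z standard normal."""
    l11, l21, l22 = cholesky_2x2(cov)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2

def bivariate_normal_pdf(point, mu, cov):
    """Density of N(mu, cov) at point, using the explicit 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

mu = (50.0, 30.0)
cov = ((25.0, 0.4), (0.4, 4.0))
rng = random.Random(0)

scenarios = [sample_bivariate_normal(mu, cov, rng) for _ in range(100)]
weights = [1.0 / len(scenarios)] * len(scenarios)  # equal-weight convention

# Density at the point drawn in the Julia session above
density = bivariate_normal_pdf((52.807189619051485, 32.693811008760676), mu, cov)
print(density)
```

If probabilities over regions are needed instead, the density has to be integrated over each region; the equal weights above only say that every Monte Carlo draw counts the same.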
I'm setting up a linear programming optimization model using CPLEX and am wondering if it's possible to accomplish a modification of the cost function dependent upon which binary decision variables are 'active' in an arbitrary solution. This is mostly a question about how to formulate the LP model (if it's even possible), but responses in the context of CPLEX are welcome or even preferred.
Say I have an LP problem in canonical form:
minimize c^T x
s.t. Ax <= b
With cost function:
c = [c_1, c_2,...,c_100]
All variables are binary. I have this basic setup modeled and running effectively in CPLEX.
Now say I have a subset of variables:
efficiency_set = [x_1, x_2,...,x_5]
With the condition:
if any x_n in efficiency_set == 1
then c_n for all other x_n in the set = 0.9 * c_n
Essentially there is a dependency where if any x_n in the efficiency set is 'active', it becomes 10% less expensive for other variables in the set to appear in the solution.
I thought that CPLEX indicator constraints were what I was looking for, but after reading through the documentation, I don't think I can enforce an on-the-fly change to the cost function with them (I could be wrong). So I feel like it needs to be done through the formulation of the LP, but I can't work out how to accomplish it. Any ideas? Thanks.
CPLEX has many APIs; let me answer with the easiest one, OPL.
Your canonical form can be written
int n=3;
int m=4;
range N=1..n;
range M=1..m;
float A[N][M]=[[1,4,9,6],[8,5,0,8],[2,9,0,2]];
float B[M]=[3,1,3,0];
float C[N]=[1,1,1];
dvar boolean x[N];
minimize sum(i in N) C[i]*x[i];
subject to
{
forall(j in M) sum(i in N) A[i,j]*x[i]>=B[j];
}
and then you can write logical constraints:
int n=3;
int m=4;
range N=1..n;
range M=1..m;
float A[N][M]=[[1,4,9,6],[8,5,0,8],[2,9,0,2]];
float B[M]=[3,1,3,0];
float C[N]=[1,1,1];
{int} efficiencySet={1,2};
dvar boolean activeEfficiencySet;
dvar boolean x[N];
minimize sum(i in N) C[i]*x[i]*(1-0.1*activeEfficiencySet*(i not in efficiencySet));
subject to
{
forall(j in M) sum(i in N) A[i,j]*x[i]>=B[j];
activeEfficiencySet==(1<=sum(i in efficiencySet) x[i]);
}
Using Alex's data, I have written the program in docplex (cplex python API)
from docplex.mp.model import Model
n = 3
m = 4
A = {}
A[0, 0] = 1
A[0, 1] = 4
A[0, 2] = 9
A[0, 3] = 6
A[1, 0] = 8
A[1, 1] = 5
A[1, 2] = 0
A[1, 3] = 8
A[2, 0] = 2
A[2, 1] = 9
A[2, 2] = 0
A[2, 3] = 2
B = {}
B[0] = 3
B[1] = 1
B[2] = 3
B[3] = 0
C = {}
C[0] = 1
C[1] = 1
C[2] = 1
efficiencySet = [0, 1]
mdl = Model(name="")
activeEfficiencySet = mdl.binary_var()
x = mdl.binary_var_dict(range(n), name="x")
# constraint 1:
for j in range(m):
    mdl.add_constraint(mdl.sum(A[i, j] * x[i] for i in range(n)) >= B[j])
# constraint 2: the flag is 1 exactly when some variable of the efficiency set is active
mdl.add(activeEfficiencySet == (mdl.sum(x[i] for i in efficiencySet) >= 1))
# objective function:
# expr = mdl.linear_expr()
lst = []
for i in range(n):
    if i not in efficiencySet:
        lst.append((C[i] * x[i] * (1 - 0.1 * activeEfficiencySet)))
    else:
        lst.append(C[i] * x[i])
mdl.minimize(mdl.sum(lst))
mdl.solve()
for i in range(n):
    print(str(x[i]) + " : " + str(x[i].solution_value))
activeEfficiencySet.solution_value
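With only three binary variables, the formulation can be sanity-checked by brute force. The sketch below is plain Python with no solver; it mirrors the data and the discount rule exactly as written in the OPL model above (the 10% reduction is applied to variables outside efficiencySet when the set is active). The function names are hypothetical helpers, not part of any CPLEX API.

```python
from itertools import product

A = [[1, 4, 9, 6], [8, 5, 0, 8], [2, 9, 0, 2]]
B = [3, 1, 3, 0]
C = [1, 1, 1]
efficiency_set = {0, 1}
n, m = len(A), len(B)

def objective(x):
    # active mirrors: activeEfficiencySet == (1 <= sum(i in efficiencySet) x[i])
    active = any(x[i] for i in efficiency_set)
    total = 0.0
    for i in range(n):
        discount = 0.1 if (active and i not in efficiency_set) else 0.0
        total += C[i] * x[i] * (1 - discount)
    return total

def feasible(x):
    # forall(j in M) sum(i in N) A[i,j]*x[i] >= B[j]
    return all(sum(A[i][j] * x[i] for i in range(n)) >= B[j] for j in range(m))

# Enumerate all 2^n assignments and keep the cheapest feasible one
best_x = min((x for x in product((0, 1), repeat=n) if feasible(x)), key=objective)
best_cost = objective(best_x)
print(best_x, best_cost)
```

Enumeration only scales to a handful of variables, so this is a way to validate the model on toy data before trusting the solver output on the real problem.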
Using Z3Py, once a model has been checked for an optimization problem, is there a way to convert ArithRef expressions into values?
Such as
y = If(x > 5, 0, 0.5 * x)
Once values have been found for x, can I get the evaluated value for y, without having to calculate again based on the given values for x?
Many thanks.
You need to evaluate, but it can be done by the model for you automatically:
from z3 import *
x = Real('x')
y = If(x > 5, 0, 0.5 * x)
s = Solver()
r = s.check()
if r == sat:
    m = s.model()
    print("x =", m.eval(x, model_completion=True))
    print("y =", m.eval(y, model_completion=True))
else:
    print("Solver said:", r)
This prints:
x = 0
y = 0
Note that we used the parameter model_completion=True since there are no constraints to force x (and consequently y) to any value in this model. If you have sufficient constraints added, you wouldn't need that parameter. (Of course, having it does not hurt.)
I'm trying to implement RGB to HSV conversion from opencv in pure numpy using formula from here:
import cv2
import numpy as np
def rgb2hsv_opencv(img_rgb):
    img_hsv = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2HSV)
    return img_hsv
def rgb2hsv_np(img_rgb):
    assert img_rgb.dtype == np.float32
    height, width, c = img_rgb.shape
    r, g, b = img_rgb[:,:,0], img_rgb[:,:,1], img_rgb[:,:,2]
    t = np.min(img_rgb, axis=-1)
    v = np.max(img_rgb, axis=-1)
    s = (v - t) / (v + 1e-6)
    s[v==0] = 0
    # v==r
    hr = 60 * (g - b) / (v - t + 1e-6)
    # v==g
    hg = 120 + 60 * (b - r) / (v - t + 1e-6)
    # v==b
    hb = 240 + 60 * (r - g) / (v - t + 1e-6)
    h = np.zeros((height, width), np.float32)
    h = h.flatten()
    hr = hr.flatten()
    hg = hg.flatten()
    hb = hb.flatten()
    h[(v==r).flatten()] = hr[(v==r).flatten()]
    h[(v==g).flatten()] = hg[(v==g).flatten()]
    h[(v==b).flatten()] = hb[(v==b).flatten()]
    h[h<0] += 360
    h = h.reshape((height, width))
    img_hsv = np.stack([h, s, v], axis=-1)
    return img_hsv
img_bgr = cv2.imread('00000.png')
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img_rgb = img_rgb / 255.0
img_rgb = img_rgb.astype(np.float32)
img_hsv1 = rgb2hsv_np(img_rgb)
img_hsv2 = rgb2hsv_opencv(img_rgb)
print('max diff:', np.max(np.fabs(img_hsv1 - img_hsv2)))
print('min diff:', np.min(np.fabs(img_hsv1 - img_hsv2)))
print('mean diff:', np.mean(np.fabs(img_hsv1 - img_hsv2)))
But I get a big difference:
max diff: 240.0
min diff: 0.0
mean diff: 0.18085355
Am I missing something?
Also, maybe it's possible to write the numpy code more efficiently, for example without flatten?
I'm also having a hard time finding the original C++ code for the cvtColor function. As I understand it, it should actually be the function cvCvtColor from the C code, but I can't find the actual source code with the formula.
Since the max difference is exactly 240, I'm pretty sure the problem occurs when v==b holds at the same time as v==r, v==g, or both: the v==b assignment is executed last, so it overwrites the others.
If you change the order from:
h[(v==r).flatten()] = hr[(v==r).flatten()]
h[(v==g).flatten()] = hg[(v==g).flatten()]
h[(v==b).flatten()] = hb[(v==b).flatten()]
To:
h[(v==r).flatten()] = hr[(v==r).flatten()]
h[(v==b).flatten()] = hb[(v==b).flatten()]
h[(v==g).flatten()] = hg[(v==g).flatten()]
The max difference may then show up as 120, because of the 120 added in the v==g equation. So ideally you would execute these three lines in the order b->g->r. The difference should then be negligible (I still see a max difference of about 0.01, which I chalk up to round-off somewhere).
h[(v==b).flatten()] = hb[(v==b).flatten()]
h[(v==g).flatten()] = hg[(v==g).flatten()]
h[(v==r).flatten()] = hr[(v==r).flatten()]
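To see which ties actually matter, here is a small per-pixel sketch in plain Python (hypothetical helper names, same 1e-6 guard as in the question). As far as I can tell, when only two channels tie for the maximum the competing formulas agree up to the epsilon, so the order barely matters there; the case that produces 0, 120 and 240 is a gray pixel, where all three masks are true and the last assignment wins. The standard-library colorsys module is used as an independent reference, not as a stand-in for OpenCV.

```python
import colorsys

EPS = 1e-6  # same guard against division by zero as in the question's code

def hue_candidates(r, g, b):
    """The three per-channel hue formulas, evaluated regardless of which channel is max."""
    v, t = max(r, g, b), min(r, g, b)
    d = v - t + EPS
    return (60 * (g - b) / d, 120 + 60 * (b - r) / d, 240 + 60 * (r - g) / d)

def hue(r, g, b):
    """Pick the formula with priority r, then g, then b (same effect as assigning b->g->r)."""
    v = max(r, g, b)
    hr, hg, hb = hue_candidates(r, g, b)
    h = hr if v == r else hg if v == g else hb
    return h + 360 if h < 0 else h

# Gray pixel: all three masks are true and the candidates differ by 120 and 240
print(hue_candidates(0.5, 0.5, 0.5))

# Two-way tie (r == b is the max): both candidates describe the same angle
hr, _, hb = hue_candidates(1.0, 0.5, 1.0)
print(hr + 360, hb)

# Reference check against colorsys (hue scaled from [0, 1) to degrees)
for rgb in [(1.0, 0.0, 0.0), (0.2, 0.4, 0.6), (1.0, 0.5, 1.0), (0.5, 0.5, 0.5)]:
    print(rgb, hue(*rgb), colorsys.rgb_to_hsv(*rgb)[0] * 360)
```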
How to sample without replacement in TensorFlow? Like numpy.random.choice(n, size=k, replace=False) for some very large integer n (e.g. 100k-100M), and smaller k (e.g. 100-10k).
Also, I want it to be efficient and on the GPU, so other solutions like this with tf.py_func are not really an option for me. Anything which would use tf.range(n) or so is also not an option because n could be very large.
This is one way:
n = ...
sample_size = ...
idx = tf.random_shuffle(tf.range(n))[:sample_size]
EDIT:
I had posted the answer below but then read the last line of your post. I don't think there is a good way to do it if you absolutely cannot produce a tensor with size O(n) (numpy.random.choice with replace=False is also implemented as a slice of a permutation). You could resort to a tf.while_loop until you have unique indices:
n = ...
sample_size = ...
idx = tf.zeros([sample_size], dtype=tf.int64)
# Keep redrawing while idx contains duplicates
idx = tf.while_loop(
    lambda idx: tf.size(idx) > tf.size(tf.unique(idx)[0]),
    lambda idx: tf.random_uniform([sample_size], maxval=n, dtype=tf.int64),
    [idx])
EDIT 2:
About the average number of iterations in the previous method. If we call n the number of possible values and k the length of the desired vector (with k ≤ n), the probability that an iteration is successful is:
p = product((n - (i - 1)) / n for i in 1 .. k)
Since each iteration can be considered a Bernoulli trial, the average number of trials until the first success is 1 / p (proof here). Here is a function that calculates the average number of trials in Python for some k and n values:
def avg_iter(k, n):
    if k > n or n <= 0 or k < 0:
        raise ValueError()
    avg_it = 1.0
    for p in (float(n) / (n - i) for i in range(k)):
        avg_it *= p
    return avg_it
And here are some results:
+-------+------+----------+
| n | k | Avg iter |
+-------+------+----------+
| 10 | 5 | 3.3 |
| 100 | 10 | 1.6 |
| 1000 | 10 | 1.1 |
| 1000 | 100 | 167.8 |
| 10000 | 10 | 1.0 |
| 10000 | 100 | 1.6 |
| 10000 | 1000 | 2.9e+22 |
+-------+------+----------+
You can see it varies wildly depending on the parameters.
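As a rough check of the 1 / p figure, the retry loop can be simulated in plain Python. This is only a sketch with hypothetical helper names; it redraws k values with replacement until they are all distinct and averages the number of attempts, which should land near the 3.3 in the first table row for n = 10, k = 5.

```python
import random

def expected_iterations(k, n):
    """1 / p, where p = prod((n - i) / n for i in range(k))."""
    p = 1.0
    for i in range(k):
        p *= (n - i) / n
    return 1.0 / p

def draws_until_unique(k, n, rng):
    """Number of k-sized draws with replacement needed until one has no duplicates."""
    attempts = 0
    while True:
        attempts += 1
        draw = [rng.randrange(n) for _ in range(k)]
        if len(set(draw)) == k:
            return attempts

rng = random.Random(0)
trials = 20000
simulated = sum(draws_until_unique(5, 10, rng) for _ in range(trials)) / trials
expected = expected_iterations(5, 10)
print(round(expected, 1), simulated)
```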
It is possible, though, to construct a vector in a fixed number of steps, although the only algorithm I can think of is O(k^2). In pure Python it goes like this:
import random
def sample_wo_replacement(n, k):
    sample = [0] * k
    for i in range(k):
        # The i-th draw has only n - i values left to choose from
        sample[i] = random.randint(0, n - 1 - i)
    for i, v in reversed(list(enumerate(sample))):
        for p in reversed(sample[:i]):
            if v >= p:
                v += 1
        sample[i] = v
    return sample
random.seed(100)
print(sample_wo_replacement(10, 5))
# [2, 8, 9, 7, 1]
print(sample_wo_replacement(10, 10))
# [6, 5, 8, 4, 0, 9, 1, 2, 7, 3]
This is a possible way to do it in TensorFlow (not sure if the best one):
import tensorflow as tf
def sample_wo_replacement_tf(n, k):
    # First loop
    sample = tf.constant([], dtype=tf.int64)
    i = 0
    sample, _ = tf.while_loop(
        lambda sample, i: i < k,
        # This is ugly but I did not want to define more functions
        lambda sample, i: (tf.concat([sample,
                                      tf.random_uniform([1], maxval=tf.cast(n - tf.shape(sample)[0], tf.int64), dtype=tf.int64)],
                                     axis=0),
                           i + 1),
        [sample, i], shape_invariants=[tf.TensorShape((None,)), tf.TensorShape(())])
    # Second loop
    def inner_loop(sample, i):
        sample_size = tf.shape(sample)[0]
        v = sample[i]
        j = i - 1
        v, _ = tf.while_loop(
            lambda v, j: j >= 0,
            lambda v, j: (tf.cond(v >= sample[j], lambda: v + 1, lambda: v), j - 1),
            [v, j])
        return (tf.where(tf.equal(tf.range(sample_size), i), tf.tile([v], (sample_size,)), sample), i - 1)
    i = tf.shape(sample)[0] - 1
    sample, _ = tf.while_loop(lambda sample, i: i >= 0, inner_loop, [sample, i])
    return sample
And an example:
with tf.Graph().as_default(), tf.Session() as sess:
    tf.set_random_seed(100)
    sample = sample_wo_replacement_tf(10, 5)
    for i in range(10):
        print(sess.run(sample))
# [3 0 6 8 4]
# [5 4 8 9 3]
# [1 4 0 6 8]
# [8 9 5 6 7]
# [7 5 0 2 4]
# [8 4 5 3 7]
# [0 5 7 4 3]
# [2 0 3 8 6]
# [3 4 8 5 1]
# [5 7 0 2 9]
This is quite intensive on tf.while_loops, though, which are well-known not to be particularly fast in TensorFlow, so I wouldn't know how fast you can really get with this method without some kind of benchmarking.
EDIT 4:
One last possible method. You can divide the range of possible values (0 to n) in "chunks" of size c and pick a random amount of numbers from each chunk, then shuffle everything. The amount of memory that you use is limited by c, and you don't need nested loops. If n is divisible by c, then you should get about a perfect random distribution, otherwise values in the last "short" chunk would receive some extra probability (this may be negligible depending on the case). Here is a NumPy implementation. It is somewhat long to account for different corner cases and pitfalls, but if c ≥ k and n mod c = 0 several parts get simplified.
import numpy as np
def sample_chunked(n, k, chunk=None):
    chunk = chunk or n
    last_chunk = chunk
    parts = n // chunk
    # Distribute k among chunks
    max_p = min(float(chunk) / k, 1.0)
    max_p_last = max_p
    if n % chunk != 0:
        parts += 1
        last_chunk = n % chunk
        max_p_last = min(float(last_chunk) / k, 1.0)
    p = np.full(parts, 2)
    # Iterate until a valid distribution is found
    while not np.isclose(np.sum(p), 1) or np.any(p > max_p) or p[-1] > max_p_last:
        p = np.random.uniform(size=parts)
        p /= np.sum(p)
    dist = (k * p).astype(np.int64)
    sample_size = np.sum(dist)
    # Account for rounding errors
    while sample_size < k:
        i = np.random.randint(len(dist))
        while (dist[i] >= chunk) or (i == parts - 1 and dist[i] >= last_chunk):
            i = np.random.randint(len(dist))
        dist[i] += 1
        sample_size += 1
    while sample_size > k:
        i = np.random.randint(len(dist))
        while dist[i] == 0:
            i = np.random.randint(len(dist))
        dist[i] -= 1
        sample_size -= 1
    assert sample_size == k
    # Generate sample parts
    sample_parts = []
    for i, v in enumerate(np.nditer(dist)):
        if v <= 0:
            continue
        c = chunk if i < parts - 1 else last_chunk
        base = chunk * i
        sample_parts.append(base + np.random.choice(c, v, replace=False))
    sample = np.concatenate(sample_parts, axis=0)
    np.random.shuffle(sample)
    return sample
np.random.seed(100)
print(sample_chunked(15, 5, 4))
# [ 8 9 12 13 3]
A quick benchmark of sample_chunked(100000000, 100000, 100000) takes about 3.1 seconds on my computer, while I haven't been able to run the previous algorithm (sample_wo_replacement function above) to completion with the same parameters. It should be possible to implement it in TensorFlow, maybe using tf.TensorArray, although it would require significant effort to get it exactly right.
Use the Gumbel-max trick described here: https://github.com/tensorflow/tensorflow/issues/9260
z = -tf.log(-tf.log(tf.random_uniform(tf.shape(logits),0,1)))
_, indices = tf.nn.top_k(logits + z,K)
indices are what you want. This trick is so easy!
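For readers who want to see the trick without TensorFlow, here is a minimal pure-Python sketch (the function name and the tiny guard constant are my own choices). It adds Gumbel(0, 1) noise to each logit and keeps the indices of the k largest perturbed values; with all logits equal this amounts to a uniform sample without replacement. Note that it still touches every one of the n logits, so memory and time are O(n).

```python
import math
import random

def gumbel_top_k(logits, k, rng):
    """Indices of the k largest values of logits[i] + Gumbel(0, 1) noise."""
    tiny = 1e-300  # guard: rng.random() may return exactly 0.0
    perturbed = []
    for i, logit in enumerate(logits):
        u = max(rng.random(), tiny)
        # -log(-log(u)) is a standard Gumbel draw for u uniform on (0, 1)
        perturbed.append((logit - math.log(-math.log(u)), i))
    perturbed.sort(reverse=True)
    return [i for _, i in perturbed[:k]]

rng = random.Random(0)
uniform_sample = gumbel_top_k([0.0] * 1000, 10, rng)    # equal logits: uniform sample
skewed_sample = gumbel_top_k([1000.0, 0.0, 0.0, 0.0, 0.0], 2, rng)  # index 0 dominates
print(uniform_sample, skewed_sample)
```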
The following works fairly fast on the GPU, and I did not encounter memory issues when using n~100M and k~10k (using NVIDIA GeForce GTX 1080 Ti):
def random_choice_without_replacement(n, k):
    """equivalent to 'numpy.random.choice(n, size=k, replace=False)'"""
    return tf.math.top_k(tf.random.uniform(shape=[n]), k, sorted=False).indices
I'm trying to optimize a piece of code that solves a large sparse nonlinear system using an interior point method. During the update step, this involves computing the Hessian matrix H, the gradient g, then solving for d in H * d = -g to get the new search direction.
The Hessian matrix has a symmetric tridiagonal structure of the form:
A.T * diag(b) * A + C
I've run line_profiler on the particular function in question:
Line # Hits Time Per Hit % Time Line Contents
==================================================
386 def _direction(n, res, M, Hsig, scale_var, grad_lnprior, z, fac):
387
388 # gradient
389 44 1241715 28220.8 3.7 g = 2 * scale_var * res - grad_lnprior + z * np.dot(M.T, 1. / n)
390
391 # hessian
392 44 3103117 70525.4 9.3 N = sparse.diags(1. / n ** 2, 0, format=FMT, dtype=DTYPE)
393 44 18814307 427597.9 56.2 H = - Hsig - z * np.dot(M.T, np.dot(N, M)) # slow!
394
395 # update direction
396 44 10329556 234762.6 30.8 d, fac = my_solver(H, -g, fac)
397
398 44 111 2.5 0.0 return d, fac
Looking at the output it's clear that constructing H is by far the most costly step - it takes considerably longer than actually solving for the new direction.
Hsig and M are both CSC sparse matrices, n is a dense vector and z is a scalar. The solver I'm using requires H to be either a CSC or CSR sparse matrix.
Here's a function that produces some toy data with the same formats, dimensions and sparseness as my real matrices:
import numpy as np
from scipy import sparse
def make_toy_data(nt=200000, nc=10):
    d0 = np.random.randn(nc * (nt - 1))
    d1 = np.random.randn(nc * (nt - 1))
    M = sparse.diags((d0, d1), (0, nc), shape=(nc * (nt - 1), nc * nt),
                     format='csc', dtype=np.float64)
    d0 = np.random.randn(nc * nt)
    Hsig = sparse.diags(d0, 0, shape=(nc * nt, nc * nt), format='csc',
                        dtype=np.float64)
    n = np.random.randn(nc * (nt - 1))
    z = np.random.randn()
    return Hsig, M, n, z
And here's my original approach for constructing H:
def original(Hsig, M, n, z):
    N = sparse.diags(1. / n ** 2, 0, format='csc')
    H = - Hsig - z * np.dot(M.T, np.dot(N, M))  # slow!
    return H
Timing:
%timeit original(Hsig, M, n, z)
# 1 loops, best of 3: 483 ms per loop
Is there a faster way to construct this matrix?
I get close to a 4x speed-up in computing the product M.T * D * M out of the three diagonal arrays. If d0 and d1 are the main and upper diagonal of M, and d is the main diagonal of D, then the following code creates M.T * D * M directly:
def make_tridi_bis(d0, d1, d, nc=10):
    d00 = d0*d0*d
    d11 = d1*d1*d
    d01 = d0*d1*d
    len_ = d0.size
    data = np.empty((3*len_ + nc,))
    # np.int was removed from recent NumPy releases; the builtin int is equivalent
    indices = np.empty((3*len_ + nc,), dtype=int)
    # Fill main diagonal
    data[:2*nc:2] = d00[:nc]
    indices[:2*nc:2] = np.arange(nc)
    data[2*nc+1:-2*nc:3] = d00[nc:] + d11[:-nc]
    indices[2*nc+1:-2*nc:3] = np.arange(nc, len_)
    data[-2*nc+1::2] = d11[-nc:]
    indices[-2*nc+1::2] = np.arange(len_, len_ + nc)
    # Fill top diagonal
    data[1:2*nc:2] = d01[:nc]
    indices[1:2*nc:2] = np.arange(nc, 2*nc)
    data[2*nc+2:-2*nc:3] = d01[nc:]
    indices[2*nc+2:-2*nc:3] = np.arange(2*nc, len_+nc)
    # Fill bottom diagonal
    data[2*nc:-2*nc:3] = d01[:-nc]
    indices[2*nc:-2*nc:3] = np.arange(len_ - nc)
    data[-2*nc::2] = d01[-nc:]
    indices[-2*nc::2] = np.arange(len_ - nc, len_)
    indptr = np.empty((len_ + nc + 1,), dtype=int)
    indptr[0] = 0
    indptr[1:nc+1] = 2
    indptr[nc+1:len_+1] = 3
    indptr[-nc:] = 2
    np.cumsum(indptr, out=indptr)
    return sparse.csr_matrix((data, indices, indptr), shape=(len_+nc, len_+nc))
If your matrix M were in CSR format, you could extract d0 and d1 as d0 = M.data[::2] and d1 = M.data[1::2]. I modified your toy data making routine to return those arrays as well, and here's what I get:
In [90]: np.allclose((M.T * sparse.diags(d, 0) * M).A, make_tridi_bis(d0, d1, d).A)
Out[90]: True
In [92]: %timeit make_tridi_bis(d0, d1, d)
10 loops, best of 3: 124 ms per loop
In [93]: %timeit M.T * sparse.diags(d, 0) * M
1 loops, best of 3: 501 ms per loop
The whole purpose of the above code is to take advantage of the structure of the non-zero entries. If you draw a diagram of the matrices you are multiplying together, it is relatively easy to convince yourself that the main (d_0) and top and bottom (d_1) diagonals of the resulting tridiagonal matrix are simply:
d_0 = np.zeros((len_ + nc,))
d_0[:len_] = d00
d_0[-len_:] += d11
d_1 = d01
The rest of the code in that function is simply building the tridiagonal matrix directly, as calling sparse.diags with the above data is several times slower.
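To convince yourself of those diagonal formulas without SciPy, here is a tiny dense check in plain Python (nested lists, hypothetical helper names, small sizes chosen arbitrarily). It builds M from d0 (offset 0) and d1 (offset nc), computes M.T * D * M by brute force, and compares it with a matrix assembled from d00, d11 and d01 only. Note the off-diagonals sit at offset nc rather than 1.

```python
import random

def dense_M(d0, d1, nc):
    """Dense M with d0 on the main diagonal and d1 on the diagonal at offset nc."""
    n_rows = len(d0)
    M = [[0.0] * (n_rows + nc) for _ in range(n_rows)]
    for k in range(n_rows):
        M[k][k] = d0[k]
        M[k][k + nc] = d1[k]
    return M

def mtdm_bruteforce(M, d):
    """(M^T D M)[i][j] = sum_k M[k][i] * d[k] * M[k][j], computed naively."""
    n_rows, n_cols = len(M), len(M[0])
    return [[sum(M[k][i] * d[k] * M[k][j] for k in range(n_rows))
             for j in range(n_cols)] for i in range(n_cols)]

def mtdm_from_diagonals(d0, d1, d, nc):
    """Assemble the same matrix from d00 = d0*d0*d, d11 = d1*d1*d, d01 = d0*d1*d."""
    n_rows = len(d0)
    size = n_rows + nc
    H = [[0.0] * size for _ in range(size)]
    for k in range(n_rows):
        H[k][k] += d0[k] * d0[k] * d[k]            # d_0[:len_] = d00
        H[k + nc][k + nc] += d1[k] * d1[k] * d[k]  # d_0[-len_:] += d11
        H[k][k + nc] = H[k + nc][k] = d0[k] * d1[k] * d[k]  # d_1 = d01
    return H

rng = random.Random(0)
nc, n_rows = 2, 6
d0 = [rng.gauss(0, 1) for _ in range(n_rows)]
d1 = [rng.gauss(0, 1) for _ in range(n_rows)]
d = [rng.gauss(0, 1) for _ in range(n_rows)]

H_ref = mtdm_bruteforce(dense_M(d0, d1, nc), d)
H_fast = mtdm_from_diagonals(d0, d1, d, nc)
max_abs_diff = max(abs(a - b) for ra, rb in zip(H_ref, H_fast) for a, b in zip(ra, rb))
print(max_abs_diff)
```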
I tried running your test case and had problems with the np.dot(N, M). I didn't dig into it, but I think my numpy/sparse combo (both pretty new) had problems using np.dot on sparse arrays.
But H = -Hsig - z*M.T.dot(N.dot(M)) runs just fine. This uses the sparse dot.
I haven't run a profile, but here are Ipython timings for several parts. It takes longer to generate the data than to do that double dot.
In [37]: timeit Hsig,M,n,z=make_toy_data()
1 loops, best of 3: 2 s per loop
In [38]: timeit N = sparse.diags(1. / n ** 2, 0, format='csc')
1 loops, best of 3: 377 ms per loop
In [39]: timeit H = -Hsig - z*M.T.dot(N.dot(M))
1 loops, best of 3: 1.55 s per loop
H is a
<2000000x2000000 sparse matrix of type '<type 'numpy.float64'>'
with 5999980 stored elements in Compressed Sparse Column format>