I have the following states
states = [(0,2,3,0), (2,2,3,0), (2,2,2,0), (2,2,1,0)]
In addition, I have the following transition matrix
import pandas as pd
transition_matrix = pd.DataFrame([[1, 0, 0, 0],
[0.5, 0.3, 0.2, 0],
[0.5, 0.3, 0, 0.2],
[0.5, 0.5, 0, 0]], columns=states, index=states)
So, if you are in state (2,2,1,0), there is a 50% probability that you go to state (0,2,3,0) and a 50% probability that you go to state (2,2,3,0).
If you are in state (0,2,3,0), the absorbing state, you win.
We can write the following equations:
p_win(0,2,3,0) = 1
p_win(2,2,3,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,2,0)
p_win(2,2,2,0) = 0.5 * p_win(0,2,3,0) + 0.3 * p_win(2,2,3,0) + 0.2 * p_win(2,2,1,0)
p_win(2,2,1,0) = 0.5 * p_win(0,2,3,0) + 0.5 * p_win(2,2,3,0)
I would like to solve the above equations. I looked at the documentation of the np.linalg.solve function, but the example doesn't use named variables and, in addition, I have terms on both sides of the equals sign.
Please show me how I can solve the above.
First, your first equation is wrong (it should be p_win(0,2,3,0) = 1 * p_win(0,2,3,0)).
You are essentially trying to find the eigenvector of the transition matrix corresponding to its largest eigenvalue (eig = 1). p_win is determined by:
v = Pv (equivalently, (P - I)v = 0), together with sum(v) = 1, where I is the identity matrix np.eye(4),
which we can write in extended form as:
import numpy as np

I = np.eye(4)
P = np.array([[1, 0, 0, 0],
[0.5, 0.3, 0.2, 0],
[0.5, 0.3, 0, 0.2],
[0.5, 0.5, 0, 0]]) # if you already have it in DataFrame,
# you can alternatively do:
# P = transition_matrix.to_numpy()
extend_m = np.concatenate((P - I, np.ones((1, 4))), axis=0)
# Equation to solve is extend_m @ v = np.array([0, 0, 0, 0, 1])
So solution is given by
v = np.linalg.lstsq(extend_m, np.array([0, 0, 0, 0, 1]), rcond=None)[0]
I use lstsq because we have an overdetermined system (5 equations, 4 unknowns). If you want to use np.linalg.solve you need to reduce it to 4 equations, which I leave up to you (in this particular case there is one obviously redundant equation, which you can simply remove); a sketch follows below.
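For instance, a minimal sketch of one such reduction (my addition, using P and I from above): the first row of P - I is all zeros, so it is the redundant equation; dropping it and keeping the normalization row leaves a square 4x4 system:

A = np.vstack(((P - I)[1:], np.ones(4)))  # drop the all-zero row, keep sum(v) = 1
b = np.array([0, 0, 0, 1])
v = np.linalg.solve(A, b)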
I need to create a matrix with the form
M=[
[a1, 0, 0],
[0, b1, 0],
[0, 0, c1],
[a2, 0, 0],
[0, b2, 0],
[0, 0, c2],
[a3, 0, 0],
[0, b3, 0],
[0, 0, c3],
...]
where a(i), b(i) and c(i) are [1xp] blocks. The resulting matrix M has the form [3m x 3p]. I am given the input data in the form of 3 matrices [m x p]:
A = [[a1.T, a2.T, a3.T, ...]].T
B = [[b1.T, b2.T, b3.T, ...]].T
C = [[c1.T, c2.T, c3.T, ...]].T
How can I create the matrix M? Ideally it would be sparse using the scipy.sparse library but I am even struggling creating it as a dense matrix using numpy. Is there no way around a loop or at least list comprehension in this case?
No need to make it complicated. For your scale, the following executes in less than a second.
import numpy as np
import scipy.sparse
from numpy.random import default_rng
rand = default_rng(seed=0)
m = 70_000
p = 20
abc = rand.random((3, m, p))
M_dense = np.zeros((m, 3, 3*p))
for i in range(3):
M_dense[:, i, i*p:(i+1)*p] = abc[i, ...]
M_sparse = scipy.sparse.csr_matrix(M_dense.reshape((-1, 3*p)))
print(M_sparse.shape)
(210000, 60)
Far better, though, is to construct the sparse matrix directly. Note the permuted shape of abc.
abc = rand.random((m, 3, p))
data = abc.ravel()
indices = np.tile(np.arange(3*p), m)
indptr = np.arange(0, data.size+1, p)
M_sparse = scipy.sparse.csr_matrix((data, indices, indptr))
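A quick sanity check (my addition) that the direct construction matches the dense route on a small instance; note abc_s here keeps the permuted (m, 3, p) shape:

m_s, p_s = 4, 2
abc_s = rand.random((m_s, 3, p_s))
# Direct CSR construction, exactly as above.
M_direct = scipy.sparse.csr_matrix((
    abc_s.ravel(),
    np.tile(np.arange(3 * p_s), m_s),
    np.arange(0, abc_s.size + 1, p_s),
))
# Dense reference built the loop way, then flattened.
M_ref = np.zeros((m_s, 3, 3 * p_s))
for i in range(3):
    M_ref[:, i, i * p_s:(i + 1) * p_s] = abc_s[:, i, :]
assert np.allclose(M_direct.toarray(), M_ref.reshape((-1, 3 * p_s)))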
I have a levels array
# level index:   0    1    2    3    4
levels = np.array((0.2, 0.4, 0.6, 0.8))
and a values array, e.g.,
np.random.seed(20220204)
values = np.random.rand(6)
and eventually a SLOW function
def map_into_levels(values, levels):
result = []
for n in np.asarray(values):
for r, level in enumerate(levels):
if n <= level:
break
else:
r += 1
result.append(r)
return result
so that I have
In [153]: np.random.seed(20220204)
...: values = np.random.rand(6)
...: levels = np.array(( 0.2, 0.4, 0.6, 0.8 ))
...: result = map_into_levels(values, levels)
...: print(levels)
...: print(values)
...: print(result)
[0.2 0.4 0.6 0.8]
[0.00621839 0.23945242 0.87124946 0.56328486 0.5477085 0.88745812]
[0, 1, 4, 2, 2, 4]
Could you please point me towards a Numpy primitive that helps me to speed up the operations?
You need np.searchsorted, assuming levels is already sorted. It finds the indices where elements would have to be inserted to maintain order:
np.searchsorted(levels, values)
# array([0, 1, 4, 2, 2, 4], dtype=int32)
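As a quick consistency check (my addition): searchsorted's default side='left' matches the <= comparison in the loop version, ties included:

assert list(np.searchsorted(levels, values)) == map_into_levels(values, levels)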
import pandas as pd
import numpy as np
import re
import cvxpy as cvx
data = pd.read_excel('Optimality_V3.xlsx', encoding='latin-1')
As you can see, I just imported an Excel file as a DataFrame. Now I want to solve a maximization problem using the CVXPY library to identify the optimal values of column data['D'] such that the sum of the values of data['B'] is maximized.
My objective function is quadratic in my decision variable data['D'], and it looks something like this:
data['B'] = data['C'] * data['D']**2 / data['E'].
The constraint I want to apply to every row of data['D'] (writing new_D for the optimized value) is:
data['D'] * 0.8 <= new_D <= data['D'] * 1.2
decision_variables = []
variable_constraints = []
for rownum, row in data.iterrows():
var_ind = str('x' + str(rownum))
var_ind = cvx.Variable()
con_ind = var_ind * 0.8 <= var_ind <= var_ind * 1.2
decision_variables.append(str(var_ind))
variable_constraints.append(str(con_ind))
The above code is my attempt at doing this. I am new to CVXPY and trying to figure out how I can create variables named var_ind with constraints con_ind.
Look at the documentation for many examples: https://www.cvxpy.org/index.html
data = pd.DataFrame(data={
'A': [1, 2, 3, 4, 5],
'B': [0, 50, 40, 80, 20],
'C': [1200, 600, 900, 6500, 200],
'D': [0.4, 1.2, 0.8, 1.6, 1.1],
'E': [0.4, 0.5, 0.6, 0.4, 0.5],
'F': [0.8, 0.4, 1.2, 1.6, 1],
})
x = cvx.Variable(data.index.size)
constraints = [
    x >= 0.8 * data['D'].values,
    x <= 1.2 * data['D'].values
]
objective = cvx.Minimize(
cvx.sum(
cvx.multiply((data['C']/data['E']).tolist(), x**2)
)
)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(x.value)
The goal of my optimizer is to calculate new values for column D such that each new value (x below) always stays within the bounds D*0.8 <= x <= D*1.2; let's call these the bounds of x. Apart from these,
the maximization function is:
cvx.sum(cvx.multiply((data['C']*data['F']/data['D']).tolist(), x))
I have a further constraint:
cvx.sum(cvx.multiply((data['F']*data['E']*data['C']/data['D']).tolist(), x**2)) == data['C'].sum()
import pandas as pd
import numpy as np
import re
import cvxpy as cvx
data = pd.DataFrame(data={
'A': [1, 2, 3, 4, 5],
'B': [100, 50, 40, 80, 20],
'C': [1200, 600, 900, 6500, 200],
'D': [0.4, 1.2, 0.8, 1.6, 1.1],
'E': [0.4, 0.5, 0.6, 0.4, 0.5],
'F': [0.8, 0.4, 1.2, 1.6, 1],
})
x = cvx.Variable(data.index.size)
Now, I want to add a third additional quadratic constraint that says the total sum of column C is always constant.
constraints = [
    x >= 0.8 * data['D'].values,
    x <= 1.2 * data['D'].values,
    cvx.sum(
        cvx.multiply((data['F']*data['E']*data['C']/data['D']).tolist(), x**2)
    ) == data['C'].sum()
]
The minimization function, as you can see, is pretty simple and linear. How do I convert it to a maximization function, though?
objective = cvx.Minimize(
cvx.sum(
cvx.multiply((data['C']*data['F']/data['D']).tolist(), x)
)
)
prob = cvx.Problem(objective, constraints)
prob.solve()
print(x.value)
I am going through the CVXPY documentation and it's helping me a lot! But I don't see any examples with a third constraint designed similarly to mine, and I am getting the error 'DCPError: Problem does not follow DCP rules.'
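A brief note, sketched against the code above (my addition, not from the original thread): the direct conversion is cvx.Maximize, and the DCP error most likely comes from the quadratic equality constraint rather than from the objective.

objective = cvx.Maximize(
    cvx.sum(
        cvx.multiply((data['C'] * data['F'] / data['D']).tolist(), x)
    )
)
# DCP only accepts affine == affine constraints, so pinning a convex
# quadratic expression to a constant is rejected whichever way the
# objective points.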
I have something like
import matplotlib.pyplot as plt
import numpy as np
a=[0.05, 0.1, 0.2, 1, 2, 3]
plt.hist((a*2, a*3), bins=[0, 0.1, 1, 10])
plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
which gives me the following plot:
As one can see, the bar widths are not equal. In the linear part (from 0 to 0.1) everything is fine, but after that the bar widths are still on a linear scale while the axis is logarithmic, giving me uneven widths for the bars and the spaces in between (and the ticks are not in the middle of the bars).
Is there any way to correct this?
Inspired by https://stackoverflow.com/a/30555229/635387 I came up with the following solution:
import matplotlib.pyplot as plt
import numpy as np
d=[0.05, 0.1, 0.2, 1, 2, 3]
def LogHistPlot(data, bins):
    totalWidth = 0.8
    colors = ("b", "r", "g")
    for i, d in enumerate(data):
        heights = np.histogram(d, bins)[0]
        width = 1 / len(data) * totalWidth
        left = np.array(range(len(heights))) + i * width
        # align='edge' preserves the old matplotlib 1.x default placement
        plt.bar(left, heights, width, color=colors[i], label=str(i), align='edge')
    plt.xticks(range(len(bins)), bins)
    plt.legend(loc='best')
LogHistPlot((d*2, d*3, d*4), [0, 0.1, 1, 10])
plt.show()
Which produces this plot:
The basic idea is to drop the plt.hist function, compute the histogram with numpy, and plot it with plt.bar. Then you can easily use a linear x-axis, which makes the bar-width calculation trivial. Lastly, the ticks are relabeled with the bin edges, giving the appearance of a logarithmic scale. And you don't even have to deal with the symlog linear/logarithmic botchery anymore.
You could use histtype='stepfilled' if you are okay with a plot where the data sets are plotted one behind the other. Of course, you'll need to carefully choose colors with alpha values, so that all your data can still be seen...
import matplotlib.pyplot as plt

a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.05, 0.05, 0.15, 0.15, 2]
colors = [(0.2, 0.2, 0.9, 0.5), (0.9, 0.2, 0.2, 0.5)] # RGBA tuples
plt.hist((a, b), bins=[0, 0.1, 1, 10], histtype='stepfilled', color=colors)
plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
I've changed your data slightly for a better illustration. This gives me:
For some reason the overlap color seems to be going wrong (matplotlib 1.3.1 with Python 3.4.0; is this a bug?), but it's one possible solution/alternative to your problem.
Okay, I found out the real problem: when you create the histogram with those bin-edge settings, the histogram creates bars which have equal size, and equal outside-spacing on the non-log scale.
To demonstrate, here's a zoomed-in version of the plot in the question, but in non-log scale:
Notice how the first two bars are centered around (0 + 0.1) / 2 = 0.05, with a gap of 0.1 / 10 = 0.01 at the edges, while the next two bars are centered around (0.1 + 1.0) / 2 = 0.55, with a gap of 1.1 / 10 = 0.11 at either edge.
When converting things to log scale, bar widths and edge widths all go for a huge toss. This is compounded further by the fact that you have a linear scale from 0 to 0.1, after which things become log-scale.
I know of no way to fix this, other than doing everything manually. I've used the geometric means of the bin edges to compute what the bar edges and bar widths should be. Note that this piece of code will work only for two datasets. If you have more datasets, you'll need some function that fills in the bin edges with a geometric series appropriately.
import numpy as np
import matplotlib.pyplot as plt
def geometric_means(a):
"""Return pairwise geometric means of adjacent elements."""
return np.sqrt(a[1:] * a[:-1])
a = [0.05, 0.1, 0.2, 1, 2, 3] * 2
b = [0.05, 0.1, 0.2, 1, 2, 3] * 3
# Find frequencies
bins = np.array([0, 0.1, 1, 10])
a_hist = np.histogram(a, bins=bins)[0]
b_hist = np.histogram(b, bins=bins)[0]
# Find log-scale mid-points for bar-edges
mid_vals = np.hstack((np.array([0.05,]), geometric_means(bins[1:])))
# Compute bar left-edges and bar widths
a_x = bins[:-1]
a_widths = mid_vals - bins[:-1]
b_x = mid_vals
b_widths = bins[1:] - mid_vals
plt.bar(a_x, a_hist, width=a_widths, color='b', align='edge')
plt.bar(b_x, b_hist, width=b_widths, color='g', align='edge')
plt.gca().set_xscale("symlog", linthreshx=0.1)
plt.show()
And the final result:
Sorry, but the neat gaps between the bars get killed. Again, this can be fixed by doing the appropriate geometric interpolation, so that everything is linear on the log scale; a sketch follows.
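For reference, a rough sketch of that interpolation (my own addition; log_inset is a hypothetical helper, and it only applies to strictly positive edges, so the zero edge of the symlog axis would still need separate linear handling):

def log_inset(left, right, frac=0.05):
    # Shrink [left, right] by `frac` of its width at each end, measured
    # in log-space, so the resulting gaps look even on a log axis.
    ratio = right / left
    return left * ratio**frac, right / ratio**frac

For example, log_inset(mid_vals[1], bins[2]) would inset the second 'b' bar from the code above.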
Just in case someone stumbles upon this problem: this solution looks much more like the way it should be done:
plotting a histogram on a Log scale with Matplotlib
How can I extract triangles from delaunay filter in mayavi?
I want to extract the triangles just like matplotlib does
import numpy as np
import matplotlib.delaunay as triang
from enthought.mayavi import mlab
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
#matplotlib
centers, edges, triangles_index, neig = triang.delaunay(x,y)
#mayavi
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
I want to extract the triangles from mayavi delaunay filter to obtain the variables #triangle_index and #centers (just like matplotlib)
The only thing I've found is this
http://docs.enthought.com/mayavi/mayavi/auto/example_delaunay_graph.html
but that only gets the edges, and they are encoded differently than in matplotlib.
To get the triangles index:
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
poly is a PolyData object, and poly.polys is a CellArray object that stores the index information: each triangle is stored as [3, i0, i1, i2], a point count followed by three point indices, which is why the reshape(-1, 4)[:, 1:] above drops the first column.
For detail about CellArray: http://www.vtk.org/doc/nightly/html/classvtkCellArray.html
To get the center of every circumcircle, you need to loop every triangle and calculate the center:
centers = []
for i in range(poly.number_of_cells):
cell = poly.get_cell(i)
points = cell.points.to_array()[:, :-1].tolist()
center = [0, 0]
points.append(center)
cell.circumcircle(*points)
centers.append(center)
centers = np.array(centers)
cell.circumcircle() is a static function, so you need to pass all three points of the triangle as arguments; the center is returned by modifying the fourth argument in place.
Here is the full code:
import numpy as np
from mayavi import mlab  # 'from enthought.mayavi import mlab' on older installs
x = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
z = np.zeros(9)
vtk_source = mlab.pipeline.scalar_scatter(x, y, z, figure=False)
delaunay = mlab.pipeline.delaunay2d(vtk_source)
poly = delaunay.outputs[0]
tindex = poly.polys.data.to_array().reshape(-1, 4)[:, 1:]
centers = []
for i in range(poly.number_of_cells):
cell = poly.get_cell(i)
points = cell.points.to_array()[:, :-1].tolist()
center = [0, 0]
points.append(center)
cell.circumcircle(*points)
centers.append(center)
centers = np.array(centers)
print(centers)
print(tindex)
The output is:
[[ 1.5 0.5]
[ 1.5 0.5]
[ 0.5 1.5]
[ 0.5 0.5]
[ 0.5 0.5]
[ 0.5 1.5]
[ 1.5 1.5]
[ 1.5 1.5]]
[[5 4 2]
[4 1 2]
[7 6 4]
[4 3 1]
[3 0 1]
[6 3 4]
[8 7 4]
[8 4 5]]
The result may not be the same as matplotlib.delaunay's, because there are many possible valid triangulations.
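As a cross-check (my addition, not part of the original answer), scipy.spatial.Delaunay exposes the triangle indices directly, though it may also return a different but equally valid triangulation:

import numpy as np
from scipy.spatial import Delaunay

tri = Delaunay(np.column_stack((x, y)))
print(tri.simplices)  # shape (n_triangles, 3), analogous to tindex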