Add an extra column to ndarray in python - numpy

I have an ndarray as follows.
feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
I have a position ndarray as follows.
position = [10, 20, 30]
Now I want to add the position value at the beginning of the feature_matrix as follows.
[[10, 0.1, 0.3], [20, 0.7, 0.8], [30, 0.8, 0.8]]
I tried the answers in this: How to add an extra column to an numpy array
E.g.,
feature_matrix = np.concatenate((feature_matrix, position), axis=1)
However, I get the following error:
ValueError: all the input arrays must have same number of dimensions
Please help me to resolve this problem.

This solved my problem. I used np.column_stack.
feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
position = [10, 20, 30]
feature_matrix = np.column_stack((position, feature_matrix))

The problem is the shape of the position array, which does not match the shape of feature_matrix: it needs to be reshaped into a column (shape (3, 1)) before concatenating.
>>> feature_matrix
array([[ 0.1,  0.3],
       [ 0.7,  0.8],
       [ 0.8,  0.8]])
>>> position
array([10, 20, 30])
>>> position.reshape((3, 1))
array([[10],
       [20],
       [30]])
The solution is (with np.concatenate):
>>> np.concatenate((position.reshape((3, 1)), feature_matrix), axis=1)
array([[ 10. ,   0.1,   0.3],
       [ 20. ,   0.7,   0.8],
       [ 30. ,   0.8,   0.8]])
But np.column_stack is clearly a great fit in your case!
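For completeness, a small sketch not taken from either answer above: np.insert can also prepend the column in a single call.
import numpy as np

feature_matrix = np.array([[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]])
position = np.array([10, 20, 30])

# insert `position` as a new column at index 0 along axis=1
result = np.insert(feature_matrix, 0, position, axis=1)
# array([[ 10. ,  0.1,  0.3],
#        [ 20. ,  0.7,  0.8],
#        [ 30. ,  0.8,  0.8]])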


create a lineplot from a few variables

I have a dataframe with 3 variables, each one representing a different time point for the same outcome (e.g. weight):
df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5], \
"Time_2": [-3, -0.2, -2.5, 0.3, 1], "TIme_3": [-2, 0, -1, 0.5, 1]})
I want to plot a trajectory for this variable where I have a first point of (0, 0) for the baseline and three additional points on the X axis with the corresponding values.
You could just use df.shift().fillna(0).cumsum().plot(marker='D') to get a plot of the 3 variables together. shift and fillna are used so that the first row is 0 for all the variables.
df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5], \
"Time_2": [-3, -0.2, -2.5, 0.3, 1], "Time_3": [-2, 0, -1, 0.5, 1]})
df.shift().fillna(0).cumsum().plot(marker='D')
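If instead you want one trajectory per row (subject), with a baseline point of 0 followed by the three time points on the x-axis, a minimal sketch (the Baseline column name and traj variable are just illustrative):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5],
                   "Time_2": [-3, -0.2, -2.5, 0.3, 1],
                   "Time_3": [-2, 0, -1, 0.5, 1]})

traj = df.copy()
traj.insert(0, "Baseline", 0.0)        # every trajectory starts at 0
traj.T.plot(marker='D', legend=False)  # one line per row, time points on the x-axis
plt.show()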

in pyplot hist2D with customized colorbar mark bins outside colorbar range

I'm plotting a weighted 2D histogram with one value assigned to each bin. Here's a minimal example:
import matplotlib.pyplot as plotter
plot_field, axis_field = plotter.subplots()
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
histo = plotter.hist2d(x, y, bins=bins, weights=w)
plotter.colorbar(histo[3], extend='min')
plotter.clim(minimum, max(w))
plotter.show()
Restricting the range of the colorbar works fine. However, I want the bins with a weight below the minimum to be marked in some way, either colored differently or indicated in some other way.
Is there a simple way to do this?
Thanks a lot!
You could create your own colormap, for example:
import numpy as np
import matplotlib.pyplot as plotter
from matplotlib import cm
from matplotlib.colors import ListedColormap
plot_field, axis_field = plotter.subplots()
viridis = cm.get_cmap('viridis', 256)
newcolors = viridis(np.linspace(0, 1, 256))
pink = np.array([248/256, 24/256, 148/256, 1])
newcolors[0, :] = pink
newcmp = ListedColormap(newcolors)
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
_, _, _, mesh = plotter.hist2d(
    x, y, bins=bins, weights=w, cmap=newcmp, vmin=minimum, vmax=max(w)
)
plotter.colorbar(mesh, extend='min')
plotter.show()
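Alternatively, a minimal sketch (not from the original answer) using the colormap's set_under, which colors every value strictly below vmin with a dedicated "under" color so you don't have to rebuild the colormap by hand (magenta is an arbitrary choice):
import copy
import matplotlib.pyplot as plotter
from matplotlib import cm

x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]

cmap = copy.copy(cm.get_cmap('viridis'))
cmap.set_under('magenta')  # color used for bins whose weight is below vmin

_, _, _, mesh = plotter.hist2d(x, y, bins=bins, weights=w,
                               cmap=cmap, vmin=minimum, vmax=max(w))
plotter.colorbar(mesh, extend='min')  # extend='min' shows the under color on the bar
plotter.show()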

tensorflow: how does one get the output the same size as the input tensor after segment sum

I'm using the tf.unsorted_segment_sum method of TensorFlow and it works.
For example:
tf.unsorted_segment_sum(tf.constant([0.2, 0.1, 0.5, 0.7, 0.8]),
                        tf.constant([0, 0, 1, 2, 2]), 3)
Gives the right result:
array([0.3, 0.5, 1.5], dtype=float32)
I want to get:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)
I've solved it.
data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])
y, idx, count = tf.unique_with_counts(gr_idx)
group_sum = tf.segment_sum(data, gr_idx)
expanded_sum = tf.gather(group_sum, idx)  # broadcast each group's sum back to the element positions
answer:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)
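As a side note, when the segment ids are already consecutive integers starting at 0 (as here), the unique_with_counts step can be skipped and the per-segment sums gathered back with the ids directly; a minimal TF 1.x-style sketch:
import tensorflow as tf

data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])

group_sum = tf.unsorted_segment_sum(data, gr_idx, num_segments=3)
per_element = tf.gather(group_sum, gr_idx)  # each element receives its segment's sum

with tf.Session() as sess:
    print(sess.run(per_element))  # [0.3 0.3 0.5 1.5 1.5]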

How to multiply each row of a matrix by a different scalar in tensorflow [duplicate]

I have a 2D matrix M of shape [batch x dim] and a vector V of shape [batch]. How can I multiply each row of the matrix by the corresponding element of V?
I know an inefficient numpy implementation would look like this:
import numpy as np
M = np.random.uniform(size=(4, 10))
V = np.random.randint(0, 10, size=4)  # one scalar per row

def tst(M, V):
    rows = []
    for i in range(len(M)):
        col = []
        for j in range(len(M[i])):
            col.append(M[i][j] * V[i])
        rows.append(col)
    return np.array(rows)
In tensorflow, given two tensors, what is the most efficient way to achieve this?
import tensorflow as tf
sess = tf.InteractiveSession()
M = tf.constant(np.random.normal(size=(4,10)), dtype=tf.float32)
V = tf.constant([1,2,3,4], dtype=tf.float32)
In NumPy, we would need to make V 2D and then let broadcasting do the element-wise multiplication (i.e. the Hadamard product). I am guessing it should be the same in TensorFlow. So, for expanding dims in TensorFlow, we can use tf.newaxis (on newer versions), tf.expand_dims, or a reshape with tf.reshape:
tf.multiply(M, V[:,tf.newaxis])
tf.multiply(M, tf.expand_dims(V,1))
tf.multiply(M, tf.reshape(V, (-1, 1)))
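For what it's worth, a small sketch (assuming the M, V, and sess defined above): the overloaded * operator broadcasts the same way as tf.multiply.
scaled = M * V[:, tf.newaxis]  # same broadcasted row scaling as tf.multiply
print(sess.run(scaled).shape)  # (4, 10); row i of M is scaled by V[i]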
In addition to @Divakar's answer, I would like to note that the order of M and V doesn't matter: tf.multiply also broadcasts during multiplication.
Example:
In [55]: M.eval()
Out[55]:
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]], dtype=int32)
In [56]: V.eval()
Out[56]: array([10, 20, 30], dtype=int32)
In [57]: tf.multiply(M, V[:, tf.newaxis]).eval()
Out[57]:
array([[ 10,  20,  30,  40],
       [ 40,  60,  80, 100],
       [ 90, 120, 150, 180]], dtype=int32)
In [58]: tf.multiply(V[:, tf.newaxis], M).eval()
Out[58]:
array([[ 10,  20,  30,  40],
       [ 40,  60,  80, 100],
       [ 90, 120, 150, 180]], dtype=int32)

numpy - multiply each element in array by a scaling factor

I have a numpy array of values and a list of scaling factors; I want to scale each column of the array by the corresponding factor:
values = [[0, 1, 2, 3],
          [1, 1, 4, 3],
          [2, 1, 6, 3],
          [3, 1, 8, 3]]
ls_alloc = [0.1, 0.4, 0.3, 0.2]
# convert values into numpy array
import numpy as np
na_values = np.array(values, dtype=float)
Edit: To clarify:
na_values is a 2-dimensional array of stock cumulative returns (i.e. normalised to day 1), where each row represents a date and each column a stock. The data is returned as an array for each date.
I now want to scale each stock's cumulative return by its allocation in the portfolio. So for each date (i.e. each row of na_values), apply the respective element of ls_alloc to the corresponding column, element-wise.
# scale each value by its allocation
na_components = [ ls_alloc[i] * na_values[:,i] for i in range(len(ls_alloc)) ]
This does what I want, but I can't help but feel there must be a way to have numpy do this for me automatically?
That is, I feel:
na_components = [ ls_alloc[i] * na_values[:,i] for i in range(len(ls_alloc)) ]
# display na_components
na_components
[array([ 0. ,  0.1,  0.2,  0.3]),
 array([ 0.4,  0.4,  0.4,  0.4]),
 array([ 0.6,  1.2,  1.8,  2.4]),
 array([ 0.6,  0.6,  0.6,  0.6])]
should be able to be expressed as something like:
tmp = np.multiply(na_values, ls_alloc)
# display tmp
tmp
array([[ 0. ,  0.4,  0.6,  0.6],
       [ 0.1,  0.4,  1.2,  0.6],
       [ 0.2,  0.4,  1.8,  0.6],
       [ 0.3,  0.4,  2.4,  0.6]])
Is there a numpy function which will achieve what I want elegantly and succinctly?
Edit:
I see that my first solution has transposed my data, so that I get back a list of ndarrays: na_components[0] now gives an ndarray of the values for the first stock, one element per date.
The next step that I perform with na_components is to calculate the total cumulative return for the portfolio by summing each individual component
na_pfo_cum_ret = np.sum(na_components, axis=0)
This works with the list of individual stock return ndarrays.
That order seems a little odd to me, but IIUC, all you need to do is to transpose the result of multiplying na_values by array(ls_alloc):
>>> v
array([[ 0.,  1.,  2.,  3.],
       [ 1.,  1.,  4.,  3.],
       [ 2.,  1.,  6.,  3.],
       [ 3.,  1.,  8.,  3.]])
>>> a
array([ 0.1,  0.4,  0.3,  0.2])
>>> (v*a).T
array([[ 0. ,  0.1,  0.2,  0.3],
       [ 0.4,  0.4,  0.4,  0.4],
       [ 0.6,  1.2,  1.8,  2.4],
       [ 0.6,  0.6,  0.6,  0.6]])
It's not completely clear to me what you want to do, but the answer is probably in Broadcasting rules. I think you want:
values = np.array([[0, 1, 2, 3],
                   [1, 1, 4, 3],
                   [2, 1, 6, 3],
                   [3, 1, 8, 3]])
ls_alloc = np.array([0.1, 0.4, 0.3, 0.2])
and either:
na_components = values * ls_alloc
or:
na_components = values * ls_alloc[:,np.newaxis]
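Putting the broadcasting together with the summing step from the question, a short sketch using the same data (note that with the broadcast version the stocks live in the columns, so the portfolio sum is over axis=1 rather than axis=0):
import numpy as np

na_values = np.array([[0., 1., 2., 3.],
                      [1., 1., 4., 3.],
                      [2., 1., 6., 3.],
                      [3., 1., 8., 3.]])
ls_alloc = np.array([0.1, 0.4, 0.3, 0.2])

# scale each column (stock) by its allocation via broadcasting
na_components = na_values * ls_alloc        # shape (dates, stocks)

# sum across stocks to get the portfolio cumulative return per date
na_pfo_cum_ret = na_components.sum(axis=1)  # one value per date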