Add an extra column to ndarray in python - numpy

I have an ndarray as follows.
feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
I have a position ndarray as follows.
position = [10, 20, 30]
Now I want to add the position value at the beginning of the feature_matrix as follows.
[[10, 0.1, 0.3], [20, 0.7, 0.8], [30, 0.8, 0.8]]
I tried the answers in this: How to add an extra column to an numpy array
E.g.,
feature_matrix = np.concatenate((feature_matrix, position), axis=1)
However, I get the following error:
ValueError: all the input arrays must have same number of dimensions
Please help me to resolve this problem.

This solved my problem. I used np.column_stack.
feature_matrix = [[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]]
position = [10, 20, 30]
feature_matrix = np.column_stack((position, feature_matrix))

The problem is the shape of the position array, which does not match the shape of feature_matrix: it needs to be reshaped into a column (shape (3, 1)) before concatenating.
>>> feature_matrix
array([[ 0.1,  0.3],
       [ 0.7,  0.8],
       [ 0.8,  0.8]])
>>> position
array([10, 20, 30])
>>> position.reshape((3, 1))
array([[10],
       [20],
       [30]])
The solution is (with np.concatenate):
>>> np.concatenate((position.reshape((3, 1)), feature_matrix), axis=1)
array([[ 10. ,   0.1,   0.3],
       [ 20. ,   0.7,   0.8],
       [ 30. ,   0.8,   0.8]])
But np.column_stack is clearly a great fit in your case!
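For completeness, a small sketch not taken from either answer above: np.insert can also prepend the column in a single call.
import numpy as np

feature_matrix = np.array([[0.1, 0.3], [0.7, 0.8], [0.8, 0.8]])
position = np.array([10, 20, 30])

# insert `position` as a new column at index 0 along axis=1
result = np.insert(feature_matrix, 0, position, axis=1)
# array([[ 10. ,  0.1,  0.3],
#        [ 20. ,  0.7,  0.8],
#        [ 30. ,  0.8,  0.8]])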


create a lineplot from a few variables

I have a dataframe with 3 variables, each one representing a different time point for the same outcome (e.g. weight):
df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5], \
"Time_2": [-3, -0.2, -2.5, 0.3, 1], "TIme_3": [-2, 0, -1, 0.5, 1]})
I want to plot a trajectory for this variable where I have a first point of (0, 0) for the baseline and three additional points on the X axis with the corresponding values.
You could just use df.shift().fillna(0).cumsum().plot(marker='D') to get a plot of the 3 variables together. shift and fillna are used so that the first row is 0 for all the variables.
df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5], \
"Time_2": [-3, -0.2, -2.5, 0.3, 1], "Time_3": [-2, 0, -1, 0.5, 1]})
df.shift().fillna(0).cumsum().plot(marker='D')
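If instead you want one trajectory per row (subject), with a baseline point of 0 followed by the three time points on the x-axis, a minimal sketch (the Baseline column name and traj variable are just illustrative):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"Time_1": [-4.5, -0.8, -3.0, 0.2, -2.5],
                   "Time_2": [-3, -0.2, -2.5, 0.3, 1],
                   "Time_3": [-2, 0, -1, 0.5, 1]})

traj = df.copy()
traj.insert(0, "Baseline", 0.0)        # every trajectory starts at 0
traj.T.plot(marker='D', legend=False)  # one line per row, time points on the x-axis
plt.show()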

in pyplot hist2D with customized colorbar mark bins outside colorbar range

I'm plotting a weighted 2D histogram with one value assigned to each bin. Here's a minimal example:
import matplotlib.pyplot as plotter
plot_field, axis_field = plotter.subplots()
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
histo = plotter.hist2d(x, y, bins=bins, weights=w)
plotter.colorbar(histo[3], extend='min')
plotter.clim(minimum, max(w))
plotter.show()
Restricting the range of the colorbar works fine. However, I want the bins with a weight below the minimum to be marked in some way, either colored differently or indicated in some other way.
Is there a simple way to do this?
Thanks a lot!
You could create your own colormap, for example:
import numpy as np
import matplotlib.pyplot as plotter
from matplotlib import cm
from matplotlib.colors import ListedColormap
plot_field, axis_field = plotter.subplots()
viridis = cm.get_cmap('viridis', 256)
newcolors = viridis(np.linspace(0, 1, 256))
pink = np.array([248/256, 24/256, 148/256, 1])
newcolors[0, :] = pink
newcmp = ListedColormap(newcolors)
x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]
_, _, _, mesh = plotter.hist2d(
    x, y, bins=bins, weights=w, cmap=newcmp, vmin=minimum, vmax=max(w)
)
plotter.colorbar(mesh, extend='min')
plotter.show()
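Alternatively, a minimal sketch (not from the original answer) using the colormap's set_under, which colors every value strictly below vmin with a dedicated "under" color so you don't have to rebuild the colormap by hand (magenta is an arbitrary choice):
import copy
import matplotlib.pyplot as plotter
from matplotlib import cm

x = [0.5, 1.5, 2.5, 0.5, 1.5, 2.5, 0.5, 1.5, 2.5]
y = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
w = [2, 1, 0, 3, 0, 0, 1, 0, 3]
minimum = 1
bins = [[0, 1, 2, 3], [0, 1, 2, 3]]

cmap = copy.copy(cm.get_cmap('viridis'))
cmap.set_under('magenta')  # color used for bins whose weight is below vmin

_, _, _, mesh = plotter.hist2d(x, y, bins=bins, weights=w,
                               cmap=cmap, vmin=minimum, vmax=max(w))
plotter.colorbar(mesh, extend='min')  # extend='min' shows the under color on the bar
plotter.show()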

tensorflow: how does one get the output the same size as the input tensor after segment sum

I'm using the tf.unsorted_segment_sum method of TensorFlow and it works.
For example:
tf.unsorted_segment_sum(tf.constant([0.2, 0.1, 0.5, 0.7, 0.8]),
                        tf.constant([0, 0, 1, 2, 2]), 3)
Gives the right result:
array([0.3, 0.5, 1.5], dtype=float32)
I want to get:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)
I've solved it.
data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])
y, idx, count = tf.unique_with_counts(gr_idx)
group_sum = tf.segment_sum(data, gr_idx)
expanded_sum = tf.gather(group_sum, idx)  # broadcast each group's sum back to the element positions
answer:
array([0.3, 0.3, 0.5, 1.5, 1.5], dtype=float32)
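As a side note, when the segment ids are already consecutive integers starting at 0 (as here), the unique_with_counts step can be skipped and the per-segment sums gathered back with the ids directly; a minimal TF 1.x-style sketch:
import tensorflow as tf

data = tf.constant([0.2, 0.1, 0.5, 0.7, 0.8])
gr_idx = tf.constant([0, 0, 1, 2, 2])

group_sum = tf.unsorted_segment_sum(data, gr_idx, num_segments=3)
per_element = tf.gather(group_sum, gr_idx)  # each element receives its segment's sum

with tf.Session() as sess:
    print(sess.run(per_element))  # [0.3 0.3 0.5 1.5 1.5]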

How to multiply each row of a matrix by a different scalar in tensorflow [duplicate]

I have a 2D matrix M of shape [batch x dim] and a vector V of shape [batch]. How can I multiply each row of the matrix by the corresponding element of V?
I know an inefficient numpy implementation would look like this:
import numpy as np
M = np.random.uniform(size=(4, 10))
V = np.random.randint(0, 10, size=4)  # one scalar per row

def tst(M, V):
    rows = []
    for i in range(len(M)):
        col = []
        for j in range(len(M[i])):
            col.append(M[i][j] * V[i])
        rows.append(col)
    return np.array(rows)
In tensorflow, given two tensors, what is the most efficient way to achieve this?
import tensorflow as tf
sess = tf.InteractiveSession()
M = tf.constant(np.random.normal(size=(4,10)), dtype=tf.float32)
V = tf.constant([1,2,3,4], dtype=tf.float32)
In NumPy, we would need to make V 2D and then let broadcasting do the element-wise multiplication (i.e. the Hadamard product). I am guessing it should be the same in TensorFlow. So, for expanding dims in TensorFlow, we can use tf.newaxis (on newer versions), tf.expand_dims, or a reshape with tf.reshape:
tf.multiply(M, V[:,tf.newaxis])
tf.multiply(M, tf.expand_dims(V,1))
tf.multiply(M, tf.reshape(V, (-1, 1)))
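For what it's worth, a small sketch (assuming the M, V, and sess defined above): the overloaded * operator broadcasts the same way as tf.multiply.
scaled = M * V[:, tf.newaxis]  # same broadcasted row scaling as tf.multiply
print(sess.run(scaled).shape)  # (4, 10); row i of M is scaled by V[i]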
In addition to @Divakar's answer, I would like to note that the order of M and V doesn't matter: tf.multiply also broadcasts during multiplication.
Example:
In [55]: M.eval()
Out[55]:
array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [3, 4, 5, 6]], dtype=int32)
In [56]: V.eval()
Out[56]: array([10, 20, 30], dtype=int32)
In [57]: tf.multiply(M, V[:, tf.newaxis]).eval()
Out[57]:
array([[ 10,  20,  30,  40],
       [ 40,  60,  80, 100],
       [ 90, 120, 150, 180]], dtype=int32)
In [58]: tf.multiply(V[:, tf.newaxis], M).eval()
Out[58]:
array([[ 10,  20,  30,  40],
       [ 40,  60,  80, 100],
       [ 90, 120, 150, 180]], dtype=int32)

numpy - multiply each element in array by a scaling factor

I have a numpy array of values and a list of scaling factors; I want to scale each column of the array by the corresponding factor:
values = [[0, 1, 2, 3],
          [1, 1, 4, 3],
          [2, 1, 6, 3],
          [3, 1, 8, 3]]
ls_alloc = [0.1, 0.4, 0.3, 0.2]
# convert values into numpy array
import numpy as np
na_values = np.array(values, dtype=float)
Edit: To clarify:
na_values is a 2-dimensional array of stock cumulative returns (i.e. normalised to day 1), where each row represents a date and each column a stock. The data is returned as an array for each date.
I now want to scale each stock's cumulative return by its allocation in the portfolio. So for each date (i.e. each row of na_values), apply the respective element of ls_alloc to the corresponding column, element-wise.
# scale each value by its allocation
na_components = [ ls_alloc[i] * na_values[:,i] for i in range(len(ls_alloc)) ]
This does what I want, but I can't help but feel there must be a way to have numpy do this for me automatically?
That is, I feel:
na_components = [ ls_alloc[i] * na_values[:,i] for i in range(len(ls_alloc)) ]
# display na_components
na_components
[array([ 0. ,  0.1,  0.2,  0.3]),
 array([ 0.4,  0.4,  0.4,  0.4]),
 array([ 0.6,  1.2,  1.8,  2.4]),
 array([ 0.6,  0.6,  0.6,  0.6])]
should be able to be expressed as something like:
tmp = np.multiply(na_values, ls_alloc)
# display tmp
tmp
array([[ 0. ,  0.4,  0.6,  0.6],
       [ 0.1,  0.4,  1.2,  0.6],
       [ 0.2,  0.4,  1.8,  0.6],
       [ 0.3,  0.4,  2.4,  0.6]])
Is there a numpy function which will achieve what I want elegantly and succinctly?
Edit:
I see that my first solution has transposed my data, so that I get back a list of ndarrays: na_components[0] now gives an ndarray of the values for the first stock, one element per date.
The next step that I perform with na_components is to calculate the total cumulative return for the portfolio by summing each individual component
na_pfo_cum_ret = np.sum(na_components, axis=0)
This works with the list of individual stock return ndarrays.
That order seems a little odd to me, but IIUC, all you need to do is to transpose the result of multiplying na_values by array(ls_alloc):
>>> v
array([[ 0.,  1.,  2.,  3.],
       [ 1.,  1.,  4.,  3.],
       [ 2.,  1.,  6.,  3.],
       [ 3.,  1.,  8.,  3.]])
>>> a
array([ 0.1,  0.4,  0.3,  0.2])
>>> (v*a).T
array([[ 0. ,  0.1,  0.2,  0.3],
       [ 0.4,  0.4,  0.4,  0.4],
       [ 0.6,  1.2,  1.8,  2.4],
       [ 0.6,  0.6,  0.6,  0.6]])
It's not completely clear to me what you want to do, but the answer is probably in Broadcasting rules. I think you want:
values = np.array([[0, 1, 2, 3],
                   [1, 1, 4, 3],
                   [2, 1, 6, 3],
                   [3, 1, 8, 3]])
ls_alloc = np.array([0.1, 0.4, 0.3, 0.2])
and either:
na_components = values * ls_alloc
or:
na_components = values * ls_alloc[:,np.newaxis]
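Putting the broadcasting together with the summing step from the question, a short sketch using the same data (note that with the broadcast version the stocks live in the columns, so the portfolio sum is over axis=1 rather than axis=0):
import numpy as np

na_values = np.array([[0., 1., 2., 3.],
                      [1., 1., 4., 3.],
                      [2., 1., 6., 3.],
                      [3., 1., 8., 3.]])
ls_alloc = np.array([0.1, 0.4, 0.3, 0.2])

# scale each column (stock) by its allocation via broadcasting
na_components = na_values * ls_alloc        # shape (dates, stocks)

# sum across stocks to get the portfolio cumulative return per date
na_pfo_cum_ret = na_components.sum(axis=1)  # one value per date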