Apply shading, formatting and borders to pivoted dataframe

I have the following data that has been pivoted:
# DataFrame.style requires Jinja2; install it first with: pip install Jinja2
import pandas as pd
import numpy as np
from numpy import rec, nan
a=rec.array([('FY20', 361.410592 , nan, 21.97, nan, 'Total', 'Fast'),
('FY21', 359.26952604, -1., 22.99, 5., 'Total', 'Fast'),
('FY22', 362.4560529 , 1., 22.77, -1., 'Total', 'Fast'),
('FY23', 371.53543252, 2., 21.92, -4., 'Total', 'Fast'),
('FY24', 374.48894494, 1., 21.88, -0., 'Total', 'Fast'),
('FY25', 377.09481613, 1., 21.85, -0., 'Total', 'Fast'),
('FY20', 67.043756 , nan, 21. , nan, 'Homes', 'Fast'),
('FY21', 110.12145222, 63., 20.95, -0., 'Homes', 'Fast'),
('FY22', 117.46526727, 7., 20.73, -1., 'Homes', 'Fast'),
('FY23', 125.83482531, 7., 18.99, -8., 'Homes', 'Fast'),
('FY24', 126.16748411, 1., 18.95, -0., 'Homes', 'Fast'),
('FY25', 127.786528 , 1., 18.96, 0., 'Homes', 'Fast'),
('FY20', 294.366836 , nan, 22.19, nan, 'Businesses', 'Fast'),
('FY21', 249.14807381, -15., 23.89, 8., 'Businesses', 'Fast'),
('FY22', 245.99078563, -2., 23.74, -1., 'Businesses', 'Fast'),
('FY23', 245.70060721, 0., 23.42, -1., 'Businesses', 'Fast'),
('FY24', 247.32146083, 1., 23.37, -0., 'Businesses', 'Fast'),
('FY25', 250.30828813, 1., 23.33, -0., 'Businesses', 'Fast'),
('FY20', 184.631684 , nan, 15.47, nan, 'Total', 'Medium'),
('FY21', 274.25718084, 49., 15.53, 0., 'Total', 'Medium'),
('FY22', 333.23835913, 21., 15.33, -1., 'Total', 'Medium'),
('FY23', 357.33167549, 7., 15.52, 1., 'Total', 'Medium'),
('FY24', 367.84796426, 3., 15.53, 0., 'Total', 'Medium'),
('FY25', 370.1664439 , 1., 15.53, 0., 'Total', 'Medium'),
('FY20', 46.522416 , nan, 17.89, nan, 'Homes', 'Medium'),
('FY21', 97.63428522, 112., 18.72, 5., 'Homes', 'Medium'),
('FY22', 141.25547499, 46., 17.86, -5., 'Homes', 'Medium'),
('FY23', 157.06766598, 11., 18.33, 3., 'Homes', 'Medium'),
('FY24', 163.02337094, 4., 18.29, -0., 'Homes', 'Medium'),
('FY25', 165.98360465, 1., 18.28, -0., 'Homes', 'Medium'),
('FY20', 138.109268 , nan, 14.66, nan, 'Businesses', 'Medium'),
('FY21', 177.62289562, 28., 13.77, -6., 'Businesses', 'Medium'),
('FY22', 191.98288414, 8., 13.46, -2., 'Businesses', 'Medium'),
('FY23', 200.26400951, 4., 13.31, -1., 'Businesses', 'Medium'),
('FY24', 203.82459332, 2., 13.31, 0., 'Businesses', 'Medium'),
('FY25', 205.18283926, 1., 13.31, 0., 'Businesses', 'Medium')],
dtype=[('FY', 'O'), ('ADV', '<f8'), ('YoY_ADV', '<f8'), ('Yield', '<f8'), ('YoY_Yld', '<f8'), ('Cut', 'O'), ('Product', 'O')])
df = pd.DataFrame(a)
df1=pd.melt(df, id_vars=['FY','Product','Cut'], var_name="Metric", value_name="Value")
df2 = pd.pivot(df1, index=['Metric', 'Product','Cut'],columns=['FY'],values=['Value'])
df2
And looks like this:
I want to apply table styles so I can copy/paste a polished table into PowerPoint but need the following:
Shade columns FY23, FY24, FY25 in orange
Apply formatting: Metric=ADV rounded to zero decimals, Metric=Yield to 2 decimals, and each of YoY_ADV plus YoY_Yld to 1 decimal place
Negative numbers red, otherwise black
Apply frame around table.
Here is my code, but I am getting the error 'Cannot index with multidimensional key':
# 1. If numbers are negative, make red otherwise black
#####################################################
def color_negative_red(x):
    if x < 0:
        return 'color: red'
    else:
        return 'color: black'
# 2. Slice major metrics so formatting can be applied
######################################################
adv_slice=df2.loc[('ADV', slice(None)), :]
yld_slice=df2.loc[('Yield', slice(None)), :]
yoy_adv_slice=df2.loc[('YoY_ADV', slice(None)), :]
yoy_yld_slice=df2.loc[('YoY_Yld', slice(None)), :]
#3. Apply table style
#####################
df2.style.applymap(color_negative_red).set_properties(**{'background-color': 'orange'}, subset=['FY23','FY24','FY25']).format('{:.0f}', subset=adv_slice, na_rep='-').format('{:.2f}', subset=yld_slice, na_rep='-').format('{:.1f}', subset=(yoy_adv_slice,yoy_yld_slice), na_rep='-').set_table_styles([{'selector': '',
'props' : [('border','1px solid black')]},
{'selector': 'th',
'props' : [('border','1px solid black')]},
{'selector': 'td',
'props' : [('border','1px solid black')]}]).set_properties(**{'text-align': 'center'})
What is needed to make the code work?

With some mods:
# 1. If numbers are negative, make red otherwise black
#####################################################
def color_negative_red(x):
    if x < 0:
        return 'color: red'
    else:
        return 'color: black'
#2. Apply table style
#####################
df2.style.applymap(color_negative_red).set_properties(**{'background-color': 'orange'}, subset=['FY23','FY24','FY25']).\
    format('{:.0f}', subset=('ADV',), na_rep='-').\
    format('{:.2f}', subset=('Yield',), na_rep='-').\
    format('{:.1f}', subset=('YoY_ADV',), na_rep='-').\
    format('{:.1f}', subset=('YoY_Yld',), na_rep='-').\
    set_table_styles([{'selector': '',
                       'props' : [('border','0.1px solid black')]},
                      {'selector': 'th',
                       'props' : [('border','0.5px solid black')]},
                      {'selector': 'td',
                       'props' : [('border','0.5px solid black')]}]).set_properties(**{'text-align': 'center'})
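The error in the question most likely comes from passing sliced DataFrames as subset: Styler expects row and column labels there, the same kind of key that .loc accepts. A minimal sketch of label-based selectors built with pd.IndexSlice follows. The toy table, the two-level index and the helper name build_styler are illustrative assumptions, not the asker's data, and the helper is only defined here because calling it needs Jinja2.

```python
import pandas as pd

# Toy stand-in for the pivoted table: rows keyed by (Metric, Cut), columns by fiscal year.
index = pd.MultiIndex.from_product(
    [["ADV", "Yield"], ["Homes", "Total"]], names=["Metric", "Cut"]
)
table = pd.DataFrame(
    [[67.04, 110.12], [361.41, 359.27], [21.00, 20.95], [21.97, 22.99]],
    index=index,
    columns=["FY20", "FY21"],
)

idx = pd.IndexSlice
adv_rows = idx[idx["ADV", :], :]      # every row whose Metric is 'ADV', all columns
yield_rows = idx[idx["Yield", :], :]  # every row whose Metric is 'Yield', all columns

# The same selectors work with .loc, which is a quick way to check them.
selected = table.loc[adv_rows]

def build_styler(df):
    """Hypothetical helper: apply a different number format per metric (needs Jinja2)."""
    return (
        df.style
        .format("{:.0f}", subset=adv_rows, na_rep="-")
        .format("{:.2f}", subset=yield_rows, na_rep="-")
    )
```

Checking a selector with .loc before handing it to format makes it easier to see which rows each format string will touch.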

merge two thresholds for two 3D arrays into a list

I have a first 3D array of size (50,250,250) whose data points take the values (1,2,3,4,5). I set a threshold of 3: data points above it should become 1 and data points below it should become 0. The only exception is a data point equal to 3, which has to be tested against a second threshold (threshold1=50) using a second 3D array of size (50,250,250). My question is how to include both thresholds in my code. In other words, the for loop should check every data point in array 1 against the first threshold, and if the data point is equal to 3, it should check the corresponding data point in the second array against the second threshold. I have tried the code below, but the results did not make sense:
res1=[]
f1=numpy.ones((250, 250))
threshold=3
threshold1=30
for i in array1:
    i = i.data
    ii= f1*i
    ii[ii < threshold] = 0
    ii[ii > threshold] = 1
    res1.append(ii)
    if ii[ii == threshold]:
        for j in array2:
            j = j.data
            jj[jj < threshold1] = 0
            jj[jj > threshold1] = 1
            res1.append(jj)
Array1:
array([[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 1.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.],
[3., 3., 3., ..., 0., 0., 0.]],
Array2:[[ nan, nan, nan, ..., nan,
0.9839769, 1.7042577],
[ nan, nan, nan, ..., nan,
nan, nan],
[ nan, nan, nan, ..., 3.2351596,
2.0924768, 1.7604152],
...,
[ nan, nan, nan, ..., 158.48865 ,
158.48865 , 125.888 ],
[ nan, nan, nan, ..., 158.48865 ,
158.48865 , 158.48865 ],
[ nan, nan, nan, ..., 125.88556 ,
158.48865 , 158.48865 ]],
The produced list (res1):
`[array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 0., 0., 0.]]),
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],`
IIUC, in your second if condition you are trying to check whether a 2D array in array1 contains at least one value equal to 3, and if so you choose the 2D array at the same position in array2. In that case, you should use the in operator.
for i in range(len(array1)):
    if threshold in array1[i]:
        array2[i][array2[i] < threshold1] = 0
        array2[i][array2[i] > threshold1] = 1
        res1.append(array2[i])
    else:
        array1[i][array1[i] < threshold] = 0
        array1[i][array1[i] > threshold] = 1
        res1.append(array1[i])
The above method is a bit lengthy for numpy. There's a numpy way to do this, too.
array1[array1 < threshold] = 0
array1[array1 > threshold] = 1
array2_condition = np.unique(np.argwhere(array1 == 3)[:,0]) # return the index of array1 if 3 in array1
chosen_array2 = array2[array2_condition]
chosen_array2[chosen_array2 < threshold1] = 0
chosen_array2[chosen_array2 > threshold1] = 1
array2[array2_condition] = chosen_array2 # if you still want array2 values to be changed
res1 = array1
res1[array2_condition] = chosen_array2 # Final result
Update
As the OP mentioned, every 2D array contains at least one 3, so selecting whole 2D arrays with array2_condition does not help. Instead, array2_condition is changed to hold the position of every element equal to 3, and a for loop updates those elements one by one.
res1 = array1
res1[res1 < threshold] = 0
res1[res1 > threshold] = 1
array2_condition = np.argwhere(array1 == 3)
for data in array2_condition:
    if array2[tuple(data)] > threshold1:
        res1[tuple(data)] = 1
    elif array2[tuple(data)] < threshold1:
        res1[tuple(data)] = 0
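The element-wise rule can also be written without a Python loop. The following is a sketch rather than a drop-in replacement: the function name and the tiny arrays are made up for illustration, and it assumes that a position where array2 is NaN or exactly equal to threshold1 should become 0 (the loop above leaves such positions at 3).

```python
import numpy as np

def combine_thresholds(array1, array2, threshold=3, threshold1=30):
    """Hypothetical helper: apply threshold to array1, falling back to array2 on ties."""
    primary = array1 > threshold      # strictly above the first threshold
    secondary = array2 > threshold1   # NaN compares as False, so NaN maps to 0
    # Where array1 equals the threshold, use the second test; otherwise the first.
    return np.where(array1 == threshold, secondary, primary).astype(float)

# Tiny stand-ins for the (50, 250, 250) arrays.
array1 = np.array([[[1.0, 3.0], [5.0, 3.0]]])
array2 = np.array([[[100.0, 50.0], [0.0, np.nan]]])
res = combine_thresholds(array1, array2)
```

Unlike the in-place versions, this leaves array1 and array2 untouched and returns a new array of the same shape.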

`sklearn.preprocessing.normalize` (L2 norm) equivalent in Tensorflow or TFX

How can I do the L2 norm in Tensorflow? I'm looking for the equivalent of sklearn.preprocessing.normalize in Tensorflow or in tfx.
You can use tensorflow.keras.utils.normalize for L2 norm as follows.
Using sklearn.preprocessing.normalize
import sklearn.preprocessing

X = [[ 1., -1., 2.],
     [ 2., 0., 0.],
     [ 0., 1., -1.]]
X_normalized = sklearn.preprocessing.normalize(X, norm='l2')
X_normalized
Output:
array([[ 0.40824829, -0.40824829, 0.81649658],
[ 1. , 0. , 0. ],
[ 0. , 0.70710678, -0.70710678]])
Using tf.keras.utils.normalize gives the same output as above
import tensorflow as tf

X = [[ 1., -1., 2.],
     [ 2., 0., 0.],
     [ 0., 1., -1.]]
tf.keras.utils.normalize(X, order=2)  # axis defaults to -1, so each row is normalized
Output:
array([[ 0.40824829, -0.40824829, 0.81649658],
[ 1. , 0. , 0. ],
[ 0. , 0.70710678, -0.70710678]])
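For reference, both calls divide each row by its Euclidean length. A NumPy-only sketch of that computation is below; the function name is illustrative, and leaving an all-zero row unchanged is an assumption of this sketch rather than a statement about either library.

```python
import numpy as np

def l2_normalize_rows(x):
    """Hypothetical helper: scale every row of x to unit Euclidean length."""
    x = np.asarray(x, dtype=float)
    norms = np.sqrt((x ** 2).sum(axis=1, keepdims=True))
    norms[norms == 0] = 1.0  # assumption: leave all-zero rows as they are
    return x / norms

X = [[1., -1., 2.],
     [2., 0., 0.],
     [0., 1., -1.]]
X_normalized = l2_normalize_rows(X)
```

For the first row the length is sqrt(1 + 1 + 4) = sqrt(6), which gives the 0.408, -0.408, 0.816 values shown in the outputs above.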

NumPy : How to determine the index of the first axis of an ndarray according to some condition?

Consider the following ndarray a -
In [117]: a
Out[117]:
array([[[nan, nan],
[nan, nan],
[nan, nan]],
[[ 3., 11.],
[ 7., 13.],
[12., 16.]],
[[ 0., 4.],
[ 6., 1.],
[ 5., 8.]],
[[17., 10.],
[15., 9.],
[ 2., 14.]]])
The minimum computed on the first axis is -
In [118]: np.nanmin(a, 0)
Out[118]:
array([[0., 4.],
[6., 1.],
[2., 8.]])
which is a[2] from visual inspection. What is the most efficient way to calculate this index, 2?
As suggested by @Divakar, you can use np.nanargmin:
import numpy as np
a = np.array([[[np.nan, np.nan],
[np.nan, np.nan],
[np.nan, np.nan]],
[[ 3., 11.],
[ 7., 13.],
[12., 16.]],
[[ 0., 4.],
[ 6., 1.],
[ 5., 8.]],
[[17., 10.],
[15., 9.],
[ 2., 14.]]])
minIdx = np.nanargmin(np.sum(a,(1,2)))
minIdx
2
a[minIdx]
array([[0., 4.],
[6., 1.],
[5., 8.]])
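Note that this ranks the slices along the first axis by their total, which is one reading of "minimum": the element-wise np.nanmin(a, 0) need not equal any single slice (here its last row, [2., 8.], mixes values from a[3] and a[2]). A sketch of the same idea for any number of trailing axes follows; the function name is made up for illustration.

```python
import numpy as np

def index_of_smallest_slice(a):
    """Hypothetical helper: index along axis 0 of the slice with the smallest total."""
    # np.sum propagates NaN, so a slice containing any NaN gets a NaN total
    # and is then skipped by np.nanargmin.
    totals = np.sum(a, axis=tuple(range(1, a.ndim)))
    return int(np.nanargmin(totals))

a = np.array([[[np.nan, np.nan], [np.nan, np.nan], [np.nan, np.nan]],
              [[3., 11.], [7., 13.], [12., 16.]],
              [[0., 4.], [6., 1.], [5., 8.]],
              [[17., 10.], [15., 9.], [2., 14.]]])
min_idx = index_of_smallest_slice(a)
```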

Does numpy provide methods for fundamental matrix operations?

Namely, rearranging rows, adding multiples of rows, and multiplying by scalars.
I don't see these methods defined in http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html or elsewhere.
And if they aren't defined, then why not?
Yes, you can manipulate array rows, adding and multiplying them. For example:
In [1]: import numpy as np
In [2]: m = np.ones((3, 4))
In [3]: m
Out[3]:
array([[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.],
[ 1., 1., 1., 1.]])
In [4]: m[1, :] = 2*m[1, :] # Multiply
In [5]: m
Out[5]:
array([[ 1., 1., 1., 1.],
[ 2., 2., 2., 2.],
[ 1., 1., 1., 1.]])
In [6]: m[0, :] = m[0, :] + 2*m[1, :] # Multiply and add
In [7]: m
Out[7]:
array([[ 5., 5., 5., 5.],
[ 2., 2., 2., 2.],
[ 1., 1., 1., 1.]])
In [8]: m[ (0, 2), :] = m[ (2, 0), :] # Swap rows
In [9]: m
Out[9]:
array([[ 1., 1., 1., 1.],
[ 2., 2., 2., 2.],
[ 5., 5., 5., 5.]])
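If named operations are preferred, the three elementary row operations can be wrapped in small helpers. These are not NumPy functions; the names below are illustrative, and each helper modifies the array in place. The calls at the end replay the session above.

```python
import numpy as np

def swap_rows(m, i, j):
    """Exchange rows i and j of m in place."""
    m[[i, j]] = m[[j, i]]

def scale_row(m, i, c):
    """Multiply row i of m by the scalar c in place."""
    m[i] *= c

def add_multiple(m, target, source, c):
    """Add c times row `source` to row `target` in place."""
    m[target] += c * m[source]

m = np.ones((3, 4))
scale_row(m, 1, 2.0)        # row 1 becomes all 2s
add_multiple(m, 0, 1, 2.0)  # row 0 becomes 1 + 2*2 = 5
swap_rows(m, 0, 2)          # rows 0 and 2 trade places
```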

vstack numpy arrays

If I have two ndarrays:
a.shape # returns (200,300, 3)
b.shape # returns (200, 300)
numpy.vstack((a,b)) # Gives error
Would print out the error:
ValueError: arrays must have same number of dimensions
I tried doing vstack((a.reshape(-1,300), b)) which kind of works, but the output is very weird.
You don't specify what final shape you actually want. If it's (200, 300, 4), you can use dstack instead:
>>> import numpy as np
>>> a = np.random.random((200,300,3))
>>> b = np.random.random((200,300))
>>> c = np.dstack((a,b))
>>> c.shape
(200, 300, 4)
Basically, when you're stacking, the lengths have to agree in all the other axes.
[Updated based on comment:]
If you want (800, 300) you could try something like this:
>>> a = np.ones((2, 3, 3)) * np.array([1,2,3])
>>> b = np.ones((2, 3)) * 4
>>> c = np.dstack((a,b))
>>> c
array([[[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]],
[[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.],
[ 1., 2., 3., 4.]]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1)
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 2., 2., 2.],
[ 2., 2., 2.],
[ 3., 3., 3.],
[ 3., 3., 3.],
[ 4., 4., 4.],
[ 4., 4., 4.]])
>>> c.T.reshape(c.shape[0]*c.shape[-1], -1).shape
(8, 3)
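If the goal is an (800, 300) array in which each block of 200 rows is one intact 200x300 plane (the three planes of a followed by b), one option is to move the last axis to the front before flattening. This is a sketch under that assumption about the desired layout, with small stand-in shapes; np.moveaxis and np.vstack are standard NumPy functions, and the variable names are illustrative.

```python
import numpy as np

# Small stand-ins: a plays the role of the (200, 300, 3) array, b the (200, 300) one.
a = np.arange(2 * 5 * 3, dtype=float).reshape(2, 5, 3)
b = np.full((2, 5), -1.0)

# Bring the last axis to the front so each a[:, :, k] plane becomes planes[k].
planes = np.moveaxis(a, -1, 0)           # shape (3, 2, 5)
flat = planes.reshape(-1, a.shape[1])    # shape (6, 5): the planes stacked vertically
stacked = np.vstack([flat, b])           # shape (8, 5): b appended as a fourth block
```

With the real shapes the same three lines would produce (800, 300), with rows 0 to 199 holding a[:, :, 0] and the last 200 rows holding b.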