How to walk through two identically shaped dataframes cell by cell and apply some logic by cell comparison - pandas

I need to compare two identically shaped dataframes cell by cell applying some function depending on the comparison result.
Some pseudocode for illustration:
for cell in dataframes:
    if cells are equal:
        do something
    else:
        do something else
I checked the DataFrame.compare method but didn't get anywhere.

If I understood your question correctly, this could help:
for i in range(df1.shape[0]):
    for j in range(df1.shape[1]):
        if df1.iloc[i, j] == df2.iloc[i, j]:
            # Do something
            pass
        else:
            # Do something else
            pass
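If both branches can be written as whole-frame expressions, a vectorized version is usually faster than the double loop. Here is a minimal sketch, assuming made-up sample frames and placeholder branch logic (multiply by 10 where the cells match, take the difference otherwise); neither comes from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with identical shape, index and columns
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 0]})

# Boolean frame: True where the cells match (NaN never compares equal)
equal = df1.eq(df2)

# Placeholder logic: one expression for equal cells, another for the rest
result = pd.DataFrame(
    np.where(equal.to_numpy(), (df1 * 10).to_numpy(), (df1 - df2).to_numpy()),
    index=df1.index,
    columns=df1.columns,
)
```

If the per-cell logic cannot be vectorized, the explicit loop above still works, just more slowly on large frames.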

Related

Pandas: What's the difference between df[df[condition]] and df.loc[df[condition]]

Using a condition to mask certain rows from a dataframe, I would use:
df.loc[mask]
Setting up a mask as a condition (such as selecting only rows having 'y' in column_x) on the dataframe itself without assigning the mask to a variable, I would usually do something like:
df[df['column_x'] == 'y']
But it made me wonder what the use of df.loc actually is in these cases. Am I getting it wrong or is the use of loc in such instances redundant?
Pandas is being smart about what you mean by df[df['column_x'] == 'y'], based on the length of the boolean series df['column_x'] == 'y' and the fact that its index happens to align with the index of df. It's syntactic sugar. You can imagine cases where a dataframe is nearly square and things would be more ambiguous.
The .loc accessor is the official and least ambiguous way to access a subset of a dataframe by rows, columns, or both.
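A small sketch of the difference in practice, assuming a made-up frame with a hypothetical 'value' column: for plain row filtering both spellings return the same rows, while .loc can also take a column selector and supports assignment in a single indexing call.

```python
import pandas as pd

# Hypothetical sample data; only 'column_x' appears in the question
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})
mask = df["column_x"] == "y"

# For row filtering alone, both forms select the same rows
same_rows = df[mask].equals(df.loc[mask])

# .loc accepts a row selector and a column selector together
values = df.loc[mask, "value"]

# Assigning through one .loc call avoids chained indexing
df.loc[mask, "value"] = 0
```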

pandas add different stylers

I want to make a summary dataframe that shows my experiments' mean and std results. I would like to turn it into a Styler and highlight the best values, but I don't know how to do so.
For instance my input data is:
mean = pd.DataFrame({'val': [1, 2, 3]})
std = pd.DataFrame({'val': [.1, .2, .1]})
And the result I would like to have would be something like this :
res = pd.DataFrame({'val': ['1$\pm$.1', '2$\pm$.2','3$\pm$.1']})
I could just convert my dataframes' cells to str and concatenate them, but I want to highlight the cells with either the highest mean value or the lowest std (or apply other rules, for which I usually define a custom highlight function), and that is not straightforward with strings.
I use stylers in my other tables as I input them directly into a tex document, but here I really don't know how to proceed.

How to select elements of an array from a specific axis in Python

I am working with multidimensional arrays whose number of axes changes dynamically. Now I want to select elements of the array along a specific axis.
For example, if I have a 3-dimensional array, I want to pick the elements like this
b = a[:, :, 1]
Now my problem is that after one iteration of code the same array becomes 4 dimensional. And again I want to pick the elements like this
b = a[:,:,1,:]
Thus I am looking for a general solution to pick all elements from the 3rd axis of the array. This would be simple for the first axis, since a[1] gives a[1,:,:,:], but I don't know how to do it for other axes.
Edit:
Also, I would be interested in a solution where the axis of interest changes as well. For example, with the same code, in the next iteration I would like to get
b = a[:,:,:,1]
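One possible approach, sketched here with made-up array shapes: build the index tuple programmatically, or use np.take with an axis argument. The helper name select_along_axis is hypothetical, not a NumPy function.

```python
import numpy as np

def select_along_axis(a, index, axis):
    """Hypothetical helper: apply `index` on `axis`, full slices elsewhere."""
    selector = [slice(None)] * a.ndim  # equivalent to ':' on every axis
    selector[axis] = index
    return a[tuple(selector)]

# Made-up 3-D and 4-D arrays for illustration
a3 = np.arange(24).reshape(2, 3, 4)
a4 = np.arange(120).reshape(2, 3, 4, 5)

b3 = select_along_axis(a3, 1, axis=2)  # same as a3[:, :, 1]
b4 = select_along_axis(a4, 1, axis=2)  # same as a4[:, :, 1, :]

# np.take does the same selection when given an integer index and an axis
b4_last = np.take(a4, 1, axis=3)       # same as a4[:, :, :, 1]
```

Both forms work for any number of dimensions, so the same code can be reused as the array grows or the axis of interest changes.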

Simple question about slicing a Numpy Tensor

I have a Numpy Tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2nd, 3rd, and 4th entries along the first dimension of this tensor, which you can do with
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this instead of explicitly writing out the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: for my real application, the shape of the tensor is something like (30000,100,100). I would like to grab the entries from index 10000 to 30000 along the first dimension of this tensor.
The simplest way in your case is to use X[1:4]. This selects the same elements as X[[1,2,3]], but notice that with X[1:4] you only need one pair of brackets because 1:4 already represents a range of values.
For an N-dimensional array in NumPy, if you specify indices for fewer than N dimensions, you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only need to pass : explicitly when you index a dimension and want all elements of a dimension that comes before it, as in X[:, 2:4].
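A quick check of these equivalences on the array from the question. The comparison of memory sharing is an addition here, not part of the answer above: a slice gives a view of X, whereas indexing with a list makes a copy.

```python
import numpy as np

X = np.arange(64).reshape((4, 4, 4))

by_list = X[[1, 2, 3], :, :]  # integer-list (fancy) indexing
by_slice = X[1:4]             # basic slicing, trailing axes implied

# Same elements either way, and trailing ':' are optional
same_values = np.array_equal(by_list, by_slice)
same_with_colons = np.array_equal(X[1:4], X[1:4, :, :])

# Slicing returns a view on X; list indexing returns a copy
slice_is_view = np.shares_memory(by_slice, X)
list_is_view = np.shares_memory(by_list, X)
```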
If you wish to select from some row to the end of the array, simply use Python slicing notation as below:
X[10000:,:,:]
This will select all rows from 10000 to the end of the array, along with all columns and depths for them.

Numpy: convert index in one dimension into many dimensions

Many array methods return a single index despite the fact that the array is multidimensional. For example:
a = np.random.rand(2, 3)
z = a.argmax()
For two dimensions, it is easy to find the matrix indices of the maximum element:
a[z // 3, z % 3]
But for more dimensions, it can become annoying. Does Numpy/Scipy have a simple way of returning the indices in multiple dimensions given an index in one (collapsed) dimension? Thanks.
Got it!
a = X.argmax()
i, j = np.unravel_index(a, X.shape)
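The same call works for any number of dimensions. A minimal sketch with a made-up 3-D array whose maximum is planted at a known position:

```python
import numpy as np

# Made-up 3-D array; place a clear maximum at position (1, 2, 0)
X = np.arange(24).reshape(2, 3, 4)
X[1, 2, 0] = 100

flat_index = X.argmax()                        # index into the flattened array
idx = np.unravel_index(flat_index, X.shape)    # tuple with one entry per axis

max_value = X[idx]                             # the tuple indexes X directly
```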
I don't know of a built-in function that does what you want, but where this
has come up for me, I realized that what I really wanted to do was this:
given 2 arrays a,b with the same shape, find the element of b which is in
the same position (same [i,j,k...] position) as the maximum element of a
For this, the quick numpy-ish solution is:
j = a.flatten().argmax()
corresponding_b_element = b.flatten()[j]
Vince Marchetti