How to walk through two identically shaped dataframes cell by cell and apply some logic by cell comparison

I need to compare two identically shaped dataframes cell by cell, applying a function that depends on the result of each comparison.
Some pseudocode for illustration:
for cell in dataframes:
    if cells are equal:
        do something
    else:
        do something else
I checked the DataFrame.compare method but didn't get anywhere.

If I understood your question correctly, this could help:
for i in range(df1.shape[0]):  # row positions
    for j in range(df1.shape[1]):  # column positions
        if df1.iloc[i, j] == df2.iloc[i, j]:
            # Do something
            pass
        else:
            # Do something else
            pass
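For larger frames, the nested loop can often be replaced by a vectorized comparison. The following is only a sketch under assumptions: the sample frames and the two placeholder operations (multiply by 10 where cells match, subtract where they differ) are made up for illustration and stand in for whatever logic is actually needed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames with identical shape, index, and columns.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = pd.DataFrame({"a": [1, 0, 3], "b": [4, 5, 0]})

# Boolean frame: True where the corresponding cells are equal.
equal = df1.eq(df2)

# Placeholder logic: one expression for equal cells, another for the rest.
result = pd.DataFrame(
    np.where(equal, df1 * 10, df1 - df2),
    index=df1.index,
    columns=df1.columns,
)
```

Note that eq treats NaN as unequal to NaN, so missing values in both frames fall into the "else" branch unless handled separately.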

Related

Pandas: What's the difference between df[df[condition]] and df.loc[df[condition]]

To select certain rows of a dataframe with a boolean mask, I would use:
df.loc[mask]
When writing the condition inline (such as selecting only rows having 'y' in column_x) without first assigning the mask to a variable, I would usually do something like:
df[df['column_x'] == 'y']
But it made me wonder what the use of df.loc actually is in these cases. Am I getting it wrong, or is the use of loc in such instances redundant?

Pandas is being smart about what you mean by df[df['column_x'] == 'y']: because the key is a boolean series whose length and index align with those of df, it is treated as a row mask rather than a column lookup. It's syntactic sugar. You can imagine cases where a dataframe is nearly square where things would be more ambiguous.
The .loc accessor is the explicit and least ambiguous way to access a subset of a dataframe by rows, columns, or both.
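A small sketch of the comparison, using a made-up frame (the "value" column is an assumption added for illustration). Both forms return the same rows; .loc additionally lets you pick columns in the same call.

```python
import pandas as pd

# Hypothetical data; only column_x comes from the question.
df = pd.DataFrame({"column_x": ["y", "n", "y"], "value": [1, 2, 3]})
mask = df["column_x"] == "y"

rows_plain = df[mask]      # boolean mask through plain brackets
rows_loc = df.loc[mask]    # the same selection through .loc

# .loc can select rows and columns together.
values = df.loc[mask, "value"]
```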

pandas add different stylers

I want to build a summary dataframe that shows the mean and std results of my experiments. I would like to turn it into a styler and highlight the best values, but I don't know how to do so.
For instance, my input data is:
mean = pd.DataFrame({'val': [1, 2, 3]})
std = pd.DataFrame({'val': [.1, .2, .1]})
And the result I would like to have would be something like this:
res = pd.DataFrame({'val': [r'1$\pm$.1', r'2$\pm$.2', r'3$\pm$.1']})
I could just convert the cells of my dataframes to str and concatenate them, but I want to highlight the cells with either the highest mean or the lowest std (or apply other rules, for which I usually define a custom highlight function), which is not straightforward once the values are strings.
I use stylers for my other tables because I insert them directly into a TeX document, but here I really don't know how to proceed.
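One possible approach, offered as a sketch and not a definitive answer: keep the numeric frames around, build the string frame for display, and let the highlight function compute its CSS from the numeric frames. The names highlight_best, build_styler, mean_df, and std_df are hypothetical, and the rule (bold where the mean is highest or the std is lowest) is just one example.

```python
import numpy as np
import pandas as pd

mean = pd.DataFrame({'val': [1, 2, 3]})
std = pd.DataFrame({'val': [.1, .2, .1]})

# Display frame: each cell is the string "<mean>$\pm$<std>".
res = mean.astype(str) + r'$\pm$' + std.astype(str)


def highlight_best(display, mean_df, std_df, css='font-weight: bold;'):
    """Return a frame of CSS strings with the same labels as `display`.

    Hypothetical rule: mark a cell if it has its column's highest mean
    or its column's lowest std.
    """
    best = mean_df.eq(mean_df.max()) | std_df.eq(std_df.min())
    return pd.DataFrame(np.where(best, css, ''),
                        index=display.index, columns=display.columns)


def build_styler():
    # axis=None hands the whole display frame to highlight_best at once;
    # the numeric frames travel along as keyword arguments.
    return res.style.apply(highlight_best, axis=None,
                           mean_df=mean, std_df=std)
```

Calling build_styler().to_latex(convert_css=True) should then translate the bold CSS into LaTeX markup; this step needs jinja2 installed, as any Styler does.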

How to select elements of an array from a specific axis in Python

I am working with multidimensional arrays whose number of axes changes dynamically. I want to select elements of the array along a specific axis.
For example, if I have a 3-dimensional array, I want to pick the elements like this:
b = a[:, :, 1]
My problem is that after one iteration of the code the same array becomes 4-dimensional, and again I want to pick the elements like this:
b = a[:, :, 1, :]
So I am looking for a general way to pick all elements at a given index along the 3rd axis of the array. This is simple for the first axis, since a[1] is equivalent to a[1, :, :, :], but I don't know how to do the same for other axes.
Edit:
I would also be interested in a solution where the axis of interest itself changes, for example when, with the same code, the next iteration should give:
b = a[:,:,:,1]
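Two ways this could be done, shown as a sketch: build the index tuple programmatically with slice(None) standing in for each ":", or use np.take with an axis argument. The helper name select_along_axis and the sample arrays are assumptions made for illustration.

```python
import numpy as np


def select_along_axis(a, index, axis):
    """Hypothetical helper: a[:, ..., index, ..., :] with `index` at `axis`."""
    key = [slice(None)] * a.ndim   # one ":" per axis
    key[axis] = index              # replace the ":" on the chosen axis
    return a[tuple(key)]


a3 = np.arange(24).reshape(2, 3, 4)       # 3-dimensional case
a4 = np.arange(120).reshape(2, 3, 4, 5)   # 4-dimensional case

b3 = select_along_axis(a3, 1, axis=2)     # a3[:, :, 1]
b4 = select_along_axis(a4, 1, axis=2)     # a4[:, :, 1, :]
b4_last = np.take(a4, 1, axis=3)          # a4[:, :, :, 1]
```

The tuple-based version returns a view of the original array, while np.take returns a copy; which one is preferable depends on whether the result will be modified.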

Simple question about slicing a Numpy Tensor

I have a NumPy tensor,
X = np.arange(64).reshape((4,4,4))
I wish to grab the 2nd, 3rd, and 4th entries along the first dimension of this tensor, which you can do with
Y = X[[1,2,3],:,:]
Is there a simpler way of writing this than explicitly listing the indices [1,2,3]? I tried something like [1,:], which gave me an error.
Context: in my real application, the shape of the tensor is something like (30000,100,100). I would like to grab entries 10000 to 30000 along the first dimension of this tensor.

The simplest way in your case is to use X[1:4]. This selects the same elements as X[[1,2,3]], but notice that with X[1:4] you need only one pair of brackets, because 1:4 already represents a range of values.
For an N-dimensional array in NumPy, if you specify indexes for fewer than N dimensions, you get all elements of the remaining dimensions. That is, for N equal to 3, X[1:4] is the same as X[1:4, :, :] or X[1:4, :]. You only need to pass : explicitly when you want to index some dimension while taking all elements of a dimension that comes before it, as in X[:, 2:4].
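A quick check of these equivalences on the tensor from the question (the variable names below are chosen for illustration):

```python
import numpy as np

X = np.arange(64).reshape((4, 4, 4))

Y_list = X[[1, 2, 3], :, :]   # explicit index list
Y_slice = X[1:4]              # slice; trailing dimensions are implied
inner = X[:, 2:4]             # ":" is needed to skip over the first dimension
```

One difference worth knowing: the slice returns a view onto X, whereas the index list returns a copy.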

If you wish to select from some row to the end of the array, simply use Python slicing notation, as below:
X[10000:,:,:]
This selects all rows from index 10000 to the end of the array, along with all columns and depths for them.
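The same idea on a small stand-in array, so it runs without allocating the full (30000,100,100) tensor; the shape (30, 4, 4) and the start index 10 are scaled-down assumptions.

```python
import numpy as np

# Small stand-in for the (30000, 100, 100) tensor from the question.
X = np.zeros((30, 4, 4))

tail = X[10:]        # from index 10 to the end; same as X[10:, :, :]
last_20 = X[-20:]    # negative start counts from the end
```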

Numpy: convert index in one dimension into many dimensions

Many array methods return a single index even though the array is multidimensional. For example:
a = np.random.rand(2,3)
z = a.argmax()
For two dimensions, it is easy to find the matrix indices of the maximum element:
a[z//3, z%3]
But for more dimensions, it can become annoying. Does NumPy/SciPy have a simple way of returning the indices in multiple dimensions given an index in one (collapsed) dimension? Thanks.

Got it!
a = X.argmax()
(i,j) = np.unravel_index(a, X.shape)
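The same call works for any number of dimensions. A minimal sketch with a made-up 3-dimensional array whose maximum sits at a known position:

```python
import numpy as np

# Hypothetical array with its maximum placed at position (1, 2, 0).
X = np.zeros((2, 3, 4))
X[1, 2, 0] = 5.0

flat = X.argmax()                        # index into the flattened array
idx = np.unravel_index(flat, X.shape)    # tuple with one index per axis
```

The returned tuple can be used directly as an index, as in X[idx].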

I don't know of a built-in function that does what you want, but where this has come up for me, I realized that what I really wanted to do was this: given two arrays a and b with the same shape, find the element of b which is in the same position (same [i,j,k...] position) as the maximum element of a.
For this, the quick numpy-ish solution is:
j = a.flatten().argmax()
corresponding_b_element = b.flatten()[j]
Vince Marchetti
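A small sketch of that lookup with made-up arrays. Since argmax already works on the flattened array by default, the explicit flatten on a is optional, and b.flat avoids copying b; these variations are suggestions, not part of the answer above.

```python
import numpy as np

# Hypothetical arrays of the same shape.
a = np.array([[1, 9, 3], [4, 5, 6]])
b = np.array([[10, 20, 30], [40, 50, 60]])

j = a.argmax()                                   # flat index of the maximum of a
via_flatten = b.flatten()[j]                     # as in the answer above
via_flat = b.flat[j]                             # same element, no copy of b
via_unravel = b[np.unravel_index(j, a.shape)]    # through multi-dimensional indices
```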
