Get Column Index based on Value rather than Column Label - pandas

Say I have the following Dataframe.
df = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"],["g", "h", "i"]])
How do I get the column index of "c" in row 0?
I know there are ways to get the column index if there are column labels, but I can't find ways to return the column index just based on the cell value, if searching a particular row.

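Like this (recent pandas versions require axis to be passed by keyword, and indexing df.columns returns a column label even when the frame is not square):
df.columns[df[df.eq("c").any(axis=1)].T.eq("c").any(axis=1)][0]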
2

Here's one way:
You can create an extra dataframe that checks every cell of your original dataframe for the string you are looking for. To find "c", the check can be written as follows (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
check_df = df.applymap(lambda in_each_cell_value: str(in_each_cell_value).find("c") >= 0)
check_df holds boolean values marking the cells that contain the string. Note that str.find matches substrings, so a cell such as "abc" would also count.
From check_df you can extract the columns in which the value was found:
[column for column, count in check_df.sum().to_dict().items()
 if count > 0]
Complete code:
import pandas as pd
df = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"],["g", "h", "i"]])
check_df = df.applymap(lambda in_each_cell_value: str(in_each_cell_value).find("c") >= 0)
ind = [column for column, count in check_df.sum().to_dict().items()
       if count > 0]
Outputs:
ind[0]
2
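If only one particular row needs to be searched, as the question asks, the lookup can be narrowed to that row. The following is a minimal sketch under that assumption; the names row_matches, labels and positions are illustrative and not from the answers above. It reports both the column label and the numeric position, which coincide here only because the frame uses the default integer column labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]])

# Boolean Series over the columns of row 0: True where the cell equals "c".
row_matches = df.iloc[0].eq("c")

# Column labels of the matching cells.
labels = df.columns[row_matches.to_numpy()].tolist()

# Numeric (0-based) positions of the matching cells, independent of the labels.
positions = np.flatnonzero(row_matches.to_numpy()).tolist()

print(labels, positions)
```

Both lists are empty when the value does not occur in the row, so it is worth checking their length before indexing with [0].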

Related

Pandas: Getting indices (numeric position) from external array for each value in Column

I have a fixed array of values, ['string1', 'string2', 'string3'], and a pandas DataFrame:
>>> pd.DataFrame({'column': ['string1', 'string1', 'string2']})
column
0 string1
1 string1
2 string2
And I want to add a new column holding each value's position in the previous array, so it becomes:
>>> pd.DataFrame({'column': ['string1', 'string1', 'string2', pd.NA], 'indices': [0,0,1, pd.NA]})
column indices
0 string1 0
1 string1 0
2 string2 1
3 <NA> <NA>
I.e. the position of the value in the main array. This will later be fed into pyarrow's DictionaryArray [1]. The DataFrame can contain null values as well.
Is there a fast way to do this? I have been trying to figure out how to vectorize it. Naive implementation:
def create_dictionary_array_indices(column_name, arrow_array):
    global dictionary_values
    values = arrow_array.to_pylist()
    indices = []
    for i, value in enumerate(values):
        if not value or value != value:
            indices.append(None)
        else:
            indices.append(
                dictionary_values[column_name].index(value)
            )
    indices = pd.array(indices, dtype=pd.Int32Dtype())
    return pa.DictionaryArray.from_arrays(indices, dictionary_values[column_name])
[1] https://lists.apache.org/thread/xkpyb3zboksbhmyqzzkj983y6l0t9bjs
Given your two dataframes:
import pandas as pd
df1 = pd.DataFrame({"column": ["string1", "string1", "string2"]})
df2 = pd.DataFrame({"column": ["string1", "string1", "string2", pd.NA]})
Here is one way to do it:
df1 = df1.drop_duplicates(keep="first").reset_index(drop=True)
indices = {value: key for key, value in df1["column"].items()}
df2["indices"] = df2["column"].apply(lambda x: indices.get(x, pd.NA))
print(df2)
# Output
column indices
0 string1 0
1 string1 0
2 string2 1
3 <NA> <NA>
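Since the question starts from a fixed array rather than a second dataframe, the lookup table can also be built from that array directly. The sketch below assumes the array is available as a Python list (called categories here, a name chosen for illustration) and uses Series.map with a dict, which leaves unmatched and null entries as missing values; the cast to the nullable Int32 dtype is an assumption based on the dtype used in the question's naive implementation.

```python
import pandas as pd

# The fixed array of values from the question (variable name is illustrative).
categories = ["string1", "string2", "string3"]
df = pd.DataFrame({"column": ["string1", "string1", "string2", pd.NA]})

# Map each value to its position in the fixed array.
lookup = {value: position for position, value in enumerate(categories)}

# Values absent from the lookup (including nulls) become missing;
# the nullable Int32 dtype keeps the positions as integers.
df["indices"] = df["column"].map(lookup).astype("Int32")

print(df)
```

This avoids calling list.index once per row, which is what makes the naive loop slow for long columns.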

to_string(index = False) results in non empty string even when dataframe is empty

I am doing the following in my Python script, and I want to hide the index column when I print the dataframe. So I used .to_string(index = False) and then len() to check whether the result is empty. However, when the dataframe is empty, len() of the to_string() result doesn't return zero. If I print procinject1, it says "Empty DataFrame". Any help to fix this would be greatly appreciated.
procinject1=dfmalfind[dfmalfind["Hexdump"].str.contains("MZ") == True].to_string(index = False)
if len(procinject1) == 0:
    print(Fore.GREEN + "[✓]No MZ header detected in malfind preview output")
else:
    print(Fore.RED + "[!]MZ header detected within malfind preview (Process Injection indicator)")
    print(procinject1)
That's the expected behaviour of DataFrame.to_string in pandas.
In your case, procinject1 stores the string representation of the dataframe, which is non-empty even if the corresponding dataframe is empty.
For example, check the code snippet below, where I create an empty dataframe df and print its string representation:
df = pd.DataFrame()
print(df.to_string(index = False))
print(df.to_string(index = True))
For both index = False and index = True cases, the output will be the same, which is given below (and that is the expected behaviour). So your corresponding len() will always return non-zero.
Empty DataFrame
Columns: []
Index: []
But if you use a non-empty dataframe, then the outputs for index = False and index = True cases will be different as given below:
data = [{'A': 10, 'B': 20, 'C':30}, {'A':5, 'B': 10, 'C': 15}]
df = pd.DataFrame(data)
print(df.to_string(index = False))
print(df.to_string(index = True))
Then the outputs for the index = False and index = True cases, respectively, will be:
A B C
10 20 30
5 10 15
A B C
0 10 20 30
1 5 10 15
Since pandas handles empty dataframes differently, to solve your problem, you should first check whether your dataframe is empty or not, using pandas.DataFrame.empty.
Then if the dataframe is actually non-empty, you could print the string representation of that dataframe, while keeping index = False to hide the index column.
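The check described above could look roughly like the sketch below. It is only an illustration: the helper name describe_mz_hits and the sample data are made up, the colorama colouring from the question is left out so the snippet runs on its own, and na=False is used in place of the question's == True comparison to treat missing hexdumps as non-matches.

```python
import pandas as pd


def describe_mz_hits(dfmalfind: pd.DataFrame) -> str:
    """Return the message to print for a malfind dataframe (hypothetical helper)."""
    # Keep the filtered rows as a DataFrame; do not convert to a string yet.
    hits = dfmalfind[dfmalfind["Hexdump"].str.contains("MZ", na=False)]
    # Test emptiness on the DataFrame itself, not on its string form.
    if hits.empty:
        return "[✓]No MZ header detected in malfind preview output"
    # Only a non-empty frame is rendered, with the index column hidden.
    return (
        "[!]MZ header detected within malfind preview (Process Injection indicator)\n"
        + hits.to_string(index=False)
    )


# Made-up sample data for illustration only.
sample = pd.DataFrame(
    {"PID": [4, 8], "Hexdump": ["4d 5a 90 00 MZ..", "00 00 00 00 ...."]}
)
print(describe_mz_hits(sample))
print(describe_mz_hits(sample[sample["PID"] == 8]))
```

Keeping the filtered DataFrame in its own variable means the emptiness test and the printing both work from the same object.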

Fill zeroes with increment of the max value

I have the following dataframe
df = pd.DataFrame([{'id':'a', 'val':1}, {'id':'b', 'val':2}, {'id':'c', 'val': 0}, {'id':'d', 'val':0}])
What I want is to replace the 0s with values that keep incrementing from the max value (max + 1, max + 2, and so on).
The result I want is as follows:
df = pd.DataFrame([{'id':'a', 'val':1}, {'id':'b', 'val':2}, {'id':'c', 'val': 3}, {'id':'d', 'val':4}])
I tried the following:
for _, r in df.iterrows():
    if r.val == 0:
        r.val = df.val.max()+1
However, is there a one-line way to do the above?
Filter only the rows equal to 0 with boolean indexing and DataFrame.loc, then assign a range whose length is the number of True values in the condition, shifted by the maximum value plus 1 because Python's range counts from 0:
df.loc[df['val'].eq(0), 'val'] = range(df['val'].eq(0).sum()) + df.val.max() + 1
print (df)
id val
0 a 1
1 b 2
2 c 3
3 d 4
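The same idea can be spelled out step by step with np.arange, which makes the start and length of the fill sequence explicit. This is a sketch of the technique rather than a replacement for the one-liner; the names is_zero and start are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'id': 'a', 'val': 1}, {'id': 'b', 'val': 2},
                   {'id': 'c', 'val': 0}, {'id': 'd', 'val': 0}])

# Rows whose value should be replaced.
is_zero = df['val'].eq(0)

# First fill value: one more than the current maximum.
start = df['val'].max() + 1

# One consecutive value per zero row: start, start + 1, ...
df.loc[is_zero, 'val'] = np.arange(start, start + is_zero.sum())

print(df)
```

The maximum is read before the assignment, so the fill values do not feed back into it.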

How to add a column of Float64 all filled with NA values to a Julia DataFrame?

Seems silly, but I can't figure out how to add a column of Float64 all filled with NA values to a Julia DataFrame in a simple way.
I can do it with the following code, but it seems odd:
df = DataFrame(col1 = [1,2,3], col2 = ['a','b','c'])
df[:a] = 1:size(df, 1)
df[:a] = convert(DataArrays.DataArray{Float64,1},df[:a])
[df[i,:a] = NA for i in 1:size(df, 1) ]
DataArrays are initialized with NAs by default.
So you should just be able to do:
df = DataFrame(col1 = [1,2,3], col2 = ['a','b','c'])
df[:a] = DataArray(Float64,size(df,1))

Referencing Matrix (VB)

I have a matrix (5x5) with values in it, for example:
Matrix (1,1) Value: 'a'
Matrix (1, 2) Value: 'b'
Matrix (2, 1) Value: 'c'
How would I be able to find the letter 'a' in that matrix and have it output the coordinates?
i.e.
user inputs 'b'
[searches for 'b' in table]
output (1,2)
Thanks in advance.
It's as simple as:
For i As Integer = 0 To LengthOfMatrix - 1
    For y As Integer = 0 To HeightOfMatrix - 1
        If Matrix(i, y) = "a" Then Console.Write(i & " " & y & vbCrLf)
    Next
Next
Assuming that you declared Matrix as:
Dim Matrix As Char(,) = {{"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "e"}, {"a", "b", "c", "d", "e"}}
LengthOfMatrix and HeightOfMatrix should be the dimensions of your matrix. They could be replaced with something more dynamic, like:
For i As Integer = 0 To Matrix.GetLength(0) - 1 ' Gets the length of the first dimension
    For y As Integer = 0 To Matrix.GetLength(1) - 1 ' Gets the length of the second dimension
        If Matrix(i, y) = "a" Then Console.Write(i & " " & y & vbCrLf)
    Next
Next
In short, this loop goes through all of the elements of the matrix and outputs the coordinates of every element that matches a certain criterion (in this case, equals 'a').
Note: In most programming languages, array indexes begin at 0, so the first element in your matrix will be at coordinates (0,0).