Pandas grouping: select an entire column

I used the pandas groupby method to get the following dataframe. How do I select an entire column from this dataframe, say the column named EventID or Value?
df['Value'] gives the entire DataFrame back, not just the Value column.
                                Value
Realization Occurrence EventID
1           207        2023378     20
            213        2012388     25
            291        2012612     28
            324        2036783     12
            357        2255910     45
            399        2166643     64
            420        2022922     19
2           207        2010673     56
            249        2018319     77
            282        2166809     43

df['Value'] is just the Value column. The reason there is so much other data attached is that df['Value'] has a MultiIndex with three levels (Realization, Occurrence, EventID).
To drop the MultiIndex, you could use
df['Value'].reset_index(drop=True)
or, you could get a NumPy array of the underlying data using
df['Value'].values
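Putting the pieces together, a minimal sketch (assuming df is the grouped frame shown above):
# df has one column, 'Value', under a three-level MultiIndex
# (Realization, Occurrence, EventID).
s = df['Value']                   # a Series; it still carries the MultiIndex
flat = s.reset_index(drop=True)   # same values, plain RangeIndex
arr = s.values                    # the underlying NumPy array

# Note: EventID is an index level here, not a column, so it is
# selected via the index rather than with df['EventID']:
event_ids = df.index.get_level_values('EventID')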

Related

Remove related row from pandas dataframe

I have the following dataframe:
 id  relatedId  coordinate
123        125          55
125        123          45
128        130          60
132        135          50
130        128          40
135        132          50
So I have 6 rows in this dataframe, but I would like to get rid of the related rows, leaving 3 rows. The coordinate values of two related rows always sum to 100, and I would like to keep the row with the lower value (the one below 50; if both are 50, simply keep one of them). The resulting dataframe would thus be:
 id  relatedId  coordinate
125        123          45
132        135          50
130        128          40
Hopefully someone has a good solution for this problem.
Thanks
You can sort the values and keep the first row per group, using a frozenset of the two ids as the grouper (a frozenset is hashable and order-insensitive, so the pairs (123, 125) and (125, 123) fall into the same group):
(df
.sort_values(by='coordinate')
.groupby(df[['id', 'relatedId']].agg(frozenset, axis=1), as_index=False)
.first()
)
output:
id relatedId coordinate
0 130 128 40
1 125 123 45
2 132 135 50
Alternatively, to keep the original order and original indices, use idxmin per group:
# order-insensitive pair of ids as the group key
group = df[['id', 'relatedId']].agg(frozenset, axis=1)
# index of the row with the minimal coordinate within each pair
idx = df['coordinate'].groupby(group).idxmin()
df.loc[sorted(idx)]
output:
id relatedId coordinate
1 125 123 45
3 132 135 50
4 130 128 40
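For reference, a minimal reconstruction of the example frame (values copied from the question) that both snippets can be run against:
import pandas as pd

df = pd.DataFrame({'id': [123, 125, 128, 132, 130, 135],
                   'relatedId': [125, 123, 130, 135, 128, 132],
                   'coordinate': [55, 45, 60, 50, 40, 50]})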

Use Fillna Based on where condition pandas [duplicate]

This question already has answers here:
Pandas Merging 101 (8 answers)
pandas: fillna with data from another dataframe, based on the same ID (2 answers)
Closed last year.
I have two datasets:
First Dataset:
Customer_Key Incentive_Amount
3434 32
5635 56
6565 NaN
3453 45
Second Dataset:
Customer_Key Incentive_Amount
3425 87
6565 22
1474 46
9842 29
The first dataset has many rows where Incentive_Amount is NaN, even though the value is present in the second dataset. For example, see Customer_Key = 6565: its Incentive_Amount is missing in dataset_1 but present in dataset_2. So, for all NaN values of Incentive_Amount in dataset_1, copy the Incentive_Amount value from dataset_2 based on the matching Customer_Key.
Pseudocode would be something like:
df_1['incentive_amount'] = np.where(df_1['incentive_amount'] == 'NaN',
                                    (df_1['incentive_amount'].fillna(df_2['incentive_amount'])
                                     if df_1['customer_key'] == df_2['customer_key']),
                                    df_1['incentive_amount'])
There are many ways to do this. Please do some reading on:
combine_first
update
merge
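For instance, here are two of these options sketched against the data above (a sketch only; df_1 and df_2 are the two datasets, with the column names as shown in the question):
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'Customer_Key': [3434, 5635, 6565, 3453],
                     'Incentive_Amount': [32, 56, np.nan, 45]})
df_2 = pd.DataFrame({'Customer_Key': [3425, 6565, 1474, 9842],
                     'Incentive_Amount': [87, 22, 46, 29]})

# Option 1: treat df_2 as a lookup table and fill only the gaps.
lookup = df_2.set_index('Customer_Key')['Incentive_Amount']
filled = df_1.assign(Incentive_Amount=df_1['Incentive_Amount']
                     .fillna(df_1['Customer_Key'].map(lookup)))

# Option 2: align both frames on Customer_Key; df_1's non-NaN
# values win wherever they exist.
filled = (df_1.set_index('Customer_Key')
              .combine_first(df_2.set_index('Customer_Key'))
              .loc[df_1['Customer_Key']]   # keep df_1's customers and order
              .reset_index())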

Pandas DataFrame - Getting first two rows in each group

I have a dataframe with the structure below. It is grouped and sorted by the first two columns. I want the first two rows in each group. How can I get that?
Use groupby followed by head:
>>> df
open
AVN 20210929 119
20210928 110
20210927 120
PSMC 20210929 270
20210928 265
20210927 260
>>> df.groupby(level=0).head(2)
open
AVN 20210929 119
20210928 110
PSMC 20210929 270
20210928 265
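Positional selection with GroupBy.nth gives the same rows here; note that nth changed its return shape in pandas 2.0, so this sketch assumes a recent version:
>>> df.groupby(level=0).nth([0, 1])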

Sorting rows in the following unique manner (values for columns can be interchanged within the same row, to sort the row)

Input dataframe:
   0th col  1st_col  2nd_col
1       23       46        6
2       33       56        3
3      243        2       21
The output dataframe should look like:
   0th col  1st_col  2nd_col
1        6       23       46
2        3       33       56
3        2       21      243
The rows have to be sorted in ascending or descending order, independently of the columns; that is, values may be interchanged between columns within the same row in order to sort that row.
Please help, I am in the middle of something very important.
Convert the DataFrame to a NumPy array, sort it with np.sort along axis=1, then build a new DataFrame with the constructor:
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.sort(df.to_numpy(), axis=1),
                   index=df.index,
                   columns=df.columns)
print(df1)
0th col 1st_col 2nd_col
1 6 23 46
2 3 33 56
3 2 21 243
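The question also allows descending order; the same idea works by reversing each sorted row:
df_desc = pd.DataFrame(np.sort(df.to_numpy(), axis=1)[:, ::-1],
                       index=df.index,
                       columns=df.columns)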

Column names after transposing a dataframe

I have a small dataframe: six rows (not counting the header) and 53 columns (a store name, and the rest weekly sales for the past year). Each row contains a particular store, and each column the store's name and its sales for each week. I need to transpose the data so that the weeks appear as rows, the stores appear as columns, and the sales become the values.
To generate the input data:
df_store = pd.read_excel(SourcePath + SourceFile, sheet_name='StoreSales',
                         header=0, usecols=['StoreName'])
# Row numbers of all irrelevant stores.
row_numbers = [x + 1 for x in df_store[(df_store['StoreName'] != 'Store1')
                                       & (df_store['StoreName'] != 'Store2')
                                       & (df_store['StoreName'] != 'Store3')].index]
# Read in the entire Excel file, skipping the rows of irrelevant stores.
df_store = pd.read_excel(SourcePath + SourceFile, sheet_name='StoreSales',
                         header=0, usecols='A:BE',
                         skiprows=row_numbers, converters={'StoreName': str})
# Transpose the dataframe.
df_store_t = df_store.transpose()
My output puts index numbers (0 to 5) above each store name, and the first column is still labelled StoreName (above the week labels), so I cannot manipulate the columns by their names.
Is there a way to clear those index numbers so that I can work directly with the resulting column names (e.g., rename "StoreName" to "WeekEnding" and refer to each store column: "Store1", "Store2", etc.)?
IIUC, you need to set_index first, then transpose with .T:
See this example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Store': [*'ABCDE'],
                   'Week 1': np.random.randint(50, 200, 5),
                   'Week 2': np.random.randint(50, 200, 5),
                   'Week 3': np.random.randint(50, 200, 5)})
Input Dataframe:
Store Week 1 Week 2 Week 3
0 A 99 163 148
1 B 119 86 92
2 C 145 98 162
3 D 144 143 199
4 E 50 181 177
Now, set_index and transpose:
df_out = df.set_index('Store').T
df_out
Output:
Store A B C D E
Week 1 99 119 145 144 50
Week 2 163 86 98 143 181
Week 3 148 92 162 199 177
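To address the renaming part of the question: after the transpose, the week labels sit in the index, so (a sketch, reusing the example above) you can name that index WeekEnding and promote it back to a regular column:
df_out = df.set_index('Store').T.rename_axis('WeekEnding').reset_index()
df_out.columns.name = None   # drop the leftover 'Store' axis label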