Apply diffs down columns of pandas dataframe [duplicate] - pandas

This question already has answers here:
How to replace NaNs by preceding or next values in pandas DataFrame?
(10 answers)
Closed 3 years ago.
I want to apply diffs down columns for a pandas dataframe.
EX:
A B C
23 40000 1
24 nan nan
nan 42000 2
I would want something like:
A B C
23 40000 1
24 40000 1
24 42000 2
I have tried variations of pandas groupby. I think this is probably the right approach (or perhaps applying some function down the columns, though I am not sure that is efficient; correct me if I'm wrong).
I was able to "apply diffs down the column" and get something like:
A B C
24 42000 2
by calling: df = df.groupby('col', as_index=False).last() for each column, but this is not what I am looking for. I am not a pandas expert so apologies if this is a silly question.

Look at this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
df = df.ffill()  # fillna(method='ffill') also works but is deprecated in pandas 2.x
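As a minimal sketch of this forward-fill approach on the sample data from the question (column names taken from the example above):

```python
import numpy as np
import pandas as pd

# The frame from the question, with the missing entries as NaN
df = pd.DataFrame({'A': [23, 24, np.nan],
                   'B': [40000, np.nan, 42000],
                   'C': [1, np.nan, 2]})

# Propagate the last valid value down each column
filled = df.ffill()
print(filled)
```

Each NaN is replaced by the value directly above it, which reproduces the desired output table.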

Merging on Index and rearranging columns of a pandas dataframe in Python [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I am completely new to programming and started learning Python recently.
I have a pandas DataFrame df as shown in image 1 and am trying to rearrange the columns as shown in image 2.
Can you please help me complete this?
Thanks and Regards,
Arya.
You can use pd.pivot_table like this:
df=pd.DataFrame({'index':[0,1,2,0,1,2],'Name':['A','A','A','B','B','B'],'Value':[10,20,30,15,25,35]})
df.pivot_table(index='index',columns='Name',values='Value').reset_index()
Out[8]:
Name index A B
0 0 10 15
1 1 20 25
2 2 30 35
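Since each (index, Name) pair occurs only once in this data, plain df.pivot works as well; this is a sketch of that variant (pivot_table is only needed when duplicate pairs must be aggregated):

```python
import pandas as pd

df = pd.DataFrame({'index': [0, 1, 2, 0, 1, 2],
                   'Name': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Value': [10, 20, 30, 15, 25, 35]})

# pivot reshapes without aggregating; it raises if an (index, Name) pair repeats
wide = df.pivot(index='index', columns='Name', values='Value').reset_index()
print(wide)
```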

How to (idiomatically) use pandas .loc to return an empty dataframe when key is not in index [duplicate]

This question already has answers here:
Pandas .loc without KeyError
(6 answers)
Closed 2 years ago.
Say I have a DataFrame (with a multi-index, for that matter), and I wish to take the values at some index - but, if that index does not exist, I wish for it to return an empty df instead of a KeyError.
I've searched for similar questions, but they are all about pandas returning an empty dataframe when it is not desired in some cases (conversely, I do want an empty dataframe in return).
For example:
import pandas as pd
df = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (3, 1)]),
                  columns=['a', 'b'], data=[[1, 2], [3, 4], [10, 20]])
so, df is:
a b
1 1 1 2
2 3 4
3 1 10 20
and df.loc[1] is:
a b
1 1 2
2 3 4
df.loc[2] raises a KeyError, and I'd like something that returns
a b
The closest I could get is by calling df.loc[idx:idx] as a slice, which gives the correct result for idx=2, but for idx=1 it returns
a b
1 1 1 2
2 3 4
instead of the desired result.
Of course I could define a function to do it, but I am looking for something more idiomatic.
One idea with an if-else statement:
def get_val(x):
    return df.loc[x] if x in df.index.levels[0] else pd.DataFrame(columns=df.columns)
Or, more generally, with a try-except statement:
def get_val(x):
    try:
        return df.loc[x]
    except KeyError:
        return pd.DataFrame(columns=df.columns)
print(get_val(1))
a b
1 1 2
2 3 4
print(get_val(2))
Empty DataFrame
Columns: [a, b]
Index: []
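Another option (a sketch, not the only idiom) is to filter on the first index level with a boolean mask; unlike df.loc[x], the mask keeps the outer level in the result, so drop it to match:

```python
import pandas as pd

df = pd.DataFrame(index=pd.MultiIndex.from_tuples([(1, 1), (1, 2), (3, 1)]),
                  columns=['a', 'b'], data=[[1, 2], [3, 4], [10, 20]])

def get_val(df, x):
    # Boolean mask on the outer index level: a missing key yields an empty frame
    sub = df[df.index.get_level_values(0) == x]
    # Drop the outer level so present keys look like df.loc[x]
    return sub.droplevel(0)

print(get_val(df, 1))  # two rows, like df.loc[1]
print(get_val(df, 2))  # empty DataFrame with columns a, b
```

This avoids both the membership test and the exception handler, at the cost of always scanning the index level.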

Is there a way to use the unique values of a count of occurrences as column headers in pandas? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
I have a dataframe df1.
Time Category
23:05:07 a
23:11:12 b
23:12:15 a
23:16:12 a
Another dataframe df2 has been created returning the count of occurrences of each of the unique values in the Category column, in intervals of 5 minutes, using this code: df2 = df1.resample('5T').Category.value_counts()
df2
Time Category Category
23:05 a 1
23:10 a 1
23:10 b 1
23:15 a 1
Is there a way to use the unique values as column headers? Look like:
Time a b
23:05 1 0
23:10 1 1
23:15 1 0
value_counts returns a MultiIndex Series, so you just need to chain unstack onto its result to get your desired output.
Since you were able to resample, I assume Time is your index and is already of datetime or timedelta dtype.
df_final = df1.resample('5T').Category.value_counts().unstack(fill_value=0)
Out[79]:
Category a b
Time
23:05:07 1 0
23:10:07 1 1
23:15:07 1 0
The code below should work. I renamed your second 'Category' column to 'CatUnique'.
df.groupby(['Time','Category'])['CatUnique'].sum().unstack().fillna(0).reset_index()
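The first answer's pipeline as a self-contained sketch (the dates are made up to put the question's times on a real index; '5min' is the non-deprecated spelling of '5T'):

```python
import pandas as pd

# Sample data from the question, placed on an arbitrary date
times = pd.to_datetime(['2021-01-01 23:05:07', '2021-01-01 23:11:12',
                        '2021-01-01 23:12:15', '2021-01-01 23:16:12'])
df1 = pd.DataFrame({'Category': ['a', 'b', 'a', 'a']}, index=times)

# Count each category per 5-minute bin, then pivot the counts into columns
out = df1.resample('5min').Category.value_counts().unstack(fill_value=0)
print(out)
```

The bins start at 23:05, 23:10 and 23:15, and each category becomes a column with 0 where it did not occur.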

Pandas: Replace duplicates by their mean values in a dataframe [duplicate]

This question already has answers here:
group by in group by and average
(3 answers)
Closed 4 years ago.
I have been working with a dataframe in Pandas that contains duplicate entries along with non-duplicates in a column. The dataframe looks something like this:
country_name values category
0 country_1 10 a
1 country_2 20 b
2 country_1 50 a
3 country_2 10 b
4 country_3 100 c
5 country_4 10 d
I want to write something that converts(replaces) duplicates with their mean values in my dataframe. An ideal output would be something similar to the following:
country_name values category
0 country_1 30 a
1 country_2 15 b
2 country_3 100 c
3 country_4 10 d
I have been struggling with this for a while, so I would appreciate any help. I forgot to mention the category column: the problem with the groupby() method, as you know, is that when you call mean() it does not return the category column. My workaround was to apply groupby().mean() to the numerical columns together with the column that has duplicates, then concatenate the categorical columns back on. So I am looking for a solution shorter than that.
My method gets tedious when you are dealing with many categorical columns.
You can use df.groupby():
df.groupby('country_name', as_index=False)['values'].mean()  # select the numeric column; a bare .mean() rejects the string column in pandas 2.x
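Since the asker specifically wants the category column to survive the aggregation, one short option (a sketch) is to group on both columns, which works here because category is constant within each country:

```python
import pandas as pd

df = pd.DataFrame({'country_name': ['country_1', 'country_2', 'country_1',
                                    'country_2', 'country_3', 'country_4'],
                   'values': [10, 20, 50, 10, 100, 10],
                   'category': ['a', 'b', 'a', 'b', 'c', 'd']})

# Grouping on every categorical column keeps them all in the output
out = df.groupby(['country_name', 'category'], as_index=False)['values'].mean()
print(out)
```

With many categorical columns, an alternative is df.groupby('country_name', as_index=False).agg({'values': 'mean', 'category': 'first'}), naming an aggregation per column instead of listing them all as keys.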

How do I plot 2 columns of a pandas dataframe excluding rows selected by a third column [duplicate]

This question already has answers here:
Scatter plots in Pandas/Pyplot: How to plot by category [duplicate]
(8 answers)
Closed 4 years ago.
I have a pandas dataframe that looks like this:
Nstrike Nprem TDays
0 0.920923 0.000123 2
1 0.951621 0.000246 2
2 0.957760 0.001105 2
..............................
16 0.583251 0.000491 7
17 0.613949 0.000614 7
18 0.675344 0.000368 7
..............................
100 1.013016 0.029592 27
101 1.043713 0.049730 27
102 1.074411 0.071218 27
etc.
I would like to plot a graph of col.1 vs col.2, in separate plots as selected by col.3, maybe even in different colors.
The only way I can see to do that is to separate the dataframe into discrete dataframes for each col.3 value.
Or I could give up on pandas and just make the col.3 subsets into plain python arrays.
I am free to change the structure of the dataframe if it would simplify the problem.
IIUC, you can use this as a skeleton, and customize it how you want:
import matplotlib.pyplot as plt

for g, data in df.groupby('TDays'):
    plt.plot(data.Nstrike, data.Nprem, label='TDays ' + str(g))
    plt.legend()
    plt.savefig('plot_' + str(g) + '.png')
    plt.close()
This saves one figure per TDays value (the original answer showed the resulting plots here).
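If a single figure with one colour per TDays group is preferred (the asker mentioned colours), a sketch using scatter, with made-up sample values:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Nstrike': [0.92, 0.95, 0.58, 0.61, 1.01, 1.04],
                   'Nprem': [0.0001, 0.0002, 0.0005, 0.0006, 0.03, 0.05],
                   'TDays': [2, 2, 7, 7, 27, 27]})

fig, ax = plt.subplots()
for g, data in df.groupby('TDays'):
    # Each group automatically gets the next colour in matplotlib's cycle
    ax.scatter(data['Nstrike'], data['Nprem'], label='TDays ' + str(g))
ax.legend()
fig.savefig('all_tdays.png')
```

The same groupby drives both variants; the only difference is whether a new figure is opened per group.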