How to ffill and append a letter in pandas?

New to pandas.
Struggling to find a way to ffill and concatenate a string.
I imported an Excel sheet and would like to fill the blanks (NaN) with the preceding value plus some distinguisher (like -1).
-from-
1 a
2 nan
3 b
4 nan
-to-
1 a
2 a-1
3 b
4 b-1
I used df.fillna(method='ffill')
But can't figure out how to append '-1' after 'a' and 'b' using 'ffill'.
Any help would be appreciated.
Thank you!!

After the ffill, you can compute the order of each row within its group with groupby().cumcount():
import numpy as np

df['col'] = df['col'].ffill()
orders = df.groupby('col').cumcount()
# append the order to every row except the first of each group
df['col'] = np.where(orders == 0, df['col'], df['col'] + '-' + orders.astype(str))
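Putting the pieces together, a minimal self-contained sketch of this approach (using a hypothetical single-column frame matching the question's layout, with the column named `col`):

```python
import numpy as np
import pandas as pd

# sample data shaped like the question's example
df = pd.DataFrame({'col': ['a', np.nan, 'b', np.nan]})

# forward-fill, then number the repeats within each group
df['col'] = df['col'].ffill()
orders = df.groupby('col').cumcount()

# append '-1', '-2', ... to every row except the first of each group
df['col'] = np.where(orders == 0, df['col'], df['col'] + '-' + orders.astype(str))
print(df['col'].tolist())  # ['a', 'a-1', 'b', 'b-1']
```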

Related

Changing date order

I have a CSV file containing a set of dates.
The format is like:
14/06/2000
15/08/2002
10/10/2009
09/09/2001
01/03/2003
11/12/2000
25/11/2002
23/09/2001
For some reason pandas.to_datetime() does not work on my data.
So, I have split the column into 3 columns, as day, month and year.
And now I am trying to combine the columns without "/" with:
df["period"] = df["y"].astype(str) + df["m"].astype(str)
But the problem is instead of getting:
200006
I get:
20006
One zero is missing.
Could you please help me with that?
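The missing zero comes from the month being stored as an integer, so `6` stringifies as `'6'` rather than `'06'`. If you stick with the string-concatenation route, one fix is to zero-pad the month with `str.zfill` (assuming columns named `y` and `m` as described above):

```python
import pandas as pd

# hypothetical columns after splitting the date, as described in the question
df = pd.DataFrame({'y': [2000], 'm': [6]})

# pad the month (and day, if needed) to two digits before concatenating
df['period'] = df['y'].astype(str) + df['m'].astype(str).str.zfill(2)
print(df['period'].iloc[0])  # '200006'
```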
This will let you convert the column of dates with pd.to_datetime() and then sort it.
# this assumes the column name is 0, as it was on my df;
# change that to whatever the column is called in your dataframe
# (infer_datetime_format is deprecated in recent pandas; dayfirst=True
# is the reliable way to handle dd/mm/yyyy input)
df[0] = pd.to_datetime(df[0], dayfirst=True)
df[0] = df[0].sort_values(ascending=False, ignore_index=True)
df
The dayfirst= parameter might help you:
print(df)
0
0 14/06/2000
1 15/08/2002
2 10/10/2009
3 09/09/2001
4 01/03/2003
5 11/12/2000
6 25/11/2002
7 23/09/2001
pd.to_datetime(df[0], dayfirst=True).sort_values()
0 2000-06-14
5 2000-12-11
3 2001-09-09
7 2001-09-23
1 2002-08-15
6 2002-11-25
4 2003-03-01
2 2009-10-10
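Once the column parses as real datetimes, the original goal of a zero-padded YYYYMM "period" string falls out of `dt.strftime` directly, with no manual splitting (a sketch on two of the question's dates):

```python
import pandas as pd

s = pd.Series(['14/06/2000', '15/08/2002'])
dates = pd.to_datetime(s, dayfirst=True)

# derive the zero-padded YYYYMM period from the parsed dates
periods = dates.dt.strftime('%Y%m')
print(periods.tolist())  # ['200006', '200208']
```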

Convert transactions with several products from columns to row [duplicate]

I'm having a very tough time trying to figure out how to do this with python. I have the following table:
NAMES VALUE
john_1 1
john_2 2
john_3 3
bro_1 4
bro_2 5
bro_3 6
guy_1 7
guy_2 8
guy_3 9
And I would like to go to:
NAMES VALUE1 VALUE2 VALUE3
john 1 2 3
bro 4 5 6
guy 7 8 9
I have tried with pandas: I first split the index (NAMES) and can create the new columns, but I have trouble getting the values into the right columns.
Can someone at least point me in the direction of a solution? I don't expect full code (I know that this is not appreciated), but any help is welcome.
After splitting the NAMES column, use .pivot to reshape your DataFrame.
# Split Names and Pivot.
df['NAME_NBR'] = df['NAMES'].str.split('_').str.get(1)
df['NAMES'] = df['NAMES'].str.split('_').str.get(0)
df = df.pivot(index='NAMES', columns='NAME_NBR', values='VALUE')
# Rename columns and reset the index.
df.columns = ['VALUE{}'.format(c) for c in df.columns]
df.reset_index(inplace=True)
If you want to be slick, you can do the split in a single line:
df['NAMES'], df['NAME_NBR'] = zip(*[s.split('_') for s in df['NAMES']])
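A self-contained sketch of the split-then-pivot approach on the question's data (note that `pivot` sorts the index, so the rows come out in alphabetical order):

```python
import pandas as pd

df = pd.DataFrame({
    'NAMES': ['john_1', 'john_2', 'john_3', 'bro_1', 'bro_2', 'bro_3',
              'guy_1', 'guy_2', 'guy_3'],
    'VALUE': [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# split the suffix off, pivot, then flatten the column index
df[['NAMES', 'NAME_NBR']] = df['NAMES'].str.split('_', expand=True)
out = df.pivot(index='NAMES', columns='NAME_NBR', values='VALUE')
out.columns = ['VALUE{}'.format(c) for c in out.columns]
out = out.reset_index()
print(out)
```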

How to update multiple columns in pandas

I have a DF with 5 columns. 3 columns are character (object) type, and the others are numeric. I want to replace the missing values in the character columns with "missing".
I have written an update statement like the one below, but it's not working.
df.select_dtypes(include='object') = df.select_dtypes(include='object').apply(lambda x: x.fillna('missing'))
It works only when I specify the column names.
df[['Manufacturer','Model','Type']] = df.select_dtypes(include='object').apply(lambda x: x.fillna('missing'))
Could you please tell me how I can correct my first update statement?
Here df.select_dtypes(include='object') returns a new DataFrame, so you cannot assign to it as in your first attempt. One possible solution is DataFrame.update (which works in place); apply is also not necessary here.
print (df)
Manufacturer Model Type a c
0 a g NaN 4 NaN
1 NaN NaN aa 4 8.0
df.update(df.select_dtypes(include='object').fillna('missing'))
print (df)
Manufacturer Model Type a c
0 a g missing 4 NaN
1 missing missing aa 4 8.0
Or get the names of the string columns and assign through them:
cols = df.select_dtypes(include='object').columns
df[cols] = df[cols].fillna('missing')
print (df)
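As a runnable sketch of the second variant, built on the same two-row frame shown above, the numeric NaN in column c is left untouched while the object columns are filled:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Manufacturer': ['a', np.nan],
    'Model': ['g', np.nan],
    'Type': [np.nan, 'aa'],
    'a': [4, 4],
    'c': [np.nan, 8.0],
})

# fill NaNs only in the string (object) columns, leaving numeric NaNs alone
cols = df.select_dtypes(include='object').columns
df[cols] = df[cols].fillna('missing')
print(df)
```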

Pandas: fill in NaN values with a dictionary that references another column

I have a dictionary that looks like this
dict = {'b' : '5', 'c' : '4'}
My dataframe looks something like this
A B
0 a 2
1 b NaN
2 c NaN
Is there a way to fill in the NaN values using the dictionary mapping from columns A to B while keeping the rest of the column values?
You can map dict values inside fillna
df.B = df.B.fillna(df.A.map(dict))
print(df)
A B
0 a 2
1 b 5
2 c 4
This can be done simply
df['B'] = df['B'].fillna(df['A'].apply(lambda x: dict.get(x)))
This can work effectively for a bigger dataset as well.
Unfortunately, this isn't one of the options for a built-in function like pd.fillna().
Edit: Thanks for the correction. Apparently this is possible, as illustrated in @Vaishali's answer.
However, you can subset the data frame first on the missing values and then apply the map with your dictionary.
df.loc[df['B'].isnull(), 'B'] = df['A'].map(dict)
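A self-contained sketch of the accepted approach on the question's data (the dictionary is renamed to `mapping` here, since `dict` shadows the Python builtin):

```python
import numpy as np
import pandas as pd

mapping = {'b': '5', 'c': '4'}
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['2', np.nan, np.nan]})

# map A through the dictionary and use the result only where B is NaN
df['B'] = df['B'].fillna(df['A'].map(mapping))
print(df['B'].tolist())  # ['2', '5', '4']
```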

Slicing and Setting Values in Pandas, with a composite of position and labels

I want to set a value in a specific cell in a pandas dataFrame.
I know which position the row is in (I can even get the row by using df.iloc[i], for example), and I know the name of the column, but I can't work out how to select the cell so that I can set a value to it.
df.loc[i,'columnName']=val
won't work because I want the row in position i, not labelled with index i. Also
df.iloc[i, 'columnName'] = val
obviously doesn't like being given a column name. So, short of converting to a dict and back, how do I go about this? Help very much appreciated, as I can't find anything that helps me in the pandas documentation.
You can use ix to set a specific cell (note: .ix has since been deprecated and removed from pandas, so on current versions use the get_loc approach shown at the end instead):
In [209]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[209]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.363385 0.024520
2 0.526718 -0.230459 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971
In [210]:
df.ix[1,'b'] = 0
df
Out[210]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.000000 0.024520
2 0.526718 -0.230459 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971
You can also call iloc on the col of interest (be aware this is chained assignment, which can raise a SettingWithCopyWarning and does not work under copy-on-write in recent pandas):
In [211]:
df['b'].iloc[2] = 0
df
Out[211]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.000000 0.024520
2 0.526718 0.000000 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971
You can get the position of the column with get_loc:
df.iloc[i, df.columns.get_loc('columnName')] = val
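A minimal runnable sketch of that last approach, which works on current pandas versions; `.iat` is the scalar fast path with the same position-based semantics:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# row by position, column by label: translate the label to a position
i = 1
df.iloc[i, df.columns.get_loc('b')] = 0

# .iat does the same for a single scalar and is slightly faster
df.iat[2, df.columns.get_loc('b')] = 0
print(df['b'].tolist())  # [4, 0, 0]
```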