Slicing and setting values in Pandas with a composite of position and labels

I want to set a value in a specific cell of a pandas DataFrame.
I know which position the row is in (I can even get the row with df.iloc[i], for example), and I know the name of the column, but I can't work out how to select the cell so that I can set a value in it.
df.loc[i,'columnName']=val
won't work because I want the row in position i, not labelled with index i. Also
df.iloc[i, 'columnName'] = val
obviously doesn't like being given a column name. So, short of converting to a dict and back, how do I go about this? Help very much appreciated, as I can't find anything that helps me in the pandas documentation.

You can use ix to set a specific cell (note that .ix has since been deprecated and was removed in pandas 1.0; on modern pandas use the get_loc approach shown further down):
In [209]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[209]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.363385 0.024520
2 0.526718 -0.230459 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971
In [210]:
df.ix[1,'b'] = 0
df
Out[210]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.000000 0.024520
2 0.526718 -0.230459 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971
You can also call iloc on the column of interest, although note that this is chained assignment, which can raise SettingWithCopyWarning and may fail to modify the original DataFrame:
In [211]:
df['b'].iloc[2] = 0
df
Out[211]:
a b c
0 1.366340 1.643899 -0.264142
1 0.052825 0.000000 0.024520
2 0.526718 0.000000 1.481025
3 1.068833 -0.558976 0.812986
4 0.208232 0.405090 0.704971

You can get the position of the column with get_loc:
df.iloc[i, df.columns.get_loc('columnName')] = val
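For completeness, a minimal self-contained sketch of this approach (the row position i, column label, and value below are placeholders):
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'))
i, col, val = 2, 'b', 0  # positional row, column label, new value

# Translate the column label to its integer position, then index purely by position.
df.iloc[i, df.columns.get_loc(col)] = val

# .iat is the faster scalar equivalent for single-cell access.
df.iat[i, df.columns.get_loc(col)] = val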

Related

Comparing string values from sequential rows in pandas series

I am trying to count common string characters in sequential rows of a pandas Series using a user-defined function, and to write the output into a new column. I figured out the individual steps, but when I put them together I get a wrong result. Could you please tell me the best way to do this? I am a very beginner Pythonista!
My pandas df is:
df = pd.DataFrame({"Code": ['d7e', '8e0d', 'ft1', '176', 'trk', 'tr71']})
My string comparison loop is:
x = 'd7e'
y = '8e0d'
s = 0
for i in y:
    b = str(i)
    if b not in x:
        s += 0
    else:
        s += 1
print(s)
The right result for these particular strings is 2.
Note, when I wrap this in def func(x, y): something happens to the s counter and it doesn't produce the right result. I think I need to reset it to 0 every time the loop runs.
Then, I use df.shift to specify the position of y and x in a series:
x = df["Code"]
y = df["Code"].shift(periods=-1, axis=0)
And finally, I use df.apply() method to run the function:
df["R1SB"] = df.apply(func, axis=0)
and I get None values in my new column "R1SB"
My correct output would be:
"Code" "R1SB"
0 d7e None
1 8e0d 2
2 ft1 0
3 176 1
4 trk 0
5 tr71 2
Thank you for your help!
TRY:
df['R1SB'] = df.assign(temp=df.Code.shift(1)).apply(
    lambda x: np.nan
    if pd.isna(x['temp'])
    else sum(i in str(x['temp']) for i in str(x['Code'])),
    axis=1,
)
OUTPUT:
Code R1SB
0 d7e NaN
1 8e0d 2.0
2 ft1 0.0
3 176 1.0
4 trk 0.0
5 tr71 2.0
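If you prefer a named function, note that the counter has to be a local variable initialised inside the function, so it resets to 0 on every call; that is what went wrong with the def func(x, y) attempt. A sketch along those lines (the helper name common_chars is my own):
import numpy as np
import pandas as pd

def common_chars(prev, cur):
    # How many characters of cur also occur in prev?
    if pd.isna(prev):
        return np.nan
    s = 0  # local, so it starts from 0 on every call
    for ch in str(cur):
        if ch in str(prev):
            s += 1
    return s

df = pd.DataFrame({"Code": ['d7e', '8e0d', 'ft1', '176', 'trk', 'tr71']})
df['R1SB'] = [common_chars(p, c) for p, c in zip(df['Code'].shift(1), df['Code'])]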

Pandas: fill in NaN values with a dictionary that references another column

I have a dictionary that looks like this
dict = {'b' : '5', 'c' : '4'}
My dataframe looks something like this
A B
0 a 2
1 b NaN
2 c NaN
Is there a way to fill in the NaN values using the dictionary mapping from columns A to B while keeping the rest of the column values?
You can map the dict values inside fillna:
df.B = df.B.fillna(df.A.map(dict))
print(df)
A B
0 a 2
1 b 5
2 c 4
This can be done simply
df['B'] = df['B'].fillna(df['A'].apply(lambda x: dict.get(x)))
This can work effectively for a bigger dataset as well.
Unfortunately, this isn't one of the options for a built-in function like pd.fillna().
Edit: Thanks for the correction. Apparently this is possible, as illustrated in @Vaishali's answer.
However, you can subset the data frame first on the missing values and then apply the map with your dictionary.
df.loc[df['B'].isnull(), 'B'] = df['A'].map(dict)
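For reference, a minimal end-to-end sketch of the accepted approach; the lookup table is renamed to mapping here, since calling it dict shadows the Python built-in:
import numpy as np
import pandas as pd

mapping = {'b': '5', 'c': '4'}
df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['2', np.nan, np.nan]})

# Look up each A value in the mapping and use the result to fill missing B values.
df['B'] = df['B'].fillna(df['A'].map(mapping))
print(df)
#    A  B
# 0  a  2
# 1  b  5
# 2  c  4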

How to turn Pandas' DataFrame.groupby() result into MultiIndex

Suppose I have a set of measurements that were obtained by varying two parameters, knob_1 and knob_2 (in practice there are a lot more):
data = np.empty((6, 3), dtype=float)  # np.float was removed in NumPy 1.24; plain float works
data[:,0] = [3,4,5,3,4,5]
data[:,1] = [1,1,1,2,2,2]
data[:,2] = np.random.random(6)
df = pd.DataFrame(data, columns=['knob_1', 'knob_2', 'signal'])
i.e., df is
knob_1 knob_2 signal
0 3 1 0.076571
1 4 1 0.488965
2 5 1 0.506059
3 3 2 0.415414
4 4 2 0.771212
5 5 2 0.502188
Now, considering each parameter on its own, I want to find the minimum value that was measured for each setting of this parameter (ignoring the settings of all other parameters). The pedestrian way of doing this is:
new_index = []
new_data = []
for param in df.columns:
    if param == 'signal':
        continue
    group = df.groupby(param)['signal'].min()
    for (k, v) in group.items():
        new_index.append((param, k))
        new_data.append(v)
new_index = pd.MultiIndex.from_tuples(new_index,
                                      names=('parameter', 'value'))
df2 = pd.Series(index=new_index, data=new_data)
resulting df2 being:
parameter value
knob_1 3 0.495674
4 0.277030
5 0.398806
knob_2 1 0.485933
2 0.277030
dtype: float64
Is there a better way to do this, in particular to get rid of the inner loop?
It seems to me that the result of the df.groupby operation already has everything I need - if only there was a way to somehow create a MultiIndex from it without going through the list of tuples.
Use the keys argument of pd.concat():
pd.concat([df.groupby('knob_1')['signal'].min(),
           df.groupby('knob_2')['signal'].min()],
          keys=['knob_1', 'knob_2'],
          names=['parameter', 'value'])
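Since there are a lot more parameters in practice, the same idea generalises with a comprehension over every non-signal column (a sketch, assuming every column other than 'signal' is a knob):
knobs = [c for c in df.columns if c != 'signal']
df2 = pd.concat([df.groupby(k)['signal'].min() for k in knobs],
                keys=knobs,
                names=['parameter', 'value'])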

Creating a new column value based on calculating existing column value in a data frame

I have a data frame with a column named 'Lnc' of type integer, and I want to create another column in the same data frame, named Log_Lnc, which is the log base 2 value of the Lnc column.
I would really appreciate help with this.
Thank you,
Use np.log2:
In [17]:
df = pd.DataFrame({'Lnc':np.arange(5)})
df['Log_Lnc'] = np.log2(df['Lnc'])
df
Out[17]:
Lnc Log_Lnc
0 0 -inf
1 1 0.000000
2 2 1.000000
3 3 1.584963
4 4 2.000000
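Note that log2(0) is -inf, as the first row shows. If NaN is preferable for non-positive inputs, one possible sketch masks them before taking the log:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Lnc': np.arange(5)})
# Replace non-positive values with NaN first, so Lnc == 0 yields NaN rather than -inf.
df['Log_Lnc'] = np.log2(df['Lnc'].where(df['Lnc'] > 0))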

Pandas - Trying to create a list or Series in a data frame cell

I have the following data frame
df = pd.DataFrame({'A':[74.75, 91.71, 145.66], 'B':[4, 3, 3], 'C':[25.34, 33.52, 54.70]})
A B C
0 74.75 4 25.34
1 91.71 3 33.52
2 145.66 3 54.70
I would like to create another column df['D'] that would be a list or series from the first 3 columns suitable for use in another column with the np.irr function that would look like this
D
0 [ -74.75, 25.34, 25.34, 25.34, 25.34]
1 [ -91.71, 33.52, 33.52, 33.52]
2 [-145.66, 54.70, 54.70, 54.70]
so I could ultimately do something like this
df['E'] = np.irr(df['D'])
I did get as far as this
[-df.A[0]]+[df.C[0]]*df.B[0]
but it is not quite there.
Do you really need the column 'D'?
By the way you can easily add it as:
df['D'] = [[-df.A[i]] + [df.C[i]] * df.B[i] for i in range(len(df))]
df['E'] = df['D'].map(np.irr)
If you don't need it, you can set E directly:
df['E'] = [np.irr([-df.A[i]] + [df.C[i]] * df.B[i]) for i in range(len(df))]
or, with apply (note the int(x.B), since apply(axis=1) upcasts the row to float):
df['E'] = df.apply(lambda x: np.irr([-x.A] + [x.C] * int(x.B)), axis=1)
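One caveat for current environments: np.irr was deprecated in NumPy 1.18 and removed in 1.20, and now lives in the separate numpy-financial package. A self-contained sketch, assuming numpy-financial is installed (pip install numpy-financial):
import numpy_financial as npf
import pandas as pd

df = pd.DataFrame({'A': [74.75, 91.71, 145.66],
                   'B': [4, 3, 3],
                   'C': [25.34, 33.52, 54.70]})

# Build each row's cash-flow list and compute its internal rate of return.
# int(x.B) is needed because apply(axis=1) upcasts the row to float.
df['E'] = df.apply(lambda x: npf.irr([-x.A] + [x.C] * int(x.B)), axis=1)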