Getting unexpected results in a pandas DataFrame

When I try the code below, I get unexpected results.
This is my code:
def copy_blanks(df, column):
    df.iloc[:, -1] = df.iloc[:, column]
    df.loc[df.iloc[:, column] == '', -1] = ''
My input file:
word,number
abc,0
adf,0
gfsgs,0
,0
sdfgsd,0
fgsdfg,0
sfdgs,0
I am getting output like below:
word,number,word_clean
NA,0,abc
NA,0,adf
NA,0,gfsgs
,0,
NA,0,sdfgsd
NA,0,fgsdfg
NA,0,sfdgs
I want to get output like below:
word,number,word_clean
abc,0,abc
adf,0,adf
gfsgs,0,gfsgs
,0,
sdfgsd,0,sdfgsd
fgsdfg,0,fgsdfg
sfdgs,0,sfdgs
Please suggest a fix. I think the issue comes from this line: df.loc[df.iloc[:, column] == '', -1] = ''.

IIUC you need fillna to replace NaN with an empty string:
print (df)
     word  number
0     abc       0
1     adf       0
2   gfsgs       0
3     NaN       0
4  sdfgsd       0
5  fgsdfg       0
6   sfdgs       0
def copy_blanks(df, column):
    df['new'] = df.iloc[:, column]
    df['new'] = df['new'].fillna('')
    return df
column = 0
print (copy_blanks(df, column))
     word  number     new
0     abc       0     abc
1     adf       0     adf
2   gfsgs       0   gfsgs
3     NaN       0
4  sdfgsd       0  sdfgsd
5  fgsdfg       0  fgsdfg
6   sfdgs       0   sfdgs
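Putting the answer together end to end, here is a runnable sketch (the inline CSV stands in for the question's input file, and the `_clean` column name is an assumption based on the desired output):

```python
import io
import pandas as pd

# Hypothetical inline copy of the question's CSV; pandas parses the
# blank "word" field as NaN by default, which is why comparing
# against '' in the original code never matched.
csv_data = "word,number\nabc,0\nadf,0\ngfsgs,0\n,0\nsdfgsd,0\nfgsdfg,0\nsfdgs,0\n"
df = pd.read_csv(io.StringIO(csv_data))

def copy_blanks(df, column):
    # Copy the source column into a new '<name>_clean' column,
    # then turn NaN back into empty strings with fillna.
    df = df.copy()
    df[df.columns[column] + '_clean'] = df.iloc[:, column].fillna('')
    return df

print(copy_blanks(df, 0))
```

The key point is that the blank cell arrives as NaN, not `''`, so `fillna('')` (or reading with `keep_default_na=False`) is what restores the empty string.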

Related

How to create a new column based on row values in python?

I have data like below:
df = pd.DataFrame()
df["collection_amount"] = 100, 200, 300
df["25%_coll"] = 1, 0, 1
df["75%_coll"] = 0, 1, 1
df["month"] = 4, 5, 6
I want to create output like below:
Basically, if the 25% flag is 1, it should create a new column based on the month.
Please help, thank you.
This should work; do ask if something doesn't make sense:
for i in range(len(df)):
    if df['25%_coll'][i] == 1:
        df['month_%i_25%%_coll' % df.month[i]] = [df.collection_amount[i] if k == i else 0 for k in range(len(df))]
    if df['75%_coll'][i] == 1:
        df['month_%i_75%%_coll' % df.month[i]] = [df.collection_amount[i] if k == i else 0 for k in range(len(df))]
To build the new columns you could try the following:
df2 = df.melt(id_vars=["month", "collection_amount"])
df2.loc[df2["value"].eq(0), "collection_amount"] = 0
df2["new_cols"] = "month_" + df2["month"].astype("str") + "_" + df2["variable"]
df2 = df2.pivot_table(
index="month", columns="new_cols", values="collection_amount",
fill_value=0, aggfunc="sum"
).reset_index(drop=True)
.melt() the dataframe with index columns month and collection_amount.
Set the appropriate collection_amount values to 0.
Build the new column names in column new_cols.
   month  collection_amount  variable  value          new_cols
0      4                100  25%_coll      1  month_4_25%_coll
1      5                  0  25%_coll      0  month_5_25%_coll
2      6                300  25%_coll      1  month_6_25%_coll
3      4                  0  75%_coll      0  month_4_75%_coll
4      5                200  75%_coll      1  month_5_75%_coll
5      6                300  75%_coll      1  month_6_75%_coll
Use .pivot_table() on this dataframe to build the new columns.
The rest isn't completely clear: Either use df = pd.concat([df, df2], axis=1), or df.merge(df2, ...) to merge on month (with .reset_index() without drop=True).
Result for the sample dataframe
df = pd.DataFrame({
    "collection_amount": [100, 200, 300],
    "25%_coll": [1, 0, 1], "75%_coll": [0, 1, 1],
    "month": [4, 5, 6]
})
is
new_cols month_4_25%_coll month_4_75%_coll month_5_25%_coll \
0 100 0 0
1 0 0 0
2 0 0 0
new_cols month_5_75%_coll month_6_25%_coll month_6_75%_coll
0 0 0 0
1 200 0 0
2 0 300 300
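The melt/pivot steps above can be run end to end; a sketch taking the `pd.concat` route for the final merge (as one of the two options mentioned, assuming the new columns should line up row-by-row with the original frame):

```python
import pandas as pd

df = pd.DataFrame({
    "collection_amount": [100, 200, 300],
    "25%_coll": [1, 0, 1], "75%_coll": [0, 1, 1],
    "month": [4, 5, 6]
})

# Melt on month/collection_amount, zero out unmatched amounts,
# build the target column names, then pivot into one column per
# (month, flag) pair.
df2 = df.melt(id_vars=["month", "collection_amount"])
df2.loc[df2["value"].eq(0), "collection_amount"] = 0
df2["new_cols"] = "month_" + df2["month"].astype("str") + "_" + df2["variable"]
df2 = df2.pivot_table(
    index="month", columns="new_cols", values="collection_amount",
    fill_value=0, aggfunc="sum"
).reset_index(drop=True)

# Attach the new columns next to the original frame.
result = pd.concat([df, df2], axis=1)
print(result)
```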

Pandas Dataframe Manipulation logic

Can you please help with the problem below?
Given two dataframes, df1 and df2, I need to produce something like the result dataframe.
import pandas as pd
import numpy as np
feature_list = [ str(i) for i in range(6)]
df1 = pd.DataFrame( {'value' : [0,3,0,4,2,5]})
df2 = pd.DataFrame(0, index=np.arange(6), columns=feature_list)
Expected dataframe:
The result needs to be driven by comparing values from df1 with the column names (features) in df2; where they match, we put 1 in resultDf, otherwise 0.
I think you need:
(pd.get_dummies(df1['value'])
   .rename(columns=str)
   .reindex(columns=df2.columns,
            index=df2.index,
            fill_value=0))

   0  1  2  3  4  5
0  1  0  0  0  0  0
1  0  0  0  1  0  0
2  1  0  0  0  0  0
3  0  0  0  0  1  0
4  0  0  1  0  0  0
5  0  0  0  0  0  1
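As a runnable restating of the snippet above (with an added `astype(int)`, since newer pandas versions return booleans from `get_dummies` while the expected output shows 0/1):

```python
import numpy as np
import pandas as pd

feature_list = [str(i) for i in range(6)]
df1 = pd.DataFrame({'value': [0, 3, 0, 4, 2, 5]})
df2 = pd.DataFrame(0, index=np.arange(6), columns=feature_list)

# One-hot encode the values, rename the integer columns to strings,
# align to df2's columns/index (filling absent features with 0),
# and cast booleans to 0/1.
result = (pd.get_dummies(df1['value'])
            .rename(columns=str)
            .reindex(columns=df2.columns, index=df2.index, fill_value=0)
            .astype(int))
print(result)
```

The `reindex` is what guarantees a column for every feature, including values (like 1 here) that never occur in df1.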

Python Pandas Dataframe cell value split

I am lost on how to split the binary values so that each 0/1 digit takes up its own column of the dataframe.
You can use concat with apply and list:
df = pd.DataFrame({0:[1,2,3], 1:['1010','1100','0101']})
print (df)
   0     1
0  1  1010
1  2  1100
2  3  0101
df = pd.concat([df[0],
                df[1].apply(lambda x: pd.Series(list(x))).astype(int)],
               axis=1, ignore_index=True)
print (df)
   0  1  2  3  4
0  1  1  0  1  0
1  2  1  1  0  0
2  3  0  1  0  1
Another solution with the DataFrame constructor:
df = pd.concat([df[0],
                pd.DataFrame(df[1].apply(list).values.tolist()).astype(int)],
               axis=1, ignore_index=True)
print (df)
   0  1  2  3  4
0  1  1  0  1  0
1  2  1  1  0  0
2  3  0  1  0  1
EDIT:
df = pd.DataFrame({0:['1010','1100','0101']})
df1 = pd.DataFrame(df[0].apply(list).values.tolist()).astype(int)
print (df1)
   0  1  2  3
0  1  0  1  0
1  1  1  0  0
2  0  1  0  1
But if you need lists:
df[0] = df[0].apply(lambda x: [int(y) for y in list(x)])
print (df)
              0
0  [1, 0, 1, 0]
1  [1, 1, 0, 0]
2  [0, 1, 0, 1]
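An equivalent sketch using a plain list comprehension instead of `apply`, which avoids building a `pd.Series` per row (this assumes every binary string has the same length, as in the sample data):

```python
import pandas as pd

df = pd.DataFrame({0: [1, 2, 3], 1: ['1010', '1100', '0101']})

# Split each fixed-width binary string into int digits in one pass.
bits = pd.DataFrame([list(map(int, s)) for s in df[1]])

# Re-attach the id column in front and renumber the columns.
out = pd.concat([df[0], bits], axis=1, ignore_index=True)
print(out)
```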

copy_blanks(df,column) should copy the value in the original column to the last column for all values where the original column is blank

def copy_blanks(df, column):
I have a stub like the above; please suggest how to complete it.
Input:
e-mail,number
n#gmail.com,0
p#gmail.com,1
h#gmail.com,0
s#gmail.com,0
l#gmail.com,1
v#gmail.com,0
,0
There is also a default_value option, which can be any value; when this option is used, that value is inserted for the blanks, like below:
e-mail,number
n#gmail.com,0
p#gmail.com,1
h#gmail.com,0
s#gmail.com,0
l#gmail.com,1
v#gmail.com,0
NA,0
I need both default_value and skip_blanks options: when skip_blanks is True, the default value should not be applied; when skip_blanks is False, it should be.
My desired output:
e-mail,number,e-mail_clean
n#gmail.com,0,n#gmail.com
p#gmail.com,1,p#gmail.com
h#gmail.com,0,h#gmail.com
s#gmail.com,0,s#gmail.com
l#gmail.com,1,l#gmail.com
v#gmail.com,0,v#gmail.com
,0,
Consider your sample df:
df = pd.DataFrame([
    ['n#gmail.com', 0],
    ['p#gmail.com', 1],
    ['h#gmail.com', 0],
    ['s#gmail.com', 0],
    ['l#gmail.com', 1],
    ['v#gmail.com', 0],
    ['', 0]
], columns=['e-mail', 'number'])
print(df)
        e-mail  number
0  n#gmail.com       0
1  p#gmail.com       1
2  h#gmail.com       0
3  s#gmail.com       0
4  l#gmail.com       1
5  v#gmail.com       0
6                    0
If I understand you correctly:
def copy_blanks(df, column, skip_blanks=False, default_value='NA'):
    df = df.copy()
    s = df[column]
    if not skip_blanks:
        s = s.replace('', default_value)
    df['{}_clean'.format(column)] = s
    return df
copy_blanks(df, 'e-mail', skip_blanks=False)

        e-mail  number e-mail_clean
0  n#gmail.com       0  n#gmail.com
1  p#gmail.com       1  p#gmail.com
2  h#gmail.com       0  h#gmail.com
3  s#gmail.com       0  s#gmail.com
4  l#gmail.com       1  l#gmail.com
5  v#gmail.com       0  v#gmail.com
6                    0           NA

copy_blanks(df, 'e-mail', skip_blanks=True)

        e-mail  number e-mail_clean
0  n#gmail.com       0  n#gmail.com
1  p#gmail.com       1  p#gmail.com
2  h#gmail.com       0  h#gmail.com
3  s#gmail.com       0  s#gmail.com
4  l#gmail.com       1  l#gmail.com
5  v#gmail.com       0  v#gmail.com
6                    0

copy_blanks(df, 'e-mail', skip_blanks=False, default_value='new#gmail.com')

          e-mail  number   e-mail_clean
0    n#gmail.com       0    n#gmail.com
1    p#gmail.com       1    p#gmail.com
2    h#gmail.com       0    h#gmail.com
3    s#gmail.com       0    s#gmail.com
4    l#gmail.com       1    l#gmail.com
5    v#gmail.com       0    v#gmail.com
6                      0  new#gmail.com
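One caveat worth noting (an assumption about how the file is loaded): `read_csv` turns blank fields into NaN by default, so the `''` check in copy_blanks would never match. Passing `keep_default_na=False` keeps blanks as empty strings:

```python
import io
import pandas as pd

# A shortened, hypothetical version of the question's CSV.
csv_data = "e-mail,number\nn#gmail.com,0\np#gmail.com,1\n,0\n"

# keep_default_na=False keeps blank fields as '' instead of NaN,
# so replace('', default_value) in copy_blanks still matches them.
df = pd.read_csv(io.StringIO(csv_data), keep_default_na=False)

def copy_blanks(df, column, skip_blanks=False, default_value='NA'):
    df = df.copy()
    s = df[column]
    if not skip_blanks:
        s = s.replace('', default_value)
    df['{}_clean'.format(column)] = s
    return df

print(copy_blanks(df, 'e-mail'))
```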

Imposing a threshold on values in dataframe in Pandas

I have the following code:
t = 12
s = numpy.array(df.Array.tolist())
s[s<t] = 0
thresh = numpy.where(s>0, s-t, 0)
df['NewArray'] = list(thresh)
While it works, surely there must be a more pandas-like way of doing it.
EDIT:
df.Array.head() looks like this:
0 [0.771511552006, 0.771515476223, 0.77143569165...
1 [3.66720695274, 3.66722560562, 3.66684636758, ...
2 [2.3047433839, 2.30475510675, 2.30451676559, 2...
3 [0.999991522708, 0.999996609066, 0.99989319662...
4 [1.11132718786, 1.11133284052, 0.999679589875,...
Name: Array, dtype: object
IIUC you can simply subtract and use clip_lower:
In [29]: df["NewArray"] = (df["Array"] - 12).clip_lower(0)
In [30]: df
Out[30]:
   Array  NewArray
0     10         0
1     11         0
2     12         0
3     13         1
4     14         2
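Note that clip_lower was deprecated and later removed (pandas 1.0); `clip(lower=...)` is the modern spelling. Also, the question's Array column holds a whole array per cell, so a sketch applying the threshold element-wise may be closer to the original setup (the sample values here are made up):

```python
import numpy as np
import pandas as pd

t = 12
# Hypothetical frame with an array in each cell, as in the question.
df = pd.DataFrame({'Array': [[10.0, 14.0, 3.0], [20.0, 12.0, 11.5]]})

# Subtract the threshold and clip negatives to 0, row by row;
# np.clip(..., 0, None) stands in for the removed Series.clip_lower.
df['NewArray'] = df['Array'].apply(lambda a: np.clip(np.asarray(a) - t, 0, None))
print(df)
```

For a plain numeric column, `(df["Array"] - t).clip(lower=0)` is the direct modern equivalent of the answer above.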