how to create a new column after groupby count? - pandas

I am trying to create a new column with groupby and count.
However, it throws the error "incompatible index of inserted column with frame index":
import pandas as pd
# read the tab-separated file
df1 = pd.read_csv('hi.txt', sep='\t')
a = df1.groupby(['c', 't']).count()
df1['difference'] = a
print(df1)
Input:
coun id cat
A 12 90
U 13 91

Use:
new_df = df.groupby(['category', 'country'], sort=False).country.count().to_frame('count').reset_index()
print(new_df)
category country count
0 9910 AUS 2
1 7310 NZL 1
2 9910 NZL 1
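A sketch of why the error appears and one way around it: groupby(...).count() is indexed by the group keys, so it cannot be inserted straight into df1; transform('count') instead returns a result aligned with df1's original row index. The column values below are assumed stand-ins for the question's hi.txt, keeping the 'c' and 't' names from its groupby call:

```python
import pandas as pd

# hypothetical stand-in for the question's hi.txt
df1 = pd.DataFrame({"c": ["A", "A", "U"], "t": [1, 1, 2], "id": [12, 13, 14]})

# transform('count') is aligned with df1's original index,
# so it can be assigned directly without an index clash
df1["difference"] = df1.groupby(["c", "t"])["id"].transform("count")
print(df1)
```

Each row receives the size of its own (c, t) group, which is usually what "a new column after groupby count" is after.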

Related

find value of column based on other column's condition

I would like to find a value in column A based on column B's condition. For example, find the ts of the first 'B' in the value column:
import pandas as pd
data = [[10,"A"],[20,"A"],[30,"B"],[40,"B"]]
df = pd.DataFrame(data,columns=['ts','value'])
print(df)
ts value
0 10 A
1 20 A
2 30 B
3 40 B
I would like to print out 30 for this example!
You can do that with a boolean slice:
df.loc[df['value'].eq('B'),'ts'].iloc[0]
Out[163]: 30
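As an alternative sketch, idxmax on the boolean mask gives the label of the first matching row:

```python
import pandas as pd

data = [[10, "A"], [20, "A"], [30, "B"], [40, "B"]]
df = pd.DataFrame(data, columns=["ts", "value"])

# idxmax returns the index label of the first True in the mask
# (caveat: if no 'B' exists it silently returns the first label,
# so guard with .any() when that case is possible)
first_b_ts = df.loc[df["value"].eq("B").idxmax(), "ts"]
print(first_b_ts)  # 30
```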

How to make pandas work for cross multiplication

I have 3 data frames:
df1
id,k,a,b,c
1,2,1,5,1
2,3,0,1,0
3,6,1,1,0
4,1,0,5,0
5,1,1,5,0
df2
name,a,b,c
p,4,6,8
q,1,2,3
df3
type,w_ave,vac,yak
n,3,5,6
v,2,1,4
From the multiplication, using pandas and numpy, I want the output in df1:
id,k,a,b,c,w_ave,vac,yak
1,2,1,5,1,16,15,18
2,3,0,1,0,0,3,6
3,6,1,1,0,5,4,7
4,1,0,5,0,0,11,14
5,1,1,5,0,13,12,15
The conditions are:
The value of the new column will be computed as (pseudocode, not runnable code):
df1["w_ave"][1] = df3["w_ave"]["v"] + df1["a"][1]*df2["a"]["q"] + df1["b"][1]*df2["b"]["q"] + df1["c"][1]*df2["c"]["q"]
so output["w_ave"][1] = 2 + (1*1) + (5*2) + (1*3), where
df3["w_ave"]["v"] = 2
df1["a"][1] = 1, df2["a"]["q"] = 1;
df1["b"][1] = 5, df2["b"]["q"] = 2;
df1["c"][1] = 1, df2["c"]["q"] = 3;
Which means:
- a new column will be added to df1 for each column name of df3.
- for each row of df1, the values of a, b, c will be multiplied by the same-named values in df2's "q" row and summed together with the corresponding value from df3.
- only the columns of df1 whose names match a column of df2 are multiplied; non-matching columns, like df1["k"], are left out.
- however, if there is a 0 in df1["a"], the corresponding output will be zero.
I am struggling with this, and it was tough to explain as well. My attempts are very silly. I know this attempt will not work, but I have added it:
import pandas as pd
import numpy as np

data1 = "Sample_data1.csv"
data2 = "Sample_data2.csv"
data3 = "Sample_data3.csv"
folder = '~Sample_data/'
df1 = pd.read_csv(folder + data1)
df2 = pd.read_csv(folder + data2)
df3 = pd.read_csv(folder + data3)
df1 = df2 * df1
OK, so this will in no way resemble your desired output, but here is a vectorization of the formula you provided:
df2 = df2.set_index("name")
df3 = df3.set_index("type")
df1["w_ave"] = df3.loc["v", "w_ave"] + df1["a"].mul(df2.loc["q", "a"]) + df1["b"].mul(df2.loc["q", "b"]) + df1["c"].mul(df2.loc["q", "c"])
Outputs:
id k a b c w_ave
0 1 2 1 5 1 16
1 2 3 0 1 0 4
2 3 6 1 1 0 5
3 4 1 0 5 0 12
4 5 1 1 5 0 13
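A fuller sketch that reproduces the desired output, under two assumptions read from the question: df2's "q" row and df3's "v" row are the intended operands (as in the worked formula), and the zero rule applies only to w_ave (which is what the desired output implies, since vac and yak stay nonzero when a is 0):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4, 5], "k": [2, 3, 6, 1, 1],
                    "a": [1, 0, 1, 0, 1], "b": [5, 1, 1, 5, 5],
                    "c": [1, 0, 0, 0, 0]})
df2 = pd.DataFrame({"name": ["p", "q"], "a": [4, 1],
                    "b": [6, 2], "c": [8, 3]}).set_index("name")
df3 = pd.DataFrame({"type": ["n", "v"], "w_ave": [3, 2],
                    "vac": [5, 1], "yak": [6, 4]}).set_index("type")

# Row-wise dot product of (a, b, c) with df2's "q" row; mul aligns
# the Series on column names, then sum(axis=1) collapses each row
weighted = df1[["a", "b", "c"]].mul(df2.loc["q"]).sum(axis=1)
for col in df3.columns:
    df1[col] = df3.loc["v", col] + weighted
# zero out only w_ave where df1["a"] is 0, per the desired output
df1.loc[df1["a"].eq(0), "w_ave"] = 0
print(df1)
```

This avoids per-column repetition for the a/b/c sum, and each df3 column differs only by its additive offset.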

add "all" row to pandas group by

This is my code (using pandas 0.19.2):
from io import StringIO
import pandas as pd
data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)
print(df.groupby('category', as_index=False).agg({'sales': sum}))
This is the output:
category sales
0 fruits 17
1 vegatables 10
My question is: how do I add an 'all' row so the output would look like this:
category sales
0 fruits 17
1 vegatables 10
all 27
You can try pivot_table and then alter the new data:
new_df = df.pivot_table(columns='category',index='region', values='sales')
new_df['all'] = new_df.sum(1)
Output:
category fruits vegatables all
region
east 12 3 15
west 5 7 12
And if you want your original data:
new_df.stack().to_frame(name='Sales').reset_index()
Output:
region category Sales
0 east fruits 12
1 east vegatables 3
2 east all 15
3 west fruits 5
4 west vegatables 7
5 west all 12
Here is what I ended up doing:
from io import StringIO
import pandas as pd
data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)
body = df.groupby('category', as_index=False).agg({'sales': sum})
head = df.groupby(lambda x: True, as_index=False)  # advanced pandas trickery
head = head.agg({'sales': sum})
head.insert(0, 'category', '*all*')
print(body.append(head))
Basically, create another dataframe with the 'all' row and concatenate it on.
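A minimal sketch of that concat approach; note that DataFrame.append was removed in pandas 2.0, so building the one-row "all" frame and concatenating is the durable spelling:

```python
from io import StringIO
import pandas as pd

data = StringIO("""category,region,sales
fruits,east,12
vegatables,east,3
fruits,west,5
vegatables,west,7
""")
df = pd.read_csv(data)

body = df.groupby("category", as_index=False)["sales"].sum()
# one-row frame holding the grand total, then stitch it underneath
total = pd.DataFrame({"category": ["all"], "sales": [df["sales"].sum()]})
out = pd.concat([body, total], ignore_index=True)
print(out)
```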

Issue looping through dataframes in Pandas

I have a dict 'd' set up which maps names to dataframes. E.g.:
d["DataFrame1"]
Will return that dataframe with all its columns:
ID Name
0 123 John
1 548 Eric
2 184 Sam
3 175 Andy
Each dataframe has a column in it called 'Names'. I want to extract this column from each dataframe in the dict and to create a new dataframe consisting of these columns.
df_All_Names = pd.DataFrame()
for df in d:
    df_All_Names[df] = df['Names']
Returns the error:
TypeError: string indices must be integers
Unsure where I'm going wrong here.
Iterating over a dict yields its keys (strings), which is why df['Names'] fails with a TypeError. For example, say you have df as follows:
df=pd.DataFrame({'Name':['X', 'Y']})
df1=pd.DataFrame({'Name':['X1', 'Y1']})
And we create a dict
d=dict()
d['df']=df
d['df1']=df1
Then preset an empty dataframe:
yourdf=pd.DataFrame()
Using items() with a for loop:
for key, val in d.items():
    yourdf[key] = val['Name']
yields:
yourdf
Out[98]:
df df1
0 X X1
1 Y Y1
You can use reduce and concatenate all of the columns named 'Name' in your dictionary of dataframes.
Sample Data
from functools import reduce
d = {'df1':pd.DataFrame({'ID':[0,1,2],'Name':['John','Sam','Andy']}),'df2':pd.DataFrame({'ID':[3,4,5],'Name':['Jen','Cara','Jess']})}
You can stack the data side by side using axis=1
reduce(lambda x,y:pd.concat([x.Name,y.Name],axis=1),d.values())
Name Name
0 John Jen
1 Sam Cara
2 Andy Jess
Or on top of one another using axis=0:
reduce(lambda x,y:pd.concat([x.Name,y.Name],axis=0),d.values())
0 John
1 Sam
2 Andy
0 Jen
1 Cara
2 Jess
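As another sketch of the same idea, pd.concat also accepts a dict of Series directly, with the dict keys becoming the column names, so neither reduce nor a manual loop is needed:

```python
import pandas as pd

d = {"df1": pd.DataFrame({"ID": [0, 1, 2], "Name": ["John", "Sam", "Andy"]}),
     "df2": pd.DataFrame({"ID": [3, 4, 5], "Name": ["Jen", "Cara", "Jess"]})}

# dict keys become the column labels of the combined frame
wide = pd.concat({key: frame["Name"] for key, frame in d.items()}, axis=1)
print(wide)
```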

How to check dependency of one column to another in a pandas dataframe

I have the following dataframe:
import pandas as pd
df = pd.DataFrame([[1,11,'a'],[1,12,'a'],[1,11,'a'],
                   [1,12,'a'],[1,7,'a'],[1,12,'a']])
df.columns = ['id','code','name']
df
id code name
0 1 11 a
1 1 12 a
2 1 11 a
3 1 12 a
4 1 7 a
5 1 12 a
As shown in the above dataframe, the value of column "id" is directly related to the value of column "name". If I have, say, a million records, how can I know that a column is totally dependent on another column in a dataframe?
If they are totally dependent, then their factorizations will be the same:
(df.id.factorize()[0] == df.name.factorize()[0]).all()
True
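One caveat worth sketching: the factorize comparison tests a two-way (1:1) correspondence between the columns. If you only need the one-way dependency "id determines name", a groupby-based check that each id maps to exactly one distinct name is an alternative:

```python
import pandas as pd

df = pd.DataFrame([[1, 11, "a"], [1, 12, "a"], [1, 11, "a"],
                   [1, 12, "a"], [1, 7, "a"], [1, 12, "a"]],
                  columns=["id", "code", "name"])

# one-way functional dependency: every id has exactly one name
one_way = df.groupby("id")["name"].nunique().eq(1).all()
print(one_way)  # True
```

On the question's data both checks agree; they differ when two ids share one name, which passes the one-way check but fails the factorize comparison.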