One column as denominator and many as numerators in pandas

I have a data frame with many columns. I want col1 as the denominator and all the other columns as numerators. I have done this for col2 only (see the code below). I want to do it for all the other columns with short code.
df
Town col1 col2 col3 col4
A 8 7 5 2
B 8 4 2 3
C 8 5 8 5
Here is my code for col2:
df['col2'] = df['col2'] / df['col1']
Here is my result:
df
Town col1 col2 col3 col4
A 8 0.875000 5 2
B 8 0.500000 2 3
C 8 0.625000 8 5
I want to do the same with all the other columns (i.e. col3, col4, ...).
If this could be done with pivot_table, that would be awesome.
Thanks for your help

Use df.iloc with df.div:
In [2084]: df.iloc[:, 2:] = df.iloc[:, 2:].div(df.col1, axis=0)
In [2085]: df
Out[2085]:
Town col1 col2 col3 col4
0 A 8 0.875 0.625 0.250
1 B 8 0.500 0.250 0.375
2 C 8 0.625 1.000 0.625
Or use df.filter and pd.concat with df.div:
In [2073]: x = df.filter(like='col').set_index('col1')
In [2078]: out = pd.concat([df.Town, x.div(x.index, axis=0).reset_index()], axis=1)
In [2079]: out
Out[2079]:
Town col1 col2 col3 col4
0 A 8 0.875 0.625 0.250
1 B 8 0.500 0.250 0.375
2 C 8 0.625 1.000 0.625
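A name-based variant, in case the column positions are not fixed (a sketch, assuming everything except Town and col1 should be scaled):
cols = df.columns.difference(['Town', 'col1'])
df[cols] = df[cols].div(df['col1'], axis=0)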

Related

How to align several different dataframes with different shapes on common column?

I have a few different data frames like the ones below:
df1
idx col1 col2 col3
2020-11-20 01:00:00 1 5 9
2020-11-20 02:00:00 2 6 10
2020-11-20 03:00:00 3 7 11
2020-11-20 04:00:00 4 8 12
df2
idx col4 col5 col6
2020-11-20 02:00:00 13 15 17
2020-11-20 03:00:00 14 16 18
df3
idx col7 col8 col9
2020-11-20 01:00:00 19 20 21
and essentially I need to keep all the columns from all the DFs but align the values on the timestamp that is the index of each dataframe. My expected output is this:
df_merged
idx col1 col2 col3 col4 col5 col6 col7 col8 col9
2020-11-20 01:00:00 1 5 9 NaN NaN NaN 19 20 21
2020-11-20 02:00:00 2 6 10 13 15 17 NaN NaN NaN
2020-11-20 03:00:00 3 7 11 14 16 18 NaN NaN NaN
2020-11-20 04:00:00 4 8 12 NaN NaN NaN NaN NaN NaN
I have tried various things like merge, concat, join, and doing it manually, for hours now, and I am stumped as to why it won't work. These dfs are simplified versions; my actual df1 has a length of 1619, df2 has a length of 1619, df3 has a length of 1617, and df4 (not shown here, but it follows the same idea) has a length of 1613. When I try this
df_merged = reduce(lambda left, right: pd.merge(left, right, how='left'), [df1, df2, df3, df4])
what happens is that df_merged now has 12k rows (not 1619 like the original df). I tried dropping duplicates on the final df_merged as well, and that only left me with about 600 rows. I also tried manually combining them with loc, iloc and isin(), but still no luck.
Really any help would be greatly appreciated!
Use merge with how='outer'.
Demonstration:
# data preparation: each timestamp itself contains a space, so re-join
# the first two tokens of every data row into a single 'idx' value
def parse(string):
    rows = [x.split(' ') for x in string.split('\n')]
    return pd.DataFrame([[' '.join(r[:2])] + r[2:] for r in rows[1:]],
                        columns=rows[0])
string = """idx col1 col2 col3
2020-11-20 01:00:00 1 5 9
2020-11-20 02:00:00 2 6 10
2020-11-20 03:00:00 3 7 11
2020-11-20 04:00:00 4 8 12"""
df = parse(string)
string = """idx col4 col5 col6
2020-11-20 02:00:00 13 15 17
2020-11-20 03:00:00 14 16 18"""
df2 = parse(string)
string = """idx col7 col8 col9
2020-11-20 01:00:00 19 20 21"""
df3 = parse(string)
# solution
df.merge(df2, on='idx', how='outer').merge(df3, on='idx', how='outer')
Output:
idx col1 col2 col3 col4 col5 col6 col7 col8 col9
0 2020-11-20 01:00:00 1 5 9 NaN NaN NaN 19 20 21
1 2020-11-20 02:00:00 2 6 10 13 15 17 NaN NaN NaN
2 2020-11-20 03:00:00 3 7 11 14 16 18 NaN NaN NaN
3 2020-11-20 04:00:00 4 8 12 NaN NaN NaN NaN NaN NaN
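For a larger number of frames, the reduce pattern from the question also works once the join is an outer one on idx; a sketch assuming the df, df2 and df3 built above:
from functools import reduce
df_merged = reduce(lambda left, right: pd.merge(left, right, on='idx', how='outer'),
                   [df, df2, df3])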

Data Imputation in Pandas Dataframe column

I have two tables which I am merging (left join) on a common column, but the other table does not have exactly matching values in that column, and hence some of the resulting values are blank. I want to fill the missing values using the closest tenth. For example, I have these two dataframes:
d = {'col1': [1.31, 2.22, 3.33, 4.44, 5.55, 6.66], 'col2': ['010100', '010101', '101011', '111100', '114100', '166100']}
df1=pd.DataFrame(data=d)
d2 = {'col2': ['010100', '010102','010144','114218','121212','166110'],'col4': ['a','b','c','d','e','f']}
df2=pd.DataFrame(data=d2)
# df1
col1 col2
0 1.31 010100
1 2.22 010101
2 3.33 101011
3 4.44 111100
4 5.55 114100
5 6.66 166100
# df2
col2 col4
0 010100 a
1 010102 b
2 010144 c
3 114218 d
4 121212 e
5 166110 f
After left merging on col2,
I get:
df1.merge(df2,how='left',on='col2')
col1 col2 col4
0 1.31 010100 a
1 2.22 010101 NaN
2 3.33 101011 NaN
3 4.44 111100 NaN
4 5.55 114100 NaN
5 6.66 166100 NaN
Versus what I want: for all rows where col4 is NaN, my col2 value should first be rounded to the closest ten and then matched against col2 of the second table (df2); if there is a match, place col4 accordingly; if not, try the closest hundred, then the closest thousand, ten thousand, and so on.
Ideally my answer should be:
col1 col2 col4
0 1.31 010100 a
1 2.22 010101 a
2 3.33 101011 f
3 4.44 111100 d
4 5.55 114100 d
5 6.66 166100 f
Please help me code this.
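A minimal sketch of one literal reading of the rule (an assumption, since "closest" is ambiguous): round both sides to the same power of ten, starting at tens and coarsening step by step, filling whatever is still missing at each level. It may not reproduce the ideal output above exactly.
merged = df1.merge(df2, how='left', on='col2')
missing = merged['col4'].isna()
# coarsen to the nearest 10, 100, ..., 100000 until everything is filled
for power in range(1, 6):
    if not missing.any():
        break
    # round df2's keys to the nearest 10**power; keep the first col4 per key
    lookup = (df2.assign(key=df2['col2'].astype(int).round(-power))
                 .drop_duplicates('key').set_index('key')['col4'])
    keys = merged.loc[missing, 'col2'].astype(int).round(-power)
    merged.loc[missing, 'col4'] = keys.map(lookup)
    missing = merged['col4'].isna()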

How to use SQL minus query equivalent between two dataframes properly

I have two dataframes, each having 1000 rows. The dataframes contain the same rows, but the row order differs. The following examples can be assumed to be truncated versions of the dataframes.
df1:
col1 col2 col3
1 2 3
2 3 4
5 6 6
8 9 9
df2:
col1 col2 col3
5 6 6
8 9 9
1 2 3
2 3 4
The dataframes don't have meaningful indices, and I expect an empty result when I apply an SQL minus query to them. I used the following code, but did not obtain the result I expected. Is there any way to achieve my desired result?
df3 = df1.merge(df2.drop_duplicates(),how='right', indicator=True)
print(df3)
For instance, if I consider df1 as table1 and df2 as table2 and ran the following query in SQL Server, I would get an empty table back:
SELECT * FROM table1
EXCEPT
SELECT * FROM table2
Yes, you can use the indicator like this:
df1.merge(df2, how='left', indicator='ind').query('ind=="left_only"')
Where df1 is:
col1 col2 col3
0 1.0 2.0 3.0
1 2.0 3.0 4.0
2 5.0 6.0 6.0
3 8.0 9.0 9.0
4 10.0 10.0 10.0
and df2 is:
col1 col2 col3
0 5 6 6
1 8 9 9
2 1 2 3
3 2 3 4
Output:
col1 col2 col3 ind
4 10.0 10.0 10.0 left_only
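To mirror SQL's EXCEPT more closely, you can drop the helper column afterwards, and, since EXCEPT also de-duplicates, drop duplicate rows; a small follow-up sketch:
minus = (df1.merge(df2, how='left', indicator='ind')
            .query('ind == "left_only"')
            .drop(columns='ind')
            .drop_duplicates())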

Pandas loop groupby

I have a DataFrame
df1
Col1 Col2 Col3 Col4
12 10 R1 0.1
12 10 R2 0.1
12 8 R3 0.6
11 4 R4 0.2
12 10 R5 0.4
11 4 R6 0.1
df2 is a subset of df1
Col1 Col2 count
12 10 3
12 8 1
11 4 2
I want to select the rows of df1 matching the Col1 and Col2 of each row in df2, and thereby automate this for each and every combination in df2.
For the combination (12, 10) in df2 I want the matching rows in df1:
Col1 Col2 Col3 Col4
12 10 R1 0.1
12 10 R2 0.1
12 10 R5 0.4
Similarly, for the next combination in df2, (12, 8):
Col1 Col2 Col3 Col4
12 8 R3 0.6
and for the next combination in df2, (11, 4):
Col1 Col2 Col3 Col4
11 4 R4 0.2
11 4 R6 0.1
I have tried df3 = df1[(df1.Col1 == 12.0) & (df1.Col2 == 10)], but I want to automate it without spelling out each combination.
I think your second DataFrame is not necessary; just loop over each combination of the unique values in the Col1 and Col2 columns:
for i, g in df1.groupby(['Col1','Col2']):
print (i)
print (g)
If you want a more dynamic solution, build a dictionary of DataFrames:
d = {f'{i[0]}_{i[1]}':g for i, g in df1.groupby(['Col1','Col2'])}
print (d)
{'11_4': Col1 Col2 Col3 Col4
3 11 4 R4 0.2
5 11 4 R6 0.1, '12_8': Col1 Col2 Col3 Col4
2 12 8 R3 0.6, '12_10': Col1 Col2 Col3 Col4
0 12 10 R1 0.1
1 12 10 R2 0.1
4 12 10 R5 0.4}
print (d['11_4'])
Col1 Col2 Col3 Col4
3 11 4 R4 0.2
5 11 4 R6 0.1
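If df2 is really given and you only need the rows of df1 that match its combinations, an inner merge on the two key columns is another option (a sketch assuming df2's key columns are named Col1 and Col2 like df1's; count is left out so it does not end up in the result):
df1.merge(df2[['Col1', 'Col2']], on=['Col1', 'Col2'])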

Appending a list to a dataframe

I have a dataframe let's say:
col1 col2 col3
1 x 3
1 y 4
and I have a list:
2
3
4
5
Can I append the list to the data frame like this:
col1 col2 col3
1 x 3
1 y 4
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
Thank you.
Use concat (or, on older pandas versions, append, which is deprecated since pandas 1.4 and removed in 2.0) with the DataFrame constructor:
df = pd.concat([df, pd.DataFrame([2,3,4,5], columns=['col1'])])
# equivalent on pandas < 2.0:
# df = df.append(pd.DataFrame([2,3,4,5], columns=['col1']))
print (df)
col1 col2 col3
0 1 x 3.0
1 1 y 4.0
0 2 NaN NaN
1 3 NaN NaN
2 4 NaN NaN
3 5 NaN NaN
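If you prefer a fresh 0..n-1 index instead of the repeated labels above, pass ignore_index=True to concat:
df = pd.concat([df, pd.DataFrame([2,3,4,5], columns=['col1'])], ignore_index=True)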