I have two DataFrames, df1 and df2. I am trying to apply styling to df1, drop a column from it, and then concatenate it with df2. The styling on df1 should be retained, but it is being lost.
I am using the code below, but it doesn't work:
df1 = pd.DataFrame([["A", 1],["B", 2]], columns=["Letter", "Number"])
df2 = pd.DataFrame([["A", 1],["B", 2]], columns=["Letter2", "Number2"])
def highlight(s):
    return ['background-color: red']*2
df1 = df1.style.apply(highlight)
df1.data = df1.data.drop('Letter', axis=1)
combined = pd.concat([df1.data, df2],sort=True)
with pd.ExcelWriter('testcolor.xlsx') as writer:
    combined.to_excel(writer, sheet_name='test')
I expect "Number" from df1 to be highlighted red, and Letter2 and Number2 to keep their original colour.
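The styling is lost because df1.data is the plain DataFrame underneath the Styler, and pd.concat on plain DataFrames returns another plain DataFrame. One possible workaround, sketched below, is to drop and concatenate first and style the combined frame last. The names highlight_present and style_number_column are made up for this illustration, and the sketch assumes jinja2 is installed (Styler needs it).

```python
import pandas as pd

df1 = pd.DataFrame([["A", 1], ["B", 2]], columns=["Letter", "Number"])
df2 = pd.DataFrame([["A", 1], ["B", 2]], columns=["Letter2", "Number2"])

# Drop and concatenate on plain DataFrames first. ignore_index=True gives the
# result a unique index, which Styler.apply requires.
combined = pd.concat(
    [df1.drop(columns="Letter"), df2], sort=True, ignore_index=True
)


def highlight_present(col):
    """Red background for cells that hold a value, no style for NaN cells."""
    return ["background-color: red" if pd.notna(v) else "" for v in col]


def style_number_column(frame):
    """Hypothetical helper: style only the 'Number' column (needs jinja2)."""
    return frame.style.apply(highlight_present, subset=["Number"])
```

Calling style_number_column(combined).to_excel('testcolor.xlsx', sheet_name='test') should then write the file with only the Number cells that came from df1 in red. The Excel export assumes openpyxl is available.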
I'm trying to replicate SQL UPDATE-type functionality in pandas. I've seen other solutions suggesting using pandas update method or merge and dropping columns.
Example dataframes:
df1 = pd.DataFrame([[1,False, None], [1,True, None], [1, False, 'UpdateMe'], [2,True, None]], columns=['id', 'value1', 'value2'])
df2 = pd.DataFrame([[1,True, 'Updated'], [2,True, 'Updated']], columns=['id', 'value1', 'value2'])
Here is the SQL I am trying to replicate:
UPDATE df1
SET value1 = df2.value1, value2 = df2.value2
FROM df1
JOIN df2 ON df1.id = df2.id
WHERE df1.value2 = 'UpdateMe';
I can get the update to work without any qualifier like so:
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df1.update(df2, overwrite=True)
df1.reset_index(drop=False, inplace=True)
df1
   id  value1   value2
0   1    TRUE  Updated
1   1    TRUE  Updated
2   1    TRUE  Updated
3   2    TRUE  Updated
However, when I add a qualifier to which records in the dataframe to update, I get a warning and the target dataframe does not get updated.
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df1.loc[
    df1.value2 == 'UpdateMe'
].update(df2, overwrite=True)
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[col] = expressions.where(mask, this, that)
Here is the expected output:
   id  value1   value2
0   1   FALSE
1   1    TRUE
2   1    TRUE  Updated
3   2    TRUE
Any suggestions on how to update multiple columns with .loc or some kind of WHERE clause?
You can create temporary columns using merge. Then use np.where, which works like the =IF() function in Excel. Finally, remove the temporary columns.
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[1,False, None], [1,True, None], [1, False, 'UpdateMe'], [2,True, None]], columns=['id', 'value1', 'value2'])
df2 = pd.DataFrame([[1,True, 'Updated'], [2,True, 'Updated']], columns=['id', 'value1', 'value2'])
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
#Answer
df1 = df1.merge(df2.rename(columns = {'value1':'value1_temp','value2':'value2_temp'}), how = 'left', right_index = True, left_index = True)
df1.value1 = np.where(df1.value2 == 'UpdateMe', df1.value1_temp, df1.value1)
df1.value2 = np.where(df1.value2 == 'UpdateMe', df1.value2_temp, df1.value2)
df1 = df1.drop(labels = ['value1_temp','value2_temp'], axis = 1)
df1
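An alternative that avoids the temporary columns is to build the WHERE condition as a boolean mask and assign through .loc, which writes to df1 itself rather than to a copy. This is only a sketch: it assumes the ids in df2 are unique, and the names mask and lookup are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, False, None], [1, True, None], [1, False, "UpdateMe"], [2, True, None]],
    columns=["id", "value1", "value2"],
)
df2 = pd.DataFrame(
    [[1, True, "Updated"], [2, True, "Updated"]],
    columns=["id", "value1", "value2"],
)

# WHERE df1.value2 = 'UpdateMe' (computed once, before any column changes)
mask = df1["value2"] == "UpdateMe"

# JOIN df2 ON df1.id = df2.id (assumes each id appears once in df2)
lookup = df2.set_index("id")

# SET value1 = df2.value1, value2 = df2.value2, only on the masked rows
for col in ["value1", "value2"]:
    df1.loc[mask, col] = df1.loc[mask, "id"].map(lookup[col])

print(df1)
```

Only the row whose value2 was 'UpdateMe' is touched; every other row keeps its original values.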
The whole DataFrame can be copied to df2 as shown below.
How can I copy only the 'B' column and the index from df to df2?
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30],'B': [100, 200, 300]}, index=['2021-11-24', '2021-11-25', '2021-11-26'])
df2 = df.copy()
You can simply select and then copy as follows:
df2 = df[['B']].copy()
I am using a list as the selection in order to have a DataFrame instead of a pd.Series.
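To illustrate the difference between the two selections, here is a small sketch; the variable names as_frame and as_series are only for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [100, 200, 300]},
    index=["2021-11-24", "2021-11-25", "2021-11-26"],
)

as_frame = df[["B"]].copy()   # list selection keeps a DataFrame
as_series = df["B"].copy()    # scalar selection gives a Series

# Both keep the original index; changing the copy leaves df untouched.
as_frame.loc["2021-11-24", "B"] = -1

print(type(as_frame).__name__, type(as_series).__name__)
print(df.loc["2021-11-24", "B"])
```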
I am trying to compare two pandas DataFrames in terms of column names and data types. With assert_frame_equal, I get an error because the shapes are different. Is there a way to ignore that? I could not find one in the documentation.
With df1_dict == df2_dict, I only learn whether the two are the same or not. I want to print any differences in feature names or data types.
df1_dict = dict(df1.dtypes)
df2_dict = dict(df2.dtypes)
# df1_dict = {'A': np.dtype('O'), 'B': np.dtype('O'), 'C': np.dtype('O')}
# df2_dict = {'A': np.dtype('int64'), 'B': np.dtype('O'), 'C': np.dtype('O')}
print(set(df1_dict) - set(df2_dict))
print(f'''Are two datasets similar: {df1_dict == df2_dict}''')
pd.testing.assert_frame_equal(df1, df2)
Any suggestions would be appreciated.
It seems to me that if the two dataframe descriptions are outer joined, you would have all the information you want.
example:
df1 = pd.DataFrame({'a': [1,2,3], 'b': list('abc')})
df2 = pd.DataFrame({'a': [1.0,2.0,3.0], 'b': list('abc'), 'c': [10,20,30]})
diff = df1.dtypes.rename('df1').reset_index().merge(
    df2.dtypes.rename('df2').reset_index(), how='outer'
)
def check(x):
    if pd.isnull(x.df1):
        return 'df1-missing'
    if pd.isnull(x.df2):
        return 'df2-missing'
    if x.df1 != x.df2:
        return 'type-mismatch'
    return 'ok'
diff['diff_status'] = diff.apply(check, axis=1)
# diff prints:
  index     df1      df2    diff_status
0     a   int64  float64  type-mismatch
1     b  object   object             ok
2     c     NaN    int64    df1-missing
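If a merged table is more than you need, the same information can be pulled out of the two dtype dictionaries directly. The following is a sketch only; schema_diff is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def schema_diff(left, right):
    """Return columns missing on either side and columns whose dtypes differ."""
    left_types = left.dtypes.to_dict()
    right_types = right.dtypes.to_dict()
    only_left = [c for c in left_types if c not in right_types]
    only_right = [c for c in right_types if c not in left_types]
    mismatched = {
        c: (left_types[c], right_types[c])
        for c in left_types
        if c in right_types and left_types[c] != right_types[c]
    }
    return only_left, only_right, mismatched


df1 = pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
df2 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": list("abc"), "c": [10, 20, 30]})

only_left, only_right, mismatched = schema_diff(df1, df2)
print("only in df1:", only_left)
print("only in df2:", only_right)
print("dtype mismatches:", mismatched)
```

Unlike assert_frame_equal, this never looks at the values or the shapes, so DataFrames of different sizes can be compared.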
From a list of values, I am trying to identify every sequential pair of values whose sum is at least 10.
a = [1,9,3,4,5]
...so I wrote a for loop...
values = []
for i in range(len(a)-1):
    if sum(a[i:i+2]) >= 10:
        values += [a[i:i+2]]
...which I rewrote as a list comprehension...
values = [a[i:i+2] for i in range(len(a)-1) if sum(a[i:i+2]) >= 10]
Both produce the same output:
values = [[1,9], [9,3]]
My question is how best to apply the above list comprehension to a DataFrame.
Here is a sample five-row DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1,1,1,1,0],
                   'B': [9,8,3,2,2],
                   'C': [3,3,3,10,3],
                   'E': [4,4,4,4,4],
                   'F': [5,5,5,5,5]})
df['X'] = df.values.tolist()
where:
- a is an element of df['X'], which holds the list of values from columns A to F for each row
df['X'] = [[1,9,3,4,5],[1,8,3,4,5],[1,3,3,4,5],[1,2,10,4,5],[0,2,3,4,5]]
and the result of the list comprehension is to be stored in a new column, df['X1'].
Desired output is:
df['X1'] = [[[1,9], [9,3]],[[8,3]],[[NaN]],[[2,10],[10,4]],[[NaN]]]
Thank you.
You could use the pandas apply function and put your list comprehension in it.
df = pd.DataFrame({'A': [1,1,1,1,0],
                   'B': [9,8,3,2,2],
                   'C': [3,3,3,10,3],
                   'E': [4,4,4,4,4],
                   'F': [5,5,5,5,5]})
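df['X1'] = df.apply(lambda row: [row.tolist()[i:i+2] for i in range(len(row)-1) if sum(row.tolist()[i:i+2]) >= 10], axis=1)
# Note: the axis parameter controls whether the function is applied by rows or by columns; axis=1 applies it to each row.
# row.tolist() turns each row into a plain list, so the slices are lists rather than Series.
This gives the output stated for df['X1'], except that rows with no qualifying pair get an empty list rather than [[NaN]].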
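The same idea may be easier to read with the comprehension moved into a named function. This is a sketch under the assumption that "at least 10" is the intended condition (it matches the desired output above); pairs_at_least is a name chosen for this example.

```python
import pandas as pd


def pairs_at_least(values, threshold=10):
    """Return each adjacent pair in `values` whose sum is >= threshold."""
    return [
        values[i:i + 2]
        for i in range(len(values) - 1)
        if values[i] + values[i + 1] >= threshold
    ]


df = pd.DataFrame({"A": [1, 1, 1, 1, 0],
                   "B": [9, 8, 3, 2, 2],
                   "C": [3, 3, 3, 10, 3],
                   "E": [4, 4, 4, 4, 4],
                   "F": [5, 5, 5, 5, 5]})

# One plain Python list per row, then one list of qualifying pairs per row.
rows = df.values.tolist()
df["X1"] = pd.Series([pairs_at_least(row) for row in rows], index=df.index)

print(df["X1"].tolist())
```

Rows without a qualifying pair end up with an empty list, which is usually easier to test for later than a nested NaN.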
I know that there are several ways to build up a dataframe in Pandas. My question is simply to understand why the method below doesn't work.
First, a working example. I can create an empty DataFrame and then append a new one, similar to the documentation:
In [3]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
...: df2 = pd.DataFrame()
...: df2.append(df1)
Out[3]: a b
0 1 2
However, if I do the following, df2 stays empty:
In [10]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
...: df2 = pd.DataFrame()
...: for i in range(10):
...:     df2.append(df1)
In [11]: df2
Out[11]:
Empty DataFrame
Columns: []
Index: []
Can someone explain why it works this way? Thanks!
This happens because the .append() method returns a new DataFrame instead of modifying df2 in place:
Pandas Docs (0.19.2):
pandas.DataFrame.append
Returns: appended: DataFrame
Here's a working example so you can see what's happening in each iteration of the loop:
df1 = pd.DataFrame([[1,2],], columns=['a','b'])
df2 = pd.DataFrame()
for i in range(0,2):
    print(df2.append(df1))
> a b
> 0 1 2
> a b
> 0 1 2
If you assign the output of .append() to a df (even the same one) you'll get what you probably expected:
for i in range(0,2):
    df2 = df2.append(df1)
print(df2)
> a b
> 0 1 2
> 0 1 2
I think what you are looking for is:
df1 = pd.DataFrame()
df2 = pd.DataFrame([[1,2,3],], columns=['a','b','c'])
for i in range(0,4):
    df1 = df1.append(df2)
df1
df.append() returns a new object. df2 is initially an empty DataFrame, and it does not change. If you do df3 = df2.append(df1), you will get what you want.
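Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the loops above raise an AttributeError. A sketch of the same result with pd.concat follows; collecting the pieces in a list and concatenating once is an assumption about what the loop is meant to do.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=["a", "b"])

# Collect the pieces in a plain list, then concatenate once at the end.
pieces = [df1 for _ in range(4)]
df2 = pd.concat(pieces, ignore_index=True)

print(df2)
```

Like append, pd.concat returns a new DataFrame, so the result still has to be assigned to a name.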