Performing operations after styling in a dataframe - pandas

I have two dataframes, df1 and df2. I am trying to apply styling to df1, drop a column from it, and finally concatenate it with df2. The styling on df1 should be retained, but it is being lost.
I am using the code below, but it doesn't seem to work:
import pandas as pd
df1 = pd.DataFrame([["A", 1],["B", 2]], columns=["Letter", "Number"])
df2 = pd.DataFrame([["A", 1],["B", 2]], columns=["Letter2", "Number2"])
def highlight(s):
    return ['background-color: red']*2
df1 = df1.style.apply(highlight)
df1.data = df1.data.drop('Letter', axis=1)
combined = pd.concat([df1.data, df2],sort=True)
with pd.ExcelWriter('testcolor.xlsx') as writer:
    combined.to_excel(writer,sheet_name = 'test')
I expect the "Number" column from df1 to be highlighted in red, with Letter2 and Number2 keeping their original colour.
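One possible way around this, sketched under some assumptions: the styles live on the Styler object, while df1.data is the plain DataFrame, so nothing styled survives pd.concat. The sketch below therefore concatenates first and styles afterwards, restricting the highlight to the "Number" column. The names highlight_present and export_styled are hypothetical, ignore_index=True is used because Styler.apply needs a unique index, and the export step assumes jinja2 and openpyxl are installed.

```python
import pandas as pd

df1 = pd.DataFrame([["A", 1], ["B", 2]], columns=["Letter", "Number"])
df2 = pd.DataFrame([["A", 1], ["B", 2]], columns=["Letter2", "Number2"])

# Do the data operations on plain DataFrames first.
# ignore_index=True gives a unique index, which Styler.apply requires.
combined = pd.concat(
    [df1.drop(columns="Letter"), df2], ignore_index=True, sort=True
)


def highlight_present(s):
    """Hypothetical helper: red background for cells that hold a value."""
    return ["background-color: red" if pd.notna(v) else "" for v in s]


def export_styled(path):
    """Hypothetical helper: style only 'Number', then write to Excel."""
    styled = combined.style.apply(highlight_present, subset=["Number"])
    styled.to_excel(path, sheet_name="test")
```

Calling export_styled('testcolor.xlsx') would write the sheet through the Styler, so the cell colours are exported along with the values; the rows that came from df2 have no "Number" value and are left unstyled.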

Related

Update one dataframe from another with qualifiers

I'm trying to replicate SQL UPDATE-type functionality in pandas. I've seen other solutions suggesting using pandas update method or merge and dropping columns.
Example dataframes:
df1 = pd.DataFrame([[1,False, None], [1,True, None], [1, False, 'UpdateMe'], [2,True, None]], columns=['id', 'value1', 'value2'])
df2 = pd.DataFrame([[1,True, 'Updated'], [2,True, 'Updated']], columns=['id', 'value1', 'value2'])
Here is the SQL I am trying to replicate:
UPDATE df1
SET value1 = df2.value1, value2 = df2.value2
FROM df1
JOIN df2 ON df1.id = df2.id
WHERE df1.value2 = 'UpdateMe';
I can get the update to work without any qualifier like so:
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df1.update(df2, overwrite=True)
df1.reset_index(drop=False, inplace=True)
df1
   id value1   value2
0   1   TRUE  Updated
1   1   TRUE  Updated
2   1   TRUE  Updated
3   2   TRUE  Updated
However, when I add a qualifier restricting which records in the dataframe to update, I get a warning and the target dataframe is not updated.
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
df1.loc[
    df1.value2 == 'UpdateMe'
].update(df2, overwrite=True)
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[col] = expressions.where(mask, this, that)
Here is the expected output:
   id value1   value2
0   1  FALSE
1   1   TRUE
2   1   TRUE  Updated
3   2   TRUE
Any suggestions on how to update multiple columns using .loc or some kind of WHERE clause?
You can create temporary columns using merge, then use np.where, which works like the =IF() function in Excel, and finally remove the temporary columns.
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[1,False, None], [1,True, None], [1, False, 'UpdateMe'], [2,True, None]], columns=['id', 'value1', 'value2'])
df2 = pd.DataFrame([[1,True, 'Updated'], [2,True, 'Updated']], columns=['id', 'value1', 'value2'])
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
# Answer: bring df2's values in as temporary columns, aligned on the id index
df1 = df1.merge(df2.rename(columns={'value1': 'value1_temp', 'value2': 'value2_temp'}),
                how='left', left_index=True, right_index=True)
# Update value1 first: both conditions must see the original value2
df1.value1 = np.where(df1.value2 == 'UpdateMe', df1.value1_temp, df1.value1)
df1.value2 = np.where(df1.value2 == 'UpdateMe', df1.value2_temp, df1.value2)
# Remove the temporary columns
df1 = df1.drop(labels=['value1_temp', 'value2_temp'], axis=1)
df1
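As an alternative to the temporary columns, here is a minimal sketch that assigns through a boolean mask. Selecting rows with a boolean .loc returns a new object, which is why calling update on that selection never reaches df1; assigning to df1.loc[mask, col] writes into df1 itself. The names mask and lookup are illustrative, and the sketch assumes the ids in df2 are unique and that every masked id exists in df2 (a missing id would write NaN).

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, False, None], [1, True, None], [1, False, "UpdateMe"], [2, True, None]],
    columns=["id", "value1", "value2"],
)
df2 = pd.DataFrame(
    [[1, True, "Updated"], [2, True, "Updated"]],
    columns=["id", "value1", "value2"],
)

# The WHERE clause: compute it once, before any column is modified.
mask = df1["value2"] == "UpdateMe"

# The JOIN: look up df2's values by id (assumes ids in df2 are unique).
lookup = df2.set_index("id")

# The SET clause: write each column only on the masked rows.
for col in ["value1", "value2"]:
    df1.loc[mask, col] = df1.loc[mask, "id"].map(lookup[col])

print(df1)
```

Computing the mask up front avoids the ordering concern of the np.where version, since the condition no longer depends on a column that is being overwritten.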

(Python)Dataframe copy with selected columns and index

The whole dataframe can be copied to df2 as shown below.
How can I copy only column 'B' and the index from df to df2?
import pandas as pd
df = pd.DataFrame({'A': [10, 20, 30],'B': [100, 200, 300]}, index=['2021-11-24', '2021-11-25', '2021-11-26'])
df2 = df.copy()
You can simply select and then copy as follows:
df2 = df[['B']].copy()
I am using a list as the selection in order to get a DataFrame rather than a pd.Series.
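A small sketch of the difference, using the same sample data: the list selection keeps the index and returns a one-column DataFrame, a bare label returns a Series, and .copy() makes the result independent of the original. The name series_b is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [100, 200, 300]},
    index=["2021-11-24", "2021-11-25", "2021-11-26"],
)

df2 = df[["B"]].copy()   # list of labels -> DataFrame with the same index
series_b = df["B"]       # single label   -> Series

# Changing the copy leaves the original dataframe untouched.
df2.loc["2021-11-24", "B"] = -1

print(type(df2).__name__, type(series_b).__name__)
print(df.loc["2021-11-24", "B"], df2.loc["2021-11-24", "B"])
```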

Check similarity of 2 pandas dataframes

I am trying to compare two pandas dataframes in terms of column names and datatypes. With assert_frame_equal, I get an error because the shapes are different. Is there a way to ignore that? I could not find one in the documentation.
Comparing df1_dict == df2_dict only tells me whether they match or not; I would like to print any differences in feature names or datatypes.
df1_dict = dict(df1.dtypes)
df2_dict = dict(df2.dtypes)
# df1_dict = {'A': np.dtype('O'), 'B': np.dtype('O'), 'C': np.dtype('O')}
# df2_dict = {'A': np.dtype('int64'), 'B': np.dtype('O'), 'C': np.dtype('O')}
print(set(df1_dict) - set(df2_dict))
print(f'Are the two datasets similar: {df1_dict == df2_dict}')
pd.testing.assert_frame_equal(df1, df2)
Any suggestions would be appreciated.
It seems to me that if the two dataframe descriptions are outer joined, you would have all the information you want.
example:
df1 = pd.DataFrame({'a': [1,2,3], 'b': list('abc')})
df2 = pd.DataFrame({'a': [1.0,2.0,3.0], 'b': list('abc'), 'c': [10,20,30]})
diff = df1.dtypes.rename('df1').reset_index().merge(
    df2.dtypes.rename('df2').reset_index(), how='outer'
)
def check(x):
    if pd.isnull(x.df1):
        return 'df1-missing'
    if pd.isnull(x.df2):
        return 'df2-missing'
    if x.df1 != x.df2:
        return 'type-mismatch'
    return 'ok'
diff['diff_status'] = diff.apply(check, axis=1)
# diff prints:
  index     df1      df2    diff_status
0     a   int64  float64  type-mismatch
1     b  object   object             ok
2     c     NaN    int64    df1-missing
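If a plain report is enough, the same comparison can be sketched with index set operations on the two dtypes Series instead of a merge. The function name compare_schemas and the keys of the returned dict are made up for this illustration, and the sketch assumes column names are unique within each dataframe.

```python
import pandas as pd


def compare_schemas(left, right):
    """Report columns present on one side only and dtype mismatches."""
    left_types, right_types = left.dtypes, right.dtypes
    common = left_types.index.intersection(right_types.index)
    return {
        "only_in_left": sorted(left_types.index.difference(right_types.index)),
        "only_in_right": sorted(right_types.index.difference(left_types.index)),
        "dtype_mismatch": {
            col: (str(left_types[col]), str(right_types[col]))
            for col in common
            if left_types[col] != right_types[col]
        },
    }


df1 = pd.DataFrame({"a": [1, 2, 3], "b": list("abc")})
df2 = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": list("abc"), "c": [10, 20, 30]})

report = compare_schemas(df1, df2)
print(report)
```

Because only the dtypes are inspected, the two dataframes may have different shapes without raising an error.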

How to apply a list comprehension in Panda Dataframe?

From a list of values, I am trying to identify every pair of adjacent values whose sum is at least 10.
a = [1,9,3,4,5]
...so I wrote a for loop...
values = []
for i in range(len(a)-1):
    if sum(a[i:i+2]) >= 10:
        values += [a[i:i+2]]
...which I rewrote as a list comprehension...
values = [a[i:i+2] for i in range(len(a)-1) if sum(a[i:i+2]) >= 10]
Both produce the same output:
values = [[1,9], [9,3]]
My question is how best to apply the above list comprehension to a DataFrame.
Here is a sample 5-row DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1,1,1,1,0],
'B': [9,8,3,2,2],
'C': [3,3,3,10,3],
'E': [4,4,4,4,4],
'F': [5,5,5,5,5]})
df['X'] = df.values.tolist()
where:
- a is each element of df['X'], which holds the list of values from columns A to F
df['X'] = [[1,9,3,4,5],[1,8,3,4,5],[1,3,3,4,5],[1,2,10,4,5],[0,2,3,4,5]]
and the result of the list comprehension is to be stored in a new column, df['X1'].
The desired output is:
df['X1'] = [[[1,9], [9,3]],[[8,3]],[[NaN]],[[2,10],[10,4]],[[NaN]]]
Thank you.
You could use the pandas apply function and put your list comprehension inside it.
df = pd.DataFrame({'A': [1,1,1,1,0],
                   'B': [9,8,3,2,2],
                   'C': [3,3,3,10,3],
                   'E': [4,4,4,4,4],
                   'F': [5,5,5,5,5]})
# axis=1 applies the function to each row; axis=0 would apply it to each column.
# row.tolist() turns the row into a plain list so the slices are lists, not Series.
df['X1'] = df.apply(
    lambda row: [row.tolist()[i:i+2] for i in range(len(row)-1) if sum(row.tolist()[i:i+2]) >= 10],
    axis=1,
)
This gives the pairs listed for df['X1']; rows with no qualifying pair get an empty list rather than [[NaN]].
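The same idea can be sketched with a small named helper, which keeps the pairing logic readable and testable on its own. The helper name adjacent_pairs_at_least and its threshold parameter are illustrative; it pairs each value with its right-hand neighbour via zip and works on the lists already stored in df['X'].

```python
import pandas as pd


def adjacent_pairs_at_least(values, threshold=10):
    """Return every pair of neighbouring values whose sum is >= threshold."""
    return [[x, y] for x, y in zip(values, values[1:]) if x + y >= threshold]


df = pd.DataFrame({"A": [1, 1, 1, 1, 0],
                   "B": [9, 8, 3, 2, 2],
                   "C": [3, 3, 3, 10, 3],
                   "E": [4, 4, 4, 4, 4],
                   "F": [5, 5, 5, 5, 5]})

# One list per row, as in the question.
df["X"] = df[["A", "B", "C", "E", "F"]].values.tolist()

# Apply the helper to each row's list; rows without a match get [].
df["X1"] = [adjacent_pairs_at_least(row) for row in df["X"]]

print(df["X1"].tolist())
```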

Why is my dataframe not appending? [duplicate]

I know that there are several ways to build up a dataframe in Pandas. My question is simply to understand why the method below doesn't work.
First, a working example. I can create an empty dataframe and then append a new one similar to the documenta
In [3]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
   ...: df2 = pd.DataFrame()
   ...: df2.append(df1)
Out[3]:
   a  b
0  1  2
However, if I do the following, df2 stays empty:
In [10]: df1 = pd.DataFrame([[1,2],], columns = ['a', 'b'])
    ...: df2 = pd.DataFrame()
    ...: for i in range(10):
    ...:     df2.append(df1)
In [11]: df2
Out[11]:
Empty DataFrame
Columns: []
Index: []
Can someone explain why it works this way? Thanks!
This happens because the .append() method returns a new DataFrame instead of modifying the original in place:
Pandas Docs (0.19.2):
pandas.DataFrame.append
Returns: appended: DataFrame
Here's a working example so you can see what's happening in each iteration of the loop:
df1 = pd.DataFrame([[1,2],], columns=['a','b'])
df2 = pd.DataFrame()
for i in range(0,2):
    print(df2.append(df1))
>    a  b
> 0  1  2
>    a  b
> 0  1  2
If you assign the output of .append() to a df (even the same one) you'll get what you probably expected:
for i in range(0,2):
    df2 = df2.append(df1)
print(df2)
>    a  b
> 0  1  2
> 0  1  2
I think what you are looking for is:
df1 = pd.DataFrame()
df2 = pd.DataFrame([[1,2,3],], columns=['a','b','c'])
for i in range(0,4):
    df1 = df1.append(df2)
df1
df.append() returns a new object. df2 is initially an empty dataframe, and calling append on it does not change it. If you do df3 = df2.append(df1), you will get what you want.
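On current pandas the answers above need one adjustment: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the loop has to go through pd.concat. A minimal sketch of the equivalent, where the list name pieces is just for illustration: collect the frames in a list and concatenate once, again assigning the returned object.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2]], columns=["a", "b"])

# Collect the pieces first; concatenating once avoids re-copying
# the growing dataframe on every loop iteration.
pieces = []
for i in range(10):
    pieces.append(df1)   # list.append does modify the list in place

# pd.concat, like the old DataFrame.append, returns a new object.
df2 = pd.concat(pieces, ignore_index=True)

print(df2.shape)
```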