I have two dataframes that look like this:
df1:
Index var1
0 56
1 67
2 21
df2:
Index var2
0 89
1 64
2 31
When I append or concatenate them, I get this:
Index var1 var2
0 56 nan
1 67 nan
2 21 nan
0 nan 89
1 nan 64
2 nan 31
But I would like to get this:
Index var1 var2
0 56 89
1 67 64
2 21 31
The commands I used are:
pd.concat([df1, df2], axis=1)
df1.append([df2])
EDIT:
This is a min-example:
df1 = pd.DataFrame({'var1' : [56,67,21]})
df2 = pd.DataFrame({'var2' : [89,64,31]})
df1.to_dict()
{'var1': {0: 56, 1: 67, 2: 21}}
df2.to_dict()
{'var2': {0: 89, 1: 64, 2: 31}}
df1.index.dtype
dtype('int64')
df2.index.dtype
dtype('int64')
Use:
df1 = df1.set_index('Index')
df2 = df2.set_index('Index')
pd.concat([df1, df2], axis=1)
NOTE: Be sure that Index is a column of the dataframe before calling set_index. (In the min-example above there is no Index column; the default integer indexes already align, so pd.concat([df1, df2], axis=1) alone gives the desired result.)
Output:
var1 var2
Index
0 56 89
1 67 64
2 21 31
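As a minimal sketch of why the NaNs appear: with identical indexes, axis=1 concat aligns row-for-row; the stacked NaN pattern in the question only happens when the two frames carry different index labels, which resetting the index fixes.

```python
import pandas as pd

df1 = pd.DataFrame({'var1': [56, 67, 21]})
df2 = pd.DataFrame({'var2': [89, 64, 31]})

# With identical indexes, axis=1 concat aligns row-for-row:
print(pd.concat([df1, df2], axis=1))

# NaNs appear only when the indexes differ; resetting restores alignment:
df2_shifted = df2.set_index(pd.Index([3, 4, 5]))
aligned = pd.concat([df1, df2_shifted.reset_index(drop=True)], axis=1)
print(aligned)
```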
How would I achieve the desired output shown below? I.e., stack the first three columns underneath each other, stack the second three columns underneath each other, and rename the columns.
d = {'A': [76, 34], 'B': [21, 48], 'C': [45, 89], 'D': [56, 41], 'E': [3, 2],
'F': [78, 32]}
df = pd.DataFrame(data=d)
df.columns=['A', 'A', 'A', 'A', 'A', 'A']
Output
df
A A A A A A
0 76 21 45 56 3 78
1 34 48 89 41 2 32
Desired Output
Z1 Z2
0 76 56
1 34 41
2 21 3
3 48 2
4 45 78
5 89 32
Drop down into NumPy, reshape, and create a new dataframe:
pd.DataFrame(df.to_numpy().reshape((-1, 2), order='F'), columns = ['Z1','Z2'])
Out[19]:
Z1 Z2
0 76 56
1 34 41
2 21 3
3 48 2
4 45 78
5 89 32
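A quick sketch of why order='F' matters here: it walks the original 2x6 block column-by-column, so the first six values (columns A, B, C read top to bottom) land in Z1 and the remaining six in Z2; the default C order would instead walk row-by-row and interleave them.

```python
import pandas as pd

d = {'A': [76, 34], 'B': [21, 48], 'C': [45, 89],
     'D': [56, 41], 'E': [3, 2], 'F': [78, 32]}
df = pd.DataFrame(data=d)

# order='F' reads the source array column-major, filling Z1 with the
# first three original columns and Z2 with the last three:
out = pd.DataFrame(df.to_numpy().reshape((-1, 2), order='F'),
                   columns=['Z1', 'Z2'])
print(out)
```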
I have created a dataframe called df as follows:
import pandas as pd
d = {'feature1': [1, 22,45,78,78], 'feature2': [33, 2,2,65,65], 'feature3': [100, 2,359,87,2],}
df = pd.DataFrame(data=d)
print(df)
The dataframe looks like this:
   feature1  feature2  feature3
0         1        33       100
1        22         2         2
2        45         2       359
3        78        65        87
4        78        65         2
I want to create two new columns called Freq_1 and Freq_2 that count, for each record, how many times the number 1 and the number 2 appear respectively. So, I'd like the resulting dataframe to look like this:
   feature1  feature2  feature3  Freq_1  Freq_2
0         1        33       100       1       0
1        22         2         2       0       2
2        45         2       359       0       1
3        78        65        87       0       0
4        78        65         2       0       1
So, let's take a look at the column called Freq_1:
for the first record, it's equal to 1 because the number 1 appears only once across the whole first record;
for the other records, it's equal to 0 because the number 1 never appears.
Let's take a look now at the column called Freq_2:
for the first record, Freq_2 is equal to 0 because number 2 doesn't appear;
for second record, Freq_2 is equal to 2 because the number 2 appears twice;
and so on ...
How do I create the columns Freq_1 and Freq_2 in pandas?
Try this:
freq = {
    i: df.eq(i).sum(axis=1) for i in range(10)
}
pd.concat([df, pd.DataFrame(freq).add_prefix("Freq_")], axis=1)
Result:
   feature1  feature2  feature3  Freq_0  Freq_1  Freq_2  Freq_3  Freq_4  Freq_5  Freq_6  Freq_7  Freq_8  Freq_9
0         1        33       100       0       1       0       0       0       0       0       0       0       0
1        22         2         2       0       0       2       0       0       0       0       0       0       0
2        45         2       359       0       0       1       0       0       0       0       0       0       0
3        78        65        87       0       0       0       0       0       0       0       0       0       0
4        78        65         2       0       0       1       0       0       0       0       0       0       0
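If only Freq_1 and Freq_2 are needed, the same idea can be restricted to those two values; a small sketch (computing against the original feature columns so the new Freq columns never feed back into the counts):

```python
import pandas as pd

d = {'feature1': [1, 22, 45, 78, 78],
     'feature2': [33, 2, 2, 65, 65],
     'feature3': [100, 2, 359, 87, 2]}
df = pd.DataFrame(data=d)

# df.eq(v) marks exact matches of v; sum(axis=1) counts them per row.
feature_cols = df.columns.tolist()
for v in (1, 2):
    df[f'Freq_{v}'] = df[feature_cols].eq(v).sum(axis=1)
print(df)
```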
String pattern matching can be performed once the columns are cast to string columns.
d = {'feature1': [1, 22,45,78,78], 'feature2': [33, 2,2,65,65], 'feature3': [100, 2,359,87,2],}
df = pd.DataFrame(data=d)
df = df.stack().astype(str).unstack()
Now we can iterate for each pattern that we are looking for:
useful_columns = df.columns
for pattern in ['1', '2']:
    df[f'freq_{pattern}'] = df[useful_columns].stack().str.count(pattern).unstack().max(axis=1)
Printing the output:
feature1 feature2 feature3 freq_1 freq_2
0 1 33 100 1.0 0.0
1 22 2 2 0.0 2.0
2 45 2 359 0.0 1.0
3 78 65 87 0.0 0.0
4 78 65 2 0.0 1.0
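One caveat worth noting (my addition, not part of the answer above): str.count counts substring occurrences inside each cell, and max(axis=1) takes the per-row maximum rather than a total, so this approach can diverge from exact-value counting on other data:

```python
import pandas as pd

df = pd.DataFrame({'a': [122], 'b': [2]}).astype(str)

# Substring counting sees '2' twice inside "122", even though the
# *value* 2 appears only once in the row:
counts = df.stack().str.count('2').unstack()
print(counts.max(axis=1))
```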
We can do
s = df.where(df.isin([1,2])).stack()
out = df.join(pd.crosstab(s.index.get_level_values(0),s).add_prefix('Freq_')).fillna(0)
Out[299]:
feature1 feature2 feature3 Freq_1.0 Freq_2.0
0 1 33 100 1.0 0.0
1 22 2 2 0.0 2.0
2 45 2 359 0.0 1.0
3 78 65 87 0.0 0.0
4 78 65 2 0.0 1.0
I have a julia dataframe that looks like this:
time data
0 34
1 34
2 30
3 37
4 32
5 35
How do I create a new binary column that is 0 if time is less than 2 or greater than 4, and 1 otherwise?
Like this:
time data x
0 34 0
1 34 0
2 30 1
3 37 1
4 32 1
5 35 0
In python, I would do something like:
def func(df):
    if df.time < 2 or df.time > 4:
        return 0
    else:
        return 1

df['x'] = df.apply(func, axis=1)
In Julia we have the beautiful Dot Syntax which can be gracefully applied here:
julia> df[!, :x] = 2 .<= df[!, :time] .<= 4
6-element BitVector:
0
0
1
1
1
0
or alternatively
df.x = 2 .<= df.time .<= 4
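For comparison with the apply-based Python version in the question, a vectorized pandas equivalent of the same chained comparison (a sketch; Series.between is inclusive on both ends by default, matching 2 <= t <= 4):

```python
import pandas as pd

df = pd.DataFrame({'time': [0, 1, 2, 3, 4, 5],
                   'data': [34, 34, 30, 37, 32, 35]})

# between(2, 4) gives a boolean Series; astype(int) turns it into 0/1:
df['x'] = df['time'].between(2, 4).astype(int)
print(df)
```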
Compute the mean of a particular column for each subset of rows in a pandas dataframe, where each subset runs until a 1 appears in the column "Flag", e.g. (54+34+78+91+29)/5 = 57.2 and (81+44+61)/3 = 62.0.
I am currently unable to compute this over rolling subsets of different sizes based on the column condition.
>>> import pandas as pd
>>> df = pd.DataFrame({"Indx": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], "Units": [54, 34, 78, 91, 29, 81, 44, 61, 73, 19], "Flag": [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]})
>>> df
Indx Units Flag
0 1 54 0
1 2 34 0
2 3 78 0
3 4 91 0
4 5 29 1
5 6 81 0
6 7 44 0
7 8 61 1
8 9 73 0
9 10 19 1
# DESIRED OUTPUT
>>> df
Indx Units Flag avg
0 1 54 0 57.2
1 2 34 0 57.2
2 3 78 0 57.2
3 4 91 0 57.2
4 5 29 1 57.2
5 6 81 0 62.0
6 7 44 0 62.0
7 8 61 1 62.0
8 9 73 0 46.0
9 10 19 1 46.0
Create the group key by using cumsum, then transform:
df['Units'].groupby(df.Flag.iloc[::-1].cumsum()).transform('mean')
0 57.2
1 57.2
2 57.2
3 57.2
4 57.2
5 62.0
6 62.0
7 62.0
8 46.0
9 46.0
Name: Units, dtype: float64
To assign it back to the dataframe:
df['new'] = df['Units'].groupby(df.Flag.iloc[::-1].cumsum()).transform('mean')
The shortest solution (I think) is:
df['avg'] = df.groupby(df.Flag[::-1].cumsum()).Units.transform('mean')
You don't even need to use iloc, as df.Flag[::-1] retrieves Flag
column in reversed order.
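A quick sketch of what the reversed cumsum actually builds: reversing before cumsum makes each Flag == 1 row close its group, so every block of rows ending in a 1 shares one key, which groupby then aligns back by index.

```python
import pandas as pd

df = pd.DataFrame({"Indx": range(1, 11),
                   "Units": [54, 34, 78, 91, 29, 81, 44, 61, 73, 19],
                   "Flag": [0, 0, 0, 0, 1, 0, 0, 1, 0, 1]})

# Each Flag == 1 row increments the key *from the bottom up*, so rows
# 0-4 share one key, rows 5-7 another, and rows 8-9 a third:
key = df.Flag[::-1].cumsum()
print(key.sort_index())

df['avg'] = df.groupby(key).Units.transform('mean')
print(df)
```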
I have 2 dataframes df1 and df2
df1;
A B C
0 11 22 55
1 66 34 54
2 0 34 66
df2;
A B C
0 11 33 455
1 0 0 54
2 0 34 766
Both dataframes have the same dimensions. Wherever a value is 0 in df2, I want to set the value at the same row and column in df1 to 0.
So df1 will be
df1;
A B C
0 11 22 55
1 0 0 54
2 0 34 66
Use DataFrame.mask:
df1 = df1.mask(df2 == 0, 0)
For better performance use numpy.where:
import numpy as np

df1 = pd.DataFrame(np.where(df2 == 0, 0, df1),
                   index=df1.index,
                   columns=df1.columns)
print(df1)
A B C
0 11 22 55
1 0 0 54
2 0 34 66
Using where:
df1 = df1.where(df2.ne(0), 0)
print(df1)
A B C
0 11 22 55
1 0 0 54
2 0 34 66
Another way:
df1 = df1[~df2.eq(0)].fillna(0)
Note that this route goes through NaN, so integer columns are upcast to float.
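A small sketch tying the mask, where, and np.where answers together: mask replaces values where the condition is True, where keeps values where the condition is True, and on this data all three give the same frame.

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': [11, 66, 0], 'B': [22, 34, 34], 'C': [55, 54, 66]})
df2 = pd.DataFrame({'A': [11, 0, 0], 'B': [33, 0, 34], 'C': [455, 54, 766]})

# mask: replace where True; where: keep where True; np.where: elementwise pick.
a = df1.mask(df2 == 0, 0)
b = df1.where(df2.ne(0), 0)
c = pd.DataFrame(np.where(df2 == 0, 0, df1),
                 index=df1.index, columns=df1.columns)
print(a.equals(b) and a.equals(c))
```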