Replace a value in a column with 0 if it matches the value in the previous row - excel-2007

I have values in column A; please give me a formula that produces the values in column B.
COLUMN "a"
----------
NNII
NNII
NNII
NKJE
NNII
BLFL
BLFL
NKD#54
NKD#54
NKD#54
NKD#54
LWEI
LWEI
LWEI
LWEI
LWEI
LWEI
LWEI
LWEI
COLUMN "b"
----------
NNII
0
0
NKJE
NNII
BLFL
0
NKD#54
0
0
0
LWEI
0
0
0
0
0
0
0

I think I understand what you want. Enter =A1 in B1, then put this formula in B2 and fill it down:
=IF(A2=A1,0,A2)
It returns 0 when the value in column A matches the row above; otherwise it returns the value itself, which matches your column "b":
A      B
NNII   NNII
NNII   0
NKJE   NKJE
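The compare-with-previous-row rule can also be sanity-checked outside Excel; here is a minimal Python sketch of the same logic (the sample values are a prefix of the column in the question):

```python
# Replace a value with 0 when it equals the value in the row above;
# the first row is always kept as-is.
values = ["NNII", "NNII", "NNII", "NKJE", "NNII", "BLFL", "BLFL"]

result = [
    v if i == 0 or v != values[i - 1] else 0
    for i, v in enumerate(values)
]
print(result)  # ['NNII', 0, 0, 'NKJE', 'NNII', 'BLFL', 0]
```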

Related

pandas creating new columns for each value in categorical columns

I have a pandas dataframe with some numeric and some categorical columns. I want to create a new column for each value of every categorical column, with a value of 1 in every row where that value is present and 0 in every row where it is not. So the df is something like this -
col1 col2 col3
A P 1
B P 3
A Q 7
expected result is something like this:
col1 col2 col3 A B P Q
A P 1 1 0 1 0
B P 3 0 1 1 0
A Q 7 1 0 0 1
Is this possible? Can someone please help me?
Use df.select_dtypes, pd.get_dummies with pd.concat:
# First select all columns which have object dtypes
In [1]: categorical_cols = df.select_dtypes('object').columns
# Create one-hot encoding for the above cols and concat with df
In [2]: out = pd.concat([df, pd.get_dummies(df[categorical_cols])], axis=1)
In [3]: out
Out[3]:
col1 col2 col3 col1_A col1_B col2_P col2_Q
0 A P 1 1 0 1 0
1 B P 3 0 1 1 0
2 A Q 7 1 0 0 1
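A self-contained version of the same approach, built from the sample frame in the question (note that get_dummies prefixes the new columns with the source column name by default, so you get col1_A rather than A):

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "B", "A"],
                   "col2": ["P", "P", "Q"],
                   "col3": [1, 3, 7]})

# Select the categorical (object-dtype) columns.
categorical_cols = df.select_dtypes("object").columns

# One-hot encode them and concatenate back onto the original frame.
out = pd.concat([df, pd.get_dummies(df[categorical_cols])], axis=1)
print(out)
```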

Trying to get subtotals from a pandas dataframe

I'm doing cross-tabulation between two columns in the dataframe. Here's a sample from the columns:
column_1 column_2
A -8
B 95
A -93
D 11
C -62
D -14
A -55
C 66
B 76
D -49
I'm looking for code that returns subtotals for A, B, C, and D. For instance, for A the subtotal will be -156 (-8 - 93 - 55 = -156).
I tried to do that with pandas.crosstab() function:
pandas.crosstab(df[column_1], df[column_2], margins=True, margins_name=column_1).Total
Here's a sample of the output:
-271 -263 -241 -223 -221 -212 -207 -201 ... sum_column
A 1 0 1 0 0 1 0 0 ... ##
B 0 0 0 1 0 0 0 0 ... ##
C 0 0 0 0 1 0 0 1 ... ##
D 0 1 0 0 0 0 1 0 ... ##
The sum column consists of the sums of the boolean values in each row, instead of the subtotals for each of the four letters. I saw once that a boolean table can be used for calculations, but I'm quite sure that by changing the pandas.crosstab() command the desired output can be achieved.
I'd be happy to get some ideas and thoughts from you.
Thanks.
If you'd simply like the totals by the individual categories in column_1 (A, B, C, D), maybe a groupby and summation could be helpful! You would call the groupby on the column with your categories, and then call sum on the result, like this:
df.groupby('column_1').sum()
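Using the sample data from the question, a minimal runnable sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "column_1": ["A", "B", "A", "D", "C", "D", "A", "C", "B", "D"],
    "column_2": [-8, 95, -93, 11, -62, -14, -55, 66, 76, -49],
})

# Group the rows by letter and sum each group's values.
subtotals = df.groupby("column_1")["column_2"].sum()
print(subtotals)
# A   -156
# B    171
# C      4
# D    -52
```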

Comparing two columns: if they match, print the value in a new column and if they do not match print the value of the second column to the new column

I have a file with multiple columns. I want to compare A1 ($4) and A2 ($14), and if the values do not match, print the value of A2 ($14). If the values match, I want to print the value of A1 ($15).
File:
chr SNP BP A1 TEST N OR Z P chr SNP cm BP A2 A1
20 rs6078030 61098 T ADD 421838 0.9945 -0.209 0.8344 20 rs6078030 0 61098 C T
20 rs143291093 61270 G ADD 422879 1.046 0.5966 0.5508 20 rs143291093 0 61270 G A
20 rs4814683 61795 T ADD 417687 1.015 0.6357 0.525 20 rs4814683 0 61795 G T
Desired output:
chr SNP BP A1 TEST N OR Z P chr SNP cm BP A2 A1 noneff
20 rs6078030 61098 T ADD 421838 0.9945 -0.209 0.8344 20 rs6078030 0 61098 C T C
20 rs143291093 61270 G ADD 422879 1.046 0.5966 0.5508 20 rs143291093 0 61270 G A A
20 rs4814683 61795 T ADD 417687 1.015 0.6357 0.525 20 rs4814683 0 61795 G T G
I checked the difference between column 4 and 15 first.
awk '$4!=$15{print $4,$15}' file > diff
Then I tried to write the if-else statement:
awk '{if($4=$14) print $16=$14 ; else print $16=$15}' file > new_file
Try this:
awk 'NR==1{$(++NF)="noneff"}NR>1{$(++NF)=($4==$14)?$15:$14}1' so1186.txt
Output:
awk 'NR==1{$(++NF)="noneff"}NR>1{$(++NF)=($4==$14)?$15:$14}1' so1186.txt | column -t
chr SNP BP A1 TEST N OR Z P chr SNP cm BP A2 A1 noneff
20 rs6078030 61098 T ADD 421838 0.9945 -0.209 0.8344 20 rs6078030 0 61098 C T C
20 rs143291093 61270 G ADD 422879 1.046 0.5966 0.5508 20 rs143291093 0 61270 G A A
20 rs4814683 61795 T ADD 417687 1.015 0.6357 0.525 20 rs4814683 0 61795 G T G
awk '{$(++NF)=(NR==1)?"noneff":($4==$14)?$15:$14}1' file
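The same conditional column can be built in pandas, too; a sketch using two sample rows from the question, with columns selected by position since the header repeats names (chr, SNP, BP, A1 each appear twice):

```python
import pandas as pd

# Two data rows from the question; fields are whitespace-separated.
rows = [
    ["20", "rs6078030", "61098", "T", "ADD", "421838", "0.9945", "-0.209",
     "0.8344", "20", "rs6078030", "0", "61098", "C", "T"],
    ["20", "rs143291093", "61270", "G", "ADD", "422879", "1.046", "0.5966",
     "0.5508", "20", "rs143291093", "0", "61270", "G", "A"],
]
df = pd.DataFrame(rows)

a1_first, a2, a1_second = df[3], df[13], df[14]  # awk's $4, $14, $15

# If A1 ($4) and A2 ($14) match, take the second A1 ($15); otherwise A2.
df["noneff"] = a2.where(a1_first != a2, a1_second)
print(df["noneff"].tolist())  # ['C', 'A']
```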

efficiently aggregate multi-row data in a single row representation in pandas dataframe

I have onehot coded pandas dataframe like following
p c1 c2 c3
A 1 0 0
B 1 0 0
A 0 1 0
A 0 0 1
B 0 0 1
I want to collapse the rows for each value of p into a single row, combining the 1s from all of its rows, as follows
desired output
p c1 c2 c3
A 1 1 1
B 1 0 1
Like this:
In [463]: df.groupby('p').sum().reset_index()
Out[463]:
p c1 c2 c3
0 A 1 1 1
1 B 1 0 1
s = df.set_index('p').stack()
df = s[s.eq(1)].unstack().fillna(0).astype(int).reset_index()
df.shape  # (2, 4)
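A runnable version of the groupby approach; using max() instead of sum() keeps the result 0/1 even if the same (p, column) pair is flagged more than once (with this sample data, either works):

```python
import pandas as pd

df = pd.DataFrame({"p":  ["A", "B", "A", "A", "B"],
                   "c1": [1, 1, 0, 0, 0],
                   "c2": [0, 0, 1, 0, 0],
                   "c3": [0, 0, 0, 1, 1]})

# Collapse each group into one row; max acts as a logical OR on 0/1 flags.
out = df.groupby("p").max().reset_index()
print(out)
#    p  c1  c2  c3
# 0  A   1   1   1
# 1  B   1   0   1
```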

fill null values with conditional statement using .shift()

I want to fill null values of a column based on values in another column.
A B
1 21
0 21
0 21
1 25
1 28
0 28
My B value increases only when the A value is 1. I have some null values in column A, like this:
A B
1 21
0 21
NAN 21
1 25
1 28
0 28
I want to fill this null value with 0 because the corresponding value of B didn't increase.
df['A'] = np.where((df['A'].isnull()) & (df['B'] ==df['B'].shift()),0,df['A'])
This isn't giving the correct results. Where am I going wrong?
loc might work better here. Note that df['A'] == np.nan is always False (NaN never compares equal to anything, including itself), so use .isna() instead; and shift() with its default of 1 compares each B with the previous row, which is what the "B only increases when A is 1" rule calls for:
df.loc[df['A'].isna() & (df['B'] == df['B'].shift()), 'A'] = 0
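Putting it together with the sample data from the question (using .isna() rather than comparing to np.nan, and the default shift(1) to look at the previous row):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 0, np.nan, 1, 1, 0],
                   "B": [21, 21, 21, 25, 28, 28]})

# Where A is NaN and B did not increase from the previous row, fill with 0.
df.loc[df["A"].isna() & (df["B"] == df["B"].shift()), "A"] = 0
print(df["A"].tolist())  # [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
```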