SQL select filter column based on another column

I've got some data like:
val  chr1  chr2
1    a     x1
2    a     y2
3    a     z3
4    b     x1
5    b     y2
6    b     z3
I want to select the data so that, in the result, if chr1 = 'a' then chr2 only has x1; otherwise don't filter chr2. I.e.:
val  chr1  chr2
1    a     x1
4    b     x1
5    b     y2
6    b     z3
Due to a platform I'm forced to use, I have the restriction that I can only filter in the WHERE clause of the query, on Redshift.

You may use the following logic:
SELECT *
FROM yourTable
WHERE chr1 <> 'a' OR chr2 = 'x1';
So a whitelisted record is one which does not have chr1 = 'a', or one which does have that value but also has chr2 = 'x1'.
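One side note, as an assumption beyond the sample data: if chr1 can ever be NULL, chr1 <> 'a' evaluates to NULL for those rows and they would be filtered out. A NULL-safe sketch of the same WHERE-only logic:
SELECT *
FROM yourTable
WHERE chr1 <> 'a' OR chr1 IS NULL OR chr2 = 'x1';
-- keeps rows with NULL chr1; drop the IS NULL test if chr1 is declared NOT NULL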

Related

Replace NaN inside a Column by another Dataframe using a Condition

I have a DataFrame like the one below, where NaN appears only in column 'Type 2':
id  Type 1  Type 2
0   a       b
1   c       d
2   e       NaN
3   g       f
4   i       h
5   j       k
6   l       NaN
7   m       NaN
8   o       p
9   x       y
9   z       NaN
And another, ordered DataFrame like the one below:
id   Type 1  Type 2
0    a       o1
1    b       o2
2    c       o3
3    d       o4
...  ...     ...
23   x       o24
24   y       o25
25   z       o26
I want to fill the NaN values in column 'Type 2' with the second DataFrame's 'Type 2' value for the corresponding 'Type 1'.
I know I should use fillna(), but I do not know how to add conditions like the ones I wrote above.
Thank you
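A minimal sketch of one way to do this with fillna plus map, assuming the first frame is called df and the ordered one lookup (both names are placeholders, not from the question):
import pandas as pd

# Build a Type 1 -> Type 2 mapping from the ordered frame (placeholder name: lookup);
# this assumes 'Type 1' values are unique there, as in the sample.
mapping = lookup.set_index('Type 1')['Type 2']

# Fill the gaps in 'Type 2' with the mapped value of each row's 'Type 1';
# fillna aligns the mapped Series on the index, so only NaN cells change.
df['Type 2'] = df['Type 2'].fillna(df['Type 1'].map(mapping))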

How to replace missing values in a dataframe with an equation in Python

I have the table below, where the missing values in columns Bird1 and Bird2 must be replaced by the result of the linear equation Y(X) = aX + b, where "a" and "b" are constants.
Bird1  Bird2  Bird3
22     33     X0
NaN    4      X1
3      NaN    X2
1      NaN    X3
The result should be as per the table below. How can I implement this in Python?
Bird1   Bird2   Bird3
22      33      X0
aX1+b   4       X1
3       aX2+b   X2
1       aX3+b   X3
Here's a way to do it using pandas.DataFrame.fillna:
# define the constants a and b, e.g.
a = 10
b = 5
# fill the missing Bird1/Bird2 values with a*Bird3 + b
# (this assumes Bird3 holds the numeric X values)
df.Bird1.fillna(a*df.Bird3 + b, inplace=True)
df.Bird2.fillna(a*df.Bird3 + b, inplace=True)
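On recent pandas versions, inplace fillna on a single column is discouraged (it can act on a copy rather than the frame); plain assignment expresses the same thing. A sketch, still assuming Bird3 holds the numeric X values:
# Same logic via assignment, avoiding inplace on a column view:
df['Bird1'] = df['Bird1'].fillna(a * df['Bird3'] + b)
df['Bird2'] = df['Bird2'].fillna(a * df['Bird3'] + b)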

How to selectively retrieve specific and nearby rows from a file using a reference list? [closed]

I have two table arrays; here's file A:
k1 A 1
k1 A 2
k1 B 1
k1 B 2
k1 B 3
k1 B 4
k1 B 5
k1 B 6
k1 B 7
k1 B 8
k1 B 9
k1 V 1
k1 V 2
k1 V 3
k1 V 4
k1 V 5
k1 V 6
k1 S 1
k1 S 2
And a subset of the first array (say file B):
k1 A 2
k1 B 5
k1 V 2
k1 S 1
I want to selectively retain the rows of file B from file A, and also extract the nearby rows, +/- 3 according to the values in column 3.
My expected output is:
k1 A 1
k1 A 2
k1 B 2
k1 B 3
k1 B 4
k1 B 5
k1 B 6
k1 B 7
k1 B 8
k1 V 1
k1 V 2
k1 V 3
k1 V 4
k1 V 5
k1 S 1
k1 S 2
Any suggestions on how it could be achieved? Thank you so much!
This awk one-liner does the job:
awk '{k=$1 FS $2}                 # composite key: columns 1 and 2
     NR==FNR{a[k]=$3;next}        # first file (B): remember its column 3 per key
     a[k]>=$3-3 && a[k]<=$3+3' B A
For the given input example (A & B), it will give you the expected output.
The logic is straightforward: while reading A (the second file), a row is kept when its column 3 is within +/- 3 of the value stored from B for the same key.
P.S.
You tagged the question with grep as well, but grep is not the right tool for this job.
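One caveat, which only matters for inputs beyond the sample (an assumption on my part): for a key of A that is missing from B, a[k] compares as 0, so rows with small column-3 values could slip through. Guarding with k in a makes the filter explicit:
# Only test the range when the key was actually seen in B:
awk '{k=$1 FS $2}
     NR==FNR{a[k]=$3;next}
     (k in a) && a[k]>=$3-3 && a[k]<=$3+3' B A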

Pandas divide column by index mean

I have a pandas DataFrame with two index levels, and I want to divide each value by the column average for the second index level (A, B).
For example input df
col1 col2
0 A 1 20
1 A 2 10
2 A 1 10
4 A 4 5
5 B 6 15
6 B 2 50
So for col1, I will divide rows 0A, 1A, 2A (and 4A) by 2, because the average of 1, 2, 1, 4 is 2.
col1
0 A 0.5
1 A 1
2 A 0.5
4 A 2
5 B 1.5
6 B 0.5
Can anyone see a good way of doing this?
IIUC, try:
df.groupby(level=1)['col1'].apply(lambda x: x/x.mean())
Better, without apply, is:
df.col1/df.groupby(level=1)['col1'].transform('mean')
Output
0 A 0.5
1 A 1.0
2 A 0.5
4 A 2.0
5 B 1.5
6 B 0.5
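For anyone who wants to verify, here is a self-contained reproduction of the frame above (the index level names are not given in the question, so plain positions are assumed):
import pandas as pd

# Rebuild the sample MultiIndex frame shown in the question.
idx = pd.MultiIndex.from_tuples(
    [(0, 'A'), (1, 'A'), (2, 'A'), (4, 'A'), (5, 'B'), (6, 'B')])
df = pd.DataFrame({'col1': [1, 2, 1, 4, 6, 2],
                   'col2': [20, 10, 10, 5, 15, 50]}, index=idx)

# Divide each value by the mean of its group (second index level).
print(df['col1'] / df.groupby(level=1)['col1'].transform('mean'))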

Pandas Group then Shift Column and keep last row

I want to group by column idx, then shift column val and keep the last row of each idx group.
import pandas as pd
df = pd.DataFrame({'idx': ['a', 'a', 'b', 'b'],
                   'val': ['a1', 'a2', 'b1', 'b2']})
df
idx val
0 a a1
1 a a2
2 b b1
3 b b2
I tried df['val_shift'] = df.groupby('idx').val.shift(1)
idx val val_shift
0 a a1 NaN
1 a a2 a1
2 b b1 NaN
3 b b2 b1
But I want:
idx val
0 a NaN
1 a a1
2 a a2
3 b NaN
4 b b1
5 b b2
Is there any way to get this?
I believe you need to concat the last rows, extracted by drop_duplicates, with changed index values first for correct ordering, because shift always removes the last value here:
df1 = df.drop_duplicates('idx', keep='last')
df1.index += .5
df = pd.concat([df, df1]).sort_index().reset_index(drop=True)
Alternative solution:
df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)
df['val_shift'] = df.groupby('idx').val.shift(1)
print (df)
idx val val_shift
0 a a1 NaN
1 a a2 a1
2 a a2 a2
3 b b1 NaN
4 b b2 b1
5 b b2 b2
If you want to remove val after the shift, use pop with syntactic sugar - grouping by the Series df['idx']:
df['val_shift'] = df.pop('val').groupby(df['idx']).shift(1)
print (df)
idx val_shift
0 a NaN
1 a a1
2 a a2
3 b NaN
4 b b1
5 b b2
It looks to me like you're just shoving an empty dataframe in front of each group, where only 'idx' is populated.
pd.concat([
    d[['idx']].head(1).append(d)
    for _, d in df.groupby('idx')
], ignore_index=True)
idx val
0 a NaN
1 a a1
2 a a2
3 b NaN
4 b b1
5 b b2
Alternative
df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)
Using concat with tail:
newdf=pd.concat([df,df.groupby('idx').tail(1)])
newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
newdf
Out[885]:
idx val
0 a NaN
1 a a1
1 a a2
2 b NaN
3 b b1
3 b b2
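Note that sort_index here keeps duplicated index labels (0, 1, 1, 2, 3, 3). If a clean 0..n-1 index is wanted, as in the other answers, one extra step would do it:
# Assumption: the duplicate index labels above are unwanted.
newdf = newdf.reset_index(drop=True)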