Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have the dataframe below and want to count the number of times the value changed over time.
input dataframe:
class date value
A 2019-01-02 80
A 2019-02-02 80
A 2019-03-02 90
A 2019-04-02 20
A 2019-05-02 80
A 2019-06-02 Null
A 2019-06-03 70
A 2019-06-04 70
A 2019-06-05 20
B ...
output dataframe as below:
class count_of_val
A 6
reason: (80,90,20,80, null,70, 20)
IIUC, use:
(df.groupby('class', sort=False)['value']
.apply(lambda x: (x != x.shift()).sum()-1)
.reset_index(name='count_of_val'))
[out]
class count_of_val
0 A 6
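As a cross-check of the logic above, here is a minimal plain-Python sketch (no pandas) that counts value changes per class. The function name count_changes and the use of None to stand in for Null are assumptions made for illustration, and the rows are assumed to be sorted by date within each class already.

```python
def count_changes(rows):
    """Count, per class, how often a value differs from the previous row's value.

    rows: iterable of (class_label, value) pairs, already ordered by date.
    None stands in for Null and is treated as a value of its own.
    """
    counts = {}
    previous = {}
    for label, value in rows:
        counts.setdefault(label, 0)
        # Only compare once we have seen an earlier row for this class.
        if label in previous and value != previous[label]:
            counts[label] += 1
        previous[label] = value
    return counts


rows = [("A", 80), ("A", 80), ("A", 90), ("A", 20), ("A", 80),
        ("A", None), ("A", 70), ("A", 70), ("A", 20)]
print(count_changes(rows))  # {'A': 6}
```

One difference from the pandas version: two consecutive None values compare equal here, whereas NaN != NaN in pandas, so a run of several Nulls would be counted differently.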
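You can use the diff() method of a pandas Series. Note that diff() runs over the whole column (it does not restart at class boundaries) and returns NaN next to a missing value, so this is most reliable on numeric data without Null entries:
import numpy as np
import pandas as pd
df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
df['count_of_val'].loc[df['class'] == 'A'].sum()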
Output is:
6
Or if you like DataFrames:
df['count_of_val'] = np.where(df['value'].diff().bfill() != 0, 1, 0)
desired_class = 'A'
df_count = pd.DataFrame(columns = ['class', 'count_of_val'],
data = [[desired_class, df['count_of_val'].loc[df['class']==desired_class].sum()]])
df_count
Output:
class count_of_val
0 A 6
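Compute the difference between consecutive values within each class:
diff_kernel = np.array([1, -1])
df['change'] = df.groupby('class', as_index=False)['value'].transform(lambda s: np.array(np.convolve(s, diff_kernel, 'same'), dtype=bool))
Then sum the boolean flags per class. With mode 'same', the first element of each group is the first value itself rather than a difference, so it is also flagged whenever that value is non-zero; subtract 1 per class in that case:
change_sum = df.groupby('class')['change'].sum()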
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Can anyone suggest how to sort dates in pandas? I tried a few methods but couldn't get the desired result.
Index Date Confirmed
0 01-01-2020 2
1 01-02-2020 3
2 01-03-2020 1834
3 02-01-2020 23
4 02-02-2020 3
5 02-03-2020 5
First convert the column to datetime using pd.to_datetime, then sort with pd.DataFrame.sort_values, and finally reset the index.
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df = df.sort_values('Date').reset_index(drop=True)
df
Date Confirmed
0 2020-01-01 2
1 2020-01-02 23
2 2020-02-01 3
3 2020-02-02 3
4 2020-03-01 1834
5 2020-03-02 5
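The key step is parsing the strings day-first before sorting; sorting the raw strings would order them by day instead of by date. A minimal standard-library sketch of the same idea, assuming the DD-MM-YYYY format shown in the question (the helper name parse_day_first is hypothetical):

```python
from datetime import datetime

# (Date, Confirmed) pairs copied from the question, dates as DD-MM-YYYY strings.
records = [("01-01-2020", 2), ("01-02-2020", 3), ("01-03-2020", 1834),
           ("02-01-2020", 23), ("02-02-2020", 3), ("02-03-2020", 5)]


def parse_day_first(text):
    """Parse a DD-MM-YYYY string into a date object."""
    return datetime.strptime(text, "%d-%m-%Y").date()


# Sort by the parsed date, not by the original string.
sorted_records = sorted(records, key=lambda record: parse_day_first(record[0]))

for date_text, confirmed in sorted_records:
    print(parse_day_first(date_text).isoformat(), confirmed)
```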
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm very new to coding, so I am here seeking some help. I have the table below, and I need to do the following:
Every time that
[All_SNPs]<[Informative_SNPs]
I need to replace the negative or zero values in [All_SNPs] with the values in [Informative_SNPs]. I have tried with awk, but I can't get my head around this. Thank you if you can help.
Input
ID Informative_SNPs All_SNPs
1 13 0
2 29 -27
3 15 18
4 10 0
5 11 -850
6 25 37
Output
ID Informative_SNPs All_SNPs
1 13 13
2 29 29
3 15 18
4 10 10
5 11 11
6 25 37
awk 'NR>1 && ($3<=0 || $3<$2) {$3=$2}1' file
Output:
ID Informative_SNPs All_SNPs
1 13 13
2 29 29
3 15 18
4 10 10
5 11 11
6 25 37
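For readers less familiar with awk, the rule above can be sketched in Python: skip the header, and on every other line overwrite the third field with the second whenever the third is zero, negative, or smaller than the second. The function name fix_all_snps is hypothetical, and whitespace-separated integer fields are assumed.

```python
def fix_all_snps(lines):
    """Apply the same rule as the awk one-liner to a list of text lines."""
    fixed = []
    for index, line in enumerate(lines):
        fields = line.split()
        # index 0 is the header row, which awk skips via NR>1.
        if index > 0 and len(fields) >= 3:
            informative, all_snps = int(fields[1]), int(fields[2])
            if all_snps <= 0 or all_snps < informative:
                fields[2] = fields[1]
        fixed.append(" ".join(fields))
    return fixed


table = [
    "ID Informative_SNPs All_SNPs",
    "1 13 0",
    "2 29 -27",
    "3 15 18",
    "4 10 0",
    "5 11 -850",
    "6 25 37",
]
print("\n".join(fix_all_snps(table)))
```

Unlike awk, which only rebuilds a line when a field is assigned, this sketch re-joins every line with single spaces.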
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I have a table with 1 column:
id
1
2
3
4
5
6
7
8
9
and I want the following output:
11
22
33
41
52
63
71
82
93
I am using Oracle.
You can try the following, using CASE WHEN:
select case when (id*11)<=33 then (id*11)
when (id*11)>33 and (id*11)<70 then (id*11)-3
else (id*11)-6 end as col
from your_table
Output:
COL
11
22
33
41
52
63
71
82
93
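The CASE expression can be checked quickly outside the database. The sketch below mirrors it in Python and compares it with a closed-form alternative that is not part of the answer: judging only from the nine sample rows, the tens digit is the id and the units digit cycles through 1, 2, 3. Both function names are hypothetical, and the pattern is assumed to hold only for ids 1 to 9.

```python
def case_expression(n):
    """Mirror of the SQL CASE expression from the answer."""
    value = n * 11
    if value <= 33:
        return value
    if value < 70:
        return value - 3
    return value - 6


def digit_formula(n):
    """Alternative: tens digit is the id, units digit cycles 1, 2, 3."""
    return n * 10 + (n - 1) % 3 + 1


for n in range(1, 10):
    print(n, case_expression(n), digit_formula(n))
```

In Oracle the second form would presumably be written with MOD, but that is left untested here.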
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Suppose there are four entries in a table, as shown below. I want only rows 1 and 2 in the result set.
Rows 3 and 4, which are the same as rows 1 and 2 with Col1 and Col2 swapped, should be excluded. Please suggest a query for that.
Pk Col1 Col2 Col3 Col4
1 A B 20 30
2 E D 40 50
3 B A 20 30
4 D E 40 50
WHERE Col1 < Col2 immediately comes to mind. Actually, that would give you rows 1 and 4, but I presume that's good enough for your purposes.
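If exactly rows 1 and 2 are required, one option is to keep the first row seen for each unordered (Col1, Col2) pair. This is a different technique from the WHERE filter above, sketched here in Python under the assumption that "first" means lowest Pk and that Col3 and Col4 must also match; the function name drop_reversed_pairs is hypothetical.

```python
def drop_reversed_pairs(rows):
    """Keep the first row for each unordered (Col1, Col2) pair with equal Col3/Col4."""
    seen = set()
    kept = []
    for row in rows:
        pk, col1, col2, col3, col4 = row
        # frozenset ignores order, so (A, B) and (B, A) produce the same key.
        key = (frozenset((col1, col2)), col3, col4)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    (1, "A", "B", 20, 30),
    (2, "E", "D", 40, 50),
    (3, "B", "A", 20, 30),
    (4, "D", "E", 40, 50),
]
print(drop_reversed_pairs(rows))
```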
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am getting records as below:
PERIOD  LABEL1  LABEL2  LABEL3  LABEL4
--------------------------------------
1       12
1               14
1                       11
2       10
2               09
and so on..
I want it like below:
PERIOD LABEL1 LABEL2 LABEL3 LABEL4
-----------------------------------
1 12 14 11
2 10 09
I hope it's clear.
If you only have positive values, you can use a mix of nvl and max:
select period,
max(nvl(label1, 0)) label1,
max(nvl(label2, 0)) label2,
max(nvl(label3, 0)) label3,
max(nvl(label4, 0)) label4
from my_table
group by period;
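The grouping idea can be sketched in Python as well: for each period, take the largest non-null value seen in each label column. The column placement of the sample values is an assumption (the question's layout does not show it unambiguously), None stands in for NULL, and collapse_by_period is a hypothetical name. Unlike the nvl version, this sketch leaves empty columns as None instead of 0.

```python
def collapse_by_period(rows):
    """Merge rows sharing a period, keeping the max non-null value per column."""
    merged = {}
    for period, *labels in rows:
        current = merged.setdefault(period, [None] * len(labels))
        for i, value in enumerate(labels):
            if value is not None and (current[i] is None or value > current[i]):
                current[i] = value
    return merged


# Assumed layout: each input row fills exactly one label column.
rows = [
    (1, 12, None, None, None),
    (1, None, 14, None, None),
    (1, None, None, 11, None),
    (2, 10, None, None, None),
    (2, None, 9, None, None),
]
for period, labels in collapse_by_period(rows).items():
    print(period, labels)
```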