Pandas drop_duplicates with multiple conditions

I have some measurement data that needs to be filtered. I read it into a DataFrame like this:
df
   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106
I need to apply two conditions at the same time: drop rows that are duplicates on ('RequestTime', 'RequestID') and rows that are duplicates on ('ResponseTime', 'ResponseID'), using drop_duplicates(subset=...). I used the following commands to get the filtered result for each condition separately:
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])
   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
7           30         18            29         106
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])
   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
3           25         16            97         104
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
But how can I combine the two conditions so that both row 3 and row 7 are dropped?

IIUC,
m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]
Output:
   RequestTime  RequestID  ResponseTime  ResponseID
0          150         14           103         101
1          150         15           110         102
2           25         16           121         103
4           22         16            44         105
5           19         17            44         106
6           26         18            29         106
This creates a mask (a boolean Series) that is True only for rows that are not duplicates under either subset, and uses it to boolean-index the DataFrame.
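Or chain the methods. Note that this can differ from the mask in edge cases: the second drop_duplicates only sees the rows that survived the first, so a row whose earlier Response duplicate was already removed by the Request filter is kept: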
df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
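A minimal, self-contained sketch that rebuilds the question's frame and runs both approaches side by side. The second frame, edge, is a hypothetical example (not from the question) built so that the two approaches disagree, because the chained version only compares Response columns among rows that survived the Request filter.

```python
import pandas as pd

# The question's data.
df = pd.DataFrame({
    "RequestTime":  [150, 150, 25, 25, 22, 19, 26, 30],
    "RequestID":    [14, 15, 16, 16, 16, 17, 18, 18],
    "ResponseTime": [103, 110, 121, 97, 44, 44, 29, 29],
    "ResponseID":   [101, 102, 103, 104, 105, 106, 106, 106],
})

req = ["RequestTime", "RequestID"]
resp = ["ResponseTime", "ResponseID"]

# Mask approach: drop a row if it repeats an earlier row (in the ORIGINAL
# frame) on either key pair.
mask = ~(df.duplicated(subset=req) | df.duplicated(subset=resp))
by_mask = df[mask]

# Chained approach: the second call only sees rows kept by the first.
by_chain = df.drop_duplicates(subset=req).drop_duplicates(subset=resp)

# Hypothetical edge case: row 1 duplicates row 0 on the Request pair and
# row 2 duplicates row 1 on the Response pair.
edge = pd.DataFrame({
    "RequestTime":  [1, 1, 2],
    "RequestID":    [1, 1, 2],
    "ResponseTime": [9, 8, 8],
    "ResponseID":   [9, 8, 8],
})
edge_mask = edge[~(edge.duplicated(subset=req) | edge.duplicated(subset=resp))]
# Row 1 is removed first, so row 2 no longer has an earlier Response twin.
edge_chain = edge.drop_duplicates(subset=req).drop_duplicates(subset=resp)

print(by_mask)
print(edge_mask.index.tolist(), edge_chain.index.tolist())
```

On the question's data both approaches drop rows 3 and 7; on the edge frame the mask keeps only row 0 while the chain keeps rows 0 and 2. Which behaviour is "right" depends on whether duplicates should be judged against the original data or against what has already been filtered.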

Related

Re-arrange and plot data in pandas

I have a data frame like the following:
days movements count
0 0 0 2777
1 0 1 51
2 0 2 2
3 1 0 6279
4 1 1 200
5 1 2 7
6 1 3 3
7 2 0 5609
8 2 1 110
9 2 2 32
10 2 3 4
11 3 0 4109
12 3 1 118
13 3 2 101
14 3 3 8
15 3 4 3
16 3 6 1
17 4 0 3034
18 4 1 129
19 4 2 109
20 4 3 6
21 4 4 2
22 4 5 2
23 5 0 2288
24 5 1 131
25 5 2 131
26 5 3 9
27 5 4 2
28 5 5 1
29 6 0 1918
30 6 1 139
31 6 2 109
32 6 3 13
33 6 4 1
34 6 5 1
35 7 0 1442
36 7 1 109
37 7 2 153
38 7 3 13
39 7 4 10
40 7 5 1
41 8 0 1085
42 8 1 76
43 8 2 111
44 8 3 13
45 8 4 7
46 8 7 1
47 9 0 845
48 9 1 81
49 9 2 86
50 9 3 8
51 9 4 8
52 10 0 646
53 10 1 70
54 10 2 83
55 10 3 1
56 10 4 2
57 10 5 1
58 10 6 1
This shows that, for example, on day 0 I have 2777 entries with 0 movements, 51 entries with 1 movement and 2 entries with 2 movements. I want to plot a bar graph for every day showing the entry count for each number of movements. To do that, I thought I would transform the data into something like the table below and then plot a bar graph.
days     0     1     2     3     4     5     6     7
0     2777    51     2
1     6279   200     7     3
2     5609   110    32     4
3     4109   118   101     8     3           1
4     3034   129   109     6     2     2
5     2288   131   131     9     2     1
6     1918   139   109    13     1     1
7     1442   109   153    13    10     1
8     1085    76   111    13     7                 1
9      845    81    86     8     8
10     646    70    83     1     2     1     1
I don't know how to achieve this. I have thousands of rows, so doing it by hand is not an option. Can someone show me how to rearrange the data, or, even better, a quick way to plot the bar graph with matplotlib directly from the original DataFrame? Thanks for the help.
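One way to get that wide shape, sketched under the assumption that each (days, movements) pair occurs only once (DataFrame.pivot raises a ValueError on repeats; pivot_table with an aggfunc and fill_value would handle them): pivot movements into columns, fill the gaps with 0, and let pandas draw a grouped bar chart. Only the first three days of the data are used here; the helper name plot_grouped_bars and the log scale (counts range from 1 to several thousand) are illustrative choices, not part of the question.

```python
import pandas as pd

# First three days of the question's data.
df = pd.DataFrame({
    "days":      [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "movements": [0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3],
    "count":     [2777, 51, 2, 6279, 200, 7, 3, 5609, 110, 32, 4],
})

# One row per day, one column per movement value; missing pairs become 0.
wide = (
    df.pivot(index="days", columns="movements", values="count")
      .fillna(0)
      .astype(int)
)
print(wide)


def plot_grouped_bars(table: pd.DataFrame):
    """Hypothetical helper: one group of bars per day, one bar per movement."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    ax = table.plot(kind="bar", logy=True)  # log scale so small counts stay visible
    ax.set_xlabel("day")
    ax.set_ylabel("entries")
    ax.legend(title="movements")
    plt.tight_layout()
    return ax
```

Calling plot_grouped_bars(wide) followed by matplotlib's plt.show() should display the chart; with the full data, the same pivot yields the 11-day table from the question.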

How can I give a numeric order to each line based on a unique ID?

I'm working on a big data set. I need to assign and print a running number for each line within its unique ID ($1), and I want to delete the lines whose number exceeds 335 for each ID.
The data looks like this:
101 24
101 13
101 15
102 25
102 21
102 23
103 20
103 12
103 18
The desired output looks like this:
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
Try the following:
Input
$ cat f
101 24
101 13
101 15
102 25
102 21
102 23
103 20
103 12
103 18
Output
$ awk '{print $0,++a[$1]}' f
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
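If the data is sorted by column 1, use the following instead; it is faster and uses less memory because it keeps a single counter rather than an array entry for every ID: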
$ awk '$1!=p{n=0}{print $0,++n; p=$1}' f
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
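For readers doing this in Python rather than awk, here is a sketch of the sorted-input idea using itertools.groupby. Like the counter-reset awk command, it restarts the numbering whenever the ID changes, so unsorted input gets restarted numbering. The function name number_sorted is hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def number_sorted(lines: Iterable[str]) -> Iterator[str]:
    """Append a per-ID running number, assuming lines are grouped by ID."""
    rows = (line.rstrip("\n") for line in lines if line.strip())
    # groupby merges only CONSECUTIVE lines with the same first field.
    for _id, group in groupby(rows, key=lambda line: line.split()[0]):
        for n, line in enumerate(group, start=1):
            yield f"{line} {n}"


sample = ["101 24", "101 13", "101 15", "102 25", "102 21", "102 23"]
for out in number_sorted(sample):
    print(out)
```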
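To keep only the first 335 lines of each ID (dropping every line numbered above 335). The first command again assumes input sorted by column 1; the second works on any order:
$ awk '$1!=p{n=0; p=$1}++n<=335{print $0,n}' f
$ awk '++a[$1]<=335{print $0,a[$1]}' f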
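A Python sketch of the array-based variant with a cap: a dictionary keeps one counter per ID, so the input does not need to be sorted, and only lines whose per-ID number is at most limit (335 by default) are emitted. number_and_cap is a hypothetical name, and blank lines are skipped here, which awk would not do.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator


def number_and_cap(lines: Iterable[str], limit: int = 335) -> Iterator[str]:
    """Append a per-ID running number and drop lines numbered above limit."""
    counts: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines (awk would count them under an empty key)
            continue
        key = fields[0]
        counts[key] += 1
        if counts[key] <= limit:
            yield f"{line.rstrip()} {counts[key]}"


sample = ["101 24", "101 13", "101 15", "102 25", "102 21", "102 23"]
for out in number_and_cap(sample, limit=2):
    print(out)
```

To process the answer's file, iterate over open("f") in the same way; only the counters for IDs seen so far are kept in memory.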

SQL - return the smallest value of one column for each value of another column

How can I return, in a fourth column, the smallest col3 value for each col1 value? The columns below are col1, col2 and col3:
1 100 1
2 100 1
1 101 2
1 102 4
2 200 19
3 200 19
16 200 19
18 200 19
19 200 19
20 200 19
3 301 28
6 301 28
3 302 29
3 310 30
4 400 31
4 410 32
4 420 33
5 500 34
7 500 34
5 510 35
6 510 35
5 520 36
6 610 37
7 700 38
7 701 39
8 701 39
8 800 40
8 802 41
Thank you!
Join to a subquery that calculates the minimums for each col1:
select a.col1, a.col2, a.col3, mcol3
from mytable a
join (select col1, min(col3) mcol3 from mytable group by col1) b
on b.col1 = a.col1
See SQLFiddle, showing this output from your sample data:
1 100 1 1
2 100 1 1
1 101 2 1
1 102 4 1
2 200 19 1
3 200 19 19
16 200 19 19
18 200 19 19
19 200 19 19
20 200 19 19
3 301 28 19
6 301 28 28
3 302 29 19
3 310 30 19
4 400 31 31
4 410 32 31
4 420 33 31
5 500 34 34
7 500 34 34
5 510 35 34
6 510 35 28
5 520 36 34
6 610 37 28
7 700 38 34
7 701 39 34
8 701 39 39
8 800 40 39
8 802 41 39
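To try the join locally without SQLFiddle, here is a sketch using Python's built-in sqlite3 module with the question's sample data. The table name mytable and the columns col1 to col3 follow the answer; the ORDER BY a.rowid is an addition that only keeps the rows in insertion order, since a join alone does not guarantee any order.

```python
import sqlite3

rows = [
    (1, 100, 1), (2, 100, 1), (1, 101, 2), (1, 102, 4), (2, 200, 19),
    (3, 200, 19), (16, 200, 19), (18, 200, 19), (19, 200, 19), (20, 200, 19),
    (3, 301, 28), (6, 301, 28), (3, 302, 29), (3, 310, 30), (4, 400, 31),
    (4, 410, 32), (4, 420, 33), (5, 500, 34), (7, 500, 34), (5, 510, 35),
    (6, 510, 35), (5, 520, 36), (6, 610, 37), (7, 700, 38), (7, 701, 39),
    (8, 701, 39), (8, 800, 40), (8, 802, 41),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# The answer's query: join every row to the per-col1 minimum of col3.
query = """
select a.col1, a.col2, a.col3, mcol3
from mytable a
join (select col1, min(col3) mcol3 from mytable group by col1) b
  on b.col1 = a.col1
order by a.rowid
"""
result = conn.execute(query).fetchall()
for row in result:
    print(*row)
conn.close()
```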

How to group result of a group by in the same column (Oracle)

I have a query that returns "RESULT 1".
It does a GROUP BY on X, Y, Z and W with a SUM of VALUE, but I want the W/VALUE pairs for each X, Y, Z combination laid out side by side in a single row, as in "RESULT 2".
Is there an efficient way to do that in Oracle?
RESULT 1:
X Y Z W VALUE
-- ----------------- -------------------------- -------------------- ----------
45 18 1 101 1,12
45 18 1 104 1,12
45 18 1 137 2,58
45 18 1 216 6,06
45 18 1 218 5,9
45 18 1 223 7,08
45 18 1 302 4,86
45 18 1 303 4,68
45 18 11 101 9,38
45 18 11 104 9,38
45 18 11 201 9,38
45 18 13 118 9,21
45 18 13 137 2,69
45 18 13 201 9,38
RESULT 2:
X Y Z W VALUE W VALUE W VALUE W VALUE W VALUE W VALUE W VALUE W VALUE
-- ----------------- -------------------------- -------------------- ----------
45 18 1 101 1,12 104 1,12 137 2,58 216 6,06 218 5,9 223 7,08 302 4,86 303 4,68
45 18 11 101 9,38 104 9,38 201 9,38
45 18 13 118 9,21 137 2,69 201 9,38

Detect scroll is completed with PointerWheelChanged

I detect mouse-wheel scrolling with the PointerWheelChanged event in WinRT, and use PointerPoint.Properties.MouseWheelDelta to get the amount and direction of the scroll:
PointerPoint mousePosition = e.GetCurrentPoint(_control);
var delta = mousePosition.Properties.MouseWheelDelta;
Nowadays there are devices that emulate mouse scrolling (touchpads, touch mice, etc.).
They tend to issue tens or even hundreds of PointerWheelChanged events per "scroll", whereas a legacy mouse wheel issues one event per wheel click, with a delta of ±120 units.
I need to do some heavy processing as soon as the user has scrolled to a position.
Is there a way to detect that such a "new-style" scroll is complete?
FYI, here are the mouse-wheel deltas for a single-finger flick with a Microsoft TouchMouse (sorry for the amount; I just want to illustrate the problem).
(In event order; N×k means k consecutive events with delta N, 950 events in total.)
15×2, 164×2, 304×2, 658×2, 773×2, 887×2, 1000×2, 1111×2, 1221×2, 1330×2,
108×2, 107×2, 106×2, 105×2, 104×2, 103×2, 102×2, 203×2, 100×2, 99×2,
98×2, 97×2, 96×2, 95×2, 94×2, 93×2, 92×2, 91×2, 90×2, 89×2,
88×4, 87×2, 86×2, 85×2, 84×2, 83×2, 82×4, 81×2, 80×2, 79×2,
78×4, 77×2, 76×2, 75×4, 74×2, 73×2, 72×4, 71×2, 70×4, 69×2,
68×2, 67×4, 66×2, 65×4, 64×2, 63×4, 62×4, 61×2, 60×4, 59×4,
58×2, 57×4, 56×4, 55×4, 54×4, 53×2, 52×4, 51×4, 50×4, 49×4,
48×4, 47×4, 46×6, 45×4, 44×4, 43×4, 42×6, 41×4, 40×6, 39×4,
38×6, 37×6, 36×4, 35×6, 34×6, 33×6, 32×6, 31×6, 30×8, 29×6,
28×8, 27×6, 26×8, 25×8, 24×8, 23×10, 22×8, 21×10, 20×10, 19×10,
18×12, 17×12, 16×12, 15×12, 14×16, 13×14, 12×18, 11×18, 10×20, 9×22,
8×24, 15×2, 22×2, 7×4, 14×2, 7×16, 6×34, 5×40, 4×46, 8×2,
12×2, 3×56, 9×2, 3×6, 2×102, 1×82
EDIT:
For now I use this hack, but it is far from perfect:
// interval between mouse deltas
private readonly TimeSpan _wheelDeltaThrottleInterval = TimeSpan.FromMilliseconds(8);
// interval to wait until scroll is complete
private readonly TimeSpan _wheelDeltaCompleteInterval = TimeSpan.FromMilliseconds(600);

// create smart wheel handler
// (_control, MouseWheelDeltaThreshold, OnPointerWheelChanged, OnPointerWheelCompleted
// and Logger are members defined elsewhere, not shown here)
IObservable<PointerPoint> pointerWheelObservable =
    System.Reactive.Linq.Observable
        .FromEventPattern<PointerEventHandler, PointerRoutedEventArgs>(
            handler => _control.PointerWheelChanged += handler,
            handler => _control.PointerWheelChanged -= handler)
        .Select(eventPattern =>
        {
            PointerRoutedEventArgs e = eventPattern.EventArgs;
            PointerPoint mousePosition = e.GetCurrentPoint(_control);
            return mousePosition;
        })
        // ignore deltas at or below the threshold
        .Where(mousePosition => Math.Abs(mousePosition.Properties.MouseWheelDelta) > MouseWheelDeltaThreshold);

// subscribe to wheel changes: Throttle forwards an event only once no newer
// event has arrived for 8 ms; the handler then runs on the UI dispatcher
pointerWheelObservable
    .Throttle(_wheelDeltaThrottleInterval)
    .ObserveOnDispatcher()
    .Subscribe(
        OnPointerWheelChanged,
        Logger.TrackException);

// treat 600 ms without further wheel events as "scroll complete"
pointerWheelObservable
    .Throttle(_wheelDeltaCompleteInterval)
    .Subscribe(
        OnPointerWheelCompleted,
        Logger.TrackException);
EDIT 2: The GestureRecognizer class does not help.
See this great blog post about handling manipulations in Windows 8:
http://blogs.msdn.com/b/windowsappdev/archive/2012/07/02/modernizing-input-in-windows-8.aspx
Unfortunately, my experiments show that GestureRecognizer cannot detect when the flood of mouse-wheel events is over: it fires a ManipulationCompleted event after every call to ProcessMouseWheelEvent().
You can use the Reactive Extensions library and throttle the PointerWheelChanged event; that way you always get the last notification, delivered once no new event has arrived within the specified throttle period.
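The idea behind both the hack in the question and this answer is a debounce: a gesture counts as complete once no wheel event has arrived for a quiet interval. Below is a language-neutral sketch in Python that works on recorded (timestamp_ms, delta) pairs rather than live WinRT events; the function name split_into_gestures is hypothetical, and the 600 ms default is borrowed from the question's _wheelDeltaCompleteInterval.

```python
from typing import List, Optional, Sequence, Tuple


def split_into_gestures(
    events: Sequence[Tuple[float, int]], quiet_ms: float = 600.0
) -> List[List[int]]:
    """Group wheel deltas into gestures separated by gaps longer than quiet_ms.

    Live, each gesture would be reported as complete quiet_ms after its last
    event, which is what a Throttle (debounce) on the event stream achieves.
    """
    gestures: List[List[int]] = []
    current: List[int] = []
    last_time: Optional[float] = None
    for time_ms, delta in events:
        # A long enough pause closes the previous gesture.
        if last_time is not None and time_ms - last_time > quiet_ms:
            gestures.append(current)
            current = []
        current.append(delta)
        last_time = time_ms
    if current:
        gestures.append(current)
    return gestures


events = [(0, 15), (4, 15), (9, 164), (13, 164), (900, 120)]
print(split_into_gestures(events))  # [[15, 15, 164, 164], [120]]
```

The trade-off is the same as in the question's hack: a longer quiet interval merges slow flicks into one gesture but delays the "complete" signal by that much.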
Use GestureRecognizer for better low-level detection of manipulations, including the mouse wheel.
All input types (mouse, touch, pen, etc.) are handled there and supported better than by the traditional manipulation events, which don't support single-touch rotation, the mouse scroll wheel, etc.
http://code.msdn.microsoft.com/windowsapps/Input-Windows-8-gestures-62c6689b#content
This is much more efficient, flexible and safer than implementing everything from scratch.