pandas element wise conditional return index - pandas

I want to find abnormal values (9999) and replace them with the value from the corresponding day of the next week.
year week day v1 v2
2001 1 1 46 9999
2001 1 2 60 9335
2001 1 3 9999 9318
2001 1 4 47 9999
2001 1 5 57 9373
2001 1 6 9999 9384
2001 1 7 72 9444
2001 2 1 75 73
2001 2 2 74 63
2001 2 3 79 377
2001 2 4 70 361
2001 2 5 75 73
2001 2 6 77 64
2001 2 7 76 57
I could do this column by column, with code like the following:
index_row = df[df['v1'] == 9999].index
for i in index_row:
    df.loc[i, 'v1'] = df.loc[i + 7, 'v1']  # i + 7 is the index of the same day next week
How can I do this element-wise over the whole DataFrame, for example with something like DataFrame.applymap?
How can I get the column names and row numbers of the values that match a condition?
The target df I want is as follows
(* marks the modified values and the next-week values they were taken from):
year week day v1 v2
2001 1 1 46 *73
2001 1 2 60 9335
2001 1 3 *79 9318
2001 1 4 47 *361
2001 1 5 57 9373
2001 1 6 *77 9384
2001 1 7 72 9444
2001 2 1 75 *73
2001 2 2 74 63
2001 2 3 *79 377
2001 2 4 70 *361
2001 2 5 75 73
2001 2 6 *77 64
2001 2 7 76 57

Create d1 by calling set_index on the columns ['year', 'week', 'day'].
Create d2 with the same index as d1, except with 1 subtracted from week.
Call mask on d1, passing d2 as other.
cols = ['year', 'week', 'day']
d1 = df.set_index(cols)
d2 = df.assign(week=df.week - 1).set_index(cols)
d1.mask(d1.eq(9999), d2).reset_index()
year week day v1 v2
0 2001 1 1 46 73
1 2001 1 2 60 9335
2 2001 1 3 79 9318
3 2001 1 4 47 361
4 2001 1 5 57 9373
5 2001 1 6 77 9384
6 2001 1 7 72 9444
7 2001 2 1 75 73
8 2001 2 2 74 63
9 2001 2 3 79 377
10 2001 2 4 70 361
11 2001 2 5 75 73
12 2001 2 6 77 64
13 2001 2 7 76 57
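For readers who want to run the approach above end to end, here is a minimal self-contained sketch. The small frame (two weeks of three days each) is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical sample: two weeks, three days each; 9999 marks an abnormal value.
df = pd.DataFrame({
    "year": [2001] * 6,
    "week": [1, 1, 1, 2, 2, 2],
    "day":  [1, 2, 3, 1, 2, 3],
    "v1":   [46, 9999, 57, 75, 74, 79],
    "v2":   [9999, 9335, 9318, 73, 63, 377],
})

cols = ["year", "week", "day"]
d1 = df.set_index(cols)
# Relabel every row as belonging to the previous week, so that a week-2 row
# carries the same (year, week, day) label as the matching week-1 row.
d2 = df.assign(week=df.week - 1).set_index(cols)

# mask aligns d2 on the index and takes its value wherever d1 equals 9999.
result = d1.mask(d1.eq(9999), d2).reset_index()
print(result)
```

Because the replacement is found by index alignment, a 9999 in the final week has no following week to draw from and comes back as NaN. Depending on the pandas version, the value columns may also be returned as floats rather than integers.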
Old answer
One approach is to set up d1 with an index of ['year', 'week', 'day'] and manipulate it to shift the data by one week. Then mask the values equal to 9999 and fill them in with fillna.
d1 = df.set_index(['year', 'week', 'day'])
s1 = d1.unstack(['year', 'day']).shift(-1).stack(['year', 'day']).swaplevel(0, 1)
d1.mask(d1==9999).fillna(s1).reset_index()
year week day v1 v2
0 2001 1 1 46.0 73.0
1 2001 1 2 60.0 9335.0
2 2001 1 3 79.0 9318.0
3 2001 1 4 47.0 361.0
4 2001 1 5 57.0 9373.0
5 2001 1 6 77.0 9384.0
6 2001 1 7 72.0 9444.0
7 2001 2 1 75.0 73.0
8 2001 2 2 74.0 63.0
9 2001 2 3 79.0 377.0
10 2001 2 4 70.0 361.0
11 2001 2 5 75.0 73.0
12 2001 2 6 77.0 64.0
13 2001 2 7 76.0 57.0

You can work with a DatetimeIndex and set the values with mask, using rows shifted by one week:
a = (df['year'].astype(str).add('-').add(df['week'].astype(str))
     .add('-').add(df['day'].sub(1).astype(str)))
#http://strftime.org/
df.index = pd.to_datetime(a, format='%Y-%U-%w')
df2 = df.shift(-1,freq='7D')
df = df.mask(df.eq(9999), df2).reset_index(drop=True)
print (df)
year week day v1 v2
0 2001 1 1 46 73
1 2001 1 2 60 9335
2 2001 1 3 79 9318
3 2001 1 4 47 361
4 2001 1 5 57 9373
5 2001 1 6 77 9384
6 2001 1 7 72 9444
7 2001 2 1 75 73
8 2001 2 2 74 63
9 2001 2 3 79 377
10 2001 2 4 70 361
11 2001 2 5 75 73
12 2001 2 6 77 64
13 2001 2 7 76 57

Related

re-arrange and plot data pandas

I have a data frame like the following:
days movements count
0 0 0 2777
1 0 1 51
2 0 2 2
3 1 0 6279
4 1 1 200
5 1 2 7
6 1 3 3
7 2 0 5609
8 2 1 110
9 2 2 32
10 2 3 4
11 3 0 4109
12 3 1 118
13 3 2 101
14 3 3 8
15 3 4 3
16 3 6 1
17 4 0 3034
18 4 1 129
19 4 2 109
20 4 3 6
21 4 4 2
22 4 5 2
23 5 0 2288
24 5 1 131
25 5 2 131
26 5 3 9
27 5 4 2
28 5 5 1
29 6 0 1918
30 6 1 139
31 6 2 109
32 6 3 13
33 6 4 1
34 6 5 1
35 7 0 1442
36 7 1 109
37 7 2 153
38 7 3 13
39 7 4 10
40 7 5 1
41 8 0 1085
42 8 1 76
43 8 2 111
44 8 3 13
45 8 4 7
46 8 7 1
47 9 0 845
48 9 1 81
49 9 2 86
50 9 3 8
51 9 4 8
52 10 0 646
53 10 1 70
54 10 2 83
55 10 3 1
56 10 4 2
57 10 5 1
58 10 6 1
This shows that for example on day 0, I have 2777 entries with 0 movements, 51 entries with 1 movement, 2 entries with 2 movements. I want to plot it as bar graph for every day and show the entries count for all movements. In order to do it, I thought I would transform the data to something like below and then plot a bar graph.
days 0 1 2 3 4 5 6 7
0 2777 51 2
1 6279 200 7 3
2 5609 110 32 4
3 4109 118 101 8 3
4 3034 129 109 6 2 2
5 2288 131 131 9 2 1
6 1918 139 109 13 1 1
7 1442 109 153 13 10 1
8 1085 76 111 13 7 1
9 845 81 86 8 8
10 646 70 83 1 2 1 1
I am not sure how to achieve this. I have thousands of lines of data, so doing it by hand does not make sense. Can someone show me how to rearrange the data? If there is a quick way to plot the bar graph with matplotlib straight from the original data frame, that would be even better. Thanks for the help.
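One possible way to get the wide layout described above is a pivot. The sketch below uses only the first few rows of the question's data and assumes the columns are named days, movements and count as shown.

```python
import pandas as pd

# A few rows in the same long layout as the question.
df = pd.DataFrame({
    "days":      [0, 0, 0, 1, 1, 1, 1],
    "movements": [0, 1, 2, 0, 1, 2, 3],
    "count":     [2777, 51, 2, 6279, 200, 7, 3],
})

# One row per day, one column per movement value.
# Day/movement pairs that never occur are filled with 0.
wide = (
    df.pivot(index="days", columns="movements", values="count")
      .fillna(0)
      .astype(int)
)
print(wide)
```

With matplotlib installed, wide.plot(kind="bar") should then draw one group of bars per day, with one bar per movement value.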

how to encode only categorical data in a dataframe

How can I encode only the categorical columns in a data frame?
Income Length of Residence Median House Value Number of Vehicles Percentage Asian Percentage Black Percentage English Speaking Percentage Hispanic Percentage White MakeDescr SeriesDescr Msrp
1 90000 15.0 F 4 1 1 71 6 81 HYUNDAI Sonata-4 Cyl. 19395.0
2 125000 7.0 H 1 11 1 91 1 81 JEEP Grand Cherokee-V6 29135.0
3 90000 8.0 F 1 1 1 71 6 86 JEEP Liberty 20700.0
4 125000 8.0 F 3 1 1 86 6 86 VOLKSWAGEN Passat-V6 28750.0
5 90000 8.0 F 1 1 1 71 6 81 JEEP Wrangler 20210.0
6 110000 7.0 G 5 6 6 71 6 76 HYUNDAI Santa Fe-V6 25645.0
7 110000 7.0 G 3 11 6 71 6 71 HYUNDAI Sonata-4 Cyl. 15999.0
8 125000 8.0 G 1 1 11 81 6 76 HYUNDAI Santa Fe-V6 23645.0
9 125000 9.0 G 1 6 1 91 1 86 CHEVROLET TRUCK Trailblazer EXT 32040.0
10 110000 8.0 E 2 6 46 81 16 26 JEEP Wrangler-V6 18660.0
11 125000 11.0 G 3 6 1 76 1 86 CHEVROLET TRUCK Silverado 2500 HD 31775.0
12 125000 12.0 G 2 11 6 66 1 71 CHEVROLET Cobalt 13675.0
13 125000 13.0 G 2 1 16 95 6 71 HYUNDAI Veracruz-V6 28600.0
15 110000 11.0 F 5 6 41 61 11 41 HYUNDAI Santa Fe 22499.0
16 125000 9.0 F 2 1 6 91 1 81 HYUNDAI Santa Fe 22499.0
17 125000 8.0 G 2 11 11 66 1 66 MITSUBISHI Endeavor-V6 32602.0
18 110000 12.0 E 1 6 46 81 16 26 HYUNDAI Accent-4 Cyl. 10899.0
19 90000 9.0 F 4 1 6 71 6 81 JEEP Grand Cherokee-6 Cyl. 29080.0
21 125000 8.0 G 1 6 1 76 1 86 MITSUBISHI Endeavor-V6 29302.0
22 110000 12.0 F 2 6 26 66 11 51 HYUNDAI Santa Fe 22499.0
23 90000 9.0 F 1 6 6 66 6 76 HYUNDAI Santa Fe-V6 20995.0
24 125000 9.0 H 1 6 1 91 1 81 HYUNDAI Sonata-V6 18799.0
25 90000 14.0 F 2 1 6 71 11 81 HYUNDAI Elantra-4 Cyl. 13299.0
26 125000 9.0 G 3 1 11 81 6 76 JEEP Grand Cherokee-6 Cyl. 29080.0
27 125000 8.0 H 5 6 1 91 1 81 CHEVROLET TRUCK Trailblazer 29395.0
28 110000 12.0 E 4 6 41 61 11 36 HYUNDAI Sonata-4 Cyl. 15999.0
29 110000 10.0 E 1 6 41 61 11 36 HYUNDAI Santa Fe-V6 20995.0
30 125000 10.0 F 2 6 1 71 6 86 CHEVROLET TRUCK Tahoe 37000.0
32 90000 10.0 F 1 1 1 71 6 86 MITSUBISHI Galant-V6 19997.0
33 125000 12.0 F 1 1 1 86 6 86 CHEVROLET TRUCK Trailblazer 28175.0
... ... ... ... ... ... ... ... ... ... ... ... ...
4451 110000 9.0 F 3 6 41 61 11 36 NISSAN Sentra-4 Cyl. 17990.0
4452 125000 11.0 G 2 1 11 81 6 76 CHEVROLET TRUCK Tahoe 39515.0
4453 125000 8.0 H 1 6 1 91 1 81 HYUNDAI Elantra-4 Cyl. 15195.0
4454 110000 10.0 F 3 6 41 61 11 41 HYUNDAI Genesis-4 Cyl. 26750.0
4455 125000 7.0 H 4 11 1 76 1 76 HYUNDAI Sonata-4 Cyl. 19695.0
4456 125000 9.0 G 5 6 1 76 1 86 NISSAN Altima 22500.0
4457 110000 11.0 E 1 6 46 81 16 26 GMC LIGHT DUTY Denali 51935.0
4458 125000 6.0 H 1 11 1 76 1 76 JEEP Liberty-V6 24865.0
4459 125000 12.0 G 3 1 16 95 6 71 HONDA Accord-V6 26700.0
4460 125000 7.0 F 1 1 1 86 6 86 HYUNDAI Veloster-4 Cyl. 17300.0
4461 90000 10.0 F 2 6 11 66 6 71 CADILLAC SRX-V6 42210.0
4463 110000 8.0 F 3 6 26 61 11 56 GMC LIGHT DUTY Acadia 42390.0
4468 125000 8.0 G 1 1 1 91 1 86 HONDA Pilot-V6 40820.0
4469 125000 10.0 H 5 11 1 91 1 81 TOYOTA Highlander-V6 30695.0
4470 110000 12.0 F 1 6 41 61 11 41 HYUNDAI Elantra-4 Cyl. 15195.0
4473 110000 13.0 F 1 6 21 66 6 61 ACURA TSX 32910.0
4476 125000 9.0 G 1 6 1 76 1 86 BMW X3 36750.0
4482 125000 10.0 H 1 6 1 91 1 81 SUBARU Forester-4 Cyl. 21195.0
4486 125000 11.0 H 2 6 1 91 1 81 GMC LIGHT DUTY Yukon XL 44315.0
4492 125000 10.0 H 2 6 1 91 1 81 BMW 5 Series 53400.0
4493 110000 12.0 G 2 6 6 71 6 76 ACURA TL 33725.0
4494 125000 12.0 F 3 1 1 86 6 86 ACURA TL 33725.0
4495 125000 12.0 F 3 1 1 86 6 86 ACURA TL 33725.0
4496 125000 7.0 G 5 1 11 81 6 76 ACURA TL 33325.0
4497 125000 9.0 G 1 6 1 76 1 86 ACURA TL 33725.0
4498 125000 12.0 G 3 1 11 81 6 76 ACURA TL 33725.0
4499 110000 14.0 G 8 11 6 71 6 71 ACURA TL 33725.0
4501 125000 9.0 G 3 11 6 66 1 71 FORD Taurus-V6 20050.0
4502 110000 2.0 G 4 11 6 71 6 71 DODGE Stratus-4 Cyl. 15910.0
4503 125000 8.0 F 1 1 1 86 6 86 DODGE Stratus-4 Cyl. 19145.0
# Using standard scikit-learn label encoder.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Encode all string columns, assuming every categorical column has dtype object (str).
for c in df.select_dtypes(['object']):
    print("Encoding column " + c)
    df[c] = le.fit_transform(df[c])
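If scikit-learn is not available, a pandas-only variant of the same idea is sketched below. It swaps LabelEncoder for category codes, which also number the sorted unique values of each column. The three-row frame is a made-up excerpt, and the object dtype is set explicitly so the selection behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical excerpt with one string column and two numeric columns.
df = pd.DataFrame({
    "Income": [90000, 125000, 90000],
    "MakeDescr": pd.Series(["HYUNDAI", "JEEP", "JEEP"], dtype=object),
    "Msrp": [19395.0, 29135.0, 20700.0],
})

# Only object-dtype columns are touched; numeric columns are left unchanged.
for c in df.select_dtypes(include=["object"]).columns:
    # Category codes are integers assigned in sorted order of the unique values.
    df[c] = df[c].astype("category").cat.codes

print(df)
```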

Count registers in columns sql

I have a SQL table with below values
dozen1 dozen2 dozen3 dozen4 dozen5 dozen6
----------------------------------------------
10 27 40 46 49 58
2 11 34 37 32 50
3 4 29 36 45 55
14 32 33 36 44 52
20 11 36 38 47 53
1 5 11 16 20 55
2 18 31 42 51 52
5 11 22 24 51 53
1 3 11 17 34 45
I need to count how many times each number appears across all columns, for example:
Number 10 appears 1 time
Number 2 appears 2 times
Result:
Dozen Times
--------------
10 1
2 2
....
How can I do this in a SQL query?
If you want to count the number of occurrences of each distinct number across all columns (PostgreSQL syntax):
select dozen, count(*) as times
from my_table
cross join unnest(array[dozen1, dozen2, dozen3, dozen4, dozen5, dozen6]) u(dozen)
group by 1
order by 1;
dozen | times
-------+-------
1 | 2
2 | 2
3 | 2
4 | 1
5 | 2
etc...
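For databases without unnest, the same count can be sketched by stacking the six columns with UNION ALL. The example below runs that variant against an in-memory SQLite database from Python, loaded with the rows from the question. The table name my_table follows the answer, and the integer column types are an assumption.

```python
import sqlite3

rows = [
    (10, 27, 40, 46, 49, 58),
    (2, 11, 34, 37, 32, 50),
    (3, 4, 29, 36, 45, 55),
    (14, 32, 33, 36, 44, 52),
    (20, 11, 36, 38, 47, 53),
    (1, 5, 11, 16, 20, 55),
    (2, 18, 31, 42, 51, 52),
    (5, 11, 22, 24, 51, 53),
    (1, 3, 11, 17, 34, 45),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table "
    "(dozen1 int, dozen2 int, dozen3 int, dozen4 int, dozen5 int, dozen6 int)"
)
conn.executemany("insert into my_table values (?, ?, ?, ?, ?, ?)", rows)

# Stack the six columns into a single column, then count each value.
query = """
select dozen, count(*) as times
from (
    select dozen1 as dozen from my_table
    union all select dozen2 from my_table
    union all select dozen3 from my_table
    union all select dozen4 from my_table
    union all select dozen5 from my_table
    union all select dozen6 from my_table
) as u
group by dozen
order by dozen
"""
counts = dict(conn.execute(query).fetchall())
print(counts)
```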

Update rank field based on most popular product

Trying to run an update on the following result set:
Row# ProductRankID ProductID ProductCategoryID ProductTypeID Rank Score
1 3 11266 9 80 0 765
2 14 25880 9 80 0 656
3 12 25864 9 80 0 547
4 7 11252 9 80 0 457
5 8 25719 9 80 0 456
6 4 13425 9 80 0 456
7 11 25677 9 80 0 456
8 9 25716 9 80 0 432
9 15 25714 9 80 0 324
10 13 13589 9 80 0 234
11 20 25803 9 80 0 234
12 17 25715 9 80 0 213
13 5 21269 9 80 0 154
14 10 25867 9 80 0 123
15 16 25676 9 80 0 123
16 22 17861 9 80 0 67
17 19 13534 9 80 0 55
18 23 13659 9 80 0 54
19 29 13658 9 80 0 34
20 21 13591 9 80 0 32
21 6 11249 9 80 0 23
22 18 11253 9 80 0 12
23 28 11253 9 87 0 65
24 27 13664 9 87 0 45
25 25 13658 9 87 0 14
26 26 13657 9 87 0 13
27 24 13659 9 87 0 13
28 30 11252 9 87 0 12
29 2 12345 11 80 0 324
I want the "Rank" column to be set 1...2..3..4 etc based on each row. Then on change of the ProductCategoryID + ProductTypeID, I want it to reset to 1...2...3...4 etc.
So the results should look something like:
Row# ProductRankID ProductID ProductCategoryID ProductTypeID Rank Score
1 3 11266 9 80 1 765
2 14 25880 9 80 2 656
3 12 25864 9 80 3 547
4 7 11252 9 80 4 457
5 8 25719 9 80 5 456
6 4 13425 9 80 6 456
7 11 25677 9 80 7 456
8 9 25716 9 80 8 432
9 15 25714 9 80 9 324
10 13 13589 9 80 10 234
11 20 25803 9 80 11 234
12 17 25715 9 80 12 213
13 5 21269 9 80 13 154
23 28 11253 9 87 1 65
24 27 13664 9 87 2 45
25 25 13658 9 87 3 14
26 26 13657 9 87 4 13
27 24 13659 9 87 5 13
28 30 11252 9 87 6 12
29 2 12345 11 80 1 324
Hope that makes some sense?
Thanks,
Richie
If you want a select:
select t.*,
       row_number() over (partition by ProductCategoryID, ProductTypeID
                          order by score desc, productid
                         ) as new_rank
from t;
If you want an update, use a CTE:
with toupdate as (
      select t.*,
             row_number() over (partition by ProductCategoryID, ProductTypeID
                                order by score desc, productid
                               ) as new_rank
      from t
     )
update toupdate
    set rank = new_rank;
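To see what the window function computes, the same numbering can be sketched in pandas on a handful of rows taken from the question. This only illustrates the logic and is not a replacement for the SQL update. The frame name t mirrors the query, and ties on Score are broken by ProductID, as in the query's order by.

```python
import pandas as pd

# A few rows from the question's result set.
t = pd.DataFrame({
    "ProductID":         [11266, 25880, 25719, 13425, 11253, 13664, 12345],
    "ProductCategoryID": [9, 9, 9, 9, 9, 9, 11],
    "ProductTypeID":     [80, 80, 80, 80, 87, 87, 80],
    "Score":             [765, 656, 456, 456, 65, 45, 324],
})

# Same ordering as the window: Score descending, ProductID ascending on ties.
t = t.sort_values(["Score", "ProductID"], ascending=[False, True])
# cumcount numbers the rows 0, 1, 2, ... within each group in the current order.
t["Rank"] = t.groupby(["ProductCategoryID", "ProductTypeID"]).cumcount() + 1
t = t.sort_values(["ProductCategoryID", "ProductTypeID", "Rank"]).reset_index(drop=True)
print(t)
```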

Detect scroll is completed with PointerWheelChanged

I detect mouse wheel scrolling using the PointerWheelChanged event in WinRT. I use PointerPoint.Properties.MouseWheelDelta to detect the amount and direction of the scroll:
PointerPoint mousePosition = e.GetCurrentPoint(_control);
var delta = mousePosition.Properties.MouseWheelDelta;
Nowadays there are devices which emulate mouse scroll (touchpad or touch mice etc).
They tend to issue tens or hundreds (sic!) PointerWheelChanged events per "scroll". Legacy mouse wheel issues one event per wheel click which has delta of +-120 units.
I need to do some heavy processing as soon as the user has scrolled to some position.
Is there a way to tell that a "new" style scroll is complete?
FYI, here are the mouse wheel deltas for a single finger flick with a Microsoft TouchMouse (sorry for the amount, I just want to illustrate the problem).
15
15
164
164
304
304
658
658
773
773
887
887
1000
1000
1111
1111
1221
1221
1330
1330
108
108
107
107
106
106
105
105
104
104
103
103
102
102
203
203
100
100
99
99
98
98
97
97
96
96
95
95
94
94
93
93
92
92
91
91
90
90
89
89
88
88
88
88
87
87
86
86
85
85
84
84
83
83
82
82
82
82
81
81
80
80
79
79
78
78
78
78
77
77
76
76
75
75
75
75
74
74
73
73
72
72
72
72
71
71
70
70
70
70
69
69
68
68
67
67
67
67
66
66
65
65
65
65
64
64
63
63
63
63
62
62
62
62
61
61
60
60
60
60
59
59
59
59
58
58
57
57
57
57
56
56
56
56
55
55
55
55
54
54
54
54
53
53
52
52
52
52
51
51
51
51
50
50
50
50
49
49
49
49
48
48
48
48
47
47
47
47
46
46
46
46
46
46
45
45
45
45
44
44
44
44
43
43
43
43
42
42
42
42
42
42
41
41
41
41
40
40
40
40
40
40
39
39
39
39
38
38
38
38
38
38
37
37
37
37
37
37
36
36
36
36
35
35
35
35
35
35
34
34
34
34
34
34
33
33
33
33
33
33
32
32
32
32
32
32
31
31
31
31
31
31
30
30
30
30
30
30
30
30
29
29
29
29
29
29
28
28
28
28
28
28
28
28
27
27
27
27
27
27
26
26
26
26
26
26
26
26
25
25
25
25
25
25
25
25
24
24
24
24
24
24
24
24
23
23
23
23
23
23
23
23
23
23
22
22
22
22
22
22
22
22
21
21
21
21
21
21
21
21
21
21
20
20
20
20
20
20
20
20
20
20
19
19
19
19
19
19
19
19
19
19
18
18
18
18
18
18
18
18
18
18
18
18
17
17
17
17
17
17
17
17
17
17
17
17
16
16
16
16
16
16
16
16
16
16
16
16
15
15
15
15
15
15
15
15
15
15
15
15
14
14
14
14
14
14
14
14
14
14
14
14
14
14
14
14
13
13
13
13
13
13
13
13
13
13
13
13
13
13
12
12
12
12
12
12
12
12
12
12
12
12
12
12
12
12
12
12
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11
11
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
10
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
9
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
8
15
15
22
22
7
7
7
7
14
14
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
5
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
8
8
12
12
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
9
9
3
3
3
3
3
3
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
EDIT:
For now I use this hack, but it is far from perfect:
// interval between mouse deltas
private readonly TimeSpan _wheelDeltaThrottleInterval = TimeSpan.FromMilliseconds(8);
// interval to wait until scroll is complete
private readonly TimeSpan _wheelDeltaCompleteInterval = TimeSpan.FromMilliseconds(600);
// create smart wheel handler
IObservable<PointerPoint> pointerWheelObservable =
    System.Reactive.Linq.Observable
        .FromEventPattern<PointerEventHandler, PointerRoutedEventArgs>(
            handler => _control.PointerWheelChanged += handler,
            handler => _control.PointerWheelChanged -= handler)
        .Select(eventPattern =>
        {
            PointerRoutedEventArgs e = eventPattern.EventArgs;
            PointerPoint mousePosition = e.GetCurrentPoint(_control);
            return mousePosition;
        })
        .Where(mousePosition => Math.Abs(mousePosition.Properties.MouseWheelDelta) > MouseWheelDeltaThreshold);
// subscribe to wheel changes
pointerWheelObservable
    .Throttle(_wheelDeltaThrottleInterval)
    .ObserveOnDispatcher()
    .Subscribe(
        OnPointerWheelChanged,
        Logger.TrackException);
pointerWheelObservable
    .Throttle(_wheelDeltaCompleteInterval)
    .Subscribe(
        OnPointerWheelCompleted,
        Logger.TrackException);
EDIT2: The GestureRecognizer class does not help.
See this great blog post about handling manipulations in Windows 8:
http://blogs.msdn.com/b/windowsappdev/archive/2012/07/02/modernizing-input-in-windows-8.aspx
Unfortunately, after my experiments I see that GestureRecognizer is not able to detect that the flood of mouse wheel events is over. It fires the ManipulationCompleted event after each call to .ProcessMouseWheelEvent().
You can use the Reactive Extensions library and throttle on the WheelChangedEvent; that way you always get only the last notification within the specified throttle period.
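The idea behind throttling here is that a scroll counts as complete once no wheel event has arrived for some quiet interval. The sketch below shows that rule offline in Python, on a recorded list of events rather than live input. The function name, the (timestamp, delta) event format and the sample values are all hypothetical; the 600 ms interval is the one used in the question's hack. A live handler would need a timer instead of looking at the next event.

```python
def completed_scrolls(events, quiet_ms):
    """Group (timestamp_ms, delta) events into scrolls.

    A new scroll starts whenever at least quiet_ms has passed since the
    previous event. Returns one (last_timestamp_ms, total_delta) pair per scroll.
    """
    scrolls = []
    last_ts = None
    total = 0
    for ts, delta in events:
        if last_ts is not None and ts - last_ts >= quiet_ms:
            # The gap is long enough: the previous scroll is complete.
            scrolls.append((last_ts, total))
            total = 0
        total += delta
        last_ts = ts
    if last_ts is not None:
        # Close the final scroll once the recording ends.
        scrolls.append((last_ts, total))
    return scrolls


# Three rapid touchpad-style events, then one classic wheel click much later.
events = [(0, 15), (8, 164), (16, 304), (1000, 120)]
print(completed_scrolls(events, quiet_ms=600))
```

A shorter quiet interval reports completion sooner but risks splitting one flick into several scrolls when the device pauses briefly between events.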
Use GestureRecognizer for better low-level detection of manipulations, including the mouse wheel.
All inputs (mouse, touch, pen, etc.) are covered and are supported better than by the traditional manipulation events (which don't support single-touch rotation, the mouse wheel, etc.).
http://code.msdn.microsoft.com/windowsapps/Input-Windows-8-gestures-62c6689b#content
This is much more efficient, flexible and safer than implementing everything from scratch.