How to use pd.cut in pandas

Can anyone help me figure out why this isn't working:
ages = ['15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59','60-64','65-69','70-74','75-79','80-84']
race['age_group'] = pd.cut(race.Age,range(13,84,5),right=False, labels=ages)
race[['Age','age_group']].head(15)
This is the result I get:
Age age_group
0 31 30-34
1 38 40-44
2 45 45-49
3 30 30-34
4 45 45-49
5 35 35-39
6 32 30-34
7 33 35-39
8 29 30-34
9 42 40-44
10 34 35-39
11 48 50-54
12 35 35-39
13 51 50-54
14 38 40-44

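Your range is not correct. range(13, 84, 5) makes the bin edges 13, 18, 23, ..., 83, so the bins are [13, 18), [18, 23), ... and each bin starts two years before its label (31 falls in [28, 33), which is labeled "30-34"). Start at 15 and stop past 85, so that the 15 edges 15, 20, ..., 85 define the 14 bins for the 14 labels. Try: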
ages = ['15-19','20-24','25-29','30-34','35-39','40-44','45-49','50-54','55-59','60-64','65-69','70-74','75-79','80-84']
race['age_group'] = pd.cut(race.Age,range(15,86,5),right=False, labels=ages)
race[['Age','age_group']].head(15)
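A minimal, self-contained sketch of the fix, assuming the 15 ages shown in the question's output as sample data (the race DataFrame here is a stand-in, and the labels are generated instead of typed out). It runs the original edges and the corrected edges side by side so the two-year shift is visible:

```python
import pandas as pd

# Ages from the first 15 rows shown in the question (illustrative sample only)
race = pd.DataFrame({"Age": [31, 38, 45, 30, 45, 35, 32, 33, 29, 42, 34, 48, 35, 51, 38]})

# '15-19', '20-24', ..., '80-84': 14 labels
ages = [f"{lo}-{lo + 4}" for lo in range(15, 85, 5)]

# 14 labels need 15 edges; range() excludes its stop value, so the stop must be past 85
good_edges = list(range(15, 86, 5))  # 15, 20, ..., 85
bad_edges = list(range(13, 84, 5))   # 13, 18, ..., 83 (the original call)

race["age_group"] = pd.cut(race.Age, good_edges, right=False, labels=ages)
race["old_group"] = pd.cut(race.Age, bad_edges, right=False, labels=ages)

print(race.head(15))
```

With right=False each bin includes its left edge, so 30 through 34 land in [30, 35) and get the label "30-34".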


How can I get the aggregate sum of the average number of products per week, for each week between the start and end week, as shown below in pandas?

StartWeek  End Week  Number of Weeks  Number of Products  Avg number of products per week
39         41        3                99                  33
40         45        5                150                 30
40         42        3                60                  20
39         40        2                40                  20
39         41        3                99                  33
So that the output looks like this:
Week  Sum of average products per week
39    86
40    70
41    66
42    20
45    30
First, for each row we create a list of the weeks it applies to and put it in a column called 'weeks':
df['weeks'] = df.apply(lambda r: np.arange(r['StartWeek'], r['EndWeek']+1),axis=1)
df now looks like this:
   StartWeek  EndWeek  NumberofWeek  NumberofProduct  Av  weeks
0  39         41       3             99               33  [39 40 41]
1  40         45       5             150              30  [40 41 42 43 44 45]
2  40         42       3             60               20  [40 41 42]
3  39         40       2             40               20  [39 40]
4  39         41       3             99               33  [39 40 41]
Then we explode weeks which duplicates each row for each week it is applied to, and then aggregate by the exploded week and sum:
df.explode('weeks').groupby('weeks', as_index = False)['Av'].sum()
output:
   weeks  Av
0  39     86
1  40     136
2  41     116
3  42     50
4  43     30
5  44     30
6  45     30
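A self-contained sketch of the explode approach, assuming the column names used in this answer (StartWeek, EndWeek, Av) and the five sample rows from the question. It builds the week lists with a list comprehension instead of apply, which is only a stylistic choice:

```python
import pandas as pd

# The five sample rows from the question, using this answer's column names
df = pd.DataFrame({
    "StartWeek": [39, 40, 40, 39, 39],
    "EndWeek": [41, 45, 42, 40, 41],
    "Av": [33, 30, 20, 20, 33],
})

# One list of covered weeks per row; EndWeek is inclusive, hence the + 1
df["weeks"] = pd.Series(
    [list(range(s, e + 1)) for s, e in zip(df["StartWeek"], df["EndWeek"])],
    index=df.index,
)

per_week = (
    df.explode("weeks")              # one row per (original row, week) pair
      .astype({"weeks": int})        # explode leaves an object column
      .groupby("weeks", as_index=False)["Av"]
      .sum()
)
print(per_week)
```

The per-week sums agree with the output table above.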
Alternatively, you can use groupby in pandas, although this sums by StartWeek only rather than by every week a row covers:
df=df.groupby(["StartWeek"])["Avg number of product per week"].sum()

How to use a while loop in R to generate a matrix of specific numbers? (for -> while)

I generated a matrix using the following for loop.
Now I am trying to generate the same matrix using a while loop, but I don't know how.
Can anyone help with this? Thank you so much.
a<-matrix(0, ncol=9, nrow=9)
for(i in 1:9) {
for(j in 1:9) {
a[i,j]<-i*j
}
}
a
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 3 4 5 6 7 8 9
[2,] 2 4 6 8 10 12 14 16 18
[3,] 3 6 9 12 15 18 21 24 27
[4,] 4 8 12 16 20 24 28 32 36
[5,] 5 10 15 20 25 30 35 40 45
[6,] 6 12 18 24 30 36 42 48 54
[7,] 7 14 21 28 35 42 49 56 63
[8,] 8 16 24 32 40 48 56 64 72
[9,] 9 18 27 36 45 54 63 72 81
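With while loops you have to initialize and increment the counters yourself, and reset j to 1 after each row:
i <- 1
j <- 1
a <- matrix(0, ncol=9, nrow=9)
while (i <= 9) {
  while (j <= 9) {
    a[i,j] <- i*j
    j <- j+1
  }
  i <- i+1
  j <- 1
}
a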
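For comparison, here is the same for-to-while conversion sketched in Python (a cross-language illustration, not R code). The function names are hypothetical; the point is that the while version produces exactly the same table as the nested for loops, provided the inner counter restarts for every row:

```python
def table_with_for(n=9):
    # Nested for loops, as in the question
    return [[i * j for j in range(1, n + 1)] for i in range(1, n + 1)]


def table_with_while(n=9):
    table = [[0] * n for _ in range(n)]
    i = 1
    while i <= n:
        j = 1  # the inner counter must restart at 1 for every row
        while j <= n:
            table[i - 1][j - 1] = i * j  # Python lists are 0-based, R matrices 1-based
            j += 1
        i += 1
    return table


for row in table_with_while():
    print(row)
```

Resetting j at the top of the outer loop, as here, is equivalent to the R version's reset at the bottom, since j is also initialized before the loop there.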

SQL Server: create a new price-category column based on the price column

I have a SQL Server table with a column price looking like this:
10
96
64
38
32
103
74
32
67
103
55
28
30
110
79
91
16
71
36
106
89
87
59
41
56
89
68
32
80
47
45
77
64
93
17
88
13
19
83
12
76
99
104
65
83
95
Now my aim is to create a new column that gives each of those values a category from 1 to 10.
For instance, the max value in my column is 110 and the min is 10, so max - min = 100. If I want 10 categories, each one spans 100/10 = 10. Therefore these are the ranges:
10-20 1
21-30 2
31-40 3
41-50 4
51-60 5
61-70 6
71-80 7
81-90 8
91-100 9
101-110 10
Desired output: my new column, called cat, should look like this:
price cat
-----------------
10 1
96 9
64 6
38 3
32 3
103 10
74 7
32 3
67 6
103 10
55 5
28 2
30 3
110 10
79 7
91 9
16 1
71 7
36 3
106 10
89 8
87 8
59 5
41 4
56 5
89 8
68 6
32 3
80 7
47 4
45 4
77 7
64 6
93 9
17 1
88 8
13 1
19 1
83 8
12 1
76 7
99 9
104 10
65 6
83 8
95 9
Is there a way to do this with T-SQL? Sorry if this question is too easy. I searched the web for a long time, so either the problem is not as simple as I imagine, or I used the wrong keywords.
Yes, almost exactly as you describe the calculation:
select price,
1 + (price - min_price) * 10 / (max_price - min_price + 1) as decile
from (select price,
min(price) over () as min_price,
max(price) over () as max_price
from t
) t;
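The 1 + is because you want values from 1 to 10 rather than 0 to 9. The + 1 in the denominator keeps the maximum price in category 10 rather than 11, and the calculation relies on integer division, so it assumes price is an integer column.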
Yes, a CASE expression can do that:
select
price
,case
when price between 10 and 20 then 1
when price between 21 and 30 then 2
when price between 31 and 40 then 3
when price between 41 and 50 then 4
when price between 51 and 60 then 5
when price between 61 and 70 then 6
when price between 71 and 80 then 7
when price between 81 and 90 then 8
when price between 91 and 100 then 9
when price between 101 and 110 then 10
else null
end as cat
from [<enter your table name here>]
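A quick way to check that the two answers agree is to model both in Python with the question's prices. This is a sketch, not T-SQL: cat_formula mirrors the window-function answer (Python's // equals T-SQL integer division here because every operand is non-negative) and cat_case mirrors the CASE ranges; both function names are hypothetical:

```python
# Prices from the question
prices = [10, 96, 64, 38, 32, 103, 74, 32, 67, 103, 55, 28, 30, 110, 79, 91,
          16, 71, 36, 106, 89, 87, 59, 41, 56, 89, 68, 32, 80, 47, 45, 77,
          64, 93, 17, 88, 13, 19, 83, 12, 76, 99, 104, 65, 83, 95]


def cat_formula(price, min_price, max_price, buckets=10):
    # 1 + (price - min_price) * 10 / (max_price - min_price + 1), integer division
    return 1 + (price - min_price) * buckets // (max_price - min_price + 1)


# The ranges used in the CASE answer
CASE_RANGES = [(10, 20, 1), (21, 30, 2), (31, 40, 3), (41, 50, 4), (51, 60, 5),
               (61, 70, 6), (71, 80, 7), (81, 90, 8), (91, 100, 9), (101, 110, 10)]


def cat_case(price):
    for low, high, cat in CASE_RANGES:
        if low <= price <= high:
            return cat
    return None  # the CASE's "else null"


lo, hi = min(prices), max(prices)
mismatches = [p for p in prices if cat_formula(p, lo, hi) != cat_case(p)]
print(mismatches)  # []
```

For this data both approaches give the same category for every price. The difference is that the formula adapts if the min or max changes, while the CASE ranges are fixed.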

How to divide a result set into equal parts?

I have a table new_table
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 ?
2 19 43 58 ?
3 29 43 58 ?
4 31 43 58 ?
5 35 43 58 ?
6 37 43 58 ?
7 38 43 58 ?
8 39 43 58 ?
9 58 43 58 ?
10 79 43 58 ?
How can I select all proc_ids and update new_staff, for example like this:
ID PROC_ID DEP_ID OLD_STAFF NEW_STAFF
1 15 43 58 15
2 19 43 58 15
3 29 43 58 15
4 31 43 58 15
5 35 43 58 23
6 37 43 58 23
7 38 43 58 23
8 39 43 58 28
9 58 43 58 28
10 79 43 58 28
Staff 15 should get 4 proc_ids, staff 23 should get 3 and staff 28 should get 3; staff 58 is busy.
Staff 15, 23, 28 and 58 are all in the same department.
"How to divide a result set into equal parts?"
Oracle has a function, ntile(), which splits an ordered result set into a given number of buckets whose sizes differ by at most one. For instance, this query puts your posted data into four buckets:
select id
     , proc_id
     , ntile(4) over (order by id asc) as gen_staff
from new_table;
ID PROC_ID GEN_STAFF
---------- ---------- ----------
1 15 1
2 19 1
3 29 1
4 31 2
5 35 2
6 37 2
7 38 3
8 39 3
9 58 4
10 79 4
10 rows selected.
This isn't quite the solution you want, but you need to clarify your requirements before a complete answer is possible.
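With three available staff (58 is busy), ntile(3) instead of ntile(4) appears to give exactly the 4/3/3 split in the question. The sketch below models the NTILE bucket-size rule in Python to check that; it is not Oracle's implementation, and the bucket-to-staff mapping is a hypothetical assumption. In SQL the same idea would presumably be ntile(3) over (order by id) wrapped in a CASE that maps 1, 2, 3 to 15, 23, 28, though that is untested here.

```python
def ntile(n_buckets, n_rows):
    """Bucket number (1-based) for each of n_rows ordered rows, following the
    NTILE rule: bucket sizes differ by at most one, and the leftover rows go
    one each to the lowest-numbered buckets, starting with bucket 1."""
    base, extra = divmod(n_rows, n_buckets)
    buckets = []
    for bucket in range(1, n_buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


# Hypothetical mapping from bucket number to the three available staff
staff_for_bucket = {1: 15, 2: 23, 3: 28}

print(ntile(4, 10))  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4], as in the output above
new_staff = [staff_for_bucket[b] for b in ntile(3, 10)]
print(new_staff)     # [15, 15, 15, 15, 23, 23, 23, 28, 28, 28]
```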
update new_table
set new_staff='15'
where ID in('1','2','3','4');
update new_table
set new_staff='28'
where ID in('8','9','10');
update new_table
set new_staff='23'
where ID in('5','6','7');
Not sure if this is what you mean.

Pandas, getting mean and sum with groupby

I have a data frame, df, which looks like this:
index New Old Map Limit count
1 93 35 54 > 18 1
2 163 93 116 > 18 1
3 134 78 96 > 18 1
4 117 81 93 > 18 1
5 194 108 136 > 18 1
6 125 57 79 <= 18 1
7 66 39 48 > 18 1
8 120 83 95 > 18 1
9 150 98 115 > 18 1
10 149 99 115 > 18 1
11 148 85 106 > 18 1
12 92 55 67 <= 18 1
13 64 24 37 > 18 1
14 84 53 63 > 18 1
15 99 70 79 > 18 1
I need to produce a data frame that looks like this:
Limit <=18 >18
total mean total mean
New xx1 yy1 aa1 bb1
Old xx2 yy2 aa2 bb2
MAP xx3 yy3 aa3 bb3
I tried this, without success:
df.groupby('Limit')['New', 'Old', 'MAP'].[sum(), mean()].T
How can I achieve this in pandas?
You can use groupby with agg, then transpose with T and unstack:
print(df[['New', 'Old', 'Map', 'Limit']].groupby('Limit').agg(['sum', 'mean']).T.unstack())
Limit <= 18 > 18
sum mean sum mean
New 217.0 108.5 1581.0 121.615385
Old 112.0 56.0 946.0 72.769231
Map 146.0 73.0 1153.0 88.692308
Edited based on a comment; this version is nicer:
print(df.groupby('Limit')[['New', 'Old', 'Map']].agg(['sum', 'mean']).T.unstack())
And if you need total columns:
print(df.groupby('Limit')[['New', 'Old', 'Map']]
        .agg(['sum', 'mean'])
        .rename(columns={'sum': 'total'}, level=1)
        .T
        .unstack())
Limit <= 18 > 18
total mean total mean
New 217.0 108.5 1581.0 121.615385
Old 112.0 56.0 946.0 72.769231
Map 146.0 73.0 1153.0 88.692308
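A self-contained sketch that rebuilds the question's data and runs the "total" version, assuming the column names New, Old, Map and Limit from the posted frame. Selecting columns with a list (double brackets) and renaming the stat level afterwards avoids the tuple-style column selection and the dict-renaming form of agg, which newer pandas versions reject:

```python
import pandas as pd

# The 15 rows from the question
df = pd.DataFrame({
    "New": [93, 163, 134, 117, 194, 125, 66, 120, 150, 149, 148, 92, 64, 84, 99],
    "Old": [35, 93, 78, 81, 108, 57, 39, 83, 98, 99, 85, 55, 24, 53, 70],
    "Map": [54, 116, 96, 93, 136, 79, 48, 95, 115, 115, 106, 67, 37, 63, 79],
    "Limit": ["> 18"] * 5 + ["<= 18"] + ["> 18"] * 5 + ["<= 18"] + ["> 18"] * 3,
})

summary = (
    df.groupby("Limit")[["New", "Old", "Map"]]
      .agg(["sum", "mean"])                       # columns: (variable, stat)
      .rename(columns={"sum": "total"}, level=1)  # rename only the stat level
      .T                                          # rows: (variable, stat)
      .unstack()                                  # columns: (Limit, stat)
)
print(summary)
```

The totals and means agree with the table above, for example 217 and 108.5 for New with Limit "<= 18".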