Pandas: How can I assign a group number according to a specific value?

Given this DataFrame:
df = pd.DataFrame({'a': range(20)})
print(df)
a
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
Expected result:
a group_num
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10
What I want to do is assign a group number, from 1 to 10, according to the value in column a.
The idea is to sort the values, split them into 10 equally sized groups, and assign a number from 1 to 10 to each group.
But I have no idea how to implement this in pandas.
I need your help.

I believe you need qcut for evenly sized bins:
df['b'] = pd.qcut(df['a'], 10, labels=range(1, 11))
print(df)
a b
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10
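As a quick sanity check (a minimal sketch, assuming the same df as above), you can confirm that qcut really produced ten equally sized bins:

# every label 1..10 should hold exactly two of the twenty rows
print(df['b'].value_counts().sort_index())

# if you also want the quantile edges qcut chose, retbins=True returns them
labels, edges = pd.qcut(df['a'], 10, labels=range(1, 11), retbins=True)
print(edges)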

And if you want to create groups of 2, you can use this:
df['b'] = df['a'].floordiv(2) + 1
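One caveat worth noting (my addition, just a small sketch): the floordiv trick only yields even groups here because column a holds the consecutive values 0..19; for arbitrary values, qcut still bins by quantiles while floordiv does not:

s = pd.Series([0, 1, 2, 3, 100, 200])                 # not consecutive
print(pd.qcut(s, 3, labels=range(1, 4)).tolist())      # [1, 1, 2, 2, 3, 3] -> three even groups
print((s.floordiv(2) + 1).tolist())                    # [1, 1, 2, 2, 51, 101] -> not group numbers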

You can use integer division with //:
df['G'] = df.a // 2 + 1
df
Out[609]:
a G
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10

Related

How can I add rows iteratively to a select result set in PL/SQL?

In the work_order table there is a wo_no column. When I query the work_order table I want 2 additional columns (Task_no, Task_step_no) in the result set, as follows.
This should be iterated for all the wo_no values in the work_order table. task_no should go up to 5 and task_step_no should go up to 2000. (Please have a look at the attached image to see the result set if this is not clear.)
Any idea how to get such a result set in PL/SQL?
One option is to use 2 row generators cross joined to your current table.
SQL> with
2 work_order (wo_no) as
3 (select 1 from dual union all
4 select 2 from dual
5 ),
6 task (task_no) as
7 (select level from dual connect by level <= 5),
8 step (task_step_no) as
9 (select level from dual connect by level <= 20) --> you'd have 2000 here
10 select y.wo_no, t.task_no, s.task_step_no
11 from work_order y cross join task t cross join step s
12 order by 1, 2, 3;
Result:
WO_NO TASK_NO TASK_STEP_NO
---------- ---------- ------------
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 1 6
1 1 7
1 1 8
1 1 9
1 1 10
1 1 11
1 1 12
1 1 13
1 1 14
1 1 15
1 1 16
1 1 17
1 1 18
1 1 19
1 1 20
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 2 8
1 2 9
1 2 10
1 2 11
1 2 12
1 2 13
1 2 14
1 2 15
1 2 16
1 2 17
1 2 18
1 2 19
1 2 20
1 3 1
1 3 2
1 3 3
1 3 4
1 3 5
1 3 6
1 3 7
1 3 8
1 3 9
1 3 10
1 3 11
1 3 12
1 3 13
1 3 14
1 3 15
1 3 16
1 3 17
1 3 18
1 3 19
1 3 20
1 4 1
1 4 2
1 4 3
1 4 4
1 4 5
1 4 6
1 4 7
1 4 8
1 4 9
1 4 10
1 4 11
1 4 12
1 4 13
1 4 14
1 4 15
1 4 16
1 4 17
1 4 18
1 4 19
1 4 20
1 5 1
1 5 2
1 5 3
1 5 4
1 5 5
1 5 6
1 5 7
1 5 8
1 5 9
1 5 10
1 5 11
1 5 12
1 5 13
1 5 14
1 5 15
1 5 16
1 5 17
1 5 18
1 5 19
1 5 20
2 1 1
2 1 2
2 1 3
2 1 4
2 1 5
2 1 6
2 1 7
2 1 8
2 1 9
2 1 10
2 1 11
2 1 12
2 1 13
2 1 14
2 1 15
2 1 16
2 1 17
2 1 18
2 1 19
2 1 20
2 2 1
2 2 2
2 2 3
2 2 4
2 2 5
2 2 6
2 2 7
2 2 8
2 2 9
2 2 10
2 2 11
2 2 12
2 2 13
2 2 14
2 2 15
2 2 16
2 2 17
2 2 18
2 2 19
2 2 20
2 3 1
2 3 2
2 3 3
2 3 4
2 3 5
2 3 6
2 3 7
2 3 8
2 3 9
2 3 10
2 3 11
2 3 12
2 3 13
2 3 14
2 3 15
2 3 16
2 3 17
2 3 18
2 3 19
2 3 20
2 4 1
2 4 2
2 4 3
2 4 4
2 4 5
2 4 6
2 4 7
2 4 8
2 4 9
2 4 10
2 4 11
2 4 12
2 4 13
2 4 14
2 4 15
2 4 16
2 4 17
2 4 18
2 4 19
2 4 20
2 5 1
2 5 2
2 5 3
2 5 4
2 5 5
2 5 6
2 5 7
2 5 8
2 5 9
2 5 10
2 5 11
2 5 12
2 5 13
2 5 14
2 5 15
2 5 16
2 5 17
2 5 18
2 5 19
2 5 20
200 rows selected.
SQL>
As you already have the work_order table, you'd just use it in the FROM clause (not as a CTE):
with
task (task_no) as
(select level from dual connect by level <= 5),
step (task_step_no) as
(select level from dual connect by level <= 20)
select y.wo_no, t.task_no, s.task_step_no
from work_order y cross join task t cross join step s
order by 1, 2, 3;

Repeat rows based on second frame

I would like to ask for your support; I have tried many things without success.
Suppose you have two different frames, a long frame (LF) with a high number of rows and a short frame (SF) with a low number of rows, see the example:
SF=pd.DataFrame({"col1":[1,2,3],"col2":[4,5,6]})
LF=pd.DataFrame({"col_long":[1,2,3,4,5,6,7,8,9,10,11]})
I need to loop through the values of a specific column from the short frame, let's say we take "col2", and concatenate both frames along axis 1. I have a solution which works, like this:
EMPTY_FRAME=pd.DataFrame()
SF=pd.DataFrame({"col1":[1,2,3],"col2":[4,5,6]})
LF=pd.DataFrame({"col_long":[1,2,3,4,5,6,7,8,9,10,11]})
for i in range(len(SF.index)):
    LF["col1"] = SF["col1"].values[i]
    LF["col2"] = SF["col2"].values[i]
    EMPTY_FRAME = EMPTY_FRAME.append(LF)
The accumulated result (EMPTY_FRAME) is:
   col_long col1 col2
0 1 1 4
1 2 1 4
2 3 1 4
3 4 1 4
4 5 1 4
5 6 1 4
6 7 1 4
7 8 1 4
8 9 1 4
9 10 1 4
10 11 1 4
0 1 2 5
1 2 2 5
2 3 2 5
3 4 2 5
4 5 2 5
5 6 2 5
6 7 2 5
7 8 2 5
8 9 2 5
9 10 2 5
10 11 2 5
0 1 3 6
1 2 3 6
2 3 3 6
3 4 3 6
4 5 3 6
5 6 3 6
6 7 3 6
7 8 3 6
8 9 3 6
9 10 3 6
10 11 3 6
but this gets pretty confusing, since I have many columns inside SF and I might forget some of them. So the question: is there a better and shorter way to do this?
I would be really grateful for any idea on how to improve my code.
You can cross join with a dummy key, then reindex to retain the column order:
out = (SF.assign(k=1).merge(LF.assign(k=1), on='k').drop('k', axis=1)
         .reindex(columns=LF.columns.union(SF.columns, sort=False)))
out.index = out['col_long'].factorize()[0] #if required
print(out)
col_long col1 col2
0 1 1 4
1 2 1 4
2 3 1 4
3 4 1 4
4 5 1 4
5 6 1 4
6 7 1 4
7 8 1 4
8 9 1 4
9 10 1 4
10 11 1 4
0 1 2 5
1 2 2 5
2 3 2 5
3 4 2 5
4 5 2 5
5 6 2 5
6 7 2 5
7 8 2 5
8 9 2 5
9 10 2 5
10 11 2 5
0 1 3 6
1 2 3 6
2 3 3 6
3 4 3 6
4 5 3 6
5 6 3 6
6 7 3 6
7 8 3 6
8 9 3 6
9 10 3 6
10 11 3 6
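On pandas 1.2 or newer you could also skip the dummy key entirely, since merge supports how='cross' (a hedged alternative sketch, same SF/LF as above):

# every SF row is paired with every LF row, SF order outermost;
# the column selection restores the col_long / col1 / col2 layout
out = SF.merge(LF, how='cross')[['col_long', 'col1', 'col2']]
out.index = out['col_long'].factorize()[0]   # optional: repeat the 0..10 index per block
print(out)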

Re-arrange and plot data in pandas

I have a data frame like the following:
days movements count
0 0 0 2777
1 0 1 51
2 0 2 2
3 1 0 6279
4 1 1 200
5 1 2 7
6 1 3 3
7 2 0 5609
8 2 1 110
9 2 2 32
10 2 3 4
11 3 0 4109
12 3 1 118
13 3 2 101
14 3 3 8
15 3 4 3
16 3 6 1
17 4 0 3034
18 4 1 129
19 4 2 109
20 4 3 6
21 4 4 2
22 4 5 2
23 5 0 2288
24 5 1 131
25 5 2 131
26 5 3 9
27 5 4 2
28 5 5 1
29 6 0 1918
30 6 1 139
31 6 2 109
32 6 3 13
33 6 4 1
34 6 5 1
35 7 0 1442
36 7 1 109
37 7 2 153
38 7 3 13
39 7 4 10
40 7 5 1
41 8 0 1085
42 8 1 76
43 8 2 111
44 8 3 13
45 8 4 7
46 8 7 1
47 9 0 845
48 9 1 81
49 9 2 86
50 9 3 8
51 9 4 8
52 10 0 646
53 10 1 70
54 10 2 83
55 10 3 1
56 10 4 2
57 10 5 1
58 10 6 1
This shows that, for example, on day 0 I have 2777 entries with 0 movements, 51 entries with 1 movement, and 2 entries with 2 movements. I want to plot it as a bar graph for every day, showing the entry count for each number of movements. To do this, I thought I would transform the data into something like the table below and then plot a bar graph.
days     0    1    2   3   4  5  6  7
0     2777   51    2
1     6279  200    7   3
2     5609  110   32   4
3     4109  118  101   8   3     1
4     3034  129  109   6   2  2
5     2288  131  131   9   2  1
6     1918  139  109  13   1  1
7     1442  109  153  13  10  1
8     1085   76  111  13   7        1
9      845   81   86   8   8
10     646   70   83   1   2  1  1
I cannot figure out how to achieve this. I have thousands of lines of data, so doing it by hand does not make sense. Can someone guide me on how to rearrange the data? If there is a quick way to plot the bar graph with matplotlib straight from the actual data frame, that would be even better. Thanks for the help.
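One possible approach (a minimal sketch, assuming the frame above is named df): pivot days/movements into a wide table, then let pandas draw a grouped bar chart straight from it.

import matplotlib.pyplot as plt

# wide table: one row per day, one column per movements value, NaN where a combination is absent
wide = df.pivot(index='days', columns='movements', values='count')
print(wide)

# grouped bar chart, one cluster of bars per day
wide.plot(kind='bar')
plt.xlabel('days')
plt.ylabel('entries')
plt.show()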

Pandas DataFrame reshaping to xyz format

If I have a dataframe like:
11-May 12-May 13-May 14-May Distance
0 1 6 11 16 10
1 2 7 12 17 20
2 3 8 13 18 30
3 4 9 14 19 40
4 5 10 15 20 50
And I want to reorganize it into only three columns, like:
Distance variable value
0 10 11-May 1
1 20 11-May 2
2 30 11-May 3
3 40 11-May 4
4 50 11-May 5
5 10 12-May 6
6 20 12-May 7
7 30 12-May 8
8 40 12-May 9
9 50 12-May 10
10 10 13-May 11
11 20 13-May 12
12 30 13-May 13
13 40 13-May 14
14 50 13-May 15
15 10 14-May 16
16 20 14-May 17
17 30 14-May 18
18 40 14-May 19
19 50 14-May 20
How would I be able to do this in Pandas?
One way is using unstack:
In [1940]: dff.set_index('Distance').unstack().reset_index()
Out[1940]:
level_0 Distance 0
0 11-May 10 1
1 11-May 20 2
2 11-May 30 3
3 11-May 40 4
4 11-May 50 5
5 12-May 10 6
6 12-May 20 7
7 12-May 30 8
8 12-May 40 9
9 12-May 50 10
10 13-May 10 11
11 13-May 20 12
12 13-May 30 13
13 13-May 40 14
14 13-May 50 15
15 14-May 10 16
16 14-May 20 17
17 14-May 30 18
18 14-May 40 19
19 14-May 50 20
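Note that unstack leaves the generated column names as level_0 and 0; if you want the exact Distance / variable / value layout from the question, a small follow-up sketch (same dff, my own addition) is:

out = (dff.set_index('Distance')
          .unstack()
          .reset_index()
          .rename(columns={'level_0': 'variable', 0: 'value'})
          [['Distance', 'variable', 'value']])
print(out)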
Another way is using melt:
In [1943]: dff.melt('Distance')
Out[1943]:
Distance variable value
0 10 11-May 1
1 20 11-May 2
2 30 11-May 3
3 40 11-May 4
4 50 11-May 5
5 10 12-May 6
6 20 12-May 7
7 30 12-May 8
8 40 12-May 9
9 50 12-May 10
10 10 13-May 11
11 20 13-May 12
12 30 13-May 13
13 40 13-May 14
14 50 13-May 15
15 10 14-May 16
16 20 14-May 17
17 30 14-May 18
18 40 14-May 19
19 50 14-May 20
Where the input dff is:
In [1941]: dff
Out[1941]:
11-May 12-May 13-May 14-May Distance
0 1 6 11 16 10
1 2 7 12 17 20
2 3 8 13 18 30
3 4 9 14 19 40
4 5 10 15 20 50

SQL Existing Column Conditional Update Query

I have this data
AnsID QuesID AnsOrder
-----------------------
1 5 NULL
2 5 NULL
3 5 NULL
4 5 NULL
5 5 NULL
6 3 NULL
7 3 NULL
8 3 NULL
9 3 NULL
10 3 NULL
11 4 NULL
12 4 NULL
13 4 NULL
14 4 NULL
15 4 NULL
16 7 NULL
17 9 NULL
18 9 NULL
19 9 NULL
20 9 NULL
21 8 NULL
22 8 NULL
23 8 NULL
24 8 NULL
I want to UPDATE it into this format:
AnsID QuesID AnsOrder
-----------------------
1 5 1
2 5 2
3 5 3
4 5 4
5 5 5
6 3 1
7 3 2
8 3 3
9 3 4
10 3 5
11 4 1
12 4 2
13 4 3
14 4 4
15 4 5
16 7 1
17 9 1
18 9 2
19 9 3
20 9 4
21 8 1
22 8 2
23 8 3
24 8 4
Basically, I want to update the AnsOrder column in ascending order within each QuesID group, as shown above.
You can generate row numbers partitioned by QuesID and assign them to AnsOrder like this:
; with ord as (
select *,
row_number() over (partition by quesID
order by AnsID) rn
from table1
)
update ord
set ansorder = rn
I've ordered by AnsID for consistency.
Check this at SQL Fiddle.