DataFrame
pd.DataFrame({'a': range(20)})
>>
a
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16
17 17
18 18
19 19
Expected result:
a group_num
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10
What I want to do is assign a group number, from 1 to 10, to each row according to its value.
The idea is to sort the values, split them into 10 equal-sized groups, and assign the numbers 1 through 10 to those groups.
But I have no idea how to implement this in pandas.
I need your help.
I believe you need qcut for evenly sized bins:
df['b'] = pd.qcut(df['a'], 10, labels=range(1, 11))
print(df)
a b
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10
And if you want to create groups of 2, you can use this:
df['b'] = df['a'].floordiv(2) + 1
You can use // (floor division):
df['G'] = df.a // 2 + 1
df
Out[609]:
a G
0 0 1
1 1 1
2 2 2
3 3 2
4 4 3
5 5 3
6 6 4
7 7 4
8 8 5
9 9 5
10 10 6
11 11 6
12 12 7
13 13 7
14 14 8
15 15 8
16 16 9
17 17 9
18 18 10
19 19 10
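The two answers agree on this data but differ in general: qcut bins by quantiles of the values, while // 2 + 1 only matches because 'a' is already sorted, consecutive, and starts at 0. A small sketch contrasting the two (variable names by_qcut and by_floordiv are mine):

```python
import pandas as pd

df = pd.DataFrame({'a': range(20)})

# Quantile-based bins: 10 equal-sized groups labelled 1..10
df['by_qcut'] = pd.qcut(df['a'], 10, labels=range(1, 11)).astype(int)

# Integer division: pairs consecutive values; matches qcut here only
# because 'a' is sorted and evenly spaced
df['by_floordiv'] = df['a'] // 2 + 1

print(df['by_qcut'].equals(df['by_floordiv']))
```

For arbitrary (unsorted, unevenly spaced) values, only the qcut version still produces evenly sized groups.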
Related
In the work_order table there is a wo_no column. When I query the work_order table, I want two additional columns (task_no, task_step_no) in the result set, as follows.
This should iterate over all the wo_no values in the work_order table; task_no should go up to 5 and task_step_no should go up to 2000 (please have a look at the attached image if the result set is not clear).
Any idea how to get such a result set in PL/SQL?
One option is to use 2 row generators cross joined to your current table.
SQL> with
2 work_order (wo_no) as
3 (select 1 from dual union all
4 select 2 from dual
5 ),
6 task (task_no) as
7 (select level from dual connect by level <= 5),
8 step (task_step_no) as
9 (select level from dual connect by level <= 20) --> you'd have 2000 here
10 select y.wo_no, t.task_no, s.task_step_no
11 from work_order y cross join task t cross join step s
12 order by 1, 2, 3;
Result:
WO_NO TASK_NO TASK_STEP_NO
---------- ---------- ------------
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 1 6
1 1 7
1 1 8
1 1 9
1 1 10
1 1 11
1 1 12
1 1 13
1 1 14
1 1 15
1 1 16
1 1 17
1 1 18
1 1 19
1 1 20
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 2 8
1 2 9
1 2 10
1 2 11
1 2 12
1 2 13
1 2 14
1 2 15
1 2 16
1 2 17
1 2 18
1 2 19
1 2 20
1 3 1
1 3 2
1 3 3
1 3 4
1 3 5
1 3 6
1 3 7
1 3 8
1 3 9
1 3 10
1 3 11
1 3 12
1 3 13
1 3 14
1 3 15
1 3 16
1 3 17
1 3 18
1 3 19
1 3 20
1 4 1
1 4 2
1 4 3
1 4 4
1 4 5
1 4 6
1 4 7
1 4 8
1 4 9
1 4 10
1 4 11
1 4 12
1 4 13
1 4 14
1 4 15
1 4 16
1 4 17
1 4 18
1 4 19
1 4 20
1 5 1
1 5 2
1 5 3
1 5 4
1 5 5
1 5 6
1 5 7
1 5 8
1 5 9
1 5 10
1 5 11
1 5 12
1 5 13
1 5 14
1 5 15
1 5 16
1 5 17
1 5 18
1 5 19
1 5 20
2 1 1
2 1 2
2 1 3
2 1 4
2 1 5
2 1 6
2 1 7
2 1 8
2 1 9
2 1 10
2 1 11
2 1 12
2 1 13
2 1 14
2 1 15
2 1 16
2 1 17
2 1 18
2 1 19
2 1 20
2 2 1
2 2 2
2 2 3
2 2 4
2 2 5
2 2 6
2 2 7
2 2 8
2 2 9
2 2 10
2 2 11
2 2 12
2 2 13
2 2 14
2 2 15
2 2 16
2 2 17
2 2 18
2 2 19
2 2 20
2 3 1
2 3 2
2 3 3
2 3 4
2 3 5
2 3 6
2 3 7
2 3 8
2 3 9
2 3 10
2 3 11
2 3 12
2 3 13
2 3 14
2 3 15
2 3 16
2 3 17
2 3 18
2 3 19
2 3 20
2 4 1
2 4 2
2 4 3
2 4 4
2 4 5
2 4 6
2 4 7
2 4 8
2 4 9
2 4 10
2 4 11
2 4 12
2 4 13
2 4 14
2 4 15
2 4 16
2 4 17
2 4 18
2 4 19
2 4 20
2 5 1
2 5 2
2 5 3
2 5 4
2 5 5
2 5 6
2 5 7
2 5 8
2 5 9
2 5 10
2 5 11
2 5 12
2 5 13
2 5 14
2 5 15
2 5 16
2 5 17
2 5 18
2 5 19
2 5 20
200 rows selected.
SQL>
As you already have the work_order table, you'd just use it in the FROM clause (not as a CTE):
with
task (task_no) as
(select level from dual connect by level <= 5),
step (task_step_no) as
(select level from dual connect by level <= 20)
select y.wo_no, t.task_no, s.task_step_no
from work_order y cross join task t cross join step s
order by 1, 2, 3;
I would like to ask for your support; I have tried many things without success.
Suppose you have two different frames: a long frame (LF) with a high number of rows and a short frame (SF) with a low number of rows. See this example:
SF = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
LF = pd.DataFrame({"col_long": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
I need to loop through the rows of the short frame and, for each row, broadcast its values across the long frame, then stack the results. I have a solution that works, like this:
EMPTY_FRAME = pd.DataFrame()
SF = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
LF = pd.DataFrame({"col_long": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
for i in range(len(SF.index)):
    LF["col1"] = SF["col1"].values[i]   # note: this mutates LF in place on every pass
    LF["col2"] = SF["col2"].values[i]
    EMPTY_FRAME = EMPTY_FRAME.append(LF)  # DataFrame.append was removed in pandas 2.0; use pd.concat instead
EMPTY_FRAME =
 col_long col1 col2
0 1 1 4
1 2 1 4
2 3 1 4
3 4 1 4
4 5 1 4
5 6 1 4
6 7 1 4
7 8 1 4
8 9 1 4
9 10 1 4
10 11 1 4
0 1 2 5
1 2 2 5
2 3 2 5
3 4 2 5
4 5 2 5
5 6 2 5
6 7 2 5
7 8 2 5
8 9 2 5
9 10 2 5
10 11 2 5
0 1 3 6
1 2 3 6
2 3 3 6
3 4 3 6
4 5 3 6
5 6 3 6
6 7 3 6
7 8 3 6
8 9 3 6
9 10 3 6
10 11 3 6
but it gets pretty confusing, since I have many columns in the SF and thus might forget some. So the question: is there any way to do this in a better and shorter way?
I would really be grateful if you have an idea how I could further improve my code.
You can cross join on a dummy key, then reindex to retain the column order:
out = (SF.assign(k=1).merge(LF.assign(k=1), on='k').drop(columns='k')
         .reindex(columns=LF.columns.union(SF.columns, sort=False)))
out.index = out['col_long'].factorize()[0]  # if required
print(out)
col_long col1 col2
0 1 1 4
1 2 1 4
2 3 1 4
3 4 1 4
4 5 1 4
5 6 1 4
6 7 1 4
7 8 1 4
8 9 1 4
9 10 1 4
10 11 1 4
0 1 2 5
1 2 2 5
2 3 2 5
3 4 2 5
4 5 2 5
5 6 2 5
6 7 2 5
7 8 2 5
8 9 2 5
9 10 2 5
10 11 2 5
0 1 3 6
1 2 3 6
2 3 3 6
3 4 3 6
4 5 3 6
5 6 3 6
6 7 3 6
7 8 3 6
8 9 3 6
9 10 3 6
10 11 3 6
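Since pandas 1.2 the dummy-key trick is no longer needed: merge supports how='cross' directly. A minimal sketch of that variant, keeping the factorized index from the answer above:

```python
import pandas as pd

SF = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
LF = pd.DataFrame({"col_long": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})

# Cartesian product without a dummy key (requires pandas >= 1.2)
out = SF.merge(LF, how='cross')

# Put col_long first and restore the 0..10 index per SF row,
# matching the loop version's output
out = out[['col_long', 'col1', 'col2']]
out.index = out['col_long'].factorize()[0]
print(out)
```

Every new column added to SF is picked up automatically, which avoids the "forgotten column" problem of the explicit loop.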
I have a data frame like the following:
days movements count
0 0 0 2777
1 0 1 51
2 0 2 2
3 1 0 6279
4 1 1 200
5 1 2 7
6 1 3 3
7 2 0 5609
8 2 1 110
9 2 2 32
10 2 3 4
11 3 0 4109
12 3 1 118
13 3 2 101
14 3 3 8
15 3 4 3
16 3 6 1
17 4 0 3034
18 4 1 129
19 4 2 109
20 4 3 6
21 4 4 2
22 4 5 2
23 5 0 2288
24 5 1 131
25 5 2 131
26 5 3 9
27 5 4 2
28 5 5 1
29 6 0 1918
30 6 1 139
31 6 2 109
32 6 3 13
33 6 4 1
34 6 5 1
35 7 0 1442
36 7 1 109
37 7 2 153
38 7 3 13
39 7 4 10
40 7 5 1
41 8 0 1085
42 8 1 76
43 8 2 111
44 8 3 13
45 8 4 7
46 8 7 1
47 9 0 845
48 9 1 81
49 9 2 86
50 9 3 8
51 9 4 8
52 10 0 646
53 10 1 70
54 10 2 83
55 10 3 1
56 10 4 2
57 10 5 1
58 10 6 1
This shows that, for example, on day 0 I have 2777 entries with 0 movements, 51 entries with 1 movement, and 2 entries with 2 movements. I want to plot a bar graph for every day, showing the entry counts for all movements. To do this, I thought I would transform the data to something like the table below and then plot the bar graph.
days 0 1 2 3 4 5 6 7
0 2777 51 2
1 6279 200 7 3
2 5609 110 32 4
3 4109 118 101 8 3
4 3034 129 109 6 2 2
5 2288 131 131 9 2 1
6 1918 139 109 13 1 1
7 1442 109 153 13 10 1
8 1085 76 111 13 7 1
9 845 81 86 8 8
10 646 70 83 1 2 1 1
I am not getting an idea of how I should achieve this. I have thousands of lines of data, so doing it by hand does not make sense. Can someone guide me on how to rearrange the data? If there is a quick way to plot the bar graph using matplotlib straight from the actual data frame, that would be even better. Thanks for the help.
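No answer is shown for this question, but the reshape described is a pivot: one row per day, one column per movement count. A minimal sketch, assuming the column names days, movements, and count from the question (only the first two days of the data are reproduced here):

```python
import pandas as pd

df = pd.DataFrame({
    'days':      [0, 0, 0, 1, 1, 1, 1],
    'movements': [0, 1, 2, 0, 1, 2, 3],
    'count':     [2777, 51, 2, 6279, 200, 7, 3],
})

# One row per day, one column per movement value;
# missing day/movement combinations become NaN
wide = df.pivot(index='days', columns='movements', values='count')
print(wide)

# Plotting straight from the wide frame (requires matplotlib):
# wide.plot(kind='bar')
```

DataFrame.plot(kind='bar') draws one group of bars per day with one bar per movement column, which is exactly the chart the question describes.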
If I have a dataframe like:
11-May 12-May 13-May 14-May Distance
0 1 6 11 16 10
1 2 7 12 17 20
2 3 8 13 18 30
3 4 9 14 19 40
4 5 10 15 20 50
And, I want to reorganize to only three columns like:
Distance variable value
0 10 11-May 1
1 20 11-May 2
2 30 11-May 3
3 40 11-May 4
4 50 11-May 5
5 10 12-May 6
6 20 12-May 7
7 30 12-May 8
8 40 12-May 9
9 50 12-May 10
10 10 13-May 11
11 20 13-May 12
12 30 13-May 13
13 40 13-May 14
14 50 13-May 15
15 10 14-May 16
16 20 14-May 17
17 30 14-May 18
18 40 14-May 19
19 50 14-May 20
How would I be able to do this in Pandas?
One way, using unstack (note the column names differ from the requested output):
In [1940]: dff.set_index('Distance').unstack().reset_index()
Out[1940]:
level_0 Distance 0
0 11-May 10 1
1 11-May 20 2
2 11-May 30 3
3 11-May 40 4
4 11-May 50 5
5 12-May 10 6
6 12-May 20 7
7 12-May 30 8
8 12-May 40 9
9 12-May 50 10
10 13-May 10 11
11 13-May 20 12
12 13-May 30 13
13 13-May 40 14
14 13-May 50 15
15 14-May 10 16
16 14-May 20 17
17 14-May 30 18
18 14-May 40 19
19 14-May 50 20
Another way, using melt:
In [1943]: dff.melt('Distance')
Out[1943]:
Distance variable value
0 10 11-May 1
1 20 11-May 2
2 30 11-May 3
3 40 11-May 4
4 50 11-May 5
5 10 12-May 6
6 20 12-May 7
7 30 12-May 8
8 40 12-May 9
9 50 12-May 10
10 10 13-May 11
11 20 13-May 12
12 30 13-May 13
13 40 13-May 14
14 50 13-May 15
15 10 14-May 16
16 20 14-May 17
17 30 14-May 18
18 40 14-May 19
19 50 14-May 20
Where,
In [1941]: dff
Out[1941]:
11-May 12-May 13-May 14-May Distance
0 1 6 11 16 10
1 2 7 12 17 20
2 3 8 13 18 30
3 4 9 14 19 40
4 5 10 15 20 50
I have this data
AnsID QuesID AnsOrder
-----------------------
1 5 NULL
2 5 NULL
3 5 NULL
4 5 NULL
5 5 NULL
6 3 NULL
7 3 NULL
8 3 NULL
9 3 NULL
10 3 NULL
11 4 NULL
12 4 NULL
13 4 NULL
14 4 NULL
15 4 NULL
16 7 NULL
17 9 NULL
18 9 NULL
19 9 NULL
20 9 NULL
21 8 NULL
22 8 NULL
23 8 NULL
24 8 NULL
I want to UPDATE it into this format:
AnsID QuesID AnsOrder
-----------------------
1 5 1
2 5 2
3 5 3
4 5 4
5 5 5
6 3 1
7 3 2
8 3 3
9 3 4
10 3 5
11 4 1
12 4 2
13 4 3
14 4 4
15 4 5
16 7 1
17 9 1
18 9 2
19 9 3
20 9 4
21 8 1
22 8 2
23 8 3
24 8 4
Basically, I want to update the AnsOrder column so it numbers the rows in ascending order within each QuesID group.
You can generate row numbers partitioned by QuesID and assign them to AnsOrder like this:
; with ord as (
select *,
row_number() over (partition by quesID
order by AnsID) rn
from table1
)
update ord
set ansorder = rn
I've ordered by AnsID for consistency.
Check this SQL Fiddle.