OptaPlanner incorrect cloud balance output

I am trying the simple cloud balancing example from the OptaPlanner tutorial.
I have 2 computers and 4 processes (the same example that is explained in the OptaPlanner documentation). Each process needs a certain amount of RAM and CPU.
The final solved output is incorrect, as you can see from the logs below: hard constraints are violated.
Ideally, processes 1 and 4 should fit on computer 1.
Processes 2 and 3 should fit on computer 2.
From the logs, it looks like the construction heuristic phase happens, but local search is not showing up in the logs.
Please tell me what I am doing wrong.
After building unsolvedCloudBalance Computers [ Computer [C1][7 CPU][6 RAM], Computer [C2][6 CPU][6 RAM]] Processes [ Process [P1][5 CPU][5 RAM], Process [P2][4 CPU][3 RAM], Process [P3][2 CPU][3 RAM], Process [P4][2 CPU][1 RAM]] Hard Soft Score null
01:35:57.099 [main] INFO o.o.core.impl.solver.DefaultSolver - Solving started: time spent (3), best score (4uninitialized/0hard/0soft), environment mode (REPRODUCIBLE), random (JDK with seed 0).
01:35:57.104 [main] DEBUG o.o.c.i.c.DefaultConstructionHeuristicPhase - CH step (0), time spent (9), score (0hard/-800soft), selected move count (2), picked move ( Process [P1][5 CPU][5 RAM] {null -> Computer [C2][6 CPU][6 RAM]}).
01:35:57.104 [main] DEBUG o.o.c.i.c.DefaultConstructionHeuristicPhase - CH step (1), time spent (9), score (0hard/-1800soft), selected move count (2), picked move ( Process [P2][4 CPU][3 RAM] {null -> Computer [C1][7 CPU][6 RAM]}).
01:35:57.104 [main] DEBUG o.o.c.i.c.DefaultConstructionHeuristicPhase - CH step (2), time spent (9), score (0hard/-1800soft), selected move count (1), picked move ( Process [P3][2 CPU][3 RAM] {null -> Computer [C1][7 CPU][6 RAM]}).
01:35:57.105 [main] DEBUG o.o.c.i.c.DefaultConstructionHeuristicPhase - CH step (3), time spent (10), score (-1hard/-1800soft), selected move count (2), picked move ( Process [P4][2 CPU][1 RAM] {null -> Computer [C2][6 CPU][6 RAM]}).
01:35:57.106 [main] INFO o.o.c.i.c.DefaultConstructionHeuristicPhase - Construction Heuristic phase (0) ended: step total (4), time spent (11), best score (-1hard/-1800soft).
01:36:07.095 [main] DEBUG o.o.c.i.l.DefaultLocalSearchPhase - LS step (0), time spent (10000), score (-2hard/-1800soft), best score (-1hard/-1800soft), accepted/selected move count (0/9152031), picked move ( Process [P4][2 CPU][1 RAM] { Computer [C2][6 CPU][6 RAM] -> Computer [C1][7 CPU][6 RAM]}).
01:36:07.096 [main] INFO o.o.c.i.l.DefaultLocalSearchPhase - Local Search phase (1) ended: step total (1), time spent (10001), best score (-1hard/-1800soft).
01:36:07.096 [main] INFO o.o.core.impl.solver.DefaultSolver - Solving ended: time spent (10001), best score (-1hard/-1800soft), average calculate count per second (915112), environment mode (REPRODUCIBLE).
time taken 10 sec
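
As a sanity check on the numbers in the question, here is a minimal plain-Java sketch (no OptaPlanner dependency) that recomputes the hard score for three assignments taken from the log. It assumes the hard score is the sum of the CPU and RAM shortfalls per computer, which is consistent with the -1hard and -2hard values logged above. The names `CloudBalanceCheck`, `Proc` and `hardScore` are made up for illustration.

```java
import java.util.List;
import java.util.Map;

// Plain-Java sketch; all class and method names here are hypothetical.
class CloudBalanceCheck {
    record Computer(String id, int cpu, int ram) {}
    record Proc(String id, int cpu, int ram) {}

    // Data copied from the "unsolvedCloudBalance" line in the question.
    static final Computer C1 = new Computer("C1", 7, 6);
    static final Computer C2 = new Computer("C2", 6, 6);
    static final Proc P1 = new Proc("P1", 5, 5);
    static final Proc P2 = new Proc("P2", 4, 3);
    static final Proc P3 = new Proc("P3", 2, 3);
    static final Proc P4 = new Proc("P4", 2, 1);

    // Assumed hard score: for every computer, add the negative CPU and RAM shortfall.
    static int hardScore(Map<Computer, List<Proc>> assignment) {
        int score = 0;
        for (Map.Entry<Computer, List<Proc>> e : assignment.entrySet()) {
            int cpuUsed = e.getValue().stream().mapToInt(Proc::cpu).sum();
            int ramUsed = e.getValue().stream().mapToInt(Proc::ram).sum();
            score += Math.min(0, e.getKey().cpu() - cpuUsed);
            score += Math.min(0, e.getKey().ram() - ramUsed);
        }
        return score;
    }

    // The assignment the asker expected: P1 + P4 on C1, P2 + P3 on C2.
    static Map<Computer, List<Proc>> expected() {
        return Map.of(C1, List.of(P1, P4), C2, List.of(P2, P3));
    }

    // What the construction heuristic produced (CH steps 0 to 3 in the log).
    static Map<Computer, List<Proc>> constructionHeuristicResult() {
        return Map.of(C1, List.of(P2, P3), C2, List.of(P1, P4));
    }

    // The state after the single logged local search move (P4 moved from C2 to C1).
    static Map<Computer, List<Proc>> afterLocalSearchStep0() {
        return Map.of(C1, List.of(P2, P3, P4), C2, List.of(P1));
    }

    public static void main(String[] args) {
        System.out.println(hardScore(expected()));                    // 0
        System.out.println(hardScore(constructionHeuristicResult())); // -1
        System.out.println(hardScore(afterLocalSearchStep0()));       // -2
    }
}
```

Under that assumption the expected assignment is feasible, the construction heuristic's result is off by one CPU unit on C2 (7 demanded, 6 available), and the one logged local search move makes things worse on C1.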

It worked as expected after I added difficultyComparatorClass to the @PlanningEntity annotation.
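
To illustrate why ordering by difficulty can matter here, below is a rough plain-Java sketch of a difficulty comparator plus a simplified first-fit-decreasing pass over the data from the question. It is not OptaPlanner's construction heuristic (which picks moves by score), and `ProcDifficultyComparator`, `firstFitDecreasing` and the "CPU times RAM" difficulty measure are assumptions made for this example. In OptaPlanner the comparator class would be referenced from `@PlanningEntity(difficultyComparatorClass = ...)`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; all class and method names here are hypothetical.
class DifficultySketch {
    record Computer(String id, int cpu, int ram) {}
    record Proc(String id, int cpu, int ram) {}

    // Assumed difficulty measure: a process needing more CPU x RAM is harder to place.
    // Sorts ascending (easiest first); ties are broken by id to keep the order stable.
    static class ProcDifficultyComparator implements Comparator<Proc> {
        @Override
        public int compare(Proc a, Proc b) {
            int byDemand = Integer.compare(a.cpu() * a.ram(), b.cpu() * b.ram());
            return byDemand != 0 ? byDemand : a.id().compareTo(b.id());
        }
    }

    // Simplified first-fit-decreasing: hardest process first, first computer with room.
    // Returns process id -> computer id; a process that fits nowhere is left out.
    static Map<String, String> firstFitDecreasing(List<Computer> computers, List<Proc> procs) {
        List<Proc> sorted = new ArrayList<>(procs);
        sorted.sort(new ProcDifficultyComparator().reversed());
        Map<String, int[]> free = new LinkedHashMap<>();
        for (Computer c : computers) {
            free.put(c.id(), new int[] {c.cpu(), c.ram()});
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Proc p : sorted) {
            for (Map.Entry<String, int[]> e : free.entrySet()) {
                int[] f = e.getValue();
                if (f[0] >= p.cpu() && f[1] >= p.ram()) {
                    f[0] -= p.cpu();
                    f[1] -= p.ram();
                    assignment.put(p.id(), e.getKey());
                    break;
                }
            }
        }
        return assignment;
    }

    // Runs the sketch on the 2 computers and 4 processes from the question.
    static Map<String, String> demo() {
        List<Computer> computers = List.of(new Computer("C1", 7, 6), new Computer("C2", 6, 6));
        List<Proc> procs = List.of(new Proc("P1", 5, 5), new Proc("P2", 4, 3),
                new Proc("P3", 2, 3), new Proc("P4", 2, 1));
        return firstFitDecreasing(computers, procs);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {P1=C1, P2=C2, P3=C2, P4=C1}
    }
}
```

With the largest process placed first, this toy pass ends up with P1 and P4 on C1 and P2 and P3 on C2, which is the assignment the question expected.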

Related

How do I make the construction heuristic phase assign all entities in overconstrained planning?

As the title says. I'm a developer new to OptaPlanner. Before I switched to overconstrained planning, the CH phase worked correctly: here is the CH assigning 21 entities as expected, after which LS was able to find an optimal solution (ignore the medium score level):
2021-10-01 12:26:07,933 [main] INFO Solving started: time spent (3221), best score (-21init/0hard/0medium/0soft), environment mode (REPRODUCIBLE), move thread count (NONE), random (JDK with seed 0).
2021-10-01 12:26:10,664 [main] DEBUG CH step (0), time spent (5955), score (-20init/0hard/0medium/-130soft), selected move count (21120), picked move (optaplanner.domain.Allocation#7c51782d {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=47, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:12,924 [main] DEBUG CH step (1), time spent (8215), score (-19init/-1hard/0medium/-3025soft), selected move count (21120), picked move (optaplanner.domain.Allocation#77bc2e16 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=45, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:14,137 [main] DEBUG CH step (2), time spent (9428), score (-18init/-3hard/0medium/-3025soft), selected move count (21120), picked move (optaplanner.domain.Allocation#48e8c32a {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#50915d5, index=0, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:15,293 [main] DEBUG CH step (3), time spent (10584), score (-17init/-3hard/0medium/-3164soft), selected move count (21120), picked move (optaplanner.domain.Allocation#20a7953c {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=46, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:16,027 [main] DEBUG CH step (4), time spent (11318), score (-16init/-4hard/0medium/-6098soft), selected move count (21120), picked move (optaplanner.domain.Allocation#57c00115 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=42, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:16,654 [main] DEBUG CH step (5), time spent (11945), score (-15init/-6hard/0medium/-6098soft), selected move count (21120), picked move (optaplanner.domain.Allocation#411a5965 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#50915d5, index=45, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:17,267 [main] DEBUG CH step (6), time spent (12558), score (-14init/-6hard/0medium/-6247soft), selected move count (21120), picked move (optaplanner.domain.Allocation#4fe533ff {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:17,850 [main] DEBUG CH step (7), time spent (13141), score (-13init/-7hard/0medium/-9221soft), selected move count (21120), picked move (optaplanner.domain.Allocation#5377414a {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=39, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:18,541 [main] DEBUG CH step (8), time spent (13832), score (-12init/-9hard/0medium/-9221soft), selected move count (21120), picked move (optaplanner.domain.Allocation#4e83a98 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#50915d5, index=42, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:19,182 [main] DEBUG CH step (9), time spent (14473), score (-11init/-9hard/0medium/-9380soft), selected move count (21120), picked move (optaplanner.domain.Allocation#17aa8a11 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:19,758 [main] DEBUG CH step (10), time spent (15049), score (-10init/-10hard/0medium/-9416soft), selected move count (21120), picked move (optaplanner.domain.Allocation#71b639d0 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:20,332 [main] DEBUG CH step (11), time spent (15623), score (-9init/-12hard/0medium/-12250soft), selected move count (21120), picked move (optaplanner.domain.Allocation#18a25bbd {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#63661fc7, index=45, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:20,939 [main] DEBUG CH step (12), time spent (16230), score (-8init/-12hard/0medium/-12419soft), selected move count (21120), picked move (optaplanner.domain.Allocation#5d5b9ecb {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:21,499 [main] DEBUG CH step (13), time spent (16790), score (-7init/-13hard/0medium/-15413soft), selected move count (21120), picked move (optaplanner.domain.Allocation#1ee27d73 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=36, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:22,085 [main] DEBUG CH step (14), time spent (17376), score (-6init/-14hard/0medium/-15449soft), selected move count (21120), picked move (optaplanner.domain.Allocation#5e5aafc6 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#50915d5, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:22,669 [main] DEBUG CH step (15), time spent (17960), score (-5init/-14hard/0medium/-15628soft), selected move count (21120), picked move (optaplanner.domain.Allocation#542f6803 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:23,253 [main] DEBUG CH step (16), time spent (18544), score (-4init/-15hard/0medium/-18662soft), selected move count (21120), picked move (optaplanner.domain.Allocation#5583098b {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=33, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:26:23,833 [main] DEBUG CH step (17), time spent (19124), score (-3init/-16hard/0medium/-18698soft), selected move count (21120), picked move (optaplanner.domain.Allocation#5807efad {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#63661fc7, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:24,427 [main] DEBUG CH step (18), time spent (19718), score (-2init/-16hard/0medium/-18887soft), selected move count (21120), picked move (optaplanner.domain.Allocation#53a84ff4 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:25,010 [main] DEBUG CH step (19), time spent (20301), score (-1init/-17hard/0medium/-18923soft), selected move count (21120), picked move (optaplanner.domain.Allocation#7ce85af2 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:25,660 [main] DEBUG CH step (20), time spent (20951), score (-19hard/0medium/-18953soft), selected move count (21120), picked move (optaplanner.domain.Allocation#316acbb5 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#65130cf2, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:26:25,662 [main] INFO Construction Heuristic phase (0) ended: time spent (20953), best score (-19hard/0medium/-18953soft), score calculation speed (25027/sec), step total (21).
2021-10-01 12:26:25,961 [main] DEBUG LS step (0), time spent (21252), score (-17hard/0medium/-24372soft), new best score (-17hard/0medium/-24372soft), accepted/selected move count (1000/1000), picked move (optaplanner.domain.Allocation#5296ab0c {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=47, availability=2021-10-08T14:00-2021-10-08T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=41, availability=2021-10-05T14:00-2021-10-05T22:00}}).
2021-10-01 12:26:26,114 [main] DEBUG LS step (1), time spent (21405), score (-15hard/0medium/-40094soft), new best score (-15hard/0medium/-40094soft), accepted/selected move count (1000/1067), picked move (optaplanner.domain.Allocation#730794bb {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=45, availability=2021-10-08T14:00-2021-10-08T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=46, availability=2021-10-01T14:00-2021-10-01T22:00}}).
2021-10-01 12:26:26,309 [main] DEBUG LS step (2), time spent (21600), score (-13hard/0medium/-47776soft), new best score (-13hard/0medium/-47776soft), accepted/selected move count (1000/1160), picked move (optaplanner.domain.Allocation#73ed094c {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=46, availability=2021-10-08T14:00-2021-10-08T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=20, availability=2021-10-04T14:00-2021-10-04T22:00}}).
2021-10-01 12:26:26,471 [main] DEBUG LS step (3), time spent (21762), score (-11hard/0medium/-65742soft), new best score (-11hard/0medium/-65742soft), accepted/selected move count (1000/1145), picked move (optaplanner.domain.Allocation#3c37489b {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=42, availability=2021-10-08T14:00-2021-10-08T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=0, availability=2021-10-01T14:00-2021-10-01T22:00}}).
2021-10-01 12:26:26,649 [main] DEBUG LS step (4), time spent (21940), score (-9hard/0medium/-75202soft), new best score (-9hard/0medium/-75202soft), accepted/selected move count (1000/1148), picked move (optaplanner.domain.Allocation#a386ccf {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=41, availability=2021-10-08T14:00-2021-10-08T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=0, availability=2021-10-05T14:00-2021-10-05T22:00}}).
2021-10-01 12:26:26,832 [main] DEBUG LS step (5), time spent (22123), score (-8hard/0medium/-75202soft), new best score (-8hard/0medium/-75202soft), accepted/selected move count (1000/1172), picked move (optaplanner.domain.Allocation#144402f6 {ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=42, availability=2021-10-07T14:00-2021-10-07T22:00} -> ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=11, availability=2021-10-05T14:00-2021-10-05T22:00}}).
--------------------(omitting other steps)--------------------
2021-10-01 12:27:04,541 [main] DEBUG LS step (347), time spent (59832), score (0hard/0medium/-42754soft), best score (0hard/0medium/-42754soft), accepted/selected move count (1000/1359), picked move (optaplanner.domain.Allocation#64047c70 {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=46, availability=2021-10-06T14:00-2021-10-06T22:00}} <-> optaplanner.domain.Allocation#3c37489b {ResourceTimegrain{resource=optaplanner.domain.Resource#276b68af, index=44, availability=2021-10-06T14:00-2021-10-06T22:00}}).
2021-10-01 12:27:04,635 [main] DEBUG LS step (348), time spent (59926), score (0hard/0medium/-42754soft), best score (0hard/0medium/-42754soft), accepted/selected move count (1000/1365), picked move (optaplanner.domain.Allocation#68bb44fe {ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=6, availability=2021-10-07T14:00-2021-10-07T22:00}} <-> optaplanner.domain.Allocation#62f89e6a {ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=33, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:27:04,709 [main] DEBUG LS step (349), time spent (60000), score (0hard/0medium/-42754soft), best score (0hard/0medium/-42754soft), accepted/selected move count (802/1086), picked move (optaplanner.domain.Allocation#7f77b1b0 {ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=13, availability=2021-10-07T14:00-2021-10-07T22:00}} <-> optaplanner.domain.Allocation#6f8a9b12 {ResourceTimegrain{resource=optaplanner.domain.Resource#34d644b5, index=39, availability=2021-10-07T14:00-2021-10-07T22:00}}).
2021-10-01 12:27:04,709 [main] INFO Local Search phase (1) ended: time spent (60000), best score (0hard/0medium/-42754soft), score calculation speed (11876/sec), step total (350).
2021-10-01 12:27:04,712 [main] INFO Solving ended: time spent (60000), best score (0hard/0medium/-42754soft), score calculation speed (15115/sec), phase total (2), environment mode (REPRODUCIBLE), move thread count (NONE).
After switching to a nullable planning variable, the medium score level has a constraint that penalizes every unassigned entity. This is the log from solving the same problem instance, for which OptaPlanner couldn't find an optimal solution:
2021-10-01 12:39:29,144 [main] INFO Solving started: time spent (6235), best score (0hard/-21medium/0soft), environment mode (REPRODUCIBLE), move thread count (NONE), random (JDK with seed 0).
2021-10-01 12:39:33,797 [main] DEBUG CH step (0), time spent (10892), score (0hard/-20medium/-130soft), selected move count (21121), picked move (optaplanner.domain.Allocation#7c51782d {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=47, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:36,698 [main] DEBUG CH step (1), time spent (13793), score (0hard/-20medium/-130soft), selected move count (21121), picked move (optaplanner.domain.Allocation#77bc2e16 {null -> null}).
2021-10-01 12:39:39,032 [main] DEBUG CH step (2), time spent (16127), score (0hard/-20medium/-130soft), selected move count (21121), picked move (optaplanner.domain.Allocation#48e8c32a {null -> null}).
2021-10-01 12:39:40,857 [main] DEBUG CH step (3), time spent (17952), score (0hard/-19medium/-269soft), selected move count (21121), picked move (optaplanner.domain.Allocation#20a7953c {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=46, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:41,878 [main] DEBUG CH step (4), time spent (18973), score (0hard/-19medium/-269soft), selected move count (21121), picked move (optaplanner.domain.Allocation#57c00115 {null -> null}).
2021-10-01 12:39:42,586 [main] DEBUG CH step (5), time spent (19681), score (0hard/-19medium/-269soft), selected move count (21121), picked move (optaplanner.domain.Allocation#411a5965 {null -> null}).
2021-10-01 12:39:43,354 [main] DEBUG CH step (6), time spent (20449), score (0hard/-18medium/-418soft), selected move count (21121), picked move (optaplanner.domain.Allocation#4fe533ff {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:44,007 [main] DEBUG CH step (7), time spent (21102), score (0hard/-18medium/-418soft), selected move count (21121), picked move (optaplanner.domain.Allocation#5377414a {null -> null}).
2021-10-01 12:39:44,633 [main] DEBUG CH step (8), time spent (21728), score (0hard/-18medium/-418soft), selected move count (21121), picked move (optaplanner.domain.Allocation#4e83a98 {null -> null}).
2021-10-01 12:39:45,251 [main] DEBUG CH step (9), time spent (22346), score (0hard/-17medium/-577soft), selected move count (21121), picked move (optaplanner.domain.Allocation#17aa8a11 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:45,898 [main] DEBUG CH step (10), time spent (22993), score (0hard/-17medium/-577soft), selected move count (21121), picked move (optaplanner.domain.Allocation#71b639d0 {null -> null}).
2021-10-01 12:39:46,485 [main] DEBUG CH step (11), time spent (23580), score (0hard/-17medium/-577soft), selected move count (21121), picked move (optaplanner.domain.Allocation#18a25bbd {null -> null}).
2021-10-01 12:39:47,151 [main] DEBUG CH step (12), time spent (24246), score (0hard/-16medium/-746soft), selected move count (21121), picked move (optaplanner.domain.Allocation#5d5b9ecb {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:47,862 [main] DEBUG CH step (13), time spent (24957), score (0hard/-16medium/-746soft), selected move count (21121), picked move (optaplanner.domain.Allocation#1ee27d73 {null -> null}).
2021-10-01 12:39:48,556 [main] DEBUG CH step (14), time spent (25651), score (0hard/-16medium/-746soft), selected move count (21121), picked move (optaplanner.domain.Allocation#5e5aafc6 {null -> null}).
2021-10-01 12:39:49,193 [main] DEBUG CH step (15), time spent (26288), score (0hard/-15medium/-925soft), selected move count (21121), picked move (optaplanner.domain.Allocation#542f6803 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:49,781 [main] DEBUG CH step (16), time spent (26876), score (0hard/-15medium/-925soft), selected move count (21121), picked move (optaplanner.domain.Allocation#5583098b {null -> null}).
2021-10-01 12:39:50,407 [main] DEBUG CH step (17), time spent (27502), score (0hard/-15medium/-925soft), selected move count (21121), picked move (optaplanner.domain.Allocation#5807efad {null -> null}).
2021-10-01 12:39:51,168 [main] DEBUG CH step (18), time spent (28262), score (0hard/-14medium/-1114soft), selected move count (21121), picked move (optaplanner.domain.Allocation#53a84ff4 {null -> ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:51,826 [main] DEBUG CH step (19), time spent (28921), score (0hard/-14medium/-1114soft), selected move count (21121), picked move (optaplanner.domain.Allocation#7ce85af2 {null -> null}).
2021-10-01 12:39:52,372 [main] DEBUG CH step (20), time spent (29467), score (0hard/-14medium/-1114soft), selected move count (21121), picked move (optaplanner.domain.Allocation#316acbb5 {null -> null}).
2021-10-01 12:39:52,374 [main] INFO Construction Heuristic phase (0) ended: time spent (29469), best score (0hard/-14medium/-1114soft), score calculation speed (19105/sec), step total (21).
2021-10-01 12:39:52,692 [main] DEBUG LS step (0), time spent (29787), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1000), picked move (optaplanner.domain.Allocation#41418e53 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#7f77b1b0 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=46, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:52,873 [main] DEBUG LS step (1), time spent (29968), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1213), picked move (optaplanner.domain.Allocation#c4d9e83 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#6f8a9b12 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=47, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:53,061 [main] DEBUG LS step (2), time spent (30156), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1503), picked move (optaplanner.domain.Allocation#7b174491 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#d464e23 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:53,270 [main] DEBUG LS step (3), time spent (30365), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1535), picked move (optaplanner.domain.Allocation#7f77b1b0 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#41418e53 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=46, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:53,457 [main] DEBUG LS step (4), time spent (30552), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1532), picked move (optaplanner.domain.Allocation#c4d9e83 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=47, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#6f8a9b12 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:39:53,605 [main] DEBUG LS step (5), time spent (30700), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1548), picked move (optaplanner.domain.Allocation#2957f567 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#7b174491 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
--------------------(omitting other steps)--------------------
2021-10-01 12:40:22,649 [main] DEBUG LS step (367), time spent (59744), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1455), picked move (optaplanner.domain.Allocation#d464e23 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#41418e53 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:40:22,731 [main] DEBUG LS step (368), time spent (59826), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1545), picked move (optaplanner.domain.Allocation#2957f567 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#c4d9e83 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=44, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:40:22,812 [main] DEBUG LS step (369), time spent (59907), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1535), picked move (optaplanner.domain.Allocation#7b174491 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=45, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#7f77b1b0 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=46, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:40:22,890 [main] DEBUG LS step (370), time spent (59985), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (1000/1525), picked move (optaplanner.domain.Allocation#6f8a9b12 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=47, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#41418e53 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=43, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:40:22,905 [main] DEBUG LS step (371), time spent (60000), score (0hard/-14medium/-1114soft), best score (0hard/-14medium/-1114soft), accepted/selected move count (189/293), picked move (optaplanner.domain.Allocation#c4d9e83 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=42, availability=2021-10-08T14:00-2021-10-08T22:00}} <-> optaplanner.domain.Allocation#d464e23 {ResourceTimegrain{resource=optaplanner.domain.Resource#668ea404, index=41, availability=2021-10-08T14:00-2021-10-08T22:00}}).
2021-10-01 12:40:22,905 [main] INFO Local Search phase (1) ended: time spent (60000), best score (0hard/-14medium/-1114soft), score calculation speed (18451/sec), step total (372).
2021-10-01 12:40:22,906 [main] INFO Solving ended: time spent (60000), best score (0hard/-14medium/-1114soft), score calculation speed (16769/sec), phase total (2), environment mode (REPRODUCIBLE), move thread count (NONE).
Observations:
CH didn't assign all entities, which also affects LS (I guess because assignment is now optional).
CH had pointless steps (assign null -> null).
How do I make sure that all entities are assigned?
The step ("assign null -> null") is not pointless. It shows OptaPlanner making a decision to keep something null. And it made that decision, because it was the decision that, of all the possible decisions, resulted in the best possible score. (Otherwise that decision wouldn't have been made.)
Most likely, assigning to non-null values would have broken a hard constraint - and OptaPlanner avoided that by assigning null. OptaPlanner behaves as expected here - you say you have more important concerns than nullity, and OptaPlanner respects your choice. (Nullity is only a medium constraint, while some other constraints are hard and therefore more important.)
If you wish to have all variables assigned, do not make them nullable.
Alternatively, design heavily weighted hard constraints for situations where unassigned variables are a problem. (But how would that differ from just not having them nullable in the first place?)
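
The reasoning above can be sketched in plain Java. The `Score` record and `BY_LEVEL` comparator below are hypothetical stand-ins, not the OptaPlanner score classes; they only assume that score levels are compared lexicographically (hard first, then medium, then soft). The "keep null" score is the one logged at CH step 1 of the nullable run; the "assign" score is illustrative, modeled on CH step 1 of the earlier non-nullable run.

```java
import java.util.Comparator;

// Plain-Java sketch; Score is a hypothetical stand-in, not an OptaPlanner class.
class ScoreLevelsSketch {
    record Score(int hard, int medium, int soft) {}

    // Lexicographic comparison: hard outranks medium, medium outranks soft.
    static final Comparator<Score> BY_LEVEL = Comparator
            .comparingInt(Score::hard)
            .thenComparingInt(Score::medium)
            .thenComparingInt(Score::soft);

    // Returns whichever of the two scores is better (the first one on a tie).
    static Score better(Score a, Score b) {
        return BY_LEVEL.compare(a, b) >= 0 ? a : b;
    }

    public static void main(String[] args) {
        // Leaving the entity unassigned: no hard break, one more medium penalty.
        Score keepNull = new Score(0, -20, -130);
        // Illustrative score for assigning it anyway: one hard constraint broken.
        Score assign = new Score(-1, -19, -3025);
        System.out.println(better(keepNull, assign));
    }
}
```

Because the hard level is compared first, the one-point medium improvement from assigning the entity never outweighs the broken hard constraint, so keeping it null wins.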

OptaPlanner 7.9.0 Multithreading - Solution not reproducible

Having updated to 7.9.0, and after some initial problems (https://stackoverflow.com/questions/51597744/optaplanner-7-9-0-and-adding-multithreading-same-planningid-exception), I have now been trying to test and compare against my 7.7.0 version. However, I cannot get it to reproduce the same solution every time (obviously only with the same problem data) as my older version did, even when this is explicitly set in the config XML. Is additional setup required for this version?
Edit: Did some testing and found switching to TABU (haven't been through them all) gave me the expected consistency:
Extra Cores: 1 - DEFAULT:
Test    Score (Hrd:Med:Sft)    Time Taken (Minutes:Seconds)
Test1   0:0:-7609              0:08
Test2   0:-1:-7758             0:13
Test3   0:-1:-7705             0:14
Extra Cores: 1 - TABU:
Test    Score (Hrd:Med:Sft)    Time Taken (Minutes:Seconds)
Test1   0:0:-7763              1:29
Test2   0:0:-7763              1:29
Test3   0:0:-7763              1:28
Between two runs of the former the solution diverges at LS step 28:
LS step (25), time spent (1869), score (0hard/-3medium/-8155soft),
LS step (26), time spent (1890), score (0hard/-3medium/-8339soft),
LS step (27), time spent (1895), score (0hard/-3medium/-8126soft),
**LS step (28), time spent (1909), score (0hard/-3medium/-8256soft),
LS step (29), time spent (1915), score (0hard/-3medium/-8438soft),
LS step (30), time spent (1924), score (0hard/-3medium/-8620soft),
LS step (31), time spent (1952), score (0hard/-3medium/-8639soft),**
...and...
LS step (25), time spent (1385), score (0hard/-3medium/-8155soft),
LS step (26), time spent (1407), score (0hard/-3medium/-8339soft),
LS step (27), time spent (1412), score (0hard/-3medium/-8126soft),
**LS step (28), time spent (1422), score (0hard/-3medium/-8217soft),
LS step (29), time spent (1436), score (0hard/-3medium/-8336soft),
LS step (30), time spent (1442), score (0hard/-3medium/-8517soft),
LS step (31), time spent (1448), score (0hard/-3medium/-8571soft),**
I don't know whether all of that makes a problem in my problem/solution setup more or less likely, or whether something else is the cause.
First, let's define reproducible: getting the same result at the same step iteration. If the CPU time is completely the same (which it never is), this also means getting the same result after the same amount of time. So you might need to run it a bit longer to get the same number of steps.
Multithreaded solving is reproducible if and only if the same moveThreadCount is used. A run with <moveThreadCount>4</moveThreadCount> is NOT reproduced by a run with <moveThreadCount>2</moveThreadCount>. Even a run with <moveThreadCount>NONE</moveThreadCount> is not reproduced by a run with <moveThreadCount>1</moveThreadCount> (the latter being useless except for QA, btw).
Look at your score calculation speed and compare it with the run without move threads.
If it's lower, you probably forgot to increase the memory size and GC churn is hurting you.
If it's higher, you'd normally get a better result. If you see a worse result anyway, the config might have been overfitted to the setup without move threads. Use additional datasets in optaplanner-benchmark to prove or disprove that.
Furthermore, if you use <moveThreadCount>AUTO</moveThreadCount>, it's only reproducible on the same machine, because AUTO changes the moveThreadCount depending on the number of CPU cores in the machine.
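Since reproducibility is defined per step rather than per elapsed time, two runs are best compared by step index. The following is a small plain-Java sketch of that comparison, fed with the LS step scores quoted in the question. The `StepDiff` class and its method are hypothetical helpers, not part of OptaPlanner.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: compares two runs step by step, ignoring time spent. */
final class StepDiff {

    /**
     * Returns the index of the first position where the two score lists differ,
     * or -1 if they agree on every step they have in common.
     */
    static int firstDivergence(List<String> runA, List<String> runB) {
        int common = Math.min(runA.size(), runB.size());
        for (int i = 0; i < common; i++) {
            if (!runA.get(i).equals(runB.get(i))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int firstStep = 25; // the quoted log excerpts start at LS step 25
        List<String> runA = Arrays.asList(
                "0hard/-3medium/-8155soft", "0hard/-3medium/-8339soft",
                "0hard/-3medium/-8126soft", "0hard/-3medium/-8256soft");
        List<String> runB = Arrays.asList(
                "0hard/-3medium/-8155soft", "0hard/-3medium/-8339soft",
                "0hard/-3medium/-8126soft", "0hard/-3medium/-8217soft");
        int index = firstDivergence(runA, runB);
        System.out.println(index < 0
                ? "no divergence in the common steps"
                : "diverges at LS step " + (firstStep + index)); // diverges at LS step 28
    }
}
```

The time spent values differ between the two excerpts even where the scores match, which is why only the scores are compared here.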

Apache Ignite poor performance compared to Redis

I recently ran a simple benchmark of Apache Ignite and found that when I used the Ignite client across 50 threads, performance degraded tremendously. The benchmark pits Redis against Ignite, and Jedis, the Redis Java client, seems to do much better in the threaded scenario because it keeps multiple clients in a pool rather than sharing one client among multiple threads.
My question is: am I seeing the degraded performance as a result of what I've just described, or is this actually expected and the client isn't the bottleneck?
Here are my results. The local host is a Mac Pro with 12 2.7 GHz cores and 64 GB of RAM. The remote host was an AWS m4.4xlarge.
Benchmark Results
*50 - 118kb Audio File Streamed and Read Simultaneously from One Node in 4096 byte chunks*
Redis
*Notes:*
I used a 50ms polling interval to check for updates to the cache.
*Results:*
Local:
- Total Time to stream and read 50 audio files: 226ms.
- average complete read and write: 125ms
- average time to first byte read: 26ms
- average read time per runner: 103ms
- average write time per runner: 71ms
- p99 time to first byte: 59ms
- p90 time to first byte: 57ms
- p50 time to first byte: 6ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 50 audio files: 1405ms.
- average complete read and write: 1298ms
- average time to first byte read: 81ms
- average read time per runner: 1277ms
- average write time per runner: 1238ms
- p99 time to first byte: 148ms
- p90 time to first byte: 126ms
- p50 time to first byte: 84ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 50 audio files: 2035ms.
- average complete read and write: 1245ms
- average time to first byte read: 67ms
- average read time per runner: 1226ms
- average write time per runner: 1034ms
- p99 time to first byte: 161ms
- p90 time to first byte: 87ms
- p50 time to first byte: 74ms
Ignite
*Notes:*
I have a feeling these numbers are artificially inflated. I think the client is not well built for extreme parallelism. I believe it's doing quite a bit of locking. I think if you were to have many nodes doing the same amount of work, the numbers might be better. This would require more in depth benchmarking. This is 50 caches, one cache group.
*Results:*
Local:
- Total Time to stream and read 50 audio files: 327ms.
- average complete read and write: 321ms
- average time to first byte read: 184ms
- average read time per runner: 225ms
- average write time per runner: 35ms
- p99 time to first byte: 212ms
- p90 time to first byte: 197ms
- p50 time to first byte: 191ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 50 audio files: 5148ms.
- average complete read and write: 4483ms
- average time to first byte read: 947ms
- average read time per runner: 3224ms
- average write time per runner: 2779ms
- p99 time to first byte: 4936ms
- p90 time to first byte: 926ms
- p50 time to first byte: 577ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 50 audio files: 4840ms.
- average complete read and write: 4287ms
- average time to first byte read: 780ms
- average read time per runner: 3035ms
- average write time per runner: 2562ms
- p99 time to first byte: 4458ms
- p90 time to first byte: 857ms
- p50 time to first byte: 566ms
*1 - 118kb Audio File Streamed and Read Simultaneously from One Node in 4096 byte chunks*
Redis
*Notes:*
I used a 50ms polling interval to check for updates to the cache.
*Results:*
Local:
- Total Time to stream and read 1 audio files: 62ms.
- average complete read and write: 62ms
- average time to first byte read: 55ms
- average read time per runner: 61ms
- average write time per runner: 3ms
- p99 time to first byte: 55ms
- p90 time to first byte: 55ms
- p50 time to first byte: 55ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 1 audio files: 394ms.
- average complete read and write: 394ms
- average time to first byte read: 57ms
- average read time per runner: 394ms
- average write time per runner: 342ms
- p99 time to first byte: 57ms
- p90 time to first byte: 57ms
- p50 time to first byte: 57ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 1 audio files: 388ms.
- average complete read and write: 388ms
- average time to first byte read: 61ms
- average read time per runner: 388ms
- average write time per runner: 343ms
- p99 time to first byte: 61ms
- p90 time to first byte: 61ms
- p50 time to first byte: 61ms
Ignite
*Notes:*
None
*Results:*
Local:
- Total Time to stream and read 1 audio files: 32ms.
- average complete read and write: 32ms
- average time to first byte read: 2ms
- average read time per runner: 23ms
- average write time per runner: 11ms
- p99 time to first byte: 2ms
- p90 time to first byte: 2ms
- p50 time to first byte: 2ms
Remote (Over SSH | Seattle → IAD):
- Total Time to stream and read 1 audio files: 259ms.
- average complete read and write: 258ms
- average time to first byte read: 19ms
- average read time per runner: 232ms
- average write time per runner: 169ms
- p99 time to first byte: 19ms
- p90 time to first byte: 19ms
- p50 time to first byte: 19ms
Remote (Through VIP | Seattle → IAD):
- Total Time to stream and read 1 audio files: 203ms.
- average complete read and write: 203ms
- average time to first byte read: 20ms
- average read time per runner: 174ms
- average write time per runner: 93ms
- p99 time to first byte: 20ms
- p90 time to first byte: 20ms
- p50 time to first byte: 20ms
UPDATE:
To make more apparent what I'm trying to do:
I'm going to have 50+ million devices streaming audio. The streams could be 100kb on average, with 200k streams/minute at peak traffic. I'm looking for a storage solution to accommodate that need. I've been examining BookKeeper, Kafka, Ignite, Cassandra, and Redis. So far I've only benchmarked Redis and Ignite, but I'm surprised Ignite is so slow.
I reviewed your benchmark and made a couple of runs locally. I was able to make it much faster:
- 30 iterations aren't enough for the JVM to warm up; on my laptop it requires ~150 iterations, so I increased the count to 300.
- I moved cache creation out of the benchmark and put it right after cache destruction.
- I also moved Ignite client creation out of the benchmark. It's an extremely expensive operation, and in real life you should reuse the client.
Please take a look at my changes; I created a pull request:
https://github.com/Sahasrara/AudioStreamStoreDemo/pull/1/files
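For readers who don't want to open the pull request, the warm-up idea can be sketched in plain Java as follows. This is not the code from the PR: the `WarmupBench` class is hypothetical, the operation being timed is a placeholder, and discarding the first 150 of 300 iterations is an assumption based on the numbers mentioned above. Expensive setup (such as creating a client or a cache) would happen once, before `measure` is called.

```java
/** Hypothetical timing harness: warm-up iterations run first and are not timed. */
final class WarmupBench {

    /**
     * Runs op warmupIterations times without recording anything, then runs it
     * measuredIterations more times and returns the elapsed nanoseconds of each.
     */
    static long[] measure(Runnable op, int warmupIterations, int measuredIterations) {
        for (int i = 0; i < warmupIterations; i++) {
            op.run(); // gives the JIT compiler a chance to optimize the hot path
        }
        long[] nanos = new long[measuredIterations];
        for (int i = 0; i < measuredIterations; i++) {
            long start = System.nanoTime();
            op.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Placeholder workload; a real benchmark would stream and read a file here,
        // using a client and cache that were created once, outside the timed loop.
        StringBuilder sink = new StringBuilder();
        Runnable op = () -> {
            sink.setLength(0);
            sink.append("chunk").reverse();
        };
        long[] nanos = measure(op, 150, 150);
        long total = 0;
        for (long n : nanos) {
            total += n;
        }
        System.out.println("measured " + nanos.length + " iterations, mean "
                + (total / nanos.length) + " ns");
    }
}
```

For anything beyond a quick check, a dedicated harness such as JMH handles warm-up and measurement more rigorously than a hand-rolled loop.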
I don't think you should be creating a cache for every operation. This is a heavy operation with Ignite. What are your requirements for this?
I can see how performance greatly improves for subsequent runs. Ignite is based on Java, which needs some time to warm up.
You should definitely avoid creating a lot of caches. Some options:
- Use a cache group in order to share infrastructure between your caches.
- Keep a pool of N caches and repurpose them for new files as time goes by, freeing each one once its file is no longer used. This requires some bookkeeping.
- Better yet, find a way to use just one cache: put the file identifier in a composite cache key and keep track of what you have in the cache.
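The single-cache suggestion could look roughly like the sketch below. It is plain Java with a `ConcurrentHashMap` standing in for the real cache, so it shows only the key design, not Ignite itself. The `ChunkKey` class, its fields, and the method names are hypothetical; with Ignite, the assumption is that a key class like this would be used as the key type of one shared cache.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: one shared cache keyed by (fileId, chunkIndex) instead of one cache per file. */
final class SingleCacheSketch {

    /** Hypothetical composite key; the field names are illustrative. */
    static final class ChunkKey {
        final String fileId;
        final int chunkIndex;

        ChunkKey(String fileId, int chunkIndex) {
            this.fileId = Objects.requireNonNull(fileId);
            this.chunkIndex = chunkIndex;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ChunkKey)) return false;
            ChunkKey other = (ChunkKey) o;
            return chunkIndex == other.chunkIndex && fileId.equals(other.fileId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fileId, chunkIndex);
        }
    }

    // Stand-in for the real cache; the map only mimics put/get semantics.
    private final Map<ChunkKey, byte[]> cache = new ConcurrentHashMap<>();

    void putChunk(String fileId, int chunkIndex, byte[] data) {
        cache.put(new ChunkKey(fileId, chunkIndex), data);
    }

    byte[] getChunk(String fileId, int chunkIndex) {
        return cache.get(new ChunkKey(fileId, chunkIndex));
    }

    /** Frees every chunk of a file once the file is no longer used. */
    void removeFile(String fileId) {
        cache.keySet().removeIf(key -> key.fileId.equals(fileId));
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        SingleCacheSketch store = new SingleCacheSketch();
        store.putChunk("file-a", 0, new byte[4096]);
        store.putChunk("file-a", 1, new byte[4096]);
        store.putChunk("file-b", 0, new byte[4096]);
        store.removeFile("file-a");
        System.out.println(store.size()); // prints 1
    }
}
```

Scanning all keys in `removeFile` is fine for a sketch; a real store would more likely track the chunk count per file so that it can remove keys directly.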

Failed to load data from S3

I launched two m1.medium nodes on Amazon EC2 to run my Pig script, but it looks like it failed at the first line (even before MapReduce starts): raw = LOAD 's3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000' USING TextLoader as (line:chararray);
The error message I got:
2015-02-04 02:15:39,804 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2015-02-04 02:15:39,821 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Default number of map tasks: null
2015-02-04 02:15:39,822 [JobControl] INFO org.apache.hadoop.mapred.JobClient - Setting default number of map tasks based on cluster size to : 20
... (omitted)
2015-02-04 02:18:40,955 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201502040202_0002 has failed! Stop running all dependent jobs
2015-02-04 02:18:40,956 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: Error: Java heap space
2015-02-04 02:18:40,997 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2015-02-04 02:18:40,997 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 1.0.3 0.11.1.1-amzn hadoop 2015-02-04 02:15:32 2015-02-04 02:18:40 GROUP_BY
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201502050202_0002 ngroup,raw,triples,tt GROUP_BY,COMBINER Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201502050202_0002_m_000022
Input(s):
Failed to read data from "s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000"
Output(s):
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
I think the code should be fine, since I have successfully loaded other data with the same syntax before, and the link to s3n://uw-cse-344-oregon.aws.amazon.com/btc-2010-chunk-000 looks valid. I suspect it might be related to some of my EC2 settings, but I'm not sure how to investigate further or narrow down the problem. Does anyone have a clue?
"Java heap space" error message gives some clues. Your files seem to be quite large (~2GB). Make sure that you have enough memory for each task runner to read the data.
The problem is solved for now by changing my nodes from m1.medium to m3.large. Thanks to @Nat for the good hint in pointing out the Java heap space error message. I'll update with more details later.

Unusually long Pig job start time

A Pig script (not particularly more complex than others I have built) seems to loop on the following for a long time before the job starts:
2013-10-08 10:46:07,655 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:07,659 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:09,168 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:09,168 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:11,381 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:11,381 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:13,875 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
2013-10-08 10:46:13,875 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 10
2013-10-08 10:46:16,303 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 10
It repeats the above for around 4 minutes, when usually this step completes in seconds. I have not been able to identify the cause. I tried removing parts of the script, but the issue does not seem to be caused by any particular part. I have other scripts as complex as this one and have not had this problem with them. What could be causing the issue?
I can't say for certain without more information, but it appears that Pig is waiting for your cluster's JobTracker to start running the underlying Map/Reduce jobs generated by your script. There are numerous reasons why this could happen, such as running on a shared cluster that has run out of resources. You'll most likely have to look at your cluster's JobTracker and/or TaskTrackers to find the exact reason.