How to interpret the log output of the docplex optimisation library

I am having a problem interpreting this log that I get after trying to maximise an objective function using docplex:
Nodes Cuts/
Node Left Objective IInf Best Integer Best Bound ItCnt Gap
0 0 6.3105 0 10.2106 26
0 0 5.9960 8 Cone: 5 34
0 0 5.8464 5 Cone: 8 47
0 0 5.8030 11 Cone: 10 54
0 0 5.7670 12 Cone: 13 64
0 0 5.7441 13 Cone: 16 72
0 0 5.7044 9 Cone: 19 81
0 0 5.6844 14 5.6844 559
* 0+ 0 4.5362 5.6844 25.31%
0 0 5.5546 15 4.5362 Cuts: 322 1014 22.45%
0 0 5.4738 15 4.5362 Cuts: 38 1108 20.67%
* 0+ 0 4.6021 5.4738 18.94%
0 0 5.4296 16 4.6021 Cuts: 100 1155 17.98%
0 0 5.3779 19 4.6021 Cuts: 34 1204 16.86%
0 0 5.3462 17 4.6021 Cuts: 80 1252 16.17%
0 0 5.3396 19 4.6021 Cuts: 42 1276 16.03%
0 0 5.3364 24 4.6021 Cuts: 57 1325 15.96%
0 0 5.3269 17 4.6021 Cuts: 66 1353 15.75%
0 0 5.3188 20 4.6021 Cuts: 42 1369 15.57%
0 0 5.2975 21 4.6021 Cuts: 62 1387 15.11%
0 0 5.2838 24 4.6021 Cuts: 72 1427 14.81%
0 0 5.2796 21 4.6021 Cuts: 70 1457 14.72%
0 0 5.2762 24 4.6021 Cuts: 73 1471 14.65%
0 0 5.2655 24 4.6021 Cuts: 18 1479 14.42%
* 0+ 0 4.6061 5.2655 14.32%
* 0+ 0 4.6613 5.2655 12.96%
0 0 5.2554 26 4.6613 Cuts: 40 1492 12.75%
0 0 5.2425 27 4.6613 Cuts: 11 1511 12.47%
0 0 5.2360 23 4.6613 Cuts: 3 1518 12.33%
0 0 5.2296 19 4.6613 Cuts: 7 1521 12.19%
0 0 5.2213 18 4.6613 Cuts: 8 1543 12.01%
0 0 5.2163 24 4.6613 Cuts: 15 1552 11.91%
0 0 5.2106 21 4.6613 Cuts: 4 1558 11.78%
0 0 5.2106 21 4.6613 Cuts: 3 1559 11.78%
* 0+ 0 4.6706 5.2106 11.56%
0 2 5.2106 21 4.6706 5.2106 1559 11.56%
Elapsed time = 9.12 sec. (7822.43 ticks, tree = 0.01 MB, solutions = 5)
51 29 4.9031 3 4.6706 5.1575 1828 10.42%
260 147 4.9207 1 4.6706 5.1575 2699 10.42%
498 242 infeasible 4.6706 5.0909 3364 9.00%
712 346 4.7470 6 4.6706 5.0591 4400 8.32%
991 497 4.7338 6 4.6706 5.0480 5704 8.08%
1358 566 4.8085 11 4.6706 5.0005 7569 7.06%
1708 708 4.7638 14 4.6706 4.9579 9781 6.15%
1985 817 cutoff 4.6706 4.9265 11661 5.48%
2399 843 infeasible 4.6706 4.9058 15567 5.04%
3619 887 4.7066 4 4.6706 4.7875 23685 2.50%
Elapsed time = 17.75 sec. (10933.85 ticks, tree = 3.05 MB, solutions = 5)
4623 500 4.6863 13 4.6706 4.7274 35862 1.22%
What I don't understand is the following:
What is the difference between the third column (Objective) and the fifth column (Best Integer)?
How come the third column (Objective) has higher values than the actual solution of the problem returned by CPLEX, which is 4.6706?
Do the values in the third column take into consideration the constraints of the optimization problem?
This webpage didn't help me understand either; the explanation of Best Integer is really confusing.
Thank you in advance for your feedback.
Regards.

The user manual includes a detailed explanation of this log in section
CPLEX->User's Manual for CPLEX->Discrete Optimization->Solving Mixed Integer Programming Problems (MIP)->Progress Reports: interpreting the node log
(see https://www.ibm.com/support/knowledgecenter/SSSA5P_12.8.0/ilog.odms.cplex.help/CPLEX/UsrMan/topics/discr_optim/mip/para/52_node_log.html)

I also suggest having a look at
https://fr.slideshare.net/mobile/IBMOptimization/2013-11-informsminingthenodelog
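In short, the Objective column is the objective value of the LP relaxation solved at that node (all constraints are kept, only the integrality requirements are relaxed), while Best Integer is the objective of the best integer-feasible solution (the incumbent) found so far. Since this is a maximisation, each relaxation value is an upper bound, which is why the third column stays above the final incumbent of 4.6706; the Gap column is the relative distance between Best Integer and Best Bound.
If it helps, the same quantities can be read back from docplex after the solve. A minimal, hedged sketch (the model-building part is omitted, and the attribute names are taken from docplex's solve_details as I recall them):
from docplex.mp.model import Model

mdl = Model(name="example")
# ... create variables, add constraints and set the maximisation objective here ...

solution = mdl.solve(log_output=True)   # log_output=True prints the node log shown above
if solution is not None:
    details = mdl.solve_details
    print("Incumbent (Best Integer):", mdl.objective_value)
    print("Best bound:", details.best_bound)
    print("Relative MIP gap:", details.mip_relative_gap)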

Related

pandas df add new column based on proportion of two other columns from another dataframe

I have df1, which has three columns (loadgroup, cartons, blocks) plus their percentage columns, like this:
loadgroup              cartons  blocks  cartonsPercent  blocksPercent
1                      2269     14      26%             21%
2                      1168     13      13%             19%
3                      937      8       11%             12%
4                      2753     24      31%             35%
5                      1686     9       19%             13%
total (sum of column)  8813     68      100%            100%
The interpretation is like this: out of df1, 26% of the cartons (which is also 21% of the blocks) are assigned to loadgroup 1, etc. We can assume blocks are 1 to 68 and cartons are 1 to 8813.
I also have df2, which also has cartons and blocks columns, but does not have loadgroup.
My goal is to assign loadgroup (1-5 as well) to df2 (100 blocks, 29608 cartons in total) while keeping the proportions; for example, for df2, 26% of the cartons and 21% of the blocks get loadgroup 1, 13% of the cartons and 19% of the blocks get loadgroup 2, etc.
df2 is like this:
block  cartons
0      533
1      257
2      96
3      104
4      130
5      71
6      68
7      87
8      99
9      51
10     291
11     119
12     274
13     316
14     87
15     149
16     120
17     222
18     100
19     148
20     192
21     188
22     293
23     120
24     224
25     449
26     385
27     395
28     418
29     423
30     244
31     327
32     337
33     249
34     528
35     528
36     494
37     540
38     368
39     533
40     614
41     462
42     350
43     618
44     463
45     552
46     397
47     401
48     397
49     365
50     475
51     379
52     541
53     488
54     383
55     354
56     760
57     327
58     211
59     356
60     552
61     401
62     320
63     368
64     311
65     421
66     458
67     278
68     504
69     385
70     242
71     413
72     246
73     465
74     386
75     231
76     154
77     294
78     275
79     169
80     398
81     227
82     273
83     319
84     177
85     272
86     204
87     139
88     187
89     263
90     90
91     134
92     67
93     115
94     45
95     65
96     40
97     108
98     60
99     102
total: 100 blocks, 29608 cartons
I want to add a loadgroup column to df2, keeping those proportions as close as possible. How can I do it? Thank you very much for the help.
I don't know how to find the loadgroup column based on both the cartons percentages and the blocks percentages, but generating a random loadgroup based on either of them is easy.
Here is what I did. I generate 100,000 seeds first; then, for each seed, I add column loadGroup1 based on the cartons percentages and loadGroup2 based on the blocks percentages, calculate both resulting percentages, compare them with the df1 percentages, and record the absolute difference. Over these 100,000 seeds, I take the one with the minimum difference as my solution, which is sufficient for my job.
But this is not the optimal solution, and I am looking for a quick and easy way to do this. Hope somebody can help.
Here is my code.
import numpy as np
import pandas as pd

df = pd.DataFrame()
np.random.seed(10000)
seeds = np.random.randint(1, 1000000, size = 100000)
for i in range(46530, 46537):
    print(seeds[i])
    np.random.seed(seeds[i])
    # Assign a random loadgroup twice: once following the cartons proportions, once the blocks proportions
    df2['loadGroup1'] = np.random.choice(df1.loadgroup, len(df2), p = df1.CartonsPercent)
    df2['loadGroup2'] = np.random.choice(df1.loadgroup, len(df2), p = df1.blocksPercent)
    df2.reset_index(inplace = True)
    # Score the cartons-based assignment against the target percentages
    three = pd.DataFrame(df2.groupby('loadGroup1').agg(Cartons = ('cartons', 'sum'), blocks = ('block', 'count')))
    three['CartonsPercent'] = three.Cartons/three.Cartons.sum()
    three['blocksPercent'] = three.blocks/three.blocks.sum()
    four = df1[['CartonsPercent','blocksPercent']] - three[['CartonsPercent','blocksPercent']]
    four = four.abs()
    subdf = pd.DataFrame({'i':[i],'Seed':[seeds[i]], 'Percent':['CartonsPercent'], 'AbsDiff':[four.sum().sum()]})
    df = pd.concat([df,subdf])
    # Score the blocks-based assignment the same way
    three = pd.DataFrame(df2.groupby('loadGroup2').agg(Cartons = ('cartons', 'sum'), blocks = ('block', 'count')))
    three['CartonsPercent'] = three.Cartons/three.Cartons.sum()
    three['blocksPercent'] = three.blocks/three.blocks.sum()
    four = df1[['CartonsPercent','blocksPercent']] - three[['CartonsPercent','blocksPercent']]
    four = four.abs()
    subdf = pd.DataFrame({'i':[i],'Seed':[seeds[i]], 'Percent':['blocksPercent'], 'AbsDiff':[four.sum().sum()]})
    df = pd.concat([df,subdf])
df.sort_values(by = 'AbsDiff', ascending = True, inplace = True)
df = df.head(10)
Actually, the first row of df tells me the seed I am looking for; I kept 10 rows just out of curiosity.
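As a hypothetical follow-up (not part of the code above): once the best seed is known from that first row, the final assignment can be regenerated with it, using whichever percentage column that row reports, e.g. for a CartonsPercent winner:
best_seed = int(df.iloc[0]['Seed'])   # best seed found by the search above
np.random.seed(best_seed)
df2['loadgroup'] = np.random.choice(df1.loadgroup, len(df2), p = df1.CartonsPercent)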
Here is my solution.
block  cartons  loadgroup
0      533      4
1      257      1
2      96       4
3      104      4
4      130      4
5      71       2
6      68       1
7      87       4
8      99       4
9      51       4
10     291      4
11     119      2
12     274      2
13     316      4
14     87       4
15     149      5
16     120      3
17     222      2
18     100      2
19     148      2
20     192      3
21     188      4
22     293      1
23     120      2
24     224      4
25     449      1
26     385      5
27     395      3
28     418      1
29     423      4
30     244      5
31     327      1
32     337      5
33     249      4
34     528      1
35     528      1
36     494      5
37     540      3
38     368      2
39     533      4
40     614      5
41     462      4
42     350      5
43     618      4
44     463      2
45     552      1
46     397      3
47     401      3
48     397      1
49     365      1
50     475      4
51     379      1
52     541      1
53     488      2
54     383      2
55     354      1
56     760      5
57     327      4
58     211      2
59     356      5
60     552      4
61     401      1
62     320      1
63     368      3
64     311      3
65     421      2
66     458      5
67     278      4
68     504      5
69     385      4
70     242      4
71     413      1
72     246      2
73     465      5
74     386      4
75     231      1
76     154      4
77     294      4
78     275      1
79     169      4
80     398      4
81     227      4
82     273      1
83     319      3
84     177      4
85     272      5
86     204      3
87     139      1
88     187      4
89     263      4
90     90       4
91     134      4
92     67       3
93     115      3
94     45       2
95     65       2
96     40       4
97     108      2
98     60       2
99     102      1
Here are the summaries.
loadgroup  cartons  blocks  cartonsPercent  blocksPercent
1          7610     22      26%             22%
2          3912     18      13%             18%
3          3429     12      12%             12%
4          9269     35      31%             35%
5          5388     13      18%             13%
It's very close to my target though.

create new column from divided columns over iteration

I am working with the following code:
import pandas as pd

url = 'https://raw.githubusercontent.com/dothemathonthatone/maps/master/fertility.csv'
df = pd.read_csv(url)
year regional_schlüssel Aus15 Deu15 Aus16 Deu16 Aus17 Deu17 Aus18 Deu18 ... aus36 aus37 aus38 aus39 aus40 aus41 aus42 aus43 aus44 aus45
0 2000 5111000 0 4 8 25 20 45 56 89 ... 935 862 746 732 792 660 687 663 623 722
1 2000 5113000 1 1 4 14 13 33 19 48 ... 614 602 498 461 521 470 393 411 397 400
2 2000 5114000 0 11 0 5 2 13 7 20 ... 317 278 265 235 259 228 204 173 213 192
3 2000 5116000 0 2 2 7 3 28 13 26 ... 264 217 206 207 197 177 171 146 181 169
4 2000 5117000 0 0 3 1 2 4 4 7 ... 135 129 118 116 128 148 89 110 124 83
I would like to create a new set of columns fertility_deu15, ..., fertility_deu45 and fertility_aus15, ..., fertility_aus45 such that fertility_aus15 = aus15 / Aus15 and fertility_deu15 = deu15 / Deu15, and in general fertility_ausi = ausi / Ausi and fertility_deui = deui / Deui for each i in [15-45].
I'm not sure what is up with that data, but we need to fix it to make it numeric. I'll end up doing that while filtering:
import pandas as pd

numerator = df.filter(regex=r'^[a-z]+\d+$')                       # lower-case columns (aus15, ..., deu45)
numerator = numerator.apply(pd.to_numeric, errors='coerce')       # fix numbers
denominator = df.filter(regex=r'^[A-Z][a-z]+\d+$').rename(columns=str.lower)   # Aus15, ..., Deu45
denominator = denominator.apply(pd.to_numeric, errors='coerce')
numerator.div(denominator).add_prefix('fertility_')
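The last line only displays the result; to actually keep the new columns you would attach them back to the dataframe, for example (a small sketch reusing the names above):
fertility = numerator.div(denominator).add_prefix('fertility_')
df = df.join(fertility)   # adds fertility_aus15 ... fertility_deu45 next to the original columns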

Evaluation stuck at local optima in genetic programming with DEAP. How to prevent GP from converging on local optima?

I'm trying to do a symbolic regression of a geometric model, and most of the time it gets stuck with a fitness score that is not near 0. I did some research and found out that the problem is local minima. Some people tried to prioritise population diversity over fitness, but that's not what I want.
So what I did is reconfigure algorithms.eaSimple and add a block to it, so that it resets the population when the last n=50 generations have the same fitness.
I don't have any idea other than that, as I'm very new to this.
Is there any better way to do this?
I'm using a minimising fitness: creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
import numpy
from deap import algorithms, tools

def my_eaSimple(population, toolbox, cxpb, mutpb, ngen, stats=None,
                halloffame: tools.HallOfFame = None, verbose=True):
    logbook = tools.Logbook()
    logbook.header = ['gen', 'nevals'] + (stats.fields if stats else [])
    # Evaluate the individuals with an invalid fitness
    invalid_ind = [ind for ind in population if not ind.fitness.valid]
    fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
    for ind, fit in zip(invalid_ind, fitnesses):
        ind.fitness.values = fit
    if halloffame is not None:
        halloffame.update(population)
    record = stats.compile(population) if stats else {}
    logbook.record(gen=0, nevals=len(invalid_ind), **record)
    if verbose:
        print(logbook.stream)
    # Begin the generational process
    gen = 1
    last_few_pop_to_consider = 50
    starting_condition = last_few_pop_to_consider
    # True when the minimum fitness has barely moved over the window
    is_last_few_fitness_same = lambda stats_array: abs(numpy.mean(stats_array) - stats_array[0]) < 0.1
    while gen < ngen + 1:
        # Select the next generation individuals
        offspring = toolbox.select(population, len(population))
        # Vary the pool of individuals
        offspring = algorithms.varAnd(offspring, toolbox, cxpb, mutpb)
        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
        for ind, fit in zip(invalid_ind, fitnesses):
            ind.fitness.values = fit
        # Update the hall of fame with the generated individuals
        if halloffame is not None:
            halloffame.update(offspring)
        # Replace the current population by the offspring
        population[:] = offspring
        # Append the current generation statistics to the logbook
        record = stats.compile(population) if stats else {}
        logbook.record(gen=gen, nevals=len(invalid_ind), **record)
        if verbose:
            print(logbook.stream)
        gen += 1
        # Stopping criterion: good enough fitness reached
        min_fitness = record['fitness']['min\t']
        # max_fitness = record['fitness']['max\t']
        if min_fitness < 0.1:
            print('Reached desired fitness')
            break
        # Restart criterion: reset the population if the minimum fitness has stagnated
        if gen > starting_condition:
            min_stats = logbook.chapters['fitness'].select('min\t')[-last_few_pop_to_consider:]
            if is_last_few_fitness_same(min_stats):
                print('Defining new population')
                population = toolbox.population(n=500)
                starting_condition = gen + last_few_pop_to_consider
    return population, logbook
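For reference, here is a hedged sketch of how a loop like this is typically driven. The MultiStatistics setup (including the odd tab-suffixed "min\t" name) is an assumption inferred from the keys the loop reads and from the two-chapter output below, and toolbox is assumed to be a standard DEAP GP toolbox with population, select, mate, mutate and evaluate registered:
import numpy
from deap import tools

stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
stats_size = tools.Statistics(len)
mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
mstats.register("avg", numpy.mean)
mstats.register("max", numpy.max)
mstats.register("min\t", numpy.min)   # tab-suffixed so that record['fitness']['min\t'] above resolves
mstats.register("std", numpy.std)

pop = toolbox.population(n=500)
hof = tools.HallOfFame(1)
pop, logbook = my_eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=100,
                           stats=mstats, halloffame=hof)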
Output
gen nevals avg max min std avg max min std
0 500 2.86566e+23 1.41421e+26 112.825 6.31856e+24 10.898 38 3 9.50282
1 451 2.82914e+18 1.41421e+21 90.113 6.31822e+19 6.226 38 1 5.63231
2 458 2.84849e+18 1.41421e+21 89.1206 6.3183e+19 5.602 36 1 5.18417
3 459 4.24902e+14 2.01509e+17 75.1408 9.01321e+15 5.456 35 1 4.05167
4 463 4.23166e+14 2.03904e+17 74.3624 9.11548e+15 6.604 36 1 3.61762
5 462 2.8693e+11 1.25158e+14 65.9366 5.60408e+12 7.464 34 1 3.00478
6 467 2.82843e+18 1.41421e+21 65.9366 6.31823e+19 8.144 37 1 3.51216
7 463 5.40289e+13 2.65992e+16 65.9366 1.1884e+15 8.322 22 1 2.88276
8 450 6.59849e+14 3.29754e+17 59.1286 1.47323e+16 8.744 34 1 3.03685
9 458 1.8128e+11 8.17261e+13 54.4395 3.65075e+12 9.148 23 1 2.69557
10 459 6.59851e+14 3.29754e+17 54.4395 1.47323e+16 9.724 35 1 3.02255
11 458 2.34825e+10 1.41421e+11 54.4395 5.26173e+10 9.842 18 1 2.32057
12 459 3.52996e+11 1.60442e+14 54.4395 7.1693e+12 10.56 33 1 2.63788
13 457 3.81044e+11 1.60442e+14 54.4395 7.18851e+12 11.306 35 1 2.84611
14 457 2.30681e+13 1.15217e+16 54.4395 5.14751e+14 11.724 24 1 2.6495
15 463 2.65947e+10 1.41421e+11 54.4395 5.52515e+10 12.072 29 1 2.63036
16 469 4.54286e+10 9.13693e+12 54.4395 4.10784e+11 12.104 34 1 3.00752
17 461 6.58255e+11 1.74848e+14 54.4395 9.76474e+12 12.738 36 4 3.10956
18 450 2.03669e+10 1.41421e+11 54.4395 4.96374e+10 13.062 30 4 3.01963
19 465 1.75385e+10 2.82843e+11 54.4395 4.74595e+10 13.356 24 1 2.82157
20 458 1.83887e+10 1.41421e+11 54.4395 4.7559e+10 13.282 23 1 3.03949
21 455 3.67899e+10 8.36173e+12 54.4395 4.04044e+11 13.284 34 4 3.03106
22 461 1.36372e+10 1.41422e+11 54.4395 4.16569e+10 13.06 35 3 3.01005
23 471 2.00634e+26 1.00317e+29 54.3658 4.48181e+27 12.798 36 1 3.17698
24 466 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.706 36 3 3.07043
25 464 3.00384e+10 8.36174e+12 54.3658 3.75254e+11 12.612 34 5 2.89231
26 474 2.00925e+10 1.41421e+11 54.3658 4.93588e+10 12.594 34 3 2.60253
27 452 2.9528e+11 1.41626e+14 54.3658 6.32694e+12 12.43 25 1 2.49822
28 453 1.23899e+10 1.41421e+11 54.3658 3.98511e+10 12.41 20 5 2.45721
29 456 5.98529e+14 2.99256e+17 54.3658 1.33697e+16 12.57 37 1 2.6346
30 474 1.35672e+13 6.69898e+15 54.3658 2.99297e+14 12.526 35 1 2.94029
31 446 6.92755e+22 3.46377e+25 54.3658 1.5475e+24 12.55 36 1 2.62517
32 462 4.02525e+10 8.16482e+12 54.3658 3.92769e+11 12.764 34 5 2.77061
33 449 1.53268e+13 7.65519e+15 54.3658 3.42007e+14 12.628 35 1 2.76218
34 466 3.13214e+16 1.54388e+19 54.3658 6.89799e+17 12.626 35 1 2.97626
35 464 2.82845e+18 1.41421e+21 54.3658 6.31823e+19 12.806 36 5 2.74597
36 460 2.93493e+11 1.32308e+14 54.3658 5.91505e+12 12.734 35 5 2.88084
37 456 2.93491e+10 8.29826e+12 54.3658 3.72372e+11 12.614 37 1 2.80517
38 449 3.44519e+10 8.16482e+12 54.3658 3.67344e+11 12.742 34 3 2.91881
39 466 1.53217e+13 7.65519e+15 54.3658 3.42008e+14 12.502 35 3 2.70296
40 454 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.51 36 1 2.81103
41 453 9.66059e+24 4.68888e+27 54.3658 2.09566e+26 12.554 33 1 2.47691
42 448 2.2287e+10 3.38289e+12 54.3658 1.58629e+11 12.576 26 1 2.50763
43 460 5.47399e+12 2.73042e+15 54.3658 1.21985e+14 12.584 34 1 2.80053
44 460 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.692 27 1 2.86516
45 464 2.829e+18 1.41421e+21 54.3658 6.31823e+19 12.57 34 1 3.15549
46 460 2.92607e+11 1.31556e+14 54.3658 5.88776e+12 12.61 37 3 2.78817
47 465 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.622 36 1 3.04616
48 461 1.64306e+10 2.97245e+12 54.3658 1.37408e+11 12.468 26 1 2.57856
49 463 1.54834e+10 1.41421e+11 54.3658 4.4029e+10 12.464 20 1 2.4529
50 451 1.59239e+10 1.41421e+11 54.3658 4.44609e+10 12.63 33 1 2.76281
51 455 5.40036e+19 2.70018e+22 54.3658 1.20635e+21 12.78 37 1 2.84668
52 478 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.712 36 3 2.84694
53 461 2.78669e+21 1.39193e+24 54.3658 6.21866e+22 12.714 36 1 3.23546
54 471 7.41272e+12 3.70045e+15 54.3658 1.65323e+14 12.336 34 3 2.848
55 465 2.83036e+18 1.41421e+21 54.3658 6.31822e+19 12.74 36 1 3.62662
56 459 2.82843e+18 1.41421e+21 54.3658 6.31823e+19 12.606 29 1 2.60437
57 453 5.98308e+24 2.99154e+27 54.3658 1.33652e+26 12.722 34 1 2.62311
58 460 3.62463e+21 1.8109e+24 54.3658 8.09047e+22 12.65 37 1 2.92361
Defining new population
59 500 5.83025e+48 2.91513e+51 109.953 1.30238e+50 10.846 38 1 8.89889
60 464 2.93632e+15 8.87105e+17 165.988 4.38882e+16 5.778 36 1 4.79173
61 444 5.54852e+19 2.70018e+22 93.5182 1.20674e+21 4.992 37 1 4.648
62 463 4.28647e+14 2.14148e+17 82.0774 9.56741e+15 5.468 34 1 4.34891
63 464 2.82843e+18 1.41421e+21 78.8184 6.31823e+19 6.624 35 1 4.25989
64 453 3.40035e+11 1.60954e+14 68.7629 7.19022e+12 7.356 36 1 3.77694
65 456 5.65762e+18 2.82851e+21 68.7629 1.26368e+20 7.606 35 1 4.15966
66 461 2.82843e+18 1.41421e+21 68.7629 6.31823e+19 7.906 35 1 3.81171
67 447 1.63302e+10 1.41421e+11 68.7629 4.51102e+10 7.802 33 1 3.47258
68 463 6.59552e+14 3.29754e+17 68.7629 1.47323e+16 8.37 34 3 3.80698
69 460 1.53579e+13 7.65512e+15 68.7629 3.42003e+14 8.646 35 1 3.64042
70 461 2.80014e+10 1.41421e+11 68.7629 5.63553e+10 9.212 38 1 3.69582
71 453 1.97446e+11 7.80484e+13 68.7629 3.50764e+12 9.84 34 1 3.74785
72 459 9.98853e+11 1.75397e+14 68.7629 1.25317e+13 10.284 35 3 3.61764
73 453 5.6863e+16 2.84218e+19 68.7629 1.26979e+18 10.796 36 1 3.86864
74 466 2.57445e+10 1.41434e+11 68.7629 5.4564e+10 10.806 35 1 3.2949
75 453 2.82849e+18 1.41421e+21 68.7629 6.31823e+19 10.876 34 1 3.27301
76 433 1.67235e+20 8.36174e+22 68.7629 3.73574e+21 10.868 35 1 2.94051
77 457 3.6663e+21 1.83315e+24 68.7629 8.1899e+22 10.964 37 3 3.21476
78 461 1.80829e+14 9.04015e+16 68.7629 4.03883e+15 10.992 35 3 3.26985
79 450 3.21984e+11 1.41626e+14 68.7629 6.32593e+12 11.17 28 1 2.77941
80 460 2.82843e+18 1.41421e+21 68.7629 6.31823e+19 11.044 35 1 3.25362
81 455 6.46751e+14 2.99308e+17 68.7629 1.34123e+16 11.06 34 1 3.51061
82 463 3.21908e+21 1.60954e+24 68.7629 7.19088e+22 11.112 34 1 3.58433
83 473 2.82843e+18 1.41421e+21 68.7629 6.31823e+19 10.946 38 3 3.70663
84 460 3.14081e+11 1.41626e+14 68.7629 6.32625e+12 10.896 35 1 3.4976
85 456 1.53419e+13 7.65526e+15 68.7629 3.4201e+14 11.156 36 1 3.23661
The population gets reset at generation 59, after the minimum fitness had stayed at 54.4395 for 50 generations.

How to delete rows containing NaN in Python 3.6.3

I want to remove rows with "nan" or "-nan":
Reading:
import pandas as pd

excel_file = 'originale_ridotto.xlsx'
df = pd.read_excel(excel_file, na_values="NaN")
print(df)
print("I am here")
df.dropna(axis=0, how="any")
print(df)
Output of the dataframe columns (Python 3.6.3):
Data e ora Potenza Teorica Totale CC [kW]
0 01/01/2017 00:05 0
1 01/01/2017 00:10 0
2 01/01/2017 00:15 0
3 01/01/2017 00:20 0
4 01/01/2017 00:25 0
5 01/01/2017 00:30 0
6 01/01/2017 00:35 0
7 01/01/2017 00:40 0
Potenza Attiva Totale AC [kW] Energia totale cumulata al contatore [kWh] \
0 0 7760812.5
1 0 7760812.5
2 0 7760812.5
3 0 7760812.5
4 0 7760812.5
5 0 7760812.5
6 0 7760812.5
7 0 7760812.5
Temperatura modulo [°C] Irraggiamento [W/m2]
0 0 5.0
1 0 6.0
2 0 NaN
3 0 2.0
4 0 3.0
5 0 NaN
6 0 7.0
7 0 9.0
Potenza Attiva Inv.1Blocco1 [kW]
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
Data e ora Potenza Teorica Totale CC [kW]
0 01/01/2017 00:05 0
1 01/01/2017 00:10 0
2 01/01/2017 00:15 0
3 01/01/2017 00:20 0
4 01/01/2017 00:25 0
5 01/01/2017 00:30 0
6 01/01/2017 00:35 0
7 01/01/2017 00:40 0
Potenza Attiva Totale AC [kW] Energia totale cumulata al contatore [kWh]
0 0 7760812.5
1 0 7760812.5
2 0 7760812.5
3 0 7760812.5
4 0 7760812.5
5 0 7760812.5
6 0 7760812.5
7 0 7760812.5
Temperatura modulo [°C] Irraggiamento [W/m2] \
0 0 5.0
1 0 6.0
2 0 NaN
3 0 2.0
4 0 3.0
5 0 NaN
6 0 7.0
7 0 9.0
Potenza Attiva Inv.1Blocco1 [kW]
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
df.dropna(axis=0, how="any") does not remove these rows. Why?
Could you help me?
You are creating a cleaned dataframe, but you are not "remembering" it. df.dropna(how='any') returns the cleaned df - you need to assign it and then use it:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,1000,size=(10, 10)), columns=list('ABCDEFGHIJ'))
# ignoring the warnings
df['A'][2] = np.NaN
df['C'][3] = np.NaN
df['I'][5] = np.NaN
df['E'][7] = np.NaN
print(df)
df = df.dropna(how='any') # this returns a NEW dataframe, it does not modify in place
print(df)
Output:
A B C D E F G H I J
0 314.0 664 855.0 101 764.0 251 503 783 153.0 474
1 903.0 77 546.0 205 113.0 519 115 45 988.0 964
2 NaN 155 481.0 243 165.0 696 255 123 802.0 228
3 406.0 603 NaN 84 390.0 545 651 549 440.0 982
4 796.0 626 139.0 810 474.0 257 407 264 680.0 164
5 443.0 132 545.0 380 420.0 885 704 596 NaN 778
6 285.0 317 238.0 437 508.0 189 501 738 605.0 290
7 144.0 426 220.0 573 NaN 758 581 420 544.0 173
8 864.0 369 541.0 405 863.0 45 522 178 705.0 419
9 936.0 664 547.0 793 68.0 77 364 633 547.0 790
A B C D E F G H I J
0 314.0 664 855.0 101 764.0 251 503 783 153.0 474
1 903.0 77 546.0 205 113.0 519 115 45 988.0 964
4 796.0 626 139.0 810 474.0 257 407 264 680.0 164
6 285.0 317 238.0 437 508.0 189 501 738 605.0 290
8 864.0 369 541.0 405 863.0 45 522 178 705.0 419
9 936.0 664 547.0 793 68.0 77 364 633 547.0 790
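Applied to the code in the question, the fix is the same: either rebind the cleaned result or use pandas' in-place variant (a minimal sketch; both keep only rows without any NaN):
# Either rebind the cleaned result:
df = df.dropna(axis=0, how="any")
# ...or modify df in place instead:
# df.dropna(axis=0, how="any", inplace=True)
print(df)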

Convert all colors in pdf into one specific color

I'm working on a PHP project where I need to perform some PDF manipulation.
I need to "convert" all colors of a vector file (PDF) into one very specific color (a spot color, in my case).
Here is an illustrated example.
The input file can vary, and it can contain any color (so I can't just convert all "red" or "green" to my target color).
I have a fair idea of how to do it on a raster image using ImageMagick's composite, but I'm unsure whether it's even possible with a vector image.
My first approach was to create a template PDF with a filled rectangle in the desired color. My hope was then to use Ghostscript to somehow apply the input file as a mask on said template. But I assume this wouldn't be possible, as vector files are different from raster images.
My second approach was to use Ghostscript to convert all colors (regardless of colorspace) into the desired color. But after extensive googling, I've only found solutions that convert from one colorspace to another (i.e. sRGB to CMYK, CMYK to gray-scale, etc.).
I'm not much of a designer, so perhaps I am simply lacking the proper "terms" for these "actions".
TL;DR
I am looking for a library/tool that can help me "convert" all colors of a vector file (PDF) into one very specific color.
The input file may vary (various shapes and colors), but will always be a PDF file without any fonts.
Output must remain a vector file (read: no rasterisation).
I have root access on a VPS running Linux (CentOS 7; I assume that is irrelevant).
You could try rasterising at a high resolution and converting the colours with ImageMagick, then re-vectorising with potrace.
So, if you had a PDF, you would do:
convert -density 288 document.pdf ...
As you have provided a PNG, I will do:
convert image.png -fill black -fuzz 10% +opaque white pgm:- | potrace -b svg -o result.svg -
which gives this SVG:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg version="1.0" xmlns="http://www.w3.org/2000/svg"
width="800.000000pt" height="450.000000pt" viewBox="0 0 800.000000 450.000000"
preserveAspectRatio="xMidYMid meet">
<metadata>
Created by potrace 1.13, written by Peter Selinger 2001-2015
</metadata>
<g transform="translate(0.000000,450.000000) scale(0.100000,-0.100000)"
fill="#000000" stroke="none">
<path d="M4800 4324 c0 -50 -2 -55 -17 -49 -84 35 -140 -17 -130 -119 7 -77
70 -120 122 -82 16 11 21 11 33 0 7 -8 18 -12 23 -9 5 4 9 76 9 161 0 147 -1
154 -20 154 -18 0 -20 -7 -20 -56z m-22 -90 c46 -32 18 -134 -38 -134 -25 0
-40 29 -40 79 0 39 19 71 43 71 7 0 23 -7 35 -16z"/>
<path d="M4926 4358 c-9 -12 -16 -35 -16 -50 0 -18 -5 -28 -15 -28 -8 0 -15
-7 -15 -15 0 -8 7 -15 15 -15 12 0 15 -17 15 -89 0 -89 6 -105 38 -94 8 3 12
31 12 94 0 88 0 89 25 89 16 0 25 6 25 15 0 9 -9 15 -25 15 -21 0 -25 5 -25
30 0 30 7 34 43 30 13 -1 18 4 15 17 -5 29 -72 30 -92 1z"/>
<path d="M3347 4364 c-4 -4 -7 -16 -7 -26 0 -14 6 -19 23 -16 14 2 22 10 22
23 0 20 -25 32 -38 19z"/>
<path d="M4170 4310 c0 -23 -4 -30 -20 -30 -11 0 -20 -7 -20 -15 0 -8 9 -15
20 -15 18 0 20 -7 20 -80 0 -74 2 -81 25 -96 32 -21 75 -12 75 17 0 16 -4 19
-21 14 -30 -10 -39 9 -39 83 l0 62 30 0 c20 0 30 5 30 15 0 10 -10 15 -30 15
-27 0 -30 3 -30 30 0 23 -4 30 -20 30 -16 0 -20 -7 -20 -30z"/>
<path d="M3345 4278 c-3 -8 -4 -59 -3 -114 2 -80 6 -99 18 -99 12 0 15 19 15
109 0 79 -4 111 -12 113 -7 3 -15 -2 -18 -9z"/>
<path d="M3453 4283 c-9 -3 -13 -34 -13 -108 0 -74 4 -105 13 -108 29 -10 37
6 37 78 0 57 4 75 18 88 46 42 72 10 72 -91 0 -54 4 -71 15 -76 22 -8 26 10
23 104 -3 77 -5 84 -31 104 -24 17 -32 19 -59 8 -18 -6 -38 -8 -47 -3 -9 5
-22 6 -28 4z"/>
<path d="M3687 4283 c-4 -3 -7 -71 -7 -150 l0 -143 25 0 c23 0 25 4 25 45 0
42 2 45 19 35 33 -17 61 -11 92 19 24 25 29 37 29 81 0 95 -51 141 -119 107
-25 -13 -31 -13 -35 -1 -6 15 -19 18 -29 7z m122 -47 c19 -22 23 -78 9 -106
-29 -55 -88 -26 -88 43 0 62 48 100 79 63z"/>
<path d="M3927 4284 c-4 -4 -7 -45 -7 -91 0 -76 2 -86 25 -108 27 -28 61 -32
92 -10 18 13 22 13 27 0 3 -8 12 -12 21 -9 13 5 15 24 13 113 -3 98 -4 106
-23 106 -18 0 -20 -8 -23 -75 -4 -94 -28 -128 -72 -100 -10 6 -16 34 -20 91
-5 75 -15 101 -33 83z"/>
<path d="M4432 4282 c-9 -7 -12 -43 -10 -148 3 -136 4 -139 26 -142 20 -3 22
1 22 41 l0 45 35 -11 c31 -9 39 -8 63 10 37 27 54 83 42 136 -15 68 -64 94
-120 63 -20 -12 -26 -12 -35 0 -6 8 -15 10 -23 6z m122 -54 c22 -31 20 -81 -3
-109 -19 -23 -21 -23 -48 -9 -24 13 -28 23 -31 62 -3 39 1 49 20 62 30 22 44
20 62 -6z"/>
<path d="M4310 4096 c0 -30 30 -43 47 -21 16 23 5 45 -23 45 -19 0 -24 -5 -24
-24z"/>
<path d="M4046 3795 l-67 -141 -227 -12 c-418 -22 -765 -74 -1127 -167 -612
-157 -1080 -387 -1387 -684 -214 -205 -323 -393 -359 -615 -16 -101 -6 -270
20 -361 136 -461 637 -856 1409 -1111 152 -51 434 -125 583 -154 l66 -13 -30
-169 c-16 -93 -27 -171 -24 -174 2 -3 124 58 271 135 l266 140 80 -9 c44 -5
197 -14 339 -21 259 -12 617 -3 844 21 l88 9 265 -140 c146 -77 268 -138 270
-136 5 4 -41 294 -52 328 -4 13 8 19 58 28 465 89 939 260 1278 461 626 370
880 871 686 1356 -69 174 -228 375 -415 526 -517 418 -1411 697 -2402 750
l-226 12 -71 141 -70 140 -66 -140z m-202 -407 c-31 -62 -119 -241 -196 -398
-76 -156 -140 -285 -142 -287 -3 -3 -799 -120 -1156 -170 -102 -14 -188 -29
-193 -32 -4 -4 102 -113 235 -242 133 -129 353 -344 489 -479 l248 -245 -45
-260 c-25 -143 -58 -332 -73 -420 l-27 -160 -41 2 c-61 2 -333 68 -515 124
-674 209 -1153 533 -1334 905 -59 121 -77 209 -71 349 5 137 35 235 109 359
58 97 206 261 311 344 463 366 1242 627 2097 701 69 6 141 13 160 15 19 1 72
4 118 4 l82 2 -56 -112z m906 86 c760 -79 1420 -283 1875 -581 864 -566 763
-1326 -245 -1840 -266 -136 -602 -253 -942 -328 -92 -21 -173 -35 -181 -32 -9
3 -20 44 -31 114 -10 59 -42 248 -72 419 l-54 311 213 210 c116 115 337 331
489 479 153 148 274 271 270 275 -4 3 -106 20 -227 37 -452 64 -1118 162
-1120 164 -6 6 -195 387 -291 587 l-104 214 137 -7 c76 -4 203 -14 283 -22z
m-424 -2761 c137 -73 200 -111 193 -118 -14 -14 -794 -14 -809 1 -7 7 49 41
192 117 112 58 207 107 212 107 5 0 100 -48 212 -107z"/>
<path d="M1815 3669 c-46 -47 -113 -80 -221 -111 -62 -17 -106 -22 -204 -22
-137 0 -185 12 -221 58 -48 61 -211 80 -449 53 -118 -14 -400 -63 -408 -72 -3
-3 28 -145 32 -145 1 0 55 11 120 25 181 37 365 58 481 53 98 -3 105 -5 125
-30 113 -144 579 -119 806 44 50 35 109 108 97 118 -5 4 -33 21 -63 38 l-55
31 -40 -40z"/>
<path d="M7647 575 c-66 -79 -247 -137 -432 -138 -134 0 -170 10 -221 61 -18
17 -53 37 -84 46 -70 21 -238 21 -395 0 -122 -15 -364 -60 -372 -68 -5 -5 17
-119 26 -133 4 -7 47 -2 121 13 181 37 358 56 477 52 l108 -3 37 -37 c120
-117 482 -110 720 13 75 40 168 123 168 151 0 10 -110 80 -122 77 -2 0 -16
-16 -31 -34z"/>
</g>
</svg>
which looks like this as a PNG (because StackOverflow doesn't allow SVG images AFAIK):
You can make all the PATHs your preferred shade of green by editing the SVG, like this:
sed 's/path /path fill="#7CBE89" /' black.svg > green.svg
You could do this with Ghostscript, but you would need some PostScript programming experience.
Essentially you want to override all the setcolor/setcolorspace operations by looking at each setcolor operation, checking the colour space and values to see if it's your target colour and, if it is, setting the colour space and values to your desired target.
The various PDF operations to set colour space and values are all defined in ghostpdl/Resource/Init/pdf_draw.ps. You'll need to modify the definitions of:
/G and /g (stroke and fill colours in DeviceGray)
/RG and /rg (stroke and fill colours in DeviceRGB)
/K and /k (stroke and fill colours in DeviceCMYK)
/SC and /sc (stroke and fill colours in Indexed, CalGray, CalRGB or Lab)
/SCN and /scn (stroke and fill colours in Pattern, Separation, DeviceN or ICCBased)
There are quite a few wrinkles in there:
You can probably ignore Pattern spaces and just deal with any colours that are set by the pattern itself.
For SC/sc and /SCN/scn you need to figure out whether the colour specified is the target colour, assuming your target can be specified in these spaces. Note that /Indexed is particularly interesting as it can have a base space of any of the other spaces, so you need to look and see.
Finally note that images (bitmaps) are specified differently, and altering those would be much harder.
Depending on the exact nature of the requirement (ie what space/colours constitute valid targets) this could be quite a lengthy task, and it will require someone with PostScript programming ability to write it.
Oh, and on a final note, have you considered transparency? That can specify the blending colour space too, which might mean that after you had substituted the colour, it would be blended in a different colour space, resulting in your careful substitution disappearing.
Lest you think this unlikely, I should mention that a number of PDF producers create files with transparency groups in them, even when no actual transparency operations take place.