Understanding true_classes in log_uniform_candidate_sampler - tensorflow

https://www.tensorflow.org/tutorials/text/word2vec uses tf.random.log_uniform_candidate_sampler for negative sampling.
The tutorial sets true_classes to context_class.
My experiment shows no matter what I set for true_classes, the function always yields good results.
> tf.random.log_uniform_candidate_sampler( true_classes=[[1]],
num_true=1, num_sampled=num_ns,
unique=True, range_max=vocab_size)
[0, 1, 7, 5]
> tf.random.log_uniform_candidate_sampler( true_classes=[[2]],
num_true=1, num_sampled=num_ns,
unique=True, range_max=vocab_size)
[0, 6, 2, 5]
What does true_classes mean in this function?

The line in the tutorial:
You can call the function on one skip-grams's target word and pass the
context word as a true class to exclude it from being sampled
That's misleading.
What does true_classes mean in this function?
Function returns true_expected_count which is defined in this line of the source code..
true_classes seems only used to calculate true_expected_count. So this function does not exclude negative classes. Every label has a probability to get sampled.
I copy an example code that can be experimented on (in case something happens to the link), taken from this GitHub issue:
# Do sampling 1000 times using true_classes [0, 8]
sample_func = lambda ii: tf.random.log_uniform_candidate_sampler(true_classes=[[ii]], num_true=1, num_sampled=4, unique=True, range_max=8, seed=42)
dd = {ii : np.stack([sample_func(ii)[0].numpy() for jj in range(1000)]) for ii in range(8)}
# Calculate the distribution in each true_class
for ii in dd:
print("true_class:", ii, ", negative value_counts:", pd.value_counts(dd[ii].flatten()).to_dict())
# true_class: 0 , negative value_counts: {0: 871, 1: 722, 2: 584, 3: 466, 4: 402, 5: 329, 7: 319, 6: 307}
# true_class: 1 , negative value_counts: {0: 867, 1: 695, 2: 571, 3: 485, 4: 411, 5: 380, 6: 316, 7: 275}
# true_class: 2 , negative value_counts: {0: 869, 1: 716, 2: 541, 3: 488, 4: 389, 5: 357, 6: 321, 7: 319}
# true_class: 3 , negative value_counts: {0: 877, 1: 715, 2: 582, 3: 482, 4: 394, 5: 355, 6: 318, 7: 277}
# true_class: 4 , negative value_counts: {0: 883, 1: 716, 2: 566, 3: 489, 4: 394, 5: 367, 6: 316, 7: 269}
# true_class: 5 , negative value_counts: {0: 862, 1: 717, 2: 583, 3: 496, 4: 376, 5: 357, 6: 315, 7: 294}
# true_class: 6 , negative value_counts: {0: 859, 1: 725, 2: 575, 3: 482, 4: 413, 5: 356, 6: 302, 7: 288}
# true_class: 7 , negative value_counts: {0: 880, 1: 724, 2: 555, 3: 488, 4: 425, 5: 324, 7: 302, 6: 302}
# Result of `true_expected_count`
print({ii : np.mean([sample_func(ii)[1].numpy() for jj in range(1000)]) for ii in range(8)})
# {0: 0.99967235, 1: 0.7245632, 2: 0.5737029, 3: 0.47004792, 4: 0.3987442, 5: 0.34728608, 6: 0.3084587, 7: 0.27554017}

Related

How to keep the number and names of columns in training and test dataset equal after one hot encoding?

Shape of the original dataset is 82580×30 with multiple string columns. Example dataset:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
df = pd.DataFrame({'Nationality': {0: 'DEU', 1: 'PRT', 2: 'PRT', 3: 'PRT', 4: 'FRA', 5: 'DEU', 6: 'CHE', 7: 'DEU', 8: 'GBR', 9: 'AUT', 10: 'PRT', 11: 'FRA', 12: 'OTR', 13: 'GBR', 14: 'ESP', 15: 'PRT', 16: 'OTR', 17: 'PRT', 18: 'ESP', 19: 'AUT'},
'Age': {0: 27.0, 1: 45.46, 2: 45.46, 3: 58.0, 4: 57.0, 5: 27.0, 6: 49.0, 7: 62.0, 8: 44.0, 9: 61.0, 10: 54.0, 11: 53.0, 12: 50.0, 13: 30.0, 14: 51.0, 15: 45.46, 16: 40.0, 17: 49.0, 18: 49.0, 19: 14.0},
'DaysSinceCreation': {0: 370, 1: 213, 2: 206, 3: 1018, 4: 835, 5: 52, 6: 597, 7: 217, 8: 999, 9: 1004, 10: 402, 11: 879, 12: 393, 13: 923, 14: 249, 15: 52, 16: 159, 17: 929, 18: 49, 19: 131},
'BookingsCheckedIn': {0: 1, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1, 7: 2, 8: 1, 9: 1, 10: 1, 11: 1, 12: 1, 13: 1, 14: 1, 15: 0, 16: 0, 17: 1, 18: 1, 19: 0}})
# Encoding Variables
transformer = make_column_transformer((OneHotEncoder(sparse=False), ['Nationality']), remainder='passthrough')
transformed = transformer.fit_transform(df)
transformed_df = pd.DataFrame(transformed, columns=transformer.get_feature_names_out())
# Concat the two tables
transformed_df.reset_index(drop=True, inplace=True)
df.reset_index(drop=True, inplace=True)
df = pd.concat([transformed_df, df], axis=1)
# Remove old columns
df.drop(['Nationality'], axis = 1, inplace = True)
print('The shape after encoding: {}'.format(df.shape))
print(df.columns.unique())
The shape after encoding: (20, 14)
Index(['onehotencoder__Nationality_AUT', 'onehotencoder__Nationality_CHE',
'onehotencoder__Nationality_DEU', 'onehotencoder__Nationality_ESP',
'onehotencoder__Nationality_FRA', 'onehotencoder__Nationality_GBR',
'onehotencoder__Nationality_OTR', 'onehotencoder__Nationality_PRT',
'remainder__Age', 'remainder__DaysSinceCreation',
'remainder__BookingsCheckedIn', 'Age', 'DaysSinceCreation',
'BookingsCheckedIn'],
dtype='object')
After modeling, trying to test on a completely different test set:
df = pd.DataFrame({'Nationality': {0: 'CAN', 1: 'DEU', 2: 'PRT', 3: 'PRT', 4: 'FRA'},
'Age': {0: 27.0, 1: 29.0, 2: 24.0, 3: 24.0, 4: 46.0},
'DaysSinceCreation': {0: 222, 1: 988, 2: 212, 3: 685, 4: 1052},
'BookingsCheckedIn': {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}})
# Encoding Variables
transformer = make_column_transformer(
(OneHotEncoder(sparse=False), ['Nationality']),
remainder='passthrough')
transformed = transformer.fit_transform(df)
transformed_df = pd.DataFrame(transformed, columns=transformer.get_feature_names_out())
# Concat the two tables
transformed_df.reset_index(drop=True, inplace=True)
df.reset_index(drop=True, inplace=True)
df = pd.concat([transformed_df, df], axis=1)
# Remove old columns
df.drop(['Nationality'], axis = 1, inplace = True)
print('The shape after encoding: {}'.format(df.shape))
print(df.columns.unique())
The shape after encoding: (5, 10)
Index(['onehotencoder__Nationality_CAN', 'onehotencoder__Nationality_DEU',
'onehotencoder__Nationality_FRA', 'onehotencoder__Nationality_PRT',
'remainder__Age', 'remainder__DaysSinceCreation',
'remainder__BookingsCheckedIn', 'Age', 'DaysSinceCreation',
'BookingsCheckedIn'],
dtype='object')
As can be seen, testing dataset has some features that were not present in the original training set and many features of training set are not present in test set. If I only use .values of X_train, y_train, X_test, y_test, I can run from logistic regression to Neural Net with >99% accuracy, but that feels like cheating and is not working out with Decision Trees. How do we deal with this?
I would like to contribute 2 inputs:
(1) the test set should be a subset of the training set, so the unknown Nationality 'CAN' is not allowed. Either: try to include the new 'CAN' in the training data, or try to replace it with 'GBR' instead in the test data.
(2) you should not do fit_transform() separately on training and test set. The right way is to fit on training set, then... transform on training set and transform on test set. To illustrate:
# Encoding Variables
transformer = make_column_transformer((OneHotEncoder(sparse=False), ['Nationality']), remainder='passthrough')
####transformed = transformer.fit_transform(df) #delete this
transformer.fit(df) #use this instead
transformed = transformer.transform(df) #use this instead
transformed_df = pd.DataFrame(transformed, columns=transformer.get_feature_names_out())
# Concat the two tables
<truncated>
print('The shape after encoding: {}'.format(df.shape))
The shape after encoding: (20, 14)
Second part, note that I have replaced 'CAN' with 'GBR'. And only use the previously fitted transformer to transform the test set:
df = pd.DataFrame({'Nationality': {0: 'GBR', 1: 'DEU', 2: 'PRT', 3: 'PRT', 4: 'FRA'},
'Age': {0: 27.0, 1: 29.0, 2: 24.0, 3: 24.0, 4: 46.0},
'DaysSinceCreation': {0: 222, 1: 988, 2: 212, 3: 685, 4: 1052},
'BookingsCheckedIn': {0: 0, 1: 1, 2: 1, 3: 1, 4: 0}})
# Encoding Variables
####transformer = make_column_transformer((OneHotEncoder(sparse=False), ['Nationality']), remainder='passthrough') #do not repeat, use the previous fitted model
####transformed = transformer.fit_transform(df) #delete this, NO fitting on test set
transformed = transformer.transform(df) #only do transform on test set
transformed_df = pd.DataFrame(transformed, columns=transformer.get_feature_names_out())
# Concat the two tables
<truncated>
print('The shape after encoding: {}'.format(df.shape))
The shape after encoding: (5, 14)
So the number of columns (14) are the same for both training set and test set

text showing up in hoverinfo not just displayed

So I'm trying to add data labels so you can see the values of each of my stacks when looking at a graph. I added the text option and put the column I want displayed, but it just returns in the hover information and not just displayed on the graph. How do I change this?
df2 = pd.DataFrame.from_dict({'Country': {0: 'Europe',
1: 'America',
2: 'Asia',
3: 'Europe',
4: 'America',
5: 'Asia',
6: 'Europe',
7: 'America',
8: 'Asia',
9: 'Europe',
10: 'America',
11: 'Asia'},
'Year': {0: 2014,
1: 2014,
2: 2014,
3: 2015,
4: 2015,
5: 2015,
6: 2016,
7: 2016,
8: 2016,
9: 2017,
10: 2017,
11: 2017},
'Amount': {0: 1600,
1: 410,
2: 150,
3: 1300,
4: 300,
5: 170,
6: 1000,
7: 500,
8: 200,
9: 900,
10: 500,
11: 210}})
fig = go.Figure()
x=[]
for i in df2['Year'].unique():
x.append(str(i))
for c in df2['Country'].unique():
df3 = df2[df2['Country'] == c]
fig.add_trace(go.Bar(x=x, y=df3['Amount'], name=c, text=df3['Amount']))
fig.update_layout(title="Personnel at Work",
barmode='stack',
title_x=.5,
yaxis={
'showgrid':False,
'visible':False
},
xaxis=dict(
tick0=0,
dtick=1,
),
plot_bgcolor='rgba(0,0,0,0)')
fig.show()
I had a similar problem and this block of code helped me!. Im not sure if it can help your case but give it a try.
fig.update_traces(texttemplate='%{your_labels =:.1f}', textposition='outside')
Go through all the use cases here,
https://plotly.com/python/text-and-annotations/

APOPT solver finds different solution every time

I'm using gekko to solve a MINLP problem. I'm using the APOPT solver since is the only one that can provide integer solution, which are strictly needed in my case.
The issue I have is that every time I run the solver I have a different solution, even for very small cases, so I can't be sure about the optimality of the solutions. From some solutions to others there are important differences in the objective final value.
I've noticed that only 1 iteration takes place, and I don't know if it should be like this. Also, it runs for less than 1 second when it could run longer and find better solutions.
Below the output of one of this runs:
--------- APM Model Size ------------
Each time step contains
Objects : 86
Constants : 0
Variables : 606
Intermediates: 0
Connections : 806
Equations : 1046
Residuals : 1046
Number of state variables: 606
Number of total equations: - 172
Number of slack variables: - 40
---------------------------------------
Degrees of freedom : 394
----------------------------------------------
Steady State Optimization with APOPT Solver
----------------------------------------------
Iter: 1 I: 0 Tm: 0.38 NLPi: 28 Dpth: 0 Lvs: 0 Obj: 1.27E+09 Gap: 0.00E+00
Successful solution
---------------------------------------------------
Solver : APOPT (v1.0)
Solution time : 0.38800000000628643 sec
Objective : 1271480486.0000000
Successful solution
---------------------------------------------------
I would appreciate any advice on how to tweak the solver so I can have consistent solutions. Thanks in advance
EDIT
I'm adding the code (reduced version). Basically what I'm trying to do is finding the best location for n warehouse based on the demand of m cities, which are represented by the corresponding hexagon in the h3 library.
from gekko import GEKKO
from random import randint
import pandas as pd
if __name__ == '__main__':
n_warehouses = 6
df_aggreg = pd.DataFrame.from_dict({'hex_id_basic': {32: 0, 12: 1, 2: 2, 3: 3, 22: 4, 24: 5, 38: 6, 8: 7, 19: 8, 27: 9, 21: 10, 25: 11, 28: 12, 26: 13, 29: 14, 30: 15, 31: 16, 1: 17, 20: 18, 18: 19, 17: 20, 33: 21, 35: 22, 14: 23, 13: 24, 11: 25, 10: 26, 36: 27, 6: 28, 5: 29, 4: 30, 16: 31, 34: 32, 0: 33, 23: 34, 15: 35, 9: 36, 37: 37, 7: 38, 39: 39}, 'value': {32: 1808, 12: 847, 2: 847, 3: 847, 22: 847, 24: 847, 38: 847, 8: 847, 19: 847, 27: 847, 21: 452, 25: 452, 28: 452, 26: 452, 29: 452, 30: 452, 31: 452, 1: 452, 20: 452, 18: 452, 17: 452, 33: 452, 35: 452, 14: 452, 13: 452, 11: 452, 10: 452, 36: 452, 6: 452, 5: 452, 4: 452, 16: 452, 34: 169, 0: 169, 23: 84, 15: 84, 9: 84, 37: 84, 7: 84, 39: 84}})
distance_matrix = [[0, 278320, 302712, 117682, 283287, 225645, 303065, 258900, 252165, 453768, 389125, 305694, 377415, 445329, 176671, 16098, 95378, 272352, 153948, 247011, 153063, 175620, 184734, 253592, 235204, 275271, 204377, 140151, 270207, 254950, 364642, 92383, 239928, 300635, 394936, 291140, 293377, 205417, 253778, 313127], [278320, 0, 144398, 187013, 244571, 107924, 257813, 279533, 334530, 195669, 205157, 84028, 109937, 187707, 427560, 265135, 188779, 292672, 307462, 280094, 408319, 281660, 131476, 440829, 46693, 204251, 233509, 147273, 263519, 267077, 185835, 310005, 254863, 323378, 119817, 407585, 280337, 171576, 280105, 261055], [302712, 144398, 0, 185183, 377007, 234161, 392715, 170354, 237346, 307577, 347861, 228110, 138929, 301200, 476655, 286696, 244493, 179154, 254571, 176769, 387398, 378208, 238178, 372115, 163476, 79788, 152001, 222356, 146928, 158337, 62584, 289416, 375428, 203381, 217686, 310387, 155900, 289533, 173468, 397435], [117682, 187013, 185183, 0, 291864, 179791, 311714, 170740, 192926, 379238, 347869, 240050, 272172, 370968, 293195, 101616, 87586, 185977, 129253, 162328, 222594, 233040, 150108, 262697, 153805, 162481, 109099, 101869, 173375, 162759, 246974, 123207, 265470, 218641, 306739, 255114, 197339, 195959, 167223, 320151], [283287, 244571, 377007, 291864, 0, 142846, 20212, 455483, 484791, 287216, 161617, 183085, 340463, 280631, 326676, 282534, 212575, 470828, 409823, 449527, 434084, 135211, 148574, 531347, 213849, 411617, 394742, 190552, 450777, 445332, 427673, 367985, 53066, 504293, 288485, 544788, 473256, 96874, 453151, 29897], [225645, 107924, 234161, 179791, 142846, 0, 158910, 325969, 365598, 232092, 169623, 84652, 214438, 223750, 344437, 216987, 130954, 340949, 308549, 322092, 374352, 178834, 41429, 440518, 71333, 272261, 268233, 88085, 317511, 314784, 285789, 286639, 147154, 374185, 190689, 432588, 338741, 64095, 324564, 164244], [303065, 257813, 392715, 311714, 20212, 158910, 0, 474649, 504627, 286841, 156847, 192075, 350549, 280649, 341260, 302511, 232768, 489976, 430017, 468858, 453490, 151205, 167295, 551499, 229266, 429328, 414077, 210234, 469552, 464393, 442147, 388101, 68990, 523448, 294341, 564878, 491892, 116277, 472394, 10359], [258900, 279533, 170354, 170740, 455483, 325969, 474649, 0, 68930, 467349, 478557, 358082, 308273, 460170, 428178, 244879, 258325, 15394, 131661, 13671, 271974, 403214, 307634, 213649, 272011, 90917, 61801, 266220, 26455, 12593, 202481, 195575, 434164, 48812, 379747, 140257, 39308, 358628, 6107, 482200], [252165, 334530, 237346, 192926, 484791, 365598, 504627, 68930, 0, 526826, 526392, 408559, 373685, 519349, 407413, 240840, 276462, 66186, 101574, 60620, 226811, 414422, 341075, 147167, 320118, 158889, 101274, 294543, 95127, 79800, 271409, 171262, 456912, 68984, 441013, 74111, 100302, 388712, 64684, 513007], [453768, 195669, 307577, 379238, 287216, 232092, 286841, 467349, 526826, 0, 136244, 148461, 178102, 8441, 573471, 443171, 358499, 479086, 502816, 469837, 595669, 392502, 273266, 636429, 225844, 383011, 426871, 313638, 447989, 454756, 317090, 500323, 328239, 507228, 90088, 600610, 461075, 278672, 468809, 282694], [389125, 205157, 347861, 347869, 161617, 169623, 156847, 478557, 526392, 136244, 0, 121715, 255074, 131581, 475633, 382466, 297470, 492617, 477097, 477118, 540843, 284992, 206940, 609805, 206849, 408806, 425937, 256909, 465228, 466469, 379467, 455895, 209773, 524589, 174173, 596319, 483539, 187375, 478271, 151054], [305694, 84028, 228110, 240050, 183085, 84652, 192075, 358082, 408559, 148461, 121715, 0, 158796, 140049, 428839, 295540, 210329, 371886, 368057, 357279, 450043, 256851, 125251, 502181, 89028, 287097, 307444, 165570, 343975, 345858, 265337, 357240, 207855, 403529, 110982, 479964, 361969, 139306, 358064, 193009], [377415, 109937, 138929, 272172, 340463, 214438, 350549, 308273, 373685, 178102, 255074, 158796, 0, 173125, 534732, 362967, 293628, 317765, 374520, 313604, 493385, 391108, 241276, 501197, 156178, 218494, 280742, 254733, 285601, 295994, 139047, 393079, 358593, 342270, 90310, 447455, 294687, 278510, 310960, 351759], [445329, 187707, 301200, 370968, 280631, 223750, 280649, 460170, 519349, 8441, 131581, 140049, 173125, 0, 565355, 434730, 350062, 472002, 494725, 462533, 587263, 384786, 264901, 628392, 217517, 376220, 419266, 305200, 440993, 447576, 311937, 491975, 321105, 500318, 84177, 593082, 454296, 270683, 461573, 276700], [176671, 427560, 476655, 293195, 326676, 344437, 341260, 428178, 407413, 573471, 475633, 428839, 534732, 565355, 0, 192659, 241685, 440309, 306114, 415366, 213722, 192267, 305547, 353506, 381057, 451860, 378110, 280381, 442537, 425977, 539074, 236503, 273614, 465255, 533298, 426186, 464854, 295413, 422602, 351173], [16098, 265135, 286696, 101616, 282534, 216987, 302511, 244879, 240840, 443171, 382466, 295540, 362967, 434730, 192659, 0, 86037, 258560, 144482, 233219, 158401, 180937, 176621, 250690, 222751, 259395, 189468, 130132, 255508, 240568, 348587, 87437, 241145, 287389, 382525, 282778, 278823, 201018, 239867, 312431], [95378, 188779, 244493, 87586, 212575, 130954, 232768, 258325, 276462, 358499, 297470, 210329, 293628, 350062, 241685, 86037, 0, 273554, 197249, 249773, 243741, 146171, 90840, 321549, 143532, 241689, 196668, 44890, 260330, 250266, 306721, 160762, 180583, 306135, 302150, 332965, 284225, 121772, 254773, 241969], [272352, 292672, 179154, 185977, 470828, 340949, 489976, 15394, 66186, 479086, 492617, 371886, 317765, 472002, 440309, 258560, 273554, 0, 140627, 25344, 279225, 418280, 322922, 213131, 286282, 99366, 77144, 281612, 32238, 26277, 207287, 206362, 449552, 33473, 391005, 133626, 34477, 373979, 19000, 497504], [153948, 307462, 254571, 129253, 409823, 308549, 430017, 131661, 101574, 502816, 477097, 368057, 374520, 494725, 306114, 144482, 197249, 140627, 0, 117996, 141824, 323726, 276426, 134201, 280163, 191685, 105340, 226127, 152834, 133897, 305377, 69716, 375475, 161033, 424673, 138867, 170911, 317471, 125583, 439203], [247011, 280094, 176769, 162328, 449527, 322092, 468858, 13671, 60620, 469837, 477118, 357279, 313604, 462533, 415366, 233219, 249773, 25344, 117996, 0, 258348, 393955, 302253, 202640, 270296, 98356, 54784, 259602, 38118, 20016, 211973, 182205, 426898, 56395, 382945, 133870, 52974, 352657, 7588, 476592], [153063, 408319, 387398, 222594, 434084, 374352, 453490, 271974, 226811, 595669, 540843, 450043, 493385, 587263, 213722, 158401, 243741, 279225, 141824, 258348, 0, 314904, 334581, 140210, 370743, 331489, 244983, 286515, 294300, 275249, 442428, 100328, 387881, 294243, 528076, 223497, 311245, 358451, 265867, 463708], [175620, 281660, 378208, 233040, 135211, 178834, 151205, 403214, 414422, 392502, 284992, 256851, 391108, 384786, 192267, 180937, 146171, 418280, 323726, 393955, 314904, 0, 150414, 428871, 237099, 386147, 341933, 156045, 406310, 395685, 438153, 267684, 82364, 450314, 367218, 462586, 430241, 117601, 399353, 161423], [184734, 131476, 238178, 150108, 148574, 41429, 167295, 307634, 341075, 273266, 206940, 125251, 241276, 264901, 305547, 176621, 90840, 322922, 276426, 302253, 334581, 150414, 0, 406344, 86738, 264273, 247556, 50786, 302265, 297224, 294689, 249321, 137664, 356394, 227861, 405170, 324686, 53055, 305544, 174613], [253592, 440829, 372115, 262697, 531347, 440518, 551499, 213649, 147167, 636429, 609805, 502181, 501197, 628392, 353506, 250690, 321549, 213131, 134201, 202640, 140210, 428871, 406344, 0, 414364, 298374, 221122, 355576, 240067, 222654, 413797, 163640, 491733, 212409, 556786, 103608, 247427, 443322, 208464, 561173], [235204, 46693, 163476, 153805, 213849, 71333, 229266, 272011, 320118, 225844, 206849, 89028, 156178, 217517, 381057, 222751, 143532, 286282, 280163, 270296, 370743, 237099, 86738, 414364, 0, 208101, 219280, 101025, 259858, 260053, 214484, 274518, 217001, 318654, 159780, 391112, 279361, 131548, 271564, 233971], [275271, 204251, 79788, 162481, 411617, 272261, 429328, 90917, 158889, 383011, 408806, 287097, 218494, 376220, 451860, 259395, 241689, 99366, 191685, 98356, 331489, 386147, 264273, 298374, 208101, 0, 86592, 233845, 67147, 79218, 115829, 239530, 400051, 124610, 293938, 231168, 78076, 317319, 94401, 435547], [204377, 233509, 152001, 109099, 394742, 268233, 414077, 61801, 101274, 426871, 425937, 307444, 280742, 419266, 378110, 189468, 196668, 77144, 105340, 54784, 244983, 341933, 247556, 221122, 219280, 86592, 0, 204896, 66374, 53769, 200202, 155140, 372506, 110258, 342408, 174094, 90128, 297872, 58728, 421827], [140151, 147273, 222356, 101869, 190552, 88085, 210234, 266220, 294543, 313638, 256909, 165570, 254733, 305200, 280381, 130132, 44890, 281612, 226127, 259602, 286515, 156045, 50786, 355576, 101025, 233845, 204896, 0, 263729, 256622, 282929, 198993, 168664, 314976, 258111, 356656, 286974, 94168, 263557, 218490], [270207, 263519, 146928, 173375, 450777, 317511, 469552, 26455, 95127, 447989, 465228, 343975, 285601, 440993, 442537, 255508, 260330, 32238, 152834, 38118, 294300, 406310, 302265, 240067, 259858, 67147, 66374, 263729, 0, 19120, 176392, 213505, 432376, 59886, 359568, 165204, 23979, 354255, 31668, 476705], [254950, 267077, 158337, 162759, 445332, 314784, 464393, 12593, 79800, 454756, 466469, 345858, 295994, 447576, 425977, 240568, 250266, 26277, 133897, 20016, 275249, 395685, 297224, 222654, 260053, 79218, 53769, 256622, 19120, 0, 192090, 195450, 424902, 59414, 367175, 152072, 38895, 348532, 15183, 471835], [364642, 185835, 62584, 246974, 427673, 285789, 442147, 202481, 271409, 317090, 379467, 265337, 139047, 311937, 539074, 348587, 306721, 207287, 305377, 211973, 442428, 438153, 294689, 413797, 214484, 115829, 200202, 282929, 176392, 192090, 0, 346396, 430496, 224040, 228276, 340846, 177161, 344487, 206946, 445989], [92383, 310005, 289416, 123207, 367985, 286639, 388101, 195575, 171262, 500323, 455895, 357240, 393079, 491975, 236503, 87437, 160762, 206362, 69716, 182205, 100328, 267684, 249321, 163640, 274518, 239530, 155140, 198993, 213505, 195450, 346396, 0, 328145, 229377, 429805, 200778, 233973, 281930, 189695, 397850], [239928, 254863, 375428, 265470, 53066, 147154, 68990, 434164, 456912, 328239, 209773, 207855, 358593, 321105, 273614, 241145, 180583, 449552, 375475, 426898, 387881, 82364, 137664, 491733, 217001, 400051, 372506, 168664, 432376, 424902, 430496, 328145, 0, 482764, 317893, 512855, 455638, 86038, 431229, 79283], [300635, 323378, 203381, 218641, 504293, 374185, 523448, 48812, 68984, 507228, 524589, 403529, 342270, 500318, 465255, 287389, 306135, 33473, 161033, 56395, 294243, 450314, 356394, 212409, 318654, 124610, 110258, 314976, 59886, 59414, 224040, 229377, 482764, 0, 418433, 121305, 47807, 407440, 51547, 530978], [394936, 119817, 217686, 306739, 288485, 190689, 294341, 379747, 441013, 90088, 174173, 110982, 90310, 84177, 533298, 382525, 302150, 391005, 424673, 382945, 528076, 367218, 227861, 556786, 159780, 293938, 342408, 258111, 359568, 367175, 228276, 429805, 317893, 418433, 0, 515064, 371904, 249637, 381509, 293301], [291140, 407585, 310387, 255114, 544788, 432588, 564878, 140257, 74111, 600610, 596319, 479964, 447455, 593082, 426186, 282778, 332965, 133626, 138867, 133870, 223497, 462586, 405170, 103608, 391112, 231168, 174094, 356656, 165204, 152072, 340846, 200778, 512855, 121305, 515064, 0, 164624, 450245, 136922, 573716], [293377, 280337, 155900, 197339, 473256, 338741, 491892, 39308, 100302, 461075, 483539, 361969, 294687, 454296, 464854, 278823, 284225, 34477, 170911, 52974, 311245, 430241, 324686, 247427, 279361, 78076, 90128, 286974, 23979, 38895, 177161, 233973, 455638, 47807, 371904, 164624, 0, 376916, 45412, 498907], [205417, 171576, 289533, 195959, 96874, 64095, 116277, 358628, 388712, 278672, 187375, 139306, 278510, 270683, 295413, 201018, 121772, 373979, 317471, 352657, 358451, 117601, 53055, 443322, 131548, 317319, 297872, 94168, 354255, 348532, 344487, 281930, 86038, 407440, 249637, 450245, 376916, 0, 356278, 124356], [253778, 280105, 173468, 167223, 453151, 324564, 472394, 6107, 64684, 468809, 478271, 358064, 310960, 461573, 422602, 239867, 254773, 19000, 125583, 7588, 265867, 399353, 305544, 208464, 271564, 94401, 58728, 263557, 31668, 15183, 206946, 189695, 431229, 51547, 381509, 136922, 45412, 356278, 0, 480029], [313127, 261055, 397435, 320151, 29897, 164244, 10359, 482200, 513007, 282694, 151054, 193009, 351759, 276700, 351173, 312431, 241969, 497504, 439203, 476592, 463708, 161423, 174613, 561173, 233971, 435547, 421827, 218490, 476705, 471835, 445989, 397850, 79283, 530978, 293301, 573716, 498907, 124356, 480029, 0]]
# Initialize Model
m = GEKKO(remote=False)
m.options.SOLVER = 1
m.options.IMODE = 3
# VARIABLES
print("Creating variables...")
# warehouse locations
warehouses_to_hexagon_vars = []
for dks_id in range(n_warehouses):
warehouses_to_hexagon_vars.append([m.Var(value=0, lb=0, ub=1, integer=True, name=f"dk_{dks_id}_{hex_id}")
for hex_id in df_aggreg["hex_id_basic"].unique()])
# hexagon assigned to warehouse
hexagon_to_warehouse_vars = []
for hex_id in df_aggreg["hex_id_basic"].unique():
hexagon_to_warehouse_vars.append([m.Var(value=0, lb=0, ub=1, integer=True, name=f"hx_{hex_id}_{dks_id}")
for dks_id in range(n_warehouses)])
# CONSTRAINTS
# A warehouse located only in one hexagon
print("Creating constraints...")
for dks_vars in warehouses_to_hexagon_vars:
m.Equation(m.sum(dks_vars) == 1)
# One hexagon contains at most one warehouse
for hex_id in df_aggreg["hex_id_basic"].unique():
m.Equation(m.sum([dks_vars[hex_id] for dks_vars in warehouses_to_hexagon_vars]) <= 1)
# One hexagon assigned only to one warehouse
for hex_vars in hexagon_to_warehouse_vars:
m.Equation(m.sum(hex_vars) == 1)
# WARM START
for dks_id in range(n_warehouses):
warehouses_to_hexagon_vars[dks_id][randint(0, len(warehouses_to_hexagon_vars[dks_id]) - 1)].value = 1
for hex_id in range(len(hexagon_to_warehouse_vars)):
hexagon_to_warehouse_vars[hex_id][randint(0, n_warehouses - 1)].value = 1
# Set objective function
print("Creating objective function...")
for wh_id in range(n_warehouses):
for hex_id_1 in df_aggreg["hex_id_basic"].unique():
for hex_id_2 in df_aggreg["hex_id_basic"].unique():
distance_hexagon_to_warehouse = int(distance_matrix[hex_id_1][hex_id_2])
demand_hexagon = df_aggreg.loc[df_aggreg["hex_id_basic"] == hex_id_2, "value"].values[0]
m.Obj(warehouses_to_hexagon_vars[wh_id][hex_id_1] *
hexagon_to_warehouse_vars[hex_id_2][wh_id] *
distance_hexagon_to_warehouse *
demand_hexagon)
# Solve simulation
m.solve()
I've implemented a parallel solution in which I run the same model n times but with a different initial solution each. Once all models have been solved, I take the solution with lowest objective value.

Matching Buy Sell entries from two dataframes and creating a new one. Python 3.8 / W10

Python / Pandas.
Matching Buy and Sell entries row by row.
BuyDF and SellDF are obtained from one excel file and sorted as per ascending Time (column L).
The image shows how the matching has to be done.
Match Buy and Sell entries by Name following first in first out method.
Take very first entry (Name AAA) from BuyDF and match with very first / Top most entry (Name AAA) from SellDF and move the matching row from SellDF in front of corrosponding row of BuyDF and delete the row Sell DF.
Go back to BuyDF second entry and match SellDF entry and move the matching row from SellDF and move the matching row from SellDF in front of corrosponding row of BuyDF and the row is deleted from Sell DF ...... and so on.
For names which do not match leave the matching rows Blank.
The ascending order (Time / Column L) should not be changed to maintain first in first out.
Tried using merge but didn't work for me.
How to proceed ?
BuyDF
{'Date': {0: '2019-04-01', 1: '2019-04-01', 2: '2019-04-01', 3: '2019-04-01', 4: '2019-04-02', 5: '2019-04-02', 6: '2019-04-02', 7: '2019-04-02', 8: '2019-04-05'}, 'Name': {0: 'AAA', 1: 'AAA', 2: 'AAA', 3: 'AAA', 4: 'BBB', 5: 'CCC', 6: 'CCC', 7: 'BBB', 8: 'AAA'}, 'Ref': {0: 1, 1: 1, 2: 1, 3: 1, 4: 5, 5: 7, 6: 7, 7: 6, 8: 1}, 'Seg': {0: 'S', 1: 'S', 2: 'S', 3: 'S', 4: 'L', 5: 'XL', 6: 'XL', 7: 'L', 8: 'S'}, 'Trans': {0: 'buy', 1: 'buy', 2: 'buy', 3: 'buy', 4: 'buy', 5: 'buy', 6: 'buy', 7: 'buy', 8: 'buy'}, 'Qty': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1}, 'Price': {0: 225, 1: 225, 2: 225, 3: 225, 4: 210, 5: 210, 6: 210, 7: 210, 8: 225}, 'Order ID': {0: 8249, 1: 111, 2: 654, 3: 111, 4: 888, 5: 444, 6: 444, 7: 888, 8: 111}, 'Trade ID': {0: 1010, 1: 1010, 2: 1010, 3: 1010, 4: 4645, 5: 132, 6: 132, 7: 4700, 8: 1010}, 'Time': {0: '2019-04-01 11:05:18', 1: '2019-04-01 13:05:18', 2: '2019-04-01 13:05:18', 3: '2019-04-01 13:05:59', 4: '2019-04-02 13:20:05', 5: '2019-04-02 13:35:02', 6: '2019-04-02 13:35:02', 7: '2019-04-02 14:20:12', 8: '2019-04-05 13:05:18'}}
SellDF
{'Date': {5: '2019-04-01', 6: '2019-04-02', 7: '2019-04-02', 8: '2019-04-02', 13: '2019-04-03', 14: '2019-04-05', 15: '2019-04-05'}, 'Name': {5: 'AAA', 6: 'BBB', 7: 'BBB', 8: 'BBB', 13: 'DDD', 14: 'AAA', 15: 'AAA'}, 'Ref': {5: 3, 6: 2, 7: 2, 8: 2, 13: 8, 14: 4, 15: 4}, 'Seg': {5: 'L', 6: 'X', 7: 'X', 8: 'X', 13: 'XS', 14: 'L', 15: 'L'}, 'Trans': {5: 'sell', 6: 'sell', 7: 'sell', 8: 'sell', 13: 'sell', 14: 'sell', 15: 'sell'}, 'Qty': {5: 1, 6: 1, 7: 1, 8: 1, 13: 1, 14: 1, 15: 1}, 'Price': {5: 210, 6: 210, 7: 210, 8: 210, 13: 210, 14: 210, 15: 210}, 'Order ID': {5: 555, 6: 222, 7: 222, 8: 222, 13: 999, 14: 555, 15: 555}, 'Trade ID': {5: 1640, 6: 1532, 7: 1532, 8: 1532, 13: 14623, 14: 1645, 15: 1645}, 'Time': {5: '2019-04-01 14:13:40', 6: '2019-04-02 13:10:32', 7: '2019-04-02 13:10:32', 8: '2019-04-02 13:10:32', 13: '2019-04-03 15:25:50', 14: '2019-04-05 14:41:45', 15: '2019-04-05 14:41:45'}}
Image posted for ease of understanding.

Apply np.average in pandas pivot aggfunc

I am trying to calculate weighted average prices using pandas pivot table.
I have tried passing in a dictionary using aggfunc.
This does not work when passed into aggfunc, although it should calculate the correct weighted average.
'Price': lambda x: np.average(x, weights=df['Balance'])
I have also tried using a manual groupby:
df.groupby('Product').agg({
'Balance': sum,
'Price': lambda x : np.average(x, weights='Balance'),
'Value': sum
})
This also yields the error:
TypeError: Axis must be specified when shapes of a and weights differ.
Here is sample data
import pandas as pd
import numpy as np
price_dict = {'Product': {0: 'A',
1: 'A',
2: 'A',
3: 'A',
4: 'A',
5: 'B',
6: 'B',
7: 'B',
8: 'B',
9: 'B',
10: 'C',
11: 'C',
12: 'C',
13: 'C',
14: 'C'},
'Balance': {0: 10,
1: 20,
2: 30,
3: 40,
4: 50,
5: 60,
6: 70,
7: 80,
8: 90,
9: 100,
10: 110,
11: 120,
12: 130,
13: 140,
14: 150},
'Price': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 9,
9: 10,
10: 11,
11: 12,
12: 13,
13: 14,
14: 15},
'Value': {0: 10,
1: 40,
2: 90,
3: 160,
4: 250,
5: 360,
6: 490,
7: 640,
8: 810,
9: 1000,
10: 1210,
11: 1440,
12: 1690,
13: 1960,
14: 2250}}
Try to calculate weighted average by passing dict into aggfunc:
df = pd.DataFrame(price_dict)
df.pivot_table(
index='Product',
aggfunc = {
'Balance': sum,
'Price': np.mean,
'Value': sum
}
)
Output:
Balance Price Value
Product
A 150 3 550
B 400 8 3300
C 650 13 8550
The expected outcome should be :
Balance Price Value
Product
A 150 3.66 550
B 400 8.25 3300
C 650 13.15 8550
Here is one way using apply
df.groupby('Product').apply(lambda x : pd.Series(
{'Balance': x['Balance'].sum(),
'Price': np.average(x['Price'], weights=x['Balance']),
'Value': x['Value'].sum()}))
Out[57]:
Balance Price Value
Product
A 150.0 3.666667 550.0
B 400.0 8.250000 3300.0
C 650.0 13.153846 8550.0