Does Simpy support optimized dynamic resource distributon across multiple nodes? - dynamic

I have 2 nodes 0 and 1 and in total there are 12 resources which will server in the nodes 0 and 1. Is there a method in Simpy to schedule the 12 resources across nodes 0 and 1 so that the average total processing time of an item through node 0 followed by node 1 is minimized. From time to time resources can move from one node to another for serving. Attached is the code where I have come up with a static distribution of 5 resources in node 0 and 7 resources in node 1. How to make it dynamic with time ?
import numpy as np
import simpy
def interarrival():
return(np.random.exponential(20))
def servicetime():
return(np.random.exponential(60))
def servicing(env, servers_1):
i = 0
while(True):
i = i+1
yield env.timeout(interarrival())
print("Customer "+str(i)+ " arrived in the process at "+str(env.now))
state = 0
env.process(items(env, i, servers_array, state))
def items(env, customer_id, servers_array, state):
with servers_array[state].request() as request:
yield request
t_arrival = env.now
print("Customer "+str(customer_id)+ " arrived in "+str(state)+ " at "+str(t_arrival))
yield env.timeout(servicetime())
t_depart = env.now
print("Customer "+str(customer_id)+ " departed from "+str(state)+ " at "+str(t_depart))
if (state == 1):
print("Customer exits")
else:
state = 1
env.process(items(env, customer_id, servers_array, state))
env = simpy.Environment()
servers_array = []
servers_array.append(simpy.Resource(env, capacity = 5))
servers_array.append(simpy.Resource(env, capacity = 7))
env.process(servicing(env, servers_array))
env.run(until=2880)

If you use the resources, start each node with a capacity of 12 and use the delay from your last question to delay some of the resources from each node so the total active resources is the total you want. Otherwise you may want to start looking at containers and stores which will allow you to move a resource from one node to another.

Related

How to fix "Solution Not Found" Error in Gekko Optimization with rolling principle

My program is optimizing the charging and decharging of a home battery to minimize the cost of electricity at the end of the year. In this case there also is a PV, which means that sometimes you're injecting electricity into the grid and receive money. The net offtake is the result of the usage of the home and the PV installation. So these are the possible situations:
Net offtake > 0 => usage home > PV => discharge from battery or take from grid
Net offtake < 0 => usage home < PV => charge battery or injection into grid
The electricity usage of homes is measured each 15 minutes, so I have 96 measurement point in 1 day. I want to optimilize the charging and decharging of the battery for 2 days, so that day 1 takes the usage of day 2 into account.
I wrote a controller that reads the data and gives each time the input values for 2 days to the optimization. With a rolling principle, it goes to the next 2 days and so on. Below you can see the code from my controller.
from gekko import GEKKO
from simulationModel2_2d_1 import getSimulation2
from exportModel2 import exportToExcelModel2
import numpy as np
#import matplotlib.pyplot as plt
import pandas as pd
import time
import math
# ------------------------ Import and read input data ------------------------
file = r'Data Sim 2.xlsx'
data = pd.read_excel(file, sheet_name='Input', na_values='NaN')
dataRead = pd.DataFrame(data, columns= ['Timestep','Verbruik woning (kWh)','Netto afname (kWh)','Prijs afname (€/kWh)',
'Prijs injectie (€/kWh)','Capaciteit batterij (kW)',
'Capaciteit batterij (kWh)','Rendement (%)',
'Verbruikersprofiel','Capaciteit PV (kWp)','Aantal dagen'])
timestep = dataRead['Timestep'].to_numpy()
usage_home = dataRead['Verbruik woning (kWh)'].to_numpy()
net_offtake = dataRead['Netto afname (kWh)'].to_numpy()
price_offtake = dataRead['Prijs afname (€/kWh)'].to_numpy()
price_injection = dataRead['Prijs injectie (€/kWh)'].to_numpy()
cap_batt_kW = dataRead['Capaciteit batterij (kW)'].iloc[0]
cap_batt_kWh = dataRead['Capaciteit batterij (kWh)'].iloc[0]
efficiency = dataRead['Rendement (%)'].iloc[0]
usersprofile = dataRead['Verbruikersprofiel'].iloc[0]
days = dataRead['Aantal dagen'].iloc[0]
pv = dataRead['Capaciteit PV (kWp)'].iloc[0]
# ------------- Optimization model & Rolling principle (2 days) --------------
# Initialise model
m = GEKKO()
# Output data
ts = []
charging = [] # Amount to charge/decharge batterij
e_batt = [] # Amount of energy in the battery
usage_net = [] # Usage after home, battery and pv
p_paid = [] # Price paid for energy of 15min
# Energy in battery to pass
energy = 0
# Iterate each day for one year
for d in range(int(days)-1):
d1_timestep = []
d1_net_offtake = []
d1_price_offtake = []
d1_price_injection = []
d2_timestep = []
d2_net_offtake = []
d2_price_offtake = []
d2_price_injection = []
# Iterate timesteps
for i in range(96):
d1_timestep.append(timestep[d*96+i])
d2_timestep.append(timestep[d*96+i+96])
d1_net_offtake.append(net_offtake[d*96+i])
d2_net_offtake.append(net_offtake[d*96+i+96])
d1_price_offtake.append(price_offtake[d*96+i])
d2_price_offtake.append(price_offtake[d*96+i+96])
d1_price_injection.append(price_injection[d*96+i])
d2_price_injection.append(price_injection[d*96+i+96])
# Input data simulation of 2 days
ts_temp = np.concatenate((d1_timestep, d2_timestep))
net_offtake_temp = np.concatenate((d1_net_offtake, d2_net_offtake))
price_offtake_temp = np.concatenate((d1_price_offtake, d2_price_offtake))
price_injection_temp = np.concatenate((d1_price_injection, d2_price_injection))
if(d == 7):
print(ts_temp)
print(energy)
# Simulatie uitvoeren
charging_temp, e_batt_temp, usage_net_temp, p_paid_temp, energy_temp = getSimulation2(ts_temp, net_offtake_temp, price_offtake_temp, price_injection_temp, cap_batt_kW, cap_batt_kWh, efficiency, energy, pv)
# Take over output first day, unless last 2 days
energy = energy_temp
if(d == (days-2)):
for t in range(1,len(ts_temp)):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
elif(d == 0):
for t in range(int(len(ts_temp)/2)+1):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
else:
for t in range(1,int(len(ts_temp)/2)+1):
ts.append(ts_temp[t])
charging.append(charging_temp[t])
e_batt.append(e_batt_temp[t])
usage_net.append(usage_net_temp[t])
p_paid.append(p_paid_temp[t])
print('Simulation day '+str(d+1)+' complete.')
# ------------------------ Export output data to Excel -----------------------
a = exportToExcelModel2(ts, usage_home, net_offtake, price_offtake, price_injection, charging, e_batt, usage_net, p_paid, cap_batt_kW, cap_batt_kWh, efficiency, usersprofile, pv)
print(a)
The optimization with Gekko happens in the following code:
from gekko import GEKKO
def getSimulation2(timestep, net_offtake, price_offtake, price_injection,
cap_batt_kW, cap_batt_kWh, efficiency, start_energy, pv):
# ---------------------------- Optimization model ----------------------------
# Initialise model
m = GEKKO(remote = False)
# Global options
m.options.SOLVER = 1
m.options.IMODE = 6
# Constants
speed_charging = cap_batt_kW/4
m.time = timestep
max_cap_batt = m.Const(value = cap_batt_kWh)
min_cap_batt = m.Const(value = 0)
max_charge = m.Const(value = speed_charging) # max battery can charge in 15min
max_decharge = m.Const(value = -speed_charging) # max battery can decharge in 15min
# Parameters
usage_home = m.Param(net_offtake)
price_offtake = m.Param(price_offtake)
price_injection = m.Param(price_injection)
# Variables
e_batt = m.Var(value=start_energy, lb = min_cap_batt, ub = max_cap_batt) # energy in battery
price_paid = m.Var() # price paid each 15min
charging = m.Var(lb = max_decharge, ub = max_charge) # amount charge/decharge each 15min
usage_net = m.Var(lb=min_cap_batt)
# Equations
m.Equation(e_batt==(m.integral(charging)+start_energy)*efficiency)
m.Equation(-charging <= e_batt)
m.Equation(usage_net==usage_home + charging)
price = m.Intermediate(m.if2(usage_net*1e6, price_injection, price_offtake))
price_paid = m.Intermediate(usage_net * price / 100)
# Objective
m.Minimize(price_paid)
# Solve problem
m.options.COLDSTART=2
m.solve()
m.options.TIME_SHIFT=0
m.options.COLDSTART=0
m.solve()
# Energy to pass
energy_left = e_batt[95]
#m.cleanup()
return charging, e_batt, usage_net, price_paid, energy_left
The data you need for input can be found in this Excel document:
https://docs.google.com/spreadsheets/d/1S40Ut9-eN_PrftPCNPoWl8WDDQtu54f0/edit?usp=sharing&ouid=104786612700360067470&rtpof=true&sd=true
With this code, it always ends at day 17 with the "Solution Not Found" Error.
I already tried extending the default iteration limit to 500 but it didn't work.
I also tried with other solvers but also no improvement.
By presolving with COLDSTART it already reached day 17, without this it ends at day 8.
I solved the days were my optimization ends individually and then the solution was always found immediately with the same code.
Someone who can explain this to me and maybe find a solution? Thanks in advance!
This is kind of big to troubleshoot, but here are some general ideas that might help. This assumes, as you said, that the model solves fine for day 1-2, and day 3-4, and day 5-6, etc. And that those results pass inspection (aka the basic model is working as you say).
Then something is (obviously) amiss around day 17. Some things to look at and try:
Start the model at day 16-17, see if it works in isolation
gather your results as you go and do a time series plot of the key variables, maybe one of them is on an obvious bad trend towards a limit causing an infeasibility... Perhaps the e_batt variable is slowly declining because not enough PV energy is available and hits minimum on Day 17
Radically change the upper/lower bounds on your variables to test them to see if they might be involved in the infeasibility (assuming there is one)
Make a different (fake) dataset where the solution is assured and kind of obvious... all constants or a pattern that you know will produce some known result and test the model outputs against it.
It might also be useful to pursue the tips in this excellent post on troubleshooting gekko, and edit your question with the results of that effort.
edit: couple ideas from your comment...
You didn't say what the result was from troubleshooting and whether the error is infeasible or max iterations, or ???. But...
If the model seems to crunch after about 15 days, I'm betting that it is becoming infeasible. Did you plot the battery level over course of days?
Also, I'm suspicious of your equation for the e_batt level... Why are you multiplying the prior battery state by the efficiency factor? That seems incorrect. That is charge that is already in the battery. Odds are you are (incorrectly) hitting the battery charge level every day with the efficiency tax and that the max charge level isn't sufficient to keep up with demand.
In addition to tips above try:
fix your efficiency formula to not multiply the efficiency times the previous state
change the efficiency to 100%
make the upper limit on charge huge
As an aside: I don't really see the connection to PV energy available here. What you are basically modeling is some "mystery battery" that you can charge and discharge anytime you want. I would get this debugged first and then look at the energy available by time of day...you aren't going to be generating charge at midnight. :).

Unable to load large pandas dataframe to pyspark

I've been trying to join two large pandas dataframes using pyspark using the following code. I'm trying to vary executor cores allocated for the application and measure scalability of pyspark (strong scaling).
r = 1000000000 # 1Bn rows
it = 10
w = 256
unique = 0.9
TOTAL_MEM = 240
TOTAL_NODES = 14
max_val = r * unique
rng = default_rng()
frame_data = rng.integers(0, max_val, size=(r, 2))
frame_data1 = rng.integers(0, max_val, size=(r, 2))
print(f"data generated", flush=True)
df_l = pd.DataFrame(frame_data).add_prefix("col")
df_r = pd.DataFrame(frame_data1).add_prefix("col")
print(f"data loaded", flush=True)
procs = int(math.ceil(w / TOTAL_NODES))
mem = int(TOTAL_MEM*0.9)
print(f"world sz {w} procs per worker {procs} mem {mem} iter {it}", flush=True)
spark = SparkSession\
.builder\
.appName(f'join {r} {w}')\
.master('spark://node:7077')\
.config('spark.executor.memory', f'{int(mem*0.6)}g')\
.config('spark.executor.pyspark.memory', f'{int(mem*0.4)}g')\
.config('spark.cores.max', w)\
.config('spark.driver.memory', '100g')\
.config('sspark.sql.execution.arrow.pyspark.enabled', 'true')\
.getOrCreate()
sdf0 = spark.createDataFrame(df_l).repartition(w).cache()
sdf1 = spark.createDataFrame(df_r).repartition(w).cache()
print(f"data loaded to spark", flush=True)
try:
for i in range(it):
t1 = time.time()
out = sdf0.join(sdf1, on='col0', how='inner')
count = out.count()
t2 = time.time()
print(f"timings {r} {w} {i} {(t2 - t1) * 1000:.0f} ms, {count}", flush=True)
del out
del count
gc.collect()
finally:
spark.stop()
Cluster:
I am using standalone spark cluster in a 15 node cluster with 48 cores and 240GB RAM each. I've spawned master and the driver code in node1, while other 14 nodes have spawned workers allocating maximum memory.
In the spark context, I am reserving 90% of total memory to executor, splitting 60% to jvm and 40% to pyspark.
Issue:
When I run the above program, I can see that the executors are being assigned to the app. But it doesn't move forward, even after 60 mins. For smaller row count (10M), this was working without a problem.
Driver output
world sz 256 procs per worker 19 mem 216 iter 8
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/08/26 14:52:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/N/u/d/dnperera/.conda/envs/cylonflow/lib/python3.8/site-packages/pyspark/sql/pandas/conversion.py:425: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed by the reason below:
Negative initial size: -589934400
Attempting non-optimization as 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true.
warn(msg)
Any help on this is much appreciated.

Understanding Pandas Series Data Structure

I am trying to get my head around the Pandas module and started learning about the Series data structure.
I have created the following Series in Spyder :-
songs = pd.Series(data = [145,142,38,13], name = "Count")
I can obtain information about the Series index using the code:-
songs.index
The output of the above code is as follows:-
My question is where it states Start = 0 and Stop = 4, what are these referring to?
I have interpreted start = 0 as the first element in the Series is in row 0.
But i am not sure what Stop value refers to as there are no elements in row 4 of the Series?
Can some one explain?
Thank you.
This concept as already explained adequately in the comments (indexing is at minus one the count of items) is prevalent in many places.
For instance, take the list data structure-
z = songs.to_list()
[145, 142, 38, 13]
len(z)
4 # length is four
# however indexing stops at i-1 position 'i' being the length/count of items in the list.
z[4] # this will raise an IndexError
# you will have to start at index 0 going till only index 3 (i.e. 4 items)
z[0], z[1], z[2], z[-1] # notice how -1 can be used to directly access the last element

Pyomo scheduling

I am trying to formulate a flowshop scheduling problem in Pyomo. This is an Abstract model
Problem description
There are 3 jobs (chest, door and chair) and 3 machines (cutting, welding, packing in that order). Objective is to minimise the makespan. The python code and the data are as follows.
## flowshop.py ##
from pyomo.environ import *
flowshop = AbstractModel()
flowshop.jobs = Set()
flowshop.machines = Set()
flowshop.machinesN = Param()
flowshop.jobsN = Param()
flowshop.proc_T = Param(flowshop.jobs,
flowshop.machines,
within=NonNegativeReals)
flowshop.start_T = Var(flowshop.jobs,
flowshop.machines,
within=NonNegativeReals)
flowshop.makespan = Var(within=NonNegativeReals)
def makespan_rule(flowshop,i,j):
return flowshop.makespan >= flowshop.start_T[i,j]+flowshop.proc_T[i,j]
flowshop.makespan_cons = Constraint(flowshop.jobs,
flowshop.machines,
rule=makespan_rule)
def objective_rule(flowshop):
return flowshop.makespan
flowshop.objc = Objective(rule=objective_rule,sense=minimize)
## data.dat ##
set jobs := chest door chair ;
set machines := cutting welding packing ;
param: machinesN := 3 ;
param: jobsN := 3 ;
param proc_T:
cutting welding packing :=
chest 10 40 45
door 30 20 25
chair 05 30 15
;
I havent added all the constraints yet, I plan to add them after this issue gets fixed. In the code (flowhop.py) above, for the makespan_rule, I want the makespan to be more that the completion time of only the last machine.
Currently, it is set to be more than completion times of all the machines.
For that, I believe, I have to get the last index of the machines set.
For that, I tried flowshop.machines[-1], but it gives an error saying:
Cannot index unordered set machines
How do I solve this issue?
Thanks for the help.
PS - I am also struggling to model the binary variables used to define the precedence of a job. If you have any ideas regarding that, that would also be helpful.
As the error says Cannot index unordered sets, the set flowshop.machines is not ordered. One needs to provide ordered=True argument in the while declaring the set -
flowshop.machines = Set(ordered=True)
After this, one can access any element by normal indexing - flowshop.machines[i]
For the binary variables, one can declare them as -
c = flowshop.jobsN*(flowshop.jobsN-1)/2
flowshop.prec = Var(RangeSet(1,c),within=Binary)
Then, this variable can be used to decide the precedence between 2 jobs and to formulate the assignment constraints. The precedence variable corresponding to a pair of jobs can be found out using the indices of the jobs (for which the flowshop.jobs has to be an ordered set - flowshop.jobs = Set(ordered=True))

Google Spreadsheet Python API read specific column

I have 2 questions regarding google spreadsheet's api using python. My google spreadsheet is as follows:
a b1 23 4 5 6
When I run the script below I only get
root#darkbox:~/google_py# python test.py
1
2
3
4
I only want to get column 1 so i want to see
1
3
5
my second issue here is considering there is a space between the rows my script is not getting the second part (it should be 5 in this case)
How can I get the specified column and ignore white spaces?
#!/usr/bin/env python
import gdata.docs
import gdata.docs.service
import gdata.spreadsheet.service
import re, os
email = 'xxxx#gmail.com'
password = 'passw0rd'
spreadsheet_key = '14cT5KKKWzup1jK0vc-TyZt6BBwSIyazZz0sA_x0M1Bg' # key param
worksheet_id = 'od6' # default
#doc_name = 'python_test'
def main():
client = gdata.spreadsheet.service.SpreadsheetsService()
client.debug = False
client.email = email
client.password = password
client.source = 'test client'
client.ProgrammaticLogin()
q = gdata.spreadsheet.service.DocumentQuery()
feed = client.GetSpreadsheetsFeed(query=q)
feed = client.GetWorksheetsFeed(spreadsheet_key)
rows = client.GetListFeed(spreadsheet_key,worksheet_id).entry
for row in rows:
for key in row.custom:
print "%s" % (row.custom[key].text)
return
if __name__ == '__main__':
main()
To ignore white spaces:
Suggest you switch to CellFeed - I think list feed stops reading when it hits whitespace. Sorry, I forget the fine details. But I dropped List feed and switched to cellfeed a long time ago.