I am new to SimPy and have a problem combining batching jobs with interruptible set-up times, so could you please help me?
I would like to model a system of servers that need a set-up time before they are ready to serve.
A server starts setting up whenever there are at least M (2, 3, ...) customers in the queue. If the number of customers in the system has reached the maximum K (50), an arriving customer balks.
When a batch (a group of M customers) leaves the system, we check whether there are M more customers (a full batch) waiting to be served. If so, we keep the server ON; otherwise, we turn the server off immediately.
I found code for a very similar problem in the SimPy Google group, a COVID test simulation that uses Store resources, together with Michael R. Gibbs's answer on interrupting set-up time with Container resources:
https://groups.google.com/g/python-simpy/c/iFYaDlL4fq0
Interrupt an earlier timeout event in Simpy
I tried to combine the two pieces of code, but it didn't work.
For example, when M = 2 and K = 50:
Customer 1 arrives and waits.
Customer 2 arrives; there are now 2 customers, so a server is requested.
Server 1 starts its set-up, which takes t1 seconds.
Customer 3 arrives and waits.
Customer 4 arrives; there are again 2 customers, so another server is requested.
Server 2 starts its set-up, which takes t1 seconds.
Server 1 is ON.
Customers 1 and 2 occupy server 1.
Customers 1 and 2 complete service and leave the system.
Customers 3 and 4 occupy server 1 (because server 1 finishes while server 2 is still setting up).
Server 2 (still in SETUP mode) is turned off...
... Customer 100 arrives, sees 50 customers in the system, and balks.
I broke customer arrivals into two parts: a first queue where a customer waits until there are enough customers to make a batch. When I have enough customers to make a batch, I do so, popping the batched customers from the batching queue and putting the batch into a processing queue. I count the customers in both queues to decide whether an arriving customer aborts entry.
When a batch is put into the processing queue, I also start up a server. This means the number of batches in the processing queue equals the number of servers starting up, and that when a server finishes starting up, there will be a batch for it to process. Since a server never has to wait for a batch, a simple list is enough for the queue.
When a server starts up, it grabs a batch and removes itself from the list of starting servers. After the server finishes processing a batch, it checks whether there is another batch in the processing queue. If so, it grabs that batch and keeps processing, but also kills the server that was starting up to handle it. If there are no batches in the processing queue, it shuts down.
Here is the code. In the log you should see the queues max out and customers abort, and also see servers start to shut down towards the end.
"""
Simulation of servers processing batches
Customers enter a queue where they wait for
enough customers to make a batch
If the there are too many customers in the queues
the arriving customer will abort
When a batch is made, it is put into a second
processing queue where the batch waits to be processed.
When a batch is put into the processing queue, it
starts a server. The server has a start up delay
then loops by seizing a batch, process the batch, release
the batch, checking if another batch is in the
processing queue. If there is another batch, stop a server
that is starting up and process the batch, else end loop
and shutdown server
Programmer: Michael R. Gibbs
"""
import simpy
import random
max_q_size = 50
batch_size = 2
server_start_time = 55
processing_time = lambda : random.triangular(5,20,10)
arrival_gap = lambda : random.triangular(1,1,1)
# there is no wating so normal lists are good enough
batching_q = list()
processing_q = list()
server_q = list() # servers that are still starting up
class Server():
"""
Server that process batches
Has two states: starting up, and batch processing
"""
def __init__(self, id, env, processing_q, server_q):
self.id = id
self.env = env
self.processing_q = processing_q
self.server_q = server_q
self.start_process = self.env.process(self.start_up())
def start_up(self):
"""
starts up the server, then start processing batches
start up can be interrupted, stoping the server
"""
# start up
try:
print(f'{self.env.now} server {self.id} starting up')
yield self.env.timeout(server_start_time)
print(f'{self.env.now} server {self.id} started')
self.env.process(self.process())
except simpy.Interrupt:
print(f'{env.now} server {self.id} has been interupted')
def process(self):
"""
process batches
keeps going as long as there are batches in queue
If starts second batch, also interupts starting up server
"""
while True:
print(f'{self.env.now} server {self.id} starting batch process')
b = processing_q.pop(0)
yield self.env.timeout(processing_time())
print(f'{self.env.now} server {self.id} finish batch process')
if len(self.processing_q) > 0:
# more processes to do,
# steal batch from starting up server
s = self.server_q.pop() # lifo
s.stop()
else:
print(f'{env.now} server {self.id} no more batches, shutting down')
break
def stop(self):
"""
Interrupts server start up, stoping server
"""
try:
self.start_process.interrupt()
except:
pass
def gen_arrivals(env, batching_q, processing_q, server_q):
"""
Generate arring customers
If queues are too big customer will abort
If have enough customers, create a batch and start a server
"""
id = 1
while True:
yield env.timeout(arrival_gap())
q_size = len(batching_q) + (batch_size * len(processing_q))
if q_size >= max_q_size:
print(f'{env.now} customer arrived and aborted, q len: {q_size}')
else:
print(f'{env.now} customer has arrived, q len: {q_size}')
customer = object()
batching_q.append(customer)
# check if a batch can be creatd
while len(batching_q) >= batch_size:
batch = list()
while len(batch) < batch_size:
batch.append(batching_q.pop(0))
# put batch in processing q
processing_q.append(batch)
# start server
server = Server(id, env, processing_q, server_q)
id += 1
server_q.append(server)
# boot up sim
env = simpy.Environment()
env.process(gen_arrivals(env, batching_q, processing_q, server_q))
env.run(100)
When I add a condition to limit the number of servers, it works until a server is interrupted or shut down. After that, those servers seem to disappear and are no longer active.
Sorry for asking so much. Here is my code:
import simpy
import random
import numpy as np


class param:
    def __init__(self, x):
        #self.FILE = 'Setup_time.csv'
        self.MEAN_INTERARRIVAL = x   # arrival_gap
        self.MEAN_SERVICE_TIME = 2   # processing_time
        self.MEAN_SWITCH_TIME = 3    # server_start_time
        self.NUM_OF_SERVER = 4       # maximum number of servers
        self.MAX_SYS_SIZE = 10       # maximum number of customers in the system
        self.BATCH_SIZE = 2
        self.RANDOM_SEED = 0


# there is no waiting, so plain lists are good enough
class Server():
    """
    Server that processes batches.
    Has two states: starting up, and batch processing.
    """

    def __init__(self, id, env, processing_q, server_q, param):
        self.id = id
        self.env = env
        self.processing_q = processing_q
        self.server_q = server_q
        self.start_process = self.env.process(self.start_up(param))

    def start_up(self, param):
        """
        Starts up the server, then starts processing batches.
        Start-up can be interrupted, stopping the server.
        """
        global num_servers
        # start up
        if self.id <= param.NUM_OF_SERVER:  # I added this condition to limit the number of servers
            try:
                num_servers += 1
                print(f'{self.env.now} server {self.id} starting up')
                yield self.env.timeout(param.MEAN_SWITCH_TIME)
                #yield env.timeout(np.random.exponential(1/param.MEAN_SWITCH_TIME))
                print(f'{self.env.now} server {self.id} started')
                self.env.process(self.process(param))
            except simpy.Interrupt:
                print(f'{self.env.now} server {self.id} has been interrupted-------------------')

    def process(self, param):
        """
        Processes batches.
        Keeps going as long as there are batches in the queue.
        When it starts another batch, it also interrupts a starting-up server.
        """
        global num_servers, num_active_server
        while True:
            num_active_server += 1
            b = self.processing_q.pop(0)
            print(f'{self.env.now} server {self.id} starting batch process')
            yield self.env.timeout(param.MEAN_SERVICE_TIME)
            #yield env.timeout(np.random.exponential(1/param.MEAN_SERVICE_TIME))
            num_servers -= 1
            num_active_server -= 1
            print(f'{self.env.now} server {self.id} finish batch process')

            if len(self.processing_q) > 0:
                # more batches to do,
                # steal a batch from a starting-up server
                #if self.server_q:
                #    s = self.server_q.pop(0)  # Do these lines work for a FIFO rule?
                #    s.stop()
                s = self.server_q.pop()  # lifo
                s.stop()
            else:
                print(f'{self.env.now} server {self.id} no more batches, shutting down')
                break

    def stop(self):
        """
        Interrupts server start-up, stopping the server.
        """
        try:
            self.start_process.interrupt()
        except:
            pass


def gen_arrivals(env, batching_q, processing_q, server_q, param):
    """
    Generates arriving customers.
    If the queues are too big, the customer aborts.
    If there are enough customers, creates a batch and starts a server.
    """
    global num_servers, num_balk, num_cumulative_customer, num_active_server
    id = 1
    while True:
        yield env.timeout(param.MEAN_INTERARRIVAL)
        #yield env.timeout(np.random.exponential(1/param.MEAN_INTERARRIVAL))
        num_cumulative_customer += 1
        customer = object()
        batching_q.append(customer)

        q_size = len(batching_q) + (param.BATCH_SIZE * len(processing_q))
        sys_size = q_size + (num_active_server * param.BATCH_SIZE)

        #if q_size > max_q_size:
        if sys_size > param.MAX_SYS_SIZE:  # I check the limit on the number of customers in the system instead of in the queue
            num_balk += 1
            batching_q.pop(-1)  # I added this statement
            print(f'{env.now} customer arrived and aborted, sys len: {sys_size}')
        else:
            #customer = object()  # I moved these 2 lines above so the system size is updated before the if statement
            #batching_q.append(customer)
            print(f'{env.now} customer has arrived, q len: {q_size}, sys len: {sys_size}')

            # check if a batch can be created
            while len(batching_q) >= param.BATCH_SIZE:
                batch = list()
                while len(batch) < param.BATCH_SIZE:
                    batch.append(batching_q.pop(0))

                # put batch in processing q
                processing_q.append(batch)

                # start server
                server = Server(id, env, processing_q, server_q, param)
                id += 1
                server_q.append(server)

        # calculate balking probability
        prob_balk = num_balk / num_cumulative_customer
        #print(f'{env.now} prob_balk {prob_balk}')
        list_prob_balk.append(prob_balk)


# boot up sim
trial = 0
Pb = []  # balking probability
global customer_balk_number
for x in range(1, 3):
    trial += 1
    print('trial:', trial)
    batching_q = list()
    processing_q = list()
    server_q = list()             # servers that are still starting up
    num_servers = 0               # number of servers in the system (both starting and serving)
    num_active_server = 0         # number of servers serving customers
    num_balk = 0                  # number of balking customers
    num_cumulative_customer = 0   # total arriving customers
    list_prob_balk = []           # list of balking probabilities for each trial
    paramtest1 = param(x)
    random.seed(paramtest1.RANDOM_SEED)

    # create and start the model
    env = simpy.Environment()
    env.process(gen_arrivals(env, batching_q, processing_q, server_q, paramtest1))
    env.run(30)
    Pb.append(list_prob_balk[-1])

#print('List of balk prob', Pb)
I would like to write a daemon in Python that wakes up periodically to process some data queued up in a RabbitMQ queue.
When the daemon wakes up, it should consume all messages in the queue (or min(len(queue), N), where N is some arbitrary number) because it's better for the data to be processed in batches. Is there a way of doing this in pika, as opposed to passing in a callback that gets called on every message arrival?
Thanks.
Here is the code, written using pika. A similar function can be written using basic.get.
The code below uses channel.consume to start consuming messages and breaks out/stops once the desired number of messages has been fetched.
I have set a batch_size to avoid pulling a huge number of messages at once. You can always change batch_size to fit your needs.
from pika import BasicProperties, URLParameters
from pika.adapters.blocking_connection import BlockingChannel, BlockingConnection
from pika.exceptions import ChannelWrongStateError, StreamLostError, AMQPConnectionError
from pika.exchange_type import ExchangeType
import json
import logging

logger = logging.getLogger(__name__)

# connection/channel set-up: assumes a local broker with default credentials,
# adjust the URL for your environment
connection = BlockingConnection(URLParameters('amqp://guest:guest@localhost:5672/%2F'))
channel = connection.channel()


def consume_messages(queue_name: str):
    msgs = list([])
    batch_size = 500
    q = channel.queue_declare(queue_name, durable=True, exclusive=False, auto_delete=False)
    q_length = q.method.message_count

    if not q_length:
        return msgs

    msgs_limit = batch_size if q_length > batch_size else q_length

    try:
        # Get messages and break out
        for method_frame, properties, body in channel.consume(queue_name):
            # Append the message
            try:
                msgs.append(json.loads(bytes.decode(body)))
            except ValueError:
                logger.info(f"Rabbit Consumer : Received message in wrong format {str(body)}")

            # Acknowledge the message
            channel.basic_ack(method_frame.delivery_tag)

            # Escape out of the loop when the desired number of msgs is fetched
            if method_frame.delivery_tag == msgs_limit:
                # Cancel the consumer and return any pending messages
                requeued_messages = channel.cancel()
                print('Requeued %i messages' % requeued_messages)
                break
    except (ChannelWrongStateError, StreamLostError, AMQPConnectionError) as e:
        logger.info(f'Connection Interrupted: {str(e)}')
    finally:
        # Close the channel and the connection
        channel.stop_consuming()
        channel.close()

    return msgs
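For completeness, a usage sketch: a periodic job can simply call the function and work through whatever came back (the queue name here is illustrative).

# drain up to one batch from a queue named "jobs" (the name is just an example)
messages = consume_messages("jobs")
print(f"Pulled {len(messages)} messages on this wake-up")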
Alternatively, you can use the basic.get API, which pulls messages from the broker on demand, instead of subscribing and having messages pushed to you.
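A minimal sketch of that pull-based approach with pika's BlockingConnection (the broker URL, queue name, and JSON payload format are assumptions; adjust them for your setup):

from pika import BlockingConnection, URLParameters
import json

def pull_batch(queue_name: str, max_messages: int = 500):
    """Pull up to max_messages with basic.get, acknowledging each one."""
    # placeholder URL: a local broker with default credentials
    connection = BlockingConnection(URLParameters('amqp://guest:guest@localhost:5672/%2F'))
    channel = connection.channel()
    msgs = []
    try:
        for _ in range(max_messages):
            method_frame, properties, body = channel.basic_get(queue_name)
            if method_frame is None:
                break  # queue is empty, stop early
            msgs.append(json.loads(body))
            channel.basic_ack(method_frame.delivery_tag)
    finally:
        connection.close()
    return msgs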
I have developed a machine-learning Python script (let's call it classify_obj, written with Python 3.6) that imports TensorFlow. It was developed initially for bulk analysis, but now I need to run it repeatedly on smaller datasets to cater for more real-time usage. I am doing this on Linux RHEL 7.
Process flow:
1. Master tool (written in Java) calls classify_obj with the object input to categorize.
2. classify_obj generates the classification result as a CSV (takes about 7-10 s).
3. Master tool reads the result from #2.
4. Master tool proceeds to do other logic.
5. Repeat #1 with the next object input.
To break down the time taken, I switched off the main logic and just did the module imports without performing any other action. I found that the imports take about 4-5 s of the 7-10 s run time on the small dataset, and the classification itself takes about 2 s. I am also looking at ways to reduce the run time in other areas, but the bulk of it seems to come from the imports.
Import time: 4-6 s
Classify time: 1 s
Read, write and other logic time: 0.2 s
What options are there to reduce the import time?
One idea I had was to turn classify_obj into a "stay alive" process/service, which the master tool would stop after completing all its activity. The intent (not sure if this would be the case) is that all the required libraries are loaded once when the process starts, so that each call from the master tool only incurs the classification time instead of importing the libraries repeatedly.
What do you think about this? Also, how can I set this up on Linux RHEL 7.4? Some reference links would be greatly appreciated.
Other suggestions would also be welcome.
Thanks and have a great day!
This is the solution I designed to achieve the above.
Reference: https://realpython.com/python-sockets/
I had to create two scripts:
1. Client Python script: passes the raw data to be classified to the server script using socket programming.
2. Server Python script: loads the keras (TensorFlow) library and the model at launch, and stays alive until it receives a 'stop' request from the client (which exits the while loop). When the client sends data, the server processes it and returns an ok/not-ok result back to the client.
In the end, the classification time is reduced to 0.1-0.3 s.
Client Script
import socket
import argparse
from argparse import ArgumentParser

def main():
    parser = ArgumentParser(description='XXXXX')
    parser.add_argument('-i', '--input', default='NA', help='Input txt file path')
    parser.add_argument('-o', '--output', default='NA', help='Output csv path with class')
    parser.add_argument('-stop', '--stop', default='no', help='Stop the server script')
    args = parser.parse_args()
    str = args.input + ',' + args.output + ',' + args.stop

    HOST = '127.0.0.1'  # The server's hostname or IP address
    PORT = 65432        # The port used by the server
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((HOST, PORT))
    bytedata = str.encode()
    sock.send(bytedata)
    data = sock.recv(1024)
    print('Received', data)

if __name__ == "__main__":
    main()
Server Script
import socket

def main():
    HOST = '127.0.0.1'  # Standard loopback interface address (localhost)
    PORT = 65432        # Port to listen on (non-privileged ports are > 1023)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((HOST, PORT))
    sock.listen(5)
    stop_process = 'no'

    while (stop_process == 'no'):
        # print('Waiting for connection')
        conn, addr = sock.accept()
        data = ''
        try:
            # print('Connected by', addr)
            while True:
                data = conn.recv(1024)
                if data:
                    # process_input processes the incoming data; if the client sends 'yes'
                    # for the stop argument, it returns 'yes' and the outer loop exits
                    stop_process = process_input(data)
                    byte_reply = stop_process.encode()
                    conn.sendall(byte_reply)  # send reply back to client
                else:
                    break
            conn.close()
            # print('Closing connection', addr)
        finally:
            conn.close()

if __name__ == "__main__":
    main()
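Note that the server relies on a process_input helper that is not shown. A minimal sketch of what it could look like, assuming the client's 'input,output,stop' message format described above (the model/prediction helpers in the comments are hypothetical):

def process_input(data: bytes) -> str:
    # the client sends "input_path,output_path,stop_flag" as one comma-separated string
    input_path, output_path, stop_flag = data.decode().split(',')
    if stop_flag == 'yes':
        return 'yes'  # tell the server loop to exit
    # run the already-loaded model and write the result CSV, e.g.:
    #   predictions = model.predict(load_features(input_path))  # hypothetical helpers
    #   write_csv(output_path, predictions)
    return 'no'  # keep the server alive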
The excellent Redis documentation lists a "Reliable queue" pattern as a good candidate/example for the RPOPLPUSH command.
I understand a "reliable queue" to be something with delivery guarantees like Amazon SQS's FIFO exactly-once pattern.
Specifically, you have some N processes feeding into a queue, and some M workers working from the queue. What does this actually look like as an implementation?
I would venture something like the following.
Make a feeder process that populates the work queue:
# feeder1
import redis
import datetime
import time

r = redis.Redis(host='localhost', port=6379, db=0)

while True:
    now = datetime.datetime.now()
    value_to_work_on = "f1:{}".format(now.second)
    # LPUSH onto the head; workers RPOPLPUSH from the tail, so the list behaves FIFO
    r.lpush('workqueue', value_to_work_on)
    time.sleep(1)
Make another one:
# f2
import redis
import datetime
import time

r = redis.Redis(host='localhost', port=6379, db=0)

while True:
    now = datetime.datetime.now()
    value_to_work_on = "f2:{}".format(now.second)
    r.lpush('workqueue', value_to_work_on)
    time.sleep(1)
Now make the workers:
# worker1
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def do_work(x):
    print(x)
    return True

while True:
    # atomically move the item to 'donequeue' while handing it to this worker
    todo = r.rpoplpush("workqueue", "donequeue")
    if do_work(todo):
        print("success")
    else:
        # processing failed: put the item back on the work queue for a retry
        r.lpush("workqueue", todo)

# worker2 is exactly the same, just running elsewhere.
My questions are:
Is this generally what they mean in the documentation? If not, can you provide a fix as an answer?
This still seems incomplete and not really reliable. For example, should there be separate lists for error and completed queues? One for every possible error state? What happens if Redis goes down during processing?
As @rainhacker pointed out in the comments, it is now recommended to use Redis Streams for this instead of the recipe described in "Pattern: Reliable Queue".
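For illustration, a minimal sketch of the Streams-based equivalent using redis-py (the stream, group, and consumer names are made up, and do_work stands in for your own processing function):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def do_work(fields):
    print(fields)
    return True

# producer side: append work items to the stream
r.xadd('workstream', {'payload': 'f1:42'})

# one-time set-up: create a consumer group that tracks delivery state
try:
    r.xgroup_create('workstream', 'workers', id='0', mkstream=True)
except redis.ResponseError:
    pass  # group already exists

# worker side: read new entries for this consumer, ack only when done
entries = r.xreadgroup('workers', 'worker1', {'workstream': '>'}, count=10, block=5000)
for stream_name, messages in entries:
    for msg_id, fields in messages:
        if do_work(fields):
            r.xack('workstream', 'workers', msg_id)
# entries that were delivered but never acknowledged stay in the group's
# pending list and can be reclaimed (e.g. with XAUTOCLAIM) if a worker dies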
I'm trying to enqueue a basic job in Redis using python-rq, but it throws this error:
"ValueError: Functions from the main module cannot be processed by workers"
Here is my program:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())

from rq import Connection, Queue
from redis import Redis

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
Break the provided code into two files:
count_words.py:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())
and main.py (where you'll import the required function):
from rq import Connection, Queue
from redis import Redis
from count_words import count_words_at_url  # added import!

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
I always separate the tasks from the logic that runs those tasks into different files; it's just better organization. Also note that you can define a class of tasks and import/schedule tasks from that class instead of the (over-simplified) structure I suggest above, as sketched below. This should get you going.
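A rough sketch of that class-based layout (the file and class names are just examples; whether you can enqueue a static method directly depends on your RQ version, so a plain module-level function remains the safest option):

# tasks.py - example task module; names are illustrative
import requests

class CountTasks:
    @staticmethod
    def count_words_at_url(url):
        resp = requests.get(url)
        return len(resp.text.split())

You would then enqueue it from another module with q.enqueue(CountTasks.count_words_at_url, 'http://nvie.com').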
Also see here to confirm you're not the first to struggle with this example. RQ is great once you get the hang of it.
Currently there is a bug in RQ that leads to this error: you cannot pass a function to enqueue from the same file without explicitly importing it.
Just add from app import count_words_at_url (where app is the name of this file) above the enqueue call:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())

from rq import Connection, Queue
from redis import Redis

redis_conn = Redis()
q = Queue(connection=redis_conn)

from app import count_words_at_url
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print(job)
The other way is to have the functions in a separate file and import them.