Why is Python object id different after the Process starts but the pid remains the same? - python-multiprocessing

"""
import time
from multiprocessing import Process, freeze_support
class FileUploadManager(Process):
"""
WorkerObject which uploads files in background process
"""
def __init__(self):
"""
Worker class to upload files in a separate background process.
"""
super().__init__()
self.daemon = True
self.upload_size = 0
self.upload_queue = set()
self.pending_uploads = set()
self.completed_uploads = set()
self.status_info = {'STOPPED'}
print(f"Initial ID: {id(self)}")
def run(self):
try:
print("STARTING NEW PROCESS...\n")
if 'STARTED' in self.status_info:
print("Upload Manager - Already Running!")
return True
self.status_info.add('STARTED')
print(f"Active Process Info: {self.status_info}, ID: {id(self)}")
# Upload files
while True:
print("File Upload Queue Empty.")
time.sleep(10)
except Exception as e:
print(f"{repr(e)} - Cannot run upload process.")
if __name__ == '__main__':
upload_manager = FileUploadManager()
print(f"Object ID: {id(upload_manager)}")
upload_manager.start()
print(f"Process Info: {upload_manager.status_info}, ID After: {id(upload_manager)}")
while 'STARTED' not in upload_manager.status_info:
print(f"Not Started! Process Info: {upload_manager.status_info}")
time.sleep(7)
"""
OUTPUT
Initial ID: 2894698869712
Object ID: 2894698869712
Process Info: {'STOPPED'}, ID After: 2894698869712
Not Started! Process Info: {'STOPPED'}
STARTING NEW PROCESS...
Active Process Info: {'STARTED', 'STOPPED'}, ID: 2585771578512
File Upload Queue Empty.
Not Started! Process Info: {'STOPPED'}
File Upload Queue Empty.
Why does the Process object have the same id and attribute values before and after it has started, but a different id when the run method starts?
Initial ID: 2894698869712
Active Process Info: {'STARTED', 'STOPPED'}, ID: 2585771578512
Process Info: {'STOPPED'}, ID After: 2894698869712

I fixed your indentation, and I also removed everything from your script that was not actually being used. It is now a minimal, reproducible example that anyone can run. In the future, please adhere to the site guidelines, and please proofread your questions. It will save everybody's time and you will get better answers.
I would also like to point out that the question in your title is not at all the same as the question asked in your text. At no point do you retrieve the process ID, which is an operating system value. You are printing out the ID of the object, which is a value that has meaning only within the Python runtime environment.
import time
from multiprocessing import Process
# Removed freeze_support since it was unused


class FileUploadManager(Process):
    """
    WorkerObject which uploads files in background process
    """
    def __init__(self):
        """
        Worker class to upload files in a separate background process.
        """
        super().__init__(daemon=True)
        # The next line probably does not work as intended, so
        # I commented it out. The docs say that the daemon
        # flag must be set by a keyword-only argument
        # self.daemon = True
        # I removed a bunch of unused variables for this test program
        self.status_info = {'STOPPED'}
        print(f"Initial ID: {id(self)}")

    def run(self):
        try:
            print("STARTING NEW PROCESS...\n")
            if 'STARTED' in self.status_info:
                print("Upload Manager - Already Running!")
                return  # Removed True return value (it was unused)
            self.status_info.add('STARTED')
            print(f"Active Process Info: {self.status_info}, ID: {id(self)}")
            # Upload files
            while True:
                print("File Upload Queue Empty.")
                time.sleep(1.0)
        except Exception as e:
            print(f"{repr(e)} - Cannot run upload process.")


if __name__ == '__main__':
    upload_manager = FileUploadManager()
    print(f"Object ID: {id(upload_manager)}")
    upload_manager.start()
    print(f"Process Info: {upload_manager.status_info}",
          f"ID After: {id(upload_manager)}")
    while 'STARTED' not in upload_manager.status_info:
        print(f"Not Started! Process Info: {upload_manager.status_info}")
        time.sleep(0.7)
Your question is, why is the id of upload_manager the same before and after it is started. Simple answer: because it's the same object. It does not become another object just because you called one of its functions. That would not make any sense.
I suppose you might be wondering why the ID of the FileUploadManager object is different when you print it out from its "run" method. It's the same simple answer: because it's a different object. Your script actually creates two instances of FileUploadManager, although it's not obvious. In Python, each Process has its own memory space. When you start a secondary Process (upload_manager.start()), Python makes a second instance of FileUploadManager to execute in this new Process. The two instances are completely separate and "know" nothing about each other.
You did not say that your script doesn't terminate, but it actually does not. It runs forever, stuck in the loop while 'STARTED' not in upload_manager.status_info. That's because 'STARTED' was added to self.status_info in the secondary Process. That Process is working with a different instance of FileUploadManager. The changes you make there do not get automatically reflected in the first instance, which lives in the main Process. Therefore the first instance of FileUploadManager never changes, and the loop never exits.
This all makes perfect sense once you realize that each Process works with its own separate objects. If you need to pass data from one Process to another, that can be done with Pipes, Queues, Managers and shared variables. That is documented in the Concurrent Execution section of the standard library.
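Not part of the original answer, but a minimal sketch of one way to share that state across the two processes: replace the plain status attribute with a multiprocessing.Event, which is designed to be shared between parent and child (names mirror the question; the actual upload logic is omitted):

import time
from multiprocessing import Process, Event

class FileUploadManager(Process):
    def __init__(self):
        super().__init__(daemon=True)
        # An Event is shared between the parent and child processes,
        # unlike an ordinary attribute, which each process copies.
        self.started = Event()

    def run(self):
        self.started.set()          # visible in the parent process
        while True:
            print("File Upload Queue Empty.")
            time.sleep(1.0)

if __name__ == '__main__':
    upload_manager = FileUploadManager()
    upload_manager.start()
    upload_manager.started.wait()   # returns once run() has signalled
    print("Upload manager is running.")
    time.sleep(3)                   # watch a few loop iterations, then exit

Queues, Pipes, and Manager objects work the same way: they are explicitly designed to cross the process boundary, while ordinary attributes are not.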

Related

GtkTreeView stops updating unless I change the focus of the window

I have a GtkTreeView object that uses a GtkListStore model that is constantly being updated as follows:
1. Get new transaction
2. Feed data into numpy array
3. Convert numbers to formatted strings, store in pandas dataframe
4. Add updated token info to GtkListStore via GtkListStore.set(titer, liststore_cols, liststore_data), where liststore_data is the updated info and liststore_cols the names of the columns (both are lists).
Here's the function that updates the ListStore:
# update ListStore
titer = ls_full.get_iter(row)
liststore_data = []
[liststore_data.append(df.at[row, col])
 for col in my_vars['ls_full'][3:]]
# check for NaN value, add a (space) placeholder if necessary
for i in range(3, len(liststore_data)):
    if liststore_data[i] != liststore_data[i]:
        liststore_data[i] = " "
liststore_cols = []
[liststore_cols.append(my_vars['ls_full'].index(col) + 1)
 for col in my_vars['ls_full'][3:]]
ls_full.set(titer, liststore_cols, liststore_data)
Class that gets the messages from the websocket:
class MyWebsocketClient(cbpro.WebsocketClient):
    # class exceptions to WebsocketClient
    def on_open(self):
        # sets up ticker Symbol, subscriptions for socket feed
        self.url = "wss://ws-feed.pro.coinbase.com/"
        self.channels = ['ticker']
        self.products = list(cbp_symbols.keys())

    def on_message(self, msg):
        # gets latest message from socket, sends off to be processed
        if "best_ask" and "time" in msg:
            # checks to see if token price has changed before updating
            update_needed = parse_data(msg)
            if update_needed:
                update_ListStore(msg)
        else:
            print(f'Bad message: {msg}')
When the program first starts, the updates are consistent. Each time a new transaction comes in, the screen reflects it, updating the proper token. However, after a random amount of time - I've seen it take anywhere from 5 minutes to over an hour - the screen will stop updating, unless I change the focus of the window (either activating or deactivating it). This does not last long, though (only long enough to update the screen once). No other errors are being reported, and memory usage is not spiking (constant at 140 MB).
How can I troubleshoot this? I'm not even sure where to begin. The data back-ends seem to be OK (data is never corrupted nor lags behind).
Since you've said in the comments that it is running in a separate thread, I'd suggest wrapping your "update liststore" function with GLib.idle_add.
from gi.repository import GLib
GLib.idle_add(update_liststore)
I've had similar issues in the past and this fixed things. Sometimes updating liststore is fine, sometimes it will randomly spew errors.
Basically, only one thread should update the GUI at a time. So by wrapping the call in GLib.idle_add() you make sure your background thread does not interfere with the main thread updating the GUI.
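To make that concrete, here is a sketch of how the question's callback could hand the update to the main loop. It assumes on_message runs in the websocket's background thread and reuses parse_data and update_ListStore from the question; note also that the original condition if "best_ask" and "time" in msg only really tests the second key, so the sketch checks both explicitly:

from gi.repository import GLib

def schedule_update(msg):
    # Hand the ListStore update over to the GTK main loop.
    def _do_update():
        update_ListStore(msg)   # touches ls_full, so it must run in the main loop
        return False            # returning False removes this idle callback after one run
    GLib.idle_add(_do_update)

def on_message(self, msg):
    # Background-thread callback: never touch GTK widgets directly here.
    if "best_ask" in msg and "time" in msg:
        if parse_data(msg):
            schedule_update(msg)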

Most elegant way to execute CPU-bound operations in asyncio application?

I am trying to develop part of a system that has the following requirements:
send health status to a remote server (every X seconds)
receive requests for executing/canceling CPU-bound job(s) (for example: clone a git repo, compile it using conan, etc.)
I am using the socketio.AsyncClient to handle these requirements.
class CompileJobHandler(socketio.AsyncClientNamespace):
    def __init__(self, namespace_val):
        super().__init__(namespace_val)
        # some init variables

    async def _clone_git_repo(self, git_repo: str):
        # clone repo and return its instance
        return repo

    async def on_availability_check(self, data):
        # the health status
        await self.emit('availability_check', " all good ")

    async def on_cancel_job(self, data):
        # cancel the current job
        ...

    def _reset_job(self):
        # reset job logics
        ...

    def _reset_to_specific_commit(self, repo: git.Repo, commit_hash: str):
        # reset to specific commit
        ...

    def _compile(self, is_debug):
        # compile logics - might be CPU intensive
        ...

    async def on_execute_job(self, data):
        # request to execute the job (compile in our case)
        try:
            repo = self._clone_git_repo(job_details.git_repo)
            self._reset_to_specific_commit(repo, job_details.commit_hash)
            self._compile(job_details.is_debug)
            await self.emit('execute_job_response',
                            self._prepare_response("SUCCESS", "compile successfully"))
        except Exception as e:
            await self.emit('execute_job_response',
                            self._prepare_response(e.args[0], e.args[1]))
        finally:
            await self._reset_job()
The problem with the above code is that when an execute_job message arrives, blocking code runs and blocks the whole asyncio system.
To solve this problem, I have used the ProcessPoolExecutor and the asyncio event loop, as shown here: https://stackoverflow.com/questions/49978320/asyncio-run-in-executor-using-processpoolexecutor
After using it, the clone/compile functions are executed in another process - so that almost achieves my goals.
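For reference, a minimal sketch of that run_in_executor pattern (the compile_job function and the data keys below are illustrative, not taken from the linked answer):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def compile_job(git_repo: str, commit_hash: str, is_debug: bool) -> str:
    # CPU-bound work (clone, reset, compile) runs in a worker process,
    # so it cannot block the asyncio event loop.
    return "SUCCESS"

async def on_execute_job(data):
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=1) as pool:
        result = await loop.run_in_executor(
            pool, compile_job,
            data['git_repo'], data['commit_hash'], data['is_debug'])
    return result

# asyncio.run(on_execute_job({'git_repo': 'https://...', 'commit_hash': 'abc123', 'is_debug': False}))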
The questions I have are:
How can I design the code of the process more elegantly? (Right now I have some static functions, and I don't like it...)
One approach is to keep it like that; another is to pre-initialize an object (let's call it CompileExecuter), create an instance of this type prior to starting the process, and then let the process use it.
How can I stop the process in the middle of its execution? (If I receive an on_cancel_job request.)
How can I handle the exception raised by the process correctly?
Other approaches to handle these requirements are welcome.

Twisted deferreds block when URI is the same (multiple calls from the same browser)

I have the following code
# -*- coding: utf-8 -*-
# 好
##########################################
import time
from twisted.internet import reactor, threads
from twisted.web.server import Site, NOT_DONE_YET
from twisted.web.resource import Resource
##########################################
class Website(Resource):
    def getChild(self, name, request):
        return self

    def render(self, request):
        if request.path == "/sleep":
            duration = 3
            if 'duration' in request.args:
                duration = int(request.args['duration'][0])
            message = 'no message'
            if 'message' in request.args:
                message = request.args['message'][0]
            #-------------------------------------
            def deferred_activity():
                print 'starting to wait', message
                time.sleep(duration)
                request.setHeader('Content-Type', 'text/plain; charset=UTF-8')
                request.write(message)
                print 'finished', message
                request.finish()
            #-------------------------------------
            def responseFailed(err, deferred):
                pass; print err.getErrorMessage()
                deferred.cancel()
            #-------------------------------------
            def deferredFailed(err, deferred):
                pass; # print err.getErrorMessage()
            #-------------------------------------
            deferred = threads.deferToThread(deferred_activity)
            deferred.addErrback(deferredFailed, deferred)  # will get called indirectly by responseFailed
            request.notifyFinish().addErrback(responseFailed, deferred)  # to handle client disconnects
            #-------------------------------------
            return NOT_DONE_YET
        else:
            return 'nothing at', request.path
##########################################
reactor.listenTCP(321, Site(Website()))
print 'starting to serve'
reactor.run()
##########################################
# http://localhost:321/sleep?duration=3&message=test1
# http://localhost:321/sleep?duration=3&message=test2
##########################################
##########################################
My issue is the following:
When I open two tabs in the browser, point one at http://localhost:321/sleep?duration=3&message=test1 and the other at http://localhost:321/sleep?duration=3&message=test2 (the messages differ), and reload the first tab and then ASAP the second one, then they finish almost at the same time: the first tab about 3 seconds after hitting F5, the second tab about half a second after the first.
This is expected, as each request got deferred into a thread, and they are sleeping in parallel.
But when I now change the URL of the second tab to be the same as the one of the first tab, that is to http://localhost:321/sleep?duration=3&message=test1, then all this becomes blocking. If I press F5 on the first tab and as quickly as possible F5 on the second one, the second tab finishes about 3 seconds after the first one. They don't get executed in parallel.
As long as the entire URI is the same in both tabs, this server starts to block. This is the same in Firefox as well as in Chrome. But when I start one in Chrome and another one in Firefox at the same time, then it is non-blocking again.
So it may not necessarily be related to Twisted, but may instead be caused by some connection reuse or something like that.
Anyone knows what is happening here and how I can solve this issue?
Coincidentally, someone asked a related question over at the Tornado section. As you suspected, this is not an "issue" in Twisted but rather a "feature" of web browsers :). Tornado's FAQ page has a small section dedicated to this issue. The proposed solution is appending an arbitrary query string.
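For illustration, the "arbitrary query string" workaround simply makes the two URLs differ, e.g. by appending a throwaway parameter (the name nocache below is just an example):

import random

base = "http://localhost:321/sleep?duration=3&message=test1"
url = "{}&nocache={}".format(base, random.randint(0, 10**9))
# e.g. http://localhost:321/sleep?duration=3&message=test1&nocache=123456789
# The browser now sees two distinct URIs and no longer serializes the requests.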
Quote of the day:
One dev's bug is another dev's undocumented feature!

Run Python with IDLE on a Windows machine, put a part of the code on background so that IDLE is still active to receive command

from multiprocessing import Process

class PS():
    def __init__(self):
        self.PSU_thread = Process(target=self.read(199),)
        self.PSU_thread.start()

    def read():
        while running:
            "read the power supply"

    def set(current):
        "set the current"

if __name__ == '__main__':
    p = PS()
Basically, the idea of the code is to read the data of the power supply and at the same time keep IDLE active so it can accept a command to control it, set(current). The problem we are having is that once the object p is initialized, the while loop occupies the IDLE terminal so that the terminal cannot accept any commands any more.
We have considered creating a service, but does that mean we have to turn the whole code into a service?
Please suggest any possible solutions; we want it to run but still be able to receive commands from IDLE.
Idle, as its name suggests, is a program development environment. It is not meant for production running, and you should not use it for that, especially not for what you describe. Once you have a program written, just run it with Python.
It sounds like what you need is a GUI program, such as one based on tkinter. Here is a simulation of what I understand you to be asking for.
import random
import tkinter as tk

root = tk.Tk()

psu_volts = tk.IntVar(root)
tk.Label(root, text='Mock PSU').grid(row=0, column=0)
psu = tk.Scale(root, orient=tk.HORIZONTAL, showvalue=0, variable=psu_volts)
psu.grid(row=0, column=1)

def drift():
    psu_volts.set(psu_volts.get() + random.randint(0, 8) - 4)
    root.after(200, drift)
drift()

volts_read = tk.IntVar(root)
tk.Label(root, text='PSU Volts').grid(row=1, column=0)
tk.Label(root, textvariable=volts_read).grid(row=1, column=1)

def read_psu():
    volts_read.set(psu_volts.get())
    root.after(2000, read_psu)
read_psu()

lb = tk.Label(root, text="Enter 'from=n' or 'to=n', where n is an integer")
lb.grid(row=2, column=0, columnspan=2)
envar = tk.StringVar()
entry = tk.Entry(textvariable=envar)
entry.grid(row=3, column=0)

def psu_set():
    try:
        cmd, val = envar.get().split('=')
        psu[cmd.strip()] = val
        psu_volts.set((psu['to'] - psu['from']) // 2)
    except Exception:
        pass
    envar.set('')

tk.Button(root, text='Change PSU', command=psu_set).grid(row=3, column=1)

root.mainloop()
Think of psu as a 'black box' and psu_volts.get and .set as the means of interacting with the box. You would have to substitute in your own read and write code. Copy the code and save it to a file. Then either run it with Python or open it in IDLE to change it and run it.

Celery: Task Singleton?

I have a task that I need to run asynchronously from the web page that triggered it. This task runs rather long, and as the web page could be getting a lot of these requests, I'd like celery to only run one instance of this task at a given time.
Is there any way I can do this in Celery natively? I'm tempted to create a database table that holds this state for all the tasks to communicate with, but it feels hacky.
You can probably create a dedicated worker for that task, configured with CELERYD_CONCURRENCY=1; then all tasks on that worker will run synchronously.
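For example, a sketch under assumed names (myapp.tasks.long_task, single_queue, proj) of routing the task to a dedicated queue and serving it with a single-concurrency worker:

# celeryconfig.py -- route the long-running task to its own queue
CELERY_ROUTES = {
    'myapp.tasks.long_task': {'queue': 'single_queue'},
}

# Start one dedicated worker for that queue, e.g.:
#   celery worker -A proj -Q single_queue --concurrency=1
# Tasks sent to single_queue are then processed one at a time.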
You can use memcache/redis for that.
There is an example on the celery official site - http://docs.celeryproject.org/en/latest/tutorials/task-cookbook.html
And if you prefer Redis (this is a Django implementation, but you can easily modify it for your needs):
from celery import Task
from django.core.cache import cache
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

class SingletonTask(Task):
    def __call__(self, *args, **kwargs):
        lock = cache.lock(self.name)
        if not lock.acquire(blocking=False):
            logger.info("{} failed to lock".format(self.name))
            return
        try:
            super(SingletonTask, self).__call__(*args, **kwargs)
        except Exception as e:
            lock.release()
            raise e
        lock.release()
And then use it as a base task:
from celery import shared_task

@shared_task(base=SingletonTask)
def test_task():
    from time import sleep
    sleep(10)
This implementation is non-blocking. If you want the next task to wait for the previous one, change blocking=False to blocking=True and add a timeout.