Passing a Queue with concurrent.futures regardless of executor type

Working up from threads to processes, I have switched to concurrent.futures and would like to gain/retain the flexibility of switching between a ThreadPoolExecutor and a ProcessPoolExecutor for various scenarios. However, despite the promise of a unified facade, I am having a hard time passing multiprocessing Queue objects as arguments to futures.submit() when I switch to using a ProcessPoolExecutor:
import multiprocessing as mp
import concurrent.futures

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    executor = concurrent.futures.ProcessPoolExecutor()
    q = mp.Queue()
    p = executor.submit(foo, q)
    p.result()
    print(q.get())
bumps into the following exception coming from multiprocessing's code:
RuntimeError: Queue objects should only be shared between processes through inheritance
which I believe means it doesn't like receiving the queue as an argument, but rather expects to "inherit" it (not in any OOP sense) on the multiprocessing fork.
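Indeed, the error seems to fire at pickling time: ProcessPoolExecutor pickles every argument passed to submit() in order to ship it to a worker process, and a multiprocessing Queue refuses to be pickled except while a child process is being created. As a minimal sketch, the exception can be reproduced without any executor at all:
import multiprocessing as mp
import pickle

q = mp.Queue()
pickle.dumps(q)  # raises the same RuntimeError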
The twist is that with bare-bones multiprocessing, i.e. when not going through the facade which concurrent.futures is, there seems to be no such limitation, as the following code works seamlessly:
import multiprocessing as mp

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=foo, args=(q,))
    p.start()
    p.join()
    print(q.get())
I wonder what I am missing about this: how can I make the ProcessPoolExecutor accept the queue as an argument when using concurrent.futures, the same as it does when using the ThreadPoolExecutor or when using multiprocessing directly as shown right above?
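One workaround that appears to work with both executor types (a sketch, assuming a manager-backed queue is acceptable for the use case): multiprocessing.Manager().Queue() returns a proxy object that pickles cleanly, so it can be passed through submit() to a ProcessPoolExecutor as well:
import multiprocessing as mp
import concurrent.futures

def foo(q):
    q.put('hello')

if __name__ == '__main__':
    with mp.Manager() as manager:
        q = manager.Queue()  # a picklable proxy, unlike mp.Queue()
        with concurrent.futures.ProcessPoolExecutor() as executor:
            executor.submit(foo, q).result()
        print(q.get())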

Related

Why is my multiprocessing program spawning processes infinitely?

import time
from multiprocessing import Pool, RawArray, sharedctypes
from ctypes import c_int

def init_worker(X):
    print(f"{X}")

def worker_func(i):
    print(f"{X}")
    time.sleep(i)  # Some heavy computations
    return

# We need this check for Windows to prevent infinitely spawning new child
# processes.
if __name__ == '__main__':
    X = sharedctypes.RawValue(c_int)
    X = 3
    with Pool(processes=4, initializer=init_worker, initargs=(X)) as pool:
        pool.map(worker_func, [1, 2, 3, 4])
    print(X)
I am simply trying to print the value of X in each subprocess. This is a toy program to check whether I can share a value and update it using multiple processes.
This program spawns an infinite number of processes because there is no comma after the X in initargs=(X); it should be initargs=(X,). Without the comma, (X) is not a tuple, so worker initialization fails with an error, and the pool keeps spawning replacement workers.
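For completeness, a corrected sketch (the global in init_worker is one common way to make the shared value visible inside worker_func; otherwise X would be undefined in the workers on Windows):
import time
from ctypes import c_int
from multiprocessing import Pool, sharedctypes

def init_worker(shared_x):
    global X  # make the shared value visible to worker_func
    X = shared_x

def worker_func(i):
    print(f"worker sees: {X.value}")
    time.sleep(i)  # stand-in for some heavy computation

if __name__ == '__main__':
    X = sharedctypes.RawValue(c_int, 3)  # initialise to 3 instead of rebinding the name
    with Pool(processes=4, initializer=init_worker, initargs=(X,)) as pool:
        pool.map(worker_func, [1, 2, 3, 4])
    print(X.value)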

Parallelizing apply function in pandas taking longer than expected

I have a simple cleaner function which removes special characters from a dataframe (and other preprocessing stuff). My dataset is huge and I want to make use of multiprocessing to improve performance. My idea was to break the dataset into chunks and run this cleaner function in parallel on each of them.
I used the dask library and also the multiprocessing module from Python. However, it seems like the application is stuck and is taking longer than running on a single core.
This is my code:
from multiprocessing import Pool

def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df

def process_columns(data):
    for i in data.columns:
        data[i] = data[i].apply(cleaner_func)
    return data

mydf2 = parallelize_dataframe(mydf, process_columns)
I can see from the resource monitor that all cores are being used, but as I said before, the application is stuck.
P.S.
I ran this on Windows Server 2012 (where the issue happens). Running this code in a Unix environment, I was actually able to see some benefit from the multiprocessing library.
Thanks in advance.
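One likely culprit on Windows: child processes import the module afresh (there is no fork), so the top-level call to parallelize_dataframe runs again in every child unless it is guarded by if __name__ == '__main__'. A self-contained sketch of the usual fix, with a toy cleaner_func and dataframe standing in for the question's:
import numpy as np
import pandas as pd
from multiprocessing import Pool

num_partitions = 4
num_cores = 4

def cleaner_func(x):
    # toy stand-in for the question's cleaning logic
    return str(x).strip()

def process_columns(data):
    for i in data.columns:
        data[i] = data[i].apply(cleaner_func)
    return data

def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    with Pool(num_cores) as pool:
        df = pd.concat(pool.map(func, df_split))
    return df

# Without this guard, Windows children re-execute the top-level call on import.
if __name__ == '__main__':
    mydf = pd.DataFrame({'a': [' x ', ' y ', ' z ', ' w ']})
    print(parallelize_dataframe(mydf, process_columns))
Even with the guard, note that each chunk is pickled to and from the workers, so for cheap per-row work the serialization overhead can cancel out the parallel speedup.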

multiprocessing code gets stuck

I am using Python 2.7 on Windows 7 and I am currently trying to learn parallel processing.
I downloaded the multiprocessing 2.6.2.1 Python package and installed it using pip.
When I try to run the following very simple code, the program seems to get stuck; even after one hour it doesn't finish executing, despite the code being super simple.
What am I missing? Thank you very much.
from multiprocessing import Pool

def f(x):
    return x*x

array = [1, 2, 3, 4, 5]
p = Pool()
result = p.map(f, array)
p.close()
p.join()
print result
The issue here is the way multiprocessing works. Think of it as Python opening a new instance and importing all the modules all over again. You'll want to use the if __name__ == '__main__' convention. The following works fine:
import multiprocessing

def f(x):
    return x * x

def main():
    p = multiprocessing.Pool(multiprocessing.cpu_count())
    result = p.imap(f, xrange(1, 6))
    print list(result)

if __name__ == '__main__':
    main()
I have changed a few other parts of the code too so you can see other ways to achieve the same thing, but ultimately you only need to stop the code from executing over and over as Python re-imports the code you are running.

QThread doesn't appear to start; PyQt5, Python 2.7.9

SUMMARY
PyQt5 doesn't appear to be creating a new thread corresponding to QThread object, or I haven't established Slot/Signal linkage correctly. Please help me to isolate my problem.
I'm a relatively casual user of Python, but I've been asked to create a utility for another team that wraps some of their Python libraries (which themselves wrap C++) in a GUI. Because this utility is for another team, I can't change versions of compilers etc, or at least, not without providing a decent reason.
The utility is intended to provide an interface for debugging into some hardware that my colleagues are developing.
After examining the options, I decided to use Qt and the PyQt bindings. The steps I followed were:
Install Visual Studio 2010 SP1 (required because other team's libraries are compiled using this version of the MS compiler).
Install Python 2.7.9 (their version of Python)
Install qt-opensource-windows-x86-msvc2010-5.2.1.exe
Get source for SIP-4.18.zip and compile and install
Get source for PyQt-gpl-5.2.1.zip, compile and install
Try to build a PyQt application that wraps the other team's comms and translation libraries. Those libraries aren't asynchronous as far as I can tell, so I think that I need to separate that part of the application from the GUI.
The code that I've written produces the UI and is responsive in the sense that if I put break points in the methods that are called from the QAction objects, then those break points are appropriately triggered. My problem is that the Worker object that I create doesn't appear to move to a separate thread, (despite the call to moveToThread) because if I make the connection of type BlockingQueuedConnection instead of QueuedConnection then I get a deadlock. Breakpoints that I put on the slots in the Worker type are never triggered.
Here's the code:
import os
import sys
import time
from PyQt5.QtWidgets import QMainWindow, QTextEdit, QAction, QApplication, QStatusBar, QLabel, QWidget, QDesktopWidget, QInputDialog
from PyQt5.QtGui import QIcon
from PyQt5.QtCore import Qt, QThread, QObject, pyqtSignal, pyqtSlot

class Worker(QObject):
    def __init__(self):
        super(Worker, self).__init__()
        self._isRunning = True
        self._connectionId = ""
        self._terminate = False

    @pyqtSlot()
    def cmd_start_running(self):
        """This slot is used to send a command to the HW asking for it to enter Running mode.
        It will actually work by putting a command in a queue for the main_loop to get to
        in its own serialised good time. All the other commands will work in a similar fashion.
        Up until such time as it is implemented, I will fake it."""
        self._isRunning = True

    @pyqtSlot()
    def cmd_stop_running(self):
        """This slot is used to send a command to the HW asking for it to enter Standby mode.
        Up until such time as it is implemented, I will fake it."""
        self._isRunning = False

    @pyqtSlot()
    def cmd_get_version(self):
        """This slot is used to send a command to the HW asking for its version string."""
        pass

    @pyqtSlot()
    def cmd_terminate(self):
        """This slot is used to notify this object that it has to join the main thread."""
        pass

    @pyqtSlot()
    def main_loop(self):
        """This slot is the main loop that is attached to the QThread object. It has sleep periods
        that allow the messages on the other slots to be processed."""
        while not self._terminate:
            self.thread().sleep(1)
            # While there is stuff on the wire, get it off, translate it, then signal it.
            # For the mean while, pretend that _isRunning corresponds to when
            # RT streams will be being received from the HW.
            if self._isRunning:
                pass
            # Search queue for commands; if any found, translate, then put on the wire.
class DemoMainWindow(QMainWindow):
    sgnl_get_version = pyqtSignal()
    sgnl_start_running = pyqtSignal()
    sgnl_stop_running = pyqtSignal()
    sgnl_terminate = pyqtSignal()

    def __init__(self):
        super(DemoMainWindow, self).__init__()
        self.initUI()
        self._workerObject = Worker()
        self._workerThread = QThread()
        self._workerObject.moveToThread(self._workerThread)
        self._workerThread.started.connect(self._workerObject.main_loop, type=Qt.QueuedConnection)
        # I changed the following connection to type BlockingQueuedConnection,
        # and got a Deadlock error reported, so I assume that there is already
        # a problem before I get to this point.
        # I understand that the default for 'type' (Qt.AutoConnection) is
        # supposed to correctly infer that a QueuedConnection is required.
        # I was getting desperate.
        self.sgnl_get_version.connect(self._workerObject.cmd_get_version, type=Qt.QueuedConnection)
        self.sgnl_start_running.connect(self._workerObject.cmd_start_running, type=Qt.QueuedConnection)
        self.sgnl_stop_running.connect(self._workerObject.cmd_stop_running, type=Qt.QueuedConnection)
        self.sgnl_terminate.connect(self._workerObject.cmd_terminate, type=Qt.QueuedConnection)

    def initUI(self):
        textEdit = QTextEdit()
        self.setCentralWidget(textEdit)
        lbl = QLabel(self.statusBar())
        lbl.setText("HW Version: ")
        self.statusBar().addPermanentWidget(lbl)
        exitAction = QAction(QIcon('exit24.png'), 'Exit', self)
        exitAction.setShortcut('Ctrl+Q')
        exitAction.setStatusTip('Exit application')
        exitAction.triggered.connect(self.close)
        connectAction = QAction(QIcon('connect24.png'), 'Connect', self)
        connectAction.setStatusTip('Connect to HW')
        connectAction.triggered.connect(self.establishCanConnection)
        enterRunningAction = QAction(QIcon('start24.png'), 'Start Running', self)
        enterRunningAction.setStatusTip('Start Running')
        enterRunningAction.triggered.connect(self.enterRunning)
        enterStandbyAction = QAction(QIcon('stop24.png'), 'Stop Running', self)
        enterStandbyAction.setStatusTip('Stop Running')
        enterStandbyAction.triggered.connect(self.enterStandby)
        self.statusBar()
        menubar = self.menuBar()
        fileMenu = menubar.addMenu('&File')
        fileMenu.addAction(exitAction)
        hwMenu = menubar.addMenu('&Hardware')
        hwMenu.addAction(connectAction)
        hwMenu.addAction(enterRunningAction)
        hwMenu.addAction(enterStandbyAction)
        toolbar = self.addToolBar('Exit')
        toolbar.addAction(exitAction)
        toolbar.addAction(connectAction)
        toolbar.addAction(enterRunningAction)
        toolbar.addAction(enterStandbyAction)
        self.setGeometry(300, 300, 400, 350)  # x, y, width, height
        self.setWindowTitle('Demo Prog')
        self.show()

    def establishCanConnection(self):
        iDlg = QInputDialog(self)
        iDlg.setInputMode(QInputDialog.IntInput)
        idInt, ok = iDlg.getInt(self, 'CAN ID Selection', 'HW ID:')
        canID = '%s%d' % ('HW', idInt)
        if ok:
            self._workerThread.start()
        # this would be where the channel is established

    def enterRunning(self):
        self.sgnl_start_running.emit()
        # this would be where the command to start running is sent from

    def enterStandby(self):
        self.sgnl_stop_running.emit()
        # send the command to stop running

if __name__ == '__main__':
    app = QApplication(sys.argv)
    mainWindow = DemoMainWindow()
    sys.exit(app.exec_())
Note that the call to start the _workerThread is in the establishCanConnection method, but that shouldn't be a problem, should it?
I used the procmon utility to check whether more threads are created when establishCanConnection runs, and it appears that there are, but I found it hard to tell which thread (if any of them) corresponded to the QThread object.
Don't use BlockingQueuedConnection unless you really need it. If you don't know whether you need it or not, then you don't need it.
Cross-thread signals are queued in the event-loop of the receiving thread. If that thread is running code that blocks, it won't be able to process any events. Thus, if you send a signal with BlockingQueuedConnection to a thread that is blocked, you'll get a deadlock.
Your example uses a worker object that runs a blocking while loop, so it is subject to the deadlock problem outlined above. If you want to send signals to a thread that is blocked, you will need to arrange for the blocking code to periodically allow the thread to process its events, like this:
while not self._terminate:
    self.thread().sleep(1)
    QApplication.processEvents()
PS:
If you want to check that the worker is running in a different thread, you can print the return value of QThread.currentThread() or QThread.currentThreadId() (these functions are static, so you don't need an instance of QThread to call them).
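For example, a quick check (an illustrative sketch; these prints are assumed additions to the example above, not part of the original code):
# e.g. at the top of Worker.main_loop:
print('main_loop thread:', QThread.currentThread())
# and in DemoMainWindow.__init__, for comparison:
print('GUI thread:', QThread.currentThread())
# If moveToThread worked, the two prints show different QThread objects.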

Python Redis Queue ValueError: Functions from the __main__ module cannot be processed by workers

I'm trying to enqueue a basic job in Redis using python-rq, but it throws this error:
ValueError: Functions from the __main__ module cannot be processed by workers
Here is my program:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())

from rq import Connection, Queue
from redis import Redis

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print job
Break the provided code into two files:
count_words.py:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())
and main.py (where you'll import the required function):
from rq import Connection, Queue
from redis import Redis
from count_words import count_words_at_url  # added import!

redis_conn = Redis()
q = Queue(connection=redis_conn)
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print job
I always separate the tasks from the logic running those tasks into different files. It's just better organization. Also note that you can define a class of tasks and import/schedule tasks from that class instead of the (over-simplified) structure I suggest above; see the sketch after this answer. This should get you going.
Also see here to confirm you're not the first to struggle with this example. RQ is great once you get the hang of it.
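As a sketch of the class-based layout mentioned above (hypothetical names; the point is only that RQ can enqueue instance methods as long as the class lives in an importable module):
tasks.py:
import requests

class UrlTasks(object):
    def count_words(self, url):
        resp = requests.get(url)
        return len(resp.text.split())
main.py:
from redis import Redis
from rq import Queue
from tasks import UrlTasks

q = Queue(connection=Redis())
job = q.enqueue(UrlTasks().count_words, 'http://nvie.com')
print job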
Currently there is a bug in RQ which leads to this error: you will not be able to pass a function to enqueue from the same file without explicitly importing it.
Just add from app import count_words_at_url above the enqueue function:
import requests

def count_words_at_url(url):
    resp = requests.get(url)
    return len(resp.text.split())

from rq import Connection, Queue
from redis import Redis

redis_conn = Redis()
q = Queue(connection=redis_conn)

from app import count_words_at_url
job = q.enqueue(count_words_at_url, 'http://nvie.com')
print job
The other way is to have the functions in a separate file and import them.