How to use Ampoule with Twisted to create a mixed pool of local and remote processes?

I've been told that Ampoule, a Twisted-based library, is a great way to create a pool of processes that are executed on different computers. However, there are no docs for it, and Ampoule's examples don't make it any clearer.
I'd be totally happy with an interface similar to the stdlib's multiprocessing.Pool.map().
Could you supply an example, please?

Ampoule is not natively capable of multi-host operation. Since it uses AMP with strictly defined interactions between the parent and child processes, you could certainly imagine extending it to support multi-host operation. However, you must still solve the problem of connecting to another host (perhaps via SSH using Twisted Conch) and deploying the necessary Python libraries to it for it to be able to execute the tasks you wish to assign to it.
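For the local half of the problem, Ampoule's process pool already gives you a map()-ish workflow: define an AMP command, a child that responds to it, and push work into the pool. Here's a minimal sketch along the lines of Ampoule's bundled example; treat the exact API names as an assumption, since the project is thinly documented:

    # Minimal local Ampoule pool; exact API names are an assumption.
    from twisted.internet import reactor
    from twisted.protocols import amp
    from ampoule import child, pool

    class Sum(amp.Command):
        arguments = [(b"a", amp.Integer()), (b"b", amp.Integer())]
        response = [(b"total", amp.Integer())]

    class SumChild(child.AMPChild):
        @Sum.responder
        def sum(self, a, b):
            return {"total": a + b}

    pp = pool.ProcessPool(SumChild, min=1, max=4)

    def main():
        d = pp.start()
        d.addCallback(lambda _: pp.doWork(Sum, a=3, b=5))
        d.addCallback(lambda result: print(result["total"]))  # prints 8
        d.addCallback(lambda _: pp.stop())
        d.addBoth(lambda _: reactor.stop())

    reactor.callWhenRunning(main)
    reactor.run()

Each doWork() call returns a Deferred that fires with the response dict, so a Pool.map() equivalent is just one doWork() per input plus a DeferredList to gather the results. The remote half is what you would have to build yourself, as described above.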

Related

Run MATLAB program with Web Server inputs

I have a MATLAB application that I want to execute on a Linux box with inputs from a web server. Requests to the server would all come from the local network.
Searching for different solutions, I've seen recommendations to host a Django server that serves an HTML form where users could input all the various data needed by the application. When a user fills out the form and submits it, the data would be sent through an API to the MATLAB application, which would serve up the report in a network shared drive.
Would this work well? Is there a different/easier solution available?
Need more details to know if this would "work well". But in terms of the general outline you presented, seems feasible.
When you say "the data would be sent through an API to the MATLAB application", what exactly do you mean here? What API are we talking about? And what is "the Matlab application"? Do you mean installing regular Matlab on this server machine, and then having Django (or another web application server) run the matlab command to execute a Matlab program as a distinct process (probably a single matlab -batch execution) that services each request?
Two issues here. One, Matlab is a large program with a slow startup time; Matlab Production Server and similar solutions handle this by maintaining a pool of already-running, "warmed up" Matlab worker processes to service incoming requests. Two, licensing: the "regular" Matlab licenses are aimed at interactive use by humans, and running Matlab like that on the server side, to handle requests for a web app used by multiple humans, may not be covered. Talk to your organization's lawyer or IT licensing expert before doing this.
@Will is right here: the Matlab Production Server is the product or "solution" that MathWorks provides for this scenario. And it's relatively easy to use. But it ain't cheap. (On the other hand, when you're talking about Matlab, what is?)
If you have someone who can do a bit of system programming for you, there's a more affordable alternative: use the Matlab Compiler to build your Matlab code into a "CTF" DLL, and write a thin custom server wrapper on top of that, which can accept service calls for the particular Matlab things you need done, and dispatches them to your code. (Running that in a pool of multiple processes, if you want to be able to service multiple concurrent clients.) "Compiled" Matlab libraries that run against the Matlab Runtime do not require any additional licenses for their runtime execution.
Big questions here are: Do you want this to Go Fast? How many clients are you going to have, and how often are they going to be sending requests? What kind of data will be contained in the inputs and outputs to this Matlab code?
Have a look at the -batch option to the matlab command. Have a look at the various deployment options supported by the Matlab Compiler. And talk to your organization's lawyer.
If you decide to go the matlab -batch route, you probably do not want to pass the inputs to your Matlab code as command-line arguments. Command lines and environment variables only pass simple strings, and parsing those sucks, especially once you get into nontrivial numerics. Bundle up all your inputs as JSON files, MAT files, or something similar, and then pass just a reference to those files (or SQL blobs, or similar) on the command line.
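For example, the dispatch side could look like this in Python (the Django view would call something like it; run_report and the temp-file handling are hypothetical, illustrative names):

    # Sketch: hand inputs to MATLAB as a JSON file instead of raw CLI args.
    import json
    import subprocess
    import tempfile

    def run_matlab_job(inputs):
        # Write the inputs where the MATLAB process can read them.
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(inputs, f)
            job_file = f.name

        # Pass only a reference to the file on the command line. The
        # hypothetical run_report.m would load it with
        # jsondecode(fileread(jobFile)) and write its report to the share.
        result = subprocess.run(
            ["matlab", "-batch", "run_report('%s')" % job_file],
            capture_output=True, text=True,
        )
        return result.returncode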
Also, depending on what your Matlab code is like, GNU Octave (https://gnu.org/software/octave/index) may be an option for you. Octave is many years behind Matlab in terms of functionality and stability, and doesn't have equivalents of all the Matlab Toolboxes, so it isn't a drop-in replacement in general terms. But for simple stuff, it works. And it is unencumbered by licensing, and has faster startup times in command-line mode.
An easier solution (though possibly not the cheapest if your existing MATLAB license doesn't already cover it) is to use MATLAB Production Server which basically exists for this category of problem. It has a RESTful API that would straightforwardly handle your user input use case.

How to execute an untrusted function efficiently in a cross-platform way?

I am writing an open source cross-platform application written in C++ that targets Windows, Mac, and Linux on x86 CPUs. The application produces a stream of data (integers) that needs to be validated, and my application will perform actions depending on the validation result. There are multiple validators, which we shall call "modules", and they can be swapped out for one another.
Anybody can write and share modules with other users, so my application has to ensure that maliciously-written modules cannot harm the user in any way (perhaps except via high CPU usage, in which case my application should be able to kill the module after some amount of time - this can be done by using a surrogate process). Furthermore, the stream of data is being sent at a high rate (up to 100kB/s).
Fortunately, the code in these modules is usually simple arithmetic operations on data in the stream (usually processing each incoming integer in constant time), and they do not need to make any system calls (not even heap allocation).
I've considered the following possibilities (all of them with some drawbacks):
Kernel-based sandboxing
On Linux, we can use secure computing mode (seccomp), which prevents a process from making any system calls except reading and writing with already-open file descriptors. Module creators would write their modules as a single function that takes input and output file descriptors (in a language like C or C++), compile it into a shared object, then distribute that shared object.
My application will probably prepare input and output file descriptors, then fork() itself or exec() a surrogate process, and this child process uses dlopen() and dlsym() to get a pointer to the untrusted function. Then strict secure computing mode will be enabled, before executing the untrusted function.
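A rough sketch of that parent/child flow, in Python with ctypes for brevity; in reality the child would be a small C loader doing the dlopen()/dlsym() dance, since the Python interpreter itself needs more syscalls than strict mode permits:

    # Assumes Linux + glibc. Strict seccomp leaves only read(), write(),
    # _exit() and sigreturn() available to the child.
    import ctypes
    import os

    PR_SET_SECCOMP = 22      # from <linux/prctl.h>
    SECCOMP_MODE_STRICT = 1  # from <linux/seccomp.h>

    def spawn_sandboxed_worker(work):
        in_r, in_w = os.pipe()    # parent writes stream data to in_w
        out_r, out_w = os.pipe()  # parent reads validation results from out_r
        pid = os.fork()
        if pid == 0:  # child: lock down, then run the untrusted function
            libc = ctypes.CDLL("libc.so.6", use_errno=True)
            if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
                os._exit(1)
            # Any syscall other than read/write/_exit now kills the child.
            work(in_r, out_w)
            os._exit(0)
        return pid, in_w, out_r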
Drawbacks: There's the problem that dlopen() will actually run the constructor function from the shared library. This would have to be properly sandboxed as well, and I can't think of a way to do so. Also, of course, this thing will only work on Linux. As far as I know, there is no way to ban WinNT system calls on Windows, so a similar solution on Windows won't be very secure.
Application-level sandboxing
Any form of application-level sandboxing means that we cannot run untrusted machine code of any form. An untrusted function can overwrite its return value or data outside its call stack, thereby compromising the whole application (and effectively acquiring any permissions that the original application had).
Make modules use a simple scripting language that does not support any system calls - just pure arithmetic operations and perhaps the ability to read an input stream. My application would contain an interpreter for this language.
Drawbacks: Unfortunately I have not found such a scripting language. Many scripting languages have extensive functionality (e.g. Python), and a sandbox such as PyPy's simply filters OS system calls. I would be shipping a lot of useless interpreter code with my application, and that is arguably more prone to security issues, due to bugs in the interpreter, than a language that simply has no functionality beyond simple calculations and control flow (basically a function that makes no system calls). Furthermore, marshalling the data between C++ (machine code) and the scripting language is usually a slow process.
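To illustrate what "no functionality other than simple calculations" can look like, here is a sketch of the whitelist approach using Python's ast module; everything here is illustrative, not an existing sandbox:

    # Accept only arithmetic expressions over the input integer x, by
    # validating the parse tree against a node whitelist before evaluating.
    import ast
    import operator

    _OPS = {
        ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv,
        ast.Mod: operator.mod,
    }

    def eval_module_expr(src, x):
        def walk(node):
            if isinstance(node, ast.Expression):
                return walk(node.body)
            if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                return _OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant) and isinstance(node.value, int):
                return node.value
            if isinstance(node, ast.Name) and node.id == "x":
                return x
            raise ValueError("disallowed construct: " + ast.dump(node))
        return walk(ast.parse(src, mode="eval"))

    # eval_module_expr("(x * 7 + 3) % 256", 41) -> 34

Because the tree is checked against a whitelist rather than a full host language being filtered, nothing in the accepted grammar can reach the OS; the cost is exactly the marshalling and interpreter overhead described above.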
Distribute modules with a 'safe' compiled language that again does not support any system calls. My application would contain a JIT for this language.
Marshalling won't be necessary because my application would call into the JITted machine code of the untrusted module, so performance across this boundary should be fast. The untrusted module now won't be able to corrupt the stack, attempt return-oriented programming, or perform any other malicious actions, due to the language restrictions and checks of the 'safe' language. WebAssembly is the first and only language that comes to mind (if it can be called a language). (As far as I can tell, WebAssembly seems to provide the security guarantees for my use case, right?)
Drawbacks: The existing implementations of WebAssembly seem to be all browser-based, so I would have to steal an implementation from an open source browser. This does seem like a lot of work, considering that I would have to uncouple it from all the JavaScript and other browser bits. However, a standalone WebAssembly JIT based on LLVM seems to be under development.
Question:
What is the best way to execute an untrusted function efficiently that works on Windows, Mac, and Linux?
Right now, I think that the scripting language way would probably be the safest, and be the easiest for module writers. But for a more efficient solution, WebAssembly is probably better. Am I right, or are there better or easier solutions that I have not thought of?
(Remark: I think several pairs of tags used in this question have never been seen together before!)
Regarding WebAssembly:
Unfortunately, there is no production-quality stand-alone implementation yet. I expect some to show up in the future, but it hasn't happened yet.
For historical reasons, existing production implementations are all part of a JavaScript VM. Fortunately, none of these VMs is tied to a browser. If you don't mind including some unused JS baggage, you can embed them as they are (ripping out the JS would be very hard). One problem, though, is that these VMs don't yet provide embedding interfaces for Wasm specifically. You have to go through JS, which is stupid.
There is an initial design for a C and C++ API for WebAssembly, which would give direct access to an embedded Wasm VM. It is meant to be VM-neutral, i.e., could be implemented by any existing VM (the repo contains a prototype implementation on top of V8). This may evolve into a standard, but I cannot promise any timeline. Right now it's only for the brave.
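For a concrete picture of what such a direct embedding looks like once a standalone VM ships, here is a sketch in Python; it uses the wasmtime bindings, which postdate this answer, so treat the API details as an assumption:

    # Load a Wasm module and call its exported validator from the host.
    from wasmtime import Store, Module, Instance

    WAT = """
    (module
      (func (export "validate") (param i32) (result i32)
        local.get 0
        i32.const 255
        i32.and))
    """

    store = Store()
    module = Module(store.engine, WAT)      # no imports: no way to do I/O
    instance = Instance(store, module, [])
    validate = instance.exports(store)["validate"]
    print(validate(store, 1000))            # -> 232

The instantiation step is also where the sandboxing lives: a module instantiated with an empty import list simply has no capability to make system calls.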

Adding a COM interface to an existing application (EXE)

I intend to add a COM interface to an existing application (which, by the way, is written in C++ using Win32). I have some experience using COM objects, so I know the basic COM concepts of interfaces, etc., but this is the first time I'm actually implementing a component.
Ultimately I want to be able to use the COM interface to automate my application from scripts such as VB. I understand that there are two steps:
My application must act as an out-of-process server (i.e. I have to use MIDL and generate code for a proxy DLL and a stub DLL).
Once I have the server I can add automation capabilities by implementing the IDispatch interface.
Since the server-in-an-EXE thing with MIDL and what not is already a bit steep, I wanted to get a grasp on all that first before moving on to IDispatch.
I am reading the book "Inside COM" by Dale Rogerson and have completed the chapter on servers in EXEs (the following chapter will cover Automation).
The "Servers in EXEs" chapter provides example code that implements a server and a client. But it is necessary to start the server manually. This confuses me. Obviously, when my application (= server) is used by a client process, this extra manual step should not be necessary. Is there no mechanism to start the server automatically? Or is automation necessary to achieve that? At the moment, the prospect of having to start my server manually (once I even have one) makes me doubt I am moving in the right direction.
Hopefully someone with more knowledge of this can see what information I'm missing and point me in the right direction.
No, COM servers are not normally started by hand. Not sure why the book proposed it, possibly because it wanted to avoid talking about the registry keys you need to allow COM to automatically start the EXE. It isn't otherwise very complicated, you register the Application coclass of your app with the LocalServer32 key value giving the path to the EXE.
It is however not completely uncommon, especially with an existing program. One design decision to make is whether you let the client code completely control your program, or whether your program already has an existing user interface but you also want to expose services to other code. In the latter case it makes sense to let the user start the app by hand, as she normally would.
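As a sketch, the keys involved look like this, written with Python's winreg for brevity (a real installer would do the same from C++ or a .reg file; the CLSID, ProgID, and path are placeholders):

    import winreg

    CLSID = "{12345678-ABCD-1234-ABCD-1234567890AB}"  # placeholder
    EXE = r"C:\Program Files\MyApp\MyApp.exe"         # placeholder

    root = winreg.HKEY_CLASSES_ROOT
    with winreg.CreateKey(root, "CLSID\\%s" % CLSID) as key:
        winreg.SetValue(key, "", winreg.REG_SZ, "MyApp.Application")
    with winreg.CreateKey(root, "CLSID\\%s\\LocalServer32" % CLSID) as key:
        # COM runs this command line when no live instance has registered
        # a class factory for the CLSID.
        winreg.SetValue(key, "", winreg.REG_SZ, '"%s" -Embedding' % EXE)
    with winreg.CreateKey(root, "MyApp.Application\\CLSID") as key:
        winreg.SetValue(key, "", winreg.REG_SZ, CLSID)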
When your application is registered under LocalServer32, it will be invoked with the command line specified there if no running process has registered a factory object for your CLSID yet.
This way, you can get the best of both worlds -- if the application is running already, this instance can provide the server side, and if it isn't, it will be started.
Automation is completely orthogonal to that -- your component becomes Automation compatible by implementing IDispatch.

Bonjour communication wrapper for Objective-C?

I've been using MYNetwork by the venerable Jens Alfke for an app of mine that allows devices to connect and share info over the network, it's actually a mission-critical part of the app. I tried writing my own wrapper for all of the C-level stuff you have to do for Bonjour, but it didn't work out so well, so I moved to MYNetwork.
It's been great so far, but the fact that it's essentially opaque to me is causing trouble, as is the fact that I want to move over to ARC once we can submit apps with it (there are a lot of Objective-C object references in structs, which ARC hates).
Can anyone recommend a similar wrapper, ideally that allows easy message passing between a client and a server over Bonjour as well as service discovery?
Just a thought: would using ZeroMQ, advertised and discovered via the stock NSNetService, suffice? Separating the service pub/sub from the actual communication would also let you use other Bonjour libraries, like Avahi on Linux. ZeroMQ is sufficiently simple to make wrapping trivial, yet powerful enough to cope with complex network topologies, fast.
I have experience with both technologies in isolation but not together although I see no reason why it wouldn't work. The only caveat right now is the limited body of collective experience of ZeroMQ use on iOS but I'd expect that to change over time.
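Here's a sketch of that split in Python, since pyzmq and python-zeroconf make it compact; on iOS the advertising side would be NSNetService instead, and the service name, address, and port below are placeholders:

    # Advertise a plain ZeroMQ reply socket over Bonjour/zeroconf.
    import socket
    import zmq
    from zeroconf import ServiceInfo, Zeroconf

    PORT = 5555

    # 1. The actual communication channel: ZeroMQ, independent of discovery.
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:%d" % PORT)

    # 2. Service publication, independent of the transport above.
    info = ServiceInfo(
        "_myapp._tcp.local.",
        "shared-info._myapp._tcp.local.",
        addresses=[socket.inet_aton("192.168.1.10")],  # placeholder address
        port=PORT,
    )
    zc = Zeroconf()
    zc.register_service(info)

    while True:
        msg = sock.recv()          # blocks until a discovered peer connects
        sock.send(b"ack:" + msg)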
You know you can disable ARC for specific files? So you can just disable ARC for the library (with the -fno-objc-arc compiler flag on those files) and keep it on for the rest of your files.
Disable Automatic Reference Counting for Some Files

Simple, non-networking example of Twisted/PyGTK

I was struggling with getting some asynchronous activity to work under PyGTK, when someone suggested that I look at using Twisted.
I know that Twisted started as a networking framework, but it can be used for other things. However, every single example I've ever seen involves a whole lot of network-based code. I would like to see an example of using Twisted for a simple PyGTK desktop app, without needing to expend the extra mental effort of understanding the network aspect of things.
So: Is there a clean, simple tutorial for or example of using Twisted to create a GTK (PyGTK) app and perform asynchronous tasks?
(Yes, I've seen pbgtk2.py. It's uncommented, network-centric and completely baffling to a newcomer.)
Updated: I had listed various gripes with glib.idle_add/gtk.gdk.lock and friends not working properly under Windows. This was all reasoned out on the pygtk list - there's some trickery that is needed with PyGTK to get asynchronous behaviour working under Windows.
However, my point still stands that any time I mention doing asynchronous activity in PyGTK, someone says "don't use threads, use Twisted!" I want to know why and how.
To perform asynchronous tasks in PyGTK, Twisted simply uses functions such as gobject.io_add_watch/glib.io_add_watch and gobject.timeout_add/glib.timeout_add (plus some others you can find in the gobject and glib modules), so there is not much difference between using raw PyGTK functions and using Twisted if you don't need networking.
In addition, Twisted has the same problem as PyGTK with asynchronous tasks: Twisted runs on the same loop as PyGTK, so it gets blocked if you perform a blocking task!
The best thing to do is to use one of the glib functions that are basically intended to handle such situations.
I've verified the correct behaviour of Twisted+PyGTK under Windows in one application, but I avoided doing blocking work (at most reading from a large file chunk by chunk, basically using glib.idle_add or glib.io_add_watch; Twisted uses something like that internally).
For example, spawning a process and processing its stdout with glib.io_add_watch did not seem to work for me. I've written an article on my blog about running asynchronous processes in PyGTK; I'm not very sure it works on Windows, though, as that may depend on the version.
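For completeness, here is a minimal sketch of the "use Twisted!" pattern itself: install gtk2reactor so Twisted and GTK share one main loop, then push blocking work off the UI thread with deferToThread (slow_job is a stand-in for real work):

    # Twisted + PyGTK sharing one loop via gtk2reactor.
    from twisted.internet import gtk2reactor
    gtk2reactor.install()          # must run before importing the reactor

    import gtk
    import time
    from twisted.internet import reactor
    from twisted.internet.threads import deferToThread

    def slow_job():
        time.sleep(2)              # stands in for blocking work
        return "done"

    def on_click(button):
        d = deferToThread(slow_job)                    # off the GTK thread
        d.addCallback(lambda r: button.set_label(r))   # back on the main loop

    win = gtk.Window()
    button = gtk.Button("start")
    button.connect("clicked", on_click)
    win.add(button)
    win.connect("destroy", lambda w: reactor.stop())
    win.show_all()
    reactor.run()                  # drives both Twisted and GTK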