Difference between OS process and normal process?

In my understanding, a process is:
"an operating system level concept used to describe a set of resources (such as external code
libraries and the primary thread) and the necessary memory allocations used by a running application.
For each *.exe loaded into memory, the OS creates a separate and isolated process for use during its
lifetime." - Andrew Troelsen (Pro C# 2010)
So each time we start an application, a process is created with its own address space, which cannot be shared with other processes.
Recently I read in CLR via C# that:
"The CLR does, in fact, offer the ability to execute multiple managed applications in a single
OS process. Each managed application executes in an AppDomain."
This says that multiple apps can run inside a single OS process. Is there a difference between an OS process and the process that is started when we run an application?
Can someone shed some light on this, please?

The difference is that the primary running process is managed by the operating system. The CLR/Framework offers a different mechanism known as "application domains", which allows separate instances of a program to execute under the same system-level process while acting as completely independent processes. Not being an expert on the C#/.NET design model, I cannot offer an example of where this might be useful other than the scalability of large systems.

Related

Run MATLAB program with Web Server inputs

I have a MATLAB application that I want to execute on a Linux box with inputs from a web server. Requests to the server would all come from the local network.
Searching for different solutions, I've seen recommendations to host a Django server that serves an HTML form where users could input all the various data needed by the application. When a user fills out the form and submits it, the data would be sent through an API to the MATLAB application, which would write its report to a network shared drive.
Would this work well? Is there a different/easier solution available?
I'd need more details to know whether this would "work well", but in terms of the general outline you presented, it seems feasible.
When you say "the data would be sent through an API to the MATLAB application", what exactly do you mean here? What API are we talking about? And what is "the Matlab application"? Do you mean just installing regular Matlab on this server machine, and then having Django or another web application server run the matlab command, as a distinct process (corresponding to a single matlab -batch execution, probably?), to service each request? Two issues here: One, Matlab is a large program with a slow startup time. Matlab Production Server and similar solutions handle this by maintaining a pool of already-running "warmed up" Matlab worker processes to service incoming requests. Two, licensing: the "regular" Matlab licenses are aimed at interactive use by humans; running Matlab like that on the server side to handle requests for a web app used by multiple humans may not be covered. Talk to your organization's lawyer or IT licensing expert before doing this.
@Will is right here: the Matlab Production Server is the product or "solution" that MathWorks provides for this scenario. And it's relatively easy to use. But it ain't cheap. (On the other hand, when you're talking about Matlab, what is?)
If you have someone who can do a bit of system programming for you, there's a more affordable alternative: use the Matlab Compiler to build your Matlab code into a "CTF" DLL, and write a thin custom server wrapper on top of that which accepts service calls for the particular Matlab things you need done and dispatches them to your code. (Run that in a pool of multiple processes if you want to be able to service multiple concurrent clients.) "Compiled" Matlab libraries that run against the Matlab Runtime do not require any additional licenses for their runtime execution.
Big questions here are: Do you want this to Go Fast? How many clients are you going to have, and how often are they going to be sending requests? What kind of data will be contained in the inputs and outputs to this Matlab code?
Have a look at the -batch option to the matlab command. Have a look at the various deployment options supported by the Matlab Compiler. And talk to your organization's lawyer.
If you decide to go the matlab -batch route, you probably do not want to pass the inputs to your Matlab code as command-line arguments. Command lines and environment variables only pass simple strings, and parsing those sucks, especially once you get in to nontrivial numerics. Bundle up all your inputs as JSON files, MAT files, or something similar, and then pass just a reference to those files (or SQL blobs, or similar) on the command line.
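For example, the web side of that could be as simple as the sketch below; the process_request function name, the job directory, and the timeout are placeholders for whatever your Matlab code actually looks like:

    import json
    import subprocess
    import tempfile

    def run_matlab_job(form_data: dict) -> None:
        """Bundle the form inputs into a JSON file and hand it to Matlab in batch mode."""
        # Write all inputs to a JSON file instead of passing them as
        # command-line arguments (the directory is a placeholder).
        with tempfile.NamedTemporaryFile(mode="w", suffix=".json",
                                         delete=False, dir="/srv/matlab-jobs") as f:
            json.dump(form_data, f)
            request_path = f.name

        # process_request is a placeholder for your own Matlab function; it
        # would read the JSON file and write its report to the shared drive.
        subprocess.run(["matlab", "-batch", f"process_request('{request_path}')"],
                       check=True, timeout=600)  # guard against runaway jobs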
Also, depending on what your Matlab code is like, GNU Octave (https://gnu.org/software/octave/index) may be an option for you. Octave is many years behind Matlab in terms of functionality and stability, and doesn't have equivalents of all the Matlab Toolboxes, so it isn't a drop-in replacement in general terms. But for simple stuff, it works. And it is unencumbered by licensing, and has faster startup times in command-line mode.
An easier solution (though possibly not the cheapest if your existing MATLAB license doesn't already cover it) is to use MATLAB Production Server which basically exists for this category of problem. It has a RESTful API that would straightforwardly handle your user input use case.
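For illustration, a call against the MATLAB Production Server RESTful API could look roughly like the following; the host, port, archive name (reportgen), and function name (generate_report) are assumptions for the sake of the example:

    import requests

    # POST to http://<host>:<port>/<deployed-archive>/<function>.  The body
    # carries the requested number of outputs and the right-hand-side arguments.
    resp = requests.post(
        "http://mps-host:9910/reportgen/generate_report",
        json={"nargout": 1, "rhs": [{"start": "2020-01-01", "end": "2020-01-31"}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["lhs"])  # the function's return values, JSON-encoded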

How to execute an untrusted function efficiently in a cross-platform way?

I am writing an open source, cross-platform application in C++ that targets Windows, Mac, and Linux on x86 CPUs. The application produces a stream of data (integers) that needs to be validated, and my application will perform actions depending on the validation result. There are multiple validators, which we shall call "modules", and they can be swapped out for one another.
Anybody can write and share modules with other users, so my application has to ensure that maliciously-written modules cannot harm the user in any way (perhaps except via high CPU usage, in which case my application should be able to kill the module after some amount of time - this can be done by using a surrogate process). Furthermore, the stream of data is being sent at a high rate (up to 100kB/s).
Fortunately, the code in these modules usually consists of simple arithmetic operations on data in the stream (typically processing each incoming integer in constant time), and the modules do not need to make any system calls (not even heap allocation).
I've considered the following possibilities (all of them with some drawbacks):
Kernel-based sandboxing
On Linux, we can use secure computing (seccomp), which prevents a process from making any system calls except for reading and writing on already-open file descriptors. Module creators would write their modules as a single function that takes input and output file descriptors (in a language like C or C++), compile it into a shared object, and distribute that shared object.
My application would prepare the input and output file descriptors, then fork() itself or exec() a surrogate process, and this child process would use dlopen() and dlsym() to get a pointer to the untrusted function. Strict secure computing mode would then be enabled before executing the untrusted function.
Drawbacks: dlopen() will actually run the constructor functions from the shared library. These would have to be properly sandboxed as well, and I can't think of a way to do so. And, of course, this will only work on Linux. As far as I know, there is no way to ban WinNT system calls on Windows, so a similar solution on Windows won't be very secure.
Application-level sandboxing
(Any form of application-level sandboxing means that we cannot run untrusted machine code of any form: an untrusted function could overwrite its return address or data outside its call stack, thereby compromising the whole application and effectively acquiring any permissions that the original application had.)
Make modules use a simple scripting language that does not support any system calls - just pure arithmetic operations and perhaps the ability to read an input stream. My application would contain an interpreter for this language.
Drawbacks: Unfortunately, I have not found such a scripting language. Many scripting languages have extensive functionality (e.g. Python), and a sandbox (e.g. PyPy's sandbox) simply filters OS system calls. I would be shipping a lot of unneeded interpreter code with my application, and it is arguably more prone to security issues due to bugs in the interpreter than a language that simply has no functionality beyond simple calculations and control flow (basically a function that does not make any system calls). Furthermore, marshalling the data between C++ (machine code) and the scripting language is usually a slow process.
Distribute modules with a 'safe' compiled language that again does not support any system calls. My application would contain a JIT for this language.
Marshalling won't be necessary because my application would call into the JITted machine code of the untrusted module, so performance across this boundary should be fast. The untrusted module now won't be able to corrupt the stack, attempt return-oriented programming, or perform any other malicious actions, due to the language restrictions and checks of the 'safe' language. WebAssembly is the first and only language that comes to mind (if it can be called a language). (As far as I can tell, WebAssembly seems to provide the security guarantees for my use case, right?)
Drawbacks: The existing implementations of WebAssembly seem to be all browser-based, so I would have to steal an implementation from an open source browser. This does seem like a lot of work, considering that I would have to uncouple it from all the JavaScript and other browser bits. However, a standalone WebAssembly JIT based on LLVM seems to be under development.
Question:
What is the best way to efficiently execute an untrusted function in a way that works on Windows, Mac, and Linux?
Right now, I think the scripting language approach would probably be the safest and the easiest for module writers, but for a more efficient solution, WebAssembly is probably better. Am I right, or are there better or easier solutions that I haven't thought of?
(Remark: I think several pairs of tags used in this question have never been seen together before!)
Regarding WebAssembly:
Unfortunately, there is no production-quality stand-alone implementation yet. I expect some to show up in the future, but it hasn't happened yet.
For historical reasons, existing production implementations are all part of a JavaScript VM. Fortunately, none of these VMs is tied to a browser. If you don't mind including some unused JS baggage, you can embed them as they are (ripping out the JS would be very hard). One problem, though, is that these VMs don't yet provide embedding interfaces for Wasm specifically. You have to go through JS, which is stupid.
There is an initial design for a C and C++ API for WebAssembly, which would give direct access to an embedded Wasm VM. It is meant to be VM-neutral, i.e., could be implemented by any existing VM (the repo contains a prototype implementation on top of V8). This may evolve into a standard, but I cannot promise any timeline. Right now it's only for the brave.

How to tell if a library is COM or DCOM?

I've been given the task of recreating a DLL with slight modifications to the original, to be executed when another program runs. Basically, a mocked-up version of the DLL for testing/simulating other parts of a larger system.
I've been searching to see if there is any method to check whether the library is COM or DCOM but have not found any. I am aware of the differences, but given a DLL library, how can I tell if it is a COM or DCOM library?
Additionally, is there any way to swap out a COM/DCOM library for a newer technology without changing the parts of the code that call the COM/DCOM library?
From the executable code alone you cannot tell which it is, except that if there is a proxy/stub DLL shipped with it you can assume it is DCOM.
The visible differences are in how the thing is registered. Digging into the registration process can be easy or not so easy depending on how registration is implemented. If the registration parameters are hand-glued inside the code, you'd have to reverse-engineer it the harder way. If registration uses a .rgs file stored in the resources, you can just extract it and see how registration is done. Anyway, your best bet is to use a VM and export its registry, then register the component, export the registry again, and see the difference - what was added.
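If you'd rather script the comparison than eyeball full registry exports, here is a rough sketch of the same idea using Python's standard winreg module; the DLL path is a placeholder, and among the new entries an AppID carrying DllSurrogate or RemoteServerName values is a strong hint that DCOM activation is intended:

    import subprocess
    import winreg

    def clsid_keys() -> set:
        """Snapshot the set of CLSID subkeys currently registered in HKCR."""
        keys = set()
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, "CLSID") as clsid:
            i = 0
            while True:
                try:
                    keys.add(winreg.EnumKey(clsid, i))
                    i += 1
                except OSError:  # no more subkeys
                    break
        return keys

    before = clsid_keys()
    # Register the library under test (the path is a placeholder).
    subprocess.run(["regsvr32", "/s", r"C:\test\mystery.dll"], check=True)
    added = clsid_keys() - before
    print("CLSIDs added by registration:", added)
    # Inspect each new CLSID (and any associated AppID key) by hand to see
    # whether DCOM-related values such as DllSurrogate or RemoteServerName appear.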
Wow, you are going old school here!
If I remember correctly, any valid COM object can also participate in DCOM. Isn't the wiring for the remote procedure calls done at the operating system level?
From https://msdn.microsoft.com/en-us/library/aa295360(v=vs.60).aspx:
Once COM was adapted to work across a network, then any interface that
was not tied to a local execution model (some interfaces have inherent
reliance on local machine facilities, such as those drawing interfaces
whose methods have handles to device contexts as parameters) would
have the capability of being distributed: An interface consumer would
make a request for a given interface; that interface may be provided
by an instance of an object running (or to be run) on a different
machine. The distribution mechanism inside COM would connect the
consumer to the provider in such a way that method calls made by the
consumer would appear at the provider end, where they would be
executed. Any return values would then be sent back to the consumer.
To all intents and purposes, the act of distribution is transparent to
both the consumer and the provider.
Such a variety of COM does now exist. DCOM (for ‘distributed COM’), is
shipped with versions of Windows NT beginning with version 4.0. Since
late 1996, it has also been available for Windows 95 and its
derivatives. In both cases, DCOM comprises a set of replacement and
additional DLLs, with some utilities, which provide both local and
remote COM capabilities. It is therefore now an inherent part of
Win32-based platforms, and will be made available on other platforms
by other organizations over time.

Automating compatibility testing against many programs

Short version: What's the best way to automate compatibility testing
against a large number of third-party programs?
The details:
I develop a program whose core feature is interacting with a variety of
different pieces of music player software via their respective RPC
interfaces. The RPC itself typically happens either via D-Bus or via some
client library specific to a particular player. Since each music player
has its own unique RPC interface, my program requires special code to
handle each.
Testing all this code is increasingly a problem for me. At last count
there are fifteen (!) different music players my program knows how to talk
to, and the interface details can vary from one version of a player to the
next. Manually testing my program against the latest version of each of
the players I'm trying to support, as well as a few older versions, is
tedious and error-prone, so I'm looking for a way to automate this as much
as possible.
The test cases themselves aren't the problem; those are just a matter of
calling a sequence of functions on a player's RPC interface and checking
the return values and/or asynchronous callbacks for the expected result.
No, the problem is having a framework to run the tests automatically.
Here are the challenges I see:
Each player maintains persistent state, usually as dotfiles under the
user's home directory. The state consists of things like the music
library, playlists, etc. These files need to be reverted to a known
initial state before each test. (Deleting it entirely isn't always an
option, since then the GUI-based players will present a setup wizard the
next time they start instead of running normally.)
Those initial states may be partially dynamic. For example, a music
library will contain full paths to the music files within it, but the
paths to the actual "music" files used for testing will vary from
machine to machine and won't be known until runtime.
The players to test against will probably be installed under
non-standard locations which will vary from system to system, in order
to have multiple versions of each installed in parallel. The framework
will probably need to know which player and version it's testing against
before the player is started, so it can initialize the player's state
files accordingly.
Since I don't have any control over development of the music players my
program interacts with, I can't modify their behavior to make it easier
for me to test against them.
What I'd like to do is set up a VM with a bunch of different players (and
a bunch of different versions of each player) installed, and then be able
to test my program against each of them in turn automatically. Ideally,
it would be possible for someone else to set up their own VM to run tests
in themselves, presumably only needing to tell the test framework which
players are installed where.
So, what's the best way to automate compatibility testing against a large
number (several dozen) of third-party programs?
In case it affects the recommendations, my program is written in Python,
and I'm using GNU autotools as the build framework.
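To make the state-reset part concrete, here is a rough sketch of the kind of per-player setup step I imagine the framework needing; the template layout, config file name, and placeholder token are all made up for illustration:

    import os
    import shutil
    import tempfile

    def prepare_player_home(template_dir: str, music_dir: str) -> str:
        """Copy a player's pristine dotfiles into a fresh HOME, filling in
        machine-specific paths (such as the test music library) at runtime."""
        home = tempfile.mkdtemp(prefix="player-home-")
        shutil.copytree(template_dir, home, dirs_exist_ok=True)

        # The template config contains a placeholder token instead of
        # hard-coded music paths; substitute the real path at runtime.
        config_path = os.path.join(home, ".config", "someplayer", "library.conf")
        with open(config_path) as f:
            config = f.read()
        with open(config_path, "w") as f:
            f.write(config.replace("@MUSIC_DIR@", music_dir))

        return home  # point HOME at this directory before launching the player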
If this is a Windows-only environment, one way to go is MS Hyper-V.
Link: http://www.microsoft.com/virtualization/en/us/solution-application-development.aspx
It supports taking an image of, for example, a Vista installation, creating a new copy of it by script, pushing an installation into the new image, and having a basic installation plus added software up and running within minutes.
The MS Office team uses this; test engineers can order any version and language edition of Windows in combination with any release of Office within a few minutes.
The big problem is that this costs a good bit of money, and Hyper-V is quite a complex product to set up.
An alternative could be to use VirtualBox (open source) and write your own scripts to automate installation of an image with new versions of the software under test. I have done this with a standard image that reads a script from a network folder on startup and installs whatever software is needed for the test. It's not a fully automated solution, but it saved my team a lot of time.
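If you go the VirtualBox route, the per-run automation can be as small as restoring a known snapshot and booting the machine headless. A minimal sketch in Python, assuming a VM named "player-test" with a snapshot called "clean" (both placeholder names):

    import subprocess

    VM = "player-test"   # placeholder VM name
    SNAPSHOT = "clean"   # snapshot taken with the players already installed

    def reset_and_start_vm() -> None:
        """Roll the VM back to its pristine snapshot and boot it headless."""
        subprocess.run(["VBoxManage", "snapshot", VM, "restore", SNAPSHOT], check=True)
        subprocess.run(["VBoxManage", "startvm", VM, "--type", "headless"], check=True)

    def stop_vm() -> None:
        """Power the VM off so the next run can restore the snapshot again."""
        subprocess.run(["VBoxManage", "controlvm", VM, "poweroff"], check=True)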

Is this a reasonable "Application entry point"?

I have recently come across a situation where code is dynamically loading some libraries, wiring them up, then calling what is termed the "application entry point" (one of the libraries must implement IApplication.Run()).
Is this a valid "application entry point"?
I would always have considered the application entry point to come before the loading of the libraries, and I found it slightly misleading that IApplication.Run() is called only after a considerable amount of work has already been done.
The terms application and system are so widely and diversely used that you need to agree upfront with your conversation partner what they mean. E.g. sometimes an application is something with a UI, and a system is 'UI-less'. In general it's just a case of you say potato, I say potato.
As for the example you use: that's just what a runtime (e.g. .NET or Java) does: load a set of libraries and call the application entry point, i.e. the "main" method.
So in your case, the code loading the libraries is doing just the same, probably calling a method on an interface; you could then consider the loading code to be the runtime for that application. It's just a matter of perspective.
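As a language-neutral illustration of that perspective, here is a tiny "runtime" sketched in Python that dynamically loads a library and hands control to its agreed-upon entry point; the module and member names are invented for the example:

    import importlib

    def run_application(module_name: str) -> None:
        """Act as a minimal 'runtime': load the library and call its entry point."""
        module = importlib.import_module(module_name)  # dynamic load
        app = module.Application()                     # the agreed-upon contract
        app.run()                                      # the application entry point

    # The host decides what to load; from the loaded code's point of view,
    # run() is where its world begins.  "my_plugin" is a placeholder name.
    run_application("my_plugin")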
The term "application" can mean whatever you want it to mean. "Application" merely means a collection of resources (libraries, code, images, etc) that work together to help you solve a problem.
So to answer your question, yes, it's a valid use of the term 'application'.
Application on its own actually means nothing. It is often used by people to talk about computer programs that provide some value to the user. A more correct term is application software, which has the following definition:
Application software is a subclass of
computer software that employs the
capabilities of a computer directly
and thoroughly to a task that the user
wishes to perform. This should be
contrasted with system software which
is involved in integrating a
computer's various capabilities, but
typically does not directly apply them
in the performance of tasks that
benefit the user. In this context the
term application refers to both the
application software and its
implementation.
And since application really means application software, and software is any piece of code that performs any kind of task on a computer, I'd say a library can also be an application.
Most terms are of an artificial nature anyway. Is a plugin not an application? Is the Flash plugin of your browser not an application? People say no, it's just a plugin. Why? Because it can't run on its own; it needs to be loaded into a real process. But there is no definition saying only things that "can run on their own" are applications. The same holds true for a library. The core application could just be an empty container, and all logic and functionality, even the interaction with the user, could be performed by plugins or libraries, in which case that would be more of an application than the empty container that just provides some context for the application to run. Compare this to Java. A Java application can't run on its own; it must run within a Java Virtual Machine (JVM). Does that mean the JVM is the application and the Java code is just... well, what? Isn't the Java code the real application, and the JVM just an empty runtime environment that provides nothing to the end user without the loaded Java code?
I think in this context "application entry point" means "the point at which the application (your code) enters the library".
I think probably what you're referring to is the main() function in C/C++ code or WinMain in a Windows app. That is, it's the point where execution is normally started in an app. Your question is pretty broad and vague--for example, which OS are you running this on--but this may be what you're looking for. This might also address the question.
Bear in mind when you're asking questions, details are your friend. People can give you a much better, more informed answer when you provide them with details.
EDIT:
In a broader context, consider what has to happen from the standpoint of the OS. When the user specifies that they want to run an app, the OS has to load the app from the hard drive, and once the app is loaded into memory, it has to pass control to some point in the memory block occupied by the newly loaded app to continue execution. That would be the "Application Entry Point". When an app is constructed with dynamically linked code, the OS has to load all that dynamically linked code in order to get the correct app image into memory. Loading up those shared bits of code does not change the fact that the OS must have a point to which to pass control when the app is loaded into memory.