Odd title, I know. What I REALLY mean to ask is something like "can I create a TrustedBSD MAC Framework or iOS sandbox for NSBundles?" I have a very dynamic application in which modules can be "plugged in" or "un plugged" to a "switchboard" at any time. The user operating it is on another computer, perhaps miles away. There's no way for users to know how the switchboard modules can be compliant and not wreck the system in their absence. I know iOS sandboxing and TrustedBSD (as well as Lion's Seatbelt.framework) can do this for processes, but how can I do this for bundle code?
Things I've thought of:
Static binary analysis of all posix calls and ObjC messaging (seems impossible as binaries are.... binary at this point, not code).
Tracing ObjC messages at runtime and logging them, locking the module out after a single stray call. (This wouldn't affect POSIX calls or any C calls for that matter, nor assembly, and would have at minimum one call already sent).
Create a sandboxed external process using XPC and load modules there, and use PDO to route switchboard calls to that process (very doable, but I need snow leopard compatibility here).
Any Ideas? I realize that a bundle, once loaded, ends up as PART of your application, so this adds further problems on implementation of security.,
The instant that you load a bundle into your process, you are giving it full control over your process. (This is because a bundle can contain code that runs at initialization time, before you even call any of its functions.)
Tracing messages at runtime is pointless, as a malicious bundle could simply disable the code responsible for message tracing before it fires. (It's running in the same memory space, after all, and it needn't use any ObjC messages to do this.)
In short: do not load bundles you do not trust. If you must make use of an untrusted bundle, use some form of sandboxing (whether that's XPC or something else).
Related
I am writing an open source cross-platform application written in C++ that targets Windows, Mac, and Linux on x86 CPUs. The application produces a stream of data (integers) that needs to be validated, and my application will perform actions depending on the validation result. There are multiple validators, which we shall call "modules", and they can be swapped out for one another.
Anybody can write and share modules with other users, so my application has to ensure that maliciously-written modules cannot harm the user in any way (perhaps except via high CPU usage, in which case my application should be able to kill the module after some amount of time - this can be done by using a surrogate process). Furthermore, the stream of data is being sent at a high rate (up to 100kB/s).
Fortunately, the code in these modules are usually simple arithmetic operations on data in the stream (usually processing each incoming integer in constant time), and they do not need to make any system calls (not even heap allocation).
I've considered the following possibilities (all of them with some drawbacks):
Kernel-based sandboxing
On Linux, we can use secure computing (seccomp), which prevents a process from making any system calls except for reading and writing with already-open file destriptors. Module creators would write their modules as a single function that takes in input and output file descriptors (in a language like C or C++) and compile it into a shared object, then distribute that shared object.
My application will probably prepare input and output file descriptors, then fork() itself or exec() a surrogate process, and this child process uses dlopen() and dlsym() to get a pointer to the untrusted function. Then strict secure computing mode will be enabled, before executing the untrusted function.
Drawbacks: There's the problem that dlopen() will actually run the constructor function from the shared library. This would have to be properly sandboxed as well, and I can't think of a way to do so. Also, of course, this thing will only work on Linux. As far as I know, there is no way to ban WinNT system calls on Windows, so a similar solution on Windows won't be very secure.
Application-level sandboxing
[[ Any form of application-level sandboxing means that we cannot run untrusted machine code of any form. An untrusted function can overwrite its return value or data outside its call stack, thereby compromising the whole application (and effectively acquiring any permissions that the original application had). ]]
Make modules use a simple scripting language that does not support any system calls - just pure arithmetic operations and perhaps the ability to read an input stream. My application would contain an interpreter for this language.
Drawbacks: Unfortunately I have not found this scripting language. Many scripting languages have extensive functionalities (e.g. Python) and a sandbox (e.g. PyPy's sandbox) simply filters OS system calls. I would be shipping a lot of useless interpreter code with my application, and it arguably is more prone to security issues due to bugs in the intepreter than a language with simply no functionality to do things other than simple calculations and control flow instructions (basically a function that does not make any system calls). Furthermore, marshalling the data between C++ (machine code) and the scripting language is usually a slow process.
Distribute modules with a 'safe' compiled language that again does not support any system calls. My application would contain a JIT for this language.
Marshalling won't be necessary because my application would call into the JITted machine code of the untrusted module, so performance across this boundary should be fast. The untrusted module now won't be able to corrupt the stack, attempt return-oriented programming, or perform any other malicious actions, due to the language restrictions and checks of the 'safe' language. WebAssembly is the first and only language that comes to mind (if it can be called a language). (As far as I can tell, WebAssembly seems to provide the security guarantees for my use case, right?)
Drawbacks: The existing implementations of WebAssembly seem to be all browser-based, so I would have to steal an implementation from an open source browser. This does seem like a lot of work, considering that I would have to uncouple it from all the JavaScript and other browser bits. However, a standalone WebAssembly JIT based on LLVM seems to be under development.
Question:
What is the best way to execute an untrusted function efficiently that works on Windows, Mac, and Linux?
Right now, I think that the scripting language way would probably be the safest, and be the easiest for module writers. But for a more efficient solution, WebAssembly is probably better. Am I right, or are there better or easier solutions that I have not thought of?
(Remark: I think several pairs of tags used in this question have never been seen together before!)
Regarding WebAssembly:
Unfortunately, there is no production-quality stand-alone implementation yet. I expect some to show up in the future, but it hasn't happened yet.
For historical reasons, existing production implementations are all part of a JavaScript VM. Fortunately, none of these VMs is tied to a browser. If you don't mind including some unused JS baggage, you can embed them as they are (ripping out the JS would be very hard). One problem, though, is that these VMs don't yet provide embedding interfaces for Wasm specifically. You have to go through JS, which is stupid.
There is an initial design for a C and C++ API for WebAssembly, which would give direct access to an embedded Wasm VM. It is meant to be VM-neutral, i.e., could be implemented by any existing VM (the repo contains a prototype implementation on top of V8). This may evolve into a standard, but I cannot promise any timeline. Right now it's only for the brave.
I am working with the QuickTime API and need to perform a few lengthy (as in hours) operations in the background. Unfortunately, it is not multi-thread friendly, so I am falling back to perform the tasks in a separate process, so all QuickTime related calls can happen in its main thread.
After launching it, I need a way of getting feedback on its progress, since the operations can take very long.
I am unsure of how to do this, specifically:
Should the separate process be compiled as another cocoa app or a command line tool?
How to launch it from the main cocoa app?
How to periodically get an object from it to get status information?
How to determine when it finished?
How to avoid showing a window/console when called?
How to have it part of the .app bundle so that it does not appear as a separate executable to the user?
These are really 6+ questions, but they are very related and very specific, and I think anyone needing to launch external processes (instead of spawning worker threads) can benefit from their answers. Generic code examples would be very helpful.
If it is possible, then implement the functionality in a command line tool, or another form of GUI-less application. For Cocoa applications it is possible to prevent them appearing on the Dock or in the Force Quit dialog, however a command line tool is a single binary file which does that anyway, so that would probably be a better way.
In terms of launching the tool, NSTask & NSPipe are your friends in this endeavour. The tool can definitely be kept inside your main Application's bundle, inside the Resources directory or some such, and then launched when needed. You can use the pipe to communicate back and forth.
I don't have any example code to hand, and its been a long while since I've had occasion to use either of these classes so the information I can give is limited, but it should be enough to point you in the right direction.
My iOS app has grown large and take long time to load. Lots of class in my application is not always needed, some may never been called at all depending on user. How can I split my application into several executable file and make the main application call/load them when needed (run-time)?
It's much easier if I can split my application into several executable because I can divide my team's workload into several executable with less dependency.
iOS does not support dynamically linking to anything but system-supplied libraries. Any other library must be statically linked into your application.
There's one way to make an end run around this: You can use interpreted code, provided it is shipped with your app. So, if you can split your app into an Obj-C controller and services layer driven by dynamically executed Lua code (for example), you could avoid loading code you never use.
Windows still use DLLs and Mac programs seem to not use DLL at all. Are there benefits or disadvantages of using either technique?
If a program installation includes all the DLL it requires so that it will work 100% well, will it be the same as statically linking all the libraries?
MacOS X, like other flavours of Unix, use shared libraries, which are just another form of DLL.
And yes both are advantageous as the DLL or shared library code can be shared between multiple processes. It does this by the OS loading the DLL or shared library and mapping it into the virtual address space of the processes that use it.
On Windows, you have to use dynamically-loaded libraries because GDI and USER libraries are avaliable as a DLL only. You can't link either of those in or talk to them using a protocol that doesn't involve dynamic loading.
On other OSes, you want to use dynamic loading anyway for complex apps, otherwise your binary would bloat for no good reason, and it increases the probably that your app would be incompatible with the system in the long run (However, in short run static linking can somewhat shield you from tiny breaking changes in libraries). And you can't link in proprietary libraries on OSes which rely on them.
Windows still use DLLs and Mac
programs seem to not use DLL at all.
Are they benefits or disadvantages of
using either technique?
Any kind of modularization is good since it makes updating the software easier, i.e. you do not have to update the whole program binary if a bug is fixed in the program. If the bug appears in some dll, only the dll needs to be updated.
The only downside with it imo, is that you introduce another complexity into the development of the program, e.g. if a dll is a c or c++ dll, different calling conventions etc.
If a program installation includes all
the DLL it requires, will it be the
same as statically linking all the
libraries?
More or less yes. Depends on if you are calling functions in a dll which you assume static linkage with. The dll could just as well be a "free standing" dynamic library, that you only can access via LoadLibrary() and GetProcAddress() etc.
One big advantage of shared libraries (DLLs on Windows or .so on Unix) is that you can rebuild the library and its consumers separately while with static libraries you have to rebuild the library and then relink all the consumers which is very slow on Unix systems and not very fast on Windows.
MacOS software uses "dll's" as well, they are just named differently (shared libraries).
Dll's make sense if you have code you want to reuse in different components of your software. Mostly this makes sense in big software projects.
Static linking makes sense for small single-component applications, when there is no need for code reuse. It simplifies distribution since your component has no external dependencies.
Besides memory/disk space usage, another important advantage of using shared libraries is that updates to the library will be automatically picked up by all programs on the system which use the library.
When there was a security vulnerability in the InfoZIP ZIP libraries, an update to the DLL/.so automatically made all software safe which used these. Software that was linked statically had to be recompiled.
Windows still use DLLs and Mac programs seem to not use DLL at all. Are they benefits or disadvantages of using either technique?
Both use shared libraries, they just use a different name.
If a program installation includes all the DLL it requires so that it will work 100% well, will it be the same as statically linking all the libraries?
Somewhat. When you statically link libraries to a program, you will get a single, very big file, with DLLs, you will have many files.
The statically linked file won't need the "resolve shared libraries" step (which happens while the program loads). A long time ago, loading a static program meant that the whole program was first loaded into RAM and then, the "resolve shared libraries" step happened. Today, only the parts of the program, which are actually executed, are loaded on demand. So with a static program, you don't need to resolve the DLLs. With DLLs, you don't need to load them all at once. So performance wise, they should be on par.
Which leaves the "DLL Hell". Many programs on Windows bring all DLLs they need and they write them into the Windows directory. The net effect is that the last installed programs works and everything else might be broken. But there is a simple workaround: Install the DLLs into the same directory as the EXE. Windows will search the current directory first and then the various Windows paths. This way, you'll waste a bit of disk space but your program will work and, more importantly, you won't break anything else.
One might argue that you shouldn't install DLLs which already exist (with the same version) in the Windows directory but then, you're again vulnerable to some bad app which overwrites the version you need with something that breaks your neck. The drawback is that you must distribute security fixes for your app yourself; you can't rely on Windows Update or similar things to secure your code. This is a tight spot; crackers are making lots of money from security issues and people will not like you when someone steals their banking data because you didn't issue security fixes soon enough.
If you plan to support your application very tightly for many, say, 20 years, installing all DLLs in the program directory is for you. If not, then write code which checks that suitable versions of all DLLs are installed and tell the user about it, so they know why your app suddenly starts to crash.
Yes, see this text :
Dynamic linking has the following
advantages: Saves memory and
reduces swapping. Many processes can
use a single DLL simultaneously,
sharing a single copy of the DLL in
memory. In contrast, Windows must load
a copy of the library code into memory
for each application that is built
with a static link library. Saves
disk space. Many applications can
share a single copy of the DLL on
disk. In contrast, each application
built with a static link library has
the library code linked into its
executable image as a separate
copy. Upgrades to the DLL are
easier. When the functions in a DLL
change, the applications that use them
do not need to be recompiled or
relinked as long as the function
arguments and return values do not
change. In contrast, statically linked
object code requires that the
application be relinked when the
functions change. Provides
after-market support. For example, a
display driver DLL can be modified to
support a display that was not
available when the application was
shipped. Supports multilanguage
programs. Programs written in
different programming languages can
call the same DLL function as long as
the programs follow the function's
calling convention. The programs and
the DLL function must be compatible in
the following ways: the order in which
the function expects its arguments to
be pushed onto the stack, whether the
function or the application is
responsible for cleaning up the stack,
and whether any arguments are passed
in registers. Provides a mechanism
to extend the MFC library classes. You
can derive classes from the existing
MFC classes and place them in an MFC
extension DLL for use by MFC
applications. Eases the creation
of international versions. By placing
resources in a DLL, it is much easier
to create international versions of an
application. You can place the strings
for each language version of your
application in a separate resource DLL
and have the different language
versions load the appropriate
resources. A potential
disadvantage to using DLLs is that the
application is not self-contained; it
depends on the existence of a separate
DLL module.
From my point of view an shared component has some advantages that are somtimes realized as disadvantages.
shared component defines interfaces in your process. So you are forced to decide which components/interfaces are visible outside and which are hidden. This automatically defines which interface has to be stable and which does not have to be stable and can be refactored without affecting any code outside the component..
Memory administration in case of C++ and Windows must be well thought. So normally you should not handle memory outside of an dll that isn't freed in the same dll. If you do so your component may fail if: different runtimes or compiler version are used.
So I think that using shared coponents will help the software to get better organized.
I have recently come across a situation where code is dynamically loading some libraries, wiring them up, then calling what is termed the "application entry point" (one of the libraries must implement IApplication.Run()).
Is this a valid "Appliation entry point"?
I would always have considered the application entry point to be before the loading of the libraries and found the IApplication.Run() being called after a considerable amount of work slightly misleading.
The terms application and system are terms that are so widely and diversely used that you need to agree what they mean upfront with your conversation partner. E.g. sometimes an application is something with a UI, and a system is 'UI-less'. In general it's just a case of you say potato, I say potato.
As for the example you use: that's just what a runtime (e.g. .NET or java) does: loading a set of libraries and calling the application entry point, i.e. the "main" method.
So in your case, the code loading the libraries is doing just the same, and probably calling a method on an interface, you could then consider the loading code to be the runtime for that application. It's just a matter of perspective.
The term "application" can mean whatever you want it to mean. "Application" merely means a collection of resources (libraries, code, images, etc) that work together to help you solve a problem.
So to answer your question, yes, it's a valid use of the term 'application'.
Application on its own means actually nothing. It is often used by people to talk about computer programs that provide some value to the user. A more correct term is application software and this has the following definition:
Application software is a subclass of
computer software that employs the
capabilities of a computer directly
and thoroughly to a task that the user
wishes to perform. This should be
contrasted with system software which
is involved in integrating a
computer's various capabilities, but
typically does not directly apply them
in the performance of tasks that
benefit the user. In this context the
term application refers to both the
application software and its
implementation.
And since application really means application software, and software is any piece of code that performs any kind of task on a computer, I'd say also a library can be an application.
Most terms are of artificial nature anyway. Is a plugin no application? Is the flash plugin of your browser no application? People say no, it's just a plugin. Why? Because it can't run on it's own, it needs to be loaded into a real process. But there is no definition saying only things that "can run on their own" are applications. Same holds true for a library. The core application could just be an empty container and all logic and functionality, even the interaction with the user, could be performed by plugins or libraries, in which case that would be more an application than the empty container that just provides some context for the application to run. Compare this to Java. A Java application can't run on it's own, it must run within a Java Virtual Machine (JVM), does that mean the JVM is the application and the Java Code is just... well what? Isn't the Java code the real application and the JVM just an empty runtime environment that provides nothing to the end user without the loaded Java code?
I think in this context "application entry point" means "the point at which the application (your code) enters the library".
I think probably what you're referring to is the main() function in C/C++ code or WinMain in a Windows app. That is, it's the point where execution is normally started in an app. Your question is pretty broad and vague--for example, which OS are you running this on--but this may be what you're looking for. This might also address the question.
Bear in mind when you're asking questions, details are your friend. People can give you a much better, more informed answer when you provide them with details.
EDIT:
In a broader context consider what has to happen from the standpoint of the OS. When the user specifies that they want to run an app, the OS has to load the app from the hard drive and then when the app is loaded into memory, it has to pass control to some point in the memory blocked occupied by the newly loaded app to continue execution. That would be the "Application Entry Point". When an app is constructed with dynamically linked code the OS has to load all that dynamically linked code in order to get the correct app image into memory. Loading up those shared bits of code does not change the fact that the OS must have a point to which to pass control when the app is loaded into memory.