Is there a statically compiled programming language that is both stackless and heapless?
For data, such a language would not have a concept of memory allocation. Instead, the memory requirements of the program would be known completely at compile-time.
For code, there would not be a concept of call stack. There could be functions, but they'd be inlined at every call site.
I am specifically interested in portable languages with some form of implementation or a compiler that produces native binaries.
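For illustration only, here is roughly the style I have in mind, sketched in C++ (which of course still has a call stack, so it is not itself such a language): all data lives in statically allocated storage and the helper is forced inline, so the memory footprint is fixed at compile time.

```cpp
// Hypothetical sketch only: C++ approximating the idea. All storage is static
// (no heap, no per-call data), and the helper is inlined at its call site.
// __attribute__((always_inline)) is a GCC/Clang extension used here for effect.
#include <cstdint>

static std::uint32_t counter = 0;      // memory requirement known at compile time
static std::uint32_t samples[64];      // fixed-size, statically allocated buffer

__attribute__((always_inline)) inline void accumulate(std::uint32_t x) {
    samples[counter & 63] = x;         // no allocation, no out-of-line call
    ++counter;
}

int main() {
    for (std::uint32_t i = 0; i < 1000; ++i)
        accumulate(i);                 // expanded in place, no call-stack growth
    return 0;
}
```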
Pure x86 machine language fits your stackless and heapless constraints (within the limits of real mode). Portability is not possible unless the compiler has access to every memory location for all hardware I/O (memory-mapped locations), fixed across all supported platforms (this condition excludes all dynamic interfaces, including Plug and Play, USB, and PCI/PCIe buses).
It is entirely possible to create such a structure within severe hardware limits (every device must be compiled in and allocated at boot, as on older computers like the C64 or Apple II), but all functionality must be pre-compiled into the OS, meaning every program that could ever run on the platform.
This is not a general computing platform anymore. Program a micro-controller, GPU, or ASIC to solve the task instead.
I am writing an open-source, cross-platform application in C++ that targets Windows, Mac, and Linux on x86 CPUs. The application produces a stream of data (integers) that needs to be validated, and my application will perform actions depending on the validation result. There are multiple validators, which we shall call "modules", and they can be swapped out for one another.
Anybody can write and share modules with other users, so my application has to ensure that maliciously-written modules cannot harm the user in any way (perhaps except via high CPU usage, in which case my application should be able to kill the module after some amount of time - this can be done by using a surrogate process). Furthermore, the stream of data is being sent at a high rate (up to 100kB/s).
Fortunately, the code in these modules is usually simple arithmetic operations on data in the stream (usually processing each incoming integer in constant time), and they do not need to make any system calls (not even heap allocation).
I've considered the following possibilities (all of them with some drawbacks):
Kernel-based sandboxing
On Linux, we can use secure computing (seccomp), which prevents a process from making any system calls except for reading and writing with already-open file descriptors. Module creators would write their modules as a single function that takes in input and output file descriptors (in a language like C or C++), compile it into a shared object, then distribute that shared object.
My application will probably prepare input and output file descriptors, then fork() itself or exec() a surrogate process, and this child process uses dlopen() and dlsym() to get a pointer to the untrusted function. Then strict secure computing mode will be enabled, before executing the untrusted function.
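A minimal sketch of what the surrogate process might look like (Linux-only; "module.so" and "validate" are placeholder names, and error handling is cut to the bare minimum):

```cpp
// Minimal sketch of a Linux-only surrogate process: load the untrusted module,
// then enter strict seccomp before calling into it. "module.so" and "validate"
// are placeholder names. Build with: g++ surrogate.cpp -ldl
#include <dlfcn.h>
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdio>

typedef void (*validate_fn)(int in_fd, int out_fd);

int main() {
    // Note: dlopen() already runs the library's constructors, and this happens
    // *before* the sandbox is in place (the drawback mentioned below).
    void* handle = dlopen("./module.so", RTLD_NOW | RTLD_LOCAL);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    auto fn = reinterpret_cast<validate_fn>(dlsym(handle, "validate"));
    if (!fn) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    // Strict mode: from here on only read(), write(), _exit() and sigreturn()
    // are permitted; any other system call kills the process with SIGKILL.
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
        std::perror("prctl");
        return 1;
    }

    fn(0, 1);   // untrusted code, restricted to the already-open descriptors

    // exit_group() (used by a normal return from main) is not on the allow
    // list, so leave via the plain exit syscall.
    syscall(SYS_exit, 0);
}
```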
Drawbacks: There's the problem that dlopen() will actually run the constructor function from the shared library. This would have to be properly sandboxed as well, and I can't think of a way to do so. Also, of course, this thing will only work on Linux. As far as I know, there is no way to ban WinNT system calls on Windows, so a similar solution on Windows won't be very secure.
Application-level sandboxing
[[ Any form of application-level sandboxing means that we cannot run untrusted machine code of any form. An untrusted function can overwrite its return value or data outside its call stack, thereby compromising the whole application (and effectively acquiring any permissions that the original application had). ]]
Make modules use a simple scripting language that does not support any system calls - just pure arithmetic operations and perhaps the ability to read an input stream. My application would contain an interpreter for this language.
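To make the idea concrete, here is a minimal sketch of how small such an interpreter could be (a hypothetical postfix arithmetic evaluator, not an existing language); it can only do integer arithmetic on the incoming value and has no way of its own to reach the OS:

```cpp
// Hypothetical sketch: a tiny postfix arithmetic interpreter over incoming
// integers. It knows only numbers, "x" (the current input value), and + - *,
// so a malicious "module" can at worst compute the wrong answer.
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Evaluates a postfix program such as "x 3 * 7 +" against one incoming
// integer; returns nothing if the program is malformed.
std::optional<std::int64_t> eval(const std::string& program, std::int64_t x) {
    std::vector<std::int64_t> stack;
    std::istringstream tokens(program);
    std::string tok;
    while (tokens >> tok) {
        if (tok == "x") {
            stack.push_back(x);
        } else if (tok == "+" || tok == "-" || tok == "*") {
            if (stack.size() < 2) return std::nullopt;
            std::int64_t b = stack.back(); stack.pop_back();
            std::int64_t a = stack.back(); stack.pop_back();
            stack.push_back(tok == "+" ? a + b : tok == "-" ? a - b : a * b);
        } else {
            try { stack.push_back(std::stoll(tok)); }   // numeric literal
            catch (...) { return std::nullopt; }
        }
    }
    if (stack.size() != 1) return std::nullopt;
    return stack.back();
}
```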
Drawbacks: Unfortunately I have not found this scripting language. Many scripting languages have extensive functionality (e.g. Python), and a sandbox (e.g. PyPy's sandbox) simply filters OS system calls. I would be shipping a lot of useless interpreter code with my application, and it is arguably more prone to security issues due to bugs in the interpreter than a language with simply no functionality to do things other than simple calculations and control flow (basically a function that does not make any system calls). Furthermore, marshalling the data between C++ (machine code) and the scripting language is usually a slow process.
Distribute modules with a 'safe' compiled language that again does not support any system calls. My application would contain a JIT for this language.
Marshalling won't be necessary because my application would call into the JITted machine code of the untrusted module, so performance across this boundary should be fast. The untrusted module now won't be able to corrupt the stack, attempt return-oriented programming, or perform any other malicious actions, due to the language restrictions and checks of the 'safe' language. WebAssembly is the first and only language that comes to mind (if it can be called a language). (As far as I can tell, WebAssembly seems to provide the security guarantees for my use case, right?)
Drawbacks: The existing implementations of WebAssembly seem to be all browser-based, so I would have to steal an implementation from an open source browser. This does seem like a lot of work, considering that I would have to uncouple it from all the JavaScript and other browser bits. However, a standalone WebAssembly JIT based on LLVM seems to be under development.
Question:
What is the best way to execute an untrusted function efficiently that works on Windows, Mac, and Linux?
Right now, I think that the scripting language way would probably be the safest, and be the easiest for module writers. But for a more efficient solution, WebAssembly is probably better. Am I right, or are there better or easier solutions that I have not thought of?
(Remark: I think several pairs of tags used in this question have never been seen together before!)
Regarding WebAssembly:
Unfortunately, there is no production-quality stand-alone implementation yet. I expect some to show up in the future, but it hasn't happened yet.
For historical reasons, existing production implementations are all part of a JavaScript VM. Fortunately, none of these VMs is tied to a browser. If you don't mind including some unused JS baggage, you can embed them as they are (ripping out the JS would be very hard). One problem, though, is that these VMs don't yet provide embedding interfaces for Wasm specifically. You have to go through JS, which is stupid.
There is an initial design for a C and C++ API for WebAssembly, which would give direct access to an embedded Wasm VM. It is meant to be VM-neutral, i.e., could be implemented by any existing VM (the repo contains a prototype implementation on top of V8). This may evolve into a standard, but I cannot promise any timeline. Right now it's only for the brave.
Windows still uses DLLs and Mac programs seem not to use DLLs at all. Are there benefits or disadvantages to using either technique?
If a program installation includes all the DLLs it requires so that it will work 100% well, will it be the same as statically linking all the libraries?
MacOS X, like other flavours of Unix, uses shared libraries, which are just another form of DLL.
And yes, both are advantageous, as the DLL or shared library code can be shared between multiple processes: the OS loads the DLL or shared library once and maps it into the virtual address space of each process that uses it.
On Windows, you have to use dynamically loaded libraries because the GDI and USER libraries are available as DLLs only. You can't link either of those in or talk to them using a protocol that doesn't involve dynamic loading.
On other OSes, you want to use dynamic loading anyway for complex apps, otherwise your binary would bloat for no good reason, and it increases the probability that your app will be incompatible with the system in the long run (however, in the short run static linking can somewhat shield you from tiny breaking changes in libraries). And you can't link in proprietary libraries on OSes that rely on them.
Windows still uses DLLs and Mac programs seem not to use DLLs at all. Are there benefits or disadvantages to using either technique?
Any kind of modularization is good since it makes updating the software easier, i.e. you do not have to update the whole program binary when a bug is fixed; if the bug is in some DLL, only that DLL needs to be updated.
The only downside, IMO, is that you introduce additional complexity into the development of the program, e.g. whether a DLL is a C or C++ DLL, different calling conventions, etc.
If a program installation includes all the DLLs it requires, will it be the same as statically linking all the libraries?
More or less, yes. It depends on whether you are calling functions in a DLL that you link against as if it were static (via its import library). The DLL could just as well be a "free-standing" dynamic library that you can only access via LoadLibrary() and GetProcAddress(), etc.
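A minimal sketch of that "free-standing" case (run-time loading on Windows; "plugin.dll" and "compute" are placeholder names):

```cpp
// Sketch of run-time loading: no import library, the DLL is located and bound
// entirely at run time via LoadLibrary()/GetProcAddress().
#include <windows.h>
#include <cstdio>

typedef int (*compute_fn)(int);

int main() {
    HMODULE lib = LoadLibraryA("plugin.dll");
    if (!lib) { std::printf("LoadLibrary failed: %lu\n", GetLastError()); return 1; }

    auto compute = reinterpret_cast<compute_fn>(GetProcAddress(lib, "compute"));
    if (!compute) { std::printf("GetProcAddress failed: %lu\n", GetLastError()); return 1; }

    std::printf("compute(21) = %d\n", compute(21));
    FreeLibrary(lib);   // release the library once we are done with it
    return 0;
}
```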
One big advantage of shared libraries (DLLs on Windows or .so on Unix) is that you can rebuild the library and its consumers separately, while with static libraries you have to rebuild the library and then relink all the consumers, which is very slow on Unix systems and not very fast on Windows.
MacOS software uses "DLLs" as well; they are just named differently (shared libraries).
DLLs make sense if you have code you want to reuse in different components of your software. Mostly this makes sense in big software projects.
Static linking makes sense for small single-component applications, when there is no need for code reuse. It simplifies distribution since your component has no external dependencies.
Besides memory/disk space usage, another important advantage of using shared libraries is that updates to the library will be automatically picked up by all programs on the system which use the library.
When there was a security vulnerability in the InfoZIP ZIP libraries, an update to the DLL/.so automatically made all software safe which used these. Software that was linked statically had to be recompiled.
Windows still uses DLLs and Mac programs seem not to use DLLs at all. Are there benefits or disadvantages to using either technique?
Both use shared libraries, they just use a different name.
If a program installation includes all the DLLs it requires so that it will work 100% well, will it be the same as statically linking all the libraries?
Somewhat. When you statically link libraries into a program, you get a single, very big file; with DLLs, you will have many files.
The statically linked file won't need the "resolve shared libraries" step (which happens while the program loads). A long time ago, loading a static program meant that the whole program was first loaded into RAM and then the "resolve shared libraries" step happened. Today, only the parts of the program that are actually executed are loaded on demand. So with a static program, you don't need to resolve the DLLs. With DLLs, you don't need to load them all at once. So performance-wise, they should be on par.
Which leaves the "DLL Hell". Many programs on Windows bring all the DLLs they need and write them into the Windows directory. The net effect is that the last installed program works and everything else might be broken. But there is a simple workaround: install the DLLs into the same directory as the EXE. Windows will search the current directory first and then the various Windows paths. This way, you'll waste a bit of disk space but your program will work and, more importantly, you won't break anything else.
One might argue that you shouldn't install DLLs which already exist (with the same version) in the Windows directory but then, you're again vulnerable to some bad app which overwrites the version you need with something that breaks your neck. The drawback is that you must distribute security fixes for your app yourself; you can't rely on Windows Update or similar things to secure your code. This is a tight spot; crackers are making lots of money from security issues and people will not like you when someone steals their banking data because you didn't issue security fixes soon enough.
If you plan to support your application very tightly for many years (say, 20), installing all DLLs in the program directory is for you. If not, then write code which checks that suitable versions of all DLLs are installed and tells the user about it, so they know why your app suddenly starts to crash.
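A minimal sketch of such a version check using the Win32 version APIs (link with version.lib; "mylib.dll" and the 2.1 requirement are placeholders):

```cpp
// Sketch: read a DLL's file version resource and compare it against the
// version the application was built and tested against.
#include <windows.h>
#include <cstdio>
#include <vector>

// Returns true if the DLL's file version is at least major.minor.
// Sketch only: real code should also look at the build/revision fields.
bool dll_version_at_least(const char* path, WORD major, WORD minor) {
    DWORD ignored = 0;
    DWORD size = GetFileVersionInfoSizeA(path, &ignored);
    if (size == 0) return false;                 // file missing or no version info

    std::vector<char> data(size);
    if (!GetFileVersionInfoA(path, 0, size, data.data())) return false;

    VS_FIXEDFILEINFO* info = nullptr;
    UINT len = 0;
    if (!VerQueryValueA(data.data(), "\\", reinterpret_cast<void**>(&info), &len) || !info)
        return false;

    WORD fileMajor = HIWORD(info->dwFileVersionMS);
    WORD fileMinor = LOWORD(info->dwFileVersionMS);
    return fileMajor > major || (fileMajor == major && fileMinor >= minor);
}

int main() {
    if (!dll_version_at_least("mylib.dll", 2, 1))
        std::printf("mylib.dll is missing or too old; please reinstall.\n");
    return 0;
}
```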
Yes, see this text:
Dynamic linking has the following advantages:
Saves memory and reduces swapping. Many processes can use a single DLL simultaneously, sharing a single copy of the DLL in memory. In contrast, Windows must load a copy of the library code into memory for each application that is built with a static link library.
Saves disk space. Many applications can share a single copy of the DLL on disk. In contrast, each application built with a static link library has the library code linked into its executable image as a separate copy.
Upgrades to the DLL are easier. When the functions in a DLL change, the applications that use them do not need to be recompiled or relinked as long as the function arguments and return values do not change. In contrast, statically linked object code requires that the application be relinked when the functions change.
Provides after-market support. For example, a display driver DLL can be modified to support a display that was not available when the application was shipped.
Supports multilanguage programs. Programs written in different programming languages can call the same DLL function as long as the programs follow the function's calling convention. The programs and the DLL function must be compatible in the following ways: the order in which the function expects its arguments to be pushed onto the stack, whether the function or the application is responsible for cleaning up the stack, and whether any arguments are passed in registers.
Provides a mechanism to extend the MFC library classes. You can derive classes from the existing MFC classes and place them in an MFC extension DLL for use by MFC applications.
Eases the creation of international versions. By placing resources in a DLL, it is much easier to create international versions of an application. You can place the strings for each language version of your application in a separate resource DLL and have the different language versions load the appropriate resources.
A potential disadvantage to using DLLs is that the application is not self-contained; it depends on the existence of a separate DLL module.
From my point of view, a shared component has some advantages that are sometimes perceived as disadvantages.
A shared component defines interfaces in your process. So you are forced to decide which components/interfaces are visible outside and which are hidden. This automatically defines which interfaces have to be stable and which do not and can be refactored without affecting any code outside the component.
Memory management, in the case of C++ and Windows, must be thought through carefully. Normally you should not free memory outside of the DLL that allocated it. If you do, your component may fail when different runtimes or compiler versions are used.
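A small illustration of that rule (hypothetical names; on Windows you would additionally mark the exported functions with __declspec(dllexport)): the DLL that allocates an object is also the one that frees it.

```cpp
// Exposed in the DLL's public header: the type stays opaque to consumers,
// and both allocation and deallocation happen inside the DLL.
struct Widget;                                   // opaque to consumers
extern "C" Widget* widget_create();
extern "C" void    widget_destroy(Widget* w);

// Inside the DLL: new and delete run against the same runtime and heap.
struct Widget { int value = 0; };
extern "C" Widget* widget_create()           { return new Widget(); }
extern "C" void    widget_destroy(Widget* w) { delete w; }

// In a consumer: never delete DLL-owned memory directly.
int use_widget() {
    Widget* w = widget_create();
    // ... use w ...
    widget_destroy(w);                           // hand it back to the DLL
    return 0;
}
```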
So I think that using shared components will help the software to get better organized.
If I statically link an executable in Ubuntu, is there any chance that the executable won't work within another distribution such as Mint or Fedora? I know processor types are affected, but other than that, is there anything else I have to be wary of? Sorry if this is a dumb question. Thanks for any help.
There are a few corner cases, but for the most part, you should be in good shape with static linking. The one that comes to mind is libnss. This particular library is essentially impossible to link statically, because of the way it does its job (permissions, authentication, security tasks). As long as the glibc versions are similar, you should be OK on this issue, though.
If your program needs to work with subtle features of the kernel, like volume managers, you've got a pretty slim chance of getting your program to work, statically linked, across distros, because the kernel interfaces may change slightly.
Most typical applications, the kind for which it even makes sense to discuss portability (network services, GUI applications, language tools like compilers and interpreters), won't have a problem with any of this.
If you statically link a program on one computer and then move it to another computer in which the system basically runs the same way, then it should work just fine. That's the point of static linking; that there are no other files the program depends on - it's entirely self-contained, so as long as it can run at all, it will run the same way it does on its "host" system.
This contrasts with dynamic linking, in which the program incorporates elements of other files (libraries) at runtime. If you move a dynamically linked program to another system where the libraries it depends on are different (or nonexistent), it won't work.
In most cases, your executable will work just fine. As long as your executable doesn't depend on anything unusual being present for it to function, there will be no problem. (And, if it does depend on something unusual being present, then you'll have the same issue even if you dynamically link.)
Statically linking is usually safer than dynamically linking for compatibility between different UNIX environments, as long as the same CPU is in use.
To have a statically linked binary fail, again assuming the same processor architecture, you would have to do something such as link on a system using the a.out binary format and try to execute it on a system running ELF, in which case the dynamically linked version would fail just as badly.
So why do people not routinely link statically? Two reasons:
It makes the executable larger, sometimes MUCH larger, and
If bugs in the libraries are fixed, you'll have to relink your program to get access to the bug fixes. If a critical security bug is fixed in the libraries, you have to relink and redistribute your exe.
On the contrary. Whatever your chances are of getting a binary to work across distributions or even OSes, those chances are maximized by static linking. Static linking makes an executable self-contained in terms of libraries. It can still go wrong if it tries to read a file that's not there on another system.
For even better chances of portability, try linking against dietlibc or some other libc. An article at Linux Journal mentions some candidates. A smaller, simpler libc is less likely to depend on things in the filesystem that differ from distro to distro.
I would, for the reasons noted above, avoid statically linking something unless you absolutely must.
That being said, it should work on any other similar kernel of the same architecture (i.e. if you statically link on a machine running Linux 2.4.x, the loader VDSO is going to be different on Linux 2.6; the VDSO, or virtual dynamic shared object, is a shared object that the kernel exposes to every process and contains loader code).
Other pitfalls include things in /etc not being where you'd think, logs being in different places, system utilities being absent or different (ubuntu uses update-rc.d, RHEL uses chkconfig), etc.
There are times when you just have no choice. I was writing a program that talked to LVM2's string-based cmdlib interface instead of using execv()... lo and behold, 30% of the distros I needed to support did NOT include that library and offered no way of getting it. So, I had to link against the static object when producing binary packages.
If you are using glibc, you can be confident that stuff like getpwnam() and friends will still work... just make sure to watch any hard-coded paths (better yet, make them configurable at run time).
As long as you can guarantee it'll only be executed on a similar version of the OS on similar hardware, your program will work fine if it is statically linked. So, if you build for a 2.6 Linux and statically link, you will be fine running on (almost) all 2.6 Linux distributions.
Be warned that you can't statically link some parts of glibc, so if you're using them you'll have to dynamically link anyway. From memory, the name service (NSS) parts required dynamic linking when I was investigating it.
You can't statically link a program for (say) Linux then expect it to run on BSD or Windows. BSD and Unix don't present or handle their system calls in the same way Linux does. I tell a slight lie because the BSDs have a Linux emulation layer that can be enabled, but out of the box it won't work.
No, it will not work. Static linking for distribution independence is a concept from the old Unix ages and is not recommended. In fact, you often can't, as many libraries are not available as static libraries anyway.
Follow the Linux Standard Base way, this is your only chance to get as much cross distribution portability as possible.
The LSB also works fine if you program for FreeBSD and Solaris.
There are two compatibility questions at issue here: library versions and library inventory.
You don't say what libraries you are using.
If you have no '-l' options, then the only 'library' is glibc itself, which serves as the interface to the kernel. Glibc versions are upward compatible. If you link on a glibc 2.x system you can run on a glibc 2.y, for y > x. The developers make a firm commitment to this.
If you have -l options, static linking is always safe. If you are dynamically linked, you have to ensure that (1) the library is present on the target system, and (2) has a compatible version. Your Mileage Might Vary as to whether the target distro has what you need.
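If you do link dynamically and want to fail gracefully, glibc exposes its own version at run time; a minimal, glibc-specific sketch:

```cpp
// Sketch (glibc-specific): report which glibc the program ended up running
// against, e.g. to warn the user before something subtler fails.
#include <gnu/libc-version.h>
#include <cstdio>

int main() {
    // gnu_get_libc_version() returns a string such as "2.31".
    std::printf("running against glibc %s\n", gnu_get_libc_version());
    return 0;
}
```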
I'm working on MPSoC, specifically the STM ST40 (SH4-based) and ST231, and I'm wondering which OS I can use on these to port a parallel application. I already had a look at STLinux, which is the STM distribution of a Linux platform for their MPSoC (which unfortunately doesn't work well for ST231 coprocessors), and I also had a look at OS21, which is a task-based OS.
Any information about other RTOSes is warmly welcome! (especially those with libc and pthreads :)
These four come to mind:
MicroC/OS-II: It's free and simple, but I think there are too few good resources available.
LibeRTOS: I can recommend that. I used it several times for different projects. It's good, it's fast, and the dual-kernel concept is really well done.
RTLinux: Can't tell you much about that one. Only used it once for a very small project and didn't get deep "behind the scenes". But it was fast and reliable. (and very expensive)
VxWorks: Awesome OS... From Wikipedia:
multitasking kernel with preemptive and round-robin scheduling and fast interrupt response
Memory protection to isolate user applications from the kernel
SMP support
Fast, flexible inter-process communication including TIPC
Error handling framework
Binary, counting, and mutual exclusion semaphores with priority inheritance
Local and distributed message queues
Full ANSI C compliance and enhanced C++ features for exception handling and template support
POSIX PSE52 certified conformance
File system.
IPv6 Networking stack
VxSim simulator
Supports: C/C++/JAVA
If money is no problem: use VxWorks! You can do anything: upgrade your fridge, build a war machine or fly to Mars ;-)
Otherwise check out LibeRTOS...
If you really want to use an RTOS, be prepared to use a native API that is way more efficient and streamlined than pthreads...
I have used Micrium's µC/OS-II on several projects, on SH4 and a couple of different ColdFires. I continue to recommend it for new projects today.
Micrium has just announced a major upgrade to be called µC/OS-III that will add unlimited preemptively scheduled threads, as well as a round-robin scheduler for equal priority threads. It doesn't appear to be for sale yet, however.
If you need the capabilities, they also have a FAT file system, a PEG graphical UI library, USB device and host, and TCP/IP available for additional license fees.
Source code to everything is included in the price, and I've always found their support to be friendly and knowledgeable.
With the processors you mention you seem to be into set-top boxes.
You have the choice between the STLinux distro, which is not very stable, and OS21, which is proprietary to ST but much more stable and with nice tools for debugging and the like (I'm not so sure about OS21 and libc/pthreads).
Barebones/AMP, because it allows 100% control and the lowest latency.
Using Linux or FreeRTOS is very comfortable but it comes with a price tag.