Do shared libraries get copied into RAM and shared with all processes (dynamic linking)?

In Linux, for example, if we have a shared .so library that is used by multiple processes, do the entire library contents get copied into RAM, like the .text, .data, and .rodata sections, etc.?
Does every process that uses this library then access the library at the exact same location in RAM? It's something I am not sure about.
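One way to poke at this yourself on Linux is to look at /proc/<pid>/maps, which lists every region mapped into a process, including the segments of each shared library it uses. Here is a minimal sketch; the "libc" filter string is just an assumption about what the mapped library path contains, so adjust it for the library you care about:

#include <stdio.h>
#include <string.h>

/* Print every mapping of the current process whose path mentions "libc".
 * Each line shows the virtual address range, the permissions, and the
 * backing file for that region. */
int main(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps) {
        perror("fopen /proc/self/maps");
        return 1;
    }
    char line[512];
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, "libc"))   /* crude filter on the mapped path */
            fputs(line, stdout);
    }
    fclose(maps);
    return 0;
}

Running this in two different processes and comparing the output shows which segments of the library each process has mapped and at which virtual addresses each process sees them.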

Related

Can malware binaries be in packed form?

Recently I have been reading about malware analysis. I'm going through this malware repository (https://github.com/ytisf/theZoo), where we can find malware binaries. Can binaries be in packed form? If so, how can we tell whether these binaries are packed or not?
PS: Packers compress a program and try to hide its internals from us (a sort of compression or encryption). This is what my doubt is about: can binaries be in packed form or not?
Edit 2: In this repository, the samples are just zipped to keep them safe, which is not the actual packing I'm talking about. After unzipping, we get a binary. Can that binary itself be in packed form or not?
First of all, the distinction you make between "packers" and archiver programs (ZIP, etc) or compression programs doesn't appear to have any basis.
A "packed" executable cannot be executed directly. It must be unpacked first. This is exactly the same as (say) a ZIP file containing malware, or a malware file that has been compressed with a standard compression program.
What about a "packed" executable that has been created by a program that does the "packing" in a secret way ... to evade detection? Well, that won't work. The malware still has to be unpacked before it can be executed. So that means the bad guy now has a second problem: getting the unpacker onto the victim's machine. And once someone (an anti-hacker) gets hold of the super-secret unpacker, it is no longer secret. It can be reverse engineered ... or simply used as-is by an AV product on suspicious binary files.
The only practical use of "packing" that I can think of is to add self-unpacking functionality to the malware. The malware (as distributed) would consist of an executable with a small amount of code that implemented the unpacker. The rest of the executable would be packed code that implements the nasty stuff. When the user runs the malware, it would unpack the packed code, load it into memory and start executing.
However, there are potential ways to detect or prevent this kind of thing.
If the unpacker writes the executable code into a file prior to loading it, an AV product could detect that.
If the unpacker attempts to load the code into its own process, there are ways that could be blocked; e.g. using memory-protection hardware and the OS to stop the unpacker from creating memory segments containing executable code; see https://en.wikipedia.org/wiki/Executable_space_protection.
An AV could look for a signature in the packed code, or it could look for a signature in the unpacker code.
In short, malware could use some kind of "packing" to hide itself, but there must be an executable component somewhere to unpack it.
If so, how can we say that these binaries are packed or not?
If the malware is distributed as a non-executable, you figure out what is going to unpack it, and then see whether that process gives you an executable.
If the malware is a self-unpacking executable, you reverse engineer the unpacking component to figure out how it works.
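As a practical first check (not mentioned above, and only a heuristic), you can measure the byte entropy of the unzipped file: packed or encrypted payloads tend to look like random data, with entropy close to 8 bits per byte, while ordinary code and data usually score noticeably lower. A rough sketch, to be compiled with -lm:

#include <math.h>
#include <stdio.h>

/* Shannon entropy of a whole file, in bits per byte.  Values near 8.0 often
 * indicate compressed or encrypted (packed) content; this is only a heuristic,
 * since media files and legitimately compressed data also score high. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 2;
    }
    unsigned long long counts[256] = {0}, total = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        counts[c]++;
        total++;
    }
    fclose(f);
    double entropy = 0.0;
    for (int i = 0; i < 256; i++) {
        if (counts[i]) {
            double p = (double)counts[i] / (double)total;
            entropy -= p * log2(p);
        }
    }
    printf("%s: %.3f bits/byte\n", argv[1], entropy);
    return 0;
}

Dedicated tools refine this by computing entropy per section or by matching known packer signatures, but a whole-file score is enough to flag the obvious cases.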

Distinguish shared objects from position-independent executables

I'm looking for a fast way to check whether an ELF binary is a shared object or a position-independent executable. I think I can do that by checking the contained symbols / functions, but I'm looking for a more efficient way that does not require reading the complete file. I have to perform the check on different platforms, at least Android and Linux (32- and 64-bit).
I'm looking for a fast way to check whether an ELF binary is a shared object or a position-independent executable.
There is no way to check: a PIE executable is a shared object.
I think I can do that by checking the contained symbols / functions.
Symbols can be stripped, and once they are, you can't tell.
shared objects and executables normally differ by the linked startup code
That's true: a PIE is normally linked with Scrt1.o, but a shared library normally is not. However, there is nothing to prevent a shared library from being linked with Scrt1.o as well, and in a stripped binary even finding that startup code may be somewhat problematic.
If what you really want is to distinguish between a shared library and a PIE executable which you built yourself (rather than solving a general case of any shared library and any PIE), then checking for presence of PT_INTERP (readelf -l a.out | grep INTERP) is likely the easiest way to go: a PIE executable is guaranteed to have PT_INTERP, and shared libraries normally don't have it (libc.so.6 is a notable exception).
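If you would rather not shell out to readelf, the same check can be done by reading only the ELF header and program headers, so the complete file never has to be parsed. A minimal sketch, assuming a 64-bit ELF file and the glibc elf.h definitions, with almost no error handling:

#include <elf.h>
#include <stdio.h>

/* Classify an ELF file from its headers alone: ET_EXEC is a classic
 * executable, ET_DYN covers both shared libraries and PIE executables,
 * and the presence of PT_INTERP is the hint that an ET_DYN file is
 * (very likely) a PIE executable rather than a plain library. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
        return 2;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 2; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { fprintf(stderr, "short read\n"); return 2; }

    if (eh.e_type == ET_EXEC) {
        puts("ET_EXEC: non-PIE executable");
    } else if (eh.e_type == ET_DYN) {
        int has_interp = 0;
        fseek(f, (long)eh.e_phoff, SEEK_SET);
        for (int i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) break;
            if (ph.p_type == PT_INTERP) { has_interp = 1; break; }
        }
        puts(has_interp ? "ET_DYN with PT_INTERP: likely a PIE executable"
                        : "ET_DYN without PT_INTERP: likely a shared library");
    } else {
        printf("e_type = %u (neither EXEC nor DYN)\n", (unsigned)eh.e_type);
    }
    fclose(f);
    return 0;
}

The same caveat as above applies: the PT_INTERP test only approximates the PIE/library distinction (libc.so.6 being the usual counterexample).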
Try elfutils and its included program eu-readelf:
eu-readelf --file-header $ELFFILE
shows you the file header and what kind of file it is:
...
Type: EXEC (Executable file)
...
or
Type: DYN (Shared object file)
In combination with a little sed line you should get the results you want.

How are DLLs processed in the system?

I have a doubt about how DLLs are loaded and processed in memory. Normally DLLs are shared libraries, so loading a DLL once should be enough. If one process loads a DLL (e.g. advapi32.dll) into memory, how does another process then refer to advapi32.dll? How can a common location be shared by every process?
I'm not entirely sure what your question is, but yes, if multiple processes import the same DLL, then the read-only sections of that DLL are typically mapped into all of those processes. On the other hand, sections that can change, like the BSS (variable) segment, get a copy in each process so that the changes that one process makes are invisible to other processes. If you want certain changes to be shared between processes for your own DLL, you can mark a data section in the DLL as shared. Exactly how you do this depends on the development tools you're using.
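For that last point, with Microsoft's toolchain the usual way to get a shared data section is #pragma data_seg plus a /SECTION:...,RWS linker directive. A rough MSVC-specific sketch; the section name ".shared" and the counter variable are made up for illustration:

#include <windows.h>

/* Variables placed in the ".shared" section are backed by the same pages in
 * every process that loads this DLL, instead of each process getting its own
 * copy-on-write copy.  They must be initialized to end up in the section. */
#pragma data_seg(".shared")
volatile long g_usage_count = 0;
#pragma data_seg()

/* Tell the linker to mark the section Read/Write/Shared. */
#pragma comment(linker, "/SECTION:.shared,RWS")

BOOL WINAPI DllMain(HINSTANCE instance, DWORD reason, LPVOID reserved) {
    if (reason == DLL_PROCESS_ATTACH)
        InterlockedIncrement(&g_usage_count);   /* visible to all processes */
    else if (reason == DLL_PROCESS_DETACH)
        InterlockedDecrement(&g_usage_count);
    return TRUE;
}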

What form is a DLL in, and what makes it processor dependent?

I know a DLL contains one or more exported functions that are compiled, linked, and stored separately.
My question is not about how to create it, but about what form it is stored in. Is it going to be in the form of 0's & 1's, or in assembly commands like ADD, MUL, DIV, MOV, CALL, RETURN etc.?
Also, what makes it processor dependent (like the x86, x87, or IBM 700 instruction sets)?
Can someone please explain it briefly?
First of all, everything in a computer is in the form of "0's & 1's". The fact that the computer can display some of these as text, pictures, sounds, 3D models, etc. is just a matter of how you interpret them. But down there, at the metal, it's all just "0's & 1's" (also known as bits). Note though that they are always grouped together in groups of 8, and these are called "bytes". It's really for the sake of efficiency, because operating with every bit individually would be too tedious. Actually, today's computers don't even operate on single bytes anymore (or rather, they do it very rarely). Mostly you operate with 4 or 8 bytes at a time, depending on whether you have a 32-bit or 64-bit CPU (that's in layman's terms; it's actually a bit more complicated than that).
As for a .DLL file - like an .EXE file, it contains bytes that describe instructions that a CPU can execute. The CPU takes these bytes directly from the .DLL/.EXE and executes them without any further modifications. That's why these files are CPU-specific. In different CPU architectures the same combination of bytes means different things, so a .DLL/.EXE will run correctly only on the CPU for which it was designed. On other CPUs these bytes will mean some other instructions, and when run, the program will most likely do some utter nonsense and crash immediately.
The assembly commands you mentioned also deserve an explanation. "Assembler" is not a language that a CPU can understand. It's a language a human can understand. It was created because writing directly in machine code (the bytes that the CPU actually understands) is very difficult. What you get is utter gibberish on the screen (try opening some .EXE file in Notepad!) but every bit has to be precisely set for it to work.
So assembly language is basically the same thing, except these instructions are written in text that humans can read. For every machine code that a CPU can understand, there is an instruction with a human-friendly name. An assembler simply reads these instructions and replaces them with the bytes that represent the actual instructions for the CPU to execute. It's a 1:1 operation. Every command in assembly language matches a single machine instruction (again, in layman's terms).
So you see, there isn't even a single assembly language. Every CPU architecture has its own assembly language, because they each have different instructions.
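To make that concrete, you can dump the first few bytes of one of your own compiled functions; the very same C source built for x86-64 and for ARM produces completely different byte sequences. A rough sketch (casting a function pointer to a data pointer is technically non-portable, but it works on common desktop platforms):

#include <stdio.h>

static int add(int a, int b) {
    return a + b;
}

int main(void) {
    /* Treat the compiled function as raw bytes and print the first 16.
     * These bytes are the machine code the CPU executes; they differ
     * between architectures, which is why a binary built for one CPU
     * is meaningless on another. */
    const unsigned char *p = (const unsigned char *)(void *)add;
    for (int i = 0; i < 16; i++)
        printf("%02x ", p[i]);
    putchar('\n');
    return add(2, 3) == 5 ? 0 : 1;
}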
Note though that all this applies to native .DLL/.EXE files. .NET files are different: they don't contain machine code, but rather instructions for an abstract, nonexistent CPU. It's like Java bytecode. When a .NET .DLL/.EXE is run, the .NET runtime translates it from the abstract instructions to the instructions that the specific CPU can understand. It uses a lot of tricks to make this very fast, so these files run almost as fast as plain native .DLL/.EXE files.
Does this clear things up? :)
Native DLLs (not .NET assemblies) usually contain machine code that can only be run on a certain platform. The machine code is a sequence of bytes that the processor treats as instructions (ADD, MOV, etc.).
In Windows, DLLs are stored in the PE format, which is basically a collection of sections plus the information about how to map them into memory. Some sections contain the program's code (which is of course processor dependent), others contain the program's data, others the exported and imported functions, and so on.
Managed code is compiled to an intermediate language that is JIT-compiled by the run-time as it is executed. Therefore, your DLL won't contain any processor-dependent code, and you'll be able to execute your program on any platform that has the relevant run-time.
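You can see the processor-dependent part mentioned above directly in the PE header: the IMAGE_FILE_HEADER's Machine field records which CPU the code sections were compiled for. A small Windows-specific sketch with minimal validation:

#include <stdio.h>
#include <windows.h>

/* Print the Machine field of a PE file (EXE or DLL): for example 0x8664
 * means x64 and 0x014c means 32-bit x86.  A managed (.NET) file still has
 * a Machine value, but its real payload is IL handled by the runtime. */
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file.exe|file.dll>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    IMAGE_DOS_HEADER dos;
    DWORD signature;
    IMAGE_FILE_HEADER file_header;
    if (fread(&dos, sizeof dos, 1, f) != 1 || dos.e_magic != IMAGE_DOS_SIGNATURE) {
        fprintf(stderr, "not a PE file\n");
        return 1;
    }
    fseek(f, dos.e_lfanew, SEEK_SET);
    if (fread(&signature, sizeof signature, 1, f) != 1 || signature != IMAGE_NT_SIGNATURE ||
        fread(&file_header, sizeof file_header, 1, f) != 1) {
        fprintf(stderr, "missing PE signature\n");
        return 1;
    }
    printf("Machine: 0x%04x, sections: %u\n",
           (unsigned)file_header.Machine, (unsigned)file_header.NumberOfSections);
    fclose(f);
    return 0;
}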
It depends on your DLL. Generally, a DLL contains executable code just like an EXE file. Those code DLLs are processor dependent, since the code can only be executed on a specific platform. The code is stored using the same "format" as in an EXE file (binary machine code).
However, a DLL can sometimes contain only data: it is then called a "resource DLL" and is not processor dependent at all. It acts as a container for data files used by applications.
Note that many DLLs are hybrids: they contain both code and resources. For example, most of the DLLs that make up the user-mode part of the Windows operating system are hybrids: you can open them using Visual Studio or a resource explorer to see the resources (the data segments) they contain, or open them with Dependency Walker or dumpbin to see the functions (the code segments) they contain.
(Of course this answer is really Windows-specific; I don't know about .so files, which are the Linux equivalent of a DLL.)
Both a DLL and an EXE contain executable code.
In the case of a DLL, it doesn't have the necessary parts to be directly executable. It must be called from another piece of executable code. One DLL can call another, but all must ultimately be called from an EXE.
So the rules about what's compatible with what processor that apply to EXEs also apply to DLLs.
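The "ultimately called from an EXE" relationship is easiest to see with explicit loading: a program can map a DLL at run time and look up one of its exported functions by name. A minimal Windows sketch using the real user32.dll and its MessageBoxA export:

#include <stdio.h>
#include <windows.h>

typedef int (WINAPI *MessageBoxA_t)(HWND, LPCSTR, LPCSTR, UINT);

int main(void) {
    /* The EXE is the running process; the DLL is only mapped into it once
     * some executable code asks for it. */
    HMODULE user32 = LoadLibraryA("user32.dll");
    if (!user32) {
        fprintf(stderr, "LoadLibraryA failed: %lu\n", GetLastError());
        return 1;
    }
    MessageBoxA_t msgbox = (MessageBoxA_t)GetProcAddress(user32, "MessageBoxA");
    if (msgbox)
        msgbox(NULL, "Hello from a DLL export", "demo", MB_OK);
    FreeLibrary(user32);
    return 0;
}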

DLL and LIB files - what and why?

I know very little about DLLs and LIBs other than that they contain vital code required for a program to run properly (libraries). But why do compilers generate them at all? Wouldn't it be easier to just include all the code in a single executable? And what's the difference between DLLs and LIBs?
There are static libraries (LIB) and dynamic libraries (DLL) - but note that .LIB files can be either static libraries (containing object files) or import libraries (containing symbols to allow the linker to link to a DLL).
Libraries are used because you may have code that you want to use in many programs. For example if you write a function that counts the number of characters in a string, that function will be useful in lots of programs. Once you get that function working correctly you don't want to have to recompile the code every time you use it, so you put the executable code for that function in a library, and the linker can extract and insert the compiled code into your program. Static libraries are sometimes called 'archives' for this reason.
Dynamic libraries take this one step further. It seems wasteful to have multiple copies of the library functions taking up space in each of the programs. Why can't they all share one copy of the function? This is what dynamic libraries are for. Rather than building the library code into your program when it is compiled, it can be run by mapping it into your program as it is loaded into memory. Multiple programs running at the same time that use the same functions can all share one copy, saving memory. In fact, you can load dynamic libraries only as needed, depending on the path through your code. No point in having the printer routines taking up memory if you aren't doing any printing. On the other hand, this means you have to have a copy of the dynamic library installed on every machine your program runs on. This creates its own set of problems.
As an example, almost every program written in 'C' will need functions from a library called the 'C runtime library', though few programs will need all of the functions. The C runtime comes in both static and dynamic versions, so you can determine which version your program uses depending on your particular needs.
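Taking the character-counting function above as the running example, the DLL side could look roughly like this; the name count_chars, the file name, and the COUNTLIB_API macro are made up for illustration (e.g. built with cl /LD countlib.c, which also emits the import library countlib.lib):

/* countlib.c -- a tiny library exporting one function. */
#include <stddef.h>

#ifdef _WIN32
#  define COUNTLIB_API __declspec(dllexport)
#else
#  define COUNTLIB_API
#endif

/* Count the characters in a NUL-terminated string. */
COUNTLIB_API size_t count_chars(const char *s) {
    size_t n = 0;
    while (s && s[n] != '\0')
        n++;
    return n;
}

A program can then either link against the import countlib.lib and call count_chars directly, or load countlib.dll on demand with LoadLibrary/GetProcAddress; the same source compiled into a static library would instead be copied into each executable at link time.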
Another aspect is security (obfuscation). Once a piece of code is extracted from the main application and put into a separate dynamic-link library, it is easier to attack and analyse (reverse-engineer), since it has been isolated. When the same piece of code is kept in a LIB library, it is part of the compiled (linked) target application, and it is thus harder to isolate (differentiate) that piece of code from the rest of the target binaries.
One important reason for creating a DLL/LIB rather than just compiling the code into an executable is reuse and relocation. The average Java or .NET application (for example) will most likely use several 3rd party (or framework) libraries. It is much easier and faster to just compile against a pre-built library, rather than having to compile all of the 3rd party code into your application. Compiling your code into libraries also encourages good design practices, e.g. designing your classes to be used in different types of applications.
A DLL is a library of functions that are shared among other executable programs. Just look in your windows/system32 directory and you will find dozens of them. When your program creates a DLL, it also normally creates a .lib file so that the application (*.exe) can resolve symbols that are declared in the DLL.
A .lib is a library of functions that are statically linked into a program; they are NOT shared by other programs. Each program that links with a *.lib file has all the code in that file. If you have two programs A.exe and B.exe that link with C.lib, then A and B will each contain the code from C.lib.
How you create DLLs and libs depend on the compiler you use. Each compiler does it differently.
One other difference lies in performance.
Because the DLL is loaded at run time by the .exe(s), calls into it go through an extra level of indirection and the loader has to resolve its imports when the program starts, so there is some run-time cost relative to static linking; the trade-off is that all processes using the DLL share one copy of its code in memory.
A .lib, on the other hand, is code that is linked statically at compile time into every program that requests it. The code sits directly in each .exe's own image and is called without that indirection, which can make the process slightly faster, at the price of larger executables and no sharing between processes.