How do I cut out assembler executable bloat? - optimization

I've got working multiplatform Hello World code in Gas, NASM, and YASM, and I would like to shrink their corresponding executable files from 76KB to something more reasonable for a Hello World assembly program, seeing as a basic Hello World C program leads to an 80KB executable, and assembly should be much smaller. I believe the bulk of the executables are filled with junk from the linker options.
Trace:
LIBS=c:/strawberry/c/i686-w64-mingw32/lib/crt2.o -Lc:/strawberry/c/i686-w64-mingw32/lib -lmingw32 -lmingwex -lmsvcrt
ld ld -o $(EXECUTABLE) hello.o $(LIBS)
hello.exe
Hello World!
Code:
.data
msg: .ascii "Hello World!\0"
.text
.global _main
_main:
pushl $msg
call _puts
leave
movl $0, %eax
ret
If I remove any of the options in LIBS, either the link process fails, or the resulting executable raises a Windows error when it runs. So the logical thing to do is replace the puts call with something simpler, like sys_write, but I don't know how to do this multiplatform. The little documentation online says to use int 0x80 to perform a call to the kernel, but this only works in Linux, not in Windows, and I want my assembly code to be multiplatform.

Your program bloat comes mostly from the C runtime library. In Windows, a simple hello world program can be < 5K if you write your own "tiny" CRT. Here is a link to a project which explains all of the details about how to shrink your EXE to its smallest possible size:
http://www.codeproject.com/Articles/15156/Tiny-C-Runtime-Library

For Windows, you can call the native Win32 API functions, such as GetStdHandle() and WriteFile() to write directly to stdout.
For Unix-like systems, you can call the write() syscall with file descriptor 1 for stdout.
The details of exactly how you do each of these will depend on which assembler and OS you are using.
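As a rough illustration of the two approaches above, here is a minimal C sketch that selects the platform call at compile time. The function name write_stdout is hypothetical, and on Unix-like systems write() is still the thin libc wrapper around the syscall rather than a raw kernel call, so this only shows the shape of the idea, not a libc-free build.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Hypothetical helper: write len bytes to standard output without stdio.
   Returns the number of bytes written, or -1 on failure. */
long write_stdout(const char *buf, size_t len)
{
#ifdef _WIN32
    /* Win32 path: fetch the stdout handle, then write to it directly. */
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    if (out == INVALID_HANDLE_VALUE || out == NULL)
        return -1;
    if (!WriteFile(out, buf, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
#else
    /* Unix-like path: file descriptor 1 is stdout. */
    return (long)write(1, buf, len);
#endif
}
```

A call such as write_stdout("Hello World!\n", 13) would take the place of puts. The same split (one small OS-specific piece, one shared piece) carries over to assembly source files.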

You should be able to link dynamically to the C runtime library instead of including it statically. I don't know how to do it in Linux, but in Windows you can use msvcrt.dll.

The assembler bloat is most likely coming from the C library dependencies, especially for puts. Refactoring the code to print Hello World without a C call will most likely require OS-specific assembly code, as Unix-like systems use interrupts to make calls to the kernel, and Windows has its own VB-like API for such tasks.
I did manage to find a solution that would create small executable while still maintaining platform agnosticism. Ordinarily, C preprocessor directives would do the trick, but I'm not sure which assembly languages even have preprocessor syntax. But a similar effect can be achieved through the use of controlled, included assembly code files. A collection of wrapper code files can handle OS-specific assembly code, while an included assembly file does the rest. And a simple Makefile can run the respective build console commands to reference the respective wrapper code on the desired platform.
For example, I was able to quickly construct FASM code that works this way. (Though I have yet to get it to actually bypass puts with something less bloated.) Anyway, it's progress.

You need to clean up the stack after the call, because almost all C functions use the CDECL calling convention, where the caller, not the callee (the function), adjusts the stack.
You will get into trouble if you don't learn how to do things correctly now (read: harder-to-track-down bugs).
Try this:
push szLF
push esp
push fmtint2
call printf
add esp, 4 * 3
push msg
call puts
push szLF
push esp
push fmtint2
call printf
add esp, 4 * 3
Run it and notice the numbers before and after your call to puts. They are different, no? Well, they are supposed to be the same. Now add:
add esp, 4
after your call to puts and run it again. Are the numbers the same now? That means you have a balanced stack pointer and the function uses the CDECL calling convention.

Related

Using Vector Offset table with MBED library on Eclipse IDE

I'm a newly graduated electronics engineer, and one of my first tasks in my new job is to import some code into the Mbed compiler.
I'm trying to run the Mbed Blinky example on my custom hardware with LPC1769 chip. I've exported the Blinky app to GNU Eclipse from the Online MBED compiler and imported it to the IDE.
The Mbed Blinky code runs fine when I set the appropriate LED pin (changing LED1 in PinNames.h from 1.10 to 2.13 for my hardware) and flash it directly, so Mbed and my custom hardware aren't the problem. However, my firm has a custom bootloader and it's required to use it with any application. The custom bootloader requires that my program start at 0x4000.
For this my firm was previously adding this line to their code, flashing the bootloader and uploading the IDE's output .bin file to the board with a custom FW loading program.
SCB->VTOR = (0x4000) & 0x1FFFFF80;
When I try to follow the same steps, the compiler builds without any complaints, but I see no blinks when I upload the program through my bootloader.
I suspect I have to make some changes to the built-in CMSIS library, and/or to the startup_LPC17XX.o and system_LPC17xx.o files that come with the Mbed export, but I'm confused. Any help would be much appreciated.
Also, I'm using the automatically generated makefile, in case anyone is wondering.
Most importantly, you need to adjust the code location in the linker script, for example:
MEMORY {
FLASH : ORIGIN = 0x4000, LENGTH = 0x7C000
}
Check the startup code and linker script for any further absolute addresses in flash memory.
Adjusting the VTOR is required for interrupts, if the bootloader doesn't already do that. The & operation looks weird; it should be sufficient to simply write 0x4000, or, even better, something like:
SCB->VTOR = (uint32_t) &_IsrVector;
This assumes you have defined _IsrVector in your linker script or startup code to refer to the very first byte in the vector table, i.e. the definition of the initial stack pointer. This way you don't have to adjust the code if the memory layout is changed in the linker script, and you avoid magic numbers.
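To see why the & in the question's line is redundant for 0x4000, here is a small host-side C sketch that applies the same mask. The helper names are hypothetical, and the mask is simply the one quoted in the question, not a value checked against the LPC1769 manual.

```c
#include <stdint.h>

/* Mask copied from the question: keeps bits 7..28 of the address. */
#define VTOR_TBLOFF_MASK 0x1FFFFF80u

/* Hypothetical helper: the value that would end up in SCB->VTOR. */
uint32_t vtor_value(uint32_t table_addr)
{
    return table_addr & VTOR_TBLOFF_MASK;
}

/* Hypothetical helper: nonzero if masking leaves the address unchanged,
   i.e. the address is 128-byte aligned and inside the masked range. */
int vtor_addr_is_valid(uint32_t table_addr)
{
    return vtor_value(table_addr) == table_addr;
}
```

For 0x4000 the masked value is still 0x4000, so writing the plain address has the same effect. An address such as 0x4040 would be silently rounded down to 0x4000, which is a case where a check like this can catch a misplaced vector table.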

Print NASM program on Windows 7 SP1 64-bit excluding DOSBOX, excluding C, and "possibly" Excluding Windows API calls

I've been filling myself up with notes trying to successfully create my first program on Windows 7 with NASM, but with a few self imposed stipulations (until I'm ready to move forward). In creating this first program, however, I have a ton of questions.
The stipulations for now are that:
I'm running Windows 7 SP1 - 64-bit
I do not wish to use DOSBox so Interrupts 0x21-24 are likely not applicable
I do not wish to rely on C so this is all NASM
I would really like to avoid downloading Visual Studio or associated WDK tools if I can (this depends on whether or not I NEED to interact with the Windows API and relates to Question 2 below)
I've downloaded and installed MinGW
I'm writing my code in Notepad++ and saving as *.asm
I am linking using "ld" for now, but from what I've read, most seem to recommend "GoLink" (and Alink hasn't been updated in years?). I'll probably migrate to GoLink after I've assured myself that "ld" may be too limiting
I want to know whether printing is possible without the Windows API or C, given the code below.
The only code example that has worked for me in some capacity can be found in this question: "nasm is not executing file in Windows 8".
;FILE: main.asm
[section .text]
global _main
_main:
mov eax, 6
ret ; returns eax (exits)
Linked:
c:\Users\James\Desktop>nasm -fwin32 main.asm
c:\Users\James\Desktop>ld -e _main main.obj -o main.exe
c:\Users\James\Desktop>main.exe
c:\Users\James\Desktop>echo %errorlevel%
6
My questions (a ton):
The fact that in the code above "ret" by itself gives output, although it just returns whatever is in EAX, is there a way to use it (or another directive outside of the Windows API) to return the contents of a variable (hopefully a string variable)? I tried to use ret with DOS calls, but as noted above, that definitely doesn't work because I'm on a 64-bit system.
In case I absolutely must use the Windows API, is the only way to interact with it by using the WDK tools? Is there some other way? The last time I downloaded Visual Studio and the associated WDK tools, they took up a ton of memory and massively slowed down my computer. Is there another way to make programs give output or print to the screen, either by using internal commands or some other method of making API calls? One thread I admittedly skimmed (amidst 40 more tabs I have open) mentions "Russinovich's Windows Internals" but gives no direct answer. Currently, every time I use code with extern declarations, "ld" tells me that the references to symbols like WinMain/WinMain@16 are undefined. In the same vein, is there a table I can consult containing the accurate names for API calls (i.e. _ExitProcess@4 vs. ExitProcess)? I found this link to what I think may be the NT API, but I'm not sure it applies given my stipulations; in reality, I'm just kind of confused:
http://j00ru.vexillium.org/ntapi/
In bits of code I've encountered I've seen directives for [Bit 16], [Bit 32], and [Bit 64]. [Bit 16] is likely ignorable, but I'm confused by the [Bit 32] and [Bit 64] for the following reasons which may not even be related: Via the code above I'm using the command, "nasm -fwin32 main.asm", then I'm linking it successfully and going on to receive output. For some reason - though I have not read the full "ld" documentation yet - when I use the command "nasm -fwin64 main.asm" and link it in the same way I receive an error saying "main.obj: File not recognized: File format not recognized". I don't understand why differentiating between 32 and 64 while I'm on a native 64-bit machine causes an error although this probably is just unique to ld.
In the meantime I'll be reading this question and will post an update if it helps: Executable isn't compatible with 64 bits processor
I can't answer some parts in great detail, so I expect somebody will either put up a better answer or edit this one; feel free.
You are linking against the default C library, so your _main is called after the C library is initialized; the ret with a value in eax is like return 6; in C++. The C library then correctly destructs everything and calls the Windows exit-process routine with exit code 6. You can return only an int from _main, and I'm not even sure whether the full int is propagated to the exit-process call or only an 8-bit value is used. So you can return a single ASCII character, if you treat that number as a char.
You must call the Windows API if you want to display something in a console or window, or write something to a file, i.e. do any output (and of course also input). No peripheral is directly available to a Win32/Win64 executable, the way the CGA/EGA/VGA text modes were accessible in DOS through int 10h or video RAM at B800:0000. Any attempt to access an I/O peripheral directly should result in an access violation. Only the Windows API is legal for user-level application code.
How much of the WDK you need, I have no idea; I haven't developed anything for Windows for years. I think it's even possible to create an executable without the WDK that provides the correct externs and dependencies on kernel32.dll and similar, but the effort is way beyond simply using the proper parts of the WDK or the C library from MinGW.
I think your linker is set to default to a 32-bit executable; you have to figure out what kind of object format nasm produces for -fwin64 and how to link that one with ld.
Why the difference? A 64-bit OS can run 32-bit binaries, but you can't mix 32-bit and 64-bit code in a single executable so easily (if at all). So you produce either a 32-bit or a 64-bit binary, and you have to adjust everything to match (the asm instructions used, directives and options, and WinAPI calls).

Why does LLDB refuse to break on compiled objective C methods?

I have a compiled Objective-C binary on iOS 8.1 which I am attempting to debug with lldb on my machine and debugserver on the handset. (No Xcode involved, though I am willing to get it involved if that is the issue.)
IDA can correctly recognize the binary as Objective-C and decompose objects and component messages. Because of this, I would expect commands like
platform select remote-ios
connect://ip:port
breakpoint set --name "-[Login doLoginStuff]"
to correctly function, but this method is called in code without breaking in lldb.
Is there the need for some type of target call to hint to the debugger what the remote architecture or SDK target is?
Without the symbols, I don't believe lldb can map -[Login doLoginStuff] to a memory address. If it can't find the name, it fails silently, as far as I remember.

Pure function in assembly language on a Mac

The Fibonacci sequence is a great 'hello-world' app when starting with a new language. I want to make a pure machine-code program that will execute just that, without wasting any resources on an intermediary VM, unnecessary memory management, etc.
The best solution is to write assembly code and compile it to a native binary. But I've never worked with assembly language, so what is the best place to start?
I'm using iMac 64-bit dual-core x86 system.
It's fun working with assembly language, and it's a great way to learn more about the internal machinery. I am not sure you are wasting that many resources by using Objective-C to compute the Fibonacci sequence, but maybe you can prove me wrong.
To learn assembly start with something really simple and then add more functions and inputs and outputs to understand the system calls and function call sequences and then get more creative.
Be sure to document each line as it's hard maintaining assembly.
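One possible way to follow the "start simple" advice (a suggestion of this sketch, not something the answer prescribes) is to write the function in C first and read the assembly the compiler generates for it, for example with cc -S. The function below is a plain iterative Fibonacci; the name fib and the choice of a 64-bit result are assumptions made for illustration.

```c
#include <stdint.h>

/* Iterative Fibonacci with fib(0) = 0 and fib(1) = 1.
   The result only fits in a uint64_t up to n = 93. */
uint64_t fib(unsigned n)
{
    uint64_t a = 0, b = 1;   /* a holds fib(k), b holds fib(k + 1) */
    while (n--) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

The loop uses no memory allocation and no runtime support, so the generated assembly is short and gives a baseline to compare a hand-written version against.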
For Mac OS X
Create a file called simple.asm :-
; simple.asm - exit
section .text
global simple ; make the main function externally visible
simple:
mov eax, 0x1 ; system call number for exit
sub esp, 4 ; OS X (and BSD) system calls needs "extra space" on stack
int 0x80 ; make the system call
Compile and Link it :-
nasm -f macho simple.asm
ld -o simple -e simple simple.o
Run it :-
asm $ ./simple
asm $ echo $?
1
There are a lot of free resources online for x86 assembly, as well as for the Intel 64-bit specific details.
http://en.wikibooks.org/wiki/X86_Assembly
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
Have a look at resources on system calls for the BSD kernel and the Mach kernel for OS X-specific system calls.
http://osxbook.com
http://www.freebsd.org/doc/en/books/developers-handbook/x86-system-calls.html
http://peter.michaux.ca/articles/assembly-hello-world-for-os-x
Have a look at linkers and loaders if you want to create libraries.

Microchip Linker problem

When I was trying to build my project in MPLAB, I got this build error message:
Clean: Deleting intermediary and output files.
Clean: Deleted file "M:\12 CCP PWM\12 CCP PWM.o".
Clean: Done.
Executing: "C:\MCC18\bin\mcc18.exe" -p=18F46K20 "12 CCP PWM.c" -fo="12 CCP PWM.o" -Ou- -Ot- -Ob- -Op- -Or- -Od- -Opa-
MPLAB C18 v3.20 (feature limited)
Copyright 1999-2005 Microchip Technology Inc.
This version of MPLAB C18 does not support the extended mode
and will not perform all optimizations. To purchase a full
copy of MPLAB C18, please contact your local distributor or
visit buy.microchip.com.
Executing: "C:\MCC18\bin\mplink.exe" /l"C:\MCC18\lib" "C:\MCC18\lkr\18f46k20i.lkr" "12 CCP PWM.o" /u_CRUNTIME /o"12 CCP PWM.cof" /M"12 CCP PWM.map" /W
MPLINK 4.20, Linker
Copyright (c) 2008 Microchip Technology Inc.
Error - could not find definition of symbol 'main' in file 'C:\MCC18\lib/c018i.o'.
Errors : 1
Link step failed.
----------------------------------------------------------------------
Release build of project `M:\12 CCP PWM\12 CCP PWM.mcp' failed.
Thu Apr 16 14:34:41 2009
----------------------------------------------------------------------
BUILD FAILED
I have checked that the path to the linker library is correct. I suspect it has something to do with my source code. Any help is very much appreciated.
Here is my source code.. http://cl1p.net/mplabc18
The compiler may be looking for a different definition of main. I have seen this in some PIC code:
// Main application entry point.
#ifdef __C30__
int main(void)
#else
void main(void)
#endif
{ ... }
It is a good idea to add the specific linker file to your project. If you are using MPLAB, under the workspace, right-click on Linker Files and add the linker file for your specific processor from the MCC18\lkr folder.
Then clean and recompile the project.
The only thing that stood out to me in your source file is this part of the ISR declaration:
#pragma code InterruptVectorLow = 0x18
The user guide of the compiler you're using states this should be:
#pragma code low_vector=0x18
Since this declaration is just before your main function it might be giving you trouble.
Edit:
None of the presented solutions seem to work so I have just copy-pasted your code into a new MPLAB project, set up for the PIC18F46K20 device. It compiles just fine with the MCC18 compiler. The only thing that's missing from the project is the "12 CCP PWM.h" header file (which I do not have). So either there's something wrong with your header file, there's something wrong with your project setup, or the fact that I'm using MCC18 3.30 instead of 3.20 is the problem.
The code compiles fine for me (C18 3.30 full).
I've had MPLAB flake out a bit on me, especially on large source trees. Many times a reboot has solved it. I have absolutely no idea why; I had tried everything else and it was the only way to get MPLAB to reset.
Personally I would not strain the corners of the implementation by having source file names with several spaces in, particularly with an embedded toolchain!
But it seems like they're making a reasonable effort to add all the double-quotes, so maybe that's not a real problem.
Do you actually have a 'main' function in your code, and if so, exactly how is it defined?
I use a third-party compiler, so I can't offer any specific experience on that. But one thing I may suspect is that something in the code is causing the compilation to stop partway through. This can be an unterminated comment, or a function with a closing brace missing. Consider especially the #included files, because you can't see the effects in your editor when you look at the main file, and particularly check any #includes that you have written yourself. And at the top of the list is, "what did you change last"?
What I do at this point is make a branch copy, and start mercilessly hacking out huge blocks of code, just to see when the error goes away. Divide and conquer. Of course, this can be time consuming, so I'd probably ask on StackOverflow, first :)
It's been a while, but I saw that you used a pragma to define the location of the interrupt handler before you created the function, might you need to do the same thing with main()?
It might be handled in the .h file - I'm not sure. I only ever used ASM on the PICs and I explicitly handled everything (ie, at 0x000 jump to main; at the interrupt vector address jump to this memory address; at main address do these things, etc). 'main' for me was defined to be an available address in the code section (which I see you've done, started the code section then defined main) but I believe I had to explicitly define that 'main' was to start at a memory address in the code section. Again, it was ASM, but I wouldn't doubt that you need to do something similar - a pragma to define main as main.
If c018i.o contains the reset vector, and it refers to the function main by name, then the issue could be that main needs a prototype - even in the same file as the function itself, so the linker can pick this up and put main in its list of functions.
So, try inserting:
void main (void);
immediately above the main function.