How can I optimize my C / x86 code?

How can I optimize my C / x86 code? - optimization

int lcm_old(int a, int b) {
int n;
for(n=1;;n++)
if(n%a == 0 && n%b == 0)
return n;
}
int lcm(int a,int b) {
int n = 0;
__asm {
lstart:
inc n;
mov eax, n;
mov edx, 0;
idiv a;
mov eax, 0;
cmp eax, edx;
jne lstart;
mov eax, n;
mov edx, 0;
idiv b;
mov eax, 0;
cmp eax, edx;
jnz lstart;
}
return n;
}
I'm trying to beat/match the code for the top function with my own function (bottom). Have you got any ideas how I can optimize my routine?
PS. This is just for fun.

I would optimize by using a different algorithm. Searching linearly like you are doing is super-slow. It's a fact that the least common mulitple of two natural numbers is the quotient of their product divided by their greatest common divisor. You can compute the greatest common divisor quickly using the Euclidean algorithm.
Thus:
int lcm(int a, int b) {
int p = a * b;
return p / gcd(a, b);
}
where you need to implement gcd(int, int). As the average number of steps in the Euclidean algorithm is O(log n), we beat the naive linear search hands down.
There are other approaches to this problem. If you had an algorithm that could quickly factor integers (say a quantum computer) then you can also solve this problem like so. If you write each of a and b into its canonical prime factorization
a = p_a0^e_a0 * p_a1^e_a1 * ... * p_am^e_am
b = p_b0^e_b0 * p_b1^e_b1 * ... * p_bn^e_bn
then the least common multiple of a and b is the obtained by taking for each prime factor appearing in at least one of the factorizations of a and b, taking it with the maximum exponent that it appears in the factorization of a or b. For example:
28 = 2^2 * 7
312 = 2^3 * 39
so that
lcm(28, 312) = 2^3 * 7 * 39 = 2184
All of this is to point out that naive approaches are admirable in their simplicity but you can spend all day optimizing every last nanosecond out of them and still not beat a superior algorithm.

I'm going to assume you want to keep the same algorithm. This should at least be a slightly more efficient implementation of it. The main difference is that the code in the loop only uses registers, not memory.
int lcm(int a,int b) {
__asm {
xor ecx, ecx
mov esi, a
mov edi, b
lstart:
inc ecx
mov eax, ecx
xor edx, edx
idiv esi
test edx, edx
jne lstart
mov eax, ecx;
idiv edi
test edx, edx
jnz lstart
mov eax, ecx
leave
ret
}
}
As Jason pointed out, however, this really isn't a very efficient algorithm -- multiplying, finding the GCD, and dividing will normally be faster (unless a and b are quite small).
Edit: there is another algorithm that's almost simpler to understand, that should also be a lot faster (than the original -- not than multiplying, then dividing by GCD). Instead of generating consecutive numbers until you find one that divides both a and b, generate consecutive multiples of one (preferably the larger) until you find one that divides evenly by the other:
int lcm2(int a, int b) {
__asm {
xor ecx, ecx
mov esi, a
mov edi, b
lstart:
add ecx, esi
mov eax, ecx
xor edx, edx
idiv edi
test edx, edx
jnz lstart
mov eax, ecx
leave
ret
}
}
This remains dead simple to understand, but should give a considerable improvement over the original.

Related

GetLocalTime function using win kernel base programming

I'm doing an activity "Create a program that will alert "Good Morning" at 7:00 am" in MASM32 using kernel based programming which mean I need to use peb structure. I already got the base address of the kernel32.dll which I will use in calling API. I also got the address of the API Im going to use which is GetLocalTime.
My problem is on the last part where I will call the GetLocalTime function for getting the time. Now I'm lost on this part, I can't call the SYSTEMTIME which is important in GetlocalTime function. Any advise? thank you!
.386
.model flat, stdcall
OPTION CASEMAP:NONE
.data
stime dd ?
wHour dw ?
.code
Main:
;getting the kernel32 base address
xor ecx, ecx
ASSUME FS:NOTHING
mov eax, fs:[ecx + 30H] ; EAX = PEB
ASSUME FS:ERROR
mov eax, [eax + 0CH] ; EAX = PEB->Ldrs
mov esi, [eax + 14H] ; ESI = PEB->Ldr.InMemOrder
lodsd ; EAX = Second module
xchg eax, esi ; EAX = ESI, ESI = EAX
lodsd ; EAX = Third(kernel32)
mov ebx, [eax + 10H] ; EBX = Base address
;finding the export table of kernel32
mov edx, [ebx + 3CH] ; EDX = DOS->e_lfanew
add edx, ebx ; EDX = PE Header
mov edx, [edx + 78H] ; EDX = Offset export table
add edx, ebx ; EDX = Export table
mov esi, [edx + 20H] ; ESI = Offset namestable
add esi, ebx ; ESI = Names table
xor ecx, ecx ; EXC = 0
;start of getlocaltime function;
;Find GetLocalTime function name
Get_Time:
inc ecx ; Increment the ordinal
lodsd ; Get name offset
add eax, ebx ; Get function name
cmp dword ptr[eax], 4C746547H ; GetL
jnz Get_Time
cmp dword ptr[eax + 4H], 6C61636FH ; ocal
jnz Get_Time
cmp dword ptr[eax + 8H], 656D6954H ; Time
jnz Get_Time
;Find the address of GetLocalTime function
mov esi, [edx + 24H] ; ESI = Offset ordinals
add esi, ebx ; ESI = Ordinals table
mov cx, [esi + ecx * 2] ; Number of function
dec ecx
mov esi, [edx + 1CH] ; Offset address table
add esi, ebx ; ESI = Address table
mov edx, [esi + ecx * 4] ; EDX = Pointer(offset)
add edx, ebx ; EDX = GetLocalTime
;Call GetLocalTime
add esp, 14H ; Cleanup stack
push offset stime ; offset of the address of hello is pushed in the stack memory
call edx ; getlocaltime nasa eax babagsak
;comparing
push wHour
cmp 00402001H.wHour, 37H
end Main

SYSTEMTIME is just a block of 8 words (16-bit integers); you can allocate 8 words, and refer to the Win32 for the order of the fields.

Assembly variables

I am new to assembly and am confused how some variables magically obtain values from nowhere, like in this code I have (program shifts by one ASCII code all entered symbols)
.model small
.stack 100h
.data
Enterr db 10, 13, "$"
buffer db 255
number db ?
symb db 255 dup (?)
.code
START:
MOV ax, #data
MOV ds, ax
MOV ah, 10
MOV dx, offset buffer
INT 21h
MOV ah, 9
MOV dx, offset ENTERR
INT 21h
MOV bx, offset symb
MOV cl, number
MOV ch, 0
CMP cx, 0
JE terminate
cycle:
INC byte ptr [bx]
INC bx
LOOP cycle
MOV byte ptr [bx], '$'
MOV ah, 9
MOV dx, offset symb
INT 21h
terminate:
MOV ah, 4Ch
MOV al, 0
INT 21h
END START
Just before the loop, cx has the number of symbols entered, and cycle begins to take pace from there on. This value of cx was obtained when variable "number" is copied to cl. How did variable "number" obtained such a value? Replacing
MOV cl, number
with
MOV cl, [number]
Does not effect the program. Why is that? Does every variable defined by
variable db ?
has the same value, i.e. number of symbols entered?(I am using TASM)

CPU not calling IRQ0?

I am writting an OS and trying to use the PIT. I have a handler written and wrote an ISR entry for the IRQ0 (Interrupt 32). The handler is not being called at all. I am pretty sure I am not putting the ISR entry in right. Any suggestions? Here is my ASM code
mov dword EAX, irq_common_stub
mov byte [_NATIVE_IDT_Contents + 0x100], AL
mov byte [_NATIVE_IDT_Contents + 0x101], AH
mov byte [_NATIVE_IDT_Contents + 0x102], 0x8
mov byte [_NATIVE_IDT_Contents + 0x105], 0x8E
shr dword EAX, 0x10
mov byte [_NATIVE_IDT_Contents + 0x106], AL
mov byte [_NATIVE_IDT_Contents + 0x107], AH
My code to init the PIT is
public static void PIT_Init(uint frequency)
{
uint divisor = 1193180 / frequency;
GruntyOS.IO.Ports.Outb(0x43, 0x36);
byte l = (byte)(divisor & 0xFF);
byte h = (byte)((divisor >> 8) & 0xFF);
GruntyOS.IO.Ports.Outb(0x40, l);
GruntyOS.IO.Ports.Outb(0x40, h);
}
The handler is
public static void HandlePIT()
{
GruntyOS.IO.Ports.Outb(0xA0, 0x20);
GruntyOS.IO.Ports.Outb(0x20, 0x20);
print("Tick: " + Tick.ToString());
Tick++;
}
Which is called from
irq_common_stub:
pusha
mov ax, ds
push eax
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
call System_Void__GruntyOS_Entry_HandlePIT__
pop ebx
mov ds, bx
mov es, bx
mov fs, bx
mov gs, bx
popa
add esp, 8
sti
iret

Maybe this might help. Its a simple kernel
that is capable of handling IRQs and Exceptions.
http://www.osdever.net/bkerndev/Docs/irqs.htm
http://www.ni.com/white-paper/2874/en

win32 very low level assembly - application startup issue

I am busy programming a win32 program in assembly with a form and buttons... The problem is windows modify my variables in ram. The place were a store my hInstance and hwnd variables. I have found a workaround, but it is not an elegant solution. I would like to know why windows modify my variables and also were can I find documentation which describe the start up of an application.
MyWndProc:
push EBP
mov EBP, ESP
mov eax, [EBP + 12]
cmp eax, WM_DESTROY
jne MyWndProc_j2
push 0
call PostQuitMessage
jmp MyWndProc_j1
MyWndProc_j2:
cmp eax, WM_CREATE
jne MyWndProc_j1
mov eax, [EBP+8]
push eax
call CreateControls
add esp, 4
MyWndProc_j1:
mov eax, [EBP + 20]
push eax
mov eax, [EBP + 16]
push eax
mov eax, [EBP + 12]
push eax
mov eax, [EBP + 8]
push eax
call DefWindowProcA
pop EBP
ret
segment .data
Wtitle db 'My Window',0
ClassName db 'myWindowClass',0
editClass db 'EDIT',0
buttonName db 'OK',0
buttonClass db 'BUTTON',0
textName db 'My textbox',0
textClass db 'edit',0
formEdit db 'This is just a mem test', 0
windowsVar1 dd 0
windowsVar2 dd 0
windowsVar3 dd 0
windowsVar4 dd 0
windowsVar5 dd 0
windowsVar6 dd 0
windowsVar7 dd 0
windowsVar8 dd 0
aMsg dd 0
hwnd dd 0
hwnd2 dd 0
hwnd3 dd 0
hInstance dd 0
old_proc dd 0
nCmdShow dd 0
hfDefault dd 0
MyWndProc is the callback function from windows. At the 27'th call from windows, it modify the last 7 variables. If I switch the position of the last 8 variables with windowsVarx, then it still modifies hwnd, hwnd2 ... without modifying windowsVarx. Where x is from 1 to 8
CreateControls:
push EBP
mov EBP, ESP
push 0
push 0
call GetModuleHandleA
push eax
push IDC_MAIN_BUTTON
mov eax, [EBP+8] ;hwnd
push eax
push 24
push 100
push 220
push 50
mov eax, WS_CHILD
or eax, BS_DEFPUSHBUTTON
or eax, WS_TABSTOP
or eax, WS_VISIBLE
push eax
push buttonName
push buttonClass
push 0
call CreateWindowExA
mov [hwnd2], eax
push DEFAULT_GUI_FONT
call GetStockObject
mov [hfDefault], eax
push 0
mov eax, [hfDefault]
push eax
push WM_SETFONT
mov eax, [hwnd2]
push eax
call SendMessageA
push 0
push 0
call GetModuleHandleA
push eax
push IDC_MAIN_EDIT
mov eax, [EBP+8] ;hwnd
push eax
push 100
push 200
push 100
push 50
mov eax, WS_CHILD
or eax, ES_MULTILINE
or eax, ES_AUTOVSCROLL
or eax, ES_AUTOHSCROLL
or eax, WS_VISIBLE
push eax
push 0
push editClass
push WS_EX_CLIENTEDGE
call CreateWindowExA
mov [hwnd3], eax
push 0
mov eax, [hfDefault]
push eax
push WM_SETFONT
mov eax, [hwnd3]
push eax
call SendMessageA
push Wtitle
push 0
push WM_SETTEXT
mov eax, [hwnd3]
push eax
call SendMessageA
pop EBP
ret
The following function is the message loop, which collect and dispatch.
MyMessageLoop:
push 0
push 0
push 0
push aMsg
call GetMessageA
cmp eax, 0
je MyMessageLoop_j1
push aMsg
call TranslateMessage
push aMsg
call DispatchMessageA
jmp MyMessageLoop
MyMessageLoop_j1:
ret

Your problem explanation isn't really clear. But you should remember that by calling a system call you may indeed end up with different values in your registers. I don't know about Windows, but on amd64 Linux, the kernel (which executes the system call) is required only to preserve the values of the registers r12 and up. The values in all the other registers may be changed and therefore will most probably not be the same after returning from the system call.
In order to remedy that, simply store the variables on your function's stack before calling the system.

It might appear that Windows is modifying your data, but as others have pointed out, it's more likely a bug or some other corruption in your code is causing problems.
It's almost impossible for people to determine the runtime behaviour of your entire program from snippets and programming in assembly almost always causes problems rarely seen when higher-level languages are used.
The best advice is to use a debugger and either step through the code or set a data breakpoint on the variables being modified. Data breakpoints are designed to stop your program on the instruction that performs the data modification.
You could also look at what the actual values of the data that are overwriting your variables - it might give you some clue as to where or why the memory is being overwritten.
The reason for people's sarcasm is that in your second sentence you are assuming Windows is to blame for your program not working. In many situations, blaming the Operating System is a good sign that the developer does not understand something or is reluctant to accept that they have made a mistake. The end result is almost always someone else pointing out the mistake.

Where do I start? Your code formatting is very hard to read. Hope a Uni is not teaching you it. Anyways, your WindowProc was VERY wrong. After EVERY message you handle, you DO NOT call DefWindowProc, most of the messages you just return 0 in eax.
After your call to CreateControls, you don't need add esp, 4 as long as at the end of CreateControls you do ret 4 or ret 4 * NumOfParamsPassed.
I fixed your WindowProc and the window now shows. I also removed the many unneeded movs.
MyWndProc:
push ebp
mov ebp, esp
mov eax, dword ptr[ebp + 3 * 4] ; same as [ebp + 12]
cmp eax, WM_CREATE
je _CREATE
cmp eax, WM_CLOSE
je _CLOSE
PassThrough:
push dword ptr[ebp + 5 * 4]; same as [ebp + 20]
push dword ptr[ebp + 4 * 4]; same as [ebp + 16]
push dword ptr[ebp + 3 * 4]; same as [ebp + 12]
push dword ptr[ebp + 2 * 4]; same as [ebp + 8]
call DefWindowProc
jmp _DONE
_CLOSE:
push 0
call PostQuitMessage
jmp _RET0
_CREATE:
push dword ptr[ebp + 2 * 4]
call CreateControls
;add esp, 4
_RET0:
xor eax, eax
_DONE:
pop ebp
ret 4 * 4 ; <----- you were missing this!!!!!

Why are you adding 4 to esp after a call to CreateControls? You are pushing 1 dword onto the stack, does CreateControls not cleanup the stack itself? You wrote that function? If it does adjust the stack with something like ret 4*1 then that add esp, 4 is screwing everything up. That snippet of code does nothing for us, plus there are so many uneeded movs and jmps there
#David Heffernan thanks I just started writing all my programs in Assembly over 10 years ago and well aware of how and what registers need to be preserved and when you DO NOT need to save them, just because a complier saves all registers in the prologue, doesn't mean it's correct.

Pointers and Loops

This one has been bothering me for a while now: Is there a difference (e.g. memory-wise) between this
Pointer *somePointer;
for (...)
{
somePointer = something;
// do stuff with somePointer
}
and this
for (...)
{
Pointer *somePointer = something;
// do stuff with somePointer
}

If you want to use the pointer when you're done with the loop, you need to do the first one.
Pointer *somePointer;
Pointer *somePointer2;
for(loopA)
{
if(meetsSomeCriteria(somePointer)) break;
}
for(loopB)
{
if(meetsSomeCriteria(somePointer2)) break;
}
/* do something with the two pointers */
someFunc(somePointer,somePointer2);

Well, first, in you second example somePointer will be valid only inside the loop (it's scope), so if you want to use it outside you have to do like in snippet #1.
If we turn on assembly we can see that the second snipped needs only 2 more instructions to execute:
Snippet 1:
for(c = 0; c <= 10; c++)
(*p1)++;
0x080483c1 <+13>: lea -0x8(%ebp),%eax # eax = &g
0x080483c4 <+16>: mov %eax,-0xc(%ebp) # p1 = g
0x080483c7 <+19>: movl $0x0,-0x4(%ebp) # c = 0
0x080483ce <+26>: jmp 0x80483e1 <main+45> # dive in the loop
0x080483d0 <+28>: mov -0xc(%ebp),%eax # eax = p1
0x080483d3 <+31>: mov (%eax),%eax # eax = *p1
0x080483d5 <+33>: lea 0x1(%eax),%edx # edx = eax + 1
0x080483d8 <+36>: mov -0xc(%ebp),%eax # eax = p1
0x080483db <+39>: mov %edx,(%eax) # *p1 = edx
0x080483dd <+41>: addl $0x1,-0x4(%ebp) # c++
0x080483e1 <+45>: cmpl $0xa,-0x4(%ebp) # re-loop if needed
0x080483e5 <+49>: jle 0x80483d0 <main+28>
Snippet 2:
for(c = 0; c <= 10; c++) {
int *p2 = &g;
(*p2)--;
}
0x080483f0 <+60>: lea -0x8(%ebp),%eax # eax = &g
0x080483f3 <+63>: mov %eax,-0x10(%ebp) # p2 = eax
0x080483f6 <+66>: mov -0x10(%ebp),%eax # eax = p2
0x080483f9 <+69>: mov (%eax),%eax # eax = *p2
0x080483fb <+71>: lea -0x1(%eax),%edx # edx = eax - 1
0x080483fe <+74>: mov -0x10(%ebp),%eax # eax = p2
0x08048401 <+77>: mov %edx,(%eax) # *p2 = edx
0x08048403 <+79>: addl $0x1,-0x4(%ebp) # increment c
0x08048407 <+83>: cmpl $0xa,-0x4(%ebp) # loop if needed
0x0804840b <+87>: jle 0x80483f0 <main+60>
Ok, the difference is in the first two instructions of snippet #2 which are executed at every loop, while in the first snippet they're executed just before entering the loop.
Hope I was clear. ;)

Well, with the first version you only have to release once, after the loop. With the second version you can't use the pointer from outside the loop, so you need to release inside the loop. Memory-wise it shouldn't matter that much, but you do have allocation overhead in your second example I think.

check out a similar answer on stackoverflow here with some good answers. However this is probably compiler/language independent...

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How can I optimize my C / x86 code? - optimization

Related

GetLocalTime function using win kernel base programming

Assembly variables

CPU not calling IRQ0?

win32 very low level assembly - application startup issue

Pointers and Loops

Categories

Resources