Small assembly code sequence optimization (intel x86)

I am doing some exercises in assembly language and found a question about optimization that I can't figure out. Can anyone help me with it?
The question is to optimize the following assembly code:
----------------------------Example1-------------------------
mov dx, 0 ---> this one I know-> xor dx,dx
----------------------------Example2------------------------
cmp ax, 0
je label
----------------------------Example3-------------------------
mov ax, x
cwd
mov si, 16
idiv si
----> The most I can think of for this example is to replace the last 2 lines with idiv 16, but I am not sure
----------------------------Example4-------------------------
mov ax, x
mov bx, 7
mul bx
mov t, ax
----------------------------Example5---------------------------
mov si, offset array1
mov di, offset array2
; for i = 0; i < n; ++i
do:
mov bx, [si]
mov [di], bx
add si, 2
add di, 2
loop do
endforloop

For example 2, you should look at the AND or TEST opcodes. Similar to example 1, they allow you to remove the need for a constant.
For example 4, remember that x * 7 is the same as x * (8 - 1) or, expanding that, x * 8 - x. Multiplying by eight can be done with a shift instruction.
For example 5, you'd think Intel would have provided a much simpler way to transfer from SI to DI, since that is the whole reason for their existence. Maybe something like a REPetitive MOVe String Word :-)
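The hint for example 4 can be checked with a small model. This is only a sketch in Python rather than assembly: the function names and the 16-bit mask are assumptions made for illustration, and only the low word is modelled because the original code stores just AX into t.

```python
MASK16 = 0xFFFF  # AX is a 16-bit register, so results wrap at 2**16


def mul7_with_mul(x):
    """Low 16 bits of x * 7, which is what MUL BX leaves in AX."""
    return (x * 7) & MASK16


def mul7_with_shift(x):
    """Same value computed as x * 8 - x: shift left by 3, then subtract x."""
    return (((x << 3) & MASK16) - x) & MASK16


for x in (0, 1, 1234, 0x2000, 0xFFFF):
    print(hex(x), hex(mul7_with_mul(x)), hex(mul7_with_shift(x)))
```

Both functions agree for every 16-bit input, including the values where the shift overflows, because both results are taken modulo 2**16.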

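For example three, division by a power of two can be implemented as a right shift. Because idiv is a signed division that truncates toward zero, a plain arithmetic shift (sar) only matches it for non-negative values; negative dividends need a small adjustment first, and the shift does not leave the remainder in DX.
Note that in example 5, the current code fails to initialize CX as needed (and in the optimized version, you'd definitely want to do that too).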
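To see where a shift and a signed division by 16 differ, here is a minimal Python sketch. The helper names are hypothetical. It relies on Python's >> behaving like an arithmetic shift on integers, that is, rounding toward negative infinity.

```python
def idiv16(x):
    """Quotient of x / 16 the way IDIV computes it: truncated toward zero."""
    q = abs(x) // 16
    return -q if x < 0 else q


def sar4(x):
    """Arithmetic shift right by 4; rounds toward negative infinity."""
    return x >> 4


def sar4_adjusted(x):
    """Add 15 to negative inputs first so the shift truncates toward zero."""
    return (x + 15) >> 4 if x < 0 else x >> 4


for x in (100, 16, 15, -1, -16, -17):
    print(x, idiv16(x), sar4(x), sar4_adjusted(x))
```

For non-negative inputs the plain shift is already enough. The adjusted version is one common way to handle negative inputs, not necessarily the one the exercise expects.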

Related

MSP430 - Bad constant when using ORG 01100h

#include "msp430g2553.h" // #include <msp430.h> - can be used as well
;-------------------------------------------------------------------------------
ORG 0x1100h
ID_F DB 0,3,9,1,0,4,9,0
ID_S DB 0,5,1,8,9,3,9,1
Size DW 16
;-------------------------------------------------------------------------------
MODULE PortLeds
PUBLIC First_SW, Second_SW, Third_SW;, Else_SW
;EXTERN ;Delay_Sec
RSEG CODE
;-------------------------------------------------------------------------------
;-------------------------------------------------------------------------------
Third_SW BIT.B #0x04, &P1IN
CLR R10 ; Index register
THIRD_LOOP MOV.B ID_F(R10), R15 ; Counter from 0 to ff-255
MOV.B R15, &P2OUT ; Save value
MOV.B #0x06, R13
Wait3 mov.w #0xFFFF,R14 ; Delay to R14
L3 dec.w R14 ; Decrement R14
jnz L3 ; Delay over?
DEC.B R13
JNZ Wait3
INC R10
CMP Size, R10
JL THIRD_LOOP
For some reason, when I assemble Third_SW, I get this error:
pre_lab3_function.s43
Error[6]: Bad constant C:\Users\blala\OneDrive\Desktop\lab_3_new\pre_lab3_function.s43 3
Error while running Assembler
It points at the ORG line. Why is it happening? I have been stuck on it for a day already.
All the other code works fine (as you can see, this is the third routine); only Third_SW, which needs ID_F, ID_S and Size, is the problem.

load 16-bit data from table in 8051 without modifying DPTR

I'm trying to make a simple routine for the 8051 processor that allows me to load any 16-bit number of my choice from a table stored in code memory without modifying any part of DPTR and without requiring stack space. So push and pop cannot be used. Also, I want to use the least amount of processing time possible.
So far I came up with the following code that sort-of allows me to load a value from a table of 4 16-bit values to accumulator and R2 where R2 has the high byte and A has the low byte.
Is this the most efficient way to do this? If so, how do I calculate how much to add to the accumulator before each movc instruction in this example?
mov A,#2h ;want 2nd entry from table
acall getpointer ;run function below
;here R2:A should form correct 16-bit pointer ( = 0456h)
END
getpointer:
rl A ;multiply A value * 2
mov R2,A ;copy to R2
inc R2 ;R2=A+1
;add something to A but what?
movc A,@A+PC ;Load first byte
xch A,R2 ;put result in R2 and let A=original A+1
;add something to A again but what?
movc A,@A+PC ;load second byte
ret ;keep result in A and exit
mytable:
dw 0123h
dw 0456h
dw 0789h
dw 0000h

Try this:
getpointer:
rl a
mov r2, a
add a, #5 ; skip all insts after 1st movc and 1 byte
movc a, @a+pc
xch a, r2 ; 1-byte
inc a ; 1-byte ; skip all instrs after 2nd movc
movc a, @a+pc ; 1-byte
ret ; 1-byte
mytable:
...
I hope I got it right. Note that movc a, @a+pc first increments pc, then adds a to this incremented value. This is why I added instruction lengths in the comments, to show how much code there is.
Note that index of 2 corresponds to 0789h, not 0456h.
Also note that you may need to swap a and r2 and the cheapest may be to swap the data within the table.
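The offsets can be sanity-checked with a small model of the routine. This is a Python sketch, not 8051 code. It assumes the usual one-byte encodings for every instruction except add a, #5 (two bytes), and it assumes the assembler stores each dw with the high byte first. The names ROUTINE, layout and getpointer are made up for the illustration.

```python
TABLE_WORDS = [0x0123, 0x0456, 0x0789, 0x0000]

# (mnemonic, length in bytes) for the routine in the answer above.
ROUTINE = [
    ("rl a", 1),
    ("mov r2, a", 1),
    ("add a, #5", 2),
    ("movc a, @a+pc", 1),
    ("xch a, r2", 1),
    ("inc a", 1),
    ("movc a, @a+pc", 1),
    ("ret", 1),
]


def layout():
    """Return ([PC after each MOVC], address of the table)."""
    pc = 0
    after_movc = []
    for mnemonic, size in ROUTINE:
        pc += size
        if mnemonic.startswith("movc"):
            after_movc.append(pc)  # PC already points past the MOVC when A is added
    return after_movc, pc  # the table starts right after RET


def getpointer(index):
    """Simulate the routine; returns (A, R2) for the given table index."""
    (pc1, pc2), table_addr = layout()
    code = [0] * table_addr               # routine bytes; their values are never read
    for word in TABLE_WORDS:              # assumed DW layout: high byte first
        code += [word >> 8, word & 0xFF]
    a = ((index << 1) | (index >> 7)) & 0xFF  # rl a
    r2 = a                                    # mov r2, a
    a = (a + 5) & 0xFF                        # add a, #5
    a = code[pc1 + a]                         # movc a, @a+pc
    a, r2 = r2, a                             # xch a, r2
    a = (a + 1) & 0xFF                        # inc a
    a = code[pc2 + a]                         # movc a, @a+pc
    return a, r2


for i in range(4):
    a, r2 = getpointer(i)
    print(i, hex(a), hex(r2))
```

Under those assumptions index 2 yields A = 07h and R2 = 89h, which matches the two notes above: index 2 selects 0789h, and the bytes come out in the opposite registers from what the question asked for.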

If an embedded system coded in C is 8 or 16-bit, how will it manipulate 32-bit data types like int?

I think I'm thinking about this the wrong way, but I'm wondering how an embedded system with less than 32-bits can use 32-bit data values. I'm a beginner programmer so go easy on me :)

base 10

  0100   <- carry in/out
  5432
 +1177
 ======
  6609

never brought up in class but we can now extend that to two operations

   100
    32
   +77
 ======
    09

    01
    54
   +11
 ======
    66

and come up with the 6609 result because we understand that it is column based and each column treated separately.

base 2

   1111
  +0011
  =====

  11110
   1111
  +0011
  =====
  10010

    110
     11
    +11
  =====
     10

    111
     11
    +00
  =====
    100

result 10010
you can break your operations up into however many bits you want: 8, 16, 13, 97, whatever. it is column based (for addition) and it just works. division you should be able to figure out. multiplication is just shifting and adding, and you can turn that into multiple operations as well.
n bits * n bits = 2*n bits, so if you have an 8 bit * 8 bit = 16 bit multiply you can use that on an 8 bit system; otherwise you have to limit yourself to 4 bits * 4 bits = 8 bits and work with that (or if there is no multiply, just do the shift and add).
base 2

       abcd
  *    1101
  =========
       abcd
      0000
     abcd
  + abcd
  =========
which you can break down into a shifting and adding problem, so you can do N bits with a 4 or 8 or M bit processor/registers/alu.
Or look at it another way, grade school algebra:
(a+b)*(c+d) = ac + bc + ad + bd
mnop * tuvw = ((mn*0x100)+(op)) * ((tu*0x100)+(vw)) = (a+b)*(c+d)
and you should find that you can group the terms that carry a 0x100 factor separately from the ones that don't,
then put the parts of the answer together using an 8 bit alu (or 4 bits of the 8 bit as needed).
shifting should be obvious: just move the bits over to the next byte or (half)word or whatever.
and bitwise operations (xor, and, or) are bitwise, so they don't need anything special, just keep the columns lined up.
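The algebra above can be tried out with a short Python sketch. mul8 is a stand-in for a hardware 8 bit * 8 bit = 16 bit multiply. The function names are hypothetical. For brevity the final additions use Python integers, whereas a real 8 bit machine would do them bytewise with carry.

```python
def mul8(a, b):
    """Stand-in for a hardware 8 bit * 8 bit = 16 bit multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def mul16_from_bytes(x, y):
    """16 bit * 16 bit = 32 bit, built only from mul8, shifts and adds."""
    xh, xl = x >> 8, x & 0xFF    # x = xh*0x100 + xl
    yh, yl = y >> 8, y & 0xFF    # y = yh*0x100 + yl
    result = mul8(xl, yl)                           # term without 0x100
    result += (mul8(xh, yl) + mul8(xl, yh)) << 8    # terms with one 0x100
    result += mul8(xh, yh) << 16                    # term with 0x100 * 0x100
    return result & 0xFFFFFFFF


print(hex(mul16_from_bytes(0x1234, 0x5678)))
```

The four partial products are exactly the ac, bc, ad and bd terms of the expansion, each shifted by the number of 0x100 factors it carries.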
EDIT
Or you could just try it
unsigned long fun1 ( unsigned long a, unsigned long b )
{
return(a+b);
}
00000000 <_fun1>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 1d40 0004 mov 4(r5), r0
8: 1d41 0006 mov 6(r5), r1
c: 6d40 0008 add 10(r5), r0
10: 6d41 000a add 12(r5), r1
14: 0b40 adc r0
16: 1585 mov (sp)+, r5
18: 0087 rts pc
00000000 <fun1>:
0: 0e 5c add r12, r14
2: 0f 6d addc r13, r15
4: 30 41 ret
00000000 <fun1>:
0: 62 0f add r22, r18
2: 73 1f adc r23, r19
4: 84 1f adc r24, r20
6: 95 1f adc r25, r21
8: 08 95 ret
bonus points if you can figure out these instruction sets.
unsigned long fun2 ( unsigned long a, unsigned long b )
{
return(a*b);
}
00000000 <_fun2>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 10e6 mov r3, -(sp)
6: 1d41 0006 mov 6(r5), r1
a: 1d40 000a mov 12(r5), r0
e: 1043 mov r1, r3
10: 00a1 clc
12: 0c03 ror r3
14: 74d7 fff2 ash $-16, r3
18: 6d43 0004 add 4(r5), r3
1c: 70c0 mul r0, r3
1e: 00a1 clc
20: 0c00 ror r0
22: 7417 fff2 ash $-16, r0
26: 6d40 0008 add 10(r5), r0
2a: 7040 mul r0, r1
2c: 10c0 mov r3, r0
2e: 6040 add r1, r0
30: 0a01 clr r1
32: 1583 mov (sp)+, r3
34: 1585 mov (sp)+, r5
36: 0087 rts pc

An 8 bit system can perform 8 bit operations in a single instruction and a single memory access. On such a system, 16 and 32 bit operations require additional data accesses and additional instructions.
For example, typical architectures place arithmetic results in a register (often an accumulator, but some architectures are more orthogonal and can use any register for results), and arithmetic overflow results in a carry flag being set in a status register. In operations larger than the native width, the code can inspect the carry flag in order to take the appropriate action in subsequent instructions.
So say for an 8 bit system you add 1 to 255: the result in the 8 bit accumulator will be zero, with the carry flag set; the next instruction can then add one to the upper byte of a 16 bit value in response to the carry flag. This can be made to ripple through any number of bytes or words, so that a system can process operations of arbitrary bit length above that of the underlying architecture, just not in a single instruction.
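The carry ripple described here can be sketched in a few lines of Python. add8 models one 8 bit add-with-carry instruction. Both function names are invented for the illustration.

```python
def add8(a, b, carry_in):
    """One 8 bit add with carry: returns (8 bit result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add32_bytewise(x, y):
    """Add two 32 bit values one byte at a time, lowest byte first."""
    result = 0
    carry = 0
    for i in range(4):
        xb = (x >> (8 * i)) & 0xFF
        yb = (y >> (8 * i)) & 0xFF
        s, carry = add8(xb, yb, carry)   # the carry ripples into the next byte
        result |= s << (8 * i)
    return result, carry


print(add32_bytewise(255, 1))          # low byte wraps to 0, carry bumps the next byte
print(add32_bytewise(0xFFFFFFFF, 1))   # carry ripples out of the top byte
```

This mirrors the add followed by a chain of adc instructions in the compiler output shown in the other answer.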

How I can optimize this code more? [Assembly 8086 letters pyramid]

I wonder how I could optimize this code a little bit more. Right now it is 467k and 59 lines.
Data segment:
code_char db 'A'
counter_space db 39
counter_char dw 1
counter_rows dw 25
Program segment:
rows:
mov cl, counter_space ;here I write space
mov ah,02h
mov dl,' '
space:
int 21h
loop space
mov cx, counter_char ;here I write letters
mov ah,02h
mov dl,code_char
letters:
int 21h
loop letters
mov ah,02h ;here I go to another line(enter)
mov dl,0ah
int 21h
INC code_char ;here I change the values of the variables
DEC counter_space
ADD counter_char,2
DEC counter_rows
mov cx,counter_rows ;here I count the rows to 25
loop rows
mov ah,01h ;here I wait for any key
int 21h
mov ah,4ch
mov al,0
int 21h
If you have any suggestions please comment.
I just started to learn Assembly.

You can make use of the fact that all other variables can be calculated from the counter_rows variable, so you really only need one variable:
code_char = 'A' + 25 - counter_rows
counter_space = counter_rows + 14
counter_char = 51 - counter_rows * 2
As counter_rows is your outer loop counter, you can just keep it in a register all the time instead of allocating memory for it. That makes it possible to run the program without any memory references at all.
There are some other small optimisations that can be done. You don't need to set the ah register to 02h other than for the first call. When setting ah to 01h for the keypress call, you can just decrement the register as you know that it was 02h before. You can set ax instead of setting ah and al separately.
If I counted correctly, this should take the actual code and data bytes down from 59 to 41:
mov bx, 25 ;counter_rows
rows:
;here I write space
mov cx, bx ; counter_space = counter_rows + 14
add cl, 14
mov ah, 02h
mov dl, 32 ;space
space:
int 21h
loop space
;here I write letters
mov cl, 51 ;counter_char = 51 - counter_rows * 2
sub cl, bl
sub cl, bl
;mov ah, 02h - already set
mov dl, 65 + 25 ;code_char = 'A' + 25 - counter_rows
sub dl, bl
letters:
int 21h
loop letters
;here I go to another line(enter)
;mov ah, 02h - already set
mov dl, 0ah
int 21h
dec bx
jnz rows
;here I wait for any key
dec ah ;02h - 1 = 01h
int 21h
mov ax,4c00h ;set ah and al in one go
int 21h
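The three formulas can be checked against the updates in the question's loop with a short Python sketch. The generator below only replays the INC/DEC/ADD bookkeeping from the question; the function names are made up for illustration.

```python
def original_updates():
    """Yield (counter_rows, code_char, counter_space, counter_char) row by row."""
    code_char, counter_space, counter_char, counter_rows = ord("A"), 39, 1, 25
    while counter_rows > 0:
        yield counter_rows, code_char, counter_space, counter_char
        code_char += 1       # INC code_char
        counter_space -= 1   # DEC counter_space
        counter_char += 2    # ADD counter_char,2
        counter_rows -= 1    # DEC counter_rows


def derived(counter_rows):
    """The same three values computed from counter_rows alone."""
    return (ord("A") + 25 - counter_rows,   # code_char
            counter_rows + 14,              # counter_space
            51 - counter_rows * 2)          # counter_char


for rows, char, space, count in original_updates():
    assert derived(rows) == (char, space, count)
print("formulas match for every row")
```

Since the derived values match on every row, keeping only the row counter in a register loses no information.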

How to change the value of a variable in assembly

I am working on some code for my port of MikeOS. It is written in NASM x86 16 bit assembly. I am trying to change a variable that I made to have a different value. It compiles with no errors, but when I call os_print_string, it prints some weird ASCII characters. Here is the code:
BITS 16
ORG 32768
%INCLUDE "mikedev.inc"
start:
mov si, test2 ; give si test 2 value
mov [test1], si ; give test 1 si's value
mov si, test1 ;now give test1's value to si
call os_print_string ; and print
test2 db "adsfasdfasdf", 0
test1 db "asdf", 0
This code is redundant, I know. I just need an explanation of how to change a variable's value. Thanks in advance!
-Ryan

Another good old question, here is the answer you waited for 6.83 years :)
BITS 16
ORG 32768
%INCLUDE "mikedev.inc"
start:
mov si, test2
mov di, test1
.loop:
lodsb
or al, al
je .done
stosb
jmp .loop
.done:
mov si, test1
call os_print_string
ret ; return to MikeOS instead of running on into the data below
test2 db "adsfasdfasdf", 0
test1 db "asdf        ", 0
Make sure the char arrays have the same length or this will break ^^
But i am sure you know that by now ^^
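Why the lengths have to match can be seen in a small Python model of the copy loop. The bytearray stands in for memory, and the "next data" string after the destination is a made-up neighbour added only to show what gets overwritten. All function names are hypothetical.

```python
def copy_like_answer(memory, src, dst):
    """Model of the lodsb/stosb loop: copy bytes until a 0, without storing the 0."""
    while memory[src] != 0:            # lodsb / or al, al / je .done
        memory[dst] = memory[src]      # stosb
        src += 1
        dst += 1


def c_string(memory, start):
    """Read a zero-terminated string, as a print routine would."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


def demo(dest_text):
    """Lay out source, destination and a neighbour, copy, and read the destination back."""
    source = b"adsfasdfasdf\0"
    memory = bytearray(source + dest_text + b"\0" + b"next data\0")
    dst = len(source)
    copy_like_answer(memory, 0, dst)
    return c_string(memory, dst)


print(demo(b"asdf" + b" " * 8))   # destination padded to the source length
print(demo(b"asdf"))              # destination too short: terminator and neighbour overwritten
```

With the short destination the loop overwrites the terminator and part of the neighbouring data, so the printed string runs on until the next zero byte. Copying the terminating zero as well would fix the missing terminator, but not the overwritten neighbour.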
