I'm trying to make a simple routine for the 8051 processor that allows me to load any 16-bit number of my choice from a table stored in code memory without modifying any part of DPTR and without requiring stack space. So push and pop cannot be used. Also, I want to use the least amount of processing time possible.
So far I came up with the following code that sort-of allows me to load a value from a table of 4 16-bit values to accumulator and R2 where R2 has the high byte and A has the low byte.
Is this the most efficient way to do this? If so, how do I calculate how much to add to the accumulator before each movc instruction in this example?
mov A,#2h ;want 2nd entry from table
acall getpointer ;run function below
;here R2:A should form correct 16-bit pointer ( = 0456h)
END
getpointer:
rl A ;multiply A value * 2
mov R2,A ;copy to R2
inc R2 ;R2=A+1
;add something to A but what?
movc A,@A+PC ;Load first byte
xch A,R2 ;put result in R2 and let A=original A+1
;add something to A again but what?
movc A,@A+PC ;load second byte
ret ;keep result in A and exit
mytable:
dw 0123h
dw 0456h
dw 0789h
dw 0000h
Try this:
getpointer:
rl a
mov r2, a
add a, #5 ; skip the 4 bytes of code after the 1st movc, plus 1 byte into the entry
movc a, @a+pc
xch a, r2 ; 1-byte
inc a ; 1-byte ; skip the 1 byte of code (ret) after the 2nd movc
movc a, @a+pc ; 1-byte
ret ; 1-byte
mytable:
...
I hope I got it right. Note that movc a, @a+pc first increments pc, then adds a to this incremented value. This is why I added instruction lengths in the comments, to show how much code lies between each movc and the table.
Note that index of 2 corresponds to 0789h, not 0456h.
Also note that you may need to swap a and r2, and the cheapest way to do that may be to swap the two bytes of each entry within the table.
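To check the two offsets, here is a minimal Python sketch that models only what matters for the calculation: the byte length of each instruction and the table bytes. It is not an 8051 emulator. The base address 0x100, the helper names, and the assumption that dw stores the high byte first are mine, not taken from the posts above.

```python
# Hypothetical model: only instruction lengths and table bytes matter here.
TABLE = [0x0123, 0x0456, 0x0789, 0x0000]

def build_code(base=0x100):
    """Lay out the routine and the table; return (memory, instruction addresses)."""
    lengths = [("rl a", 1), ("mov r2,a", 1), ("add a,#5", 2),
               ("movc1", 1), ("xch a,r2", 1), ("inc a", 1),
               ("movc2", 1), ("ret", 1)]
    addr, where = base, {}
    for name, size in lengths:
        where[name] = addr
        addr += size
    mem = {}
    for word in TABLE:                     # assumption: dw emits high byte first
        mem[addr] = word >> 8
        mem[addr + 1] = word & 0xFF
        addr += 2
    return mem, where

def getpointer(index):
    """Return (R2, A) after the routine runs with A = index (index < 128)."""
    mem, where = build_code()
    a = (index << 1) & 0xFF                # rl a
    r2 = a                                 # mov r2, a
    a = (a + 5) & 0xFF                     # add a, #5
    a = mem[where["movc1"] + 1 + a]        # PC already points past the movc
    a, r2 = r2, a                          # xch a, r2
    a = (a + 1) & 0xFF                     # inc a
    a = mem[where["movc2"] + 1 + a]        # second movc, PC points at ret
    return r2, a

r2, a = getpointer(2)
print(hex(r2), hex(a))
```

Under those assumptions, index 2 selects the entry 0789h, with its second byte ending up in R2 and its first byte in A, which is why the bytes may need swapping.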
#include "msp430g2553.h" // #include <msp430.h> - can be used as well
;-------------------------------------------------------------------------------
ORG 0x1100h
ID_F DB 0,3,9,1,0,4,9,0
ID_S DB 0,5,1,8,9,3,9,1
Size DW 16
;-------------------------------------------------------------------------------
MODULE PortLeds
PUBLIC First_SW, Second_SW, Third_SW;, Else_SW
;EXTERN ;Delay_Sec
RSEG CODE
;-------------------------------------------------------------------------------
;-------------------------------------------------------------------------------
Third_SW BIT.B #0x04, &P1IN
CLR R10 ; Index register
THIRD_LOOP MOV.B ID_F(R10), R15 ; Counter from 0 to ff-255
MOV.B R15, &P2OUT ; Save value
MOV.B #0x06, R13
Wait3 mov.w #0xFFFF,R14 ; Delay to R14
L3 dec.w R14 ; Decrement R14
jnz L3 ; Delay over?
DEC.B R13
JNZ Wait3
INC R10
CMP Size, R10
JL THIRD_LOOP
For some reason, when I assemble the file containing Third_SW, I get this error:
pre_lab3_function.s43
Error[6]: Bad constant C:\Users\blala\OneDrive\Desktop\lab_3_new\pre_lab3_function.s43 3
Error while running Assembler
It points at the ORG directive. Why is it happening? I have been stuck on this for a day already.
All my other code works fine; only Third_SW, which needs to use ID_F, ID_S and Size, is a problem.
On an 8085 processor, an efficient algorithm for dividing a BCD number by 2 comes in handy when converting BCD to binary representation. You might think of repeated subtraction or multiplying by 0.5; however, these algorithms require lengthy arithmetic.
Therefore, I would like to share with you the following code (in 8085 assembler) that does it more efficiently. The code has been thoroughly tested on GNUSim8085 and ASM80 emulators. If this code was helpful to you, please share your experience with me.
Before running the code, put the BCD in register A. Set the carry flag if there is a remainder to be received from a more significant byte (worth 50). After execution, register A will contain the result. The carry flag is used to pass the remainder, if any, to the next less significant byte.
The algorithm uses DAA instruction after manipulating C and AC flags in a very special way thus taking into account that any remainder passed down to the next nibble (i.e. half-octet) is worth 5 instead of 8.
;Division of BCD by 2 on an 8085 processor
;Set initial values.
;Register A contains a two-digit BCD. Carry flag contains remainder.
stc
cmc
mvi a, 85H
;Do modified decimal adjust before division.
cmc
cma
rar
adc a
cma
daa
cmc
;Divide by 2.
rar
;Save quotient and remainder to registers B and C.
mov b, a
mvi a, 00H
rar
mov c, a
;Continue working on decimal adjust.
mov a, b
sui 33H
mov b, a
mov a, c
ral
mov a, b
hlt
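The carry convention described above (carry in worth 50, carry out holding the remainder) can be modelled with a short Python sketch. This reproduces the net effect of the routine as I read it, not its instruction sequence, and the function names are hypothetical.

```python
def to_bcd(n):
    """Pack a value 0..99 into one BCD byte."""
    return ((n // 10) << 4) | (n % 10)

def halve_bcd_byte(a, carry_in=0):
    """Halve one packed-BCD byte; carry_in is a remainder worth 50."""
    shifted = (carry_in << 7) | (a >> 1)   # plain binary rotate right
    if carry_in:
        shifted -= 0x30                    # high digit received 8, should be 5
    if a & 0x10:
        shifted -= 0x03                    # low digit received 8, should be 5
    return shifted, a & 1                  # (quotient byte, carry out)

def halve_bcd(bcd_bytes):
    """Halve a multi-byte BCD number given most significant byte first."""
    carry, out = 0, []
    for byte in bcd_bytes:
        quotient, carry = halve_bcd_byte(byte, carry)
        out.append(quotient)
    return out, carry                      # carry is the final remainder

print([hex(b) for b in halve_bcd([0x01, 0x23])[0]])
```

Chaining the carry from each byte into the next less significant byte is what lets the same single-byte step handle numbers of any length.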
Suppose a two-digit BCD number is represented as: D7D6D5D4 D3D2D1D0
For division by 2 in binary (or hex), simply shift the number right by one place. If a 1 is shifted out, the remainder is 1, and 0 otherwise. The same applies to two-digit (8-bit) BCD numbers when D4 is 0, i.e. when no set bit crosses from the higher-order four bits into the lower-order four bits. If D4 is 1 (before the shift), then shifting introduces an 8 (1000) into the lower-order four bits, which breaks this process. Observe that in BCD the shift should introduce 10/2 = 5, not 16/2 = 8. Thus we can adjust by subtracting 8 - 5 = 3 from the lower-order four bits, i.e. 03H from the entire number. The following code summarizes this strategy. We assume the accumulator holds the data; after the division the result is kept in the accumulator and the remainder in register B.
MVI B,00H ; remainder = 0
STC
CMC ; clear the carry flag
RAR ; right shift the data
JNC SKIP
INR B ; CY=1 so, remainder = 1
SKIP: MOV D,A ; backup
ANI 08H ; if get D3 after the shift (or D4 before the shift)
MOV A,D ; get the data from backup
JZ FIN ; if D4 before the shift was 0
SUI 03H ; adjustment for the shift
FIN: HLT ; A has the result, B has the remainder
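As a cross-check of the shift-then-subtract-3 strategy, here is a small Python sketch of the same steps for a single byte. The names are hypothetical, and the sketch assumes the input is a valid packed-BCD byte (00H to 99H).

```python
def to_bcd(n):
    """Pack a value 0..99 into one BCD byte."""
    return ((n // 10) << 4) | (n % 10)

def bcd_div2(a):
    """Return (quotient, remainder) for one packed-BCD byte a."""
    remainder = a & 1          # the bit that RAR shifts out into carry
    a >>= 1                    # RAR with carry cleared
    if a & 0x08:               # old D4 landed in D3: worth 8, should be 5
        a -= 0x03
    return a, remainder

print(hex(bcd_div2(0x98)[0]))
```

Running it over every value from 0 to 99 is a cheap way to confirm that the single correction of 03H is enough.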
I have the following data:
A = [a0 a1 a2 a3 a4 a5 .... a24]
B = [b0 b1 b2 b3 b4 b5 .... b24]
which I then want to multiply as follows:
C = A * B' = [a0b0 a1b1 a2b2 ... a24b24]
This clearly involves 25 multiplies.
However, in my scenario, only 5 new values are shifted into A per "loop iteration" (and 5 old values are shifted out of A). Is there any fast way to exploit the fact that data is shifting through A rather than being completely new? Ideally I want to minimize the number of multiplication operations (at a cost of perhaps more additions/subtractions/accumulations). I initially thought a systolic array might help, but it doesn't (I think!?)
Update 1: Note B is fixed for long periods, but can be reprogrammed.
Update 2: the shifting of A is like the following: a[24] <= a[19], a[23] <= a[18]... a[1] <= new01, a[0] <= new00. And so on so forth each clock cycle
Many thanks!
Is there any fast way to exploit the fact that data is shifting through A rather than being completely new?
Even though all you're doing is the shifting and adding new elements to A, the products in C will, in general, all be different since one of the operands will generally change after each iteration. If you have additional information about the way the elements of A or B are structured, you could potentially use that structure to reduce the number of multiplications. Barring any such structural considerations, you will have to compute all 25 products each loop.
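A small Python sketch may make this concrete. It applies the shift from Update 2 and counts how many old products could be reused at their new positions, first for an arbitrary B and then for a B with a period of 5, which is one hypothetical example of exploitable structure. All values and names are made up for illustration.

```python
def shift_in(a, new5):
    """a[5+i] <= a[i]; the five new samples occupy a[0..4]."""
    return list(new5) + a[:-5]

def products(a, b):
    """Element-wise products c[i] = a[i] * b[i]."""
    return [x * y for x, y in zip(a, b)]

def reusable(a, a_next, b):
    """Count positions where the new product equals the old one shifted by 5."""
    c_old, c_new = products(a, b), products(a_next, b)
    return sum(c_new[i + 5] == c_old[i] for i in range(20))

a = list(range(1, 26))
a_next = shift_in(a, [26, 27, 28, 29, 30])

b_general = list(range(101, 126))      # no structure
b_periodic = [3, 5, 7, 11, 13] * 5     # hypothetical: b[i + 5] == b[i]

reusable_general = reusable(a, a_next, b_general)
reusable_periodic = reusable(a, a_next, b_periodic)
print(reusable_general, reusable_periodic)
```

With the unstructured B every surviving element of A meets a different coefficient, so nothing carries over; with the periodic B the old products simply move along with the data and only the 5 new ones need computing.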
Ideally I want to minimize the number of multiplication operations (at a cost of perhaps more additions/subtractions/accumulations).
In theory, you can reduce the number of multiplications to 0 by shifting and adding the array elements to simulate multiplication. In practice, this will be slower than a hardware multiplication so you're better off just using any available hardware-based multiplication unless there's some additional, relevant constraint you haven't mentioned.
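For reference, the shift-and-add approach mentioned above can be sketched in Python as follows. It assumes non-negative integers, and the function name is hypothetical.

```python
def shift_add_mul(x, y):
    """Multiply two non-negative integers using only shifts and additions."""
    result = 0
    while y:
        if y & 1:          # this bit of y contributes x shifted by its position
            result += x
        x <<= 1
        y >>= 1
    return result

print(shift_add_mul(13, 11))
```

Each set bit of the multiplier costs one addition, which is why this usually loses to a hardware multiplier when one is available.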
For the very first 5 data sets you could save up to 50 multiplications, but after that it is a flat road of multiplications, since for every set after the first 5 you need to multiply by new data.
I'll assume all the arrays are initialized to zero.
I don't think those 50 saved multiplications are of much use considering the total number of multiplications.
Still, here is a hint on how to save those 50; maybe you can find an extension to it:
1st data set arrives: multiply the first data set in a with each of the data sets in b. Save all in a, copy only a[0] to a[4] to c. 25 multiplications here.
2nd data set arrives: multiply only a[0] to a[4] (holding the new data) with b[0] to b[4] respectively. Save in a[0] to a[4], copy a[0->9] to c. 5 multiplications here.
3rd data set arrives: multiply a[0] to a[9] with b[0] to b[9] this time and copy the corresponding a[0->14] to c. 10 multiplications here.
4th data set: multiply a[0] to a[14] with the corresponding b and copy the corresponding a[0->19] to c. 15 multiplications here.
5th data set: multiply a[0] to a[19] with the corresponding b and copy the corresponding a[0->24] to c. 20 multiplications here.
Total saved multiplications: 50.
6th data set onward: the usual 25 multiplications each. This is because for each set in array a there is new data available, so multiplication is unavoidable.
Can you add another array D to flag the changed/unchanged values in A? Each time, you check this array to decide whether to do new multiplications or not.
I don't know what all the db, dw, dd, things mean.
I have tried to write this little script that does 1+1, stores it in a variable and then displays the result. Here is my code so far:
.386
.model flat, stdcall
option casemap :none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\masm32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\masm32.lib
.data
num db ? ; set variable . Here is where I don't know what data type to use.
.code
start:
mov eax, 1 ; load 1 into eax register
mov ebx, 1 ; load 1 into ebx register
add eax, ebx ; add registers eax and ebx
push eax ; push eax into the stack
pop num ; pop eax into the variable num (when I tried it, it gave me an error, i think thats because of the data type)
invoke StdOut, addr num ; display num on the console.
invoke ExitProcess ; exit
end start
I need to understand what the db, dw, dd things mean and how they affect variable setting and combining and that sort of thing.
Quick review,
DB - Define Byte. 8 bits
DW - Define Word. Generally 2 bytes on a typical x86 32-bit system
DD - Define double word. Generally 4 bytes on a typical x86 32-bit system
From x86 assembly tutorial,
The pop instruction removes the 4-byte data element from the top of
the hardware-supported stack into the specified operand (i.e. register
or memory location). It first moves the 4 bytes located at memory
location [SP] into the specified register or memory location, and then
increments SP by 4.
Your num is 1 byte. Try declaring it with DD so that it becomes 4 bytes and matches with pop semantics.
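The size mismatch can be illustrated with Python's struct module. This is only a sketch of the byte counts involved, not of what MASM does internally, and the variable names are hypothetical.

```python
import struct

# Storage reserved by each directive, in bytes.
SIZES = {
    "db": struct.calcsize("<B"),
    "dw": struct.calcsize("<H"),
    "dd": struct.calcsize("<I"),
}

# The 4 bytes a 32-bit pop would write for the value 1 + 1.
popped = struct.pack("<I", 1 + 1)

fits_db = len(popped) <= SIZES["db"]
fits_dd = len(popped) <= SIZES["dd"]
print(SIZES, fits_db, fits_dd)
```

A 4-byte pop has nowhere to put three of its bytes in a db variable, whereas a dd variable is exactly the right size.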
The full list is:
DB, DW, DD, DQ, DT, DDQ, and DO (used to declare initialized data in the output file.)
See: http://www.tortall.net/projects/yasm/manual/html/nasm-pseudop.html
They can be invoked in a wide range of ways: (Note: for Visual-Studio - use "h" instead of "0x" syntax - eg: not 0x55 but 55h instead):
db 0x55 ; just the byte 0x55
db 0x55,0x56,0x57 ; three bytes in succession
db 'a',0x55 ; character constants are OK
db 'hello',13,10,'$' ; so are string constants
dw 0x1234 ; 0x34 0x12
dw 'A' ; 0x41 0x00 (it's just a number)
dw 'AB' ; 0x41 0x42 (character constant)
dw 'ABC' ; 0x41 0x42 0x43 0x00 (string)
dd 0x12345678 ; 0x78 0x56 0x34 0x12
dq 0x1122334455667788 ; 0x88 0x77 0x66 0x55 0x44 0x33 0x22 0x11
ddq 0x112233445566778899aabbccddeeff00
; 0x00 0xff 0xee 0xdd 0xcc 0xbb 0xaa 0x99
; 0x88 0x77 0x66 0x55 0x44 0x33 0x22 0x11
do 0x112233445566778899aabbccddeeff00 ; same as previous
dd 1.234567e20 ; floating-point constant
dq 1.234567e20 ; double-precision float
dt 1.234567e20 ; extended-precision float
DT does not accept numeric constants as operands, and DDQ does not accept float constants as operands. Any size larger than DD does not accept strings as operands.
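The little-endian byte orders shown in the comments above can be reproduced with Python's struct module. This is a sketch for checking the layouts, assuming a little-endian x86 target; the variable names are hypothetical.

```python
import struct

dw_1234 = struct.pack("<H", 0x1234)                 # dw 0x1234
dw_a = struct.pack("<H", ord("A"))                  # dw 'A'
dd_val = struct.pack("<I", 0x12345678)              # dd 0x12345678
dq_val = struct.pack("<Q", 0x1122334455667788)      # dq 0x1122334455667788
dd_float = struct.pack("<f", 1.234567e20)           # dd 1.234567e20
dq_float = struct.pack("<d", 1.234567e20)           # dq 1.234567e20

for name, raw in [("dw", dw_1234), ("dd", dd_val), ("dq", dq_val)]:
    print(name, " ".join(f"0x{b:02x}" for b in raw))
```

The same value therefore occupies 2, 4 or 8 bytes depending on the directive, always with the least significant byte at the lowest address.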
I am doing some exercises in assembly language and I found a question about optimization which I can't figure out. Can anyone help me with it?
So the question is to optimize the following assembly code:
----------------------------Example1-------------------------
mov dx, 0 ---> this one I know-> xor dx,dx
----------------------------Example2------------------------
cmp ax, 0
je label
----------------------------Example3-------------------------
mov ax, x
cwd
mov si, 16
idiv si
----> The most I can think of in this example is to substitute the last 2 lines with idiv 16, but I am not sure
----------------------------Example4-------------------------
mov ax, x
mov bx, 7
mul bx
mov t, ax
----------------------------Example5---------------------------
mov si, offset array1
mov di, offset array2
; for i = 0; i < n; ++i
do:
mov bx, [si]
mov [di], bx
add si, 2
add di, 2
loop do
endforloop
For example 2, you should look at the AND or TEST opcodes. Similar to example 1, they allow you to remove the need for a constant.
For example 4, remember that x * 7 is the same as x * (8 - 1) or, expanding that, x * 8 - x. Multiplying by eight can be done with a shift instruction.
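The identity is easy to confirm with a Python sketch that mimics 16-bit register arithmetic by masking to 0xFFFF. The function name is hypothetical.

```python
def times7(x):
    """x * 7 computed as (x << 3) - x, truncated to 16 bits like AX."""
    return ((x << 3) - x) & 0xFFFF

print(times7(1000))
```

Because both forms wrap the same way modulo 65536, the shift-and-subtract version leaves the same low 16 bits as mul does for every 16-bit input.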
For example 5, you'd think Intel would have provided a much simpler way to transfer from SI to DI, since that is the whole reason for their existence. Maybe something like a REPetitive MOVe String Word :-)
For example three, division by a power of two can be implemented as a right shift.
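One caveat worth checking: the original code uses signed division (cwd followed by idiv), and an arithmetic right shift rounds toward negative infinity while idiv rounds toward zero. The Python sketch below compares the two; the helper names are hypothetical, and Python's >> on integers is used as a stand-in for an arithmetic shift.

```python
def idiv_trunc(x, d):
    """Quotient as idiv computes it: rounded toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

def sar(x, n):
    """Arithmetic shift right; Python's >> on ints already behaves this way."""
    return x >> n

for x in (32, 17, -16, -17):
    print(x, idiv_trunc(x, 16), sar(x, 4))
```

The two agree for non-negative values and for exact multiples of 16, so the shift is a drop-in replacement only when x is known to fall in those cases or when the different rounding is acceptable.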
Note that in example 5, the current code fails to initialize CX as needed (and in the optimized version, you'd definitely want to do that too).
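To show why the count register matters, here is a minimal Python model of a repeated word move with the direction flag clear. It is a sketch of the behaviour, not of any real emulator API; the function name and the flat bytearray memory are assumptions.

```python
def rep_movsw(mem, si, di, cx):
    """Copy cx 16-bit words from mem[si:] to mem[di:]; return final (si, di, cx)."""
    while cx:
        mem[di:di + 2] = mem[si:si + 2]   # one movsw
        si += 2
        di += 2
        cx -= 1
    return si, di, cx

mem = bytearray(range(8)) + bytearray(8)   # 4 source words, then 4 empty words
final_regs = rep_movsw(mem, 0, 8, 4)
print(final_regs, list(mem[8:]))
```

The number of words copied comes entirely from cx, so leaving it uninitialized means the copy length is whatever happened to be in the register.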