I am passing three arrays of doubles from Python (3.6.2) into a DLL written in 64-bit NASM (Windows) using ctypes. The pointers to the three arrays arrive in rcx, rdx and r8, and a fourth pointer arrives in r9.
On entry, I store the pointers into three separate arrays, called a_in_data, b_in_data, and c_in_data (the r9 pointer goes into out_array_pointer). The elements of those arrays are (1) pointer, (2) data type and (3) length.
In the area preceded by "Test #1" in the code below we check the value at b_in_data[0] and we get a valid pointer (just remove the comment symbols and jump to the end).
In the area preceded by "Test #2" we check the value at b_in_data[0] and we get zero. Nothing has deliberately written to b_in_data[0] by this point, but somehow it gets set back to zero.
The same happens to c_in_data in the block that follows. For some reason, the first code block (headed by "Extract data type and length") zeroes out the first value in b_in_data and c_in_data.
I have identified the line that is causing the problem; it's followed by the comment "THIS LINE IS THE PROBLEM, BUT IT'S NOT CLEAR WHY."
The Python code is long, but if it helps to reproduce this, please ask and I will post it. Here is the NASM code:
; Header Section
[BITS 64]
export TryThemAll
section .data
a_in_data: dd 0, 0, 0
b_in_data: dd 0, 0, 0
c_in_data: dd 0, 0, 0
out_array_pointer: dd 0
call_var_length: dd 0
section .text
finit
; _________________
TryThemAll:
push rdi
push rbp
push qword rcx
pop qword [a_in_data]
push qword rdx
pop qword [b_in_data]
push qword r8
pop qword [c_in_data]
push qword r9
pop qword [out_array_pointer]
; Test #1
; Now the value at b_in_data[0] is the pointer we just extracted from rdx
;mov rbp,b_in_data
;mov rax,qword [rbp]
;jmp out_here
;_______
; Extract data type and length
mov rdi,[out_array_pointer]
mov rbp,a_in_data
movsd xmm0,qword [rdi] ;Data type for a_in
cvttsd2si rax,xmm0
mov [rbp+8],rax ; THIS LINE IS THE PROBLEM, BUT IT'S NOT CLEAR WHY
movsd xmm0,qword [rdi+8] ;Length for a_in
cvttsd2si rax,xmm0
mov [rbp+16],rax
mov rbp,b_in_data
movsd xmm0,qword [rdi+16] ;Data type for b_in
cvttsd2si rax,xmm0
mov [rbp+8],rax
movsd xmm0,qword [rdi+24] ;Length for b_in
cvttsd2si rax,xmm0
mov [rbp+16],rax
; Test #2
; Now the value at [0] in b_in_data is zero !!!
mov rbp,b_in_data
mov rax,qword [rbp]
jmp out_here
mov rbp,c_in_data
movsd xmm0,qword [rdi+32] ;Data type for c_in
cvttsd2si rax,xmm0
mov [rbp+8],rax
movsd xmm0,qword [rdi+40] ;Length for c_in
cvttsd2si rax,xmm0
mov [rbp+16],rax
;_______
out_here:
pop rbp
pop rdi
ret
Thanks in advance for any help.
The solution to this problem was quite simple. The three arrays a_in_data, b_in_data and c_in_data were defined contiguously in the .data section as "dd", but they should have been defined as "dq" so that each element occupies eight bytes instead of four. With only four bytes reserved per element, each 8-byte store spilled into the neighboring element or array and overwrote part of its value.
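A rough way to see the overlap without an assembler is to model the start of the .data section as a flat byte buffer. The Python sketch below assumes the arrays sit back to back exactly as declared; the pointer value, the data-type value and the helper name simulate are made up for illustration, and only the single store flagged in the question is replayed.

```python
import struct

B_POINTER = 0x000001F2A4B3C5D0   # hypothetical pointer received in rdx
DATA_TYPE = 1                    # hypothetical result of cvttsd2si


def simulate(elem_size):
    """Model a_in_data and b_in_data as adjacent 3-element arrays.

    elem_size is 4 for `dd` and 8 for `dq`. Returns the qword read back
    from b_in_data[0] after the 8-byte store to [a_in_data+8].
    """
    array_bytes = 3 * elem_size              # bytes reserved per array
    a_off, b_off = 0, array_bytes            # the arrays are contiguous
    mem = bytearray(2 * array_bytes + 16)    # padding so qword stores fit

    struct.pack_into("<Q", mem, b_off, B_POINTER)      # pop qword [b_in_data]
    struct.pack_into("<Q", mem, a_off + 8, DATA_TYPE)  # mov [rbp+8],rax

    return struct.unpack_from("<Q", mem, b_off)[0]


print(hex(simulate(4)))  # 0x1f200000000  (dd: low dword of the pointer wiped)
print(hex(simulate(8)))  # 0x1f2a4b3c5d0  (dq: pointer intact)
```

With dd, the store to offset 8 covers bytes 8 to 15, and b_in_data starts at byte 12, so the low half of the pointer is replaced by the zero upper half of the stored value. ctypes assumes a C int return type unless restype is set, so if the Python side relied on that default (an assumption, since the Python code is not shown), Test #2 would see only those zeroed low 32 bits.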
I recently switched from 32-bit MASM to 64-bit NASM and I'm still getting used to NASM syntax and 64-bit assembly programming, so I'm still making some elementary mistakes.
Thanks, Peter, for the time you took on this. You made some other interesting points. For example, I've switched to using lea (load effective address) instead of moving the pointer to rbp (e.g., mov rbp,b_in_data).
Thanks again, and thanks to Michael Petch for adding the other tags.
BTW, these data are all converted to 64-bit integers, so the struc is not necessary -- they are not mixed types.
I wrote an 8086 program, and as far as I can tell it runs fine, but when it gets to the part where I declare the variables, the emulator gives me an error. When trying to run the line temp db 0x0F, the emulator says:
unknown opcode skipped: 32
not 8086 instruction - not supported yet.
Here's my full program:
org 100h
mov ah, temp ;put variables into registers
mov al, changed
mov dx, result
lea bx, temp ;get address of temp and put into bx
add dx, [bx] ;add value at the address in bx to result
lea bx, changed ;get address of changed and put into bx
add dx, [bx] ;add value at the address in bx to result
temp db 0x0F ;declare and initialize variables
changed db 32h
result dw 0
Is this consequential to how the program functions, and how do I fix it?
EDIT: sigjuice solved the problem, as you can see in the comments. Here's the final version of the program that runs correctly:
.CODE
org 100h
mov ah, temp ;put variables into registers
mov al, changed
mov dx, result
lea bx, temp ;get address of temp and put into bx
add dx, [bx] ;add value at the address in bx to result
lea bx, changed ;get address of changed and put into bx
add dx, [bx] ;add value at the address in bx to result
.DATA
temp db 0x0F ;declare and initialize variables
changed db 32h
result dw 0
add dx, [bx] ;add value at the address in bx to result
temp db 0x0F ;declare and initialize variables
In this part of your program there's nothing that stops the CPU from executing the data at the temp label as if it were an instruction.
Although adding the .CODE and .DATA assembler directives (perhaps suggested by @sigjuice) seemingly solves the problem, this is typically not what you use when writing a .COM executable. It's a .COM executable because you used the org 100h directive.
What your program really needs is a way to return to the operating system. Since this is EMU8086 the preferred way is using the DOS.TerminateWithReturncode function.
add dx, [bx] ;add value at the address in bx to result
; Exit to the operating system
mov ax, 4C00h ;AH=4Ch function number, AL=0 exitcode (0 most often means OK)
int 21h ;DOS system call
; Now beyond this point nothing gets executed inadvertently
temp db 0Fh ;declare and initialize variables
I can't really advise returning to the operating system with a mere ret instruction, because this method requires that the SS:SP registers are set as they were when the program started. This will not always be the case. It is better to use this DOS function, which does not rely on any specific register setting.
lea bx, temp ;get address of temp and put into bx
add dx, [bx] ;add value at the address in bx to result
lea bx, changed ;get address of changed and put into bx
add dx, [bx] ;add value at the address in bx to result
Nothing to do with your original problem but as a bonus:
Because temp and changed are both byte-sized variables, the word-sized additions don't add just the variables but also the byte that happens to follow each of them in memory! Sometimes this is intentional (I sincerely doubt that is the case here!), but you need to make sure that you understand this.
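The effect is easy to reproduce on paper. The Python sketch below models the three variables as four consecutive bytes, assuming they are laid out in declaration order with nothing in between; the helper names read_word and read_byte are invented for this illustration.

```python
import struct

# Bytes as laid out by: temp db 0Fh / changed db 32h / result dw 0
memory = bytes([0x0F, 0x32, 0x00, 0x00])
TEMP, CHANGED = 0, 1   # offsets of the two byte variables


def read_word(mem, offset):
    """16-bit little-endian load, as performed by `add dx, [bx]`."""
    return struct.unpack_from("<H", mem, offset)[0]


def read_byte(mem, offset):
    """8-bit load, which is what the byte-sized variables call for."""
    return mem[offset]


# dx starts at 0 (the value of result) and wraps at 16 bits
word_sum = (read_word(memory, TEMP) + read_word(memory, CHANGED)) & 0xFFFF
byte_sum = read_byte(memory, TEMP) + read_byte(memory, CHANGED)

print(hex(word_sum))  # 0x3241
print(hex(byte_sum))  # 0x41
```

The word read at temp picks up changed as its high byte (0x320F), and the word read at changed picks up the low byte of result. In 8086 code, one way to add only the byte is to load it into an 8-bit register and zero the upper half first, for example mov al, [bx] followed by mov ah, 0 and add dx, ax.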
Let's say I have a PROC in my assembly code like so:
.CODE
PROC myProc
MOV EAX, 00000001
MOV EBX, 00001101
RET
ENDP myProc
I want to MOV 1 into the EAX register and 13 into the EBX register in my procedure. However, I want to create two variables local to my PROC, assign var a the value 1 and var b the value 13, and from there MOV [a] into EAX and [b] into EBX. I have had several ideas about this, perhaps creating space on the stack for the variables, or something like:
.CODE
PROC myProc
PUSH ESP
PUSH EBP
MOV ESP, 00000001
MOV EBP, 00001101
MOV EAX, [ESP]
MOV EBX, [EBP]
ENDP myProc
But this still isn't really dynamic variable creation; I am just writing and reading data back and forth between registers. So in essence I am trying to figure out how to create variables in assembly at run time. I would appreciate any help.
Variables are a high-level concept. An asm implementation of a C function will typically have a variable live in a register for some of the time, but maybe at other times it's live in a different register, or in memory at some location once it's no longer needed (or you ran out of registers).
In asm you don't really have variables (other than static storage), except by using comments to keep track of what means what. Just move data around and produce a meaningful result.
Avoid memory whenever possible. Look at C compiler output: any decent compiler will keep everything in registers as much as possible.
int foo(int a, int b) {
int c = a + 2*b;
int d = 2*a + b;
return c + d;
}
This function compiles to the following 32-bit code with gcc6.2 -O3 -fverbose-asm (on the Godbolt compiler explorer). Notice how gcc attaches variable names to registers with comments.
mov ecx, DWORD PTR [esp+4] # a, a
mov edx, DWORD PTR [esp+8] # b, b
lea eax, [ecx+edx*2] # c,
lea edx, [edx+ecx*2] # d,
add eax, edx # tmp94, d
ret
It seems like you're using MASM syntax. The standard MASM approach to create local variables is
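.CODE
myProc PROC
LOCAL a: DWORD
LOCAL b: DWORD
; Initialize those vars
MOV a, 00000001b ; 1 (the b suffix marks a binary literal)
MOV b, 00001101b ; 13
; Load them into the registers
MOV EAX, a
MOV EBX, b
RET
myProc ENDP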
The LOCAL directive creates space on the stack for the variables using EBP relative indexing.
I would like to compute the scalar (dot) product of two vectors using NASM and C, with the convention that the function is only declared in C and implemented in assembly. I have a problem with the program, especially in the assembly file, and I don't know what I am doing wrong. I get syntax errors on lines 27, 28 and 37 of my code, but I don't know why.
These are the errors displayed after the first compilation step, nasm -felf32 pos.asm -o pos.o:
pos.asm:27: error: comma or end of line expected
pos.asm:28: error: expression syntax error
pos.asm:37: error: expression syntax error
assembly code:
; nasm -felf32 pos.asm -o pos.o
; gcc -m32 -o zadanie1c.o -c zadanie1.c
; gcc -m32 zadanie1c.o zadanie1a.o -o zadanie1
%define n qword [ebp+8]
%define zero qword [ebp+12]
%define vect1 qword [ebp+20]
%define vect2 qword [ebp+20+n*8]
%define result qword [ebp+20+n*16]
segment .data
MinusFour dw -4
segment .text
global scalar
scalar:
push ebp
mov ebp, esp
push ebx ;
fld zero ; for safety initialization
mov ecx,n
myLoop:
fld vect1+ecx*8 ; v1
fld vect2+ecx*8 ; v2
fmulp st1 ; v1*v2
faddp st1 ; v1*v2 +
cmp 0,ecx
je end
dec ecx
jmp myLoop
end:
mov ebx, result
fstp qword [ebx]
mov eax, ebx
pop ebx
mov esp, ebp
pop ebp
ret
%defines in NASM work like #defines in C - complete text replacement. So
%define vect1 qword [ebp+20]
%define vect2 qword [ebp+20+n*8]
...
fld vect1+ecx*8 ; v1
fld vect2+ecx*8 ; v2
becomes
fld qword [ebp+20]+ecx*8 ; v1
fld qword [ebp+20+n*8]+ecx*8 ; v2
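which is the invalid syntax the assembler is complaining about: the +ecx*8 ends up outside the brackets. Because n is itself a %define, vect2 and result expand further still, leaving a nested qword [ebp+8] inside the brackets, which is not a valid address expression either.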
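The substitution can be mimicked in a few lines of Python. This is only a rough model, assuming whole-word textual replacement repeated until nothing changes; NASM's real preprocessor works on tokens, and the function name expand is invented here.

```python
import re

# The question's definitions, as name -> replacement text
DEFINES = {
    "n": "qword [ebp+8]",
    "vect1": "qword [ebp+20]",
    "vect2": "qword [ebp+20+n*8]",
}


def expand(line, defines=DEFINES):
    """Substitute whole-word macro names repeatedly until nothing changes."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, defines)) + r")\b")
    while True:
        expanded = pattern.sub(lambda m: defines[m.group(1)], line)
        if expanded == line:
            return expanded
        line = expanded


print(expand("fld vect1+ecx*8"))  # fld qword [ebp+20]+ecx*8
print(expand("fld vect2+ecx*8"))  # fld qword [ebp+20+qword [ebp+8]*8]+ecx*8
```

Neither result is something the assembler can parse. One syntactically valid alternative is to keep the size keyword and brackets out of the macro, for example %define vect1 ebp+20 and then fld qword [vect1+ecx*8]; whether that addresses the intended data depends on how the C side passes its arguments.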
How did I find this, even though I've never used NASM? I read the error and the line number. "Comma or end of line expected" sure sounds like a syntax error. Then I looked at everything in the line it was complaining about. What could be messing up the syntax? Hmm, vect2 isn't an assembly keyword, what is it? A %define... how do they work? Google says... "The definitions work in a similar way to C". Eureka!
This is a threaded-code command, as in Forth: it checks whether 0 is on top of the stack (edi) and, if so, skips the next command by dereferencing the command pointer (ebx).
_ifleap:
mov eax, [edi]
add edi, 4
test eax, eax
cmovz ebx, [ebx]
mov ebx, [ebx]
jmp [ebx + 12]
Is there a way to optimize this? Fewer lines, faster execution, better CPU support?
The idea is to check whether [edi] is zero and, if so, do mov ebx, [ebx]; otherwise do nothing. edi must be incremented by 4 (it is a sort of stack pointer). Of course cmovz is i686 only, but using a label and a branch seems like overkill for this task.
(Yes, I have the x86 instruction set reference, but it is huge and takes a long time to master, and I only use assembly occasionally, so I am looking for expert advice.)
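For readers unfamiliar with threaded code, the intended behavior can be restated as a small Python model. It assumes each command is a node whose first field links to the next command (what [ebx] reads); the Command class and its field names are hypothetical, and the dispatch through [ebx + 12] is left to the caller.

```python
class Command:
    """Hypothetical command node: `next` models the link stored at [ebx]."""

    def __init__(self, name):
        self.name = name
        self.next = None


def ifleap(current, stack):
    """Pop the top of the data stack; on zero, skip one command."""
    top = stack.pop()            # mov eax, [edi] / add edi, 4
    if top == 0:                 # test eax, eax
        current = current.next   # cmovz ebx, [ebx]
    return current.next          # mov ebx, [ebx]


# A three-command chain: first -> second -> third
first, second, third = Command("first"), Command("second"), Command("third")
first.next, second.next = second, third

print(ifleap(first, [0]).name)   # third  (zero on top: one command skipped)
print(ifleap(first, [7]).name)   # second (non-zero: continue normally)
```

Any replacement sequence has to preserve exactly this behavior: one pop, then either one or two link dereferences.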
I want to perform an atomic 'and' operation on IA-32.
Please consider the following situation:
; processor 0
lea edx, var
mov ecx, mask
mov eax, [edx]
lock and [edx], ecx
; processor 1
lea edx, var
mov eax, 0xff
xchg [edx], eax
I'm not sure whether the store to 'var' by processor 1 can occur between the load and the store to 'var' by processor 0.
So, is this working or do I need to spin lock like this:
; processor 0
push ebx
lea edx, var
mov ecx, mask
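@@loop:
mov eax, [edx] ; eax = value currently in var (the initial value we want)
mov ebx, eax
and ebx, ecx ; ebx = new value to store
lock cmpxchg [edx], ebx ; store ebx only if var still equals eax
jne @@loop ; ZF clear: var changed in between, so retry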
pop ebx
Thanks for any answer. Best regards.
EDIT:
In other words:
I want to perform the conjunction in 'Processor 0' and need to fetch the initial value.
An xchg that references memory automatically locks the bus (or locks the cache when/if the data is already in the cache). See the Intel reference manual, §8.3.1. (Warning: I haven't looked hard recently, but Intel used to rearrange their web site, invalidating links fairly quickly. If so, Googling for something like "intel reference 3a" should turn it up.)
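The concern in the question, a store landing between the plain load and the locked AND, can be replayed deterministically. The Python sketch below is a toy model, not a statement about hardware timing: each method of the invented Cell class stands for one instruction that is atomic on its own, and fetch_and is a hypothetical compare-and-swap loop with an optional hook that lets "processor 1" run once in the gap.

```python
class Cell:
    """One shared memory word; each method models a single atomic instruction."""

    def __init__(self, value):
        self.value = value

    def load(self):                    # mov eax, [edx]
        return self.value

    def lock_and(self, mask):          # lock and [edx], ecx
        self.value &= mask

    def xchg(self, new):               # xchg [edx], eax
        old, self.value = self.value, new
        return old

    def cmpxchg(self, expected, new):  # lock cmpxchg [edx], reg (eax = expected)
        old = self.value
        if old == expected:
            self.value = new
        return old


def fetch_and(cell, mask, interfere=None):
    """CAS loop: returns the value the AND was actually applied to."""
    while True:
        expected = cell.load()
        if interfere:                  # let "processor 1" run once in the gap
            interfere()
            interfere = None
        if cell.cmpxchg(expected, expected & mask) == expected:
            return expected


# Separate load, then processor 1's xchg, then the locked AND
var = Cell(0x0F)
seen = var.load()
var.xchg(0xFF)
var.lock_and(0xF0)
print(hex(seen), hex(var.value))   # 0xf 0xf0  (seen is stale)

# Same interference, but with the CAS loop
var = Cell(0x0F)
seen = fetch_and(var, 0xF0, interfere=lambda: var.xchg(0xFF))
print(hex(seen), hex(var.value))   # 0xff 0xf0 (seen matches what was ANDed)
```

In the first replay the AND itself is applied atomically, but the value fetched beforehand no longer describes what it was applied to. The loop retries until the value it read is the one it replaced, which is why it can report the initial value reliably.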