x86-64 AT&T: moving part of a register into another register (indexing)

I'd like to move one byte from register rdx into register rbx, like this:
mov %rdx , (%rbx,%r15,1)
where rdx contains 0x33, r15 is the index, and rbx contains 0 at the start.
I have tried this in many ways, always ending with a SIGSEGV.
In the end I want rbx to hold an array of successive rdx values.

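The parentheses make (%rbx,%r15,1) a memory operand at address rbx + r15*1, so with rbx = 0 the store goes to a low, unmapped address, hence the SIGSEGV. A register cannot be indexed like memory, but you can shift the bytes in one at a time, like this:
# Calculate first dl
...
mov %dl,%bl
# Calculate next dl
...
shlq $8,%rbx
mov %dl,%bl
# Calculate next dl
...
shlq $8,%rbx
mov %dl,%bl
etc. This assumes that you want the first byte in the most significant position and the last byte in the least significant byte. The reverse order is a bit more complicated, but not much.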
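The shift-and-insert sequence above can be modelled outside the assembler to check the byte order. The following is only a sketch in Python, not assembly: the function names pack_msb_first and pack_lsb_first are made up for illustration, and the second function is one possible reading of "the reverse order", not something the answer spells out.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits, like a real register


def pack_msb_first(byte_values):
    """Model of repeating `shlq $8,%rbx` then `mov %dl,%bl` (hypothetical helper)."""
    rbx = 0
    for dl in byte_values:
        rbx = (rbx << 8) & MASK64            # shlq $8, %rbx (a no-op while rbx is 0)
        rbx = (rbx & ~0xFF) | (dl & 0xFF)    # mov %dl, %bl: replace the low byte only
    return rbx


def pack_lsb_first(byte_values):
    """Assumed reverse order: byte number i lands at bit position 8*i."""
    rbx = 0
    for i, dl in enumerate(byte_values):
        rbx |= (dl & 0xFF) << (8 * i)
    return rbx & MASK64


print(hex(pack_msb_first([0x33, 0x44, 0x55])))  # 0x334455
print(hex(pack_lsb_first([0x33, 0x44, 0x55])))  # 0x554433
```

Either way, a 64-bit register only has room for eight bytes; anything beyond that is shifted or masked out.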

Related

mov command with vs. without parenthesis?

I don't understand what is the difference between these two statements
mov [var] , 10
and
mov var,10
in assembly?
For a variable like this:
var: db 0
The instruction mov var,10 would not be allowed by NASM, because in NASM syntax writing var like that (without square brackets) means that you want the address of var as an immediate. And there's no variant of mov that takes an immediate, immediate operand pair.
Adding the square brackets makes it a reference to an address in memory. So mov [var], 10 means store the value 10 at var. Actually you'd have to specify the size of the value to store as well, e.g. mov byte [var], 10. Otherwise NASM doesn't know if you want to store a byte, a word, or a dword, because the immediate 10 could be represented in any of those sizes.
Note that in MASM/TASM syntax mov var, 10 and mov [var], 10 would mean the same thing in this case (they would both have the same meaning as mov [var], 10 in NASM syntax).

How do I print out a value in assembly language

I'm trying to print int "1" from a variable in LC3
I have:
COUNTER .FILL #1
LD R1, COUNTER
PUTC
but this prints "'0" (apostrophe zero)
To print in lc3, there are two easy system routines available to use.
1) PUTS - "Write a string of ASCII characters to the console display. The characters are contained in consecutive memory locations, one character per memory location, starting with the address specified in R0. Writing terminates with the occurrence of x0000 in a memory location."*
2) OUT - "Write a character in R0[7:0] to the console display."*
Since you're just printing one character, you can use the OUT routine like so:
COUNTER .FILL #1
LD R0, COUNTER
OUT
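Note the register is R0, not R1 like you had. Also note that OUT writes the ASCII character whose code is in R0, so the value 1 comes out as a control character; to display the digit "1", add the ASCII offset x30 to the value first (or store x31 in COUNTER).
You could also use PUTS here, with R0 holding the address of the character rather than its value, but PUTS keeps printing until it finds x0000 in memory. So for one character, using OUT is safer.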
*See http://highered.mcgraw-hill.com/sites/dl/free/0072467509/104653/PattPatelAppA.pdf
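OUT sends the character code in R0, so a small integer has to be turned into its ASCII digit before printing. The arithmetic is sketched below in Python rather than LC-3; digit_to_ascii is a hypothetical helper name, and only single digits 0 to 9 are handled.

```python
ASCII_ZERO = 0x30  # character code of '0' (x30 in LC-3 notation)


def digit_to_ascii(value):
    """Return the character code for a single decimal digit (hypothetical helper)."""
    if not 0 <= value <= 9:
        raise ValueError("only single digits 0..9 map to one ASCII character")
    return value + ASCII_ZERO


print(chr(digit_to_ascii(1)))  # 1
print(repr(chr(1)))            # '\x01'
```

In LC-3 itself the offset would probably have to come from memory (for example a second .FILL), because the immediate field of ADD is only 5 bits wide and cannot hold x30.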

Using a Variable to Point to a Specific Character in a String

I am working on a program where I need to be able to change characters at specific places within a string as the user moves through it, and I'd like to use a variable to store the user's position within the string.
While working on other parts of the program, I temporarily used the code below, where buffer is my string:
mov eax, buffer
mov byte [eax + 14], '#'
In the finished program, I'd like to use something like:
mov byte [eax + position], '#'
However, when I use the line above, with position set to 14, I get a segmentation fault. How can I use a variable to point to a specific spot in the string?
EDIT: The position variable is set as follows:
segment .data
position db 14
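Okay... [eax + position] is eax plus the address of position, not its value. You want [eax + [position]], and there is no such addressing mode. Load the value into a register first, with something like mov ecx, [position]. Make position dd, not db, because mov ecx, [position] reads four bytes! Then mov byte [eax + ecx], '#' should work.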
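The difference between adding the address of position and adding its value can be seen in a toy memory model. This is a Python sketch only: the addresses 0x100 and 0x200, the size of the address space, and the store_byte helper are all invented for illustration.

```python
memory = bytearray(0x300)   # hypothetical flat address space
BUFFER_ADDR = 0x100         # assumed address of `buffer`
POSITION_ADDR = 0x200       # assumed address of `position`

memory[BUFFER_ADDR:BUFFER_ADDR + 20] = b"." * 20
memory[POSITION_ADDR:POSITION_ADDR + 4] = (14).to_bytes(4, "little")  # position dd 14


def store_byte(addr, value):
    """Write one byte, failing on addresses outside the toy address space."""
    if not 0 <= addr < len(memory):
        raise MemoryError(f"segmentation fault at {addr:#x}")
    memory[addr] = value


eax = BUFFER_ADDR  # mov eax, buffer

# mov byte [eax + position], '#': adds the ADDRESS of position
try:
    store_byte(eax + POSITION_ADDR, ord("#"))
    faulted = False
except MemoryError:
    faulted = True

# mov ecx, [position] then mov byte [eax + ecx], '#': adds the VALUE
ecx = int.from_bytes(memory[POSITION_ADDR:POSITION_ADDR + 4], "little")
store_byte(eax + ecx, ord("#"))

print(faulted, ecx, memory[BUFFER_ADDR:BUFFER_ADDR + 20].decode())
```

In a real process the bad address may or may not fault depending on what happens to be mapped there, which is why this kind of bug can also silently corrupt memory.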

In IDA Pro, what does ':lower16:' in the second ARM operand do?

E.g., I have this:
MOVW R1, #(:lower16:(selRef_stringWithUTF8String_ - 0xbeee))
MOV R6, R0
MOVT.W R1, #(:upper16:(selRef_stringWithUTF8String_ - 0xbeee))
There is :lower16: and :upper16: before the address in the operand. I presume it's because the code is in Thumb mode and the pointer to the string is too large for one instruction, so it is loaded as lower and upper portions? Please advise.
It is just as you guessed. In Thumb-2, a 32-bit constant is often loaded with a MOVW/MOVT instruction pair, e.g.
MOVW R1, #0x1234 ; Set the low 16 bits of R1 (the top 16 are zeroed). R1 is now 0x1234
MOVT.W R1, #0x5678 ; Set the top 16 bits of R1. R1 is now 0x56781234.
IDA Pro recognized that the combined immediate value matches the address of a selector, and uses the :lower16: and :upper16: syntax to indicate that the value is split into two 16-bit parts.
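How the two halves recombine can be checked with a few lines of bit arithmetic. This is a Python model, not ARM code; the function names movw, movt, lower16 and upper16 simply mirror the mnemonics and operators above.

```python
def lower16(value):
    """Model of the :lower16: operator: bits 0..15 of a 32-bit value."""
    return value & 0xFFFF


def upper16(value):
    """Model of the :upper16: operator: bits 16..31 of a 32-bit value."""
    return (value >> 16) & 0xFFFF


def movw(imm16):
    """MOVW: write a 16-bit immediate, zeroing the top half of the register."""
    return imm16 & 0xFFFF


def movt(reg, imm16):
    """MOVT: replace the top 16 bits, keeping the bottom 16 bits of the register."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)


value = 0x56781234
r1 = movw(lower16(value))
r1 = movt(r1, upper16(value))
print(hex(r1))  # 0x56781234
```

The order matters in this model just as on the CPU: running movt before movw would lose the top half, because movw zeroes it.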

Want to understand load imputed by floating point instructions

At the outset, this may be a part-discussion, part-problem-solving kind of question. No intent to offend anyone.
I have written, in 64-bit assembly, an MT Prime based 64-bit random number generator. The generator function has to be called 8 billion times to populate an array of size 2048x2048x2048, each time producing a random number in 1..small_value (usually 32).
I had two possible next steps:
(a) Keep generating numbers, compare with the limits [1..32] and discard those that don't fall within. The run time for this logic is 181,817 ms, measured by calling clock() function.
(b) Take the 64-bit random number output in RAX, scale it using the FPU to be between [0..1], and then scale it up to the desired range [1..32]. The code sequence for this is as below:
mov word ptr initialize_random_number_scaling,dx
fnclex ; clears status flag
call generate_fp_random_number ; returns a random number in ST(0) between [0..1]
fimul word ptr initialize_random_number_scaling ; Mults ST(0) & stores back in ST(0)
mov word ptr initialize_random_number_base,ax ; Saves base to a memory
fiadd word ptr initialize_random_number_base ; adds the base to the scaled fp number
frndint ; rounds off the ST(0)
fist word ptr initialize_random_number_result ; and stores this number to result.
ffree st(0) ; releases ST(0)
fincstp ; Logically pops the FPU
mov ax, word ptr initialize_random_number_result ; and saves it to AX
And the instructions in generate_fp_random_number are as below :
shl rax,1 ; RAX gets the original 64 bit random number using MT prime algorithm
shr ax,1 ; Clear top bit
mov qword ptr random_number_generator_act_number,rax ; Save the number in memory as we cannot move to ST(0) a number from register
fild qword ptr random_number_generator_max_number ; Load 0x7FFFFFFFFFFFFFFFH
fild qword ptr random_number_generator_act_number ; Load our number
fdiv st(0),st(1) ; We return the value through ST(0) itself, divide our random number with max possible number
fabs
ffree st(1) ; release the st(1)
fld1 ; push to top of stack a 1.0
fcomip st(0), st(1) ; compares our number in ST(1) with ST(0) and sets CF.
jc generate_fp_random_get_next_no ; if ST(0) (=1.0) < ST(1) (our no), we need a new no
fldz ; push to top of stack a 0.0
fcomip st(0),st(1) ; if ST(0) (=0.0) >ST(1) (our no) clears CF
jnc generate_fp_random_get_next_no ; so if the number is above zero the CF will be set
fclex
The problem is, just by adding these instructions, the run time jumps to a whopping 5,633,963 ms! I have written the above using xmm registers as an alternative, and the difference is absolutely marginal. (5,633,703 ms).
Would anyone kindly guide me on how much load these additional instructions add to the total run time? Is the FPU really this slow? Or am I missing a trick? As always, all ideas are welcome, and I am grateful for your time and effort.
Env : Windows 7 64 bit on Intel 2700K CPU overclocked to 4.4 GHz 16 GB RAM debugged in VS 2012 Express environment
"mov word ptr initialize_random_number_base,ax ; Saves base to a memory"
If you want the max speed, you must find out how to keep the instructions and the data they write in different sections of memory.
Rewriting data in the same area of cache as the code creates a "self-modifying code" situation.
Your compiler may do this; it may not.
You need to know this because assembly code that falls into this trap can run 10 to 50 times slower.
"All modern processors cache code and data memory for efficiency. Performance of assembly-language code can be seriously impaired if data is written to the same block of memory as that in which the code is executing, because it may cause the CPU repeatedly to reload the instruction cache (this is to ensure self-modifying-code works correctly). To avoid this, you should ensure that code and (writable) data do not occupy the same 2 Kbyte block of memory. "
http://www.bbcbasic.co.uk/bbcwin/manual/bbcwina.html#cache
There's a ton of stuff in your code that I can see no reason for. If there was a reason, feel free to correct me, but otherwise here are my alternatives:
For generate_fp_random_number
shl rax, 1 ; shift bit 63 out
shr rax, 1 ; shift a zero back in: top bit cleared (rax, not ax, so all 64 bits move)
mov qword ptr act_number, rax
fild qword ptr max_number
fild qword ptr act_number
fdivrp ; divide actual by max and pop
; and that's it. It's already within bounds.
; It can't be outside [0, 1] by construction.
; It can't be < 0 because we just divided two non-negative numbers,
; and it can't be > 1 because we divided by the max it could be
For the other thing:
mov word ptr scaling, dx
mov word ptr base, ax
call generate_fp_random_number
fimul word ptr scaling
fiadd word ptr base
fistp word ptr result ; just save that thing
mov ax, word ptr result
; the default rounding mode is round to nearest,
; so the slow frndint is unnecessary
Also note the complete lack of ffree's etc. By making the right instruction pop, it all just worked out. It usually does.
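The arithmetic performed by the shortened sequence can be checked outside the FPU. The sketch below is Python, not x87 code: scaled_random is a hypothetical name, Python floats are 64-bit doubles rather than the FPU's 80-bit format, and Python's round() is assumed to stand in for the default round-to-nearest-even mode of fistp.

```python
MAX_NUMBER = 0x7FFFFFFFFFFFFFFF  # the constant loaded by fild max_number


def scaled_random(raw64, scaling, base):
    """Model of: clear bit 63, divide by max, fimul scaling, fiadd base, fistp."""
    act = raw64 & MAX_NUMBER             # shl rax,1 / shr rax,1 clears the top bit
    unit = act / MAX_NUMBER              # fdivrp: actual / max, always in [0, 1]
    return round(unit * scaling + base)  # fimul, fiadd, then fistp rounds to nearest


print(scaled_random(0, 31, 1))                   # 1
print(scaled_random(0xFFFFFFFFFFFFFFFF, 31, 1))  # 32
```

With scaling = 31 and base = 1 every result lies in 1..32, so no range check is needed. One thing to keep in mind with rounding to nearest is that the two end values (1 and 32 here) each cover only half as wide an interval as the values in between, so they come up about half as often.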