what is the equivalent of an if-statement in ARM?

So, I am working on a program in ARM that takes a bunch of numbers from a file and tells whether they are even or odd.
The problem is that I know how to multiply by 0.5, but I don't know how to do something like this high-level statement in ARM:
if (A / 2 == 0)
print even
else
print odd
Here's what I have in terms of code:
#open input file
ldr r0,=FileName # set Name for input file
mov r1,#0 # mode is input
swi SWI_Open # open file for input
bcs InFileError # if error?
ldr r1,=InFileHandle # load input file handle
str r0,[r1] # save the file handle
#read integers from input file
NUMBERS:
ldr r0,=InputFileHandle # load input file handle
ldr r0,[r0]
swi SWI_RdInt # read the integer into R0
bcs EofReached # Check Carry-Bit (C): if= 1 then EOF reached
#multiplication by 0.5 to test for odd or even
MUL R2 R0 0.5
#what is the test in ARM
#for ( R0 / 0.5 ) == a multiple of 1?
B NUMBERS
LOOP:
#end of program
Message1: .asciz"Hello World!"
EOL: .asciz "\n"
NewL: .ascii "\n"
Blank: .ascii " "
FileName: .asciz"input.txt"
.end
So I think the first two parts, opening the input file and reading the integers, work. What I don't know is how to test that a number is divisible by 2. My idea was to multiply by 0.5 and then say the number is even if the result has nothing after the decimal point, and odd if it does. Is that the right approach?

A brief answer: you don't need to multiply by 0.5 or anything like that. You need to check the least significant bit (LSB) of the value: it will be 0 for even numbers and 1 for odd numbers.
Upd.: your "C" code is also wrong. You want to test A % 2, not A / 2.
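In ARM the idiomatic test is to check that bottom bit and branch on the flags. A minimal sketch (EVEN and ODD are placeholder labels for whatever printing code you write; @ introduces a comment), assuming the integer you just read is still in R0:
TST R0, #1      @ set flags from R0 AND 1, i.e. test the least significant bit
BEQ EVEN        @ bit was 0, so the number is even
B   ODD         @ bit was 1, so the number is odd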

How to plot a sum with Maple?

I would like to plot the following expression with Maple:
> I_n := Sum((H_(k+1)*H_(n-k+1))/(k+2), k = 0..n);
where
> H_n := sum(1/k, k = 1..n);
My work:
>f:=n->sum(1/k,k=1..n);
I_n:=sum(f(j+1)*f(n-j+1)/(j+2), j = 0 .. n);
But I do not see how I can draw this.
Thank you for your help.
plot(I_n, n= 0..20);
or change 20 to any other upper limit.
It makes a big difference to the result whether you allow n to take on non-integer values (e.g. n = 10.23).
You originally wrote Sum when defining I_n, but then your code fragment had lowercase sum. You should be careful about which one you use, because it will affect what happens when plot tries to use non-integer values of n. (Using Sum without any rounding call on n will also risk generating an empty plot, since evalf/Sum will baulk at those values and you can get an empty plot by accident.)
Compare all these, and especially note the n that appear outside the summation (as a partial result) when using sum to define I_n.
It's up to you to figure out whether you wanted n to be purely integer-valued, and then choose the plotting method accordingly.
f:=n->sum(1/k,k=1..n):
I_n:=Sum(f(j+1)*f(n-j+1)/(j+2), j = 0 .. n);
sum(f(j+1)*f(n-j+1)/(j+2), j = 0 .. n); # Note the `n` outside the sum.
value(I_n); # As if I_n:=sum(...) had been used. Note the `n` outside the sum.
plot(value(I_n), n= 0..20); # also what you'd get if you plotted I_N:=sum(...)
plot(subs(n=floor(n),I_n), n=0..20); # Step function. Could also try with round().
plot(I_n, n=0..20); # Empty plot since I_n=Sum(...) used without rounding `n`.
plots:-pointplot([seq([n,I_n],n=0..20)]); # use style=line option to join the points
My main point is that the result of executing sum(f(j+1)*f(n-j+1)/(j+2), j = 0 .. n) may well not be something you intended to plot when n is not an integer. If so, you should account for that when plotting it.
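If you do want the points joined up, the style=line option suggested in the comment above slots straight into that call (a sketch of the same pointplot):
plots:-pointplot([seq([n, I_n], n = 0..20)], style=line); # joins the plotted points with line segments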

The xv6-rev7 (JOS) GDT

It's very difficult for me to understand the GDT (Global Descriptor Table) in JOS (xv6-rev7).
For example
.word (((lim) >> 12) & 0xffff), ((base) & 0xffff);
Why shift right 12? Why AND 0xffff?
What do these numbers mean?
What does the formula mean?
Can anyone give me some resources or tutorials or hints?
Here are the two snippets of code relevant to my problem.
1st Part
0654 #define SEG_NULLASM \
0655 .word 0, 0; \
0656 .byte 0, 0, 0, 0
0657
0658 // The 0xC0 means the limit is in 4096-byte units
0659 // and (for executable segments) 32−bit mode.
0660 #define SEG_ASM(type,base,lim) \
0661 .word (((lim) >> 12) & 0xffff), ((base) & 0xffff); \
0662 .byte (((base) >> 16) & 0xff), (0x90 | (type)), \
0663 (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)
0664
0665 #define STA_X 0x8 // Executable segment
0666 #define STA_E 0x4 // Expand down (non-executable segments)
0667 #define STA_C 0x4 // Conforming code segment (executable only)
0668 #define STA_W 0x2 // Writeable (non-executable segments)
0669 #define STA_R 0x2 // Readable (executable segments)
0670 #define STA_A 0x1 // Accessed
2nd Part
8480 # Bootstrap GDT
8481 .p2align 2 # force 4 byte alignment
8482 gdt:
8483 SEG_NULLASM # null seg
8484 SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff) # code seg
8485 SEG_ASM(STA_W, 0x0, 0xffffffff) # data seg
8486
8487 gdtdesc:
8488 .word (gdtdesc - gdt - 1) # sizeof(gdt) - 1
8489 .long gdt # address gdt
The complete part: http://pdos.csail.mit.edu/6.828/2012/xv6/xv6-rev7.pdf
Well, it isn't really a formula at all. The limit is shifted right by twelve bits, which is equivalent to dividing it by 2^12 = 4096, and that is the granularity of a GDT entry when the G bit is set (in your code the G bit is encoded in the constants used by the macro). Whenever an address is accessed through the corresponding selector, only its upper 20 bits are compared with the limit, and if they are greater a #GP fault is raised. Also note that standard pages are 4 KB in size, so any address that exceeds the limit by less than 4 KB is still covered by the page at the selector limit. The ANDing is there partly to suppress assembler warnings about overflow, since 0xFFFF is the maximum value for a single word (16 bits).
The same applies to the other shifts and ANDs: in the other expressions the value is shifted further to extract the other parts.
The structure of a GDT descriptor is shown above.
((lim) >> 12) & 0xffff corresponds to Segment Limit (bits 0-15). The right shift means the minimal unit is 2^12 bytes (the granularity of the GDT entry); & 0xffff keeps the lower 16 bits of (lim) >> 12, which fit into the lowest 16 bits of the GDT descriptor.
The rest of the 'formula' is the same.
Here is some good material for learning about GDT descriptors.
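To see what the macro actually emits, it can help to expand one entry by hand. The little C sketch below (not part of xv6, just an illustration) applies the same shifts and masks to the code-segment entry SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff) from the bootstrap GDT above and prints the resulting words and bytes:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t type = 0x8 | 0x2;        /* STA_X | STA_R                    */
    uint32_t base = 0x0;
    uint32_t lim  = 0xffffffff;

    uint16_t limit_low = (lim >> 12) & 0xffff;        /* segment limit, bits 0-15 */
    uint16_t base_low  = base & 0xffff;               /* base address, bits 0-15  */
    uint8_t  base_mid  = (base >> 16) & 0xff;         /* base address, bits 16-23 */
    uint8_t  access    = 0x90 | type;                 /* present + type bits      */
    uint8_t  gran_lim  = 0xC0 | ((lim >> 28) & 0xf);  /* G, 32-bit, limit 16-19   */
    uint8_t  base_high = (base >> 24) & 0xff;         /* base address, bits 24-31 */

    printf(".word 0x%04x, 0x%04x\n", limit_low, base_low);       /* 0xffff, 0x0000 */
    printf(".byte 0x%02x, 0x%02x, 0x%02x, 0x%02x\n",
           base_mid, access, gran_lim, base_high);               /* 00, 9a, cf, 00 */
    return 0;
}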

What causes a nan error result?

I have this chunk of code which runs through a loop. The lower case x var always prints correctly. The upper case X var sometimes prints correctly, sometimes prints nan or junk. Why?
N.B. The data is always identical.
Link to FFT
Link to FFT example usage
Link to my other SO question which shows how this is being used. BOUNTY OF 200 points!
double (*x)[2];
double (*X)[2];
x = malloc(2 * 512 * sizeof(double));
X = malloc(2 * 512 * sizeof(double));
for (j = 0; j < 10; j++){
    (*x)[j] = // values inserted from method argument.;
}
fft(512, x, X);
for (j = 0; j < 512; j++){
    if (i==512*20) {
        NSLog(@"PRE POST %f - %f",(*x)[j], (*X)[j]);
    }
}
free(x);
free(X);
In floating point arithmetic, there are several operations that will result in a NaN error. Wikipedia points out these operations as resulting in a NaN:
The divisions 0/0 and ±∞/±∞
The multiplications 0×±∞ and ±∞×0
The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions
(These are called indeterminate forms.)
Check your code to see if you're performing any operations that can't have a numeric answer.
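If it helps to see these concretely, here's a tiny standalone C sketch (plain C99, nothing Objective-C-specific) showing two of the indeterminate forms producing NaN, with isnan() used to detect it; you could loop over your X array with isnan() the same way to find the first bin that goes bad:
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 0.0 / 0.0;             /* 0/0: indeterminate form -> NaN    */
    double b = INFINITY - INFINITY;   /* inf + (-inf): also produces NaN   */

    printf("a: %f (%s)\n", a, isnan(a) ? "NaN" : "a number");
    printf("b: %f (%s)\n", b, isnan(b) ? "NaN" : "a number");
    return 0;
}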
As for the 'junk' results, they may be the result of messed up memory allocation, but you haven't given much detail so I can't be sure.
In other languages I have worked in, "NaN" (not a number) is what you get when you divide 0.0 by 0.0. I don't know Objective-C, but it's likely the same.
As for what is causing nan to be stored in X... you'll have to show us the body of fft before anyone can answer that. You said you think it might be a memory/pointer bug, because it's not consistent. I just looked up how NaN is represented in the IEEE 754 floating-point format (which your platform is almost certainly using) -- basically, the high-order bits that normally hold the exponent all have to be 1s, with a non-zero fraction.
If you do have a memory corruption bug that is causing junk to be stored into X, then whenever those particular bits happen to all be 1s, the number will print as "nan".
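To illustrate that bit-pattern point, here's a small C sketch (assuming a 64-bit IEEE 754 double, which is what iOS uses) that builds a NaN by hand:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* Exponent bits all 1s plus a non-zero fraction = NaN (a quiet NaN here). */
    uint64_t bits = 0x7ff8000000000000ULL;
    double d;
    memcpy(&d, &bits, sizeof d);   /* reinterpret the raw bit pattern */
    printf("%f\n", d);             /* prints nan                      */
    return 0;
}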
Again, please show the body of fft, so someone can try to help you further.
I tried running this - initialised the data using (*x)[j] = j and removed the i==512*20 printing condition. All values came back fine. I also tried with random input data - still good. What is the nature of your input data?
(I'll look at your other question as well)
edit: I should point out I filled 512 values of the x array - your loop above only fills 10, so much of the input array is uninitialised.
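For reference, here's roughly how I set up my test (a sketch only: the fft signature is assumed from your call fft(512, x, X), and I'm treating x and X as 512 {real, imaginary} pairs indexed as x[j][0] and x[j][1]):
#include <stdlib.h>

/* Assumed from the question's call; check it against the linked FFT code. */
extern void fft(int n, double (*in)[2], double (*out)[2]);

void run_fft_test(void) {
    double (*x)[2] = malloc(512 * sizeof *x);   /* 512 {re, im} input pairs  */
    double (*X)[2] = malloc(512 * sizeof *X);   /* 512 {re, im} output pairs */

    for (int j = 0; j < 512; j++) {             /* fill ALL 512 inputs       */
        x[j][0] = (double)j;                    /* real part: simple ramp    */
        x[j][1] = 0.0;                          /* imaginary part            */
    }

    fft(512, x, X);

    free(x);
    free(X);
}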

add vs mul (IA32-Assembly)

I know that add is faster than mul.
I want to know how to use add instead of mul in the following code in order to make it more efficient.
Sample code:
mov eax, [ebp + 8] #eax = x1
mov ecx, [ebp + 12] #ecx = x2
mov edx, [ebp + 16] #edx = y1
mov ebx, [ebp + 20] #ebx = y2
sub eax,ecx #eax = x1-x2
sub edx,ebx #edx = y1-y2
mul edx #eax = (x1-x2)*(y1-y2)
add is faster than mul, but if you want to multiply two general values, mul is far faster than any loop iterating add operations.
You can't seriously use add to make that code go faster than it will with mul. If you needed to multiply by some small constant value (such as 2), then maybe you could use add to speed things up. But for the general case - no.
If you are multiplying two values that you don't know in advance, it is effectively impossible to beat the multiply instruction in x86 assembler.
If you know the value of one of the operands in advance, you may be able to beat the multiply instruction by using a small number of adds. This works particularly well when the known operand is small and has only a few bits set in its binary representation. To multiply an unknown value x by a known value consisting of 2^p + 2^q + ... + 2^r, you simply add x*2^p + x*2^q + ... + x*2^r, one term for each of the set bits p, q, ..., r. This is easily accomplished in assembler by left shifting and adding:
; x in EDX
; product to EAX
xor eax,eax
shl edx,r ; x*2^r
add eax,edx
shl edx,q-r ; x*2^q
add eax,edx
shl edx,p-q ; x*2^p
add eax,edx
The key problem with this is that it takes at least 4 clocks, assuming a superscalar CPU constrained by register dependencies. Multiply typically takes 10 or fewer clocks on modern CPUs, and if this sequence gets longer than that in time, you might as well do a multiply.
To multiply by 9:
mov eax,edx ; same effect as xor eax,eax/shl edx 1/add eax,edx
shl edx,3 ; x*2^3
add eax,edx
This beats multiply; should only take 2 clocks.
What is less well known is the use of the LEA (load effective address) instruction to accomplish a fast multiply-by-small-constant. LEA takes only a single clock worst case, and its execution can often be overlapped with other instructions by superscalar CPUs.
LEA is essentially "add two values with small constant multipliers". It computes t = 2^k*x + y for k = 1,2,3 (see the Intel reference manual), with t, x and y being any registers. If x == y, you can get 1, 2, 3, 4, 5, 8 or 9 times x, but using x and y as separate registers allows intermediate results to be combined and moved to other registers (e.g., to t), and this turns out to be remarkably handy.
Using it, you can accomplish a multiply by 9 using a single instruction:
lea eax,[edx*8+edx] ; takes 1 clock
Using LEA carefully, you can multiply by a variety of peculiar constants in a small number of cycles:
lea eax,[edx*4+edx] ; 5 * edx
lea eax,[eax*2+edx] ; 11 * edx
lea eax,[eax*4] ; 44 * edx
To do this, you have to decompose your constant multiplier into various factors/sums involving 1, 2, 3, 4, 5, 8 and 9. It is remarkable how many small constants you can do this for, and still only use 3-4 instructions.
If you allow the use of other typically single-clock instructions (e.g., SHL/SUB/NEG/MOV) you can multiply by some constant values that pure LEA can't do as efficiently by itself. To multiply by 31:
lea eax,[4*edx]
lea eax,[8*eax] ; 32*edx
sub eax,edx; 31*edx ; 3 clocks
The corresponding LEA sequence is longer:
lea eax,[edx*4+edx]
lea eax,[edx*2+eax] ; eax*7
lea eax,[eax*2+edx] ; eax*15
lea eax,[eax*2+edx] ; eax*31 ; 4 clocks
Figuring out these sequences is a bit tricky, but you can set up an organized attack. Since LEA, SHL, SUB, NEG and MOV are all single-clock instructions worst case, and zero clocks if they have no dependences on other instructions, you can compute the execution cost of any such sequence. This means you can implement a dynamic programming algorithm to generate the best possible sequence of such instructions.
This is only useful if the clock count is smaller than the integer multiply for your particular CPU (I use 5 clocks as a rule of thumb), and it doesn't use up all the registers, or at least it doesn't use up registers that are already busy (avoiding any spills).
I've actually built this into our PARLANSE compiler, and it is very effective for computing offsets into arrays of structures A[i], where the size of the structure element in A is a known constant. A clever person would possibly cache the answer so it doesn't have to be recomputed each time the same constant multiplier occurs; I didn't actually do that because the time to generate such sequences is less than you'd expect.
It is mildly interesting to print out the sequences of instructions needed to multiply by all constants from 1 to 10000. Most of them can be done in 5-6 instructions worst case. As a consequence, the PARLANSE compiler hardly ever uses an actual multiply when indexing even the nastiest arrays of nested structures.
Unless your multiplications are fairly simplistic, the add most likely won't outperform a mul. Having said that, you can use add to do multiplications:
Multiply by 2:
add eax,eax ; x2
Multiply by 4:
add eax,eax ; x2
add eax,eax ; x4
Multiply by 8:
add eax,eax ; x2
add eax,eax ; x4
add eax,eax ; x8
They work nicely for powers of two. I'm not saying they're faster. They were certainly necessary in the days before fancy multiplication instructions. That's from someone whose soul was forged in the hell-fires that were the Mostek 6502, Zilog z80 and RCA1802 :-)
You can even multiply by non-powers by simply storing interim results:
Multiply by 9:
push ebx ; preserve
push eax ; save for later
add eax,eax ; x2
add eax,eax ; x4
add eax,eax ; x8
pop ebx ; get original eax into ebx
add eax,ebx ; x9
pop ebx ; recover original ebx
I generally suggest that you write your code primarily for readability and only worry about performance when you need it. However, if you're working in assembler, you may well already be at that point. But I'm not sure my "solution" is really applicable to your situation, since you have an arbitrary multiplicand.
You should, however, always profile your code in the target environment to ensure that what you're doing is actually faster. Assembler doesn't change that aspect of optimisation at all.
If you really want to see some more general purpose assembler for using add to do multiplication, here's a routine that will take two unsigned values in ax and bx and return the product in ax. It will not handle overflow elegantly.
START: MOV AX, 0007 ; Load up registers
MOV BX, 0005
CALL MULT ; Call multiply function.
HLT ; Stop.
MULT: PUSH BX ; Preserve BX, CX, DX.
PUSH CX
PUSH DX
XOR CX,CX ; CX is the accumulator.
CMP BX, 0 ; If multiplying by zero, just stop.
JZ FIN
MORE: PUSH BX ; Xfer BX to DX for bit check.
POP DX
AND DX, 0001 ; Is lowest bit 1?
JZ NOADD ; No, do not add.
ADD CX,AX
NOADD: SHL AX,1 ; Shift AX left (double).
SHR BX,1 ; Shift BX right (integer halve, next bit).
JNZ MORE ; Keep going until no more bits in BX.
FIN: PUSH CX ; Xfer product from CX to AX.
POP AX
POP DX ; Restore registers and return.
POP CX
POP BX
RET
It relies on the fact that 123 multiplied by 456 is identical to:
123 x 6
+ 1230 x 5
+ 12300 x 4
which is the same way you were taught multiplication back in grade/primary school. It's easier with binary since you're only ever multiplying by zero or one (in other words, either adding or not adding).
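In binary, with the demo values the code above loads (AX = 7, BX = 5), the same scheme looks like this:
0111 (7) x 0101 (5)
   =    111      (bit 0 of 5 is set: add 7)
   +  11100      (bit 2 of 5 is set: add 7 shifted left twice, i.e. 28)
   = 100011      (35)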
It's pretty old-school x86 (8086, from a DEBUG session - I can't believe they still actually include that thing in XP) since that was about the last time I coded directly in assembler. There's something to be said for high level languages :-)
When it comes to assembly instructions, the speed of executing any instruction is measured in clock cycles. A mul instruction always takes more clock cycles than an add, but if you execute add instructions in a loop, the overall clock count to do a multiplication using add will be far more than a single mul instruction. You can have a look at the following URLs, which list the clock cycles of a single add/mul instruction, so you can do the math on which one will be faster.
http://home.comcast.net/~fbui/intel_a.html#add
http://home.comcast.net/~fbui/intel_m.html#mul
My recommendation is to use the mul instruction rather than putting add in a loop; the latter is a very inefficient solution.
I'd have to echo the responses you already have: for a general multiply you're best off using MUL; after all, that's what it's there for!
In some specific cases, where you know you'll be wanting to multiply by a specific fixed value each time (for example, in working out a pixel index in a bitmap) then you can consider breaking the multiply down into a (small) handful of SHLs and ADDs - e.g.:
1280 x 1024 display - each line on the
display is 1280 pixels.
1280 = 1024 + 256 = 2^10 + 2^8
y * 1280 = y * (2 ^ 10) + y * (2 ^ 8)
= ADD (SHL y, 10), (SHL y, 8)
...given that graphics processing is likely to need to be speedy, such an approach may save you precious clock cycles.
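As a rough IA-32 sketch of that pixel-offset calculation (assuming y starts in eax and edx is free to clobber; the register choice is arbitrary):
mov  edx, eax        ; copy y
shl  eax, 10         ; eax = y * 1024
shl  edx, 8          ; edx = y * 256
add  edx, eax        ; edx = y * 1280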

Generating random numbers with ARM Assembly

I want to generate a random number to use in my iPhone project by inlining some assembly in my Objective-C code. Is this possible with ARM assembly?
Look up LFSR (linear feedback shift register) on Google. It's not a true random number generator, but you can make pretty good random numbers with maybe three or four lines of assembler.
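For example, one step of a 32-bit Galois LFSR fits in a handful of ARM instructions. A minimal sketch (the function name lfsr_next is made up, the tap mask 0x80200003 is one of the standard published maximal-length tap sets for 32 bits, i.e. taps 32, 22, 2, 1, and the state you pass in must be non-zero):
        .text
        .global lfsr_next
lfsr_next:                            @ uint32_t lfsr_next(uint32_t state)
        movs    r1, r0, lsr #1        @ r1 = state >> 1, carry = old bit 0
        ldr     r2, =0x80200003       @ feedback taps (32, 22, 2, 1)
        eorcs   r1, r1, r2            @ if the bit shifted out was 1, xor in the taps
        mov     r0, r1                @ return the new state
        bx      lr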
Go to Wikipedia, find the easiest random number generation algorithm, reimplement in assembly :)
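In that spirit, about the simplest algorithm to reimplement is a linear congruential generator. A sketch in ARM assembly (the function name lcg_next is made up; the multiplier/increment pair 1664525 / 1013904223 is the well-known Numerical Recipes choice, and the modulus is the natural 2^32 wrap-around of 32-bit registers):
        .text
        .global lcg_next
lcg_next:                           @ uint32_t lcg_next(uint32_t state)
        ldr     r1, =1664525        @ multiplier
        ldr     r2, =1013904223     @ increment
        mul     r3, r0, r1          @ r3 = state * 1664525 (mod 2^32)
        add     r0, r3, r2          @ new state = r3 + 1013904223 (mod 2^32)
        bx      lr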
; ========================= RANDOM.INC =========================
; Call with, NOTHING
; Returns, AL = random number between 0-255,
; AX may be a random number too ?
; DW RNDNUM holds AX=random_number_in AL
SEED DW 3749h
RNDNUM DW 0
align 16
RANDOM:
PUSH DX
MOV AX,[SEED] ;; AX = seed
MOV DX,8405h ;; DX = 8405h
MUL DX ;; MUL (8405h * SEED) into dword DX:AX
;
CMP AX,[SEED]
JNZ GOTSEED ;; if new SEED = old SEED, alter SEED
MOV AH,DL
INC AX
GOTSEED:
MOV WORD [SEED],AX ;; We have a new seed, so store it
MOV AX,DX ;; AL = random number
MOV WORD [RNDNUM],AX
POP DX
RET
Just load a variable from an uninitialized memory address. At every access, increment the address to get new random numbers. Voila: guaranteed random, but not well distributed.