Full register access meaning - embedded

I am working on the MSP430 microcontroller and was going through its architecture. In the user guide, under its features tab, there is a statement like this - "Full register access including program counter (PC), status register (SR), and stack pointer (SP)". I was under the impression that the CPU always has access to all the registers irrespective of the architecture.
My understanding of the statement may be wrong. Can anyone explain to me what it means exactly?

As per the Wikipedia page:
The processor contains 16 16-bit registers, of which 4 are dedicated to special purposes: R0 is the program counter, R1 is the stack pointer, R2 is the status register, and R3 is a special register called the constant generator, providing access to 6 commonly used constant values without requiring an additional operand. R3 always reads as 0 and writes to it are ignored. R4 through R15 are available for general use.
In other words, "full access" in this case means not just using jmp-type instructions to be able to jump to a new location, but also allowing something like xor r0, #1234 to directly (and probably fatally) modify the program counter.
Ditto for the other special registers, except R3, the constant generator and the only one of the four not mentioned in your quote. While all the instructions can operate on that register, it ignores writes and generates various fixed values on read (-1 through 2; using R2 can also give you 4 and 8) depending on the addressing mode used.
That may seem a little strange, but it's not the strangest I've ever seen. For that, you would have to investigate the RCA 1802 CPU which, like the MSP430, had "general purpose" registers for specific functions, but you could actually choose at run time which one should be the program counter or stack pointer. It actually had no call or ret instructions; instead it used a standard call and return technique (SCRT) to emulate them.

Memory FRAM MB85RC256V data sheet interpretation

The Fujitsu MB85RC256V FRAM (Ferroelectric Random Access Memory) data sheet (MB85RC256V-20171207-5V1) says on page 8 that
• Page Write: If additional 8 bits are continuously sent after the same
command (except stop condition) as Byte Write, a page write is
performed.
This needs no interpretation. But then, on the next page, it says:
• Current Address Read: When the previous write or read operation
finishes successfully up to the stop condition and assumes the last
accessed address is “n”, then the address at “n+1” is read by sending
the following command unless turning the power off.
My question regards whether the English "up to the stop condition" means the same as "except stop condition".
I guess that not only the English needs to be understood, but also how the device works.
I believe that the two are equivalent, which I also infer from trying to understand how the device seems to work (see the sketch below):
Write an 8-bit device address with R/W=0 (write),
then write the 16-bit FRAM memory address "n",
then send no stop condition, because we can now
send an 8-bit device address with R/W=1 (read),
then continue to read (first from address "n+1") as many bytes as needed,
until the final stop condition.
This effectively means that to read from an address, we first need to set the address register to one lower.
Please correct me if my understanding is not 100% correct.
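To make this concrete, here is a minimal sketch in C of the sequence as I understand it. The helpers (i2c_start, i2c_write_byte, i2c_read_byte, i2c_stop) and the device address are hypothetical placeholders for whatever the real driver provides, not the lib_i2c API:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical low-level helpers, not the lib_i2c API. */
void    i2c_start(void);            /* start or repeated-start condition */
bool    i2c_write_byte(uint8_t b);  /* returns true on ACK */
uint8_t i2c_read_byte(bool ack);    /* pass ack=false for the final byte */
void    i2c_stop(void);             /* stop condition */

#define FRAM_ADDR 0x50u  /* 7-bit device address with A2..A0 = 0 (assumed) */

/* Read len bytes starting at address first. Per my reading of the data
   sheet, the first byte returned comes from "n+1", so the address
   written is one lower than the first byte we want. */
void fram_read(uint16_t first, uint8_t *buf, unsigned len)
{
    uint16_t n = (uint16_t)(first - 1);       /* set address register one lower */

    i2c_start();
    i2c_write_byte(FRAM_ADDR << 1);           /* device address, R/W = 0 (write) */
    i2c_write_byte((uint8_t)(n >> 8));        /* memory address "n", high byte */
    i2c_write_byte((uint8_t)(n & 0xFF));      /* memory address "n", low byte */
    /* no stop condition here */
    i2c_start();                              /* repeated start */
    i2c_write_byte((FRAM_ADDR << 1) | 1u);    /* device address, R/W = 1 (read) */
    for (unsigned i = 0; i < len; i++)
        buf[i] = i2c_read_byte(i + 1 < len);  /* NACK only the last byte */
    i2c_stop();                               /* the final stop condition */
}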
Aside: I have used this FRAM in an earlier project, but via a function called i2c_master_16bit_read_reg in the obsoleted library module_i2c_master (which is quite hard to understand). The newer lib_i2c does not have that functionality, so I have to do it by extending the XC interface function there. I am using lib_i2c 5.0: XMOS I2C Library. (XC is sadly also obsoleted by XMOS in favour of lib_xcore and C, but I still have an older xTIMEcomposer system up and running.)
I guess that the next chapters of [ref1], "Random read" and "Sequential read", show rather explicitly that the stop condition P is indeed only written when "all is done".
In other words, "up to the stop condition" does mean the same as "except stop condition".
I will correct this here after I have implemented the code, should my own answer turn out not to be correct.

Print NASM program on Windows 7 SP1 64-bit excluding DOSBox, excluding C, and "possibly" excluding Windows API calls

I've been filling myself up with notes trying to successfully create my first program on Windows 7 with NASM, but with a few self-imposed stipulations (until I'm ready to move forward). In creating this first program, however, I have a ton of questions.
The stipulations for now are that:
I'm running Windows 7 SP1, 64-bit
I do not wish to use DOSBox, so interrupts 0x21-0x24 are likely not applicable
I do not wish to rely on C so this is all NASM
I would really like to avoid downloading Visual Studio or associated WDK tools if I can (this depends on whether or not I NEED to interact with the Windows API and relates to Question 2 below)
I've downloaded and installed MinGW
I'm writing my code in Notepad++ and saving as *.asm
I am linking using "ld" for now, but from what I've read, most seem to recommend "GoLink" (and Alink hasn't been updated in years?). I'll probably migrate to GoLink once I've satisfied myself that "ld" is too limiting
I want to know if printing is possible without the use of the Windows API or C, given the code below.
The only code example that has worked for me in some capacity can be found here.
nasm is not executing file in Windows 8
;FILE: main.asm
section .text
global _main
_main:
mov eax, 6
ret ; returns eax (exits)
Assembled, linked, and run:
c:\Users\James\Desktop>nasm -fwin32 main.asm
c:\Users\James\Desktop>ld -e _main main.obj -o main.exe
c:\Users\James\Desktop>main.exe
c:\Users\James\Desktop>echo %errorlevel%
6
My questions (a ton):
Given that in the code above "ret" by itself gives output (it just returns whatever is in EAX), is there a way to use it (or another directive outside of the Windows API) to return the contents of a variable (hopefully a string variable)? I tried to use ret with DOS calls, but as noted above, that definitely doesn't work because I'm on a 64-bit system.
In case I absolutely must use the Windows API, is the only way to interact with it by using the WDK tools? Is there some other way, because the last time I downloaded Visual Studio and the associated WDK tools, it took up a ton of space and massively slowed down my computer. Is there another way to make programs give output or print to the screen, either by using internal commands or some other method of making API calls? One thread I admittedly skimmed (amidst 40 more tabs I have open) mentions "Russinovich's Windows Internals" but no direct answer. At present, every time I use code with extern declarations, "ld" tells me that references to symbols like WinMain/WinMain@16 are undefined. In the same vein, is there a table I can consult containing the accurate names for API calls (i.e. _ExitProcess@4 vs. ExitProcess)? I found this link to what I think may be the NT API, but I'm not sure it applies given my stipulations; in reality, I'm just kind of confused:
http://j00ru.vexillium.org/ntapi/
In bits of code I've encountered, I've seen the directives [bits 16], [bits 32], and [bits 64]. [bits 16] is likely ignorable, but I'm confused by [bits 32] and [bits 64] for the following reasons, which may not even be related: via the code above I'm using the command "nasm -fwin32 main.asm", then I'm linking it successfully and going on to receive output. For some reason (though I have not read the full "ld" documentation yet), when I use the command "nasm -fwin64 main.asm" and link it in the same way, I receive an error saying "main.obj: File not recognized: File format not recognized". I don't understand why differentiating between 32 and 64 while I'm on a native 64-bit machine causes an error, although this is probably just unique to ld.
In the meantime I'll be reading this question and will post an update if it helps: Executable isn't compatible with 64 bits processor
I can't answer some parts in great detail, so I expect somebody will either put up a better answer, or feel free to edit this one.
You are linking against the default C library, so your _main is called after the C library is initialized; the ret with a value in EAX is like return 6; in C++. The C library then correctly destructs everything and calls the Windows exit-process function with exit code 6. You can return only an int from _main, and I'm not even sure whether the full int is propagated to the exit-process call, or whether only an 8-bit value is used. So you can return a single char in ASCII encoding, if you treat that number as a char.
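In C terms, the whole NASM program above behaves like the snippet below (buildable with gcc from the same MinGW install, and %errorlevel% can be checked the same way):

/* Equivalent of the NASM program: the value returned from main becomes
   the process exit code via the C runtime's exit-process call. */
int main(void)
{
    return 6;   /* echo %errorlevel% prints 6 */
}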
You must call the Windows API if you want to display something in a console/window, or write something into a file, i.e. do any output (and of course also for input). There's no peripheral available to a win32/64 executable directly, like in the DOS CGA/EGA/VGA text modes accessible through int 10h, or the video RAM at B800:0000. Any attempt to access some I/O peripheral directly should result in an access violation. Only the Win API is legal for user-level application code.
How much of the WDK you need, I have no idea; I haven't developed anything for Windows in years. I think it's even possible to create an executable without the WDK, which would provide the correct externs and dependencies on kernel32.dll and similar, but the amount of effort is way beyond simply using the proper parts of the WDK or the C library from MinGW.
I think your linker defaults to a 32-bit executable; you have to figure out what kind of object format is produced by nasm for -fwin64 and how to link that one with ld.
Why the difference? The 64-bit OS can run 32-bit binaries, but you can't mix 32- and 64-bit code in a single executable so easily (if at all). So you are producing either a 32-bit or a 64-bit binary, and you have to adjust everything to match (the asm instructions used, the directives and options, and the WinAPI calls).
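As a map of what the assembly would have to do, here is a minimal sketch in C of the Win32 call sequence that console output needs. The question excludes C, but these same three kernel32.dll functions are what a pure-NASM program would declare as extern and call:

/* Minimal console output using only kernel32.dll calls, i.e. the same
   sequence a pure-asm program would have to make. */
#include <windows.h>

int main(void)
{
    const char msg[] = "Hello from Win32\r\n";
    DWORD written = 0;

    /* Get the console's standard output handle. */
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == INVALID_HANDLE_VALUE)
        ExitProcess(1);

    /* WriteFile works on console handles as well as on files. */
    WriteFile(out, msg, sizeof msg - 1, &written, NULL);

    /* The exit code, analogous to the value left in EAX before ret. */
    ExitProcess(0);
}

In 32-bit stdcall decoration these imports are _GetStdHandle@4, _WriteFile@20 and _ExitProcess@4; the @N suffixes are the decorations the question asks about (_ExitProcess@4 vs. ExitProcess).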

ARM Cortex-M3 Startup Code

I'm trying to understand how the initialization code that ships with Keil (RealView v4) for the STM32 microcontrollers works. Specifically, I'm trying to understand how the stack is initialized.
In the documentation on ARM's website it mentions that one of the routines in startup_xxx.s, __user_initial_stack_heap, should not use more than 88 bytes of stack. Do you know where that limitation is coming from?
It seems that when the reset handler calls System_Init it executes a couple of functions in a C environment, which I believe means it is using some form of temporary stack (it allocates a few automatic variables). However, all of those stacked items should be out of scope once it returns and then calls __main, which is where __user_initial_stack_heap is called from.
So why is there this requirement for __user_initial_stack_heap to not use more than 88 bytes? Does the rest of __main use a ton of stack or something?
Any explanation of the Cortex-M3 stack architecture as it relates to the startup sequence would be fantastic.
You will see from the __user_initial_stackheap() documentation that the function is for legacy support and that it is superseded by __user_setup_stackheap(); the documentation for the latter provides a clue regarding your question:
Unlike __user_initial_stackheap(), __user_setup_stackheap() works with systems where the application starts with a value of sp (r13) that is already correct, for example, Cortex-M3
[..]
Using __user_setup_stackheap() rather than __user_initial_stackheap() improves code size because there is no requirement for a temporary stack.
On Cortex-M the SP is initialised on reset by the hardware from a value stored in the vector table; on older ARM7 and ARM9 devices this is not the case, and it is necessary to set the stack pointer in software. The start-up code needs a small stack for use before the user-defined stack is applied; this may be the case, for example, if the user stack were in external memory and could not be used until the memory controller were initialised. The 88-byte restriction is imposed simply because this temporary stack is sized to be as small as possible, since it is probably unused after start-up.
In your case on the STM32 (a Cortex-M device), it is likely that there is in fact no such restriction, but you should perhaps update your start-up code to use the newer function to be certain. That said, given the required behaviour of this function and the fact that its results are returned in registers, I would suggest that 88 bytes would be rather extravagant if you were to need that much! Moreover, you only need to reimplement it if you are using a scatter-loading file as described.
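To illustrate the point about hardware stack initialisation, here is a minimal sketch in C of a Cortex-M vector table whose first word the hardware loads into SP at reset. The symbol and section names are assumptions that depend on the linker script; this stands in for what Keil's startup_xxx.s does in assembly:

#include <stdint.h>

extern uint32_t _estack;   /* top-of-stack address from the linker script (assumed name) */
void Reset_Handler(void);

/* On Cortex-M the hardware loads SP from word 0 and PC from word 1 of
   this table at reset, so no software stack setup is needed first;
   this is why __user_setup_stackheap() can assume sp is already valid. */
__attribute__((section(".isr_vector")))
void (* const vector_table[])(void) = {
    (void (*)(void))&_estack,   /* word 0: initial stack pointer value */
    Reset_Handler,              /* word 1: reset vector */
};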

Why don't compilers generate microinstructions rather than assembly code?

I would like to know why, in the real world, compilers produce Assembly code, rather than microinstructions.
If you're already bound to one architecture, why not go one step further and free the processor from having to turn assembly code into microinstructions at runtime?
I think perhaps there's an implementation bottleneck somewhere, but I haven't found anything on Google.
EDIT: by microinstructions I mean: if your assembly instruction is ADD(R1,R2), the microinstructions would be: load R1 into the ALU, load R2 into the ALU, execute the operation, load the result back into R1. Another way to see this is to equate one microinstruction to one clock cycle.
I was under the impression that microinstruction was the 'official' name. Apparently there's some mileage variation here.
Compilers don't produce micro-instructions because processors don't execute micro-instructions. They are an implementation detail of the chip, not something exposed outside the chip. There's no way to provide micro-instructions to a chip.
Because an x86 CPU doesn't execute micro-operations, it executes opcodes. You cannot create a binary image that contains micro-operations, since there is no way to encode them in a way that the CPU understands.
What you are suggesting is basically a new RISC-style instruction set for x86 CPUs. The reason that isn't happening is because it would break compatibility with the vast amount of applications and operating systems written for the x86 instruction set.
The answer is quite easy.
(Some) compilers do indeed generate code sequences like load r1, load r2, add r2 to r1. But these are precisely the machine-code instructions (the ones you are calling microinstructions). These instructions are the one and only interface between the outside world and the innards of a processor.
(Other compilers just generate C and let a C backend like gcc take care of the dirty details.)

Arbitrary JVM Behaviour

Imagine a setup of 6-7 servers, all identical, with identical
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (fedora-36.b18.fc11-i386)
OpenJDK Server VM (build 14.0-b16, mixed mode)
each running a program (memory- and CPU-intensive) for hours, even days, completing successfully many times (gathering statistical data, that sort of stuff), but on 1 machine, no matter the parameters or how I've compiled it (javac -source 1.5 *.java / javac -O -source 1.5, javac **, imagine any combination yourself :))
or run it (-Xms200000k or just java blabla.java, you get the idea)
I eventually get, not at any specific moment or iteration, "java.lang.ArrayIndexOutOfBoundsException: -1341472392"?! First things first: the program would never work with such a large value, let alone a negative one. (The line of code is a contains call on an ArrayList of Integers.) (That number is different every time, as I've noticed.)
Note also that I can "resume" a crashed test, and when I do so on this machine, it does a few more tests and then crashes again.
Not much of a bother, I don't own the boxes and all the others work, but this is quite strange to me.
Out of personal interest, how does this happen on the not-very-rosy-anyway OpenJDK?
Sounds strange. Is the variable used for indexing the array a long, or is it ever influenced by a long variable? In that case, access to the variable is not guaranteed to be atomic:
From http://java.sun.com/docs/books/jls/second_edition/html/memory.doc.html#28733
If a double or long variable is not declared volatile, then for the purposes of load, store, read, and write actions they are treated as if they were two variables of 32 bits each: wherever the rules require one of these actions, two such actions are performed, one for each 32-bit half. The manner in which the 64 bits of a double or long variable are encoded into two 32-bit quantities is implementation-dependent. The load, store, read, and write actions on volatile variables are atomic, even if the type of the variable is double or long.
You could try declaring the index variable as volatile, or use some other means of synchronization (for instance AtomicLong or something similar), if you suspect that this could be the issue.
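The hazard described in that quote is not Java-specific: any 64-bit value that a 32-bit platform stores as two 32-bit halves can be seen "torn" by a concurrent reader. A minimal sketch of the hazard and the atomic fix in C11 terms (names are illustrative; _Atomic plays the role that volatile or AtomicLong plays in Java):

#include <stdatomic.h>
#include <stdint.h>

int64_t         plain_index;   /* a plain 64-bit store may compile to two
                                  32-bit stores on a 32-bit target, so a
                                  concurrent reader can see a torn value */
_Atomic int64_t safe_index;    /* loads and stores are guaranteed atomic */

void writer(void)
{
    plain_index = 0x100000000LL;               /* possibly two separate stores */
    atomic_store(&safe_index, 0x100000000LL);  /* one atomic store */
}

int64_t reader(void)
{
    return atomic_load(&safe_index);           /* never torn */
}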
If this is a single-threaded Java application, I'd suspect a hardware fault. Of course this could be hard to prove, unless you've got some way to run hardware (e.g. memory) diagnostics.