How does the JVM know the current method through the PC?

In 8086, we can find the next instruction to execute through CS:IP, where IP (the program counter) is the offset within the current code segment (CS).
However, I'm not sure how JVM knows which instruction to execute.
The pc register in the JVM only indicates the offset within the current method, but how does the JVM know which method it's in?
Thanks!
I notice that the bytecode for each method starts from offset 0.
So, if there are many methods in a class, how can I know which method the current frame is in?
I'm new to Java, so my question may be silly and my explanation may be wrong. Thanks for bearing with me!

OK, so I assume that you are asking about the JVM in relation to the Java Virtual Machine Specification (JVMS). The most directly relevant part of the spec says this:
2.5.1. The pc Register
The Java Virtual Machine can support many threads of execution at once
(JLS §17). Each Java Virtual Machine thread has its own pc (program
counter) register. At any point, each Java Virtual Machine thread is
executing the code of a single method, namely the current method
(§2.6) for that thread. If that method is not native, the pc register
contains the address of the Java Virtual Machine instruction currently
being executed. If the method currently being executed by the thread
is native, the value of the Java Virtual Machine's pc register is
undefined. The Java Virtual Machine's pc register is wide enough to
hold a returnAddress or a native pointer on the specific platform.
Note the key sentence: it says the pc register contains the address of the instruction currently being executed. It does not say the instruction's offset from the start of the method's code segment ... as you seem to be saying.
Furthermore, there is no obvious reference to a register holding a pointer to the current method. And the section describing the call stack doesn't mention any pointer to the current method in the stack frame.
Having said all of that, the JVM specification is really a behavioral specification that JVM implementations need to conform to. It doesn't directly mandate that the specified behavior must be implemented in any particular way.
So while it seems to state that the abstract JVM has a register called a PC that contains an "address", it doesn't state categorically what an address means in this context. For instance, it does not preclude the possibility that the interpreter represents the "address" in the PC as a tuple consisting of a method address and a bytecode offset within the method. Or something else. All that really matters is that the JVM implementation can somehow use the PC to get the bytecode instruction to be executed.
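To make that concrete, here is a minimal sketch in C (purely illustrative; all the names are hypothetical, and real JVM implementations are far more sophisticated) of an interpreter whose "pc" is a (method, offset) tuple rather than a flat address:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical interpreter state: the "pc" is a (current method, offset)
   pair, so the interpreter always knows which method it is executing. */
typedef struct {
    const uint8_t *bytecode;   /* this method's code; offsets start at 0 */
    size_t         code_len;
} Method;

typedef struct {
    const Method *method;      /* the current method */
    size_t        offset;      /* offset of the next instruction */
} PC;

/* Fetch the next bytecode instruction. The method is never ambiguous,
   because the method pointer is part of the interpreter's notion of a pc. */
static int fetch(PC *pc) {
    if (pc->offset >= pc->method->code_len)
        return -1;             /* ran off the end of the method's code */
    return pc->method->bytecode[pc->offset++];
}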


Do functions in an API make system calls themselves, or are system calls made by the API aided by the system-call interface in the runtime support system?

I was going through the Dinosaur book by Galvin, where I ran into the difficulty described in this question.
Typically application developers design programs according to an application programming interface (API). The API specifies a set of functions that are available to an application programmer, including the parameters that are passed to each function and the return values the programmer can expect.
The text adds that:
Behind the scenes the functions that make up an API typically invoke the actual system calls on behalf of the application programmer. For example, the Win32 function CreateProcess() (which unsurprisingly is used to create a new process) actually calls the NTCreateProcess() system call in the Windows kernel.
From the above two points I came to understand that programmers using the API make calls to the API function corresponding to the system call they want to make. The corresponding function in the API then actually makes the system call.
Next, what the text says confuses me a bit:
The run-time support system (a set of functions built into libraries included with a compiler) for most programming languages provides a system-call interface that serves as the link to system calls made available by the operating system. The system-call interface intercepts function calls in the API and invokes the necessary system calls within the operating system. Typically, a number is associated with each system call, and the system-call interface maintains a table indexed according to these numbers. The system call interface then invokes the intended system call in the operating-system kernel and returns the status of the system call and any return values.
The above excerpt makes me feel that the functions in the API do not make the system calls directly. There are probably functions built into the system-call interface of the runtime support system which wait for a system call from the function in the API.
(The text includes a diagram here illustrating the working of the system-call interface.)
The text later explains the working of a system call in the C standard library with another diagram (showing a printf() call being handled by the library, which in turn invokes the write() system call), which is quite clear.
I don't totally understand the terminology of the excerpts you shared. Some of the terminology is also wrong, like in the blue image at the bottom. It says the standard C library provides system-call interfaces, while it doesn't. The standard C library is just a standard; it is a convention. It just says that if you write a certain piece of code, then the effect of that code when it is run should be according to the convention. The image also says that the C library intercepts printf() calls, while it doesn't. This is general terminology which is confusing at best.
The C library doesn't intercept calls. As an example, on Linux, the open source implementation of the C standard library is glibc. You can browse its source code here: https://elixir.bootlin.com/glibc/latest/source. When you write C/C++ code, you use standard functions which are specified in the C/C++ convention.
When you write code, it will be compiled to assembly and then to machine code. Assembly is just a higher-level representation of machine code; it is closer to the actual code, and it is easier to translate C/C++ to assembly than to raw machine code. The easiest case to understand is static compilation: when you compile code statically, all the code is included in your executable. For example, if you write
#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}
the printf() function is declared in stdio.h, which is a header provided by gcc, written specifically for one OS or a set of UNIX-like OSes. This header provides prototypes which are defined in other .c files provided by glibc. Those .c files provide the actual implementation of printf(). The printf() function will make a system call, which relies on the presence of an OS like Linux to run. When you compile statically, all the code is included, right up to the system call. You can see my answer here: Who sets the RIP register when you call the clone syscall?. It specifically explains how system calls are made.
In the end you'll have something like assembly code placing some arguments in some conventional registers, then the actual syscall instruction, which jumps to the kernel entry point held in an MSR. I don't totally understand the whole mechanism behind printf(), but it will end up in the Linux kernel's implementation of the write system call, which will write to the console and return.
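To make that concrete, here is a minimal, self-contained sketch for x86-64 Linux of what the library ultimately does for output: the syscall number and arguments go into the conventional registers, then the syscall instruction transfers control to the kernel. (This follows the Linux x86-64 convention, where write is syscall number 1, and bypasses the C library wrapper entirely.)

#include <stddef.h>

/* Invoke the Linux write(2) system call directly on x86-64.
   The number goes in rax, the arguments in rdi, rsi, rdx; the
   kernel clobbers rcx and r11 as part of the syscall mechanism. */
static long raw_write(int fd, const void *buf, size_t len) {
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                          /* return value in rax */
        : "a"(1L),                           /* syscall number 1 = write */
          "D"((long)fd), "S"(buf), "d"(len)  /* rdi, rsi, rdx */
        : "rcx", "r11", "memory"
    );
    return ret;
}

int main(void) {
    raw_write(1, "Hello, World!\n", 14);     /* fd 1 = stdout */
    return 0;
}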
I think what confuses you is that the "runtime support system" is probably referring to higher-level languages which are not compiled directly to machine code, like Python or Java. Java has a virtual machine which translates the bytecode produced by compilation into machine code at runtime. It can be confusing not to make this distinction when talking about different languages. Maybe your book is lacking examples.

80C188 relocation register addressing mystery

I'm reverse-engineering an embedded system using the 80C188 and the way the relocation register (RELREG) is used mystifies me.
One of the first steps at initialization is to move the processor control registers by writing a new value to the RELREG.
By default, the RELREG has the value 20FFH which places the register block at the top of I/O space at address 0FFxxH. The example given in the Intel app note describes writing the value of 1100H to the RELREG which then places the register block in memory space at 100xxH. Clear enough.
However, in the system I am examining, the value written is 1804H, which I would expect to place the register block in memory space at address 804xxH, yet the subsequent writes that initialize the registers to operating values all go to 0F4xxH in memory space. The processor is operating fine in the system, so this is not a programming bug.
I am absolutely sure about these addresses as not only do I see them in the code itself on the EPROM but also in logic analyzer traces of code execution at startup.
Does anyone have an explanation for this?
The address in the code is the address offset and must be combined with the appropriate segment register to map to a physical address. The mapping is performed by:
(seg * 16) + offset
So if the seg were 7100h, then the seg:off address 7100:f400 refers to the physical address 80400h.
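If it helps, here is a throwaway C snippet (not from the original code) that performs the same real-mode translation:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* 8086/80188 real-mode translation: physical = segment * 16 + offset */
static uint32_t phys(uint16_t seg, uint16_t off) {
    return ((uint32_t)seg << 4) + off;
}

int main(void) {
    /* 7100:F400 -> 80400h, where RELREG = 1804H placed the register block */
    printf("%05" PRIX32 "h\n", phys(0x7100, 0xF400));
    return 0;
}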

How do I execute system calls within a Wind River DKM?

I am trying to make a DKM (Downloadable Kernel Module), "my_dkm.o", that I can load into a custom VxWorks kernel at run-time. I was able to make a simple one (it prints "hello world"), but I want my DKM to invoke system calls that already exist within the running kernel.
From the shell, I can do -> syscallShow <my_group_#>,1 to get a list of the system calls I want to run. I can also invoke these system calls from the shell, but I don't know how to refer to them when developing my DKM.
Also, the Wind River Workbench help documentation only discusses invoking system calls from RTPs, which doesn't help because I am executing within kernel-space.
Thanks
In Short: You Don't
System calls are exclusively for RTPs to use when making a call to a function that resides in the kernel. The system call itself does a bit of housekeeping and then invokes the underlying kernel routine.
In the context of a DKM, since you are already in kernel space, you simply invoke the same underlying kernel function that the system call does.
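A sketch of what that looks like in a DKM (the routine name here is hypothetical; substitute whatever kernel function actually backs your system call):

#include <vxWorks.h>

/* Hypothetical: the kernel routine that the system call wraps.
   The real name is in your syscall group's definition. */
extern STATUS myKernelRoutine(int arg);

/* DKM entry point: we are already in kernel space, so we call the
   underlying routine directly instead of trapping through the
   syscall layer. */
STATUS my_dkm_start(void) {
    return myKernelRoutine(42);
}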

Error 0x800706F7 "The stub received bad data" on Windows XP SP3

In my VB6 application I make several calls to a COM server my team created from an Ada project (using GNATCOM). There are basically two methods available on the COM server. Their prototypes in VB are:
Sub PutParam(Param As Parameter_Type, Value)
Function GetParam(Param As Parameter_Type)
where Parameter_Type is an enumerated type which distinguishes the many parameters I can put to/get from the COM server, and Value is a Variant variable: PutParam() receives a Variant and GetParam() returns a Variant. (I don't really know why the VB6 Object Browser shows no reference to the Variant type on the COM server interface...)
The product of this project has been used continuously in this way for years without any problems in this interface, on computers running Windows XP SP2. On computers with WinXP SP3 we get the error 0x800706F7 "The stub received bad data" when trying to put parameters of the 'Long' type.
Does anybody have any clue on what could be causing this? The COM server is still being built on a system with SP2. Should it make any difference to build it on a system with SP3 (like when we build for x64 on x64 systems)?
One of the calls that is causing the problem is the following (some variable names changed):
Dim StructData As StructData_Type
StructData.FirstLong = 1234567
StructData.SecondLong = 8901234
StructData.Status = True
ComServer.PutParam StructDataParamType, StructData
Where the definition of StructData_Type is:
Type StructData_Type
    FirstLong As Long
    SecondLong As Long
    Status As Boolean
End Type
(the following has been added after the question was first posted)
The definitions of the primitive calls on the COM server's interface, in IDL, are presented below:
// Service to receive data
HRESULT PutParam([in] Parameter_Type Param, [in] VARIANT *Value);
//Service to send requested data
HRESULT GetParam([in] Parameter_Type Param, [out, retval] VARIANT *Value);
The definition of the structure I'm trying to pass is:
struct StructData_Type
{
    int FirstLong;
    int SecondLong;
    VARIANT_BOOL Status;
} StructData_Type;
I found it strange that this definition uses 'int' as the type of FirstLong and SecondLong, when the VB6 Object Browser shows them typed as 'Long'. Btw, when I extract the IDL from the COM server (using a specific utility), those parameters are defined as Long.
Update:
I have tested the same code with a version of my COM server compiled for Windows 7 (different version of GNAT, same GNATCOM version) and it works! I don't really know what happened here. I'll keep trying to identify the problem on WinXP SP3, but it is good to know that it works on Win7. If you have a similar problem, it may be worth trying to migrate to Win7.
I'll focus on explaining what the error means; there are too few hints in the question to provide a simple answer.
A "stub" is used in COM when you make calls across an execution boundary. It wasn't stated explicitly in the question but your Ada program is probably an EXE and implements an out-of-process COM server. Crossing the boundary between processes in Windows is difficult due to their strong isolation. This is done in Windows by RPC, Remote Procedure Call, a protocol for making calls across such boundaries, a network being the typical case.
To make an RPC call, the arguments of a function must be serialized into a network packet. COM doesn't know how to do this by itself, because it doesn't know enough about the actual arguments to a function; it needs the help of a proxy, a piece of code that does know what the argument types are. On the receiving end is a very similar piece of code that does the exact opposite of what the proxy does: it deserializes the arguments and makes the internal call. This is the stub.
One way this can fail is when the stub receives a network packet and it contains more or less data than required for the function argument values. Clearly it won't know what to do with that packet, there is no sensible way to turn that into a StructData_Type value, and it will fail with "The stub received bad data" error.
So the very first explanation for this error to consider is a DLL Hell problem. A mismatch between the proxy and the stub. If this app has been stable for a long time then this is not a happy explanation.
There's another aspect of your code snippet that is likely to induce this problem. Structures are very troublesome beasts in software: their members are aligned to their natural storage boundary, and the alignment rules are subject to interpretation by the respective compilers. This can certainly be the case for the structure you quoted. It needs 10 bytes to store the fields, 4 + 4 + 2, and they align naturally. But the structure is actually 12 bytes long: two bytes are padded at the end to ensure that the ints still align when the structure is stored in an array. This also makes COM's job very difficult, since COM hides implementation detail, and structure alignment is a massive detail. COM needs help to copy a structure, the job of the IRecordInfo interface. The stub will also fail when it cannot find an implementation of that interface.
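You can see the 10-versus-12 bytes effect with a plain C equivalent of the structure (sizes here assume a typical 32-bit Windows ABI; VARIANT_BOOL is a 2-byte short):

#include <stdio.h>

/* The members need 4 + 4 + 2 = 10 bytes, but the compiler pads the
   struct to 12 so the ints stay aligned when stored in an array. */
struct StructData_Type {
    int   FirstLong;    /* 4 bytes, offset 0 */
    int   SecondLong;   /* 4 bytes, offset 4 */
    short Status;       /* 2 bytes, offset 8; stands in for VARIANT_BOOL */
};

int main(void) {
    printf("sizeof = %u\n", (unsigned)sizeof(struct StructData_Type)); /* typically 12 */
    return 0;
}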
I'll talk a bit about the proxy, stub and IRecordInfo. There are two basic ways a proxy/stub pair are generated. One way is by describing the interfaces in a language called IDL, Interface Description Language, and compile that with MIDL. That compiler is capable of auto-generating the proxy/stub code, since it knows the function argument types. You'll get a DLL that needs to be registered on both the client and the server. Your server might be using that, I don't know.
The second way is what VB6 uses, it takes advantage of a universal proxy that's built into Windows. Called FactoryBuffer, its CLSID is {00000320-0000-0000-C000-000000000046}. It works by using a type library. A type library is a machine readable description of the functions in a COM server, good enough for FactoryBuffer to figure out how to serialize the function arguments. This type library is also the one that provides the info that IRecordInfo needs to figure out how the members of a structure are aligned. I don't know how it is done on the server side, never heard of GNATCOM before.
So a strong explanation for this problem is that you are having a problem with the type library. Especially tricky in VB6 because you cannot directly control the guids that it uses. It likes to generate new ones when you make trivial changes, the only way to avoid it is by selecting the binary compatibility option. Which uses an old copy of the type library and tries to keep the new one as compatible as possible. If you don't have that option turned on then do expect trouble, especially for the guid of the structure. Kaboom if it changed and the other end is still using the old guid.
Just some hints on where to start looking. Do not assume it is a problem caused by SP3, this COM infrastructure hasn't changed for a very long time. But certainly expect this kind of problem due to a new operating system version being installed and having to re-register everything. SysInternals' ProcMon is a good utility to see the programs use the registry to find the proxy, stub and type library. And you'd certainly get help from a COM Spy kind of utility, albeit that they are very hard to find these days.
If it suddenly stopped working happily on XP, the first culprit I'd look for is type mismatches. It is possible that "long" on such systems is now 64 bits, while your Ada COM code (and/or perhaps your C ints) is expecting 32 bits. With a traditionally-compiled system this would have been checked for you by your compiler, but the extra indirection you have with COM makes that difficult.
The bit you wrote in there about "when we compile for 64-bit systems" makes me particularly leery. 64-bit compiles may change the size of many C types, you know.
This Related Post suggests you need padding in your struct, as the marshalling code may expect more data than you actually send (which is a bug, of course). Your struct contains 10 bytes (4 bytes for each of the ints/longs and 2 for the VARIANT_BOOL). Try to add padding so that your struct contains a multiple of 4 bytes (or, failing that, a multiple of 8, as the post isn't clear on the expected size).
I am also suggesting that the problem is due to a padding issue in your structure. I don't know whether you can control this using a #pragma, but it might be worth looking at your documentation.
I think it would be a good idea to try and patch your struct so that the resulting type library struct is a multiple of four (or eight). Your Status member takes up 2 bytes, so maybe you should insert a dummy value of the same type either before or after Status - which should bring it up to 12 bytes (if packing to eight bytes, this would have to be three dummy variables).

STM32 programming tips and questions

I could not find any good documents on the internet about STM32 programming. STM's own documents do not explain anything beyond register functions. I will greatly appreciate it if anyone can answer my following questions.
I noticed that in all the example programs that STM provides, local variables for main() are always defined outside of the main() function (with occasional use of the static keyword). Is there any reason for that? Should I follow a similar practice? Should I avoid using local variables inside main()?
I have a global variable which is updated within the clock interrupt handler. I am using the same variable inside another function as a loop condition. Don't I need to access this variable using some form of atomic read operation? How can I know that a clock interrupt will not change its value in the middle of the function's execution? Should I disable the clock interrupt every time I need to use this variable inside a function? (However, this seems extremely ineffective to me, as I use it as a loop condition. I believe there should be better ways of doing it.)
Keil automatically inserts a startup code which is written in assembly (i.e. startup_stm32f4xx.s). This startup code has the following import statements:
IMPORT SystemInit
IMPORT __main
In C, this makes sense. However, in C++ both main and SystemInit get different (mangled) names (e.g. _int_main__void). How can this startup code still work in C++, even without using extern "C"? (I tried it and it worked.) How can the C++ linker (armcc --cpp) associate these statements with the correct functions?
You can use local or global variables. Using locals in embedded systems carries the risk of your stack colliding with your data; with globals you don't have that problem. But this is true no matter where you are: embedded microcontroller, desktop, etc.
I would make a copy of the global in the foreground task that uses it:

volatile unsigned int myglobal;   /* shared with the interrupt handler */

void fun ( void )
{
    unsigned int myg;

    myg = myglobal;   /* take one snapshot of the shared variable */
    /* ... use only myg from here on ... */
}
and then only use myg for the rest of the function. Basically you are taking a snapshot and using the snapshot. You would want to do the same thing if you are reading a register: if you want to do multiple things based on a sample of something, take one sample of it and make decisions on that one sample; otherwise the item can change between samples. If you are using one global to communicate back and forth with the interrupt handler, I would use two variables: one foreground-to-interrupt, the other interrupt-to-foreground. Yes, there are times where you need to carefully manage a shared resource like that; normally it has to do with times where you need to do more than one thing. For example, if you had several items that all need to change as a group before the handler can see them change, then you need to disable the interrupt handler until all the items have changed. Here again there is nothing special about embedded microcontrollers; this is all basic stuff you would see on a desktop system with a full-blown operating system.
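A bare-bones sketch of the two-variable scheme just described, with exactly one writer per variable (the names are made up):

/* One direction per variable; each has a single writer. */
volatile unsigned int fg_to_irq;   /* written by foreground, read by handler */
volatile unsigned int irq_to_fg;   /* written by handler, read by foreground */

void some_timer_handler ( void )
{
    unsigned int cmd;

    cmd = fg_to_irq;       /* snapshot the foreground's request */
    irq_to_fg = cmd + 1;   /* publish a result; we are its only writer */
}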
Keil knows what they are doing; if they support C++ then at a system level they have this worked out. I don't use Keil; I use gcc and llvm for microcontrollers like this one.
Edit:
Here is an example of what I am talking about
https://github.com/dwelch67/stm32vld/tree/master/stm32f4d/blinker05
stm32 using timer-based interrupts; the interrupt handler modifies a variable shared with the foreground task. The foreground task takes a single snapshot of the shared variable (per loop) and, if need be, uses the snapshot more than once in the loop, rather than the shared variable, which can change. This is C, not C++, I understand that, and I am using gcc and llvm, not Keil. (Note: llvm has a known, very old problem optimizing tight while loops; I don't know why they have no interest in fixing it, but llvm works for this example.)
Question 1: Local variables
The sample code provided by ST is not particularly efficient or elegant. It gets the job done, but sometimes there are no good reasons for the things they do.
In general, you always want your variables to have the smallest scope possible. If you only use a variable in one function, define it inside that function. Add the "static" keyword to local variables if and only if you need them to retain their value after the function is done.
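For instance, a trivial illustration of the difference (hypothetical function):

/* "static" keeps the value across calls while keeping the scope local. */
int call_count(void)
{
    static int n = 0;   /* initialized once; persists between calls */
    return ++n;         /* a non-static local initialized to 0 would
                           restart at 0 on every call */
}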
In some embedded environments, like the PIC18 architecture with the C18 compiler, local variables are much more expensive (more program space, slower execution time) than globals. On the Cortex-M3, that is not true, so you should feel free to use local variables. Check the assembly listing and see for yourself.
Question 2: Sharing variables between interrupts and the main loop
People have written entire chapters explaining the answers to this group of questions. Whenever you share a variable between the main loop and an interrupt, you should definitely use the volatile keyword on it. Variables of 32 or fewer bits can be accessed atomically (unless they are misaligned).
If you need to access a larger variable, or two variables at the same time from the main loop, then you will have to disable the clock interrupt while you are accessing the variables. If your interrupt does not require precise timing, this will not be a problem. When you re-enable the interrupt, it will automatically fire if it needs to.
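A sketch of both cases (this assumes the CMSIS intrinsics __disable_irq()/__enable_irq(), declared via your CMSIS device header, e.g. "stm32f4xx.h"):

#include <stdint.h>
/* plus your CMSIS device header for __disable_irq()/__enable_irq() */

/* Shared 32-bit counter: volatile forces a fresh read on every access,
   and aligned 32-bit loads/stores are atomic on the Cortex-M3. */
volatile uint32_t tick_count;

void SysTick_Handler(void)
{
    tick_count++;
}

/* Wider than 32 bits: mask interrupts around the copy so the handler
   cannot update the value halfway through the read. */
volatile uint64_t wide_shared;

uint64_t read_wide(void)
{
    uint64_t copy;
    __disable_irq();    /* CMSIS intrinsic */
    copy = wide_shared;
    __enable_irq();     /* a pending interrupt fires right after this */
    return copy;
}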
Question 3: main function in C++
I'm not sure. You can use arm-none-eabi-nm (or whatever nm is called in your toolchain) on your object file to see what symbol name the C++ compiler assigns to main(). I would bet that C++ compilers refrain from mangling the main function for this exact reason, but I'm not sure.
STM's sample code is not an exemplar of good coding practice, it is merely intended to exemplify use of their standard peripheral library (assuming those are the examples you are talking about). In some cases it may be that variables are declared external to main() because they are accessed from an interrupt context (shared memory). There is also perhaps a possibility that it was done that way merely to allow the variables to be watched in the debugger from any context; but that is not a reason to copy the technique. My opinion of STM's example code is that it is generally pretty poor even as example code, let alone from a software engineering point of view.
In this case your clock interrupt variable is atomic so long as it is 32 bits or less and you are not using read-modify-write semantics with multiple writers. You can safely have one writer and multiple readers regardless. This is true for this particular platform, but not necessarily universally; the answer may be different for 8- or 16-bit systems, or for multi-core systems, for example. The variable should be declared volatile in any case.
I am using C++ on STM32 with Keil, and there is no problem. I am not sure why you think that the C++ entry points are different; they are not here (Keil ARM-MDK v4.22a). The start-up code calls SystemInit(), which initialises the PLL and memory timing, for example, then calls __main(), which performs global static initialisation and calls C++ constructors for global static objects before calling main(). If in doubt, step through the code in the debugger. It is important to note that __main() is not the main() function you write for your application; it is a wrapper with different behaviour for C and C++, which ultimately calls your main() function.