I wrote C code with inline assembly to read an MSR, but it failed - msr

I use the following code to read an MSR, but it crashes when run. I don't know why.
#include <stdio.h>
#include <stdlib.h>
int main()
{
register long ecx asm("%ecx");
register long eax asm("%eax");
register long edx asm("%edx");
asm("mov %1, %0":"=r"(ecx):"i"(0x1B0));
asm("rdmsr");
/*
asm("xor %1, %0":"+r"(eax):"r"(eax));
asm("xor %1, %0":"+r"(edx):"r"(edx));
asm("mov %1, %0":"=r"(eax):"i"(0x01));
printf("%ld %ld %ld",ecx,eax,edx);
*/
}

rdmsr is a privileged instruction: it only executes in ring 0, so running it from a user-space program raises a fault and the process crashes. On Windows, you can use the existing WinRing0.sys (32-bit) and WinRing0x64.sys (64-bit) drivers to allow MSR access from user space. You can find a copy here with an open and permissive license (the "WinRing0 license").
This ultimately offers you IOCTLs to read and write MSRs from user space. You can find some C# code that uses it here, but there are plenty of other users of WinRing0, so there should be no shortage of examples.
You can also write your own driver, or compile one of the several other available ones that offer similar access, but the advantage of WinRing0 is that it is already signed, a process no longer really available to individuals and certainly not free.
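For code that does run in ring 0 (inside a driver, for instance), the asm also has to tell the compiler which registers rdmsr uses: the MSR index goes in ECX and the result comes back in EDX:EAX. A minimal sketch with GCC-style constraints follows; read_msr and msr_combine are hypothetical names, not part of WinRing0, and calling read_msr from user space would still fault.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves that rdmsr returns in EDX:EAX. */
static inline uint64_t msr_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical helper: only valid in ring 0 (e.g. inside a driver).
 * Executed from user space, rdmsr raises a privileged-instruction fault. */
static inline uint64_t read_msr(uint32_t msr)
{
    uint32_t lo, hi;
    /* "c" pins the MSR index to ECX; "=a" and "=d" receive EAX and EDX. */
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return msr_combine(lo, hi);
}
#endif
```

With the operands declared this way, the compiler loads ECX itself and knows that EAX and EDX are overwritten, so no separate mov statements or register variables are needed.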


RISC-V inline assembly using memory not behaving correctly

This system call code is not working at all. The compiler is optimizing things out and generally behaving strangely:
template <typename... Args>
inline void print(Args&&... args)
{
char buffer[1024];
auto res = strf::to(buffer) (std::forward<Args> (args)...);
const size_t size = res.ptr - buffer;
register const char* a0 asm("a0") = buffer;
register size_t a1 asm("a1") = size;
register long syscall_id asm("a7") = ECALL_WRITE;
register long a0_out asm("a0");
asm volatile ("ecall" : "=r"(a0_out)
: "m"(*(const char(*)[size]) a0), "r"(a1), "r"(syscall_id) : "memory");
}
This is a custom system call that takes a buffer and a length as arguments.
If I write this using global assembly it works as expected, but the generated code has generally been extraordinarily good when I write the wrappers inline.
A function that calls the print function with a constant string produces invalid machine code:
0000000000120f54 <start>:
start():
120f54: fa1ff06f j 120ef4 <public_donothing-0x5c>
-->
120ef4: 747367b7 lui a5,0x74736
120ef8: c0010113 addi sp,sp,-1024
120efc: 55478793 addi a5,a5,1364 # 74736554 <add_work+0x74615310>
120f00: 00f12023 sw a5,0(sp)
120f04: 00a00793 li a5,10
120f08: 00f10223 sb a5,4(sp)
120f0c: 000102a3 sb zero,5(sp)
120f10: 00500593 li a1,5
120f14: 06600893 li a7,102
120f18: 00000073 ecall
120f1c: 40010113 addi sp,sp,1024
120f20: 00008067 ret
It's not loading a0 with the buffer at sp.
What am I doing wrong?
It's not loading a0 with the buffer at sp.
Because you didn't ask for a pointer as an "r" input in a register. The one and only guaranteed/supported behaviour of T foo asm("a0") is to make an "r" constraint (including +r or =r) pick that register.
But you used "m" to let it pick an addressing mode for that buffer, not necessarily 0(a0), so it probably picked an SP-relative mode. If you add asm comments inside the template like "ecall # 0 = %0 1 = %1 2 = %2" you can look at the compiler's asm output and see what it picked. (With clang, use -no-integrated-as so asm comments in the template come through in the -S output.)
Wrapping a system call does need the pointer in a specific register, i.e. using "r" or "+r":
asm volatile ("ecall # 0=%0 1=%1 2=%2 3=%3 4=%4"
: "=r"(a0_out)
: "r"(a0), "r"(a1), "r"(syscall_id), "m"(*(const char(*)[size]) a0)
: // "memory" unneeded; the "m" input tells the compiler which memory is read
);
That "m" input can be used instead of the "memory" clobber, not instead of an "r" pointer input. (For write specifically, because it only reads that one area of pointed-to memory and has no other side effects on memory user space can see, only on kernel write buffers and file-descriptor positions, which aren't C objects this program can access directly. For a read call, you'd need the memory to be an output operand.)
With optimization disabled, compilers do typically pick another register as the base for the "m" input (e.g. 0(a5) for GCC), but with optimization enabled GCC picks 0(a0) so it doesn't cost extra instructions. Clang still picks 0(a2), wasting an instruction to set up that pointer, even though the "=r"(a0_out) is not early-clobber. (Godbolt, with a very cut-down version of the function that doesn't call strf::to, whatever that is, just copies a byte into the buffer.)
Interestingly, with optimization enabled for my cut-down stand-alone version of the function without fixing the bug, GCC and clang do happen to put a pointer to buffer into a0, picking 0(a0) as the template expansion for that operand (see the Godbolt link above). This seems to be a missed optimization vs. using 16(sp); I don't see why they'd need the buffer address in a register at all.
But without optimization, GCC picks ecall # 0 = a0 1 = 0(a5) 2 = a1. (In my simplified version of the function, it sets a5 with mv a5,a0, so it did actually have the address in a0 as well. So it's a good thing you had more code in your function to make it not happen to work by accident, so you could find the bug in your code.)
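Putting the pieces together, a complete wrapper might look like the sketch below, written in plain C rather than the question's C++ template. The name ecall_write, the value 102 for ECALL_WRITE (read off the li a7,102 in the disassembly above), and the stdio fallback for non-RISC-V hosts are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 102 is the custom "write" ecall number, taken from the
 * `li a7,102` in the question's disassembly. */
#define ECALL_WRITE 102

/* Hypothetical wrapper: passes buf in a0 and len in a1, returns a0. */
static long ecall_write(const char *buf, size_t len)
{
    assert(buf != NULL);
#if defined(__riscv)
    register const char *a0 __asm__("a0") = buf;
    register size_t a1 __asm__("a1") = len;
    register long syscall_id __asm__("a7") = ECALL_WRITE;
    register long a0_out __asm__("a0");

    __asm__ volatile("ecall"
                     : "=r"(a0_out)
                     : "r"(a0), "r"(a1), "r"(syscall_id),
                       /* dummy input: tells the compiler buf[0..len) is read */
                       "m"(*(const char (*)[len])a0));
    return a0_out;
#else
    /* Fallback so the sketch still builds on non-RISC-V hosts: plain stdio. */
    return (long)fwrite(buf, 1, len, stdout);
#endif
}
```

The "r"(a0) input is what forces the pointer into a0; the "m" input only describes which memory the call reads.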

Accepts and displays one character of 1 through 9 using assembly code in c

Can someone please explain each line of this assembly code to me?
void main(void){
_asm{
mov ah,8 ;read key no echo
int 21h
cmp al,'0' ;filter key code
jb big
cmp al,'9'
ja big
mov dl,al ;echo 0-9
mov ah,2
int 21h
big:
}
}
PS: I am new to assembly in c/c++.
Per the docs, the return value is in al, not ah. That's why it compares to al.
Edit: Adding more detail:
Looking at this code:
mov ah,8 ;read key no echo
int 21h
Think of this like a function call. Now normally a function call in asm looks like call myroutine. But DOS used interrupts to allow you to call various operating system functions (read a key from the keyboard, read data from a file, etc).
So, executing the int 21h instruction called the operating system. But how was the operating system supposed to know which OS function you wanted? Typically by putting a value in ah. If you search, you can find a number of resources that show listings of all the int 21h functions (like this). The numbers on the right are the values you put in ah.
So, mov ah,8 is preparing to call the "Wait for console input without echo" function. mov ah,2 is "Display output." Other registers are used to pass various parameters to the function being called. You need to read the description of the specific interrupt to understand what goes where.
Note that NONE of this is related to "writing inline asm in C." This is just how to call OS functions from C code running under DOS. If you aren't running under DOS, int 21h won't work.
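For readers who think more easily in C, the filtering logic of the asm can be sketched as follows. The function names are made up for illustration, and the key is passed in as a parameter instead of being read with int 21h, so this shows the control flow only, not the DOS calls.

```c
#include <assert.h>
#include <stdio.h>

/* Mirrors the two cmp/jump pairs: returns 1 if the key is '0'..'9'. */
static int is_digit_key(unsigned char al)
{
    if (al < '0') return 0;   /* cmp al,'0' / jb big */
    if (al > '9') return 0;   /* cmp al,'9' / ja big */
    return 1;
}

/* Hypothetical stand-in for the whole routine: `key` plays the role of the
 * byte that int 21h with AH=8 returns in AL. Digits are echoed, anything
 * else is silently ignored. */
static int echo_if_digit(int key)
{
    assert(key >= 0 && key <= 255);
    if (!is_digit_key((unsigned char)key))
        return -1;            /* jumped to "big:" without echoing */
    putchar(key);             /* mov dl,al / mov ah,2 / int 21h */
    return key;
}
```

jb and ja are unsigned comparisons, which is why the sketch uses unsigned char for the key.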

Inline ASM - Use 16 or 32 bit C Variable (GCC ARM, Thumb Mode)

I'm currently using the following inline ASM for the Cortex-M3 to branch to a specific address in flash.
__asm("LDR R0, =0x8000"); // Load the branch address
__asm("LDR R1, [R0]"); // Get the branch address
__asm("ORR R1, #1"); // Make sure the Thumb State bit is set.
__asm("BX R1"); // Branch execution
However, I want to replace the hard-coded value 0x8000 with a C variable that will be computed based on some other conditions.
The largest possible value this variable can take is 0x20000, so I'd planned on using a uint32_t to store it.
The compiler being used is arm-none-eabi-gcc v4.9.3
I attempted to modify my inline ASM as follows:
uint32_t destination_address = 0x8000;
__asm( "LDR R0, =%[dest]" : : [dest]"r"(destination_address) );
However, this generates the compiler error:
undefined reference to `r3'
I am fairly new to inline ASM in general. I've tried researching this issue for two days or so, but I've been confused by conflicting answers owing to the diversity of compilers out there and the fact I am using Thumb instructions for the Cortex-M3.
I think my problem is that I need to find the correct constraint for the variable destination_address (range 0x0 - 0x20000), but I'm not sure.
why are you using inline assembly?
extern void HOP ( unsigned int );
...
unsigned int some_address;
..
some_address = some_math;
HOP(some_address);
and a few lines of real asm, which you can assemble with the C compiler, if you really feel you have to, to make an object to link:
.globl HOP
HOP:
bx r0
the added benefit is that the call is basically a branch with link, if you want it to be.
the compiler has already computed the address it sounds like so you "simply" need to get it into a register and bx it. Inline assembly is extremely compiler specific so you need to start by talking about what assembler, version, etc you are using.
another thing you can do is if you have this
unsigned int some_address;
..
some_address = some_math;
you can use this assembly somewhere in the project.
ldr r0,=some_address;
ldr r0,[r0]
bx r0
and the linker will resolve the address of the C variable. so you can use real assembler or inline for something like that (if the inline doesn't support something like mov %0,some_address; bx %0 and do the work for you).
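As for the error in the question: with an "r" constraint, %[dest] expands to a register name, so the assembler sees LDR R0, =r3 and treats r3 as a symbol to load from the literal pool, hence the undefined reference. If inline asm is kept, one option is to do the load in C and hand the final value to bx in a register. This is only a sketch for GCC in Thumb-2 mode; branch_via and with_thumb_bit are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit 0 so that BX stays in Thumb state (the ORR R1, #1 step). */
static inline uint32_t with_thumb_bit(uint32_t addr)
{
    return addr | 1u;
}

#if defined(__GNUC__) && defined(__thumb2__)
/* Hypothetical helper: fetch the word stored at vector_addr and branch to it.
 * Control only comes back if the target arranges to return by itself. */
static inline void branch_via(uint32_t vector_addr)
{
    uint32_t target =
        with_thumb_bit(*(volatile uint32_t *)(uintptr_t)vector_addr);
    /* "r" lets the compiler pick the register; %0 expands to its name. */
    __asm__ volatile("bx %0" : : "r"(target) : "memory");
}
#endif
```

A call such as branch_via(destination_address) then replaces the four hard-coded __asm statements, and the address can be any value computed at run time.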

Input setting using Registers

I have a simple C program for printing n Fibonacci numbers and I would like to compile it to an ELF object file. Instead of setting the number of Fibonacci numbers (n) directly in my C code, I would like to set it in the registers, since I am simulating it for an ARM processor. How can I do that?
Here is the code snippet
#include <stdio.h>
#include <stdlib.h>
#define ITERATIONS 3
static float fib(float i) {
return (i>1) ? fib(i-1) + fib(i-2) : i;
}
int main(int argc, char **argv) {
float i;
printf("starting...\n");
for(i=0; i<ITERATIONS; i++) {
printf("fib(%f) = %f\n", i, fib(i));
}
printf("finishing...\n");
return 0;
}
I would like to set the ITERATIONS counter in my Registers rather than in the code.
Thanks in advance
The register keyword can be used to suggest to the compiler that it use registers for the iterator and the number of iterations:
register float i;
register int numIterations = ITERATIONS;
but that will not help much. First of all, the compiler may or may not follow your suggestion. Next, values will still need to be placed on the stack for the call to fib(), and, finally, depending on what functions you call within your loop, code in the procedures you are calling could save your register contents in the stack frame at procedure entry and restore them as part of the procedure return.
If you really need to make every instruction count, then you will need to write machine code (using an assembly language). That way, you have direct control over your register usage. Assembly language programming is not for the faint of heart. Assembly language development is several times slower than using higher level languages, your risk of inserting bugs is greater, and they are much more difficult to track down. High level languages were developed for a reason, and the C language was developed to help write Unix. The minicomputers that ran the first Unix systems were extremely slow, but the reason C was used instead of assembly was that even then, it was more important to have code that took less time to code, had fewer bugs, and was easier to debug than assembler.
If you want to try this, here are the answers to a previous question on stackoverflow about resources for ARM programming that might be helpful.
One tactic you might take is to isolate your performance-critical code into a procedure, write the procedure in C, then capture the generated assembly language representation. Then rewrite the assembler to be more efficient. Test thoroughly, and get at least one other set of eyeballs to look the resulting code over.
Good Luck!
Make ITERATIONS a variable rather than a literal constant, then you can set its value directly in your debugger/simulator's watch or locals window just before the loop executes.
Alternatively as it appears you have stdio support, why not just accept the value via console input?
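Taking the count from input can be sketched roughly like this: parse it from a piece of text (a command-line argument or a line read from the console) and fall back to the compiled-in default when it is missing or malformed. The names parse_iterations, DEFAULT_ITERATIONS and MAX_ITERATIONS, and the upper bound of 40, are assumptions made for this sketch; the bound is only there because the recursive fib gets slow quickly.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_ITERATIONS 3   /* same value as ITERATIONS in the question */
#define MAX_ITERATIONS 40      /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: convert text to an iteration count, returning
 * `fallback` if the text is empty, not a number, or out of range. */
static int parse_iterations(const char *text, int fallback)
{
    char *end;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return fallback;       /* nothing parsed, or trailing junk */
    if (value < 0 || value > MAX_ITERATIONS)
        return fallback;       /* out of the range this sketch accepts */
    return (int)value;
}
```

In main, the loop bound would then become something like parse_iterations(argv[1], DEFAULT_ITERATIONS) when argc > 1, and DEFAULT_ITERATIONS otherwise.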

How do I detect programmatically in which ring (-1, 0, 1, 2, 3) I am running?

The easiest way is to just run a privileged (x86) instruction and catch the resulting exception.
E.g. (SEH, Windows, kernel mode)
bool ring_lower_0 = false;
__try
{
__asm { <cmd> };
ring_lower_0 = true;
}
__except( GetExceptionCode() == EXCEPTION_PRIV_INSTRUCTION )
{
ring_lower_0 = false;
}
Notes:
cmd is an assembler instruction. See the Intel Architecture Reference Manuals for a list of instructions and their respective ring levels.
Linux has a slightly different concept.
But remember that VMs residing on a lower level may mask the result by emulating the call.
(NB: the job of the VM is to translate the invalid instruction into a meaningful call.)
If you really want to check whether you are virtualized and stop execution because of it, you should read what has been written about the 'Red Pill'.
Unless you're a device driver, you'll always be running in Ring 3 (for systems that have "rings", per se).
Normally I would say that you should read about "protected mode programming". There is an article about how to interact with ring 0 on Windows XP SP2. Note that this will differ for other Windows versions, and certainly for other operating systems.
http://www.codeproject.com/KB/threads/MinimalisticRingZero.aspx
If you just want to detect if you are running inside of a virtual machine, to avoid that people debug your application, for example, you can check here:
http://www.codeproject.com/KB/system/VmDetect.aspx
The ring is given by the lower two bits of the code segment selector (CS) register in the 64-bit x86 architecture.
You can extract it like this: (test it online)
#include <stdint.h>
#include <stdio.h>
int main(void) {
uint64_t rcs = 0;
int ring;
asm ("mov %%cs, %0" : "=r" (rcs));
ring = (int) (rcs & 3);
printf("Hello, world. This program is running on ring %d!\n", ring);
return 0;
}
The problem with that code is that it will always return 3, because your code will always run on ring 3. To see that changing you need to create a Linux Kernel Module or Windows Device Driver.
I have a GitHub repo where I was playing around with this on Ubuntu 18.
In demo-1 I have the exact same code as above. You can run it:
cd demo-1
. c-build.sh
In demo-2 I have a Linux kernel module that executes the same code, but inside the kernel, in ring 0:
link to code:
...
#define EXAMPLE_MSG "Hello, World. This is executed in ring: _ \n"
...
uint64_t rcs = 0;
asm ("mov %%cs, %0" : "=r" (rcs));
msg_buffer[MSG_BUFFER_LEN - 3] = (int) (rcs & 3) + '0';
You cannot do printf to the console from the kernel driver. Instead you need to open the device as a file and read from it:
cat /dev/lkm_example
I would recommend you to follow the readme step by step, so you can see it running :)