How to find place in binary code where certain string is used - executable

I am trying to analyze a binary executable file. I want to find the location of a certain function in this file; I happen to know that this function uses a specific string literal. I have found the location of this string in the binary code; I decided to search in the file for its address, hoping to catch the instruction that refers to this string, but I couldn't find anything. I assumed that this is because the position of data within the binary code is different from the actual address of the data when the code is being executed.
To test this, I compiled and executed a test program:
#include <stdio.h>
const char* s1 = "ABCDEF\n";
int main(void) {
    const char* s2 = "123456\n";
    printf("%p %p", (void*)s1, (void*)s2);
}
Sure enough, the program prints "404000 404008", while the positions of the two strings in the binary code are 0x2400 and 0x2408, respectively.
How should I proceed in this situation?
I am running 64-bit Windows on an AMD processor. I am not opposed to downloading new software for this task, as long as it is free (I already have a decent hex editor).
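The mismatch between 0x2400 and 0x404000 is what one would expect if the loader maps each section of the file to its own virtual address. A minimal sketch of that mapping follows. The struct, the function name, and the example values (image base 0x400000, section RVA 0x4000, raw data at file offset 0x2400) are assumptions chosen to reproduce the numbers printed above; the real values have to be read from the executable's section table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a PE section header; only the fields needed here. */
struct section_info {
    uint32_t virtual_address;     /* RVA of the section once loaded */
    uint32_t pointer_to_raw_data; /* offset of the section in the file */
    uint32_t size_of_raw_data;    /* bytes the section occupies in the file */
};

/* Map a file offset to a run-time virtual address.
   Returns 0 if the offset does not fall inside the given section. */
uint64_t file_offset_to_va(const struct section_info *sec,
                           uint64_t image_base,
                           uint32_t file_offset)
{
    assert(sec != NULL);
    if (file_offset < sec->pointer_to_raw_data ||
        file_offset - sec->pointer_to_raw_data >= sec->size_of_raw_data) {
        return 0;
    }
    return image_base + sec->virtual_address
         + (file_offset - sec->pointer_to_raw_data);
}
```

With the assumed values, file offset 0x2408 maps to 0x404008. Note that 64-bit code typically addresses data relative to the instruction pointer, so even the translated address may not appear literally in the instruction bytes; a disassembler that resolves such references is usually a better tool than a raw byte search.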

Related

Sending a text string out a serial port in C without using printf

I'm writing 8-bit PIC microcontroller C code in Microchip's MPLAB X tool using their XC8 compiler for a demo / evaluation board for a sensor IC. The PIC I'm using has multiple serial ports, so I'm using UART2 to output only sensor data. I wanted to be able to format the output data for hex or decimal so I have stdio redirected to this port. The Rx side of UART2 is unused.
UART1 is used for command and control. This port needs to be able to output ASCII messages such as "Enter the register address:". The sensor is highly configurable, so my code initializes it with default values, but the user can halt sampling, read and write registers and re-start sampling.
Since I can't use printf when writing to UART1, I had to find another way to do this. Here's an example of what I'm doing. The file is large so I'm just cutting and pasting the relevant stuff here:
char msgString[64];
switch(command)
{
    case (READCMD): // Read command
    {
        if (SERIAL_STATE_IDLE == serialPortState)
        {
            strcpy (msgString, "Input Read Address Hex Value 00-18");
            Display_PrintMessage();
            Display_PrintChar(prompt);
            serialPortState = SERIAL_STATE_READ;
        }
...
void Display_PrintMessage(void)
{
    msgLength=strlen(msgString);
    for (i=0; i<msgLength; ++i)
    {
        UART1_Write(msgString[i]);
    }
    Display_PrintChar(newLine);
}
void Display_PrintChar(char c)
{
    UART1_Write(c);
}
I've verified that the port is working by writing some characters out to it early in main and they do appear on my terminal. I think the problem is with how I'm using the strcpy function. In the MPLAB debugger it appears that msgString is still all null characters after the strcpy executes. The strlen function may see msgString as zero length, hence, nothing prints out.
I would love to hear what you think may be wrong with my code, or, please suggest another way of accomplishing what I need.
Thanks in advance for any and all responses, they are much appreciated.
CobraRGuy
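One alternative that avoids strcpy and the shared msgString buffer altogether is to pass the string to the print routine and walk it in place. The sketch below assumes UART1_Write sends a single byte, as in the code above. Display_PrintString is a hypothetical name, the '\n' line ending stands in for the newLine variable, and the UART1_Write defined here is only a host-side stub so the example can run off-target.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-in for the real UART1_Write(); it records bytes in a
   buffer so the sketch can run off-target. On the PIC, drop this stub and
   link against the actual driver function instead. */
char uart1_log[128];
size_t uart1_len;

void UART1_Write(char c)
{
    if (uart1_len < sizeof uart1_log - 1) {
        uart1_log[uart1_len++] = c;
        uart1_log[uart1_len] = '\0';
    }
}

/* Send a NUL-terminated string one byte at a time, then a line ending.
   The string is read in place, so no strcpy() or scratch buffer is needed. */
void Display_PrintString(const char *s)
{
    assert(s != NULL);
    while (*s != '\0') {
        UART1_Write(*s++);
    }
    UART1_Write('\n');
}
```

A call such as Display_PrintString("Input Read Address Hex Value 00-18") would then replace the strcpy plus Display_PrintMessage pair. This does not explain why msgString stays empty in the debugger, but it removes the copy from the path being debugged.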

RISC-V inline assembly using memory not behaving correctly

This system call code is not working at all. The compiler is optimizing things out and generally behaving strangely:
template <typename... Args>
inline void print(Args&&... args)
{
    char buffer[1024];
    auto res = strf::to(buffer) (std::forward<Args> (args)...);
    const size_t size = res.ptr - buffer;
    register const char* a0 asm("a0") = buffer;
    register size_t a1 asm("a1") = size;
    register long syscall_id asm("a7") = ECALL_WRITE;
    register long a0_out asm("a0");
    asm volatile ("ecall" : "=r"(a0_out)
        : "m"(*(const char(*)[size]) a0), "r"(a1), "r"(syscall_id) : "memory");
}
This is a custom system call that takes a buffer and a length as arguments.
If I write this using global assembly it works as expected, but the generated code has generally been much better when I write the wrappers inline.
A function that calls the print function with a constant string produces invalid machine code:
0000000000120f54 <start>:
start():
120f54: fa1ff06f j 120ef4 <public_donothing-0x5c>
-->
120ef4: 747367b7 lui a5,0x74736
120ef8: c0010113 addi sp,sp,-1024
120efc: 55478793 addi a5,a5,1364 # 74736554 <add_work+0x74615310>
120f00: 00f12023 sw a5,0(sp)
120f04: 00a00793 li a5,10
120f08: 00f10223 sb a5,4(sp)
120f0c: 000102a3 sb zero,5(sp)
120f10: 00500593 li a1,5
120f14: 06600893 li a7,102
120f18: 00000073 ecall
120f1c: 40010113 addi sp,sp,1024
120f20: 00008067 ret
It's not loading a0 with the buffer at sp.
What am I doing wrong?
"It's not loading a0 with the buffer at sp."
Because you didn't ask for a pointer as an "r" input in a register. The one and only guaranteed/supported behaviour of T foo asm("a0") is to make an "r" constraint (including +r or =r) pick that register.
But you used "m" to let it pick an addressing mode for that buffer, not necessarily 0(a0), so it probably picked an SP-relative mode. If you add asm comments inside the template like "ecall # 0 = %0 1 = %1 2 = %2" you can look at the compiler's asm output and see what it picked. (With clang, use -no-integrated-as so asm comments in the template come through in the -S output.)
Wrapping a system call does need the pointer in a specific register, i.e. using "r" or "+r":
asm volatile ("ecall # 0=%0 1=%1 2=%2 3=%3 4=%4"
: "=r"(a0_out)
: "r"(a0), "r"(a1), "r"(syscall_id), "m"(*(const char(*)[size]) a0)
: // "memory" unneeded; the "m" input tells the compiler which memory is read
);
That "m" input can be used instead of the "memory" clobber, not instead of an "r" pointer input. (This works for write specifically, because it only reads that one area of pointed-to memory and has no other side effects on memory that user-space can see, only on kernel write buffers and file-descriptor positions, which aren't C objects this program can access directly. For a read call, you'd need the memory to be an output operand.)
With optimization disabled, compilers do typically pick another register as the base for the "m" input (e.g. 0(a5) for GCC), but with optimization enabled GCC picks 0(a0) so it doesn't cost extra instructions. Clang still picks 0(a2), wasting an instruction to set up that pointer, even though the "=r"(a0_out) is not early-clobber. (Godbolt, with a very cut-down version of the function that doesn't call strf::to, whatever that is, just copies a byte into the buffer.)
Interestingly, with optimization enabled for my cut-down stand-alone version of the function without fixing the bug, GCC and clang do happen to put a pointer to buffer into a0, picking 0(a0) as the template expansion for that operand (see the Godbolt link above). This seems to be a missed optimization vs. using 16(sp); I don't see why they'd need the buffer address in a register at all.
But without optimization, GCC picks ecall # 0 = a0 1 = 0(a5) 2 = a1. (In my simplified version of the function, it sets a5 with mv a5,a0, so it did actually have the address in a0 as well. So it's a good thing you had more code in your function to make it not happen to work by accident, so you could find the bug in your code.)

Read running processes' virtual memory from mm_struct in linux

I'm trying to write a kernel module that uses the proc file system to read the total memory of each running process. This is for an assignment, and it is suggested that I use the information in "mm_struct", which is inside "task_struct". I should compare the contents of the proc file with what I obtain from running "ps -ef". I'm using the seq_file API to read from the proc file. My current read function looks like this:
static int proc_show(struct seq_file *m, void *v){
    struct task_struct *task;
    /*struct mm_struct *mm; Commented this since it's inside task_struct
     *struct vm_area_struct *mmap; Commented this since it's inside task_struct */
    unsigned long size;
    for_each_process(task){
        seq_printf(m,"Task = %s PID = %d\n",task->comm,task->pid);
        down_read(&task->mm->mmap_sem);
        if(task->mm){
            size = (task->mm->mmap->vm_end - task->mm->mmap->vm_start );
            seq_printf(m," VIRT = %lu\n",size);
        }else{
            seq_printf(m," VIRT = 0\n");
        }
        up_read(&task->mm->mmap_sem);
    }
    return 0;
}
The message "Killed" is printed as soon as I try "~# cat proc/my_proc_file". So far, what I think I know is that vm_area_struct is inside mm_struct, which is inside task_struct, all of them defined in include/linux/sched.h.
A similar question, which is the one I partially based this code on, was posted by #confusedkid (see the post Linux Kernel programming: trying to get vm_area_struct->vm_start crashes kernel). However the replies contradict each other and no one actually explains the reasoning behind the code.
Could anyone please point out what the problem might be or suggest any documentation that explains how to access these structures correctly?

PIC C18: Converting double to string

I am using a PIC18F2550 and programming it in C with the C18 compiler.
I need a function that converts double to string like below:
void dtoa( char *szString, // Output string
double dbDouble, // Input number
unsigned char ucFPlaces) // Number of digits in the resulting fractional part
{
// ??????????????
}
To be called like this in the main program:
void main (void)
{
    // ...
    double dbNumber = 123.45678;
    char szText[9];
    dtoa(szText, dbNumber, 3); // szText becomes "123.456" or rounded to "123.457"
    // ...
}
So write one!
Five minutes, a bit of graph paper and a coffee is all it should take.
In fact, it's a good interview question.
Tiny printf might work for you: http://www.sparetimelabs.com/tinyprintf/index.html
Generally, the Newlib C library (BSD license, from RedHat, part of Cygwin as well as used in many "bare-metal" embedded-systems compilers) is a good place to start for useful sources for things that would be in the standard C library.
The Newlib dtoa.c sources are in the src/newlib/libc/stdlib subdirectory of the source tree:
Online source browser: http://sourceware.org/cgi-bin/cvsweb.cgi/src/newlib/libc/stdlib/?cvsroot=src#dirlist
Direct link to the current version of the dtoa.c file: http://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/src/newlib/libc/stdlib/dtoa.c?rev=1.5&content-type=text/plain&cvsroot=src
The file is going to be a little odd, in that Newlib uses some odd macros for the function declarations, but should be straightforward to adapt -- and, being BSD-licensed, you can pretty much do whatever you want with it if you keep the copyright notice on it.
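For the "write one" route, a minimal sketch with the signature from the question could look like the following. It is not the Newlib algorithm: it simply rounds by adding half a unit in the last requested place and then peels off digits. It assumes the integer part fits in an unsigned long, that the caller's buffer is large enough, and that NaN and infinity never occur.

```c
#include <assert.h>
#include <stddef.h>

/* Convert dbDouble to decimal text with ucFPlaces fractional digits,
   rounding half away from zero. Assumes the integer part fits in an
   unsigned long and that szString is large enough; no NaN/Inf handling. */
void dtoa(char *szString, double dbDouble, unsigned char ucFPlaces)
{
    char digits[20];            /* integer digits, collected in reverse */
    unsigned char n = 0, i, d;
    unsigned long whole;
    double rounding = 0.5;

    assert(szString != NULL);

    if (dbDouble < 0) {
        *szString++ = '-';
        dbDouble = -dbDouble;
    }

    /* Add half a unit in the last requested place so truncation rounds. */
    for (i = 0; i < ucFPlaces; ++i) {
        rounding /= 10.0;
    }
    dbDouble += rounding;

    /* Split into integer and fractional parts. */
    whole = (unsigned long)dbDouble;
    dbDouble -= (double)whole;

    /* Emit the integer part, most significant digit first. */
    do {
        digits[n++] = (char)('0' + whole % 10);
        whole /= 10;
    } while (whole != 0);
    while (n > 0) {
        *szString++ = digits[--n];
    }

    /* Emit the fractional digits by repeated multiplication by ten. */
    if (ucFPlaces > 0) {
        *szString++ = '.';
        for (i = 0; i < ucFPlaces; ++i) {
            dbDouble *= 10.0;
            d = (unsigned char)dbDouble;
            *szString++ = (char)('0' + d);
            dbDouble -= d;
        }
    }
    *szString = '\0';
}
```

On a target where double is only a 32-bit type, expect no more than about six or seven significant digits from this approach, so the last fractional digit of a value like 123.45678 may differ from what a 64-bit host produces.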

Input setting using Registers

I have a simple C program for printing n Fibonacci numbers, and I would like to compile it to an ELF object file. Instead of setting the number of Fibonacci numbers (n) directly in my C code, I would like to set it in the registers, since I am simulating it for an ARM processor. How can I do that?
Here is the code snippet:
#include <stdio.h>
#include <stdlib.h>
#define ITERATIONS 3
static float fib(float i) {
    return (i>1) ? fib(i-1) + fib(i-2) : i;
}
int main(int argc, char **argv) {
    float i;
    printf("starting...\n");
    for(i=0; i<ITERATIONS; i++) {
        printf("fib(%f) = %f\n", i, fib(i));
    }
    printf("finishing...\n");
    return 0;
}
I would like to set the ITERATIONS counter in my Registers rather than in the code.
Thanks in advance
The register keyword can be used to suggest to the compiler that it use registers for the iterator and the number of iterations:
register float i;
register int numIterations = ITERATIONS;
but that will not help much. First of all, the compiler may or may not follow your suggestion. Next, values will still need to be placed on the stack for the call to fib(). Finally, depending on what functions you call within your loop, code in the procedures you are calling could save your register contents in the stack frame at procedure entry and restore them as part of the code implementing the procedure return.
If you really need to make every instruction count, then you will need to write machine code (using an assembly language). That way, you have direct control over your register usage. Assembly language programming is not for the faint of heart. Assembly language development is several times slower than using higher level languages, your risk of inserting bugs is greater, and they are much more difficult to track down. High level languages were developed for a reason, and the C language was developed to help write Unix. The minicomputers that ran the first Unix systems were extremely slow, but the reason C was used instead of assembly was that even then, it was more important to have code that took less time to code, had fewer bugs, and was easier to debug than assembler.
If you want to try this, here are the answers to a previous question on stackoverflow about resources for ARM programming that might be helpful.
One tactic you might take is to isolate your performance-critical code into a procedure, write the procedure in C, then capture the generated assembly language representation. Then rewrite the assembler to be more efficient. Test thoroughly, and get at least one other set of eyeballs to look the resulting code over.
Good Luck!
Make ITERATIONS a variable rather than a literal constant, then you can set its value directly in your debugger/simulator's watch or locals window just before the loop executes.
Alternatively as it appears you have stdio support, why not just accept the value via console input?
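Both suggestions can be sketched in a few lines. The names iterations and parse_iterations, and the upper limit of 1000, are illustrative assumptions rather than part of the original program. The variable replaces the ITERATIONS macro so a debugger or simulator can change it, and the helper turns text from the console (or from argv) into a count.

```c
#include <assert.h>
#include <stdlib.h>

/* Run-time replacement for the ITERATIONS macro. 'volatile' keeps the
   compiler from folding the value into the loop, so a debugger or
   simulator can change it before the loop starts. */
volatile int iterations = 3;

/* Parse a decimal count typed on the console (or passed in argv).
   Returns 'fallback' when the text does not start with a number in the
   range 0..1000; the upper limit is an arbitrary safety bound. */
int parse_iterations(const char *text, int fallback)
{
    char *end;
    long value;

    assert(text != NULL);
    value = strtol(text, &end, 10);
    if (end == text || value < 0 || value > 1000) {
        return fallback;
    }
    return (int)value;
}
```

In main, the loop bound would then be iterations instead of ITERATIONS, optionally preceded by reading a line with fgets and assigning iterations = parse_iterations(line, iterations).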