This question already has answers here:
Solution needed for building a static IDT and GDT at assemble/compile/link time
(1 answer)
How to do computations with addresses at compile/linking time?
(2 answers)
Closed 5 days ago.
I'm working on a project that has tight boot time requirements. The targeted architecture is an IA-32 based processor running in 32 bit protected mode. One of the areas identified that can be improved is that the current system dynamically initializes the processor's IDT (interrupt descriptor table). Since we don't have any plug-and-play devices and the system is relatively static, I want to be able to use a statically built IDT.
However, this proving to be troublesome for the IA-32 arch since the 8 byte interrupt gate descriptors splits the ISR address. The low 16 bits of the ISR appear in the first 2 bytes of the descriptor, some other bits fill in the next 4 bytes, and then finally the last 16 bits of the ISR appear in the last 2 bytes.
I wanted to use a const array to define the IDT and then simply point the IDT register at it like so:
typedef struct s_myIdt {
unsigned short isrLobits;
unsigned short segSelector;
unsigned short otherBits;
unsigned short isrHibits;
} myIdtStruct;
myIdtStruct myIdt[256] = {
{ (unsigned short)myIsr0, 1, 2, (unsigned short)(myIsr0 >> 16)},
{ (unsigned short)myIsr1, 1, 2, (unsigned short)(myIsr1 >> 16)},
etc.
Obviously this won't work as it is illegal to do this in C because myIsr is not constant. Its value is resolved by the linker (which can do only a limited amount of math) and not by the compiler.
Any recommendations or other ideas on how to do this?
You ran into a well known x86 wart. I don't believe the linker can stuff the address of your isr routines in the swizzled form expected by the IDT entry.
If you are feeling ambitious, you could create an IDT builder script that does something like this (Linux based) approach. I haven't tested this scheme and it probably qualifies as a nasty hack anyway, so tread carefully.
Step 1: Write a script to run 'nm' and capture the stdout.
Step 2: In your script, parse the nm output to get the memory address of all your interrupt service routines.
Step 3: Output a binary file, 'idt.bin' that has the IDT bytes all setup and ready for the LIDT instruction. Your script obviously outputs the isr addresses in the correct swizzled form.
Step 4: Convert his raw binary into an elf section with objcopy:
objcopy -I binary -O elf32-i386 idt.bin idt.elf
Step 5: Now idt.elf file has your IDT binary with the symbol something like this:
> nm idt.elf
000000000000000a D _binary_idt_bin_end
000000000000000a A _binary_idt_bin_size
0000000000000000 D _binary_idt_bin_start
Step 6: relink your binary including idt.elf. In your assembly stubs and linker scripts, you can refer to symbol _binary_idt_bin_start as the base of the IDT. For example, your linker script can place the symbol _binary_idt_bin_start at any address you like.
Be careful that relinking with the IDT section doesn't move anyting else in your binary, e.g. your isr routines. Manage this in your linker script (.ld file) by puting the IDT into it's own dedicated section.
---EDIT---
From comments, there seems to be confusion about the problem. The 32-bit x86 IDT expects the address of the interrupt service routine to be split into two different 16-bit words, like so:
31 16 15 0
+---------------+---------------+
| Address 31-16 | |
+---------------+---------------+
| | Address 15-0 |
+---------------+---------------+
A linker is thus unable to plug-in the ISR address as a normal relocation. So, at boot time, software must construct this split format, which slows boot time.
Related
I'm trying to build u-boot for our simple test board. (arm64)
After setting in include/configs/ab21m.h (our board),
#define CONFIG_SPL_BSS_START_ADDR 0x4f00000
#define CONFIG_SPL_BSS_MAX_SIZE SZ_32K
when I compile it, it gives me error while linking u-boot-spl. The error message is like this.
===================== WARNING ======================
This board does not use CONFIG_DM_ETH (Driver Model
for Ethernet drivers). Please update the board to use
CONFIG_DM_ETH before the v2020.07 release. Failure to
update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.
====================================================
UPD include/generated/timestamp_autogenerated.h
CFGCHK u-boot.cfg
CC cmd/version.o
AR cmd/built-in.o
LD u-boot
CC spl/common/spl/spl.o
OBJCOPY u-boot.srec
OBJCOPY u-boot-nodtb.bin
SYM u-boot.sym
RELOC u-boot-nodtb.bin
COPY u-boot.bin
MKIMAGE u-boot.img
LD u-boot.elf
AR spl/common/spl/built-in.o
LD spl/u-boot-spl
aarch64-none-elf-ld.bfd: invalid length for memory region .sdram
make[1]: *** [scripts/Makefile.spl:509: spl/u-boot-spl] Error 1
make: *** [Makefile:1984: spl/u-boot-spl] Error 2
make: *** Waiting for unfinished jobs....
By the way, the linker script for spl starts like this after the build.
MEMORY { .sram : ORIGIN = 0x4000000,
LENGTH = (14*1024*1024) }
MEMORY { .sdram : ORIGIN = 0x4f00000,
LENGTH = SZ_32K }
OUTPUT_FORMAT("elf64-littleaarch64", "elf64-littleaarch64", "elf64-littleaarch64")
OUTPUT_ARCH(aarch64)
ENTRY(_start)
SECTIONS
{
This is strange because this 0x4f00000 and SZ_32K is what I gave for CONFIG_SPL_BSS_START_ADDR and CONFIG_SPL_BSS_MAX_SIZE. I placed this range in the on-chip RAM area some enough space above CONFIG_SPL_TEXT_BASE and below CONFIG_SPL_STACK with enough stack space. (I referenced imx8mm_evk board). What should I correct?
BTW, I found CONFIG_SYS_SDRAM_BASE, CONFIG_SYS_INIT_RAM_ADDR are all set to 0x40000000in imx8mm_evk, which is the DRAM start addrss. But CONFIG_SYS_INIT_RAM_SIZE is set to 0x200000 (2MB) where there is actually 3072MB DDR. Why is this value set to small size?
I asked two qeustions. Any help will be very much appreciated.
So, your first problem is a literal one. You have SZ_32K as the value for CONFIG_SPL_BSS_MAX_SIZE but since you're likely lacking #include <linux/sizes.h> in include/configs/ab21m.h you're not getting that constant evaluated.
As for what this is all doing, and why you should likely use something more like 2MB as seen on other platforms and place it in SDRAM rather than much smaller on-chip memory, if you look at arch/arm/cpu/armv8/u-boot-spl.lds you can see we're defining where the BSS should reside and that's likely larger than 32KB (and you'll get an overflow error when linking, if so).
I'm running some code under Valgrind, compiled with gcc 7.5 targeting an aarch64 (ARM 64 bits) architecture, with optimizations enabled.
I get the following error:
==3580== Invalid write of size 8
==3580== at 0x38865C: ??? (in ...)
==3580== Address 0x1ffeffdb70 is on thread 1's stack
==3580== 16 bytes below stack pointer
This is the assembly dump in the vicinity of the offending code:
388640: a9bd7bfd stp x29, x30, [sp, #-48]!
388644: f9000bfc str x28, [sp, #16]
388648: a9024ff4 stp x20, x19, [sp, #32]
38864c: 910003fd mov x29, sp
388650: d1400bff sub sp, sp, #0x2, lsl #12
388654: 90fff3f4 adrp x20, 204000 <_IO_stdin_used-0x4f0>
388658: 3dc2a280 ldr q0, [x20, #2688]
38865c: 3c9f0fe0 str q0, [sp, #-16]!
I'm trying to ascertain whether this is a possible bug in my code (note that I've thoroughly reviewed my code and I'm fairly confident it's correct), or whether Valgrind will blindly report any writes below the stack pointer as an error.
Assuming the latter, it looks like a Valgrind bug since the offending instruction at 0x38865c uses the pre-decrement addressing mode, so it's not actually writing below the stack pointer.
Furthermore, at address 0x388640 a similar access (and again with pre-decrement addressing mode) is performed, yet this isn't reported by Valgrind; the main difference being the use of an x register at address 0x388640 versus a q register at address 38865c.
I'd also like to draw attention to the large stack pointer subtraction at 0x388650, which may or may not have anything to do with the issue (note this subtraction makes sense, given that the offending C code declares a large array on the stack).
So, will anyone help me make sense of this, and whether I should worry about my code?
The code looks fine, and the write is certainly not below the stack pointer. The message seems to be a valgrind bug, possibly #432552, which is marked as fixed. OP confirms that the message is not produced after upgrading valgrind to 3.17.0.
code declares a large array on the stack
should [I] worry about my code?
I think it depends upon your desire for your code to be more portable.
Take this bit of code that I believe represents at least one important thing you mentioned in your post:
#include <stdio.h>
#include <stdlib.h>
long long foo (long long sz, long long v) {
long long arr[sz]; // allocating a variable on the stack
arr[sz-1] = v;
return arr[sz-1];
}
int main (int argc, char *argv[]) {
long long n = atoll(argv[1]);
long long v = foo(n, n);
printf("v = %lld\n", v);
}
$ uname -mprsv
Darwin 20.5.0 Darwin Kernel Version 20.5.0: Sat May 8 05:10:33 PDT 2021; root:xnu-7195.121.3~9/RELEASE_X86_64 x86_64 i386
$ gcc test.c
$ a.out 1047934
v = 1047934
$ a.out 1047935
Segmentation fault: 11
$ uname -snrvmp
Linux localhost.localdomain 3.19.8-100.fc20.x86_64 #1 SMP Tue May 12 17:08:50 UTC 2015 x86_64 x86_64
$ gcc test.c
$ ./a.out 2147483647
v = 2147483647
$ ./a.out 2147483648
v = 2147483648
There are at least some minor portability concerns with this code. The amount of allocatable stack memory for these two environments differs significantly. And that's only for two platforms. Haven't tried it on my Windows 10 vm but I don't think I need to because I got bit by this one a long time ago.
Beyond OP issue that was due to a Valgrind bug, the title of this question is bound to attract more people (like me) who are getting "invalid write at X bytes below stack pointer" as a legitimate error.
My piece of advice: check that the address you're writing to is not a local variable of another function (not present in the call stack)!
I stumbled upon this issue while attempting to write into the address returned by yyget_lloc(yyscanner) while outside of function yyparse (the former returns the address of a local variable in the latter).
I've been working on rop recently. When using perf to count hardware information, I want to measure the number of return instructions executed by a given piece of code. But the perf interface only provides branch instructions.
If you're only x86 with a recent Intel CPU:
perf list on my Skylake shows there's a hardware counter for br_inst_retired.near_return. That will count only ret instructions, not other branches. But see erratum SKL091 for branch-instruction counters.
perf stat -e instructions,br_inst_retired.near_return,... ./a.out may be what you're looking for. Or maybe attaching perf stat to an already-running program, or maybe -I 1000 to print accumulated counts over intervals.
But note that if you're looking for ROP gadgets, you can find a C3 opcode inside what normally decodes as some other instruction. So restricting yourself only to ret instructions that actually run during the target program's normal execution is more limiting than it needs to be.
e.g. a 4-byte immediate might usefully decode as something + ret if you jump to the immediate.
I am reworking the programmer for the Olimex iCE40HX1K board (targetted towards a STM32F103 ma) where I also would like to implement the "SPI Slave" mode to configure an image directly into RAM without using the serial flash.
Looking at the Lattice "programming and Configuration guide" (page 11), it is noted in table 8 that a EPROM for a ICE40-LP/LX1K must be at least 34112 bytes. (which -I guess- means that the configuration-files can be up to that size).
However, all images I have (sofar) created with the icestorm tools are 32220 octets.
I am a bit puzzled here.
Can somebody explain the difference between these two figures?
Does the HX1K need a configuration-file of 32220 or 34112 bytes?
I don't know how Lattice arrived at this number. A complete HX1K bin file with BRAM initialization but without comment and without multiboot header is 32220 bytes in size. The (optional) multiboot header would add another 160 bytes (32220 + 160 = 32380). The lattice tools usually add about 80 bytes to the comment field (32220 + 80 = 32300). Whatever I do, all numbers I have are more than 1000 short of 34112.
I don't know if there is a maximum length for the comment. Maybe there is and 34112 is the size of a bit stream with a comment of maximum length?
34112 - 32220 = 1892. Maybe someone decided to add 8kB (8192 bytes) just in case, but that person accidentally swapped the first two digits? Idk..
If you don't care about comments or multiboot headers, then iCE40 1K bit-streams have a fixed size, and that size is 32220 bytes.
I was wondering if it is possible to access debug information in a running application that has been compiled with /DEBUG (Pascal and/or C), in order to retrieve information about structures used in the application.
The application can always ask the debugger to do something using SS$_DEBUG. If you send a list of commands that end with GO then the application will continue running after the debugger does its thing. I've used it to dump a bunch of structures formatted neatly without bothering to write the code.
ANALYZE/IMAGE can be used to examine the debugger data in the image file without running the application.
Although you may not see the nice debugger information, you can always look into a running program's data with ANALYZE/SYSTEM .. SET PROCESS ... EXAMINE ....
The SDA SEARCH command may come in handy to 'find' recognizable morcels of date, like a record that you know the program must have read.
Also check out FORMAT/TYPE=block-type, but to make use of data you'll have to compile your structures into .STB files.
When using SDA, you may want to try run the program yourself interactively in an other session to get sample sample addresses to work from.... easier than a link map!
If you programs use RMS a bunch (mine always do :-), then SDA> SHOW PROC/RMS=(FAB,RAB) may give handy addresses for record and key buffers, allthough those may also we managed by the RTL's and thus not be meaningful to you.
Too long for a comment ...
As far as I know, structure information about elements is not in the global symbol table.
What I did, on Linux, but that should work on VMS/ELF files as well:
$ cat tests.c
struct {
int ii;
short ss;
float ff;
char cc;
double dd;
char bb:1;
void *pp;
} theStruct;
...
$ cc -g -c tests.c
$ ../extruct/extruct
-e-insarg, supply an ELF object file.
Usage: ../extruct/extruct [OPTION]... elf-file variable
Display offset and size of members of the named struct/union variable
extracted from the dwarf info in the elf file.
Options are:
-b bit offsets and bit sizes for all members
-lLEVEL display level for nested structures
-n only the member names
-t print base types
$ ../extruct/extruct -t ./tests.o theStruct
size of theStruct: 0x20
offset size type name
0x0000 0x0004 int ii
0x0004 0x0002 short int ss
0x0008 0x0004 float ff
0x000c 0x0001 char cc
0x0010 0x0008 double dd
0x0018 0x0001 char bb:1
0x001c 0x0004 pp
$