what is the x86 opcode for assembly variables and constants? - variables

i know that every instruction has opcodes.
i could find opcodes for mov , sub instructions.
but what is the opcode for variables and it's types.
we use assembler directives to define a variable and constant?
how they are represented in x86 opcodes?
nasm assembler x86: segment .bss
largest resb 2 ; reserves two bytes for largest
segment .data
number1 DW 12345 ; defines a constant number1
i tried online this https://defuse.ca/online-x86-assembler.htm#disassembly assembly to opcode conveter. but when i used nasm code to define a variable it shows error!

There's no opcode for variables. There are even no variables in machine code.
There is CPU and memory. Memory contains some values (bytes).
The CPU has cs:ip instruction pointer, pointing to memory address, where is the next instruction to execute, so it will read byte(s) from that address, and interpret them as opcode, and execute it as an instruction.
Whether you have in memory stored data or machine code doesn't matter, both are byte values.
What makes part of memory "data" or "variable" is the logical interpretation created by the running code, it's the code which does use certain part of memory only as "data/variables" and other part of memory as "code" (or eventually as both at the same time, like in this DOS 51B long COM code drawing Greece flag on screen, where the XLAT instruction is using the code opcodes also as source data for blue/white strips configuration).
Whether you write in your source:
x:
add al,al
or
x:
db 0x00, 0xC0
Doesn't matter, the resulting machine code is identical (in both cases the CPU will execute add al,al when pointed to that memory to be executed as instruction, and mov ax,[x] will set ax to 0xC000 in both cases, when used as "variable".
You may want to check listing file from the assembler (-l <listing_file_name> command line option for nasm) to see yourself there's no way to tell which bytes are code and which are data.

Assembler directives like segment, resb, or dw are not instructions and do not correspond to opcodes. That's why they are directives instead of instructions. Roughly speaking, there are two kinds of directives:
one kind of directive configures the assembler. For example, the segment directive configures the assembler to continue assembly in the section you provided.
another kind of directive emits data. For example, the dw directive emits the given datum into the object file. This can be used to place arbitrary data into memory for use with your program.

Related

Raku-native disk space usage

Purpose:
Save a program that writes data to disk from vain attempts of writing to a full filesystem;
Save bandwidth (don't download if nowhere to store);
Save user's and programmer's time and nerves (notify them of the problem instead of having them tearing out their hair with reading misleading error messages and "why the heck this software is not working!").
The question comes in 2 parts:
Reporting storage space statistics (available, used, total etc.), either of all filesystems or of the filesystem that path in question belongs to.
Reporting a filesystem error on running out of space.
Part 1
Share please NATIVE Raku alternative(s) (TIMTOWTDIBSCINABTE "Tim Toady Bicarbonate") to:
raku -e 'qqx{ df -P $*CWD }.print'
Here, raku -executes df (disk free) external program via shell quoting with interpolation qqx{}, feeding -Portable-format argument and $*CWD Current Working Directory, then .prints the df's output.
The snippet initially had been written as raku -e 'qqx{ df -hP $*CWD }.print' — with both -human-readable and -Portable — but it turned out that it is not a ubiquitously valid command. In OpenBSD 7.0, it exits with an error: df: -h and -i are incompatible with -P.
For adding human-readability, you may consider Number::Bytes::Human module
raku -e 'run <<df -hP $*CWD>>'
If you're just outputting what df gives you on STDOUT, you don't need to do anything.
The << >> are double quoting words, so that the $*CWD will be interpolated.
Part 1 — Reporting storage space statistics
There's no built in function for reporting storage space statistics. Options include:
Write Raku code (a few lines) that uses NativeCall to invoke a platform / filesystem specific system call (such as statvfs()) and uses the information returned by that call.
Use a suitable Raku library. FileSystem::Capacity picks and runs an external program for you, and then makes its resulting data available in a portable form.
Use run (or similar1) to invoke a specific external program such as df.
Use an Inline::* foreign language adaptor to enable invoking of a foreign PL's solution for reporting storage space statistics, and use the info it provides.2
Part 2 — Reporting running out of space
Raku seems to neatly report No space left on device:
> spurt '/tmp/failwrite', 'filesystem is full!'
Failed to write bytes to filehandle: No space left on device
in block <unit> at <unknown file> line 1
> mkdir '/tmp/failmkdir'
Failed to create directory '/tmp/failmkdir' with mode '0o777': Failed to mkdir: No space left on device
in block <unit> at <unknown file> line 1
(Programmers will need to avoid throwing away these exceptions.)
Footnotes
1 run runs an external command without involving a shell. This guarantees that the risks attendant with involving a shell are eliminated. That said, Raku also supports use of a shell (because that can be convenient and appropriate in some scenarios). See the exchange of comments under the question (eg this one) for some brief discussion of this, and the shell doc for a summary of the risk:
All shell metacharacters are interpreted by the shell, including pipes, redirects, environment variable substitutions and so on. Shell escapes are a severe security concern and can cause confusion with unusual file names. Use run if you want to be safe.
2 Foreign language adaptors for Raku (Raku modules in the Inline:: namespace) allow Raku code to use code written in other languages. These adaptors are not part of the Raku language standard, and most are barely experimental status, if that, but, conversely, the best are in great shape and allow Raku code to use foreign libraries as if they were written for Raku. (As of 2021 Inline::Perl5 is the most polished.)

Is it possible to access the procedure linkage table in NASM?

In the report I received from Agner Fog's objconv, I see suggested errors to fix in the .plt (procedure linkage table) section, for example:
SECTION .plt align=16 execute ; section number 9, code
?_001: push qword [rel ?_086] ; 10F0 _ FF. 35, 00201F12(rel)
jmp near [rel ?_087] ; 10F6 _ FF. 25, 00201F14(rel)
; Filling space: 4H
; Filler type: Multi-byte NOP
; db 0FH, 1FH, 40H, 00H
ALIGN 8
?_002: jmp near [rel ?_088] ; 1100 _ FF. 25, 00201F12(rel)
; Note: Immediate operand could be made smaller by sign extension
push 0 ; 1106 _ 68, 00000000
; Note: Immediate operand could be made smaller by sign extension
jmp ?_001 ; 110B _ E9, FFFFFFE0
In two cases (and others in code not shown) it suggests "Immediate operand could be made smaller by sign extension." How can I access the procedure linkage table to make those changes? Is it possible?
PLT stubs intentionally use longer immediates and jump displacements than necessary so they're a constant size even when you have enough PLT entries that the jmp ?_001 in the fall-through path needs a rel32 to reach from later PLT entries.
They're automatically generated by the linker when linking code that used call printf wrt ..plt, or when linking a non-PIE that just used call printf.
You can avoid the PLT entirely by writing call [rel printf wrt ..got], like GCC does when you compile with -fno-plt. This does early binding (instead of lazy), resolving all the GOT at startup before your _start.
See Can't call C standard library function on 64-bit Linux from assembly (yasm) code. Using default rel lets you leave out the explicit rel part of the addressing mode. The equivalent AT&T syntax is call *printf#GOTPCREL(%rip)
I don't know if this fixed-width array of PLT stubs is strictly necessary for anything at run time. e.g. lazy dynamic linking only modifies the GOT, not the PLT itself, because modern PLTs use an indirect jump. The push 0 is pushing an index of the PLT entry, but I don't think anything uses it to actually find the address of the machine code of that PLT stub, only indexing a GOT entry.
At this point it might just be a missed optimization in the linker. NASM isn't generating it so you can't really do anything about it.
I seem to recall historically seeing a jmp rel32 as the first instruction of PLT stubs in 32-bit code, not a jmp [mem], but maybe that was just a guess at how PLT stubs worked before I really knew much. If they ever worked that way, lazy dynamic linking would modify the actual PLT itself to fix up the relative jump target, so indexing the machine code of the PLT entry would be important. (And thus having every entry be fixed width would be important).
But even 32-bit code doesn't use jmp rel32 these days so the PLT stubs are read-only. And in 64-bit code, jmp rel32 can only reach +-2GiB so wouldn't be usable to reach libraries mapped to a random address.
Note that those longer-than-needed instructions only ever run once for each PLT stub. After the first call, the indirect jmp target will be the function in the library. (On the first call, the jmp target will be the next instruction after the jmp.)
The padding might possibly be a good thing: too many jmp instructions in a single 16-byte block of code is bad for branch predictors on some CPUs. But I think the limit is like 3 or 4 jumps in a 16-byte block of machine code for some AMD or Core 2, so that wouldn't be hit anyway with 6-byte jmp [RIP+rel32] + 2-byte push imm8 + 2-byte jmp rel8.

Filter the output of GNU nm by section

I'm trying to identify the largest symbols in an .elf file for each memory section (.text, .data, .bss). So far I'm using GNU nm to get the largest symbols:
nm foo.elf --size-sort --reverse-sort --radix=d --demangle --line-numbers
Is there a builtin way in nm to filter the ouput by section or do I need to resort to text filtering?
nm outputs a section type for every symbol as single letter code (B: .bss, D: .data, T: .text), but there seems no way to filter by symbol type.
Background: The code runs on a microcontroller which is able to execute instruction directly from flash memory. The instructions from the .text section stay in the flash memory during execution, .bss and .data are loaded into the RAM. That's way I would like to be able to identify the largest symbols in each section independently.
there seems no way to filter by symbol type.
Just use grep to perform any filtering you may need.
You may also want to look at Bloaty McBloatface: a size profiler for binaries.

Where are Ksh User-Defined Variables stored on UNIX machine?

Where are KornShell (ksh) User-Defined Variables (UDV) stored on a AIX (Advanced Interactive eXecutive) machine?
Sample Commands:
#:/dir #variable=fooValue
#:/dir #echo $variable
fooValue
So is there a file on the AIX server with "fooValue" in text? Is the value stored in memory? Can the variable be sniffed out anyway?
The shell is a running process with its own little chunk of memory it gets dealt by the operating system.
As you define and set variables, the shell stores their names and values
inside its own process memory.
When the shell process exits, that memory is released back to the operating system, and the variables and their values are lost.
it's certainly not stored on disk. You can access the memory region where the variables are stored using
extern char** environ;
see man 5 environ.
But the less higher-level way is via
char * getenv(char * varname)
which returns you the value of a single environment variable (which ought to be part of the environ pointer list) See man 3 getenv

In an ELF file, how does the address for _start get detemined?

I've been reading the ELF specification and cannot figure out where the program entry point and _start address come from.
It seems like they should have to be in a pretty consistent place, but I made a few trivial programs, and _start is always in a different place.
Can anyone clarify?
The _start symbol may be defined in any object file. Normally it is generated automatically (it corresponds to main in C). You can generate it yourself, for instance in an assembler source file:
.globl _start
_start:
// assembly here
When the linker has processed all object files it looks for the _start symbol and puts its value in the e_entry field of the elf header. The loader takes the address from this field and makes a call to it after it has finished loading all sections in memory and is ready to execute the file.
Take a look at the linker script ld is using:
ld -verbose
The format is documented at: https://sourceware.org/binutils/docs-2.25/ld/Scripts.html
It determines basically everything about how the executable will be generated.
On Binutils 2.24 Ubuntu 14.04 64-bit, it contains the line:
ENTRY(_start)
which sets the entry point to the _start symbol (goes to the ELF header as mentioned by ctn)
And then:
. = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
which sets the address of the first headers to 0x400000 + SIZEOF_HEADERS.
I have modified that address to 0x800000, passed my custom script with ld -T and it worked: readelf -s says that _start is at that address.
Another way to change it is to use the -Ttext-segment=0x800000 option.
The reason for using 0x400000 = 4Mb = getconf PAGE_SIZE is to start at the beginning of the second page as asked at: Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?
A question describes how to set _start from the command line: Why is the ELF entry point 0x8048000 not changeable with the "ld -e" option?
SIZEOF_HEADERS is the size of the ELF + program headers, which are at the beginning of the ELF file. That data gets loaded into the very beginning of the virtual memory space by Linux (TODO why?) In a minimal Linux x86-64 hello world with 2 program headers it is worth 0xb0, so that the _start symbol comes at 0x4000b0.
I'm not sure but try this link http://www.docstoc.com/docs/23942105/UNIX-ELF-File-Format
at page 8 it is shown where the entry point is if it is executable. Basically you need to calculate the offset and you got it.
Make sure to remember the little endianness of x86 ( i guess you use it) and reorder if you read bytewise edit: or maybe not i'm not quit sure about this to be honest.