Why doesn't readelf report the right sizes?

I have a Linux executable file (which is an ELF file) called a.out.
I run
readelf -hl a.out
and get this output:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 03 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - GNU
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4008e0
Start of program headers: 64 (bytes into file)
Start of section headers: 910680 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 6
Size of section headers: 64 (bytes)
Number of section headers: 33
Section header string table index: 30
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000c9be6 0x00000000000c9be6 R E 200000
LOAD 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000001c98 0x0000000000003550 RW 200000
NOTE 0x0000000000000190 0x0000000000400190 0x0000000000400190
0x0000000000000044 0x0000000000000044 R 4
TLS 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000000020 0x0000000000000050 R 8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 10
GNU_RELRO 0x00000000000c9eb8 0x00000000006c9eb8 0x00000000006c9eb8
0x0000000000000148 0x0000000000000148 R 1
Section to Segment mapping:
Segment Sections...
00 .note.ABI-tag .note.gnu.build-id .rela.plt .init .plt .text __libc_freeres_fn __libc_thread_freeres_fn .fini .rodata __libc_subfreeres __libc_atexit .stapsdt.base __libc_thread_subfreeres .eh_frame .gcc_except_table
01 .tdata .init_array .fini_array .jcr .data.rel.ro .got .got.plt .data .bss __libc_freeres_ptrs
02 .note.ABI-tag .note.gnu.build-id
03 .tdata .tbss
04
05 .tdata .init_array .fini_array .jcr .data.rel.ro .got
According to the above output, this ELF file should have the following size:
ELF header size : 64
+
Section headers : 64*33 = 2112
+
Program headers : 56*6 = 336
+
Segments : 834090 = 0x00000000000c9be6 + 0x0000000000001c98 + 0x0000000000000044 + 0x0000000000000020 + 0x0000000000000148
= 64 + 2112 + 336 + 834090 = 836602
However, /bin/ls reports a file size of 912792.
Where are the remaining 912792 - 836602 = 76190 bytes?
Which parts did I forget to count?
UPDATE
Following @Jonathon Reinhart's suggestion, I recounted the size using section sizes instead of segment sizes:
echo "print(0x020 + 0x024 + 0x0f0 + 0x01a + 0x0a0 + 0x09eda4 + 0x02529 + 0x0de + 0x09 + 0x01d320 + 0x050 + 0x08 + 0x01 + 0x08 + 0x0b04c + 0x0b2 + 0x020 + 0x010 + 0x010 + 0x08 + 0x0e4 + 0x010 + 0x068 + 0x01ad0 + 0x023 + 0x0f18 + 0x0169 + 0x0b100 + 0x0685a)" | /usr/bin/python
The section sizes used above are taken from the output of readelf -S a.out; sections of type "NOBITS" were not counted.
The command prints 909459, which is bigger than the segment total of 834090,
but the total is still not equal to the file size:
64 + 2112 + 336 + 909459 = 911971 != 912792
821 bytes (912792 - 911971) are still missing.
UPDATE 2 [solved]
Following the comment, I checked the offsets. The result shows that padding has indeed been added.
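A quick cross-check of the total is possible from the header fields alone. The sketch below uses only numbers from the readelf -h output above; the helper names are hypothetical, and the sample extents passed to unaccounted_bytes are made up for illustration rather than taken from a.out. For this file the section header table appears to be the last thing in the file, since its end offset equals the size that ls reports.

```python
# Values copied from the `readelf -h` output above.
E_SHOFF = 910680     # Start of section headers (bytes into file)
E_SHENTSIZE = 64     # Size of section headers (bytes)
E_SHNUM = 33         # Number of section headers


def end_of_section_header_table(shoff, shentsize, shnum):
    """Return the file offset one past the last section header."""
    return shoff + shentsize * shnum


def unaccounted_bytes(extents, file_size):
    """Count file bytes not covered by any (offset, size) extent.

    Overlapping extents are only counted once, so the result is the
    amount of padding (or otherwise unlisted data) in the file.
    """
    covered = 0
    cursor = 0
    for offset, size in sorted(extents):
        start = max(offset, cursor)
        end = offset + size
        if end > start:
            covered += end - start
            cursor = end
    return file_size - covered


print(end_of_section_header_table(E_SHOFF, E_SHENTSIZE, E_SHNUM))  # 912792

# Hypothetical layout: a 64-byte header, a 336-byte table, then a
# 100-byte section that starts at offset 512 because of alignment.
print(unaccounted_bytes([(0, 64), (64, 336), (512, 100)], 700))  # 200
```

Summing sizes alone cannot reach the file size, because the gaps between aligned pieces belong to no section or segment; working with offsets does account for them.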

ELF program segments offset in file

I have a question about the offsets of ELF program segments in the file. For example, the output of readelf -l xx -W for one program looks like this:
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x4ca8e6 0x4ca8e6 R E 0x200000
LOAD 0x4cb000 0x0000000000acb000 0x0000000000acb000 0x035db8 0x04ed80 RW 0x200000
DYNAMIC 0x4ed4c8 0x0000000000aed4c8 0x0000000000aed4c8 0x000230 0x000230 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
TLS 0x4cb000 0x0000000000acb000 0x0000000000acb000 0x000010 0x000018 R 0x10
GNU_EH_FRAME 0x3dcf04 0x00000000007dcf04 0x00000000007dcf04 0x024c64 0x024c64 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame .gcc_except_table
03 .tdata .init_array .fini_array .jcr .data.rel.ro .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .tdata .tbss
07 .eh_frame_hdr
08
The first LOAD begins at offset 0x000000 and its size is 0x4ca8e6. Why is the second offset not (0x000000 + 0x4ca8e6)? I looked at the content in the gap (0x4cb000 - 0x4ca8e6) and it is all 0. I don't get it. What is the rule for the offsets in the file?
"The first LOAD begins at offset 0x000000 and its size is 0x4ca8e6. Why is the second offset not (0x000000 + 0x4ca8e6)?"
Because the loader mmaps LOAD segments directly into memory, the following must be true for each LOAD segment: (p_vaddr - p_offset) % page_size == 0.
On x86_64 the maximum page size used here is 2MiB (0x200000, the Align value of both LOAD segments). This places a severe restriction on the location of the second (and any subsequent) LOAD segment.
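The rule can be checked against the program headers above with a few lines of Python. This is only a sketch: the function names are hypothetical, and the page size is taken from the Align column of the readelf output rather than queried from the system.

```python
PAGE_SIZE = 0x200000  # Align value of the LOAD segments shown above


def is_mappable(p_offset, p_vaddr, page_size=PAGE_SIZE):
    """True if the file offset and virtual address are congruent modulo page_size."""
    return (p_vaddr - p_offset) % page_size == 0


def next_valid_offset(min_offset, p_vaddr, page_size=PAGE_SIZE):
    """Smallest file offset >= min_offset that is congruent to p_vaddr."""
    return min_offset + (p_vaddr - min_offset) % page_size


# First LOAD: offset 0x000000, vaddr 0x400000.
print(is_mappable(0x000000, 0x400000))   # True
# Second LOAD as emitted: offset 0x4cb000, vaddr 0xacb000.
print(is_mappable(0x4cb000, 0xacb000))   # True
# Second LOAD if it started right after the first one (0x4ca8e6).
print(is_mappable(0x4ca8e6, 0xacb000))   # False
# First offset at or after 0x4ca8e6 that satisfies the rule.
print(hex(next_valid_offset(0x4ca8e6, 0xacb000)))  # 0x4cb000
```

Given the virtual address 0xacb000, 0x4cb000 is the first offset past the end of the first segment that satisfies the rule, which accounts for the zero-filled gap.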

What is the format of special section in ELF

I am trying to write a program that generates an ELF file (targeting ARM and executed through qemu-arm). Most of the ELF format is well illustrated on the wiki, but I can't find any spec that describes the format of special sections such as .text and .data (.data is the one I especially want to know about).
I tried to put some initialized global variables in the .data section. What should I write into the ELF file (in the .data section) if I have a global statement like: int a = 10;
There is no special format for .text and .data.
When the static linker links several .o files,
it simply concatenates their .text and .data sections (while resolving relocations)
and places them in the final .so or executable file according to the linker script (see gcc -Wl,-verbose /dev/null).
The .data section simply contains the initial values of the instantiated global variables.
The .text section simply contains the machine code of the routines/functions.
Let's take this simple C file:
char x[5] = {0xba, 0xbb, 0xbc, 0xbd, 0xbe};
char f(int i) {
return x[i];
}
Let's compile it:
$ gcc -c -o test.o test.c
Let's dump the .data section, using elfcat:
$ elfcat test.o --section-name .data | xxd
00000000: babb bcbd be .....
The content of the .data section is easy to explain: it is exactly the five initializer bytes of x.
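For the int a = 10; case from the question, the bytes that would end up in .data can be sketched in Python. This assumes a 4-byte int and a little-endian target, which is what qemu-arm normally runs; a big-endian target would store the same bytes in the opposite order. The helper name is hypothetical.

```python
import struct


def encode_int32(value, little_endian=True):
    """Return the 4 raw bytes a 32-bit signed int occupies in .data."""
    fmt = "<i" if little_endian else ">i"
    return struct.pack(fmt, value)


# int a = 10; on a little-endian target (assumption).
print(encode_int32(10).hex())                       # 0a000000
# The same value on a big-endian target.
print(encode_int32(10, little_endian=False).hex())  # 0000000a
# The char array from the example above is stored byte for byte.
print(bytes([0xBA, 0xBB, 0xBC, 0xBD, 0xBE]).hex())  # babbbcbdbe
```

Several variables are simply laid out one after another, with whatever alignment padding their types require.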
Let's dump the .text section:
$ elfcat test.o --section-name .text | xxd
00000000: 5548 89e5 897d fc8b 45fc 4898 488d 1500 UH...}..E.H.H...
00000010: 0000 000f b604 105d c3
Let's disassemble this:
$ elfcat test.o --section-name .text > test.text
$ r2 -a x86 -b 64 -qc pd test.text
0x00000000 55 push rbp
0x00000001 4889e5 mov rbp, rsp
0x00000004 897dfc mov dword [rbp - 4], edi
0x00000007 8b45fc mov eax, dword [rbp - 4]
0x0000000a 4898 cdqe
0x0000000c 488d15000000. lea rdx, qword [0x00000013] ; 19
0x00000013 0fb60410 movzx eax, byte [rax + rdx]
0x00000017 5d pop rbp
0x00000018 c3 ret
Again, there is nothing special in the .text section: it only contains the machine code of the routines/functions of my program.
Notice, however, the relocation and symbol information in other sections:
$ readelf -a test.o
[ ... ]
Relocation section '.rela.text' at offset 0x1b8 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
00000000000f 000800000002 R_X86_64_PC32 0000000000000000 x - 4
Relocation section '.rela.eh_frame' at offset 0x1d0 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
000000000020 000200000002 R_X86_64_PC32 0000000000000000 .text + 0
[...]
Symbol table '.symtab' contains 10 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS test.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4
5: 0000000000000000 0 SECTION LOCAL DEFAULT 6
6: 0000000000000000 0 SECTION LOCAL DEFAULT 7
7: 0000000000000000 0 SECTION LOCAL DEFAULT 5
8: 0000000000000000 5 OBJECT GLOBAL DEFAULT 3 x
9: 0000000000000000 25 FUNC GLOBAL DEFAULT 1 f

Accessing USB device data based only on DESCRIPTOR HID Report

I have a digital sound level meter (sonometer), the GM1356, with USB. There is some software to handle it on Windows; however, I don't have the CD and the software is not available on the internet. What I want to do is read its data about the current noise level on Linux.
I have already found a library that lets me do this in a language I know (Ruby, libusb). As a next step I installed Wireshark to check what the device sends to the PC. It doesn't send much. The most interesting packet I found is the DESCRIPTOR HID Report. What next steps should I take to read the data I'm interested in? How can I determine which requests I should send to get it?
HID Report
Global item (Usage)
Header
.... ..10 = bSize: 2 bytes (2)
.... 01.. = bType: Global (1)
0000 .... = bTag: Usage (0x0)
Usage page: [Vendor-defined] (0xffa0)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa00001)
Main item (Collection)
Header
.... ..01 = bSize: 1 byte (1)
.... 00.. = bType: Main (0)
1010 .... = bTag: Collection (0xa)
Collection type: Application (0x01)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa00002)
Main item (Collection)
Header
.... ..01 = bSize: 1 byte (1)
.... 00.. = bType: Main (0)
1010 .... = bTag: Collection (0xa)
Collection type: Physical (0x00)
Global item (Usage)
Header
.... ..10 = bSize: 2 bytes (2)
.... 01.. = bType: Global (1)
0000 .... = bTag: Usage (0x0)
Usage page: [Vendor-defined] (0xffa1)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa10003)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa10004)
Global item (Logical minimum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0001 .... = bTag: Logical minimum (0x1)
Logical minimum: 128
Global item (Logical maximum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0010 .... = bTag: Logical maximum (0x2)
Logical maximum: 127
Global item (Physical minimum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0011 .... = bTag: Physical minimum (0x3)
Physical minimum: 0
Global item (Physical maximum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0100 .... = bTag: Physical maximum (0x4)
Physical maximum: 255
Global item (Report size)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0111 .... = bTag: Report size (0x7)
Report size: 8
Global item (Report count)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
1001 .... = bTag: Report count (0x9)
Report count: 8
Main item (Input)
Header
.... ..01 = bSize: 1 byte (1)
.... 00.. = bType: Main (0)
1000 .... = bTag: Input (0x8)
.... .... 0 = Data/constant: Data
.... ...1 . = Data type: Variable
.... ..0. . = Coordinates: Absolute
.... .0.. . = Min/max wraparound: No Wrap
.... 0... . = Physical relationship to data: Linear
...0 .... . = Preferred state: Preferred State
..0. .... . = Has null position: No Null position
.0.. .... . = [Reserved]: False
0... .... . = Bits or bytes: Buffered bytes (default, no second byte present)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa10005)
Local item (Usage)
Header
.... ..01 = bSize: 1 byte (1)
.... 10.. = bType: Local (2)
0000 .... = bTag: Usage (0x0)
Usage: [Vendor-defined] (0xffa10006)
Global item (Logical minimum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0001 .... = bTag: Logical minimum (0x1)
Logical minimum: 128
Global item (Logical maximum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0010 .... = bTag: Logical maximum (0x2)
Logical maximum: 127
Global item (Physical minimum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0011 .... = bTag: Physical minimum (0x3)
Physical minimum: 0
Global item (Physical maximum)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0100 .... = bTag: Physical maximum (0x4)
Physical maximum: 255
Global item (Report size)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
0111 .... = bTag: Report size (0x7)
Report size: 8
Global item (Report count)
Header
.... ..01 = bSize: 1 byte (1)
.... 01.. = bType: Global (1)
1001 .... = bTag: Report count (0x9)
Report count: 8
Main item (Output)
Header
.... ..01 = bSize: 1 byte (1)
.... 00.. = bType: Main (0)
1001 .... = bTag: Output (0x9)
.... .... 0 = Data/constant: Data
.... ...1 . = Data type: Variable
.... ..0. . = Coordinates: Absolute
.... .0.. . = Min/max wraparound: No Wrap
.... 0... . = Physical relationship to data: Linear
...0 .... . = Preferred state: Preferred State
..0. .... . = Has null position: No Null position
.0.. .... . = (Non)-volatile: Non Volatile
0... .... . = Bits or bytes: Buffered bytes (default, no second byte present)
Main item (End collection)
Header
.... ..00 = bSize: 0 bytes (0)
.... 00.. = bType: Main (0)
1100 .... = bTag: End collection (0xc)
Main item (End collection)
Header
.... ..00 = bSize: 0 bytes (0)
.... 00.. = bType: Main (0)
1100 .... = bTag: End collection (0xc)
When you decode the HID descriptor it will show the packet formats. Unfortunately, in this case the usage pages are vendor-defined so it is not possible to say exactly how each usage is to be interpreted.
I decoded it using hidrdd (disclaimer: I wrote it, but it is free open source so I have no conflict of interest) as:
//--------------------------------------------------------------------------------
// Decoded Application Collection
//--------------------------------------------------------------------------------
/*
06 A0FF (GLOBAL) USAGE_PAGE 0xFFA0 Vendor-defined
09 01 (LOCAL) USAGE 0xFFA00001 <-- Warning: Undocumented usage (document it by inserting 0001 into file FFA0.conf)
A1 01 (MAIN) COLLECTION 0x01 Application (Usage=0xFFA00001: Page=Vendor-defined, Usage=, Type=) <-- Error: COLLECTION must be preceded by a known USAGE
09 02 (LOCAL) USAGE 0xFFA00002 <-- Warning: Undocumented usage (document it by inserting 0002 into file FFA0.conf)
A1 00 (MAIN) COLLECTION 0x00 Physical (Usage=0xFFA00002: Page=Vendor-defined, Usage=, Type=) <-- Error: COLLECTION must be preceded by a known USAGE
06 A1FF (GLOBAL) USAGE_PAGE 0xFFA1 Vendor-defined
09 03 (LOCAL) USAGE 0xFFA10003 <-- Warning: Undocumented usage (document it by inserting 0003 into file FFA1.conf)
09 04 (LOCAL) USAGE 0xFFA10004 <-- Warning: Undocumented usage (document it by inserting 0004 into file FFA1.conf)
15 80 (GLOBAL) LOGICAL_MINIMUM 0x80 (-128)
25 7F (GLOBAL) LOGICAL_MAXIMUM 0x7F (127)
35 00 (GLOBAL) PHYSICAL_MINIMUM 0x00 (0) <-- Info: Consider replacing 35 00 with 34
45 FF (GLOBAL) PHYSICAL_MAXIMUM 0xFF (-1)
75 08 (GLOBAL) REPORT_SIZE 0x08 (8) Number of bits per field
95 08 (GLOBAL) REPORT_COUNT 0x08 (8) Number of fields
81 02 (MAIN) INPUT 0x00000002 (8 fields x 8 bits) 0=Data 1=Variable 0=Absolute 0=NoWrap 0=Linear 0=PrefState 0=NoNull 0=NonVolatile 0=Bitmap <-- Error: PHYSICAL_MAXIMUM (-1) is less than PHYSICAL_MINIMUM (0)
09 05 (LOCAL) USAGE 0xFFA10005 <-- Warning: Undocumented usage (document it by inserting 0005 into file FFA1.conf)
09 06 (LOCAL) USAGE 0xFFA10006 <-- Warning: Undocumented usage (document it by inserting 0006 into file FFA1.conf)
15 80 (GLOBAL) LOGICAL_MINIMUM 0x80 (-128) <-- Redundant: LOGICAL_MINIMUM is already -128
25 7F (GLOBAL) LOGICAL_MAXIMUM 0x7F (127) <-- Redundant: LOGICAL_MAXIMUM is already 127
35 00 (GLOBAL) PHYSICAL_MINIMUM 0x00 (0) <-- Redundant: PHYSICAL_MINIMUM is already 0 <-- Info: Consider replacing 35 00 with 34
45 FF (GLOBAL) PHYSICAL_MAXIMUM 0xFF (-1) <-- Redundant: PHYSICAL_MAXIMUM is already -1
75 08 (GLOBAL) REPORT_SIZE 0x08 (8) Number of bits per field <-- Redundant: REPORT_SIZE is already 8
95 08 (GLOBAL) REPORT_COUNT 0x08 (8) Number of fields <-- Redundant: REPORT_COUNT is already 8
91 02 (MAIN) OUTPUT 0x00000002 (8 fields x 8 bits) 0=Data 1=Variable 0=Absolute 0=NoWrap 0=Linear 0=PrefState 0=NoNull 0=NonVolatile 0=Bitmap <-- Error: PHYSICAL_MAXIMUM (-1) is less than PHYSICAL_MINIMUM (0)
C0 (MAIN) END_COLLECTION Physical <-- Warning: Physical units are still in effect PHYSICAL(MIN=0,MAX=-1) UNIT(0x,EXP=0)
C0 (MAIN) END_COLLECTION Application <-- Warning: Physical units are still in effect PHYSICAL(MIN=0,MAX=-1) UNIT(0x,EXP=0)
*/
//--------------------------------------------------------------------------------
// Vendor-defined inputReport (Device --> Host)
//--------------------------------------------------------------------------------
typedef struct
{
// No REPORT ID byte
// Collection: CA: CP:
int8_t VEN_0003; // Usage 0xFFA10003: , Value = -128 to 127, Physical = (Value + 128) x -1 / 255
int8_t VEN_0004[7]; // Usage 0xFFA10004: , Value = -128 to 127, Physical = (Value + 128) x -1 / 255
} inputReport_t;
//--------------------------------------------------------------------------------
// Vendor-defined outputReport (Device <-- Host)
//--------------------------------------------------------------------------------
typedef struct
{
// No REPORT ID byte
// Collection: CA: CP:
int8_t VEN_0005; // Usage 0xFFA10005: , Value = -128 to 127, Physical = (Value + 128) x -1 / 255
int8_t VEN_0006[7]; // Usage 0xFFA10006: , Value = -128 to 127, Physical = (Value + 128) x -1 / 255
} outputReport_t;
As you can see, the above HID descriptor has some issues (for example, physical maximum 45 FF is -1, but I think they meant 255 - which should be represented as 46 FF 00) but the problem remains that it tells you nothing about the meaning of the usages. BTW, even Wireshark has not reported the logical minimum correctly: 15 80 is -128 not 128.
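The signed-versus-unsigned point can be made concrete with a small decoder for HID short items. This is a Python sketch and not part of hidrdd; the function name and the shape of the returned tuple are hypothetical. It relies on the short-item layout visible in the Wireshark dump above (bSize in bits 0-1, bType in bits 2-3, bTag in bits 4-7), with a bSize code of 3 meaning four data bytes.

```python
def decode_short_item(raw):
    """Decode one HID short item.

    Returns (b_type, b_tag, data_length, signed_value, unsigned_value).
    """
    prefix = raw[0]
    size_code = prefix & 0x03
    length = 4 if size_code == 3 else size_code
    b_type = (prefix >> 2) & 0x03
    b_tag = (prefix >> 4) & 0x0F
    data = raw[1:1 + length]
    if len(data) != length:
        raise ValueError("item is truncated")
    signed_value = int.from_bytes(data, "little", signed=True) if data else 0
    unsigned_value = int.from_bytes(data, "little", signed=False) if data else 0
    return b_type, b_tag, length, signed_value, unsigned_value


# 15 80: Logical minimum, one data byte, read as signed.
print(decode_short_item(bytes.fromhex("1580"))[3])    # -128
# 45 FF: Physical maximum, one data byte, read as signed.
print(decode_short_item(bytes.fromhex("45ff"))[3])    # -1
# 46 FF 00: Physical maximum, two data bytes, read as signed.
print(decode_short_item(bytes.fromhex("46ff00"))[3])  # 255
```

The minimum and maximum items are signed, so a one-byte FF can only mean -1; encoding 255 needs the two-byte form.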
All we can tell from it is that the reports are 8-bytes long and that the first byte seems to be some kind of id (well, its usage is different from the remaining 7 bytes).
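As a starting point for experiments, a raw report can be split along the lines of the decoded struct. The Python sketch below assumes the report arrives as exactly 8 bytes with no report ID, as the descriptor suggests; the function name is hypothetical and the sample bytes are made up, not captured from the device.

```python
import struct

REPORT_LENGTH = 8  # REPORT_SIZE of 8 bits times REPORT_COUNT of 8 fields


def split_input_report(report):
    """Split a raw 8-byte input report into (first_byte, remaining_seven)."""
    if len(report) != REPORT_LENGTH:
        raise ValueError("expected %d bytes, got %d" % (REPORT_LENGTH, len(report)))
    values = struct.unpack("8b", report)  # eight signed 8-bit fields
    return values[0], list(values[1:])


# Made-up sample bytes for illustration only.
first, rest = split_input_report(bytes([0x01, 0x02, 0xFF, 0, 0, 0, 0, 0]))
print(first, rest)  # 1 [2, -1, 0, 0, 0, 0, 0]
```

The fields are unpacked as signed because the descriptor declares a logical range of -128 to 127; if the captures suggest otherwise, unpacking with "8B" gives the unsigned view.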
Only the vendor's driver knows how to interpret the reports, but with a sufficient number of Wireshark packet captures obtained under controlled conditions you may be able to reverse engineer a workable interpretation.
Sorry, but that's the best I can do with this.
I bought a decibel meter too, which happens to be compatible with your model. I am currently trying to port this code to a bash script: https://github.com/dobra-noc/gm1356. It works fine for me with my device (which, by the way, isn't even the GM1356), and I'm guessing it will work for you too.

Microsoft Visual Studio 2017 has stopped working - after computer sleep

Visual Studio keeps crashing after I wake my computer from sleep.
What is remarkable is that it sometimes blocks the mouse and keyboard: the mouse moves at a speed of a few pixels every 5 seconds, and a key press takes about 10 seconds to register. This is highly unusual, because mouse and keyboard input usually has the highest priority no matter what happens. The VS2015 and VS2013 installations that sit alongside it have no such problem (therefore I suppose it is not caused by Resharper).
Program and system info:
VS Community 2017, Version 15.2 (26430.12)
Using Resharper Ultimate 2017.1.2
OS: Windows 8.1 Pro, Version 6.3.9600
Edit:
Following the suggestion to write a crash dump and read it with WinDbg, I got this error description (the memory corruption reported there is frightening ...). Any further suggestions will be appreciated.
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** WARNING: Unable to verify checksum for mscorlib.ni.dll
*** WARNING: Unable to verify checksum for PresentationFramework.ni.dll
*** WARNING: Unable to verify checksum for WindowsBase.ni.dll
*** WARNING: Unable to verify checksum for Microsoft.VisualStudio.Shell.15.0.ni.dll
*** WARNING: Unable to verify checksum for System.ni.dll
*** WARNING: Unable to verify checksum for Microsoft.CodeAnalysis.Features.ni.dll
*** WARNING: Unable to verify checksum for Microsoft.CodeAnalysis.Workspaces.ni.dll
*** WARNING: Unable to verify checksum for Microsoft.CodeAnalysis.EditorFeatures.Text.ni.dll
*** WARNING: Unable to verify checksum for Microsoft.Build.ni.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for libleveldb.dll -
*** WARNING: Unable to verify checksum for Microsoft.CodeAnalysis.EditorFeatures.ni.dll
*** WARNING: Unable to verify checksum for System.Runtime.Remoting.ni.dll
DEBUG_FLR_EXCEPTION_CODE(80131509) and the ".exr -1" ExceptionCode(e0434352) don't match
DUMP_CLASS: 2
DUMP_QUALIFIER: 400
CONTEXT: (.ecxr)
eax=55f9e8c0 ebx=00000005 ecx=00000005 edx=00000000 esi=55f9e980 edi=00000001
eip=760f2f71 esp=55f9e8c0 ebp=55f9e918 iopl=0 nv up ei pl nz ac pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000216
KERNELBASE!RaiseException+0x48:
760f2f71 8b4c2454 mov ecx,dword ptr [esp+54h] ss:002b:55f9e914=ee800b58
Resetting default scope
FAULTING_IP:
KERNELBASE!RaiseException+48
760f2f71 8b4c2454 mov ecx,dword ptr [esp+54h]
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 760f2f71 (KERNELBASE!RaiseException+0x00000048)
ExceptionCode: e0434352 (CLR exception)
ExceptionFlags: 00000001
NumberParameters: 5
Parameter[0]: 80131509
Parameter[1]: 00000000
Parameter[2]: 00000000
Parameter[3]: 00000000
Parameter[4]: 715a0000
PROCESS_NAME: devenv.exe
ERROR_CODE: (NTSTATUS) 0xe0434352 -
EXCEPTION_CODE: (HRESULT) 0x80131509 (2148734217) -
EXCEPTION_CODE_STR: 80131509
EXCEPTION_PARAMETER1: 80131509
EXCEPTION_PARAMETER2: 00000000
EXCEPTION_PARAMETER3: 00000000
EXCEPTION_PARAMETER4: 0
WATSON_BKT_PROCSTAMP: 59275f23
WATSON_BKT_PROCVER: 15.0.26430.12
PROCESS_VER_PRODUCT: Microsoft® Visual Studio®
WATSON_BKT_MODULE: KERNELBASE.dll
WATSON_BKT_MODSTAMP: 53eeb460
WATSON_BKT_MODOFFSET: 12f71
WATSON_BKT_MODVER: 6.3.9600.17278
MODULE_VER_PRODUCT: Microsoft® Windows® Operating System
BUILD_VERSION_STRING: 6.3.9600.17056 (winblue_gdr.140319-1520)
DETOURED_IMAGE: 1
MODLIST_WITH_TSCHKSUM_HASH: fb123f85e82dc66a6aaa47baaf54a8d6c688d06a
MODLIST_SHA1_HASH: 5d338567fc7fda41c0d3a856681d67a8a2273337
NTGLOBALFLAG: 0
PROCESS_BAM_CURRENT_THROTTLED: 0
PROCESS_BAM_PREVIOUS_THROTTLED: 0
APPLICATION_VERIFIER_FLAGS: 0
CHKIMG_EXTENSION: !chkimg -lo 50 -d !KERNELBASE
760ed598-760ed59c 5 bytes - KERNELBASE!GetModuleHandleW
[ 8b ff 55 8b ec:e9 20 3c 57 eb ]
760edbce-760edbd2 5 bytes - KERNELBASE!GetModuleHandleExW (+0x636)
[ 8b ff 55 8b ec:e9 41 34 57 eb ]
760f1bc6-760f1bca 5 bytes - KERNELBASE!RegCloseKey (+0x3ff8)
[ 8b ff 55 8b ec:e9 84 47 94 8a ]
760f213e-760f2142 5 bytes - KERNELBASE!RegQueryValueExW (+0x578)
[ 8b ff 55 8b ec:e9 8c 42 94 8a ]
760f2441-760f2445 5 bytes - KERNELBASE!RegOpenKeyExW (+0x303)
[ 8b ff 55 8b ec:e9 2d 3a 94 8a ]
760f30c6-760f30ca 5 bytes - KERNELBASE!FreeLibrary (+0xc85)
[ 8b ff 55 8b ec:e9 9c e1 56 eb ]
760f30f4-760f30f8 5 bytes - KERNELBASE!LoadLibraryExW (+0x2e)
[ 8b ff 55 8b ec:e9 52 e4 56 eb ]
760f772e-760f7732 5 bytes - KERNELBASE!RegOpenKeyExA (+0x463a)
[ 8b ff 55 8b ec:e9 e8 f1 93 8a ]
760f7e99-760f7e9d 5 bytes - KERNELBASE!RegQueryValueExA (+0x76b)
[ 8b ff 55 8b ec:e9 1c ea 93 8a ]
760fa5b1-760fa5b5 5 bytes - KERNELBASE!RegCreateKeyExW (+0x2718)
[ 8b ff 55 8b ec:e9 45 c4 93 8a ]
760fe5b7-760fe5bb 5 bytes - KERNELBASE!RegCreateKeyExA (+0x4006)
[ 8b ff 55 8b ec:e9 75 e7 93 8a ]
76100273-76100279 7 bytes - KERNELBASE!RegQueryInfoKeyW (+0x1cbc)
[ 6a 48 68 d0 03 10 76:e9 2d 65 93 8a cc cc ]
7610049e-761004a4 7 bytes - KERNELBASE!RegDeleteValueW (+0x22b)
[ 6a 20 68 28 05 10 76:e9 e6 83 93 8a cc cc ]
76100fd0-76100fd6 7 bytes - KERNELBASE!RegEnumValueW (+0xb32)
[ 6a 38 68 f8 10 10 76:e9 7d 58 93 8a cc cc ]
76102ad4-76102ad8 5 bytes - KERNELBASE!RegEnumKeyExA (+0x1b04)
[ 68 58 02 00 00:e9 09 6e 93 8a ]
76106c9a-76106c9e 5 bytes - KERNELBASE!RegEnumKeyExW (+0x41c6)
[ 8b ff 55 8b ec:e9 45 fb 92 8a ]
7610b27e-7610b284 7 bytes - KERNELBASE!RegEnumValueA (+0x45e4)
[ 6a 60 68 08 b5 10 76:e9 21 fe 96 8a cc cc ]
7611f3f4-7611f3fa 7 bytes - KERNELBASE!RegQueryInfoKeyA (+0x14176)
[ 6a 60 68 88 f5 11 76:e9 1b bd 95 8a cc cc ]
7612107a-7612107e 5 bytes - KERNELBASE!RegDeleteKeyExW (+0x1c86)
[ 8b ff 55 8b ec:e9 5f 66 91 8a ]
7612489c-761248a2 7 bytes - KERNELBASE!RegDeleteValueA (+0x3822)
[ 6a 20 68 30 49 12 76:e9 c6 67 95 8a cc cc ]
7617240a-7617240e 5 bytes - KERNELBASE!RegDeleteKeyExA (+0x4db6e)
[ 8b ff 55 8b ec:e9 ed 8b 90 8a ]
117 errors : !KERNELBASE (760ed598-7617240e)
PRODUCT_TYPE: 1
SUITE_MASK: 272
DUMP_FLAGS: 8000c07
DUMP_TYPE: 3
MISSING_CLR_SYMBOL: 0
MANAGED_EXCEPTION_HRESULT: 80131509
ANALYSIS_SESSION_HOST: KOMP
ANALYSIS_SESSION_TIME: 06-11-2017 08:17:44.0657
ANALYSIS_VERSION: 10.0.15063.400 x86fre
MANAGED_CODE: 1
MANAGED_ENGINE_MODULE: clr
MANAGED_ANALYSIS_PROVIDER: SOS
MANAGED_THREAD_ID: c44
MANAGED_EXCEPTION_ADDRESS: 905fdb68
LAST_CONTROL_TRANSFER: from 716f0245 to 760f2f71
THREAD_ATTRIBUTES:
FAULTING_THREAD: ffffffff
THREAD_SHA1_HASH_MOD_FUNC: 8b084063f74c10f14fd5a9c68991db600ea504a6
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: ba90ca3a69862e0865ec0206bec2d1add0fdbffe
ADDITIONAL_DEBUG_TEXT: SOS.DLL is not loaded for managed code. Analysis might be incomplete ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
OS_LOCALE: ENU
PROBLEM_CLASSES:
ID: [0n237]
Type: [CLR_EXCEPTION]
Class: Primary
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [0xc44]
Frame: [0] : KERNELBASE!RaiseException
ID: [0n235]
Type: [@ManagedObjectName]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Omit
Data: Add
String: [System.InvalidOperationException]
PID: [Unspecified]
TID: [Unspecified]
Frame: [0]
ID: [0n203]
Type: [MEMORY_CORRUPTION]
Class: Primary
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [0x988]
TID: [0xc44]
Frame: [Unspecified]
ID: [0n151]
Type: [PATCH]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [0x988]
TID: [0xc44]
Frame: [Unspecified]
ID: [0n234]
Type: [NOSOS]
Class: Addendum
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [Unspecified]
Frame: [0]
BUGCHECK_STR: CLR_EXCEPTION_System.InvalidOperationException_NOSOS_MEMORY_CORRUPTION_PATCH
DEFAULT_BUCKET_ID: CLR_EXCEPTION_System.InvalidOperationException_NOSOS_MEMORY_CORRUPTION_PATCH
PRIMARY_PROBLEM_CLASS: CLR_EXCEPTION
STACK_TEXT:
00000000 00000000 memory_corruption!KERNELBASE+0x0
STACK_COMMAND: !sos.pe 0x905fdb68 ; ** Pseudo Context ** ; kb
THREAD_SHA1_HASH_MOD: 7da7fbec386ce361a40d03d69a994bc4836f03e8
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: memory_corruption!KERNELBASE
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: memory_corruption
DEBUG_FLR_IMAGE_TIMESTAMP: 0
FAILURE_BUCKET_ID: CLR_EXCEPTION_System.InvalidOperationException_NOSOS_MEMORY_CORRUPTION_PATCH_80131509_memory_corruption!KERNELBASE
BUCKET_ID: CLR_EXCEPTION_System.InvalidOperationException_NOSOS_MEMORY_CORRUPTION_PATCH_DETOURED_memory_corruption!KERNELBASE
FAILURE_EXCEPTION_CODE: 80131509
IMAGE_NAME: memory_corruption
FAILURE_IMAGE_NAME: memory_corruption
BUCKET_ID_IMAGE_STR: memory_corruption
FAILURE_MODULE_NAME: memory_corruption
BUCKET_ID_MODULE_STR: memory_corruption
FAILURE_FUNCTION_NAME: KERNELBASE
BUCKET_ID_FUNCTION_STR: KERNELBASE
BUCKET_ID_OFFSET: 0
BUCKET_ID_MODTIMEDATESTAMP: 0
BUCKET_ID_MODCHECKSUM: 0
BUCKET_ID_MODVER_STR: 0.0.0.0
BUCKET_ID_PREFIX_STR: CLR_EXCEPTION_System.InvalidOperationException_NOSOS_
FAILURE_PROBLEM_CLASS: CLR_EXCEPTION
FAILURE_SYMBOL_NAME: memory_corruption!KERNELBASE
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/devenv.exe/15.0.26430.12/59275f23/KERNELBASE.dll/6.3.9600.17278/53eeb460/80131509/00012f71.htm?Retriage=1
TARGET_TIME: 2017-06-10T19:22:40.000Z
OSBUILD: 9600
OSSERVICEPACK: 17056
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
OSPLATFORM_TYPE: x86
OSNAME: Windows 8.1
OSEDITION: Windows 8.1 WinNt SingleUserTS
USER_LCID: 0
OSBUILD_TIMESTAMP: 2014-03-20 00:55:24
BUILDDATESTAMP_STR: 140319-1520
BUILDLAB_STR: winblue_gdr
BUILDOSVER_STR: 6.3.9600.17056
ANALYSIS_SESSION_ELAPSED_TIME: 9ad5
ANALYSIS_SOURCE: UM
FAILURE_ID_HASH_STRING: um:clr_exception_system.invalidoperationexception_nosos_memory_corruption_patch_80131509_memory_corruption!kernelbase
FAILURE_ID_HASH: {1100017e-170d-400c-940f-f475e873df74}
Followup: MachineOwner
---------

GNU inline assembly optimisation

I am trying to write a small library for highly optimised x86-64 bit operation code and am fiddling with inline asm.
While testing, this particular case caught my attention:
unsigned long test = 0;
unsigned long bsr;
// bit test and set 39th bit
__asm__ ("btsq\t%1, %0 " : "+rm" (test) : "rJ" (39) );
// bit scan reverse (get most significant bit id)
__asm__ ("bsrq\t%1, %0" : "=r" (bsr) : "rm" (test) );
printf("test = %lu, bsr = %lu\n", test, bsr);
It compiles and runs fine with both gcc and icc, but when I inspect the assembly I see differences:
gcc -S -fverbose-asm -std=gnu99 -O3
movq $0, -8(%rbp)
## InlineAsm Start
btsq $39, -8(%rbp)
## InlineAsm End
movq -8(%rbp), %rax
movq %rax, -16(%rbp)
## InlineAsm Start
bsrq -16(%rbp), %rdx
## InlineAsm End
movq -8(%rbp), %rsi
leaq L_.str(%rip), %rdi
xorb %al, %al
callq _printf
I am wondering why this is so complicated. I am writing high-performance code in which the number of instructions is critical. In particular, why does gcc make a copy of my variable test before passing it to the second inline asm?
Same code compiled with icc gives far better results:
xorl %esi, %esi # test = 0
movl $.L_2__STRING.0, %edi # has something to do with printf
orl $32832, (%rsp) # part of function initiation
xorl %eax, %eax # has something to do with printf
ldmxcsr (%rsp) # part of function initiation
btsq $39, %rsi #106.0
bsrq %rsi, %rdx #109.0
call printf #111.2
Leaving aside the fact that gcc decides to keep my variables on the stack rather than in registers, what I do not understand is why it makes a copy of test before passing it to the second asm.
If I put test in as an input/output variable in the second asm
__asm__ ("bsrq\t%1, %0" : "=r" (bsr) , "+rm" (test) );
then those lines disappear.
movq $0, -8(%rbp)
## InlineAsm Start
btsq $39, -8(%rbp)
## InlineAsm End
## InlineAsm Start
bsrq -8(%rbp), %rdx
## InlineAsm End
movq -8(%rbp), %rsi
leaq L_.str(%rip), %rdi
xorb %al, %al
callq _printf
Is this a botched gcc optimisation, or am I missing some vital compiler switches? I do have icc for my production system, but if I decide to distribute the source code at some point, it will have to compile with gcc too.
compilers used:
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.1.00)
icc Version 12.0.2
I've tried your example on Linux like this (making it "evil" by forcing a stack location for test through the use of &test in the printf):
#include <stdio.h>
int main(int argc, char **argv)
{
unsigned long test = 0;
unsigned long bsr;
// bit test and set 39th bit
asm ("btsq\t%1, %0 " : "+rm" (test) : "rJ" (39) );
// bit scan reverse (get most significant bit id)
asm ("bsrq\t%1, %0" : "=r" (bsr) : "rm" (test) );
printf("test = %lu, bsr = %lu, &test = %p\n", test, bsr, (void *)&test);
return 0;
}
and compiled it with various versions of gcc -O3 ... to the following results:
code generated gcc version
================================================================================
400630: 48 83 ec 18 sub $0x18,%rsp 4.7.2,
400634: 31 c0 xor %eax,%eax 4.6.2,
400636: bf 50 07 40 00 mov $0x400750,%edi 4.4.6
40063b: 48 8d 4c 24 08 lea 0x8(%rsp),%rcx
400640: 48 0f ba e8 27 bts $0x27,%rax
400645: 48 89 44 24 08 mov %rax,0x8(%rsp)
40064a: 48 89 c6 mov %rax,%rsi
40064d: 48 0f bd d0 bsr %rax,%rdx
400651: 31 c0 xor %eax,%eax
400653: e8 68 fe ff ff callq 4004c0
[ ... ]
---------------------------------------------------------------------------------
4004f0: 48 83 ec 18 sub $0x18,%rsp 4.1
4004f4: 31 c0 xor %eax,%eax
4004f6: bf 28 06 40 00 mov $0x400628,%edi
4004fb: 48 8d 4c 24 10 lea 0x10(%rsp),%rcx
400500: 48 c7 44 24 10 00 00 00 00 movq $0x0,0x10(%rsp)
400509: 48 0f ba e8 27 bts $0x27,%rax
40050e: 48 89 44 24 10 mov %rax,0x10(%rsp)
400513: 48 89 c6 mov %rax,%rsi
400516: 48 0f bd d0 bsr %rax,%rdx
40051a: 31 c0 xor %eax,%eax
40051c: e8 c7 fe ff ff callq 4003e8
[ ... ]
---------------------------------------------------------------------------------
400500: 48 83 ec 08 sub $0x8,%rsp 3.4.5
400504: bf 30 06 40 00 mov $0x400630,%edi
400509: 31 c0 xor %eax,%eax
40050b: 48 c7 04 24 00 00 00 00 movq $0x0,(%rsp)
400513: 48 89 e1 mov %rsp,%rcx
400516: 48 0f ba 2c 24 27 btsq $0x27,(%rsp)
40051c: 48 8b 34 24 mov (%rsp),%rsi
400520: 48 0f bd 14 24 bsr (%rsp),%rdx
400525: e8 fe fe ff ff callq 400428
[ ... ]
---------------------------------------------------------------------------------
4004e0: 48 83 ec 08 sub $0x8,%rsp 3.2.3
4004e4: bf 10 06 40 00 mov $0x400610,%edi
4004e9: 31 c0 xor %eax,%eax
4004eb: 48 c7 04 24 00 00 00 00 movq $0x0,(%rsp)
4004f3: 48 0f ba 2c 24 27 btsq $0x27,(%rsp)
4004f9: 48 8b 34 24 mov (%rsp),%rsi
4004fd: 48 89 e1 mov %rsp,%rcx
400500: 48 0f bd 14 24 bsr (%rsp),%rdx
400505: e8 ee fe ff ff callq 4003f8
[ ... ]
While there's a significant difference in the generated code (including whether the bsr accesses test as a register or as memory), none of the tested revisions recreates the assembly that you've shown. I'd suspect a bug in the 4.2.x version you used on Mac OS X, but then I have neither your test case nor that specific compiler version available.
Edit: The code above is obviously different in the sense that it forces test into the stack; if that is not done, then all "plain" gcc versions I've tested do a direct pair bts $39, %rsi / bsr %rsi, %rdx.
I have found, though, that clang creates different code there:
140: 50 push %rax
141: 48 c7 04 24 00 00 00 00 movq $0x0,(%rsp)
149: 31 f6 xor %esi,%esi
14b: 48 0f ba ee 27 bts $0x27,%rsi
150: 48 89 34 24 mov %rsi,(%rsp)
154: 48 0f bd d6 bsr %rsi,%rdx
158: bf 00 00 00 00 mov $0x0,%edi
15d: 30 c0 xor %al,%al
15f: e8 00 00 00 00 callq <printf@plt>
so the difference does indeed seem to be between the code generators of clang/llvm and "gcc proper".
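Whichever instruction sequence a compiler picks, the two asm statements should produce the same values. A small Python model of the two instructions gives the expected output for checking a build; the function names are hypothetical, and the model ignores the flags that the real instructions set.

```python
def bts(value, bit):
    """Model of btsq: return value with the given bit set (carry flag ignored)."""
    return value | (1 << bit)


def bsr(value):
    """Model of bsrq: index of the most significant set bit.

    The real instruction leaves its destination undefined for a zero
    source, so the model refuses that input.
    """
    if value == 0:
        raise ValueError("bsr result is undefined for 0")
    return value.bit_length() - 1


test = bts(0, 39)
print(test, bsr(test))  # 549755813888 39
```

A build that prints anything other than test = 549755813888 and bsr = 39 would point at a constraint problem rather than a mere difference in code generation.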