How to decode this relocated call? - elf

I am trying to learn a thing or two about assembly language by studying the instructions in some shared objects. I have encountered a construction where a call instruction leads to 1 byte after its beginning, for example (output from hte):
af6fc | e8fcffffff call af6fdh
Clearly the destination address must be replaced by a proper function (which I know in this case is strcmp). I find this strange because in other parts of the same shared object the same strcmp function is called using the .got / .plt mechanism eliminating the need to rewrite parts of .text. In the latter case the destination function can be identified by studying the .rel.plt table along with .dynsym. But how do I find where the immediate address is redirected to in the former? I could not find any occurrence of the addresses af6fc or af6fd in any of the sections, at least not in those made accessible by hte.

You didn't say which platform you are on, but it appears to be i386 (32-bit x86).
On i386, it is possible to link code compiled without -fPIC into a shared library. This produces a library with text relocations, which is suboptimal because the patched pages of .text can no longer be shared between processes.
If you dump the dynamic relocations with objdump -R foo.so, you should see a relocation against address 0xaf6fd. After loading foo.so, the dynamic linker will update the 4 bytes at 0xaf6fd to point to wherever the relocation tells it to.
"in other parts of the same shared object the same strcmp function is called using the .got / .plt mechanism"
These calls come from objects that were (properly) compiled with -fPIC.
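To see why the unrelocated call appears to point one byte past its own start, it helps to decode the displacement by hand. The sketch below is only an illustration: it assumes the relocation is of type R_386_PC32 (what a direct call to an external function typically produces on i386), and the load base and strcmp address are made-up values. The helper names are hypothetical.

```python
import struct

def decode_call_rel32(address, raw):
    """Decode a 5-byte x86 `call rel32` (opcode E8); return (rel32, target)."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    (rel32,) = struct.unpack("<i", raw[1:])   # signed little-endian displacement
    next_ip = address + len(raw)              # displacement is relative to the next instruction
    return rel32, (next_ip + rel32) & 0xFFFFFFFF

def r_386_pc32_value(symbol, addend, place):
    """Value an R_386_PC32 relocation stores in the field: S + A - P (mod 2**32)."""
    return (symbol + addend - place) & 0xFFFFFFFF

# The unrelocated bytes from the question.
rel32, target = decode_call_rel32(0xAF6FC, bytes.fromhex("e8fcffffff"))
print(rel32, hex(target))  # -4 0xaf6fd

# Hypothetical run-time addresses, for illustration only.
LOAD_BASE = 0x40000000
STRCMP = 0x4A001230
place = LOAD_BASE + 0xAF6FD                   # address of the 4-byte field being patched
patched = bytes([0xE8]) + struct.pack("<I", r_386_pc32_value(STRCMP, rel32, place))
_, patched_target = decode_call_rel32(LOAD_BASE + 0xAF6FC, patched)
print(hex(patched_target))  # 0x4a001230
```

The stored -4 is the relocation addend: it compensates for the fact that the CPU measures the displacement from the end of the 4-byte field, while the relocation is computed from its start. That is why the disassembler, which knows nothing about the relocation, shows a target of af6fd.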

How one refers to an external global variable in Squeak FFI

For interfacing HDF5, I need to get a reference to/value of an external variable exported by the library, and pass it to other external functions.
For example, there is a variable representing the native double type: H5T_NATIVE_DOUBLE_g.
In VisualWorks (DLLCC), this is achievable through a declarative annotation:
H5T_NATIVE_DOUBLE_g
<C: hid_t H5T_NATIVE_DOUBLE_g>
Is there anything similar in Squeak FFI? Is there any support for such use case?
After some inquiry, there seems to be at least basic support in the (Threaded)FFIPlugin:
ExternalAddress class>>loadSymbol: moduleSymbol module: module
<primitive: 'primitiveLoadSymbolFromModule' module: 'SqueakFFIPrims'>
^ self primitiveFailed
So we might create an ExternalData, get its address via the above message (passing the global variable name and a given ExternalLibrary as the module argument), and specify its type.
We can then use this ExternalData to pass the address if the external function expects a pointer.
To pass the value, one needs to dereference the address; I am not sure whether the plugin automates that.
In my case, I know I will have to pass the value, so I may directly dereference the address to get the value and store it. This assumes that the global variable is assigned once at initialization and won't change afterward, and that initialization has already occurred at library load time: lots of application-specific assumptions.

Distinguish two functions with the same name

I want to use multiple external CMake files in my project. Unfortunately two different files use the same CMake function name foo. I don't want to modify these external files.
Is there a way to call one specific function or will CMake error out? Would it help if one of the functions has a named parameter, i.e., foo(a b c …) and foo(DESTINATION a b c …)?
A new function definition replaces the previous one with the same name, so access to the previous function is lost.
If the different functions (with the same name) are used in different subprojects, you may try to build one subproject as an ExternalProject, so that the functions do not collide.
In CMake, the only information a function definition gives the caller is the minimal number of parameters that must be passed to it. That information alone would not be enough to resolve overloads, even if overloading were implemented.

ABAP type pool: program with type code TYPP but with name longer than five characters

We are writing a tool in Java that parses and transforms ABAP code. We therefore have no intention to write new ABAP code but our tool has to handle all of ABAP, even obsolete statements. Furthermore, I'm not an ABAP expert.
ABAP programs can use type groups, introduced by the keyword TYPE-POOL. Names of type groups have a maximum length of five characters (internally eight, if you count the prefix "%_C"), and their type code is TYPP. In the past, relying on these assumptions worked well for us.
Recently, we see ABAP programs with type code TYPP but with name longer than 5, e.g., 'OIA===========================P'. Furthermore, for each of those, there is another, empty object with same name but type code INCL. These new objects are referenced only if a regular type group is, too.
These new objects may be internal ones and irrelevant for us - I haven't seen any reference to them in the ABAP Keyword Documentation. On the other hand, they are confusing us because we see them.
Can someone explain to me the meaning of these objects and point me to some documentation?
Edit: Here are examples from an EHP7 for SAP ERP 6.0 system.
An example object. Entries in D010INC look fine:
The same object now using type pool mrm. Where do the additional includes come from?
These objects are introduced through inclusions, extensions and switched objects. To read along:
Check type pool MRM, type mrm_idoc_data_ers - that type contains a statement to include rmrm_idoc_data_ers_sbo. A similar include statement pulls rmrm_upd_arseg_nfm into mrm_upd_arseg. That explains the last two lines. Your parser should have caught that.
RMRM_IDOC_DATA_ERS_SBO contains an enhancement point named RMRM_IDOC_DATA_ERS_SBO_02 that belongs to an enhancement spot ES_RMRM_IDOC_DATA_ERS_SBO. Similarly, RMRM_UPD_ARSEG_NFM contains an enhancement point RMRM_UPD_ARSEG_NFM_01 that belongs to the enhancement spot ES_RMRM_UPD_ARSEG_NFM.
For ES_RMRM_IDOC_DATA_ERS_SBO, an enhancement implementation named ISAUTO_MRM_RMRM_IDOC_DATA_ERS exists. For ES_RMRM_UPD_ARSEG_NFM, an implementation named /NFM/MM_RMRM_UPD_ARSEG_NFM exists. That explains the references ending with =E.
The implementation ISAUTO_MRM_RMRM_IDOC_DATA_ERS is located in the package ISAUTO_MRM. The implementation /NFM/MM_RMRM_UPD_ARSEG_NFM is located in the package /NFM/MM. That explains the references ending with =P. Obviously, these references are not generated for every package:
The package ISAUTO_MRM is controlled by the switch AM_ERS, the package /NFM/MM is controlled by the switch /NFM/MM. That explains the references ending in =S.
Ultimately, these references can be used to determine which programs need to be re-generated when the state of a switch is changed.
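For a tool that encounters such names, the following is a minimal sketch of how they might be split apart. It assumes the layout seen in the question's example (a base name, a run of "=" padding, and a one-letter suffix) and takes the suffix meanings from the explanation above. The function name and the mapping table are hypothetical, not part of any SAP interface.

```python
import re

# Suffix meanings as described in the answer above; anything else is reported as "unknown".
SUFFIX_KINDS = {
    "E": "enhancement implementation",
    "P": "package",
    "S": "switch",
}

# Base name, then one or more '=' padding characters, then a single suffix letter.
_PADDED_NAME = re.compile(r"^(?P<base>.*?[^=])=+(?P<suffix>[A-Z])$")

def split_padded_name(name):
    """Return (base_name, kind); kind is None for names without '=' padding."""
    match = _PADDED_NAME.match(name)
    if match is None:
        return name, None
    return match.group("base"), SUFFIX_KINDS.get(match.group("suffix"), "unknown")

print(split_padded_name("OIA" + "=" * 27 + "P"))  # ('OIA', 'package')
print(split_padded_name("MRM"))                   # ('MRM', None)
```

A name that fills the whole field and therefore carries no padding would not be recognised by this pattern; whether such names occur is something to check against a real system.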

In an ELF file, how to find the compile unit in which a variable is defined?

I'm writing a tool that adds variables to an A2L file; its input is an ELF file.
To find which compile unit (CU) contains a variable, I currently have to search through all CUs until I find the variable.
Because the software is very big, finding a variable takes a long time.
Is there a faster way to find out which CU a variable is defined in?
The DWARF standard includes an optional section, .debug_pubnames, that provides name-to-offset translation for global objects and functions.
Another approach is to use the symbol table. If it has an entry for the variable then you can use its address with the optional .debug_aranges section or, failing that, read every DW_TAG_compile_unit looking for the one with the enclosing address range.
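The address-based approach boils down to an interval lookup. Below is a minimal sketch, assuming the (start address, length, CU offset) tuples have already been read from .debug_aranges and do not overlap. The class name and the sample data are hypothetical. Whether variable addresses are covered at all depends on the toolchain, since some producers only emit ranges for code.

```python
import bisect

class ArangeIndex:
    """Maps an address to the offset of the compile unit whose range contains it."""

    def __init__(self, entries):
        # entries: iterable of (start_address, length, cu_offset), assumed non-overlapping
        self._entries = sorted(entries)
        self._starts = [start for start, _, _ in self._entries]

    def cu_offset_at(self, address):
        # Find the last range that starts at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, length, cu_offset = self._entries[i]
        return cu_offset if address < start + length else None

# Made-up ranges for illustration only.
index = ArangeIndex([
    (0x20000000, 0x100, 0x0000),
    (0x20000100, 0x040, 0x1A2B),
    (0x20000200, 0x080, 0x3C4D),
])

print(index.cu_offset_at(0x20000120))  # 6699 (0x1A2B)
print(index.cu_offset_at(0x20000180))  # None: the address falls in a gap
```

Building the index once and answering each query in logarithmic time avoids re-walking every CU per variable; the fallback of scanning each DW_TAG_compile_unit is only needed for addresses that return None.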

Semantics of GCC hot attribute

Assume I have a compilation unit consisting of three functions, A, B, and C. A is invoked once from a function external to the compilation unit (e.g. it's an entry point or callback); B is invoked many times by A (e.g. it's invoked in a tight loop); C is invoked once by each invocation of B (e.g. it's a library function).
The entire path through A (passing through B and C) is performance-critical, though the performance of A itself is non-critical (as most time is spent in B and C).
What is the minimal set of functions which one should annotate with __attribute__ ((hot)) to effect more aggressive optimization of this path? Assume we cannot use -fprofile-generate.
Equivalently: Does __attribute__ ((hot)) mean "optimize the body of this function", "optimize calls to this function", "optimize all descendant calls this function makes", or some combination thereof?
The GCC info page does not clearly address these questions.
Official documentation:
hot
The hot attribute on a function is used to inform the compiler that the function is a hot spot of the compiled program. The function is optimized more aggressively and on many targets it is placed into a special subsection of the text section so all hot functions appear close together, improving locality.
When profile feedback is available, via -fprofile-use, hot functions are automatically detected and this attribute is ignored.
The hot attribute on functions is not implemented in GCC versions earlier than 4.3.
The hot attribute on a label is used to inform the compiler that the path following the label is more likely than paths that are not so annotated. This attribute is used in cases where __builtin_expect cannot be used, for instance with computed goto or asm goto.
The hot attribute on labels is not implemented in GCC versions earlier than 4.8.
2007:
__attribute__((hot))
Hint that the marked function is "hot" and should be optimized more aggressively and/or placed near other "hot" functions (for cache locality).
Gilad Ben-Yossef:
As their name suggests, these function attributes are used to hint the compiler that the corresponding functions are called often in your code (hot) or seldom called (cold).
The compiler can then order the code in branches, such as if statements, to favour branches that call these hot functions and disfavour cold functions, under the assumption that it is more likely that the branch that will be taken will call a hot function and less likely to call a cold one.
In addition, the compiler can choose to group together functions marked as hot in a special section in the generated binary, on the premise that since data and instruction caches work based on locality, or the relative distance of related code and data, putting all the often called functions together will result in better caching of their code for the entire application.
Good candidates for the hot attribute are core functions which are called very often in your code base. Good candidates for the cold attribute are internal error handling functions which are called only in case of errors.
So, according to these sources, __attribute__ ((hot)) means:
optimize calls to this function
optimize the body of this function
put body of this function to .hot section (to group all hot code in one location)
After analysing the source code, we can say that the "hot" attribute is checked with lookup_attribute ("hot", DECL_ATTRIBUTES (current_function_decl)); when it is present, the function's node->frequency is set to NODE_FREQUENCY_HOT (predict.c, compute_function_frequency()).
If the function's frequency is NODE_FREQUENCY_HOT, two things follow:
If there is no profile information and no likely/unlikely on branches, maybe_hot_frequency_p will return true for the function (== "...frequency FREQ is considered to be hot."). This makes maybe_hot_bb_p true for all basic blocks (BB) in the function ("BB can be CPU intensive and should be optimized for maximal performance.") and maybe_hot_edge_p true for all edges in the function. In turn, outside -Os mode, these BBs and edges, and also loops, will be optimized for speed, not for size.
For all outbound call edges from this function, cgraph_maybe_hot_edge_p will return true ("Return true if the call can be hot."). This flag is used in IPA (ipa-inline.c, ipa-cp.c, ipa-inline-analysis.c) and influences inlining and cloning decisions.