String table in ELF

I have a symbol and (a hexdump of) an ELF file. How can I tell which section the symbol appears in?
What is the difference between .strtab and .shstrtab? Is there another array of symbol strings?
When I get an index into the symbol names table, is it an index into .strtab or into .shstrtab?

For the first question, we would need to see the hexdump of the ELF file to answer properly.
For the second question -
.strtab stands for String Table.
.shstrtab stands for Section Header String Table.
The ELF header (Elf32_Ehdr/Elf64_Ehdr) contains a member called e_shstrndx. This is the index, in the section header table, of the section header that describes .shstrtab. Every section header in turn has an sh_name member, which is an offset into .shstrtab where that section's name is stored.
.strtab is the string table for all other references. When you read symbols from an ELF object, every symbol structure (Elf32_Sym) has a member called st_name. This is an offset into .strtab where the string name of that symbol is stored.
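A minimal sketch of both lookups in C, assuming a 64-bit ELF file already read into memory (error checking omitted; the function names are invented for this example):
#include <elf.h>

/* Name of section number ndx, resolved via e_shstrndx -> .shstrtab.
   base points at the start of the ELF image in memory. */
static const char *section_name(const unsigned char *base, unsigned ndx)
{
    const Elf64_Ehdr *ehdr  = (const Elf64_Ehdr *)base;
    const Elf64_Shdr *shdrs = (const Elf64_Shdr *)(base + ehdr->e_shoff);
    const char *shstrtab = (const char *)(base + shdrs[ehdr->e_shstrndx].sh_offset);
    return shstrtab + shdrs[ndx].sh_name;
}

/* Symbol names work the same way, but via the string table that the
   symbol table's sh_link field points at (.strtab for .symtab). */
static const char *symbol_name(const unsigned char *base,
                               const Elf64_Shdr *symtab_shdr,
                               const Elf64_Sym *sym)
{
    const Elf64_Shdr *shdrs =
        (const Elf64_Shdr *)(base + ((const Elf64_Ehdr *)base)->e_shoff);
    const char *strtab = (const char *)(base + shdrs[symtab_shdr->sh_link].sh_offset);
    return strtab + sym->st_name;
}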
Can you please elaborate more on array of symbol strings? Also, what do you mean by names table?
You can refer to the following link -
Reading ELF String Table on Linux from C
Hope this answers your question.

I will take a stab at the first question since Samir answered the second one so well.
The symbol's name will be in one of the STRTAB sections, and there will be an entry in the symbol table (one of the SYMTAB or DYNSYM sections) which references that string by its offset into the containing string section. The entry in the symbol table can tell you the index of the section the symbol is defined in, but not where it is used.
For that you need to check the relocation tables, contained in sections of type REL or RELA; common names include .rel.dyn and .rel.plt. A relocation table lists all the references to symbols in one other section, i.e. code and relocation sections are paired. Each entry in the table is one "usage" of a symbol; it contains the offset within the corresponding section where the usage occurs, and the index of the symbol in the symbol table.
If you can use the readelf utility, you can easily use readelf -r <binary> | grep <symbol name> to get all the references to a symbol.
If you are set on using hexedit/cannot use readelf, then you would need to:
1. Find the offset of the symbol name string in the binary, determine which string section it is in, and compute the offset of that string within that section.
2. Look through all the entries in the symbol table and find which one(s) match that name (st_name == offset of the string within the string section).
3. Look through all entries in each relocation table to find usages of that symbol in the corresponding code section for that table. The r_info field of each entry contains the index of the symbol table entry it references (this index is packed into part of r_info, at different bit positions for 32- and 64-bit; see the sketch below).
All relocation entries matching that symbol table index are usages of your symbol somewhere.
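As a sketch of step 3, here is how the symbol index is extracted from r_info using the macros from <elf.h>, assuming a 64-bit binary whose RELA section is already in memory (the function and variable names are mine, not from any tool above):
#include <elf.h>
#include <stdio.h>

/* rela: start of a .rela.* section's data; count = sh_size / sh_entsize.
   Prints the offset of every relocation that refers to symbol 'wanted'. */
static void list_usages(const Elf64_Rela *rela, size_t count, size_t wanted)
{
    for (size_t i = 0; i < count; i++) {
        /* ELF64_R_SYM keeps the high 32 bits of r_info;
           on 32-bit targets ELF32_R_SYM shifts by 8 instead. */
        if (ELF64_R_SYM(rela[i].r_info) == wanted)
            printf("usage at offset 0x%llx\n",
                   (unsigned long long)rela[i].r_offset);
    }
}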
More info:
Symbol table: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-79797.html
Relocation table: https://docs.oracle.com/cd/E19683-01/816-1386/6m7qcoblj/index.html#chapter6-54839


ERROR: extra data after last expected column on PostgreSQL while the number of columns is the same

I am new to PostgreSQL and I need to import a set of csv files, but some of them weren't imported successfully; I got the same error with each of these files: ERROR: extra data after last expected column. I have investigated this error and learned that it usually occurs when the number of columns in the table is not equal to the number of columns in the file. But I don't think I am in that situation.
For example, I create this table:
CREATE TABLE cast_info (
id integer NOT NULL PRIMARY KEY,
person_id integer NOT NULL,
movie_id integer NOT NULL,
person_role_id integer,
note character varying,
nr_order integer,
role_id integer NOT NULL
);
And then I want to copy the csv file:
COPY cast_info FROM '/private/tmp/cast_info.csv' WITH CSV HEADER;
Then I got the error:
ERROR: extra data after last expected column
CONTEXT: COPY cast_info, line 8801: "612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions..."
The complete row in this csv file is as follows:
612,207,2222077,1,"(segments \"Homies\" - \"Tilt A Whirl\" - \"We don't die\" - \"Halls of Illusions\" - \"Chicken Huntin\" - \"Another love song\" - \"How many times?\" - \"Bowling balls\" - \"The people\" - \"Piggy pie\" - \"Hokus pokus\" - \"Let\"s go all the way\" - \"Real underground baby\")/Full Clip (segments \"Duk da fuk down\" - \"Real underground baby\")/Guy Gorfey (segment \"Raw deal\")/Sugar Bear (segment \"Real underground baby\")",2,1
You can see that there are exactly 7 columns, just as the table has.
The strange thing is that the error lines in all these files contain backslash-escaped quotation marks (\"). However, these are not the only rows containing \" in the files, and I wonder why the error doesn't appear in the other rows. Because of that, I am not sure whether this is really the problem.
After modifying these rows (e.g. replacing the \" or deleting the content while keeping the commas), there is a new error, ERROR: invalid input syntax, on line 2 of every file. The error claims that three semicolons (;;;) have been appended to the data in the last column of those rows, for no apparent reason. But when I open these csv files, I can't see the three semicolons in those rows.
For example, after deleting the content in the fifth column of this row:
612,207,2222077,1,,2,1
I got the error:
ERROR: invalid input syntax for type integer: "1;;;"
CONTEXT: COPY cast_info, line 2, column role_id: "1;;;"
while line 2 doesn't contain three semicolons at all:
2,2,2163857,1,,25,1
In principle, I hope the problem can be solved without any modification to the data itself. Thank you for your patience and help!
The CSV format protects quotation marks by doubling them, not by backslash-escaping them (e.g. "a ""quoted"" word", not "a \"quoted\" word"). You could use the text format instead, except that it doesn't support HEADER, and it would also not remove the outer quote marks. You could instead tweak the files on the fly with a program:
COPY cast_info FROM PROGRAM 'sed ''s/\\"/""/g'' /private/tmp/cast_info.csv' WITH CSV HEADER;
This works with the one example you gave, but might not work for all cases.
ERROR: invalid input syntax for line 2 of every file. And the errors occur because the data in the last column of these rows have been added three semicolons(;;;) for no reason. But when I open these csv files, I can't see the three semicolons in those rows
How are you editing and viewing these files? Sounds like you are using something that isn't very good at preserving formatting, like Excel.
Try actually naming the columns you want processed in the copy statement:
copy cast_info (id, person_id, movie_id, person_role_id, note, nr_order, role_id) from ...
Following a friend's suggestion, I needed to specify the backslash as the escape character:
copy <table_name> from '<csv_file_path>' csv escape '\';
and then the problem is solved.

Bad $FILE_NAME entries in $MFT on NTFS disk

I have some code which is parsing the $MFT on an NTFS disk.
All works perfectly, except that a handful of records (roughly 10 out of 60000) return incorrect characters in the file name. See the screenshot below:
Note the Unicode character defined by byte '0E'. In all other applications, this is an underscore character. See below:
Even in the $INDEX_ROOT attribute of the containing directory, it has the correct name:
Am I reading the $FILE_NAME attribute wrong? Or should I ignore what's there and always use the name from the $INDEX_ROOT attribute of the directory instead? This seems a bit backwards?
Note: it isn't always '0E', and it isn't always this file name, but there always seems to be exactly one wrong character in each 'bad' record.
For anyone in the future, I stumbled across the answer while reading this link:
The fixup array starts at offset 0x30. The first two bytes (0x 8c 06) are the last two bytes in every sector of the record. The real last couple of bytes in all the sectors are stored in the fixup array that follows, namely all zeroes.
Note that your values will be different, but you'll find that the 'bad' file names occur whenever the $FILE_NAME attribute spans a sector boundary (as in the above screenshots from WinHex). Once the end-of-sector bytes are replaced with the corresponding fixup bytes, the filenames come out correct.
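For reference, a hedged sketch of applying the update sequence (fixup) array to a raw MFT record in C, assuming a little-endian host and a record that starts with the standard header, where offsets 0x04/0x06 hold the position and length of the array (apply_fixups is a made-up name):
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }

/* rec: raw MFT record (typically 1024 bytes); sector_size is usually 512.
   Entry 0 of the array is the update sequence number (USN) that was
   stamped over the last two bytes of every sector; entries 1..n hold
   the original bytes, which we copy back. Returns -1 on a USN mismatch. */
static int apply_fixups(uint8_t *rec, size_t rec_size, size_t sector_size)
{
    uint16_t usa_ofs   = rd16(rec + 0x04);  /* offset of update sequence array */
    uint16_t usa_count = rd16(rec + 0x06);  /* 1 + number of sectors */
    uint16_t usn       = rd16(rec + usa_ofs);

    for (uint16_t i = 1; i < usa_count; i++) {
        if (i * sector_size > rec_size) return -1;
        uint8_t *sector_end = rec + i * sector_size - 2;
        if (rd16(sector_end) != usn) return -1;       /* torn write */
        memcpy(sector_end, rec + usa_ofs + 2 * i, 2); /* restore real bytes */
    }
    return 0;
}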

Locating ELF shared library exports at runtime

Is it possible to extract the exported symbols of a loaded shared library using only its memory image?
I'm talking about the symbols listed in the .dynsym section. As I understand it, we can go this way:
1. Locate the base address of the library. For example, by reading /proc/<pid>/maps it is possible to find the memory areas mapped from the library on disk, and then we can look for the ELF magic bytes to find the ELF header, which gives us the base address.
2. Find the PT_DYNAMIC segment from the program headers. Parse the ELF header, then iterate over the program headers to find the segment which contains the .dynamic section.
3. Extract the location of the dynamic symbol table. Iterate over the ElfN_Dyn structs to find the ones with d_tag DT_STRTAB and DT_SYMTAB. These give us the addresses of the string table (holding the symbol names) and the dynamic symbol table itself.
And this is where I stumbled. The .dynamic section has a tag for the size of the string table (DT_STRSZ), but there is no indication of the symbol table's size; there is only the size of a single entry (DT_SYMENT). How can I retrieve the number of symbol entries in the table?
It should be possible to infer that from the size of the .dynsym section, but ELF files are represented as segments in memory: the section header table is not required to be loaded and can only be (reliably) accessed by reading the corresponding file.
I believe it must be possible, because the dynamic linker has to know the size of the symbol table. However, the dynamic loader may have stored it somewhere when the file was loaded, and the linker may just be using that cached value. Though it seems somewhat odd to load the symbol table into memory but not the handful of bytes holding its size.
The size of the dynamic symbol table must be inferred from the symbol hash table (DT_HASH or DT_GNU_HASH); this answer gives some code which does that.
The standard hash table (which is no longer used on GNU systems by default) is quite simple: its second word is nchain, and the ELF specification states that "the number of symbol table entries should equal nchain".
The GNU hash table is more complicated: the count has to be recovered by taking the highest symbol index found in the buckets and walking that symbol's hash chain to its end.
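A sketch of both counts in C, along the lines of the code in the linked answer (the function names are mine; the pointers are the in-memory addresses found via DT_HASH/DT_GNU_HASH):
#include <stdint.h>

/* DT_HASH layout: [0] nbucket, [1] nchain, then the buckets and chains.
   nchain equals the number of entries in the dynamic symbol table. */
static uint32_t count_from_hash(const uint32_t *hash)
{
    return hash[1];
}

/* DT_GNU_HASH: find the highest symbol index referenced by any bucket,
   then follow its chain until the entry with the low bit set (chain end). */
static uint32_t count_from_gnu_hash(const uint32_t *gnu, int is_elf64)
{
    uint32_t nbuckets   = gnu[0];
    uint32_t symoffset  = gnu[1];   /* index of the first hashed symbol */
    uint32_t bloom_size = gnu[2];   /* in ELFCLASS-sized words */
    const uint32_t *buckets = gnu + 4 + (is_elf64 ? 2 : 1) * bloom_size;
    const uint32_t *chain   = buckets + nbuckets;

    uint32_t max = 0;
    for (uint32_t i = 0; i < nbuckets; i++)
        if (buckets[i] > max)
            max = buckets[i];
    if (max < symoffset)
        return symoffset;           /* no hashed symbols at all */
    while ((chain[max - symoffset] & 1) == 0)
        max++;
    return max + 1;
}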

Internal structure of PDF file: decode params

What do the following decoding parameters mean?
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<4DC888EB77E2D649AEBD54CA55A09C54><227DCAC2C364E84A9778262D41602AD4>]/Info 37 0 R/Length 69/Root 39 0 R/Size 38/Type/XRef/W[1 2 1]>>
I know that /Filter /FlateDecode names the filter which was used to compress the stream. But what are ID, Info, Length, Root, Size? Are these parameters related to compression/decompression?
Please consult ISO-32000-1:
You are showing the dictionary of a compressed cross-reference stream (/Type /XRef); your PDF is at least a PDF 1.5 file, since it has a compressed xref table:
7.5.8 Cross-Reference Streams
Cross-reference streams are stream objects, and contain a dictionary and a data stream.
FlateDecode: the filter with which the stream is compressed.
Length: the number of bytes in the encoded stream.
DecodeParms: parameters for the filter, describing how the stream data were transformed before compression (more on /Columns and /Predictor below).
A Cross-reference stream has some typical dictionary entries:
W: An array of integers representing the size of the fields in a single cross-reference entry. In your case [1 2 1]: a 1-byte type field, a 2-byte second field, and a 1-byte third field, i.e. 4 bytes per entry.
Size: The number one greater than the highest object number used in this section or in any section for which this shall be an update. It shall be equivalent to the Size entry in a trailer dictionary.
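As an aside on how those parameters interact: /Columns 4 matches the W [1 2 1] entry width (1+2+1 = 4 bytes), and /Predictor 12 means every row of the inflated data carries a leading PNG filter-type byte (in practice type 2, the 'Up' filter). A rough sketch of undoing it after FlateDecode, with invented names and the common assumption that only the Up filter is used:
#include <stdint.h>
#include <stddef.h>

/* data: inflated stream, rows of (columns + 1) bytes, byte 0 of each
   row being the PNG filter type. out receives rows * columns bytes of
   plain cross-reference entries. Assumes columns <= 32 and filter 2. */
static void undo_up_predictor(const uint8_t *data, size_t rows,
                              size_t columns, uint8_t *out)
{
    uint8_t prev[32] = {0};  /* previous decoded row, starts as zeros */
    for (size_t r = 0; r < rows; r++) {
        const uint8_t *row = data + r * (columns + 1) + 1; /* skip filter byte */
        for (size_t c = 0; c < columns; c++)
            out[r * columns + c] = prev[c] = (uint8_t)(row[c] + prev[c]);
    }
}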
I also see some entries that normally belong in the trailer of a PDF file; with a cross-reference stream, the stream dictionary doubles as the trailer dictionary, and /Root is its reference to the document catalog:
14.4 File Identifiers
File identifiers shall be defined by the optional ID entry in a PDF file’s trailer dictionary. The ID entry is optional but should be used. The value of this entry shall be an array of two byte strings. The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value.
14.3.3 Document Information Dictionary
What you see is a reference to another indirect object: a dictionary called the Info dictionary:
The optional Info entry in the trailer of a PDF file shall hold a document information dictionary containing metadata for the document.
Note: this question isn't really suited for StackOverflow. StackOverflow is a forum where you can post programming problems. Your question isn't a programming problem. You are merely asking us to copy/paste quotes from ISO-32000-1.

Returning Unicode Name With Code Point

I know how to return a Unicode character from a code point. That's not what I'm after. What I want to know is how to return the name associated with a particular code point. For example, the code point for 🍀 is U+1F340, and its name is FOUR LEAF CLOVER. Is it possible to return this name given its code point? I've read about 100 topics involving Unicode, but I haven't seen one discussing my question. I hope it's possible.
Thank you for your help.
Have you considered the ICU library? It offers the following C API: http://icu-project.org/apiref/icu4c/uchar_8h.html#aa488f2a373998c7decb0ecd3e3552079
int32_t u_charName(
UChar32 code,
UCharNameChoice nameChoice,
char* buffer,
int32_t bufferLength,
UErrorCode* pErrorCode)
Retrieve the name of a Unicode character.
Depending on nameChoice, the character name written into the buffer is the "modern" name or the name that was defined in Unicode version 1.0. The name contains only "invariant" characters like A-Z, 0-9, space, and '-'. Unicode 1.0 names are only retrieved if they are different from the modern names and if the data file contains the data for them. gennames may or may not be called with a command line option to include 1.0 names in unames.dat.
Parameters
code The character (code point) for which to get the name. It must be 0<=code<=0x10ffff.
nameChoice Selector for which name to get.
buffer Destination address for copying the name. The name will always be zero-terminated. If there is no name, then the buffer will be set to the empty string.
bufferLength ==sizeof(buffer)
pErrorCode Pointer to a UErrorCode variable; check for U_SUCCESS() after u_charName() returns.
Returns
The length of the name, or 0 if there is no name for this character. If the bufferLength is less than or equal to the length, then the buffer contains the truncated name and the returned length indicates the full length of the name. The length does not include the zero-termination.
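A minimal usage sketch for the clover example, assuming ICU4C is installed (link with -licuuc; the buffer size is arbitrary):
#include <stdio.h>
#include <unicode/uchar.h>

int main(void)
{
    char name[128];
    UErrorCode status = U_ZERO_ERROR;
    int32_t len = u_charName(0x1F340, U_UNICODE_CHAR_NAME,
                             name, sizeof name, &status);
    if (U_SUCCESS(status) && len > 0)
        printf("%s\n", name);  /* prints: FOUR LEAF CLOVER */
    return 0;
}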
ICU is the right approach, but it's even simpler than Chris said. Core Foundation includes ICU already, for various text processing functions, including CFStringTransform(). Its transform parameter accepts "any valid ICU transform ID defined in the ICU User Guide for Transforms".
One of ICU's transforms is Any-Name:
Converts between characters and their Unicode names in curly braces. For example:
., ⇆ {FULL STOP}{COMMA}
(The syntax isn't exactly as documented, but it's close enough you can figure it out.)
There's also an Any-Hex transform which can be used for translating to/from the codepoint hex value.
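For completeness, a hedged Core Foundation sketch in C for Apple platforms (compile with -framework CoreFoundation; as noted above, the exact output syntax may differ slightly from the documentation):
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

int main(void)
{
    /* U+1F340 as a UTF-16 surrogate pair */
    UniChar clover[2] = { 0xD83C, 0xDF40 };
    CFStringRef tmp =
        CFStringCreateWithCharacters(kCFAllocatorDefault, clover, 2);
    CFMutableStringRef s =
        CFStringCreateMutableCopy(kCFAllocatorDefault, 0, tmp);
    CFRelease(tmp);

    /* Apply the ICU Any-Name transform in place */
    if (CFStringTransform(s, NULL, CFSTR("Any-Name"), false)) {
        char buf[256];
        if (CFStringGetCString(s, buf, sizeof buf, kCFStringEncodingUTF8))
            printf("%s\n", buf);  /* e.g. \N{FOUR LEAF CLOVER} */
    }
    CFRelease(s);
    return 0;
}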