OMF (Object Module Format) length field appears incorrect

I am a little confused by the PUBDEF record in the OMF object format.
My assembler has generated output in which the record length is 4000 bytes, when it clearly is not. Why would it do this?
[Image: hex view of the OMF file]
The 0xa0 and 0x0f bytes are the record length in little-endian format;
please see the specification: http://pierrelib.pagesperso-orange.fr/exec_formats/OMF_v1.1.pdf
It also appears to state that the strings are zero bytes in length, and at one point there is even a zero string length with no string provided. Maybe I am reading the file wrong? I have spent hours on this and am struggling.
Can anyone help me with this? I am writing a linker and cannot continue without understanding it.
Thanks

There is no PUBDEF record in the file. You seem to have miscalculated the previous record size:
0000:80 THEADR
000E:88 COMENT
0032:96 LNAMES
0041:98 SEGDEF
004B:98 SEGDEF
0055:88 COMENT
005C:A0 LEDATA
006E:A0 LEDATA
007B:8A MODEND
Learn to use more sophisticated tools for OMF inspection, such as Tdump.exe or ODU.exe.
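The offsets in the listing follow from the OMF record layout described in the specification: one type byte, a two-byte little-endian length, then that many bytes of content (the last one being the checksum). Below is a minimal Python sketch of such a walk; walk_records and RECORD_NAMES are illustrative names, not part of any existing tool, and only the 16-bit record types seen in this thread are listed.

```python
import struct

# Record type bytes as given in the OMF specification (subset, for illustration).
RECORD_NAMES = {
    0x80: "THEADR", 0x88: "COMENT", 0x8A: "MODEND", 0x90: "PUBDEF",
    0x96: "LNAMES", 0x98: "SEGDEF", 0xA0: "LEDATA",
}

def walk_records(data):
    """Yield (offset, record_type, record_length) for each OMF record in data."""
    offset = 0
    while offset + 3 <= len(data):
        # 1 byte type, then a 2-byte little-endian length.
        rec_type, rec_len = struct.unpack_from("<BH", data, offset)
        if offset + 3 + rec_len > len(data):
            raise ValueError(f"record at {offset:#06x} runs past end of data")
        yield offset, rec_type, rec_len
        # The length counts everything after the length field, checksum included.
        offset += 3 + rec_len

def dump(data):
    """Print one line per record in the same style as the listing above."""
    for offset, rec_type, _ in walk_records(data):
        print(f"{offset:04X}:{rec_type:02X} {RECORD_NAMES.get(rec_type, '?')}")
```

Applied to the record at 0x5C, the bytes a0 0f 00 decode as type 0xA0 (LEDATA) with length 0x000F, so the next record starts at 0x5C + 3 + 15 = 0x6E, which matches the listing. Reading a0 0f as the length only happens if the walk is off by one byte.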


What is SBLineEntry.GetColumn()?

SBLineEntry is a proxy object in the LLDB Python interface. SBLineEntry.GetColumn() returns a position within a line, but I am not sure what unit it is counted in.
On the C++ side it resolves to the LineEntry.column value, but the source does not say how that is measured either.
At first I thought it was a UTF-8 code unit offset, but when I measure it, it looks like a UTF-16 code unit offset. I still couldn't find any definition for this value.
What is this value?
Raw byte offset in source code file?
UTF-8 code unit offset?
UTF-16 code unit offset?
Something else?
That's a good question! If the debug information is DWARF (which it is on everything except Windows systems), lldb provides the DW_LNS_set_column data from the DWARF line table as the number returned by SBLineEntry::GetColumn(). The DWARF5 specification doesn't say what this integer is counting -- it says only,
The DW_LNS_set_column opcode takes a single unsigned LEB128 operand and stores it in the column register of the state machine.
You're probably seeing that clang puts the UTF-16 code unit offset in the DWARF, but the standard doesn't require that. This would be a reasonable clarification request to file with the DWARF standards committee, http://dwarfstd.org
For the case of Rust programs, I think it's Unicode Scalar value offset.
Here's an open issue about column number. It says span_start function produces the column number.
span_start calls lookup_char_pos, which in turn calls bytepos_to_file_charpos.
They are repeating the word "char", and in Rust, "char" means Unicode Scalar Value.
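To see why the choice of unit matters, here is a small Python sketch that measures the same position in a line three ways. column_in_units is a made-up helper, the sample line is invented, and the offsets are zero-based counts of what precedes the character, not what any particular compiler emits.

```python
def column_in_units(line, char_index):
    """Return the offset of line[char_index] measured in three different units."""
    prefix = line[:char_index]  # Python str is indexed by code point
    return {
        "scalar_values": len(prefix),
        "utf8_code_units": len(prefix.encode("utf-8")),
        "utf16_code_units": len(prefix.encode("utf-16-le")) // 2,
    }

# U+00E9 (e acute) and U+1F600 (an emoji outside the BMP), followed by "x".
line = "\u00e9\U0001F600x"
print(column_in_units(line, line.index("x")))
```

For this line the three counts are 2, 6 and 3 respectively, so a column that matches UTF-16 on ASCII-plus-emoji input cannot also be a UTF-8 or scalar-value offset.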

C/Unix: How to extract the bits from st_mode?

I am a beginner to Unix programming and C and I have two questions regarding the stat struct and its field st_mode:
When accessing the st_mode field as below, what type of number is returned (octal, decimal, etc.)?
struct stat file;
stat( someFilePath, &file);
printf("%d", file.st_mode );
I thought the number was in octal, but when I ran this code I got the value 33188. What is the base?
I found out that st_mode encodes a 16-bit binary number that represents the file type and file permissions. How do I get the 16-bit number from the above output (especially when it doesn't seem to be in octal)? And which parts of the 16-bit number encode which information?
Thanks for any help.
The actual type behind mode_t and how it encodes information is implementation defined. The only thing that's certain is that it's a bitmask.
To work with st_mode, use the flags and macros defined in the sys/stat.h header. For a list of those defines, consult:
man 2 stat
If you truly need to know what each bit represents, or are simply curious, read the header or use printf to inspect the flags.
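As an illustration of inspecting the flags, the value 33188 from the question can be taken apart with Python's stat module, which mirrors the S_IFMT / S_ISREG style macros from sys/stat.h. This is only a sketch and assumes the common Linux/POSIX bit layout; the number itself is just an integer, and %d merely printed it in decimal.

```python
import stat

mode = 33188  # the decimal value printed in the question

print(oct(mode))                # the same bits written in octal: 0o100644
print(oct(stat.S_IFMT(mode)))   # file-type bits: 0o100000 (regular file)
print(oct(stat.S_IMODE(mode)))  # permission bits: 0o644
print(stat.S_ISREG(mode))       # True
print(stat.filemode(mode))      # -rw-r--r--
```

In C, the equivalent would be printing with %o instead of %d and testing with S_ISREG(file.st_mode) or masking with S_IFMT.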

Special unicode question mark characters in database table

Firstly, to anyone who reads this and responds, thanks for your assistance.
I'm having a problem where I have a site (primarily in English) with many translations for different languages. I have a database which stores these translations. Unfortunately, one of the languages seems to be populated with question mark characters between each regular character. Because of this, any text which contains these characters won't show up in IE.
Are there any SQL statements that will seek these characters out and remove them? There's a find/replace option, but I can't seem to find a rule that applies.
Thanks for any help you can give.
As an example, this is how text shows in a table:
�i�O�N� �k�i�t� �d�e� �s�u�p�p�o�r�t� �V�é�l�o� - which stops it showing in IE.
Removing these as below will show it in IE:
iON kit de support Vélo
Any idea how I go about this?
Thanks :)
Your translation database contains mangled data that has come from misinterpreting UTF-16-encoded input as ISO-8859-1 (or the closely related Windows code page 1252; you can't tell the difference from the example data).
You could attempt to undo the damage by extracting the data, encoding it back to what is hopefully the original set of bytes, and re-decoding it, then inserting it back into the database. For example in PHP:
$mangled = "i\0O\0N\0 \0k\0i\0t\0 \0d\0e\0 \0s\0u\0p\0p\0o\0r\0t\0 \0V\0\xE9\0l\0o\0";
$fixed = iconv('utf-16le', 'utf-8', $mangled);
# "iON kit de support V\xC3\xA9lo"
but it would be best to go back to the original input data and re-import it properly.
Just removing zero bytes from a UTF-16-encoded byte string (str_replace("\0", '', $mangled)) isn't really fixing it. It would work for the ASCII characters (U+0000–U+007F), but you would end up with ISO-8859-1 bytes for characters U+0080–U+00FF (more usually you would want UTF-8), and any other characters outside that range would remain unreadable nonsense.
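The difference between re-decoding and stripping zero bytes can be sketched in Python. The sample string is the one from the question with a euro sign appended (an addition for illustration only), and the mangling step assumes UTF-16LE input misread as ISO-8859-1, as in the PHP example.

```python
# Sample text; the trailing euro sign (U+20AC) is added here to show a
# character outside U+0000-U+00FF and is not part of the original data.
original = "iON kit de support V\u00e9lo \u20ac"

# Reproduce the damage: UTF-16LE bytes misread as ISO-8859-1.
mangled = original.encode("utf-16-le").decode("latin-1")

# Proper repair: get the original bytes back, then decode them correctly.
repaired = mangled.encode("latin-1").decode("utf-16-le")

# Naive repair: drop the NUL characters.
stripped = mangled.replace("\0", "")

print(repaired == original)  # True
print(stripped == original)  # False: the euro sign has become two wrong characters
```

The stripped version keeps "Vélo" intact only because é happens to sit in the ISO-8859-1 range; the euro sign comes out as a not sign followed by a space.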

What is the rationale behind "0xHHHHHHHH" formatted Microsoft error codes?

Why does Microsoft tend to report "error codes" as hexadecimal values?
Error codes are 32-bit double word values (4 byte values.) This is likely the raw integer return code of whatever C-style function has reported an error.
However, why report the error to a user in hexadecimal? The "0x" prefix is worthless, and the savings in character length is minimal. These errors end up displayed to end users in Microsoft software and even on Microsoft websites.
For example:
0x80302010 is 10 characters long, and very cryptic.
2150637584 is the decimal equivalent, and much more user friendly.
Is there any description of the "standard" use of a 32-bit field as an error code mechanism (possibly dividing the field into multiple fields for developer interpretation) or of the logic behind presenting a hexadecimal code to end users?
We can only guess about the reason, so this question cannot be answered for sure. But let's guess:
One reason might be that with hex numbers, you know the number will have 8 digits. If it has more or fewer digits, the number is "corrupt" (for example, the customer mistyped it). With decimal numbers, the number of digits varies with the value.
Also, to a developer, hex numbers are more convenient and natural than decimal numbers. For example, if some info is coded as bit flags you can decipher them manually easily in hex numbers but not in decimal numbers.
It is a little bit subjective as to whether hexadecimal or decimal error codes are more user friendly. Here is a scenario where the hexadecimal error codes are significantly more convenient, which could be part of the reason that hexadecimal error codes are used in the first place.
Consider the documentation for Win32 Error Codes for Active Directory Service Interfaces: ADSI uses error codes with the format 0x8007XXXX, where the XXXX corresponds to a DWORD value that maps to a Win32 error code.
This makes it extremely easy to get the corresponding Win32 error code, because you can just read off the last 4 hex digits. This would not be possible with a decimal error code representation.
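The same idea can be sketched in Python. This assumes the HRESULT layout documented by Microsoft (top bit for failure, an 11-bit facility in bits 16 to 26, a 16-bit code in the low word); split_hresult is a made-up helper name, and 0x80070005 is used as a sample value of the 0x8007XXXX form.

```python
def split_hresult(value):
    """Split a 32-bit HRESULT-style value into its main fields."""
    value &= 0xFFFFFFFF  # treat negative (signed) inputs as unsigned 32-bit
    return {
        "failure": bool(value >> 31),
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
    }

# In hex the fields can be read straight off the digits ...
print(split_hresult(0x80070005))
# ... while the same number in decimal gives no visual hint of them.
print(split_hresult(2147942405))
print(format(2147942405, "#010x"))
```

Both calls describe the same value: facility 7 (the Win32 facility) and code 5. In 0x80070005 those are visible as the 0007 and 0005 digit groups; in 2147942405 they are not.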
The middle ground answer to this would be that formatting the number like an IPv4 address would be more luser-friendly while preserving some sort of formatting that helps the dev guys.
Although TBH I think hex is fine, the hypothetical non-technical user has no more idea what 0x1234ABCD means than 1234101112 or "Cracked gangle pin on fwip valve".

00626 SQL Loader error

How can I avoid the "character set conversion buffer overflow" error (SQL*Loader-00626) in SQL*Loader?
I am not able to find anything about this on the internet; please suggest a solution.
What is the character set of the input datafile? You might try specifying the character set in the control file:
CHARACTERSET char_set_name LENGTH SEMANTICS CHARACTER
By default, if not specified, Oracle will use byte length semantics. Thus, if you define a field length in your control file as VARCHAR(20), in byte semantics you'd have a 20-byte buffer, but in character length semantics you might have a 40-byte buffer. This would be my guess as to what could be the source of the error.
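The gap between the two semantics is easy to see outside Oracle. The sketch below assumes the data ends up in a UTF-8 character set (for example AL32UTF8) and uses an invented 20-character value; it only illustrates why 20 characters may not fit in 20 bytes, not how SQL*Loader sizes its buffers internally.

```python
value = "\u00e9" * 20  # 20 accented characters, a made-up sample value

char_count = len(value)                  # length in characters
byte_count = len(value.encode("utf-8"))  # length in bytes once stored as UTF-8

print(char_count, byte_count)  # 20 40
```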
It's not a lot of help, but here's what the Oracle error manual has to say about that error:
SQL*Loader-00626: Character set conversion buffer overflow.
Cause: A conversion from the datafile character set to the client character set required more space than that allocated for the conversion buffer. The size of the conversion buffer is limited by the maximum size of a varchar2 column.
Action: The input record is rejected. The data will not fit into the column.
It sounds like there isn't any way to work around this within SQL*Loader. If it is affecting a small number of records then it may be easiest to simply handle those manually. If it is many records, then you probably need to find or create a different loading tool.
Just a few ideas for you to think about:
You could try to load different parts of the "string" into different fields in the database. Maybe that way you can work around the limitation.
You could try to do the character set conversion in a different tool (some text editors may give you some options) and then load the file without it requiring the conversion.
Not sure if there's any merit in these ideas, but hopefully you can work something out.
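For the second idea, the conversion does not need a text editor; a short script can do it. This is a rough sketch only: transcode is a hypothetical helper, and the encodings passed to it are assumptions you would replace with the datafile's real character set and the one the database client expects.

```python
def transcode(src_path, dst_path, src_encoding, dst_encoding="utf-8",
              chunk_chars=64 * 1024):
    """Copy a text file, re-encoding it from src_encoding to dst_encoding."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # read in chunks to handle large files
            if not chunk:
                break
            dst.write(chunk)
```

A call such as transcode("input.dat", "input_utf8.dat", "latin-1") (file names invented) would produce a file that can then be loaded with a matching CHARACTERSET clause, so that no conversion is needed at load time.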
Thanks for all your help. This problem has been resolved. We split the file and loaded it in chunks, and it worked fine.