Doxygen and Assembly Language - documentation

I'd like to use Doxygen to document legacy code that's a mix of C and x86 assembly language. The assembly language is not inline, but in separate assembly-only files. How can I document the assembly language portion?

Question 12 of the Doxygen FAQ eventually led me to a Perl filter that looks promising. It converts the assembly code into something C-like that Doxygen can parse. Thanks!
The original link appears to be dead. However, back in 2008 I pulled down a copy of asm4doxy.pl and squirreled it away. I've put it up on Pastebin if anyone is still interested. As I recall, I tried it and it didn't really work for me at the time, but YMMV.

See question 12 of the Doxygen FAQ. Are you dealing with pure assembly files, or inline assembly inside C sources? Assuming the former, you'll have to either write an input filter to transform the assembly code into something C-like (easier), or write a new parser (much harder).
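For example, a minimal input filter might look like the sketch below (plain C++; the ";;" doc-comment convention and the label format are assumptions about your assembly dialect). FILTER_PATTERNS is the Doxyfile option that hooks the filter up, e.g. FILTER_PATTERNS = *.asm=./asmfilter. The filter just turns doc comments into "///" lines and labels into C-like prototypes:

// asmfilter.cpp - hypothetical Doxygen input filter for .asm files.
// Doxygen runs it as "asmfilter <file>" and parses whatever it prints.
#include <fstream>
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind(";;", 0) == 0) {
            // ";; text" becomes a Doxygen comment line.
            std::cout << "/// " << line.substr(2) << "\n";
        } else if (!line.empty() && line.back() == ':'
                   && line.find(' ') == std::string::npos) {
            // "label:" becomes a C-like prototype Doxygen can index.
            std::cout << "void " << line.substr(0, line.size() - 1) << "();\n";
        }
        // Instructions and directives are simply dropped.
    }
}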

Related

In which language is the proto compiler (of google protocol buffers) written?

I would like to know which language the "proto compiler" (the compiler used to generate Java, Python, or C++ source files from .proto definitions) is written in. Is it maybe a mix of languages?
Any help would be appreciated.
Thanks in Advance
Horace
It appears to be written in C++. There's also documentation on Java and Python APIs, but those don't appear to contain the compiler itself (at least I don't see anything that's obviously the compiler in either case, though I didn't spend a whole lot of time looking for it either).
That said, I'm almost tempted to vote to close -- for most practical purposes, the language used to implement the compiler is basically a trivia question, irrelevant to actual use. There is, however, an entirely legitimate exception: if you're going to download and modify the compiler, knowing the language you'd need to work with could be quite useful.
The protoc compiler is written in C or C++ (it's a native program, anyway).
When I want to process .proto files from Java, I:
use the protoc command to convert them to a protocol buffer descriptor file, i.e.
protoc protofile.proto --descriptor_set_out=OutputFile
then read the new protocol buffer file (it's a FileDescriptorSet) and use it.
An overcomplicated example is the compileProto method in
http://code.google.com/p/protobufeditor/source/browse/trunk/%20protobufeditor/Source/ProtoBufEditor/src/net/sf/RecordEditor/ProtoBuf/re/display/ProtoLayoutSelection.java
It's complicated because the protoc command and its options can be stored in a properties file.
Note: the getFileDescriptor method reads the newly created protocol buffer file.
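If it helps, here is a minimal sketch of the "read the descriptor set" step using the C++ protobuf API (the Java API has equivalent classes; the input file name matches the protoc command above):

// read_fds.cpp - walk a FileDescriptorSet written by --descriptor_set_out.
#include <fstream>
#include <iostream>
#include <google/protobuf/descriptor.pb.h>

int main()
{
    std::ifstream in("OutputFile", std::ios::binary);
    google::protobuf::FileDescriptorSet fds;
    if (!fds.ParseFromIstream(&in)) {
        std::cerr << "not a valid FileDescriptorSet\n";
        return 1;
    }
    for (const auto& file : fds.file()) {            // one entry per .proto
        std::cout << file.name() << "\n";
        for (const auto& msg : file.message_type())  // messages it defines
            std::cout << "  message " << msg.name() << "\n";
    }
}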

What language is a .NET executable written in?

I thought it would be Common Intermediate Language, but in notepad it does not look like that at all. Does it just look uglier in reality than in tutorials? Or is it some bytecode form that is further compiled from CIL?
CIL is the name of the binary format, not of the textual "assembler" you're thinking of.
Can you possibly imagine that .NET assemblies would be text files?
A .NET executable is a binary file that has a PE header (same as a native executable, but with slightly different values). The PE header tells the OS to load the CLR, which in turn loads the assembly.
The content beyond the header is a binary representation of the CIL code, plus some metadata and other stuff. The text you see in tutorials is the text representation of CIL, in much the same way that the assembly language code you see in a tutorial about assembly language programming is just the text representation of the binary machine code.
See http://www.yetanotherchris.me/home/2010/7/12/inside-net-assemblies-part-1.html (among many others) for more information.
A .Net executable is usually not written directly; it is compiled from another language such as C#, F# or VB.Net.
The contents of a .Net executable can be viewed with the ILDASM tool.
The contents are first a manifest which is used for reflection, signatures or other meta-code purposes.
Secondly there are the MSIL instructions themselves. These are in a kind of bytecode format, but ILDASM will show you what the instructions are.
And there are sometimes resources such as imagery, sounds or other content packed into the executable.
The executable is just-in-time compiled to native code either during installation (I think this is uncommon), or as a precursor to execution. The resulting native code can be stored for reuse. (This is what I was told during PDC 2001, might be "out of date".)

Generate syntax-colored, hyperlinked source code from Haskell or Objective-C

Are there any packages that can take a directory full of source code (Objective-C and Haskell are the ones that interest me) and generate syntax-colored HTML from it where function names are links to their source code?
For Haskell you can take a look at Haddock:
http://www.haskell.org/haddock/

Documenting C++/CLI library code for use from C# - best tools and practices? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
I'm working on a project where a C++/CLI library is being used primarily from a C# application.
Is there any way to make the code comments in C++/CLI visible to C# IntelliSense within Visual Studio?
Assuming there isn't, what would be the best way to document the C++/CLI code to make it easier to use from C# (and within C++/CLI, of course)? What is your opinion on XML comments vs Doxygen vs other tools (and which ones)?
I have gotten it to work as follows:
Use XML style comments for your C++/CLI header entries. This means the full XML comment is required (triple-slash comments, <summary> tag at a minimum)
Make sure that the C++ compiler option Generate XML Documentation Files is on. This should generate an XML file with documentation with the same name as your assembly (MyDll.xml).
Make sure that the C# project references your assembly MyDll.dll where MyDll.xml is also present in the same folder. When you mouse over a reference from the assembly, MS Visual Studio will load the documentation.
This worked for me in Visual Studio 2008 on an assembly built for .NET 3.5.
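For illustration, the kind of header entry meant in the first step might look like this (the class and members are hypothetical):

// C++/CLI header with triple-slash XML comments. With Generate XML
// Documentation Files enabled, these end up in MyDll.xml next to the DLL.
public ref class Calculator
{
public:
    /// <summary>Adds two integers.</summary>
    /// <param name="a">First operand.</param>
    /// <param name="b">Second operand.</param>
    /// <returns>The sum of the two operands.</returns>
    int Add(int a, int b) { return a + b; }
};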
DocXml has the major advantage of being supported by VS (syntax colouring, intellisense, automatic export to the XML files). The Doxygen tools can read DocXml format so you can still use them with this format too.
To help you generate tidy and accurate Doc comments with a minimum of effort, you might like to check out my addin AtomineerUtils. This takes most of the work out of creating and updating DocXml, Doxygen, JavaDoc or Qt format comments, and it supports C, C++, C++/CLI, C#, Java, JavaScript, TypeScript, JScript, UnrealScript, PHP and Visual Basic code.
Interesting. After trying several methods, it looks like IntelliSense between a Managed C++ project and C# doesn't work.
The following example will give you proper intellisense in the C++ environment where it is declared, but referencing the object in C# shows nothing:
// Gets the value of my ID for the object, which is always 14.
public: virtual property int MyId
{
    int get() { return 14; }
}
XML comments don't work either. I would guess that this is either a bug, or requires something I can't figure out. Judging from the lack of answers on this question, perhaps a bug.
As far as documentation generation, I'd recommend going the path of XML documentation. Doxygen supports reading XML documentation which is mostly identical to the standard XML documentation for C#. It does tend to add extra lines just for tag openings and closings, but is much more readable in my opinion than the following doxygen alternative:
//! A normal member taking two arguments and returning an integer value.
/*!
    \param a an integer argument.
    \param s a constant character pointer.
    \return The test results
    \sa Test(), ~Test(), testMeToo() and publicVar()
*/
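For comparison, roughly the same documentation in XML Doc style (the member prototype is assumed; note the extra lines spent on opening and closing tags):

/// <summary>
/// A normal member taking two arguments and returning an integer value.
/// </summary>
/// <param name="a">An integer argument.</param>
/// <param name="s">A constant character pointer.</param>
/// <returns>The test results.</returns>
int testMe(int a, const char* s);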
You are right. It doesn't work. The C++ build will add its IntelliSense information into the master .ncb file, and you will get the autocompletion of method names, etc. However, you are correct in that you will be unable to get the "comment" description about each method, etc.
You'll probably get a lot of value from taking a look at Doxygen. And then look up Doxygen.NET - which is something we wrote for our own use that builds "Object Hierarchies" from the XML files output by Doxygen...

Refactoring disassembled code

You write a function and, looking at the resulting assembly, you see it can be improved.
You would like to keep the function you wrote, for readability, but you would like to substitute your own assembly for the compiler's. Is there any way to establish a relationship between your high-level language function and the new assembly?
If you are looking at the assembly, then it's fair to assume you have a good understanding of how code gets compiled. With that knowledge, it's sometimes possible to 'reverse engineer' the changes back up into the original language, but it's often better not to bother.
The gains from such optimisations are likely to be very small in comparison to the time and effort required to make them. I would suggest that you leave this kind of work to the compiler and go have a cup of tea. If the changes are significant and the performance is critical (as, say, in the embedded world), then you might want to mix the normal code with the assembler in some fashion; however, on most computers and chips the performance is usually sufficient to avoid this headache.
If you really need more performance, then optimise the code not the assembly.
None, I suppose. You've rejected the compiler's work in favor of your own. You might as well throw out the function you wrote in the compiled language, because now all you have is your assembler for that platform.
I would highly advise against engaging in this kind of optimization unless you're sure, via profiling and analysis, that you truly are making a difference.
It depends on the language you wrote your function in. Some languages, like C, are very low-level, with each function call or statement translating to specific assembly statements. If you're using C, you can replace your function with inline assembly to improve performance.
Other high-level languages may convert each statement into macro routines or other more complex calls on the assembly side. Certain optimizations (like tail recursion, loop unrolling, etc) can be implemented easily on the source side, but others (like making more efficient use of the register file) may be impossible (again, depending on the language and the compiler you're using).
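As a concrete sketch of one of those source-side optimizations, here is a summation loop manually unrolled by four in plain C++ (the function and data are hypothetical):

#include <cstddef>

double sum4(const double* v, std::size_t n)
{
    // Four accumulators per iteration: fewer branches, and the independent
    // adds give the CPU more instruction-level parallelism.
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i)  // remainder loop for the last 0-3 elements
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}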
It's tough to say there is any relationship between the modified assembly and the source which generated the unmodified version. It will certainly confuse debugging tools: register contents will no longer match the source variables they were supposed to correspond to.
There are a number of places in packet processing code where I've examined the generated assembly and gone back to change the original source code in order to improve the result. Re-arranging source can reduce the number of branches; __attribute__ annotations and compiler arguments can align branch points and functions to reduce I$ misses. In desperate cases a little inline assembly can be used, so that the binary can still be compiled from source.
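For instance, a sketch of the branch-rearranging idea using GCC/Clang's __builtin_expect (the packet structure and length check are made up):

#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

struct Packet { unsigned len; const unsigned char* data; };

int process(const Packet& p)
{
    if (UNLIKELY(p.len < 4))   // malformed packets are rare, so the
        return -1;             // compiler keeps the hot path fall-through
    // ... hot path: parse and forward the packet ...
    return 0;
}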
Something you could try is to separate your original function into its own file, and provide a make rule to generate the assembler file from it. Then update the assembler file with your improved version, and provide a make rule to build an object file from the assembler file. Then change your link rules to include that object file.
If you only ever change the assembler file, that will keep on being used. If you ever change the original higher-level language file, the assembler file will be rebuilt and the object file built from the new (unimproved) version.
This gives you a relationship between the two; you probably want to add a warning comment at the top of the higher-level language file to warn about the behaviour. Using some form of VCS will give you the ability to recover the improved assembler file if you make a mistake here.
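A sketch of those rules in GNU make (file names are hypothetical; recipes must be indented with tabs):

# hot.s is generated from hot.c once, then hand-edited; make only
# regenerates it (clobbering your edits) if hot.c itself changes.
hot.s: hot.c
	$(CC) $(CFLAGS) -S -o $@ $<

hot.o: hot.s
	$(CC) -c -o $@ $<

app: main.o hot.o
	$(CC) -o $@ $^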
If you're writing a native compiled app in Visual C++, there are two methods:
Use the __asm { } block and write your assembler in there.
Write your functions in MASM assembler, assemble to .obj, and link it as a static library. In your C/C++ code, declare the function with an extern "C" declaration.
Other C/C++ compilers have similar approaches.
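A sketch of both methods (the function names are made up; note that the __asm block is only available in 32-bit MSVC builds):

// Method 1: inline assembler. MSVC lets the value left in EAX serve as
// the return value, though it warns (C4035) about the missing return.
int add_inline(int a, int b)
{
    __asm {
        mov eax, a
        add eax, b
    }
}

// Method 2: the routine lives in a separate MASM file assembled to an
// .obj and linked in; only the declaration appears on the C++ side.
extern "C" int add_masm(int a, int b);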
In this situation, you generally have two options: optimize the code or rewrite the compiler. I can't see where breaking the link between source and object code is ever going to be the correct solution.