In which language is the proto compiler (of Google Protocol Buffers) written?

I would like to know in which language the "proto compiler" (the compiler used to generate Java, Python, or C++ source files from .proto definitions) is written. Is it maybe a mix of languages?
Any help would be appreciated.
Thanks in advance,
Horace

It appears to be written in C++. There's also documentation on Java and Python APIs, but those don't appear to contain the compiler itself (at least I don't see anything that's obviously the compiler in either case, though I didn't spend a whole lot of time looking for it either).
That said, I'm almost tempted to vote to close -- for most practical purposes, the language used to implement the compiler is basically a trivia question, irrelevant to actual use. There is, however, an entirely legitimate exception: if you're going to download and modify the compiler, knowing the language you'd need to work with could be quite useful.

The protoc compiler is written in C or C++ (it's a native program, anyway).
When I want to process .proto files in Java, I use the protoc command to convert them to a protocol buffer file, i.e.
protoc protofile.proto --descriptor_set_out=OutputFile
I then read the new protocol buffer file (it's a FileDescriptorSet) and use it.
An overcomplicated example is the compileProto method in
http://code.google.com/p/protobufeditor/source/browse/trunk/%20protobufeditor/Source/ProtoBufEditor/src/net/sf/RecordEditor/ProtoBuf/re/display/ProtoLayoutSelection.java
It's complicated because the protoc command and its options can be stored in a properties file.
Note: the getFileDescriptor method reads the newly created protocol buffer file.
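For comparison, here is a minimal C++ sketch of the same idea, reading a descriptor set produced by the protoc command above (the file name "OutputFile" just matches that example):

#include <fstream>
#include <iostream>
#include <google/protobuf/descriptor.pb.h>

int main() {
    // Parse the file written by: protoc protofile.proto --descriptor_set_out=OutputFile
    google::protobuf::FileDescriptorSet set;
    std::ifstream in("OutputFile", std::ios::binary);
    if (!set.ParseFromIstream(&in)) {
        std::cerr << "failed to parse descriptor set" << std::endl;
        return 1;
    }
    // List every .proto file in the set and the messages it defines
    for (const google::protobuf::FileDescriptorProto& file : set.file()) {
        std::cout << file.name() << std::endl;
        for (const google::protobuf::DescriptorProto& msg : file.message_type())
            std::cout << "  message " << msg.name() << std::endl;
    }
    return 0;
}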

Related

What is the need of JVM when you can pass the source code?

I am new to Java and wanted to know:
What is the need to create the .class file in Java?
Can't we just pass the source code to every machine, so that each machine can compile it according to the OS and the hardware?
I believe it's mostly for efficiency reasons.
From Wikipedia (http://en.wikipedia.org/wiki/Bytecode):
Bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.
(my emphasis)
And, as others have mentioned, it provides some (weak) obfuscation of the source code.
The main reason for the compilation is that the virtual machines which host and run Java classes only understand bytecode, and compiling a class into that form every time it is run would be expensive. That is why the source code is compiled into bytecode ahead of time.
There are also compilers which compile source code directly into machine code, but that's a different story which I don't know much about.
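For example, the usual workflow compiles once and then runs the same bytecode anywhere a JVM exists (Hello is just an illustrative class name):
javac Hello.java
java Hello
The first command produces Hello.class; the second runs that same .class file, unchanged, on any OS with a JVM.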

Extract Objective-C binary

Is it possible to extract a binary, to get the code that is behind the binary? With Class-dump you can see the implementation addresses, but is it possible to also see the code that's IN the implementation addresses? Is there ANY way to do it?
All your code compiles down to machine instructions, placed in the text section of your executable. The compiler is responsible for translating your higher-level language to the processor-specific instructions, which are simpler. Reversing this process would be nearly impossible, unless the code is quite simple. Some problems are the ambiguity of statements and the overall readability: local variables, for instance, will be nothing but offset addresses.
If you want to read the disassembled code (the instructions the higher-level code was compiled to), run this command on an executable:
otool -tV file
You can decompile (more accurately, disassemble) a binary and get its assembly, but there is no way to get back the original Objective-C.
My curiosity begs me to ask why you want to do this!?
otx (http://otx.osxninja.com/) is a good tool for symbolicating the otool-based disassembly. It will handle both x86_64 and i386 disassembly.
And Mach-O-Scope (https://github.com/smorr/Mach-O-Scope) is a tool built on top of otx to dump it all into a sqlite3 database for browsing and annotating.
It won't give you the original source, but it will get you pretty close, providing you with the messages that are being sent around in methods.

Why is there no Lua implementation of Google's Protocol Buffers? Does a better solution already exist for Lua?

I'm working on it as we speak: https://github.com/haberman/upb/wiki
Also, I'm the guy who wrote the 100-line parser mentioned in another answer below. But my upb library is much more full-featured.
I just created a Lua implementation of protocol buffers, lua-pb. It dynamically loads/parses .proto files to create message objects, so there is no dependency on the standard .proto compiler from Google.
It uses LPeg for parsing .proto files and struct & Lua BitOp for encoding/decoding.
Probably because a C or C++ implementation would be faster (and easier to write), and then you could hand the data off to Lua to be used if you want.
There's a 100 line C protocol buffer parser here: http://blog.reverberate.org/2008/07/12/100-lines-of-c-that-can-parse-any-protocol-buffer/
Or you could just use the Google C++ one, and then hand your data off to Lua from there.
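As a rough sketch of that approach, here the C++ protobuf runtime parses a message and the Lua C API hands one field to a script. The person.pb.h header, example::Person message, and person.bin file are all hypothetical stand-ins for whatever your .proto defines:

#include <fstream>
#include <lua.hpp>
#include "person.pb.h"  // hypothetical header generated by protoc from person.proto

int main() {
    // Parse a serialized message with the Google C++ library
    example::Person person;  // hypothetical message type
    std::ifstream in("person.bin", std::ios::binary);
    person.ParseFromIstream(&in);

    // Hand the parsed data off to Lua as a global variable
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);
    lua_pushstring(L, person.name().c_str());
    lua_setglobal(L, "person_name");
    luaL_dostring(L, "print('parsed name: ' .. person_name)");
    lua_close(L);
    return 0;
}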
Lua isn't built for bit twiddling, so perhaps that's why nobody has implemented protocol buffers in it yet. It doesn't even have bitwise operators built in: http://lua-users.org/wiki/BitwiseOperators

Does Ada have a preprocessor?

To support multiple platforms in C/C++, one would use the preprocessor to enable conditional compiles. E.g.,
#ifdef _WIN32
#include <windows.h>
#endif
How can you do this in Ada? Does Ada have a preprocessor?
The answer to your question is no, Ada does not have a preprocessor that is built into the language. That means each compiler may or may not have one, and there is no "uniform" syntax for preprocessing and things like conditional compilation. This was intentional: it's considered "harmful" to the Ada ethos.
There are almost always ways around the lack of a preprocessor, but often the solution can be a little cumbersome. For example, you can declare the platform-specific functions as 'separate' and then use build tools to compile the correct one (either a project system, using pragma body replacement, or a very simple directory system: put all the Windows files in /windows/ and all the Linux files in /linux/, and include the appropriate directory for the platform).
All that being said, the GNAT developers realized that sometimes you need a preprocessor and created gnatprep. It should work regardless of the compiler (but you will need to insert it into your build process). Similarly, for simple things (like conditional compilation) you can probably just use the C preprocessor, or even roll your own very simple one.
AdaCore provides the gnatprep preprocessor, which is specialized for Ada. They state that gnatprep "does not depend on any special GNAT features", so it sounds as though it should work with non-GNAT Ada compilers. Their User Guide also provides some conditional compilation advice.
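For reference, a typical gnatprep invocation looks like this (the -D switch defines a preprocessor symbol; the file names are just examples):
gnatprep -DSOMESWITCH=True body.adb.prep body.adb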
I have been on a project where m4 was used as well, with the Ada spec and body files suffixed as ".m4s" and ".m4b", respectively.
My preference is really to avoid preprocessing altogether, and just use specialized bodies, setting up CM and the build process to manage them.
No, but the C preprocessor (cpp) or m4 can be invoked on any file from the command line, or by using a build tool like make or ant. I suggest giving your .ada file a different name. I have done this for some time with Java files: I name the source file .m4 and use a make rule to create the .java, then build it in the normal way.
I hope that helps.
Yes, it does.
If you are using GNAT compiler, you can use gnatprep for doing the preprocessing, or if you use GNAT Programming Studio you can configure your project file to define some conditional compilation switches like
#if SOMESWITCH then
-- Your code here is compiled only if the switch SOMESWITCH is active in your build configuration
#end if;
In this case you can use gnatmake or gprbuild so you don't have to run gnatprep by hand.
That's very useful, for example, when you need to compile the same code for several different OSes, even using different cross-compilers.
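For example, a scenario variable can be passed on the command line and mapped to the preprocessor switch inside the project file (the project and switch names are illustrative):
gprbuild -P my_project.gpr -XSOMESWITCH=True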
Some old Ada-83-era compilers had a package called a.app that utilized a #-prefixed subset of Ada (interpreted at build time) as a preprocessing language for generating Ada (to be then translated to machine code at compile time). Rational's Verdix Ada Development System (VADS) appears to be the progenitor of a.app among several Ada compilers. Sun Microsystems, for example, derived its Ada SPARCompiler from VADS and thus also had a.app. This is not unlike the use of PL/I as the preprocessor of PL/I, which IBM did.
Chapter 2 of this document describes what a.app looks like: http://dlc.sun.com/pdf/802-3641/802-3641.pdf
No, it does not.
If you really want one, there are ways to get one (use C's, use a stand-alone one, etc.). However, I'd argue against it. It was a purposeful design decision not to have one. The whole idea of a preprocessor is very un-Ada.
Most of what C's preprocessor is used for can be accomplished in Ada in other more reliable ways. The only major exception is in making minor changes to a source file for cross-platform support. Given how much this gets abused in a typical cross-platform C program, I'm still happy there's no support for it in Ada. Very few C/C++ developers can control themselves enough to keep the changes "minor". The result may work, but is often nearly impossible for a human to read.
The typical Ada way to accomplish this would be to put the different code in different files and use your build system to somehow choose between them at compile time. Make is plenty powerful enough to help you do this.

Refactoring disassembled code

You write a function and, looking at the resulting assembly, you see it can be improved.
You would like to keep the function you wrote, for readability, but you would like to substitute your own assembly for the compiler's. Is there any way to establish a relationship between your high-level language function and the new assembly?
If you are looking at the assembly, then it's fair to assume that you have a good understanding of how code gets compiled down. If you have this knowledge, then it's sometimes possible to 'reverse engineer' the changes back up into the original language, but it's often better not to bother.
The optimisations that you make are likely to be very small in comparison to the time and effort required to make these changes. I would suggest that you leave this kind of work to the compiler and go have a cup of tea. If the changes are significant, and the performance is critical (as, say, in the embedded world), then you might want to mix the normal code with the assembler in some fashion; however, on most computers and chips the performance is usually sufficient to avoid this headache.
If you really need more performance, then optimise the code not the assembly.
None, I suppose. You've rejected the compiler's work in favor of your own. You might as well throw out the function you wrote in the compiled language, because now all you have is your assembler for that platform.
I would highly advise against engaging in this kind of optimization unless you're sure, via profiling and analysis, that you truly are making a difference.
It depends on the language you wrote your function in. Some languages like C are very low-level, translating each function call or statement to specific assembly statements. If you did use C, you can replace your function with inline assembly to improve performance.
Other high-level languages may convert each statement into macro routines or other more complex calls on the assembly side. Certain optimizations (like tail recursion, loop unrolling, etc) can be implemented easily on the source side, but others (like making more efficient use of the register file) may be impossible (again, depending on the language and the compiler you're using).
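As a small illustration of the C case, a trivial function replaced with GCC's extended inline assembly might look like this (x86 only, purely a sketch):

static inline int add_asm(int a, int b)
{
    int result;
    /* "0" ties input a to the same register as the output, so addl adds b into it */
    __asm__("addl %2, %0" : "=r"(result) : "0"(a), "r"(b));
    return result;
}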
It's tough to say there is any relationship between the modified assembly and the source which generated the unmodified version. It will certainly confuse debugging tools: register contents will no longer match the source variables they were supposed to correspond to.
There are a number of places in packet-processing code where I've examined the generated assembly and gone back to change the original source code in order to improve the result. Re-arranging source can reduce the number of branches; __attribute__ and compiler arguments can align branch points and functions to reduce instruction-cache misses. In desperate cases a little inline assembly can be used, so that the binary can still be compiled from source.
Something you could try is to separate your original function into its own file, and provide a make rule to build the assembler from there. Then update the assembler file with your improved version, and provide a make rule to build an object file from the assembler file. Then change your link rules to include that object file.
If you only ever change the assembler file, that will keep on being used. If you ever change the original higher-level language file, the assembler file will be rebuilt and the object file built from the new (unimproved) version.
This gives you a relationship between the two; you probably want to add a warning comment at the top of the higher-level language file to warn about the behaviour. Using some form of VCS will give you the ability to recover the improved assembler file if you make a mistake here.
If you're writing a native compiled app in Visual C++, there are two methods:
Use the __asm { } block and write your assembler in there.
Write your functions in MASM assembler, assemble them to .obj, and link it as a static library. In your C/C++ code, declare the function with an extern "C" declaration.
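A minimal sketch of the first method (32-bit builds only; the x64 MSVC compiler does not support __asm blocks):

int add_asm(int a, int b)
{
    int result;
    __asm {
        mov eax, a      ; load the first argument
        add eax, b      ; add the second argument
        mov result, eax ; store the sum in the local
    }
    return result;
}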
Other C/C++ compilers have similar approaches.
In this situation, you generally have two options: optimize the code or rewrite the compiler. I can't see where breaking the link between source and object code is ever going to be the correct solution.