Its driving me crazy, I spend soo much time getting it wrong, and then fixing it wrong.
I'm thinking of using -= and =- as delineators, but it probably means a lot of hours in learning how to fool the compiler into a substitution. Is this a quixotic quest? can such be done? has it been done already, albeit with different keystrokes?
I work alone. I don't collaborate.
So I don't mind a non-standard work environment
If I need to in the future i could make a scheme whereby both could work.
Not without building your own custom version of the preprocessor. Comment syntax is an inherent part of the language and is not designed to be configurable.
(Incidentally, -= is already a token in Objective-C — it means "assign to LHS the result of subtracting RHS from LHS.")
It should be possible to extend clang, then modify your Xcode builds to include your clang extensions. I have no personal experience with writing complier extension for clang, but I did work on a tool that extended cl.exe. Warning: this would be a very deep dive into the internals of the build system.
Extending Clang
Good Luck
Related
I'm working on a very demanding project (actually an interpreter), exclusively written in D, and I'm wondering what type of optimizations would generally be recommended. The project makes heavy use of GC, classes, asssociative arrays, and pretty much anything.
Regarding compilation, I've already experimented both with DMD and LDC flags and LDC with -flto=full -O3 -Os -boundscheck=off seems to be making a difference.
However, as rudimentary as this may sound, I would like you to suggest anything that comes to your mind that could help speed up the performance, related or not to the D language. (I'm sure I'm missing several things).
Compiler flags: I would add -mcpu=native if the program will be running on your machine. Not sure what effect -Os has in addition to -O3.
Profiling has been mentioned in comments. Personally under Linux I have a script which dumps a process's stack trace and I do that a few times to get an idea of where it's getting hung up on.
Not sure what you mean by GS.
Since you mentioned classes: in D, methods are virtual by default; virtual methods add indirections and are not inlineable. Make sure only those methods that must be virtual are. See if you can rewrite your program using a form of polymorphism that doesn't involve indirections, such as using template metaprogramming.
Since you mentioned associative arrays: these make heavy use of the GC; to speed them up, switch to a third-party library that works on top of std.allocator, such as https://github.com/dlang-community/containers
If some parts of your code are parallelizable, std.parallelism is a good tool for this.
Since you mentioned that the project is an interpreter: there are many avenues for optimizing them, up to JIT/AOT compilation. Perhaps you could link to an existing library such as LLVM or libjit.
...is there anything I could do about it?
To be more precise, I would like to replace the caret "^" with something like "§" - granted, there's not much left on the keyboard that's not in use already.
After thinking about it for a while (dismissed using run script build phases along the way) I think the only way to do it would be a custom llvm build.
While I don't quite think I'm ready to deal with the internals of compilers, I have the naive hope that replacing one symbol with another isn't too hard. And the idea of building and running my own version of a compiler tickles me, be it just for a good deal of childish fun.
So I started poking around in the llvm sources, but - surprise - got nowhere so far.
If someone is familiar with these kind of things, could you please point me to a place to look at?
That would be awesome! Thanks!
Extending LLVM can be a bit of a hassle, especially considering how fast-moving the compiler team is, so it's a good thing you don't have to. The C preprocessor exists to perform the exact same thing you've outlined (text replacement). I'm fairly sure § isn't aliased to anything important, so #define § ^ should work great. If you still want to write your own module, LLVM provides instructions on how to extend their compiler.
Actually the code relevant for such a change isn't a part of LLVM at all, but a part of its Objective-C frontend, called Clang. Confusingly, "Clang" is also the name of the entire C/C++/ObjC compiler based on both Clang and LLVM.
While I don't quite think I'm ready to deal with the internals of compilers, I have the naive hope that replacing one symbol with another isn't too hard.
And you'll be right. What you're trying to do is very simple change.
In fact, if ^ was only used for blocks, it would be a trivial change - just modify the lexer to generate the "caret" token from § instead of ^: take a look at the lexer code to see what I mean (search for ^).
Unfortunately it's used for xor as well, so we'll have to modify both the lexer and the parser. The lexer to add a new token type and create that token from §, the parser to actually do something with it, e.g. by adding:
case tok::section: // 'section' is the token type you've added
Res = ParseBlockLiteralExpression();
break;
(and then fixing the assert at the beginning of ParseBlockLiteralExpression()).
You might run into some issues, though, as § isn't in ASCII - though as far as I know Clang should be able to deal with UTF-8 encoded files.
Is it easy to achieve high level of optimization with LLVM?
To give a concrete example let's assume that I have a simple lanuage that I want to write a compiler for.
simple functions
simple structs
tables
pointers (with arithmetic)
control structures
etc.
I can quite easily create compilation-to-C backend and rely on clang -O3.
Is it as easy to use LLVM API for that purpose?
Except perhaps for a few high-level (as in, aware of high-level language features or details that aren't encoded in LLVM IR) optimizations, Clang's backend does little more than generate straightforward IR and run some set of LLVM optimization passes on it. All of these (or at least most) should be available trough the opt command and also as API calls when using the C++ libraries that all LLVM tools are built on. See the tutorial for a simple example. I see several advantages:
LLVM IR is far simpler than C and there's already a convenient API for generating it programatically. To generate C, you either have lots of ugly and unreliable string fiddling or have to build an AST for the C language yourself. Or both.
You get to choose the set of optimizations yourself (it's quite possible that Clang's set of passes isn't ideal for the code the language supports and the IR representation your compiler generates). This also means you can, during development, just run the passes checking for IR wellformedness (uncovering compiler bugs faster). You can just copy Clang's pass order, but if you feel like it, you can also experiment.
It will allow better compile times. Clang is fast for a C compiler, but you'd be adding unnecessary overhead: You generate C code, then Clang parses it, converts it to IR, and goes on to do pretty much what you could do right away.
You may have access to a broader range of features, or at least you'd get them easier (i.e. without having to incorporate #defines, obscure pragmas, instrincts or command line options) to provide them. I'm talking about like vectors, guaranteed (well, more than in C anyway - AFAIK, some code generators ignore them) tail calls, pure/readonly functions, more control over memory layout and type conversions (for instance zero extending vs. sign extending). Granted, you may not need most of them.
LLVM has built-in optimization passes so that you can achieve O3-like optimizations using API.
While looking in the sample code for FunkyOverlayWindow, I just found a pretty interesting declaration:
pascal OSStatus MyHotKeyHandler(
EventHandlerCallRef nextHandler,
EventRef theEvent,
void *userData
);
Here, pascal is highlighted as a keyword (pink in standard Xcode color scheme). But I just found it's a macro, interestingly enough defined in file CarbonCore/ConditionalMacros.h as:
#define pascal
So, what is (or was) it supposed to do? Maybe it had some special use in the past?
While this discussion might not be well suited here, it would be interesting to know why Apple still using Carbon if this relates to the answer. I have no experience in Carbon, but this code appears to set a keyboard event handler which makes me wonder if there are any advantages over the Cocoa approach. Won't Carbon be ever removed completely?
Under the 68k Classic Mac OS runtime (e.g, before PowerPC or x86), C and Pascal used different calling conventions, so C applications had to declare the appropriate convention when calling into libraries using the Pascal conventions (including most of the operating system). The macro was implemented in contemporaneous compilers (e.g, MPW, Metrowerks, Think C).
In all newer runtimes, and in all modern compilers, the keyword is no longer recognized, so the ConditionalMacros.h header defines it away. There are a few comments in that file which may help explain a bit more -- read through it, if you're game.
You have encountered a calling convention.
The pascal calling convention for the x86 is described here.
It is very interesting that it was defined-away-to-nothing, which you notice means that it is not used anymore. It was common in the old days in x86-land, especially in the Microsoft Windows APIs, because of the ability of the processor to remove parameters from the stack at the end of a call with its special RET n instruction. Functions with the pascal calling convention were sometimes advantageous, because the caller wasn't explicity adjusting the stack pointer after each call returned.
If someone with more knowledge of why the macro still exists in the codebase, but is defined away comes along and gives you a better answer, I will happily withdraw this one.
I am working on a legacy project that has a large amount of files dating back to pre-OS X days. It's data has been 16 bit aligned for > 15 years. I would like to move to a full LLVM compilation but I can't seem to get 2 byte alignment working. Are there any compiler level options available for this? (previously using -malign-mac68k)
I am aware of the #pragma pack(2) option here. However that would require me to modify upwards of 1000 source files to include this. That it's a worst-case option, but it seems like a hack. Besides, if this is possible then surely there is a default option to set the alignment?
According to clang's sources, it does support mac68k alignment rules. It seems that right now you can enable it via "#pragma options align=mac68k" only. If you're ok with small clang hacking, then you can implement the cmdline option as well and submit the patch to clang.
I would suggest looking at #pragma pack (see http://msdn.microsoft.com/en-us/library/2e70t5y1%28v=vs.80%29.aspx and http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html). It's relatively easy to use, and should work with any compiler Apple ships.
i use the latest gcc/gortran on osx compiled by the guys at ibm, if you read the gcc manual there are at least 8 different alignment optimizations to consider not just the bulk malign
Because system-level headers assume default alignment (they don't have directives to override alignment in most cases), changing the alignment for everything will break OS calls. As such, you don't want to be doing this.
Just write a script to apply your #pragma pack(2) to the source files in question, avoiding #includes. It's relatively easy and unlikely to cause unfortunate side-effects.