Transition from infix to prefix notation - oop

I started learning Clojure recently. Generally it looks interesting, but I can't get used to some syntactic inconveniences (compared to my previous Ruby/C# experience).
Prefix notation for nested expressions. In Ruby I got used to writing complex expressions by chaining/piping them left-to-right: some_object.map { some_expression }.select { another_expression }. It's really convenient: you move from the input value to the result step by step, you can focus on a single transformation, and you don't need to move the cursor as you type. Contrary to that, when I write nested expressions in Clojure, I write the code from the inner expression to the outer one and I have to move the cursor constantly. It slows me down and distracts me. I know about the -> and ->> macros, but I've noticed that they don't seem to be idiomatic. Did you have the same problem when you started coding in Clojure/Haskell etc.? How did you solve it?

I felt the same about Lisps initially, so I feel your pain :-)
However, the good news is that with a bit of time and regular usage you will probably start to like prefix notation. In fact, with the exception of mathematical expressions, I now prefer it to infix style.
Reasons to like prefix notation:
Consistency with functions - most languages use a mix of infix (mathematical operators) and prefix (function call) notation. In Lisps it is all consistent, which has a certain elegance if you consider mathematical operators to be functions.
Macros - these become much saner to write if the function call is always in the first position.
Varargs - it's nice to be able to have a variable number of parameters for pretty much all of your operators. (+ 1 2 3 4 5) is nicer IMHO than 1 + 2 + 3 + 4 + 5
A trick, then, is to use -> and ->> liberally when it makes logical sense to structure your code this way. This is typically useful when dealing with successive operations on objects or collections, e.g.:
(->>
  "Hello World"
  distinct
  sort
  (take 3))
==> (\space \H \W)
The final trick I found very useful when working in prefix style is to make good use of indentation when building more complex expressions. If you indent properly, then you'll find that prefix notation is actually quite clear to read:
(defn add-foobars [x y]
  (+
    (bar x y)
    (foo y)
    (foo x)))

To my knowledge -> and ->> are idiomatic in Clojure. I use them all the time, and in my opinion they usually lead to much more readable code.
Here are some examples of these macros being used in popular projects from around the Clojure "ecosystem":
Ring cookie parsing
Leiningen internals
ClojureScript compiler
Proof by example :)

If you have a long expression chain, use let. Long runaway expressions or deeply nested expressions are not especially readable in any language. This is bad:
(do-something (map :id (filter #(> (:age %) 19) (fetch-data :people))))
This is marginally better:
(do-something (map :id
                   (filter #(> (:age %) 19)
                           (fetch-data :people))))
But this is also bad:
fetch_data(:people).select{|x| x.age > 19}.map{|x| x.id}.do_something
If we're reading this, what do we need to know? We're calling do_something on some attributes of some subset of people. This code is hard to read because there's so much distance between the first and last parts that we forget what we're looking at by the time we travel between them.
In the case of Ruby, do_something (or whatever is producing our final result) is lost way at the end of the line, so it's hard to tell what we're doing to our people. In the case of Clojure, it's immediately obvious that do-something is what we're doing, but it's hard to tell what we're doing it to without reading through the whole thing to the inside.
Any code more complex than this simple example is going to become pretty painful. If all of your code looks like this, your neck is going to get tired scanning back and forth across all of these lines of spaghetti.
I'd prefer something like this:
(let [people (fetch-data :people)
      adults (filter #(> (:age %) 19) people)
      ids    (map :id adults)]
  (do-something ids))
Now it's obvious: I start with people, I goof around, and then I do-something to them.
And you might get away with this:
fetch_data(:people).select{|x|
  x.age > 19
}.map{|x|
  x.id
}.do_something
But I'd probably rather do this, at the very least:
adults = fetch_data(:people).select{|x| x.age > 19}
do_something( adults.map{|x| x.id} )
It's also not unheard of to use let even when your intermediary expressions don't have good names. (This style is occasionally used in Clojure's own source code, e.g. the source code for defmacro)
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))
This can be a big help in debugging, because you can inspect things at any point by doing:
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      _ (prn x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))

I did indeed hit the same hurdle when I first started with a Lisp, and it was really annoying until I saw the ways it makes code simpler and clearer. Once I understood the upside, the annoyance faded.
initial + scale + offset
became
(+ initial scale offset)
and then try (+): prefix notation allows functions to specify their own identity values:
user> (*)
1
user> (+)
0
There are lots more examples and my point is NOT to defend prefix notation. I just hope to convey that the learning curve flattens (emotionally) as the positive sides become apparent.
Of course, once you start writing macros, prefix notation becomes a must-have instead of a convenience.
To address the second part of your question: the thread-first and thread-last macros are idiomatic any time they make the code clearer :) They are used more often on function calls than on pure arithmetic, though nobody will fault you for using them when they make the equation more palatable.
PS: (.. object object2 object3) corresponds to object.object2().object3(); in Java, and
(doto my-object
  (.setX 4)
  (.setY 5))
calls each method on my-object in turn.

Related

Performance tools for Erlang

When writing a function like factorial:
fac(Val) when is_integer(Val) ->
    Visit = fun (X, _F) when X < 2 ->
                    1;
                (X, F) ->
                    X * F(X - 1, F)
            end,
    Visit(Val, Visit).
one cannot help but notice that tail call optimization is not straightforward. However, writing it in continuation passing style is:
fac_cps(Val) when is_integer(Val) ->
    Visit = fun (X, _F, K) when X < 2 ->
                    K(1);
                (X, F, K) ->
                    F(X - 1, F, fun (Y) -> K(X * Y) end)
            end,
    Visit(Val, Visit, fun (X) -> X end).
Or perhaps even defunctionalized:
fac_cps_def_lambdas({lam0}, X) ->
    X;
fac_cps_def_lambdas({lam1, X, K}, Y) ->
    fac_cps_def_lambdas(K, X*Y).

fac_cps_def(X) when is_integer(X) ->
    fac_cps_def(X, {lam0}).

fac_cps_def(X, K) when X < 2 ->
    fac_cps_def_lambdas(K, 1);
fac_cps_def(X, K) ->
    fac_cps_def(X-1, {lam1, X, K}).
Timing these three implementations I found that execution time is, as expected, the same.
My question is, is there a way to get more detailed knowledge than this?
How do I for instance get the memory usage of executing the function - am I avoiding any stack memory at all?
What are the standard tools for inspecting these sorts of things?
The questions are, again: how do I measure the stack heights of the functions, how do I determine the memory usage of a function call for each of them, and finally, which one is best?
My solution is to just inspect the code with my eyes. Over time, you learn to spot if the code is in tail-call style. Usually, I don't care too much about it, unless I know the size of the structure passing through that code to be huge.
It is just by intuition for me. You can inspect the stack size of a process with erlang:process_info/2. You can inspect the runtime with fprof. But I only do it as a last resort fix.
This doesn't answer your question, but why have you written the code like that? It is not very Erlangy. You generally don't use explicit CPS unless there is a specific reason for it; it is normally not needed.
As #IGIVECRAPANSWERS says, you soon learn to see tail calls, and there are very few cases where you actually MUST use them.
EDIT: A comment on the comment. No, there is no direct way of checking whether the compiler has applied LCO or not. It does exactly what you tell it to and assumes you know what you are doing, and why. :-) However, you can be certain that it applies LCO when it can, but that is about it. The only way to check is to look at the stack size of a process and see whether it is growing or not. Unfortunately, if you get it wrong in just the right place, the process can grow very slowly and be hard to detect except over a long period of time.
But again there are very few places where you really need to get the LCO right.
P.S. You use the term LCO (Last Call Optimisation) which is what I learnt way back when. Now, however, "they" seem to use TCO (Tail Call Optimisation) instead. That's progress. :-)

Lambda Calculus Expression Test-bed?

I would like to test out the Lambda Calculus interpreter that I've written against a fairly large test set of Lambda Calculus expressions. Does anyone know of a Lambda Calc expression generator I can use (couldn't find anything upon an initial search on Google)? These expressions would obviously have to be properly formed.
Better yet, while I have created various examples myself and worked out the solutions so I could check the results, does anyone know of a good (and large) set of worked out Lambda Calculus reduction problems with solutions? I can type in the expressions myself so it's more important to just have a good variety of simpler (and larger) lambda calculus expressions upon which I can test my interpreter (which at the moment models Normal Order and Call by Name evaluation strategies).
Any help or guidance would be greatly appreciated.
Asperti and Guerrini (1998, The Optimal Implementation of Functional Programming Languages, CUP Press; see especially chapters 5 and 6) describe some of the more painful lambda terms that arise from Jean-Jacques Levy's theory of families of redexes and labelled reduction: these give measures of the complexity of interactions between colliding beta reductions, where reducing either redex creates work for the other.
A relatively simple example of colliding reductions is:
let D = λx.(x x); F = λf.(f (f y)); and I = λx.x in
  (D (F I))
which has two beta-redexes and reduces to (y y): reduce either one of them by regular substitution and you will create two new redexes, each of which is related to a piece of structure in the original term.
Iterating Church numerals is good in the same way:
let T = λfx. f (f x) in
  λfx.(T (T (T (T T))) f x)
(which reduces to the Church numeral for 65,536), and which generates a lot of colliding redexes.
Generally, applying higher-order terms to each other, regardless of whether they are "well-typed" or make obvious sense, is a good source of hard work that generates complex intermediate structure.
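If it helps to feed these into an interpreter, here is a hedged sketch of how the example terms above might be encoded in Haskell; the Term datatype and the names d, f, i, t, test1, test2 are my own illustration, not part of the answer itself.

-- A possible encoding of untyped lambda terms, plus the colliding-redex examples above.
data Term = Var String
          | Lam String Term
          | App Term Term
          deriving (Eq, Show)

d, f, i, t :: Term
d = Lam "x" (App (Var "x") (Var "x"))                             -- D = λx.(x x)
f = Lam "f" (App (Var "f") (App (Var "f") (Var "y")))             -- F = λf.(f (f y)), y free
i = Lam "x" (Var "x")                                             -- I = λx.x
t = Lam "f" (Lam "x" (App (Var "f") (App (Var "f") (Var "x"))))   -- T = λf.λx. f (f x)

test1, test2 :: Term
test1 = App d (App f i)                                           -- (D (F I)), reduces to (y y)
test2 = Lam "f" (Lam "x"
          (App (App (App t (App t (App t (App t t)))) (Var "f"))
               (Var "x")))                                        -- iterated Church numerals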

If I come from an imperative programming background, how do I wrap my head around the idea of no dynamic variables to keep track of things in Haskell?

So I'm trying to teach myself Haskell. I am currently on the 11th chapter of Learn You a Haskell for Great Good and am doing the 99 Haskell Problems as well as the Project Euler Problems.
Things are going alright, but I find myself constantly doing something whenever I need to keep track of "variables". I just create another function that accepts those "variables" as parameters and recursively feed it different values depending on the situation. To illustrate with an example, here's my solution to Problem 7 of Project Euler, Find the 10001st prime:
answer :: Integer
answer = nthPrime 10001

nthPrime :: Integer -> Integer
nthPrime n
  | n < 1     = -1
  | otherwise = nthPrime' n 1 2 []

nthPrime' :: Integer -> Integer -> Integer -> [Integer] -> Integer
nthPrime' n currentIndex possiblePrime previousPrimes
  | isFactorOfAnyInThisList possiblePrime previousPrimes = nthPrime' n currentIndex theNextPossiblePrime previousPrimes
  | otherwise =
      if currentIndex == n
        then possiblePrime
        else nthPrime' n currentIndexPlusOne theNextPossiblePrime previousPrimesPlusCurrentPrime
  where currentIndexPlusOne = currentIndex + 1
        theNextPossiblePrime = nextPossiblePrime possiblePrime
        previousPrimesPlusCurrentPrime = possiblePrime : previousPrimes
I think you get the idea. Let's also just ignore the fact that this solution can be made to be more efficient, I'm aware of this.
So my question is kind of a two-part question. First, am I going about Haskell all wrong? Am I stuck in the imperative programming mindset and not embracing Haskell as I should? And if so, as I feel I am, how do I avoid this? Is there a book or source you can point me to that might help me think more Haskell-like?
Your help is much appreciated,
-Asaf
Am I stuck in the imperative programming mindset and not embracing Haskell as I should?
You are not stuck, or at least I hope not. What you experience is absolutely normal. While you were working with imperative languages you learned (maybe without knowing it) to see programming problems from a very specific perspective, namely in terms of the von Neumann machine.
If you have the problem of, say, making a list that contains some sequence of numbers (let's say we want the first 1000 even numbers), you immediately think of: a linked list implementation (perhaps from the standard library of your programming language), a loop, and a variable that you'd set to a starting value; then you would loop for a while, updating the variable by adding 2 and appending it to the end of the list.
See how you mostly think to serve the machine? Memory locations, loops, etc.!
In imperative programming, one thinks all the time about how to manipulate certain memory cells in a certain order to arrive at the solution. (This is, by the way, one reason why beginners find learning (imperative) programming hard. Non-programmers are simply not used to solving problems by reducing them to a sequence of memory operations. Why should they be? But once you've learned that, you have the power - in the imperative world. For functional programming you need to unlearn that.)
In functional programming, and especially in Haskell, you merely state the construction law of the list. Because a list is a recursive data structure, this law is of course also recursive. In our case, we could, for example say the following:
constructStartingWith n = n : constructStartingWith (n+2)
And we're almost done! To arrive at our final list we only have to say where to start and how many elements we want:
result = take 1000 (constructStartingWith 0)
Note that a more general version of constructStartingWith is available in the library; it is called iterate, and it takes not only the starting value but also the function that makes the next list element from the current one:
iterate f n = n : iterate f (f n)
constructStartingWith = iterate (2+) -- defined in terms of iterate
Another approach is to assume that we had another list our list could easily be made from. For example, if we had the list of the first n integers, we could easily turn it into the list of even integers by multiplying each element by 2. Now, the list of the first 1000 (non-negative) integers in Haskell is simply
[0..999]
And there is a function map that transforms lists by applying a given function to each element. The function we want doubles its argument:
double n = 2*n
Hence:
result = map double [0..999]
Later you'll learn more shortcuts. For example, we don't need to define double, but can use a section, (2*), or we could write our list directly as an arithmetic sequence, [0,2..1998].
But not knowing these tricks yet should not make you feel bad! The main challenge you are facing now is to develop a mentality where you see that the problem of constructing the list of the first 1000 even numbers is a two-staged one: a) define what the list of all even numbers looks like and b) take a certain portion of that list. Once you start thinking that way you're done, even if you still use hand-written versions of iterate and take.
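(For comparison, a hand-written take could look like the sketch below; the real take lives in the Prelude, and take' is just an illustrative name.)

take' :: Int -> [a] -> [a]
take' n _ | n <= 0 = []
take' _ []         = []
take' n (x:xs)     = x : take' (n - 1) xs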
Back to the Euler problem: here we can use the top-down method (and a few basic list manipulation functions one should indeed know about: head, drop, filter, any). First, if we had the list of primes already, we could just drop the first 1000 and take the head of the rest to get the 1001st one:
result = head (drop 1000 primes)
We know that after dropping any number of elements from an infinite list, there will still remain a non-empty list to pick the head from; hence, the use of head is justified here. When you're unsure whether there are more than 1000 primes, you should write something like:
result = case drop 1000 primes of
  []    -> error "The ancient greeks were wrong! There are less than 1001 primes!"
  (r:_) -> r
Now for the hard part. Not knowing how to proceed, we could write some pseudo code:
primes = 2 : {-an infinite list of numbers that are prime-}
We know for sure that 2 is the first prime, the base case, so to speak, thus we can write it down. The unfilled part gives us something to think about. For example, the list should start at some value that is greater than 2, for obvious reasons. Hence, refined:
primes = 2 : {- something like [3..] but only the ones that are prime -}
Now, this is the point where a pattern emerges that one needs to learn to recognize. This is surely a list filtered by a predicate, namely prime-ness. (It does not matter that we don't yet know how to check prime-ness; the logical structure is the important point, and we can be sure that a test for prime-ness is possible!) This allows us to write more code:
primes = 2 : filter isPrime [3..]
See? We are almost done. In 3 steps, we have reduced a fairly complex problem in such a way that all that is left to write is a quite simple predicate.
Again, we can write in pseudocode:
isPrime n = {- false if any number in 2..n-1 divides n, otherwise true -}
and can refine that. Since this is almost Haskell already, it is almost too easy:
isPrime n = not (any (divides n) [2..n-1])
divides n p = n `rem` p == 0
Note that we have not done any optimization yet. For example, we could construct the list to be filtered so that it contains only odd numbers right away, since we know that even numbers are not prime. More importantly, we want to reduce the number of candidates we have to try in isPrime. And here some mathematical knowledge is needed (the same would be true if you programmed this in C++ or Java, of course), which tells us that it suffices to check whether the n we are testing is divisible by any prime number, and that we do not need to check divisibility by prime numbers whose square is greater than n. Fortunately, we have already defined the list of prime numbers and can pick the set of candidates from there! I leave this as an exercise.
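(For reference, one way the refined version could look is sketched below; skip it if you would rather do the exercise yourself. It simply combines the ideas just described: odd candidates only, and only prime divisors whose square does not exceed n.)

primes      = 2 : filter isPrime [3,5..]

isPrime n = not (any (divides n) candidates)
  where candidates = takeWhile (\p -> p * p <= n) primes

divides n p = n `rem` p == 0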
You'll learn later how to use the standard library and syntactic sugar like sections, list comprehensions, etc., and you will gradually give up writing your own basic functions.
Even later, when you have to do something in an imperative programming language again, you'll find it very hard to live without infinite lists, higher order functions, immutable data, etc.
This will be as hard as going back from C to Assembler.
Have fun!
It's ok to have an imperative mindset at first. With time you will get more used to things and start seeing the places where you can have more functional programs. Practice makes perfect.
As for working with mutable variables, you can kind of keep them for now if you follow the rule of thumb of converting variables into function parameters and iteration into tail recursion.
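A minimal sketch of that rule of thumb (the example itself is made up): the loop counter and running total that would be mutable variables in an imperative language become parameters of a helper that calls itself in tail position.

-- sum of 1..n with an explicit accumulating parameter
sumTo :: Int -> Int
sumTo n = loop 1 0
  where
    loop i total
      | i > n     = total
      | otherwise = loop (i + 1) (total + i)   -- "i++" and "total += i" become new arguments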
Off the top of my head:
Typeclassopedia. The official v1 of the document is a pdf, but the author has moved his v2 efforts to the Haskell wiki.
What is a monad? This SO Q&A is the best reference I can find.
What is a Monad Transformer? Monad Transformers Step by Step.
Learn from masters: Good Haskell source to read and learn from.
More advanced topics such as GADTs. There's a video, which does a great job explaining it.
And last but not least, the #haskell IRC channel. Nothing can come close to talking to real people.
I think the big change from your code to more Haskell-like code is making better use of higher order functions, pattern matching and laziness. For example, you could write the nthPrime function like this (using a similar algorithm to what you did, again ignoring efficiency):
nthPrime n = primes !! (n - 1) where
  primes = filter isPrime [2..]
  isPrime p = isPrime' p [2..p - 1]
  isPrime' p [] = True
  isPrime' p (x:xs)
    | p `mod` x == 0 = False
    | otherwise      = isPrime' p xs
E.g. nthPrime 4 returns 7. A few things to note:
The isPrime' function uses pattern matching to implement the function, rather than relying on if statements.
The primes value is an infinite list of all primes. Since Haskell is lazy, this is perfectly acceptable.
filter is used rather than reimplementing that behaviour using recursion.
With more experience you will find you write more idiomatic Haskell code - it sort of happens automatically with experience. So don't worry about it, just keep practicing, and keep reading other people's code.
Another approach, just for variety! Strong use of laziness...
module Main where

nonmults :: Int -> Int -> [Int] -> [Int]
nonmults n next [] = []
nonmults n next l@(x:xs)
  | x < next  = x : nonmults n next xs
  | x == next = nonmults n (next + n) xs
  | otherwise = nonmults n (next + n) l

select_primes :: [Int] -> [Int]
select_primes [] = []
select_primes (x:xs) =
  x : (select_primes $ nonmults x (x + x) xs)

main :: IO ()
main = do
  let primes = select_primes [2 ..]
  putStrLn $ show $ primes !! 10000   -- the first prime is index 0 ...
I want to try to answer your question without using ANY functional programming or math, not because I don't think you will understand it, but because your question is very common and maybe others will benefit from the mindset I will try to describe. I'll preface this by saying I am not a Haskell expert by any means, but I have gotten past the mental block you have described by realizing the following:
1. Haskell is simple
Haskell, and the other functional languages that I'm less familiar with, are certainly very different from your 'normal' languages, like C, Java, Python, etc. Unfortunately, the way our psyche works, humans prematurely conclude that if something is different, then A) they don't understand it, and B) it's more complicated than what they already know. If we look at Haskell objectively, we will see that these two conjectures are totally false:
"But I don't understand it :("
Actually you do. Everything in Haskell and other functional languages is defined in terms of logic and patterns. If you can answer a question as simple as "If all Meeps are Moops, and all Moops are Moors, are all Meeps Moors?", then you could probably write the Haskell Prelude yourself. To further support this point, consider that Haskell lists are defined in Haskell terms, and are not special voodoo magic.
"But it's complicated"
It's actually the opposite. Its simplicity is so naked and bare that our brains have trouble figuring out what to do with it at first. Compared to other languages, Haskell actually has considerably fewer "features" and much less syntax. When you read through Haskell code, you'll notice that almost all the function definitions look the same stylistically. This is very different from, say, Java, which has constructs like classes, interfaces, for loops, try/catch blocks, anonymous functions, etc., each with their own syntax and idioms.
You mentioned $ and .; again, just remember they are defined just like any other Haskell function and don't necessarily ever need to be used. However, if you didn't have them available, over time you would likely implement these functions yourself once you noticed how convenient they can be.
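(For what it's worth, their Prelude definitions are essentially this small; they are reproduced below for illustration, with a hiding clause only to avoid the name clash.)

import Prelude hiding ((.), ($))

-- function composition
(.) :: (b -> c) -> (a -> b) -> a -> c
f . g = \x -> f (g x)

-- low-precedence function application
($) :: (a -> b) -> a -> b
f $ x = f x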
2. There is no Haskell version of anything
This is actually a great thing, because in Haskell, we have the freedom to define things exactly how we want them. Most other languages provide building blocks that people string together into a program. Haskell leaves it up to you to first define what a building block is, before building with it.
Many beginners ask questions like "How do I do a for loop in Haskell?" and innocent people who are just trying to help will give an unfortunate answer, probably involving a helper function, an extra Int parameter, and tail recursing until you get to 0. Sure, this construct can compute something like a for loop, but in no way is it a for loop, it's not a replacement for a for loop, and in no way is it really even similar to a for loop if you consider the flow of execution. Similar is the State monad for simulating state. It can be used to accomplish similar things as static variables do in other languages, but in no way is it the same thing. Most people leave off the last tidbit about it not being the same when they answer these kinds of questions, and I think that only confuses people more until they realize it on their own.
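(To make that concrete, the construct being described usually looks something like the sketch below; the example is hypothetical. It computes what a for loop would compute but, as said above, it is not a for loop.)

-- "for (i = n; i > 0; i--) total += i * i" re-expressed as a helper with an
-- extra Int parameter that tail-recurses down to 0
sumOfSquares :: Int -> Int
sumOfSquares n = go n 0
  where
    go i acc
      | i <= 0    = acc
      | otherwise = go (i - 1) (acc + i * i)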
3. Haskell is a logic engine, not a programming language
This is probably the least true of the points I'm trying to make, but hear me out. In imperative programming languages, we are concerned with making our machines do stuff, perform actions, change state, and so on. In Haskell, we try to define what things are and how they are supposed to behave. We are usually not concerned with what something is doing at any particular time. This certainly has benefits and drawbacks, but that's just how it is. It is very different from what most people think of when you say "programming language".
So that's my take on how to leave an imperative mindset and move to a more functional one. Realizing how sensible Haskell is will help you stop looking at your own code funny. Hopefully thinking about Haskell in these ways will help you become a more productive Haskeller.

What is the 'accumulator' in HQ9+?

I was just reading a bit about the HQ9+ programming language:
https://esolangs.org/wiki/HQ9+,
https://en.wikipedia.org/wiki/HQ9+, and
https://cliffle.com/esoterica/hq9plus,
and it tells me something about a so-called “accumulator” which can be incremented, but not accessed. Also, using + doesn't manipulate the result, so that the input
H+H
gives the result:
Hello World
Hello World
Can anyone explain to me how this works, what it does, and whether it makes any sense? Thanks.
Having recently completed an implementation in Clojure (which follows), I can safely say that the accumulator is absolutely central to a successful implementation of HQ9+. Without it one would be left with an implementation of HQ9 which, while doubtless worthy in and of itself, is clearly different; HQ9+ without an accumulator, and the instruction to increment it, would thus NOT be an implementation of HQ9+.
(Editor's note: Bob has taken his meds today but they haven't quite kicked in yet; thus, further explanation is perhaps needed. What I believe Bob is trying to say is that HQ9+ is useless as a programming language, per se; however, implementing it can actually be useful in the context of learning how to implement something successfully in a new language. OK, I'll just go and curl up quietly in the back of Bob's brain now and let him get back to doing...whatever it is he does when I'm not minding the store...).
Anyways...implementation in Clojure follows:
(defn hq9+
  "HQ9+ interpreter"
  [& args]
  (loop [program (apply concat args)
         accumulator 0]
    (when (not (empty? program))
      (case (first program)
        \H (println "Hello, World!")
        \Q (println (apply str args)) ; Q prints the program's own source
        \9 (apply println (map #(str % " bottles of beer on the wall, "
                                     % " bottles of beer, if one of those bottles should happen to fall, "
                                     (if (> % 0) (- % 1) 99) " bottles of beer on the wall")
                               (reverse (range 100))))
        \+ nil ; the accumulator is incremented below; its value can never be observed
        (println "invalid instruction: " (first program))) ; default case
      (recur (rest program)
             (if (= \+ (first program)) (inc accumulator) accumulator)))))
Note that this implementation only accepts commands passed into the function as parameters; it doesn't read a file for its program. This may be remedied in future releases. Also note that this is a "strict" implementation of the language - the original page (at the Wayback Machine) clearly shows that only UPPER CASE 'H's and 'Q's should be accepted, although it implies that lower-case letters may also be accepted. Since part of the point of implementing any programming language is to adhere strictly to the specification as written, this version of HQ9+ accepts only upper-case letters. Should the need arise, I am fully prepared to found a religion, tentatively named the CONVOCATION OF THE HOLY CAPS LOCK, which will declare the use of upper-case to be COMMANDED BY FRED (our god - Fred - it seems like such a friendly name for a god, doesn't it?), and will deem the use of lower-case letters to be anathema...I MEAN, TO BE ANATHEMA!
Share and enjoy.
Having written an implementation, I think I can say without a doubt that it makes no sense at all. I advise you to not worry about it; it's a very silly language after all.
It's a joke.
There's also an object-oriented extension of HQ9+, called HQ9++. It has a new command ++ which instantiates an object, and, for reasons of backwards-compatibility, also increments the accumulator register twice. And again, since there is no way to store, retrieve, access, manipulate, print or otherwise affect an object, it's completely useless.
It increments something not accessible, not spec-defined, and apparently not really even used. I'd say you can implement it however you want or possibly not at all.
The right answer is one that has been hinted at by the other answers but not quite stated explicitly: the effect of incrementing the accumulator is undefined by the language specification and left as a choice of the implementation.
Actually, I am mistaken.
The accumulator is the register in which the result of the last calculation is stored. On an Intel x86, any register may be specified as the accumulator, except in the case of MUL.
Source:
http://en.wikipedia.org/wiki/Accumulator_(computing)
I was quite surprised, the first time I visited the third site in your question, to find out that a schoolmate of mine wrote the OCaml implementation at the bottom of the page.
(updated site link)
I think that there is, or there must be, a reason for this accumulator and for the most important operation on it, increment: future compatibility.
Very often we see that a language is invented, often inspired by some other language, with of course some salt (new concepts, or at least some improvements). Later, when the language spreads, problems arise and modifications, additions or whatever are introduced. That is the same as saying "we were wrong, this thing was necessary but we didn't think of it at the time".
Well, this accumulator idea in HQ9+ is exactly the opposite. In the future, when the language has spread, nobody will be able to say "we need an accumulator, but HQ9+ lacks it", because the standard of the language, even in its first draft, states that an accumulator is present and that it can even be modified (otherwise, it would be nonsense).

(x86) Assembler Optimization

I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows.
High-level concepts (I do not have any "source code": there is no syntax or lexical translation, and all languages are regular) are translated into opcodes, which are then wrapped and output to a file. The translation process has several phases, one of which is translation between regular languages: the highest-level code is translated into medium-level code, which is then translated into the lowest-level code (there are probably more than 3 levels).
My problem is the following: if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation is, in pseudo-code:
x + U(f) // generated by X
+
V(f) + y // generated by Y
(An easy example) where V is the opposite of U (compare with a stack push as U and a pop as V). This needs to be 'optimized' into:
x + y
(essentially removing the "useless" code)
My idea was to use regular expressions. For the above case, it would be a regular expression looking like this: x:(U(x)+V(x)):null, meaning: for all x, find U(x) followed by V(x) and replace it with null. Imagine more complex regular expressions for more complex optimizations. This should work on all levels.
What do you suggest? What would be a good approach to optimize and produce fast x86 assembly?
What you should actually do is build an Abstract Syntax Tree (AST).
It is a representation of the source code in the form of a tree, that is much easier to work with, especially to make transformations and optimizations.
That code, represented as a tree, would be something like:
(+
  (+
    x
    (U f))
  (+
    (V f)
    y))
You could then try to make some transformations: a sum of sums is a sum of all the terms:
(+
  x
  (U f)
  (V f)
  y)
Then you could scan the tree and you could have the following rules:
(+ (U x) (V x)) = 0, for all x
(+ 0 x1 x2 ...) = (+ x1 x2 ...), for all x1, x2, ...
Then you would obtain what you are looking for:
(+ x y)
Any good book on compiler writing will discuss ASTs at length. Functional programming languages are especially well suited to this task, since in general it is easy to represent trees and to do pattern matching to parse and transform them.
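As a sketch of how that might look in a functional language (Haskell here; the Expr type and the function names are my own illustration, not a prescription):

data Expr = Var String | U Expr | V Expr | Sum [Expr]
  deriving (Eq, Show)

-- a sum of sums is a sum of all the terms
flatten :: Expr -> Expr
flatten (Sum es) = Sum (concatMap terms (map flatten es))
  where terms (Sum ts) = ts
        terms t        = [t]
flatten e        = e

-- (+ (U x) (V x)) cancels: drop adjacent U/V pairs over the same subterm
cancel :: Expr -> Expr
cancel (Sum ts) = Sum (go ts)
  where go (U a : V b : rest) | a == b = go rest
        go (t : rest)                  = t : go rest
        go []                          = []
cancel e        = e

-- (+ (+ x (U f)) (+ (V f) y))  ==>  (+ x (U f) (V f) y)  ==>  (+ x y)
example :: Expr
example = cancel (flatten (Sum [ Sum [Var "x", U (Var "f")]
                               , Sum [V (Var "f"), Var "y"] ]))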
Usually, for this task, you should avoid using regular expressions. Regular expressions define what mathematicians call regular languages. Any regular language can be parsed by a set of regular expressions. However, I think your language is not regular, so it cannot be properly parsed by regexps.
People try, and try, and try to parse languages such as HTML using regular expressions. This has been extensively discussed here on SO, and you cannot parse HTML with regular expressions. There will always be an exceptional case in which your regular expression fails, and you will have to adapt it.
It might be the same with your language: if it is not regular, you will avoid a lot of headaches by not trying to parse it (and especially not trying to "transform" it) using regular expressions.
I'm having a lot of trouble understanding this question, but I think you will find it useful to learn something about term-rewriting systems, which seem to be what you are proposing. Whether the mechanism is tree rewriting (always works) or regular expressions (works for some languages some of the time and for other languages all of the time) is of secondary importance.
It is definitely possible to optimize object code by term rewriting. You probably also will benefit from learning something about peephole optimization; a good place to start, because it is very strong on the fundamentals, is a paper by Davidson and Fraser on a retargetable peephole optimizer. There's also excellent later work by Benitez and Davidson.
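To make the peephole idea concrete, here is a hedged sketch (the instruction type is invented for illustration): a pass over a flat instruction list that deletes adjacent push/pop pairs, as in the U/V example above, repeated until nothing changes.

data Instr = Push String | Pop String | Other String
  deriving (Eq, Show)

peephole :: [Instr] -> [Instr]
peephole (Push a : Pop b : rest)
  | a == b          = peephole rest        -- push x; pop x  ==>  (nothing)
peephole (i : rest) = i : peephole rest
peephole []         = []

-- iterate to a fixed point so pairs exposed by earlier deletions are caught too
optimize :: [Instr] -> [Instr]
optimize xs = let ys = peephole xs in if ys == xs then xs else optimize ys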