Nowadays with constexpr, we can make pure functions that are calculated compile time. The restriction on these functions is that they may only consist of one statement (though additional lines with typedefs, static_assert, etc are OK), and this statement may only call other constexpr functions.
If-statements are easy - just use the ternary ?: operator or, if you want to show off, template specialization. But the only way to loop with these restrictions is by using recursion. Furthermore, neither clang 3.1 nor GCC 4.7 (nor the current build of GCC 4.8) support tail call elimination in these constexprs, so normal linear loops will still eat stack space if we loop for a while. Also, the standard recommends that the default maximum recursion depth should be 512, which means that if we want to do something silly/fun/interesting compile time, we have to mess with proprietary compiler switches to get the program to compile. No fun.
In this post, I show one way of working around those limitations.
But first - why would one want to do constexpr meta programming, when we already have template meta programming? Constexprs and templates are almost exactly as powerful, except that template meta programming is lazy, which some might think fits better into a functional style. A more important difference is that template meta programming can work with types, which means that we can use it for cool type-level correctness stuff. So, once again, why would we use constexprs? 1) Constexprs are evaluated several times faster in the compilers I know of, which can be important. 2) The syntax is much nicer.
Simple recursion
Simple recursion is not a problem:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
|
This program will reach a depth of at the most sqrt(220), which is 210 = 1024, and since we check at most slightly less than half of those, we will be fine. Since 220 - 1 is not a prime number, we will be even more fine. However if we try check if 231 - 1 is prime, the compiler will refuse to do so. In this case we can fix it by increasing the maximum constexpr evaluation depth for our compiler, but in general that might take too much RAM, so lets do our own tail call elimination with continuation passing style and a trampoline instead.
This way, we have a trampoline function that takes a CPS function as an argument and loops, continously calling the continuation returned from the previous call.
Exponential recursion
In order to use a trampoline, we still need an infinite or near
infinite loop for the basic trampoline function. This is where
exponential recursion comes in. The recursion in rec_isprime
behaves
sort of like a linked list - each invokation links to the next. If
each rec_isprime
statement instead called two rec_isprime
, which in
turn called two other, etc, we would get a call graph that was more
like a binary tree than a linked list. This way we would not reach our
maximum recursion depth until we visited 2512 nodes and we would
never blow the stack!
Here is such a function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
This program will visit 224 - 1 = 16777215 nodes. On my normal desktop computer with clang 3.1 this program took 27 seconds to compile and the process uses no almost memory at all. With GCC 4.7.2 it compiles in 0.2 seconds! This is because GCC uses memoization for both templates and constexprs, so it will instantly see that the other count call is unneccessary and transform our 16 million node tree to a linked list of depth 24 again.
That is sort of cheating, so let’s change the function slightly:
1 2 3 4 5 6 7 8 9 10 11 12 |
|
Now memoization will not work. Clang takes 26 seconds and still uses no memory. GCC surprises us again, now taking 44 seconds and using over 3 gigs of RAM! If we increase MAXD to 25, GCC will again use twice as much RAM. I don’t know if this is because of memoization or something else. I have reported it as a bug to GCC.
Still though, it works!
The trampoline
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
|
As you can see, we are allowed to create objects with constexpr constructors in a constexpr, so we can create containers like linked lists and trees. You can create maps, (simulated) vectors and sets on top of trees, which means that anything is possible. In truth, prime number calculation is not very exciting, but this post is running long as it is, so a compile-time CPS implementation of Game of Life or something will have to wait for next time.
Now, imagine the cheating possibilities in The Computer Language Benchmarks Game ;). Yes, yes, I know it can be done in some other compiled languages as well. Template Haskell and the Lisps come to mind.