One of the things I was determined to implement in BlitzMax right from the start was a decent 'register allocator'.
Register allocation is the process of mapping 'local' program variables (i.e. those variables declared 'Local' in BlitzMax) to CPU registers (i.e. the finite set of 'variables' inside the actual x86 CPU).
Blitz3D and BlitzPlus largely ignored this. Instead, all local variables were simply stored in memory (on the stack), and loaded into CPU registers as required. For example, Blitz3D code like this…
…would involve approximately the following steps:
1) Write the constant '1' into the memory location for variable 'x'.
2) Write the constant '2' into the memory location for variable 'y'.
3) Read the variables 'x' and 'y' from memory and add them together.
4) Store the result into the memory location for variable 'sum'.
That's 2 reads and 3 writes from/to system memory, and system memory is generally slow (even with fancy caches) when compared with the CPU (but I guess it'll be catching up soon…).
BlitzMax on the other hand will try to represent the variables x,y and sum using CPU registers, so equivalent BlitzMax code might involve something like…
1) Write the constant '1' to CPU register EAX.
2) Write the constant '2' to CPU register ECX.
3) Write the sum of EAX and ECX to CPU register EDX.
…none of which 'hits' memory so is theoretically much faster.
And, happily, this time at least, theory matches practice - BlitzMax code runs significantly faster than equivalent Blitz3D code in general, and this is almost completely due to BlitzMax's register allocator.
There is, however, one fairly major drawback to writing a register allocator - it's *@#!$^& hard! ESPECIALLY on the crappy Intel X86 which has only about 8 general purpose registers.
In particular, the trickiest thing to deal with is 'spillage' - handling the situation where there just aren't enough CPU registers to go round and some local variables MUST be 'spilled' to memory.
In this case, a spilled variable ends up behaving much like a Blitz3D/BlitzPlus variable above does - it must be constantly loaded/stored from/to memory. Therefore, the most important job of the register allocator becomes minimizing the number of spilt variables.
This is tricky because it can be hard to know precisely what effect spilling a variable will have. Spilling a variable may even cause a 'cascade' of more spills, because even though a spilled variable is stored in memory, it still needs to be moved into a CPU register at SOME point to be used for anything useful (e.g. arithmetic, memory addressing etc)!
And every now and then, someone will come up with a little piece of code in BlitzMax that causes this effect. The code will usually be small, involve about 8 or so local variables and will cause the compiler to lock up completely as the register allocator gets caught in an endless allocate/spill/allocate/spill loop...
In fact, it happened again just yesterday - hence this post!
I *think* I have it sorted now, but to be honest I seriously doubt I'll ever be able to say with 100% confidence that I know it works!