Thursday, May 28, 2009

Computer performance vibes

There are a number of topics pertaining to computer performance that I want to learn more about. As an ex-EE, I should be keeping up with this stuff better.

Processors are fast, memory chips are slow. We put a cache between them so that the processor need not go out to memory on every read and write. There is a dense body of thought about cache design and optimization. I might blog about this stuff in future. It's a kinda heavy topic.

One way to make processors run really fast is to arrange steps in a pipeline. The CPU reads instruction one from instruction memory, and then it needs to read something from data memory, do an arithmetic operation on it, and put the result in a register. While reading from data memory, the CPU is simultaneously reading instruction two. While doing arithmetic, it's reading instruction three, and also doing the memory read for instruction two. And so forth, so that the processor is chopped up into a sequence of sub-processors, each busy all the time.

Apple has a nice, more detailed discussion, here.
But there is a complication with pipelining. Some of these instructions are branch instructions, which means that the next instruction could be either of two different ones. That's potentially a mess, because you've already got the pipeline full of stuff when you discover whether or not you're taking the branch, and you might find that the instructions you fetched were the wrong ones, so you have to go back and do all those same operations with a different sequence of instructions. Ick.
The work-around is branch prediction. The CPU tries to figure out, as accurately as it can, which sequence it will end up going down, and if all goes well, does that early enough to fill the pipeline correctly the first time. It doesn't have to be perfect, but it should try to guess correctly as often as possible.
A couple more things they're doing these days. One is to perform several memory transfers per cycle. Another is something Intel calls hyper-threading, where some of the CPU's registers are duplicated, allowing it to behave like two separate CPUs. This can be a win if one half is stalled waiting for a memory access; the other half just plows ahead.
That's all the vaguely intelligent stuff I have to say on this topic at the moment. Maybe I'll go into more detail in future, no promises.

No comments: