As Emacs is open source, we don’t need to speculate. We can just read the code. I’m lazy though, so I prefer to speculate 😉 If you were making a text editor, what would you use for your buffer type?
Array of Chars
The obvious, naive choice is an array of chars. But that means that, while append may be fast (until you need to resize your array at least), insertion is very slow. To insert a character in the middle of an array of chars, you need to move all of the characters following the insertion point up by one character. If we assume that insertions are evenly distributed, inserting a character will take, on average, time proportional to N/2.
Queue up Inserts
We can be a bit smarter than that. With normal text editor usage, for any given insert, we will probably insert a bunch of characters in the same place. We could append those characters into a preallocated array, and after a certain amount of time or number of characters, insert those characters into the array of chars. This brings the amortized speed cost down to N/M/2 where M is the average number of characters inserted in one go.
A Tree of Characters
So much for arrays. How about a tree? Assuming the tree, is balanced, insertion will take log2 N. The downside is that traversal is a lot slower (still order N, but a decent constant factor slower). And we are also likely to lose cache locality.
Side note: I often surprised that the main ‘scripting’ languages don’t provide much more than the equivalent of perl scalars, arrays and hashes. Sure, Python has tuples and Ruby has symbols (okay, I do miss symbols in Perl), but where are the built-in trees? Even Lisp has them. I had a quick look for trees in the CPAN and found Tree::Binary and Tree::RedBlack. I couldn’t find at a quick glance whether Tree::Binary self-balances or not. Hmmm…
The final simple option is a rope (PDF). This is a binary tree where the leaves, instead of being characters, are arrays of characters. This improves cache locality (depending on the average size of the array), and traversal speed, although it isn’t as fast as a simple array of characters.
What would you use?