<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: What is the Datatype Behind an Emacs Buffer</title>
	<atom:link href="http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/feed/" rel="self" type="application/rss+xml" />
	<link>http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/</link>
	<description>Leveraging Perl and Emacs</description>
	<lastBuildDate>Tue, 07 May 2013 11:25:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: The Stig</title>
		<link>http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/#comment-8444</link>
		<dc:creator><![CDATA[The Stig]]></dc:creator>
		<pubDate>Sat, 30 Oct 2010 20:16:08 +0000</pubDate>
		<guid isPermaLink="false">http://curiousprogrammer.wordpress.com/?p=1278#comment-8444</guid>
		<description><![CDATA[Many versions of Emacs, including GNU Emacs, use a single contiguous character array virtually split into two sections separated by a gap. To insert, the gap is first moved to the insertion point. Inserted characters fill the gap, reducing its size. If there&#039;s insufficient space to hold the characters, the entire buffer is reallocated to a new larger size and the gaps are coalesced at the previous insertion point.

The naive look at this and say the performance must be poor because of all the copying involved. Wrong. The copy operation is incredibly quick and can be optimized in a variety of ways. Gap buffers also take advantage of usage patterns. You might jump all over the window before focusing and inserting text. The gap doesn&#039;t move for display – only for insert (or delete).

On the other hand, inserting a character block at the head of a 500MB file then inserting another at the end is the worst case for the gap approach, especially if the gap&#039;s size is exceeded. How often does that happen?

Contiguous memory blocks are prized in virtual memory environments because less paging is involved. Moreover, reads and writes are simplified because the file doesn&#039;t have to be parsed and broken up into some other data structure. Rather, the file&#039;s internal representation in the gap buffer is identical to the one on disk and can be read into and written out optimally. Writes themselves can be done with a single system call (on *nix).

The gap buffer is the best algorithm for editing text in a general way. It uses the least memory and has the highest aggregate performance over a variety of use cases. Translating the gap buffer to a visual window is a bit trickier as line context must be constantly maintained.]]></description>
		<content:encoded><![CDATA[<p>Many versions of Emacs, including GNU Emacs, use a single contiguous character array virtually split into two sections separated by a gap. To insert, the gap is first moved to the insertion point. Inserted characters fill the gap, reducing its size. If there&#8217;s insufficient space to hold the characters, the entire buffer is reallocated to a new larger size and the gaps are coalesced at the previous insertion point.</p>
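<p>The mechanics described above can be sketched roughly as follows. This is a minimal illustration in Python, not how Emacs itself is written: the class name <code>GapBuffer</code>, its methods, and the growth policy are all assumptions made for the example, and positions are not bounds-checked.</p>
<pre>
```python
class GapBuffer:
    """One contiguous list of characters with a movable gap (hypothetical sketch)."""

    def __init__(self, text="", gap_size=16):
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)      # first free slot
        self.gap_end = len(self.buf)    # one past the last free slot

    def _move_gap(self, pos):
        # Shift characters across the gap until the gap starts at pos.
        while self.gap_start > pos:     # gap moves left
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while pos > self.gap_start:     # gap moves right
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self, needed):
        # "Reallocate": widen the gap in place, keeping both halves in order.
        # Growing by at least the current size is an assumed policy.
        extra = max(needed, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = [None] * extra
        self.gap_end += extra

    def insert(self, pos, text):
        self._move_gap(pos)
        if len(text) > self.gap_end - self.gap_start:
            self._grow(len(text))
        for ch in text:                 # inserted characters fill the gap
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def delete(self, pos, count):
        # Deleting only widens the gap over the removed characters.
        self._move_gap(pos)
        self.gap_end = min(self.gap_end + count, len(self.buf))

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```
</pre>
<p>In this sketch, repeated inserts at the same position never move the gap, so only the new characters are written. Copying happens only when the insertion point changes or the gap runs out.</p>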
<p>The naive look at this and say the performance must be poor because of all the copying involved. Wrong. The copy operation is incredibly quick and can be optimized in a variety of ways. Gap buffers also take advantage of usage patterns. You might jump all over the window before focusing and inserting text. The gap doesn&#8217;t move for display – only for insert (or delete).</p>
<p>On the other hand, inserting a character block at the head of a 500MB file then inserting another at the end is the worst case for the gap approach, especially if the gap&#8217;s size is exceeded. How often does that happen?</p>
<p>Contiguous memory blocks are prized in virtual memory environments because less paging is involved. Moreover, reads and writes are simplified because the file doesn&#8217;t have to be parsed and broken up into some other data structure. Rather, the file&#8217;s internal representation in the gap buffer is identical to the one on disk and can be read into and written out optimally. Writes themselves can be done with a single system call (on *nix).</p>
<p>The gap buffer is the best algorithm for editing text in a general way. It uses the least memory and has the highest aggregate performance over a variety of use cases. Translating the gap buffer to a visual window is a bit trickier as line context must be constantly maintained.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan</title>
		<link>http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/#comment-8387</link>
		<dc:creator><![CDATA[Nathan]]></dc:creator>
		<pubDate>Thu, 30 Sep 2010 15:28:44 +0000</pubDate>
		<guid isPermaLink="false">http://curiousprogrammer.wordpress.com/?p=1278#comment-8387</guid>
		<description><![CDATA[Well, you can ask the vi guys why they use linked lists.  (I think they do, though I haven&#039;t checked.)

Text editing algorithms (movement, display, etc.) are somewhat more straightforward if you already know where your lines break.

Ropes are significantly more complicated than a simple linked list of lines, particularly if you&#039;re working in a language without GC.  I can see memory fragmentation becoming a problem for ropes, too.

It&#039;d be interesting to look at text editors released over the last decade or so to see what their internal model is.  I doubt any of them use anything more sophisticated than linked lists, or perhaps Emacs&#039;s gap buffer.  I&#039;d be surprised if ropes were used.]]></description>
		<content:encoded><![CDATA[<p>Well, you can ask the vi guys why they use linked lists.  (I think they do, though I haven&#8217;t checked.)</p>
<p>Text editing algorithms (movement, display, etc.) are somewhat more straightforward if you already know where your lines break.</p>
<p>Ropes are significantly more complicated than a simple linked list of lines, particularly if you&#8217;re working in a language without GC.  I can see memory fragmentation becoming a problem for ropes, too.</p>
<p>It&#8217;d be interesting to look at text editors released over the last decade or so to see what their internal model is.  I doubt any of them use anything more sophisticated than linked lists, or perhaps Emacs&#8217;s gap buffer.  I&#8217;d be surprised if ropes were used.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jared</title>
		<link>http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/#comment-8381</link>
		<dc:creator><![CDATA[Jared]]></dc:creator>
		<pubDate>Wed, 29 Sep 2010 22:02:04 +0000</pubDate>
		<guid isPermaLink="false">http://curiousprogrammer.wordpress.com/?p=1278#comment-8381</guid>
		<description><![CDATA[Hi Nathan,

Thanks for the comment.

Sure I could implement almost all of the basic datatypes if I wanted to (in fact I&#039;ve partially implemented &lt;a href=&quot;http://curiousprogrammer.wordpress.com/2010/05/01/programming-with-types/&quot; rel=&quot;nofollow&quot;&gt;extensible vectors&lt;/a&gt; in elisp).  I could also implement my own compiler, or database, or socket library or whatever.  But why should I?  A huge reason I use an imperfect, yet good enough language like Perl is the excellent libraries it provides.  If I want a libraryless language, I know where to find Scheme.

I don&#039;t agree that trees are much like hash tables, but that may be worth a whole post.

And yes, a linked list of lines would, in some ways, be better than an array.  But is there any way it would be better than a rope?]]></description>
		<content:encoded><![CDATA[<p>Hi Nathan,</p>
<p>Thanks for the comment.</p>
<p>Sure I could implement almost all of the basic datatypes if I wanted to (in fact I&#8217;ve partially implemented <a href="http://curiousprogrammer.wordpress.com/2010/05/01/programming-with-types/" rel="nofollow">extensible vectors</a> in elisp).  I could also implement my own compiler, or database, or socket library or whatever.  But why should I?  A huge reason I use an imperfect, yet good enough language like Perl is the excellent libraries it provides.  If I want a libraryless language, I know where to find Scheme.</p>
<p>I don&#8217;t agree that trees are much like hash tables, but that may be worth a whole post.</p>
<p>And yes, a linked list of lines would, in some ways, be better than an array.  But is there any way it would be better than a rope?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan</title>
		<link>http://curiousprogrammer.wordpress.com/2010/09/28/emacs-buffer-datatype/#comment-8380</link>
		<dc:creator><![CDATA[Nathan]]></dc:creator>
		<pubDate>Wed, 29 Sep 2010 01:30:49 +0000</pubDate>
		<guid isPermaLink="false">http://curiousprogrammer.wordpress.com/?p=1278#comment-8380</guid>
		<description><![CDATA[I think most languages don&#039;t include trees because hash tables are close enough for most purposes.  You don&#039;t usually need to traverse key/value pairs in the map in sorted order, which is about the only thing a tree buys you over a hash.  And if you know that you care about sortedness, you probably know enough to go implement your own.

One other obvious datatype is to use a linked list of lines; that makes copying characters around for inserts significantly less painful, as your N is only the number of characters in a line, rather than the size of your buffer.]]></description>
		<content:encoded><![CDATA[<p>I think most languages don&#8217;t include trees because hash tables are close enough for most purposes.  You don&#8217;t usually need to traverse key/value pairs in the map in sorted order, which is about the only thing a tree buys you over a hash.  And if you know that you care about sortedness, you probably know enough to go implement your own.</p>
<p>One other obvious datatype is to use a linked list of lines; that makes copying characters around for inserts significantly less painful, as your N is only the number of characters in a line, rather than the size of your buffer.</p>
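<p>A rough sketch of the linked-list-of-lines idea, in Python. The names <code>Line</code>, <code>from_text</code>, <code>nth</code>, <code>insert</code> and <code>to_text</code> are made up for illustration and are not taken from any particular editor.</p>
<pre>
```python
class Line:
    """One node in a singly linked list of lines (hypothetical names)."""

    def __init__(self, text, next=None):
        self.text = text
        self.next = next


def from_text(text):
    # Build the list back to front so each node can point at its successor.
    head = None
    for chunk in reversed(text.split("\n")):
        head = Line(chunk, head)
    return head


def nth(head, index):
    # Walking to a line costs one step per line, not per character.
    node = head
    for _ in range(index):
        node = node.next
    return node


def insert(head, line_no, col, new_text):
    # Only the edited line is copied; every other line is left untouched.
    node = nth(head, line_no)
    combined = node.text[:col] + new_text + node.text[col:]
    parts = combined.split("\n")
    node.text = parts[0]
    following = node.next
    for part in reversed(parts[1:]):    # link in any newly created lines
        following = Line(part, following)
    node.next = following


def to_text(head):
    out = []
    node = head
    while node is not None:
        out.append(node.text)
        node = node.next
    return "\n".join(out)
```
</pre>
<p>Here the cost of an insert depends on the length of the one line being edited, plus the walk to reach it, rather than on the size of the whole buffer.</p>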
]]></content:encoded>
	</item>
</channel>
</rss>
