The essay that follows was sent to r6rs-discuss@r6rs.org on Tue Jul 10 06:52:31 2007. On 25 July, I noticed that the version of this essay that was posted to the official archive of that list had been truncated; see

http://lists.r6rs.org/pipermail/r6rs-discuss/2007-July/003080.html

I am therefore reposting the complete essay, in exactly the form that I originally sent to r6rs-discuss@r6rs.org. In case the problem reoccurs, the complete essay is also available on my personal web site at

http://www.ccs.neu.edu/home/will/R6RS/essay.txt

I apologize for failing to notice this problem earlier.

                                * * *

an essay on language design: fixing the syntactic record layer


Introduction
============

More than twenty years have passed since I wrote this [1]:

    Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.

I still believe that first sentence, and I still believe Scheme ought to demonstrate what is claimed in the second sentence, but the draft we are being asked to ratify does not always do that.

This shortcoming of the candidate draft can be seen in the modularity and interoperability problems that beset the syntactic and procedural record layers. As I will show, these problems are caused by artificial restrictions that have been imposed upon the syntactic layer. Removing those weaknesses would remove the problems.

A last-minute change in the 5.97 draft attempted to fix things by piling yet another feature, a parent-rtd clause, on top of the syntactic layer [2].
The presumed purpose of that parent-rtd clause was to address Andre van Tonder's observation that incompatibilities between the syntactic and procedural record layers create a modularity problem: You cannot define a new record type that inherits from an existing record type without knowing whether the base type was defined by the syntactic or by the procedural layer [3,4]. That also implies that record definitions are brittle: Unless a record type is sealed, its definition cannot be changed from using the syntactic layer to using the procedural layer, or vice versa, without breaking all record types that inherit from it.

Although the editors acted with the best of intentions, their addition of the parent-rtd clause did not solve the problems it was intended to solve. Even with the parent-rtd feature, you *still* have to know whether the base record type was defined using the syntactic or procedural layer, and you *still* can't change a record definition from one layer to the other without running the risk of breaking client code.

To make matters worse, the 5.97 draft added a couple of questionable statements that attempt to excuse the interoperability problems while asserting privileged status for the draft's syntactic layer. One of those statements is based upon a patently false claim.

The editors have submitted this draft to the Steering Committee as a candidate for ratification, so there is no meaningful technical review of these last-minute changes apart from the ratification vote itself.


Abstract
========

I will summarize the interoperability problems mandated by library chapter 8 of the 5.97 draft, trace them to their root cause, show how they could easily be fixed by removing artificial restrictions that are imposed by the syntactic layer, and conclude by showing that the two exculpatory statements of that chapter are partly false and thoroughly misleading.


Symptoms
========

That the syntactic and procedural record layers do not interoperate well has been known for a while now, and had been acknowledged by the editors, who had declared their intention not to do anything about it [5].

I did not consider that to be an absolute barrier to ratification, because better syntactic layers would have been proposed as SRFIs, and one of those alternatives might eventually have replaced the R6RS syntactic layer. That would have been a better outcome than piling on still more features without fixing the fundamental problem.

The last-minute addition of parent-rtd addressed the most obvious of the interoperability problems, which was first mentioned in public by my formal comment 90 [5], but left these others in place:

  * Record types defined by the syntactic layer are not interchangeable with record types defined by the procedural layer.

  * In consequence, the code you write for a record type definition that inherits from some base type depends upon whether that base type was defined using the syntactic or procedural layer.

  * Both layers are complex, which makes it hard for a casual reader to understand their relationships.

  * The procedural layer is the more expressive layer, so the draft's new warnings that try to frighten programmers into preferring the syntactic layer would have limited impact even if they were true.

The procedural layer is more expressive because it can do everything the syntactic layer can do, and it can also be used to create multiple constructor-descriptors for a single record type descriptor [6].

That, of course, is a cue for someone to jump up and say "We can fix that by adding a new clause to the syntactic layer!"

Adding yet another feature would be exactly the wrong thing to do. We ought to fix the problem, not try to cover it with still more sterile adhesive strips.

The proper course of action is to understand why these problems matter, why they arose, and how to fix them. Then we should fix them.
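
To make the second and fourth symptoms concrete, here is a small sketch of how inheritance looks under the draft's two layers. It assumes the define-record-type and parent-rtd syntax of the 5.97 draft; the type names (point, pixel, and their extensions) are invented for illustration and do not come from the draft or from this essay.

```scheme
(import (rnrs))

;; A base type defined with the syntactic layer.
(define-record-type point
  (fields x y))

;; A base type defined with the procedural layer. pixel-rtd is an
;; ordinary variable whose value is a record-type descriptor.
(define pixel-rtd
  (make-record-type-descriptor 'pixel #f #f #f #f
                               '#((immutable x) (immutable y))))

;; Inheriting from the syntactic base uses a parent clause ...
(define-record-type point3
  (parent point)
  (fields z))

;; ... but inheriting from the procedural base needs a parent-rtd
;; clause with an explicit parent constructor-descriptor. Writing
;; (parent pixel-rtd) would be rejected, because pixel-rtd was not
;; bound by define-record-type.
(define-record-type pixel3
  (parent-rtd pixel-rtd
              (make-record-constructor-descriptor pixel-rtd #f #f))
  (fields z))

;; The procedural layer can also build several constructors for one
;; rtd, each from its own constructor-descriptor.
(define make-pixel
  (record-constructor
   (make-record-constructor-descriptor pixel-rtd #f #f)))
(define make-origin-pixel
  (record-constructor
   (make-record-constructor-descriptor
    pixel-rtd #f
    (lambda (new) (lambda () (new 0 0))))))
```

If the definition of point were later rewritten to use the procedural layer, the definition of point3 would have to change from a parent clause to a parent-rtd clause, which is the brittleness described above.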


The Impending Records War
=========================

By specifying two barely interoperable record systems, and advocating the more complex and less expressive of the two, the 5.97 draft would create an unnecessary dilemma for organizations that use Scheme.

Most will deal with incompatibilities between the two record layers as they arise. After dealing with several instances of the problem, some organizations will standardize on one or the other of the record layers.

Some will choose the procedural layer, because it is more expressive or because it is more in keeping with Scheme's roots as a higher order procedural language. Others will choose the syntactic layer because that is what the 5.97 draft suggests, or because Scheme's macro system is really cool.

When these organizations import code that uses the "wrong" record layer, they will rewrite it to use their organization's standard layer. When they get tired of rewriting code, they will clamor for the "wrong" record layer to be expunged from the standard.

That conflict is unnecessary. We do not have to fight over which record layer is wrong, because we could fix things so both are right. That is not hard. We should do it.


The Root Cause
==============

The root technical problem is easy to understand. I'll digress for a few paragraphs to give you a chance to figure it out before I do.

A friend of mine remarked that it is impossible to design a record system for Scheme that won't lead to interoperability problems. This is Scheme, after all. Any Scheme programmer can define a new syntactic layer for records, and its notion of a record type might be different from the standard notion, so programmers shouldn't expect to be able to define a record type that inherits from any other programmer's record type.

That's true, up to a point. The point, of course, is that we should be able to define records that inherit from any record system that uses the standard notion of a record type.

The 5.97 draft doesn't have a standard notion of a record type. It has *two* standard notions of a record type, with context-restricted coercions between them. That is the root technical cause of the modularity and interoperability problems.

The solution is to define a single standard notion of a record type, and to use that one notion as the basis for both the syntactic and the procedural layers.

To do that, of course, the standard notion of a record type will have to be a first-class object. The syntactic layer can deal with first-class values by deferring them to run time, but the procedural layer can't reach back in time to deal with macro or expand-time values.

This has been a source of controversy among the editors. The 5.96 and earlier drafts fudged by saying a record type is an "expand-time or run-time description". The 5.97 draft changed that phrase to "expand-time representation of the record-type", thereby institutionalizing the interoperability problems even as it pretended to do something about them.

In the 5.97 draft, the procedural layer's notion of a record type is an rtd (record type descriptor). The syntactic layer's notion of a record type is an expand-time representation that bundles an rtd with a preferred constructor-descriptor.

I will now describe a straightforward solution to this muddle, based upon the following standard notion of record type:

A record type is an rtd.

To maintain compatibility with the syntactic layer of the 5.97 draft, and for that reason only, every non-opaque rtd will be associated with a preferred constructor-descriptor.

The preferred constructor-descriptor is the one associated with the rtd in a special global table or, if that table contains no preferred constructor-descriptor for rtd, then the preferred constructor-descriptor is the one computed by

    (make-record-constructor-descriptor rtd <parent-cd> #f)

where <parent-cd> is the parent's preferred constructor-descriptor, or #f if there is no parent.
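
The lookup rule just stated can be sketched in a few lines using the procedural and inspection libraries. This is only an illustration under stated assumptions: the names preferred-cd-table, set-preferred-cd! and preferred-cd are hypothetical, the table is modeled as an eq-hashtable, opaque record types are ignored, and the fallback follows the formula above literally without considering how an implementation treats a #f protocol layered on a parent descriptor that has a custom protocol.

```scheme
(import (rnrs))

;; Hypothetical run-time table mapping an rtd to its preferred
;; constructor-descriptor.
(define preferred-cd-table (make-eq-hashtable))

;; Hypothetical stand-in for whatever internal mechanism fills the
;; table; nothing with this name exists in the standard libraries.
(define (set-preferred-cd! rtd cd)
  (hashtable-set! preferred-cd-table rtd cd))

;; Return the table entry if there is one; otherwise compute the
;; default descriptor from the parent's preferred descriptor,
;; or from #f when the rtd has no parent.
(define (preferred-cd rtd)
  (or (hashtable-ref preferred-cd-table rtd #f)
      (let ((parent (record-type-parent rtd)))
        (make-record-constructor-descriptor
         rtd
         (and parent (preferred-cd parent))
         #f))))

;; Usage: neither type has a table entry, so both get defaults.
(define base-rtd
  (make-record-type-descriptor 'base #f #f #f #f '#((immutable a))))
(define child-rtd
  (make-record-type-descriptor 'child base-rtd #f #f #f '#((immutable b))))
(define make-child (record-constructor (preferred-cd child-rtd)))
```

Here make-child takes the parent's field followed by the child's field, because every descriptor in the chain is a default one.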

Note that the global table is a run-time object that holds run-time constructor-descriptors. Note also that any implementors who would like to maintain an expand-time or compile-time table of (conservative approximations to) the information contained within that run-time table are welcome to do so.

How does an rtd become associated with its preferred constructor-descriptor? By having the two be passed as arguments to a special procedure that is known to the macro/library/compiler/whatever system, but is not exported by any of the standard libraries. In other words, only the syntactic layer can associate an rtd with a preferred constructor-descriptor other than the default.

I understand that the preferred constructor-descriptors are an ugly hack. They would not be present in any record system I would design from scratch. Why then am I proposing these preferred constructor-descriptors?

Because I am taking a lesson from C++, which caught on in part because it was bug-compatible with C. The system I am about to describe is, in one of Mike Sperber's favorite phrases, a conservative extension of the 5.97 record system. That means everything that would work in the 5.97 system would work in the system I am about to describe, and a number of things that wouldn't work in the 5.97 system, but should, will indeed work in the system I describe.

How do we arrange that? By removing the artificial restrictions mandated by the 5.97 draft.

(We'll keep the artificial restriction that limits the procedural layer's preferred constructor-descriptors to default constructor-descriptors. That restriction would be easy to remove also, but removing it might complicate the optional expand-time or compile-time bookkeeping that appears to have been the driving force behind the 5.97 design.)


Proposal
========

To avoid still more discussion of the API for the R6RS record layers, I propose we keep the syntax and almost all of the semantics of the 5.97 syntactic layer, and keep all the procedures and all the semantics of the 5.97 procedural and inspection libraries.

I further propose we extend the syntactic layer by eliminating certain weaknesses and restrictions. We will:

  * Require define-record-type to bind the <record name> to the rtd, in the same group of definitions that binds the constructor, predicate, accessors, and mutators.

  * Allow the <parent rtd> and <parent cd> of a parent-rtd clause to be arbitrary expressions, as in the 5.97 draft. (Notice, however, that the <record name> bound by a define-record-type is now an ordinary variable and can serve as the <parent rtd> without having to resort to a use of record-type-descriptor).

  * Extend the parent clause to allow any expression, which must of course evaluate to an rtd.

  * Extend record-type-descriptor to allow any expression as its <record name>, provided the expression evaluates to an rtd; in other words, record-type-descriptor would become a procedure.

  * Extend record-constructor-descriptor to allow any expression as its <record name>, provided the expression evaluates to an rtd; it would then evaluate to the rtd's preferred constructor-descriptor. In other words, record-constructor-descriptor would become a procedure.

I might have missed something, but I believe that's all it takes.

Note that record-type-descriptor has become unnecessary. It is nothing more than the identity function restricted to record type descriptors. If I weren't trying to describe a conservative extension of the 5.97 draft, I would urge removal of record-type-descriptor from the language [7].

Note that both the scope and semantics of a <record name> bound by the syntactic layer have become clearer. The <record name> is no longer a name for some mysterious "expand-time representation" that is neither a run-time object nor a macro. It is now an ordinary variable that obeys ordinary scope rules, can be exported or imported in the usual way, for run time, and has a first class object as its value.

I'm not going to claim this is a good record system, but it offers all the features of the 5.97 draft, all of the performance (for all use cases that can even be expressed using that draft), and none of the modularity and interoperability problems associated with the record layers of that draft.


Performance
===========

The 5.97 draft contains a couple of new paragraphs that attempt to justify its limitations by appeal to matters of performance. Page 16 says:

    However, the record operations provided through the procedural layer may be significantly less efficient than the operations provided through the syntactic layer. Therefore, alternative implementations of syntactic record-type definition [sic] should, when possible, expand into the syntatic [sic] layer rather than the procedural layer.

To put that in perspective, let me point out that the map procedure may be significantly less efficient than using a do loop. Indeed, there have been many implementations of Scheme in which do loops are more efficient than calls to map. Despite that fact, none of the Scheme reports have ever advocated using do loops instead of map. To advocate such things would be inappropriate for an implementation-neutral standard.

In typical uses of records, the base record type will be defined at the top level of a library, where the variable that holds the rtd will be immutable, as will all of the other top-level variables that are defined in terms of the rtd. That makes it almost as easy to optimize code written using the procedural layer as code written using the syntactic layer. Sure, some compilers may optimize one without bothering to optimize the other, but most would optimize neither or both.
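
As an illustration of the typical case just described, a base record type written with the procedural layer at the top level of a library might look like the sketch below. The library name (example point) and the record and field names are invented for this example. None of these top-level variables is ever assigned, so a compiler can see at compile time that each accessor is the result of applying record-accessor to a fixed rtd and a constant index.

```scheme
(library (example point)            ; hypothetical library name
  (export point-rtd make-point point? point-x point-y)
  (import (rnrs))

  ;; The rtd is held by an ordinary top-level variable that is
  ;; never assigned.
  (define point-rtd
    (make-record-type-descriptor 'point #f #f #f #f
                                 '#((immutable x) (immutable y))))

  ;; Everything else is defined in terms of that variable.
  (define point-cd
    (make-record-constructor-descriptor point-rtd #f #f))
  (define make-point (record-constructor point-cd))
  (define point?     (record-predicate point-rtd))
  (define point-x    (record-accessor point-rtd 0))
  (define point-y    (record-accessor point-rtd 1)))
```

A client would import (example point) and call (point-x (make-point 1 2)) exactly as it would if the type had been written with define-record-type.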

In any case, it is obvious that any program that can be written under the restrictions of the 5.97 draft is also a program under my proposal. If some macro expander and/or compiler were written to record some expand-time information when the syntactic layer of the 5.97 draft is used, then they can record exactly the same information for the syntactic layer of my proposal. The only additional complication of my proposal is that the macro expander and/or compiler would have to recognize when the <parent name> is an expression other than a variable that was bound by define-record-type. Recognizing that is trivial.

My proposal would not require any new flow analysis. The advanced optimizations that require flow analysis would use essentially the same flow analysis under my proposal as they would under the 5.97 draft.

Consider, for example, that the 5.97 draft allows the rtd associated with a <record name> to escape via the record-type-descriptor syntax. That means the rtd of a <record name> that is exported by a library, whether explicitly or implicitly, may escape within some importing library [8]. Hence any optimizations that require flow analysis of the rtd must either defer the optimization until a whole-program analysis can be performed, or else assume that the rtd of an exported <record name> will flow into arbitrary contexts. In other words, the rtd-flow analysis required by the 5.97 draft is already as bad as it could be, so my proposal can't possibly make it any worse.

From page 18:

    Note: Use of the parent-rtd clause generally forces an implementation to delay the generation of constructor, accessor, and mutator code until the record-type definition is evaluated at run time, since the type of the parent is not generally known until then.

That is a false statement. The editors might as well claim that the code for a lambda expression cannot be generated until run time, since the values of its free variables will not be known until then.
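
The analogy with lambda expressions can be made concrete with a toy model. The sketch below is an assumption made for illustration, not the representation used by Larceny or any other implementation: a record is modeled as a vector whose slot 0 holds a type tag, subtype checks are ignored, and the names make-toy-accessor, child-tag and child-z are hypothetical. The point is that the body of the inner lambda is fixed when the program is compiled, while the tag and the field index, which depend on the parent, are simply values captured at run time.

```scheme
(import (rnrs))

;; Toy model: slot 0 of the vector is the type tag, and the fields
;; follow, parent fields first.
(define (make-toy-accessor tag parent-field-count k)
  ;; The index depends on the parent, so it is computed at run time,
  ;; but only once, when the accessor is created.
  (let ((index (+ 1 parent-field-count k)))
    ;; The code for this lambda is generated ahead of time;
    ;; tag and index are merely its free variables.
    (lambda (r)
      (if (and (vector? r)
               (> (vector-length r) index)
               (eq? (vector-ref r 0) tag))
          (vector-ref r index)
          (assertion-violation 'toy-accessor "wrong record type" r)))))

;; Usage: a child type with one field of its own whose parent
;; contributes two fields.
(define child-tag (list 'child))
(define child-z (make-toy-accessor child-tag 2 0))
```

Calling child-z on (vector child-tag 'x 'y 'z) selects slot 3. No code is generated at that point; the only run-time work is the tag check and the load.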

Even in the current release of Larceny, all of the code generated for constructors, accessors, and mutators is generated at compile time. None of that code is ever generated at run time.

In future releases, an unoptimized record access will consist of a procedure call, a double tag check, an indirect load, an eq? check, and a load. Twobit's existing optimizations, or easy extensions of them, will eliminate any or all of that code when it is safe to do so. The code that isn't eliminated by optimization will be generated at compile time. No code will ever be generated at run time.

And that's for the procedural layer. There is no earthly reason for a compiler to generate worse code for the syntactic layer than for the procedural layer, or to generate it any later.

    The parent clause should therefore be used instead whenever possible.

This recommendation is based upon a false premise.


So What?
========

The substantive changes that were made in the 5.97 draft are immune to meaningful technical review, so why did I write this?

Partly to blow off steam, of course, but there were at least three other reasons as well.

As Andre van Tonder wrote, the only way for us to register disagreement with changes made in the 5.97 draft is to vote against ratification [9]. Under the rules of that vote, any negative vote must be accompanied by an explanation, so I had to write something like this anyway. I am told that, if this draft is not ratified, the Steering Committee intends to pay a lot of attention to the reasons cited in those explanations. If you vote against ratification for reasons that include some of the issues I have discussed, then you may be able to save some writing by citing this essay.

The second reason has to do with what happens after the vote. As I see it, there are three possible outcomes:

    1. The vote is negative, which would give the editors an opportunity to get it right.

    2. The draft is ratified, and everyone pretends to live happily ever after.

    3. The draft is ratified, and the unhappy folk design alternative syntactic layers, probably written up as SRFIs, that build upon the R6RS procedural layer.

This little essay of mine might be of some use, or at least have some influence, in the event of outcomes 1 or 3. I don't think outcome 2 is stable in the long run. I think it would evolve into outcome 3.

Thirdly, writing this essay gave me a chance to consider whether I still believe what I wrote so long ago.


Conclusion
==========

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. R6RS Scheme should demonstrate that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language.

William D Clinger
5-9 July 2007

--------

[1] Jonathan Rees and William Clinger [editors]. Revised^3 report on the algorithmic language Scheme. ACM SIGPLAN Notices 21(12), December 1986, pages 37-79.
[2] Michael Sperber et al. Revised^5.97 report on the algorithmic language Scheme -- standard libraries.
    http://www.r6rs.org/versions/r5.97rs-lib.pdf
    http://www.r6rs.org/document/lib-html-5.97/r6rs-lib.html
[3] Andre van Tonder. Rationale issues. Posted to r6rs-discuss, 26 June 2007.
    http://lists.r6rs.org/pipermail/r6rs-discuss/2007-June/002825.html
[4] William D Clinger. Response to [3], 27 June 2007.
    http://lists.r6rs.org/pipermail/r6rs-discuss/2007-June/002889.html
[5] William D Clinger. Record layers are not orthogonal. Formal comment #90, 13 November 2006.
    http://www.r6rs.org/formal-comments/comment-90.txt
[6] It doesn't matter whether the descriptor was created using the syntactic or the procedural layer. This is an example of the interoperability we should have throughout the record system.
[7] It is analogous to endianness, buffer-mode, et cetera.
[8] Whether the 5.97 draft allows a <record name> to be exported from a library may not be entirely clear, but disallowing such exports would be disastrous, so I assume the 5.97 draft is meant to allow such exports.
[9] Andre van Tonder. parent-rtd clauses in records. Posted to r6rs-discuss, 3 July 2007.
    http://lists.r6rs.org/pipermail/r6rs-discuss/2007-July/003071.html