Little Languages

Presented by Mike McHenry
Transcribed by Owen Landgren

Before the presentation of papers began, Mike put forward a pair of definitions, one for a Little Language and one for a Domain-Specific Language. A Little Language (hereafter LL) was described as a language in which one can express anything, but which is tailored towards a particular set of concepts. A Domain-Specific Language was described as a language which can perform only the actions within a domain. Several members of the audience disagreed on this point.

Other conditions were put forth for an LL, such as being a language which can be written by people who don't need the full power of a complete programming language. A question was then asked about how a little language differs from a library. It was noted that little languages have their details exposed, where libraries do not. Further, programs written in an LL are tied to the semantics of the language, and these semantics can carry implicit information about the domain which the LL is designed to address.

In 1986, Bentley publishes a paper describing the PIC language. Used to describe diagrams, PIC code looks something like:
ellipse "source" "code"
box "compiler"
ellipse "object" "code"
It was strongly emphasized that because PIC is a language and not a library, the user can abstract away some of the detail, a so-called "Encapsulation of Knowledge." The same diagram coded in Pascal would require explicit positioning, and would be much more error-prone due to the need to keep the various points in space synchronized.
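To make the "Encapsulation of Knowledge" point concrete, here is a minimal sketch in Python of how a PIC-like interpreter might place shapes without the user ever writing a coordinate. It is not PIC's actual implementation: the `layout` function, the `DEFAULT_WIDTH` table, and the left-to-right abutting rule are assumptions made for illustration.

```python
import shlex

# Assumed default widths (in inches). Real PIC has its own defaults and many more shapes.
DEFAULT_WIDTH = {"box": 0.75, "ellipse": 0.75}

def layout(program):
    """Place each shape directly to the right of the previous one.

    The user's program never mentions positions; the layout rule lives
    in the interpreter, which is the knowledge the language encapsulates.
    """
    placed = []
    cursor = 0.0  # x coordinate of the left edge of the next shape
    for line in program.strip().splitlines():
        shape, *labels = shlex.split(line)  # shlex strips the double quotes
        width = DEFAULT_WIDTH[shape]
        placed.append({"shape": shape, "labels": labels,
                       "center_x": cursor + width / 2})
        cursor += width
    return placed

PROGRAM = '''
ellipse "source" "code"
box "compiler"
ellipse "object" "code"
'''

for item in layout(PROGRAM):
    print(item)
```

A library version of the same diagram would have the caller compute and pass each center explicitly, which is where the synchronization errors mentioned above creep in.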

Again, the question was brought up as to the difference between a library and a little language. The key rebuttal here is that a library would have to perform global transformation on the code it is used inside to give the user the same flexibility. A library forces you to describe your problem as data that can be manipulated, but a little language describes the problem itself and thus escapes data representation. In the words of Guy Steele, "Behind every API lurks a programming language."

After explaining PIC, Bentley demonstrates the power of the LL approach by building two other languages on top of PIC, Scatter and CHEM. Scatter is a simple language for drawing scatter plots, and is apparently implemented compactly in 24 lines of awk. CHEM is an LL designed for chemists to describe atomic structures, and is implementable in 300 lines of awk and 70 lines of PIC macros. The advantage that Bentley claims is one of terseness: concepts that would otherwise be overly complex to express are made simple by the layering of implicit knowledge. Further supporting this approach, Bentley notes that a shell is simply another little language, and that languages such as CHEM and PIC are glued together with it.
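The layering idea can be sketched as a small preprocessor that reads data points and emits commands for the language underneath. The following Python is only an illustration of that shape, not Bentley's awk program: the input format (one "x y" pair per line), the function name `scatter_to_pic`, and the exact PIC-style commands it emits are all assumptions.

```python
def scatter_to_pic(points_text, size=2.0):
    """Translate 'x y' lines into PIC-style drawing commands.

    The data is scaled into a size-by-size frame, so the user never
    works out page coordinates by hand.
    """
    points = [tuple(float(v) for v in line.split())
              for line in points_text.strip().splitlines()]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def scale(value, lo, hi):
        # Map [lo, hi] onto [0, size]; a degenerate range collapses to 0.
        return 0.0 if hi == lo else size * (value - lo) / (hi - lo)

    # Frame first, then one marker per data point (assumed output syntax).
    lines = [f"box wid {size:.2f} ht {size:.2f} with .sw at (0,0)"]
    for x, y in points:
        px = scale(x, min(xs), max(xs))
        py = scale(y, min(ys), max(ys))
        lines.append(f'"*" at ({px:.2f}, {py:.2f})')
    return lines

DATA = """
1 10
2 20
3 30
"""

print("\n".join(scatter_to_pic(DATA)))
```

Each layer only needs to know the language directly beneath it, which is how a plotting language can stay small while still producing finished diagrams.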

Others, however, disagree as to the effectiveness of the little language approach. Van Deursen '98 does a software engineering study on the use of little languages in bank software. A simple scripting language, Risla, and a complex query language, RisQuest, for similar financial data are linked together with a third language, ToolBus. However, a problem quickly becomes apparent: whenever either Risla or RisQuest adds something to its data representation, ToolBus must be modified both to accept that new form of data and to convert it into something which is meaningful for the other language.

Van Deursen concludes that too much data is required to correctly design a little language, as evolving them results in exponential blowup if you don't know the things you'll need in the domain. He instead argues that an object-oriented framework is a better solution, though his reasons were not specified. Another point crops up in the discussion here: that languages are "Special Purpose" if they glue together concepts purely within computer science, and "Domain Specific" if they mesh a concept from outside CS with computer science. The example given of a special-purpose little language was the regular expression syntax inside Scheme.

In '99, Batory tries to reduce the amount of work it takes to generate a little language, and comes to the conclusion that most of the tools built into a little language are general programming constructs provided by most other languages, and that the correct approach is to follow the principles of embedding espoused by the following papers.

The roots of the embedding approach are traced back to Felleisen '85, where he publishes a transliteration of Prolog to Scheme. The main benefit put forth in this paper was a clear channel of communication between the two languages, but it had the additional benefit of being the shortest published implementation of Prolog at the time. Being embedded in Scheme made it easier to extend and manipulate than a traditional implementation.

Eleven years later, Shivers '96 came forward with SCSH, the Scheme Shell. Shivers attacked the traditional LL approach as ugly, inconsistent, and lacking the ability to translate the implicit knowledge encapsulated by each LL to other areas. At this point, a brief digression into the state of education in computer science took place, with the gist of the discussion being that things that look alike (e.g. Java and C++) are not necessarily alike, and that training programmers to think in such terms is a black eye on the face of the community as a whole.

Shivers solves the problem of communicating between different little languages by embedding them all within a single language, Scheme. In addition, this embedding allows for increased readability and power, as programmers are no longer limited to solely the constructs created by the little language. At this juncture, the question of whether or not we've lost the benefits of a little language is brought up. The response here is that we haven't: a programmer who wants to stay in a high-level world can program only in the little language, while one who wants additional power is free to drop into the connective tissue of the host language and extend what the little language provides.
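As a rough illustration of the embedding idea, the sketch below builds a tiny pipeline language inside Python rather than Scheme. It is not SCSH and does not use its notation: the `Stage` class and the `grep`, `sort`, `head`, and `each` names are hypothetical, and the pipeline works on in-memory lists instead of real processes.

```python
class Stage:
    """One pipeline step: a function from an iterable of lines to an iterable of lines."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # `a | b` feeds a's output to b; this is the little language's only combinator.
        return Stage(lambda lines: other.fn(self.fn(lines)))

    def run(self, lines):
        return list(self.fn(lines))

# The little language's vocabulary (hypothetical names).
def grep(word):
    return Stage(lambda lines: (l for l in lines if word in l))

def sort():
    return Stage(sorted)

def head(n):
    return Stage(lambda lines: (l for i, l in enumerate(lines) if i < n))

# The escape hatch: any host-language function can become a stage.
def each(fn):
    return Stage(lambda lines: map(fn, lines))

files = ["notes.txt", "main.scm", "util.scm", "README"]
pipeline = grep(".scm") | sort() | each(str.upper) | head(1)
print(pipeline.run(files))
```

A user can stay entirely within `grep | sort | head`, while `each` shows the "connective tissue": ordinary host-language code slotted into the same pipeline without a separate glue language.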