Multilingual Programming:
Coordinating Programs, User Interfaces,
On-Line Help, and Documentation

Gary Perlman
School of Information Technology
Wang Institute of Graduate Studies
Tyngsboro, MA 01879 USA
(617) 649-9731
May 1986

An early version of this paper was presented at the ACM SIGDOC Fourth International Conference on Systems Documentation in Ithaca, NY, June 1985.

Abstract

The philosophy behind multilingual programming is that software development must deal evenhandedly with all parts of software products if high quality software is to be developed economically. The high cost of software is due not to the difficulty of coding, but to the recoding and redocumenting of software. Many expressions of the same ideas must be constructed and coordinated. Program code and comments, user interface and on-line help, and a variety of off-line documents, all must be consistent. A solution to the coordination problem is presented in this paper. Multilingual programming is a method of developing software that uses a database of information to generate text in multiple target languages: commented program code, user interface languages, and text formatting languages.

The method begins with an analysis of a domain to determine key attributes. These are used to describe particular problems in the domain, and the description is stored in a database. Attributes in the database are inserted in idiomatic templates for a variety of target languages to generate solutions to the original problem. Because each of these solutions is based on the same source database of information, the solutions (documents, programs, etc.) are consistent. If the information changes, the change is made in the database and propagated to all solutions. Conversely, if the form of a solution must change, then only the templates change. By designing for redesign, the method saves much of the effort of updating documents and programs that must be coordinated.

Keywords: Automatic Program Generation, Automatic Documentation, User Interface, Language Design

Problems with Documentation

There are many types of text associated with software:

  1. the program code for the software,
  2. the comments in the code,
  3. the user interface specification,
  4. on-line error messages,
  5. on-line documentation,
  6. off-line reference materials,
  7. off-line quick-reference sheets,
and probably others. A major problem with software documentation is maintaining accuracy and consistency. Accuracy is the coordination of the program code with all other forms of documentation. Does the documentation accurately reflect the input/output behavior of the program? Consistency is the similarity of related document parts. Are examples, options, etc., shown in consistent formats throughout? These are not easy problems to solve. Here is a typical scenario:
A program is designed and written. Comments are inserted into the program. Preliminary documentation for the program is written, and users give feedback to the developers. New features are put in the program, and some, but not all, of the comments in the code are updated. Some prompts and error messages in the user interface are not changed to reflect the workings of the new program. New documentation is written, after which some user interface prompts are modified. The product is shipped to market.

I contend that the problems of accuracy and consistency can be traced to the wasted dual efforts of programmers and documenters. Traditionally, documentation by programmers has been viewed as inefficient for several reasons:

  1. Programmers do not think documentation is their problem.
  2. Programmers are not interested in writing documentation.
  3. Programmers do not have the time, or their time is considered too valuable, to be writing documentation.
  4. Programmers do not know how to write good documentation.
These are some of the same reasons why programmers are viewed as inefficient workers on user interfaces (Perlman, 1983). Consequently, programmers do not write documentation, except for program comments, and technical writers are hired to write user documentation. Two groups of people work on specifying the same information in different formats for different audiences. Programmers write for compilers and for the programmers who might work on their code in the future, and documenters write for a variety of user populations. It is a waste of effort to have different people spend their time expressing the same idea in different languages.

This paper presents some practical solutions to the problem of accuracy and consistency of documentation. I will not talk about documentation separated from the issues of programming, user interfaces, or on-line help. These problems must be addressed with a coordinated effort.

Examples

With the following examples, I hope to convey the diverse applications of multilingual programming (MLP) by showing its use in a variety of domains. The technique can be summarized as follows: We begin with analysis of the problem domain, breaking it into small parts. Then we use this analysis to describe a particular problem. At that point, there is an abstract description of the problem. We then synthesize the description into a solution. Because we have a point at which a problem is described abstractly, we can synthesize several solutions.

While the above is abstract, it can be further summarized as analysis followed by multiple syntheses. In the following examples, this pattern is the one to watch for. The method will be formalized later.


Experimental Design Specification

UNIX|STAT (Perlman, 1980) is a compact data analysis system developed at the University of California, San Diego, and at the Wang Institute of Graduate Studies. It runs on the UNIX (Ritchie & Thompson, 1974) and MSDOS operating systems. anova, a UNIX|STAT program, does a general analysis of variance. For non-statistically trained people, that means it is used primarily for analyzing data collected from experiments with controlled factors. Traditional ANOVA programs (Dixon, 1975; Nie et al., 1975) require that data be input as a matrix, with the experimental design described in a special language separate from the data. In my experience, this method of experimental design specification leads to confusion and errors when used by inexperienced analysts. The anova program was designed to read self-documented input and, from that, infer the structural relationships (the experimental design) in the data.

Each input line to anova contains one datum preceded by the names of the levels of the factors at which that datum was obtained. For example, suppose we have an experiment testing the effectiveness of two display formats, B&W and color, for two classes of readers, young and old. We present both formats to each reader, and measure comprehension on a percentage scale. Some of the data might look like this:

BamBam   B&W     young   52
BamBam   color   young   78
Fred     color   old     25
Fred     B&W     old     75
Pebbles  color   young   83
Pebbles  B&W     young   65
Wilma    B&W     old     93
Wilma    color   old     58
anova takes this analysis and infers the experimental design by synthesis. There are several points worth noting in the data.
  1. The order of input lines to anova does not matter.
  2. Each line is close to self explanatory; we know that Fred is old and what his scores are for the B&W and color format conditions.
  3. From the data, we can see that every subject saw both format conditions (the factor varies within subjects), but no subject was both young and old (age varies between subjects).
  4. There were four subjects.

The idea behind the anova program is to remove tedious and error-prone tasks from data analysts by providing a synthesis of analysis. Given this design information, much of the data analysis process can be automated and verified (Perlman, 1982).

Observation: Simple Input, Complex Output

The ANOVA example shows how a simple input, a relational database containing records describing individual data points, can produce a complex output, automating many details.


Data Bases of Bibliographic References

The references to this paper are stored in a simple database. The format for a record looks something like this:

author    = Perlman G
article   = An Eye For an Eye For an Arm and a Leg
journal   = Journal of Irreproducible Results
date      = 1981
issue     = 4
pages     = 29-30
Records are extracted from a central database and sorted before being formatted for input to the troff text formatting system (Kernighan, Lesk, & Ossanna, 1978). There are several types of publication records in the database: books, journal articles, articles in edited books, technical reports, and so on. For each publication type, a different format is required. The references in this paper are printed in APA format (APA, 1983). Two properties of the formatting might change: the output format, or the text formatter. For example, the ACM uses a different format, and Scribe (Reid & Walker, 1980) and TEX (Knuth, 1979) are other text formatters. With my personal database system, translating one format to another, or one formatter to another, is simple: the templates defining how the records (analysis) are formatted (synthesis) are simply redefined.
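
Here, as a minimal sketch, is what such a template might look like in m4. The JREF macro name, the .IP paragraph request, and the exact layout are assumptions for illustration, not the actual templates behind this paper:

dnl JREF(author, title, journal, date, issue, pages)
define(`JREF',
`.IP
$1 ($4).  $2.  \fI$3\fR, $5, $6.')dnl
JREF(Perlman G, An Eye For an Eye For an Arm and a Leg,
     Journal of Irreproducible Results, 1981, 4, 29-30)

Running this through m4 yields troff input in an APA-like style; switching to an ACM-like format, or to Scribe or TEX, means redefining JREF, while the records themselves are untouched.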

Observation: Multiple Syntheses of Analyses

Again stepping back for an overview, this is an example of analyzing a problem into simple parts that are placed in a relational database with sparse records, and synthesizing several different solutions. The solutions here are different reference formats using different text formatting systems. More generally, a flexible report generation capability provides multiple views of a database.


Data Analysis System Interface

S is a system and language for data analysis (Becker & Chambers, 1984). While at Bell Labs, I developed a high-level user interface to the S language using the IFS (Vo, 1985) user interface language. S is a large system, with over 300 functions, each with about 3-6 options. The system I built (Perlman, 1983) has a screen with a form and a menu for every S function; the menu controls the invocation of the function and the form allows users to supply options. There are over 100 menus arranged in a hierarchy to help users find the functions of interest. In all, there are close to 500 screens, each with menus or forms, and on-line help. In developing this system, I pushed the idea of MLP to new limits, and found it was more powerful than I had anticipated.

It was clear to me that programming 500 screens by hand, even with a high level language like IFS, was going to present problems. User interface design is an iterative process, and if each iteration involved changing hundreds of files containing screen descriptions, then it would be impossible to make many changes. Early in the development, I decided to design a special purpose artificial language (Perlman, 1984) especially suited to designing screens in the IFS language. An artificial language is a special purpose notation for precise and concise communication within a limited domain. My goal was to be able to specify the screen designs with as little syntax as possible. In the words of Tufte (1983), I wanted to minimize the "ink to data" ratio and specify only the information that changed from screen to screen. I did not want to repeatedly specify the formatting information because it would have wasted my time and made it more difficult to maintain consistency.

Becker and Chambers had already done much of my work by designing the S interface language using the m4 macro processor (Kernighan & Ritchie, 1980). The S interface language defines attributes of S functions and their options. Most notable are the attributes of options, including:

name
the name of the option,
type
the data type of the option value,
size
the dimensions of the option's value,
default
the default value, and
requirement
whether or not the option is required.
Other information, such as the allowable range of options, is coded by hand. Becker and Chambers write this information in a dialect of m4 and use m4 macro definitions to generate RATFOR code for input to a compiler. The format of m4 macros is simple: a macro name is followed by a parenthesized list of comma-separated arguments. For example, the following is an option to the S plot command.
ARG (main, OPTIONAL, CHAR)
In English, the main title of the plot is an optional character vector with no default value.

Missing from the information in the S interface language is on-line help about the purpose of the functions and options. I had to add this information from the S documentation by hand to build the high level interface. Once this was done, all the information about the S functions was parameterized (analyzed) and centralized.

The generation of the screens is straightforward, but there are many details. For each function in an S interface language source file, there is a definition of its name, purpose, etc., and the attributes of its options. This is a relational database of information about each function and its options. From this information, m4 macros are defined to parse the information so that it is available to a code generator. The code generator takes this information and extracts what it needs for different parts of the screen design. The following parts are generated:

declarations
Options are represented as variables that have to be declared.
titles
Forms and menus have prompts based on the names and short descriptions of options.
help
On-line help is extracted from the S manual and coordinated with the screen designs.
validation
Inputs are validated based on the datatype and range of options, and required options must be supplied before a function is allowed to run.
In short, each piece of information about each function is used several times in several contexts.

The code generation can be summarized as follows. Several sources of information are integrated into one consistent database. Information from this database is parsed with m4 and used to fill in the blanks of idiomatic templates (ie. macro definitions) in the IFS language. The same information is used more than once, for help, validation, and for declarations, but each datum comes from one source, thus ensuring consistency.
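
As a toy illustration of one datum filling several blanks, consider the following m4 macro. FIELD and its output notation are hypothetical stand-ins for the IFS idioms, which were different:

dnl FIELD(name, type, default, help-text)
define(`FIELD',
`$2 $1 = $3;
prompt "$1 ($2):"
help "$4"')dnl
FIELD(main, char, 0, the main title of the plot)

The expansion yields a declaration, a form title, and an on-line help line; the name and type are each typed once but appear in several output lines, so the declaration, the prompt, and the help text cannot disagree.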

The result of using m4 macros to design and build the IFS/S interface was beneficial many times over.

Generalization
As an abstract notation, it allowed me to see more relationships than otherwise would have been possible.
Abbreviation
As an abbreviated language, it saved much typing and reading time.
Consistency
As a single source for the design, it allowed extremely consistent development. If a change were made in the design, that change was centralized in the templates, and only a regeneration process was necessary.
Accuracy
All the data substituted in templates for program code and documentation came from one source, thus ensuring the accuracy of the documentation.
Flexibility
By localizing all the IFS specific language in the macro definitions, flexibility was gained. During the IFS/S interface development, IFS itself was under development, and several times the IFS language changed so that the whole system was broken. Only the macros had to be changed to reconfigure the system, not 500 screen designs. This would have been non-trivial because the generated screen designs contained an average of 400 lines of IFS code each, or about 200,000 lines in all. The screens were so detailed because additions to the design were centralized, with a one-time cost for each addition. Effort on a screen design was repaid by being multiplied by several hundred screens.

Because the IFS code was separated from the database of function descriptions, macros could be written to generate text in other languages. Full and quick reference paper documents were created using the troff text formatter, each in a few hours. There was no problem with the accuracy of these documents because they were generated from the same source as the user interface, which was generated from the program code. There was no problem with the consistency of these documents, again because they were all generated with the same macros. Such standardization is especially impressive with such a large system and such detailed documents (one was about 100 pages).

Observation: Added Consistency and Flexibility

The generation of hundreds of thousands of lines of code and several document types is made practical by using the same methods as before: several target languages are generated from a relational database of many records. Given the size of the system, what is particularly impressive is that the user interface and documents are highly consistent and still flexible; a localized change to the generators changes the whole system uniformly. There is a strong relation between MLP and fourth generation languages and application generators (Horowitz et al., 1985; Raghavan & Chand, 1986). With little specification, multiple products (programs, tables, documents) can be generated, although, unlike MLP, these tend to be application specific and less flexible.


Option Parser Generator

SETOPT is a code generator that produces a parser to handle UNIX program command line options (Perlman, 1985). UNIX program options are wildly inconsistent (Norman, 1981), and the efforts of Hemenway & Armitage (1984) to define a syntax standard were accompanied by the development of SETOPT to help develop compliant programs. In addition to ensuring a consistent syntax for command line options, SETOPT deals with on-line help, type checking, input conversions, and range checking. In short, SETOPT aids all aspects of programming command line options on UNIX.

With SETOPT, each option is described with a list of attributes in a format convenient for input to m4 (Kernighan & Ritchie, 1980), a macro processor. For example, a simple text formatting program might take options to control the width of lines, whether lines are numbered, and page headers. With SETOPT, the following could specify these options.

OPT (w, width, Line Width, INT, 0, 72, value>0)
OPT (n, number, Line Numbering, LGL, 0, FALSE)
OPT (h, header, Page Header, STRING, 0, "",
    length(value) < optval(width))
This analysis of the options states that the width option is an integer of dimension 0 (a scalar), whose default value is 72, and whose value must be greater than zero. It is set with the -w flag, and its purpose is to set the line width. Note in the previous English explanation how the parameters of the OPT macro can be plugged into a troff (Kernighan, Lesk & Ossanna, 1978) template to provide detail. The same information is used by SETOPT to generate a C language (Kernighan & Ritchie, 1979) parser for handling all aspects of the user's interface:
  1. parsing the options on the command line,
  2. validating options and providing standardized error messages,
  3. allowing access to on-line help,
  4. allowing interactive setting of options, and several other capabilities.
As with the IFS/S interface, much effort can be expended on the SETOPT tool because that effort is repaid over the hundreds of UNIX programs that can use it.
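
A stripped-down imitation of the idea appears below. The C table layout, the T_INT tag, and the .OP manual-entry macro are assumptions for the sketch, not SETOPT's actual output, and the range-check field is omitted for brevity:

dnl OPT(flag, name, purpose, type, dimension, default)
define(`OPT',
`{ "$1", "$2", "$3", T_$4, $5, "$6" },
.OP $1 $2 $4 "$6"')dnl
OPT(w, width, Line Width, INT, 0, 72)

The first output line is a row for a C option table; the second is a troff macro call for the manual entry. One OPT record feeds both the generated parser and the generated documentation, which is why the two cannot drift apart.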

Again, the process is the same as with the other examples. A domain of application is chosen and analyzed so that the problems in the domain are parameterized. This is the analysis stage. This information is in a database from which several solutions can be synthesized. The synthesis is done by plugging information from the database into templates in different languages: with SETOPT, troff macros to generate UNIX manual entries, and C program code to produce a user interface.

The manual entries generated by SETOPT are not complete, nor what I would call great prose. SETOPT provides a simple scheme to insert explanatory text in different parts of the generated document. It is difficult, but not impossible, to generate smoothly flowing text. Computer program documentation, especially that on program option attributes, does not need to read like great prose. This seems to be a domain where tables are superior to plain text, and where consistency, or in another view, monotony, is desired.

Observation: Enhanced Programmer Productivity While Postponing Standards

The method used in SETOPT is like the previous examples. Simple descriptions of options (problems) are parsed into an accessible format (database) and used repeatedly in each of several target languages: option parser, on-line help, and manual entry. Some software developers fear standards because they cannot be sure that the standard will not change. With MLP, programmers can conform to a standard without knowing the rules of the standard. They can be protected against changes in a standard because their only interface is to the database, and the record formats (attributes of options) are stable almost to the point of never changing. It is the code/document generators that contain information about standards, and changes to a standard can be encoded centrally. There is a strong analogy with user interface tool development. In the development of user interface management systems, there are two user interfaces: one between the programmer and the user interface tool, and one between the tool and the end-user (Perlman, 1983). With user interface tools, it is easier to standardize the programmer-tool interface than the tool-user interface, where new input/output technology and individual differences between users require more flexibility (Perlman, 1985).


Electronic Survey System

Surveys for gathering information can be described with a simple grammar. In an electronic survey system (Perlman, 1985), survey questions are represented as having four basic attributes:

variable
a variable that is set by answering a question,
prompt
a prompt that is presented to a respondent,
help
more detailed information, available on request, about the requirements for the answer, and
type
the type of survey question (e.g., multiple choice, rating scale, etc.). Based on the question type, other parameters might also be supplied. For example, a minimum and maximum value might be supplied for a Thurstone scale question of the form:
Rate on a scale from minimum to maximum...

Based on these parameters, a question database is constructed, and from it, C program code (Kernighan & Ritchie, 1979) is generated to administer the survey. By changing the templates from which the program code is generated, troff text formatting commands are generated instead to produce a paper survey. Some work was done to generate a form-based survey system using the Rapid/USE prototyping tool (Wasserman, 1979). Once again, several different synthetic solutions to problems are formed from the same analysis.
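
The following m4 sketch shows the switch between templates. SCALE, the ask_scale routine, and the .QU troff macro are all hypothetical names invented for the illustration:

dnl SCALE(variable, prompt, help, min, max) -- C program version
define(`SCALE',
`$1 = ask_scale ("$2", "$3", $4, $5);')dnl
SCALE(comfort, Rate your comfort with the editor,
      Pick a whole number in this range, 1, 7)
dnl Redefined for troff, the same record yields the paper survey:
define(`SCALE',
`.QU "$2" ($4 to $5)')dnl
SCALE(comfort, Rate your comfort with the editor,
      Pick a whole number in this range, 1, 7)

In practice the question records would live in their own file and be passed through m4 once with each template set, so the administered survey and the printed survey always agree.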

Observation: Multilingual Programming as a Method

By now the recurring themes of the examples should be clear, and we are ready to formalize the characteristics of the method of multilingual programming.

Formalization

Each of the previous five examples shows the same process, depicted in Figure 1. First, an abstraction of a domain is used to analyze a problem. This analysis results in a source database of information representing the problem, from which solutions can be constructed by synthesis. The information is plugged into idiomatic templates to generate instances in several classes of target languages: text formatters, report generators, programming languages, and user interface management systems; hence the name multilingual programming. For each class of target language, there are several possible specific languages. The results of the syntheses can include program code, program comments, user interface code, on-line documentation, and off-line documentation. In this section, I will attempt to describe MLP more formally.

Figure 1

Figure 2

Figure 2 is a graphical representation of the process of MLP. At the top of the Figure are two shapes representing instances in a specific subject domain. An analysis of the instances shows that each has five key concepts in the domain. This pattern is formalized and the information from those five key concepts is extracted and parameterized in a relational database (depicted in the center of Figure 2) to form one source of the information. From this database, several different views or solutions are possible, each being a synthesis of the information in the database, shown at the bottom of Figure 2.

It is not necessary that all information in the database be used in forming a synthetic view. In declaring variables in a programming language, a help string is not necessary, although it is customary to put that information in comments next to the code that is generated. The synthesis on the lower right of Figure 2 does not contain the information shaded with vertical lines.

It is possible to use the same information (always from the same source) more than once. In generating printed documentation, it is a good idea to provide several levels of detail:

  1. quick reference,
  2. a table of attributes, and
  3. detailed information.
The same information might go into each of these, although more would go into the detailed documentation.
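
A sketch of such reuse, with made-up macro names (the .SH heading request is troff's; everything else is invented for the illustration):

dnl record fields: name, short purpose, long description
define(`QUICKREF', `$1 - $2')dnl
define(`MANUAL',
`.SH $1
$2.  $3')dnl
QUICKREF(width, set the line width)
MANUAL(width, set the line width,
       Lines longer than the width are folded at word boundaries.)

The quick reference uses two of the fields and the manual entry uses all of them, but both draw on the same record; the more detailed template simply consumes more of it.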

When real systems are being developed, these views evolve through an iterative elaboration and refinement process. Consider the development of a user interface system. The templates might begin by scavenging an existing piece of code, parameterizing some parts. A first generation user interface might not check ranges of input values. A second generation user interface might check ranges, but not provide diagnostic error messages. The flexibility of MLP allows developers to address unanticipated needs flexibly and gradually work toward a better system. Note that all the while, the consistency of the system is maintained by generating text based on a single database with the same templates. Change is localized in the templates, thus minimizing effort.

Abstractions

In describing Figure 2, I did not tell how one would notice that several instances share common concepts. I do not know how this can be done in general, except by experience. It was only after writing the troff text commands to format hundreds of references that I noticed I was wasting my time doing the same action repeatedly and that changes in format would be difficult. With experience with similar tasks, a person's performance improves, which is a hint that repeated actions can be automated. There are some psychological theories of how people judge similarity (Tversky, 1977) and how we use analogy (Rumelhart & Norman, 1981) to discover patterns, but no practical methods are known.

Idiomatic Templates

A template is an abstraction of an idiomatic pattern of text that frequently occurs in a specific target language like a text formatting or programming language. Templates have slots where variables are inserted to form instances in the target language. For example, in the C programming language, a programmer might begin defining the square root function like this:

/* sqrt: square root */
double sqrt (x)
double x;  /* must be non-negative */

The documentation for sqrt might look like this:

                 TYPE     COMMENT
FUNCTION   sqrt  double   square root
ARGUMENTS
           x     double   must be non-negative
and be based on some troff formatting macros (defined elsewhere) like:
 .FN  "sqrt"  "x"  "double"  "square root"
 .AG  "x"  "double"  "must be non-negative"
The idiomatic templates for each language abstract the parts that remain constant across uses. Note that they contain the same information plugged into different, but corresponding slots.
C:
	/* purpose */
	type function (arguments)
	type argument; /* comment */
troff:
	.FN "function" "arguments" "type" "purpose"
	.AG "argument" "type" "comment"
Without a convention, there is no way to determine the referents of the comments. Enforcement of the convention is difficult if the convention is not supported with tools. Tools supporting different types of comments and source code parsing are still impractical in large projects because of the need for coordination with other texts like printed documentation.

Database of Attributes

The information from the previous example can be parameterized by analysis using a set of attributes:

function   = sqrt
purpose    = square root
type       = double
argument   = x
type       = double
comment    = must be non-negative
and put into a database with two relations, one for functions and one for arguments. This information is target-language independent and somewhat object oriented, which implies that a person does not need to know the syntax of any target language to program or write documentation when programming multilingually. Information needed for code generation or documentation can be extracted and plugged into slots in templates. Language specific syntax information is held in the templates.

It can be difficult to write text, especially phrases, like the purpose and comment above, because the same information will have to fit into many templates. There is some virtue in the difficulty, because it forces using consistent formats (e.g., the tense and voice of all phrases must agree).

Multiple Target Languages

Once information is in a database, many views of the database are possible. It is only by changing the definitions of the views, by modifying or substituting the templates, that different target languages can be generated. Each target language is based on the same source of information, and so is consistent with the others.

Text Generators

It is not mandatory that macros be used when building templates. Still, there are several reasons why macros are preferable to more common language extensions, like functions, and to more common language generators, like a compiler for a high-level language.

  1. When macros are used, it does not matter if the target language has a function definition capability. A good macro processor can extend any language.
  2. Macros do not need to adhere to the syntactic rules of the target language. Default values can be inserted into function calls, and variable names and values can be combined with string operations.
  3. Macros are easier to write than more complex text generators like compilers. The parsing of macro parameters is supplied in most macro processors.
  4. General macro processors, like the m4 macro processor (Kernighan & Ritchie, 1980), offer all or most of the capabilities needed for building templates. m4 supports macro definition, parsing of parameters, string manipulation, condition testing, iteration through recursive macros, and arithmetic; the sketch after this list shows the last three working together.
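
For example, here is a minimal repeat macro (invented for the demonstration) that combines condition testing, recursion, and arithmetic:

dnl repeat(n, text): emit text n times
define(`repeat',
`ifelse($1, 0, `', `$2`'repeat(decr($1), `$2')')')dnl
repeat(3, `-')

The ifelse tests the counter, decr does the arithmetic, and the self-reference provides the iteration; repeat(3, `-') expands to ---.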

Code generators, especially macro processors like m4, are not without their problems.

The Quoting Problem. Recursive evaluation of macros makes the quoting problem difficult to master. It takes macro programmers a long time to learn how to get nested recursive macros substituted (by avoiding quoting), how to delay or stop substitution (by quoting), and to develop habits that circumvent the problem.
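
The classic symptom, in a few lines of m4 (the macro names are invented for the illustration):

define(`color', `blue')dnl
define(`early', color)dnl    unquoted: body fixed to blue at definition
define(`late', `color')dnl   quoted: body expands at each use
early late
define(`color', `red')dnl
early late

The first early late line prints blue blue; after color is redefined, the second prints blue red. Which behavior is wanted depends on the template, and getting it wrong is easy.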

Pretty Printing. The output from text generators is often syntactically correct for the target language, but an ugly sight to the human eye. The output from macro substitutions contains everything in the definitions of the macros, including any white space added to make the macro definitions more readable. Unlike in most programming languages, a structured macro writing style conflicts with functionality, especially for templates of text formatting languages meant for later human viewing. The solution seems to be to use a post-processor, a prettyprinter, to reformat the macro processor output for input to a target language processing system. Often, this involves stripping leading space on lines and removing blank lines.


Properties of Multilingual Programs

Generalization & Imagination

The following quote from Chapter 5 of Whitehead (1911) leads into one advantage of MLP.

By the aid of symbolism, we can make transitions in reasoning almost mechanically by the eye, which otherwise would call upon higher faculties of the brain. By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems.
By parameterizing problems by analysis, a notation is established, and our ability to see new relationships and form new syntheses is enhanced.

Flexibility and Resilience to Change

A small change in a program, such as changing the type of a variable from an integer to a real, should not require a huge effort. Most current practice requires many changes:

  • the declaration,
  • the program comment,
  • the user interface to read the variable,
  • on-line help and error messages, and
  • user manuals.
It is not surprising that most of the cost of software is in maintenance and changes to working software. MLP is a method for making software more flexible, allowing people to design for redesign.

MLP is resilient to changes of standards and software tools. Personal experience taught me this well. While working on a system written in a user interface language, the definition of the user interface language changed, leaving me with hundreds of thousands of lines of unworking code. Because I had generated the user interface language from a database, I avoided many hours of work by making some minor changes to some templates.

Accuracy and Consistency

Much of the documentation and many program comments I read are inaccurate. This could be attributed to carelessness, but I think that would avoid confronting the problem. Text (comments and manuals) written about other text (program code), by hand, is going to lag behind, and updates can be forgotten. Also, text written about other text, by hand, can be inaccurate because people make different inferences from the same information. MLP promotes accuracy by automating the updates and removing chances for misinterpretation.

Once a document (user interface) exists, it meets or sets a standard format for related documents (software). The format of related documents (user interfaces) should be consistent so that people can learn based on their experience, not in spite of it. Analogy is a powerful human learning mechanism (Rumelhart & Norman, 1981), and we should take advantage of it.

Economy of Expression

Finally, MLP supports abbreviation. Information in a database is about as abbreviated as possible; this information is crossed, in the Cartesian set-theoretic sense, with templates for each language, thereby multiplying productivity.

Discussion

Choosing the Appropriate Focus

Hester, Parnas, & Utter (1981) suggest that documentation of systems should precede any development, and others have suggested that user interfaces should be designed first. The motivation for writing documentation first is to write correct code efficiently, and the motivation for writing user interface specifications first is to ensure that programs are easy to use. These are good motivations, but they show how good ideas can compete for attention. The solution is to work on both problems at the same time by analyzing the problem so that documentation, user interfaces, code, and so on, are treated as equally important parts of software products that require coordination.

There are problems with choosing a target language, documentation, programming, user interface, or whatever, as the source of information for other target languages. For example, writing documentation from program code is error prone and expensive. When target languages are used as source databases, they are almost always strained to accommodate the other languages. For example, the writing style tools of the Writer's Workbench (Frase, 1983; Macdonald, 1983) use troff text formatting macros as a text structuring language and try to infer structure based on formatting instructions. This is the opposite of the desired process: format should reflect content. Much of the time, inferring structure from a formatting language works well, especially if writers use a particular high level set of macros developed at Bell Labs, but sometimes writers find themselves trying to fool the analysis tools.

It does not make sense to put one part of a programming system over another. Neither a good program with poor documentation nor a bad program with good documentation is acceptable. The implementation of programs, the development of user interfaces, the writing of documentation, all must be coordinated.

Multiple Views of Programs

Knuth (1983) developed the WEB system that combines program code with structured program comments so that both can be extracted for input to his TEX formatter (Knuth, 1979), or just the program code can be extracted for the compiler. It is a system for printing beautiful program listings with minimum programmer effort. While this process is similar to the one described here, the WEB system does not use analysis of problem domains to the same extent, nor does it allow for the use of parameterized information for domains outside programming, like documentation and user interfaces.

Natural Language

Natural language systems such as those of Schank (1979) are able to generate paraphrases of their inputs in several languages. Although an impressive feat, the hard part, according to Schank, is to understand the original input and represent it in a data structure. Once that is done, the generation of paraphrases works on the same principle as in this paper. In the examples described here, the problems of parsing the input are trivial compared to those faced by cognitive scientists studying natural language understanding.


Cost/Benefit Analysis

In this final section, I try to answer when multilingual programming pays off. MLP requires planning on a larger scale than is customary. To implement that plan, there is the overhead of learning about generating templates. To offset that cost, there have to be benefits. MLP is especially suited to large projects or ones where a coordinated solution is desired.

Suppose that in a domain we have D documents (bottom, Figure 2) like program text, manuals, etc., that contain a total of A attributes (middle, Figure 2) to describe P problems (top, Figure 2). If any of these is large, then MLP is economical, but for different reasons. P*D solutions are generated, each of which is proportional to A, making a complete solution proportional to P*D*A.

Development Cost/Benefit Analysis

Using traditional methods, the cost of developing P*D documents is proportional to P*D*A. Using MLP, the cost is D times the cost (Ct) of developing templates for each document type, D*Ct, plus P times the cost (Cp) of describing the attributes of each problem, P*Cp. Both these have sizes proportional to A, the number of attributes, so the total cost under MLP is (D*Ct + P*Cp)*A. In short, a multiplicative function has been replaced with an additive function with larger factors for each addend.
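
A worked instance, using the IFS/S figures cited in the maintenance discussion below (three target solutions, over 300 problems); the unit costs Ct = 10 and Cp = 1 are illustrative assumptions:

\[
\textrm{traditional: } P \cdot D \cdot A = 300 \cdot 3 \cdot A = 900A
\]
\[
\textrm{MLP: } (D \cdot C_t + P \cdot C_p) \cdot A = (3 \cdot 10 + 300 \cdot 1) \cdot A = 330A
\]

Even with a template assumed to cost ten times as much as a problem description, the MLP total is well under half the traditional total, and the ratio improves as P or D grows.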

Another way of appreciating the benefits of MLP is to note that:

  1. Adding 1 new template is rewarded by the free addition of P new solutions.
  2. Adding 1 new problem description is rewarded by the free addition of D new solutions.
By free I mean that the cost in human effort is small, although the cost in computer resources may be large. The larger P or D get, the larger the multiplicative factor of the benefit of MLP. If P or D is small, then MLP may not be worth the trouble of learning and using the method. If P, the number of problems, is large, then MLP provides flexibility for change and abbreviation. If A, the number of attributes, is large, then MLP aids possibilities for generalization, flexibility for change, and accuracy. If D, the number of documents, is large, then we aid the accuracy of the documents, and help reduce human effort by abbreviation.

Maintenance Cost/Benefit Analysis

Thus far, I have only discussed the initial cost of MLP, which, for domains with few required solutions, is higher than traditional methods. The cost/benefit analysis for maintenance is different, and it should be addressed because, as discussed in software engineering texts like Zelkowitz et al. (1979), Boehm (1981), and Fairley (1985), the major cost in software is in maintenance.

Consider the cost of changing an attribute of a problem, a simple example of which might be to change the default value and type of some user interface variable (e.g., a program option). In the program source code, we have to change at least one constant definition, one type declaration, type conversions (string format to the data type and back), and perhaps also some comments. In the documentation, this information must be propagated throughout all documents, where the use of symbolic constants is less likely. In short, a small change that can be described with a couple of statements has turned into an hour of uninspiring and probably error-prone busy-work.

Now consider the cost of changing the format of a screen display, in a system that has, say, hundreds of screens. In the MLP case, the change is made in one place and the result is propagated throughout the system, with the major cost being computer time, not human time. The change is made uniformly, and the need for retesting is minimal compared to the tedious screen-by-screen tweaking and viewing by human labor.

The benefits of MLP to the maintenance stage of the software lifecycle are often overwhelming, as shown in these examples, and as experienced in the IFS interface to the S statistical system (Perlman, 1983), which involved three target solutions (IFS user interface, long manual, and short reference) for over 300 problems.


Summary and Conclusions

Multilingual programming is a method in which

  1. problems are analyzed, resulting in simple descriptions that are
  2. placed in a database
  3. from which several solution texts can be synthesized.
We always do the analysis in understanding a problem, but with MLP it is explicit enough to be put in database records. The synthesis, if done by traditional human labor, is less regular; documents, on-line help, or error checking are often incomplete and sometimes missing because of laziness or forgetfulness or incompetence.

The key concept in MLP is that there is one source of information from which all representations are generated. MLP aids programmer/writer productivity by reducing the amount of repetitive work that must be done by skilled practitioners and by multiplying their effort. MLP supports standards by using algorithmic generators, yet provides flexibility for even large systems because changes are centralized in the database descriptions and in the text generators. The flexibility feature is especially useful in fuzzy areas like user interface development, where terms like iterative design and rapid prototyping are euphemisms for "we don't know what we're doing, so we'll try something and work from there."

To develop high quality software, we must be willing to plan to coordinate all parts of software products: specifications, code, comments, user interface, on-line documentation, error messages, short and long user manuals, and so on. With a multilingual programming strategy, accuracy, consistency, flexibility, and economy are by-products of acknowledging the need for coordination.

References

  1. APA (1983) APA Publication Manual (3rd Edition). Washington, DC: American Psychological Association.
  2. Becker, R. A., & Chambers, J. M. (1984) Design of the S System for Data Analysis. Communications of the Association for Computing Machinery, 27:5, 486-495.
  3. Boehm, B. W. (1981) Software Engineering Economics. Englewood Cliffs, NJ: Prentice-Hall.
  4. Dixon, W. J. (1975) BMD-P Biomedical Computer Programs. Berkeley, CA: University of California Press.
  5. Fairley, R. E. (1985) Software Engineering Concepts. New York: McGraw-Hill.
  6. Frase, L. T. (1983) The UNIX Writer's Workbench Software: Philosophy. Bell System Technical Journal, 62, 1883-1890.
  7. Hemenway, K., & Armitage, H. (1984) Proposed Syntax Standard for UNIX System Commands. In Summer USENIX Conference. El Cerrito, CA: Usenix Association.
  8. Hester, S. D., Parnas, D. L., & Utter, D. F. (1981) Using Documentation as a Software Design Medium. Bell System Technical Journal, 60:8, 1941-1977.
  9. Horowitz, E., Kemper, A., & Narasimhan, B. (1985) A Survey of Application Generators. IEEE Software, 2:1, 40-54.
  10. Kernighan, B. W., Lesk, M. E., & Ossanna, J. F. Jr. (1978) Document Preparation. Bell System Technical Journal, 57:6.2, 2115-2136.
  11. Kernighan, B. W., & Ritchie, D. M. (1979) The C Programming Language. Englewood Cliffs, NJ: Prentice-Hall.
  12. Kernighan, B. W., & Ritchie, D. M. (1980) The m4 Macro Processor. Murray Hill, NJ: Bell Laboratories.
  13. Knuth, D. E. (1979) TEX and METAFONT: New Directions in Typesetting. Bedford, MA: Digital Press.
  14. Knuth, D. E. (1983) Literate Programming. Stanford University Report STAN-CS-83-981.
  15. Macdonald, A. H. (1983) The UNIX Writer's Workbench Software: Rationale and Design. Bell System Technical Journal, 62, 1991-2008.
  16. Nie, H. H., Jenkins, J. G., Steinbrenner, K., & Bent, D. H. (1975) SPSS: Statistical Package for the Social Sciences. New York: McGraw-Hill.
  17. Norman, D. A. (1981) The Trouble with UNIX. Datamation.
  18. Perlman, G. (1980) Data Analysis Programs for the UNIX Operating System. Behavior Research Methods & Instrumentation, 12:5, 554-558.
  19. Perlman, G. (1982) Data Analysis in the UNIX Environment: Techniques for Automated Experimental Design Specification. In K. W. Heiner, R. S. Sacher, & J. W. Wilkinson (Eds.), Computer Science & Statistics: Proceedings of the 14th Symposium on the Interface.
  20. Perlman, G. (1983) The Interface Arsenal: Software Tools for User-Program Interface Development. In Summer USENIX Conference (Dallas, TX). El Cerrito, CA: Usenix Association.
  21. Perlman, G. (1983) The Design of an Interface to a Statistical System. Murray Hill, NJ: Bell Laboratories.
  22. Perlman, G. (1984) Natural Artificial Languages: Low Level Processes. International Journal of Man-Machine Studies, 20, 373-419.
  23. Perlman, G. (1985) An Overview of the SETOPT Command Line Option Parser Generator. In Winter USENIX Conference. El Cerrito, CA: Usenix Association. pp. 160-164.
  24. Perlman, G. (1985) Electronic Surveys. Behavior Research Methods, Instruments, & Computers, 17:2, 203-205.
  25. Perlman, G. (1985) Presentation at the User Interface Standards Meeting at the ACM CHI '85 Conference on Human Factors in Computing Systems.
  26. Raghavan, S. A., & Chand, D. R. (1986) Application Generators & Fourth Generation Languages. Tyngsboro, MA: Wang Institute of Graduate Studies.
  27. Reid, B. K., & Walker, J. H. (1980) Scribe: Introductory User's Manual. Pittsburgh, PA: Unilogic.
  28. Ritchie, D. M., & Thompson, K. (1974) The UNIX Time-Sharing System. Communications of the Association for Computing Machinery, 17:7, 365-375.
  29. Rumelhart, D. E., & Norman, D. A. (1981) Analogical Processes in Learning. In J. R. Anderson (Ed.), Cognitive Skills and Their Acquisition. Hillsdale, NJ: Lawrence Erlbaum Associates.
  30. Schank, R. (1979) Presentation at the Second Annual Cognitive Science Society Meeting.
  31. Tufte, E. R. (1983) The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
  32. Tversky, A. (1977) Features of Similarity. Psychological Review, 84, 327-352.
  33. Vo, K. P. (1985) IFS - A Tool to Build Integrated, Interactive Application Software. AT&T Technical Journal, 64:9, 2097-2117.
  34. Wasserman, A. I. (1979) USE: A Methodology for the Design and Development of Interactive Information Systems. In H. J. Schneider (Ed.), Formal Models and Practical Tools for Information System Design. Amsterdam: North-Holland. pp. 31-50.
  35. Whitehead, A. N. (1911) An Introduction to Mathematics. London: Oxford University Press.
  36. Zelkowitz, M. V., Shaw, A. C., & Gannon, J. D. (1979) Principles of Software Engineering and Design. Englewood Cliffs, NJ: Prentice-Hall.