Universal Systems Language: Lessons Learned from Apollo

Margaret H. Hamilton and William R. Hackler, Hamilton Technologies, Inc.

An inordinate amount of money is spent in projects where system design and software development play a key role, huge portions of it wasted, and critical systems run the risk of failure, sometimes leading to a major catastrophe. This occurs in large part because of the “after the fact” paradigm on which the languages used to define systems are based.

The assumption made here is that system engineers and software developers can significantly reduce the well-known problems associated with doing business as usual by using a language based on a radically different approach, one that is preventive instead of curative. The Universal Systems Language is such a language. Based on systems theory — to a great extent derived from lessons learned from the Apollo onboard flight software effort — USL has evolved over several decades and taken on multiple dimensions. Its purpose has been to solve problems considered next to impossible to solve with traditional approaches, at least in the foreseeable future.

According to users, USL eliminates any preconceived notions because it is a world unto itself — a completely new way to think about systems. Instead of object-oriented and model-driven systems, the designer thinks in terms of system-oriented objects (SOOs) and system-driven models. Much of what seems counterintuitive with traditional approaches, which tend to be software centric, becomes intuitive with this systems-centric approach.

USL Design Objectives

USL was created for designing systems with significantly increased reliability, higher productivity, and lower risk. Hamilton and Hackler designed it with the following objectives in mind:

Reduce complexity and bring clarity into the thinking process
Ensure correctness by inherent, universal, built-in language properties
Ensure seamless integration from systems to software
Develop unambiguous requirements, specifications, and design
Ensure that there are no interface errors in a system design and its derivatives
Maximize inherent reuse
Ensure that every model captures real-time execution semantics (for example, asynchronous and distributed)
Establish automatic generation of much of design, reducing the need for designers’ involvement in implementation details
Establish automatic generation of 100 percent, fully production-ready code from system specifications for any kind or size of software application
Eliminate the need for a high percentage of testing without compromising reliability

Part I: Apollo Beginnings

USL had as its origin the study of Apollo flight software development. The primary questions were, “What could we do better for future systems?” and “What should we keep doing because we are doing it right?” The team analyzed almost every aspect of the flight software. Naturally, this study reaffirmed some earlier assumptions about systems and software, called into question others, and added new ones.

Apollo was the ideal environment for jump-starting a “never-in-the-box” technology. There was no school to attend, and no established field in which to learn, what today is known as “software engineering” or “systems engineering.” When there were no answers to be found, the team at times just had to make them up, and they had to design things to work the first time. Many on the team were fearless twenty-somethings, and dedication and commitment were a given, but there was no time to be a beginner. Learning was by doing, and a dramatic event would often dictate change.

Because software was a mystery, a black box, upper management gave the team total freedom and trust. Mutual respect was across the board. What would later become foundations for USL enabled the Apollo team to create the software for the trip to the moon.

Because system engineers threw requirements over the wall to software developers, engineers and developers necessarily became interchangeable, as did their life-cycle phases — suggesting that a system is a system, whether in the form of higher-level algorithms, software that implements the algorithms, or systems that execute them. From this perspective, system design issues became one and the same as software, reinforced by the fact that entire missions were tested by software simulations integrating hardware, software, the universe, and humanware (for example, astronauts).

Expect the Unexpected

It quickly became clear that nothing and no one could be expected to be perfect. The team learned to plan accordingly.

Apollo 11 — Just before landing on the moon, onboard software discovered that the CPU was fast approaching overload and there would not be enough time to perform landing functions unless emergency steps were taken. With the software’s global error detection and recovery mechanisms, nominal displays were interrupted with priority alarm displays. Every time the CPU approached overload, the software cleared out its entire queue of processes and restarted its functions, allowing only the highest priority processes to perform until the landing was completed. The source of the error was later found to be the astronaut checklist document instructing the astronaut to place the rendezvous radar hardware switch in the wrong position, thus stealing valuable CPU time. The mechanisms the software used for this emergency were thought by many to have saved the Apollo 11 mission.

Apollo 12 — Just prior to liftoff, lightning struck the spacecraft twice, each time causing a computer power failure. Again, the software restarted the mission functions in time for liftoff.

The flexibility required of these missions could not have been accomplished in real time without an asynchronous, multiprogramming operating system where higher priority processes interrupt lower priority processes. Assigning a unique priority to every function in the software was critical for ensuring that all events would take place in the correct order and at the right time — for example, turning the engine on or off or ensuring that the priority displays would interrupt normal mission sequences in an emergency.

To the team’s surprise, changing from a synchronous OS used in unmanned missions to an asynchronous OS in manned missions supported asynchronous development of the flight software as well. In essence, the development process — a system in itself — inherited the same philosophy of “expect the unexpected” embodied in the system it developed. The team also established that a system-wide “kill and start over again” recompute approach to error detection and recovery was far superior to a point-repair and “pick up from where you left off” approach.
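To make the recovery strategy described for Apollo 11 more concrete, here is a minimal Python sketch of a priority-driven executive that, on overload, kills all queued work and starts over with only a designated set of protected jobs. It is an illustration under stated assumptions and not a model of the actual AGC Executive: the fixed slot count, the `protected` flag, and the job names are all hypothetical.

```python
import heapq


class Executive:
    """A toy priority-driven executive with a fixed number of job slots."""

    def __init__(self, slots: int) -> None:
        self.slots = slots                              # assumed capacity limit
        self.queue: list[tuple[int, int, str]] = []     # (-priority, arrival, name)
        self.protected: dict[str, int] = {}             # jobs re-established on restart
        self.restarts = 0
        self._arrival = 0

    def _push(self, name: str, priority: int) -> None:
        # heapq is a min-heap, so priorities are negated to pop the highest first;
        # the arrival counter keeps equal priorities in first-come order.
        heapq.heappush(self.queue, (-priority, self._arrival, name))
        self._arrival += 1

    def schedule(self, name: str, priority: int, protected: bool = False) -> bool:
        """Queue a job; on overload, kill everything and start over."""
        if protected:
            self.protected[name] = priority
        if len(self.queue) >= self.slots:
            self.restart()
            # Protected work (including this job, if protected) was re-queued
            # by the restart; anything else is dropped.
            return protected
        self._push(name, priority)
        return True

    def restart(self) -> None:
        """System-wide recovery: flush the queue, re-queue only protected jobs."""
        self.restarts += 1
        self.queue.clear()
        for name, priority in self.protected.items():
            self._push(name, priority)

    def run(self) -> list[str]:
        """Execute queued jobs, highest priority first, and report the order."""
        order = []
        while self.queue:
            _, _, name = heapq.heappop(self.queue)
            order.append(name)
        return order


# Hypothetical job names and priorities.
ex = Executive(slots=3)
ex.schedule("guidance", 30, protected=True)
ex.schedule("landing_display", 20, protected=True)
ex.schedule("radar_counter", 5)
accepted = ex.schedule("radar_counter_2", 5)   # no free slot: restart
print(accepted, ex.restarts, ex.run())          # False 1 ['guidance', 'landing_display']
```

The restart discards the low-priority radar work instead of trying to repair the queue in place, in the spirit of the “kill and start over again” approach described above.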

Context-Shifting as Problem-Solving

Often, a problem that at first seemed impossible was eventually solved by changing its context. It seemed unthinkable to define and provide error detection and recovery for every potential cause — for example, the two successive lightning strikes that shut down Apollo 12’s computer systems prior to launch. The solution was to determine general ways in which hardware or software could be affected (for example, by a power outage triggered by one of many causes), reducing the problem to a small, finite number of predictable things to check for.

Moreover, this approach provided new assurances that certain errors could be eliminated early in the life cycle, or even prevented, simply by adding rules applied at definition time. One such rule: always invoke logic by a name assigned directly to it, instead of referring to it relative to other logic (for example, refer to Sally instead of Fred + n). A relative reference breaks if either piece of logic is later moved relative to the other; the reference then points to the wrong logic, possibly with dire consequences. A direct name stays correct wherever the logic is located.
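The naming rule can be illustrated with a small Python sketch in which a program layout assigns each routine an address. The layout model, the helper names, and the later insertion of a routine called `Joe` are hypothetical; the point is only that a relative reference (“Fred + 1”) silently changes meaning when the layout changes, while a reference by name does not.

```python
def lay_out(routines: list[str]) -> dict[str, int]:
    """Assign each routine a start address, one slot per routine (assumed)."""
    return {name: address for address, name in enumerate(routines)}


def routine_at(table: dict[str, int], address: int) -> str:
    """Return whichever routine currently occupies an address."""
    for name, addr in table.items():
        if addr == address:
            return name
    raise LookupError(f"nothing at address {address}")


def call_relative(table: dict[str, int], base: str, offset: int) -> str:
    # "Fred + n": correct only while the layout stays the way it was.
    return routine_at(table, table[base] + offset)


def call_named(table: dict[str, int], name: str) -> str:
    # "Sally": the address is looked up by name every time.
    return routine_at(table, table[name])


# Hypothetical layouts: Joe is inserted during a later change.
before = lay_out(["Fred", "Sally", "Joe"])
after = lay_out(["Fred", "Joe", "Sally"])

print(call_relative(before, "Fred", 1), call_named(before, "Sally"))  # Sally Sally
print(call_relative(after, "Fred", 1), call_named(after, "Sally"))    # Joe Sally
```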

Apollo 14: The Lock Mechanism Dilemma

Better can sometimes become the enemy of good. Lock mechanisms preventing human operators from entering an input error might also eliminate the possibility of fixing an unanticipated problem during a mission by going through the back door.

On Apollo 14, erroneous hardware signals were misleading the software, and it became necessary to manually intervene in real time to “fool” the software so that it would ignore the signals. The change, made at the eleventh hour by the developers working closely with the astronauts through Mission Control, would go against the software specification but would remain consistent with the original intent of the system requirements at large. After two attempts, the new change finally worked in simulations on the ground and was uploaded to the spacecraft, saving the mission with minutes to spare.

Clearly, the team needed a way to “have our cake and eat it too”: built-in lock mechanisms that would not interfere in this kind of emergency.

Fascination with Errors

Because of the never-ending focus on making everything as perfect as possible, there was an ongoing fascination with errors: finding them, detecting and recovering from them, handling them, preventing them, learning from them, learning about systems from them — even defining what an “error” is (or isn’t). The team determined that they could not measure a system’s reliability until they defined a formal, agreed-upon general concept of “error,” along with all of its implications.

Error was defined in terms of system viewpoints (for example, requirements versus specification versus implementation); programs (lunar excursion module versus command module versus commonware); categories (system “glue” versus powered flight); weight (catastrophic versus FLTs, or “funny little things”); how to determine the source or cause of an error (for example, software versus hardware); kind of error (for example, timing); and when an “error” is really an error or a “new feature” (or, for example, if two errors cancel each other, is there an error?). A standard process was developed for recording each error and relating it to its history: in what part of the life cycle it was created and found and, accordingly, what could be done to prevent it in the future.
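One way to picture such an error record is as a data structure with one field per dimension listed above. This Python sketch is an assumption about representation, not the team’s actual recording format; the phase names, field names, and the `prevention_targets` helper are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Viewpoint(Enum):
    REQUIREMENTS = "requirements"
    SPECIFICATION = "specification"
    IMPLEMENTATION = "implementation"


class Weight(Enum):
    CATASTROPHIC = "catastrophic"
    FLT = "funny little thing"


class Phase(Enum):
    # Hypothetical life-cycle phases, numbered in order.
    DEFINITION = 1
    IMPLEMENTATION = 2
    VERIFICATION = 3
    FLIGHT = 4


@dataclass(frozen=True)
class ErrorRecord:
    ident: str
    viewpoint: Viewpoint
    program: str        # e.g. "LEM", "CM", "commonware"
    category: str       # e.g. "system glue", "powered flight"
    weight: Weight
    source: str         # e.g. "software", "hardware"
    kind: str           # e.g. "timing", "interface"
    created_in: Phase
    found_in: Phase

    @property
    def phases_undetected(self) -> int:
        """How many phases the error survived before it was found."""
        return self.found_in.value - self.created_in.value


def prevention_targets(records: list[ErrorRecord]) -> Counter:
    """Tally errors by the phase that created them, i.e. where to prevent them."""
    return Counter(r.created_in for r in records)


# Hypothetical log entries.
log = [
    ErrorRecord("E1", Viewpoint.SPECIFICATION, "LEM", "system glue",
                Weight.FLT, "software", "interface",
                Phase.DEFINITION, Phase.VERIFICATION),
    ErrorRecord("E2", Viewpoint.IMPLEMENTATION, "CM", "powered flight",
                Weight.CATASTROPHIC, "software", "timing",
                Phase.IMPLEMENTATION, Phase.VERIFICATION),
]
print(prevention_targets(log)[Phase.DEFINITION], log[0].phases_undetected)  # 1 2
```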

The Interface Error Finding

Earlier ideas for a systems technology began to surface as the team analyzed the kinds and causes of software problems found during verification and validation (V&V) testing of the Apollo onboard software. Because of Apollo’s software design and development processes, at the outset the team faced the likelihood of introducing almost any conceivable error — in hindsight, a blessing in disguise. This was due in part to size constraints in the hardware, which made it necessary for mission phases to share erasable memory. In addition, the flight software for each mission was developed concurrently with flight software for other missions, along with mission planning, hardware integration, simulators, and astronaut training — underscoring how much the software was part of a larger system.

Additional findings from the error analysis:

Although half of the billions of dollars spent on the life cycle was devoted to simulation, 44 percent of errors were found by manual means — referred to on the project as the “Augekugal method” (eyeballing) or “Nortonizing” (named after the person who perfected this technique).
60 percent of errors found during V&V had unwittingly existed in previous flights — showing how subtle they were — though, fortunately, no software errors surfaced during actual flights.
More automation was needed, especially static as opposed to dynamic analysis.

The interface errors were analyzed in greater detail first because they not only accounted for the majority of errors but also were often the most subtle and most difficult to find. Each interface error was placed into a category identifying the means to prevent it by way of system definition. This process led to a set of axioms forming the basis for a new mathematical theory for designing systems that would, among other things, eliminate the entire class of interface errors just by the way a system is defined.

Lessons That Continue

Given the ongoing evaluation of the Apollo effort, it became clear that a new kind of language was needed and that the mathematical theory could provide its core. Results of the analysis took on many dimensions, not just for space missions but for applications in general, and not just for software but for systems in general.

Lessons learned from this effort continue today: Systems are asynchronous, distributed, and event-driven in nature, and this should be reflected inherently in the language used to define them and the tools used to build them. This implies that a system’s definition should characterize natural behavior in terms of real-time execution semantics, and designers should no longer need to explicitly define schedules of when events are to occur. Instead, events should occur when objects interact with other objects so that by defining such interactions the schedule of events is inherently defined.
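The idea that a schedule can emerge from interactions, rather than be written down, can be sketched as a simple dataflow executor in Python: an action fires as soon as the values it needs exist. This is a hypothetical illustration, not USL’s execution semantics; the action names and values are invented.

```python
from typing import Callable

# An action: (name, names of inputs it needs, name of output it produces, function).
Action = tuple[str, tuple[str, ...], str, Callable[..., object]]


def run_dataflow(actions: list[Action], initial: dict[str, object]):
    """Fire each action as soon as its inputs exist; no schedule is written down."""
    values = dict(initial)
    pending = list(actions)
    rounds: list[list[str]] = []    # one entry per round of independent firings
    while pending:
        ready = [a for a in pending if all(i in values for i in a[1])]
        if not ready:
            raise RuntimeError("some inputs can never be produced")
        for name, inputs, output, fn in ready:
            values[output] = fn(*(values[i] for i in inputs))
        rounds.append([a[0] for a in ready])
        pending = [a for a in pending if a not in ready]
    return rounds, values


# Hypothetical actions, deliberately listed out of execution order.
actions: list[Action] = [
    ("display", ("velocity",), "shown", lambda v: f"V={v}"),
    ("integrate", ("accel", "dt"), "velocity", lambda a, dt: a * dt),
    ("read_imu", (), "accel", lambda: 2.0),
    ("read_clock", (), "dt", lambda: 0.5),
]
rounds, values = run_dataflow(actions, {})
print(rounds)            # [['read_imu', 'read_clock'], ['integrate'], ['display']]
print(values["shown"])   # V=1.0
```

The firing order follows from data dependencies alone. Actions that fire in the same round use nothing produced by each other, so they could run concurrently.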

Most important, it became clear that the root problem with traditional approaches is that they support users in “fixing up wrong things” rather than in “doing things the right way in the first place.”

Part II: Universal Systems Language

USL captures the lessons learned from Apollo. What sets USL apart is the systems paradigm on which it is based. Whereas the traditional software development approach is curative, testing for errors late into the life cycle, USL’s development-before-the-fact philosophy is preventive, not allowing errors in the first place. Correctness is accomplished by the very way a system is defined, by built-in language properties inherent in the grammar.

A USL definition models both its application (for example, an avionics or banking system) and properties of control into its own life cycle. Each SOO definition has built-in constraints that support the designer and developer, yet they do not take away flexibility in fulfilling requirements. A SOO inherently integrates all aspects of a system (for example, function-, object-, and timing-oriented). Every system is an object, every object a system.

Formal languages tend not to be friendly or practical, and friendly or practical languages tend not to be formal; USL’s users, by contrast, consider it formal, practical, and friendly. Unlike other mathematically based formal methods, USL extends traditional mathematics with a unique concept of control: universal real-world properties, such as those related to time and space, are internal to its grammar, enabling USL to support the definition and realization of any kind or size of system. The formalism, along with its unfriendliness, is “hidden” by language mechanisms derived in terms of that formalism.

General Systems Theory

A formalism for representing the mathematics of systems, USL is based on a set of axioms of a general systems theory and formal rules for their application. All representations of a system are defined in terms of a function map (FMap) and a type map (TMap). Every SOO is defined in terms of a set of FMaps and TMaps.

Each map is specified by three primitive structures derived from the set of axioms, together with nonprimitive structures derived ultimately in terms of the primitive ones. Primitive functions, corresponding to primitive operations on types defined in a TMap, reside at the bottom nodes of an FMap. Primitive types, each defined by its own set of axioms, reside at the bottom nodes of a TMap. Each primitive function (or type) can be realized as a top node of a map on a lower (more concrete) layer of the system.
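A map can be pictured as a tree whose interior nodes name a control structure and whose leaves should eventually be primitive operations. The following Python sketch assumes that representation; the `Node` class, the two checks in `problems`, and the node names are hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, Optional


class Structure(Enum):
    JOIN = "Join"
    INCLUDE = "Include"
    OR = "Or"


@dataclass
class Node:
    name: str
    structure: Optional[Structure] = None      # None marks a leaf
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> Iterator["Node"]:
        if not self.children:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


def problems(fmap: Node, primitives: set[str]) -> list[str]:
    """Report nodes that break two simple map rules (assumed for illustration)."""
    issues = []
    stack = [fmap]
    while stack:
        node = stack.pop()
        if node.children and node.structure is None:
            issues.append(f"{node.name}: children but no control structure")
        stack.extend(node.children)
    for leaf in fmap.leaves():
        if leaf.name not in primitives:
            issues.append(f"{leaf.name}: not yet a primitive operation")
    return issues


# Hypothetical FMap.
trip = Node("GetFoodAndWater", Structure.INCLUDE, [
    Node("GetFood", Structure.JOIN, [Node("WalkToStore"), Node("Buy")]),
    Node("GetWater"),
])
print(problems(trip, {"WalkToStore", "Buy"}))
# ['GetWater: not yet a primitive operation']
```

A leaf flagged this way would be decomposed further on the same layer or realized as the top node of a map on a lower layer.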

Six Axioms of Control

We must visualize a system definition both by what it does (level by level: a parent node in a hierarchy is on a higher level than its child nodes) and by how it does it (layer by layer: a specification is on a higher layer than its implementation). However, a hierarchical definition cannot be relied on unless explicit rules ensure that each decomposition is valid.

At the base of every USL system is a set of six axioms — universally recognized truths — and the assumption of a universal set of objects. The axioms provide the formal foundation for a USL “hierarchy” — referred to as a map, which is a tree of control that spans networks of relations between objects. Explicit rules for defining a map have been derived from the axioms, where — among other things — structure, behavior, and their integration are captured.

Resident at every node on a map is the same kind of object (for example, a function on every node of an FMap and a type on a TMap). The object at each node plays multiple roles; for example, the object can serve as a parent (in control of its children) or a child (being controlled by its parent).

Each axiom defines a relation of immediate domination of a parent over its children. The union of these relations is control. Among other things, the axioms establish the relationships of an object for:

Invocation in time and space
Input and output (domain and codomain)
Input access rights and output access rights
Error detection and recovery
Ordering during developmental and operational states

Every system can ultimately be defined in terms of three primitive control structures, each derived from the six axioms — resulting in a universal semantics for defining systems.

The Three Primitive Control Structures

A structure relates each parent and its children according to the set of rules derived from the axioms of control. A primitive structure provides a relationship of the most primitive form (finest grain) of control. All maps are defined ultimately in terms of the primitive structures and therefore abide by the rules associated with each structure:

Structure	Relationship	Description
Join	Dependent	Children are ordered sequentially
Include	Independent	Children can execute in parallel
Or	Decision-making	Selection among alternatives

The three primitive control structures and their rules form a universal foundation for constructing maps in the domains of time and space as FMaps and TMaps.
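The behavior of the three structures can be approximated with ordinary function combinators, in the spirit of the Higher Order Software work listed in the references: a Join composes its children sequentially, an Include splits the input between independent children, and an Or selects exactly one child. This Python sketch is an approximation that ignores USL’s other control properties, such as priorities and error handling; the names `join`, `include`, and `or_` are hypothetical.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")
D = TypeVar("D")


# Hypothetical combinators standing in for Join, Include, and Or.
def join(first: Callable[[A], B], second: Callable[[B], C]) -> Callable[[A], C]:
    """Dependent: the second child consumes what the first child produces."""
    return lambda x: second(first(x))


def include(left: Callable[[A], B],
            right: Callable[[C], D]) -> Callable[[tuple[A, C]], tuple[B, D]]:
    """Independent: each child sees only its own part of the input."""
    return lambda pair: (left(pair[0]), right(pair[1]))


def or_(condition: Callable[[A], bool], when_true: Callable[[A], B],
        when_false: Callable[[A], B]) -> Callable[[A], B]:
    """Decision: exactly one child is chosen, based on the input."""
    return lambda x: when_true(x) if condition(x) else when_false(x)


absolute = or_(lambda x: x < 0, lambda x: -x, lambda x: x)
scaled_abs = join(absolute, lambda y: y * 10)
both = include(scaled_abs, str.upper)

print(both((-3, "lem")))   # (30, 'LEM')
```

Because neither child of `include` can see the other’s data, the two calls could run in parallel; `join` fixes an order only because its second child needs the first child’s output.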

Because it is defined in terms of these structures, every SOO has control properties, inherently providing seamless integration, maximizing its own reliability and flexibility to change, capitalizing on its own parallelism, and maximizing the potential for its own reuse and automation. The structures ensure that all interface errors — approximately 75 to 90 percent of all errors normally found during testing in a traditional development — are eliminated at the definition phase.
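The claim that interface errors are caught at definition time can be illustrated by giving each function a declared domain and codomain and checking them when structures are composed, before anything executes. In this Python sketch, types are reduced to names and only the shape of each interface is checked; USL’s actual rules cover more (for example, access rights), and `Fn`, `InterfaceError`, and the function names are hypothetical.

```python
from dataclasses import dataclass


class InterfaceError(Exception):
    """Raised when two pieces of a definition do not fit together."""


@dataclass(frozen=True)
class Fn:
    name: str
    domain: tuple[str, ...]      # input types, by name
    codomain: tuple[str, ...]    # output types, by name


def join(name: str, first: Fn, second: Fn) -> Fn:
    # Dependent children: what the first produces must be what the second takes.
    if first.codomain != second.domain:
        raise InterfaceError(
            f"{name}: {first.name} produces {first.codomain}, "
            f"but {second.name} expects {second.domain}")
    return Fn(name, first.domain, second.codomain)


def include(name: str, left: Fn, right: Fn) -> Fn:
    # Independent children: the parent's interface is the two side by side.
    return Fn(name, left.domain + right.domain, left.codomain + right.codomain)


def or_(name: str, when_true: Fn, when_false: Fn) -> Fn:
    # Alternatives: either one may run, so both must share the parent's interface.
    if (when_true.domain, when_true.codomain) != (when_false.domain, when_false.codomain):
        raise InterfaceError(f"{name}: alternatives {when_true.name} and "
                             f"{when_false.name} have different interfaces")
    return Fn(name, when_true.domain, when_true.codomain)


# Hypothetical functions.
read = Fn("ReadAccel", ("Time",), ("Accel",))
integrate = Fn("Integrate", ("Accel",), ("Velocity",))
print(join("Update", read, integrate))
# Fn(name='Update', domain=('Time',), codomain=('Velocity',))

try:
    join("Broken", integrate, read)      # rejected before anything runs
except InterfaceError as err:
    print(err)
```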

Although SOOs have properties for systems in general, the properties have special significance for the real-time, distributed behavior of systems: Each system is event-interrupt-driven; each object state is traceable, reconfigurable, and has a unique priority; independencies and dependencies can readily be detected (manually or automatically) and used to determine where parallel and distributed processing are most beneficial.
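Because Include children are independent by construction, a map can be analyzed for parallelism mechanically. This Python sketch assumes a cost for each leaf, treats Join as sequential, Include as parallelizable, and Or as its worst-case alternative, and ignores communication overhead; the node names and costs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    structure: Optional[str] = None   # "Join", "Include", or "Or"; None for a leaf
    children: list["Node"] = field(default_factory=list)
    cost: int = 1                     # hypothetical running time of a leaf


def sequential_time(node: Node) -> int:
    """Time on one processor (worst case over Or alternatives)."""
    if not node.children:
        return node.cost
    times = [sequential_time(c) for c in node.children]
    return max(times) if node.structure == "Or" else sum(times)


def parallel_time(node: Node) -> int:
    """Time with unlimited processors: only Join forces waiting."""
    if not node.children:
        return node.cost
    times = [parallel_time(c) for c in node.children]
    return sum(times) if node.structure == "Join" else max(times)


def parallel_candidates(node: Node) -> list[str]:
    """Include nodes: their children are independent by construction."""
    found = [node.name] if node.structure == "Include" else []
    for child in node.children:
        found += parallel_candidates(child)
    return found


# Hypothetical map.
landing = Node("LandingCycle", "Join", [
    Node("Sense", "Include", [Node("ReadRadar", cost=3), Node("ReadIMU", cost=2)]),
    Node("Steer", cost=1),
])
print(sequential_time(landing), parallel_time(landing), parallel_candidates(landing))
# 6 4 ['Sense']
```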

FMaps and TMaps: Definition and Execution Space

All functions in a system and their relationships are defined with a set of FMaps. Similarly, all types in a system and their relationships are defined with a set of TMaps.

FMaps (Function Maps) represent the dynamic (doing) world of action by capturing functional and temporal (including priority) characteristics. FMaps define, integrate, and control the transition of objects from one state to another.
TMaps (Type Maps) represent the static (being) world of objects by capturing spatial characteristics — for example, containment of one object by another or relationships between locations of objects in space. TMaps define, integrate, and control the potential atemporal relations between states of objects.

FMaps and TMaps depend on and reuse each other. The primitive operations that belong to types on a TMap used by FMaps within the same layer are themselves defined with FMaps on the TMap’s implementation layer and therefore rely on another layer’s TMaps. Thus, because an FMap depends on TMaps, it depends on another layer’s FMaps; similarly, because a TMap depends on another layer’s FMaps, it too depends on another layer’s TMaps. Functions depend on types, types on functions. In other words, FMaps and TMaps recursively reuse each other, layer by layer.

The definition and execution space of a SOO shows an FMap and its TMaps and their instantiation in terms of an EMap and its OMaps over time and space.

A SOO is realized (that is, has all of its values instantiated for a particular performance pass) in terms of an Execution Map (EMap) of actions, an instantiation of an FMap, and its Object Maps (OMaps), instantiations of the FMap’s TMaps. The figure above depicts a SOO’s definition and execution space, showing a person’s house and a path that he can take from the house to get food and water. In this figure, an alternative syntax is used to define FMaps: function(domain)structure=codomain instead of function(domain;codomain)structure.

Layered Reuse

Each layer of TMaps and FMaps becomes itself a reusable system to the layer immediately above it, which itself is a system layer. Application domains are separated into layers of reuse in which the primitive types of one layer are implemented in terms of reusable FMaps and TMaps of one or more lower-level layers of detail.
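Layered reuse can be sketched in Python by treating a layer as a table of operations, each built only from the operations of the layer directly below. The three layers, their operation names, and the `realize` helper are hypothetical; the sketch shows the dependency pattern, not how the 001 Tool Suite realizes layers.

```python
from typing import Callable

Op = Callable[..., object]
Layer = dict[str, Op]


def realize(below: Layer, definitions: dict[str, Callable[[Layer], Op]]) -> Layer:
    """Build a layer whose primitive operations are implemented by the layer below."""
    return {name: build(below) for name, build in definitions.items()}


# Bottom layer: operations treated as given (here, Python integer arithmetic).
machine: Layer = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

# Middle layer: hypothetical "Money" operations, realized with the machine layer.
money = realize(machine, {
    "deposit": lambda L: lambda balance, amount: L["add"](balance, amount),
    "scale": lambda L: lambda balance, factor: L["mul"](balance, factor),
})

# Top layer: a hypothetical application operation that only sees the Money layer.
account = realize(money, {
    "double_and_deposit": lambda L: (
        lambda balance, amount: L["deposit"](L["scale"](balance, 2), amount)),
})

print(account["double_and_deposit"](5, 10))   # 20
```

The top layer never refers to `add` or `mul` directly, so the machine layer could be replaced without touching the account definition, provided `deposit` and `scale` keep their meaning.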

An FMap is completed when all its leaf nodes (or leaf nodes of the FMaps it uses) are recursive leaf nodes or are primitive function leaf nodes that use primitive operations of types in the TMaps. A recursive leaf node has the name and functionality of an ancestor of its parent. The recursive reuse pattern has an Or primitive structure decision node between each of its recursive leaf nodes (each with some input different from the ancestor’s input) and the ancestor node. For a recursion to always be able to terminate, at least one of the Or structure’s alternatives must not be, and must not have a descendant that is, a recursive leaf of the ancestor.
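The termination rule can be checked mechanically on a tree representation of an FMap. In this Python sketch, each recursive leaf records which ancestor it reuses, and `can_terminate` verifies that every such leaf lies below an Or node offering at least one alternative free of recursive leaves. It checks only this structural condition, not whether the recursive input actually differs from the ancestor’s; structures may have more than two children here for brevity, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    structure: Optional[str] = None      # "Join", "Include", "Or"; None for a leaf
    children: list["Node"] = field(default_factory=list)
    recurses_to: Optional[str] = None    # on a recursive leaf: the ancestor reused


def mentions(node: Node, target: str) -> bool:
    """Is `node`, or any descendant, a recursive leaf of `target`?"""
    return node.recurses_to == target or any(mentions(c, target) for c in node.children)


def can_terminate(ancestor: Node) -> bool:
    """Check that every recursive leaf of `ancestor` sits below an Or that
    has at least one alternative free of such leaves (an exit)."""
    def check(node: Node, exit_seen: bool) -> bool:
        if node.recurses_to == ancestor.name:
            return exit_seen
        if node.structure == "Or":
            exit_seen = exit_seen or any(not mentions(c, ancestor.name)
                                         for c in node.children)
        return all(check(c, exit_seen) for c in node.children)
    return check(ancestor, False)


# Hypothetical maps.
factorial = Node("Factorial", "Or", [
    Node("ReturnOne"),                                  # exit: no recursion here
    Node("Step", "Join", [
        Node("Decrement"),
        Node("Factorial", recurses_to="Factorial"),     # reuse with a smaller input
        Node("Multiply"),
    ]),
])
runaway = Node("Loop", "Join", [Node("Work"), Node("Loop", recurses_to="Loop")])
print(can_terminate(factorial), can_terminate(runaway))  # True False
```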

Object Control and Access Rights

TMaps provide universal primitive operations for controlling objects and object states (for example, type Any) that are inherited by all types of objects. They offer a means to create, destroy, copy, reference, locate, access a value, detect and recover from errors, access the type of an object, and access instances of a type.

Two rules are critical. First, a reference to an object’s state cannot be modified if other references are being (or could be) made to that state. Second, reject values exist in all types, signifying error conditions.
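Both rules can be illustrated with a small Python sketch: a state that refuses modification while another reference to it is open, and operations that return a reject value instead of raising. The runtime check is only an illustration; USL is described as enforcing such rules by the way a system is defined. The `State`, `Reject`, and `divide` names are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator


class Reject:
    """Hypothetical stand-in for a reject value: an in-band result meaning 'error'."""

    def __repr__(self) -> str:
        return "REJECT"


REJECT = Reject()


class State:
    """An object state that may only be modified while no one else refers to it."""

    def __init__(self, value: object) -> None:
        self._value = value
        self._readers = 0

    @contextmanager
    def read(self) -> Iterator[object]:
        # While this block is open, another reference to the state exists.
        self._readers += 1
        try:
            yield self._value
        finally:
            self._readers -= 1

    def modify(self, new_value: object) -> object:
        """Replace the value, or return REJECT if the state is shared."""
        if self._readers > 0:
            return REJECT
        self._value = new_value
        return new_value


def divide(a: float, b: float) -> object:
    """Division whose result type includes REJECT instead of raising."""
    return REJECT if b == 0 else a / b


# Hypothetical usage.
altitude = State(1000)
with altitude.read() as seen:
    print(seen, altitude.modify(900))       # 1000 REJECT
print(altitude.modify(900), divide(1, 0))   # 900 REJECT
```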

The 001 Tool Suite

Given a set of FMaps and TMaps, the 001 Tool Suite (USL’s automation environment) can generate much of the design and all of the RMaps, perform requirements analysis, and simulate and observe a system’s behavior as it is being executed in terms of EMaps and OMaps. For software, 001 can use the same FMaps and TMaps to automatically generate all of the code including its documentation.

The developer never needs to change the code: application changes are made to the specification, and target architecture changes are made to the configuration for the generator environment. Only the changed part of the system is regenerated and integrated with the rest of the application; again, the system is automatically analyzed, generated, compiled, linked, and executed without manual intervention.

Just as with the systems it is used to develop, 001 is completely defined with itself, using USL, and is completely and automatically generated with itself. It therefore has the same before-the-fact properties that all USL systems have.

USL as Formal Foundation for Other Languages

Diverse mappings (several automated) exist that go from a given syntax and semantics to USL or from USL to one of a possible set of syntactical forms (and semantics). The USL team performed an analysis of how the USL formal semantics could provide SysML/UML2 with a universal system formalism that can reduce semantic ambiguity in the OMG SysML specification and significantly simplify the UML2 specification standard.

Conclusion

Most of today’s systems are defined with languages originally intended for software. These systems are built using a programming or specification language created specifically for a computer — a syntax-first, syntax-dependent approach. USL, based on a formal systems theory derived from real-world systems — a semantics-first, syntax-independent approach — was originally created for defining systems in general, where the goal was to combine mathematical perfection with engineering precision.

The team inadvertently discovered during the Apollo error study that there was a universal way to prevent errors by the way a system is defined, addressing the issue of reliability head on. While searching for mechanisms to define error-free systems, they unexpectedly found patterns with properties that addressed other issues as well. Among other things, these patterns, which are always present in FMaps and TMaps, inherently support asynchronous and distributed behavior within all objects.

Whereas on Apollo it was necessary to manually program the scheduling of the processes and the assignment of priorities to each function to capitalize on the asynchronous operating system, FMaps and TMaps inherently make this happen from the beginning, starting in the models themselves. What was previously a manual life cycle process can now be automated. Further, because USL maximizes inherent reuse, the larger and more complex a system, the higher the productivity.

Although the Apollo software was developed long ago, reflection on its lessons continues. The hope is that its legacy will endure. The goal is that the systems of today inherit the best of yesterday, and systems of tomorrow inherit the best of today.

References

M. Hamilton, “Inside Development Before the Fact,” Electronic Design, Apr. 1994
M. Hamilton and W.R. Hackler, “Universal Systems Language for Preventative Systems Engineering,” Proc. 5th Ann. Conf. Systems Eng. Res. (CSER), Stevens Institute of Technology, Mar. 2007, paper #36
M. Hamilton, “Zero-Defect Software: The Elusive Goal,” IEEE Spectrum, Mar. 1986, pp. 48-53
M. Hamilton, “The Heart and Soul of Apollo: Doing It Right the First Time,” Proc. 7th Int’l MAPLD Conf., NASA Office of Logic Design, paper S216, 2004
M. Hamilton and W.R. Hackler, “Reducing Complexity: It Takes a Language,” Innovations in Systems and Software Eng. J., NASA, Springer Verlag, 2009
M. Hamilton, Shuttle Management Memo #14, Charles Stark Draper Laboratory, 1972
M. Hamilton, What Is an Error?, tech. note, HTI, 1991
M. Hamilton and S. Zeldin, “Higher Order Software — A Methodology for Defining Software,” IEEE Trans. Software Eng., vol. SE-2, no. 1, Mar. 1976, pp. 9-32
B. Krut Jr., Integrating 001 Tool Support in the Feature-Oriented Domain Analysis Methodology, CMU/SEI-93-TR-11, 1993
M. Ouyang and M.W. Golay, An Integrated Formal Approach for Prototyping High-Quality Software of Safety-Critical Systems, MIT-ANP-TR-035, 1995
Software Productivity Consortium, “Object-Oriented Methods and Tools Survey,” SPC-98022-MC, Dec. 1998
M. Hamilton and W.R. Hackler, “Deeply Integrated Guidance Navigation Unit (DI-GNU) Common Software Architecture Principles,” Picatinny Arsenal, NJ, 2003-2004
J. Keyes, Internet Management, chapt. 30-33 on 001-Developed Systems for the Internet, Auerbach, 2000
HTI, 001 Tool Suite (1986-2008)
M. Hamilton, “Development Before the Fact in Action,” Electronic Design, June 1994
D. Bolinger and D.A. Sears, Aspects of Language, Harcourt Brace Jovanovich, 1981, p. 109
M. Hamilton and W.R. Hackler, “Towards Cost Effective and Timely End-to-End Testing,” HTI, prepared for Army Research Lab, 2000
S. Cushing, “A Note on Arrows and Control Structures: Category Theory and HOS,” 1978
S. Friedenthal, A. Moore, and A. Steiner, “OMG Systems Modeling Language (OMG SysML) Tutorial,” INCOSE 2006
M. Hamilton and W.R. Hackler, “A Formal Universal Systems Semantics for SysML,” INCOSE 2007, paper #8.3.2
Object Management Group, “Systems Modeling Language,” v. 1.0, 2006
Department of Defense, “National Test Bed Software Engineering Tools Experiment — Final Report,” vol. 1, Oct. 1992
M. Schindler, Computer-Aided Software Design, John Wiley & Sons, 1990

The Apollo On-Board Flight Software (2019) — Hamilton’s later retrospective covers much of the same Apollo narrative but without the USL formalization. Reading both shows how she frames the same events for different audiences.
Heart & Soul of Apollo (2004) — MAPLD presentation referenced as citation [4] in this paper. Likely contains visual materials that complement the formal treatment here.
Colossus Erasable Memory (1972) — The actual COLOSSUS erasable memory programs are the concrete software artifact behind the shared-memory interface error problem that drove USL’s development.
Skylark GSOP (1972) — Hamilton’s team’s Skylab specifications, produced during the same period as the error analysis described in this paper.
