What the Errors Tell Us
Margaret H. Hamilton, Hamilton Technologies, Inc., 2018
The paper opens with Hamilton’s personal journey into computing. It is a first-person narrative unlike her earlier technical publications, and this shift in register is deliberate: the IEEE Software anniversary issue was an occasion for reflection, and Hamilton uses it to trace the entire arc from her earliest encounters with errors to the mature formal theory those errors taught her to build.
Early Days: The LGP-30 and SAGE
Hamilton’s first assignment was creating weather-prediction software in hexadecimal on the LGP-30 for Edward Lorenz at MIT. The priority was understanding the hardware-software relationship at the deepest level. Debugging was so laborious that the “solution” was modifying the binary paper tape directly: poking holes with a pencil to change a 0 to a 1, or covering holes with Scotch Tape to change a 1 to a 0. Hamilton recognized even then that this approach, what she called “hacking,” was fundamentally error-prone.
Her next assignment was on the SAGE air defense system, developing software on the first AN/FSQ-7 computer (the XD-1) to search for unfriendly aircraft. At SAGE, errors were impossible to hide: when the machine crashed, siren and foghorn sounds echoed throughout the building, and the guilty programmer could be found standing at the console. Debug information consisted of a foot-long register of flashing lights whose contents you copied down on paper.
Hamilton innovated in unexpected ways at SAGE. She took Polaroid pictures of each programmer posing with their bug — the pictures grew more creative as time went on. She discovered debugging by sound: an operator once called at 4 AM saying “your program no longer sounds like a seashore.” These experiences shaped a career-long fascination: “I began to find more ways to understand what made a particular error or a class of errors happen as well as how to prevent it from happening in the future.”
Apollo Onboard Flight Software
The challenge at MIT’s Instrumentation Laboratory was direct: build human-rated software, meaning astronauts’ lives were at stake. The software had to work, and it had to work the first time.
Hamilton’s account of the Apollo 11 incident is told here through the lens of error categorization. Before landing on the moon, the AGC became overtaxed because the rendezvous radar switch had been left in the wrong position — an error in the checklist document, not in the software. The Display Interface Routines’ Priority Displays (1201 and 1202 alarms) interrupted the astronauts’ normal mission displays, warning of the emergency, letting Mission Control understand the situation, and alerting the astronauts to correct the switch. The software was not merely detecting an error; it was compensating for one, shedding lower-priority tasks and re-establishing the critical ones needed for landing.
The Apollo software architecture is described with particular attention to its asynchronous nature. Every process had a unique priority, ensuring correct ordering in time relative to everything else. The flight software and the astronauts became “parallel processes within a distributed system-of-systems environment” — a formulation Hamilton considers one of the most important conceptual advances from Apollo. Error detection and recovery included a system-wide “kill and recompute from a safe place” restart approach, combined with priority displays and human-in-the-loop capabilities.
Hamilton emphasizes the management discipline: updates were continuously submitted from hundreds of people across many releases for concurrent missions. “Everything needed to play together like a finely tuned orchestra, making sure there were no interface errors.” The Assembly Control Supervisor role — a dedicated person whose job was to manually eyeball all code for interface errors and coding rule violations — kept the system coherent.
The Preventative Paradigm
With NASA and DoD funding, Hamilton’s team performed a systematic empirical study of the Apollo effort. The subject of errors, she writes, “took on a life of its own.” Her personal note carries a candor rare in technical literature: “I had the opportunity to have some responsibility in the making of many of these errors, without which we would not have been able to learn as much as we did.”
From the error analysis a general systems theory was derived; its axioms yielded “a set of allowable patterns” that became the basis for the Universal Systems Language (USL) and its automation, Development Before the Fact.
USL differs fundamentally from traditional programming languages. Instead of telling the computer what to do, the designer defines “all the system’s relationships” — the what. Every system is defined in terms of Function Maps (FMaps) and Type Maps (TMaps). FMaps capture functional, temporal, and priority characteristics. TMaps capture type, spatial, and structural characteristics. Three primitive control structures — Join (dependent), Include (independent), and Or (decision-making) — provide the universal building blocks.
The paper’s single figure presents a robot exploration system defined with FMap RunRobot and TMap Robot. This example is carefully chosen: not an avionics system but an autonomous robot with a reactive sensorimotor memory map, using the distributed independent set (dIset) TMap structure. The choice signals USL’s applicability to autonomous systems and contemporary robotics.
The 75% Finding, Matured
Hamilton’s career-spanning claim reaches its most definitive form here: “The majority of errors, including all interface errors (at least 75% of, and the most subtle of, all errors), are not allowed into a system in the first place, by the way it is defined.”
The automation capabilities of USL’s 001 Tool Suite work in concert with this preventative approach. Correct use of USL eliminates the majority of errors by construction. The 001 Analyzer hunts down any errors from incorrect USL use. Code, documentation, and even robot commands are automatically generated. When a type changes, all impacted functional uses are identified and reanalyzed automatically. The developer never needs to change the code — application changes are made to the USL definition, architecture changes to the generator configuration. Only changed parts are regenerated and integrated, and the system is automatically analyzed, generated, compiled, linked, and executed without manual intervention.
Closing Reflection
Hamilton closes with a frank assessment: “Many of the pressing software issues that existed in the earlier days still exist today,” attributing this largely to the persistence of the traditional curative paradigm. She notes the counterintuitive property of preventative systems: “the more reliable a system, the higher the productivity in its lifecycle.”
The final sentence captures both the paper’s thesis and Hamilton’s life’s work: “The errors not only tell us how to build systems without them but also unexpectedly gave us a paradigm for the future. Educating people how to think and build systems in terms of the paradigm becomes the next challenge.”
References
- M. Hamilton, “Inside Development Before the Fact,” Electronic Design, Apr. 1994
- M. Hamilton, “The Language as a Software Engineer,” ICSE 2018 (keynote)
- M. Hamilton, “Computer Got Loaded,” letter to the editor, Datamation, Mar. 1971
- M. Hamilton, “A Demonstration of USL and Its Automation, the 001 Tool Suite” (tax example), htius.com
- M. Hamilton and W.R. Hackler, “Universal Systems Language: Lessons Learned from Apollo,” IEEE Computer, Dec. 2008
- M. Hamilton and W.R. Hackler, “Universal Systems Language for Preventative Systems Engineering,” CSER 2007
- M. Hamilton, “Zero-Defect Software: The Elusive Goal,” IEEE Spectrum, Mar. 1986
- M. Hamilton, “The Heart and Soul of Apollo,” MAPLD 2004
- M. Hamilton and W.R. Hackler, “Reducing Complexity: It Takes a Language,” ISSE J., 2009
- M. Hamilton and W.R. Hackler, “A Formal Universal Systems Semantics for SysML,” INCOSE 2007
- M. Hamilton, “USL and its Automation, the 001 Tool Suite,” IEEE/Lockheed Martin Webinar, 2012
- M. Hamilton and W.R. Hackler, “Towards Cost Effective and Timely End-to-End Testing,” HTI, 2000
Related Documents
- “Computer Got Loaded” (1971) — Hamilton’s earliest published account of the Apollo 11 incident, cited as reference 3 in this paper. The 2018 paper retells the same story through the mature error taxonomy lens.
- Higher Order Software (1976) — The foundational axioms paper. The “set of allowable patterns” that Hamilton references as the basis for USL originates here.
- USL: Lessons Learned from Apollo (2008) — Cited as reference 5. The most technically detailed presentation of USL’s formal structures, complementing this paper’s narrative approach.
- The Apollo On-Board Flight Software (2019) — Hamilton’s retrospective covering much of the same Apollo narrative, without this paper’s USL formalization and error-theory framing.
- Preventative Software Systems (1994) — Where “Development Before the Fact” was formally named. The four-level preventative hierarchy articulated there is the seed of this paper’s mature taxonomy.
- USL for Preventative Systems Engineering (2007) — Cited as reference 6. The CSER paper provides the formal technical detail behind USL’s preventative approach that this paper summarizes narratively.
- The 001 Tool Suite: Evolution of Automation — The tool chain whose capabilities this paper describes in its most compressed form.