Computer Science: The coding error that blew up a rocket

Ben, Owl Tutor

Computer Science

March 26th, 2024

In this article, experienced Computer Science teacher Ben explores the consequences of coding errors.

On the 4th of June 1996, the European Space Agency’s (ESA) Ariane 5 rocket exploded on her maiden voyage at an altitude of about 3,700 meters, roughly 40 seconds after launch. It was one of the most expensive software bugs in history, costing over US$370 million.

So, what happened?

Programmers often reuse code on similar projects. The Ariane 5 software developers reused code from the Ariane 4, which had been developed 10 years before. The code that caused the problem was in the Inertial Reference System and was responsible for aligning the rocket before take-off. As this software was designed to be used before take-off, it should have been inactive during flight. However, the developers had let it keep running for a short time after lift-off to allow for delays to the launch time, meaning this code was still running when it no longer needed to be. In addition, the Ariane 5 had a significantly higher horizontal velocity than the Ariane 4, where the code had run without problems, even during flight.

Computers store numbers in binary. The code that caused the problem converted a 64-bit floating-point number into a 16-bit signed integer.

A 64-bit floating-point number (in the common IEEE 754 double-precision format) can represent values with magnitudes up to about 1.8 × 10^308, which is roughly 18 followed by 307 zeros, and it can also hold fractions. A 16-bit signed integer can only store whole numbers from −32,768 to 32,767.
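The gap between the two ranges can be checked directly. The short Python sketch below is only an illustration: the constant names are made up for this example, and it assumes Python's float is a 64-bit double, which is the case on mainstream platforms.

```python
import sys

# A 16-bit signed integer has 1 sign bit and 15 value bits.
INT16_MIN = -(2 ** 15)
INT16_MAX = 2 ** 15 - 1

# Largest finite value of Python's float (assumed to be a 64-bit double).
FLOAT64_MAX = sys.float_info.max

print("16-bit signed integer range:", INT16_MIN, "to", INT16_MAX)
print("Largest 64-bit float:", FLOAT64_MAX)
```

The largest float is hundreds of orders of magnitude bigger than the largest 16-bit integer, so a conversion between the two is only safe when the value is known to be small.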

The code that caused the problem was supposed to convert the 64-bit floating-point value into a number that the 16-bit signed integer could store. However, with the additional velocity of the Ariane 5 over the Ariane 4, the value grew larger than 32,767, the biggest number a 16-bit signed integer can hold. This led to an integer overflow, which is what happens when a program tries to store a number in a space that is not large enough to hold it.
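To make the failure concrete, here is a minimal Python sketch of a conversion that works for small values and fails for large ones. It is not the original flight code: the function name `to_int16` and the two readings are hypothetical, chosen only to sit either side of the 16-bit limit. Python integers do not overflow by themselves, so the range check is written out by hand.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16(value: float) -> int:
    """Convert a float to a 16-bit signed integer, raising an error on overflow."""
    result = int(value)  # drops the fractional part
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return result


# Hypothetical readings: one from a slower rocket, one from a faster rocket.
slower_rocket_reading = 20_000.0
faster_rocket_reading = 40_000.0

print(to_int16(slower_rocket_reading))

try:
    to_int16(faster_rocket_reading)
except OverflowError as error:
    print("Conversion failed:", error)
```

The same code is correct for one range of inputs and fails for another, which is why reusing it on a faster rocket changed its behaviour.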

Once the software detected the overflow, it stopped sending numbers and started sending error codes. The error handling for this was to switch to a backup system. However, this backup system was an exact clone of the original, running the same software on the same data, so both systems failed with the exact same error within about 72 milliseconds of each other.
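The weakness of an identical backup can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a model of the real hardware: `read_unit` and `read_with_failover` are hypothetical names, and both "units" are simply the same function called twice with the same input.

```python
def read_unit(value: float) -> int:
    """Stand-in for one reference unit: convert a reading to 16 bits."""
    result = int(value)
    if not -32768 <= result <= 32767:
        raise OverflowError("value does not fit in 16 bits")
    return result


def read_with_failover(value: float) -> int:
    """Try the primary unit, then the backup if the primary fails."""
    # Both units run the same code on the same input,
    # so an input that breaks one will break the other.
    for unit_name in ("primary", "backup"):
        try:
            return read_unit(value)
        except OverflowError:
            print(f"{unit_name} unit failed")
    raise RuntimeError("both units failed on the same input")


print(read_with_failover(20_000.0))
```

Calling `read_with_failover(40_000.0)` reports a failure in both units and then raises `RuntimeError`. A backup protects against a hardware fault in one unit, but not against a software fault that both units share.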

There was no exception handling code built in to handle these types of errors. The flight computer interpreted the error codes as real flight data showing that the rocket was off course, and swivelled the booster and main engine nozzles hard over to correct a deviation that did not exist. The abrupt change of direction this caused led to the rocket beginning to tear apart. At this point the central system identified a catastrophic failure and triggered the termination system, leading to a full self-destruct.
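One common way to handle a value that is too large is to clamp it to the nearest limit and flag the reading as unreliable, so that the receiver never mistakes an error for real data. The Python sketch below shows that idea only; it is an assumption for illustration, not what the Ariane software did, and `Reading` and `to_int16_saturating` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    value: int   # the converted number, clamped to the 16-bit range
    valid: bool  # False if the original value had to be clamped


def to_int16_saturating(value: float) -> Reading:
    """Convert a float to 16 bits, clamping out-of-range values instead of failing."""
    clamped = max(-32768.0, min(32767.0, value))
    return Reading(value=int(clamped), valid=(clamped == value))


print(to_int16_saturating(20_000.0))
print(to_int16_saturating(40_000.0))
```

The second call returns the largest storable value with `valid` set to `False`, so the code receiving it can decide to ignore the reading rather than act on it.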

This entire sequence of events took place between the 37th and 40th second after take-off.

The testing on the Ariane 5 relied on assumptions from the Ariane 4. For computer scientists, the Ariane 5 highlights the importance of diligent, comprehensive and extensive testing of all modules in every conceivable scenario, and is a reminder that “just because it worked before over there, doesn’t mean it’ll work again over here!”
