On April 6, 2019, a ten-bit counter rolled over. The counter, a component of many older satellites, marks the weeks since Jan 1, 1980. It rolled over once before, in the fall of 1999. That event was inconsequential because few complex systems relied on GPS. Now, more systems rely on accurate time and position data: automated container loading and unloading systems at ports, for example. The issue was not with the satellites or with the cranes.
The problem highlights the pervasive disconnect between the worlds of IT and OT. Satellites are a form of industrial control system. Engineers follow the same set of principles designing satellites as they do designing any other complex programmable machine. Safety first, service availability next.
In the 1990s satellites suffered a series of failures, prompting the US General Accounting Office (GAO) to review satellite security. The report (at https://www.gao.gov/products/GAO-02-781) identifies two classes of problems that might befall satellites, shown in these two figures.
Figure 1: Unintentional Threats to Satellites
Figure 2: Intentional Threats to Satellites
This analysis is incomplete. It omits an entire class of problems: software design defects and code bugs. The decision to use a 10-bit counter to track the passing weeks is a design defect. The useful life of a satellite can be 40 years or more. A 10-bit counter runs from 0 to 1,023, then rolls over to zero. Since the are 52 weeks in a year, the counter does not quite make it to 20 years. This design specification was dramatically under-sized. More recent designs use a 13-bit counter, which will not roll over for almost 160 years. That provides an adequate margin.
As for code bugs, satellites suffer them just like any other programmable system. The Socrates network tracks satellites to project potential collisions. In 2009, Socrates predicted that two satellites, a defunct Soviet-era communications satellite and the Iridium constellation satellite #33, were projected to pass 564 meters apart. In reality, they collided, creating over 2,000 pieces of debris larger than 1 cm in size. Whether the defect arose from buggy code or inadequate precision in observations, the satellites collided. Either way, there is a software defect here. The question is, is the software inaccurate, or is it creating precision that does not exist? If the instruments doing the measurement have a margin of error, the report should include that data. By stating that the satellites will pass 564 meters apart, the value implies a precision of ½ meter either way – between 563.5 meters and 564.5 meters. If the precision is within half a kilometer, the software should state that specifically – “Possible collision – distance between objects under 1 KM.” If the input data is precise, then the code is calculating the trajectories incorrectly. Either is a code bug.
These two types of defects are neither unintentional (code and designs do not degrade over time) nor intentional (no saboteur planted the defect). The third class of defect results from inconsistent design specifications (the satellite can live for 40 years but the counter rolls over in 20) or poor coding practices (creating a level of precision unsupported by the measurements, or calculating the trajectories incorrectly). These are software defects.
As we all know, there was no failure in the GPS system. I made a passing comment during a talk on satellite security at the RSA 2019 conference. A reporter from Tom’s Guide was there, and he wrote an excellent article on the problem: https://www.tomsguide.com/us/gps-mini-y2k-rsa2019,news-29583.html.
The failure is not including software issues among the risks to a programmable device.
What do you think? Let me know below or @WilliamMalikTM.
The post What Did We Learn from the Global GPS Collapse? appeared first on .