
Software Engineering Within SpaceX

June 01, 2020

Hi everyone! 👋 I am sure quite a few of you must have seen the SpaceX launch this past Saturday. It was an amazing and historic event. Millions of people were watching it live on YouTube and elsewhere. With each passing day, we are getting closer to commercial space flights and I have to admit I am excited.

In addition to fueling my excitement about space travel, this launch also made me curious about the tech that goes into these rockets. I did some research from the Computer Science perspective and wanted to share what I found. It goes without saying that most of this information is gathered from different sources I came across online. Even though I tried to make sure I don’t include any wrong information, there is no guarantee that this information is 100% accurate.

Teams

There was an AMA by the SpaceX Software Engineering team 7 years ago where they shared some insights about how they work and what they work on. They have 4 separate Software teams:

Flight Software team

The Flight Software team is about 35 people. We write all the code for Falcon 9, Grasshopper, and Dragon applications; and do the core platform work, also on those vehicles; we also write simulation software; test the flight code; write the communications and analysis software, deployed in our ground stations. We also work in Mission Control to support active missions.

Enterprise Information Systems team

The Enterprise Information Systems team builds the internal software systems that makes SpaceX run. We wear many hats, but the flagship product we develop and release is an internal web application that nearly every person in the company uses. This includes the people that are creating purchase orders and filling our part inventory, engineers creating designs and work orders with those parts, technicians on the floor clocking in and seeing what today’s work will be per those designs…and literally everything in between. There are commercially available products that do this but ours kicks major ass! SpaceX is transforming from a research and engineering company into a manufacturing one - which is critical to our success - and our team is on the forefront of making that happen. We leverage C#/MVC4/EF/SQL; Javascript/Knockout/Handlebars/LESS/etc and a super sexy REST API.

Ground Software team

The Ground Software team is about 9 people. We primarily code in LabVIEW. We develop the GUIs used in Mission and Launch control, for engineers and operators to monitor vehicle telemetry and command the rocket, spacecraft, and pad support equipment. We are pushing high bandwidth data around a highly distributed system and implementing complex user interfaces with strict requirements to ensure operators can control and evaluate spacecraft in a timely manner.

Avionics Test team

The Avionics Test team works with the avionics hardware designers to write software for testing. We catch problems with the hardware early; when it’s time for integration and testing with flight software it better be a working unit. The main objective is to write very comprehensive and robust software to be able to automate finding issues with the hardware at high volume. The software usually runs during mechanical environmental tests.

Hardware + Software Redundancy

Someone also recounted their interaction with the SpaceX team at GDC 2015/2016 in an answer on StackExchange. They talk about the triple redundancy system and how SpaceX uses the Actor-Judge system. In short, there are 3 dual-core ARM processors running on a custom board (according to elteto). For each decision, a “flight string” compares the results from the two cores on a single processor. If the outputs match, the command is sent to the different controllers. There are 3 processors (each with dual cores), so each controller receives three separate commands. The controllers then act as the judge and compare the three commands. If all three are in agreement, they carry out the operation. If even a single command is in disagreement, the controller carries out the command from the processor which had previously been sending the correct commands.

This means that at any given point there are 6 running processes of the flight software.
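The Actor-Judge flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SpaceX's implementation: the names `flight_string`, `Controller`, and `strikes` are hypothetical, and the source does not say how a controller decides which processor "had previously been sending the correct commands". Here I assume the controller counts past disagreements per flight string and, when records are tied, falls back to the majority command.

```python
from collections import Counter
from typing import Callable, List, Optional

# None means "this flight string sent nothing" (its two cores disagreed).
Command = Optional[str]


def flight_string(core_a: Callable[[], str], core_b: Callable[[], str]) -> Command:
    """Run the same decision on both cores; forward it only if they match."""
    a, b = core_a(), core_b()
    return a if a == b else None


class Controller:
    """Judge that receives one command from each of the three flight strings."""

    def __init__(self, n_strings: int = 3) -> None:
        # Assumption: trust is tracked as a count of past disagreements.
        self.strikes = [0] * n_strings

    def judge(self, commands: List[Command]) -> Command:
        sent = [i for i, c in enumerate(commands) if c is not None]
        if not sent:
            return None  # no string sent a command, so nothing is executed
        votes = Counter(commands[i] for i in sent)
        # Prefer the string with the cleanest record; break ties by majority.
        best = min(sent, key=lambda i: (self.strikes[i], -votes[commands[i]]))
        chosen = commands[best]
        # Every string that did not send the executed command gets a strike.
        for i, c in enumerate(commands):
            if c != chosen:
                self.strikes[i] += 1
        return chosen
```

When all three commands match, `judge` simply returns that command and nobody gets a strike. When one string dissents or goes silent, its strike count rises, so the controller leans on the other strings in later rounds.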

Software Certifications

Most of the important software in mission-critical infra goes through various certifications. For instance, you can’t run any random software on an airplane. Even the entertainment system code has to satisfy various certifications. One such certification standard is DO-178B, titled Software Considerations in Airborne Systems and Equipment Certification (it has since been superseded by DO-178C).

The certification and correctness part is made easier by using software verification tools. One such tool is Astrée. It is a static code analyzer that checks for runtime errors and concurrency-related bugs in C projects. This also leads us to the answer for why a lot of mission-critical code is written in C. It’s because there are a lot of static analyzers and software verification tools for C.

A fun fact which I got via Hacker News:

Automatic docking software for the ATV that delivers supply to ISS is written using C code and verified with Astree.

SpaceX also made use of Chromium and JavaScript for the Dragon 2 flight interface. I am not sure how that passed the certification. I assume it was allowed because for every mission-critical input on the display, there was a physical button underneath the display as well. So in case the screen malfunctioned, the astronauts could potentially make use of the physical buttons. Regarding the use of Chromium and JS, a user on Hacker News had this to say as well:

Also, only the actual graphical display application uses Chromium/JS. The rest of the system is all C++. The display code has 100% test coverage, down to validation of graphical output (for example if you have a progress bar and you set it to X% the tests verify that it is actually drawn correctly).
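To make the idea of "validating graphical output" concrete, here is a minimal sketch of such a test in Python. It is purely illustrative: `draw_progress_bar` and `check_bar` are hypothetical names, the one-row boolean "pixel buffer" stands in for a real rendered frame, and nothing here reflects how the actual display code or its tests are written.

```python
from typing import List


def draw_progress_bar(percent: float, width: int = 20) -> List[bool]:
    """Render a one-row progress bar as pixels (True = filled)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    filled = round(width * percent / 100)
    return [x < filled for x in range(width)]


def check_bar(pixels: List[bool], expected_filled: int) -> None:
    """Verify the drawn output itself, not just the value that was set."""
    assert sum(pixels) == expected_filled, "wrong number of filled pixels"
    assert all(pixels[:expected_filled]), "gap inside the filled region"
    assert not any(pixels[expected_filled:]), "stray pixel past the fill"


# A 20-pixel bar set to 25% should have exactly its first 5 pixels filled.
check_bar(draw_progress_bar(25), 5)
```

The point of the second function is that it inspects what was actually drawn, so a rendering bug would fail the test even if the underlying percentage value were stored correctly.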

You can see the buttons in the image below.

Crew Dragon touch input display

The astronauts explain how the system works and what they do in case of UI malfunction in this video.

Mission-critical infra also uses real-time operating systems. These operating systems offer assurances that regular operating systems might not provide, such as faster interrupt response and better memory protection. An RTOS provides real-time guarantees which are essential for such software. One such operating system is VxWorks. It was launched in 1987, targets embedded systems, and is owned by Wind River Systems. It is used in the Mars Rover and SpaceX Dragon (among other systems). Having so many certifications doesn’t mean that bugs can’t show up. Apparently, the 2003 Mars rovers experienced a bug in their flash memory driver, but it was sorted out by sending an update from Earth (source).

Model Rockets

If you want to indulge your curiosities and explore programming for rockets, you should check out model rockets. Joe Barnard works on BPS.space and has made small hobby rockets do amazing stuff. He has developed his own little flight controller called Signal. He made a Falcon 9 replica which is stabilized using Thrust Vector Control. There are so many opportunities nowadays for learning about actual space rockets and working your way up from small model rockets.

Falcon 9 replica

If you are interested in model rockets, you should explore the different certifications and licenses available in your country for amateur rocketeers. The US has 3 levels of certification, and each level gives you more possibilities for rocket launches.

If you are curious about how much cool stuff you can do using model rockets, check out this landing rocket developed by Joe below.

I will continue to update this article based on any new stuff I find during my research. If you feel like I misquoted something or if there is something new I should add to this article please let me know in a comment below!

Till next time, have a wonderful day, and stay safe! 👋 ❤️

PS: SpaceX also launched a simulator where you can try your hand at docking the Crew Dragon with the ISS 🚀

Edit: After this post was published the SpaceX software team did an AMA on Reddit (6th June 2020). It contains some good insights. Check it out here.

✍️ Comments

Johnny
Tuesday, Jun 2, 2020 at 14:46 UTC

“A ROS provides real-time guarantees which are essential for such software.” Do you mean a RTOS?

Yasoob
In reply to Johnny
Tuesday, Jun 2, 2020 at 16:47 UTC

Hi Johnny! You are absolutely correct. I just fixed the typo. Thanks! :)

Tom
Tuesday, Jun 2, 2020 at 18:03 UTC

Absolutely amazing breakdown. As a programmer, it’s VERY interesting to see the tools and languages used to put people in space.

Yasoob
In reply to Tom
Tuesday, Jun 2, 2020 at 19:57 UTC

Yeah I was also always curious about this kind of stuff too. I couldn’t find any compiled list so decided to do the grunt work myself. The research phase was actually quite a lot of fun! :)

Sathya
Wednesday, Jun 3, 2020 at 04:07 UTC

Hi, I wonder how and who designed the system interface - I mean the interface software - hardware - control systems - mechanical/hydraulic/pneumatic system. This is the hardest part even for a relatively simple product like an automobile.

Ern
Wednesday, Jun 3, 2020 at 10:43 UTC

Good summary. I’d assume that there would be classified aspects to some of this such as perhaps the guidance systems, hardware designs, software designs, etc. Mission critical with humans would need the highest levels of hardware redundancy with rad-hardening in particular due to the space environment. Especially the in-orbit hardware. It’s interesting that sub-orbital hardware may not require rad-hardening. Noticed a typo in “tripple”. Should be triple.

Sam
Wednesday, Jun 3, 2020 at 15:18 UTC

I was working with Qt in a previous job and in my ramp up of it discovered that SpaceX was using QML for the Crew Dragon UI. I believe that jives with the statement that “only the graphics are drawn with JS/Chromium” and that the rest of the code is C++. Really awesome how far UI development has come with balancing high performance vs speed of development vs visual quality!

Stefan
Wednesday, Jun 3, 2020 at 17:42 UTC

also intresting in this context is this talk https://www.youtube.com/watch?v=t_3bckhV_YI it is from SpaceX engineers how they migrated to Bazel for their build system

Nate Fisher
Wednesday, Jun 3, 2020 at 21:07 UTC

Does this article imply that RTCA/DO-178B is used as a means of demonstrating compliance in some way, or otherwise is used to define lifecycle processes for their development/verification/systems teams? Can you say where you saw this mentioned by SpaceX?

Yasoob
In reply to Nate Fisher
Wednesday, Jun 3, 2020 at 21:18 UTC

Hi Nate,

This particular piece of information wasn’t sourced from SpaceX. It was just given as an example of the type of certification which is required for different mission-critical systems.

Jana Sankova
Thursday, Jun 4, 2020 at 18:29 UTC

Hi Nate, Adding to the comment above, FYI - the RTCA/DO-178B is an old reference and was replaced with RTCA/DO-178C. BTW - I really enjoyed reading this summary article!

Doug
Friday, Jun 5, 2020 at 02:29 UTC

I remember a launch years ago that aborted just before ignition. Walter Cronkite asked someone what happened and they told him that there were three computers and they vote. If not in agreement, the action does not occur. A few hours later the launch went off. Cronkite went back and asked what had they done. The answer buried in a lot of techno bable was that they rebooted the computer that voted no.

David
Tuesday, Jun 9, 2020 at 09:56 UTC

New (6/6/20) AMA from the SpaceX software team: https://www.reddit.com/r/spacex/comments/gxb7j1/we_are_the_spacex_software_team_ask_us_anything/

Rohan
Wednesday, Jun 10, 2020 at 05:56 UTC

“There are 3 processors (with dual cores) so that means each controller/sensor will get three different commands. The controllers then act as the judge and compare the three commands. If all three are in agreement, they carry out the operation. If even a single command is in disagreement, the controller carries out the command from the processor which had previously been sending the correct commands. This means that at any given point there are 6 running processes of the flight software.”

Fascinating! Carrying out the command from the processor that has been correct thus far makes a lot of sense intuitively, but I’m wondering if there’s a more rigorous justification for this sort of redundancy management? Or if there are other strategies for redundancy also in place? Like I’m thinking of a scenario where you have 5 continuous scenarios where the processors aren’t unanimous. Would you rely on the results from the processor that was correct upto the (n-5)th instance and use its results each time? That seems concerning. If not, is there a process for recalibration after a lack of unanimity? I get that such a scenario is probably extremely unlikely in the zero-margin-for-error industry SpaceX operates in, but it seems to me that they would have to account for something like this precisely because they’re in a zero-margin-for-error industry.

Alaya
Thursday, Jun 11, 2020 at 21:46 UTC

Thank you for the informations !

Aaron
Wednesday, Aug 5, 2020 at 00:41 UTC

Great article! Thank you!


Yasoob Khalid
