This week, Conor discusses that wonderful repository of US-generated planetary science data: the Planetary Data System. These data, provided for free on the web at https://pds.nasa.gov/, allow any researcher, professional or amateur, to benefit from the space missions that have been funded by US taxpayer money. Sometimes, this means that discoveries made with a mission’s data arrive decades after that mission has ended, in studies led by researchers who may not even have been alive when that mission was launched!
by Conor Hayes
One of my favourite occurrences in astronomy (and in science in general) is when someone manages to pull new information out of old data. For example, data collected by the Galileo spacecraft in 1997 were used in a 2018 paper (https://www.nature.com/articles/s41550-018-0450-z) to argue that Europa might have plumes of water similar to those seen on Enceladus. Of course, in order for discoveries like these to be made, old data have to be archived in a way that is easily accessible to someone who may not have intimate knowledge of how the data were originally gathered.
In an attempt to solve this problem, NASA’s Planetary Science Division founded the Planetary Data System (PDS) in 1989. The PDS was not NASA’s first attempt at an archive for its planetary missions. During the 1960s and 1970s, mission data were primarily archived at the National Space Science Data Center and the Regional Planetary Image Facilities. However, these archives were not always the most robust, focusing primarily on data storage rather than organization and documentation.
The PDS, by contrast, was designed not just to archive data, but also to present them to future researchers in a standardized format that wouldn’t require highly specialized knowledge to use. To this end, the PDS archiving standards were developed. The standards are painfully specific and in-depth (the “basic concepts” document is nearly 50 pages long, and the core reference manuals total over 650 pages), so I won’t even attempt to explain them in full here. Instead, let’s look at an archived data product from my research to see how the standards are actually implemented.
The basic premise of the PDS archiving standards is that the data have to be accessible to any plausible future researcher. This means that the data absolutely cannot be archived in a proprietary or software-specific format. Any time that you write a NumPy array to disk as an NPY file, save an image as a PNG, or export a document as a PDF, you are assuming that the technology to read those files will continue to exist. If those formats are deprecated at some point down the line and the general knowledge about how to use them is lost, then the data contained within are, for all intents and purposes, gone forever.
Of course, you have to make some assumptions somewhere; otherwise, developing a standard would be nearly impossible. In this case, the PDS decided to assume that future researchers would be accessing their data using computers that could understand ASCII characters. Given that the ASCII standard has been a fundamental part of computing since its creation in the 1960s, this seems like a pretty safe assumption to make.
Figure 1 : Some of the information you would find in a PDS label file.
Now, let’s take a look at an actual PDS data product. This product is one frame of an MSL suprahorizon movie (described elsewhere in this blog) and is archived on the PDS Cartography and Imaging Sciences Node. (The other science nodes, if you were curious, are Atmospheres, Geosciences, Planetary Plasma Interactions, Ring-Moon Systems, and Small Bodies.) Each product comes in two parts: the label and the actual data. The label (seen in Figure 1) contains information about the format of the data, such as the number of bytes it contains, which byte the image data begin on, the image shape, the bit depth, and the number of bands in the image. It also lists information about how the data were collected, like the azimuth and elevation that the camera was pointed at, where on the planet the rover was located when the image was taken, the time of day the image was taken, and the units associated with the data.
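To make the idea of a label a little more concrete, here is a minimal Python sketch of how one might be read. It assumes the label is a series of plain KEYWORD = VALUE lines ending with an END line, which is how PDS3-style labels are commonly laid out. The sample text, its keywords, and all of its values are invented for illustration and are not copied from the real product, and the parse_label helper is hypothetical. It also assumes the ^IMAGE pointer counts records starting from 1.

```python
# Hypothetical label text: the layout imitates a PDS3-style label,
# but every value here is made up for the example.
SAMPLE_LABEL = """\
RECORD_BYTES = 1024
LABEL_RECORDS = 4
^IMAGE = 5
OBJECT = IMAGE
  LINES = 512
  LINE_SAMPLES = 1024
  SAMPLE_BITS = 8
  BANDS = 1
END_OBJECT = IMAGE
END
"""


def parse_label(text):
    """Collect KEYWORD = VALUE pairs into a flat dict (nesting is ignored)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "END":  # a bare END marks the end of the label
            break
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in ("OBJECT", "END_OBJECT"):  # skip grouping markers
            continue
        # Keep whole numbers as ints, everything else as text.
        fields[key] = int(value) if value.isdigit() else value
    return fields


label = parse_label(SAMPLE_LABEL)

# Assumption: ^IMAGE is a 1-based record number, so the image data start
# this many bytes into the file.
image_offset = (label["^IMAGE"] - 1) * label["RECORD_BYTES"]
print(label["LINES"], label["LINE_SAMPLES"], image_offset)
```

A real label carries far more than this (pointing, timing, units, and so on), and a proper parser would keep the nested groups rather than flattening them.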
Unlike the label, which is presented in a plaintext format, the image data cannot be understood just by looking at it. If you open it in a text editor, you’ll probably get something that just looks like an incomprehensible mess of random characters (see Figure 2). That’s probably not surprising though. You wouldn’t try to open a PNG in a text editor, so why would this be any different? Well, if you try to open it in your favourite image viewing application, you likely won’t have much luck there either.
Figure 2 : Opening a PDS image file in a text editor – a bunch of nonsense!
As it happens, both the label and the image data are stored as plain sequences of bytes, with no embedded information that would tell an application how to interpret them. A text editor assumes that every file it opens is text, so the label, which really is text, opens just fine. (This is also the reason why opening the image file in a text editor displays a bunch of random letters and symbols: the editor is interpreting the image data as ASCII characters.) But displaying an image is much more complex than displaying plaintext, so without the guidance that your typical PNG or JPG includes, it’s unlikely that any mainstream application would be able to open a PDS image file.
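The text-editor effect is easy to reproduce. The short Python sketch below takes a handful of made-up 8-bit pixel values and shows what happens when each byte is treated as a character instead of a brightness. The pixel values are arbitrary, and showing unprintable bytes as dots is just one convention; a real editor might show boxes, question marks, or nothing at all.

```python
# Eight hypothetical 8-bit pixel values, as they might sit in an image file.
pixels = bytes([72, 105, 33, 0, 200, 255, 7, 126])

# A text editor maps each byte to a character. Bytes outside the printable
# ASCII range (32 to 126) have no sensible glyph, so show them as '.'.
as_text = "".join(chr(b) if 32 <= b <= 126 else "." for b in pixels)

print(as_text)  # prints "Hi!....~"
```

The same eight bytes are perfectly good pixel data. They only look like nonsense because they are being read with the wrong rules.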
This is the downside of the PDS archiving standard. Because it has to make as few assumptions as possible about the application being used to open it, the data are presented in such a general format that most common applications, used to being presented with highly structured files, have no idea what to do with them. The upside is that because the standards are so well-documented, it’s not exceptionally difficult to write your own code to read PDS files. In the interest of time, I ultimately decided to use code someone else had already written (the planetaryimage package distributed by the PlanetaryPy Project - it can be downloaded from their GitHub at https://github.com/planetarypy/planetaryimage, if you’re interested), but it could be a fun challenge to create an image viewer yourself in your language of choice.
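If you did want to try writing a reader yourself, the core of it might look something like the Python sketch below. This is not how the planetaryimage package works internally, just an illustration of the idea under simplifying assumptions: a single-band image with 8-bit samples, and an offset and shape already read from the label. The read_image helper and the tiny stand-in "file" are hypothetical.

```python
def read_image(raw, offset, lines, line_samples):
    """Slice 8-bit, single-band pixel data out of raw bytes into rows."""
    size = lines * line_samples
    data = raw[offset:offset + size]
    if len(data) != size:
        raise ValueError("file is shorter than the label says it should be")
    # Each row of the image is line_samples consecutive bytes.
    return [
        list(data[row * line_samples:(row + 1) * line_samples])
        for row in range(lines)
    ]


# Synthetic stand-in for a file: 16 bytes of padding where the label would
# be, followed by a 2-line by 3-sample image.
fake_file = b" " * 16 + bytes([0, 64, 128, 192, 255, 32])

rows = read_image(fake_file, offset=16, lines=2, line_samples=3)
for row in rows:
    print(row)
```

Real products can use 16-bit samples, multiple bands, and a particular byte order, all of which the label spells out. That is exactly the information an existing package handles for you.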
Figure 3 : The results of opening a PDS image file with a tool designed specifically for the task – a beautiful image from the surface of Mars!
The PDS data archiving standards might not be as intuitive or out-of-the-box easy to use as other file formats that we might be used to, but it’s for a good cause. By standardizing our data archives, we are ensuring that future researchers will continue to have access to the vast volumes of information we have collected about our Solar System, information that may be hiding discoveries awaiting reanalysis by some scientist who might not even be born yet.