Magnetic tape: the format of choice for world-leading scientists!

(Photo by CERN)

A bit of a side-ramble here, triggered by an article in New Scientist magazine that caught my attention. The headline was ‘Large Hadron Collider sticks with reels of tape for vast storage needs’ and I thought, woah, hang on a minute, are they seriously suggesting that the immense volumes of data generated by one of the world’s most advanced scientific instruments are stored on tape? Turns out they are!

The European Organization for Nuclear Research (CERN), based on the Franco-Swiss border near Geneva, hosts a giant complex of particle accelerators. The most famous of them, located in a tunnel around 100 metres underground, is the Large Hadron Collider (LHC): the most powerful particle accelerator ever built and the machine in which the Higgs boson was discovered. It consists of a 27-kilometre ring of superconducting magnets, with a number of accelerating structures that boost the energy of the particles along the way. The LHC is one of the most expensive scientific instruments ever built, and yet the information it collects is still archived on magnetic tape! That’s even more amazing once you get a handle on the sheer volume of information we’re talking about here…

(Photo by Anna Pantelia/CERN)

The LHC operates in multi-year runs, each followed by a period of downtime for maintenance and upgrades. So far there have been two such runs, from 2010 to 2013 and from 2015 to 2018, during which the LHC generated unprecedented quantities of data. A particle accelerator propels charged particles, such as protons or electrons, to speeds close to the speed of light. They are then smashed either into a target or against other particles. These collisions, which are recorded as data, produce massive particles (such as the Higgs boson), which scientists study to increase our understanding of matter and of the origins of the Universe.

According to the CERN website, “the CERN Data Centre processes on average one petabyte (one million gigabytes) of data per day. The LHC experiments produce about 90 petabytes of data per year, and an additional 25 petabytes of data are produced per year for data from other (non-LHC) experiments at CERN. Archiving the vast quantities of data is an essential function at CERN. Magnetic tapes are used as the main long-term storage medium and data from the archive is continuously migrated to newer technology, higher density tapes”.
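Those figures lend themselves to a quick back-of-the-envelope check. The sketch below adds up the yearly volume quoted above and then, purely as my own assumption (CERN doesn’t say this), pretends every byte lands on LTO-9 cartridges at their 18 TB native capacity, with one copy and no compression, to estimate how many cartridges a single year would fill. The constant names and helper functions are hypothetical.

```python
import math

# Figures quoted from the CERN website (decimal units: 1 PB = 1,000 TB).
LHC_PB_PER_YEAR = 90
OTHER_PB_PER_YEAR = 25
PROCESSED_PB_PER_DAY = 1

# Assumption, not from CERN: native (uncompressed) capacity of one LTO-9 cartridge.
CARTRIDGE_TB = 18


def yearly_archive_pb() -> int:
    """New data per year from LHC and non-LHC experiments combined, in PB."""
    return LHC_PB_PER_YEAR + OTHER_PB_PER_YEAR


def cartridges_per_year(cartridge_tb: float = CARTRIDGE_TB) -> int:
    """Whole cartridges needed for one year of new data (single copy, no compression)."""
    return math.ceil(yearly_archive_pb() * 1000 / cartridge_tb)


if __name__ == "__main__":
    print(f"New data per year: {yearly_archive_pb()} PB")
    print(f"Processed per year: ~{PROCESSED_PB_PER_DAY * 365} PB")
    print(f"LTO-9 cartridges per year: {cartridges_per_year()}")
```

Run as a script, it reports 115 PB of new data a year, roughly 365 PB passing through the Data Centre, and 6389 cartridges. That count is for a single copy on one assumed cartridge type; extra copies or denser tapes would scale it up or down proportionally.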

Elsewhere on the CERN website I read that, “Most of the data collected at CERN will be stored forever; the physics data is so valuable that it will never be deleted and needs to be preserved for future generations of physicists.” “While tapes may sound like an outdated mode of storage, they are actually the most reliable and cost-effective technology for large-scale archiving of data, and have always been used in this field. One copy of data on a tape is considered much more reliable than the same copy on a disk.”

A bit more digging and I discover an old article from The Economist (‘Magnetic tape to the rescue’, 2013) suggesting that CERN’s need for mass storage “is reviving a technology which, only a few years ago, seemed destined for the scrapheap: magnetic tape.” “Tape is the oldest computer storage medium still in use,” it continues. “It was first put to work on a UNIVAC computer in 1951. But although tape sales have been falling since 2008 and dropped by 14% in 2012, according to the Santa Clara Consulting Group, tape’s decline has now gone into reverse: sales grew by 1% in the last quarter of 2012 and a 3% rise is expected this year.” Seems the tape revival may not be limited to the field of music!

“CERN’s Alberto Pace says magnetic tape has several advantages over hard disks for long-term data preservation, including speed, reliability, zero power consumption for storage, and security. For example, a broken tape can be spliced back together and only lose a few hundred megabytes of information, while deleting 50 petabytes of CERN data on magnetic tape would take years, rather than minutes for disk-based data. IBM Zurich research lab’s Evangelos Eleftheriou cites tape’s lower cost compared to disks, and much greater longevity of the stored data.”

“However, the looming flood of data is too much even for modern tape cartridges to handle, and higher densities are required. Eleftheriou currently is developing a tape with a density of 100GB per square inch, as well as creating the equipment needed to read it. The technology could potentially yield a cartridge capable of storing more than 100 terabytes, and a key challenge to be met is to position the read/write head to within 10 nanometers.”
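To get a feel for why deleting 50 petabytes from tape could take ‘years’, here is a rough model. It rests on my own assumptions, not anything Pace says: deleting means streaming every byte past a drive head once (for example, to overwrite it), and each drive runs at 400 MB/s, roughly the native rate of an LTO-9 drive. The function and constant names are hypothetical.

```python
# Assumptions (not from the article): "deleting" means streaming every byte
# past a drive head once, at an assumed native rate of 400 MB/s per drive.
ARCHIVE_BYTES = 50e15          # 50 PB, the figure quoted for CERN's tape data
DRIVE_BYTES_PER_SEC = 400e6    # 400 MB/s per drive (assumed, LTO-9-class)
SECONDS_PER_YEAR = 365 * 24 * 3600


def years_to_stream(total_bytes: float, drives: int,
                    rate: float = DRIVE_BYTES_PER_SEC) -> float:
    """Years needed to pass every byte through `drives` drives working in parallel."""
    return total_bytes / (rate * drives) / SECONDS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 10):
        print(f"{n} drive(s): {years_to_stream(ARCHIVE_BYTES, n):.2f} years")
```

With one drive the estimate comes out just under four years; ten drives in parallel bring it down to about 0.4 years. The model ignores the time spent mounting, positioning and rewinding cartridges, so a real figure would be longer still, and either way it is a far cry from the minutes it takes to wipe data on disk.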

Back to the CERN website to find out more, as the Large Hadron Collider is due to begin its third run in the next month or so. There I find an article, ‘LHC: pushing computing to the limits’, which reports:

“New IT research-and-development activities have already begun in preparation for the LHC’s Run 3 (foreseen for 2021 to 2023). ‘Our new software, named CERN Tape Archive (CTA), is the new tape storage system for the custodial copy of the physics data and a replacement for its predecessor, CASTOR. The main goal of CTA is to make more efficient use of the tape drives, to handle the higher data rate anticipated during Run 3 and Run 4 of the LHC,’ explains German Cancio, who leads the Tape, Archive & Backups storage section in CERN’s IT department. Compared to the last year of Run 2, data archival is expected to be two-times higher during Run 3 and five-times higher or more during Run 4 (foreseen for 2026 to 2029).”
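Those multipliers only turn into yearly volumes once you pick a baseline, and the article doesn’t give one for the last year of Run 2. Purely for illustration, this sketch borrows the ~90 PB/year LHC figure from the CERN website quoted earlier as a hypothetical baseline; the names are my own, and the result is an order of magnitude, not a forecast.

```python
# Multipliers quoted by German Cancio, relative to the last year of Run 2.
# Run 4 is "five times higher or more", so 5.0 is a lower bound.
RUN_MULTIPLIERS = {"Run 3": 2.0, "Run 4": 5.0}

# Hypothetical baseline: the quote gives no Run 2 figure, so the ~90 PB/year
# LHC figure from the CERN website is borrowed here for illustration only.
BASELINE_PB_PER_YEAR = 90.0


def projected_archival(baseline_pb: float = BASELINE_PB_PER_YEAR) -> dict[str, float]:
    """Projected yearly archival volume in PB for each run, given a baseline."""
    return {run: baseline_pb * factor for run, factor in RUN_MULTIPLIERS.items()}


if __name__ == "__main__":
    for run, pb in projected_archival().items():
        print(f"{run}: ~{pb:.0f} PB/year")
```

With that assumed baseline it prints about 180 PB/year for Run 3 and at least 450 PB/year for Run 4. Swapping in a different baseline simply rescales both numbers.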

Finally, I came across a short video from a joint research presentation by Fujifilm and IBM Research in December 2020, in which CERN’s Alberto Pace explains why and how CERN uses tape technology. The scale of it is pretty mind-blowing! I have to hand it to the man: if my job carried anything like the level of responsibility that his does, I wouldn’t get a night’s sleep ever again!