bwDataArchive

Service for long-term storage of scientific data for members of universities and public scientific institutions in the state of Baden-Württemberg, the Helmholtz Association and European data infrastructures.

The Service

The bwDataArchive service provides long-term data storage for research and other public institutions in the state of Baden-Württemberg, the Helmholtz Association and European data infrastructures. Data is stored on the technical infrastructure of the Scientific Computing Center (SCC) at the Karlsruhe Institute of Technology (KIT), which comprises large, trustworthy storage systems for secure data storage over a period of ten years or more. The service enables a qualified implementation of the recommendation of the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) regarding the safeguarding and storage of research data.

Tape technology as mass storage

Long-term storage is technically implemented using mass storage that largely consists of magnetic tapes, a proven technology with a much lower error rate than hard disks. Large data centers rely heavily on this reliable and durable storage technology, which continues to be developed further. In addition, the acquisition and operating costs of tape technology in the petabyte range are up to ten times lower than those of hard disk or SSD storage solutions. The disadvantage of longer access times (up to two minutes), which results from the linear storage on magnetic tapes, is offset by the advantages of longevity and low costs for power and cooling.

The SCC currently uses the TS1160 technology from IBM®. A magnetic tape cartridge of this technology generation can store approx. 20 TB of data.
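To give a feeling for the scale, the following minimal sketch estimates how many cartridges a given data volume would occupy at roughly 20 TB per cartridge. It is an illustration only: the function name `cartridges_needed`, the use of decimal units (1 TB = 10^12 bytes) and the optional `copies` parameter are assumptions made for this example, and compression and file system overhead are ignored.

```python
TB = 10**12  # decimal terabyte in bytes (assumption: decimal, not binary, units)
PB = 10**15  # decimal petabyte in bytes


def cartridges_needed(data_bytes: int, cartridge_bytes: int = 20 * TB, copies: int = 1) -> int:
    """Estimate the number of cartridges needed to hold `copies` copies of the data.

    Hypothetical helper for illustration; ignores compression and overhead.
    """
    if data_bytes < 0 or cartridge_bytes <= 0 or copies < 1:
        raise ValueError("sizes must be positive and copies must be at least 1")
    total = data_bytes * copies
    # Integer ceiling division: a partially filled cartridge still counts.
    return -(-total // cartridge_bytes)


if __name__ == "__main__":
    print(cartridges_needed(1 * PB))            # one copy of 1 PB
    print(cartridges_needed(1 * PB, copies=2))  # two copies of 1 PB
```

Under these assumptions, one petabyte fits on 50 cartridges, and keeping a second copy doubles that to 100.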

HPSS as a storage management solution

Managing the stored data requires a comprehensive system that also provides additional functions, for example to ensure the integrity of the stored data. The SCC decided to install and operate the High Performance Storage System (HPSS). HPSS enables the storage of millions of files up to the exabyte range. Disk and tape storage are combined in a virtual file system to form a high-performance storage management system that automatically migrates data between hard disks and tapes.

The Project

The project bwDataArchiv was launched in 2014 as a collaboration between the SCC at KIT and the High Performance Computing Center Stuttgart (HLRS) at the University of Stuttgart and was extended for a further two years in 2016. As part of this state project funded by the Baden-Württemberg Ministry of Science, Research and Arts (MWK), the bwDataArchive service was developed as a central long-term archive for data from research institutions and libraries in the state of Baden-Württemberg and the HLRS.

Motivation

Data from scientific experiments, measurements, analyses and simulations has to be stored long-term to guarantee accessibility after scientific projects have ended. Storing this data is important not only for legal reasons but also because of its historical and possible future scientific value. To accomplish this task, it became necessary to set up structures and to develop and evaluate technologies for structured, reliable and secure long-term storage of data volumes of up to several exabytes.

Goals

The central component of the bwDataArchive project was the development of a long-term archive service based on the existing infrastructure of the SCC at KIT. Among other things, questions regarding the use of new technologies, the development of in-house software and the integration of software from partner projects had to be answered. The following questions were also addressed:

  • How can the process of data storage and archiving be simplified for a non-IT-savvy scientific community?
  • Which security-related aspects are important for long-term storage?
  • Which kind of service model is needed for a service that specializes in long-term data storage?
  • How can the data integrity of hundreds of petabytes of data be efficiently ensured?
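One common building block for the integrity question is a checksum that is recorded when a file is archived and recomputed later for comparison. The sketch below shows this general technique with Python's standard `hashlib` module; it is not a description of how bwDataArchive or HPSS verify data. The function names `file_checksum` and `verify_file`, the choice of SHA-256 and the chunk size are assumptions made for illustration.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks.

    Reading in chunks keeps memory use constant, even for very large files.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a freshly computed checksum with the one recorded at archive time."""
    return file_checksum(path, algorithm) == expected_hex.lower()
```

In such a scheme, the checksum would be stored alongside the archived file and `verify_file` would be run again after retrieval or during periodic checks; a mismatch indicates that the content has changed.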

Sponsorship and Cooperation

The 'bwDataArchiv' project was supported by the Ministry of Science, Research and Arts Baden-Württemberg (MWK). It cooperates with the DFG project RADAR, with the now-completed state projects bwDataInMotion (bwDIM) and bwDataDiss, with many different scientific communities, and with the international projects EUDAT, the now-completed Human Brain Project (HBP) and the Worldwide LHC Computing Grid (WLCG).

Tape library at KIT Campus South, with exterior decoration showing LHC graphics.
Interior view of the tape library with rows of storage boxes for tape cartridges and a transport robot in the back.
Robot for automatic transport of magnetic tape cartridges.
Older generation tape drive with inserted tape cartridge.