
Institute for Advanced Simulation (IAS)

# What file system to use for different data?

Multiple GPFS file systems are provided for different types of user data. Each file system has its own data policies.

• $HOME Acts as a repository for the user’s personal data, such as the SSH key. There is a separate HOME folder for each HPC system, plus a shared folder that points to the same directory on all systems. Data within $HOME are backed up by TSM, see also

• $SCRATCH Is bound to a compute project and acts as a temporary storage location with high I/O bandwidth. If an application works with large files and has high I/O demands, $SCRATCH is the right file system for its data. Data within $SCRATCH are not backed up, and a cleanup runs daily.
  • Normal files older than 90 days are purged automatically. Both the modification date and the access date are taken into account. For performance reasons the system does not set the access date automatically, but the user can set it explicitly with touch -a <filename>. The time stamps recorded with a file can be listed with stat <filename>.
  • Empty directories, which arise among other things from the deletion of old files, are deleted after 3 days. This also applies to trees of empty directories, which are deleted recursively from bottom to top in one step.

• $PROJECT Data repository for a compute project. Its lifetime is bound to the project lifetime. Data are backed up by TSM.

• $FASTDATA Belongs to a data project. This file system is bandwidth optimized (similar to $SCRATCH), but data are persistent and internally backed up via snapshots.

• $DATA Belongs to a data project. This file system is designed to store a huge amount of data on disk-based storage. The bandwidth is moderate. The file-system internal backup is realized with the GPFS snapshot feature. For more information, look at

• $ARCHIVE Is bound to a data project and acts as storage for all files not in use for a longer time. Data are migrated to tape storage by TSM-HSM. It is recommended to use tar files with a minimum size of multiple gigabytes and a maximum size of 8 TB. The reason is that recalling or restoring files from tape is much more efficient with a few large data streams than with thousands of small ones. See also
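
The 90-day rule for $SCRATCH and the tar-file advice for $ARCHIVE can be sketched as a few small shell helpers. This is only an illustration, not an official site tool: the function names are hypothetical, the sketch assumes a file becomes a purge candidate only when both its modification and its access time are older than the limit, and it assumes that 8 TB means 8 * 10^12 bytes.

```shell
# Hypothetical helpers, not part of any site-provided tooling.

# List regular files whose modification AND access time are both older
# than N days (default 90). Assumption: the cleanup only removes a file
# when both time stamps are old. Note that find rounds ages down to whole
# days, so "+90" matches files that are at least 91 days old.
list_purge_candidates() {
    local dir="$1" days="${2:-90}"
    find "$dir" -type f -mtime +"$days" -atime +"$days" -print
}

# Pack one directory into a single tar file so that tape recalls use one
# large data stream instead of many small ones.
bundle_for_archive() {
    local src_dir="$1" tarball="$2"
    # -C switches to the parent directory so the archive stores relative paths.
    tar -cf "$tarball" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Succeed if the tar file does not exceed the recommended maximum size.
# Assumption: 8 TB is taken as 8 * 10^12 bytes.
within_archive_limit() {
    local tarball="$1"
    local max_bytes=$((8 * 1000 * 1000 * 1000 * 1000))
    local size
    size=$(wc -c < "$tarball")
    [ "$size" -le "$max_bytes" ]
}
```

A possible workflow is to run list_purge_candidates "$SCRATCH" to see which files are at risk, inspect a single file with stat <filename>, refresh its access date with touch -a <filename> if it is still needed, and bundle finished results with bundle_for_archive before moving the tar file to $ARCHIVE.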

All GPFS file systems are managed by quotas for disk space and/or number of files. See also
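
Since quotas cover both disk space and the number of files, it can help to estimate both figures for a directory before hitting a limit. The following is a minimal sketch using only standard tools (du, find, wc). The function names are hypothetical, and the numbers are estimates: the quota accounting of the file system itself may count differently, for example by including directories in the file count.

```shell
# Hypothetical helpers for a rough usage estimate; the authoritative
# numbers come from the quota system of the file system itself.

# Print the disk usage of a directory tree in KiB (1024-byte units).
disk_usage_kib() {
    local dir="$1"
    # du -s prints "<size><TAB><path>"; keep only the size column.
    du -sk "$dir" | cut -f1
}

# Print the number of regular files below a directory.
count_files() {
    local dir="$1"
    # tr strips the padding that some wc implementations add.
    find "$dir" -type f | wc -l | tr -d ' '
}
```

For example, disk_usage_kib "$PROJECT" and count_files "$PROJECT" give a first impression of how close a project directory is to its limits.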