Cluster File System and Storage

Tip

If you have data stored in an Amazon S3 bucket, you can use datastores in MATLAB to access the data directly, without needing any storage on the cluster. For details, see Transfer Data to Amazon S3 Buckets and Access Data Using MATLAB. You can also select the following storage options when creating your cluster.

Cluster Shared Storage
- Persisted Storage. To request shared disk space that remains after you shut down the cluster, select a disk size. The shared storage is mounted at /shared/persisted. For details, see the table below.
- Temporary Storage. A cluster shared file system based on ephemeral storage on the cluster headnode. When the cluster starts, the temporary storage is mounted at /shared/tmp, which is a distributed file system spanning all the ephemeral block devices on the headnode instance. When the cluster shuts down, the content of /shared/tmp is removed. If the headnode has no ephemeral storage, temporary storage is not available.
- MATLAB Drive. To enable access to MATLAB Drive, you must use a personal cluster. You can access the files in your MATLAB Drive at the mounted location /MATLAB Drive.
- Amazon S3 Data. To transfer individual files from an Amazon S3 bucket to the cluster machines, click Add Files. You can specify S3 files only when creating your cluster and starting it for the first time. When the cluster starts up, files are copied to /shared/imported. See Copy Data from Amazon S3 Account to Your Cluster.
Local Machine Storage
- Volume Size: To request an Amazon EBS volume, enter a size in GB in the box, for example, 100. This creates a local data volume on each worker machine of your cluster, mounted at /mnt/localdata. Use this option when read/write performance is critical.
- EBS Snapshot ID: If you previously saved an EBS snapshot of your data on Amazon, then enter the ID. The data is copied to the SSD volume attached to each worker machine. If you provide a formatted snapshot, then the file system type must be ext3, ext4, or xfs. For ext3 and ext4, the full volume size of the file system might not be immediately available when the instance comes online. Growing the file system to full capacity can take up to 30 minutes after the instance is online, depending on the size of the extN volume and the instance type. You can access all data in the original snapshot as soon as the cluster is online.
- Ephemeral Storage: This type of storage is available only on instance types that contain "d" in their name, for example, m5ad.24xlarge. Each ephemeral storage device (NVMe SSD) is mounted at /mnt/localdataN, where N ranges from 0 to the number of ephemeral storage devices minus one. For example, /mnt/localdata1 corresponds to the second ephemeral storage device.
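The mount-point naming rule for ephemeral storage can be illustrated with a short Python sketch. It assumes the /mnt/localdataN convention described above and that the script runs on a worker machine; `expected_mount_points` and `find_ephemeral_mounts` are hypothetical helpers, not part of any MATLAB or Cloud Center API. The first derives the paths from a device count, and the second lists the localdataN folders that actually exist, excluding the unindexed /mnt/localdata EBS volume.

```python
import re
from pathlib import Path

# Hypothetical helpers illustrating the naming rule:
# ephemeral device N (zero-based) is mounted at /mnt/localdataN.
_EPHEMERAL = re.compile(r"^localdata(\d+)$")


def expected_mount_points(device_count: int) -> list[str]:
    """Mount points for an instance with device_count ephemeral NVMe devices."""
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    return [f"/mnt/localdata{n}" for n in range(device_count)]


def find_ephemeral_mounts(root: str = "/mnt") -> list[str]:
    """List existing localdataN folders under root, ordered by N.

    The EBS volume /mnt/localdata has no index, so the regex skips it.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    matches = []
    for entry in base.iterdir():
        m = _EPHEMERAL.match(entry.name)
        if m and entry.is_dir():
            # Sort numerically so localdata10 comes after localdata2.
            matches.append((int(m.group(1)), str(entry)))
    return [path for _, path in sorted(matches)]


print(expected_mount_points(2))  # ['/mnt/localdata0', '/mnt/localdata1']
```

Sorting on the parsed integer rather than the folder name keeps the order correct on instances with ten or more devices.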

After selecting your storage options, click Create Cluster. For details on other cluster settings, see Create a Cloud Cluster.

All worker machines have access to local and cluster shared storage. You can use these folders for storing data generated by your jobs, and for data you want to transfer between the cluster and your client location. See Transfer Data to or from a Cloud Center Cluster. The paths are the same for all worker machines of the cluster. Changes to files and folders under /mnt/localdata are not visible to other machines. Files and folders under the /shared mount point are shared by all worker machines of your cluster. Changes made by any machine are visible to all other machines. Each folder has a different lifetime, as shown in the table.

Location	Size	Usage
`/mnt/localdata`	Specified in cluster configuration	The location of the local machine storage volume. Each worker machine gets its own copy of the data. Temporary and intermediate data can also be written to this location. Deleted when the cluster is stopped; the data is not retained between cluster runs. If you specified an EBS snapshot, the data is copied again when the cluster is started.
`/mnt/localdataN`	Determined by the instance type selected in the cluster configuration	Only available, and automatically enabled, on instance types that contain "d" in their name, for example, m5ad.24xlarge. This storage volume is backed by ephemeral storage. Each ephemeral storage device (NVMe SSD) is mounted at `/mnt/localdataN`, where `N` ranges from 0 to the number of ephemeral storage devices minus one; for example, `/mnt/localdata1` corresponds to the second ephemeral storage device. Deleted when the cluster is stopped.
`/MATLAB Drive`	Depends on your MATLAB license. For more information, see MATLAB® Drive™ Storage Quota (MATLAB).	Enabled when a personal cluster is selected at cluster creation. Mounted as a read-only file system on the worker machines.
`/shared/persisted`	Specified at cluster creation	The location of the cluster shared persisted storage and MATLAB Job Scheduler (MJS) data. This folder is shared among worker machines and is retained between cluster runs. Save data you want to retrieve on the next start of the cluster in folders and files under `/shared/persisted`. Because the content in `/shared/persisted` is retained when you stop or restart the cluster, the MJS data and history stored there are preserved between cluster runs. If `/shared/persisted` is not enabled for the cluster, MJS data is not preserved between cluster runs, because it is stored in the headnode's local file system, which is deleted when the cluster is stopped. Deleted when the cluster is deleted.
`/shared/tmp`	Varies with instance type; only available for headnode instances with ephemeral storage (NVMe instance store)	This folder is shared among worker machines and is not retained between cluster runs. Use it to store temporary and intermediate data that must be accessible from multiple worker machines. The available storage space depends on the ephemeral storage of the selected instance type. Deleted when the cluster is stopped.
`/shared/imported`	Part of the allocation for `/shared/tmp` or `/shared/persisted`. If both are available, `/shared/tmp` is used.	The location of the cluster shared Amazon S3 data. Selected Amazon S3 objects are copied to this location when the cluster is first created and started. If `/shared/imported` is backed by `/shared/persisted`, its content is preserved when the cluster shuts down or restarts. If `/shared/imported` is backed by `/shared/tmp`, the S3 data is deleted when the cluster is stopped.
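The lifetime information in the table can be summarized as a small lookup. The following Python sketch is a hypothetical helper (`Event`, `RETENTION`, and `survives` are illustrative names) that encodes only what the table states: which locations survive a cluster stop, restart, or timeout, and which survive deletion of the cluster. /MATLAB Drive is left out because the table does not list it as being deleted with the cluster.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    STOP = "stop"      # cluster stopped, restarted, or shut down by a timeout
    DELETE = "delete"  # cluster deleted


# True means the content survives the event, per the table above.
# "/mnt/localdataN" stands for every ephemeral mount (localdata0, localdata1, ...).
RETENTION = {
    "/mnt/localdata": {Event.STOP: False, Event.DELETE: False},
    "/mnt/localdataN": {Event.STOP: False, Event.DELETE: False},
    "/shared/tmp": {Event.STOP: False, Event.DELETE: False},
    "/shared/persisted": {Event.STOP: True, Event.DELETE: False},
}


def survives(location: str, event: Event, imported_backing: Optional[str] = None) -> bool:
    """Return True if data under location survives event.

    /shared/imported inherits the lifetime of whichever folder backs it,
    so its backing ("/shared/tmp" or "/shared/persisted") must be given.
    """
    if location == "/shared/imported":
        if imported_backing not in ("/shared/tmp", "/shared/persisted"):
            raise ValueError("give the backing folder of /shared/imported")
        location = imported_backing
    return RETENTION[location][event]


print(survives("/shared/persisted", Event.STOP))                # True
print(survives("/shared/imported", Event.STOP, "/shared/tmp"))  # False
```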

Note:

- There is no file sharing between different clusters that use cluster shared storage. Only machines within the same cluster share files.
- You create, start, stop, and delete your cloud clusters independently of your local MATLAB session. Deleting an associated cluster object in MATLAB does not affect the cloud cluster or its persisted storage.
- When a cluster times out, it shuts down and clears the contents of /shared/tmp, /shared/imported (when backed by /shared/tmp), /mnt/localdata, and /mnt/localdataN, but preserves the content of /shared/persisted. If you use an automatic shutdown setting for your cluster, make sure you have retrieved all the data you need from /shared/tmp, /mnt/localdata, and /mnt/localdataN before the timeout occurs.
- The contents of /shared/tmp are stored on ephemeral storage.
- To check whether /shared/imported is backed by /shared/persisted or by /shared/tmp, run the command ls -l /shared/imported, which displays its actual location.
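The `ls -l /shared/imported` check can also be scripted, for example in a job that needs to know whether imported S3 data will survive a shutdown. This Python sketch assumes, as the `ls -l` output suggests, that /shared/imported is a symbolic link into one of the two shared folders; `backing_root` is a hypothetical helper that resolves the link with `os.path.realpath` and reports which candidate folder contains the target.

```python
import os
from typing import Optional, Sequence


def backing_root(
    path: str = "/shared/imported",
    candidates: Sequence[str] = ("/shared/persisted", "/shared/tmp"),
) -> Optional[str]:
    """Return the candidate folder that path resolves into, or None.

    Assumes path is (or sits under) a symbolic link; os.path.realpath
    follows the link, much like reading the target shown by ls -l.
    """
    real = os.path.realpath(path)
    for root in candidates:
        root_real = os.path.realpath(root).rstrip(os.sep)
        # Match the root itself or anything below it, but not a sibling
        # such as /shared/tmpfoo when the root is /shared/tmp.
        if real == root_real or real.startswith(root_real + os.sep):
            return root
    return None


if __name__ == "__main__":
    print(backing_root())  # "/shared/tmp", "/shared/persisted", or None
```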

Headnode Limitation on S3 Uploads

Uploading S3 files works only if at least one of the following conditions is true:

- The headnode is of an instance type that has ephemeral storage, for example, m5ad.24xlarge.
- Persisted storage is enabled at cluster creation.

If neither condition is satisfied, the S3 files are not visible on the worker machines. For example, if your cluster has a dedicated headnode of type m5.xlarge, which has no ephemeral storage, S3 uploads work only if persisted storage is enabled.
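These two conditions, together with the rule from the table that /shared/tmp takes precedence when both folders are available, determine where uploaded S3 files end up. The following Python sketch is a hypothetical planning helper (`imported_backing` is an illustrative name); it models only those rules and does not query AWS or Cloud Center.

```python
from typing import Optional


def imported_backing(headnode_has_ephemeral: bool, persisted_enabled: bool) -> Optional[str]:
    """Return the folder that backs /shared/imported, or None if S3 uploads won't work.

    /shared/tmp exists only when the headnode has ephemeral storage, and it is
    preferred over /shared/persisted when both are available.
    """
    if headnode_has_ephemeral:
        return "/shared/tmp"
    if persisted_enabled:
        return "/shared/persisted"
    return None  # S3 files would not be visible on the worker machines


# Dedicated m5.xlarge headnode (no ephemeral storage):
print(imported_backing(False, persisted_enabled=False))  # None
print(imported_backing(False, persisted_enabled=True))   # /shared/persisted
```

A result of /shared/tmp also means the imported data is deleted when the cluster stops, as described in the table.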
