Modern NVIDIA GPUs come with three different types of processing cores:
Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) developed by NVIDIA and built on specialized hardware. CUDA cores are discrete processors, usually numbering in the thousands on a single GPU chip, that allow data to be processed in parallel across those cores.
CUDA cores are the workhorse of general purpose GPU computing. They improve both the performance and the cost effectiveness of parallel processing for a wide range of scientific workloads. When they are complemented by the other specialty GPU core types, workload performance is accelerated further.
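As a rough illustration of how data is spread across CUDA cores, the sketch below uses the Numba and NumPy Python packages (an assumption made for illustration; they are not necessarily installed on AUHPCS systems) to launch one GPU thread per array element.

    # Minimal sketch, assuming the numba and numpy Python packages and an
    # NVIDIA GPU; each GPU thread processes one array element, so thousands
    # of CUDA cores operate on the data in parallel.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def scale_and_add(a, b, out):
        i = cuda.grid(1)                  # global index of this GPU thread
        if i < out.shape[0]:              # guard threads past the end of the data
            out[i] = 2.0 * a[i] + b[i]

    n = 1_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)

    d_a = cuda.to_device(a)               # copy inputs to GPU memory
    d_b = cuda.to_device(b)
    d_out = cuda.device_array_like(d_a)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block
    scale_and_add[blocks, threads_per_block](d_a, d_b, d_out)

    result = d_out.copy_to_host()         # copy the result back to the CPU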
A tensor is a data type that can represent nearly any kind of ordered or unordered data. It can be thought of as a container in which multi-dimensional data sets are stored. In the simplest terms, it is an extension of a matrix: a matrix is a two-dimensional array of numbers, while a tensor generalizes this to any number of dimensions.
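A minimal sketch of this idea, using the NumPy package (an assumption made purely for illustration): the number of dimensions grows from a single number to a matrix to a higher-dimensional tensor.

    # Minimal sketch, assuming the numpy Python package is available.
    import numpy as np

    scalar = np.array(3.0)                 # 0-D: a single number
    vector = np.array([1.0, 2.0, 3.0])     # 1-D: an ordered list of numbers
    matrix = np.ones((2, 3))               # 2-D: rows x columns
    tensor = np.ones((4, 2, 3))            # 3-D: a stack of matrices

    print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)   # prints: 0 1 2 3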
Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy. The latest generation extends these speedups to a full range of workloads; for example, speedups of up to 10X for artificial intelligence (AI), machine learning (ML), and deep learning (DL) workloads and around 2.5X for general HPC workloads are common.
Tensor Cores can compute faster than CUDA cores, primarily because a CUDA core performs one operation per clock cycle whereas a Tensor Core can perform multiple operations per clock cycle. For ML and DL models, CUDA cores are not as effective as Tensor Cores in terms of either cost or computation speed, but they still complement Tensor Cores and add to overall throughput.
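The sketch below shows one common way mixed precision is requested in practice, using the PyTorch package (an assumption for illustration only, not a statement about software installed on AUHPCS systems): the matrix multiply runs in half precision, which makes it eligible for Tensor Core execution, while surrounding work stays in full precision.

    # Minimal sketch, assuming the torch Python package and an NVIDIA GPU with
    # Tensor Cores.
    import torch

    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")

    # Inside this region, eligible operations run in FP16 and can use Tensor
    # Cores; code outside the region continues to use FP32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        c = a @ b

    print(c.dtype)    # torch.float16 for the autocast-produced result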
Ray tracing cores are exclusive to NVIDIA RTX graphics cards. RTX technology enables detailed, accurate 3D design and rendering as well as photorealistic simulation of the physical world, including visual effects. The resulting simulation and visualization capabilities are not limited to how something looks, but also capture how it behaves. Combining CUDA cores and their APIs with ray tracing cores enables accurate modeling of the behavior of real-world objects and granular data visualization.
The HOME file system (/home) is used for storage of job submission scripts, small applications, databases, and other user files. All users will be presented with a home directory when logging into any cluster system. Contents of user home directories will be the same across cluster systems.
All user home directories have a 20GB quota.
The SCRATCH file system (/scratch) is a capacious and performant parallel file system intended to be used for compute jobs. The SCRATCH file system is considered ephemeral storage.
The LOCAL SCRATCH file system (/lscratch or /tmp) is a reasonably performant but less capacious shared file system local to each compute node, likewise intended for compute jobs. LOCAL SCRATCH file systems are also considered ephemeral storage.
The WORK file system (/work) is used to store shared data common across users or compute jobs. The WORK file system can be read by compute nodes in the cluster, but they cannot write to it. As a result, it is referred to as “near-line storage”.
All group work directories have a 500GB quota.
The PROJECT file system (/project) provides longer-term storage of active job data for individual users or shared groups. Compute project data is considered active if it is needed for current or ongoing compute jobs. The PROJECT file system is not an archive for legacy job data and is referred to as "offline storage" because it is not accessible by compute nodes.
All group project directories have a 1TB quota.
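Taken together, these file systems suggest a stage-in / compute / stage-out pattern. The sketch below illustrates that pattern in Python; the subdirectory names (my_netid, my_project, run001) and file names are hypothetical placeholders, not actual AUHPCS paths.

    # Minimal sketch of staging data through the file systems described above.
    # Directory and file names below are hypothetical placeholders.
    import shutil
    from pathlib import Path

    home    = Path("/home/my_netid")             # scripts and small files, 20GB quota
    work    = Path("/work/my_project")           # shared input, read-only from compute nodes
    scratch = Path("/scratch/my_netid/run001")   # fast ephemeral space used during the job

    scratch.mkdir(parents=True, exist_ok=True)

    # Stage in: copy shared input from WORK to SCRATCH before the computation.
    shutil.copy(work / "input.dat", scratch / "input.dat")

    # ... the compute job reads and writes files under SCRATCH ...

    # Stage out: keep only what matters; SCRATCH is ephemeral and may be purged.
    shutil.copy(scratch / "results.dat", home / "results.dat")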
Job submission nodes allow users to authenticate to the cluster and are sometimes referred to as “login nodes”. They also provide applications required for scripting, submitting and managing batch compute jobs. Batch compute jobs are submitted to the cluster work queue. The user then waits for the job to be scheduled and run when the requested compute resources are available.
Users require an AU NetID that has been provisioned for AUHPCS, and a device configured for Duo two-factor authentication to access job submission nodes.
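As a rough illustration of the submit-and-wait workflow described above, the sketch below writes a small batch script and submits it from a job submission node. It assumes a Slurm-style scheduler with an sbatch command; the scheduler actually used by AUHPCS, and its directives, are not specified here, so the options shown are placeholders.

    # Minimal sketch, assuming a Slurm-style scheduler; directives and resource
    # requests below are illustrative placeholders only.
    import subprocess
    from pathlib import Path

    job_lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=example",
        "#SBATCH --ntasks=1",
        "#SBATCH --time=00:10:00",
        'echo "running on $(hostname)"',
    ]
    Path("example_job.sh").write_text("\n".join(job_lines) + "\n")

    # Submit to the cluster work queue; the scheduler runs the job once the
    # requested compute resources become available.
    subprocess.run(["sbatch", "example_job.sh"], check=True)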
Data transfer nodes provide access to user file systems in the cluster. Their role is to facilitate high speed transfer of data across those file systems within the cluster. These nodes can also be used for the transfer of data in and out of the cluster.
Users require an AU NetID that has been provisioned for AUHPCS, and a device configured for Duo two-factor authentication to access data transfer nodes.
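As a rough sketch of moving data into the cluster through a data transfer node, the example below drives rsync over ssh from Python; the host name dtn.example.edu, the account name, and the destination path are hypothetical placeholders rather than actual AUHPCS endpoints.

    # Minimal sketch, assuming rsync and ssh are available; the host and paths
    # are hypothetical placeholders.
    import subprocess

    src = "large_dataset/"          # local directory to upload
    dst = "my_netid@dtn.example.edu:/project/my_project/large_dataset/"

    # -a preserves permissions and timestamps, -v lists each transferred file.
    subprocess.run(["rsync", "-av", src, dst], check=True)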
General Intel compute nodes:
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, and physics workloads with the most modest resource needs.
Middle memory Intel compute nodes:
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, and some modeling workloads with additional resource needs. These nodes would also likely be suitable for pharmaceutical, molecular biology, and simulation workloads.
High memory Intel compute nodes:
These nodes are candidates for bioinformatics, genomics, population science, mathematics, chemistry, physics, modeling, pharmaceutical, molecular biology, and simulation workloads with the largest resource needs.
NVIDIA Quadro RTX - Intel compute nodes:
These nodes are candidates for data sciences, physics and life science modeling, artificial intelligence, inference, and simulation workloads with modest resource needs. These systems also include hardware features that can be used to accelerate complex simulations of the physical world such as particle or fluid dynamics for scientific and data visualization. They could also be used for film, video, and graphic rendering, or even special effects workloads.
NVIDIA Tesla T4 - Intel compute nodes:
These nodes are candidates for mathematics, data sciences, artificial intelligence, inference, machine learning, deep learning, and simulation workloads with modest resource needs.
NVIDIA A100 – AMD compute node:
This node provides the greatest end-to-end HPC platform performance in the cluster. It offers many enhancements that deliver significant speedups for large-scale artificial intelligence, inference, deep learning, data analytics, and digital forensic workloads.
Research Technology systems administration staff can assist with most Red Hat Linux operating system, application, and system utility support. Enterprise support for Red Hat Linux can be extended to professional services if needed. However, because there is often commonality across many Linux distributions, Research Technology may be able to help support applications and utilities on other distributions as well.
Research Technology HPC systems engineering and application staff can assist with many job submission script composition and troubleshooting tasks. However, it is important to understand that job submission script issues and debugging can be complex and support will often be a cooperative effort between users and staff.
Software will be made available to the cluster using the following methodologies.
To access AUHPCS, researchers, faculty, or staff must meet the following requirements:
Once access is granted for approved compute projects, adherence to AUHPCS governance policies is a continuing requirement.