GPU Processing Cores

Modern NVIDIA GPUs come with three different types of processing cores:

NVIDIA CUDA Cores

Compute Unified Device Architecture (CUDA) is NVIDIA's parallel computing platform, built on specialized hardware and an application programming interface (API) for the NVIDIA instruction set. CUDA cores are discrete processors, usually numbering in the thousands on a single GPU chip, that allow data to be processed in parallel across those cores.
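As a rough illustration, the sketch below is a minimal CUDA C++ program (not tied to any particular cluster or application) that launches one lightweight thread per array element; the GPU schedules those threads across its CUDA cores so the elements are processed in parallel.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread handles one array element; the GPU spreads the threads
    // across its CUDA cores so the whole array is scaled in parallel.
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= factor;
    }

    int main()
    {
        const int n = 1 << 20;                        // about one million elements
        float *d;
        cudaMallocManaged(&d, n * sizeof(float));
        for (int i = 0; i < n; ++i) d[i] = 1.0f;

        scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);  // 4096 blocks of 256 threads
        cudaDeviceSynchronize();

        printf("d[0] = %f\n", d[0]);                  // expect 2.0
        cudaFree(d);
        return 0;
    }

Compiled with nvcc and run on a single GPU, the launch above creates roughly a million threads, far more than the number of physical cores; the hardware scheduler keeps the CUDA cores busy by cycling through them.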

CUDA cores are the workhorse component of general purpose GPU computing. They improve both the performance and cost effectiveness of parallel processing for myriad scientific workloads. When they are complemented by the other specialty GPU core types, workload performance is accelerated further.

NVIDIA Tensor Cores

A tensor is a data type that can represent nearly any kind of ordered or unordered data. It can be thought of as a container in which multi-dimensional data sets are stored; in the simplest terms, it is an extension of a matrix. A matrix is a two-dimensional array of numbers, whereas a tensor generalizes this to any number of dimensions.
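As a toy illustration (the sizes and values are arbitrary), the following C++ snippet contrasts a matrix, indexed by two coordinates, with a rank-3 tensor, indexed by three:

    #include <cstdio>

    int main()
    {
        // A matrix is indexed by two coordinates: (row, column).
        float matrix[2][3] = {{1, 2, 3}, {4, 5, 6}};

        // A tensor generalizes this to any number of dimensions.
        // Here, a rank-3 tensor indexed by (batch, row, column):
        float tensor[4][2][3] = {};
        tensor[0][1][2] = matrix[1][2];

        printf("matrix[1][2] = %g, tensor[0][1][2] = %g\n",
               matrix[1][2], tensor[0][1][2]);
        return 0;
    }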

Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy. The latest generation expands these speedups to a full range of workloads. As examples, speedups of up to 10X for artificial intelligence (AI), machine learning (ML), and deep learning (DL) workloads and 2.5X boosts for general HPC workloads are common.

Tensor Cores can compute faster than CUDA cores, primarily because a CUDA core performs one operation per clock cycle, whereas a Tensor Core can perform multiple operations per clock cycle. For ML and DL models, CUDA cores are not as effective as Tensor Cores in terms of cost or computation speed, but they still augment Tensor Core productivity.
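For illustration, the sketch below uses CUDA's warp-level WMMA interface to multiply one 16x16 tile on the Tensor Cores, with half-precision inputs accumulated into 32-bit floats (the mixed-precision pattern described above). It assumes a Tensor Core capable GPU (compute capability 7.0 or newer, e.g. compiled with nvcc -arch=sm_70) and is a minimal example rather than a tuned kernel.

    #include <cstdio>
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp cooperatively multiplies a pair of 16x16 half-precision tiles,
    // accumulating into 32-bit floats on the Tensor Cores (mixed precision).
    __global__ void tile_mma(const half *a, const half *b, float *c)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> fb;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

        wmma::fill_fragment(fc, 0.0f);
        wmma::load_matrix_sync(fa, a, 16);        // 16 = leading dimension
        wmma::load_matrix_sync(fb, b, 16);
        wmma::mma_sync(fc, fa, fb, fc);           // D = A*B + C as one warp-level op
        wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
    }

    int main()
    {
        half *a, *b;
        float *c;
        cudaMallocManaged(&a, 256 * sizeof(half));
        cudaMallocManaged(&b, 256 * sizeof(half));
        cudaMallocManaged(&c, 256 * sizeof(float));
        for (int i = 0; i < 256; ++i) { a[i] = __float2half(1.0f); b[i] = __float2half(1.0f); }

        tile_mma<<<1, 32>>>(a, b, c);             // a single warp drives the tile
        cudaDeviceSynchronize();
        printf("c[0] = %f\n", c[0]);              // each output is a 16-term dot product = 16
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The same tile computed with ordinary CUDA cores would be issued as thousands of individual fused multiply-add operations; the mma_sync call lets the Tensor Cores consume the whole tile in a handful of instructions, which is where the per-cycle throughput advantage comes from.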

NVIDIA Ray Tracing Cores

Ray tracing cores are exclusive to NVIDIA RTX graphics cards. RTX technology enables detailed, accurate 3D design and rendering, as well as photorealistic physical-world simulations, including visual effects. The resulting simulation and visualization capabilities are not limited to how something looks, but extend to how it behaves. The combination of CUDA cores and APIs with RTX cores enables accurate modeling of the behavior of real-world objects and granular data visualization.
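At its core, ray tracing repeatedly tests whether rays intersect scene geometry, and RT Cores accelerate those intersection tests (and the surrounding bounding-volume traversal) in hardware. Programming them directly requires an API such as OptiX, Vulkan, or DirectX Raytracing, so the minimal sketch below instead spells out a single ray-sphere test on ordinary CUDA cores, purely to show the kind of arithmetic being accelerated; the scene and ray values are arbitrary.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Test each ray (origin at 0,0,0, normalized direction d) against a unit
    // sphere centered 5 units down the z axis. t_hit is the distance to the
    // nearest intersection, or -1 if the ray misses.
    __global__ void ray_sphere(const float3 *dirs, float *t_hit, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        const float3 center = make_float3(0.f, 0.f, 5.f);
        const float radius = 1.f;

        float3 d = dirs[i];
        float b = d.x * center.x + d.y * center.y + d.z * center.z;   // d . C
        float c = center.x * center.x + center.y * center.y
                  + center.z * center.z - radius * radius;            // |C|^2 - r^2
        float disc = b * b - c;
        t_hit[i] = disc < 0.f ? -1.f : b - sqrtf(disc);
    }

    int main()
    {
        float3 *dirs; float *t;
        cudaMallocManaged(&dirs, 2 * sizeof(float3));
        cudaMallocManaged(&t, 2 * sizeof(float));
        dirs[0] = make_float3(0.f, 0.f, 1.f);     // aimed straight at the sphere
        dirs[1] = make_float3(0.f, 1.f, 0.f);     // misses it

        ray_sphere<<<1, 32>>>(dirs, t, 2);
        cudaDeviceSynchronize();
        printf("hit distances: %f %f\n", t[0], t[1]);   // expect 4.0 and -1.0
        cudaFree(dirs); cudaFree(t);
        return 0;
    }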

Cluster storage

The HOME file system (/home) is used for storage of job submission scripts, small applications, databases, and other user files. All users will be presented with a home directory when logging into any cluster system. Contents of user home directories will be the same across cluster systems.

All user home directories have a 20GB quota.

The SCRATCH file system (/scratch) is a capacious and performant parallel file system intended to be used for compute jobs. The SCRATCH file system is considered ephemeral storage.

The LOCAL SCRATCH file system (/lscratch or /tmp) is a reasonably performant but less capacious shared file system local to each compute node, also intended to be used for compute jobs. LOCAL SCRATCH file systems are also considered ephemeral storage.

The WORK file system (/work) is used to store shared data common across users or compute jobs. The WORK file system can be read by compute nodes in the cluster but they cannot write to it. As a result it is referred to as “near-line storage”.

All group work directories have a 500GB quota.

The PROJECT file system (/project) is for longer-term user or shared group storage of active job data. Compute project data is considered active if it is needed for current or ongoing compute jobs. The PROJECT file system is not an archive for legacy job data and is referred to as "offline storage" because it is not accessible by compute nodes.

All group project directories have a 1TB quota.
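As a rough sketch of the intended data flow (the group and user paths below are hypothetical placeholders), a compute job reads shared input from the read-only WORK file system and writes its results to SCRATCH; anything worth keeping is then copied to PROJECT from outside the job, since compute nodes cannot reach /project and scratch data is ephemeral.

    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        // Hypothetical paths: a real job would use its own group and user directories.
        std::ifstream input("/work/mygroup/shared_input.dat");    // readable from compute nodes
        std::ofstream output("/scratch/myuser/job_results.dat");  // writable, but ephemeral

        std::string line;
        while (std::getline(input, line))
            output << line << '\n';    // stand-in for real per-job processing

        std::cout << "Copy results from /scratch to /project outside the job "
                     "(e.g. from a login session) once the job finishes.\n";
        return 0;
    }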

Software

Red Hat system utility and application support

Research Technology systems administration staff can assist with most Red Hat Linux operating system, application, and system utility support. Enterprise support for Red Hat Linux can be extended to professional services if needed. However, because there is often commonality across many Linux distributions, Research Technology may be able to help support applications and utilities on other distributions as well.

Job submission script support

Research Technology HPC systems engineering and application staff can assist with many job submission script composition and troubleshooting tasks. However, it is important to understand that job submission script issues and debugging can be complex, and support will often be a cooperative effort between users and staff.

Software delivery

Software will be made available to the cluster using the following methodologies.

  • Installation by users in their home directories
  • Installation into the cluster as an LMOD environment module
  • Installation in an end-user-managed Singularity container
  • Installation on an HPCS-specific network application server
  • External hosting for some applications requiring GPU resources

Access

To access AUHPCS, researchers, faculty, or staff must meet the following requirements:

  • Have an active AU NetID and a device configured for Duo two-factor authentication.
  • Register as a Principal Investigator (PI) using the iLAB “High Performance Computing and Parallel Computing Services Core” page, or be added to an existing project by a registered PI.
  • Complete the training course(s) for basic Linux competency (if required) and for AUHPCS cluster concepts, use, and workflow.

Once access is granted for approved compute projects, adherence to AUHPCS governance policies is a continuing requirement.

 

Get Help

Support

Consultation and assistance with HPC Services from the ITSS Research Technology group can be requested using the standard AU enterprise support services. Requests can be directed to us using the “Research HPC Services” assignment group.

Software Support

HPCS will make every effort to provide some level of support for the scientific software installed in the cluster. However, due to the open-source origins of many of these software packages, standard vendor support is generally not available. In those cases, support will be a collaborative effort leveraging HPCS staff experience, HPCS staff research, and end-user self-support. For purchased software with a support agreement, HPCS staff will liaise with end users and vendors to support that software. Scientific software that requires funding for licenses and support, and for which the university is not already licensed, will have to be purchased by the requesting department.