How SecureKloud's Scalable HPC Solution Revolutionizes Clinical Workloads for Global Pharma

Executive Summary

SecureKloud's Scalable HPC Solution reshaped a biopharmaceutical giant's approach to clinical workloads, leveraging AWS for elastic high-performance computing. The tailored solution enabled seamless handling of AI, ML, and deep learning workloads, fostering scalability, compliance, and substantial cost savings.

About the client

The client is a leading biopharmaceutical company that specializes in the development and commercialization of innovative medicines. With a deep commitment to advancing the field of medicine, the client strives to improve the lives of patients around the world. Their groundbreaking treatments for HIV/AIDS, hepatitis, and other diseases showcase their dedication to scientific excellence. With a focus on research and development, the company aims to create life-saving therapies and make a significant impact on global healthcare.

  • 3 Global Locations
  • 350+ Certified Cloud Architects
  • 14+ Years of Cloud Experience
  • 400+ Cloud Transformations

Business Challenge

Innovating within the pharmaceutical sector, the client grappled with vast data volumes from lab experiments and clinical trials that traditional IT methods could not handle efficiently. They aimed to build and operate predictive, real-time, and retrospective data applications to accelerate insights. The challenge was to smoothly migrate and handle large, complex simulations and deep learning workloads in a cloud setting using AWS's robust suite of high-performance computing (HPC) products and services. The client was searching for a partner to manage essentially limitless compute capacity, a high-performance file system, and high-throughput networking, while still ensuring peak performance, security, and cost-effectiveness.








Our Solution - Optimizing HPC Architecture for AI, ML, and Deep Learning Workloads

SecureKloud provided a multi-faceted solution to deliver scalable and compliant elastic high-performance computing (HPC) for clinical pharmacological workloads, deep learning capabilities, and MLOps.

HPC on a compliant cloud platform lets users easily scale compute resources up and down as required for compute-intensive AI/ML and deep learning workloads. Because cloud environments are elastic, there is no overhead or infrastructure upkeep cost while the compute infrastructure is not in use, and compute can burst onto a compliant, elastic infrastructure when demand spikes.



Deployment Process
HPC on AWS is provisioned through a well-defined Infrastructure-as-Code (IaC) framework that works with existing CI/CD pipelines. SLURM modules and configuration items are defined with optimal settings as part of the framework. Automated loading of specific modules for specific AI/ML and deep learning use cases abstracts away the underlying complexity of running analytics on large datasets. The deployment process uses CLI tools to trigger HPC cluster creation following IaC best practices. Configuration is stored in the repository as YAML files that define the HPC queues, partitions, images, and startup scripts for compute instances. RStudio Workbench is configured to run both on login nodes integrated with HPC and on the HPC cluster itself, giving users the flexibility to launch workloads that match their use cases.
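The case study does not name the exact IaC tooling behind the cluster build, so the sketch below assumes an AWS ParallelCluster-style YAML definition; the queue names, instance types, AMI ID, subnet, and script paths are illustrative placeholders rather than the client's actual configuration. It shows how queues, custom images, startup scripts, and shared storage can be declared in a repository-tracked file.

```yaml
# cluster.yaml -- illustrative HPC cluster definition (AWS ParallelCluster-style sketch)
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: m5.xlarge
  Networking:
    SubnetId: subnet-0example              # placeholder subnet
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: general                        # CPU queue for R/analytics workloads
      Networking:
        SubnetIds:
          - subnet-0example
      ComputeResources:
        - Name: cpu-nodes
          InstanceType: c5.4xlarge
          MinCount: 0                      # scale to zero when idle
          MaxCount: 20
      Image:
        CustomAmi: ami-0example            # custom AMI with pinned dependencies
      CustomActions:
        OnNodeConfigured:
          Script: s3://example-bucket/startup/compliance.sh   # startup/compliance script
    - Name: gpu                            # GPU queue for deep learning workloads
      Networking:
        SubnetIds:
          - subnet-0example
      ComputeResources:
        - Name: gpu-nodes
          InstanceType: p3.2xlarge
          MinCount: 0
          MaxCount: 4
SharedStorage:
  - MountDir: /fsx                         # shared FSx for Lustre file system
    Name: fsx-shared
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
```

With AWS ParallelCluster, for instance, a file like this would be applied from the CLI with `pcluster create-cluster --cluster-name clin-hpc --cluster-configuration cluster.yaml`, matching the CI/CD-triggered cluster creation described above.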



Modular Approach
HPC queues are customized to user requirements by using custom AMIs, ensuring that specific dependency versions and libraries are deployed as part of each queue. Users can customize the HPC compute instances for their use case, fine-grained down to specific version levels. Further customization can be achieved through the HPC startup script, which is also used to enforce compliance. Users can run workloads from the comfort of the RStudio Workbench IDE or in a modular fashion using the srun and sbatch CLI tools, as sketched below.
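As an illustration of the modular submission path, here is a minimal sketch of a batch job a user might submit from a login node; the partition name, resource sizes, module version, and script name are assumptions for illustration only.

```bash
#!/bin/bash
# job.sh -- illustrative Slurm batch job; names and sizes are placeholders
#SBATCH --job-name=pk-analysis
#SBATCH --partition=general        # HPC queue defined in the cluster config
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=04:00:00

module load R/4.2.0                # pinned R toolchain exposed via Lmod (version illustrative)
Rscript analysis.R                 # the user's analytics workload
```

The job is submitted with `sbatch job.sh`; for quick interactive work, the same queue can be reached with `srun --partition=general --cpus-per-task=4 --pty bash`.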



Security
HPC is connected with the corporate Active Directory (AD) store via SSSD, which ensures role-based access control and allows users to authenticate to RStudio Workbench. To uphold IAM best practices, RStudio is integrated with the corporate AD, enabling role-based access control through group-wise segregation. Each user's AD account is assigned a specific scope in the FSx for Lustre shared file storage, so user-specific data remains private to that user unless they choose to share it via a shared folder or location.
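The mechanics of this per-user scoping are not spelled out in the case study; the sketch below shows one way such scopes could be provisioned on the shared FSx for Lustre mount using standard POSIX ownership and permissions. The mount point, directory layout, and group name are assumptions.

```bash
#!/bin/bash
# provision_user_scope.sh -- illustrative per-user scoping on the shared FSx mount
FSX_MOUNT=/fsx                     # assumed FSx for Lustre mount point
USER_NAME="$1"                     # AD account name resolved on the node via SSSD

# Private working area: accessible only to the owning user
mkdir -p "${FSX_MOUNT}/home/${USER_NAME}"
chown "${USER_NAME}" "${FSX_MOUNT}/home/${USER_NAME}"
chmod 700 "${FSX_MOUNT}/home/${USER_NAME}"

# Opt-in shared location: writable by members of an AD group (name is hypothetical)
mkdir -p "${FSX_MOUNT}/shared/clin-pharm"
chgrp clin-pharm "${FSX_MOUNT}/shared/clin-pharm"
chmod 2770 "${FSX_MOUNT}/shared/clin-pharm"   # setgid keeps new files owned by the group
```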



Scalability
Several HPC clusters were established to meet the needs of different business units. A structured billing and chargeback system was integrated into the CloudEdge platform, which was used to deploy, manage, and maintain the entire HPC solution. This allowed the IT team to easily scale HPC within the organization. Moreover, integrating with the CloudEdge platform provided managed services capabilities for handling customer issues and user-requested customizations.



Customizations
In the context of HPC stack customizations, a variety of alterations were made to enhance performance, expand capabilities, and maintain compliance. Here's a brief overview:

  1. Integrated execution support was established for RStudio, a popular IDE for the R programming language.
  2. Ad hoc srun and sbatch jobs were made available for triggering workloads on GPU instances for specific use cases.
  3. A shared file system was created using Amazon FSx for Lustre, allowing compliant data and output sharing across the organization, the GPMx R container, and Windows instances.
  4. Support for the Julia programming language was enabled.
  5. Running the JupyterLab IDE and VSCode from within RStudio Workbench was facilitated, integrated with HPC capabilities.
  6. PyTorch, Conda, Spack, and Lmod were integrated for handling deep learning workloads (see the sketch after this list).
  7. Select LaTeX packages could be installed for document generation based on results from compute-intensive AI/ML analytics.
  8. NONMEM, PsN, and Pirana were integrated or supported within the HPC cluster.
  9. High-memory queues (32 GB) were supported.
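To show how several of these customizations come together (GPU queues, Lmod modules, Conda, and PyTorch), here is a minimal sketch of a deep learning job; the queue name, module and environment names, and training script are illustrative assumptions, not the client's actual setup.

```bash
#!/bin/bash
# train.sh -- illustrative GPU deep learning job; all names are placeholders
#SBATCH --job-name=dl-train
#SBATCH --partition=gpu            # GPU-backed HPC queue
#SBATCH --gres=gpu:1               # request one GPU on the node
#SBATCH --mem=32G
#SBATCH --time=08:00:00

module load cuda                   # CUDA toolchain exposed via Lmod (assumed module name)
eval "$(conda shell.bash hook)"    # make 'conda activate' usable in the batch shell
conda activate torch-env           # Conda environment containing PyTorch (assumed name)

python train.py --epochs 10        # the team's PyTorch training script
```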

Implementation Method (AI/ML Use Cases)

Use case 1: Build an automated HPC (high-performance computing) cluster on the AWS public cloud to handle compute-intensive data analytics workloads, including RStudio Workbench. This enables users in the customer's clinical pharmacology team to execute data analytics and document generation using large-scale computing models in pharmacology. The cluster is tightly coupled with enterprise identity and access management policies to enable or disable user access based on AD groups.

Use case 2: Dynamically enable or disable team-specific customizations for specific packages (e.g., Torsten, Stan, NONMEM) within every cluster before the user executes workloads. This allowed users to apply the cluster to varied purposes instead of a pre-defined set of use cases.

Use case 3: Enable users in deep learning (MLOps) teams to access independent queues within the existing cluster configuration and use GPU-specific instances for executing deep learning models, with the capability to run these GPU-intensive workflows using specific versions of libraries, including but not limited to PyTorch, Conda, Spack, Lmod, and Julia.

Use case 4: Customize RStudio Package Manager to freeze specific versions of R libraries, ensuring compatibility across teams.

Use case 5: Deployment and integration of RStudio with RSConnect to publish Shiny apps.

Business Outcomes

  • Transitioned from an upfront infrastructure investment to a cost-saving pay-as-you-go model with SecureKloud's AWS-based HPC cluster.
  • Prepared for future growth with a record-breaking pipeline of 14 programs in Phase 3 or Phase 3 ready, along with a noteworthy 5% increase in R&D.
  • Achieved a 7% full-year revenue growth and significant double-digit growth in non-GAAP diluted earnings per ADS, demonstrating robust financial performance.
  • Empowered business teams with secure and compliant platforms for running compute-intensive data analytics workloads.
  • Ensured data segregation and protection through role-based access control and session management for HPC platform users.
  • Enabled users to access elastic compute and GPU clusters that scale up and down as needed, resulting in substantial cost savings.
  • Integrated HPC deployment and customization with a robust CI/CD pipeline, driven by IaC and extensive automation.

Are you looking to keep your cloud secure from potential threats?

Our experts are here to help you out.

Get In Touch