AWS ParallelCluster is an AWS-supported, open source cluster management tool that helps you deploy and manage High Performance Computing (HPC) clusters. It automatically sets up the required compute resources and a shared file system. AWS ParallelCluster is distributed as a Python package and is installed using pip.
As a user, you interact with ParallelCluster from your local machine by installing the open source Python package that powers the CLI
From the CLI, the ‘pcluster configure’ command prompts you step by step for the cluster settings, such as region, scheduler, OS, and instance type, for a cluster with one master node and a set of compute nodes
ParallelCluster takes the text configuration file that you write on your local system and translates it into an AWS CloudFormation template
This CloudFormation template describes the HPC cluster and is created in your AWS account
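To make the configuration step more concrete, here is a minimal sketch in Python that writes an INI-style configuration file similar to what ‘pcluster configure’ produces. The section and key names follow the ParallelCluster 2.x config layout as I understand it, so treat them as assumptions and check them against the documentation for your version. The key pair name, VPC ID, and subnet ID are placeholders, and build_cluster_config and render are hypothetical helper names.

```python
import configparser
import io


def build_cluster_config(region, scheduler, base_os, master_type,
                         compute_type, key_name, vpc_id, subnet_id):
    """Build an INI-style cluster configuration (assumed 2.x-style layout)."""
    config = configparser.ConfigParser()
    config["global"] = {
        "cluster_template": "default",
        "update_check": "true",
        "sanity_check": "true",
    }
    config["aws"] = {"aws_region_name": region}
    config["cluster default"] = {
        "key_name": key_name,
        "base_os": base_os,
        "scheduler": scheduler,
        "master_instance_type": master_type,
        "compute_instance_type": compute_type,
        "vpc_settings": "default",
    }
    config["vpc default"] = {
        "vpc_id": vpc_id,
        "master_subnet_id": subnet_id,
    }
    return config


def render(config):
    """Return the configuration as text, ready to be saved to a file."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values only; replace them with resources from your own account.
example = build_cluster_config(
    region="us-east-1",
    scheduler="slurm",
    base_os="alinux2",
    master_type="t2.micro",
    compute_type="c5.large",
    key_name="my-key-pair",
    vpc_id="vpc-EXAMPLE",
    subnet_id="subnet-EXAMPLE",
)
print(render(example))
```

In practice you would normally let ‘pcluster configure’ generate this file; the sketch only shows which kind of information the file carries before it is translated into a CloudFormation template.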
Allowed values for the scheduler setting:
SGE
Torque
Slurm
AwsBatch
An Auto Scaling group (ASG) is used only with the traditional schedulers: SGE, Torque, and Slurm. With AWS Batch, ParallelCluster relies on the elastic scaling provided by the AWS-managed job scheduler
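The split between traditional schedulers and AWS Batch can be sketched as a small validation helper. This is an illustration only, not part of the pcluster CLI: the function names are hypothetical, and the lowercase spellings assume the form used in the configuration file.

```python
# Scheduler values as they would appear in the configuration file (assumed lowercase).
TRADITIONAL_SCHEDULERS = {"sge", "torque", "slurm"}
MANAGED_SCHEDULERS = {"awsbatch"}


def normalize_scheduler(name):
    """Return the lowercase scheduler name, or raise if it is not allowed."""
    value = name.strip().lower()
    if value not in TRADITIONAL_SCHEDULERS | MANAGED_SCHEDULERS:
        raise ValueError(f"unsupported scheduler: {name!r}")
    return value


def uses_auto_scaling_group(name):
    """True for traditional schedulers, which scale through an ASG."""
    return normalize_scheduler(name) in TRADITIONAL_SCHEDULERS


for scheduler in ("SGE", "Torque", "Slurm", "AwsBatch"):
    print(scheduler, uses_auto_scaling_group(scheduler))
```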
AWS ParallelCluster daemons run on the cluster nodes, mainly to manage the elasticity of the HPC cluster. The daemons are:
Jobwatcher: runs on the master node, monitors the configured scheduler, and evaluates the queue once a minute to decide when to scale up
Sqswatcher: runs on the master node, monitors the SQS messages sent by the Auto Scaling group, and notifies the queue manager when instances become available or are terminated
Nodewatcher: runs on each compute node and terminates the instance once it has been idle for the scaledown_idletime period
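The scale-up and scale-down decisions described above can be sketched roughly as follows. This is a conceptual model, not the actual daemon code: the capacity formula, the function names, and the max_queue_size cap are assumptions made for illustration, and only scaledown_idletime comes from the description above.

```python
def desired_capacity(pending_slots, busy_nodes, slots_per_node, max_queue_size):
    """Jobwatcher-style check (assumed logic): how many compute nodes to request.

    Nodes needed for the pending work are added to the nodes already busy,
    and the result is capped at the maximum cluster size.
    """
    # Ceiling division: 10 pending slots on 4-slot nodes need 3 nodes.
    needed = -(-pending_slots // slots_per_node)
    return min(max_queue_size, busy_nodes + needed)


def should_terminate(idle_minutes, scaledown_idletime, has_jobs):
    """Nodewatcher-style check (assumed logic): terminate an idle node.

    A node is a candidate for termination only if it runs no jobs and has
    been idle for at least scaledown_idletime minutes.
    """
    return (not has_jobs) and idle_minutes >= scaledown_idletime


# Example evaluation, as a jobwatcher-like loop might do once a minute.
print(desired_capacity(pending_slots=10, busy_nodes=2,
                       slots_per_node=4, max_queue_size=10))
print(should_terminate(idle_minutes=12, scaledown_idletime=10, has_jobs=False))
```

The two checks are kept separate to mirror the split in the daemons: scale-up is decided centrally on the master node, while scale-down is decided locally by each node.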
Some pcluster commands do not work on clusters created with cfncluster, the predecessor of AWS ParallelCluster, because cfncluster uses different configuration files and custom IAM policies