Scale-Out Deep Learning
A Perfect Storm
I've been posting recently on some of my favorite topics relating to analytics and AI: deep learning with GPUs, making deep learning easier, and parallelism. Today we're announcing a new feature right at the epicenter of this perfect storm, with the launch of an easy way to run deep learning across an elastic, GPU-accelerated cluster.
Scale-Out Deep Learning on EC2
We're releasing a new CloudFormation template that spins up everything you need to run scale-out, GPU-accelerated deep learning workloads on EC2, and that automatically scales up and down with your needs. You can now launch the AWS Deep Learning AMI on clusters of multiple P2 instances, each pre-installed with the latest, most popular deep learning libraries (MXNet, TensorFlow, Theano, Caffe, Torch), pre-compiled and configured with CUDA drivers.
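Launching a stack from a CloudFormation template is a single CLI call. As a rough sketch, the invocation looks like the following; note that the template URL, key pair name, and parameter names here are illustrative placeholders, not the template's actual values, so check the template you download for its real parameters:

```shell
# Launch the deep learning cluster stack with the AWS CLI.
# Template URL and parameter names below are hypothetical --
# consult the actual template for its parameter list.
aws cloudformation create-stack \
  --stack-name deep-learning-cluster \
  --template-url https://example-bucket.s3.amazonaws.com/deep-learning-template.json \
  --parameters ParameterKey=KeyName,ParameterValue=my-ec2-keypair \
               ParameterKey=InstanceType,ParameterValue=p2.xlarge \
               ParameterKey=WorkerCount,ParameterValue=4 \
  --capabilities CAPABILITY_IAM
```

You can then watch provisioning progress with `aws cloudformation describe-stacks --stack-name deep-learning-cluster` until the stack reports `CREATE_COMPLETE`.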
Deep learning models can often be large (which is to say, deep), resulting in training times of days or even weeks. Longer training times add friction to three really important areas: iterating on your existing model, keeping your model up to date with fresh data, and expanding the sophistication of your model with additional data. No one wants to wait a week to see if their latest code change was effective.
This template allows you to materially increase the computational firepower at your disposal for deep learning, decreasing training times and leading to more accurate, up-to-date, and sophisticated models that are likely to drive a new wave of features and products.
Deep Learning on GPU Clusters In Minutes
The template creates a new VPC network, a master node, a cluster of GPU worker nodes inside an autoscaling group, and the extra configuration necessary to secure and access the instances in your cluster.
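The point of the master/worker split is synchronous data-parallel training: each GPU worker computes a gradient on its own shard of the data, and the master aggregates those gradients into a single parameter update. Here is a minimal, framework-agnostic sketch of that pattern in plain Python; it illustrates the idea only and is not the MXNet or TensorFlow distributed API:

```python
# Sketch of synchronous data-parallel SGD: workers compute local
# gradients on their data shards; the master averages them and
# applies one update. Model: fit y = w * x by mean squared error.

def local_gradient(weight, shard):
    """Gradient of MSE for y = w * x, computed on one worker's shard."""
    return sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)

def train_step(weight, shards, lr=0.01):
    """One synchronous step: gather per-worker gradients, average, update."""
    grads = [local_gradient(weight, shard) for shard in shards]  # runs in parallel on workers
    avg_grad = sum(grads) / len(grads)  # aggregated on the master node
    return weight - lr * avg_grad

# Data generated from y = 3x, split round-robin across 4 simulated workers.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(100):
    w = train_step(w, shards)
print(round(w, 2))  # converges toward the true weight, 3.0
```

Because every step averages over all workers before updating, the result matches single-machine SGD on the full batch, which is why adding workers shortens wall-clock training time without changing the model you get.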