To The Stars

BY MATT WOOD 

Serverless AI

AI & Model Selection

Parallelism is at the heart of driving performance in most systems; it is one of the things that makes GPUs so effective for deep learning training, for example. The same is true for model selection, a common task in building artificial intelligence systems, which attempts to maximize the performance of the AI system by tuning a set of hyperparameters: settings of the model itself, not of the underlying data. The goal of model selection (also known as hyperparameter optimization) is to efficiently search the hyperparameter space to find values which fine-tune the performance of the model, improving predictive power, classification accuracy, and so on.

For complex models with a large number of hyperparameters, this can be a very computationally intensive process. A number of different methods exist to enable the efficient search of the hyperparameter space as part of the optimization, from a systematic grid search to more sophisticated Bayesian approaches (read more on evaluating these methods in this excellent post from SigOpt).

From left to right: grid search, random search, Bayesian optimization (awesome hyperparameter optimization visualization from the fine folks at SigOpt)
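To make the first two strategies concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the hyperparameter names (`learning_rate`, `hidden_units`), their ranges, and the `validation_error` function, which is a hypothetical stand-in for actually training a model and measuring its error. Bayesian optimization is left out, since it needs a surrogate model that would not fit in a short example.

```python
import itertools
import random

def grid_points(space):
    """Yield every combination of the listed values (systematic grid search)."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

def random_points(bounds, n, seed=0):
    """Yield n points drawn uniformly from continuous ranges (random search)."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}

def validation_error(params):
    # Hypothetical stand-in for training a model and measuring its error.
    # Its minimum sits at learning_rate=0.1, hidden_units=64.
    lr_term = (params["learning_rate"] - 0.1) ** 2
    size_term = ((params["hidden_units"] - 64) / 64) ** 2
    return lr_term + size_term

# Grid search: evaluate every combination of a few hand-picked values.
space = {"learning_rate": [0.01, 0.1, 1.0], "hidden_units": [16, 64, 256]}
best_grid = min(grid_points(space), key=validation_error)

# Random search: evaluate a fixed budget of uniformly sampled points.
# (A real model would need hidden_units rounded to an integer.)
bounds = {"learning_rate": (0.01, 1.0), "hidden_units": (16, 256)}
best_random = min(random_points(bounds, 50), key=validation_error)
```

Note that every point is evaluated independently of the others in both strategies, which is what makes them natural candidates for parallel execution.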

Parallelization & AWS Lambda 

As you may be able to tell from the graphs above, there is an opportunity to accelerate model selection and hyperparameter optimization through parallelization. AWS Lambda has proven to be a very versatile service for broad data analytics, from real-time processing to map/reduce-style workloads, since it provides a way to execute and parallelize decoupled, stateless functions at very high scale, without ever having to spin up or configure EC2 instances. We could, for example, use an individual Lambda function to evaluate a single, specific point in hyperparameter space (shown by the yellow dots in the graphs above), and then launch all of the functions at once, in parallel, to dramatically decrease the time to sample the space and select our optimal model.
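The fan-out pattern described above can be sketched locally as follows. This is an illustration under stated assumptions, not the Lambda implementation itself: a thread pool stands in for the fleet of Lambda functions, and `evaluate_point` is a hypothetical placeholder for the work a single function would do (train and score a model at one point). The hyperparameter names and the toy error formula are likewise made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import itertools

def evaluate_point(params):
    # Hypothetical stand-in for one Lambda invocation: train and score a
    # model at a single point in hyperparameter space, then return the result.
    error = (params["learning_rate"] - 0.1) ** 2 + (params["momentum"] - 0.9) ** 2
    return {"params": params, "error": error}

def parallel_grid_search(grid, max_workers=16):
    # "Map" step: expand the grid into independent points.
    names = sorted(grid)
    points = [dict(zip(names, combo))
              for combo in itertools.product(*(grid[n] for n in names))]
    # The points share no state, so they can all be dispatched at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(evaluate_point, points))
    # "Reduce" step: keep the point with the lowest error.
    return min(results, key=lambda r: r["error"])

grid = {"learning_rate": [0.01, 0.1, 0.5], "momentum": [0.5, 0.9, 0.99]}
best = parallel_grid_search(grid)
print(best["params"], best["error"])
```

In a real deployment, the body of `evaluate_point` would instead invoke a deployed function through an AWS SDK and parse its response; the map and reduce steps around it would stay the same.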

Example images used in handwriting recognition training.

My colleague, Sunil Mallya, took exactly this approach to optimize a neural network trained to recognize handwriting (using the UCI Optdigits dataset, with 4,495 training examples and a test set of another 1,125 examples), using a Lambda-based, massively parallel hyperparameter grid search. The result is more than a 5X increase in speed.

On a C3.large instance, the process took 4 minutes, 2 seconds; using Lambda, it took 47 seconds (242 seconds versus 47, or roughly 5.1X faster), both with the same model error of 0.04.

This is a modest saving on a small training set and a relatively simple problem like handwriting recognition, but on a large-scale production system, a 5X increase could shave hours or days off processing time.

The Need For Speed

Artificial intelligence research has an insatiable appetite for speed, driven by three major factors. Firstly, the faster a process can run, the more data you can use to train a model in the same unit of time, which is important where there is an advantage in training sophisticated models on very large datasets of billions of examples.

Secondly, the faster a system operates, the more complexity can be modeled per unit time. This results in algorithms which can be materially more sophisticated with improved accuracy or predictive power, but which still remain computationally tractable. 

Thirdly, the faster AI engineers can review the impact of code or algorithm changes on the accuracy and power of their models, the more changes and optimizations they can make (and the happier, in turn, the engineers are).

Building sophisticated, powerful AI systems is a multi-disciplinary pursuit, requiring a rich set of tools and approaches. Sometimes, engineers will want to go right down to the silicon (optimizing for the first two speed factors); other times, they will find a sweet spot in optimizing (pun intended) for the third factor. The cloud, where all these options exist (plus more along the same spectrum), thankfully doesn't force you to make a decision based on anything other than finding the right balance at the right time.
