There is a significant gap between using a pre-built machine learning API, such as the various Azure Cognitive Services, and building your own machine learning system from scratch, even with cloud services like Azure Machine Learning that give you pre-implemented versions of common algorithms. Preparing the data, picking the machine learning algorithm, and choosing the weights and parameters that fine-tune that algorithm so it produces accurate results with your data set all take both expertise and experimentation.
The new Automated Machine Learning that’s in preview in Azure ML (and coming soon to Power BI) aims to speed up the process of putting together that whole machine learning pipeline and give developers the benefit of the internal expertise at Microsoft, by recommending combinations of data transformation, machine learning algorithms and parameters that produce more accurate results – much like a shopping site recommends products or a streaming music service suggests new songs you’ll like.
Automated ML itself is actually doing probabilistic machine learning using both collaborative filtering and Bayesian optimisation. It’s a recommendation system that’s been trained on the evaluations of hundreds of millions of machine learning pipelines, so rather than running your data set through every possible combination (which would be too slow to be useful), it can explore only the ones most likely to perform well.
That doesn’t mean that Microsoft sees your data set; the data stays in your Azure tenant or even on your own local computer where each machine learning pipeline suggested by the recommender runs. Only the accuracy scores go back to the recommender, so it can pick the next pipeline to try.
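The feedback loop described above can be sketched in a few lines of Python. This is only an illustration, not Microsoft's implementation: the `ToyRecommender` class, the pipeline names, the prior scores and the `evaluate_locally` function are all hypothetical, and a simple "try the highest-ranked untried pipeline" rule stands in for the collaborative filtering and Bayesian optimisation the real service uses. What it tries to show is the data boundary: the data set is only touched by the local evaluation step, and the recommender only ever receives a pipeline name and a score.

```python
import random

# Hypothetical prior estimates, standing in for what a recommender might
# have learned from the history of earlier pipeline evaluations.
PRIOR_SCORES = {
    "scale+logistic_regression": 0.80,
    "scale+random_forest": 0.85,
    "minmax+gradient_boosting": 0.90,
    "raw+decision_tree": 0.70,
}


class ToyRecommender:
    """Stand-in for the hosted recommender; it never receives the data set."""

    def __init__(self, prior_scores):
        self.prior = dict(prior_scores)
        self.observed = {}  # pipeline name -> accuracy reported by the client

    def suggest(self):
        # Propose the untried pipeline with the highest prior estimate
        untried = [p for p in self.prior if p not in self.observed]
        if not untried:
            return None
        return max(untried, key=self.prior.get)

    def report(self, pipeline, accuracy):
        self.observed[pipeline] = accuracy

    def best(self):
        return max(self.observed, key=self.observed.get)


def evaluate_locally(pipeline, dataset):
    """Runs where the data lives and returns only a number.
    A real version would train and validate a model here; this placeholder
    derives a repeatable pseudo-score from the pipeline name and data size."""
    rng = random.Random(f"{pipeline}:{len(dataset)}")
    return round(rng.uniform(0.6, 0.95), 3)


def run_search(recommender, dataset, budget):
    """Try at most `budget` pipelines instead of every combination."""
    for _ in range(budget):
        pipeline = recommender.suggest()
        if pipeline is None:
            break
        accuracy = evaluate_locally(pipeline, dataset)
        recommender.report(pipeline, accuracy)  # only the score is sent back
    return recommender.best()


recommender = ToyRecommender(PRIOR_SCORES)
local_dataset = [[1.0, 2.0], [3.0, 4.0]]  # never passed to the recommender
best_pipeline = run_search(recommender, local_dataset, budget=2)
```

With a budget of two, only the two highest-ranked pipelines are ever run, which is the point of using a recommender rather than an exhaustive search.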
The idea is to make machine learning available to a wider group of developers than just expert data scientists, who are in short supply, Eric Boyd, the corporate vice president of Microsoft’s AI platform, told us.
“An experienced data scientist can take a data set and very quickly learn with Automated ML, is this trainable, is this going to converge, which models and algorithms will work best on that? And they can use that for refinements and additional work.” That lets them experiment much faster and tackle more machine learning projects, using Automated ML in the Azure Machine Learning Python SDK and in Jupyter notebooks. Support for Azure Databricks is coming soon.
But it’s also useful for more novice data scientists, Boyd suggested. “We see a lot of data scientists who I think of as analysts. They’re great at Power BI, they know their data set and they can train a model based off the data set. Here’s a history of car prices; I want to predict the price of a car but I don’t know what features are important, I don’t know what machine learning model can give me something useful.”
The recommender works with both numeric and textual data, for classification and for regression; “where I have a data set and I want to predict a particular column,” Boyd explained. Support for ensemble models that use more than one machine learning algorithm is coming soon.
It can automatically generate features (including imputing missing values, encoding and normalising data, and handling features that rely on heuristics and ‘rules of thumb’), and it handles feature and data transformation. You can also customise the machine learning pipeline that Automated ML suggests before you use it.
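For readers who haven't done this kind of data preparation by hand, here's a rough sketch of three of the steps mentioned above: imputing missing values, normalising a numeric column and encoding a text column. It uses only the Python standard library; the function names and the sample car data are made up for illustration, and Automated ML's own featurisation may well work differently.

```python
from statistics import mean, pstdev


def impute_missing(values):
    """Replace None with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def normalise(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no signal
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical car-price rows: mileage has a gap, fuel type is text
mileage = [42000, None, 88000, 15000]
fuel = ["petrol", "diesel", "petrol", "electric"]

mileage_filled = impute_missing(mileage)
mileage_scaled = normalise(mileage_filled)
fuel_encoded = one_hot(fuel)
```

Mean imputation and one-hot encoding are only the simplest choices here; which strategy suits a given column is exactly the kind of decision the article says Automated ML is meant to take off your hands.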
The recommender will also soon be able to explain how the machine learning model it suggests works with your data – something that’s increasingly important for machine learning to be accepted (and that may be a legal necessity if you need to comply with GDPR).
“Automated machine learning, particularly the way we’ve implemented it, leads to relatively explainable algorithms,” Boyd noted – although he added that there’s still a lot of research to do in this area. “We can show these are the features and these are the weights those features have [in the model]. If it’s a mortgage decision, it might say your income was low and that was an important feature, or where you live was a feature and you can start to see what those things are.”
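To make the "features and weights" idea concrete, here's a minimal sketch for a linear scoring model. The feature names, weights and applicant values are invented for illustration and don't come from any real model or from Automated ML. In a linear model each feature's contribution is simply its weight multiplied by its value, so ranking features by the size of that contribution shows which ones drove a particular decision.

```python
# Hypothetical weights for a linear scoring model (inputs assumed standardised)
WEIGHTS = {"income": 1.5, "debt_ratio": -2.0, "years_at_address": 0.25}
BIAS = 0.5


def explain(features):
    """Return the score and the per-feature contributions, largest effect first."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    score = BIAS + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical applicant: income well below average, debt slightly above
applicant = {"income": -2.0, "debt_ratio": 0.5, "years_at_address": 1.0}
score, ranked = explain(applicant)

for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

For this made-up applicant, low income is the largest negative contribution, which matches the kind of explanation Boyd describes for a mortgage decision. Non-linear models need more involved techniques, which is part of the research he mentions.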
Automated ML doesn’t currently cover deep learning, convolutional neural networks or recurrent neural networks, but there are internal Microsoft systems that suggest what’s going to be possible.
“We have a product internally called Hyperdrive which manages hyperparameter sweeps and Hyperdrive is powered by an AI algorithm built on the history of hyperparameter sweeps it’s done,” Boyd told us. “It’s a huge acceleration for internal teams; it started in Bing and most of the developers in the Bing team are using it to manage hyperparameter tuning. It’s already effective in a wide variety of use cases and it gets better the more it’s used.”
“As you start looking at deep learning, there are more and more parameters involved; there are more hyperparameters in even the structure of the model – how many convolutions there are, how many layers. Those are things we can experiment on and get models that perform well and learn how to do this better.” And as those internal tools mature and cover more cases, we can expect more of those features to show up in Azure ML, Power BI and other tools, bringing machine learning to an ever-wider audience of developers.
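If you haven't run a hyperparameter sweep before, the basic shape looks something like the sketch below. To be clear about what this is and isn't: it's a plain random search, not Hyperdrive, which according to Boyd learns from the history of earlier sweeps. The search space, the `toy_validation_score` function and its "ideal" settings are all invented so the example runs without training a real model.

```python
import random

# Hypothetical search space, including structural choices such as layer count
SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [2, 4, 8],
    "filters_per_layer": [16, 32, 64],
}


def toy_validation_score(config):
    """Stand-in for training a model and measuring validation accuracy.
    By construction it peaks at learning_rate=0.01, num_layers=4,
    filters_per_layer=32."""
    penalty = (
        abs(config["learning_rate"] - 0.01) * 2
        + abs(config["num_layers"] - 4) * 0.02
        + abs(config["filters_per_layer"] - 32) * 0.002
    )
    return 0.95 - penalty


def random_sweep(space, trials, seed=0):
    """Sample `trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    history = []
    for _ in range(trials):
        config = {name: rng.choice(options) for name, options in space.items()}
        history.append((toy_validation_score(config), config))
    best_score, best_config = max(history, key=lambda item: item[0])
    return best_config, best_score, history


best_config, best_score, history = random_sweep(SEARCH_SPACE, trials=20)
```

Every trial in a real sweep means training a model, so the saving from a system that uses past sweeps to pick promising configurations first, rather than sampling blindly as this sketch does, grows with the size of the search space.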
Boyd compares this simplification of working with machine learning to developers using data structures like hash tables. “Someone had to come up with the maths behind it but developers use a hash table without thinking. Good developers understand the limitations and where the collisions are going to come. As AI moves from theory to implementation, much of it is going to be, if I know how to call on these libraries, and I need to understand enough what features are important, how does hyperparameter tuning change – but I don’t necessarily need to understand how to implement gradient descent. You’ll take it in college and you’ll understand it, but in practice you won’t use it directly.”