I recently presented a webinar discussing how to use the Azure Machine Learning Studio to build a predictive model. In case you missed it, a recording of the webinar, entitled "Getting Started with Azure Machine Learning," is available here. The presentation includes step-by-step demos showing how to create an Azure Machine Learning experiment.
During the session, attendees asked several questions that I was unable to address. This post answers those questions to give a clearer understanding of Azure Machine Learning.
Is there any coding involved when building the models, or only drag-drop from toolbars?
The Azure Machine Learning platform provides a wide variety of modules that do not require any coding. It is feasible to create a fully functional machine learning model without writing any code. However, the “Execute R Script” and “Execute Python Script” modules can be used to execute scripts written in R and Python, respectively.
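For a sense of what a script module involves, here is a minimal sketch of code that could be pasted into Execute Python Script. It assumes the module's usual entry point, a function named azureml_main that receives up to two pandas DataFrames from the input ports and returns a sequence of DataFrames. Dropping rows with missing values is only an illustrative choice, not something taken from the webinar.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is assumed to arrive as a pandas DataFrame from the
    # first input port; dataframe2 (second port) is unused in this sketch.
    cleaned = dataframe1.dropna()  # illustrative step: drop incomplete rows

    # The module expects a sequence of DataFrames, hence the
    # trailing comma that makes this a one-element tuple.
    return cleaned,
```

Anything more elaborate (feature engineering, custom scoring) follows the same pattern: take the input frame, transform it, and return it wrapped in a sequence.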
What are the limitations of the free workspace?
The free workspace limits storage to 10 GB and each experiment to 100 modules, whereas the standard (paid) workspace has no such limits. The paid workspace also provides faster performance than the free tier and allows pulling data from on-premises data sources (such as SQL Server), which the free tier does not.
The following link provides details on the differences between the free and standard tier workspaces (link).
How do we consume a trained model from an application?
Azure Machine Learning Studio provides the ability to create a web service from a trained model. This is fairly simple to do, as demonstrated near the end of the webinar. Once the web service is created, the Machine Learning Studio provides sample C#, R and Python code that can be used in an application to consume the web service for the trained model.
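As a rough illustration of what the consuming code tends to look like, here is a minimal Python sketch using only the standard library. The URL, API key, column names and the "input1" payload shape are all placeholders and assumptions. The real values and the exact request format should be copied from the sample code that the Studio generates for your own web service.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request for a scoring web service."""
    # Assumed payload shape; verify it against your service's sample code.
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers)


def score(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint, key and columns (hypothetical, for illustration only).
request = build_scoring_request(
    url="https://example.invalid/execute?api-version=2.0",
    api_key="YOUR_API_KEY",
    column_names=["feature_a", "feature_b"],
    rows=[["1.0", "2.0"]],
)
# Call score(request) only after substituting a real endpoint and key.
```

Keeping request construction separate from the network call makes it easy to inspect the payload before anything is sent.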
Can a model be consumed with Reporting Services?
Yes, the web service described above can be consumed from Reporting Services.
Where is the workspace information saved?
Workspace information, including uploaded data sets and output of each module, is saved in blob storage. Creating an Azure Machine Learning workspace requires selection of an Azure storage account. The workspace information is stored in this account.
Is the machine learning element in this case running multiple regressions to find one which best explains the variance in the dependent variable?
My example did not do this, but Azure Machine Learning provides tools that will do this. Specifically, the Tune Model Hyperparameters module iterates over various combinations of parameter settings in order to find the combinations that yield the best results.
Follow this link for more information on the Tune Model Hyperparameters module.
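To make the idea of iterating over parameter combinations concrete, here is a small sketch of an exhaustive grid search in plain Python. It is not how the Tune Model Hyperparameters module is implemented. The toy one-variable model, the parameter names (l2, fit_intercept) and the data are all made up for illustration.

```python
from itertools import product


def fit(xs, ys, l2, fit_intercept):
    """Closed-form fit of y ~ w*x + b with an L2 penalty on w."""
    x_mean = sum(xs) / len(xs) if fit_intercept else 0.0
    y_mean = sum(ys) / len(ys) if fit_intercept else 0.0
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / (sxx + l2)
    b = y_mean - w * x_mean
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, grid):
    """Try every combination in `grid`; keep the lowest validation error."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        w, b = fit(train[0], train[1], **params)
        score = mse(w, b, valid[0], valid[1])
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical data generated from y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
valid = ([4.0, 5.0], [9.0, 11.0])
grid = {"l2": [0.0, 1.0, 10.0], "fit_intercept": [True, False]}
best_params, best_score = grid_search(train, valid, grid)
```

Scoring each combination on held-out data rather than on the training data is the important design choice here, since it keeps the search from simply rewarding the most flexible setting.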
Where do you see the AUC or confusion metrics?
AUC (area under the ROC curve) and the confusion matrix metrics (as well as the ROC curve itself) are available when the Evaluate Model module is used on a classification model. See the screenshot below for a sample of these metrics from a classification experiment.
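For readers who want to see how these numbers are defined, here is a minimal sketch that computes confusion matrix counts and AUC from labels and predicted scores. The function names, the 0.5 threshold and the sample data are assumptions for illustration. The Evaluate Model module computes these for you.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding the scores."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1:
            counts["TP" if label == 1 else "FP"] += 1
        else:
            counts["FN" if label == 1 else "TN"] += 1
    return counts


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
counts = confusion_counts(labels, scores)  # TP=2, FP=1, TN=1, FN=0
area = auc(labels, scores)                 # 0.75
```

The confusion counts depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.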
The output of the Evaluate Model module varies depending on the type of model being evaluated. During the webinar, I demonstrated a regression model. The metrics I showed with the Evaluate Model module were relevant to regression models.
Follow this link for more information on the metrics provided by the Evaluate Model module for each type of model.
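For the regression case, the sketch below computes three commonly reported metrics: mean absolute error, root mean squared error and the coefficient of determination. The function name and sample values are hypothetical, and the module may report additional metrics beyond these.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired actual/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if all actual values are equal
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted values.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

MAE and RMSE are in the units of the target, while the coefficient of determination is unitless, so it is easier to compare across different prediction problems.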
Can batch or stochastic gradient descent be set up to support online learning applications?
Yes, see the screenshot below. The “Solution method” selection in the Linear Regression module provides the “Online Gradient Descent” option, followed by a series of options to configure the model.
Follow this link for more information on configuring online gradient descent with the Linear Regression module.
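To show what "online" means in practice, here is a minimal sketch of a linear model whose weights are updated one example at a time. It is a plain-Python illustration under assumed names (OnlineLinearRegression, train_on_stream), an assumed learning rate and made-up data. It does not reproduce the options or internals of the Linear Regression module.

```python
class OnlineLinearRegression:
    """Linear model y ~ w*x + b, updated one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.w = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        """One gradient step on 0.5 * (prediction - y)^2 for one example."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error
        return error


def train_on_stream(model, stream, passes=1):
    """Feed examples to the model in order, optionally repeating the stream."""
    for _ in range(passes):
        for x, y in stream:
            model.update(x, y)
    return model


# Hypothetical noise-free data following y = 2x + 1.
stream = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
model = train_on_stream(OnlineLinearRegression(0.1), stream, passes=1000)
```

Because each update needs only the current example, the same update method can keep refining the model as new observations arrive, without retraining on the full data set.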
Do you need to be a statistician to really understand the best algorithm to apply?
I saved this one for last! My short answer is no. But…
To begin, it is important to understand that the results of machine learning algorithms are not right or wrong in the way many people in technology are used to.
Having said that, understanding or interpreting results of an ML model requires some level of understanding of both the techniques used to build the model and the nature of the data used to build and train the model. A background in statistics or data science certainly helps here.
If you are not sure how to choose an appropriate model, Microsoft has provided a handy flow chart/cheat sheet to help guide this process. This flowchart is available here.
Your required level of understanding depends on the stakes associated with the question you wish to answer with machine learning. For example, if you intend to build a model to predict the likelihood of a certain medical diagnosis, this will most certainly require advanced knowledge of the algorithms used. But you can get away with having much less insight if you want to predict which player in your fantasy football league will perform best in the coming week.
With all the tools and technology available for machine learning, do not underestimate the value of domain knowledge. This type of insight can only be developed over time by learning about your business or environment. This information is invaluable to the machine learning process, but certainly does not require you to be a statistician.
Thanks to all of you who attended last week's free training session! Please reach out to me at [email protected] if you have more questions on this webinar, or about machine learning in general. Feel free to read this post on my personal blog as well.