What is ML: A Taxonomy of Machine Learning
Overview
Teaching: 15 min
Exercises: 15 min
Questions
What is the difference between classification and regression?
What is the difference between supervised and unsupervised learning?
How can I quantify machine learning algorithm performance?
Objectives
Understand the landscape of machine learning algorithms
Use this understanding to identify the appropriate type of algorithm to use for a given problem.
Understand the importance of performance metrics.
scikit-learn algorithm cheat sheet
The following taxonomy draws heavily on Chapter 5, "Machine Learning Basics", of Goodfellow, Bengio, & Courville (2016).
The Experience, $E$
Typically, the experience a machine learning algorithm encounters during learning is in the form of a dataset, or exposure to a dataset (or subset thereof). A dataset is a collection of examples, each example comprising a set of features that have been quantitatively measured from some object or event. We typically represent an example as a vector $x \in \mathbb{R}^n$, where each entry $x_i$ of the vector is another feature. Broadly speaking, experiences are often categorised as either unsupervised or supervised.
Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.
Roughly speaking, unsupervised learning involves observing several examples of a random vector $x$ and attempting to implicitly or explicitly learn the probability distribution $p(x)$, or some interesting properties of that distribution; while supervised learning involves observing several examples of a random vector $x$ and an associated value or vector $y$, then learning to predict $y$ from $x$, usually by estimating $p(y \mid x)$.
(Goodfellow, Bengio, & Courville, 2016)
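To make the distinction concrete, here is a minimal sketch in plain Python. The toy dataset, the feature names and the helper functions `feature_means` and `class_centroids` are all hypothetical, chosen only for illustration. The unsupervised function sees the feature vectors alone and summarises their structure; the supervised function also receives a label for each example.

```python
from statistics import mean

# Hypothetical toy dataset: each example is a feature vector x = [height_cm, weight_kg].
X = [
    [150.0, 50.0],
    [160.0, 58.0],
    [180.0, 82.0],
    [190.0, 95.0],
]
# A supervised experience additionally pairs every example with a label y.
y = ["small", "small", "large", "large"]


def feature_means(examples):
    """Unsupervised: summarise the structure of X alone (here, per-feature means)."""
    return [mean(column) for column in zip(*examples)]


def class_centroids(examples, labels):
    """Supervised: use the labels to learn one mean vector per class."""
    grouped = {}
    for x, label in zip(examples, labels):
        grouped.setdefault(label, []).append(x)
    return {label: feature_means(rows) for label, rows in grouped.items()}


print(feature_means(X))       # one property of p(x)
print(class_centroids(X, y))  # a crude summary of how x relates to y
```

Per-feature means are only one of many properties an unsupervised method might learn; clustering or density estimation would fit the same description.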
The types of experience are not necessarily mutually exclusive. A single problem may call for either of the above techniques, often both, and sometimes a hybrid of the two.
For completeness, when characterising the types of experience available to a machine learning algorithm we also include reinforcement learning. Reinforcement learning algorithms work with a dataset that is not necessarily fixed: they interact with their environment, so there is a feedback loop between the learning system and its experiences.
Challenge
Try to identify the type of experience for each of the examples below.
- A set of holiday images taken from Flickr with their associated locations.
- A set of satellite images continuously collected across the globe.
- A time series of temperatures recorded across a range of sites.
- A game of Go.
- A stream of news articles.
Solution
- Supervised.
- Unsupervised.
- Supervised.
- Reinforcement.
- Unsupervised.
Discussion
What type of datasets (experiences) have you worked with in the past? Are there any unique experiences you can identify in your domain that might be applicable to a learning algorithm?
The Task, $T$
Many kinds of tasks can be solved with machine learning. Some of the most common machine learning tasks include the following:
- Classification: In this type of task, the computer program is asked to specify which of $k$ categories some input belongs to. To solve this task, the learning algorithm is usually asked to produce a function $f : \mathbb{R}^n \to \{1, \dots, k\}$. When $y = f(x)$, the model assigns an input described by vector $x$ to a category identified by numeric code $y$. There are other variants of the classification task, for example, where $f$ outputs a probability distribution over classes. An example of a classification task is object recognition, where the input is an image (usually described as a set of pixel brightness values), and the output is a numeric code identifying the object in the image.
- Classification with missing inputs: Classification becomes more challenging if the computer program is not guaranteed that every measurement in its input vector will always be provided. To solve the classification task, the learning algorithm only has to define a single function mapping from a vector input to a categorical output. When some of the inputs may be missing, rather than providing a single classification function, the learning algorithm must learn a set of functions. Each function corresponds to classifying $x$ with a different subset of its inputs missing. This kind of situation arises frequently in medical diagnosis, because many kinds of medical tests are expensive or invasive. One way to efficiently define such a large set of functions is to learn a probability distribution over all the relevant variables, then solve the classification task by marginalizing out the missing variables. With $n$ input variables, we can now obtain all $2^n$ different classification functions needed for each possible set of missing inputs, but the computer program needs to learn only a single function describing the joint probability distribution.
- Regression: In this type of task, the computer program is asked to predict a numerical value given some input. To solve this task, the learning algorithm is asked to output a function $f : \mathbb{R}^n \to \mathbb{R}$. This type of task is similar to classification, except that the format of output is different. An example of a regression task is the prediction of the expected claim amount that an insured person will make (used to set insurance premiums), or the prediction of future prices of securities. These kinds of predictions are also used for algorithmic trading.
- Ranking: Sometimes, instead of estimating an absolute numeric value, we want to be able to learn relative positions. For example, in a recommendation system for movies, we want to generate a list of movies ordered by how much we believe the user is likely to enjoy each.
- Transcription: In this type of task, the machine learning system is asked to observe a relatively unstructured representation of some kind of data and transcribe the information into discrete textual form. For example, in optical character recognition, the computer program is shown a photograph containing an image of text and is asked to return this text in the form of a sequence of characters (e.g., in ASCII or Unicode format). Another example is speech recognition, where the computer program is provided an audio waveform and emits a sequence of characters or word ID codes describing the words that were spoken in the audio recording.
- Machine translation: In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language. This is commonly applied to natural languages, such as translating from English to French.
- Structured output: Structured output tasks involve any task where the output is a vector (or other data structure containing multiple values) with important relationships between the different elements. This is a broad category and subsumes the transcription and translation tasks described above, as well as many other tasks. One example is parsing: mapping a natural language sentence into a tree that describes its grammatical structure by tagging nodes of the tree as being verbs, nouns, adverbs, and so on. Another example is pixel-wise segmentation of images, where the computer program assigns every pixel in an image to a specific category. The output form need not mirror the structure of the input as closely as in these annotation-style tasks. For example, in image captioning, the computer program observes an image and outputs a natural language sentence describing the image. These tasks are called structured output tasks because the program must output several values that are all tightly interrelated. For example, the words produced by an image captioning program must form a valid sentence.
- Anomaly detection: In this type of task, the computer program sifts through a set of events or objects and flags some of them as being unusual or atypical. An example of an anomaly detection task is credit card fraud detection. By modeling your purchasing habits, a credit card company can detect misuse of your cards. If a thief steals your credit card or credit card information, the thief’s purchases will often come from a different probability distribution over purchase types than your own. The credit card company can prevent fraud by placing a hold on an account as soon as that card has been used for an uncharacteristic purchase.
- Synthesis and sampling: In this type of task, the machine learning algorithm is asked to generate new examples that are similar to those in the training data. Synthesis and sampling via machine learning can be useful for media applications when generating large volumes of content by hand would be expensive, boring, or require too much time. For example, video games can automatically generate textures for large objects or landscapes, rather than requiring an artist to manually label each pixel. In some cases, we want the sampling or synthesis procedure to generate a specific kind of output given the input. For example, in a speech synthesis task, we provide a written sentence and ask the program to emit an audio waveform containing a spoken version of that sentence. This is a kind of structured output task, but with the added qualification that there is no single correct output for each input, and we explicitly desire a large amount of variation in the output, in order for the output to seem more natural and realistic.
- Imputation of missing values: In this type of task, the machine learning algorithm is given a new example $x \in \mathbb{R}^n$, but with some entries $x_i$ of $x$ missing. The algorithm must provide a prediction of the values of the missing entries.
- Denoising: In this type of task, the machine learning algorithm is given as input a corrupted example $\tilde{x} \in \mathbb{R}^n$ obtained by an unknown corruption process from a clean example $x \in \mathbb{R}^n$. The learner must predict the clean example $x$ from its corrupted version $\tilde{x}$, or more generally predict the conditional probability distribution $p(x \mid \tilde{x})$.
- Density estimation or probability mass function estimation: In the density estimation problem, the machine learning algorithm is asked to learn a function $p_{\text{model}} : \mathbb{R}^n \to \mathbb{R}$, where $p_{\text{model}}(x)$ can be interpreted as a probability density function (if $x$ is continuous) or a probability mass function (if $x$ is discrete) on the space that the examples were drawn from. To do such a task well (we will specify exactly what that means when we discuss performance measures $P$), the algorithm needs to learn the structure of the data it has seen. It must know where examples cluster tightly and where they are unlikely to occur. Most of the tasks described above require the learning algorithm to at least implicitly capture the structure of the probability distribution. Density estimation enables us to explicitly capture that distribution. In principle, we can then perform computations on that distribution to solve the other tasks as well. For example, if we have performed density estimation to obtain a probability distribution $p(x)$, we can use that distribution to solve the missing value imputation task. If a value $x_i$ is missing, and all the other values, denoted $x_{-i}$, are given, then we know the distribution over it is given by $p(x_i \mid x_{-i})$. In practice, density estimation does not always enable us to solve all these related tasks, because in many cases the required operations on $p(x)$ are computationally intractable.
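The link between density estimation, imputation and marginalizing out missing variables can be sketched for a small discrete case. The observations, the feature names and the helpers `estimate_pmf`, `conditional` and `marginal_traffic` below are hypothetical, and estimating a mass function by counting is only the simplest possible estimator, assumed here for illustration.

```python
from collections import Counter

# Hypothetical observations of two discrete features: (weather, traffic).
observations = [
    ("rain", "heavy"), ("rain", "heavy"), ("rain", "light"),
    ("sun", "light"), ("sun", "light"), ("sun", "heavy"),
    ("sun", "light"), ("rain", "heavy"),
]


def estimate_pmf(samples):
    """Mass function estimation by counting: p(x) = count(x) / N."""
    counts = Counter(samples)
    total = len(samples)
    return {x: c / total for x, c in counts.items()}


def conditional(pmf, weather):
    """p(traffic | weather): restrict the joint to the observed value, then renormalise."""
    matching = {x[1]: p for x, p in pmf.items() if x[0] == weather}
    norm = sum(matching.values())
    return {traffic: p / norm for traffic, p in matching.items()}


def marginal_traffic(pmf):
    """p(traffic): sum the joint over the (missing) weather variable."""
    out = Counter()
    for (_, traffic), p in pmf.items():
        out[traffic] += p
    return dict(out)


p = estimate_pmf(observations)

# Imputation: traffic is missing, weather == "rain" is given.
p_given_rain = conditional(p, "rain")
imputed = max(p_given_rain, key=p_given_rain.get)
print(p_given_rain, imputed)

# Marginalisation: weather is missing entirely.
print(marginal_traffic(p))
```

With only two features the table has four entries; the number of entries grows exponentially with the number of features, which is one way the operations on the distribution become intractable in practice.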
Of course, many other tasks and types of tasks are possible. The types of tasks we list here are intended only to provide examples of what machine learning can do, not to define a rigid taxonomy of tasks.
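As a concrete illustration of how the output format separates regression from classification, the following sketch fits both kinds of function to the same one-dimensional input. The data (hours studied, exam score, pass/fail) and the helpers `fit_line` and `fit_threshold` are hypothetical. The threshold classifier assumes that class 1 has the larger mean feature value; it is a deliberately simple stand-in, not a recommended algorithm.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = a * x + b, giving f: R -> R."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_threshold(xs, labels):
    """Classification: f: R -> {0, 1}, thresholding midway between the class means.

    Assumes class 1 has the larger mean feature value.
    """
    class0 = [x for x, label in zip(xs, labels) if label == 0]
    class1 = [x for x, label in zip(xs, labels) if label == 1]
    threshold = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
    return lambda x: 1 if x > threshold else 0


# Hypothetical data: hours studied, exam score, and whether the student passed.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
passed = [0, 0, 0, 1, 1, 1]

predict_score = fit_line(hours, scores)      # numerical output
predict_pass = fit_threshold(hours, passed)  # categorical output

print(predict_score(7.0))
print(predict_pass(3.0), predict_pass(4.5))
```

The inputs are identical in both cases; only the type of the target, and therefore the type of function being learned, differs.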
Challenge
Try to identify the type of task for each of the examples below.
- Estimate required steering wheel angle given an image from a dash-cam.
- Predict the rating a user might assign a particular movie, given a handful of ratings from other movies and users.
- Identify potentially malicious traffic in a computer network.
- Convert page layout sketches into functioning HTML.
- Identify the sub-surface structure based on sensor readings.
Solution
- Regression.
- Imputation.
- Anomaly detection.
- Translation.
- Classification.
Discussion
Are any of these tasks applicable to datasets that you have? Would any of these tasks solve some interesting science questions you have?
The Performance Measure, $P$
To evaluate the abilities of a machine learning algorithm, we must design a quantitative measure of its performance. Usually this performance measure is specific to the task being carried out by the system. For tasks such as classification, classification with missing inputs, and transcription, we often measure the accuracy of the model: the proportion of examples for which it produces the correct output.
Usually we are interested in how well the machine learning algorithm performs on data that it has not seen before, since this determines how well it will work when deployed in the real world. We therefore evaluate these performance measures using a test set of data that is separate from the data used for training the machine learning system.
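A minimal sketch of this procedure, assuming a hypothetical labelled dataset and a simple threshold classifier, is shown below. The helper names `train_test_split`, `accuracy` and `fit_threshold` are defined here for illustration only; this `train_test_split` is not the scikit-learn function of the same name.

```python
import random


def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and hold out a fraction as the test set."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(model, examples):
    """Proportion of examples for which the model outputs the correct label."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)


def fit_threshold(examples):
    """Toy classifier: threshold midway between the two class means.

    Assumes both classes appear in the training data and class 1 has the larger mean.
    """
    class0 = [x for x, label in examples if label == 0]
    class1 = [x for x, label in examples if label == 1]
    threshold = (sum(class0) / len(class0) + sum(class1) / len(class1)) / 2
    return lambda x: int(x > threshold)


# Hypothetical labelled data: (feature, label), where the label is 1 when feature > 5.
data = [(float(i), int(i > 5)) for i in range(12)]

train, test = train_test_split(data)
model = fit_threshold(train)  # the model only ever sees the training set

print("train accuracy:", accuracy(model, train))
print("test accuracy:", accuracy(model, test))
```

The test accuracy is the figure that estimates real-world performance; the training accuracy is printed only for comparison.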
Discussion
What are the useful metrics of performance for some of the tasks you identified above? Are they easy to capture or express mathematically?
Challenge
Assume we are given the task of building a system to distinguish healthy crops from unhealthy crops. What is in an unhealthy crop that lets us know that it is unhealthy? How can the computer detect an unhealthy crop through image analysis? What would we like the computer to do if it detects an unhealthy crop?
Write the phrase “data school” ten times on a piece of paper, and ask a friend to do the same. Analysing these twenty images, try to find features (types of strokes, curvatures, loops, how you make dots, and so on) that discriminate your handwriting from your friend's.
In estimating the price of a used car, it makes more sense to estimate the percent depreciation over the original price than to estimate the absolute price. Why?
References
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. Retrieved from https://www.deeplearningbook.org
Key Points
There are a number of machine learning algorithms available; which one you use depends on the type of data you have, the problem you are trying to solve, and your definition of ‘what is good’.
There is typically more than one way to solve a problem; the best choice usually depends on how you frame what you are doing.