This lesson is still being designed and assembled (Pre-Alpha version)

Machine Learning: Additional Resources

Key Points

Introduction
  • Machine learning is the study of algorithms that learn patterns from data.

  • Machine learning is ubiquitous in the modern world.

Problem Set-up: Classifying Candy
  • The candies are mixed up together, and we want to use machine learning to sort them into their classes.

Incorporating Ethics into your Machine Learning Project
  • All machine learning projects should take ethical considerations into account during the planning stages and throughout the completion of the project.

  • Not all machine learning projects are ethical: sometimes, the right choice is to abandon a project.

  • Projects might be used for data sets or applications that the developers never intended them for. It is important to envision potential uses that might turn out to be harmful.

Feature Engineering
  • In order to use a computer for classification, we need to summarize the information our eyes see into a few meaningful numbers that the computer can parse.

  • For the current problem of classifying candy, there are a number of features related to the appearance that may be useful.
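
As a rough illustration of summarizing what our eyes see into a few numbers, the sketch below reduces a list of RGB pixels to average colour values and an overall brightness. The function name `extract_features`, the feature names, and the pixel values are hypothetical and chosen only for this example; the lesson itself may use different features.

```python
from statistics import mean


def extract_features(pixels):
    """Summarize a list of (red, green, blue) pixels into a few numbers."""
    reds = [p[0] for p in pixels]
    greens = [p[1] for p in pixels]
    blues = [p[2] for p in pixels]
    return {
        "mean_red": mean(reds),
        "mean_green": mean(greens),
        "mean_blue": mean(blues),
        # Average over all channels as a crude brightness measure.
        "brightness": mean(reds + greens + blues),
    }


# Hypothetical 2x2 image of a mostly red candy, flattened to a pixel list.
candy_pixels = [(200, 30, 40), (220, 20, 30), (210, 40, 50), (190, 30, 40)]
features = extract_features(candy_pixels)
print(features)
```

Each candy is now described by four numbers instead of a full image, which is the form a classifier can work with.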

Decision Boundaries
  • Not all features have the same relevance to classification. Some might separate all classes well, others only a subset, and some might not be helpful for separating out classes at all.

  • A decision boundary separates two or more classes from one another. The simplest decision boundaries are straight, but it is possible to draw very complicated decision boundaries.
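
A straight decision boundary in two features can be written as a line, with points on one side assigned to one class and points on the other side to the other class. The following is a minimal sketch: the weights, the offset, and the class labels are assumptions made up for illustration, not values from the lesson.

```python
def side_of_line(x1, x2, w1=1.0, w2=1.0, b=-10.0):
    """Signed score of a point relative to the line w1*x1 + w2*x2 + b = 0."""
    return w1 * x1 + w2 * x2 + b


def classify(x1, x2):
    """Assign a class depending on which side of the line the point falls."""
    return "class A" if side_of_line(x1, x2) > 0 else "class B"


# Two hypothetical candies described by two features each.
print(classify(8, 7))  # score is 8 + 7 - 10 = 5, so "class A"
print(classify(2, 3))  # score is 2 + 3 - 10 = -5, so "class B"
```

More complicated boundaries replace the straight line with a curved or piecewise rule, but the idea of assigning a class by region stays the same.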

K-Nearest Neighbours
Model Evaluation
  • Splitting the data into a training set and a test set, which we hold back until the end of the training phase, helps us evaluate the performance of our machine learning model.

  • The choice of evaluation metrics depends on the problem to be solved using machine learning.

  • Different evaluation metrics optimize for different outcomes, and should be used in different circumstances.

  • Evaluation metrics can be used jointly or be combined to give a fuller picture of performance.
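
The points above can be sketched in plain Python: hold back part of the data as a test set, then compute several metrics on the predictions. The helper names `train_test_split` and `evaluate`, the 25% test fraction, and the example labels are assumptions for illustration only.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold back a fraction as the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


train, test = train_test_split(range(8))

# Hypothetical true labels and predictions for eight test samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In this made-up example the accuracy looks moderate while the recall is low, which shows why looking at several metrics together gives a fuller picture than any single one.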

Logistic Regression
  • Logistic regression is an extension of linear regression that allows for modelling problems where the outcome is either 0 or 1.

  • Logistic regression allows for separation of two different classes via a decision boundary.
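
To make the link between the linear model and the 0/1 outcome concrete, the sketch below passes a linear score through the logistic (sigmoid) function and thresholds the resulting probability. The weight, bias, and 0.5 threshold are assumed values for a single hypothetical feature; a real model would learn the weight and bias from data.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weight=1.5, bias=-3.0):
    """Probability of class 1 for feature value x under the assumed model."""
    return sigmoid(weight * x + bias)


def predict(x, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(x) >= threshold else 0


# With these assumed parameters the decision boundary sits at x = 2,
# where weight * x + bias = 0 and the probability is exactly 0.5.
print(predict_proba(2.0))
print(predict(1.0), predict(3.0))
```

The decision boundary is the set of points where the linear score is zero, so in one dimension it is a single threshold and in two dimensions it is a straight line.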

Cross-Validation
  • Training is not enough: we need to make sure the model generalizes to new data points it hasn’t seen before.

  • Underfitting and overfitting are common problems when applying machine learning models that can be diagnosed with cross-validation.

  • Cross-validation is the process of randomly dividing the data into subsets, and using different combinations of subsets as training and validation sets.
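
The splitting step described above can be sketched as a k-fold scheme: shuffle the sample indices, divide them into k subsets, and let each subset serve once as the validation set while the rest form the training set. The function name `k_fold_indices`, the choice of k = 5, and the sample count are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(10, k=5))
for training, validation in splits:
    print(sorted(training), sorted(validation))
```

A model would be trained and scored once per split. Poor scores on both the training and validation folds suggest underfitting, while good training scores with much worse validation scores suggest overfitting.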

Final Thoughts

Additional Resources

FIXME

Glossary

Terms to include