Module 1: Introduction

Why Machine Learning?

  • Problems Machine Learning Can Solve
  • Knowing Your Task and Knowing Your Data

Why Python?

  • scikit-learn
  • Installing scikit-learn

Essential Libraries and Tools

  • Jupyter Notebook
  • NumPy
  • SciPy
  • matplotlib
  • pandas
  • mglearn

Python 2 Versus Python 3

Versions Used in this Book

A First Application: Classifying Iris Species

  • Meet the Data
  • Measuring Success: Training and Testing Data
  • First Things First: Look at Your Data
  • Building Your First Model: k-Nearest Neighbors
  • Making Predictions
  • Evaluating the Model

 

 

Module 2: Supervised Learning

Classification and Regression

Generalization, Overfitting, and Underfitting

  • Relation of Model Complexity to Dataset Size

Supervised Machine Learning Algorithms

  • Some Sample Datasets
  • k-Nearest Neighbors
  • Linear Models
  • Naive Bayes Classifiers
  • Decision Trees
  • Ensembles of Decision Trees
  • Kernelized Support Vector Machines
  • Neural Networks (Deep Learning)

Uncertainty Estimates from Classifiers

  • The Decision Function
  • Predicting Probabilities
  • Uncertainty in Multiclass Classification

 

 

Module 3: Unsupervised Learning and Preprocessing

Types of Unsupervised Learning

Challenges in Unsupervised Learning

Preprocessing and Scaling

  • Different Kinds of Preprocessing
  • Applying Data Transformations
  • Scaling Training and Test Data the Same Way
  • The Effect of Preprocessing on Supervised Learning

Dimensionality Reduction, Feature Extraction, and Manifold Learning

  • Principal Component Analysis (PCA)
  • Non-Negative Matrix Factorization (NMF)
  • Manifold Learning with t-SNE

Clustering

  • k-Means Clustering
  • Agglomerative Clustering
  • DBSCAN
  • Comparing and Evaluating Clustering Algorithms
  • Summary of Clustering Methods

 

 

Module 4: Representing Data and Engineering Features

Categorical Variables

  • One-Hot-Encoding (Dummy Variables)
  • Numbers Can Encode Categoricals

Binning, Discretization, Linear Models, and Trees

Interactions and Polynomials

Univariate Nonlinear Transformations

Automatic Feature Selection

  • Univariate Statistics
  • Model-Based Feature Selection
  • Iterative Feature Selection

Utilizing Expert Knowledge

 

 

Module 5: Model Evaluation and Improvement

Cross-Validation

  • Cross-Validation in scikit-learn
  • Benefits of Cross-Validation
  • Stratified k-Fold Cross-Validation and Other Strategies

Grid Search

  • Simple Grid Search
  • The Danger of Overfitting the Parameters and the Validation Set
  • Grid Search with Cross-Validation

Evaluation Metrics and Scoring

  • Keep the End Goal in Mind
  • Metrics for Binary Classification
  • Metrics for Multiclass Classification
  • Regression Metrics
  • Using Evaluation Metrics in Model Selection

 

 

Module 6: Algorithm Chains and Pipelines

Parameter Selection with Preprocessing

Building Pipelines

Using Pipelines in Grid Searches

The General Pipeline Interface

  • Convenient Pipeline Creation with make_pipeline
  • Accessing Step Attributes
  • Accessing Attributes in a Grid-Searched Pipeline

Grid-Searching Preprocessing Steps and Model Parameters

Grid-Searching Which Model To Use

 

 

Module 7: Working with Text Data

Types of Data Represented as Strings

Example Application: Sentiment Analysis of Movie Reviews

Representing Text Data as a Bag of Words

  • Applying Bag-of-Words to a Toy Dataset
  • Bag-of-Words for Movie Reviews

Stopwords

Rescaling the Data with tf–idf

Investigating Model Coefficients

Bag-of-Words with More Than One Word (n-Grams)

Advanced Tokenization, Stemming, and Lemmatization

Topic Modeling and Document Clustering

  • Latent Dirichlet Allocation

 

 

Module 8: Wrapping Up

Approaching a Machine Learning Problem

  • Humans in the Loop

From Prototype to Production

Testing Production Systems

Building Your Own Estimator

Where to Go from Here

  • Theory
  • Other Machine Learning Frameworks and Packages
  • Ranking, Recommender Systems, and Other Kinds of Learning
  • Probabilistic Modeling, Inference, and Probabilistic Programming
  • Neural Networks
  • Scaling to Larger Datasets
  • Honing Your Skills