ML Resources
This is a not-particularly-systematic attempt to curate a handful of my favorite resources for learning statistics and machine learning. This isn’t meant to be comprehensive, and in fact is still missing the vast majority of my favorite explainers. Rather, it’s just a smattering of resources I’ve found myself turning to multiple times and thus would like to have in one place. The organization is as follows:

Open Courses and Textbooks: Cover a fairly broad topic reasonably comprehensively, and would take weeks to months to work through from start to finish.

Tutorials, Overviews, and (Individual) Lecture Notes: Explain a specific topic extremely clearly, and take minutes to hours (or a few days tops) to work through from start to finish.

Cheatsheets: Provide structured access to useful bits of information on the order of seconds.
Finally, I’ve added a section with links to a few miscellaneous websites that often produce great content.
Of the above, the second section is both the most incomplete and the one that I am most excited about. I hope to use it to capture the best explanations of tricky topics that I have read online, to make it easier to relearn them later when I inevitably forget. (In a perfect world, Chris Olah and/or distill.pub would just write an article on everything, but in the meantime I have to gather scraps from everywhere else.)
If you stumble upon this list and have suggestions for me to add (especially for the middle section!), please feel free to reach out! But I’m only trying to post things on here that I’ve read, so a suggestion may sit in my to-read list for a while before it makes it on here. Of course, the source for this webpage is on GitHub, so you can also just take it.
Open Courses and Textbooks
I’m trying to limit this list to things that are legally accessible online, for free.
Foundation
File  Description 

Math for ML Book  Math for machine learning book by Deisenroth, Faisal, and Ong, available on github. 
Boyd Applied Linear Algebra  Freely available book from Boyd and Vandenberghe on Applied LA (website). 
Fast.ai Computational Linear Algebra  Rachel Thomas has put together this great online textbook for computational linear algebra with accompanying youtube videos. 
MIT 6.041 Intro Probability  John Tsitsiklis et al. have put together some great resources. Their classic MIT intro to probability has been archived on OCW and also offered on edX (Part 1, Part 2). The textbook is also excellent. 
Joe Blitzstein’s Stat110  Joe Blitzstein’s undergrad probability course has a high overlap in content with 6.041. Like 6.041, it also has a great textbook, youtube videos, and an edX offering. It’s a bit more playful, as well. 
MathematicalMonk  This guy is amazing. Some 250 youtube tutorials on ML, Probability, and Information Theory. What’s great about these playlists is any individual video could go into section 2! 
Tim Roughgarden’s Lectures on Algorithms and Algorithms Illuminated  Tim Roughgarden is one of the most natural teachers I’ve ever seen, and fortunately for the world, he’s decided to make a lot of his algorithms resources public. The first link is to lecture notes in PDF form from many classes; for the data-oriented, his CS 168 course is accessible and amazing. Videos for his Algorithms 2 class (CS 261) are here (pdf notes are in that first link). The second is a link to his page for his new textbook, but that page also has links out to all the youtube videos from his Coursera version of CS 161 (Algorithms 1). 
Foundations of Data Science textbook and videos  This mini-course appears to have developed out of CMU’s “CS Theory for the Information Age” 2012 site, which I think is a better name for this. It’s a strong upper-undergrad or intro-grad math class covering foundations for high-dimensional data algorithms. Another class using the textbook is here. The high-dimensional probability section is cool. 
Statistics
File  Description 

Seeing Theory  This is an online visual textbook that has a bunch of cool interactive displays for intro probability/stats ideas. My favorite is the inference visualizations. 
Russell Poldrack’s Statistical Thinking for the 21st Century  This appears to be a pretty fantastic (albeit rather elementary) textbook for a one-quarter intro to statistics class (Stats 60 at Stanford). Despite assuming little, it touches upon a lot of great topics. 
Doug Sparks’ Stats 200  Nice course notes on Statistical Inference from Doug Sparks’ 2014 offering of Stats 200. 
Modern Statistics for Modern Biology  This online textbook is from Susan Holmes and Wolfgang Huber, and provides a nice and accessible intro to the parts of modern data science relevant to computational biologists. It also happens to be a piece of typographic art, created with bookdown. 
Statistical Rethinking  Lecture Videos on youtube accompany this fantastic introductory textbook. 
Hernan and Robins Causal Inference Book  Long-upcoming textbook on causal inference (from the epidemiology perspective), with drafts fairly frequently updated on the web page. 
Classic Machine Learning
File  Description 

Bishop’s Pattern Recognition and Machine Learning  This is a classic ML text, and has now finally been released (legally) for free online. 
CS 229 Lecture Notes  Classic note set from Andrew Ng’s amazing grad-level intro to ML: CS229. 
ESL and ISL from Hastie et al  Beginner (ISL) and advanced (ESL) presentations of classic machine learning from world-class stats professors. Slides and video for a MOOC on ISL are available here. 
CS 228 PGM Notes  Really great course notes on Probabilistic Graphical Models from Stanford. PDF export wasn’t ideal so linking only to website. 
Blei Foundations of Graphical Models Course  Course notes on Foundations of Graphical Models from David Blei’s 2016 course website. 
Deep Learning
File  Description 

Roger Grosse’s CSC321 Notes  Notes from Roger Grosse’s CSC 321; full website here. Probably the single best intro to DL course I’ve found from any university. Notes and slides are gorgeous. 
Fast.Ai  Wonderful set of intro lectures + notebooks from Jeremy Howard and Rachel Thomas. In addition, Hiromi Suenaga has released excellent and self-contained notes of the whole series with timestamp links back to videos: FastAI DL Part 1, FastAI DL Part 2, and FastAI ML. 
CS231N DL for Vision  Amazing notes from Andrej Karpathy, with lectures on Youtube as well. 
CS224 Deep Learning for NLP 2017  Fantastic course notes on Deep Learning for NLP from Stanford’s CS224. Github repo here 
CMU CS 11-747  Fantastic course on Deep Learning for NLP from CMU’s Graham Neubig. Really great lecture videos on Youtube here. 
CS294-158 Deep Unsupervised Learning  Open course on deep unsupervised learning from Berkeley. Looks fantastic. 
Deep Learning Book  This textbook by Ian Goodfellow, Yoshua Bengio, and Aaron Courville is probably the closest we have to a de facto standard textbook for DL. 
Reinforcement Learning
File  Description 

Sutton and Barto Open RL Book  De facto standard intro to RL, even though the textbook is only now about to be published! 
Berkeley Deep Reinforcement Learning  RL class from Berkeley taught by top dogs in the field, lectures posted to Youtube. 
Optimization
File  Description 

Boyd Convex Optimization Book  Famous and freely available textbook from Boyd and Vandenberghe, accompanied by slides and Youtube videos. More advanced follow-up class here. 
NYU Optimization-based Data Analysis 2016 and 2017  Fantastic course notes on optimization-based data analysis from NYU 2016 website and 2017 website. 
Tutorials, Overviews, and (Individual) Lecture Notes
This section is fledgling at best, but was my real motivation in making this page. Archetypes include basically anything on distill.pub, good blog or Medium posts, etc. Depth-first learning looks like a great access point here, but I haven’t gotten to do more than skim any of those yet.
Fundamentals
File  Description 

CS 229 Linear Algebra Notes  Linear algebra reference from Stanford’s Machine Learning Course. 
Matrix Calc for DL (pdf here)  Really nice overview of matrix calculus for deep learning from Parr/Howard. Citable on arXiv. 
Probability and Statistics
File  Description 

Common statistical tests are linear models (or: how to teach stats)  This little blog post does an incredible job explaining how a whole bunch of common statistical tests can be intuitively unified under a single framework (linear models) 
Seeing Theory Frequentist Inference  This is a really beautiful visual presentation of the basic ideas of frequentist inference, from the Seeing Theory textbook. I love it. 
Hernan Selection Bias  Nice summary of selection bias via DAGs by Hernan et al. 
Classic Machine Learning/Data Science NOS
File  Description 

Roughgarden SVD Notes  Really great presentation of SVD from Tim Roughgarden’s CS168 at Stanford. 
Roughgarden PCA Notes  Really great presentation of PCA from Tim Roughgarden’s CS168 at Stanford. 
Bayesian Machine Learning
File  Description 

Blei Exponential Families/Variational Inference  A couple of the course notes I particularly like from Blei’s 2011 Probabilistic Modeling Course. 
Blei Variational Inference Review  Overview on Variational Inference from David Blei available on arxiv 
Deep Learning
File  Description 

Karpathy’s Recipe for Training NNs  A great blog post that contains a bunch of little tricks for training deep neural networks 
Adversarial Examples/Robust ML Part 1, Part 2, and Part 3  The Madry lab is one of the top research groups in robust deep learning research. They put together a fantastic intro to these topics on their blog. I hope they keep making posts… 
Distill Attention  Amazingly clear presentation of the attention mechanism and its (early) variants 
Distill Building Interpretability  Coolest visualizations of NN internals I’ve ever seen 
Distill Feature Visualization  Running theme: If it’s on distill.pub, read it. 
Chris Olah Understanding LSTMs  Chris Olah is a master of his craft, and here offers a fantastic overview of LSTMs and GRUs. 
Intro to Federated Learning  Intro to federated learning and PySyft from Andrew Trask and others using PyTorch. 
Natural Language Processing
File  Description 

Chris Olah on Word Embeddings  Chris Olah explaining word embeddings and the like. 
The Illustrated Transformer  Nice visualization of how transformer networks work by Jay Alammar. 
The Annotated Transformer  Harvard’s Sasha Rush created a line-by-line annotation of “Attention is All You Need” that also serves as a working notebook. Pedagogical brilliance, and it would be awesome to do this for a couple papers per year. 
Goldberg’s Primer on NNs for NLP  Overview of Deep Learning for NLP from Yoav Goldberg downloaded from here. 
Neubig’s Tutorial on NNs for NLP  Overview of Deep Learning for NLP from Graham Neubig. Downloaded from arxiv and pairs nicely with his course and videos. 
Reinforcement Learning
File  Description 

Karpathy’s Pong From Pixels  Andrej Karpathy has a real gift for didactics. This is a self-contained explanation of deep reinforcement learning sufficient to understand a basic Atari agent. 
Weng’s A (Long) Peek into RL  A nice blog post covering the foundations of reinforcement learning 
OpenAI’s Intro to RL  The introductory tutorial for OpenAI’s new “Spinning Up in Deep RL” website. 
Information Theory
File  Description 

Chris Olah Visual Information Theory  As always, Chris Olah creates an amazing presentation both in words and images. Goal is to visualize key information theory concepts. 
Cover and Thomas Ch2  Entropy and Information  The extremely well-written introductory chapter from the classic information theory textbook. 
Cover and Thomas Ch11  Info Theory and Statistics  The information theory and statistics chapter from the classic information theory textbook. 
Deriving Probability Distributions from Maximum Entropy Principle  It feels slimy and self-serving to include this, but I wrote this post to better understand how information theory can be used to understand/derive common probability distributions from first principles. 
Deriving the information entropy of the multivariate Gaussian  Another blog post I wrote to try to understand information theory + statistics. 
Optimization
File  Description 

Ruder Gradient Descent Overview (PDF here)  Great overview of gradient descent algorithms. 
Bottou Large-Scale Optimization  Notes on Optimization from Bottou, Curtis, and Nocedal. Downloaded from arxiv. 
Cheatsheets
Math
File  Description 

Probability Cheatsheet  Probability cheat sheet, from William Chen’s github 
Statistical Tests as Linear Models Cheatsheet  Cheat sheet for unifying statistical tests under the linear models framework. From Jonas Kristoffer Lindeløv’s github. 
CS 229 TA Cheatsheet 2018  TA cheatsheet from the 2018 offering of Stanford’s Machine Learning Course, Github repo here. 
CS Theory Cheatsheet  CS theory cheat sheet, originally accessed here 
Programming
File  Description 

R dplyr cheatsheet  Cheatsheet for Hadley’s amazing data wrangling package, dplyr. One of many from RStudio 
R ggplot2 cheatsheet  Cheatsheet for Hadley’s amazing plotting package, ggplot2. One of many from RStudio 
SQL Joins cheatsheet  Graphical description of classic SQL joins w/ toy code 
Python pandas cheatsheet  Cheatsheet for python’s data wrangling package, pandas. Downloaded from here 
Python numpy cheatsheet  Cheatsheet for python’s numerical package, numpy. Downloaded from Datacamp 
Python keras cheatsheet  Cheatsheet for python’s NN package, keras. Downloaded from Datacamp. 
Python scikit-learn cheatsheet  Cheatsheet for python’s ML package, scikit-learn. Downloaded from Datacamp. 
Python seaborn tutorial  Tutorial for python’s plotting system, seaborn. Haven’t found a great one yet for matplotlib. 
Graphic Design cheatsheet  Cute little graphic design cheatsheet downloaded from here 
Miscellaneous websites
File  Description 

Chris Olah’s Blog  Essentially everything on here is gold. I am so grateful for the hours he must put into these posts. 
distill.pub  Distill navigates a really interesting gap between superblog and research journal. I wish that we had more publications like this. 
Pytorch Tutorials  The tutorials put out by the pytorch developers are really fantastic. Easy to see why the community is growing so fast. 
Sebastian Ruder’s blog  Sebastian has produced a lot of really great explanations, like the one on gradient descent methods I linked to above. He also maintains a website tracking progress on NLP benchmarks 
Lilian Weng’s Blog  Great blog on RL, meta-learning, and other topics. 
ShortScience  This website contains public summaries/discussions of machine learning, CS, and biology papers. 
Berkeley AI Research (BAIR) Blog  BAIR produces a lot of great research, and uses this blog to release more accessible presentations of their papers. 
Off the Convex Path  Nice blog on machine learning and optimization. 
Ferenc Huszár’s blog  Pretty popular blog that has a lot of explorations/musings on ML from an author with a rigorous mathematical perspective 
Thibaut Lienart’s Blog  This website has some notes on math and optimization that seem interesting. 
Causal Inference Papers  Nice Github repo that compiles a bunch of Arxiv papers on Causal Machine Learning 