LOCEN Resources Machine Learning

The key textbook on AI

Russell, Stuart J. and Norvig, Peter (2016).  Artificial Intelligence: A Modern Approach, Third edition ("Global edition" for Europe). Harlow, UK: Pearson.

Our comments:

  • It is considered the bible of AI
  • It gives some founding concepts of cognitive science (e.g., see its very good chapter 2) and AI (logical/symbolic planning), and some pointers to machine learning
  • It gives good coverage of all AI topics, so it is a good way to get a broad overview of the different fields (but it does not cover the recent deep-network literature at all; for that you need other books, e.g. see below)
  • It is relatively easy to understand

Books on machine learning


A very good review of the following book also gives a good overview of the most widely used books in the field:

- Kevin Murphy's Machine Learning: A Probabilistic Perspective (MLAPP)

Here is what the review says.

Similar textbooks on statistical/probabilistic machine learning (links to book websites, not Amazon pages):

- Barber's Bayesian Reasoning and Machine Learning ("BRML", Cambridge University Press 2012) (pdf available)

- Koller and Friedman's Probabilistic Graphical Models ("PGM", MIT Press 2009)

- Bishop's Pattern Recognition and Machine Learning ("PRML", Springer 2006) (pdf available)

- MacKay's Information Theory, Inference and Learning Algorithms ("ITILA", CUP 2003) (pdf available)

- Hastie, Tibshirani and Friedman's Elements of Statistical Learning ("ESL", Springer 2009) (pdf available)

* Perspective: My perspective is that of a machine learning researcher and student, who has used these books for reference and study, but not as classroom textbooks.

* Audience/prerequisites: they are comparable among all the textbooks mentioned. BRML has lower expected commitment and specialization, PGM requires more scrupulous reading. The books differ in their topics and disciplinary approach, some more statistical (ESL), some more Bayesian (PRML, ITILA), some focused on graphical models (PGM, BRML). K Murphy compares MLAPP to others here. For detailed coverage comparison, read the table of contents on the book websites.

* Main strength: MLAPP stands out for covering more advanced and current research topics: there is a full chapter on Latent Dirichlet Allocation, learning to rank, L1 regularization, deep networks; in the basics, the decision theory part is quite thorough (e.g. it mentions Jeffreys/uninformative priors). The book is "open" and vivid, and doesn't shy away from current research and advanced concepts. This seems to be purposeful, as it shows in many aspects:

- quotes liberally from web sources, something usually not done in academic publications

- borrows "the best" from other authors (always with permission and acknowledgment, of course): most importantly the best pictures and diagrams, but also tables, recaps, insightful diagrams. Whereas other books will produce their own pictures and diagrams themselves (eg, PRML has a distinctive clarity and style in its illustrations), MLAPP takes many of its colour illustrations from other people's publications; therefore it can select the most pithy and relevant pictures to make a point. You could think that reproductions may be illegible and require extra effort to interpret because they come from a variety of sources; I have found that the bonus coming from having precisely the right image prevails.

- frequent references to the literature, mentions of extensions and open questions, as well as computational complexity considerations: for instance, the section on HMMs will mention duration modeling and variable-duration Markov models, and a comparison of the expressive power of hierarchical HMMs versus stochastic context-free grammars, complete with relevant citations, and a brief mention of the computational complexity results from the publications. All this connects the material with research and new ideas in a fine way -- which other textbooks don't achieve, I find. For instance, PGM defers references to a literature section at the end of each chapter, resulting in a more self-contained, but more poorly "linked" text.

* Didactic aids: Another distinctive feature is that the author clearly has tried to include didactic aids gathered over the years, such as recaps, comparative tables, diagrams, much in the spirit of the "generative model of generative models" (Roweis and Ghahramani): e.g. table comparing all models discussed, pros and cons of generative vs. discriminative models, recap of operations on HMMs (smoothing, filtering etc), list of parameter estimation methods for CRFs.

* Editorial features: Other editorial features worth mentioning are

- compared to others, helpful mentions of terminology, e.g. jargon, nomenclature, concept names, in bold throughout the text ("you could also devise a variant thus; this is called so-and-so")

- mathematical notation relatively clear and consistent, occasional obscurities. PGM stands out as excruciatingly precise on this aspect.

- boxes/layout: no "skill boxes" or "case study boxes" (PGM), not many roadmap/difficulty indications like ITILA or PGM, examples are present but woven into the text (not separated like PGM or BRML). Layout rather plain and homogeneous, much like PRML.

- sadly lacks list of figures and tables, but has index of code

* Complete accompanying material:

- interesting exercises (yet fewer than PRML, BRML, PGM); solutions, however, are only accessible to instructors (same with BRML, PGM), which in my experience makes them only half as useful for the self-learner. PRML and ITILA have some solutions online and in the book, respectively.

- accompanying Matlab/Octave source code, which I found more readily usable than BRML's. PGM and PRML have no accompanying source code, even though the toolkit distributed with Koller's online PGM class might qualify as one. I find accompanying code a truly useful tool for learning; there's nothing like trying to implement an algorithm, checking your implementation against a reference, having boilerplate/utility code for the parts of the algorithm you're not interested in re-implementing. Also, code may clarify an algorithm, even when presented in pseudo-code. By the way, MLAPP has rather few pseudo-code boxes (like BRML or PRML, while PGM is very good here).

- MLAPP is not freely available as a PDF (unlike BRML, closest topic-wise, ESL, or ITILA). This will no doubt reduce its diffusion. My own take on the underlying controversy is in favor of distributing the PDF: it makes successful books widely popular and cited (think ITILA or Rasmussen and Williams' Gaussian Processes), increases the book's overall value, and equips readers with a weightless copy to annotate with e-ink or consult on the go. I believe PDF versions positively impact sales, too: the impact is neutral to positive on course textbook/university library sales, indifferent on sales in countries with widely different purchasing power, and positive on all other segments due to enormous diffusion/popularity.

* Conclusion:

The closest contender to this book I believe is BRML. Both are excellent textbooks and have accompanying source code.

BRML is more accessible, has a free PDF version, and a stronger focus on graphical models.

MLAPP has all the qualities of an excellent graduate textbook (unified presentation, valuable learning aids), and yet is unafraid of discussing detail points (e.g. omnipresent results on complexity), as well as advanced and research topics (LDA, L1 regularization).


Online courses on machine learning and artificial intelligence

The classic course of Andrew Ng on Coursera 

This Coursera course, "Machine learning", by Andrew Ng of Stanford University, is a great starting point for entering machine learning. It is a classic and one of the best courses available; it is based on video lectures and benefits from Ng's great teaching abilities. It includes Matlab/Octave exercises, graded automatically by the course platform, but this requires enrolling in one of the periodically offered sessions and 6-8 hours of study per week for about 8 weeks. Enrolling allows one to obtain a certificate at the end of the course (the official one requires a payment).

Andrew Ng, Machine learning.


Machine Learning at Caltech

Topic-by-topic video library on machine learning (free) based on the popular online course. The library contains some interesting views on the biological plausibility of neural networks, Bayesian learning, error measures, and deterministic noise. Link:


Artificial Intelligence at Stanford

A bold experiment in distributed education, "Introduction to Artificial Intelligence" will be offered free and online to students worldwide from October 10th to December 18th 2011. The course will include feedback on progress and a statement of accomplishment. Taught by Sebastian Thrun and Peter Norvig, the curriculum draws from that used in Stanford's introductory Artificial Intelligence course. The instructors will offer similar materials, assignments, and exams.
Artificial Intelligence is the science of making computer software that reasons about the world around it. Humanoid robots, Google Goggles, self-driving cars, even software that suggests music you might like to hear are all examples of AI. In this class, you will learn how to create this software from two of the leaders in the field. Class begins October 10. Link:


Stanford course

Stanford’s Data Mining and Applications Certificate: 


Computational Cognition Cheat Sheets from the University of Rochester

Over the years, our lab has written several notes (referred to as "Computational Cognition Cheat Sheets") providing brief introductions to computational methods that are often useful in the study of human cognition. Many students and faculty have told us that these notes are extremely useful, both for self-study and classroom teaching. These notes are available from the following web page:


There are currently 22 notes available on the following topics:
Backpropagation Algorithm, Bayesian Estimation, Bayesian Inference: Gibbs Sampling, Bayesian Inference: Metropolis-Hastings Sampling, Bayesian Inference: Particle Filtering, Bayesian Statistics: Beta-Binomial Model, Bayesian Statistics: Dirichlet Processes, Bayesian Statistics: Indian Buffet Process, Bayesian Statistics: Normal-Normal Model, Conditional Independence Dependency-Separation and Bayesian Networks, Factor Analysis, Hidden Markov Models, K-Means Algorithm for Clustering, Maximum Likelihood Estimation, Mixture Models, Mixtures-of-Experts, Optimal Linear Cue Combination, Principal Components Analysis, Principal Components Analysis and Unsupervised Hebbian Learning, Reinforcement Learning: Model-based, Reinforcement Learning: Model-free, Sensory Integration and Kalman Filtering.

Master in computer vision and artificial intelligence, from Universitat Autònoma de Barcelona

Good slides on computer vision:


Simplified online AI tool for building neural-network systems using a graphical interface


Tools for speech recognition interfaces


Specific machine learning algorithms of interest

Random forest

One of the most powerful, fast and simple classifiers, and among those most often used in industry; it is suitable for problems with thousands of features. See a brief introduction to it here (from the very good documentation of scikit-learn):
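
To give a feeling for how little code a random forest needs, here is a minimal sketch. It assumes scikit-learn is installed and uses a synthetic dataset with 1000 features; the function name train_random_forest, the dataset and all parameter values are illustrative choices, not taken from the linked introduction.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_random_forest(n_samples=500, n_features=1000, seed=0):
    """Fit a random forest on synthetic data; return (model, test accuracy)."""
    # Synthetic binary classification problem (illustrative assumption):
    # many features, only a few of which carry information about the class.
    X, y = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_informative=20,
        n_redundant=0,
        random_state=seed,
    )
    # Hold out a quarter of the samples to estimate accuracy.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # Each tree is trained on a bootstrap sample and considers a random
    # subset of the features at every split.
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    model, accuracy = train_random_forest()
    print(f"test accuracy: {accuracy:.2f}")
```

After fitting, model.feature_importances_ holds one score per feature, which can help to see which of the many features the forest actually relies on.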



An algorithm to search for manifolds within high-dimensional continuous spaces (from the very good documentation of scikit-learn):
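
As one illustration of searching for a manifold, the sketch below unrolls a 2-D surface embedded in 3-D space. It assumes scikit-learn is installed and uses Isomap on the synthetic "swiss roll" dataset; Isomap is picked here only as an example of this family of methods and is not necessarily the algorithm meant above, and the function name embed_swiss_roll and the parameter values are illustrative.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap


def embed_swiss_roll(n_samples=1000, n_neighbors=12, seed=0):
    """Return (3-D points, 2-D Isomap embedding) for a synthetic swiss roll."""
    # Points lying on a rolled-up 2-D sheet in 3-D space; the second
    # return value (position along the roll) is not needed here.
    X, _ = make_swiss_roll(n_samples=n_samples, random_state=seed)
    # Isomap builds a nearest-neighbour graph, approximates distances
    # along the manifold by shortest paths in that graph, and then finds
    # 2-D coordinates that preserve those distances.
    embedding = Isomap(n_neighbors=n_neighbors, n_components=2).fit_transform(X)
    return X, embedding


if __name__ == "__main__":
    X, embedding = embed_swiss_roll()
    print(X.shape, "->", embedding.shape)
```

The number of neighbours is the main parameter to tune: too few and the neighbour graph may fall apart, too many and the graph can shortcut across layers of the roll.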