*Useful data science resources and recommended study routes. Updated occasionally.*

Title | Author | Thoughts | Level |
---|---|---|---|
Coursera: Machine Learning | Andrew Ng | General overview, not much detail. Labs are in MATLAB, which is not desirable. | Introductory |
Stanford Statistical Learning | Trevor Hastie, Robert Tibshirani | Online lectures following the text *An Introduction to Statistical Learning*; example R modules; minimal self-evaluations | Introductory |
Stanford CS229 Machine Learning | Andrew Ng, John Duchi | A broad, technical overview of machine learning; written problem sets on ML theory | Average |
Coursera: Neural Networks for Machine Learning | Geoffrey Hinton | Wide overview of several neural network models, including non-standard ones such as Hopfield nets and Restricted Boltzmann Machines; labs are in MATLAB, which is not desirable. | Average |
Stanford CS231n Convolutional Neural Networks for Visual Recognition | Fei-Fei Li, Andrej Karpathy, Justin Johnson | Well-written online modules; video lectures on YouTube; completed the assignments, in which you write neural network architectures in Python; strongly recommend the modules + assignments for understanding NNs, CNNs and RNNs | Average - Advanced |

All course content should be available for free. The paid Coursera certification is not really important.

Title | Author | Thoughts | Level |
---|---|---|---|
An Introduction to Statistical Learning | Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani | Good introductory machine learning book for those with a statistical background; includes R modules | Introductory |
Mining of Massive Datasets | Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman | Practical knowledge about data mining and machine learning, with real-life applications | Average |
The Elements of Statistical Learning | Trevor Hastie, Robert Tibshirani, Jerome Friedman | Advanced version of *An Introduction to Statistical Learning*; includes R modules | Advanced |
Pattern Recognition and Machine Learning | Christopher M. Bishop | Have not read in detail | - |

Title | Author | Thoughts | Level |
---|---|---|---|
HackOn(Data) Workshop Material | Armando Benitez | Great notebooks for learning Apache Spark on Databricks and machine learning with Spark; adapted from the edX Spark labs | Introductory - Average |
TensorFlow Tutorial and Examples for Beginners | Aymeric Damien | Well-constructed Jupyter notebooks for learning TensorFlow | Introductory |

Make sure you have sufficient theoretical background in statistics, linear algebra and multivariable calculus. Most university students should be adequately prepared after second-year classes in these subjects.
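As a rough self-check, the derivation below touches all three subjects at once: it minimizes the least-squares loss of linear regression. The notation ($X$ an $n \times p$ design matrix, $y$ the response vector, $\beta$ the coefficient vector) is chosen here for illustration, and the last step assumes $X^\top X$ is invertible.

```latex
\begin{aligned}
L(\beta) &= \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 = \tfrac{1}{2}\,(y - X\beta)^\top (y - X\beta) \\
\nabla_\beta L(\beta) &= -X^\top (y - X\beta) = X^\top X\beta - X^\top y \\
\nabla_\beta L(\hat\beta) = 0 \;&\Longrightarrow\; X^\top X\hat\beta = X^\top y \;\Longrightarrow\; \hat\beta = (X^\top X)^{-1} X^\top y
\end{aligned}
```

If each line follows comfortably, the background is probably sufficient for the courses below.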

Acquire a basic background in Python, including the following libraries: *NumPy, Matplotlib, SciPy, pandas*. There are many resources available online. I particularly like this one for *NumPy, Matplotlib, SciPy*.
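A short warm-up exercising two of these libraries might look like the sketch below. It assumes NumPy and pandas are installed; the synthetic data and the column names are made up for illustration, and Matplotlib and SciPy are left out to keep it short.

```python
import numpy as np
import pandas as pd

# Synthetic data: 100 hypothetical observations with a linear trend plus noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0.0, 10.0, size=100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=100)

# NumPy: vectorized arithmetic and a least-squares line fit.
# np.polyfit returns coefficients from the highest power down: [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)

# pandas: wrap the arrays in a labeled DataFrame, bin x, and aggregate y per bin.
df = pd.DataFrame({"x": x, "y": y})
df["x_bin"] = pd.cut(df["x"], bins=[0, 5, 10], labels=["low", "high"], include_lowest=True)
summary = df.groupby("x_bin", observed=True)["y"].mean()

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(summary)
```

The fitted slope and intercept should land close to the 3 and 2 used to generate the data.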

It is also useful to know R and Scala (for Apache Spark).

Start off with the canonical *Coursera Machine Learning* course by Andrew Ng. It gives a high-level overview of machine learning that is not too technical. You can stop the course once you feel you have developed sufficient intuition for machine learning.
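One of the first algorithms the course covers is linear regression trained by gradient descent. Below is a minimal NumPy sketch of batch gradient descent on the mean-squared-error cost, written in Python rather than the course's MATLAB; the leading column of ones for the intercept and the learning rate `alpha` are illustrative choices.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent for linear regression.

    Minimizes J(theta) = (1 / (2m)) * ||X @ theta - y||^2, where X already
    contains a leading column of ones for the intercept term.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        residuals = X @ theta - y          # prediction errors, shape (m,)
        gradient = (X.T @ residuals) / m   # dJ/dtheta, shape (n,)
        theta -= alpha * gradient          # step against the gradient
    return theta

# Noise-free toy data from y = 4 + 2x, so the exact answer is theta = [4, 2].
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 4.0 + 2.0 * x

theta = gradient_descent(X, y, alpha=0.5, num_iters=5000)
print(theta)
```

Too large an `alpha` makes the iterates diverge and too small a value makes convergence slow, which is the kind of intuition the course aims to build.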

If you have a statistical background, opt for the *Stanford Statistical Learning* course and study *An Introduction to Statistical Learning*. Otherwise, read the lecture notes for *Stanford CS229 Machine Learning* for a more technical introduction to machine learning.
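A recurring idea in *An Introduction to Statistical Learning* is estimating test error with k-fold cross-validation. The sketch below uses it to choose the degree of a polynomial least-squares fit; it is written with NumPy rather than the book's R, and the synthetic data, candidate degrees and `k=5` are arbitrary choices for illustration.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Estimate the test MSE of a polynomial fit of the given degree via k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # shuffled, near-equal folds
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coefs = np.polyfit(x[train_idx], y[train_idx], deg=degree)  # fit on k-1 folds
        preds = np.polyval(coefs, x[val_idx])                       # score on held-out fold
        errors.append(np.mean((y[val_idx] - preds) ** 2))
    return float(np.mean(errors))

# Synthetic data from a quadratic with Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3.0, 3.0, size=200)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0.0, 1.0, size=200)

cv_errors = {d: kfold_mse(x, y, d) for d in range(1, 6)}
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_degree)
```

The linear fit should show a clearly higher CV error than the quadratic one; higher degrees mostly add variance without reducing bias.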

For a general theoretical overview of neural networks, complete the *Coursera Neural Networks for Machine Learning* course by Geoffrey Hinton.
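One of the first models the course covers is the perceptron. The sketch below implements the classic perceptron learning rule in NumPy, with labels in {-1, +1} and the bias folded into the weights as an extra input fixed at 1; the logical AND toy problem is chosen here only because it is linearly separable.

```python
import numpy as np

def train_perceptron(X, y, max_epochs=100):
    """Perceptron learning rule. X: (n, d) inputs, y: labels in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified, or exactly on the boundary
                w += yi * xi        # move the weights toward the correct side
                mistakes += 1
        if mistakes == 0:           # a full pass with no errors: converged
            break
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w > 0, 1, -1)

# Logical AND: only (1, 1) is positive, so the classes are linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])

w = train_perceptron(X, y)
print(w, predict(w, X))
```

On separable data like this the rule is guaranteed to converge; on non-separable data (e.g. XOR) it never stops making mistakes, which motivates the multi-layer networks covered later.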

For a deeper and more technical understanding of neural networks, read the modules for *Stanford CS231n Convolutional Neural Networks for Visual Recognition* and complete the assignments. Completing the assignments is important, since that is where you actually write neural network layers yourself.
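To give a flavour of that work: a layer is typically written as a forward function that caches what it needs and a backward function that turns the upstream gradient into gradients for its inputs and parameters, and the analytic gradients are checked against numerical ones. The sketch below does this for a fully connected (affine) layer; the function names and the centered-difference check are illustrative, not the assignments' actual starter code.

```python
import numpy as np

def affine_forward(x, w, b):
    """Fully connected layer: out = x @ w + b, with x (N, D), w (D, M), b (M,)."""
    out = x @ w + b
    cache = (x, w)
    return out, cache

def affine_backward(dout, cache):
    """Map the upstream gradient dL/dout (N, M) to dL/dx, dL/dw, dL/db."""
    x, w = cache
    dx = dout @ w.T        # (N, M) @ (M, D) -> (N, D)
    dw = x.T @ dout        # (D, N) @ (N, M) -> (D, M)
    db = dout.sum(axis=0)  # b was broadcast over the batch, so sum over it
    return dx, dw, db

def numerical_grad(f, param, h=1e-5):
    """Centered-difference estimate of df/dparam for a scalar-valued f()."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + h
        f_plus = f()
        param[idx] = original - h
        f_minus = f()
        param[idx] = original  # restore the entry before moving on
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 5))
b = rng.standard_normal(5)
dout = rng.standard_normal((4, 5))  # arbitrary upstream gradient

def loss():
    # L = sum(out * dout), so dL/dout is exactly dout.
    out, _ = affine_forward(x, w, b)
    return np.sum(out * dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
for name, analytic, param in [("dx", dx, x), ("dw", dw, w), ("db", db, b)]:
    max_err = np.max(np.abs(analytic - numerical_grad(loss, param)))
    print(f"{name}: max abs error {max_err:.2e}")
```

Errors near floating-point rounding indicate that the backward pass is consistent with the forward pass; a large error usually points to a transposed matrix or a missing sum over the batch.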

Afterwards, begin learning computational frameworks for deep learning, such as *TensorFlow* or *Theano* (I recommend TensorFlow), as well as deep learning libraries such as *Keras* and *Caffe*. Then start building your own neural networks, and figure out how to train them on GPUs.
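Much of what these frameworks provide is automatic differentiation: you define the computation, and the framework derives the gradients that the CS231n assignments have you write by hand. The toy reverse-mode sketch below, in plain Python, only illustrates that idea; it supports just scalar addition and multiplication and is not how TensorFlow or Theano are implemented.

```python
class Value:
    """A scalar node in a computational graph that records how it was produced."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        """Reverse-mode pass: accumulate d(self)/d(node) into every ancestor's grad."""
        # Post-order DFS gives a topological order; reversing it guarantees each
        # node is processed only after all nodes that consume it.
        order, visited = [], set()

        def visit(node):
            if id(node) not in visited:
                visited.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += node.grad * local  # chain rule, summed over all paths


# f(x, y) = x * y + 3 * x, so df/dx = y + 3 and df/dy = x.
x, y = Value(2.0), Value(5.0)
f = x * y + 3 * x
f.backward()
print(f.data, x.grad, y.grad)
```

Real frameworks apply the same chain-rule bookkeeping to tensors and many more operations, and can run the resulting graph on a GPU.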

Familiarize yourself with cloud computing services. I recommend beginning with AWS, which offers a free tier. I don't think there is a need to take an entire course on cloud computing, as you will learn a lot by doing. Try launching your own virtual machines and using them to run your models, and try integrating them with AWS's storage services.

Learn the basics of Apache Spark, a distributed computing engine designed for big data. I did this through the *HackOn(Data) Workshops*, but there are plenty of other resources available. Then try launching a Spark cluster in the cloud, either through a managed service such as AWS EMR or Azure HDInsight, or by bootstrapping your own cluster (see my guide).
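The programming model behind Spark's core RDD API is easiest to see on the classic word-count example. The sketch below mimics it in plain Python on a single machine with two hypothetical input lines; in PySpark the same pipeline would be written roughly as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a `SparkContext`, `path` is a placeholder, and Spark distributes each stage across the cluster.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Hypothetical input; in Spark this would come from a distributed file.
lines = [
    "spark is a distributed computing engine",
    "spark runs on a cluster",
]

# flatMap: each line yields zero or more words, and the results are flattened.
words = chain.from_iterable(line.split() for line in lines)

# map: turn each word into a (key, value) pair.
pairs = ((word, 1) for word in words)

# reduceByKey: group pairs by key, then combine each group's values with +.
# (Spark combines values within each partition first, then shuffles the
# partial results so that equal keys meet on the same worker.)
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)
counts = {word: reduce(lambda a, b: a + b, values) for word, values in groups.items()}

print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
```

Because the combining function is associative and commutative, Spark is free to apply it in any order across partitions, which is what makes the computation parallelizable.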

As for the rest, learn as needed.

*Note: Keep in mind that you can only learn so much through reading. Data science is about doing! Try Kaggle competitions, or play around with fun datasets.*