Useful data science resources and recommended study routes. Updated occasionally.
|Stanford Statistical Learning||Trevor Hastie, Robert Tibshirani||
|Stanford CS229 Machine Learning||Andrew Ng, John Duchi||
|Neural Networks for Machine Learning||Geoffrey Hinton||
|Stanford CS231n Convolutional Neural Networks for Visual Recognition||Fei-Fei Li, Andrej Karpathy, Justin Johnson||
||Average - Advanced|
All course content should be available for free. The paid Coursera certification is not really important.
|An Introduction to Statistical Learning||Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani||
|Mining of Massive Datasets||Jure Leskovec, Anand Rajaraman, Jeffrey D. Ullman||
|The Elements of Statistical Learning||Trevor Hastie, Robert Tibshirani, Jerome Friedman||
|Pattern Recognition and Machine Learning||Christopher M. Bishop||
|HackOn(Data) Workshop Material||Armando Benitez||
||Introductory - Average|
|TensorFlow Tutorial and Examples for Beginners||Aymeric Damien||
Make sure you have a sufficient theoretical background in statistics, linear algebra, and multivariable calculus. Most university students should be adequately prepared after second-year classes in these subjects.
Acquire a basic background in Python, including the following libraries: NumPy, Matplotlib, SciPy, and pandas. There are many resources available online. I particularly like this one for NumPy, Matplotlib, and SciPy.
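To give a rough sense of the level of NumPy fluency that is useful before moving on, here is a minimal sketch. The helper names `standardize` and `fit_line` and the synthetic data are made up for illustration; they are not taken from any of the resources listed here.

```python
import numpy as np

def standardize(X):
    """Scale each column of X to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    # Design matrix with a column of ones for the intercept term.
    A = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept

# Synthetic data: a noisy line with slope 2 and intercept 1 (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(50)

slope, intercept = fit_line(x, y)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

If broadcasting in `standardize` and the design matrix in `fit_line` both read naturally, you probably know enough NumPy to follow the courses.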
It is also useful to know R and Scala (for Apache Spark).
Start with the canonical Coursera Machine Learning course by Andrew Ng. It gives a high-level overview of machine learning that is not too technical. You can stop once you feel you have developed sufficient intuition for the subject.
If you have a statistical background, opt for the Stanford Statistical Learning course and study An Introduction to Statistical Learning. Otherwise, read the lecture notes for Stanford CS229 Machine Learning for a more technical introduction to machine learning.
For a general theoretical overview of neural networks, complete the Coursera Neural Networks for Machine Learning course by Geoffrey Hinton.
For a deeper and more technical understanding of neural networks, read the course modules for Stanford CS231n Convolutional Neural Networks for Visual Recognition and complete the assignments. Completing the assignments is important, because in them you actually write neural network layers yourself.
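To make "writing neural network layers" concrete, here is a minimal sketch of a fully connected layer and a ReLU with hand-written forward and backward passes, checked against a numerical gradient. The function names and shapes are my own illustration and are not the actual assignment code.

```python
import numpy as np

def affine_forward(x, w, b):
    """Compute out = x @ w + b for a batch x of shape (N, D)."""
    out = x @ w + b
    cache = (x, w)  # saved for the backward pass
    return out, cache

def affine_backward(dout, cache):
    """Backpropagate an upstream gradient dout of shape (N, M)."""
    x, w = cache
    dx = dout @ w.T        # gradient with respect to the input, shape (N, D)
    dw = x.T @ dout        # gradient with respect to the weights, shape (D, M)
    db = dout.sum(axis=0)  # gradient with respect to the bias, shape (M,)
    return dx, dw, db

def relu_forward(x):
    """Elementwise max(0, x); the input is kept as the cache."""
    return np.maximum(0.0, x), x

def relu_backward(dout, cache):
    """Pass the gradient through only where the input was positive."""
    return dout * (cache > 0)

def numerical_gradient(f, x, h=1e-5):
    """Centered finite differences of the scalar f() with respect to x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f()
        x[idx] = old - h
        f_minus = f()
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# Gradient check on small random inputs (illustrative values only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 2))
b = rng.standard_normal(2)
dout = rng.standard_normal((4, 2))

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# Scalar surrogate loss whose gradient with respect to out is exactly dout.
loss = lambda: np.sum(affine_forward(x, w, b)[0] * dout)
max_err = np.max(np.abs(dw - numerical_gradient(loss, w)))
print("max abs error in dw:", max_err)
```

Comparing analytic and numerical gradients like this is a cheap way to catch mistakes in a backward pass before you stack layers into a full network.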
Afterwards, begin learning computational frameworks for deep learning, such as TensorFlow or Theano (I recommend TensorFlow), as well as deep learning libraries, such as Keras and Caffe. Then start building your own neural networks, and figure out how to train them on GPUs.
Familiarize yourself with cloud computing services. I recommend beginning with AWS, which offers a free tier. I don’t think there is a need to take an entire course on cloud computing, as you will learn a lot by doing. Try launching your own virtual machines and using them to run your models. Try integrating them with the provider's storage services.
Learn the basics of Apache Spark, a distributed computing engine designed for big data. I did this through the HackOn(Data) Workshops, but there are plenty of other resources available. Then, try launching a Spark cluster in the cloud, either through a service like AWS EMR or Azure HDInsight, or by bootstrapping your own cluster (My Guide).
As for the rest, learn as you need.
Note: Keep in mind that you can only learn so much through reading. Data science is about doing! Try Kaggle competitions, or play around with fun datasets.