A guide on how to set up Spark with Jupyter on AWS EC2 instances with S3 I/O support. Presented at Toronto Apache Spark #19.
A solution for determining the optimal placement of location-based information maps throughout Toronto. With Chris Goldsworthy.
The following is work I have done with the University of Toronto Data Science Team (UDST).
We compete in Kaggle competitions.
I use Python multiprocessing to preprocess lung CT images in parallel across all available CPU cores on AWS compute instances.
An exploration of satellite images using AWS S3 and boto3 for the Kaggle DSTL Satellite Imagery Feature Detection challenge.
A list of useful data science resources.
A list of what I’ve read.