Extracted, cleaned, and pre-processed over 13 million records from a remote SQL database. Trained an XGBoost valuation model on AWS EC2.
A guide on how to set up Spark with Jupyter on AWS EC2 instances with S3 I/O support. Presented at Toronto Apache Spark #19.
A solution for determining the optimal placement of location-based information maps throughout Toronto.
The following is work I have done with the University of Toronto Data Science Team (UDST).
I use Python multiprocessing to preprocess lung CT images efficiently on all available CPU cores on AWS compute instances.
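A minimal sketch of the general pattern, not the project's actual code: a worker pool sized to the machine's core count maps a per-slice function over the data. The intensity window (`HU_MIN`, `HU_MAX`), the function names `normalize_slice` and `preprocess_all`, and the stand-in data are all assumptions made for illustration.

```python
import os
from multiprocessing import Pool

# Hypothetical intensity window in Hounsfield units; not taken from the project.
HU_MIN, HU_MAX = -1000.0, 400.0


def normalize_slice(pixels):
    """Clip one slice (a flat list of HU values) to the window and scale to [0, 1]."""
    span = HU_MAX - HU_MIN
    return [(min(max(p, HU_MIN), HU_MAX) - HU_MIN) / span for p in pixels]


def preprocess_all(slices, processes=None):
    """Apply normalize_slice to every slice, using one worker per CPU core by default."""
    processes = processes or os.cpu_count()
    with Pool(processes=processes) as pool:
        # pool.map returns results in the same order as the input slices.
        return pool.map(normalize_slice, slices)


if __name__ == "__main__":
    # Stand-in data; real CT slices would be loaded from disk instead.
    fake_slices = [[-2000.0, -1000.0, -300.0, 400.0, 1200.0] for _ in range(8)]
    results = preprocess_all(fake_slices)
    print(len(results), "slices processed")
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by spawning a fresh interpreter, the module is re-imported in each worker, and an unguarded pool would try to start itself again.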
An exploration of satellite images using AWS S3 and boto3 for the Kaggle DSTL Satellite Imagery Feature Detection challenge.
A list of useful data science resources.
A list of what I’ve read.