Toxic Comment Classification

A multi-label machine learning classifier that detects toxic comments in Wikipedia’s talk page edits.


(Naive Bayes, Logistic Regression, Support Vector Machine, Gradient Boosting)

NYC Taxi Demand Prediction

A data pipeline that automates the extraction of NYC taxi and weather data from AWS S3 into MongoDB, which is connected to Spark for taxi demand prediction.


(AWS S3, EC2, EMR, ETL, Data Pipeline, MongoDB, Spark SQL, Spark ML, PySpark, Linear Regression, Random Forests, Gradient Boosting)

Twitter Sentiment Analysis

Sentiment analysis of automatically fetched tweets, displayed as a color-coded digest list.


(VADER, Tweepy, Jinja2, Flask, Selenium, REST API)

Canadian National Bankruptcy Rate Forecasts

A time series model that forecasts monthly Canadian bankruptcy rates based on data from 2015 to 2017, taking macroeconomic indicators into account.


(R, SARIMA, Holt-Winters, VAR, VARX)

BBC Article Recommendation Engine

A website, hosted on AWS EC2, that recommends the BBC articles most relevant to your selection.


(AWS EC2, word2vec, Flask, HTML, Jinja2, Stanford GloVe, Python)

BART (Bay Area Rapid Transit) Rides Data Manipulation

A data pipeline that cleans and transforms BART rides data from 2001 to 2016 and loads it into a SQL database.


(Python, PostgreSQL)
