PROJECTS
Toxic Comment Classification
A multi-label machine learning classifier that detects toxic comments in Wikipedia talk page edits.
(Naive Bayes, Logistic Regression, Support Vector Machine, Gradient Boosting)
NYC Taxi Demand Prediction
A data pipeline that automates the extraction of NYC taxi and weather data from AWS S3 into MongoDB and connects to Spark to predict taxi demand.
(AWS S3, EC2, EMR, ETL, Data Pipeline, MongoDB, Spark SQL, Spark ML, PySpark, Linear Regression, Random Forests, Gradient Boosting)
Twitter Sentiment Analysis
Sentiment analysis of automatically fetched tweets, displayed as a color-coded digest list.
(VADER, Tweepy, Jinja2, Flask, Selenium, REST API)
Canadian National Bankruptcy Rates Forecasts
A time series model that forecasts monthly Canadian bankruptcy rates from 2015 to 2017 data, taking macroeconomic indicators into account.
(R, SARIMA, Holt-Winters, VAR, VARX)
BBC Article Recommendation Engine
A website, hosted on AWS EC2, that recommends the BBC articles most relevant to your choice.
(AWS EC2, word2vec, Flask, HTML, Jinja2, Stanford GloVe, Python)
BART (Bay Area Rapid Transit) Rides Data Manipulation
A data pipeline that cleans and transforms BART ride data from 2001 to 2016 and loads it into a SQL database.
(Python, PostgreSQL)