

PROJECTS

Toxic Comment Classification
A multi-label machine learning classifier that detects toxic comments in Wikipedia’s talk page edits.
(Naive Bayes, Logistic Regression, Support Vector Machine, Gradient Boosting)

NYC Taxi Demand Prediction
A data pipeline that automates extraction of NYC taxi and weather data from AWS S3 into MongoDB and connects to Spark to predict taxi demand.
(AWS S3, EC2, EMR, ETL, Data Pipeline, MongoDB, Spark SQL, Spark ML, PySpark, Linear Regression, Random Forests, Gradient Boosting)

Twitter Sentiment Analysis
Sentiment analysis of automatically fetched tweets, displayed as a color-coded digest list.
(VADER, Tweepy, Jinja2, Flask, Selenium, REST API)

Canadian National Bankruptcy Rate Forecasting
A time series model forecasting Canadian monthly bankruptcy rates, based on data from 2015 to 2017 and incorporating macroeconomic indicators.
(R, SARIMA, Holt-Winters, VAR, VARX)

BBC Article Recommendation Engine
A website hosted on AWS EC2 that recommends the most relevant BBC articles based on the user’s choice.
(AWS EC2, word2vec, Flask, HTML, Jinja2, Stanford GloVe, Python)

BART (Bay Area Rapid Transit) Rides Data Manipulation
A data pipeline that cleans and transforms BART ridership data from 2001 to 2016 and loads it into a SQL database.
(Python, PostgreSQL)