Projects

A multi-label machine learning classifier that detects toxic comments in Wikipedia talk page edits.

(Naive Bayes, Logistic Regression, Support Vector Machine, Gradient Boosting)

A data pipeline that automates the extraction of NYC taxi and weather data from AWS S3 into MongoDB, connected to Spark for taxi demand prediction.

(AWS S3, EC2, EMR, ETL, Data Pipeline, MongoDB, Spark SQL, Spark ML, PySpark, Linear Regression, Random Forests, Gradient Boosting)

Sentiment analysis of automatically fetched tweets, displayed as a color-coded digest list.

(VADER, Tweepy, Jinja2, Flask, Selenium, REST API)

A time series model that forecasts monthly Canadian bankruptcy rates based on data from 2015 to 2017, taking macroeconomic indicators into account.

(R, SARIMA, Holt-Winters, VAR, VARX)

A website, hosted on AWS EC2, that recommends the BBC articles most relevant to your selection.

(AWS EC2, word2vec, Flask, HTML, Jinja2, Stanford GloVe, Python)

A data pipeline that cleans, transforms, and loads BART ride data from 2001 to 2016 into a SQL database.

(Python, PostgreSQL)