PROJECTS
Toxic Comment Classification

A multi-label machine learning classification project that detects toxic comments in Wikipedia's talk page edits.

(Naive Bayes, Logistic Regression, Support Vector Machine, Gradient Boosting)

NYC Taxi Demands Prediction

A data pipeline that automates the extraction of NYC taxi and weather data from AWS S3 into MongoDB, which is connected to Spark for taxi demand prediction.

(AWS S3, EC2, EMR, ETL, Data Pipeline, MongoDB, Spark SQL, Spark ML, PySpark, Linear Regression, Random Forests, Gradient Boosting)

Twitter Sentiment Analysis

Sentiment analysis of automatically fetched tweets, shown as a digest list in different colors.

(VADER, Tweepy, Jinja2, Flask, Selenium, REST API)

Canadian National Bankruptcy Rates Forecasts

A time series model that forecasts monthly Canadian bankruptcy rates based on data from 2015 to 2017, taking macroeconomic indicators into account.

(R, SARIMA, Holt-Winters, VAR, VARX)

BBC Article Recommendation Engine

A website, hosted on AWS EC2, that recommends the BBC articles most relevant to your choice.

(AWS EC2, word2vec, Flask, HTML, Jinja2, Stanford GloVe, Python)

BART (Bay Area Rapid Transit) Rides Data Manipulation

A data pipeline that loads BART ride data from 2001 to 2016 into a SQL database, with data cleansing and manipulation.

(Python, PostgreSQL)
