Machine Learning Projects
The Ocean’s Memory: Regression Prediction of Ocean Temperature Using Over 60 Years of CalCOFI Environmental Data
Built a leakage-safe regression pipeline on nearly one million CalCOFI oceanographic records, using selected environmental and spatial features to predict Pacific Ocean temperature across over 60 years of observations.
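One common reading of "leakage-safe" is that preprocessing statistics are learned from the training split only and then reused unchanged on held-out data. The sketch below illustrates that idea with the standard library alone. The toy depth and temperature values, the helper names, and the random split are all assumptions made for illustration; they are not taken from the project, which may split by time or by station instead.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it before any statistics are computed."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Learn mean and standard deviation from the given (training) values."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 on zero variance
    return mean, std


def apply_scaler(values, mean, std):
    """Standardize values with statistics that were fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Hypothetical toy data: (depth_m, temperature_c) pairs, not CalCOFI records.
rows = [(float(d), 20.0 - 0.05 * d) for d in range(0, 200, 10)]
train, test = split_train_test(rows)

# Fit on the training split only, then reuse the same statistics on the test split.
depth_mean, depth_std = fit_scaler([d for d, _ in train])
train_depth = apply_scaler([d for d, _ in train], depth_mean, depth_std)
test_depth = apply_scaler([d for d, _ in test], depth_mean, depth_std)
```

Because the test rows never contribute to the fitted mean and standard deviation, evaluation on them is not flattered by information the model would not have at prediction time.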
Animal Classification with Regularized Models and Compact Ensembles
Created a mini-classifier to separate zoo animals into biological groups using features such as hair, feathers, milk, eggs, fins, legs, and aquatic behavior. Regularized models and compact ensemble methods were compared to build a simple, controlled classification pipeline.
Parametric and Non-Parametric Statistical Tests
Built a comprehensive statistical testing workflow covering both parametric and non-parametric methods. The project demonstrates how to choose and apply appropriate tests, such as the t-test, ANOVA, Mann-Whitney U, Kruskal-Wallis, Wilcoxon, chi-square, and Pearson and Spearman correlation, using Python and SciPy.
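The choice between these tests can be sketched as a small decision helper. The function below is a hypothetical illustration, not the project's code: `choose_test` and its parameters are made-up names, and the returned strings are the names of the corresponding `scipy.stats` functions. It assumes the question has already been reduced to three facts: what is being compared, how many groups there are, and whether a normality assumption is reasonable.

```python
def choose_test(kind, groups=2, paired=False, normal=True):
    """Return the name of a scipy.stats function suited to the situation.

    kind:   "difference" (compare group locations),
            "association" (two categorical variables), or
            "correlation" (two numeric variables).
    normal: whether a normality assumption is considered reasonable.
    """
    if kind == "association":
        return "chi2_contingency"
    if kind == "correlation":
        return "pearsonr" if normal else "spearmanr"
    if kind != "difference":
        raise ValueError(f"unknown kind: {kind!r}")
    if groups < 2:
        raise ValueError("need at least two groups to compare")
    if groups > 2:
        if paired:
            # Repeated-measures designs are outside the scope of this sketch.
            raise ValueError("paired designs with more than two groups are not covered")
        return "f_oneway" if normal else "kruskal"
    if paired:
        return "ttest_rel" if normal else "wilcoxon"
    return "ttest_ind" if normal else "mannwhitneyu"
```

For example, `choose_test("difference", groups=3, normal=False)` returns `"kruskal"`, which can then be looked up on `scipy.stats` and called with the three samples.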
Ensemble Learning for Penguin Species Multiclass Classification Using Morphological and Ecological Features
Built a multiclass ML pipeline to classify penguin species using body measurements and ecological features. Compared baseline and ensemble models with cross-validation and tuning to select the best model.
End-to-End Census Income Classification with EDA and Hyperparameter Tuning
Built an end-to-end machine learning pipeline on the Adult Census Income dataset to predict whether an individual earns above or below $50K per year. The project covers data cleaning, EDA, preprocessing, model comparison, hyperparameter tuning, and final evaluation with Gradient Boosting as the best
Cloud-Native ML Deployment Architecture (Docker, Registry, GKE)
Building and deploying a machine learning model to predict diabetes risk using patient health data, packaging the model in a Docker container, exposing it through a FastAPI service, and deploying it to Kubernetes with an automated CI/CD pipeline.
Engineering a Production-Grade LLM Tutor for Structured Mathematics Reasoning
Engineering a production-grade LLM-powered IGCSE mathematics tutoring platform by integrating a React frontend, a Django backend, and PostgreSQL persistence, orchestrating controlled model inference with structured prompt governance, containerizing the system with Docker, and implementing monitoring
Housing Price Regression Analysis: Operationalizing Tabular ML with CI/CD, Docker, Kubernetes, and Observability
Building a regression pipeline to predict housing prices using the California housing dataset, applying preprocessing and feature engineering with scikit-learn, tracking experiments with MLflow, serving predictions through a FastAPI API, containerizing the service with Docker, monitoring it using Pr
Building a RAG Question Answering System Using LLMs and Vector Databases
Developing a retrieval-augmented question answering system over 7,500+ pages of Cambridge IGCSE Mathematics past papers by ingesting and chunking exam PDFs, generating embeddings and indexing them in a vector database, retrieving relevant context through semantic search, and generating step-by-step
Web Scraping and Analysis of Job Market Data in Germany
Collecting and structuring 22K+ job listings from a German job portal using a Python web scraping pipeline, enabling large-scale analysis of sector demand, job distribution, and geographic employment patterns across Germany.
Applied Data Science: Analytics, Visualization, and Machine Learning
Projects exploring data analysis, machine learning, and interactive visualization using R. The work focuses on analyzing film industry trends, streaming platform datasets, and healthcare data to uncover patterns and meaningful insights.
Why MLOps Feels Like a Basketball Court at First (Until the Patterns Appear)
Exploring MLOps through simple analogies drawn from everyday learning experiences. Using the ideas of pattern recognition, a crowded basketball court, and Stephen Covey’s “sharpen the saw” principle, it shows how complex systems become understandable once their structure appears.