
National Internet Observatory, Northeastern University
Data Engineer / DevOps Engineer • MA, USA
Oct 2024 — Present
- Built distributed data pipelines in Python using Marimo notebooks, Prefect, and Dask to migrate 5–10 million records per minute from MongoDB to PostgreSQL, with Pydantic for schema validation and SQLAlchemy 2.0 for ORM-based ingestion.
- Containerized applications with Docker and deployed them to Kubernetes using Helm, Git, and GitLab CI, standardizing CI/CD for distributed pipelines with daily deployments across cluster environments.
- Implemented Kubernetes-based monitoring and observability with Prometheus, Grafana, and PromQL, building metrics-driven dashboards for pipeline health, throughput, failures, resource utilization, and traffic across workloads processing 5B+ records.
- Built a research-facing visualization platform on a DMZ-hosted VM, using Django for authentication and invite-based provisioning, FastAPI/Pydantic for APIs, and Polars for live queries over PostgreSQL, providing secure self-service analytics for 100+ research users.



