Self-hosted AI orchestration infrastructure for a sustained coding workflow, built around one premise: treat LLM inference like a scarce resource, not a convenience.
8-tier cost hierarchy across 12+ providers with ordered fallback chains; context-window exhaustion routes separately from general provider failure - different failure modes, different recovery paths.
14 role-typed agents (Sisyphus, Momus, Oracle, etc.), each with tuned temperature, system prompt constraints, and a model sequence - Momus intentionally runs on the free GLM model since adversarial brainstorming benefits from volume over quality.
Persistent vector memory via Mem0 + Qdrant (1536-dim embeddings) across sessions.
GitHub Actions CI/CD to AWS Lightsail VPS with preflight
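A minimal sketch of the routing idea above: cost-ordered fallback where context-window exhaustion and general provider failure take different recovery paths. The exception types, the `Provider` fields, and `route` are hypothetical illustrations, not the actual implementation. A context overflow skips every provider whose window is too small. Any other error falls through to the next entry in the chain.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical: generic failure (rate limit, outage, timeout)."""


class ContextWindowExceeded(Exception):
    """Hypothetical: the prompt did not fit the model's context window."""


@dataclass
class Provider:
    name: str
    tier: int                  # lower tier = cheaper; tried first
    context_tokens: int        # largest prompt this model accepts
    call: Callable[[str], str]


def route(prompt: str, prompt_tokens: int, chain: Sequence[Provider]) -> str:
    """Try providers cheapest-first, recovering differently per failure mode."""
    min_context = prompt_tokens
    errors: list[str] = []
    for p in sorted(chain, key=lambda p: p.tier):
        if p.context_tokens < min_context:
            # Retrying a window that is too small cannot succeed: skip it.
            continue
        try:
            return p.call(prompt)
        except ContextWindowExceeded:
            # The token estimate was too low; require a strictly larger window.
            min_context = p.context_tokens + 1
            errors.append(f"{p.name}: context")
        except ProviderError:
            # General failure: fall through to the next provider in cost order.
            errors.append(f"{p.name}: error")
    raise RuntimeError("all providers failed: " + ", ".join(errors))
```

In this sketch, a context overflow on a cheap 8k model jumps past other 8k models straight to a larger window. A transient error on the same model would just move to the next tier.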
RAG pipeline over Cornell course syllabi - natural language queries against PDFs, served from an Azure-hosted REST API at ~50ms average latency.
The non-trivial part was retrieval quality - vector similarity and actual relevance don't always align; chunking strategy and reranking required significant iteration to make responses useful.
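The two levers named above, chunking and reranking, can be sketched as follows. The code splits text into fixed-size word windows with overlap, then reorders vector hits by blending cosine similarity with query-term coverage. The window sizes, the `alpha` weight, and the lexical reranker are illustrative assumptions, not the pipeline's actual settings. Production pipelines often rerank with a cross-encoder instead.

```python
import re


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into `size`-word windows, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already reaches the end of the text
    return chunks


def _terms(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def rerank(query: str, hits: list[tuple[str, float]], alpha: float = 0.5) -> list[str]:
    """Reorder (chunk, cosine_similarity) hits by similarity blended with query-term coverage."""
    q = _terms(query)

    def score(hit: tuple[str, float]) -> float:
        chunk, sim = hit
        coverage = len(q & _terms(chunk)) / len(q) if q else 0.0
        return alpha * sim + (1 - alpha) * coverage

    return [chunk for chunk, _ in sorted(hits, key=score, reverse=True)]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. The rerank step lets a chunk that literally answers the query outrank one that is merely topically close.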
Published OSS agent skill for code-review agents deciding whether backward-compatibility code is safe to remove - built because agents routinely approve risky deletions based only on local static search.
5-verdict framework (RETAIN → DEPLOY OBSERVABILITY → DEPRECATE → QUARANTINE → REMOVE) with evidence scored across code, data, clients, configs, runtime telemetry, and owner history.
Core insight: absence of local references ≠ absence of consumers - old mobile clients, persisted rows, delayed jobs, and cached payloads all need separate evidence traces.
Tooling: no-dependency Python scanner, Semgrep rules for compatibility patterns, git-archaeology shell script, and an eval suite for output quality iteration.
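A minimal sketch of how evidence from the six sources might map onto the five verdicts. The three-state `Signal` encoding, the `Evidence` fields, and the decision order are hypothetical illustrations, not the published skill's actual scoring. The property it encodes is the core insight above: a clean local search alone never reaches REMOVE, because unknown evidence from any other source blocks deletion.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Signal(Enum):
    CONSUMER = "consumer"  # evidence that something still depends on the code path
    NONE = "none"          # affirmative evidence that nothing depends on it
    UNKNOWN = "unknown"    # not investigated or no data available


class Verdict(Enum):
    RETAIN = 1
    DEPLOY_OBSERVABILITY = 2
    DEPRECATE = 3
    QUARANTINE = 4
    REMOVE = 5


@dataclass
class Evidence:
    code: Signal
    data: Signal      # persisted rows, cached payloads
    clients: Signal   # old mobile/app versions still in the wild
    configs: Signal
    runtime: Signal   # telemetry: is the path actually hit?
    owners: Signal    # owner and git history


def verdict(ev: Evidence) -> Verdict:
    signals = [getattr(ev, f.name) for f in fields(ev)]
    if ev.runtime is Signal.CONSUMER:
        return Verdict.RETAIN                # live traffic: not safe to touch
    if ev.runtime is Signal.UNKNOWN:
        return Verdict.DEPLOY_OBSERVABILITY  # measure before deciding anything
    if Signal.CONSUMER in signals:
        return Verdict.DEPRECATE             # quiet at runtime, but dependents exist
    if Signal.UNKNOWN in signals:
        return Verdict.QUARANTINE            # no known consumers, evidence incomplete
    return Verdict.REMOVE                    # every source affirmatively clean
```

Under these rules, a result with only a clean local search (code NONE, everything else UNKNOWN) lands on DEPLOY_OBSERVABILITY rather than REMOVE.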
AI-powered infinite canvas - freeform thoughts go in, the backend classifies and connects them automatically in real time.
GPT-4 for auto-categorization and roadmap generation; text-embedding-ada-002 for similarity scoring across 4 edge types (semantic, entity, temporal, causal) rendered in ReactFlow.
Socket.IO pipeline between React frontend and Node/Express backend - processing is async but fast enough that results land before the next thought.
IndexedDB for client-side persistence; no server-side storage by default.
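A minimal sketch of the semantic edge type only: compute pairwise cosine similarity between embedding vectors and emit an edge when it clears a threshold. The 0.8 threshold, the `Edge` shape, and the use of short plain lists in place of real 1536-dimension ada-002 vectors are illustrative assumptions. The entity, temporal, and causal edge types would need their own signals.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str
    weight: float


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_edges(embeddings: dict[str, list[float]], threshold: float = 0.8) -> list[Edge]:
    """Connect every pair of thoughts whose embeddings are similar enough."""
    edges = []
    for (id_a, va), (id_b, vb) in combinations(embeddings.items(), 2):
        sim = cosine(va, vb)
        if sim >= threshold:
            edges.append(Edge(id_a, id_b, "semantic", round(sim, 3)))
    return edges
```

In a real-time flow, each new thought would more likely be compared only against existing nodes (linear work per insert) rather than recomputing all pairs as this batch version does.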
3-service microservices app (client / Express API / ML model) behind Nginx on Kubernetes, with GitHub Actions CI/CD - lint, build, containerize, deploy on merge.
Cross-platform focus timer and task tracker with offline-first sync via MongoDB Realm - built on the premise that the user's main problem is starting, not organizing.
Crowdsourced wifi quality map for Cornell's ~20,000 students - aggregates user-reported location and bandwidth data, clusters it, and surfaces a campus-wide signal map.
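A minimal sketch of the aggregation step, assuming reports arrive as (latitude, longitude, Mbps) tuples. It buckets reports into a fixed grid (0.0005° cells, roughly 55 m of latitude) and keeps the median bandwidth per cell. The grid approach and cell size are stand-ins for whatever clustering the app actually uses. A density-based method such as DBSCAN is a common alternative.

```python
import math
from collections import defaultdict
from statistics import median


def signal_map(
    reports: list[tuple[float, float, float]], cell_deg: float = 0.0005
) -> dict[tuple[int, int], dict[str, float]]:
    """Aggregate (lat, lon, mbps) reports into grid cells with median bandwidth and a count."""
    cells: dict[tuple[int, int], list[float]] = defaultdict(list)
    for lat, lon, mbps in reports:
        # Integer cell index per axis; nearby reports share a key.
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(mbps)
    return {
        key: {"median_mbps": median(vals), "reports": len(vals)}
        for key, vals in cells.items()
    }
```

The median keeps one outlier report (a dead spot or a lucky speed test) from skewing a whole cell. The per-cell count lets the map de-emphasize cells with too few reports.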
Flask backend running OpenCV Haar Cascade face/eye detection on webcam frames, React frontend - built to understand server-side image processing pipelines end to end.