I design the data infrastructure that makes AI systems intelligent: RAG pipelines, vector search, and LLM integrations grounded in reliable, governed data. Five years of finance-domain data engineering. Databricks Certified GenAI Engineer.
LLMs are only as good as the context they receive. I build the data layer that grounds AI agents in accurate, governed, domain-specific knowledge.
Retrieval-Augmented Generation architectures on Databricks: chunking, embedding, retrieval, and re-ranking for finance-domain documents.
Databricks Vector Search & Unity Catalog for governed, low-latency semantic retrieval over large document corpora.
LLM chains and agent orchestration using MLflow, Model Serving, and structured tool-use patterns for data-grounded AI.
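The chunking and retrieval steps of a RAG pipeline can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the Databricks Vector Search API. The bag-of-words `embed` function and cosine ranking are hypothetical stand-ins for a real embedding model and vector index, the default `chunk_size`/`overlap` values are arbitrary, and re-ranking is left out.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase term counts (hypothetical stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query (single-stage ranking, no re-ranker)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "interest rate risk in bond portfolios",
        "quarterly revenue and earnings report",
        "credit risk scoring model",
    ]
    print(retrieve("bond interest rate", docs, k=1))
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks; in a production setup the embedding and similarity search would be delegated to a managed index.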
Beyond client work, I build things for fun: personal sandboxes for exploring new tech, architectures, and ideas.
A local zero-ops orchestrator built on Prefect 3.x to run CLI-based AI agents (Codex, Gemini). Focuses on determinism, "No-API" zero-cost execution, Pydantic validation, and scalable Map-Reduce patterns.
An on-demand AI research framework based on the Medallion architecture (Bronze -> Silver -> Gold). It uses Map-Reduce workflows and Pydantic validation to extract and synthesize structured insights while guarding against hallucination and "lost in the middle" errors.
A modular, code-driven pipeline that transforms raw screencasts and JSON scripts into polished MP4 demo videos. Orchestrates OpenAI TTS, Whisper, Remotion (React), and FFmpeg for synced audio, complex compositing, and agentic QA.
An AI-maintained Personal Knowledge Management (PKM) vault. Claude acts as a disciplined librarian via custom CLI commands to ingest articles, enforce schemas, maintain atomic wiki pages with backlinks, and autonomously lint the knowledge graph.
A specialized AI system prompt and methodology designed to systematically refactor legacy workflows into modern AI-orchestrated pipelines. It enforces Right-Sized Ops, Zero-Ops deterministic evaluations, and rigorous selection of agentic design patterns prior to any code generation.
You are looking at it! Built with semantic HTML, CSS, and vanilla JS. Documented and deployed via GitHub Pages.
Automated processing and visualization of e-cycling race results: Python scripts parse the results and generate graphics.
A fun project exploring cycling event data, built with Streamlit.
E-Sports Championship transparency project.
An ambitious full-stack AI platform (SvelteKit + Python). It "failed due to its size," but it was a massive learning ground for architecture and complex orchestration.
Automated workflow to sync documentation from multiple projects into a central repository.
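The Bronze -> Silver -> Gold Map-Reduce flow described in the research-framework card can be pictured with the minimal sketch below. It rests on several assumptions: `ThreadPoolExecutor` stands in for Prefect task mapping, a frozen dataclass with a hand-written check stands in for a Pydantic model, and `extract_insight` is a hypothetical placeholder for the per-document LLM call. It shows the shape of the pattern (map per document, validate, reduce), not the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    """Silver-layer record; a stdlib stand-in for a Pydantic model."""
    source: str
    topic: str
    summary: str

    def __post_init__(self) -> None:
        # Reject incomplete records instead of passing them downstream.
        if not self.topic.strip() or not self.summary.strip():
            raise ValueError(f"invalid insight from {self.source!r}")


def extract_insight(source: str, raw_text: str) -> Insight:
    """Hypothetical map step: parses 'Topic: summary' where an LLM would emit structured JSON."""
    stripped = raw_text.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    topic, _, summary = first_line.partition(":")
    return Insight(source=source, topic=topic.strip(), summary=summary.strip())


def run_pipeline(bronze: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Bronze (raw text per source) -> Silver (validated Insights) -> Gold (summaries by topic).

    Returns the Gold digest and the list of sources rejected by validation.
    """
    # Map: each document is processed independently, in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {src: pool.submit(extract_insight, src, text) for src, text in bronze.items()}

    silver: list[Insight] = []
    rejected: list[str] = []
    for src, fut in futures.items():
        try:
            silver.append(fut.result())  # re-raises the ValueError from validation
        except ValueError:
            rejected.append(src)

    # Reduce: a simple grouping step standing in for LLM synthesis of the Gold digest.
    gold: dict[str, list[str]] = {}
    for item in silver:
        gold.setdefault(item.topic, []).append(item.summary)
    return gold, rejected


if __name__ == "__main__":
    digest, bad = run_pipeline({
        "a.txt": "Rates: ECB held rates steady",
        "b.txt": "Credit: spreads widened",
        "c.txt": "no structure here",
    })
    print(digest, bad)
```

Validating each record before the reduce step means a malformed extraction is dropped (and reported) rather than silently blended into the final synthesis.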
An autonomous Map-Reduce pipeline orchestrated by Prefect. It fetches RSS/YouTube sources, processes them into structured JSON via Gemini CLI in parallel (Silver), synthesizes them (Gold), and automatically ingests the final digest into a local PKM via Claude CLI.
A Monday routine, orchestrated by Prefect, that checks an event's official status and triggers the full data visualization pipeline locally once results are ready.
A weekly automation that uses React Fiber clicks and setTimeout-queued JS sequences to update event routes in Zwift's UI.
A Prefect agent that runs weekly scraping scripts and pushes updates via the GitHub Git Data API to trigger Streamlit Cloud redeployments.
Automated trackers for cycling-equipment availability in the Verkkokauppa outlet and for Microsoft Cloud Skills Challenge vouchers, with alerts sent directly via the Cowork app.
Formal certifications and ongoing hands-on achievements.
Design and implement an MLOps and GenAIOps infrastructure, manage machine learning model lifecycles, implement generative AI quality assurance and observability, and optimize generative AI systems and model performance.
Design and implement LLM-enabled solutions, RAG applications, and LLM chains using Databricks Vector Search, Model Serving, MLflow, and Unity Catalog.