AI Data Engineer · Open to roles

Building the context layer for AI agents.

I design the data infrastructure that makes AI systems intelligent — RAG pipelines, vector search, and LLM integrations grounded in reliable, governed data. 5 years in finance-domain data engineering. Databricks Certified GenAI Engineer.

RAG Pipelines · Vector Search · LLM Integration · Agentic Workflows
5+ years experience
100+ notebooks migrated
100% data consistency
2026 GenAI certified
Specialization

Context engineering for AI systems

LLMs are only as good as the context they receive. I build the data layer that grounds AI agents in accurate, governed, domain-specific knowledge.

RAG Pipelines

Retrieval-Augmented Generation architectures on Databricks — chunking, embedding, retrieval and re-ranking for finance-domain documents.
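As a rough illustration of the four stages named above (chunking, embedding, retrieval, re-ranking), here is a minimal standard-library sketch. It is not the Databricks implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the coverage-based `rerank` is a stand-in for a real re-ranker, and all function names and default sizes are assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk(text, size=12, overlap=4):
    """Split text into overlapping windows of `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy embedding: lowercase term counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    """First pass: rank chunks by cosine similarity and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query, candidates):
    """Second pass: reorder candidates by the share of query terms covered."""
    terms = set(embed(query))

    def coverage(candidate):
        return len(terms & set(embed(candidate))) / len(terms) if terms else 0.0

    return sorted(candidates, key=coverage, reverse=True)
```

In a real pipeline the retrieval step would query a vector index rather than re-embedding every chunk per query; the two-pass shape (cheap recall first, more careful ordering second) is the part the sketch is meant to show.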

Vector Search

Databricks Vector Search & Unity Catalog for governed, low-latency semantic retrieval over large document corpora.

Agentic Workflows

LLM chains and agent orchestration using MLflow, Model Serving, and structured tool-use patterns for data-grounded AI.

Career

Work Experience

Mar 2021 – Present

Data Engineer

Solita · Oulu, Finland
  • Modernizing mission-critical data platforms for a major Finnish pension insurance company.
  • Designing domain-oriented architecture (Data Mesh) for investment, risk management, and actuarial reporting.
  • Leading legacy migrations with automated parallel-run testing that verified 100% data consistency between legacy and migrated outputs.
  • Building scalable pipelines for liability calculations, financial data aggregation, and regulatory compliance (EUDR).
  • Stack: Azure Databricks · PySpark · Python · Azure Data Factory · dbt · SQL · Terraform
Aug 2020 – Dec 2020

Doctoral Student / Researcher

University of Oulu
  • Research in Academic Analytics — BI and Data Mining solutions for Higher Education.
  • Teaching assistant for Python basics course.
Jun 2019 – Aug 2020

Research Assistant

University of Oulu
  • Learning Analytics: data cleaning, feature engineering, EDA.
  • Ad-hoc analysis beyond standard BI tools.
  • Master's Thesis: "Simulation analysis of higher education teaching linked with financing model."
Portfolio

My Projects

Data Engineering · 2023–2025

Finance & Investing Data Platform

Challenge: Modernize investment performance calculations, liberating dataflows from legacy systems.
Solution: Built an isolated platform segment with daily end-to-end pipelines for Return, Risk, and market data. Revamped cash flow reporting, orchestrated SQL warehouse migrations.
Azure Data Factory · Databricks · dbt · SQL Servers · Bicep
Platform Engineering · 2023–2025

General Data Platform

Challenge: Create a modern, agile data platform empowering the organization to build dataflows swiftly.
Solution: Comprehensive Azure-based platform with robust user management, security, and optimized data streaming.
Azure Data Factory · Databricks · PySpark · Alation
GenAI + Data Eng · 2023–2025

Data Migration & GenAI-Accelerated Refactoring

Challenge: Migrate 100+ legacy Databricks notebooks to new data sources ensuring 100% data consistency.
Solution: Custom test bench for automated regression testing + GenAI-augmented workflows for systematic refactoring at speed.
Databricks · Python · PySpark · Automated Testing
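To make the regression-testing idea concrete, the following is a minimal sketch of a parallel-run comparison: the legacy and migrated outputs are matched on a key column and every difference is reported. It is an illustration only, not the actual test bench; the function name, the report fields, and the assumption that the key is unique per row are all hypothetical.

```python
def compare_runs(legacy_rows, migrated_rows, key):
    """Compare two result sets (lists of dicts) matched on a unique key column."""
    legacy = {row[key]: row for row in legacy_rows}
    migrated = {row[key]: row for row in migrated_rows}

    report = {
        # Keys present in one run but not the other.
        "missing_in_migrated": sorted(legacy.keys() - migrated.keys()),
        "unexpected_in_migrated": sorted(migrated.keys() - legacy.keys()),
        # (key, [columns that differ]) for rows present in both runs.
        "mismatched": [],
    }

    for k in sorted(legacy.keys() & migrated.keys()):
        columns = legacy[k].keys() | migrated[k].keys()
        diff_cols = sorted(
            col for col in columns if legacy[k].get(col) != migrated[k].get(col)
        )
        if diff_cols:
            report["mismatched"].append((k, diff_cols))

    # The run counts as consistent only if nothing at all differs.
    report["consistent"] = not (
        report["missing_in_migrated"]
        or report["unexpected_in_migrated"]
        or report["mismatched"]
    )
    return report
```

On Spark-sized tables the same check would more likely be expressed as set differences between DataFrames; the dictionary version keeps the logic visible in a few lines.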
GenAI · POC

GenAI-Augmented Migration Pipeline POC

Challenge: Automate migration of complex legacy Informatica workflows to modern dbt models.
Solution: Semi-automated migration pipeline using GenAI to interpret transformation logic and generate compliant dbt code.
GenAI · dbt · Python · Legacy ETL
Compliance · 2025

Regulatory Data Integration (EUDR)

Challenge: Attach traceability numbers to transport orders for EU Deforestation Regulation compliance — replacing manual updates.
Solution: Production-ready GET → MODIFY → POST integration with daily scheduling, error handling, full audit logging, and secure secret management.
Azure Web Apps · Snowflake · Python (Flask) · Docker · Terraform
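The GET → MODIFY → POST pattern with retries and audit logging can be sketched as below. This is a hedged illustration rather than the production code: `get_order` and `post_order` are hypothetical injected callables standing in for the real HTTP client, the `traceability_number` field name and the logger name are assumptions, and only `ConnectionError` is treated as retryable.

```python
import logging

logger = logging.getLogger("eudr_sync")  # hypothetical logger name


def sync_order(order_id, traceability_number, get_order, post_order, max_attempts=3):
    """Fetch an order, attach the traceability number, and send it back.

    get_order(order_id) and post_order(order_id, body) are injected so that
    the HTTP client, authentication, and secrets stay outside this function.
    Returns the updated order, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # GET: read the current state of the order.
            order = get_order(order_id)
            # MODIFY: copy the order and attach the (hypothetical) field.
            updated = {**order, "traceability_number": traceability_number}
            # POST: write the modified order back.
            post_order(order_id, updated)
            logger.info("order %s updated on attempt %d", order_id, attempt)
            return updated
        except ConnectionError as exc:
            # Audit every failed attempt before retrying.
            logger.warning("order %s attempt %d failed: %s", order_id, attempt, exc)
    logger.error("order %s not updated after %d attempts", order_id, max_attempts)
    return None
```

Injecting the two callables keeps the sketch testable without a network and leaves secret handling to whatever wraps the real client.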
Personal

Architecture Sandboxes & Personal Projects

Beyond client work, I build things for fun — personal sandboxes for exploring new tech, architectures, and ideas.

AI Orchestration / Agentic workflows

Local AI Agent Orchestrator

A local zero-ops orchestrator built on Prefect 3.x to run CLI-based AI agents (Codex, Gemini). Focuses on determinism, "No-API" zero-cost execution, Pydantic validation, and scalable Map-Reduce patterns.
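The Map-Reduce-with-validation shape described above can be sketched with the standard library alone. Note the substitutions, made so the example runs without dependencies: a `ThreadPoolExecutor` stands in for Prefect task mapping, and a dataclass with a `__post_init__` check stands in for a Pydantic model. The `Finding` schema, `run_agent`, and `reduce_fn` are hypothetical names, not the orchestrator's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical schema for one agent result."""
    source: str
    summary: str

    def __post_init__(self):
        # Reject records that would silently corrupt the reduce step.
        for name in ("source", "summary"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")


def map_reduce(items, run_agent, reduce_fn, max_workers=4):
    """Map run_agent over items in parallel, validate, then reduce.

    run_agent(item) is expected to return a dict matching Finding.
    Returns (reduced_result, rejected_items).
    """
    items = list(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        raw = list(pool.map(run_agent, items))  # results keep input order

    valid, rejected = [], []
    for item, record in zip(items, raw):
        try:
            valid.append(Finding(**record))
        except (TypeError, ValueError):
            # TypeError: wrong or missing fields; ValueError: empty content.
            rejected.append(item)
    return reduce_fn(valid), rejected
```

Validating between the map and reduce steps means a malformed agent response is dropped and reported instead of flowing into the final synthesis.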

AI Research / Medallion Architecture

Agentic Data Research Framework

An on-demand AI research framework based on the Medallion architecture (Bronze -> Silver -> Gold). Employs Map-Reduce workflows and Pydantic validation to extract and synthesize structured insights while guarding against hallucination and "lost in the middle" errors.

Media Automation / Python & Node

Agentic Video Automation Pipeline

A modular, code-driven pipeline that transforms raw screencasts and JSON scripts into polished MP4 demo videos. Orchestrates OpenAI TTS, Whisper, Remotion (React), and FFmpeg for synced audio, complex compositing, and agentic QA.

AI Knowledge Graph / Obsidian

Autonomous Second Brain (PKM)

An AI-maintained Personal Knowledge Management vault. Claude acts as a disciplined librarian via custom CLI commands to ingest articles, enforce schemas, maintain atomic wiki pages with backlinks, and autonomously lint the knowledge graph.

AI System Architecture / Methodology

AI Migration Architect Agent

A specialized AI system prompt and methodology designed to systematically refactor legacy workflows into modern AI-orchestrated pipelines. It enforces Right-Sized Ops, Zero-Ops deterministic evaluations, and rigorous selection of agentic design patterns prior to any code generation.

Frontend / Modern Stack

Portfolio Website

You are looking at it! Built with semantic HTML, CSS, and vanilla JS. Documented and deployed via GitHub Pages.

Data Engineering / Visualization

MyWhoosh Race Results

Automated processing and visualization of e-cycling race results. Python scripts parse the results and generate graphics.

Streamlit App & Data

Cycling Events

A fun project exploring cycling events data, built with Streamlit.

Esports Transparency & Analytics

Streamlit E-SM

E-Sports Championship transparency project.

Full Stack AI Platform

InsightHub

An ambitious full-stack AI platform (SvelteKit + Python). It "failed due to its size" but was a massive learning ground for architecture and complex orchestration.

DevOps / Automation

Centralized Docs Sync

Automated workflow to sync documentation from multiple projects into a central repository.

AI Data Pipeline / Medallion

Information Digest Pipeline

An autonomous Map-Reduce pipeline orchestrated by Prefect. It fetches RSS/YouTube sources, processes them into structured JSON via Gemini CLI in parallel (Silver), synthesizes them (Gold), and automatically ingests the final digest into a local PKM via Claude CLI.

Data Automation Agent

MyWhoosh Results Pipeline

A Monday routine orchestrated by Prefect that checks whether an event's results have been made official and, once they are, triggers the full data visualization pipeline locally.

UI Automation Agent

Zwift Route Updater

Weekly automation using React fiber clicks and setTimeout-queued JS sequences to update event routes in Zwift's UI.

DataOps Agent

Cycling Events Updater

Prefect agent that runs weekly scraping scripts and pushes updates via GitHub Git Data API to trigger Streamlit Cloud redeployments.

Web Scraping Agent

Equipment & Voucher Watchers

Automated trackers for Verkkokauppa cycling equipment outlet availability and Microsoft Cloud Skills Challenge vouchers, alerting directly via Cowork app.

More on GitHub

Explore other repositories, experiments, and learnings.

Writing

Latest Articles

Thoughts on Data Engineering, AI, and tech.

Expertise

Skills & Technologies

Core Competencies

Generative AI Engineering
RAG Applications
AI Data Engineering
Platform Engineering
MLOps & DataOps

Technologies & Tools

Vector Search & RAG · MLflow & Model Serving · Databricks & Spark · Azure (ADF, Synapse) · Python & PySpark · dbt & Unity Catalog · Docker & Terraform

Domain Knowledge — Finance

Corporate Finance · Financial Risk Mgmt · Financial Engineering · Econometrics · Banking & Insurance
Credentials

Certifications & Learning

Formal certifications and ongoing hands-on achievements.

Official Certifications

Databricks Certified Data Engineer Associate

Issued October 2022

Microsoft Learn Modules

Get in touch

Contact
