AI Data Engineer · Open to roles

Building the context layer for AI agents.

I design the data infrastructure that makes AI systems intelligent — RAG pipelines, vector search, and LLM integrations grounded in reliable, governed data. 5 years in finance-domain data engineering. Databricks Certified GenAI Engineer.

RAG Pipelines · Vector Search · LLM Integration · Agentic Workflows
5+ Years experience
100+ Notebooks migrated
100% Data consistency
2026 GenAI certified
Specialization

Context engineering for AI systems

LLMs are only as good as the context they receive. I build the data layer that grounds AI agents in accurate, governed, domain-specific knowledge.

RAG Pipelines

Retrieval-Augmented Generation architectures on Databricks — chunking, embedding, retrieval and re-ranking for finance-domain documents.
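A minimal sketch of the chunk, embed, retrieve, re-rank flow named above, in plain Python. Everything here is a hypothetical illustration: the bag-of-words `embed` stands in for a real embedding model, `rerank` stands in for a real re-ranking model, and none of it reflects the Databricks implementation used in client work.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (an assumed chunking policy)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase term counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Toy re-ranker: prefer candidates that cover more distinct query terms."""
    q_terms = set(embed(query))
    return sorted(candidates, key=lambda c: len(q_terms & set(embed(c))), reverse=True)
```

A call such as `rerank(q, retrieve(q, chunk(document), k=5))` returns the candidate passages in the order they would be handed to the LLM as context.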

Vector Search

Databricks Vector Search & Unity Catalog for governed, low-latency semantic retrieval over large document corpora.

Agentic Workflows

LLM chains and agent orchestration using MLflow, Model Serving, and structured tool-use patterns for data-grounded AI.
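One way to read "structured tool-use patterns": the model emits a JSON tool call, and the orchestrator validates it against a registry before anything runs. The sketch below is a hypothetical illustration in plain Python. The tool name `get_portfolio_return`, the registry shape, and the hard-coded sample data are all assumptions, and it does not use MLflow or Model Serving APIs.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    name: str
    required_args: dict[str, type]  # argument name -> expected Python type
    func: Callable[..., Any]


def get_portfolio_return(portfolio_id: str, year: int) -> dict:
    # Hypothetical data-grounded tool; a real one would query a governed table.
    sample = {("EQ-1", 2024): 0.071}
    return {"portfolio_id": portfolio_id, "year": year, "return": sample.get((portfolio_id, year))}


REGISTRY = {t.name: t for t in [
    Tool("get_portfolio_return", {"portfolio_id": str, "year": int}, get_portfolio_return),
]}


def dispatch(raw_call: str) -> dict:
    """Validate a JSON tool call and run it; return a structured result or error."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("tool")
    if not isinstance(name, str) or name not in REGISTRY:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    tool = REGISTRY[name]
    args = call.get("args", {})
    if not isinstance(args, dict):
        return {"ok": False, "error": "args must be a JSON object"}
    for arg, expected in tool.required_args.items():
        if not isinstance(args.get(arg), expected):
            return {"ok": False, "error": f"argument {arg!r} must be {expected.__name__}"}
    extra = sorted(set(args) - set(tool.required_args))
    if extra:
        return {"ok": False, "error": f"unexpected arguments: {extra}"}
    return {"ok": True, "result": tool.func(**args)}
```

Returning errors as data rather than raising lets the agent loop feed the message back to the model and retry with a corrected call.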

Career

Work Experience

Mar 2021 – Present

Data Engineer

Solita · Oulu, Finland
  • Modernizing mission-critical data platforms for a major Finnish pension insurance company.
  • Designing domain-oriented architecture (Data Mesh) for investment, risk management, and actuarial reporting.
  • Leading legacy migrations with automated parallel-run testing that verified 100% data consistency between legacy and migrated outputs.
  • Building scalable pipelines for liability calculations, financial data aggregation, and regulatory compliance (EUDR).
  • Stack: Azure Databricks · PySpark · Python · Azure Data Factory · dbt · SQL · Terraform
Aug 2020 – Dec 2020

Doctoral Student / Researcher

University of Oulu
  • Research in Academic Analytics — BI and Data Mining solutions for Higher Education.
  • Teaching assistant for a Python basics course.
Jun 2019 – Aug 2020

Research Assistant

University of Oulu
  • Learning Analytics: data cleaning, feature engineering, EDA.
  • Ad-hoc analysis beyond standard BI tools.
  • Master's Thesis: "Simulation analysis of higher education teaching linked with financing model."
Portfolio

My Projects

Data Engineering · 2023–2025

Finance & Investing Data Platform

Challenge: Modernize investment performance calculations, liberating dataflows from legacy systems.
Solution: Built an isolated platform segment with daily end-to-end pipelines for Return, Risk, and market data. Revamped cash flow reporting and orchestrated SQL warehouse migrations.
Azure Data Factory · Databricks · dbt · SQL Servers · Bicep
Platform Engineering · 2023–2025

General Data Platform

Challenge: Create a modern, agile data platform empowering the organization to build dataflows swiftly.
Solution: Comprehensive Azure-based platform with robust user management, security, and optimized data streaming.
Azure Data Factory · Databricks · PySpark · Alation
GenAI + Data Eng · 2023–2025

Data Migration & GenAI-Accelerated Refactoring

Challenge: Migrate 100+ legacy Databricks notebooks to new data sources while ensuring 100% data consistency.
Solution: Custom test bench for automated regression testing + GenAI-augmented workflows for systematic refactoring at speed.
Databricks · Python · PySpark · Automated Testing
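
A regression test bench of this kind usually boils down to comparing legacy and migrated outputs row by row. The following is a minimal sketch in plain Python under assumed names (`compare_outputs`, `DiffReport`, a single `key` column); the actual test bench ran on Databricks and is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class DiffReport:
    missing_in_new: list = field(default_factory=list)     # keys only in the legacy output
    unexpected_in_new: list = field(default_factory=list)  # keys only in the migrated output
    mismatched: dict = field(default_factory=dict)         # key -> {column: (legacy, new)}

    @property
    def consistent(self) -> bool:
        """True only when both outputs contain identical rows."""
        return not (self.missing_in_new or self.unexpected_in_new or self.mismatched)


def compare_outputs(legacy: list[dict], migrated: list[dict], key: str) -> DiffReport:
    """Compare two row sets keyed by `key`; assumes keys are unique within each set."""
    old = {row[key]: row for row in legacy}
    new = {row[key]: row for row in migrated}
    report = DiffReport(
        missing_in_new=sorted(old.keys() - new.keys()),
        unexpected_in_new=sorted(new.keys() - old.keys()),
    )
    for k in old.keys() & new.keys():
        columns = old[k].keys() | new[k].keys()
        diffs = {
            c: (old[k].get(c), new[k].get(c))
            for c in columns
            if old[k].get(c) != new[k].get(c)
        }
        if diffs:
            report.mismatched[k] = diffs
    return report
```

A migrated notebook passes when `compare_outputs(...).consistent` is true. On Spark DataFrames, a comparable check could be built from `DataFrame.exceptAll` applied in both directions.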
GenAI · POC

GenAI-Augmented Migration Pipeline POC

Challenge: Automate migration of complex legacy Informatica workflows to modern dbt models.
Solution: Semi-automated migration pipeline using GenAI to interpret transformation logic and generate compliant dbt code.
GenAI · dbt · Python · Legacy ETL
Compliance · 2025

Regulatory Data Integration (EUDR)

Challenge: Attach traceability numbers to transport orders for EU Deforestation Regulation compliance — replacing manual updates.
Solution: Production-ready GET → MODIFY → POST integration with daily scheduling, error handling, full audit logging, and secure secret management.
Azure Web Apps · Snowflake · Python (Flask) · Docker · Terraform
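
The GET → MODIFY → POST flow can be sketched as below. This is a hypothetical illustration: the `fetch` and `post` callables, the `traceability_number` field, and the order shape are assumptions standing in for the real HTTP calls and payloads, which are not described here.

```python
import logging
from typing import Callable

logger = logging.getLogger("eudr_sync")  # hypothetical logger name


def sync_orders(
    fetch: Callable[[], list[dict]],
    post: Callable[[dict], None],
    traceability: dict[str, str],
) -> dict[str, list[str]]:
    """GET orders, attach traceability numbers, POST them back; log every outcome."""
    result: dict[str, list[str]] = {"updated": [], "skipped": [], "failed": []}
    for order in fetch():                                   # GET
        order_id = order["id"]
        number = traceability.get(order_id)
        if number is None:
            logger.warning("order %s: no traceability number, skipped", order_id)
            result["skipped"].append(order_id)
            continue
        updated = {**order, "traceability_number": number}  # MODIFY (copy, not in place)
        try:
            post(updated)                                   # POST
        except Exception:
            # One failed order should not stop the daily run; record it and move on.
            logger.exception("order %s: POST failed", order_id)
            result["failed"].append(order_id)
            continue
        logger.info("order %s: attached %s", order_id, number)
        result["updated"].append(order_id)
    return result
```

Injecting `fetch` and `post` keeps the loop testable without a network, and the returned summary doubles as a per-run audit record.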
Personal

Architecture Sandboxes & Personal Projects

Beyond client work, I build things for fun — personal sandboxes for exploring new tech, architectures, and ideas.

Frontend / Modern Stack

Portfolio Website

You are looking at it! Built with semantic HTML, CSS, and vanilla JS. Documented and deployed via GitHub Pages.

Data Engineering / Visualization

MyWhoosh Race Results

Automated processing and visualization of e-cycling race results. Python scripts to parse and generate graphics.

Streamlit App & Data

Cycling Events

A fun project exploring cycling events data, built with Streamlit.

Esports Transparency & Analytics

Streamlit E-SM

E-Sports Championship transparency project.

Full Stack AI Platform

InsightHub

Ambitious full-stack AI platform (SvelteKit + Python). It "failed due to its size" but was a massive learning ground for architecture and complex orchestration.

DevOps / Automation

Centralized Docs Sync

Automated workflow to sync documentation from multiple projects into a central repository.

More on GitHub

Explore other repositories, experiments, and learnings.

Writing

Latest Articles

Thoughts on Data Engineering, AI, and tech.

Expertise

Skills & Technologies

Core Competencies

Generative AI Engineering
RAG Applications
AI Data Engineering
Platform Engineering
MLOps & DataOps

Technologies & Tools

Vector Search & RAG · MLflow & Model Serving · Databricks & Spark · Azure (ADF, Synapse) · Python & PySpark · dbt & Unity Catalog · Docker & Terraform

Domain Knowledge — Finance

Corporate Finance · Financial Risk Mgmt · Financial Engineering · Econometrics · Banking & Insurance
Credentials

Certifications & Learning

Formal certifications and ongoing hands-on achievements.

Official Certifications

Databricks Certified Data Engineer Associate

Issued October 2022

Microsoft Learn Modules

Get in touch

Contact
