AI Engineer · Building in public

From data pipelines to reliable AI systems.

I spent five years building production data pipelines and platforms. Now, as an AI Engineer, I'm applying that foundation to reliable AI systems and production GenAI—and sharing what I learn along the way.

Explore my work · Follow on LinkedIn ↗

Professional experience

Five years in production before AI.

My AI work is built on experience from regulated data environments where reliability, traceability and operational control are requirements—not optional improvements.

Anonymized professional work · Finance

Finance & Investment Data Platform

Legacy calculations and fragmented dataflows made investment reporting difficult to maintain.

I designed and implemented daily pipelines for return, risk and market data, modernizing performance calculations and reporting workflows.

Azure Data Factory · Databricks · dbt · Bicep

Anonymized professional work · Migration

Large-scale Data Platform Migration

More than 100 Databricks notebooks had to move to new data sources without silently changing their outputs.

I used automated comparison testing and a structured AI-assisted workflow to make the refactoring repeatable and reviewable.

Databricks · Python · PySpark · Regression testing
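A minimal sketch of what automated comparison testing can look like, not the project's actual code. It assumes each notebook's output can be collected as a list of rows with a unique key column. The function name `compare_outputs`, the column names and the sample rows are invented for illustration.

```python
from typing import Any

Row = dict[str, Any]


def compare_outputs(baseline: list[Row], candidate: list[Row], key: str) -> dict[str, list]:
    """Compare two result sets row by row, matched on a key column."""
    base_by_key = {row[key]: row for row in baseline}
    cand_by_key = {row[key]: row for row in candidate}

    # Rows that disappeared or appeared after the refactoring.
    missing = sorted(base_by_key.keys() - cand_by_key.keys())
    unexpected = sorted(cand_by_key.keys() - base_by_key.keys())

    # Rows present in both runs whose column values differ.
    changed = []
    for k in sorted(base_by_key.keys() & cand_by_key.keys()):
        before, after = base_by_key[k], cand_by_key[k]
        for column in sorted(before.keys() | after.keys()):
            if before.get(column) != after.get(column):
                changed.append((k, column, before.get(column), after.get(column)))

    return {"missing": missing, "unexpected": unexpected, "changed": changed}


# Invented sample data: output before and after moving to the new source.
baseline = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
candidate = [{"id": 1, "value": 10}, {"id": 2, "value": 21}, {"id": 3, "value": 0}]

report = compare_outputs(baseline, candidate, key="id")
print(report)
```

An empty report means the refactored notebook reproduced the baseline. Anything else is a concrete diff a reviewer can inspect, so a changed output cannot slip through silently.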

Anonymized professional work · Compliance

Regulatory Data Integration

A manual compliance workflow needed a dependable connection between operational data and an external regulatory service.

I delivered a scheduled Snowflake-to-REST integration with validation, error handling, auditability and secure configuration.

Snowflake · Python · Azure Web Apps · Terraform

Customer names, data and implementation details are intentionally omitted.

Selected public work

The same engineering principles, built in public.

A small selection of systems that demonstrate reliability, traceability and operational control. Work in progress and known limitations are labelled explicitly.

Featured public system · ContextVault v0.1.0

Governed memory for long-running agentic software work.

Coding agents need several kinds of context: session state, repo context, cross-repo context, retrieval context and governed memory.

ContextVault focuses on the last layer: keeping architectural decisions, preferences and warnings current when work spans many sessions and multiple agents.

session state · repo context · cross-repo context · retrieval · governed memory

Read case study · GitHub ↗ · Release ↗

ContextVault architecture: governed writes and Markdown records flow through a DuckDB memory service into budgeted memory packs for the CLI, HTTP and MCP adapters.

47/47 automated tests

3/3 golden evaluation cases

Hosted CI and clean-clone verification

CLI, HTTP, Docker and read-only MCP
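As an illustration of the "budgeted memory pack" idea, here is a small sketch under stated assumptions. It is not ContextVault's implementation. The `MemoryRecord` type, its `priority` field, the word-count token estimate and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    text: str
    priority: int  # hypothetical field: higher means more important


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def build_pack(records: list[MemoryRecord], budget: int) -> list[MemoryRecord]:
    """Greedily pick the highest-priority records that fit within the budget."""
    pack: list[MemoryRecord] = []
    used = 0
    for record in sorted(records, key=lambda r: r.priority, reverse=True):
        cost = estimate_tokens(record.text)
        if used + cost <= budget:
            pack.append(record)
            used += cost
    return pack


# Invented sample records.
records = [
    MemoryRecord("Use DuckDB for storage", priority=3),
    MemoryRecord("Never commit secrets to the repo", priority=5),
    MemoryRecord("Prefer small pull requests with tests attached", priority=1),
]

pack = build_pack(records, budget=10)
print([r.text for r in pack])
```

The point of a budget is that an agent receives a bounded slice of memory each session, with warnings and decisions ranked ahead of lower-priority preferences.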

Featured public system · Information Digest Agentic

Auditable AI signal pipeline for technical research.

Reading more sources does not solve information overload if the synthesis cannot show where its conclusions came from.

Information Digest Agentic separates source ingestion, typed Silver analysis, Gold synthesis, specialist lenses and downstream delivery so each AI step has a deterministic boundary and an audit trail.

raw ingest · typed analysis · source audit · agent runtime · run health

Read case study · GitHub branch ↗

133 offline tests passing

83/83 Silver analyses in latest recorded run

6/6 specialist lens reports completed

Fetch, source-health, pipeline-health and agent audit logs

Published hackathon build · Agent workflow & MCP

Information Digest — Azure Hackathon

Large source volumes need structure and provenance, not just another summary.

The system separates ingestion, typed intermediate data, agent analysis and synthesis, with MCP services for blog, GitHub and YouTube sources.

Public repository · 3 FastMCP services · Latest audit: 65/69 tests passing; hardening in progress

Repository ↗

Public build · Data engineering

Cycling Events

Nine event sources create duplicates, conflicting fields and inconsistent locations.

A shared schema, source precedence and entity resolution turn the feeds into one maintained public map.

9 source adapters · Deduplication · Geocoding · Public application · Data quality metrics and licensing documentation in progress

Live app ↗ · Repository ↗
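Source precedence for Cycling Events can be sketched as follows. This is a simplified illustration, not the repository's code. It assumes entity resolution has already grouped the records that describe one event. The source names, field names and sample values are invented.

```python
# Hypothetical source names, most trusted first.
SOURCE_PRECEDENCE = ["organizer", "federation", "aggregator"]


def merge_records(records: list[dict]) -> dict:
    """Merge duplicate records of one event.

    For each field, keep the value from the most trusted source that
    provides a non-empty value.
    """
    rank = {name: position for position, name in enumerate(SOURCE_PRECEDENCE)}
    ordered = sorted(records, key=lambda record: rank[record["source"]])

    merged: dict = {}
    for record in ordered:
        for field, value in record.items():
            if field == "source" or value in (None, ""):
                continue
            # setdefault keeps the first value seen, i.e. the most trusted one.
            merged.setdefault(field, value)
    return merged


# Invented sample: two feeds describing the same event.
records = [
    {"source": "aggregator", "name": "Spring Gravel Ride", "city": "Oulu", "distance_km": 80},
    {"source": "organizer", "name": "Spring Gravel Ride 2026", "city": None, "distance_km": 85},
]

merged = merge_records(records)
print(merged)
```

Resolving conflicts per field rather than per record lets a less trusted feed still fill gaps, such as the missing city here, without overriding values from a more trusted one.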

My path

From production data to production AI.

I came to AI through governed data platforms, financial data pipelines, integrations and migrations.

That background shapes how I build AI systems. I care about what happens after the prototype works: evaluation, observability, deterministic boundaries, security and cost.

I'm building toward AI Architecture by designing real systems, documenting my decisions and sharing the useful lessons along the way.

Five years in production data engineering before moving into AI systems.

AI Engineer · Solita

2026–Present

Applying a production data engineering foundation to AI systems and GenAI delivery.

Data Engineer · Solita

2021–2026

Production data platforms, migrations and integrations across regulated and finance-domain environments.

Research · University of Oulu

2019–2020

Learning analytics, data mining and applied statistical analysis.

Building in public

Notes from the work—not generic AI commentary.

I share engineering decisions, working evidence, failed assumptions and practical lessons from building data and AI systems.

Read on LinkedIn ↗

AI × Data Engineering Digest

What changed, why it matters and what it means in production.

I regularly collect and structure developments across AI and data engineering, then highlight one practical theme at a time. A few recent issues:

Weekly digest, week 30 2026: allow, ask or deny before an agent tool call has impact

Week 30 · 24.07.2026

ALLOW / ASK / DENY before impact

Weekly digest, week 29 2026: model routing should be selected by measured task data and cost

Week 29 · 17.07.2026

Route models by task data, not brand

Weekly digest, week 28 2026: build evaluation data and contracts before changing the model

Week 28 · 10.07.2026

Build evals and contracts before model swaps
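To make the week 30 theme concrete, here is a minimal sketch of an allow, ask or deny gate that runs before a tool call has any effect. The tool names, the `POLICY` table and the `approve` callback are hypothetical, and a real gate would likely inspect the call's arguments as well as the tool name.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy table: tool name -> decision.
POLICY = {
    "read_file": Decision.ALLOW,
    "write_file": Decision.ASK,
    "delete_branch": Decision.DENY,
}


def decide(tool_name: str) -> Decision:
    # Tools missing from the policy are denied by default.
    return POLICY.get(tool_name, Decision.DENY)


def run_tool(tool_name: str, action: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Check the policy first; the action only runs if the gate lets it through."""
    decision = decide(tool_name)
    if decision is Decision.DENY:
        return "denied"
    if decision is Decision.ASK and not approve(tool_name):
        return "rejected"
    return action()


# A human-in-the-loop stand-in that rejects every request.
print(run_tool("read_file", lambda: "contents", approve=lambda name: False))
print(run_tool("write_file", lambda: "written", approve=lambda name: False))
print(run_tool("delete_branch", lambda: "deleted", approve=lambda name: True))
```

Because the decision is made before `action()` is invoked, a denied or rejected call never reaches the system it would have changed.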

Credentials

A supporting signal, not the portfolio.

Formal certifications complement the systems and evidence above.

Microsoft · 2026

Machine Learning Operations Engineer Associate

MLOps, GenAIOps, quality assurance and observability.

Databricks · 2026

Generative AI Engineer Associate

LLM applications, retrieval systems and production operations.

Databricks · 2022

Data Engineer Associate

The production data engineering foundation behind my AI work.

Connect

Interested in reliable AI systems, production GenAI or the engineering behind them?

Follow my work, challenge an idea or start a conversation.

LinkedIn ↗ · GitHub ↗