Local inference.
Cloud-scale training.
A hybrid architecture that keeps data processing local and private, while leveraging cloud GPU/TPU infrastructure for model training and evaluation.
System Overview
The system is divided into three layers, each with a clear responsibility boundary.
Local AI Inference
On-device models handle data parsing, classification, and real-time reasoning. Sensitive data stays on the user's hardware. Optimized for consumer-grade GPUs and Apple Silicon.
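To make the boundary concrete, here is a minimal sketch of how the local layer might pick a compute backend and classify a record without the data ever leaving the device. The backend names, the `classify` stub, and its keyword-based labeling are illustrative assumptions, not the system's actual API.

```python
import platform

# Hypothetical backend selection for the local inference layer.
# Backend identifiers below are illustrative placeholders.

def select_backend() -> str:
    """Pick a local compute backend; inputs never leave the device."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "apple-silicon-metal"   # e.g. Metal / Core ML acceleration
    return "consumer-gpu"              # e.g. CUDA or Vulkan on a desktop GPU

def classify(record: str, backend: str) -> dict:
    """Stubbed on-device classification: parse and label a record locally."""
    label = "sensitive" if "ssn" in record.lower() else "routine"
    return {"backend": backend, "label": label, "left_device": False}

result = classify("Invoice #42, SSN redacted", select_backend())
```

The point of the sketch is the invariant, not the model: every code path returns a result produced on local hardware, so `left_device` can be asserted in tests.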
Agent & Tool Execution
Autonomous agents coordinate multi-step workflows through a structured tool interface. Each tool call is logged and auditable. Agents run locally alongside the inference layer.
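A structured, auditable tool interface can be sketched as a registry plus a single choke point through which every call passes. The registry, decorator, log schema, and the `word_count` tool below are assumptions for illustration; the real system's interface may differ.

```python
import time
from typing import Any, Callable

# Illustrative audited tool interface: every invocation goes through
# invoke(), which appends a provenance record to an append-only log.

AUDIT_LOG: list[dict] = []
TOOLS: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function as an agent-callable tool."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

def invoke(name: str, **kwargs) -> Any:
    """Run a tool call and record it with its arguments and result."""
    result = TOOLS[name](**kwargs)
    AUDIT_LOG.append({
        "ts": time.time(),          # when the call happened
        "tool": name,               # which tool ran
        "args": kwargs,             # exact arguments passed
        "result_repr": repr(result) # what came back
    })
    return result

@tool("word_count")
def word_count(text: str) -> int:
    return len(text.split())

n = invoke("word_count", text="keep data local")
```

Funneling calls through one function is what makes "each tool call is logged and auditable" a structural guarantee rather than a convention.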
Cloud Training & Evaluation
Model fine-tuning, evaluation, and iteration run on cloud GPU/TPU infrastructure. Training workloads require high-throughput compute that benefits from elastic cloud scaling.
Why This Split
- Privacy: User data is processed locally. No raw data is sent to the cloud during normal operation.
- Performance: Local inference removes the network round trip from interactive workflows, so responses arrive at model speed rather than network speed.
- Training at scale: Iterative model training requires GPU/TPU clusters that are impractical to run on consumer hardware. Cloud infrastructure provides the compute density needed for rapid experimentation cycles.
- Cost efficiency: Cloud resources are used only for training and evaluation — bursty, time-bounded workloads that benefit from elastic provisioning rather than always-on infrastructure.
Cloud Infrastructure
Training and evaluation workloads run on Google Cloud Platform:
- GPU/TPU compute for model fine-tuning and iterative training runs
- Managed storage for training datasets and model artifacts
- Evaluation pipelines for automated model quality benchmarking
- Elastic scaling to match training workload demands without fixed infrastructure costs
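The elastic-scaling idea can be sketched as a job specification that bounds worker count and sizes it to queued work. The field names, accelerator strings, bucket paths, and the 50,000-examples-per-worker heuristic are all hypothetical; a real deployment would map a spec like this onto the cloud provider's own job API.

```python
from dataclasses import dataclass

# Hypothetical cloud training job spec. Every field value here is an
# illustrative placeholder, not a real resource name.

@dataclass
class TrainingJob:
    name: str
    accelerator: str                 # e.g. "nvidia-a100" or "tpu-v5e"
    min_workers: int = 1
    max_workers: int = 8             # elastic: scale only within these bounds
    dataset_uri: str = "gs://example-bucket/datasets/signals"
    output_uri: str = "gs://example-bucket/artifacts"

    def scale_for(self, queued_examples: int, per_worker: int = 50_000) -> int:
        """Pick a worker count proportional to queued work, within bounds."""
        want = -(-queued_examples // per_worker)  # ceiling division
        return min(self.max_workers, max(self.min_workers, want))

job = TrainingJob(name="ft-2024-q3", accelerator="tpu-v5e")
workers = job.scale_for(queued_examples=230_000)
```

Because workloads are bursty and time-bounded, capacity is requested per job and released when the job ends, which is the cost argument made above.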
Design Principles
- Separation of concerns: Inference, execution, and training are independent layers with well-defined interfaces.
- Data minimization: Only anonymized training signals — not raw user data — are used in cloud training pipelines.
- Auditability: Every agent action and tool invocation is logged with full provenance.
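The data-minimization principle can be sketched as a projection step that runs locally before anything is uploaded: raw text and identifiers are dropped, and only derived signals leave the device. The signal schema below is an assumed example, not the project's actual pipeline format.

```python
import hashlib

# Sketch of data minimization: reduce a raw local record to an
# anonymized training signal. Schema and field names are illustrative.

def to_training_signal(record: dict) -> dict:
    """Project a raw record down to the fields cloud training may see."""
    # Stable pseudonymous ID via a one-way hash; the raw ID never leaves.
    user_hash = hashlib.sha256(record["user_id"].encode()).hexdigest()[:16]
    return {
        "user_hash": user_hash,
        "label": record["label"],                 # model-assigned class
        "token_count": len(record["text"].split()),
        # record["text"] itself is deliberately excluded
    }

signal = to_training_signal(
    {"user_id": "alice@example.com", "label": "routine",
     "text": "quarterly numbers look fine"}
)
```

Keeping the projection explicit in code (rather than filtering server-side) means the raw fields are never serialized for transport in the first place.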