Architecture

Local inference.
Cloud-scale training.

A hybrid architecture that keeps data processing local and private while leveraging cloud GPU/TPU infrastructure for model training and evaluation.

System Overview

The system is divided into three layers, each with a clear responsibility boundary.

Local AI Inference

On-device models handle data parsing, classification, and real-time reasoning. Sensitive data stays on the user's hardware. Optimized for consumer-grade GPUs and Apple Silicon.
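The inference layer can be pictured as a small in-process component: input data is parsed and classified without ever leaving the machine. The sketch below is a minimal illustration, with a trivial keyword classifier standing in for a real on-device model; all names (`LocalClassifier`, `InferenceResult`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InferenceResult:
    label: str
    confidence: float


class LocalClassifier:
    """Stand-in for an on-device model; input never leaves this process."""

    # Toy keyword-to-label table in place of learned model weights.
    KEYWORDS = {"invoice": "finance", "meeting": "calendar"}

    def classify(self, text: str) -> InferenceResult:
        for keyword, label in self.KEYWORDS.items():
            if keyword in text.lower():
                return InferenceResult(label=label, confidence=0.9)
        return InferenceResult(label="other", confidence=0.5)


result = LocalClassifier().classify("Invoice #42 from vendor")
print(result.label)  # → finance
```

Because classification happens in-process, interactive workflows see no network round-trip, which is the latency property the "Why This Split" section relies on.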

Agent & Tool Execution

Autonomous agents coordinate multi-step workflows through a structured tool interface. Each tool call is logged and auditable. Agents run locally alongside the inference layer.
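A structured, auditable tool interface can be as simple as a registry that records every invocation before dispatching it. The sketch below is illustrative only; the names (`ToolRegistry`, `audit_log`) are assumptions, not part of a real API.

```python
import time


class ToolRegistry:
    """Structured tool interface: every call is logged before it runs."""

    def __init__(self):
        self._tools = {}
        self.audit_log = []  # append-only record of every invocation

    def register(self, name, fn):
        self._tools[name] = fn

    def call(self, name, **kwargs):
        entry = {"tool": name, "args": kwargs, "ts": time.time()}
        self.audit_log.append(entry)          # log first, so failures are auditable too
        result = self._tools[name](**kwargs)  # runs locally, next to the inference layer
        entry["result"] = result
        return result


registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)
print(registry.call("add", a=2, b=3))  # → 5
print(len(registry.audit_log))         # → 1
```

Logging before execution, rather than after, means a tool call that crashes still leaves a trace, which matters for the auditability principle stated later.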

Cloud Training & Evaluation

Model fine-tuning, evaluation, and iteration run on cloud GPU/TPU infrastructure. Training workloads require high-throughput compute that benefits from elastic cloud scaling.
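The training-side loop has the same shape at any scale: iterate on parameters, evaluate, repeat. The toy example below stands in for a cloud training workload; a real run would use GPU/TPU-backed frameworks, and only the structure (iterate, then evaluate) carries over.

```python
def evaluate(w, data):
    """Mean squared error of the model y = w * x on held-out pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, data, lr=0.01, steps=100):
    """Plain gradient descent on a one-parameter linear model."""
    for _ in range(steps):
        # Average gradient of squared error over the (tiny) dataset.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target relation: y = 2x
w = fine_tune(w=0.0, data=data)
print(round(w, 2))  # → 2.0
```

The loop is embarrassingly compute-hungry as models grow, which is why it lives on elastic cloud infrastructure rather than on the user's hardware.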
Why This Split

  • Privacy: User data is processed locally. No raw data is sent to the cloud during normal operation.
  • Performance: Local inference eliminates network latency for interactive workflows. Results are immediate.
  • Training at scale: Iterative model training requires GPU/TPU clusters that are impractical to run on consumer hardware. Cloud infrastructure provides the compute density needed for rapid experimentation cycles.
  • Cost efficiency: Cloud resources are used only for training and evaluation — bursty, time-bounded workloads that benefit from elastic provisioning rather than always-on infrastructure.

Cloud Infrastructure

Training and evaluation workloads run on Google Cloud Platform:

  • GPU/TPU compute for model fine-tuning and iterative training runs
  • Managed storage for training datasets and model artifacts
  • Evaluation pipelines for automated model quality benchmarking
  • Elastic scaling to match training workload demands without fixed infrastructure costs
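Taken together, the items above imply a training-job request with roughly the following shape. This is purely a sketch of the configuration surface; none of these fields correspond to a specific GCP API.

```python
from dataclasses import dataclass


@dataclass
class TrainingJob:
    """Hypothetical training-job spec reflecting the bullets above."""

    model_name: str
    dataset_uri: str              # managed-storage path to the training set
    accelerator: str = "tpu"      # GPU/TPU compute for fine-tuning
    min_workers: int = 1          # elastic scaling bounds: provisioned only
    max_workers: int = 8          # while the job runs, no fixed fleet
    eval_suite: str = "default"   # evaluation pipeline to run after training


job = TrainingJob(model_name="classifier-v2", dataset_uri="gs://bucket/data")
print(job.accelerator, job.max_workers)  # → tpu 8
```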

Design Principles

  • Separation of concerns: Inference, execution, and training are independent layers with well-defined interfaces.
  • Data minimization: Only anonymized training signals — not raw user data — are used in cloud training pipelines.
  • Auditability: Every agent action and tool invocation is logged with full provenance.
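The data-minimization principle can be made concrete with a small transform: before anything is sent for cloud training, a local record is reduced to an anonymized signal. The field names below (`user_hash`, `text_length`) are illustrative, not a real schema.

```python
import hashlib


def to_training_signal(user_id: str, raw_text: str, label: str) -> dict:
    """Reduce a local record to an anonymized training signal."""
    return {
        # One-way hash: the cloud never sees the real user identifier.
        "user_hash": hashlib.sha256(user_id.encode()).hexdigest()[:16],
        # Derived features only; raw_text itself is never included.
        "text_length": len(raw_text),
        "label": label,
    }


signal = to_training_signal("alice@example.com", "Invoice #42 from vendor", "finance")
print(sorted(signal))  # → ['label', 'text_length', 'user_hash']
```

The key property is that the signal is derived, not raw: neither the user identifier nor the original text can be recovered from what leaves the device.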