Organisation: CloudCIX · Version: v1.0 · March 2026
Scope of this documentation: The ML Services pipeline handles URL ingestion, content processing, vector embedding, and corpus search. It is the indexing and retrieval layer for MLWorkbench. Chat, Q&A, and conversational AI are handled by a separate service and are not covered here.
The pipeline comprises an nginx gateway and four numbered stages, each running as its own service:

| Stage | Service | Responsibility |
|---|---|---|
| Gateway | nginx | Rate limiting · request routing |
| 1 — Control plane | Orchestrator | Job state · stage callbacks · Postgres |
| 2 — Fetch | Scraper | HTTP fetch · robots.txt · domain rate limits |
| 3 — Process | Chunking | Token-aware splitting · ordered chunks |
| 4 — Index & retrieve | Embedding DB API | H100 vectors · vector DB · corpus search |
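The table above describes a callback-driven control plane: the Orchestrator advances each job through the stages, and workers report completion back to it rather than calling each other. A minimal sketch of that flow, assuming an illustrative state list and a dict-shaped job record (the real stage names and persistence live in System Design and Postgres):

```python
from enum import Enum

class JobState(str, Enum):
    """Illustrative states only; the real job state machine is defined in System Design."""
    QUEUED = "queued"
    FETCHING = "fetching"
    CHUNKING = "chunking"
    EMBEDDING = "embedding"
    DONE = "done"

# Ordered stages: the Orchestrator advances a job one stage at a time.
STAGE_ORDER = [JobState.QUEUED, JobState.FETCHING, JobState.CHUNKING,
               JobState.EMBEDDING, JobState.DONE]

def on_stage_complete(job: dict) -> dict:
    """Stage callback handler (hypothetical shape): a worker reports its stage
    is finished, and the Orchestrator moves the job to the next stage.
    In production this transition would be persisted to Postgres."""
    idx = STAGE_ORDER.index(JobState(job["state"]))
    if idx < len(STAGE_ORDER) - 1:
        job["state"] = STAGE_ORDER[idx + 1].value
    return job

# A job walks the pipeline purely through Orchestrator callbacks:
job = {"id": 42, "state": "queued"}
while job["state"] != JobState.DONE.value:
    job = on_stage_complete(job)
```

The point of the pattern is that the Scraper, Chunking, and Embedding services stay ignorant of one another; only the Orchestrator knows the stage order.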
Where to start, by role:

| Role | Read first | Then |
|---|---|---|
| 👩‍💻 Developer | System Design §1 → Design Concepts A & C | Implementation Guide §3 & §5 |
| 🔧 SRE / DevOps | Infrastructure §1 & §4 → Implementation §6 & §7 | Observability §9 (alerts) |
| 🧪 QA / Tester | System Design §2 & §5 | QA & Test Strategy (all) |
| 📋 Project Manager | Project Charter | System Design §1 table & §9 |
| 📊 Stakeholder | Project Charter — executive summary & metrics | Infrastructure §5 (capacity tiers) |
→ See the Document Reading Guide for a full per-role breakdown, including what to skip.
The full document set:

| Document | What it covers | Audience |
|---|---|---|
| System Design | Architecture, data flows, job state machine, auth model, MLWorkbench API contracts | Developers · SRE · QA |
| Design Concepts | The why behind every pattern — idempotency, circuit breakers, rate limiting, saga, dual transport | Developers · QA |
| Technology Choices | Redis vs Kafka, Postgres vs MongoDB, FastAPI vs Flask, self-hosted H100 vs OpenAI cloud — full alternatives considered | Developers · Tech Lead |
| Implementation Guide | Code patterns, Docker Compose for all five VMs, nginx config, environment variables, PR checklist | Developers · SRE |
| Observability Guide | All metrics, 15 alerting rules (incl. OOM), Alloy config, Grafana dashboards, debugging & OOM triage | SRE · Developers |
| Infrastructure Specification | VM specs, load model, throughput at 3 capacity tiers, bottleneck map, scaling & OOM runbooks | SRE · PM |
| QA & Test Strategy | Test pyramid, full API contract matrix, state machine tests, 7 chaos scenarios, performance baselines, risk register | QA · Developers · PM |
| Project Charter | Executive summary, goals/non-goals, SLOs, milestones, team responsibilities, risk register | PM · Stakeholders |
| Document Reading Guide | Exactly which documents each role should read and skip, with a quick-reference matrix | Everyone |
| Vibe Coding & Setup Guide | Two-developer AI-assisted build guide — work split, build order, prompt templates, go-live checklist | Developers |
Key design decisions at a glance:

| Concern | Decision | Revisit when |
|---|---|---|
| Pipeline control | Orchestrator drives all stages via callbacks — services never call each other | — |
| Job state | Postgres — queryable, durable, survives restarts | — |
| Queues | 3 Redis instances — one per consumer VM (Option C) | — |
| Embedding inference | Self-hosted H100 — OpenAI-compatible API, no external cost | H100 utilisation > 80% sustained |
| Auth | API key → Membership → CallerContext abstraction | Membership returns identity claims |
| Rate limiting | nginx (structural) + per-service fair share (business) | API gateway with plugin ecosystem needed |
| Observability | LGTM+P via Alloy sidecar — no standalone Prometheus in production | — |
| Chat / Q&A | Out of scope — separate service not documented here | — |
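The auth decision above routes every API key through Membership and wraps the result in a CallerContext, so services depend on one abstraction rather than on how the key was resolved. A sketch under stated assumptions: the field names, the Membership lookup shape, and `resolve_caller` itself are hypothetical, not the real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class CallerContext:
    """Hypothetical shape of the auth abstraction: downstream services see
    only this object, never the raw API key or the Membership response."""
    api_key_id: str
    tenant_id: Optional[str] = None  # stays None until Membership returns identity claims

def resolve_caller(api_key: str,
                   membership_lookup: Callable[[str], Optional[dict]]) -> CallerContext:
    """Resolve an API key via a Membership lookup into a CallerContext (illustrative)."""
    record = membership_lookup(api_key)
    if record is None:
        raise PermissionError("unknown API key")
    return CallerContext(api_key_id=record["id"],
                         tenant_id=record.get("tenant_id"))

# Example with a stubbed Membership lookup:
keys = {"k-123": {"id": "key-1"}}
ctx = resolve_caller("k-123", keys.get)
```

Because `tenant_id` is optional, the multi-tenant work noted as blocked below can land by populating one field, without changing any service that consumes CallerContext.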
Current implementation status:

| Item | Status |
|---|---|
| H100 inference server | ✅ Operational |
| LGTM+P monitoring cluster | ✅ Operational |
| DLQ implementation | ⚠️ Pending — Phase 3 (monitored but not yet built) |
| SSE job progress streaming | 🔵 Planned — Phase 4 (polling sufficient for now) |
| Multi-tenant auth | 🔵 Planned — blocked on Membership identity claims |
Roadmap:

| Phase | Target | Status |
|---|---|---|
| Phase 1 — Foundation | Complete | ✅ Done |
| Phase 2 — Pipeline | Q1 2026 | 🔄 In progress |
| Phase 3 — Production Readiness | Q2 2026 | 🔵 Upcoming |
| Phase 4 — Growth | Post-launch | 🔵 Planned |
CloudCIX ML Services · v1.0 · March 2026 · 10 documents