TotlProvision — Product Roadmap & Build Plan

Status: proposal for sign-off. No engine/backend code written until Phase 0 is approved. Owner: Robert. Drafted 2026-06-18.

Vision

A versatile, white-labelable Windows provisioning and ongoing-management product that an MSP can sell to other MSPs. It provides hands-off setup for tens of thousands of machines and is secure by default, with a Cloudflare-hosted backend for fleet reporting, secret escrow (admin passwords + BitLocker keys) behind M365 SSO, and multi-tenant isolation from day one.

Locked architecture decisions

  • Backend store: Cloudflare D1 (SQLite). Relational: tenants, machines, configs, run results, escrowed secrets, audit log.
  • Auth to portal: Cloudflare Access (Zero Trust) + Entra/M365 SSO. No hand-rolled OAuth.
  • Multi-tenancy: tenant-aware from day 1 — every table carries tenant_id; ships usable single-tenant, never needs re-architecture.
  • Escrow scope: admin passwords + BitLocker recovery keys, same encryption pattern.

What we keep (don't rewrite)

The existing engine is well-built and stays: phase/module/orchestrator split, resume-via-scheduled-task + state.json, structured JSON logging, config-merge-over-defaults, SHA256 install verification, NetBIOS-safe rename, idempotent handling of winget exit codes. We extend it; we do not rebuild it. Only the paid-Profwiz migration path is replaced (see Phase 5).


System shape

ENGINE (on each PC, PowerShell)         BACKEND (Cloudflare)          PORTAL (Cloudflare Pages)
 ├ preflight → phases → validation        ├ Worker API                  ├ behind Access + Entra SSO
 ├ generates random admin pwd             │   /v1/report (ingest)        ├ reveal password / BitLocker key
 ├ enables BitLocker                      │   /v1/secret (escrow)        ├ fleet dashboard (pass/fail)
 ├ encrypts secrets to public key  ─────► │   /v1/config (pull)          ├ config editor
 └ POSTs result + ciphertext              ├ D1 (relational)              └ audit log viewer
                                          └ R2 (bulk logs)

Secrets flow (zero-knowledge): machine encrypts to a per-tenant public key; ciphertext is all that ever reaches D1. The private key never touches Cloudflare — it's delivered to the engineer's browser only after Access SSO and decryption happens client-side (WebCrypto). Cloudflare/D1 physically cannot read escrowed secrets. Every reveal is audit-logged.

Hard consequence: key recovery / break-glass. Because Cloudflare can't decrypt, a lost tenant private key means all of that tenant's escrowed secrets are unrecoverable. Phase 0 must include a recovery design: a backed-up private key, either split with Shamir secret sharing (M-of-N) or kept as sealed offline copies. No exceptions: this is decided before any escrow code ships.
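To make the M-of-N option concrete, here is a minimal sketch of Shamir secret sharing over GF(2^8), written in TypeScript purely as an illustration. The function names (gfMul, gfInv, splitSecret, combineShares), the Share shape, and the injected randomByte source are assumptions and not part of the existing design. A real break-glass procedure should rely on an audited implementation and a cryptographically secure random source.

```typescript
// Sketch only: Shamir secret sharing over GF(2^8), applied byte by byte.
// All names are hypothetical; use an audited library for real key material.

// Multiply two field elements using the AES reduction polynomial (0x11b).
function gfMul(a: number, b: number): number {
  let product = 0;
  for (let i = 0; i < 8; i++) {
    if (b & 1) product ^= a;
    const carry = a & 0x80;
    a = (a << 1) & 0xff;
    if (carry) a ^= 0x1b;
    b >>= 1;
  }
  return product;
}

// Multiplicative inverse via a^254, since a^255 = 1 for every non-zero a.
function gfInv(a: number): number {
  if (a === 0) throw new Error("zero has no inverse");
  let result = 1;
  for (let i = 0; i < 254; i++) result = gfMul(result, a);
  return result;
}

interface Share { x: number; y: number[] }

// Split `secret` into n shares so that any m of them reconstruct it.
// `randomByte` is injected to keep the sketch testable; it must be a CSPRNG in real use.
function splitSecret(secret: number[], m: number, n: number, randomByte: () => number): Share[] {
  if (m < 2 || m > n || n > 255) throw new Error("need 2 <= m <= n <= 255");
  const shares: Share[] = [];
  for (let x = 1; x <= n; x++) shares.push({ x, y: [] });
  for (const byte of secret) {
    // Polynomial of degree m-1 whose constant term is the secret byte.
    const coeffs: number[] = [byte];
    for (let i = 1; i < m; i++) coeffs.push(randomByte() & 0xff);
    for (const share of shares) {
      // Horner evaluation of the polynomial at share.x.
      let acc = 0;
      for (let i = coeffs.length - 1; i >= 0; i--) acc = gfMul(acc, share.x) ^ coeffs[i];
      share.y.push(acc);
    }
  }
  return shares;
}

// Recover the secret by Lagrange interpolation at x = 0.
function combineShares(shares: Share[]): number[] {
  const length = shares[0].y.length;
  const secret: number[] = [];
  for (let pos = 0; pos < length; pos++) {
    let value = 0;
    for (let i = 0; i < shares.length; i++) {
      let basis = 1;
      for (let j = 0; j < shares.length; j++) {
        if (i === j) continue;
        // Subtraction is XOR in GF(2^8), so (0 - xj) / (xi - xj) = xj / (xi ^ xj).
        basis = gfMul(basis, gfMul(shares[j].x, gfInv(shares[i].x ^ shares[j].x)));
      }
      value ^= gfMul(shares[i].y[pos], basis);
    }
    secret.push(value);
  }
  return secret;
}
```

With m = 3 and n = 5, for example, any three share holders can rebuild the tenant private key bytes, while one or two shares alone are not enough. The choice of M and N, and who holds the shares, remains a Phase 0 design decision.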


Phased roadmap

Phase 0 — Foundations (gate for everything)

  • D1 schema (tenant-aware): tenants, machines, configs, runs, secrets, audit, api_tokens, users/roles.
  • Worker skeleton + routing, per-tenant API-token auth for machine ingest.
  • Crypto model: zero-knowledge. Per-tenant keypair; machine-side encrypt (PowerShell .NET RSA-OAEP + AES-GCM envelope); browser-side decrypt (WebCrypto). Private key never stored on Cloudflare. Key generation + passphrase-wrapping + recovery/break-glass (Shamir M-of-N) designed and documented here.
  • Local repo restructure: engine/ (current src), backend/ (Worker+D1), portal/ (Pages), build/.
  • Tooling (cross-cutting): CI (.github/workflows/ci.yml) — Node + Pester + mkdocs build --strict; MkDocs Material docs site auto-deploying to Cloudflare Pages on push (docs/deploying-docs.md).
  • No real machine actions yet — just the spine + tests.
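One way to enforce "every table carries tenant_id" in the Worker is to route all reads through a helper that cannot build a query without a tenant filter. The sketch below is an assumption about how that could look, not the planned implementation: tenantSelect, ScopedQuery, and the serial column used in the usage note are hypothetical names, and the table list is taken from the schema bullet above.

```typescript
// Hypothetical helper: every tenant-owned read goes through this function,
// so the tenant_id predicate cannot be forgotten.
const TENANT_TABLES = ["machines", "configs", "runs", "secrets", "audit", "api_tokens"] as const;
type TenantTable = (typeof TENANT_TABLES)[number];

interface ScopedQuery {
  sql: string;
  params: (string | number)[];
}

function tenantSelect(
  table: TenantTable,
  tenantId: string,
  filters: Record<string, string | number> = {},
): ScopedQuery {
  if (!tenantId) throw new Error("tenantId is required");
  const clauses: string[] = ["tenant_id = ?"];
  const params: (string | number)[] = [tenantId];
  for (const column of Object.keys(filters)) {
    // Column names are interpolated into the SQL, so allow plain identifiers only.
    if (!/^[a-z_][a-z0-9_]*$/.test(column)) throw new Error(`bad column name: ${column}`);
    clauses.push(`${column} = ?`);
    params.push(filters[column]);
  }
  return { sql: `SELECT * FROM ${table} WHERE ${clauses.join(" AND ")}`, params };
}
```

For example, tenantSelect("machines", tenantId, { serial: "ABC123" }) yields a parameterized statement whose first bound value is always the tenant. In a Worker the result would typically be passed to the D1 binding's prepare(sql).bind(...params) call.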

Data classification (what gets encrypted in D1)

  • App-layer encrypted, retrievable (zero-knowledge): BitLocker recovery keys, local-admin passwords, BIOS/setup passwords, domain/Entra-join credentials, RMM installer keys/tokens, Wi-Fi PSKs.
  • Hashed, never retrievable: machine API tokens (salted hash, verify-only, so a leaked database row cannot be replayed as a token).
  • Field-level encrypted (PII): usernames / emails.
  • Tenant-isolated + access-controlled, not app-encrypted: run pass/fail, timings, phase names, asset inventory (serial/model/specs).
  • Append-only / tamper-evident (integrity over secrecy): audit log.
  • Platform at-rest encryption (Cloudflare D1) is assumed but treated as insufficient on its own for the retrievable-secret classes above.
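The classification above could be kept in code as a single lookup that the ingest path consults before writing a field. This is a sketch under assumptions: the field keys, the Handling labels, and both function names are invented for illustration and do not reflect an agreed schema.

```typescript
// Hypothetical labels for the handling classes listed above.
type Handling =
  | "zero_knowledge"   // app-layer encrypted, decrypted only in the browser
  | "hashed"           // salted hash, verify-only
  | "field_encrypted"  // PII columns
  | "tenant_isolated"  // access-controlled, not app-encrypted
  | "append_only";     // tamper-evident audit data

// Field keys are illustrative, not real column names.
const DATA_CLASSES: Record<string, Handling> = {
  bitlocker_recovery_key: "zero_knowledge",
  local_admin_password: "zero_knowledge",
  bios_password: "zero_knowledge",
  join_credential: "zero_knowledge",
  rmm_installer_token: "zero_knowledge",
  wifi_psk: "zero_knowledge",
  machine_api_token: "hashed",
  username: "field_encrypted",
  email: "field_encrypted",
  run_result: "tenant_isolated",
  asset_inventory: "tenant_isolated",
  audit_entry: "append_only",
};

// Unknown fields fail closed: nothing is stored until it has been classified.
function handlingFor(field: string): Handling {
  const handling = DATA_CLASSES[field];
  if (handling === undefined) throw new Error(`unclassified field: ${field}`);
  return handling;
}

// Only the last two classes may reach D1 without app-layer protection.
function mayStorePlaintext(field: string): boolean {
  const handling = handlingFor(field);
  return handling === "tenant_isolated" || handling === "append_only";
}
```

Failing closed on unclassified fields means a new column cannot be written in plaintext by accident; it has to be added to the table first.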

Phase 1 — Build system & two artifacts

  • build/Build-Release.ps1 → dist/ (runtime-only) + Inno compile.
  • Authoring installer (TotlProvision-Setup.exe) → Program Files; ProgramData for writable data.
  • Per-machine provision exe (TotlProvision-Provision.exe) → self-extract to C:\TotlProvision, run bootstrap; OOBE-triggerable via autounattend.xml FirstLogonCommands.
  • Add dist/ to .gitignore. Pester: test the include/exclude manifest (no secrets leak).

Phase 2 — Security hardening (must-have for 10k)

  • Per-machine randomized local-admin password (LAPS-style), unique per box.
  • Failsafe credential cleanup — guaranteed clear of autologon/DefaultPassword even on crash (a separate cleanup task), not just the happy path.
  • LSA-encrypted autologon password instead of plaintext registry; minimize the autologon window.
  • Drop .env secrets from USB media; prompt at runtime or pull from escrow.
  • Pin/verify all bootstrap-time remote downloads (extend SHA256 discipline to Choco/etc.).
  • Code-sign all .ps1 + the exes; move toward AllSigned.

Phase 3 — Reporting & escrow client (engine ↔ backend)

  • Engine module Totl.Report: POST run result (phases, pass/fail, timings, asset inventory) to Worker.
  • Engine module Totl.Escrow: generate + encrypt admin password and BitLocker key, upload ciphertext.
  • Retry/backoff + offline queue (store-and-forward when the site has no network).
  • Portal v1: fleet dashboard (pass/fail per client), secret reveal behind SSO, audit-log viewer.
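The retry/backoff and store-and-forward behaviour can be sketched in a few lines. The engine itself is PowerShell, so this TypeScript version only illustrates the logic; backoffDelayMs, flushQueue, the QueuedReport shape, and the 1 s base / 60 s cap are all assumed values, and a real client would probably add random jitter to the delay.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// Jitter is left out here to keep the sketch deterministic.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// One item in the offline queue (shape is hypothetical).
interface QueuedReport {
  id: string;
  body: string;
  attempts: number;
}

// Try to send each queued item once, in order.
// Items that fail are kept, with their attempt counter bumped, for the next pass.
function flushQueue(
  queue: QueuedReport[],
  send: (item: QueuedReport) => boolean,
): QueuedReport[] {
  const remaining: QueuedReport[] = [];
  for (const item of queue) {
    if (!send(item)) {
      remaining.push({ id: item.id, body: item.body, attempts: item.attempts + 1 });
    }
  }
  return remaining;
}
```

A caller would persist the returned array to disk and wait backoffDelayMs(attempts) before the next flush, so a site with no network simply accumulates reports until connectivity returns.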

Phase 4 — Deployment feature parity (what the pros ship)

  • BitLocker enable + key escrow.
  • Dell Command Update driver/firmware phase.
  • Domain / Entra join automation per client.
  • Windows edition upgrade (Home→Pro key injection).
  • Locale / time zone / keyboard / Wi-Fi profile phase.
  • Autopilot hardware-hash harvest (export to portal) for Intune clients.
  • Asset inventory export to dashboard.
  • Offline mode — cached installer source.

Phase 5 — Native migration (replace paid Profwiz)

  • Totl.Migrate native SID-reassign for local→local + on-prem domain (ProfileList rewrite, icacls /substitute, NTUSER.DAT hive ACL rewrite). VM-gated, fully unit-tested before any client box.
  • Free defaults: robocopy data copy + OneDrive Known Folder Move. Profwiz/USMT remain the documented Entra in-place fallback until native is battle-tested.

Phase 6 — UX/UI & operability

  • GUI: presets (Standard/Minimal/Clear, à la WinUtil), save/load named client configs, searchable app list, validation feedback.
  • Live provisioning progress window + end-of-run PASS/FAIL summary (not "read the log").
  • Preflight gate phase: AC power, free disk, network, TPM/SecureBoot/OS build sanity — fail fast.
  • Post-provision validation phase: confirm apps installed, telemetry off, rename took.
  • WinUtil "Standard" tweak parity, reimplemented natively from a pinned WinUtil commit (no runtime dependency on christitus.com). Screenshot-captured defaults below; all exposed as GUI toggles.
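The preflight gate reduces to a list of named checks evaluated against facts gathered from the machine. The sketch below shows that shape only; the MachineFacts fields, the check names, and both thresholds are placeholders rather than values from this plan, and the real gate would live in the PowerShell engine.

```typescript
// Facts the engine would gather before changing anything (field names are hypothetical).
interface MachineFacts {
  onAcPower: boolean;
  freeDiskGb: number;
  networkUp: boolean;
  tpmReady: boolean;
  secureBoot: boolean;
  osBuild: number;
}

interface PreflightCheck {
  name: string;
  passes: (facts: MachineFacts) => boolean;
}

// Placeholder thresholds, not values taken from the roadmap.
const MIN_FREE_DISK_GB = 20;
const MIN_OS_BUILD = 19045;

const PREFLIGHT_CHECKS: PreflightCheck[] = [
  { name: "ac-power", passes: (f) => f.onAcPower },
  { name: "free-disk", passes: (f) => f.freeDiskGb >= MIN_FREE_DISK_GB },
  { name: "network", passes: (f) => f.networkUp },
  { name: "tpm", passes: (f) => f.tpmReady },
  { name: "secure-boot", passes: (f) => f.secureBoot },
  { name: "os-build", passes: (f) => f.osBuild >= MIN_OS_BUILD },
];

// Returns the names of the failed checks; an empty array means the gate is open.
function runPreflight(facts: MachineFacts): string[] {
  return PREFLIGHT_CHECKS.filter((check) => !check.passes(facts)).map((check) => check.name);
}
```

Reporting every failed check at once, rather than stopping at the first, lets the technician fix power and disk space in one visit while the run still refuses to start.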

Phase 7 — Product layer (sellable to MSPs)

  • RBAC — engineer vs admin; who edits configs vs. who reveals secrets.
  • White-label branding — installer, GUI, portal.
  • Licensing/activation — entitlement enforcement.
  • Versioned, integrity-signed package/app library.
  • Desired-state / drift remediation — re-apply config on a schedule (the Immy.bot differentiator).
  • Rollback/undo of tweaks.
  • VM test harness to validate a config before mass deployment.
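Desired-state remediation starts with a diff between the configured tweak values and what the machine currently reports. A minimal sketch of that comparison follows; findDrift, TweakState, and DriftItem are hypothetical names, and darkTheme in the usage note is an assumed key (disableTelemetry is the only setting key named in this plan).

```typescript
// Setting name -> whether the tweak should be (or is) applied.
type TweakState = Record<string, boolean>;

interface DriftItem {
  setting: string;
  desired: boolean;
  actual: boolean | undefined; // undefined: the machine did not report this setting
}

// Compare the desired config with the observed state and list what must be re-applied.
// Settings present only in `actual` are ignored; the config is the source of truth.
function findDrift(desired: TweakState, actual: TweakState): DriftItem[] {
  const drift: DriftItem[] = [];
  for (const setting of Object.keys(desired).sort()) {
    if (actual[setting] !== desired[setting]) {
      drift.push({ setting, desired: desired[setting], actual: actual[setting] });
    }
  }
  return drift;
}
```

A scheduled run would call findDrift with the tenant's config and a fresh scan, re-apply only the listed settings, and report the list to the portal. For instance, a machine where disableTelemetry was switched back off would produce a single drift entry for that key even if darkTheme still matched.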

Best-practice review of ALL existing scripts (runs through Phases 2 & 6)

Per file: Set-StrictMode, $ErrorActionPreference, param validation, [CmdletBinding()], -WhatIf/-Confirm on destructive ops, no plaintext secrets, TLS 1.2, idempotency/reboot-resume, consistent Write-TotlLog. Each change flagged with rationale; no silent rewrites; cross-file signature impacts traced first.

Known smells to fix: $args shadows the automatic variable (bootstrap, Apps); no error handling on the Winlogon Set-ItemProperty; bootstrap robocopy excludes out — align with dist; no -WhatIf on destructive ops. Files: bootstrap.ps1, Invoke-Provision.ps1, all src/modules/*, gui/New-TotlConfig.ps1, oobe/Build-OobeMedia.ps1, scripts/Migrate-UserProfile.ps1, scripts/Initialize-Git.ps1, data/Update-AppxCatalog.ps1.

WinUtil "Standard" parity (captured from screenshot)

  • Essential, ON: Create Restore Point, Delete Temp Files, Disable ConsumerFeatures, Disable Telemetry, Disable Activity History, Disable GameDVR, Disable Hibernation, Disable Homegroup, Prefer IPv4 over IPv6, Disable Location Tracking, Disable Storage Sense, Disable Wifi-Sense, Set Services to Manual.
  • Essential, OFF: Enable End Task w/ Right Click*, Run Disk Cleanup*, Terminal default PS5→PS7, Disable PS7 Telemetry, Set Hibernation default (laptops), Debloat Edge.
  • Preferences, ON: Dark Theme*, Snap Window, Snap Assist Flyout, Snap Assist Suggestion, Mouse Acceleration, Sticky Keys, Show File Extensions, Search Button in Taskbar, Center Taskbar Items, NumLock*.
  • Preferences, OFF: Bing Search in Start Menu, Verbose Logon, Show Hidden Files, Task View Button, Widgets Button, Detailed BSoD.

* = Robert's red-boxed Totl defaults (Dark on, NumLock on, End Task off, Disk Cleanup off). All tweaks are exposed as per-build GUI toggles; the values above are the loaded defaults. Note: disableTelemetry is currently false in the default config; Standard sets it on, so the default will change.


Resolved / open items

Resolved: installer = Inno EXE; install path = Program Files + ProgramData; launcher modes (Configure / Create USB / Provision); End Task + Disk Cleanup off-by-default toggles; native tweaks from pinned commit; D1 / Cloudflare Access / tenant-aware / passwords+BitLocker escrow; Inno Setup verified on build machine.

Open before Phase 0 code:

  1. Per-machine provision exe packaging: Inno (recommended) vs IExpress.
  2. Cloudflare account/zone, D1, Pages, and Access (Entra app registration) provisioned by you. I'll list exactly what to create when Phase 0 starts.
  3. Is the repo restructure (engine/, backend/, portal/) acceptable? It touches paths in current scripts.

Suggested execution order

P0 (foundations) → P2 (security) → P1 (build/installer) → P3 (reporting/escrow) → P6 (UX + preflight + tweaks) → P4 (deployment parity) → P5 (migration) → P7 (product layer). Security before convenience.

Docs to update per phase

README.md, docs/RUNBOOK.md, docs/SECURITY.md (escrow/crypto/SSO), docs/ARCHITECTURE.md (new), CHANGELOG.md, VERSION.