On AI as Force Multiplier

Seed note

After building a production PII detection engine with LLMs at CommandK and now working on AI-powered workspace planning at Saltmine, I have some evolving thoughts on where AI fits in engineering work.

The demo-to-production gap

The gap between an LLM demo and a production LLM system is enormous. Demos handle the happy path. Production has to answer:

  • What happens when the model hallucinates?
  • What happens when latency exceeds your budget?
  • What happens when the API is down?
  • What happens when the model changes behavior after a provider update?
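
As a rough illustration of the first three questions, here is a minimal sketch of a guarded model call. Everything in it is an assumption for the sake of the example: the label set, the `classify_with_fallback` name, and the idea that a cheaper deterministic fallback exists. It is not the CommandK implementation. It shows one way to enforce a latency budget, survive an API error, and reject output outside the expected label set.

```python
import concurrent.futures as cf
from typing import Callable, Tuple

# Hypothetical label set; anything else is treated as an invalid answer.
ALLOWED_LABELS = {"pii", "not_pii"}


def classify_with_fallback(
    text: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    executor: cf.Executor,
    budget_s: float,
) -> Tuple[str, str]:
    """Return (label, source), using `fallback` on timeout, error, or invalid output."""
    future = executor.submit(primary, text)
    try:
        label = future.result(timeout=budget_s)
    except cf.TimeoutError:
        # The worker thread is not killed; we only stop waiting for it.
        return fallback(text), "fallback:timeout"
    except Exception:
        # Covers the "API is down" case: any error raised by the primary call.
        return fallback(text), "fallback:error"
    if label not in ALLOWED_LABELS:
        # Output outside the expected set is handled like a hallucinated answer.
        return fallback(text), "fallback:invalid"
    return label, "primary"
```

Returning the source alongside the label makes it possible to track how often the fallback fires, which is one cheap signal that a provider update has changed model behavior.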

At CommandK, we ran three detection strategies in parallel: regex patterns, ML classifiers, and LLM-based classification. No single approach was reliable enough alone. The ensemble was.
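
The note does not say how the three strategies were combined, so the following is only a sketch of one common option: run the detectors concurrently and keep labels that at least `min_votes` of them agree on. The detector functions, the email regex, and the voting rule are all hypothetical stand-ins; the ML and LLM detectors are stubs rather than real models.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set

Detector = Callable[[str], Set[str]]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_detector(text: str) -> Set[str]:
    return {"email"} if EMAIL_RE.search(text) else set()


def ml_detector_stub(text: str) -> Set[str]:
    # Stand-in for a trained classifier (hypothetical rule).
    return {"email"} if "@" in text else set()


def llm_detector_stub(text: str) -> Set[str]:
    # Stand-in for an LLM call (hypothetical rules).
    found: Set[str] = set()
    if "@" in text:
        found.add("email")
    if "ssn" in text.lower():
        found.add("ssn")
    return found


def ensemble_detect(
    text: str, detectors: List[Detector], min_votes: int = 2
) -> Dict[str, int]:
    """Run all detectors concurrently; keep labels with at least `min_votes` votes."""
    with ThreadPoolExecutor(max_workers=max(1, len(detectors))) as pool:
        results = list(pool.map(lambda detect: detect(text), detectors))
    votes = Counter(label for found in results for label in found)
    return {label: n for label, n in votes.items() if n >= min_votes}
```

In a setup like this, labels that fall below the vote threshold do not have to be discarded; they could instead be routed to a human reviewer.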

Force multiplier, not replacement

The best AI integrations I have built amplify human capability rather than replace it:

  • The PII detection engine suggests findings, but a human reviews and confirms
  • The workspace recommendation engine generates options, but a planner evaluates and decides
  • Claude Code proposes architecture, but I review and adapt

The pattern: AI handles the exhaustive search, humans handle the judgment.
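
One way to encode that pattern in a data model, sketched under the assumption that AI output is stored as suggestions with a review status. The `Suggestion`, `review`, and `actionable` names are illustrative and do not come from any of the systems mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Status(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


@dataclass
class Suggestion:
    summary: str
    status: Status = Status.PENDING  # AI output always starts unreviewed
    reviewer: str = ""


def review(suggestion: Suggestion, reviewer: str, accept: bool) -> Suggestion:
    """Record a human decision; the only path out of PENDING."""
    suggestion.status = Status.CONFIRMED if accept else Status.REJECTED
    suggestion.reviewer = reviewer
    return suggestion


def actionable(suggestions: List[Suggestion]) -> List[Suggestion]:
    """Downstream steps only see suggestions a human has confirmed."""
    return [s for s in suggestions if s.status is Status.CONFIRMED]
```

The point of the design is that nothing the model generates can reach a downstream step without passing through `review`.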

What I am still figuring out

  • How to evaluate LLM outputs systematically when there is no single correct answer
  • The right balance between cost and quality when you are making thousands of API calls per day
  • How to build user trust in AI-assisted workflows where the stakes are real
  • Whether RAG or fine-tuning is the right approach for domain-specific knowledge (leaning RAG for now)

This note is a seed. These thoughts will mature as I build more.