Optimizing AI Inferencing for Agentic Operations in Manufacturing: Calvin Cooper, Co-Founder & COO, Neurometric AI - Industry40.tv

# AI in Manufacturing Podcast: Episode Show Notes

## Episode: Optimizing AI Inference for Agentic Operations in Manufacturing

**Podcast Name:** AI in Manufacturing Podcast (Industry40.tv)

**Episode Title:** Optimizing AI Inference for Agentic Operations in Manufacturing

**Guest:** Calvin Cooper, Co-Founder & COO, Neurometric.ai

**Host:** Kudzai Manditereza

---

## 1. Episode Summary

This episode explores why manufacturing companies struggle to scale AI from pilot to production, and how inference orchestration and small language models (SLMs) offer a practical path forward. Calvin Cooper, Co-Founder and COO of Neurometric.ai, joins host Kudzai Manditereza to break down why routing all AI tasks through a single frontier model becomes a cost and reliability liability at scale. Cooper draws on his background in venture capital, private equity AI rollups at Pilot Wave Holdings, and AI policy research at the Milken Institute to argue that the future of industrial AI is not one model that knows everything, but a coordinated system of specialized models that each know their job. The conversation covers Neurometric's AI maturity framework, customer results showing 10x cost and latency improvements, the concept of catastrophic forgetting, and why manufacturing leaders need to adopt a startup execution mindset rather than over-analyzing use cases. Leaders seeking to cut AI inference costs and accelerate deployment will find actionable strategies throughout.

---

## 2. Key Questions Answered in This Episode

- Why do 95% of AI proof-of-concepts in manufacturing never make it to production?

- How should manufacturers select their first AI use case instead of getting stuck in analysis paralysis?

- What is inference orchestration and why does it matter for scaling AI in manufacturing?

- Why is relying on a single large language model a liability for industrial AI at scale?

- What are small language models (SLMs) and how do they deliver faster, cheaper, and more accurate AI?

- What is catastrophic forgetting and how does it affect AI deployments in manufacturing?

- How can manufacturers avoid vendor lock-in when building AI systems?

---

## 3. Episode Highlights with Timestamps

- **[00:00]** — **Introduction** — Host Kudzai Manditereza introduces the topic of optimizing AI inference for agentic manufacturing operations and welcomes Calvin Cooper.

- **[00:36]** — **Calvin Cooper's Background** — Cooper describes Neurometric.ai's mission to "make intelligence essentially free," his role at Pilot Wave Holdings, and his AI policy work at the Milken Institute.

- **[03:32]** — **The Pilot-to-Production Gap** — Discussion on why the vast majority of AI proof-of-concepts fail to reach production and what the startup world can teach manufacturers.

- **[06:52]** — **The Flywheel, Not the Pilot** — Cooper argues that companies mistakenly think the pilot is the product, when what they should be building is a rapid feedback loop of shipping, learning, and iterating.

- **[08:01]** — **Selecting Your First AI Use Case** — Advice on why "just pick and execute" often beats months of use case analysis, with examples of low-hanging fruit across white-collar and shop-floor workflows.

- **[11:31]** — **Why One Frontier Model Doesn't Scale** — Cooper explains how relying on a single LLM becomes a cost and latency bottleneck, citing AT&T's public shift to orchestration and multi-agent stacks.

- **[14:44]** — **Intelligence vs. Reliability** — Why reliability—not raw intelligence—determines whether AI is allowed to scale in production environments.

- **[16:27]** — **Task-Specific SLMs and Fine-Tuning** — How specialized small language models deliver faster, cheaper, and more accurate results through fine-tuning and production data feedback loops.

- **[18:13]** — **Neurometric's AI Maturity Framework** — Walk-through of how organizations progress from "get something to work" through cost optimization to full AI system orchestration.

- **[20:32]** — **Catastrophic Forgetting Explained** — Cooper defines catastrophic forgetting and contextualizes it for manufacturing leaders.

- **[24:16]** — **The Future: Coordinated Model Teams** — A vision of AI systems that automatically select the right model for each task, abstracting away vendor choice entirely.

- **[28:12]** — **Neurometric Platform Overview** — Details on the SLM Marketplace, model analysis dashboards, and the self-improving system roadmap.

- **[33:33]** — **Prediction: The Factory of the Future** — Cooper's forecast on Jevons paradox, nearshoring, and why competing on technology and automation—not labor—defines the next era of manufacturing.

---

## 4. Key Takeaways

- **Build the flywheel, not the pilot:** The real KPI for early AI efforts isn't proving a specific use case—it's building a team that can ship, learn, and iterate quickly. The feedback loop is the product.

- **Just pick and execute:** Spending three months analyzing use cases costs more in lost learning than picking an imperfect starting point and iterating. Low-hanging fruit exists across both shop-floor and back-office workflows.

- **One frontier model is a scaling liability:** Routing all tasks through a single large language model creates unsustainable cost and latency at scale. AT&T cut costs by 90% by shifting to orchestration with task-specific models.

- **Small language models deliver outsized results:** Fine-tuned SLMs can be faster, cheaper, and more accurate than general-purpose LLMs for repetitive, well-defined tasks—because they don't need to know world history to handle a purchase order.

- **Avoid vendor lock-in from day one:** Build AI systems with the assumption that you'll need to swap models. Abstraction layers let you shift from GPT-4o to Llama Maverick and see 10x cost and 4x latency improvements.

- **Reliability beats intelligence for production AI:** Models that are impressively capable in demos may be non-deterministic and unreliable at scale. In manufacturing, consistent accuracy is the prerequisite for deployment.

- **The time to act is now:** Billions in capital are flowing into AI rollups targeting industrial businesses. Companies that wait risk being acquired or outcompeted by those that moved first.
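
The vendor lock-in takeaway above can be made concrete with a thin abstraction layer. The following is a minimal sketch, not Neurometric's implementation: `TextModel`, `EchoBackend`, `ModelRegistry`, the task name, and the backend labels are all hypothetical names introduced for illustration, and `EchoBackend` only echoes its input where a real backend would call a vendor SDK.

```python
from typing import Protocol


class TextModel(Protocol):
    """Minimal interface the application depends on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in for a vendor client; a real one would call that vendor's SDK."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str) -> str:
        # Echo the prompt with a label so the active backend is visible.
        return f"[{self.label}] {prompt}"


class ModelRegistry:
    """Maps a task name to whichever backend currently serves it."""

    def __init__(self) -> None:
        self._backends: dict[str, TextModel] = {}

    def bind(self, task: str, backend: TextModel) -> None:
        self._backends[task] = backend

    def run(self, task: str, prompt: str) -> str:
        return self._backends[task].complete(prompt)


registry = ModelRegistry()
registry.bind("summarize_shift_report", EchoBackend("vendor-a-large"))
before = registry.run("summarize_shift_report", "Line 3 downtime: 42 min")

# Swapping the model is a one-line rebinding; calling code is untouched.
registry.bind("summarize_shift_report", EchoBackend("open-weights-small"))
after = registry.run("summarize_shift_report", "Line 3 downtime: 42 min")

print(before)
print(after)
```

Because callers only know the task name and the `complete` method, replacing one model with another does not require changes to application code, which is the property the takeaway argues for.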

---

## 5. Notable Quotes

> "Most doors are two-way doors. We tend to overestimate risk associated with getting something wrong, and underestimate the opportunity of getting something right." — Calvin Cooper, COO at Neurometric.ai

> "The problem is that you think the pilot is what you're building. What you're actually building is a feedback loop." — Calvin Cooper, COO at Neurometric.ai

> "Intelligence gets headlines, but reliability determines whether AI is allowed to scale." — Calvin Cooper, COO at Neurometric.ai

> "You don't need to know world history to handle some repetitive tasks." — Calvin Cooper, COO at Neurometric.ai

> "The future is now, just not evenly distributed." — Calvin Cooper, COO at Neurometric.ai

---

## 6. Key Concepts Explained

**Inference Orchestration**

Definition: Inference orchestration is the automated routing of AI tasks to the optimal model based on cost, latency, and accuracy requirements, rather than sending all queries to a single large language model.

Why it matters: It enables manufacturers to scale AI deployments without prohibitive costs or performance bottlenecks.

Episode context: Cooper describes how AT&T used orchestration to cut AI costs by 90% when scaling to 27 billion tokens per day, and positions Neurometric as an off-the-shelf solution for this capability.
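
The routing idea in the definition above can be illustrated with a small sketch. It assumes one simple policy, "pick the cheapest model that meets the task's accuracy and latency bounds"; the episode does not specify how Neurometric or AT&T actually route requests. The model names, prices, latencies, and accuracy figures are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    """Hypothetical catalog entry; all figures are illustrative, not measured."""
    name: str
    cost_per_1k_tokens: float   # USD, assumed
    p95_latency_ms: float       # assumed
    accuracy: dict[str, float]  # task type -> assumed offline eval accuracy


@dataclass
class TaskRequirements:
    task_type: str
    min_accuracy: float
    max_latency_ms: float


def route(task: TaskRequirements, catalog: list[ModelProfile]) -> ModelProfile:
    """Return the cheapest model that meets the accuracy and latency bounds."""
    eligible = [
        m for m in catalog
        # A model with no recorded accuracy for this task is treated as unfit.
        if m.accuracy.get(task.task_type, 0.0) >= task.min_accuracy
        and m.p95_latency_ms <= task.max_latency_ms
    ]
    if not eligible:
        raise LookupError(f"no model satisfies requirements for {task.task_type!r}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)


CATALOG = [
    ModelProfile("frontier-general", 0.0100, 1200.0,
                 {"po_extraction": 0.97, "root_cause_analysis": 0.93}),
    ModelProfile("slm-po-extractor", 0.0008, 150.0,
                 {"po_extraction": 0.98}),
]

po_task = TaskRequirements("po_extraction", min_accuracy=0.95, max_latency_ms=500.0)
rca_task = TaskRequirements("root_cause_analysis", min_accuracy=0.90, max_latency_ms=2000.0)

print(route(po_task, CATALOG).name)   # the small specialist qualifies and is cheaper
print(route(rca_task, CATALOG).name)  # only the general model covers this task
```

In this toy catalog the repetitive purchase-order task goes to the small specialist model, while the open-ended analysis task falls back to the general model. A production router would also need measured accuracy per task and live latency data rather than static figures.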

**Small Language Models (SLMs)**

Definition: Small language models are compact, task-specific AI models with fewer parameters that are fine-tuned for narrow use cases, delivering faster and cheaper inference than general-purpose large language models.

Why it matters: SLMs allow manufacturers to run AI at production scale without the cost and latency penalties of frontier models.

Episode context: Cooper explains that Neurometric's SLM Marketplace lets users browse, download, and deploy task-specific models, with customers seeing 10x improvements in cost and latency.

**Catastrophic Forgetting**

Definition: Catastrophic forgetting occurs when a neural network is trained on new tasks and, as a result, abruptly loses its ability to perform tasks it had previously learned.

Why it matters: It's a fundamental challenge when trying to update or expand AI systems in production without degrading existing performance.

Episode context: Cooper notes that this is a known research challenge, that billions of dollars of AI research are going toward solving it, and that manufacturing leaders should not let it become a reason for inaction.
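
A toy example can make the definition tangible. The sketch below is an illustration added here, not something from the episode: it uses a one-parameter model `y = w * x` trained by plain gradient descent, first on a hypothetical task A (`y = 2x`) and then only on a hypothetical task B (`y = -3x`). Because both tasks share the single weight, training on B overwrites what was learned for A. Real networks have many more parameters and forget less cleanly, but the mechanism is the same.

```python
def mse(w: float, data: list[tuple[float, float]]) -> float:
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w: float, data: list[tuple[float, float]],
          lr: float = 0.05, epochs: int = 200) -> float:
    """Full-batch gradient descent on the single shared weight w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # task A: y = 2x (assumed)
task_b = [(x, -3.0 * x) for x in xs]  # task B: y = -3x (assumed)

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A has been learned

w = train(w, task_b)            # continue training on task B only
loss_a_after = mse(w, task_a)   # large: task A has been overwritten
loss_b_after = mse(w, task_b)   # near zero: task B has been learned

print(f"task A loss before B: {loss_a_before:.6f}")
print(f"task A loss after B:  {loss_a_after:.3f}")
print(f"task B loss after B:  {loss_b_after:.6f}")
```

The error on task A rises sharply after the second training run even though nothing about task A changed, which is the behavior the definition describes.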
