Agentic AI · Performance Analysis · MS Information Capstone
Agentic AI isn't about replacing engineers.
It's about fixing the right problems.
A capstone project for North River Diagnostics — a manufacturing company losing time, consistency, and quality coverage to manual process capability assessments. The answer was an agentic workflow that doesn't just automate the math; it enforces the analytical discipline engineers should follow but rarely have time for.
Azure AI Foundry
Prompt Flow
MCP Tools
RAG · Azure AI Search
GPT-4o
Azure Blob Storage
Python · Multimodal
The problem
Three engineers. Three interpretations. One process.
North River Diagnostics manufactures diagnostic components with tight performance specs. Their engineers spent most of their analysis time on low-value work — extracting data, cleaning spreadsheets, running calculations — before they could think about what the numbers actually meant. And when they did interpret, they often disagreed.
⏱️
Slow response time
Capability and behavior assessments took 1.5–2.5 hours per process. Corrective action was delayed by days while engineers worked through the backlog manually.
🔀
Inconsistent interpretation
The same Cpk value was described as "capable enough," "marginal," and "not capable at all" by different engineers. Leadership received conflicting signals on the same process.
🕳️
Limited coverage
Only the most visible problem processes got regular reviews. Entire production lines ran with undetected risk because there weren't enough engineers to cover them.
⚠️
The core analytical failure
Capability indices like Cpk are only valid when the process is statistically stable. An unstable process makes Cpk meaningless — yet engineers routinely calculated and acted on Cpk without checking stability first. This single failure directed improvement resources at the wrong problem: firefighting variation instead of redesigning capability.
The workflow
Eight nodes. One decision pipeline.
Built in Azure AI Foundry Prompt Flow: a workflow where each node has a single, clear responsibility, sequential except for the two analysis nodes that run in parallel. Stability is assessed before capability is interpreted. Every time, without exception.
Ingestion
1
data_access · Python
Cloud data retrieval
Connects to Azure Blob Storage and retrieves the process CSV. Decouples data access from analysis logic — the workflow pulls programmatically, mirroring real production systems.
2
process_data_formatter · LLM (GPT-4o)
Intelligent preprocessing
Uses an LLM with a Jinja2 prompt template to normalize raw CSV data — identifying measurement columns, aligning with spec limits (LSL, USL), and outputting a consistent data object for all downstream nodes.
Parallel analysis
3a
cap_metrics · Python + MCP Agent
Capability metrics calculation
On-the-fly agent with a Process Capability MCP tool. Computes Cp, Cpk, Pp, Ppk deterministically — not via LLM estimation. Results are auditable and identical across runs.
MCP · deterministic
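As a rough illustration of what a deterministic capability calculation can look like, here is a minimal sketch in plain Python. It is not the project's MCP tool. The function name `capability_metrics`, the use of the average moving range divided by d2 = 1.128 for within-process sigma, and the sample standard deviation for overall sigma are all assumptions chosen because they are common conventions for individual measurements.

```python
from statistics import mean, stdev

D2_N2 = 1.128  # standard SPC constant for moving ranges of two points


def capability_metrics(values, lsl, usl):
    """Return Cp, Cpk, Pp, Ppk for a list of individual measurements.

    Hypothetical helper: within sigma is estimated from the average
    moving range, overall sigma from the sample standard deviation.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements")

    x_bar = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_within = mean(moving_ranges) / D2_N2
    sigma_overall = stdev(values)
    if sigma_within == 0 or sigma_overall == 0:
        raise ValueError("no variation in data; indices are undefined")

    def indices(sigma):
        spread = (usl - lsl) / (6 * sigma)                      # Cp or Pp
        centered = min(usl - x_bar, x_bar - lsl) / (3 * sigma)  # Cpk or Ppk
        return spread, centered

    cp, cpk = indices(sigma_within)
    pp, ppk = indices(sigma_overall)
    return {
        "mean": x_bar,
        "sigma_within": sigma_within,
        "sigma_overall": sigma_overall,
        "Cp": cp,
        "Cpk": cpk,
        "Pp": pp,
        "Ppk": ppk,
    }
```

Because the function is pure arithmetic on its inputs, calling it twice on the same data returns the same dictionary, which is the property the node relies on.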
3b
chart_creator · Python + MCP Agent
SPC chart generation
On-the-fly agent with a chart-generation MCP tool. Produces an Individuals chart and Moving Range chart showing process behavior over time. Outputs a URL for downstream visual interpretation.
MCP · visual artifact
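The lines drawn on an Individuals and Moving Range chart come from a short calculation, sketched below under stated assumptions. The function names `imr_limits` and `points_beyond_limits` are hypothetical, the constants 2.66 and 3.267 are the standard factors for moving ranges of two consecutive points, and the actual chart-generation MCP tool may compute or render its limits differently. Plotting is left out.

```python
from statistics import mean

E2 = 2.66    # 3 / d2 with d2 = 1.128, for the Individuals chart
D4 = 3.267   # upper-limit factor for the Moving Range chart


def imr_limits(values):
    """Compute center lines and control limits for an I-MR chart."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "center": x_bar,
        "ucl": x_bar + E2 * mr_bar,
        "lcl": x_bar - E2 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": D4 * mr_bar,  # the MR chart's lower limit is zero
        "moving_ranges": moving_ranges,
    }


def points_beyond_limits(values, limits):
    """Return the indices of individual values outside the I-chart limits."""
    return [
        i for i, v in enumerate(values)
        if v > limits["ucl"] or v < limits["lcl"]
    ]
```

A count like "OOC points (I-chart)" in a report would correspond to the length of the list returned by `points_beyond_limits`.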
↑ Nodes 3a and 3b run in parallel from the formatter output
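In Prompt Flow this concurrency presumably comes from the flow graph itself, since neither node depends on the other. As a plain-Python analogy only, the same fan-out can be sketched with a thread pool; `run_parallel` and its two callable parameters are hypothetical stand-ins for the real nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(formatted, cap_metrics_node, chart_creator_node):
    """Run two independent nodes on the same formatter output.

    Both callables receive the same formatted data object; the results
    are returned together once both have finished.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        metrics_future = pool.submit(cap_metrics_node, formatted)
        chart_future = pool.submit(chart_creator_node, formatted)
        # .result() blocks until done and re-raises any node exception.
        return metrics_future.result(), chart_future.result()
```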
Interpretation
4
process_behavior · Python + Multimodal Agent
Visual stability assessment
A vision-capable GPT-4o agent "looks at" the SPC chart image and applies process behavior rules — detecting points beyond control limits, runs, trends, and cycles. Produces a clear stability verdict before any capability interpretation begins.
Multimodal · vision reasoning
5
aggregator · Python → Persistent RAG Agent (Foundry)
Standards-grounded synthesis
Calls a persistent Aggregator Agent hosted in Azure AI Foundry with a RAG knowledge base built from NRD's internal capability guidelines. Synthesizes capability results and stability findings into a leadership-ready narrative — always referencing internal standards, never hallucinating thresholds.
RAG · governed interpretation
Output
6
report_generator + report_writer · Python
Structured report packaging
Transforms the aggregated narrative into a structured HTML report, publishable to Azure Blob Storage. Separates analysis logic from presentation — the same report format regardless of which process or dataset is analyzed.
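One way to keep presentation separate from analysis is a single rendering function that accepts only finished results. The sketch below is an assumption about how such a step could look, not the project's report code: `render_report` and its parameters are hypothetical, and the upload to Azure Blob Storage is left out.

```python
from html import escape


def render_report(process_name, verdict, metrics, narrative):
    """Build a minimal HTML report from already-computed results."""
    rows = "\n".join(
        f"    <tr><th>{escape(str(name))}</th><td>{escape(str(value))}</td></tr>"
        for name, value in metrics.items()
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<body>\n"
        f"  <h1>{escape(process_name)}</h1>\n"
        f"  <p><strong>Verdict:</strong> {escape(verdict)}</p>\n"
        "  <table>\n"
        f"{rows}\n"
        "  </table>\n"
        f"  <p>{escape(narrative)}</p>\n"
        "</body>\n"
        "</html>\n"
    )
```

Escaping every value keeps the layout identical no matter what text the upstream agents produce.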
Agent patterns
Three patterns. Each chosen deliberately.
The system demonstrates three distinct agentic AI patterns, and the design explicitly matched each pattern to the task it handles best.
Pattern 1 · On-the-fly MCP
Deterministic calculation agents
Used for capability metrics (Cp, Cpk, Pp, Ppk) and chart generation. Created per-run, attached to a specific MCP tool, discarded after use. Reliable, auditable, identical output every time.
Why here: Numerical calculations must be deterministic. LLMs should never estimate statistics that have a right answer.
Pattern 2 · Multimodal on-the-fly
Visual stability interpreter
Uses GPT-4o's vision capability to analyze the SPC chart image directly. Detects out-of-control signals, runs, and trends the way a trained engineer would — from the visual pattern, not just numeric thresholds.
Why here: Stability assessment is inherently visual. Encoding all chart rules as numeric logic would be brittle and miss pattern-level signals.
Pattern 3 · Persistent RAG agent
Governed corporate expert
A persistent agent hosted in Azure AI Foundry with a knowledge base built from NRD's internal capability PDFs. Produces consistent, standards-aligned narrative across every run. Updating the RAG documents updates the "corporate standard" without touching the workflow.
Why here: Interpretation must reflect internal standards — not LLM priors. RAG is the mechanism that makes outputs governable and auditable.
Sample outputs
Two scenarios. Real process data. Opposite conclusions.
Both datasets are Hot Metal Delivery Times — 117 observations each, spec limits LSL 35 / USL 65. The workflow ran both and produced structurally identical reports with genuinely different findings.
Scenario A · Stable but incapable
Hot Metal Delivery Times — In Control
The process is predictable and stable — no points beyond control limits. But the variation is far too wide for the specification window. The correct action is process redesign, not firefighting. The agent made this distinction clearly and explicitly.
n observations: 117
process mean: 48.46 (target: 50)
within σ: 11.33
Cp: 0.441 (incapable)
Cpk: 0.396 (incapable)
Pp / Ppk: 0.411 / 0.369
OOC points (I-chart): 0 (stable)
MR violations: 1 (samples ~95–120)
Agent verdict: Statistically stable but incapable. Cpk valid to interpret. Root cause is variation width and centering, not instability. Action: redesign, not corrective action.
Scenario B · Unstable and incapable
Hot Metal Delivery Times — Out of Control
Five points beyond control limits on the Individuals chart. Five violations on the MR chart. The mean has shifted to 64.25 — near the upper spec limit. The agent correctly refused to anchor recommendations on Cpk and redirected focus to finding and eliminating assignable causes first.
n observations: 117
process mean: 64.25 (near USL 65)
within σ: 13.97
Cp: 0.358 (incapable)
Cpk: 0.018 (near zero)
Pp / Ppk: 0.270 / 0.014
OOC points (I-chart): 5 (unstable)
MR violations: 5 (high instability)
Agent verdict: Statistically unstable. Cpk of 0.018 is not reliable — do not use for decisions. Action: eliminate assignable causes before any capability interpretation.
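The Cp and Cpk figures in both scenarios can be checked by hand from the reported mean, within σ, and spec limits, using the standard definitions Cp = (USL − LSL) / 6σ and Cpk = min(USL − mean, mean − LSL) / 3σ. The helper name `cp_cpk` below is hypothetical; the inputs are the values listed above.

```python
def cp_cpk(mean, sigma_within, lsl, usl):
    """Return (Cp, Cpk) rounded to three decimals."""
    cp = (usl - lsl) / (6 * sigma_within)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma_within)
    return round(cp, 3), round(cpk, 3)


# Scenario A: mean 48.46, within sigma 11.33, LSL 35, USL 65
print(cp_cpk(48.46, 11.33, 35, 65))  # (0.441, 0.396)

# Scenario B: mean 64.25, within sigma 13.97, LSL 35, USL 65
print(cp_cpk(64.25, 13.97, 35, 65))  # (0.358, 0.018)
```

In Scenario B the mean sits only 0.75 units below the USL, which is why Cpk collapses toward zero even though Cp changes comparatively little.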
Design principles
Four rules that kept the system trustworthy.
🔒
Stability gating
The workflow enforces a hard rule: stability is always assessed before capability is interpreted. The process_behavior node runs before the aggregator receives any results. This cannot be skipped, regardless of how clean the data looks.
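In the workflow the gate is enforced by node ordering. Expressed as code, the same rule might look like the sketch below, where `StabilityVerdict`, `gate_capability`, and the returned keys are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StabilityVerdict:
    stable: bool
    ooc_points: int  # points beyond control limits on the I-chart


def gate_capability(verdict, metrics):
    """Decide whether capability indices may be interpreted.

    Raises if no stability verdict exists, so capability can never be
    interpreted on data whose stability was not assessed.
    """
    if verdict is None:
        raise ValueError("stability must be assessed before capability")
    if not verdict.stable:
        return {
            "cpk_interpretable": False,
            "metrics": metrics,  # kept for the record, not for decisions
            "action": "eliminate assignable causes first",
        }
    return {
        "cpk_interpretable": True,
        "metrics": metrics,
        "action": "interpret capability against internal standards",
    }
```

For an unstable process the metrics are still passed along, but flagged so that nothing downstream anchors a recommendation on them.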
📖
RAG-grounded interpretation
The Aggregator Agent references NRD's internal capability guidelines — not LLM priors — for every threshold it cites. Updating the guidance PDFs updates the "corporate standard" without touching a single line of workflow code.
⚙️
Deterministic vs. AI reasoning split
Numbers that have a right answer (Cp, Cpk, control limits) are computed in Python via MCP tools. Language models handle only what they're good at: normalizing ambiguous data, interpreting visual patterns, and synthesizing narrative. Never estimating statistics.
👤
Human-in-the-loop by design
Every output is explicitly framed as a decision-support tool. The Aggregator always qualifies its conclusions with uncertainty language and requires engineer sign-off. The system never issues a recommendation without making the human's role in reviewing it explicit.
Personal reflection
Behind the build
The most important design decision was where to put the intelligence. I kept numerical calculations and chart generation in Python and MCP tools — they need to be deterministic and auditable. I used LLMs where flexibility genuinely adds value: data normalization, visual pattern interpretation, and narrative synthesis. That separation reduced hallucination risk while keeping the benefits of AI-driven reasoning where they actually belong.
The three agent patterns behaved very differently in practice. The on-the-fly MCP agents were the most predictable — reusable utilities for standardized analytics. The multimodal agent was effective at interpreting chart patterns but sensitive to prompt wording. The persistent RAG-backed Aggregator was where agentic AI clearly outperformed traditional scripting: reconciling stability, capability, and recommended actions into a consistent, standards-aligned narrative that any engineer would recognize as correct.
The broader lesson: the first pilot I would propose in any new domain would focus on explanation rather than automation. This project showed that trust is built through clarity before control. An AI system that explains its reasoning well — and is honest about its limits — gets adopted. One that doesn't, doesn't.