Next in AI: Retail & eCommerce Edition
McKinsey's State of AI 2025: For Retail Leaders
The State of AI in 2025. A practical playbook for retail and e-commerce leaders
Executives keep asking the same thing: “If almost everyone is using AI, why is enterprise impact still elusive?”
McKinsey’s 2025 survey answers that plainly. AI use is broad and rising. Scaling is not. About 88% of companies report regular AI use in at least one function. Yet only about one in three have begun to scale across the enterprise. That is the gap to close in the year ahead.
This article distills the report for operators. What is truly new. What matters for retailers and brands. How to turn the findings into process, platforms, and P&L lift.
What is new and why it matters
1) The agent era has moved from buzz to real pilots. 23% of respondents say their companies are already scaling at least one agentic system. Another 39% are experimenting. Usage is still thin inside individual functions, which leaves room for focused leaders to win with depth over breadth.
2) Value shows up locally before it shows up in EBIT. Most leaders report innovation gains and use-case level cost or revenue benefits. Only 39% see enterprise-level EBIT impact so far. The implication is simple. Productize specific workflows first. Then wire them together at platform level.
3) High performers behave differently. They set growth and innovation objectives along with efficiency, they redesign workflows, they invest, and their senior leaders personally own the agenda. One third of high performers spend more than 20% of digital budgets on AI. They are nearly three times as likely to redesign work at its core.
The “Agentic Commerce” blueprint
Think in systems that ship outcomes, not tools that demo features. Use this layered model to design once and repeat across brands and channels.
Layer 1. Workflows worth winning
Merchandising content loop. Intake briefs. Generate PDP drafts. A/B assets. Validate claims. Publish with channel rules. Human checkpoint required before go-live, which is a best practice highlighted for high performers.
Promo and pricing orchestration. Reconcile demand signals, inventory, competitor moves, and guardrails. Product managers define bands. Models propose. Humans approve.
Availability and supply assurance. Agents monitor inbound supply, forecast risk, trigger safety-stock changes, and open exception tickets. Early agent scaling tends to start in IT and knowledge management. Treat those as your on-ramp.
Service and retention. Resolve known issues with agent runbooks. Escalate complex cases with human-in-the-loop. Track CSAT impact, not just handle time.
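The promo and pricing orchestration pattern above reduces to a simple routing rule: product managers define bands, the model proposes, and anything outside a band escalates to a human. A minimal sketch of that rule, with all names (`PriceBand`, `route_proposal`, the example SKU) hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PriceBand:
    """Guardrail band a product manager defines per SKU."""
    floor: float
    ceiling: float


def route_proposal(sku: str, proposed: float, band: PriceBand) -> str:
    """Auto-approve model proposals inside the band; escalate the rest."""
    if band.floor <= proposed <= band.ceiling:
        return "auto-approve"
    return "escalate-to-human"


# A proposal inside the band clears automatically; a deeper markdown
# goes to a human for sign-off.
band = PriceBand(floor=19.99, ceiling=29.99)
print(route_proposal("SKU-123", 24.99, band))  # auto-approve
print(route_proposal("SKU-123", 14.99, band))  # escalate-to-human
```

The same routing shape works for the other workflows: the guardrail is data the business owns, and the agent only acts inside it.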
Layer 2. Guardrails that make scale safe
Human-in-the-loop criteria. Define when a human must validate outputs for accuracy. Leaders who do this explicitly see more value at scale.
Approval thresholds and rollback plans. Needed for price changes, claims, safety, and compliance.
Observability and drift monitoring. Log prompts, outputs, decisions, and who approved.
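The observability guardrail is concrete: every agent decision should leave one structured record of what went in, what came out, what was decided, and who approved. A minimal sketch, assuming a hypothetical `audit_record` helper writing JSON lines:

```python
import datetime
import json


def audit_record(prompt: str, output: str, decision: str, approver: str) -> dict:
    """One audit-log entry per agent decision: input, output,
    outcome, and the human who signed off."""
    return {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "decision": decision,
        "approved_by": approver,
    }


# Serialize as one JSON line per decision, ready for a log pipeline
# and later drift analysis.
line = json.dumps(audit_record(
    "Draft PDP copy for SKU-123",
    "Lightweight waterproof jacket with taped seams...",
    "published",
    "merch.editor@example.com",
))
```

Records like this are what make rollback plans and drift monitoring possible later; without them, scale is unauditable.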
Layer 3. Shared platform pieces
Data products. Catalog quality rules for PDPs. Inventory health scores. Promo elasticity curves. These recur across brands.
Agent runtime and interfaces. Queue, retry, and audit across OMS, PIM, DAM, WMS, CRM.
People and process. Agile delivery and clear ownership correlate with higher returns.
What your board should actually expect
McKinsey’s function-level data is the best clue for where benefits land first. Cost decreases are most often reported in software engineering, manufacturing, and IT. Revenue increases most often show in marketing and sales, strategy and corporate finance, and product or service development. Translate that into near-term proof points in content velocity, promo responsiveness, and stockout prevention before you promise EBIT expansion.
Executive KPIs that connect to the report
Time to publish PDPs and content error rate after human review.
Promo change lead time and gross margin impact under guardrails.
Out-of-stock prevention rate and expedite cost reduction by DC.
Self-serve resolution rate and post-escalation CSAT.
The 90-day plan. Built to unblock scale
Weeks 1–2. Pick two flows and write them like products
Choose one revenue flow and one cost flow. Example. PDP content loop and shortage prevention.
Document target steps, inputs, approvals, SLAs, and failure modes.
Identify required data products and human checkpoints. This mirrors practices that separate high performers.
Weeks 3–4. Put guardrails in writing
Create a single “Human Validation Matrix”. When to review. Who reviews. What evidence to store. This is a top differentiator in the survey.
Define rollback, incident reporting, and audit requirements.
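The validation matrix works best as data, not a document: one entry per output type, stating when to review, who reviews, and what evidence to store. A minimal sketch under those assumptions; the entries and field names are illustrative, not from the report:

```python
# Hypothetical "Human Validation Matrix" as configuration.
VALIDATION_MATRIX = {
    "pdp_copy": {
        "when": "always before first publish; sample 10% after",
        "reviewer": "category merchandiser",
        "evidence": ["draft", "approved version", "reviewer id", "timestamp"],
    },
    "price_change": {
        "when": "whenever the proposal falls outside the PM-defined band",
        "reviewer": "pricing manager",
        "evidence": ["proposal", "band", "approval ticket"],
    },
}


def review_required(output_type: str) -> bool:
    """Fail closed: unknown output types have no matrix entry
    and should not ship without one."""
    return output_type in VALIDATION_MATRIX
```

Keeping the matrix in version control gives you the audit trail for free: every change to review rules is itself reviewed.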
Weeks 5–8. Ship a narrow v1 in production
One brand or one category. Change the SOPs so teams use it daily. High performers are nearly three times more likely to redesign the work itself. Treat this as the work, not a side project.
Weeks 9–12. Expand scope and add a second agent step
Add channels, nodes, or languages. Connect to OMS or WMS. Fund the shared services that let you repeat the pattern. One third of high performers are already investing at that level.
Org and talent moves that unlock value
Visible senior ownership. High performers have leaders who personally model usage and remove blockers. Make AI a standing item in weekly business reviews.
Strategic workforce planning. Expect rising demand for MLOps, data engineering, and data product roles as adoption grows. Larger companies are already hiring these at higher rates.
Ambition beyond cost. Firms that set growth and innovation goals alongside efficiency report broader enterprise benefits. Set at least one growth metric per program.
Risk management. Run it like a product
Half the battle is predictability. 51% of organizations using AI report at least one negative consequence, with inaccuracy the most common. Mitigation efforts are rising, yet explainability still lags. Bake this into both your acceptance criteria and dashboards.
Your minimum bar
Accuracy thresholds and sampling plans by use case.
Store human approval evidence for sensitive outputs.
Explainability notes in plain language when decisions affect customers or prices.
IP and compliance checks on training data and outputs, with audit logs.
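The first item on the bar, accuracy thresholds with sampling plans, can be sketched as a release gate: sample outputs, have humans label them correct or not, and block the release if the sampled accuracy falls below the per-use-case threshold. The threshold values and names here are hypothetical:

```python
# Hypothetical per-use-case accuracy thresholds for sampled,
# human-labeled spot checks.
THRESHOLDS = {"pdp_copy": 0.98, "promo_claims": 0.995}


def passes_bar(use_case: str, labeled_results: list[bool]) -> bool:
    """labeled_results holds True where a human reviewer judged
    the sampled output correct. Gate the release on the threshold."""
    accuracy = sum(labeled_results) / len(labeled_results)
    return accuracy >= THRESHOLDS[use_case]


# A 99%-accurate sample clears the PDP bar but not the stricter
# claims bar, so claims stay behind human review.
sample = [True] * 99 + [False]
print(passes_bar("pdp_copy", sample))      # True  (0.99 >= 0.98)
print(passes_bar("promo_claims", sample))  # False (0.99 < 0.995)
```

Tightening a threshold is then a one-line, reviewable change, which is exactly the predictability the risk program needs.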
Ambitious programs will surface more risks simply because they are doing more. They also mitigate more. Plan for that reality.
Budget guidance that matches the data
If you only fund pilots, you will only get pilot results. High performers often spend more than 20 percent of their digital budgets on AI, and that spend is tied to faster scaling and shared platforms. Use that benchmark to defend the plumbing. PIM rules, data products, agent runtime, and observability should sit in a funded roadmap, not in ad-hoc tickets.
FAQs from retail executives
Do we need agents now or can we wait for the next model?
Many firms are already piloting agents. Few are scaling them in more than one or two functions today. That is an opportunity to be first in your category for a specific flow, then repeat the pattern across functions.
Will this reduce headcount?
Expectations vary. A plurality expects little change, about a third expect reductions, and a smaller share expects increases. Plan for role shifts, especially in data and MLOps, which become more critical as adoption grows.
Where should we expect the first financial signals?
Look for cost reductions in engineering and IT, and revenue gains in marketing, strategy, and product development before enterprise-level EBIT moves. Report those early wins clearly so you can fund scale.
The report is not saying try more tools. It is saying redesign more work. Set bolder goals than cost. Fund the platform pieces. Put humans in the right loop. And lead from the top. The organizations that do this are already separating from the pack, with stronger leadership ownership, faster workflow redesign, and higher investment intensity.