
Windows AI and the Future of Work

Notes from BRK344: Windows AI and the Future of Work.

Session: BRK344 | Date: Tuesday, Nov 18, 2025 | Time: 2:30 PM - 3:15 PM PST | Location: Moscone West, Level 3, Room 3002


The thesis: AI belongs on the device, not just in the cloud

Most of the Ignite 2025 AI narrative centred on cloud services -- Azure AI Foundry, Copilot Studio, SRE Agent, multi-agent orchestration. BRK344 argued for a fundamentally different proposition: that the most impactful AI experiences will run locally on Windows devices, not round-trip to the cloud. This is not a small claim. It challenges the architectural assumption that has underpinned most enterprise AI strategy: that AI inference is a cloud workload.

Microsoft's case rests on four pillars: Copilot+ PCs with dedicated NPU hardware, Windows AI APIs that abstract the compute layer, security architecture that keeps sensitive data on-device, and hybrid models that intelligently route between local and cloud inference. Whether this thesis holds up depends on hardware adoption, developer ecosystem maturity, and whether on-device AI can deliver experiences that are genuinely better than cloud alternatives -- not just cheaper or more private.


Copilot+ PCs: The hardware bet

What a Copilot+ PC actually is

Copilot+ PC is not a marketing label for any laptop with a sticker. It is a hardware specification with defined minimums:

  • NPU (Neural Processing Unit) delivering 40+ TOPS (trillions of operations per second)
  • 16 GB RAM minimum (most shipping with 32 GB)
  • 256 GB SSD minimum
  • Qualcomm Snapdragon X, Intel Core Ultra, or AMD Ryzen AI processors

The NPU is the critical component. Unlike GPUs that excel at parallel floating-point operations, NPUs are architecturally optimised for the matrix multiplication and tensor operations that define neural network inference. They deliver AI performance per watt that GPUs cannot match on mobile form factors.

Why TOPS matter (and where the number misleads)

The 40 TOPS threshold sounds impressive, but context matters. Frontier large language models in the GPT-4 class require infrastructure that dwarfs any laptop NPU. On-device AI is not running GPT-4 locally. It is running smaller, specialised models -- Phi-3, Phi-3.5, and custom fine-tuned models in the 1-7 billion parameter range -- that are designed for specific tasks.
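A rough weight-storage calculation shows why the 1-7 billion parameter range is the on-device sweet spot (a sketch only -- the helper is illustrative, and real memory use also includes activations and KV cache):

```python
def model_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights only; excludes activations and KV cache."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 3.8B-parameter model (roughly Phi-3-mini scale):
print(round(model_footprint_gb(3.8, 16), 1))  # fp16 weights -> 7.6 GB
print(round(model_footprint_gb(3.8, 4), 1))   # 4-bit quantised -> 1.9 GB
```

At 4-bit quantisation a 3.8B-parameter model fits comfortably alongside everything else on a 16 GB machine; a 70B-parameter model at the same precision (around 35 GB of weights) would not.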

What runs well on a 40 TOPS NPU:

  • Real-time text summarisation and rewriting
  • Image recognition and classification
  • Document analysis and extraction
  • Background task automation (e.g., email categorisation)
  • Local speech recognition and translation
  • Code completion for smaller codebases

What still needs the cloud:

  • Large context window reasoning (>32K tokens)
  • Complex multi-step agent workflows
  • Training and fine-tuning
  • Multi-modal reasoning across large document sets
  • Anything requiring GPT-4 class capability

The honest assessment: The NPU enables a genuinely useful tier of AI capability on-device. It is not a replacement for cloud AI -- it is a complement. The question for enterprise IT is whether this complementary tier is valuable enough to justify the hardware refresh cycle.


Windows AI APIs: The developer abstraction

The platform layer

BRK344 dedicated significant time to Windows AI APIs -- the software layer that lets developers build AI features without worrying about whether inference runs on the NPU, GPU, CPU, or cloud.

Key APIs demonstrated:

  • Windows Copilot Runtime -- High-level APIs for common AI tasks (summarisation, entity extraction, image description)
  • DirectML -- Low-level access to NPU/GPU for custom model inference
  • ONNX Runtime -- Cross-platform model execution with Windows-optimised paths
  • Windows Studio Effects -- Real-time AI-powered camera and audio processing
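The NPU-to-GPU-to-CPU fallback these APIs perform can be made concrete with ONNX Runtime, which expresses hardware preference as an ordered list of execution providers. The provider names below follow onnxruntime's real naming convention; the selection helper itself is an illustrative sketch, not platform code:

```python
# Preference order: Qualcomm NPU, then DirectML (GPU/NPU), then CPU fallback.
# Provider names follow the real onnxruntime convention.
PREFERENCE = [
    "QNNExecutionProvider",   # Qualcomm NPU
    "DmlExecutionProvider",   # DirectML
    "CPUExecutionProvider",   # always available
]

def choose_providers(available: list[str]) -> list[str]:
    """Return the preference-ordered subset of providers present on this device."""
    chosen = [p for p in PREFERENCE if p in available]
    return chosen or ["CPUExecutionProvider"]

# With onnxruntime installed, the result would feed InferenceSession, e.g.:
#   session = onnxruntime.InferenceSession("model.onnx", providers=chosen)
print(choose_providers(["DmlExecutionProvider", "CPUExecutionProvider"]))
# -> ['DmlExecutionProvider', 'CPUExecutionProvider']
```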

The abstraction that matters

The most significant architectural decision is the compute routing layer. When a developer calls a Windows AI API, the system automatically determines whether to run inference on:

  1. NPU -- Efficient, always available, lower power consumption
  2. GPU -- Higher peak performance for complex models
  3. CPU -- Fallback when dedicated AI hardware is unavailable
  4. Cloud -- For models too large for local execution

Why this matters for enterprise development:

Application developers should not need to know whether the user's device has an NPU, what chip architecture it uses, or how to optimise model execution for different hardware. The Windows AI APIs abstract this entirely. A developer writes one integration, and the platform handles hardware-specific optimisation.

The cautionary note: Abstraction layers are only as good as their implementation. Early versions of any hardware abstraction tend to have performance inconsistencies across devices. The experience on a Snapdragon X Elite laptop may differ meaningfully from an Intel Core Ultra device. Microsoft's track record with hardware abstraction (DirectX) is strong, but AI hardware abstraction is a newer challenge with different performance characteristics.


Security: The on-device advantage

Why local AI processing changes the security equation

The security argument for on-device AI is straightforward and compelling: if sensitive data never leaves the device, entire categories of security risk disappear.

Cloud AI security concerns:

  • Data in transit to cloud inference endpoints
  • Data at rest in cloud processing pipelines
  • Multi-tenancy risks in shared inference infrastructure
  • Regulatory compliance for data residency (GDPR, industry-specific requirements)
  • Vendor access to inference data for model improvement

On-device AI eliminates these:

  • Data processed locally, never transmitted
  • No cloud storage of sensitive content
  • No multi-tenancy risk
  • Data residency is inherent (data stays on the device in the user's jurisdiction)
  • No vendor access to inference data

The practical implications

For regulated industries: Healthcare, financial services, legal, and government organisations face significant compliance barriers to cloud AI adoption. On-device inference removes the data sovereignty objection entirely. A clinician can use AI to summarise patient notes without that data leaving the hospital's network perimeter. A lawyer can analyse contracts without sending privileged information to a cloud endpoint.

For security-conscious enterprises: Organisations that have invested heavily in Data Loss Prevention (DLP) face a tension with cloud AI: every AI interaction is a potential data exfiltration vector. On-device AI eliminates this tension for supported workloads.

Windows security features demonstrated:

  • VBS Enclaves -- Virtualisation-based security isolating AI workloads from other processes
  • Personal Data Encryption -- AI-processed data encrypted with user credentials, inaccessible to IT admins without user presence
  • Smart App Control -- AI-powered application trust decisions running locally
  • Microsoft Pluton -- Hardware root of trust for AI workload integrity

The counterargument

On-device security is not a panacea. The device itself becomes the attack surface. A compromised endpoint with powerful local AI creates risks that did not exist when AI required cloud connectivity. An attacker with device access could use the NPU for malicious inference -- analysing stolen documents, generating deepfakes, or automating social engineering -- all without network telemetry that cloud-based AI would generate.

Microsoft acknowledged this implicitly through the emphasis on hardware-level security (Pluton, VBS), but the threat model for AI-enabled endpoints is still evolving.


Local vs cloud: The hybrid model

Intelligent routing is the real innovation

The session's most important technical contribution was the framework for deciding when AI runs locally versus in the cloud. This is not a binary choice -- it is a continuous spectrum based on:

Latency sensitivity:

  • Real-time features (autocomplete, live translation, camera effects) -- always local
  • Batch processing (document summarisation, email triage) -- local when possible, cloud for large volumes
  • Complex reasoning (multi-step analysis, large context) -- cloud

Privacy sensitivity:

  • Personal data, health records, financial information -- local preferred
  • Non-sensitive corporate data -- cloud acceptable with appropriate controls
  • Public data -- cloud for best model quality

Model capability required:

  • Simple classification and extraction -- local (Phi-3 class models)
  • Moderate reasoning and generation -- local or cloud depending on context
  • Advanced reasoning, large context, multi-modal -- cloud (GPT-4 class models)

Connectivity:

  • Online with reliable connection -- hybrid routing based on above factors
  • Offline or unreliable connection -- local only, graceful degradation
  • Air-gapped environments -- local only, a genuine differentiator
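Taken together, these factors can be sketched as a simple decision function. This is illustrative only -- the thresholds and field names are assumptions, not a documented Windows API:

```python
from dataclasses import dataclass

@dataclass
class Request:
    latency_sensitive: bool        # real-time feature (autocomplete, live effects)?
    contains_personal_data: bool   # health, financial, personal content?
    context_tokens: int            # size of the reasoning context
    online: bool                   # is a reliable connection available?

def route(req: Request) -> str:
    """Toy local-vs-cloud routing over the four factors above."""
    if not req.online:
        return "local"                  # offline: graceful degradation
    if req.latency_sensitive:
        return "local"                  # real-time features stay on-device
    if req.context_tokens > 32_000:
        return "cloud"                  # beyond local context limits
    if req.contains_personal_data:
        return "local"                  # privacy-preferred
    return "cloud"                      # best model quality for the rest

print(route(Request(False, False, 2_000, True)))  # -> cloud
```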

The enterprise architecture implications

For IT teams planning AI strategy, the hybrid model means:

Device fleet matters: The AI capabilities available to your users depend on their hardware. A Copilot+ PC with a 40 TOPS NPU has a fundamentally different AI capability profile than a three-year-old laptop running an 8th-gen Intel processor. This creates a two-tier workforce experience that needs careful management.

Network architecture changes: If significant AI inference moves to the device, network bandwidth requirements for AI workloads decrease. But management and update traffic for local models increases. IT teams need to plan for model distribution, updates, and versioning across the device fleet.

Cost modelling shifts: Cloud AI costs scale with usage -- more queries, more cost. On-device AI costs are front-loaded in hardware purchase and amortised over device lifecycle. For high-usage scenarios, on-device AI may be significantly cheaper over three years than equivalent cloud inference.
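The amortisation point can be put in rough numbers. Both figures below are placeholder assumptions for illustration, not vendor pricing:

```python
def breakeven_queries(hardware_premium: float, cost_per_query: float) -> int:
    """Lifetime query count at which the on-device hardware premium pays for itself."""
    return round(hardware_premium / cost_per_query)

# Assumed: a $300 Copilot+ hardware premium vs $0.002 per cloud inference call.
print(breakeven_queries(300.0, 0.002))  # -> 150000
```

Spread over a three-year lifecycle, that threshold is well under 200 queries per working day, which is the core of the cost-shift argument for high-usage scenarios.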

Application architecture decisions: Developers building enterprise applications need to decide which AI features run locally and which use cloud services. The Windows AI APIs help abstract this, but the architectural decision still matters for capability, cost, and security reasons.


What this means for enterprise IT strategy

The hardware refresh catalyst

Copilot+ PCs create a compelling reason for hardware refresh that goes beyond faster processors and more RAM. The NPU delivers capabilities that older hardware simply cannot replicate -- not slower, but completely unavailable. This is different from most PC refresh justifications, which are about incremental performance improvement.

The IT planning question: When do you start requiring Copilot+ PC specifications for new purchases? The answer depends on how quickly Microsoft and third-party developers deliver AI features that create measurable productivity gains. The hardware is ready. The software ecosystem is early.

The management challenge

On-device AI introduces new management considerations:

  • Model management -- Which models are deployed to which devices? How are they updated?
  • Performance monitoring -- How do you track NPU utilisation and AI workload performance across the fleet?
  • Policy enforcement -- Which AI capabilities are enabled for which user groups?
  • Support complexity -- AI feature troubleshooting adds a new dimension to desktop support

Microsoft's answer is Intune-based management for AI policies and model deployment, but the tooling is new and enterprise-scale management practices are still forming.

The cultural shift

Perhaps the most underappreciated aspect of BRK344's message: Windows AI is not a feature that IT deploys. It is a capability that users discover and adopt organically. When AI is built into the operating system -- summarising notifications, enhancing video calls, suggesting actions based on context -- adoption happens naturally without training programmes or change management.

This is fundamentally different from deploying a new enterprise application. There is no rollout plan for "Windows now understands your documents." The capability is simply there, and users either find it useful or they do not. For IT organisations accustomed to controlling the application landscape, this organic adoption pattern requires a different approach to governance and support.


The competitive landscape

Apple Intelligence

Apple is pursuing a similar on-device AI strategy with Apple Silicon's Neural Engine and Apple Intelligence. The key difference: Apple controls hardware, software, and ecosystem. Microsoft must coordinate across Qualcomm, Intel, AMD, and thousands of OEM configurations. Apple's vertical integration enables tighter optimisation but limits device choice. Microsoft's approach is more flexible but risks inconsistent experiences across devices.

Chrome OS and Google

Google's AI strategy centres on cloud-first with Gemini. Chrome OS devices are thin clients by design, without the local compute for meaningful on-device AI. For organisations already committed to Chrome OS, Microsoft's on-device AI story could be a differentiator for Windows. For organisations evaluating both, the on-device AI capability is a genuine feature gap in Windows's favour.

Linux and open-source AI

The open-source AI ecosystem (Ollama, llama.cpp, vLLM) offers powerful local inference capabilities on Linux. For developer-oriented organisations, Linux may offer more flexibility for on-device AI than Windows. But for the mainstream enterprise user who will never open a terminal, Windows AI APIs provide a managed, supported experience that open-source tools do not.


Bottom line

BRK344 made a persuasive case that on-device AI is not a niche capability -- it is a foundational shift in how Windows delivers value. The Copilot+ PC hardware specification creates a baseline for AI-capable devices. The Windows AI APIs abstract hardware complexity for developers. The security model turns data sovereignty from a compliance burden into an architectural advantage. And the hybrid local/cloud model acknowledges that neither approach alone serves all use cases.

The strategic implication for enterprise IT is clear: the next PC refresh cycle is also an AI deployment decision. Every Copilot+ PC you deploy is an AI inference node that does not consume cloud credits, does not transmit sensitive data, and does not require network connectivity to function.

Whether that matters depends on your organisation's AI ambitions, security requirements, and willingness to invest in hardware that enables capabilities the software ecosystem has not yet fully delivered. The hardware is ahead of the software. The software is ahead of the enterprise management tooling. And the enterprise management tooling is ahead of the organisational readiness.

That ordering -- hardware, then software, then tooling, then readiness -- is exactly right for a platform transition. Microsoft is building from the bottom up. The question for IT leaders is when to start climbing.

