Microsoft Mu: The new language model that brings local AI to Windows 11

Last update: 25/06/2025

  • Mu is Microsoft's new small language model, optimized to run locally on Windows 11 devices with NPUs.
  • Its first integration is in the Windows 11 Settings agent, allowing users to adjust system settings using natural language.
  • Mu stands out for its efficiency and speed, reaching more than 100 tokens per second thanks to its compact 330-million-parameter design.
  • It includes innovations such as Dual LayerNorm, RoPE, and GQA, and has been trained using advanced processes and high-quality educational data.

Microsoft Windows 11 MU language model

The arrival of Mu, the latest small language model presented by Microsoft, marks a significant step in the current trend of placing artificial intelligence directly on users' devices. With the aim of reducing cloud dependence and harnessing the potential of Neural Processing Units (NPUs), Mu is integrated into Copilot+ PCs running Windows 11, initially focusing on the Settings app to make it easier to access and modify system parameters using plain natural language.

This advance means that, instead of sending queries to external servers, processing and responses happen on the device itself, ensuring greater privacy, agility, and efficiency. For the moment, the rollout is limited to Windows Insider Program participants with Copilot+ computers, although the expectation is that this technology will be extended to more users and functions in future updates.

Foundry Local
Related article:
Foundry Local and Windows AI Foundry: Microsoft is betting on local AI with a new developer ecosystem.

What is Mu really and what makes it stand out?

Mu language

Mu is a small language model (SLM) trained with 330 million parameters. Its compact size does not mean sacrificing performance: according to Microsoft, it achieves figures very close to those of much larger models such as Phi-3.5-mini. This balance has been achieved through a rigorous training process that includes techniques such as Dual LayerNorm, Rotary Positional Embeddings (RoPE), and Grouped-Query Attention (GQA), which provide efficiency and precision, especially on devices with limited resources.
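
As a rough illustration of one of these techniques, the sketch below shows the core idea of Grouped-Query Attention in plain NumPy: several query heads share a single key/value head, which shrinks the memory the attention layers need. All head counts, dimensions, and weights here are invented examples, not Mu's actual configuration.

```python
# Illustrative Grouped-Query Attention (GQA): num_q_heads query heads
# share num_kv_heads key/value heads, reducing KV memory. Random weights
# stand in for learned parameters; sizes are arbitrary examples.
import numpy as np

def grouped_query_attention(x, num_q_heads=8, num_kv_heads=2, head_dim=16):
    seq_len, d_model = x.shape
    rng = np.random.default_rng(0)
    wq = rng.standard_normal((d_model, num_q_heads * head_dim))
    wk = rng.standard_normal((d_model, num_kv_heads * head_dim))
    wv = rng.standard_normal((d_model, num_kv_heads * head_dim))

    q = (x @ wq).reshape(seq_len, num_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, num_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, num_kv_heads, head_dim)

    group = num_q_heads // num_kv_heads      # query heads per KV head
    out = np.zeros((seq_len, num_q_heads, head_dim))
    for h in range(num_q_heads):
        kv = h // group                      # KV head shared by this query head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq_len, -1)

x = np.random.default_rng(1).standard_normal((4, 32))
print(grouped_query_attention(x).shape)  # (4, 128)
```

With 8 query heads but only 2 key/value heads, the KV cache is a quarter of the size it would be under standard multi-head attention, which is exactly the kind of saving that matters on memory-constrained NPUs.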


The model uses a transformer-based encoder-decoder architecture, capable of processing user input and transforming it into actions within the system. Thanks to this structure, Mu separates input processing from output generation, which reduces latency and memory consumption, key points for ensuring a smooth, wait-free user experience.
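
To make that latency point concrete, here is a minimal, purely illustrative Python sketch of the encoder-decoder split: the input is encoded once, and every generated token reuses that cached result instead of reprocessing the whole prompt. The "encoder" and "decoder" below are trivial stand-in functions, not real transformer layers.

```python
# Toy sketch of why an encoder-decoder split cuts per-token work:
# the encoder runs once over the input; each decoder step only reuses
# that cached representation. All computation here is a fake stand-in.
def encode(tokens):
    # Stand-in for the encoder pass: runs exactly once per request.
    return [len(t) for t in tokens]          # fake "hidden states"

def decode_step(encoded, generated):
    # Stand-in for one decoder step over the cached encoder output.
    return (sum(encoded) + len(generated)) % 97  # fake next token

def generate(tokens, max_new=5):
    encoded = encode(tokens)     # computed once, not once per output token
    out = []
    for _ in range(max_new):
        out.append(decode_step(encoded, out))
    return out

print(generate(["turn", "on", "dark", "mode"]))  # [14, 15, 16, 17, 18]
```

In a decoder-only model, every generated token would re-attend over the full prompt; caching the encoded input once is what keeps per-token cost, and therefore latency, low.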

In official tests, Mu has proven capable of generating more than 100 tokens per second and delivering responses in under 500 milliseconds. These numbers allow for virtually instantaneous interactions, even when modifying settings or interpreting long, varied queries in everyday language. If you'd like to dig deeper into how these models work, you can check out comparisons between language models on PC.
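
A quick back-of-envelope check puts those figures in perspective; the 50-token answer length below is an invented example, only the 100 tokens/second rate comes from Microsoft's numbers.

```python
# Rough arithmetic on the published throughput figure: at 100 tokens
# per second, a typical short answer streams in well under a second.
rate = 100          # tokens per second, per Microsoft's figure
tokens = 50         # hypothetical answer length (assumption)
generation_time = tokens / rate
print(generation_time)  # 0.5 seconds of streaming once decoding starts
```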

Integration into the Settings agent and practical functions

Mu's initial rollout centers on the Windows 11 Settings agent, a feature that lets users adjust system parameters simply by typing or saying what they need. For example, just ask "How do I activate dark mode?" or say "I want to increase the brightness," and Mu translates that instruction into the corresponding technical action within the system.
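
Conceptually, this is a mapping from free-form language to a concrete settings action. The sketch below illustrates the idea with invented action names and naive keyword matching; Mu's real pipeline is a learned model, not a rule table.

```python
# Hypothetical illustration of mapping a natural-language request to a
# settings action. Action identifiers and matching logic are invented.
ACTIONS = {
    "dark mode": "system.theme.set_dark",
    "brightness": "display.brightness.increase",
    "wi-fi": "network.wifi.open_panel",
}

def to_action(query):
    q = query.lower()
    for phrase, action in ACTIONS.items():
        if phrase in q:
            return action
    return None  # no match: fall back to regular settings search

print(to_action("I want to increase the brightness"))  # display.brightness.increase
```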


Microsoft has stressed that the AI adapts to tens of thousands of different contexts and queries; in fact, more than 3.6 million training samples were used, covering everything from the most common requests, such as changing the language or managing Wi-Fi networks, to more complex tasks. For queries that are too short or ambiguous, the system falls back on traditional search, but when the instruction is clear and detailed, Mu acts automatically or guides the user step by step.

Technology and optimization adapted to new generations of hardware

Microsoft Mu NPU Windows Copilot+

Mu's optimization has been one of the most carefully considered aspects of its development. Microsoft has worked with silicon partners such as AMD, Intel, and Qualcomm to adapt it to the specifics of the new NPUs in Copilot+ PCs. This joint work made it possible to introduce post-training quantization techniques, which convert the model's weights and activations to 8- and 16-bit integers, reducing memory consumption and avoiding the need to retrain the entire model.
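
The sketch below shows the generic idea behind that kind of quantization, not Microsoft's actual pipeline: float weights are mapped to int8 with a per-tensor scale, stored compactly, and dequantized at inference time with only a small reconstruction error.

```python
# Hedged sketch of symmetric post-training int8 quantization: store
# weights as int8 plus one float scale, recover approximate floats
# on the fly. A generic illustration, not Mu's actual scheme.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0              # per-tensor scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(q)                           # int8 storage: 4 bytes instead of 16
print(np.max(np.abs(w - w_hat)))   # small reconstruction error
```

The storage win is the point: each weight drops from 32 bits to 8, and because the mapping is applied after training, the original model never has to be retrained.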

Mu's training was carried out in high-performance environments, using NVIDIA A100 GPUs on Azure Machine Learning. The data set included hundreds of billions of educational tokens, and techniques such as distillation from the Phi models and Low-Rank Adaptation (LoRA) were used to transfer knowledge and fine-tune the model for specific tasks. The end result is a small, agile model well suited to the resources and limitations of modern portable hardware. You can also explore how to turn your PC into a local AI hub to expand the capabilities of your system.
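
Low-Rank Adaptation deserves a one-screen illustration: instead of updating a full weight matrix, only two small low-rank factors are trained and added to the frozen weights. The dimensions and rank below are arbitrary examples, not values from Mu.

```python
# Illustrative LoRA sketch: freeze W, train small A and B, and compute
# W @ x + B @ (A @ x). Sizes are invented examples.
import numpy as np

d_out, d_in, rank = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weights
A = rng.standard_normal((rank, d_in))    # trainable low-rank factor
B = np.zeros((d_out, rank))              # trainable, starts at zero

def forward(x):
    return W @ x + B @ (A @ x)           # base path + low-rank update

full_params = d_out * d_in               # what full fine-tuning would train
lora_params = rank * (d_in + d_out)      # what LoRA actually trains
print(full_params, lora_params)          # 4096 512
```

Because B starts at zero, the adapted model initially behaves exactly like the pretrained one, and fine-tuning only has to learn the 512-value correction rather than all 4,096 weights.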

Phi-4 mini AI on Edge-2
Related article:
Phi-4 mini AI on Edge: The future of local AI in your browser

Current challenges, availability and future prospects

One of the biggest challenges facing Mu is interpreting ambiguous or very brief queries, a common problem in natural-language systems. To address this, Microsoft has implemented a hybrid logic: short queries trigger traditional search results, while more detailed instructions trigger AI intervention, either to guide the user or to perform automated actions.
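
That hybrid routing can be sketched in a few lines. The word-count threshold below is an invented stand-in for whatever heuristics Microsoft actually uses; the point is only the two-way split between classic search and the on-device model.

```python
# Sketch of the hybrid logic described above: very short queries fall
# back to traditional search, detailed instructions go to the local
# model. The threshold value is an assumption for illustration.
def route(query, min_words=3):
    if len(query.split()) < min_words:
        return "search"      # classic Settings search results
    return "agent"           # hand off to the on-device model

print(route("brightness"))                 # search
print(route("turn on dark mode please"))   # agent
```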


For now, Mu is only available in English and on Copilot+ devices through the Insider channel, although it is expected to expand to more languages and devices in the coming months, including those with AMD and Intel processors. Privacy and security also play a fundamental role, given the local nature of the processing.

The deployment of Mu is just the beginning of a broader Microsoft strategy to bring local AI and efficient language models to even more applications and parts of the operating system, improving the experience and accessibility without sacrificing performance or privacy.