Jun 16, 2026 14:10:00

Microsoft claims that 'the power consumption of a single question to the AI is about the same as running a microwave oven for a few seconds,' as little as 1/20th of previous estimates.

On June 15, 2026, Microsoft announced an analysis estimating that a single question sent to a large language model (LLM) consumes approximately 0.16 to 0.60 Wh, equivalent to running a 40W PC for about 15 to 60 seconds. This is about 1/20 to 1/4 of previous estimates.

Scaling AI with 8 to 20x energy efficiency | The Microsoft Cloud Blog

https://www.microsoft.com/en-us/microsoft-cloud/blog/2026/06/15/scaling-ai-with-8-to-20x-energy-efficiency/

Every time you ask Microsoft Copilot to summarize an email, condense meeting materials, or suggest changes to a program, a large language model runs its calculations in a data center. As the number of AI service users grows, accurately understanding the power and cooling water required for each question becomes crucial for planning data center construction and assessing the impact on the power grid.

The process of reading input to the AI and gradually generating an answer is called 'inference.' Text is processed in small units called 'tokens,' and the longer the question or answer, the greater the computational load.

Therefore, the number of tokens processed is an important indicator when considering the power consumption of AI. However, even when handling the same amount of text, power consumption is not uniform; the amount of power required varies depending on the performance of the semiconductors used, the number of questions processed simultaneously, and the cooling efficiency of the data center.

Even with the same AI model, the power consumption per question differs between an experimental environment, where a GPU handles only a small number of questions, and a production environment, where many questions are processed together. According to Microsoft, past estimates sometimes did not adequately reflect 'batch processing,' which computes multiple questions at once, or other mechanisms for operating GPUs efficiently when there are many users. Estimates also varied depending on whether the measurement covered only the GPU or also included the CPU and cooling equipment.

Microsoft's research team estimated power consumption under production-scale operating conditions by running a large-scale model with over 200 billion parameters on eight servers equipped with NVIDIA H100 chips. The estimate combines three quantities: the number of tokens the hardware can process per second, the power consumption of a single server, and the PUE (Power Usage Effectiveness), which indicates the overall power efficiency of the facility.
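As a rough illustration of how those three quantities can combine, here is a minimal Python sketch. It assumes the simplest possible relationship (facility power multiplied by the time a response occupies the server's aggregate token throughput); the article does not spell out Microsoft's actual formula. The function name and all input values are hypothetical placeholders, not figures from Microsoft.

```python
def energy_per_response_wh(server_power_w, pue, tokens_per_second, response_tokens):
    """Estimate facility-level energy (Wh) attributed to one response.

    Assumed model, not Microsoft's published method:
      energy = server power * PUE * (response tokens / aggregate throughput)
    tokens_per_second is the server's total throughput across all
    requests it is batching at once, so heavier batching lowers the
    energy attributed to each response.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds_of_server_time = response_tokens / tokens_per_second
    joules = server_power_w * pue * seconds_of_server_time
    return joules / 3600.0  # 1 Wh = 3600 J


# Hypothetical inputs chosen only to show the arithmetic.
example = energy_per_response_wh(
    server_power_w=10_000,   # assumed server draw in watts
    pue=1.2,                 # assumed facility overhead factor
    tokens_per_second=3_000, # assumed aggregate batched throughput
    response_tokens=300,     # typical response length cited in the article
)
print(f"{example:.2f} Wh per response")
```

Under this assumed model, doubling the batched throughput halves the energy attributed to each response, which is consistent with the article's point that estimates ignoring batch processing come out too high.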

For typical questions, where the median response is approximately 300 tokens, the median power consumption per response was 0.31 Wh. The middle 50% of the estimates fell in the range 0.16–0.60 Wh, which is equivalent to running a 1000W microwave oven for approximately 0.6–2 seconds. Water consumption for cooling and other purposes was estimated at 0–0.067 mL per question, with a median of less than 1/100th of a teaspoon.
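The appliance comparisons follow from a unit conversion: 1 Wh is 3600 joules, and dividing by a device's wattage gives its running time in seconds. A minimal Python sketch (the function name is illustrative and does not come from Microsoft's post):

```python
def seconds_at_power(energy_wh, power_w):
    """Return how long a device drawing power_w watts runs on energy_wh."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return energy_wh * 3600.0 / power_w  # 1 Wh = 3600 J


# Energy figures per question as cited in the article.
for wh in (0.16, 0.31, 0.60):
    microwave = seconds_at_power(wh, 1000)  # 1000W microwave oven
    pc = seconds_at_power(wh, 40)           # 40W PC
    print(f"{wh:.2f} Wh = {microwave:.2f} s of microwave, {pc:.1f} s of PC")
```

The 0.16–0.60 Wh range works out to roughly 0.58–2.16 seconds of a 1000W microwave, in line with the "approximately 0.6–2 seconds" quoted above.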

Earlier estimates often suggested that a single question to the AI required several watt-hours (Wh) of power, and it was widely cited that a single ChatGPT question consumed about 10 times more power than a Google search (see PDF file). However, Microsoft argues that, once GPU utilization and batch processing in production environments are taken into account, these estimates may have been 4 to 20 times higher than the actual figures.

However, the load of a single question is not always small. For questions where multi-stage inference or long code generation pushes the median answer length to 5000 tokens, the median power consumption rose to 3.91 Wh, about 13 times that of a typical question. In tasks where the AI has to think for a long time, the length of the answer has a significant impact on power consumption.

Microsoft estimates that processing 1 billion typical questions per day requires approximately 0.7 GWh of power under baseline conditions, and expects this to fall to approximately 0.3 GWh with further efficiency improvements. If 10% of those questions are replaced with ones requiring longer reasoning, power consumption rises to approximately 1.7 GWh, which can be reduced to approximately 0.8 GWh after efficiency improvements.
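The shape of this daily-total calculation can be sketched in Python as a weighted average of per-question energy multiplied by the number of questions. The function name and the use of the article's median values as inputs are assumptions made for illustration. Note that plugging in the medians (0.31 Wh and 3.91 Wh) gives smaller totals than the 0.7 GWh and 1.7 GWh quoted above; the article does not say which per-question values Microsoft used, and one possible explanation is that the totals are driven by averages over a skewed distribution rather than by medians.

```python
def daily_energy_gwh(questions_per_day, mix):
    """Total daily energy in GWh for a mix of question types.

    mix is a list of (share, wh_per_question) pairs whose shares sum to 1.
    """
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    average_wh = sum(share * wh for share, wh in mix)
    return questions_per_day * average_wh / 1e9  # 1 GWh = 1e9 Wh


QUESTIONS_PER_DAY = 1_000_000_000

# Illustration only: per-question medians from the article used as inputs.
typical_only = daily_energy_gwh(QUESTIONS_PER_DAY, [(1.0, 0.31)])
with_long_reasoning = daily_energy_gwh(
    QUESTIONS_PER_DAY, [(0.9, 0.31), (0.1, 3.91)]
)
print(f"typical only: {typical_only:.2f} GWh/day")
print(f"10% long reasoning: {with_long_reasoning:.2f} GWh/day")
```

Even in this simplified form, shifting 10% of questions to long reasoning roughly doubles the daily total, which matches the direction of the change reported in the article.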

Microsoft further estimates that, by combining the routing of questions to smaller models, optimization of the processing infrastructure, and the use of next-generation GPUs and proprietary AI chips, it can improve energy efficiency per question by 8 to 20 times in the near future. Microsoft says it will continue to optimize at every stage, from models and processing infrastructure to semiconductors, so that power and water consumption do not grow at the same rate as AI usage.

Jun 16, 2026 14:10:00 in AI, Posted by log1d_ts