Alibaba Cloud claims to slash Nvidia GPU use by 82% with new pooling system

The new Aegaeon system can serve dozens of large language models using a fraction of the GPUs previously required, potentially reshaping AI workloads


Vincent Chow

Published: 11:00am, 18 Oct 2025
Updated: 12:30pm, 18 Oct 2025

Alibaba Group Holding has introduced a computing pooling solution that it said led to an 82 per cent cut in the number of Nvidia graphics processing units (GPUs) needed to serve its artificial intelligence models.

The system, called Aegaeon, was beta tested in Alibaba Cloud’s model marketplace for more than three months, where it reduced the number of Nvidia H20 GPUs required to serve dozens of models of up to 72 billion parameters from 1,192 to 213, according to a research paper presented this week at the 31st Symposium on Operating Systems Principles (SOSP) in Seoul, South Korea.

“Aegaeon is the first work to reveal the excessive costs associated with serving concurrent LLM workloads on the market,” the researchers from Peking University and Alibaba Cloud wrote.

Alibaba Cloud is the AI and cloud services unit of Hangzhou-based Alibaba, which owns the Post. Its chief technology officer, Zhou Jingren, is one of the paper’s authors.

Cloud services providers, such as Alibaba Cloud and ByteDance’s Volcano Engine, serve thousands of AI models to users concurrently, meaning that many application programming interface calls are handled at the same time.

However, a small handful of models such as Alibaba’s Qwen and DeepSeek are most popular for inference, with most other models only sporadically called upon. This leads to resource inefficiency, with 17.7 per cent of GPUs allocated to serve only 1.35 per cent of requests in Alibaba Cloud’s marketplace, the researchers found.

Researchers globally have sought to improve efficiency by pooling GPU power, allowing one GPU to serve multiple models, for instance.
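The figures reported above can be checked with simple arithmetic. The short Python sketch below uses only the numbers quoted in the article; the variable names are illustrative and are not taken from the paper.

```python
# Figures as reported in the article; variable names are illustrative.
gpus_before = 1192  # Nvidia H20 GPUs needed before Aegaeon
gpus_after = 213    # Nvidia H20 GPUs needed with Aegaeon

# Fractional reduction in GPU count.
reduction = (gpus_before - gpus_after) / gpus_before
print(f"GPU reduction: {reduction:.1%}")

# Share of GPUs versus share of requests for the rarely called models.
gpu_share_pct = 17.7
request_share_pct = 1.35

# How many times larger the GPU share is than the request share.
overallocation = gpu_share_pct / request_share_pct
print(f"GPU share relative to request share: {overallocation:.1f}x")
```

The first value rounds to the 82 per cent cut cited by Alibaba. The second shows that the sporadically called models held roughly 13 times more GPU capacity than their share of requests.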

