Extensive revision: 8/17/25
This is the first in a series of occasional publications that will track Apple’s development of genAI inference data centers built on Apple Silicon, centers at the edges of the Internet that will support billions of devices. Given Apple's policy of minimal publicity during development, there probably won't be more than four more reports before the end of 2026.
Note: During “training”, a model learns from a large dataset; during “inference”, a trained model responds to queries from users.
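The training/inference distinction in the note above can be illustrated with a toy model. This is a minimal sketch, not any real model: “training” adjusts two parameters to fit example data, while “inference” merely applies the learned parameters to a new query.

```python
# Minimal sketch of the training/inference distinction, using a toy
# linear model. All numbers are illustrative; no real model is implied.

def train(examples, epochs=1000, lr=0.05):
    """'Training': adjust parameters w, b to fit the (x, y) examples."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = (w * x + b) - y   # prediction error on this example
            w -= lr * err * x       # gradient step on the weight
            b -= lr * err           # gradient step on the bias
    return w, b

def infer(params, x):
    """'Inference': a trained model answers a query; no learning occurs."""
    w, b = params
    return w * x + b

params = train([(1, 2), (2, 4), (3, 6)])  # learn y = 2x from data
print(infer(params, 10))                  # respond to a new query
```

Training is the expensive loop over data; inference is the single cheap evaluation — which is why inference is the workload that can move out of the cloud.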
- "Mac Studio With M3 Ultra Runs Massive DeepSeek R1 AI Model Locally", Tim Hardwick, MacRumors, 3/17/25
- “Apple Mac Studio M3 Ultra workstation can run Deepseek R1 671B AI model entirely in memory using less than 200W, reviewer finds”, Efosa Udinmwen, MSN, 3/21/25
B. The future
In the coming years, large foundation models will still be trained in the cloud; but most users will direct most of their queries to local centers on the edges of the Internet that run large models and/or specialized small models derived from them.
Although Apple will probably say that its massive collection of data centers is in “The Apple cloud”, the last section of this note will explain how this assertion means something quite different from saying that OpenAI’s models are run on servers in Microsoft’s Azure cloud. All of Apple’s inference servers will definitely be on the physical edges of the Internet.
C. Why should our computer-savvy readers care about a downward genAI shift from the cloud to the edges?
- High impact on the environment/climate change
Cloud-based data centers deploy massive arrays of chips from Nvidia and other manufacturers that have high energy requirements. As a consequence, they have highly negative impacts on local power grids, carbon emissions, fresh water, air quality, and other local resources.
- High costs
The massive arrays of expensive chips in cloud-based data centers run at high temperatures that require expensive cooling systems.
- Nvidia started out as a company that made GPU chips for video games. Gamers constantly clamored for more complex games, which required ever faster GPUs.
- As it happens, Geoffrey Hinton, one of the so-called “godfathers of AI”, realized that he could run his neural network models much faster on GPUs because neural networks and graphics rely on the same underlying matrix mathematics.
- Recognizing that AI models were likely to quickly evolve into a much larger and far more profitable market than video games, Jensen Huang, CEO of Nvidia, redirected most of his company’s efforts to producing faster and faster GPUs for AI. Nvidia recently became the world's first $4 trillion company.
- Apple's customers used the cameras on their iPhones to capture photos and videos of themselves and their families, friends, and associates; they edited and stored these on their desktops, then shared them with family, friends, and associates on WhatsApp, TikTok, and YouTube. Frivolous social activities, perhaps, but they came with an insatiable craving for ever higher photo and video resolution.
- Photo and video processing require far more computing power than text processing; so Apple’s relentless quest for higher resolution drove it to design ever more powerful CPU and GPU chips for its phones and desktops, chips that had to run on low energy inputs while emitting little heat.
- Once Apple added neural processing units (NPUs) to its silicon, as in its M1 chip for Macs in 2020, its chips were also positioned to provide engines for genAI data centers at the edges.
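Hinton's insight above — that a neural network's core computation is the same matrix mathematics GPUs were already built to accelerate — can be sketched in a few lines. This is a hypothetical single layer with illustrative shapes; a real model stacks thousands of far larger ones:

```python
# A neural-network layer is essentially one big matrix multiplication
# plus an elementwise nonlinearity -- the same arithmetic GPUs were
# already optimized to perform for game graphics. Shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
batch = rng.standard_normal((32, 512))     # 32 inputs, 512 features each
weights = rng.standard_normal((512, 256))  # one layer's parameters
bias = np.zeros(256)

# Forward pass: matrix multiply, add bias, apply ReLU.
activations = np.maximum(batch @ weights + bias, 0.0)
print(activations.shape)   # one output row per input in the batch
```

The `@` here is exactly the operation a GPU's parallel hardware excels at, whether the matrices describe 3D scenes or neural network weights.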
Edge data centers offer two advantages over cloud-based data centers that were not mentioned in our previous discussion:
- Faster communication with users' client machines, because edge centers are much closer to those devices
- Stronger guarantees of security for each user because edge centers are accessible to a much smaller number of other users.
- All edge data centers are much closer to their users and their users' data than cloud-based data centers are; but there may be some distance, and hence travel time, between upper-edge centers in office buildings and factories and their users.
- By contrast, lower-edge data centers, e.g., desktops and smartphones, are always nearby. Apple's M-chips for desktops and A-chips for iPhones give these devices high power without bulky batteries, and they will not become hot enough to cause discomfort for their users.
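The proximity advantage can be put in rough numbers. A back-of-envelope sketch, assuming signals travel through fiber at roughly two-thirds the speed of light (about 200 km per millisecond) and ignoring routing and queuing overhead:

```python
# Back-of-envelope network latency: why physical proximity matters.
# The fiber speed and the distances below are rough assumptions.

SPEED_IN_FIBER_KM_PER_MS = 200  # light in fiber covers ~200 km per ms

def round_trip_ms(distance_km):
    """Idealized propagation delay only; real links add overhead."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

for label, km in [("same building (lower edge)", 0.1),
                  ("regional edge center", 50),
                  ("distant cloud region", 3000)]:
    print(f"{label}: ~{round_trip_ms(km):.3f} ms round trip")
```

Even before server queuing is counted, a query to a distant cloud region pays tens of milliseconds per round trip that a server down the hall does not.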
- "Apple is developing its first server chip specially designed for artificial intelligence, according to three people with direct knowledge of the project, as the iPhone maker prepares to deal with the intense computing demands of its new AI features.
- Apple is working with Broadcom on the chip’s networking technology, which is crucial for AI processing, according to one of the people. If Apple succeeds with the AI chip—internally code-named Baltra and expected to be ready for mass production by 2026—it would mark a significant milestone for the company’s silicon team.
- The team honed its expertise designing cutting-edge chips for the iPhone before later advancing to designing Mac processors that set new standards for performance and energy efficiency.
- The move could cut costs and help Apple scale up AI services to support billions of devices"
- "Apple Is Working on AI Chip With Broadcom", Wayne Ma and Qianer Liu, The Information, 12/11/24
Although Apple has a massive array of servers in the Apple cloud, it has not sold servers to its customers since around 2011. That’s about to change.
- The energy efficient Baltra chips discussed in the previous section will be the engines for a new line of servers that Apple will sell to customers. These servers will be powerful enough to run full versions of the leading inference models produced by OpenAI, Anthropic, Google, and others.
- These servers, these data centers, will be as close as possible to their owners because they will sit in their owners' facilities, so they will provide the fastest communications with their owners' devices. And they will provide the greatest security because their only users will be those their owners authorize.
- Apple will offer a line of servers: the smallest would contain a single Baltra chip; the largest would contain N chips, where N is small enough that the servers generate less heat than would require expensive external cooling systems.
- Apple will lease inference models from the world's leading developers for mutually beneficial licensing fees. Then Apple will work with developers to ensure that their inference models are configured to fit into Apple's ecosystem, ensuring that these edge servers work seamlessly with iPhones, Macs, and other Apple devices.
- Given the large size of inference models, they will probably be installed on the servers before the servers are delivered to their buyers.
- As components of Apple’s ecosystem, the servers will receive updates to the models via Apple’s cloud as updates become available.
- Apple will advise prospective purchasers of its servers as to how many chips they will need based on their estimated usage of the large language models they would like to install. The servers will probably be designed so that customers can add more chips if their actual usage exceeds their initial estimates.
- Given that Apple's servers will have substantially lower operating costs, the costs of responding to users’ queries will be substantially lower than the owners’ costs of responding from their inference models in the cloud. Accordingly, the negotiated licenses might specify that Apple will receive a share of the subscription fees AND that the fees charged to the servers’ owners (and associates) be substantially lower than the fees charged to subscribers to cloud-based servers, even for the most complex “thinking” models and agents.
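The sizing advice described above might look like the following sketch. Every constant here is a hypothetical assumption for illustration; Baltra's actual per-chip throughput has not been announced:

```python
# Hypothetical sizing sketch: estimating how many chips a server needs
# from expected usage. All constants are assumptions for illustration.
import math

TOKENS_PER_SEC_PER_CHIP = 500   # assumed per-chip inference throughput
AVG_TOKENS_PER_QUERY = 1_000    # assumed prompt + response length
PEAK_UTILIZATION = 0.7          # leave headroom for bursts

def chips_needed(peak_queries_per_minute):
    """Round up to the number of chips covering peak demand."""
    tokens_per_sec = peak_queries_per_minute * AVG_TOKENS_PER_QUERY / 60
    effective = TOKENS_PER_SEC_PER_CHIP * PEAK_UTILIZATION
    return math.ceil(tokens_per_sec / effective)

# e.g., an office generating 40 queries per minute at peak:
print(chips_needed(40))
```

Under these assumed numbers, demand scales roughly linearly with peak query volume, which is why a server design that lets owners add chips later (as the note suggests) would fit naturally.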
Conclusion
If the editor’s hypothesis is correct, after 2026 members of the Apple community will gain access to reduced or distilled versions of inference models on their iPhones and ordinary desktop Macs. Some will gain access to full versions running on powerful Mac Studios. But most will gain most of their access to full versions of world class inference models via local Mac servers.
In other words, Apple will not invest hundreds of billions of dollars to develop its own world class models, nor will it invest hundreds of billions of dollars to construct massive data centers. Nevertheless, Apple will earn hundreds of billions of dollars by selling powerful, energy efficient data servers to its customers and by collecting a share of the fees which its customers and their employees and other associates will pay as subscribers to OpenAI, Anthropic, Google, Microsoft, Meta, xAI, and other developers of world class models.
Editor's P.S.
If Apple does, indeed, develop and deploy its forthcoming Baltra chips in inference servers located in its buyers' facilities, as per the editor's hypothesis, Apple's success in deploying these local servers will have a devastating financial impact on Microsoft, Google, and Nvidia. Their quarterly reports have repeatedly shown that neither Microsoft nor Google has generated substantial income from their own use of large language models; the lion's share of their genAI income has come from their cloud operations. With regard to Nvidia's mega-profitable chips residing in the cloud, a more descriptive code name for Apple's forthcoming, powerful, energy efficient chips might be "Cloud Busters" ... 😎
Your comments will be greatly appreciated ... Or just click the "Like" button above the comments section if you enjoyed this blog note.