Friday, August 15, 2025

Apple moves back to its roots in hardware with “cloud buster” chips

Extensive revision: 8/17/25

This is the first of a series of occasional publications that will track Apple’s development of genAI inference data centers using Apple Silicon, centers at the edges of the Internet that will support billions of devices. Given Apple's policy of minimal publicity during development, there probably won't be more than four more reports before the end of 2026.


Note: 
During “training”, a model learns from a large dataset; during “inference”, a trained model responds to queries from users.


A. Context | B. Future | C. Why care? | D. Nvidia | E. Levels | F. Hardware

A. Context
Here are links to two articles that describe one of the most impressive achievements of Apple Silicon so far:

  • "Mac Studio With M3 Ultra Runs Massive DeepSeek R1 AI Model Locally", Tim Hardwick, MacRumors, 3/17/25

  • "Apple Mac Studio M3 Ultra workstation can run Deepseek R1 671B AI model entirely in memory using less than 200W, reviewer finds", Efosa Udinmwen, MSN, 3/21/25
Readers should note that the Mac Studio is a desktop Mac that contains a single M3 Ultra chip. DeepSeek's R1 model, developed in China, is usually run on arrays of Nvidia chips in the cloud. Desktop Macs run on house current and do not require external cooling systems. 
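For readers who would like to try a scaled-down version of this experiment on their own Macs, here is a minimal sketch that uses Apple's open-source MLX framework via the mlx-lm package. The model identifier and generation settings below are the editor's illustrative assumptions, not details taken from the cited reviews; the full 671B-parameter R1 model requires a maxed-out M3 Ultra, while a small distilled variant runs on ordinary Apple Silicon Macs.

# Minimal sketch: local LLM inference on Apple Silicon with mlx-lm
# (pip install mlx-lm). The model identifier below is illustrative;
# substitute any quantized model that fits in your Mac's unified memory.
from mlx_lm import load, generate

# Download (or reuse a cached copy of) the quantized weights and tokenizer.
model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-7B-4bit")

# Generate a response entirely on the Mac -- no cloud round trip required.
response = generate(
    model,
    tokenizer,
    prompt="Why does unified memory matter for running large models locally?",
    max_tokens=256,
)
print(response)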

BackToTop

B. The future 
In the coming years, large foundation models will still be trained in the cloud; but most users will direct most of their queries to local centers on the edges of the Internet that run large models and/or specialized small models derived from them. 

Although Apple will probably say that its massive collection of data centers is in “The Apple cloud”, the last section of this note will explain why that assertion would mean something quite different from saying that OpenAI’s models run on servers in Microsoft’s Azure cloud. All of Apple’s inference servers will definitely be on the physical edges of the Internet.

BackToTop


C. Why should our computer-savvy readers care about a downward genAI shift from the cloud to the edges?
 

  • High impact on the environment/climate change
    Cloud-based data centers deploy massive arrays of chips from Nvidia and other manufacturers, chips that have high energy requirements. As a consequence these centers strain local power grids and fresh water supplies, degrade air quality, increase carbon emissions, and deplete other local resources.

  • High costs
    The massive arrays of expensive chips in cloud-based data centers run at high temperatures that require expensive cooling systems.
Readers who acknowledge that man-made climate change is an undeniable reality are probably concerned by the deluge of reports in reliable mainstream media that the race to AGI is accelerating climate change and exacerbating other undesirable changes in our environment. They are probably even more concerned to learn that many cloud data centers are satisfying their insatiable demand for energy with fossil fuels. Given a choice between directing a query to ChatGPT in the cloud and ChatGPT on a lower-energy "local" server, they would go local again and again and again.

And then there are the direct costs of cloud services, costs that are expected to drive electricity prices higher and higher for everyone who lives anywhere near the sprawling cloud data centers. Now add the projected higher costs of AI subscriptions to the more powerful models -- $200 per month???  --  the models that will "really" answer users' questions, costs that would be considerably lower if the responses to their queries came from less costly data centers on the edge.
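To make the point about operating costs concrete, here is a rough, purely illustrative calculation. Every figure below is an assumption chosen for the sake of the example, not a measurement, and a cloud node serves many queries at once, so the sketch only conveys orders of magnitude.

# Back-of-envelope electricity cost per query (all figures are illustrative assumptions).
EDGE_WATTS = 200          # assumed draw of a Mac Studio class edge server under load
CLOUD_WATTS = 10_000      # assumed draw of a multi-GPU cloud inference node
SECONDS_PER_QUERY = 10    # assumed time to generate one response
PRICE_PER_KWH = 0.15      # assumed electricity price in USD

def cost_per_query(watts: float) -> float:
    """Electricity cost of one query, ignoring cooling overhead and batching."""
    kwh = watts * SECONDS_PER_QUERY / 3_600 / 1_000
    return kwh * PRICE_PER_KWH

print(f"edge server:  ${cost_per_query(EDGE_WATTS):.6f} per query")
print(f"cloud server: ${cost_per_query(CLOUD_WATTS):.6f} per query")
# Under these assumptions the edge server draws roughly one-fiftieth of the
# energy per query, before counting the cooling systems that cloud racks need.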

D. Apple and Nvidia as Gemini twins
Apple's current opportunity to dominate genAI edge computing is weirdly similar to Nvidia's rise to dominance of genAI cloud computing. First consider Nvidia's milestones:
  • Nvidia started out as a company that made GPU chips for video games. Gamers were constantly clamoring for more complex games, games that required faster and faster GPUs.

  • As it happens, Geoffrey Hinton, one of the so-called “godfathers of AI”, realized that he could run his neural network models much faster on GPUs because neural networks and graphics rendering rely on similar matrix mathematics. 

  • Recognizing that AI models were likely to quickly evolve into a much larger and far more profitable market than video games, Jensen Huang, CEO of Nvidia, redirected most of his company’s efforts to producing faster and faster GPUs for AI. Nvidia recently became the world's first $4 trillion company.
In other words, Nvidia's efforts to produce faster chips in order to create faster games, a frivolous activity, yielded the chips that AI developers needed to improve the performance of their large models, one of the most important tech innovations in human history. 

Now consider Apple's parallel milestones. Under Tim Cook’s leadership, Apple became a $3.5 trillion corporation by focusing its efforts on the 80 percent of its user base who used their iPhones and desktops primarily for recreation (and communication). 
  • They used the cameras on their iPhones to capture photos and videos of themselves, their families, friends, and associates, which they edited and stored on their desktops and then shared on WhatsApp, TikTok, and YouTube -- frivolous social activities. And they had an insatiable craving for higher and higher photo and video resolution.

  • Photo and video processing require far more computing power than text processing, so Apple’s relentless quest for higher resolution required it to design ever more powerful CPU and GPU chips for its phones and desktops, chips that nonetheless had to run on low energy inputs and emit little heat. 

  • Once Apple added neural processing units (NPUs) to its silicon, as in the M1 chip for Macs in 2020, its chips were also positioned to provide engines for genAI data centers at the edges.
As inference moves from the cloud to the edges, leaving model training in the cloud, Apple is poised to dominate the edge segment of genAI services, a market that will surely be worth trillions of dollars ... assuming that Apple moves swiftly to capitalize on its current advantages, as swiftly as did Nvidia when faced with similar opportunities. 


E. Levels of edges ... 2026

Edge data centers offer two advantages over cloud-based data centers that were not mentioned in our previous discussion: 

  • Faster communication with users' client machines because they are much closer to these devices (see the rough latency sketch just after this list) 

  • Stronger guarantees of security for each user because edge centers are accessible to a much smaller number of other users.
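
Here is the rough latency sketch, using illustrative distances: propagation delay alone cannot be engineered away in the cloud, but it nearly disappears at the edge.

# Rough propagation-delay comparison (illustrative distances).
# Light in optical fiber covers roughly 200 km per millisecond (about 2/3 of c).
FIBER_KM_PER_MS = 200

def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip time attributable to distance alone."""
    return 2 * distance_km / FIBER_KM_PER_MS

print(f"Cloud data center ~1,500 km away : ~{round_trip_ms(1500):g} ms")
print(f"Upper-edge server ~1 km away     : ~{round_trip_ms(1):g} ms")
# A lower-edge device (the Mac or iPhone itself) has effectively zero network
# distance. Routing, queuing, and model compute time add far more latency in
# practice, but distance sets a floor that edge servers largely remove.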

It will be useful to distinguish between two levels of edge centers: upper and lower. 
  • All edge data centers are much closer to their users and their users' data than cloud-based data centers are; but there may still be some distance, and hence some travel time, between upper edge centers in office buildings and factories and their users.

  • By contrast, lower edge "data centers" are always nearby because they are the users' own devices, e.g., desktops and smartphones. Apple's M-series chips for desktops and A-series chips for iPhones give these devices high performance without bulky batteries and without running so hot as to cause discomfort for their users.
In other words, Apple's current chips provide an immediate competitive advantage on the lower edge. Although Apple faces substantial competition on the upper edge, the following quote from an exclusive report in The Information indicates that it is developing a more powerful chip, code-named "Baltra", that might provide a comparable advantage there.
  • "Apple is developing its first server chip specially designed for artificial intelligence, according to three people with direct knowledge of the project, as the iPhone maker prepares to deal with the intense computing demands of its new AI features.

  • Apple is working with Broadcom on the chip’s networking technology, which is crucial for AI processing, according to one of the people. If Apple succeeds with the AI chip—internally code-named Baltra and expected to be ready for mass production by 2026—it would mark a significant milestone for the company’s silicon team.

  • The team honed its expertise designing cutting-edge chips for the iPhone before later advancing to designing Mac processors that set new standards for performance and energy efficiency.

  • The move could cut costs and help Apple scale up AI services to support billions of devices"

  • "Apple Is Working on AI Chip With Broadcom", Wayne Ma and Qianer Liu, The Information, 12/11/24

F. Blog editor's hypothesis ➡ Apple will return to its hardware roots ... again

Although Apple has a massive array of servers in the Apple cloud, it has not sold servers to its customers since around 2011. That’s about to change. 

  • The energy-efficient Baltra chips discussed in the previous section will be the engines for a new line of servers that Apple will sell to customers. These servers will be powerful enough to run full versions of the leading inference models produced by OpenAI, Anthropic, Google, and others. 

  • These servers, these data centers, will be as close as possible to their owners because they will sit in their owners' own facilities, so they will provide the fastest communication with their owners' devices. And they will provide the greatest security because their owners will determine who can use them.

  • Apple will offer at least one line of servers: the smallest would contain only one Baltra chip; the largest would contain N chips, where N is still small enough that the server does not generate enough heat to require an expensive external cooling system.
  • Apple will license inference models from the world's leading developers for mutually beneficial fees. It will then work with those developers to configure their models to fit into Apple's ecosystem so that these edge servers work seamlessly with iPhones, Macs, and other Apple devices.

  • Given the large size of inference models, they will probably be installed on the servers before the servers are delivered to their buyers.

  • As components in Apple’s ecosystem, the servers will receive updates to the models via Apple’s cloud when updates become available.

  • Apple will advise prospective purchasers of its servers as to how many chips they will need based on their estimated usage of the large language models they would like to install (a rough sizing sketch follows this list). The servers will probably be designed so that customers can add more chips if their actual usage exceeds their initial estimates.

  • Given that Apple's servers will have substantially lower operating costs, the cost of responding to users’ queries will be substantially lower than the cost of responding from inference models in the cloud. Accordingly, the negotiated licenses might specify that Apple will receive a share of the subscription fees AND that the fees charged to a server’s owners (and their associates) be substantially lower than the fees charged to subscribers to cloud-based services, even for the most complex "thinking” models and agents.
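
Here is the rough sizing sketch promised above. Every figure is an assumption invented for illustration; Apple has published no throughput numbers for Baltra, so the point is only to show the shape of the calculation a purchaser and Apple might walk through together.

import math

# Illustrative capacity-planning sketch (all figures are assumptions, not specifications).
TOKENS_PER_SEC_PER_CHIP = 50    # assumed sustained generation rate of one chip
TOKENS_PER_QUERY = 500          # assumed average response length
QUERIES_PER_USER_PER_HOUR = 6   # assumed usage during the busiest hour
PEAK_CONCURRENT_USERS = 400     # assumed number of staff querying at peak

def chips_needed() -> int:
    """Smallest number of chips that covers the assumed peak-hour demand."""
    tokens_per_second = (PEAK_CONCURRENT_USERS
                         * QUERIES_PER_USER_PER_HOUR
                         * TOKENS_PER_QUERY) / 3_600
    return math.ceil(tokens_per_second / TOKENS_PER_SEC_PER_CHIP)

print(f"Estimated chips required at peak: {chips_needed()}")  # 7 under these assumptions
# If actual usage outgrows the estimate, the hypothesis above assumes the owner
# could add chips to the server rather than replace it.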

Conclusion

If the editor’s hypothesis is correct, after 2026 members of the Apple community will gain access to reduced or distilled versions of inference models on their iPhones and ordinary desktop Macs. Some will gain access to full versions running on powerful Mac Studios. But most will access full versions of world-class inference models primarily via local Mac servers. 


In other words, Apple will not invest hundreds of billions of dollars to develop its own world-class models, nor will it invest hundreds of billions of dollars to construct massive data centers. Nevertheless, Apple will earn hundreds of billions of dollars by selling powerful, energy-efficient data servers to its customers and by collecting a share of the fees that its customers and their employees and other associates will pay as subscribers to OpenAI, Anthropic, Google, Microsoft, Meta, xAI, and other developers of world-class models.

Editor's P.S.
If Apple does, indeed, develop and deploy its forthcoming Baltra chips in inference servers located in its buyers' facilities, as per the editor's hypothesis, Apple's success in deploying these local servers will have a devastating financial impact on Microsoft, Google, and Nvidia. Their quarterly reports have repeatedly shown that neither Microsoft nor Google has generated substantial income from its own use of large language models; the lion's share of their genAI income has come from their cloud operations. And with regard to Nvidia's mega-profitable chips residing in the cloud, a more descriptive code name for Apple's forthcoming, powerful, energy-efficient chips might be "Cloud Busters" ... 😎


