Preface …
This is a long speculative piece that was written as part of a series of notes the editor is writing in response to questions that had been nagging him for the first six months of 2025:
- After Apple’s market capitalization peaked at $3.5 trillion last year, why hasn’t its stock been in inexorable decline throughout 2025? Why isn’t its market cap down to $2.9 trillion or even $2.7 trillion?
Facing declining sales in China, declines triggered by the Chinese government’s policies … plus a humiliating failure to “upgrade Siri” as a direct consequence of its late and grossly underfunded entry into the Big Tech generative AI race … plus mounting EU regulatory challenges … plus President Trump’s loudly proclaimed displeasure over Apple’s moving its manufacturing base from China to India instead of to the U.S. … What has Tim Cook been quietly telling Apple’s largest shareholders, like Warren Buffett, that has kept them from racing to the exit doors?
The editor’s response to his own questions is an optimistic speculation that Apple’s relentless pursuit of ever higher profitability has caused it to repurpose its super energy-efficient Apple Silicon in a way that will challenge the hegemony of the current "Lords of the Cloud-based Data Centers". Their predictably fierce competitive responses will yield a satisfactory resolution of the increasingly disastrous problems posed by their hyper-scale cloud operations.
The problems
Building and deploying larger and larger models in cloud-based data centers requires larger and larger supplies of energy. Readers who are unaware of the growing magnitude of this calamity are referred to the following recent reports:
- “AI’s promise is still taking shape. The costs of its data centers are already here.", Adam Rogers, Hannah Beckler, and Dakin Campbell, Business Insider, 7/17/25
- "Big Tech’s A.I. Data Centers Are Driving Up Electricity Bills for Everyone", Ivan Penn and Karen Weise, NY Times, 8/14/25
- Readers who want to keep abreast of the full scope of the negative externalities imposed on surrounding regions and on global climate change by the relentless growth of AI cloud-based data centers are referred to the Environment/Climate section of the editor's Useful AI News page.
- Perhaps the most perplexing aspect of this increasingly expensive Big Tech race to build ever larger data centers, in order to develop ever "smarter" models at the cost of ever greater negative externalities, is its lack of a firm foundation in any kind of underlying science. Readers, hopefully a small minority, who are unaware of this fact are referred to its extensive discussion in Appendix #2.
A solution
A solution to this problem that will be explored in these occasional notes is inference servers running on energy-efficient Apple Silicon — chips designed by Apple and fabricated by TSMC.
During its “training” phase, a model learns from a large dataset; during its subsequent “inference” phase, the trained model responds to queries from its users. Large models will continue to be trained in the cloud, but if Apple sells enough inference servers, servers that house its energy-efficient chips, a large share of inference will move from the cloud down to these local servers, thereby greatly reducing the energy demands and disastrous impact of the remaining energy-inefficient cloud-based servers.
1. R1 on Macs | 2. Customer Servers | 3. Home Centers | 4. Enterprise Centers |
5. Commercial Centers | 6. Anonymity & Payments | 7. Upgrading Siri |
8. Rollout | 9. Competition | Appendix 1 | Appendix 2
1. R1 on Mac Studios
- "Mac Studio With M3 Ultra Runs Massive DeepSeek R1 AI Model Locally", Tim Hardwick, MacRumors, 3/17/25
- “Apple Mac Studio M3 Ultra workstation can run Deepseek R1 671B AI model entirely in memory using less than 200W, reviewer finds”, Efosa Udinmwen, MSN, 3/21/25
Here are a few quotes from an exclusive report in The Information … “Apple Is Working on AI Chip With Broadcom", Wayne Ma and Qianer Liu, The Information, 12/11/24
- "Apple is developing its first server chip specially designed for artificial intelligence, according to three people with direct knowledge of the project, as the iPhone maker prepares to deal with the intense computing demands of its new AI features."
- "Apple is working with Broadcom on the chip’s networking technology, which is crucial for AI processing, according to one of the people. If Apple succeeds with the AI chip—internally code-named Baltra and expected to be ready for mass production by 2026—it would mark a significant milestone for the company’s silicon team."
- "The team honed its expertise designing cutting-edge chips for the iPhone before later advancing to designing Mac processors that set new standards for performance and energy efficiency."
- "The move could cut costs and help Apple scale up AI services to support billions of devices"
What might Apple do with such a chip? At least three possibilities come to mind:
- Create a hyper-scale cloud in competition with Amazon, Google, Microsoft, and Oracle.
- Greatly expand iCloud to include inference servers for the Apple ecosystem.
- Sell the new servers to customers who will use them to host inference servers within the Apple ecosystem.
- "Apple’s Secret Plan to Challenge AWS With M-Series Chips", Vikhyat, iGeeksBlog, 7/4/25
- Compact, powerful Mac Studios will be preferred by both subgroups, but the number purchased will be determined by their anticipated usage levels.
- Only designated employees and associates of professional enterprises will have access to the inference servers. Indeed, some employees and associates will be allocated home centers so they can work remotely and/or after hours. Professional centers will be accessible to all Apple devices, including iPhones.
- Educational centers will be accessible to designated employees and associates via all Apple devices. Some students will be given access via Apple devices in highly controlled labs, but probably not from their iPhones.
- Most enterprises employ tech support staff who, with proper training, will be able to work with iCloud’s techs to manage the servers, thereby reducing iCloud’s workloads. So the rule will be “Don’t Touch” only for non-techs, but “Handle With Care” for techs.
- A larger enterprise should therefore purchase the same kinds of servers and employ the same kinds of highly skilled tech support staff as an allied data center.
- Alternatively, Apple could negotiate contracts with larger enterprises that would provide dedicated inference servers in nearby allied data centers. These servers would be maintained by the data center's staff but would only be accessible to employees of the large enterprise.
-- The members of one particular employee subgroup are likely to become the heaviest users of these servers: the software engineers charged with writing the programs that enhance the productivity of their enterprises. Indeed, coding has proven to be the most reliably cost-effective use of generative AI.
- Apple will collect the monthly fees that users pay for the use of inference models, then pass these payments along to the models' owners.
- Apple will also collect the rental payments that owners of inference models pay to the owners of the data centers to cover the costs of hosting their models. Apple will pass these payments along in a manner described in section 8 (“Rollout”) of this note.
For the last two years, Apple’s top managers have characterized its late and grossly underfunded entry into the generative AI race as “upgrading Siri”. Those of us who have been tracking this race found this characterization to be absurd, or at best, misguided.
Apple is a hardware company that has always acquired its most important software -- including Siri and Safari -- from outside, then adapted the software to fit into its closely interconnected ecosystem. (See the note on this blog: Apple's dilemma: Pleasing its iPhone users AND its long term investors, Appendix #1, 7/8/25)
That Apple should fail to develop competitive foundation models was disappointing, but not surprising because foundation models are, perhaps, the most difficult software applications ever developed by anyone, anywhere.
However, to see Apple now perched on the verge of becoming one of the dominant players in generative AI based on its superb innovations in the design of energy efficient Apple Silicon is a welcome surprise of the best kind.
Siri is not a chatbot. Siri is a personal assistant. Unlike a chatbot, Siri is not supposed to know everything about everything. It only needs to know how to help its users get through various situations in their personal lives. To do so, Siri needs to know how to manage some of the same tools that its users might use.
- Prior to the advent of generative AI, Siri knew how to manage email, calendars, text messaging, phone calls, and other mundane applications.
- In the context of generative AI, “upgrading Siri” should mean enabling Siri to manage agents based on large foundation models. Agents will be downloaded from the App Store.
- Agents can be quite powerful. So enabling Siri to manage agents will make Siri a far more powerful personal assistant than it was before.
- In the past, Siri’s behavior was based on a fixed set of rules; but the new Siri’s behavior will be guided by a language model.
- The good news is that the underlying language model need not be a large language model; it can be a small language model, small enough to fit comfortably on a user’s iPhone.
- If Apple follows its own considerable precedents, it will not develop a small language model for Siri. It will acquire a model developed by someone else, then retrofit it to operate seamlessly within Apple’s closely interconnected ecosystem.
The generative AI community has developed a vast and ever-growing collection of agents. Here is a very partial list of the types of agents that Apple might enable Siri to manage:
- Travel logistics – not just hotels/restaurants, but flights, car rentals, ride-shares.
- Shopping and delivery – ordering groceries, consumer goods, or meal delivery.
- Financial helpers – paying bills, tracking expenses, moving money between accounts.
- Healthcare – booking doctor appointments, refilling prescriptions, checking insurance.
- Media & entertainment – finding movies, booking tickets, curating playlists or podcasts.
- Customer service – handling returns, tracking shipments, troubleshooting orders.
- Anonymity of users
Only Apple will know who is using the agents. Neither the developers nor the allied data centers will know.
- Billing and payments
Apple will collect and distribute all payments for user subscriptions and center charges to model owners so that no one is overcharged or underpaid.
- Siri compliance
Admission of agents to the App Store will require stricter specifications, and stricter enforcement of those specs, than for other kinds of apps. Strict adherence to the specs will enable Siri to manage agents autonomously in the user's absence. Users can still sideload agents from outside the App Store, but those users will have to manage the apps themselves, thereby forfeiting the biggest benefit of Siri's upgrade.
Accordingly, Apple should collect no fees from developers when their agents are admitted to the App Store, a position that should greatly reduce demands from some developers that users be able to download their apps from sources outside the App Store.
- Apple's zero knowledge
Apple will have to add another layer of complexity to its management processes so that Apple itself cannot determine the content of a user's interactions with agents. It will employ end-to-end encryption and/or whatever other techniques it has used to achieve similar objectives previously, as in its "Private Cloud Compute" (PCC).
- Heavily discounted subscription fees
Agents based on foundation models assume that their users will have paid access to the models for best performance. A large majority of Apple's user community currently has no access, or only free subscriptions, to various chatbots. Appendix #1 will argue that the monthly fee for ChatGPT on home servers should be $10, instead of the $20 that OpenAI charges for use of ChatGPT in cloud-based data centers. The lower subscription fee reflects the lower costs that OpenAI will pay to Apple's local allied data centers for hosting OpenAI's models on energy-efficient Apple Silicon.
Appendix #1 will also argue that Apple should allocate $7 of the $10 to OpenAI, instead of the $5 OpenAI usually receives from the $20 fee, thereby increasing OpenAI's income from each subscription and making Apple users "preferred" users.
Siri is a "personal" assistant, so Apple will not provide this discount to users who deploy their GPT Plus accounts as part of their business operations. Apple's allied data centers will block accounts that try to use Siri to make high-volume GPT Plus requests for access to inference models, e.g., accounts that attempt to access a model 75 times in one day.
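To make the proposed money flow concrete, here is a minimal sketch in Python. The $10 subscription, the $7 royalty, the $3 remainder, and the 75-requests-per-day threshold are the note's own numbers; every function and field name below is a hypothetical stand-in, not anything Apple has announced.
```python
from dataclasses import dataclass

# Hypothetical names throughout; only the dollar amounts and the
# 75-requests-per-day threshold come from the note itself.

@dataclass
class FeeSplit:
    subscriber_pays: float   # monthly fee Apple collects from the user
    model_owner_gets: float  # royalty passed along to, e.g., OpenAI
    hosting_gets: float      # remainder that covers hosting on Apple Silicon

def split_monthly_fee(subscription: float = 10.0, royalty: float = 7.0) -> FeeSplit:
    """Apple collects the subscription, passes the royalty to the model
    owner, and the remainder covers hosting."""
    return FeeSplit(subscription, royalty, subscription - royalty)

def is_high_volume(requests_today: int, daily_limit: int = 75) -> bool:
    """Flag accounts that look like business use of a personal plan."""
    return requests_today >= daily_limit

print(split_monthly_fee())  # FeeSplit(subscriber_pays=10.0, model_owner_gets=7.0, hosting_gets=3.0)
print(is_high_volume(75))   # True: candidate for blocking
```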
- Here's a recent relevant report, "Apple working on MCP support to enable agentic AI on Mac, iPhone, and iPad", Marcus Mendes, 9to5Mac, 9/22/25
- The Mac Studio containing an M3 Ultra chipset that ran the R1 model referenced in Section 1 of these notes cost about $15,000. It had 512GB of unified memory, four times the 128GB maximum of a single M3 Max (see Appendix #1).
We will assume that the Mac Studios in the Home Data Centers will contain M6 Ultra chipsets that are five times as powerful as the M3 Ultras. They will have 2.5TB (2.5 terabytes) of unified memory by combining the memories of four 640GB chips and might cost $30,000.
- These households can afford to buy compact, high-powered $30,000 Mac Studios. They will need little or no persuasion to purchase home data centers that will protect their privacy and greatly reduce the negative impact of their generative AI activities on local environments and on global warming. The following discussion will also show that they will pay lower monthly fees, with fewer limits on their generative AI activities, on their home servers than on cloud-based servers.
- If only 10 percent of the most affluent households (roughly 15 million in the U.S.) buy home centers each year, their purchases would generate 1.5 million X $30,000 = $45 billion in annual sales for Apple.
- Apple might deliver the Mac Studio plus an external drive containing an encrypted copy of an inference server, e.g. OpenAI’s GPT-5, to the buyer's home address.
- The server would come pre-configured to send a notification requesting that the admin responsible for the encrypted drive (here, the buyer) provide a QR code that tells the server how to decrypt the drive.
- The buyer would come to the server, be verified via facial recognition by the server's camera, then display to that camera the QR code that iCloud had sent to the buyer's iPhone when the server and the external drive were shipped.
- The QR code would contain instructions and a token. The instructions would direct the server to contact iCloud and present the token as proof that it was, indeed, the server that Apple had shipped to the buyer.
- iCloud would provide the key for decrypting the external drive and would configure the server to become the default server for GPT-5 for all devices on the owner’s home network, i.e., all devices having the same Apple ID as the owner. For example, all Macs, iPads, and iPhones would be directed to ChatGPT running GPT-5 on the home server rather than to the chatbot and its underlying model in the Azure cloud.
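Since this handshake is described only in prose, here is a schematic sketch of how it might look in code. Everything here (class names, token format, the use of SHA-256) is the editor's guesswork for illustration; Apple's real mechanism would presumably resemble its existing device-enrollment and Private Cloud Compute attestation flows.
```python
import hashlib
import secrets

class ICloudStub:
    """Hypothetical stand-in for the iCloud side of the handshake."""

    def __init__(self):
        self.expected = {}    # server_id -> SHA-256 digest of the QR token
        self.drive_keys = {}  # server_id -> key for the encrypted drive

    def ship(self, server_id: str) -> str:
        """Called at shipping time; returns the token embedded in the QR
        code that iCloud sends to the buyer's iPhone."""
        token = secrets.token_hex(16)
        self.expected[server_id] = hashlib.sha256(token.encode()).hexdigest()
        self.drive_keys[server_id] = secrets.token_hex(32)
        return token

    def redeem(self, server_id: str, token: str) -> str | None:
        """The server presents the token it scanned from the buyer's QR
        code; iCloud returns the drive's decryption key if it matches."""
        if self.expected.get(server_id) == hashlib.sha256(token.encode()).hexdigest():
            return self.drive_keys[server_id]
        return None

cloud = ICloudStub()
qr_token = cloud.ship("mac-studio-0001")         # when server and drive ship
key = cloud.redeem("mac-studio-0001", qr_token)  # after facial verification
assert key is not None  # the server can now decrypt the external drive
```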
In the first note in this series, the editor referred to the fees paid by the users of a home inference server as "subscription fees". That was incorrect. The portion of the fees paid to OpenAI for using a copy of its GPT-5 inference model on a home server is a royalty, the payment to which OpenAI is entitled for the use of its intellectual property.
- For example, older members of the family might agree to let younger members have priority during early evening hours when younger members were trying to do their homework.
Enterprise centers will differ from home centers in several respects:
- Many Mac Studios. How many? That would depend on the anticipated number of users and the intensity of their use.
- The tech skills of their support staff. More competent tech staff could mean less need for iCloud support, and therefore lower fees paid to Apple by the owners of the inference models for iCloud support.
- Longer time for sales. Enterprise buyers would need more evidence from Apple of the cost-effectiveness of its local inference models compared to cloud-based models.
- More complex self-imposed usage limits. More complex limits would require more extensive iCloud management, and therefore higher Apple charges to the owners of the models.
- When Siri is upgraded, probably in early 2026, Siri will manage agents (downloaded from the App Store) that utilize foundation models in the cloud for all users because home and enterprise data centers will not yet exist.
- But when home data centers appear, Siri will perform tasks for linked users using agents that use the foundation models in their home data centers.
- Similarly, when enterprise data centers appear, Siri will perform tasks for linked users via agents that use the foundation models in their enterprise centers.
- Finally, as nearby allied commercial data centers with sufficient capacity appear, iCloud will direct Siri to perform tasks for unlinked users via agents tied to the foundation models in those nearby commercial centers. There will be at least 100 million unlinked iPhone users in the U.S.
Question: Why will commercial centers buy so much expensive Apple Silicon?
Answer: Because the entrepreneurs who establish these centers will anticipate substantial profit margins from their status as the newest components of Apple's ecosystem.
- Allied data centers will have no marketing expenses
-- Apple will provide each center with high volume/low value users, i.e., hundreds of thousands of unlinked users of Apple devices.
-- Apple will also negotiate contracts that will provide dedicated servers in nearby allied data centers for low volume/high value users, i.e., scores of larger enterprises.
- Nevertheless, Apple's allied data centers will earn payments from their users that will provide substantial returns on the centers' investments, i.e., their purchase of Apple's inference servers.
- Small group of partners in U.S. (and elsewhere)
Let’s say we begin with the notion of one data center in each state, 50 centers. Too many partners. But if each partner eventually managed five data centers, 10 partners would be a manageable group.
And one other modification: no ally can have more than two data centers in one state, but there is no upper limit on the size of its centers. Its other three centers must be in three different nearby states. The core idea here is a balance of concentration and geographical dispersion; other numerical distributions could also provide this balance.
- Sustainable energy
All of Apple’s iCloud facilities run on sustainable energy, so Apple would require the same commitment from its partners. Given the high energy efficiency of its chips, no high-powered cooling systems are required.
Empty warehouses near fiber optic infrastructure for high speed Internet connectivity would be ideal locations once they are equipped with solar panels on the roofs and battery storage to provide sustainable, round-the-clock power.
-- "Apple now globally powered by 100 percent renewable energy", Apple, 4/9/18
As argued in Appendix #1, the data centers will receive a $3 per month payment from OpenAI for running its inference model for each of the 100 million unlinked users, i.e., $300 million per month across all centers.
- If the agent requests additional information, Siri will provide it based on its knowledge of its user's preferences.
- If Siri can't confirm the reservation until 30 minutes later, because so many agents are trying to make reservations using the same inference server in the data center, the 30-minute delay from request to response is not only acceptable; most of the time it won't even be noticed by the user.
- Meanwhile, if the user generates another request, for a reservation or some other task, Siri, the manager rather than the actual performer, will simply pass the request to the appropriate agentic app, then wait for its response before returning that response to the user (a toy sketch of this handoff follows).
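As a toy illustration of Siri as a manager rather than a performer, here is a short asyncio sketch. The agent, its delay, and all names are hypothetical stand-ins; the point is only that requests are dispatched immediately and responses are returned whenever the busy inference server gets around to them.
```python
import asyncio

async def reservation_agent(details: str) -> str:
    """Hypothetical agent app; the sleep stands in for a slow, queued
    call to a heavily loaded inference server."""
    await asyncio.sleep(2)
    return f"Reservation confirmed: {details}"

async def siri(requests: list[str]) -> None:
    """Siri dispatches every request at once, then relays each agent's
    response to the user as it arrives."""
    tasks = [asyncio.create_task(reservation_agent(r)) for r in requests]
    for reply in await asyncio.gather(*tasks):
        print(reply)

asyncio.run(siri(["dinner for two at 7pm", "taxi at 6:30pm"]))
```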
- When an unlinked user's device is first linked to a data center, iCloud will assign the unlinked user to the center that is the closest to the user's device. This will be the device's default data center.
- From time to time iCloud will monitor the current workloads of all of the servers in all of the allied data centers.
- When Siri assigns a task to an agentic app, iCloud will point the app to the server in the user's default data center that had the smallest workload in iCloud's last survey.
-- But if the designated server reports that its workload has increased too much, iCloud will try to point the app to another server in the default center.
-- But if all of the servers in the default center inform iCloud that their current workloads are too high, iCloud will try to point the app to a server in the closest center owned by the same ally and restart the selection process in the second data center.
-- The main point of this process is that iCloud will try to find a server that will provide the fastest possible response to a user's request ... but the owners of other data centers should not have to back up the owner of a default data center who did not buy enough inference servers to provide timely service during peak load times.
- However, if Apple mandates that each data center initially buy at least 10 M6 Ultras, iCloud's first choice of server in the user's default data center will usually be available ... in the short run.
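The selection process just described reduces to a few lines of code. This is a minimal sketch under assumed data structures; the 0.9 utilization threshold and all field names are invented, since the note specifies only the order of fallbacks.
```python
OVERLOADED = 0.9  # hypothetical utilization threshold from iCloud's last survey

def pick_server(default_center: dict, same_ally_centers: list[dict]) -> dict | None:
    """Try the least-loaded server in the user's default center, then
    restart the search in the same ally's nearest centers."""
    for center in [default_center] + same_ally_centers:
        for server in sorted(center["servers"], key=lambda s: s["load"]):
            if server["load"] < OVERLOADED:
                return server
    return None  # every server owned by this ally is saturated

default = {"servers": [{"id": "s1", "load": 0.95}, {"id": "s2", "load": 0.97}]}
nearest_backup = {"servers": [{"id": "s3", "load": 0.40}]}
print(pick_server(default, [nearest_backup]))  # falls through to s3
```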
- The cost of the initial 10 Ultras in the 50 U.S. data centers = 10 x 50 x $35,000 = $17,500,000. The income during the first year = 100,000,000 unlinked users x $3 per month from OpenAI = $300,000,000 per month, i.e., $3.6 billion per year. This would yield a $3,600,000,000 / $17,500,000 = roughly 200 to 1 return on their investments ... That's very high, too high ... way too high.
In other words, the short run begins to end with initial user curiosity. Its end is hastened as more and more users' curiosity solidifies into habit as they learn firsthand that generative AI can do so much more than make reservations ... that generative AI chatbots can provide plausible answers to their questions about anything ... and the short run comes to a dead screeching halt if/when enough users’ habit of using chatbots frequently morphs into addiction wherein they interact with chatbots every day, many times a day for longer and longer periods.
- The revised cost of the initial 10 Ultras in the 50 U.S. data centers = 10 x 50 x $35,000 = $17,500,000, as before. But the revised income during the first year = 100,000,000 unlinked users x $1 per month from OpenAI = $100,000,000 per month, i.e., $1.2 billion per year. This would yield a $1,200,000,000 / $17,500,000 = roughly 69 to 1 return on their investments ... That's still a very high return, so it should still provide a strong inducement for entrepreneurs to build data centers that will become the newest components of Apple's ecosystem.
- Apple might pass along $2 of the $3 monthly payment from OpenAI to its allied centers after they install their second set of 10 M6 Ultras, and the full $3 paid by OpenAI after the centers purchase their third set of 10 M6 Ultras (and thereafter, provided iCloud's algorithms have achieved the required accuracy in predicting the number of servers each data center really needs to make things "just work" no matter what happens).
- At the end of the second year, the data center would recoup the $30,000 it paid to Apple for the server plus a $10,000 profit. As an added benefit, the data center would be able to reassign the “old” Ultra to the pool of inference servers it deploys in support of Siri’s autonomous management of agentic services for linked users.
- The larger enterprise customer can also do this math, but would probably find it attractive nevertheless. A $20,000 annual dedication fee is better than a $30,000 purchase if it also comes with a stipulation that should Apple produce a more powerful chip before the end of the second year, the customer’s “old” Ultra will be replaced by a new one. So the marginally higher price provides a guarantee that the customer will always have access to Apple’s state-of-the-art technology.
- Given the substantially lower costs of running inference models on Apple Silicon, Apple should face no significant obstacles to securing hundreds of larger enterprise customers for each of its 50 allied data centers.
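Pulling the last few bullets together, here is the payment arithmetic as a sketch. The $1/$2/$3 tiers, the $20,000 annual dedication fee, and the $30,000 server price are the note's own numbers; the function names are invented.
```python
def monthly_payout_per_user(ultra_sets_installed: int) -> float:
    """Dollars per unlinked user per month that Apple passes to an allied
    center: $1 with one set of ten M6 Ultras installed, $2 with two, and
    the full $3 OpenAI pays once three or more sets are in place."""
    return float(min(max(ultra_sets_installed, 0), 3))

def dedicated_server_margin(years: int = 2, annual_fee: int = 20_000,
                            server_price: int = 30_000) -> int:
    """An allied center's margin on a dedicated enterprise server."""
    return years * annual_fee - server_price

for sets in (1, 2, 3, 4):
    print(sets, "sets ->", monthly_payout_per_user(sets), "per user per month")
print("two-year margin on a dedicated server:", dedicated_server_margin())  # 10000
```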
- The Verge recently reported that Nvidia is investing $5 billion in Intel to jointly develop PC and data center chips. Nvidia knows what Apple is up to. So this is probably Nvidia's initial response, investing billions in Intel to develop more energy-efficient chips.
c) U.S. vs China
Here's another closely related recent story: "China bans tech companies from buying Nvidia’s AI chips", Zijing Wu, Cheng Leng, Tim Bradshaw, Financial Times, 9/17/25. This story was also covered by Ars Technica.
China is about the same size as the U.S., but its population is four times as large. Therefore the relentless growth of today's cloud-based data centers, centers that deploy Nvidia's energy-inefficient chips, will have a disastrous impact on China's environment and on climate change roughly four times as large as whatever will be experienced by the U.S.
China is Apple's second biggest market, so the Chinese understand Apple’s strategies. They know, far better than the editor of this blog, that Apple is just a few years away from announcing a game changing breakthrough. So the Chinese have decided to develop their own energy-efficient chips for training and inference; but they might follow a different strategy than Apple’s.
Appendix #2 presents the editor's rationale for anticipating that current competitive efforts will not be resolved by national governments picking winners and losers, choosing favorites based on flattery and backstabbing.
It’s far more likely that the brutal but constructive clashes of profit-driven competitors in free markets will anoint winners in multiple market segments, but not because any of them has achieved AGI (artificial general intelligence).
Indeed, the continued absence of an underlying science to guide the efforts of the world’s most brilliant and best funded software engineers makes it highly unlikely that AGI will be achieved. The winners will win because markets will determine that their software has somehow become substantially more cost-effective than the software of competitors in their segments. And Apple will stand in all of the winners’ circles because its ultra energy-efficient hardware will facilitate all of the winning software.
Components of Apple Silicon
- Combining the two most powerful M1 Max chipsets, each with 64GB of memory, yielded the M1 Ultra with 128GB, 2 times as much as each M1 Max
- Similarly, combining the two most powerful M2 Max chipsets, each with 96GB of memory, yielded the M2 Ultra, with 192GB, 2 times as much as each M2 Max.
- But when Apple produced the M3 Ultra, it made an important breakthrough. The most powerful M3 Max had 128GB memory, but the memory in the M3 Ultra was 512GB, four times the memory in each M3 Max.
- However, the M4 Max only had 128GB memory, the same as the M3 Max; and Apple has yet to release an M4 Ultra.
- Apple has indicated that its M5 chipsets will be released in late 2025 or early 2026 ... but it has given no indication that it will increase the memory in the M5 Max beyond 128GB.
- "Apple has a massive contract with TSMC to use their most advanced manufacturing processes. The technology you're thinking of is TSMC's family of 3-nanometer (3nm) chips.
A "nanometer" refers to the size of the transistors—the tiny on-off switches that are the building blocks of a chip. Moving from a 5nm process to a 3nm process means you can pack billions more transistors into the same space.Here’s what that enables:
1. Denser Connections: By making everything smaller, the metal lines (interconnects) that connect the processors, GPU, and unified memory are packed much closer together. This creates a shorter, wider "highway" for data to travel.
2. Higher Bitrate: Because that highway is shorter and has more lanes, data can travel between the different parts of the chip and the unified memory much faster and with less energy. This increased speed is what we call higher bandwidth or bitrate. It's crucial for tasks that move huge amounts of data, like AI and high-resolution video editing.
3. Improved Efficiency: Shorter travel distances for electrical signals also mean less power is wasted as heat. This is why M3 and M4 chips can deliver incredible performance while maintaining excellent battery life."
- "Volume advantage for learning:
-- iPhone market: 100+ million users in the US alone (massive volume)
-- Mac market: 10-15 million units (much smaller volume)Apple’s smart staged approach:
-- M5 [cores] get the 3nm process first - work out manufacturing kinks on the high-volume iPhone production ... iPhone production = learning laboratory - TSMC and Apple perfect the 3nm manufacturing process on hundreds of millions of chips-- M6 benefits from mature process - by the time M6 launches, the 3nm process is proven and optimized
Why this makes business sense:
-- Risk mitigation: Test new process on high-volume, lower-cost chips first
-- Cost optimization: Spread 3nm development costs across massive iPhone volumes
-- Quality assurance: Iron out any 3nm manufacturing issues before moving to expensive Mac chips
-- Confident scaling: When M6 launches, TSMC can confidently handle the more complex memory configurations”
This section provides a method for devising a rough estimate of the relative cost of running inference servers in an allied Apple data center compared to the cost of running them in an AI cloud-based data center.
It makes some assumptions based on the editor's reading of a variety of tech media. However, it also assumes that Apple will use far more accurate numbers if it applies a method such as the one suggested in this section.
a) R1 on M3 Ultra
The first section of the note that precedes this appendix called the reader’s attention to a breakthrough in the use of Apple Silicon in March 2025: a DeepSeek R1 model was run on a Mac Studio containing an M3 Ultra with 512GB of unified memory. Two caveats are needed now.
- The demonstration did not run a full version of the R1 model; a smaller, 4-bit quantized version was deployed. 4-bit quantization produces a model that is about 95% as accurate as the original. Apple will not be competitive if it can only run quantized versions of inference models. Apple will have to run full models.
- The R1 model had an unusual structure: it used the so-called mixture-of-experts architecture. This meant that the entire 671-billion-parameter model was never in unified memory; only parts of it were, which is why the model fit into the 512GB of unified memory. Most models do not employ this architecture. So, once again, the M6 Ultra will have to have enough unified memory to run the largest prominent models.
Today’s models seem to be larger than one terabyte (1TB), and by the end of 2027 we should expect many models to be closer to 2TB. We also need to allow for RAG techniques that would add more information into unified memory. So it would be better if the M6 Ultra contained at least 2.5TB of unified memory.
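Some back-of-the-envelope arithmetic makes both caveats concrete. The 671-billion-parameter count comes from the R1 reports cited in Section 1; the bytes-per-parameter figures are the standard sizes for 4-bit and 16-bit weights, and 16-bit is assumed here to represent a "full" model.
```python
PARAMS = 671e9  # DeepSeek R1: 671 billion parameters
GB = 1e9

print(f"4-bit quantized: {PARAMS * 0.5 / GB:,.0f} GB")  # ~336 GB, fits in 512 GB
print(f"16-bit full:     {PARAMS * 2.0 / GB:,.0f} GB")  # ~1,342 GB, well over 1 TB
```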
The rest of this discussion will use OpenAI models supporting ChatGPT as an example. Similar estimates could be made for other models.
b) Subscription fees in the cloud
The editor has read a number of tech articles suggesting that the GPT Plus $20 a month subscription fee has two components. The larger component, Microsoft’s charge for running the model on its Azure cloud, is about $15. The remaining $5 is what OpenAI collects for the use of its models, its intellectual property.
- Accordingly, Apple will suggest that the much smaller electrical demands of its allied centers deploying Apple Silicon will enable them to charge OpenAI much less for running its models.
- But OpenAI subscribers should at least pay the same $5 royalty to OpenAI for the use of its intellectual property.
c) Apple Silicon vs. chips in the cloud
The electric bill is the main component of the operating cost of a cloud data center. So how does the electricity required by ordinary GPU chips in the cloud compare to the electricity required by Apple Silicon? Here is an article that reported an extensive examination of the performance of the M3 Ultra chip:
- "Mac Studio M3 Ultra Tested: Ultimate Power, But for Who?", Otabek Djuraev, Hostbar, 3/28/2x
Scrolling far into the article, the reader will find the author’s assessment of the extraordinary energy efficiency of the Ultra chip:
“One particularly impressive aspect of the M3 Ultra's performance is its energy efficiency. An electricity monitor measured power usage between 160 and 180 W. Power consumption at the wall when running the DeepSeek R1 model was well under 200 watts. Comparable performance on a PC would require multiple high-end GPUs consuming perhaps 10 times more electricity.”
In other words, the M3 Ultra Mac Studio running the Deep Seek R1 inference model consumed well under 200 watts, while comparable performance on a PC would require multiple high-end GPUs consuming perhaps 10 times more electricity - representing approximately 90% power savings.
d) Linear relationships rarely scale, especially the good ones
It’s delightful to discover that the M3 Ultra with 512GB unified memory only required 10% of the power that a cloud chip would require for a small inference model. But that does not necessarily mean that the M6 Ultra would only require 10% of the power to run a 5 X 512GB = 2.5TB inference model. In the worst case, it might require 5 times as much power.
But that seems way too pessimistic, especially given that the M3 Ultra contained transistors that were 5 nanometers. Since then TSMC, with Apple funding, has decreased the size of its transistors to 3nm, and is now further reducing it to 2nm, again with Apple funding.
Given that 2nm is 40 percent of 5nm, this suggests how we should “correct” the pessimistic estimate. The pessimistic estimate = 5 X 10% of $15 = $7.50. Therefore we should use 40% of the pessimistic estimate = 40% of $7.50 = $3 per user per month. In other words, OpenAI should pay $3 per month to an allied data center for the same reason that it pays $15 per month to Microsoft’s Azure cloud.
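Restated as a tiny calculation, so the assumptions are explicit. All three factors are the note's own guesses from sections c) and d), not measured data, and the function name is invented:
```python
def local_hosting_fee(cloud_fee: float) -> float:
    power_ratio = 0.10     # M3 Ultra drew ~10% of the power of cloud GPUs (section c)
    worst_case_chips = 5   # pessimistic scaling to a 5x larger model (section d)
    process_shrink = 0.40  # 2nm transistors are 40 percent of 5nm
    return cloud_fee * power_ratio * worst_case_chips * process_shrink

print(f"${local_hosting_fee(15.0):.2f} per user per month")  # $3.00
```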
Now take a step back. In the Azure cloud, OpenAI earned $5 per month from each subscription. At this point in our analysis, OpenAI receives $5 from subscribers but must pay $3 to an allied Apple data center. So it only earns $2 per month. Accordingly, Apple subscribers should pay at least $8 per month so that OpenAI nets $5 per month. Better still, if Apple’s users paid $10 per month, half the $20 cloud subscription rate, OpenAI would earn $7 per month, more than its cloud income. This would give Apple’s subscribers a strategic preferred customer status.
Let's return to the monthly $3 charge collected by an allied data center. It's reasonable to apply this same logic to home and enterprise data centers. The tech support required to keep these centers running is supplied by iCloud, so Apple should collect this $3 monthly fee from OpenAI, and homeowners and proprietors should pay $10 monthly subscription fees for each user to OpenAI for use of GPT-5 under a GPT Plus subscription.
Question: GPT Pro subscribers pay $200 per month in the cloud. What should they pay on Apple Silicon?
Answer: Just the regular $10 Silicon subscription fee in most cases. Let's break this down. The reader should keep in mind that the Pro subscription affords more intensive use of some OpenAI models and features than a Plus subscription. But this advantage fades away in each type of access to inference models identified in this discussion, except one.
- Home … No special access is necessary because members of the household can work out voluntary restrictions among themselves on the use of the server so that everyone gains as much access as possible
- Enterprise … One or more servers will be dedicated to the relatively small group of users who need to make intense use of OpenAI's models
- Large enterprise … Again, one or more servers in the allied center will be dedicated to the relatively small group of users who need to make intense use of OpenAI's models
- Unlinked users assigned to allied data centers ... Their subscription fees should be calculated by the same process that was used to calculate the equivalent GPT Plus subscription fees.
-- Azure charges $200 - $5 = $195 per month
-- Recall that the 5-nm M3 Ultra chips required only 10 percent of the power of cloud chips: 10% x $195
-- Worst case: 5 M6 Ultra chips might be required = 5 X (10% x $195) = 5 X $19.50 = $97.50
-- But the M6 Ultra's 2-nm chips will require only 40% of the power of the M3 Ultra's 5-nm chips = 40% x $97.50 = $39 per month
-- As in the Azure cloud, these users do not have unlimited access; from time to time, their access might be reduced by the managers of their assigned data center.
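The same three factors from section d), applied to the $195 Azure component of the Pro fee:
```python
def local_hosting_fee(cloud_fee: float) -> float:
    return cloud_fee * 0.10 * 5 * 0.40  # power ratio x worst case x 2nm shrink

print(f"${local_hosting_fee(195.0):.2f} per user per month")  # $39.00
```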
Modern engineering has achieved its greatest success within frameworks provided by one or more underlying sciences.
- Ancient Roman engineers turned cement into self-healing concrete which allowed them to build structures, e.g., aqueducts, that lasted for hundreds, nay thousands of years. They built arches with their concrete, then rotated their arches into magnificent domes that still enchant us with their beauty ... all without the benefits of anything remotely resembling modern materials science. But without an underlying guiding science, their "best practices" and "rules of thumb" only produced a handful of such history making innovations.
- By contrast, ever since Renaissance giants like Galileo triggered the development of modern science as a systematic data-driven discovery of the underlying order in real world phenomena, engineers have produced generation after generation of astounding innovations.
All engineers, ancient and modern, know how to make things work. Indeed, engineers are still assessed by their "know how". But the glory of modern engineering derives from its guidance by underlying sciences that alert engineers as to why some innovations are more likely to succeed and why others are more likely to fail. This knowledge greatly reduces the time modern engineers waste on vague hunches, gut feelings, trial and error, etc, etc, etc.
Without modern physics there would be no cars, trucks, airplanes, radio, TV, transistors, computers, etc, etc, etc. Without modern chemistry/biochemistry we could never grow enough food to feed our large urban populations, cool them in summer, warm them in winter, clothe them, cure their diseases, etc, etc, etc.
Unfortunately, modern science does not provide all of the answers to the questions modern engineers must address when designing their innovations. So modern engineers must still fall back on "best practices" and "rules of thumb" from time to time, but only from time to time.
Now comes the ongoing, unbelievably expensive development of large language models and we are suddenly back in Ancient Rome. Without guidance from an underlying science, once again it's all about "rules of thumb" and "best practices", or "maybe your best practices are better than my best practices so maybe I should copy some of your best practices" or "better still, maybe I should just steal some of your best engineers."
Empirical Studies
As most readers of this blog probably know, the biggest breakthroughs in generative AI, indeed, the foundations of generative AI, have been a few empirical observations that have not been explained by cognitive science, computer science, or any other science.
- “Attention Is All You Need.”
In this widely read paper, published in 2017, Google employees reported their invention of "Transformers" (the "T" in OpenAI's "GPT" large language models), an architecture that would power every major AI breakthrough since 2017. It showed how to build AI models that could understand and generate human language with unprecedented sophistication.
- "Language Models are Few-Shot Learners", Tom B. Brown ... et al. ... Ilya Sutskever, Dario Amodei, arXiv, 7/22/20
-- Written by OpenAI staff, this report noted that GPT-3 showed "strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic ... Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general."
- "Scaling Laws for Neural Language Models", Jared Kaplan ... et al. ... Dario Amodei, arXiv, 1/23/20
— This publication was the next big empirical breakthrough. The scaling laws estimated how much model size, computation, and data would be needed to enable subsequent emergent abilities. It was widely interpreted as predicting the pathway to artificial general intelligence (AGI) … but did it … really?
- "Sparks of Artificial General Intelligence: Early experiments with GPT-4", Sébastien Bubeck et al., arXiv, 4/13/23
-- Written by Microsoft’s staff, this paper boldly claimed that "beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance” … a striking example of the "irrational exuberance" inspired by the “scaling laws”.
These empirical breakthroughs were reminiscent of "Moore's Law", first published in 1965.
- "Cramming More Components onto Integrated Circuits", Gordon E. Moore, Electronics, 4/19/65
-- Moore's seminal paper was reprinted by the IEEE in 1998.
- "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year (see graph). Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least ten years"
- Google declared a "Code Red", immediately invested ever increasing billions in generative AI during 2023, and promoted Demis Hassabis from the top of Google's DeepMind subsidiary to the top of all of Google's AI operations. Each year Google increased its capital investments by additional billions, mostly in cloud-based data centers.
- Amazon invested $4 billion in Anthropic, the startup founded by some of the authors of the scaling laws paper. Amazon subsequently increased its investment to $8 billion. Google also invested $3 billion in Anthropic.
- Meta also invested ever increasing billions, mostly in its own data centers
- Currently Google, Amazon and Microsoft have announced plans to invest at least $50 billion annually in data centers, with Meta close behind at about $40 billion annually.
- Sam Altman recently admitted that GPT-5 was not AGI because it can't learn from its own experience.
-- "Here's why Sam Altman says OpenAI's GPT-5 falls short of AGI", Brent D. Griffiths and Lakshmi Varanasi, Business Insider, 8/7/25
Indeed, no model based on the Transformer architecture can learn, i.e., add to its permanent repository of knowledge. Everything such a model knows is acquired during its pre-training. Only guardrails governing its behavior are added after pre-training.
Some models can search the Internet to acquire up-to-date additional information, but that new information is not retained permanently. Only the information acquired during a model's training is permanent. Indeed, they can't even transfer the new information they acquired when interacting with user A to a subsequent interaction with user B; they have to conduct the same search all over again. By definition, learning requires that new information be retained permanently.
In short, learning can never become an "emergent ability", no matter how large a model based solely on the Transformer architecture becomes.
- And then there is the Transformer architecture's peculiar emphasis on the size of a model. Current models were inspired by the networks of neurons in the brains of living creatures. But we have to ask ourselves: are smaller brains always associated with less intelligent behavior?
Consider the crow. Its brain is much smaller than a walnut. Nevertheless, crows use tools, conduct experiments, and learn from their experiments. Indeed, crows have been observed tossing small hard shell nuts onto roadways from a safe distance; waiting until cars run over the nuts; waiting until traffic subsided; then swooping down to gobble up the tasty morsels that were inside the cracked hard shells.
--"Bird Brains", Gareth Huw Davies, PBS
Crows learn. Models based solely on Transformer architecture cannot learn. We obviously need a better architecture for our models, an architecture more akin to the neural networks in the tiny brains of crows. Meanwhile, the disastrous game of AGI Chicken carries on.