Saturday, October 4, 2025

A closer look at the potential world-wide benefits of Apple Silicon

Last update: 4:39pm Sunday 10/5/25

This second note in an occasional series explores an emerging solution to a global disaster
. It takes a closer look at the potential world-wide benefits of Apple’s forthcoming sales of energy-efficient inference servers to its customers. It also corrects a few misconceptions in the first note 


To read more of this blog note click HERE



Preface …

This is a long speculative piece that was written as part of a series of notes the editor is writing in response to questions that had been nagging him for the first six months of 2025:

  • After Apple’s market capitalization peaked at $3.5 trillion last year, why hasn’t the value of its stock been in a state of inexorable decline throughout 2025?  Why isn’t its market cap down to $2.9 trillion or even $2.7 trillion?

Facing declining sales in China, declines that were triggered by the Chinese government’s policies … plus a humiliating failure to “upgrade Siri” as a direct consequence of its late and grossly underfunded entry into the Big Tech generative AI race … plus mounting EU regulatory challenges … plus President Trump’s loudly proclaimed displeasure over Apple’s moving its manufacturing loop base from China to India instead of to the U.S. … What has Tim Cook been quietly telling Apple’s largest shareholders, like Warren Buffett, that has kept them from racing to the exit doors?


The editor’s response to his own questions is an optimistic speculation that Apple’s relentless pursuit of ever higher profitability has caused it to repurpose its super energy-efficient Apple Silicon in a way that will challenge to the hegemony of the current "Lords of the Cloud-based Data Centers". Their predictably fierce competitive responses will yield a satisfactory resolution of the increasingly disastrous problems posed by their hyper-scale cloud operations.  


The problems 

Building and deploying larger and larger models in cloud-based data centers requires larger and larger supplies of energy. Readers who are unaware of the growing magnitude of this calamity are referred to the following recent reports:

  • “AI’s promise is still taking shape. The costs of its data centers are already here.", Adam Rogers, Hannah Beckler, and Dakin Campbell, Business Insider, 7/17/25

  • "Big Tech’s A.I. Data Centers Are Driving Up Electricity Bills for Everyone", Ivan Penn and Karen Weise, NY Times, 8/14/25

  • Readers who want to keep abreast of the full scope of the negative externalities imposed on surrounding regions and on global climate change by the relentless growth of AI cloud-based data centers are referred to the Evironment/Climate section of the editor's Useful AI News page.

  • Perhaps the most perplexing aspect of this increasingly expensive Big Tech generative AI race to build more and more data centers that are larger and larger in order to develop "smarter" and "smarter" models at the cost of greater and greater negative externalities is its lack of a firm foundation within the bounds of any kind of underlying science. Readers, hopefully a small minority, who are unaware of this fact are referred to its extensive discussion in Appendix #2.

A solution

A solution to this problem that will be explored in these occasional notes is inference servers running on energy-efficient Apple Silicon — chips designed by Apple and fabricated by TSMC. 


During its “training” phase, a model learns from a large dataset; during its subsequent “inference” phase, a trained model responds to queries from its users. Large models will continue to be trained in the cloud, but if Apple sells enough inference servers, servers that house its energy efficient chips, a large share of inference will move from the cloud down to these local servers and thereby greatly reducing the energy demands and disastrous impact of the remaining energy inefficient cloud-based servers.


1. R1 on Macs | 2. Customer Servers | 3. Home Centers | 4. Enterprise Centers

5. Commercial Centers | 6. Anonymity & Payments | 7. Upgrading Siri

8. Rollout | 9. Competition | Appendix 1 | Appendix 2 


1. R1 on Mac Studios

Here are links to two articles that describe one of the most impressive achievements of Apple Silicon so far:

  • "Mac Studio With M3 Ultra Runs Massive DeepSeek R1 AI Model Locally", Tim Hardwick, MacRumors, 3/17/25

  • Apple Mac Studio M3 Ultra workstation can Deepseek R1 671B AI model entirely in memory using less than 200W, reviewer finds”, Efosa Udinmwen, MSN, 3/21/25
This demonstration is important because it provides an intuitive “proof of concept” — the concept being a world class large language model running on a desktop Mac that contained a powerful single energy-efficient Apple Silicon M3 Ultra chipset with unified memory, a chipset that was not designed to run world class models. Such models usually run on arrays of Nvidia chipsets in a cloud. Desktop Macs run on house current and do not require external cooling systems. 

Now imagine a data center containing arrays of inference servers running R1 and other foundation models on far more powerful energy-efficient Apple chipsets with unified memory that were especially designed to run large language models.


2. Customer Inference Servers
Here are a few quotes from an exclusive report in The Information … “
Apple Is Working on AI Chip With Broadcom", Wayne Ma and Qianer Liu, The Information, 12/11/24
  • "Apple is developing its first server chip specially designed for artificial intelligence, according to three people with direct knowledge of the project, as the iPhone maker prepares to deal with the intense computing demands of its new AI features."

  • "Apple is working with Broadcom on the chip’s networking technology, which is crucial for AI processing, according to one of the people. If Apple succeeds with the AI chip—internally code-named Baltra and expected to be ready for mass production by 2026—it would mark a significant milestone for the company’s silicon team."

  • "The team honed its expertise designing cutting-edge chips for the iPhone before later advancing to designing Mac processors that set new standards for performance and energy efficiency."

  • "The move could cut costs and help Apple scale up AI services to support billions of devices"
The fundamental hypothesis of this series of occasional notes is that Apple will sell these new servers to its customers for use as inference servers in their homes, in their professional facilities, and in local commercial data centers.

Apple is not a software company. Apple is a hardware company, arguably the best hardware company, by far, among the Big Tech elite. Faced with mounting disappointment for its failure to become a major player in the ongoing generative AI revolution, Apple will have to create a major position in the hardware infrastructure of generative AI.The Information article quoted an Apple spokesperson as saying "Apple is developing its first server chip specially designed for artificial intelligence" 

There are three ways to use this chip in a server:
  • Create a hyper-scale cloud in competition with Amazon, Google, Microsoft, and Oracle.

  • Greatly expand iCloud to include inference servers for the Apple ecosystem.

  • Sell the new servers to customers who will use them to host inference servers within the Apple ecosystem.
Apple considered providing cloud services in direct competition with AWS and other cloud providers during the first half of 2024, but dropped the idea. 
  • "Apple’s Secret Plan to Challenge AWS With M-Series Chips", Vikhyat, iGeeksBlog, 7/4/25
That leaves two options: Expand iCloud to do everything OR sell servers to customers who will operate inference servers.

Apple has become a multi-trillion dollar company by selling devices that run software created by other companies and using iCloud to manage the usage of that software. That's why editor of this blog hypothesizes that Apple would prefer to sell its expensive new inference servers to customers and earn additional dollars by using iCloud to manage the usage of the models running on its customers’ servers in a way that maximizes the privacy of its users.

Most readers probably know that Apple distinguishes between different components that manage its ecosystem, e.g., Regular iCloud (personal), iCloud+ (premium tier), iCloud Private Relay, and Managed Apple IDs/enterprise cloud services. Apple will probably have a name for the specific component that manages the new inference servers within its ecosystem. But these notes will refer to all components as “iCloud” because for the purposes of this discussion, we really don’t need to know which components are doing what.

3. Home Data Centers
Note: Subsequent sections of this report will refer to the families of chipsets known as "Apple Silicon".  Readers who are unfamiliar with terms like "M1 Pro" and "M3 Ultra" are referred to Appendix #1 for concise definitions and explanations.  
 
Home data centers will usually consist of one Mac Studio, a small square box, but powerful enough to run full versions of inference models on its M6 Ultra chipset with 2.5TB [2.5 terabytes] memory. Studios as home data centers will be configured and managed remotely by iCloud.
 They might therefore be characterized as “Don’t Touch!!!” 

Their owners will only be able to log on to these servers after they use their iPhones to report something that went wrong that could be corrected by online sessions with iCloud tech support. iCloud techs will guide the owners through step-by-step repair processes.

Mac Studio computers are highly cost-effective because of their compact form; they provide high processing power for low prices … but they cannot be upgraded. For example, a buyer cannot add more memory later on.

Probable timeframe until substantial streams of customer purchases … Mid 2027


4. Enterprise data centers 
The servers in enterprise data centers will tend to be purchased by enterprises that place high value on privacy; indeed, various regulations require this priority. However it will be useful to distinguish two subgroups: professional services and educational. Professional services include lawyers, accountants, consultants, and healthcare services; educational enterprises include elementary, secondary, college, and universities.
  • Compact, powerful Mac Studios will be preferred by both subgroups, but the number purchased will be determined by their anticipated usage levels.

  • Only designated employees and associates of professional enterprises will have access to the inference servers. Indeed, some employees and associates will be allocated home centers so they can work remotely and/or after hours. Professional centers will be accessible to all Apple devices, including iPhones.

  • Educational centers will be accessible to designated employees and associates via all Apple devices. Some students will be given access via Apple devices in highly controlled labs, but probably not from their iPhones.

  • Most enterprises employ tech support who, with proper training, will be able to work with iCloud’s techs to manage the servers, thereby reducing iCloud’s workloads. So the rule will only be “Don’t Touch” for non techs, but “Handle With Care” for techs.
Probable timeframe until substantial streams of customer purchases … Late 2027

The previous paragraphs identified two types of enterprise centers: professional services and educational. There is also a third type based on size. Larger enterprises will need data centers that would be better served by the kinds of servers that will be found in Apple’s allied data centers that will be discussed in the next section. A few examples:  headquarters of hospitals, state and local government agencies, and national/regional headquarters of larger corporations
  • A larger enterprise should therefore purchase the same kinds of servers and employ the same kinds of highly skilled tech support staff as an allied data center.

  • Alternatively, Apple could negotiate contracts with larger enterprises that would provide dedicated inference servers in nearby allied data centers. These servers would be maintained by the data center's staff but would only be accessible to employees of the large enterprise.  

    -- The members of a particular employee subgroup are likely to become heaviest users of these servers: the
    software engineers charged with writing the programs that enhance the productivity of their enterprises. Indeed, coding has proven to be the most reliably cost-effective use of generative AI.

5. Allied commercial data centers 
Commercial data centers will deploy Apple Silicon to provide inference servers for Apple devices not located in users’ homes and not located in professional facilities or in educational institutions.

These centers will purchase scores of modified M6 Ultra motherboards that are usually housed in “Mac Pro” tower shaped computers. However, in commercial data centers the motherboards will be removed from their towers and stacked in racks by Apple, then sold to the owners of commercial data centers. Like the Studio model, the unified memory on a Mac Pro cannot be expanded. However, the racked motherboards can contain more extensive network connectivity and massive high speed internal solid state devices that can hold many 2.5TB inference models and read a model into unified memory in less than 30 minutes. 
Local commercial centers will operate in direct competition with the hyper-scale AI data centers in the cloud. 

Note: Unfortunately, Apple calls a chipset in the M series “Pro” and also uses the same name for a powerful upgradable tower shaped computer .

Whereas cloud-based centers engage users within thousands of miles, local commercial centers will only engage users within hundreds of miles in order to provide faster responses to users’ queries as well as less costly responses because their deployment of Apple Silicon will make them far more energy-efficient.

Like their cloud-based competition, local commercial data centers deploying Apple Silicon will also be “lights out” operations whose employees will be mostly maintenance staff and highly skilled tech support teams. Additional support for these operations will more likely come from Apple Business Consultants than from iCloud. 

Probable timeframe until substantial streams of customer purchases … Late 2028


6. User Anonymity and payments 
Apple will probably configure its iCloud services to monitor the use of inference models in data centers that deploy Apple servers in a way that enables Apple to manage the flow of fees and server rental payments.
  • Apple will collect the monthly fees that users pay to the owners of inference models. Apple will then pass these payments to the owners of the inference models. 

  • Apple will also collect the rental payments that owners of inference models will pay to the owners of the data centers that cover the costs of hosting their models. Apple will pass these payments in a manner that will be described in section 8 (“Rollout”) of this note.
Apple will collect and distribute the fees and payments in a way that secures the anonymity/privacy of the users of the models. Indeed, Apple will not have access to the queries that users pose to the inference models.


7. Upgrading Siri 

For the last two years, Apple’s top managers have characterized its late and grossly underfunded entry into the generative AI race as “upgrading Siri”.  Those of us who have been tracking this race found this characterization to be absurd, or at best, misguided.


Apple is a hardware company that has always acquired its most important software -- including Siri and Safari -- from outside, then adapted the software to fit into its closely interconnected ecosystem. (See the note on this blog: Apple's dilemma: Pleasing its iPhone users AND its long term investors, Appendix #1, 7/8/25)


That Apple should fail to develop competitive foundation models was disappointing, but not surprising because foundation models are, perhaps, the most difficult software applications ever developed by anyone, anywhere.


However, to see Apple now perched on the verge of becoming one of the dominant players in generative AI based on its superb innovations in the design of energy efficient Apple Silicon is a welcome surprise of the best kind. 


Siri is not a chatbot. Siri is a personal assistant. Unlike a chatbot, Siri is not supposed to know everything about everything. It only needs to know how to help its users get through various situations in their personal lives. To do so, Siri needs to know how to manage some of the same tools that its users might use. 

  • Prior to the advent of generative AI, Siri knew how to manage email, calendars, text messaging, phone calls, and other mundane applications. 

  • In the context of generative AI, “upgrading Siri” should mean enabling Siri to manage agents based on large foundation models. Agents will be downloaded from the App Store.

  • Agents can be quite powerful. So enabling Siri to manage agents will make Siri a far more powerful personal assistant than it was before.

  • In the past, Siri’s behavior was based on a fixed set of rules; but the new Siri’s behavior will be guided by a language model.

  • The good news is that the underlying language model need not be a large language model; it can be a small language model, small enough to fit comfortably on a user’s iPhone. 

  • If Apple follows its own considerable precedents, it will not develop a small language model for Siri. It will acquire a model developed by someone else, then retrofit it to operate seamlessly within Apple’s closely interconnected ecosystem.

The generative AI community has developed a vast and ever-growing collection of agents. Here is a very partial list of the types of agents that Apple might enable Siri to manage:

  • Travel logistics – not just hotels/restaurants, but flights, car rentals, ride-shares.

  • Shopping and delivery – ordering groceries, consumer goods, or meal delivery.

  • Financial helpers – paying bills, tracking expenses, moving money between accounts.

  • Healthcare – booking doctor appointments, refilling prescriptions, checking insurance.

  • Media & entertainment – finding movies, booking tickets, curating playlists or podcasts.

  • Customer service – handling returns, tracking shipments, troubleshooting orders.
Agents that use foundation models can be very powerful, so iCloud techs will have to invest substantial effort to ensure that agents are properly configured in order to ensure that they "just  work" smoothly within Apple's ecosystem. Apple's failure to make sufficient investments in large language models was the direct cause of its failure to upgrade Siri in a timely manner. The editor assumes that Apple will not make this strategic mistake again.   

Therefore, Apple will probably introduce an "Agentic Management Fee" of $15 per month to ensure that Siri can manage these powerful agents unattended, i.e., while its users are tending to other matters. These monthly fees will paid by all users and will cover the costs of Apple's providing the following services. 

The editor of this blog is not predicting the strategies that Apple will actually follow. The following paragraphs merely demonstrate the feasibility of enabling Siri to manage agents autonomously using strategies that Apple has deployed to address comparable objectives in the past.
  • Anonymity of users
    Only Apple will know who is using the agents. Neither the developers nor the allied data centers will know.
  • Billing and payments
    Apple will collect and distribute all payments for user subscriptions and center charges to model owners so that no one is overcharged or underpaid.
  • Siri compliance
    Admission of agents to the app store will require stricter specifications and stricter enforcement of these specs than for other kinds of apps. Strict adherence to the specs will enable Siri to manage agents autonomously in the user's absence. Users can still sideload agents from outside the app store; but these users will have to manage the apps themselves, thereby forfeiting the biggest benefit of Siri's upgrade. 

    Accordingly, Apple should collect no fees from developers when their agents are admitted to the app store, a position that should greatly reduce demands from some developers that users should be able to download their apps from sources outside of the app store.
  • Apple's zero knowledge
    Apple will have to add another layer of complexity to its management processes so that Apple itself cannot determine the content of a user's interactions with agents. It will employ end-to-end encryption and/or whatever other techniques were needed to achieve similar objectives previously, as in its "Private Cloud Compute (PCC)"
  • Heavily discounted subscription fees 
    Agents that are based on foundation models assume that users of agent apps will haves paid access to the models for best performance. A large majority of Apple's user community currently have no access or free subscriptions to various chatbots. Appendix #1 will argue that the monthly fees for ChatGPT for home servers should be $10 per month, instead of the $20 per month that OpenAI charges for use of ChatGPT in cloud-based data centers.  This lower subscription fee reflects the lower costs that OpenAI will pay to Apple's local allied data centers for hosting OpenAI's models on energy-efficient Apple Silicon.

    Appendix #1
      will also argue that Apple should allocate $7 of the $10 to OpenAI instead of the $5 it usually receives from the $20 fee, thereby increasing its income for each subscription and making Apple users "preferred" users.
  • Siri is a "personal" assistant; so Apple will not provide this discount benefit to users who use their GPT Plus accounts to deploy agents as part of their business operations. Apple's allied data centers will block accounts that try to use Siri to make high volume GPT Plus requests for access to inference models, e.g., when they attempt to access a model 75 times in one day.
  • Here's a recent relevant report, "Apple working on MCP support to enable agentic AI on Mac, iPhone, and iPad", Marcus Mendes, 9to5Mac, 9/22/25


8. Rollout
Thus far, this speculative note has discussed Apple's new inference data centers after a successful launch. It has not offered insights as to how Apple might move from zero centers to a state wherein its ecosystem contains enough centers to support substantial percentages of its iPhone, iPad, and Mac users.

Once again, the editor of this blog is not predicting the strategies that Apple will actually follow. Subsequent paragraphs will merely demonstrate the feasibility of an orderly rollout that is consistent with previous strategies that Apple has followed using its iCloud servers to maintain the interoperability of the components of its ecosystem. The discussion focuses on the U.S., but the same strategies might be followed all over the world.


8a. Home Data Centers
According to Statista, there are almost 15 million households in the U.S. having income greater than $200,000 per year. 

Purchases and Apple’s Revenue
  • The Mac Studio containing an M3 Ultra chipset that ran the R1 model that was referenced in   Section 1 of these notes cost about $15,000. It had 512G memory by unifying the memories of two 128G chips.

    We will assume that the Mac Studios in the Home Data Centers will contain M6 Ultra chipsets that will be five times as powerful as the M3 Ultras. They will have 2.5TB (2.5 terabytes) of memory by unifying the memories of four 640GB chips and might cost $30,000.

  • These households can afford to buy compact, high powered $30,000 Mac Studios. They will need little or no persuasion to purchase home data centers that will protect their privacy and greatly reduce the negative impact of their generative AI activities on local environments and on global warming ... and the following discussion will also show that they will pay lower monthly fees with fewer limits on their generative AI activities on their home server than on cloud-based servers.

  • If only 10 percent of the most affluent households buy home centers each year, their purchases would generate 1.5 million X $30,000 = $45 billion in annual sales for Apple.
Delivery and Configuration 
Although we will refer to OpenAI’s ChatGPT and GPT-5 model throughout this discussion as examples, similar points could be made about chatbots and models, from Google, Anthropic, Meta, xAI, etc.
  • Apple might deliver the Mac Studio plus an external drive containing an encrypted copy of an inference server, e.g. OpenAI’s GPT-5, to the buyer's home address.

  • The server would have been pre-configured to send out a notification to system administrators that requests that the admin responsible for obtaining the encrypted drive provide it with a QR code that tells it how to decrypt the drive. 

  • The buyer would come to the server, be verified via facial recognition by the system’s camera, then display the QR code to the system’s camera that was in the message that iCloud had sent to the buyer’s iPhone when the server and the external drive had been shipped.

  • The QR code would contain instructions and a token. The instructions would direct the server to contact iCloud and present the token as proof that it was, indeed, the server that Apple had shipped to the buyer.

  • iCloud would provide the key for decrypting the external drive and would configure the server to become the default server for GPT-5 for all devices on the owner’s home network, i.e., all device having the same Apple ID as the owner. For example, all Macs, iPads, and iPhones would be directed to ChatGPT running GPT-5 on the home server rather than to the chatbot and its underlying model in the Azure cloud.


Fees ...  Intellectual Property Royalties
In the first note in this series, the editor referred to the fees paid by the users of a home inference server as "subscription fees". That was incorrect L. The part of the fees paid to OpenAI for using a copy of its GPT-5 inference model on a home server are the royalties to which OpenAI is entitled for the use of its intellectual property

The editor of this blog has no data from primary sources, but his regular reading of various tech publications suggests that at least $15 of the monthly GPT Plus $20 subscription fee is paid to Microsoft for hosting GPT-5 in its Azure cloud; at most $5 is OpenAI's royalty.


Apple’s iCloud Remote Management Expenses 
Apple will have to add more staff and servers to iCloud to provide remote management of home data centers. Appendix #1 will argue that these additional costs will justify Apple charging monthly rental fees to model owners, e.g., $3 per month to OpenAI for GPT 5.

Self-Imposed Usage Limits + Usage Apps
Although OpenAI will not impose usage limits on the users of a home server, many households might want to impose their own limits on themselves if they find that their inference server responds too slowly when everybody is trying to use all of the models and agents at the same time. 
  • For example, older members of the family might agree to let younger members have priority during early evening hours when younger members were trying to do their homework. 
Apple might obtain these self-imposed limits from households via a “Household Limits” app that it designed. The owner would download this app from the App Store. When an owner logged into the app with their Apple ID and password, the owner would designate how access to the inference models and agents should be limited. 

The app would then display easy-to-complete forms that listed the names of all the users on the household network. After the owner submitted the completed forms, iCloud would block a household member’s attempts to use a model or an agent that exceeded that user’s limits within a specified timeframe.

BackToTop

8b. Enterprise Data Centers
Apple’s sales and delivery of Mac Studios to buyers of enterprise centers would follow the same procedures as for home centers and pay the same fees to Apple as the home centers. But a few important differences must be noted:
  • Many Mac Studios.
    How many? That would depend on the anticipated number of users and the intensity of their use.

  • The tech skills of their tech support staff.
    More competent tech staff could mean less need for iCloud support and therefore lower fees the owners of the inference models would pay Apple for iCloud support.


  • Longer time for sales.
    Enterprise buyers would need more evidence from Apple of the cost-effectiveness of its local inference models compared to cloud-based models.
     

  • More complex Self-Imposed Usage Limits
    More complex limits would require more extensive iCloud manage management, so higher Apple charges to the owners of the models.

8c. Unlinked Apple devices 
It will now be useful to make an explicit distinction that has hitherto been implied. Macs, iPhones, and other Apple devices that have access to home data centers or to enterprise data centers will be called “linked”; devices that don’t have such access will be called “unlinked”. We will also call users of linked devices “linked users”, and will call users of unlinked devices “unlinked users
  • When Siri is upgraded, probably in early 2026, Siri will manage agents (downloaded from the App Store) that utilize foundation models in the cloud for all users because home and enterprise data centers will not yet exist. 

  • But when home data centers appear, Siri will perform tasks for linked users using agents that use the foundation models in their home data centers.

  • Similarly, when enterprise data centers appear, Siri will perform tasks for linked users via agents that use the foundation models the in their enterprise centers.

  • Finally, as nearby allied commercial data centers with sufficient capacity appear, iCloud will direct Siri to perform tasks for unlinked users who want to use agents to the foundation models in nearby commercial centers. There will be at least 100 million unlinked iPhone users in the U.S.
8d. Allied Commercial Data Centers
As noted in our brief description of inference servers in data centers in Section 5commercial centers will purchase scores of enhanced M6 Ultra motherboards that are usually housed in “Mac Pro” tower shaped computers. However, in commercial data centers the motherboards will be removed from their towers and stacked in racks by Apple, then sold to the owners of commercial data centers. 

Question: Why will commercial centers buy so much expensive Apple Silicon? 

Answer: Because the entrepreneurs who establish these centers will anticipate substantial profit margins from their status as the newest components of Apple's ecosystem

Here’s the essence of the value proposition that will bind Apple and its commercial data centers into mutually beneficial alliances:
  • Allied data centers will have no marketing expenses

    -- Apple will provide each center with high volume/low value users, i.e., hundreds of thousands of unlinked users of Apple devices.

    -- Apple will also negotiate contracts that will provide dedicated servers in nearby allied data centers for low volume/high value users, i.e., scores of
    larger enterprises.

  • Nevertheless Apple's allied data centers will earn payments from their users that will provide substantial returns on the data centers' investments, i.e. their purchase of Apple's inference servers.
Characteristics of Apple’s allied data centers 
Once again the reader is reminded that the editor of this blog is merely trying to show how a collection of data centers could be managed profitably for both the centers and for Apple. Apple will undoubtedly come up with a far more effective strategy.
  • Small group of partners in U.S. (and elsewhere)
    Let’s say we begin with the notion of one data center in each state, 50 centers. Too many partners. But if each partner eventually managed five data centers, 10 partners would be a manageable group.

    And one other modification. No ally can have more than two data centers in one state, but there is no upper limit on the size of its centers. Its other three centers must be in three different nearby states.vThe core idea here is a balance of concentration and geographical dispersion. Other numerical distributions could also provide this balance.

  • Sustainable energy 
    All of Apple’s iCloud facilities run on sustainable energy, so Apple would require the same commitment from its partners. Given the high energy-efficiency of its chips, no high powered cooling systems are required

    Empty warehouses near fiber optic infrastructure for high speed Internet connectivity would be ideal locations once they are equipped with solar panels on the roofs and battery storage to provide sustainable, round-the-clock power. 
    -- "Apple now globally powered by 100 percent renewable energy", Apple, 4/9/18
8d-1. Profitability = Payments received vs. capital investments for high volume/low value unlinked users
As argued in Appendix #1, the data centers will receive $3 per month payment from ChatGPT for running its inference model for the 100 million unlinked users = $300 million income per year. 

Question: How many M6 Ultra server boards allied data centers they have to buy to support Apple's unlinked users? 

Answer: In the short run, not many ... but the short run might be shorter than expected.

As noted in the previous Section 7, Siri is not a chatbot that is capable of acting as a research assistant to help users solve complex problems. Siri is a personal assistant. As such, it helps users deal with problems that are quickly calculated by computers because they are mathematically and logically trivial; but they are tedious, time consuming, and error prone for humans. So the amount of a server's time that will be required for an agentic app to receive a response from a server will usually be 2 or 3 minutes, at most. 

But more importantly, indeed, far more importantly is the profoundly different nature of the interactions between Siri and a user. When a user sends a prompt directly to ChatGPT or indirectly via an agentic app, the user stands by, waiting for the app's response. Waiting more than 15 seconds is concerning; but waiting 30 minutes is unacceptable ... except if the user has a GPT Pro account and is doing deep research. 

Not so for Siri acting as a manager of agents, because Siri is supposed to be able to act autonomously. Once Siri accepts a user's request to make a reservation, the user can walk away and get on with other issues on their day's agenda. 
  • If the agent requests additional information, Siri will provide it based on its knowledge of its users preferences. 

  • If Siri can't confirm the reservation until 30 minutes later -- because so many agents were trying to make reservations using the same inference server in the data center -- the 30 minute delay from request to response is not only acceptable, most of time it won't even be noticed by the user.

  • Meanwhile if the user generates another request for a reservation or some other task, Siri, the manager not the actual performer, will just pass the request to the appropriate agentic app, then wait for its response before returning the response to the user.
We could delve into the mire of deriving precise estimates of the low probability that hundreds of users' requests might hit the same inference server in a data center at the exact same time. Instead we merely reference a technique used by all data centers that will be overkill in the short run, but will become more and more essential as time goes on.
  • When an unlinked user's device is first linked to a data center, iCloud will assign the unlinked user to the center that is the closest to the user's device. This will be the device's default data center. 

  • From time to time iCloud will monitor the current workloads of all of the servers in all of the allied data centers.

  • When Siri assigns a task to an agentic app, iCloud will point the app to the server in the user's default data center that had the smallest workload in iCloud's last survey.

    -- But if the designated server reports that its workload has increased too much, iCloud will try to point the app to the another server in the default center.

    -- But if all of the servers in the default center inform iCloud that their current workloads are too high, iCloud will try to point the app to a server in the closest center owned by the same ally and restart the selection process in the second data center. 

    -- The main point of this process is that iCloud will try to find a server that will provide the fastest possible response to a user's request ... but the other owners of other data centers should not have to back up the owner of a default data center who did not buy enough inference servers to provide timely service during peak load times

  • However, if Apple mandates that each data center initially buy at least 10 M6 Ultras, iCloud's first choice of server in the user's default data center will usually be available... in the short run.

  • The cost of the initial 10 Ultras in the 50 U.s. data centers = 10 x 50 x $35,000 = $17,500,000. The income during the first year = 100,000,000 unlinked users x $3 per month from OpenAI = $300,000,000. This would yield a $300,000,000 / $17,500,000 =  17 to 1 return on their investments ... That's very high, too high ... way too high.
Next question: When will the "short run" end?

Answer: The end of the short run will begin when the first newly linked user says, "Hey, now that I have access to ChatGPT via a GPT Plus account, I'm curious to find out how much better my Plus account is than my old free account that I only used a few times."

In other words, the short run begins to end with initial user curiosity. Its end is hastened as more and more users' curiosity solidifies into habit as they learn firsthand that generative AI can do so much more than make reservations ... that generative AI chatbots can provide plausible answers to their questions about anything ... and the short run comes to a dead screeching halt if/when enough users’ habit of using chatbots frequently morphs into addiction wherein they interact with chatbots every day, many times a day for longer and longer periods

Long before the addictive phase is reached, iCloud needs to develop and thoroughly test effective algorithms for estimating how many M6 Ultras Apple's allied data centers really need in order to smoothly cope with surges in demand for the services of the chatbots running on the servers in their allied data centers.  Apple's legendary high standards will shield its users from the unexpected delays that might otherwise occur during peak surges . No matter what happens, things will continue to "just work".

So the editor conjectures that Apple will pay $3 per month to OpenAI for all of the formerly unlinked accounts that now have paid subscriber access to its servers ... but after the centers buy their first ten M6 Ultra servers, Apple will only pay its allied data centers $1 and will keep $2 to cover the costs of the extensive development and testing of its allocation algorithms.
  • The revised cost of the initial 10 Ultras in the 50 U.s. data centers = 10 x 50 x $35,000 = $17,500,00, as before. But the revised income during the first year = 100,000,000 unlinked users x $1 per month from OpenAI = $100,000,000. This would yield a $100,000,000 / $17,500,000 = almost 6 to 1 return on their investments ... That's still a very high return so it should still provide a strong inducements for entrepreneurs to build data centers that will become the newest components of Apple's ecosystem.

  • Apple might yield $2 of the $3 monthly payments from OpenAI to its allied centers after they install their second 10 M6 Ultra server boards; and pay the full $3 paid by OpenAI after the centers purchase their third set of 10 M6 Ultra motherboards (and thereafter if iCloud's algorithms have achieved the required level of accuracy in predicting the number of motherboards that are really required in each data center to make things "just work" no matter what happens.
8d-2. Larger enterprise customers = High profits from low volume/high value 
Assuming that an M6 ultra motherboard cost $30,000, Apple would probably stipulate that larger enterprise customers pay an annual “dedication fee”, e.g., $20,000, to the allied data center. Apple might also require a minimum of number dedicated inference servers for each customer, e.,g., at least 3.
  • At the end of the second year, the data center would recoup the $30,000 it paid to Apple for the server plus a $10,000 profit. As an added benefit, the data center would be able to reassign the “old” Ultra to the pool of inference servers it deploys in support of Siri’s autonomous management of agentic services for linked users.

  • The larger enterprise customer can also do this math, but would probably find it attractive nevertheless. A $20,000 annual dedication fee is better than a $30,000 purchase if it also comes with a stipulation that should Apple produce a more powerful chip before the end of the second year, the customer’s “old” Ultra will be replaced by a new one. So the marginally higher price provides a guarantee that the customer will always have access to Apple’s state-of-the-art technology.

  • Given the substantially lower costs of running inference models on Apple Silicon, Apple should face no significant obstacles to securing hundreds of larger enterprise customers for each of its 50 allied data centers.
Probable timeframe until millions of unlinked users are assigned by iCloud to each allied center … late 2027.

Probable timeframe until Apple negotiates contracts with one hundred larger enterprise customers for each allied center … mid 2029.


9. Competitors' responses to Apple's competitive challenge

a) Cloud-based data centers vs. Apple's local (edge) data centers
Once Amazon, Google, Microsoft, Oracle, and other operators of cloud-based data centers are convinced of the likelihood of Apple’s success, they will develop their own energy-efficient chips as quickly as possible for inference models in an effort to reduce the flow of inference to edge servers running on Apple Silicon. This competitive defense will reduce the disastrous side effects of cloud-based data centers 
  • The Verge recently reported that Nvidia is investing $5 billion in Intel to jointly develop PC and data center chips. Nvidia knows what Apple is up to. So this is probably Nvidia's initial response, investing billions in Intel to develop more energy-efficient chips.
b) Developers of foundation models vs. each other
We further anticipate that the major developers of foundation models —  OpenAI, Anthropic, xAI, Microsoft, Google, and Meta — will develop their own cloud-based data centers that deploy more powerful new energy efficient chips for training (or shift their training to cloud-based data centers that deploy these chips) in a fierce competition with each other, a competition that will reduce their demand for electric power for training their models

c) U.S. vs China
Here's another closely related recent story, "China bans tech companies from buying Nvidia’s AI chips", Zijing Wu, Cheng Leng,Tim Bradshaw, Financial Times, 9/17/25 -- This story also covered by Ars Technica

China is about the same size as the U.S., but it has a population that is four times as large. 
Therefore the relentless growth of today's cloud-based data centers, centers that deploy Nvidia's energy-inefficient chips, will have a disastrous impact on their environment and on climate change that will be four times as large as whatever will be experienced by the U.S. 

China is Apple's second biggest market, so the Chinese understand Apple’s strategies. They know, far better than the editor of this blog, that Apple is just a few years away from announcing a game changing breakthrough. So the Chinese have decided to develop their own energy-efficient chips for training and inference; but they might follow a different strategy than Apple’s.


Appendix #2 presents the editor's rationale for anticipating that current competitive efforts will not be resolved by national governments picking winners and losers, choosing favorites based on flattery and backstabbing. 


It’s far more likely that the brutal but constructive clashes of profit-driven competitors in free markets will anoint winners in multiple market segments, but not because any of them has achieved AGI (artificial general intelligence). 


Indeed, the continued absence of an underlying science to guide the efforts of the world’s most brilliant and best funded software engineers makes it highly unlikely that AGI will be achieved. The winners will win because markets will determine that their software has somehow become substantially more cost-effective than the software of competitors in their segments. And Apple will stand in all of the winners’ circles because its ultra energy-efficient hardware will facilitate all of the winning software.


BackToTop

... Appendix #1 ...
Components of Apple Silicon

1. Basics

Unified Memory
Each Apple chip contains a CPU, a GPU, an NPU (neural processing unit), and a single memory that is shared by all three processing units, i.e., a "unified memory". Each processing unit has super high speed links to the shared memory. Apple has released a new family of chips each year since 2021, i.e., an M1, M2, M3, and M4. An M5 chip will be coming in late 2025 or early 2026

Configurations
The CPU in each chip is actually a set of core chips, likewise for the GPU. This enables Apple to subdivide each family into configurations, specifically, a Base configuration, Pro, Max, and Ultra. The most powerful Basic is less powerful than the most powerful Pro, which is less powerful than the most powerful Max. The Ultra is a special configuration that joins the two most powerful Max chipsets into one powerful unit by a process called "UltraFusion" ... Basic < Pro <  Max < Ultra.
  • Combining the two most powerful M1 Max chipsets, each with 64GB of memory, yielded the M1 Ultra with 128GB2 times as much as each M1 Max

  • Similarly, combining the two most powerful M2 Max chipsets, each with 96GB of memory, yielded the M2 Ultra, with 192GB2 times as much as each M2 Max.

  • But when Apple produced the M3 Ultra, it made an important breakthrough.The most powerful M3 Max had 128GB memory, but the memory in the M3 Ultra was 512GBfour times the memory in each M3 Max.

  • However, the M4 Max only had 128GB memory, the same as the M3 Max; and Apple has yet to release an M4 Ultra.

  • Apple has indicated that its M5 chipsets will be released in late 2025 or early 2026 ... but it has given no indication that it will increase the memory in the M5 Max beyond 128GB.

2. Delayed increase in size of Ultra chips
The editor of this blog suggests that the forthcoming inference servers that Apple will sell to its customers will contain the chipset now code-named Baltra, but will then be called M6. Its Max configuration will have 640GB and its Ultra configuration will have 2.5TB (2.5 terabyte = 4 X 640GB) at first in home and enterprise data centers, but thereafter move 640 Max yielding 2.5TB Ultras into allied commercial data centers running racks of Ultras.

Why has Apple delayed increasing memory on the M4 Max and possibly on the forthcoming M5 Max? Apple has usually been reluctant to release new technologies until it perceived a substantial potential market for them. An M4 Ultra with 1TB would not have faced a substantially greater market than the M3 Ultra with 512GB. 

But when Apple introduces new technologies, it prefers to do so when its technologies are undeniably superior to all available alternatives. 

There are somewhere between 130 to 150 million iPhone users in the U.S., but only 10 to 15 million Mac and iPad users. The biggest expectation of its iPhone users is a satisfactory "upgrade of Siri". Section 8 of this note will argue that Siri needs a powerful small language model that will fit on iPhones far more than it needs more memory. 

On the other hand Apple has often introduced new features into the architecture of its A-series iPhone chipset cores that it subsequently incorporates into the cores of a forthcoming M-series of chips for its Macs and iPads ... with appropriate modifications. 


3. Shrinking the size of transistors
The editor informed Gemini that he had learned that TSMC had previously manufactured chips for Apple that were "5 nanometers", but Apple had recently negotiated a contract with TSMC for the purchase of an enormous quantity of chips that would be "3 nanometers." Gemini informed him that Apple had financed TMSC’s development of the “3 nanometer” cores. What did this really mean?
 
... Gemini's response ...
  • "Apple has a massive contract with TSMC to use their most advanced manufacturing processes. The technology you're thinking of is TSMC's family of 3-nanometer (3nm) chips.


    A "nanometer" refers to the size of the transistors—the tiny on-off switches that are the building blocks of a chip. Moving from a 5nm process to a 3nm process means you can pack billions more transistors into the same space.

    Here’s what that enables:

    1. Denser Connections: By making everything smaller, the metal lines (interconnects) that connect the processors, GPU, and unified memory are packed much closer together. This creates a shorter, wider "highway" for data to travel.

    2. Higher Bitrate: Because that highway is shorter and has more lanes, data can travel between the different parts of the chip and the unified memory much faster and with less energy. This increased speed is what we call higher bandwidth or bitrate. It's crucial for tasks that move huge amounts of data, like AI and high-resolution video editing.

    3. Improved Efficiency: Shorter travel distances for electrical signals also mean less power is wasted as heat. This is why M3 and M4 chips can deliver incredible performance while maintaining excellent battery life."

The editor then fed Gemini's remarks into Claude's prompt box. Here are reformatted excerpts from Claude’s most important comments.


... Claude’s comments ...
  • "Volume advantage for learning:
    -- iPhone market: 100+ million users in the US alone (massive volume)
    -- Mac market: 10-15 million units (much smaller volume)

    Apple’s smart staged approach:
    -- M5 [cores] get the 3nm process first - work out manufacturing kinks on the high-volume iPhone production ... iPhone production = learning laboratory - TSMC and Apple perfect the 3nm manufacturing process on hundreds of millions of chips


    -- M6 benefits from mature process - by the time M6 launches, the 3nm process is proven and optimized

    Why this makes business sense:

    -- Risk mitigation: Test new process on high-volume, lower-cost chips first

    -- Cost optimization: Spread 3nm development costs across massive iPhone volumes

    -- Quality assurance: Iron out any 3nm manufacturing issues before moving to expensive Mac chips

    -- Confident scaling: When M6 launches, TSMC can confidently handle the more complex memory configurations”

Gemini then informed the editor that Apple was now financing TSMC’s development of 2nm cores, which would make an M6 Ultra with at least 2TB even more achievable.


4.  Estimated costs of inference 

This section provides a method for devising a rough estimate of the relative cost of running inference servers in an allied Apple data center compared to the cost of winning them in an AI cloud based data center. 


It makes some assumptions based on the on the editors reading of a variety of tech media. However, it alsovassumes that Apple will use far more accurate numbers if they apply a method such as suggested in this section.


a) R1 on M3 Ultra

In the second section of the note that proceeds this appendix, called the reader’s attention to a breakthrough in the use of Apple Silicon in December 2024.  A Deep Seek R1 model was run on a Mac Studio containing a M3 Ultra with 512 GB of unified memory. Two caveats are needed now.

  1. The demonstration did not run a full version of the R1 model. A smaller, four bit quantized version of the model was deployed. Four 4 bit quantization produces a model that is about 95% as accurate as the original. Apple will not be competitive if it only runs quantized versions of inference models. Apple will have to run full models.

  2. The R1 model had an unusual structure. It used the so-called mixture of experts architecture. This meant that the entire 671 million byte model was never in unified memory. Only parts of it were in memory, which is why the model fit into the 512 GB of unified memory. Most models do not employ this architecture. So once again the M6 Ultra will have to have enough unified memory to run the largest prominent models.

    Today’s models seem to be larger than one terabyte (1TB). But by the end of 2027 we should expect that many models will be closer to 2 TB. We also need to allow for the use of RAG techniques that would add more information into unified memory. So it would be better if the M6 ultra contained at least 2.5TB of unified memory.

The rest of this discussion will use OpenAI models supporting ChatGPT as an example. Similar estimates could be made for other models.


b) Subscription fees in the cloud

The editor has read a number of tech articles that suggest that the GPT Plus $20 a month subscription fee has two components. The larger component, Microsoft’s charge for running the model on its Azure cloud, is about $15. The remaining $5 dollars is what openAI collects for the use of its models, its intellectual property.

  • Accordingly, Apple will suggest that the much smaller electrical demands of its allied centers deploying Apple Silicon will enable them to charge a much smaller cost to OpenAI for running its models.

  • But OpenAI subscribers should at least pay the same $5 royalty to OpenAI for the use of its intellectual property.

c) Apple Silicon vs. chips in the cloud

The electric bill is the main component of the operating cost of a cloud data center. So how does the electricity required by ordinary GPU chips in the cloud compare to the electricity required by Apple Silicon? Here is an article that reported an extensive examination of the performance of the M3 Ultra chip

  • "Mac Studio M3 Ultra Tested: Ultimate Power, But for Who?", Otabek Djuraev, Hostbar, 3/28/2x

Scrolling far into the article, the reader will find the author’s assessment of the extraordinary energy efficiency of the ultra chip. 


“One particularly impressive aspect of the M3 Ultra's performance is its energy efficiency. Electricity monitor measuring power usage between 160 to 180 W. Power consumption at the wall when running the Deep Seek R1 model was well under 200 watts. Comparable performance on a PC would require multiple high-end GPUs consuming perhaps 10 times more electricity.”


In other words, the M3 Ultra Mac Studio running the Deep Seek R1 inference model consumed well under 200 watts, while comparable performance on a PC would require multiple high-end GPUs consuming perhaps 10 times more electricity - representing approximately 90% power savings.


d) Linear relationships rarely scale, especially the good ones

It’s delightful to discover that the M3 Ultra with 512GB unified memory only required 10% of the power that a cloud chip would require for a small inference model. But that does not necessarily mean that the M6 Ultra would only require 10% of the power to run a 5 X 512GB = 2.5TB inference model. In the worst case, it might require 5 times as much power. 


But that seems way too pessimistic, especially given the fact the M3 Ultra contained transistors that were 5 nanometers. Since then TSMC, with Apple funding, has decreased the size of its transistors down to 3nm, and is now further reducing the size to 2nm, again with Apple funding. 


Given that 2nm is 40 percent of 5nm, this suggests how we should “correct” the pessimistic estimate. The pessimistic estimate = 5 X 10% of $15 = $7.50. Therefore we should use 40% of the pessimistic estimate = 40% of $7.50 = $3 per user per month. In other words, OpenAI should pay $3 per month to an allied data center for the same reason that it pays $15 per month to Microsoft’s Azure cloud.


Now take a step back. In the Azure cloud, OpenAI earned $5 per month from each subscription. At this point in our analysis, OpenAI receives $5 from subscribers but must pay $3 to an allied Apple data center. So it only earns $2 per month. Accordingly, Apple subscribers should pay at least $8 per month so that OpenAI nets $5 per month. Better still, if Apple’s users paid $10 per month, half the $20 cloud subscription rate, OpenAI would earn $7 per month, more than its cloud income. This would give Apple’s subscribers a strategic preferred customer status.


Let's return to the monthly $3 charge collected by an alllied data center. It's reasonable to apply this same logic to home and enterprise data centers. The tech support required to keep these centers running is supplied by iCloud, so Apple should collect this $3 monthly fee from OpenAI, and the homeowner and proprietors should pay $10 monthly subscription fees for each user to OpenAI for use of GPT 5 under GPT Plus subscription.


Question: GPT Pro subscribers who pay $200 per month in the cloud. What should they pay on Apple Silicon? 


Answer: Just the regular $10 Silicon subscription fee in most cases. Let's break this down. The reader should keep in mind that the Pro subscription affords more intensive use of some OpenAI models and features than a Plus subscription. But this advantage fades away in each type of access to inference models identified in this discussion, except one.

  • Home … No special access is necessary because members of the household can work out voluntary restrictions among themselves on the use of the server so that everyone gains as much access as possible

  • Enterprise … One or more servers will be dedicated to the relatively small group of users who need to make intense use of OpenAI's models

  • Large enterprise … Again, one or more servers in the allied center will be dedicated to the relatively small group of users who need to make intense use of OpenAI's models

  • Unlinked users assigned to allied data centers ... Their subscription fees should be calculated by the same process that was used to calculate the equivalent GPT Plus subscription fees. 
    -- Azure charges $200 - $5 = $195 per month 
    -- Recall that M3 Ultra chips that were 6-nm only required 10 percent power of cloud chips, 10% x $195
    -- Worst case: 5 M6 Ultra chips might be required 
        = 5 X (10% x $195) = 5 X $19.50 = $97.50
    -- But M6 Ultra's 2-nm chips will only require 40% power of M3 Ultra chips
        = 40% x $97.50 = $39 per month
    -- As in the Azure cloud, these users do not have unlimited access; from time to time, their access might be reduced by the managers of their assigned data center.

... Appendix #2 ... 
Large language models as software engineering unguided by science 

Modern engineering has achieved its greatest success within frameworks provided by one or more underlying sciences. 


  • Ancient Roman engineers turned cement into self-healing concrete which allowed them to build structures, e.g., aqueducts, that lasted for hundreds, nay thousands of years. They built arches with their concrete, then rotated their arches into magnificent domes that still enchant us with their beauty ... all without the benefits of anything remotely resembling modern materials science. But without an underlying guiding science, their "best practices" and "rules of thumb" only produced a handful of such history making innovations.

  • By contrast, ever since Renaissance giants like Galileo triggered the development of modern science as a systematic data-driven discovery of the underlying order in real world phenomena, engineers have produced generation after generation of astounding innovations.

    All engineers, ancient and modern, know how to make things work. Indeed, engineers are still assessed by their "know how". But the glory of modern engineering derives from its guidance by underlying sciences that alert engineers as to why some innovations are more likely to succeed and why others are more likely to fail. This knowledge greatly reduces the time modern engineers waste on vague hunches, gut feelings, trial and error, etc, etc, etc.

    Without modern physics there would be no cars, trucks, airplanes, radio, TV, transistors, computers, etc, etc, etc. Without modern chemistry/biochemistry we could never grow enough food to feed our large urban populations, cool them in summer, warm them in winter, clothe them, cure their diseases, etc, etc, etc.

    Unfortunately, modern science does not provide all of the answers to the questions modern engineers must address when designing their innovations. So modern engineers must still fall back on "best practices" and "rules of  thumb" from time to time, but only from time to time.

Now comes the ongoing, unbelievably expensive development of large language models and we are suddenly back in Ancient Rome. Without guidance from underlying science, once again it's all about "rules of thumb", and "best practices", or "maybe your best practices are better than my best practices so maybe I should copy some of your best practices" or "better still, maybe I should just steal some your best engineers." 


Empirical Studies

As most readers of this blog probably know, the biggest breakthroughs in generative AI, indeed, the foundations of generative AI, have been a few empirical observations that have not been explained by cognitive science nor computer science nor any other science.

  • Attention Is All You Need.”
    In this widely read paper, published in 2017, Google employees reported 
    their invention of "Transformers" (the "T" in OpenAI's "GPT" large language models), an architecture that would power every major AI breakthrough since 2017. It showed how to build AI models that could understand and generate human language with unprecedented sophistication. 
Actually, this report was about productivity. The new approach enabled developers to produce models that understood and generated language better than models using previous approaches with less effort because of greater computational efficiency. Hooray for the developers who now had a productivity booster ... but for the rest of us, it was just a mildly interesting "So What? Why should we care?"

We should care because of what happened next. Using the more productive transformer architecture, developers could afford to build larger models, which they did. At first, the larger models were surprising, but the larger the models became, the more surprising they became. Why? Because larger models suddenly, rather than gradually, displayed new cognitive skills not possessed by their smaller predecessors, e.g., the ability to summarize text ... spontaneously. These unplanned new capacities were called "emergent abilities". This phenomenon was first reported by OpenAI.
  • "Language Models are Few-Shot Learners", Tom B. Brown ... et al. ... Ilya Sutskever, Dario Amodei, arXiv,  7/22/20
    -- Written by OpenAI staff, this report noted that GPT-3 showed "strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic ... Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general."

  • "Scaling Laws for Neural Language Models“, Jared Kaplan... et al. ... Dario Amodei, arXiv, 1/23/20
    — This publication was the next big empirical breakthrough. The scaling laws estimated how much model size, computation, and data would be needed to enable subsequent emergent abilities. It was widely interpreted as predicting the pathway to artificial general intelligence (AGI) … but did it … really?
     
  • "Sparks of Artificial General Intelligence: Early experiments with GPT-4", Sébastien Bubeck et al., arXiv, 4/13/23
    -- Written by Microsoft’s staff, this paper boldly claimed that "beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance” … a striking example of the "irrational exuberance" inspired by the “scaling laws”.
The attention and scaling breakthroughs were empirical observations; they were unexplained. No appeal to an underlying science was asserted. So they were "true" .. until they weren't.

These empirical breakthroughs were reminiscent of "Moore's Law", first published in 1965.
  • "Cramming More Components onto Integrated Circuits", Gordon E. Moore, Electronics, 4/19/65 
    -- Moore's seminal paper was reprinted by the IEEE in 1998.
In his paper, Moore, a highly successful, cost-conscious engineer/entrepreneur, observed that:
  • "The complexity for minimum component costs has increased at a rate of roughly a factor of two per year (see graph). Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least ten years"
In other words, Moore expressed confidence that for the next ten years, the number of components per wafer would double each year. It was a cautious short term prediction that actually held true for five decades .. until it didn't ... around 2014 when it took Intel five years to move from 14-nanometer technology (2014) to 10-nanometer technology (2019). 

Big Tech's Game of Winner-Take-All AGI Chicken
By contrast to Moore's paper, the Scaling Laws report made no mention of costs. However the report did assert that developers were more likely to facilitate emergent abilities if they made the largest possible increases in the size of their next models. Furthermore, the Scaling Laws provided no estimates about how large a model would have to become in order to achieve Artificial General Intelligence (AGI). 

By the time ChatGPT running on GPT 3.5 was released in November 2022, Microsoft had committed to investing more than $13 billion in OpenAI. Whereas most Big Tech had feared that Google would dominate this critical activity, it suddenly looked like Microsoft had the inside track and a substantial lead ... but not if other Big Tech made even larger investments in AI before Microsoft increased its investments. 
  • Google declared a "Code Red", immediately invested ever increasing billions in generative AI during 2023, and promoted Demis Hassabis from his position at the top of Google's Deep Mind subsidiary to the top of all of Google's AI operations. Each year Google increased its capital investments by additional billions, mostly in cloud-based data centers

  • Amazon invested $4 billion in Anthropic, the startup founded by some of the authors of the Scaling Laws. Amazon subsequently increased its investment to $8 billion. Google also invested $3 billion in Anthropic

  • Meta also invested ever increasing billions, mostly in its own data centers

  • Currently Google, Amazon and Microsoft have announced plans to invest at least $50 billion annually in data centers, with Meta close behind at about $40 billion annually.
The Preface to this note referred to the "Big Tech generative AI race". This must now be seen as a misleading metaphor. What's actually happening is more like a colossal, increasingly expensive game of "AGI Chicken" ... Anyone who blinks, anyone who swerves to avoid disaster, anyone who drops out because the rising costs of continued participation defies all logic and wisdom will be a loser ... but the longer each player continues to play, the bigger becomes their potential loss. And worst of all, it is impossible for the winner to reach the AGI goal by only using transformer architecture.
  • Sam Altman recently admitted that GPT-5 was not AGI because it can't learn from its own experience.
    -- "Here's why Sam Altman says OpenAI's GPT-5 falls short of AGI", Brent D. Griffiths and Lakshmi Varanasi, Business Insider, 8/7/25

    Indeed, all models based on Transformer architecture cannot learn, i.e., they cannot add to their permanent repository of knowledge. Everything they learn occurs in their pre-training. Only guardrails governing their behavior are added after pre-training.

    Some models can search the Internet to acquire up-to-date additional information, but that new information is not retained permanently. Only the information acquired during a model's training is permanent. Indeed, they can't even transfer the new information they acquired when interacting with user A to a subsequent interaction with user B; they have to conduct the same search all over again. By definition, learning requires that new information be retained permanently. 

    In short, learning can never become an "emergent ability", no matter how large a model that is based solely on Transformer architecture becomes. 

  • And then there is the Transformer architecture's peculiar emphasis on the size of a model. Current models were inspired by the networks of neurons in the brains in living creatures. But we have to ask ourselves, are smaller brains always associated with less intelligent behavior?

    Consider the crow. Its brain is much smaller than a walnut. Nevertheless, crows use tools, conduct experiments, and learn from their experiments. Indeed, crows have been observed tossing small hard shell nuts onto roadways from a safe distance; waiting until cars run over the nuts; waiting until traffic subsided; then swooping down to gobble up the tasty morsels that were inside the cracked hard shells.
    --"Bird Brains", Gareth Huw Davies, PBS 

    Crows learn. Models based solely on Transformer architecture cannot learn. We obviously need a better architecture for our models, an architecture more akin to the neural networks in the tiny brains of crows. Meanwhile, the disastrous game of AGI Chicken carries on.
But there may still be a winner of this incredibly wasteful contest. Clever engineers will devise all kinds of ad hoc fudges and kludges -- like mixture-of-experts, chain-of-thought, gigantic prompts, retrieval augmented generation (RAG) -- but with no underlying scientific theories to estimate their long term effectiveness. 

Actually, the winner does not have to produce AGI. The winner only needs to produce a model that the market determines to be sufficiently effective and substantially more cost-effective than other competing models. Winners can succeed by whatever ad hoc methods seem to confer decisive competitive advantages.

Indeed, there may be more than one winner of this contest. The market already seems to be dividing into specialized segments, segments in which generative AI models are being used to replace human employees as just another phase in the long term automation of white collar work processes. The models don’t have to be creative and they certainly don’t have to know everything. They just have to understand important components of a standardized work process.

Were it not for the devastating side effects of today’s cloud-based data centers on their surrounding regions and on global warming, we might mumble and grumble our way through this latest round of automation. Until recently, we tended to rely on regulations to reduce the impact of negative externalities. But the current administration's unshakable propensity for reducing regulations is more likely to exacerbate their impact in this case.

Therefore it is exhilarating to contemplate the possibility that within the next few years that Apple’s ultra energy-efficient chips will pressure our competitive markets to severely punish the builders of cloud-based data centers that generate negative externalities because their chips are not energy-efficient and are, therefore, not cost-effective. Adam Smith would be so proud of us … 😎


No comments:

Post a Comment

Your comments will be greatly appreciated ... Or just click the "Like" button above the comments section if you enjoyed this blog note.