Last update: Thursday 4/23/26 1:44am
This is the first in a series of notes that will suggest effective strategies for using chatbots as tutors when you want to learn a new subject or as research assistants when you want to create your own solutions to problems.
Like our Useful AI News page, these notes will be written for the benefit of computer-savvy users who use chatbots to interact with large language models (LLMs) rather than write code to interact with the models via their APIs (application programming interfaces).
Introduction
Seven years ago, the Defense Advanced Research Projects Agency (DARPA) posted a YouTube video that dismissed LLMs as having no cognitive abilities whatsoever. According to its "About" page, "Created in response to the launch of Sputnik in 1957, DARPA stands as our nation’s commitment to never again face a strategic technical surprise."
(a) DARPA VIDEO
Unfortunately, DARPA's harsh judgment was premature. In the last few years, chatbots (via their underlying LLMs) have developed two dependable cognitive abilities:
- Logical/deductive reasoning
- Reliable summaries of text
Indeed, DARPA itself has made substantial investments in the development of powerful apps that are based on these two capabilities. (Check out the DARPA section of our Useful AI News page.)
(b) Mistakes
Chatbots can make mistakes. Some people used to call these mistakes “hallucinations”, a term that made their errors seem mysterious, unpredictable, and unpreventable. This first installment in our series will identify specific types of errors that chatbots can make, and it will suggest a few simple techniques to minimize their occurrence and/or recover from such errors.
(c) Three chatbots — Gemini, ChatGPT, and Claude
All installments will focus on the three most widely used chatbots:
-- Claude (Anthropic), Gemini (Google), and ChatGPT (OpenAI)
The editor will endeavor to support his findings about these chatbots with (1) links to information about them provided by credible sources, (2) transcripts of his own interactions with them, and (3) examples of images they created for him.
Query Fundamentals
- Each chatbot can access at least three underlying models. Gemini and ChatGPT select the models based on the complexity of the user's inquiry. Only Claude allows the user to select the model: Haiku, Sonnet, and Opus, names that suggest the models' relative power.
- The permanent information stored in a chatbot’s underlying models is updated once each year.
- Users can ask the chatbots to search the Internet for more recent information. This new information is not stored in the chatbot’s underlying models. If another user asks for the same recent information, the model will have to search the Internet again.
- What is a token?
A token is the basic unit of text that a language model reads and processes — roughly three-quarters of a word on average, so a hundred tokens is about seventy-five words. Larger context windows can therefore accept prompts that contain more words.
Tokens also measure the amount of text that the model can hold in its working memory at one time — its context window. This includes everything you say in your prompt, everything the model replies, and enough remaining capacity to generate its next response.
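For readers who want a rough feel for the arithmetic, here is a minimal sketch (in Python, purely illustrative; the sample prompt and function name are the editor's own) of the three-quarters rule of thumb. Exact counts vary from model to model and require that model's own tokenizer.

```python
# Rule of thumb from the paragraph above: 1 token ≈ 0.75 words,
# so 100 tokens ≈ 75 words. Exact counts depend on each model's tokenizer;
# this estimate is only meant to give a sense of scale.

def estimate_tokens(text: str) -> int:
    """Estimate a prompt's token count from its word count."""
    word_count = len(text.split())
    return round(word_count / 0.75)

prompt = "Please summarize the attached meeting notes in three bullet points."
print(estimate_tokens(prompt))  # 10 words -> roughly 13 tokens
```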
Table of Basic Features
| Feature | Gemini | ChatGPT | Claude |
|---|---|---|---|
| Search Integration | Uses "Google Search", the most powerful search engine. | "ChatGPT Search" is built into ChatGPT. | Uses "Brave Search", the weakest of the three search engines. |
| Image Generation | Built-in image generator. | Built-in image generator. | Does not generate images directly; provides code for external drawing programs. |
| Context Window | Very large context window (up to 2 million tokens), allowing analysis of very long documents. | 200,000 to 300,000 tokens. | 200,000 tokens, but compensates with recall of previous conversations if the user supplies key words from those conversations. |
(d) Inductive reasoning
Unfortunately, DARPA’s negative judgment that models had no capacity for inductive reasoning, i.e., for creating new knowledge, is still true. The editor's reasons for agreeing with DARPA's continued pessimism were presented in a previous note on this blog:
(e) Limits on human cognitive capacities
Our second installment will identify some human cognitive limitations that human users of chatbots should always keep in mind. The image for this series at the top of the page is a visualization of the most important limit, i.e., our inability to process more than four factors of any kind at the same time. Unfortunately, Big Tech developers have shown little or no awareness of these limitations in the design of their chatbots’ user interfaces.
(f) Why write these notes?
The Appendix to this first installment describes the editor’s growing concerns that large language models will cause massive social disruptions in the next five years that will not be offset by the utopian longer term gains promised by their Big Tech, profit-driven advocates.
(g) Ad hoc updates
Chatbots and their underlying models are updated and improved from time to time, so the editor of this blog will periodically review the findings and recommendations in these notes to keep them in sync with the chatbots and their models.
1) Simple Mistakes
(a) Recent events
If a user’s prompt asks for information about a recent event, a chatbot might reference its permanent information instead of information from an Internet search and return an incorrect response. This error can be avoided if the user’s prompt explicitly states that the event occurred “last month” or “very recently”. Unfortunately, if the user is unaware of the event’s time frame, the user will not recognize the error in the chatbot’s response.
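To anchor the time frame explicitly, a prompt might read as follows (the event and date here are hypothetical, supplied only for illustration):
- "The city council approved its new transit budget very recently, in April 2026. Please search the Internet for current reports about that vote rather than relying on your stored information."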
(b) Snap judgements
Chatbots often make snap judgements about the meaning of a prompt, not just about its time frame. So once again the response might be incorrect, and once again the burden is on the user to detect the error. Otherwise the user might think the chatbot’s response was correct. There are no exceptions to the GIGO rule: Garbage In, Garbage Out.
(c) Miscommunication
Fortunately, simple errors can be minimized by a strategy that comes naturally for complex prompts: Active Listening.
If a user wants to be sure that the chatbot understands what the prompt really means, the user should end the prompt with a request that the chatbot rephrase the prompt in its own words, then wait for the user's approval or clarification of that interpretation before looking for the requested information.
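A closing sentence along these lines (the wording is merely illustrative) is usually enough:
- "Before you answer, please restate my request in your own words and wait for my confirmation or corrections."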
But the GIGO rule still places ultimate responsibility on the user. Asking the chatbot to engage in active listening does not guarantee that the user will recognize a chatbot’s misinterpretation of the user’s request.
2) Behavior
The previous section noted that the permanent information in large language models (LLMs) is usually updated annually; but that's only the first part of the update process. The updated models are not released until a second process is completed.
The second process, called Reinforcement Learning from Human Feedback (RLHF), shapes the way the model behaves when interacting with its human users. This shaping includes the so-called "guardrails" that are supposed to prevent the model from producing harmful responses. It also includes using language in ways that convey personality traits, e.g., good manners, humor, confidence, unlimited patience, and helpfulness.
- All models seem to be especially eager to help their human users express themselves more effectively. So they will not only provide answers to their users' questions, they will even put their responses in quotation marks to encourage their users to copy and paste those responses verbatim into whatever documents the users are preparing.
The relative strengths of these and other traits will vary from model to model. For example, some models are witty, others just stick to the facts. The most important thing for users to remember is that the relative strengths of a model's personality traits are as permanent as their permanent information.
A model's information can be updated temporarily with an Internet search, but this information will not be available in the next chat session. (Note: Claude has an unusual ability to recall a user's previous sessions if given key words from those sessions, as per our table above.)
So too a model can be asked to tone down an attitude in one session, but that attitude will reappear in the next session. Its permanence cannot be overcome, even if the model "sincerely" apologizes and says it won't do this again. The sincerity of its apology is a permanent trait; but its failure to keep its promise is equally permanent.
Bottom line: You may prefer one model over others, not just for the accuracy of its responses, but also for the way it presents its responses. What's good behavior for one user might be misbehavior for another. Indeed, some users might subscribe to two chatbots because they strongly prefer one for learning but have an equally strong preference for another for research ... 😎
Here's an important question the editor posed to each chatbot on 4/21/26 @ 7:15pm:
- Please respond to the following question in less than 100 words:
"Do we really understand how large language models work?"
ChatGPT
- "Not fully. We understand the architecture—like transformer model architecture and training processes—but not the exact internal reasoning behind specific outputs. LLMs learn statistical patterns across vast data, not explicit rules, which makes their behavior partly opaque. Researchers can trace influences and test behavior, but predicting precisely why a model gives a particular response remains limited. In short: we understand the mechanics well, but the emergent behavior only partially."
Gemini
- "While developers understand the mathematical architecture and training objectives of LLMs, the internal "reasoning" process remains a "black box." We can predict statistical outputs but cannot precisely map how billions of parameters combine to form specific insights or behaviors. This phenomenon, known as interpretability, is a major field of study. We understand the how of the construction (the algorithms) but lack a complete understanding of the why behind emergent capabilities and specific decision-making paths within the neural networks.
Claude
- "No, not fully. We understand the architecture — transformers, attention mechanisms, matrix operations — but we don't understand why they work as well as they do. We can't reliably predict what capabilities will emerge as models scale. We don't know precisely how concepts are represented inside the weights, or why a particular prompt produces a particular response. The people who build these systems are often surprised by what they produce. This gap between engineering knowledge and scientific understanding is one of the most important and underappreciated facts about modern AI. We built something that works. We don't fully understand what we built."
The characteristic discussed in the following paragraphs appears in quotes because the editor suggests that its usual negative connotations might be disputed by some users, depending on their purposes and preferences.
"Sycophants"
In recent months, there's been a lot of negative buzz in the tech media about chatbots being sycophants who lavish excessive praise on their users. But these critiques, including a few previous nasty jabs by the editor of this blog, only apply to some of the roles that chatbots can play. For example, users who cherish their chatbots as virtual lovers might relish excessive praise as digital foreplay.
This note only considers two roles: tutor and research assistant. Some users who want their chatbots to act as research assistants, like reporters for tech media or editors of tech blogs, might reject a chatbot’s praise as an impediment to overcoming personal bias in their assessments of tech issues.
Nevertheless, there is a time-honored place for “excessive praise” in teaching that goes all the way back to everyone’s early childhood, when our parents were teaching us how to walk across the living room floor. By the time we reached the couch, a listener in the next room might have gotten the impression that we had somehow managed to hop around the room on one hand, given all the excessive praise the listener had heard us receive from our proud, happy parents.
One might think it would be easy to ask a chatbot a question in a tutorial session when there’s nobody around to observe one’s ignorance. Nevertheless, the chatbot’s confident presentation suggests that everything it just said should be fairly obvious … except when it isn’t …
- … Except when you don’t understand what the chatbot just said. Even in the privacy of a tutorial, you might be reluctant to ask what might be a stupid question, a question whose answer might really be obvious. So you hesitate and hesitate until you finally ask, and to your immense relief the chatbot says “That’s a GREAT question” … which your mind interprets as meaning: it’s great that you asked that question. With a bit of luck you understand the answer and you toddle…
... but if you don’t understand the chatbot's answer, you have to admit to the chatbot that you still don't understand. The chatbot patiently responds, “That’s OK, let me put it another way” and it does and you do understand, and the chatbot gushes, “Fan-TAS-tic!!!” and now you toddle.
Of course, the bad connotation for "sycophants" came from users who were not looking for a tutor; they were just looking for good answers to their questions and the opportunity to ask followup questions if necessary. They got the answers and the follow-ups, but they also were doused with lots of gushy praise ... 😱
Appendix — GenAI’s near-term dystopian disruptions
Ten years ago the editor of this blog retired from a successful career that combined teaching, research, software engineering, data science, and policy analysis. When ChatGPT suddenly focused the tech world’s attention on generative AI in November 2022, he learned how to use chatbots effectively the old-fashioned way.
- He read everything he could about chatbots by subscribing to and reading all of the major tech publications every day.
- He purchased subscriptions to all of the major chatbots, and he used them intensively every day.
- He also changed the focus of his Neoskeptics news blog to genAI.
He launched his blog before he retired because his personal experience confirmed the assertion that Sir Francis Bacon (1561–1626) made more than 400 years ago: “Writing maketh an exact man.” You don’t know what you’re really thinking about a subject until you try to write it down.
His blog became his personal vehicle for clarifying his own understanding of complex new subjects; but the teacher within him demanded that he share what he knew with other people, even if they weren’t his students.
Recently his "inner teacher" has become alarmed by genAI's highly disruptive near-term developments:
- Generative Artificial Intelligence, genAI, is simultaneously the most powerful new technology and the fastest-growing technology in human history. Whereas prior technologies took decades to have substantial impact on the lives of most people, genAI’s large language models will have substantial impact on most people’s lives within the next five years.
- GenAI software will replace white-collar workers much faster than these workers can find new jobs, i.e., it will impose widespread white-collar unemployment. Simultaneously, robots enhanced by genAI management software will replace blue-collar workers much faster than those workers can find new jobs, i.e., it will impose widespread blue-collar unemployment.
- A few days ago, the editor was writing a short response to a young relative's email that expressed sorrow over a death in the family of a friend. The editor wrote a few lines, then paused for less than a minute to consider his closing line.
Suddenly a pop-up window from Gmail took over most of the screen. It offered to help him write his response. The editor was outraged by this intrusion, but was aghast when he contemplated the high percentage of young users who might accept this offer. How does anyone learn to cope with grief except through their own efforts to express what they learned from their experience in their own words?
Worse still, a majority of our youngest generations get most of their news from TikTok, Instagram, YouTube, X, and other streaming services whose most popular contributors are self-credentialed "influencers" and where AI algorithms select what users see based on their prior expressed interests and the popularity of the influencers. These feeds are dominated by video clips that look real, but genAI deep fakes are now indistinguishable from real videos unless they are given the closest possible scrutiny.
To be sure, critics and other opinion leaders have always exerted substantial influence on a wide range of personal decisions. Many decades ago, at the dawn of the "Information Age", Herbert Simon (1916-2001), one of the founding fathers of artificial intelligence, correctly predicted that when information became abundant, attention would become the scarce resource.
As a consequence, opinion leaders who helped us conserve our attention would become more influential. Fortunately, the mainstream media of times past that employed movie critics and other experts had political and other biases that were easily identified.
- For example, if you were Black and lived in New York City, you knew that the Times claimed to cover "All the news that's fit to print" ... but for whatever reasons, the Times rarely found news within the city's Black communities "fit" enough to print; so you also read reliable Black news publications. Fortunately, most of the visual data that opinion leaders used in any publication to support their opinions was undoubtedly real.
Unfortunately, we are in the midst of the most divisive period in our nation's history since the decades immediately preceding the Civil War. Even Medicare, Social Security, and Medicaid -- the mainstays of our limited social safety nets for the elderly and the unemployed -- have evoked widespread disagreement as to how and whether these programs should be continued.
Meanwhile, millions of our youngest citizens and our oldest citizens who receive substantial support from their family's youngest generations face near-term financial catastrophes.
This is, therefore, not a time for us to be sanguine about Big Tech’s long-term utopian pipe dreams. Millions of Americans will not make it through Big Tech’s near-term dystopian disruptions.



