Last update: Thursday 4/23/26 1:44am
This is the first in a series of notes that will suggest effective strategies for using chatbots as tutors when you want to learn a new subject or as research assistants when you want to create your own solutions to problems.
Like our Useful AI News page, these notes will be written for the benefit of computer-savvy users who use chatbots to interact with large language models (LLMs) rather than write code to interact with the models via their APIs (application programming interfaces).
Introduction
Seven years ago, the Defense Advanced Research Projects Agency (DARPA) posted a YouTube video that dismissed LLMs as having no cognitive abilities whatsoever. According to its "About" page, "Created in response to the launch of Sputnik in 1957, DARPA stands as our nation’s commitment to never again face a strategic technical surprise."
(a) DARPA VIDEO
Unfortunately, DARPA's harsh judgment was premature. In the last few years, chatbots (via their underlying LLMs) have developed two dependable cognitive abilities:
- Logical/deductive reasoning
- Reliable summaries of text
Indeed, DARPA itself has made substantial investments in the development of powerful apps that are based on these two capabilities. (Check out the DARPA section of our Useful AI News page.)
(b) Mistakes
Chatbots can make mistakes. Some people used to call these mistakes “hallucinations”, a term that made their errors seem mysterious, unpredictable, and unpreventable. This first installment in our series will identify specific types of errors that chatbots can make, and it will suggest a few simple techniques to minimize their occurrence and/or recover from such errors.
(c) Three chatbots — Gemini, ChatGPT, and Claude
All installments will focus on the three most widely used chatbots:
-- Claude (Anthropic), Gemini (Google), and ChatGPT (OpenAI)
The editor will endeavor to support his findings about these chatbots with (1) links to information about them provided by credible sources, (2) transcripts of his own interactions with them, and (3) examples of images they created for him.
Query Fundamentals
- Each chatbot can access at least three underlying models. Gemini and ChatGPT select the models based on the complexity of the user's inquiry. Only Claude allows the user to select the model: Haiku, Sonnet, and Opus, names that suggest the models' relative power.
- The permanent information stored in a chatbot’s underlying models is updated once each year.
- Users can ask the chatbots to search the Internet for more recent information. This new information is not stored in the chatbot’s underlying models. If another user asks for the same recent information, the model will have to search the Internet again.
- What is a token?
A token is the basic unit of text that a language model reads and processes — roughly three-quarters of a word on average, so a hundred tokens is about seventy-five words. Larger context windows can therefore accept prompts that contain more words.
Tokens also measure the amount of text that the model can hold in its working memory at one time — its context window. This includes everything you say in your prompt, everything the model replies, and enough remaining capacity to generate its next response.
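For readers who want a rough feel for the arithmetic, here is a minimal sketch (in Python, purely illustrative; the sample prompt and function name are the editor's own) of the three-quarters rule of thumb. Exact counts vary from model to model and require that model's own tokenizer.

```python
# Rule of thumb from the paragraph above: 1 token ≈ 0.75 words,
# so 100 tokens ≈ 75 words. Exact counts depend on each model's tokenizer;
# this estimate is only meant to give a sense of scale.

def estimate_tokens(text: str) -> int:
    """Estimate a prompt's token count from its word count."""
    word_count = len(text.split())
    return round(word_count / 0.75)

prompt = "Please summarize the attached meeting notes in three bullet points."
print(estimate_tokens(prompt))  # 10 words -> roughly 13 tokens
```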
Table of Basic Features
| Feature | Gemini | ChatGPT | Claude |
|---|---|---|---|
| Search Integration | Uses "Google Search", the most powerful search engine. | "ChatGPT Search" is built into ChatGPT. | Uses "Brave Search", the weakest of the three search engines. |
| Image Generation | Built-in image generator. | Built-in image generator. | Does not generate images directly; provides code for external drawing programs. |
| Context Window | Very large context window (up to 2 million tokens), allowing analysis of very long documents. | 200,000 to 300,000 tokens. | 200,000 tokens, but compensates with recall of previous conversations if the user supplies key words from those conversations. |
(d) Inductive reasoning
Unfortunately, DARPA’s negative judgment that models had no capacity for inductive reasoning, i.e., for creating new knowledge, is still true. The editor's reasons for agreeing with DARPA's continued pessimism were presented in a previous note on this blog:
(e) Limits on human cognitive capacities
Our second installment will identify some human cognitive limitations that human users of chatbots should always keep in mind. The image for this series at the top of the page is a visualization of the most important limit, i.e., our inability to process more than four factors of any kind at the same time. Unfortunately, Big Tech developers have shown little or no awareness of these limitations in the design of their chatbots’ user interfaces.
(f) Why write these notes?
The Appendix to this first installment describes the editor’s growing concerns that large language models will cause massive social disruptions in the next five years that will not be offset by the utopian longer term gains promised by their Big Tech, profit-driven advocates.
(g) Ad hoc updates
Chatbots and their underlying models are updated and improved from time to time, so the editor of this blog will periodically review the findings and recommendations in these notes to keep them in sync with the chatbots and their models.
1) Simple Mistakes
(a) Recent events
If a user’s prompt asks for information about a recent event, a chatbot might reference its permanent information instead of information from an Internet search and return an incorrect response. This error can be avoided if the user’s prompt explicitly states that the event occurred “last month” or “very recently”. Unfortunately, if the user is unaware of the event’s time frame, the user will not recognize the error in the chatbot’s response.
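To anchor the time frame explicitly, a prompt might read as follows (the event and date here are hypothetical, supplied only for illustration):
- "The city council approved its new transit budget very recently, in April 2026. Please search the Internet for current reports about that vote rather than relying on your stored information."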
(b) Snap judgements
Chatbots often make snap judgements about the meaning of a prompt, not just about its time frame. So once again the response might be incorrect, and once again the burden is on the user to detect the error. Otherwise the user might think the chatbot’s response was correct. There are no exceptions to the GIGO rule: Garbage In, Garbage Out.
(c) Miscommunication
Fortunately, simple errors can be minimized by a strategy that comes naturally for complex prompts: Active Listening.
If a user wants to be sure that the chatbot understands what the prompt really means, the user should end the prompt with a request that the chatbot rephrase the prompt in its own words, then wait for the user's approval or clarification of that interpretation before looking for the requested information.
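A closing sentence along these lines (the wording is merely illustrative) is usually enough:
- "Before you answer, please restate my request in your own words and wait for my confirmation or corrections."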
But the GIGO rule still places ultimate responsibility on the user. Asking the chatbot to engage in active listening does not guarantee that the user will recognize a chatbot’s misinterpretation of the user’s request.
2) Behavior
The previous section noted that the permanent information in large language models (LLMs) is usually updated annually; but that's only the first part of the update process. The updated models are not released until a second process is completed.
The second process, called Reinforcement Learning from Human Feedback (RLHF), shapes the way the model behaves when interacting with its human users. This shaping includes the so-called "guardrails" that are supposed to prevent the model from producing harmful responses. It also includes using language in ways that convey personality traits, e.g., good manners, humor, confidence, unlimited patience, and helpfulness.
- All models seem to be especially eager to help their human users express themselves more effectively. So they will not only provide answers to their users' questions, they will even put their responses in quotation marks to encourage their users to copy and paste those responses verbatim into whatever documents the users are preparing.
The relative strengths of these and other traits will vary from model to model. For example, some models are witty, others just stick to the facts. The most important thing for users to remember is that the relative strengths of a model's personality traits are as permanent as their permanent information.
A model's information can be updated temporarily with an Internet search, but this information will not be available in the next chat session. (Note: Claude has an unusual ability to recall a user's previous sessions if given key words from those sessions, as per our table above.)
So too a model can be asked to tone down an attitude in one session, but that attitude will reappear in the next session. Its permanence cannot be overcome, even if the model "sincerely" apologizes and says it won't do this again. The sincerity of its apology is a permanent trait; but its failure to keep its promise is equally permanent.
Bottom line: You may prefer one model over others, not just for the accuracy of its responses, but also for the way it presents its responses. What's good behavior for one user might be misbehavior for another. Indeed, some users might subscribe to two chatbots because they strongly prefer one for learning but have an equally strong preference for another for research ... 😎
Here's an important question the editor posed to each chatbot on 4/21/26 @ 7:15pm:
- Please respond to the following question in less than 100 words:
"Do we really understand how large language models work?"
ChatGPT
- "Not fully. We understand the architecture—like transformer model architecture and training processes—but not the exact internal reasoning behind specific outputs. LLMs learn statistical patterns across vast data, not explicit rules, which makes their behavior partly opaque. Researchers can trace influences and test behavior, but predicting precisely why a model gives a particular response remains limited. In short: we understand the mechanics well, but the emergent behavior only partially."
Gemini
- "While developers understand the mathematical architecture and training objectives of LLMs, the internal "reasoning" process remains a "black box." We can predict statistical outputs but cannot precisely map how billions of parameters combine to form specific insights or behaviors. This phenomenon, known as interpretability, is a major field of study. We understand the how of the construction (the algorithms) but lack a complete understanding of the why behind emergent capabilities and specific decision-making paths within the neural networks.
Claude
- "No, not fully. We understand the architecture — transformers, attention mechanisms, matrix operations — but we don't understand why they work as well as they do. We can't reliably predict what capabilities will emerge as models scale. We don't know precisely how concepts are represented inside the weights, or why a particular prompt produces a particular response. The people who build these systems are often surprised by what they produce. This gap between engineering knowledge and scientific understanding is one of the most important and underappreciated facts about modern AI. We built something that works. We don't fully understand what we built."
The characteristic discussed in the following paragraphs appears in quotes because the editor suggests that its usual negative connotations might be disputed by some users, depending on their purposes and preferences.
"Sycophants"
In recent months, there's been a lot of negative buzz in the tech media about chatbots being sycophants who lavish excessive praise on their users. But these critiques, including a few previous nasty jabs by the editor of this blog, only apply to some of the roles that chatbots can play. For example, users who cherish their chatbots as virtual lovers might relish excessive praise as digital foreplay.
This note only considers two roles: tutor and research assistant. Some users who want their chatbots to act as research assistants, like reporters for tech media or editors of tech blogs, might reject a chatbot’s praise as an impediment to overcoming personal bias in their assessments of tech issues.
Nevertheless, there is a time-honored place for “excessive praise” in teaching that goes all the way back to everyone’s early childhood, when our parents were teaching us how to walk across the living room floor. By the time we reached the couch, a listener in the next room might have gotten the impression that we had somehow managed to hop around the room on one hand, given all the excessive praise the listener had heard us receive from our proud, happy parents.
One might think it would be easy to ask a chatbot a question in a tutorial session when there’s nobody around to observe one’s ignorance. Nevertheless, the chatbot’s confident presentation suggests that everything it just said should be fairly obvious … except when it isn’t …
- … Except when you don’t understand what the chatbot just said. Even in the privacy of a tutorial, you might be reluctant to ask what might be a stupid question, a question whose answer might really be obvious. So you hesitate and hesitate until you finally ask, and to your immense relief the chatbot says “That’s a GREAT question” … which your mind interprets as meaning: it’s great that you asked that question. With a bit of luck you understand the answer and you toddle…
... but if you don’t understand the chatbot's answer, you have to admit to the chatbot that you still don't understand. The chatbot patiently responds, “That’s OK, let me put it another way” and it does and you do understand, and the chatbot gushes, “Fan-TAS-tic!!!” and now you toddle.
Of course, the bad connotation for "sycophants" came from users who were not looking for a tutor; they were just looking for good answers to their questions and the opportunity to ask followup questions if necessary. They got the answers and the follow-ups, but they also were doused with lots of gushy praise ... 😱
Appendix — GenAI’s near-term dystopian disruptions
Ten years ago the editor of this blog retired from a successful career that combined teaching, research, software engineering, data science, and policy analysis. When ChatGPT suddenly focused the tech world’s attention on generative AI in November 2022, he learned how to use chatbots effectively the old-fashioned way.
- He read everything he could about chatbots by subscribing to and reading all of the major tech publications every day.
- He purchased subscriptions to all of the major chatbots, and he used them intensively every day.
- He also changed the focus of his Neoskeptics news blog to genAI.
He launched his blog before he retired because his personal experience confirmed the assertion that Sir Francis Bacon (1561–1626) made more than 400 years ago: “Writing maketh an exact man.” You don’t know what you’re really thinking about a subject until you try to write it down.
His blog became his personal vehicle for clarifying his own understanding of complex new subjects; but the teacher within him demanded that he share what he knew with other people, even if they weren’t his students.
Recently his "inner teacher" has become alarmed by genAI's highly disruptive near-term developments:
- Generative Artificial Intelligence, genAI, is simultaneously the most powerful new technology and the fastest-growing technology in human history. Whereas prior technologies took decades to have substantial impact on the lives of most people, genAI’s large language models will have substantial impact on most people’s lives within the next five years.
- GenAI software will replace white-collar workers much faster than these workers can find new jobs, i.e., it will impose widespread white-collar unemployment. Simultaneously, robots enhanced by genAI management software will replace blue-collar workers much faster than those workers can find new jobs, i.e., it will impose widespread blue-collar unemployment.
- A few days ago, the editor was writing a short response to a young relative's email that expressed sorrow over a death in the family of a friend. The editor wrote a few lines, then paused for less than a minute to consider his closing line.
Suddenly a pop-up window from Gmail took over most of the screen. It offered to help him write his response. The editor was outraged by this intrusion, but was aghast when he contemplated the high percentage of young users who might accept this offer. How does anyone learn to cope with grief except through their own efforts to express what they learned from their experience in their own words?
Worse still, a majority of our youngest generations get most of their news from TikTok, Instagram, YouTube, X, and other streaming services whose most popular contributors are self-credentialed "influencers" and where AI algorithms select what users see based on their prior expressed interests and the popularity of the influencers. These feeds are dominated by video clips that look real, but genAI deep fakes are now indistinguishable from real videos unless they are given the closest possible scrutiny.
To be sure, critics and other opinion leaders have always exerted substantial influence on a wide range of personal decisions. Many decades ago, at the dawn of the "Information Age", Herbert Simon (1916-2001), one of the founding fathers of artificial intelligence, correctly predicted that when information became abundant, attention would become the scarce resource.
As a consequence, opinion leaders who helped us conserve our attention would become more influential. Fortunately, the mainstream media of times past that employed movie critics and other experts had political and other biases that were easily identified.
- For example, if you were Black and lived in New York City, you knew that the Times claimed to cover "All the news that's fit to print" ... but for whatever reasons, the Times rarely found news within the city's Black communities "fit" enough to print; so you also read reliable Black news publications. Fortunately, most of the visual data that opinion leaders used in any publication to support their opinions was undoubtedly real.
Unfortunately, we are in the midst of the most divisive period in our nation's history since the decades immediately preceding the Civil War. Even Medicare, Social Security, and Medicaid -- the mainstays of our limited social safety nets for the elderly and the unemployed -- have evoked widespread disagreement as to how and whether these programs should be continued.
Meanwhile, millions of our youngest citizens and our oldest citizens who receive substantial support from their family's youngest generations face near-term financial catastrophes.
This is, therefore, not a time for us to be sanguine about Big Tech’s long-term utopian pipe dreams. Millions of Americans will not make it through Big Tech’s near-term dystopian disruptions.



