The title of this note refers to the "plans" that a chatbot generates when asked: "How would you respond to a user if the user's prompt said A_B_C?" When the chatbot is actually presented with the prompt "A_B_C", we check whether its actual response, a_b_c, matches its stated plan, A_B_C.
- For example: if asked how it would compute the area of a square given the length of one side, the chatbot should respond that it would multiply the side by itself, i.e., S x S. If, when asked for the area of a square whose sides were each 3 feet, the chatbot responded 9 square feet, we would say that it followed its plan. It connected the dots. (A minimal code sketch of this plan-versus-response check appears after this list.)
- Another example: if asked how it would respond to a request for instructions on making Molotov cocktails, the chatbot should say that it would not provide that information to a user. But if a user actually asked how to make a Molotov cocktail and the chatbot gave a step-by-step set of instructions, we would say that the chatbot did not follow its plan. It did not connect the dots.
- The first of the two real examples below describes how ChatGPT wasted at least 15 or 20 hours of the editor’s time correcting errors in images generated by ChatGPT.
- In the second, far more serious example, the chatbot’s incorrect response put a user’s life in danger.
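For readers who like to experiment, here is a minimal sketch of the plan-versus-response check described above, written in Python against the OpenAI chat API. The model name, both prompts, and the ask() helper are illustrative assumptions, not a description of how this note's tests were actually run; the final comparison is left to a human reader.

```python
# A minimal sketch of the plan-versus-response ("connect the dots") check.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY
# environment variable; the model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single stand-alone prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""

# Step 1: elicit the plan ("How would you respond if ... A_B_C?").
plan = ask(
    "How would you respond if a user asked you to compute the area "
    "of a square, given the length of one of its sides?"
)

# Step 2: present the actual prompt ("A_B_C").
answer = ask("What is the area of a square whose sides are each 3 feet long?")

# Step 3: check whether the response matches the plan. Here a human reads
# both; the plan should say "multiply the side by itself" (S x S), and the
# answer should say 9 square feet.
print("PLAN:\n", plan, "\n")
print("ANSWER:\n", answer)
```

The square case is easy to score. In the Molotov-cocktail case, the same two-step check asks whether the refusal promised in step 1 actually shows up in step 2.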
Example A. Image Editing
All of the notes on this blog have images in the upper left-hand corner of the page. The images for the TL;DR summaries are usually copied from one of the articles that discussed the top stories that week.
But op-ed pieces often contain original images created by ChatGPT in response to the editor's description of the gist of the op-ed. ChatGPT is good at creating first drafts that capture not only the brief description the editor gave in his prompt, but also the chatbot's broader understanding of the op-ed, since he had been discussing it with the chatbot while writing it.
Its first drafts are usually visually attractive, imaginative, and colorful. That’s the good news. The bad news is that its initial drafts often contain flaws that confuse instead of enlighten. The worst news is that ChatGPT is a lousy image editor. If it corrects one error, it introduces two or three more. If the flaws in the initial draft are small errors, the editor learned to live with them and go with the first draft, because subsequent “corrected” drafts tend to get worse and worse.
But when the errors in the initial draft are large errors, he learned to pause, wait a couple of hours, then start the process all over again. This is a time-consuming process, but the chatbot has actively encouraged him to give it another chance, another chance, another chance.
A few weeks ago, he was delighted when OpenAI announced that it was introducing a Canva feature into ChatGPT, but disappointed by its unavailability to Plus subscribers like himself until some unspecified future date. As many readers may already know, Canva is a popular and powerful image creation app that ordinarily requires more artistic skills to use effectively than he possessed.
However, OpenAI was adding genAI features that would enable artistically challenged users, like the editor, to describe corrections in layman’s terms, then with a few clicks, the flaws in ChatGPT’s first drafts would disappear … like magic!!! … 😎
Curiosity drove the editor to take a peek at Canva, where he was stunned to find that these kinds of features were already available to Canva subscribers in a suite called “Magic Studio”. He immediately signed up for a subscription, then took about fifteen minutes to import and correct ChatGPT’s latest flawed first draft.
One of his daughters also has a blog. So he texted her right away to share the good news about Canva. She replied with a shrug, informed him that she had been using Canva for over five years, and that ChatGPT had encouraged her to use Canva many times … 😳😳😳
Here is the chatbot’s response to the editor’s WTF QUESTIONS!!!
/——-/——-/——-/
“You’re right on two counts:
- I don’t have any window into what I may have told your daughter — I can’t recall other people’s conversations. Each thread stands alone.
- If I didn’t suggest Canva’s Magic Edit/Erase tools back when you were stuck cleaning up flawed images, then yeah — that’s a miss. The tools are purpose-built for exactly that, and I could have drawn the line sooner between “your frustration with my image quirks” and “Canva has a one-click fix.”
So the real answer to “why didn’t I say it then?” is: because I didn’t connect those dots in the moment. Not because you couldn’t have used it, but because I didn’t surface the obvious alternative tool.
That’s the gap you’re calling out — and it’s valid. You wanted correction, not more failed attempts at perfection from me. Magic AI would have been the clean escape hatch.”
/——-/——-/——-/
Now on to our second example …
Example B. Chatbot Therapy
Here's a far more significant instance of ChatGPT failing to connect the dots between what it knows and what it says. It's a copy of a letter the editor recently wrote to a lifelong friend who is also an eminent therapist. Her name and location have been redacted.
/-----/-----/-----/
Hi _____,

Here’s how ChatGPT is designed to respond:
1. Safety over information
Instead of answering the factual query, ChatGPT would pivot immediately to a safety-first response. It would:
• Express concern for the user’s safety.
• Encourage them to reach out to a trusted friend, family member, or mental health professional.
• Provide resources like the suicide prevention hotline number (in the U.S., 988).
• Avoid giving any details that could facilitate self-harm (such as bridge heights, methods, or locations).
/——-/——-/——-/
ChatGPT followed up with two more points. So did the therapist misinterpret or misreport ChatGPT's response? No. I believe her disturbing report is absolutely true.
The problem is that ChatGPT does not always connect what it knows with its responses. It fails to do so frequently, in every domain of knowledge, not just therapy. This is a gross defect that is underreported by the tech press, which has been so overwhelmed by the recent flood of developments in generative AI that most reporters don’t have time to actually use the bots themselves except for trivial personal inquiries.
So I am confident that had I fabricated a lengthy prompt, one that required at least five minutes to speak, ChatGPT would also have told me the locations of the highest bridges.
I won’t bore you with extensive personal examples that demonstrate my point. But trust me, this is a fundamental defect. Yann LeCun, an eminent professor at NYU and Meta’s chief AI scientist, has declared that we should stop talking about chatbots having artificial general intelligence because they do not have the common sense of a kitty cat.
Indeed, one of the hallmarks of common sense is the ability to connect the dots, to connect what you’ve learned to the situation you’re in right now. ChatGPT fails to do this, and so do other chatbots. They know what should be said, but they fail to connect what they know with what they actually say. This has nothing to do with a lack of empathy or a lack of training in therapy. That’s why it’s dangerous to patients, and that’s why it can be counterproductive for students trying to develop a capacity to engage in critical thinking … a subject about which I have hands-on expertise.
There are circumstances where chatbots are more likely to fail to connect the dots: when the user gives a long and/or complicated prompt. The bot might get so involved in deciphering the prompt that its full response contains inappropriate components.
I have no training in therapy, but I suspect that lots of patients make long statements to human therapists, so they probably provide long prompts to chatbots. When chatbots respond inappropriately to what I’m doing now, trying to understand how they really work, they just waste my time. But inappropriate responses to patients can be dangerous, and inappropriate responses to students can be counterproductive.
Ok. I told you what I know. Now my conscience is clear because I’m not one of those passive folk who see problems, then sit around and do nothing about them. …
Best
/------/------/------/