Friday, April 11, 2025

GenAI Diary (page) … Agents = AI Chatbots that can autonomously fill out Web forms

 Last update: Saturday 4/12/25

For all the hubbub about agents over the last six months, there is still no widely accepted definition of what agents are supposed to be able to do. However, one skill set that is referenced in many, if not most, definitions is the ability to autonomously search the web and fill out forms, just as people do.

So the editor of this blog posed the question "Which chatbots can do this?" to ChatGPT running OpenAI's "Deep Research" feature. (Note: The editor found ChatGPT's initial response, when running GPT-4o, to be woefully inadequate.) Given this blog's focus on readers who use generative AI via chatbots, the editor limited ChatGPT's assignment to the three most widely used chatbots: Claude, Gemini, and ChatGPT itself. Deep Research took about 12.5 minutes to generate the following response.

Response from ChatGPT (Deep Research) ...

Which AI Chatbots Can Autonomously Fill Out Web Forms?

As of early 2025, several leading chatbots have begun to develop agentic capabilities — that is, the ability to autonomously interact with websites, including filling out and submitting forms. Here’s how the top three stack up: ChatGPT (OpenAI), Claude (Anthropic), and Gemini (Google).


ChatGPT (OpenAI)

  • Agentic Capability: Yes, via an experimental feature called "Operator."
  • Availability: Limited to ChatGPT Pro users in the U.S. as part of a research preview.
  • How It Works: Operator allows ChatGPT to simulate a real user — it can click buttons, select dropdowns, type into fields, and submit forms entirely on its own.
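  • Technology Behind It: Powered by a model OpenAI calls the "Computer-Using Agent" (CUA), which builds on GPT-4o's vision capabilities. It “sees” webpages through screenshots and performs actions step by step.
  • Limitations: Still in testing. Hands control back to the user for logins, payment details, and CAPTCHAs, declines certain sensitive tasks such as banking transactions, and may defer to the user in complex or ambiguous cases.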

Conclusion: ChatGPT with Operator is the most advanced and publicly accessible example of a chatbot that can autonomously fill out and submit web forms.


Claude (Anthropic)

  • Agentic Capability: Yes, through a beta feature called “computer use”, available via API.
  • Availability: Accessible to developers through platforms like Amazon Bedrock or Google Cloud Vertex AI. Not yet available in the consumer Claude.ai interface.
  • How It Works: Claude receives screen information and issues precise actions (clicks, typing, navigation) like a virtual assistant controlling a computer.
  • Technology Behind It: Computer use debuted with the upgraded Claude 3.5 Sonnet (October 2024) and is also supported by Claude 3.7 Sonnet. Claude works from screenshots to decide where to click and what to type.
  • Limitations: API-only for now. Some interactions like scrolling or dragging may be clunky. Not built for end-users out of the box yet.
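For readers curious about the mechanics, the "How It Works" descriptions above all boil down to the same loop: the agent observes the page, a model picks one action, the action is carried out, and the loop repeats until the form is submitted. The toy sketch below illustrates that loop under stated assumptions: the FakeForm page, the Action type, and the rule-based choose_action function are hypothetical stand-ins for a real browser and a real model, and none of this is any vendor's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # hypothetical action vocabulary: "type", "click", or "done"
    target: str = ""   # name of the field or button to act on
    text: str = ""     # text to type, if any


@dataclass
class FakeForm:
    """Hypothetical stand-in for a web page containing a simple form."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # A real agent would receive a screenshot or page structure here.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def execute(self, action: Action) -> None:
        # Apply one action to the page, the way a browser driver would.
        if action.kind == "type" and action.target in self.fields:
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def choose_action(observation: dict, user_data: dict) -> Action:
    # Hypothetical rule-based "model": fill the first empty field, then submit.
    if observation["submitted"]:
        return Action("done")
    for name, value in observation["fields"].items():
        if not value:
            return Action("type", name, user_data[name])
    return Action("click", "submit")


def run_agent(page: FakeForm, user_data: dict, max_steps: int = 10) -> list:
    """Observe -> decide -> act, until the model says done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = choose_action(page.observe(), user_data)
        history.append(action)
        if action.kind == "done":
            break
        page.execute(action)
    return history


# Usage: fill out and submit the fake form.
page = FakeForm()
steps = run_agent(page, {"name": "Ada", "email": "ada@example.com"})
print([s.kind for s in steps])  # ['type', 'type', 'click', 'done']
```

In the real products, the rule-based choose_action is replaced by a large model looking at the screen, which is where both the power and the "slow and sometimes inaccurate" behavior come from. The max_steps budget is one simple safeguard; the products described here add others, such as handing control back to the user.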

Conclusion: Claude can fill out forms autonomously in developer environments, but not yet for the general public through its chatbot.


Gemini (Google)

  • Agentic Capability: Yes, demonstrated through "Project Mariner", a prototype built on Gemini 2.0.
  • Availability: Not yet public. Project Mariner is available only to a small group of "trusted testers." The consumer Gemini app (formerly Bard) cannot yet fill out forms autonomously.
  • How It Works: Gemini uses a Chrome extension to understand webpages and simulate user actions like typing and clicking.
  • Technology Behind It: The browser extension feeds page information into Gemini, which then acts on it. The prototype is built on Gemini 2.0.
  • Limitations: Still experimental. Slow and sometimes inaccurate. Currently limited to one browser tab and basic interactions.

Conclusion: Gemini’s autonomous form-filling capabilities exist but are not yet available to the public. The most advanced agent (Mariner) remains in development.


Final Verdict

Chatbot | Can It Fill Out Forms Autonomously? | Public Access | Maturity Level
--- | --- | --- | ---
ChatGPT | Yes (via Operator) | Limited beta (US Pro users) | Most mature
Claude | Yes (API only) | Developer access only | Mid-stage
Gemini | Yes (Project Mariner) | Prototype, not public | Early-stage

All three companies are actively developing autonomous agents that can navigate the web and interact with forms, but only OpenAI’s ChatGPT (with Operator) currently offers this to consumers, albeit in a controlled beta.
