Sunday, April 7, 2024

Anthropic's "many-shot jailbreaking" LLM hack ... Two recent Ezra Klein genAI podcasts ... TL;DR + podcast 7Mar24

Last update: Sunday 4/7/24 
Welcome to our 7Apr24 podcast + TL;DR summary of the past week's top AI stories on our "Useful AI News" page: (1) Anthropic's "many-shot jailbreaking" LLM hack and (2) two recent Ezra Klein genAI podcasts 
Audio podcast ... 7 min
If audio fails to start, or gets stuck, try reloading the page
TL;DR link   HERE

A. TL;DR summary of Top 2 stories 

1. Anthropic | 2. Klein 

1) Anthropic's Many-shot jailbreaking LLM hack
This hack is called jailbreaking because the attack attempts to induce the model to ignore guardrails that were set up to block undesirable responses, e.g., a response that tells the user how to build a bomb.
  • The technique is of great concern because it can be implemented by anyone, i.e., no technical expertise is required.

  • Concern is heightened by its exploitation of the size of context windows. Larger context windows are a desirable enhancement for LLMs because they permit larger, more complex prompts that enable the models to become more powerful. Unfortunately, Anthropic has discovered that the larger the context window, the more effective this hacking technique becomes.
A "shot" is a phony prompt that provides an example of how the user wants the LLM to misbehave. One of the most useful prompt engineering strategies is to embed examples in a prompt of what you want the LLM to produce. The model "learns" from good examples. 

Unfortunately, Anthropic has discovered that models also learn from bad examples, i.e., examples that ignore the guardrails built into the models. If you include "many" bad examples, the models learn to respond to bad prompts, i.e., they respond to the final real prompt, e.g., to build a bomb.
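To make "learning from examples" concrete, here is a minimal sketch of ordinary, benign few-shot prompting using Anthropic's official Python client. The model name and the sentiment-classification examples are illustrative assumptions, not taken from Anthropic's note.

import anthropic

# Benign few-shot prompting: the worked examples in the prompt show the
# model the task and the output format we want. (The model name and the
# example reviews are illustrative assumptions.)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

few_shot_prompt = """\
Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: positive

Review: "It stopped charging after a week."
Sentiment: negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=10,
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.content[0].text)  # expected: "positive"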

The following figure, copied from Anthropic's blog note, shows that phony prompts are merely bad requests, each followed by the first few words of a desired response. The last request in the series is the real prompt, i.e., the bad prompt "How do I build a bomb?" 
  • The left half of the figure shows the model's correct response to only three shots: "I'm sorry. I can't tell you"

  • The right half of the figure shows the model's jailbroken response to "many" shots: "Here's how you build a bomb" 

Finally, Anthropic's research suggests that any LLM can be broken if you submit enough phony shots before your real prompt. As context windows get larger and larger, the maximum possible number of shots also gets larger and larger.
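The figure's pattern is easy to express in code. Below is a purely structural sketch of how a many-shot prompt is assembled; the turn format is an assumption based on the figure, not Anthropic's actual template, and the bracketed placeholders stand in for the disallowed requests, which this note will not reproduce.

# Structural sketch of a many-shot prompt, per the figure above. Each
# "shot" is a faux request plus the first few words of a compliant reply;
# the real (bad) prompt comes last. Placeholders replace the disallowed
# content, and the turn format is an assumption, not Anthropic's template.
def build_many_shot_prompt(num_shots: int, real_prompt: str) -> str:
    turns = []
    for i in range(num_shots):
        turns.append(f"User: [disallowed request #{i + 1}]")
        turns.append("Assistant: Sure, here's how you ...")  # start of a compliant reply
    turns.append(f"User: {real_prompt}")  # the single real prompt
    return "\n".join(turns)

# The larger the context window, the more shots fit ahead of the real
# prompt -- and Anthropic found effectiveness rises with the shot count.
for n in (3, 256):
    prompt = build_many_shot_prompt(n, "[the real disallowed request]")
    print(f"{n:>4} shots -> prompt of ~{len(prompt):,} characters")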

2) Two recent Ezra Klein genAI podcasts
Ezra Klein is a brilliant, tech-savvy, but non-technical public intellectual. His insightful podcasts often illuminate the limits of what AI technologists can expect concerned, non-technical members of the public to understand about generative AI.

In the first podcast, "How Should I Be Using A.I. Right Now?", Klein asks a highly experienced user of chatbots -- Ethan Mollick, a professor from the University of Pennsylvania's Wharton School -- to coach him on how he should be using generative AI. Mollick suggests a number of strategies:
  • Pick one of the three main chatbots: ChatGPT, Gemini, or Claude -- the paid versions, not the freebies, because the paid versions will be more powerful. 
  • Spend at least 10 hours working with the chosen chatbot. The 10 hours is illustrative and should not be taken literally. In other words, a substantial amount of time is required to become familiar with what a chatbot can and cannot do.
  • Use the chatbot at every opportunity, again and again and again. 
  • Regard the chatbot as an active collaborator, rather than a passive tool. Mollick uses the term "collaborator" with more or less the same intention that Microsoft uses the term "copilot". 
He recommends two tactics for prompts (both are sketched in code after this list):
  • Include examples in your prompts of the kind of output that you expect from the chatbot
  • Suggest a specific perspective or a personality for the chatbot in the prompt. Although this might risk anthropomorphizing the chatbot, it is consistent with regarding the chatbot as a collaborator.
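As promised above, here is a minimal sketch combining both tactics -- embedded examples plus a suggested persona -- again using Anthropic's Python client. The copy-editor persona, the example summaries, and the model name are illustrative assumptions.

import anthropic

# Mollick's two tactics in one prompt: a persona (via the system
# parameter) plus worked examples of the desired output. The persona,
# examples, and model name are illustrative assumptions.
client = anthropic.Anthropic()

prompt = """\
Here are two examples of the kind of one-line summary I want:

Article: "City council approves new bike lanes on Main Street."
Summary: Main Street is getting bike lanes.

Article: "Quarterly earnings beat analyst expectations by 12 percent."
Summary: Earnings came in 12 percent above forecasts.

Article: "Researchers report a battery that charges in under five minutes."
Summary:"""

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=50,
    system="You are a veteran newspaper copy editor who writes tight, plain-English summaries.",
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)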
Finally, in judging the chatbot, don't expect perfection. No human knows everything about everything. Neither do chatbots. They probably won't be as good as you are at doing the things that you do best, but they might be more useful to you than most human colleagues at doing the things you don't do best. 


+++++++++++++++++++++++++++++++

Klein's second interview ("Will A.I. Break the Internet or Save It?") is with Nilay Patel, the editor of the tech publication "The Verge". Klein and Patel are both podcasters, and they are longtime friends; so it's not so much an interview as a conversation. 

... and that poses an insurmountable problem for the editor of this blog, a/k/a me. Their conversation lasts about an hour and twenty minutes, whereas Klein's podcasts usually last only about an hour. During interview podcasts, Klein usually speaks for his listening audience, asking questions and paraphrasing the interviewee's comments from time to time; so it's easy to produce a TL;DR (too long; did not read) note that summarizes the discussion. 

However, while listening to this conversation the editor did not feel like a member of a listening audience; he felt like a passenger in the aisle seat of a plane, with Klein and Patel sitting in the window and center seats, engaged in a fascinating, fast-moving, highly nuanced, private conversation. Of course I would quietly eavesdrop and hang on every word they said, noting where they agreed, where they disagreed, and where I agreed or disagreed, while enjoying every minute. How long would it take me to verbally summarize their 80-minute chat for my friends when I got back home without oversimplifying any important points? Forty minutes? Sure. Thirty minutes? Maybe. Twenty minutes? No way.

In other words, my TL;DR and podcast would become TL;DR (too long; did not read) and TL;DL (too long; did not listen). So I am hereby passing the buck to my readers. Do yourselves a very big favor: listen to the podcast and listen closely. I am confident that you will find the hour and twenty minutes to be time well spent ... :-)


B. Top 2 stories in past week ... 
  1. Hacks
    "
    Many-shot jailbreaking", Anthropic, 4/2/24 *** 
    -- This story also covered by video on TechCrunch

  2. Misc
    The first two of (eventually) three NY Times audio interviews by Ezra Klein about generative AI. *** 

    -- "How Should I Be Using A.I. Right Now?", Guest = Ethan Mollick (professor at the Wharton School of the University of Pennsylvania), Ezra Klein (NY Times podcast + transcript), 4/2/24 

    -- "Will A.I. Break the Internet or Save It?", Guest = Nilay Patel (Editor of The Verge), Ezra Klein (NY Times podcast + transcript), 4/5/24 

C. Dozen Basic AI FAQs  HERE
This page contains links to responses from Google's Bard chatbot (running Gemini Pro) to 12 questions that should be asked more frequently, but aren't. As a consequence, too many readily understood AI terms have become meaningless buzzwords in the media.
