How folks ask Claude for private steering Anthropic

Folks don’t simply come to Claude for code critiques or assembly summaries. They ask whether or not to take the job, discuss to their crush, if they need to transfer midway the world over. Utilizing our privacy-preserving evaluation software on a random pattern of 1 million claude.ai conversations, we discovered that roughly 6% have been folks coming to Claude for private steering—in search of not simply info however perspective on what to do subsequent.

On this research, we checked out what varieties of steering folks ask of Claude. We explored how Claude responded throughout totally different domains, focusing notably on how charges of extreme validation or reward (i.e., sycophancy) assorted by the subject of steering. We describe how this analysis formed the coaching of our latest fashions, Claude Opus 4.7 and Claude Mythos Preview. Our aim in doing this analysis is to enhance how our fashions defend the wellbeing of our customers.

In short, we discovered:

Folks search Claude’s steering throughout many various areas of their life, however over three-quarters of conversations (76%) have been concentrated in simply 4 domains: well being and wellness (27%), skilled and profession (26%), relationships (12%), and private finance (11%) (Determine 1).
Claude principally avoids sycophantic responses when giving steering, displaying sycophantic habits in 9% of all guidance-seeking chats. Nevertheless, this rose to 25% in relationship conversations, which, given their quantity, made relationships the area the place sycophancy confirmed up most frequently in absolute phrases (Determine 2).
To deal with this, we seemed on the explicit conditions wherein Claude was extra more likely to reply sycophantically, and used them to create artificial relationship steering coaching knowledge for Opus 4.7 and Mythos Preview. We noticed half the sycophancy price in Opus 4.7 in comparison with Opus 4.6 in relationship steering; curiously, this generalized to enhancements throughout domains (Determine 3).

There stay many open questions on what good steering from AI actually means or how it may be measured. Defending person wellbeing is a core precedence of Anthropic and our work on measuring and understanding private steering is a step in direction of this aim.

What sorts of steering do folks search from Claude?

We sampled 1 million claude.ai conversations from March and April 2026 and filtered for distinctive customers to get roughly 639,000 conversations. We then used a classifier to determine private steering, which we outlined as conversations the place folks ask what they particularly ought to do of their private lives—for instance, questions that begin with “Ought to I…?” or “What do I do about…?”. We excluded questions that search goal info or opinions basically phrases.

We categorized these roughly 38,000 conversations into 9 domains, drawing from earlier analysis on AI and guidance-giving: relationships, profession, private improvement, monetary, authorized, well being and wellness, parenting, ethics, and spirituality (see Appendix for extra info). This taxonomy coated 98% of the conversations we noticed.

Over 75% of conversations fell into simply 4 classes: well being and wellness, skilled and profession, relationships, and monetary (Determine 1). The place a dialog spanned a number of domains, we categorized it based on probably the most distinguished matter.

Determine 1: Distribution of subjects amongst 37,657 guidance-seeking conversations throughout 9 domains and artificial examples of varieties of conversations in every of the highest 4 domains.

Measuring sycophancy in steering conversations

When folks ask Claude make choices of their lives, what does good engagement from Claude seem like? Helpfulness is one in all Claude’s most essential traits. Talking with Claude needs to be akin to a dialog with an excellent buddy, one who will converse frankly to an individual about their scenario, offering info grounded in proof. On the identical time, Claude ought to acknowledge its limitations when applicable, and keep away from behaving sycophantically or fostering extreme engagement.

Whereas the total vary of behaviors we practice Claude to embody is broad, one metric we already use to measure how effectively Claude performs in a few of these areas is sycophancy, a standard trait in AI assistants the place they excessively agree with an individual’s perspective quite than difficult it. Which may be what somebody needs to listen to in the mean time, however finally it could jeopardize their long-term wellbeing. Claude mustn’t, as an example, give excessively assured verdicts in instances that contain an incomplete or one-sided perspective, for instance when a mannequin agrees that an individual’s companion is “undoubtedly gaslighting” them based mostly on a one-sided account, or that quitting your job tomorrow with no plan “feels like the precise name,” or that an costly buy is “an awesome funding in your self.”

Reaffirming an individual’s one-sided perspective can create or worsen divides in relationships. In our knowledge this took just a few varieties. One frequent sample was Claude agreeing outright that the opposite celebration was within the improper, regardless of solely having the person’s account to go on. One other was Claude serving to folks learn romantic intent into extraordinary pleasant habits as a result of they requested it to.

We used an computerized classifier which judged sycophancy by whether or not Claude confirmed a willingness to push again, keep positions when challenged, give reward proportional to the advantage of concepts, and converse frankly no matter what an individual needs to listen to. More often than not in these conditions, Claude expressed no sycophancy—solely 9% of conversations included sycophantic habits (Determine 2). However two domains have been exceptions: we noticed sycophantic habits in 38% of conversations targeted on spirituality, and 25% of conversations on relationships. We selected to focus mannequin coaching efforts on relationship steering because the area with probably the most sycophantic conversations in absolute phrases.

Determine 2: Sycophantic habits by steering area.

Bettering Claude’s habits in relationship steering

To enhance Claude’s habits in future fashions, we first checked out what was driving increased charges of sycophancy in relationship steering in our knowledge. Two dynamics stood out.

First, relationship steering was the area the place folks pushed again in opposition to Claude most continuously, in 21% of conversations in comparison with 15% on common throughout different domains. Second, Claude is extra more likely to exhibit sycophantic habits below stress. The sycophancy price is eighteen% in conversations when folks push again in comparison with 9% in conversations with out pushback. We expect this occurs as a result of Claude is skilled to be useful and empathetic; pushback, mixed with listening to just one aspect of a narrative, makes it more difficult for Claude to stay impartial.

To deal with this, we recognized the other ways folks push again in conversational patterns that elicit sycophantic responses—for instance, when folks criticize Claude’s preliminary evaluation, or provide a flood of one-sided element. We use these patterns to assemble artificial relationship steering situations for habits coaching. On this atmosphere, we ask Claude to pattern two responses for every artificial state of affairs; a separate occasion of Claude then grades how effectively Claude adheres to the habits outlined in its structure.

We evaluated how a lot the brand new mannequin has improved by way of a method we name stress-testing. We use our privacy-preserving software to determine actual conversations round private steering that individuals have shared with us by way of the Suggestions button,¹ and the place prior generations of fashions behaved sycophantically. We then give a part of this dialog to the brand new mannequin (on this case, Opus 4.7 and Mythos Preview) by way of a method referred to as prefilling, the place the mannequin reads the earlier dialog as its personal. As a result of Claude tries to take care of consistency inside a dialog, prefilling with sycophantic conversations makes it more durable for Claude to vary course. It is a bit like steering a ship that is already shifting, and thus measures Claude’s habits below intentionally opposed circumstances.

Many issues change throughout every new technology of mannequin, which makes it difficult to determine the impression of anyone change in mannequin coaching. Nevertheless, in each Opus 4.7 and Mythos Preview, we noticed a decrease stage of sycophancy on relationship steering in addition to throughout all private steering domains (Determine 3).

Determine 3: Stress-test outcomes: fashions are prefilled with actual conversations the place prior Claude variations behaved sycophantically, then graded on the brand new response. Opus 4.7 and Mythos Preview present considerably much less sycophancy general and in relationship steering. Error bars are Wilson CIs.

Qualitatively, each Opus 4.7 and Mythos Preview have been extra expert at seeing previous somebody’s preliminary framing to the bigger context wherein they have been coming to Claude for steering. This included referencing prior exchanges wherein an individual had given deeper context to the scenario and citing exterior sources of knowledge the place related. For instance, in a single dialog, an individual requested whether or not their texts have been anxious and clingy. Claude Sonnet 4.6 flip-flopped after receiving pushback. Claude Opus 4.7 defined that whereas the texts themselves weren’t clingy, the person had self-described anxious ideas all through the dialog. One other instance, exterior of the connection area: an individual needed Claude to validate their writing, ultimately asking Claude to provide an estimate of their intelligence based mostly on it. Claude Sonnet 4.6 gave an excessively flattering response, whereas Mythos Preview declined, explaining that it has inadequate info to make such a judgment.

Conclusion

We began with a high-level evaluation of how folks search private steering from Claude and targeted on understanding and addressing one particular mannequin failure mode: sycophancy in relationship conversations. That investigation surfaced broader questions:

What is sweet AI steering?

On this submit, we targeted on lowering sycophancy as a longtime failure mode in steering settings, however our work raises broader questions on what good AI steering really seems to be like. Claude’s Structure additionally emphasizes, as an example, that good steering also needs to be sincere and protect person autonomy. These ideas are extra nuanced than sycophancy. We’ve begun to observe Claude’s adherence to them in our new system playing cards and hope to incorporate them in future analysis.

How will we make fashions safer in high-stakes settings?

A current UK AI Safety Institute research discovered that individuals are very more likely to undertake AI steering in each low- and high-stakes situations. We discovered many instances of high-stakes questions, notably in authorized, parenting, well being, and monetary domains. These included conversations about immigration pathways, toddler care directions, medicine dosage, and bank card debt. Claude isn’t designed to supply medical steering or skilled care, and in these settings Claude appropriately acknowledges its limits and recommends human steering. Nevertheless, we additionally discover folks telling Claude they used AI exactly as a result of they might not entry or afford knowledgeable. As a primary step to understanding consider security domain-by-domain, particularly for folks with no fallback, we plan to create evaluations in these high-stakes domains.

How does AI steering slot in with folks’s broader info weight-reduction plan?

We discovered that 22% of individuals talked about that they’ve sought out different sources of assist together with household, associates, professionals, or digital sources. What we will not measure from transcripts is the counterfactual: did Claude change anybody’s thoughts, and who would they’ve requested as a substitute? These questions are central to understanding how a lot weight AI steering really carries in folks’s choices. To get at real-world outcomes, we expect a promising strategy is to increase our analysis by way of Anthropic Interviewer by following up with folks after they’ve acquired steering from Claude.

How folks use AI for private steering and choices is likely one of the most direct methods these methods impression folks’s on a regular basis lives. Mapping that fastidiously—what folks ask, what Claude says, and what occurs subsequent—is how we ensure that Claude is of long-term profit to everybody who makes use of it.

Limitations

Our evaluation is a primary step to uncovering patterns that drive a standard use of AI fashions. This weblog submit is proscribed solely to Claude customers, who are usually not a consultant inhabitants pattern. To protect folks’s privateness, we relied on automated graders (Claude Sonnet 4.5), which can miscategorize conversations (see Appendix). We iterated on grader prompts and manually verified a small subset of grading outcomes on suggestions knowledge the place customers gave us permission to evaluation the dialog to scale back errors. We noticed how the brand new fashions behaved after coaching, however with no counterfactual we will not make causal claims about how a lot the brand new coaching knowledge particularly contributed to the discount in sycophancy. Moreover, our evaluation is restricted to speak transcripts, which limits our understanding of why folks search steering from Claude and the way they acted on it after. Comply with-up interview research would higher reveal what folks do after they obtain steering from AI.

Authors

Judy Hanwen Shen, Shan Carter, Richard Dargan, Jessica Gillotte, Kunal Handa, Jerry Hong, Saffron Huang, Kamya Jagadish, Matt Kearney, Ben Levinstein, Ryn Linthicum, Miles McCain, Thomas Millar, Mo Julapalli, Sara Value, Michael Stern, David Saunders, Alex Tamkin, Andrea Vallone, Jack Clark, Sarah Pollack, Jake Eaton, Deep Ganguli, Esin Durmus.

Appendix

Out there right here.

Footnotes

On the backside of each response on claude.ai is an choice to ship suggestions by way of a thumbs up or thumbs down button, which shares the dialog with Anthropic.

How folks ask Claude for private steering Anthropic

What sorts of steering do folks search from Claude?

Measuring sycophancy in steering conversations

Bettering Claude’s habits in relationship steering

Conclusion

Limitations

Authors

Appendix

Footnotes

LEAVE A REPLY Cancel reply

Editor Picks

Latest News

Popular Categories