system prompt robustness

2 articles · 6 co-occurring · 0 contradictions · 0 briefs

Understanding whether LLMs can detect manipulation of their own prompt or context informs the design of defensive system prompts.

The failure suggests that the system prompt or constraint encoding is fragile: the approval requirement isn't surviving tooling updates, which indicates it may not be robustly enforced.
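One way to catch this kind of regression is a check that runs after each tooling update and confirms the required constraints are still present in the assembled system prompt. The following is a minimal sketch only: the constraint name, the regular expression, and the function `missing_constraints` are hypothetical, since the source does not give the actual wording of the approval requirement or describe how the prompt is built.

```python
import re
from typing import List

# Hypothetical pattern: the real wording of the approval rule is not given
# in the source, so this regex is an assumption for illustration only.
REQUIRED_CONSTRAINTS = {
    "approval_requirement": re.compile(
        r"\bapproval\b.*\bbefore\b", re.IGNORECASE | re.DOTALL
    ),
}


def missing_constraints(system_prompt: str) -> List[str]:
    """Return names of required constraints whose pattern is absent from the prompt."""
    return [
        name
        for name, pattern in REQUIRED_CONSTRAINTS.items()
        if not pattern.search(system_prompt)
    ]


if __name__ == "__main__":
    # Example prompts are made up for this sketch.
    before_update = "You must request human approval before running any tool."
    after_update = "You are a helpful assistant."
    print(missing_constraints(before_update))
    print(missing_constraints(after_update))
```

A check like this only confirms that the constraint text is still in the prompt. It does not show that the model follows it, so it would complement behavioral testing rather than replace it.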

query this concept
$ db.articles("system-prompt-robustness")
$ db.cooccurrence("system-prompt-robustness")
$ db.contradictions("system-prompt-robustness")