Machine Learning

20 readers

1 users here now

This subreddit is temporarily closed in protest of Reddit killing third party apps, see /r/ModCoord and /r/Save3rdPartyApps for more information.

founded 2 years ago

MODERATORS

[email protected]

[R] Semantic Drift in LLMs Is 6.6x Worse Than Factual Degradation Over 10 Recursive Generations (old.reddit.com)

submitted 1 month ago by [email protected] to c/[email protected]

0 comments fedilink hide all child comments

This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Actual_Requirement58 on 2025-06-11 02:35:19+00:00.

We ran a study to test how truth degrades in LLMs over recursive generations—but instead of measuring hallucinations, we measured semantic drift.

The common assumption is that recursive use of LLM outputs results in factual degradation. But when we systematically tested this over 10 academic domains and 10 generations of GPT-4o outputs, we found something different:

Facts are mostly retained: Only a 2% drop in factual accuracy over 10 generations
Semantic intent collapses: A new metric we introduced, Purpose Fidelity, dropped 42.5%
That’s a 6.63× higher rate of semantic drift vs factual decay

Examples:

A Descartes excerpt (“Cogito, ergo sum”) became career advice about leadership and self-awareness

A history excerpt on the Berlin Wall became a lesson in change management

Law and medicine were rewritten as “best practices” for business professionals

Chemistry and CS stayed stable: semantic degradation was domain-specific

Why this matters: Most LLM eval frameworks focus on factual accuracy and hallucination rates. But our data suggests the real long-term risk may be subtle, systematic recontextualization. Outputs can look factual and well-structured, while completely losing their intended purpose. This may impact content authenticity, training data curation, and long-term epistemic stability.

📄 Full paper (ResearchGate) - https://www.researchgate.net/publication/392558645_The_Half-Life_of_Truth_Semantic_Drift_vs_Factual_Degradation_in_Recursive_Large_Language_Model_Generation

🧵 Medium summary for general audience - https://medium.com/@maxwell.ian/when-ai-loses-its-mind-but-keeps-the-facts-the-hidden-danger-of-recursive-ai-content-08ae538b745a

no comments (yet)

sorted by: hot top controversial new old

there doesn't seem to be anything here