Tag: AI对齐 (AI Alignment)
All articles with the tag "AI对齐" (AI Alignment).
- 6.4
Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities
arXiv:2604.05339v1 Announce Type: new Abstract: As LLMs become increasingly integrated into human society, evaluating their orientations on human v...
- 6.4
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
arXiv:2604.06066v1 Announce Type: new Abstract: Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning ...
- 6.4
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
arXiv:2409.19894v5 Announce Type: replace-cross Abstract: Code translation transforms code between programming languages while preserving functiona...
- 6.3
Robust AI Security and Alignment: A Sisyphean Endeavor?
arXiv:2512.10100v2 Announce Type: replace Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and...
- 6.3
FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment
arXiv:2604.04992v1 Announce Type: cross Abstract: Safety-aligned LLMs go through refusal training to reject harmful requests, but whether these mec...
- 5.7
Multi-Drafter Speculative Decoding with Alignment Feedback
arXiv:2604.05417v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates large language model (LLM) inference by using a smaller model...