Tag: AI Alignment

All articles tagged "AI Alignment".

6.4
Human Values Matter: Investigating How Misalignment Shapes Collective Behaviors in LLM Agent Communities
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05339v1 Announce Type: new Abstract: As LLMs become increasingly integrated into human society, evaluating their orientations on human v...
6.4
From Hallucination to Structure Snowballing: The Alignment Tax of Constrained Decoding in LLM Reflection
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.06066v1 Announce Type: new Abstract: Intrinsic self-correction in Large Language Models (LLMs) frequently fails in open-ended reasoning ...
6.4
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2409.19894v5 Announce Type: replace-cross Abstract: Code translation transforms code between programming languages while preserving functiona...
6.3
Robust AI Security and Alignment: A Sisyphean Endeavor?
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2512.10100v2 Announce Type: replace Abstract: This manuscript establishes information-theoretic limitations for robustness of AI security and...
6.3
FreakOut-LLM: The Effect of Emotional Stimuli on Safety Alignment
April 8, 2026
· cs.AI updates on arXiv.org · collected 04/08 12:31
arXiv:2604.04992v1 Announce Type: cross Abstract: Safety-aligned LLMs go through refusal training to reject harmful requests, but whether these mec...
5.7
Multi-Drafter Speculative Decoding with Alignment Feedback
April 8, 2026
· cs.CL updates on arXiv.org · collected 04/08 12:31
arXiv:2604.05417v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates large language model (LLM) inference by using a smaller model...
