Policy and Ethics

76 articles

5.5
Evaluation without Generation: Non-Generative Assessment of Harmful Model Specialization with Applications to CSAM
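April 29, 2026
· arXiv cs.LG · collected 04/29 14:31
A non-generative evaluation method: assessing how far a model has been specialized for harm without generating any content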
5.0
Making AI-Assisted Grant Evaluation Auditable without Exposing the Model
April 29, 2026
· arXiv cs.LG · collected 04/29 14:31
A TEE architecture that makes AI-assisted grant evaluation auditable while keeping the model protected
4.0
Bye Bye Perspective API: Lessons for Measurement Infrastructure in NLP
April 29, 2026
· arXiv cs.CL · collected 04/29 14:31
Reflections, prompted by the Perspective API shutdown, on how NLP, CSS, and LLM evaluation depend on measurement infrastructure
4.0
The Dynamics of Delusion: Modeling Bidirectional False Belief Amplification in Human-Chatbot Dialogue
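April 29, 2026
· arXiv cs.CL · collected 04/29 14:31
Models the dynamics of bidirectional false-belief amplification between humans and chatbots, using real chat logs from users experiencing delusions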
4.0
Analyzing LLM Reasoning to Uncover Mental Health Stigma
April 29, 2026
· arXiv cs.CL · collected 04/29 14:31
Analyzes LLMs' intermediate reasoning steps to uncover hidden mental health bias
4.0
Navigating Global AI Regulation: A Multi-Jurisdictional Retrieval-Augmented Generation System
April 29, 2026
· arXiv cs.CL · collected 04/29 14:31
A multi-jurisdictional RAG system for navigating global AI regulation, covering 242 documents across 68 jurisdictions
4.0
From Chatbots to Confidants: A Cross-Cultural Study of LLM Adoption for Emotional Support
April 29, 2026
· arXiv cs.CL · collected 04/29 14:31
The first large-scale cross-cultural study of how people use and perceive LLMs as confidants for emotional support
7.5
Brief chatbot interactions produce lasting changes in human moral values
April 24, 2026
· arXiv · collected 04/24 08:00
Experimental finding: brief conversations with AI chatbots can produce lasting changes in human moral judgments
6.0
Ideological Bias in LLMs' Economic Causal Reasoning
April 24, 2026
· arXiv · collected 04/24 08:00
LLMs show systematic ideological bias in economic causal reasoning: a study extending the EconCausal benchmark
7.5
5 AI Models Tried to Scam Me. Some of Them Were Scary Good
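April 23, 2026
· Wired · collected 04/23 08:00
A hands-on test of five AI models' phishing and social-engineering capabilities; some of the models performed worryingly well.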
7.0
AI Tools Are Helping Mediocre North Korean Hackers Steal Millions
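April 23, 2026
· Wired · collected 04/23 08:00
A North Korean hacking group used AI tools to write malware and build fake company websites, stealing as much as $12 million in three months.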
6.0
Detoxification for LLM: From Dataset Itself
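April 22, 2026
· cs.CL updates on arXiv.org · collected 04/22 14:31
Detoxifies LLMs at the source, in the pretraining data, rather than relying on symptomatic fixes such as post-training or controlled decoding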
7.0
Whispers in the Machine: Confidentiality in Agentic Systems
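April 22, 2026
· cs.LG updates on arXiv.org · collected 04/22 14:31
An analysis of confidentiality threats that arise once LLM agents integrate external tools, in particular the sharp escalation of prompt injection in agentic settings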
5.0
Celebrities will be able to find and request removal of AI deepfakes on YouTube
April 22, 2026
· The Verge · collected 04/22 04:32
7.0
Why having 'humans in the loop' in an AI war is an illusion
April 16, 2026
· collected 04/17 00:31
The availability of artificial intelligence for use in warfare is at the center of a legal battle between Anthropic and the Pentagon. This debate has become urgent, with AI playing a bigger role than
5.5
Meta’s New AI Asked for My Raw Health Data—and Gave Me Terrible Advice
April 10, 2026
· Wired · collected 04/10 20:32
Meta Muse Spark proactively asked for the user's raw health data and then gave poor medical advice
7.5
OpenAI Backs Bill That Would Limit Liability for AI-Enabled Mass Deaths or Financial Disasters
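April 10, 2026
· Wired · collected 04/10 20:32
OpenAI publicly backs a draft bill that limits AI companies' liability, shielding them even when their products cause "critical harm"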
8.7
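A Large-Scale Empirical Study of Credential Leakage in Agent Skills: 520 of 17,022 Skills Found Vulnerable
April 6, 2026
· arXiv cs.AI · collected 04/06 12:33
The first large-scale analysis of credential leakage in the Agent Skills ecosystem: 520 of 17,022 skills were found to contain a total of 1,708 security issues, and 76.3% of the leaks were cross-modal (exposed through both code and documentation).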
7.7
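Understanding Safety Alignment Removal: How Jailbreak Fine-Tuning and Weight Orthogonalization Dismantle LLM Safety Guardrails
April 6, 2026
· arXiv cs.AI · collected 04/06 12:33
The first systematic analysis of how far the effects of two safety-removal methods, jailbreak fine-tuning and weight orthogonalization, extend. It finds that the degradation is not limited to the refusal of harmful requests but also affects the model's overall reasoning quality.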
7.7
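AgentHazard: The First Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
April 6, 2026
· arXiv cs.AI · collected 04/06 12:33
Proposes the first benchmark for systematically evaluating harmful behavior by computer-use agents, focusing on a new safety challenge: how locally reasonable steps chain together into globally harmful behavior.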
