12.2.1 - Alignment

Artificial Intelligence Policy

Author

Prof. Jack Reilly

Published

S2026

🧠 Think:

What does it mean for a model to be “aligned”? To whose interests is it aligned, and how do we evaluate that?

📖 Read:

None required. Today's discussion will also cover items originally assigned for The Singularity and the Future; please do the reading assigned for that day.

🌐 Browse:

Claude Constitution
OpenAI spec: Introduction and Source
Hidden AI instructions reveal how Anthropic controls Claude 4. 2025. Ars Technica
Shao et al, 2026. Future of Work with AI Agents: Auditing Automation and Augmentation Potential across the U.S. Workforce. arXiv.
Farrell et al, 2025. “Large AI models are cultural and social technologies”. Science.
Mobayed, 2025. “Your Brain on ChatGPT” Psychology Today.
Bastani et al, 2025. “Generative AI without guardrails can harm learning: Evidence from high school mathematics” PNAS.
Berg and Rosenblatt, 2025. “The Monster inside ChatGPT” Wall Street Journal.
“Is AI Rewiring our Minds? Scientists probe cognitive cost of chatbots” Washington Post.
Li et al, 2025. “Core Knowledge Deficits in MMLMs” (https://arxiv.org/abs/2410.10855v4) arXiv.
- See also: Grow AI like a Child
Gao et al, 2025. Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

📝 Submit:

Discussion question to course chat

Tip

“📖 Read”, “🎧 Listen”, and/or “📺 Watch” items are required content for the day, and should be read/heard/watched before class on that day.
“🌐 Browse” items should be briefly looked at but do not need to be read deeply unless you want to.
“📚 Additional Resources” do not need to be looked at; they are there to serve, if useful, as further references for your debates, final projects, and general edification later.