A welcoming corner for exploring AI safety with curiosity, care, and clarity — beginner-friendly and carefully curated.
This week
Anthropic argues that Claude Mythos Preview is still low-risk overall, but agentic capability is rising fast enough that monitoring and mitigation need to improve quickly.
Read full breakdown ↗
What’s new in the AI safety field? Explore the latest research and ideas.
New to AI safety? Start here with a collection of notes from my learning journey.
When do AI systems fail? The warning signs we shouldn't ignore when building safe AI.
How do we evaluate the safety of AI systems? Explore common evals and benchmarks here.
Encountered an unfamiliar AI safety term? Find the definition here.
New AI model just dropped? See what safety evaluations it went through.