We are an independent non-profit research organization dedicated to understanding, measuring, and mitigating risks from advanced AI systems.
Safe AI for Humanity Foundation is a Wyoming non-profit corporation organized exclusively for charitable, scientific, and educational purposes. We operate independently of commercial AI developers to provide unbiased safety research.
Our work spans technical AI safety research, policy analysis, and public education — all freely published and open to the world.
No commercial affiliations — our findings are driven by evidence, not product interests.
All research papers and reports are freely available with no paywalls.
Educating policymakers, researchers, and the public on responsible AI development.
Safe AI for Humanity Foundation
Wyoming Non-Profit Corporation
EIN: 41-4767005
501(c)(3) Application Pending (filed March 2026)
Peer-reviewed and working papers on AI safety, alignment, and risk.
We propose a quantitative framework for measuring the degree to which LLM outputs remain consistent with stated human values across diverse adversarial prompting conditions.
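The abstract above describes the framework only at a high level; its actual metric is not reproduced here. As a minimal hedged sketch, one simple instantiation is a consistency rate over adversarial prompts, where a judge function (here a hypothetical toy stand-in, not the paper's method) labels each response as value-consistent or not:

```python
# Hypothetical illustration of a value-consistency metric. The judge
# function and all names here are illustrative assumptions, not the
# framework proposed in the paper.

def consistency_rate(responses, judge):
    """Fraction of responses the judge labels value-consistent."""
    if not responses:
        raise ValueError("no responses to score")
    return sum(judge(r) for r in responses) / len(responses)

# Toy judge: treats responses containing a refusal marker as consistent.
def toy_judge(response):
    return "cannot help" in response.lower()

adversarial_outputs = [
    "I cannot help with that request.",
    "Sure, here is how to do it...",
    "I cannot help with that.",
]
print(round(consistency_rate(adversarial_outputs, toy_judge), 2))  # 0.67
```

A real evaluation would replace the toy judge with a calibrated human or model-based rater and aggregate across many adversarial prompting conditions, as the abstract describes.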
A systematic review of existing red-teaming methodologies across major AI labs, identifying critical gaps in coverage and proposing a standardized evaluation protocol.
Drawing on analogies from aviation and pharmaceutical regulation, we propose a tiered pre-deployment safety certification regime for AI systems above defined capability thresholds.
We identify and categorize 47 distinct patterns of specification gaming observed in RLHF-trained models and evaluate the effectiveness of proposed mitigation strategies.
An analysis of self-reported AI incidents from 2020–2025, demonstrating systemic underreporting and proposing a mandatory structured disclosure regime analogous to aviation near-miss reporting.
We examine the conditions under which corrigibility properties degrade as AI systems encounter out-of-distribution inputs and propose architectural interventions to preserve oversight mechanisms.
Structured response protocols for AI-related safety incidents, freely available for organizations to adopt.
Response protocol for detecting and containing AI systems exhibiting unexpected goal-directed behavior inconsistent with training objectives.
Protocol for responding to AI systems producing outputs that facilitate mass-casualty events, child exploitation material, or targeted violence.
Response plan for discovered systematic vulnerabilities allowing large numbers of users to bypass safety constraints.
Protocol for incidents where model outputs reveal training data, PII, or confidential information from third-party sources.
Response framework for identifying and remediating systematic demographic bias or discriminatory outputs across protected categories.
Protocol for AI agents taking unintended consequential real-world actions (e.g., unauthorized API calls, financial transactions, communications).
Third-party safety assessments of frontier AI models across key risk dimensions.
We welcome collaboration with researchers, policymakers, and institutions committed to safe AI development. All research is open and freely published.
Safe AI for Humanity Foundation's application for 501(c)(3) status is pending. Upon IRS approval, donations will be tax-deductible retroactively to March 10, 2026. EIN: 41-4767005