The deployment of increasingly capable AI systems without mandatory pre-deployment safety evaluation represents a significant governance gap. Drawing on regulatory analogies from aviation, pharmaceuticals, and nuclear power, this paper proposes the Minimum AI Safety Certification (MASC) framework: a three-tier, capability-calibrated pre-deployment certification regime. MASC Tier I applies to all general-purpose AI systems, requiring baseline safety benchmarking and transparency disclosures. Tier II applies to frontier systems above defined capability thresholds, requiring independent third-party evaluation. Tier III applies to systems with potential for mass-casualty-level harm, requiring governmental review and approval before deployment. We propose a governance structure, the AI Safety Certification Authority (ASCA), for administering the framework, and analyze implementation costs and incentive effects.
Pharmaceutical companies cannot sell drugs without clinical trial evidence of safety and efficacy. Aircraft manufacturers cannot deploy new designs without airworthiness certification. Nuclear power plants cannot operate without regulatory approval. These requirements exist because the potential consequences of failure are severe, irreversible, and borne by parties other than the deploying organization.
Frontier AI systems increasingly share these characteristics: potential consequences of failure are severe (mass disinformation, autonomous cyberattacks, assistance with weapons of mass destruction), potentially irreversible (value lock-in, catastrophic misuse events), and distributed across society rather than confined to developers. Yet AI systems currently face no mandatory pre-deployment safety certification requirements in most jurisdictions.
This paper develops the Minimum AI Safety Certification (MASC) framework as a concrete, implementable proposal for mandatory pre-deployment regulation of AI systems.
The FAA airworthiness certification process requires aircraft manufacturers to demonstrate, through a combination of analysis, simulation, and flight testing, that new designs meet prescribed safety standards before commercial operation. This process has contributed to commercial aviation's exceptional safety record. Key features applicable to AI include: capability-based certification tiers (different requirements for different aircraft classes); third-party involvement in compliance verification; and ongoing airworthiness monitoring post-deployment.
The FDA drug approval process requires phased clinical trials demonstrating safety before efficacy, with increasing evidence requirements as trials progress. Post-market surveillance requirements ensure ongoing monitoring. The parallel for AI is a staged deployment model with increasing evidence requirements as AI capability and deployment scale increase.
The NRC licensing process for nuclear plants involves independent safety analysis, public comment periods, and ongoing operational oversight. The nuclear analogy is most apt for the highest-capability AI systems: those with potential for catastrophic, irreversible harm require commensurate regulatory scrutiny before deployment.
All general-purpose AI systems deployed commercially must complete Tier I certification, which requires: publication of a standardized model card including training data sources, known limitations, and intended use cases; completion of a minimum safety benchmark battery covering the nine dimensions in our SREP framework; and designation of a responsible deployment entity with legal accountability for system behavior.
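To illustrate how the Tier I disclosure could be made machine-readable and auditable, the sketch below encodes it as a simple data structure. This is a minimal sketch, not a prescribed schema: the field names are assumptions, and the SREP dimension names and score scales are left abstract because they are defined elsewhere in the paper.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: field names are assumptions, not part of any
# MASC specification. A standardized Tier I filing would carry at least
# the disclosures enumerated in the text above.

@dataclass
class TierICertification:
    model_name: str
    responsible_entity: str           # legally accountable deploying entity
    training_data_sources: list[str]  # high-level provenance disclosure
    intended_use_cases: list[str]
    known_limitations: list[str]
    # One benchmark result per SREP dimension (nine in total); dimension
    # names and scoring scales are placeholders here.
    srep_scores: dict[str, float] = field(default_factory=dict)

    def is_complete(self) -> bool:
        """Check that all nine SREP dimensions have been benchmarked."""
        return len(self.srep_scores) == 9
```

A machine-readable filing of this kind would let the certifying body validate completeness automatically before human review begins.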
Systems above defined capability thresholds—operationalized through a combination of benchmark scores, parameter counts, and demonstrated emergent capabilities—must complete Tier II certification, which adds: independent third-party safety evaluation conducted by a certified evaluation body; formal risk assessment covering potential catastrophic misuse scenarios; and publication of a pre-deployment safety report meeting standardized disclosure requirements.
Systems with demonstrated potential to facilitate mass-casualty events, undermine democratic institutions, or produce other catastrophic outcomes require Tier III governmental review before deployment. This involves formal review by the proposed AI Safety Certification Authority, interagency consultation, a public comment period, and explicit deployment authorization.
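To make the capability-calibrated structure concrete, the sketch below encodes tier assignment as a decision rule. The threshold values, input names, and the disjunctive combination of criteria are all illustrative assumptions; under MASC, the actual thresholds and their combination logic would be set and periodically revised by the certification authority.

```python
from enum import Enum

class Tier(Enum):
    I = 1    # all general-purpose AI systems
    II = 2   # frontier systems above capability thresholds
    III = 3  # potential for catastrophic, irreversible harm

# Placeholder thresholds for illustration only.
FRONTIER_BENCHMARK_THRESHOLD = 0.85  # aggregate capability benchmark score
FRONTIER_PARAM_THRESHOLD = 1e11      # parameter count

def required_tier(benchmark_score: float,
                  parameter_count: float,
                  emergent_capabilities: bool,
                  catastrophic_harm_potential: bool) -> Tier:
    """Map a system's capability profile to its certification tier.

    Mirrors the text: Tier III for demonstrated catastrophic-harm
    potential; Tier II above any frontier threshold; Tier I otherwise.
    Whether the Tier II criteria combine conjunctively or disjunctively
    is a design choice; this sketch uses a disjunctive rule.
    """
    if catastrophic_harm_potential:
        return Tier.III
    if (benchmark_score >= FRONTIER_BENCHMARK_THRESHOLD
            or parameter_count >= FRONTIER_PARAM_THRESHOLD
            or emergent_capabilities):
        return Tier.II
    return Tier.I
```

For example, `required_tier(0.9, 5e10, False, False)` returns `Tier.II`: the system clears the benchmark threshold even though its parameter count falls below the frontier cutoff.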
MASC requires an administrative body with the technical expertise, legal authority, and operational capacity to implement the certification regime. We propose the AI Safety Certification Authority (ASCA) as an independent agency modeled on the NRC, with a mandate to develop and maintain certification standards, accredit third-party evaluators for Tier II assessments, conduct Tier III reviews, and publish annual reports on the AI safety landscape.
The most common objection to mandatory AI safety certification is that it would inhibit innovation. Our analysis suggests the opposite: certification requirements create a level playing field that rewards safety investment, provide clarity to developers about deployment requirements, and build public trust that accelerates adoption. The aviation and pharmaceutical industries, both subject to mandatory certification, have achieved substantial innovation within regulatory frameworks.
The deployment of frontier AI systems without mandatory safety certification is a governance gap that grows more dangerous as AI capabilities scale. The MASC framework provides a concrete, proportionate, and administratively feasible path toward capability-calibrated AI deployment regulation.