MLCommons is a non-profit association with a global focus on developing benchmarks and public datasets, including the MLPerf benchmarks. The newly formed AI Safety Working Group, co-chaired by Percy Liang, aims to establish standard benchmarks for AI safety, building on the Stanford CRFM's HELM effort.
The event is expected to host a mix of attendees from corporate, policy, academic, and standards organizations. For more information, you can refer to the MLCommons announcement, the Google AI blog post, and the detailed proposal. Stay tuned for updates on this initiative.
About the MLCommons AI Safety Working Group
The AI Safety Working Group within MLCommons is focused on the development and promotion of AI safety tests and benchmarks. Its main objectives are as follows:
Tests: The group curates a pool of safety tests from various sources, and it also facilitates the creation of new and improved tests and testing methodologies. These tests are designed to assess the safety aspects of AI systems.
Benchmarks: The group defines benchmarks for specific AI use-cases. Each benchmark draws on a subset of the safety tests and summarizes the results in a form that non-experts can understand, with the goal of quantifying the safety performance of AI systems (a minimal sketch after this list illustrates the idea).
Platform: The group is responsible for developing a community platform dedicated to safety testing of AI systems. The platform supports registering tests, defining benchmarks, running tests against AI systems, managing test results, and presenting benchmark scores. It provides the infrastructure for conducting and tracking safety evaluations.
Governance: The Safety Group establishes a set of principles and policies to guide its activities. It also initiates a multi-stakeholder process to ensure that decision-making within the group is trustworthy and represents the interests of various stakeholders in the AI safety field.
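To make the benchmark idea concrete, here is a minimal sketch in Python of how a use-case benchmark might aggregate results from a subset of registered safety tests into a single, plain-language grade. All of the names here (SafetyTest, Benchmark, the example tests, the grading thresholds, and the weakest-area aggregation rule) are hypothetical illustrations for this post, not the working group's actual schema, scoring method, or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registry entry: a named safety test that scores a system
# between 0.0 (unsafe) and 1.0 (safe) on one hazard category.
@dataclass
class SafetyTest:
    name: str
    run: Callable[[object], float]  # takes a system-under-test, returns a score

@dataclass
class Benchmark:
    """A use-case-specific benchmark built from a subset of registered tests."""
    use_case: str
    tests: List[SafetyTest]

    def evaluate(self, system) -> Dict[str, object]:
        scores = {t.name: t.run(system) for t in self.tests}
        # Illustrative aggregation choice: a system is treated as only as safe
        # as its weakest hazard category.
        overall = min(scores.values())
        return {
            "use_case": self.use_case,
            "per_test": scores,
            "grade": self._grade(overall),
        }

    @staticmethod
    def _grade(score: float) -> str:
        # Illustrative thresholds mapping a numeric score to a label a
        # non-expert can read.
        if score >= 0.9:
            return "Low risk"
        if score >= 0.7:
            return "Moderate risk"
        return "High risk"

# Example usage with stubbed tests and a dummy system-under-test.
if __name__ == "__main__":
    toxicity = SafetyTest("toxicity", lambda sys: 0.95)
    misinformation = SafetyTest("misinformation", lambda sys: 0.72)
    chat_benchmark = Benchmark("general-purpose chat assistant",
                               [toxicity, misinformation])
    print(chat_benchmark.evaluate(system=None))
```

The point of the sketch is the shape of the workflow: individual tests are registered once, benchmarks select and aggregate a subset of them per use-case, and the output is a summary score rather than raw test data.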