OpenAI reveals 10 examples of AI safety initiatives
At the second AI summit, held in Seoul, South Korea, 16 AI companies from around the world reached an agreement on a set of frontier AI safety commitments. To coincide with the summit, OpenAI published an overview of 10 safety practices it applies when developing and deploying its models.
OpenAI safety practices | OpenAI
https://openai.com/index/openai-safety-update/
1: Empirical red teaming and testing of models before release
At OpenAI, we empirically evaluate the safety of our models both internally and externally before releasing them. Under our Preparedness Framework, if a new model crosses the 'medium' risk threshold, we do not release it until safety interventions bring the post-mitigation score back to 'medium' or below. In addition, more than 70 external experts worked as a red team to evaluate the risks of GPT-4o before its release.
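To make the release-gating idea concrete, here is a minimal sketch of that kind of check. The risk categories, level names, and function are hypothetical simplifications for illustration, not OpenAI's actual Preparedness Framework tooling.

```python
# Minimal sketch of a release-gating check in the spirit of a preparedness
# framework. Category names, level names, and the threshold rule are
# illustrative assumptions, not OpenAI's implementation.
RISK_LEVELS = ["low", "medium", "high", "critical"]

def may_release(post_mitigation_scores: dict[str, str], threshold: str = "medium") -> bool:
    """Return True only if every tracked risk category scores at or below
    the threshold after mitigations have been applied."""
    limit = RISK_LEVELS.index(threshold)
    return all(RISK_LEVELS.index(level) <= limit for level in post_mitigation_scores.values())

# Example: a hypothetical post-mitigation capability report after red teaming.
report = {"cybersecurity": "low", "cbrn": "medium", "persuasion": "medium", "autonomy": "low"}
print(may_release(report))  # True: all post-mitigation scores are medium or below
```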
2: Alignment and safety research
OpenAI's models have become safer over time. This is partly because we have built smarter models, which make fewer errors and are less likely to output harmful content even under adversarial conditions such as jailbreak attempts. We have also invested heavily in practical alignment, safety systems, and post-training research. These efforts improve the quality of the human-generated data used to fine-tune our models and, in the future, will improve the instructions our models are trained to follow. We also conduct and publish fundamental research aimed at dramatically improving the robustness of our systems against attacks such as jailbreaking.
3: Abuse monitoring
OpenAI deploys high-performance language models through its APIs and ChatGPT, and uses a wide range of tools to monitor for safety risks and abuse, including purpose-built moderation models and its own proprietary models. We have shared key findings along the way, including a joint disclosure with Microsoft of abuses of our technology by state-affiliated actors, so that others can be better protected against similar risks. We also use GPT-4 to shape content policies and make content moderation decisions, enabling a feedback loop for policy refinement and reducing how often human moderators are exposed to harmful content.
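The same kind of automated screening is available to developers through OpenAI's public Moderation endpoint. The sketch below is a minimal illustration assuming the openai Python SDK (v1+) and an OPENAI_API_KEY environment variable; the model name follows the public documentation, and the summary logging is our own simplification rather than OpenAI's internal pipeline.

```python
# Minimal abuse-screening sketch using OpenAI's public Moderation endpoint.
# Assumes the openai Python SDK (v1+) and OPENAI_API_KEY in the environment;
# this illustrates the idea, it is not OpenAI's internal monitoring system.
from openai import OpenAI

client = OpenAI()

def screen_text(text: str) -> bool:
    """Return True if the moderation model flags the text as a policy risk."""
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    )
    result = response.results[0]
    if result.flagged:
        # Surface only a category summary so a human reviewer does not have
        # to read the raw harmful content unless escalation is needed.
        print("flagged categories:", result.categories)
    return result.flagged

if __name__ == "__main__":
    screen_text("Example user message to screen before it reaches downstream systems.")
```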
4: A systematic approach to safety
From pre-training to deployment, OpenAI implements safety measures at every stage of the model lifecycle. Alongside developing safer and more consistent model behavior, we invest in pre-training data safety, system-level control of model behavior, a data flywheel for continuous safety improvement, and robust monitoring infrastructure.
5: Protecting children
A key focus of OpenAI's safety work is protecting children. ChatGPT and DALL-E have strong guardrails and safety measures built in to mitigate potential harm to children. In 2023, OpenAI introduced a mechanism to detect, review, and report attempts to use its tools to handle CSAM (child sexual abuse material). OpenAI works with specialist organizations and the broader technical community to uphold the principle of 'Safety by Design.'
6: Election integrity
OpenAI is working with governments to ensure transparency around AI-generated content and to improve access to accurate voting information. Specifically, it has introduced tools to identify images created with DALL-E 3 and embeds metadata based on the C2PA technical specification, which records how content was created and edited so that users can verify the provenance of content they find online. ChatGPT also directs users to official election information sources in the United States and Europe. In addition, OpenAI supports the bipartisan 'Protect Elections from Deceptive AI Act' proposed in the US Senate.
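As an illustration of how a user might inspect that C2PA provenance metadata, here is a sketch that shells out to the open-source c2patool CLI from the Content Authenticity Initiative. The tool must be installed separately, and its invocation and output format can differ between versions, so the details here are assumptions rather than a definitive recipe.

```python
# Sketch: inspect an image's C2PA provenance metadata via the open-source
# c2patool CLI (Content Authenticity Initiative). Assumes c2patool is on
# PATH and prints the manifest store as JSON; details may vary by version.
import json
import subprocess
import sys

def read_c2pa_manifest(image_path: str) -> dict | None:
    """Return the C2PA manifest store as a dict, or None if none is found."""
    proc = subprocess.run(["c2patool", image_path], capture_output=True, text=True)
    if proc.returncode != 0:
        return None  # no manifest present, or the file could not be read
    return json.loads(proc.stdout)

if __name__ == "__main__":
    manifest = read_c2pa_manifest(sys.argv[1])
    if manifest is None:
        print("No C2PA provenance metadata found.")
    else:
        # For a DALL-E 3 image, the manifest's claim identifies the generator.
        print(json.dumps(manifest, indent=2))
```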
7: Investing in impact assessment and policy analysis
OpenAI's impact assessment efforts have been broadly influential. They include early research into measuring the chemical, biological, radiological, and nuclear (CBRN) risks posed by AI systems, studies estimating how language models might affect different occupations and industries, and pioneering work on how society can manage the associated risks, for example by working with external experts to assess the implications of language models for influence operations.
8: Security and access control management
OpenAI prioritizes protecting its customers, intellectual property, and data. We deploy AI models to the world as services, controlling access through APIs. Our cybersecurity efforts include need-to-know access controls for training environments and high-value algorithmic secrets, internal and external penetration testing, and a bug bounty program. We believe that protecting advanced AI systems will benefit from the evolution of infrastructure security, and we are exploring novel controls such as confidential computing on GPUs and applications of AI to cyber defense. We also fund researchers through grant programs to strengthen cybersecurity.
9: Government partnerships
OpenAI partners with governments around the world to inform the development of effective and workable AI safety policies. This includes sharing what we learn, collaborating on pilots for government and other third-party assurance, and informing discussions around new standards and laws.
10: Safety decision-making and board oversight
As part of its Preparedness Framework, OpenAI has an operational structure for safety decision-making. A cross-functional Safety Advisory Group reviews model capability reports and makes recommendations before deployment. The final decision rests with company leadership, with oversight from the Board of Directors.
This approach enables OpenAI to build and deploy safe and capable models.