Open Collaboration in Action: Inside the Open Safeguard Hackathon

Written by

Andrew Chang, Yacine Jernite, Juliet Shen

Last week, ROOST, Hugging Face, and OpenAI gathered the safety community together for a hackathon that reminded us exactly why open collaboration matters. The Open Safeguard Hackathon, held on Dec. 8 in San Francisco, showcased the sense of urgency that safety practitioners, researchers, policy leaders, and others feel about improving the future of online safety. Throughout the day, these experts rolled up their sleeves to test new ideas and tackle challenges head-on.

The hackathon underscores the exact reason why ROOST was established: to build open, community-governed safety tools and resources capable of addressing the growing challenges of AI-driven online harms and keeping digital spaces safer for everyone. Our approach reflects a new vision for online safety – one defined by concrete action and shared, open infrastructure. Hugging Face, one of ROOST's founding members, has been instrumental in advancing this vision. Both organizations operate on principles of openness, collaboration, and transparency, exemplified by our partnership in launching the ROOST Model Community, which directly connects safety practitioners with model creators to maximize the value of open safety models. The release of gpt-oss-safeguard, an open weight model fine-tuned for safety by OpenAI, provided the perfect catalyst to accelerate the work of this initiative.

Lessons in Collaboration from the Frontlines of AI Safety

The hackathon made it clear that builders urgently need and want more opportunities to collectively tackle the rapidly evolving challenges of modern online safety. We already knew that builders need safety models: the FalconsAI NSFW image detection classifier is the second most downloaded model on Hugging Face, with 90.5 million monthly downloads. Model creators have recognized this need as well, with safety model releases including ShieldGemma, LlamaGuard, Nvidia’s NemoGuard resources, Zentropi’s CoPE models, and the QwenGuard series. And gpt-oss-safeguard, which launched only on October 29th, has already reached over 40,000 monthly downloads.

Events like this hackathon are crucial because they create a shared space for safety practitioners who are isolated within their own organizations to experiment. The outpouring of enthusiasm online, from LinkedIn posts to private follow-ups, made something clear: practitioners need more spaces that feel less like closed-door demos and more like a productive and effective workbench.

Three people sit on stage for a panel discussion, one holding a microphone. The screen behind them reads: Johannes Heidecke, Head of Safety Systems at OpenAI; Anne Bertucio, Chief Operating Officer at ROOST; Yacine Jernite, Head of ML & Society at Hugging Face.

The hackathon welcomed 75 participants from tech companies (big and small), academia, nonprofits, and other industries, with expertise ranging from policy development to ML research. Opening remarks from Anne Bertucio (ROOST COO), Yacine Jernite (Hugging Face Head of ML & Society), and Johannes Heidecke (OpenAI Head of Safety Systems) inspired questions that turned into brainstorms and then into real project ideas. Teams were eager to try different models across contexts, from A/B testing policy iterations, to integrating gpt-oss-safeguard directly into existing tech stacks, to red teaming the model's limits against jailbreak tactics such as crescendo attacks. That eagerness sparked creativity and transparency as teams learned from each other about how best to tackle new problems.

To accommodate the diverse range of participant interests and project ideas, we guided hackathon projects down one of three tracks:

  1. Policy Development: Using open safety models, including gpt-oss-safeguard, to test, refine, and iterate on your own policies (see the sketch after this list).

  2. Model Testing: Testing open safety models' performance, benchmarking against gpt-oss or other open-weight alternatives and considering compute costs.

  3. Real-World Applications: Using open safety models to solve a real-world challenge; seeing how gpt-oss-safeguard works in your tech stack, integrating it with applications you use, or building a new proof-of-concept application.
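To make the first and third tracks more concrete, here is a minimal sketch of the bring-your-own-policy pattern gpt-oss-safeguard is built around: you supply the policy at inference time and the model classifies content against it. The sketch assumes the model is served behind an OpenAI-compatible endpoint (for example, a local vLLM server); the endpoint URL, model id, policy wording, and output labels are illustrative assumptions rather than an official recipe.

```python
# Minimal sketch: classify content against your own policy with gpt-oss-safeguard
# served behind an assumed OpenAI-compatible endpoint (e.g. a local vLLM server).
from openai import OpenAI

# Assumed local server, e.g. started with: vllm serve openai/gpt-oss-safeguard-20b
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Your own policy, written in plain language and supplied at inference time.
POLICY = """\
You are a content safety classifier. Apply the following policy:
- VIOLATES: content that harasses, threatens, or demeans a person or group.
- SAFE: everything else.
Answer with exactly one label: VIOLATES or SAFE.
"""

def classify(text: str) -> str:
    """Return the model's label for one piece of content (assumed output format)."""
    response = client.chat.completions.create(
        model="openai/gpt-oss-safeguard-20b",  # assumed model id on the server
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": text},
        ],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(classify("You're all worthless and you should be scared to log on tomorrow."))
    print(classify("Any tips for moderating a small gardening forum?"))
```

Swapping in a different policy, or a different open safety model behind the same endpoint, is a one-line change, which is exactly the kind of iteration the first track encouraged.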

The range of projects on display highlighted our belief that users must adapt existing tools to their specific needs. We know that there is no one-size-fits-all model for every safety context, especially given different organizations' resources and needs. As such, it is critical that we learn how best to empower organizations to bring their own policies and to adapt solutions to their specific communities.

Gathering these practitioners in one room also reinforced the value of a connected community. Cross-pollination of ideas across sectors helped focus our participants' attention on the unique value of open safety models as an adaptable and transparent foundation for new use cases. That focus is essential as organizations weigh the costs and benefits of using AI for safety, especially given that even high-performance models like gpt-oss-safeguard are not perfect solutions for every context.

Below, we've highlighted some projects that caught our eye, although this list is far from exhaustive. You can learn about more projects via the ROOST Model Community GitHub repository.

    • Evaluating whether gpt-oss-safeguard deems controversial statements “good” when no policy is given, to understand the model’s inherent preferences [link]

    • Integrating structured content from audio signals, and testing complex policies on them, when assessing transcribed voice data for markers of distress

    • Testing model performance when policies or prompts are not written in English, particularly in humanitarian aid contexts, raising questions around multilingual pre-training datasets [link]

    • Building an appeals copilot to support Trust & Safety teams through the case review and appeals process [link]

    • Testing whether adding an LLM's reasoning traces to content embeddings would improve clustering for toxicity detection (see the sketch below) [link]
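As one illustration of how these project ideas translate into code, here is a rough sketch of the last experiment above: embedding content with and without an accompanying reasoning trace and comparing cluster quality. The example texts, traces, embedding model, and cluster count are stand-ins, not the team's actual setup, and a real run would use traces produced by a safety model such as gpt-oss-safeguard.

```python
# Rough sketch: does appending a model's reasoning trace to content improve
# clustering for toxicity detection? Texts and traces below are illustrative.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

contents = [
    "go back to where you came from",
    "people like you don't belong here",
    "what a lovely photo of your garden",
    "congrats on the new job!",
]
# Hypothetical reasoning traces a safety model might emit for each item.
traces = [
    "Targets someone based on origin; reads as harassment.",
    "Exclusionary language aimed at a group; reads as harassment.",
    "Friendly compliment with no policy concerns.",
    "Positive, supportive message with no policy concerns.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

plain = encoder.encode(contents)
augmented = encoder.encode([f"{c}\nReasoning: {t}" for c, t in zip(contents, traces)])

# Compare cluster separation with and without the reasoning traces.
for name, emb in [("content only", plain), ("content + trace", augmented)]:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)
    print(f"{name}: silhouette = {silhouette_score(emb, labels):.3f}")
```

A higher silhouette score for the augmented embeddings would suggest the traces add useful signal, though a real evaluation would need far more data and labeled examples than this toy comparison.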

Both ROOST and Hugging Face are especially grateful to the OpenAI team for their partnership in bringing this hackathon to life. From their transparency in answering questions on gpt-oss-safeguard's performance, to their sponsorship of up to $50,000 in API credits for selected participants, their collaboration in building a safe and accessible future has been invaluable.

This Is Only the Beginning

A group of people crowds around a large monitor showing a Hugging Face interface for exploring the safety models. A woman in a brown sweater gestures toward the screen, explaining how it works.

This hackathon was the first of many events and programs the ROOST Model Community is planning. Significant problems remain unsolved, and the safety ecosystem is at an inflection point where AI has amplified old harms and introduced entirely new ones. Identifying new use cases for open safety models, including but not limited to gpt-oss-safeguard, will help us address the biggest safety challenges ahead of us. And our community only grows stronger when events like this give people a place to contribute, challenge assumptions, and shape the roadmap in real time.

To everyone who showed up, shared ideas, shipped demos, asked hard questions, or simply soaked in the energy – thank you. You reminded us why building openly is a superpower.

And there is more to come. In 2026, we want to create as many opportunities as possible to gather the people doing the work and make progress together. That means more hackathons in more locations around the world, and more tooling that puts accessible safety in everyone's hands.

We welcome everyone to this effort. If you’re interested in using open-weight models to keep the internet safe, join the ROOST Model Community, where you can share your work and stay connected with like-minded peers. Join our GitHub repository here and our Discord here. If you’re interested in helping our teams host another hackathon anywhere around the world, please reach out to hello@roost.tools. We can’t wait to keep building with you.