Hello and welcome to this week’s edition of the Future in Five Questions. This week I spoke with Sarah Bird, the chief product officer of responsible AI at Microsoft. Bird is responsible for testing the security of Microsoft’s AI tools and making sure they don’t cause harm or replicate bias, and we discussed what needs to be done after an AI tool is “red-teamed,” the limits of thinking about AI in terms of whether it’s “aligned” with humanity’s interests — and why industry still needs government to bring the two sides together and work out standards for AI evaluation. An edited and condensed version of the conversation follows: What’s one underrated big idea? Testing, but a very specific type of testing. In the last two years, we've started to see the term “red-teaming” become very popular. Red-teaming is one very important testing tool, but we see people not understanding how critical it is to follow that with what we call measurement, or robust evaluation. A lot of our investment in responsible AI over the past couple of years has been in building safety evaluation systems that allow us to look across a variety of risks, test a large number of examples against these systems, and then see what rate of defects we're seeing in our systems. We do not ship anything without first looking at the safety evaluation results. We all know we should test software, but common practice was not to test AI systems for types of risks like prompt injection attacks, the ability to produce copyrighted materials, or harmful content or stereotyping content. Hallucination is another example — you need a lot of expertise to even understand the risk. There's still a lot more work we need to build in our system to make it so that people can customize effectively, so that it probes their application as deeply as possible. It’s still a very nascent field. What’s a technology that you think is overhyped? The big one for me in the responsible AI space is alignment. It is actually a critical technology, but what we hear from many people talking about this, whether in academia or other organizations, is that they look at alignment as the one-size-fits-all silver bullet. We'll just align the models, and we will have no challenges with safety or responsible AI. We very much believe you need an in-depth approach. We're never going to have just one technology that solves the whole problem here. The other challenge that we see with alignment is that building safety into the model makes it less useful for certain applications. For example, I was mentioning the safety evaluation system before where in order to actually role-play and generate these tests we use AI — but to do that, we actually need to generate harmful content as part of that. We want these tests to be very robust and we want to use the most sophisticated models possible, but if you have the safety entirely built in, we're not able to use them. What book most shaped your conception of the future? “Tools and Weapons,” by our president Brad Smith and Carol Ann Browne. The title points out that a lot of this technology is dual-use. The technology itself is not good or bad, it's how we use it. The book looks at technology and innovation throughout history, their challenge, impact and how we solved those problems. The fact that we're all figuring out right now globally how to regulate AI, the level of coordination that's needed is very different than how this looked in the past where technology was often staying in one pocket for much longer as it matured. That’s really influenced my understanding of AI’s impact and how we need to prepare for it. What could the government be doing regarding technology that it isn’t? Microsoft's position for a long time has been that we need laws and regulations around AI. For quite a few years now we’ve self-regulated and made this public so people can see us put in place the rules that we're upholding. But not every organization is going to go and do that on their own. So in order to protect people's rights and civil liberties it's important that there’s actually a standard for how this technology should be built and how this technology should be used. However, for me as a technologist one of the parts of this I think is really important is that it’s quite complex in practice, and sometimes what we think would work doesn't actually work the way we expected it to. It’s important that the government help convene different stakeholders and make sure they're learning from technologists. I think of what the National Institute of Standards and Technology is doing here around evaluation, bringing together many different experts on the topic to figure out how we should start building standards for evaluating AI systems, and Microsoft’s Frontier Model Forum. We need more of these kinds of conversations. What has surprised you the most this year? It’s easy in responsible AI to think of negative surprises, but I have been delighted to see how the field is maturing very quickly. We were talking about the importance of testing and evaluation, and even six months ago that was not something that customers were asking about. People are now really thinking about how they're building AI, and looking for tools to help them do better and make sure they're governing appropriately. That’s been a massive change over the past year.
|