The Next Chapter for Artificial Intelligence
Panelists discuss the trends revealed in this year’s AI Index Report, including technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development.
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) that tracks, collates, distills, and visualizes data related to artificial intelligence. Its mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI.
PARLI: All right. Hello, everyone. I’m Vanessa Parli. I’m director of research programs at Stanford’s Institute for Human-Centered Artificial Intelligence, or HAI, and a steering committee member on the AI Index.
I want to give a special thanks today to the Council on Foreign Relations for partnering with us on this launch event for the AI Index, and especially Ambassador Froman and the rest of the Council staff.
Before you hear from our distinguished panelists today I’m going to give you a few brief highlights from the AI Index report to kind of ground the conversation. OK. The clicker is working.
So for those not familiar with the AI Index, it’s an annual report that provides unbiased, rigorously vetted, global data for policymakers, researchers, journalists, and executives to develop a deeper understanding of the state of AI, especially as it continues to evolve rapidly.
The AI Index is led by a steering committee—a diverse set of individuals headed by Jack Clark and Ray Perrault—and the heavy lifting is done by Nestor Maslej and Loredana Fattorini, who is also here in the audience today.
This year we increased the research done by the AI Index staff, contractors, and students who are listed on this slide, so I wanted to make sure we gave a big shout-out to them. They do a lot of great work for our report. And we partner with many data vendors in making the report what it is, so as you thumb through make sure to check out, you know, where the data is coming from.
So one of the main highlights in the report this year is that industry continues to dominate frontier AI research. In 2023, fifty-one notable machine learning models came from industry and only fifteen from academia. Twenty-one came from industry-academia partnerships, which is a rise from last year—perhaps because these models require so many resources that academia needs to partner with industry to have access to some of the biggest models.
When you look at the data by region, the United States is far in the lead in producing these big models, followed by China and the European Union and the U.K. And foundation models are a subset of these AI systems that are trained on massive amounts of data and can be used as a foundation, as the word suggests, for some of these generative models—think ChatGPT, Gemini. And again, they are mostly made by industry.
So a huge barrier in creating these models is the training cost. According to estimates developed in collaboration with Epoch AI, the training costs of state-of-the-art models have reached unprecedented levels. You can see the bars in 2017 were quite low, but in 2023 they really jump up. It’s estimated that OpenAI’s GPT-4 cost 78 million (dollars) to train and Google’s Gemini cost almost 200 million (dollars).
One concern with only certain players able to train such models is that there is no standardized way to evaluate them so we’re not clear on how to compare models from different organizations. So the AI Index team examined a selection of leading AI model developers and assessed the benchmarks on which they evaluated their models.
So across the top of this chart are five of the largest models, and down the side are the benchmarks they evaluated their models on. You can see three of those benchmarks were used by all five organizations, but there’s some inconsistency across the rest of the benchmarks.
And when you look at benchmarks that really focus on responsibility, there’s even less consistency. TruthfulQA is one of the most used, with three out of the five developers using it, and it looks at how truthful these language models are in their responses.
It covers over 800 questions across many categories—law, health care, finance, politics—and it looks at how well, or whether, a model can avoid generating false answers. The AI Index team evaluated many developers on these metrics, and only one in five are using responsible AI benchmarks. However, they all say that internally they are looking at the trustworthiness and safety of their models.
Now, speaking of benchmarks, in recent years AI has surpassed human performance in many of the traditional benchmarks. So traditionally these benchmarks looked at one aspect like image recognition, for example, and we have all seen kind of the progress in these models.
This is a picture of what would come out of Midjourney if you asked it to create a picture of Harry Potter. Even just in 2022 that picture is kind of like an abstract version of Harry Potter and then as recent as December 2023 it looks quite real.
So this chart illustrates this progress in AI systems relative to the human baseline. The human baseline is that dotted line, and each of the colored lines is a different benchmark—image classification, basic-level reading comprehension—and you’ll notice that many of the lines, especially in the past couple of years, are hitting that human baseline. The question is, like, does that mean these AI systems are as good as humans.
Anyone who’s used ChatGPT knows there are still issues, and researchers are evaluating how we can come up with better benchmarks to measure these systems, especially as they move into multimodal systems—so not just language but language and image, or language and audio.
So one example of these new benchmarks is called the Massive Multi-discipline Multimodal Understanding benchmark—kind of a lot to say, so MMMU—and the benchmark comprises over 11,000 questions across six disciplines. There’s art, there’s design, business, health, and more, and the question formats aren’t just text. There’s pictures. There’s tables. There’s chemical structures even.
And this is one of the most demanding tests that we have right now of our AI systems. As of January, the highest-performing model on this benchmark was Google’s Gemini, which scored an accuracy rate of 59 percent. So, again, if you remember back to that human baseline chart, a human baseline is usually around, like, 80 or 90 percent.
So around AI investment, private investment in AI peaked in 2021 and has since decreased or remained stable over the past two years. However, if you pull the generative AI data out of that, it shows quite a different story. Despite the overall decline in AI investment, funding for generative AI specifically surged, and these are some examples of those investment activities.
So Microsoft invested 10 billion (dollars) in OpenAI. Amazon invested 4 billion (dollars) in Anthropic, an OpenAI competitor. Cohere, which is more of a startup looking at large language models for enterprise, had a $270 million Series C funding round.
And if you look at this data by region, the United States is widening the gap. Between 2022 and 2023, China and the EU and U.K. saw a decline in private investment. In that same time period, you can see the United States actually increased by about 22 percent.
And over the past five years, and even most recently, I’m sure you have all been in some of these types of conversations where we’re wondering what the productivity gains of this technology are. In this past year there have been a few studies that look specifically at this question.
So this is a study from Microsoft that looked at software developers using their tool Copilot. Some of the team used this AI tool, some did not, and the folks who did have access to the AI technology were about 25 to 73 percent more productive, depending on the task, than those who did not.
A study on the impact of AI in legal analysis showed that teams with GPT-4 access improved significantly on efficiency and quality across various legal tasks, especially contract drafting, and this chart shows the improvement observed in a group of law students who utilized GPT-4 compared to the control group.
So while there are efficiency gains that we’ve seen in these studies, there is also a risk of becoming overreliant on the technology. This study looked at folks reviewing resumes and, overall, people who had access to the AI technology were more efficient.
However, in this study one group of recruiters was told that the AI technology was good—like, use it, don’t worry about it. Another group was told it’s good but be a little bit careful—like, watch out. The folks who were told they had the good technology actually ended up performing less well in their reviews, and the researchers believe it may be because they became complacent, whereas the ones who were told to check what the AI was saying performed better.
So moving away from industry, a few highlights on the regulation environment. Probably to no surprise given everything going on, the number of regulations increased between 2022 and 2023. The AI Index categorized these regulations as expansive or restrictive, and we saw significantly more restrictive-type regulations, which are in pink. And then we looked at which agencies are leading this.
Historically, the Executive Office of the President and Health and Human Services were leading in AI regulations, but you can see on this chart that as we move through the years the blue is expanding. More and more agencies are drafting these regulations, which speaks to the general-purpose nature of the technology. It’s impacting more and more parts of our lives.
And, finally, I’ll close on a few highlights around AI and science. AlphaDev is a reinforcement learning system that has improved on decades of work by scientists and engineers in the field of algorithm optimization.
AlphaDev developed sorting algorithms that use fewer instructions than the human benchmark. Some of the algorithms discovered by AlphaDev are now incorporated into a widely used C++ sorting library, and that marks the first update to this library in over ten years and the first due to reinforcement learning.
And then GraphCast—both of these were actually developed by Google DeepMind—is a weather forecasting system that utilizes graph neural networks and machine learning to process vast datasets and forecast temperature, wind speed, and atmospheric conditions, and it has been shown to correspond more closely to observed weather patterns than the current state of the art.
Thank you. That is all that I have and I will hand it over to our panelists, and we have a QR code here if you want to download the report as well.
Thank you.
BRENNAN: I just want to say thank you to Vanessa for a very comprehensive overview of the 2024 AI Index and it’s an honor and privilege to be on stage with you tonight and with our panelists.
I’m Morgan Brennan. I’m the co-anchor of CNBC’s “Closing Bell: Overtime.” I’ll be presiding over today’s discussion, and joining me here on stage are Erik Brynjolfsson, Kat Duffy, and Russell Wald. It’s good to have you all here.
Erik is the director of the Stanford Digital Economy Lab, a steering committee member of the Artificial Intelligence Index Report 2024, and the Jerry Yang and Akiko Yamazaki Professor at the Stanford Institute for Human-Centered Artificial Intelligence; Kat Duffy is senior fellow for digital and cyberspace policy at the Council on Foreign Relations; and Russell Wald is deputy director of the Stanford Institute for Human-Centered Artificial Intelligence and a steering committee member of the Artificial Intelligence Index Report.
There’s so much in this report, let me start by saying. I tried to print it out today. I broke the printer in the office. It’s 502 pages. I have not seen a more comprehensive report than this. So let me start by saying congratulations on that. But—
DUFFY: You almost broke Notability. Like, literally, Notability was like, woo.
BRENNAN: Perhaps we can start. And, Erik, I’ll send this first question to you to sort of set the stage in terms of what went into the 500-plus pages of this report, and the fact that you hit so many different key elements of AI, and where we’re at and where we’re headed in the process.
BRYNJOLFSSON: Well, everyone’s very excited about AI right now. We’ve been thinking about it for some number of years. I guess, what was the year we started, Nestor? It was—yeah, in 2017 we first got a group together to start doing it and it was a small report and each year it’s gotten bigger and I really want to especially call out Nestor but also Ray and Jack and—well, you saw the whole team there.
So with the support of HAI and a lot of other organizations we’ve continued to make it more comprehensive. A lot of people have been gathering data. We see ourselves as a sort of repository to bring this stuff together. There was nothing like this when we first created it and I just—honestly, out of my own personal interest I wanted to know what’s going on with technical benchmarks, what’s happening in the economy, what’s happening in business, and there was no way to find all of this, and I knew that a lot of other people had the same questions.
So we put together this resource to do it and this is the biggest, the best, most comprehensive one ever. I think maybe next year’s chart we should have one of the charts be the number of pages of the AI Index growing right alongside the others.
BRENNAN: Did you run any of the data or any of the findings through any of the AI models?
BRYNJOLFSSON: No. No. It’s human generated and gathered and so there’s a team of experts that are doing it. Of course, the underlying data is based on a lot of different kind of benchmarks which have AI in them but as far as I know we didn’t use AI to do any of the—did we, Nestor?
MR. : (Off mic.)
BRYNJOLFSSON: OK. There you go. (Inaudible.)
BRENNAN: It is amazing to me, though, Russell, just—I mean, if we go back to the charts, it’s a huge step up in capabilities but also simultaneously this huge step up in investment in generative AI specifically and the soaring costs to develop these foundational models and large language models.
I would imagine the two are going hand in hand, right? Which came first?
WALD: I think the soaring cost is what’s really causing this significant demand. What we have here is what we refer to as foundation models. Foundation models are trained on a large amount of data and require an extraordinarily significant amount of compute.
So if you go back to what Vanessa presented earlier, it’s a stark thing to go from Google spending about $930 on a model back in 2017 to spending almost $200 million on these models now—the difference is so significant.
That sends a couple of signals—one, about who has a seat at the table and is able to develop these. Right now you have a limited set of organizations, probably about five companies in the world, that really have the time and the resources to be able to do this, and it puts policy, as well as other areas of scientific discovery and research, at a bit of risk.
So the ecosystem is very tilted right now towards industry, and they should keep doing what they need to do. But we do need to rethink how we approach the entire mindset of the development of AI—do we want to leave this just with the select few, or do we need to expand this base out a little bit more?
BRENNAN: Why was Gemini so much more expensive than everything else? Do we know?
WALD: So I think a large part of it is that they expanded the parameters, which allows them to expand the data, and it becomes much more cost intensive as you need that much more data and that much more compute.
BRYNJOLFSSON: Could I just say a little bit about that as well?
BRENNAN: Sure.
BRYNJOLFSSON: One of the regularities that was discovered about four years ago is something called scaling laws—that if you just make the model bigger, you have more compute, more data, more parameters.
If you just increase all three of those items proportionately, you get predictable improvements in the quality of the models. You get a lower error rate. And so what they’ve been doing is working their way down those scaling laws by having more of all three, and so far it’s held up—no one knows for sure if it’s going to continue forever, but the bet they’re making is: let’s crank it up another notch, another 10X or another 100X, and that’s what’s driving a lot of these economics.
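To make the scaling-law idea concrete, here is a minimal illustrative sketch in Python. It uses a Chinchilla-style power law for loss as a function of parameters and training tokens; the coefficients are placeholders in the spirit of published fits, not the values any particular lab uses, and the model sizes are hypothetical.

```python
# Illustrative sketch of a Chinchilla-style scaling law: loss improves
# predictably as parameters (N) and training tokens (D) are scaled together.
# Coefficients are placeholders for illustration, not any lab's fitted values.

def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Loss = E + A / N^alpha + B / D^beta (power-law form)."""
    return e + a / n_params**alpha + b / n_tokens**beta

# Cranking the model "up another notch" (10x, 100x) gives predictable gains.
for scale in (1, 10, 100):
    n, d = 1e9 * scale, 2e10 * scale   # hypothetical parameter / token counts
    print(f"{scale:>3}x scale -> predicted loss {predicted_loss(n, d):.3f}")
```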
BRENNAN: Kat, you’re immersed in this world. I wonder what really stood out to you, and there’s a lot in this report. But I wonder what really stood out to you, especially given your insights on the intersection of technology and policy.
DUFFY: Well, I first want to say congrats, right, to HIA (sic; HAI). It is, as you all said, an incredibly comprehensive report. We don’t do a lot of launches at CFR of other organizations’ reports but this one—
WALD: And we’re grateful.
DUFFY: This one covers—no, but this one covers so many equities of our membership, right, and so many of the areas that we look at as an organization, that I think it really is a uniquely broad and robust analysis at a time when we need things that give people—even at 500 pages—still, relatively speaking, a snapshot of where we are at this moment.
You know, one of the things that stood out to me the most—I think a lot these days about governance as carrots as opposed to sticks, and what we’re seeing right now in governance is a significant focus on constraint and on controlling and on protecting and on what safeguards look like, and I think that is all fine and good.
But I also wish that I was seeing equally proactive ideation around where government should be leading, where public sector investment should be driving, and I think it’s remarkable to see the uptick in, you know, public R&D investment in AI—I think in 2024 it’s projected at, what, 1.8 billion or 1.9 billion (dollars), which is a significant increase—and yet just Amazon’s investment in Anthropic is $4 billion, right.
Like, that’s pretty remarkable to me—that one company’s investment in one company is more than double the U.S. federal budget for AI R&D. And, admittedly, that doesn’t take into account classified systems, so there may be more—but the full expenditures of the USG, according to this report, were around 3.3 billion (dollars) last year.
So when you look at that differential between the private sector investment that is driving this space and what the public sector is coming in to do, it, frankly, concerns me a great deal, because the governance model for private sector investment is not governance as we think about it in foreign affairs.
It is investor risk. You are managing investor risk. You are not managing societal risk. That is not what corporate governance was designed to do.
Governance—capital G, government—was designed to manage societal risk. But when government is this small a player in building the field, it means that one of the most transformative technologies of our lives and of this century is overwhelmingly answering to investor risk and investor demand rather than societal needs, and inevitably vulnerable communities, marginalized communities, disenfranchised communities, and societal needs will get left by the wayside, because the market won’t benefit from answering those needs.
And so I think a lot about how we are balancing these scales and what types of investments are required, both in the U.S. and globally.
WALD: Can I build on that real quick with one point?
There is a proposal within the U.S. Congress that essentially calls for 2.6 billion (dollars) over six years to provide compute for nonprofit research and academia. It’s 2.6 billion (dollars) over six years—and an even more staggering number is what Microsoft gave to OpenAI, which is $10 billion.
So you just can’t—the comparison here is jaw dropping when you see the difference.
BRENNAN: Yeah. And it’s also a fair—it’s fair to say, like, these aren’t simple deals. You know, it’s not, like, just cash. I mean, it’s compute and all sorts of other things so I want to be careful about the equivalency. But it is—just the sheer numbers are, I think, relatively breathtaking.
I mean, that raises the question is there a way for government to be thinking about this a little bit differently, though—I mean, you talked about sticks and carrots—to be thinking about a more public-private partnership aspect or are there lessons to be learned I think just as importantly as I hear you talk, Kat, from what we’ve seen with social media as well, which is still, you know, sort of sitting in this amorphous place of how it gets regulated, whether it gets regulated, when it gets regulated, who regulates it.
DUFFY: Mmm hmm. Well, and what is the revenue model, right—what is making the money. And I think that’s something where, with social media, we were far too late to the game to recognize that the revenue model was going to consistently be a sort of vicious circle—generating engagement, essentially, to drive advertising, to drive money, and so on and so forth. There was a bit of a, like, lather, rinse, repeat cycle there.
To me one of the things that’s been really striking with social media in the United States at least is—we were talking about this actually a little bit earlier today at the Council—you know, the—when you look at a global digital platform that is coming out of an American cultural space, which is many of these platforms, there is some exportation then of those American values onto societies that don’t necessarily share those values or hold them in the same way.
And so when you look at the way that Europe has handled this, which has been much more privacy and human rights focused in the way that they’ve handled governance and regulation, and then you look at other countries that have been doing straight-up bans, content demands, you know, content takedowns, data demands—and we’ve also seen autocrats using it to scale repression, to scale foreign influence—we can all look back on when this started and be astonished at how it evolved. Like, it’s easy in hindsight now to say, oh, we should have done X or we should have been, you know, clearer on Y.
But given the enormity of what we don’t know about AI and how it will propagate in the world and what the use cases will be, I’m heartened by how much faster the uptick has been in governance and policy conversations on AI. Like, it’s been much faster since ChatGPT came out, basically, than anything we saw—not just in social media but in cybersecurity, like, in all sorts of emerging tech areas.
But there’s just, honestly, so much that we don’t know that it is really hard to figure out which policies right now and which approaches are in fact going to steer it in the way that we want and which in fact may have a blast radius or a collateral damage that we’re not seeing.
So in that respect, being at the precipice of something feels very similar to me, and the AI governance debates are just a lot of platform governance PTSD landing, right? Right? That is just a lot of fights that nobody won and everybody’s mad about, so they’re just going to have them again over AI instead—and that part makes me chuckle a little bit sometimes.
BRENNAN: Yeah.
DUFFY: Oh, this is a bunch of unhealed wounds. (Laughter.)
BRENNAN: I mean, it raises questions about geopolitical implications and everything as well and I want to get into that.
But first, Erik, one of the things that came out of this report is the word nervousness in terms of public opinion, at least here in the U.S.—and I realize that maybe that varies depending on what part of the world you’re in and how you’re interacting with AI—nervousness at a time when perhaps you have workers who fear that they’re going to be basically digitized or automated out of a job.
What is the economic piece of this showing in your data and in your research?
BRYNJOLFSSON: Yeah. Well, this is going to have a much bigger effect than social media on economics and it’s already beginning to see—we’re already beginning to see some productivity effects, and I can completely understand the nervousness because it’s affecting a lot of different kinds of tasks and it’s coming after some of the tasks that knowledge workers have been doing which, you know, historically have been, largely, spared from automation. It was something that affected other parts of the income distribution.
So that’s journalists and others who are going to be writing stories about it. But one thing that we’ve seen strikingly is that in most cases it’s been much more of a complement, augmenting workers rather than replacing them. Those of you—you’ve probably all worked with ChatGPT and played around with it. You know it’s great at a lot of things but you also know you can’t totally trust it. It hallucinates and it’s imperfect. Maybe it’ll help you brainstorm some ideas but you would be unwise to just generate something and walk away from it and take it as given.
But if you’re a lawyer or a doctor or writing ad copy it can help you think through things and what we found—like, for instance, we looked at a call center where they used it to help the call center operators answer questions.
It didn’t try to replace them. It wasn’t a bot that the customers talked to. It was something that coached them, and that kind of use—augmenting people so they do a better job—seems to be the primary use right now, and it has boosted productivity and quality and even wages.
In the call center example we had significantly higher productivity. We also had higher customer satisfaction. But, interestingly, the workers also liked it better. There’s less worker turnover and higher worker satisfaction. So really all three groups benefited. It wasn’t pitting one against the other, and similar things have been found in studies of management consulting, in coding, in advertising, in writing tasks. There’s a number of them in the report.
So I’m—you know, I’m optimistic that this could be something that makes the pie bigger but also creates more widely shared prosperity in a way that some of the previous technologies didn’t.
The last wave of computerization led to a lot of what economists call skill-biased technical change, where there were higher wages for college-educated workers but lower wages for people with less education, and that was part of what exacerbated growing inequality in the United States and other advanced countries over the past twenty or thirty years.
There’s a real possibility that this could be a tool that helps to rebuild the middle class and actually reduces that kind of inequality.
BRENNAN: Can that be the dynamic that remains if you’re working towards AGI and towards generative AI capabilities that are going to be ever more accurate, ever better? I mean, I guess the question—and maybe this is fodder for 2025—is do we stay there, or are we actually headed to something that is even more intelligent and, thus, can actually begin to replace some workers?
I ask this because I have so many conversations with so many CEOs who say, maybe I’m not going to let people go as I implement aspects of generative AI, but it also means I’m not going to have to hire as many.
BRYNJOLFSSON: Yeah. Well, I think it’s a fair question and it’s a really interesting question and concern about where could this ultimately go and how rapidly we get to something like AGI—artificial general intelligence—or AI that it can do almost all the tasks that humans can do.
We’re not there yet. In the report—the report isn’t, you know, doing forecasts or something five or ten years into the future or a hundred years in the future—I don’t know how many years. It’s looking at what we’re seeing now and what we’re seeing right now in terms of the technology that’s being rolled out is very far from AGI.
In fact, if you look at the set of tasks that it can do, almost every occupation has some tasks where AI can help and many other tasks where humans are better, and there’s a natural division of labor there. And the other thing is, you know, I’ve studied the rollout of technologies for most of my career, and you’d be surprised how big a time lag there is between inventing a really cool technology and seeing it widely used—electricity, for example, took about thirty or forty years before it was widely used in factories.
And so I suspect that the rollout of AI is already happening significantly faster, and we see some of that in the data. But even just the capabilities we have right now will take a number of years to roll out.
So for the set of technologies that we’re seeing in the report and that we’re working with right now I think it is very much a complement. I think it’s fair to keep your eye on the ball about how these things evolve and how things—and maybe start having some people think about scenarios where there’s a different set of technologies. But that’s not what we have today.
BRENNAN: Yeah.
And, Russell, I mean, I think there’s so much talk, at least in the media, about—and focus on—chatbots. But just to go back to the slide that Vanessa was presenting, some of the capabilities that are being unlocked when you think about quantum or when you think about—I mean, I was having a conversation with a space investor and a serial entrepreneur last week and he was saying this is going to unlock things that people aren’t even thinking about in terms of space exploration.
I mean, do we even yet know what this does in terms of creating new industries?
WALD: So this is actually one of the most exciting things, but I’m glad you asked that question, because this is what excites me the most about AI is the areas that will unlock that we aren’t even going to be aware of.
So a good example of this is the nuclear fusion breakthrough that happened last year. We sit here and we refer to that as a nuclear fusion breakthrough and it’s amazing. But that breakthrough would not have happened if you didn’t have AI and ML techniques applied.
So we’re going to start hearing over time about all of these scientific breakthroughs, or new knowledge, that will come from this, and in last year’s AI Index report we noted the matrix multiplication result. That’s a fifty-year-old math problem that’s being improved on just like that, right, and we can’t necessarily explain how it’s being solved, so we’re still learning from this. This general-purpose technology is going to affect every domain, and that of course includes scientific discovery. So you’re going to see an explosion in these other areas that will come from that.
And, furthermore, we spend a lot of time talking about the language model, and I always say—I was previously the policy director at HAI and now I’m the deputy director, so I spent a lot of time with policymakers—that 2023 for me was not the year that the public became more aware of AI. It was more the year that the policymaker became captivated by product AI.
And the reality is, there are all of these fundamentally different areas of AI that are going to change how society operates—my colleague Fei-Fei Li refers to it as a civilization-changing technology—and because of that we need to be less myopic in looking at just the language model and understand all of these other areas that are going to be affected.
BRENNAN: Kat, I see you nodding your head, and this kind of brings me back to the geopolitical elements of this, because, certainly, in Silicon Valley one of the heated debates we’ve seen—it’s, arguably, been a little bit more bifurcated in the coverage of it than perhaps in reality—has been this debate about accelerationists versus doomers—I think maybe the terms have even changed since a couple of months ago—but this idea that if you regulate or put guardrails on or approach AI as responsible AI, then you’re potentially holding back the innovations and the capabilities and the possibilities of AI at a time when, at least from a U.S. and American standpoint, adversaries on the world stage, most notably China, are also working on their own capabilities, because this is—what was the term you just used, civilization—
WALD: Civilization-changing technology.
BRENNAN: Yes.
DUFFY: Yes. I think it’s impossible to think about how we would try to govern or regulate the innovation and evolution of AI as a technology in the U.S. without acknowledging the at least strongly presumed and articulated national security interest in maintaining primacy globally in AI innovation and in AI models.
I wonder sometimes—and I think this report is helpful with this—it can be difficult to know which elements of that truly represent a national security threat and which elements are maybe overhyped or overstated. It can be easy to just say, like, China, China, China over and over again.
And so sometimes I get concerned that we need to be breaking down that analysis a little more critically and a little more carefully, because the components of that analysis are built into an understanding within the IC—like, within the intelligence community and within sort of classified spaces that the public doesn’t have access to—and so the public doesn’t have the same awareness of why some of those choices are getting made.
What I think is a really, you know, significant shift, though, is that in part because of the digital Belt and Road Initiative and because of, I would say, a vacuum of American leadership globally over more than a decade, China, you know, ate our lunch on connectivity, right.
I mean, Africa is wired with Huawei. And so, you know, China came out to country after country after country in what many people call the Global South—I call it the global majority—but China did a fantastic job of going out to low and middle income countries and supporting things like the Chinese development model, supporting things like increasing connectivity at a time when the United States was not doing that.
And China has been on a very, very full-on and strategic offensive around the political discourse as well in digital governance and has really tied concepts that come from China to broader concepts of digital sovereignty, of countries having their own rights and their own agency, and of the SDGs as opposed to civil and political rights, which is very much where the United States, I think, has historically lived in its geopolitical narrative on some of these issues.
And now you’re seeing it. You know, you’ve seen it play out in U.N. governance. You’ve seen it play out in international bodies—I’m not sure that the United States and the Western democracies can get the vote count that they need in multilateral settings in the way that they might have been able to, you know, sort of pre-digital Belt and Road Initiative.
And so I think there are real questions here, and you see this in the U.N. resolution on AI that the U.S. government just led—you see in the language of that resolution a significantly greater focus on equity, on the SDGs, on those concerns that have been stated by representatives of global majority countries in the U.N. High-Level Advisory Council and that have been co-opted to some degree by China over the years.
So I’ve actually been very heartened to see the Biden administration becoming more aggressive about creating a vision-forward narrative for the way that we think about AI governance as opposed to have it landing purely in markets or purely in the fact that we need to regulate it.
So that’s an interesting shift that I’m seeing us make now but we have a lot of work to do to rebuild trust, I think, and to rebuild some of our influence.
WALD: Can I make one comment on the China-U.S. paradigm here?
I often get this question on who’s winning and who’s losing in this space, and this report was actually really interesting and I encourage people just to look at the highlights of this, and in some cases you could see that China is ahead on patents and then you see the U.S. way ahead on the model releases and the types of models, and if I were to make a judgment based on the report I would say the U.S. is extraordinarily ahead of China on some of these things and what we’re seeing particularly on the foundation model front.
However, one important caveat to that is I think the question should no longer be who’s winning and who’s losing. Its who’s maintaining any kind of lead and the reason for that is, is in this report we note that, roughly, 67 percent of foundation model releases were open source.
So it’s anybody can quickly start to catch up if they start to put the resources behind this and go to that level. So it’s really important that we understand that this is never going to be who’s winning, who’s losing consistently. It’s going to be do you have the means to be able to access an open source community and start to catch up and get caught up in that space.
DUFFY: And can I just add to that? I’m so sorry. One quick thing.
Part of what we can leverage in that regard is not only our private investment model but also all of the soft power that comes with being the country that people still want to come to to build and to learn and to study.
We are incomparably a stronger draw for most countries and most talent around the world than China in terms of recruitment, and if we can figure out how to recruit and retain and keep that talent in the United States and really think about the way that we leverage our appeal in that regard I think that’s an element of our soft power that we should—we could be leaning into harder than we currently are.
BRENNAN: So rather than just framing it as purely an AI arms race for better or worse this is—it’s much more nuanced, it’s much more complicated, and it’s an opportunity particularly when we’re talking about the U.S. with the example you just gave us, Kat, for—potentially for coalition building.
I want to go back to the open versus closed source, though, for a second, because what was interesting to me in this report, Erik, was the fact that, yes, the majority is open source, but if I was reading everything correctly it seems like closed-source models are actually doing better in terms of accuracy and capability, at least at this point, and I wonder if we see that start to shift.
BRYNJOLFSSON: There does seem to be some evidence that the gap is closing a bit and, by the way, I think it’s interesting to sort of nuance even the word open source.
BRENNAN: OK.
BRYNJOLFSSON: So when a model is trained you can release the weights—there are these billions or hundreds of billions of weights, the parameters—and they make those available. But it’s very hard to interpret what they all mean. So you can work with them, but you don’t really know what data they were trained on.
So there are layers of openness. A truly open model would also reveal all the data—all the information that went into it and so forth—and it’s rare to see that. So most of them have the weights open so you can do some limited building on them. But you can make a case that it would be nice to have them even more open, and researchers could work with them in a richer way.
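To illustrate that distinction, here is a minimal sketch, assuming the Hugging Face transformers library and using one openly released model as an example: the weights can be downloaded and run, but the training data behind them generally is not published, so inspection stops at the parameters.

```python
# Minimal sketch: working with openly released weights.
# The parameters can be loaded and run locally, but the training data behind
# them is typically undisclosed, which limits deeper inspection.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # one example of an open-weights release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open weights let researchers", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```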
But it is very exciting just to see the dynamism of a lot of different people, now Mistral in France and other models that are coming and challenging some of the leaders. At the same time, of course, last night I had Mira Murati come to my class. She’s the chief technology officer of OpenAI.
And I tried to prod her. She wouldn’t reveal. But, you know, they’re obviously working on GPT-5 and will be releasing that later and she seemed to think it was going to be surprisingly good to everybody. I guess we’ll all kind of—the rest of us will find out.
So it’s a constant leapfrogging of each other that’s happening there. But on the open and closed question, the other thing that she said is that they have pivoted to publishing less and working internally more, and so has Google. You see it—and it shows up a little bit in the publication numbers, which are still going up but not as fast as they did before—because they feel like they want to hang on to some of their competitive advantages.
So there is a real tension between the traditional openness of science—which, really, has been the strength behind why AI, and AI in America in particular, has advanced so rapidly, with people learning from each other—and these incentives to try to keep things private and profit from them, or not have competitors know about them, and it’s something that those of us in academia in particular are frustrated by.
BRENNAN: That feels like a good place to open this up. So I want to invite members to join the conversation with their questions. This is a reminder that this meeting is on the record and so we’re going to start with a question here in the room.
Yes, sir?
Q: I’m Andrew McLaughlin. I’m an AI, like, builder and investor.
So are you impressed with the argument that foundation models are basically just going to turn into one component of a modularized stack and that the stuff we should care about is all the bits that will be, you know, grafted on to foundation models?
In other words, the foundation model does a great job of, like, interpreting and composing language, but things like retrieval-augmentation infrastructures or, you know, knowledge-graph-type infrastructures are really where the action is going to be, and so maybe we should not stress so much about building significantly greater numbers of significantly more expensive foundation models but instead rely on them and put our focus on what lies above?
BRYNJOLFSSON: Well, I agree with the direction you’re describing, but I wouldn’t say “just” or be dismissive. I can see that there’s a lot of value to be added by retrieval-augmented generation or agentic systems or other tool-using systems that will create all sorts of new capabilities that foundation models or large language models don’t have.
You know, when you want to do an accurate database lookup or do a calculation or do a simulation, an LLM is just not very good at that, but these other tools can help a lot. Or take action in the world—you know, make an airline reservation, you know, cook something, or do all these other sort of things.
That said, the core LLM could also be very valuable depending on what the application is. It may be in some cases you don’t need a state of the art system. Maybe if you have a video game or something you need a character that’s sort of plausible but doesn’t have to be a genius.
But for other purposes—you know, Russell’s talking about some of the scientific discovery—you really do want something that is as intelligent and as close to the frontier as you can get. So I think what we’re going to see is a diversity of different kinds of applications and just an explosion of companies, you know, more than ever, that are being started in this space and going after lots of these different applications—some of them leveraging some of the tools and extensions that you just mentioned, others working on the core capabilities.
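A minimal sketch of the division of labor described here, with the language model handling language while retrieval and an external tool handle the exact lookup and calculation it is weak at. The helpers below are hypothetical placeholders, not any particular product’s API:

```python
# Minimal sketch of retrieval-augmented generation plus tool use.
# `search_documents`, `run_calculation`, and `call_llm` are hypothetical
# stand-ins for a vector store, a calculator tool, and an LLM API call.

def search_documents(query: str) -> list[str]:
    """Pretend retrieval step: return passages relevant to the query."""
    return ["Passage about airline change fees...", "Passage about refund rules..."]

def run_calculation(expression: str) -> float:
    """Exact arithmetic handled outside the model."""
    return eval(expression, {"__builtins__": {}})  # illustrative only

def call_llm(prompt: str) -> str:
    """Stand-in for a call to any language model."""
    return f"[model answer grounded in: {prompt[:60]}...]"

def answer(question: str) -> str:
    context = "\n".join(search_documents(question))   # retrieval
    fee_total = run_calculation("199.00 + 35.50")     # tool call
    prompt = f"Context:\n{context}\nFee total: {fee_total}\nQuestion: {question}"
    return call_llm(prompt)                           # generation

print(answer("What will changing my reservation cost?"))
```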
WALD: Can I add one quick point to this?
In 2021 we came up with the concept of foundation models, and we received a lot of heat from the community about it. One accusation was that we were renaming something—that large language models was the term and we should adhere to it.
But the reality is, what’s interesting about this report is that it’s very reflective of the multimodality of these models and of what’s going to happen, I think—this broader spectrum that you’re going to start seeing over time—and I think that concept was correct in not looking at this from just a language-model perspective but widening out the modality aspects.
DUFFY: And if I can add on to that.
I also think so much about where accountability then sits in that stack as well. Like, what does true responsible AI look like? When you think of the stacks of, sort of, licensing models and use cases and combinations that sit on top of any foundation model, and then you have, you know, the company with the foundation model saying, well, we do this type of risk mitigation or that type of risk mitigation—
But that type of risk mitigation combined with this type of use may be, you know, wholly ineffective. I mean, we can look at—I’ve raised this—
BRYNJOLFSSON: Yeah. The layers will interact with each other in weird ways.
DUFFY: Gemini produced racially-inclusive Nazis, right, because inclusion. Like, that wasn’t what they were going for but that’s what that particular trust and safety mechanism allowed until they went back and course corrected.
And so I think about this a lot in terms of U.S. government procurement in, like, DOD systems and, like, where do you peg accountability and all I’ve got, honestly, is that if America is going to be better at innovating anything—like, we’re good at innovating AI. We’re going to be even better at innovating litigation around AI. (Laughter.)
So, like, I’m confident that we’ll be best in class on lawsuits, too. (Laughter.)
BRENNAN: Yes, sir?
Q: Hi. Munish Walther-Puri. I’m with Exiger.
I’m excited to run this report through ChatGPT and some of the others and see what they spit out. Good discussion. Great discussion so far.
I don’t often mention this but it’s very relevant here. I used to be the director of cyber risk for the city of New York and thought a lot about vulnerabilities and cyber exposure to marginalized communities.
So your point about us managing societal risk, kind of mistaking it for investor risk, is a very interesting one. A lot of lessons from cybersecurity. So here are my two questions related to both of these.
The first is, if that is true and if we can’t change that immediately, then should we be managing AI risk through investors? Like, should the SEC be the tip of the spear on regulation here? They know how to do that. I’m being a little bit facetious, but not really. It’s a genuine question.
Second, and related: I think a lot about the supply chain of generative AI—where the choke points are, where the vulnerabilities are, the fragility—and we have a similar problem coming. Everyone’s paying attention to hardware, compute, you know, semiconductors, but less to things like open source, open-source packages.
And people know that TensorFlow and, you know, some of the others are developed by these large companies but then there’s others that everyone’s really dependent on that are maintained by a community.
How do we think about the risk exposure from there, and minimizing it, as we go forward and think about the frontier models and the explosive growth that we’re about to experience?
Thank you.
DUFFY: I mean, I guess I would—I’ll maybe speak to the first point, and I don’t want to dominate the question.
I think—you know, we did a panel here a couple of weeks ago with the current head of Responsible AI from New York City as well as the former chief administrative officer, I think, who basically was the one who took all of the administrative hearings online during COVID. So we’re really talking about the different ways of doing technological transformation in New York City.
And so to me, when we think about this, I actually would love to see deeper and broader engagement not just with the SEC but also looking at CFIUS, right, and getting better clarity on what CFIUS is actually analyzing and what it’s not, and what foreign direct investment looks like in this space, because so much of this money is coming in from private capital and there’s a lot of opacity around exactly what that capital is and what the intentions are behind it.
So, personally, I feel like we would benefit from significantly less opacity right now in private funding and more up-to-date rules around disclosures for something like the SEC. But I also think there’s really interesting work that can be done with government—and you see it to some degree, I think, with the AI EO—around building markets as well that would reward some constraint.
So if New York City, right, which has its own market and economic power, comes out with its own standards for what it is going to require under its own procurement, that’s actually a pretty powerful force for shaping industry and shaping different builds and different applications.
It won’t be the end-all be-all, but you combine that with the federal government, with, you know, California, and you start to incentivize markets to operate a little bit differently, in a way that rewards some of that investor risk while balancing the safeguards.
So I think there’s some interesting work that can be done there. I also think there’s an enormous amount of really fascinating work that can be done on private-public partnerships to this question of inequity.
So New York City, for example, has 311 calls. 311 calls are something that AI is probably pretty good at automating, right—significant elements of that as well, and my understanding is that New York has a legal obligation to serve up 311 calls or provide 311 services in 11 languages and one of those languages is Haitian Creole.
Well, who do you think has more economic capacity to truly support good testing and good fine-tuning for Haitian Creole—Haiti or New York City? And so what does it look like when we think about economic opportunity—and, you know, God help Haiti right now—but when we think about what a public-private partnership might look like where New York City is working with one of the leading companies on fine-tuning and making Haitian Creole in 311 an excellent product and an excellent tool? And then we think about foreign assistance, and how the State Department, right, or USAID could then take those investments and translate them into, you know, a gift, a grant, aid, foreign direct investment that would then incentivize the use of AI in that country with a minority language. That’s where I’m hoping we’re going to be going—like, more creative, more thoughtful ways of thinking about how to maximize different investments and manage those inequities in that same way.
I feel the same way about community-based red teaming. I think there’s enormous opportunity for doing red teaming with marginalized communities that city governments can power in really fascinating ways.
And so I think it’s more about really trying to get creative in how we think about solving it as opposed to just looking at the markets to do it. That’s—sorry, that was a long answer.
BRENNAN: And we’re going to take a virtual question right now.
OPERATOR: We’ll take our next question from Steve Hellman.
Mr. Hellman, please accept the unmute now prompt.
Q: Thanks for taking the question.
So just quickly, I’m sort of an energy guy and one of the things that is discussed a lot about AI is just the sheer quantity of energy consumption, and if you look at the projections it’s—you know, the energy requirements of artificial intelligence are basically going to consume the entire planet or something ridiculous like that.
My question really isn’t about that. It’s more is there sufficient innovation happening in the sort of economy of energy consumption around AI such that we can sort of disregard those projections and assume that as AI progresses so too will the energy efficiency of the software?
BRYNJOLFSSON: I’ve got good news and bad news. (Laughter.) The amount of computation going on in data centers has grown enormously—I don’t know the figure off the top of my head, but many, many orders of magnitude over the past six, seven, eight years as data centers have done more and more work—and that by itself would have led to an explosion of energy consumption.
But the amount of energy used by data centers has been almost flat, at about 2 (percent) to 3 percent of electricity use, because at the same time, as you suggested, the chips have been getting much more efficient. There’s been a big improvement. John Koomey characterized this trend—we call it Koomey’s law—an improvement in energy efficiency that’s going as fast as or faster than Moore’s law.
That’s the good news. The bad news is that these very large models—because of the scaling laws I described earlier and the strong incentive to make them bigger and bigger—are on track to grow much faster than what we’ve seen in the past, with a lot more data being analyzed by a lot more compute, and it’s unlikely that Koomey’s law or any other technological improvement can negate that the way it has in the past.
So we almost surely will see an increase in electricity usage for AI training and AI inference going forward. I mean, it’s, you know, a low single-digit percentage now, so it’s going to get higher. I don’t know how much higher. Hopefully, it won’t be the entire planet.
But it could easily get significantly higher in the coming years and already some of these models are large enough where they’re looking to train them and they’re doing them in a distributed way, which is harder. It requires more computer science to figure out how to train a model in multiple locations and bring it together. But they have to do that because of the energy requirements. It’s getting to be hard to find a single place where they can run these models at full throttle for the training.
So this is something that is becoming an increasing challenge and something that we need to pay more attention to and, hopefully, there will be some more innovation on the energy side. But the plans are for models that are so large that it’s going to be an issue.
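A back-of-the-envelope sketch of that dynamic: relative energy use is roughly compute demand divided by energy efficiency, so it stays flat while the two grow at similar rates and climbs once demand pulls ahead. The growth rates below are illustrative assumptions, not measured figures.

```python
# Back-of-the-envelope sketch: data-center energy use stays roughly flat only
# while efficiency gains keep pace with compute demand. Growth rates here are
# illustrative assumptions, not measured values.

def relative_energy(years: int, compute_growth: float, efficiency_growth: float) -> float:
    """Relative energy use = compute demand / operations per unit of energy."""
    return (compute_growth ** years) / (efficiency_growth ** years)

# If demand and efficiency both grow ~40% a year, energy use barely moves;
# if demand doubles yearly while efficiency grows 40%, it climbs steeply.
print(f"{relative_energy(6, compute_growth=1.4, efficiency_growth=1.4):.1f}x")  # ~1.0x
print(f"{relative_energy(6, compute_growth=2.0, efficiency_growth=1.4):.1f}x")  # ~8.5x
```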
DUFFY: And, similarly, water consumption, too.
BRYNJOLFSSON: Yeah, water as well.
BRENNAN: We have time for maybe one more question here.
Q: I’m David Schatsky. I’m with Deloitte.
I’m interested in the question of the national competition around AI and I wonder if the panelists have a view about what the real benefits are of national leadership in this technology and what the consequences are of being number two or number three and, therefore, the policy implications of trying to secure the number-one position.
WALD: When you say national or international, like, country—
Q: International, yeah.
BRENNAN: As opposed to, like, federal versus state. Yeah, I have the same question.
Q: Yeah. Thank you.
WALD: I kind of alluded to that earlier, about the issue of who’s ahead and who’s not ahead and where we’re at in terms of competition, and the hard part I see on that, again, is—as Erik noted, there are open weights and there are open models, and it’s a gradient in what we would consider those to be.
I think the opportunity exists if the country is willing to put the resources there, but it’s really hard to say who’s in the lead and who’s—who can stay consistently in that space.
BRYNJOLFSSON: Yeah. I mean—yeah, I think it does to some extent. I think right now, though, if I were to look at it, the U.S. is significantly ahead—not a little bit but significantly—and the question then is what your innovative base is, I think, in that case. And if I look at Europe right now, they’re going to have a lot of challenges, because they don’t have the same level of an innovative base and they’re trying to adjust accordingly in a new regulatory regime.
DUFFY: I’d also—I’d like to unbox a little bit what we think of as global leadership and being the global leader because it’s one thing if global leadership is having the most foundation models, right, and producing the biggest models or, you know, producing the most cutting-edge systems.
But to Andrew’s point, a lot of what I think the practical reality in the world is going to be is taking these existing models and then building stacks on top of them that allow multiple low income and medium income countries to seize this technology in order to move themselves forward and there’s this huge thirst among, you know, global majority countries to not be left out of this transformation and to be able to harness this technology.
And so when I think about it—again, I go back to, like, connectivity, right, and the way that China addressed that—I think about what happens if you have a Chinese company that goes into, let’s say, Zambia—a Chinese company which already did, I think, all of their facial recognition technology—and it’s the Chinese company that then has a contract to help Zambia build its AI systems for X or Y or Z, whatever that might be.
We have a pretty strong belief that whatever data that Chinese company ingests is going to go straight back to the Chinese government for its own use, and so if bigger is better and they are able to capture the market because they offer the cheaper product that does what somebody wants in that country then they are going to get more and more and more of the data globally and in that respect I can see them truly outpacing us in terms of the diversity of information that they have coming in, the diversity of information that they’re learning off of, and their influence in all of those countries as well.
If we think about putting our own products out there do we need to be subsidizing American companies to make them competitive in those spaces because China is subsidizing its companies? Like, how are we thinking about this in terms of bilateral investment? How are we thinking about it in terms of foreign assistance?
And so this is where I get a little hesitant about the binary in which global leadership means maintaining primacy over foundation models, as opposed to thinking more holistically about what true leadership would look like in this moment and how we convince a lot more countries around the world to go with our companies, to go with our models, to work with our principles, and to not give China that huge edge over a ten- to fifteen-year span.
So that’s another way that I think about this, which is a little bit different, I think, than how you were—what you were asking about.
BRENNAN: Erik, do you have a final thought before I wrap this up?
BRYNJOLFSSON: Well, just very briefly.
I think one thing—I agree with what Kat was saying—one thing I would just highlight is these are dual-use multi-use technologies and so there’s an economic aspect to it and, you know, in economics often it’s great if your competitors become more productive and successful. Actually, it can be beneficial worldwide.
But on the military side, you know, then it’s much more of a zero sum game and there is a real concern, I think, that as seriously as the United States is taking the AI threat and opportunity it’s not taking it seriously enough.
The speaker a week ago at my class was Eric Schmidt. I don’t know if he’s come and spoken here at the Council on Foreign Relations. But he’s pulling his hair out trying to get the national security establishment to take AI even more seriously because it is something that can be a real game changer in military competition.
In these technologies it’s very hard to sort of isolate them one versus the other. Many of these technologies can be used across a whole set of different applications. So that would be something that you would have to put into the mix.
BRENNAN: And we are already starting to see some of those AI applications in real-time on battlefields in Ukraine and in the Middle East right now.
BRYNJOLFSSON: Exactly. Yeah. And Eric in particular is very involved on the Ukraine drone battlefield AI application.
BRENNAN: On that cheery note we’re going to wrap it up right there.
Erik Brynjolfsson, Kat Duffy, and Russell Wald, thanks for joining me here on stage, and congratulations on your report. (Applause.)
DUFFY: Thank you, Morgan.
BRYNJOLFSSON: Thank you.
WALD: Thank you.
BRYNJOLFSSON: Thank you so much. (Applause.)
(END)