Stochastic Parrots 🦜 and Shaky Foundations

Meg Mitchell Speaks at Stanford HAI

“When I was first invited, my gut reaction was to decline,” demurred computer scientist Dr. Margaret (Meg) Mitchell, who recently joined startup Hugging Face to continue her work promoting algorithmic fairness.  

Workshop chatrooms blew up. In one chat, many Stanford students speculated on what would happen next. Most of the attendees at the August 23-24, 2021 Stanford HAI Workshop on Foundation Models, organized by Professor Percy Liang and Rishi Bommasani, understood Dr. Mitchell’s reasoning. She and her diverse community of researchers have led the conversation on AI ethics for years. However, with the announcement of the new Center for Research on Foundation Models (CRFM), a 200-page white paper, and workshop, Stanford entered that conversation anew. Moreover, these latest Stanford initiatives tackled some of the very same work for which Google had fired both Dr. Mitchell and her former co-head of Google’s Ethics AI group, Dr. Timnit Gebru. Many observers have wondered, was Stanford trying to build productively on the work of Mitchell, Gebru and others, or was it trying to catch and neutralize it, as Google had attempted?

Google had retaliated first against Dr. Gebru and then Dr. Mitchell for raising their critical voices, but it ultimately failed to suppress their now highly influential paper, co-written with Professor Emily M. Bender and Angelina McMillan-Major, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” This work interrogates large language models, or LLMs (e.g., BERT, GPT-3, CLIP, Codex), questioning their size and creation as well as their environmental, financial, and societal costs. Trained on an enormous amount of webcrawl data, LLMs produce remarkably human-sounding language. Yet they also frequently fail, remain poorly understood, and propagate biases. Despite all these problems, LLMs are widely deployed in AI development. “Stochastic Parrots 🦜” suggests alternative research directions and recommends protocols for mitigating bias in datasets.

The CRFM white paper drew on “Stochastic Parrots 🦜” and the Center invited Dr. Mitchell, Dr. Gebru, Professor Bender, and many other critics to present their views on LLMs at the workshop. All but Mitchell declined. Her apprehension about attending arises from two main concerns she shares with her community:

  1. The need for a diverse engineering community to address these models’ social harms.

  2. The ethical and intellectual costs of Stanford+tech collaboration, given CRFM’s effort to gain access to industry compute power to study the models.

To some of my students, the price of collaboration seemed high. Google’s firing of Gebru and Mitchell sent a chill through the student engineering community. When Stanford provided little support for its own graduate, Dr. Gebru, some students worried that their university remained more strongly committed to tech than to them. Was Stanford’s silence on Timnit Gebru the price of access to industry compute power?

Other students felt more optimistic about a Stanford HAI+tech collaboration and saw the workshop as honoring the AI ethics community and inviting its members, if belatedly and unsuccessfully, into a conversation. One student made this meme to champion the recommendations of “Stochastic Parrots 🦜” and argued that Stanford had learned a lot from the paper:

With gentle candor, Dr. Mitchell explained her reticence about participating in the workshop. She asserted that her venerable host embodied many of the problems of the tech industry. To illustrate her point, Dr. Mitchell scrolled through the Stanford HAI collaborator roster, remarking on the homogeneous faculty character of the institute at its founding, despite having the formidable Professor Fei-Fei Li as a co-director. Stanford HAI’s much discussed lack of diversity continues to vex observers. Mitchell reflected:

“Obviously [these exclusions were] not intentional, but that’s exactly right. By not being intentional about our implicit biases, we implicitly propagate the message that some people are more welcome than others.”

Stanford HAI explicitly declares inclusion as part of its mission, but despite diversity efforts, the institute has struggled to attract faculty from the African diaspora and other historically marginalized groups. Changing tech and making it as truly “human-centered” as Stanford HAI advertises requires diverse contributors at all levels. Without this diversity, HAI and its new CRFM center remain ill-prepared to tackle the pressing issues raised in the “Stochastic Parrots 🦜” paper or to contribute to the years of work others have already committed.

Dr. Mitchell’s second concern addressed the new CRFM Center’s mission and renaming of LLMs. For example, the Center landing page ambitiously proclaims its mission:


According to the CRFM white paper, a name change proved necessary because “language” is only one kind of data these models learn from. Several contributors have described their lively debate searching for a more descriptive name. However, critical observers like Professor Meredith Whittaker and Professor Bender questioned whether the renaming was a “rebrand” catering to industry that elided the important criticisms of the models.

Even those who agreed that the previous name, “language models,” failed to describe the models’ full potential still worried that the words “foundation” and “groundwork” implied something conceptually and empirically solid. Professor Liang used the metaphor of a house foundation, but he also stressed in the white paper, workshop, and interviews that these models create a “single point of failure” for downstream applications. Deb Raji offered “large base models,” a designation that avoids the epistemological and ontological overreach of “foundation.” For Dr. Mitchell, these models serve best as “support structures,” not foundations. They are not the future of AI. People and personnel are. A truly “human-centered AI” would make people, not unstable algorithms, the foundation for building further applications, she affirms.

From Stanford’s perspective, meanwhile, Stanford HAI has indeed engaged in a human-centered endeavor to address such issues. Well aware of the need for diversity at HAI and on the Stanford campus in general, HAI made a great addition with Professor Michele Elam, Faculty Associate Director of the institute, who also serves as a Race & Technology Affiliate at the Center for Comparative Studies in Race and Ethnicity. She has implemented important diversity initiatives and enjoys cross-campus support. Professor James Landay reports on some recent progress at HAI and in Stanford Computer Science with incoming faculty and more funding for diverse candidates.

One key sign of this progress is that the CRFM white paper and conference proved much more inclusive. Dr. Mitchell remarked on this improvement and assured the audience that she ultimately accepted the invitation because she greatly respected many Stanford HAI researchers.

Among Stanford HAI’s many initiatives, the fellowships offer an excellent opportunity. But HAI could do more to advance the careers of young diverse leaders. How about prizes for great new work at all levels? Our excellent undergraduate and grad projects could use recognition. What about visiting teaching positions? Stanford students will flock to courses (even mini ones) taught by visitors from the AI ethics community, especially Dr. Gebru and Dr. Mitchell. Stanford HAI could collaborate with deans and departments to secure the funding. More AI ethics community members would be willing to visit if they were housed in schools, departments, or programs like Education, AAAS, or FemGen. There are already courses on the books in many departments, which visitors could rework to include their own research.

Even as Stanford moves forward with greater inclusion efforts, many questions remain about the Stanford + tech collaboration. HAI aims to guide industry to a more ethical use of these models. Will industry listen or care? In the end, the workshop organizers, participants, and Dr. Mitchell agreed to consider large models with caution and to continue the discussion. Professor Rob Reich said that he “learned a lot”:

Such a coda resonates with the report’s estimation: widespread adoption of foundation models promotes the homogenization of outcomes and centralization of power. Yes, and Stanford HAI wants to deploy that power for good. Whatever success Stanford HAI’s new CRFM enjoys in securing industry cooperation, profit-making and academic research pursue, at least in theory, different goals. Hopefully, Stanford HAI remains focused on its research commitments and protects the academic freedom of its collaborators.

Meanwhile, now is the time to reread “Stochastic Parrots 🦜” to develop norms for model auditing and build alternative tools. When models affect humans at a global scale, we need greater ability to contest their outcomes, gain consent for data usage, and govern data. We need better models, but we also need more diverse scholars to enrich the research enterprise with their perspectives.