
119. Jaime Sevilla - Projecting AI progress from compute trends
Towards Data Science
04/13/22
•48m
There’s an idea in machine learning that most of the progress we see in AI doesn’t come from new algorithms or model architectures. Instead, some argue, progress comes almost entirely from scaling up compute power, datasets and model sizes; besides those three ingredients, nothing else really matters.
Through that lens, the history of AI becomes the history of processing power and compute budgets. And if that turns out to be true, then we might be able to do a decent job of predicting AI progress by studying trends in compute power and their impact on AI development.
And that’s why I wanted to talk to Jaime Sevilla, an independent researcher and AI forecaster, and affiliate researcher at Cambridge University’s Centre for the Study of Existential Risk, where he works on technological forecasting and understanding trends in AI in particular. His work’s been cited in a lot of cool places, including Our World In Data, who used his team’s data to put together an exposé on trends in compute. Jaime joined me to talk about compute trends and AI forecasting on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:00 Trends in compute
- 4:30 Transformative AI
- 13:00 Industrial applications
- 19:00 GPT-3 and scaling
- 25:00 The two papers
- 33:00 Biological anchors
- 39:00 Timing of projects
- 43:00 The trade-off
- 47:45 Wrap-up
Previous Episode

118. Angela Fan - Generating Wikipedia articles with AI
April 6, 2022
•51m
Generating well-referenced and accurate Wikipedia articles has always been an important problem: Wikipedia has essentially become the Internet's encyclopedia of record, and hundreds of millions of people use it to understand the world.
But over the last decade Wikipedia has also become a critical source of training data for data-hungry text generation models. As a result, any shortcomings in Wikipedia’s content are at risk of being amplified by the text generation tools of the future. If one type of topic or person is chronically under-represented in Wikipedia’s corpus, we can expect generative text models to mirror — or even amplify — that under-representation in their outputs.
Through that lens, the project of Wikipedia article generation is about much more than it seems — it’s quite literally about setting the scene for the language generation systems of the future, and empowering humans to guide those systems in more robust ways.
That’s why I wanted to talk to Meta AI researcher Angela Fan, whose latest project is focused on generating reliable, accurate, and structured Wikipedia articles. She joined me to talk about her work, the implications of high-quality long-form text generation, and the future of human/AI collaboration on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 1:45 Journey into Meta AI
- 5:45 Transition to Wikipedia
- 11:30 How articles are generated
- 18:00 Quality of text
- 21:30 Accuracy metrics
- 25:30 Risk of hallucinated facts
- 30:45 Keeping up with changes
- 36:15 UI/UX problems
- 45:00 Technical cause of gender imbalance
- 51:00 Wrap-up
Next Episode
AI scaling has really taken off. Ever since GPT-3 came out, it’s become clear that one of the things we’ll need to do to move beyond narrow AI towards more generally intelligent systems is massively scale up the size of our models, the amount of processing power they consume and the amount of data they’re trained on, all at the same time.
That’s led to a huge wave of highly scaled models that are incredibly expensive to train, largely because of their enormous compute budgets. But what if there were a more flexible way to scale AI, one that allowed us to decouple model size from compute budgets, so that we could chart a more compute-efficient course to scale?
That’s the promise of so-called mixture of experts models, or MoEs. Unlike more traditional transformers, MoEs don’t update all of their parameters on every training pass. Instead, they route inputs intelligently to sub-models called experts, which can each specialize in different tasks. On a given training pass, only those experts have their parameters updated. The result is a sparse model, a more compute-efficient training process, and a new potential path to scale.
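To make the routing idea concrete, here is a minimal, illustrative sketch of top-1 expert routing in plain Python/NumPy. It is not the guests’ actual Switch Transformer code; the names (n_experts, d_model, router_w) and the toy linear “experts” are hypothetical, and real MoE layers add load-balancing losses, expert capacity limits and distributed dispatch on top of this.

```python
# Toy sketch of top-1 ("switch"-style) mixture-of-experts routing.
# Assumptions: each expert is just an independent linear map, and the
# router is a single linear layer scoring every expert per token.
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, n_tokens = 4, 8, 16

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))

tokens = rng.normal(size=(n_tokens, d_model))

# Router scores -> softmax -> each token picks its single best expert.
logits = tokens @ router_w
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
chosen = probs.argmax(axis=-1)

outputs = np.zeros_like(tokens)
for e in range(n_experts):
    mask = chosen == e
    if not mask.any():
        continue  # this expert received no tokens on this pass
    # Only the chosen expert processes these tokens; during training, only
    # its parameters (plus the router's) would receive gradients for them.
    # Scaling by the router probability keeps the routing differentiable.
    outputs[mask] = (tokens[mask] @ experts[e]) * probs[mask, e][:, None]
```

The point of the sketch is that each token only touches one expert’s weights on a given pass, which is what lets total parameter count grow without the per-token compute growing with it.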
Google has been pushing the frontier of research on MoEs, and my two guests today in particular have been involved in pioneering work on that strategy (among many others!). Liam Fedus and Barrett Zoph are research scientists at Google Brain, and they joined me to talk about AI scaling, sparsity and the present and future of MoE models on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:15 Guests’ backgrounds
- 8:00 Understanding specialization
- 13:45 Speculations for the future
- 21:45 Switch transformer versus dense net
- 27:30 More interpretable models
- 33:30 Assumptions and biology
- 39:15 Wrap-up