
118. Angela Fan - Generating Wikipedia articles with AI
Towards Data Science
04/06/22
•51m
Generating well-referenced and accurate Wikipedia articles has always been an important problem: Wikipedia has essentially become the Internet's encyclopedia of record, and hundreds of millions of people use it to understand the world.
But over the last decade Wikipedia has also become a critical source of training data for data-hungry text generation models. As a result, any shortcomings in Wikipedia’s content are at risk of being amplified by the text generation tools of the future. If one type of topic or person is chronically under-represented in Wikipedia’s corpus, we can expect generative text models to mirror — or even amplify — that under-representation in their outputs.
Through that lens, the project of Wikipedia article generation is about much more than it seems — it’s quite literally about setting the scene for the language generation systems of the future, and empowering humans to guide those systems in more robust ways.
That’s why I wanted to talk to Meta AI researcher Angela Fan, whose latest project is focused on generating reliable, accurate, and structured Wikipedia articles. She joined me to talk about her work, the implications of high-quality long-form text generation, and the future of human/AI collaboration on this episode of the TDS podcast.
---
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
---
Chapters:
- 1:45 Journey into Meta AI
- 5:45 Transition to Wikipedia
- 11:30 How articles are generated
- 18:00 Quality of text
- 21:30 Accuracy metrics
- 25:30 Risk of hallucinated facts
- 30:45 Keeping up with changes
- 36:15 UI/UX problems
- 45:00 Technical cause of gender imbalance
- 51:00 Wrap-up
Previous Episode

117. Beena Ammanath - Defining trustworthy AI
March 30, 2022
•46m
Trustworthy AI is one of today’s most popular buzzwords. But although everyone seems to agree that we want AI to be trustworthy, definitions of trustworthiness are often fuzzy or inadequate. Maybe that shouldn’t be surprising: it’s hard to come up with a single set of standards that add up to “trustworthiness”, and that apply just as well to a Netflix movie recommendation as a self-driving car.
So maybe trustworthy AI needs to be thought of in a more nuanced way — one that reflects the intricacies of individual AI use cases. If that’s true, then new questions come up: who gets to define trustworthiness, and who bears responsibility when a lack of trustworthiness leads to harms like AI accidents, or undesired biases?
Through that lens, trustworthiness becomes a problem not just for algorithms, but for organizations. And that’s exactly the case that Beena Ammanath makes in her upcoming book, Trustworthy AI, which explores AI trustworthiness from a practical perspective, looking at what concrete steps companies can take to make their in-house AI work safer, better and more reliable. Beena joined me to talk about defining trustworthiness, explainability and robustness in AI, as well as the future of AI regulation and self-regulation on this episode of the TDS podcast.
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
Chapters:
- 1:55 Background and trustworthy AI
- 7:30 Incentives to work on capabilities
- 13:40 Regulation at the level of application domain
- 16:45 Bridging the gap
- 23:30 Level of cognition offloaded to the AI
- 25:45 What is trustworthy AI?
- 34:00 Examples of robustness failures
- 36:45 Team diversity
- 40:15 Smaller companies
- 43:00 Application of best practices
- 46:30 Wrap-up
Next Episode

119. Jaime Sevilla - Projecting AI progress from compute trends
April 13, 2022
•48m
There’s an idea in machine learning that most of the progress we see in AI doesn’t come from new algorithms or model architectures. Instead, some argue, progress almost entirely comes from scaling up compute power, datasets and model sizes — and besides those three ingredients, nothing else really matters.
Through that lens the history of AI becomes the history of processing power and compute budgets. And if that turns out to be true, then we might be able to do a decent job of predicting AI progress by studying trends in compute power and their impact on AI development.
And that’s why I wanted to talk to Jaime Sevilla, an independent researcher and AI forecaster, and affiliate researcher at Cambridge University’s Centre for the Study of Existential Risk, where he works on technological forecasting and understanding trends in AI in particular. His work’s been cited in a lot of cool places, including Our World In Data, who used his team’s data to put together an exposé on trends in compute. Jaime joined me to talk about compute trends and AI forecasting on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:00 Trends in compute
- 4:30 Transformative AI
- 13:00 Industrial applications
- 19:00 GPT-3 and scaling
- 25:00 The two papers
- 33:00 Biological anchors
- 39:00 Timing of projects
- 43:00 The trade-off
- 47:45 Wrap-up