
107. Kevin Hu - Data observability and why it matters
Towards Data Science
12/15/21
•49m
Imagine for a minute that you’re running a profitable business, and that part of your sales strategy is to send the occasional mass email to people who’ve signed up to be on your mailing list. For a while, this approach leads to a reliable flow of new sales, but then one day, that abruptly stops. What happened?
You pore over logs, looking for an explanation, but it turns out that the problem wasn’t with your software; it was with your data. Maybe the new intern accidentally added a character to every email address in your dataset, or shuffled the names on your mailing list so that Christina got a message addressed to “John”, or vice-versa. Versions of this story happen surprisingly often, and when they happen, the cost can be significant: lost revenue, disappointed customers, or worse — an irreversible loss of trust.
Today, entire products are being built on top of datasets that aren’t monitored properly for critical failures — and an increasing number of those products are operating in high-stakes situations. That’s why data observability is so important: the ability to track the origin, transformations and characteristics of mission-critical data to detect problems before they lead to downstream harm.
And it’s also why we’ll be talking to Kevin Hu, the co-founder and CEO of Metaplane, one of the world’s first data observability startups. Kevin has a deep understanding of data pipelines, and the problems that can pop up if they aren’t properly monitored. He joined me to talk about data observability, why it matters, and how it might be connected to responsible AI on this episode of the TDS podcast.
Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: https://youtu.be/d8Y2sKIgFWc
Chapters:
- 0:00 Intro
- 2:00 What is data observability?
- 8:20 Difference between a dataset’s internal and external characteristics
- 12:20 Why is data so difficult to log?
- 17:15 Tracing back models
- 22:00 Algorithmic analysis of data
- 26:30 Data ops in five years
- 33:20 Relation to cutting-edge AI work
- 39:25 Software engineering and startup funding
- 42:05 Problems on a smaller scale
- 46:40 Future data ops problems to solve
- 48:45 Wrap-up
Previous Episode

106. Yang Gao - Sample-efficient AI
December 8, 2021
•49m
Historically, AI systems have been slow learners. For example, a computer vision model often needs to see tens of thousands of hand-written digits before it can tell a 1 apart from a 3. Even game-playing AIs like DeepMind’s AlphaGo, or its more recent descendant MuZero, need far more experience than humans do to master a given game.
So when someone develops an algorithm that can reach human-level performance at anything as fast as a human can, it’s a big deal. And that’s exactly why I asked Yang Gao to join me on this episode of the podcast. Yang is an AI researcher with affiliations at Berkeley and Tsinghua University, who recently co-authored a paper introducing EfficientZero: a reinforcement learning system that learned to play Atari games at the human-level after just two hours of in-game experience. It’s a tremendous breakthrough in sample-efficiency, and a major milestone in the development of more general and flexible AI systems.
---
Intro music:
➞ Artist: Ron Gelinas
➞ Track Title: Daybreak Chill Blend (original mix)
➞ Link to Track: https://youtu.be/d8Y2sKIgFWc
---
Chapters:
- 0:00 Intro
- 1:50 Yang’s background
- 6:00 MuZero’s activity
- 13:25 MuZero to EfficientZero
- 19:00 Sample efficiency comparison
- 23:40 Leveraging algorithmic tweaks
- 27:10 Importance of evolution to human brains and AI systems
- 35:10 Human-level sample efficiency
- 38:28 Existential risk from AI in China
- 47:30 Evolution and language
- 49:40 Wrap-up
Next Episode

108. Last Week In AI — 2021: The (full) year in review
January 5, 2022
•50m
2021 has been a wild ride in many ways, but its wildest features might actually be AI-related. We’ve seen major advances in everything from language modeling to multi-modal learning, open-ended learning and even AI alignment.
So, we thought, what better way to take stock of the big AI-related milestones we’ve reached in 2021 than a cross-over episode with our friends over at the Last Week In AI podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 0:00 Intro
- 2:15 Rise of multi-modal models
- 7:40 Growth of hardware and compute
- 13:20 Reinforcement learning
- 20:45 Open-ended learning
- 26:15 Power seeking paper
- 32:30 Safety and assumptions
- 35:20 Intrinsic vs. extrinsic motivation
- 42:00 Mapping natural language
- 46:20 Timnit Gebru’s research institute
- 49:20 Wrap-up