
124. Alex Watson - Synthetic data could change everything
Towards Data Science
05/18/22
51m
There’s a website called thispersondoesnotexist.com. When you visit it, you’re confronted by a high-resolution, photorealistic AI-generated picture of a human face. As the website’s name suggests, there’s no human being on the face of the earth who looks quite like the person staring back at you on the page.
Each of those generated pictures is a piece of data that captures much of the essence of what it means to look like a human being. And yet it does so without telling you anything whatsoever about any particular person. In that sense, it’s fully anonymous human face data.
That’s impressive enough, and it speaks to how far generative image models have come over the last decade. But what if we could do the same for any kind of data?
What if I could generate an anonymized set of medical records or financial transaction data that captures all of the latent relationships buried in a private dataset, without the risk of leaking sensitive information about real people? That’s the mission of Alex Watson, the Chief Product Officer and co-founder of Gretel AI, where he works on unlocking value hidden in sensitive datasets in ways that preserve privacy.
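To make that concrete, here’s a deliberately minimal sketch of the core idea: fit a generative model to a sensitive table, then sample brand-new rows from it. The column names and the toy Gaussian model below are my own invention for illustration; they are not Gretel’s actual method, which relies on far more expressive models and explicit safeguards against leaking real records.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Stand-in for a sensitive table of real patient records
# (hypothetical columns; blood pressure is correlated with age).
n = 1_000
age = rng.normal(55, 12, n)
real = pd.DataFrame({
    "age": age,
    "systolic_bp": 90 + 0.7 * age + rng.normal(0, 8, n),
})

# "Train" the simplest possible generative model: an empirical
# mean vector and covariance matrix.
mean = real.mean().to_numpy()
cov = real.cov().to_numpy()

# Sample synthetic rows. No row is copied from the real table,
# but the age/blood-pressure relationship is preserved.
synthetic = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=n),
    columns=real.columns,
)

# The correlation structure of the synthetic data should closely
# match that of the real data.
print(real.corr().round(2))
print(synthetic.corr().round(2))
```

A toy Gaussian like this obviously can’t capture complex relationships or guarantee privacy on its own; closing that gap, including preventing a model from memorizing and replaying real records, is exactly what the “preventing data leakage” segment of the conversation digs into.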
What I realized talking to Alex was that synthetic data is about much more than ensuring privacy. As you’ll see over the course of the conversation, we may well be heading for a world where most data can benefit from augmentation via data synthesis, where synthetic data brings privacy value almost as a side effect of enriching ground-truth data with context imported from the wider world.
Alex joined me to talk about data privacy, data synthesis, and what could be the very strange future of the data lifecycle on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:40 What is synthetic data?
- 6:45 Large language models
- 11:30 Preventing data leakage
- 18:00 Generative versus downstream models
- 24:10 De-biasing and fairness
- 30:45 Using synthetic data
- 35:00 People consuming the data
- 41:00 Spotting correlations in the data
- 47:45 Generalization of different ML algorithms
- 51:15 Wrap-up
Previous Episode

Jacob and Ala are two ML researchers with world-class pedigrees who decided to build a company that puts AI on the blockchain. Now, to most people (myself included), “AI on the blockchain” sounds like a winning entry in some kind of startup buzzword bingo. But what I discovered talking to them was that they actually have good reasons to combine those two ingredients.
At a high level, doing AI on a blockchain allows you to decentralize AI research and reward labs for building better models, not for publishing papers in flashy journals with often-biased reviewers.
And that’s not all: as we’ll see, Ala and Jacob are taking on some of the thorniest current problems in AI with their decentralized approach to machine learning. Everything from designing robust benchmarks, to rewarding good AI research, to the centralization of power in the hands of a few large companies building powerful AI systems is in their sights as they build out Bittensor, their AI-on-the-blockchain startup.
Ala and Jacob joined me to talk about all those things and more on this episode of the TDS podcast.
---
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
---
Chapters:
- 2:40 Ala and Jacob’s backgrounds
- 4:00 The basics of AI on the blockchain
- 11:30 Generating human value
- 17:00 Who sees the benefit?
- 22:00 Use of GPUs
- 28:00 Models learning from each other
- 37:30 The size of the network
- 45:30 The alignment of these systems
- 51:00 Buying into a system
- 54:00 Wrap-up
Next Episode

125. Ryan Fedasiuk - Can the U.S. and China collaborate on AI safety?
September 7, 2022
48m
It’s no secret that the US and China are geopolitical rivals. And it’s also no secret that that rivalry extends into AI — an area both countries consider to be strategically critical.
But in a context where potentially transformative AI capabilities are being unlocked every few weeks, many of which lend themselves to military applications with hugely destabilizing potential, you might hope that the US and China would have robust agreements in place to deal with things like runaway conflict escalation triggered by an AI-powered weapon that misfires. Even at the height of the Cold War, the US and Russia had robust lines of communication to de-escalate potential nuclear conflicts, so surely the US and China have something at least as good in place now... right?
Well, they don’t. To understand why, and what we should do about it, I’ll be speaking to Ryan Fedasiuk, a Research Analyst at Georgetown University’s Center for Security and Emerging Technology and an Adjunct Fellow at the Center for a New American Security. Ryan recently wrote a fascinating article for Foreign Policy magazine outlining the challenges and importance of US-China collaboration on AI safety. He joined me to talk about the US and China’s shared interest in building safe AI, how each side views the other, and what realistic AI policy toward China looks like, on this episode of the TDS podcast.