
124. Alex Watson - Synthetic data could change everything
Towards Data Science
05/18/22
51m
There’s a website called thispersondoesnotexist.com. When you visit it, you’re confronted by a high-resolution, photorealistic AI-generated picture of a human face. As the website’s name suggests, there’s no human being on the face of the earth who looks quite like the person staring back at you on the page.
Each of those generated pictures is a piece of data that captures much of the essence of what it means to look like a human being. And yet it does so without telling you anything whatsoever about any particular person. In that sense, it’s fully anonymous human face data.
That’s impressive enough, and it speaks to how far generative image models have come over the last decade. But what if we could do the same for any kind of data?
What if I could generate an anonymized set of medical records or financial transaction data that captures all of the latent relationships buried in a private dataset, without the risk of leaking sensitive information about real people? That’s the mission of Alex Watson, the Chief Product Officer and co-founder of Gretel AI, where he works on unlocking value hidden in sensitive datasets in ways that preserve privacy.
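To make that concrete, here’s a deliberately minimal sketch of the core idea: fit a generative model to a sensitive table, then sample brand-new rows from it. The column names and the toy Gaussian model below are my own invention for illustration; they are not Gretel’s actual method, which relies on far more expressive models and explicit safeguards against leaking real records.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Stand-in for a sensitive table of real patient records
# (hypothetical columns; blood pressure is correlated with age).
n = 1_000
age = rng.normal(55, 12, n)
real = pd.DataFrame({
    "age": age,
    "systolic_bp": 90 + 0.7 * age + rng.normal(0, 8, n),
})

# "Train" the simplest possible generative model: an empirical
# mean vector and covariance matrix.
mean = real.mean().to_numpy()
cov = real.cov().to_numpy()

# Sample synthetic rows. No row is copied from the real table,
# but the age/blood-pressure relationship is preserved.
synthetic = pd.DataFrame(
    rng.multivariate_normal(mean, cov, size=n),
    columns=real.columns,
)

# The correlation structure of the synthetic data should closely
# match that of the real data.
print(real.corr().round(2))
print(synthetic.corr().round(2))
```

A toy Gaussian like this obviously can’t capture complex relationships or guarantee privacy on its own; closing that gap, including preventing a model from memorizing and replaying real records, is exactly what the “preventing data leakage” segment of the conversation digs into.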
What I realized talking to Alex was that synthetic data is about much more than ensuring privacy. As you’ll see over the course of the conversation, we may well be heading for a world where most data can benefit from augmentation via data synthesis, where synthetic data brings privacy value almost as a side effect of enriching ground-truth data with context imported from the wider world.
Alex joined me to talk about data privacy, data synthesis, and what could be the very strange future of the data lifecycle on this episode of the TDS podcast.
***
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:40 What is synthetic data?
- 6:45 Large language models
- 11:30 Preventing data leakage
- 18:00 Generative versus downstream models
- 24:10 De-biasing and fairness
- 30:45 Using synthetic data
- 35:00 People consuming the data
- 41:00 Spotting correlations in the data
- 47:45 Generalization of different ML algorithms
- 51:15 Wrap-up
Previous Episode

Jacob and Ala are two ML researchers with world-class pedigrees who decided to build a company that puts AI on the blockchain. Now, to most people (myself included), “AI on the blockchain” sounds like a winning entry in some kind of startup buzzword bingo. But what I discovered talking to them was that they actually have good reasons to combine those two ingredients.
At a high level, doing AI on a blockchain allows you to decentralize AI research and reward labs for building better models, not for publishing papers in flashy journals with often-biased reviewers.
And that’s not all: as we’ll see, Ala and Jacob are taking on some of the thorniest current problems in AI with their decentralized approach to machine learning. Everything from designing robust benchmarks, to rewarding good AI research, to the centralization of power in the hands of a few large companies building powerful AI systems is in their sights as they build out Bittensor, their AI-on-the-blockchain startup.
Ala and Jacob joined me to talk about all those things and more on this episode of the TDS podcast.
---
Intro music:
Artist: Ron Gelinas
Track Title: Daybreak Chill Blend (original mix)
Link to Track: https://youtu.be/d8Y2sKIgFWc
---
Chapters:
- 2:40 Ala and Jacob’s backgrounds
- 4:00 The basics of AI on the blockchain
- 11:30 Generating human value
- 17:00 Who sees the benefit?
- 22:00 Use of GPUs
- 28:00 Models learning from each other
- 37:30 The size of the network
- 45:30 The alignment of these systems
- 51:00 Buying into a system
- 54:00 Wrap-up
Next Episode

125. Ryan Fedasiuk - Can the U.S. and China collaborate on AI safety?
September 7, 2022
48m
It’s no secret that the US and China are geopolitical rivals. And it’s also no secret that that rivalry extends into AI — an area both countries consider to be strategically critical.
But in a context where potentially transformative AI capabilities are being unlocked every few weeks, many of which lend themselves to military applications with hugely destabilizing potential, you might hope that the US and China would have robust agreements in place to deal with things like runaway conflict escalation triggered by an AI-powered weapon that misfires. Even at the height of the Cold War, the US and Russia had robust lines of communication to de-escalate potential nuclear conflicts, so surely the US and China have something at least as good in place now... right?
Well, they don’t. To understand why, and what we should do about it, I’ll be speaking to Ryan Fedasiuk, a Research Analyst at Georgetown University’s Center for Security and Emerging Technology and an Adjunct Fellow at the Center for a New American Security. Ryan recently wrote a fascinating article for Foreign Policy magazine outlining the challenges and importance of US-China collaboration on AI safety. He joined me to talk about the US and China’s shared interest in building safe AI, how each side views the other, and what realistic AI policy toward China looks like, on this episode of the TDS podcast.