WEBVTT

00:00:00.000 --> 00:00:04.480
One of the most irritating things about modern life, you know, other than political polarization,

00:00:04.480 --> 00:00:07.840
stagnating wages, and the feeling of somehow being socially isolated in a hyperconnected

00:00:07.840 --> 00:00:12.560
environment, is how difficult these stupid little CAPTCHA puzzles have gotten.

00:00:12.560 --> 00:00:16.480
But why is this happening? A while ago, we did an episode on those super simple

00:00:16.480 --> 00:00:21.280
"I'm not a robot" reCAPTCHAs. And those aren't that bad. You just click a box and that's it.

00:00:21.280 --> 00:00:26.320
But now it seems like we're having to solve these arcane picture puzzles, sometimes multiple

00:00:26.400 --> 00:00:30.160
times simply to read a news article. So what gives?

00:00:30.160 --> 00:00:34.640
Well, the evolution of CAPTCHAs to become more annoying is part of an

00:00:34.640 --> 00:00:39.920
ongoing arms race between spam, malware, and shopping bots on the one side, and cloud and

00:00:39.920 --> 00:00:44.880
website operators trying to stop them on the other. Back in the day, bots were so dumb that

00:00:44.880 --> 00:00:49.680
just a slightly distorted word or two could send them packing. But of course, it was quite easy

00:00:49.680 --> 00:00:54.080
for fleshy humans to understand what was on the screen. But unsurprisingly, there's been a lot

00:00:54.080 --> 00:00:59.200
of investment over the past couple of decades into automated optical character recognition,

00:00:59.200 --> 00:01:05.200
or OCR, which helps digitize old books and other publications so that they're more easily searchable.

00:01:05.200 --> 00:01:10.880
Doing this requires AIs to discern text even if it's distorted in some way. So over the years,

00:01:10.880 --> 00:01:15.600
efforts were made to make them better at this. And unsurprisingly, the text we put into those

00:01:15.600 --> 00:01:21.840
little CAPTCHA tests helps the AIs do this. Many CAPTCHAs use scans of distorted words

00:01:21.840 --> 00:01:27.600
from real books or newspapers. And because we humans took so many of these tests,

00:01:27.600 --> 00:01:34.000
we trained AIs to become very good at them. But unfortunately, AI algorithms can be used by bots.

00:01:34.000 --> 00:01:38.800
So we stopped using word-based CAPTCHAs and moved on to those little picture puzzles.

00:01:38.800 --> 00:01:43.600
For a while, picture CAPTCHAs proved more effective than word CAPTCHAs. But guess what?

00:01:44.160 --> 00:01:47.440
Google, who's responsible for producing many of those picture puzzles,

00:01:47.440 --> 00:01:53.520
uses human input on those to train their AIs as well. If you've noticed how lots of these

00:01:53.520 --> 00:01:58.640
CAPTCHAs are photos of street signs, traffic lights, crosswalks, etc., it's because

00:01:58.640 --> 00:02:03.120
Google is taking all that data that you give it when solving these puzzles and using that

00:02:03.120 --> 00:02:08.960
information to help build AIs for self-driving cars, as well as to improve the quality of results

00:02:08.960 --> 00:02:14.800
on Google Maps. And again, AIs got really good at solving these puzzles, so the folks behind

00:02:14.800 --> 00:02:19.520
CAPTCHA have been making them progressively harder, much to the chagrin of you, the average

00:02:19.520 --> 00:02:24.080
internet user, who's just trying to bulk-buy toilet paper online instead of schlepping up to

00:02:24.080 --> 00:02:30.160
Walmart. But is there an end in sight to this arms race? The way those "I'm not a robot" boxes work

00:02:30.160 --> 00:02:35.440
without forcing you to solve a puzzle is by tracking your in-browser behavior: how fast you type,

00:02:35.440 --> 00:02:40.480
how you move your mouse around, even the way you switch tabs to determine that you're human,

00:02:40.480 --> 00:02:45.120
and not a bot that can input paragraphs of text in a matter of seconds or move a mouse to

00:02:45.120 --> 00:02:50.480
a precise single-pixel location on the screen. Some engineers believe what we'll end up having

00:02:50.480 --> 00:02:54.640
is a kind of souped-up version of this. It's no secret that you're constantly being tracked in

00:02:54.640 --> 00:02:59.760
some form or another while you're online, so future versions of CAPTCHA may not only track the way

00:02:59.760 --> 00:03:04.800
you type or move your mouse, but also keep tabs on you more thoroughly through cookies and browser

00:03:04.800 --> 00:03:10.000
activity to determine whether or not you're human. Such a CAPTCHA would basically always be running

00:03:10.080 --> 00:03:15.600
in the background on a browser level, so while it might help us beat the bots, there are obvious

00:03:15.600 --> 00:03:22.000
privacy concerns, in addition to the fact that bots might end up beating these tests at some point

00:03:22.000 --> 00:03:26.480
in the future as well. I just hope it doesn't get to the point where all computers come with a

00:03:26.480 --> 00:03:30.640
USB-connected DNA sample collector. So thanks for watching, guys. If you liked this video, hit like,

00:03:30.640 --> 00:03:34.960
hit subscribe, tell your friends, and hit us up in the comments section with your suggestions for

00:03:34.960 --> 00:03:37.200
topics that we should cover in the future.
