1
00:00:00,000 --> 00:00:04,480
One of the most irritating things about modern life, you know, other than political polarization,

2
00:00:04,480 --> 00:00:07,840
stagnating wages, and the feeling of somehow being socially isolated in a hyperconnected

3
00:00:07,840 --> 00:00:12,560
environment, is how difficult these stupid little CAPTCHA puzzles have gotten.

4
00:00:12,560 --> 00:00:16,480
But why is this happening? A while ago, we did an episode on those super simple

5
00:00:16,480 --> 00:00:21,280
"I'm not a robot" reCAPTCHAs. And those aren't that bad. You just click a box and that's it.

6
00:00:21,280 --> 00:00:26,320
But now it seems like we're having to solve these arcane picture puzzles, sometimes multiple

7
00:00:26,400 --> 00:00:30,160
times simply to read a news article. So what gives?

8
00:00:30,160 --> 00:00:34,640
Well, part of it is that the evolution of CAPTCHAs to become more annoying is part of an

9
00:00:34,640 --> 00:00:39,920
ongoing arms race between spam, malware, and shopping bots on the one side, and cloud and

10
00:00:39,920 --> 00:00:44,880
website operators trying to stop them on the other. Back in the day, bots were so dumb that

11
00:00:44,880 --> 00:00:49,680
just a slightly distorted word or two could send them packing. But of course, it was quite easy

12
00:00:49,680 --> 00:00:54,080
for fleshy humans to understand what was on the screen. But unsurprisingly, there's been a lot

13
00:00:54,080 --> 00:00:59,200
of investment over the past couple of decades into automated optical character recognition,

14
00:00:59,200 --> 00:01:05,200
or OCR, which helps digitize old books and other publications so that they're more easily searchable.

15
00:01:05,200 --> 00:01:10,880
Doing this requires AIs to discern text even if it's distorted in some way. So over the years,

16
00:01:10,880 --> 00:01:15,600
efforts were made to make them better at this. And unsurprisingly, the text we put into those

17
00:01:15,600 --> 00:01:21,840
little CAPTCHA tests helped the AIs do this. Many CAPTCHAs use scans of distorted words

18
00:01:21,840 --> 00:01:27,600
from real books or newspapers. And because we humans took so many of these tests,

19
00:01:27,600 --> 00:01:34,000
we trained AIs to become very good at them. But unfortunately, AI algorithms can be used by bots.

20
00:01:34,000 --> 00:01:38,800
So we stopped using word-based CAPTCHAs and moved on to those little picture puzzles.

21
00:01:38,800 --> 00:01:43,600
For a while, picture CAPTCHAs proved more effective than word CAPTCHAs. But guess what?

22
00:01:44,160 --> 00:01:47,440
Google, who's responsible for producing many of those picture puzzles,

23
00:01:47,440 --> 00:01:53,520
uses human input on those to train their AIs as well. If you've noticed how lots of these

24
00:01:53,520 --> 00:01:58,640
CAPTCHAs are photos of street signs, traffic lights, crosswalks, etc., it's because

25
00:01:58,640 --> 00:02:03,120
Google is taking all that data that you give it when solving these puzzles and using that

26
00:02:03,120 --> 00:02:08,960
information to help build AIs for self-driving cars, as well as to improve the quality of results

27
00:02:08,960 --> 00:02:14,800
on Google Maps. And again, AIs got really good at solving these puzzles, so the folks behind

28
00:02:14,800 --> 00:02:19,520
CAPTCHA have been making them progressively harder, much to the chagrin of you, the average

29
00:02:19,520 --> 00:02:24,080
internet user, who's just trying to bulk-buy toilet paper online instead of schlepping up to

30
00:02:24,080 --> 00:02:30,160
Walmart. But is there an end in sight to this arms race? The way those I'm-not-a-robot boxes work

31
00:02:30,160 --> 00:02:35,440
without forcing you to solve a puzzle is by tracking your in-browser behavior, how fast you type,

32
00:02:35,440 --> 00:02:40,480
how you move your mouse around, even the way you switch tabs to determine that you're human,

33
00:02:40,480 --> 00:02:45,120
and not a bot that can input paragraphs of text in a matter of seconds or move a mouse to

34
00:02:45,120 --> 00:02:50,480
precisely a single pixel location on the screen. Some engineers believe what we'll end up having

35
00:02:50,480 --> 00:02:54,640
is a kind of souped-up version of this. It's no secret that you're constantly being tracked in

36
00:02:54,640 --> 00:02:59,760
some form or another while you're online, so future versions of CAPTCHA may not only track the way

37
00:02:59,760 --> 00:03:04,800
you type or move your mouse, but also keep tabs on you more thoroughly through cookies and browser

38
00:03:04,800 --> 00:03:10,000
activity to determine whether or not you're human. Such a CAPTCHA would basically always be running

39
00:03:10,080 --> 00:03:15,600
in the background on a browser level, so while it might help us beat the bots, there are obvious

40
00:03:15,600 --> 00:03:22,000
privacy concerns, in addition to the fact that bots might end up beating these tests at some point

41
00:03:22,000 --> 00:03:26,480
in the future as well. I just hope it doesn't get to the point where all computers come with a USB

42
00:03:26,480 --> 00:03:30,640
connected DNA sample collector. So thanks for watching, guys. If you liked this video, hit like,

43
00:03:30,640 --> 00:03:34,960
hit subscribe, tell your friends, and hit us up in the comments section for your suggestions for

44
00:03:34,960 --> 00:03:37,200
topics that we should cover in the future.