WEBVTT

00:00:00.000 --> 00:00:04.320
Why do search engines suck now? Wait, before we get ahead of ourselves,

00:00:04.320 --> 00:00:07.720
do search engines suck now? Are they actually getting worse,

00:00:07.720 --> 00:00:13.600
or have they just changed in a way that I personally hate? Just by glancing at the first page of a Google or Bing search,

00:00:13.600 --> 00:00:16.760
it's easy to find a long list of potential complaints.

00:00:16.760 --> 00:00:21.320
Paid sponsors crowd out the first few search results. There are annoying little widgets everywhere,

00:00:21.320 --> 00:00:25.360
and it keeps giving you barely related suggestions for what to search next,

00:00:25.360 --> 00:00:29.960
like a lame choose-your-own-adventure book. But why does this search page have so many ads?

00:00:30.080 --> 00:00:33.680
Isn't an interesting question. It's because money.

00:00:33.680 --> 00:00:38.880
The more interesting question is why it feels like it's gotten harder to find the information you want

00:00:38.880 --> 00:00:42.400
despite all those supposedly helpful widgets.

00:00:42.400 --> 00:00:47.560
The answer is also because money, but it's worth unpacking why.

00:00:47.560 --> 00:00:52.520
It's difficult to get a lot of rigorous data on this subject, but at least a few researchers have tried to answer

00:00:52.520 --> 00:00:57.120
whether or not search engines are, in fact, getting worse. One recent German study did a survey

00:00:57.120 --> 00:01:02.480
of over 7,000 product review searches on Bing, Google, and DuckDuckGo over the course of a year,

00:01:02.480 --> 00:01:05.880
and concluded that you could still find useful information,

00:01:05.880 --> 00:01:12.720
but it was being drowned out by a torrent of low quality content, especially SEO spam.

00:01:12.720 --> 00:01:15.960
Top-ranked pages typically were heavily optimized

00:01:15.960 --> 00:01:22.320
and littered with affiliate marketing links, and they also showed clear markers of lower content quality.

00:01:22.320 --> 00:01:26.520
There aren't a ton of academic papers on this issue, but there's plenty of data on the web

00:01:26.520 --> 00:01:32.520
showing how users have changed their behavior to sidestep low quality, highly optimized results.

00:01:32.520 --> 00:01:39.040
One possible indicator that search engines suck now is the growing use of Reddit as a de facto search engine.

00:01:39.040 --> 00:01:45.280
Sadly, Reddit's own internal search function has long been considered what the experts call absolute trash,

00:01:45.280 --> 00:01:48.920
but that hasn't stopped savvy users from just sticking the word Reddit

00:01:48.920 --> 00:01:54.080
on otherwise unrelated Google search queries. It's a well-known tack for cutting out SEO-ified garbage

00:01:54.080 --> 00:02:00.760
and vapid listicles because it effectively bypasses the weaknesses of both search engines and Reddit itself.

00:02:00.760 --> 00:02:06.320
Reddit isn't perfect, just ask any Redditor, but on the modern internet, it does a rare and special thing.

00:02:06.320 --> 00:02:11.920
It allows users to direct their question to a bunch of big old nerds who care more about being right

00:02:11.920 --> 00:02:18.560
than they care about making money off the interaction. If you search for "site:reddit.com search engine bad",

00:02:18.560 --> 00:02:22.480
you'll find plenty of posts complaining about the decaying state of modern search engines

00:02:22.480 --> 00:02:26.240
going back over a decade. There has, however, been a pretty major uptick

00:02:26.240 --> 00:02:29.360
in such posts over the last two years.

00:02:29.360 --> 00:02:34.320
Separately, Google Trends data shows that Reddit has been steadily gaining popularity

00:02:34.320 --> 00:02:38.880
as a search term since 2010, when news aggregator Digg shot themselves in the foot

00:02:38.880 --> 00:02:42.400
with a controversial redesign and started bleeding users.

00:02:42.400 --> 00:02:45.800
That growth remained steady until December 2021,

00:02:45.800 --> 00:02:50.280
when users started appending Reddit to their searches at an increased rate.

00:02:50.280 --> 00:02:53.600
Over 40% of Reddit's growth as a Google search term

00:02:53.600 --> 00:02:57.680
since 2010 is from the end of 2021 onward,

00:02:57.680 --> 00:03:02.040
a bit over two years. Now, there are a lot of potential confounding factors here,

00:03:02.040 --> 00:03:07.080
but this could be, at least in part, a consequence of widespread dissatisfaction

00:03:07.080 --> 00:03:10.800
with search engines. An interesting contrast to Reddit's upward trend

00:03:10.800 --> 00:03:15.000
as a search term is Wikipedia, which long predated Reddit as the kind of word

00:03:15.000 --> 00:03:18.000
you add to the end of a search query in order to cut through the noise.

00:03:18.000 --> 00:03:21.040
Wikipedia is a far more popular website,

00:03:21.040 --> 00:03:24.640
currently ranked seventh for global traffic to Reddit's 16th,

00:03:24.640 --> 00:03:28.040
but it's been on the decline as a search term since 2010,

00:03:28.040 --> 00:03:31.640
in part because Google heavily prioritizes Wikipedia already,

00:03:31.640 --> 00:03:36.200
both in search results and as part of its knowledge panel widget.

00:03:36.200 --> 00:03:40.240
But this might also indicate that the decline in quality for search engine results

00:03:40.240 --> 00:03:43.320
isn't hitting every search subject equally.

00:03:43.320 --> 00:03:48.200
There isn't a ton of money riding on a search query like "When was the War of 1812?"

00:03:48.200 --> 00:03:52.320
So the top results are mostly authoritative or reliable history sources.

00:03:52.320 --> 00:03:56.840
But if the most important goal of search engines is to find useful results,

00:03:56.840 --> 00:04:01.080
why haven't they fixed the problem? It's not that search engine companies don't care

00:04:01.080 --> 00:04:04.280
that spam is cluttering up the first two pages of results.

00:04:04.280 --> 00:04:07.360
They've been in an arms race with spam since the very beginning.

00:04:07.360 --> 00:04:10.920
It's just that the spam is now apparently winning.

00:04:10.920 --> 00:04:16.040
According to the authors of the German study that we mentioned earlier, search engine companies banning spam sites

00:04:16.040 --> 00:04:21.000
and readjusting their parameters had a positive, but ultimately temporary impact.

00:04:21.000 --> 00:04:24.600
There was still a general downward trend in terms of text quality and relevance

00:04:24.600 --> 00:04:29.240
for all three search engines studied, which could imply that this isn't a problem

00:04:29.240 --> 00:04:33.560
with search engines, but a problem with the internet itself.

00:04:33.560 --> 00:04:37.280
Appearing on the first page of Google can be life or death for a company.

00:04:37.280 --> 00:04:40.580
So there's a massive financial incentive to game that system,

00:04:40.580 --> 00:04:45.040
the same as how there's a massive financial incentive to game ratings on sites like Amazon,

00:04:45.040 --> 00:04:49.480
where fake reviews are notoriously rampant. Companies both big and small have realized

00:04:49.480 --> 00:04:54.000
that word-of-mouth personal recommendations from a financially disinterested human being

00:04:54.000 --> 00:04:59.200
are far more effective than traditional advertising, which means that the shadier among them

00:04:59.200 --> 00:05:03.600
put a lot of effort and resources into infiltrating so-called organic,

00:05:03.600 --> 00:05:09.040
user-generated systems of validation, drowning out authentic user reviews.

00:05:09.040 --> 00:05:12.040
Not to get too metaphorical, but the only way to be heard in a room

00:05:12.040 --> 00:05:15.560
where everybody's already yelling is to scream even louder.

00:05:15.560 --> 00:05:18.880
Everyone's incentive is to make more and better garbage.

00:05:19.560 --> 00:05:24.440
Large, vertically integrated companies like Google don't really help this hyper-competitive,

00:05:24.440 --> 00:05:27.680
low-effort environment when they leverage their platform

00:05:27.680 --> 00:05:31.520
to prioritize their own products over competitors.

00:05:31.520 --> 00:05:34.800
Google has had long-standing beef with Yelp

00:05:34.800 --> 00:05:38.560
since at least 2011, when the FTC investigated allegations

00:05:38.560 --> 00:05:43.920
that Google's search algorithm consistently favored Google Places over Yelp.

00:05:43.920 --> 00:05:48.080
That allegation was serious enough that Google actually agreed to allow online services

00:05:48.080 --> 00:05:53.760
to opt out of data scraping. Yelp further contributed data to a 2015 academic paper

00:05:53.760 --> 00:05:57.360
claiming that Google manipulates search results to favor itself.

00:05:57.360 --> 00:06:02.880
Small companies perceive, often accurately, that the platform they essentially have to use

00:06:02.880 --> 00:06:06.560
is a potential competitor that can and will replace them

00:06:06.560 --> 00:06:12.120
with a store-brand version at any time. Those fancy widgets and rich snippets exist

00:06:12.120 --> 00:06:15.840
so that the engine that can take you anywhere you wanna go

00:06:15.840 --> 00:06:19.960
is now a place that you never have to leave. So what are you gonna do?

00:06:19.960 --> 00:06:23.720
Build a better product? Pay for your position at the top

00:06:23.720 --> 00:06:28.080
or find a louder way to scream. That's why even though we said earlier

00:06:28.080 --> 00:06:31.200
that it's not necessarily search engine companies' fault

00:06:31.200 --> 00:06:34.360
that they've gotten worse, it also kinda is.

00:06:34.360 --> 00:06:39.000
You know how CAPTCHAs have gotten harder and harder over time? Well, that's in part because they were used

00:06:39.000 --> 00:06:43.080
to train machine learning models, which then led to the bots becoming more sophisticated,

00:06:43.080 --> 00:06:46.800
which then led to the need for stronger and stronger CAPTCHAs, and so on.

00:06:46.800 --> 00:06:51.840
What we're seeing here is likely something similar, only internet-wide, where search engines are struggling

00:06:51.840 --> 00:06:56.080
to distinguish between quality content and spam from AI systems trained

00:06:56.080 --> 00:06:59.120
on traditionally trusted sources like Wikipedia.

00:06:59.120 --> 00:07:02.640
The increasing cheapness of relatively sophisticated spam tools

00:07:02.640 --> 00:07:08.600
has resulted in numerous odd trends. Some are funny, like when oodles of products on Amazon

00:07:08.600 --> 00:07:12.320
wound up being named "I'm sorry, I cannot fulfill that request.

00:07:12.320 --> 00:07:16.840
It goes against OpenAI use policy." Others are disturbing,

00:07:16.840 --> 00:07:20.360
like the rise of procedurally generated clickbait obituaries,

00:07:20.360 --> 00:07:23.400
often for private citizens, many of whom aren't even dead.

00:07:23.400 --> 00:07:27.240
You used to have to be at least a D-list celebrity in order to get that kind of treatment.

00:07:27.240 --> 00:07:30.240
Search engines have not lost the fight against spam,

00:07:30.240 --> 00:07:33.920
at least not yet. But as machine generation continues to progress

00:07:33.920 --> 00:07:37.680
and proliferate, your search experience probably isn't going to get any better.

00:07:37.680 --> 00:07:44.760
Thanks for watching, guys. If you liked this video, maybe you'd like another video we have about how streaming is basically becoming cable.

00:07:44.760 --> 00:07:46.960
You can click on it somewhere.
