Our Web Versus Search Engines

Sometimes it seems like search engines are amazing. And yes, for a lot of questions, they do get us an answer quickly.

But other times search engines fail me. For example, I might search for pages that discuss two topics together, and I’ll get a popular page about the first topic that just happens to link to a page about the second topic from its navigation menu.

For a long time Google would return pages that were written in PHP for queries about PHP, presumably because the URL ended in “.php” and that matched my query: “PHP how to append an array”. Not useful, Google.

They have other quirks too. Just today, while trying to write proper asynchronous code in TypeScript, I submitted dozens of queries, one of which was “javascript emit occasional result”. Not a great query, but I was desperate. For some reason three of the first five results are for books on utterly unrelated topics: the first is a book on nanoscience, another is on marine fauna, and the fourth is on galaxy formation, so I assume these are Google’s galaxy-brain search results.
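
In case you’re wondering what I was actually after, here’s a minimal sketch of the idea (my own reconstruction in TypeScript, not something any of those results offered): an async generator that emits a value every so often, which the caller consumes as the values trickle in.

    // Emit "occasional results": an async generator that yields a value
    // every so often. The producer function is a stand-in; swap in
    // whatever actually computes your results.
    async function* occasionalResults<T>(
      produce: () => T,
      intervalMs: number,
      count: number
    ): AsyncGenerator<T> {
      for (let i = 0; i < count; i++) {
        // Pause between emissions.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
        yield produce();
      }
    }

    // Consume results as they arrive.
    async function main() {
      for await (const value of occasionalResults(Math.random, 500, 5)) {
        console.log("got:", value);
      }
    }

    main();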

We all know search engines have limitations, but the bigger problems lie deeper.

Algorithms Are Abused

When the algorithm isn’t tripping over a simple query, other actors are determined to make it trip in their favor.

Everyone wants to be #1 in search results. Any aspect of any algorithm will be abused to climb the results pages.

Google is famous for inventing the PageRank algorithm, which treats the number of pages linking to a page as a signal of how helpful, or at least how popular, that page is.
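
To make that concrete, here is a toy sketch of the core idea (my own illustration of textbook power iteration in TypeScript, nothing like Google’s production system): each page’s score is fed by the scores of the pages linking to it.

    // Toy PageRank: a page's score flows in from the pages that link to it.
    type LinkGraph = Record<string, string[]>; // page -> pages it links to

    function pageRank(
      graph: LinkGraph,
      damping = 0.85,
      iterations = 50
    ): Record<string, number> {
      const pages = Object.keys(graph);
      const n = pages.length;
      let rank: Record<string, number> = {};
      for (const p of pages) rank[p] = 1 / n; // start everyone equal

      for (let i = 0; i < iterations; i++) {
        const next: Record<string, number> = {};
        for (const p of pages) next[p] = (1 - damping) / n; // baseline score
        for (const p of pages) {
          const outLinks = graph[p];
          if (outLinks.length === 0) continue; // toy version: ignore dangling pages
          const share = (damping * rank[p]) / outLinks.length;
          for (const target of outLinks) {
            if (target in next) next[target] += share; // score flows along links
          }
        }
        rank = next;
      }
      return rank;
    }

    // A tiny web where everyone links to "a": "a" ends up with the top score.
    console.log(pageRank({ a: ["b"], b: ["a"], c: ["a"] }));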

But as soon as web properties figured this out, the bazaar of back-link bat-shittery began. How many times have you gotten (or sent 😲) an email asking for a link to a page? Are bots leaving random word-salad comments with a few links in your forums, over and over again? Thanks, PageRank!

Google tries to penalize abusers, and it succeeds over time. But there are always new aspects to its algorithms, and the SEO crowd is a crafty bunch, so the PhDs in Mountain View are always playing catch-up.

The reality today is that results are littered with pages that should not be there: they tricked the algo and won.

Have you ever clicked on a promising search result, read through the content, and been left with the impression you just ingested the spilled contents of a word bag related to your query? There are sentences, and they make sense, yet so little meaning or insight is revealed. The site is often nice and usable and looks like every other helpful site out there, yet lacks a true personality. The only invariant is that there are a bunch of links sprinkled throughout, all pointing to the same commercial site. Uh huh. So much HTML theater for a measly click!

Thanks, search algorithms.

The Algorithm Shapes Our Web

It’s bad enough sifting through spammy search results to get a good one. Sometimes even the good ones are compromised by a desire to increase search visitors.

A few years ago Google updated its algorithm, and a bunch of people saw their search traffic drop. SEO folks figured out that pages that were a bit old but had a date on them weren’t being shown anymore. The way to recover traffic? Just remove the date from your posts. Easy peasy!

Have you ever landed on a technical blog, found the content promising but needed to know how old it was before committing to reading it? Tech moves fast, and reading a post that is many years old can be misleading. So you scroll all the way up and down to find the date, only to realize there is none on the page. Frustration! The reason? It’s likely the blogger heard that removing post dates would increase their search visitors.

Google has since corrected this, but not everybody is up to speed, so dates are still missing here and there. Sigh, and thanks again search algorithms.

The Algorithm Shapes Our View Of Our Web

Just the other day a question on the orange site ranked highly and got lots of positive feedback: “Ask HN: Is there a search engine which excludes the world’s biggest websites?” It seems Google favors pages from large internet properties and tends to leave small sites out of the results.

It’s clear that people (or HN users at least) feel they are missing out on part of the web. When every query is answered by one of very few search engines, we are viewing the web’s content through a filter with its own stakeholders, who are not us.

There is the web and there is the web we see through Google. How much are we missing?

It’s like having an intergalactic spaceship that can take you anywhere in the universe in the blink of an eye, but its radar can only detect large planets with the proper arrangement of keywords.

We Need Something New

I want a new paradigm for finding stuff online. I want something that depends less on algorithms and more on humans. I want results that make sense to me and other humans like me, even if there are fewer and it takes a bit longer. I am no longer blown away by 60,000 results in 0.000003 seconds. Blow me away with five results that are outstanding. I’ll wait.

I know we can’t completely replace the might of Google, but just like I try searching on DuckDuckGo before I search on Google, I want to search on ? before I search on DDG and Google, whatever “?” is.

Let’s search for our “?”.

This was post 20 of the #100DaysToOffload challenge.
