It can only index stuff that's on the Web. Stuff on the Web is, contrary to what is popularly asserted, only a tiny fraction of all human knowledge.
I think people are forgetting how bad search was before Google. Google drove Web directories to extinction. Remember Yahoo!? Back in that era, if I were looking for something as simple as the University of Michigan, I clicked and drilled down through a Yahoo directory. The obvious search query would have been useless. Google changed all that.
I view Google as the yellow pages. It works well for that. Is it an oracle of knowledge? Of course not. How could I possibly expect to find knowledge in a place where there is no reward for making it available? People producing knowledge don't work for free.
I've tried ChatGPT and it's no better. It serves up stuff that is flat-out wrong.
Optimize not for "most documents indexed" but for "highest quality of results". One of them encourages adding spam to the index, the other encourages removing spam from it.
And yet for some reason they're all too eager to serve up sites scraping Stack Overflow.
So do I. I can't tell you the last time I even held yellow pages in my hands.
In the last 2-3 months, search quality for me has absolutely crashed and is barely usable.
In the present day, I cannot find my answer on the first page. If I click on the top hits, the page is a deluge of useless blog fluff, and it takes me more time to find what I am looking for.
More often than not I have to add reddit, forum, stackoverflow, etc. to find what I am looking for, because online communities provide more concise answers.
This is why Google's utility has collapsed.
HN is constantly pushing this notion that "spam" is some well-defined, solvable problem, so obviously Google wants it. That narrative just doesn't make sense from any angle. The notion that more click bait improves Google's bottom line is absurd.
It's not that the content doesn't exist or isn't indexed, it's that it's been drowned out by noise. Sifting through noise better was the entire reason Google took off from more standard crawlers. It now returns results worse than crawlers from the previous era.
That’s still a thing, although it seems they’re A/B testing its removal. I just opened a private tab (as I always do) and got a boring "More results" button, but I tried another browser (also with a private tab) and got the classic pagination.
Relevant search results that aren't just marketing sites or the big websites.
> It can only index stuff that's on the Web.
And much of it isn't really exposed by Google search.
> I view Google as the yellow pages. It works well for that
It used to. For me, it stopped working well for that a few years ago and has been getting steadily worse ever since.
Product reviews alone, whether for enterprise software or sports clothing, should be something that they can easily comb through by hand, as humans, and uprank sites that are putting out more than affiliate link assemblies.
That is an absurd exaggeration.
Before, the results would just not match what I was looking for. Now they do match what I was looking for, except some AI procedurally generated the content to show up when I searched those terms, with no regard for the accuracy of what the page says.
Today:
* Any term that might be related to a commercial product? That product comes first and frequently only.
* Search for two terms? It will first give its preferred result for each separately - usually commercial products (ha). And then it might give them together.
* Quoted terms are often taken as vague suggestions. Negative sign is often useless, etc.
Luckily HN posters don’t exactly represent a meaningful portion of the population.
I'm willing to accept that maybe you are exaggerating to make a point. Maybe you have a better example that is actually illustrative?
If I say "show me the best winter gloves, and only from sites that you can verify actually tested the products" and it follows the instruction (ignoring sites that just have a list of popular search results aggregated) then it is better. If it doesn't do what I want, I expect to be able to follow up and teach it.
I expect the chat style stateful search to take instruction for what type of sites I want results from. "Return me a list of websites with recipes for Bolognese that do not have a long story above the recipe. Build a table with the top five results normalized for portion size, comparing and contrasting the ingredients. Highlight unique ingredients in bold."
Then, with Google, it got better and almost all results were relevant.
But we’ve been regressing over the years, and now we’re at the point where 80% of all results are both irrelevant and simply SEO spam.
I find it really hard to believe Google has some of the smartest people in the world on search and they cannot identify this.
For example, just the other day I was searching for one string that I knew was part of a common code repository. To my surprise, Google couldn't find anything at all. Yandex, on the other hand, found the repository immediately and linked to GitHub.
Another common issue with Google is the difficulty of finding stuff like forum posts related to the search query. Sure, you could append "reddit" to the query, but there are still plenty of traditional forum sites, and some of them have decades' worth of discussion. I never see those sites pop up on a typical Google search unless I specifically look for them. Again, with Yandex, my experience is much better: it is not uncommon to see forum posts on the first page of results.
ChatGPT usually gives me the answer that I'm looking for and nothing else. Sometimes it does add extra info, which often teaches me about something that I wasn't aware of at all.
But the greatest benefit is I can ask it to clarify anything I don't understand. I don't need to go on a completely new Google quest, or jump through hoops to register on some site and hope a random internet person will deign to help me out. I can just ask, in the same conversation, and immediately get clarification.
Many people underestimate the incredible learning opportunities a well-trained language model provides. It doesn't matter that it hallucinates or lies. Whatever it claims is usually easy to validate. What matters is the speed with which you can find uncluttered new leads or answers.
You are dealing with a moving target that has a huge financial incentive. It's a very difficult problem.
Google didn't innovate that much except to provide a clutter-free interface and slightly better search. Prior to that, I used Webcrawler and then HotBot. A search like what you described would have easily returned useful results.
I want you to start a blank slate C (or C++) project. Ask Google how to write heapify, push_heap, and pop_heap in C. Ask ChatGPT the same.
I did this a few weeks ago. I literally could not find the answer on Google. ChatGPT gave me actual C code that I definitely did not trust but did verify.
Google results for questions like that are genuinely awful. It’s full of shitty tutorial websites that are full of ads and either don’t have the answer I need or don’t have it in a convenient form.
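For anyone following along, here is a minimal sketch of those three routines in C, assuming a max-heap of ints stored in a plain array. The names mirror the question. The `push_heap`/`pop_heap` calling convention is borrowed from the C++ standard library (the new value already sits in the last slot before a push, and the popped maximum ends up in the last slot), and `swap_int`, `sift_up`, and `sift_down` are helper names made up for this sketch. It is not the code ChatGPT produced, and like that code it should be verified before use.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two ints in place (helper name is an assumption of this sketch). */
static void swap_int(int *a, int *b) {
    int t = *a;
    *a = *b;
    *b = t;
}

/* Move h[i] down until neither child is larger than it. */
static void sift_down(int *h, size_t n, size_t i) {
    for (;;) {
        size_t largest = i;
        size_t left = 2 * i + 1;
        size_t right = 2 * i + 2;
        if (left < n && h[left] > h[largest]) largest = left;
        if (right < n && h[right] > h[largest]) largest = right;
        if (largest == i) return;
        swap_int(&h[i], &h[largest]);
        i = largest;
    }
}

/* Move h[i] up while it is larger than its parent. */
static void sift_up(int *h, size_t i) {
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h[parent] >= h[i]) return;
        swap_int(&h[parent], &h[i]);
        i = parent;
    }
}

/* Rearrange an arbitrary array of n ints into a max-heap, bottom-up. */
void heapify(int *h, size_t n) {
    for (size_t i = n / 2; i-- > 0;) {
        sift_down(h, n, i);
    }
}

/* Precondition: h[0..n-2] is a max-heap and the new value is in h[n-1].
   Postcondition: h[0..n-1] is a max-heap. */
void push_heap(int *h, size_t n) {
    assert(n > 0);
    sift_up(h, n - 1);
}

/* Precondition: h[0..n-1] is a non-empty max-heap.
   Postcondition: the old maximum is in h[n-1], h[0..n-2] is a max-heap. */
void pop_heap(int *h, size_t n) {
    assert(n > 0);
    swap_int(&h[0], &h[n - 1]);
    sift_down(h, n - 1, 0);
}
```

The caller owns the array and tracks the element count: write the new value into `h[n]`, then call `push_heap(h, n + 1)`; call `pop_heap(h, n)`, read the maximum from `h[n - 1]`, then treat the heap as having `n - 1` elements.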
And I've /recently/ hunted for something obscure, couldn't find it, managed to find an old bookmark to it, the server was still online and the content I wanted was still there. And no amount of crafting of a google search would bring it up. And the server in question didn't contain copyrighted material which would have resulted in a takedown block or anything like that.
It's frustrating how /bad/ Google has gotten for anything other than fairly basic, high level "searching".
I mean what you just listed.
Google won the search war because of PageRank eliminating lots of spam, and then something like 15 years of staying ahead of SEO spam and providing useful search.
Lately it seems like they've given up on the arms race and let the SEO spam win, but it isn't clear why.
And Google didn't produce high quality search for free, they used ads and sold the eyeballs they won.
I'm legitimately asking, who is responsible for Search at Google? Prabhakar Raghavan is SVP, Search, Assistant & Ads, and when I click under him, he has 8 product groups reporting to him, and none of the people are responsible for Search. Yossi Matias is responsible for Search Engineering.
It may at first come off as a laughable answer, but Google Search has been in a directionless spiral since Marissa Mayer left. Her Yahoo tenure was not well received, but at Google she cared about the end quality of the product. Her title was Search Products and User Experience. Notice how we have gone from User Experience to Search Engineering, forgetting about the people who actually use the product.
The competition for many kinds of search terms is causing a race to the bottom. E.g. tech docs, lyrics, recipes, reviews.
That’s why Kagi has a lens for “non-spammy recipe searches”: there’s just so much noise on popular, easily copyable material.
You don’t get the best site by popular vote like PageRank was known for, you get the one that generates the most ad revenue.
You figure out a way to crowdsource certain decisions and establish who you can trust. Ask them questions with right and wrong answers. You start to tackle it one product category at a time. Instead of PageRank, which was a web of "who linked to whom", you start figuring out which of your voters consistently turn in good feedback.
This is some form of the metamoderation that Slashdot tried to implement.
If you are going to be a tastemaker, stop hiding behind "the algorithm" having some mind of its own that can't be controlled.
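A rough sketch of that idea in C, under heavy assumptions: each rater answers a set of "gold" questions with known answers, their trust is the smoothed fraction they got right, and a site's score is the trust-weighted average of +1/-1 votes. The function names `rater_trust` and `weighted_score`, the add-one smoothing, and the +1/-1 vote encoding are all choices made for this illustration. It is not a description of how Slashdot, Google, or anyone else actually does it.

```c
#include <assert.h>
#include <stddef.h>

/* Trust for one rater: smoothed fraction of gold questions answered
   correctly. Add-one smoothing (an assumption of this sketch) starts a
   rater with no history at 0.5 instead of 0 or 1. */
double rater_trust(const int *answers, const int *gold, size_t n_questions) {
    size_t correct = 0;
    for (size_t i = 0; i < n_questions; i++) {
        if (answers[i] == gold[i]) correct++;
    }
    return ((double)correct + 1.0) / ((double)n_questions + 2.0);
}

/* Score for one site in [-1, 1]: each rater votes +1 (good) or -1 (bad),
   and each vote counts in proportion to that rater's trust. */
double weighted_score(const int *votes, const double *trust, size_t n_raters) {
    double weighted_sum = 0.0;
    double total_trust = 0.0;
    for (size_t r = 0; r < n_raters; r++) {
        assert(votes[r] == 1 || votes[r] == -1);
        weighted_sum += trust[r] * (double)votes[r];
        total_trust += trust[r];
    }
    return total_trust > 0.0 ? weighted_sum / total_trust : 0.0;
}
```

With one rater who gets every gold question right and two who get every one wrong, a plain majority vote would side with the two unreliable raters, while the weighted score sides with the reliable one. The hard parts in practice (writing gold questions, raters gaming them) are not addressed here.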
#1 result is a long article with culinary history, detailed instructions, many pictures, and a credited author originally from Shanghai.
#2 result is a simple recipe listing from Buzzfeed. Written by a young white guy from Minnesota who worked as a producer. No fluff, no pictures, no backstory. Doubtful the author ever made the recipe at all. You could grab a recipe database and generate thousands of these pages.
I've been burned by #2 too many times to disregard the fluff. It shows their investment in the content.
Lots of trash out there but Serious Eats is good quality.
https://www.verybestbaking.com/toll-house/recipes/original-n...
https://www.allrecipes.com/recipe/10813/best-chocolate-chip-...
> The notion that more click bait improves Google's bottom line is absurd
If you don't find what you're looking for on the first try, you'll need to try again, and see more ads. What else are you going to do, go elsewhere, visit a library, ask the town elders or give up on looking for things you want to know? You don't have a choice, you know it, they know it.
I find it equally plausible that YouTube's search sucks badly because they don't care what you're looking for, they want you to watch videos that they predict will lead to the maximum time spent on the site, again so you watch more ads. What other explanation is there that the world's leading search engine has the search of one of their flagship products run at 1999 quality? Presumably they have giant teams of people working on that too?
I see two options: a) Google can't do any better than that, b) Google has a reason to keep it in the current state (I'll put "Google doesn't know because nobody at Google has used YouTube in the last 5 years" and similar options under "a").
a) sounds ridiculous, b) sounds conspiratorial. What are the other options?
And again, I'm not saying they are making search worse on purpose (no "from now on our core mission is to make search suck"). I'm saying they aren't optimizing for SERP quality. They seem to care about index size (maybe it's an internal KPI? would certainly explain their aggressive guessing at additional URLs that you might have on their page but don't link to, don't add in sitemaps etc, and their stubbornness in keeping results from the index even if they've been 301ed or 410ed ages ago (they do get downranked after a while though)), but I assume that they mostly care about paid ad clicks, and if something increases ad clicks while the result quality decreases, they'll do it.
I use BBC Good Food, almost always straight to the point.
I get not everyone is a foodie that cares about the details and wants to tweak it, but I appreciate them.