Archive

Rant

Recently, the New York Times reported on widespread discontent with Google’s AI search helper. There are plenty of anecdotes about Google’s AI incorrectly recommending that people add food-safe glue to their culinary endeavors, leave animals in hot cars, or smoke cigarettes while pregnant. While the plural of anecdote is not data, and while it’s bad to dislike something simply because it’s popular to dislike it, I think there’s a compelling reason Google’s AI in search is just bad.

I lied, there are two.

Let’s start with the smaller of the two reasons: energy. If Google had search volumes like DuckDuckGo’s or OpenAI’s, it might not be an issue, but it’s important to remember Google’s scale. Google handles roughly 100k searches per second, or about 8.5 billion searches per day.

Let’s try some back-of-the-envelope math. The best-performing model with fewer than 1B parameters I could find on the Hugging Face large language model leaderboard was a 7M (7 million parameters, for the uninitiated). Big caveat: this model was not fine-tuned for summarization, I don’t know its overall performance, I don’t know if it will do summarization, and I don’t know if it will generate good embeddings. The important part is that we have a reasonable parameter count for a minimum viable model.

A batched inference, assuming maximally efficient operation, consists of qp = query*projmatrix, kp = key*projmatrix, and vp = value*projmatrix (three flops per weight), qp dot kp (another flop per weight), a softmax (another flop), and (qp dot kp) dot vp (another flop). That’s six flops per weight, thereabouts, so a 7M model will take 42 MFLOPs per inference at optimal efficiency. It’s worth reiterating that this is a lower bound: I’m not counting any normalization, I’m assuming a full batch, and I’m not counting any activations other than the softmax. At 8.5 billion queries per day, that’s an additional 357,000,000,000,000,000 FLOPs (357 peta-FLOPs, a total count of operations rather than a per-second rate) per day. (See below for notes on caching.)
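The arithmetic so far is easy to check with a few lines of Python. This is a minimal sketch that only encodes the post’s own assumptions (100k searches per second, 8.5 billion per day, a 7M-parameter model, six FLOPs per weight); the constant and function names are illustrative and say nothing about how Google actually serves the feature.

```python
# Back-of-the-envelope reproduction of the lower-bound compute estimate.
# Every input is a rough figure from the post, not a measured value.

SEARCHES_PER_SECOND = 100_000      # ~100k Google searches per second
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400
SEARCHES_PER_DAY = 8.5e9           # the daily figure the post uses

PARAMS_7M = 7e6                    # 7M-parameter model
# The post's count: 3 (Q/K/V projections) + 1 (qp dot kp) + 1 (softmax) + 1 (scores dot vp)
FLOPS_PER_WEIGHT = 6


def flops_per_inference(params: float) -> float:
    """Lower-bound FLOPs for one inference under the six-flops-per-weight count."""
    return params * FLOPS_PER_WEIGHT


def flops_per_day(params: float, searches_per_day: float = SEARCHES_PER_DAY) -> float:
    """Total FLOPs (a count of operations, not a rate) if every search triggers one inference."""
    return flops_per_inference(params) * searches_per_day


if __name__ == "__main__":
    print(f"searches/day from 100k/s: {SEARCHES_PER_SECOND * SECONDS_PER_DAY:.3g}")
    print(f"FLOPs per inference:      {flops_per_inference(PARAMS_7M):.3g}")
    print(f"FLOPs per day:            {flops_per_day(PARAMS_7M):.3g}")
```

Running it prints 8.64e+09 searches per day from the per-second figure (close to the 8.5 billion used), 4.2e+07 FLOPs per inference, and 3.57e+17 FLOPs per day.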

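Admittedly, 357 peta-FLOPs a day is a drop in the bucket compared to other compute, but let’s look at it from an energy perspective anyway: the most efficient supercomputer (JEDI, as of 2024) delivers about 72.7 GFLOPS/watt. (Source: Green500, May 2024.) One GFLOPS per watt is a billion operations per joule, so dividing a daily operation count by it gives energy rather than power: a net additional cost of about 4,910,591 joules (4.9 MJ) per day for their daily search volume.

Spread over the 86,400 seconds in a day, that’s an average draw of roughly 57 watts for the feature, very rough ballpark, for a 7M parameter model, or about one old incandescent light bulb. If they use a 1B model, the figure becomes around 701 MJ per day, or roughly 8.1 kilowatts of continuous draw.

This figure ignores query caching, which would bring compute down, but it also ignores indexing for RAG, which would bring compute up. It also assumes TPUs are as efficient as the most efficient supercomputer in the world.

I think it’s a judgement call at this point. The floor itself is small, but it counts one forward pass per query, while generating a summary takes at least one forward pass per output token on top of reading the retrieved context, so the real cost scales roughly with the number of tokens each answer reads and writes. That extra doesn’t seem like it’s worth it to me, but it’s not so obscene that it’s completely impossible to tolerate.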
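The energy step turns on units: FLOPS/W is FLOPs per joule, so a daily operation count divided by it is a daily energy, and dividing that by 86,400 seconds gives an average power. The sketch below makes the conversion explicit. It assumes, as the post does, JEDI-level efficiency and one lower-bound inference per search; the names are illustrative.

```python
# Convert the lower-bound daily FLOP count into energy (J) and average power (W).
# Assumption (from the post): hardware matches JEDI's ~72.7 GFLOPS/W (Green500, 2024).

SECONDS_PER_DAY = 24 * 60 * 60
SEARCHES_PER_DAY = 8.5e9
FLOPS_PER_WEIGHT = 6
FLOP_PER_JOULE = 72.7e9  # 72.7 GFLOPS/W is the same as 72.7e9 FLOPs per joule


def daily_energy_joules(params: float) -> float:
    """Energy (J) to run one lower-bound inference for every search in a day."""
    total_flop = params * FLOPS_PER_WEIGHT * SEARCHES_PER_DAY
    return total_flop / FLOP_PER_JOULE


def average_power_watts(params: float) -> float:
    """Average continuous draw (W): the daily energy spread over 86,400 seconds."""
    return daily_energy_joules(params) / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, params in [("7M", 7e6), ("1B", 1e9)]:
        print(f"{label}: {daily_energy_joules(params) / 1e6:,.1f} MJ/day, "
              f"{average_power_watts(params):,.0f} W average")
```

This prints "7M: 4.9 MJ/day, 57 W average" and "1B: 701.5 MJ/day, 8,119 W average". Swapping in a larger model is just a different `params`; a per-token estimate would additionally multiply by the number of tokens each answer processes.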

That brings us to the second and more important matter: hallucination.

In the best case, a user is smart enough to look up the answer and verify that it’s correct. In the worst case, they take a wrong answer and treat it as gospel. The incidence rate of bad generations might be low, but it’s high enough that people always have to check, which makes the feature useless. (Amusingly, since people need to check regardless and the AI response comes at the top, it forces everyone to spend that tiny bit of extra time scrolling down.) Even if bad generations are rare, they happen often enough to embarrass the industry.

Google has always been good at subtler implementations. Their computational photography was absolutely state of the art. Photo search was top-notch. Clever algorithmic ranking used to be great. It was the quiet, carefully considered applications of ML that were most compelling. This feels like a vocal “behold my greatness” preceding a staggering display of mediocrity.

A link to the original Reddit thread that held this comment: https://www.reddit.com/r/MachineLearning/comments/1czzt45/d_should_google_ai_overview_haven_been_released/l5kflx9/

Welcome back to the Bro Jogan experience.

Bro: Today we have our long-awaited debate between Professor Albert Michaelson…

Albert: Thank you, Bro.

Bro: And Xalthra the Perpetually Vomiting.

Xalthra: *vomits*

Bro: Thanks for coming. So uh, Albert, you’re an expert on gravity.

Albert: That’s probably safe to say, yes. I’m the chair of the physics department at Purdue University and the head of the joint research council for cosmological and applied physics.

Bro: And Xalthra, welcome back to the show. I understand you have a new book on how gravity is a false concept. “Smash-and-Grab-ity: How big aviation wants you to pay more for less safety.”

Xalthra: *vomits*

Bro: We’ve been seeing stories about this all over the place. So Albert, why should we be paying more for less safety?

Albert: That feels like a mischaracterization of the position. I’m not proposing that we pay more for less safety, I’m suggesting that Xalthra’s proposal to completely deregulate the aviation industry is not based on any evidence. Or, really, I’m rebutting the idea that gravity isn’t real for domestic practical purposes, and I’m suggesting that we do have a good understanding of the behavior of gravity on the scale of classical dynamics.

Xalthra: *vomits*

Bro: I think Xalthra is making a good point there. Do you have a rebuttal?

Albert: I’m not sure quite where to begin. There was nothing really to rebut. I’m not sure there were even words in there.

Bro: But you are the expert in gravity? So what do you think about all the people who have fallen out of airplanes and survived?

Albert: I mean, there do exist people who have fallen out of airplanes and survived, but they constitute a small minority of individuals, and even if they did survive–

Xalthra: *vomits again, more violently*

Bro: Yeah, it sounds to me like falling distance and death are uncorrugated.

Albert: Uncorrelated?

Bro: Yeah, like fall more and you’re no more likely to die. I have a buddy who fell out of a fourth-story window one night, landed on his neck, and he’s still alive, so why should we be giving the airlines all this money to keep us safe?

Albert: There’s a lot to unpack here. The plural of anecdote is not data. And I’m fairly sure falling and death may well be correlated. Even if they weren’t, you can still test gravity, which is the point. Giving money to the aviation industry doesn’t necessarily mean they’re keeping you safer. It’s not like your dollars go to airplane manufacturers to keep us safe. It’s the regulatory body that’s responsible for ensuring the airplane manuf–

Bro: Let’s go into that. Giving money to the aviation industry doesn’t mean they’re keeping you safer.

Albert: That’s not the best pull-quote.

Bro: I know all kinds of old people that will fall out of their beds and die. The stories are all over the internet, you can do the research.

Xalthra: *vomits*

Bro: Exactly, so really, can any of us know that gravity exists for a fact?

Albert: That’s more of a philosophical tangent. Epistemology is not my specialty and not the route I’m hoping to travel.

Bro: But you’re the one that said epidemiology should track all kinds of deaths?

Albert: Uh.

Bro: We have the pull quote, “The CDC is a perfectly fine agency to measure gun fatalities. It’s not perfect, but they’re well equipped to monitor other causes of death, so it’s a good stopgap.”

Albert: Epidemiology is not epistemol–

Bro: So did you or did you not say that?

Albert: That’s not relevant.

Bro: Sounds relevant to me.

Xalthra: *vomits again*

Bro: I think we’re going to have to wrap this up. I’d like to thank our guests for coming on the show tonight.

Let’s get the 4k jokes out now. All done? Great.

My personal website started as a way of tracking my New Year’s resolution some 12 years ago. It’s still up as a place for putting short stories, rants, and guides. I think cohost as a microblogging platform is taking over that role in some respects, but I’m not ready to make a redirect here. I’ll settle for cross-posting.

I’ve been burned out a half-dozen times in about as many years, and what particularly gets to me is not long hours but not shipping anything. Spending a year on a project only to have it canned is more demoralizing than spending a week under duress. In the interest of resolving this, I would like to ship one small app or game per month. It seems strange that adding more strain to the schedule would cause less stress, but every vacation I’ve had so far this year has been absolutely plagued by depression over getting nothing done while somehow failing to relax.

This will not be a trivial matter, as work, home obligations, and a commute leave me with almost solely the hours of 7:30 PM to 10:30 PM, Mon/Wed/Thu/Fri, to myself. That’s 12 hours a week plus whatever I can do on Saturday morning, or 48 hours a month at maximum efficiency, which is probably rather optimistic given I need to do domestic things like laundry and cleaning and taking care of people/creatures.

Is it really worth spending the little time I’ve got making games instead of cultivating a skill like making art or practicing music? I’m not sure. 12 hours a week feels like so little time for all the things I want to do, but this is the reality of the situation, so I guess the only option is to make the most of it.

So many resolutions revolve around suddenly having discipline never before demonstrated in one’s life, and this isn’t much different. Perhaps I can simply make do with building up a habit.