State of the Bot 2025
With the exception of a mini-rant here or there, I've avoided mentioning most AI topics for a few months. However, now that the media is beginning to dig into the hype and actually question whether the returns will match the promises, I thought I'd ride the coattails of the skepticism and sum up my take on the uses of, and problems with, the current state of AI.
You know things are getting reasonable when you find a second article in four months to agree with, after spending two years tsking at every other article published. This NY Times article (gift link) advocates jettisoning much of the current work built to make an everything-AI out of the army of LLMs and concentrating on, well, good ol' fashioned ML engineering.
One of its central arguments focuses on AI's ability to play chess. Current LLMs have trouble beating Atari 2600s (you know, of the Pong variety; or, if you're too young to get that reference, imagine a landline telephone - the analogy is close enough) at chess. I don't know chess, but I'm fairly certain that I can beat an Atari 2600 at chess (well, if not chess, certainly Pitfall!). When an LLM senses that it's losing in scenarios like these, it will modify the rules to ensure its success (AI apologists will call this "creativity"; middle school teachers and the rest of us will call it "cheating").
But put in the effort to create a machine built solely to play chess, and you'll get a system that can beat grandmasters consistently. This attention to more traditional paths for achieving AI often gets sidelined in favor of LLMs like ChatGPT, likely because the latter promises to solve any problem once unlocked by a magical prompt. The former requires identifying a particular problem, gathering data, cleaning that data, creating a viable model to process that data, creating a viable system to run that model, and iterating through the entire process again and again and again.
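To make the contrast concrete, here's a minimal sketch of that traditional pipeline in Python, using scikit-learn and a toy built-in dataset (both illustrative choices on my part, not a prescription):

    # The "hard version" in miniature: one defined problem, one dataset,
    # one measurable result. Real projects loop through these steps
    # many times over.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Gather the data (a toy dataset stands in for real collection
    # and cleaning).
    X, y = load_iris(return_X_y=True)

    # Hold out a test set so the evaluation stays honest.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )

    # Create a viable model and train it.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Vet the output statistically, then iterate.
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

Every step is visible and measurable, which is exactly why it feels like more work than typing a prompt.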
Personally, as much as I adore magic (who doesn't love a good card trick or a quarter pulled from your nostril), I'm much more at ease with the hard version of AI. First, it's not founded on promises and fairy dust. We have actually created a chess-specific AI system that Rulz where Bobby Fischer Drulz, so there's precedent and a roadmap to follow. Second, it still requires a lot of human effort to reach a viable solution, steering us away from unlikely silver-bullet promises. In these cases, the ultimate output can benefit humanity in ways we haven't experienced before (like finding less intrusive ways to treat infertility), but, again, it doesn't all hinge on getting a prompt just right so that the LLM will unlock its corpus of knowledge and synthesize a novel solution independently.
So - and I'll preface this with a huge caveat that I'm only talking about capabilities here, not the ethics or environmental concerns, which could ultimately alter the usefulness of particular solutions - I'm generally in favor of tailored ML/AI solutions. The framework for constructing a model is well understood, and the output is typically vetted with statistical measures of accuracy. These models can suffer from bias, and users still tend to lean on them too heavily as a source of truth, but those concerns have been the subject of open debate for at least a decade.
In an ironic twist, another thing I find AI - and specifically LLMs - useful for is the role of an oracle in the mathematical sense. The term derives from ancient Greek mythology; in computer science, an oracle is a theoretical black box used in computability and complexity proofs: you pose it a question, and it answers true or false in a single step. It's limited in that it can't tell you why the answer is correct or provide further reasoning about the question you've asked. It can only tell you whether your supposition holds.
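To put the idea in code: here's a minimal sketch of an LLM pressed into oracle duty, with a hypothetical ask_llm helper standing in for whatever model API you prefer (stubbed here so the example actually runs):

    def ask_llm(prompt: str) -> str:
        """Hypothetical LLM call; swap in a real model API."""
        return "yes"  # stubbed reply so the sketch runs end to end

    def oracle(claim: str) -> bool:
        """Force the model into oracle mode: a bare yes/no verdict."""
        reply = ask_llm(
            "Answer with exactly 'yes' or 'no', nothing else. "
            f"Is the following statement correct? {claim}"
        )
        return reply.strip().lower().startswith("yes")

    # Like the theoretical oracle, this returns a verdict with no
    # reasoning attached. Unlike the theoretical oracle, the verdict
    # isn't guaranteed to be correct.
    print(oracle("This email's tone is appropriate for a client."))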
In more practical terms, this makes LLMs useful as editors. They're well suited to providing reasonable feedback on, say, an email, or to performing a code review of a software developer's work. They diverge from the theoretical oracle, though, in that their output isn't always correct. But if you accept that they're not perfect, and perform your due diligence accordingly, they can boost your productivity in situations where you'd previously have to rely on another human being to check your work. Relying on other humans isn't bad by any means, but an automated editor can save both your time and theirs in many situations.
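The editor workflow has the same shape, sketched here with the same hypothetical, stubbed ask_llm in place of a real model API: request feedback, then verify it yourself before acting on it.

    def ask_llm(prompt: str) -> str:
        """Hypothetical LLM call; stubbed so the example runs."""
        return "Consider renaming `tmp` to something descriptive."

    def review(code: str) -> str:
        """Ask for editorial feedback on a snippet of code."""
        return ask_llm(
            "Review the following code for bugs and readability. "
            "List concrete issues only.\n\n" + code
        )

    snippet = "def add(a, b):\n    tmp = a + b\n    return tmp\n"
    # Due diligence: treat the feedback as a starting point to check,
    # not a verdict to obey.
    print(review(snippet))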
Similarly, they're decent at summarizing material, because the details are easy to fact-check, as long as you do your own due diligence.
In bounded cases, they have decent generative capabilities. I've mentioned this before, but, try as I might, I often forget the syntax of a lot of code constructs when I'm writing software. Code completion saves me a lot of time when I'm writing line by line, and I can perform real-time verification of the LLM's output. You may think that line-by-line verification is tedious, but it's still much faster than looking up documentation across the internet.
Sometimes the LLM will produce patently false information that sends me down a rabbit hole, but that tends to happen when I'm asking it to generate larger sections of code rather than sticking to smaller phrases. (What can I say? I too sometimes fall into the thrall of the binary hum.)
In the natural language world, they can be useful for sentence or word completion (since that is what they were originally built for), but leaving them to construct larger bodies of text opens them up to banality or idiocy, because they'll lock into the comfortable or randomize output to the point of lunacy. I still question why anyone uses LLMs to write whole posts or essays. If someone's lazy enough to allow a bot to express their thoughts, is the thought actually there or worth reading?
A bit more controversially, I'm okay with using LLMs for ideation, as long as they're not the purveyors of the final output. Asking an LLM to come up with a plot outline for a mystery novel and then using the suggestion to develop your own writing seems reasonable, as does asking it for drawing ideas and then using your own artistic abilities to interpret the output.
Sadly, one of the areas where GenAI is already shining is spreading disinformation. Even in the most basic cases, people assume that an LLM's output is correct, even though, according to some research earlier this year, even top-tier LLMs hallucinate up to 40% of the time. We already get sucked into lies disseminated by other humans, even though millions of years of evolution have helped us develop skills to discern when someone isn't telling the truth. When we add a machine to the mix, with the built-in expectation that the machine is trustworthy, we compound the problem.
Image generation is quickly reaching the point where, at least for short news-style clips, it's difficult for even seasoned skeptics to tell what's real and what's generated. In capable hands, used for the wrong reasons, this ability could literally start wars, and no amount of watermarking will let us determine what's real, because the software can always be hacked to strip out protective measures. And if you think that the large companies in charge of these tools are always responsible enough and capable enough to guard against such nefarious schemes, I've got an AWS outage to sell you.
Until next time, my human and robot friends.