this post was submitted on 22 Jun 2025
106 points (73.7% liked)
Programming Humor
you are viewing a single comment's thread
view the rest of the comments
4o got wrecked. My AI-fan friend said o3 is their reasoning model, so it means nothing. I don't agree but can't find proof.
Has anyone done this with o3?
It’s a fundamental limitation of how LLMs work. They simply can’t follow a set of rules the way a traditionally programmed computer or game engine does.
Imagine you have only long-term memory that you can’t add to. You might get a few sentences of short-term memory before you’ve forgotten the context from the beginning of the conversation.
Then add on the fact that chess is very much a forward-thinking game, and LLMs don’t stand a chance against other methods. It’s the classic case of “when all you have is a hammer, everything looks like a nail.” LLMs can be a great tool, but they can’t be your only tool.
My biggest disappointment with how AI is being implemented is the inability to incorporate context-specific execution of small programs to emulate things like calculators and chess programs. Like, why does it take the hard-mode approach to literally everything? When asked to do math, why doesn't it execute something that emulates a calculator?
That's definitely being done. It's referred to as "tool calling" or "function calling": https://python.langchain.com/docs/how_to/tool_calling/
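For example, here's roughly what that looks like with LangChain (a minimal sketch; the `multiply` tool and the model name are placeholders I picked, not anything specific from the linked docs):

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# A tiny deterministic tool the model can delegate to instead of "doing math in its head".
@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers exactly."""
    return a * b

llm = ChatOpenAI(model="gpt-4o")            # placeholder model name
llm_with_tools = llm.bind_tools([multiply])

msg = llm_with_tools.invoke("What is 417 * 23?")
# Instead of answering directly, the model emits a structured tool call,
# e.g. {"name": "multiply", "args": {"a": 417, "b": 23}, ...}, which your code then executes.
print(msg.tool_calls)
```

The exact arithmetic happens in plain Python; the LLM only decides when to call the tool and with which arguments.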
This isn't as potent as one might think, because:
But all in all, it is a path forward where the LLM could just handle the semantics and then call a different tool for each thinky job, serving at least as a user interface.
The hope is for it to also serve as glue between these tools, automatically calling the right tools and passing their output into other tools. I believe the next step in this direction is "agentic AI", but I haven't yet managed to cut through the buzzword soup to figure out what that actually means.
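Very roughly, that "glue" idea looks like the loop below. This is a toy sketch: `decide()` is a stub standing in for the LLM, and the tool names are invented for illustration, not taken from any real agent framework:

```python
# Toy agent loop: the "model" picks a tool, we run it, and its output is fed back
# into the next decision, until the model says it's done.

def calculator(expr: str) -> str:
    return str(eval(expr, {"__builtins__": {}}))  # toy only; never eval untrusted input

def chess_engine(fen: str) -> str:
    return "e2e4"  # placeholder: a real version would call Stockfish or similar

TOOLS = {"calculator": calculator, "chess_engine": chess_engine}

def decide(task, history):
    # Stub standing in for an LLM call: look at the task + history,
    # return (tool_name, tool_input), or None when finished.
    if not history:
        return ("calculator", "417 * 23")
    return None

def run_agent(task):
    history = []
    while (step := decide(task, history)) is not None:
        tool_name, tool_input = step
        result = TOOLS[tool_name](tool_input)
        history.append(f"{tool_name}({tool_input!r}) -> {result}")
    return history

print(run_agent("What is 417 * 23?"))
```

A real setup would put an actual LLM call in the `decide()` slot and let it chain several tools, feeding one tool's output into the next.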
I’ve been waiting for them to make this improvement since they were first introduced. Any day now…
ChatGPT definitely does that. It can write small Python programs and execute them, but it doesn't do it systematically; you have to prompt for it. It can even use chart libraries to display data.
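For a sense of what that looks like, the throwaway scripts it runs in its sandbox are along these lines (an invented example, not actual ChatGPT output):

```python
import matplotlib.pyplot as plt

# Exact arithmetic done in code rather than predicted token by token.
values = [2**n for n in range(1, 11)]
print(sum(values))  # 2046

# Simple chart, the kind it renders back into the chat.
plt.plot(range(1, 11), values, marker="o")
plt.xlabel("n")
plt.ylabel("2^n")
plt.title("Growth of 2^n")
plt.savefig("growth.png")
```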