Anybody who thought the answer could have been even remotely close to Yes is delusional.
I doubt anyone expected it to work completely, but it is interesting to see to what extent it worked and how it failed (hallucinations and sycophancy).
True; I just hate headlines that ask stupid questions.
But then again, attempts like this always start from the premise that it could work, which annoys me no less.
It is an interesting article, even if its conclusions are entirely too rosy. The "storefront" was a single vending machine, and the bot was instructed to rely on Anthropic employees (with an hourly cost attached) for all physical interactions. While the bot did a decent job managing the stock most of the time, it made a lot of bad decisions by trying to be too helpful to its customers. It also frequently hallucinated, with some hilarious results I won't spoil here. But as anyone who owns a small business knows, one bad decision could put it under, so saying that an AI can manage a vending machine well "most of the time" is equivalent to saying it can't do the job at all.
Their conclusion is that with a bit more work, Claude might be able to perform as a middle-manager. To me, that says more about how useless middle-management is than how capable their AI is.
So what you are saying is the AI is ready to replace tech CEOs.
All the tasks could have been easily solved with some basic APIs and algorithms.
This is so funny. It fails miserably and they’re all “yeah so this is promising.”
Sure, a world where your manager hallucinates meetings with you and assesses you poorly for not performing according to plans that were hallucinated through said meetings sounds like a fantastic idea.