I found Diabolus Ex Machina super relatable because it mirrors some of the experiences I’ve had using AI. Tools like ChatGPT and Claude are very powerful, but they are so different from all the other tools we’ve used in the past that, I think, none of us are truly able to use them well yet.
As Amanda Guinzburg helped show, AI’s utility comes from its remarkable persuasiveness, which in turn comes from its unwavering projection of confidence, over-the-top flattery, and feigned empathy. I’ve had many conversations with AI that went just like Amanda’s: a promising start, followed by a growing awareness that the AI was hallucinating and just telling me what it thought I wanted to hear. The problem is compounded by how often AI gets things exactly right, which lulls me into a false sense of confidence in its abilities.
AI tools like ChatGPT, Claude, or Copilot aren’t helping us use them correctly. They offer suggested prompts like “write a first draft,” “prep for an interview,” or “critique an argument,” which get us used to writing short, simple prompts, even though long, detailed prompts seem to produce the best answers.
When I’ve had success with the kinds of tasks Amanda tried, it’s usually because of two strategies I’ve used:
Always tell the AI to include direct quotes in its response to support its arguments. This helps ground the AI in the documents I’ve asked it about, and it helps me quickly notice when the AI is hallucinating. It works best when the AI can actually see the source text, and one advantage of new.space’s Brainstorms on iOS is that new.space automatically fetches the content of links and provides it directly to the AI.
Have one AI critique another’s response. This is one of my favorite uses for new.space’s Brainstorms. Asking Claude to critique ChatGPT, for example, has helped me find many inaccuracies in ChatGPT’s original response. Also, since every AI has a different tone, it’s nice to be able to compare how the different AIs answer a given question. Second (or third or fourth) opinions are always valuable.
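The first strategy, asking for direct quotes, can be partly checked by machine. Below is a minimal Python sketch, assuming you have the source document and the AI's response as plain text. The function names (`normalize`, `extract_quotes`, `unsupported_quotes`) and the four-word minimum are my own illustrative choices, not part of new.space or any AI tool. It only catches quotes that don't appear verbatim in the source; it says nothing about whether a real quote actually supports the argument.

```python
import re


def normalize(text: str) -> str:
    """Straighten curly quotes, collapse whitespace, and lowercase."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_quotes(response: str, min_words: int = 4) -> list[str]:
    """Return double-quoted passages with at least min_words words.

    The word minimum (an arbitrary assumption) skips short quoted
    terms that are unlikely to be meant as citations.
    """
    straightened = response.replace("\u201c", '"').replace("\u201d", '"')
    candidates = re.findall(r'"([^"]+)"', straightened)
    return [q.strip() for q in candidates if len(q.split()) >= min_words]


def unsupported_quotes(response: str, source: str) -> list[str]:
    """List quoted passages from the response that are not found in the source."""
    haystack = normalize(source)
    missing = []
    for quote in extract_quotes(response):
        # Ignore trailing punctuation the AI may have pulled inside the quote.
        needle = normalize(quote).rstrip(".,;:!?")
        if needle not in haystack:
            missing.append(quote)
    return missing


if __name__ == "__main__":
    source = (
        "The essay argues that memory is unreliable. "
        "It also says that writing is a way of thinking."
    )
    response = (
        "The essay says \u201cwriting is a way of thinking\u201d and that "
        "\u201creaders always forgive a dishonest narrator\u201d."
    )
    for quote in unsupported_quotes(response, source):
        print("Not found in source:", quote)
```

Any quote this flags is worth a second look, either by hand or by handing it to a second AI for critique.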
Just for fun, I put together a quick Brainstorm showing how Amanda’s original question worked in new.space. Judge it for yourself and let me know how you think it did.