Agent hallucinating tool execution results

Hello,

I ran into a case where the agent confirmed it had created a file on my local desktop, but it had not actually run any command or tool call to do so. It only made the file-writing tool call after I challenged its initial response.

This “false confirmation” is confusing for users. The agent should verify that the tool call succeeded before claiming the task is finished.
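
To illustrate the kind of check I have in mind, here is a rough Python sketch. I don't know how the agent tracks tool calls internally, so `ToolResult`, `can_claim_file_created`, and the tool name `write_file` are all made-up names for illustration, not anything from the actual product.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ToolResult:
    # Hypothetical record of one tool call made during the agent's turn.
    tool_name: str
    succeeded: bool


def can_claim_file_created(results: list[ToolResult], path: Path) -> bool:
    """Return True only if a write tool call succeeded AND the path exists."""
    # "write_file" is an assumed tool name, used here only as an example.
    wrote = any(r.tool_name == "write_file" and r.succeeded for r in results)
    return wrote and path.exists()
```

With something like this, a turn with no tool calls at all (which is what happened to me) could never end in a "file created" message, because the check fails on an empty list of results.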

Attached is a recap of the conversation for reference.
