If we want to avoid making AI agents a huge new attack surface, we’ve got to treat agent memory the way we treat databases: with firewalls, audits, and access privileges. The pace at which large ...
My use case requires a straightforward way to measure tool calling accuracy using F1 score, it would be great to have promptfoo support this. When testing LLMs that call tools/functions, I'd like to ...
本リポジトリは、AI Chat における tool calling ベースの可視化(グラフ/表)の PoC です。 LLM が文脈に応じて必要なデータを tools で取得し、UI は AI SDK の tool parts(tool-*)を描画してカード化し ...