Promptfoo Tutorial - Search News

AI memory is really a database problem

If we want to avoid making AI agents a huge new attack surface, we’ve got to treat agent memory the way we treat databases: with firewalls, audits, and access privileges. The pace at which large ...

GitHub

[Feature] F1 score metric for tool calling evaluations

My use case requires a straightforward way to measure tool calling accuracy using F1 score; it would be great to have promptfoo support this. When testing LLMs that call tools/functions, I'd like to ...

GitHub

jade-kk/ai-chat-viz-poc

This repository is a PoC for tool-calling-based visualization (charts/tables) in AI Chat. The LLM fetches the data it needs via tools depending on context, and the UI renders the AI SDK's tool parts (tool-*) as cards ...
