This PR addresses what I believe is a shortcoming in the existing prompt used in FactCheckingEvaluator.

Specifically: no matter what I provided to FactCheckingEvaluator, isPass() was always returning false. I did a little debugging and noticed that the evaluation response was a long-winded explanation of how the response content either aligned or didn't align with the given context. But the decision on whether the fact-checking passes was based on the much simpler expectation that the evaluation response was either "yes" or "no" (case-insensitive).
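To illustrate the mismatch, the pass decision effectively reduces to a literal, case-insensitive comparison against "yes". The following is a simplified, hypothetical sketch of that check (not the actual evaluator source):

```java
// Simplified, hypothetical sketch of the pass check described above:
// the evaluation only passes if the model's entire reply is "yes".
class PassCheckSketch {

    static boolean isPass(String evaluationText) {
        return evaluationText.trim().equalsIgnoreCase("yes");
    }

    public static void main(String[] args) {
        // A verbose judgement never matches, so the evaluation always fails ...
        System.out.println(isPass("The response aligns with the context because ...")); // false
        // ... while a bare "yes" passes.
        System.out.println(isPass("yes")); // true
    }
}
```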
This change refines the prompt used by FactCheckingEvaluator to explicitly request a yes/no answer, which seems to have resulted in correct isPass() values in my testing (at least when evaluated against the default OpenAI GPT-4o model).
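The refined prompt asks the evaluation model for a single-word verdict, roughly along these lines (a paraphrased sketch; the exact wording in the merged change may differ):

```java
// Hypothetical paraphrase of a yes/no fact-checking prompt; placeholder names
// {document} and {claim} are illustrative, not necessarily the merged template.
String promptTemplate = """
        Evaluate whether or not the following claim is supported by the provided document.
        Respond with only a single word: "yes" if the claim is supported, or "no" if it is not.
        Do not include any explanation.

        Document: {document}
        Claim: {claim}
        """;
```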
Comment From: markpollack
I'll run this against the bespoke-minicheck model on Ollama, which I believe responds with a yes/no since that is the model's purpose. There was a GitHub repo that generated the LLM-AggreFact Leaderboard, and I sort of remember there being some massaging of the prompt depending on which LLM was used. I suspect this change may break usage with bespoke-minicheck but work with OpenAI and other models. Not yet sure of the best way to handle it, whether there can be a portable prompt or we need to pass in the LLM used in order to pick the correct prompt. Will report back.
Comment From: markpollack
I've updated the class so that there are two prompt styles, one for general LLMs and one for bespoke-minicheck. I made the general LLM prompt the default, as it is likely the more common - though perhaps less accurate - usage.
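For anyone following along, a hypothetical sketch of how the two prompt styles could be organized (the names below are illustrative, not the actual API added in the commit):

```java
// Hypothetical sketch only; the real class may expose this differently.
enum PromptStyle { GENERAL, BESPOKE_MINICHECK }

class FactCheckingPrompts {

    // General-purpose LLMs need to be told explicitly to answer with one word.
    static final String GENERAL = """
            Evaluate whether the claim is supported by the document.
            Answer with a single word, "yes" or "no".

            Document: {document}
            Claim: {claim}
            """;

    // bespoke-minicheck was trained on a bare document/claim layout and already
    // answers yes/no, so it keeps a minimal template of its own.
    static final String BESPOKE_MINICHECK = """
            Document: {document}
            Claim: {claim}
            """;

    static String forStyle(PromptStyle style) {
        return style == PromptStyle.BESPOKE_MINICHECK ? BESPOKE_MINICHECK : GENERAL;
    }
}
```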
merged in f92a3f0fcb5710d0b3e611c88da5a40d83de02c5