LangProbe Benchmark Updates #1593

klopsahlong · 2024-10-07T02:38:53Z

This PR adds the following tasks to the LangProbe Benchmark: HotpotQA Conditional, Iris, Iris-Typo, HoVer, and Heart Disease.
It also makes a few minor improvements to the testing setup, including:
- minor updates to the README, which now provides instructions for using dspy.LM
- adding a max_output_tokens() function to each task so that the output token limit can be set for the task model on a task-by-task basis
- improving logging
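
To illustrate the per-task token limit, here is a minimal sketch of how a task-level max_output_tokens() hook could feed the task model's settings. The class names, the default of 150, the override of 50, and the `max_tokens` key are all assumptions for illustration, not the code in this PR.

```python
# Hypothetical sketch: class names and numeric limits are illustrative only.

class BaseTask:
    """Base class for benchmark tasks (hypothetical)."""

    def max_output_tokens(self) -> int:
        # Assumed default limit for tasks that do not override it.
        return 150


class IrisTask(BaseTask):
    """A classification task whose outputs are short labels (hypothetical)."""

    def max_output_tokens(self) -> int:
        # A short label needs far fewer tokens than a free-form answer.
        return 50


class HoverTask(BaseTask):
    """A task that keeps the assumed default limit (hypothetical)."""


def lm_kwargs_for(task: BaseTask) -> dict:
    # Collect per-task settings to pass along when the task model is built.
    # The "max_tokens" key name is an assumption.
    return {"max_tokens": task.max_output_tokens()}


if __name__ == "__main__":
    for task in (IrisTask(), HoverTask()):
        print(type(task).__name__, lm_kwargs_for(task))
```

Keeping the limit on the task rather than in a global config means each task's limit is declared next to the task itself, and the evaluation loop only needs to call one method.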
This PR also includes minor improvements to the hackercup_utils.py script, which now suppresses printed output when executing generated code.

klopsahlong added 8 commits October 6, 2024 18:55

updating LangProbe benchmark + small change to hackercup utils

b090ed0

clean up for tasks

6ae4ad4

Merge branch 'main' of github.com:stanfordnlp/dspy into new_dspy

61699a2

removing biodex

4ba83c5

formatting updates

c223c3b

adding in hotpotqa_conditional dataset

e75cbc8

adding in init file

8e13377

ruff fixes

01a7075

okhat merged commit 6a00c85 into main Oct 7, 2024
4 checks passed
