New RM Serper #102

Merged
shaoyijia merged 9 commits into stanford-oval:main from the users/zenith110/serper branch on Jul 31, 2024

zenith110
Contributor

@zenith110 zenith110 commented Jul 27, 2024

This PR aims to accomplish the following:

  1. Create a new RM that uses Serper.dev as its base and can search using queries. This RM can be used by calling SerperRM and passing it a JSON dictionary of values such as the one below
    (note that it can also be a list of dictionaries, with up to 100 in a list):
{
    "gl": "us",
    "hl": "en",
    "autocorrect": true,
    "page": 1
}
| Key | Description |
| --- | --- |
| gl | Two-letter country code, e.g. us. |
| hl | Language of results; the default is en for English. |
| autocorrect | Boolean to autocorrect any results. |
| page | How many pages will be returned from search. |
  2. Not sure if this counts as input and output examples, but this is what I ran and got:
python .\knowledge_storm\example.py --output-dir examples/serper --retriever serper --do-research --do-generate-outline --do-generate-article --do-polish-article
Error decoding TOML file: secrets.toml
knowledge_storm.interface : INFO     : run_knowledge_curation_module executed in 75.8139 seconds
knowledge_storm.interface : INFO     : run_outline_generation_module executed in 21.6632 seconds
sentence_transformers.SentenceTransformer : INFO     : Use pytorch device_name: cpu
sentence_transformers.SentenceTransformer : INFO     : Load pretrained SentenceTransformer: paraphrase-MiniLM-L6-v2
knowledge_storm.interface : INFO     : run_article_generation_module executed in 37.2815 seconds
knowledge_storm.interface : INFO     : run_article_polishing_module executed in 15.0878 seconds
***** Execution time *****
run_knowledge_curation_module: 75.8139 seconds
run_outline_generation_module: 21.6632 seconds
run_article_generation_module: 37.2815 seconds
run_article_polishing_module: 15.0878 seconds
***** Token usage of language models: *****
run_knowledge_curation_module
    claude-3-haiku-20240307: {'prompt_tokens': 15011, 'completion_tokens': 4765}
    claude-3-sonnet-20240229: {'prompt_tokens': 9042, 'completion_tokens': 3071}
    claude-3-opus-20240229: {'prompt_tokens': 0, 'completion_tokens': 0}
run_outline_generation_module
    claude-3-haiku-20240307: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-sonnet-20240229: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-opus-20240229: {'prompt_tokens': 2863, 'completion_tokens': 361}
run_article_generation_module
    claude-3-haiku-20240307: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-sonnet-20240229: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-opus-20240229: {'prompt_tokens': 5082, 'completion_tokens': 1669}
run_article_polishing_module
    claude-3-haiku-20240307: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-sonnet-20240229: {'prompt_tokens': 0, 'completion_tokens': 0}
    claude-3-opus-20240229: {'prompt_tokens': 1832, 'completion_tokens': 298}
***** Number of queries of retrieval models: *****
run_knowledge_curation_module: {'SerperRM': 35}
run_outline_generation_module: {'SerperRM': 0}
run_article_generation_module: {'SerperRM': 0}
run_article_polishing_module: {'SerperRM': 0}
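
To illustrate the parameter dictionary described in item 1, here is a minimal sketch of how shared query parameters might be combined with one or more search queries into a batch. `build_serper_payload` and `SERPER_BATCH_LIMIT` are hypothetical names that are not part of this PR, and the `q` key for the query text is an assumption about Serper's request format rather than something stated above.

```python
import json

# Assumed batch limit, taken from the "up to 100 in a list" note in the description.
SERPER_BATCH_LIMIT = 100


def build_serper_payload(queries, query_params=None):
    """Return one request dictionary per query, each carrying the shared parameters."""
    if isinstance(queries, str):
        queries = [queries]
    if len(queries) > SERPER_BATCH_LIMIT:
        raise ValueError(
            f"got {len(queries)} queries; at most {SERPER_BATCH_LIMIT} fit in one batch"
        )
    base = dict(query_params or {})
    # "q" as the query key is an assumption about the request format.
    return [{**base, "q": query} for query in queries]


payload = build_serper_payload(
    ["solar sails", "ion thrusters"],
    {"gl": "us", "hl": "en", "autocorrect": True, "page": 1},
)
print(json.dumps(payload, indent=2))
```

The sketch only builds the request body; sending it, and how SerperRM does so internally, is outside what this description shows.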

example.py is a local file I created inside knowledge_storm for testing because, oddly enough, the example script could not pick up the imports for knowledge_storm.

I couldn't get secrets.toml to load, so I set local environment variables instead.
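
The fallback described here could look roughly like the sketch below: try the TOML file first and, if it is missing or cannot be decoded, read the same key from the environment. `load_secret` is a hypothetical helper and `SERPER_API_KEY` is an assumed variable name; neither is taken from this PR, and this is not how the project's own loader works.

```python
import os

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    tomllib = None


def load_secret(name, toml_path="secrets.toml"):
    """Return `name` from the TOML file if it is readable, else from the environment."""
    if tomllib is not None:
        try:
            with open(toml_path, "rb") as fh:
                value = tomllib.load(fh).get(name)
            if value:
                return value
        except (OSError, ValueError):
            # Missing file or malformed TOML (TOMLDecodeError is a ValueError):
            # fall through to the environment.
            pass
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} not found in {toml_path} or the environment")
    return value


# Example usage (assumed variable name):
# api_key = load_secret("SERPER_API_KEY")
```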

I also pinned a specific version of NumPy, as the latest version (NumPy 2) was having issues with PyTorch. I would recommend pinning versions in the future to avoid cross-compatibility issues.

Looking forward to the feedback.

For input validation, the bare minimum in the Serper example provides what is needed to pass. Would more input validation be appropriate?
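
As one possible shape for stricter validation, the sketch below checks the four keys from the table above for type and basic range. `validate_query_params` and `EXPECTED_TYPES` are hypothetical names; rejecting unknown keys is an assumption made for illustration, since the API may accept parameters beyond the four listed here.

```python
# Expected type for each key listed in the PR description.
EXPECTED_TYPES = {"gl": str, "hl": str, "autocorrect": bool, "page": int}


def validate_query_params(params):
    """Return a list of human-readable problems; an empty list means the input passed."""
    errors = []
    for key, value in params.items():
        expected = EXPECTED_TYPES.get(key)
        if expected is None:
            errors.append(f"unknown key: {key!r}")
            continue
        # bool is a subclass of int, so reject True/False where an int is expected.
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if not isinstance(value, expected) or is_stray_bool:
            errors.append(f"{key!r} must be of type {expected.__name__}")
    gl = params.get("gl")
    if isinstance(gl, str) and len(gl) != 2:
        errors.append("'gl' must be a two-letter country code")
    page = params.get("page")
    if isinstance(page, int) and not isinstance(page, bool) and page < 1:
        errors.append("'page' must be at least 1")
    return errors


print(validate_query_params({"gl": "us", "hl": "en", "autocorrect": True, "page": 1}))
print(validate_query_params({"gl": "usa", "page": 0, "lang": "en"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once; raising an exception instead would be an equally reasonable choice.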

@shaoyijia
Collaborator

Hi @zenith110, thank you so much for working on this! I will review it and help merge this very useful feature. Before I proceed, could you first revert the changes to unrelated files? I think this PR should only contain changes to run_storm_wiki_serper.py, rm.py, and requirements.txt.

@zenith110
Contributor Author

I'm curious, what formatters do y'all suggest? I've been using black for personal projects since it works for pep8.

@shaoyijia
Collaborator

Thanks, please also revert the changes to the other scripts under examples.

@shaoyijia
Collaborator

> I'm curious, what formatters do y'all suggest? I've been using black for personal projects since it works for pep8.

Good question. Currently, we haven't set up a default one for this project. I think we probably need to set one up (what do you think @Yucheng-Jiang?) - the main purpose is to maintain readability and to avoid changing other unrelated files when contributing a certain feature.

Collaborator

@shaoyijia shaoyijia left a comment


Thank you so much for your contribution! Overall, it looks good to me, except for some nits. Could you help fix them so I can merge this?

Resolved review threads:
knowledge_storm/rm.py (6 threads, 5 outdated)
examples/run_storm_wiki_claude.py
examples/run_storm_wiki_gpt_with_VectorRM.py
examples/run_storm_wiki_mistral.py
@zenith110
Contributor Author

Hiya @shaoyijia, following up to check whether the changes look good. Thank you.

Collaborator

@shaoyijia shaoyijia left a comment


Looks good, thank you so much for following up!!

@shaoyijia shaoyijia merged commit 6a57b0c into stanford-oval:main Jul 31, 2024
@zenith110 zenith110 deleted the users/zenith110/serper branch July 31, 2024 13:03
feldges pushed a commit to feldges/storm that referenced this pull request Dec 4, 2024