Streaming #1874
This looks AMAZING, thanks @CyrusNuevoDia! I'm just wrapping my head around the caching improvements here (which I quite like so far) and then will merge.
Before: Infinite LRU caches (unbounded memory growth in prod with cache=True)
Before: Had to serialize/deserialize JSON to cache properly
Anything else I can help clarify?
Hi @CyrusNuevoDia. Sick feature, have been waiting for this! :)
Signed-off-by: dbczumar <[email protected]>
@dbczumar
Hi @dbczumar, will it be possible to apply assertions/suggestions to the partially streamed output, so that we can stop streaming as soon as an assertion fails and reduce token usage by retrying with the prompt amended for that failure? This might be particularly useful when applying multiple assertions to assess the output along multiple dimensions.
@rohitgarud great idea! Out of scope for this current version, but it would be awesome.
@dbczumar merged in your request cache logic, getting this error on tests — any idea how to fix it?
Thanks @CyrusNuevoDia! I'll push some updates to remove caching changes from this PR. My reasoning is that streaming and caching aren't directly related (I added some test coverage to verify that caching works properly with streaming, though). We can make adjustments to caching in future PRs.
@@ -17,6 +18,7 @@
     backoff_time=10,
     callbacks=[],
     async_max_workers=8,
+    send_stream=None,
This is the only meaningful change in the file - everything else is a linter adjustment
Thanks @CyrusNuevoDia, this looks great. Appreciate it.
dspy.streamify can be used to convert a DSPy program to streaming mode. This is useful when you want to stream intermediate outputs (e.g. O1-style reasoning) to the client before the final prediction is ready. It uses asyncify under the hood and inherits its execution semantics.
The deltas from every module in the program are streamed directly with no processing, and once the final prediction is ready, it is yielded.
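The "deltas first, final prediction last" shape described above can be sketched with a plain async generator. This is only an illustration of the consumption pattern and does not call DSPy: `Prediction`, `fake_streaming_program`, `consume`, and `run_demo` are hypothetical names invented for this sketch, and the real objects yielded by a streamified program may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List, Optional, Tuple, Union


@dataclass
class Prediction:
    """Hypothetical stand-in for the final prediction object."""
    answer: str


async def fake_streaming_program(question: str) -> AsyncIterator[Union[str, Prediction]]:
    """Hypothetical program: yields raw text deltas, then one final Prediction."""
    for delta in ["Thinking", " about", " it", "..."]:
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield delta  # deltas are passed through with no processing
    yield Prediction(answer=f"Echo: {question}")  # the final item is the prediction


async def consume(question: str) -> Tuple[List[str], Optional[Prediction]]:
    """Separate the streamed deltas from the final prediction."""
    chunks: List[str] = []
    final: Optional[Prediction] = None
    async for item in fake_streaming_program(question):
        if isinstance(item, Prediction):
            final = item  # arrives once, after all deltas
        else:
            chunks.append(item)  # in a server, this would be forwarded to the client
    return chunks, final


def run_demo(question: str = "What is streaming?") -> Tuple[List[str], Optional[Prediction]]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(consume(question))
```

A client-facing handler would forward each delta as it arrives instead of collecting them in a list; the type check on the last item is what distinguishes the final prediction from intermediate output in this sketch.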
Here's how it works for deployment
Changes
Notes