[1] DatabricksRM: Use databricks-sdk to fetch token / workspace URL + several small improvements #1564

dbczumar · 2024-09-30T03:30:09Z

DatabricksRM: Use databricks-sdk to fetch token / workspace URL + several small improvements

Signed-off-by: dbczumar <[email protected]>
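
For readers unfamiliar with the fallback the title describes, here is a minimal sketch of resolving credentials from explicit arguments, then environment variables, then databricks-sdk. The helper name `resolve_databricks_credentials` is hypothetical and not part of this PR. Reading `token` and `host` from `WorkspaceClient().config` is an assumption about how the SDK exposes them, not a copy of the implementation.

```python
import os
from typing import Optional, Tuple


def resolve_databricks_credentials(
    token: Optional[str] = None,
    endpoint: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: explicit args first, then env vars, then databricks-sdk."""
    token = token or os.environ.get("DATABRICKS_TOKEN")
    endpoint = endpoint or os.environ.get("DATABRICKS_HOST")
    if token and endpoint:
        return token, endpoint

    # Fall back to the SDK's default authentication chain. The import is lazy
    # so the helper still works when databricks-sdk is not installed.
    try:
        from databricks.sdk import WorkspaceClient
    except ImportError:
        return token, endpoint

    # Assumption: the resolved config exposes `token` and `host` attributes.
    config = WorkspaceClient().config
    return token or config.token, endpoint or config.host
```

With this shape, a user running inside a configured Databricks environment would not need to pass a token or workspace URL at all.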

dbczumar · 2024-09-30T03:31:05Z

dspy/retrieve/databricks_rm.py

        dspy.settings.configure(lm=llm, rm=retriever_model)
        ```

-        Below is a code snippet that shows how to query the Databricks Direct Vector Access Index using the forward() function.


The forward() function is not what users call to retrieve documents with DatabricksRM
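
To illustrate the point, here is a rough sketch of the calling convention using a stand-in class (`SketchRetriever` is hypothetical, not the real DatabricksRM): the retrieval logic lives in forward(), but users invoke the instance itself, which delegates to it.

```python
from typing import List


class SketchRetriever:
    """Stand-in for a retriever module; not the real DatabricksRM."""

    def __init__(self, corpus: List[str], k: int = 3):
        self.corpus = corpus
        self.k = k

    def forward(self, query: str) -> List[str]:
        # Retrieval logic lives here (a naive substring match for illustration).
        hits = [doc for doc in self.corpus if query.lower() in doc.lower()]
        return hits[: self.k]

    def __call__(self, *args, **kwargs) -> List[str]:
        # Calling the instance is the public entry point; it delegates to forward().
        return self.forward(*args, **kwargs)


retriever = SketchRetriever(["Vector search basics", "Hybrid search", "Unrelated"])
results = retriever("search")  # preferred over retriever.forward("search")
```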

dbczumar · 2024-09-30T03:36:57Z

dspy/retrieve/databricks_rm.py

        ```python
-        self.retrieve = DatabricksRM(query=[1, 2, 3], query_type = 'vector')


The preexisting implementation of DatabricksRM used an inconsistent / incorrect definition of query_type. In Databricks Vector Search, query_type defines the search algorithm (ANN or hybrid), not the type of input (text or vector). The type of input can be inferred from the Python value type. This PR updates the query_type argument to have semantics consistent with Databricks Vector Search while maintaining backwards compatibility with the old argument values.
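
A minimal sketch of those semantics, assuming the legacy values "text" and "vector" are mapped to the default ANN algorithm. The helper `normalize_query` and its return shape are hypothetical and only illustrate the idea; they are not the code in this PR.

```python
from typing import List, Tuple, Union

# Old values described the kind of input; new values name the search algorithm.
_LEGACY_QUERY_TYPES = {"text", "vector"}
_SEARCH_ALGORITHMS = {"ANN", "HYBRID"}


def normalize_query(query: Union[str, List[float]], query_type: str = "ANN") -> Tuple[str, str]:
    """Hypothetical helper returning (input_kind, search_algorithm)."""
    # Infer the kind of input from the Python value, not from query_type.
    if isinstance(query, str):
        input_kind = "text"
    elif isinstance(query, list) and all(isinstance(x, (int, float)) for x in query):
        input_kind = "vector"
    else:
        raise ValueError("query must be a string or a list of numbers")

    # Assumption: legacy values are still accepted and fall back to ANN.
    if query_type.lower() in _LEGACY_QUERY_TYPES:
        return input_kind, "ANN"

    algorithm = query_type.upper()
    if algorithm not in _SEARCH_ALGORITHMS:
        raise ValueError(f"unsupported query_type: {query_type!r}")
    return input_kind, algorithm
```

Under these assumptions, an old-style call such as `normalize_query([1, 2, 3], query_type="vector")` keeps working, while new callers pass "ANN" or "hybrid".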

dbczumar · 2024-09-30T03:40:33Z

dspy/retrieve/databricks_rm.py

        self.databricks_index_name = databricks_index_name
-        self.columns = columns
+        self.columns = list({docs_id_column_name, text_column_name, *(columns or [])})


Automatically include the user-specified docs_id_column and text_column in the list of columns to retrieve from the vector search index (self.columns). Previously, users had to specify the docs ID and text columns in 2 places (the corresponding arg and the columns arg). If the user forgot to specify the docs ID / text columns in the columns arg, they would face a confusing error (ID / text column not found). This change fixes that problem. Users should not have to reason about this.
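
For illustration, the expression from the diff can be exercised on its own. The wrapper function `merge_columns` is hypothetical; the body is the same expression as in the change. Note that a set does not preserve order, so the example compares sorted copies.

```python
from typing import List, Optional


def merge_columns(
    docs_id_column_name: str,
    text_column_name: str,
    columns: Optional[List[str]] = None,
) -> List[str]:
    # Same expression as in the diff: the set literal deduplicates names, and
    # `columns or []` tolerates columns=None.
    return list({docs_id_column_name, text_column_name, *(columns or [])})


# Set iteration order is arbitrary, so sort before comparing or printing.
only_required = sorted(merge_columns("id", "text"))
with_extras = sorted(merge_columns("id", "text", ["text", "author"]))
```

Here `only_required` contains just the ID and text columns, and `with_extras` lists "text" once even though it was passed twice.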

dbczumar · 2024-09-30T03:41:01Z

dspy/retrieve/databricks_rm.py

-            for k, v in item.items()
-            if k not in [self.docs_id_column_name, self.text_column_name]
-        }
+        extra_columns = {k: v for k, v in item.items() if k not in [self.docs_id_column_name, self.text_column_name]}


This is just a formatting change from the precommit linter

dbczumar · 2024-09-30T03:41:41Z

dspy/retrieve/databricks_rm.py

            }
        return extra_columns

    def forward(
        self,
        query: Union[str, List[float]],
-        query_type: str = "text",
+        query_type: str = "ANN",


See https://github.com/stanfordnlp/dspy/pull/1564/files#r1780358443

dbczumar · 2024-09-30T03:42:30Z

dspy/retrieve/databricks_rm.py

-                self._extract_doc_ids(doc)
-                for doc in sorted_docs
-            ],
+            doc_ids=[self._extract_doc_ids(doc) for doc in sorted_docs],


This is just a formatting change from the linter

dbczumar · 2024-09-30T03:42:40Z

dspy/retrieve/databricks_rm.py

@@ -217,14 +228,117 @@ def forward(
            items += [item]

        # Sorting results by score in descending order
-        sorted_docs = sorted(items, key=lambda x: x["score"], reverse=True)[:self.k]
+        sorted_docs = sorted(items, key=lambda x: x["score"], reverse=True)[: self.k]


This is just a formatting change from the linter
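
For reference, a small sketch of what the sorted line does: order items by score, highest first, then keep the first k. The sample items are made up for illustration.

```python
# Made-up results in the same shape the line above expects: dicts with a "score" key.
items = [
    {"id": "a", "score": 0.42},
    {"id": "b", "score": 0.91},
    {"id": "c", "score": 0.17},
    {"id": "d", "score": 0.66},
]
k = 2

# Sort by score in descending order, then slice off the top k entries.
top_k = sorted(items, key=lambda x: x["score"], reverse=True)[:k]
top_ids = [item["id"] for item in top_k]
```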

dbczumar · 2024-09-30T03:42:42Z

dspy/retrieve/databricks_rm.py

-            raise Exception(
-                f"text_column_name: '{self.text_column_name}' is not in the index columns: \n {col_names}"
-            )
+            raise Exception(f"text_column_name: '{self.text_column_name}' is not in the index columns: \n {col_names}")


This is just a formatting change from the linter

dbczumar · 2024-09-30T03:46:21Z

dspy/retrieve/databricks_rm.py


        # Extracting the results
        items = []
-        for idx, data_row in enumerate(results["result"]["data_array"]):


idx was unused

dbczumar · 2024-09-30T04:23:14Z

docs/api/retrieval_model_clients/AzureCognitiveSearch.md

@@ -1,5 +1,5 @@
 ---
-sidebar_position: 2
+sidebar_position: 3


Place DatabricksRM second in the list of retrievers documented at https://dspy-docs.vercel.app/api/category/retrieval-model-clients

dbczumar · 2024-09-30T04:24:04Z

docs/api/retrieval_model_clients/ChromadbRM.md

@@ -18,6 +18,7 @@ ChromadbRM(
 ```

 **Parameters:**
+


These newlines were inserted by the pre-commit linter

okhat · 2024-09-30T14:59:00Z

@krypticmouse Any idea if the docs here are easy to fix before we merge?

krypticmouse · 2024-09-30T16:18:47Z

@okhat @dbczumar The issue is on line 112 of the docs/dspy-usecases.md file, where the link markdown is malformed:

https://github.com/stanfordnlp/dspy/blob/cdbea6eb9accf1f1d2cc3f8ce333502bddf3b8fb/docs/docs/dspy-usecases.md?plain=1#L112C1-L112C143

Change line 112 from this:

| **Langfuse** | [Link]([https://docs.langtrace.ai/supported-integrations/llm-frameworks/dspy](https://langfuse.com/docs/integrations/dspy)) |

To this:

| **Langfuse** | [Link](https://langfuse.com/docs/integrations/dspy) |

okhat · 2024-09-30T17:36:21Z

@arnavsinghvi11 is just checking if this will break any existing documentation on the DB site etc since the changes affect the interface.

krypticmouse · 2024-09-30T17:38:10Z

If the build fails, Vercel won’t merge it, so no changes would be reflected on the existing website.

arnavsinghvi11 · 2024-09-30T17:55:27Z

Thanks @dbczumar for the updates! All looks good.

With the changes regarding https://github.com/stanfordnlp/dspy/pull/1564/files#r1780356012, can we update the DSPy on Databricks blog post to reflect this?

Specifically, how the retriever is configured and how retrieved_results is obtained through DatabricksRM:

retriever = DatabricksRM(
    databricks_index_name = "your_index_name",
    docs_id_column_name="id",
    text_column_name="field2",
    k=3
)
retrieved_results = retriever(query="Example query text", query_type="hybrid")

dbczumar · 2024-09-30T18:05:44Z

Thanks @arnavsinghvi11, absolutely! I'll get that blog updated. I'll also file a follow-up PR with some test coverage, but I've QAed this PR manually.

krypticmouse

Structurally LGTM!

dbczumar added 4 commits September 29, 2024 20:08

fix

328d7fa

Signed-off-by: dbczumar <[email protected]>

fix

2cd97d1

Signed-off-by: dbczumar <[email protected]>

fix

c6b83d3

Signed-off-by: dbczumar <[email protected]>

fix

0310e72

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Sep 30, 2024

fix

9b040a8

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Sep 30, 2024

dbczumar added 3 commits September 29, 2024 20:43

fix

e8b3e35

Signed-off-by: dbczumar <[email protected]>

fix

255e675

Signed-off-by: dbczumar <[email protected]>

fix

c538996

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Sep 30, 2024

dbczumar added 4 commits September 29, 2024 20:47

fix

279f548

Signed-off-by: dbczumar <[email protected]>

fix

fc6fc6d

Signed-off-by: dbczumar <[email protected]>

fix

70da6ac

Signed-off-by: dbczumar <[email protected]>

docs

af8a285

Signed-off-by: dbczumar <[email protected]>

dbczumar commented Sep 30, 2024

dbczumar added 7 commits September 29, 2024 21:25

fix

5fab088

Signed-off-by: dbczumar <[email protected]>

fix

facdf5b

Signed-off-by: dbczumar <[email protected]>

fix

ce2b4d2

Signed-off-by: dbczumar <[email protected]>

fix

1c5b577

Signed-off-by: dbczumar <[email protected]>

fix

c898777

Signed-off-by: dbczumar <[email protected]>

fix

628ec2f

Signed-off-by: dbczumar <[email protected]>

fix

afcb597

Signed-off-by: dbczumar <[email protected]>

okhat requested a review from krypticmouse September 30, 2024 14:59

okhat requested a review from arnavsinghvi11 September 30, 2024 17:32

RM

1778fdc

Signed-off-by: dbczumar <[email protected]>

dbczumar force-pushed the databricks_rm branch from afcb597 to 1778fdc Compare September 30, 2024 18:01

docs

7442f42

Signed-off-by: dbczumar <[email protected]>

dbczumar changed the title ~~DatabricksRM: Use databricks-sdk to fetch token / workspace URL + several small improvements~~ [1] DatabricksRM: Use databricks-sdk to fetch token / workspace URL + several small improvements Sep 30, 2024

fix

2204ec5

Signed-off-by: dbczumar <[email protected]>

dbczumar added 3 commits September 30, 2024 11:06

langfuse fix

926c750

Signed-off-by: dbczumar <[email protected]>

langfuse fix

89aecc4

Signed-off-by: dbczumar <[email protected]>

Reset use cases

ae86c9c

Signed-off-by: dbczumar <[email protected]>

krypticmouse approved these changes Sep 30, 2024

okhat merged commit 59ae987 into stanfordnlp:main Sep 30, 2024
6 checks passed
