
Stabilize TextSearchProvider API #59921

Open
roblourens opened this issue Oct 3, 2018 · 75 comments
Labels
api-proposal feature-request Request for new features or functionality on-testplan search Search widget and operation issues search-api

Comments

@roblourens
Member

roblourens commented Oct 3, 2018

Master issue to track stabilizing the TextSearchProvider extension API...

Forked from #47058

Depends on

@roblourens roblourens added search Search widget and operation issues api-finalization labels Oct 3, 2018
@roblourens roblourens self-assigned this Oct 3, 2018
@roblourens roblourens added the feature-request Request for new features or functionality label Nov 9, 2018
@suntobright

Looking forward to progress.

@mostafaeweda

Any progress on moving this to the stable API?

@gjsjohnmurray
Contributor

Is there anything preventing this from happening? Both of the issues it's shown as depending on are already closed.

@roblourens
Member Author

This still isn't ready to be stabilized, I don't have any ETA, sorry.

@eamodio
Contributor

eamodio commented Jun 7, 2019

There needs to be a 😢 reaction

@gjsjohnmurray
Contributor

Any progress to report on getting this finalized?

@roblourens roblourens added this to the Backlog milestone Oct 27, 2019
@gjsjohnmurray
Contributor

Please put this on the 2020 roadmap, preferably sooner rather than later.

@gjsjohnmurray
Contributor

@roblourens 🙏 can this get some love soon? Those of us building extensions that implement FileSystemProvider can only offer search if we get our users to (a) use Insiders, (b) download and install a VSIX, and (c) launch Insiders with the correct --enable-proposed-api argument.

It is a small relief that (c) can now be achieved using argv.json, but what we really need is the API finalized.

@NightRa

NightRa commented Aug 21, 2020

I'm interested in the stabilization of this as well.
What's blocking this, and how can we help?

@roblourens
Member Author

Sorry, it still needs a lot of thinking and I won't get to it in the near future

@gjsjohnmurray
Contributor

> Sorry, it still needs a lot of thinking and I won't get to it in the near future

@roblourens this is disheartening news for me, and likely for others trying to leverage FileSystemProvider and take VSCode into new domains. Is there any way we can help? What do you see as the outstanding problems with the proposed API?

@CompuIves

Heyo! I've been following this issue for a while, and I was wondering whether there's a better idea by now of when this API will be stabilized, seeing that quite a few extensions already use it, including Microsoft extensions in the Marketplace.

@gjsjohnmurray
Contributor

@roblourens please can we blow the dust off this and get it into Stable? Or else get an understanding of what's holding it up? IMO, the more FileSystemProvider implementations (aka virtual filesystems) show up, the more important it becomes to resolve this.

@caleb-allen

caleb-allen commented Aug 2, 2024

Hey @andreamah, thanks for your great work on this.

I've been developing a "semantics" search system called TSearch, and I'm curious if it would qualify as a TextSearchProvider, or if it perhaps falls outside the scope of this API. Would love to hear your thoughts.

Rather than taking the "code words" and generating an index of text, it constructs an index where each code word is encoded with its semantic "context".

Take, for example, this JavaScript snippet:

function hello(name) {
   console.log("Hello", name);
}

In addition to including the word hello, the index also encodes the semantic use of hello; in this case, it is a function name. The point of it all is to let somebody search not just for textual instances of hello, but specifically for functions called hello, or for variables, string literals, etc., and to do so across very large projects.
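To make the idea concrete, here is a minimal sketch of an index keyed by both a word and its semantic kind. The names (SemanticIndex, SymbolKind, Occurrence) are hypothetical illustrations, not TSearch's actual design, and extracting the kind from real source code (which would need a parser) is assumed to happen elsewhere.

```typescript
// Hypothetical sketch: an index keyed by (semantic kind, word).
// Not TSearch's real implementation; kinds would come from a parser in practice.
type SymbolKind = "function" | "variable" | "string";

interface Occurrence {
  file: string;
  line: number;
}

class SemanticIndex {
  private entries = new Map<string, Occurrence[]>();

  private static key(kind: SymbolKind, word: string): string {
    return `${kind}\u0000${word}`; // NUL separator keeps kind and word from colliding
  }

  add(kind: SymbolKind, word: string, occ: Occurrence): void {
    const k = SemanticIndex.key(kind, word);
    const list = this.entries.get(k) ?? [];
    list.push(occ);
    this.entries.set(k, list);
  }

  // Find occurrences of `word` used as `kind`, e.g. only functions named "hello".
  find(kind: SymbolKind, word: string): Occurrence[] {
    return this.entries.get(SemanticIndex.key(kind, word)) ?? [];
  }
}
```

For the snippet above, hello would be indexed as a function on line 1, while "Hello" and name on line 2 would be indexed as a string literal and a variable, so a lookup for variables named hello returns nothing.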

Anyways, the reason I'm describing all this is that the "query language" used to search this index isn't exactly text: it has a few simple rules and operators to distinguish which parts of a query are for context ("function") and which are for content ("hello"). It's not as gnarly as regex, not even close, but it's definitely not "just" text.

I see that TextSearchQuery has the property isRegExp, which explicitly covers at least one kind of search that isn't strictly text. My question is this: is this API seen as supporting a more general function of "pattern matching", where text and regex are simply two implementations? If not, is such a design something that might be desired? As a (rough, uninformed) idea, perhaps a patternEngine property on TextSearchQuery would suffice to open the API up to a much larger feature set. patternEngine would default to ["text", "regex"] and support the same behavior and API that exists today, e.g. isRegExp = (patternEngine == "regex"), but would explicitly tease out regex as simply a default implementation of a more general concept: that of a pattern engine.
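As a rough illustration of the patternEngine idea, the sketch below treats plain text and regex as two implementations of one matcher interface, with today's isRegExp flag choosing between them when no engine is named. Everything here (PatternEngine, findAll, engineFor, HypotheticalQuery, and the patternEngine field) is hypothetical; none of it is part of the VS Code API.

```typescript
// Hypothetical sketch of the "pattern engine" idea; none of these names exist in the VS Code API.
interface PatternEngine {
  id: string;
  // Returns [start, end) offsets of every non-empty match of `pattern` in `line`.
  findAll(pattern: string, line: string): [number, number][];
}

const textEngine: PatternEngine = {
  id: "text",
  findAll(pattern: string, line: string): [number, number][] {
    const hits: [number, number][] = [];
    if (pattern.length === 0) return hits;
    for (let i = line.indexOf(pattern); i !== -1; i = line.indexOf(pattern, i + pattern.length)) {
      hits.push([i, i + pattern.length]);
    }
    return hits;
  },
};

const regexEngine: PatternEngine = {
  id: "regex",
  findAll(pattern: string, line: string): [number, number][] {
    const hits: [number, number][] = [];
    const re = new RegExp(pattern, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(line)) !== null) {
      if (m[0].length === 0) {
        re.lastIndex++; // step past empty matches to avoid an infinite loop
        continue;
      }
      hits.push([m.index, m.index + m[0].length]);
    }
    return hits;
  },
};

// Only `pattern` and `isRegExp` mirror TextSearchQuery; `patternEngine` is the hypothetical addition.
interface HypotheticalQuery {
  pattern: string;
  isRegExp?: boolean;
  patternEngine?: string;
}

// An explicitly named engine wins; otherwise today's boolean selects one of the two built-ins.
function engineFor(query: HypotheticalQuery, extraEngines: PatternEngine[] = []): PatternEngine {
  const id = query.patternEngine ?? (query.isRegExp ? "regex" : "text");
  const engine = [textEngine, regexEngine, ...extraEngines].find((e) => e.id === id);
  if (engine === undefined) {
    throw new Error(`Unknown pattern engine: ${id}`);
  }
  return engine;
}
```

With this shape, an existing query with isRegExp: true behaves as before, while a query naming a registered engine (for example a hypothetical "semantic" one) would be routed to it.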

Regardless, I don't think my question needs an answer before the TextSearchProvider API can be stabilized (I don't think it'd be a breaking change anyways). I'm primarily interested in the conversation about whether this lower-level part of search is of interest as a surface for extension.

Let me know if any of this needs clarification. Thanks!

@andreamah
Contributor

> @andreamah In version 1.92 I see new proposed APIs textSearchProviderNew and fileSearchProviderNew. Are these ready for extension authors to adopt so we can evaluate the new API surface?

This isn't ready to be consumed; I'm just creating these so that I can start changing the internal implementation without affecting the existing proposed APIs (since some internal extensions currently use them).

@isc-bsaviano

Thanks for the clarification and for all of your hard work to help make this a reality!

@andreamah
Contributor

andreamah commented Aug 2, 2024

@caleb-allen Great question!

For the most part, this API is meant to be used simply to search for 'text' as-is. For example, if you create a custom filesystem, this API lets VS Code understand how to get search results from your project. That being said, it was not necessarily created to facilitate a special or 'intelligent' search that requires custom options. The reason the options include things like isRegExp is that the UI (aka the search view on the sidebar) has a button for it, which drives what info we send to the API. If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

@caleb-allen

> If we introduced more/alternative options, this would preferably match changes in the UI. We want to keep the options simple, as that is what the user expects out of the search view (for now). Also, we only allow one provider per file scheme, so regular text would lose the traditional ripgrep text results if you overwrote our default provider for text with your own.

I see, this clarifies things a lot, thank you!

It seems that the behavior I'm trying to construct may be better achieved with other APIs, as an "enhancement" to search, rather than by modifying the deeper plumbing of all search.

Thanks for your answer!

@andreamah
Contributor

Update on the progress of this: the internal implementation has been changed to support everything in the last update, minus the TextSearchResult changes. The team has been discussing the best way to express TextSearchResult, namely whether it should remain as:

type TextSearchResult = TextSearchContext | TextSearchMatch

or whether it should look more like the earlier proposal:

interface TextSearchResult {
  uri: Uri;
  match: TextSearchMatch;
  surroundingContext: {
    text: string;
    lineNumber: number;
  }[];
}

We ultimately settled on something similar to the latter approach, as it was friendlier to consume when using the findTextInFiles API (although it puts more work on the provider). Also, when the team tested it, there was a lot of confusion about what context was and why it was returned separately from the match it actually related to. We concluded that it should be the provider's job to work out which context relates to which matches (with de-duplication on the VS Code core side where necessary, as an optimization).
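A minimal sketch of what "the provider works out which context relates to which matches" could look like, assuming the provider has already collected flat match lines and flat context lines (roughly the old TextSearchMatch | TextSearchContext split). The types and the attachContext function are hypothetical simplifications that use string URIs and plain line numbers instead of vscode.Uri and Range. One context line can belong to two nearby matches, which is where core-side de-duplication would come in.

```typescript
// Hypothetical, simplified stand-ins for the proposal's types (no vscode import).
interface LineHit {
  uri: string;
  lineNumber: number;
  text: string;
}

interface GroupedResult {
  uri: string;
  lineNumber: number; // line of the match
  text: string; // text of the matched line
  contextLines: { text: string; lineNumber: number }[];
}

// Attach to each match every context line in the same file within numContextLines of it.
function attachContext(matches: LineHit[], context: LineHit[], numContextLines: number): GroupedResult[] {
  return matches.map((m) => ({
    uri: m.uri,
    lineNumber: m.lineNumber,
    text: m.text,
    contextLines: context
      .filter((c) => c.uri === m.uri && Math.abs(c.lineNumber - m.lineNumber) <= numContextLines)
      .sort((a, b) => a.lineNumber - b.lineNumber) // filter() returned a copy, so this does not mutate `context`
      .map((c) => ({ text: c.text, lineNumber: c.lineNumber })),
  }));
}
```

This naive version is O(matches × context lines); a real provider would more likely group while streaming results, since ripgrep emits context adjacent to its match.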

After a bit more iteration on the design of the latter, we arrived at something like

interface TextSearchResult {
  uri: Uri;
  range: Range; // target range
  preview: {text: string; location: {start: number, end: number}};
  contextLines: {text: string; lineNumber: number}[];
}

This seemed clearer to consumers.

Right now, there's some discussion on whether we should have something like

interface TextSearchResult {
  uri: Uri;
  range: Range; // target range
  preview: {text: string; location: {start: number, end: number}};
}

Here, the context lines are included in the preview; they could then be separated from the preview text by shaving numContextLines lines off the start and end. This is a simpler way of getting the result (clearer to consumers), but it will likely add overhead to search performance, notably in the search editor. If anyone has an opinion about this, please let me know!
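To illustrate the consumer side of this trade-off, here is a minimal sketch of splitting a preview that includes context back into before/match/after lines. It assumes, hypothetically, that preview.location holds [start, end) character offsets of the match within preview.text and that lines are separated by "\n". Rather than blindly removing numContextLines from each end, it locates the match lines first, because fewer context lines may exist near the start or end of a file.

```typescript
// Hypothetical shape mirroring the proposal in this thread; not the real vscode API.
interface PreviewWithContext {
  text: string;
  location: { start: number; end: number }; // assumed: [start, end) offsets of the match within `text`
}

interface SplitPreview {
  before: string[];
  matchLines: string[];
  after: string[];
}

function splitPreview(preview: PreviewWithContext, numContextLines: number): SplitPreview {
  const lines = preview.text.split("\n");
  // Map a character offset to the index of the line that contains it.
  const lineOf = (offset: number): number => {
    let consumed = 0;
    for (let i = 0; i < lines.length; i++) {
      consumed += lines[i].length + 1; // +1 for the "\n" removed by split
      if (offset < consumed) return i;
    }
    return lines.length - 1;
  };
  const first = lineOf(preview.location.start);
  const last = lineOf(Math.max(preview.location.start, preview.location.end - 1));
  return {
    // Take at most numContextLines on each side; fewer may exist near file edges.
    before: lines.slice(Math.max(0, first - numContextLines), first),
    matchLines: lines.slice(first, last + 1),
    after: lines.slice(last + 1, last + 1 + numContextLines),
  };
}
```

Every consumer would have to repeat this splitting for every result, which is one way to see why the combined shape is simpler to describe but may cost performance in places like the search editor.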

However, the next steps involve the plumbing to support associating matches with context earlier in the implementation. Since our ripgrep text search has its own provider, we will also need to change how it serves context and matches to the main thread via the API.

Full transparency: my time has been a bit split between this and other things, as some team priorities are more urgent. That being said, the timeline is slower than initially anticipated.

I've tried to get some of the team testing these APIs (with a version that still splits TextSearchMatch and TextSearchContext), and am now iterating on that feedback.

@isc-bsaviano

Thanks for the update @andreamah! I assume we should hold off on updating our extensions until you and team have decided on the new shape?

@andreamah
Contributor

Yes, that would be great! Thank you @isc-bsaviano :)

@richpodraza

I'm very interested in this API for writing an integration with OpenGrok. The way I imagine it working is as a replacement for, or parallel to, Find in Files (Ctrl+Shift+F). Ideally, the basic inputs that OpenGrok supports would be text input fields at the top of a left-hand side panel, the same way Find in Files works today. The results would then be displayed below, so that clicking a search result finds and opens the LOCAL file in a new tab, similar to how Find in Files currently behaves when a user clicks a search result.

Of course, configuration would be needed to provide the OpenGrok host, username and password, plus some mapping of a local folder designated as the project root folder which matches an OpenGrok project. This would be needed so that OpenGrok search results could direct VSCode to open the local file as opposed to the web GUI for OpenGrok.
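The local-folder mapping described above could be as simple as the following sketch. It assumes (unverified) that OpenGrok result paths take the form /<project>/<relative path> and that local roots use forward slashes; toLocalPath and the projectRoots map are hypothetical names, not OpenGrok or VS Code APIs.

```typescript
// Hypothetical mapping from an OpenGrok-style result path to a local file path.
// Assumes result paths look like "/<project>/<relative/path>"; verify against your server.
function toLocalPath(resultPath: string, projectRoots: Record<string, string>): string | undefined {
  const segments = resultPath.split("/").filter((s) => s.length > 0);
  const project = segments.shift();
  // Only map projects the user has configured a local root for (ignore inherited keys like "toString").
  if (project === undefined || !Object.prototype.hasOwnProperty.call(projectRoots, project)) {
    return undefined;
  }
  const root = projectRoots[project].replace(/\/+$/, ""); // drop trailing slashes
  return segments.length > 0 ? `${root}/${segments.join("/")}` : root;
}
```

Returning undefined for unmapped projects leaves the caller free to fall back to the OpenGrok web UI for files with no local checkout.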

The main goal behind this would be to leverage a highly-indexed search engine replacement for Find in Files for large code bases where Find in Files can be very slow. Does this sound like a use case that TextSearchProvider would support?

@caleb-allen

@richpodraza you may be interested in #230337, your use case sounds like the feature set discussed on that issue

@rebornix rebornix assigned osortega and unassigned andreamah Dec 11, 2024