Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Selecting a different DB doesn't release the cache directory lock on the original DB #249

Closed
henrymercer opened this issue Feb 26, 2020 · 22 comments · Fixed by #681
Closed
Labels
bug Something isn't working Complexity: High Requires careful design or review. VSCode

Comments

@henrymercer
Copy link
Contributor

Describe the bug
Selecting a new DB doesn't release the cache directory lock on the originally selected DB, causing query runs on the originally selected DB from the CLI to fail.

To Reproduce

  1. Add two databases, database A and database B to the databases view in the CodeQL extension using the "+" button
  2. Select database A
  3. Run a query on database A
  4. Select database B
  5. Using the CodeQL CLI run a query on database A
  6. Observe that database A is still locked:

    A fatal error occurred: Error initializing the IMB disk cache: the cache directory is already locked by another running process. Only one instance of the IMB can access a cache directory at a time. The lock file is located at /Users/henry/databases/ppppbn_boardgame_ff33297/db-javascript/default/cache/.lock (eventual cause: OverlappingFileLockException)

Expected behavior
Selecting database B should unlock the cache directory of database A, so that the query from the CLI succeeds.

Additional context
The cache directory of database A will be unlocked when a query is run from the extension on database B.

@henrymercer henrymercer added the bug Something isn't working label Feb 26, 2020
@jcreedcmu
Copy link
Contributor

So do we want additional functionality from the cli to fix this? Does actually running a query on the newly selected database release the old db's lock and acquire the new one?

@henrymercer
Copy link
Contributor Author

So do we want additional functionality from the cli to fix this?

I'm unfamiliar so am not sure whether this would require additional queryserver functionality.

Does actually running a query on the newly selected database release the old db's lock and acquire the new one?

Once a query has been run from the newly selected database, running a query on the originally selected database from the CodeQL CLI succeeds. The lockfile seems to be preserved, however.

@aeisenberg
Copy link
Contributor

OK. Reproduced your problem. If there is a cli or query server command to remove a process's lock file, that would be what we need. Though, I don't think there is one.

@hmakholm do you have any suggestions on what we can do here?

@hmakholm
Copy link
Contributor

hmakholm commented Mar 23, 2020

The lock ought to be dropped automatically when the MemoryBackend that refers to the old database is closed. If the problem can be reproduced without provoking any particular evaluator error first, it must be that somehow the MemoryBackend doesn't get closed when we change database in the query server -- which is bad for a whole range of reasons.

@hmakholm
Copy link
Contributor

hmakholm commented Mar 23, 2020

On the other hand, it doesn't look like the code that creates MemoryBackends ought to be capable of not closing the old backend. The alternative would be that FileSystemLock#release() somehow doesn't work.

Which platform are you reproducing on?

@aeisenberg
Copy link
Contributor

I'm on mac.

@aeisenberg
Copy link
Contributor

Would it make sense for the extension to explicitly delete the lock file when databases change? I guess it would require some sleuthing because the lock file name looks partially generated.

@aeisenberg
Copy link
Contributor

And to be clear, when the extension changes databases, it doesn't actually do anything from the cli/queryserver perspective. It is only updating its internal state so that it can handle future queries.

@hmakholm
Copy link
Contributor

And to be clear, when the extension changes databases, it doesn't actually do anything from the cli/queryserver perspective. It is only updating its internal state so that it can handle future queries.

Ah, that's it; I see what you mean. I'll cancel my panic over something potentially being wrong with the locking implementation. 😁

Yes I think we should add a "select database" or "database changed" command in the query server protocol so it can close its MemoryBackend eagerly when that happens.

@alexet, you have a better knowledge of the protocol -- is there a way to add a new command in a backward-compatible way, or would we need to have the extension vary its behavior based on the CLI version?

(Note that this would not entail deleting the lock file, but simply relinquishing its OS-level exclusive access to the file. This is safer than having the mere presence of the file mean "locked", because the OS will implicitly drop the lock when the owning process terminates for any reason, so it should be impossible to end up with a stale lock after a process is killed).

@alexet
Copy link
Contributor

alexet commented Mar 24, 2020

The problem with my initial implementation of the command was that due to the queue it was very hard to get a visual indication of whether a database was locked. The following sequence was problematic.

  • User selects database
  • user runs a query
  • query starts compiling
  • user selects another database
  • message was sent to close the database but it was already closed.
  • query finishes compiling
  • run command is now sent which opens the database.
  • command finishes but database is kept open in anticipation of new commands.

The protocol was designed to be externally stateless as most of the bugs with the older protocol were to do with disagreement about the state. It was mostly fine when everything was synchronous and serialised but the ability to run more than one query at once caused problems. However problems like these make me wonder if we should have had a more stateful design where the onus would be on the client to ensure that we don't change database while a query is running.

Note that due to the parallelism you can end up doing things like run two queries on two different databases and they end up running in the opposite order to what you ran (because we compile queries in parallel but we avoid opening multiple databases). This makes the system much harder to design.

I am tempted for vscode to run in a mode that always closes the database instantly. This would be a major performance hit but would avoid these problems.

For adding a new command you can kind-of just run the command and get an error back for a missing command but it is probably easier to depend on the cli version.

@hmakholm
Copy link
Contributor

Note that due to the parallelism you can end up doing things like run two queries on two different databases and they end up running in the opposite order to what you ran (because we compile queries in parallel but we avoid opening multiple databases).

Hmm, won't that mean that we have multiple active MemoryBackends each with their own MemoryManager that thinks it can use all the configured space? That doesn't sound good.

@alexet
Copy link
Contributor

alexet commented Mar 24, 2020

No we avoid that, we just compile the queries in parallel before obtaining the backend lock. So whichever query compiles first claims the backend lock which may not be in the order which the user ran the queries.

@adityasharad
Copy link
Contributor

However problems like these make me wonder if we should have had a more stateful design where the onus would be on the client to ensure that we don't change database while a query is running.

A point in defence of your stateless design: we did put this onus on the client in the Visual Studio extension, and I remember it was really hard to maintain the client-side database locking, so we moved the onus to the query server instead.

@hmakholm
Copy link
Contributor

A possible design might be to add messages that let the extension inform the query server of what the currently selected database is. Then the query server would shut down a backend when it (a) is idle, and (b) doesn't match what we have been told is the selected database. Everything else stays as it is now, so if we lose synchronization between the two ends, the behavior degrades gracefully.

@aeisenberg
Copy link
Contributor

So, the extension just needs to message the query server any time the database changes?

And if a request comes in from the cli for a different database, then the lock file will be transparently removed if the query server is idle?

I think that makes sense from my perspective.

@hmakholm
Copy link
Contributor

So, the extension just needs to message the query server any time the database changes?

Well yes, and the query server would need to implement its end of the design 😄

@jcreedcmu
Copy link
Contributor

Still blocking on lock-releasing or cache-clearing functionality in cli.

@hmakholm
Copy link
Contributor

We should probably discuss at the team meeting tomorrow what the plan is -- I have the impression everybody is waiting for someone else to pick this up.

@nickrolfe
Copy link
Contributor

This has been a minor frustration for me in my recent work, so I'd be delighted if you could fix it.

One additional pain point that I didn't see mentioned: for Windows users, the open lock also prevents you from deleting the database directory.

@lcartey
Copy link

lcartey commented Nov 6, 2020

This has also been a frustration for me recently as well, particularly with respect to tests:

  1. Run a test which fails
  2. Use the generated test database
  3. Fix the query
  4. Switch to a different database
  5. Re-run the test

Which then fails.

@aeisenberg
Copy link
Contributor

OK. The solution in #681 is a partial fix. Now when a database is removed from the workspace, the lock is removed as well.

I think we need to add a third query server command (in addition to register and deregister databases), called unlock database. It will be similar to deregister in that the database will be closed, but it will not actually remove the database from the allow-list.

@aeisenberg
Copy link
Contributor

After discussion with @hmakholm and @alexet we've decided that we will not explicitly try to solve this workflow. The workflow that we are solving is that we want to guarantee that if you remove a database from vscode, you can always run a query on it.

If you don't remove a db from vscode, the lock may be released under certain circumstances, but this is not a workflow we are encouraging or documenting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Complexity: High Requires careful design or review. VSCode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants