Index v2 #4387

Merged
merged 19 commits into from
Apr 18, 2024

JohnMcPMS
Member

@JohnMcPMS JohnMcPMS commented Apr 17, 2024

Change

This change introduces schema 2.0 for the SQLiteIndex. This new major version applies what we learned from how the index is actually used and shifts to a package-centric store. It also moves data that is not directly needed for search and correlation into per-package intermediate manifests. The goal is that search and list functionality (including update probing) should not need the intermediate files; only operations that change the system state (write operations) should.

Using a recent index as a starting point for comparison, the 2.0 index reduced the size of a ZIP archive (containing only the index file and produced by Windows Explorer) by 79%.

This change only implements the index creation. A future change will implement the consumption, including a correct implementation of the 2.0 search functionality. The current search is just copied/commented from a previous schema version for now.

Implementation

The 1.* index schema allows easy updating from individual manifest changes, without the need to inspect any other manifests for the same package. The final 2.* schema will not allow that. Instead, 2.0 actually uses a 1.7 schema internally, with an additional table to track changes so that we can produce the intermediate manifests. When PrepareForPackaging is called, the schema 2.0 tables are created and the data migrated to them. The intermediate manifest files that have changed are also written to disk.
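As a rough illustration of the change-tracking idea, the sketch below keeps the latest change time per package and reports which packages would need their intermediate manifest rewritten. `ChangeTracker`, `RecordChange`, and `ChangedSince` are hypothetical names, and an in-memory map stands in for the real SQLite table, whose layout is not described here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for the change-tracking table.
// The real table and its columns are not shown in this description.
class ChangeTracker
{
public:
    // Remember the most recent change time (Unix epoch seconds) for a package.
    void RecordChange(const std::string& packageId, std::int64_t unixTime)
    {
        auto [it, inserted] = m_lastChange.try_emplace(packageId, unixTime);
        if (!inserted && unixTime > it->second)
        {
            it->second = unixTime;
        }
    }

    // Packages whose latest change is at or after the baseline
    // (assumed comparison; the real rule may differ).
    std::set<std::string> ChangedSince(std::int64_t baseline) const
    {
        std::set<std::string> result;
        for (const auto& [id, when] : m_lastChange)
        {
            if (when >= baseline)
            {
                result.insert(id);
            }
        }
        return result;
    }

private:
    std::map<std::string, std::int64_t> m_lastChange;
};
```

Under this model, a packaging step would only need to regenerate intermediate manifests for the returned set of packages.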

Intermediate Manifests

The intermediate manifest files (PackageVersionDataManifest) are YAML with shortened key names. This saves some bytes, since humans neither author nor read them (except to debug). They are also stored in a compressed stream, using the MSZIP compression algorithm. The compression brings the average size down from 1236 bytes to 463 bytes, and the median down from 338 bytes to 206 bytes.
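Expressed as relative savings, those figures work out to roughly a 62.5% reduction in the average size and roughly a 39% reduction in the median. The helper below (a hypothetical `PercentReduction`, not part of the change) just makes the arithmetic explicit.

```cpp
#include <cassert>
#include <cmath>

// Percentage reduction when a size goes from `before` bytes to `after` bytes.
double PercentReduction(double before, double after)
{
    return (1.0 - after / before) * 100.0;
}

// PercentReduction(1236, 463) is about 62.5 (average size).
// PercentReduction(338, 206) is about 39.1 (median size).
```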

The future consumption change will cache these intermediate manifests (and likely the version manifests as well), allowing their reuse as long as they have not changed.

Additional Functionality

To support our use in creating the index in our services, some additional functionality was added to the SQLiteIndex.

Migration

A schema version migration function is added, allowing the target schema to migrate from the existing one as it sees fit. If the migration is not supported, it can simply return a value indicating that. Only 1.7 => 2.0 migration is implemented.
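A minimal sketch of that contract, assuming a boolean return value signals whether the migration is supported. `SchemaVersion` and `CanMigrateTo_2_0` are hypothetical names for illustration; the actual function signature in SQLiteIndex is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version type for illustration only.
struct SchemaVersion
{
    std::uint32_t Major = 0;
    std::uint32_t Minor = 0;

    bool operator==(const SchemaVersion& other) const
    {
        return Major == other.Major && Minor == other.Minor;
    }
};

// The target schema (2.0) decides whether it can migrate from `current`.
// Returning false stands for "migration not supported".
bool CanMigrateTo_2_0(const SchemaVersion& current)
{
    // Only 1.7 => 2.0 is implemented, per the description above.
    return current == SchemaVersion{ 1, 7 };
}
```

Letting the target schema make the decision keeps older schema code unaware of newer versions.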

Properties

Properties can be set on the index object; some apply to that object only and some are persisted into the database itself. An implicit property of the database file name is stored when appropriate. The caller can set the directory path to output intermediate files to. The caller can also set the time (as a Unix epoch timestamp) to use as the baseline that determines which intermediate files are output; in practice, only an empty string (for "now") and "0" (to output everything) are likely to be used.
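The baseline property could be interpreted along these lines. This is a sketch under assumptions: `ResolveBaselineTime` and `ShouldOutput` are hypothetical helpers, the value is treated as seconds, and "changed at or after the baseline" is an assumed comparison rule.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: turn the property string into Unix epoch seconds.
// An empty string means "now"; "0" makes every file eligible for output.
// std::stoll throws on non-numeric input; real code would validate first.
std::int64_t ResolveBaselineTime(const std::string& property, std::int64_t now)
{
    if (property.empty())
    {
        return now;
    }
    return std::stoll(property);
}

// Assumed rule: output a file when its last change is at or after the baseline.
bool ShouldOutput(std::int64_t lastChange, std::int64_t baseline)
{
    return lastChange >= baseline;
}
```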

Validation

New unit tests are added for the specific features of this change. Existing SQLiteIndex tests were updated to give coverage on the schema 2.0 changes.


@JohnMcPMS JohnMcPMS requested a review from a team as a code owner April 17, 2024 23:49
msftrubengu
msftrubengu previously approved these changes Apr 18, 2024

namespace AppInstaller::Manifest
{
static constexpr std::string_view s_FieldName_SchemaVersion = "sV"sv;
Contributor

@msftrubengu msftrubengu Apr 18, 2024

I think it would be nice to add a comment here on why the keys need to be like this. Basically copy-paste that part from the PR description.

@JohnMcPMS JohnMcPMS merged commit 7dcc3c3 into microsoft:master Apr 18, 2024
8 checks passed
@JohnMcPMS JohnMcPMS deleted the index-v2 branch April 18, 2024 23:31

2 participants