[RFC - do not merge][filecache] refine and correct cache docs #1518

Open: wants to merge 3 commits into base: master
11 changes: 10 additions & 1 deletion docs/compute-storage-decoupled/file-cache.md
@@ -56,7 +56,13 @@ In a decoupled architecture, data is stored in remote storage. The Doris databas

### Eviction

To maximize the use of cache space for data access acceleration, Doris uses as much of the available cache space as possible. Cache eviction is triggered in two scenarios: when remote storage data is deleted, and when a write finds insufficient cache space. When remote storage data is deleted, the corresponding data in the cache is removed directly. When a write finds insufficient cache space, eviction proceeds in the order Disposable, Normal Data, Index, TTL. Data in the TTL queue is moved to the Normal Data queue when it expires.
All cache types share the total cache space, which is divided among them in proportions that reflect their importance. The proportions can be set through `file_cache_path` in the `be` configuration file; the default is TTL : Normal : Index : Disposable = 50% : 30% : 10% : 10%.

These proportions are not hard limits; Doris adjusts them dynamically to make full use of the space. For example, if no TTL cache is in use, the other types can exceed their preset proportions and use the space originally allocated to TTL.
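
To make the proportions concrete, the following minimal Python sketch converts the default split into per-queue byte budgets for a hypothetical 20 GiB cache directory. The helper name `queue_budgets` and the rounding down are illustrative assumptions, not Doris internals, and the resulting numbers are soft budgets, not hard limits.

```python
# Default split quoted above: TTL 50%, Normal 30%, Index 10%, Disposable 10%.
DEFAULT_PERCENT = {"ttl": 50, "normal": 30, "index": 10, "disposable": 10}


def queue_budgets(total_size, percent=DEFAULT_PERCENT):
    """Return the soft byte budget of each queue (hypothetical helper)."""
    if sum(percent.values()) != 100:
        raise ValueError("this sketch expects the percentages to add up to 100")
    # Integer arithmetic, rounded down, so the budgets never exceed total_size.
    return {name: total_size * p // 100 for name, p in percent.items()}


budgets = queue_budgets(21474836480)  # 20 GiB, a hypothetical total size
for name, size in budgets.items():
    print(f"{name:<10} {size / 2**30:.0f} GiB")
```

With these inputs the TTL queue is budgeted 10 GiB, Normal 6 GiB, and Index and Disposable 2 GiB each.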

Cache eviction is triggered in two situations: garbage collection and insufficient cache space. When users delete data or a compaction task finishes, the expired cache data is evicted asynchronously. When there is not enough space to write to the cache, eviction follows the order Disposable, Normal Data, Index, TTL. For instance, if there is not enough space to write Normal Data, Doris evicts some Disposable, Index, and TTL data in turn, each in LRU order. Doris does not evict all data of one type before moving on to the next; it retains at least the proportion configured for each type so that the other types keep working. If this does not free enough space, the type being written evicts its own data in LRU order. For example, if writing Normal Data cannot free enough space from the other types, Normal Data then evicts its own entries in LRU order.

Data in the TTL queue carries an expiration time: once it expires, it is moved to the Normal Data queue and takes part in eviction as Normal Data.
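
The space-based eviction order described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Doris implementation: the class `FileCacheSketch`, its method names, and the use of the default percentages as retained floors are all hypothetical, reads never refresh LRU positions, and TTL expiry is not modeled.

```python
from collections import OrderedDict

EVICT_ORDER = ["disposable", "normal", "index", "ttl"]
PERCENT = {"ttl": 50, "normal": 30, "index": 10, "disposable": 10}


class FileCacheSketch:
    """Toy model of space-based eviction (hypothetical, not Doris code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # One LRU queue per type: oldest entry first, values are block sizes.
        self.queues = {t: OrderedDict() for t in EVICT_ORDER}

    def used(self, cache_type=None):
        queues = [self.queues[cache_type]] if cache_type else self.queues.values()
        return sum(sum(q.values()) for q in queues)

    def _evict_lru(self, cache_type, needed, floor):
        """Evict oldest blocks of one type while it stays at or above `floor`."""
        freed = 0
        queue = self.queues[cache_type]
        while queue and freed < needed:
            key, size = next(iter(queue.items()))
            if self.used(cache_type) - size < floor:
                break
            del queue[key]
            freed += size
        return freed

    def put(self, cache_type, key, size):
        if size > self.capacity:
            return False
        needed = self.used() + size - self.capacity
        # Step 1: take space from the other types, leaving each its share.
        for other in EVICT_ORDER:
            if needed <= 0:
                break
            if other != cache_type:
                floor = self.capacity * PERCENT[other] // 100
                needed -= self._evict_lru(other, needed, floor)
        # Step 2: fall back to the written type's own LRU queue.
        if needed > 0:
            needed -= self._evict_lru(cache_type, needed, floor=0)
        if needed > 0:
            return False  # a real system would check feasibility before evicting
        self.queues[cache_type][key] = size
        return True


cache = FileCacheSketch(capacity=100)
for i in range(1, 4):
    cache.put("disposable", f"d{i}", 10)
for i in range(1, 11):
    cache.put("normal", f"n{i}", 10)
print(list(cache.queues["disposable"]))  # ['d3']
print(list(cache.queues["normal"]))
```

In this run the writes of `n8` and `n9` push out `d1` and `d2`, the write of `n10` finds Disposable already at its retained share, and Normal Data therefore evicts its own oldest block `n1`.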

## Cache Warming

@@ -101,6 +107,9 @@ file_cache_total_evict_size | Total amount of data evicted from the entire File
file_cache_ttl_cache_evict_size | Total amount of data evicted from the TTL queue since startup
file_cache_ttl_cache_lru_queue_element_count | Current number of elements in the TTL queue
file_cache_ttl_cache_size | Current size of the TTL queue
file_cache_evict_by_heat\_[A]\_to\_[B] | Data from cache type A evicted due to cache type B (time-based expiration)
file_cache_evict_by_size\_[A]\_to\_[B] | Data from cache type A evicted due to cache type B (space-based expiration)
file_cache_evict_by_self_lru\_[A] | Data from cache type A evicted by its own LRU policy for new data
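
As an illustration of how the `[A]` and `[B]` placeholders in the three rows above expand, the sketch below parses a concrete metric name back into its parts. The lowercase type names (`disposable`, `normal`, `index`, `ttl`) and the helper `parse_evict_metric` are assumptions; the table only shows the placeholders, so check the names your BE actually exports.

```python
import re

# Assumed lowercase type names; the table above only shows [A] and [B].
CACHE_TYPES = ("disposable", "normal", "index", "ttl")
_TYPES = "|".join(CACHE_TYPES)
PATTERN = re.compile(
    rf"^file_cache_evict_by_(?:(?P<reason>heat|size)_(?P<a>{_TYPES})_to_(?P<b>{_TYPES})"
    rf"|self_lru_(?P<self>{_TYPES}))$"
)


def parse_evict_metric(name):
    """Return (reason, evicted_type, writing_type), or None if not an evict metric."""
    m = PATTERN.match(name)
    if m is None:
        return None
    if m.group("self"):
        # Self-LRU eviction: the type evicts its own data for its own writes.
        return ("self_lru", m.group("self"), m.group("self"))
    return (m.group("reason"), m.group("a"), m.group("b"))


print(parse_evict_metric("file_cache_evict_by_size_disposable_to_normal"))
print(parse_evict_metric("file_cache_evict_by_self_lru_ttl"))
```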

### SQL Profile

@@ -34,7 +34,7 @@ under the License.

Pre-compiled binaries (containing all Doris modules) are available from the [Doris download page](https://doris.apache.org/download/) (choose version 3.0.2 or later).

### 2.2 Build Output(Optional)
### 2.2 Build Output (Optional)

Compile with the `build.sh` script that ships with the repository. The newly added MS module is built with the `--cloud` flag.

@@ -178,7 +178,6 @@ bin/start.sh --meta-service --daemon
```bash
bin/start_fe.sh --daemon
```

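The first FE process initializes the cluster and works in the FOLLOWER role. Connect to the FE with a mysql client and run `show frontends` to confirm that the FE you just started is the master.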

### 5.3 Add Other FE Nodes
@@ -191,7 +190,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";

Replace `host:port` with the actual address and edit log port of the FE node. For more information, see [ADD FOLLOWER](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-FOLLOWER.md) and [ADD OBSERVER](../sql-manual/sql-statements/Cluster-Management-Statements/ALTER-SYSTEM-ADD-OBSERVER.md).

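In a production environment, make sure that the total number of frontend (FE) nodes in the FOLLOWER role, including the first FE, stays odd. In general, three FOLLOWERs are enough. There can be any number of frontend nodes in the observer role.
In a production environment, make sure that the total number of frontend (FE) nodes in the FOLLOWER role, including the first FE, stays odd. In general, three FOLLOWERs are enough. There can be any number of frontend nodes in the OBSERVER role.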
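
The odd-number recommendation follows from majority-quorum arithmetic. As a rough illustration, assuming FOLLOWERs need a strict majority to keep a master elected (the helper name is hypothetical):

```python
def tolerated_failures(followers):
    """Number of FOLLOWERs that can fail while a strict majority survives."""
    majority = followers // 2 + 1
    return followers - majority


for n in (1, 2, 3, 4, 5):
    print(n, "FOLLOWER(s) tolerate", tolerated_failures(n), "failure(s)")
```

Under that assumption a fourth FOLLOWER tolerates no more failures than three do, which is why an even count adds cost without adding availability.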

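### 5.4 Add BE Nodes

@@ -205,13 +204,17 @@ ALTER SYSTEM ADD FOLLOWER "host:port";
- Description: Specifies the Doris startup mode.
- Format: `cloud` selects the compute-storage decoupled mode; any other value selects the compute-storage integrated mode.
- Example: `cloud`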

2. `file_cache_path`
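- Description: Disk paths and other parameters for the file cache, given as an array with one entry per disk. `path` specifies the disk path and `total_size` limits the size of the cache; -1 or 0 uses the entire disk space.
- Format: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
- Description: Disk paths and other parameters for the file cache, given as an array with one entry per disk. `path` specifies the disk path and `total_size` limits the size of the cache; 0 uses the entire disk space. Space proportion limits for [each cache type](./file-cache) can also be added.
- Example: [{"path":"/path/to/file_cache","total_size":21474836480},{"path":"/path/to/file_cache2","total_size":21474836480}]
- Example: [{"path":"/path/to/file_cache","total_size":21474836480, "ttl_percent":50, "normal_percent":40, "disposable_percent":5, "index_percent":5}]
- Default: [{"path":"${DORIS_HOME}/file_cache"}]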
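
Because a `file_cache_path` value is a JSON array, it can be sanity-checked before it goes into the BE configuration. The sketch below is a hypothetical pre-flight check, not something Doris ships; in particular, the rule that the four `*_percent` keys must appear together and add up to 100 is an assumption of this example.

```python
import json

PERCENT_KEYS = ("ttl_percent", "normal_percent", "disposable_percent", "index_percent")


def check_file_cache_path(value):
    """Parse a file_cache_path value and return a list of problems found."""
    problems = []
    for i, entry in enumerate(json.loads(value)):
        if "path" not in entry:
            problems.append(f"entry {i}: missing 'path'")
        percents = [entry[k] for k in PERCENT_KEYS if k in entry]
        # Assumption of this sketch: if any percentage is given, all four
        # are given and they add up to 100.
        if percents and (len(percents) != len(PERCENT_KEYS) or sum(percents) != 100):
            problems.append(f"entry {i}: percentages must all be set and sum to 100")
    return problems


example = (
    '[{"path":"/path/to/file_cache","total_size":21474836480,'
    '"ttl_percent":50,"normal_percent":40,"disposable_percent":5,"index_percent":5}]'
)
print(check_file_cache_path(example))  # prints []
```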

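:::info Note

Once `file_cache_path` has been configured and put into use, changing the order of the paths is not recommended; otherwise the BE's cache hit rate will drop for a period after restart. Adding or removing disks causes the same problem. The hit rate recovers gradually as cached data is replaced. A later version will introduce a consistent hashing algorithm to solve this problem.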
:::
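
Why reordering or resizing the path list hurts the hit rate can be illustrated with a small simulation. It assumes, purely for illustration, that each cached block is assigned to a disk by hashing its key modulo the number of paths; the function names and the third path `/path/to/file_cache3` are hypothetical.

```python
import hashlib


def pick_path(block_key, paths):
    """Assumed placement rule: hash the key, take it modulo the number of paths."""
    digest = hashlib.md5(block_key.encode()).digest()
    return paths[int.from_bytes(digest[:8], "big") % len(paths)]


def moved_fraction(keys, old_paths, new_paths):
    """Fraction of keys that map to a different path after the change."""
    moved = sum(pick_path(k, old_paths) != pick_path(k, new_paths) for k in keys)
    return moved / len(keys)


keys = [f"block-{i}" for i in range(10_000)]
two = ["/path/to/file_cache", "/path/to/file_cache2"]
swapped = moved_fraction(keys, two, two[::-1])
added = moved_fraction(keys, two, two + ["/path/to/file_cache3"])
print(f"order swapped: {swapped:.0%} of blocks map to a different path")
print(f"disk added:    {added:.0%} of blocks map to a different path")
```

Under this placement rule, swapping two paths remaps every block and adding a third disk remaps roughly two thirds of them, so lookups miss until the data is cached again. Consistent hashing limits the remapped share to roughly the share of capacity that changed.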

#### 5.4.1 Start and Add BE

1. Start the Backend:
@@ -250,7 +253,7 @@ ALTER SYSTEM ADD FOLLOWER "host:port";

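## 6. Create a Storage Vault

Storage Vault is an important component of the Doris compute-storage decoupled architecture. Storage Vaults represent the shared storage layer where data is stored. You can create one or more Storage Vaults using HDFS or S3-compatible object storage. A Storage Vault can be set as the default Storage Vault; system tables and tables that do not specify a Storage Vault are stored in this default Storage Vault. The default Storage Vault cannot be deleted. The following describes how to create a Storage Vault for your Doris cluster:
Storage Vault is an important component of the Doris compute-storage decoupled architecture. Storage Vaults represent the shared storage layer where data is stored. You can create one or more Storage Vaults using HDFS or S3-compatible object storage. One Storage Vault can be set as the default Storage Vault; system tables and tables that do not specify a Storage Vault are stored in this default Storage Vault. The default Storage Vault cannot be deleted. The following describes how to create a Storage Vault for your Doris cluster:

### 6.1 Create an HDFS Storage Vault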
