
Commit 23aa0a2

[Doc] Describe new partition lifecycle behaviours (#2992)
Docs for apache/doris#57060 and apache/doris#57013, and a fix for #2970.

## Versions
- [x] dev
- [x] 3.x
- [ ] 2.1
- [ ] 2.0

## Languages
- [x] Chinese
- [x] English

## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
1 parent 81cad63 commit 23aa0a2


4 files changed, 59 additions and 44 deletions

docs/table-design/data-partitioning/auto-partitioning.md

Lines changed: 29 additions & 17 deletions
@@ -63,7 +63,7 @@ When creating a table, use the following syntax to populate the `partitions_defi
 1. AUTO RANGE PARTITION:
 
 ```sql
-AUTO PARTITION BY RANGE(<partition_expr>)
+[AUTO] PARTITION BY RANGE(<partition_expr>)
 <origin_partitions_definition>
 ```
 
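As a minimal sketch of the `[AUTO] PARTITION BY RANGE` syntax above with the keyword omitted, assuming a Doris version that includes this change and a single-replica test cluster. The table name `auto_no_keyword` is hypothetical; per the doc change, the table should still be automatically partitioned.

```sql
-- Hypothetical table: AUTO is omitted, but the range partitioning
-- on date_trunc(...) is still expected to be automatic.
CREATE TABLE auto_no_keyword (
    k0 DATETIME(6) NOT NULL
)
PARTITION BY RANGE (date_trunc(k0, 'day')) ()
DISTRIBUTED BY HASH(`k0`) BUCKETS 1
PROPERTIES (
    "replication_num" = "1"  -- assumes a single-replica test cluster
);

-- A daily partition should be created on demand for this row.
INSERT INTO auto_no_keyword VALUES ('2025-10-21 08:00:00');
SHOW PARTITIONS FROM auto_no_keyword;
```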
@@ -98,6 +98,8 @@ When creating a table, use the following syntax to populate the `partitions_defi
 );
 ```
 
+In AUTO RANGE PARTITION, the `AUTO` keyword can be omitted without changing the meaning: the table is still automatically partitioned.
+
 2. AUTO LIST PARTITION
 
 ```sql
@@ -228,33 +230,43 @@ Doris supports both Auto and Dynamic Partition. In this case, both functions are
 
 There is no conflict between the two syntaxes, just set the corresponding clauses/attributes at the same time. Please note that it is uncertain whether the partition in current period is created by Auto Partition or Dynamic Partition. Different creation methods will lead to different naming formats for the partitions.
 
-### Best Practice
+## Lifecycle Management
+
+:::info
+Doris supports using automatic partitioning together with dynamic partitioning for lifecycle management, but this approach is no longer recommended.
+:::
+
+AUTO RANGE PARTITION tables support the property `partition.retention_count`, which takes a positive integer (denoted `N`): among all historical partitions, **only the `N` partitions with the largest partition values** are retained. All current and future partitions are retained. Specifically:
 
-In scenarios where you need to set a limit on the partition lifecycle, you can **disable the creation of Dynamic Partition, leaving the creation of partitions to be completed by Auto Partition**, and complete the management of the partition lifecycle through the Dynamic Partition's function of dynamically reclaiming partitions:
+- Since RANGE partitions never overlap, `partition A's value > partition B's value` is equivalent to `partition A's lower bound >= partition B's upper bound`, which is in turn equivalent to `partition A's upper bound > partition B's upper bound`.
+- Historical partitions are **partitions whose upper bound is <= the current time**.
+- Current and future partitions are **partitions whose upper bound is > the current time**, that is, the partition containing the current time and every partition after it.
+
+For example:
 
 ```sql
-create table auto_dynamic(
-k0 datetime(6) NOT NULL
-)
-auto partition by range (date_trunc(k0, 'year'))
-(
+create table auto_recycle(
+k0 datetime(6) not null
 )
-DISTRIBUTED BY HASH(`k0`) BUCKETS 2
+AUTO PARTITION BY RANGE (date_trunc(k0, 'day')) ()
+DISTRIBUTED BY HASH(`k0`) BUCKETS 1
 properties(
-"dynamic_partition.enable" = "true",
-"dynamic_partition.prefix" = "p",
-"dynamic_partition.start" = "-50",
-"dynamic_partition.end" = "0", --- Dynamic Partition No Partition Creation
-"dynamic_partition.time_unit" = "year",
-"replication_num" = "1"
+"partition.retention_count" = "3"
 );
 ```
 
-This way we have both the flexibility of Auto Partition and consistency in partition names.
+This keeps only the 3 historical partitions with the largest date values. Assuming the current date is `2025-10-21` and data is inserted for each day from `2025-10-16` to `2025-10-23`, the following partitions remain after one recycling pass:
+
+- p20251018000000
+- p20251019000000
+- p20251020000000 (this partition and those above it: only three historical partitions are kept)
+- p20251021000000 (this partition and those below it: current and future partitions are not affected)
+- p20251022000000
+- p20251023000000
 
 ## Conjunct with Auto Bucket
 
-Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#auto-setting-bucket-number) feature. When using this feature, Doris assumes that the data import is incremental in time order, and each import only involves one partition. In other words, this usage is only recommended for tables that are incrementally imported daily.
+Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#auto-setting-bucket-number) feature. When using this feature, Doris assumes that data is imported incrementally in time order and that each import involves only one partition. In other words, this usage is only recommended for tables that are imported incrementally, batch by batch.
 
 :::warning Note!
 If the data import method does not conform to the above pattern, and both auto partitioning and auto bucketing are used at the same time, there is a possibility that the number of buckets in the new partition is extremely unreasonable, which may greatly affect query performance.
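To reproduce the retention example with the `auto_recycle` table above, one could load one row per day and then list the partitions. This is a sketch under the example's own assumption that the current date is `2025-10-21`; recycling happens in the background, so `SHOW PARTITIONS` may still list the older historical partitions until a recycling pass has run.

```sql
-- One row per day from 2025-10-16 to 2025-10-23; Auto Partition creates
-- one daily partition for each date.
INSERT INTO auto_recycle VALUES
    ('2025-10-16 00:00:00'),
    ('2025-10-17 00:00:00'),
    ('2025-10-18 00:00:00'),
    ('2025-10-19 00:00:00'),
    ('2025-10-20 00:00:00'),
    ('2025-10-21 00:00:00'),
    ('2025-10-22 00:00:00'),
    ('2025-10-23 00:00:00');

-- After a recycling pass with "partition.retention_count" = "3", the
-- example expects p20251018000000 through p20251023000000 to remain.
SHOW PARTITIONS FROM auto_recycle;
```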

i18n/zh-CN/docusaurus-plugin-content-docs/current/table-design/data-partitioning/auto-partitioning.md

Lines changed: 26 additions & 23 deletions
@@ -61,7 +61,7 @@ PROPERTIES (
 1. AUTO RANGE PARTITION:
 
 ```sql
-AUTO PARTITION BY RANGE(<partition_expr>)
+[AUTO] PARTITION BY RANGE(<partition_expr>)
 <origin_partitions_definition>
 ```
 
@@ -96,6 +96,8 @@ PROPERTIES (
 );
 ```
 
+In AUTO RANGE PARTITION, the `AUTO` keyword can be omitted without changing the meaning: the table is still automatically partitioned.
+
 2. AUTO LIST PARTITION
 
 ```sql
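@@ -217,42 +219,43 @@ show partitions from `DAILY_TRADE_VALUE`;
 
 Partitions created by Auto Partition have exactly the same functional properties as manually created partitions.
 
-## Using with Dynamic Partition
-
-Doris supports using Auto Partition and Dynamic Partition at the same time. In this case, both take effect:
+## Lifecycle Management
 
-1. Auto Partition automatically creates partitions on demand during data import;
-2. Dynamic Partition automatically creates, recycles, and dumps partitions.
+:::info
+Doris supports using Auto Partition together with Dynamic Partition for lifecycle management, but this approach is no longer recommended.
+:::
 
-There is no conflict between the two syntaxes; just set the corresponding clauses/properties at the same time. Note that it is uncertain whether the partition for the current period is created by Auto Partition or Dynamic Partition. Different creation methods lead to different partition name formats.
+AUTO RANGE PARTITION tables support the property `partition.retention_count`, which takes a positive integer (denoted here as `N`): among all historical partitions, **only the `N` partitions with the largest partition values** are retained. All current and future partitions are retained. Specifically:
 
-### Best Practice
+- Since RANGE partitions never overlap, `partition A's value > partition B's value` is equivalent to `partition A's lower bound >= partition B's upper bound`, which is in turn equivalent to `partition A's upper bound > partition B's upper bound`.
+- Historical partitions are **partitions whose upper bound is <= the current time**.
+- Current and future partitions are **partitions whose upper bound is > the current time**, that is, the partition containing the current time and every partition after it.
 
-In scenarios that need a limit on the partition lifecycle, you can **turn off partition creation in Dynamic Partition and leave partition creation entirely to Auto Partition**, and manage the partition lifecycle through Dynamic Partition's ability to reclaim partitions dynamically:
+For example:
 
 ```sql
-create table auto_dynamic(
-k0 datetime(6) NOT NULL
+create table auto_recycle(
+k0 datetime(6) not null
 )
-auto partition by range (date_trunc(k0, 'year'))
-(
-)
-DISTRIBUTED BY HASH(`k0`) BUCKETS 2
+AUTO PARTITION BY RANGE (date_trunc(k0, 'day')) ()
+DISTRIBUTED BY HASH(`k0`) BUCKETS 1
 properties(
-"dynamic_partition.enable" = "true",
-"dynamic_partition.prefix" = "p",
-"dynamic_partition.start" = "-50",
-"dynamic_partition.end" = "0", --- Dynamic Partition does not create partitions
-"dynamic_partition.time_unit" = "year",
-"replication_num" = "1"
+"partition.retention_count" = "3"
 );
 ```
 
-This way, we have the flexibility of Auto Partition while keeping partition names consistent.
+This keeps only the 3 historical partitions with the largest date values. Assuming the current date is `2025-10-21` and data is inserted for each day from `2025-10-16` to `2025-10-23`, the following 6 partitions remain after one recycling pass:
+
+- p20251018000000
+- p20251019000000
+- p20251020000000 (this partition and those above it: only three historical partitions are kept)
+- p20251021000000 (this partition and those below it: current and future partitions are not affected)
+- p20251022000000
+- p20251023000000
 
 ## Using with Auto Bucket
 
-Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#自动设置分桶数) feature. When using this feature, Doris assumes that the data import is incremental in time order, and each import only involves one partition. In other words, this usage is only recommended for tables that are incrementally imported daily.
+Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#自动设置分桶数) feature. When using this feature, Doris assumes that data is imported incrementally in time order and that each import involves only one partition. In other words, this usage is only recommended for tables that are imported incrementally, batch by batch.
 
 :::warning Note!
 If the data import method does not conform to the above pattern, and both auto partitioning and auto bucketing are used at the same time, there is a possibility that the number of buckets in the new partition is extremely unreasonable, which may greatly affect query performance.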

i18n/zh-CN/docusaurus-plugin-content-docs/version-3.x/table-design/data-partitioning/auto-partitioning.md

Lines changed: 2 additions & 2 deletions
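@@ -256,10 +256,10 @@ properties(
 ## Using with Auto Bucket
 
 :::note
-This feature has been working normally since Doris 3.1.1
+This feature has been working normally since Doris 3.1.2
 :::
 
-Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#自动设置分桶数) feature. When using this feature, Doris assumes that the data import is incremental in time order, and each import only involves one partition. In other words, this usage is only recommended for tables that are incrementally imported daily.
+Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#自动设置分桶数) feature. When using this feature, Doris assumes that data is imported incrementally in time order and that each import involves only one partition. In other words, this usage is only recommended for tables that are imported incrementally, batch by batch.
 
 :::warning Note!
 If the data import method does not conform to the above pattern, and both auto partitioning and auto bucketing are used at the same time, there is a possibility that the number of buckets in the new partition is extremely unreasonable, which may greatly affect query performance.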

versioned_docs/version-3.x/table-design/data-partitioning/auto-partitioning.md

Lines changed: 2 additions & 2 deletions
@@ -263,10 +263,10 @@ In some early versions prior to 2.1.7, this feature was not disabled but not rec
 ## Conjunct with Auto Bucket
 
 :::note
-This feature has been working normally since Doris 3.1.1
+This feature has been working normally since Doris 3.1.2
 :::
 
-Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#auto-setting-bucket-number) feature. When using this feature, Doris assumes that the data import is incremental in time order, and each import only involves one partition. In other words, this usage is only recommended for tables that are incrementally imported daily.
+Only AUTO RANGE PARTITION can be used together with the [Auto Bucket](./data-bucketing.md#auto-setting-bucket-number) feature. When using this feature, Doris assumes that the data import is incremental in time order, and each import only involves one partition. In other words, this usage is only recommended for tables that are incrementally imported batch by batch.
 
 :::warning Note!
 If the data import method does not conform to the above pattern, and both auto partitioning and auto bucketing are used at the same time, there is a possibility that the number of buckets in the new partition is extremely unreasonable, which may greatly affect query performance.
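As a sketch of combining AUTO RANGE PARTITION with Auto Bucket as described above, assuming Doris 3.1.2 or later (per the note) and a single-replica test cluster. The table name `auto_bucket_demo` is hypothetical; `BUCKETS AUTO` asks Doris to choose the bucket count for each new partition instead of fixing it.

```sql
-- Hypothetical table combining daily auto partitions with auto bucketing.
CREATE TABLE auto_bucket_demo (
    k0 DATETIME(6) NOT NULL
)
AUTO PARTITION BY RANGE (date_trunc(k0, 'day')) ()
DISTRIBUTED BY HASH(`k0`) BUCKETS AUTO
PROPERTIES (
    "replication_num" = "1"  -- assumes a single-replica test cluster
);

-- Each load touches a single (the newest) partition, matching the
-- incremental, batch-by-batch pattern this feature assumes.
INSERT INTO auto_bucket_demo VALUES
    ('2025-10-21 08:00:00'),
    ('2025-10-21 09:30:00');
```

Loading many days of data in a single batch breaks that assumption, which is the situation the warning above describes.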
