@yinyannlp
Contributor

This appears to be caused by an update to the chonkie library. When initializing RecursiveChunker, the parameters should now be:
def __init__(
    self,
    tokenizer: Union[str, TokenizerProtocol] = "character",
    chunk_size: int = 2048,
    rules: RecursiveRules = RecursiveRules(),
    min_characters_per_chunk: int = 24,
) -> None:

Updating the corresponding parameter names in corpus.py is enough to fix it:
chunker = RecursiveChunker(
    tokenizer=tokenizer,
    chunk_size=chunk_size,
    rules=RecursiveRules(),
    min_characters_per_chunk=min_characters_per_chunk,
)

Otherwise, both sentence and recursive chunking raise errors.
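
As an optional safeguard against this kind of rename, here is a minimal sketch (not part of this PR) that checks keyword arguments against a constructor's signature before calling it. `StandInChunker`, `filter_supported_kwargs`, and `old_param_name` are hypothetical names used only for illustration. The stand-in class merely mirrors the signature quoted above so the example runs without chonkie installed; it is not the real `RecursiveChunker`.

```python
import inspect
from typing import Any, Callable, Dict


def filter_supported_kwargs(
    factory: Callable[..., Any], candidates: Dict[str, Any]
) -> Dict[str, Any]:
    """Keep only the keyword arguments that `factory` actually accepts."""
    params = inspect.signature(factory).parameters
    # A callable that declares **kwargs accepts any keyword, so pass all through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(candidates)
    return {name: value for name, value in candidates.items() if name in params}


class StandInChunker:
    """Hypothetical stand-in with the signature quoted above (not chonkie's class)."""

    def __init__(
        self,
        tokenizer: str = "character",
        chunk_size: int = 2048,
        rules: Any = None,
        min_characters_per_chunk: int = 24,
    ) -> None:
        self.tokenizer = tokenizer
        self.chunk_size = chunk_size
        self.rules = rules
        self.min_characters_per_chunk = min_characters_per_chunk


kwargs = filter_supported_kwargs(
    StandInChunker,
    {
        "tokenizer": "character",
        "chunk_size": 512,
        "min_characters_per_chunk": 24,
        "old_param_name": "character",  # hypothetical stale name; gets dropped
    },
)
chunker = StandInChunker(**kwargs)
```

Dropping unknown keywords silently can hide real mistakes, so in practice it may be preferable to log or raise on the discarded names, or simply to pin the chonkie version.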

@hm1229 requested a review from mssssss123 on October 27, 2025 00:02
