Source and Segments now use typing.Annotated to document
individual parameters rather than describing them in the
docstrings directly. The documentation browser takes advantage of
this to produce cleaner parameter docs.
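
The commit does not show how the documentation browser reads these annotations. As a rough illustration only, the sketch below shows one way a tool could recover per-parameter descriptions from `typing.Annotated` metadata using the standard library. The helper name `describe_params` and the empty function body are hypothetical; only the signature shape is taken from the diff.

```python
import inspect
from typing import Annotated, Iterable, get_args, get_origin, get_type_hints


# Stand-in with the same signature shape as listFiles in the diff.
# The body is a placeholder, not the project's implementation.
def listFiles(
    patterns: Annotated[Iterable[str], "Iterable of file patterns or paths (supports wildcards like *, ?, [])"],
    full_path: Annotated[bool, "Whether to yield full absolute paths or just filenames"] = True,
    files_only: Annotated[bool, "Whether to include only files (excluding directories)"] = False,
):
    return iter(())


def describe_params(func):
    """Hypothetical helper: map each parameter name to (type, description, default)."""
    # include_extras=True keeps the Annotated wrapper; without it the metadata is stripped.
    hints = get_type_hints(func, include_extras=True)
    docs = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        base, description = hint, None
        if get_origin(hint) is Annotated:
            # get_args returns (underlying_type, *metadata) for Annotated hints.
            base, *metadata = get_args(hint)
            # Treat the first string in the metadata as the description.
            description = next((m for m in metadata if isinstance(m, str)), None)
        docs[name] = (base, description, param.default)
    return docs


if __name__ == "__main__":
    for name, (base, description, default) in describe_params(listFiles).items():
        print(f"{name}: {base} - {description} (default: {default})")
```

Keeping the description next to the type means a documentation tool can read both from the signature and does not need to parse an `Args:` block in the docstring.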
def listFiles(patterns: Annotated[Iterable[str], "Iterable of file patterns or paths (supports wildcards like *, ?, [])"], full_path: Annotated[bool, "Whether to yield full absolute paths or just filenames"] =True, files_only: Annotated[bool, "Whether to include only files (excluding directories)"]=False):
     """
     Lists files matching given patterns (potentially with wildcards) and yields their paths.
 
-    Args:
-        patterns (Iterable[str]): Iterable of file patterns or paths (supports wildcards like *, ?, []).
-        full_path (bool): Whether to yield full absolute paths or just filenames.
-        files_only (bool): Whether to include only files (excluding directories).
-
     Yields:
         str: File paths (absolute if full_path=True, filenames if full_path=False).
 
-
     Raises:
         None: This function does not raise exceptions for non-matching patterns.
def htmlToTextSegment(raw: Annotated[str, "The raw HTML content to be converted"], cleanText: Annotated[bool, "Whether to clean and normalize the output text"] =True):
     """
     Converts HTML content to text segment.
 
     This function takes HTML content and converts it to plain text format.
     If cleanText is enabled, the resulting text will also be cleaned so it
     tries to retain only the main body content.
 
-    Args:
-        raw (str): The raw HTML content to be converted
-        cleanText (bool, optional): Whether to clean and normalize the output text. Defaults to True.
-        field (str): The field name to be used for the segment. If None, assuming the incoming item is html.
-        set_as (str): The name of the field to append the text to. If None, just pass on the cleaned text.