diff --git a/docs/md_v2/core/quickstart.md b/docs/md_v2/core/quickstart.md
index 83cb6cef..7f245ca6 100644
--- a/docs/md_v2/core/quickstart.md
+++ b/docs/md_v2/core/quickstart.md
@@ -97,23 +97,28 @@ By default, Crawl4AI automatically generates Markdown from each crawled page. Ho
 ### Example: Using a Filter with `DefaultMarkdownGenerator`
 
 ```python
-from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
+import asyncio
+from crawl4ai import AsyncWebCrawler, CrawlerRunConfig, CacheMode
 from crawl4ai.content_filter_strategy import PruningContentFilter
 from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator
 
-md_generator = DefaultMarkdownGenerator(
-    content_filter=PruningContentFilter(threshold=0.4, threshold_type="fixed")
-)
+async def main():
+    md_generator = DefaultMarkdownGenerator(
+        content_filter=PruningContentFilter(threshold=0.4, threshold_type="fixed")
+    )
 
-config = CrawlerRunConfig(
-    cache_mode=CacheMode.BYPASS,
-    markdown_generator=md_generator
-)
+    config = CrawlerRunConfig(
+        cache_mode=CacheMode.BYPASS,
+        markdown_generator=md_generator
+    )
+
+    async with AsyncWebCrawler() as crawler:
+        result = await crawler.arun("https://news.ycombinator.com", config=config)
+        print("Raw Markdown length:", len(result.markdown.raw_markdown))
+        print("Fit Markdown length:", len(result.markdown.fit_markdown))
 
-async with AsyncWebCrawler() as crawler:
-    result = await crawler.arun("https://news.ycombinator.com", config=config)
-    print("Raw Markdown length:", len(result.markdown.raw_markdown))
-    print("Fit Markdown length:", len(result.markdown.fit_markdown))
+if __name__ == "__main__":
+    asyncio.run(main())
 ```
 
 **Note**: If you do **not** specify a content filter or markdown generator, you’ll typically see only the raw Markdown. `PruningContentFilter` may add around `50ms` of processing time. We’ll dive deeper into these strategies in a dedicated **Markdown Generation** tutorial.
@@ -462,4 +467,4 @@ If you’re ready for more, check out:
 - **Deployment**: Explore ephemeral testing in Docker or plan for the upcoming stable Docker release.
 - **Browser Management**: Delve into user simulation, stealth modes, and concurrency best practices.
 
-Crawl4AI is a powerful, flexible tool. Enjoy building out your scrapers, data pipelines, or AI-driven extraction flows. Happy crawling!
\ No newline at end of file
+Crawl4AI is a powerful, flexible tool. Enjoy building out your scrapers, data pipelines, or AI-driven extraction flows. Happy crawling!
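The quickstart change above wraps top-level `async with` / `await` code in an `async def main()` entry point driven by `asyncio.run()`. A minimal sketch of that pattern, runnable without the library or network access — `fake_arun` is a hypothetical stand-in for the crawler's `arun()` call, not part of Crawl4AI:

```python
import asyncio

async def fake_arun(url: str) -> str:
    """Hypothetical stand-in for AsyncWebCrawler.arun(); returns fake markdown."""
    await asyncio.sleep(0)  # yield to the event loop, as a real crawl would
    return f"# Results for {url}\n\nSome markdown body."

async def main():
    # Same structure as the diff: do all async work inside main().
    raw_markdown = await fake_arun("https://news.ycombinator.com")
    # Print the length (not the markdown itself), matching the label.
    print("Raw Markdown length:", len(raw_markdown))

if __name__ == "__main__":
    asyncio.run(main())
```

Keeping a single `asyncio.run(main())` entry point avoids "event loop already running" errors that arise when top-level `await`-style snippets are pasted into ordinary scripts.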