developer docs changes markdown

palaka · palaka · commit 996f4fbba4fc · 2025-09-07T22:16:11.000+05:30
diff --git a/gatsby-browser.js b/gatsby-browser.js
@@ -262,6 +262,13 @@ export const onRouteUpdate = ({ location, prevLocation }) => {
       ) {
         pageHeadTittle = "PDF Services API Extract PDF";
       } else if (
+        window.location.pathname.indexOf(
+          "pdf-services-api/howtos/pdf-to-markdown-api/"
+        ) >= 0
+      ) {
+        pageHeadTittle = "PDF Services API PDF to Markdown API";
+      }
+      else if (
         window.location.pathname.indexOf(
           "pdf-services-api/howtos/pdf-properties/"
         ) >= 0
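Review note: every new how-to page adds another branch to the `else if` chain in `onRouteUpdate`. One possible alternative, sketched here under assumptions, is a lookup table from path fragment to title. `ROUTE_TITLES` and `titleForPath` are hypothetical names that do not exist in the repository. Only the pair added in this hunk is listed, because the other fragment/title pairs are not fully visible in the diff.

```javascript
// Hypothetical lookup table: path fragment -> page title.
// Only the pair added in this hunk is listed; the remaining entries would
// be copied from the existing branches of onRouteUpdate.
const ROUTE_TITLES = [
  [
    "pdf-services-api/howtos/pdf-to-markdown-api/",
    "PDF Services API PDF to Markdown API",
  ],
];

// Returns the title of the first fragment contained in `pathname`, or
// `fallback` when nothing matches. First match wins, which mirrors the
// order-sensitive behaviour of an else-if chain.
function titleForPath(pathname, fallback = "") {
  const match = ROUTE_TITLES.find(([fragment]) => pathname.includes(fragment));
  return match ? match[1] : fallback;
}
```

With a table like this, the call site would reduce to something like `pageHeadTittle = titleForPath(window.location.pathname, pageHeadTittle)`, and adding a page would be a one-line change to the table.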
diff --git a/gatsby-config.js b/gatsby-config.js
@@ -33,6 +33,11 @@ module.exports = {
                         description: 'Create, combine and export PDFs',
                         path: '../document-services/apis/pdf-services/'
                     },
+                    {
+                        title: 'PDF to Markdown',
+                        description: 'Convert PDF documents to Markdown format',
+                        path: '../document-services/apis/pdf-to-markdown/'
+                    },
                     {
                         title: 'PDF Accessibility Auto-Tag',
                         description: 'Auto-tag PDF content to improve accessibility',
@@ -229,6 +234,10 @@ module.exports = {
                                 title: 'Extract PDF',
                                 path: 'overview/pdf-services-api/howtos/extract-pdf.md'
                             },
+                            {
+                                title: 'PDF to Markdown API',
+                                path: 'overview/pdf-services-api/howtos/pdf-to-markdown-api.md'
+                            },
                             {
                                 title: 'Get PDF Properties',
                                 path: 'overview/pdf-services-api/howtos/pdf-properties.md'
@@ -716,6 +725,10 @@ module.exports = {
                                         title: 'Extract PDF',
                                         path: 'overview/legacy-documentation/pdf-services-api/howtos/extract-pdf.md'
                                     },
+                                    {
+                                        title: 'PDF to Markdown API',
+                                        path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-to-markdown-api.md'
+                                    },
                                     {
                                         title: 'Get PDF Properties',
                                         path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-properties.md'
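Review note: this hunk registers `overview/legacy-documentation/pdf-services-api/howtos/pdf-to-markdown-api.md`, while the diff as shown adds only the non-legacy page. Whether the legacy file already exists elsewhere is not visible here. A check along the following lines could catch navigation entries that point at missing pages. This is a sketch only: `collectMarkdownPaths` and `findMissingPages` are hypothetical helpers, and the existence check is injected so nothing is assumed about the repository layout.

```javascript
// Recursively collects every `path` that points at a Markdown file from a
// nested navigation structure (arrays of objects that may hold child arrays).
function collectMarkdownPaths(node, found = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectMarkdownPaths(child, found));
  } else if (node && typeof node === "object") {
    if (typeof node.path === "string" && node.path.endsWith(".md")) {
      found.push(node.path);
    }
    Object.values(node).forEach((value) => collectMarkdownPaths(value, found));
  }
  return found;
}

// Returns the collected paths for which `exists` reports false.
// `exists` is injected so the check can run against the file system or a stub.
function findMissingPages(nav, exists) {
  return collectMarkdownPaths(nav).filter((p) => !exists(p));
}
```

In a Node script, `exists` could be `(p) => fs.existsSync(path.join("src/pages", p))`, assuming pages live under `src/pages` as the file added in this commit suggests.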
diff --git a/src/pages/overview/pdf-services-api/howtos/pdf-to-markdown-api.md b/src/pages/overview/pdf-services-api/howtos/pdf-to-markdown-api.md
@@ -0,0 +1,126 @@
+---
+title: PDF to Markdown API | Adobe PDF Services
+description: Learn about the PDF to Markdown API service that converts PDF documents into well-formatted Markdown text.
+---
+
+# PDF to Markdown API
+
+The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents – native or scanned – into well-formatted Markdown text. The service preserves the document's structure and formatting while converting it into a format that is widely used for LLM workflows, content authoring, and documentation.
+
+## Structured Information Output Format
+
+The output of a PDF to Markdown operation includes:
+
+- A primary `.md` file containing the converted Markdown content
+
+### Output Structure
+
+The following is a summary of key elements in the converted Markdown:
+
+#### Elements
+
+Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:
+
+- Text content with proper Markdown syntax
+- Document hierarchy and structure
+- Inline formatting and emphasis
+- Links and references
+- Images and figures
+- Tables and complex layouts
+
+#### Content Types
+
+The API processes various content types as follows:
+
+##### Text Elements
+
+- **Headings**: Converted to appropriate Markdown heading levels (H1-H6)
+- **Paragraphs**: Preserved with proper spacing and formatting
+- **Lists**: Both ordered and unordered lists with proper nesting
+- **Text Emphasis**: Bold, italic, and other text formatting
+- **Links**: Preserved with proper Markdown link syntax
+
+##### Images and Figures
+
+- Provided as base64-embedded images in the Markdown output
+- Referenced correctly in the Markdown output
+- Original quality preserved
+- Proper alt text and captions maintained
+
+##### Tables
+
+- Converted to Markdown table syntax
+- Column alignment preserved
+- Cell content formatting maintained
+- Complex table structures supported
+
+#### Element Types and Paths
+
+The API recognizes and converts the following structural elements:
+
+| Category  | Element Type      | Description                                               |
+| --------- | ----------------- | --------------------------------------------------------- |
+| Aside     | Aside             | Content which is not part of regular content flow         |
+| Figure    | Figure            | Non-reflowable constructs like graphs, images, flowcharts |
+| Footnote  | Footnote          | Footnote                                                  |
+| Headings  | H, H1, H2, etc.   | Heading levels                                            |
+| List      | L, Li, Lbl, Lbody | List and list item elements                               |
+| Paragraph | P, ParagraphSpan  | Paragraphs and paragraph segments                         |
+| Reference | Reference         | Links                                                     |
+| Section   | Sect              | Logical section of the document                           |
+| StyleSpan | StyleSpan         | Styling variations within text                            |
+| Table     | Table, TD, TH, TR | Table elements                                            |
+| Title     | Title             | Document title                                            |
+
+### Reading Order
+
+The reading order in the output Markdown maintains:
+
+- Natural document flow
+- Proper content hierarchy
+- Column-based layouts
+- Page transitions
+- Inline elements and references
+
+## Use Cases
+
+The PDF to Markdown API is particularly valuable for:
+
+- LLM-friendly content ingestion and prompt creation
+- Training or fine-tuning LLMs on PDF content
+- Content migration from PDF to documentation platforms
+- Legacy document conversion
+- Content repurposing for modern documentation systems
+- Integration with Markdown-based workflows
+- Automated document processing pipelines
+- Searchable internal knowledge repositories
+
+## API Limitations
+
+### File Constraints
+
+- **File Size**: Maximum of 100 MB per file
+- **Page Count**:
+  - Non-scanned PDFs: Up to 400 pages
+  - Scanned PDFs: Up to 150 pages
+- **Page Dimensions**: Between 6" and 17.5" in either dimension
+
+### Processing Limits
+
+- **Rate Limits**: Maximum 25 requests per minute
+- **Language Support**: Optimized for English; other Latin-script languages are also supported
+- **OCR Quality**: Dependent on scan quality (minimum 200 DPI recommended)
+
+### Document Requirements
+
+- Files must be unprotected or allow content copying
+- No support for:
+  - Hidden objects (JavaScript, OCG)
+  - XFA and fillable forms
+  - Complex annotations
+  - CAD drawings or vector art
+  - Password-protected content
+
+## REST API
+
+See our public API Reference for [PDF to Markdown API](../../../apis/#tag/PDF-to-Markdown).
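Review note: the "File Constraints" section of the new page lends itself to a client-side pre-flight check before a document is submitted. The sketch below is illustrative only and is not part of any Adobe SDK. `LIMITS`, `checkLimits`, and the shape of the `doc` descriptor are hypothetical, "100 MB" is assumed to mean 100 MiB, and "in either dimension" is read as applying to both width and height.

```javascript
// Limits copied from the "File Constraints" section of the new page.
const LIMITS = {
  maxBytes: 100 * 1024 * 1024, // assumption: 100 MB is treated as 100 MiB
  maxPages: { native: 400, scanned: 150 },
  minInches: 6,
  maxInches: 17.5,
};

// `doc` is a hypothetical descriptor:
//   { bytes, pages, scanned, widthIn, heightIn }
// Returns a list of human-readable violations; an empty list means the
// document is within the documented limits.
function checkLimits(doc) {
  const problems = [];
  if (doc.bytes > LIMITS.maxBytes) {
    problems.push("file larger than 100 MB");
  }
  const pageCap = doc.scanned ? LIMITS.maxPages.scanned : LIMITS.maxPages.native;
  if (doc.pages > pageCap) {
    problems.push(`more than ${pageCap} pages`);
  }
  for (const side of [doc.widthIn, doc.heightIn]) {
    if (side < LIMITS.minInches || side > LIMITS.maxInches) {
      problems.push(`page dimension ${side}in outside 6in to 17.5in`);
    }
  }
  return problems;
}
```

A caller would skip the upload and surface the returned messages whenever the list is non-empty. Documents with mixed page sizes would need the dimension check run per page.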