Add PDF to Markdown docs

mahour · mahour · commit 356814b704d7 · 2025-09-22T12:40:02.000+05:30
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 ## How to develop
 
-For local development, simply use :
+For local development, simply use:
 
 ```bash
 $ yarn install
diff --git a/gatsby-browser.js b/gatsby-browser.js
@@ -262,6 +262,13 @@ export const onRouteUpdate = ({ location, prevLocation }) => {
       ) {
         pageHeadTittle = "PDF Services API Extract PDF";
       } else if (
+        window.location.pathname.indexOf(
+          "pdf-services-api/howtos/pdf-to-markdown-api/"
+        ) >= 0
+      ) {
+        pageHeadTittle = "PDF Services API PDF to Markdown API";
+      }
+      else if (
         window.location.pathname.indexOf(
           "pdf-services-api/howtos/pdf-properties/"
         ) >= 0
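
Review note: each new how-to page adds another branch to the `else if` chain above. A minimal sketch of a table-driven alternative follows. `PAGE_TITLES` and `resolvePageTitle` are hypothetical names that do not exist in `gatsby-browser.js`, and only the PDF to Markdown entry is taken from this commit; the remaining pages would need their own entries.

```javascript
// Hypothetical table of [path fragment, page title] pairs.
// Only the PDF to Markdown entry comes from this commit.
const PAGE_TITLES = [
  [
    "pdf-services-api/howtos/pdf-to-markdown-api/",
    "PDF Services API PDF to Markdown API",
  ],
];

// Returns the title paired with the first fragment found in `pathname`,
// or `fallback` when no fragment matches.
function resolvePageTitle(pathname, fallback = "") {
  const match = PAGE_TITLES.find(([fragment]) => pathname.includes(fragment));
  return match ? match[1] : fallback;
}
```

Because the match is a substring test, as with `indexOf(...) >= 0` in the hunk above, the `legacy-documentation` copy of the page resolves to the same title.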
diff --git a/gatsby-config.js b/gatsby-config.js
@@ -33,6 +33,11 @@ module.exports = {
                         description: 'Create, combine and export PDFs',
                         path: '../document-services/apis/pdf-services/'
                     },
+                    {
+                        title: 'PDF to Markdown',
+                        description: 'Convert PDF documents to Markdown format',
+                        path: '../document-services/apis/pdf-to-markdown/'
+                    },
                     {
                         title: 'PDF Accessibility Auto-Tag',
                         description: 'Auto-tag PDF content to improve accessibility',
@@ -229,6 +234,10 @@ module.exports = {
                                 title: 'Extract PDF',
                                 path: 'overview/pdf-services-api/howtos/extract-pdf.md'
                             },
+                            {
+                                title: 'PDF to Markdown API',
+                                path: 'overview/pdf-services-api/howtos/pdf-to-markdown-api.md'
+                            },
                             {
                                 title: 'Get PDF Properties',
                                 path: 'overview/pdf-services-api/howtos/pdf-properties.md'
@@ -716,6 +725,10 @@ module.exports = {
                                         title: 'Extract PDF',
                                         path: 'overview/legacy-documentation/pdf-services-api/howtos/extract-pdf.md'
                                     },
+                                    {
+                                        title: 'PDF to Markdown API',
+                                        path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-to-markdown-api.md'
+                                    },
                                     {
                                         title: 'Get PDF Properties',
                                         path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-properties.md'
diff --git a/src/pages/apis/index.md b/src/pages/apis/index.md
@@ -1,6 +1,6 @@
 ---
 title: Adobe PDF Services Open API spec
 description: The OpenAPI spec for Adobe PDF Services API endpoints, parameters, and responses.
-openAPISpec: https://raw.githubusercontent.com/AdobeDocs/pdfservices-api-documentation/main/src/pages/resources/openapi.json
+openAPISpec: https://raw.githubusercontent.com/AdobeDocs/pdfservices-api-documentation/pdf2md/src/pages/resources/openapi.json
 ---
 -[]
diff --git a/src/pages/overview/pdf-services-api/howtos/pdf-accessibility-checker-api.md b/src/pages/overview/pdf-services-api/howtos/pdf-accessibility-checker-api.md
@@ -3,7 +3,7 @@ title: PDF Accessibility Checker | How Tos | PDF Services API | Adobe PDF Servic
 ---
 # PDF Accessibility Checker
 
-The Accessibility Checker API verifies if PDF files meet the machine-verifiable requirements of PDF/UA and WCAG 2.0. It generates a report summarizing the findings of the accessibility checks. Additional human remediation may be required to ensure the reading order of elements is correct and that alternative text tags properly convey the meaning of images. The report contains links to documentation that assists in manually fixing problems using Adobe Acrobat Pro.
+The Accessibility Checker API verifies if PDF files meet the machine-verifiable requirements of PDF/UA and WCAG. It generates a report summarizing the findings of the accessibility checks. Additional human remediation may be required to ensure the reading order of elements is correct and that alternative text tags properly convey the meaning of images. The report contains links to documentation that assists in manually fixing problems using Adobe Acrobat Pro.
 
 ## API Parameters
 
@@ -316,7 +316,6 @@ curl --location --request POST 'https://pdf-services.adobe.io/operation/accessib
 }'
 ```
 
-
 ## Check accessibility for specified pages
 
 The sample below performs an accessibility check operation for specified pages of a given PDF.
diff --git a/src/pages/overview/pdf-services-api/howtos/pdf-to-markdown-api.md b/src/pages/overview/pdf-services-api/howtos/pdf-to-markdown-api.md
@@ -0,0 +1,126 @@
+---
+title: PDF to Markdown API | Adobe PDF Services
+description: Learn about the PDF to Markdown API service that converts PDF documents into well-formatted Markdown text.
+---
+
+# PDF to Markdown API
+
+The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents – native or scanned – into well-formatted Markdown text. This service preserves the document's structure and formatting while converting it into a format that's widely used for LLM flows, content authoring and documentation.
+
+## Structured Information Output Format
+
+The output of a PDF to Markdown operation includes:
+
+- A primary `.md` file containing the converted Markdown content
+
+### Output Structure
+
+The following is a summary of key elements in the converted Markdown:
+
+#### Elements
+
+Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:
+
+- Text content with proper Markdown syntax
+- Document hierarchy and structure
+- Inline formatting and emphasis
+- Links and references
+- Images and figures
+- Tables and complex layouts
+
+#### Content Types
+
+The API processes various content types as follows:
+
+##### Text Elements
+
+- **Headings**: Converted to appropriate Markdown heading levels (H1-H6)
+- **Paragraphs**: Preserved with proper spacing and formatting
+- **Lists**: Both ordered and unordered lists with proper nesting
+- **Text Emphasis**: Bold, italic, and other text formatting
+- **Links**: Preserved with proper Markdown link syntax
+
+##### Images and Figures
+
+- Provided as base64-embedded images in the Markdown output
+- Referenced correctly in the Markdown output
+- Original quality preserved
+- Proper alt text and captions maintained
+
+##### Tables
+
+- Converted to Markdown table syntax
+- Column alignment preserved
+- Cell content formatting maintained
+- Complex table structures supported
+
+#### Element Types and Paths
+
+The API recognizes and converts the following structural elements:
+
+| Category  | Element Type      | Description                                               |
+| --------- | ----------------- | --------------------------------------------------------- |
+| Aside     | Aside             | Content which is not part of regular content flow         |
+| Figure    | Figure            | Non-reflowable constructs like graphs, images, flowcharts |
+| Footnote  | Footnote          | Footnote                                                  |
+| Headings  | H, H1, H2, etc.   | Heading levels                                            |
+| List      | L, Li, Lbl, Lbody | List and list item elements                               |
+| Paragraph | P, ParagraphSpan  | Paragraphs and paragraph segments                         |
+| Reference | Reference         | Links                                                     |
+| Section   | Sect              | Logical section of the document                           |
+| StyleSpan | StyleSpan         | Styling variations within text                            |
+| Table     | Table, TD, TH, TR | Table elements                                            |
+| Title     | Title             | Document title                                            |
+
+### Reading Order
+
+The reading order in the output Markdown maintains:
+
+- Natural document flow
+- Proper content hierarchy
+- Column-based layouts
+- Page transitions
+- Inline elements and references
+
+## Use Cases
+
+The PDF to Markdown API is particularly valuable for:
+
+- LLM-friendly content ingestion and prompt creation
+- Training or fine-tuning LLMs with PDFs
+- Content migration from PDF to documentation platforms
+- Legacy document conversion
+- Content repurposing for modern documentation systems
+- Integration with Markdown-based workflows
+- Automated document processing pipelines
+- Searchable internal knowledge repositories
+
+## API Limitations
+
+### File Constraints
+
+- **File Size**: Maximum of 100 MB per file
+- **Page Count**:
+  - Non-scanned PDFs: Up to 400 pages
+  - Scanned PDFs: Up to 150 pages
+- **Page Dimensions**: Between 6" and 17.5" in either dimension
+
+### Processing Limits
+
+- **Rate Limits**: Maximum 25 requests per minute
+- **Language Support**: Optimized for English, supports other Latin-based languages
+- **OCR Quality**: Dependent on scan quality (minimum 200 DPI recommended)
+
+### Document Requirements
+
+- Files must be unprotected or allow content copying
+- No support for:
+  - Hidden objects (JavaScript, OCG)
+  - XFA and fillable forms
+  - Complex annotations
+  - CAD drawings or vector art
+  - Password-protected content
+
+## REST API
+
+See our public API Reference for [PDF to Markdown API](../../../apis/#tag/PDF-To-Markdown).
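
Review note: the file constraints documented in the new page can be checked on the client before a job is submitted. The sketch below is illustrative only and is not part of the PDF Services SDK. `LIMITS` and `checkPdfLimits` are hypothetical names, the caller is assumed to already know the page count and page size, 1 MB is assumed to mean 1024 * 1024 bytes, and "in either dimension" is read as applying to both width and height.

```javascript
// Limits copied from the "File Constraints" list in the new page.
const LIMITS = {
  maxBytes: 100 * 1024 * 1024, // assumption: 1 MB = 1024 * 1024 bytes
  maxPages: { native: 400, scanned: 150 },
  minInches: 6,
  maxInches: 17.5,
};

// Returns a list of human-readable problems; an empty list means the
// file is within the documented limits.
function checkPdfLimits({ bytes, pages, scanned, widthIn, heightIn }) {
  const problems = [];
  if (bytes > LIMITS.maxBytes) {
    problems.push("file exceeds 100 MB");
  }
  const pageCap = scanned ? LIMITS.maxPages.scanned : LIMITS.maxPages.native;
  if (pages > pageCap) {
    problems.push(`page count exceeds ${pageCap}`);
  }
  // Assumption: both sides of the page must fall inside the range.
  for (const side of [widthIn, heightIn]) {
    if (side < LIMITS.minInches || side > LIMITS.maxInches) {
      problems.push(`page side ${side}in is outside 6in to 17.5in`);
    }
  }
  return problems;
}
```

For example, a 10-page US Letter file (`widthIn: 8.5`, `heightIn: 11`) of a few kilobytes yields an empty list, while a 151-page scanned file yields one page-count problem.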
diff --git a/src/pages/resources/openapi.json b/src/pages/resources/openapi.json