Commit 356814b

mahour committed
added pdf 2 md docs
1 parent 040c67b commit 356814b

7 files changed, 805 additions and 21 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 ## How to develop

-For local development, simply use :
+For local development, simply use:

```bash
$ yarn install

gatsby-browser.js

Lines changed: 7 additions & 0 deletions
@@ -262,6 +262,13 @@ export const onRouteUpdate = ({ location, prevLocation }) => {
   ) {
     pageHeadTittle = "PDF Services API Extract PDF";
   } else if (
+    window.location.pathname.indexOf(
+      "pdf-services-api/howtos/pdf-to-markdown-api/"
+    ) >= 0
+  ) {
+    pageHeadTittle = "PDF Services API PDF to Markdown API";
+  }
+  else if (
     window.location.pathname.indexOf(
       "pdf-services-api/howtos/pdf-properties/"
     ) >= 0
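
The hunk above adds one more branch to a chain of `else if` checks on `window.location.pathname`. As a reviewer's sketch only (not what this commit does), the same first-match-wins behaviour could be expressed as a lookup table. `PAGE_TITLES` and `resolvePageTitle` are hypothetical names; the first entry comes from this diff, while the path fragment in the second entry is an assumption inferred from the `extract-pdf.md` sidebar path.

```javascript
// Hypothetical sketch: these names do not exist in gatsby-browser.js.
// Each entry is [pathname fragment, page title]; the first match wins,
// mirroring the order-dependent else-if chain.
const PAGE_TITLES = [
  // Taken from the added lines in this commit.
  [
    "pdf-services-api/howtos/pdf-to-markdown-api/",
    "PDF Services API PDF to Markdown API",
  ],
  // Title is from the diff context; the fragment is an assumption.
  ["pdf-services-api/howtos/extract-pdf/", "PDF Services API Extract PDF"],
];

function resolvePageTitle(pathname, fallback = "") {
  const match = PAGE_TITLES.find(([fragment]) => pathname.includes(fragment));
  return match ? match[1] : fallback;
}

// Usage: pass window.location.pathname in the browser.
console.log(
  resolvePageTitle("/docs/overview/pdf-services-api/howtos/pdf-to-markdown-api/")
);
```

Because the match is a substring test, a legacy-documentation path containing the same fragment would resolve to the same title, just as the `indexOf` check does.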

gatsby-config.js

Lines changed: 13 additions & 0 deletions
@@ -33,6 +33,11 @@ module.exports = {
         description: 'Create, combine and export PDFs',
         path: '../document-services/apis/pdf-services/'
       },
+      {
+        title: 'PDF to Markdown',
+        description: 'Convert PDF documents to Markdown format',
+        path: '../document-services/apis/pdf-to-markdown/'
+      },
       {
         title: 'PDF Accessibility Auto-Tag',
         description: 'Auto-tag PDF content to improve accessibility',
@@ -229,6 +234,10 @@ module.exports = {
         title: 'Extract PDF',
         path: 'overview/pdf-services-api/howtos/extract-pdf.md'
       },
+      {
+        title: 'PDF to Markdown API',
+        path: 'overview/pdf-services-api/howtos/pdf-to-markdown-api.md'
+      },
       {
         title: 'Get PDF Properties',
         path: 'overview/pdf-services-api/howtos/pdf-properties.md'
@@ -716,6 +725,10 @@ module.exports = {
         title: 'Extract PDF',
         path: 'overview/legacy-documentation/pdf-services-api/howtos/extract-pdf.md'
       },
+      {
+        title: 'PDF to Markdown API',
+        path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-to-markdown-api.md'
+      },
       {
         title: 'Get PDF Properties',
         path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-properties.md'

src/pages/apis/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Adobe PDF Services Open API spec
 description: The OpenAPI spec for Adobe PDF Services API endpoints, parameters, and responses.
-openAPISpec: https://raw.githubusercontent.com/AdobeDocs/pdfservices-api-documentation/main/src/pages/resources/openapi.json
+openAPISpec: https://raw.githubusercontent.com/AdobeDocs/pdfservices-api-documentation/pdf2md/src/pages/resources/openapi.json
 ---
 -[]

src/pages/overview/pdf-services-api/howtos/pdf-accessibility-checker-api.md

Lines changed: 1 addition & 2 deletions
@@ -3,7 +3,7 @@ title: PDF Accessibility Checker | How Tos | PDF Services API | Adobe PDF Servic
 ---
 # PDF Accessibility Checker

-The Accessibility Checker API verifies if PDF files meet the machine-verifiable requirements of PDF/UA and WCAG 2.0. It generates a report summarizing the findings of the accessibility checks. Additional human remediation may be required to ensure the reading order of elements is correct and that alternative text tags properly convey the meaning of images. The report contains links to documentation that assists in manually fixing problems using Adobe Acrobat Pro.
+The Accessibility Checker API verifies if PDF files meet the machine-verifiable requirements of PDF/UA and WCAG. It generates a report summarizing the findings of the accessibility checks. Additional human remediation may be required to ensure the reading order of elements is correct and that alternative text tags properly convey the meaning of images. The report contains links to documentation that assists in manually fixing problems using Adobe Acrobat Pro.

 ## API Parameters

@@ -316,7 +316,6 @@ curl --location --request POST 'https://pdf-services.adobe.io/operation/accessib
 }'
```

-
 ## Check accessibility for specified pages

 The sample below performs an accessibility check operation for specified pages of a given PDF.
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
---
title: PDF to Markdown API | Adobe PDF Services
description: Learn about the PDF to Markdown API service that converts PDF documents into well-formatted Markdown text.
---

# PDF to Markdown API

The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents – native or scanned – into well-formatted Markdown text. This service preserves the document's structure and formatting while converting it into a format that's widely used for LLM flows, content authoring and documentation.

## Structured Information Output Format

The output of a PDF to Markdown operation includes:

- A primary `.md` file containing the converted Markdown content

### Output Structure

The following is a summary of key elements in the converted Markdown:

#### Elements

Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:

- Text content with proper Markdown syntax
- Document hierarchy and structure
- Inline formatting and emphasis
- Links and references
- Images and figures
- Tables and complex layouts

#### Content Types

The API processes various content types as follows:

##### Text Elements

- **Headings**: Converted to appropriate Markdown heading levels (H1-H6)
- **Paragraphs**: Preserved with proper spacing and formatting
- **Lists**: Both ordered and unordered lists with proper nesting
- **Text Emphasis**: Bold, italic, and other text formatting
- **Links**: Preserved with proper Markdown link syntax
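
Since headings map to Markdown levels H1-H6, a consumer can rebuild a document outline from the converted text. The sketch below is illustrative only: it assumes the output uses ATX-style headings (`#` through `######`), which this page does not state, and `extractOutline` is a hypothetical helper name.

```javascript
// Minimal sketch, assuming ATX-style headings in the converted Markdown.
// Lines inside fenced code blocks are skipped so "#" comments are not
// mistaken for headings.
function extractOutline(markdown) {
  const outline = [];
  let inFence = false;
  for (const line of markdown.split(/\r?\n/)) {
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence; // toggle on opening and closing fences
      continue;
    }
    if (inFence) continue;
    const match = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (match) {
      outline.push({ level: match[1].length, text: match[2] });
    }
  }
  return outline;
}

// Usage with a small hand-written sample:
const sample = "# Report\n\nIntro text.\n\n## Findings\n";
console.log(extractOutline(sample));
```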

##### Images and Figures

- Provided as base64-embedded images in the Markdown output
- Referenced correctly in the Markdown output
- Original quality preserved
- Proper alt text and captions maintained
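
Because images arrive base64-embedded, a downstream step may want to pull them out of the Markdown, for example to store them separately before sending text to an LLM. The following is a sketch under a loud assumption: that images appear in standard Markdown image syntax with a `data:` URI, such as `![alt](data:image/png;base64,...)`. The exact syntax is not specified on this page, and `extractEmbeddedImages` is a hypothetical helper. It relies on Node.js `Buffer`.

```javascript
// Minimal sketch, assuming images look like ![alt](data:image/<type>;base64,<data>).
// Returns the alt text, MIME type, and decoded bytes for each embedded image.
function extractEmbeddedImages(markdown) {
  const pattern =
    /!\[([^\]]*)\]\(data:(image\/[a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)\)/g;
  const images = [];
  for (const match of markdown.matchAll(pattern)) {
    images.push({
      alt: match[1],
      mimeType: match[2],
      bytes: Buffer.from(match[3], "base64"),
    });
  }
  return images;
}

// Usage with a tiny hand-made payload (not real image data):
const found = extractEmbeddedImages("![dot](data:image/png;base64,aGVsbG8=)");
console.log(found.length, found[0].mimeType);
```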

##### Tables

- Converted to Markdown table syntax
- Column alignment preserved
- Cell content formatting maintained
- Complex table structures supported

#### Element Types and Paths

The API recognizes and converts the following structural elements:

| Category  | Element Type      | Description                                               |
| --------- | ----------------- | --------------------------------------------------------- |
| Aside     | Aside             | Content which is not part of regular content flow         |
| Figure    | Figure            | Non-reflowable constructs like graphs, images, flowcharts |
| Footnote  | Footnote          | Footnote                                                  |
| Headings  | H, H1, H2, etc.   | Heading levels                                            |
| List      | L, Li, Lbl, Lbody | List and list item elements                               |
| Paragraph | P, ParagraphSpan  | Paragraphs and paragraph segments                         |
| Reference | Reference         | Links                                                     |
| Section   | Sect              | Logical section of the document                           |
| StyleSpan | StyleSpan         | Styling variations within text                            |
| Table     | Table, TD, TH, TR | Table elements                                            |
| Title     | Title             | Document title                                            |

### Reading Order

The reading order in the output Markdown maintains:

- Natural document flow
- Proper content hierarchy
- Column-based layouts
- Page transitions
- Inline elements and references
85+
## Use Cases
86+
87+
The PDF to Markdown API is particularly valuable for:
88+
89+
- LLM-friendly content ingestion and prompt creation
90+
- Training/Fine-tuning LLM with PDFs
91+
- Content migration from PDF to documentation platforms
92+
- Legacy document conversion
93+
- Content repurposing for modern documentation systems
94+
- Integration with Markdown-based workflows
95+
- Automated document processing pipelines
96+
- Searchable internal knowledge repositories
97+
98+
## API Limitations
99+
100+
### File Constraints
101+
102+
- **File Size**: Maximum of 100MB per file
103+
- **Page Count**:
104+
- Non-scanned PDFs: Up to 400 pages
105+
- Scanned PDFs: Up to 150 pages
106+
- **Page Dimensions**: Between 6" and 17.5" in either dimension
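
The constraints above can be checked on the client before uploading. This is a sketch rather than an official validator: `LIMITS`, `checkFileConstraints`, and the shape of the `doc` argument are hypothetical, "100MB" is read as 100 × 1024 × 1024 bytes (an assumption), and the caller is expected to obtain page count and page sizes from its own PDF tooling.

```javascript
// Hypothetical pre-flight check built from the limits listed above.
const LIMITS = {
  maxBytes: 100 * 1024 * 1024, // assumption: "100MB" interpreted as 100 MiB
  maxPagesNative: 400,
  maxPagesScanned: 150,
  minSideInches: 6,
  maxSideInches: 17.5,
};

// doc: { sizeBytes, pageCount, isScanned, pageSizesInches: [[width, height], ...] }
// Returns a list of human-readable problems; an empty list means no limit was hit.
function checkFileConstraints(doc) {
  const problems = [];
  if (doc.sizeBytes > LIMITS.maxBytes) {
    problems.push("File exceeds the 100MB size limit");
  }
  const maxPages = doc.isScanned ? LIMITS.maxPagesScanned : LIMITS.maxPagesNative;
  if (doc.pageCount > maxPages) {
    problems.push(`Page count ${doc.pageCount} exceeds the limit of ${maxPages}`);
  }
  doc.pageSizesInches.forEach(([width, height], index) => {
    const outOfRange = [width, height].some(
      (side) => side < LIMITS.minSideInches || side > LIMITS.maxSideInches
    );
    if (outOfRange) {
      problems.push(`Page ${index + 1} is outside the 6" to 17.5" range`);
    }
  });
  return problems;
}

// Usage: a 10-page US Letter document of about 2 MB.
console.log(
  checkFileConstraints({
    sizeBytes: 2000000,
    pageCount: 10,
    isScanned: false,
    pageSizesInches: Array(10).fill([8.5, 11]),
  })
);
```

Page sizes read from a PDF are usually in points; dividing by 72 gives inches.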

### Processing Limits

- **Rate Limits**: Maximum 25 requests per minute
- **Language Support**: Optimized for English, supports other Latin-based languages
- **OCR Quality**: Dependent on scan quality (minimum 200 DPI recommended)
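
To stay under the 25 requests per minute limit, a client can throttle itself. The sketch below uses a sliding 60-second window; how the service actually measures the window is not described here, so treat that as an assumption. `SlidingWindowLimiter` is a hypothetical class, and the clock is passed in so the behaviour can be checked without waiting.

```javascript
// Hypothetical client-side throttle: allows at most maxRequests in any
// windowMs-long interval, tracked by the timestamps of accepted requests.
class SlidingWindowLimiter {
  constructor(maxRequests = 25, windowMs = 60000) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  // Returns 0 and records the request if it may be sent now;
  // otherwise returns the number of milliseconds to wait.
  tryAcquire(now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    while (
      this.timestamps.length > 0 &&
      now - this.timestamps[0] >= this.windowMs
    ) {
      this.timestamps.shift();
    }
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(now);
      return 0;
    }
    return this.windowMs - (now - this.timestamps[0]);
  }
}

// Usage: check before each API call and delay when a wait is returned.
const limiter = new SlidingWindowLimiter();
const waitMs = limiter.tryAcquire();
console.log(waitMs === 0 ? "send now" : `wait ${waitMs} ms`);
```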

### Document Requirements

- Files must be unprotected or allow content copying
- No support for:
  - Hidden objects (JavaScript, OCG)
  - XFA and fillable forms
  - Complex annotations
  - CAD drawings or vector art
  - Password-protected content

## REST API

See our public API Reference for [PDF to Markdown API](../../../apis/#tag/PDF-To-Markdown).
