Commit 996f4fb

author
palaka
committed
developer docs changes markdown'
1 parent ab66dae commit 996f4fb

3 files changed, 146 additions and 0 deletions

gatsby-browser.js

Lines changed: 7 additions & 0 deletions
@@ -262,6 +262,13 @@ export const onRouteUpdate = ({ location, prevLocation }) => {
   ) {
     pageHeadTittle = "PDF Services API Extract PDF";
   } else if (
+    window.location.pathname.indexOf(
+      "pdf-services-api/howtos/pdf-to-markdown-api/"
+    ) >= 0
+  ) {
+    pageHeadTittle = "PDF Services API PDF to Markdown API";
+  }
+  else if (
     window.location.pathname.indexOf(
       "pdf-services-api/howtos/pdf-properties/"
     ) >= 0
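
Each new how-to page adds another branch to this `else if` chain. As a possible alternative, and only a sketch, the same first-match lookup could be driven by a table. The names `PAGE_TITLES` and `titleForPath` are hypothetical and not part of this commit; only the PDF to Markdown entry is taken from the hunk above, and the remaining entries would have to be copied from the existing branches.

```javascript
// Hypothetical table of [path fragment, page title] pairs, checked in order.
// Only this entry comes from the commit; the other existing branches
// are not shown in the diff and are therefore omitted here.
const PAGE_TITLES = [
  [
    "pdf-services-api/howtos/pdf-to-markdown-api/",
    "PDF Services API PDF to Markdown API",
  ],
];

// Returns the title of the first fragment contained in pathname, or
// fallbackTitle when nothing matches (first match wins, as in the chain).
function titleForPath(pathname, fallbackTitle = "") {
  for (const [fragment, title] of PAGE_TITLES) {
    if (pathname.indexOf(fragment) >= 0) {
      return title;
    }
  }
  return fallbackTitle;
}
```

With such a helper, `onRouteUpdate` could assign `pageHeadTittle = titleForPath(window.location.pathname, pageHeadTittle)`, assuming the order of the table mirrors the order of the current branches.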

gatsby-config.js

Lines changed: 13 additions & 0 deletions
@@ -33,6 +33,11 @@ module.exports = {
         description: 'Create, combine and export PDFs',
         path: '../document-services/apis/pdf-services/'
       },
+      {
+        title: 'PDF to Markdown',
+        description: 'Convert PDF documents to Markdown format',
+        path: '../document-services/apis/pdf-to-markdown/'
+      },
       {
         title: 'PDF Accessibility Auto-Tag',
         description: 'Auto-tag PDF content to improve accessibility',
@@ -229,6 +234,10 @@ module.exports = {
         title: 'Extract PDF',
         path: 'overview/pdf-services-api/howtos/extract-pdf.md'
       },
+      {
+        title: 'PDF to Markdown API',
+        path: 'overview/pdf-services-api/howtos/pdf-to-markdown-api.md'
+      },
       {
         title: 'Get PDF Properties',
         path: 'overview/pdf-services-api/howtos/pdf-properties.md'
@@ -716,6 +725,10 @@ module.exports = {
         title: 'Extract PDF',
         path: 'overview/legacy-documentation/pdf-services-api/howtos/extract-pdf.md'
       },
+      {
+        title: 'PDF to Markdown API',
+        path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-to-markdown-api.md'
+      },
       {
         title: 'Get PDF Properties',
         path: 'overview/legacy-documentation/pdf-services-api/howtos/pdf-properties.md'
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
---
title: PDF to Markdown API | Adobe PDF Services
description: Learn about the PDF to Markdown API service that converts PDF documents into well-formatted Markdown text.
---

# PDF to Markdown API

The PDF to Markdown API (included with the PDF Services API) is a cloud-based web service that automatically converts PDF documents – native or scanned – into well-formatted Markdown text. This service preserves the document's structure and formatting while converting it into a format that's widely used for LLM flows, content authoring and documentation.

## Structured Information Output Format

The output of a PDF to Markdown operation includes:

- A primary `.md` file containing the converted Markdown content

### Output Structure

The following is a summary of key elements in the converted Markdown:

#### Elements

Ordered list of semantic elements converted from the PDF document, preserving the natural reading order and document structure. The conversion handles:

- Text content with proper Markdown syntax
- Document hierarchy and structure
- Inline formatting and emphasis
- Links and references
- Images and figures
- Tables and complex layouts

#### Content Types

The API processes various content types as follows:

##### Text Elements

- **Headings**: Converted to appropriate Markdown heading levels (H1-H6)
- **Paragraphs**: Preserved with proper spacing and formatting
- **Lists**: Both ordered and unordered lists with proper nesting
- **Text Emphasis**: Bold, italic, and other text formatting
- **Links**: Preserved with proper Markdown link syntax

##### Images and Figures

- Provided as base64-embedded images in the Markdown output
- Referenced correctly in the Markdown output
- Original quality preserved
- Proper alt text and captions maintained
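
Because images are embedded as base64 in the Markdown itself, a consumer may want to pull them back out. The following is a minimal Node.js sketch, not the service's documented behavior: it assumes the images appear in standard Markdown image syntax with a `data:` URI (for example `![alt](data:image/png;base64,...)`), which this page does not confirm. The function name `extractEmbeddedImages` is hypothetical.

```javascript
// Sketch: list base64-embedded images found in converted Markdown.
// Assumption: each image is written as ![alt](data:<mime>;base64,<payload>).
// The actual output format of the service may differ.
function extractEmbeddedImages(markdown) {
  const pattern =
    /!\[([^\]]*)\]\(data:(image\/[a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+\/=]+)\)/g;
  const images = [];
  for (const match of markdown.matchAll(pattern)) {
    const [, altText, mimeType, payload] = match;
    images.push({
      altText,
      mimeType,
      // Decode the base64 payload into raw bytes (Node.js Buffer).
      bytes: Buffer.from(payload, "base64"),
    });
  }
  return images;
}
```

Each returned entry could then be written to disk with `fs.writeFileSync`, or the data URI could be replaced with a file reference to keep the Markdown small before passing it to an LLM.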

##### Tables

- Converted to Markdown table syntax
- Column alignment preserved
- Cell content formatting maintained
- Complex table structures supported

#### Element Types and Paths

The API recognizes and converts the following structural elements:

| Category  | Element Type      | Description                                               |
| --------- | ----------------- | --------------------------------------------------------- |
| Aside     | Aside             | Content which is not part of regular content flow         |
| Figure    | Figure            | Non-reflowable constructs like graphs, images, flowcharts |
| Footnote  | Footnote          | Footnote                                                  |
| Headings  | H, H1, H2, etc.   | Heading levels                                            |
| List      | L, Li, Lbl, Lbody | List and list item elements                               |
| Paragraph | P, ParagraphSpan  | Paragraphs and paragraph segments                         |
| Reference | Reference         | Links                                                     |
| Section   | Sect              | Logical section of the document                           |
| StyleSpan | StyleSpan         | Styling variations within text                            |
| Table     | Table, TD, TH, TR | Table elements                                            |
| Title     | Title             | Document title                                            |

### Reading Order

The reading order in the output Markdown maintains:

- Natural document flow
- Proper content hierarchy
- Column-based layouts
- Page transitions
- Inline elements and references

## Use Cases

The PDF to Markdown API is particularly valuable for:

- LLM-friendly content ingestion and prompt creation
- Training or fine-tuning LLMs with PDFs
- Content migration from PDF to documentation platforms
- Legacy document conversion
- Content repurposing for modern documentation systems
- Integration with Markdown-based workflows
- Automated document processing pipelines
- Searchable internal knowledge repositories

## API Limitations

### File Constraints

- **File Size**: Maximum of 100MB per file
- **Page Count**:
  - Non-scanned PDFs: Up to 400 pages
  - Scanned PDFs: Up to 150 pages
- **Page Dimensions**: Between 6" and 17.5" in either dimension
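
The constraints above can be checked on the client before a file is uploaded. The sketch below is illustrative only: the function `checkFileConstraints` and its input shape are hypothetical, it assumes the caller already knows the page count, page sizes, and whether the PDF is scanned, it treats 100MB as 100 × 1024 × 1024 bytes, and it reads the dimension rule as "both width and height must fall within 6" to 17.5"".

```javascript
// Limits taken from the File Constraints list; the byte interpretation of
// "100MB" is an assumption.
const LIMITS = {
  maxFileBytes: 100 * 1024 * 1024,
  maxPagesNative: 400,
  maxPagesScanned: 150,
  minPageInches: 6,
  maxPageInches: 17.5,
};

// Returns a list of human-readable problems; an empty list means the
// document passes these local checks (the service may still reject it).
function checkFileConstraints({ sizeBytes, pageCount, isScanned, pageSizesInches }) {
  const problems = [];
  if (sizeBytes > LIMITS.maxFileBytes) {
    problems.push(`File is ${sizeBytes} bytes; limit is ${LIMITS.maxFileBytes}.`);
  }
  const maxPages = isScanned ? LIMITS.maxPagesScanned : LIMITS.maxPagesNative;
  if (pageCount > maxPages) {
    problems.push(`Document has ${pageCount} pages; limit is ${maxPages}.`);
  }
  const inRange = (inches) =>
    inches >= LIMITS.minPageInches && inches <= LIMITS.maxPageInches;
  pageSizesInches.forEach(([width, height], index) => {
    if (!inRange(width) || !inRange(height)) {
      problems.push(`Page ${index + 1} is ${width} x ${height} inches; out of range.`);
    }
  });
  return problems;
}
```

For example, a 10-page native PDF with US Letter pages (`[[8.5, 11], ...]`) yields an empty list, while a 151-page scanned PDF yields one page-count problem.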

### Processing Limits

- **Rate Limits**: Maximum 25 requests per minute
- **Language Support**: Optimized for English, supports other Latin-based languages
- **OCR Quality**: Dependent on scan quality (minimum 200 DPI recommended)
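
A batch client can stay under the 25 requests per minute limit by pacing itself locally. The following is a minimal sketch of a sliding-window limiter; how the service actually counts requests (sliding or fixed window) is not stated here, so this is an assumption, and `createRateLimiter` is a hypothetical helper name.

```javascript
// Sketch of a client-side sliding-window limiter.
// The returned function reports how long to wait before sending a request:
// 0 means "send now" (and the request is recorded); a positive number is
// the delay in milliseconds until a slot frees up (nothing is recorded).
function createRateLimiter(maxRequests = 25, windowMs = 60000) {
  const sentAt = []; // timestamps of requests still inside the window
  return function msUntilNextSlot(now = Date.now()) {
    // Drop timestamps that have aged out of the window.
    while (sentAt.length > 0 && now - sentAt[0] >= windowMs) {
      sentAt.shift();
    }
    if (sentAt.length < maxRequests) {
      sentAt.push(now);
      return 0;
    }
    return windowMs - (now - sentAt[0]);
  };
}
```

A caller would check the returned delay before each request and, when it is non-zero, wait that long (for instance with `setTimeout`) and ask again.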

### Document Requirements

- Files must be unprotected or allow content copying
- No support for:
  - Hidden objects (JavaScript, OCG)
  - XFA and fillable forms
  - Complex annotations
  - CAD drawings or vector art
  - Password-protected content

## REST API

See our public API Reference for [PDF to Markdown API](../../../apis/#tag/PDF-to-Markdown).
