Incorporated options to embed or save images off of PDF #1336

ebinomial · 2025-07-16T09:33:43Z

Short summary

Adds an option to use the pymupdf4llm package to embed images from PDF files as base64 strings in the output markdown. Alternatively, extracted images can be saved locally and referenced from the markdown at their corresponding positions. Both behaviors are configured through optional arguments that are passed through to the pymupdf4llm API.
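
To illustrate what embedding an image as base64 means in markdown, here is a minimal sketch using only the standard library. The helper name `image_to_markdown_data_uri` and the sample bytes are hypothetical, and the exact string that pymupdf4llm emits may differ.

```python
import base64


def image_to_markdown_data_uri(
    image_bytes: bytes, mime_type: str = "image/jpeg", alt_text: str = ""
) -> str:
    """Return a markdown image tag whose source is a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"![{alt_text}](data:{mime_type};base64,{encoded})"


# Placeholder bytes standing in for real image data (not a valid JPEG).
sample = b"\xff\xd8\xff\xe0 fake image payload"
tag = image_to_markdown_data_uri(sample)
print(tag[:40])
```

Because the image travels inside the markdown text itself, the output is a single self-contained file, at the cost of a larger document than one that references saved image files.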

Changes

Added an optional boolean kwarg use_pdf4llm in _pdf_converter.py
Added an optional dict kwarg, args_pdf4llm, to configure pymupdf4llm. Its keys follow the original package's API.
Added optional dependency use_pdf4llm to pyproject.toml
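
The changes listed above could be wired together roughly as follows. This is a sketch under assumptions, not the code from this PR: the function name `pdf_to_markdown` is hypothetical, and it assumes the default path extracts text with pdfminer.six while the opt-in path forwards `args_pdf4llm` to `pymupdf4llm.to_markdown`.

```python
from typing import Any, Dict, Optional


def pdf_to_markdown(
    path: str,
    use_pdf4llm: bool = False,
    args_pdf4llm: Optional[Dict[str, Any]] = None,
):
    """Convert a PDF to text or markdown (hypothetical helper, not the PR's code)."""
    if use_pdf4llm:
        try:
            # Imported lazily so the optional dependency is only needed on opt-in.
            import pymupdf4llm
        except ImportError as exc:
            raise RuntimeError(
                "use_pdf4llm=True requires the optional pymupdf4llm package"
            ) from exc
        # Forward the user's dict unchanged; an absent dict means library defaults.
        return pymupdf4llm.to_markdown(path, **(args_pdf4llm or {}))

    # Assumed default path: plain text extraction via pdfminer.six.
    from pdfminer.high_level import extract_text

    return extract_text(path)
```

Importing inside the function keeps the new dependency optional: users who never set `use_pdf4llm=True` do not need pymupdf4llm installed. Note that with `"page_chunks": True` pymupdf4llm returns a list of per-page dicts rather than a single string, so a caller expecting a string would likely need to handle that case.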

Usage

from markitdown import MarkItDown

# Option 1: save extracted images to disk and reference them from the markdown
save_args = {
    "page_chunks": False,
    "write_images": True,
    "image_path": "imgs/",
    "image_format": "jpeg",
    "dpi": 150,
}

# Option 2: embed images in the markdown as base64 strings
embed_args = {
    "page_chunks": False,
    "write_images": False,
    "embed_images": True,
    "dpi": 150,
}

md = MarkItDown(enable_builtins=True)
# Pass embed_args to embed images, or save_args to write them to disk instead
response = md.convert("input.pdf", use_pdf4llm=True, args_pdf4llm=embed_args)

Related issue

Addressed #1238

ebinomial · 2025-07-16T09:51:30Z

@microsoft-github-policy-service agree

Incorporated options to embed or save images off of PDF

bdfbab8

ebinomial closed this Jul 16, 2025

ebinomial reopened this Jul 16, 2025
