ebinomial commented Jul 16, 2025

Short summary

Added an option to use the pymupdf4llm package to embed base64-encoded images from PDF files directly in the output markdown. Alternatively, extracted images can be saved locally and referenced from the markdown at their original positions, using the additional optional arguments documented in the pymupdf4llm API.
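The two image modes differ only in what the markdown image link points at. The sketch below illustrates that difference with the standard library alone. `embed_as_data_uri` and `reference_local_file` are hypothetical helpers (not part of markitdown or pymupdf4llm), and the exact link text pymupdf4llm emits may differ.

```python
import base64


def embed_as_data_uri(image_bytes: bytes, image_format: str = "png", alt: str = "") -> str:
    """Hypothetical helper: markdown image link whose target is a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # A data URI carries the MIME type followed by the base64 payload,
    # so the image travels inside the markdown text itself.
    return f"![{alt}](data:image/{image_format};base64,{encoded})"


def reference_local_file(image_path: str, alt: str = "") -> str:
    """Hypothetical helper: markdown image link pointing at a file saved on disk."""
    # Only the path is stored; the image bytes live in a separate file.
    return f"![{alt}]({image_path})"


# Example file name is illustrative only.
inline_link = embed_as_data_uri(b"\x89PNG", "png")
local_link = reference_local_file("imgs/page1-img1.jpeg")
```

Embedding keeps the markdown self-contained at the cost of a much larger file; referencing keeps the markdown small but requires shipping the image directory alongside it.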

Changes

  • Added an optional boolean kwarg use_pdf4llm to _pdf_converter.py
  • Added a dict kwarg, args_pdf4llm, to configure pymupdf4llm. Its keys follow the original pymupdf4llm API.
  • Added optional dependency use_pdf4llm to pyproject.toml
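A minimal sketch of how the two kwargs might be routed inside a converter, assuming the options dict is forwarded unchanged to `pymupdf4llm.to_markdown`. This is not the code in `_pdf_converter.py`; `convert_pdf_sketch` and its injectable `to_markdown` and `fallback` parameters are hypothetical, added so the logic can be exercised without pymupdf4llm installed.

```python
from typing import Any, Callable, Dict, Optional


def convert_pdf_sketch(
    path: str,
    use_pdf4llm: bool = False,
    args_pdf4llm: Optional[Dict[str, Any]] = None,
    to_markdown: Optional[Callable[..., Any]] = None,
    fallback: Optional[Callable[[str], str]] = None,
) -> Any:
    """Hypothetical routing for use_pdf4llm / args_pdf4llm (illustration only)."""
    if not use_pdf4llm:
        # Default path: defer to the existing PDF converter (injected here).
        if fallback is None:
            raise ValueError("no default converter supplied to this sketch")
        return fallback(path)

    if to_markdown is None:
        # Import lazily so pymupdf4llm remains an optional dependency.
        import pymupdf4llm
        to_markdown = pymupdf4llm.to_markdown

    # Copy so the caller's dict is never mutated by the callee.
    options = dict(args_pdf4llm or {})
    return to_markdown(path, **options)
```

The lazy import means users who never pass `use_pdf4llm=True` do not need the extra package installed.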

Usage

from markitdown import MarkItDown

# Option 1: save images to disk and reference them from the markdown
save_images_args = {
    "page_chunks": False,
    "write_images": True,
    "image_path": "imgs/",
    "image_format": "jpeg",
    "dpi": 150
}

# Option 2: keep images embedded in the markdown as base64 strings
embed_images_args = {
    "page_chunks": False,
    "write_images": False,
    "embed_images": True,
    "dpi": 150
}

md = MarkItDown(enable_builtins=True)
# Pass whichever option you need via args_pdf4llm
response = md.convert("input.pdf", use_pdf4llm=True, args_pdf4llm=embed_images_args)
print(response.text_content)
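The two example dictionaries share `page_chunks` and `dpi` and differ only in their image keys, so a small helper can produce either one. `build_pdf4llm_args` is a hypothetical convenience function, not part of this PR; its keys simply mirror the example dictionaries.

```python
from typing import Any, Dict


def build_pdf4llm_args(mode: str, dpi: int = 150, image_path: str = "imgs/",
                       image_format: str = "jpeg") -> Dict[str, Any]:
    """Hypothetical helper: build an args_pdf4llm dict for "save" or "embed" mode."""
    base: Dict[str, Any] = {"page_chunks": False, "dpi": dpi}
    if mode == "save":
        # Write image files to disk and reference them from the markdown.
        base.update(write_images=True, image_path=image_path, image_format=image_format)
    elif mode == "embed":
        # Keep images inline as base64 and write nothing to disk.
        base.update(write_images=False, embed_images=True)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return base
```

For example, `md.convert("input.pdf", use_pdf4llm=True, args_pdf4llm=build_pdf4llm_args("save"))` would reproduce the first configuration above.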

Related issue

Addressed #1238

ebinomial closed this Jul 16, 2025
ebinomial reopened this Jul 16, 2025
ebinomial (Author)

@BrainyLark please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree
