pyxpdf is a fast, memory-efficient Python module for parsing PDF documents, based on the xpdf reader sources.
- Almost 20 times faster than pure-Python PDF parsers (see Speed Comparison)
- Extract text while preserving the original document layout as closely as possible
- Support almost all PDF encodings, CMaps and predefined CMaps.
- Extract LZW, RLE, CCITTFax, DCT, JBIG2 and JPX compressed images and image masks along with their BBox.
- Render PDF pages as images, with support for the '1', 'L', 'LA', 'RGB', 'RGBA' and 'CMYK' color modes.
- No explicit dependencies (except optional ones, see Installation)
- Thread Safe
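
Because the library is described as thread safe, one way to use it is to hand each document to a worker thread. The following is a minimal sketch, not part of pyxpdf itself: `extract_all` and `pyxpdf_text` are hypothetical helper names, and the `Document(path).text()` call is an assumption about the pyxpdf API that should be checked against the documentation. The extractor is passed in as a parameter so that any callable can be substituted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def pyxpdf_text(path: str) -> str:
    """Extract all text from one PDF (assumed pyxpdf usage)."""
    # Imported lazily so this module loads even without pyxpdf installed.
    from pyxpdf import Document  # assumption: top-level Document class

    return Document(path).text()  # assumption: text() returns the full text


def extract_all(
    paths: Iterable[str],
    extract: Callable[[str], str] = pyxpdf_text,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run `extract` on every path in a thread pool; map path -> text."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its result.
        return dict(zip(paths, pool.map(extract, paths)))
```

Under those assumptions, `extract_all(["a.pdf", "b.pdf"])` would return a dictionary keyed by file path. Whether threads give an actual speedup depends on how much of the parsing runs outside the interpreter lock, which this sketch does not measure.
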
pyxpdf is licensed under the GNU General Public License (GPL),
version 2 or 3. See the LICENSE file.
- xpdf reader by Derek Noonburg
- lxml - project structure and build adapted from lxml
- poppler project