This submission calls the PDFTextStripper class of Ben Litchfield's PDFBox Java library to extract text from a PDF document.
1. Download the PDFBox library from http://sourceforge.net/projects/pdfbox/
2. Download the FontBox library from http://sourceforge.net/projects/fontbox/
3. Modify the file paths in pdfParseDemo.m
4. Enable cell mode and step through pdfParseDemo.m
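For orientation, the steps above amount to roughly the following sketch. It is not the contents of pdfParseDemo.m. The three file paths are hypothetical placeholders, and the `org.pdfbox` package prefix assumes the SourceForge-era releases linked above; later Apache PDFBox releases use `org.apache.pdfbox` and have a different API in places.

```matlab
% Hypothetical paths: replace with the locations of your jars and PDF.
pdfboxJar  = 'C:\libs\pdfbox.jar';
fontboxJar = 'C:\libs\fontbox.jar';
pdfFile    = 'C:\docs\example.pdf';

% Put both libraries on MATLAB's dynamic Java class path.
javaaddpath(pdfboxJar);
javaaddpath(fontboxJar);

% Assumed package prefix for the SourceForge-era PDFBox releases.
pdfDoc = org.pdfbox.pdmodel.PDDocument.load(pdfFile);
try
    stripper = org.pdfbox.util.PDFTextStripper();
    pdfText  = char(stripper.getText(pdfDoc));  % Java String -> MATLAB char
catch err
    pdfDoc.close();                             % release the file on failure
    rethrow(err);
end
pdfDoc.close();

% Show the first few hundred characters of the extracted text.
disp(pdfText(1:min(300, numel(pdfText))));
```

Closing the document in both the success and failure paths keeps the PDF file from staying locked by the MATLAB session if extraction throws.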
The code does not handle files whose 'Content Copying' permission is protected by a password; contributions to remedy this are very welcome!
Cite As
Dimitri Shvorob (2026). Extract text from a PDF document (https://in.mathworks.com/matlabcentral/fileexchange/19798-extract-text-from-a-pdf-document), MATLAB Central File Exchange.
General Information
- Version 1.0.0.0 (164 KB)
MATLAB Release Compatibility
- Compatible with any release
Platform Compatibility
- Windows
- macOS
- Linux
| Version | Published | Release Notes |
|---|---|---|
| 1.0.0.0 | | BSD |