PDFBox needs some installation first (assuming it isn't already installed, e.g. by your Linux distribution):
Check out the PDFBox trunk from SVN, then:
apt-get install maven2
mvn clean install
Once it has built successfully, use it like:
usage: java -jar pdfbox-app-x.y.z.jar ExtractText [OPTIONS] <PDF file> [Text File]
e.g.:
rbarraud@thinky:~/Desktop/tools/PDFBox/trunk/app/target$ java -jar pdfbox-app-1.5.0-SNAPSHOT.jar ExtractText -html /home/rbarraud/Desktop/Reference/pdfs/IMX25RM.pdf /tmp/MX25RM.html
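If there are several manuals to convert, the same invocation can be wrapped in a small shell helper. This is only a rough sketch: the PDFBOX_JAR variable and the function names html_name_for and extract_all are made up for illustration, and it assumes the app jar is in the current directory (or that PDFBOX_JAR points at it) and that /tmp is an acceptable default output directory.

```shell
#!/bin/sh
# Sketch only: PDFBOX_JAR and both function names are assumptions,
# not part of PDFBox itself.
PDFBOX_JAR="${PDFBOX_JAR:-pdfbox-app-1.5.0-SNAPSHOT.jar}"

# Derive the HTML output path for a PDF:
#   /some/dir/IMX25RM.pdf -> <outdir>/IMX25RM.html (outdir defaults to /tmp)
html_name_for() {
    src="$1"
    outdir="${2:-/tmp}"
    printf '%s/%s.html\n' "$outdir" "$(basename "$src" .pdf)"
}

# Convert every PDF passed as an argument, using the same
# "ExtractText -html" invocation as in the example above.
# Stops at the first failed conversion.
extract_all() {
    for pdf in "$@"; do
        java -jar "$PDFBOX_JAR" ExtractText -html "$pdf" "$(html_name_for "$pdf")" || return 1
    done
}
```

After sourcing the file, something like `extract_all ~/Desktop/Reference/pdfs/*.pdf` should leave one .html file per manual in /tmp.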
I want to grab tables out of processor reference manuals (PDFs) to make tools for browsing machine state by register name, etc.