pdf - PDFBox - document is empty after loading -


I am using Apache PDFBox to render thumbnails of PDF documents. To do that, I load the PDF and use the first page as the thumbnail. The problem is that one particular document does not seem to be loaded correctly. For other documents, it works as expected.

ByteArrayInputStream is = new ByteArrayInputStream(pdfData);

PDDocument pdf = PDDocument.load(is, true);

List<PDPage> pages = pdf.getDocumentCatalog().getAllPages(); // pages is empty here

The PDF file has 238 pages and is around 6.5 MB in size.

Assuming you're using a 1.8.* version, please use the non-sequential parser:

PDDocument pdf = PDDocument.loadNonSeq(is, null);

The non-sequential parser is successful in many cases where the old parser fails, e.g. with PDFs that have had revisions. Another advantage is that no extra code is needed for "protected" PDFs that are encrypted with an empty password.
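To check whether a problem file is one of those revised PDFs, a rough stand-alone sketch like the following may help. It relies only on the fact that each incremental save of a PDF appends a new cross-reference section ending in a `startxref` offset and an `%%EOF` marker. The class and method names (`PdfRevisionCheck`, `countMarker`, `lastStartXref`) are made up for this illustration and are not part of PDFBox, and counting markers in the raw bytes is only a heuristic, since binary stream data could contain the same bytes by coincidence.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of PDFBox: a heuristic look at the raw PDF bytes.
class PdfRevisionCheck {

    // Returns true if the ASCII marker bytes occur in data at position pos.
    static boolean matchesAt(byte[] data, int pos, byte[] marker) {
        if (pos < 0 || pos + marker.length > data.length) {
            return false;
        }
        for (int j = 0; j < marker.length; j++) {
            if (data[pos + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    // Counts non-overlapping occurrences of an ASCII marker such as "%%EOF".
    // More than one "%%EOF" suggests the file was saved incrementally.
    static int countMarker(byte[] data, String marker) {
        byte[] m = marker.getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        int i = 0;
        while (i + m.length <= data.length) {
            if (matchesAt(data, i, m)) {
                count++;
                i += m.length;
            } else {
                i++;
            }
        }
        return count;
    }

    // Returns the byte offset written after the last "startxref" keyword,
    // or -1 if the keyword or the number is missing.
    static long lastStartXref(byte[] data) {
        byte[] m = "startxref".getBytes(StandardCharsets.US_ASCII);
        for (int i = data.length - m.length; i >= 0; i--) {
            if (!matchesAt(data, i, m)) {
                continue;
            }
            int p = i + m.length;
            // Skip the whitespace between the keyword and the offset.
            while (p < data.length
                    && (data[p] == ' ' || data[p] == '\r' || data[p] == '\n' || data[p] == '\t')) {
                p++;
            }
            long value = 0;
            boolean sawDigit = false;
            while (p < data.length && data[p] >= '0' && data[p] <= '9') {
                value = value * 10 + (data[p] - '0');
                sawDigit = true;
                p++;
            }
            return sawDigit ? value : -1;
        }
        return -1;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("%%EOF markers: " + countMarker(data, "%%EOF"));
        System.out.println("last startxref offset: " + lastStartXref(data));
    }
}
```

If the file reports more than one `%%EOF` marker, it has probably been revised at least once, which is the kind of document where trying `loadNonSeq` is worthwhile.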