Hi everyone, I'm trying to read an image file that was run through OCR with ABBYY, and I'm using the PDFBox API to read it.
The code is:
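import java.io.IOException;
import java.io.StringWriter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
public static String lerPDF(String aARQUIVO) throws IOException {
    PDDocument pdDoc = PDDocument.load(aARQUIVO); // open the PDF at the given path
    try {
        PDFTextStripper stripper = new PDFTextStripper();
        StringWriter writer = new StringWriter();
        stripper.writeText(pdDoc, writer); // extract the text of all pages into the writer
        return writer.toString();
    } finally {
        pdDoc.close(); // always release the document, even if extraction fails
    }
}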
I'm getting the following error:
Dec 12, 2012 10:58:16 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:80)
at org.apache.pdfbox.pdmodel.PDResources.getFonts(PDResources.java:187)
at org.apache.pdfbox.util.PDFStreamEngine.getFonts(PDFStreamEngine.java:606)
at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:54)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at lerpdf.PDFBoxAux.lerPDF(PDFBoxAux.java:24)
at lerpdf.Main.main(Main.java:25)
Dec 12, 2012 10:58:16 AM org.apache.pdfbox.util.PDFStreamEngine processEncodedText
WARNING: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:357)
at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at lerpdf.PDFBoxAux.lerPDF(PDFBoxAux.java:24)
at lerpdf.Main.main(Main.java:25)
Dec 12, 2012 10:58:16 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.pdfbox.util.PDFStreamEngine.processEncodedText(PDFStreamEngine.java:366)
at org.apache.pdfbox.util.operator.ShowTextGlyph.process(ShowTextGlyph.java:62)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at lerpdf.PDFBoxAux.lerPDF(PDFBoxAux.java:24)
at lerpdf.Main.main(Main.java:25)
Dec 12, 2012 10:58:16 AM org.apache.pdfbox.util.PDFStreamEngine processOperator
WARNING: java.io.IOException: Error: Could not find font(COSName{F0}) in map={}
java.io.IOException: Error: Could not find font(COSName{F0}) in map={}
at org.apache.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:57)
at org.apache.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:556)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:270)
at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:237)
at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:217)
at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:448)
at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:372)
at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:328)
at lerpdf.PDFBoxAux.lerPDF(PDFBoxAux.java:24)
at lerpdf.Main.main(Main.java:25)
…
Does anyone know what is going on?
Thanks in advance.