Problems reading a PDF with iText 7 (works with iText 5)

Here is the code that reads a PDF with iText 5, and it works:
public class CreateTOC {
public static final String SRC = "file.pdf";
class FontRenderFilter extends RenderFilter {
public boolean allowText(TextRenderInfo renderInfo) {
String font = renderInfo.getFont().getPostscriptFontName();
return font.endsWith("Bold") || font.endsWith("Oblique");
}
}
public static void main(String[] args) throws IOException, DocumentException {
new CreateTOC().parse(SRC);
}
public void parse(String filename) throws IOException {
PdfReader reader = new PdfReader(filename);
Rectangle rect = new Rectangle(1000, 1000);
RenderFilter regionFilter = new RegionTextRenderFilter(rect);
FontRenderFilter fontFilter = new FontRenderFilter();
TextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), regionFilter, fontFilter);
System.out.println(PdfTextExtractor.getTextFromPage(reader, 56, strategy));
reader.close();
}
}
Can someone help me get this working in iText 7? There are problems with Rectangle and TextExtractionStrategy (the constructors are not the same as in iText 5).
Edit: RenderFilter isn't available in iText 7...

Related

How to add fields with the same name to a PDF

I'm using iText 7.1.8 and I need to add fields with the same name to a PDF. I use code like the following:
public class Main {
public static void main(String[] args) {
final PdfDocument emptyPdfDocument = createEmptyPdfDocument(pdf);
addTextField("Text_1", "Hello", emptyPdfDocument.getFirstPage(), PdfAcroForm.getAcroForm(emptyPdfDocument, true), emptyPdfDocument);
addTextField("Text_1", "Hello", emptyPdfDocument.addNewPage(), PdfAcroForm.getAcroForm(emptyPdfDocument, true), emptyPdfDocument);
savePdf(emptyPdfDocument);
}
private static void addTextField(String name, String value, PdfPage page, PdfAcroForm form, PdfDocument pdf) {
PdfFormField field = form.getField(name);
final Rectangle rect = new Rectangle(100, page.getCropBox().getHeight() - 100, 300, 20);
if (field != null) {
PdfWidgetAnnotation annotation = new PdfWidgetAnnotation(rect);
annotation.makeIndirect(pdf);
annotation.setVisibility(VISIBLE);
field.addKid(annotation);
page.addAnnotation(annotation);
return;
}
field = PdfFormField.createText(pdf, rect, name);
field.setValue(value);
field.setVisibility(VISIBLE);
page.addAnnotation(field.getWidgets().get(0));
form.addField(field, page);
}
private static PdfDocument createEmptyPdfDocument(final String pdfPath) throws IOException {
PdfWriter pdfWriter = new PdfWriter(new FileOutputStream(pdfPath));
final PdfDocument pdfDocument = new PdfDocument(pdfWriter);
pdfDocument.addNewPage();
return pdfDocument;
}
public static void savePdf(PdfDocument pdf) {
pdf.close();
}
}
But when addTextField is called the second time, the kids of the field are empty.
I don't understand what I'm doing wrong.

Mockito UnfinishedVerificationException

The problem is the following. I have several reports that I want to mock and test with Mockito. Each report throws the same UnfinishedVerificationException, and nothing I have tried so far has fixed the issue. An example of one of the reports, with all its parent classes, is below.
What I have tried so far:
I changed any to anyString.
I changed ReportSaver from an interface to an abstract class.
I added validateMockitoUsage to pin down the failing test.
I looked into similar Mockito-related cases on Stack Overflow.
Test:
public class ReportProcessorTest {
private ReportProcessor reportProcessor;
private ByteArrayOutputStream mockOutputStream = (new ReportProcessorMock()).mock();
@SuppressWarnings("serial")
private final static Map<String, Object> epxectedMaps = new HashMap<String, Object>();
@Before
public void setUp() throws IOException {
reportProcessor = mock(ReportProcessor.class);
ReflectionTestUtils.setField(reportProcessor, "systemOffset", "Europe/Berlin");
ReflectionTestUtils.setField(reportProcessor, "redisKeyDelimiter", "#");
Mockito.doNothing().when(reportProcessor).saveReportToDestination(Mockito.any(), Mockito.anyString());
Mockito.doCallRealMethod().when(reportProcessor).process(Mockito.any());
}
@Test
public void calculateSales() throws IOException {
Map<String, Object> processedReport = reportProcessor.process(mockOutputStream);
verify(reportProcessor, times(1)); // The line that causes the trouble
assertThat(Maps.difference(processedReport, epxectedMaps).areEqual(), Matchers.is(true));
}
@After
public void validate() {
Mockito.validateMockitoUsage();
}
}
Class under test:
@Component
public class ReportProcessor extends ReportSaver {
@Value("${system.offset}")
private String systemOffset;
@Value("${report.relativePath}")
private String destinationPathToSave;
@Value("${redis.delimiter}")
private String redisKeyDelimiter;
public Map<String, Object> process(ByteArrayOutputStream outputStream) throws IOException {
saveReportToDestination(outputStream, destinationPathToSave);
Map<String, Object> report = new HashMap<>();
try (InputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
InputStreamReader reader = new InputStreamReader(inputStream)) {
CSVReaderHeaderAware csvReader = new CSVReaderFormatter(outputStream).headerAware(reader);
Map<String, String> data;
while ((data = csvReader.readMap()) != null) {
// Use a separate name: redeclaring "data" here would not compile
String key = data.get("data").toUpperCase();
Long quantity = NumberUtils.toLong(data.get("quantity"));
report.put(key, quantity);
}
}
return report;
}
}
Parent class:
public abstract class ReportSaver {
public void saveReportToDestination(ByteArrayOutputStream outputStream, String destinationPathToSave) throws IOException {
File destinationFile = new File(destinationPathToSave);
destinationFile.getParentFile().mkdirs();
destinationFile.delete();
destinationFile.createNewFile();
// try-with-resources closes the file stream even if writeTo fails
try (OutputStream fileOutput = new FileOutputStream(destinationFile)) {
outputStream.writeTo(fileOutput);
}
}
}
Mock:
public class ReportProcessorMock implements GeneralReportProcessorMock {
private static final String report = ""; // There can be some data in here
@Override
public ByteArrayOutputStream mock() {
byte[] reportBytes = report.getBytes();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(reportBytes.length);
outputStream.write(reportBytes, 0, reportBytes.length);
return outputStream;
}
}

verify(reportProcessor, times(1)) on its own is incomplete. It must be followed by a call to the public method of the mock that you want to verify:
verify(reportProcessor, times(1)).process(mockOutputStream);
or use an argument matcher if appropriate:
verify(reportProcessor, times(1)).process(any(ByteArrayOutputStream.class));
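To see why the method call after verify(...) matters, here is a toy model in plain Java. It is not Mockito's implementation; RecordingProcessor, Verification and this verify method are hypothetical names invented for the illustration. The point it sketches is that verify(...) only hands back an object carrying the expectation, and the actual check runs when a method is called on that object. In real Mockito, leaving that call out is what gets reported as UnfinishedVerificationException the next time the framework is used (for example by validateMockitoUsage).

```java
import java.util.ArrayList;
import java.util.List;

public class VerifySketch {

    /** Hypothetical collaborator that records every call it receives. */
    public static class RecordingProcessor {
        final List<String> calls = new ArrayList<>();

        public void process(String input) {
            calls.add(input);
        }
    }

    /** Hand-rolled stand-in for the object that verify(...) hands back. */
    public static class Verification {
        private final RecordingProcessor target;
        private final int expectedTimes;

        Verification(RecordingProcessor target, int expectedTimes) {
            this.target = target;
            this.expectedTimes = expectedTimes;
        }

        /** The check runs here, once the method under verification is named. */
        public void process(String input) {
            long actual = target.calls.stream().filter(input::equals).count();
            if (actual != expectedTimes) {
                throw new AssertionError("process(" + input + ") expected "
                        + expectedTimes + " call(s) but saw " + actual);
            }
        }
    }

    /** Only remembers the expectation; nothing is checked yet. */
    public static Verification verify(RecordingProcessor target, int expectedTimes) {
        return new Verification(target, expectedTimes);
    }

    public static void main(String[] args) {
        RecordingProcessor processor = new RecordingProcessor();
        processor.process("report.csv");

        verify(processor, 1);                        // incomplete: names no method, checks nothing
        verify(processor, 1).process("report.csv");  // complete: checks for exactly one matching call
        System.out.println("verification passed");
    }
}
```

In this toy version the incomplete line is silently ignored. Mockito is stricter and flags it, which is the exception seen in the question.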

Extract text from a PDF file with PDFBox

I am facing an issue reading a PDF.
public class GetLinesFromPDF extends PDFTextStripper {
static List<String> lines = new ArrayList<String>();
Map<String, String> auMap = new HashMap();
boolean objFlag = false;
public GetLinesFromPDF() throws IOException {
}
/**
* @throws IOException If there is an error parsing the document.
*/
public static void main(String[] args) throws IOException {
PDDocument document = null;
String fileName = "E:\\sample.pdf";
try {
int i;
document = PDDocument.load(new File(fileName));
PDFTextStripper stripper = new GetLinesFromPDF();
stripper.setSortByPosition(true);
stripper.setStartPage(0);
stripper.setEndPage(document.getNumberOfPages());
Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
stripper.writeText(document, dummy);
// print lines
for (String line : lines) {
//System.out.println("line = " + line);
if (line.matches("(.*)Objection(.*)")) {
System.out.println(line);
withObjection(lines);
//System.out.println("iiiiiiiiiiii");
break;
}
//System.out.println("uuuuuuuuuuuuuu");
}
} finally {
if (document != null) {
document.close();
}
}
}
/**
* Override the default functionality of PDFTextStripper.writeString()
*/
@Override
protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
System.out.println("textPositions = " + string);
// System.out.println("tex "+textPositions.get(0).getFont()+ getArticleEnd());
// you may process the line here itself, as and when it is obtained
}
}
I need output like:
My PDF has a title, which needs to be skipped.
The PDF file content is:
How can I extract the text in the separate formats specified?
Thanks in advance.

Error with CognitiveJ/ExampleCode

I would like to use CognitiveJ (from the CognitiveJ GitHub repository), but all I get is:
Status:401; Body: {"error":{"code":"Unspecified","message":"Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key."}}
Here is the code:
public static String lic1 = "xxx";
public static String lic2 = "xxx";
public static void main(String[] args) throws IOException {
new Bildkontrolle();
}
public Bildkontrolle() throws IOException {
File imageFile = new File("E:\\DSC00306.jpg");
new FaceRecognicion(lic1, lic2, imageFile);
}
And here is the second class:
public FaceRecognicion(String lic1, String lic2, File imageFile) throws IOException {
BufferedImage bufImage = ImageIO.read(imageFile);
InputStream inpStream = new FileInputStream(imageFile);
FaceScenarios faceScenarios = new FaceScenarios(lic1,
lic1);
ImageOverlayBuilder imageOverlayBuilder = ImageOverlayBuilder.builder(bufImage);
imageOverlayBuilder.outlineFacesOnImage(faceScenarios.findFaces(inpStream), RectangleType.FULL,
CognitiveJColourPalette.STRAWBERRY).launchViewer();
}
Does anyone have example code that shows how to use the API?
I am stuck at the point where the request is sent.

OpenNLP train Thai language

I am experimenting with OpenNLP 1.7.2 and maxent-3.0.0.jar to train a model for the Thai language. Below is the code that reads the Thai training data and creates the bin model.
public class TrainPerson {
public static void main(String[] args) throws IOException {
String trainFile = "/Documents/workspace/ThaiOpenNLP/bin/thaiPerson.train";
String modelFile = "/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
writePersonModel(trainFile, modelFile);
}
private static void writePersonModel(String trainFile, String modelFile)
throws FileNotFoundException, IOException {
Charset charset = Charset.forName("UTF-8");
InputStreamFactory fileInputStream = new MarkableFileInputStreamFactory(new File(trainFile));
ObjectStream<String> lineStream = new PlainTextByLineStream(fileInputStream, charset);
ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
TokenNameFinderModel model;
try {
model = NameFinderME.train("th", "person", sampleStream , TrainingParameters.defaultParams(), new TokenNameFinderFactory());
} finally {
sampleStream.close();
}
BufferedOutputStream modelOut = null;
try {
modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
model.serialize(modelOut);
} finally {
if (modelOut != null) {
modelOut.close();
}
}
}}
The Thai training data looks like the attached file trainingData.
I am using the resulting model to detect person names, as shown in the program below. It fails to identify the name.
public class ThaiPersonNameFinder {
static String modelFile = "/Users/avinashpaula/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
public static void main(String[] args) {
try {
InputStream modelIn = new FileInputStream(new File(modelFile));
TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
NameFinderME nameFinder = new NameFinderME(model);
String sentence[] = new String[]{
"จอห์น",
"30",
"ปี",
"จะ",
"เข้าร่วม",
"ก",
"เริ่มต้น",
"ขึ้น",
"บน",
"มกราคม",
"."
};
Span nameSpans[] = nameFinder.find(sentence);
for (int i = 0; i < nameSpans.length; i++) {
System.out.println(nameSpans[i]);
}
}
catch (IOException e) {
e.printStackTrace();
}
}
}
What am I doing wrong?
