Here is the code I use to read a PDF with iText 5, and it works:
public class CreateTOC {
    public static final String SRC = "file.pdf";

    class FontRenderFilter extends RenderFilter {
        public boolean allowText(TextRenderInfo renderInfo) {
            String font = renderInfo.getFont().getPostscriptFontName();
            return font.endsWith("Bold") || font.endsWith("Oblique");
        }
    }

    public static void main(String[] args) throws IOException, DocumentException {
        new CreateTOC().parse(SRC);
    }

    public void parse(String filename) throws IOException {
        PdfReader reader = new PdfReader(filename);
        Rectangle rect = new Rectangle(1000, 1000);
        RenderFilter regionFilter = new RegionTextRenderFilter(rect);
        FontRenderFilter fontFilter = new FontRenderFilter();
        TextExtractionStrategy strategy = new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(), regionFilter, fontFilter);
        System.out.println(PdfTextExtractor.getTextFromPage(reader, 56, strategy));
        reader.close();
    }
}
Can someone help me get this working in iText 7? I have problems with Rectangle and TextExtractionStrategy (the constructors are not the same as in iText 5).
Edit: RenderFilter isn't available in iText 7...
I'm using iText 7.1.8 and I need to add fields with the same name to a PDF. I use code like the following:
public class Main {
    public static void main(String[] args) {
        final PdfDocument emptyPdfDocument = createEmptyPdfDocument(pdf);
        addTextField("Text_1", "Hello", emptyPdfDocument.getFirstPage(), PdfAcroForm.getAcroForm(emptyPdfDocument, true), emptyPdfDocument);
        addTextField("Text_1", "Hello", emptyPdfDocument.addNewPage(), PdfAcroForm.getAcroForm(emptyPdfDocument, true), emptyPdfDocument);
        savePdf(emptyPdfDocument);
    }

    private static void addTextField(String name, String value, PdfPage page, PdfAcroForm form, PdfDocument pdf) {
        PdfFormField field = form.getField(name);
        final Rectangle rect = new Rectangle(100, page.getCropBox().getHeight() - 100, 300, 20);
        if (field != null) {
            PdfWidgetAnnotation annotation = new PdfWidgetAnnotation(rect);
            annotation.makeIndirect(pdf);
            annotation.setVisibility(VISIBLE);
            field.addKid(annotation);
            page.addAnnotation(annotation);
            return;
        }
        field = PdfFormField.createText(pdf, rect, name);
        field.setValue(value);
        field.setVisibility(VISIBLE);
        page.addAnnotation(field.getWidgets().get(0));
        form.addField(field, page);
    }

    private static PdfDocument createEmptyPdfDocument(final String pdfPath) throws IOException {
        PdfWriter pdfWriter = new PdfWriter(new FileOutputStream(pdfPath));
        final PdfDocument pdfDocument = new PdfDocument(pdfWriter);
        pdfDocument.addNewPage();
        return pdfDocument;
    }

    public static void savePdf(PdfDocument pdf) {
        pdf.close();
    }
}
But when addTextField is called the second time, the kids of the field are empty.
I don't understand what I'm doing wrong.
The problem is the following. I have several reports that I want to mock and test with Mockito. The test for each report fails with the same UnfinishedVerificationException, and nothing I have tried so far has fixed it. An example of one of the reports, with all its parent classes, is below.
What I have tried:
Changed any to anyString.
Changed ReportSaver from an interface to an abstract class.
Added validateMockitoUsage to pin down the failing test.
Looked into similar Mockito-related cases on StackOverflow.
Test:
public class ReportProcessorTest {
    private ReportProcessor reportProcessor;
    private ByteArrayOutputStream mockOutputStream = (new ReportProcessorMock()).mock();
    @SuppressWarnings("serial")
    private final static Map<String, Object> expectedMaps = new HashMap<String, Object>();

    @Before
    public void setUp() throws IOException {
        reportProcessor = mock(ReportProcessor.class);
        ReflectionTestUtils.setField(reportProcessor, "systemOffset", "Europe/Berlin");
        ReflectionTestUtils.setField(reportProcessor, "redisKeyDelimiter", "#");
        Mockito.doNothing().when(reportProcessor).saveReportToDestination(Mockito.any(), Mockito.anyString());
        Mockito.doCallRealMethod().when(reportProcessor).process(Mockito.any());
    }

    @Test
    public void calculateSales() throws IOException {
        Map<String, Object> processedReport = reportProcessor.process(mockOutputStream);
        verify(reportProcessor, times(1)); // The line that causes trouble
        assertThat(Maps.difference(processedReport, expectedMaps).areEqual(), Matchers.is(true));
    }

    @After
    public void validate() {
        Mockito.validateMockitoUsage();
    }
}
Class under test:
@Component
public class ReportProcessor extends ReportSaver {
    @Value("${system.offset}")
    private String systemOffset;
    @Value("${report.relativePath}")
    private String destinationPathToSave;
    @Value("${redis.delimiter}")
    private String redisKeyDelimiter;

    public Map<String, Object> process(ByteArrayOutputStream outputStream) throws IOException {
        saveReportToDestination(outputStream, destinationPathToSave);
        Map<String, Object> report = new HashMap<>();
        try (InputStream inputStream = new ByteArrayInputStream(outputStream.toByteArray());
             InputStreamReader reader = new InputStreamReader(inputStream)) {
            CSVReaderHeaderAware csvReader = new CSVReaderFormatter(outputStream).headerAware(reader);
            Map<String, String> data;
            while ((data = csvReader.readMap()) != null) {
                // Renamed from "data": the original redeclared the row map's name, which does not compile
                String key = data.get("data").toUpperCase();
                Long quantity = NumberUtils.toLong(data.get("quantity"));
                report.put(key, quantity);
            }
        }
        return report;
    }
}
Parent class:
public abstract class ReportSaver {
    public void saveReportToDestination(ByteArrayOutputStream outputStream, String destinationPathToSave) throws IOException {
        File destinationFile = new File(destinationPathToSave);
        destinationFile.getParentFile().mkdirs();
        destinationFile.delete();
        destinationFile.createNewFile();
        // try-with-resources closes the file stream even if writeTo throws
        try (OutputStream fileOutput = new FileOutputStream(destinationFile)) {
            outputStream.writeTo(fileOutput);
        }
    }
}
Mock:
public class ReportProcessorMock implements GeneralReportProcessorMock {
    private static final String report = ""; // There can be some data in here

    @Override
    public ByteArrayOutputStream mock() {
        byte[] reportBytes = report.getBytes();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream(reportBytes.length);
        outputStream.write(reportBytes, 0, reportBytes.length);
        return outputStream;
    }
}
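When you verify, you have to name the particular public method of the mock that you want to check. A verify(reportProcessor, times(1)) with no method call after it is an unfinished verification. Call the method on the result: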
verify(reportProcessor, times(1)).process(mockOutputStream);
or use an argument matcher as a wildcard if appropriate:
verify(reportProcessor, times(1)).process(any(ByteArrayOutputStream.class));
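To see why a bare verify(reportProcessor, times(1)) checks nothing, here is a toy sketch built on java.lang.reflect.Proxy. It is not how Mockito is implemented, and VerifySketch, ToyMock and Processor are hypothetical names made up for this illustration. The only idea it shares with Mockito is that verify hands back an object in "verification mode", and the check runs when a method is called on that object. Real Mockito additionally notices the dangling verify and reports it as UnfinishedVerificationException, while this toy silently does nothing.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class VerifySketch {

    /** Hypothetical stand-in for ReportProcessor; only the shape matters here. */
    public interface Processor {
        String process(String input);
    }

    /** Toy mock: records the name of every method called on its proxy. */
    public static final class ToyMock {
        private final List<String> calls = new ArrayList<>();

        /** The object handed to the code under test. */
        public final Processor instance = proxy((p, method, args) -> {
            calls.add(method.getName());
            return null;
        });

        /**
         * Returns a second proxy in "verification mode". Nothing is checked
         * until a method is called on the returned object.
         */
        public Processor verify(int expectedTimes) {
            return proxy((p, method, args) -> {
                long actual = calls.stream()
                        .filter(name -> name.equals(method.getName()))
                        .count();
                if (actual != expectedTimes) {
                    throw new AssertionError(method.getName() + " called " + actual
                            + " time(s), expected " + expectedTimes);
                }
                return null;
            });
        }
    }

    /** Builds a Processor whose every call is routed to the given handler. */
    private static Processor proxy(InvocationHandler handler) {
        return (Processor) Proxy.newProxyInstance(
                Processor.class.getClassLoader(),
                new Class<?>[] {Processor.class},
                handler);
    }

    public static void main(String[] args) {
        ToyMock mock = new ToyMock();
        mock.instance.process("report");

        mock.verify(1);                  // checks nothing: no method is named
        mock.verify(1).process("x");     // passes: process was called once
        try {
            mock.verify(2).process("x"); // fails: two calls were expected
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the argument passed to process during verification is ignored; only the method name and the call count are compared. Mockito also matches arguments, which is why the two forms above differ in what they accept.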
I am facing an issue reading a PDF.
public class GetLinesFromPDF extends PDFTextStripper {
    static List<String> lines = new ArrayList<String>();
    Map<String, String> auMap = new HashMap<>();
    boolean objFlag = false;

    public GetLinesFromPDF() throws IOException {
    }

    /**
     * @throws IOException If there is an error parsing the document.
     */
    public static void main(String[] args) throws IOException {
        PDDocument document = null;
        String fileName = "E:\\sample.pdf";
        try {
            int i;
            document = PDDocument.load(new File(fileName));
            PDFTextStripper stripper = new GetLinesFromPDF();
            stripper.setSortByPosition(true);
            stripper.setStartPage(0);
            stripper.setEndPage(document.getNumberOfPages());
            Writer dummy = new OutputStreamWriter(new ByteArrayOutputStream());
            stripper.writeText(document, dummy);
            // print lines
            for (String line : lines) {
                //System.out.println("line = " + line);
                if (line.matches("(.*)Objection(.*)")) {
                    System.out.println(line);
                    withObjection(lines);
                    //System.out.println("iiiiiiiiiiii");
                    break;
                }
                //System.out.println("uuuuuuuuuuuuuu");
            }
        } finally {
            if (document != null) {
                document.close();
            }
        }
    }

    /**
     * Override the default functionality of PDFTextStripper.writeString()
     */
    @Override
    protected void writeString(String string, List<TextPosition> textPositions) throws IOException {
        System.out.println("textPositions = " + string);
        // System.out.println("tex "+textPositions.get(0).getFont()+ getArticleEnd());
        // you may process the line here itself, as and when it is obtained
    }
}
I need output like:
My PDF has some titles, and we need to skip them.
The PDF file content is:
How can I extract the text in the separate formats specified?
Thanks in advance.
I would like to use CognitiveJ (from GitHub), but all I get is:
Status:401; Body: {"error":{"code":"Unspecified","message":"Access denied due to invalid subscription key. Make sure you are subscribed to an API you are trying to call and provide the right key."}}
Here is the code:
public static String lic1 = "xxx";
public static String lic2 = "xxx";

public static void main(String[] args) throws IOException {
    new Bildkontrolle();
}

public Bildkontrolle() throws IOException {
    File imageFile = new File("E:\\DSC00306.jpg");
    new FaceRecognicion(lic1, lic2, imageFile);
}
And here is the second class:
public FaceRecognicion(String lic1, String lic2, File imageFile) throws IOException {
    BufferedImage bufImage = ImageIO.read(imageFile);
    InputStream inpStream = new FileInputStream(imageFile);
    FaceScenarios faceScenarios = new FaceScenarios(lic1,
            lic1);
    ImageOverlayBuilder imageOverlayBuilder = ImageOverlayBuilder.builder(bufImage);
    imageOverlayBuilder.outlineFacesOnImage(faceScenarios.findFaces(inpStream), RectangleType.FULL,
            CognitiveJColourPalette.STRAWBERRY).launchViewer();
}
Does anyone have example code that shows how to use the API?
I am stuck at the point of sending the request.
I am experimenting with OpenNLP 1.7.2 and maxent-3.0.0.jar to train a model for the Thai language. Below is the code that reads the Thai training data and creates the bin model.
public class TrainPerson {
    public static void main(String[] args) throws IOException {
        String trainFile = "/Documents/workspace/ThaiOpenNLP/bin/thaiPerson.train";
        String modelFile = "/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";
        writePersonModel(trainFile, modelFile);
    }

    private static void writePersonModel(String trainFile, String modelFile)
            throws FileNotFoundException, IOException {
        Charset charset = Charset.forName("UTF-8");
        InputStreamFactory fileInputStream = new MarkableFileInputStreamFactory(new File(trainFile));
        ObjectStream<String> lineStream = new PlainTextByLineStream(fileInputStream, charset);
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
        TokenNameFinderModel model;
        try {
            model = NameFinderME.train("th", "person", sampleStream, TrainingParameters.defaultParams(), new TokenNameFinderFactory());
        } finally {
            sampleStream.close();
        }
        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null) {
                modelOut.close();
            }
        }
    }
}
The Thai data looks like the attached file trainingData.
I am using the output model to detect person names, as shown in the program below. It fails to identify the name.
public class ThaiPersonNameFinder {
    static String modelFile = "/Users/avinashpaula/Documents/workspace/ThaiOpenNLP/bin/th-ner-person.bin";

    public static void main(String[] args) {
        try {
            InputStream modelIn = new FileInputStream(new File(modelFile));
            TokenNameFinderModel model = new TokenNameFinderModel(modelIn);
            NameFinderME nameFinder = new NameFinderME(model);
            String sentence[] = new String[]{
                "จอห์น",
                "30",
                "ปี",
                "จะ",
                "เข้าร่วม",
                "ก",
                "เริ่มต้น",
                "ขึ้น",
                "บน",
                "มกราคม",
                "."
            };
            Span nameSpans[] = nameFinder.find(sentence);
            for (int i = 0; i < nameSpans.length; i++) {
                System.out.println(nameSpans[i]);
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}
What am I doing wrong?