I'm messing around with itext, and I was changing the fontsize and as a result I end up with some weirdly spaced text in my pdf:
Which I'd like to turn into something like this: (Excuse the poor image editing)
This is the code I use to enter the text:
private fun setBaseInfo(info: ArrayList<String>): PdfPCell
{
val cell = PdfPCell()
val glue = Chunk(VerticalPositionMark())
val p = Paragraph()
p.font.size = 8.0f
for (str in info)
{
p.add(glue)
p.add(str)
p.add("\n")
}
cell.border = Rectangle.NO_BORDER
cell.addElement(p)
return cell
}
And this is the info I feed it:
private fun foo(): ArrayList<String>
{
val array = ArrayList<String>()
array.add("Hi")
array.add("StackOverflow")
array.add("I'd Like")
array.add("This")
array.add("text")
array.add("to be closer")
array.add("together!")
return array
}
When removing p.add("\n") this is the output:
Full disclosure: former iText employee here
This is how I would do it:
public static void main(String[] args) throws IOException {
// create a temp file to hold the output
File outputFile = File.createTempFile("stackoverflow",".pdf");
PdfDocument pdfDocument = new PdfDocument(new PdfWriter(outputFile));
Document layoutDocument = new Document(pdfDocument);
String[] lines = {"Hi","StackOverflow","I'd like","this","text","to","be","closer","together!"};
for(String line : lines){
layoutDocument.add(new Paragraph(line)
.setMultipliedLeading(0.5f)); // this is where the magic happens
}
layoutDocument.close();
pdfDocument.close();
// display the temp file with the default PDF viewer
Desktop.getDesktop().open(outputFile);
}
I changed a couple of things:
Use the latest version of iText whenever possible. You want to benefit from several years of bugfixes and a newer architecture.
Do not use tables to solve layout issues.
Use leading (either MultipliedLeading or FixedLeading) on Paragraph objects to fix your issue.
Related
I have a PPT containing values in the format: {VALUES}. I would like to replace the text of the file containing {SAMPLE} to SARAN and so on using Java. Earlier I tried with Apache poi and I had no solution but I could find that in Docx4j there is an method variableReplace() which will replace the text. I tried one sample but I don't see any text being replaced.
My code:
public static void main(String[] args) throws IOException, Docx4JException, Pptx4jException, JAXBException
{
String inputfilepath = "C:\\Work\\SampleTemplate.pptx";
PresentationMLPackage pptPackage
= PresentationMLPackage.load(new File(inputfilepath));
ThemePart themeSlidePart = (ThemePart)
pptPackage.getParts().getParts().get(new PartName("/ppt/theme/theme1.xml"));
Theme themeOfSlides = (Theme)themeSlidePart.getJaxbElement();
SlidePart slide = pptPackage.getMainPresentationPart().getSlide(0);
HashMap h = new HashMap<String, String>();
h.put("SlideTitleName", "SARANYA");
slide.variableReplace(h);
String outputfilepath = "C:\\Work\\24Jan2018_CheckOut\\dhl\\PPT-TRAILS\\SampleTemplate.pptx";
PresentationMLPackage pptPackagetoApply
= PresentationMLPackage.load(new File(outputfilepath));
ThemePart themeSlidePartToApply
= (ThemePart) pptPackagetoApply.getParts().getParts()
.get(new PartName("/ppt/theme/theme1.xml"));
themeSlidePartToApply.setJaxbElement(themeOfSlides);
SaveToZipFile saver = new SaveToZipFile(pptPackagetoApply);
saver.save(outputfilepath);
}
But still text is not being replaced. I tried with a new PPTX file with only text "${SlideTitleName}" inside it. But still it's not working.
I would like to add many rows to my pptx template using for loop, I tried the below,
Java:
if(majorAchievements.size()>0){
System.out.println("majorAchievements Size in Servlet is : "+majorAchievements.size() + " and value of get(0) is : "+ majorAchievements.get(0) );
for(int i=0;i<majorAchievements.size();i++){
achievements=(String) majorAchievements.get(i);
}
}
mappings.put("MajorAchievementText", achievements);
and inside my ppt file ,
i haved added two rows with same ${MajorAchievementText}.
The above list fetches 2 rows. But inside ppt it is not replace.
Please help me how to replace an text by iterating.
With Regards,
Saranya C
I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf"));
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Notice how I am not calling the form.flattenFields(). If I do call that method however, I get an error of java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature by a key and certificate issued by Adobe to indicate to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects will get re-arranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, disclaims "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
By the way, Adobe Reader does not merely check the signature, it also tries to determine whether the changes in the incremental update don't go beyond the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and in append mode replace all existing pages by your own content of choice. This of course is not in Adobe's interest...
The filled in PDF is still an AcroForm, otherwise the example below would result in the same PDF twice.
public class Main {
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
//Delete print field from acroform because it is defined in the contentstream not in the formfields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with the 'reader enabled forms'.
What it boils down to is that the PDF file that is initially fed to your program is reader enabled. Hence you can open the PDF in Adobe Reader and fill in the form. This allows Acrobat users to extend the behaviour of Adobe Reader.
Once the PDF is filled in and closed using iText it saves the PDF as 'not reader-extended'.
This makes it so that the AcroForm can still be filled using iText but when you open the PDF using Adobe Reader the extended functionality you see in the original PDF is gone. But this does not mean the form is flattened.
iText cannot make a form reader enabled, as a matter of fact, the only way to create a reader enabled form is using Acrobat Professional. This is how Acrobat and Adobe Reader interact and it is not something iText can imitate or solve. You can find some more info and a possible solution on this link.
The IllegalArgumentException you get when you call the form.flattenFields() method is because of the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the contentstream of the PDF, meaning the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the print field from the AcroForm before you flatten.
IllegalArgumentException issue has been fixed in iText 7.1.5.
Here is my code:
private void save(File file) {
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
// Use the Codec to save the document in a binary format
textarea.getStyleCodecs().ifPresent(codecs -> {
Codec<StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle>> codec
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
try {
FileOutputStream fos = new FileOutputStream(file);
DataOutputStream dos = new DataOutputStream(fos);
codec.encode(dos, doc);
fos.close();
} catch (IOException fnfe) {
fnfe.printStackTrace();
}
});
}
I am trying to implement the save/loading from the demo from here on the RichTextFX GitHub.
I am getting errors in the following lines:
StyledDocument<ParStyle, Either<StyledText<TextStyle>, LinkedImage<TextStyle>>, TextStyle> doc = textarea.getDocument();
error: incompatible types:
StyledDocument<Collection<String>,StyledText<Collection<String>>,Collection<String>>
cannot be converted to
StyledDocument<ParStyle,Either<StyledText<TextStyle>,LinkedImage<TextStyle>>,TextStyle>
and
= ReadOnlyStyledDocument.codec(codecs._1, codecs._2, textarea.getSegOps());
error: incompatible types: inferred type does not conform to equality
constraint(s) inferred: ParStyle
equality constraints(s): ParStyle,Collection<String>
I have added all the required .java files and imported them into my main code. I thought it would be relatively trivial to implement this demo but it has been nothing but headaches.
If this cannot be resolved, does anyone know an alternative way to save the text with formatting from RichTextFX?
Thank you
This question is quite old, but since i ran into the same problem i figured a solution might be useful to others as well.
In the demo, the code from which you use, ParStyle and TextStyle (Custom Types) are used for defining how information about the style is stored.
The error messages you get pretty much just tell you that your way of storing the information about the style (In your case in a String) is not compatible with the way it is done in the demo.
If you want to store the style in a String, which i did as well, you need to implement some way of serializing and deserializing the information yourself.
You can do that, for example (I used an InlineCssTextArea), in the following way:
public class SerializeManager {
public static final String PAR_REGEX = "#!par!#";
public static final String PAR_CONTENT_REGEX = "#!pcr!#";
public static final String SEG_REGEX = "#!seg!#";
public static final String SEG_CONTENT_REGEX = "#!scr!#";
public static String serialized(InlineCssTextArea textArea) {
StringBuilder builder = new StringBuilder();
textArea.getDocument().getParagraphs().forEach(par -> {
builder.append(par.getParagraphStyle());
builder.append(PAR_CONTENT_REGEX);
par.getStyledSegments().forEach(seg -> builder
.append(
seg.getSegment()
.replaceAll(PAR_REGEX, "")
.replaceAll(PAR_CONTENT_REGEX, "")
.replaceAll(SEG_REGEX, "")
.replaceAll(SEG_CONTENT_REGEX, "")
)
.append(SEG_CONTENT_REGEX)
.append(seg.getStyle())
.append(SEG_REGEX)
);
builder.append(PAR_REGEX);
});
String textAreaSerialized = builder.toString();
return textAreaSerialized;
}
public static InlineCssTextArea fromSerialized(String string) {
InlineCssTextArea textArea = new InlineCssTextArea();
ReadOnlyStyledDocumentBuilder<String, String, String> builder = new ReadOnlyStyledDocumentBuilder<>(
SegmentOps.styledTextOps(),
""
);
if (string.contains(PAR_REGEX)) {
String[] parsSerialized = string.split(PAR_REGEX);
for (int i = 0; i < parsSerialized.length; i++) {
String par = parsSerialized[i];
String[] parContent = par.split(PAR_CONTENT_REGEX);
String parStyle = parContent[0];
List<String> segments = new ArrayList<>();
StyleSpansBuilder<String> spansBuilder = new StyleSpansBuilder<>();
String styleSegments = parContent[1];
Arrays.stream(styleSegments.split(SEG_REGEX)).forEach(seg -> {
String[] segContent = seg.split(SEG_CONTENT_REGEX);
segments.add(segContent[0]);
if (segContent.length > 1) {
spansBuilder.add(segContent[1], segContent[0].length());
} else {
spansBuilder.add("", segContent[0].length());
}
});
StyleSpans<String> spans = spansBuilder.create();
builder.addParagraph(segments, spans, parStyle);
}
textArea.append(builder.build());
}
return textArea;
}
}
You can then take the serialized InlineCssTextArea, write the resulting String to a file, and load and deserialize it.
As you can see in the code, i made up some Strings as regexes which will be removed in the serialization process (We don't want our Serializer to be injectable, do we ;)).
You can change these to whatever you like, just note they will be removed if used in the text of the TextArea, so they should be something users wont miss in their TextArea.
Also note that this solution serializes the Style of the Text, the Text itself and the Paragraph style, BUT not inserted images or parameters of the TextArea (such as width and height), just the text content of the TextArea with its Style.
This issue on github really helped me btw.
I use XHTMLConverter to convert .docx to html, to make preview of the document. Is there any way to convert only few pages from original document? I'll be grateful for any help.
You have to parse the complete .docx file. It is not possible to read just parts of it. Otherwise if you want to know how to select a specific page number, im afraid to tell you(at least I believe) that word does not store page numbers therefore there is no function in the libary to accsess a specified page..
(I've read this at another forum, it actually might be false information).
PS: the Excel POI contains a .getSheetAt()method (this might helps you for your research)
But there are also other ways to accsess your pages. For instance you could read the lines of your docx document and search for the pagenumbers(might crash if your text contains those numbers though). Another way would be to search for the header of the site which would be more accurate:
HeaderStories headerStore = new HeaderStories( doc);
String header = headerStore.getHeader(pageNumber);
this should give you the header of the specified page. Same with footer:
HeaderStories headerStore = new HeaderStories( doc);
String footer = headerStore.getFooter(pageNumber);
If this dosen't work. I am not really into that API....
here a little Example for a very sloppy solution:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ReadDocFile
{
public static void main(String[] args)
{
File file = null;
WordExtractor extractor = null;
try
{
file = new File("c:\\New.doc");
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
extractor = new WordExtractor(document);
String[] fileData = extractor.getParagraphText();
for (int i = 0; i < fileData.length; i++)
{
if (fileData[i].equals("headerPageOne")){
int firstLineOfPageOne = i;
}
if (fileData[i]).equals("headerPageTwo"){
int lastLineOfPageOne = i
}
}
}
catch (Exception exep)
{
exep.printStackTrace();
}
}
}
If you go with this i would recommend you to create a String[] with your headers and refractor the for-loop to a seperate getPages() Method. Therefore your loop would look like:
List<String> = new ArrayList<String>(Arrays.asList("header1","header2","header3","header4"));
for (int i = 0; i < fileData.length; i++)
{
//well there should be a loop for "x" too
if (fileData[i].equals(headerArray[x])){
int firstLineOfPageOne = i;
}
if (fileData[i]).equals(headerArray[x+1]){
int lastLineOfPageOne = i
}
}
You could create an Object(int pageStart, int PageStop), wich would be the product of your method.
I hope it helped you :)
I have a simple requirement to extract all the Images and Diagrams drawn in MS Word file.
I am able to extract only images but not group of shapes(like Use Case Diagram or Activity Diagram). I want to save all the Diagrams as Image.
I have used apachePOI.
Following code I have written
public class worddocreader {
public static void main(String args[]) {
FileInputStream fis;
try {
FileInputStream fs = new FileInputStream("F:/1.docx");
XWPFDocument docx = new XWPFDocument(fs);
List<XWPFPictureData> piclist = docx.getAllPictures();
Iterator<XWPFPictureData> iterator = piclist.iterator();
int i = 0;
while (iterator.hasNext()) {
XWPFPictureData pic = iterator.next();
byte[] bytepic = pic.getData();
BufferedImage imag = ImageIO.read(new ByteArrayInputStream(
bytepic));
ImageIO.write(imag, "image/jpeg", new File("F:/docParsing/imagefromword" + i + ".jpg"));
i++;
}
ArrayList<PackagePart> packArrayList = docx.getPackageRelationship().getPackage().getParts();
int size = packArrayList.size();
System.out.println("Array List Size : " + packArrayList.size());
while (size-->0) {
PackagePart packagePart = packArrayList.get(size);
System.out.println(packagePart.getContentType());
try{
BufferedImage bfrImage = ImageIO.read(packagePart.getInputStream());
ImageIO.write(bfrImage,"image/png",new File("F:/docParsing_emb/size"+size+".png"));
}catch(Exception e){
e.printStackTrace();
}
}
System.out.println("Done");
} catch (Exception e) {
e.printStackTrace();
}
}
}
It only extract Images not Shapes.
Does anybody knows How do I do this ?
So you are after the stuff defined in [MS-ODRAW], i.e. so-called OfficeDrawings which can be created directly in Word using its Drawing palette?
Unfortunately, POI offers only little help here. With HWPF (the old binary *.doc file format) you can get a handle to such data like so:
HWPFDocument document;
OfficeDrawings officeDrawings = document.getOfficeDrawingsMain();
OfficeDrawing drawing = officeDrawings.getOfficeDrawingAt(OFFSET);
// OFFSET is a global character offset describing the position of the drawing in question
// i.e. document.getRange().getStartOffset() + x
This drawing can then be further processed into individual records:
EscherRecordManager escherRecordManager = new EscherRecordManager(drawing.getOfficeArtSpContainer());
EscherSpRecord escherSpRecord = escherRecordManager.getSpRecord();
EscherOptRecord escherOptRecord = escherRecordManager.getOptRecord();
Using the data from all these records you can theoretically render out the original drawing again. But it's rather painful...
So far I've only done this in a single case where I had lots of simple arrows floating around on a page. Those had to be converted to a textual representation (something like: "Positions (x1, y1) and (x2, y2) are connected by an arrow"). Doing this essentially meant to implement a subset of [MS-ODRAW] relevant to those arrows using the above-mentioned records. Not exactly a pleasant task.
MS Word backup solution
If using MS Word itself is an option to you, then there is another pragmatic way:
extract all relevant offsets that contain OfficeDrawings using POI.
Inside Word: Iterate over the document with VBA and copy all the drawings at the given offsets to the clipboard.
Use some other application (I chose Visio) to dump the clipboard contents into a PNG.
The necessary check for a drawing in step 1 is very simple (see below). The rest can be completely automated in Word. If anyone is in need, I can share the respective VBA code.
if (characterRun.isSpecialCharacter()) {
for (char currentChar : characterRun.text().toCharArray()) {
if ('\u0008' == currentChar) return true;
}
}
If you mean Office Art objects then
In the class org.apache.poi.hwpf.HWPFDocument
there is a _officeDrawingsMain that contains the office art objects
check this link https://poi.apache.org/apidocs/org/apache/poi/hwpf/HWPFDocument.html