PDF styles not properly rendering using ASPOSE Library

We use Aspose to let users download content in Word and PDF format. There is no separate code path for PDF and Word.
A single code path retrieves the data from the database; only at the end do we choose the output type, PDF (SaveFormat.PDF) or Word (SaveFormat.DOCX).
When we change the running head styles, the Word output is formatted as expected, but the PDF output is not.
Note: we have already updated the Aspose JAR, and it still does not work.
Could you please help with this issue? Thanks in advance.
package com.sam.test;
import java.text.MessageFormat;
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.HeaderFooterType;
import com.aspose.words.ParagraphAlignment;
import com.aspose.words.SaveFormat;
public class SuperScriptTest {
public static void main(String[] args) throws Exception {
String fontName = "Times New Roman";
String fontColour = "black";
Double fontSize = 15.9996;
Double lineheight = 100.0;
String footerVariable = "";
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.writeln("Aspose Sample document Content for Word  file.");
com.aspose.words.Section currentSection = builder.getCurrentSection();
com.aspose.words.PageSetup pageSetup = currentSection.getPageSetup();
pageSetup.setDifferentFirstPageHeaderFooter(true);
// --- Create header for the first page. ---
pageSetup.setHeaderDistance(0.5 * 72 );
pageSetup.setFooterDistance(0.5 * 72);
builder.moveToHeaderFooter(HeaderFooterType.HEADER_FIRST);
builder.getParagraphFormat().setAlignment(ParagraphAlignment.LEFT);
String runningHead = "Running Head Test";
runningHead = MessageFormat
.format("<span style=\"margin:0px; font-family:{0}; font-size:{1}px; color:{2}; line-height:{3}%;\">{4}</span>",
fontName, fontSize, fontColour, lineheight,
runningHead);
if (!doc.getLastSection().getBody().hasChildNodes())
doc.getLastSection().remove();
builder.insertHtml(runningHead);
doc.save("C:/ASPOSE/Examples/ASPOSEPOC1/Aspose_word_doc.docx",SaveFormat.DOCX);
doc.save("C:/ASPOSE/Examples/ASPOSEPOC1/Aspose_pdf_doc.pdf",SaveFormat.PDF);
}
}
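One detail in the snippet above may be worth ruling out, independently of the Word/PDF difference: MessageFormat formats numeric arguments with a locale-aware NumberFormat, so the CSS value that reaches insertHtml is not necessarily the number held in the Java variable. The sketch below compares that behaviour with a locale-independent alternative. The class and method names are hypothetical, and the code does not call Aspose at all.

```java
import java.text.MessageFormat;
import java.util.Locale;

class CssNumberFormatting {

    // Formats the value the way the question does: as a numeric MessageFormat argument.
    static String viaMessageFormat(double fontSize, Locale locale) {
        MessageFormat mf = new MessageFormat("font-size:{0}px;", locale);
        return mf.format(new Object[] { fontSize });
    }

    // Locale-independent alternative: convert the number to text before formatting.
    static String viaStringArgument(double fontSize) {
        return MessageFormat.format("font-size:{0}px;", String.valueOf(fontSize));
    }

    public static void main(String[] args) {
        System.out.println(viaMessageFormat(15.9996, Locale.US));
        System.out.println(viaMessageFormat(15.5, Locale.GERMANY));
        System.out.println(viaStringArgument(15.9996));
    }
}
```

With an English locale the numeric argument is rounded to at most three fraction digits, and a locale with a decimal comma produces a value that is not valid CSS. Whether this contributes to the rendering difference in the PDF is not established here.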

Related

How can I set the complete page background for a Word page with Apache POI?

I have already searched all day today, but without success. First I created a new Word document and tried to set the background with
//Blank Document
XWPFDocument doc = new XWPFDocument();
//Write the Document in file system
FileOutputStream out = new FileOutputStream(new File("test.docx"));
File backgroundImage = new File("img.png");
doc.getDocument().addNewBackground().addNewDrawing().save(backgroundImage);
backgroundImage is the picture I want to set.

Your question seems to be about how to add, change, or delete the background color in Word.
This consists of two settings.
First, display-background-shape must be enabled in the document settings.
Second, the document must have a background with a color set. This may be white, but it must be set.
Then, optionally, a picture reference can be set. That picture is then used as the background tile. Microsoft Word itself still uses VML to set the picture reference for the background. This is much simpler than using a drawing. Although using a drawing in CTBackground is possible according to the Office Open XML specification, I have never seen this in practical use.
None of this is supported by the high-level class layer of Apache POI. But it can be solved using the low-level org.openxmlformats.schemas.wordprocessingml.x2006.main.* classes.
One more challenge is getting the XWPFSettings. There is no getter method, so we need to use reflection to get it from XWPFDocument. The same applies to getting CTSettings from XWPFSettings.
The following complete example shows how to set a background color and/or a background tile picture. The code is tested and works using the current Apache POI 5.2.0 and poi-ooxml-full-5.2.0.jar.
import java.io.FileOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class CreateWordPageBackground {
static XWPFSettings getSettings(XWPFDocument document) throws Exception {
java.lang.reflect.Field settings = XWPFDocument.class.getDeclaredField("settings");
settings.setAccessible(true);
return (XWPFSettings)settings.get(document);
}
static void setDisplayBackgroundShape(XWPFSettings settings, boolean booleanOnOff) throws Exception {
java.lang.reflect.Field _ctSettings = XWPFSettings.class.getDeclaredField("ctSettings");
_ctSettings.setAccessible(true);
CTSettings ctSettings = (CTSettings )_ctSettings.get(settings);
CTOnOff onOff = CTOnOff.Factory.newInstance();
onOff.setVal(booleanOnOff);
ctSettings.setDisplayBackgroundShape(onOff);
}
static void setBackgroundPictureId(CTBackground background, String rId) throws Exception {
String vmlBackgroundXML = ""
+ "<v:background xmlns:v=\"urn:schemas-microsoft-com:vml\" xmlns:o=\"urn:schemas-microsoft-com:office:office\" "
+ "xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\" "
+ "id=\"_x0000_s1025\" o:bwmode=\"white\" o:targetscreensize=\"1024,768\">"
+ "<v:fill r:id=\"" + rId + "\" recolor=\"t\" type=\"frame\"/>"
+ "</v:background>"
;
org.apache.xmlbeans.XmlObject vmlBackgroundXmlObject = org.apache.xmlbeans.XmlObject.Factory.parse(vmlBackgroundXML);
background.set(vmlBackgroundXmlObject);
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument();
// we need document settings to set display-background-shape
XWPFSettings settings = getSettings(document);
setDisplayBackgroundShape(settings, true);
// set a background color
CTBackground background = document.getDocument().addNewBackground();
background.setColor("FF0000");
// create a picture reference in document
InputStream is = new FileInputStream("./logo.png");
String rId = document.addPictureData(is, Document.PICTURE_TYPE_PNG);
// set a background picture
setBackgroundPictureId(background, rId);
background.setColor("FFFFFF");
XWPFParagraph paragraph = document.createParagraph();
XWPFRun run = paragraph.createRun();
run.setText("Text in Body ...");
paragraph = document.createParagraph();
FileOutputStream out = new FileOutputStream("./CreateWordPageBackground.docx");
document.write(out);
out.close();
document.close();
}
}
Both the background color and the background tile picture are only visible in the Microsoft Word GUI. They do not get printed.
If a printed page background is needed, it can be achieved with a shape in the page header, also known as a watermark. But that is another question.

bidi string can't be read from Word (Apache POI)

I'm writing a bidi String to an MS Word file using Apache POI after wrapping it with the sequence
aString = "\u202E" + aString + "\u202C";
The text renders correctly in the file, and reads fine when I retrieve the string again. But if I modify the file in any way, reading that string suddenly returns true for isBlank().
Thank you in advance for any suggestions/help!

When Microsoft Word stores bidirectional text in its Office Open XML *.docx format, it sometimes uses the special text run element w:bdo (bidirectional override). Apache POI does not read those elements so far. So if an XWPFParagraph contains such elements, paragraph.getText() will return an empty string.
One could use org.apache.xmlbeans.XmlCursor to really get all text from all XWPFParagraphs, like so:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.xmlbeans.XmlCursor;
public class ReadWordParagraphs {
static String getAllTextFromParagraph(XWPFParagraph paragraph) {
XmlCursor cursor = paragraph.getCTP().newCursor();
return cursor.getTextValue();
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordDocument.docx"));
for (XWPFParagraph paragraph : document.getParagraphs()) {
System.out.println(paragraph.getText()); // will not return text in w:bdo elements
System.out.println(getAllTextFromParagraph(paragraph)); // will return all text content of paragraph
}
}
}
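Text recovered this way may still carry the directional control characters that the question wrapped around it. A small sketch of that wrapping, and of stripping the controls again before comparing strings, is shown below. The class and method names are hypothetical, and nothing here depends on POI.

```java
class BidiControls {
    static final char RLO = '\u202E'; // RIGHT-TO-LEFT OVERRIDE
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Wraps text the same way the question does before writing it to the document.
    static String wrapRtl(String text) {
        return RLO + text + PDF;
    }

    // Removes explicit directional formatting characters:
    // embeddings and overrides (U+202A..U+202E) and isolates (U+2066..U+2069).
    static String stripDirectionalControls(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean embeddingOrOverride = c >= '\u202A' && c <= '\u202E';
            boolean isolate = c >= '\u2066' && c <= '\u2069';
            if (!embeddingOrOverride && !isolate) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String wrapped = wrapRtl("abc");
        System.out.println(wrapped.length());
        System.out.println(stripDirectionalControls(wrapped));
    }
}
```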

How to save output from XML to PDF

I am using JAXB to unmarshal XML. Then I want to take some of the information and write it to a PDF using iText. For some reason the PDF is created, but I can't open the file. I am also using ZFile, as this should work on mainframes too, but that shouldn't be a problem here.
I am probably doing something wrong when writing to the PDF file. Here is my code:
package music;
import java.io.*;
import java.sql.Timestamp;
import java.util.Date;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import com.ibm.jzos.ZFile;
import java.io.FileOutputStream;
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfWriter;
import music.Music.Artist;
import music.Music.Artist.Album;
import music.Music.Artist.Album.Description;
import music.Music.Artist.Album.Song;
public class MusicXml {
public static void main(String[] args) throws JAXBException, IOException {
ZFile inputZ = null, outputZ = null;
File inputW = null;
PdfWriter outputW = null;
PdfContentByte cb = null;
Document pdf = new Document(PageSize.A4);
Paragraph paragraf = new Paragraph();
// Font
Font fnt12n;
JAXBContext jaxb = null;
Unmarshaller unmarsh = null;
String line = null, sep = " ";
Music music;
Date date = new Date();
Date startDate = new Timestamp(date.getTime());
System.out.println("Start: " + startDate);
jaxb = JAXBContext.newInstance(ObjectFactory.class);
unmarsh = jaxb.createUnmarshaller();
String os = System.getProperty("os.name");
System.out.println("System: " + os);
boolean isWin = os.toLowerCase().contains("wind");
if (!isWin) {
// z/OS:
inputZ = new ZFile(args[0], "rt"); // "rt" - readtext
InputStream inpStream = inputZ.getInputStream();
InputStreamReader streamRdr = new InputStreamReader(inpStream, "CP870");
try {
outputW = PdfWriter.getInstance(pdf, (new ZFile(args[1], "wb")).getOutputStream());
} catch (DocumentException e) {
e.printStackTrace();
}
music = (Music) unmarsh.unmarshal(streamRdr);
} else {
// Windows:
inputW = new File(args[0]);
music = (Music) unmarsh.unmarshal(inputW);
try {
outputW = PdfWriter.getInstance(pdf, new FileOutputStream(args[1]));
} catch (DocumentException e) {
e.printStackTrace();
}
}
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
List<Album> listaAlbumow = artysta.getAlbum();
for (Album album : listaAlbumow) {
Description opis = album.getDescription();
List<Song> listaPiosenek = album.getSong();
for (Song piosenka : listaPiosenek) {
String artistName = artysta.getName();
String albumName = album.getTitle();
int numberOfSongs = listaPiosenek.size();
String albumDescription = album.getDescription().getValue();
String songTitle = piosenka.getTitle();
String songDuration = piosenka.getLength();
line = songTitle + sep + songDuration;
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
pdf.newPage();
try {
paragraf.setAlignment(Element.ALIGN_JUSTIFIED);
paragraf.setSpacingAfter(16f);
paragraf.setLeading(14f);
paragraf.setFirstLineIndent(30f);
paragraf.setFont(fnt12n);
pdf.add(new Paragraph(line, fnt12n));
} catch (DocumentException e) {
e.printStackTrace();
}
}
}
}
date = new Date();
Date stopDate = new Timestamp(date.getTime());
System.out.println("Stop: " + stopDate);
long diffInMs = stopDate.getTime() - startDate.getTime();
float diffInSec = diffInMs / 1000.00f;
System.out.format("Czas przetwarzenia pliku XML: %.2f s.", diffInSec);
System.exit(0);
if (isWin) {
outputW.close();
} else
outputZ.close();
}
}

Apart from the fact that you chose to use an old version of iText, there are a couple of other things wrong with your code. Which documentation did you read? I don't think you've already discovered the official iText web site, otherwise you would have used iText 7 instead of iText 5, and you would have known that no valid document is created if you never close the Document object.
The short answer is that you forgot:
pdf.close();
I see that you close the output stream:
if (isWin) {
outputW.close();
} else
outputZ.close();
}
That doesn't really make sense, because at that point, the PDF hasn't been finalized (for instance: no cross-reference table was created). When you close the document, the underlying output stream is closed implicitly (unless you tell iText explicitly not to do this).
There's also something awkward about the loops you are creating:
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
...
for (Album album : listaAlbumow) {
...
for (Song piosenka : listaPiosenek) {
...
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
pdf.newPage();
...
}
}
}
output.close();
You create the same font over and over again. One PDF can only have 1 version (in your case PDF-1.7) and 1 set of metadata, yet you define that version and metadata over and over again. Finally, you open the document many times whereas you only need to open it once.
This makes more sense:
FontFactory.register(args[2], "jakiesFonty");
Font font = FontFactory.getFont("jakiesFonty", BaseFont.CP1250, BaseFont.EMBEDDED);
BaseFont bf = font.getBaseFont();
fnt12n = new Font(bf, 12f, Font.NORMAL, BaseColor.BLACK);
// PDF
outputW.setPdfVersion(PdfWriter.VERSION_1_7);
pdf.addTitle("Musical collection");
pdf.addAuthor("Natalia Nazaruk");
pdf.addSubject("Cwiczenie tworzenia PDF z XML");
pdf.addKeywords("Metadata, Java, iText, PDF");
pdf.addCreator("Program: MusicXML");
pdf.setMargins(60, 60, 50, 40);
pdf.open();
List<Artist> listaArtystow = music.getArtist();
for (Artist artysta : listaArtystow) {
...
for (Album album : listaAlbumow) {
...
for (Song piosenka : listaPiosenek) {
...
pdf.newPage();
...
}
}
}
pdf.close();
As you can see, you open() the Document instance pdf before the loop, to write the PDF headers, and you close() the Document after the loop to write some objects (e.g. fonts), the cross-reference table, and the PDF trailer. As you don't have pdf.close() in your code, all that necessary information is missing from your PDF.
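The lifecycle described above (open once, add content inside the loop, close once) can be illustrated without iText. The sketch below uses a hypothetical TrailerDocument class as a stand-in: it writes a header when created and a trailer only in close(), so output that is never closed would lack its final section. It is an analogy under those assumptions, not iText code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a document writer; it is NOT part of iText.
class TrailerDocument implements AutoCloseable {
    private final OutputStream out;
    private int items = 0;

    TrailerDocument(OutputStream out) throws IOException {
        this.out = out;
        write("HEADER\n"); // written exactly once, when the document is opened
    }

    void add(String line) throws IOException {
        write(line + "\n");
        items++;
    }

    @Override
    public void close() throws IOException {
        write("TRAILER items=" + items + "\n"); // only exists if close() is called
        out.close();
    }

    private void write(String s) throws IOException {
        out.write(s.getBytes(StandardCharsets.UTF_8));
    }

    // Opens once, adds every song inside the loop, closes once via try-with-resources.
    static String render(List<String> songs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (TrailerDocument doc = new TrailerDocument(buffer)) {
            for (String song : songs) {
                doc.add(song);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Using try-with-resources means close() runs even if adding content throws, which avoids the situation in the question where the program exits before the document is finalized.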
Since you are new at iText, I would highly recommend you not to use versions older than iText 7. You may have discovered that the latest iText 5 release is iText 5.5.13, but that's a maintenance release. In maintenance releases, we only provide bug fixes for our paying customers; we don't add new functionality. For instance: the new PDF specification ISO 32000-2 (aka PDF 2.0) is only available from iText 7.1 on. We won't support PDF 2.0 in older versions.
If you go to the official web site, you'll notice that iText 7.1.1 is the most recent version (iText 7 download page). Where did you find iText, and how come you selected an old version? (This isn't a rhetorical question: we'd like to know so that we can find out how to improve our web site. We also want to know why so many people post such bad code on Stack Overflow; it's as if they can't find the tutorials. That's sad, because we're investing plenty of time and money in those tutorials. But if no one is reading them, what's the point?)
You can find more info about iText 7 in the Jump-Start tutorial and the Building Blocks tutorial.
As for converting XML to PDF, why don't you convert to HTML first, and then use the pdfHTML add-on? There's an example on how to do that in chapter 4 of the HTML to PDF tutorial as well as in the ZUGFeRD tutorial.

JAVA: Extract Footer Images from a docx document

I have the task of extracting all the images from a docx file. I am using the snippet below, which uses the Apache POI API.
File file = new File(InputFileString);
FileInputStream fs = new FileInputStream(file.getAbsolutePath());
//FileInputStream fs=new FileInputStream(src);
//create office word 2007+ document object to wrap the word file
XWPFDocument doc1x=new XWPFDocument(fs);
//get all images from the document and store them in the list piclist
List<XWPFPictureData> piclist=doc1x.getAllPictures();
//traverse through the list and write each image to a file
Iterator<XWPFPictureData> iterator=piclist.iterator();
int i=0;
while(iterator.hasNext()){
XWPFPictureData pic=iterator.next();
byte[] bytepic=pic.getData();
BufferedImage imag=ImageIO.read(new ByteArrayInputStream(bytepic));
ImageIO.write(imag, "jpg", new File("C:/imagefromword"+i+".jpg"));
i++;
}
However, this code does not detect any images in the header or footer sections of the document.
I have searched extensively and couldn't come up with anything useful.
Is there any way to capture the images in the footer section of the docx file?

I am no expert on Apache POI issues, but a simple search came up with this code:
package com.concretepage;
import java.io.FileInputStream;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.model.XWPFHeaderFooterPolicy;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
public class ReadDOCXHeaderFooter {
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream("D:/docx/read-test.docx");
XWPFDocument xdoc=new XWPFDocument(OPCPackage.open(fis));
XWPFHeaderFooterPolicy policy = new XWPFHeaderFooterPolicy(xdoc);
//read header
XWPFHeader header = policy.getDefaultHeader();
System.out.println(header.getText());
//read footer
XWPFFooter footer = policy.getDefaultFooter();
System.out.println(footer.getText());
} catch(Exception ex) {
ex.printStackTrace();
}
}
}
And the documentation page of XWPFHeaderFooter (which is the direct parent class of the XWPFFooter class in the above example) shows the same getAllPictures method you used to iterate over all the pictures in the document's body.
I'm on mobile, so I haven't actually tested anything, but it seems straightforward enough to work.
Good luck!
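An alternative that sidesteps the POI object model entirely: a .docx file is a ZIP package, and Word normally stores embedded pictures, including those referenced from headers and footers, under word/media/. The sketch below reads that folder with java.util.zip. The class name, the method name, and the word/media/ prefix are assumptions for illustration; a package produced by another tool could place its images elsewhere.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class DocxMediaExtractor {
    // Assumption: pictures live under word/media/ (Word's usual layout).
    static final String MEDIA_PREFIX = "word/media/";

    // Returns file name -> bytes for every entry stored under word/media/.
    static Map<String, byte[]> extractMedia(InputStream docx) throws IOException {
        Map<String, byte[]> media = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            byte[] chunk = new byte[8192];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().startsWith(MEDIA_PREFIX)) {
                    continue; // not an embedded media file
                }
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    bytes.write(chunk, 0, n);
                }
                String fileName = entry.getName().substring(MEDIA_PREFIX.length());
                media.put(fileName, bytes.toByteArray());
            }
        }
        return media;
    }
}
```

Each returned byte array can be written to disk under its original file name, which also keeps the original image format instead of re-encoding everything as JPEG. Within POI itself, the route suggested above remains iterating the headers and footers and calling getAllPictures() on each.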

Converting part of .docx document to html using Apache POI

I use XHTMLConverter to convert .docx to HTML in order to show a preview of the document. Is there any way to convert only a few pages of the original document? I'll be grateful for any help.

You have to parse the complete .docx file; it is not possible to read just parts of it. And if you want to know how to select a specific page number, I'm afraid (at least as far as I know) that Word does not store page numbers, so there is no function in the library to access a specific page.
(I read this on another forum, so it might actually be false information.)
PS: the Excel part of POI has a getSheetAt() method (this might help your research).
But there are also other ways to access your pages. For instance, you could read the lines of your document and search for the page numbers (though this might break if your text contains those numbers). Another, more accurate way would be to search for the header of the page:
HeaderStories headerStore = new HeaderStories( doc);
String header = headerStore.getHeader(pageNumber);
This should give you the header of the specified page. The same goes for the footer:
HeaderStories headerStore = new HeaderStories( doc);
String footer = headerStore.getFooter(pageNumber);
If this doesn't work, I can't say more; I am not really into that API.
Here is a little example of a very sloppy solution:
import java.io.*;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
public class ReadDocFile
{
public static void main(String[] args)
{
File file = null;
WordExtractor extractor = null;
try
{
file = new File("c:\\New.doc");
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
extractor = new WordExtractor(document);
String[] fileData = extractor.getParagraphText();
for (int i = 0; i < fileData.length; i++)
{
if (fileData[i].equals("headerPageOne")) {
int firstLineOfPageOne = i;
}
if (fileData[i].equals("headerPageTwo")) {
int lastLineOfPageOne = i;
}
}
}
catch (Exception exep)
{
exep.printStackTrace();
}
}
}
If you go with this, I would recommend creating a String[] with your headers and refactoring the for loop into a separate getPages() method. Your loop would then look like:
String[] headerArray = {"header1", "header2", "header3", "header4"};
for (int i = 0; i < fileData.length; i++)
{
// well, there should be a loop for "x" too
if (fileData[i].equals(headerArray[x])) {
int firstLineOfPageOne = i;
}
if (fileData[i].equals(headerArray[x + 1])) {
int lastLineOfPageOne = i;
}
}
You could create an object (int pageStart, int pageStop), which would be the product of your method.
I hope it helped you :)
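The getPages() idea above can be sketched as follows. This is a minimal illustration, not tested against real Word output: it assumes every page begins with a paragraph that equals its header text exactly and that the headers occur in the given order. PageLocator and PageRange are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PageLocator {

    // Hypothetical value object: paragraph indices [pageStart, pageStop) of one page.
    static final class PageRange {
        final int pageStart;
        final int pageStop;

        PageRange(int pageStart, int pageStop) {
            this.pageStart = pageStart;
            this.pageStop = pageStop;
        }
    }

    // Scans the paragraphs once and opens a new page at each expected header.
    static List<PageRange> getPages(String[] paragraphs, String[] headers) {
        List<PageRange> pages = new ArrayList<>();
        int next = 0;          // index of the next header to look for
        int currentStart = -1; // start of the page currently open, -1 if none yet
        for (int i = 0; i < paragraphs.length && next < headers.length; i++) {
            if (paragraphs[i].equals(headers[next])) {
                if (currentStart >= 0) {
                    pages.add(new PageRange(currentStart, i)); // close the previous page
                }
                currentStart = i;
                next++;
            }
        }
        if (currentStart >= 0) {
            // the last page found runs to the end of the document
            pages.add(new PageRange(currentStart, paragraphs.length));
        }
        return pages;
    }
}
```

For a preview of the first page only, the paragraphs between pageStart and pageStop of the first range would be the ones to convert.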
