Convert Outputstream to file

Convert Outputstream to file - java

Well i'm stucked with a problem,
I need to create a PDF with a html source and i did this way:
File pdf = new File("/home/wrk/relatorio.pdf");
OutputStream out = new FileOutputStream(pdf);
InputStream input = new ByteArrayInputStream(build.toString().getBytes());//Build is a StringBuilder obj
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
well i'm using JSP so i need to download this file to the user not write in the server...
How do I transform this Outputstream output to a file in the java without write this file in hard drive ?

If you're using VRaptor 3.3.0+ you can use the ByteArrayDownload class. Starting with your code, you can use this:
#Path("/download-relatorio")
public Download download() {
// Everything will be stored into this OutputStream
ByteArrayOutputStream out = new ByteArrayOutputStream();
InputStream input = new ByteArrayInputStream(build.toString().getBytes());
Tidy tidy = new Tidy();
Document doc = tidy.parseDOM(input, null);
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.flush();
out.close();
// Now that you have finished, return a new ByteArrayDownload()
// The 2nd and 3rd parameters are the Content-Type and File Name
// (which will be shown to the end-user)
return new ByteArrayDownload(out.toByteArray(), "application/pdf", "Relatorio.pdf");
}

A File object does not actually hold the data but delegates all operations to the file system (see this discussion).
You could, however, create a temporary file using File.createTempFile. Also look here for a possible alternative without using a File object.

use temporary files.
File temp = File.createTempFile(prefix ,suffix);
prefix -- The prefix string defines the files name; must be at least three characters long.
suffix -- The suffix string defines the file's extension; if null the suffix ".tmp" will be used.

Related

How to open a PdfADocument from an existing PdfDocument in itext7?

In order to check uploaded PDF files for basic PDF/A conformance, I need to read them in as PdfADocuments.
But starting with version 7.1.6 this no longer works, but throws a PdfException(PdfException.PdfReaderHasBeenAlreadyUtilized)
class Controller
...
// get uploaded data into PdfDocument, which is passed
// on to different services.
InputStream filecontent = fileupload.getInputStream();
int read = 0;
byte[] bytes = new byte[1024];
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
while ((read = filecontent.read(bytes,0,bytes.length)) != -1) {
filesize += read;
buffer.write(bytes, 0, read);
}
ByteArrayInputStream input = new ByteArrayInputStream(buffer.toByteArray());
PdfReader reader = new PdfReader(input);
PdfWriter writer = new PdfWriter(new ByteArrayOutputStream());
PdfDocument pdf = new PdfDocument(reader, writer);
AnalyzerService analyzer = new AnalyzerService();
if(analyzer.analyze(pdf)) {
otherService.doSomethingWith(pdf);
}
...
class AnalyzerService
...
public boolean analyze(PdfDocument pdf) {
PdfADocument pdfa = new PdfADocument(
pdf.getReader(), pdf.getWriter() <-- PdfException here
);
...
}
Up to and including iText 7.1.5 this worked.
With 7.1.6 I get "com.itextpdf.kernel.PdfException: Given PdfReader instance has already been utilized. The PdfReader cannot be reused, please create a new instance."
It seems that I need to get the Bytes from the PdfDocument as a byte[], then create a new PdfReader from it. I have tried getting them from the pdf.getReader().getOutputStream().toByteArray(), but that doesn't work.
I'm quite lost at the moment on how to create that PdfADocument from the given PdfDocument.

Your approach uses the same PdfReader and (even worse) the same PdfWriter for both a PdfDocument and a PdfADocument instance. As both can manipulate the PdfReader and write to the PdfWriter, that situation is likely to result in garbage in the writer, so you shall not do this.
Simply always consider a document with both a reader and a writer as work-in-progress, something one cannot treat as a finished document file, e.g. extract for intermediary checks.
As you want to check uploaded PDF files, why don't you simply forward the byte[] from buffer.toByteArray() to the analyze method to create a separate reader (and, if need be, a document) from? This indeed exactly would check the uploaded file...
Furthermore, if your input document may be PDF/A conform and is treated specially in that case, shouldn't you also manipulate it as a PdfADocument if it is? I.e. shouldn't you first check in your analyzer for conformance and in the positive case use a PdfADocument for it also in your controller class?

PdfDocument SourcePDF=null;
PdfADocument DisPDF =null;
try
{
PdfReader Reader = new PdfReader(input-Path);
PdfWriter writer = new PdfWriter(output-Path, new WriterProperties().SetPdfVersion(PdfVersion.PDF_2_0));
writer.SetSmartMode(true);
SourcePDF = new PdfDocument(Reader);
DisPDF = new PdfADocument(writer, PdfAConformanceLevel.PDF_A_3A,
new PdfOutputIntent("Custom", "", "https://www.color.org", "sRGB", new MemoryStream(Properties.Resources.sRGB_CS_profiles)));
DisPDF.InitializeOutlines();
//Setting some required parameters
DisPDF.SetTagged();
DisPDF.GetCatalog().SetLang(new PdfString("en-EN"));
DisPDF.GetCatalog().SetViewerPreferences(new PdfViewerPreferences().SetDisplayDocTitle(true));
PdfMerger merger = new PdfMerger(DisPDF, true, true);
merger.Merge(SourcePDF, 1, sorsePDF.GetNumberOfPages());
SourcePDF.Close();
DisPDF.Close();
}
catch (Exception ex)
{
throw;
}

iText difference between PdfCopy and PdfACopy

I wrote a function to embed a file as attachment inside a PDF/A-3a document using iText 5.5.13 (using instructions from iText tutorials).
If I attach the file using the class PdfCopy, the result is a correct PDF file, but it does not claim to be PDF/A (maybe it matches all the requirements, but it doesn't say).
If I do the same using PdfACopy, I get an wrongly built document:
InvalidPdfException: Rebuild failed: trailer not found.; Original
message: PDF startxref not found.
Here is my code a little simplified. Commented is the line to use a PdfCopy instead.
public static File embedFile(File inputPdf) {
File outputPdf = new File("./test.pdf");
PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
Document document = new com.itextpdf.text.Document();
OutputStream os = new FileOutputStream(outputPdf.getAbsolutePath());
PdfACopy copy = new PdfACopy(document, os, PdfAConformanceLevel.PDF_A_3A); // Output doc doesn't work
// PdfCopy copy = new PdfCopy(document, os); // Output doc works but doesn't claim to be PDF/A
document.open();
copy.addDocument(reader);
// Include attachment (extactly as in the sample tutorial)
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.MODDATE, new PdfDate());
PdfFileSpecification fileSpec = PdfFileSpecification.fileEmbedded(
writer, "./src/main/resources/com/itextpdf/invoice.xml",
"invoice.xml", null, "application/xml", parameters, 0);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
writer.addFileAttachment("invoice.xml", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getReference());
writer.getExtraCatalog().put(new PdfName("AF"), array);
os.flush();
reader.close();
document.close();
os.close();
copy.close();
return outputPdf;
}
The input file is already a PDF/A-3a document, so I think I don't need to redefine all the required things like embedded fonts, output intent...
Is there maybe a missing step that is mandatory when using PdfACopy that is not required with PdfCopy?
Would it help to try with iText 7?
Many thanks in advance!

As pointed by Bruno Lowagie in the comments, this is possible with iText 7. Here the function in case it helps someone:
public static File embedFile(File inputPdf, File embeddedFile, String embeddedFileName, String embeddedFileMimeType)
throws IOException {
File outputPdf = new File("./test.pdf");
PdfReader reader = new PdfReader(inputPdf.getAbsolutePath());
PdfWriter writer = new PdfWriter(outputPdf.getAbsolutePath());
PdfADocument pdfDoc = new PdfADocument(reader, writer);
// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(pdfDoc, embeddedFile.getAbsolutePath(), embeddedFileName,
embeddedFileName, new PdfName(embeddedFileMimeType), parameters, PdfName.Data);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdfDoc.addFileAttachment(embeddedFileName, fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdfDoc.getCatalog().put(new PdfName("AF"), array);
pdfDoc.close();
reader.close();
writer.close();
return outputPdf;
}

How to add metadata to PDF document using PDFbox?

I have an input stream of a PDF document available to me. I would like to add subject metadata to the document and then save it. I'm not sure how to do this.
I came across a sample recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html
However, it is still fuzzy. Below is what I'm trying and places where I have questions
PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; //what goes here? How can I add subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMLData, false );
catalog.setMetadata( newMetadata );
//does anything else need to happen to save the document??
//I would like an outputstream of the document (with metadata) so that I can save it to an S3 bucket

The following code sets the title of a PDF document, but it should be adaptable to work with other properties as well:
public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
try {
PDDocument document = PDDocument.load(documentBytes);
PDDocumentInformation info = document.getDocumentInformation();
info.setTitle(title);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
document.save(baos);
return baos.toByteArray();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
Apache PDFBox is needed, so import it to e.g. Maven with:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.6</version>
</dependency>
Add a title with:
byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");
Display it in the browser with (JSF example):
<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>
Result (Chrome):

Another much easier way to do this would be to use the built-in Document Information object:
PDDocument inputDoc = // your doc
inputDoc.getDocumentInformation().setCreator("Some meta");
inputDoc.getDocumentInformation().setCustomMetadataValue("fieldName", "fieldValue");
This also has the benefit of not requiring the xmpbox library.

This answer uses xmpbox and comes from the AddMetadataFromDocInfo example in the source code download:
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setDescription("descr");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);

Getting XML data from the byteArray of a zipFile

I'm writing a simple program that retrieves XML data from an object, and parses it dynamically, based on user criteria. I am having trouble getting the XML data from the object, due to the format it is available in.
The object containing the XML returns the data as a byteArray of a zipFile, like so.
MyObject data = getData();
byte[] byteArray = data.getPayload();
//The above returns the byteArray of a zipFile
The way I checked this, is by writing the byteArray to a String
String str = new String(byteArray);
//The above returns a string with strange characters in it.
Then I wrote the data to a file.
FileOutputStream fos = new FileOutputStream("new.txt");
fos.write(byteArray);
I renamed new.txt as new.zip. When I opened it using WinRAR, out popped the XML.
My problem is that, I don't know how to do this conversion in Java using streams, without writing the data to a zip file first, and then reading it. Writing data to disk will make the software way too slow.
Any ideas/code snippets/info you could give me would be really appreciated!! Thanks
Also, if you need a better explanation from me, I'd be happy to elaborate.
As another option, I am wondering whether an XMLReader would work with a ZipInputStream as InputSource.
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
InputSource inputSource = new InputSource(zis);

A zip archive can contain several files. You have to position the zip stream on the first entry before parsing the content:
ByteArrayInputStream bis = new ByteArrayInputStream(byteArray);
ZipInputStream zis = new ZipInputStream(bis);
ZipEntry entry = zis.getNextEntry();
InputSource inputSource = new InputSource(new BoundedInputStream(zis, entry.getCompressedSize()));
The BoundedInputStream class is taken from Apache Commons IO (http://commons.apache.org/io)

how to append data in docx file using docx4j

Please tell me how to append data in docx file using java and docx4j.
What I'm doing is, I am using a template in docx format in which some field are dilled by java at run time,
My problem is for every group of data it creates a new file and i just want to append the new file into 1 file. and this is not done using java streams
String outputfilepath = "e:\\Practice/DOC/output/generatedLatterOUTPUT.docx";
String outputfilepath1 = "e:\\Practice/DOC/output/generatedLatterOUTPUT1.docx";
WordprocessingMLPackage wordMLPackage;
public void templetsubtitution(String name, String age, String gender, Document document)
throws Exception {
// input file name
String inputfilepath = "e:\\Practice/DOC/profile.docx";
// out put file name
// id of Xml file
String itemId1 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId2 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId3 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
// Load the Package
if (inputfilepath.endsWith(".xml")) {
JAXBContext jc = Context.jcXmlPackage;
Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
org.docx4j.xmlPackage.Package wmlPackageEl = (org.docx4j.xmlPackage.Package) ((JAXBElement) u
.unmarshal(new javax.xml.transform.stream.StreamSource(
new FileInputStream(inputfilepath)))).getValue();
org.docx4j.convert.in.FlatOpcXmlImporter xmlPackage = new org.docx4j.convert.in.FlatOpcXmlImporter(
wmlPackageEl);
wordMLPackage = (WordprocessingMLPackage) xmlPackage.get();
} else {
wordMLPackage = WordprocessingMLPackage
.load(new File(inputfilepath));
}
CustomXmlDataStoragePart customXmlDataStoragePart = wordMLPackage
.getCustomXmlDataStorageParts().get(itemId1);
// Get the contents
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart
.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:name[1]", name,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId2);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:age[1]", age,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId3);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:gender[1]", gender,
"xmlns:ns0='EasyForm'");
// Apply the bindings
BindingHandler.applyBindings(wordMLPackage.getMainDocumentPart());
File f = new File(outputfilepath);
wordMLPackage.save(f);
FileInputStream fis = new FileInputStream(f);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
try {
for (int readNum; (readNum = fis.read(buf)) != -1;) {
bos.write(buf, 0, readNum);
}
// System.out.println( buf.length);
} catch (IOException ex) {
}
byte[] bytes = bos.toByteArray();
FileOutputStream file = new FileOutputStream(outputfilepath1, true);
DataOutputStream out = new DataOutputStream(file);
out.write(bytes);
out.flush();
out.close();
System.out.println("..done");
}
public static void main(String[] args) {
utility u = new utility();
u.templetsubtitution("aditya",24,mohan);
}
thanks in advance

If I understand you correctly, you're essentially talking about merging documents. There are two very simple approaches that you can use, and their effectiveness really depends on the structure and onward use of your data:
PhilippeAuriach describes one approach in his answer, which entails
appending all components within a MaindocumentPart instance to
another. In terms of the final docx file, this means the content
that appears in document.xml -- it won't take into account headers
and footers ( for example), but that may be fine for you.
You can insert multiple documents into a single docx file by inserting them
as AltChunk elements (see the docx4j documentation). This will
bring everything from one Word file into another, headers and all.
The downside of this is that your final document won't be a proper
flowing Word file until you open it and save it in MS Word itself
(the imported components remain as standalone files within the docx
bundle). This will cause you issues if you want to generated
'merged' files and then do something with them like render PDFs --
the merged content will simply be ignored.
The more complete (and complex) approach is to perform a "deep merge". This updates and maintains all references held within a document. Imported content becomes part of the main "flow" of the document (i.e. it is not stored as separate references), so the end result is a properly-merged file which can be rendered to PDF or whatever.
The downside to this is you need a good knowledge of docx structure and the API, and you will be writing a fair amount of code (I would recommend buying a license to Plutext's MergeDocx instead).

I had to deal with similar things, and here is what I did (probably not the most efficient, but working) :
create a finalDoc loading the template, and emptying it (so you have the styles in this doc)
for each data row, create a new doc loading the template, then replace your fields with your values
use the function below to append the doc filled with the datas to the finalDoc :
public static void append(WordprocessingMLPackage docDest, WordprocessingMLPackage docSource) {
List<Object> objects = docSource.getMainDocumentPart().getContent();
for(Object o : objects){
docDest.getMainDocumentPart().getContent().add(o);
}
}
Hope this helps.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.