How to append the contents of two Word documents? - java

I read a Word document, for example SampleOne.doc, and store it in a byte[].
@Column(name = "LETTER_WORD_EDITOR_VALUE")
private byte[] letterWordEditorValue;
It is a BLOB in the DB.
I want to read the contents of another Word document, for example SampleTwo.doc, as a byte[], append the two byte[] arrays, and set the resulting byte[] on letterWordEditorValue.
Below is the code that does this.
FileInputStream fileRead = new FileInputStream(fileNameWithPath);
byte[] readData=IOUtils.toByteArray(fileRead);
byte[] one = readData;
byte[] two = inquiryCor.getLetterWordEditorValue();
byte[] combined = new byte[one.length + two.length];
System.arraycopy(one,0,combined,0,one.length);
System.arraycopy(two,0,combined,one.length,two.length);
inquiryCor.setLetterWordEditorValue(combined);
Below is the code that reads letterWordEditorValue and writes it to a Word file.
fileEditOutPutStream = new FileOutputStream(fileNameWithPath);
fileEditOutPutStream.write(inquiryCor.getLetterWordEditorValue());
fileEditOutPutStream.close();
The resulting Word file does not contain the contents of one + two; it contains only the readData content. However, printing combined.length shows the sum of one.length and two.length.
Why doesn't the code above append the contents of the two Word documents?
What am I doing wrong? Please guide me to solve this issue.
Thanks!

It's not possible to combine two documents in a proprietary format by simply concatenating their byte arrays. Each file carries its own internal structure, so the concatenated bytes do not form one valid document that contains both. You need to parse the two documents with a library and put their contents together manually. What you tried is like putting two engines in one car by attaching a second car to the first one: it does not compute!
Apache offers a library for Office documents: https://poi.apache.org/
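To illustrate the point above with something that runs on a plain JDK, here is a minimal sketch that uses ZIP archives as a stand-in for a structured document format. This is an analogy only: a .doc file is not a ZIP archive, and the class and method names (ZipConcatDemo, zipOf, concat, entryNames) are made up for this example. It concatenates two archives exactly as the question concatenates two byte arrays and then asks a sequential reader what it can see.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: ZIP stands in for any structured file format.
final class ZipConcatDemo {

    // Builds an in-memory ZIP archive holding a single text entry.
    static byte[] zipOf(String entryName, String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(text.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Same byte-level concatenation as in the question.
    static byte[] concat(byte[] one, byte[] two) {
        byte[] combined = new byte[one.length + two.length];
        System.arraycopy(one, 0, combined, 0, one.length);
        System.arraycopy(two, 0, combined, one.length, two.length);
        return combined;
    }

    // Lists the entry names that a sequential ZIP reader finds in the bytes.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] combined = concat(zipOf("one.txt", "first"), zipOf("two.txt", "second"));
        System.out.println(combined.length + " bytes, entries seen: " + entryNames(combined));
    }
}
```

The combined array has the full summed length, but the reader is expected to stop at the end of the first archive's structure and never reach the second one. That matches the symptom in the question: the length is right, yet only the first document shows up.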

Replace
fileEditOutPutStream.write(inquiryCor.getLetterWordEditorValue());
with
fileEditOutPutStream.write(inquiryCorrespondence.getLetterWordEditorValue());

Related

GSON / iText: Extract Text From PDF 1.7 byte[]

I'm automating tests using Rest-Assured and GSON, and I need to validate the contents of a PDF file that is returned in the response of a POST request. The content of the files varies and can contain anything from just text, to text and tables, or text, tables, and graphics. Every page can, and most likely will, be different as far as glyph content goes. I am only concerned with ALL text on the PDF page, be it plain text, text inside a table, or text associated with (or inside) an image. Since all PDFs returned by the request are different, I cannot define search areas (as far as I know). I just need to extract all text on the page.
I extract the pdf data into a byte array like so:
Gson pdfGson = new Gson();
byte[] pdfBytes =
pdfGson.fromJson(this.response.as(JsonObject.class)
.get("pdfData").getAsJsonObject().get("data").getAsJsonArray(), byte[].class);
(I've tried other extraction methods for the byte[], but this is the only way I've found that returns valid data.) This returns a very large byte[] like so:
[37, 91, 22, 45, 23, ...]
When I parse the array I run into the same issue as This Question (except my pdf is 1.7) and I attempt to implement the accepted answer, adjusted for my purposes and as explained in the documentation for iText:
byte[] decodedPdfBytes = PdfReader.decodeBytes(pdfBytes, new PdfDictionary(), FilterHandlers.getDefaultFilterHandlers());
IRandomAccessSource source = new RandomAccessSourceFactory().createSource(decodedPdfBytes);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ReaderProperties readerProperties = new ReaderProperties();
// Ineffective:
readerProperties.setPassword(user.password.getBytes());
PdfReader pdfReader = new PdfReader(source, readerProperties);
// Ineffective:
pdfReader.setUnethicalReading(true);
PdfDocument pdfDoc = new PdfDocument(pdfReader, new PdfWriter(baos));
for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) { // pages are 1-based, so include the last page
String text = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
System.out.println(text);
}
This DOES decode the pdf page, and return text, but it is only the header text. No other text is returned.
For what it's worth, on the front end, when the user clicks the button to generate the pdf, it returns a blob containing the download data, so I'm relatively sure that the metadata is GSA encoded, but I'm not sure if that matters at all. I'm not able to share an example of the pdf docs due to sensitive material.
Any point in the right direction would be greatly appreciated! I've spent 3 days trying to find a solution.
For those looking for a solution - ultimately we wound up going a different route. We never found a solution to this specific issue.

How to convert exponents in a csv file from Java

I am printing some data into a CSV file using Apache Commons CSV. One of the fields contains a 15-digit number and is of type String. This field shows up as an exponential number in the CSV instead of the complete number. I know Excel does that, but is there a way in Java to print it as a complete number?
I am not doing anything special. Initially I thought that Commons CSV would take care of it.
public void createCSV() throws IOException {
inputStream = new FileInputStream("fileName");
fileWriter = new FileWriter("fileName");
// CSVFormat.EXCEL is the predefined Excel-compatible format
csvFileFormat = CSVFormat.EXCEL.withHeader("header1", "header2");
csvFilePrinter = new CSVPrinter(fileWriter, csvFileFormat);
for (UiIntegrationDTO dto : myList) {
String csvData = dto.getPolicyNumber();
csvFilePrinter.printRecord(csvData);
}
}
Prepend an apostrophe
As far as I understand from the discussion in the comments, this is a question about how Excel interprets the CSV file; the file itself contains all the necessary data.
I think csvFilePrinter.printRecord("'" + csvData); should help. A leading apostrophe tells Excel to interpret a field as a string, not as a number.
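A minimal sketch of the suggestion above, using only the JDK so that it does not depend on a particular Commons CSV version. The helper name asText and the rule "prefix only all-digit values" are assumptions made for this example; the answer itself simply prepends the apostrophe to the field.

```java
// Hypothetical helper: marks purely numeric strings so Excel treats them as text.
final class ExcelCsvText {

    // Returns the value with a leading apostrophe if it consists only of digits;
    // any other value (including null) is returned unchanged.
    static String asText(String value) {
        if (value != null && value.matches("\\d+")) {
            return "'" + value;
        }
        return value;
    }

    public static void main(String[] args) {
        // A 15-digit policy number, as in the question.
        System.out.println(asText("123456789012345"));
    }
}
```

In the loop from the question this would be used as csvFilePrinter.printRecord(ExcelCsvText.asText(dto.getPolicyNumber()));. Whether Excel hides or displays the apostrophe may depend on how the file is opened, so it is worth checking with the Excel version in use.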

Get Row and Col for embedded Object with POI

I'm currently working with Excel files (*.xlsm) and Apache POI, and I have been cracking my head over a task.
I receive some Excel files that have PDFs embedded in them, and I want to extract the PDFs and rename them based on the row and column they are in.
This seems odd, as I know the embedded objects are represented as images, they can occupy more than one cell, and technically they are not "in" the cell.
The following code snippet lets me extract the embedded PDFs, but they are named OleObject[1..2..3.etc..], which doesn't give me any clue.
inStream = new FileInputStream(file);
XSSFWorkbook workbook = new XSSFWorkbook(inStream);
for (PackagePart pPart : workbook.getAllEmbedds()) {
String contentType = pPart.getContentType();
if (contentType.equals("application/vnd.openxmlformats-officedocument.oleObject")){
POIFSFileSystem fs = new POIFSFileSystem(pPart.getInputStream());
TikaInputStream stream = TikaInputStream.get(fs.createDocumentInputStream("CONTENTS"));
byte[] bytes = IOUtil.toByteArray(stream);
stream.close();
OutputStream outStream = new FileOutputStream(new File(ROOT_DIRECTORY.getAbsolutePath()+"\\PDF"+i+".pdf"));
IOUtil.copy(bytes, outStream);
outStream.close();
}}
I wanted to know if org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet will let me see the XML of the Excel sheet, and maybe with that I can get the info I need. Like this:
<oleObjects><mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"><mc:Choice Requires="x14"><oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"><objectPr defaultSize="0" r:id="rId5"><anchor moveWithCells="1"><from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from><to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to></anchor></objectPr></oleObject></mc:Choice><mc:Fallback><oleObject progId="Acrobat Document" dvAspect="DVASPECT_ICON" shapeId="1028" r:id="rId4"/></mc:Fallback></mc:AlternateContent></oleObjects>
--
<objectPr defaultSize="0" r:id="rId5"><anchor moveWithCells="1"><from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff><xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from><to><xdr:col>8</xdr:col><xdr:colOff>1143000</xdr:colOff><xdr:row>13</xdr:row><xdr:rowOff>171450</xdr:rowOff></to></anchor></objectPr>
I guess using the anchor information would be possible, but I'm just unable to find out how to get it.
I hope this information makes clear what I'm trying to do.
Thanks in advance.
I've looked at the source code for the current poi-ooxml-schemas sources jars which you can locate here: http://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.3/
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorksheet extends org.apache.xmlbeans.XmlObject which can give you the XML as a string using the inherited .toString() method. Or you can quickly access the list of OLE objects in the worksheet by calling getOleObjects() on your CTWorksheet object.
/**
* Gets the "oleObjects" element
*/
org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObjects getOleObjects();
CTOleObjects itself extends org.apache.xmlbeans.XmlObject and again you can get the XML using toString() for parsing, or get a list of org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject OLE objects for iteration using CTOleObjects.getOleObjectList().
/**
* Gets a List of "oleObject" elements
*/
java.util.List<org.openxmlformats.schemas.spreadsheetml.x2006.main.CTOleObject> getOleObjectList();
CTOleObject doesn't seem to have getter methods for the child XML elements that carry the anchor, which you would need to determine the rows and columns, so I think you would need to do some XML parsing or string searching to get this info, if it is contained in the string XML representation.
Hope this helps.
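As a rough sketch of the "string searching" route mentioned above: the regular expression below pulls the top-left cell out of the first anchor's <from> element. It assumes the string contains the same xdr:-prefixed tags as the XML quoted in the question (the output of toString() may use different prefixes), and the class name OleAnchorParser and method fromCell are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the anchor's starting cell from worksheet XML text.
final class OleAnchorParser {

    // Matches the first <from>...</from> block and captures its column and row.
    private static final Pattern FROM_CELL = Pattern.compile(
            "<from>.*?<xdr:col>(\\d+)</xdr:col>.*?<xdr:row>(\\d+)</xdr:row>.*?</from>",
            Pattern.DOTALL);

    // Returns {row, col} as written in the XML, or null if no anchor is found.
    static int[] fromCell(String xml) {
        Matcher m = FROM_CELL.matcher(xml);
        if (!m.find()) {
            return null;
        }
        int col = Integer.parseInt(m.group(1));
        int row = Integer.parseInt(m.group(2));
        return new int[] { row, col };
    }

    public static void main(String[] args) {
        // Anchor fragment copied from the question.
        String xml = "<anchor moveWithCells=\"1\"><from><xdr:col>8</xdr:col><xdr:colOff>0</xdr:colOff>"
                + "<xdr:row>11</xdr:row><xdr:rowOff>0</xdr:rowOff></from></anchor>";
        int[] cell = fromCell(xml);
        System.out.println("row=" + cell[0] + ", col=" + cell[1]);
    }
}
```

The values appear to be zero-based indexes, so they may need adjusting before being used in a file name. For several embedded objects, the same pattern could be applied in a loop with Matcher.find().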

Load a .txt file into a Java application and save it to an XML file

I read the following answer about loading a file into a Java application.
I need to write a program that loads a .txt file containing a list of records. After I parse it, I need to match the records (against conditions that I will check) and save the result to an XML file.
I am stuck on this issue, and I would be glad to get answers to the following questions:
How do I load the .txt file into Java?
After I load the file, how can I access the information in it? For example, how can I check whether the first line of one of the records is equal to "1"?
How do I export the result to an XML file?
one: you need sample code for reading a file line by line
two: the split method of String might be helpful, for instance to get the number in the first element if the information is separated by spaces
String myLine;
String[] components = myLine.split(" ");
if(components != null && components.length >= 1) {
int num = Integer.parseInt(components[0]);
....
}
three: you can just write it like any text file, or use any XML writer you want
Basic I/O
Integer.parseInt(firstLine)
There are a plethora of choices.
Create POJO's to represent the records and write them using XMLEncoder
SAX
DOM..
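Putting the three steps together, here is a minimal sketch using only the JDK. The record layout (one record per line, an id followed by a space and the rest of the text), the element names records and record, and the class name RecordsToXml are all assumptions for illustration, since the question does not show the file format.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example: turns "id text" lines into a small XML document.
final class RecordsToXml {

    // Converts each non-empty "id text" line into <record id="...">text</record>.
    static String toXml(List<String> lines) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartDocument();
            xml.writeStartElement("records");
            for (String line : lines) {
                // Split into at most two parts: the id and the remaining text.
                String[] parts = line.trim().split("\\s+", 2);
                if (parts.length < 2 || parts[0].isEmpty()) {
                    continue; // skip blank or incomplete lines
                }
                xml.writeStartElement("record");
                xml.writeAttribute("id", parts[0]);
                xml.writeCharacters(parts[1]);
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException(ex);
        }
        return out.toString();
    }

    // args[0]: path of the .txt input, args[1]: path of the XML output.
    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), toXml(lines).getBytes(StandardCharsets.UTF_8));
    }
}
```

The matching conditions from the question would go inside the loop, for example by comparing parts[0] with "1" before writing the element. XMLStreamWriter escapes special characters in attribute values and text, which is the main advantage over writing the tags by hand.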

Blackberry: Read a text file packaged in the project (faster)

I've tried this approach:
http://www.blackberry.com/knowledgecenterpublic/livelink.exe/fetch/2000/348583/800332/800620/How_To_-_Add_plain_text_or_binary_files_to_an_application.html?nodeid=800687&vernum=0
But it's REALLY slow for slightly large text files. Does anyone know of a better way of reading a plain text file that is included in the project? Is there a way to use FileConnection?
Figured it out by combining a few pieces of information. Call
IOUtilities.streamToBytes(is);
directly on the input stream. A more complete example would be as follows:
Class classs = Class.forName("com.packagename.stuff.FileDemo");
InputStream is = classs.getResourceAsStream("/test");
byte[] data = IOUtilities.streamToBytes(is);
String result = new String(data);
Deal? Deal.
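IOUtilities.streamToBytes is part of the BlackBerry API. For comparison, the same idea (reading the stream in large chunks rather than piece by piece) can be sketched in plain Java SE as follows. The class name BulkRead, the 8 KB buffer size, and the UTF-8 charset are assumptions for this example, not something the answer specifies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical Java SE counterpart to reading a packaged text file in one go.
final class BulkRead {

    // Reads the whole stream through a fixed-size buffer and decodes it as UTF-8.
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // "/test" is the resource name used in the answer above.
        InputStream is = BulkRead.class.getResourceAsStream("/test");
        if (is != null) {
            System.out.println(readAll(is));
        }
    }
}
```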
